This resource page features course content from the Knight Center for Journalism in the Americas' massive open online course (MOOC), titled "Introduction to R for journalists: How to find great stories in data." The five-week course took place from June 23 to August 26, 2018. We are now making the content freely available to students who took the course and to anyone else who is interested in learning how to use the statistical computing and graphics language R to enhance their data analysis and reporting process.
The course, which was supported by the Knight Foundation, was taught by Andrew Ba Tran. He created and curated the content for the course, which includes video classes and tutorials, readings, exercises, and more.
The course materials are broken up into five modules:
As you review this resource page, we encourage you to watch the videos, read the readings, and complete the exercises as time allows. The course materials build on each other, but the videos and readings also work as standalone resources that you can return to over time.
We hope you enjoy the materials and share them with others who are interested in learning how to use R to enhance their data analysis and reporting process. If you have any questions, please contact us at journalismcourses@austin.utexas.edu.
Andrew Ba Tran is a data reporter for The Washington Post’s rapid response investigative team. He was previously a data editor at The Connecticut Mirror’s TrendCT.org, a nonprofit news site that helped the public find and understand data and its potential impact on the community. Before that, Andrew was a data producer at The Boston Globe, and he has also worked in newsrooms at The Virginian-Pilot and the Sun-Sentinel. He has contributed to investigative projects and breaking news coverage that were awarded the Pulitzer Prize.
He’s a Metpro Fellow, a Chips Quinn Scholar, and a graduate of the University of Texas. Andrew has taught data journalism as a Koeppel Fellow at Wesleyan University and at American University. He’s from Dallas, Texas.
Congratulations on signing up for our new online course "Intro to R for journalists: How to find great stories in data." Over the five weeks, you will learn how to use the statistical computing and graphics language R to enhance your data analysis and reporting process.
Introduction
1. Downloading R & RStudio for Mac
2. Installing Git on a Mac
3. Downloading R & RStudio for PC
4. Installing Git on a PC
5. Data Journalism with R at FiveThirtyEight
6. The Growing Popularity of R in Data Journalism
7. Swiss Public Broadcast GitHub site
8. How Jared Kushner built a luxury skyscraper using loans meant for job-starved areas [The Washington Post]
*This is an example of how R is used in a news story
9. Methodology used for the story above
In this module you will be introduced to RStudio and learn how to start a new analysis project. You will learn the basics of how to import and explore data with R.
This module will cover:
Welcome Video
(Some videos have Japanese transcripts, translated by Hiroyuki Yokoyama.)
1. Welcome to Intro to R for Journalists
Watch Video | Files for Module 1 (1) | Files for Module 1 (2)
How to Use R
2. RStudio Tour
3. Introduction to R
Watch Video 1 | Watch Video 2 | R Script | Exercise
4. Data Structures
5. Case Data
Importing/Exporting Data
6. Importing Data
7. CSV Files
Watch Video | R Script | Exercise | PDF | PDF-Japanese
8. Excel Data
Watch Video | R Script | Exercise | PDF | PDF-Japanese
9. Fixed Width (Optional)
10. JSON (Optional)
Watch Video | R Script | PDF | PDF-Japanese
11. Data Pasta (Optional)
12. SPSS
13. Bulk Combine
Readings
1. Meet the 28-year-old grad student who just shook the global austerity movement [New York Magazine]
2. Storytelling with R [ProPublica] (video - 20 min)
3. Why R Is Hard to Learn By Robert A. Muenchen [r4stats.com]
4. What I use to visualize data [FlowingData]
Optional Materials
In this module you will learn how to transform and analyze data the tidy way using the dplyr package.
This module will cover:
Facebook Live with Andrew Tran
Video Class
1. Wrangling
2. dplyr
Watch Video 1 | Watch Video 2 | Watch Video 3
3. tidyr
4. Case study
5. Strings
6. Dates
Readings
1. Text analysis of Trump's tweets confirms he writes only the (angrier) Android half By David Robinson [varianceexplained.org]
2. Tidy Data By Hadley Wickham [Journal of Statistical Software]
3. Serial Killers Should Fear This Algorithm By Robert Kolker [Bloomberg]
4. When algorithms decide what you pay By Julia Angwin, Terry Parris Jr. and Surya Mattu [ProPublica]
Optional Materials
In this module, you’ll learn about the grammar of graphics and how to use the ggplot2 package to make quick exploratory data visualizations.
This module will cover:
Video Class
1. Intro ggplot: Visualizing Data
2. ggplot
Watch Video 1 | Watch Video 2 | Watch Video 3 | R Script | Exercise
3. ggstyle
4. gghighlight
5. ggextensions
ggplot2 Resources
1. Using ProPublica’s “statefaces” in ggplot2 [hrbrmstr]
2. ggplot2 cheatsheet [RStudio]
3. From Data to Viz [data-to-viz]
ggplot2 Examples
1. ggplot2 as a creativity engine and other ways R is transforming quantitative journalism [Financial Times]
2. Gender gap: Six things we’ve learnt By Clara Guibourg [BBC]
3. The complete history of every No. 1 tennis player in the world By Duc-Quang Nguyen [SWI]
4. The complete history of every No. 1 tennis player in the world [GitHub]
5. Huge increase in arrests of homeless in L.A. - but mostly for minor offenses By Gale Holland, Christine Zhang [Los Angeles Times]
6. Huge increase in arrests of homeless in L.A. - but mostly for minor offenses [GitHub]
7. What I use to visualize data [FlowingData]
In this module, you will learn how to visualize geographical data and look for neighborhood-level racial disparities in traffic stops, using Census data and traffic stop data from Connecticut.
This module will cover:
Video Class
1. Intro to Spatial
2. Static
Watch Video 1 | Watch Video 2 | R Script | Exercise 1 | Exercise 2 | PDF
3. Geolocating
4. Case Study
Watch Video 1 | Watch Video 2 | R Script | PDF
5. Interactive Dots (Optional)
6. Choropleths (Optional)
Watch Video | R Script | Exercise | PDF
Readings
1. Behind the dialect map interactive: How an intern created The New York Times' most popular piece of content in 2013 [Knight Lab]
2. Geographic divide of Oscar movies By Matt Daniels, Ilia Blinderman (Maps) & Russell Goldenberg (Maps) [The Pudding]
3. Regional smoothing using R By Ilia Blinderman [The Pudding]
4. Buzzfeed 311 calls increase in gentrifying neighborhoods By Lam Thuy Vo [Buzzfeed]
5. Buzzfeed 311 calls increase in gentrifying neighborhoods By Lam Thuy Vo [GitHub]
Optional Materials
1. How to Create State and County Maps Easily in R [Urban Institute]
2. Beautiful thematic maps with ggplot2 (only)
3. Geofacets [hafen]
4. Spatial data analysis and modeling with R [rspatial]
5. Murder with Impunity [Washington Post]
6. Spies in the Skies By Peter Aldhous and Charles Seife [Buzzfeed]
7. Spies in the Skies By Peter Aldhous and Charles Seife [GitHub]
8. America is more diverse than ever — but still segregated [Washington Post]
In this module you will learn how to use R Markdown to present your analysis in a narrative format. You’ll also learn how to log changes to your project with version-control software and publish your analysis on the Internet.
This module will cover:
Google Hangout with Andrew Ba Tran and Hadley Wickham
Video Class
1. Publishing Intro
2. R Markdown
3. More R Markdown
4. Workflow Practices
5. Git
6. GitHub Pages
7. Best Practices & Bye
Readings
1. How open should open source data visualization be? [FlowingData]
2. How Trump is changing the face of legal immigration [The Washington Post]
3. How Trump is changing the face of legal immigration [GitHub]
4. Excuse me, do you have a moment to talk about version control? [PeerJ]