Self-Directed Course

Intro to R for journalists: How to find great stories in data

Instructor(s): Andrew Ba Tran

This resource page features course content from the Knight Center for Journalism in the Americas' massive open online course (MOOC), titled "Introduction to R for journalists: How to find great stories in data." The five-week course took place from June 23 to August 26, 2018. We are now making the content freely available to students who took the course and to anyone else interested in learning how to use the statistical computing and graphics language R to enhance their data analysis and reporting process.

The course, which was supported by the Knight Foundation, was taught by Andrew Ba Tran. He created and curated the content for the course, which includes video classes and tutorials, readings, exercises, and more.

 The course materials are broken up into five modules:

  • Module 1: Offers an introduction to RStudio and how to start a new analysis project. You will learn the basics of how to import and explore data with R.
  • Module 2: Covers how to transform and analyze data the tidy way using the dplyr package.
  • Module 3: Covers the grammar of graphics and how to use the ggplot2 package to make quick exploratory data visualizations.
  • Module 4: Covers how to visualize geographical data and look for neighborhood racial profiling disparities using Census data and traffic stop data from Connecticut.
  • Module 5: Provides tutorials about RMarkdown. You will learn how to use it to present your analysis in a narrative format. You’ll also learn how to log changes to your project with version-control software and publish your analysis on the Internet. In addition, Hadley Wickham, Chief Data Scientist at RStudio, joined in for a Google Hangout. Wickham is the creator of several notable and widely used data analysis packages collectively known as the "tidyverse." A link to the video is listed under this module.

As you review this resource page, we encourage you to watch the videos, read the readings, and complete the exercises as time allows. The course materials build on each other, but the videos and readings also act as standalone resources that you can return to over time.

We hope you enjoy the materials and share them with others who are interested in learning how to use the statistical computing and graphics language R to enhance their data analysis and reporting process. If you have any questions, please contact us at journalismcourses@austin.utexas.edu.

About the Instructor

Andrew Ba Tran is a data reporter for The Washington Post’s rapid response investigative team.
He previously was a data editor at The Connecticut Mirror's TrendCT.org, a nonprofit news site that helped the public find and understand data and its potential impact on the community. Prior to that, Andrew was a data producer at The Boston Globe and he’s also worked in newsrooms at The Virginian-Pilot and the Sun-Sentinel. He has contributed to investigative projects and breaking news coverage that were awarded the Pulitzer Prize.
He’s a Metpro Fellow, a Chips Quinn Scholar, and a graduate of the University of Texas. Andrew has taught data journalism as a Koeppel Fellow at Wesleyan University and at American University. He’s from Dallas, Texas.

Welcome to the introduction module of our course!

Congratulations on signing up for our new online course "Intro to R for journalists: How to find great stories in data." Over the five weeks, you will learn how to use the statistical computing and graphics language R to enhance your data analysis and reporting process.

 Introduction

1. Welcome video

Watch Video   

2. Course syllabus

Syllabus

 Materials

Programming in R

In this module you will be introduced to RStudio and learn how to start a new analysis project. You will learn the basics of how to import and explore data with R.

 This module will cover:

  • A tour of the RStudio IDE
  • Syntax for coding in R
  • Creating R scripts
  • Importing packages
  • Good workflow and documentation habits
  • How to import data from formats such as CSV files, Excel spreadsheets, and XML
  • Exploring the data’s structure
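As a rough illustration of the import-and-explore step, here is a minimal sketch in base R. The file contents, column names, and figures are invented for the example and are not taken from the course files. The course videos may use other import functions than the ones shown here.

```r
# Hypothetical data, written to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "town,stops,population",
  "Hartford,120,124000",
  "New Haven,95,130000",
  "Stamford,60,129000"
), csv_path)

# Import the CSV into a data frame.
stops <- read.csv(csv_path, stringsAsFactors = FALSE)

# Explore the structure: column types, first rows, and a numeric summary.
str(stops)
head(stops)
summary(stops$stops)

# Basic dimensions of the imported table.
nrow(stops)
ncol(stops)
```

Running `str()` right after every import is a cheap way to catch columns that were read as text when you expected numbers.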

 Welcome Video

(Some videos have Japanese transcripts, translated by Hiroyuki Yokoyama.)

1. Welcome to Intro to R for Journalists

Watch Video Files for Module 1 (1) Files for Module 1 (2)

 How to Use R

2. RStudio Tour

Watch Video

3. Introduction to R

Watch Video 1  Watch Video 2  R Script Exercise 

PDF PDF-Japanese

4. Data Structures

Watch Video R Script Exercise 

PDF  PDF-Japanese

5. Case Data

Watch Video  PDF  PDF-Japanese

 Importing/Exporting Data

6. Importing Data

Watch Video  PDF

7. CSV Files

Watch Video R Script  Exercise  PDF  PDF-Japanese

8. Excel Data

Watch Video R Script  Exercise  PDF  PDF-Japanese

9. Fixed Width (Optional)

Watch Video R Script PDF 

10. JSON (Optional)

Watch Video R Script PDF  PDF-Japanese

11. Data Pasta (Optional)

Watch Video R Script PDF

12. SPSS

Watch Video R Script PDF

13. Bulk Combine

Watch Video R Script PDF

 Readings

1. Meet the 28-year-old grad student who just shook the global austerity movement By Kevin Roose [New York Magazine]

2. Storytelling with R [ProPublica] (video - 20 min)

3. Why is R so Hard to learn By Robert A. Muenchen [r4stats.com]

4. What I use to visualize data [FlowingData]

 Optional Materials

Wrangling Data

In this module you will learn how to transform and analyze data the tidy way using the dplyr package.

 This module will cover:

  • Filtering, selecting, arranging, mutating, summarizing data
  • How to join two data sets for more insight
  • Chaining analysis functions with pipes for efficiency and readability
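The topics above can be sketched in a single piped chain. This is a minimal example that assumes the dplyr package is installed. The two small tables and every column name in them are hypothetical, invented only to show the verbs working together.

```r
library(dplyr)

# Hypothetical traffic-stop records (invented for illustration).
stops <- tibble(
  town     = c("Hartford", "Hartford", "New Haven", "New Haven", "Stamford"),
  searched = c(TRUE, FALSE, TRUE, TRUE, FALSE),
  minutes  = c(12, 5, 20, 9, 7)
)

# Hypothetical population lookup table (figures are made up).
population <- tibble(
  town       = c("Hartford", "New Haven", "Stamford"),
  population = c(124000, 130000, 129000)
)

town_summary <- stops %>%
  filter(minutes >= 6) %>%                 # keep stops lasting at least 6 minutes
  select(town, searched, minutes) %>%      # keep only the columns needed
  mutate(long_stop = minutes >= 10) %>%    # add a derived column
  group_by(town) %>%
  summarize(
    n_stops     = n(),
    search_rate = mean(searched),
    long_share  = mean(long_stop)
  ) %>%
  left_join(population, by = "town") %>%   # join the second data set
  arrange(desc(n_stops))                   # sort, largest count first

print(town_summary)
```

Each step takes the result of the previous one, so the chain reads top to bottom in the same order you would describe the analysis aloud.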

 Facebook Live with Andrew Tran

 Video Class

 Readings

1. Text analysis of Trump tweets confirms he only writes anger ones By David Robinson [varianceexplained.org]

2. Tidy Data By Hadley Wickham [Journal of Statistical Software]

3. Serial Killers Should Fear This Algorithm By Robert Kolker [Bloomberg]

4. When algorithms decide what you pay By Julia Angwin, Terry Parris Jr. and Surya Mattu [ProPublica]

 Optional Materials

Visualizing Data

In this module, you’ll learn about the grammar of graphics and how to use the ggplot2 package to make quick exploratory data visualizations.

 This module will cover:

  • The aesthetics of data visualizations
  • How to create different chart types, such as bar charts, box plots, line charts, and scatterplots
  • Grouping for charts
  • How to create facets or small multiples with the data
  • Labels and titles for visualizations
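A minimal sketch of how these pieces fit together, assuming the ggplot2 package is installed. It uses `mpg`, a sample data set that ships with ggplot2, rather than any course data. The title and axis labels are written for this example only.

```r
library(ggplot2)

# Aesthetics map columns to visual properties: position (x, y) and color.
p <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point() +                 # scatterplot layer
  facet_wrap(~ drv) +            # small multiples, one panel per drive type
  labs(
    title = "Engine size and highway mileage",
    x     = "Engine displacement (liters)",
    y     = "Highway miles per gallon",
    color = "Vehicle class"
  )

print(p)
```

Swapping `geom_point()` for another geom, such as `geom_boxplot()` or `geom_line()`, changes the chart type while the rest of the specification stays the same, although the aesthetics may need adjusting to suit the new geom.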

 Video Class

1. Intro ggplot: Visualizing Data

Watch Video PDF 

2. ggplot

Watch Video 1 Watch Video 2 Watch Video 3 R Script Exercise 

PDF

3. ggstyle

Watch Video 1 Watch Video 2 

R Script Exercise 

PDF

4. gghighlight

Watch Video R Script PDF

5. ggextensions

Watch Video

 ggplot2 Resources

 ggplot2 Examples

Spatial Analysis

In this module, you will learn how to visualize geographical data and look for neighborhood racial profiling disparities using Census data and traffic stop data from Connecticut.

 This module will cover:

  • Creating interactive maps with the R Leaflet package
  • How to geolocate addresses in R
  • Importing and visualizing shapefiles
  • Point-in-polygon analysis that merges location data and boundaries for deeper insights
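The point-in-polygon idea can be sketched with a spatial join. This example assumes the sf package is installed, which may not be the same tooling the course videos use. The two square "neighborhoods" and the three points are invented, and no coordinate reference system is set, to keep the example short.

```r
library(sf)

# Helper (hypothetical): a 1x1 square whose lower-left corner is at (x0, 0).
unit_square <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}

# Two invented neighborhood boundaries.
neighborhoods <- st_sf(
  name     = c("West", "East"),
  geometry = st_sfc(unit_square(0), unit_square(1))
)

# Three invented stop locations, converted from x/y columns to point geometries.
stops_sf <- st_as_sf(
  data.frame(id = 1:3, x = c(0.5, 1.5, 1.2), y = c(0.5, 0.5, 0.8)),
  coords = c("x", "y")
)

# Attach to each point the attributes of the polygon it falls within.
joined <- st_join(stops_sf, neighborhoods, join = st_within)

# Count points per neighborhood.
print(table(joined$name))
```

With real data, both layers need the same coordinate reference system before joining, for example by applying `st_transform()` to one of them.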

 Video Class

1. Intro to Spatial

Watch Video  PDF 

2. Static

Watch Video 1 Watch Video 2 R Script Exercise 1 Exercise 2 PDF

3. Geolocating

Watch Video R Script PDF

4. Case Study

Watch Video 1 Watch Video 2 R Script PDF

5. Interactive Dots (OPTIONAL)

Watch Video R Script PDF

6. Choropleths (OPTIONAL)

Watch Video R Script Exercise PDF

 Readings

1. Behind the dialect map interactive: How an intern created The New York Times' most popular piece of content in 2013 by Ryan Graff [Knight Lab]

2. Geographic divide of Oscar movies By Matt Daniels, Ilia Blinderman (Maps) & Russell Goldenberg (Maps) [The Pudding]

3. Regional smoothing using R By Ilia Blinderman [The Pudding]

4. Buzzfeed 311 calls increase in gentrifying neighborhoods By Lam Thuy Vo [Buzzfeed]

5. Buzzfeed 311 calls increase in gentrifying neighborhoods By Lam Thuy Vo [GH]

 Optional Materials

1. How to Create State and County Maps Easily in R  [Urban Institute]

2. Beautiful thematic maps with ggplot2 (only)

3. Geofacets [hafen]

4. Spatial data analysis and modeling with R  [rspatial]

5. Murder with Impunity [Washington Post]

6. Spies in the Skies By Peter Aldhous and Charles Seife [Buzzfeed]

7. Spies in the Skies By Peter Aldhous and Charles Seife [GH]

8. America is more diverse than ever — but still segregated [Washington Post]

Publishing for Reproducibility

In this module you will learn how to use RMarkdown to present your analysis in a narrative format. You’ll also learn how to log changes to your project with version-control software and publish your analysis on the Internet.

 This module will cover:

  • The git version control software and its integration with GitHub
  • How data journalists use GitHub and RMarkdown and other notebooks to publish their work
  • How to use the Markdown markup language to annotate RMarkdown
  • How to create a new git code repository and start tracking code
  • How to connect the repository to GitHub and publish to GitHub Pages

 Google Hangout with Andrew Ba Tran and Hadley Wickham

 Video Class

1. Publishing Intro

Watch Video  PDF

2. R Markdown

Watch Video  PDF

3. More R Markdown

Watch Video  PDF

4. Workflow Practices

Watch Video  PDF

5. Git

Watch Video  PDF 1  PDF 2  PDF 3

6. GitHub Pages

Watch Video  PDF

7. Best Practices & Bye

Watch Video  PDF

 Readings