Instructor: Andrew Ba Tran

July 23-August 26

Welcome to the massive open online course (MOOC) “Intro to R for Journalists: How to Find Great Stories in Data,” from the Knight Center for Journalism in the Americas at the University of Texas at Austin. This is a free course open to anyone interested in quickly asking questions of and assessing data. It is meant for journalists, editors, journalism professors and students, but anyone from anywhere in the world who has an interest in data analysis and visualization is welcome to enroll.

Registering in the platform is easy. Please follow these steps:

  1. Create an account in the Journalism Courses system. Even if you’ve taken a course with us before, you may need to create a new account. Check to see if your previous username and password work before creating a new account.
  2. Wait for a confirmation in your email indicating that your account has been created. If you do not receive this, please check your spam folder.
  3. Log into the platform, scroll down until you see the course listings, and click on the “Intro to R for Journalists: How to Find Great Stories in Data” course.
  4. A button will appear. Click “Enroll” to enroll yourself in the course. You will be able to access the course from the “My Courses” menu at the top of the page.
  5. You will receive an email confirming your enrollment.

Please add the course email addresses to your address book to ensure you receive emails about the course.

In this course, you will learn how to use the statistical computing and graphics language R to enhance your data analysis and reporting process. You’re going to dive right into dealing with the messy and disparate types of data that journalists regularly encounter. You’ll get an introduction to packages that will help you wrangle data efficiently and effectively. You’ll learn how to create visualizations that will communicate what you’ve discovered in your data with charts and maps. Finally, you’ll learn how to turn your analyses into notebooks that can be shared with others in your newsroom and hopefully eventually published online for reproducibility.

This course is designed to give you a taste of all the possibilities of programming in R. It will emphasize packages that help you do data analysis and visualization. Once you see how fun and easy it can be, you'll be more open to tackling the more challenging task of robust statistical analysis in R, with all its quirks. I'll be sure to include links and suggestions for ways to build on what you'll learn in this course.

Upon completion of this course, students will be able to:

  • Import different types of data into R
  • Wrangle, clean, join, and analyze data
  • Create exploratory and publishable visualizations
  • Make static and interactive maps with geolocated data and the Census
  • Join the reproducible research movement and publish their methodology

We welcome anyone interested in quickly asking questions of and assessing data. Journalists, editors, technology professors, students and faculty who have an interest in data analysis and visualization should enroll. If you know how to use spreadsheets expertly, or even if you only know what one looks like, this course is for you. This course is free and uses open-source software and open-source packages created by members of the R community. Join us with an open mind and the flexibility to cope with technical hiccups with creativity and generosity.

This course requires you to have access to an Internet connection and Web browser, as well as R and RStudio. We will provide instructions and tutorials on how to install R and RStudio for those who need help doing so.

First of all, note that this is an asynchronous course. That means there are no live events scheduled at specific times. You can log in to the course and complete activities throughout the week at your own pace, at the times and on the days that are most convenient for you.

Despite its asynchronous nature, the course has a set structure. The material is organized into five weekly modules. Each module will be taught by the instructor, Andrew Ba Tran, and will cover a different topic through videos, presentations, readings and discussion forums. There will be a quiz each week to test the knowledge you've gained through the course materials. The weekly quizzes and weekly participation in the discussion forums are the basic requirements for earning a certificate of completion at the end of the course.

This course is very flexible, and if you are behind with the materials, you have the entire length of the course to complete them. We do recommend you complete each of the following before the end of each week so you don’t fall behind:

  • Video lectures
  • Readings and handouts/exercises
  • Participation in the discussion forums
  • Quizzes covering concepts from video lectures and/or readings

Introduction Module: R

In this introductory module, you will learn how to configure your computer to work with R. Before you can use it to analyze data, your computer needs the following tools installed:

  • A command-line interface to interact with your computer
  • The git version control software and a GitHub account
  • The latest version of R
  • The latest version of RStudio
  • An API key from (

Module 1: Programming in R

This week you will be introduced to RStudio and learn how to start a new analysis project. You will learn the basics of how to import and explore data with R.

This module will cover:

  • A tour of the RStudio IDE
  • Syntax for coding in R
  • Creating R scripts
  • Importing packages
  • Good workflow and documentation habits
  • How to import data such as CSVs, Excel spreadsheets and XML
  • Exploring the data’s structure

Module 2: Wrangling data

This week you will learn how to transform and analyze data the tidy way using the dplyr package.

This module will cover:

  • Filtering, selecting, arranging, mutating, summarizing data
  • How to join two data sets for more insight
  • Chaining analysis functions with pipes for efficiency and readability

Module 3: Visualizing data

This week, you’ll learn about the grammar of graphics and how to use the ggplot2 package to make quick exploratory data visualizations.

This module will cover:

  • The aesthetics of data visualizations
  • How to create different charts, such as bar charts, box plots, line charts and scatterplots
  • Grouping for charts
  • How to create facets or small multiples with the data
  • Labels and titles for visualizations

Module 4: Spatial analysis

This week you will learn how to visualize geographical data and look for racial profiling disparities across neighborhoods using Census data and traffic stop data from Connecticut.

This module will cover:

  • Creating interactive maps with the R Leaflet package
  • How to geolocate addresses in R
  • Importing and visualizing shapefiles
  • Point-in-polygon analysis that merges location data and boundaries for deeper insights

Module 5: Publishing for reproducibility

This week you will learn how to use RMarkdown to present your analysis in a narrative format. You’ll also learn how to log changes to your project with version-control software and publish your analysis on the Internet.

This module will cover:

  • The git version control software and its integration with GitHub
  • How data journalists use GitHub and RMarkdown and other notebooks to publish their work
  • How to use the Markdown markup language to annotate RMarkdown
  • How to create a new git code repository and start tracking code
  • How to connect the repository to GitHub and publish to GitHub Pages

Andrew Ba Tran is a data reporter for the Washington Post’s rapid response investigative team.

He was previously a data editor at The Connecticut Mirror, a non-profit news site that helped the public find and understand data and its potential impact on the community.

Prior to that, Andrew was a data producer at The Boston Globe and he’s also worked in newsrooms at The Virginian-Pilot and the Sun-Sentinel. He has contributed to investigative projects and breaking news coverage that were awarded the Pulitzer Prize.

He’s a Metpro Fellow, a Chips Quinn Scholar, and a graduate of the University of Texas.

Andrew has taught data journalism as a Koeppel Fellow at Wesleyan University and at American University.

He’s from Dallas, Texas.

A certificate of completion is available for those who meet all of the course requirements and pay an administrative fee of $30 (thirty U.S. dollars) online with a credit card. After the course ends, the Knight Center will send a message with an online form you can submit if you are interested in the certificate. The online form will be available during the last week of the course. After the form closes, the Knight Center team will verify whether you fulfilled the course requirements. This process takes at least a week after the form closes. Once the requirements are verified, the Knight Center will send a message through the course platform confirming that you fulfilled the course requirements and qualify for the certificate. This message will also include instructions for making the payment.

Those who meet all of the course requirements will be able to download a PDF version of a certificate of completion. The Knight Center for Journalism in the Americas awards the certificate to attest to students' participation in the online course, and no formal course credit of any kind is associated with it.

If you'd like to receive a certificate of completion for the course, you must meet the following requirements:

  • Complete weekly quizzes with a minimum score of 70% by the weekly deadline.
  • Watch weekly video lectures and review weekly readings.
  • Participate in at least one discussion forum each week by the given deadline.

If all requirements are met, an electronic certificate will be emailed to the student.
