Archived Course

Advanced data journalism — Doing more with R

September 5 to October 2, 2022
Instructor(s):   Andrew Ba Tran

Welcome to "Advanced Data Journalism — Doing More with R," a new Big Online Course (BOC) organized by the Knight Center for Journalism in the Americas. During this four-week BOC, which will be held from September 5 to October 2, 2022, students will learn how to obtain, analyze, and visualize data using the statistical programming language R. Read on for more details, including instructions on how to register.

This BOC costs $95, which includes full access to the course as well as a certificate of completion for those who meet the course requirements. Our BOCs provide a more advanced level of training and are limited to a few hundred students, allowing for greater interaction between the students and instructor.

Registering on the platform is easy. Please follow these steps:

1) Create an account in the Journalism Courses system. Even if you’ve taken a course with us before, you may need to create a new account if your original account has been inactive.

2) Wait for a confirmation in your email indicating that your account has been created. If you do not receive this, please check your spam folder.

3) Pay the $95 registration fee. Payment can be made only by those who already have an account in the Journalism Courses system, and you will need the username and email you used in the system to complete it.

Unlike most Knight Center courses, which are free, this Big Online Course (BOC) will provide a more advanced level of training and will be limited to a few hundred students, allowing for greater interaction between the students and instructor.

Once you pay the $95 course fee, you will receive a payment receipt but you will not automatically be enrolled in the course. It may take up to 72 hours for the Knight Center to enroll you. Once enrolled, you will receive an enrollment confirmation email with details on how to access the introductory module of the course. Please add the email addresses journalismcourses@austin.utexas.edu and filipa.rodrigues@utexas.edu to your address book to ensure you receive emails about the course.

Through hands-on tutorials over the next four weeks, we want to make you a better data journalist who can use a few free tools and specific techniques to make maps that help you understand your data and illustrate your stories. We will use real-world datasets along with a dataset of your own choosing.

Our goal is to provide clear, direct, repeatable steps to make and present data-driven maps you can use to power your journalism for years to come. You will finish this class with the ability to:

  • Analyze data and visualize their results to enhance stories
  • Pull data from APIs and some websites
  • Apply statistical methods to journalism stories

Introduction Module: Getting set up with the class data, and a guide to finding your own

We'll start by easing you into working with R. If you're totally new, this module provides a checklist to follow so you're up to speed (for example, on how to import data) before moving on to the rest of the course. Have a data set in hand, or a topic in mind, that you'd like to apply these skills to throughout the course, and tell us about it in the discussion board. We'll also walk through how this course is going to work, such as how code from the interactive modules will interact with the course quizzes.

This module will cover:

  • A quick overview of the lessons
  • Getting set up with the programs and libraries we'll be using
  • How to download course files from GitHub and load them on your computer
  • A guide to working with interactive tutorials and applying them to course quizzes

Module 1: Analyzing data with R (September 5 - September 11)

We're going to dive right into equipping you with the vocabulary to script out your data analysis, including dealing with common data formats like strings and dates. These are the important building blocks for everything you'll do with your projects.

This module will cover:

  • Getting acquainted with data and R
  • Wrangling data verbs
  • Mutating, arranging, filtering, and summarizing with the tidyverse
  • Introduction to dealing with dates with lubridate
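
To give a flavor of the verbs listed above, here is a minimal sketch rather than actual course material. It assumes the dplyr and lubridate packages are installed, and the `inspections` table is hypothetical data made up for illustration.

```r
library(dplyr)
library(lubridate)

# Hypothetical data: made-up inspection records, for illustration only
inspections <- tibble(
  city  = c("Austin", "Austin", "Dallas", "Dallas", "Dallas"),
  date  = c("2022-01-15", "2022-02-03", "2022-01-20", "2022-02-11", "2022-02-25"),
  score = c(88, 92, 75, 81, 95)
)

monthly <- inspections %>%
  # mutate: parse the text dates with lubridate, then pull out the month number
  mutate(date = ymd(date), mon = month(date)) %>%
  # filter: keep only inspections that scored 80 or higher
  filter(score >= 80) %>%
  # summarize: one row per city and month
  group_by(city, mon) %>%
  summarize(n_inspections = n(), avg_score = mean(score), .groups = "drop") %>%
  # arrange: highest average score first
  arrange(desc(avg_score))

print(monthly)
```

Each verb takes a data frame and returns a data frame, which is why the steps chain together with the pipe.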

Module 2: More wrangling and intro to exploratory data viz (September 12 - September 18)

We'll focus on the most common data wrangling tasks, like reshaping and joining data. Then we'll try our hand at visualizing data with the grammar of graphics so you can iterate through exploratory analysis.

This module will cover:

  • Joining data for enhanced analysis
  • Transforming data with tidyr
  • Iterative exploratory data visualization with ggplot2
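
As a rough illustration of how these three topics fit together, the sketch below reshapes a wide table with tidyr, joins it to a second table with dplyr, and builds a quick exploratory chart with ggplot2. It assumes those three packages are installed. The county tables and their numbers are hypothetical, invented only so the example runs.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical data, made up for illustration
population <- tibble(
  county = c("Travis", "Harris", "Bexar"),
  pop    = c(1300000, 4700000, 2000000)
)
cases_wide <- tibble(
  county = c("Travis", "Harris", "Bexar"),
  `2020` = c(120, 510, 230),
  `2021` = c(150, 480, 260)
)

cases_long <- cases_wide %>%
  # Reshape: one row per county and year instead of one column per year
  pivot_longer(cols = -county, names_to = "year", values_to = "cases") %>%
  # Join: attach each county's population so we can compute a rate
  left_join(population, by = "county") %>%
  mutate(rate_per_100k = cases / pop * 100000)

# Exploratory chart: grouped bars of the rate by year
p <- ggplot(cases_long, aes(x = year, y = rate_per_100k, fill = county)) +
  geom_col(position = "dodge") +
  labs(x = "Year", y = "Cases per 100,000 residents")
```

Printing `p` draws the chart. Swapping `geom_col()` for another geom is the kind of quick iteration exploratory analysis relies on.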

Module 3: Advanced data gathering (September 19 - September 25)

As data journalists, we have to be prepared to deal with data that doesn't come in tabular format that can be opened in Excel. This week, we'll learn how to pull and transform unstructured data online into structured data we can analyze and draw stories from. First, we'll need to wrap our heads around APIs and website structures and also talk about programming concepts such as loops.

This module will cover:

  • How APIs work and how to pull data from them
  • Understanding the structure of websites so you can better scrape them
  • R programming concepts such as loops and parallelization
  • Relevant R packages such as jsonlite, httr, rvest
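
Here is a small sketch of the loop-and-parse pattern behind most API work, assuming the jsonlite package is installed. The JSON "pages" are hypothetical responses hard-coded as text so the example runs offline. With a live API, a request would typically go through httr (for example `httr::GET()`), using a URL taken from that API's documentation.

```r
library(jsonlite)

# Hypothetical API responses, hard-coded so the sketch runs without a network call
pages <- c(
  '{"results": [{"id": 1, "name": "Alpha"}, {"id": 2, "name": "Beta"}]}',
  '{"results": [{"id": 3, "name": "Gamma"}]}'
)

all_results <- list()
for (i in seq_along(pages)) {
  # fromJSON turns the array of records into a data frame
  parsed <- fromJSON(pages[i])
  all_results[[i]] <- parsed$results
}

# Stack the per-page data frames into one table
combined <- do.call(rbind, all_results)
print(combined)
```

Collecting each page in a list and binding once at the end avoids growing a data frame inside the loop.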

Module 4: Statistics and stories (September 26 - October 2)

We've made it this far. Now let's use R for its originally created purpose: statistics. We’ll show you how to do some regression analysis and modeling and then talk about why you may want to avoid using them in your story altogether. We'll go over examples of how journalists have used the conclusions or simpler alternative methods in stories.

This module will cover:

  • Regression analysis in R
  • Translating conclusions into stories
  • Considering other methods of incorporating statistics into stories
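
For a sense of what a basic regression looks like in R, here is a minimal sketch using only base R. It uses `mtcars`, a dataset that ships with R, purely as a stand-in. It is not course data, and the plain-language sentence at the end is just one possible way to phrase a result.

```r
# Fit a simple linear model: fuel economy (mpg) as a function of weight (wt)
fit <- lm(mpg ~ wt, data = mtcars)

# Pull out the numbers a story would rest on
slope     <- coef(fit)[["wt"]]       # change in mpg per 1,000 lbs of weight
r_squared <- summary(fit)$r.squared  # share of variation in mpg explained by weight

# Translate the coefficient into a plain-language sentence
cat(sprintf(
  "Each additional 1,000 pounds is associated with about %.1f fewer miles per gallon.\n",
  abs(slope)
))
cat(sprintf("Weight alone accounts for roughly %.0f%% of the variation.\n", r_squared * 100))
```

Note the wording "is associated with": a regression like this describes a pattern in the data and does not by itself show that one thing causes another.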

Andrew Ba Tran is a data reporter for The Washington Post’s rapid response investigative team. He previously was a data editor at The Connecticut Mirror’s TrendCT.org, a nonprofit news site that helped the public find and understand data and its potential impact on the community. Prior to that, Andrew was a data producer at The Boston Globe and he’s also worked in newsrooms at The Virginian-Pilot and the Sun-Sentinel. He has contributed to investigative projects and breaking news coverage that were awarded the Pulitzer Prize.

He’s a Metpro Fellow, a Chips Quinn Scholar, and a graduate of the University of Texas. Andrew has taught data journalism as a Koeppel Fellow at Wesleyan University and at American University. He’s from Dallas, Texas.

This class is designed for folks who have had some exposure to R, a free statistical programming language, but if you're totally new to this, there are some materials I've prepared for you to help get you up to speed. It's pretty much pre-class homework. But you'll want to sign up and run through them well before class officially starts.

First of all, note that this is an asynchronous course. That means there are no live classes scheduled at specific times. You can log in to the course and complete activities throughout the week at your own pace, at the times and on the days that are most convenient for you.

Despite its asynchronous nature, there are still structures in place for the duration of the course.

The material is organized into four weekly modules. Each module will be taught by Andrew Ba Tran, an investigative data reporter at The Washington Post, and will cover a different topic through videos, presentations, readings and discussion forums.

There will be a mix of guided R coding exercises that you can run locally or through your browser. The quizzes may require you to rely on answers from these exercises.

These quizzes will be given each week to test the knowledge you've gained through the course materials and in the online exercises. The weekly quizzes, and weekly participation in the discussion forums, are the basic requirements for earning a certificate of participation at the end of the course.

This course is very flexible, and if you are behind with the materials, you have the entire length of the course to complete them. We do recommend you complete each of the following before the end of each week so you don’t fall behind:

  • Video lectures
  • Tutorials and exercises
  • Participation in the discussion forums
  • Quizzes covering concepts from video lectures and/or readings

A certificate of completion is available for those who pay the $95 course fee and meet all of the course requirements. After verifying that these requirements have been met, the Knight Center will send a confirmation message with instructions on how to download the certificate. To be eligible for a certificate, you must:

  • Watch the weekly video classes and read the weekly readings
  • Complete weekly quizzes with a 70% minimum score. (You can retake the quizzes as many times as needed. Only the highest score will be recorded.)
  • Create OR reply to at least one discussion forum each week

The certificate of completion is included in the $95 course fee. No formal course credit of any kind is associated with the certificate. The certificate is awarded by the Knight Center for Journalism in the Americas to attest to the participation in the online course.