Python for Data Journalists: Analyzing Money in Politics

Instructor: Ben Welsh

June 12 - July 9, 2017







Welcome to Python for Data Journalists: Analyzing money in politics! This online course is presented in four weekly modules, starting on June 12, 2017 and ending on July 9, 2017.


Registering in the platform is easy. Please follow these steps:

  1. Create an account in the Journalism Courses system. Even if you’ve taken a course with us before, you may need to create a new account. Check to see if your previous username and password work before creating a new account.
  2. Wait for a confirmation email indicating that your account has been created. If you do not receive it, please check your spam folder.
  3. Log into the platform, scroll down until you see the course listings, and click on the “Python for Data Journalists: Analyzing money in politics” course.
  4. A button will appear; click “Enroll” to enroll yourself in the course. You will then be able to access the course from the “My Courses” menu at the top of the page.
  5. You will receive an email confirming your enrollment.

Please add the email addresses knightcenter@austin.utexas.edu and filipa.rodrigues@utexas.edu to your address book to ensure you receive emails about the course.

For these next four weeks I will be guiding you through an investigation of money in politics using data from the California Civic Data Coalition.

You will learn just enough of the Python computer programming language to work with the pandas library, a popular open-source tool for analyzing data. The course will teach you how to use pandas to read, filter, join, group, aggregate and rank structured data.

You will also learn how to record, remix and republish your analysis using the Jupyter Notebook, a browser-based application for writing code that is emerging as the standard for sharing reproducible research in the sciences.

And most important: you will see how these tools can increase the speed and veracity of your journalism.

This course is open to anyone interested in using Python and other computer programming tools to conduct data analysis. If you've tried Python once or twice, have a good attitude and know how to take a few code crashes in stride, you are qualified for this class.

We are also recruiting journalists, students, academics and other professionals from California who are interested in studying the state’s campaign finance system in greater depth. To serve this audience, we are offering a “class within the class” that provides deeper engagement with the instructor and your fellow students. If you live or work in California and are interested in joining, please apply for one of the limited number of seats in this group.

This course requires a computer with a command-line interface, an Internet browser and administrator privileges. Configuration instructions will be provided, and you will be responsible for properly installing all of the necessary software.

The required tools are:

  • A command-line interface to interact with your computer
  • The git version control software and a GitHub account
  • Version 2.7 of the Python programming language
  • The pip package manager and virtualenv environment manager for Python
  • A code compiler that can install our heavy-duty analysis tools

First of all, note that this is an asynchronous course. That means there are no live events scheduled at specific times. You can log in to the course and complete activities throughout the week at your own pace, at the times and on the days that are most convenient for you.

Despite its asynchronous nature, there are still structures in place for the duration of the course. The material is organized into four weekly modules. Each module will be taught by Ben Welsh and will cover a different topic through videos, presentations, readings and discussion forums. There will be a quiz each week to test the knowledge you've gained through the course materials. The weekly quizzes, and weekly participation in the discussion forums, are the basic requirements for earning a certificate of participation at the end of the course.

This course is very flexible, and if you are behind with the materials, you have the entire length of the course to complete them. We do recommend you complete each of the following before the end of each week so you don’t fall behind:

  • Video lectures
  • Readings and handouts/exercises
  • Participation in the discussion forums
  • Quizzes covering concepts from video lectures and/or readings

Module 0: Hello world


In this introductory module, you will learn how to configure your computer to work with Python. Before you can use it to analyze data, your computer needs the following tools installed (a quick way to verify your installation appears after the list):

  • A command-line interface to interact with your computer
  • The git version control software and a GitHub account
  • Version 2.7 of the Python programming language
  • The pip package manager and virtualenv environment manager for Python
  • A code compiler that can install our heavy-duty analysis tools
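
If you want to confirm that everything is in place, here is a minimal sketch you can run from the command line; the file name and printed labels are illustrative and not part of the official course materials.

    # A quick sanity check, not official course material: save this as a file
    # (check_setup.py is a hypothetical name) and run `python check_setup.py`.
    import sys

    # The course targets Python 2.7, so the version printed should start with "2.7".
    print("Python version: " + sys.version.split()[0])
    print("Interpreter path: " + sys.executable)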

Module 1: Hello notebook


This week you will learn how to start a new Python analysis project and be introduced to pandas and the Jupyter Notebook. You will use them to draft an elementary data analysis that is clear and reproducible. A brief sketch of a first notebook cell appears after the list below.

  • Creating a new Python workspace with virtualenv
  • Using pip to install the pandas and Jupyter Notebook libraries
  • Creating your first Jupyter Notebook
  • How to write Python code in a Jupyter Notebook
  • Importing the pandas library into the Jupyter Notebook
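
As a rough preview of where this module ends up, here is a minimal sketch of a first notebook cell. It assumes a virtualenv has already been created and activated and that `pip install pandas jupyter` has been run inside it; the exact commands are covered in the lessons.

    # Minimal first notebook cell: import pandas under its conventional alias
    # and confirm which version was installed into the environment.
    import pandas as pd

    print(pd.__version__)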

Module 2: Hello data


This week you will download a list of campaign contributors published by the California Civic Data Coalition and load it into a Jupyter Notebook for analysis with pandas. This class will cover the following topics, with a brief code sketch after the list:

  • Learning how the money funding campaigns is tracked in the United States
  • Downloading campaign data from the California Civic Data Coalition website
  • Importing structured data files as a DataFrame with pandas’ read_csv method
  • Inspecting DataFrames with pandas’ info and head methods
  • Inspecting and summarizing DataFrame columns with pandas’ value_counts, describe and sum methods
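
As a rough illustration of those methods (the file and column names below are made up for this example, not the course’s actual data), a notebook cell might look like this:

    import pandas as pd

    # Hypothetical file standing in for the contributions data downloaded from
    # the California Civic Data Coalition website.
    contribs = pd.read_csv("contributions.csv")

    # Inspect the DataFrame's columns, data types and first few rows.
    contribs.info()
    print(contribs.head())

    # Summarize individual columns (these column names are invented for illustration).
    print(contribs["committee_name"].value_counts())
    print(contribs["amount"].describe())
    print(contribs["amount"].sum())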

Module 3: Hello analysis


This week you will learn how to use pandas to conduct a data analysis and document your work with the Jupyter Notebook. It will cover the following topics, with a brief code sketch after the list:

  • Filtering a DataFrame with pandas’ indexing system
  • Merging two DataFrames with pandas’ merge method
  • Sorting a DataFrame with pandas’ sort_values method
  • Aggregating a DataFrame with pandas’ groupby method
  • Using these tools to responsibly navigate and analyze California campaign data
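
To make those operations concrete, here is a small self-contained sketch that uses a few invented rows rather than the real campaign data:

    import pandas as pd

    # Two tiny, invented tables standing in for the course's campaign-finance data.
    committees = pd.DataFrame({
        "committee_id": [1, 2, 3],
        "prop_name": ["PROP A", "PROP A", "PROP B"],
    })
    contribs = pd.DataFrame({
        "committee_id": [1, 1, 2, 3],
        "contributor_state": ["CA", "NY", "CA", "CA"],
        "amount": [100.0, 2500.0, 50.0, 750.0],
    })

    # Filter with pandas' boolean indexing: keep only California contributors.
    ca_only = contribs[contribs["contributor_state"] == "CA"]

    # Merge the two tables on their shared key.
    merged = pd.merge(committees, contribs, on="committee_id")

    # Sort by contribution size, largest first.
    top = merged.sort_values("amount", ascending=False)

    # Group and aggregate: total raised per proposition.
    totals = merged.groupby("prop_name")["amount"].sum().reset_index()
    print(top)
    print(totals)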

Module 4: Hello Internet


This week you will learn how to log changes to your Jupyter Notebook with version-control software and publish your analysis on the Internet. It will cover:

  • The git version control software and its integration with GitHub’s social network
  • How data journalists use GitHub and Jupyter Notebook to publish their work
  • How to use the Markdown markup language to annotate a Jupyter Notebook
  • How to create a new git code repository and start tracking code
  • How to connect the repository to GitHub and publish a Jupyter Notebook

Ben Welsh is the editor of the Los Angeles Times Data Desk, a team of reporters and computer programmers in the newsroom that works to collect, organize, analyze and present large amounts of information.

He is also a cofounder of the California Civic Data Coalition, an open-source network of developers working to open up public data, and the creator of PastPages, an archive dedicated to the preservation of online news.

Ben has worked at the Los Angeles Times since 2007. Before working at The Times, Ben conducted data analysis for investigative projects at The Center for Public Integrity in Washington DC.

Projects he has contributed to have been awarded the Pulitzer Prize, the Library of Congress' Innovation Award and numerous other prizes for investigative reporting, digital design and online journalism.

Ben graduated from DePaul University in 2004. During his time there, he worked with Carol Marin and Don Moseley at the DePaul Documentary Project. He later earned a master’s degree from the Missouri School of Journalism — where he served as a graduate assistant at the National Institute for Computer-Assisted Reporting.

He is originally from Swisher, Iowa.

A certificate of completion will be available for those who meet all of the course requirements. You will have until July 9 to complete the course criteria. After July 9, the Knight Center will review your record. Once your completion of the course requirements is confirmed, the Knight Center will send a message through the course platform confirming that you fulfilled the requirements and qualify for the certificate. That message will also include instructions for downloading a PDF copy of your certificate through the course platform. You will then be able to download your certificate before the course closes down. No formal course credit of any kind is associated with the certificate. The certificate is awarded by the Knight Center for Journalism in the Americas to attest to your participation in the online course.


To receive a certificate of completion for the course, you must meet the following requirements:

1) Complete the weekly quizzes with a minimum score of 70% by the weekly deadline.

2) Watch weekly video lectures and review weekly readings.

3) Participate in at least one discussion forum each week by the given deadline.

If all requirements are met, an electronic certificate in PDF format will be emailed to the student.

Please add the following email addresses, filipa.rodrigues@utexas.edu and knightcenter@austin.utexas.edu, to your address book to ensure you receive emails about the course.




