Hands-on machine learning solutions for journalists

November 18 - December 15, 2019

Instructor: John Keefe

If you’re nervous that you don’t know much about machine learning, or don’t know how to code, don’t worry. In "Hands-on Machine Learning Solutions for Journalists," instructor John Keefe will walk you step by step through the concepts and code you’ll need to get a feel for using machine learning for journalism. Over four weeks, you'll use off-the-shelf tools for images, make a custom image detector, improve that detector and detect objects in videos, analyze text documents, and more.

The course is $95 and designed for anyone from anywhere in the world who is interested in machine learning for journalism.

Registering in the platform is easy. Please follow these steps:

  1. Create an account in the Journalism Courses system. Even if you’ve taken a course with us before, you may need to create a new account. Check whether your previous username and password still work before creating one.
  2. Wait for an email confirming that your account has been created. If you do not receive it, please check your spam folder.
  3. Pay the $95 registration fee. You can pay only after your account has been created, and you will need the username and email address from that account to complete the payment.
  4. You will receive an email confirming your registration and granting you access to the introductory module of the course.

Please add the email addresses journalismcourses@austin.utexas.edu and filipa.rodrigues@utexas.edu to your address book to ensure you receive emails about the course.

During this four-week course, instructor John Keefe will introduce you to the situations where machine learning can help in your reporting -- such as when you’re faced with a trove of documents or images -- and then guide you through the basics of using some machine learning methods to help you.

You will first learn how to use some off-the-shelf systems to get fast answers to basic questions: What’s in all of these images? What are these documents about? Then we’ll move to building custom machine learning models to help with a particular project, such as sorting documents into particular piles.

Our work will be done with pre-written code, so you always start with a working base. You’ll then learn more by modifying it.

This course is open to anyone interested in hands-on machine learning solutions for finding, researching, and enhancing your journalism. If you know how to code, that’s great; it will be super helpful. But you don’t need any previous coding experience to take or benefit from the course.

The course is a great primer for journalists interested in the fast.ai machine learning Python library and/or the fast.ai courses, but our examples and use-cases will be tailored specifically for reporters and data journalists.

Module 1: Using off-the-shelf tools for images

This module will give you hands-on experience using some existing machine learning tools. You'll learn to solve problems such as getting descriptions of each photo in a folder full of images, extracting text appearing in those images, or understanding the content of a folder full of documents. So next time you get a trove of documents, you'll have a better sense of what's in them — faster.

This module will cover:

  • Basic AI concepts tailored to journalism projects
  • Getting started with Google Colaboratory notebooks
  • Multi-category image detection with open-source tools
  • Multi-category image detection using Google Vision
  • Optional challenges:
    • Analyze your own set of images
    • Extract text detected inside images

Module 2: Making a custom image detector

Specific stories have specific needs, sometimes not easily solved with general tools. Say you have thousands of images taken by housing inspectors. General tools may spot smoke detectors, but probably not the telltale rings of *missing* smoke detectors. Or maybe you have lots of maps of helicopter paths, but you want just those in which helicopters are circling. You'll learn how to make custom detectors specific to your investigation.

This module will cover:

  • Introduction to transfer learning and fast.ai
  • Training a custom model to sort images
  • Using your own data & saving your work
  • Discussion: Data as images
  • Optional challenges:
    • Use this method on your own set of images
    • Generate “data images” from a data set you know

Module 3: Improving your detector & detecting objects in videos

Training a machine learning model to work well is pretty straightforward. Getting it to work even better, or to fix problems when they arise, takes more craft. We'll introduce you to some of the methods and tricks to make your detector work even better. We'll also work with detecting objects in videos.

This module will cover:

  • Finding particular objects or scenes inside videos
  • Tips and tricks for improving the accuracy of your model
  • Optional challenges:
    • Try your hand at more techniques to improve your model
    • Put an image detector into the wild for you, your colleagues, or your audience

Module 4: Analyzing text documents

Image problems are great for learning machine learning, but many of the thorniest journalism problems involve troves of text documents. How might you find all of the names used in a huge FOIA response? Can we sort documents into two piles based on their contents? Machine learning can help, and we'll see how.

This module will cover:

  • Entity extraction using open-source libraries
  • Sorting tweets into useful piles
  • Optional challenges:
    • Categorize your inbox
    • Using AI to find similar documents
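
To give a feel for what "sorting into piles" means, here is a minimal sketch of a tiny naive Bayes text classifier written with only the Python standard library. It is not the course's own code: the course works from pre-written notebooks and open-source libraries, while the function names (`tokenize`, `train`, `classify`), the pile labels, and the example texts below are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(labeled_texts):
    """Count words per pile; labeled_texts is a list of (text, pile) pairs."""
    word_counts = {}        # pile -> Counter of words seen in that pile
    doc_counts = Counter()  # pile -> number of training texts
    for text, pile in labeled_texts:
        doc_counts[pile] += 1
        word_counts.setdefault(pile, Counter()).update(tokenize(text))
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Return the pile with the highest naive Bayes log-probability."""
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total_docs = sum(doc_counts.values())
    best_pile, best_score = None, -math.inf
    for pile, counts in word_counts.items():
        # Start from how common the pile is in the training data
        score = math.log(doc_counts[pile] / total_docs)
        total_words = sum(counts.values())
        for word in tokenize(text):
            # Add-one smoothing so unseen words don't zero out the score
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_pile, best_score = pile, score
    return best_pile


# Hypothetical, hand-written training examples (not real data)
training = [
    ("City council votes on new housing budget", "news"),
    ("Mayor announces inspection of apartment buildings", "news"),
    ("Win a free phone, click this link now", "junk"),
    ("Click now for a free prize giveaway", "junk"),
]
word_counts, doc_counts = train(training)

print(classify("Council to vote on building inspection budget", word_counts, doc_counts))
print(classify("Free giveaway, click the link", word_counts, doc_counts))
```

With four training texts this is only a toy. A real project would need many more labeled examples per pile and a check of how often the sorter is wrong before trusting its output.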

John Keefe is the investigations editor at Quartz and leads the Quartz AI Studio. Keefe also teaches classes on product prototyping, design, and development at the Craig Newmark Graduate School of Journalism at CUNY and runs a product tinkering company called Really Good Smarts LLC. Before joining Quartz, he was Senior Editor for Data News at public radio station WNYC, leading a team of journalists who specialize in data reporting, coding, and design for visualizations and investigations. He was previously WNYC's news director for nearly a decade. A self-described "professional beginner," Keefe is the author of Family Projects for Smart Objects: Tabletop Projects That Respond to Your World from Maker Media, which grew from his effort to make something new every week for a year. Keefe has led classes and workshops at Columbia University, Stanford University, the New School University, and New York University. He also has served as an Innovator in Residence at West Virginia University's Reed College of Media. Keefe blogs at johnkeefe.net and tweets as @jkeefe.

This course requires that you have a computer with an Internet browser. All work is done online, using online services, all of which will be free to you at the volume of work we’re doing. You'll need an account on this service:

  • Google: https://accounts.google.com

    This is an asynchronous course, which means there are no live events scheduled at specific times. You can log in and complete activities throughout the week at your own pace, at the times and on the days that are most convenient for you.

    Despite its asynchronous nature, there are still structures in place for the duration of the course.

    The material is organized into four weekly modules. Each module will be taught by John Keefe, the technical architect for bots and machine learning at Quartz, and will cover a different topic through videos, presentations, readings and discussion forums. There will be a quiz each week to test the knowledge you've gained through the course materials. The weekly quizzes, and weekly participation in the discussion forums, are the basic requirements for earning a certificate of participation at the end of the course.

    This course is very flexible, and if you are behind on the materials, you have the entire length of the course to complete them. We do recommend you complete each of the following before the end of each week so you don’t fall behind:

    • Video lectures
    • Readings and handouts/exercises
    • Participation in the discussion forums
    • Quizzes covering concepts from video lectures and/or readings

    A certificate of completion will be available for those who meet all of the course requirements. Once the course ends, the Knight Center will review your activity and send you a message saying whether you fulfilled the requirements and qualify for the certificate. If you don't yet qualify, we'll tell you which activities you still need to complete. If you do qualify, we'll send you instructions for downloading a PDF copy of your certificate through the course platform. You will then be able to download your certificate before the course closes. The certificate is awarded by the Knight Center for Journalism in the Americas to attest to participation in the online course. No formal course credit of any kind is associated with the certificate.

    Those who want to receive a certificate of completion for the course must meet the following requirements:

    • Complete each weekly quiz with a score of at least 70% by the weekly deadline
    • Watch the weekly video lectures and review the weekly readings
    • Participate in at least one discussion forum each week by the given deadline
