Part of the 2024 Local Data Journalism Initiative, this project is a collaboration between the University of Chicago's Mansueto Institute for Urban Innovation and the Chicago Tribune. The StopWatch analyzes a unique dataset of over 100 million real-time bus locations collected by Chi Hack Night Ghost Bus.
The goal of this project is to understand how bus service reliability changed over time in the city of Chicago after the disruptions of the COVID-19 pandemic and across the city’s community areas. To address our research questions, we built a novel, comprehensive data set showing the actual arrival time of each bus at every bus stop in Chicago from June 2022 to July 2024. In addition, we processed schedule data to compare actual performance to planned service. With this data, we computed several metrics to assess the reliability and accessibility of bus service. Our results showed that the CTA initially decreased scheduled service in early 2022 through 2023 to match the slower real-time performance and later increased it in early 2024. However, these changes were not uniformly felt across Chicago’s community areas. Along with our analysis, we published the bus data sets and created Bus Report Cards — an interactive platform with indicators at the community, route, and bus stop level. These products will be updated through an automated pipeline, which can be consulted in the open-source code repository of this project.
View the project online at https://ctastopwatch.miurban-dashboards.org.
Read the project's report at https://bit.ly/MansuetoStopWatch.
Learn more about the Mansueto Institute for Urban Innovation.
This project contains:
- Scraping Real-Time Bus Locations - Building upon the work of the Chi Hack Night Ghost Bus project, the Mansueto Institute has taken over the implementation and maintenance of the data and data pipelines created by Ghost Buses, which scrape real-time bus location data from the CTA bus tracker API get vehicles feed every five minutes. Files of daily locations can be found at https://d2v7z51jmtm0iq.cloudfront.net/cta-stop-watch/full_day_data/YYYY-MM-DD.csv - please replace YYYY-MM-DD with the date you need. The earliest date in our database is May 19, 2022, linked here. The code for this pipeline can be found in ghostbus-cta-scrape/.
- Processed Actual Bus Service Data Set - Using real-time bus locations scraped every 5 minutes from the CTA bus tracker API, and building on the work of the Chi Hack Night Ghost Bus project, we have created a bus stop level dataset of actual CTA service starting June 2022. This pipeline currently runs daily and the processed data can be found at https://d2v7z51jmtm0iq.cloudfront.net/cta-stop-watch/processed_by_pid/trips_PID_full.parquet - replace PID with the PID of the pattern you require (a pattern is a subset of a route; a list of all available PIDs by route is available here). The code for this pipeline can be found in report_automation/. See methods for more information.
- Summary Metrics - Using both the historic real-time bus locations and the historic schedule data at the bus stop level, we calculated a set of metrics for different time periods, including hour of the day, day of the week, week of the year, month of the year, year, week for each given year, and month for each given year. These summary metrics are used for the Bus Report Cards on the web app. Download the most recent metrics here. The code for the metrics creation can be found in report_automation/. See methods for more information.
- Bus Stop Report Cards Web App - A FastAPI app that includes an interactive Tableau dashboard with indicators at the community, route, and bus stop level to allow riders to explore relevant metrics about the stops and routes that they use. The report cards will update monthly with up-to-date metrics. Access the web app here. The code for the web app can be found in bus_report_cards/.
- Bus Service Analysis - A first analysis found that the CTA initially decreased scheduled service in early 2022 through 2023 to match the slower real-time performance and later increased it in early 2024. However, these changes were not uniformly felt across Chicago’s community areas. Notebooks for the analysis can be found in analysis/.
Lastly, git-issues-review/ contains exploratory analysis of bugs and improvements for the project as documented here.
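The download URLs listed above follow a fixed template, so they can be built programmatically. The sketch below is only an illustration: the helper names `daily_locations_url` and `pattern_trips_url` are hypothetical, and it assumes the PID is substituted into the file name as a plain integer.

```python
from datetime import date

# Base path taken from the URLs in this README.
BASE_URL = "https://d2v7z51jmtm0iq.cloudfront.net/cta-stop-watch"


def daily_locations_url(day: date) -> str:
    """Return the URL of the raw bus-location CSV for one day (hypothetical helper)."""
    # date.isoformat() yields YYYY-MM-DD, matching the placeholder in the README.
    return f"{BASE_URL}/full_day_data/{day.isoformat()}.csv"


def pattern_trips_url(pid: int) -> str:
    """Return the URL of the processed trips file for one pattern (hypothetical helper).

    Assumption: the literal 'PID' in the template is replaced by the numeric pattern ID.
    """
    return f"{BASE_URL}/processed_by_pid/trips_{pid}_full.parquet"


# Earliest date mentioned in this README.
first_day_url = daily_locations_url(date(2022, 5, 19))
```

The resulting strings can then be passed to a reader that accepts URLs, such as `pandas.read_csv` for the daily CSV files.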
The Mansueto StopWatch is centered around the bus stop. We picked this unit of analysis since it constitutes the main point of contact between users and bus service: the stop is where users wait for the bus, and it determines how close the service is to their trip origin and destination. Since there is no public data that precisely shows when a bus stopped at a bus stop, our first and major technical challenge was to build such a dataset. This would then be the building block for the aggregated metrics. To produce the dataset, we took a variety of steps to process and merge several data sources, such as community, bus stop, and route shapefiles, historic bus location pings, and the historic bus schedule data.
For the breakdown of how we process these inputs, we follow the terminology used by the CTA API documentation:
- Route: A collection of patterns
- Pattern: One possible set of stops that a bus can travel on
- Trip: For a given pattern, a bus's journey from the first stop to the last stop
We began with over 110 million real-time bus locations from June 1, 2022, to July 28, 2024, as provided by the Chi Hack Night Ghost Bus project. These bus locations come from the CTA bus tracker API get vehicles feed, queried every five minutes. Each bus location ping includes metadata such as vehicle number, route, pattern ID, and a trip ID. Because the trip ID provided is not unique, we created a unique trip ID to group a collection of bus locations together. Each trip represents a specific bus on a specific pattern and route at a certain time of the day (e.g., bus with vehicle ID 4654 traveling northbound on pattern 1456 on route 6 on June 30, 2023). While this method is not perfect, this new trip ID allows us to group bus pings into one trip, facilitating analysis of service reliability. Using this method, we identified 10,235,984 unique trips in the original dataset.
For our analysis, it was necessary to transform the raw bus location data into the desired bus stop view. The original data represents 5-minute snapshots of every bus in the CTA system, which we converted, using imputation, into the times at which each bus passes each bus stop. For example, if two buses pass a stop within a 5-minute snapshot, there will be two rows, each listing the estimated time that bus passed the stop and the estimated time since the previous bus. This transformation is necessary as it allows us to derive performance metrics that are more interpretable and easier to localize than bus positions.
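As a rough illustration of how pings might be grouped into trips, the sketch below builds a composite key from the service date, vehicle, route, pattern, and the original trip ID. This is not necessarily the key used in this project. The field names (`vid`, `rt`, `pid`, `tatripid`, `tmstmp`) and the `YYYYMMDD HH:MM` timestamp format are assumptions about the scraped records, and both function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def unique_trip_id(ping: dict) -> str:
    """Build a composite trip key for one ping (illustrative, not the project's exact key)."""
    # Assumption: timestamps look like "20230630 08:05".
    timestamp = datetime.strptime(ping["tmstmp"], "%Y%m%d %H:%M")
    return (
        f"{timestamp:%Y%m%d}-{ping['vid']}-{ping['rt']}-"
        f"{ping['pid']}-{ping['tatripid']}"
    )


def group_pings_by_trip(pings: list[dict]) -> dict[str, list[dict]]:
    """Group pings by composite trip key and order each group by time."""
    trips = defaultdict(list)
    for ping in pings:
        trips[unique_trip_id(ping)].append(ping)
    for trip_pings in trips.values():
        # "YYYYMMDD HH:MM" strings sort chronologically.
        trip_pings.sort(key=lambda p: p["tmstmp"])
    return dict(trips)


example_pings = [
    {"vid": "4654", "rt": "6", "pid": 1456, "tatripid": "88", "tmstmp": "20230630 08:05"},
    {"vid": "4654", "rt": "6", "pid": 1456, "tatripid": "88", "tmstmp": "20230630 08:00"},
    {"vid": "4654", "rt": "6", "pid": 1456, "tatripid": "88", "tmstmp": "20230701 08:00"},
]
example_trips = group_pings_by_trip(example_pings)
```

One limitation of a date-based key like this is that a trip running past midnight would be split in two, which is one way such a method can be imperfect.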
To do this, we:
- Determined the bus stops for a particular trip by using the pattern ID and bus stop locations as provided by the CTA
- Combined bus locations and stop locations for a trip spatially
- Removed bus locations that were not on route
- Interpolated the time a bus arrived at each bus stop from the time and distance between consecutive bus locations and the position of the stop between those locations
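The interpolation step above can be sketched as a simple linear interpolation. This is a minimal illustration under stated assumptions, not the project's implementation: it assumes each ping and each stop has already been assigned a distance along the pattern, that the bus moves at constant speed between two consecutive pings, and the function name `interpolate_stop_times` is hypothetical.

```python
from datetime import datetime


def interpolate_stop_times(pings, stop_distances):
    """Estimate when a bus reached each stop on its pattern.

    pings: list of (timestamp, distance_along_pattern) tuples sorted by time.
    stop_distances: dict mapping stop ID to distance along the pattern.
    Returns a dict mapping stop ID to the estimated arrival datetime.
    Stops outside the range covered by the pings are left out.
    """
    arrivals = {}
    for (t0, d0), (t1, d1) in zip(pings, pings[1:]):
        if d1 <= d0:
            continue  # bus did not advance between these two pings
        for stop_id, stop_dist in stop_distances.items():
            if stop_id not in arrivals and d0 <= stop_dist <= d1:
                # Share of the segment travelled when the bus reaches the stop.
                fraction = (stop_dist - d0) / (d1 - d0)
                arrivals[stop_id] = t0 + (t1 - t0) * fraction
    return arrivals


# Two pings five minutes apart; the stop sits halfway between them.
example_pings = [
    (datetime(2023, 6, 30, 8, 0), 0.0),
    (datetime(2023, 6, 30, 8, 5), 1000.0),
]
estimated = interpolate_stop_times(example_pings, {"stop_a": 500.0, "stop_b": 1500.0})
```

In this example the stop halfway along the segment gets an estimated arrival halfway through the five-minute interval, and the stop beyond the last ping gets no estimate.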
We then processed historic schedules of bus service to compare them with the actual service provided. For this purpose, we used General Transit Feed Specification (GTFS) data. The CTA only allows for the download of the current schedule, which was an obstacle because we planned to evaluate bus service going back to June 2022. However, Transit.land, an open data platform that collects GTFS data, maintains a historic archive of all feeds. We downloaded historic feeds going back to May 2022 and recreated the schedules from this data using GTFS Kit, an open-source Python library for working with GTFS data.
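One detail worth keeping in mind when working with GTFS schedules is that times in stop_times.txt are written as HH:MM:SS relative to the service day and may exceed 24:00:00 for trips that run past midnight. The sketch below shows one way to handle this in plain Python and to derive scheduled gaps between buses at a stop. It does not use GTFS Kit, and the function names are hypothetical.

```python
def gtfs_time_to_seconds(value: str) -> int:
    """Convert a GTFS HH:MM:SS string to seconds since the start of the service day.

    Hours may be 24 or more, so datetime.time cannot be used directly.
    """
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def scheduled_headways(arrival_times: list[str]) -> list[int]:
    """Return the gaps in seconds between consecutive scheduled arrivals at one stop."""
    ordered = sorted(gtfs_time_to_seconds(t) for t in arrival_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


# Scheduled arrivals of one route at one stop (illustrative values).
example_headways = scheduled_headways(["08:12:00", "08:00:00", "08:20:00"])
```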
In addition to bus pings and schedules, the analysis relies on shapefiles of three main units of analysis: community areas, bus stops, and routes. These shapefiles are mainly used for visualizations and for spatial operations. More specifically, we performed point-in-polygon operations to identify the bus stops that serve each of the 77 community areas, which lets us aggregate service performance metrics at the community level. Up-to-date shapefiles for these three spatial units are available at the Chicago Data Portal.
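To illustrate the point-in-polygon idea, here is a minimal ray-casting sketch in plain Python. In practice a geospatial library would normally do this work; the code below is only a conceptual example. It treats longitude and latitude as planar coordinates, assumes simple polygons without holes, and uses hypothetical function names and made-up coordinates.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside the polygon?

    polygon: list of (lon, lat) vertices; the ring does not need to be closed.
    """
    inside = False
    count = len(polygon)
    for i in range(count):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % count]
        # Does this edge cross the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def stops_by_area(stops, areas):
    """Assign each stop to the first area whose polygon contains it.

    stops: dict mapping stop ID to (lon, lat).
    areas: dict mapping area name to a polygon (list of vertices).
    """
    result = {name: [] for name in areas}
    for stop_id, (lon, lat) in stops.items():
        for name, polygon in areas.items():
            if point_in_polygon(lon, lat, polygon):
                result[name].append(stop_id)
                break
    return result


# Made-up square "area" and two stops, one inside and one outside.
example_areas = {"area_1": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]}
example_stops = {"stop_in": (1.0, 1.0), "stop_out": (3.0, 1.0)}
example_assignment = stops_by_area(example_stops, example_areas)
```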
Using both the historic real-time bus location and the historic schedule data at the bus stop level, we calculated the following metrics for different time periods (including hour of the day, day of the week, week of the year, month of the year, year, week for each given year, and month for each given year).
- Time to next bus stop
  - Given a bus is at a bus stop, the time until the next bus on the same route arrives.
- Excess Time to Next Bus
  - The actual time to next bus minus the scheduled time to next bus.
- Trip Duration
  - The time difference between the first and last stop.
- Trip Delay
  - The actual trip duration minus the scheduled trip duration.
- Number of buses
  - How many buses passed a bus stop in each time interval.
- Excess number of buses
  - Actual number of buses in each time interval minus scheduled number of buses.
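The definitions above translate into a few lines of arithmetic. The sketch below is a simplified illustration, not the project's pipeline: it works on plain lists of arrival times for one route at one stop, reports results in minutes, and uses hypothetical function names and made-up values.

```python
from datetime import datetime


def time_to_next_bus(arrivals):
    """Minutes from each bus arrival to the next arrival on the same route at one stop."""
    ordered = sorted(arrivals)
    return [(later - earlier).total_seconds() / 60 for earlier, later in zip(ordered, ordered[1:])]


def trip_duration(stop_times):
    """Minutes between the first and last stop of one trip (stop_times in stop order)."""
    return (stop_times[-1] - stop_times[0]).total_seconds() / 60


def excess(actual, scheduled):
    """Actual minus scheduled; the same form applies to time to next bus, duration, and bus counts."""
    return actual - scheduled


# Three observed arrivals at one stop, compared with a 12-minute scheduled gap.
observed = [
    datetime(2023, 6, 30, 8, 0),
    datetime(2023, 6, 30, 8, 10),
    datetime(2023, 6, 30, 8, 25),
]
gaps = time_to_next_bus(observed)
excess_gaps = [excess(gap, 12.0) for gap in gaps]
# Buses observed in the interval minus a made-up scheduled count of 4.
excess_buses = excess(len(observed), 4)
```

A negative excess time means buses came closer together than scheduled, while a negative excess number of buses means fewer buses ran than scheduled.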
To calculate the metrics, we:
- Filtered to only trips with stops between 6am and 8pm
- Calculated the median, mean, standard deviation, max, min, 25th percentile, and 75th percentile for each metric
- Aggregated to the route and community area level for varying time periods by taking the weighted median of the stop-level values of each metric, weighted by the number of buses that pass each bus stop in the aggregation unit
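The weighted median used in the aggregation step can be sketched as follows. Several definitions of a weighted median exist. This example assumes the "lower" weighted median, meaning the smallest value at which the cumulative weight reaches half of the total, without interpolation. The function name and input values are hypothetical.

```python
def weighted_median(values, weights):
    """Return the lower weighted median of values.

    values: stop-level metric values (e.g., median time to next bus per stop).
    weights: number of buses passing each stop, in the same order as values.
    """
    pairs = sorted(zip(values, weights))
    total = sum(weight for _, weight in pairs)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        # First value at which at least half of the total weight is covered.
        if cumulative >= total / 2:
            return value


# A busy stop (8 buses) dominates two quiet stops (1 bus each).
busy_stop_dominates = weighted_median([5.0, 10.0, 20.0], [1, 1, 8])
equal_weights = weighted_median([5.0, 10.0, 20.0], [1, 1, 1])
```

Weighting by bus counts means that stops with more service have more influence on the route or community area value than stops with little service.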
For further details on the project data and methodology, consult the full report.
See project_history.md for a history of the start of the project.
We recommend installing Poetry (https://python-poetry.org) and then running poetry install to install the project's dependencies.
Berry, Christopher R., Eric Langowski, Divij Sinha, Austin Steinhart, Regina Isabel Medina Rosales, and Joseph De Leon. Mansueto Institute’s StopWatch. V. 1.0.0, January 2025. https://github.com/mansueto-institute/cta-stop-watch