Find Connections

In this project we compute, for every stop in a region of interest, departure times of connections to a single arrival stop with a fixed (latest) arrival time.

The project uses the gtfspy data base created in the Get Data and Set Up the Environment project. Basic Pandas knowledge is required to solve the tasks (read Series, Data Frames, and Advanced Indexing before you start; Performance Issues may be of interest, too).

Data Base and Time Frame

Task: Connect to the data base, that is, create a gtfspy.gtfs.GTFS object.

Solution:

# your solution

The routing algorithm of gtfspy looks for public transport connections in a user-defined time frame. Start and end times have to be provided in Unix time.

Task: Compute Unix times for the start and end of your time frame of interest. Use the GTFS object’s get_day_start_ut method to convert a date to its 00:00 Unix time. Then add hours and minutes (converted to seconds) to this value.

Hint

The Python standard library provides functions for getting Unix times. But GTFS.get_day_start_ut takes care of time zone information in the GTFS data.
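The arithmetic behind this task can be sketched with the standard library alone. The snippet is only an illustration: `day_start_ut` is a hypothetical stand-in for the value `GTFS.get_day_start_ut` would return, computed here for an assumed fixed UTC offset of one hour (no daylight saving time), and the date and clock times are made up.

```python
from datetime import datetime, timedelta, timezone


def day_start_ut(year, month, day, utc_offset_hours):
    """Unix time of 00:00 local time for a fixed UTC offset.

    Assumption: a constant offset, no daylight saving time handling.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    return int(datetime(year, month, day, tzinfo=tz).timestamp())


def add_time(day_start, hours, minutes):
    """Add hours and minutes (converted to seconds) to a Unix time."""
    return day_start + hours * 3600 + minutes * 60


day_start = day_start_ut(2024, 1, 15, utc_offset_hours=1)  # made-up date
start_ut = add_time(day_start, 7, 0)   # 07:00 local time
end_ut = add_time(day_start, 9, 30)    # 09:30 local time
```

With the real data base, only the `add_time` step is needed, because the day start comes from the GTFS object and already respects the feed's time zone.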

Solution:

# your solution

Arrival Stop

The routing algorithm of gtfspy computes public transport connections from all stops in the data base to a user-defined arrival stop. The arrival stop has to be specified by its GTFS ID (column 'stop_I' in the data frame returned by GTFS.stops()).

Task: Get the stops data frame. Use column 'stop_I' (GTFS stop ID) as index. Rename the index column to 'id' and the column 'stop_id' to 'code' (the stop’s GTFS short name). Drop all columns but 'id', 'code', 'name', 'lat', 'lon'.

Solution:

# your solution

Task: Write some code to find all stops whose name contains a given string (e.g., all stops containing 'Zwickau, Zentrum'). Use the stops’ geolocation and OpenStreetMap to choose an arrival stop.

Hint

An advanced and very comfortable solution is to generate, for each relevant stop, a link to OSM (with a marker at the stop). If you render these links as HTML in Jupyter, you simply click a stop’s link to see where it is on the map.

  • OSM link with marker: https://www.osm.org/?mlat=MARKER_LAT&mlon=MARKER_LON

  • HTML rendering for links:

    import IPython.display
    display(IPython.display.HTML('<a href="URL">LINK_TEXT</a>'))
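
Putting both pieces of the hint together might look like the following sketch. The stop list is made-up sample data (names and coordinates are assumptions, not taken from the data base), and `find_stops` and `osm_link` are hypothetical helper names. With a real stops data frame you would filter on its 'name' column instead.

```python
from html import escape

# Made-up sample stops; in the project these rows come from the stops data frame.
stops = [
    {"name": "Zwickau, Zentrum", "lat": 50.7180, "lon": 12.4960},
    {"name": "Zwickau, Zentrum (Bus)", "lat": 50.7185, "lon": 12.4950},
    {"name": "Zwickau, Hauptbahnhof", "lat": 50.7150, "lon": 12.4760},
]


def find_stops(stops, text):
    """Return all stops whose name contains the given string."""
    return [stop for stop in stops if text in stop["name"]]


def osm_link(stop):
    """Build an HTML link to OSM with a marker at the stop's position."""
    url = f"https://www.osm.org/?mlat={stop['lat']}&mlon={stop['lon']}"
    # escape() turns '&' into '&amp;', as required inside an HTML attribute.
    return f'<a href="{escape(url)}">{escape(stop["name"])}</a>'


links = [osm_link(stop) for stop in find_stops(stops, "Zwickau, Zentrum")]
html = "<br>".join(links)  # one clickable link per line
```

In Jupyter, passing `html` to `IPython.display.HTML` and displaying the result renders the links.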
    

Solution:

# your solution

Routing

The routing API of gtfspy is relatively complex and unintuitive. To generate all connections to the arrival stop, the following steps are necessary:

  1. Call gtfspy.routing.helpers.get_transit_connections.

  2. Call gtfspy.routing.helpers.get_walk_network(G, max_walk).

  3. Create a gtfspy.routing.multi_objective_pseudo_connection_scan_profiler.MultiObjectivePseudoCSAProfiler object. Pass the results of steps 1 and 2 to the constructor (arguments transit_events and walk_network).

  4. Call the run method of the object created in step 3.

Task: Follow the above steps. Have a look at gtfspy’s source for available arguments. A good walking speed is 1.5 (meters per second). With track_vehicle_legs and track_time you can (presumably) influence whether the routing algorithm prefers connections with fewer transfers and lower travel time.

Solution:

# your solution

Best Connection

The MultiObjectivePseudoCSAProfiler object now contains information about all connections to the arrival stop in the specified time frame. The stop_profiles member variable is subscriptable with allowed indices returned by the keys member function. Indices are stop IDs. If i is a stop ID, then stop_profiles[i].get_final_optimal_labels() returns an iterable object with one item per connection from stop i to the arrival stop. Each item has a departure_time member containing the departure time of the connection in Unix time.

Task: Add a column to your stops data frame, which contains the difference between latest allowed arrival time and latest possible departure time from the considered stop in minutes. For stops without connection to the arrival stop use -1.
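The computation for a single stop can be sketched without gtfspy. `Label` and `Profile` below are hypothetical stand-ins that only mimic the interface described above (`get_final_optimal_labels()` and `departure_time`), and all Unix times are made up. The sketch assumes that every departure time is a finite number.

```python
from dataclasses import dataclass


@dataclass
class Label:
    """Hypothetical stand-in for one connection label."""
    departure_time: int  # Unix time


class Profile:
    """Hypothetical stand-in for one entry of stop_profiles."""

    def __init__(self, labels):
        self._labels = labels

    def get_final_optimal_labels(self):
        return self._labels


def minutes_before_arrival(profile, latest_arrival_ut):
    """Minutes between latest possible departure and latest allowed arrival.

    Returns -1 if the stop has no connection to the arrival stop.
    """
    departures = [label.departure_time
                  for label in profile.get_final_optimal_labels()]
    if not departures:
        return -1
    return (latest_arrival_ut - max(departures)) / 60


latest_arrival_ut = 1705305600  # made-up latest arrival time
stop_profiles = {
    1: Profile([Label(latest_arrival_ut - 3600), Label(latest_arrival_ut - 1800)]),
    2: Profile([]),  # no connection found
}
minutes = {stop_id: minutes_before_arrival(profile, latest_arrival_ut)
           for stop_id, profile in stop_profiles.items()}
```

A dictionary like `minutes`, keyed by stop ID, can then be turned into a new column of the stops data frame.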

Solution:

# your solution

Grouping Stops

In the stops data frame most stops appear multiple times, e.g., each platform of a station has its own row in the data frame. For visualization, nearby stops should be merged into one stop. The GTFS object’s get_stops_within_distance method yields a data frame of nearby stops. The first argument is the considered stop’s ID, the second argument is the distance in meters.

Task: Think about an algorithm for grouping stops and implement it. Add a column to your stops data frame that contains a group ID for each stop. All stops with identical group ID are considered one and the same stop (in the visualization to be created in a follow-up project).
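One possible approach (among several) is to treat "is within distance" as an edge between two stops and to use connected components as groups, for instance with a small union-find structure. The sketch below is self-contained and rests on assumptions: the stop coordinates are made up, and distances are computed with the haversine formula over all pairs, whereas in the project get_stops_within_distance would supply the nearby stops.

```python
import math


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in meters."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def group_stops(stops, max_distance_m):
    """Map each stop ID to a group ID (the smallest stop ID in its group)."""
    parent = {stop_id: stop_id for stop_id in stops}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    ids = sorted(stops)
    for k, a in enumerate(ids):
        for b in ids[k + 1:]:
            if distance_m(*stops[a], *stops[b]) <= max_distance_m:
                root_a, root_b = find(a), find(b)
                if root_a != root_b:
                    # Attach the larger root to the smaller one.
                    parent[max(root_a, root_b)] = min(root_a, root_b)
    return {stop_id: find(stop_id) for stop_id in stops}


# Made-up stops: ID -> (lat, lon); stops 1 and 2 are a few meters apart.
stops = {1: (50.7180, 12.4960), 2: (50.7181, 12.4961), 3: (50.7300, 12.5200)}
groups = group_stops(stops, max_distance_m=100)
```

Note that connected components are transitive: a chain of stops, each close to the next, ends up in one group even if its two ends are far apart. Whether that is acceptable depends on the chosen distance.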

Solution:

# your solution

Task: How many stop groups do you have? What’s the largest group? Show all its stops.
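Counting groups and finding the largest one is a small frequency-count problem. The following sketch uses `collections.Counter` on a made-up mapping from stop ID to group ID. With a data frame, the group ID column would take the place of this dictionary.

```python
from collections import Counter

# Hypothetical mapping: stop ID -> group ID.
groups = {1: 1, 2: 1, 3: 3, 4: 1, 5: 3, 6: 6}

sizes = Counter(groups.values())  # group ID -> number of stops
n_groups = len(sizes)
largest_group, largest_size = sizes.most_common(1)[0]

# All stops belonging to the largest group.
members = sorted(stop_id for stop_id, group_id in groups.items()
                 if group_id == largest_group)
```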

Solution:

# your solution

Save Results

Task: Save your stops data frame to a CSV file.

Solution:

# your solution