Find Connections#
In this project we generate departure times for all stops in a region of interest for connections to one arrival stop with fixed (latest) arrival time.
The project uses the gtfspy data base created in the Get Data and Set Up the Environment project. Basic Pandas knowledge is required to solve the tasks (read Series, Data Frames, Advanced Indexing before you start; Performance Issues may be of interest, too).
Data Base and Time Frame#
Task: Connect to the data base, that is, create a gtfspy.gtfs.GTFS object.
Solution:
# your solution
The routing algorithm of gtfspy looks for public transport connections in a user-defined time frame. Start and end time have to be provided in Unix time.
Task: Compute Unix times for the start and end of your time frame of interest. Use the GTFS object’s get_day_start_ut method to convert a date to its 00:00 Unix time. Then add hours and minutes (converted to seconds) to this value.
Hint
The Python standard library provides functions for getting Unix times, but GTFS.get_day_start_ut takes care of the time zone information in the GTFS data.
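To see why the time zone matters, here is a minimal sketch using only the standard library. The date, the fixed UTC offset of +1 hour, and the helper name unix_time are assumptions for illustration; in the task itself the day start should come from GTFS.get_day_start_ut, which reads the time zone from the GTFS data.

```python
from datetime import datetime, timedelta, timezone


def unix_time(date_str, hours, minutes, utc_offset_hours):
    """Unix time of hours:minutes local time on the given date (hypothetical helper).

    A fixed UTC offset is assumed here; it ignores daylight saving time.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    # 00:00 local time on the given date
    day_start = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=tz)
    day_start_ut = int(day_start.timestamp())
    # Unix time counts seconds, so hours and minutes are converted to seconds
    return day_start_ut + hours * 3600 + minutes * 60


# Time frame from 06:00 to 09:30 on an assumed date, assumed offset UTC+1
start_ut = unix_time("2023-03-01", 6, 0, 1)
end_ut = unix_time("2023-03-01", 9, 30, 1)
print(start_ut, end_ut)
```

With an offset of 0 instead of 1 both values shift by 3600 seconds, which is exactly the kind of error the GTFS method avoids.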
Solution:
# your solution
Arrival Stop#
The routing algorithm of gtfspy computes public transport connections from all stops in the data base to a user-defined arrival stop. The arrival stop has to be specified by its GTFS ID (column 'stop_I' in the data frame returned by GTFS.stops()).
Task: Get the stops data frame. Use column 'stop_I' (GTFS stop ID) as index. Rename the index column to 'id' and the column 'stop_id' to 'code' (the stop’s GTFS short name). Drop all columns but 'id', 'code', 'name', 'lat', 'lon'.
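The Pandas operations involved can be tried on a toy frame first. The following is a sketch, not the solution for the real data: the frame raw and all its values are invented stand-ins for the result of GTFS.stops(), and the extra column 'desc' is an assumption.

```python
import pandas as pd

# Invented stand-in for the data frame returned by GTFS.stops()
raw = pd.DataFrame({
    "stop_I": [1, 2, 3],
    "stop_id": ["A1", "A2", "B1"],
    "name": ["Zwickau, Zentrum", "Zwickau, Zentrum", "Zwickau, Hbf"],
    "lat": [50.718, 50.7181, 50.716],
    "lon": [12.496, 12.4962, 12.476],
    "desc": ["", "", ""],
})

stops = (
    raw.set_index("stop_I")                  # GTFS stop ID becomes the index
       .rename_axis("id")                    # rename the index
       .rename(columns={"stop_id": "code"})  # rename a regular column
       [["code", "name", "lat", "lon"]]      # keep only the wanted columns
)
print(stops)
```

Selecting the wanted columns is used here instead of dropping the unwanted ones, so the code does not need to know which other columns exist.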
Solution:
# your solution
Task: Write some code to find all stops whose name contains a given string (e.g., all stops containing 'Zwickau, Zentrum'). Use the stops’ geolocation and OpenStreetMap to decide on an arrival stop.
Hint
An advanced and very comfortable solution is to generate, for each relevant stop, a link to OSM (with a marker at the stop). If you render these links as HTML in Jupyter, you simply click a stop’s link to see where it is on the map.
OSM link with marker:
https://www.osm.org/?mlat=MARKER_LAT&mlon=MARKER_LON
HTML rendering for links:
import IPython.display
display(IPython.display.HTML('<a href="URL">LINK_TEXT</a>'))
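The hint can be sketched as follows. The toy stops frame, its values, and the helper name osm_links_html are assumptions for illustration; only the URL pattern is taken from the hint. The function returns an HTML string, so it can be checked outside Jupyter as well.

```python
import html

import pandas as pd

# Invented toy data in the layout of the stops data frame
stops = pd.DataFrame(
    {
        "name": ["Zwickau, Zentrum", "Zwickau, Zentrum", "Zwickau, Hbf"],
        "lat": [50.718, 50.7181, 50.716],
        "lon": [12.496, 12.4962, 12.476],
    },
    index=pd.Index([1, 2, 3], name="id"),
)


def osm_links_html(stops, text):
    """HTML with one OSM marker link per stop whose name contains text (hypothetical helper)."""
    # regex=False: treat text as a plain substring, not as a regular expression
    hits = stops[stops["name"].str.contains(text, regex=False)]
    links = []
    for row in hits.itertuples():
        url = f"https://www.osm.org/?mlat={row.lat}&mlon={row.lon}"
        label = html.escape(f"{row.Index}: {row.name}")
        links.append(f'<a href="{url}">{label}</a>')
    return "<br>".join(links)


links_html = osm_links_html(stops, "Zwickau, Zentrum")
print(links_html)
# In Jupyter the string can be rendered with
# IPython.display.display(IPython.display.HTML(links_html))
```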
Solution:
# your solution
Routing#
The routing API of gtfspy is relatively complex and unintuitive. To generate all connections to the arrival stop, the following steps are necessary:
1. Call gtfspy.routing.helpers.get_transit_connections.
2. Call gtfspy.routing.helpers.get_walk_network(G, max_walk).
3. Create a gtfspy.routing.multi_objective_pseudo_connection_scan_profiler.MultiObjectivePseudoCSAProfiler object. Pass the results of steps 1 and 2 to the constructor (arguments transit_events and walk_network).
4. Call the run method of the object created in step 3.
Task: Follow the above steps. Have a look at gtfspy’s source for available arguments. A good walking speed is 1.5. With track_vehicle_legs and track_time you can (presumably) influence whether the routing algorithm prefers connections with fewer transfers and shorter travel time.
Solution:
# your solution
Best Connection#
The MultiObjectivePseudoCSAProfiler object now contains information about all connections to the arrival stop in the specified time frame.
The stop_profiles member variable is subscriptable; the allowed indices are returned by its keys member function. Indices are stop IDs.
If i is a stop ID, then stop_profiles[i].get_final_optimal_labels() returns an iterable object with one item per connection from stop i to the arrival stop. Each item has a departure_time member containing the departure time of the connection in Unix time.
Task: Add a column to your stops data frame that contains the difference, in minutes, between the latest allowed arrival time and the latest possible departure time from the considered stop. For stops without a connection to the arrival stop use -1.
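The per-stop computation can be sketched without gtfspy. In the following, the objects in labels_by_stop are hypothetical stand-ins for the items returned by stop_profiles[i].get_final_optimal_labels(); only their departure_time member is used, as described above. The helper name and all time values are assumptions.

```python
from types import SimpleNamespace


def minutes_before_arrival(labels, latest_arrival_ut):
    """Minutes between the latest departure in labels and the latest allowed arrival.

    Returns -1 if there is no connection (no labels). Hypothetical helper.
    """
    departures = [label.departure_time for label in labels]
    if not departures:
        return -1
    # Unix times are in seconds, the result is wanted in minutes
    return (latest_arrival_ut - max(departures)) / 60


latest_arrival_ut = 1677659400  # assumed end of the time frame

# Stand-ins for the labels of two stops; stop 2 has no connection
labels_by_stop = {
    1: [
        SimpleNamespace(departure_time=1677655800),
        SimpleNamespace(departure_time=1677657600),
    ],
    2: [],
}

minutes = {
    stop_id: minutes_before_arrival(labels, latest_arrival_ut)
    for stop_id, labels in labels_by_stop.items()
}
print(minutes)
```

Stop IDs that do not occur among the keys of stop_profiles at all need the value -1, too, so the final column should be filled for every row of the stops data frame.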
Solution:
# your solution
Grouping Stops#
In the stops data frame most stops appear multiple times, e.g., each platform of a station has its own item in the data frame.
For visualization, nearby stops should be merged into one stop. The GTFS object’s get_stops_within_distance method yields a data frame of nearby stops. The first argument is the considered stop’s ID, the second argument is the distance in meters.
Task: Think about an algorithm for grouping stops and implement it. Add a column to your stops data frame, which contains a group ID for each stop. All stops with identical group ID are considered one and the same stop (in the visualization to create in a follow-up project).
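One possible approach, given here only as a sketch, is to merge stops transitively with a union-find structure: if A is near B and B is near C, all three end up in one group. The neighbors dictionary is a hypothetical stand-in for the information that get_stops_within_distance provides, and the function name and data are invented.

```python
from collections import Counter


def group_stops(stop_ids, neighbors):
    """Map each stop ID to a group ID (the smallest stop ID in its group)."""
    parent = {s: s for s in stop_ids}

    def find(s):
        # Follow parent links to the root, shortening the path on the way
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for s in stop_ids:
        for t in neighbors.get(s, ()):
            if t not in parent:
                continue  # ignore neighbors outside the region of interest
            root_s, root_t = find(s), find(t)
            if root_s != root_t:
                # The smaller root stays root, so group IDs are reproducible
                parent[max(root_s, root_t)] = min(root_s, root_t)

    return {s: find(s) for s in stop_ids}


stop_ids = [1, 2, 3, 4, 5]
neighbors = {1: [2], 2: [1, 3], 4: [], 5: [4]}  # invented neighbor lists

groups = group_stops(stop_ids, neighbors)
largest_group, size = Counter(groups.values()).most_common(1)[0]
print(groups, largest_group, size)
```

Transitive merging is simple, but with a large distance it can chain together stops that are far apart, so other grouping rules may suit the visualization better.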
Solution:
# your solution
Task: How many stop groups do you have? What’s the largest group? Show all its stops.
Solution:
# your solution