Cafeteria
Cafeteria#
Have a look at the menu of the Zwickau and Chemnitz universities (the cafeterias of both universities are operated by Studentenwerk Chemnitz-Zwickau). In this project we want to scrape as much historic menu data as possible from that website. Read Accessing Data before you start. The section Web Access is of particular importance.
The API#
Web APIs often come with documentation. In our case we see neither an obvious API nor any documentation. Clicking through the menus of past weeks and watching the browser’s address bar, we see how the date and other information are encoded in the URL. This is our key to scraping historic data.
In addition, there is a link in the lower right that looks like it leads to information about the API. It turns out that there is not much API-related information there, but there is a useful hint about an XML interface that uses the same parameter encoding as the HTML interface.
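To see how values are encoded in a URL, it can help to split a URL copied from the address bar into its parts. The following is a minimal sketch using Python's standard library. The URL and the parameter names (`location`, `year`, `month`, `day`) are made up for illustration; the real site may use different names or a different scheme.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL; replace it with one copied from the browser's address bar.
url = "https://example.org/menu.php?location=3&year=2023&month=5&day=18"

parts = urlsplit(url)
params = parse_qs(parts.query)  # maps each parameter name to a list of values

print(parts.path)
for name, values in params.items():
    print(name, "=", values[0])
```

Comparing the output for a few different weeks and locations shows which parameter changes with which choice on the page.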
Task: Understand the arguments in the HTML URLs. Then try the XML API from your browser’s address bar. Note all location IDs (for ‘Mensa Ring’ and so on) and the oldest available menu (by trial and error).
Solution:
# your answer
Legal Considerations#
Have a look at the license information. There we read that it’s okay to use the data for our intended purposes.
Remember not to send too many requests to the server in a short time! This may trigger a protection mechanism that makes the server refuse any further communication with us.
Limit the number of requests per second by pausing your script after each request.
While developing and testing the automatic download, limit the total number of requests to a handful until you’re certain that your script works correctly.
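Both precautions can be combined in one small helper. This is only a sketch: the function name `fetch_politely` and its parameters are assumptions, and `fetch` stands for whatever function actually performs the request.

```python
import time
from itertools import islice

def fetch_politely(urls, fetch, delay=0.5, max_requests=5):
    """Call fetch(url) for at most max_requests URLs, pausing after each call."""
    results = {}
    for url in islice(urls, max_requests):  # hard cap on the number of requests
        results[url] = fetch(url)
        time.sleep(delay)  # pause after every request
    return results

# Dry run with a dummy fetch function, so no request is sent at all.
demo = fetch_politely(["a", "b", "c", "d"], fetch=str.upper, delay=0, max_requests=2)
print(demo)
```

For real downloads, `fetch` could be something like `lambda url: requests.get(url, timeout=10).text`, assuming the `requests` package is installed. Raise `max_requests` only after the script has been checked on a handful of URLs.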
Getting Raw Data#
We proceed in two steps:
get all the XML files,
parse all XML files.
Parsing will require lots of trial and error. Downloading all files first and parsing them in a second step thus avoids repeated requests to the server while developing and testing the parsing code.
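One way to make sure that no file is requested twice is to check for the local copy before contacting the server. The sketch below assumes a hypothetical helper `download_once`; `fetch` is again a placeholder for the function that sends the request and returns the response text.

```python
from pathlib import Path

def download_once(url, path, fetch):
    """Save fetch(url) to path unless the file already exists.

    Returns True if a request was made, False if the local copy was reused.
    """
    path = Path(path)
    if path.exists():
        return False  # already downloaded, no request sent
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(fetch(url), encoding="utf-8")
    return True
```

With this pattern an interrupted download script can simply be restarted, and the parsing code only ever reads from the local directory.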
Task: Write a Python script which downloads the menu XML files for all weekdays and mensa IDs 3 and 4. Write all files into the same directory. Before you start: How many requests will be sent to the server? How long will it take if we send two requests per second?
Solution:
# your solution
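The number of requests is the number of weekdays in the date range times the number of locations. The following sketch shows the arithmetic for a placeholder range of two weeks; the start and end dates are assumptions and have to be replaced by the oldest available menu date and today's date.

```python
from datetime import date, timedelta

def weekdays(start, end):
    """Yield all dates from start to end (inclusive) that fall on Monday to Friday."""
    day = start
    while day <= end:
        if day.weekday() < 5:  # 0 = Monday, ..., 4 = Friday
            yield day
        day += timedelta(days=1)

# Placeholder range (two full weeks); replace with the real oldest date and today.
start, end = date(2023, 1, 2), date(2023, 1, 13)
location_ids = [3, 4]

n_requests = sum(1 for _ in weekdays(start, end)) * len(location_ids)
seconds = n_requests / 2  # two requests per second
print(n_requests, "requests, about", seconds, "seconds")
```

For the placeholder range this gives 10 weekdays, 20 requests, and about 10 seconds. Public holidays are not excluded here; whether the server returns an empty menu or an error for such days has to be checked by hand.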
Parsing#
Task: From all the downloaded files extract all meals including date, category, description, and prices for students, staff, and guests. Save the data to a CSV file.
Solution:
# your solution
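As a starting point for experiments, the general pattern of reading XML and writing CSV with Python's standard library might look as follows. The XML layout in `sample` (element and attribute names such as `menu`, `meal`, `category`, `price`, `group`) is entirely made up; the real files have to be inspected first and the code adapted accordingly.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML layout; the real files will use different names.
sample = """<menu date="2023-05-18">
  <meal category="Pasta">
    <description>Spaghetti with tomato sauce</description>
    <price group="students">2.50</price>
    <price group="staff">4.10</price>
    <price group="guests">5.00</price>
  </meal>
</menu>"""

def parse_menu(xml_text):
    """Return one dict per meal found in the XML text."""
    root = ET.fromstring(xml_text)
    rows = []
    for meal in root.iter("meal"):
        # Collect the prices of this meal, keyed by customer group.
        prices = {p.get("group"): p.text for p in meal.iter("price")}
        rows.append({
            "date": root.get("date"),
            "category": meal.get("category"),
            "description": meal.findtext("description", default="").strip(),
            "students": prices.get("students"),
            "staff": prices.get("staff"),
            "guests": prices.get("guests"),
        })
    return rows

rows = parse_menu(sample)
fields = ["date", "category", "description", "students", "staff", "guests"]

# For a real file use open("meals.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fields)
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Prices are kept as strings here. Converting them to numbers is a separate step, because the real data may use a decimal comma or contain missing values.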