Cafeteria#

Have a look at the Zwickau and Chemnitz universities’ menu (the cafeterias of both universities are operated by Studentenwerk Chemnitz-Zwickau). In this project we want to scrape as much historic menu data as possible from that website. Read Accessing Data before you start. Section Web Access is of particular importance.

The API#

Often web APIs come with some documentation. In our case we see neither an obvious API nor any documentation. Clicking through the menus of past weeks and watching the browser’s address bar, we see how the date and other information are encoded in the URL. This is our key to scraping historic data.

In addition, there is a link on the lower right that looks like information about the API. It turns out that there is not much API-related information, but there is a useful hint about an XML interface using the same parameter encoding as the HTML interface.
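To illustrate what such parameter encoding looks like, here is a minimal sketch for assembling a query URL. The base URL and the parameter names `location` and `date` are hypothetical; the real ones must be read off the browser’s address bar while clicking through past weeks.

```python
from urllib.parse import urlencode

# Hypothetical base URL and parameter names -- replace them with what
# the browser's address bar actually shows.
BASE = "https://www.example.org/speiseplan/xml.php"

def menu_url(location_id, day):
    """Build a menu URL for one location ID and one ISO date string."""
    return BASE + "?" + urlencode({"location": location_id, "date": day})

print(menu_url(3, "2020-01-13"))
```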

Task: Understand the arguments in the HTML URLs. Then try the XML API from your browser’s address bar. Note all location IDs (for ‘Mensa Ring’ and so on) and the oldest available menu (by trial and error).

Solution:

# your answer

Getting Raw Data#

We proceed in two steps:

  • get all the XML files,

  • parse all XML files.

Parsing will require lots of trial and error. Downloading all files first and parsing them in a second step therefore avoids repeated requests to the server while we develop and test the parsing code.
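The download step can be sketched as follows. The date range, the filename pattern, and the omitted `fetch_xml` helper are hypothetical placeholders; the real start date has to be found by trial and error.

```python
from datetime import date, timedelta

# Hypothetical demo range and location IDs; adapt to the real archive.
START, END = date(2018, 1, 1), date(2018, 1, 15)
LOCATIONS = (3, 4)

def jobs(start, end, locations):
    """Yield (day, location_id, filename) for every day in [start, end)."""
    day = start
    while day < end:
        for loc in locations:
            yield day, loc, f"menu_{loc}_{day.isoformat()}.xml"
        day += timedelta(days=1)

all_jobs = list(jobs(START, END, LOCATIONS))
print(len(all_jobs))            # 14 days x 2 locations = 28 files
# for day, loc, fname in all_jobs:
#     xml = fetch_xml(...)      # real HTTP request goes here
#     open(fname, "wb").write(xml)
#     time.sleep(0.5)           # be polite: two requests per second
```

Generating the job list separately from the actual HTTP requests makes it easy to check how many requests a run will cause before sending any of them.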

Task: Write a Python script which downloads menu XML files for all weekdays and mensa IDs 3 and 4. Write all files into the same directory. Before you start: How many requests will be sent to the server? How long will it take if we send two requests per second?
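As a back-of-the-envelope estimate (the archive depth of five years is an assumption, not the true value):

```python
# Hypothetical archive depth of 5 years; the real oldest menu is found
# by trial and error against the XML interface.
days = 5 * 365
n_requests = days * 2        # two location IDs per day
seconds = n_requests / 2     # two requests per second
print(n_requests, seconds / 3600)  # 3650 requests, roughly half an hour
```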

Solution:

# your solution

Parsing#

Task: From all the downloaded files extract all meals including date, category, description, and prices for students, staff, guests. Save the data to a CSV file.
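A parsing sketch with the standard library could look like this. The XML layout below (`menu`/`essen` elements, `datum`, `kategorie`, and price-group attributes) is invented for illustration; the element and attribute names must be adapted to what the downloaded files actually contain.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample file content -- inspect a real XML file and
# adjust tag and attribute names accordingly.
SAMPLE = """<menu datum="2020-01-13">
  <essen kategorie="Hauptgericht">
    <deutsch>Spaghetti Bolognese</deutsch>
    <pr gruppe="S">2.50</pr>
    <pr gruppe="M">4.10</pr>
    <pr gruppe="G">5.20</pr>
  </essen>
</menu>"""

def parse_menu(xml_text):
    """Yield (date, category, description, student, staff, guest) tuples."""
    root = ET.fromstring(xml_text)
    day = root.get("datum")
    for meal in root.iter("essen"):
        prices = {p.get("gruppe"): p.text for p in meal.iter("pr")}
        yield (day, meal.get("kategorie"), meal.findtext("deutsch"),
               prices.get("S"), prices.get("M"), prices.get("G"))

# Write the extracted rows as CSV (here into a string buffer; in the
# real script, open a file instead).
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["date", "category", "description", "student", "staff", "guest"])
for row in parse_menu(SAMPLE):
    writer.writerow(row)
print(buf.getvalue().strip())
```

Collecting the prices into a dictionary keyed by price group keeps the row layout stable even if a file lists the groups in a different order or omits one.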

Solution:

# your solution