ZIP Files

Large data sets are usually distributed as compressed files, most often as ZIP files. Extracting such archives can require a lot of disk space, because compressed files are much smaller than the originals: compressed text files, for instance, are typically smaller by a factor of 5 to 10.

In Python, we can use the zipfile module, which allows us to read individual files from a ZIP archive without extracting the whole archive. First, we create an object of type zipfile.ZipFile. Such objects provide an open method, which returns a file-like object, that is, an object that can be processed like a regular file. Note that the open method of ZipFile objects always opens files from ZIP archives in binary mode.
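Because of binary mode, read() returns bytes, and text has to be decoded explicitly. A minimal sketch, assuming UTF-8 encoded content: wrapping the binary file-like object in io.TextIOWrapper from the standard library lets us iterate over decoded lines as with an ordinary text file. To stay self-contained, the sketch first writes a small archive into an in-memory io.BytesIO buffer. The member name notes.txt and its contents are hypothetical.

```python
import io
import zipfile

# Write a small archive into memory (hypothetical member 'notes.txt'),
# so no file on disk is needed for this sketch.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('notes.txt', 'first line\nsecond line\n')

with zipfile.ZipFile(buffer) as zf:
    # open() returns a binary file-like object, so read() yields bytes
    with zf.open('notes.txt') as f:
        raw = f.read()

    # TextIOWrapper decodes on the fly (UTF-8 assumed) and allows
    # line-by-line iteration like a regular text file
    with io.TextIOWrapper(zf.open('notes.txt'), encoding='utf-8') as text:
        lines = [line.rstrip('\n') for line in text]

print(type(raw))
print(lines)
```

The with statements close the archive and the member files automatically, so no explicit close() calls are needed.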

The namelist method returns a list of file names in the ZIP archive.

import os.path
import zipfile

# open zip file
zf = zipfile.ZipFile(os.path.join('testdir', 'test.zip'))

# show contents of zip file
print(zf.namelist())

# read one specific file from zip file
f = zf.open('file.txt')
print('file contents:')
print(f.read().decode())    # opened in binary mode!
f.close()

# close zip file
zf.close()
['another_file.txt', 'file.txt']
file contents:
This is a file for testing zipfile module.
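To check how much space compression actually saves, the infolist method returns one zipfile.ZipInfo object per archive member; its file_size and compress_size attributes hold the uncompressed and compressed sizes in bytes. The following is a small sketch under an assumption: the archive is built in memory from a hypothetical, highly repetitive CSV file, so its compression factor is far higher than what real data sets typically achieve. For a real archive, pass its path (for example os.path.join('testdir', 'test.zip')) to zipfile.ZipFile instead.

```python
import io
import zipfile

# Hypothetical, highly repetitive CSV content written to an in-memory archive;
# real data usually compresses much less.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('data.csv', 'x,y\n' + '1.0,2.0\n' * 10_000)

# Collect (uncompressed, compressed) sizes without extracting anything
with zipfile.ZipFile(buffer) as zf:
    sizes = {info.filename: (info.file_size, info.compress_size)
             for info in zf.infolist()}

for name, (size, packed) in sizes.items():
    print(f'{name}: {size} bytes, {packed} bytes compressed '
          f'(factor {size / packed:.1f})')
```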