XML Files#

XML (extensible markup language) files look like HTML Files, but with custom tag names. Each XML file may use its own set of tags to describe data. In principle, HTML is a special case of XML.

Standard conforming XML files have some format specifications in there first lines. Content without such specifications could look like this:

<person>
    <first_name>John</first_name>
    <last_name>Miller</last_name>
    <town>Atown</town>
</person>
<person>
    <first_name>Ann</first_name>
    <last_name>Abor</last_name>
    <town>Betown</town>
</person>
<person>
    <first_name>Bob</first_name>
    <last_name>Builder</last_name>
    <town>Cetown</town>
</person>
<person>
    <first_name>Nina</first_name>
    <last_name>Morning</last_name>
    <town>Detown</town>
</person>

HTML files can be parsed like HTML files with Beautiful Soup.

import bs4

xml = '''\
<person>
    <first_name>John</first_name>
    <last_name>Miller</last_name>
    <town>Atown</town>
</person>
<person>
    <first_name>Ann</first_name>
    <last_name>Abor</last_name>
    <town>Betown</town>
</person>
<person>
    <first_name>Bob</first_name>
    <last_name>Builder</last_name>
    <town>Cetown</town>
</person>
<person>
    <first_name>Nina</first_name>
    <last_name>Morning</last_name>
    <town>Detown</town>
</person>
'''

soup = bs4.BeautifulSoup(xml)

# find all towns
towns = soup.find_all('town')

print('#towns:', len(towns))
print('last town:', towns[-1])
#towns: 4
last town: <town>Detown</town>