how do I extract an element, sub-elements and the full path from xml in python?

asked Oct 7, 2021 in Technique by 深蓝 (71.8m points)

I would like to extract an element from an XML document, including its sub-elements and the full path of ancestors leading to it.

If this is my xml doc:

<world>
    <countries>
        <country>
            <name>a</name>
            <description>a short description</description>
            <population>
                <now>250000</now>
                <2000>100000</2000>
            </population>
        </country>
        <country>
            <name>b</name>
            <description>b short description</description>
            <population>
                <now>350000</now>
                <2000>150000</2000>
            </population>
        </country>
    </countries>
</world>

I would like to end up with this (see below), based on the XPath expression //country[name="a"]:

<world>
    <countries>
        <country>
            <name>a</name>
            <description>a short description</description>
            <population>
                <now>250000</now>
                <2000>100000</2000>
            </population>
        </country>
    </countries>
</world>

Question from: https://stackoverflow.com/questions/65948160/how-do-i-extract-an-element-sub-elements-and-the-full-path-from-xml-in-python

1 Answer

深蓝 · Answer 1 · 2021-10-06T18:54:43+0000

This can be handled with XPath in lxml: select every country, then remove the ones you don't want, which leaves the matching element and its ancestors in place.

One caveat, though: one of the tags (<2000>) is invalid, since an XML element name cannot begin with a digit. If you have no control over the source, you have to replace the offending tag with a placeholder before parsing and then restore it after processing.
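As a side note, the replace-and-restore step can be generalized a little. The sketch below is one possible way to do it, assuming that every tag name starting with a digit should be escaped. The `n_` prefix and both helper names are made up for illustration and are not part of lxml. It also assumes no real tag name already starts with `n_` followed by a digit.

```python
import re

# Hypothetical placeholder prefix; choose one that no real tag name starts with.
PREFIX = "n_"

def escape_numeric_tags(text):
    """Turn <2000> / </2000> into <n_2000> / </n_2000> so the text parses."""
    return re.sub(r"<(/?)(\d)", rf"<\g<1>{PREFIX}\g<2>", text)

def restore_numeric_tags(text):
    """Undo escape_numeric_tags() on the serialized result."""
    return re.sub(rf"<(/?){re.escape(PREFIX)}(\d)", r"<\g<1>\g<2>", text)

snippet = "<population><now>250000</now><2000>100000</2000></population>"
escaped = escape_numeric_tags(snippet)
print(escaped)
print(restore_numeric_tags(escaped) == snippet)
```

Matching on the angle bracket, rather than on the bare string 2000, avoids touching numbers that appear in the element text.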

So, all together:

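import lxml.html as lh

# Paste the XML document from the question here.
countries = """[your xml above]"""

# <2000> is not a valid tag name, so swap in a placeholder before parsing.
doc = lh.fromstring(countries.replace('<2000>', '<xxx>').replace('</2000>', '</xxx>'))

# Remove every <country> whose <name> is not 'a'; its ancestors stay in place.
for country in doc.xpath('//country'):
    if country.xpath('./name/text()') != ['a']:
        country.getparent().remove(country)

# Serialize the pruned tree and restore the original tag name.
print(lh.tostring(doc).decode().replace('<xxx>', '<2000>').replace('</xxx>', '</2000>'))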

Output:

<world>
    <countries>
        <country>
            <name>a</name>
            <description>a short description</description>
            <population>
                <now>250000</now>
                <2000>100000</2000>
            </population>
        </country>
        </countries>
</world>
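If you would rather start from the expression in the question than hard-code the name test in the loop, something along these lines might work. This is only a sketch and it swaps in the standard library's xml.etree.ElementTree instead of lxml, so there is no getparent(); a parent map is built by hand. ElementTree supports only a subset of XPath, so the path is written as `.//country[name='a']`. The function name `prune_to_matches` and the shortened sample document are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

XML = """<world>
    <countries>
        <country>
            <name>a</name>
            <population><now>250000</now><2000>100000</2000></population>
        </country>
        <country>
            <name>b</name>
            <population><now>350000</now><2000>150000</2000></population>
        </country>
    </countries>
</world>"""

def prune_to_matches(root, path):
    """Remove everything except elements matching `path`, their
    sub-elements, and their ancestors up to `root`."""
    # ElementTree has no getparent(), so map each child to its parent.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    keep = {root}
    for match in root.findall(path):
        keep.update(match.iter())      # the match and all its sub-elements
        node = match
        while node in parent_of:       # climb to the root, keeping ancestors
            node = parent_of[node]
            keep.add(node)
    for child, parent in parent_of.items():
        if child not in keep:
            parent.remove(child)
    return root

# <2000> is not a valid XML name, so use a placeholder while parsing.
safe = XML.replace("<2000>", "<xxx>").replace("</2000>", "</xxx>")
root = prune_to_matches(ET.fromstring(safe), ".//country[name='a']")
result = ET.tostring(root, encoding="unicode")
result = result.replace("<xxx>", "<2000>").replace("</xxx>", "</2000>")
print(result)
```

The same idea should carry over to lxml, where getparent() or iterancestors() can replace the hand-built parent map.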
