This type of thing can be taken care of using xpath with lxml.
One thing, though, one of the html tags (<2000>
) is invalid since it doesn't begin with a letter. If you have no control over the source, you have to replace the offending tag before parsing and then replace it again after processing.
So, all together:
import lxml.html as lh
countries = """[your html above]"""
doc = lh.fromstring(countries.replace('2000','xxx'))
states = doc.xpath('//country')
for country in states:
if country.xpath('./name/text()')[0]!='a':
country.getparent().remove(country)
print(lh.tostring(doc).decode().replace('xxx','2000'))
Output:
<world>
<countries>
<country>
<name>a</name>
<description>a short description</description>
<population>
<now>250000</now>
<2000>100000</2000>
</population>
</country>
</countries>
</world>
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…