Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Extract the first paragraph from a Wikipedia article (Python)

How can I extract the first paragraph from a Wikipedia article, using Python?

For example, for Albert Einstein, that would be:

Albert Einstein (pronounced /ˈælbərt ˈaɪnstaɪn/; German: [ˈalbɐt ˈaɪnʃtaɪn] ( listen); 14 March 1879 – 18 April 1955) was a theoretical physicist, philosopher and author who is widely regarded as one of the most influential and iconic scientists and intellectuals of all time. A German-Swiss Nobel laureate, Einstein is often regarded as the father of modern physics.[2] He received the 1921 Nobel Prize in Physics "for his services to theoretical physics, and especially for his discovery of the law of the photoelectric effect".[3]


1 Answer


I wrote a Python library that aims to make this very easy. Check it out on GitHub.

To install it, run

$ pip install wikipedia

Then to get the first paragraph of an article, just use the wikipedia.summary function.

>>> import wikipedia
>>> print(wikipedia.summary("Albert Einstein", sentences=2))

prints

Albert Einstein (/ˈælbərt ˈaɪnstaɪn/; German: [ˈalbɐt ˈaɪnʃtaɪn] ( listen); 14 March 1879 – 18 April 1955) was a German-born theoretical physicist who developed the general theory of relativity, one of the two pillars of modern physics (alongside quantum mechanics). While best known for his mass–energy equivalence formula E = mc2 (which has been dubbed "the world's most famous equation"), he received the 1921 Nobel Prize in Physics "for his services to theoretical physics, and especially for his discovery of the law of the photoelectric effect".

As for how it works, wikipedia makes a request to the MobileFrontend extension of the MediaWiki API, which returns mobile-friendly versions of Wikipedia articles. Specifically, when you pass the parameters prop=extracts&exsectionformat=plain, the MediaWiki servers parse the wikitext and return a plain-text summary of the requested article, up to and including the entire page text. The API also accepts the parameters exchars and exsentences, which limit the number of characters and sentences returned.
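
If you would rather not depend on the library, the request described above can be sketched with the standard library alone. This is only a rough illustration, not the library's actual code: the function names (build_extract_url, extract_from_response, first_paragraph, fetch_extract) and the User-Agent string are made up for this example, and the explaintext, exintro and redirects parameters are additions of the sketch that the explanation above does not mention. It also assumes that paragraphs in the plain-text extract are separated by newlines.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_extract_url(title, sentences=None, chars=None):
    """Build a MediaWiki API URL asking for a plain-text extract of one article."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",
        "exsectionformat": "plain",
        "explaintext": "1",  # assumption: request plain text instead of HTML
        "exintro": "1",      # assumption: only the text before the first section
        "redirects": "1",    # assumption: follow redirects to the target article
        "titles": title,
    }
    # exsentences and exchars are alternative ways to limit the extract length.
    if sentences is not None:
        params["exsentences"] = str(sentences)
    elif chars is not None:
        params["exchars"] = str(chars)
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_from_response(data):
    """Return the extract text from a decoded API response (default JSON layout)."""
    pages = data["query"]["pages"]      # dict keyed by page id
    page = next(iter(pages.values()))   # only one title was requested
    return page.get("extract", "")


def first_paragraph(text):
    """Return the first non-empty line, assuming newline-separated paragraphs."""
    for block in text.split("\n"):
        if block.strip():
            return block.strip()
    return ""


def fetch_extract(title, sentences=None, chars=None):
    """Perform the HTTP request and return the extract as a string."""
    url = build_extract_url(title, sentences, chars)
    # The User-Agent value is a placeholder; replace it with your own contact info.
    request = urllib.request.Request(
        url, headers={"User-Agent": "extract-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(request) as response:
        return extract_from_response(json.load(response))
```

Calling print(first_paragraph(fetch_extract("Albert Einstein"))) should then print the opening paragraph, provided the API responds as assumed. Keeping the URL building and response parsing separate from the network call makes each piece easy to check without hitting Wikipedia.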
