I am trying to extract a list of persons and organizations using the Stanford Named Entity Recognizer (NER) through Python NLTK.
When I run:
from nltk.tag.stanford import NERTagger
st = NERTagger('/usr/share/stanford-ner/classifiers/all.3class.distsim.crf.ser.gz',
               '/usr/share/stanford-ner/stanford-ner.jar')
r = st.tag('Rami Eid is studying at Stony Brook University in NY'.split())
print(r)
the output is:
[('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'),
('at', 'O'), ('Stony', 'ORGANIZATION'), ('Brook', 'ORGANIZATION'),
('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'LOCATION')]
What I want is to extract all persons and organizations from this list in this form:
Rami Eid
Stony Brook University
I tried to loop through the list of tuples:
for x, y in r:
    if y == 'ORGANIZATION':
        print(x)
But this code prints each token of an entity on its own line:
Stony
Brook
University
With real data, one sentence can contain more than one organization or person. How can I mark the boundaries between different entities?
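One possible direction, as a minimal sketch rather than a definitive answer: treat each run of adjacent tokens that share the same tag as a single entity, using `itertools.groupby`. The helper name `group_entities` and its `wanted` parameter are hypothetical, and the tagged list is copied from the output above so the snippet runs without the Stanford jar. Note the assumption: because the 3-class model's output shown here has no begin/inside markers, two distinct entities of the same type that sit directly next to each other would be merged into one.

```python
from itertools import groupby

# Tagger output copied from the question, so no Stanford jar is needed here.
tagged = [('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'),
          ('at', 'O'), ('Stony', 'ORGANIZATION'), ('Brook', 'ORGANIZATION'),
          ('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'LOCATION')]


def group_entities(tagged_tokens, wanted=('PERSON', 'ORGANIZATION')):
    """Join runs of adjacent tokens that carry the same tag (hypothetical helper)."""
    entities = []
    # groupby yields one group per run of consecutive pairs with an equal tag.
    for tag, run in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if tag in wanted:
            text = ' '.join(token for token, _ in run)
            entities.append((text, tag))
    return entities


for text, tag in group_entities(tagged):
    print(text)
# Prints:
# Rami Eid
# Stony Brook University
```

Keeping the tag next to the joined text makes it easy to filter persons and organizations separately afterwards.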