python - parsing a sentence - match inflections and skip punctuation

asked Oct 6, 2021 in Technique[技术] by 深蓝 (71.8m points)

I'm trying to parse sentences in Python. For any sentence I get, I should take only the words that appear after the words 'say' or 'ask' (if neither word appears, I should take the whole sentence). I simply did it with a regular expression:

sen = re.search('(?s)(?<=say|Say).*$', current_game_row["sentence"], re.M | re.I)

(this is only for 'say', but adding 'ask' is not a problem...)

The problem is that if I get a sentence with punctuation, such as a comma or colon (, :), right after the word 'say', the match includes it too. Someone suggested that I use nltk tokenization to handle this, but I'm new to Python and don't understand how to use it. I see that nltk has RegexpParser, but I'm not sure how to use it. Please help me :-)
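A minimal reproduction of the problem, assuming a made-up sample sentence (it is not from the original data); the pattern is the one shown above:

```python
import re

# Hypothetical sample sentence, used only to illustrate the problem.
sentence = "Then I say: hello there"

# The lookbehind matches right after "say", so the colon and the
# space that follow the keyword end up in the result.
match = re.search('(?s)(?<=say|Say).*$', sentence, re.M | re.I)
leftover = match.group(0)
print(repr(leftover))  # ': hello there'
```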

** I forgot to mention: I also want to recognize 'said', 'asked', etc., and I don't want to catch words that merely contain 'say' or 'ask' (I'm not sure there are such words...). In addition, if there are multiple occurrences of 'say' or 'ask', I only want to catch the first one in the sentence. **

question from:https://stackoverflow.com/questions/66060945/parsing-a-sentence-match-inflections-and-skip-punctuation

1 Answer

深蓝 · Answer 1 · 2021-10-06T03:07:40+0000

Everything after a Keyword

We can deal with the unwanted punctuation by using \W+ to eat up all the non-word characters (punctuation and whitespace) that follow the keyword.

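import re

sentence = "Hearsay? With masked flasks I said: abracadabra"

# \b...\b stops the keywords from matching inside longer words (Hearsay, masked, flasks);
# \W+ consumes the punctuation and whitespace right after the keyword.
keys = '|'.join(['ask', 'asks', 'asked', 'say', 'says', 'said'])
result = re.search(rf'\b({keys})\b\W+(.*)', sentence, re.S | re.I)

if result is None:
    # No keyword found: fall back to the whole sentence.
    print(sentence)
else:
    print(result.group(2))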

Output:

abracadabra
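The same idea can be wrapped in a small helper for use on many sentences. This is only a sketch: the function name text_after_keyword and the constant names are made up for illustration, and the pattern assumes word boundaries (\b) around the keywords so that words such as "Hearsay" or "masked" are skipped.

```python
import re

# Keyword list taken from the answer; extend it with other inflections as needed.
KEYWORDS = ['ask', 'asks', 'asked', 'say', 'says', 'said']
KEYS = '|'.join(KEYWORDS)

# \b...\b: whole words only; \W+: skip punctuation/whitespace after the keyword.
PATTERN = re.compile(rf'\b(?:{KEYS})\b\W+(.*)', re.S | re.I)


def text_after_keyword(sentence):
    """Return the text after the first keyword, or the whole sentence if none is found."""
    match = PATTERN.search(sentence)
    return match.group(1) if match else sentence


print(text_after_keyword("Hearsay? With masked flasks I said: abracadabra"))
print(text_after_keyword("She asked, why did he say that?"))
print(text_after_keyword("No keyword here."))
```

Because re.search returns the leftmost match, only the first keyword in the sentence splits it. Note that \W+ needs at least one non-word character after the keyword, so a sentence that ends directly with the keyword (for example "What did you say") is returned whole.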

case-insensitive: You already pass the case-insensitive flag re.I, so we can remove the Say variant.

multi-line: You have the re.M option, which directs ^ to match not only at the start of your string but also right after every \n within that string (and $ to match before every \n). We can drop this since we do not need to use ^.

dot-matches-all: You have (?s), which directs . to match everything including \n. This is the same as applying the re.S flag.

I'm not sure what the net effect of having both re.M and re.S is. I think your sentence might be a text blob with newlines inside, so I removed re.M and kept (?s) as re.S.
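As far as the Python re documentation describes them, the two flags act independently: re.M only changes where ^ and $ can match, and re.S only changes what . can match. A quick check, using a made-up two-line string:

```python
import re

# Hypothetical text blob with a newline inside.
text = "first line\nsecond line"

# re.M only affects ^ and $: ^ now also matches right after each newline.
line_starts = re.findall(r'^\w+', text, re.M)

# re.S only affects '.': without it, '.' stops at the newline.
without_s = re.search(r'first.*', text).group(0)
with_s = re.search(r'first.*', text, re.S).group(0)

print(line_starts)      # ['first', 'second']
print(repr(without_s))  # 'first line'
print(repr(with_s))     # 'first line\nsecond line'
```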
