Python - Split a list with a long string based on 2 keywords

Question

asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

I have a list with a long string in it. How can I split the string to extract the sections from 'MyKeyword' to 'MyData'? These words appear multiple times in my list, so I'd like to split on them and keep the MyKeyword and MyData markers in the output if possible.

Current data example:

['MyKeyword This is my data MyData. MyKeyword and chunk of text here. Random text. MyData is this etc etc ']

Desired output:

['MyKeyword This is my data', 'MyData.', 'MyKeyword and chunk of text here. Random text.','MyData is this etc etc ']

Current code:


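from itertools import groupby
#linelist = ["a", "b", "", "c", "d", "e", "", "a"]
# output2 is assumed to be the list shown under "Current data example"
split_at = "MyKeyword"
# groupby compares whole list items with split_at, so the single long string is never split
[list(g) for k, g in groupby(output2, lambda x: x != split_at) if k]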

Question from: https://stackoverflow.com/questions/65918346/python-split-a-list-with-a-long-string-based-on-2-keywords

1 Answer

深蓝 · Answer 1 · 2021-10-06T19:09:32+0000

You can use a regular expression, matching all the text from MyKeyword to MyData in lazy mode:

>>> import re
>>> re.findall(r"MyKeyword.*?MyData\.?", "MyKeyword This is my data, MyData. MyKeyword and chunk of text here. Random text. MyData is this etc etc ")
['MyKeyword This is my data, MyData.', 'MyKeyword and chunk of text here. Random text. MyData']

.*? means zero or more characters, matched lazily (*?), i.e. as few as possible;
\.? means an optional literal period (an unescaped . would match any character, including the space after the second MyData).
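
To see why the lazy quantifier matters here, the following is a small sketch comparing a greedy and a lazy version of the pattern on the same sample string. The variable names (text, greedy, lazy) are only for illustration and are not part of the original answer.

```python
import re

text = ("MyKeyword This is my data, MyData. MyKeyword and chunk of text here. "
        "Random text. MyData is this etc etc ")

# Greedy: .* runs up to the LAST "MyData", so both sections collapse into one match.
greedy = re.findall(r"MyKeyword.*MyData", text)

# Lazy: .*? stops at the FIRST "MyData" after each "MyKeyword".
lazy = re.findall(r"MyKeyword.*?MyData", text)

print(len(greedy))  # 1
print(len(lazy))    # 2
```

With the greedy form, the single match swallows the second MyKeyword as well, which is usually not what is wanted when the keywords repeat.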

EDIT (according to the new requirement):

The regex you need is something like

MyKeyword.*?(?= ?MyData|$)|MyData.*?(?= ?MyKeyword|$)

It starts at the point where it matches MyKeyword (resp. MyData), and then it captures as few characters as possible, as above, until it reaches MyData (resp. MyKeyword) or the end of the string.

Indeed:

| is a special character that means "or";
$ matches the end of the string;
" ?" (a space followed by ?) matches an optional space;
(?=<expr>) is a positive lookahead: it means "followed by <expr>", without including <expr> in the match.
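
As a minimal sketch of how this regex could be applied to the list from the question, the snippet below runs it over every string in the list and flattens the results. The names data, pattern and chunks are assumptions made for the example.

```python
import re

# Sample list from the question: a single long string inside a list.
data = ['MyKeyword This is my data MyData. MyKeyword and chunk of text here. '
        'Random text. MyData is this etc etc ']

# The lookaheads are not capturing groups, so findall returns the full matches.
pattern = re.compile(r"MyKeyword.*?(?= ?MyData|$)|MyData.*?(?= ?MyKeyword|$)")

# Collect the chunks from every string in the list into one flat list.
chunks = [chunk for s in data for chunk in pattern.findall(s)]
print(chunks)
# ['MyKeyword This is my data', 'MyData.',
#  'MyKeyword and chunk of text here. Random text.', 'MyData is this etc etc ']
```

If the real text spans several lines, passing re.DOTALL to re.compile would probably be needed as well, since . does not match a newline by default.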
