Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

in Technique by (71.8m points)

Python: use regex to extract strings from text and put them into a list

I have a .txt file whose content follows this pattern: some strings ( a ; b ; c ) some strings ( d ; e ; f ) and so on.

How can I extract the values inside the parentheses and put them into a list of lists, like lists = [['a', 'b', 'c'], ['d', 'e', 'f']]?

Thank you.

Question from: https://stackoverflow.com/questions/66057074/python-use-regex-to-extract-strings-from-text-and-put-them-into-a-list


1 Answer

by (71.8m points)

Maybe it's because my regex-fu isn't as strong as some people's, but I'd do it in two stages and only use regex for the first one.

import re

text_ = "some strings ( a ; b ; c ) some strings ( d ; e ; f ) and so on"

# extract anything enclosed in parentheses (the parentheses themselves are included)
pat1 = re.compile(r"(\([^)]+\))")
split1 = pat1.findall(text_)

def split(substr):
    """ dont need to be all fancy, split on ; after stripping the parenthesis """
    return [v.strip() for v in substr.lstrip("(").rstrip(")").split(";")]

result = [split(val) for val in split1]

print(result)

output:

[['a', 'b', 'c'], ['d', 'e', 'f']]
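Why the backslashes in the pattern matter: re.findall returns the text matched by the capture group when the pattern has exactly one group, and a tuple per match when it has several. Without the backslashes, the parentheses become grouping syntax instead of literal characters. A small sketch comparing three patterns on the same sample string (the variable names are illustrative only):

```python
import re

text_ = "some strings ( a ; b ; c ) some strings ( d ; e ; f ) and so on"

# One group around escaped (literal) parentheses: each result keeps the parentheses.
with_parens = re.findall(r"(\([^)]+\))", text_)

# One group inside the escaped parentheses: only the inner text is returned.
inner_only = re.findall(r"\(([^)]+)\)", text_)

# Unescaped parentheses are just two nested groups, so findall returns tuples
# of everything between closing parentheses, not the bracketed parts.
two_groups = re.findall(r"(([^)]+))", text_)

print(with_parens)   # ['( a ; b ; c )', '( d ; e ; f )']
print(inner_only)    # [' a ; b ; c ', ' d ; e ; f ']
print(two_groups[0])
```

The second pattern is the one that lets the split step skip stripping the parentheses.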

Alternatively, you can have the regex capture only the text inside the parentheses, which simplifies the split function. It's cleaner and gives the same output.

pat1 =re.compile(r"(([^)]+))")
split1 = pat1.findall(text_)

def split(substr):
    """ dont need to be all fancy, split on ; after stripping the parenthesis """
    return [v.strip() for v in substr.split(";")]




...