I'm learning Python 3 and I'm struggling to understand regular expressions with the re module.
Here's my problem: I have the string
phrase = "s000000000 s1133122 s21 s3 s4 s5212638476234857634 s6 s7 s8 s9000"
and, using the function
re.findall(pattern, phrase)
I'd like to extract:
- s0-s9 strings without the additional characters;
- s0-s3 strings without the additional characters;
- s0-s3 strings with the additional characters;
- s4-s9 strings with the additional characters.
I managed to accomplish the first three tasks by using the following patterns:
pattern = "s[0-9]"
pattern = "s[0-3]"
pattern = "s[0-3]+"
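For reference, here is a minimal sketch that runs these three patterns against the phrase above. The `results` dictionary and the loop are only there for illustration; the patterns and the phrase are exactly the ones given above.

```python
import re

phrase = "s000000000 s1133122 s21 s3 s4 s5212638476234857634 s6 s7 s8 s9000"

# Run each of the three working patterns and collect what findall returns.
results = {}
for pattern in ("s[0-9]", "s[0-3]", "s[0-3]+"):
    results[pattern] = re.findall(pattern, phrase)
    print(pattern, "->", results[pattern])

# s[0-9]  -> ['s0', 's1', 's2', 's3', 's4', 's5', 's6', 's7', 's8', 's9']
# s[0-3]  -> ['s0', 's1', 's2', 's3']
# s[0-3]+ -> ['s000000000', 's1133122', 's21', 's3']
```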
For the last task, though, I tried to replicate what I did in the third one and used
pattern = "s[4-9]+"
but, instead of getting the result
["s4", "s5212638476234857634", "s6", "s7", "s8", "s9000"]
I get
["s4", "s5", "s6", "s7", "s8", "s9"]
Why is that? What am I missing? The instructions in the book I'm studying from state that the plus sign means "one or more characters", and the s[0-3]+ pattern does in fact work, but I cannot make it work for this specific problem.
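A minimal script that reproduces the behaviour follows. The names `got` and `variant` are only for illustration. The second pattern, s[4-9][0-9]*, is not from the book: it is a sketch under the assumption that only the first digit after "s" needs to be in the range 4-9, while any digit may follow it.

```python
import re

phrase = "s000000000 s1133122 s21 s3 s4 s5212638476234857634 s6 s7 s8 s9000"

# The pattern from the question: "+" repeats the class [4-9] itself,
# so the match ends at the first digit that falls outside 4-9.
got = re.findall("s[4-9]+", phrase)
print(got)      # ['s4', 's5', 's6', 's7', 's8', 's9']

# Assumed variant: restrict only the first digit to 4-9,
# then accept zero or more digits of any value.
variant = re.findall("s[4-9][0-9]*", phrase)
print(variant)  # ['s4', 's5212638476234857634', 's6', 's7', 's8', 's9000']
```

With the first pattern, "s5212638476234857634" is cut to "s5" because the next digit is 2, and "s9000" is cut to "s9" because the next digit is 0. In the phrase above, every digit in the s0-s3 strings happens to be between 0 and 3, which would explain why s[0-3]+ appears to work.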
question from:
https://stackoverflow.com/questions/65923618/how-to-choose-a-regex-pattern-in-python