Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


regex - negative lookahead assertion not working in Python

Task:
- given: a list of image filenames
- todo: create a new list of the filenames that do not contain the word "thumb", i.e. only target the non-thumbnail images (with PIL, the Python Imaging Library).

I've tried r".*(?!thumb).*" but it failed.

I've found the solution (here on Stack Overflow): prepend a ^ to the regex and put the .* inside the negative lookahead, giving r"^(?!.*thumb).*", and this now works.

The thing is, I would like to understand why my first solution did not work but I don't. Since regexes are complicated enough, I would really like to understand them.

What I do understand is that the ^ tells the parser that the following condition is to match at the beginning of the string. But doesn't the .* in the (not working) first example also start at the beginning of the string? I thought it would start at the beginning of the string and search through as many characters as it can before reaching "thumb". If so it would return a non-match.

Could someone please explain why r".*(?!thumb).*" does not work but r"^(?!.*thumb).*" does?

Thanks!



1 Answer


Could someone please explain why r".*(?!thumb).*" does not work but r"^(?!.*thumb).*" does?

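The first will always match, because the first .* greedily consumes the entire string. At that point nothing follows the current position, so the negative lookahead (?!thumb) succeeds trivially and the second .* matches the empty string. Even with backtracking, the pattern only needs one position that is not immediately followed by 'thumb', and the end of the string is always such a position. The second is a bit convoluted: anchored at the start of the string, the lookahead checks whether any run of characters followed by 'thumb' can be found from there. If 'thumb' is present anywhere, the lookahead fails and so does the entire match; otherwise the trailing .* matches the whole string.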
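A small sketch of the difference between the two patterns from the question. The sample filenames are made up for illustration; only the two regexes come from the question.

```python
import re

# Hypothetical sample filenames, not taken from the question.
names = ["photo.jpg", "photo_thumb.jpg", "thumb_photo.jpg"]

first = re.compile(r".*(?!thumb).*")    # lookahead is checked after .* has consumed everything
second = re.compile(r"^(?!.*thumb).*")  # lookahead scans the whole string from position 0

# The first pattern matches every name, including the thumbnails.
first_hits = [n for n in names if first.match(n)]

# The second pattern only matches names without "thumb" anywhere.
second_hits = [n for n in names if second.match(n)]

print(first_hits)
print(second_hits)
```

With the first pattern, the match on "photo_thumb.jpg" spans the whole string: the lookahead is evaluated at the end, where nothing follows.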

Number two is more easily written as:

  • 'thumb' not in string
  • not re.search('thumb', string) (instead of match)
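As a rough illustration of the two alternatives listed above, applied to the task of building a new list. The variable names and sample filenames are assumptions for the example.

```python
import re

# Hypothetical sample filenames for illustration.
filenames = ["cat.png", "cat_thumb.png", "dog.thumb.jpg", "dog.jpg"]

# Plain substring test: no regex needed.
by_substring = [f for f in filenames if "thumb" not in f]

# re.search looks for the pattern anywhere in the string,
# whereas re.match only tries at the beginning.
by_search = [f for f in filenames if not re.search("thumb", f)]

print(by_substring)
print(by_search)
```

Both lists come out the same here; the substring test is probably the simpler choice when the thing being searched for is a fixed string.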

Also as I mentioned in the comments, your question says:

filenames not containing the word "thumb"

So you may wish to consider whether a name such as "thumbs up" is supposed to be excluded: it contains "thumb" as a substring, but not as a separate word.
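If only the standalone word should count, one possible sketch is to require that "thumb" is not surrounded by other letters. This is an assumption about what "word" means for these filenames. Note that \b would not help with names like cat_thumb.png, because the underscore counts as a word character in Python's re module.

```python
import re

# Hypothetical sample filenames for illustration.
names = ["cat_thumb.png", "thumbs_up.png", "cat.png"]

# "thumb" with no ASCII letter directly before or after it.
# Assumption: digits, underscores, dots and dashes act as separators.
word_thumb = re.compile(r"(?<![A-Za-z])thumb(?![A-Za-z])")

kept = [n for n in names if not word_thumb.search(n)]
print(kept)
```

Under this definition "thumbs_up.png" is kept, and so would be "thumbnail.png", so the pattern needs adjusting if such names should also be filtered out.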

