Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Reuse part of a Regex pattern

Consider this (very simplified) example string:

1aw2,5cx7

As you can see, it consists of two digit/letter/letter/digit values separated by a comma.

Now, I could match this with the following:

>>> from re import match
>>> match(r"\d\w\w\d,\d\w\w\d", "1aw2,5cx7")
<_sre.SRE_Match object at 0x01749D40>
>>>

The problem, though, is that I have to write \d\w\w\d twice. With small patterns this isn't so bad, but with more complex regexes, writing the exact same thing twice makes the final pattern enormous and cumbersome to work with. It also seems redundant.

I tried using a named capture group:

>>> from re import match
>>> match(r"(?P<id>\d\w\w\d),(?P=id)", "1aw2,5cx7")
>>>

But it didn't work, because (?P=id) looks for a second occurrence of the exact text 1aw2, not for another digit/letter/letter/digit value.
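A minimal sketch of that behaviour, using only the standard library re module (the name BACKREF is just for illustration): a backreference such as (?P=id) succeeds only when the second value is a literal copy of the text the group captured.

```python
import re

# A backreference such as (?P=id) must repeat the exact text that group
# "id" captured, not the sub-pattern that group was built from.
BACKREF = re.compile(r"(?P<id>\d\w\w\d),(?P=id)")

same = BACKREF.fullmatch("1aw2,1aw2")       # second half repeats "1aw2" exactly
different = BACKREF.fullmatch("1aw2,5cx7")  # same shape, different text

print(same is not None)       # True
print(different is not None)  # False
```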

Is there any way to save part of a pattern, such as \d\w\w\d, so it can be used later on in the same pattern? In other words, can I reuse a sub-pattern in a pattern?

Fight the dragon too long and you become the dragon yourself; gaze into the abyss too long and the abyss will gaze back…

1 Answer

No. With the standard library re module, a regular expression cannot define a sub-pattern once and refer to it by name elsewhere in the same pattern.

You can, of course, always get the same effect by re-using Python variables:

digit_letter_letter_digit = r'\d\w\w\d'

then use string formatting to build the larger pattern:

match(r"{0},{0}".format(digit_letter_letter_digit), inputtext)

or, using Python 3.6+ f-strings:

dlld = r'\d\w\w\d'
match(fr"{dlld},{dlld}", inputtext)

I often use this technique to compose larger, more complex patterns from reusable sub-patterns.
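A small sketch of that composition style with the standard re module; DLLD, PAIR and UP_TO_FOUR are hypothetical names chosen for illustration. One thing to watch: regex quantifier braces such as {0,3} must be doubled inside an f-string (or a str.format template), otherwise Python treats them as replacement fields.

```python
import re

# Reusable sub-pattern from the question: digit, two word characters, digit.
DLLD = r"\d\w\w\d"

# Compose larger patterns from the sub-pattern with raw f-strings.
# {{0,3}} becomes the regex quantifier {0,3} after f-string processing.
PAIR = re.compile(rf"{DLLD},{DLLD}")
UP_TO_FOUR = re.compile(rf"{DLLD}(?:,{DLLD}){{0,3}}")

print(PAIR.fullmatch("1aw2,5cx7") is not None)             # True
print(UP_TO_FOUR.fullmatch("1aw2,5cx7,9zz0") is not None)  # True
print(UP_TO_FOUR.fullmatch("1aw2,5cx") is not None)        # False
```

If a sub-pattern contains alternation (a|b), wrapping it in a non-capturing group (?:...) before interpolating keeps it from changing the precedence of the surrounding pattern.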

If you are prepared to install an external library, the third-party regex project can solve this problem with a subroutine call. The syntax (?N), where N is a group number, re-uses the pattern of an already defined (implicitly numbered) capturing group:

(\d\w\w\d),(?1)
^........^ ^..^
|          |
|          re-use pattern of capturing group 1
|
capturing group 1
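A minimal sketch, assuming the third-party regex package is installed (pip install regex); the names SUBROUTINE and BACKREF are illustrative. It contrasts the subroutine call (?1), which re-runs group 1's pattern, with the backreference \1, which must repeat group 1's captured text.

```python
import regex  # third-party package: pip install regex

# (?1) is a subroutine call: it re-runs the *pattern* of capturing group 1,
# so the second value only has to have the same shape, not the same text.
SUBROUTINE = regex.compile(r"(\d\w\w\d),(?1)")

# Contrast with a backreference \1, which must repeat group 1's captured text.
BACKREF = regex.compile(r"(\d\w\w\d),\1")

print(SUBROUTINE.fullmatch("1aw2,5cx7") is not None)  # True
print(BACKREF.fullmatch("1aw2,5cx7") is not None)     # False
```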

You can do the same with named capturing groups, where (?<groupname>...) defines the named group groupname, and (?&groupname), (?P&groupname) or (?P>groupname) re-use the pattern of groupname (not the text it matched); the latter two forms are alternatives for compatibility with other engines.
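A sketch of the named form, again assuming the third-party regex package is installed; the group name dlld and the variables AMP and PYTHON_STYLE are illustrative. Both spellings below call the named group's pattern a second time.

```python
import regex  # third-party package: pip install regex

# Named group "dlld" defines the sub-pattern once; (?&dlld) and the
# alternative spelling (?P>dlld) both call that pattern again.
AMP = regex.compile(r"(?<dlld>\d\w\w\d),(?&dlld)")
PYTHON_STYLE = regex.compile(r"(?P<dlld>\d\w\w\d),(?P>dlld)")

for pattern in (AMP, PYTHON_STYLE):
    print(pattern.fullmatch("1aw2,5cx7") is not None)  # True
    print(pattern.fullmatch("1aw2,5cx") is not None)   # False
```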

And finally, regex supports the (?(DEFINE)...) block to 'define' subroutine patterns without them actually matching anything at that stage. You can put multiple (...) and (?<name>...) capturing groups in that construct and then refer to them later in the actual pattern:

(?(DEFINE)(?<dlld>\d\w\w\d))(?&dlld),(?&dlld)
          ^...............^ ^......^ ^......^
          |                 |        |
          |                 +--------+-- uses 'dlld' pattern twice
          |
          creates 'dlld' pattern
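A slightly larger sketch of the DEFINE block, assuming the third-party regex package is installed. The extra "pair" definition and the semicolon separator are hypothetical, added only to show that one defined sub-pattern can itself be built from another.

```python
import regex  # third-party package: pip install regex

# The DEFINE block only declares named sub-patterns; nothing inside it is
# matched where it appears. "pair" is itself built from "dlld", so the
# definitions can be layered. The real pattern starts after the block.
PATTERN = regex.compile(
    r"(?(DEFINE)(?<dlld>\d\w\w\d)(?<pair>(?&dlld),(?&dlld)))"  # definitions
    r"(?&pair);(?&pair)"                                       # actual match
)

print(PATTERN.fullmatch("1aw2,5cx7;3bb4,8dd9") is not None)  # True
print(PATTERN.fullmatch("1aw2,5cx7") is not None)            # False
```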

Just to be explicit: the standard library re module does not support subroutine patterns.

