Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

Python regex - r prefix

Can anyone explain why example 1 below works, when the r prefix is not used? I thought the r prefix must be used whenever escape sequences are used. Example 2 and example 3 demonstrate this.

# example 1
import re
print(re.sub('\s+', ' ', 'hello     there      there'))
# prints 'hello there there' - not expected as r prefix is not used

# example 2
import re
print(re.sub(r'(\b\w+)(\s+\1\b)+', r'\1', 'hello     there      there'))
# prints 'hello     there' - as expected as r prefix is used

# example 3
import re
print(re.sub('(\b\w+)(\s+\1\b)+', '\1', 'hello     there      there'))
# prints 'hello     there      there' - as expected as r prefix is not used

1 Answer

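Because a backslash begins an escape sequence only when it forms a valid escape sequence with the character(s) after it; otherwise Python leaves the backslash in the string. '\s' is not a valid string escape, so '\s+' reaches re unchanged and example 1 works. In example 3, however, '\b' (backspace) and '\1' (the octal escape for the character with code 1) are valid string escapes, so Python converts them before re ever sees the pattern, and the pattern no longer matches. (Relying on unrecognized escapes is fragile: since Python 3.6 they trigger a DeprecationWarning, and since Python 3.12 a SyntaxWarning, so the r prefix is still the safe habit for regexes.)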

>>> '\n'
'\n'
>>> r'\n'
'\\n'
>>> print('\n')


>>> print(r'\n')
\n
>>> '\s'
'\\s'
>>> r'\s'
'\\s'
>>> print('\s')
\s
>>> print(r'\s')
\s
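A small Python 3 sketch of the same point, applied to the question's three examples: it compares the raw and non-raw patterns and shows what Python actually hands to re. It is only an illustration; the variable names are made up, and in the non-raw pattern \w and \s are spelled "\\w" and "\\s" to avoid the invalid-escape warning newer Pythons print (the resulting string is the same).

```python
import re

text = "hello     there      there"

# Example 1: "\s" is not a recognized string escape, so a plain '\s+' keeps
# its backslash and equals r'\s+'. It is spelled "\\s+" here only to avoid
# the invalid-escape warning newer Pythons emit; the string is identical.
print("\\s+" == r"\s+")                     # True
print(re.sub(r"\s+", " ", text))            # hello there there

# Example 2: with the r prefix, re receives \b (word boundary) and \1
# (backreference to group 1).
raw_pattern = r"(\b\w+)(\s+\1\b)+"

# Example 3: without the r prefix, Python turns "\b" into a backspace
# (U+0008) and "\1" into chr(1) before re ever sees the pattern.
cooked_pattern = "(\b\\w+)(\\s+\1\b)+"

print(raw_pattern == cooked_pattern)        # False
print("\b" == "\x08", "\1" == chr(1))       # True True
print(re.sub(raw_pattern, r"\1", text))     # hello     there
print(re.sub(cooked_pattern, "\1", text))   # hello     there      there
```

The non-raw pattern compiles fine; it simply asks for literal control characters that never occur in the text, so re.sub returns the input unchanged.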

Unless an 'r' or 'R' prefix is present, escape sequences in strings are interpreted according to rules similar to those used by Standard C. The recognized escape sequences are:

Escape Sequence   Meaning
\newline          Ignored
\\                Backslash (\)
\'                Single quote (')
\"                Double quote (")
\a                ASCII Bell (BEL)
\b                ASCII Backspace (BS)
\f                ASCII Formfeed (FF)
\n                ASCII Linefeed (LF)
\N{name}          Character named name in the Unicode database (Unicode only)
\r                ASCII Carriage Return (CR)
\t                ASCII Horizontal Tab (TAB)
\uxxxx            Character with 16-bit hex value xxxx (Unicode only)
\Uxxxxxxxx        Character with 32-bit hex value xxxxxxxx (Unicode only)
\v                ASCII Vertical Tab (VT)
\ooo              Character with octal value ooo
\xhh              Character with hex value hh
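To see the table in action, here is a minimal sketch (assuming Python 3.6+ for f-strings) that pairs the raw spelling of most escapes above with the value a plain literal produces. Each one collapses to a single character; \newline is left out because it produces no character at all. The list name examples is just for illustration, and ascii() is used so control and non-ASCII characters print as escapes on any terminal.

```python
# (raw spelling, interpreted value) for escapes listed in the table.
# The raw string shows what you type; the plain literal is what Python stores.
examples = [
    (r"\\", "\\"),
    (r"\'", "\'"),
    (r"\"", "\""),
    (r"\a", "\a"),
    (r"\b", "\b"),
    (r"\f", "\f"),
    (r"\n", "\n"),
    (r"\N{BULLET}", "\N{BULLET}"),
    (r"\r", "\r"),
    (r"\t", "\t"),
    (r"\u00e9", "\u00e9"),
    (r"\U000000e9", "\U000000e9"),
    (r"\v", "\v"),
    (r"\101", "\101"),
    (r"\x41", "\x41"),
]

for raw, cooked in examples:
    # ascii() renders control and non-ASCII characters as escapes,
    # so the output is safe to print anywhere.
    print(f"{raw:<12} -> {ascii(cooked):<10} length {len(cooked)}")
```

Note that \101 (octal 101) and \x41 (hex 41) both produce "A".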

Never rely on raw strings for path literals (a Windows path ending in a backslash is the classic trap), as raw strings have some rather peculiar inner workings that are known to have bitten people:

When an "r" or "R" prefix is present, a character following a backslash is included in the string without change, and all backslashes are left in the string. For example, the string literal r"\n" consists of two characters: a backslash and a lowercase "n". String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw string cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the string, not as a line continuation.
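The rules in this quote can be checked directly. A minimal Python 3 sketch; the variable names plain and raw are only for illustration:

```python
# r"\n" keeps both characters: a backslash and a lowercase n.
print(len(r"\n"), len("\n"))        # 2 1

# A backslash can escape the quote, but it stays in the string.
print(len(r"\""), r"\"" == '\\"')   # 2 True

# Backslash-newline: a plain literal drops both characters (line
# continuation inside the string), a raw literal keeps both.
plain = "a\
b"
raw = r"a\
b"
print(plain == "ab", raw == "a\\\nb")   # True True
```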

To better illustrate the trailing-backslash rule:

>>> r'\'
SyntaxError: EOL while scanning string literal
>>> r'\''
"\\'"
>>> '\'
SyntaxError: EOL while scanning string literal
>>> '\''
"'"
>>> 
>>> r'\\'
'\\\\'
>>> '\\'
'\\'
>>> print(r'\\')
\\
>>> print(r'\')
SyntaxError: EOL while scanning string literal
>>> print('\\')
\
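If you do need a Windows path that ends in a backslash, here is a sketch of some workarounds (Python 3; the paths are made up, and pathlib is just one option, not something the rules above require). PureWindowsPath only manipulates strings and never touches the file system, so this runs on any OS.

```python
from pathlib import PureWindowsPath

# r"C:\temp\" does not compile: the final backslash escapes the closing quote.
escaped = "C:\\temp\\"            # escape every backslash
joined = r"C:\temp" + "\\"        # raw string, trailing backslash added separately
print(escaped == joined)          # True

# Or let pathlib build the path: PureWindowsPath accepts forward slashes
# and renders backslashes.
p = PureWindowsPath("C:/temp") / "logs"
print(str(p))                     # C:\temp\logs
```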

