I have a function that removes punctuation from a list of strings:
def strip_punctuation(input):
x = 0
for word in input:
input[x] = re.sub(r'[^A-Za-z0-9 ]', "", input[x])
x += 1
return input
I recently modified my script to use Unicode strings so I could handle other non-Western characters. This function breaks when it encounters these special characters and just returns empty Unicode strings. How can I reliably remove punctuation from Unicode formatted strings?
Question&Answers:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…