I need to insert the same digit before the last character of a string (stored in a column of a Spark DataFrame) using PySpark.
For example, say I have the string 2020_week4 or 2021_week5. I need to add a zero in front of the 4 and the 5, like so: 2020_week04 or 2021_week05. The larger context is that the replacement is conditional: it applies only to single-digit weeks. So something along the lines of:
df.withColumn('week', when(len(col("week")) == 10, regexp_replace(week, REGEX_PATTERN, "0")).otherwise(col("week")))
Note that the week column will always be 10 characters long for the single-digit strings that need replacing.
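To make the goal concrete, here is a minimal sketch of the intended transformation in plain Python, using the standard `re` module rather than Spark. The helper name `pad_week` is hypothetical and the sample values are the ones from the example above plus one made-up two-digit week. The lookahead only matches when exactly one digit remains before the end of the string, so in this sketch two-digit weeks are left untouched even without a length check.

```python
import re


def pad_week(value: str) -> str:
    """Insert a zero before a single trailing digit that follows 'week'.

    Hypothetical helper for illustration only.
    """
    # (?<=week) : the position must be preceded by the literal text "week"
    # (?=\d$)   : exactly one digit must remain before the end of the string
    # The match is zero-width, so "0" is inserted and nothing is removed.
    return re.sub(r"(?<=week)(?=\d$)", "0", value)


print(pad_week("2020_week4"))   # single-digit week: gets padded
print(pad_week("2021_week12"))  # two-digit week: left unchanged
```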
Per @thefourthbird's suggestion regarding the regex, I tried the following:
df1.withColumn('week', when(len(col("week")) == 10, regexp_replace(week, r"^\d{4}_week(?=\d$)", "$00")).otherwise(col("week")))
The error I'm getting has nothing to do with the regex itself, but rather with how to apply a regex in PySpark in general. Error:
TypeError: object of type 'Column' has no len()
I also tried:
import pyspark.sql.functions as F
df1.withColumn('week', when(F.length("week") == 10, regexp_replace(week, r"^\d{4}_week(?=\d$)", "$00")).otherwise(col("week")))
Error:
NameError: name 'week' is not defined
UPDATE:
df10.withColumn('week', when(length(col('week')) == 10, regexp_replace("week", r"(?<=k)(?=\d$)", "0")).otherwise(col("week")))
question from:
https://stackoverflow.com/questions/65893437/add-number-to-a-string-before-the-last-character-in-the-string-using-regex-in-py