I'm receiving transaction data from a vendor whose datetime values are in local time but carry no meaningful offset. For example, the ModifiedDate column may have a value of
'2020-05-16T15:04:55.7429192+00:00'
I can get the local timezone by pulling together some other data about the store in which the transaction occurs:
timezone_local = tz.timezone(tzDf[0]["COUNTRY"] + '/' + tzDf[0]["TIMEZONE"])
I then wrote a function to take those two values and give it the proper timezone:
import dateutil.parser as parser
import pytz as tz

def convert_naive_to_aware(datetime_local_str, timezone_local):
    # Parse the string once rather than once per field.
    parsed = parser.parse(datetime_local_str)
    # Keep the wall-clock time to the second; discard microseconds and any parsed offset.
    naive = parsed.replace(microsecond=0, tzinfo=None)
    # Attach the zone with localize(); passing a pytz zone straight to datetime()
    # can pick up the zone's historical LMT offset instead of the current one.
    aware = timezone_local.localize(naive)
    return aware
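For comparison, the conversion above could also be sketched with only the standard library. This is a minimal sketch, assuming Python 3.9+ (for zoneinfo) and a system time zone database; the name to_aware and the 'America/New_York' zone are illustrative and not part of the original code.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # assumption: Python 3.9+; stands in for pytz here


def to_aware(datetime_local_str, tz_name):
    """Interpret the first 19 characters as wall-clock time in tz_name."""
    # 'YYYY-MM-DDTHH:MM:SS' is 19 characters; the slice drops the
    # fractional seconds and the trailing offset.
    naive = datetime.fromisoformat(datetime_local_str[:19])
    # zoneinfo zones can be attached directly with replace().
    return naive.replace(tzinfo=ZoneInfo(tz_name))


# Hypothetical zone chosen only for illustration.
aware = to_aware("2020-05-16T15:04:55.7429192+00:00", "America/New_York")
print(aware.isoformat())  # 2020-05-16T15:04:55-04:00
```

The input and output here are plain Python values, which is why a call like this works in testing with a string argument.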
It works fine when I send it the timestamp as a string in testing, but it balks when I try to apply it to a dataframe. I presume that's because I don't yet know the right way to pass the column value as a string. In this case, I'm trying to replace the current ModifiedTime value with the result of the call to the function.
from pyspark.sql import functions as F
.
.
.
ordersDf = ordersDf.withColumn("ModifiedTime", ( convert_naive_to_aware( F.substring( ordersDf.ModifiedTime, 1, 19 ), timezone_local)),)
Those of you more knowledgeable than I won't be surprised that I received the following error:
TypeError: 'Column' object is not callable
I admit, I'm a bit of a tyro at Python and dataframes, and I may well be taking the long way 'round. I've attempted a few other things, such as ordersDf.ModifiedTime.cast("String"), etc., but no luck. I'd be grateful for any suggestions.
We're using Azure Databricks, the cluster is Scala 2.11.
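The underlying issue is that an ordinary Python function called inside withColumn runs once on the driver and receives a Column expression, not the string value of each row. One possible direction, sketched below, is to express the conversion with Spark's built-in column functions instead. This is only a sketch under several assumptions: ModifiedTime is a string column in the ISO format shown earlier, the Spark session time zone is UTC, the zone name is available as a plain string, and a UTC timestamp is an acceptable result (Spark timestamps do not carry a zone). The function name localize_modified_time is hypothetical.

```python
def localize_modified_time(orders_df, tz_name):
    """Return orders_df with ModifiedTime read as local time in tz_name.

    The resulting column holds the corresponding UTC timestamp.
    """
    # Imported inside the function so this module can be loaded
    # even where pyspark is not installed.
    from pyspark.sql import functions as F

    # Keep only 'yyyy-MM-ddTHH:mm:ss' (19 characters), dropping the
    # fractional seconds and the trailing offset.
    wall_clock = F.substring(F.col("ModifiedTime"), 1, 19)
    # Parse the text into a timestamp column.
    naive_ts = F.to_timestamp(wall_clock, "yyyy-MM-dd'T'HH:mm:ss")
    # Treat the wall-clock value as local to tz_name and shift it to UTC.
    return orders_df.withColumn(
        "ModifiedTime", F.to_utc_timestamp(naive_ts, tz_name)
    )
```

A call might look like ordersDf = localize_modified_time(ordersDf, "America/New_York"), where the zone name is illustrative. Wrapping the original Python function in a UDF would be another route, at the cost of row-by-row Python execution.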
question from:
https://stackoverflow.com/questions/65876730/pass-dataframe-column-value-to-function