Change Timestamp Year in Spark
Spark does not provide a standard function to modify a date. In my case, users had entered the year incorrectly, so the timestamps came out as `0021-01-31 15:00:00` where they need to be `2021-01-31 15:00:00`. The trick I used to change the year:
```python
import pyspark.sql.functions as f

df = (df
      .withColumn("dateTime1",
                  f.concat(f.year(df.dateTime) + 2000,
                           f.date_format(df.dateTime, "-MM-dd HH:mm:ss")).cast("timestamp")))
```
What it does:

- Format the timestamp as `-MM-dd HH:mm:ss`, i.e. the part without a year.
- Get the year part with the built-in `year` function in Spark, and add 2000 to it.
- Concatenate the corrected year with the formatted string.
- Parse the string as a `timestamp` again.
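Outside Spark, the same four steps can be sketched in plain Python with `datetime` (the function name `fix_year` is my own, not part of any library):

```python
from datetime import datetime

def fix_year(ts: str) -> datetime:
    """Mimic the Spark expression: format the timestamp without its year,
    add 2000 to the year, concatenate, and parse the result again."""
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    # Part without a year, like date_format(dateTime, "-MM-dd HH:mm:ss")
    rest = dt.strftime("-%m-%d %H:%M:%S")
    # year(dateTime) + 2000, concatenated with the formatted remainder
    fixed = str(dt.year + 2000) + rest
    # Parse back, like .cast("timestamp")
    return datetime.strptime(fixed, "%Y-%m-%d %H:%M:%S")

print(fix_year("0021-01-31 15:00:00"))  # 2021-01-31 15:00:00
```

This is only an illustration of the logic; in a real pipeline the Spark expression above runs on the whole column at once.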
And a sample result:

| dateTime | dateTime1 |
|---|---|
| 0021-01-27 17:15:00 | 2021-01-27 17:15:00 |
| 0021-01-28 06:30:00 | 2021-01-28 06:30:00 |
| 0021-01-30 06:33:00 | 2021-01-30 06:33:00 |
| 0021-01-31 07:00:00 | 2021-01-31 07:00:00 |
| 0021-01-31 07:00:00 | 2021-01-31 07:00:00 |
| 0201-01-31 18:40:00 | 2201-01-31 18:40:00 |
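Note the last row: the expression adds 2000 to every year, so an already three-digit year like 0201 becomes 2201. If only small years should be shifted, you would want a guard (in Spark, something along the lines of `f.when(f.year(...) < 100, ...)`). Here is a plain-Python sketch of that idea; the function name and the threshold of 100 are my assumptions, not from the original trick:

```python
from datetime import datetime

def fix_year_guarded(ts: str, threshold: int = 100) -> str:
    """Shift the year by +2000 only when it is below `threshold`;
    otherwise return the timestamp unchanged."""
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    if dt.year >= threshold:
        return ts  # already a plausible year, leave it alone
    # Same concatenation trick as before, applied only to small years
    return str(dt.year + 2000) + dt.strftime("-%m-%d %H:%M:%S")

print(fix_year_guarded("0021-01-31 15:00:00"))  # 2021-01-31 15:00:00
print(fix_year_guarded("0201-01-31 18:40:00"))  # 0201-01-31 18:40:00 (unchanged)
```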