I'm trying to get a window function to go back and fetch a previous row by a specific date, and I'm not quite sure what is going wrong, but it's giving me the immediately preceding row instead of the row for the specified date. To calculate this I'm taking the current row's date and finding the Monday of that week, like so:
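from pyspark.sql.functions import date_sub, next_day

def previous_day(date, dayOfWeek):
    # next_day() returns the first matching weekday strictly after `date`,
    # so subtracting 7 days gives the most recent one on or before `date`.
    return date_sub(next_day(date, dayOfWeek), 7)

spark_df = spark_df.withColumn("last_monday", previous_day(spark_df['calendarday'], "monday"))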
I'm then calculating the difference in days between the current day and its closest previous Monday:
d = F.datediff(spark_df['calendarday'], spark_df['last_monday'])
spark_df = spark_df.withColumn("daysSinceMonday", d)
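As a sanity check on the date arithmetic, here is a minimal plain-Python sketch (no Spark) of what I expect the two columns to contain. The helper names `last_monday` and `days_since_monday` are made up for illustration and are not part of my Spark code; the sketch assumes that a Monday should map to itself, which is what `next_day(...)` minus 7 days gives.

```python
from datetime import date, timedelta

def last_monday(day: date) -> date:
    """Most recent Monday on or before `day` (a Monday maps to itself)."""
    # date.weekday() is 0 for Monday through 6 for Sunday
    return day - timedelta(days=day.weekday())

def days_since_monday(day: date) -> int:
    """Number of days between `day` and the Monday of its week."""
    return (day - last_monday(day)).days

# 2024-01-01 was a Monday
print(last_monday(date(2024, 1, 3)), days_since_monday(date(2024, 1, 3)))
```

The values in my daysSinceMonday column match this per-row calculation.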
I can see that the daysSinceMonday value is correct for each row. Next I want to create a window and pick the first row in it, with the range reaching back by the d value that I set up, but for some reason it doesn't work.
days = lambda i: i * 86400  # days -> seconds

# d is the datediff Column defined above
w = (Window.partitionBy(column_list)
     .orderBy(col('calendarday').cast("timestamp").cast("long"))
     .rangeBetween(-days(d), 0))

spark_df = spark_df.withColumn('PreviousYearUnique', first("yearUnique").over(w))
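To make the intended result concrete, here is a rough plain-Python sketch of what I expect PreviousYearUnique to contain within a single partition. The rows and the yearUnique values ("A", "B", ...) are invented for illustration, and `expected_previous_year_unique` is a hypothetical helper, not something from my Spark job. It assumes the window for each row should span from that week's Monday up to the row itself, with the earliest row in that span supplying the value.

```python
from datetime import date, timedelta

# Hypothetical rows for one partition: (calendarday, yearUnique)
rows = [
    (date(2024, 1, 1), "A"),   # Monday
    (date(2024, 1, 2), "B"),
    (date(2024, 1, 4), "C"),
    (date(2024, 1, 8), "D"),   # Monday of the following week
    (date(2024, 1, 10), "E"),
]

def expected_previous_year_unique(rows):
    """For each row, take yearUnique from the earliest row in [Monday, current day]."""
    ordered = sorted(rows)  # order by calendarday
    result = []
    for day, _ in ordered:
        monday = day - timedelta(days=day.weekday())
        # Rows whose date falls inside the per-row window
        in_window = [value for d0, value in ordered if monday <= d0 <= day]
        result.append((day, in_window[0]))  # equivalent of first() over the window
    return result

for day, value in expected_previous_year_unique(rows):
    print(day, value)
```

With this sample data every row in the first week should get "A" and every row in the second week should get "D", whereas my Spark window is not producing the Monday row's value.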
Any ideas what is going wrong?