Create a new column based on a condition with Spark SQL?

1 Answer

深蓝 · Answer 1 · 2021-10-06T19:02:22+0000

SQL tables represent unordered sets (technically multi-sets because they can have duplicates). There is no ordering without a column that specifies the ordering. There is no "last" row, because there is no ordering.

Often such an ordering is available as an id, an insertion timestamp, or something similar. If you have such a column, you can number the rows within each type/no combination, latest first, and label the first one:

select t.*,
       (case when row_number() over (partition by type, no order by <ordering col> desc) = 1
             then 'done'
        end) as flag
from t;

Note: This guarantees that exactly one row per type/no combination is flagged "done"; every other row gets NULL, because the case expression has no else branch. This holds even if rows from different combinations are interleaved (based on the ordering column). If you only want to flag a row when the next row is different, use lead() instead.
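The lead() variant could be sketched roughly as follows. The sample rows and the ordering column name id are assumptions made up for illustration; they are not from the question. The inline VALUES table is only there to make the query self-contained.

```sql
-- Hypothetical sample data: id is an assumed ordering column.
with t as (
  select * from values
    (1, 'a', 10),
    (2, 'a', 10),
    (3, 'a', 20),
    (4, 'b', 20),
    (5, 'a', 10)
  as v(id, type, no)
)
select t.*,
       -- Flag a row when it is the last row overall (lead() returns NULL)
       -- or when the next row, in id order, has a different type or no.
       (case when lead(no) over (order by id) is null
               or lead(no) over (order by id) <> no
               or lead(type) over (order by id) <> type
             then 'done'
        end) as flag
from t;
```

With this sample data, rows 2, 3, 4 and 5 are flagged, because each of them ends a run of consecutive rows that share the same type and no. The row_number() query would flag only rows 3, 4 and 5: row 2 shares type 'a' and no 10 with the later row 5, so it is not the last row of its combination. The is null check assumes that no itself is never NULL; if it can be, the comparison needs a null-safe form.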
