Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Create a new column based on a condition with Spark SQL?


He who fights dragons for too long becomes a dragon himself; gaze into the abyss for too long, and the abyss will gaze back…

1 Answer


SQL tables represent unordered sets (technically multi-sets because they can have duplicates). There is no ordering without a column that specifies the ordering. There is no "last" row, because there is no ordering.

Often, such an ordering is available as an id, an insertion timestamp, or some other column. If you have such a column, you can number the rows within each type/no combination, newest first, and then label the row numbered 1:

select t.*,
       (case when row_number() over (partition by type, no order by <ordering col> desc) = 1
             then 'done'
        end) as flag
from t;

Note: This guarantees that exactly one row per type/no combination is flagged "done"; every other row gets NULL. This holds even if the rows of different combinations are interleaved (based on the ordering column). If you only want to check whether the next row belongs to a different combination, you can use lead() instead.
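
To make the difference between the two approaches concrete, here is a minimal sketch that runs both queries on a small interleaved table. It is an illustration under assumptions, not the asker's actual data. The column id stands in for the ordering column and the sample rows are made up. It uses Python's built-in sqlite3 module (with SQLite 3.25 or later for window functions) rather than Spark, only so that it runs without a cluster. The column no is double-quoted defensively, because NO is a keyword in SQLite. Flagging the final row in the lead() variant is a design choice made for this sketch.

```python
import sqlite3

# Hypothetical sample data: (id, type, no). "id" is the assumed ordering column.
rows = [
    (1, "A", 1),
    (2, "A", 1),
    (3, "B", 2),
    (4, "A", 1),  # A/1 reappears after B/2, so the groups are interleaved
    (5, "B", 2),
]

conn = sqlite3.connect(":memory:")
conn.execute('create table t (id integer, type text, "no" integer)')
conn.executemany("insert into t values (?, ?, ?)", rows)

# Variant 1: row_number() flags only the last row of each type/no combination.
last_per_group = conn.execute("""
    select id,
           (case when row_number() over (partition by type, "no" order by id desc) = 1
                 then 'done'
            end) as flag
    from t
    order by id
""").fetchall()

# Variant 2: lead() flags a row when the next row (by id) has a different
# type/no, or when there is no next row at all.
before_change = conn.execute("""
    select id,
           (case when lead(type) over w is null
                   or lead(type) over w <> type
                   or lead("no") over w <> "no"
                 then 'done'
            end) as flag
    from t
    window w as (order by id)
    order by id
""").fetchall()

print(last_per_group)
# [(1, None), (2, None), (3, None), (4, 'done'), (5, 'done')]
print(before_change)
# [(1, None), (2, 'done'), (3, 'done'), (4, 'done'), (5, 'done')]
```

With row_number(), only ids 4 and 5 are flagged, one per combination. With lead(), every row that precedes a change is flagged, so an interleaved combination can get "done" more than once.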

