I have a sample data and I have been trying to get the required data as shown below. I have able to implement some sort of Islands and gaps solution as well. Here is the most near version I reached near the expected data.
DROP TABLE IF EXISTS #data
CREATE TABLE #data(
factor varchar(50),
val int,
[start_date] date, [end_date] date
)
Go
INSERT INTO #data VALUES
('a', 15, '2021-01-01', '2021-01-05'),
('a', 15, '2021-01-08', '2021-01-10'),
('a', 20, '2021-01-11', '2021-01-20'),
('a', 15, '2021-01-21', '2099-01-01'),
('b', 10, '2021-01-01', '2021-01-04'),
('b', 12, '2021-01-05', '2021-01-13'),
('b', 12, '2021-01-17', '2021-01-19'),
('b', 12, '2021-01-20', '2021-01-23'),
('b', 10, '2021-01-24', '2099-01-01');
WITH value_cte As (
SELECT * ,
RANK() OVER(PARTITION BY factor ORDER BY [start_date]) - RANK() OVER(PARTITION BY factor, val ORDER BY [start_date]) grp
FROM #data
)
SELECT factor, val, MIN(start_date) st, MAX(end_date) ed
FROM value_cte
GROUP BY factor, val, grp
ORDER BY factor, st
Result from above query:
Expected Result:
factor val st ed
a 15 2021-01-01 2021-01-05
a 15 2021-01-08 2021-01-10
a 20 2021-01-11 2021-01-20
a 15 2021-01-21 2099-01-01
b 10 2021-01-01 2021-01-04
b 12 2021-01-05 2021-01-13
b 12 2021-01-17 2021-01-23
b 10 2021-01-24 2099-01-01
Even if the value is same for the two consecutive islands and there is a gap then that gap should not be merged and if the two islands are continuous then they should be merged. Unfortunately, I can't change the source here (sample data structure)
question from:
https://stackoverflow.com/questions/65940060/islands-and-gaps-problem-on-time-series-data