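I'd like to use dplyr to generate cumulative sums that reset whenever the "current" sum reaches or exceeds some threshold, with the next row starting a new sum. In the example below, I want the cumulative sum over column 'a'. In the desired output, g is the group number and c is the cumulative sum within that group.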
library(dplyr)
library(tibble)
tib <- tibble(
  t = c(1, 2, 3, 4, 5, 6),
  a = c(2, 3, 1, 2, 2, 3)
)
# what I want
## thresh = 5
# A tibble: 6 x 4
#       t     a     g     c
#   <dbl> <dbl> <int> <dbl>
# 1  1.00  2.00     0  2.00
# 2  2.00  3.00     0  5.00
# 3  3.00  1.00     1  1.00
# 4  4.00  2.00     1  3.00
# 5  5.00  2.00     1  5.00
# 6  6.00  3.00     2  3.00
# what I want
## thresh = 4
# A tibble: 6 x 4
#       t     a     g     c
#   <dbl> <dbl> <int> <dbl>
# 1  1.00  2.00     0  2.00
# 2  2.00  3.00     0  5.00
# 3  3.00  1.00     1  1.00
# 4  4.00  2.00     1  3.00
# 5  5.00  2.00     1  5.00
# 6  6.00  3.00     2  3.00
# what I want
## thresh = 6
# A tibble: 6 x 4
#       t     a     g     c
#   <dbl> <dbl> <int> <dbl>
# 1  1.00  2.00     0  2.00
# 2  2.00  3.00     0  5.00
# 3  3.00  1.00     0  6.00
# 4  4.00  2.00     1  2.00
# 5  5.00  2.00     1  4.00
# 6  6.00  3.00     1  7.00
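To pin down the rule all three tables follow, here is a minimal base-R sketch that reproduces them with a plain loop. The name cumsum_reset is a hypothetical helper, not a dplyr function, and it assumes (as every table above shows) that the reset takes effect on the row after the running sum reaches or exceeds thresh.

```r
# Hypothetical helper (not part of dplyr): running sum of x that starts
# over on the row AFTER the running sum reaches or exceeds `thresh`.
# Returns the group id (starting at 0) and the running sum for each row.
cumsum_reset <- function(x, thresh) {
  n <- length(x)
  g <- integer(n)
  c_sum <- numeric(n)
  group <- 0L
  running <- 0
  for (i in seq_len(n)) {
    running <- running + x[i]
    g[i] <- group
    c_sum[i] <- running
    if (running >= thresh) {
      # threshold reached: the next row begins a new group at zero
      group <- group + 1L
      running <- 0
    }
  }
  list(g = g, c = c_sum)
}

a <- c(2, 3, 1, 2, 2, 3)
cumsum_reset(a, 5)$c  # 2 5 1 3 5 3
cumsum_reset(a, 6)$g  # 0 0 0 1 1 1
```

A loop is used here because each reset depends on where the previous one happened, which a single vectorized cumsum() over the column cannot capture. The result could be attached to the tibble with, for example, res <- cumsum_reset(tib$a, thresh) followed by tib$g <- res$g and tib$c <- res$c.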
I've looked at many similar questions here (such as "resetting cumsum if value goes to negative in r") and have gotten close to what I want, but not quite there.
I've tried variants of
thresh <- 5
tib %>%
  group_by(g = cumsum(lag(cumsum(a) >= thresh, default = FALSE))) %>%
  mutate(c = cumsum(a)) %>%
  ungroup()
which returns
# A tibble: 6 x 4
      t     a     g     c
  <dbl> <dbl> <int> <dbl>
1  1.00  2.00     0  2.00
2  2.00  3.00     0  5.00
3  3.00  1.00     1  1.00
4  4.00  2.00     2  2.00
5  5.00  2.00     3  2.00
6  6.00  3.00     4  3.00
You can see that the grouping breaks down after the first reset: from row 3 onward, every row gets a group of its own instead of the running sum restarting.
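Unpacking the grouping expression from the attempt for thresh = 5 shows where it goes wrong. This is a base-R sketch: dplyr::lag(x, default = FALSE) is swapped for the equivalent c(FALSE, head(x, -1)) so the snippet runs without dplyr.

```r
a <- c(2, 3, 1, 2, 2, 3)
thresh <- 5

total <- cumsum(a)                 # running sum over the WHOLE column: 2 5 6 8 10 13
hit <- total >= thresh             # FALSE TRUE TRUE TRUE TRUE TRUE
# base-R stand-in for dplyr::lag(hit, default = FALSE)
lagged <- c(FALSE, head(hit, -1))  # FALSE FALSE TRUE TRUE TRUE TRUE
g <- cumsum(lagged)                # 0 0 1 2 3 4: a new group on every row after row 2
```

Because cumsum(a) runs over the whole column and never restarts, every row after the first crossing satisfies cumsum(a) >= thresh, so each one increments the group counter.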