You need two operations here: grouping (I'll demonstrate with dplyr
) and a rolling-window summation (I'll use zoo
).
Also, your data is a little too sparse to be very awesome grouping by all three of team, season, simulation_ID
, so I'll just show grouping by team
this time. For your larger data, replace group_by(team)
with group_by(team, season, simulation_ID)
.
library(dplyr)
k <- 3 # window size
dat %>%
group_by(team) %>%
mutate(accumulated_team_points = zoo::rollapply(team_points, k, FUN = sum, align = "right", fill = NA)) %>%
ungroup()
# # A tibble: 10 x 8
# match_ID season simulation_ID home_team team match_result team_points accumulated_team_points
# <int> <chr> <int> <lgl> <chr> <chr> <int> <int>
# 1 1 2015-2016 1 TRUE Manchester Utd Home win 3 NA
# 2 1 2015-2016 2 TRUE Manchester Utd Draw 1 NA
# 3 1 2015-2016 3 TRUE Manchester Utd Home win 3 7
# 4 1 2015-2016 1 FALSE Tottenham Home win 0 NA
# 5 1 2015-2016 2 FALSE Tottenham Home win 0 NA
# 6 1 2015-2016 3 FALSE Tottenham Away win 3 3
# 7 2 2015-2016 1 TRUE Leicester Home win 3 NA
# 8 2 2015-2016 2 TRUE Leicester Home win 3 NA
# 9 2 2015-2016 3 TRUE Leicester Away win 0 6
# 10 2 2015-2016 1 FALSE Sunderland Draw 1 NA
That defaults to NA
for the first k-1
instances in a window, which is usually a sane and defensible default. If, however, you want to sum even partial sums, then replace fill=NA
with partial=TRUE
:
dat %>%
group_by(team) %>%
mutate(accumulated_team_points = zoo::rollapply(team_points, k, FUN = sum, align = "right", partial = TRUE)) %>%
ungroup()
# # A tibble: 10 x 8
# match_ID season simulation_ID home_team team match_result team_points accumulated_team_points
# <int> <chr> <int> <lgl> <chr> <chr> <int> <int>
# 1 1 2015-2016 1 TRUE Manchester Utd Home win 3 3
# 2 1 2015-2016 2 TRUE Manchester Utd Draw 1 4
# 3 1 2015-2016 3 TRUE Manchester Utd Home win 3 7
# 4 1 2015-2016 1 FALSE Tottenham Home win 0 0
# 5 1 2015-2016 2 FALSE Tottenham Home win 0 0
# 6 1 2015-2016 3 FALSE Tottenham Away win 3 3
# 7 2 2015-2016 1 TRUE Leicester Home win 3 3
# 8 2 2015-2016 2 TRUE Leicester Home win 3 6
# 9 2 2015-2016 3 TRUE Leicester Away win 0 6
# 10 2 2015-2016 1 FALSE Sunderland Draw 1 1
FYI: I'm assuming that the data is pre-ordered.
Data
dat <- structure(list(match_ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), season = c("2015-2016", "2015-2016", "2015-2016", "2015-2016", "2015-2016", "2015-2016", "2015-2016", "2015-2016", "2015-2016", "2015-2016"), simulation_ID = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L), home_team = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE), team = c("Manchester Utd", "Manchester Utd", "Manchester Utd", "Tottenham", "Tottenham", "Tottenham", "Leicester", "Leicester", "Leicester", "Sunderland"), match_result = c("Home win", "Draw", "Home win", "Home win", "Home win", "Away win", "Home win", "Home win", "Away win", "Draw"), team_points = c(3L, 1L, 3L, 0L, 0L, 3L, 3L, 3L, 0L, 1L)), class = "data.frame", row.names = c(NA, -10L))