r - Function that sums previous N observations

Question

asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

I would like to find an efficient way to sum the points that a team received in its previous N matches (not including the points from the current match) for every combination of simulation_ID, team and season. If a team has played fewer than N matches so far, the function should return NA.

A simplified data set:

match_ID	season	simulation_ID	home_team	team	match_result	team_points
1	2015-2016	1	TRUE	Manchester Utd	Home win	3
1	2015-2016	2	TRUE	Manchester Utd	Draw	1
1	2015-2016	3	TRUE	Manchester Utd	Home win	3
1	2015-2016	1	FALSE	Tottenham	Home win	0
1	2015-2016	2	FALSE	Tottenham	Home win	0
1	2015-2016	3	FALSE	Tottenham	Away win	3
2	2015-2016	1	TRUE	Leicester	Home win	3
2	2015-2016	2	TRUE	Leicester	Home win	3
2	2015-2016	3	TRUE	Leicester	Away win	0
2	2015-2016	1	FALSE	Sunderland	Draw	1

question from:https://stackoverflow.com/questions/65890463/function-that-sums-previous-n-observations

1 Answer

深蓝 · Answer 1 · 2021-10-06T19:18:50+0000

You need two operations here: grouping (I'll demonstrate with dplyr) and a rolling-window summation (I'll use zoo).
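
To see the rolling-window half in isolation, here is a minimal sketch on a bare vector. The values in x are made up for illustration and are not taken from the question's data.

```r
library(zoo)

x <- c(3, 1, 3, 0, 2)  # hypothetical points, in chronological order
k <- 3                 # window size

# Right-aligned window: position i sums x[(i - k + 1):i], so x[i] itself is included.
# fill = NA pads the first k - 1 positions, where no full window exists yet.
roll_incl <- rollapply(x, k, FUN = sum, align = "right", fill = NA)
roll_incl
# [1] NA NA  7  4  5
```

Wrapping this call in group_by() and mutate() simply restarts the window for every group.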

Also, your sample data is too sparse for grouping by all three of team, season and simulation_ID to show anything useful (each such group would have only one row), so I'll group by team alone here. For your larger data, replace group_by(team) with group_by(team, season, simulation_ID).

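library(dplyr) # zoo must be installed too; it is called below via zoo::
k <- 3 # window size
dat %>%
  group_by(team) %>%
  # right-aligned window: each row sums itself and the k-1 rows before it
  mutate(accumulated_team_points = zoo::rollapply(team_points, k, FUN = sum, align = "right", fill = NA)) %>%
  ungroup()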
# # A tibble: 10 x 8
#    match_ID season    simulation_ID home_team team           match_result team_points accumulated_team_points
#       <int> <chr>             <int> <lgl>     <chr>          <chr>              <int>                   <int>
#  1        1 2015-2016             1 TRUE      Manchester Utd Home win               3                      NA
#  2        1 2015-2016             2 TRUE      Manchester Utd Draw                   1                      NA
#  3        1 2015-2016             3 TRUE      Manchester Utd Home win               3                       7
#  4        1 2015-2016             1 FALSE     Tottenham      Home win               0                      NA
#  5        1 2015-2016             2 FALSE     Tottenham      Home win               0                      NA
#  6        1 2015-2016             3 FALSE     Tottenham      Away win               3                       3
#  7        2 2015-2016             1 TRUE      Leicester      Home win               3                      NA
#  8        2 2015-2016             2 TRUE      Leicester      Home win               3                      NA
#  9        2 2015-2016             3 TRUE      Leicester      Away win               0                       6
# 10        2 2015-2016             1 FALSE     Sunderland     Draw                   1                      NA

With fill=NA, the first k-1 rows of each group are NA because no full window exists yet, which is usually a sane and defensible default. If, however, you want sums over incomplete windows as well, replace fill=NA with partial=TRUE:

dat %>%
  group_by(team) %>%
  mutate(accumulated_team_points = zoo::rollapply(team_points, k, FUN = sum, align = "right", partial = TRUE)) %>%
  ungroup()
# # A tibble: 10 x 8
#    match_ID season    simulation_ID home_team team           match_result team_points accumulated_team_points
#       <int> <chr>             <int> <lgl>     <chr>          <chr>              <int>                   <int>
#  1        1 2015-2016             1 TRUE      Manchester Utd Home win               3                       3
#  2        1 2015-2016             2 TRUE      Manchester Utd Draw                   1                       4
#  3        1 2015-2016             3 TRUE      Manchester Utd Home win               3                       7
#  4        1 2015-2016             1 FALSE     Tottenham      Home win               0                       0
#  5        1 2015-2016             2 FALSE     Tottenham      Home win               0                       0
#  6        1 2015-2016             3 FALSE     Tottenham      Away win               3                       3
#  7        2 2015-2016             1 TRUE      Leicester      Home win               3                       3
#  8        2 2015-2016             2 TRUE      Leicester      Home win               3                       6
#  9        2 2015-2016             3 TRUE      Leicester      Away win               0                       6
# 10        2 2015-2016             1 FALSE     Sunderland     Draw                   1                       1
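
The question asks for the previous N matches without the current one, whereas a right-aligned window ends on the current row and therefore counts it. One possible adjustment, sketched here on a made-up single-team data set (toy and prev_k_points are hypothetical names), is to shift the points down one row with dplyr::lag() before summing.

```r
library(dplyr)

# Hypothetical toy data: one team, six matches, already in chronological order.
toy <- data.frame(
  team = rep("A", 6),
  team_points = c(3L, 1L, 3L, 0L, 3L, 1L)
)
k <- 3  # window size

res <- toy %>%
  group_by(team) %>%
  mutate(
    # dplyr::lag() moves every value down one row (first row becomes NA),
    # so the window ending at row i covers rows i-k .. i-1 only.
    prev_k_points = zoo::rollapply(dplyr::lag(team_points), k, FUN = sum,
                                   align = "right", fill = NA)
  ) %>%
  ungroup()

res$prev_k_points
# [1] NA NA NA  7  4  6
```

Because sum() returns NA whenever the window still contains the leading NA from lag(), the first k rows of each group come out as NA, which matches the "fewer than N matches" requirement.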

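FYI: I'm assuming that the data is already sorted chronologically within each group; a rolling window works on row order, so unsorted rows would give meaningless sums.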
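
If the rows are not guaranteed to be in order, sorting first is cheap. The sketch below uses made-up data and assumes that match_ID increases with time, which the question does not state explicitly.

```r
library(dplyr)

# Hypothetical toy data, deliberately out of chronological order.
toy <- data.frame(
  match_ID = c(3L, 1L, 2L, 4L),
  team = "A",
  team_points = c(3L, 3L, 1L, 0L)
)
k <- 2  # window size

res <- toy %>%
  arrange(team, match_ID) %>%  # assumption: match_ID reflects playing order
  group_by(team) %>%
  mutate(
    accumulated_team_points = zoo::rollapply(team_points, k, FUN = sum,
                                             align = "right", fill = NA)
  ) %>%
  ungroup()

res$accumulated_team_points
# [1] NA  4  4  3
```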

Data

dat <- structure(list(
  match_ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  season = c("2015-2016", "2015-2016", "2015-2016", "2015-2016", "2015-2016", "2015-2016", "2015-2016", "2015-2016", "2015-2016", "2015-2016"),
  simulation_ID = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L),
  home_team = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE),
  team = c("Manchester Utd", "Manchester Utd", "Manchester Utd", "Tottenham", "Tottenham", "Tottenham", "Leicester", "Leicester", "Leicester", "Sunderland"),
  match_result = c("Home win", "Draw", "Home win", "Home win", "Home win", "Away win", "Home win", "Home win", "Away win", "Draw"),
  team_points = c(3L, 1L, 3L, 0L, 0L, 3L, 3L, 3L, 0L, 1L)
), class = "data.frame", row.names = c(NA, -10L))
