Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
193 views
in Technique[技术] by (71.8m points)

r - Function that sums previous N observations

I would like to find an efficient way to sum the number of points that a team received in the previous N matches (not including the points in the current match) for every combination of simulation_ID, team and season. If a team played less than N matches so far, the function should return NA.

A simplified data set:

match_ID season simulation_ID home_team team match_result team_points
1 2015-2016 1 TRUE Manchester Utd Home win 3
1 2015-2016 2 TRUE Manchester Utd Draw 1
1 2015-2016 3 TRUE Manchester Utd Home win 3
1 2015-2016 1 FALSE Tottenham Home win 0
1 2015-2016 2 FALSE Tottenham Home win 0
1 2015-2016 3 FALSE Tottenham Away win 3
2 2015-2016 1 TRUE Leicester Home win 3
2 2015-2016 2 TRUE Leicester Home win 3
2 2015-2016 3 TRUE Leicester Away win 0
2 2015-2016 1 FALSE Sunderland Draw 1
question from:https://stackoverflow.com/questions/65890463/function-that-sums-previous-n-observations

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

You need two operations here: grouping (I'll demonstrate with dplyr) and a rolling-window summation (I'll use zoo).

Also, your data is a little too sparse to be very awesome grouping by all three of team, season, simulation_ID, so I'll just show grouping by team this time. For your larger data, replace group_by(team) with group_by(team, season, simulation_ID).

library(dplyr)
k <- 3 # window size
dat %>%
  group_by(team) %>%
  mutate(accumulated_team_points = zoo::rollapply(team_points, k, FUN = sum, align = "right", fill = NA)) %>%
  ungroup()
# # A tibble: 10 x 8
#    match_ID season    simulation_ID home_team team           match_result team_points accumulated_team_points
#       <int> <chr>             <int> <lgl>     <chr>          <chr>              <int>                   <int>
#  1        1 2015-2016             1 TRUE      Manchester Utd Home win               3                      NA
#  2        1 2015-2016             2 TRUE      Manchester Utd Draw                   1                      NA
#  3        1 2015-2016             3 TRUE      Manchester Utd Home win               3                       7
#  4        1 2015-2016             1 FALSE     Tottenham      Home win               0                      NA
#  5        1 2015-2016             2 FALSE     Tottenham      Home win               0                      NA
#  6        1 2015-2016             3 FALSE     Tottenham      Away win               3                       3
#  7        2 2015-2016             1 TRUE      Leicester      Home win               3                      NA
#  8        2 2015-2016             2 TRUE      Leicester      Home win               3                      NA
#  9        2 2015-2016             3 TRUE      Leicester      Away win               0                       6
# 10        2 2015-2016             1 FALSE     Sunderland     Draw                   1                      NA     

That defaults to NA for the first k-1 instances in a window, which is usually a sane and defensible default. If, however, you want to sum even partial sums, then replace fill=NA with partial=TRUE:

dat %>%
  group_by(team) %>%
  mutate(accumulated_team_points = zoo::rollapply(team_points, k, FUN = sum, align = "right", partial = TRUE)) %>%
  ungroup()
# # A tibble: 10 x 8
#    match_ID season    simulation_ID home_team team           match_result team_points accumulated_team_points
#       <int> <chr>             <int> <lgl>     <chr>          <chr>              <int>                   <int>
#  1        1 2015-2016             1 TRUE      Manchester Utd Home win               3                       3
#  2        1 2015-2016             2 TRUE      Manchester Utd Draw                   1                       4
#  3        1 2015-2016             3 TRUE      Manchester Utd Home win               3                       7
#  4        1 2015-2016             1 FALSE     Tottenham      Home win               0                       0
#  5        1 2015-2016             2 FALSE     Tottenham      Home win               0                       0
#  6        1 2015-2016             3 FALSE     Tottenham      Away win               3                       3
#  7        2 2015-2016             1 TRUE      Leicester      Home win               3                       3
#  8        2 2015-2016             2 TRUE      Leicester      Home win               3                       6
#  9        2 2015-2016             3 TRUE      Leicester      Away win               0                       6
# 10        2 2015-2016             1 FALSE     Sunderland     Draw                   1                       1

FYI: I'm assuming that the data is pre-ordered.


Data

dat <- structure(list(match_ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), season = c("2015-2016", "2015-2016", "2015-2016", "2015-2016", "2015-2016", "2015-2016", "2015-2016", "2015-2016", "2015-2016", "2015-2016"), simulation_ID = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L), home_team = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE), team = c("Manchester Utd", "Manchester Utd", "Manchester Utd", "Tottenham", "Tottenham", "Tottenham", "Leicester", "Leicester", "Leicester", "Sunderland"), match_result = c("Home win", "Draw", "Home win", "Home win", "Home win", "Away win", "Home win", "Home win", "Away win", "Draw"), team_points = c(3L, 1L, 3L, 0L, 0L, 3L, 3L, 3L, 0L, 1L)), class = "data.frame", row.names = c(NA, -10L))

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...