Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

dataframe - fill in NA by outcome of formula between previous and following non-NA values in R

I have the following dataframe:

day <- c(1,2,3,4,5,6,7,8,9, 10, 11)
totalItems <- c(700, NA, 32013, NA, NA, NA, 39599, NA, NA, NA, 107542)
df <- data.frame(day, totalItems)

I need to create another column in which each NA is replaced by the outcome of a formula: (following available non-NA value - previous available non-NA value) / (row number of the following non-NA value - row number of the previous non-NA value). The following non-NA row gets the same result, so the final dataframe looks like this (values rounded):

day <- c(1,2,3,4,5,6,7,8,9, 10, 11)
totalItems <- c(700, NA, 32013, NA, NA, NA, 39599, NA, NA, NA, 107542)
estimatedDaily <- c(700, 15656, 15656, 1897, 1897, 1897, 1897, 16986, 16986, 16986, 16986)
df.new <- data.frame(day, totalItems, estimatedDaily)
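
To make the formula concrete, here is a minimal hand calculation for one gap, the one between row 3 (32013) and row 7 (39599). The variable names are only for illustration and are not part of the dataframe.

```r
# One gap from the example: previous non-NA at row 3, following non-NA at row 7.
prev_row <- 3
next_row <- 7
prev_val <- 32013
next_val <- 39599

# (following value - previous value) / (following row - previous row)
rate <- (next_val - prev_val) / (next_row - prev_row)
rate  # 1896.5, shown rounded as 1897 in estimatedDaily for rows 4 to 7
```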

I tried tidyr::replace_na(), but I couldn't figure out how to write the formula so that it identifies the previous and the following available non-NA values. Many thanks in advance for your help.

Question from: https://stackoverflow.com/questions/65650171/fill-in-na-by-outcome-of-formula-between-previous-and-following-non-na-values-in


1 Answer

You can create groups in your data based on the presence of NA values, so that each group is a run of NA rows followed by the non-NA row that ends it.

library(dplyr)

df1 <- df %>% mutate(group = cumsum(lag(!is.na(totalItems), default = TRUE)))
df1

#   day totalItems group
#1    1        700     1
#2    2         NA     2
#3    3      32013     2
#4    4         NA     3
#5    5         NA     3
#6    6         NA     3
#7    7      39599     3
#8    8         NA     4
#9    9         NA     4
#10  10         NA     4
#11  11     107542     4
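
If the cumsum/lag step looks opaque, the same group ids can be traced in base R. This is only a sketch to show the intermediate vectors; has_value and prev_has_value are names I made up for the illustration.

```r
totalItems <- c(700, NA, 32013, NA, NA, NA, 39599, NA, NA, NA, 107542)

# TRUE where a row holds a value.
has_value <- !is.na(totalItems)

# Shift down by one row: "did the previous row hold a value?"
# The first row has no predecessor, so it is treated as TRUE (like default = TRUE in lag()).
prev_has_value <- c(TRUE, head(has_value, -1))

# A new group starts right after every non-NA row.
group <- cumsum(prev_has_value)
group  # 1 2 2 3 3 3 3 4 4 4 4
```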

Keep only the last row of each group (the one that holds a value), apply the formula across those rows, and join the result back to df1 to get the original number of rows.

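df1 %>%
  group_by(group) %>%
  # keep the last row of each group, i.e. the non-NA row
  slice(n()) %>%
  ungroup() %>%
  # difference in value divided by difference in day between consecutive non-NA rows;
  # default = 0 makes the first row keep its own value (700 / 1)
  transmute(group, estimatedDaily = (totalItems - lag(totalItems, default = 0))/
                                    (day - lag(day, default = 0))) %>%
  # bring back all rows, repeating each estimate across its group
  left_join(df1, by = 'group') %>%
  select(-group)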

#  estimatedDaily   day totalItems
#            <dbl> <dbl>      <dbl>
# 1           700      1        700
# 2         15656.     2         NA
# 3         15656.     3      32013
# 4          1896.     4         NA
# 5          1896.     5         NA
# 6          1896.     6         NA
# 7          1896.     7      39599
# 8         16986.     8         NA
# 9         16986.     9         NA
#10         16986.    10         NA
#11         16986.    11     107542
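
For comparison, here is a rough base R sketch of the same idea without dplyr. It is not the approach above, just an alternative. It uses row numbers instead of the day column (they coincide in this data) and assumes the last row is non-NA; known and rates are illustrative names.

```r
day <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
totalItems <- c(700, NA, 32013, NA, NA, NA, 39599, NA, NA, NA, 107542)
df <- data.frame(day, totalItems)

# Row numbers of the non-NA values: 1, 3, 7, 11.
known <- which(!is.na(df$totalItems))

# Prepend an origin (row 0, value 0) so the first row keeps its own value,
# mirroring default = 0 in the dplyr version.
gap_len <- diff(c(0, known))
rates <- diff(c(0, df$totalItems[known])) / gap_len

# Repeat each rate over the rows of its gap; every gap ends at a non-NA row.
# Assumption: the last row is non-NA, otherwise the lengths will not match.
df$estimatedDaily <- rep(rates, times = gap_len)
df
```

The column order here is day, totalItems, estimatedDaily, as in the question. The values are left unrounded (15656.5, 1896.5, 16985.75).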
