Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
164 views
in Technique[技术] by (71.8m points)

r - Fill Dates Between Two Event Timestamps Formatted as POSIXct

I have a dataframe containing the start and end timestamps of events. I would like to create a sequence of timestamps to fill the dates between these events. Here's the structure of my data:

dat <- structure(list(event_id = 1:2,
                      start_time = structure(c(1617346800,1617348000),
                                             class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
                      end_time = structure(c(1617347400, 1617348300),
                                           class = c("POSIXct", "POSIXt"), tzone = "UTC")),
                 class = "data.frame", row.names = c(NA, -2L))

What I'm hoping to do is lengthen the data frame so there are as many rows as there are minutes between the two events. So you'd end up with something like this:

final <- structure(list(event_id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2L, 2L, 2L, 2L, 2L, 
2L), start_time = structure(c(1617346800, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1617348000, 
NA, NA, NA, NA, NA), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    end_time = structure(c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, 1617347400, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, 1617348300), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    record_time = c("2/4/2021 7:00", "2/4/2021 7:01", "2/4/2021 7:02", 
    "2/4/2021 7:03", "2/4/2021 7:04", "2/4/2021 7:05", "2/4/2021 7:06", 
    "2/4/2021 7:07", "2/4/2021 7:08", "2/4/2021 7:09", "2/4/2021 7:10", 
    "2/4/2021 7:11", "2/4/2021 7:12", "2/4/2021 7:13", "2/4/2021 7:14", 
    "2/4/2021 7:15", "2/4/2021 7:16", "2/4/2021 7:17", "2/4/2021 7:18", 
    "2/4/2021 7:19", "2/4/2021 7:20", "2/4/2021 7:21", "2/4/2021 7:22", 
    "2/4/2021 7:23", "2/4/2021 7:24", "2/4/2021 7:25")), class = "data.frame", row.names = c(NA, 
-26L))

So far what I've come up with is creating a dataframe of the full sequence:

timeline <- as.POSIXct(
                  seq.POSIXt(from = min(dat$start_time),
                       to = max(dat$end_time), by = "min"))

From here, I'm stuck as to how to join the two. Using a for loop I can make a list of sequences, though at that point, I'm still not sure how to get both these things together (also, I'm sure it could be done better with purrr but I'm not good with purrr yet).

event_timelines <- list()

for (row in 1:nrow(dat)) {
  event_timelines[[row]] <- seq.POSIXt(from = dat[row,]$start_time,
                                       to = dat[row,]$end_time,
                                       by = "min")
}

Thanks!

question from:https://stackoverflow.com/questions/66056236/fill-dates-between-two-event-timestamps-formatted-as-posixct

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Try this dplyr approach (with fuzzyjoin).

library(dplyr)

timerange <- range(unlist(dat[,c("start_time", "end_time")]))
attributes(timerange) <- attributes(dat$start_time)

tibble(record_time = seq(timerange[1], timerange[2], by = "min")) %>%
  fuzzyjoin::fuzzy_full_join(
    dat,
    by = c("record_time" = "start_time", "record_time" = "end_time"),
    match_fun = list(`>=`, `<=`)
  ) %>%
  group_by(event_id) %>%
  mutate(
    start_time = start_time[c(1, rep(NA, n()-1))],
    end_time = end_time[c(rep(NA, n()-1), 1)]
  ) %>%
  arrange(record_time) %>%
  ungroup() %>%
  print(n=99)
# # A tibble: 26 x 4
#    record_time         event_id start_time          end_time           
#    <dttm>                 <int> <dttm>              <dttm>             
#  1 2021-04-02 07:00:00        1 2021-04-02 07:00:00 NA                 
#  2 2021-04-02 07:01:00        1 NA                  NA                 
#  3 2021-04-02 07:02:00        1 NA                  NA                 
#  4 2021-04-02 07:03:00        1 NA                  NA                 
#  5 2021-04-02 07:04:00        1 NA                  NA                 
#  6 2021-04-02 07:05:00        1 NA                  NA                 
#  7 2021-04-02 07:06:00        1 NA                  NA                 
#  8 2021-04-02 07:07:00        1 NA                  NA                 
#  9 2021-04-02 07:08:00        1 NA                  NA                 
# 10 2021-04-02 07:09:00        1 NA                  NA                 
# 11 2021-04-02 07:10:00        1 NA                  2021-04-02 07:10:00
# 12 2021-04-02 07:11:00       NA NA                  NA                 
# 13 2021-04-02 07:12:00       NA NA                  NA                 
# 14 2021-04-02 07:13:00       NA NA                  NA                 
# 15 2021-04-02 07:14:00       NA NA                  NA                 
# 16 2021-04-02 07:15:00       NA NA                  NA                 
# 17 2021-04-02 07:16:00       NA NA                  NA                 
# 18 2021-04-02 07:17:00       NA NA                  NA                 
# 19 2021-04-02 07:18:00       NA NA                  NA                 
# 20 2021-04-02 07:19:00       NA NA                  NA                 
# 21 2021-04-02 07:20:00        2 2021-04-02 07:20:00 NA                 
# 22 2021-04-02 07:21:00        2 NA                  NA                 
# 23 2021-04-02 07:22:00        2 NA                  NA                 
# 24 2021-04-02 07:23:00        2 NA                  NA                 
# 25 2021-04-02 07:24:00        2 NA                  NA                 
# 26 2021-04-02 07:25:00        2 NA                  2021-04-02 07:25:00

The reason I chose to use fuzzyjoin is because POSIXt are floating-point numeric-like with some attributes, and sometimes (due to general computing challenges, see Why are these numbers not equal?, Is floating point math broken?, and https://en.wikipedia.org/wiki/IEEE_754), "floating-point equality" is not always guaranteed. In this case, fuzzyjoin is doing a range-based join, effectively is record_time between start_time and end_time.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...