There are two ways to define "bins" for a number range such that all provided numbers are within one of the bins:
- find the minimum, find the maximum, and since `Date`-bins are generally `right=FALSE` (meaning right-open), bump the maximum out a little; or
- find the minimum, and don't find the maximum, instead use `Inf` so that the last bin always contains the maximum values.
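To make the two options concrete, here is a minimal sketch with plain numbers rather than dates. The vector `x` and the break values are made up for illustration only.

```r
# Hypothetical data: the largest value sits exactly on what would be the last edge.
x <- c(1, 4, 7, 10)

# Option 1: find the maximum and bump it out a little, so that the
# right-open last bin [7, 11) still contains max(x).
bins_bumped <- cut(x, breaks = c(1, 4, 7, max(x) + 1), right = FALSE)

# Option 2: don't find the maximum at all; close the range with Inf.
bins_inf <- cut(x, breaks = c(1, 4, 7, Inf), right = FALSE)

# Either way, no value falls outside the bins (no NA).
table(bins_bumped, useNA = "ifany")
table(bins_inf, useNA = "ifany")
```

Without the bump (that is, with `max(x)` itself as the last edge and `right = FALSE`), the value 10 would land outside every bin and come back as `NA`.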
`cut.Date` chose the first of the two. Further, instead of "bump out from the maximum by 1 day", it chose to "bump out by 'step'". This means that when you say `"2 months"`, it will ensure that the next bin "edge" is 2 months from the penultimate boundary.
Namely, if you look at the source for `cut.Date`:
start <- as.POSIXlt(min(x, na.rm = TRUE))
# ...
end <- as.POSIXlt(max(x, na.rm = TRUE))
# and then if 'months', then
end <- as.POSIXlt(end + (31 * step * 86400))
# and eventually
breaks <- as.Date(seq(start, end, breaks))
So I'll `debug(cut.Date)` and take a look at `cut(dates, "2 months")`:
start
# [1] "2021-01-01 UTC"
# debug: end <- as.POSIXlt(max(x, na.rm = TRUE))
# debug: step <- if (length(by2) == 2L) as.integer(by2[1L]) else 1L
end
# [1] "2021-12-31 UTC"
step
# [1] 2
# debug: as.integer(by2[1L])
# debug: end <- as.POSIXlt(end + (31 * step * 86400))
end
# [1] "2022-03-03 UTC"
# debug: end$mday <- 1L
# debug: end$isdst <- -1L
# debug: breaks <- as.Date(seq(start, end, breaks))
breaks
# [1] "2021-01-01" "2021-03-01" "2021-05-01" "2021-07-01" "2021-09-01" "2021-11-01" "2022-01-01"
# [8] "2022-03-01"
It then eventually does `breaks[-length(breaks)]`, which explains why we don't see eight. My guess is that there are corner cases (leap years, perhaps?) where `31 * step * 86400` (or the equivalent for other `by`-units) does not always align perfectly, so they buffered it a little.
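The same arithmetic can be replayed outside the debugger. This is only a sketch of the steps quoted above, not the full `cut.Date` source, and it assumes `dates` is every day of 2021 (my guess, based on the `start` and `end` values in the debug session).

```r
# Assumption: `dates` spans all of 2021; the question's actual data may differ.
dates <- seq(as.Date("2021-01-01"), as.Date("2021-12-31"), by = "day")

step  <- 2L
start <- as.POSIXlt(min(dates, na.rm = TRUE))
end   <- as.POSIXlt(max(dates, na.rm = TRUE))

# Overshoot the maximum by 31 days for each month of the step,
# then snap back to the first day of whatever month that lands in.
end <- as.POSIXlt(end + (31 * step * 86400))
end$mday  <- 1L
end$isdst <- -1L

breaks <- as.Date(seq(start, end, "2 months"))
length(breaks)            # eight edges, matching the debug output
breaks[-length(breaks)]   # last edge dropped, leaving seven
```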
Long story short (too late), I suggest you use `labels=FALSE` instead.
sextiles <- cut(dates, "2 months", labels = FALSE)
table(sextiles)
# sextiles
# 1 2 3 4 5 6
# 59 61 61 62 61 61
If you want them to be integer-looking `factor`s (which are string levels with true integers underneath), then perhaps
sextiles <- factor(sextiles)
head(sextiles)
# [1] 1 1 1 1 1 1
# Levels: 1 2 3 4 5 6
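If you would rather keep the date labels than switch to integers, another option (my suggestion, not something `cut.Date` does for you) is to let it build its levels and then discard whichever ones ended up empty with `droplevels()`. Again a sketch, assuming `dates` is every day of 2021:

```r
# Assumption: `dates` spans all of 2021, consistent with the counts shown above.
dates <- seq(as.Date("2021-01-01"), as.Date("2021-12-31"), by = "day")

# Default labels are the left edge of each bin; droplevels() removes
# any level that has no observations in it.
bimonthly <- droplevels(cut(dates, "2 months"))

levels(bimonthly)
table(bimonthly)
```

The counts per bin should be the same as in the `labels = FALSE` version; only the level names differ.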