Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
672 views
in Technique[技术] by (71.8m points)

datetime - How to add/subtract an array of varying Month timedeltas in Pandas?

I wish to subtract some number of months from a datetime column. Each row has a different number of months to subtract. For example,

df = pd.DataFrame({
    'timestamp': pd.date_range('2017-10-01', '2018-10-01', freq='m'),
    'delta_in_months': [1, 4, 2, 5, 1, 3, 1, 5, 2, 4, 1, 3]
})

The outcome should look as so (the day rounding irrelevant, it could be 01 or 28/29/30/31, it was easier to type in 01),

    timestamp   delta_in_months  new_timestamp
0   2017-10-31  1                2017-09-01
1   2017-11-30  4                2017-07-01
2   2017-12-31  2                2017-10-01
3   2018-01-31  5                2017-08-01
4   2018-02-28  1                2018-01-01
5   2018-03-31  3                2017-12-01
6   2018-04-30  1                2018-03-01
7   2018-05-31  5                2017-12-01
8   2018-06-30  2                2018-04-01
9   2018-07-31  4                2018-03-01
10  2018-08-31  1                2018-07-01
11  2018-09-30  3                2018-06-01

Bear in mind that this will be for a much larger dataframe.


I have tried,

months_delta = df.delta_in_months.apply(pd.tseries.offsets.MonthOffset)
df['new_timestamp'] = df.timestamp - months_delta

but this gave very unexpected results, with each row entry being a DatetimeIndex.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

If you are landing here because you were searching for a vectorized, fast and correct solution to the problem of adding a variable number of months to a Series of Timestamps, then read on.

In some problems, we do indeed want to add actual months (the way pd.offsets.DateOffset(months=x) works), i.e.: 2021-01-31 + 1 month --> 2021-02-28, and not just "30 days". But trying to use pd.offsets.DateOffset directly raises a warning (PerformanceWarning: Adding/subtracting object-dtype array to DatetimeArray not vectorized ). For example:

  • dates + df['months'].apply(lambda m: pd.offsets.DateOffset(months=m))
  • dates + months * pd.offsets.DateOffset(months=1), which additionally is wrong in some cases (e.g. 2015-07-29 + 59 months should be 2020-06-29, not 2020-06-28).

Instead, we can do a bit of arithmetic ourselves and get a vectorized solution:

# note: not timezone-aware
def vadd_months(dates, months):
    ddt = dates.dt
    m = ddt.month - 1 + months
    mb = pd.to_datetime(pd.DataFrame({
        'year': ddt.year + m // 12,
        'month': (m % 12) + 1,
        'day': 1})) + (dates - dates.dt.normalize())
    me = mb + pd.offsets.MonthEnd()
    r = mb + (ddt.day - 1) * pd.Timedelta(days=1)
    r = np.minimum(r, me)
    return r

Usage for the OP example

df['new_timestamp'] = vadd_months(df['timestamp'], df['delta_in_months'])

Speed

n = int(100_000)
df = pd.DataFrame({
    'timestamp': pd.Series(pd.to_datetime(np.random.randint(
            pd.Timestamp('2000').value,
            pd.Timestamp('2020').value,
            n
        ))).dt.floor('1s'),
    'months': np.random.randint(0, 120, n),
})

%%time
newts = vadd_months(df['timestamp'], df['months'])
# CPU times: user 52.3 ms, sys: 4.01 ms, total: 56.3 ms

Verification

Check with the (non-vectorized) direct use of pd.offsets.DateOffset:

import warnings

%%time
with warnings.catch_warnings():
    warnings.simplefilter(action='ignore', category=pd.errors.PerformanceWarning)
    check = df['timestamp'] + df['months'].apply(lambda m: pd.offsets.DateOffset(months=m))
# CPU times: user 2.41 s, sys: 43.9 ms, total: 2.45 s

>>> newts.equals(check)
True

Note that vadd_months is 40x faster than the non-vectorized version, and there are no warnings to catch.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...