How to subset rows based on date overlap range efficiently using python pandas?

Question

Welcome To Ask or Share your Answers For Others

How to subset rows based on date overlap range efficiently using python pandas?

asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

How to subset rows based on date overlap range efficiently using python pandas?

My data frame has two date type columns: start and end (yyyy-mm-dd).

Here's my data frame:

import pandas as pd
import datetime 

data=[["2016-10-17","2017-03-08"],["2014-08-17","2016-09-08"],["2014-01-01","2015-01-01"],["2017-12-20","2019-01-01"]]

df=pd.DataFrame(data,columns=['start','end'])
df['start'] =  pd.to_datetime(df['start'], format='%Y-%m-%d')
df['end'] =  pd.to_datetime(df['end'], format='%Y-%m-%d')

       start        end
0 2016-10-17 2017-03-08
1 2014-08-17 2016-09-08
2 2014-01-01 2015-01-01
3 2017-12-20 2019-01-01

And I have reference start and end date as following.

ref_start=datetime.date(2015, 9, 20)
ref_end=datetime.date(2017,1,31)
print(ref_start,ref_end)
2015-09-20 2017-01-31

I would like to subset rows if the start and end date range of a row overlaps with reference start and end date. The third and the fourth rows are not selected since the start and end date range does not overlap with reference date range (2015-09-20 ~ 2017-01-31)

So my desired outcome looks like this:

        start         end
0  2016-10-17  2017-03-08
1  2014-08-17  2016-09-08

To do that, I was thinking about using the following codes based on this: Efficient date range overlap calculation in python?

df[(max(df['start'],ref_start)>min(df['end'],ref_end))]

However, it doesn't work. Is there any way to get the desired outcome efficiently?

question from:https://stackoverflow.com/questions/65643401/how-to-subset-rows-based-on-date-overlap-range-efficiently-using-python-pandas

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-06T18:46:36+0000

A trick I learned early on in my career is what I call "crossing the dates":

# pd.Timestamp can do everything that datetime/date does and some more
ref_start = pd.Timestamp(2015, 9, 20)
ref_end = pd.Timestamp(2017,1,31)

# Compare the start of one range to the end of another and vice-versa
# Made into a separate variable for reability
cond = (ref_start <= df['end']) & (ref_end >= df['start'])
df[cond]

Categories

How to subset rows based on date overlap range efficiently using python pandas?

How to subset rows based on date overlap range efficiently using python pandas?

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags