Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

python - Pandas: remove a row when a particular kind of value appears in a column

I have a DataFrame like this:

         UNIT  EXITSn_hourly           Interval
1867     R081            104  00:00:00-04:00:00
1868     R081              0  04:00:00-04:00:00
1869     R081            129  04:00:00-08:00:00
1870     R081            521  08:00:00-12:00:00
1871     R081           1048  12:00:00-16:00:00
2838     R032             38  00:00:00-04:00:00
2839     R032              0  04:00:00-04:00:00
2840     R032             89  04:00:00-08:00:00
2841     R032            470  08:00:00-12:00:00

I need to delete the entire row when Interval has this particular format:

1868     R081              0  04:00:00-04:00:00

I want to remove not only 04:00:00-04:00:00 but also similar values where the start and end times are the same, such as

01:00:00-01:00:00
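The pattern described above (identical start and end times) can be expressed as a regular expression with a backreference. This is a minimal sketch on hypothetical toy data, assuming Interval holds strings of the form HH:MM:SS-HH:MM:SS with no missing values (`str.match` would return NaN for non-string entries):

```python
import pandas as pd

# Toy data modelled on the question (hypothetical values, same column names)
df = pd.DataFrame({
    'UNIT': ['R081', 'R081', 'R081', 'R032'],
    'EXITSn_hourly': [104, 0, 129, 0],
    'Interval': ['00:00:00-04:00:00', '04:00:00-04:00:00',
                 '04:00:00-08:00:00', '01:00:00-01:00:00'],
})

# \1 is a backreference: the pattern only matches when the end time
# repeats the captured start time exactly
zero_length = df['Interval'].str.match(r'^(\d{2}:\d{2}:\d{2})-\1$')
df = df[~zero_length]
print(df)
```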

For reference, this is my original DataFrame:

    C/A  UNIT       SCP     DATEn     TIMEn    DESCn  ENTRIESn   EXITSn
0  A002  R051  02-00-00  06-29-13  00:00:00  REGULAR   4174592  1433672
1  A002  R051  02-00-00  06-29-13  04:00:00  REGULAR   4174628  1433675
2  A002  R051  02-00-00  06-29-13  08:00:00  REGULAR   4174641  1433706
3  A002  R051  02-00-00  06-29-13  12:00:00  REGULAR   4174741  1433775
4  A002  R051  02-00-00  06-29-13  16:00:00  REGULAR   4174936  1433826
5  A002  R051  02-00-00  06-29-13  20:00:00  REGULAR   4175270  1433877
6  A002  R051  02-00-00  06-30-13  00:00:00  REGULAR   4175403  1433908
7  A002  R051  02-00-00  06-30-13  04:00:00  REGULAR   4175441  1433914
8  A002  R051  02-00-00  06-30-13  08:00:00  REGULAR   4175457  1433928
9  A002  R051  02-00-00  06-30-13  12:00:00  REGULAR   4175520  1433981

I created the Interval column from it using this code:

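# turnstile_data is the original DataFrame shown above
df = turnstile_data.copy()   # DataFrame.copy() is a deep copy by default
pdf = df.shift(periods=1)    # previous reading, aligned with each row

# Change in the counters since the previous reading
df['ENTRIESn_hourly'] = df['ENTRIESn'] - pdf['ENTRIESn'].fillna(0)
df['EXITSn_hourly'] = df['EXITSn'] - pdf['EXITSn'].fillna(0)
# Label each row "previous TIMEn-current TIMEn"
df['Interval'] = pdf['TIMEn'] + '-' + df['TIMEn']
df.loc[df['ENTRIESn'] == 0, 'ENTRIESn_hourly'] = 0
df.loc[df['EXITSn'] == 0, 'EXITSn_hourly'] = 0
# The first reading of each turnstile (C/A, UNIT, SCP) has no predecessor, so mark it with 0
new_turnstile = (df['C/A'] != pdf['C/A']) | (df['UNIT'] != pdf['UNIT']) | (df['SCP'] != pdf['SCP'])
df.loc[new_turnstile, ['ENTRIESn_hourly', 'EXITSn_hourly', 'Interval']] = 0

df = df[df.Interval != 0]   # drop those first readings
print(df.head(20))

head7 = df.copy()
required_df = head7[['UNIT', 'EXITSn_hourly', 'Interval']].groupby(head7.UNIT)
print(required_df.head(5))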
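Instead of filtering afterwards, the zero-length intervals could also be dropped while the column is built, by comparing the shifted TIMEn with the current one. This is a minimal sketch on hypothetical toy readings, assuming the same column names as above; it is an alternative, not the code used in the question:

```python
import pandas as pd

# Hypothetical readings for a single turnstile (same columns as the question)
readings = pd.DataFrame({
    'C/A': ['A002'] * 4,
    'UNIT': ['R081'] * 4,
    'SCP': ['02-00-00'] * 4,
    'TIMEn': ['00:00:00', '04:00:00', '04:00:00', '08:00:00'],
    'EXITSn': [100, 204, 204, 333],
})

prev = readings.shift(periods=1)  # previous reading for each row
readings['EXITSn_hourly'] = readings['EXITSn'] - prev['EXITSn']
readings['Interval'] = prev['TIMEn'] + '-' + readings['TIMEn']

# Keep rows that continue the same turnstile (the first row compares
# against NaN and is dropped) and whose start and end times differ
same_turnstile = ((readings['C/A'] == prev['C/A'])
                  & (readings['UNIT'] == prev['UNIT'])
                  & (readings['SCP'] == prev['SCP']))
nonzero_interval = prev['TIMEn'] != readings['TIMEn']
result = readings[same_turnstile & nonzero_interval]
print(result[['UNIT', 'EXITSn_hourly', 'Interval']])
```

Because the comparison happens before any 0 placeholders are written into Interval, the column stays purely string-valued (or NaN), which avoids mixing strings and integers in one column.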

1 Answer


You probably want to split Interval into Interval_start and Interval_end columns and keep only the rows where they differ:

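# Split "HH:MM:SS-HH:MM:SS" into its start and end parts
df['Interval_start'] = df['Interval'].map(lambda s: s.split('-')[0])
df['Interval_end'] = df['Interval'].map(lambda s: s.split('-')[1])
# query() returns a new DataFrame, so assign the result to keep it
df = df.query("Interval_start != Interval_end")
print(df)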

      UNIT  EXITSn_hourly           Interval Interval_start Interval_end
1867  R081            104  00:00:00-04:00:00       00:00:00     04:00:00
1869  R081            129  04:00:00-08:00:00       04:00:00     08:00:00
1870  R081            521  08:00:00-12:00:00       08:00:00     12:00:00
1871  R081           1048  12:00:00-16:00:00       12:00:00     16:00:00
2838  R032             38  00:00:00-04:00:00       00:00:00     04:00:00
2840  R032             89  04:00:00-08:00:00       04:00:00     08:00:00
2841  R032            470  08:00:00-12:00:00       08:00:00     12:00:00
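If the helper columns are not needed afterwards, a variant (a sketch, not the answerer's code) splits once with `Series.str.split(..., expand=True)` and compares the two parts directly, leaving the DataFrame's columns unchanged. The toy data below is hypothetical and assumes Interval contains exactly one '-':

```python
import pandas as pd

# Hypothetical subset of the question's data, with its original index
df = pd.DataFrame({
    'UNIT': ['R081', 'R081', 'R081', 'R032', 'R032'],
    'EXITSn_hourly': [104, 0, 129, 38, 0],
    'Interval': ['00:00:00-04:00:00', '04:00:00-04:00:00', '04:00:00-08:00:00',
                 '00:00:00-04:00:00', '04:00:00-04:00:00'],
}, index=[1867, 1868, 1869, 2838, 2839])

# expand=True returns a DataFrame: column 0 is the start, column 1 the end
parts = df['Interval'].str.split('-', n=1, expand=True)
# Keep only rows whose start and end differ; no helper columns are added to df
df = df[parts[0] != parts[1]]
print(df)
```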
