Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Concatenate columns with different lengths based on specific columns in Pandas

I have two .txt files that contain the same number of columns but different numbers of rows:

file1.txt

1650,A,2428057182945480,0.33446294,0.102967925,-0.3460815
1650,A,2428057232445480,0.086256325,0.719756,0.45393208
1650,A,2428057281945480,-0.04051014,1.1011207,1.0462191
1650,B,2428301534869292,-1.6426647,0.8912665,-4.4452224
1650,B,2428301584369292,-1.6128372,1.1938016,-3.1242943
1650,B,2428301633869292,-3.6656017,1.328025,-1.8204107
1650,B,2428301683369292,-6.0336843,2.2516093,-1.7117537
1650,B,2428301732869292,-3.2778456,-0.43924874,-1.3911091

file2.txt

1650,A,2428057133445480,-1.2798505,-5.187936,-2.3116016
1650,A,2428057182945480,-3.3029509,-6.8231754,-4.011485
1650,A,2428057232445480,-2.876783,-8.365042,-7.171831
1650,A,2428057281945480,-2.2542906,-8.5661545,-8.153454
1650,A,2428057331445480,-3.2646437,-10.953174,-8.826224
1650,B,2428301485369292,6.3887777,-0.42347443,0.82480246
1650,B,2428301534869292,8.522012,-16.99614,9.446322

As you can see, the two files have different numbers of rows for A and for B. I want to combine them with pandas, matching rows on the first three columns, so that the result is as follows:

1650,A,2428057133445480,NaN,NaN,NaN,-1.2798505,-5.187936,-2.3116016
1650,A,2428057182945480,0.33446294,0.102967925,-0.3460815,-3.3029509,-6.8231754,-4.011485
1650,A,2428057232445480,0.086256325,0.719756,0.45393208,-2.876783,-8.365042,-7.171831
1650,A,2428057281945480,-0.04051014,1.1011207,1.0462191,-2.2542906,-8.5661545,-8.153454
1650,A,2428057331445480,NaN,NaN,NaN,-3.2646437,-10.953174,-8.826224
1650,B,2428301485369292,NaN,NaN,NaN,6.3887777,-0.42347443,0.82480246
1650,B,2428301534869292,-1.6426647,0.8912665,-4.4452224,8.522012,-16.99614,9.446322
1650,B,2428301584369292,-1.6128372,1.1938016,-3.1242943,NaN,NaN,NaN
1650,B,2428301633869292,-3.6656017,1.328025,-1.8204107,NaN,NaN,NaN
1650,B,2428301683369292,-6.0336843,2.2516093,-1.7117537,NaN,NaN,NaN
1650,B,2428301732869292,-3.2778456,-0.43924874,-1.3911091,NaN,NaN,NaN

Based on my understanding, I can first load both files into DataFrames and then concatenate them like this:

import pandas as pd

df1 = pd.read_csv('file1.txt', header=None)  # the files have no header row
df2 = pd.read_csv('file2.txt', header=None)
pd.concat([df1, df2], ignore_index=True, axis=1)
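To see how this call lines the two files up, here is a minimal sketch with two tiny hypothetical frames (`left`, `right` and their columns are made up for illustration). It suggests that `pd.concat(axis=1)` pairs rows by their index, while `merge` pairs them by the values in a key column.

```python
import pandas as pd

# Two tiny hypothetical frames whose key column "c" only partly overlaps
left = pd.DataFrame({"c": [2, 3], "x": [0.2, 0.3]})
right = pd.DataFrame({"c": [1, 2, 3], "y": [1.0, 2.0, 3.0]})

# concat(axis=1) aligns on the row index (0, 1, 2), not on "c";
# ignore_index=True only renumbers the columns to 0..3
side_by_side = pd.concat([left, right], axis=1, ignore_index=True)
print(side_by_side)
# Row 0 pairs left c=2 with right c=1, so unrelated records end up side by side

# merge aligns on the key values instead (sort=True fixes the row order)
merged = left.merge(right, how="outer", on="c", sort=True)
print(merged)
```

In the merged frame the row for `c=1` only exists in `right`, so its `x` value is NaN, which is the behaviour the expected output above asks for.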

Is this correct? If not, how can I solve this problem?

Additionally, how can I remove the rows containing NaN so that the result becomes:

1650,A,2428057182945480,0.33446294,0.102967925,-0.3460815,-3.3029509,-6.8231754,-4.011485
1650,A,2428057232445480,0.086256325,0.719756,0.45393208,-2.876783,-8.365042,-7.171831
1650,A,2428057281945480,-0.04051014,1.1011207,1.0462191,-2.2542906,-8.5661545,-8.153454
1650,B,2428301534869292,-1.6426647,0.8912665,-4.4452224,8.522012,-16.99614,9.446322

Question from: https://stackoverflow.com/questions/66059240/concatenate-columns-with-different-lengths-based-on-specific-columns-in-pandas


1 Answer

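Your pd.concat(..., axis=1) call is not what you want: it places the two frames side by side by row index (here, simply the row position), not by the values in the first three columns, so unrelated rows get paired up. Merge on those key columns instead. I don't know the column names, so I am just using dummy ones: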

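import pandas as pd

# The files have no header row, so give the columns dummy names
df1 = pd.read_csv('file1.txt', header=None, names=list('abcdef'))  # first file: columns a-f
df2 = pd.read_csv('file2.txt', header=None, names=list('abcghi'))  # second file: a, b, c, then g-i

df1.merge(df2, how='outer', on=['a', 'b', 'c'])           # all rows, NaN where a file has no match
df1.merge(df2, how='outer', on=['a', 'b', 'c']).dropna()  # only rows present in both files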
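Below is a self-contained sketch of the same merge approach. As assumptions, it inlines the question's data through `io.StringIO` instead of reading `file1.txt` and `file2.txt`, and it uses the dummy column names `a` to `i` rather than any real headers. It passes `sort=True` so the row order is deterministic across pandas versions, and it compares outer merge plus `dropna()` with an inner merge for this data.

```python
import io

import pandas as pd

# The question's two files, inlined so the sketch runs on its own
FILE1 = """\
1650,A,2428057182945480,0.33446294,0.102967925,-0.3460815
1650,A,2428057232445480,0.086256325,0.719756,0.45393208
1650,A,2428057281945480,-0.04051014,1.1011207,1.0462191
1650,B,2428301534869292,-1.6426647,0.8912665,-4.4452224
1650,B,2428301584369292,-1.6128372,1.1938016,-3.1242943
1650,B,2428301633869292,-3.6656017,1.328025,-1.8204107
1650,B,2428301683369292,-6.0336843,2.2516093,-1.7117537
1650,B,2428301732869292,-3.2778456,-0.43924874,-1.3911091
"""
FILE2 = """\
1650,A,2428057133445480,-1.2798505,-5.187936,-2.3116016
1650,A,2428057182945480,-3.3029509,-6.8231754,-4.011485
1650,A,2428057232445480,-2.876783,-8.365042,-7.171831
1650,A,2428057281945480,-2.2542906,-8.5661545,-8.153454
1650,A,2428057331445480,-3.2646437,-10.953174,-8.826224
1650,B,2428301485369292,6.3887777,-0.42347443,0.82480246
1650,B,2428301534869292,8.522012,-16.99614,9.446322
"""

# Dummy column names (assumption): a, b, c are the shared key columns
df1 = pd.read_csv(io.StringIO(FILE1), header=None, names=list("abcdef"))
df2 = pd.read_csv(io.StringIO(FILE2), header=None, names=list("abcghi"))
keys = ["a", "b", "c"]

# Full outer join: every key from either file, NaN where a file has no row
full = df1.merge(df2, how="outer", on=keys, sort=True)

# Rows present in both files, computed two ways
both_dropna = full.dropna().reset_index(drop=True)
both_inner = df1.merge(df2, how="inner", on=keys, sort=True)

# Write in the question's format: no header, no index, NaN spelled out
print(full.to_csv(header=False, index=False, na_rep="NaN"))
print(both_inner.to_csv(header=False, index=False))
```

Both filtered frames contain the same four rows here. If a file could contain NaN in its own value columns, `dropna()` would discard those rows too, whereas `how="inner"` only checks that the key exists in both files, so the inner merge is arguably the safer way to get the second output.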
