I have two text files with the same number of columns but different numbers of rows, e.g.:
file1.txt
1650,A,2428057182945480,0.33446294,0.102967925,-0.3460815
1650,A,2428057232445480,0.086256325,0.719756,0.45393208
1650,A,2428057281945480,-0.04051014,1.1011207,1.0462191
1650,B,2428301534869292,-1.6426647,0.8912665,-4.4452224
1650,B,2428301584369292,-1.6128372,1.1938016,-3.1242943
1650,B,2428301633869292,-3.6656017,1.328025,-1.8204107
1650,B,2428301683369292,-6.0336843,2.2516093,-1.7117537
1650,B,2428301732869292,-3.2778456,-0.43924874,-1.3911091
file2.txt
1650,A,2428057133445480,-1.2798505,-5.187936,-2.3116016
1650,A,2428057182945480,-3.3029509,-6.8231754,-4.011485
1650,A,2428057232445480,-2.876783,-8.365042,-7.171831
1650,A,2428057281945480,-2.2542906,-8.5661545,-8.153454
1650,A,2428057331445480,-3.2646437,-10.953174,-8.826224
1650,B,2428301485369292,6.3887777,-0.42347443,0.82480246
1650,B,2428301534869292,8.522012,-16.99614,9.446322
As can be seen, the A and B groups have different numbers of rows in the two files. I want to combine the files using pandas, matching rows on the first three columns, so that the result is as follows:
1650,A,2428057133445480,NaN,NaN,NaN,-1.2798505,-5.187936,-2.3116016
1650,A,2428057182945480,0.33446294,0.102967925,-0.3460815,-3.3029509,-6.8231754,-4.011485
1650,A,2428057232445480,0.086256325,0.719756,0.45393208,-2.876783,-8.365042,-7.171831
1650,A,2428057281945480,-0.04051014,1.1011207,1.0462191,-2.2542906,-8.5661545,-8.153454
1650,A,2428057331445480,NaN,NaN,NaN,-3.2646437,-10.953174,-8.826224
1650,B,2428301485369292,NaN,NaN,NaN,6.3887777,-0.42347443,0.82480246
1650,B,2428301534869292,-1.6426647,0.8912665,-4.4452224,8.522012,-16.99614,9.446322
1650,B,2428301584369292,-1.6128372,1.1938016,-3.1242943,NaN,NaN,NaN
1650,B,2428301633869292,-3.6656017,1.328025,-1.8204107,NaN,NaN,NaN
1650,B,2428301683369292,-6.0336843,2.2516093,-1.7117537,NaN,NaN,NaN
1650,B,2428301732869292,-3.2778456,-0.43924874,-1.3911091,NaN,NaN,NaN
Based on my understanding, I can first load each file into a DataFrame and then concatenate them, like this:
import pandas as pd

df1 = read_data('file1.txt')
df2 = read_data('file2.txt')
pd.concat([df1,df2], ignore_index=True, axis=1)
Is this correct? If not, how can I solve this problem?
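For reference, pd.concat with axis=1 lines rows up by index position, not by the values in the first three columns, so it would likely not give the layout shown above. One possible approach is an outer merge on those three columns. The sketch below is only an illustration under assumptions: the column names (id, group, timestamp, x_1, ...) are invented because the files have no header row, read_data here is a hypothetical stand-in for the helper in the question, and the sample rows are inlined through io.StringIO in place of the real files.

```python
import io
import pandas as pd

# Sample rows copied from the question, inlined so the sketch runs on its own.
FILE1 = """\
1650,A,2428057182945480,0.33446294,0.102967925,-0.3460815
1650,A,2428057232445480,0.086256325,0.719756,0.45393208
1650,A,2428057281945480,-0.04051014,1.1011207,1.0462191
1650,B,2428301534869292,-1.6426647,0.8912665,-4.4452224
1650,B,2428301584369292,-1.6128372,1.1938016,-3.1242943
1650,B,2428301633869292,-3.6656017,1.328025,-1.8204107
1650,B,2428301683369292,-6.0336843,2.2516093,-1.7117537
1650,B,2428301732869292,-3.2778456,-0.43924874,-1.3911091
"""
FILE2 = """\
1650,A,2428057133445480,-1.2798505,-5.187936,-2.3116016
1650,A,2428057182945480,-3.3029509,-6.8231754,-4.011485
1650,A,2428057232445480,-2.876783,-8.365042,-7.171831
1650,A,2428057281945480,-2.2542906,-8.5661545,-8.153454
1650,A,2428057331445480,-3.2646437,-10.953174,-8.826224
1650,B,2428301485369292,6.3887777,-0.42347443,0.82480246
1650,B,2428301534869292,8.522012,-16.99614,9.446322
"""

# Assumed names for the three key columns (the files have no header).
KEYS = ["id", "group", "timestamp"]


def read_data(source, suffix):
    """Hypothetical loader: read a headerless CSV and label its columns."""
    cols = KEYS + [f"{axis}{suffix}" for axis in ("x", "y", "z")]
    return pd.read_csv(source, header=None, names=cols)


df1 = read_data(io.StringIO(FILE1), "_1")
df2 = read_data(io.StringIO(FILE2), "_2")

# Outer merge keeps every key found in either file; values that are
# missing on one side are filled with NaN.
merged = (
    df1.merge(df2, on=KEYS, how="outer")
       .sort_values(KEYS)
       .reset_index(drop=True)
)
print(merged.shape)  # (11, 9)
```

With real files, the io.StringIO(...) arguments would be replaced by the file paths. The file1 value columns come first and the file2 value columns second, because df1 is the left side of the merge.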
Additionally, how can I remove the rows containing NaN so that the result becomes:
1650,A,2428057182945480,0.33446294,0.102967925,-0.3460815,-3.3029509,-6.8231754,-4.011485
1650,A,2428057232445480,0.086256325,0.719756,0.45393208,-2.876783,-8.365042,-7.171831
1650,A,2428057281945480,-0.04051014,1.1011207,1.0462191,-2.2542906,-8.5661545,-8.153454
1650,B,2428301534869292,-1.6426647,0.8912665,-4.4452224,8.522012,-16.99614,9.446322
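The NaN-free result above could be obtained in at least two ways: calling dropna() on an outer-merged frame, or doing an inner merge directly so that only keys present in both files survive. The following is a minimal sketch under the same assumptions as before: the column names are invented, read_data is a hypothetical loader, and the question's sample rows are inlined instead of being read from disk.

```python
import io
import pandas as pd

# Sample rows copied from the question, inlined so the sketch runs on its own.
FILE1 = """\
1650,A,2428057182945480,0.33446294,0.102967925,-0.3460815
1650,A,2428057232445480,0.086256325,0.719756,0.45393208
1650,A,2428057281945480,-0.04051014,1.1011207,1.0462191
1650,B,2428301534869292,-1.6426647,0.8912665,-4.4452224
1650,B,2428301584369292,-1.6128372,1.1938016,-3.1242943
1650,B,2428301633869292,-3.6656017,1.328025,-1.8204107
1650,B,2428301683369292,-6.0336843,2.2516093,-1.7117537
1650,B,2428301732869292,-3.2778456,-0.43924874,-1.3911091
"""
FILE2 = """\
1650,A,2428057133445480,-1.2798505,-5.187936,-2.3116016
1650,A,2428057182945480,-3.3029509,-6.8231754,-4.011485
1650,A,2428057232445480,-2.876783,-8.365042,-7.171831
1650,A,2428057281945480,-2.2542906,-8.5661545,-8.153454
1650,A,2428057331445480,-3.2646437,-10.953174,-8.826224
1650,B,2428301485369292,6.3887777,-0.42347443,0.82480246
1650,B,2428301534869292,8.522012,-16.99614,9.446322
"""

# Assumed names for the three key columns (the files have no header).
KEYS = ["id", "group", "timestamp"]


def read_data(source, suffix):
    """Hypothetical loader: read a headerless CSV and label its columns."""
    cols = KEYS + [f"{axis}{suffix}" for axis in ("x", "y", "z")]
    return pd.read_csv(source, header=None, names=cols)


df1 = read_data(io.StringIO(FILE1), "_1")
df2 = read_data(io.StringIO(FILE2), "_2")

# Option 1: outer merge, then drop every row that contains a NaN.
outer = df1.merge(df2, on=KEYS, how="outer").sort_values(KEYS)
complete = outer.dropna().reset_index(drop=True)

# Option 2: inner merge keeps only keys that appear in both files.
inner = (
    df1.merge(df2, on=KEYS, how="inner")
       .sort_values(KEYS)
       .reset_index(drop=True)
)

print(complete.shape)  # (4, 9)
print(inner.shape)     # (4, 9)
```

One difference to keep in mind: dropna() also discards rows whose NaN was already present in an input file, whereas the inner merge only filters on whether the key exists in both files.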
question from:
https://stackoverflow.com/questions/66059240/concatenate-columns-with-different-lengths-based-on-specific-columns-in-pandas