Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - How to split information of multiple rows into columns?

I have a ".csv" file with multiple rows. The information is set like this:

GS3;724330300294409;50;BRABT;00147;44504942;01;669063000;25600;0
GS3;724330300294409;50;BRABT;00147;44504943;01;669063000;25600;0
GS3;724330300294409;50;BRABT;00147;44504944;01;669063000;25600;00004

The data already arrives as rows (each file has almost 300,000 rows). I'm sending this data to Kafka, but I need each line split into columns. For example:

Column1 Column2         Column3 Column4 Column5 Column6  Column7 Column8    Column9 Column10
GS3     724330300294409 50      BRABT   00147   44504942 01      669063000  25600   0
GS3     724330300294409 50      BRABT   00147   44504943 01      669063000  25600   0
GS3     724330300294409 50      BRABT   00147   44504944 01      669063000  25600   00004

I know the size for each value. For example:

3 (GS3)
15 (724330300294409)
2 (50)
5 (BRABT)
5 (00147)
8 (44504943)
2 (01)
10 (669063000)
5 (25600)
5 (0    )

I'm trying to do this through ksql on my Kafka platform, but I'm struggling. I'm new to Python, but it seems like an easier way to do this before I send the data to Kafka.

I've been using the Spooldir CSV Connector to send data to Kafka, but each row ends up as a single column in the topic.

I've used this to add ";" between data:

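i = True
for line in arquivo:
    # Skip the first line of the file
    if i:
        i = False
        continue
    # Slice each fixed-width field, strip its padding, and join the fields with the separator
    result = result + line[0:3].strip() + commatype + line[3:18].strip() + commatype + line[18:20].strip() + commatype + line[20:25].strip() + ...

arquivo.close()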
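The slicing above can be generalized so that the offsets are derived from the list of field sizes instead of being typed by hand. This is only a minimal sketch, assuming the raw lines are fixed-width with exactly the sizes listed earlier and are padded with spaces; `split_fixed_width` and the `raw` sample line are hypothetical names introduced for illustration.

```python
from itertools import accumulate

# Field sizes as listed in the question (assumed to be space-padded).
WIDTHS = [3, 15, 2, 5, 5, 8, 2, 10, 5, 5]


def split_fixed_width(line, widths=WIDTHS, sep=";"):
    """Slice one fixed-width line into fields and join them with `sep`."""
    ends = list(accumulate(widths))   # cumulative end offset of each field
    starts = [0] + ends[:-1]          # each field starts where the previous one ends
    fields = [line[start:end].strip() for start, end in zip(starts, ends)]
    return sep.join(fields)


# Hypothetical raw line, rebuilt from the first sample row with padding.
raw = (
    "GS3" + "724330300294409" + "50" + "BRABT" + "00147"
    + "44504942" + "01" + "669063000 " + "25600" + "0    "
)
print(split_fixed_width(raw))
```

Deriving the offsets from the widths keeps the field sizes in one place, so a size change does not require editing every slice.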
question from: https://stackoverflow.com/questions/65846552/how-to-split-information-of-multiple-rows-into-columns


1 Answer


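If you accept that your column names start from Column0 (not Column1), you can call read_csv with sep=';' and a suitable prefix. Note that the prefix parameter was deprecated in pandas 1.4 and removed in 2.0, so this call only works on older versions: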

import pandas as pd
result = pd.read_csv('Input.csv', sep=';', header=None, prefix='Column', dtype='str')

Note that I passed dtype='str' because some columns of your input have leading zeroes which otherwise would be stripped.

This solution works regardless of the number of input columns. The downside is that every column now has object (string) dtype, so you may want to convert some columns to other types afterwards.

The result is:

  Column0          Column1 Column2 Column3 Column4   Column5 Column6    Column7 Column8 Column9
0     GS3  724330300294409      50   BRABT   00147  44504942      01  669063000   25600       0 
1     GS3  724330300294409      50   BRABT   00147  44504943      01  669063000   25600       0 
2     GS3  724330300294409      50   BRABT   00147  44504944      01  669063000   25600   00004
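On pandas 2.0 and later, where read_csv no longer accepts prefix, roughly the same Column0 to Column9 naming can be obtained with DataFrame.add_prefix. The following is a minimal sketch, not part of the original answer; it reads the three sample rows from an in-memory string (an assumption made so the snippet is self-contained) instead of 'Input.csv'.

```python
import io

import pandas as pd

# Assumption: the three sample rows from the question stand in for 'Input.csv'.
sample = io.StringIO(
    "GS3;724330300294409;50;BRABT;00147;44504942;01;669063000;25600;0\n"
    "GS3;724330300294409;50;BRABT;00147;44504943;01;669063000;25600;0\n"
    "GS3;724330300294409;50;BRABT;00147;44504944;01;669063000;25600;00004\n"
)

# header=None yields integer column labels 0..9; add_prefix turns them into
# 'Column0'..'Column9'. dtype=str keeps leading zeroes intact.
result = pd.read_csv(sample, sep=";", header=None, dtype=str).add_prefix("Column")
print(result)
```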

Another option, which gives the column names you asked for (starting from Column1) but requires knowing the number of columns in advance, is:

# Create the list of column names
names = [ f'Column{i}' for i in range(1, 11) ]
# Read passing the above column names
result = pd.read_csv('Input.csv', sep=';', names=names, dtype='str')
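The type conversion mentioned earlier could look like the sketch below. Which columns are really numeric is an assumption here (Column3 and Column9 are picked purely for illustration), and the sample rows are read from an in-memory string rather than 'Input.csv' so the snippet runs on its own. Columns with leading zeroes, such as Column5, are deliberately left as strings.

```python
import io

import pandas as pd

# Assumption: the three sample rows from the question stand in for 'Input.csv'.
sample = io.StringIO(
    "GS3;724330300294409;50;BRABT;00147;44504942;01;669063000;25600;0\n"
    "GS3;724330300294409;50;BRABT;00147;44504943;01;669063000;25600;0\n"
    "GS3;724330300294409;50;BRABT;00147;44504944;01;669063000;25600;00004\n"
)

names = [f"Column{i}" for i in range(1, 11)]
result = pd.read_csv(sample, sep=";", names=names, dtype=str)

# Hypothetical choice: treat Column3 and Column9 as integers,
# and leave every other column as a string.
result = result.astype({"Column3": "int64", "Column9": "int64"})
print(result.dtypes)
```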
