Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
361 views
in Technique[技术] by (71.8m points)

Python combine CSVs, remove header and remove blanks

I’m extremely new to Python & trying to figure the below out:

I have multiple CSV files (monthly files) that I’m trying to combine into a yearly file. The monthly files all have headers, so I’m trying to keep the first header & remove the rest. I used the below script which accomplished this, however there are 10 blank rows between each month.

Does anyone know what I can add to this to remove the blank rows?

import shutil
import glob


#import csv files from folder
path = r'data/US/market/merged_data'
allFiles = glob.glob(path + "/*.csv")
allFiles.sort()  # glob lacks reliable ordering, so impose your own if output order matters
with open('someoutputfile.csv', 'wb') as outfile:
    for i, fname in enumerate(allFiles):
        with open(fname, 'rb') as infile:
            if i != 0:
                infile.readline()  # Throw away header on all but first file
            # Block copy rest of file from input to output without parsing
            shutil.copyfileobj(infile, outfile)
            print(fname + " has been imported.")     

Thank you in advance!

question from:https://stackoverflow.com/questions/65893229/python-combine-csvs-remove-header-and-remove-blanks

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

assuming the dataset isn't bigger than you memory, I suggest reading each file in pandas, concatenating the dataframes and filtering from there. blank rows will probably show up as nan.

import pandas as pd
import glob
path = r'data/US/market/merged_data'
allFiles = glob.glob(path + "/*.csv")
allFiles.sort()
df = pd.Dataframe()
for i, fname in enumerate(allFiles):
    #append data to existing dataframe
    df = df.append(pd.read(fname), ignore_index = True)
#hopefully, this will drop blank rows
df = df.dropna(how = 'all')
#write to file
df.to_csv('someoutputfile.csv')

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

2.1m questions

2.1m answers

60 comments

57.0k users

...