I have a 900 MB pipe-delimited text file that I need to load into a pandas DataFrame and, ultimately, ingest into a Postgres database.
I've tried looping over chunks and concatenating them into one DataFrame, but it didn't work:
import pandas as pd

df = pd.DataFrame()
for chunk in pd.read_csv(r"my_file.txt", sep='|', chunksize=1000):
    df = pd.concat([df, chunk], ignore_index=True)  # append each chunk to the growing DataFrame
What else should I try? Any help for a n00b is much appreciated. Thank you!
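For context, the end state I'm picturing is something like the sketch below (connection string and table name are made up, and it assumes SQLAlchemy plus a driver like psycopg2 for the Postgres side):

import pandas as pd
from sqlalchemy import create_engine

# hypothetical connection details; swap in your own user/password/host/db
engine = create_engine('postgresql://user:password@localhost:5432/mydb')

# read the pipe-delimited file in chunks and append each chunk to Postgres,
# so the full 900 MB never has to sit in memory as a single DataFrame
for chunk in pd.read_csv(r"my_file.txt", sep='|', chunksize=100000):
    chunk.to_sql('my_table', engine, if_exists='append', index=False)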
EDIT (adding more detail): when I try to read the entire file and check the number of rows, using:
data = pd.read_csv(r"my_file.txt", sep='|')
print('Total rows: {0}'.format(len(data)))  # row count
print(list(data))  # column names
I get a DtypeWarning on ~50 columns (of ~300) telling me to specify the dtype option on import. I also get a MemoryError:
MemoryError: Unable to allocate 410. MiB for an array with shape (277, 388455) and data type object
Out of curiosity, I tried reading different nrows values to see when the DtypeWarning and the MemoryError first appear: I can read the first 2,000 rows without either, and the first 240,000 rows without the MemoryError but still with the DtypeWarning on ~50 of the ~300 columns.
Will I need to specify the dtype in read_csv() for each column to avoid the warning?
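If so, I'm imagining something like this (column names are made up purely to illustrate, since the real file has ~300 columns):

# made-up column names, just to show the shape of the dtype mapping
dtype_map = {'account_id': 'Int64', 'status_code': str, 'balance': float}
data = pd.read_csv(r"my_file.txt", sep='|', dtype=dtype_map)

# or, to sidestep type inference entirely, read every column as a string
data = pd.read_csv(r"my_file.txt", sep='|', dtype=str)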
Additionally, I'm unsure how to handle the MemoryError; as one commenter mentioned below, 900 MB isn't exactly wildly massive.
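In case it helps, this is the kind of sanity check I could run on a sample to estimate the in-memory footprint (the 50,000-row sample size is arbitrary):

# read an arbitrary sample and measure how much memory it actually uses
sample = pd.read_csv(r"my_file.txt", sep='|', nrows=50000)
print('Sample memory: {0:.1f} MB'.format(sample.memory_usage(deep=True).sum() / 1e6))
print(sample.dtypes.value_counts())  # how many columns ended up as object vs numeric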
question from:
https://stackoverflow.com/questions/65854245/best-way-to-convert-large-text-file-900-mb-300-columns-pipe-delim-to-pandas