Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
124 views
in Technique[技术] by (71.8m points)

python - Fast loop through using Pandas - uszipcode

I have a CSV file with nearly 4M zipcodes. I would like to get city name, population etc., from uszipcode database.

Is there a way to run the below code without for loop or at least make it faster?

import uszipcode
from uszipcode import SearchEngine

import pandas as pd
zipcode_data = pd.read_csv("customer_zipcode_data.csv")

search = SearchEngine()

for index, row in zipcode_data.iterrows():
    zipcode_object = search.by_zipcode(row['ZIP_POSTAL_CODE'])
    print(zipcode_object.to_dict()['major_city'])
    zipcode_data.at[index, 'city_name'] = zipcode_object.to_dict()['major_city']
question from:https://stackoverflow.com/questions/66057024/fast-loop-through-using-pandas-uszipcode

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

use map instead of iterrows.

zipcode_data['city_name'] = zipcode_data['ZIP_POSTAL_CODE'].map(lambda x: search.by_zipcode(x).major_city)

as your csv has 4M zipcodes, that will be lots of duplicates, so you can consider applying search.by_zipcode only once for a unique zipcode, then use map(dict) or map(Series) to boost the speed.

df_zip_code = zipcode_data['ZIP_POSTAL_CODE'].drop_duplicates().to_frame()
df_zip_code['city_name'] = df_zip_code['ZIP_POSTAL_CODE'].map(lambda x: search.by_zipcode(x).major_city)
zip_code_map = df_zip_code.set_index('ZIP_POSTAL_CODE')['city_name']
zipcode_data['city_name'] = zipcode_data['ZIP_POSTAL_CODE'].map(zip_code_map)

use uszipcode's database file directly.

import pandas as pd
import sqlite3

# replace ${HOME} with your home direction
with sqlite3.connect("${HOME}/.uszipcode/simple_db.sqlite") as con:
    df = pd.read_sql_query("SELECT * from simple_zipcode", con)

print(df.head())

# convert the `CompressedJSONType` to str
# https://github.com/MacHu-GWU/uszipcode-project/blob/master/uszipcode/pkg/compressed_json_type.py

import zlib
for col in ['common_city_list', 'area_code_list']:
    df[col] = df[col].map(lambda x: json.loads(zlib.decompress(x).decode("utf-8")))


reuslt:

  zipcode zipcode_type      major_city    post_office_city  
0   35004     Standard           Moody           Moody, AL   
1   35005     Standard      Adamsville      Adamsville, AL   
2   35006     Standard           Adger           Adger, AL   
3   35007     Standard       Alabaster       Alabaster, AL   
4   35010     Standard  Alexander City  Alexander City, AL   

                                    common_city_list             county state  
0  b'xx9cx8bVxf2xcdxcfOxa9Txd2QPrLxceM,R...   St. Clair County    AL   
1  b'xx9cx8bVrLIxcc-.xcbxccxc9IUx8ax05x0...   Jefferson County    AL   
2      b'xx9cx8bVrLIO-Rx8ax05x00
xf9x02xe0'   Jefferson County    AL   
3  b'xx9cx8bVrxccILJ,.I-Rx8ax05x00x1exe8...      Shelby County    AL   
4  b'xx9cx8bVrxccIxadHxccKI-Rpxce,xa9Txd2...  Tallapoosa County    AL   

     lat    lng timezone  ...  land_area_in_sqmi water_area_in_sqmi  
0  33.62 -86.50  Central  ...              18.07               0.14   
1  33.59 -86.99  Central  ...              34.51               0.35   
2  33.40 -87.20  Central  ...              99.81               3.02   
3  33.22 -86.79  Central  ...              37.38               0.67   
4  32.90 -85.90  Central  ...             217.59              25.60   

   housing_units  occupied_housing_units  median_home_value  
0         4523.0                  4214.0           142500.0   
1         3485.0                  3067.0            97000.0   
2         1495.0                  1188.0            95400.0   
3         9799.0                  9180.0           153900.0   
4        10307.0                  8476.0            90800.0   

   median_household_income  bounds_west  bounds_east  bounds_north  
0                  58832.0   -86.551776   -86.452822     33.668501   
1                  46059.0   -87.081633   -86.906768     33.639430   
2                  51929.0   -87.341698   -87.071629     33.555800   
3                  64299.0   -86.861774   -86.726833     33.271760   
4                  37380.0   -86.108223   -85.763722     33.104456   

   bounds_south  
0     33.562686  
1     33.533901  
2     33.327443  
3     33.150196  
4     32.698717  


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...