use map
instead of iterrows
.
zipcode_data['city_name'] = zipcode_data['ZIP_POSTAL_CODE'].map(lambda x: search.by_zipcode(x).major_city)
as your csv has 4M zipcodes, that will be lots of duplicates, so you can consider applying search.by_zipcode
only once for a unique zipcode, then use map(dict)
or map(Series)
to boost the speed.
df_zip_code = zipcode_data['ZIP_POSTAL_CODE'].drop_duplicates().to_frame()
df_zip_code['city_name'] = df_zip_code['ZIP_POSTAL_CODE'].map(lambda x: search.by_zipcode(x).major_city)
zip_code_map = df_zip_code.set_index('ZIP_POSTAL_CODE')['city_name']
zipcode_data['city_name'] = zipcode_data['ZIP_POSTAL_CODE'].map(zip_code_map)
use uszipcode
's database file directly.
import pandas as pd
import sqlite3
# replace ${HOME} with your home direction
with sqlite3.connect("${HOME}/.uszipcode/simple_db.sqlite") as con:
df = pd.read_sql_query("SELECT * from simple_zipcode", con)
print(df.head())
# convert the `CompressedJSONType` to str
# https://github.com/MacHu-GWU/uszipcode-project/blob/master/uszipcode/pkg/compressed_json_type.py
import zlib
for col in ['common_city_list', 'area_code_list']:
df[col] = df[col].map(lambda x: json.loads(zlib.decompress(x).decode("utf-8")))
reuslt:
zipcode zipcode_type major_city post_office_city
0 35004 Standard Moody Moody, AL
1 35005 Standard Adamsville Adamsville, AL
2 35006 Standard Adger Adger, AL
3 35007 Standard Alabaster Alabaster, AL
4 35010 Standard Alexander City Alexander City, AL
common_city_list county state
0 b'xx9cx8bVxf2xcdxcfOxa9Txd2QPrLxceM,R... St. Clair County AL
1 b'xx9cx8bVrLIxcc-.xcbxccxc9IUx8ax05x0... Jefferson County AL
2 b'xx9cx8bVrLIO-Rx8ax05x00
xf9x02xe0' Jefferson County AL
3 b'xx9cx8bVrxccILJ,.I-Rx8ax05x00x1exe8... Shelby County AL
4 b'xx9cx8bVrxccIxadHxccKI-Rpxce,xa9Txd2... Tallapoosa County AL
lat lng timezone ... land_area_in_sqmi water_area_in_sqmi
0 33.62 -86.50 Central ... 18.07 0.14
1 33.59 -86.99 Central ... 34.51 0.35
2 33.40 -87.20 Central ... 99.81 3.02
3 33.22 -86.79 Central ... 37.38 0.67
4 32.90 -85.90 Central ... 217.59 25.60
housing_units occupied_housing_units median_home_value
0 4523.0 4214.0 142500.0
1 3485.0 3067.0 97000.0
2 1495.0 1188.0 95400.0
3 9799.0 9180.0 153900.0
4 10307.0 8476.0 90800.0
median_household_income bounds_west bounds_east bounds_north
0 58832.0 -86.551776 -86.452822 33.668501
1 46059.0 -87.081633 -86.906768 33.639430
2 51929.0 -87.341698 -87.071629 33.555800
3 64299.0 -86.861774 -86.726833 33.271760
4 37380.0 -86.108223 -85.763722 33.104456
bounds_south
0 33.562686
1 33.533901
2 33.327443
3 33.150196
4 32.698717
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…