I am trying to improve the run time of my lookup table.
import pandas as pd

dest_df = pd.DataFrame({"dest": ["uk LHR", "from ROM", "City:LONDON",
                                 "planetoronto", "rome rome", "junk plane"]})  ## 300,000 rows in reality
city_df_lookup = pd.DataFrame({"places": ["london", "paris", "toronto", "rome"],
                               "code": ["LHR", "PAR", "YTO", "ROM"]})  ## around 10,000 rows in reality

code = city_df_lookup.code.tolist()
places = city_df_lookup.places.tolist()

def select(x):
    # return the place whose code appears as a substring of the dest string
    for co, pl in zip(code, places):
        if co in x:
            return pl

dest_df["dest_match"] = dest_df["dest"].apply(select)
dest_df
           dest dest_match
0        uk LHR     london
1      from ROM       rome
2   City:LONDON       None
3  planetoronto       None
4     rome rome       None
5    junk plane       None
Unfortunately, the code above takes too long, and I would also like the lookup to string-match city_df_lookup.places against dest_df.dest as well, not just the codes, with a "No Match" fallback (a rough sketch of what I mean follows the desired output below).
The desired output is:
           dest dest_match
0        uk LHR     london
1      from ROM       rome
2   City:LONDON     london
3  planetoronto    toronto
4     rome rome       rome
5    junk plane   No Match
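
For reference, this is a rough, untested sketch of the kind of matching I am after: build one case-insensitive regex alternation over all the codes and place names, extract the first hit per row, and map codes back to place names. The longest-keyword-first sort and the code_to_place mapping are my own assumptions about how to do this, not something I have benchmarked at full scale.

import re
import pandas as pd

# one regex alternation over all codes and place names, longest keywords first,
# applied case-insensitively to every dest string
places = city_df_lookup["places"]
codes = city_df_lookup["code"]

keys = pd.concat([codes, places]).str.lower().unique()
pattern = "(" + "|".join(sorted(map(re.escape, keys), key=len, reverse=True)) + ")"

hits = dest_df["dest"].str.lower().str.extract(pattern, expand=False)
code_to_place = dict(zip(codes.str.lower(), places))
dest_df["dest_match"] = hits.map(code_to_place).fillna(hits).fillna("No Match")

I am not sure a single regex with ~10,000 alternatives will actually be fast, which is why I am also considering Aho-Corasick below.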
I was thinking of using the pyahocorasick package, but I am not sure whether there is a simpler (or faster) method.
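
Roughly what I had in mind with pyahocorasick (untested sketch; first_hit is a helper name I made up, and I simply take the first keyword the automaton reports for each row):

import ahocorasick  # pip install pyahocorasick
import pandas as pd

# build one automaton over all lower-cased codes and place names,
# mapping each keyword back to its place name
A = ahocorasick.Automaton()
for _, row in city_df_lookup.iterrows():
    A.add_word(row["places"].lower(), row["places"])  # match on the place name
    A.add_word(row["code"].lower(), row["places"])    # or on its code
A.make_automaton()

def first_hit(text):
    # scan the dest string once and return the first keyword found
    for _end, place in A.iter(text.lower()):
        return place
    return "No Match"

dest_df["dest_match"] = dest_df["dest"].apply(first_hit)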
question from:
https://stackoverflow.com/questions/65904049/large-scale-string-matching-between-different-dataframes-python