Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Conditional Inner Join error: "Can only compare identically-labeled Series objects"?

I am trying to do a simple inner join between two DataFrames.

My first DataFrame holds product data. It is a subset of a larger product table, and I use SKU Barcode as the primary key that uniquely identifies each product. This is the output of productDataRows.info():

RangeIndex: 1489 entries, 0 to 1488
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype   
---  ------       --------------  -----   
 0   SKU Barcode  1489 non-null   float32 
 1   Brand        1489 non-null   category
 2   Title        1489 non-null   object  
 3   Size         1489 non-null   category
 4   Category     1489 non-null   category
 5   Image URL    1489 non-null   object  
 6   Cost         1489 non-null   float32 
dtypes: category(3), float32(2), object(2)

My second DataFrame holds market research data, where each record describes an individual sale of a product. One product can have many records, so SKU Barcode acts as a foreign key in this table, which is significantly larger than the first. This is the output of marketResearch.info():

RangeIndex: 28522436 entries, 0 to 28522435
Data columns (total 5 columns):
 #   Column         Dtype   
---  ------         -----   
 0   SKU Barcode    float32 
 1   Platform Code  category
 2   Price          int16   
 3   Rank           int32   
 4   Epoch Time     int64   
dtypes: category(1), float32(1), int16(1), int32(1), int64(1)
memory usage: 516.8 MB

productDataRows contains only a subset of all SKU Barcodes. I need to select every record in marketResearch that (a) has an SKU Barcode present in productDataRows and (b) has a Platform Code equal to a local variable, platform. Any market research record whose SKU Barcode does not appear in productDataRows should be filtered out.

I have tried a few things and this is the latest I have come up with:

marketResearchRows = marketResearch[(marketResearch['SKU Barcode'] == productDataRows['SKU Barcode']) & (marketResearch['Platform Code'] == platform)]

This is throwing the error:

ValueError: Can only compare identically-labeled Series objects

I have read that this may be because the two tables do not have identical columns, but how can I get around this? I have also tried merging the tables and then dropping rows, but because my market research table is so large, this has raised MemoryError many times.

One would think this would be an easy task but I have tried many things and have been having a lot of trouble.



1 Answer
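
A possible approach, offered as a sketch rather than a definitive answer: the `==` operator between two Series compares them label by label, so pandas requires both to have the same index. Here the two Series have different lengths (1489 rows versus about 28.5 million), which is what raises "Can only compare identically-labeled Series objects". What the question describes is a membership test, which `Series.isin` performs by value without any index alignment. The small tables below are made-up stand-ins for the real data; only the column names and the `platform` variable come from the question.

```python
import pandas as pd

# Tiny stand-ins for the real tables (values are invented for illustration).
productDataRows = pd.DataFrame({
    "SKU Barcode": [101.0, 102.0],
    "Title": ["Widget", "Gadget"],
})
marketResearch = pd.DataFrame({
    "SKU Barcode": [101.0, 101.0, 102.0, 103.0, 102.0],
    "Platform Code": pd.Categorical(["A", "B", "A", "A", "B"]),
    "Price": [10, 11, 20, 30, 21],
})
platform = "A"

# Mask 1: rows whose barcode appears anywhere in productDataRows.
# isin() tests membership by value, so the two indexes need not match.
sku_mask = marketResearch["SKU Barcode"].isin(productDataRows["SKU Barcode"])

# Mask 2: rows for the requested platform.
platform_mask = marketResearch["Platform Code"] == platform

# Keep only rows that satisfy both conditions.
marketResearchRows = marketResearch[sku_mask & platform_mask]

# Optional: attach product columns AFTER filtering, so the merge only
# touches the reduced set of rows rather than the full table.
joined = marketResearchRows.merge(productDataRows, on="SKU Barcode", how="inner")
print(joined)
```

Filtering with boolean masks first and merging afterwards should keep peak memory well below that of merging the full 28.5-million-row table and dropping rows later, although the actual saving depends on how many rows survive the filter. One caveat worth checking separately: `float32` can represent integers exactly only up to 16,777,216, so if the real barcodes are longer than that, distinct barcodes may already have been rounded to the same value when the columns were loaded.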

