Code example:
In [171]: A = np.array([1.1, 1.1, 3.3, 3.3, 5.5, 6.6])
In [172]: B = np.array([111, 222, 222, 333, 333, 777])
In [173]: C = randint(10, 99, 6)
In [174]: df = pd.DataFrame(zip(A, B, C), columns=['A', 'B', 'C'])
In [175]: df.set_index(['A', 'B'], inplace=True)
In [176]: df
Out[176]:
C
A B
1.1 111 20
222 31
3.3 222 24
333 65
5.5 333 22
6.6 777 74
Now, I want to retrieve A values:
Q1: in range [3.3, 6.6] - expected return value: [3.3, 5.5, 6.6] or [3.3, 3.3, 5.5, 6.6] in case last inclusive, and [3.3, 5.5] or [3.3, 3.3, 5.5] if not.
Q2: in range [2.0, 4.0] - expected return value: [3.3] or [3.3, 3.3]
Same for any other MultiIndex dimension, for example B values:
Q3: in range [111, 500] with repetitions, as number of data rows in range - expected return value: [111, 222, 222, 333, 333]
More formal:
Let us assume T is a table with columns A, B and C. The table includes n rows. Table cells are numbers, for example A double, B and C integers. Let's create a DataFrame of table T, let us name it DF. Let's set columns A and B indexes of DF (without duplication, i.e. no separate columns A and B as indexes, and separate as data), i.e. A and B in this case MultiIndex.
Questions:
- How to write a query on the index, for example, to query the index A (or B), say in the labels interval [120.0, 540.0]? Labels 120.0 and 540.0 exist. I must clarify that I am interested only in the list of indices as a response to the query!
- How to the same, but in case of the labels 120.0 and 540.0 do not exist, but there are labels by value lower than 120, higher than 120 and less than 540, or higher than 540?
- In case the answer for Q1 and Q2 was unique index values, now the same, but with repetitions, as number of data rows in index range.
I know the answers to the above questions in the case of columns which are not indexes, but in the indexes case, after a long research in the web and experimentation with the functionality of pandas, I did not succeed. The only method (without additional programming) I see now is to have a duplicate of A and B as data columns in addition to index.
See Question&Answers more detail:
os