Use a dict as the value argument to fillna()
As mentioned in the comment by @rhkarls on @Jeff's answer, calling fillna() with inplace=True on a .loc selection of a list of columns has no effect on the original DataFrame, which I too find frustrating. Here's a workaround.
Example:
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': [1, 2, 3, 4, np.nan],
                   'b': [6, 7, 8, np.nan, np.nan],
                   'x': [11, 12, 13, np.nan, np.nan],
                   'y': [16, np.nan, np.nan, 19, np.nan]})
print(df)
#     a    b     x     y
#0  1.0  6.0  11.0  16.0
#1  2.0  7.0  12.0   NaN
#2  3.0  8.0  13.0   NaN
#3  4.0  NaN   NaN  19.0
#4  NaN  NaN   NaN   NaN
Let's say we want to fillna for x and y only, not a and b.
I would expect using .loc to work (as in an assignment), but it doesn't, as mentioned earlier:
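# doesn't work: .loc with a list of columns returns a copy,
# so the inplace fill is applied to that copy, not to df
df.loc[:, ['x', 'y']].fillna(0, inplace=True)
print(df)  # nothing changed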
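For comparison, here is a minimal sketch of the assignment form, which does update the DataFrame but is not an inplace operation. It rebuilds the same example frame; the name cols is just an illustrative variable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, np.nan],
                   'b': [6, 7, 8, np.nan, np.nan],
                   'x': [11, 12, 13, np.nan, np.nan],
                   'y': [16, np.nan, np.nan, 19, np.nan]})

cols = ['x', 'y']  # illustrative name for the column subset

# fillna() returns a new frame for the subset;
# assigning it back writes the filled values into df
df[cols] = df[cols].fillna(0)
print(df)
```

This fills x and y and leaves a and b alone, at the cost of naming the column subset twice.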
However, the documentation says that the value argument to fillna() can be:
alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). (values not in the dict/Series/DataFrame will not be filled).
It turns out that using a dict of values will work:
# works
df.fillna({'x': 0, 'y': 0}, inplace=True)
print(df)
#     a    b     x     y
#0  1.0  6.0  11.0  16.0
#1  2.0  7.0  12.0   0.0
#2  3.0  8.0  13.0   0.0
#3  4.0  NaN   0.0  19.0
#4  NaN  NaN   0.0   0.0
Also, if you have a lot of columns in your subset, you could use a dict comprehension, as in:
df.fillna({col: 0 for col in ['x', 'y']}, inplace=True)  # also works
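Since the quoted documentation also allows a Series as the value, the same idea should extend to per-column fill values. The sketch below is one way to try that on the example frame: dict.fromkeys builds the same dict as the comprehension, and a Series of column means fills each column with its own mean. The names cols, zero_filled and mean_filled are illustrative, and the calls are non-inplace so the original df is left untouched.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, np.nan],
                   'b': [6, 7, 8, np.nan, np.nan],
                   'x': [11, 12, 13, np.nan, np.nan],
                   'y': [16, np.nan, np.nan, 19, np.nan]})

cols = ['x', 'y']  # illustrative name for the column subset

# Same dict as {col: 0 for col in cols}
zero_filled = df.fillna(dict.fromkeys(cols, 0))

# df[cols].mean() is a Series indexed by column name,
# so each listed column is filled with its own mean
mean_filled = df.fillna(df[cols].mean())

print(zero_filled)
print(mean_filled)
```

Columns that do not appear in the dict or Series (a and b here) keep their NaN values in both results.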