Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Adding new column to existing DataFrame in Python pandas

I have the following indexed DataFrame with named columns and rows that are not continuous numbers:


          a         b         c         d
2  0.671399  0.101208 -0.181532  0.241273
3  0.446172 -0.243316  0.051767  1.577318
5  0.614758  0.075793 -0.451460 -0.012493

I would like to add a new column, 'e', to the existing DataFrame without changing anything else in it (i.e., the new column always has the same length as the DataFrame).


0   -0.335485
1   -1.166658
2   -0.385571
dtype: float64

I tried different versions of join, append, and merge, but I did not get the result I wanted; at most, I got errors.


How can I add column e to the above example?


Asked by tomasz74 (translated from Stack Overflow)


1 Answer


Use the original df1 index to create the Series:


sLength = len(df1['a'])
df1['e'] = pd.Series(np.random.randn(sLength), index=df1.index)
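The reason this works is that pandas aligns a Series on its index labels when assigning it as a column. A minimal sketch, rebuilding the question's frame from the printed values (the names `e` and `aligned` are just illustrative), shows what goes wrong with the Series' default 0, 1, 2 index and how passing `index=df1.index` fixes it:

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame: named columns, non-contiguous row labels 2, 3, 5.
df1 = pd.DataFrame(
    [[0.671399, 0.101208, -0.181532, 0.241273],
     [0.446172, -0.243316, 0.051767, 1.577318],
     [0.614758, 0.075793, -0.451460, -0.012493]],
    index=[2, 3, 5],
    columns=["a", "b", "c", "d"],
)

# The new values come with a default index 0, 1, 2.
e = pd.Series([-0.335485, -1.166658, -0.385571])

# Plain assignment aligns on labels: only label 2 matches, so rows 3 and 5 get NaN.
aligned = df1.copy()
aligned["e"] = e

# Re-labelling the values with df1's index places them row by row instead.
df1["e"] = pd.Series(e.values, index=df1.index)

print(aligned["e"].tolist())  # [-0.385571, nan, nan]
print(df1["e"].tolist())      # [-0.335485, -1.166658, -0.385571]
```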

Edit 2015

Some users reported getting a SettingWithCopyWarning with this code.

However, the code still runs perfectly with the current pandas version 0.16.1.


>>> sLength = len(df1['a'])
>>> df1
          a         b         c         d
6 -0.269221 -0.026476  0.997517  1.294385
8  0.917438  0.847941  0.034235 -0.448948

>>> df1['e'] = pd.Series(np.random.randn(sLength), index=df1.index)
>>> df1
          a         b         c         d         e
6 -0.269221 -0.026476  0.997517  1.294385  1.757167
8  0.917438  0.847941  0.034235 -0.448948  2.228131

>>> pd.__version__
'0.16.1'

The SettingWithCopyWarning aims to warn about a possibly invalid assignment on a copy of the DataFrame.


It doesn't necessarily mean you did something wrong (it can trigger false positives), but since 0.13.0 it lets you know that there are more appropriate methods for the same purpose.


So if you get the warning, just follow its advice: Try using .loc[row_indexer,col_indexer] = value instead


>>> df1.loc[:,'f'] = pd.Series(np.random.randn(sLength), index=df1.index)
>>> df1
          a         b         c         d         e         f
6 -0.269221 -0.026476  0.997517  1.294385  1.757167 -0.050927
8  0.917438  0.847941  0.034235 -0.448948  2.228131  0.006109
>>> 

In fact, this is currently the more efficient method, as described in the pandas docs.
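To see where the warning typically comes from, here is a minimal sketch assuming the common case: a frame obtained by filtering another frame (the names `df` and `positive` and the data are made up for illustration). Taking an explicit `.copy()` of the subset and then assigning with `.loc` avoids the ambiguous write. Warning behaviour has changed in later pandas releases, so the sketch does not depend on the warning actually being raised.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [0.5, -1.0, 2.0], "b": [1.0, 2.0, 3.0]}, index=[6, 7, 8])

# A boolean filter yields a frame derived from df; writing into such a frame
# directly is the pattern SettingWithCopyWarning is meant to flag.
# The explicit .copy() makes `positive` independent of df.
positive = df[df["a"] > 0].copy()

# Add the new column with .loc, as the warning message suggests.
positive.loc[:, "e"] = np.arange(len(positive), dtype=float)

print(positive)
print("e" in df.columns)  # False: the original frame is untouched
```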



Edit 2017


As indicated in the comments and by @Alexander, currently the best way to add the values of a Series as a new column of a DataFrame may be to use assign:


df1 = df1.assign(e=pd.Series(np.random.randn(sLength)).values)
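A short sketch of how `assign` behaves, using a two-row frame modelled on the `df1` above (the data and the names `values`, `misaligned`, `out` and column `f` are assumptions for illustration). It shows why `.values` is needed here and that `assign` returns a new frame rather than modifying `df1` in place:

```python
import numpy as np
import pandas as pd

# Two-row frame with non-contiguous labels, modelled on the answer's df1.
df1 = pd.DataFrame({"a": [-0.269221, 0.917438], "b": [-0.026476, 0.847941]},
                   index=[6, 8])
sLength = len(df1["a"])
values = pd.Series(np.random.randn(sLength))  # default index 0, 1

# Without .values the Series is aligned on labels 0, 1, which df1 lacks -> all NaN.
misaligned = df1.assign(e=values)

# With .values the data is placed by position.
# A later keyword may be a callable that sees columns created by earlier ones;
# this ordering relies on pandas >= 0.23.
out = df1.assign(e=values.values, f=lambda d: d["a"] + d["e"])

print(misaligned["e"].isna().all())  # True
print(list(out.columns))             # ['a', 'b', 'e', 'f']
print(list(df1.columns))             # ['a', 'b'] (assign does not modify df1)
```

Because `assign` returns a new object, the answer's form `df1 = df1.assign(...)` rebinds the name to keep the result.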


...