The data.table interface currently allows one to modify/update columns by reference (note that we don't need to re-assign the result back to a variable).
# sub-assign by reference, updates 'y' in-place
DT[x >= 1L, y := NA]
But dplyr will never update by reference.
The dplyr equivalent would be (note that the result needs to be re-assigned):
# copies the entire 'y' column
ans <- DF %>% mutate(y = replace(y, which(x >= 1L), NA))
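To make the difference concrete, here is a minimal sketch on toy data. The objects `DT`, `DF` and `ans` and their values are made up for illustration, and base R's `transform()` stands in for `mutate()` so that only data.table needs to be loaded; the copy-and-re-assign pattern is the same.

```r
library(data.table)

# hypothetical toy data, built twice so the two objects share nothing
DT <- data.table(x = 0:3, y = c(10L, 20L, 30L, 40L))
DF <- data.frame(x = 0:3, y = c(10L, 20L, 30L, 40L))

# data.table: 'y' is updated in place, no re-assignment needed
DT[x >= 1L, y := NA]

# copy semantics: DF itself is unchanged, the modified copy lives in 'ans'
ans <- transform(DF, y = replace(y, which(x >= 1L), NA))
```

After running this, `DT$y` has already changed, whereas `DF$y` still holds its original values and only `ans` carries the update.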
One concern with this is referential transparency.
Updating a data.table object by reference, especially within a function, may not always be desirable.
But this is an incredibly useful feature: see this and this posts for interesting cases.
And we want to keep it.
Therefore we are working towards exporting the shallow() function in data.table, which will provide the user with both possibilities.
For example, if it is desirable to not modify the input data.table within a function, one can then do:
foo <- function(DT) {
    DT = shallow(DT)          ## shallow copy DT
    DT[, newcol := 1L]        ## does not affect the original DT
    DT[x > 2L, newcol := 2L]  ## no need to copy (internally), as this column exists only in shallow copied DT
    DT[x > 2L, x := 3L]       ## have to copy (like base R / dplyr does always); otherwise original DT will
                              ## also get modified.
}
By not using shallow(), the old functionality is retained:
bar <- function(DT) {
    DT[, newcol := 1L]        ## old behaviour, original DT gets updated by reference
    DT[x > 2L, x := 3L]       ## old behaviour, update column x in original DT.
}
By creating a shallow copy using shallow(), we understand that you don't want to modify the original object.
We take care of everything internally to guarantee that, while copying the columns you modify only when it is absolutely necessary.
When implemented, this should settle the referential transparency issue altogether while providing the user with both possibilities.
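Since shallow() is not exported yet, the two behaviours can be compared today using data.table's existing copy(), which makes a full (deep) copy. The sketch below is only an illustration: `foo_copy`, `DT_a` and `DT_b` are hypothetical names, and copy() duplicates every column, which is exactly the cost shallow() is meant to avoid.

```r
library(data.table)

# old behaviour: the caller's data.table is updated by reference
bar <- function(DT) {
    DT[, newcol := 1L]
    DT[x > 2L, x := 3L]
    invisible(DT)
}

# deep copy first, so the caller's object is never touched
# (copies all columns, unlike the planned shallow())
foo_copy <- function(DT) {
    DT <- copy(DT)
    DT[, newcol := 1L]
    DT[x > 2L, x := 3L]
    DT[]
}

DT_a <- data.table(x = 1:5)
res  <- foo_copy(DT_a)   # DT_a keeps its single column and original values

DT_b <- data.table(x = 1:5)
bar(DT_b)                # DT_b gains 'newcol' and has 'x' overwritten
```

Here `res` carries the changes while `DT_a` is left alone, and `DT_b` shows what happens when no copy is taken.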
Also, once shallow() is exported, dplyr's data.table interface should avoid almost all copies.
So those who prefer dplyr's syntax can use it with data.tables.
But it will still lack many features that data.table provides, including (sub)-assignment by reference.
Aggregate while joining:
Suppose you have two data.tables as follows:
DT1 = data.table(x=c(1,1,1,1,2,2,2,2), y=c("a", "a", "b", "b"), z=1:8, key=c("x", "y"))
#    x y z
# 1: 1 a 1
# 2: 1 a 2
# 3: 1 b 3
# 4: 1 b 4
# 5: 2 a 5
# 6: 2 a 6
# 7: 2 b 7
# 8: 2 b 8
DT2 = data.table(x=1:2, y=c("a", "b"), mul=4:3, key=c("x", "y"))
#    x y mul
# 1: 1 a   4
# 2: 2 b   3
And you would like to get sum(z) * mul for each row in DT2 while joining by columns x,y.
We can either:
1) aggregate DT1 to get sum(z), 2) perform a join and 3) multiply (or)
# data.table way
DT1[, .(z = sum(z)), keyby = .(x,y)][DT2][, z := z*mul][]

# dplyr equivalent
DF1 %>% group_by(x, y) %>% summarise(z = sum(z)) %>%
    right_join(DF2) %>% mutate(z = z * mul)
2) do it all in one go (using by = .EACHI feature):
DT1[DT2, list(z=sum(z) * mul), by = .EACHI]
What is the advantage?
We don't have to allocate memory for the intermediate result.
We don't have to group/hash twice (once for aggregation and once for joining).
And more importantly, the operation we wanted to perform is clear from looking at j in (2).
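As a quick sanity check, the two data.table approaches can be run side by side on the example data. This is a sketch under small assumptions: the recycling of `y` is written out with `rep()`, `x` is stored as a double in both tables so the join columns have the same type, and `two_step` / `one_step` are hypothetical names.

```r
library(data.table)

DT1 <- data.table(x = c(1, 1, 1, 1, 2, 2, 2, 2),
                  y = rep(c("a", "a", "b", "b"), 2),
                  z = 1:8, key = c("x", "y"))
DT2 <- data.table(x = c(1, 2), y = c("a", "b"), mul = 4:3, key = c("x", "y"))

# (1) aggregate, join, then multiply: builds an intermediate aggregated table
two_step <- DT1[, .(z = sum(z)), keyby = .(x, y)][DT2][, z := z * mul][]

# (2) aggregate while joining: j is evaluated once per row of DT2
one_step <- DT1[DT2, .(z = sum(z) * mul), by = .EACHI]

# both give z = 12 for (1, "a") and z = 45 for (2, "b")
```

The only visible difference in the output is that `two_step` also keeps the `mul` column, while `one_step` returns just the join columns and `z`.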
Check this post for a detailed explanation of by = .EACHI.
(查看<a href="https://stackoom.com/link/aHR0cHM6Ly9zdGFja292ZXJmbG93LmNvbS9hLzI3MDA0NTY2LzU1OTc4NA==" rel="nofollow noopener" target="_blank"