I'm having an issue with saving dist-class objects (with saveRDS) and getting much larger file sizes than expected. The code to reproduce this problem can be found at https://github.com/kstagaman/stackoverflow-help
The code below reads in a phyloseq object with a very large community matrix (47 x 1,060,929):
library(phyloseq)
obj.size <- function(x) {paste("Object size:", format(object.size(x), units = "auto"))}
f.size <- function(x) {paste("File size:", round(file.size(x) / 1024^2, 1), "Mb")}
ps <- readRDS("example_phyloseq.rds")
Next, I create a distance matrix to determine community dissimilarities between samples. It should result in a 47 x 47 matrix.
dist.mat <- phyloseq::distance(ps, method = "bray") # creating a distance matrix
attributes(dist.mat)$Size
## [1] 47
Okay, as expected, but…
cat(obj.size(dist.mat), sep = "\n")
## Object size: 453.3 Mb
This is oddly big for a distance matrix of this size (orders of magnitude larger than expected).
After investigating, I realized that this is because attributes(dist.mat)$call is huge: it includes every single value from the original community matrix.
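For anyone who wants to check which attribute dominates without downloading the data, here is a minimal sketch of the kind of per-attribute size check described above. It runs on a toy dist object; toy.dist and the "bulky" attribute are made up for illustration and are not what phyloseq::distance produces.

```r
# Toy stand-in for the real data: a small dist object with a deliberately
# bulky attribute attached (hypothetical; not the phyloseq output).
set.seed(1)
toy.dist <- dist(matrix(rnorm(47 * 10), nrow = 47))
attr(toy.dist, "bulky") <- rnorm(1e5)

# Size in bytes of each attribute, as reported by object.size()
attr.sizes <- sapply(
  attributes(toy.dist),
  function(a) as.numeric(object.size(a))
)

# Largest attributes first
print(sort(attr.sizes, decreasing = TRUE))
```

On the real object, the same sapply() call over attributes(dist.mat) should show whether $call is the only large attribute.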
dist.mat.small <- dist.mat
attributes(dist.mat.small)$call <- NULL
cat(obj.size(dist.mat.small), sep = "\n")
## Object size: 12.5 Kb
Okay, so that’s looking good. Much closer to what I’d expect for the
size. But…when I try to save…
saveRDS(dist.mat, file = "distance_matrix_original.rds")
saveRDS(dist.mat.small, file = "distance_matrix_small.rds")
cat(f.size("distance_matrix_original.rds"), sep = "\n")
## File size: 112.5 Mb
cat(f.size("distance_matrix_small.rds"), sep = "\n")
## File size: 112.5 Mb
What’s happening here? Why are the file sizes the same for objects of
very different sizes?
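In case it helps with diagnosing, here is a minimal sketch of how the three sizes involved could be compared side by side: the size reported by object.size(), the length of the uncompressed serialized form, and the size of the file on disk. It uses a toy dist object (toy.dist is hypothetical, not the phyloseq output), so the numbers only illustrate the comparison and say nothing about the cause.

```r
# Toy stand-in (hypothetical data, not the phyloseq output)
set.seed(1)
toy.dist <- dist(matrix(rnorm(47 * 10), nrow = 47))

# Bytes reported for the object in memory
mem.bytes <- as.numeric(object.size(toy.dist))

# Bytes in the uncompressed serialized form (raw vector from serialize())
ser.bytes <- length(serialize(toy.dist, connection = NULL))

# Bytes on disk; saveRDS() applies gzip compression by default
tmp <- tempfile(fileext = ".rds")
saveRDS(toy.dist, file = tmp)
disk.bytes <- file.size(tmp)
unlink(tmp)

print(c(memory = mem.bytes, serialized = ser.bytes, disk = disk.bytes))
```

Running the same three measurements on dist.mat and dist.mat.small would show whether the serialized form is already much larger than what object.size() reports.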