Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - How to add an index by set of data when using rbindlist?

I have several different csv files with the same structure. I read them into R using fread, and then union them into a bigger dataset using rbindlist().

library(data.table)
files <- list.files(pattern = "\\.csv$")  # pattern is a regular expression, not a glob
x2csv <- rbindlist(lapply(files, fread, stringsAsFactors = FALSE), fill = TRUE)

The code works well. However, I would like to add a column filled with numbers to indicate which csv file each observation came from. For example, the output should be:

       V1        V2         V3  C1
   1:   0 0.2859163 0.55848521   1
   2:   1 1.1616298 0.87571349   1 
   3:   2 2.1122510 0.95062116   2 
   4:   3 2.6832013 0.57095035   2
   5:   4 2.9117493 0.22854804   2 
   6:   5 2.9886040 0.07685464   3

where C1 is the new index column indicating that the first and second observations come from files[1] (the first .csv file), the third, fourth, and fifth observations come from files[2] (the second .csv file), and so on.

1 Answer

This is an enhanced version of Nicolás' answer which adds the file names instead of numbers:

x2csv <- rbindlist(lapply(files, fread), idcol = "origin")
x2csv[, origin := factor(origin, labels = basename(files))]
  • fread() uses stringsAsFactors = FALSE by default so we can save some keystrokes
  • Also fill = TRUE is only required if we want to read files with differing structure, e.g., differing position, name, or number of columns
  • The id column can be named through idcol (with idcol = TRUE it is called .id) and is populated with the sequence number of the list element.
  • Then, this number is converted into a factor whose levels are labeled with the file names. A file name might be easier to remember than just a mere number. basename() strips the path off the file name.
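The points above can be tried out end to end with a small reproducible sketch. The temporary directory, the file names a.csv, b.csv, c.csv, and their contents are made up for illustration and are not from the question. The sketch first builds the numeric index the question asked for, then shows an alternative way to get file names: if the list passed to rbindlist() is named, idcol is filled with those names, so no factor conversion is needed.

```r
library(data.table)

# Hypothetical setup: write three small CSV files into a temporary directory
tmp <- file.path(tempdir(), "rbindlist_demo")
dir.create(tmp, showWarnings = FALSE)
fwrite(data.table(V1 = 0:1, V2 = c(0.29, 1.16)), file.path(tmp, "a.csv"))
fwrite(data.table(V1 = 2:4, V2 = c(2.11, 2.68, 2.91)), file.path(tmp, "b.csv"))
fwrite(data.table(V1 = 5L, V2 = 2.99), file.path(tmp, "c.csv"))

files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Numeric index, as asked in the question: C1 is 1, 1, 2, 2, 2, 3
by_number <- rbindlist(lapply(files, fread), idcol = "C1")

# File names as index: name the list elements and idcol picks up the names
tables <- setNames(lapply(files, fread), basename(files))
by_name <- rbindlist(tables, idcol = "origin")

print(by_number)
print(by_name)
```

Note that rbindlist() places the id column first, not last as in the question's desired output; setcolorder() can move it if the position matters. The named-list variant should also be more forgiving when one of the files contains no rows, because factor(origin, labels = ...) expects exactly one label per index value that actually occurs in the data.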
