Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
315 views
in Technique[技术] by (71.8m points)

r - Selecting data frame rows based on partial string match in a column

I want to select rows from a data frame based on partial match of a string in a column, e.g. column 'x' contains the string "hsa". Using sqldf - if it had a like syntax - I would do something like:

select * from <> where x like 'hsa'.

Unfortunately, sqldf does not support that syntax.

Or similarly:

selectedRows <- df[ , df$x %like% "hsa-"]

Which of course doesn't work.

Can somebody please help me with this?

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

I notice that you mention a function %like% in your current approach. I don't know if that's a reference to the %like% from "data.table", but if it is, you can definitely use it as follows.

Note that the object does not have to be a data.table (but also remember that subsetting approaches for data.frames and data.tables are not identical):

library(data.table)
mtcars[rownames(mtcars) %like% "Merc", ]
iris[iris$Species %like% "osa", ]

If that is what you had, then perhaps you had just mixed up row and column positions for subsetting data.


If you don't want to load a package, you can try using grep() to search for the string you're matching. Here's an example with the mtcars dataset, where we are matching all rows where the row names includes "Merc":

mtcars[grep("Merc", rownames(mtcars)), ]
             mpg cyl  disp  hp drat   wt qsec vs am gear carb
# Merc 240D   24.4   4 146.7  62 3.69 3.19 20.0  1  0    4    2
# Merc 230    22.8   4 140.8  95 3.92 3.15 22.9  1  0    4    2
# Merc 280    19.2   6 167.6 123 3.92 3.44 18.3  1  0    4    4
# Merc 280C   17.8   6 167.6 123 3.92 3.44 18.9  1  0    4    4
# Merc 450SE  16.4   8 275.8 180 3.07 4.07 17.4  0  0    3    3
# Merc 450SL  17.3   8 275.8 180 3.07 3.73 17.6  0  0    3    3
# Merc 450SLC 15.2   8 275.8 180 3.07 3.78 18.0  0  0    3    3

And, another example, using the iris dataset searching for the string osa:

irisSubset <- iris[grep("osa", iris$Species), ]
head(irisSubset)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa

For your problem try:

selectedRows <- conservedData[grep("hsa-", conservedData$miRNA), ]

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...