Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


regex - Remove parentheses and text within from strings in R

In R, I have a data frame of company names such as:

companies  <-  data.frame(Name=c("Company A Inc (COMPA)","Company B (BEELINE)", "Company C Inc. (Coco)", "Company D Inc.", "Company E"))

I want to remove the parentheses and the text inside them, ending up with the following:

                  Name
1        Company A Inc 
2            Company B
3       Company C Inc.
4       Company D Inc.
5            Company E

One approach I tried was to split each string at "(" and then combine the pieces with ldply (from plyr):

library(plyr)
companies$Name <- as.character(companies$Name)  # the column may be a factor
parts <- strsplit(companies$Name, "\\(")        # "(" must be escaped in a regex
ldply(parts)

But because not all company names contain a parenthesized part, the split pieces have different lengths and ldply fails:

Error in list_to_dataframe(res, attr(.data, "split_labels"), .id, id_as_factor) : 
  Results do not have equal lengths

I'm not married to the strsplit solution. Anything that removes the parentheses and the text inside them would be fine.
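The ldply error comes from the ragged list that strsplit() returns: names containing "(" split into two pieces, the others into one. If you do want to keep the split-based approach, a minimal sketch is to keep only the first piece of each split and trim the trailing space. This assumes base R 3.2.0 or later (for lengths() and trimws()); the variable name parts is just illustrative.

```r
companies <- data.frame(
  Name = c("Company A Inc (COMPA)", "Company B (BEELINE)", "Company C Inc. (Coco)",
           "Company D Inc.", "Company E"),
  stringsAsFactors = FALSE
)

# strsplit() gives 2 pieces when "(" is present and 1 piece otherwise,
# which is why ldply() complains about unequal lengths
parts <- strsplit(companies$Name, "\\(")
print(lengths(parts))

# Keep only the text before the first "(" and drop surrounding whitespace
companies$Name <- trimws(vapply(parts, `[`, character(1), 1))
print(companies$Name)
```

Avoiding c as a variable name also keeps the base c() function from being shadowed.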

1 Answer

A single gsub call should work here:

gsub("\\s*\\([^\\)]+\\)", "", as.character(companies$Name))

# [1] "Company A Inc"  "Company B"      "Company C Inc."
# [4] "Company D Inc." "Company E" 

This replaces every "(...)" group, together with any whitespace immediately before it, with the empty string, so no trailing space is left behind. The pattern looks worse than it is: parentheses are special characters in regular expressions and must be escaped with a backslash, and inside an R string each backslash must itself be doubled.
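To make the pattern easier to read and reuse, here is a minimal sketch that wraps it in a helper and writes the result back into the data frame. The name strip_parens is hypothetical, and the pattern is a slight variant of the one above: [^)]* avoids escaping inside the bracket expression and also removes empty "()" groups.

```r
# Hypothetical helper; a slight variant of the answer's pattern
strip_parens <- function(x) {
  # \\s*   any whitespace immediately before the group
  # \\(    a literal "("
  # [^)]*  any run of characters other than ")" (possibly empty)
  # \\)    a literal ")"
  gsub("\\s*\\([^)]*\\)", "", as.character(x))
}

companies <- data.frame(
  Name = c("Company A Inc (COMPA)", "Company B (BEELINE)", "Company C Inc. (Coco)",
           "Company D Inc.", "Company E"),
  stringsAsFactors = FALSE
)
companies$Name <- strip_parens(companies$Name)
print(companies)

# gsub() removes every group in a string, not just the first one
print(strip_parens("Company F (X) Ltd (Y)"))
```

Nested parentheses such as "A (b (c))" are not handled by this pattern: the match stops at the first ")", which leaves a stray ")" behind.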
