Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

R multiple fuzzy match agrep create variable

New to R. I would like to create a test variable (yes/no) that checks whether the first name OR the last name fuzzy-matches the email address. If so, that row should get a 'yes'.

Data Example:

id firstname lastname email address match
1 patrick boyles [email protected] yes
2 zeke cosmos [email protected] yes
3 foo foo [email protected] no

I understand that I need to use agrep. What confuses me is how to tell R to check two columns (first name and last name) against the email address, comparing only values within the same row.

Thanks, the newbie
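For reference, the row-wise agrep check asked about above can be sketched in base R with agrepl() and mapply(). This is a minimal sketch, not the accepted answer's method: the email addresses are hypothetical (the real ones on this page are redacted), and the tolerance max.distance = 1 is an assumption to tune on real data. mapply() pairs each name with the address from its own row, which is what keeps the comparison within a row.

```r
# Hypothetical sample data: the real addresses in the question are redacted
df <- data.frame(
  id           = 1:3,
  firstname    = c("patrick", "zeke", "foo"),
  lastname     = c("boyles", "cosmos", "foo"),
  emailaddress = c("pboyles@example.com", "zeke.c@example.com", "jdoe@example.com"),
  stringsAsFactors = FALSE
)

# Local part of each address (everything before the "@")
local_part <- sub("@.*$", "", df$emailaddress)

# TRUE if `name` approximately occurs inside `target`;
# max.distance = 1 allows one insertion, deletion or substitution (assumed tolerance)
fuzzy_in <- function(name, target) {
  agrepl(name, target, max.distance = 1, ignore.case = TRUE)
}

# mapply() walks the columns in parallel, so each name is only
# compared with the address from the same row
first_hit <- mapply(fuzzy_in, df$firstname, local_part)
last_hit  <- mapply(fuzzy_in, df$lastname, local_part)

# "yes" if the first name OR the last name matches
df$match <- ifelse(first_hit | last_hit, "yes", "no")
df
```

With these made-up addresses, rows 1 and 2 match on the last name and first name respectively, and row 3 gets "no".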


1 Answer


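Here is something to start with. Instead of agrep, it uses the stringdist package to compare the concatenated first and last name with the part of the email address before the "@":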

library(stringdist) # install.packages("stringdist") first, if needed
df <- read.table(header = TRUE, text = "id firstname lastname emailaddress match
1 patrick boyles [email protected] yes
2 zeke cosmos [email protected] yes
3 foo foo [email protected] no")
# Compare firstname + lastname with the local part of the address (the text
# before "@"). The "lcs" distance counts the characters that cannot be paired up
# in order; 7 or fewer unpaired characters is treated as a match.
df$match2 <- ifelse(with(df, stringdist(a = paste0(firstname, lastname),
                                        b = sub("(.*)@.*", "\\1", emailaddress),
                                        method = "lcs")) <= 7,
                    "yes", "no")
df
#   id firstname lastname       emailaddress match match2
# 1  1   patrick   boyles [email protected]   yes    yes
# 2  2      zeke   cosmos     [email protected]   yes    yes
# 3  3       foo      foo     [email protected]    no     no
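To see what the cutoff of 7 measures, here is a base-R sketch of the quantity that stringdist(method = "lcs") reports: the number of characters in the two strings that cannot be paired up in order, i.e. nchar(a) + nchar(b) minus twice the length of their longest common subsequence. The helper names lcs_length() and lcs_dist() are hypothetical, and the addresses are made up because the real ones on this page are redacted.

```r
# Length of the longest common subsequence of two strings (dynamic programming)
lcs_length <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  m <- matrix(0L, nrow = length(x) + 1, ncol = length(y) + 1)
  for (i in seq_along(x)) {
    for (j in seq_along(y)) {
      m[i + 1, j + 1] <- if (x[i] == y[j]) m[i, j] + 1L else max(m[i, j + 1], m[i + 1, j])
    }
  }
  m[length(x) + 1, length(y) + 1]
}

# Unpaired characters, i.e. edit distance using only insertions and deletions
lcs_dist <- function(a, b) nchar(a) + nchar(b) - 2L * lcs_length(a, b)

# Hypothetical data: the addresses on this page are redacted
full_name  <- paste0(c("patrick", "zeke", "foo"), c("boyles", "cosmos", "foo"))
emails     <- c("pboyles@example.com", "zeke.c@example.com", "jdoe@example.com")
local_part <- sub("(.*)@.*", "\\1", emails)    # "pboyles" "zeke.c" "jdoe"

d <- mapply(lcs_dist, full_name, local_part)   # distances 6, 6, 8
ifelse(d <= 7, "yes", "no")                    # "yes" "yes" "no"
```

Because the distance counts every unpaired character, a long name compared with a short local part (such as an initial plus surname) can exceed a fixed cutoff of 7 even when it is a genuine match, so the threshold is worth checking against real data.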
