I have a dataset that looks as follows:
ex <- data.frame("id" =c(rep(1234,6)),
"case" = c(45,45,45,92,92,93),
"cpt" = c(20600,25263,8456,2345,9023,"C1245"),
"date"=c("2019-05-21", "2019-08-22","2019-04-12",
"2019-03-01", "2019-02-18", "2019-03-12"),
"window"=c(10,20,NA,45,NA,NA),
"pay"=c(520,140,2200,230,600,700))
I essentially have two questions:
How can I update the cpt column with the cpt value corresponding to the row with the highest pay grouped by case only focusing on any row containing a cpt value beginning with number 2 AND following this, how I can update the window column to replace any value that isn't the maximum with NA (also only focusing on rows where cpt value starts with 2)
I would like the final output to look like so:
ex1 <- data.frame("id" =c(rep(1234,6)),
"case" = c(45,45,45,92,92,93),
"cpt" = c(20600,20600,20600,2345,2345,"C1245"),
"date"=c("2019-05-21", "2019-08-22","2019-04-12",
"2019-03-01", "2019-02-18", "2019-03-12"),
"window"=c(10,NA,NA,45,NA,NA),
"pay"=c(520,140,2200,230,600,700))
I have two codes
ex[, cpt:=cpt[which.max(pay)], by=case]
ex %>% group_by(case) %>% mutate(window=replace(window, (pay < max(pay)), NA)) %>%
setDT()
but both of these codes apply to the overall max value and will produce
ex2 <- data.frame("id" =c(rep(1234,6)),
"case" = c(45,45,45,92,92,93),
"cpt" = c(8456,8456,8456,9023,9023,"C1245"),
"date"=c("2019-05-21", "2019-08-22","2019-04-12",
"2019-03-01", "2019-02-18", "2019-03-12"),
"window"=c(NA,NA,NA,NA,NA,NA),
"pay"=c(520,140,2200,230,600,700))
Is there any way I could alter the codes I have to get the output I'm looking for? Or any other code that might be more efficient? Thanks in advance!