Given a regular expression containing capture groups (parentheses) and a string, how can I obtain all the substrings matching the capture groups, i.e., the substrings usually referenced by "\1", "\2"?
Example: consider a regex capturing digits preceded by "xy":
s <- "xy1234wz98xy567"
r <- "xy(\\d+)"
Desired result:
[1] "1234" "567"
First attempt: gregexpr:
regmatches(s,gregexpr(r,s))
#[[1]]
#[1] "xy1234" "xy567"
Not what I want because it returns the substrings matching the entire pattern.
Second try: regexec:
regmatches(s,regexec("xy(\\d+)",s))
#[[1]]
#[1] "xy1234" "1234"
Not what I want because it returns only the first occurrence of a match for the entire pattern and its capture group.
If there were a gregexec function, extending regexec as gregexpr extends regexpr, my problem would be solved.
So the question is: how to retrieve all substrings (or indices that can be passed to regmatches as in the examples above) matching capture groups in an arbitrary regular expression?
Note: the pattern for r given above is just a silly example; it must remain arbitrary.
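One possible approach, offered as a minimal sketch rather than a definitive answer, is to combine the two attempts above: use gregexpr to collect every substring matching the entire pattern, then run regexec on each of those substrings to split out the capture groups. The variable names whole, parts, and groups are illustrative only. The sketch assumes the pattern still matches when applied to an isolated whole match, which may not hold for patterns that rely on lookaround or anchors.

```r
s <- "xy1234wz98xy567"
r <- "xy(\\d+)"

# Step 1: every substring matching the entire pattern.
whole <- regmatches(s, gregexpr(r, s))[[1]]

# Step 2: re-match each substring on its own; each list element holds
# the whole match followed by its capture groups.
parts <- regmatches(whole, regexec(r, whole))

# Step 3: drop the whole match and stack the groups, giving one row
# per match and one column per capture group.
groups <- do.call(rbind, lapply(parts, function(p) p[-1]))

groups[, 1]
#[1] "1234" "567"
```

With several capture groups, groups has one column per group, so groups[, 2] would hold the matches for the second group. Note also that newer versions of R (4.1.0 and later, as far as I know) include a gregexec function in base R, whose result can be passed to regmatches; it is worth checking ?gregexec if it is available.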