Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


antlr4 - How to allow all chars inside double quotes in an ANTLR lexer to recognize a regex value?

I want to extend my grammar so that a regex value can be defined inside double quotes. Here is an example of the input I want to allow:

matches(value, test| ".*foobar[A-Z]");

Currently this is not recognized, because the dot and the brackets are tokenized by their own lexer rules first.

[Image: parse tree of the input above]

How can I resolve this? I tried adding a new rule ANY: . ; but that did not solve it. Any ideas?

This is my grammar

    grammar FEL;

    prog: expr+ SEMI? EOF;
    expr:
                 statement                     #StatementExpr
                 |NOT expr                     #NotExpr
                 | expr AND expr               #AndExpr
                 | expr (OR | XOR) expr        #OrExpr
                 | function                    #FunctionExpr
                 | LPAREN expr RPAREN          #ParenExpr
                 | writeCommand                #WriteExpr
     ;

    writeCommand: setCommand | setIfCommand;
    statement: ID '=' getCommand  NEWLINE      #Assign;
    setCommand: 'set' LPAREN variable = variableType '|' value = parameter RPAREN;
    setIfCommand: 'setIf' LPAREN variableType '|' expr '?' p1 = parameter ':' p2 = parameter RPAREN;

    getCommand:         getFieldValue                       #FieldValue
                                | getInstanceAttribValue    #InstanceAttribValue
                                | getFormAttribValue        #FormAttributeValue
                                | getMandatorAttribValue    #MandatorAttributeValue
                                ;

    getFieldValue: 'getFieldValue' LPAREN instanceID=ID COMMA fieldname=ID RPAREN;
    getInstanceAttribValue: 'getInstanceAttrib' LPAREN instanceId=ID COMMA moduleId=ID COMMA attribname=ID RPAREN;
    getFormAttribValue: 'getFormAttrib' LPAREN formId=ID COMMA moduleId=ID COMMA attribname=ID RPAREN;
    getMandatorAttribValue: 'getMandatorAttrib' LPAREN mandator=ID COMMA moduleId=ID COMMA attribname=ID RPAREN;
    parameter:
                variableType
                | constType
                ;
    anyType: DoubleQuote ANY DoubleQuote;
    pdixFuncton:ID;
    constType:
                    ID                  #ID_Without
                    | '"'  ID '"'       #ID_WITH
                    | INT               #INT_VALUE
                    | DIGIT_DOT         #DIGIT_DOT_VALUE
                    ;
    variableType:
                        valueType                   #Variable_Value
                        |instanceType               #Variable_Instance
                        |formType                   #Variable_Form
                        |bufferType                 #Variable_Buffer
                        |instanceAttribType         #Variable_Instance_Attrib
                        |formAttribType             #Variable_Form_Attrib
                        |mandatorAttribType         #Variable_Mandator_Attrib
                        |instanceAttachmentType     #Variable_Instance_Attachment
                        |formAttachmentType         #Variable_Form_Attachment
                        |mandatorAttachmentType     #Variable_Mandator_Attachment
                        ;
    valueType:'value' COMMA par=parameter (COMMA functionParameter)?;
    instanceType: 'instance' COMMA instanceParameter;
    formType: 'form' COMMA formParameter;
    bufferType: 'buffer' COMMA id=ID;
    instanceParameter: 'instanceId'
                                    | 'instanceKey'
                                    | 'firstpenId'
                                    | 'lastpenId'
                                    | 'lastUpdate'
                                    | 'started'
                                    ;
    formParameter: 'formId'
                                |'formKey'
                                |'lastUpdate'
                                ;
    functionParameter: 'lastPen'
                                    | 'fieldGroup'
                                    | ' fieldType'
                                    | 'fieldSource'
                                    | 'updateId'
                                    | 'sessionId'
                                    | 'icrConfidence'
                                    | 'icrRecognition'
                                    |  'lastUpdate';

    instanceAttribType: p = ('instattrib' | 'instanceattrib') COMMA attributeType;
    formAttribType:'formattrib' COMMA attributeType;
    mandatorAttribType: 'mandatorattrib' COMMA attributeType;
    instanceAttachmentType:('instattachment' | 'instanceatt') COMMA attachmentType;
    formAttachmentType:'formAtt' COMMA attachmentType;
    mandatorAttachmentType: 'mandatoratt' COMMA attachmentType;


    attributeType: module = ID '#' attribName = ID;
    attachmentType: name = ID '#' page = INT '#' index = INT;

     function:
                    commandIsSet
                    |commandIsEmpty
                    |commandIsEqual
                    |commandIsNumLessEqual
                    |commandIsNumLess
                    |commandIsNumGreaterEqual
                    |commandIsNumGreater
                    |commandIsNumEqual
                    |commandIsLess
                    |commandIsLessEqual
                    |commandIsGreater
                    |commandIsGreaterEqual
                    |commandMatches
                    |commandContains
                    |commandEndsWith
                    |commandStartsWith
                    ;
    commandIsSet: IS_SET LPAREN parameter RPAREN;
    commandIsEmpty: IS_EMPTY LPAREN parameter RPAREN;
    commandIsEqual: IS_EQUAL LPAREN p1 = parameter '|' p2 = parameter RPAREN;
    commandStartsWith: 'startsWith' LPAREN p1 = parameter '|' p2 = parameter RPAREN;
    commandEndsWith: 'endsWith' LPAREN p1 = parameter '|' p2 = parameter RPAREN;
    commandContains: 'contains' LPAREN p1 = parameter '|' p2 = parameter RPAREN;
    commandMatches: 'matches' LPAREN p1 = parameter '|' p2 = parameter RPAREN;
    commandIsLess: 'isLess' LPAREN p1 = parameter '|' p2 = parameter RPAREN;
    commandIsLessEqual: 'isLessEqual' LPAREN p1 = parameter '|' p2 = parameter RPAREN;
    commandIsGreater: 'isGreater' LPAREN p1 = parameter '|' p2 = parameter RPAREN;
    commandIsGreaterEqual: 'isGreaterEqual' LPAREN p1 = parameter '|' p2 = parameter RPAREN;
    commandIsNumEqual: 'isNumEqual' LPAREN p1 = parameter '|' p2 = parameter RPAREN;
    commandIsNumGreater: 'isNumGreater' LPAREN p1 = parameter '|' p2 = parameter RPAREN;
    commandIsNumGreaterEqual: 'isNumGreaterEqual' LPAREN p1 = parameter '|' p2 = parameter RPAREN;
    commandIsNumLess: 'isNumLess' LPAREN p1 = parameter '|' p2 = parameter RPAREN;
    commandIsNumLessEqual: 'isNumLessEqual' LPAREN p1 = parameter '|' p2 = parameter RPAREN;

    /*
    stringFunctionType:
        a=substringStrFunction
        |   a=cutStrFunction
        |   a=replaceStrFunction
        |   a=reformatDateStrFunction
        |   a=translateStrFunction
        |   a=fillStrFunction
        |   a=concatStrFunction
        |   a=justifyStrFunction
        |   a=ifElseStrFunction
        |   a=tokenStrFunction
        |   a=toLowerFunction
        |   a=toUpperFunction
        |   a=trimFunction
    ;
    */

    LPAREN : '(';
    RPAREN : ')';
    LBRACE : '{';
    RBRACE : '}';
    LBRACK : '[';
    RBRACK : ']';
    SEMI : ';';
    COMMA : ',';
    DOT : '.';
    ASSIGN : '=';
    GT : '>';
    LT : '<';
    BANG : '!';
    TILDE : '~';
    QUESTION : '?';
    COLON : ':';
    EQUAL : '==';
    LE : '<=';
    GE : '>=';
    NOTEQUAL : '!=';
    AND : 'and';
    OR : 'or';
    XOR :'xor';
    NOT :'not'  ;
    INC : '++';
    DEC : '--';
    ADD : '+';
    SUB : '-';
    MUL : '*';
    DIV : '/';

    INT: [0-9]+;
    DIGIT_DOT: FloatNumber;

    IS_SET:'isSet';
    IS_EMPTY:'isEmpty';
    IS_EQUAL:'isEqual';
    WS: (' '|'\t' | NEWLINE | '\r' )+ -> skip;
    NEWLINE: '\n';
    ID: JavaLetter JavaLetterOrDigit* | ANY;
    ANY: . ;


    fragment FloatNumber: ('0'..'9')+ ('.' ('0'..'9')+)?;

    fragment
    JavaLetter
    :   [a-zA-Z$_] // these are the "java letters" below 0xFF
    |   // covers all characters above 0xFF which are not a surrogate
        ~[\u0000-\u00FF\uD800-\uDBFF]
        {Character.isJavaIdentifierStart(_input.LA(-1))}?
    |   // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF
        [\uD800-\uDBFF] [\uDC00-\uDFFF]
        {Character.isJavaIdentifierStart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))}?
    ;

    fragment
    JavaLetterOrDigit
    :   [a-zA-Z0-9$_] // these are the "java letters or digits" below 0xFF
    |   // covers all characters above 0xFF which are not a surrogate
        ~[\u0000-\u00FF\uD800-\uDBFF]
        {Character.isJavaIdentifierPart(_input.LA(-1))}?
    |   // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF
        [\uD800-\uDBFF] [\uDC00-\uDFFF]
        {Character.isJavaIdentifierPart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))}?
    ;
    fragment DoubleQuote: '"' ;   // Hard to read otherwise.
Question from: https://stackoverflow.com/questions/65854354/how-to-allow-all-chars-in-antlr-lexxer-inside-double-quotes-to-recognize-a-regex


1 Answer


Remove the ANY lexer rule from your ID rule: it makes no sense to let input like ^ become an ID.

Strings are usually created in the lexer rather than in the parser. Something like this should do it:

    anyType : STRING;

    constType
     : ID        #ID_Without
     | STRING    #ID_WITH
     | INT       #INT_VALUE
     | DIGIT_DOT #DIGIT_DOT_VALUE
     ;

    STRING : '"' ( ~[\\"\r\n] | '\\' ~[\r\n] )* '"';

Also, your functionParameter rule contains a ' fieldType' token. That should probably be 'fieldType', without the leading space.
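
To get a feel for what a STRING rule of this shape accepts, here is a minimal sketch in plain Java that mirrors it with java.util.regex. It is not ANTLR-generated code, only an illustration: the class name StringTokenSketch and the helpers firstStringToken and stripQuotes are hypothetical names made up for this example. It assumes the rule is "a double quote, then any run of characters that are neither a backslash, a quote, nor a line break (or a backslash followed by any non-line-break character), then a closing double quote".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the grammar or of ANTLR.
class StringTokenSketch {

    // Regex mirror of a rule shaped like:
    //   '"' ( ~[\\"\r\n] | '\\' ~[\r\n] )* '"'
    static final Pattern STRING =
            Pattern.compile("\"(?:[^\\\\\"\\r\\n]|\\\\[^\\r\\n])*\"");

    // Returns the first quoted span in the input (quotes included), or null if there is none.
    static String firstStringToken(String input) {
        Matcher m = STRING.matcher(input);
        return m.find() ? m.group() : null;
    }

    // Drops the surrounding quotes; backslash sequences are left untouched
    // so that regex escapes such as \d survive as written.
    static String stripQuotes(String token) {
        return token.substring(1, token.length() - 1);
    }

    public static void main(String[] args) {
        String input = "matches(value, test| \".*foobar[A-Z]\");";
        String token = firstStringToken(input);
        String regex = stripQuotes(token);

        System.out.println(token);                               // ".*foobar[A-Z]"
        System.out.println(Pattern.matches(regex, "xxfoobarQ")); // true
    }
}
```

The point of the sketch is that the dot and the brackets never get a chance to become separate tokens, because the whole quoted span is consumed in one go. In a listener or visitor for the real grammar, the same quote stripping would be applied to the text of the STRING token before handing it to Pattern.compile.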

