This grammar does not parse "A and B", let alone "A B", starting from booleanAndExpression or filter. I suspect that your driver code, which you don't provide, does not check error conditions from the parser and lexer. I cannot tell which target you are using, so it is hard to know what your build environment is or whether it is working. I would look into that as well.
Here is a complete C# app with your grammar as is (link). The driver code is generated by my Antlr4BuildTasks.Templates, and uses Antlr v4.9 and Antlr4BuildTasks, which I also wrote. The driver checks for parse errors and prints the tokens and parse trees for your two examples. For both "A and B" and "A B", the parse fails. Here is the output from running your grammar on these inputs.
A and B
line 1:2 no viable alternative at input 'Aand'
line 1:2 no viable alternative at input 'Aand'
line 1:7 no viable alternative at input 'B'
line 1:7 no viable alternative at input 'B'
error in parse.
[@0,0:0='A',<25>,1:0]
[@1,2:4='and',<18>,1:2]
[@2,6:6='B',<25>,1:6]
[@3,7:6='<EOF>',<-1>,1:7]
( booleanAndExpression
( simpleFilter
)
( simpleFilter
( STR i=0 txt=A tt=25 DEFAULT_TOKEN_CHANNEL
) )
( AND i=1 txt=and tt=18 DEFAULT_TOKEN_CHANNEL
)
( simpleFilter
)
( simpleFilter
( STR i=2 txt=B tt=25 DEFAULT_TOKEN_CHANNEL
) ) )
A B
line 1:2 no viable alternative at input 'AB'
line 1:2 no viable alternative at input 'AB'
error in parse.
[@0,0:0='A',<25>,1:0]
[@1,2:2='B',<25>,1:2]
[@2,3:2='<EOF>',<-1>,1:3]
( booleanAndExpression
( simpleFilter
)
( simpleFilter
( STR i=0 txt=A tt=25 DEFAULT_TOKEN_CHANNEL
) )
( simpleFilter
( STR i=1 txt=B tt=25 DEFAULT_TOKEN_CHANNEL
) ) )
Here are a few suggestions to fix your grammar. I've made some of these changes here.
You should get in the habit of reading the warning messages from the Antlr tool and addressing them. The Antlr Java tool warns: ...Filter.g4:76:0: One of the token AND_MUST values unreachable. AND is always overlapped by token AND. The rule AND_MUST: 'AND'; can never match. The lexer follows two rules: (a) Antlr tries to match as many input chars as possible when creating a token; (b) when two or more lexer rules can match the same input, the one defined first wins. Since rules AND and AND_MUST match the same string, and AND occurs before AND_MUST, AND will always match and AND_MUST never will. You should remove the AND_MUST rule.
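The two lexer rules (a) and (b) can be illustrated with a small Python sketch. This is not Antlr and not your grammar: the rule list below is a hypothetical stand-in, and the function name tokenize is my own. It only mimics "longest match, then first-defined rule" so you can see why a later rule with the same pattern is unreachable.

```python
import re

# Hypothetical rule list in definition order (an assumption, not the original grammar).
RULES = [
    ("AND", r"AND"),
    ("AND_MUST", r"AND"),    # same pattern as AND, but defined later
    ("STR", r"[A-Za-z]+"),
    ("WS", r"\s+"),
]

def tokenize(text):
    """Return (rule_name, lexeme) pairs: longest match first, earliest rule on ties."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = re.compile(pattern).match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on a tie the rule defined first is kept.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":  # whitespace is dropped, like "-> skip"
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens
```

Calling tokenize("A AND ANDROID") yields an AND token for "AND" and a STR token for "ANDROID" (the longer match wins there), and no input can ever produce AND_MUST.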
Antlr parses the input only until the start rule is satisfied. If a prefix of the input satisfies the rule, the parser stops there and does not complain about the rest of the input. A parse from booleanAndExpression, which is not augmented with EOF, will parse "A" in the input "A B" and return a valid parse. To prevent this behavior, always parse with an EOF-augmented rule, e.g., filter. The original filter rule allowed partial input matching, so I adjusted the rule with parentheses (see link).
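To make the prefix-matching point concrete, here is a toy hand-written parser in Python. It is only a sketch under assumptions: tokens are plain type names, simpleFilter is reduced to a single STR token, and the function names parse_and_expression and parse_filter are hypothetical. The first function stops as soon as its rule is satisfied; the second adds the end-of-input check that an EOF-augmented rule gives you.

```python
def parse_and_expression(tokens, pos=0):
    """Parse STR (AND STR)* from pos and return the index just past the match."""
    if pos >= len(tokens) or tokens[pos] != "STR":
        raise SyntaxError(f"expected STR at token {pos}")
    pos += 1
    while pos < len(tokens) and tokens[pos] == "AND":
        pos += 1
        if pos >= len(tokens) or tokens[pos] != "STR":
            raise SyntaxError(f"expected STR at token {pos}")
        pos += 1
    # The rule is satisfied here; anything left over is simply not looked at.
    return pos

def parse_filter(tokens):
    """Parse the same expression, then insist that all input was consumed."""
    pos = parse_and_expression(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"extraneous input at token {pos}")
    return pos
```

With the token types for "A B", parse_and_expression(["STR", "STR"]) returns after the first token without any error, while parse_filter(["STR", "STR"]) raises a SyntaxError for the leftover token.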
When I read the rules for logicalExpression and booleanAndExpression, it looks like you are using factoring to set up the precedence of the OR and AND operators. But later on, you use alt-rule precedence for EQ, NE, GT, GE, and other operators. It looks like you copied the expression rules from two different grammars and tried to combine them for your purpose. You should choose one way or the other to set up the precedence for your operators. Factoring is easier to control, but alt-rule precedence results in smaller trees and faster parses.
As I mentioned before, the rule booleanAndExpression : simpleFilter ((AND)? simpleFilter )* ; is wrong. If you use AND?, then the sentential form simpleFilter simpleFilter is valid, so A B would be valid. I changed this in the modified grammar, but you can check the behavior by putting (AND)? or just AND? back into the updated grammar I provide and rerunning.
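The effect of the optional AND can be checked without Antlr using a small recognizer. This is a sketch under assumptions: simpleFilter is reduced to a single STR token, the input is a list of token type names, and the function name accepts is my own. The flag and_optional switches between simpleFilter (AND simpleFilter)* EOF and simpleFilter (AND? simpleFilter)* EOF.

```python
def accepts(tokens, and_optional):
    """Return True if the token types match simpleFilter (AND simpleFilter)* EOF.

    When and_optional is True, the AND between operands may be omitted,
    which corresponds to writing AND? in the rule.
    """
    pos = 0
    if pos >= len(tokens) or tokens[pos] != "STR":
        return False
    pos += 1
    while pos < len(tokens):
        if tokens[pos] == "AND":
            pos += 1                 # consume the operator
        elif not and_optional:
            return False             # operator is required but missing
        if pos >= len(tokens) or tokens[pos] != "STR":
            return False             # an operand must follow
        pos += 1
    return True
```

For the token types of "A B", accepts(["STR", "STR"], and_optional=True) is True and accepts(["STR", "STR"], and_optional=False) is False, while "A and B" is accepted either way.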
Rule simpleFilter contains an error: it must include the base case for STR.
You sprinkle Antlr labels throughout the grammar. I would not recommend that you do this. It clutters the grammar and makes it unportable to other parser generators. To distinguish one alt from another, use the Antlr accessor functions that the tool generates. For example, to distinguish between "( A )" and "A" in the first rule, check whether "context.LPAREN()" is not null.
Instead of using "-> skip" in lexer rules, use an off-channel "-> channel(OFF_CHANNEL)". This keeps the whitespace in the token stream, so error messages avoid "run-ins": instead of line 1:2 no viable alternative at input 'AB', the message will read line 1:2 no viable alternative at input 'A B'. However, this would require you to split the combined grammar into separate lexer and parser grammars, which is probably not necessary at this point.
There was another "Answer" posted after my post, which recommends adding error productions to the grammar. I would not recommend that you do this. Instead, if you want better error messages, you should modify the error reporter for the parse. You can see the example I provide. A CFG should define the language. It should not be perverted for an implementation detail.
Regards,
Ken