c++ - How to handle multi-line rules for gor parsing bnf grammar using boost spirit qi

Question

Welcome To Ask or Share your Answers For Others

c++ - How to handle multi-line rules for gor parsing bnf grammar using boost spirit qi

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

c++ - How to handle multi-line rules for gor parsing bnf grammar using boost spirit qi

Assuming I have a BNF grammar like this

<code>   ::=  <letter><digit> | <letter><digit><code>
<letter> ::= a | b | c | d | e
             | f | g | h | i
<digit>  ::= 0 | 1 | 2 | 3 |
             4

If you look at the <letter> rule, its continuation starts with the | but that of the <digit> rule starts with the production with | appearing at the end of the previous line. I also don't want to use a particular symbol to represent the end of a rule.

How do check if a rule as ended using the Boost Spirit Qi for implementation.

I have just gone through the tutorial on the boost page and wondering how I am going to handle this.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-23T18:42:28+0000

Wikipedia

BNF syntax can only represent a rule in one line, whereas in EBNF a terminating character, the semicolon character “;” marks the end of a rule.

So the simple answer is: the input isn't BNF.

Iff you want to support it anyways (at your own peril :)) you'll have to make it so. So, let's write a simplistic BFN grammar, literally mapping from Wikipedia BNF

<syntax>         ::= <rule> | <rule> <syntax>
<rule>           ::= <opt-whitespace> "<" <rule-name> ">" <opt-whitespace> "::=" <opt-whitespace> <expression> <line-end>
<opt-whitespace> ::= " " <opt-whitespace> | ""
<expression>     ::= <list> | <list> <opt-whitespace> "|" <opt-whitespace> <expression>
<line-end>       ::= <opt-whitespace> <EOL> | <line-end> <line-end>
<list>           ::= <term> | <term> <opt-whitespace> <list>
<term>           ::= <literal> | "<" <rule-name> ">"
<literal>        ::= '"' <text1> '"' | "'" <text2> "'"
<text1>          ::= "" | <character1> <text1>
<text2>          ::= '' | <character2> <text2>
<character>      ::= <letter> | <digit> | <symbol>
<letter>         ::= "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z" | "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
<digit>          ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
<symbol>         ::=  "|" | " " | "!" | "#" | "$" | "%" | "&" | "(" | ")" | "*" | "+" | "," | "-" | "." | "/" | ":" | ";" | ">" | "=" | "<" | "?" | "@" | "[" | "" | "]" | "^" | "_" | "`" | "{" | "}" | "~"
<character1>     ::= <character> | "'"
<character2>     ::= <character> | '"'
<rule-name>      ::= <letter> | <rule-name> <rule-char>
<rule-char>      ::= <letter> | <digit> | "-"

It could look like this:

template <typename Iterator>
struct BNF: qi::grammar<Iterator, Ast::Syntax()> {
    BNF(): BNF::base_type(start) {
        using namespace qi;
        start = skip(blank) [ _rule % +eol ];

        _rule       = _rule_name >> "::=" >> _expression;
        _expression = _list % '|';
        _list       = +_term;
        _term       = _literal | _rule_name;
        _literal    = '"' >> *(_character - '"') >> '"'
                    | "'" >> *(_character - "'") >> "'";
        _character  = alnum | char_(""'| !#$%&()*+,./:;>=<?@]\^_`{}~[-");
        _rule_name  = '<' >> (alpha >> *(alnum | char_('-'))) >> '>';

        BOOST_SPIRIT_DEBUG_NODES(
            (_rule)(_expression)(_list)(_term)
            (_literal)(_character)
            (_rule_name))
    }

  private:
    qi::rule<Iterator, Ast::Syntax()>     start;
    qi::rule<Iterator, Ast::Rule(),       qi::blank_type> _rule;
    qi::rule<Iterator, Ast::Expression(), qi::blank_type> _expression;
    qi::rule<Iterator, Ast::List(),       qi::blank_type> _list;
    // lexemes
    qi::rule<Iterator, Ast::Term()>       _term;
    qi::rule<Iterator, Ast::Name()>       _rule_name;
    qi::rule<Iterator, std::string()>     _literal;
    qi::rule<Iterator, char()>            _character;
};

Now it will parse your sample (corrected to be BNF):

    std::string const input = R"(<code>   ::=  <letter><digit> | <letter><digit><code>
<letter> ::= "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i"
<digit>  ::= "0" | "1" | "2" | "3" | "4"
    )";

Live On Compiler Explorer

Prints:

code ::= {<letter>, <digit>} | {<letter>, <digit>, <code>}
letter ::= {a} | {b} | {c} | {d} | {e} | {f} | {g} | {h} | {i}
digit ::= {0} | {1} | {2} | {3} | {4}
Remaining: "
    "

Support Line-Wrapped Rules

The best way is to not accept them - since the grammar wasn't designed for it unlike e.g. EBNF.

You can force the issue by doing a negative look-ahead in the skipper:

_skipper = blank | (eol >> !_rule);
start = skip(_skipper) [ _rule % +eol ];

For technical reasons (Boost spirit skipper issues) that doesn't compile, so we need to feed it a placeholder skipper inside the look-ahead:

_blank = blank;
_skipper = blank | (eol >> !skip(_blank.alias()) [ _rule ]);
start = skip(_skipper.alias()) [ _rule % +eol ];

Now it parses the same but with various line-breaks:

    std::string const input = R"(<code>   ::=  <letter><digit> | <letter><digit><code>
<letter> ::= "a" | "b" | "c" | "d" | "e"
           | "f" | "g" | "h" | "i"
<digit>  ::= "0" | "1" | "2" | "3" |
             "4"
    )";

Printing:

code ::= {<letter>, <digit>} | {<letter>, <digit>, <code>}
letter ::= {a} | {b} | {c} | {d} | {e} | {f} | {g} | {h} | {i}   
digit ::= {0} | {1} | {2} | {3} | {4}

FULL LISTING

Compiler Explorer

//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted.hpp>
#include <fmt/ranges.h>
#include <fmt/ostream.h>
#include <iomanip>
namespace qi = boost::spirit::qi;

namespace Ast {
    struct Name : std::string {
        using std::string::string;
        using std::string::operator=;

        friend std::ostream& operator<<(std::ostream& os, Name const& n) {
            return os << '<' << n.c_str() << '>';
        }
    };

    using Term = boost::variant<Name, std::string>;
    using List = std::list<Term>;
    using Expression = std::list<List>;

    struct Rule {
        Name name; // lhs
        Expression rhs;
    };

    using Syntax = std::list<Rule>;
}

BOOST_FUSION_ADAPT_STRUCT(Ast::Rule, name, rhs)

namespace Parser {
    template <typename Iterator>
    struct BNF: qi::grammar<Iterator, Ast::Syntax()> {
        BNF(): BNF::base_type(start) {
            using namespace qi;
            _blank = blank;
            _skipper = blank | (eol >> !skip(_blank.alias()) [ _rule ]);
            start = skip(_skipper.alias()) [ _rule % +eol ];

            _rule       = _rule_name >> "::=" >> _expression;
            _expression = _list % '|';
            _list       = +_term;
            _term       = _literal | _rule_name;
            _literal    = '"' >> *(_character - '"') >> '"'
                        | "'" >> *(_character - "'") >> "'";
            _character  = alnum | char_(""'| !#$%&()*+,./:;>=<?@]\^_`{}~[-");
            _rule_name  = '<' >> (alpha >> *(alnum | char_('-'))) >> '>';

            BOOST_SPIRIT_DEBUG_NODES(
                (_rule)(_expression)(_list)(_term)
                (_literal)(_character)
                (_rule_name))
        }

      private:
        using Skipper = qi::rule<Iterator>;
        Skipper _skipper, _blank;

        qi::rule<Iterator, Ast::Syntax()>     start;
        qi::rule<Iterator, Ast::Rule(),       Skipper> _rule;
        qi::rule<Iterator, Ast::Expression(), Skipper> _expression;
        qi::rule<Iterator, Ast::List(),       Skipper> _list;
        // lexemes
        qi::rule<Iterator, Ast::Term()>       _term;
        qi::rule<Iterator, Ast::Name()>       _rule_name;
        qi::rule<Iterator, std::string()>     _literal;
        qi::rule<Iterator, char()>            _character;
    };
}

int main() {
    Parser::BNF<std::string::const_iterator> const parser;

    std::string const input = R"(<code>   ::=  <letter><digit> | <letter><digit><code>
<letter> ::= "a" | "b" | "c" | "d" | "e"
           | "f" | "g" | "h" | "i"
<digit>  ::= "0" | "1" | "2" | "3" |
             "4"
    )";

    auto it = input.begin(), itEnd = input.end();

    Ast::Syntax syntax;
    if (parse(it, itEnd, parser, syntax)) {
        for (auto& rule : syntax)
            fmt::print("{} ::= {}
", rule.name, fmt::join(rule.rhs, " | "));
    } else {
        std::cout << "Failed
";
    }

    if (it != itEnd)
        std::cout << "Remaining: " << std::quoted(std::string(it, itEnd)) << "
";
}

Also Live On Coliru (without libfmt)

Categories

c++ - How to handle multi-line rules for gor parsing bnf grammar using boost spirit qi

c++ - How to handle multi-line rules for gor parsing bnf grammar using boost spirit qi

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Support Line-Wrapped Rules

FULL LISTING

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags