Operator-precedence parser

From Wikipedia, the free encyclopedia

An operator precedence parser is a computer program that interprets an operator-precedence grammar. For example, most calculators use operator precedence parsers to convert from infix notation with order of operations (the usual format humans use for mathematical expressions) into a different format they use internally to compute the result.

Dijkstra's shunting yard algorithm (named after the shunting yard) is commonly used to implement operator precedence parsers which convert from infix notation to Reverse Polish notation (RPN).

Contents

An operator-precedence parser is a simple shift-reduce parser capable of parsing a subset of LR(1) grammars. More precisely, the operator-precedence parser can parse all LR(1) grammars where two consecutive nonterminals never appear in the right-hand side of any rule.

Operator-precedence parsers are not used often in practice, however they do have some properties that make them useful within a larger design. First, they are simple enough to write by hand, which is not generally the case with more sophisticated shift-reduce parsers. Second, they can be written to consult an operator table at runtime, which makes them suitable for languages that can add to or change their operators while parsing.

Perl 6 "sandwiches" an operator-precedence parser in between two Recursive descent parsers in order to achieve a balance of speed and dynamism. This is expressed in the virtual machine for Perl 6, Parrot as the Parser Grammar Engine (PGE). GCC's C and C++ parsers, which are hand-coded recursive descent parsers, are both sped up by an operator-precedence parser that can quickly examine arithmetic expressions.

An EBNF grammar that parses infix notation will usually look like this:

   expression ::= equality-expression
   equality-expression ::= additive-expression ( ( '==' | '!=' ) additive-expression ) *
   additive-expression ::= multiplicative-expression ( ( '+' | '-' ) multiplicative-expression ) *
   multiplicative-expression ::= primary ( ( '*' | '/' ) primary ) *
   primary ::= '(' expression ')' | NUMBER | VARIABLE | '-' primary

With many levels of precedence, implementing this grammar with a predictive recursive-descent parser can become inefficient. Parsing a number, for example, can require five function calls (one for each non-terminal in the grammar, until we reach primary).

An operator-precedence parser can do the same more efficiently. The idea is that we can left associate the arithmetic operations as long as we find operators with the same precedence, but we have to save a temporary result to evaluate higher precedence operators. The algorithm that is presented here does not need an explicit stack: instead, it uses recursive calls to implement the stack.

The algorithm is not a pure operator-precedence parser like the Dijkstra shunting yard algorithm. It assumes that the primary nonterminal is parsed in a separate subroutine, like in a recursive descent parser.

The pseudo-code for the algorithm is as follows. The parser starts at function parse_expression. Precedence levels are greater or equal to 0.

parse_expression ()

  • return parse_expression_1 (parse_primary (), 0)

parse_expression_1 (lhs, min_precedence)

  • while the next token is a binary operator whose precedence is >= min_precedence
  • op := next token
  • rhs := parse_primary ()
  • while the next token is a binary operator whose precedence is greater than op's, or a right-associative operator whose precedence is equal to op's
  • lookahead := next token
  • rhs := parse_expression_1 (rhs, lookahead's precedence)
  • lhs := the result of applying op with operands lhs and rhs
  • return lhs

An example execution on the expression 2 + 3 * 4 + 5 == 19 is as follows. We give precedence 0 to equality expressions, 1 to additive expressions, 2 to multiplicative expressions.

parse_expression_1 (lhs = 2, min_precedence = 0)

  • the next token is +, with precedence 1. the while loop is entered.
  • op is + (precedence 1)
  • rhs is 3
  • the next token is *, with precedence 2. recursive invocation.
    parse_expression_1 (lhs = 3, min_precedence = 2)
  • the next token is *, with precedence 2. the while loop is entered.
  • op is * (precedence 2)
  • rhs is 4
  • the next token is +, with precedence 1. no recursive invocation.
  • lhs is assigned 3*4 = 12
  • the next token is +, with precedence 1. the while loop is left.
  • 12 is returned.
  • the next token is +, with precedence 1. no recursive invocation.
  • lhs is assigned 2+12 = 14
  • the next token is +, with precedence 1. the while loop is not left.
  • op is + (precedence 1)
  • rhs is 5
  • the next token is ==, with precedence 0. no recursive invocation.
  • lhs is assigned 14+5 = 19
  • the next token is ==, with precedence 0. the while loop is not left.
  • op is == (precedence 0)
  • rhs is 19
  • the next token is end-of-line, which is not an operator. no recursive invocation.
  • lhs is assigned the result of evaluating 19 == 19, for example 1 (as in the C standard).
  • the next token is end-of-line, which is not an operator. the while loop is left.

1 is returned.

There are alternative ways to apply operator precedence rules which do not involve the Shunting Yard algorithm. One is to build a tree of the original expression, then apply tree rewrite rules to it. Another is the algorithm used in the early FORTRAN I compiler, which is to first fully parenthesise the expression by a trivial macro replacement — inserting a number of parentheses around each operator, such that they lead to the correct precedence when parsed with a 'flat' grammar. (A hack which takes longer to explain properly than to write code for - see below!)

#include 
#include 
int main(int argc, char *argv[]){
  int i;

  printf("((((");
  for(i=1;i!=argc;i++){
    if(     strcmp(argv[i], "^")==0) printf(")^(");
    else if(strcmp(argv[i], "*")==0) printf("))*((");
    else if(strcmp(argv[i], "/")==0) printf("))/((");
    else if(strcmp(argv[i], "+")==0) printf(")))+(((");
    else if(strcmp(argv[i], "-")==0) printf(")))-(((");
    else                             printf("%s", argv[i]);
  }
  printf("))))\n");
  return 0;
}

Invoke it as:

$ cc -o parenthesise parenthesise.c
$ ./parenthesise a \* b + c ^ d / e
((((a))*((b)))+(((c)^(d))/((e))))

Advanced Search
Included Web Search Engines


Safe Search

close

Top Matching Results

Occasionally Search.com will highlight specialized results that are based on the context of your query. Examples of specialized results include specific links to news, images, or video.

Top Matching Results may highlight information from other Search.com pages, content from the CNET Network of sites, or third party content. The listings are based purely on relevance. Search.com does not receive payment for listings in this section but our partners that provide this data may get paid for listing these products.

Sponsored Links

This section contains paid listings which have been purchased by companies that want to have their sites appear for specific search terms and related content. These listings are administered, sorted and maintained by a third party and are not endorsed by Search.com.

Search Results

Search.com sends your search query to several search engines at one time and integrates the results into one list which has been sorted by relevance using Search.com's proprietary algorithm. You can customize the list of search engines included in your metasearch from the preferences.

The search engines that are used in your metasearch may allow companies to pay to have their Web sites included within the results. To view the Paid Inclusion policy for a specific search engine, please visit their Web site. Search.com does not accept payment or share revenue with any search engine partner for listings in this section.