Recursive descent parser

From Wikipedia, the free encyclopedia

(Redirected from Recursive descent)
Jump to: navigation, search

A recursive descent parser is a top-down parser built from a set of mutually-recursive procedures (or a non-recursive equivalent) where each such procedure usually implements one of the production rules of the grammar. Thus the structure of the resulting program closely mirrors that of the grammar it recognizes.

A predictive parser is a recursive descent parser with no backup. Predictive parsing is possible only for the class of LL(k) grammars, which are the context-free grammars for which there exists some positive integer k that allows a recursive descent parser to decide which production to use by examining only the next k tokens of input. (The LL(k) grammars therefore exclude all ambiguous grammars, as well as all grammars that contain left recursion. Any context-free grammar can be transformed into an equivalent grammar that has no left recursion, but removal of left recursion does not always yield an LL(k) grammar.) A predictive parser runs in linear time.

Recursive descent with backup is a technique that determines which production to use by trying each production in turn. Recursive descent with backup is not limited to LL(k) grammars, but is not guaranteed to terminate unless the grammar is LL(k). Even when they terminate, parsers that use recursive descent with backup may require exponential time.

Although predictive parsers are widely used, programmers often prefer to create LR or LALR parsers via parser generators without transforming the grammar into LL(k) form.

Some authors define recursive descent parsers as the predictive parsers. Other authors use the term more broadly, to include backed-up recursive descent.[citation needed]

Contents

The following EBNF-like grammar (for Niklaus Wirth's PL/0 programming language, from Algorithms + Data Structures = Programs) is in LL(1) form (for simplicity, ident and number are assumed to be terminals):

 program = block "." .
 
 block =
     ["const" ident "=" number {"," ident "=" number} ";"]
     ["var" ident {"," ident} ";"]
     {"procedure" ident ";" block ";"} statement .
 
 statement =
     [ident ":=" expression
     | "call" ident
     | "begin" statement {";" statement} "end"
     | "if" condition "then" statement
     | "while" condition "do" statement
     ] .
 
 condition =
     "odd" expression
     | expression ("="|"#"|"<"|"<="|">"|">=") expression
     .
 
 expression = ["+"|"-"] term {("+"|"-") term} .
 
 term = factor {("*"|"/") factor} .
 
 factor = ident | number | "(" expression ")" .

Terminals are expressed in quotes (except for ident and number). Each nonterminal is defined by a rule in the grammar.

What follows is an implementation of a recursive descent parser for the above language in C. The parser reads in source code, and exits with an error message if the code fails to parse, exiting silently if the code parses correctly.

Notice how closely the predictive parser below mirrors the grammar above. There is a procedure for each nonterminal in the grammar. Parsing descends in a top-down manner, until the final nonterminal has been processed. The program fragment depends on a global variable, sym, which contains the next symbol from the input, and the function getsym, which updates sym when called.

The implementations of the functions getsym and error are omitted for simplicity.

typedef enum {ident, number, lparen, rparen, times, slash, plus,
    minus, eql, neq, lss, leq, gtr, geq, callsym, beginsym, semicolon,
    endsym, ifsym, whilesym, becomes, thensym, dosym, constsym, comma,
    varsym, procsym, period, oddsym} Symbol;
 
Symbol sym;
void getsym(void);
void error(const char msg[]);
void expression(void);
 
int accept(Symbol s) {
    if (sym == s) {
        getsym();
        return 1;
    }
    return 0;
}
 
int expect(Symbol s) {
    if (accept(s))
        return 1;
    error("expect: unexpected symbol");
    return 0;
}
 
void factor(void) {
    if (accept(ident)) {
        ;
    } else if (accept(number)) {
        ;
    } else if (accept(lparen)) {
        expression();
        expect(rparen);
    } else {
        error("factor: syntax error");
        getsym();
    }
}
 
void term(void) {
    factor();
    while (sym == times || sym == slash) {
        getsym();
        factor();
    }
}
 
void expression(void) {
    if (sym == plus || sym == minus)
        getsym();
    term();
    while (sym == plus || sym == minus) {
        getsym();
        term();
    }
}
 
void condition(void) {
    if (accept(oddsym)) {
        expression();
    } else {
        expression();
        if (sym == eql || sym == neq || sym == lss ||
            sym == leq || sym == gtr || sym == geq) {
            getsym();
            expression();
        } else {
            error("condition: invalid operator");
            getsym();
        }
    }
}
 
void statement(void) {
    if (accept(ident)) {
        expect(becomes);
        expression();
    } else if (accept(callsym)) {
        expect(ident);
    } else if (accept(beginsym)) {
        do {
            statement();
        } while (accept(semicolon));
        expect(endsym);
    } else if (accept(ifsym)) {
        condition();
        expect(thensym);
        statement();
    } else if (accept(whilesym)) {
        condition();
        expect(dosym);
        statement();
    }
}
 
void block(void) {
    if (accept(constsym)) {
        do {
            expect(ident);
            expect(eql);
            expect(number);
        } while (accept(comma));
        expect(semicolon);
    }
    if (accept(varsym)) {
        do {
            expect(ident);
        } while (accept(comma));
        expect(semicolon);
    }
    while (accept(procsym)) {
        expect(ident);
        expect(semicolon);
        block();
        expect(semicolon);
    }
    statement();
}
 
void program(void) {
    getsym();
    block();
    expect(period);
}

Recursive descent parsers are particularly easy to implement in functional languages such as Haskell or ML.

See Functional Pearls: Monadic Parsing in Haskell

This article was originally based on material from the Free On-line Dictionary of Computing, which is licensed under the GFDL.

Advanced Search
Included Web Search Engines


Safe Search

close

Top Matching Results

Occasionally Search.com will highlight specialized results that are based on the context of your query. Examples of specialized results include specific links to news, images, or video.

Top Matching Results may highlight information from other Search.com pages, content from the CNET Network of sites, or third party content. The listings are based purely on relevance. Search.com does not receive payment for listings in this section but our partners that provide this data may get paid for listing these products.

Sponsored Links

This section contains paid listings which have been purchased by companies that want to have their sites appear for specific search terms and related content. These listings are administered, sorted and maintained by a third party and are not endorsed by Search.com.

Search Results

Search.com sends your search query to several search engines at one time and integrates the results into one list which has been sorted by relevance using Search.com's proprietary algorithm. You can customize the list of search engines included in your metasearch from the preferences.

The search engines that are used in your metasearch may allow companies to pay to have their Web sites included within the results. To view the Paid Inclusion policy for a specific search engine, please visit their Web site. Search.com does not accept payment or share revenue with any search engine partner for listings in this section.