Read Aho section 4.1 regarding panic-mode as a method of compiler error recovery.
A simple panic-mode error handling system requires that we return to a high-level parsing function when a parsing or lexical error is detected. The high-level function re-synchronizes the input stream by skipping tokens until a suitable spot to resume parsing is found. For a grammar that ends statements with semicolons, the semicolon becomes the synchronizing token.
We add error-handling code to all the parsing functions so that when they detect a parsing error, instead of exiting, they return FALSE. Each caller checks the return code of every parsing function it calls, and returns FALSE itself when any of them returns FALSE. The high-level parsing function detects the FALSE return and re-synchronizes the input stream by skipping tokens.
All the parsing functions become Boolean functions. Each parsing function may succeed, in which case we continue parsing, or fail, in which case we stop parsing and return the failure indication to our parent function. For example:
static Boolean expression(void){
    CALL term(), return FALSE if it fails
    WHILE TYPEOFTOKEN is PLUS or is MINUS DO
        get the next token
        CALL term(), return FALSE if it fails
    END WHILE
    return TRUE;  -- parsing succeeded so far
}

static Boolean factor(void){
    SWITCH TYPEOFTOKEN
    CASE ID:
        get the next token
        return TRUE  -- parsing succeeded so far
    CASE CONST:
        get the next token
        return TRUE  -- parsing succeeded so far
    CASE LEFTPAREN:
        get the next token
        CALL expression(), return FALSE if it fails
        CALL match(RIGHTPAREN), return FALSE if it fails
        return TRUE  -- parsing succeeded so far
    DEFAULT:
        /*FALLTHROUGH*/
    END SWITCH
    eprintf("File %s Line %ld: Expecting %s, %s, or %s;"
            " found: %s '%s'",
            filename, LINENUMBER,
            tokenType(ID), tokenType(CONST), tokenType(LEFTPAREN),
            tokenType(TYPEOFTOKEN), LEXEMESTR );
    return FALSE
}
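As a rough illustration, the Boolean style above might look like this in compilable C. This is only a sketch under stated assumptions: the token enum, the `cursor` variable, and the `parse_expression()` entry point are hypothetical stand-ins for the course scanner interface, tokens come from an array instead of a real scanner, error messages print raw token numbers, and term() is folded into factor() to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical token kinds; the course scanner's names may differ. */
typedef enum { T_ID, T_CONST, T_PLUS, T_MINUS, T_LPAREN, T_RPAREN, T_EOF } TokType;

static const TokType *cursor;   /* current lookahead token (assumed interface) */

static bool expression(void);

/* Consume the expected token, or report an error and fail. */
static bool match(TokType expected) {
    if (*cursor != expected) {
        fprintf(stderr, "Expecting token %d; found token %d\n",
                (int)expected, (int)*cursor);
        return false;
    }
    if (*cursor != T_EOF)
        cursor++;
    return true;
}

static bool factor(void) {
    switch (*cursor) {
    case T_ID:
    case T_CONST:
        cursor++;                     /* get the next token */
        return true;
    case T_LPAREN:
        cursor++;
        if (!expression())            /* propagate failure to our caller */
            return false;
        return match(T_RPAREN);
    default:
        fprintf(stderr, "Expecting ID, CONST, or '('; found token %d\n",
                (int)*cursor);
        return false;
    }
}

/* Simplification: operands are factors; the notes call term() here. */
static bool expression(void) {
    if (!factor())
        return false;
    while (*cursor == T_PLUS || *cursor == T_MINUS) {
        cursor++;
        if (!factor())
            return false;
    }
    return true;
}

/* Parse one expression from an array of tokens that ends with T_EOF. */
bool parse_expression(const TokType *tokens) {
    assert(tokens != NULL);
    cursor = tokens;
    return expression() && *cursor == T_EOF;
}
```

Calling `parse_expression()` on the tokens for `a + (1 - b)` returns true, while the tokens for `a + + b` make factor() print a message and return false, and that failure is passed back up through expression() unchanged.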
To complete the panic-mode error recovery, some upper-level parsing function must detect the failure to parse and skip forward until an appropriate re-synchronizing token is found. Here is an example.
The function below is the top-level (root) parsing function for a parser that recognizes assignment statements and print statements that end in semicolons. On error, we call a panic() function to re-synchronize the input. We return the number of times that panic() was called, so that our calling function can print it.
Each parsing function is responsible for issuing an error message at the point where it detects a syntax error.
static int doParsing(void){
    initialize errorcounter to zero
    WHILE TYPEOFTOKEN is not EOF DO
        SWITCH TYPEOFTOKEN
        CASE ID:     -- ID is in the FIRST set of assignment()
            returnStatus = assignment()
            break
        CASE PRINT:  -- PRINT is in the FIRST set of print()
            returnStatus = print()
            break
        CASE ...     -- Other cases can go here, for other statement types
            break
        DEFAULT:
            eprintf("File %s Line %ld: Expecting %s or %s;"
                    " found: %s '%s'",
                    filename, LINENUMBER,
                    tokenType(ID), tokenType(PRINT),
                    tokenType(TYPEOFTOKEN), LEXEMESTR );
            returnStatus = FALSE
            break
        END SWITCH
        IF returnStatus is FALSE THEN
            CALL panic()
            increment errorcounter
        ENDIF
    END WHILE
    return errorcounter
}

A semicolon is a good re-synchronizing token to use in the grammars used in this course. Skipping and stopping just before the reserved words that start statements would also be a good strategy. For debugging purposes, we print out the type and value of all the tokens we skip over.
Once the semicolon is found, one more token of lookahead is read; this prepares the parser to resume parsing after the statement with the syntax error. The code looks like this:
static void panic(void){
    WHILE TYPEOFTOKEN is not SEMICOLON and is not EOF DO
        eprintf("File %s Line %ld: Skipping over %s '%s'",
                filename, LINENUMBER,
                tokenType(TYPEOFTOKEN), LEXEMESTR);
        get next token
    END WHILE
    eprintf("File %s Line %ld: Skipped to %s '%s'\n",
            filename, LINENUMBER,
            tokenType(TYPEOFTOKEN), LEXEMESTR);
    IF TYPEOFTOKEN is SEMICOLON THEN
        get next token
    ENDIF
}
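To see the recovery loop and panic() working together, here is a small compilable C sketch. It is an illustration under assumptions, not the course parser: the token enum, the `cursor` variable, and the toy statement forms (`ID = ID ;` and `PRINT ID ;`) are hypothetical, tokens come from an array ending in T_EOF, and messages print raw token numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical token kinds for a toy statement grammar. */
typedef enum { T_ID, T_ASSIGN, T_PRINT, T_SEMI, T_EOF } TokType;

static const TokType *cursor;   /* current lookahead token (assumed interface) */

/* Consume the expected token, or report an error and fail. */
static bool match(TokType expected) {
    if (*cursor != expected) {
        fprintf(stderr, "Expecting token %d; found token %d\n",
                (int)expected, (int)*cursor);
        return false;
    }
    if (*cursor != T_EOF)
        cursor++;
    return true;
}

/* assignment -> ID '=' ID ';'   (toy grammar, assumed) */
static bool assignment(void) {
    return match(T_ID) && match(T_ASSIGN) && match(T_ID) && match(T_SEMI);
}

/* print -> PRINT ID ';'         (toy grammar, assumed) */
static bool print(void) {
    return match(T_PRINT) && match(T_ID) && match(T_SEMI);
}

/* Skip to the synchronizing semicolon, then read one token past it. */
static void panic(void) {
    while (*cursor != T_SEMI && *cursor != T_EOF) {
        fprintf(stderr, "Skipping over token %d\n", (int)*cursor);
        cursor++;
    }
    if (*cursor == T_SEMI)
        cursor++;
}

/* Parse every statement; return how many times panic() was called. */
int do_parsing(const TokType *tokens) {
    int errors = 0;
    assert(tokens != NULL);
    cursor = tokens;
    while (*cursor != T_EOF) {
        bool ok;
        switch (*cursor) {
        case T_ID:    ok = assignment(); break;
        case T_PRINT: ok = print();      break;
        default:
            fprintf(stderr, "Expecting ID or PRINT; found token %d\n",
                    (int)*cursor);
            ok = false;
            break;
        }
        if (!ok) {
            panic();
            errors++;
        }
    }
    return errors;
}
```

Given the tokens for `a = b ; c d ; print e ;`, the second statement fails at `d`, panic() skips to the semicolon and consumes it, and the print statement still parses, so `do_parsing()` returns 1. Because panic() always consumes the semicolon it stops on, the loop makes progress even when the offending token is itself a stray semicolon.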
Handling errors the Boolean way shown above, where we have to test the return code of every function and propagate failure all the way back up a nested call sequence, complicates the code and makes it harder to read and maintain. An alternative implementation might use the setjmp() and longjmp() library functions to return directly to the error-handling code, without requiring all the intermediate functions to return TRUE/FALSE.
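One way the setjmp()/longjmp() idea could be arranged is sketched below. This is an assumption-laden illustration, not the approach these notes will necessarily take: the token enum, the `cursor` and `recover` variables, and the toy statement forms are hypothetical. The parsing functions return void; match() jumps straight back to the recovery point in the top-level loop when it detects an error.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>

/* Hypothetical token kinds for a toy statement grammar. */
typedef enum { T_ID, T_ASSIGN, T_PRINT, T_SEMI, T_EOF } TokType;

static const TokType *cursor;   /* current lookahead token (assumed interface) */
static jmp_buf recover;         /* recovery point set in do_parsing() */

/* Consume the expected token, or jump directly to the recovery point. */
static void match(TokType expected) {
    if (*cursor != expected) {
        fprintf(stderr, "Expecting token %d; found token %d\n",
                (int)expected, (int)*cursor);
        longjmp(recover, 1);    /* does not return */
    }
    cursor++;
}

/* No return codes to check: an error never returns here. */
static void assignment(void) {
    match(T_ID); match(T_ASSIGN); match(T_ID); match(T_SEMI);
}

static void print(void) {
    match(T_PRINT); match(T_ID); match(T_SEMI);
}

/* Skip to the synchronizing semicolon, then read one token past it. */
static void panic(void) {
    while (*cursor != T_SEMI && *cursor != T_EOF)
        cursor++;
    if (*cursor == T_SEMI)
        cursor++;
}

/* Parse every statement; return how many times panic() was called. */
int do_parsing(const TokType *tokens) {
    volatile int errors = 0;    /* volatile: defensive, survives longjmp */
    assert(tokens != NULL);
    cursor = tokens;
    while (*cursor != T_EOF) {
        if (setjmp(recover) != 0) {
            /* We arrive here via longjmp() after a syntax error. */
            panic();
            errors++;
            continue;
        }
        switch (*cursor) {
        case T_ID:    assignment(); break;
        case T_PRINT: print();      break;
        default:
            fprintf(stderr, "Expecting ID or PRINT; found token %d\n",
                    (int)*cursor);
            longjmp(recover, 1);
        }
    }
    return errors;
}
```

The trade-off is that longjmp() abandons the intermediate stack frames without running any cleanup, so parsing functions that allocate memory or hold other resources need extra care with this style.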
More on this, another day.