Saturday, December 30, 2006

Maximal Munch

How is the expression x+++y parsed? As x++ + y, or as x + ++y? Why?

The answer lies in tokenization. The ANSI C standard requires the lexer to take the longest sequence of characters that can form a valid token. Both + and ++ are valid tokens, so when +++ is encountered, the longest token that can be taken first is ++, and +++ is tokenized as ++ +. The expression is therefore parsed as x++ + y.

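Similarly, x+++++y is always tokenized as x ++ ++ + y, which fails to compile: it groups as (x++)++ + y, and the result of x++ is not an lvalue, so it cannot be incremented again. This happens even though the alternative tokenization x ++ + ++ y would be a perfectly valid expression.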

This 'oddity' is known as the "maximal munch" principle.
