February 19, 2010

Why LR parsers for NLP with TAGs? A logical answer

Context-free grammars are often used to parse natural language strings; indeed, most of the constructs in natural languages can be expressed with them. Among the most efficient parsing techniques is the family of LR parsing techniques, which grew out of initial work by Donald Knuth. The common shortcoming that keeps LR parsers from being widely used for natural languages is that when a conflict occurs during parsing, the associated context-free grammar provides very little context for resolving it. Hence conflicts are common when LR parsing techniques are applied to natural languages, and the ambiguity involved is not easy to remove.
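To make the conflict problem concrete, here is a minimal sketch that builds the LR(0) item sets for a toy grammar and lists the shift/reduce conflicts. The grammar (with terminals `n`, `v`, `p` standing for noun, verb and preposition) and all function names are my own illustrative assumptions, not taken from any existing parser. Plain LR(0) also reports conflicts that lookahead would resolve, so treat the output as an upper bound.

```python
from collections import deque

# Toy grammar with the classic PP-attachment ambiguity (an assumption for
# illustration). Production 0 is the augmented start rule.
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("NP", "VP")),
    ("NP", ("NP", "PP")),
    ("NP", ("n",)),
    ("VP", ("v", "NP")),
    ("VP", ("VP", "PP")),
    ("PP", ("p", "NP")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """LR(0) closure; an item is (production index, dot position)."""
    items = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in items:
                    items.add((i, 0))
                    work.append((i, 0))
    return frozenset(items)


def goto(state, symbol):
    """Move the dot over `symbol` in every item that allows it."""
    moved = {(p, d + 1) for p, d in state
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved) if moved else None


def lr0_states():
    """Collect all reachable LR(0) item sets by breadth-first search."""
    start = closure({(0, 0)})
    states, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        symbols = {GRAMMAR[p][1][d] for p, d in state
                   if d < len(GRAMMAR[p][1])}
        for sym in symbols:
            nxt = goto(state, sym)
            if nxt not in states:
                states.add(nxt)
                queue.append(nxt)
    return states


def shift_reduce_conflicts(states):
    """Pair each completed production with each terminal the state can shift."""
    conflicts = []
    for state in states:
        complete = [p for p, d in state
                    if d == len(GRAMMAR[p][1]) and p != 0]
        shifts = {GRAMMAR[p][1][d] for p, d in state
                  if d < len(GRAMMAR[p][1])
                  and GRAMMAR[p][1][d] not in NONTERMINALS}
        conflicts.extend((GRAMMAR[p], t) for p in complete for t in shifts)
    return conflicts


if __name__ == "__main__":
    for production, terminal in sorted(shift_reduce_conflicts(lr0_states())):
        print("reduce", production, "vs shift", terminal)
```

One of the reported conflicts is reducing `VP -> v NP` versus shifting `p`. That one survives any amount of lookahead, because the grammar is genuinely ambiguous: the parser cannot tell from the context-free rules alone whether the prepositional phrase attaches to the noun phrase or to the verb phrase.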
Now consider tree adjoining grammars (TAGs). They belong to the class of mildly context-sensitive formalisms and are more powerful than context-free grammars. In simple terms, each elementary tree, which plays the role of a production, can span more than one level of derivation. Given all this, a little logical thought suggests that conflicts in LR parsing of natural languages with a TAG should be much easier to resolve, because of the extra context present. So I am presently working on writing an LR parser for TAGs. Implementations such as XTAG already exist; I am doing this solely to understand the theoretical concepts better.
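For readers who have not seen a TAG before, the following is a rough sketch of its characteristic operation, adjunction, in which an auxiliary tree is spliced into another tree at a node with the same label. The `Node` class, the helper names and the example trees are hypothetical and kept deliberately small; this is not how XTAG or any real TAG parser represents trees.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    foot: bool = False  # True only for the foot node of an auxiliary tree


def leaves(node):
    """Return the labels of the leaves, left to right."""
    if not node.children:
        return [node.label]
    return [word for child in node.children for word in leaves(child)]


def find_foot(node):
    """Return the foot node of an auxiliary tree, or None if there is none."""
    if node.foot:
        return node
    for child in node.children:
        found = find_foot(child)
        if found is not None:
            return found
    return None


def adjoin(target, aux_root):
    """Adjoin the auxiliary tree at `target`, in place.

    The subtree below `target` moves under the foot node, and `target`
    takes over the children of the auxiliary root. The auxiliary tree's
    nodes are reused, so build a fresh one for each adjunction.
    """
    foot = find_foot(aux_root)
    assert foot is not None, "auxiliary tree needs a foot node"
    assert target.label == aux_root.label == foot.label
    foot.children, foot.foot = target.children, False
    target.children = aux_root.children


# Initial tree spanning several levels: S(NP(John), VP(V(runs)))
vp = Node("VP", [Node("V", [Node("runs")])])
initial = Node("S", [Node("NP", [Node("John")]), vp])

# Auxiliary tree: VP(VP*, Adv(quickly)), where VP* is the foot node
aux = Node("VP", [Node("VP", foot=True), Node("Adv", [Node("quickly")])])

adjoin(vp, aux)
print(" ".join(leaves(initial)))  # prints "John runs quickly"
```

The point to notice is that a single elementary tree such as `initial` already holds the verb together with its subject position, which is the kind of extra context a flat context-free production does not carry.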