CLUE Department of Computational Linguistics at the University of Erlangen

Automatic Induction of Grammars

Problem: Given a finite subsequence of an infinite stream of data, what is its underlying structure?

This problem is not only hard, it is theoretically unsolvable. The reason is that any assumption made in constructing a structural description of the subsequence might not hold for the stream as a whole.
In terms of linguistics, no correct grammar can be induced by any algorithm using a finite corpus.

But to find incomplete quasi-solutions is a very exciting topic. There is an isomorphy between the problem above and the learning of any kind of rules in general, which relates it directly to questions of artificial intelligence.

Possible applications range from DNA-sequence analysis to data compression.


Jochen Leidner, 1998-04-29