This problem is not only hard, it is theoretically unsolvable.
The reason is that any assumption made in constructing a structural
description of the subsequence might not hold for the stream as a whole.
In terms of linguistics, no correct grammar can be induced by any algorithm
using a finite corpus.
But to find incomplete quasi-solutions is a very exciting topic. There is an isomorphy between the problem above and the learning of any kind of rules in general, which relates it directly to questions of artificial intelligence.
Possible applications range from DNA-sequence analysis to data compression.
Jochen Leidner, 1998-04-29