Corpus-based linguistics relies on real language, as collected in computer
corpora, rather than on invented examples.
Before using a corpus, a linguist must ensure that the corpus is
representative with respect to the linguistic phenomenon under
investigation.
I would like to avoid the term `corpus linguistics' and speak instead of
two disciplines: on the one hand, computer corpora are resources established
by natural language engineering, and on the other hand corpus-based
linguistics uses these resources to establish (hopefully) empirically valid
linguistic theories. This terminological division reflects the fact that the
former discipline is technological and the latter scientific.
Jochen Leidner, 1998-04-29