Next: The Syntagmatic Approach
Up: Managing terminology using statistical
Previous: The contents of this
Constructing a Domain Model Using
Statistical Analyses for Forming a Partial Formal Grammar
When faced with a corpus describing an unfamiliar domain, it
may be difficult to structure the domain [1].
Statistical analyses that were originally developed for
deciphering ancient scripts [8, 9]
are applied to analyse the structure of the corpus by beginning to form
a formal grammar. Of the different types of formal grammars,
context-free grammars are used, because they are
- the simplest grammars with sufficient power for some kind of description
  of natural language, and
- the most complex grammars with a relatively easily manageable syntax.
Correct inference of a complete and unique context-free grammar from a
text with no additional information is impossible [3].
However, additional information is expected to be collected by analysing
combinatorial properties of the corpus. In addition, the grammar need not
be complete to reveal structural features of the corpus.
In the following sections, which describe the analyses, a term may mean
either a single word or a nonterminal symbol representing a sequence of terms
or a group of alternative terms.
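
The three meanings of a term given above can be pictured as a small recursive data structure. The following Python sketch is only an illustration under stated assumptions: the class names Word, Sequence and Alternatives, the function expansions, and the example words are hypothetical and do not come from the analyses described in this paper. It assumes terms that do not refer back to themselves.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Word:
    """A single word of the corpus (a terminal symbol)."""
    text: str


@dataclass(frozen=True)
class Sequence:
    """A nonterminal representing terms that follow one another."""
    name: str
    parts: tuple


@dataclass(frozen=True)
class Alternatives:
    """A nonterminal representing a group of interchangeable terms."""
    name: str
    options: tuple


# A term is any of the three kinds above (hypothetical naming).
Term = Union[Word, Sequence, Alternatives]


def expansions(term):
    """List every word sequence that a non-recursive term can stand for."""
    if isinstance(term, Word):
        return [[term.text]]
    if isinstance(term, Alternatives):
        # Any expansion of any option is an expansion of the group.
        return [e for option in term.options for e in expansions(option)]
    # Sequence: join one expansion of each part, in order.
    results = [[]]
    for part in term.parts:
        results = [prefix + e for prefix in results for e in expansions(part)]
    return results


# Made-up example: a group of alternatives nested inside a sequence.
size = Alternatives("SIZE", (Word("small"), Word("large")))
phrase = Sequence("PHRASE", (size, Word("corpus")))
print(expansions(phrase))
```

Each Sequence or Alternatives object corresponds roughly to the right-hand side of one or more context-free productions, so a collection of such terms amounts to a partial grammar of the kind discussed above.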
Päivikki Parpola
Sat Oct 14 22:52:14 EEST 2000