An Extended Subcategorization Frames Dictionary
A.F. Gelbukh, Sofia N. Galicia-Haro
Information on syntactic government, or subcategorization, is an integral
part of the lexicon of a language. Lack of this information leads to errors
such as *to marry with Mary, which a French speaker might say, or
*to marry on Mary or *to marry behind John, which a Russian speaker might
say; clearly, an English speaker would likewise find it difficult to choose
the correct preposition when speaking these languages: *se marier Marie.
Intuition based on one's native language often serves a speaker poorly in
another language, even a closely related one. Unfortunately, teaching
practice and existing textbooks pay little attention to a clear and
systematic treatment of syntactic government. In this paper, a Spanish
dictionary of extended
subcategorization frames is presented. Such a frame (a government pattern) for
a word lists the means of expressing its valences and gives information on
the compatibility of these valences or of the specific means of expressing
them (e.g., Spanish *mover de ... hasta ...). A
statistical algorithm for compiling such a dictionary from a large
unprepared text corpus is discussed. The algorithm is unsupervised and
produces a list of the prepositions (or grammatical cases) used with each
word; at the same time, it resolves syntactic ambiguity in the corpus.
A second, supervised algorithm is intended for computer-aided compilation of
the human-oriented dictionary.
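
As a rough illustration of what an entry in such a dictionary might hold, the following Python sketch stores, for one word, the prepositions that can express each valence together with the combinations that are not compatible. The class name `Frame`, the valence labels `source` and `goal`, and the preposition sets are assumptions made for this example only; the single fact taken from the text above is that de and hasta do not combine with mover.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Hypothetical extended subcategorization frame for one word."""
    word: str
    # valence label -> prepositions that may express that valence
    valences: dict[str, set[str]]
    # sets of (valence, preposition) pairs that must not occur together
    incompatible: list[frozenset[tuple[str, str]]] = field(default_factory=list)

    def allows(self, realization: dict[str, str]) -> bool:
        """Check one way of expressing the valences against the frame."""
        # Every valence used must be expressed by a listed preposition.
        for valence, prep in realization.items():
            if prep not in self.valences.get(valence, set()):
                return False
        # No forbidden combination may be fully contained in the choice.
        chosen = set(realization.items())
        return not any(combo <= chosen for combo in self.incompatible)


# Illustrative entry; the preposition inventory is an assumption.
mover = Frame(
    word="mover",
    valences={"source": {"de", "desde"}, "goal": {"a", "hasta"}},
    incompatible=[frozenset({("source", "de"), ("goal", "hasta")})],
)

print(mover.allows({"source": "de", "goal": "a"}))      # accepted pairing
print(mover.allows({"source": "de", "goal": "hasta"}))  # rejected pairing
```

Keeping the incompatible combinations separate from the per-valence lists is one possible design: each preposition remains valid on its own, and only the specific pairing is excluded.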