Demo 2.1
User Manual
Alexander Gelbukh
Grigori Sidorov
Mexico City
1999
Contents
1. Welcome to Parser
1.1 The main screen
1.2 How Do I...
2. Screen Elements
2.1 Trees page
2.1.1 Constituency format
2.1.2 Dependency format
2.1.3 Graphical format
2.2 Morphology page
2.3 Dump page
2.4 Tracing page
2.5 Text area
3. Toolbar
3.1 Open button
3.2 Type button
3.3 Options button
3.4 Process All button
3.5 Zoom Picture button
3.6 View Additional Nodes button
3.7 Constituent Structure button
3.8 Dependency Structure button
3.9 Graphical Representation button
3.10 Help button
4. Options
4.1 Proportional Representation checkbox
4.2 Show Words Vertically checkbox
4.3 Morphology radiobuttons
4.4 Trace Parsed checkbox
4.5 Trace Not Parsed checkbox
4.6 Gap Between Levels setting
4.7 Gap On One Level setting
4.8 Check Rules Changes for Each Sentence checkbox
4.9 Only Number of Variants checkbox
4.10 Maximal Number of Variants in Input setting
5. Grammar
5.1 The grammar provided with the parser
5.2 The grammar compiler
6. Development team
1. Welcome to Parser
The Parser Demo program lets you investigate the syntactic and morphological structure of Spanish sentences using an Extended Context-Free grammar formalism. It is useful for learning this formalism and for developing and testing grammars.
Specifically, it lets you:
· To view the variants of the syntactic structure of sentences,
· To view the variants of morphological analysis of the words in sentences,
· To investigate the protocol of the parsing process, in order to understand the internal working of the parser.
Please see the following topics:
· How Do I...
· Screen Elements
· Development team
1.1 The main screen
This is a sample screen of the program:
The screen shows a variant of syntactic structure of a Spanish sentence.
1.2 How Do I...
· To type a sentence to analyze, use the New Text button.
· To open a file with a text, use the Open button.
· To view the syntactic tree of the sentence, switch to the Trees tab and be sure the button is pressed.
· To view the morphological structure of the sentence, switch to the Morphology tab.
· To view the technical information about the parsing process, switch to the Dump or Tracing tab.
2. Screen Elements
On the Parser Demo screen, you can choose one of the following four pages by clicking the tabs at the top of the screen:
· Trees page
· Morphology page
· Dump page
· Tracing page
The Parser Demo screen also contains the following elements:
· Text area
· Toolbar
· Options
2.1 Trees page
The Trees page allows you to investigate the syntactic structure of the sentence selected in the Text area.
Depending on the settings of the toolbar buttons, it can show the tree in one of the following formats:
· Constituency format
· Dependency format
· Graphical format
You can choose any variant of the syntactic tree from the list in the right part of the screen. The limit on the number of variants presented on the screen is set under the Options button (Maximal Number of Variants in Input setting). If this limit is reached, the remaining variants are ignored.
2.1.1 Constituency format
In this mode, the syntactic trees are presented as constituency structures.
This format of the syntactic tree is set with the Constituent Structure button.
2.1.2 Dependency format
In this mode, the syntactic trees are presented as dependency structures.
This format of the syntactic tree is set with the Dependency Structure button.
2.1.3 Graphical format
In this mode, the syntactic trees are presented in graphical form. The tree has the form of a constituency structure, but the dependency links (heads) are shown in red.
This format of the syntactic tree is set with the Graphical Representation button.
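The relation between the head marks and the dependency links can be sketched as follows: in each constituent, the lexical head of the head child governs the lexical heads of the other children. The Python sketch below only illustrates this idea and is not the code used by the parser. The class Node and the functions lexical_head and dependencies are hypothetical names, and the example tree for "el perro duerme" is a simplified one.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    label: str                                  # category name, or the word for a leaf
    children: List["Node"] = field(default_factory=list)
    head: int = 0                               # index of the child marked as head


def lexical_head(node: Node) -> str:
    """Follow the head children down to a leaf and return its word."""
    while node.children:
        node = node.children[node.head]
    return node.label


def dependencies(node: Node) -> List[Tuple[str, str]]:
    """Return (governor, dependent) word pairs for the whole tree."""
    links: List[Tuple[str, str]] = []
    if node.children:
        governor = lexical_head(node)
        for i, child in enumerate(node.children):
            if i != node.head:
                links.append((governor, lexical_head(child)))
            links.extend(dependencies(child))
    return links


# Simplified constituency tree: CLAUSE -> NP @VP, NP -> DP @NOM
tree = Node("CLAUSE", [
    Node("NP", [Node("DP", [Node("el")]), Node("NOM", [Node("perro")])], head=1),
    Node("VP", [Node("duerme")]),
], head=1)

links = dependencies(tree)
print(links)
```

In this example the verb governs the noun and the noun governs the article, which corresponds to the links that the graphical format highlights.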
2.2 Morphology page
The Morphology page allows you to investigate the morphological structure of sentences. For each word of the current sentence, a series of its possible normalized forms and their morphological codes is presented.
2.3 Dump page
The Dump page presents all the variants of the syntactic structure simultaneously in text form.
2.4 Tracing page
The Tracing page allows you to investigate the process of parsing. It shows which rules were triggered and in what order, and also shows how the pieces of the structure were built.
2.5 Text area
In this area the current text is shown, and the current sentence is selected. You can select a sentence to investigate in this area by clicking on it.
3. Toolbar
The Toolbar provides access to the following settings and tools:
· Open button
· Type button
· Options button
· Process All button
· Zoom Picture button
· View Additional Nodes button
· Constituent Structure button
· Dependency Structure button
· Graphical Representation button
· Help button
3.1 Open button
The Open button allows you to open a file containing the sentences to process. The format of this file depends on the Morphology radiobuttons under the Options button.
3.2 Type button
The Type button allows you to type the sentence to process. The format of the text depends on the Morphology radiobuttons under the Options button.
3.3 Options button
The Options button provides access to the following settings:
· Proportional Representation checkbox
· Show Words Vertically checkbox
· Morphology radiobuttons
· Trace Parsed checkbox
· Trace Not Parsed checkbox
· Gap Between Levels setting
· Gap On One Level setting
· Check Rules Changes for Each Sentence checkbox
· Only Number of Variants checkbox
· Maximal Number of Variants in Input setting
3.4 Process All button
The Process All button allows you to process all the sentences in the file automatically, in batch mode.
3.5 Zoom Picture button
The Zoom Picture button allows you to view the picture of the tree or the lists in full screen.
3.6 View Additional Nodes button
The View Additional Nodes button allows you to view the nodes that are added to the grammar automatically when it is converted into Chomsky normal form. When the button is pressed, the additional nodes are shown.
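To see why such additional nodes appear, recall that Chomsky normal form allows at most two symbols on the right-hand side of a rule, so longer rules have to be split with the help of auxiliary nonterminals. The following Python sketch shows only this splitting step. It is not the code used by the parser, and the function name binarize and the auxiliary node names (LIS_CLAUSE~1 and so on) are invented for illustration.

```python
from typing import List, Tuple

Rule = Tuple[str, List[str]]  # (left-hand side, right-hand side symbols)


def binarize(rule: Rule) -> List[Rule]:
    """Split a rule with more than two right-hand symbols into binary rules.

    Each split introduces one auxiliary nonterminal. The naming scheme
    "<LHS>~<n>" is an assumption made for this example only.
    """
    lhs, rhs = rule
    result: List[Rule] = []
    current = lhs
    counter = 1
    while len(rhs) > 2:
        aux = f"{lhs}~{counter}"
        result.append((current, [rhs[0], aux]))
        current, rhs, counter = aux, rhs[1:], counter + 1
    result.append((current, list(rhs)))
    return result


# LIS_CLAUSE -> SEP_O CLAUSE LIS_CLAUSE (the variant with the optional element present)
binary_rules = binarize(("LIS_CLAUSE", ["SEP_O", "CLAUSE", "LIS_CLAUSE"]))
for left, right in binary_rules:
    print(left, "->", " ".join(right))
```

A node such as LIS_CLAUSE~1 in this sketch plays the role of an additional node: it does not occur in the grammar as written and exists only because of the conversion.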
3.7 Constituent Structure button
The Constituent Structure button shows the syntactic tree in the traditional constituent structure form.
3.8 Dependency Structure button
The Dependency Structure button shows the syntactic tree in the form of a dependency tree.
3.9 Graphical Representation button
The Graphical Representation button shows the syntactic tree in graphical form. The constituent structure is shown, with the dependencies (heads) marked in red.
3.10 Help button
The Help button shows this guide.
4. Options
4.1 Proportional Representation checkbox
The Proportional Representation checkbox affects the way the nodes of the tree are centered in the graphical mode. We recommend checking this checkbox.
4.2 Show Words Vertically checkbox
The Show Words Vertically checkbox affects the way the words are presented in the tree in the graphical mode. We recommend unchecking this checkbox.
4.3 Morphology radiobuttons
The Morphology radiobuttons determine the expected format of the input text file.
· Morphology in rules: all the wordforms of the file are expected to be terminal nodes of the grammar.
· Morphology in input: the input file has a structured form, with the morphological codes explicitly assigned to each word.
· Morphological analysis: the input is a plain text, and the program will analyze it morphologically.
We recommend the Morphological analysis option.
4.4 Trace Parsed checkbox
If the Trace Parsed checkbox is checked, the successfully parsed sentences are traced in the Tracing page.
Unchecking this checkbox speeds up the program when viewing the Tracing page, and also lets you see in the Tracing page only the sentences for which parsing failed.
4.5 Trace Not Parsed checkbox
If the Trace Not Parsed checkbox is checked, the sentences for which parsing failed are traced in the Tracing page.
Unchecking this checkbox speeds up the program when viewing the Tracing page, and also lets you see in the Tracing page only the successfully parsed sentences.
4.6 Gap Between Levels setting
The Gap Between Levels setting affects the way the nodes of the tree are laid out in the graphical mode. The recommended value is 40.
4.7 Gap On One Level setting
The Gap On One Level setting affects the way the nodes of the tree are laid out in the graphical mode. The recommended value is 10.
4.8 Check Rules Changes for Each Sentence checkbox
The Check Rules Changes for Each Sentence checkbox allows you to change the grammar without reloading the program. When this checkbox is checked, any changes in the grammar take effect immediately. However, this slightly slows down the processing.
We recommend checking this checkbox.
4.9 Only Number of Variants checkbox
The Only Number of Variants checkbox allows you to skip the phase of loading the found variants into the program’s viewer. When this checkbox is checked, the program only counts the variants found for each sentence. This greatly speeds up the processing, but you cannot view the found variants.
We recommend unchecking this checkbox to view the results of parsing, and checking it to test the coverage of the grammar and the average ambiguity of parsing with the given grammar.
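Given the number of variants found for each sentence, coverage and average ambiguity can be estimated with a few lines of code. The Python sketch below is only an illustration and is not part of the program. It assumes that coverage means the share of sentences with at least one variant and that average ambiguity means the mean number of variants per successfully parsed sentence; the function name and the sample counts are hypothetical.

```python
from typing import List, Tuple


def coverage_and_ambiguity(variant_counts: List[int]) -> Tuple[float, float]:
    """Return (coverage, average ambiguity) for per-sentence variant counts.

    Assumed definitions (not taken from the program):
    coverage  = parsed sentences / all sentences
    ambiguity = total variants / parsed sentences
    """
    if not variant_counts:
        raise ValueError("no sentences were processed")
    parsed = [n for n in variant_counts if n > 0]
    coverage = len(parsed) / len(variant_counts)
    ambiguity = sum(parsed) / len(parsed) if parsed else 0.0
    return coverage, ambiguity


# Hypothetical counts for four sentences; the second one was not parsed.
coverage, ambiguity = coverage_and_ambiguity([3, 0, 1, 8])
print(f"coverage = {coverage:.2f}, average ambiguity = {ambiguity:.2f}")
```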
4.10 Maximal Number of Variants in Input setting
The Maximal Number of Variants in Input setting limits the number of found variants that are loaded into the program’s viewer. For each sentence, the program loads at most the given number of variants. This speeds up the processing, but you cannot view all of the found variants.
5. Grammar
5.1 The grammar provided with the parser
The parser is provided with the following grammar:
# -------------------------
# General rules and clauses
# -------------------------
S
-> [BEG_S] @:S_SET END_S
-> [BEG_S] [BEG_S] @:CLAUSE END_S
-> [BEG_S] [BEG_S] @:ADVP END_S
-> @:PP END_S
-> [L_CONJ] @:LIS_NP END_S
-> [BEG_S] @:NP(nmb,gnd) END_S
S_SET
-> [L_CONJ] [SEP_O] [CIR] [SEP_O] [CIR] [SEP_O] @:CLAUSE
-> @:S_SET LIS_CLAUSE
-> LIS_CLAUSE @:S_SET
-> [CIR] [SEP_O] [CIR] [SEP_O] @:CLAUSE [SEP_O] [CIR] [CIR]
LIS_CLAUSE
-> SEP_O @:CLAUSE [LIS_CLAUSE]
-> [SEP_O] @:L_CONJ LIS_CLAUSE
SEP_O
-> ',' | ':' | ';' | '...' | '(' | '¿' | '¡' | '"' | '-' | ')'
END_S
-> '-' | ')' | '!' | '"' | '?' | '.'
BEG_S
-> '¿' | '-' | '¡'
L_CONJ
-> CONJ
-> CONJ_SUB
-> 'que'
-> '...'
CLAUSE
-> [NP_PERS(nmb,gnd,pers)] [CONJ_SUB] [PPR] [PPR] @:VP(nmb,pers,mean) [ADVP]
-> [NP_PERS(nmb,gnd,pers)] [SEP_O] [PPR] [PPR] @:VP(nmb,pers,mean) [ADVP]
-> [NP_PERS(nmb,gnd,pers)] [SEP_O] [PPR] [PPR] @:V(nmb,pers,AUX) [CIR] [ADVP]
-> ADVP [','] @:CLAUSE
-> [','] L_CONJ [','] @:CLAUSE
CIR
-> @:ADVP
-> @:PP [LIS_PP]
-> [L_CONJ] @:GER [LIS_NP]
# ---------------
# Nominal phrases
# ---------------
NP_PERS(nmb,gnd,3PRS)
-> NP(nmb,gnd)
NP_PERS(SG,gnd,1PRS)
-> 'yo'
NP_PERS(PL,gnd,1PRS)
-> 'nosotros'
NP_PERS(SG,gnd,2PRS)
-> 'tu'
NP_PERS(SG,gnd,3PRS)
-> 'él'
NP(nmb,gnd)
-> @:NP(nmb,gnd) ADV
-> @:NP(nmb,gnd) AP(nmb,gnd)
-> [DP(nmb,gnd)] @:NOM(nmb,gnd)
-> @:NP(nmb,gnd) LIS_NP(nmb1,gnd1)
LIS_NP(nmb,gnd)
-> ',' @:NP(nmb,gnd) [LIS_NP(nmb1,gnd1)]
-> @:CONJ NP(nmb,gnd)
NOM(nmb,gnd)
-> [NUM] @:N(nmb,gnd)
-> @:NOM(nmb,gnd) AP(nmb,gnd)
-> AP(nmb,gnd) @:NOM(nmb,gnd)
-> @:NOM(nmb,gnd) PP
-> @:AP(nmb,gnd)
-> INFP
-> PPR
-> DATE
-> NUM
-> 'quien'
# -------------
# Miscellaneous
# -------------
DP(nmb,gnd)
-> DET(nmb,gnd)
-> ART(nmb,gnd)
LIS_PP
-> ',' @:PP [LIS_PP]
-> @:CONJ PP
PP
-> @:PR NP(nmb,gnd)
-> @:PR CLAUSE
-> @:'que' NP(nmb,gnd)
-> @:'que' CLAUSE
AP(nmb,gnd)
-> @:ADJ(nmb,gnd) [AP(nmb,gnd)]
-> ',' @:ADJ(nmb,gnd) [AP(nmb,gnd)]
-> @:CONJ ADJ(nmb,gnd)
# --------------------
# Personal verb phrase
# --------------------
VP(nmb,pers,AUX)
-> @:V(nmb,pers,AUX)
VP(nmb,pers,mean)
-> @:VP_needs_NP(nmb,pers,mean) [NP(nmb1,gnd1)]
-> [ADVP] [SEP_O] @:VP(nmb,pers,mean) PP
-> @:V(nmb,pers,mean) GER
VP_needs_NP(nmb,pers,mean)
-> [ADVP] @:VP_V(nmb,pers,mean)
-> @:VP_needs_NP(nmb,pers,mean) PP
VP_V(nmb,pers,mean)
-> @:V(nmb,pers,mean)
-> @:'haber'(nmb,pers) [ADVP] PART(SG,MASC)
-> @:'ser'(nmb,pers) [ADVP] PART(nmb,gnd)
-> @:'ser'(nmb,pers) [ADVP] AP(nmb,gnd)
-> @:'estar'(nmb,pers) [ADVP] PART(nmb,gnd)
ADVP
-> [L_CONJ] @:ADV [NP(nmb,gnd)]
-> [L_CONJ] @:PP
-> 'hacer'(nmb,pers) @:NP(nmb,gnd)
-> @:PR ADV
# ------------------------
# Infinitivall verb phrase
# ------------------------
INFP
-> [ADVP] @:VP_INF(nmb,gnd,mean) [ADV]
-> [ADVP] @:V(INF,aux) [ADV]
VP_INF(nmb,gnd,mean)
-> @:VP_needs_NP_INF(nmb,gnd,mean) [NP(nmb1,gnd1)]
-> @:V(INF,mean) PP
VP_needs_NP_INF(nmb,gnd,mean)
-> @:VP_V_INF(nmb,gnd,mean)
-> @:VP_needs_NP_INF(nmb,gnd,mean) PP
VP_V_INF(nmb,gnd,mean)
-> @:V(INF,mean)
-> @:'haber'(INF) [ADVP] PART(SG,MASC)
-> @:'ser'(INF) [ADVP] PART(nmb,gnd)
The @ sign marks the lexical heads, the | sign separates alternatives, and the [] signs denote optional elements. The terminal symbols are defined in the file lexesp.mrk, where they are mapped to the symbols used by the morphological analyzer.
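As an informal illustration of this notation, the following Python sketch splits the right-hand side of a rule into alternatives and records, for each element, whether it is optional and whether it is marked as the head. It assumes that elements are separated by spaces and alternatives by " | ", as in the grammar above. The names Element and parse_rhs are hypothetical, and the grammar compiler itself may read the rules differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    symbol: str      # for example "NP(nmb,gnd)" or "'que'"
    optional: bool   # True if the element was written in [ ]
    head: bool       # True if the element was prefixed with @:


def parse_rhs(rhs: str) -> List[List[Element]]:
    """Parse one right-hand side into alternatives, each a list of elements."""
    alternatives: List[List[Element]] = []
    for alternative in rhs.split(" | "):
        elements: List[Element] = []
        for token in alternative.split():
            optional = token.startswith("[") and token.endswith("]")
            if optional:
                token = token[1:-1]
            head = token.startswith("@:")
            if head:
                token = token[2:]
            elements.append(Element(token, optional, head))
        alternatives.append(elements)
    return alternatives


# One rule of S and the rule of END_S from the grammar above
s_rule = parse_rhs("[BEG_S] @:NP(nmb,gnd) END_S")
end_rule = parse_rhs("'-' | ')' | '!'")
for element in s_rule[0]:
    print(element)
```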
The symbols in parentheses denote morphological characteristics of the lexical heads of the constituents. A variable that is used more than once in a rule must have the same value in all its occurrences. For example, in the rule
CLAUSE
-> NP_PERS(nmb,gnd,pers) @:VP(nmb,pers,mean)
the number nmb of the noun phrase and the verb phrase agree, while in the rule
VP(nmb,pers,mean)
-> @:VP_needs_NP(nmb,pers,mean) NP(nmb1,gnd1)
the number nmb1 and gender gnd1 of the noun phrase are not required to agree with the characteristics of the verb, since they are denoted by different variables.
The values for the variables like nmb and gnd are specified in the file descvar.txt.
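The agreement of variables in the two example rules can be sketched in Python as follows. This is only an illustration of the idea, not the mechanism used by the parser. The function bind is hypothetical, every name in parentheses is treated as a variable, and the values SG, PL, MASC, 3PRS and AUX are taken from the grammar above purely as sample data.

```python
from typing import Dict, List, Optional, Tuple


def bind(variables: List[Tuple[str, ...]],
         values: List[Tuple[str, ...]]) -> Optional[Dict[str, str]]:
    """Match variable names against actual values, constituent by constituent.

    Returns the variable bindings, or None if a variable that occurs more
    than once would have to take two different values (agreement fails).
    """
    bindings: Dict[str, str] = {}
    for names, features in zip(variables, values):
        for name, feature in zip(names, features):
            if bindings.setdefault(name, feature) != feature:
                return None
    return bindings


# CLAUSE -> NP_PERS(nmb,gnd,pers) @:VP(nmb,pers,mean)
clause = [("nmb", "gnd", "pers"), ("nmb", "pers", "mean")]
agreeing = bind(clause, [("SG", "MASC", "3PRS"), ("SG", "3PRS", "AUX")])
clashing = bind(clause, [("PL", "MASC", "3PRS"), ("SG", "3PRS", "AUX")])

# VP -> @:VP_needs_NP(nmb,pers,mean) NP(nmb1,gnd1)
vp = [("nmb", "pers", "mean"), ("nmb1", "gnd1")]
independent = bind(vp, [("SG", "3PRS", "AUX"), ("PL", "MASC")])

print(agreeing, clashing, independent)
```

In the first rule the plural subject with a singular verb is rejected because nmb occurs twice, while in the second rule a singular verb combines with a plural noun phrase because nmb and nmb1 are different variables.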
To change the grammar, use the grammar compiler described in the next section.
5.2 The grammar compiler
To change the grammar, make the necessary changes to the file grammar.txt in the directory compile/data and, if necessary, to the files lexesp.mrk and descvar.txt in the same directory, and then run the program process.bat.
If you want to change the grammar without leaving the Parser Demo program, be sure to check the Check Rules Changes for Each Sentence checkbox under the Options button.
6. Development team
This software is (C) Copyright by the Center for Computing Research of the National Polytechnic Institute, Mexico. It was developed by the Natural Language Laboratory.
The Parser Demo development team:
Design: Dr. Alexander Gelbukh,
Programming: Dr. Grigori Sidorov,
Grammar: Sofía Galicia Haro.