National Polytechnic Institute
Center for Computing Research
Natural Language Processing Laboratory






Demo 2.1


User Manual


Alexander Gelbukh

Grigori Sidorov






Mexico City



1.     Welcome to Parser

The Parser Demo program allows you to investigate the syntactic and morphological structure of Spanish sentences using the Extended Context-Free grammar formalism. It is useful for learning this formalism and for developing and testing grammars.

Namely, it allows you:

·                     To view the variants of the syntactic structure of sentences,

·                     To view the variants of morphological analysis of the words in sentences,

·                     To investigate the protocol of the parsing process, in order to understand the internal working of the parser.

Please see the following topics:

1.1     The main screen

This is a sample screen of the program:

The screen shows a variant of syntactic structure of a Spanish sentence.

1.2     How Do I...

·                     To type a sentence to analyze, use the New Text button.

·                     To view the syntactic tree of the sentence, switch to the Trees tab and make sure that one of the tree format buttons (Constituent Structure, Dependency Structure, or Graphical Representation) is pressed.

·                     To view the morphological structure of the sentence, switch to the Morphology tab.

2.     Screen Elements

On the Parser Demo screen, you can choose one of the following four pages by clicking on the tabs at the top of the screen: Trees, Morphology, Dump, and Tracing.

Also on the Parser Demo screen you will find the following elements:

            Text area.

            Toolbar.

2.1     Trees page

The Trees page allows you to investigate the syntactic structure of the sentences selected in the Document page.

Depending on which of the Constituent Structure, Dependency Structure, and Graphical Representation buttons on the Toolbar is pressed, it shows the tree in one of the following formats: constituency, dependency, or graphical.

You can choose any variant of the syntactic tree from the list in the right part of the screen. The maximum number of variants shown is set with the Maximal Number of Variants in Input setting under the Options button. If this limit is reached, the remaining variants are ignored.

2.1.1     Constituency format

In this mode, the syntactic trees are presented as constituency structures.

This format of the syntactic tree is set with the Constituent Structure button.

2.1.2     Dependency format

In this mode, the syntactic trees are presented as dependency structures.

This format of the syntactic tree is set with the Dependency Structure button.
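The two formats carry the same information once every constituent has a marked head. Below is a minimal sketch of the usual head-based conversion. It assumes each constituent records which of its children is the head (the element marked with @ in the grammar). A dependency link then goes from the lexical head of the head child to the lexical head of every other child. The `Node` class and the function names are hypothetical illustrations, not Parser Demo code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    head: int = 0               # index of the head (@-marked) child; unused for words
    word: Optional[str] = None  # set only for leaves (words)


def lexical_head(node: Node) -> str:
    """Follow head children down until a word is reached."""
    while node.word is None:
        node = node.children[node.head]
    return node.word


def dependencies(node: Node) -> List[Tuple[str, str]]:
    """Return (governor, dependent) word pairs for the whole tree."""
    if node.word is not None:
        return []
    links: List[Tuple[str, str]] = []
    governor = lexical_head(node.children[node.head])
    for i, child in enumerate(node.children):
        if i != node.head:
            links.append((governor, lexical_head(child)))
        links.extend(dependencies(child))
    return links


# "el perro come": the VP is the head of the clause, the noun is the head of the NP.
tree = Node("CLAUSE", head=1, children=[
    Node("NP", head=1, children=[Node("DET", word="el"), Node("N", word="perro")]),
    Node("VP", children=[Node("V", word="come")]),
])
print(dependencies(tree))  # [('come', 'perro'), ('perro', 'el')]
```

For simplicity, words are identified only by their strings. This is enough for the illustration, but a real implementation would use word positions.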

2.1.3     Graphical format

In this mode, the syntactic trees are presented in graphical form. The tree is shown as a constituency structure, and the dependency links (heads) are shown in red.

This format of the syntactic tree is set with the Graphical Representation button.

2.2     Morphology page

The Morphology page allows you to investigate the morphological structure of sentences. For each word of the current sentence, the page lists its possible normalized forms together with their morphological codes.

2.3     Dump page

The Dump page presents all the variants of the syntactic structure simultaneously, in text form.

2.4     Tracing page

The Tracing page allows you to investigate the process of parsing. It shows which rules were triggered and in what order, and also shows how the pieces of the structure were built.

2.5     Text area

In this area the current text is shown, and the current sentence is selected. You can select a sentence to investigate in this area by clicking on it.

3.     Toolbar

The Toolbar provides access to the following settings and tools:

3.1     Open button


The Open button allows you to open a file containing the sentences to process. The format of this file depends on the Morphology radiobuttons under the Options button.

3.2     Type button

The Type button allows you to type the sentence to process. The format of the text depends on the Morphology radiobuttons under the Options button.

3.3     Options button


The Options button provides access to the following settings:

            Gap Between Levels setting.

            Gap On One Level setting.

            Check Rules Changes for Each Sentence checkbox.

            Only Number of Variants checkbox.

            Maximal Number of Variants in Input setting.

3.4     Process All button


The Process All button allows you to process all the sentences in the file automatically, in a batch mode.

3.5     Zoom Picture button


The Zoom Picture button allows you to view the picture of the tree or the lists on full screen.

3.6     View Additional Nodes button


The View Additional Nodes button allows you to view the grammar nodes that were added automatically when the grammar was converted into Chomsky normal form. When the button is pressed, these additional nodes are shown.
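Chomsky normal form allows at most two symbols on the right-hand side of a rule. Longer rules must therefore be split, and each split introduces a new nonterminal, which is one source of such additional nodes. Below is a minimal sketch of only this splitting step. A full conversion must also handle unit rules and rules that mix terminals and nonterminals. The sketch assumes rules are given as (left-hand side, list of right-hand symbols) pairs. The naming scheme `CLAUSE_1`, `CLAUSE_2`, ... is a hypothetical choice, not the one Parser Demo uses.

```python
from typing import Dict, List, Tuple

Rule = Tuple[str, List[str]]


def binarize(rules: List[Rule]) -> List[Rule]:
    """Split rules with more than two right-hand symbols into binary rules."""
    result: List[Rule] = []
    counter: Dict[str, int] = {}
    for lhs, rhs in rules:
        current, rest = lhs, list(rhs)
        while len(rest) > 2:
            counter[lhs] = counter.get(lhs, 0) + 1
            new_node = f"{lhs}_{counter[lhs]}"  # hypothetical naming scheme
            result.append((current, [rest[0], new_node]))
            current, rest = new_node, rest[1:]
        result.append((current, rest))
    return result


for rule in binarize([("CLAUSE", ["NP", "V", "NP", "PP"])]):
    print(rule)
# ('CLAUSE', ['NP', 'CLAUSE_1'])
# ('CLAUSE_1', ['V', 'CLAUSE_2'])
# ('CLAUSE_2', ['NP', 'PP'])
```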

3.7     Constituent Structure button


The Constituent Structure button shows the syntactic tree in the traditional constituent structure form.

3.8     Dependency Structure button


The Dependency Structure button shows the syntactic tree in the form of dependency tree.

3.9     Graphical Representation button


The Graphical Representation button shows the syntactic tree in graphical form: the constituent structure is drawn, and the dependencies (heads) are shown in red.

3.10     Help button


The Help button shows this guide.

4.     Options

4.1     Proportional Representation checkbox

The Proportional Representation checkbox affects the way the nodes of the tree are centered in the graphical mode. We recommend checking this checkbox.

4.2     Show Words Vertically checkbox

The Show Words Vertically checkbox affects the way the words are presented in the tree in the graphical mode. We recommend unchecking this checkbox.

4.3     Morphology radiobuttons

The Morphology radiobuttons determine the expected format of the input text file:

·      Morphology in rules: all the wordforms in the file are expected to be terminal symbols of the grammar.

·      Morphology in input: the input file has a structured form, with morphological codes explicitly assigned to each word.

·      Morphological analysis: the input is plain text, and the program analyzes it morphologically.

We recommend the Morphological analysis option.

4.4     Trace Parsed checkbox

If the Trace Parsed checkbox is checked, successfully parsed sentences are traced in the Tracing page.

Unchecking this checkbox speeds up the program when you view the Tracing page, and also lets you see in the Tracing page only the sentences for which parsing failed.

4.5     Trace Not Parsed checkbox

If the Trace Not Parsed checkbox is checked, sentences for which parsing failed are traced in the Tracing page.

Unchecking this checkbox speeds up the program when you view the Tracing page, and also lets you see in the Tracing page only the successfully parsed sentences.


4.6     Gap Between Levels setting

The Gap Between Levels setting affects the spacing between adjacent levels of the tree in the graphical mode. The recommended value is 40.

4.7     Gap On One Level setting

The Gap On One Level setting affects the spacing between adjacent nodes on the same level of the tree in the graphical mode. The recommended value is 10.
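The following minimal layout sketch shows how two such spacing parameters can drive a tree drawing. Words are placed left to right, separated by a gap. Each parent is centered over its first and last child. Each level is placed a fixed distance below the previous one. The sketch assumes a fixed node width (the hypothetical `node_width` parameter) and uses the recommended values 40 and 10 as defaults. It is an illustration only, not the program's actual drawing algorithm.

```python
from typing import List, Tuple

Tree = Tuple[str, list]  # (label, children); a word has an empty child list
Position = Tuple[str, float, float]


def layout(tree: Tree, gap_between_levels: float = 40,
           gap_on_one_level: float = 10, node_width: float = 30) -> List[Position]:
    """Return (label, x, y) for every node in preorder."""
    positions: List[Position] = []
    next_x = 0.0  # x coordinate for the next word

    def place(node: Tree, depth: int) -> float:
        nonlocal next_x
        label, children = node
        index = len(positions)
        positions.append((label, 0.0, 0.0))  # placeholder, filled in below
        if not children:
            x = next_x
            next_x += node_width + gap_on_one_level
        else:
            xs = [place(child, depth + 1) for child in children]
            x = (xs[0] + xs[-1]) / 2  # center the parent over its children
        positions[index] = (label, x, depth * gap_between_levels)
        return x

    place(tree, 0)
    return positions


tree = ("CLAUSE", [("NP", [("perro", [])]), ("VP", [("come", [])])])
for label, x, y in layout(tree):
    print(label, x, y)
```

A larger gap on one level spreads the words apart horizontally, while a larger gap between levels stretches the tree vertically.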

4.8     Check Rules Changes for Each Sentence checkbox

The Check Rules Changes for Each Sentence checkbox allows you to change the grammar without reloading the program. When this checkbox is checked, the program checks the grammar for changes before each sentence, so any change takes effect immediately. However, this slightly slows down processing.

We recommend checking this checkbox.
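One inexpensive way to implement such a check is to compare the grammar file's modification time before each sentence and to reload the file only when that time has changed. This also explains the small slowdown. The following is a minimal sketch. `GrammarWatcher` and the `loader` callback are hypothetical names, and Parser Demo may detect changes differently.

```python
import os
from typing import Any, Callable, Optional


class GrammarWatcher:
    """Reload a grammar file only when its modification time changes."""

    def __init__(self, path: str, loader: Callable[[str], Any]):
        self.path = path
        self.loader = loader            # hypothetical: parses the grammar file
        self._mtime: Optional[float] = None
        self.grammar: Any = None

    def current(self) -> Any:
        """Call once per sentence; returns an up-to-date grammar."""
        mtime = os.path.getmtime(self.path)  # one cheap file-system query
        if mtime != self._mtime:
            self.grammar = self.loader(self.path)
            self._mtime = mtime
        return self.grammar
```

In a batch loop, `watcher.current()` is called once per sentence. It returns the cached grammar unless the file has been edited in the meantime.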

4.9     Only Number of Variants checkbox

The Only Number of Variants checkbox allows you to skip the phase of loading the found variants into the program’s viewer. When this checkbox is checked, the program only determines the number of variants found for each sentence. This greatly speeds up processing; however, you cannot view the variants found.

We recommend unchecking this checkbox when you want to view the parsing results, and checking it when you want to test the coverage of the grammar and the average ambiguity of parsing with the given grammar.
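Counting variants is much cheaper than building them. With a grammar in Chomsky normal form, the number of parses can be computed by a CKY-style dynamic program. The count for a span is the sum, over all split points and binary rules, of the products of the counts for its two halves. Below is a minimal sketch of this standard technique. It assumes that lexical rules are given as a dictionary from words to categories and that binary rules are given as (A, B, C) triples for A -> B C. It is not necessarily how Parser Demo counts its variants.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def count_parses(words: List[str], lexical: Dict[str, List[str]],
                 binary: List[Tuple[str, str, str]], start: str) -> int:
    """Count parse trees of `words` without building any of them (CKY)."""
    n = len(words)
    # chart[i][j][A] = number of ways A can derive words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for category in lexical.get(word, []):
            chart[i][i + 1][category] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point
                left, right = chart[i][k], chart[k][j]
                for a, b, c in binary:  # rule a -> b c
                    chart[i][j][a] += left[b] * right[c]
    return chart[0][n][start]


lexical = {"a": ["X"]}
binary = [("X", "X", "X")]
print(count_parses(["a"] * 4, lexical, binary, "X"))  # 5
```

For the highly ambiguous toy grammar X -> X X, X -> 'a', a sentence of n words has a Catalan number of parses (2 for three words, 5 for four). The chart computes these counts without creating a single tree.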

4.10     Maximal Number of Variants in Input setting

The Maximal Number of Variants in Input setting limits how many of the found variants are loaded into the program’s viewer: for each sentence, the program loads at most the given number of variants. This speeds up processing; however, you cannot view all of the variants found.
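The effect of this setting resembles taking only the first N items from a lazily generated sequence: variants beyond the limit are never built. Below is a minimal sketch using Python's `itertools.islice`. The `bracketings` generator yields every binary bracketing of a word list. It is a hypothetical stand-in for the parser's variants.

```python
from itertools import islice
from typing import Iterator, List, Union

Tree = Union[str, tuple]


def bracketings(words: List[str]) -> Iterator[Tree]:
    """Lazily yield every binary bracketing of `words` (stand-in for variants)."""
    if len(words) == 1:
        yield words[0]
        return
    for k in range(1, len(words)):
        for left in bracketings(words[:k]):
            for right in bracketings(words[k:]):
                yield (left, right)


def load_variants(variants: Iterator[Tree], max_variants: int) -> List[Tree]:
    """Materialize at most `max_variants` variants; the rest are never built."""
    return list(islice(variants, max_variants))


print(load_variants(bracketings(["a", "b", "c"]), 1))  # [('a', ('b', 'c'))]
```

Ten words already have 4862 binary bracketings, but `load_variants` builds only the first `max_variants` of them.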

5.     Grammar

5.1     The grammar provided with the parser

The parser is provided with the following grammar:

# -------------------------

# General rules and clauses

# -------------------------



        -> [BEG_S] @:S_SET END_S

        -> [BEG_S] [BEG_S] @:CLAUSE END_S

        -> [BEG_S] [BEG_S] @:ADVP END_S 

        -> @:PP END_S                   

        -> [L_CONJ] @:LIS_NP END_S      

        -> [BEG_S] @:NP(nmb,gnd) END_S  



        -> [L_CONJ] [SEP_O] [CIR] [SEP_O] [CIR] [SEP_O] @:CLAUSE

        -> @:S_SET LIS_CLAUSE

        -> LIS_CLAUSE @:S_SET

        -> [CIR] [SEP_O] [CIR] [SEP_O] @:CLAUSE [SEP_O] [CIR] [CIR]



        -> SEP_O @:CLAUSE [LIS_CLAUSE]

        -> [SEP_O] @:L_CONJ LIS_CLAUSE



        -> ',' | ':' | ';' | '...' | '(' | '¿' | '¡' | '"' | '-' | ')'



        -> '-' | ')' | '!' | '"' | '?' | '.'



        -> '¿' | '-' | '¡'



        -> CONJ

        -> CONJ_SUB

        -> 'que'

        -> '...'



        -> [NP_PERS(nmb,gnd,pers)] [CONJ_SUB] [PPR] [PPR] @:VP(nmb,pers,mean) [ADVP]

        -> [NP_PERS(nmb,gnd,pers)] [SEP_O] [PPR] [PPR] @:VP(nmb,pers,mean) [ADVP]

        -> [NP_PERS(nmb,gnd,pers)] [SEP_O] [PPR] [PPR] @:V(nmb,pers,AUX) [CIR] [ADVP]

        -> ADVP [','] @:CLAUSE

        -> [','] L_CONJ [','] @:CLAUSE



        -> @:ADVP

        -> @:PP [LIS_PP]

        -> [L_CONJ] @:GER [LIS_NP]


# ---------------

# Nominal phrases

# ---------------



        -> NP(nmb,gnd)



        -> 'yo'



        -> 'nosotros'



        -> 'tu'



        -> 'él'



        -> @:NP(nmb,gnd) ADV

        -> @:NP(nmb,gnd) AP(nmb,gnd)

        -> [DP(nmb,gnd)] @:NOM(nmb,gnd)

        -> @:NP(nmb,gnd) LIS_NP(nmb1,gnd1)



        -> ',' @:NP(nmb,gnd) [LIS_NP(nmb1,gnd1)]

        -> @:CONJ NP(nmb,gnd)



        -> [NUM] @:N(nmb,gnd)

        -> @:NOM(nmb,gnd)   AP(nmb,gnd)

        -> AP(nmb,gnd)    @:NOM(nmb,gnd)

        -> @:NOM(nmb,gnd)   PP

        -> @:AP(nmb,gnd)

        -> INFP

        -> PPR

        -> DATE

        -> NUM

        -> 'quien'


# -------------

# Miscellaneous

# -------------



        -> DET(nmb,gnd)

        -> ART(nmb,gnd)



        -> ',' @:PP [LIS_PP]

        -> @:CONJ PP



        -> @:PR NP(nmb,gnd)

        -> @:PR CLAUSE

        -> @:'que' NP(nmb,gnd)

        -> @:'que' CLAUSE    



        -> @:ADJ(nmb,gnd) [AP(nmb,gnd)]

        -> ',' @:ADJ(nmb,gnd) [AP(nmb,gnd)]

        -> @:CONJ ADJ(nmb,gnd)


# --------------------

# Personal verb phrase

# --------------------



        -> @:V(nmb,pers,AUX)



        -> @:VP_needs_NP(nmb,pers,mean) [NP(nmb1,gnd1)]

        -> [ADVP] [SEP_O] @:VP(nmb,pers,mean) PP

        -> @:V(nmb,pers,mean) GER



        -> [ADVP] @:VP_V(nmb,pers,mean)

        -> @:VP_needs_NP(nmb,pers,mean) PP



        -> @:V(nmb,pers,mean)

        -> @:'haber'(nmb,pers) [ADVP] PART(SG,MASC)

        -> @:'ser'(nmb,pers)   [ADVP] PART(nmb,gnd)

        -> @:'ser'(nmb,pers)   [ADVP] AP(nmb,gnd)

        -> @:'estar'(nmb,pers)  [ADVP] PART(nmb,gnd)



        -> [L_CONJ] @:ADV [NP(nmb,gnd)]

        -> [L_CONJ] @:PP

        -> 'hacer'(nmb,pers) @:NP(nmb,gnd)

        -> @:PR ADV


# ------------------------

# Infinitival verb phrase

# ------------------------



        -> [ADVP] @:VP_INF(nmb,gnd,mean) [ADV]

        -> [ADVP] @:V(INF,aux) [ADV]



        -> @:VP_needs_NP_INF(nmb,gnd,mean) [NP(nmb1,gnd1)]

        -> @:V(INF,mean) PP



        -> @:VP_V_INF(nmb,gnd,mean)

        -> @:VP_needs_NP_INF(nmb,gnd,mean) PP



        -> @:V(INF,mean)

        -> @:'haber'(INF) [ADVP] PART(SG,MASC)

        -> @:'ser'(INF)   [ADVP] PART(nmb,gnd)

The @ sign marks the lexical heads, the | sign separates alternatives, and square brackets [] denote optional elements. The terminal symbols are defined in the file lexesp.mrk, where they are mapped to the symbols used by the morphological analyzer.
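Optional elements and alternatives are shorthand: a rule with k optional elements stands for 2^k ordinary rules. Below is a minimal sketch of this expansion for the right-hand side of a rule. It assumes that symbols are separated by spaces, that each optional symbol is written as a single bracketed token (as in the grammar above), and that '|' never appears inside a quoted terminal. The function name is hypothetical, and the actual grammar compiler may proceed differently.

```python
from itertools import product
from typing import List


def expand(rhs: str) -> List[List[str]]:
    """Expand '|' alternatives and [optional] symbols into plain right-hand sides."""
    result: List[List[str]] = []
    for alternative in rhs.split("|"):
        choices: List[List[List[str]]] = []
        for token in alternative.split():
            if token.startswith("[") and token.endswith("]"):
                choices.append([[token[1:-1]], []])  # keep the symbol, or drop it
            else:
                choices.append([[token]])  # obligatory symbol
        for combo in product(*choices):
            result.append([symbol for part in combo for symbol in part])
    return result


print(expand("[BEG_S] @:NP(nmb,gnd) END_S"))
# [['BEG_S', '@:NP(nmb,gnd)', 'END_S'], ['@:NP(nmb,gnd)', 'END_S']]
```

This growth is one reason why a rule such as the one with seven optional [CIR] and [SEP_O] elements stands for 128 plain rules.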

The symbols in parentheses denote morphological characteristics of the lexical heads of the constituents. A variable that occurs more than once in a rule must take the same value at each occurrence; that is, the constituents must agree. For example, in the rule


        -> NP_PERS(nmb,gnd,pers) @:VP(nmb,pers,mean)

the noun phrase and the verb phrase must agree in number (nmb) and person (pers), while in the rule


        -> @:VP_needs_NP(nmb,pers,mean) NP(nmb1,gnd1)

the number nmb1 and gender gnd1 of the noun phrase are separate variables and need not agree with those of the verb.
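Agreement can be checked by binding each variable to a value at its first occurrence and rejecting the rule application if a later occurrence has a different value. Below is a minimal sketch. It assumes that the variable names (which the real grammar declares in descvar.txt) are passed in as a set, so that any other argument, such as SG or MASC, is treated as a constant that must match exactly. The value names '3' and 'TRANS' in the example are invented placeholders.

```python
from typing import Dict, List, Optional, Sequence, Set

Env = Dict[str, str]


def bind(args: Sequence[str], values: Sequence[str],
         variables: Set[str], env: Env) -> Optional[Env]:
    """Match one constituent's features against the rule's arguments."""
    if len(args) != len(values):
        return None
    env = dict(env)  # do not modify the caller's bindings
    for arg, value in zip(args, values):
        if arg in variables:
            if env.get(arg, value) != value:
                return None  # variable already bound to another value
            env[arg] = value
        elif arg != value:
            return None  # constant such as SG or MASC does not match
    return env


def agrees(rule_args: List[Sequence[str]], constituents: List[Sequence[str]],
           variables: Set[str]) -> Optional[Env]:
    """Return the variable bindings if all constituents agree, else None."""
    env: Env = {}
    for args, values in zip(rule_args, constituents):
        result = bind(args, values, variables, env)
        if result is None:
            return None
        env = result
    return env


VARIABLES = {"nmb", "gnd", "pers", "mean", "nmb1", "gnd1"}
rule = [("nmb", "gnd", "pers"), ("nmb", "pers", "mean")]  # NP_PERS(...) VP(...)
print(agrees(rule, [("SG", "MASC", "3"), ("SG", "3", "TRANS")], VARIABLES))
# {'nmb': 'SG', 'gnd': 'MASC', 'pers': '3', 'mean': 'TRANS'}
print(agrees(rule, [("SG", "MASC", "3"), ("PL", "3", "TRANS")], VARIABLES))  # None
```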

The possible values of variables such as nmb and gnd are specified in the file descvar.txt.

To change the grammar, use the grammar compiler described in the next section.

5.2     The grammar compiler

To change the grammar, edit the file grammar.txt in the directory compile/data and, if necessary, the files lexesp.mrk and descvar.txt in the same directory; then run the program process.bat.

If you want to change the grammar without leaving the Parser Demo program, be sure to check the Check Rules Changes for Each Sentence checkbox under the Options button.

6.     Development team     


This software is (C) Copyright by
the Center for Computing Research of National Polytechnic Institute, Mexico.
It was developed by the Natural Language Processing Laboratory.




The Parser Demo development team:

Design: Dr. Alexander Gelbukh,
Programming: Dr. Grigori Sidorov,
Grammar: Sofía Galicia Haro.