Matching ⬅️⬆️➡️ Rules set


Grew Tutorial • Lesson 2 • First rule

In this lesson, we write a rule and learn how to apply it to a graph.

Converting between annotation formats is a common use of Grew. As a running example, we convert a dependency annotation in the format used in the Sequoia project to Surface Syntactic Universal Dependencies (SUD).

Data

We consider the sentence frwiki_50.1000_00907: Deux autres photos sont également montrées du doigt [en: Two other photos are also pointed out]. Its two annotations (Sequoia and SUD) are:

Format frwiki_50.1000_00907
Sequoia 🔗CoNLL sequoia
SUD 🔗CoNLL sud

The adjective rule

The whole transformation is decomposed into small steps, each described by a rule. In our example, we need a rule to change the POS tag of adjectives: Sequoia uses A where SUD uses ADJ.

The following Grew rule, stored in the file adjective.grs, performs this transformation:

rule adj {
  pattern { N [upos=A] }
  commands { N.upos = ADJ }
}

The command used to apply the rule to the input graph is:

grew transform -config sequoia -grs adjective.grs -strat "adj" -i frwiki_50.1000_00907.seq.conll

and it produces:

# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
# sent_id = frwiki_50.1000_00907
# text = Deux autres photos sont également montrées du doigt.
1	Deux	deux	D	_	s=card	3	det	_	_
2	autres	autre	ADJ	_	n=p|s=ind	3	mod	_	_
3	photos	photo	N	_	g=f|n=p|s=c	6	suj	_	_
4	sont	être	V	_	m=ind|n=p|p=3|t=pst	6	aux.pass	_	_
5	également	également	ADV	_	_	6	mod	_	_
6	montrées	montrer	V	_	g=f|m=part|n=p|t=past	0	root	_	_
7	du	de	P+D	_	s=def	6	mod	_	_
8	doigt	doigt	N	_	g=m|n=s|s=c	7	obj.p	_	_

which corresponds to the graph below, where the POS of autres is now ADJ:

[Figure one_step_adj: the graph after applying the adj rule]
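
To check such an output programmatically, the token lines can be read with a few lines of Python. This is only a minimal sketch and not part of Grew: the helper read_tokens and the variable names are hypothetical, the column names are copied from the "# global.columns" line, and only the first three token lines of the output are reproduced.

```python
# Column names copied from the "# global.columns" line of the output above.
COLUMNS = "ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC".split()

# First three token lines of the output above (tab-separated columns).
OUTPUT = (
    "# sent_id = frwiki_50.1000_00907\n"
    "1\tDeux\tdeux\tD\t_\ts=card\t3\tdet\t_\t_\n"
    "2\tautres\tautre\tADJ\t_\tn=p|s=ind\t3\tmod\t_\t_\n"
    "3\tphotos\tphoto\tN\t_\tg=f|n=p|s=c\t6\tsuj\t_\t_\n"
)


def read_tokens(text):
    """Hypothetical helper: one dict per token line, keyed by column name.

    Comment lines (starting with '#') and blank lines are skipped.
    """
    tokens = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        tokens.append(dict(zip(COLUMNS, line.split("\t"))))
    return tokens


tokens = read_tokens(OUTPUT)
upos_by_form = {t["FORM"]: t["UPOS"] for t in tokens}
print(upos_by_form)  # {'Deux': 'D', 'autres': 'ADJ', 'photos': 'N'}
```

Only autres carries the new tag; the other tokens keep their Sequoia tags because the adj rule does not match them.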

Some other rules

Let us consider two other similar rules (preposition.grs, noun.grs) needed for the conversion:

rule prep {
  pattern { N [upos=P] }
  commands { N.upos = ADP }
}
rule noun {
  pattern { N [upos=N] }
  commands { N.upos = NOUN }
}

With the command:

grew transform -config sequoia -grs preposition.grs -strat "prep" -i frwiki_50.1000_00907.seq.conll

the output is the empty set:

# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC

With the command:

grew transform -config sequoia -grs noun.grs -strat "noun" -i frwiki_50.1000_00907.seq.conll

the output contains two graphs: in the first one, the noun photos has the new tag NOUN; in the second one, the noun doigt has the new tag NOUN:

# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
# sent_id = frwiki_50.1000_00907_0
# text = Deux autres photos sont également montrées du doigt.
1	Deux	deux	D	_	s=card	3	det	_	_
2	autres	autre	A	_	n=p|s=ind	3	mod	_	_
3	photos	photo	NOUN	_	g=f|n=p|s=c	6	suj	_	_
4	sont	être	V	_	m=ind|n=p|p=3|t=pst	6	aux.pass	_	_
5	également	également	ADV	_	_	6	mod	_	_
6	montrées	montrer	V	_	g=f|m=part|n=p|t=past	0	root	_	_
7	du	de	P+D	_	s=def	6	mod	_	_
8	doigt	doigt	N	_	g=m|n=s|s=c	7	obj.p	_	_

# sent_id = frwiki_50.1000_00907_1
# text = Deux autres photos sont également montrées du doigt.
1	Deux	deux	D	_	s=card	3	det	_	_
2	autres	autre	A	_	n=p|s=ind	3	mod	_	_
3	photos	photo	N	_	g=f|n=p|s=c	6	suj	_	_
4	sont	être	V	_	m=ind|n=p|p=3|t=pst	6	aux.pass	_	_
5	également	également	ADV	_	_	6	mod	_	_
6	montrées	montrer	V	_	g=f|m=part|n=p|t=past	0	root	_	_
7	du	de	P+D	_	s=def	6	mod	_	_
8	doigt	doigt	NOUN	_	g=m|n=s|s=c	7	obj.p	_	_

In fact, the result of applying a rule to a graph is a set of graphs, one for each occurrence of the pattern found in the input graph. This set is therefore empty if the pattern is not found (like pattern { N [upos=P] }: no token has the tag P, since du is tagged P+D) and contains two graphs if the pattern is found twice (like pattern { N [upos=N] }). To iterate the application of a rule, one has to use the strategy Onf.
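
The "one graph per occurrence" behaviour can be mimicked with a toy model in Python. This is a sketch under simplifying assumptions, not Grew's implementation: a graph is reduced to a list of nodes with a form and a upos value (dependencies are ignored), and the hypothetical function apply_rule only covers rules of the shape seen so far (match an exact upos value, replace it).

```python
import copy

# Forms and Sequoia POS tags of the example sentence, taken from the data above.
SENTENCE = [
    ("Deux", "D"), ("autres", "A"), ("photos", "N"), ("sont", "V"),
    ("également", "ADV"), ("montrées", "V"), ("du", "P+D"), ("doigt", "N"),
]


def make_graph(pairs):
    """Toy graph: a list of nodes, each with a form and a upos feature."""
    return [{"form": form, "upos": upos} for form, upos in pairs]


def apply_rule(graph, old, new):
    """Apply a rule once: return one new graph per node whose upos equals `old`.

    The input graph is never modified; each result differs from it in
    exactly one node.
    """
    results = []
    for i, node in enumerate(graph):
        if node["upos"] == old:  # exact match: "P+D" does not match "P"
            rewritten = copy.deepcopy(graph)
            rewritten[i]["upos"] = new
            results.append(rewritten)
    return results


graph = make_graph(SENTENCE)
adj = apply_rule(graph, "A", "ADJ")
prep = apply_rule(graph, "P", "ADP")
noun = apply_rule(graph, "N", "NOUN")
print(len(adj), len(prep), len(noun))  # 1 0 2
```

The three counts mirror the three commands above: one graph for adj, the empty set for prep, and two graphs for noun.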

The strategy Onf

If we use the strategy Onf (noun) instead of noun in the last command above, the rule noun is applied repeatedly until it no longer matches. In our example:

grew transform -config sequoia -grs noun.grs -strat "Onf (noun)" -i frwiki_50.1000_00907.seq.conll

produces the graph where the two nouns have the new tag NOUN:

# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
# sent_id = frwiki_50.1000_00907
# text = Deux autres photos sont également montrées du doigt.
1	Deux	deux	D	_	s=card	3	det	_	_
2	autres	autre	A	_	n=p|s=ind	3	mod	_	_
3	photos	photo	NOUN	_	g=f|n=p|s=c	6	suj	_	_
4	sont	être	V	_	m=ind|n=p|p=3|t=pst	6	aux.pass	_	_
5	également	également	ADV	_	_	6	mod	_	_
6	montrées	montrer	V	_	g=f|m=part|n=p|t=past	0	root	_	_
7	du	de	P+D	_	s=def	6	mod	_	_
8	doigt	doigt	NOUN	_	g=m|n=s|s=c	7	obj.p	_	_

Note that Onf always outputs exactly one graph. With the strategy Onf (prep), for instance, the rewriting process outputs one graph, identical to the input graph, obtained after zero applications of the prep rule.

NB: Onf stands for “one normal form”; it will be explained in more detail later, together with other strategies.
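
The behaviour of Onf described above can also be imitated with a small Python toy model. It is a sketch under stated assumptions, not Grew's actual algorithm: a graph is a list of nodes with a upos value, the hypothetical function onf always rewrites the first match it finds, and it assumes that the new tag differs from the old one (otherwise the loop would never stop).

```python
# Forms and Sequoia POS tags of the example sentence, taken from the data above.
SENTENCE = [
    ("Deux", "D"), ("autres", "A"), ("photos", "N"), ("sont", "V"),
    ("également", "ADV"), ("montrées", "V"), ("du", "P+D"), ("doigt", "N"),
]


def matches(graph, old):
    """Positions of the nodes whose upos is exactly `old`."""
    return [i for i, node in enumerate(graph) if node["upos"] == old]


def onf(graph, old, new):
    """Toy 'one normal form': rewrite until the pattern no longer matches.

    Returns exactly one graph, together with the number of rule
    applications that were needed. Assumes old != new.
    """
    graph = [dict(node) for node in graph]  # work on a copy of the input
    steps = 0
    while True:
        found = matches(graph, old)
        if not found:
            return graph, steps
        graph[found[0]]["upos"] = new  # rewrite the first occurrence
        steps += 1


graph = [{"form": form, "upos": upos} for form, upos in SENTENCE]
noun_graph, noun_steps = onf(graph, "N", "NOUN")
prep_graph, prep_steps = onf(graph, "P", "ADP")
print(noun_steps, prep_steps)  # 2 0
```

In this model, the noun rule needs two applications before both nouns are tagged NOUN, while the prep rule is applied zero times and the single output graph equals the input.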

