Grew Tutorial • Lesson 2 • First rule
In this lesson, we write a rule and learn how to apply it to a graph.
Converting between annotation formats is one of the common uses of Grew. As an example, we will convert from one dependency annotation format (the one used in the Sequoia project) to Surface Syntactic Universal Dependencies (SUD).
Data
We consider the sentence frwiki_50.1000_00907:
Deux autres photos sont également montrées du doigt [en: Two other photos are also pointed out].
The two annotations (Sequoia and SUD) are:
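Format | frwiki_50.1000_00907 |
---|---|
Sequoia (CoNLL) | [figure: dependency graph with the Sequoia annotation] |
SUD (CoNLL) | [figure: dependency graph with the SUD annotation] |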
The adjective rule
The whole transformation is decomposed into small steps which are described by rules.
In our example, we need a rule to change the POS for adjectives: A is used in Sequoia and ADJ in SUD.
The following Grew rule (adjective.grs) can do this transformation:
rule adj {
  pattern { X [upos=A] }
  commands { X.upos = ADJ }
}
The command used to apply the rule to the input graph is:
grew transform -config sequoia -grs adjective.grs -strat "adj" -i frwiki_50.1000_00907.seq.conll
and it produces:
# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
# sent_id = frwiki_50.1000_00907
# text = Deux autres photos sont également montrées du doigt.
1 Deux deux D _ s=card 3 det _ _
2 autres autre ADJ _ n=p|s=ind 3 mod _ _
3 photos photo N _ g=f|n=p|s=c 6 suj _ _
4 sont être V _ m=ind|n=p|p=3|t=pst 6 aux.pass _ _
5 également également ADV _ _ 6 mod _ _
6 montrées montrer V _ g=f|m=part|n=p|t=past 0 root _ _
7 du de P+D _ s=def 6 mod _ _
8 doigt doigt N _ g=m|n=s|s=c 7 obj.p _ _
which corresponds to the graph where the POS of autres is now ADJ.
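To make the split between pattern and commands concrete, the effect of the adj rule can be mimicked outside Grew. The Python sketch below is only a toy model and not how Grew is implemented: the graph is reduced to a list of (form, upos) pairs copied from the sentence above, dependency relations are ignored, and the helper names find_matches and relabel are hypothetical.

```python
# Toy model of the adj rule. This is NOT Grew's implementation.
# The graph is reduced to (form, upos) pairs; relations are ignored.
sequoia_tags = [
    ("Deux", "D"), ("autres", "A"), ("photos", "N"), ("sont", "V"),
    ("également", "ADV"), ("montrées", "V"), ("du", "P+D"), ("doigt", "N"),
]


def find_matches(nodes, upos):
    """Pattern part: positions of all nodes whose upos equals `upos`."""
    return [i for i, (_, tag) in enumerate(nodes) if tag == upos]


def relabel(nodes, position, new_upos):
    """Commands part: return a copy where one node gets `new_upos`."""
    copy = list(nodes)
    form, _ = copy[position]
    copy[position] = (form, new_upos)
    return copy


adj_positions = find_matches(sequoia_tags, "A")
adj_result = relabel(sequoia_tags, adj_positions[0], "ADJ")

print(adj_positions)   # [1]
print(adj_result[1])   # ('autres', 'ADJ')
```

The input list is left untouched, which mirrors the fact that the command above produces a new graph rather than modifying the input file.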
Some other rules
Let us consider two other similar rules (preposition.grs, noun.grs) needed for the conversion:
rule prep {
  pattern { X [upos=P] }
  commands { X.upos = ADP }
}

rule noun {
  pattern { X [upos=N] }
  commands { X.upos = NOUN }
}
With the command:
grew transform -config sequoia -grs preposition.grs -strat "prep" -i frwiki_50.1000_00907.seq.conll
the output is the empty set:
# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
With the command:
grew transform -config sequoia -grs noun.grs -strat "noun" -i frwiki_50.1000_00907.seq.conll
the output contains two graphs: one where the first noun, photos, has the new tag NOUN, and another where the second noun, doigt, has the new tag NOUN:
# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
# sent_id = frwiki_50.1000_00907_0
# text = Deux autres photos sont également montrées du doigt.
1 Deux deux D _ s=card 3 det _ _
2 autres autre A _ n=p|s=ind 3 mod _ _
3 photos photo NOUN _ g=f|n=p|s=c 6 suj _ _
4 sont être V _ m=ind|n=p|p=3|t=pst 6 aux.pass _ _
5 également également ADV _ _ 6 mod _ _
6 montrées montrer V _ g=f|m=part|n=p|t=past 0 root _ _
7 du de P+D _ s=def 6 mod _ _
8 doigt doigt N _ g=m|n=s|s=c 7 obj.p _ _
# sent_id = frwiki_50.1000_00907_1
# text = Deux autres photos sont également montrées du doigt.
1 Deux deux D _ s=card 3 det _ _
2 autres autre A _ n=p|s=ind 3 mod _ _
3 photos photo N _ g=f|n=p|s=c 6 suj _ _
4 sont être V _ m=ind|n=p|p=3|t=pst 6 aux.pass _ _
5 également également ADV _ _ 6 mod _ _
6 montrées montrer V _ g=f|m=part|n=p|t=past 0 root _ _
7 du de P+D _ s=def 6 mod _ _
8 doigt doigt NOUN _ g=m|n=s|s=c 7 obj.p _ _
In fact, the result of applying a rule to a graph is a set of graphs, one for each occurrence of the request found in the input graph. This set is therefore empty if the request is not found (as with pattern { X [upos=P] }, since the only candidate token, du, is tagged P+D and not P) and contains two graphs if the request is found twice (as with pattern { X [upos=N] }).
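The "one output graph per occurrence" behaviour can be reproduced with a small model. The following Python sketch rests on simplifying assumptions: a graph is a list of (form, upos) pairs, a rule is a plain relabelling from one tag to another, and the function name apply_rule is hypothetical. It does not call Grew.

```python
# Toy model of a single rule application. This is NOT Grew's implementation.
graph = [
    ("Deux", "D"), ("autres", "A"), ("photos", "N"), ("sont", "V"),
    ("également", "ADV"), ("montrées", "V"), ("du", "P+D"), ("doigt", "N"),
]


def apply_rule(nodes, old_upos, new_upos):
    """Apply a relabelling rule once; return one new graph per match."""
    outputs = []
    for i, (form, tag) in enumerate(nodes):
        if tag == old_upos:
            rewritten = list(nodes)          # each match rewrites its own copy
            rewritten[i] = (form, new_upos)
            outputs.append(rewritten)
    return outputs


prep_outputs = apply_rule(graph, "P", "ADP")    # "P+D" is not equal to "P"
noun_outputs = apply_rule(graph, "N", "NOUN")

print(len(prep_outputs))   # 0
print(len(noun_outputs))   # 2
print(noun_outputs[0][2], noun_outputs[0][7])   # ('photos', 'NOUN') ('doigt', 'N')
print(noun_outputs[1][2], noun_outputs[1][7])   # ('photos', 'N') ('doigt', 'NOUN')
```

Each output graph differs from the input in exactly one node, which is why neither of the two graphs above has both nouns converted.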
To iterate the application of a rule, one has to use the strategy Onf.
The strategy Onf
If we use the strategy Onf (noun) instead of noun in the last command above, the rule noun is iterated as much as possible.
In our example, the command:
grew transform -config sequoia -grs noun.grs -strat "Onf (noun)" -i frwiki_50.1000_00907.seq.conll
produces the graph where the two nouns have the new tag NOUN:
# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
# sent_id = frwiki_50.1000_00907
# text = Deux autres photos sont également montrées du doigt.
1 Deux deux D _ s=card 3 det _ _
2 autres autre A _ n=p|s=ind 3 mod _ _
3 photos photo NOUN _ g=f|n=p|s=c 6 suj _ _
4 sont être V _ m=ind|n=p|p=3|t=pst 6 aux.pass _ _
5 également également ADV _ _ 6 mod _ _
6 montrées montrer V _ g=f|m=part|n=p|t=past 0 root _ _
7 du de P+D _ s=def 6 mod _ _
8 doigt doigt NOUN _ g=m|n=s|s=c 7 obj.p _ _
Note that Onf always outputs exactly one graph.
With the strategy Onf (prep), for instance, the rewriting process outputs one graph, identical to the input graph, obtained after zero applications of the prep rule.
NB: Onf stands for “one normal form”; it will be explained in more detail later, together with other strategies.
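The behaviour described above for Onf (iterate until the rule no longer applies, and always return exactly one graph) can be sketched as follows. This is a toy Python model under the same assumptions as before: graphs are lists of (form, upos) pairs, rules are plain relabellings, and the names apply_rule and onf are hypothetical. How Grew actually selects a rewriting path is not modelled; the sketch simply follows the first match at each step.

```python
# Toy model of the Onf strategy. This is NOT Grew's implementation.
graph = [
    ("Deux", "D"), ("autres", "A"), ("photos", "N"), ("sont", "V"),
    ("également", "ADV"), ("montrées", "V"), ("du", "P+D"), ("doigt", "N"),
]


def apply_rule(nodes, old_upos, new_upos):
    """Apply a relabelling rule once; return one new graph per match."""
    outputs = []
    for i, (form, tag) in enumerate(nodes):
        if tag == old_upos:
            rewritten = list(nodes)
            rewritten[i] = (form, new_upos)
            outputs.append(rewritten)
    return outputs


def onf(nodes, old_upos, new_upos):
    """Follow one rewriting path until the rule no longer matches."""
    if old_upos == new_upos:
        # Such a rule would match its own output forever.
        raise ValueError("rule would never terminate")
    current = list(nodes)
    while True:
        outputs = apply_rule(current, old_upos, new_upos)
        if not outputs:          # no occurrence left: a normal form is reached
            return current
        current = outputs[0]     # assumption: keep the first match


noun_result = onf(graph, "N", "NOUN")
prep_result = onf(graph, "P", "ADP")

print([tag for _, tag in noun_result])
# ['D', 'A', 'NOUN', 'V', 'ADV', 'V', 'P+D', 'NOUN']
print(prep_result == graph)   # True
```

With the prep rule the loop exits immediately, so the single output is the unchanged input, matching the "zero applications" case above.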