Grew Tutorial • Lesson 5 • Confluence
Another well-known issue with rewriting is the problem of confluence.
Two concurrent rules
As said earlier, the Sequoia tag V
may be converted to AUX
or VERB
.
A naive way to encode this in rules is to write (file: aux_verb.grs
) the package:
package v {
rule aux {
pattern { X [upos = V] }
commands { X.upos = AUX }
}
rule verb {
pattern { X [upos = V] }
commands { X.upos = VERB }
}
}
The two rules overlap: each time a POS V
is found, both rules can be used and produce a different output!
We call this kind of system non-confluent.
So, what is the output of the following commands?
grew transform -config sequoia -grs aux_verb.grs -strat "Onf(v)" -i frwiki_50.1000_00907.seq.conll
Let’s try it:
# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
# sent_id = frwiki_50.1000_00907
# text = Deux autres photos sont également montrées du doigt.
1 Deux deux D _ s=card 3 det _ _
2 autres autre A _ n=p|s=ind 3 mod _ _
3 photos photo N _ g=f|n=p|s=c 6 suj _ _
4 sont être VERB _ m=ind|n=p|p=3|t=pst 6 aux.pass _ _
5 également également ADV _ _ 6 mod _ _
6 montrées montrer VERB _ g=f|m=part|n=p|t=past 0 root _ _
7 du de P+D _ s=def 6 mod _ _
8 doigt doigt N _ g=m|n=s|s=c 7 obj.p _ _
Well, it produced exactly one graph output by choosing (in a way which cannot be controlled) one of the possible ways to rewrite.
What should we do with non-confluent system? In fact, there are two possible situations:
- The two rules are correct and there is a real (linguistic) ambiguity; and the different solutions must be considered.
- There is no ambiguity, the rules must be corrected.
In our example (AUX
and VERB
), we are clearly in the second case, but let us consider the other one anyway, just to see how to deal with really non-confluent setting.
The Iter
strategy
Here, we suppose then that we are interested in all possible solutions.
Grew provides a strategy Iter
to do this:
grew transform -config sequoia -grs aux_verb.grs -strat "Iter(v)" -i frwiki_50.1000_00907.seq.conll
This will produces 4 different graphs with all combinations of AUX
and VERB
for the two words sont and montrées.
# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
# sent_id = frwiki_50.1000_00907_0
# text = Deux autres photos sont également montrées du doigt.
1 Deux deux D _ s=card 3 det _ _
2 autres autre A _ n=p|s=ind 3 mod _ _
3 photos photo N _ g=f|n=p|s=c 6 suj _ _
4 sont être AUX _ m=ind|n=p|p=3|t=pst 6 aux.pass _ _
5 également également ADV _ _ 6 mod _ _
6 montrées montrer AUX _ g=f|m=part|n=p|t=past 0 root _ _
7 du de P+D _ s=def 6 mod _ _
8 doigt doigt N _ g=m|n=s|s=c 7 obj.p _ _
# sent_id = frwiki_50.1000_00907_1
# text = Deux autres photos sont également montrées du doigt.
1 Deux deux D _ s=card 3 det _ _
2 autres autre A _ n=p|s=ind 3 mod _ _
3 photos photo N _ g=f|n=p|s=c 6 suj _ _
4 sont être AUX _ m=ind|n=p|p=3|t=pst 6 aux.pass _ _
5 également également ADV _ _ 6 mod _ _
6 montrées montrer VERB _ g=f|m=part|n=p|t=past 0 root _ _
7 du de P+D _ s=def 6 mod _ _
8 doigt doigt N _ g=m|n=s|s=c 7 obj.p _ _
# sent_id = frwiki_50.1000_00907_2
# text = Deux autres photos sont également montrées du doigt.
1 Deux deux D _ s=card 3 det _ _
2 autres autre A _ n=p|s=ind 3 mod _ _
3 photos photo N _ g=f|n=p|s=c 6 suj _ _
4 sont être VERB _ m=ind|n=p|p=3|t=pst 6 aux.pass _ _
5 également également ADV _ _ 6 mod _ _
6 montrées montrer AUX _ g=f|m=part|n=p|t=past 0 root _ _
7 du de P+D _ s=def 6 mod _ _
8 doigt doigt N _ g=m|n=s|s=c 7 obj.p _ _
# sent_id = frwiki_50.1000_00907_3
# text = Deux autres photos sont également montrées du doigt.
1 Deux deux D _ s=card 3 det _ _
2 autres autre A _ n=p|s=ind 3 mod _ _
3 photos photo N _ g=f|n=p|s=c 6 suj _ _
4 sont être VERB _ m=ind|n=p|p=3|t=pst 6 aux.pass _ _
5 également également ADV _ _ 6 mod _ _
6 montrées montrer VERB _ g=f|m=part|n=p|t=past 0 root _ _
7 du de P+D _ s=def 6 mod _ _
8 doigt doigt N _ g=m|n=s|s=c 7 obj.p _ _
Be stricter in rules design
Of course, in our POS tags conversion example, the correct solution is to design more carefully our two rules, in order to produce the correct output. For instance (file: aux_verb_confluent.grs
):
package v {
rule aux {
pattern { X [upos = V]; Y -[aux.pass]-> X }
commands { X.upos = AUX }
}
rule verb {
pattern { X [upos = V] }
without { Y -[aux.pass]-> X }
commands { X.upos = VERB }
}
}
Here, the two rules are clearly separated: the same clause Y -[aux.pass]-> X
is used first the pattern
part for rule aux
and in the without
part for the rule verb
.
For each occurence of V
in the sentence, exactly one of the two rules can be used.
With these two new rules, the system is confluent, and there is only one possible output.
This can be tested with the Iter
strategy which produces all possible graphs:
grew transform -config sequoia -grs aux_verb_confluent.grs -strat "Iter(v)" -i frwiki_50.1000_00907.seq.conll
# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
# sent_id = frwiki_50.1000_00907
# text = Deux autres photos sont également montrées du doigt.
1 Deux deux D _ s=card 3 det _ _
2 autres autre A _ n=p|s=ind 3 mod _ _
3 photos photo N _ g=f|n=p|s=c 6 suj _ _
4 sont être AUX _ m=ind|n=p|p=3|t=pst 6 aux.pass _ _
5 également également ADV _ _ 6 mod _ _
6 montrées montrer VERB _ g=f|m=part|n=p|t=past 0 root _ _
7 du de P+D _ s=def 6 mod _ _
8 doigt doigt N _ g=m|n=s|s=c 7 obj.p _ _
Of course, the Onf
produces the same output in this setting.
Note that there are two different ways to compute the final graph: first apply rule aux
and then the rule verb
or the other way round: rule verb
and then rule aux
. But the important consequence of the confluence property is that the same graph is produced in both cases.
⚠️ Use Iter
carefully
When a package p
is confluent, the two strategies Onf(p)
and Iter(p)
give the same result.
In practice, the strategy Onf(p)
must be preferred because it is much more efficient to compute.
The Iter(p)
strategy computes the set of all graphs which can be obtained with the package p
from the input graph and then, returns the subset of normal forms.
If the rule applies in several places in the graph, the number of graphs can be huge.
For instance, with the rule del_xpos
below, if the graph G contains n nodes with an xpos
feature, the number of graphs in the produced set is 2n (each one of the n nodes can appear with or without xpos
feature).
rule del_xpos{
pattern{ X[xpos] }
commands{ del_feat X.xpos }
}
We have seen in the previous lesson that Grew stops the computation when a bound in the number of rule applications is reached.
By default, the bound is 10,000; so, if we try to apply the strategy Iter(del_xpos)
to the 14 nodes graph:
# sent_id = n01006011
# text = A witness told police that the victim had attacked the suspect in April.
1 A a DET DT Definite=Ind|PronType=Art 2 det 2:det _
2 witness witness NOUN NN Number=Sing 3 nsubj 3:nsubj _
3 told tell VERB VBD Mood=Ind|Tense=Past|VerbForm=Fin 0 root 0:root _
4 police police NOUN NNS Number=Plur 3 iobj 3:iobj _
5 that that SCONJ IN _ 9 mark 9:mark _
6 the the DET DT Definite=Def|PronType=Art 7 det 7:det _
7 victim victim NOUN NN Number=Sing 9 nsubj 9:nsubj _
8 had have AUX VBD Mood=Ind|Tense=Past|VerbForm=Fin 9 aux 9:aux _
9 attacked attack VERB VBN Tense=Past|VerbForm=Part 3 ccomp 3:ccomp _
10 the the DET DT Definite=Def|PronType=Art 11 det 11:det _
11 suspect suspect NOUN NN Number=Sing 9 obj 9:obj _
12 in in ADP IN _ 13 case 13:case _
13 April April PROPN NNP Number=Sing 9 obl 9:obl:in SpaceAfter=No
14 . . PUNCT . _ 3 punct 3:punct _
There are 214 = 16,384 graphs to compute and necessarily at least the same number of rule applications.
Hence, even if there is no loop in this case, there is a huge number of rule applications, this explain why Grew wrongly detects a loop and output the message:
FAIL: [file: del_xpos.grs, line: 1] More than 10000 rewriting steps: check for loops or increase max_rules value. Last rules are: […del_xpos, del_xpos, del_xpos, del_xpos, del_xpos, del_xpos, del_xpos, del_xpos, del_xpos, del_xpos]
Of course, in this case, the strategy Onf(p)
nicely computes the unique normal form quickly (with only 14 rule applications).