Patterns

Patterns are used in Grew to describe left part of rewriting rules and in Grew-match to describe queries to be executed on corpora.


Pattern syntax

A Pattern is defined through 3 different kind of pattern items.

The full matching process is:

  1. If the graph does not satisfied one of the global items, the output is empty.
  2. Else the set M is initialised as the set of matchings which satisfy the union of positive items.
  3. For each negative item, remove from M the matchings which satisfy it.

Remarks

The syntax of patterns in Grew can be learned using the tutorial part of the Grew-match tool.


Positive and negative patterns

Positive and negative items both follow the same syntax. They are described by a list of clauses: node clauses, edge clauses and additional constraints

Node clauses

In a node clause, a node is described by an identifier and some constraints on its feature structure.

N [upos = VERB, Mood = Ind|Imp, Tense <> Fut, Number, !Person, lemma = "être" ]

The clause above illustrates the syntax of constraint that can be expressed, in turn:

Edge clauses

All edge clauses below require the existence of an edge between the node selected by N and the node selected by M, eventually with additional constraints:

Edges may also be named for usage in commands (in Grew) or in clustering (in Grew-match) with an identifier:

Note that edge may refer to undeclared nodes, these nodes are then implicitly declared without constraint. For instance, the two patterns below are equivalent:

pattern { N -[nsubj]-> M }
pattern { N[]; M[]; N -[nsubj]-> M }

As label edges are internally represented by feature structures (see here), it is possible to match them with a standard unification mechanism, similar to the one used for feature structures in nodes.

Some examples (with sud configuration) are given below.

Syntax Description comp  comp:obl comp:obl@agent comp:aux comp:obj@lvc
X -[1=comp]-> Y any edge such that the feature 1 is defined with value comp YES YES YES YES YES
X -[1=comp, 2=obl|aux]-> Y the feature 1 is defined with value comp and the feature 2 is defined with one of the two values obl or aux NO YES YES YES NO
X -[1=comp, 2<>obl|aux]-> Y the feature 1 is defined with value comp and the feature 2 is defined with a value different from obl or aux NO NO NO NO YES
X -[1=comp, !deep]-> Y the feature 1 is defined with value comp and the feature deep is not defined YES YES NO YES NO
X -[1=comp, 2=*]-> Y the feature 1 is defined with value comp and the feature 2 is defined with any value NO YES YES YES YES
X -[comp]-> Y the exact label comp and nothing else YES NO NO NO NO

⚠️ Matching with atomic labels ⚠️

It is important to note that from the pattern point of view, the two clauses X -[1=comp]-> Y (first line in the table) and X -[comp]-> Y (last line in the table) are not equivalent!

Difference with node features matching

Note that we would expect that the syntax X -[1=comp, 2]-> Y should be equivalent to X -[1=comp, 2=*]-> Y but it will bring an ambiguity for X -[lab]-> Y that can be interpreted as the atomic label X -[lab]-> Y or as X -[lab=*]-> Y. To avoid this ambiguity, the syntax X -[1=comp, 2]-> Y in not allowed.

Additional constraints

These constrains do not bind new elements in the graph, but must be fulfilled (i.e. binding solutions which do not fulfil the constraints are filtered out).

Remarks

When two or more nodes are equivalent in a pattern, each occurrence of the pattern in a graph is found several times (up to permutation in the sets of equivalent nodes). For instance, in the pattern below, the 3 nodes N1, N2 and N3 are equivalent.

pattern { N1 -[ARG1]-> N; N2 -[ARG1]-> N; N3 -[ARG1]-> N; }

This pattern is found 120 times in the Little Prince corpus (Grew-match) but there are only 20 different occurrences, each one is reported 6 times with all permutations on N1, N2 and N3. To avoid this, a constraint id(N1) < id(N2) can be used. It imposes an ordering on some internal representation of the nodes and so avoid these permutations. NB: if a constraint id(N1) < id(N2) is used with two non-equivalent nodes, the result is unspecified.

The pattern below returns the 20 expected occurrences (Grew-match)

pattern {
    N1 -[ARG1]-> N; N2 -[ARG1]-> N; N3 -[ARG1]-> N;
    N1.__id < N2.__id; N2.__ < N3.__id;
}

Global pattern

Global patterns let the user express constrains about the structure of the whole graph. It is also possible to express constraints about metadata of the graph.

Structure constraints

Structure constraints are expressed with a fixed list of keywords.

We describe below 4 of the constraints available. For each one, its negation is available by changing the is_ prefix by the is_not_ prefix.

Metadata constraints

In Grew, each graph is associated with a list of metadata: a list of (key, value) pairs.

In global items, constraints of these metadata can be expressed with:

For corpora described by the CoNLL-U format, available metadata are described before each sentence (see CoNNL-U doc). In the UD or SUD corpora, each sentence contains at least the two metadata sent_id and text.

Note about CoNNL-U specificities

Additional information available in the CoNNL-U format can be accessed through special features textform and wordform (see CoNLL-U format)