Requests

Requests are used in Grew to describe left part of rewriting rules and in Grew-match to describe queries to be executed on corpora.


Requests syntax

A request is defined through 4 different kinds of request items.

The full matching on one graph process is:

  1. If the graph metadata does not satisfied one of the global items, the output is empty.
  2. Else the set M is initialised as the set of matchings which satisfy the union of matching items.
  3. For each positive filtering item, remove from M the matchings which satisfy it.
  4. For each negative filtering item, remove from M the matchings which do not satisfy it.

On a corpus, the graph matching process is repeated on each graph.

Remarks

The syntax of requests in Grew can be learned using the tutorial part of the Grew-match tool.


Matching and filtering items

Matching and filtering items both follow the same syntax. They are described by a list of clauses: node clauses, edge clauses and additional constraints

Node clauses

In a node clause, a node is described by an identifier (N in the example below) and some constraints on its feature structure.

N [upos = VERB, Mood = Ind|Imp, Tense <> Fut, Number, !Person, form = "être", lemma = re"s.*" ]

The clause above illustrates the syntax of constraint that can be expressed, in turn:

Edge clauses

All edge clauses below require the existence of an edge between the node selected by N and the node selected by M, eventually with additional constraints:

Edges may also be named for usage in commands (in Grew) or in clustering (in Grew-match) with an identifier:

Note that edge may refer to undeclared nodes, these nodes are then implicitly declared without constraint. For instance, the two requests below are equivalent:

pattern { N -[nsubj]-> M }
pattern { N[]; M[]; N -[nsubj]-> M }

Additional constraints

These constrains do not bind new elements in the graph, but must be fulfilled (i.e. binding solutions which do not fulfil the constraints are filtered out).


Injectivity in nodes matching

By default the matching of nodes is injective, this means that two different nodes in the request are mapped to two different nodes in the graph.

For instance, the request below searches for two different tokens, both having the same lemma make

pattern { N1 [ lemma="make" ]; N2 [ lemma="make" ] }

[🆕 1.11] If the node identifier is suffixed by the symbol $, the injectify constraint is relaxed. A node N$ can be mapped to any node in the graph (either already mapped by another node of the request or not).

Example

In AMR graphs, if we look for a predicate (with concept=judge-01 in the example) with two arguments ARG0 and ARG1, there are two dictinct cases:

pattern { N [concept="judge-01"]; N -[ARG0]-> A0; N -[ARG1]-> A1; }
pattern { N [concept="judge-01"]; N -[ARG0]-> A; N -[ARG1]-> A; }

If we do not require the injectivity on one of the two arguments, then both cases above are returned → 5 occurences

pattern { N [concept="judge-01"]; N -[ARG0]-> A; N -[ARG1]-> B$; }

Complex edges

As label edges are internally represented by feature structures (see here), it is possible to match them with a standard unification mechanism, similar to the one used for feature structures in nodes.

Some examples (with sud configuration) are given below.

Syntax Description comp  comp:obl comp:obl@agent comp:aux comp:obj@lvc
X -[1=comp]-> Y any edge such that the feature 1 is defined with value comp YES YES YES YES YES
X -[1=comp, 2=obl|aux]-> Y the feature 1 is defined with value comp and the feature 2 is defined with one of the two values obl or aux NO YES YES YES NO
X -[1=comp, 2<>obl|aux]-> Y the feature 1 is defined with value comp and the feature 2 is defined with a value different from obl or aux NO NO NO NO YES
X -[1=comp, !deep]-> Y the feature 1 is defined with value comp and the feature deep is not defined YES YES NO YES NO
X -[1=comp, 2=*]-> Y the feature 1 is defined with value comp and the feature 2 is defined with any value NO YES YES YES YES
X -[comp]-> Y the exact label comp and nothing else YES NO NO NO NO

⚠️ Matching with atomic labels ⚠️

It is important to note that from the request point of view, the two clauses X -[1=comp]-> Y (first line in the table) and X -[comp]-> Y (last line in the table) are not equivalent!

Difference with node features matching

Note that we would expect that the syntax X -[1=comp, 2]-> Y should be equivalent to X -[1=comp, 2=*]-> Y but it will bring an ambiguity for X -[lab]-> Y that can be interpreted as the atomic label X -[lab]-> Y or as X -[lab=*]-> Y. To avoid this ambiguity, the syntax X -[1=comp, 2]-> Y in not allowed.


Global request

Global requests let the user express constrains about the structure of the whole graph. It is also possible to express constraints about metadata of the graph.

Structure constraints

Structure constraints are expressed with a fixed list of keywords.

We describe below 4 of the constraints available. For each one, its negation is available by changing the is_ prefix by the is_not_ prefix.

Metadata constraints

In Grew, each graph is associated with a list of metadata: a list of (key, value) pairs.

In global items, constraints of these metadata can be expressed with:

For corpora described by the CoNLL-U format, available metadata are described before each sentence (see CoNNL-U doc). In the UD or SUD corpora, each sentence contains at least the two metadata sent_id and text.


Some other tricks

Equivalent nodes

When two or more nodes are equivalent in a request (i.e. they can be exchanged without changing the semantics of the request), each occurrence of the request in a graph is reported several times (up to permutation in the sets of equivalent nodes). For instance, in the request below, the 3 nodes N1, N2 and N3 are equivalent.

pattern { N1 -[ARG1]-> N; N2 -[ARG1]-> N; N3 -[ARG1]-> N; }

This request is found 270 times in the Little Prince corpus but there are only 45 different occurrences, each one is reported 6 times with all permutations on N1, N2 and N3. To avoid this, a constraint N1.__id__ < N2.__id__ can be used. It imposes an ordering on some internal representation of the nodes and so avoid these permutations. NB: if a constraint N1.__id__ < N2.__id__ is used with two non-equivalent nodes, the result is unspecified.

The request below returns the 45 expected occurrences

pattern {
  N1 -[ARG1]-> N; N2 -[ARG1]-> N; N3 -[ARG1]-> N;
  N1.__id__ < N2.__id__; N2.__id__ < N3.__id__;
}