Requests

Requests are used in Grew to describe the left part of rewriting rules and in Grew-match to describe queries to be executed on corpora.

The syntax of requests in Grew can be learned using the tutorial part of the Grew-match tool.


Requests syntax

A request is defined through 4 different kinds of request items.

The full matching process on a graph is:

  1. If the graph metadata does not satisfy any of the global items, the output is empty.
  2. Else the set M is initialised as the set of matchings that satisfy the union of matching items.
  3. For each positive filtering item, remove from M the matchings that do not satisfy it.
  4. For each negative filtering item, remove from M the matchings that satisfy it.

On a corpus, the graph matching process is repeated on each graph.

Remarks


Matching and filtering items

Both matching and filtering items both follow the same syntax. They are described by a list of clauses: node clauses, edge clauses and additional constraints.

Node clauses

In a node clause, a node is described by an identifier (X in the example below) and some constraints on its feature structure.

X [upos = VERB, Mood = Ind|Imp, Tense <> Fut, Number, !Person, form = "être", lemma = re"s.*", Gloss = /.*POSS.*/i] ]

The clause above illustrates the syntax of constraint that can be expressed, in turn:

Anchor nodes

⚠️ For dependency trees, an anchor node (position 0) is added to the structure (see here). In ArboratorGrew, this node is not displayed but is still taken into account when searching requests or when applying rules.

Disjunction in node clause

⚠️ Since version 1.14

Following the feature request #47, a node can be matched with a disjunction of feature structures (separated by the pipe symbol |).

Examples

The following clause selects either a past participle verb or an adjective :

X [upos=VERB, VerbForm=Part, Tense=Past]|[upos=ADJ]

A node with either a upos ADV (and no ExtPos) or an ExtPos ADV can be searched with :

X [upos=ADV, !ExtPos]|[ExtPos=ADV]

Edge clauses

All edge clauses below require the existence of an edge between the node selected by X and the node selected by Y, eventually with additional constraints:

Edges may also be named for usage in commands (in Grew) or in clustering (in Grew-match) with an identifier:

Note that edges may refer to undeclared nodes, these nodes are then implicitly declared without constraint. For instance, the two requests below are equivalent:

pattern { X -[nsubj]-> Y }
pattern { X[]; Y[]; X -[nsubj]-> Y }

Additional constraints

These constraints do not bind new elements in the graph, but must be fulfilled (i.e. binding solutions which do not fulfill the constraints are filtered out).

Constraints on feature values:

Constraints on node ordering:

Constraints on large dominance

Constraints on in or out edges on bound nodes:

Constraints on edge labels:

Constraints on edges relative positions

These constraints impose that the source and the target of both edges are ordered).

Position of a node with respect to an edge

Constraints on distance between two nodes

[🆕 1.16.0] These constraints imply that both X and Y are ordered nodes.

In the previous constraints, = can be replaced by <, <=, > or >= with an obvious meaning! The keywords length and delta are also available as clustering keys.


Injectivity in nodes matching

By default, node matching is injective, meaning that two different nodes in the request are mapped to two different nodes in the graph.

For example, the following request searches for two different tokens, both with the same lemma make .

pattern { X1 [ lemma="make" ]; X2 [ lemma="make" ] }

If the node identifier is suffixed by the symbol $, the injectify constraint is relaxed. A node X$ can be mapped to any node in the graph (either already mapped by another node of the request or not). Note that X$ is a new name unrelated to any potential node named X.

Example

In AMR graphs, if we look for a predicate (with concept=judge-01 in the example) with two arguments ARG0 and ARG1, there are two dictinct cases:

pattern { X [concept="judge-01"]; X -[ARG0]-> A0; X -[ARG1]-> A1; }
pattern { X [concept="judge-01"]; X -[ARG0]-> A; X -[ARG1]-> A; }

If we do not require the injectivity on one of the two arguments, then both cases above are returned → 5 occurences

pattern { X [concept="judge-01"]; X -[ARG0]-> A; X -[ARG1]-> B$; }

For a more complex example with non-injective matching, you can see this example.


Complex edges

As label edges are internally represented by feature structures (see here), it is possible to match them with a standard unification mechanism, similar to the one used for feature structures in nodes.

Some examples (with sud configuration) are given below.

Syntax Description comp  comp:obl comp:obl@agent comp:aux comp:obj@lvc
X -[1=comp]-> Y any edge such that the feature 1 is defined with value comp YES YES YES YES YES
X -[1=comp, 2=obl|aux]-> Y the feature 1 is defined with value comp and the feature 2 is defined with one of the two values obl or aux NO YES YES YES NO
X -[1=comp, 2<>obl|aux]-> Y the feature 1 is defined with value comp and the feature 2 is defined with a value different from obl or aux NO NO NO NO YES
X -[1=comp, !deep]-> Y the feature 1 is defined with value comp and the feature deep is not defined YES YES NO YES NO
X -[1=comp, 2=*]-> Y the feature 1 is defined with value comp and the feature 2 is defined with any value NO YES YES YES YES
X -[comp]-> Y the exact label comp and nothing else YES NO NO NO NO

⚠️ Matching with atomic labels ⚠️

It is important to note that from the request point of view, the two clauses X -[1=comp]-> Y (first line in the table) and X -[comp]-> Y (last line in the table) are not equivalent!

Difference with node features matching

Note that we would expect that the syntax X -[1=comp, 2]-> Y should be equivalent to X -[1=comp, 2=*]-> Y but it will bring an ambiguity for X -[lab]-> Y that can be interpreted as the atomic label X -[lab]-> Y or as X -[lab=*]-> Y. To avoid this ambiguity, the syntax X -[1=comp, 2]-> Y in not allowed and you should write X -[1=comp, 2=*]-> Y.


Global request

Global requests let the user express constrains about the structure of the whole graph. It is also possible to express constraints about metadata of the graph.

Structure constraints

Structure constraints are expressed with a fixed list of keywords.

We describe below 4 of the constraints available. For each one, its negation is available by changing the is_ prefix by the is_not_ prefix.

Metadata constraints

In Grew, each graph is associated with a list of metadata: a list of (key, value) pairs.

In global items, constraints of these metadata can be expressed with:

For corpora described by the CoNLL-U format, available metadata are described before each sentence (see CoNNL-U doc). In the UD or SUD corpora, each sentence contains at least the two metadata sent_id and text.


Some other tricks

Equivalent nodes

When two or more nodes are equivalent in a request (i.e. they can be interchanged without altering the meaning of the request), each occurrence of the request in a graph is reported multiple times (up to permutation in the sets of equivalent nodes). For example, in the request below, the 3 nodes X1, X2 and X3 are equivalent.

pattern { X1 -[ARG1]-> X; X2 -[ARG1]-> X; X3 -[ARG1]-> X; }

This request is found 270 times in the Little Prince corpus but there are only 45 different occurrences; each one being reported 6 times with all permutations on X1, X2 and X3. To avoid this, the constraint X1.__id__ < X2.__id__ can be used, which imposes an ordering on some internal representation of the nodes and so avoids these permutations. Note: If a constraint X1.__id__ < X2.__id__ is used with two non-equivalent nodes, the result is unspecified.

The request below returns the 45 expected occurrences

pattern {
  X1 -[ARG1]-> X; X2 -[ARG1]-> X; X3 -[ARG1]-> X;
  X1.__id__ < X2.__id__; X2.__id__ < X3.__id__;
}