Requests
Requests are used in Grew to describe the left part of rewriting rules and in Grew-match to describe queries to be executed on corpora.
The syntax of requests in Grew can be learnt using the tutorial part of the Grew-match tool.
Requests syntax
A request is defined through 4 different kinds of request items.
- Global items (introduced by the keyword
global
) filter structures based on information about the whole graph or its metadata. - Matching items (introduced by keyword
pattern
) describe nodes and relations that must be found in the graph. - Positive filtering items (introduced by the keyword
with
) filter out matchings previously selected by other items (keeping only the ones which follows the additional graph constraints). - Negative filtering items (introduced by the keyword
without
) filter out matchings previously selected by other items (keeping only the ones which do not follow the additional graph constraints).
The full matching on one graph process is:
- Take a graph and a request as input.
- Output a set of matchings; a matching being a function from nodes and edges defined in the matching items to nodes and edges of the host graph.
- If the graph metadata does not satisfy one of the global items, the output is empty.
- Else the set M is initialised as the set of matchings which satisfy the union of matching items.
- For each positive filtering item, remove from M the matchings which do not satisfy it.
- For each negative filtering item, remove from M the matchings which satisfy it.
On a corpus, the graph matching process is repeated on each graph.
Remarks
- If there is more than one matching
pattern
items, the union is considered. - If there is more than one filtering (
without
orwith
) items, there are all interpreted independently - The order of items in a request are irrelevant.
- It there is no positive item, there is a trivial matching which is the empty function.
Matching and filtering items
Matching and filtering items both follow the same syntax. They are described by a list of clauses: node clauses, edge clauses and additional constraints
Node clauses
In a node clause, a node is described by an identifier (N
in the example below) and some constraints on its feature structure.
N [upos = VERB, Mood = Ind|Imp, Tense <> Fut, Number, !Person, form = "être", lemma = re"s.*" ]
The clause above illustrates the syntax of constraint that can be expressed, in turn:
upos = VERB
requires that the featureupos
is defined with the valueVERB
Mood = Ind|Imp
requires that the featureMood
is defined with one of the two valuesInd
orImp
Tense <> Fut
requires that the featureTense
is defined with a value different fromFut
Number
requires that the featureNumber
is defined whatever is its value (note that the same constraint can also be writtenNumber = *
)!Person
requires that the featurePerson
is not definedform = "être"
quotes are required when non-ASCII characters are usedlemma = re"s.*"
the prefixre
before a string declares a regular expression
Anchor nodes
⚠️ For dependency trees, an anchor node (position 0) is added to the structure (see here). In ArboratorGrew, this node is not displayed but is still taken into account when searching requests or when applying rules.
Disjunction in node clause
⚠️ Since version 1.14
Following the feature request #47, a node can be matched with a disjunction of feature structures
(separated by the pipe symbol |
).
Examples
The following clause selects either a past participle verb or an adjective :
N [upos=VERB, VerbForm=Part, Tense=Past]|[upos=ADJ]
A node with either a upos
ADV
(and no ExtPos
) or an ExtPos
ADV
can be searched with
:
N [upos=ADV, !ExtPos]|[ExtPos=ADV]
Edge clauses
All edge clauses below require the existence of an edge between the node selected by N
and the node selected by M
, eventually with additional constraints:
N -> M
: no additional constrainsN -[nsubj]-> M
: the edge label isnsubj
N -[nsubj|obj]-> M
: the edge label is eithernsubj
orobj
N -[^nsubj|obj]-> M
: the edge label is different fromnsubj
andobj
N -[re".*subj"]-> M
: the edge follows the regular expression (see here for regular expressions accepted)
Edges may also be named for usage in commands (in Grew) or in clustering (in Grew-match) with an identifier:
e: N -> M
e: N -[nsubj]-> M
- …
Note that edges may refer to undeclared nodes, these nodes are then implicitly declared without constraint. For instance, the two requests below are equivalent:
pattern { N -[nsubj]-> M }
pattern { N[]; M[]; N -[nsubj]-> M }
Additional constraints
These constraints do not bind new elements in the graph, but must be fulfilled (i.e. binding solutions which do not fulfill the constraints are filtered out).
-
Constraints on feature values:
N.lemma = M.lemma
two feature values must be equalN.lemma <> M.lemma
two feature values must be differentN.lemma = "constant"
the featurelemma
of nodeN
must be the valueconstant
N.lemma = re".*ing"
the value of a feature must follow a regular expression (see here for regular expressions accepted)N.lemma = lexicon.field
imposes that the featurelemma
of nodeN
must be the be present in thefield
of thelexicon
. NB: this also reduces the current lexicon to the items for whichfield
is equals toN.lemma
.
-
Constraints on node ordering:
N < M
the nodeN
immediately precedes the nodeM
N << M
the nodeN
precedes the nodeM
-
Constraints on in or out edges on bound nodes:
* -[nsubj]-> M
there is an incoming edge with labelnsubj
with targetM
(NB: the source node of the incoming edge is not bound; it can be equals to any other node (bound or not))M -[nsubj]-> *
there is an outgoing edge with labelnsubj
with sourceM
(NB: the target node of the outcoming edge is not bound; it can be equals to any other node (bound or not))
-
Constraints on edge labels:
e1.label = e2.label
the labels of the two edgese1
ande2
are equale1.label <> e2.label
the labels of the two edgese1
ande2
are different
-
Constraints on edges relative positions (these constraints impose that the source and the target of both edges are ordered)
-
Position of a node with respect to an edge
N << e
the nodeN
is strictly included between source and target of edgee
.
Injectivity in nodes matching
By default the matching of nodes is injective, this means that two different nodes in the request are mapped to two different nodes in the graph.
For instance, the request below searches for two different tokens, both having the same lemma make
pattern { N1 [ lemma="make" ]; N2 [ lemma="make" ] }
If the node identifier is suffixed by the symbol $
, the injectify constraint is relaxed.
A node N$
can be mapped to any node in the graph (either already mapped by another node of the request or not).
Example
For a more complex example with non-injective matching, you can see this example.
In AMR graphs, if we look for a predicate (with concept=judge-01
in the example) with two arguments ARG0
and ARG1
, there are two dictinct cases:
pattern { N [concept="judge-01"]; N -[ARG0]-> A0; N -[ARG1]-> A1; }
pattern { N [concept="judge-01"]; N -[ARG0]-> A; N -[ARG1]-> A; }
If we do not require the injectivity on one of the two arguments, then both cases above are returned → 5 occurences
pattern { N [concept="judge-01"]; N -[ARG0]-> A; N -[ARG1]-> B$; }
Complex edges
As label edges are internally represented by feature structures (see here), it is possible to match them with a standard unification mechanism, similar to the one used for feature structures in nodes.
N -[1=subj]-> M
the edge must match the edge feature constraints (more examples below).- [Since version
1.9.1
]N -[2="зад"]-> M
the edge must match the edge feature constraints with non-ASCII characters (see #36).
Some examples (with sud
configuration) are given below.
Syntax | Description | comp |
comp:obl |
comp:obl@agent |
comp:aux |
comp:obj@lvc |
---|---|---|---|---|---|---|
X -[1=comp]-> Y |
any edge such that the feature 1 is defined with value comp |
YES | YES | YES | YES | YES |
X -[1=comp, 2=obl|aux]-> Y |
the feature 1 is defined with value comp and the feature 2 is defined with one of the two values obl or aux |
NO | YES | YES | YES | NO |
X -[1=comp, 2<>obl|aux]-> Y |
the feature 1 is defined with value comp and the feature 2 is defined with a value different from obl or aux |
NO | NO | NO | NO | YES |
X -[1=comp, !deep]-> Y |
the feature 1 is defined with value comp and the feature deep is not defined |
YES | YES | NO | YES | NO |
X -[1=comp, 2=*]-> Y |
the feature 1 is defined with value comp and the feature 2 is defined with any value |
NO | YES | YES | YES | YES |
X -[comp]-> Y |
the exact label comp and nothing else |
YES | NO | NO | NO | NO |
⚠️ Matching with atomic labels ⚠️
It is important to note that from the request point of view, the two clauses X -[1=comp]-> Y
(first line in the table) and X -[comp]-> Y
(last line in the table) are not equivalent!
Difference with node features matching
Note that we would expect that the syntax X -[1=comp, 2]-> Y
should be equivalent to X -[1=comp, 2=*]-> Y
but it will bring an ambiguity for X -[lab]-> Y
that can be interpreted as the atomic label X -[lab]-> Y
or as X -[lab=*]-> Y
.
To avoid this ambiguity, the syntax X -[1=comp, 2]-> Y
in not allowed.
Global request
Global requests let the user express constrains about the structure of the whole graph. It is also possible to express constraints about metadata of the graph.
Structure constraints
Structure constraints are expressed with a fixed list of keywords.
We describe below 4 of the constraints available.
For each one, its negation is available by changing the is_
prefix by the is_not_
prefix.
-
is_cyclic
: the graph satisfied this constraint if and only if it contains a cycle. A cycle is a list of nodesN1
,N2
…N(k-1)
,Nk
such that there are edgesN1 -> N2
,N2 -> N3
,N(k-1) -> Nk
,Nk -> N1
. In graph theory, a non cyclic graph is also called a Directed Acyclic Graph (DAG). -
is_forest
: the graph satisfied this constraint if and only it is acyclic and if there are no couples of edges with the same target. In other words, a graph is a forest if and only if it is acyclic and each node has at most one incoming edge. -
is_tree
: a graph is a tree if it is a forest and if it have exactly one root. -
is_projective
: the usual notion of projectivity defined on tree is generalised by saying the a structure is projective if there are no 4-tuples (A
,B
,C
,D
) of ordered nodes (i.e.A << B
,B << C
andC << D
) such thatA
andC
are linked andB
andD
are linked (two nodes are linked when there is at least one edge between the two, whatever is the orientation).
Metadata constraints
In Grew, each graph is associated with a list of metadata: a list of (key, value) pairs.
In global
items, constraints of these metadata can be expressed with:
sent_id = "fr-ud-train_01234" | "fr-ud-train_12345"
: the metadatasent_id
has one of the two given values;sent_id <> "fr-ud-train_01234" | "fr-ud-train_12345"
: the metadatasent_id
is different from two given values;text = re".*\baux\b.*
: thetext
metadata field follows the given regexp (see here for regular expressions accepted; in the example, the field must contain the word aux).
For corpora described by the CoNLL-U format, available metadata are described before each sentence (see CoNNL-U doc).
In the UD or SUD corpora, each sentence contains at least the two metadata sent_id
and text
.
Some other tricks
Equivalent nodes
When two or more nodes are equivalent in a request (i.e. they can be interchanged without altering the meaning of the request), each occurrence of the request in a graph is reported multiple times (up to permutation in the sets of equivalent nodes).
For example, in the request below, the 3 nodes N1
, N2
and N3
are equivalent.
pattern { N1 -[ARG1]-> N; N2 -[ARG1]-> N; N3 -[ARG1]-> N; }
This request is found 270 times in the Little Prince corpus
but there are only 45 different occurrences; each one being reported 6 times with all permutations on N1
, N2
and N3
.
To avoid this, the constraint N1.__id__ < N2.__id__
can be used, which imposes an ordering on some internal representation of the nodes and so avoids these permutations.
NB: If a constraint N1.__id__ < N2.__id__
is used with two non-equivalent nodes, the result is unspecified.
The request below returns the 45 expected occurrences
pattern {
N1 -[ARG1]-> N; N2 -[ARG1]-> N; N3 -[ARG1]-> N;
N1.__id__ < N2.__id__; N2.__id__ < N3.__id__;
}