Grew-match: Online Graph Matching
Grew-match is a one-page online web application for searching graph requests in treebanks.
There are several instances, each one with its own URL.
The address https://match.grew.fr displays a portal with links to these instances.
See below for the list of instances.
If you want to run your own Grew-match instance, see Local installation of Grew-match.
Basic usage
Once you have selected an instance:
- Select the corpus on which you want to search:
  - Use the top navigation bar to select the collection (a subset of corpora)
  - Use the left pane to select the corpora on which the request will be executed
- Enter the search request in the text area (you can use the snippets to the right of the text area)
- Click on Search or Count
With Search:
- If the number of matches is below 1000, the number of items is displayed.
- Otherwise, the computation stops after the first 1000 occurrences.
  For instance, if you search for an nsubj relation in the UD_French-GSD corpus, the amount of corpus used to find the first 1000 items is reported, as in "More than 1000 results found in 5.28% of the corpus". This means that the first 1000 items were found in 5.28% of the 16,342 sentences of the UD_French-GSD corpus.
- Items are displayed in batches of 10; to see the next 10 items, click on More results.
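The Search behaviour described above (stop at a cap, report how much of the corpus was scanned, show results in batches) can be illustrated with a minimal Python sketch. This is not the Grew-match implementation: the names capped_search, batches and find_matches are hypothetical, and the toy corpus and matcher are assumptions made only for illustration.

```python
def capped_search(sentences, find_matches, cap=1000):
    """Scan sentences in order and stop as soon as `cap` matches are collected.

    Returns (matches, number of sentences scanned, whether the cap was reached).
    `find_matches` is a hypothetical callable returning the matches of one sentence.
    """
    matches = []
    scanned = 0
    for sentence in sentences:
        scanned += 1
        for match in find_matches(sentence):
            matches.append(match)
            if len(matches) >= cap:
                return matches, scanned, True
    return matches, scanned, False


def batches(items, size=10):
    """Yield successive batches of `size` items (the last one may be shorter)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Toy corpus of 200 "sentences"; the toy matcher finds one match per sentence.
corpus = [f"s{i}" for i in range(200)]
matches, scanned, capped = capped_search(corpus, lambda s: [s], cap=50)
percent = 100 * scanned / len(corpus)
if capped:
    print(f"More than {len(matches)} results found in {percent:.2f}% of the corpus")
else:
    print(f"{len(matches)} results found")
```

With the figures quoted above, 5.28% of 16,342 sentences corresponds to roughly 863 sentences scanned before the cap was reached.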
With Count, all the solutions are computed, but it is not possible to visualize annotation examples.
For instance, with the same request as above, we observe 18,980 occurrences of nsubj.
Learning syntax
A tutorial with a progressive sequence of requests is available. You may also explore the snippets to the right of the text area to learn from other examples. More comprehensive documentation is available on the requests page.
Clustering the occurrences
In addition to the main request, it is possible to perform clustering on the set of occurrences returned by this request.
When a clustering key is used, the set of occurrences (or the first 1,000 occurrences if Search is selected) is divided into subsets according to the key value.
Each possible value is presented as a button showing the size of the associated subset.
The button provides access to the corresponding occurrences (in Search mode).
When a whether sub-request is used, the results are split into two clusters, Yes and No.
See the clustering documentation page for syntax and examples of usage.
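As a rough illustration of the idea, clustering amounts to grouping occurrences by the value of a key and reporting the size of each group. The Python sketch below is an assumption-laden toy, not Grew-match code: occurrences are represented as plain dictionaries, and the function name cluster and the field names are hypothetical.

```python
from collections import defaultdict


def cluster(occurrences, key):
    """Group occurrences by the value returned by `key` (a hypothetical callable)."""
    clusters = defaultdict(list)
    for occurrence in occurrences:
        clusters[key(occurrence)].append(occurrence)
    return dict(clusters)


# Toy occurrences: each one records the upos of the matched node X.
occurrences = [
    {"sent_id": "s1", "X.upos": "NOUN"},
    {"sent_id": "s2", "X.upos": "PRON"},
    {"sent_id": "s3", "X.upos": "NOUN"},
]

# Clustering on a key: one subset per observed value, with its size.
by_upos = cluster(occurrences, key=lambda occ: occ["X.upos"])
sizes = {value: len(subset) for value, subset in by_upos.items()}

# A yes/no test (in the spirit of a "whether" sub-request) gives two clusters.
yes_no = cluster(occurrences, key=lambda occ: "Yes" if occ["X.upos"] == "NOUN" else "No")
```

Each entry of sizes plays the role of one button: the key value together with the number of occurrences in the corresponding subset.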
Display options
Below the text area, a few options are available:
- lemma: if checked, the lemma (CoNLL-U column 3) is shown in the output
- upos: if checked, the upos (CoNLL-U column 4) is shown in the output
- xpos: if checked, the xpos (CoNLL-U column 5) is shown in the output
- features: if checked, other features (CoNLL-U column 6 and column 10) are shown
- textform/wordform: if checked, the additional features textform and wordform (see CoNLL-U doc) are shown
- sentence order: 3 values are available
  - initial: the sentences are scanned in the order in which they appear in the original corpus
  - by length: the shortest sentences (in number of tokens) are scanned first
  - shuffle: the set of sentences is shuffled before searching the request (useful to search randomly for examples in a corpus)
- context: if checked, the previous and the following sentences are shown (this is meaningful only on corpora where the original sentence order is preserved)
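The three sentence orders can be pictured with a short Python sketch. It assumes that a sentence is simply a list of tokens; the function order_sentences and its seed parameter are hypothetical and only illustrate the effect of each option, not how Grew-match implements it.

```python
import random


def order_sentences(sentences, mode="initial", seed=None):
    """Return the sentences in the order in which they would be scanned."""
    if mode == "initial":
        # Keep the order of the original corpus.
        return list(sentences)
    if mode == "by length":
        # Shortest sentences first; sorted() is stable, so ties keep corpus order.
        return sorted(sentences, key=len)
    if mode == "shuffle":
        # Random order; `seed` is only here to make the toy example repeatable.
        shuffled = list(sentences)
        random.Random(seed).shuffle(shuffled)
        return shuffled
    raise ValueError(f"unknown sentence order: {mode!r}")


sents = [["a", "b", "c"], ["d"], ["e", "f"]]
initial = order_sentences(sents, "initial")
by_length = order_sentences(sents, "by length")
shuffled = order_sentences(sents, "shuffle", seed=0)
```

Because Search stops after the first 1000 occurrences, the chosen order determines which occurrences are seen first.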
About CoNLL-U field names
The fields 2, 3, 4 and 5 of CoNLL-U files are considered as features with the following feature names.
| CoNLL-U field | FORM (col 2) | LEMMA (col 3) | UPOS (col 4) | XPOS (col 5) |
|---|---|---|---|---|
| Grew syntax | form | lemma | upos | xpos |
For instance:
- searching for the word is → pattern { X [form="is"] }
- searching for the lemma be → pattern { X [lemma="be"] }
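To make the column-to-feature mapping concrete, the following Python sketch reads one CoNLL-U token line (10 tab-separated columns) and exposes columns 2 to 5 under the feature names form, lemma, upos and xpos. The helpers token_features and matches are hypothetical, and the token line is a made-up example; this only mimics what a request such as X [form="is"] tests and is not how Grew evaluates requests.

```python
# 0-based column index -> feature name, following the table above.
GREW_NAMES = {1: "form", 2: "lemma", 3: "upos", 4: "xpos"}


def token_features(conllu_line):
    """Map columns 2-5 of a CoNLL-U token line to their feature names."""
    columns = conllu_line.rstrip("\n").split("\t")
    if len(columns) != 10:
        raise ValueError("a CoNLL-U token line has 10 tab-separated columns")
    return {name: columns[index] for index, name in GREW_NAMES.items()}


def matches(features, **constraints):
    """True if every name=value constraint holds for the token."""
    return all(features.get(name) == value for name, value in constraints.items())


# Made-up token line for the word "is".
line = "2\tis\tbe\tAUX\tVBZ\tMood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin\t3\tcop\t_\t_"
token = token_features(line)

found_by_form = matches(token, form="is")    # analogous to X [form="is"]
found_by_lemma = matches(token, lemma="be")  # analogous to X [lemma="be"]
```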
For other features, defined in CoNLL-U fields FEATS (col 6) and MISC (col 10), the name of the feature can be used directly with exceptions:
Enhanced dependencies
In the UD framework, a few corpora are also provided with an additional annotation layer called Enhanced dependencies.
These corpora are available by default with the enhanced layer and another corpus (with prefix bUD, for “basic” UD) is also available.
If the default treebank is selected, the enhanced dependencies are displayed in blue below the sentence.
In the pattern, an enhanced dependency can be searched with the prefix E:.
For example, the following pattern searches for an enhanced obj relation in UD_English-EWT without a basic (i.e. non-enhanced) counterpart:
pattern { X -[E:obj]-> Y }
without { X -[obj]-> Y }
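The logic of this request (keep an enhanced edge only when no basic edge with the same label links the same two nodes) can be sketched in Python on a toy graph. The representation is an assumption made for illustration: edges are (head, label, dependent) triples, enhanced labels carry the E: prefix as in the request syntax, and the function enhanced_only is hypothetical.

```python
def enhanced_only(edges, label):
    """Return (head, dependent) pairs linked by E:<label> but not by basic <label>."""
    # Pairs that do have the basic relation: these are excluded, like `without`.
    basic_pairs = {(head, dep) for head, lab, dep in edges if lab == label}
    return [
        (head, dep)
        for head, lab, dep in edges
        if lab == "E:" + label and (head, dep) not in basic_pairs
    ]


# Toy graph: "read" has an enhanced object "book" with no basic counterpart,
# whereas "saw" has both a basic and an enhanced obj edge to "dog".
edges = [
    ("read", "obj", "that"),
    ("read", "E:obj", "book"),
    ("saw", "obj", "dog"),
    ("saw", "E:obj", "dog"),
]

result = enhanced_only(edges, "obj")
```

Only the pair (read, book) is kept, since the enhanced edge from saw to dog is matched by a basic obj edge between the same nodes.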
Grew-match instances
The https://universal.grew.fr instance
This instance contains version 2.18 of the UD and SUD treebanks, as well as a few more recent versions synchronised with the GitHub data. The top navigation bar provides access to:
- UD 2.18: the 353 treebanks of version 2.18 of UD and 47 bUD corpora (basic UD: for corpora with Enhanced UD, a version with only basic UD is available)
- SUD 2.18: the 352 treebanks of version 2.18 of SUD, 8 mSUD and 2 pSUD treebanks (see the SUD data page for more details about SUD corpora and the mSUD and pSUD extensions)
- UD Latest: some UD corpora in their latest version available on the dev branch on GitHub (with suffix @dev). Currently, some English, French, Irish and Portuguese treebanks are available. If you want to access the dev branch of another UD treebank in Grew-match, please contact us. These treebanks are updated immediately after each new push to GitHub.
- SUD Latest: latest version available on GitHub of the native SUD corpora (with suffix @latest).
- UD Auto: automatic UD conversion of SUD-native treebanks (with suffix @conv).
- SUD Auto: automatic SUD conversion of some UD treebanks and a few other automatically built SUD treebanks
Other instances
- https://parseme.grew.fr: MWE annotation from the Parseme project
- https://semantics.grew.fr: a few available semantic graphbanks
- https://orfeo.grew.fr: see the Orfeo project
- https://sequoia.grew.fr: different annotation layers of the French Sequoia corpus
- https://naija.grew.fr: see the NaijaSynCor project
Contact
For any remark or request, you can either contact us or open an issue on GitHub.




