Grew-match: Online Graph Matching
Grew-match is a single-page online web application for searching treebanks with graph requests.
There are several instances, each one with its own URL.
The address https://match.grew.fr displays a portal with links to the instances.
See below for the list of instances.
If you want to run your own instance of Grew-match, see Local installation of Grew-match.
Basic usage
Once you have selected an instance:
- Select the corpus on which you want to search:
  - with the top navbar, select the collection (a subset of corpora);
  - with the left pane, select the corpus on which the request will be executed.
- Enter the search request in the text area (you may use the snippets on the right of the text area).
- Click on Search or Count.
With Search:
- If the number of matches is below 1000, all matches are found and their number is displayed.
- Otherwise, the computation stops after the first 1000 occurrences, and the share of the corpus scanned to find them is reported. For instance, searching for an nsubj relation in the UD_French-GSD corpus reports "More than 1000 results found in 5.44% of the corpus": the first 1000 items were found in 5.44% of the 16,342 sentences of UD_French-GSD.
- Items are displayed in batches of 10; to see the next 10 items, click on More results.
With Count, all the occurrences are computed, but annotation examples cannot be visualized.
For instance, with the same request as above, 18,974 occurrences of nsubj are found.
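The difference between Search and Count can be modelled with a short Python sketch. This is a hypothetical model, not Grew-match's implementation: sentences are scanned in order, and a caller-supplied `matcher` function (an assumption standing in for the request engine) returns the occurrences found in one sentence. With the figures above, 5.44% of the 16,342 sentences of UD_French-GSD is about 889 sentences.

```python
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

S = TypeVar("S")  # a sentence (any representation)
O = TypeVar("O")  # an occurrence of the request


def search(
    sentences: Sequence[S],
    matcher: Callable[[S], List[O]],
    limit: int = 1000,
) -> Tuple[List[O], Optional[float]]:
    """Hypothetical model of Search: scan sentences in order and stop as soon
    as more than `limit` occurrences are known.

    Returns at most `limit` occurrences, plus the fraction of the corpus that
    had been scanned when the limit was exceeded (None if the whole corpus
    was scanned, i.e. the number of matches is exact)."""
    found: List[O] = []
    for scanned, sentence in enumerate(sentences, start=1):
        found.extend(matcher(sentence))
        if len(found) > limit:
            return found[:limit], scanned / len(sentences)
    return found, None


def count(sentences: Sequence[S], matcher: Callable[[S], List[O]]) -> int:
    """Hypothetical model of Count: scan the whole corpus, keep only the total."""
    return sum(len(matcher(sentence)) for sentence in sentences)


def find_nsubj(labels: List[str]) -> List[int]:
    """Toy matcher: positions of the nsubj relations in one sentence."""
    return [i for i, label in enumerate(labels) if label == "nsubj"]


# Toy corpus: each sentence is a list of relation labels (2 nsubj per sentence).
corpus = [["nsubj", "obj", "nsubj"]] * 600

items, fraction = search(corpus, find_nsubj)
print(len(items), fraction)        # 1000 0.835
print(count(corpus, find_nsubj))   # 1200
```

Search stops after 501 of the 600 toy sentences, which is the kind of figure reported as "found in N% of the corpus", while Count always reads every sentence.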
Learning syntax
A tutorial with a progressive sequence of requests is available. You may also explore the snippets on the right of the text area for other examples. More comprehensive documentation is available on the requests page.
Clustering the occurrences
In addition to the main request, the set of occurrences returned by this request can be clustered.
When a clustering key is used, the set of occurrences (or the first 1000 occurrences if Search is used) is split into subsets depending on the key value.
Each possible value is presented as a button showing the size of the associated subset; in Search mode, the button gives access to the corresponding occurrences.
When a whether sub-request is used, matches are split into two clusters: Yes and No.
See the clustering documentation page for syntax and examples of usage.
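The splitting step can be modelled in Python as a plain group-by. The occurrence representation (a dict of matched-node features) and the function names are hypothetical; the sketch only shows how a key yields one subset per value, whose sizes are what the buttons display, and how a whether condition yields exactly two clusters, Yes and No.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, TypeVar

O = TypeVar("O")  # an occurrence of the main request


def cluster_by_key(occurrences: List[O], key: Callable[[O], Hashable]) -> Dict[Hashable, List[O]]:
    """Split occurrences into subsets, one per value of the clustering key."""
    clusters: Dict[Hashable, List[O]] = defaultdict(list)
    for occurrence in occurrences:
        clusters[key(occurrence)].append(occurrence)
    return dict(clusters)


def cluster_by_whether(occurrences: List[O], condition: Callable[[O], bool]) -> Dict[str, List[O]]:
    """Split occurrences into the two clusters Yes and No."""
    clusters: Dict[str, List[O]] = {"Yes": [], "No": []}
    for occurrence in occurrences:
        clusters["Yes" if condition(occurrence) else "No"].append(occurrence)
    return clusters


# Toy occurrences: each one records features of the matched node X.
occurrences = [
    {"X.upos": "NOUN", "X.lemma": "cat"},
    {"X.upos": "PRON", "X.lemma": "it"},
    {"X.upos": "NOUN", "X.lemma": "dog"},
]
by_upos = cluster_by_key(occurrences, lambda occ: occ["X.upos"])
print({value: len(subset) for value, subset in by_upos.items()})  # {'NOUN': 2, 'PRON': 1}

is_pronoun = cluster_by_whether(occurrences, lambda occ: occ["X.upos"] == "PRON")
print({answer: len(subset) for answer, subset in is_pronoun.items()})  # {'Yes': 1, 'No': 2}
```

Every occurrence lands in exactly one subset, so the subset sizes always add up to the number of occurrences being clustered.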
Display options
Below the text area, a few options are available:
- lemma: if checked, the lemma (CoNLL-U column 3) is shown in the output
- upos: if checked, the upos (CoNLL-U column 4) is shown in the output
- xpos: if checked, the xpos (CoNLL-U column 5) is shown in the output
- features: if checked, other features (CoNLL-U columns 6 and 10) are shown
- textform/wordform: if checked, the additional features textform and wordform (see CoNLL-U doc) are shown
- sentence order: 3 values are available:
  - initial: the sentences are scanned in the order in which they appear in the original corpus
  - by length: the shortest sentences (in number of tokens) are scanned first
  - shuffle: the set of sentences is shuffled before searching for the request (useful to find random examples in a corpus)
- context: if checked, the previous and following sentences are shown (this is only meaningful on corpora where the original sentence order is preserved)
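Since Search stops after the first 1000 occurrences, in this model the sentence order determines which occurrences are shown when a request has more matches than that. The three orders can be sketched in Python as follows; the function name and the optional `seed` parameter are hypothetical, and breaking ties in "by length" by keeping corpus order is an assumption of the sketch.

```python
import random
from typing import List, Optional, Sequence

Sentence = List[str]  # a sentence as a list of tokens


def order_sentences(sentences: Sequence[Sentence], order: str, seed: Optional[int] = None) -> List[Sentence]:
    """Return the sentences in the order in which they would be scanned."""
    if order == "initial":
        return list(sentences)  # order of the original corpus
    if order == "by length":
        # Shortest sentences first; sorted() is stable, so ties keep corpus order.
        return sorted(sentences, key=len)
    if order == "shuffle":
        shuffled = list(sentences)
        random.Random(seed).shuffle(shuffled)  # a fixed seed makes the shuffle reproducible
        return shuffled
    raise ValueError(f"unknown sentence order: {order!r}")


corpus = [["a", "b", "c"], ["d"], ["e", "f"]]
print(order_sentences(corpus, "by length"))  # [['d'], ['e', 'f'], ['a', 'b', 'c']]
```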
About CoNLL-U field names
Columns 2, 3, 4 and 5 of CoNLL-U files are treated as features, with the following feature names:
| CoNLL-U field | FORM (col 2) | LEMMA (col 3) | UPOS (col 4) | XPOS (col 5) |
|---|---|---|---|---|
| Grew syntax | form | lemma | upos | xpos |
For instance:
- searching for the word "is" → `pattern { X [form="is"] }`
- searching for the lemma "be" → `pattern { X [lemma="be"] }`
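The correspondence in the table can be illustrated with a small Python sketch that reads one CoNLL-U token line and checks equality constraints such as `X [lemma="be"]`. It covers only columns 2 to 5, and the helper names are hypothetical; Grew's actual parser and request semantics are richer than plain string equality.

```python
from typing import Dict

# 0-based positions of CoNLL-U columns 2-5 and their Grew feature names.
GREW_NAMES = {1: "form", 2: "lemma", 3: "upos", 4: "xpos"}


def grew_features(token_line: str) -> Dict[str, str]:
    """Map columns 2-5 of a CoNLL-U token line to Grew feature names."""
    columns = token_line.rstrip("\n").split("\t")
    if len(columns) != 10:
        raise ValueError("a CoNLL-U token line has 10 tab-separated columns")
    return {name: columns[i] for i, name in GREW_NAMES.items()}


def matches(features: Dict[str, str], **constraints: str) -> bool:
    """True if every feature constraint holds (plain equality only)."""
    return all(features.get(name) == value for name, value in constraints.items())


line = "2\tis\tbe\tAUX\tVBZ\tMood=Ind|Tense=Pres\t3\tcop\t_\t_"
features = grew_features(line)
print(features)  # {'form': 'is', 'lemma': 'be', 'upos': 'AUX', 'xpos': 'VBZ'}
print(matches(features, form="is"), matches(features, lemma="be"))  # True True
```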
For other features, defined in the CoNLL-U fields FEATS (col 6) and MISC (col 10), the name of the feature can be used directly, with exceptions:
Enhanced dependencies
In the UD framework, a few corpora also come with an additional annotation layer: enhanced dependencies.
These corpora are available by default with the enhanced layer, and another version of each (with the prefix bUD, for "basic" UD) is also available.
If the default treebank is selected, enhanced dependencies are displayed in blue below the sentence.
In a pattern, an enhanced dependency can be searched for with the prefix `E:`.
For instance, the pattern below searches for an enhanced obj relation in UD_English-EWT without a non-enhanced counterpart:
pattern { X -[E:obj]-> Y }
without { X -[obj]-> Y }
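The effect of this pattern can be modelled in Python with two edge sets, one per annotation layer. Representing the basic and enhanced layers as separate sets of (governor, label, dependent) triples is a hypothetical choice of this sketch, not necessarily how Grew stores them; the function name and the toy graph, which loosely follows the UD enhanced treatment of relative clauses, are also illustrative.

```python
from typing import Set, Tuple

Edge = Tuple[int, str, int]  # (governor id, relation label, dependent id)


def enhanced_without_basic(basic: Set[Edge], enhanced: Set[Edge], label: str) -> Set[Tuple[int, int]]:
    """Pairs (X, Y) linked by an enhanced `label` edge X -> Y but by no basic
    `label` edge X -> Y, mirroring:
        pattern { X -[E:label]-> Y }
        without { X -[label]-> Y }
    """
    basic_pairs = {(gov, dep) for gov, lab, dep in basic if lab == label}
    return {(gov, dep) for gov, lab, dep in enhanced if lab == label and (gov, dep) not in basic_pairs}


# Toy graph for "the book that I read" (1=the, 2=book, 3=that, 4=I, 5=read).
basic = {(2, "det", 1), (2, "acl:relcl", 5), (5, "obj", 3), (5, "nsubj", 4)}
enhanced = {(2, "det", 1), (2, "acl:relcl", 5), (2, "ref", 3), (5, "obj", 2), (5, "nsubj", 4)}
print(enhanced_without_basic(basic, enhanced, "obj"))  # {(5, 2)}
```

The enhanced edge read -obj-> book has no basic counterpart (in the basic layer, the object of read is the relative pronoun "that"), so it is the only match.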
Grew-match instances
The https://universal.grew.fr instance
This instance contains version 2.15 of the UD and SUD treebanks, as well as more recent versions of some treebanks, synchronised with GitHub data. The top navbar gives access to:
- UD 2.15: the 296 treebanks of UD version 2.15
- SUD 2.15: the 300 treebanks of SUD version 2.15 (see the SUD data page for more details about SUD corpora)
- UD Latest (with suffix @dev): some UD corpora in their latest version available on the dev branch on GitHub (English, French, Irish and Portuguese). If you want access to the dev branch of another UD treebank, please contact us. These treebanks are updated at most one hour after a new push to GitHub.
- SUD Latest (with suffix @latest): the latest version available on GitHub of the native SUD corpora
- UD Auto (with suffix @conv): automatic UD conversion of SUD-native treebanks
- SUD Auto: automatic SUD conversion of some UD treebanks, and a few other automatically built SUD treebanks
Other instances
- https://semantics.grew.fr: MWE annotation from the Parseme project
- https://semantics.grew.fr: a few available semantic graphbanks
- https://orfeo.grew.fr: see the Orfeo project
- https://sequoia.grew.fr: different annotation layers of the French Sequoia corpus
- https://naija.grew.fr: see the NaijaSynCor project
Contact
For any remark or request, you can either contact me or open an issue on GitHub.