# Grew-match: Online Graph Matching

Grew-match is a single-page web application for searching graph requests in treebanks. There are several instances, each with its own URL. The address http://match.grew.fr displays a portal with links to the instances. See below for the list of instances.

If you want to run your own instance of Grew-match, see Local installation of Grew-match.

## Basic usage

Once you have selected an instance,

1. Select the corpus on which you want to search:
   • with the top navbar, select the collection (subset of corpora)
   • with the left pane, select the corpora on which the request will be executed
2. Enter the search request in the text area (you may use the snippets on the right of the text area)
3. Click on Search or Count

With Search:

• If the number of matches is below 1000, the number of items is displayed.
• Otherwise, the computation stops after the first 1000 occurrences, and the proportion of the corpus scanned to find them is reported. For instance, if you search for an nsubj relation in the UD_French-GSD corpus, the message is More than 1000 results found in 5.43% of the corpus. This means that the first 1000 items were found in 5.43% of the 16,341 sentences of the UD_French-GSD corpus.
• Items are displayed in batches of 10; to see the next 10 items, click on More results.

With Count, all the solutions are computed, but it is not possible to visualize annotation examples. For instance, with the same request as above, we observe 18,998 occurrences of nsubj.
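As a rough cross-check of the figures quoted above, the short Python sketch below redoes the arithmetic. It is not part of Grew-match; the variable names are illustrative, and the extrapolation assumes that nsubj relations are spread evenly over the corpus, which is only an approximation.

```python
# Figures quoted in the text above.
total_sentences = 16_341        # sentences in UD_French-GSD
fraction_scanned = 5.43 / 100   # share of the corpus scanned by Search
search_cap = 1000               # Search stops after this many occurrences
count_result = 18_998           # total reported by Count

# Number of sentences scanned before the cap was reached.
sentences_scanned = round(total_sentences * fraction_scanned)

# Naive extrapolation (assumption: matches are evenly distributed).
estimated_total = round(search_cap / fraction_scanned)

print(sentences_scanned)   # 887
print(estimated_total)     # 18416
print(f"{estimated_total / count_result:.1%} of the Count result")
```

Under this assumption, the extrapolated total is of the same order as the exact figure returned by Count, which is what one would expect if the corpus is reasonably homogeneous.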

## Learning syntax

A tutorial with a progressive sequence of requests is available. You may also explore snippets given on the right of the text area to learn with other examples. A more comprehensive documentation is available in the requests page.

## Clustering the occurrences

In addition to the main request, it is possible to cluster the set of occurrences returned by this request.

When a clustering key is used, the set of occurrences (or the first 1000 occurrences if Search is used) is split into subsets according to the key value. Each possible value is presented as a button showing the size of the associated subset; the button gives access to the corresponding occurrences (in Search mode).

When a whether sub-request is used, the matches are split into two clusters, Yes and No.

See the clustering documentation page for syntax and examples of usage.
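To make the two clustering modes concrete, here is a minimal Python sketch of the idea. It does not reproduce how Grew-match is implemented: the function names, the dictionary representation of an occurrence, and the toy data are all hypothetical and serve only to illustrate the grouping logic described above.

```python
from collections import Counter

MAX_SEARCH_RESULTS = 1000  # cap applied in Search mode, as described above


def cluster_by_key(occurrences, key, search_mode=True):
    """Count occurrences per key value (one button per value in the UI)."""
    if search_mode:
        occurrences = occurrences[:MAX_SEARCH_RESULTS]
    return Counter(key(occ) for occ in occurrences)


def cluster_by_whether(occurrences, predicate, search_mode=True):
    """Split occurrences into two clusters, Yes and No."""
    return cluster_by_key(
        occurrences,
        lambda occ: "Yes" if predicate(occ) else "No",
        search_mode,
    )


# Hypothetical occurrences of a relation: governor and dependent POS tags.
toy_occurrences = [
    {"gov_upos": "VERB", "dep_upos": "NOUN"},
    {"gov_upos": "VERB", "dep_upos": "PRON"},
    {"gov_upos": "ADJ", "dep_upos": "NOUN"},
]

by_governor = cluster_by_key(toy_occurrences, lambda occ: occ["gov_upos"])
pron_or_not = cluster_by_whether(
    toy_occurrences, lambda occ: occ["dep_upos"] == "PRON"
)
print(dict(by_governor))
print(dict(pron_or_not))
```

In Count mode the cap would not apply, which corresponds to calling the functions with `search_mode=False`.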

## Display options

Below the text area, a few options are available:

• lemma: if checked, the lemma (CoNLL-U column 3) is shown in the output
• upos: if checked, the upos (CoNLL-U column 4) is shown in the output
• xpos: if checked, the xpos (CoNLL-U column 5) is shown in the output
• features: if checked, other features (CoNLL-U columns 6 and 10) are shown
• textform/wordform: if checked, the additional features textform and wordform (see CoNLL-U doc) are shown
• sentence order: 3 values are available
  • initial: sentences are scanned in the order in which they appear in the original corpus
  • by length: the shortest sentences (in number of tokens) are scanned first
  • shuffle: the set of sentences is shuffled before the request is searched (useful for finding random examples in a corpus)
• context: if checked, the previous and following sentences are shown (this is meaningful only for corpora where the original sentence order is preserved)
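The three sentence order values can be pictured with the following Python sketch. It is only an illustration of the behaviour described above, not Grew-match code: the function name and mode strings are hypothetical, and whitespace-separated words stand in for real tokens.

```python
import random


def order_sentences(sentences, mode="initial", seed=None):
    """Return the sentences in the order in which they would be scanned."""
    if mode == "initial":
        # Keep the order of the original corpus.
        return list(sentences)
    if mode == "length":
        # Shortest sentences first; ties keep their original order.
        return sorted(sentences, key=lambda s: len(s.split()))
    if mode == "shuffle":
        # Random order; a seed makes the shuffle reproducible.
        shuffled = list(sentences)
        random.Random(seed).shuffle(shuffled)
        return shuffled
    raise ValueError(f"unknown mode: {mode!r}")


corpus = ["Le chat dort .", "Oui .", "Il pleut beaucoup aujourd'hui ."]
print(order_sentences(corpus, "length"))
```

Since Search stops after the first 1000 occurrences, the scanning order determines which occurrences are shown, which is why shuffle is useful for sampling examples.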

Fields 2, 3, 4 and 5 of CoNLL-U files are treated as features, with the following feature names:

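| CoNLL-U field | FORM (col 2) | LEMMA (col 3) | UPOS (col 4) | XPOS (col 5) |
|---------------|--------------|---------------|--------------|--------------|
| Grew syntax   | form         | lemma         | upos         | xpos         |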

For instance:

• searching for the word "is": pattern { N [form="is"] }
• searching for the lemma "be": pattern { N [lemma="be"] }

For other features, defined in the CoNLL-U fields FEATS (col 6) and MISC (col 10), the feature name can be used directly, with two exceptions:

• for layered features: see here
• for irregular uses of the MISC field: see here
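The mapping from CoNLL-U columns to feature names can be sketched in Python as follows. This is a simplified illustration, not the Grew implementation: the helper names are hypothetical, the sample token line was made up for this example, and the two exceptions listed above (layered features and irregular MISC content) are deliberately ignored.

```python
# 0-based column index -> Grew feature name, for CoNLL-U columns 2 to 5.
CORE_FIELDS = {1: "form", 2: "lemma", 3: "upos", 4: "xpos"}


def parse_key_values(field):
    """Parse a 'A=x|B=y' field; '_' means empty. Items without '=' are skipped."""
    if field == "_":
        return {}
    pairs = (item.split("=", 1) for item in field.split("|"))
    return {pair[0]: pair[1] for pair in pairs if len(pair) == 2}


def token_features(line):
    """Build the feature dictionary of one CoNLL-U token line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError("a CoNLL-U token line has 10 tab-separated columns")
    features = {name: cols[index] for index, name in CORE_FIELDS.items()}
    features.update(parse_key_values(cols[5]))  # FEATS (col 6)
    features.update(parse_key_values(cols[9]))  # MISC (col 10)
    return features


def matches(features, **constraints):
    """Mimic a node constraint such as N [form="is"]."""
    return all(features.get(name) == value for name, value in constraints.items())


# Made-up token line for illustration.
line = "2\tis\tbe\tAUX\tVBZ\tMood=Ind|Number=Sing|Person=3\t3\tcop\t_\tSpaceAfter=No"
token = token_features(line)
print(matches(token, form="is"), matches(token, lemma="be"))
```

With this representation, a request on a FEATS or MISC feature, for example Number or SpaceAfter, uses the feature name in the same way as form or lemma.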

## Enhanced dependencies

In the UD framework, a few corpora are also provided with another annotation layer, EUD (Enhanced dependencies). For these corpora, a switch button is available (above the text area) where the user can choose between UD (only basic UD syntactic relations) and EUD (all relations, basic and enhanced).

If EUD is selected, enhanced dependencies are displayed in blue below the sentence. In the pattern, an enhanced dependency can be searched with the prefix E:. For instance, the pattern below searches for an enhanced obj relation in UD_English-EWT without a non-enhanced counterpart:

pattern { N -[E:obj]-> M }
without { N -[obj]-> M }
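The logic of this request (an E:obj edge between N and M, with no plain obj edge between the same two nodes) can be mimicked in a few lines of Python. This is a sketch under assumptions: representing a graph as a list of (governor, label, dependent) triples is a hypothetical simplification, and the toy edges are invented for illustration.

```python
def enhanced_only(edges, label):
    """Return the (N, M) pairs linked by E:<label> but not by plain <label>."""
    enhanced = {(gov, dep) for gov, lab, dep in edges if lab == f"E:{label}"}
    basic = {(gov, dep) for gov, lab, dep in edges if lab == label}
    # 'pattern' selects the enhanced pairs, 'without' filters out basic ones.
    return sorted(enhanced - basic)


# Invented toy graph: node identifiers are plain strings.
toy_edges = [
    ("eat", "obj", "apple"),
    ("eat", "E:obj", "apple"),   # enhanced edge with a basic counterpart
    ("buy", "E:obj", "apple"),   # enhanced edge without a basic counterpart
    ("buy", "nsubj", "Mary"),
]

print(enhanced_only(toy_edges, "obj"))
```

Only the pair without a basic counterpart is returned, which mirrors the role of the without clause in the request.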


# Grew-match instances

## The http://universal.grew.fr instance

This instance contains version 2.11 of the UD and SUD treebanks, along with a few more recent versions synchronised with GitHub data. The top navbar gives access to:

• UD 2.11: the 243 treebanks of version 2.11 of UD
• SUD 2.11: the 241 treebanks of version 2.11 of SUD (see page SUD data for more details about SUD corpora)
• UD Latest:
  • suffix @dev: corpora in their latest version available on the dev branch on GitHub (English, French, Irish and Portuguese). If you want access to the dev branch of another UD treebank, please contact us.
  • suffix @conv: the automatic conversion of the native SUD treebanks into UD.
• SUD Latest:
  • suffix @latest: the latest version available on GitHub of the native SUD corpora.

# Contact

For any remark or request, you can either contact me or open an issue on GitHub.