# Grew-match: Online Graph Matching

Grew-match is a single-page web application for searching graph patterns in treebanks. The available treebanks come in several linguistic formats.

If you want to use it on other corpora, you can run your own Grew-match instance by following the instructions in Local installation of Grew-match.

## Basic usage

1. Select the corpus you want to search (first choose the format in the top navbar, then the corpus in the left pane)
2. Enter the search pattern in the text area (you may use some snippets on the right of the text area)
3. Click on Search

The number of items is displayed and the first 10 items can be explored. If you want to see the next 10 items, click on Get more results.

To limit server usage, only the first 1000 items are computed. If the searched pattern is found more than 1000 times, the proportion of the corpus that was scanned to find the first 1000 items is reported. For instance, if you search for an nsubj relation in the UD_French-GSD corpus (see output), the message is More than 1000 results found in 5.16% of the corpus. This means that the first 1000 items were found in 5.16% of the 16,342 sentences of the UD_French-GSD corpus.
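
The arithmetic behind that message can be checked with a short Python sketch. The figures are the ones quoted above. The extrapolated total is only a rough estimate resting on the assumption that matches are spread evenly over the corpus; Grew-match itself reports no such total.

```python
# Figures quoted in the example above.
corpus_sentences = 16_342   # sentences in UD_French-GSD
scanned_fraction = 0.0516   # "5.16% of the corpus"
reported_items = 1000       # cap on the number of computed items

# Approximate number of sentences scanned before the cap was reached.
scanned_sentences = round(corpus_sentences * scanned_fraction)

# Rough extrapolation to the whole corpus.
# Assumption: matches are evenly distributed across sentences.
estimated_total = round(reported_items / scanned_fraction)

print(scanned_sentences)  # 843
print(estimated_total)    # 19380
```

In other words, the cap was reached after roughly 843 sentences.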

## Learning syntax

A tutorial with a progressive sequence of patterns is available. You may also explore the snippets on the right of the text area to learn from other examples. More comprehensive documentation is available on the patterns page.

Fields 2, 3, 4 and 5 of the CoNLL-U structure are treated as features with the following names.

| CoNLL-U field | 2 | 3 | 4 | 5 |
|---|---|---|---|---|
| Name | form | lemma | upos | xpos |

For instance, if you want to search:

• for the word is, you write: pattern { N [form="is"] }
• for the lemma be, you write: pattern { N [lemma="be"] }
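
The mapping from CoNLL-U columns to feature names can be sketched in Python as follows. This is an illustration only: `token_features` and `node_pattern` are hypothetical helpers written for this example and are not part of Grew-match, and the sample token line is made up.

```python
# CoNLL-U columns are tab-separated; fields are numbered from 1.
FEATURE_NAMES = {2: "form", 3: "lemma", 4: "upos", 5: "xpos"}


def token_features(conllu_line: str) -> dict[str, str]:
    """Return the four named features of one CoNLL-U token line."""
    fields = conllu_line.rstrip("\n").split("\t")
    return {name: fields[index - 1] for index, name in FEATURE_NAMES.items()}


def node_pattern(feature: str, value: str, node: str = "N") -> str:
    """Build a one-node pattern string (hypothetical helper)."""
    return f'pattern {{ {node} [{feature}="{value}"] }}'


# Made-up token line for the word "is".
line = "2\tis\tbe\tAUX\tVBZ\tMood=Ind|Number=Sing\t3\tcop\t_\t_"
features = token_features(line)

print(features)
print(node_pattern("form", features["form"]))    # pattern { N [form="is"] }
print(node_pattern("lemma", features["lemma"]))  # pattern { N [lemma="be"] }
```

The two generated strings are the same patterns as in the list above.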

## Display options

Below the textarea, a few options are available:

• lemma: if checked, the lemma (CoNLL-U column 3) is shown in output
• upos: if checked, the upos (CoNLL-U column 4) is shown in output
• xpos: if checked, the xpos (CoNLL-U column 5) is shown in output
• features: if checked, other features (CoNLL-U column 6 and column 10) are shown
• textform/wordform: if checked, additional features textform and wordform (see CoNLL-U doc) are shown
• sentence order: three values are available
  • initial: the sentences are scanned in the order in which they appear in the original corpus
  • by length: the shortest sentences (in number of tokens) are scanned first
  • shuffle: the set of sentences is shuffled before searching for the pattern (useful for finding random examples in a corpus)
• context: if checked, the previous and the following sentences are shown (this is useful only for corpora where the original sentence order is preserved)
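
The three sentence order values can be pictured with a minimal Python sketch. It only illustrates the described behaviour and is not Grew-match's implementation: the function name, the mode names and the `seed` parameter are hypothetical, and keeping corpus order among sentences of equal length is an assumption.

```python
import random


def order_sentences(sentences, mode="initial", seed=None):
    """Return the sentences (lists of tokens) in the order they would be scanned."""
    if mode == "initial":
        # Original corpus order.
        return list(sentences)
    if mode == "length":
        # sorted() is stable, so equal-length sentences keep corpus order
        # (an assumption about tie-breaking).
        return sorted(sentences, key=len)
    if mode == "shuffle":
        shuffled = list(sentences)
        random.Random(seed).shuffle(shuffled)
        return shuffled
    raise ValueError(f"unknown mode: {mode!r}")


corpus = [
    ["The", "cat", "sleeps", "."],
    ["Hello", "!"],
    ["It", "is", "raining", "today", "."],
]

for sentence in order_sentences(corpus, "length"):
    print(len(sentence), " ".join(sentence))
```

Since only the first 1000 items are computed, the scanning order determines which items are returned for a frequent pattern.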

## Enhanced dependencies

In the UD framework, a few corpora also provide an additional annotation layer, EUD (enhanced dependencies). For these corpora, a switch button is available (above the textarea) that lets the user choose between UD and EUD.

If EUD is selected, enhanced dependencies are displayed in blue below the sentence. In a pattern, an enhanced dependency can be searched for with the prefix E:. For instance, the pattern below searches for an enhanced obj relation in UD_English-EWT without a non-enhanced counterpart (see output):

pattern { N -[E:obj]-> M }
without { N -[obj]-> M }
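
To make the idea of "enhanced without a non-enhanced counterpart" concrete, here is a minimal Python sketch that works directly on raw CoNLL-U lines, where basic dependencies sit in columns 7 and 8 (HEAD, DEPREL) and enhanced ones in column 9 (DEPS). It is an approximation for illustration, not how Grew-match evaluates the pattern: `enhanced_only` is a hypothetical helper and the sample sentence is made up.

```python
def enhanced_only(conllu_lines, relation="obj"):
    """Yield (head, dependent) ID pairs linked by `relation` in DEPS
    but not by the basic dependency of the same token."""
    for line in conllu_lines:
        # Skip blank lines and sentence-level comments.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        token_id, head, deprel, deps = fields[0], fields[6], fields[7], fields[8]
        if deps == "_":
            continue
        for item in deps.split("|"):
            # Split on the first colon only: relations may contain colons.
            enh_head, enh_rel = item.split(":", 1)
            has_basic = head == enh_head and deprel == relation
            if enh_rel == relation and not has_basic:
                yield enh_head, token_id


# Made-up sentence: "She bought and ate an apple"
rows = [
    "1 She she PRON PRP _ 2 nsubj 2:nsubj|4:nsubj _",
    "2 bought buy VERB VBD _ 0 root 0:root _",
    "3 and and CCONJ CC _ 4 cc 4:cc _",
    "4 ate eat VERB VBD _ 2 conj 2:conj:and _",
    "5 an a DET DT _ 6 det 6:det _",
    "6 apple apple NOUN NN _ 2 obj 2:obj|4:obj _",
]
lines = ["\t".join(row.split()) for row in rows]

print(list(enhanced_only(lines, "obj")))  # [('4', '6')]
```

Here "apple" is the basic object of "bought" only, so the enhanced obj edge from "ate" (token 4) to "apple" (token 6) is the one reported.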


## Contact

For any remark or request, you can either contact us or open an issue on the GitLab project (you will have to register).

## Deprecated _MISC_ and _UD_ prefixes

In older versions, features declared in column 10 were accessible with the _MISC_ prefix. Since 1.4, no prefix is required.

Multiword tokens or empty nodes were identified with the _UD_ prefix. This prefix is not used anymore; it is replaced by features textform and wordform (see CoNLL-U doc).