
Building relation tables on your treebank

Here, a “relation table” is a table like the ones available through Grew-match: see the example on UD_French-PUD, version 2.13 (select a relation on the left).

The simplest way to compute this kind of table on your own corpus is to use the Python library grewpy. It is also possible to do the same with the Command Line Interface.

For this example, we suppose that we have a subfolder data containing the file fr_pud-ud-test.conllu (version 2.13 of the corpus UD_French-PUD, which can be downloaded here).

.
├── data
│   └── fr_pud-ud-test.conllu

With the grewpy Python lib

See here for the installation of grewpy.

Table for nsubj relation

The script below loads the corpus and computes the table for the nsubj relation:

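from grewpy import Corpus, Request

# Load the treebank from the CoNLL-U file
pud_corpus = Corpus('data/fr_pud-ud-test.conllu')

# Count nsubj edges, clustered by the POS of the governor (G) and of the dependent (D)
nsubj_table = pud_corpus.count(Request('G -[nsubj]-> D'), clustering_keys=['G.upos', 'D.upos'])

print(nsubj_table)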

The output is a nested Python dictionary: the top-level keys correspond to G.upos and the embedded keys correspond to D.upos. For instance, nsubj_table['VERB']['NOUN'] returns 544, which is the number of occurrences of the nsubj relation from a VERB to a NOUN:

{'X': {'NOUN': 2}, 'VERB': {'X': 2, 'SYM': 3, 'PROPN': 199, 'PRON': 470, 'NUM': 3, 'NOUN': 544, 'DET': 1, 'ADV': 1, 'ADJ': 6}, 'PROPN': {'PRON': 2, 'NOUN': 2}, 'PRON': {'PROPN': 2, 'PRON': 8, 'NOUN': 2}, 'NOUN': {'PROPN': 11, 'PRON': 26, 'NUM': 1, 'NOUN': 43, 'ADJ': 2}, 'ADV': {'PRON': 1}, 'ADJ': {'X': 1, 'VERB': 1, 'SYM': 1, 'PROPN': 10, 'PRON': 20, 'NOUN': 53, 'ADJ': 1}}

Note that the sums for rows and columns are not given but it is easy to add them in the Python code.
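As a minimal sketch of how these sums could be added, the code below computes row sums, column sums and the grand total from the nested dictionary. The table is copied from the output above so that the example runs without grewpy; the helper name add_margins is hypothetical and not part of grewpy.

```python
from collections import Counter

# Table copied from the output above (nsubj in UD_French-PUD 2.13).
nsubj_table = {
    'X': {'NOUN': 2},
    'VERB': {'X': 2, 'SYM': 3, 'PROPN': 199, 'PRON': 470, 'NUM': 3,
             'NOUN': 544, 'DET': 1, 'ADV': 1, 'ADJ': 6},
    'PROPN': {'PRON': 2, 'NOUN': 2},
    'PRON': {'PROPN': 2, 'PRON': 8, 'NOUN': 2},
    'NOUN': {'PROPN': 11, 'PRON': 26, 'NUM': 1, 'NOUN': 43, 'ADJ': 2},
    'ADV': {'PRON': 1},
    'ADJ': {'X': 1, 'VERB': 1, 'SYM': 1, 'PROPN': 10, 'PRON': 20,
            'NOUN': 53, 'ADJ': 1},
}

def add_margins(table):
    """Return (row_sums, col_sums, total) for a nested {row: {col: count}} dict.

    Hypothetical helper, not part of grewpy.
    """
    # One sum per governor POS (row).
    row_sums = {gov: sum(deps.values()) for gov, deps in table.items()}
    # Accumulate counts per dependent POS (column) across all rows.
    col_sums = Counter()
    for deps in table.values():
        col_sums.update(deps)
    return row_sums, dict(col_sums), sum(row_sums.values())

row_sums, col_sums, total = add_margins(nsubj_table)
print(row_sums['VERB'])  # 1229
print(col_sums['NOUN'])  # 646
print(total)             # 1418
```

The same helper can be applied to any table returned by count with two clustering keys.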

Table for nsubj relation and its possible extension

The example above searches for nsubj only, not for nsubj:pass and nsubj:caus, which are also used in UD_French-PUD. To get the table for all nsubj relations, with or without an extension, change the request 'G -[nsubj]-> D' to 'G -[1=nsubj]-> D' (see complex edges for an explanation).
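To make the difference between the two requests concrete, here is a small pure-Python illustration of which labels each one selects. It only mimics the selection described above, assuming that the first component of a label is the part before the first ':'; it is not how Grew matches edges internally, and the function names are hypothetical.

```python
def first_component(label):
    """Return the part of a label before the first ':' (assumed to be component 1)."""
    return label.split(':', 1)[0]

def selected_by_exact(label, value):
    """Mimic 'G -[nsubj]-> D': the whole label must be equal to value."""
    return label == value

def selected_by_first(label, value):
    """Mimic 'G -[1=nsubj]-> D': only the first component must be equal to value."""
    return first_component(label) == value

labels = ['nsubj', 'nsubj:pass', 'nsubj:caus', 'csubj', 'obj']

print([l for l in labels if selected_by_exact(l, 'nsubj')])
# ['nsubj']
print([l for l in labels if selected_by_first(l, 'nsubj')])
# ['nsubj', 'nsubj:pass', 'nsubj:caus']
```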

Compute tables for all relations

It is possible to get all relation tables (without looping on edge labels) by using one more clustering key.

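from grewpy import Corpus, Request

# Load the treebank from the CoNLL-U file
pud_corpus = Corpus('data/fr_pud-ud-test.conllu')

# Match every edge e and cluster first by its label, then by governor and dependent POS
all_tables = pud_corpus.count(Request('e: G -> D'), clustering_keys=['e.label', 'G.upos', 'D.upos'])

print(all_tables)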

In the code above, all_tables is a dictionary mapping each dependency label (e.label) to a sub-dictionary like the one obtained above for nsubj.

…,
 'iobj': {'VERB': {'PRON': 35}, 'ADJ': {'ADP': 1}}, 
 'goeswith': {'NUM': {'X': 1}, 'NOUN': {'X': 1}, 'ADV': {'X': 1}},
…
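A three-level dictionary like this can be awkward to inspect directly. The sketch below flattens it into (label, governor POS, dependent POS, count) rows and computes one total per relation. It uses only the two entries shown in the excerpt above; the names all_tables_excerpt, flatten and totals_per_label are hypothetical.

```python
# The two entries shown in the excerpt above; the real all_tables has many more.
all_tables_excerpt = {
    'iobj': {'VERB': {'PRON': 35}, 'ADJ': {'ADP': 1}},
    'goeswith': {'NUM': {'X': 1}, 'NOUN': {'X': 1}, 'ADV': {'X': 1}},
}

def flatten(tables):
    """Yield (label, governor_upos, dependent_upos, count) for every cell."""
    for label, table in tables.items():
        for gov, deps in table.items():
            for dep, count in deps.items():
                yield (label, gov, dep, count)

def totals_per_label(tables):
    """Return {label: total number of occurrences of that relation}."""
    totals = {}
    for label, _gov, _dep, count in flatten(tables):
        totals[label] = totals.get(label, 0) + count
    return totals

rows = sorted(flatten(all_tables_excerpt))
totals = totals_per_label(all_tables_excerpt)
print(totals)
# {'iobj': 36, 'goeswith': 3}
```

The flat rows are convenient for writing a TSV file or for sorting cells by frequency.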

With the Command Line Interface

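The requests must be declared in external files. So we suppose that our folder contains two more files, nsubj_table.req and all_tables.req (the names used in the commands below).

nsubj_table.req:

pattern { G -[nsubj]-> D }

all_tables.req:

pattern { e: G -> D }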

The command below outputs the nsubj relation table as JSON.

grew count -request nsubj_table.req -key G.upos -key D.upos -i data/fr_pud-ud-test.conllu

For all tables:

grew count -request all_tables.req -key e.label -key G.upos -key D.upos -i data/fr_pud-ud-test.conllu

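Remarks

In the JSON output, the tables are nested under the name of the request file. For all_tables.req, the output has the following shape: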

{
  "all_tables.req": {
    "xcomp": { … },
    …
    "acl": { … }
  }
}
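Since the command-line output is JSON nested under the request file name, it can be read back in Python with the standard json module. The sketch below assumes that the levels below the file name mirror the Python dictionary (label, then G.upos, then D.upos); the sample string is a hand-written stand-in built from the iobj entry shown earlier, not real output of grew count, and the function name is hypothetical.

```python
import json

def tables_from_cli_output(json_text, request_file):
    """Parse the JSON text and return the tables stored under the request file name.

    Assumption: the top-level keys are request file names, as in the shape above.
    """
    data = json.loads(json_text)
    return data[request_file]

# Hand-written stand-in for the command output (assumed nesting, see above).
sample_output = '{"all_tables.req": {"iobj": {"VERB": {"PRON": 35}, "ADJ": {"ADP": 1}}}}'

tables = tables_from_cli_output(sample_output, 'all_tables.req')
print(tables['iobj']['VERB']['PRON'])  # 35
```

In practice, the JSON text would come from the output of the grew count command, for instance after redirecting it to a file.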