Grew • Command Line Interface

The command used to run Grew is: grew <subcommand> [<args>]

The 5 main subcommands are: transform, grep, count, compile and clean.

Other subcommands:

There are two modes of input data: Mono corpus or Multi corpora. See here for more details about input formats.

The table below shows the accepted input modes for the main subcommands.

transform grep count compile clean
Mono ✅ (🆕 in 1.10)
Multi ✅ (🆕 in 1.10)

The table below shows the output modes for the 3 main subcommands (compile and clean do not produce any output).

CLI arg transform grep count
CoNLL-U ✅ (default)
JSON -json ✅ (default) ✅ (default)
CoNLL-X -cupt / -semcor / -columns …
DOT -dot
multi JSON -multi_json
TSV -tsv ✅ (in some cases)

Transform

In this mode, Grew applies a Graph Rewriting System to a graph or a set of graphs.

The full command for this mode:

grew transform [<args>]

All arguments are optional:


Grep

This mode corresponds to the command line version of the Grew-match tool. Clustering is also available in the grep mode.

Preliminaries

To test the examples below, you will need to create a local folder called data containing three corpora: UD_Chinese-PUD, UD_English-PUD and UD_French-PUD (version 2.17). With the commands below, you can create the folder, download the corpora and compile them.

mkdir -p data
wget https://github.com/UniversalDependencies/UD_French-PUD/raw/refs/tags/r2.17/fr_pud-ud-test.conllu -O data/fr_pud-ud-test.conllu
wget https://github.com/UniversalDependencies/UD_English-PUD/raw/refs/tags/r2.17/en_pud-ud-test.conllu -O data/en_pud-ud-test.conllu
wget https://github.com/UniversalDependencies/UD_Chinese-PUD/raw/refs/tags/r2.17/zh_pud-ud-test.conllu -O data/zh_pud-ud-test.conllu
wget http://grew.fr/usage/cli/en_fr_zh.json -O en_fr_zh.json
grew compile -i en_fr_zh.json

Without clustering

The command is:

grew grep -request <request> -i <input>

where:

The output is given in JSON format.

Example with Mono input

The command below searches for all occurrences of the dislocated dependency relation in UD_French-PUD with the Grew request pattern { e: X -[dislocated]-> Y }. Because the edge from X to Y is given the identifier e, the output includes information about this edge (see below).

grew grep -request "pattern { e: X -[dislocated]-> Y }" -i data/fr_pud-ud-test.conllu

produces the following JSON output:

[
  {
    "sent_id": "n01121051",
    "matching": {
      "nodes": { "Y": "11", "X": "2" },
      "edges": {
        "e": { "source": "2", "label": "dislocated", "target": "11" }
      }
    }
  },
  {
    "sent_id": "n01086031",
    "matching": {
      "nodes": { "Y": "5", "X": "1" },
      "edges": {
        "e": { "source": "1", "label": "dislocated", "target": "5" }
      }
    }
  },
  {
    "sent_id": "n01001011",
    "matching": {
      "nodes": { "Y": "20", "X": "29" },
      "edges": {
        "e": { "source": "29", "label": "dislocated", "target": "20" }
      }
    }
  }
]

This means that the request has been found three times in the corpus. Each instance provides the identifier of the sentence and the position of the matched nodes and edges.
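The structure described above is easy to consume from a script. The following Python sketch is only an illustration and is not part of Grew: it parses a Mono-mode output (here the first two matches of the example, embedded as a string) and lists the edges named in each occurrence. The function name summarize_matches is hypothetical.

```python
import json

# First two matches of the Mono example above, copied from the output.
GREP_OUTPUT = """
[
  {
    "sent_id": "n01121051",
    "matching": {
      "nodes": { "Y": "11", "X": "2" },
      "edges": {
        "e": { "source": "2", "label": "dislocated", "target": "11" }
      }
    }
  },
  {
    "sent_id": "n01086031",
    "matching": {
      "nodes": { "Y": "5", "X": "1" },
      "edges": {
        "e": { "source": "1", "label": "dislocated", "target": "5" }
      }
    }
  }
]
"""


def summarize_matches(matches):
    """Return one (sent_id, source, label, target) tuple per named edge."""
    rows = []
    for match in matches:
        for edge in match["matching"]["edges"].values():
            rows.append(
                (match["sent_id"], edge["source"], edge["label"], edge["target"])
            )
    return rows


matches = json.loads(GREP_OUTPUT)
for row in summarize_matches(matches):
    print("\t".join(row))
```

The sketch assumes every occurrence carries both the "nodes" and "edges" fields, as in the output shown here.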

Note that there are two other options:

Example with Multi input

With the Multi mode data described in the example file en_fr_zh.json:

[
  { 
    "id": "UD_English-PUD",
    "directory": "data",
    "files": ["en_pud-ud-test.conllu"]
  },
  { 
    "id": "UD_French-PUD",
    "directory": "data",
    "files": ["fr_pud-ud-test.conllu"]
  },
  {
    "id": "UD_Chinese-PUD",
    "directory": "data",
    "files": ["zh_pud-ud-test.conllu"]
  }
]

The command:

grew grep -request "pattern { e: X -[dislocated]-> Y }" -i en_fr_zh.json

produces the following JSON output:

{
  "UD_French-PUD": [
    {
      "sent_id": "n01121051",
      "matching": {
        "nodes": { "Y": "11", "X": "2" },
        "edges": {
          "e": { "source": "2", "label": "dislocated", "target": "11" }
        }
      }
    },
    {
      "sent_id": "n01086031",
      "matching": {
        "nodes": { "Y": "5", "X": "1" },
        "edges": {
          "e": { "source": "1", "label": "dislocated", "target": "5" }
        }
      }
    },
    {
      "sent_id": "n01001011",
      "matching": {
        "nodes": { "Y": "20", "X": "29" },
        "edges": {
          "e": { "source": "29", "label": "dislocated", "target": "20" }
        }
      }
    }
  ],
  "UD_English-PUD": [
    {
      "sent_id": "n01029007",
      "matching": {
        "nodes": { "Y": "3", "X": "6" },
        "edges": {
          "e": { "source": "6", "label": "dislocated", "target": "3" }
        }
      }
    },
    {
      "sent_id": "n01002058",
      "matching": {
        "nodes": { "Y": "4", "X": "17" },
        "edges": {
          "e": { "source": "17", "label": "dislocated", "target": "4" }
        }
      }
    }
  ],
  "UD_Chinese-PUD": [
    {
      "sent_id": "w04010029",
      "matching": {
        "nodes": { "Y": "25", "X": "18" },
        "edges": {
          "e": { "source": "18", "label": "dislocated", "target": "25" }
        }
      }
    },
    {
      "sent_id": "n05002017",
      "matching": {
        "nodes": { "Y": "4", "X": "1" },
        "edges": {
          "e": { "source": "1", "label": "dislocated", "target": "4" }
        }
      }
    },
    {
      "sent_id": "w01116100",
      "matching": {
        "nodes": { "Y": "14", "X": "11" },
        "edges": {
          "e": { "source": "11", "label": "dislocated", "target": "14" }
        }
      }
    },
    {
      "sent_id": "w01107013",
      "matching": {
        "nodes": { "Y": "16", "X": "9" },
        "edges": {
          "e": { "source": "9", "label": "dislocated", "target": "16" }
        }
      }
    },
    {
      "sent_id": "n01070017",
      "matching": {
        "nodes": { "Y": "8", "X": "6" },
        "edges": {
          "e": { "source": "6", "label": "dislocated", "target": "8" }
        }
      }
    }
  ]
}
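In Multi mode, the output is a dictionary keyed by corpus identifier, so per-corpus totals can be derived directly. The Python sketch below illustrates this on an abbreviated copy of the output above (only the sent_id of each occurrence is kept); count_per_corpus is a hypothetical helper, not a Grew function.

```python
# Abbreviated copy of the Multi-mode output above: the "matching" field
# of each occurrence is left out to keep the example short.
MULTI_OUTPUT = {
    "UD_French-PUD": [
        {"sent_id": "n01121051"},
        {"sent_id": "n01086031"},
        {"sent_id": "n01001011"},
    ],
    "UD_English-PUD": [
        {"sent_id": "n01029007"},
        {"sent_id": "n01002058"},
    ],
    "UD_Chinese-PUD": [
        {"sent_id": "w04010029"},
        {"sent_id": "n05002017"},
        {"sent_id": "w01116100"},
        {"sent_id": "w01107013"},
        {"sent_id": "n01070017"},
    ],
}


def count_per_corpus(result):
    """Map each corpus identifier to its number of occurrences."""
    return {corpus_id: len(occurrences) for corpus_id, occurrences in result.items()}


for corpus_id, total in count_per_corpus(MULTI_OUTPUT).items():
    print(f"{corpus_id}\t{total}")
```

On a real run, the dictionary would instead be obtained by parsing the JSON produced by the command, for instance with json.loads.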

With clustering

In both Mono and Multi modes, if the command line additionally contains one or more arguments introduced by -key, the set of occurrences is clustered recursively according to the given clustering items.

See the Clustering documentation page for details of the various clustering items available.
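When such commands are generated from a script, each clustering item becomes its own -key argument. The Python sketch below builds the argument list for a grep call using only the options documented on this page (-request, -key, -i); the helper name build_grep_command is hypothetical.

```python
import shlex


def build_grep_command(request, input_path, keys=()):
    """Build the argv list for `grew grep`, with one -key per clustering item."""
    cmd = ["grew", "grep", "-request", request]
    for key in keys:
        cmd += ["-key", key]
    cmd += ["-i", input_path]
    return cmd


cmd = build_grep_command(
    "pattern { e: X -[dislocated]-> Y }",
    "data/fr_pud-ud-test.conllu",
    keys=["Y.upos", "{ X << Y }"],
)
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Assuming grew is installed and on the PATH, the list could be passed to subprocess.run(cmd, capture_output=True, text=True, check=True) and the captured standard output parsed with json.loads.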

Examples

These examples use the same files as the examples without clustering above.

With -key, we can cluster the results according to the upos of the node Y (the dependent).

grew grep -request "pattern { e: X -[dislocated]-> Y }" -key Y.upos -i data/fr_pud-ud-test.conllu
{
  "VERB": [
    {
      "sent_id": "n01121051",
      "matching": {
        "nodes": { "Y": "11", "X": "2" },
        "edges": {
          "e": { "source": "2", "label": "dislocated", "target": "11" }
        }
      }
    }
  ],
  "PRON": [
    {
      "sent_id": "n01001011",
      "matching": {
        "nodes": { "Y": "20", "X": "29" },
        "edges": {
          "e": { "source": "29", "label": "dislocated", "target": "20" }
        }
      }
    }
  ],
  "ADJ": [
    {
      "sent_id": "n01086031",
      "matching": {
        "nodes": { "Y": "5", "X": "1" },
        "edges": {
          "e": { "source": "1", "label": "dislocated", "target": "5" }
        }
      }
    }
  ]
}

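If the -key argument is surrounded by curly braces, it is interpreted as a whether clustering: the occurrences are split into "Yes" and "No" according to whether they satisfy the given constraint. In the next example, we cluster the results according to whether the relation is left-headed (X precedes Y).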

grew grep -request "pattern { e: X -[dislocated]-> Y }" -key "{X << Y}" -i data/fr_pud-ud-test.conllu
{
  "Yes": [
    {
      "sent_id": "n01121051",
      "matching": {
        "nodes": { "Y": "11", "X": "2" },
        "edges": {
          "e": { "source": "2", "label": "dislocated", "target": "11" }
        }
      }
    },
    {
      "sent_id": "n01086031",
      "matching": {
        "nodes": { "Y": "5", "X": "1" },
        "edges": {
          "e": { "source": "1", "label": "dislocated", "target": "5" }
        }
      }
    }
  ],
  "No": [
    {
      "sent_id": "n01001011",
      "matching": {
        "nodes": { "Y": "20", "X": "29" },
        "edges": {
          "e": { "source": "29", "label": "dislocated", "target": "20" }
        }
      }
    }
  ]
}

Finally, several clusterings can be applied one after the other. For example:

grew grep -request "pattern { e: X -[dislocated]-> Y }" -key Y.upos -key "{ X << Y }" -i data/fr_pud-ud-test.conllu
{
  "VERB": {
    "Yes": [
      {
        "sent_id": "n01121051",
        "matching": {
          "nodes": { "Y": "11", "X": "2" },
          "edges": {
            "e": { "source": "2", "label": "dislocated", "target": "11" }
          }
        }
      }
    ]
  },
  "PRON": {
    "No": [
      {
        "sent_id": "n01001011",
        "matching": {
          "nodes": { "Y": "20", "X": "29" },
          "edges": {
            "e": { "source": "29", "label": "dislocated", "target": "20" }
          }
        }
      }
    ]
  },
  "ADJ": {
    "Yes": [
      {
        "sent_id": "n01086031",
        "matching": {
          "nodes": { "Y": "5", "X": "1" },
          "edges": {
            "e": { "source": "1", "label": "dislocated", "target": "5" }
          }
        }
      }
    ]
  }
}
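Since each additional -key adds one level of nesting, a script that consumes clustered output has to walk dictionaries of arbitrary depth until it reaches the lists of occurrences. The recursive Python sketch below shows one way to do this; it is demonstrated on an abbreviated copy of the single-key "{X << Y}" output above (only sent_id kept), and cluster_sizes is a hypothetical helper.

```python
# Abbreviated copy of the output obtained with -key "{X << Y}" above.
CLUSTERED = {
    "Yes": [{"sent_id": "n01121051"}, {"sent_id": "n01086031"}],
    "No": [{"sent_id": "n01001011"}],
}


def cluster_sizes(clustered, path=()):
    """Flatten nested clusters into {tuple of keys: number of occurrences}."""
    if isinstance(clustered, list):
        # A list is a leaf: it holds the occurrences of one cluster.
        return {path: len(clustered)}
    sizes = {}
    for key, sub_cluster in clustered.items():
        sizes.update(cluster_sizes(sub_cluster, path + (key,)))
    return sizes


for keys, size in cluster_sizes(CLUSTERED).items():
    print(" / ".join(keys), size)
```

With two -key arguments, each tuple of keys would have two elements, one per clustering level.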

Remarks:


Count

Preliminaries: same as for the grep section above.

This mode computes corpus statistics based on Grew-match style requests.

The input data are:

By default, it returns JSON data made of nested dictionaries that give, for each corpus and each request, the number of occurrences, clustered according to the clustering items.

If the output dimension is 2, the statistics can be printed as a TSV table. This is the case for:

The optional -config parameter (see the Parameters section below) can also be used.

Example with Multi mode, several requests and no clustering

Each request is described in a separate file.

Here, we use the two following 1-line files:

The command

grew count -request AN.req -request NA.req -i en_fr_zh.json

outputs the JSON data:

{
  "UD_French-PUD": { "NA.req": 935, "AN.req": 422 },
  "UD_English-PUD": { "NA.req": 9, "AN.req": 1117 },
  "UD_Chinese-PUD": { "NA.req": 0, "AN.req": 363 }
}

With the -tsv option:

grew count -request AN.req -request NA.req -i en_fr_zh.json -tsv
Corpus	AN	NA
UD_English-PUD	1117	9
UD_French-PUD	422	935
UD_Chinese-PUD	363	0

which corresponds to the table:

Corpus AN NA
UD_English-PUD 1117 9
UD_French-PUD 422 935
UD_Chinese-PUD 363 0
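The relation between the two output formats can be sketched in Python: the nested JSON dictionary is two-dimensional (corpus, then request), so each corpus becomes one table row. This is an illustration only, not how Grew produces its TSV; the helper name counts_to_rows is hypothetical.

```python
import json

# JSON output of the grew count example above.
COUNT_OUTPUT = """
{
  "UD_French-PUD": { "NA.req": 935, "AN.req": 422 },
  "UD_English-PUD": { "NA.req": 9, "AN.req": 1117 },
  "UD_Chinese-PUD": { "NA.req": 0, "AN.req": 363 }
}
"""


def counts_to_rows(counts, requests):
    """Turn {corpus: {request: count}} into table rows, header first."""
    rows = [["Corpus"] + list(requests)]
    for corpus_id, by_request in counts.items():
        rows.append([corpus_id] + [str(by_request[r]) for r in requests])
    return rows


counts = json.loads(COUNT_OUTPUT)
for row in counts_to_rows(counts, ["AN.req", "NA.req"]):
    print("\t".join(row))
```

Unlike the TSV printed by Grew in the example, this sketch keeps the JSON keys (AN.req, NA.req) as column headers and the JSON order of the corpora.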

We can then observe that in the annotations of the 3 corpora in use:

Example with Multi mode, one request and a key clustering of the output

Using the same data as in the previous example, the following command:

grew count -request AN.req -key N.Number -i en_fr_zh.json -tsv

produces the TSV file:

Corpus	__undefined__	Plur	Ptan	Sing
UD_English-PUD	0	392	1	724
UD_French-PUD	0	178	0	244
UD_Chinese-PUD	363	0	0	0

Example with Multi mode, one request and a whether clustering of the output

With the command:

grew count -request "pattern { A[upos=ADJ]; N[upos=NOUN]; N -[amod]-> A; }" -key "{ A << N }" -i en_fr_zh.json -tsv

we obtain the TSV file:

Corpus	No	Yes
UD_English-PUD	9	1117
UD_French-PUD	935	422
UD_Chinese-PUD	0	363
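A TSV table like this one can be post-processed with standard tools. As a minimal Python sketch, the function below reads the table above and computes, for each corpus, the share of amod relations in which the adjective precedes the noun (the "Yes" column). The function name prenominal_share is hypothetical.

```python
import csv
import io

# TSV output of the whether-clustering example above.
TSV_OUTPUT = (
    "Corpus\tNo\tYes\n"
    "UD_English-PUD\t9\t1117\n"
    "UD_French-PUD\t935\t422\n"
    "UD_Chinese-PUD\t0\t363\n"
)


def prenominal_share(tsv_text):
    """Map each corpus to Yes / (Yes + No), or None if there is no occurrence."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    shares = {}
    for row in reader:
        yes, no = int(row["Yes"]), int(row["No"])
        total = yes + no
        shares[row["Corpus"]] = yes / total if total else None
    return shares


for corpus_id, share in prenominal_share(TSV_OUTPUT).items():
    print(f"{corpus_id}\t{share:.3f}")
```

The sketch assumes both the "Yes" and "No" columns are present in the table.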

Remarks


Compile

For the Grew-match backend (grew_match_back) or for the grew count command, it is necessary to first compile corpora. For these two usages, sets of corpora are described in a JSON file.

The files describing the corpora are searched for in the CORPUSBANK folder. This folder can be specified either through the environment variable CORPUSBANK or on the command line with the argument -CORPUSBANK <folder>.

Note that the compilation creates a new folder named _build_grew in the main folder of the corresponding corpus. This folder contains a compiled version of the corpus and a few other files used by Grew-match.
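A JSON description of a set of corpora can also be generated programmatically. The Python sketch below rebuilds the content of the en_fr_zh.json example shown in the Grep section, using only the three fields that appear in that example (id, directory, files); other fields may exist but are not covered here, and the helper name corpus_entry is hypothetical.

```python
import json


def corpus_entry(corpus_id, directory, files):
    """Describe one corpus with the fields used in the en_fr_zh.json example."""
    return {"id": corpus_id, "directory": directory, "files": list(files)}


corpora = [
    corpus_entry("UD_English-PUD", "data", ["en_pud-ud-test.conllu"]),
    corpus_entry("UD_French-PUD", "data", ["fr_pud-ud-test.conllu"]),
    corpus_entry("UD_Chinese-PUD", "data", ["zh_pud-ud-test.conllu"]),
]

description = json.dumps(corpora, indent=2)
print(description)
```

Writing the printed text to a file gives an input that can be passed to grew compile with -i, as in the Preliminaries of the Grep section.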


Clean

The command below removes the marshal files produced by the grew compile command for the set of corpora described in a corpus bank (see the Compile section above).

grew clean -i <corpora.json>


Parameters

This section describes some command line arguments that are common to several commands.

-config

The config value can be: ud, sud, sequoia or basic. The default value is ud.

This parameter changes how CoNLL-U and GRS files are interpreted. More specifically, it controls:

This parameter is used in the transform, grep and count modes.