Grew • Command Line Interface

The command used to run Grew is: grew <subcommand> [<args>]

The 5 main subcommands are transform, grep, count, compile and clean.

Other subcommands:

There are two modes of input data: Mono corpus or Multi corpora. See here for more details about input formats.

The table below shows the accepted input modes for the main subcommands.

transform grep count compile clean
Mono ✅ (🆕 in 1.10)
Multi ✅ (🆕 in 1.10)

The table below shows the output modes for the 3 main subcommands (compile and clean do not produce any output).

CLI arg transform grep count
CoNLL-U ✅ (default)
JSON -json ✅ (default) ✅ (default)
CoNLL-X -cupt / -semcor / -columns …
DOT -dot
multi JSON -multi_json
TSV -tsv ✅ (in some cases)

Transform

In this mode, Grew applies a Graph Rewriting System to a graph or a set of graphs.

The full command for this mode:

grew transform [<args>]

All arguments are optional:


Grep

This mode corresponds to the command line version of the Grew-match tool. Clustering is also available in the grep mode.

Without clustering

The command is:

grew grep -request <request_file> -i <input>

where:

The output is given in JSON format.

Example with Mono input

With the following files:

pattern { e: M -[dislocated]-> N }

NB: because the edge from M to N is given the identifier e, information about this edge is included in the output (see below).

The command:

grew grep -request dislocated.req -i fr_pud-ud-test.conllu

produces the following JSON output:

[
  {
    "sent_id": "n01121051",
    "matching": {
      "nodes": { "N": "11", "M": "2" },
      "edges": {
        "e": { "source": "2", "label": "dislocated", "target": "11" }
      }
    }
  },
  {
    "sent_id": "n01086031",
    "matching": {
      "nodes": { "N": "5", "M": "1" },
      "edges": {
        "e": { "source": "1", "label": "dislocated", "target": "5" }
      }
    }
  },
  {
    "sent_id": "n01001011",
    "matching": {
      "nodes": { "N": "20", "M": "29" },
      "edges": {
        "e": { "source": "29", "label": "dislocated", "target": "20" }
      }
    }
  }
]

This means that the request described in the file dislocated.req was found three times in the corpus. Each item gives the sentence identifier and the positions of the nodes and edges matched by the request.
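As a minimal sketch of how this output could be consumed by a script, the Python snippet below parses a shortened copy of the JSON shown above and prints one line per occurrence. The names grep_output and summarise are illustrative assumptions and are not part of Grew; in practice the JSON would be read from the output of grew grep rather than from an embedded string.

```python
import json

# Shortened copy of the JSON output shown above (first two occurrences only).
grep_output = """
[
  {
    "sent_id": "n01121051",
    "matching": {
      "nodes": { "N": "11", "M": "2" },
      "edges": {
        "e": { "source": "2", "label": "dislocated", "target": "11" }
      }
    }
  },
  {
    "sent_id": "n01086031",
    "matching": {
      "nodes": { "N": "5", "M": "1" },
      "edges": {
        "e": { "source": "1", "label": "dislocated", "target": "5" }
      }
    }
  }
]
"""


def summarise(occurrences):
    """Return one (sent_id, source, label, target) tuple per matched edge 'e'."""
    rows = []
    for occ in occurrences:
        # 'e' is the edge identifier declared in the request file.
        edge = occ["matching"]["edges"]["e"]
        rows.append((occ["sent_id"], edge["source"], edge["label"], edge["target"]))
    return rows


rows = summarise(json.loads(grep_output))
for sent_id, source, label, target in rows:
    print(f"{sent_id}: {source} -[{label}]-> {target}")
```

The edge information is only available because the request names the edge e; a request without an edge identifier would need a different accessor.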

Note that two other options exist:

Example with Multi input

With the Multi mode data described in the example file en_fr_zh.json (which must be compiled with grew compile -i en_fr_zh.json):

{ "corpora": [
  { "id": "UD_English-PUD",
    "directory": "_build",
    "files": ["en_pud-ud-test.conllu"]
  },
  { "id": "UD_French-PUD",
    "directory": "_build",
    "files": ["fr_pud-ud-test.conllu"]
  },
  { "id": "UD_Chinese-PUD",
    "directory": "_build",
    "files": ["zh_pud-ud-test.conllu"]
  } ]
}

The command:

grew grep -request dislocated.req -i en_fr_zh.json

produces the following JSON output:

{
  "UD_French-PUD": [
    {
      "sent_id": "n01121051",
      "matching": {
        "nodes": { "N": "11", "M": "2" },
        "edges": {
          "e": { "source": "2", "label": "dislocated", "target": "11" }
        }
      }
    },
    {
      "sent_id": "n01086031",
      "matching": {
        "nodes": { "N": "5", "M": "1" },
        "edges": {
          "e": { "source": "1", "label": "dislocated", "target": "5" }
        }
      }
    },
    {
      "sent_id": "n01001011",
      "matching": {
        "nodes": { "N": "20", "M": "29" },
        "edges": {
          "e": { "source": "29", "label": "dislocated", "target": "20" }
        }
      }
    }
  ],
  "UD_English-PUD": [
    {
      "sent_id": "n01029007",
      "matching": {
        "nodes": { "N": "3", "M": "6" },
        "edges": {
          "e": { "source": "6", "label": "dislocated", "target": "3" }
        }
      }
    },
    {
      "sent_id": "n01002058",
      "matching": {
        "nodes": { "N": "4", "M": "17" },
        "edges": {
          "e": { "source": "17", "label": "dislocated", "target": "4" }
        }
      }
    }
  ],
  "UD_Chinese-PUD": [
    {
      "sent_id": "w04010029",
      "matching": {
        "nodes": { "N": "25", "M": "18" },
        "edges": {
          "e": { "source": "18", "label": "dislocated", "target": "25" }
        }
      }
    },
    {
      "sent_id": "n05002017",
      "matching": {
        "nodes": { "N": "4", "M": "1" },
        "edges": {
          "e": { "source": "1", "label": "dislocated", "target": "4" }
        }
      }
    },
    {
      "sent_id": "w01116100",
      "matching": {
        "nodes": { "N": "14", "M": "11" },
        "edges": {
          "e": { "source": "11", "label": "dislocated", "target": "14" }
        }
      }
    },
    {
      "sent_id": "w01107013",
      "matching": {
        "nodes": { "N": "16", "M": "9" },
        "edges": {
          "e": { "source": "9", "label": "dislocated", "target": "16" }
        }
      }
    },
    {
      "sent_id": "n01070017",
      "matching": {
        "nodes": { "N": "8", "M": "6" },
        "edges": {
          "e": { "source": "6", "label": "dislocated", "target": "8" }
        }
      }
    }
  ]
}
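In Multi mode the top-level JSON object maps each corpus identifier to its list of occurrences, so counting matches per corpus is a one-liner. The sketch below assumes that shape and uses an abridged stand-in for the output above (the "matching" payloads are emptied, since only the number of items matters here); the variable names are illustrative.

```python
import json

# Abridged stand-in for the Multi-mode output above: same corpus ids and
# sentence ids, with the "matching" details left empty.
multi_output = json.loads("""
{
  "UD_French-PUD": [
    {"sent_id": "n01121051", "matching": {}},
    {"sent_id": "n01086031", "matching": {}},
    {"sent_id": "n01001011", "matching": {}}
  ],
  "UD_English-PUD": [
    {"sent_id": "n01029007", "matching": {}},
    {"sent_id": "n01002058", "matching": {}}
  ],
  "UD_Chinese-PUD": [
    {"sent_id": "w04010029", "matching": {}},
    {"sent_id": "n05002017", "matching": {}},
    {"sent_id": "w01116100", "matching": {}},
    {"sent_id": "w01107013", "matching": {}},
    {"sent_id": "n01070017", "matching": {}}
  ]
}
""")

# One entry per corpus: the number of occurrences of the request.
counts = {corpus_id: len(items) for corpus_id, items in multi_output.items()}
for corpus_id, n in counts.items():
    print(corpus_id, n)
```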

With clustering

In both Mono and Multi modes, if the command line additionally contains one or more clustering arguments (-key … or -whether …), the set of occurrences is recursively clustered according to the given clustering items.

See the clustering documentation page for details about the different existing clustering items.

Examples

These examples use the same files as the example without clustering above.

With -key, we can cluster the results according to the upos of the node N (the dependent).

grew grep -request dislocated.req -key N.upos -i fr_pud-ud-test.conllu
{
  "PRON": [
    {
      "sent_id": "n01086031",
      "matching": {
        "nodes": { "N": "5", "M": "1" },
        "edges": {
          "e": { "source": "1", "label": "dislocated", "target": "5" }
        }
      }
    }
  ],
  "NOUN": [
    {
      "sent_id": "n01121051",
      "matching": {
        "nodes": { "N": "11", "M": "2" },
        "edges": {
          "e": { "source": "2", "label": "dislocated", "target": "11" }
        }
      }
    },
    {
      "sent_id": "n01001011",
      "matching": {
        "nodes": { "N": "20", "M": "29" },
        "edges": {
          "e": { "source": "29", "label": "dislocated", "target": "20" }
        }
      }
    }
  ]
}

With -whether, we can cluster the results according to whether the relation is left-headed. We observe that in two cases, the governor M precedes N.

grew grep -request dislocated.req -whether "M << N" -i fr_pud-ud-test.conllu
{
  "Yes": [
    {
      "sent_id": "n01121051",
      "matching": {
        "nodes": { "N": "11", "M": "2" },
        "edges": {
          "e": { "source": "2", "label": "dislocated", "target": "11" }
        }
      }
    },
    {
      "sent_id": "n01086031",
      "matching": {
        "nodes": { "N": "5", "M": "1" },
        "edges": {
          "e": { "source": "1", "label": "dislocated", "target": "5" }
        }
      }
    }
  ],
  "No": [
    {
      "sent_id": "n01001011",
      "matching": {
        "nodes": { "N": "20", "M": "29" },
        "edges": {
          "e": { "source": "29", "label": "dislocated", "target": "20" }
        }
      }
    }
  ]
}

Finally, several clusterings can be applied successively. For instance:

grew grep -request dislocated.req -key N.upos -whether "M << N" -i fr_pud-ud-test.conllu
{
  "PRON": {
    "Yes": [
      {
        "sent_id": "n01086031",
        "matching": {
          "nodes": { "N": "5", "M": "1" },
          "edges": {
            "e": { "source": "1", "label": "dislocated", "target": "5" }
          }
        }
      }
    ]
  },
  "NOUN": {
    "Yes": [
      {
        "sent_id": "n01121051",
        "matching": {
          "nodes": { "N": "11", "M": "2" },
          "edges": {
            "e": { "source": "2", "label": "dislocated", "target": "11" }
          }
        }
      }
    ],
    "No": [
      {
        "sent_id": "n01001011",
        "matching": {
          "nodes": { "N": "20", "M": "29" },
          "edges": {
            "e": { "source": "29", "label": "dislocated", "target": "20" }
          }
        }
      }
    ]
  }
}
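Because each clustering item adds one level of nesting, a recursive walk is a natural way to process clustered output. The following sketch, under the assumption that leaves are always lists of occurrences and inner nodes are always dictionaries, replaces each list by its length. The function name count_clusters is hypothetical, and the sample data mirrors the output above with each occurrence reduced to its sent_id.

```python
def count_clusters(node):
    """Replace every list of occurrences by its length, keeping the nesting."""
    if isinstance(node, list):
        return len(node)
    return {key: count_clusters(child) for key, child in node.items()}


# Same shape as the output above; each occurrence is reduced to its sent_id.
clustered = {
    "PRON": {"Yes": [{"sent_id": "n01086031"}]},
    "NOUN": {
        "Yes": [{"sent_id": "n01121051"}],
        "No": [{"sent_id": "n01001011"}],
    },
}

cluster_sizes = count_clusters(clustered)
print(cluster_sizes)
```

The same function works unchanged for zero, one or several clustering items, since it only depends on whether the current node is a list or a dictionary.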

Remarks:


Count

This mode computes corpus statistics based on Grew-match style requests.

The input data are:

By default, it returns JSON made of nested dictionaries giving, for each corpus and each request, the number of occurrences, clustered according to the clustering items.

If the output dimension is 2, the statistics can be printed as a TSV table. This is the case for:

The optional -config parameter (see here) can also be used.

TODO: The set of corpora is described in a JSON file and must be compiled before running grew count.

Example with Multi mode, several requests and no clustering

Each request is described in a separate file. With the two following 1-line files:

and the Multi mode file en_fr_zh.json 🔗

{ "corpora": [
  { "id": "UD_English-PUD",
    "directory": "_build",
    "files": ["en_pud-ud-test.conllu"]
  },
  { "id": "UD_French-PUD",
    "directory": "_build",
    "files": ["fr_pud-ud-test.conllu"]
  },
  { "id": "UD_Chinese-PUD",
    "directory": "_build",
    "files": ["zh_pud-ud-test.conllu"]
  } ]
}

After compiling the corpora: grew compile -i en_fr_zh.json

The command grew count -request ADJ_NOUN_pre.req -request ADJ_NOUN_post.req -i en_fr_zh.json outputs the JSON data:

{
  "UD_French-PUD": { "ADJ_NOUN_pre.req": 423, "ADJ_NOUN_post.req": 935 },
  "UD_English-PUD": { "ADJ_NOUN_pre.req": 1114, "ADJ_NOUN_post.req": 12 },
  "UD_Chinese-PUD": { "ADJ_NOUN_pre.req": 364, "ADJ_NOUN_post.req": 0 }
}

And, with the -tsv option: grew count -request ADJ_NOUN_pre.req -request ADJ_NOUN_post.req -i en_fr_zh.json -tsv

Corpus	ADJ_NOUN_pre	ADJ_NOUN_post
UD_English-PUD	1114	12
UD_French-PUD	423	935
UD_Chinese-PUD	364	0

which corresponds to the table:

Corpus ADJ_NOUN_pre ADJ_NOUN_post
UD_English-PUD 1114 12
UD_French-PUD 423 935
UD_Chinese-PUD 364 0

We can then observe that in the annotations of the 3 corpora in use:

Example with Multi mode, one request and a key clustering of the output

With the same data as in the previous example, the following command:

grew count -request ADJ_NOUN_pre.req -key N.Number -i en_fr_zh.json -tsv

produces the TSV file:

Corpus	__undefined__	Plur	Sing
UD_English-PUD	0	392	722
UD_French-PUD	0	178	245
UD_Chinese-PUD	364	0	0

which corresponds to the table:

Corpus Plur Sing __undefined__
UD_English-PUD 392 722 0
UD_French-PUD 178 245 0
UD_Chinese-PUD 0 0 364

Example with Multi mode, one request and a whether clustering of the output

Using a whether clustering, with the request ADJ_NOUN.req 🔗

pattern { A[upos=ADJ]; N[upos=NOUN]; N -[amod]-> A; }

and the command: grew count -request ADJ_NOUN.req -whether "A << N" -i en_fr_zh.json -tsv

we obtain the TSV file:

Corpus	No	Yes
UD_English-PUD	12	1114
UD_French-PUD	935	423
UD_Chinese-PUD	0	364

which corresponds to the table:

Corpus No Yes
UD_English-PUD 12 1114
UD_French-PUD 935 423
UD_Chinese-PUD 0 364
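The TSV form is convenient for further computation. As an illustration, the sketch below reads the TSV shown above with Python's standard csv module and computes, for each corpus, the share of occurrences where the adjective A precedes the noun N (the Yes column). The variable names are assumptions made for this example; the data is copied from the output above.

```python
import csv
import io

# TSV output copied from the example above (tab-separated).
tsv_output = (
    "Corpus\tNo\tYes\n"
    "UD_English-PUD\t12\t1114\n"
    "UD_French-PUD\t935\t423\n"
    "UD_Chinese-PUD\t0\t364\n"
)

reader = csv.DictReader(io.StringIO(tsv_output), delimiter="\t")
shares = {}
for row in reader:
    yes, no = int(row["Yes"]), int(row["No"])
    # Share of occurrences for which "A << N" holds.
    shares[row["Corpus"]] = yes / (yes + no)

for corpus, share in shares.items():
    print(f"{corpus}: {share:.1%} of the occurrences have A before N")
```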

Remarks


Compile

Corpora must be compiled before they can be used by the Grew-match backend (grew_match_back) or by the grew count command. For both uses, the set of corpora is described in a JSON file.

For compilation, the command is:

grew compile -i <corpora.json>

Note that this produces, for each corpus, a new file with the .marshal extension stored in the corpus directory. The .marshal file is computed only if the corpus has changed since the last compilation.


Clean

The command below removes the .marshal files produced by the grew compile command for the set of corpora described in the JSON file corpora.json.

grew clean -i <corpora.json>
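The compile, count and clean commands are typically chained. The sketch below only builds and prints the three command lines, using the flags shown on this page (-i, -request, -tsv); it does not call Grew. The helper grew_command is a hypothetical name introduced for illustration, and the file names are those of the count example above.

```python
import shlex


def grew_command(subcommand, corpora_json, requests=(), tsv=False):
    """Build the argument list for a grew call (hypothetical helper)."""
    argv = ["grew", subcommand]
    for request in requests:
        argv += ["-request", request]
    argv += ["-i", corpora_json]
    if tsv:
        argv.append("-tsv")
    return argv


workflow = [
    grew_command("compile", "en_fr_zh.json"),
    grew_command(
        "count",
        "en_fr_zh.json",
        requests=["ADJ_NOUN_pre.req", "ADJ_NOUN_post.req"],
        tsv=True,
    ),
    grew_command("clean", "en_fr_zh.json"),
]

for argv in workflow:
    print(shlex.join(argv))
```

To actually run the steps, each argument list could be passed to subprocess.run(argv, check=True), assuming grew is installed and on the PATH.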


Parameters

This section describes a few command line arguments that are shared by several commands.

-config

The config value can be: ud, sud, sequoia or basic. The default value is ud.

This parameter modifies how CoNLL-U and GRS files are interpreted. More precisely, it controls:

This parameter is used in the transform, grep and count modes.