Grew • Command Line Interface
The command to run Grew is: grew <subcommand> [<args>]
Main subcommands are:
- transform: application of a rewriting system to a set of graphs
- grep: search for a pattern in a corpus
- compile: compile a set of corpora
- clean: clean a set of corpora
- count: compute stats of a set of patterns in a set of corpora
- gui (OBSOLETE): run the GTK interface
Other subcommands:
version: Print version numbers of the Grew OCaml library and of the Grew tool
help: Print general help
help <subcommand>: Print help for the given subcommand
Transform
In this mode, Grew applies a Graph Rewriting System to a graph or a set of graphs.
The full command for this mode:
grew transform [<args>]
All arguments are optional:
-grs <grs_file>: the main file which describes the Graph Rewriting System. If no GRS is given, the empty GRS is loaded: strat main {Seq ()}
-i <input_file>: the input data (CoNLL file or gr file). If no input file is given, Grew reads from stdin
-o <output_file>: the name of the output file (CoNLL file). If no output file is given, Grew writes to stdout
-strat <name>: the strategy used in transformation (default value: main)
-safe_commands: flag. It makes the rewriting process fail in case of an ineffective command
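The options above can be assembled programmatically when Grew is called from a script. The following is a minimal Python sketch, not part of Grew itself: the helper names build_transform_argv and run_transform and the file name rules.grs are hypothetical, and only the flags documented above are used. When no -i or -o is passed, the sketch relies on the documented stdin/stdout behaviour.

```python
import subprocess
from typing import List, Optional


def build_transform_argv(
    grs: Optional[str] = None,
    input_file: Optional[str] = None,
    output_file: Optional[str] = None,
    strat: Optional[str] = None,
    safe_commands: bool = False,
) -> List[str]:
    """Build the argument list for `grew transform` (hypothetical helper).

    Every argument is optional, mirroring the documented command line.
    """
    argv = ["grew", "transform"]
    if grs is not None:
        argv += ["-grs", grs]
    if input_file is not None:
        argv += ["-i", input_file]
    if output_file is not None:
        argv += ["-o", output_file]
    if strat is not None:
        argv += ["-strat", strat]
    if safe_commands:
        argv.append("-safe_commands")
    return argv


def run_transform(conll_text: str, grs: str, strat: str = "main") -> str:
    """Pipe CoNLL text through Grew using stdin and stdout.

    Assumes the `grew` executable is installed and on the PATH.
    """
    argv = build_transform_argv(grs=grs, strat=strat)
    result = subprocess.run(
        argv, input=conll_text, capture_output=True, text=True, check=True
    )
    return result.stdout


# Show the command line that would be executed (nothing is run here).
print(" ".join(build_transform_argv(grs="rules.grs", strat="main")))
```

Passing the arguments as a list avoids shell quoting issues with file names that contain spaces.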
Grep
This mode corresponds to the command line version of the Grew-match tool. The command is:
grew grep -pattern <pattern_file> -i <corpus_file>
where:
<pattern_file> is a file which describes a pattern
<corpus_file> is the corpus in which the search is done
The output is given in JSON format.
Example
With the following files:
- The dev part of the corpus UD_French-GSD version 2.6: fr_gsd-ud-dev.conllu
- A pattern file, rouge.pat, with the code below:
pattern { e: M -> N; N [lemma="rouge"] }
The command:
grew grep -pattern rouge.pat -i fr_gsd-ud-dev.conllu
produces the following JSON output:
[
{
"sent_id": "w01133014",
"matching": {
"nodes": { "N": "7", "M": "6" },
"edges": {
"e": { "source": "6", "label": { "1": "amod" }, "target": "7" }
}
}
},
{
"sent_id": "n01098044",
"matching": {
"nodes": { "N": "10", "M": "9" },
"edges": {
"e": { "source": "9", "label": { "1": "amod" }, "target": "10" }
}
}
},
{
"sent_id": "n01050006",
"matching": {
"nodes": { "N": "22", "M": "21" },
"edges": {
"e": { "source": "21", "label": { "1": "amod" }, "target": "22" }
}
}
}
]
This means that the pattern described in the file rouge.pat was found three times in the corpus. Each item gives the sentence identifier and the positions of the nodes and edges matched by the pattern.
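Since the output is plain JSON, it can be post-processed with any JSON library. The sketch below, in Python, reads the output shown above (embedded here as a string for self-containment) and lists one row per match. The keys M, N and e come from the identifiers used in rouge.pat and would differ for another pattern; the function name summarize is hypothetical.

```python
import json
from typing import Dict, List

# The JSON produced by the example command, copied from the output above.
GREP_OUTPUT = """
[
  {"sent_id": "w01133014",
   "matching": {"nodes": {"N": "7", "M": "6"},
                "edges": {"e": {"source": "6", "label": {"1": "amod"}, "target": "7"}}}},
  {"sent_id": "n01098044",
   "matching": {"nodes": {"N": "10", "M": "9"},
                "edges": {"e": {"source": "9", "label": {"1": "amod"}, "target": "10"}}}},
  {"sent_id": "n01050006",
   "matching": {"nodes": {"N": "22", "M": "21"},
                "edges": {"e": {"source": "21", "label": {"1": "amod"}, "target": "22"}}}}
]
"""


def summarize(raw: str) -> List[Dict[str, str]]:
    """Return one flat record per match found in the JSON output."""
    rows = []
    for item in json.loads(raw):
        matching = item["matching"]
        edge = matching["edges"]["e"]  # "e" is the edge name used in rouge.pat
        rows.append(
            {
                "sent_id": item["sent_id"],
                "source": matching["nodes"]["M"],  # M is the source of edge e
                "target": matching["nodes"]["N"],  # N is the node with lemma "rouge"
                "relation": edge["label"]["1"],
            }
        )
    return rows


rows = summarize(GREP_OUTPUT)
print(f"{len(rows)} matches")
for row in rows:
    print(row["sent_id"], row["source"], f"-[{row['relation']}]->", row["target"])
```

In practice the string would be replaced by the standard output of the grew grep command, for instance after redirecting it to a file.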
Note that two other options exist:
-html: produces a new html field in each JSON item with the sentence where words impacted by the pattern are in a special HTML span with class highlight
-dep_dir <directory>: produces a new file in the folder directory with the representation of the sentence with highlighted part (as in the Grew-match tool) and a new field in each JSON item with the filename; the output is in dep format (usable with Dep2pict).
Compile
For the Grew-match server (grew_daemon) or for the command grew count, it is required to first compile corpora.
For these two usages, sets of corpora are described in a JSON file.
For compilation, the command is:
grew compile -i <corpora.json>
Note that this produces, for each corpus, a new file with the marshal extension stored in the corpus directory.
The marshal file is computed only if the corpus has changed since the last compilation.
Clean
The command below removes the marshal files produced by the grew compile command for the set of corpora described in the JSON file corpora.json:
grew clean -i <corpora.json>
Count
This mode computes corpus statistics. Given a set of patterns and a set of corpora, a TSV table is built with the number of occurrences for each pattern in each corpus.
The set of corpora is described in a JSON file and must be compiled before running grew count.
Each pattern is described in a separate file.
Example
With the two following 1-line files:
ADJ_NOUN.pat
pattern { A[upos=ADJ]; N[upos=NOUN]; N -[amod]-> A; A << N }
NOUN_ADJ.pat
pattern { A[upos=ADJ]; N[upos=NOUN]; N -[amod]-> A; N << A }
and the example file en_fr_zh.json
{ "corpora": [
{ "id": "UD_English-PUD",
"directory": "_build",
"files": ["en_pud-ud-test.conllu"]
},
{ "id": "UD_French-PUD",
"directory": "_build",
"files": ["fr_pud-ud-test.conllu"]
},
{ "id": "UD_Chinese-PUD",
"directory": "_build",
"files": ["zh_pud-ud-test.conllu"]
} ]
}
- Compile the corpora:
grew compile -i en_fr_zh.json
- Build stat table:
grew count -patterns "ADJ_NOUN.pat NOUN_ADJ.pat" -i en_fr_zh.json
The output is given as TSV data:
Corpus # sentences ADJ_NOUN NOUN_ADJ
UD_English-PUD 1000 1118 12
UD_French-PUD 1000 423 935
UD_Chinese-PUD 1000 364 0
which corresponds to the table:
Corpus | # sentences | ADJ_NOUN | NOUN_ADJ |
---|---|---|---|
UD_English-PUD | 1000 | 1118 | 12 |
UD_French-PUD | 1000 | 423 | 935 |
UD_Chinese-PUD | 1000 | 364 | 0 |
We can then observe that in the annotations of the 3 corpora in use:
- in English, there is a strong preference for adjective position before the noun (98.9%)
- in French, there is a weak preference for adjective position after the noun (68.9%)
- in Chinese, there is a very strong preference for adjective position before the noun (100%)
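The percentages above are the share of each order among all adjective/noun pairs matched in a corpus, for example 1118 / (1118 + 12) for English. A small Python sketch of this computation follows; it embeds the TSV data from the example as a string, and the function name order_shares is hypothetical.

```python
import csv
import io
from typing import Dict, Tuple

# TSV data copied from the example output (columns are tab-separated).
TSV = (
    "Corpus\t# sentences\tADJ_NOUN\tNOUN_ADJ\n"
    "UD_English-PUD\t1000\t1118\t12\n"
    "UD_French-PUD\t1000\t423\t935\n"
    "UD_Chinese-PUD\t1000\t364\t0\n"
)


def order_shares(tsv_text: str) -> Dict[str, Tuple[float, float]]:
    """Map each corpus to (% adjective before noun, % adjective after noun)."""
    shares = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        before = int(row["ADJ_NOUN"])
        after = int(row["NOUN_ADJ"])
        total = before + after
        if total == 0:
            continue  # no adjective/noun pair matched: no ratio to report
        shares[row["Corpus"]] = (100.0 * before / total, 100.0 * after / total)
    return shares


shares = order_shares(TSV)
for corpus, (before_pct, after_pct) in shares.items():
    print(f"{corpus}: {before_pct:.1f}% before, {after_pct:.1f}% after")
```

Rounded to one decimal, the values agree with the percentages quoted in the list above.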
Remarks
- The TSV table also contains a column with the size of each corpus (in number of sentences). This is useful for cross-corpora analysis and for computing ratios instead of raw numbers.
- Pattern syntax can be learned from the Grew documentation or with the online Grew-match tool, first with the tutorial and then with the snippets given on the right of the text area.
- If a corpus is updated, the compilation step must be run again.
- Some patterns may take a long time to be searched in corpora.
GUI (Obsolete)
The command to run the GTK interface: grew gui [<args>].
It supposes that you have installed the grew_gui opam package (see GUI installation page).
Optional arguments:
-grs <grs_file>: load the given file
-i <input_file>: input data (graph or corpus) loaded in the GUI
-strat <name>: the strategy selected in the interface (default: main)
-main_feat <feat_name_list>: set the list of feature names used as the main feature in graph visualization