Grew-API for the Arborator-Grew tool

This documentation corresponds to the GitHub.

By default, this documentation applies to the master branch. New features available only on the Dev server are marked with the flag [⚠️DEV⚠️].

Some services or features may be marked with [❌DEPRECATED❌]; they will be removed in an upcoming server update.


The Arborator-Grew tool is available on https://arborator.grew.fr.

The grew_server tool is a web server which manages sets of annotated graphs, with multiple annotations allowed on the same sentence. It is built to be used as an API by the Arborator-Grew graph annotation tool.

Below, we suppose that the server is available on some baseURL.

Annotations are stored with the following hierarchy:

We describe below the list of available services to deal with these levels. All services are called with a base name and with POST parameters. Three types of parameter are used: <string>, <int> and <file>.
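As a rough illustration of this calling convention, the sketch below builds (without sending) a POST request for one service. It assumes, which this documentation does not state, that a service URL is baseURL followed by a slash and the service name, and that <string> and <int> parameters are sent form-encoded. The helper name build_request, the example base URL and the project_id parameter name for newProject are hypothetical. <file> parameters would need a multipart body and are not covered here.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(base_url, service, params):
    """Build (but do not send) a POST request for one service.

    Assumption: the service lives at <base_url>/<service> and accepts
    form-encoded parameters; adapt this to the actual server setup.
    """
    url = base_url.rstrip("/") + "/" + service
    body = urlencode({key: str(value) for key, value in params.items()})
    return Request(url, data=body.encode("utf-8"), method="POST")


# Hypothetical base URL and parameter name, used only for illustration.
request = build_request("http://localhost:8080", "newProject", {"project_id": "project_1"})
print(request.get_method(), request.full_url, request.data)
```

Sending the request (for instance with urllib.request.urlopen) and decoding the JSON reply is left out on purpose, since it depends on the server being reachable.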

All services reply with JSON data in one of these three forms:


Projects

newProject

This service is used to initialise a new empty project. An error is returned if a project with the same name already exists.

getProjects

This service returns the list of existing projects and some metadata for each project.

The returned value is a list of dict:

[
  {
    "name": "project_1",
    "number_samples": 23,
    "number_sentences": 45,
    "number_tokens": 574,
    "number_trees": 79,
    "users": [ "Alice", "Bob", "Charlie" ]
  },
  {
    "name": "project_2",
    "number_samples": 2,
    "number_sentences": 4,
    "number_tokens": 54,
    "number_trees": 9,
    "users": [ "Alice", "Bob" ]
  }
]

getUserProjects

The returned value is the same as in the getProjects service but only with projects where the given user_id is in project’s users.
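The filtering described for getUserProjects can be mimicked on the client side from a getProjects reply. The sketch below only illustrates the relation between the two replies; it is not the server's implementation, and the helper name projects_for_user is hypothetical.

```python
def projects_for_user(projects, user_id):
    """Keep only the projects whose "users" list contains user_id."""
    return [project for project in projects if user_id in project.get("users", [])]


# Reply of getProjects, with the values of the example in this section.
projects = [
    {"name": "project_1", "number_samples": 23, "number_sentences": 45,
     "number_tokens": 574, "number_trees": 79, "users": ["Alice", "Bob", "Charlie"]},
    {"name": "project_2", "number_samples": 2, "number_sentences": 4,
     "number_tokens": 54, "number_trees": 9, "users": ["Alice", "Bob"]},
]

names = [project["name"] for project in projects_for_user(projects, "Charlie")]
print(names)
```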

eraseProject

This service is used to remove a project. If the project does not exist, nothing happens.

renameProject

This service renames an existing project. An error is returned if project_id does not exist or if new_project_id already exists.


Samples

All services about samples return an error if the requested project does not exist.

newSamples

This service is used to initialise a list of new empty samples in a given project.

The string sample_ids must be a JSON encoding of a list of strings (like ["sample_1", "sample_2"]). If any of the given sample_ids already exists in the project, an error is reported and the project is unchanged (no new sample is created).
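A list-valued parameter such as sample_ids can be produced with a standard JSON encoder. The following is a minimal sketch; the helper name encode_id_list is hypothetical, and project_id is assumed to be the name of the project parameter of newSamples.

```python
import json


def encode_id_list(ids):
    """Return the JSON string for a list-valued parameter such as sample_ids."""
    ids = list(ids)
    if not all(isinstance(item, str) for item in ids):
        raise TypeError("every identifier must be a string")
    return json.dumps(ids)


params = {
    "project_id": "project_1",  # assumed parameter name
    "sample_ids": encode_id_list(["sample_1", "sample_2"]),
}
print(params["sample_ids"])
```

The same encoding applies to the other list-valued string parameters described in this document, such as sent_ids.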

getSamples

This service returns the list of existing samples in a given project.

[
  {
    "name": "sample",
    "number_sentences": 2,
    "number_tokens": 23,
    "number_trees": 4,
    "tree_by_user": {"charlie": 1, "bob": 2, "alice": 1},
    "tags": {"TODO": 2, "IN PROGRESS": 3, "DONE": 5}
  }
]
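A getSamples reply can be aggregated on the client side, for example to count the trees each user has across a project. This is a small sketch with a hypothetical helper name (trees_per_user); the sample data repeats the example of this section, with the "tags" field left out.

```python
from collections import Counter


def trees_per_user(samples):
    """Sum the "tree_by_user" counts over all samples of a getSamples reply."""
    totals = Counter()
    for sample in samples:
        totals.update(sample.get("tree_by_user", {}))
    return dict(totals)


samples = [
    {"name": "sample", "number_sentences": 2, "number_tokens": 23, "number_trees": 4,
     "tree_by_user": {"charlie": 1, "bob": 2, "alice": 1}},
]
totals = trees_per_user(samples)
print(totals)
```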

eraseSamples

This service is used to remove a list of samples. For a sample that does not exist, nothing happens. The string sample_ids must be a JSON encoding of a list of strings (like ["sample_1", "sample_2"]).

NB: Unlike in other services, an empty list in sample_ids is not interpreted as all samples: an empty list will not erase any sample.

renameSample

An error is returned either if sample_id does not exist or if new_sample_id already exists in project_id.


Sentences

eraseSentence


Graphs

eraseGraphs

This service is used to remove a list of graphs, in a given sample_id and for a given user_id. The string sent_ids must be a JSON encoding of a list of strings (like ["sent_1", "sent_2"]). If sent_ids is the empty list, all graphs for the given user in the sample are erased.


Other get services

getConll

getUsers

getSentIds


Save annotations

saveConll

insertConll

This service inserts the data from conll_file in the sample sample_id. Sentences that do not already exist are inserted right after the sentence pivot_sent_id. If no sentence pivot_sent_id exists, new sentences are inserted at the beginning of sample_id.

NB: This service can be used for sentence splitting. If a sample contains 3 sentences with sent_ids s1, s2 and s3, the splitting of s2 into s2a and s2b can be done with two operations:

  1. insertConll with conll_file containing new data for s2a, s2b and pivot_sent_id = s2
  2. eraseSentence with sent_id = s2
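The two operations above can be sketched as an ordered list of service calls. Nothing is sent to a server here; the helper name split_sentence_calls, the file name, and the project_id and sample_id parameter names of both services are assumptions made for the illustration.

```python
def split_sentence_calls(project_id, sample_id, old_sent_id, conll_path):
    """Return the two service calls, in order, as (service, parameters) pairs."""
    return [
        ("insertConll", {
            "project_id": project_id,      # assumed parameter name
            "sample_id": sample_id,
            "pivot_sent_id": old_sent_id,  # new sentences go right after this one
            "conll_file": conll_path,      # path of the file to upload
        }),
        ("eraseSentence", {
            "project_id": project_id,      # assumed parameter name
            "sample_id": sample_id,        # assumed parameter name
            "sent_id": old_sent_id,
        }),
    ]


# Hypothetical identifiers: s2_split.conllu would hold the data for s2a and s2b.
calls = split_sentence_calls("project_1", "sample_1", "s2", "s2_split.conllu")
for service, parameters in calls:
    print(service, parameters)
```

The order matters: the pivot sentence must still exist when insertConll is called, so the erasure comes second.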

saveGraph

This service saves (updates or creates) each graph described in conll_graphs under user_id name. The argument conll_graphs must be one string with all graphs separated by an empty line (as in usual CoNLL-U files for corpora).
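A value for conll_graphs can be assembled by joining the individual graphs with one empty line. The sketch below uses a hypothetical helper (join_conll_graphs) and two invented one-token graphs.

```python
def join_conll_graphs(graphs):
    """Join CoNLL-U graphs into one string, separated by one empty line."""
    blocks = [graph.strip("\n") for graph in graphs if graph.strip()]
    return "\n\n".join(blocks) + "\n"


# Invented toy graphs, for illustration only.
graph_1 = "# sent_id = s1\n1\tHello\t_\tINTJ\t_\t_\t0\troot\t_\t_\n"
graph_2 = "# sent_id = s2\n1\tBye\t_\tINTJ\t_\t_\t0\troot\t_\t_\n"
conll_graphs = join_conll_graphs([graph_1, graph_2])
print(conll_graphs)
```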


Search with Grew requests

searchRequestInGraphs

The two services below are deprecated and are replaced by similar ones with the sample_ids parameter.

Given a Grew request, a list of users and a project, this service returns a list of occurrences of the request in the project.

See the Generic arguments usage section below for the usage of the user_ids argument.

Each occurrence is described by a dict

{
  'sample_id':…,
  'sent_id':…,
  'conll':…,
  'user_id':…,
  'nodes':…,
  'edges':…
}
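A list of occurrences can be regrouped on the client side, for instance to see which users have a match in each sentence. This is only a sketch: the helper name is hypothetical, the reply is invented, and the "conll", "nodes" and "edges" fields are left out because their content is not needed here.

```python
from collections import defaultdict


def occurrences_by_sentence(occurrences):
    """Map (sample_id, sent_id) to the list of user_ids with a match."""
    grouped = defaultdict(list)
    for occurrence in occurrences:
        key = (occurrence["sample_id"], occurrence["sent_id"])
        grouped[key].append(occurrence["user_id"])
    return dict(grouped)


# Invented reply, reduced to the fields used by the grouping.
occurrences = [
    {"sample_id": "sample_1", "sent_id": "s1", "user_id": "alice"},
    {"sample_id": "sample_1", "sent_id": "s1", "user_id": "bob"},
    {"sample_id": "sample_2", "sent_id": "s7", "user_id": "alice"},
]
grouped = occurrences_by_sentence(occurrences)
print(grouped)
```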

The same service is available with clustering:

[⚠️DEV⚠️]: parameter sample_ids

The searchRequestInGraphs service is available with the additional parameter sample_ids (see the Generic arguments usage section below for the usage of the sample_ids argument):


Relation tables

In order to produce the relations tables (as in Grew-match), the following service can be used:

relationTables

See the Generic arguments usage section below for the usage of the sample_ids and user_ids POST parameters.

The service returns a JSON dictionary of depth 3 where keys are:

and the values are integers indicating the number of occurrences of each triple of keys.

Example

With the following CoNLL:

1	(	_	PUNCT	_	_	3	punct	_	_
2	ouvert	_	ADJ	_	_	0	root	_	_
3	à	_	ADP	_	_	2	mod	_	ExtPos=ADV|Idiom=Yes
4	nouveau	_	ADJ	_	_	3	comp:obj	_	InIdiom=Yes
5	)	_	PUNCT	_	_	3	punct	_	_

The relationTables service returns:

{
  "root": { "_": { "ADJ": 1 } },
  "punct": { "ADP": { "PUNCT": 2 } },
  "mod": { "ADJ": { "ADV": 1 } },
  "comp:obj": { "ADP": { "ADJ": 1 } }
}

Note that the mod relation has ADV as the POS of the dependent, because of the ExtPos feature on the word à.

The Grew request corresponding to the mod line is:

pattern { X -[mod]-> Y; X [upos="ADJ"]; Y [ExtPos="ADV"/upos="ADV"]; }
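The table of this example can be recomputed locally, which makes the counting rule explicit. The sketch below is inferred from the example only and is not the server's implementation: it assumes that the three key levels are the relation, the upos of the governor ("_" for the root) and the POS of the dependent, where an ExtPos feature (looked up in the FEATS and MISC columns) overrides upos for the dependent only.

```python
from collections import defaultdict

CONLL = "\n".join([
    "1\t(\t_\tPUNCT\t_\t_\t3\tpunct\t_\t_",
    "2\touvert\t_\tADJ\t_\t_\t0\troot\t_\t_",
    "3\tà\t_\tADP\t_\t_\t2\tmod\t_\tExtPos=ADV|Idiom=Yes",
    "4\tnouveau\t_\tADJ\t_\t_\t3\tcomp:obj\t_\tInIdiom=Yes",
    "5\t)\t_\tPUNCT\t_\t_\t3\tpunct\t_\t_",
])


def dependent_pos(upos, feats, misc):
    """Return the ExtPos value if one is present, else the upos."""
    for column in (feats, misc):
        if column == "_":
            continue
        for item in column.split("|"):
            key, _, value = item.partition("=")
            if key == "ExtPos":
                return value
    return upos


def relation_table(conll):
    """Count (relation, governor POS, dependent POS) triples in one sentence."""
    rows = [line.split("\t") for line in conll.splitlines()
            if line and not line.startswith("#")]
    upos_by_id = {row[0]: row[3] for row in rows}
    table = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for row in rows:
        head, relation = row[6], row[7]
        governor = upos_by_id.get(head, "_")  # head 0 has no token: "_"
        table[relation][governor][dependent_pos(row[3], row[5], row[9])] += 1
    return {rel: {gov: dict(deps) for gov, deps in govs.items()}
            for rel, govs in table.items()}


print(relation_table(CONLL))
```

On this sentence the function yields the same four entries as the service reply shown above, including ADV for the dependent of mod.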

Applying Grew rules

tryPackage

See the Generic arguments usage section below for the usage of the sample_ids and user_ids arguments.

For user_ids, only the value { "one" : […] } is accepted, to ensure that at most one new graph is returned for each sentence.

The package parameter must be a JSON string encoding a list of rules. For instance:

"rule r1 { pattern { X [upos=VERB] } commands { X.upos = V } }
rule r2 { pattern { e: X -[nsubj]-> Y } commands { del_edge e; add_edge X -[subj]-> Y } }"

See Grew command syntax for doc about the commands part.

The output is the list of new graphs produced by the package applications (note that the same rule may be applied more than once in a given graph). Each item of the list is an object with the following fields:

Below, an example of output after a rewrite with the two rules:

# sent_id = fr-ud-dev_00002
# user_id = ud
# text = Les études durent six ans mais leur contenu diffère donc selon les Facultés.
1	Les	le	DET	_	Definite=Def|Number=Plur|PronType=Art	2	det	_	wordform=les
2	études	étude	NOUN	_	Gender=Fem|Number=Plur|Shared=No	3	SUBJ	_	_
3	durent	durer	V	_	Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin	0	root	_	_
4	six	six	NUM	_	Number=Plur	5	det	_	_
5	ans	an	NOUN	_	Gender=Masc|Number=Plur	3	comp:obj	_	_
6	mais	mais	CCONJ	_	_	9	cc	_	_
7	leur	son	DET	_	Number=Sing|Number[psor]=Plur|Person[psor]=3|PronType=Prs	8	det	_	_
8	contenu	contenu	NOUN	_	Gender=Masc|Number=Sing	9	SUBJ	_	_
9	diffère	différer	V	_	Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin	3	conj	_	_
10	donc	donc	ADV	_	_	9	mod	_	_
11	selon	selon	ADP	_	_	9	mod	_	_
12	les	le	DET	_	Definite=Def|Number=Plur|PronType=Art	13	det	_	_
13	Facultés	faculté	NOUN	_	Gender=Fem|Number=Plur	11	comp:obj	_	SpaceAfter=No|wordform=facultés
14	.	.	PUNCT	_	_	3	punct	_	_

and the output data returned by the service (with CoNLL code skipped):

[
    {
        "sample_id": "single",
        "sent_id": "fr-ud-dev_00002",
        "conll": "...",
        "user_id": "ud",
        "modified_edges": [
            {
                "src": "9",
                "edge": "SUBJ",
                "tar": "8"
            },
            {
                "src": "3",
                "edge": "SUBJ",
                "tar": "2"
            }
        ],
        "modified_nodes": [
            {
                "id": "3",
                "features": [
                    "upos"
                ]
            },
            {
                "id": "9",
                "features": [
                    "upos"
                ]
            }
        ]
    }
]
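The reply can be turned into a readable change log on the client side. The sketch below uses a hypothetical helper (summarise_changes) and the field names of the example output above.

```python
def summarise_changes(results):
    """Return one readable line per modified edge or modified node feature."""
    lines = []
    for item in results:
        prefix = f'{item["sample_id"]}/{item["sent_id"]} ({item["user_id"]})'
        for edge in item.get("modified_edges", []):
            lines.append(f'{prefix}: edge {edge["src"]} -[{edge["edge"]}]-> {edge["tar"]}')
        for node in item.get("modified_nodes", []):
            for feature in node["features"]:
                lines.append(f'{prefix}: node {node["id"]} feature {feature}')
    return lines


# Same content as the example output, with the CoNLL code skipped.
results = [{
    "sample_id": "single",
    "sent_id": "fr-ud-dev_00002",
    "conll": "...",
    "user_id": "ud",
    "modified_edges": [
        {"src": "9", "edge": "SUBJ", "tar": "8"},
        {"src": "3", "edge": "SUBJ", "tar": "2"},
    ],
    "modified_nodes": [
        {"id": "3", "features": ["upos"]},
        {"id": "9", "features": ["upos"]},
    ],
}]
lines = summarise_changes(results)
print("\n".join(lines))
```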

Services for project configuration

getProjectConfig

The service returns JSON data describing the current configuration of the project.

updateProjectConfig

The service updates the current configuration associated with the project.


Export the most recent data in a project

exportProject

See the Generic arguments usage section below for the usage of the sample_ids argument.

The service returns a URL to a file containing the “export” of the project. In the export:


Get the lexicon computed from a treebank

getLexicon

See the Generic arguments usage section below for the usage of the sample_ids and user_ids arguments. The string features must be a JSON encoding of a list of strings (e.g. ["form", "lemma", "upos", "Gender"]).

The service returns JSON data for the lexicon computed from the given corpora.

The output is a list of objects. Each object contains two fields:

If the integer argument prune is set to n, only the subset of ambiguous structures at depth n is reported. For instance, if the features are ["form", "lemma", "upos", "Gender", "Number"], pruning at level 3 keeps only the lexicon entries for which the same triple of values for form, lemma and upos occurs with more than one pair of values for Gender and Number.

Example

With a corpus containing the following sentence:

1	espace	espace	NOUN	_	Gender=Fem|Number=Sing	_	_	_	_
2	espace	espace	NOUN	_	Gender=Fem|Number=Sing	_	_	_	_
3	espace	espace	NOUN	_	Gender=Masc|Number=Sing	_	_	_	_
4	maison	maison	NOUN	_	Gender=Fem|Number=Sing	_	_	_	_
5	maison	maison	NOUN	_	Gender=Fem|Number=Sing	_	_	_	_
6	souris	souris	NOUN	_	Gender=Fem	_	_	_	_
7	souris	souris	NOUN	_	Gender=Fem|Number=Plur	_	_	_	_

The getLexicon service with features ["form", "lemma", "upos", "Gender", "Number"] returns the JSON below (note the second line, where the value associated with Number is null):

[
  { "feats": { "form": "souris", "lemma": "souris", "upos": "NOUN", "Gender": "Fem", "Number": "Plur" }, "freq": 1 },
  { "feats": { "form": "souris", "lemma": "souris", "upos": "NOUN", "Gender": "Fem", "Number": null }, "freq": 1 },
  { "feats": { "form": "espace", "lemma": "espace", "upos": "NOUN", "Gender": "Masc", "Number": "Sing" }, "freq": 1 },
  { "feats": { "form": "espace", "lemma": "espace", "upos": "NOUN", "Gender": "Fem", "Number": "Sing" }, "freq": 2 },
  { "feats": { "form": "maison", "lemma": "maison", "upos": "NOUN", "Gender": "Fem", "Number": "Sing" }, "freq": 2 }
]

and with the additional argument prune with value 3, the line about maison is not returned because the triple (maison, maison, NOUN) is associated with only one line in the previous structure.

[
  { "feats": { "form": "souris", "lemma": "souris", "upos": "NOUN", "Gender": "Fem", "Number": "Plur" }, "freq": 1 },
  { "feats": { "form": "souris", "lemma": "souris", "upos": "NOUN", "Gender": "Fem", "Number": null }, "freq": 1 },
  { "feats": { "form": "espace", "lemma": "espace", "upos": "NOUN", "Gender": "Masc", "Number": "Sing" }, "freq": 1 },
  { "feats": { "form": "espace", "lemma": "espace", "upos": "NOUN", "Gender": "Fem", "Number": "Sing" }, "freq": 2 }
]
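The effect of prune on this example can be reproduced on the client side. The sketch below is one reading of the description, not the server's code: it keeps an entry only when at least one other entry shares its values for the first n features. The helper name prune_lexicon is hypothetical.

```python
from collections import Counter

FEATURES = ["form", "lemma", "upos", "Gender", "Number"]


def prune_lexicon(entries, features, depth):
    """Keep entries whose first `depth` feature values occur in several entries."""
    def key(entry):
        return tuple(entry["feats"][name] for name in features[:depth])

    counts = Counter(key(entry) for entry in entries)
    return [entry for entry in entries if counts[key(entry)] > 1]


def entry(form, gender, number, freq):
    """Build one lexicon entry of the example (lemma equals form, upos is NOUN)."""
    feats = {"form": form, "lemma": form, "upos": "NOUN", "Gender": gender, "Number": number}
    return {"feats": feats, "freq": freq}


lexicon = [
    entry("souris", "Fem", "Plur", 1),
    entry("souris", "Fem", None, 1),
    entry("espace", "Masc", "Sing", 1),
    entry("espace", "Fem", "Sing", 2),
    entry("maison", "Fem", "Sing", 2),
]
pruned = prune_lexicon(lexicon, FEATURES, 3)
print([item["feats"]["form"] for item in pruned])
```

With depth 3 the maison entry is dropped, as in the reply above, because its (form, lemma, upos) triple appears only once.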

Get tagset or features from a treebank

See the Generic arguments usage section below for the usage of the sample_ids argument.

getPOS

returns the list of POS (upos feature) used in the data.

getRelations

returns the list of relations used in the data.

getFeatures

returns the list of feature names used in the data.

In [⚠️DEV⚠️], the service returns two separate lists for FEATS / MISC features (according to the current config).

{
  "FEATS": [
    "Aspect",
    "Number",
    "Number[psor]",
    "Person",
    "Polarity",
    "Tense"
  ],
  "MISC": [
    "AlignBegin",
    "AlignEnd",
    "Gloss"
  ]
}


Generic arguments usage

sample_ids

Several services use a string argument named sample_ids. The string sample_ids must be a JSON encoding of a list of strings (like ["sample_1", "sample_2"]).

⚠️ If the list contains an unused sample_id, no error is returned and the sample_id is ignored.

user_ids

The string user_ids must be a JSON encoding of one of these forms:

This parameter is used for the services:

This fulfils the request #110:

  • See for my trees → { "one" : ["current_user"] } (or { "multi" : ["current_user"] } which has the same meaning)
  • See for my trees or last tree (only one user_id per tree is returned) → { "one" : ["current_user", "__last__"] }
  • See last trees → { "one" : ["__last__"] }
  • See trees from everyone → "all"
  • See trees for users in a given list → { "multi" : ["user_1", "user_2", …] }
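The cases listed above can be encoded with a small helper. This is a minimal sketch: the function name encode_user_ids and its mode argument are hypothetical, and only the three shapes shown in the list ({ "one" : […] }, { "multi" : […] } and "all") are produced.

```python
import json


def encode_user_ids(mode, users=None):
    """Encode a user_ids value; mode is "one", "multi" or "all"."""
    if mode == "all":
        return json.dumps("all")
    if mode not in ("one", "multi"):
        raise ValueError(f"unknown mode: {mode!r}")
    return json.dumps({mode: list(users or [])})


my_trees = encode_user_ids("one", ["current_user"])
mine_or_last = encode_user_ids("one", ["current_user", "__last__"])
everyone = encode_user_ids("all")
print(my_trees, mine_or_last, everyone)
```

Note that "all" is itself JSON-encoded, so the transmitted string includes the double quotes.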