Grew-API for the Arborator-Grew tool

This documentation corresponds to the GitHub.

By default the doc applies to branch master. New features available only on the Dev server are identified by the flag [⚠️DEV⚠️]

Some services or features may be marked with [❌DEPRECATED❌], they will be removed in an upcoming server update.

The Arborator-Grew tool is available on https://arborator.grew.fr.

The grew_server tool is a web server which manages set of annotated graphs with multiple annotations on the same sentence. It is built to be used as an API by the Arborator-Grew graph annotation tool.

Below, we suppose that the server is available on some baseURL.

Annotations are stored with the following hierarchy:

the server manages any number of projects
each project contains any number of samples
each sample contains any number of sentences
each sentence may be annotated by any number of users

We describe below the list of available services to deal with these levels. All services are called with a base name and with POST parameters. Three types of parameter are used: <string>, <int> and <file>.

All services reply with JSON data of one of this three forms:

{ "status": "OK", "data": … } when the request was executed correctly, the content of the data field depends on the service.
{ "status": "WARNING", "messages": …, "data": … } when the request can be partially executed; the messages fields contains a list of messages.
{ "status": "ERROR", "message": "…" } when the request cannot be executed.

Projects

`newProject`

(<string> project_id)

This service is used to initialise a new empty project. An error is returned if a project with the same name already exists.

`getProjects`

()

This service returns the list of existing projects and some metadata for each project.

The returned value is a list of dict:

[
  { 
    "name": "project_1", 
    "number_samples": 23,
    "number_sentences": 45,
    "number_tokens": 574,
    "number_trees": 79,
    "users": [ "Alice", "Bob", "Charlie" ],
  },
  { 
    "name": "project_2"
    "number_samples": 2
    "number_sentences": 4
    "number_tokens": 54
    "number_trees": 9
    "users": [ "Alice", "Bob" ]
  }
]

`getUserProjects`

(<string> user_id)

The returned value is the same as in the getProjects service but only with projects where the given user_id is in project’s users.

`eraseProject`

(<string> project_id)

This service is used to remove a project. If the project does not exist, nothing happens.

`renameProject`

(<string> project_id, <string> new_project_id)

Renaming of an existing project. An error is produced either if project_id does not exists or if new_project_id already exists.

Samples

All services about samples return an error if the requested project does not exist.

`newSamples`

(<string> project_id, <string> sample_ids)

This service is used to initialise a list of new empty samples in a given project.

The string sample_ids must be a JSON encoding of a list of strings (like ["sample_1", "sample_2"]). If one of the given sample_id already exists in the project, an error is reported and the project is unchanged (no new sample is created).

`getSamples`

(<string> project_id)

This service returns the list of existing samples in a given project.

[
  {
    "name": "sample",
    "number_sentences": 2,
    "number_tokens": 23,
    "number_trees": 4,
    "tree_by_user": {"charlie": 1, "bob": 2, "alice": 1},
    "tags": {"TODO": 2, "IN PROGESS": 3, "DONE": 5}}
  }
]

The field tree_by_user was added in February 2023 aa8e97a5.
The field tags was added in September 2024 e1591f5a.

`eraseSamples`

(<string> project_id, <string> sample_ids)

This service is used to remove a list of samples. For sample which does not exist, nothing happens. The string sample_ids must be a JSON encoding of a list of strings (like ["sample_1", "sample_2"]).

NB: Unlike for other services, an empty list in sample_ids in not interpreted as all samples, an empty list will not erase any sample.

`renameSample`

(<string> project_id, <string> sample_id, <string> new_sample_id)

An error is returned either if sample_id does not exist or if new_sample_id already exists in project_id.

Sentences

`eraseSentence`

(<string> project_id, <string> sample_id, <string> sent_id)

Graphs

`eraseGraphs`

(<string> project_id, <string> sample_id, <string> sent_ids, <string> user_id)

This service is used to remove a list of graphs, in a given sample_id and for a given user_id. The string sent_ids must be a JSON encoding of a list of strings (like ["sent_1", "sent_2"]). If sent_ids is the empty list, all graphs for the given user in the sample are erased.

Other `get` services

`getConll`

(<string> project_id, <string> sample_id, <string> sent_id, <string> user_id) returns a conll_string
(<string> project_id, <string> sample_id, <string> sent_id) returns a dict user_id → conll_string
(<string> project_id, <string> sample_id) returns a 2-levels dict sent_id → user_id → conll_string

`getUsers`

(<string> project_id, <string> sample_id, <string> sent_id)
(<string> project_id, <string> sample_id)
(<string> project_id)

`getSentIds`

(<string> project_id, <string> sample_id)
(<string> project_id)

Save annotations

`saveConll`

(<string> project_id, <string> sample_id, <file> conll_file)

`insertConll`

(<string> project_id, <string> sample_id, <file> conll_file, <string> pivot_sent_id)

Insert data from conll_file in the sample_id. Sentences that do not already exists before are inserted right after sentence pivot_sent_id. If no sentence pivot_sent_id exists, new sentences are inserted at the beginning of sample_id.

NB This service can be used for sentence splitting. If a sample containts 3 sentences with sent_ids: s1, s2 and s3; the splitting of s2 in s2a and s2b can be done with two operations:

insertConll with conll_file containing new data for s2a, s2b and pivot_sent_id = s2
eraseSentence with sent_id = s2

`saveGraph`

(<string> project_id, <string> sample_id, <string> user_id, <string> conll_graph)

This service saves (updates or creates) each graph described in conll_graphs under user_id name. The argument conll_graphs must be one string with all graphs separated by an empty line (as in usual CoNLL-U files for corpora).

Search with Grew requests

`searchRequestInGraphs`

The two services below are deprecated and are replaced by similar ones with the sample_ids parameter.

[❌DEPRECATED❌] (<string> project_id, <string> user_ids, <string> request) returns a list of occurrences.

Given a Grew request, a list of users and a project, this service returns a list of occurrences of the request in the project.

See here for the usage of user_ids argument.

Each occurrence is described by a dict

{
  'sample_id':…,
  'sent_id':…,
  'conll':…,
  'user_id':…,
  'nodes':…,
  'edges':…
}

The same service is avalaible with clustering:

[❌DEPRECATED❌] (<string> project_id, <string> user_ids, <string> request, <string> clusters) where clusters is a list of cluster keys, separated by ;. This returns nested dictionaries (the depth being equals to the length of the cluster key list). The set of occurrences of the request in project_id are clustered with the first key of the list; each cluster is further clustered recursively with the remaining keys. For instance: If the length of the cluster keys list is 1, the behaviour is similar the the clustering feature available in Grew-match.

[⚠️DEV⚠️]: parameter `sample_ids`

The searchRequestInGraphsservice is available with the additional parameter sample_ids (See here for the usage of sample_ids argument.):

(<string> project_id, <string> sample_ids, <string> user_ids, <string> request) returns a list of occurrences.
(<string> project_id, <string> sample_ids, <string> user_ids, <string> request, <string> clusters)

Relation tables

In order to produce the relations tables (as in Grew-match), the following service can be used:

`relationTables`

(<string> project_id, <string> sample_ids, <string> user_ids)

See here for the usage of sampe_ids and user_ids POST parameters.

The service returns a JSON dictionary of depth 3 where keys are:

the dependency relations label
the upos of the governor of the relation
the upos of the dependant of the relation (NB: ExtPos is taken into account if present)

and the values are integers indicating the number of occurrences of the each triple of keys.

Example

With the following CoNLL:

1	(	_	PUNCT	_	_	3	punct	_	_
2	ouvert	_	ADJ	_	_	0	root	_	_
3	à	_	ADP	_	_	2	mod	_	ExtPos=ADV|Idiom=Yes
4	nouveau	_	ADJ	_	_	3	comp:obj	_	InIdiom=Yes
5	)	_	PUNCT	_	_	3	punct	_	_

The relationTables service returns:

{
  "root": { "_": { "ADJ": 1 } },
  "punct": { "ADP": { "PUNCT": 2 } },
  "mod": { "ADJ": { "ADV": 1 } },
  "comp:obj": { "ADP": { "ADJ": 1 } }
}

Note the the mod relation has ADV as the POS for the dependant, because of the ExtPos feature on the word à.

The Grew request corresponding to the mod line is:

pattern { X -[mod]-> Y; X [upos="ADJ"]; Y [ExtPos="ADV"/upos="ADV"]; }

Applying Grew rules

`tryPackage`

(<string> project_id, <string> sample_ids, <string> user_ids, <string> package)

See here for the usage of sample_ids and user_ids arguments.

For user_ids, only the value { "one" : […] } is accepted in order to ensure that only at most one new graph can be returned for each sentence.

The package parameter must be a JSON string encoding a list of rules. For instance:

"rule r1 { pattern { X [upos=VERB] } commands { X.upos = V } }
rule r2 { pattern { e: X -[nsubj]-> Y } commands { del_edge e; add_edge X -[subj]-> Y }"

See Grew command syntax for doc about the commands part.

The output is the list of new graphs produced by the package applications (note that the same rule may be applied more than once in a given graph). Each item of the list is an object with the following fields:

conll: the graph obtained after one or several applications of the rules.
sample_id
sent_id
user_id
modified_edges with source id, new label and target_id
modified_nodes with the id of the node and the list of features modified by the rule

Below, an example of output after a rewrite with the two rules:

pattern { X [upos=VERB] } commands { X.upos=V }
pattern { e: X -[nsubj]-> Y } commands { del_edge e; add_edge X -[NSUBJ]-> Y }

# sent_id = fr-ud-dev_00002
# user_id = ud
# text = Les études durent six ans mais leur contenu diffère donc selon les Facultés.
1	Les	le	DET	_	Definite=Def|Number=Plur|PronType=Art	2	det	_	wordform=les
2	études	étude	NOUN	_	Gender=Fem|Number=Plur|Shared=No	3	SUBJ	_	_
3	durent	durer	V	_	Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin	0	root	_	_
4	six	six	NUM	_	Number=Plur	5	det	_	_
5	ans	an	NOUN	_	Gender=Masc|Number=Plur	3	comp:obj	_	_
6	mais	mais	CCONJ	_	_	9	cc	_	_
7	leur	son	DET	_	Number=Sing|Number[psor]=Plur|Person[psor]=3|PronType=Prs	8	det	_	_
8	contenu	contenu	NOUN	_	Gender=Masc|Number=Sing	9	SUBJ	_	_
9	diffère	différer	V	_	Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin	3	conj	_	_
10	donc	donc	ADV	_	_	9	mod	_	_
11	selon	selon	ADP	_	_	9	mod	_	_
12	les	le	DET	_	Definite=Def|Number=Plur|PronType=Art	13	det	_	_
13	Facultés	faculté	NOUN	_	Gender=Fem|Number=Plur	11	comp:obj	_	SpaceAfter=No|wordform=facultés
14	.	.	PUNCT	_	_	3	punct	_	_

and the output data returned by the service (with CoNLL code skipped):

[
    {
        "sample_id": "single",
        "sent_id": "fr-ud-dev_00002",
        "conll": "...",
        "user_id": "ud",
        "modified_edges": [
            {
                "src": "9",
                "edge": "SUBJ",
                "tar": "8"
            },
            {
                "src": "3",
                "edge": "SUBJ",
                "tar": "2"
            }
        ],
        "modified_nodes": [
            {
                "id": "3",
                "features": [
                    "upos"
                ]
            },
            {
                "id": "9",
                "features": [
                    "upos"
                ]
            }
        ]
    }
]

Services for project configuration

`getProjectConfig`

(<string> project_id)

The service returns a JSON data of the current configuration of the project

`updateProjectConfig`

(<string> project_id, <string> config)

The service update the current configuration associated to the project.

Export the most recent data in a project

`exportProject`

(<string> project_id, <string> sample_ids)

See here for the usage of sample_ids argument.

The service returns an URL on a file containing the “export” of the project. In the export:

only graphs in the project with a timestamp numerical metadata are present
if several graphs share the same sent_id, keep only the graph with the highest timestamp

Get the lexicon computed from a treebank

`getLexicon`

(<string> project_id, <string> user_ids, <string> sample_ids, <string> features)
(<string> project_id, <string> user_ids, <string> sample_ids, <string> features, <int> prune)

See here for the usage of sample_ids and user_ids arguments. The string features must be a JSON encoding of a list of strings (ex: ["form", "lemma", "upos", "Gender"]).

The service returns a JSON data of the lexicon computed form the given corpora.

The output is a list of objects. Each object contains two fields:

feats: an object whose keys follow features argument (value are string or null)
freq: an int giving the frequency of the lexical item

If the prune integer argument is set as n, only the subset of unambiguous structures at depth n is reported. For instance, if the keys are ["form", "lemma", "upos", "Gender", "Number"], the pruning at level 3 will keep only lexicon entries where there is more than one couple of value for Gender and Number with the same triple of values for features form, lemma and upos.

Example

With a corpus containing the following sentence:

1	espace	espace	NOUN	_	Gender=Fem|Number=Sing	_	_	_	_
2	espace	espace	NOUN	_	Gender=Fem|Number=Sing	_	_	_	_
3	espace	espace	NOUN	_	Gender=Masc|Number=Sing	_	_	_	_
4	maison	maison	NOUN	_	Gender=Fem|Number=Sing	_	_	_	_
5	maison	maison	NOUN	_	Gender=Fem|Number=Sing	_	_	_	_
6	souris	souris	NOUN	_	Gender=Fem	_	_	_	_
7	souris	souris	NOUN	_	Gender=Fem|Number=Plur	_	_	_	_

The getLexicon with features ["form", "lemma", "upos", "Gender", "Number"] returns the JSON below (note the second line where the value associated with Number is null):

[
  { "feats": { "form": "souris", "lemma": "souris", "upos": "NOUN", "Gender": "Fem", "Number": "Plur" }, "freq": 1 },
  { "feats": { "form": "souris", "lemma": "souris", "upos": "NOUN", "Gender": "Fem", "Number": null }, "freq": 1 },
  { "feats": { "form": "espace", "lemma": "espace", "upos": "NOUN", "Gender": "Masc", "Number": "Sing" }, "freq": 1 },
  { "feats": { "form": "espace", "lemma": "espace", "upos": "NOUN", "Gender": "Fem", "Number": "Sing" }, "freq": 2 },
  { "feats": { "form": "maison", "lemma": "maison", "upos": "NOUN", "Gender": "Fem", "Number": "Sing" }, "freq": 2 }
]

and with the additional argument prune with value 3, the line about maison is not returned because the triple (maison, maison, NOUN) is associated with only one line in the previous structure.

[
  { "feats": { "form": "souris", "lemma": "souris", "upos": "NOUN", "Gender": "Fem", "Number": "Plur" }, "freq": 1 },
  { "feats": { "form": "souris", "lemma": "souris", "upos": "NOUN", "Gender": "Fem", "Number": null }, "freq": 1 },
  { "feats": { "form": "espace", "lemma": "espace", "upos": "NOUN", "Gender": "Masc", "Number": "Sing" }, "freq": 1 },
  { "feats": { "form": "espace", "lemma": "espace", "upos": "NOUN", "Gender": "Fem", "Number": "Sing" }, "freq": 2 },
]

Get tagset or features from a treebank

See here for the usage of sample_ids argument.

`getPOS`

(<string> project_id, <string> sample_ids)

returns the list of POS (upos feature) used in the data.

`getRelations`

(<string> project_id, <string> sample_ids)

returns the list of relations used in the data/

`getFeatures`

(<string> project_id, <string> sample_ids)

returns the list of feature names used in the data.

In [⚠️DEV⚠️], the service returns two separate lists for FEATS / MISC features (according to the current config).

{
  "FEATS": [
    "Aspect",
    "Number",
    "Number[psor]",
    "Person",
    "Polarity",
    "Tense"
  ],
  "MISC": [
    "AlignBegin",
    "AlignEnd",
    "Gloss"
  ]
}

Generic arguments usage

`sample_ids`

Several services use a string argument named sample_ids. The string sample_ids must be a JSON encoding of a list of strings (like ["sample_1", "sample_2"]).

If the sample_ids list is not empty, only sentences from a sample_id in the list are considered.
If the sample_ids list is empty, all sentences are considered (except for the service eraseSamples).

⚠️ If the list contains an unused sample_id, no error is returned and the sample_id is ignored.

`user_ids`

The string user_ids must be a JSON encoding of one of these forms:

The string "all": all users are taken into account for each sentence
The object { "multi" : ["user_1", "user_2", …] }: all users explicitly mentioned in the list are taken into account for each sentence
The object { "one" : ["user_1", "user_2", …] }: for each sentence, only one graph (at most) is returned; the one for the first user of the list for which the graph is defined. In the list, the pseudo-user __last__ can be used. It selects the graph with the most recent timestamp.

This parameter is used for the services:

searchRequestInGraphs
getLexicon
tryPackage: in this case, only the value { "one" : […] } is accepted in order to ensure that only at most one new graph can be returned for each sentence.

This fulfils the request #110:

See for my trees → { "one" : ["current_user"] } (or { "multi" : ["current_user"] } which has the same meaning)

See for my trees or last tree (only one user_id per tree is returned) → { "one" : ["current_user", "__last__"] }

See last trees → { "one" : ["__last__"] }

See trees from everyone → "all"

See trees for users in a given list → { "multi" : ["user_1", "user_2", …] }

Grew

Grew-API for the Arborator-Grew tool

Projects

newProject

getProjects

getUserProjects

eraseProject

renameProject

Samples

newSamples

getSamples

eraseSamples

renameSample

Sentences

eraseSentence

Graphs

eraseGraphs

Other get services

getConll

getUsers

getSentIds

Save annotations

saveConll

insertConll

saveGraph

Search with Grew requests

searchRequestInGraphs

[⚠️DEV⚠️]: parameter sample_ids

Relation tables

relationTables

Example

Applying Grew rules

tryPackage

Services for project configuration

getProjectConfig

updateProjectConfig

Export the most recent data in a project

exportProject

Get the lexicon computed from a treebank

getLexicon

Example

Get tagset or features from a treebank

getPOS

getRelations

getFeatures

Generic arguments usage

sample_ids

user_ids

`newProject`

`getProjects`

`getUserProjects`

`eraseProject`

`renameProject`

`newSamples`

`getSamples`

`eraseSamples`

`renameSample`

`eraseSentence`

`eraseGraphs`

Other `get` services

`getConll`

`getUsers`

`getSentIds`

`saveConll`

`insertConll`

`saveGraph`

`searchRequestInGraphs`

[⚠️DEV⚠️]: parameter `sample_ids`

`relationTables`

`tryPackage`

`getProjectConfig`

`updateProjectConfig`

`exportProject`

`getLexicon`

`getPOS`

`getRelations`

`getFeatures`

`sample_ids`

`user_ids`