JSON format used in Grew
The JSON format described here is intended to be the exchange format between the various graph representations used in different existing projects.
- ⚠️ This format is available in version 1.5 of grew; check your version with grew version and upgrade if needed.
- ⚠️ This format is different from the one used for exchanges with the Python binding.
A graph is described by a JSON object with the following fields:
- meta (optional): a JSON object storing metadata at the graph level;
- nodes (required): a JSON object for graph nodes;
- edges (optional): an array for graph edges;
- order (optional): an array of node identifiers (strings) describing the subset of nodes of the graph that are ordered.
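The field list above can be checked mechanically. The following is a minimal sketch in Python, not part of Grew: the function name check_graph is hypothetical, and it only inspects the top-level fields, not the content of nodes or edges.

```python
import json


def check_graph(graph):
    """Return a list of problems found in the top-level structure (empty if none).

    Hypothetical helper: it only checks the four fields listed above.
    """
    if not isinstance(graph, dict):
        return ["a graph must be a JSON object"]
    problems = []
    # "nodes" is the only required field.
    if not isinstance(graph.get("nodes"), dict):
        problems.append("'nodes' is required and must be an object")
    # The three other fields are optional but typed.
    if "meta" in graph and not isinstance(graph["meta"], dict):
        problems.append("'meta' must be an object")
    if "edges" in graph and not isinstance(graph["edges"], list):
        problems.append("'edges' must be an array")
    if "order" in graph and not isinstance(graph["order"], list):
        problems.append("'order' must be an array")
    return problems


print(check_graph(json.loads('{ "nodes": {} }')))
print(check_graph(json.loads('{ "edges": [] }')))
```

The first call reports no problem; the second one reports the missing nodes field.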
JSON encoding of nodes
Nodes are described by a JSON object where keys are node identifiers and values describe the node content.
The node content can be in one of the two following forms:
- a string
- a JSON object in which all values are strings (in general this describes a feature structure).
The string form is used when the node does not have a complex structure. In this case, the given string is interpreted as a feature structure with only one feature named "label". Hence we have an equivalence between these two lines:
"nodes": { "X": "A" }
"nodes": { "X": { "label" : "A" } }
Nodes in CoNLL files are interpreted as complex nodes, for instance:
3 are be AUX VA Mood=Ind|Number=Plur|Tense=Pres|VerbForm=Fin 4 aux _ _
corresponds to the following JSON node object:
{
"form": "are",
"lemma": "be",
"upos": "AUX",
"xpos": "VA",
"Mood": "Ind",
"Number": "Plur",
"Tense": "Pres",
"VerbForm": "Fin"
}
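As an illustration of this correspondence, here is a rough Python sketch that builds the node object from one CoNLL-U line. It is not the code used by Grew. The function name conll_line_to_node is hypothetical, and the sketch makes several assumptions: columns are tab-separated as in real CoNLL-U files, a value of _ means "absent", and only the FORM, LEMMA, UPOS, XPOS and FEATS columns are used (the wordform and textform features and the MISC column are ignored).

```python
def conll_line_to_node(line):
    """Build a node feature structure from one CoNLL-U token line.

    Hypothetical helper; only columns 2 to 6 are read.
    """
    cols = line.rstrip("\n").split("\t")
    node = {}
    # Columns 2-5: FORM, LEMMA, UPOS, XPOS ("_" is treated as absent).
    for name, value in zip(("form", "lemma", "upos", "xpos"), cols[1:5]):
        if value != "_":
            node[name] = value
    # Column 6: FEATS, a "|"-separated list of Name=Value pairs.
    feats = cols[5]
    if feats != "_":
        for pair in feats.split("|"):
            key, _, value = pair.partition("=")
            node[key] = value
    return node


line = "3\tare\tbe\tAUX\tVA\tMood=Ind|Number=Plur|Tense=Pres|VerbForm=Fin\t4\taux\t_\t_"
print(conll_line_to_node(line))
```

The HEAD and DEPREL columns (4 and aux here) are not part of the node: they are encoded as an edge.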
JSON encoding of an edge
An edge is described by a JSON object with three required fields:
- src: the node identifier of the source of the edge
- label: the edge label description
- tar: the node identifier of the target of the edge
As for nodes, edge labels are described by a feature structure with a shortcut for simple labels. Hence an edge label can be:
- a string
- a JSON object in which all values are strings (this describes a feature structure)
The string case is interpreted as a feature structure with one feature named "1" (to be compatible with complex edges used in UD / SUD encoding, see Graph edges description).
The two following lines are then equivalent:
{ "src": "M", "label": "obj", "tar": "N" }
{ "src": "M", "label": { "1" : "obj" }, "tar": "N" }
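The two shortcuts (a string for a node, a string for an edge label) can be expanded before any further processing, so that the rest of a program only deals with feature structures. A minimal sketch in Python follows; the function names normalize_node and normalize_edge are hypothetical and not part of Grew.

```python
def normalize_node(content):
    """Expand the string shortcut for a node into a feature structure."""
    if isinstance(content, str):
        return {"label": content}
    return dict(content)


def normalize_edge(edge):
    """Expand the string shortcut for an edge label into a feature structure."""
    label = edge["label"]
    if isinstance(label, str):
        label = {"1": label}
    return {"src": edge["src"], "label": dict(label), "tar": edge["tar"]}


short = {"src": "M", "label": "obj", "tar": "N"}
full = {"src": "M", "label": {"1": "obj"}, "tar": "N"}
print(normalize_edge(short) == normalize_edge(full))
print(normalize_node("A") == normalize_node({"label": "A"}))
```

Both comparisons print True, which reflects the equivalences stated for nodes and for edges.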
JSON encoding of metadata
The metadata associated with a graph is a JSON object in which all values are strings.
Nodes ordering
The field order must be a list of strings, each string being a node identifier.
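This constraint is easy to test. The sketch below, in Python, returns the entries of order that are not strings or that are not keys of nodes. The function name invalid_order_entries is hypothetical, and the sketch makes no assumption about duplicates, which the text above does not mention.

```python
def invalid_order_entries(graph):
    """Return the entries of "order" that are not identifiers of known nodes."""
    nodes = graph.get("nodes", {})
    return [
        item
        for item in graph.get("order", [])
        # The isinstance test comes first so that non-string items
        # are never looked up in the nodes object.
        if not isinstance(item, str) or item not in nodes
    ]


graph = {"nodes": {"0": "A", "1": "B"}, "order": ["0", "1", "2", 3]}
print(invalid_order_entries(graph))
```

Here the call reports "2" (unknown node) and 3 (not a string).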
Examples
The empty graph
The empty graph is described by empty_graph.json:
{ "nodes" : {} }
Encoding of a non linguistic graph
Encoding of a CoNLL graph
# sent_id = fr-ud-dev_00327
# text = Interview exclusive !
1 Interview interview NOUN _ Gender=Fem|Number=Sing 0 root _ wordform=interview
2 exclusive exclusif ADJ _ Gender=Fem|Number=Sing 1 amod _ _
3 ! ! PUNCT _ _ 1 punct _ _
{
"meta": {
"sent_id": "fr-ud-dev_00327",
"text": "Interview exclusive !",
"_filename": "fr-ud-dev_00327.conllu"
},
"nodes": {
"0": { "form": "__0__" },
"1": {
"Gender": "Fem",
"Number": "Sing",
"form": "Interview",
"lemma": "interview",
"textform": "Interview",
"upos": "NOUN",
"wordform": "interview"
},
"2": {
"Gender": "Fem",
"Number": "Sing",
"form": "exclusive",
"lemma": "exclusif",
"textform": "exclusive",
"upos": "ADJ",
"wordform": "exclusive"
},
"3": {
"form": "!",
"lemma": "!",
"textform": "!",
"upos": "PUNCT",
"wordform": "!"
}
},
"edges": [
{ "src": "1", "label": "punct", "tar": "3" },
{ "src": "1", "label": "amod", "tar": "2" },
{ "src": "0", "label": "root", "tar": "1" }
],
"order": [ "0", "1", "2", "3" ]
}
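To show how such a graph can be consumed, the Python sketch below lists, for each ordered node, its governor and relation. It is only an illustration: the function name dependency_rows is hypothetical, the GRAPH value is a reduced copy of the example above (only the form feature is kept), and the code assumes that each node has at most one incoming edge, which holds for this example but not for graphs in general.

```python
GRAPH = {
    "nodes": {
        "0": {"form": "__0__"},
        "1": {"form": "Interview"},
        "2": {"form": "exclusive"},
        "3": {"form": "!"},
    },
    "edges": [
        {"src": "1", "label": "punct", "tar": "3"},
        {"src": "1", "label": "amod", "tar": "2"},
        {"src": "0", "label": "root", "tar": "1"},
    ],
    "order": ["0", "1", "2", "3"],
}


def dependency_rows(graph):
    """Return (node_id, form, head_id, relation) for ordered nodes with an incoming edge."""
    incoming = {}
    for edge in graph.get("edges", []):
        label = edge["label"]
        # A string label is the shortcut for {"1": label}.
        relation = label if isinstance(label, str) else label.get("1", "")
        # Assumption: at most one incoming edge per node (later edges overwrite).
        incoming[edge["tar"]] = (edge["src"], relation)
    rows = []
    for node_id in graph.get("order", []):
        if node_id in incoming:
            head, relation = incoming[node_id]
            content = graph["nodes"][node_id]
            # A string node only carries a "label" feature, hence no form.
            form = content.get("form", "") if isinstance(content, dict) else ""
            rows.append((node_id, form, head, relation))
    return rows


for row in dependency_rows(GRAPH):
    print(row)
```

Node 0 does not appear in the result because it has no incoming edge.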
⚠️ The features wordform and textform are set when a CoNLL structure is loaded (see CoNLL-U format).