JSON format used in Grew
The JSON format described here is intended to be the exchange format between the various graph representations used in different existing projects.
- ⚠️ This format is available in version 1.5 of grew; check your version with grew version and upgrade if needed.
- ⚠️ This format is different from the one used for exchanges with the Python binding.
A graph is described by a JSON object with the following fields:
- meta (optional): a JSON object storing metadata at the graph level;
- nodes (required): a JSON object for graph nodes;
- edges (optional): an array for graph edges;
- order (optional): an array of node identifiers (strings) describing the subset of nodes of the graph that are ordered.
JSON encoding of nodes
Nodes are described by a JSON object in which keys are node identifiers and values describe the node content.
The node content can be in one of the two following forms:
- a string
- a JSON object in which all values are strings (in general this describes a feature structure).
The string form is used when the node does not have a complex structure. In this case, the given string is interpreted as a feature structure with only one feature named label. Hence we have an equivalence between these two lines:
"nodes": { "N": "A" }
"nodes": { "N": { "label" : "A" } }
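The equivalence above can be sketched as a small normalization step in Python. This is only an illustration: the function name normalize_node is hypothetical and is not part of Grew or of its Python binding.

```python
def normalize_node(content):
    """Return the feature-structure form of a node content.

    Hypothetical helper, not part of Grew: a string is expanded to a
    feature structure with a single feature named "label"; an object
    whose values are all strings is returned as a copy.
    """
    if isinstance(content, str):
        return {"label": content}
    if isinstance(content, dict) and all(isinstance(v, str) for v in content.values()):
        return dict(content)
    raise ValueError("node content must be a string or an object of strings")


# Both encodings of node "N" normalize to the same feature structure.
short_form = {"N": "A"}
long_form = {"N": {"label": "A"}}
same = normalize_node(short_form["N"]) == normalize_node(long_form["N"])
```

Normalizing once at load time lets the rest of a program handle a single node shape.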
Nodes in CoNLL files are interpreted as complex nodes; for instance:
3 are be AUX VA Mood=Ind|Number=Plur|Tense=Pres|VerbForm=Fin 4 aux _ _
corresponds to the following JSON node object:
{
"form": "are",
"lemma": "be",
"upos": "AUX",
"xpos": "VA",
"Mood": "Ind",
"Number": "Plur",
"Tense": "Pres",
"VerbForm": "Fin"
}
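As a rough sketch of this correspondence, the Python function below builds the node object from the first six columns of a CoNLL-U token line. The name conll_line_to_node is hypothetical, and the choices made here are assumptions rather than Grew's actual loader: columns holding "_" are skipped, and the head, relation and MISC columns, as well as the textform and wordform features, are not handled.

```python
def conll_line_to_node(line):
    """Map one CoNLL-U token line to (node identifier, node object).

    Illustrative only: this is not Grew's loader. Only the columns
    ID, FORM, LEMMA, UPOS, XPOS and FEATS are used.
    """
    # CoNLL-U columns are tab-separated; fall back to any whitespace
    # so that the line can be pasted from a rendered page.
    cols = line.split("\t") if "\t" in line else line.split()
    ident, form, lemma, upos, xpos, feats = cols[:6]

    node = {"form": form}
    for name, value in (("lemma", lemma), ("upos", upos), ("xpos", xpos)):
        if value != "_":  # assumption: unspecified columns are omitted
            node[name] = value
    if feats != "_":
        for pair in feats.split("|"):
            key, _, value = pair.partition("=")
            node[key] = value
    return ident, node


ident, node = conll_line_to_node(
    "3 are be AUX VA Mood=Ind|Number=Plur|Tense=Pres|VerbForm=Fin 4 aux _ _"
)
```

Here ident is the string "3" and node holds the eight features listed in the JSON object above.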
JSON encoding of an edge
An edge is described by a JSON object with the following three required fields:
- src: the node identifier of the source of the edge
- label: the edge label description
- tar: the node identifier of the target of the edge
As for nodes, edge labels are described by a feature structure with a shortcut for simple labels. Hence an edge label can be:
- a string
- a JSON object in which all values are strings (this describes a feature structure)
The string case is interpreted as a feature structure with one feature named 1 (to be compatible with complex edges used in UD / SUD encoding, see Graph edges description).
The following two snippets are equivalent:
"edges": [ { "src": "M", "label": "obj", "tar": "N" } ]
"edges": [ { "src": "M", "label": { "1" : "obj" }, "tar": "N" } ]
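The label shortcut for edges can be expanded in the same spirit. The Python sketch below checks the three required fields and rewrites a string label as a feature structure with the single feature 1; the name normalize_edge is hypothetical and not part of any Grew API.

```python
REQUIRED_EDGE_FIELDS = ("src", "label", "tar")


def normalize_edge(edge):
    """Return a copy of the edge whose label is a feature structure.

    Hypothetical helper, not part of Grew: a string label such as
    "obj" becomes {"1": "obj"}.
    """
    missing = [field for field in REQUIRED_EDGE_FIELDS if field not in edge]
    if missing:
        raise ValueError("edge is missing required field(s): " + ", ".join(missing))
    label = edge["label"]
    if isinstance(label, str):
        label = {"1": label}
    return {"src": edge["src"], "label": label, "tar": edge["tar"]}


# The two encodings of the edge from M to N normalize to the same object.
simple = {"src": "M", "label": "obj", "tar": "N"}
complex_form = {"src": "M", "label": {"1": "obj"}, "tar": "N"}
equivalent = normalize_edge(simple) == normalize_edge(complex_form)
```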
JSON encoding of metadata
The metadata associated with a graph is a JSON object in which all values are strings.
Nodes ordering
The field order must be a list of strings, each string being a node identifier.
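The constraints on meta and order, together with the required nodes field, are simple enough to check mechanically. The following Python sketch returns a list of problems found in a decoded graph; check_graph is a hypothetical name, and treating an order entry that is not a key of nodes as an error is this sketch's reading of "node identifier", not a documented Grew behaviour.

```python
def check_graph(graph):
    """Return a list of problems found in a decoded JSON graph.

    Illustrative checker, not part of Grew. An empty list means that
    no problem was detected by these particular checks.
    """
    problems = []

    nodes = graph.get("nodes")
    if not isinstance(nodes, dict):
        problems.append("'nodes' is required and must be an object")
        nodes = {}

    meta = graph.get("meta", {})
    if not isinstance(meta, dict) or not all(isinstance(v, str) for v in meta.values()):
        problems.append("'meta' must be an object in which all values are strings")

    order = graph.get("order", [])
    if not isinstance(order, list):
        problems.append("'order' must be an array")
    else:
        for ident in order:
            if not isinstance(ident, str):
                problems.append(f"'order' entry {ident!r} is not a string")
            elif ident not in nodes:
                problems.append(f"'order' entry {ident!r} is not a node identifier")

    return problems
```

For instance, check_graph({"nodes": {"A": "x"}, "order": ["B"]}) reports one problem, because "B" is not a key of nodes.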
Examples
The empty graph
The empty graph is described by empty_graph.json:
{ "nodes" : {} }
Encoding of a non linguistic graph
Encoding of a CoNLL graph
# sent_id = fr-ud-dev_00327
# text = Interview exclusive !
1 Interview interview NOUN _ Gender=Fem|Number=Sing 0 root _ wordform=interview
2 exclusive exclusif ADJ _ Gender=Fem|Number=Sing 1 amod _ _
3 ! ! PUNCT _ _ 1 punct _ _
{
"meta": { "sent_id": "fr-ud-dev_00327", "text": "Interview exclusive !" },
"nodes": {
"0": { "form": "__0__" },
"1": {
"Gender": "Fem",
"Number": "Sing",
"form": "Interview",
"lemma": "interview",
"textform": "Interview",
"upos": "NOUN",
"wordform": "interview"
},
"2": {
"Gender": "Fem",
"Number": "Sing",
"form": "exclusive",
"lemma": "exclusif",
"textform": "exclusive",
"upos": "ADJ",
"wordform": "exclusive"
},
"3": {
"form": "!",
"lemma": "!",
"textform": "!",
"upos": "PUNCT",
"wordform": "!"
}
},
"edges": [
{ "src": "1", "label": "punct", "tar": "3" },
{ "src": "1", "label": "amod", "tar": "2" },
{ "src": "0", "label": "root", "tar": "1" }
],
"order": [ "0", "1", "2", "3" ]
}
⚠️ The features wordform and textform are set when a CoNLL structure is loaded (see CoNLL-U format).