JSON format used in Grew

The JSON format described here is intended to be the exchange format between the various graph representations used in different existing projects.

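A graph is described by a JSON object with the following fields:

  1. nodes: the nodes of the graph (the only field present in the empty graph example below)
  2. edges: the list of edges of the graph
  3. meta: the metadata associated with the graph
  4. order: an ordering on some or all of the nodes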

JSON encoding of nodes

Nodes are described by a JSON object where keys are node identifiers and values describe the node content.

The node content can be in one of the two following forms:

  1. a string
  2. a JSON object in which all values are strings (in general this describes a feature structure).

The string form is used when the node does not have a complex structure. In this case, the given string is interpreted as a feature structure with only one feature named label. Hence we have an equivalence between these two lines:

"nodes": { "X": "A" }
"nodes": { "X": { "label" : "A" } }
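
The equivalence above can be made explicit by normalizing every node to its feature-structure form before further processing. The following is a minimal sketch, not part of Grew itself; the function names normalize_node and normalize_nodes are hypothetical.

```python
def normalize_node(content):
    """Return the feature-structure form of a node's content.

    A plain string is read as a feature structure with a single
    feature named "label"; a JSON object of strings is kept as is.
    (Hypothetical helper, not a Grew API.)
    """
    if isinstance(content, str):
        return {"label": content}
    if isinstance(content, dict) and all(
        isinstance(value, str) for value in content.values()
    ):
        return dict(content)
    raise ValueError(f"invalid node content: {content!r}")


def normalize_nodes(nodes):
    """Normalize every entry of a "nodes" object."""
    return {node_id: normalize_node(c) for node_id, c in nodes.items()}


short_form = normalize_nodes({"X": "A"})
long_form = normalize_nodes({"X": {"label": "A"}})
```

With this normalization, short_form and long_form are the same Python dictionary, which mirrors the equivalence of the two lines above.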

Nodes in CoNLL files are interpreted as complex nodes, for instance:

3	are	be	AUX	VA	Mood=Ind|Number=Plur|Tense=Pres|VerbForm=Fin	4	aux	_	_

corresponds to the following JSON node object:

{
  "form": "are",
  "lemma": "be",
  "upos": "AUX",
  "xpos": "VA",
  "Mood": "Ind",
  "Number": "Plur",
  "Tense": "Pres",
  "VerbForm": "Fin"
}
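
The mapping from a CoNLL line to a node object can be sketched as follows. This is an illustration under stated assumptions, not the converter used by Grew: the function name conll_line_to_node is hypothetical, columns whose value is "_" are assumed to be left out (except the form), and the MISC column as well as the wordform and textform features are not handled.

```python
def conll_line_to_node(line):
    """Build a node feature structure from one 10-column CoNLL line.

    Hypothetical helper. Assumptions: "_" means "no value" for lemma,
    upos, xpos and the morphological features; the MISC column is ignored.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}")
    _id, form, lemma, upos, xpos, feats = cols[:6]

    node = {"form": form}
    for name, value in (("lemma", lemma), ("upos", upos), ("xpos", xpos)):
        if value != "_":
            node[name] = value

    # Each morphological feature "Name=Value" becomes its own key.
    if feats != "_":
        for pair in feats.split("|"):
            key, _, value = pair.partition("=")
            node[key] = value
    return node


line = "3\tare\tbe\tAUX\tVA\tMood=Ind|Number=Plur|Tense=Pres|VerbForm=Fin\t4\taux\t_\t_"
node = conll_line_to_node(line)
```

Note that the head and relation columns (4 and aux here) do not appear in the node: they describe an edge, not node content.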

JSON encoding of an edge

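An edge is described by a JSON object with three required fields:

  1. src: the identifier of the source node
  2. label: the edge label
  3. tar: the identifier of the target node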

As for nodes, edge labels are described by a feature structure, with a shortcut for simple labels. Hence an edge label can be:

  1. a string
  2. a JSON object in which all values are strings (this describes a feature structure)

The string case is interpreted as a feature structure with one feature named 1 (to be compatible with complex edges used in UD / SUD encoding, see Graph edges description). The two following lines are then equivalent:

{ "src": "M", "label": "obj", "tar": "N" }
{ "src": "M", "label": { "1" : "obj" }, "tar": "N" }
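
The same kind of normalization applies to edges. The sketch below expands a string label into a feature structure with the single feature named 1; the function name normalize_edge is hypothetical and the code is only an illustration of the rule stated above.

```python
def normalize_edge(edge):
    """Return a copy of the edge whose label is a feature structure.

    A string label "obj" is expanded to {"1": "obj"}.
    (Hypothetical helper, not a Grew API.)
    """
    for field in ("src", "label", "tar"):
        if field not in edge:
            raise ValueError(f"missing required field: {field}")
    label = edge["label"]
    if isinstance(label, str):
        label = {"1": label}
    return {"src": edge["src"], "label": dict(label), "tar": edge["tar"]}


simple = normalize_edge({"src": "M", "label": "obj", "tar": "N"})
complex_ = normalize_edge({"src": "M", "label": {"1": "obj"}, "tar": "N"})
```

After normalization, simple and complex_ compare equal, as expected from the two equivalent lines above.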

JSON encoding of metadata

The metadata associated with a graph is a JSON object in which all values are strings.

Nodes ordering

The field order must be a list of strings, each string being a node identifier.
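
A simple check of this constraint might look like the sketch below. The function name check_order is hypothetical, and the test that each identifier is a key of nodes is an assumption about what "being a node identifier" means here.

```python
def check_order(graph):
    """Return a list of problems found in the "order" field (empty if none).

    Hypothetical helper. Assumption: a node identifier is valid only if
    it is a key of the "nodes" object of the same graph.
    """
    order = graph.get("order", [])
    if not isinstance(order, list):
        return ["order must be a list"]

    nodes = graph.get("nodes", {})
    problems = []
    for item in order:
        if not isinstance(item, str):
            problems.append(f"order item {item!r} is not a string")
        elif item not in nodes:
            problems.append(f"order item {item!r} is not a node identifier")
    return problems


ok = check_order({"nodes": {"A": "A", "B": "B", "C": "C"}, "order": ["A", "B"]})
bad = check_order({"nodes": {"A": "A"}, "order": ["A", 1, "Z"]})
```

Here ok is empty, while bad reports one problem for the non-string item and one for the unknown identifier.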


Examples

The empty graph

The empty graph is described by empty_graph.json:

{ "nodes" : {} }

Encoding of a non-linguistic graph

{
  "nodes": {
    "A": "A",
    "B": "B",
    "C": "C"
  },
  "edges": [
    { "src": "A", "label": "X", "tar": "B"},
    { "src": "B", "label": "Y", "tar": "C"},
    { "src": "C", "label": "Z", "tar": "A"}
  ],
  "order": [ "A", "B" ]
}
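
As an illustration of how this encoding can be consumed, the following sketch loads the graph above with Python's standard json module and builds a table of outgoing edges. The variable names are arbitrary; nothing here is specific to Grew.

```python
import json

GRAPH_JSON = """
{
  "nodes": { "A": "A", "B": "B", "C": "C" },
  "edges": [
    { "src": "A", "label": "X", "tar": "B"},
    { "src": "B", "label": "Y", "tar": "C"},
    { "src": "C", "label": "Z", "tar": "A"}
  ],
  "order": [ "A", "B" ]
}
"""

graph = json.loads(GRAPH_JSON)

# Map each node identifier to its outgoing (label, target) pairs.
successors = {node_id: [] for node_id in graph["nodes"]}
for edge in graph.get("edges", []):
    successors[edge["src"]].append((edge["label"], edge["tar"]))

# Nodes that do not appear in "order" (only C in this example).
unordered = [n for n in graph["nodes"] if n not in graph.get("order", [])]
```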

Encoding of a CoNLL graph

# sent_id = fr-ud-dev_00327
# text = Interview exclusive !
1	Interview	interview	NOUN	_	Gender=Fem|Number=Sing	0	root	_	wordform=interview
2	exclusive	exclusif	ADJ	_	Gender=Fem|Number=Sing	1	amod	_	_
3	!	!	PUNCT	_	_	1	punct	_	_

{
  "meta": {
    "sent_id": "fr-ud-dev_00327",
    "text": "Interview exclusive !",
    "_filename": "fr-ud-dev_00327.conllu"
  },
  "nodes": {
    "0": { "form": "__0__" },
    "1": {
      "Gender": "Fem",
      "Number": "Sing",
      "form": "Interview",
      "lemma": "interview",
      "textform": "Interview",
      "upos": "NOUN",
      "wordform": "interview"
    },
    "2": {
      "Gender": "Fem",
      "Number": "Sing",
      "form": "exclusive",
      "lemma": "exclusif",
      "textform": "exclusive",
      "upos": "ADJ",
      "wordform": "exclusive"
    },
    "3": {
      "form": "!",
      "lemma": "!",
      "textform": "!",
      "upos": "PUNCT",
      "wordform": "!"
    }
  },
  "edges": [
    { "src": "1", "label": "punct", "tar": "3" },
    { "src": "1", "label": "amod", "tar": "2" },
    { "src": "0", "label": "root", "tar": "1" }
  ],
  "order": [ "0", "1", "2", "3" ]
}

⚠️ The features wordform and textform are set when a CoNLL structure is loaded (see CoNLL-U format).
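
As a small usage sketch, the order field and the textform feature can be combined to rebuild a surface string from such a graph. The function name surface_text is hypothetical, the graph below is a trimmed copy of the example above, and joining tokens with a single space is a simplifying assumption that happens to fit this sentence.

```python
def surface_text(graph):
    """Join the textform of the ordered nodes with single spaces.

    Hypothetical helper. Nodes without a textform feature (such as
    node "0" in the example) are skipped.
    """
    nodes = graph["nodes"]
    tokens = []
    for node_id in graph.get("order", []):
        content = nodes[node_id]
        if isinstance(content, dict) and "textform" in content:
            tokens.append(content["textform"])
    return " ".join(tokens)


# Trimmed copy of the CoNLL graph above (only the features used here).
graph = {
    "meta": {"text": "Interview exclusive !"},
    "nodes": {
        "0": {"form": "__0__"},
        "1": {"form": "Interview", "textform": "Interview"},
        "2": {"form": "exclusive", "textform": "exclusive"},
        "3": {"form": "!", "textform": "!"},
    },
    "order": ["0", "1", "2", "3"],
}

text = surface_text(graph)
```

For this graph, text is equal to the text entry of the metadata.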