JSON format used in Grew

The JSON format described here is intended to be the exchange format between the various graph representations used in different existing projects.

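A graph is described by a JSON object with the following fields, each detailed in the sections below:

  1. nodes: a JSON object describing the nodes of the graph
  2. edges: a list of JSON objects, each describing one edge
  3. meta: a JSON object holding the metadata associated with the graph
  4. order: a list of node identifiers

As the empty graph example below shows, a graph may contain only the nodes field.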

JSON encoding of nodes

Nodes are described by a JSON object in which keys are node identifiers and values describe the node content.

The node content can be in one of the two following forms:

  1. a string
  2. a JSON object in which all values are strings (in general this describes a feature structure).

The string form is used when the node does not have a complex structure. In this case, the given string is interpreted as a feature structure with a single feature named label. Hence the node contents "A" and { "label": "A" } are equivalent.
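
The string shortcut can be expanded mechanically. The following is a minimal Python sketch; the helper name normalize_node is hypothetical and is not part of Grew.

```python
def normalize_node(content):
    """Return a node content as a feature structure (a dict of strings).

    normalize_node is an illustrative helper, not a Grew function.
    """
    if isinstance(content, str):
        # String shortcut: a feature structure with one feature named "label".
        return {"label": content}
    if isinstance(content, dict) and all(
        isinstance(value, str) for value in content.values()
    ):
        # Already a feature structure: return a copy.
        return dict(content)
    raise ValueError(f"invalid node content: {content!r}")


# Both forms describe the same node.
print(normalize_node("A") == normalize_node({"label": "A"}))  # True
```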

Nodes in CoNLL files are interpreted as complex nodes. For instance, the line:

3	are	be	AUX	VA	Mood=Ind|Number=Plur|Tense=Pres|VerbForm=Fin	4	aux	_	_

corresponds to the following JSON node object:

{
  "form": "are",
  "lemma": "be",
  "upos": "AUX",
  "xpos": "VA",
  "Mood": "Ind",
  "Number": "Plur",
  "Tense": "Pres",
  "VerbForm": "Fin"
}
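
The mapping from a CoNLL line to a node object can be sketched in Python as follows. This is an illustration under stated assumptions, not the code Grew uses: the function name conll_token_to_node is hypothetical, only the FORM, LEMMA, UPOS, XPOS and FEATS columns are handled, and "_" in LEMMA, UPOS, XPOS or FEATS is assumed to mean that the feature is absent. The MISC column and the textform and wordform features are left out.

```python
# Names of the ten CoNLL-U columns, in file order.
CONLL_COLUMNS = [
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
]


def conll_token_to_node(line):
    """Build a node feature structure from one tab-separated CoNLL-U line.

    Hypothetical helper for illustration only.
    """
    cols = dict(zip(CONLL_COLUMNS, line.rstrip("\n").split("\t")))
    node = {"form": cols["form"]}
    for name in ("lemma", "upos", "xpos"):
        if cols[name] != "_":  # assumption: "_" means unspecified
            node[name] = cols[name]
    if cols["feats"] != "_":
        # FEATS is a "|"-separated list of Name=Value pairs.
        for pair in cols["feats"].split("|"):
            key, value = pair.split("=", 1)
            node[key] = value
    return node


line = "3\tare\tbe\tAUX\tVA\tMood=Ind|Number=Plur|Tense=Pres|VerbForm=Fin\t4\taux\t_\t_"
node = conll_token_to_node(line)
print(node["form"], node["upos"], node["Mood"])  # are AUX Ind
```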

JSON encoding of an edge

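An edge is described by a JSON object with the following three required fields:

  1. src: the identifier of the source node
  2. label: the edge label
  3. tar: the identifier of the target node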

As for nodes, edge labels are described by a feature structure, with a shortcut for simple labels. Hence an edge label can be:

  1. a string
  2. a JSON object in which all values are strings (this describes a feature structure)

The string case is interpreted as a feature structure with one feature named 1 (to be compatible with complex edges used in UD / SUD encoding, see Graph edges description). The two following edges are therefore equivalent:

{ "src": "A", "label": "X", "tar": "B" }
{ "src": "A", "label": { "1": "X" }, "tar": "B" }
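
The label shortcut for edges can be expanded in the same way as for nodes. A minimal Python sketch, where normalize_edge is a hypothetical helper and not a Grew function:

```python
def normalize_edge(edge):
    """Return an edge whose label is always a feature structure.

    Illustrative helper: expects the three fields "src", "label" and "tar".
    """
    label = edge["label"]
    if isinstance(label, str):
        # String shortcut: a feature structure with one feature named "1".
        label = {"1": label}
    return {"src": edge["src"], "label": dict(label), "tar": edge["tar"]}


simple = {"src": "A", "label": "X", "tar": "B"}
complex_form = {"src": "A", "label": {"1": "X"}, "tar": "B"}
print(normalize_edge(simple) == normalize_edge(complex_form))  # True
```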

JSON encoding of metadata

The metadata associated with a graph is a JSON object in which all values are strings.

Nodes ordering

The field order must be a list of strings, each string being a node identifier.
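
This constraint is easy to check once the graph has been parsed. The Python sketch below uses a hypothetical function check_order; treating a missing order field as an empty list is an assumption made for the example.

```python
def check_order(graph):
    """Return a list of problems found in the "order" field of a parsed graph.

    Hypothetical helper; an absent "order" is treated as an empty list.
    """
    nodes = graph.get("nodes", {})
    order = graph.get("order", [])
    if not isinstance(order, list):
        return ["order is not a list"]
    problems = []
    for item in order:
        if not isinstance(item, str):
            problems.append(f"{item!r} is not a string")
        elif item not in nodes:
            problems.append(f"{item!r} is not a node identifier")
    return problems


graph = {"nodes": {"A": "A", "B": "B", "C": "C"}, "order": ["A", "B"]}
print(check_order(graph))  # []
```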


Examples

The empty graph

The empty graph is described by empty_graph.json:

{ "nodes" : {} }

Encoding of a non-linguistic graph

{
  "nodes": {
    "A": "A",
    "B": "B",
    "C": "C"
  },
  "edges": [
    { "src": "A", "label": "X", "tar": "B"},
    { "src": "B", "label": "Y", "tar": "C"},
    { "src": "C", "label": "Z", "tar": "A"}
  ],
  "order": [ "A", "B" ]
}
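
A graph in this format can be read with any JSON parser. As an illustration, the Python sketch below parses the graph above with the standard json module and groups the edges by source node. The variable names are chosen for the example.

```python
import json

GRAPH_TEXT = """
{
  "nodes": { "A": "A", "B": "B", "C": "C" },
  "edges": [
    { "src": "A", "label": "X", "tar": "B"},
    { "src": "B", "label": "Y", "tar": "C"},
    { "src": "C", "label": "Z", "tar": "A"}
  ],
  "order": [ "A", "B" ]
}
"""

graph = json.loads(GRAPH_TEXT)

# Map each node identifier to its outgoing (label, target) pairs.
successors = {node_id: [] for node_id in graph["nodes"]}
for edge in graph.get("edges", []):
    successors[edge["src"]].append((edge["label"], edge["tar"]))

print(successors)
# {'A': [('X', 'B')], 'B': [('Y', 'C')], 'C': [('Z', 'A')]}
```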

Encoding of a CoNLL graph

# sent_id = fr-ud-dev_00327
# text = Interview exclusive !
1	Interview	interview	NOUN	_	Gender=Fem|Number=Sing	0	root	_	wordform=interview
2	exclusive	exclusif	ADJ	_	Gender=Fem|Number=Sing	1	amod	_	_
3	!	!	PUNCT	_	_	1	punct	_	_

The CoNLL-U sentence above is encoded as the following JSON object:

{
  "meta": { "sent_id": "fr-ud-dev_00327", "text": "Interview exclusive !" },
  "nodes": {
    "0": { "form": "__0__" },
    "1": {
      "Gender": "Fem",
      "Number": "Sing",
      "form": "Interview",
      "lemma": "interview",
      "textform": "Interview",
      "upos": "NOUN",
      "wordform": "interview"
    },
    "2": {
      "Gender": "Fem",
      "Number": "Sing",
      "form": "exclusive",
      "lemma": "exclusif",
      "textform": "exclusive",
      "upos": "ADJ",
      "wordform": "exclusive"
    },
    "3": {
      "form": "!",
      "lemma": "!",
      "textform": "!",
      "upos": "PUNCT",
      "wordform": "!"
    }
  },
  "edges": [
    { "src": "1", "label": "punct", "tar": "3" },
    { "src": "1", "label": "amod", "tar": "2" },
    { "src": "0", "label": "root", "tar": "1" }
  ],
  "order": [ "0", "1", "2", "3" ]
}

⚠️ The features wordform and textform are set when a CoNLL structure is loaded (see CoNLL-U format).