grewpy Tutorial

grewpy library: Graph module


First, we import the Graph module from grewpy.

NB: The port number differs at each execution. If you do not get the message "connected to port: …", see here.

from grewpy import Graph

connected to port: 55453


Build a graph

A graph can be built from its JSON encoding (see here for more information on this format).

g1_str = """
{
"nodes": {
"A": "A",
"B": "B",
"C": "C"
},
"edges": [
{ "src": "A", "label": "X", "tar": "B"},
{ "src": "A", "label": "XX", "tar": "B"},
{ "src": "B", "label": "Y", "tar": "C"},
{ "src": "C", "label": "Z", "tar": "A"}
],
"order": [ "A", "B" ]
}
"""
g1 = Graph(g1_str)
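
The JSON string does not have to be written by hand; it can also be generated from a Python dictionary. The sketch below uses only the standard json module. The names g1_dict and g1_json are illustrative, and the final call to Graph is left as a comment because it needs a working grewpy installation.

```python
import json

# Same graph as g1 above, described as a plain Python dictionary.
g1_dict = {
    "nodes": {"A": "A", "B": "B", "C": "C"},
    "edges": [
        {"src": "A", "label": "X", "tar": "B"},
        {"src": "A", "label": "XX", "tar": "B"},
        {"src": "B", "label": "Y", "tar": "C"},
        {"src": "C", "label": "Z", "tar": "A"},
    ],
    "order": ["A", "B"],
}

# Serialize to a JSON string with the same content as g1_str.
g1_json = json.dumps(g1_dict)

# With grewpy available, this string can be passed to the constructor
# in the same way as g1_str:
# g1_bis = Graph(g1_json)
```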


A graph can also be built from CoNLL data.

g2_conll = """# sent_id = en_partut-ud-202
# text = The work is done.
1	The	the	DET	RD	Definite=Def|PronType=Art	2	det	_	_
2	work	work	NOUN	S	Number=Sing	4	nsubj:pass	_	_
3	is	be	AUX	VA	Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin	4	aux:pass	_	_
4	done	do	VERB	V	Tense=Past|VerbForm=Part	0	root	_	SpaceAfter=No
5	.	.	PUNCT	FS	_	4	punct	_	_

"""
g2 = Graph(g2_conll)


Functions on graphs

The length (len) of a graph is its number of nodes. Note that when a graph is built from CoNLL data, an anchor node is added at position 0, which is why len(g2) is 6 and not 5.

len(g1), len(g2)

(3, 6)
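
The value 6 can be checked against the CoNLL data itself: the sentence has 5 token lines, and the anchor node accounts for the extra one. The sketch below does this count in plain Python, without calling grewpy. The helper count_token_lines is hypothetical, and it assumes that every token line is a non-empty line that does not start with #.

```python
def count_token_lines(conll: str) -> int:
    """Count token lines: non-empty lines that are not '#' comments."""
    return sum(
        1
        for line in conll.splitlines()
        if line.strip() and not line.startswith("#")
    )

# The sentence used for g2, with explicit tab separators.
conll = "\n".join([
    "# sent_id = en_partut-ud-202",
    "# text = The work is done.",
    "1\tThe\tthe\tDET\tRD\tDefinite=Def|PronType=Art\t2\tdet\t_\t_",
    "2\twork\twork\tNOUN\tS\tNumber=Sing\t4\tnsubj:pass\t_\t_",
    "3\tis\tbe\tAUX\tVA\tMood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin\t4\taux:pass\t_\t_",
    "4\tdone\tdo\tVERB\tV\tTense=Past|VerbForm=Part\t0\troot\t_\tSpaceAfter=No",
    "5\t.\t.\tPUNCT\tFS\t_\t4\tpunct\t_\t_",
    "",
])

n_tokens = count_token_lines(conll)
# Number of tokens, then number of tokens plus the anchor node.
print(n_tokens, n_tokens + 1)  # prints: 5 6
```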

print(g2.to_conll())

# sent_id = en_partut-ud-202
# text = The work is done.
1	The	the	DET	RD	Definite=Def|PronType=Art	2	det	_	_
2	work	work	NOUN	S	Number=Sing	4	nsubj:pass	_	_
3	is	be	AUX	VA	Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin	4	aux:pass	_	_
4	done	do	VERB	V	Tense=Past|VerbForm=Part	0	root	_	SpaceAfter=No
5	.	.	PUNCT	FS	_	4	punct	_	_

print(g2.to_dot())

digraph G {
node [shape=box];
N_5 [label=<<TABLE BORDER="0" CELLBORDER="0" CELLSPACING="0">
<TR><TD COLSPAN="3"><B>.</B></TD></TR>
<TR><TD ALIGN="right">upos</TD><TD>=</TD><TD ALIGN="left">PUNCT</TD></TR>
<TR><TD ALIGN="right">lemma</TD><TD>=</TD><TD ALIGN="left">.</TD></TR>
<TR><TD ALIGN="right">xpos</TD><TD>=</TD><TD ALIGN="left">FS</TD></TR>
<TR><TD ALIGN="right">textform</TD><TD>=</TD><TD ALIGN="left">.</TD></TR>
<TR><TD ALIGN="right">wordform</TD><TD>=</TD><TD ALIGN="left">.</TD></TR>
</TABLE>
>]
N_4 [label=<<TABLE BORDER="0" CELLBORDER="0" CELLSPACING="0">
<TR><TD COLSPAN="3"><B>done</B></TD></TR>
<TR><TD ALIGN="right">upos</TD><TD>=</TD><TD ALIGN="left">VERB</TD></TR>
<TR><TD ALIGN="right">lemma</TD><TD>=</TD><TD ALIGN="left">do</TD></TR>
<TR><TD ALIGN="right">xpos</TD><TD>=</TD><TD ALIGN="left">V</TD></TR>
<TR><TD ALIGN="right">SpaceAfter</TD><TD>=</TD><TD ALIGN="left">No</TD></TR>
<TR><TD ALIGN="right">Tense</TD><TD>=</TD><TD ALIGN="left">Past</TD></TR>
<TR><TD ALIGN="right">VerbForm</TD><TD>=</TD><TD ALIGN="left">Part</TD></TR>
<TR><TD ALIGN="right">textform</TD><TD>=</TD><TD ALIGN="left">done</TD></TR>
<TR><TD ALIGN="right">wordform</TD><TD>=</TD><TD ALIGN="left">done</TD></TR>
</TABLE>
>]
N_3 [label=<<TABLE BORDER="0" CELLBORDER="0" CELLSPACING="0">
<TR><TD COLSPAN="3"><B>is</B></TD></TR>
<TR><TD ALIGN="right">upos</TD><TD>=</TD><TD ALIGN="left">AUX</TD></TR>
<TR><TD ALIGN="right">lemma</TD><TD>=</TD><TD ALIGN="left">be</TD></TR>
<TR><TD ALIGN="right">xpos</TD><TD>=</TD><TD ALIGN="left">VA</TD></TR>
<TR><TD ALIGN="right">Mood</TD><TD>=</TD><TD ALIGN="left">Ind</TD></TR>
<TR><TD ALIGN="right">Number</TD><TD>=</TD><TD ALIGN="left">Sing</TD></TR>
<TR><TD ALIGN="right">Person</TD><TD>=</TD><TD ALIGN="left">3</TD></TR>
<TR><TD ALIGN="right">Tense</TD><TD>=</TD><TD ALIGN="left">Pres</TD></TR>
<TR><TD ALIGN="right">VerbForm</TD><TD>=</TD><TD ALIGN="left">Fin</TD></TR>
<TR><TD ALIGN="right">textform</TD><TD>=</TD><TD ALIGN="left">is</TD></TR>
<TR><TD ALIGN="right">wordform</TD><TD>=</TD><TD ALIGN="left">is</TD></TR>
</TABLE>
>]
N_2 [label=<<TABLE BORDER="0" CELLBORDER="0" CELLSPACING="0">
<TR><TD COLSPAN="3"><B>work</B></TD></TR>
<TR><TD ALIGN="right">upos</TD><TD>=</TD><TD ALIGN="left">NOUN</TD></TR>
<TR><TD ALIGN="right">lemma</TD><TD>=</TD><TD ALIGN="left">work</TD></TR>
<TR><TD ALIGN="right">xpos</TD><TD>=</TD><TD ALIGN="left">S</TD></TR>
<TR><TD ALIGN="right">Number</TD><TD>=</TD><TD ALIGN="left">Sing</TD></TR>
<TR><TD ALIGN="right">textform</TD><TD>=</TD><TD ALIGN="left">work</TD></TR>
<TR><TD ALIGN="right">wordform</TD><TD>=</TD><TD ALIGN="left">work</TD></TR>
</TABLE>
>]
N_1 [label=<<TABLE BORDER="0" CELLBORDER="0" CELLSPACING="0">
<TR><TD COLSPAN="3"><B>The</B></TD></TR>
<TR><TD ALIGN="right">upos</TD><TD>=</TD><TD ALIGN="left">DET</TD></TR>
<TR><TD ALIGN="right">lemma</TD><TD>=</TD><TD ALIGN="left">the</TD></TR>
<TR><TD ALIGN="right">xpos</TD><TD>=</TD><TD ALIGN="left">RD</TD></TR>
<TR><TD ALIGN="right">Definite</TD><TD>=</TD><TD ALIGN="left">Def</TD></TR>
<TR><TD ALIGN="right">PronType</TD><TD>=</TD><TD ALIGN="left">Art</TD></TR>
<TR><TD ALIGN="right">textform</TD><TD>=</TD><TD ALIGN="left">The</TD></TR>
<TR><TD ALIGN="right">wordform</TD><TD>=</TD><TD ALIGN="left">The</TD></TR>
</TABLE>
>]
N_0 [label=<<TABLE BORDER="0" CELLBORDER="0" CELLSPACING="0">
<TR><TD COLSPAN="3"><B>__0__</B></TD></TR>
</TABLE>
>]
N_0 -> N_4[label="root", ];
N_2 -> N_1[label="det", ];
N_4 -> N_2[label="nsubj:pass", ];
N_4 -> N_3[label="aux:pass", ];
N_4 -> N_5[label="punct", ];
{ rank=same; N_0; N_1; }
N_0 -> N_1 [label="SUCC", style=dotted, fontcolor=white, color=white];
{ rank=same; N_1; N_2; }
N_1 -> N_2 [label="SUCC", style=dotted, fontcolor=white, color=white];
{ rank=same; N_2; N_3; }
N_2 -> N_3 [label="SUCC", style=dotted, fontcolor=white, color=white];
{ rank=same; N_3; N_4; }
N_3 -> N_4 [label="SUCC", style=dotted, fontcolor=white, color=white];
{ rank=same; N_4; N_5; }
N_4 -> N_5 [label="SUCC", style=dotted, fontcolor=white, color=white];
}

import json
print(json.dumps(g1.json_data(), indent=2))

{
"nodes": {
"A": "A",
"B": "B",
"C": "C"
},
"edges": [
{
"src": "A",
"label": "X",
"tar": "B"
},
{
"src": "A",
"label": "XX",
"tar": "B"
},
{
"src": "B",
"label": "Y",
"tar": "C"
},
{
"src": "C",
"label": "Z",
"tar": "A"
}
],
"order": [
"A",
"B"
],
"meta": {}
}


Internal representation of graphs

Internally a graph is encoded with four elements:

• a dict features, which maps each node identifier to either a string or a dictionary encoding the feature structure of the node
• a dict sucs, which maps each node identifier to a list of outgoing edges; each edge is a pair made of the target node and the edge label
• a list named order, which lists the strictly ordered nodes
• a dict meta, which holds the metadata of the graph (keys and values are strings)
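
As a rough picture of these four elements, here is what they could look like for g1, written with plain Python values. This is only a mock-up, not the grewpy classes: edge labels are shown as plain dicts where grewpy uses Fs_edge objects, and the labels for Y and Z are assumed to follow the same pattern as X and XX.

```python
# Plain-Python picture of the four elements for the graph g1.
features = {"A": "A", "B": "B", "C": "C"}

# Each node maps to a list of (target, label) pairs.
# Labels are plain dicts here; grewpy wraps them in Fs_edge.
sucs = {
    "A": [("B", {"1": "X"}), ("B", {"1": "XX"})],
    "B": [("C", {"1": "Y"})],
    "C": [("A", {"1": "Z"})],
}

# Only A and B are ordered; C is outside the order.
order = ["A", "B"]

# No metadata was given for g1.
meta = {}

# Basic consistency checks on the mock-up.
edge_targets = {tar for edges in sucs.values() for tar, _ in edges}
targets_ok = edge_targets <= set(features)   # every target is a known node
order_ok = set(order) <= set(features)       # ordered nodes are known nodes
```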

features

The features dictionary is the one accessed by default when indexing a graph. The two expressions below are equivalent:

g1["A"], g1.features["A"]

('A', 'A')


For simple graphs such as g1, a feature is only a string, but when a node has a more complex feature structure, it is a dict:

g2["2"]

{'Number': 'Sing',
'form': 'work',
'lemma': 'work',
'textform': 'work',
'upos': 'NOUN',
'wordform': 'work',
'xpos': 'S'}


sucs

Each node is mapped to a list of successors, each described by a pair made of the target node and the edge label. Edge labels are dictionaries (see here for details on edge label encoding).

g1.sucs["A"]

[('B', Fs_edge({'1': 'X'})), ('B', Fs_edge({'1': 'XX'}))]

g2.sucs["4"]

[('5', Fs_edge({'1': 'punct'})),
('3', Fs_edge({'1': 'aux', '2': 'pass'})),
('2', Fs_edge({'1': 'nsubj', '2': 'pass'}))]
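
In the outputs above, a label such as nsubj:pass appears as a dictionary with keys '1' and '2'. The sketch below reproduces that shape by splitting on the colon. It is only an illustration consistent with these examples, not the parser used by grewpy: split_label is a hypothetical helper, it returns a plain dict rather than an Fs_edge, and it ignores any other separator the edge label encoding may define.

```python
def split_label(label: str) -> dict:
    """Split a label on ':' into numbered parts: '1', '2', ..."""
    parts = label.split(":")
    return {str(i): part for i, part in enumerate(parts, start=1)}

# The three labels going out of node "4" in g2.
for label in ["punct", "aux:pass", "nsubj:pass"]:
    print(label, split_label(label))
```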


Note that a node with no successors has no entry in the sucs dictionary.

"3" in g2.sucs

False


Use the dict get method to avoid a KeyError and safely retrieve the successors:

g2.sucs.get("3", [])

[]
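
The same get pattern is convenient when looping over all nodes, since nodes without successors are then handled without a special case. The sketch below works on a plain-dict stand-in for g2.sucs, so it runs without grewpy. The stand-in uses plain dicts instead of Fs_edge, and it assumes that the anchor node has the identifier "0" and that the root label is encoded like the other single-part labels.

```python
from collections import defaultdict

# Plain-dict stand-in for g2.sucs (assumptions: anchor id "0",
# labels as plain dicts instead of Fs_edge).
sucs = {
    "0": [("4", {"1": "root"})],
    "2": [("1", {"1": "det"})],
    "4": [
        ("5", {"1": "punct"}),
        ("3", {"1": "aux", "2": "pass"}),
        ("2", {"1": "nsubj", "2": "pass"}),
    ],
}
nodes = ["0", "1", "2", "3", "4", "5"]

# Out-degree of every node; get supplies [] for nodes with no successors.
out_degree = {n: len(sucs.get(n, [])) for n in nodes}

# Reverse map: for each node, the (source, label) pairs of incoming edges.
preds = defaultdict(list)
for src in nodes:
    for tar, label in sucs.get(src, []):
        preds[tar].append((src, label))

print(out_degree)
print(preds["4"])
```

Iterating with sucs[n] instead of sucs.get(n, []) would raise a KeyError on nodes "1", "3" and "5" in this stand-in.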