{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "```\n", "title: \"Grewpy • graph\"\n", "date: 2023-08-14\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[`grewpy` Tutorial](../tutorial)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# `grewpy` library: Graph module\n", "\n", "Download the notebook [here](../graph.ipynb)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we import the `Graph` module from `grewpy`.\n", "\n", "**NB:** The port number is different at each execution. If you don't have the message `connected to port: …`, see [here](../../usage/python/#install)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2024-08-11T16:53:03.290947Z", "iopub.status.busy": "2024-08-11T16:53:03.290614Z", "iopub.status.idle": "2024-08-11T16:53:03.559676Z", "shell.execute_reply": "2024-08-11T16:53:03.539982Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "connected to port: 55453\n" ] } ], "source": [ "from grewpy import Graph" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Build a graph\n", "\n", "A graph can be built from its JSON encoding (see [here](../../doc/json) for more information on this format)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2024-08-11T16:53:03.565538Z", "iopub.status.busy": "2024-08-11T16:53:03.564864Z", "iopub.status.idle": "2024-08-11T16:53:03.570382Z", "shell.execute_reply": "2024-08-11T16:53:03.569662Z" } }, "outputs": [], "source": [ "g1_str = \"\"\"\n", "{\n", " \"nodes\": {\n", " \"A\": \"A\",\n", " \"B\": \"B\",\n", " \"C\": \"C\"\n", " },\n", " \"edges\": [\n", " { \"src\": \"A\", \"label\": \"X\", \"tar\": \"B\"},\n", " { \"src\": \"A\", \"label\": \"XX\", \"tar\": \"B\"},\n", " { \"src\": \"B\", \"label\": \"Y\", \"tar\": \"C\"},\n", " { \"src\": \"C\", \"label\": \"Z\", \"tar\": \"A\"}\n", " ],\n", " \"order\": [ \"A\", \"B\" ]\n", "}\n", "\"\"\"\n", "g1 = Graph(g1_str)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A graph can also be built from CoNLL data." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2024-08-11T16:53:03.574447Z", "iopub.status.busy": "2024-08-11T16:53:03.574131Z", "iopub.status.idle": "2024-08-11T16:53:03.581554Z", "shell.execute_reply": "2024-08-11T16:53:03.580774Z" } }, "outputs": [], "source": [ "g2_conll = \"\"\"# sent_id = en_partut-ud-202\n", "# text = The work is done.\n", "1\tThe\tthe\tDET\tRD\tDefinite=Def|PronType=Art\t2\tdet\t_\t_\n", "2\twork\twork\tNOUN\tS\tNumber=Sing\t4\tnsubj:pass\t_\t_\n", "3\tis\tbe\tAUX\tVA\tMood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin\t4\taux:pass\t_\t_\n", "4\tdone\tdo\tVERB\tV\tTense=Past|VerbForm=Part\t0\troot\t_\tSpaceAfter=No\n", "5\t.\t.\tPUNCT\tFS\t_\t4\tpunct\t_\t_\n", "\n", "\"\"\"\n", "g2 = Graph(g2_conll)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Functions on graphs\n", "\n", "The length (`len`) of a graph is the number of nodes.\n", "Note that when a graph is built from CoNLL data an *anchor* node is added at position 0, that's why `len(g2)` is 6 and not 5." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2024-08-11T16:53:03.588021Z", "iopub.status.busy": "2024-08-11T16:53:03.587425Z", "iopub.status.idle": "2024-08-11T16:53:03.596695Z", "shell.execute_reply": "2024-08-11T16:53:03.595863Z" } }, "outputs": [ { "data": { "text/plain": [ "(3, 6)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len (g1), len(g2)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2024-08-11T16:53:03.601587Z", "iopub.status.busy": "2024-08-11T16:53:03.601218Z", "iopub.status.idle": "2024-08-11T16:53:03.606862Z", "shell.execute_reply": "2024-08-11T16:53:03.606118Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "# sent_id = en_partut-ud-202\n", "# text = The work is done.\n", "1\tThe\tthe\tDET\tRD\tDefinite=Def|PronType=Art\t2\tdet\t_\t_\n", "2\twork\twork\tNOUN\tS\tNumber=Sing\t4\tnsubj:pass\t_\t_\n", "3\tis\tbe\tAUX\tVA\tMood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin\t4\taux:pass\t_\t_\n", "4\tdone\tdo\tVERB\tV\tTense=Past|VerbForm=Part\t0\troot\t_\tSpaceAfter=No\n", "5\t.\t.\tPUNCT\tFS\t_\t4\tpunct\t_\t_\n", "\n" ] } ], "source": [ "print (g2.to_conll())" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2024-08-11T16:53:03.611519Z", "iopub.status.busy": "2024-08-11T16:53:03.610989Z", "iopub.status.idle": "2024-08-11T16:53:03.616022Z", "shell.execute_reply": "2024-08-11T16:53:03.615332Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "digraph G {\n", " node [shape=box];\n", " N_5 [label=<\n", "\n", "\n", "\n", "\n", "\n", "\n", "
.
upos=PUNCT
lemma=.
xpos=FS
textform=.
wordform=.
\n", ">]\n", " N_4 [label=<\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
done
upos=VERB
lemma=do
xpos=V
SpaceAfter=No
Tense=Past
VerbForm=Part
textform=done
wordform=done
\n", ">]\n", " N_3 [label=<\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
is
upos=AUX
lemma=be
xpos=VA
Mood=Ind
Number=Sing
Person=3
Tense=Pres
VerbForm=Fin
textform=is
wordform=is
\n", ">]\n", " N_2 [label=<\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
work
upos=NOUN
lemma=work
xpos=S
Number=Sing
textform=work
wordform=work
\n", ">]\n", " N_1 [label=<\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
The
upos=DET
lemma=the
xpos=RD
Definite=Def
PronType=Art
textform=The
wordform=The
\n", ">]\n", " N_0 [label=<\n", "\n", "
__0__
\n", ">]\n", " N_0 -> N_4[label=\"root\", ];\n", " N_2 -> N_1[label=\"det\", ];\n", " N_4 -> N_2[label=\"nsubj:pass\", ];\n", " N_4 -> N_3[label=\"aux:pass\", ];\n", " N_4 -> N_5[label=\"punct\", ];\n", " { rank=same; N_0; N_1; }\n", " N_0 -> N_1 [label=\"SUCC\", style=dotted, fontcolor=white, color=white];\n", " { rank=same; N_1; N_2; }\n", " N_1 -> N_2 [label=\"SUCC\", style=dotted, fontcolor=white, color=white];\n", " { rank=same; N_2; N_3; }\n", " N_2 -> N_3 [label=\"SUCC\", style=dotted, fontcolor=white, color=white];\n", " { rank=same; N_3; N_4; }\n", " N_3 -> N_4 [label=\"SUCC\", style=dotted, fontcolor=white, color=white];\n", " { rank=same; N_4; N_5; }\n", " N_4 -> N_5 [label=\"SUCC\", style=dotted, fontcolor=white, color=white];\n", "}\n", "\n" ] } ], "source": [ "print (g2.to_dot())" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2024-08-11T16:53:03.620240Z", "iopub.status.busy": "2024-08-11T16:53:03.619912Z", "iopub.status.idle": "2024-08-11T16:53:03.624170Z", "shell.execute_reply": "2024-08-11T16:53:03.623460Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " \"nodes\": {\n", " \"A\": \"A\",\n", " \"B\": \"B\",\n", " \"C\": \"C\"\n", " },\n", " \"edges\": [\n", " {\n", " \"src\": \"A\",\n", " \"label\": \"X\",\n", " \"tar\": \"B\"\n", " },\n", " {\n", " \"src\": \"A\",\n", " \"label\": \"XX\",\n", " \"tar\": \"B\"\n", " },\n", " {\n", " \"src\": \"B\",\n", " \"label\": \"Y\",\n", " \"tar\": \"C\"\n", " },\n", " {\n", " \"src\": \"C\",\n", " \"label\": \"Z\",\n", " \"tar\": \"A\"\n", " }\n", " ],\n", " \"order\": [\n", " \"A\",\n", " \"B\"\n", " ],\n", " \"meta\": {}\n", "}\n" ] } ], "source": [ "import json\n", "print (json.dumps(g1.json_data(), indent=2))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Internal representation of graphs\n", "\n", "Internally a graph is encoded with four elements:\n", " - a dict `features` which maps each node identifier to either a string or a dictionary encoding the feature structure of the node\n", " - a dict `sucs` which maps each node indentifier to a list of outgoing edges, each edge is a pair with the target node and the edge label\n", " - a list named `order` which describes the list of strictly ordered nodes\n", " - a dict `meta` which describes the meta data of the graphs (keys and values are strings)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `features`\n", "The `features` dictionary is the one get by default when accessing a graph.\n", "The two expressions above are equal:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2024-08-11T16:53:03.631021Z", "iopub.status.busy": "2024-08-11T16:53:03.630639Z", "iopub.status.idle": "2024-08-11T16:53:03.637382Z", "shell.execute_reply": "2024-08-11T16:53:03.636310Z" } }, "outputs": [ { "data": { "text/plain": [ "('A', 'A')" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "g1[\"A\"], g1.features[\"A\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For simple graphs as above, a *feature* is a only a string but when there is a more complex feature structure, it is a dict:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2024-08-11T16:53:03.642712Z", "iopub.status.busy": "2024-08-11T16:53:03.642291Z", "iopub.status.idle": "2024-08-11T16:53:03.646653Z", "shell.execute_reply": "2024-08-11T16:53:03.645702Z" } }, "outputs": [ { "data": { "text/plain": [ "{'Number': 'Sing',\n", " 'form': 'work',\n", " 'lemma': 'work',\n", " 'textform': 'work',\n", " 'upos': 'NOUN',\n", " 'wordform': 'work',\n", " 'xpos': 'S'}" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "g2[\"2\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `sucs`\n", "\n", "Each node is given a list of *successors* described by pairs of the target nodes and the edges label. \n", "Edge labels are dictionaries (see [here](../../doc/graph/#edges) for details on edge label encoding.)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2024-08-11T16:53:03.652236Z", "iopub.status.busy": "2024-08-11T16:53:03.651445Z", "iopub.status.idle": "2024-08-11T16:53:03.657030Z", "shell.execute_reply": "2024-08-11T16:53:03.656237Z" } }, "outputs": [ { "data": { "text/plain": [ "[('B', Fs_edge({'1': 'X'})), ('B', Fs_edge({'1': 'XX'}))]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "g1.sucs[\"A\"]" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2024-08-11T16:53:03.664118Z", "iopub.status.busy": "2024-08-11T16:53:03.663556Z", "iopub.status.idle": "2024-08-11T16:53:03.670007Z", "shell.execute_reply": "2024-08-11T16:53:03.669352Z" } }, "outputs": [ { "data": { "text/plain": [ "[('5', Fs_edge({'1': 'punct'})),\n", " ('3', Fs_edge({'1': 'aux', '2': 'pass'})),\n", " ('2', Fs_edge({'1': 'nsubj', '2': 'pass'}))]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "g2.sucs[\"4\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that a node with no successor is not defined in the `sucs` dictionary." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2024-08-11T16:53:03.681641Z", "iopub.status.busy": "2024-08-11T16:53:03.681294Z", "iopub.status.idle": "2024-08-11T16:53:03.688027Z", "shell.execute_reply": "2024-08-11T16:53:03.687247Z" } }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"3\" in g2.sucs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use the `get` function to avoid `KeyError` and safely get the successors:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "execution": { "iopub.execute_input": "2024-08-11T16:53:03.698896Z", "iopub.status.busy": "2024-08-11T16:53:03.698319Z", "iopub.status.idle": "2024-08-11T16:53:03.704262Z", "shell.execute_reply": "2024-08-11T16:53:03.703586Z" } }, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "g2.sucs.get(\"3\", [])" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.5" } }, "nbformat": 4, "nbformat_minor": 2 }