Local installation of Grew-match

Grew-match is available online for a number of corpora (UD, SUD, Parseme…). If you want to use Grew-match on your own corpus, you have to install it locally, following the instructions on this page.

Please report any problems here in case of trouble.

You may also consider using grew_match_quick. This a Python script that automatizes the steps depscribed here.


Step 0: Prerequisites

Follow the Grew installation instructions (steps 1 and 2), in order to install and set up Ocaml & Opam.

Install the required Ocaml libraries:

opam install ssl ocsipersist-dbm fileutils eliom
opam remote add grew "http://opam.grew.fr"
opam install dep2pictlib grew

Step 1: Create a new directory

Create a new directory where all the necessary files and data will be installed. Replace the string __DIR__ with this new directory throughout this documentation, in commands or parameter files.


Step 2: Installing data and corpusbank

For this example, we will install three treebanks in __DIR__/data :

cd __DIR__
mkdir data && cd data
git clone https://github.com/UniversalDependencies/UD_Arabic-PUD.git
git clone https://github.com/UniversalDependencies/UD_French-PUD.git
git clone https://github.com/UniversalDependencies/UD_Spanish-PUD.git

Configuring the corpusbank

The set of corpora to be served is described in a folder called corpusbank which contains JSON files.

cd __DIR__
mkdir corpusbank

In this new folder, add the JSON file pud.json:

[
  {
    "id": "UD_Arabic-PUD",
    "config": "ud",
    "lang": "ar",
    "rtl": true,
    "directory": "__DIR__/data/UD_Arabic-PUD"
  },
  {
    "id": "UD_French-PUD",
    "config": "ud",
    "lang": "fr",
    "directory": "__DIR__/data/UD_French-PUD"
  },
  {
    "id": "UD_Spanish-PUD",
    "config": "ud",
    "lang": "es",
    "directory": "__DIR__/data/UD_Spanish-PUD"
  }

]

Compiling the corpora

The following command compiles all the corpora defined in the corpusbank. It should be run before the first use and each time a corpus is modified.

grew compile -CORPUSBANK __DIR__/corpusbank

Step 3: Install and configure the backend

download the code:

cd __DIR__
git clone https://gitlab.inria.fr/grew/grew_match_back.git

Configuring grew_match_back

In the grew-match_back folder (__DIR__/grew_match_back), edit the file gmb.conf.in (lines 28 to 31) with the real __DIR__ value:

  <LOG>__DIR__/grew_match_back/log</LOG>
  <CORPUSBANK>__DIR__/corpusbank</CORPUSBANK>
  <RESOURCES>__DIR__/data</RESOURCES>
  <STORAGE>__DIR__/grew_match_back/static</STORAGE>

Step 4: Starting the backend

The command below should be run in the background (or in a separate terminal) so that the backend remains available during use.

cd __DIR__/grew_match_back
make GMB_PORT=4758 test.opt

Of course, the port number (4758) can be changed to another value, but it must be the same as the one defined in the instance.json file below.


Step 5: Install and configure the frontend webpage

Download

The code for the main Grew-match website itself is available at gitlab.inria.fr/grew/grew_match:

cd __DIR__
git clone https://gitlab.inria.fr/grew/grew_match.git

Update

cd __DIR__/grew_match
git pull

Configure grew_match

In the grew-match folder (__DIR__/grew_match), add a instance.json file with the following code:

{
	"backend": "http://localhost:4758/",
	"groups": 
	[
		{
			"id": "PUD",
			"mode": "syntax",
			"style": "dropdown",
			"corpora": [
				"UD_Arabic-PUD",
				"UD_French-PUD",
				"UD_Spanish-PUD"
			]
		}
	]
}

Step 5: Start an http server

With Python 3, you can start a web server with the following commands:

cd __DIR__/grew_match
python -m http.server

⚠️ the last command should be kept running while using Grew-match: run it in the background (or in another terminal).

You can check that the URL http://localhost:8000 shows the Grew-match interface.

The PORT 8000 can be changed to another value (e.g. 12345) with the command python -m http.server 12345 to start the server and URL http://localhost:12345 to connect.


Step 6: Grew-match is ready!

You’re done! At the URL http://localhost:8000, you should be able to make a request on your corpora.

Note: Once everything is configured as explained above, you should run the two commands in the background to restart Grew-match:


Going further

After a corpus update

Note: There is no need to restart the Python http server for the frontend.

Web interface configuration

If you have a long list of corpora and want to put them in a pane on the left (like for UD treebanks), change in instance.json the line "style": "dropdown", by "style": "left_pane",.

More complex examples

For larger examples of corpusbank definition: see https://github.com/grew-nlp/corpusbank