Feature/add vulcan #88

Open
wants to merge 18 commits into
base: feature/minimalist-parser
70 changes: 47 additions & 23 deletions README.md
Expand Up @@ -2,52 +2,76 @@

[![Actions Status](https://github.com/UUDigitalHumanitieslab/parseport/workflows/Unit%20tests/badge.svg)](https://github.com/UUDigitalHumanitieslab/parseport/actions)

ParsePort is an interface for the [Spindle](https://github.com/konstantinosKokos/spindle) parser using the [Æthel](https://github.com/konstantinosKokos/aethel) library, both developed by dr. Konstantinos Kogkalidis as part of a research project conducted with prof. dr. Michaël Moortgat at Utrecht University. Other parsers may be added in the future.
ParsePort is a web interface for two natural language processing (NLP) parsers and their two associated pre-parsed text corpora, all developed at Utrecht University.

1. The [Spindle](https://github.com/konstantinosKokos/spindle) parser produces type-logical parses of Dutch sentences. It comes with [Æthel](https://github.com/konstantinosKokos/aethel), a pre-parsed corpus of around 65,000 sentences based on [Lassy Small](https://taalmaterialen.ivdnt.org/download/lassy-klein-corpus6/). Both were developed by dr. Konstantinos Kogkalidis as part of a research project conducted with prof. dr. Michaël Moortgat at Utrecht University.

2. The Minimalist Parser produces syntax trees of English sentences from user input, in the style of [Chomskyan Minimalist Grammar](https://en.wikipedia.org/wiki/Minimalist_program). The parser was developed by dr. Meaghan Fowlie at Utrecht University and comes with a pre-parsed corpus of 100 sentences taken from the Wall Street Journal. The syntax trees are visualized interactively with Vulcan, a tool developed by dr. Jonas Groschwitz, also at Utrecht University.

## Running this application in Docker

In order to run this application you need a working installation of Docker and an internet connection. You will also need the source code from two other repositories, `spindle-server` and `latex-service` to be present in the same directory as the `parseport` source code.
In order to run this application you need a working installation of Docker and an internet connection. You will also need the source code from four other repositories. These must be located in the same directory as the `parseport` source code.

1. [`spindle-server`](https://github.com/CentreForDigitalHumanities/spindle-server) hosts the source code for a server with the Spindle parser;
2. [`latex-service`](https://github.com/CentreForDigitalHumanities/latex-service) contains a LaTeX compiler that is used to export the Spindle parse results in PDF format;
3. [`mg-parser-server`](https://github.com/CentreForDigitalHumanities/mg-parser-server) has the source code for the Minimalist Grammar parser;
4. [`vulcan-parseport`](https://github.com/CentreForDigitalHumanities/vulcan-parseport) contains the WebSocket-based web server that hosts Vulcan, the visualization tool for MGParser parse results.

See the instructions in the README files of these repositories for more information on these codebases.

In addition, you need to add a configuration file named `.env` to the root directory of this project with at least the following setting.

```
```conf
DJANGO_SECRET_KEY=...
```
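
One way to produce a value for `DJANGO_SECRET_KEY` is sketched below, using only Python's standard library. The helper name `make_env_line` is hypothetical and not part of ParsePort; Django's own `get_random_secret_key` utility (in `django.core.management.utils`) could be used instead.

```python
import secrets


def make_env_line(num_bytes: int = 50) -> str:
    """Return a `.env` line holding a random, URL-safe secret key.

    `make_env_line` is a hypothetical helper, shown for illustration only.
    """
    # token_urlsafe draws from the operating system's secure random source
    # and only emits letters, digits, "-" and "_".
    key = secrets.token_urlsafe(num_bytes)
    return f"DJANGO_SECRET_KEY={key}"


if __name__ == "__main__":
    # Copy the printed line into the `.env` file in the project root.
    print(make_env_line())
```

The key only has to be long and unpredictable, so any comparable generator would serve equally well.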

In overview, your file structure should be as follows.

```
┌── parseport (this project)
|   ├── compose.yaml
|   ├── .env
|   ├── frontend
|   |   └── Dockerfile
|   └── backend
|       ├── Dockerfile
|       └── aethel_db
|           └── data
|               └── aethel.pickle
|
├── spindle-server
|   ├── Dockerfile
|   ├── Dockerfile
|   └── model_weights.pt
|
├── latex-service
|   └── Dockerfile
|
└── parseport (this project)
    ├── compose.yaml
    ├── .env
    ├── frontend
    |   └── Dockerfile
    └── backend
        ├── Dockerfile
        └── aethel.pickle
├── mg-parser-server
|   └── Dockerfile
|
└── vulcan-parseport
    ├── Dockerfile
    └── app
        └── standard.pickle
```

Note that you will need two data files in order to run this project.
Note that you will need three data files in order to run this project.

- `model_weights.pt` should be put in the root directory of the `spindle-server` project. It can be downloaded from _Yoda-link here_.
- `aethel.pickle` should live at `parseport/backend/`. You can find it in the zip archive [here](https://github.com/konstantinosKokos/aethel/tree/stable/data).
- `aethel.pickle` contains the pre-parsed data for Æthel and should live at `parseport/backend/aethel_db/data`. You can find it in the zip archive [here](https://github.com/konstantinosKokos/aethel/tree/stable/data).
- `standard.pickle` contains the pre-parsed corpus for the Minimalist Parser. It should be placed in the `vulcan-parseport/app` directory. You can download it from _Yoda-link here_.
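
Before starting the containers, it may help to confirm that the three data files are where the updated README expects them. The following is a minimal sketch, assuming it is run from the directory that holds all five repositories; the names `REQUIRED_FILES` and `missing_data_files` are illustrative and do not exist in ParsePort.

```python
from pathlib import Path

# Relative locations taken from the file tree and the list of data files.
REQUIRED_FILES = (
    "spindle-server/model_weights.pt",
    "parseport/backend/aethel_db/data/aethel.pickle",
    "vulcan-parseport/app/standard.pickle",
)


def missing_data_files(workspace: Path) -> list[str]:
    """Return the required data files that are absent under `workspace`."""
    return [rel for rel in REQUIRED_FILES if not (workspace / rel).is_file()]


if __name__ == "__main__":
    # Assumption: the current directory contains all repositories side by side.
    for rel in missing_data_files(Path.cwd()):
        print(f"missing: {rel}")
```

An empty output means all three files were found.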

This application can be run in both `production` and `development` mode. Either mode will start a network of five containers.
This application can be run in both `production` and `development` mode. Either mode will start a network of seven containers.

| Name | Description |
|--------------|---------------------------------------------------|
| `nginx` | Entry point and reverse proxy, exposes port 5000. |
| `pp-ng` | The frontend server (Angular). |
| `pp-dj` | The backend/API server (Django). |
| `pp-spindle` | The server hosting the Spindle parser. |
| `pp-latex` | The server hosting a LaTeX compiler. |
| Name | Description |
|-------------------|---------------------------------------------------|
| `nginx` | Entry point and reverse proxy, exposes port 5000. |
| `pp-ng` | The frontend server (Angular). |
| `pp-dj` | The backend/API server (Django). |
| `pp-spindle` | The server hosting the Spindle parser. |
| `pp-latex` | The server hosting a LaTeX compiler. |
| `pp-mg-parser` | The server hosting the Minimalist Grammar parser. |
| `pp-vulcan` | The server hosting the Vulcan visualization tool. |

Start the Docker network in **development mode** by running the following command in your terminal.

Expand All @@ -61,7 +85,7 @@ For **production mode**, run the following instead.
docker compose --profile prod up --build -d
```

The Spindle server needs to download several files before the parser is ready to receive. You should wait a few minutes until the message *App is ready!* appears in the Spindle container logs.
The Spindle server needs to download several files before the parser is ready to receive input. You should wait a few minutes until the message *App is ready!* appears in the Spindle container logs.

Open your browser and visit your project at http://localhost:5000 to view the application.

Expand Down
4 changes: 2 additions & 2 deletions backend/minimalist_parser/urls.py
@@ -1,5 +1,5 @@
from django.urls import path

from minimalist_parser.views.parse import MPParseView
from minimalist_parser.views.parse import MGParserView

urlpatterns = [path("parse", MPParseView.as_view(), name="mp-parse")]
urlpatterns = [path("parse", MGParserView.as_view(), name="mp-parse")]
110 changes: 90 additions & 20 deletions backend/minimalist_parser/views/parse.py
@@ -1,4 +1,4 @@
import pickle
import base64
import json
from typing import Optional
from dataclasses import dataclass
Expand All @@ -10,71 +10,141 @@

from parseport.http_client import http_client
from parseport.logger import logger
from uuid import uuid4


class MGParserErrorSource(Enum):
    INPUT = "input"
    MG_PARSER = "mg_parser"
    GENERAL = "general"
    VULCAN = "vulcan"


@dataclass
class MGParserResponse:
    ok: Optional[bool] = None
    error: Optional[MGParserErrorSource] = None
    id: Optional[str] = None

    def json_response(self) -> JsonResponse:
        return JsonResponse(
            {
                "ok": self.ok or False,
                "error": getattr(self, "error", None),
                "error": self.error.value if self.error else None,
                "id": getattr(self, "id", None),
            },
            status=400 if self.error else 200,
        )


# Create your views here.
class MPParseView(View):
class MGParserView(View):
    def post(self, request: HttpRequest) -> HttpResponse:
        data = self.read_request(request)
        """
        Expects a POST request with a JSON body containing an English string to
        be parsed, in the following format:
        {
            "input": str
        }

        The input is sent to the parser, and an ID is generated if the parse is
        successful. The parse results are forwarded to the Vulcan server, where
        a visualisation is created ahead of time. The ID is returned to the
        client, which can be used to redirect the client to the Vulcan page
        where the visualisation is displayed.

        Returns a JSON response with the following format:
        {
            "id": str | None,
            "error": str | None
        }
        """
        data = self.validate_input(request)
        if data is None:
            logger.warning("Failed to validate user input.")
            return MGParserResponse(error=MGParserErrorSource.INPUT).json_response()

        parsed = self.send_to_parser(data)
        logger.info("User input validated. Sending to parser...")

        if parsed is None:
        parsed_binary = self.send_to_parser(data)

        if parsed_binary is None:
            logger.warning("Failed to parse input: %s", data)
            return MGParserResponse(error=MGParserErrorSource.MG_PARSER).json_response()

        # TODO: send parsed data to Vulcan.
        logger.info("Parse successful. Sending to Vulcan...")
        parse_id = self.generate_parse_id()

        vulcan_response = self.send_to_vulcan(parsed_binary, parse_id)
        if vulcan_response is None:
            return MGParserResponse(error=MGParserErrorSource.VULCAN).json_response()

        logger.info("Vulcan response received. Returning ID to client...")

        return MGParserResponse(ok=True).json_response()
        return MGParserResponse(id=parse_id).json_response()

    def generate_parse_id(self) -> str:
        """Generate a unique, URL-safe ID for the current request."""
        return str(uuid4()).replace("-", "")

    def send_to_vulcan(self, parsed_data: bytes, id: str) -> Optional[dict]:
        """
        Send request to downstream Vulcan server.
        """
        try:
            base64_encoded = base64.b64encode(parsed_data).decode("utf-8")
        except Exception as e:
            logger.warning("Failed to base64 encode parsed data: %s", e)
            return None

        vulcan_response = http_client.request(
            method="POST",
            url=settings.VULCAN_URL,
            body=json.dumps(
                {
                    "parse_data": base64_encoded,
                    "id": id,
                }
            ),
            headers={"Content-Type": "application/json"},
        )

    def send_to_parser(self, text: str) -> Optional[str]:
        if vulcan_response.status != 200:
            logger.warning(
                "Received non-200 response from Vulcan server for input %s", parsed_data
            )
            return None

        try:
            json_response = vulcan_response.json()
        except json.JSONDecodeError:
            logger.warning("Received non-JSON response from Vulcan server")
            return None

        return json_response

    def send_to_parser(self, input_string: str) -> Optional[bytes]:
        """Send request to downstream MG Parser"""
        mg_parser_response = http_client.request(
            method="POST",
            url=settings.MINIMALIST_PARSER_URL,
            body=json.dumps({"input": text}),
            body=json.dumps({"input": input_string}),
            headers={"Content-Type": "application/json"},
        )

        if mg_parser_response.status != 200:
            logger.warning(
                "Received non-200 response from MG Parser server for input %s", text
                "Received non-200 response from MG Parser server for input %s",
                input_string,
            )
            return None

        # Parse as pickle
        try:
            response_body = mg_parser_response.data
            parsed_data = pickle.loads(response_body)
        except pickle.UnpicklingError:
            logger.warning("Received non-pickle response from MG Parser server")
        parsed_data = mg_parser_response.data
        if type(parsed_data) != bytes:
            logger.warning("Received non-bytes response from MG Parser server")
            return None

        return parsed_data

    def read_request(self, request: HttpRequest) -> Optional[str]:
    def validate_input(self, request: HttpRequest) -> Optional[str]:
        """Read and validate the HTTP request received from the frontend"""
        request_body = request.body.decode("utf-8")
Expand Down
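
The hand-off from the backend to Vulcan in `send_to_vulcan` relies on two small steps: a dash-free UUID as the parse ID, and base64 encoding so that binary parse data can travel inside a JSON body. A minimal sketch of that round trip follows. The function names `build_vulcan_payload` and `read_vulcan_payload` are hypothetical, and the decoding side is an assumption about what a receiver would do, not the actual Vulcan server code.

```python
import base64
import json
from uuid import uuid4


def build_vulcan_payload(parsed_data: bytes, parse_id: str) -> str:
    """Serialise binary parse data and an ID the way `send_to_vulcan` does."""
    encoded = base64.b64encode(parsed_data).decode("utf-8")
    return json.dumps({"parse_data": encoded, "id": parse_id})


def read_vulcan_payload(body: str) -> tuple[bytes, str]:
    """Hypothetical receiving side: recover the original bytes and the ID."""
    payload = json.loads(body)
    return base64.b64decode(payload["parse_data"]), payload["id"]


if __name__ == "__main__":
    # uuid4().hex is the same 32-character string as
    # str(uuid4()).replace("-", "") in `generate_parse_id`.
    parse_id = uuid4().hex
    body = build_vulcan_payload(b"\x80\x04example", parse_id)
    data, received_id = read_vulcan_payload(body)
    print(data == b"\x80\x04example", received_id == parse_id)
```

Base64 increases the payload size by roughly a third, which is the price of keeping the request body plain JSON.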
1 change: 1 addition & 0 deletions backend/parseport/settings.py
Expand Up @@ -38,6 +38,7 @@
"corsheaders",
"aethel_db",
"minimalist_parser",
"vulcan",
]

MIDDLEWARE = [
Expand Down
1 change: 1 addition & 0 deletions backend/parseport/urls.py
Expand Up @@ -44,4 +44,5 @@
namespace="rest_framework",
),
),
path("vulcan/", include("vulcan.urls")),
]
4 changes: 1 addition & 3 deletions backend/parseport/views.py
Expand Up @@ -34,8 +34,6 @@ def get(self, request):
aethel=aethel_status(),
spindle=status_check('spindle'),
mp=status_check('minimalist_parser'),
vulcan=True,
# When Vulcan is up and running, uncomment the following line.
# vulcan=status_check('vulcan'),
vulcan=status_check('vulcan'),
)
)
1 change: 0 additions & 1 deletion backend/src/aethel
Submodule aethel deleted from 41eab8
3 changes: 3 additions & 0 deletions backend/vulcan/admin.py
@@ -0,0 +1,3 @@
from django.contrib import admin

# Register your models here.
6 changes: 6 additions & 0 deletions backend/vulcan/apps.py
@@ -0,0 +1,6 @@
from django.apps import AppConfig


class VulcanConfig(AppConfig):
    default_auto_field = 'django.db.models.BigAutoField'
    name = 'vulcan'
Empty file.
3 changes: 3 additions & 0 deletions backend/vulcan/models.py
@@ -0,0 +1,3 @@
from django.db import models

# Create your models here.
46 changes: 46 additions & 0 deletions backend/vulcan/templates/vulcan/index.html
@@ -0,0 +1,46 @@
<!DOCTYPE html>
<html>
<head>

<title>Graph visualization</title>

<link rel="stylesheet" type="text/css" href="/vulcan-static/style.css">
<script src="https://cdnjs.cloudflare.com/ajax/libs/socket.io/4.4.1/socket.io.min.js"></script>
<script type=text/javascript src="https://d3js.org/d3.v5.min.js"></script>

</head>

<body>

<h2>VULCAN visualization</h2>
<div id="headerId">
<button id="previousButton">&lt;Previous</button>
<button id="nextButton">Next&gt;</button>
<!--input text field with width 50 and right alignment-->
<!--restricting to numbers as per
https://stackoverflow.com/questions/13952686/how-to-make-html-input-tag-only-accept-numerical-values
(somewhere a bit down)-->
<input style="text-align: right" type="text" size="5" id="corpusPositionInput" value="0"
onkeypress="
return (event.charCode !=8 && event.charCode ==0 || (event.charCode >= 48 && event.charCode <= 57))">
<div style="display: inline;" id="corpusPositionText"></div>
<button id="searchButton" style="margin-left: 30px">Search</button>
<button id="clearSearchButton" style="margin-left: 5px">Clear Search</button>
</div>

<div id="chartId"></div>



<script type="application/javascript" src="/vulcan-static/definitions.js"></script>
<script type="application/javascript" src="/vulcan-static/node.js"></script>
<script type="application/javascript" src="/vulcan-static/graph.js"></script>
<script type="application/javascript" src="/vulcan-static/table.js"></script>
<script type="application/javascript" src="/vulcan-static/label_alternatives.js"></script>
<script type="application/javascript" src="/vulcan-static/mouseover_texts.js"></script>
<script type="application/javascript" src="/vulcan-static/search.js"></script>
<script type="application/javascript" src="/vulcan-static/baseScript.js"></script>


</body>
</html>
3 changes: 3 additions & 0 deletions backend/vulcan/tests.py
@@ -0,0 +1,3 @@
from django.test import TestCase

# Create your tests here.
7 changes: 7 additions & 0 deletions backend/vulcan/urls.py
@@ -0,0 +1,7 @@
from django.urls import path, re_path

from vulcan.views import VulcanView

urlpatterns = [
    re_path(r'^(?P<id>\w+)?$', VulcanView.as_view(), name='vulcan'),
]
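
The route above makes the ID segment optional, so both `vulcan/` and `vulcan/<id>` reach `VulcanView`. The sketch below applies the same pattern with Python's `re` module to show which path remainders are accepted; the helper `extract_id` is purely illustrative and not part of the pull request.

```python
import re

# Same pattern as in `vulcan/urls.py`; Django applies it to the part of the
# path that follows the "vulcan/" prefix.
VULCAN_ID_PATTERN = re.compile(r"^(?P<id>\w+)?$")


def extract_id(remaining_path: str):
    """Return (matched, id) for a path remainder; illustrative helper only."""
    match = VULCAN_ID_PATTERN.search(remaining_path)
    if match is None:
        return False, None
    # The group is None when the optional ID segment is absent.
    return True, match.group("id")


if __name__ == "__main__":
    for remainder in ("", "3f2a9c", "abc-123", "abc/"):
        print(repr(remainder), extract_id(remainder))
```

The dash-free IDs produced by `generate_parse_id` consist only of hexadecimal characters, so they satisfy `\w+`; an ID containing a dash or a trailing slash would not match.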
7 changes: 7 additions & 0 deletions backend/vulcan/views.py
@@ -0,0 +1,7 @@
from django.shortcuts import render
from django.views import View

# Create your views here.
class VulcanView(View):
    def get(self, request, *args, **kwargs):
        return render(request, 'vulcan/index.html')