We have a technical report that explains our question-answer generation system here.
We propose an approach that is largely based on the framework of ongoing work by A. Sarvaiya. Formally, given a passage P, the question-answer generation (QAG) system retrieves the most important sentences from P. It then produces a set of question-answer pairs, where each generated answer can be found in one of the selected sentences, and its paired question is the interrogative version of that sentence, or of one of the sentence's clauses, with the answer removed. As shown in the figure below, there are four main modules in our QAG system.
- Preprocessing, which removes unnecessary characters from the input passage P and shapes it into the desired form (a list of sentences).
- Sentence Selection, which picks the most important sentences in P. The text summarization method can be chosen from TextRank, multi-word phrase extraction (MWPE), and latent semantic analysis (LSA). The chosen method ranks the sentences in P and outputs the top-ranked ones.
- Gap Selection, which selects phrases in the chosen sentences that can be used as answers, based on the constituency tree from a syntactic parser and on named entity recognition (NER).
- Question Formation, which creates the interrogative version of a selected sentence, or of one of its clauses, with the answer removed, to make a question for each selected answer. The final output of this module is the set of question-answer pairs related to the passage.
- Demonstration with Jupyter Notebook. Check it out!
- Source code for the web version is in this GitHub repository.
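
The four modules described above can be illustrated with a minimal, dependency-free sketch. It is not the project's implementation: sentence ranking uses plain word-frequency scoring as a stand-in for TextRank, MWPE, or LSA; gap selection uses a capitalization and four-digit-number pattern as a stand-in for the syntactic parser and NER; and question formation produces a fill-in-the-blank sentence rather than a true interrogative. All function names, the stopword list, and the `top_n` parameter are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the project).
STOPWORDS = {"a", "an", "and", "he", "in", "is", "it", "of", "on", "she", "the", "to", "was"}


def preprocess(passage):
    """Module 1: collapse whitespace and split the passage into sentences."""
    text = re.sub(r"\s+", " ", passage).strip()
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def content_words(sentence):
    """Lowercased alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def select_sentences(sentences, top_n=2):
    """Module 2 (stand-in): keep the top_n sentences by summed word frequency."""
    freq = Counter(w for s in sentences for w in content_words(s))
    ranked = sorted(
        sentences,
        key=lambda s: sum(freq[w] for w in content_words(s)),
        reverse=True,
    )
    return ranked[:top_n]


def select_gaps(sentence):
    """Module 3 (stand-in): capitalized word runs and 4-digit numbers as answers."""
    candidates = re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*|\d{4}", sentence)
    return [c for c in candidates if c.lower() not in STOPWORDS]


def form_question(sentence, answer):
    """Module 4 (stand-in): blank out the first occurrence of the answer."""
    return sentence.replace(answer, "_____", 1)


def generate_qa(passage, top_n=2):
    """Run the four modules in order and return (question, answer) pairs."""
    pairs = []
    for sentence in select_sentences(preprocess(passage), top_n):
        for answer in select_gaps(sentence):
            pairs.append((form_question(sentence, answer), answer))
    return pairs


if __name__ == "__main__":
    demo = "Marie Curie was born in Warsaw.   She won the Nobel Prize in 1903. It was cold."
    for question, answer in generate_qa(demo):
        print(question, "->", answer)
```

Keeping each module as a separate function mirrors the description above, so any stand-in can be swapped for a real summarizer, parser, or NER model without touching the others.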
- Export the 'STANFORD_PARSER' environment variable, set to the absolute path of the lib directory, using the command
export STANFORD_PARSER=<ABSOLUTE_PATH>
e.g. export STANFORD_PARSER=/home/anonymous/qag-web/qag/lib
- Run using command
gunicorn qag:app
- If the program gets stuck after loading the model, the port is probably already in use. This usually happens after you have already run the program two or more times. List the processes that hold ports using
lsof -i
then find the java process and kill it by its PID.
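
As a complement to the troubleshooting step above, the following sketch checks from Python whether something is already listening on a port before the application is started. The text does not say which port is affected, so the port number is a parameter; the value 8000 in the usage lines is only gunicorn's default bind port, and the helper name `port_in_use` is hypothetical.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if a process is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    port = 8000  # assumption: gunicorn's default port; adjust to your setup
    if port_in_use(port):
        print(f"Port {port} is busy; find and kill the stale process first.")
    else:
        print(f"Port {port} is free.")
```

This only reports that a port is occupied; identifying and killing the owning process still requires a tool such as lsof.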