Add literature review module and citation screening functionality (#171
…) (#172)

* feat(citation screening): add citation screening project stub

* docs(citation screening): update README.md

* docs(citation screening): add django project template

* fix(cso classifier): fix keywords parameter in CSO classifier

* feat(utils): add Author dataclass, update Article fields

- add citations and references count to Article

* feat(cso classifier): add CSO classifier for single article

* feat(SemanticScholar search): add function for searching with S2 API

* docs(SemanticScholar search): update type annotations

* feat(frontend): display new fields on the result page

* feat(search): update aminer search with new fields

* feat(CORE search): add CORE API search function

* feat(CSO concepts): CSO concepts for empty docs classified on page load

- add API call for article card that runs CSO classifier on articles without CSO concepts

* style: style with black

* feat(CSO classifier): CSO concepts are classified only after clicking 'CSO keywords' button

- significant search speedup

* feat(CSO classifier search): CSO classifier API works on both titles and abstracts

* fix(CORE search): fix URL in CORE search API call

* feat(pdf icon): add pdf icon

* style: style with black

* fix(frontend): remove unnecessary div

* feat(search): speed up search by naive concurrency

- avg search time drops from 5s to 3s

* fix(search): fix order of search request

* docs(citation screening): add simple screening django model

* fix(imports): fix relative package imports

* feat(postgresql): add readme and requirements for running postgresql database

* feat(frontend): add item in navbar with link to user profile

* feat(postgresql): add django database configuration with postgresql

* feat(literature review): add new django project with database routers

- literature review is connected to a postgresql database

* feat(literature review): first attempt to create new literature review

- basic form with title and description

* feat(literature review): register models in django admin

* feat(literature review): add review fields, on save search semantic scholar

* feat(literature review): add main literature review subpage

* feat(literature review): display all literature reviews in table

* feat(literature review): connects literature review with user

- needed to remove sqlite database as django doesn't allow relations
between multiple databases (connecting User with LiteratureReview)

BREAKING CHANGE: removes sqlite database, requires postgresql database
from now

* refactor(literature review): remove old sqlite connection code

* feat(literature review): create review detailed page

- add namespace to reviews urls

* feat(review search): add top_k parameter while creating review

- display number of PDFs retrieved

* feat(citation screening): implement initial title and abstract screening

- add manual screening pipeline for single reviewer based on iterative approach.
- opens first unscreened paper

* feat(citation screening): display decision count

- add source database and query to each paper

* feat(citation screening): enable exporting screening results to json

* refactor(citation screening): refactor with sourcery

style with black

* feat(citation screening): update results export with more fields

* refactor(citation screening): refactor screen_papers()

* feat(citation screening): measure time spent on screening

* feat(literature review): update database models with new fields

- added new fields to literature review

* feat(automatic classification): add django app template

* feat(literature review): update UI of screening and review display pages

* fix(literature review): remove unused favicons

* feat(literature review): add option to screen any paper from the list

* feat(literature review): further updates to screening UI

- add review statistics table
- display only snippet by default, expand to abstract on click

* feat(literature review): update review creation page

* feat(literature review): add option for selecting search engines while creating new review

* feat(google scholar): add script for searching GS with scholarly

* fix(search results): if abstract not available display only snippet

* feat(literature review): google scholar added to available search engines for literature review

* feat(literature review): relax requirements on field names while creating new review

* feat(automatic classification): add django models for document classification

* feat(automatic classification): add views and serializers for ml algorithms

* fix(citation screening): descriptive reason field made optional

* feat(seed studies): user can add seed studies to review from pdf URLs

* feat(literature review): add option for editing literature review parameters

* fix(UI): change height of main page so it fits into screen without scrolling

* feat(automatic classification): add base algorithm class and fasttext implementation

* feat(automatic classification): add dummy classification algorithm

* test(automatic classification): add classifier and registry tests

* feat(automatic classification): add ML algorithm registry

* feat(automatic classification): update django settings.py with new apps

* feat(citation screening): add new option to citation screening

* feat(automatic classification): add document_classification urls

* feat(automatic classification): add document_classification models to django admin

* feat(automatic classification): users can add new classifiers to the review

* feat(literature review): display number of citations per paper

* feat(literature review): allow for sorting the table with papers

* feat(automatic classification): automatic screening works with dummy model on review data

* fix(citation screening): fix missing time in context data for screening papers

* feat(automatic classification): update classifiers, automatic screening works with fasttext

* fix(literature review): always enable review details page button

* fix(users): add login required to user_profile view

* feat(literature review): add django db migrations

* ci: add requirements for literature review app

* fix(google scholar): fix publication date in google scholar search

* feat(document classification): fix import in document classification

* feat(literature review): merge papers based on the title

* feat(document classification): improve fasttext classifier

* feat(document classification): 'maybe' decisions treated as 'yes' for training model

* docs: update deployment information README.md

* fix(concept search): comment CCS and wiki taxonomies

* docs: update README.md and requirements.txt

* docs: update documentation for grobid

* feat(document classification): comment out ML registry code

* feat(literature review): fix type mismatch when creating new review

* feat(literature review): add link to my reviews to the header

* feat(search): disable taxonomy search

* feat(search): disable taxonomy search

* fix(UI): hide my reviews for non authenticated users

* feat(document search): rename internal search function to search_cruise

* feat(article): add DOI to article class, update search functions accordingly

* fix(citation screening): fix ambiguity with questions about prior knowledge of paper/authors

* fix(citation screening): fix DOI display during screening

* fix(literature review): remove duplicated key when deduplicating

* fix(google scholar): fix google scholar publication date type

* fix(literature review): fix deduplication

* fix(literature review): fix deduplication

* fix(literature review): fix deduplication

* feat(citation screening): update screening interface according to comments from Georgios

* fix(citation screening): update screening interface according to comments from Georgios

* feat(favicon): add cruise favicon

* fix(literature review): add review id to results export

* feat(citation screening): update citation screening page UI

* feat(citation screening): display asterisk for required elements during screening

* docs(README): update postgres configuration documentation

* docs(citation screening): add documentation to citation screening forms

* fix(search documents): remove checkbox for searching with taxonomies

* refactor(document classification): remove unused PredictView

* style: style with black

* feat(document screening): add can_screen_automatically flag

* feat(organisations): create organisations django app with db model

* feat(organisations): admin can create new organisation

- django admins add organisation admin

* feat(organisations): display organisation details

* feat(organisations): implement removing user from organisation

* feat(organisations): implement remove organisation and member delete

* style: style with black

* feat(organisations): filter users part of organisation

- fix modal ID for deleting users

* feat(organisations): display user organisations on user profile page

* feat(organisations): enable permissions for editing and viewing organisations

* feat(organisations): adding/removing member and deleting organisation requires permissions

* feat(organisations): enable adding reviews to organisation

* feat(organisations): display all reviews from organisation

* style: style with black

* refactor(organisations): optimise imports

* test: add unit tests to the citation_screening views (#170)

* test(literature review): create tests for literature review list

* fix(users): add default django login URL

* feat(logging): fix path to the loggers' FileHandler

* test(literature review): add unit tests for review details

* fix(home): replace default homepage urlpattern

* fix(literature review): remove unnecessary if statement

* test(literature review): change assertion for user without review access

users who don't have access to a review are redirected to the 404 page;
behaviour is now the same as when the review does not exist

* fix(literature review): change return value in review_details() view

users who don't have access to a review are redirected to the 404 page;
behaviour is now the same as when the review does not exist

* test(literature review): add unit tests for creating new lit review

* test(literature review): expand create new review test to check saved db content

* fix(about): update about page with contact detail

* test(literature review): add test_create_new_review_POST_unauthenticated test

* fix(literature review): change return value in edit_review() view

users who don't have access to a review are redirected to the 404 page;
behaviour is now the same as when the review does not exist

* test(literature review): add tests for editing literature review

* fix(citation screening): rename "Topic relevance" to "Relevance to the query"

* fix(literature review): fix edit and create review redirects in case of form errors

* test(literature review): add tests for export_review view

* test(literature review): add fake paper to test if review is exported correctly

* test(literature review): new tests for adding seed studies and automatic screening

* test(citation screening): add tests for screening POST method

* test(literature review): move all xxx_GET_not_member tests into one method

* fix(literature review): remove unnecessary prints
WojciechKusa authored Nov 28, 2022
1 parent da342a1 commit d862316
Showing 89 changed files with 4,447 additions and 51 deletions.
72 changes: 70 additions & 2 deletions README.md
@@ -16,7 +16,7 @@ Activate the environment:
$ source activate cruise-literature
```

-Use pip to install requirements:
+Use pip to install requirements (you will need `g++` to install fasttext):

```bash
(cruise-literature)$ pip install -r requirements.txt
@@ -36,6 +36,72 @@ Checkout [the backend](src/backend/README.md)
In order to use [CORE search API](https://core.ac.uk/services/api) create a file `data/core_api_key.txt` and insert your API key.
Next, change `SEARCH_WITH_CORE` to `True` in `src/cruise_literature/cruise_literature/settings.py`.

### 1.3 Postgres database

#### Ubuntu

```bash
$ sudo apt install postgresql postgresql-contrib
```

```bash
$ service postgresql start
```

Start postgres server

```bash
$ sudo systemctl start postgresql.service
```

##### Configuration

Replace `SYSTEM_USERNAME` with your system username and `YOUR_PASSWORD` with your desired database password.

You can check your `SYSTEM_USERNAME` with the following command:

```bash
$ whoami
```

Start psql as the `postgres` superuser:

```bash
$ sudo -u postgres psql
```

Create a new role for the cruise application, named after your `SYSTEM_USERNAME`, grant it the `LOGIN` and `CREATEDB` permissions, and set its password to `YOUR_PASSWORD`:

```postgres
postgres=# CREATE ROLE SYSTEM_USERNAME WITH LOGIN;
postgres=# ALTER ROLE SYSTEM_USERNAME CREATEDB;
postgres=# ALTER USER SYSTEM_USERNAME WITH PASSWORD 'YOUR_PASSWORD';
postgres=# \q
```

From the shell, open psql on the `postgres` database as the new user:

```bash
$ psql postgres
```

Note that the psql prompt looks different because you are no longer logged in as a superuser. Create a `cruise_literature` database and grant all privileges on it to the `SYSTEM_USERNAME` user:

```postgres
postgres=> CREATE DATABASE cruise_literature;
postgres=> GRANT ALL PRIVILEGES ON DATABASE cruise_literature TO SYSTEM_USERNAME;
```

Update the `DATABASES` entry in [`cruise_literature/settings.py`](src/cruise_literature/cruise_literature/settings.py):

```python
...
"USER": "SYSTEM_USERNAME",
"PASSWORD": "YOUR_PASSWORD",
...
```
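
For orientation, a complete `DATABASES` entry for PostgreSQL might look like the sketch below. Only `USER` and `PASSWORD` come from the snippet above. The `ENGINE` string is Django's standard PostgreSQL backend and the `cruise_literature` name is taken from the `CREATE DATABASE` step, while the host, port, and `CRUISE_DB_*` environment variable names are assumptions for illustration, not the project's actual settings.

```python
import os

# Sketch of a PostgreSQL entry for settings.py. The CRUISE_DB_* environment
# variable names are hypothetical; the defaults mirror the placeholders above.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "cruise_literature",
        "USER": os.environ.get("CRUISE_DB_USER", "SYSTEM_USERNAME"),
        "PASSWORD": os.environ.get("CRUISE_DB_PASSWORD", "YOUR_PASSWORD"),
        "HOST": os.environ.get("CRUISE_DB_HOST", "localhost"),  # assumed local server
        "PORT": os.environ.get("CRUISE_DB_PORT", "5432"),  # PostgreSQL default port
    }
}
```

Reading the credentials from the environment is one way to keep the password out of version control; hard-coding the two values as shown above works equally well for a local setup.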


## 2. Running

### 2.1 On a local host
@@ -49,7 +115,7 @@ Go into `src/cruise_literature/` directory:
Make migrations and migrate the database

```bash
-(cruise-literature)$ python manage.py makemigrations home document_search concept_search users
+(cruise-literature)$ python manage.py makemigrations
(cruise-literature)$ python manage.py migrate
```

@@ -78,6 +144,8 @@ Server should be available at http://127.0.0.1:8000/

### 2.2 Deployment on prod server

Add `YOUR_IP` to `ALLOWED_HOSTS` in `cruise_literature/settings.py`

```bash
(cruise-literature)$ python manage.py runserver YOUR_IP:YOUR_PORT
```
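
The `ALLOWED_HOSTS` change could look like the following sketch. `ALLOWED_HOSTS` is a standard Django setting; the `CRUISE_ALLOWED_HOSTS` environment variable and the `parse_allowed_hosts` helper are hypothetical, added here only so that the server address does not have to be hard-coded.

```python
import os


def parse_allowed_hosts(raw: str) -> list:
    """Split a comma-separated host list, dropping blanks and whitespace."""
    return [host.strip() for host in raw.split(",") if host.strip()]


# In settings.py: keep local access and add the server address.
# CRUISE_ALLOWED_HOSTS is a hypothetical variable, e.g. "203.0.113.10".
ALLOWED_HOSTS = ["127.0.0.1", "localhost"] + parse_allowed_hosts(
    os.environ.get("CRUISE_ALLOWED_HOSTS", "")
)
```

Writing `ALLOWED_HOSTS = ["YOUR_IP"]` directly in `settings.py` has the same effect for a single server.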
4 changes: 3 additions & 1 deletion requirements.txt
@@ -2,14 +2,16 @@
Django~=4.0.4
wikipedia~=1.4.0
fuzzywuzzy~=0.18.0
-requests~=2.27.1
+requests~=2.25.1
torch==1.10.2
transformers==4.14.1
rdflib~=6.1.1
faiss-cpu~=1.7.2
numpy~=1.22.3
pandas==1.4.2
xmltodict~=0.13.0
psycopg2~=2.9.3
psycopg2-binary~=2.9.3

# scripts
pycld3~=0.22
104 changes: 104 additions & 0 deletions src/cruise_literature/citation_screening/README.md
@@ -0,0 +1,104 @@
# Citation screening

This is the citation screening project readme.




## Installation

### PostgreSQL

*Based on this [gist](https://gist.github.com/phortuin/2fe698b6c741fd84357cec84219c6667)*

[Install postgresql](https://www.postgresql.org/download/)

#### On Mac:

`brew install postgresql@14`

Run server:

`pg_ctl -D /opt/homebrew/var/postgresql@14 start`

Note: on an Intel Mac, the Homebrew prefix is probably `/usr/local` instead of `/opt/homebrew`.

Start psql and open database `postgres`, which is the database postgres uses itself to store roles, permissions, and structure:

```bash
$ psql postgres
```

#### Ubuntu

`service postgresql start`

Run server:

`sudo systemctl start postgresql@14-main`

Start psql and open database:

`sudo -u postgres psql`



### Next steps

Create role for application, give login and `CREATEDB` permissions:

```postgres
postgres=# CREATE ROLE cruise_literature_user WITH LOGIN;
postgres=# ALTER ROLE cruise_literature_user CREATEDB;
```

Quit psql, because we will log in with the new role (`cruise_literature_user`) to create a database:

```postgres
postgres=# \q
```

From the shell, open psql on the `postgres` database as user `cruise_literature_user`:

```bash
$ psql postgres -U cruise_literature_user
```

Note that the psql prompt looks different because we’re no longer logged in as a superuser. We’ll create a database and grant all privileges on it to our user:

```postgres
postgres=> CREATE DATABASE cruise_literature;
postgres=> GRANT ALL PRIVILEGES ON DATABASE cruise_literature TO cruise_literature_user;
```

Run migrations and start server:

```bash
$ python manage.py makemigrations
$ python manage.py migrate --database=literature_review
$ python manage.py migrate
$ python manage.py runserver
```
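
The `--database=literature_review` flag implies a second database alias next to `default`. Django decides which models live in which database through a router class, and a minimal sketch of one follows. The class name, the choice of `citation_screening` as the routed app label, and the routing rules are assumptions for illustration and may differ from the routers added in this commit.

```python
class LiteratureReviewRouter:
    """Route one app's models to the `literature_review` database alias."""

    app_label = "citation_screening"  # assumed app label
    db_alias = "literature_review"  # alias used by `migrate --database=...`

    def db_for_read(self, model, **hints):
        # Returning None lets Django fall through to the default database.
        return self.db_alias if model._meta.app_label == self.app_label else None

    def db_for_write(self, model, **hints):
        return self.db_for_read(model, **hints)

    def allow_relation(self, obj1, obj2, **hints):
        # Django does not support relations spanning databases, so only
        # give an opinion when both objects belong to the routed app.
        labels = {obj1._meta.app_label, obj2._meta.app_label}
        return True if labels == {self.app_label} else None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # The routed app migrates only on its alias; other apps stay off it.
        if app_label == self.app_label:
            return db == self.db_alias
        if db == self.db_alias:
            return False
        return None
```

A router like this is activated by listing its dotted path in the `DATABASE_ROUTERS` setting.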


`sudo systemctl start postgresql`

`sudo systemctl enable postgresql`




### GROBID

#### Server

```bash
docker pull lfoppiano/grobid:0.7.1
docker run -t --rm -p 8070:8070 lfoppiano/grobid:0.7.1
```
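
Once the container is running, the service can be checked from Python. The sketch below assumes GROBID is reachable at `http://localhost:8070` (the port published by the `docker run` command above) and queries its `/api/isalive` health endpoint. The helper names are hypothetical and only the standard library is used.

```python
from urllib.error import URLError
from urllib.request import urlopen

GROBID_BASE_URL = "http://localhost:8070"  # assumed: port from `docker run` above


def grobid_endpoint(path: str, base_url: str = GROBID_BASE_URL) -> str:
    """Build a GROBID REST URL such as <base>/api/isalive."""
    return f"{base_url.rstrip('/')}/api/{path.lstrip('/')}"


def grobid_is_alive(base_url: str = GROBID_BASE_URL, timeout: float = 5.0) -> bool:
    """Return True if the GROBID health endpoint answers with 'true'."""
    try:
        with urlopen(grobid_endpoint("isalive", base_url), timeout=timeout) as resp:
            return resp.read().decode("utf-8").strip() == "true"
    except (URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat as not running.
        return False
```

Calling `grobid_is_alive()` before submitting PDFs gives a quick way to tell a stopped container apart from a parsing problem.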


#### Client

Install the client:

`pip install grobid_tei_xml`
Empty file.
6 changes: 6 additions & 0 deletions src/cruise_literature/citation_screening/admin.py
@@ -0,0 +1,6 @@
from django.contrib import admin

from .models import LiteratureReview, LiteratureReviewMember

admin.site.register(LiteratureReview)
admin.site.register(LiteratureReviewMember)
6 changes: 6 additions & 0 deletions src/cruise_literature/citation_screening/apps.py
@@ -0,0 +1,6 @@
from django.apps import AppConfig


class CitationScreeningConfig(AppConfig):
    default_auto_field = "django.db.models.BigAutoField"
    name = "citation_screening"