An attempt to scrape & parse Degree Programs and Courses from the ANU website. Uses NLP (Semantic Parsing) to extract logical expressions from unstructured text that describes course requisites. Objective: To organize the messy data, and to improve the experience of degree planning.

kaihirota/ANU-Programs-and-Courses-Scraper

Stage 1: Build dataset

ANU search API Endpoints

  1. First, retrieve the datasets through the ANU API by running ./fetch_data.sh

  2. Scrape the ANU Programs and Courses website.

    The program uses Scrapy to collect data on classes, specialisations, and programs from the ANU programs and courses website.

    ./run_spiders.sh

Semantic Parsing

# target: an "and" / "or" that is either
# a. preceded or followed by punctuation, unless that punctuation is part of a named entity, or
# b. flanked by a different (named entity, verb) pair on its left and right
# rule a takes priority over rule b
# action: split the sentence at that conjunction
"To enrol in this course you must be studying a Master of Engineering 
and 
have completed ENGN8100 and (ENGN8160 or ENGN8260)."

"To enrol in this course you must have either: completed COMP6250 (Professional Practice 1) 
and be enrolled in the Master of Computing; 
OR 
be enrolled in the Master of Computing (Advanced)."
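
Once a clause has been split off, the course-code fragment inside it (for example "ENGN8100 and (ENGN8160 or ENGN8260)") still carries nested logic. The following is a minimal sketch of how such a fragment could be turned into a nested expression, not the repository's actual parser. The names tokenize and parse, the dictionary output shape, and the choice to let "and" bind tighter than "or" are all assumptions made for illustration.

```python
import re

# Assumption: course codes look like four letters followed by four digits.
TOKEN_RE = re.compile(r"\b[A-Z]{4}\d{4}\b|[()]|\b(?:and|or)\b", re.IGNORECASE)


def tokenize(fragment):
    """Return course codes, parentheses, and AND/OR keywords in order."""
    return [tok.upper() for tok in TOKEN_RE.findall(fragment)]


def parse(tokens):
    """Recursive-descent parse; AND binds tighter than OR (an assumption)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def factor():
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of input")
        advance()
        if tok == "(":
            node = expr()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            advance()
            return node
        if tok in ("AND", "OR", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok  # a course code

    def chain(operand, op):
        # Collect operands joined by the same operator into one node.
        items = [operand()]
        while peek() == op:
            advance()
            items.append(operand())
        return items[0] if len(items) == 1 else {op: items}

    def term():
        return chain(factor, "AND")

    def expr():
        return chain(term, "OR")

    tree = expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tree


tree = parse(tokenize("ENGN8100 and (ENGN8160 or ENGN8260)"))
```

The sketch only handles a fragment made of course codes, parentheses, and conjunctions; a full sentence would first need the split step described in the notes above.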

# capture co-requisites, e.g. (COMP6710 OR COMP6730)
# target: an "or" with a verb to its left but no named entity to its left
# action: treat the phrase as one expression and run a co-requisite check
# turn it into an OR of two expressions: enrolled, completed
"To enrol in this course you must have completed or be currently enrolled in COMP6710 OR COMP6730."
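
The co-requisite rule can be illustrated with a short sketch. It assumes the exact phrase "have completed or be currently enrolled in" and a flat list of course codes joined by OR; the function name expand_corequisite and the output shape are hypothetical, not taken from the repository.

```python
import re

# Assumption: the co-requisite phrase appears with exactly this wording.
COREQ_RE = re.compile(
    r"have completed or be currently enrolled in (?P<courses>.+?)\.?$",
    re.IGNORECASE,
)
# Assumption: course codes look like four letters followed by four digits.
COURSE_RE = re.compile(r"\b[A-Z]{4}\d{4}\b")


def expand_corequisite(sentence):
    """Expand a co-requisite sentence into an OR of 'completed' and 'enrolled'.

    Returns None when the sentence does not match the assumed phrase.
    """
    match = COREQ_RE.search(sentence.strip())
    if match is None:
        return None
    courses = COURSE_RE.findall(match.group("courses"))
    if not courses:
        return None
    return {
        "OR": [
            {"completed": {"OR": list(courses)}},
            {"enrolled": {"OR": list(courses)}},
        ]
    }


result = expand_corequisite(
    "To enrol in this course you must have completed or be currently "
    "enrolled in COMP6710 OR COMP6730."
)
```

A real implementation would presumably detect the verb and entity positions with an NLP pipeline, as the notes describe, rather than rely on a fixed phrase.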

# split by "-" and check the length of the result
# note the verb to the left of ":" and recursively divide the right half into left and right parts
"To enrol in this course you must: 
- be enrolled in the Master of Computing 
- have completed COMP8260 and COMP6442 
- find a project/supervisor; and 
- have an approved 'Independent Study Contract' Incompatible with COMP8715 and COMP8830."
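
The bulleted case can be sketched as follows, assuming each condition sits on its own line starting with "-". The function name split_bulleted_requisite, the returned dictionary keys, and the separate handling of the trailing "Incompatible with" clause are assumptions made for illustration only.

```python
import re

# Assumption: course codes look like four letters followed by four digits.
COURSE_RE = re.compile(r"\b[A-Z]{4}\d{4}\b")


def split_bulleted_requisite(text):
    """Split 'lead: - item - item' text into its lead-in and bullet items."""
    lead, sep, body = text.partition(":")
    if not sep:
        return {"lead": text.strip(), "items": [], "incompatible": []}

    # Assumption: anything after "Incompatible with" is a separate clause.
    body, marker, tail = body.partition("Incompatible with")

    items = []
    for line in body.splitlines():
        line = line.strip()
        if not line.startswith("-"):
            continue
        item = line.lstrip("-").strip()
        # Drop a trailing "; and" / "; or" / ";" that only joins the bullets.
        item = re.sub(r";\s*(?:and|or)?\s*$", "", item, flags=re.IGNORECASE)
        if item:
            items.append(item)

    incompatible = COURSE_RE.findall(tail) if marker else []
    return {"lead": lead.strip(), "items": items, "incompatible": incompatible}


example = (
    "To enrol in this course you must:\n"
    "- be enrolled in the Master of Computing\n"
    "- have completed COMP8260 and COMP6442\n"
    "- find a project/supervisor; and\n"
    "- have an approved 'Independent Study Contract' "
    "Incompatible with COMP8715 and COMP8830."
)
parsed = split_bulleted_requisite(example)
```

Here len(parsed["items"]) gives the length check mentioned in the notes, and parsed["lead"] keeps the text to the left of ":" so its verb can be inspected. Each item could then be parsed further on its own.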

Stage 2: Build Neo4j Graph Database

Neo4j

Edit config_empty.py and rename it to config.py.

To build the Neo4j database, run python graph_builder/graph_builder.py

img/img1.jpg img/img2.jpg img/img3.jpg img/img4.jpg img/img5.jpg

Stage 3: Graph API Endpoint & GraphQL Playground

ANU Programs and Courses Graph API (Unofficial)

Stage 4: WebApp

ANU Programs and Courses Graph Explorer
