Home
merlot-gatherer is a Java library for scraping the MERLOT repository (www.merlot.org). It can also compute some metrics for each of the learning resources gathered (see metrics-detailed-info). It stores all the information in a MySQL Server database that can later be used for analytics.
Created under LGPL license.
- Prerequisites
- How to run the program
- Execution methods
- Information extracted from MERLOT
- Metrics obtained
- More info
You need the following installed to run merlot-gatherer:
- Java Runtime Environment (http://www.java.com/en/download). Built and tested on version 1.6.0_37.
- MySQL Server (http://www.mysql.com/downloads). Tested on version 5.5.8.
You can either use the Java Archive (.jar) file or compile the entire project yourself.
- Using .jar file:
This is the easiest way. Follow these steps:
- Download MerlotCrawler.jar, the content of the "Lib" folder (jericho-html-3.1.jar and mysql-connector-java-5.1.6-bin.jar), the config.xml file from the "Config" folder, and the DatabaseStructure.sql file.
- Put the .jar file into a folder ([.jar Path]), the config.xml file into a subfolder of [.jar Path] named "config", and the content of the "Lib" folder into [Lib folder Path].
- Import the database structure from the file DatabaseStructure.sql into a MySQL Server database. MySQL Server has to be installed on your computer (see Prerequisites). Specify the connection information (database name, user and password) in the config.xml file.
- Open a command line window and run:
java.exe -cp "[Lib folder Path]\jericho-html-3.1.jar;[Lib folder Path]\mysql-connector-java-5.1.6-bin.jar;[.jar Path]\MerlotCrawler.jar" merlotcrawler.Main -executionMethod
Note: -executionMethod is a placeholder for one of "-total", "-onlyMaterials", "-onlyMetrics", or "-updateUsers", depending on the execution method you want to run. See Execution Methods for more information.
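The command above uses ";" between classpath entries, which is the Windows separator; on Linux and macOS the separator is ":". The following is a minimal sketch, not part of merlot-gatherer, that assembles the same command with the separator of the current platform and rejects anything other than the four execution methods listed on this page. The class name MerlotCommandBuilder, its helper methods, and the default folders "lib" and "." are hypothetical; only the jar names, the main class, and the flags come from this page. The sketch only prints the command and does not start the crawler.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: not part of merlot-gatherer.
class MerlotCommandBuilder {

    // The four execution methods documented on this page.
    static final List<String> EXECUTION_METHODS = Arrays.asList(
            "-total", "-onlyMaterials", "-onlyMetrics", "-updateUsers");

    /** Returns true if the argument is one of the documented execution methods. */
    public static boolean isValidExecutionMethod(String arg) {
        return EXECUTION_METHODS.contains(arg);
    }

    /** Joins classpath entries with the given separator (";" on Windows, ":" elsewhere). */
    public static String joinClasspath(List<String> entries, String separator) {
        StringBuilder sb = new StringBuilder();
        for (String entry : entries) {
            if (sb.length() > 0) {
                sb.append(separator);
            }
            sb.append(entry);
        }
        return sb.toString();
    }

    /** Builds the argument list: java -cp <classpath> merlotcrawler.Main <method>. */
    public static List<String> buildCommand(String libDir, String jarDir,
                                            String method, String separator) {
        if (!isValidExecutionMethod(method)) {
            throw new IllegalArgumentException("Unknown execution method: " + method);
        }
        List<String> classpath = new ArrayList<String>();
        classpath.add(new File(libDir, "jericho-html-3.1.jar").getPath());
        classpath.add(new File(libDir, "mysql-connector-java-5.1.6-bin.jar").getPath());
        classpath.add(new File(jarDir, "MerlotCrawler.jar").getPath());

        List<String> command = new ArrayList<String>();
        command.add("java");
        command.add("-cp");
        command.add(joinClasspath(classpath, separator));
        command.add("merlotcrawler.Main");
        command.add(method);
        return command;
    }

    public static void main(String[] args) {
        // Assumed defaults: replace with your own [Lib folder Path] and [.jar Path].
        String libDir = args.length > 0 ? args[0] : "lib";
        String jarDir = args.length > 1 ? args[1] : ".";
        String method = args.length > 2 ? args[2] : "-total";

        // File.pathSeparator is ";" on Windows and ":" on Linux and macOS.
        List<String> command = buildCommand(libDir, jarDir, method, File.pathSeparator);
        System.out.println(command);
    }
}
```

The returned list can be passed to java.lang.ProcessBuilder if you prefer to start the crawler from another Java program instead of copying the printed command into a terminal.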
- Compiling the program by yourself:
Use this option only if you are a developer. We have used NetBeans (https://netbeans.org/download).
Follow these steps:
- Download all the content of the "Lib", "Config" and "src" folders available at: https://github.com/ieru/merlot-gatherer
- Download the file DatabaseStructure.sql from https://github.com/ieru/merlot-gatherer and import it into a MySQL database called "dbcrawlermerlot".
- Specify the connection parameters for the MySQL database (database name, user and password) in the config.xml file from the "Config" folder.
Note: If the Lib files are not available here, you can download them from their maintainers:
- jericho-html-3.1.jar: Available from http://sourceforge.net/projects/jerichohtml/files/
- mysql-connector-java-5.1.6-bin.jar: Available from http://dev.mysql.com/downloads/connector/j/
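Once the database structure has been imported and the connection parameters chosen, it can be useful to confirm that they work before starting a long crawl. The following is a minimal sketch of such a check using plain JDBC. It is not part of merlot-gatherer and does not read config.xml: the class name DatabaseCheck, the host "localhost", the port 3306, and the default user "root" are assumptions to replace with your own values; only the database name "dbcrawlermerlot" comes from this page. It has to be run with mysql-connector-java-5.1.6-bin.jar on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper: not part of merlot-gatherer.
class DatabaseCheck {

    /** Builds a MySQL JDBC URL from host, port and database name. */
    public static String buildJdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // Assumed values: use the same ones you wrote in config.xml.
        String url = buildJdbcUrl("localhost", 3306, "dbcrawlermerlot");
        String user = args.length > 0 ? args[0] : "root";
        String password = args.length > 1 ? args[1] : "";

        Connection connection = null;
        try {
            // Fails with SQLException if the driver is missing from the classpath,
            // the server is not running, or the credentials are wrong.
            connection = DriverManager.getConnection(url, user, password);
            System.out.println("Connected to " + url);
        } catch (SQLException e) {
            System.out.println("Could not connect to " + url + ": " + e.getMessage());
        } finally {
            if (connection != null) {
                try {
                    connection.close();
                } catch (SQLException ignored) {
                    // Nothing useful to do if closing fails.
                }
            }
        }
    }
}
```

The sketch avoids try-with-resources so that it still compiles on the Java 1.6 runtime mentioned in the prerequisites.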
When you run the program, you have to specify a command line argument indicating the execution method. The execution methods available are the following:
* -total: (Obtain Materials + Obtain Metrics) Connects to MERLOT's repository and downloads all the Learning Objects information, inserting the extracted data into the database. After that, it connects to the database to obtain the objects' locations and calculates their web metrics.
* -onlyMaterials: Connects to MERLOT's repository and downloads all the Learning Objects information, inserting the extracted data into the database.
* -onlyMetrics: Connects to the database to obtain the objects' locations and calculates their web metrics. (Note: the database needs to have information before executing this method).
* -updateUsers: Connects to the database to obtain the MERLOT users' data. It compares the information with MERLOT and updates it if required. (Note: the database needs to have information before executing this method).

The program collects the following information from MERLOT's repository:
- Material Information: Including the material's ID, title, type, technical format, location, date added, date modified, author's ID, submitter, description, technical requirements, language, material version, copyright, source code available, accessibility information available, cost involved, creative commons, mobile compatibility, category, primary audiences, and awards.
- Author Information: Including ID, name, organization, and email.
- Comments Information: Including ID, material name, rating, classroom use, user ID, remarks, technical remarks and date added.
- Reviews Information: Including ID, material name, overview, learning goals, author, date added, target student population, prerequisite knowledge or skills, type of material, recommended use, technical requirements, content quality, effectiveness, ease of use, other issues and comments, comments from the author, and peer reviewer's ID.
- Valorations (ratings) Information: Rating and stars granted to the materials in either reviews or comments.
- Users Information: Including ID, name, ribbons, whether the user is an author or peer reviewer, MERLOT awards, categories, type, last login, member since, and the user's collections.
Detailed information about how the metrics are calculated is available at: metrics-detailed-info
Javadoc documentation is available in the zipped file javadoc.zip.