Instructions for replicating the experiments in MillenniumDB
- Clone the MillenniumDB repository at the path_query_challenge branch:
git clone -b path_query_challenge git@github.com:MillenniumDB/MillenniumDB.git
- Follow the instructions in the README to build the project.
- Transform the N-Triples file downloaded from Figshare into the MillenniumDB text format:
python3 scripts/nt_to_mdb.py truthy_direct_properties.nt truthy_direct_properties.mdb
- Execute the bulk loading:
build/Release/bin/create_db truthy_direct_properties.mdb tests/dbs/wikidata
Before executing the benchmark script (scripts/benchmark_mdb.py), edit the following paths so that they match the folders on your machine:
MDB_PATH = f'/home/user/MillenniumDB' # where you cloned MillenniumDB repo
OUTPUT_PATH = f'/home/user/results' # results and logs will be written here, folder must exist
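Since the output folder must already exist, it can help to check both paths before starting a long benchmark run. The following is a minimal sketch and not part of the repository: the helper name check_paths is hypothetical, and the only file it looks for is the create_db binary at the build/Release/bin location used in the bulk-loading step above.

```python
from pathlib import Path


def check_paths(mdb_path, output_path):
    """Return a list of problems found with the configured paths (empty if none)."""
    problems = []
    mdb = Path(mdb_path)
    out = Path(output_path)
    if not mdb.is_dir():
        problems.append(f"MDB_PATH is not a directory: {mdb}")
    # Binary location taken from the bulk-loading command; adjust if your build differs.
    elif not (mdb / "build" / "Release" / "bin" / "create_db").is_file():
        problems.append(f"create_db not found under {mdb}; is the project built?")
    if not out.is_dir():
        problems.append(f"OUTPUT_PATH must exist before running: {out}")
    return problems


if __name__ == "__main__":
    # Example values from the instructions; replace them with your own paths.
    for problem in check_paths("/home/user/MillenniumDB", "/home/user/results"):
        print(problem)
```

An empty result means both folders were found and the project appears to be built.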
Then run the script with the desired parameters:
python3 scripts/benchmark_mdb.py [query_file] [ANY|SIMPLE|TRAILS|ALL|ALL_COUNT] [bfs|dfs] [cache|naive] [btree|trie]
For example:
python3 scripts/benchmark_mdb.py queries/set_I.txt ANY bfs cache btree
- The parameter [ANY|SIMPLE|TRAILS|ALL|ALL_COUNT] selects the semantics of the returned paths. Here ALL stands for ALL SHORTEST. The semantics of ANY depends on the selected algorithm (see [bfs|dfs] below).
- The parameter [btree|trie] selects the data access method. The btree option uses the on-disk data stored in B+ trees and buffered into main memory. The trie option stores the data in main memory using a compact sparse array representation; the trie is constructed on the fly as needed.
- The parameter [cache|naive] determines whether the in-memory representation is reloaded each time a new query is issued (naive) or previously loaded relations are cached (cache).
- The parameter [bfs|dfs] selects the algorithm used to evaluate the queries. Note that bfs combined with ANY returns a single shortest path, while dfs with ANY returns an arbitrary path.
- The dfs option is only considered with ANY, SIMPLE, or TRAILS, because ALL and ALL_COUNT have no DFS implementation.
- The [cache|naive] parameter is only considered when using trie. It has no effect when using btree.
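The restrictions in the last two items mean that some parameter combinations are redundant. As an illustration only, the sketch below enumerates the distinct configurations and builds the corresponding command lines. The function names are hypothetical, the commands assume the repository root as the working directory, and skipping dfs with ALL or ALL_COUNT and naive with btree is an interpretation of the notes above, not behaviour verified against benchmark_mdb.py.

```python
import itertools

SEMANTICS = ["ANY", "SIMPLE", "TRAILS", "ALL", "ALL_COUNT"]
ALGORITHMS = ["bfs", "dfs"]
CACHING = ["cache", "naive"]
ACCESS = ["btree", "trie"]


def distinct_configurations():
    """Yield (semantics, algorithm, caching, access) tuples, skipping redundant ones."""
    for sem, algo, caching, access in itertools.product(
        SEMANTICS, ALGORITHMS, CACHING, ACCESS
    ):
        # ALL and ALL_COUNT have no DFS implementation.
        if algo == "dfs" and sem in ("ALL", "ALL_COUNT"):
            continue
        # cache/naive has no effect with btree, so keep a single representative.
        if access == "btree" and caching == "naive":
            continue
        yield sem, algo, caching, access


def benchmark_command(query_file, sem, algo, caching, access):
    """Build the argument list for one run, in the order shown in the usage line."""
    return ["python3", "scripts/benchmark_mdb.py", query_file, sem, algo, caching, access]


if __name__ == "__main__":
    for config in distinct_configurations():
        print(" ".join(benchmark_command("queries/set_I.txt", *config)))
```

Under these assumptions there are 24 distinct configurations per query file. Each argument list can be passed to subprocess.run to execute the runs one after another.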