Provides a Clojure library for use by the WormBase project.
Features include:
- Model-driven import of ACeDB data into a Datomic database.
  - (Dynamic generation of an isomorphic Datomic schema from an annotated ACeDB models file)
- Conversion of ACeDB database dump files into a Datomic database.
- Routines for parsing and dumping ACeDB "dump files".
- Utility functions and macros for querying WormBase data.
- A command line interface for the utilities described above (via `lein run`).
Requires Java 1.8 (the official Oracle version is preferred).
You will also need to specify which flavour and version of Datomic you want to use in your Leiningen peer project configuration.
Example:
(defproject myproject "0.1-SNAPSHOT"
  :dependencies [[com.datomic/datomic-free "0.9.5359" :exclusions [joda-time]]
                 [wormbase/pseudoace "0.4.4"]])
- Follow the GitFlow mechanism for branching and committing changes:
  - Feature branches should be derived from the `develop` branch, e.g. `git checkout -b feature-x develop`
This project attempts to adhere to the Clojure coding-style conventions.
Run all tests regularly, but in particular:
- before issuing a new pull request
- after checking out a feature-branch
alias run-tests="lein with-profile dev,test do eastwood, test"
run-tests
Other useful leiningen plugins for development include:
kibit recommends idiomatic source code changes.
There is editor support in Emacs, e.g. `M-x kibit-current-file`.
Command line examples:
# whole project
lein with-profile dev kibit
# single file
lein with-profile dev kibit src/pseudoace/core.clj
Reports on subjectively bad code. This tool checks for:
1. "files ending in blank lines"
2. "redefined var roots in source directories"
3. "whether you keep up with your docstrings"
4. arguments colliding with clojure.core functions
Of the above, only items 1, 2 and 3 are generally useful to fix, since item 4 requires creative (short) naming that may not be intuitive for the reader. Use your discretion when choosing to "fix" any "violations" reported in category 4.
Configure leiningen credentials for clojars.
Test your setup by running:
# Ensure you are using `gpg2` and that `gpg-agent` is running.
# Here, gpg is a symbolic link to gpg2
gpg --quiet --batch --decrypt ~/.lein/credentials.clj.gpg
The output should look like (credentials elided):
;; my.datomic.com and clojars credentials
{#"my\.datomic\.com" {:username ...
                      :password ...}
 #"clojars" {:username ...
             :password ...}}
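
The check above can also be scripted. The following is a minimal sketch, assuming the decrypted file has the layout shown above; `has_clojars_credentials` is a hypothetical helper name and not part of leiningen or pseudoace.

```shell
# Sketch only: reads decrypted credentials on stdin and succeeds when an
# entry keyed by #"clojars" is followed by :username and :password keys.
# The helper name is illustrative; the layout is assumed from the example.
has_clojars_credentials() {
    awk '
        index($0, "#\"clojars\"") { seen = 1 }
        seen && /:username/       { user = 1 }
        seen && /:password/       { pass = 1 }
        END { exit !(user && pass) }
    '
}
```

It could be used as `gpg --quiet --batch --decrypt ~/.lein/credentials.clj.gpg | has_clojars_credentials && echo "clojars credentials found"`.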
This process re-uses the leiningen deployment tools:
- Checkout the `develop` branch if not already checked-out.
  - Update the change entries in the CHANGES.md file.
  - Replace "un-released" in the latest version entry with the current date.
  - Change the version from `MAJOR.MINOR.PATCH-SNAPSHOT` to `MAJOR.MINOR.PATCH` in `project.clj`.
  - Commit and push all changes.
- Checkout the `master` branch.
  - Merge the `develop` branch into `master` (via a GitHub pull request or directly using git).
  - Run: `lein deploy`
- Checkout the `develop` branch.
  - Merge the `master` branch back into `develop`.
  - Change the version from `MAJOR.MINOR.PATCH` to `MAJOR.MINOR.PATCH-SNAPSHOT` in `project.clj`.
  - Update `CHANGES.md` with the next version number and a "back to development" stanza, e.g:

        ## 0.3.2 - (unreleased)
        - Nothing changed yet.

    Commit and push these changes, typically with the message: "Back to development"
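
The two version edits in the release steps can be scripted. This is a minimal sketch, assuming the version is the first double-quoted string on the `(defproject ...` line of `project.clj`; `release_version` and `snapshot_version` are hypothetical helper names, not part of this project.

```shell
# Sketch only: toggle the -SNAPSHOT suffix on the defproject line.
# Assumes the version is the first double-quoted string on that line.
# A backup of the original file is left alongside it with a .bak suffix.

release_version() {
    # usage: release_version path/to/project.clj
    # MAJOR.MINOR.PATCH-SNAPSHOT -> MAJOR.MINOR.PATCH
    sed -i.bak -E \
        '/^\(defproject /s/"([0-9]+\.[0-9]+\.[0-9]+)-SNAPSHOT"/"\1"/' "$1"
}

snapshot_version() {
    # usage: snapshot_version path/to/project.clj
    # MAJOR.MINOR.PATCH -> MAJOR.MINOR.PATCH-SNAPSHOT
    sed -i.bak -E \
        '/^\(defproject /s/"([0-9]+\.[0-9]+\.[0-9]+)"/"\1-SNAPSHOT"/' "$1"
}
```

Neither helper chooses the next version number; that remains a manual decision recorded in `CHANGES.md`.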
# GIT_RELEASE_TAG should be the annotated git release tag, e.g:
# GIT_RELEASE_TAG="0.3.2"
#
# If you want to use a local git tag, ensure it matches the version in
# project.clj, e.g:
# GIT_RELEASE_TAG="0.3.2-SNAPSHOT"
#
# LEIN_PROFILE can be any named lein profile (or several, delimited by commas),
# examples:
# LEIN_PROFILE="aws"
# LEIN_PROFILE="mysql"
# LEIN_PROFILE="postgresql"
# LEIN_PROFILE="dev"
git checkout "${GIT_RELEASE_TAG}"
./scripts/bundle-release.sh $GIT_RELEASE_TAG $LEIN_PROFILE
An archive named `pseudoace-$GIT_RELEASE_TAG.tar.gz` will be created in the `release-archives` directory.
The archive contains two artefacts:
tar tvf pseudoace-$GIT_RELEASE_TAG.tar.gz
./pseudoace-$GIT_RELEASE_TAG.jar
./sort-edn-log.sh
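
A quick way to confirm that a bundle contains both artefacts is sketched below. `check_release_archive` is a hypothetical helper, not something shipped with pseudoace, and it assumes the file names shown in the listing above.

```shell
# Sketch only: succeed when the archive lists both expected artefacts.
# The helper name is illustrative and not part of pseudoace.
check_release_archive() {
    # usage: check_release_archive ARCHIVE.tar.gz GIT_RELEASE_TAG
    # Strip any leading "./" so both listing styles are accepted.
    listing=$(tar tzf "$1" | sed 's|^\./||')
    printf '%s\n' "$listing" | grep -Fqx "pseudoace-$2.jar" &&
        printf '%s\n' "$listing" | grep -Fqx "sort-edn-log.sh"
}
```

For example: `check_release_archive "release-archives/pseudoace-$GIT_RELEASE_TAG.tar.gz" "$GIT_RELEASE_TAG"`.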
To comply with the Datomic license, ensure that this tar file, and specifically the jar file it contains, is never distributed to a public server for download, as this would violate the terms of any proprietary Cognitect Datomic license.
A command line utility has been developed for ease of usage:
URL_OF_TRANSACTOR="datomic:dev://localhost:4334/*"
lein run --url $URL_OF_TRANSACTOR <command>
`--url` is a required option for most sub-commands; it should be of the form:
datomic:<storage-backend-alias>://<hostname>:<port>/<db-name>
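
To catch a malformed URL before invoking a sub-command, a shape check along these lines may help. It is a sketch that only tests the pattern given above; `is_datomic_url` is a hypothetical helper, and storage backends whose URLs take a different form would be rejected by it.

```shell
# Sketch only: succeed when the argument looks like
#   datomic:<storage-backend-alias>://<hostname>:<port>/<db-name>
# The helper name is illustrative; nothing is checked beyond the shape.
is_datomic_url() {
    printf '%s\n' "$1" |
        grep -Eq '^datomic:[a-z][a-z0-9-]*://[^/:]+:[0-9]+/[^/]+$'
}
```

For example: `is_datomic_url "$URL_OF_TRANSACTOR" || echo "unexpected URL form" >&2`.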
Alternatively, for extra speed, one can use the Clojure routines directly from a repl session:
# start the repl (Read Eval Print Loop)
lein repl
Example of invoking a sub-command:
(list-databases {:url (System/getenv "URL_OF_TRANSACTOR")})
Run `pseudoace` with the same arguments as you would when using `lein run`:
java -jar pseudoace-$GIT_RELEASE_TAG.jar -v
Create the database and parse .ace dump-files into EDN.
Example:
java -jar pseudoace-$GIT_RELEASE_TAG.jar \
--url $DATOMIC_URL \
--acedump-dir ACEDUMP_DIR \
--log-dir LOG_DIR \
-v prepare-import
The `prepare-import` sub-command:
- Creates a new database at the specified `--url`.
- Converts `.ace` dump-files located in `--acedump-dir` into pseudo EDN files located in `--log-dir`.
- Creates the database schema from the annotated ACeDB models file specified by `--model`.
- Optionally dumps the newly created database schema to the file specified by `--schema-filename`.
The format of the generated files is:
<ace-db-style_timestamp>
The EDN data must be sorted by timestamp in order to preserve the time invariant of Datomic:
find $LOG_DIR \
-type f \
-name "*.edn.gz" \
-exec ./sort-edn-log.sh {} +
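
After sorting, it can be useful to verify the ordering before starting a long import. The sketch below assumes each line of a log file begins with its timestamp, followed by a space, and that the timestamps order correctly when compared as plain byte strings; neither assumption is confirmed here, and `is_sorted_log` is a hypothetical helper rather than part of `sort-edn-log.sh`.

```shell
# Sketch only: succeed when the first space-delimited field of every line
# in a gzipped log is in non-decreasing byte order.
# Assumes the timestamp is that first field; the helper name is illustrative.
is_sorted_log() {
    # usage: is_sorted_log FILE.edn.gz
    gzip -dc "$1" | cut -d' ' -f1 | LC_ALL=C sort -c 2>/dev/null
}
```

For example: `for f in "$LOG_DIR"/*.edn.gz; do is_sorted_log "$f" || echo "unsorted: $f"; done`.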
Transacts the EDN sorted by timestamp in `--log-dir` to the database specified with `--url`:
java -jar pseudoace-$GIT_RELEASE_TAG.jar \
--url $DATOMIC_URL \
--log-dir LOG_DIR \
-v import-logs
Using a full dump of a recent release of WormBase, you can expect the import process to take around 8-12 hours, depending on the platform you run it on.
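
Because the three steps must run in order, they can be chained in a small wrapper. This is a sketch only: `run` and `full_import` are hypothetical names, the wrapper passes `--url` to `import-logs` on the assumption that it is required there, and it expects `sort-edn-log.sh` in the current directory. Setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
# Sketch only: chain prepare-import, sorting and import-logs.
# run and full_import are illustrative names, not part of pseudoace.
run() {
    # Print the command instead of executing it when DRY_RUN=1.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

full_import() {
    # usage: full_import JAR DATOMIC_URL ACEDUMP_DIR LOG_DIR
    jar=$1
    url=$2
    acedump_dir=$3
    log_dir=$4
    # Step 1: create the database and convert .ace dumps to EDN logs.
    run java -jar "$jar" --url "$url" --acedump-dir "$acedump_dir" \
        --log-dir "$log_dir" -v prepare-import || return 1
    # Step 2: sort every generated log by timestamp.
    run find "$log_dir" -type f -name "*.edn.gz" \
        -exec ./sort-edn-log.sh {} + || return 1
    # Step 3: transact the sorted logs into the database.
    run java -jar "$jar" --url "$url" --log-dir "$log_dir" -v import-logs
}
```

For example: `DRY_RUN=1; full_import "pseudoace-$GIT_RELEASE_TAG.jar" "$DATOMIC_URL" ACEDUMP_DIR LOG_DIR` shows the three commands without starting the import.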