Better storage and export for usage events #15

Open
mwoodiupui opened this issue Jul 11, 2024 · 0 comments

Is your feature request related to a problem? Please describe.
Solr is not a good primary store for irreproducible data such as usage events: it is designed to cache data that can be reloaded from elsewhere, and it is awkward to use with external statistical tools. Usage data are a sequence of structured records with multivalued fields whose contents cannot be recreated.

The statistics core grows without bound (unless old records are deleted), quickly becoming the largest of DSpace's Solr cores, and the oldest records are perhaps not worth keeping online.

Describe the solution you'd like
A simple log of usage events. We could use Log4j2 (the same logging framework used for other logging in DSpace) to manage concerns such as file rollover. Each event can be represented in JSON as a single complex object on one line. External statistical tools should be able to ingest such files either directly or with minimal reformatting. Tools such as jq exist to select or transform JSON records, and files can be readily combined using ordinary file tools. Older records can be compressed, and perhaps archived offline.
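A minimal sketch of what the writing side might look like, assuming a dedicated Log4j2 logger named "usage-events" (rollover and compression would be configured on its appender in log4j2.xml) and illustrative event fields; the object is built with JSON-P and emitted as one line:

```java
import java.time.Instant;
import java.util.List;

import javax.json.Json;
import javax.json.JsonArrayBuilder;
import javax.json.JsonObject;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

/** Sketch: emit each usage event as a single-line JSON object. */
public class UsageEventLogger {

    // Hypothetical logger name; rollover and compression would be
    // configured on a RollingFileAppender in log4j2.xml.
    private static final Logger LOG = LogManager.getLogger("usage-events");

    public void logEvent(String type, String objectId, List<String> subjects) {
        // Multivalued fields map naturally onto JSON arrays.
        JsonArrayBuilder subjectArray = Json.createArrayBuilder();
        for (String subject : subjects) {
            subjectArray.add(subject);
        }
        JsonObject event = Json.createObjectBuilder()
                .add("timestamp", Instant.now().toString())
                .add("type", type)
                .add("object", objectId)
                .add("subjects", subjectArray)
                .build();
        // JsonObject.toString() yields compact JSON text on one line.
        LOG.info(event.toString());
    }
}
```

Because each line is a complete JSON document, a command like `jq 'select(.type == "VIEW")' usage-events.log` can filter events directly, and files can be concatenated or split with ordinary tools.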

Describe alternatives or workarounds you've considered
XML is unsuitable because an XML file must be a single document with one top-level element. It is not a good fit for a conceptually unending stream of records; we would need rules for reading a file of events as millions of tiny separate "documents".

YAML would work, but YAML does too much and would probably require a third-party parser such as the endlessly buggy Jackson. JSON is about the right level of complexity, and we can use JSON-P (JSR 353).
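As a rough sketch of the reading side, assuming the one-object-per-line format above and a hypothetical file name, each line parses as an independent document with JSON-P:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Paths;

import javax.json.Json;
import javax.json.JsonObject;
import javax.json.JsonReader;

/** Sketch: each line of the log parses as a standalone JSON document. */
public class UsageEventDump {

    public static void main(String[] args) throws IOException {
        // Hypothetical log file name.
        try (BufferedReader in = Files.newBufferedReader(Paths.get("usage-events.log"))) {
            String line;
            while ((line = in.readLine()) != null) {
                try (JsonReader reader = Json.createReader(new StringReader(line))) {
                    JsonObject event = reader.readObject();
                    System.out.println(event.getString("timestamp")
                            + " " + event.getString("type"));
                }
            }
        }
    }
}
```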

A relational database is a poor fit due to multivalued fields. We'd need either a forest of tables and foreign keys or tricky encoding rules (and have to build parsers for the rule set).

A graph database would work well, but we don't need a DBMS for this. It's just a time series.
