Is your feature request related to a problem? Please describe.
Solr is not a good primary store for irreproducible data such as usage events. It's designed to cache data that can be reloaded from elsewhere. It's hard to use with external statistical tools. Usage data are a sequence of structured records with multivalued fields containing data that cannot be recreated.
The core grows without bound (unless you delete old records), quickly becoming the largest of DSpace's cores, and the older records are perhaps not worth keeping online.
Describe the solution you'd like
A simple log of usage events. We could use Log4J2 (the same logger used for other logging in DSpace) to manage things like file rollover. An event can be represented in JSON as a single complex object on one line. External statistical tools should be able to ingest such files either directly or with minimal reformatting. Tools such as jq exist to select or transform JSON records, and files can be readily combined using ordinary file tools. Older records can be compressed, and perhaps archived offline.
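As a rough sketch of the one-event-per-line idea and of how an external tool might read it back, here is a small Python example. It is not DSpace code: the field names (`time`, `type`, `object`, `collections`) and the helpers `format_event` and `read_events` are hypothetical, chosen only for illustration, and an in-memory buffer stands in for a log file that Log4J2 would roll over.

```python
import io
import json
from datetime import datetime, timezone


def format_event(event: dict) -> str:
    """Serialize one usage event as a single line of JSON.

    json.dumps escapes newlines inside strings, so the result never
    spans more than one line.
    """
    return json.dumps(event, separators=(",", ":"), sort_keys=True)


def read_events(stream):
    """Yield one event dict per non-blank line of a JSON-per-line stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical event shape; a real schema would be decided by the project.
sample_events = [
    {
        "time": datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc).isoformat(),
        "type": "view",
        "object": "item-1",
        "collections": ["col-a", "col-b"],  # multivalued field kept as a list
    },
    {
        "time": datetime(2024, 1, 15, 12, 5, tzinfo=timezone.utc).isoformat(),
        "type": "download",
        "object": "bitstream-7",
        "collections": ["col-a"],
    },
]

# Stand-in for the log file: append one line per event.
log = io.StringIO()
for event in sample_events:
    log.write(format_event(event) + "\n")

# An external tool can select records line by line, much as jq would.
log.seek(0)
views = [e for e in read_events(log) if e["type"] == "view"]
```

Because each record is self-contained, a reader never needs to hold more than one line in memory, and a multivalued field stays a plain JSON array.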
Describe alternatives or workarounds you've considered
XML is unsuitable because an XML file must be a single document with exactly one root element, which cannot be closed until the last record has been written. That is a poor fit for a conceptually unending stream of records. We would need rules for reading a file of events as millions of tiny separate "documents".
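The difference shows up as soon as two files are combined. The following Python sketch (with made-up records, not a proposed schema) concatenates two one-line JSON logs and two one-element XML documents, then tries to parse each result with the standard library.

```python
import json
import xml.etree.ElementTree as ET

# Two tiny "log files" in one-JSON-object-per-line form (illustrative records).
json_a = '{"type":"view"}\n'
json_b = '{"type":"download"}\n'

# Concatenation yields another valid file of the same form.
combined = json_a + json_b
events = [json.loads(line) for line in combined.splitlines() if line]

# Two well-formed XML documents, one event each.
xml_a = '<event type="view"/>'
xml_b = '<event type="download"/>'

# Concatenating them gives two root elements, which is not a well-formed
# XML document, so the parser rejects it.
try:
    ET.fromstring(xml_a + "\n" + xml_b)
    xml_concat_ok = True
except ET.ParseError:
    xml_concat_ok = False
```

Combining XML logs would therefore need a wrapper element or a custom splitting rule, whereas the JSON lines can simply be appended with ordinary file tools.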
YAML would work, but YAML does too much and probably requires a third-party parser such as the endlessly-buggy Jackson. JSON is about the right level of complexity and we can use JSON-P (JSR 353).
A relational database is a poor fit due to multivalued fields. We'd need either a forest of tables and foreign keys or tricky encoding rules (and have to build parsers for the rule set).
A graph database would work well, but we don't need a DBMS for this. It's just a time series.