title | short | language | page |
---|---|---|---|
Avram Specification | Avram | en | true |
Avram is a schema language for field-based data formats such as key-value records or library formats MARC and PICA.
- author: Jakob Voß
- version: 0.9.6
- date: 2024-01-19
MARC and related formats such as PICA and MAB have been used for decades as the basis for library automation. Several variants, dialects and profiles exist for different applications. The Avram schema language allows individual formats to be specified for documentation, validation, and requirements engineering. The schema language is named after Henriette D. Avram (1919-2006) who devised MARC as the first automated cataloging system in the 1960s.
The Avram specification consists of a schema format based on JSON and validation rules to validate records against individual schemas. The format can also be used to express results of record analysis. Avram schemas cover library formats based on MARC and PICA as well as simple key-value structures.
The document is managed in a git repository at https://github.com/dini-ag-kim/avram together with test files for implementations.
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
A string is a sequence of Unicode code points.
A single character is a string consisting of exactly one Unicode code point.
A timestamp is a date or datetime as defined with XML Schema datatype dateTime (`-?YYYY-MM-DDThh:mm:ss(\.s+)?(Z|[+-]hh:mm)?`), date (`-?YYYY-MM-DD(Z|[+-]hh:mm)?`), gYearMonth (`-?YYYY-MM`), or gYear (`-?YYYY`).
A regular expression is a non-empty string that conforms to the ECMA 262 (2015) regular expression grammar. The expression is interpreted as Unicode pattern with `.` matching all characters, including newlines.
A language is a natural language identifier as defined with XML Schema datatype language.
A non-negative integer is a natural number (0, 1, 2, ...).
A URI is a valid URI string according to RFC 3986.
An IRI reference is a non-empty string matching the regular expression `` ^[^\x00-\x20<>"{}|^`\\]+$ `` and conforming to the syntax of IRI as defined in RFC 3987.
A URL is a URI starting with `http://` or `https://`.
A range is a sequence of digits or a sequence of digits followed by a dash (`-`) and a second sequence of digits. The second sequence, if given, SHOULD have the same length as the first. The numeric values of each sequence are called start number and end number, respectively. The end number, if given, MUST be larger than the start number. Examples of valid ranges include `0`, `00`, `3-7`, `03-12`, and `1-09` but not `7-2`. A string matches a range if it is a sequence of digits of the same length as the longest sequence in the range and its numerical value is equal to or within the start number and the end number of the range. For instance `7` matches range `0-9` but it does not match `1-3` nor `03-10`, and `07` matches `03-10` but not `0-9`.
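The definition of ranges and range matching can be sketched in a few lines of Python. This is only an illustration, not part of the specification; the function names `parse_range` and `matches_range` are made up for this example.

```python
import re

# One digit sequence, optionally followed by a dash and a second digit sequence.
RANGE_RE = re.compile(r"([0-9]+)(?:-([0-9]+))?")


def parse_range(text):
    """Return (start number, end number, match length) or raise ValueError."""
    m = RANGE_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a range: {text!r}")
    first, second = m.group(1), m.group(2)
    start = int(first)
    end = int(second) if second is not None else start
    if second is not None and end <= start:
        raise ValueError(f"end number must be larger than start number: {text!r}")
    # Strings can only match if they are as long as the longest digit sequence.
    length = max(len(first), len(second or ""))
    return start, end, length


def matches_range(value, range_text):
    """Check whether a string matches a range as defined above."""
    start, end, length = parse_range(range_text)
    if len(value) != length or re.fullmatch(r"[0-9]+", value) is None:
        return False
    return start <= int(value) <= end
```

With these helpers, `matches_range("07", "03-10")` holds while `matches_range("7", "03-10")` does not, because the value must have the length of the longest sequence in the range.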
Avram schemas are used to validate and analyze records. A record is a non-empty sequence of fields, each consisting of a tag, being a non-empty string and
- either a flat field value, being a string,
- or a non-empty sequence of subfields, each being a pair of subfield code (being a single character) and subfield value (being a string).
Fields with subfields, also called variable fields, MAY also have
- either two indicators, each being a single character,
- or an occurrence, being a sequence of two digits with positive numerical value (`01`, `02`, ... `99`).
In addition, each record has a set of record types, each being a non-empty string. By default this set is empty and applications MAY choose to not support record types at all (see validation with record types).
The record model can further be restricted by a format family.
The encoding of records in JSON or other individual serialization formats such as MARCXML, ISO 2709, or PICA JSON is out of the scope of this specification.
- Possible JSON serialization of a record of type `test` with two flat fields with occurrence and one field with three subfields of code `g`, `g`, and `s`:
{
  "types": [ "test" ],
  "fields": [
    { "tag": "uri", "occurrence": "01", "value": "http://www.wikidata.org/entity/Q10953" },
    { "tag": "uri", "occurrence": "02", "value": "https://viaf.org/viaf/18236820" },
    { "tag": "name", "subfields": [ "g", "Henriette", "g", "Davidson", "s", "Avram" ] }
  ]
}
The record model can be restricted by a format family, identified by a non-empty string. The following format families are part of this specification:
- `flat`: all fields are flat without indicators or occurrences (simple key-value structures with repeatable keys)
- `marc`: flat fields have no indicators or occurrences, variable fields have no occurrences and exactly two indicators, each being a lowercase alphanumeric character or a space character (`a` to `z`, `0` to `9`, and space). Field tags consist of three digits or the string `LDR`.
- `pica`: all fields are variable without indicators. Field tags consist of four characters being a digit `0`, `1`, or `2`, followed by two digits, followed by an uppercase letter `A` to `Z` or `@`.
- `mab`: fields have one indicator and no occurrences. Field tags consist of three digits.
Restrictions on records by a format family imply restrictions on schemas for this format family.
- Possible JSON serializations of records of family `flat`, `marc`, and `pica`, respectively:
{
  "fields": [
    { "tag": "given", "value": "Henriette" },
    { "tag": "given", "value": "Davidson" },
    { "tag": "surname", "value": "Avram" },
    { "tag": "birth", "value": "1919-10-07" }
  ]
}
{
  "types": ["z"],
  "fields": [
    { "tag": "LDR", "value": "00000nz a2200000oc 4500" },
    { "tag": "001", "value": "1089521669" },
    { "tag": "100", "indicator1": "1", "indicator2": " ", "subfields": [ "a", "Avram, Henriette D.", "d", "1919-2006" ] }
  ]
}
{
  "fields": [
    { "tag": "003U", "subfields": [ "a", "http://d-nb.info/gnd/1089521669" ] },
    { "tag": "028A", "subfields": [ "d", "Henriette D.", "a", "Avram" ] },
    { "tag": "060R", "subfields": [ "a", "1919", "b", "2006", "4", "datl" ] }
  ]
}
An Avram Schema is a JSON object given as serialized JSON document or any other format that encodes a JSON document. In contrast to RFC 7159, all object keys MUST be unique. String values SHOULD NOT be the empty string. Applications MAY remove keys with empty string value.
A schema MUST contain key `fields` with a field schedule.
A schema SHOULD contain keys documenting the format defined by the schema:
- `title` with the name of the format
- `description` with a short description of the format
- `family` with a format family
- `url` with a homepage URL of the format
- `uri` with a URI to uniquely identify the format
- `language` with the language values of keys `title`, `description`, and `label` used throughout the schema. Its value SHOULD be assumed as `und` if not specified.
The schema MAY contain keys:
- `$schema` with a URL of the Avram metaschema
- `codelists` with a codelist directory
- `rules` with external validation rules
- `records` with a non-negative integer to indicate a number of records
- `created` with a timestamp when this schema was created
- `modified` with a timestamp when this schema was updated
{
"fields": { },
"title": "MARC 21 Format for Classification Data",
"description": "MARC format for classification numbers and captions associated with them",
"url": "https://www.loc.gov/marc/classification/",
"uri": "http://format.gbv.de/marc/classification",
"language": "en",
"$schema": "https://format.gbv.de/schema/avram/schema.json"
}
A field schedule is a JSON object that maps field identifiers to field definitions.
{
"010": { "label": "Library of Congress Control Number" },
"084": { "label": "Classification Scheme and Edition" }
}
Field identifiers of a field schedule MUST NOT overlap. Two field identifiers overlap when it is possible to match a field with both.
A field identifier is a non-empty string that can be used to match fields. The identifier consists of a tag, optionally followed by the slash (`/`) and
- either a field occurrence, being a range of two-digit sequences except the single sequence of two zeroes (`00`),
- or the dollar character (`$`) followed by small letter x (`x`) and a field counter, being a range of one-digit or two-digit sequences (`0`, `0-1` ..., `00`, `00-01` ..., `98-99`).
Applications MAY further allow a tag followed by the slash and two zeroes (`/00`) as alias for a bare tag.
A field matches a field identifier if the tag of the field is equal to the tag of the field identifier, and
- the field has no occurrence and the field identifier has no field occurrence nor field counter,
- or the occurrence of the field matches the range of the field occurrence,
- or the first subfield value of subfield with subfield code `x` matches the range of the field counter.
- `LDR`, `001`, `850` ... (MARC)
- `021A`, `045Q/01`, `028B/01-02`, `209K`, `209A/$x00-09`, `247A/$x0` ... (PICA)
- `001`, `100`, `805` ... (MAB)
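Matching of fields against field identifiers could be implemented roughly as follows. The sketch assumes that a field is given as a Python dictionary with keys `tag`, `occurrence`, and `subfields` (a flat list of alternating codes and values, as in the JSON examples of this document), and that tags contain no slash. The names `parse_identifier`, `in_range`, and `field_matches` are hypothetical, and the exclusion of occurrence `00` is not checked.

```python
import re

# Tag, optionally followed by "/" and an occurrence range or "$x" plus a counter range.
IDENTIFIER_RE = re.compile(
    r"(?P<tag>[^/]+)"
    r"(?:/(?:(?P<occurrence>[0-9]{2}(?:-[0-9]{2})?)"
    r"|\$x(?P<counter>[0-9]{1,2}(?:-[0-9]{1,2})?)))?"
)


def parse_identifier(identifier):
    m = IDENTIFIER_RE.fullmatch(identifier)
    if m is None:
        raise ValueError(f"not a field identifier: {identifier!r}")
    return m.groupdict()


def in_range(value, range_text):
    """Range matching: same length as the longest sequence, value within bounds."""
    first, _, second = range_text.partition("-")
    length = max(len(first), len(second))
    if value is None or len(value) != length or re.fullmatch(r"[0-9]+", value) is None:
        return False
    return int(first) <= int(value) <= int(second or first)


def field_matches(field, identifier):
    parts = parse_identifier(identifier)
    if field["tag"] != parts["tag"]:
        return False
    if parts["occurrence"] is not None:
        return in_range(field.get("occurrence"), parts["occurrence"])
    if parts["counter"] is not None:
        subfields = field.get("subfields", [])
        # Only the first subfield with code "x" is taken into account.
        for code, value in zip(subfields[0::2], subfields[1::2]):
            if code == "x":
                return in_range(value, parts["counter"])
        return False
    # Bare tag: the field must not have an occurrence.
    return field.get("occurrence") is None
```

For example, a field with tag `209A` and subfields `["a", "1", "x", "05"]` matches the identifier `209A/$x00-09`, while a field `028B` with occurrence `03` does not match the bare identifier `028B`.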
A field definition is a JSON object that SHOULD contain keys:
- `tag` with the tag of the field
- `label` with the name of the field
- `repeatable` with a boolean value, assumed as `false` by default
- `required` with a boolean value, assumed as `false` by default
The field definition MAY further contain keys:
- `occurrence` with the field occurrence of the field
- `counter` with the field counter of the field
- `url` with a URL link to documentation of the field
- `description` with a description of the field
- `comment` with an additional comment about the field
- `indicator1` with first indicator definition or `null` as placeholder for `{"codes":{" ":{}}}`
- `indicator2` with second indicator definition or `null` as placeholder for `{"codes":{" ":{}}}`
- `pica3` with corresponding Pica3 number
- `created` with a timestamp when this field was introduced
- `modified` with a timestamp when this field was changed
- `deprecated` with a boolean value, assumed as `false` by default
- `positions` with a specification of positions (for flat fields)
- `pattern` with a regular expression (for flat fields)
- `groups` with pattern groups of the regular expression (for flat fields)
- `codes` with a codelist
- `subfields` with a subfield schedule (for variable fields)
- `rules` with external validation rules
- `total` with a non-negative integer to indicate the number of times this field has been found
- `records` with a non-negative integer to indicate the number of records this field has been found in
- `types` with a JSON object that maps record types to typed field definitions.
A typed field definition is a JSON object with optional keys `positions`, `pattern`, `groups`, `codes`, `label`, `description`, and `url`, each defined identical to keys of same name allowed in a field definition (see validation with record types).
If a field definition is given in a field schedule, each of `tag`, `occurrence` and `counter` MUST either be missing or have the same value as used to construct the corresponding field identifier.
If a field definition contains key `subfields`, indicating a variable field, it MUST NOT contain keys for flat fields (`positions`, `pattern` and/or `codes`).
Applications MAY allow and remove `occurrence` keys with value two zeroes (`00`) as alias for a field definition without occurrence.
- MARC field `240` specified as mandatory and non-repeatable:
{
  "tag": "240",
  "label": "Uniform Title",
  "url": "https://www.loc.gov/marc/bibliographic/bd240.html",
  "required": true,
  "repeatable": false,
  "modified": "2017-12"
}
- PICA field `045B/02` in K10plus format:
{
  "tag": "045B",
  "occurrence": "02",
  "pica3": "5022",
  "label": "Systematik für Bibliotheken (SfB)",
  "repeatable": true,
  "subfields": {
    "a": { "label": "Notation", "repeatable": true },
    "A": { "label": "Quelle", "repeatable": true }
  }
}
-
MARC field
007
with
Subfield values and flat field values can be specified with positions, being a JSON object that maps character positions to data element definitions. A character position is a range. It is RECOMMENDED to use sequences of two digits.
A data element definition is a JSON object that SHOULD contain keys:
- `label` with the name of the data element
- `start` with the start number of the character position
- `end` with the end number of the character position or the start number if there is no end number
The data element definition MAY further contain keys:
- `url` with a URL link to documentation
- `description` with additional description
- `codes` with a codelist with codes of length defined by the character position range
- `flags` with a codelist with codes of same length being a proper divisor of the length of the character position range
- `pattern` with a regular expression
- `groups` with pattern groups of the regular expression
Character positions of a positions object MUST NOT overlap. Two character positions overlap if there is a string that matches both of them.
- Positions in MARC 21 Bibliographic field `005`:
{
  "00-03": { "label": "year", "start": 0, "end": 3 },
  "04-05": { "label": "month", "start": 4, "end": 5 },
  "06-07": { "label": "day", "start": 6, "end": 7 },
  "08-09": { "label": "hour", "start": 8, "end": 9 },
  "10-11": { "label": "minute", "start": 10, "end": 11 },
  "12-15": { "label": "second", "start": 12, "end": 15 }
}
- Position `33-35` in MARC 21 Bibliographic field `008` can hold a combination of flags, filled up with spaces:
{
  "33-35": {
    "flags": {
      " ": { "label": "No specified special format characteristics" },
      "e": { "label": "Manuscript" },
      "j": { "label": "Picture card, post card" },
      "k": { "label": "Calendar" },
      "l": { "label": "Puzzle" },
      "n": { "label": "Game" },
      "o": { "label": "Wall map" },
      "p": { "label": "Playing cards" },
      "r": { "label": "Loose-leaf" },
      "z": { "label": "Other" },
      "|": { "label": "No attempt to code" }
    },
    "pattern": "^[^ ]* *$"
  }
}
Pattern groups are a JSON object that maps numbers of capturing groups (starting with `"1"`) of a regular expression to documentation objects, each with optional keys:
- `label` with the name of the group
- `description` with additional description of the group
- `url` with a URL link to documentation of the group
{
"pattern": "^([0-9]{4})-([01][0-9])-([0-3][0-9])$",
"groups": {
"1": { "label": "year" },
"2": { "label": "month" },
"3": { "label": "day" }
}
}
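As an illustration of how pattern groups might be used by an application, the following sketch labels the captured parts of a value. It uses Python's `re` module, which only approximates the ECMA 262 grammar required by this specification (for instance `$` also matches before a trailing newline in Python); the function name `labelled_groups` is an assumption of this example.

```python
import re

definition = {
    "pattern": "^([0-9]{4})-([01][0-9])-([0-3][0-9])$",
    "groups": {
        "1": {"label": "year"},
        "2": {"label": "month"},
        "3": {"label": "day"},
    },
}


def labelled_groups(definition, value):
    """Map group labels to captured substrings, or return None on mismatch."""
    # re.DOTALL mirrors the rule that "." also matches newlines.
    match = re.search(definition["pattern"], value, re.DOTALL)
    if match is None:
        return None
    groups = definition.get("groups", {})
    return {
        # Fall back to the group number if no label is documented.
        doc.get("label", number): match.group(int(number))
        for number, doc in groups.items()
        if int(number) <= match.re.groups
    }
```

Calling `labelled_groups(definition, "1919-10-07")` yields the year, month, and day parts keyed by their labels.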
A subfield schedule is a JSON object that maps subfield codes to subfield definitions. A subfield code is a single character. A subfield definition is a JSON object that SHOULD contain keys:
- `code` with the subfield code
- `label` with the name of the subfield
- `repeatable` with a boolean value, assumed as `false` by default
- `required` with a boolean value, assumed as `false` by default
The subfield definition MAY further contain keys:
- `pattern` with a regular expression
- `groups` with pattern groups of the regular expression
- `positions` with a specification of positions
- `codes` with a codelist
- `rules` with external validation rules
- `url` with a URL link to documentation
- `description` with a description of the subfield
- `comment` with an additional comment about the subfield
- `pica3` with a corresponding Pica3 syntax definition
- `created` with a timestamp when this subfield was introduced
- `modified` with a timestamp when this subfield was updated
- `deprecated` with a boolean value, assumed as `false` by default
- `total` with a non-negative integer to indicate the number of times this subfield has been found
- `records` with a non-negative integer to indicate the number of records this subfield has been found in
The subfield definition MAY but SHOULD NOT contain an additional, deprecated key `order` with a non-negative integer used to specify a partial or complete order of subfields.
- Subfield schedule for MARC 21 bibliographic field `250` (Edition Statement):
{
  "a": { "label": "Edition statement", "repeatable": false, "pattern": "\\.$" },
  "b": { "label": "Remainder of edition statement", "repeatable": false },
  "3": { "label": "Materials specified", "repeatable": false },
  "6": { "label": "Field link and sequence number", "repeatable": true }
}
An indicator definition is a JSON object that SHOULD contain key `label` with the name of the indicator and further MAY contain keys:
- `url` with a URL link to documentation
- `description` with additional description of the indicator
- `codes` with a codelist of single character codes
- `pattern` with a regular expression
- `groups` with pattern groups of the regular expression
{
"label": "Type",
"codes": {
" ": "Abbreviated key title",
"0": "Other abbreviated title"
}
}
A codelist is
- either a JSON object that maps codes to code definitions (explicit codelist)
- or a non-empty string that SHOULD be an URI (codelist reference).
A code is a non-empty string. A code definition is either a string or a JSON object with optional keys:
- `code` with the code
- `label` with the name of the code
- `description` with additional description of the code
- `created` with a timestamp when this code was introduced
- `modified` with a timestamp when this code was updated
- `url` with a link to documentation of the code
- `deprecated` with a boolean value, assumed as `false` by default
Optional key `code` of a code definition MUST be equal to the key of the code definition in its codelist.
A code definition being a string MUST be treated identical to a code definition being a JSON object with only key `label` having the value of the string.
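The equivalence of string and object code definitions suggests a normalization step before validation. A minimal sketch, assuming an explicit codelist is available as a Python dictionary; the function name `normalize_codelist` is hypothetical:

```python
def normalize_codelist(codelist):
    """Expand string code definitions and check the optional key "code"."""
    normalized = {}
    for code, definition in codelist.items():
        if isinstance(definition, str):
            # A plain string is shorthand for an object with only key "label".
            definition = {"label": definition}
        if "code" in definition and definition["code"] != code:
            raise ValueError(f"key 'code' differs from codelist key {code!r}")
        normalized[code] = dict(definition)
    return normalized
```

After normalization every code definition is a JSON object, so later checks (for instance for key `deprecated`) do not need to distinguish both forms.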
A codelist directory is a JSON object that maps codelist references to JSON objects each having at least the mandatory key `codes` with a codelist and optional keys:
- `title` with a name of the codelist
- `description` with additional description of the codelist
- `created` with a timestamp when this codelist was introduced
- `modified` with a timestamp when this codelist was updated
- `url` with a homepage URL or link to documentation of the codelist
A codelist reference can be resolved by looking up its value as key in the codelist directory to get the corresponding explicit codelist.
- Explicit codelist, codelist reference, and codelist directory:
{ " ": "No specified type", "a": { "label": "Archival", "created": "2022" }, "x": { "code": "x" } }
"http://id.loc.gov/vocabulary/languages"
{ "http://id.loc.gov/vocabulary/languages": { "title": "MARC List for Languages", "codes": { "eng": { "label": "English" }, "fre": { "label": "French" } } } }
An Avram Schema MAY include references to additional validation rules with key `rules` at the root level, in field definitions, and in subfield definitions to check additional data types or integrity constraints. The value of this key MUST be an array. Each element of this array MUST be
- either a rule identifier, being an IRI reference that MUST NOT be equal to names of validation rules,
- or a JSON object with arbitrary contents. The object SHOULD include a key `class` having an IRI reference as value to specify the kind of rule.
{
"fields": {
"birth": {
"subfields": {
"Y": { "label": "year" },
"M": { "label": "month" },
"D": { "label": "day" }
},
"rules": ["http://example.org/valid-date"]
},
"death": {
"subfields": {
"Y": { "label": "year" },
"M": { "label": "month" },
"D": { "label": "day" }
},
"rules": ["http://example.org/valid-date"]
},
"age": {
"rules": ["xsd:nonNegativeInteger"]
}
},
"rules": [
{
"class": "conditional-rule",
"if": "birth?",
"then": "birth.Y < 1950",
"description": "birth only allowed before 1950 for privacy reasons"
}
]
}
A format family restricts the model of records that can be described by an Avram schema. Known values of schema key `family` imply restrictions on field identifiers and field definitions.
Family `flat`: Field identifiers are plain tags. Field definitions MUST NOT include keys `occurrence`, `counter`, `indicator1`, `indicator2`, or `subfields`.
Family `marc`: Field identifiers are plain tags and MUST either be the string `LDR` or three digits. Field definitions MUST NOT include keys `occurrence` or `counter`. Field definitions of flat fields MUST NOT have keys `indicator1` or `indicator2`.
Family `pica`: Field identifiers MUST NOT include a field counter if its tag starts with digit `0` or `1` and MUST NOT include a field occurrence if its tag starts with digit `2`. Tags MUST match the regular expression `^[012][0-9][0-9][A-Z@]$`. Field definitions MUST NOT include keys `indicator1` or `indicator2`.
Family `mab`: Field identifiers are plain tags and MUST consist of exactly three digits. Field definitions MUST NOT include keys `indicator2`, `occurrence`, or `counter`.
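The family restrictions on schemas could be checked along the following lines. This is a rough sketch rather than a replacement for the metaschema: the tables `TAG_PATTERNS` and `FORBIDDEN_KEYS` and the function `family_errors` are names invented for this example, and treating a definition without key `subfields` as a flat field is an assumption.

```python
import re

# Tag syntax per format family, taken from the restrictions above.
TAG_PATTERNS = {
    "marc": re.compile(r"LDR|[0-9]{3}"),
    "pica": re.compile(r"[012][0-9][0-9][A-Z@]"),
    "mab": re.compile(r"[0-9]{3}"),
}

# Keys that field definitions of a family must not include.
FORBIDDEN_KEYS = {
    "flat": {"occurrence", "counter", "indicator1", "indicator2", "subfields"},
    "marc": {"occurrence", "counter"},
    "pica": {"indicator1", "indicator2"},
    "mab": {"indicator2", "occurrence", "counter"},
}


def family_errors(schema):
    """Return a list of messages for violated format family restrictions."""
    family = schema.get("family")
    errors = []
    for identifier, definition in schema.get("fields", {}).items():
        if family == "pica":
            tag, _, suffix = identifier.partition("/")
        else:
            tag, suffix = identifier, ""  # identifiers are plain tags
        pattern = TAG_PATTERNS.get(family)
        if pattern is not None and pattern.fullmatch(tag) is None:
            errors.append(f"{identifier}: tag not allowed in family {family}")
        forbidden = FORBIDDEN_KEYS.get(family, set()) & set(definition)
        if family == "marc" and "subfields" not in definition:
            # Assumption: a definition without subfields describes a flat field.
            forbidden |= {"indicator1", "indicator2"} & set(definition)
        for key in sorted(forbidden):
            errors.append(f"{identifier}: key {key} not allowed in family {family}")
        if suffix.startswith("$x") and tag[:1] in ("0", "1"):
            errors.append(f"{identifier}: field counter not allowed for this tag")
        if suffix and not suffix.startswith("$x") and tag[:1] == "2":
            errors.append(f"{identifier}: field occurrence not allowed for this tag")
    return errors
```

An empty result means that no restriction covered by the sketch is violated; schemas without key `family` pass unchecked.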
A JSON Schema to validate Avram Schemas is available at https://format.gbv.de/schema/avram/schema.json.
Applications MAY extend the metaschema for particular format families and formats, for instance by further restriction of the allowed set of field identifiers.
Avram schemas can be used to validate records based on validation rules specified in this section (marked in bold and numbered from 1 to 23). Rules 1 to 19 refer to validation of individual records, fields, and subfields. Rules 20 to 22 refer to validation of sets of records. Rule 23 can refer to both.
An Avram validator MAY choose to support only a limited set of validation rules. It SHOULD allow enabling and disabling selected rules and it MAY disable selected rules by default. It is RECOMMENDED to disable counting rules (20 to 22) and external rules (23) by default. Support and selection of validation rules MUST be documented.
An Avram validator MAY limit validation to selected format families.
- invalidRecord: A set of records is valid against a schema, if all of its records pass record validation against the field schedule of the schema.
A record is valid against a field schedule if the following rules are met and every field passes field validation against its corresponding field definition from the field schedule. If rule undefinedField is disabled, fields without corresponding field definition are assumed to be valid.
- undefinedField: Every field matches a field identifier in the field schedule.
- deprecatedField: The matched field definition must not have key `deprecated` set to `true`.
- nonrepeatableField: The record does not contain more than one field matching the same field definition with `repeatable` being `false`.
- missingField: The record contains at least one field for each field definition with `required` being `true`.
A field is valid against a field definition if the following rules are met:
- invalidFieldValue: If the field is a flat field, its field value must be valid by value validation and by validation with record types.
- invalidIndicator: If the field contains indicators, their values must be valid by value validation against the corresponding indicator definition `indicator1` (first indicator) and `indicator2` (second indicator).
If the field is a variable field:
- undefinedSubfield: Every subfield has a corresponding subfield definition.
- deprecatedSubfield: The matched subfield definition must not have key `deprecated` set to `true`.
- nonrepeatableSubfield: For subfield definitions with `repeatable` being `false`, the field MUST NOT contain more than one subfield with the corresponding subfield code.
- missingSubfield: For subfield definitions with `required` being `true`, the field MUST contain at least one subfield with the corresponding subfield code.
- invalidSubfieldValue: Every subfield value is valid by value validation against its corresponding subfield definition.
Tag and occurrence of a field are not included in field validation as they are part of record validation.
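The structural rules of record and field validation can be sketched as follows. To keep the example short it assumes that all field identifiers are plain tags (no occurrences or counters), that records are dictionaries shaped like the JSON examples in this document, and that value validation is handled elsewhere. The function names are hypothetical.

```python
from collections import Counter


def validate_subfields(tag, subfields, schedule):
    """Check subfield rules for one variable field."""
    errors = []
    counts = Counter(subfields[0::2])  # codes sit at even indexes
    for code, count in counts.items():
        definition = schedule.get(code)
        if definition is None:
            errors.append(("undefinedSubfield", f"{tag}${code}"))
            continue
        if definition.get("deprecated", False):
            errors.append(("deprecatedSubfield", f"{tag}${code}"))
        if count > 1 and not definition.get("repeatable", False):
            errors.append(("nonrepeatableSubfield", f"{tag}${code}"))
    for code, definition in schedule.items():
        if definition.get("required", False) and counts[code] == 0:
            errors.append(("missingSubfield", f"{tag}${code}"))
    return errors


def validate_record(record, field_schedule):
    """Return (rule, location) tuples; an empty list means no rule was violated."""
    errors = []
    seen = Counter()
    for field in record["fields"]:
        tag = field["tag"]
        definition = field_schedule.get(tag)
        if definition is None:
            errors.append(("undefinedField", tag))
            continue
        seen[tag] += 1
        if definition.get("deprecated", False):
            errors.append(("deprecatedField", tag))
        if "subfields" in field:
            schedule = definition.get("subfields", {})
            errors.extend(validate_subfields(tag, field["subfields"], schedule))
    for tag, definition in field_schedule.items():
        if seen[tag] > 1 and not definition.get("repeatable", False):
            errors.append(("nonrepeatableField", tag))
        if seen[tag] == 0 and definition.get("required", False):
            errors.append(("missingField", tag))
    return errors
```

Returning rule names instead of raising exceptions makes it straightforward to enable or disable individual rules by filtering the result.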
A value (given as string), is valid if it conforms to a definition (given as field definition, subfield definition, indicator definition, or data element definition) by meeting the following rules:
- patternMismatch: If the definition contains key `pattern`, the value must match its regular expression. The pattern is not anchored by default, so `^` and/or `$` must be included to match start and/or end of the value.
- invalidPosition: If the definition contains key `positions`, the value must be valid against its positions.
If the definition contains key `codes`, the value must further be valid against its codelist (see corresponding rules below).
A value is always valid if the definition contains none of the keys `pattern`, `positions`, and `codes`.
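A minimal sketch of value validation, limited to `pattern` and explicit codelists (positions and codelist references are left out). Python's `re` module is used as a stand-in for an ECMA 262 engine, which is an approximation; `value_errors` is a name chosen for this example.

```python
import re


def value_errors(value, definition):
    """Return names of violated rules for a string value."""
    errors = []
    pattern = definition.get("pattern")
    # re.search is unanchored, like patterns in this specification;
    # re.DOTALL lets "." match newlines as well.
    if pattern is not None and re.search(pattern, value, re.DOTALL) is None:
        errors.append("patternMismatch")
    codes = definition.get("codes")
    if isinstance(codes, dict) and value not in codes:
        errors.append("undefinedCode")
    return errors
```

With the pattern `\.$` from the subfield schedule example for field `250`, the value `2nd ed.` passes while `2nd ed` is reported as `patternMismatch`.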
Record types are arbitrary strings attached to a record as flags. An Avram validator SHOULD support the following rule to enable additional validation depending on record types.
- recordTypes: If a field definition contains key `types` with a JSON object, the record types of a record must be looked up in this object to corresponding typed field definitions. All resulting typed field definitions must then be used for additional value validation with their keys `pattern`, `positions`, and `codes`.
- A MARC 21 Bibliographic record has a type of material such as Book (`BK`) and Visual Material (`VM`) and a category of material such as Text (`t`) and Motion Picture (`m`). Both can be encoded as Avram record types (not to be confused with MARC 21 record types). For instance a record may have the two types `BK` and `t`. An Avram Schema of MARC 21 Bibliographic format could support these types by including the following in definition of field `008` and `007`:
{
  "tag": "008",
  "url": "https://www.loc.gov/marc/bibliographic/bd008.html",
  "positions": { ...common positions for all materials... },
  "types": {
    "BK": {
      "url": "https://www.loc.gov/marc/bibliographic/bd008b.html",
      "positions": { ...additional positions for books... }
    },
    ...
  }
},
{
  "tag": "007",
  "url": "https://www.loc.gov/marc/bibliographic/bd007.html",
  "types": {
    "t": {
      "url": "https://www.loc.gov/marc/bibliographic/bd007t.html",
      "positions": { ... }
    },
    ...
  }
}
A string value is valid against positions if all substrings defined by character positions of the positions are valid by value validation against the corresponding data element definitions. Character positions are counted by Unicode code points.
- invalidFlag: If a data element definition contains key `flags`, the substring MUST also consist of a concatenation of codes from the codelist defined by `flags`.
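Validation against positions, including flags, might look like the following sketch. It assumes explicit codelists only, does not decide how values shorter than a character position are treated, and again uses Python's `re` as an approximation of ECMA 262. The function names are made up for this example.

```python
import re


def substring_at(value, position):
    """Cut the substring for a character position such as "33-35" or "06"."""
    first, _, second = position.partition("-")
    start, end = int(first), int(second or first)
    # Python strings are indexed by Unicode code points, as required here.
    return value[start:end + 1]


def is_flag_concatenation(part, flags):
    """True if part consists only of codes from flags, all of the same length."""
    if not flags:
        return part == ""
    size = len(next(iter(flags)))  # all flag codes share one length
    if len(part) % size != 0:
        return False
    return all(part[i:i + size] in flags for i in range(0, len(part), size))


def position_errors(value, positions):
    """Return (rule, character position) tuples for a string value."""
    errors = []
    for position, element in positions.items():
        part = substring_at(value, position)
        pattern = element.get("pattern")
        if pattern is not None and re.search(pattern, part, re.DOTALL) is None:
            errors.append(("patternMismatch", position))
        codes = element.get("codes")
        if isinstance(codes, dict) and part not in codes:
            errors.append(("undefinedCode", position))
        flags = element.get("flags")
        if isinstance(flags, dict) and not is_flag_concatenation(part, flags):
            errors.append(("invalidFlag", position))
    return errors
```

Applied to the `33-35` example of field `008`, a substring such as `ej ` passes both the flags check and the pattern, while `e j` passes the flags check but fails the pattern that requires spaces only at the end.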
- undefinedCode: A string value is valid against an explicit codelist if the value is a defined code in this codelist.
- deprecatedCode: A string value is not valid against an explicit codelist if its code has key `deprecated` set to `true`.
- undefinedCodelist: A string value is valid against a codelist reference if the codelist reference can be resolved and the value is defined in the resolved explicit codelist.
Applications MAY also resolve codelist references against externally defined explicit codelists by implicitly extending the codelist directory of the schema. If so, the application MUST make clear whether codelists directly defined in the codelist directory are overridden or extended.
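The codelist rules, together with resolution of codelist references, can be sketched as follows. The example assumes that the codelist directory of the schema is passed in as a dictionary and that resolved entries hold explicit codelists. Reporting an undefined value in a resolved codelist as `undefinedCode` is a choice of this sketch; `codelist_errors` is a hypothetical name.

```python
def codelist_errors(value, codelist, directory=None):
    """Return names of violated codelist rules for a string value."""
    if isinstance(codelist, str):
        # Codelist reference: look it up in the codelist directory.
        entry = (directory or {}).get(codelist)
        if entry is None:
            return ["undefinedCodelist"]
        codelist = entry["codes"]
    if value not in codelist:
        return ["undefinedCode"]
    definition = codelist[value]
    # Code definitions may be plain strings, which cannot be deprecated.
    if isinstance(definition, dict) and definition.get("deprecated", False):
        return ["deprecatedCode"]
    return []
```

An application that resolves references against external codelists could merge them into `directory` before calling the function, documenting whether existing entries are overridden or extended.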
Avram schemas can also be used to give or expect a number of elements with keys `records` at root level and keys `records` and `total` at field definitions, subfield definitions and code definitions. Support of the following counting rules in Avram validators is OPTIONAL. An Avram validator MUST document whether it supports counting rules or not.
Validation rules for counting are:
- countRecord to enable counting the total number of records, and the number of records in which each field with a field definition, each subfield with a subfield definition, and each code with a code definition is found.
- countField to enable counting the total number of times each field from the field schedule is found
- countSubfield to enable counting the total number of times each subfield from a subfield schedule is found
If selected counting rules are supported and enabled, then the following must be checked by an Avram validator:
- the number of validated records MUST be equal to the value of schema key `records` if this key exists (rule `countRecord`).
- if a field definition of the schema includes key `records` then the number of input records with this field MUST be equal to the number given by this key (combination of rules `countRecord` and `countField`).
- if a subfield definition of the schema includes key `records` then the number of input records with a field with this subfield MUST be equal to the number given by this key (combination of rules `countRecord` and `countSubfield`).
- if a field definition of the schema includes key `total` then the total number of times this field is contained in input records MUST be equal to the number given by this key (rule `countField`).
- if a subfield definition of the schema includes key `total` then the total number of times this subfield is contained in input records MUST be equal to the number given by this key (rule `countSubfield`).
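The counting checks for records and fields can be sketched as below; subfield counting would follow the same scheme. The sketch assumes that field identifiers are plain tags and that records are dictionaries shaped like the JSON examples in this document. Labelling a mismatch of a field's `records` value as `countRecord` is a simplification, and `counting_errors` is a hypothetical name.

```python
from collections import Counter


def counting_errors(records, schema):
    """Compare expected numbers in a schema with a list of records."""
    total = Counter()       # how often each tag occurs overall
    in_records = Counter()  # in how many records each tag occurs
    for record in records:
        tags = [field["tag"] for field in record["fields"]]
        total.update(tags)
        in_records.update(set(tags))  # count each tag once per record

    errors = []
    if "records" in schema and schema["records"] != len(records):
        errors.append(("countRecord", None))
    for tag, definition in schema["fields"].items():
        if "records" in definition and definition["records"] != in_records[tag]:
            errors.append(("countRecord", tag))
        if "total" in definition and definition["total"] != total[tag]:
            errors.append(("countField", tag))
    return errors
```

The same counters can also be used the other way round, to write `total` and `records` into a schema as the result of record analysis.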
By default external validation rules are ignored for validation because their semantics is out of the scope of this specification. The following rule can be enabled to require records to meet all external rules:
- externalRule: Requires an Avram validator to process all external rules and reject input data as invalid if a rule is violated or cannot be checked.
- T. Berners-Lee, R. Fielding, L. Masinter: Uniform Resource Identifier (URI): Generic Syntax. RFC 3986, January 2005. https://tools.ietf.org/html/rfc3986
- M. Duerst, M. Suignard: Internationalized Resource Identifiers (IRIs). RFC 3987, January 2005. https://tools.ietf.org/html/rfc3987
- P. Biron, A. Malhotra: XML Schema Part 2: Datatypes Second Edition. W3C Recommendation, October 2005. https://www.w3.org/TR/xmlschema-2/
- S. Bradner: Key words for use in RFCs to Indicate Requirement Levels. RFC 2119, March 1997. https://tools.ietf.org/html/rfc2119
- T. Bray: The JavaScript Object Notation (JSON) Data Interchange Format. RFC 7159, March 2014. https://tools.ietf.org/html/rfc7159
- ECMAScript 2015 Language Specification (ECMA-262, 6ᵗʰ edition). June 2015. http://www.ecma-international.org/ecma-262/6.0/
- avram-js reference implementation of an Avram validator, also includes an Avram meta-validator to check whether an Avram schema conforms to this specification
- QA catalogue Java implementation for MARC-based formats
- PICA::Schema Perl implementation for PICA-based formats
- MARC::Schema Perl implementation for MARC-based formats
- marctable crawls MARC 21 Bibliographic format from Library of Congress as Avram Schema
- K10plus Avram schemas
- JSON Table Schema schema format for tabular data
- JSON Schema schema language for JSON formats
- MARCspec - A common MARC record path language
Thanks to Péter Király for picking up the idea and for collaborative development. Thanks to Carsten Klee, Ed Summers, Harry Gegic, Johann Rolschewski, Stefan Majewski, Thomas Frings, and Timothy Thompson for comments, code and contributions.
- Add code key `url`
- Add field and subfield key `comment`
- Rename schema field `profile` to `uri`
- Allow labels in typed field definition
- Allow pattern in indicator definition
- Add flags in positions
- Support deprecated codes
- Add pattern groups
- Add record types
- Clarify fixed length of codes in indicator definitions and positions
- Allow to omit leading zeroes in ranges
- Support deprecated fields and subfields
- Add flags and corresponding validation rule
- Change expressing field counters in field identifiers (`xNN` to `/$xNN`)
- Define indicator value `null` as placeholder for `{" ":{}}`
- Rename and redefine `checks` as `rules`
- Add formal specification of URI and URL based on RFC 3986
- Allow plain strings as code definition
- Remove code counting
- Disallow overlapping field identifiers of a field schedule
- Rename validation options and replace numbered validation rules
- Change codelist directory to support codelist metadata (breaking change!)
- Remove subfield key `order` and validation option `check_subfield_order`
- Enumerate and better describe validation rules
- Add examples and improve wording
- Change semantics of counting options on validation
- Add optional code definition key `code`.
- Extend Metaschema.
- Remove `deprecated-fields`, `deprecated-subfields` and `deprecated-codes`.
- Allow `created` and `modified` at schema, field, subfield and code.
- Add position keys `start` and `end`.
- Allow `pattern`, `codes` and `deprecated-codes` at flat field definitions
- Allow flat field values and subfield values to be empty
- Let dot in regular expressions also match newlines
- Extend definition of format families
- Allow simple string for referenced codelists
- Simplify treatment of overlapping field identifiers
- Disallow empty string regular expressions
- Extend formal description of validation
- Rename "fixed fields" to "flat fields"
- Add optional schema key `family`
- Add codelist directories (`codelists`)
- Add external validation rules (`checks`)
- Remove field types (`types`)
- Allow `deprecated-codes` also at subfield schedules
- More explicitly specify field occurrence and field counter
- Textual refactoring
- Rename `count` to `records` to not confuse with `counter`
- Add `total` and `records` at field definitions, subfield definitions and code definitions
- Allow URIs as codelists and allow `codes` at subfield level
- Add `counter` for PICA-based formats
- Modify allowed values in `occurrence`
- Add option field `description` in addition to `label`
- Add schema field `profile` to identify schemas
- Add `count` and `language`
- Change `occurrence` from three to two digits
- Add `deprecated-subfields`
- Add `pattern` at subfields and positions
- Add `position` at subfields
- Extend definition of positions
- Disallow empty strings
- First version