Skip to content

Latest commit

 

History

History
999 lines (719 loc) · 40.7 KB

specification.md

File metadata and controls

999 lines (719 loc) · 40.7 KB
title short language page
Avram Specification
Avram
en
true

Avram is a schema language for field-based data formats such as key-value records or library formats MARC and PICA.

Table of Contents

Introduction

MARC and related formats such as PICA and MAB are used since decades as the basis for library automation. Several variants, dialects and profiles exist for different applications. The Avram schema language allows to specify individual formats for documentation, validation, and requirements engineering. The schema language is named after Henriette D. Avram (1919-2006) who devised MARC as the first automated cataloging system in the 1960s.

The Avram specification consists of a schema format based on JSON and validation rules to validate records against individual schemas. The format can also be used to express results of record analysis. Avram schemas cover library formats based on MARC and PICA as well as simple key-value structures.

The document is managed in a git repository at https://github.com/dini-ag-kim/avram together with test files for implementations.

Conformance requirements

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Data types

A string is a sequence of Unicode code points.

A single character is a string consisting of exactely one Unicode code point.

A timestamp is a date or datetime as defined with XML Schema datatype datetime (-?YYYY-MM-DDThh:mm:ss(\.s+)?(Z|[+-]hh:mm)?) date (-?YYYY-MM-DD(Z|[+-]hh:mm)?), gYearMonth (-?YYYY-MM), or gYear (-?YYYY).

A regular expression is a non-empty string that conforms to the ECMA 262 (2015) regular expression grammar. The expression is interpreted as Unicode pattern with . matching all characters, including newlines.

A language is a natural language identifier as defined with XML Schema datatype language.

A non-negative integer is a natural number (0, 1, 2...)

An URI is a valid URI string according to RFC 3986.

An IRI reference is a non-empty string matching the regular expression ^[^\x00-\x20<>"{}|^`\\]+$ and conforming to the syntax of IRI as defined in [RFC 3987].

An URL is an URI starting with http:// or https://.

A range is a sequence of digits or a sequence of digits followed by a dash (-) and a second sequence of digits. The second sequence, if given, SHOULD have same length as the first. The numeric values of each sequence are called start number and end number, respectively. The end number, if given, MUST be larger than the start number. Examples of valid ranges include 0, 00, 3-7, 03-12, and 1-09 but not 7-2. A string matches a range if it is a sequence of digits of same length as the longest sequence in the range and the numerical value is equal to or within the start number and the end number of the range. For instance 7 matches range 0-9 but it does not match 1-3 nor 03-10 and 07 matches 03-10 but not 0-9.

Records

Avram schemas are used to validate and analyze records. A record is a non-empty sequence of fields, each consisting of a tag, being a non-empty string and

  • either a flat field value, being a string,
  • or a non-empty sequence of subfields, each being a pair of subfield code (being a single character) and subfield value (being a string).

Fields with subfields, also called variable fields, MAY also have

  • either two indicators, each being a single character,
  • or an occurrence, being a sequence of two digits with positive numerical value (01, 02, ...99).

In addition, each record has a set of record types, each being a non-empty string. By default this set is empty and applications MAY choose to not support record types at all (see validation with record types).

The record model can further be restricted by a format family.

The encoding of records in JSON or other individual serialization formats such as MARCXML, ISO 2709, or PICA JSON is out of the scope of this specification.

Examples
  • Possible JSON serialization of a record of type test and two flat fields with occurence and one field with three subfields of code g, g, and s:

    {
      "types": [ "test" ],
      "fields": [
        {
          "tag": "uri",
          "occurrence": "01",
          "value": "http://www.wikidata.org/entity/Q10953"
        },
        {
          "tag": "uri",
          "occurrence": "02"
          "value": "https://viaf.org/viaf/18236820"
        },
        {
          "tag": "name",
          "subfields": [
            "g", "Henriette",
            "g", "Davidson",
            "s", "Avram"
          ]
        }
      ]
    }

Format families

The record model can be restricted by a format family, identified by a non-empty string. The following format families are part of this specification:

  • flat: all fields are flat without indicators or occurrences (simple key-value structures with repeatable keys)

  • marc: flat fields have no indicators or occurrences, variable fields have no occurrences and exactely two indicators, each being a lowercase alphanumeric character or a space character (a to z, 0 to 9, and ). Field tags are either a string of three digits or the string LDR.

  • pica: all fields are variable without indicators. Field tags consist of four characters being a digit 0, 1, or 2, followed by two digits, followed by an uppercase letter A to Z or @.

  • mab: fields have one indicator and no occurrences. Field tags consist of three digits.

Restrictions on records by a format family imply restrictions on schemas for this format family.

Examples
  • Possible JSON serializations of records of family flat, marc, and pica, respectively:

    {
      "fields": [
        { "tag": "given", "value": "Henriette" },
        { "tag": "given", "value": "Davidson" },
        { "tag": "surname", "value": "Avram" },
        { "tag": "birth", "value": "1919-10-07" }
      ]
    }
    {
      "types": ["z"],
      "fields": [
        { "tag": "LDR", "value": "00000nz  a2200000oc 4500" },
        { "tag": "001", "value": "1089521669" },
        { "tag": "100",
          "indicator1": "1",
          "indicator2": " ",
          "subfields": [
            "a", "Avram, Henriette D."
            "d", "1919-2006"
        ] }
      ]
    }
    {
      "fields": [
        { "tag": "003U", "subfields": [ "a", "http://d-nb.info/gnd/1089521669" ] },
        { "tag": "028A", "subfields": [ "d", "Henriette D.", "a", "Avram" ] },
        { "tag": "060R", "subfields": [ "a", "1919", "b", "2006", "4", "datl" ] }
      ]
    }

Schema format

An Avram Schema is a JSON object given as serialized JSON document or any other format that encodes a JSON document. In contrast to RFC 7159, all object keys MUST be unique. String values SHOULD NOT be the empty string. Applications MAY remove keys with empty string value.

A schema MUST contain key

A schema SHOULD contain keys documenting the format defined by the schema:

  • title with the name of the format
  • description with a short description of the format
  • family with a format family
  • url with a homepage URL of the format
  • uri with an URI to uniquely identify the format
  • language with the language values of keys title, description, and label used throughout the schema. Its value SHOULD be assumed as und if not specified.

The schema MAY contain keys:

  • $schema with an URL of the Avram metaschema
  • codelists with a codelist directory
  • rules with external validation rules
  • records with a non-negative integer to indicate a number of records
  • created with a timestamp when this schema was created
  • modified with a timestamp when this schema was updated
Example
{
  "fields": { },
  "title": "MARC 21 Format for Classification Data",
  "description": "MARC format for classification numbers and captions associated with them",
  "url": "https://www.loc.gov/marc/classification/",
  "uri": "http://format.gbv.de/marc/classification",
  "language": "en",
  "$schema": "https://format.gbv.de/schema/avram/schema.json"
}

Field schedule

A field schedule is a JSON object that maps field identifiers to field definitons.

Example
{
  "010": { "label": "Library of Congress Control Number" },
  "084": { "label": "Classification Scheme and Edition" }
}

Field identifiers of a field schedule MUST NOT overlap. Two field identifiers overlap when it is possible to match a field with both.

Field identifier

A field identifier is a non-empty string that can be used to match fields. The identifier consists of a tag, optionally followed by the slash (/) and

  • either a field occurrence, being a range of two digit sequences except the single sequence of two digits (00),
  • or the dollar character ($) followed by small letter x (x) and a field counter, being a range of one or two digits sequences (0, 0-1..., 00, 00-01..., 98-99).

Applications MAY further allow a tag followed by the slash and two zeroes (/00) as alias for a bare tag.

A field matches a field identifier if the tag of the field is equal to the tag of the field identifier, and

  • the field has no occurrence and the field identifier has no field occurrence nor field counter,
  • or the occurrence of the field matches the range of the field occurrence,
  • or the first subfield value of subfield with subfield code x matches the range of the field counter.
Examples
  • LDR, 001, 850... (MARC)
  • 021A, 045Q/01, 028B/01-02, 209K, 209A/$x00-09, 247A/$x0... (PICA)
  • 001, 100, 805... (MAB)

Field definition

A field definition is a JSON object that SHOULD contain key:

  • tag with the tag of the field
  • label with the name of the field
  • repeatable with a boolean value, assumed as false by default
  • required with a boolean value, assumed as false by default

The field definition MAY further contain keys:

  • occurrence with the field occurrence of the field
  • counter with the field counter of the field
  • url with an URL link to documentation of the field
  • description with a description of the field
  • comment with an additional comment about the field
  • indicator1 with first indicator definition or null as placeholder for {"codes":{" ":{}}}
  • indicator2 with second indicator definition or null as placeholder for {"codes":{" ":{}}}
  • pica3 with corresponding Pica3 number
  • created with a timestamp when this field was introduced
  • modified with a timestamp when this field was changed
  • deprecated with a boolean value, assumed as false by default
  • positions with a specification of positions (for flat fields)
  • pattern with a regular expression (for flat fields)
  • groups with pattern groups of the regular expression (for flat fields)
  • codes with a codelist
  • subfields with a subfield schedule (for variable fields)
  • rules with external validation rules
  • total with a non-negative integer to indicate the number of times this field has been found
  • records with a non-negative integer to indicate the number of records this field has been found in
  • types with a JSON object that maps record types to typed field definitons.

A typed field definition is a JSON object with optional keys positions, pattern, groups, codes, label, description, and url, each defined identical to keys of same name allowed in a field definition (see validation with record types).

If a field definition is given in a field schedule, each of tag, occurrence and counter MUST either be missing or have same value as used to construct the corresponding field identifier.

If a field definition contains the subfield keys indicating a variable field, it MUST NOT contain keys for flat fields (positions, pattern and/or codes).

Applications MAY allow and remove occurrence keys with value two zeroes (00) as alias for a field definition without occurrence.

Example
  • MARC field 240 specified as mandatory and non-repeatable:

    {
      "tag": "240",
      "label": "Uniform Title",
      "url": "https://www.loc.gov/marc/bibliographic/bd240.html",
      "required": true,
      "repeatable": false,
      "modified": "2017-12"
    }
  • PICA field 045B/02 in K10plus format

    {
      "tag": "045B",
      "occurrence": "02",
      "pica3": "5022",
      "label": "Systematik für Bibliotheken (SfB)",
      "repeatable": true,
      "subfields": {
        "a": { "label": "Notation", "repeatable": true },
        "A": { "label": "Quelle", "repeatable": true }
      }
    }
  • MARC field 007 with

Positions

Subfield values and flat field values can be specified positions, being a JSON object that maps character positions to data element definitions. A character position is a range. It is RECOMMENDED to use sequences of two digits.

A data element definition is a JSON object that SHOULD contain key:

  • label with the name of the data element
  • start with the start number of the character position
  • end with the end number of the character position or the start number if there is no end number

The data element definition MAY further contain keys:

  • url with an URL link to documentation
  • description with additional description
  • codes with a codelist with codes of length defined by the character position range
  • flags with a codelist with codes of same length being a proper divisor of the length of the character position range
  • pattern with a regular expression
  • groups with pattern groups of the regular expression

Character positions of a positions object MUST NOT overlap. Two character positions overlap if there is a string that matches both of them.

Examples
  • Positions in MARC 21 Bibliographic field 005:

    {
      "00-03": { "label": "year", "start": 0, "end": 3 },
      "04-05": { "label": "month", "start": 4, "end": 5 },
      "06-07": { "label": "day", "start": 6, "end": 7 },
      "08-09": { "label": "hour", "start": 8, "end": 9 },
      "10-11": { "label": "minute", "start": 10, "end": 11 },
      "12-15": { "label": "second", "start": 12, "end": 15 }
    }
  • Position 33-35 in MARC 21 Bibliographic field 008 can hold a combination of flags, filled up with spaces:

    {
      "33-35": {
        "flags": {
          " ": { "label": "No specified special format characteristics" },
          "e": { "label": "Manuscript" }  ,
          "j": { "label": "Picture card } post card" },
          "k": { "label": "Calendar" },
          "l": { "label": "Puzzle" },
          "n": { "label": "Game" },
          "o": { "label": "Wall map" }   ,
          "p": { "label": "Playing cards" }       ,
          "r": { "label": "Loose-leaf" },
          "z": { "label": "Other" } ,
          "|": { "label": "No attempt to code" }
        },
        "pattern": "^[^ ]* *$"
    }

Pattern groups

Pattern groups are a JSON object that maps numbers of capturing groups (starting with "1") of a regular expression to documentation objects, each with optional keys:

  • label with the name of the group
  • description with additional description of the group
  • url with an URL link to documentation of the group
Example
{
  "pattern": "^([0-9]{4})-([01][0-9])-([0-3][0-9])$",
  "groups": {
    "1": { "label": "year" },
    "2": { "label": "month" },
    "3": { "label": "day" },
  }
}

Subfield schedule

A subfield schedule is a JSON object that maps subfield codes to subfield definitions. A subfield code is a single character. A subfield definition is a JSON object that SHOULD contain keys:

  • code with the subfield code
  • label with the name of the subfield
  • repeatable with a boolean value, assumed as false by default
  • required with a boolean value, assumed false by default

The subfield definition MAY further contain keys:

  • pattern with a regular expression
  • groups with pattern groups of the regular expression
  • positions with a specification of positions
  • codes with a codelist
  • rules with external validation rules
  • url with an URL link to documentation
  • description with a description of the subfield
  • comment with an additional comment about the subfield
  • pica3 with a corresponding Pica3 syntax definition
  • created with a timestamp when this subfield was introduced
  • modified with a timestamp when this subfield was updated
  • deprecated with a boolean value, assumed as false by default
  • total with a non-negative integer to indicate the number of times this subfield has been found
  • records with a non-negative integer to indicate the number of records this subfield has been found in

The subfield definition MAY but SHOULD NOT contain an additional, deprecated key

  • order with a non-negative integer used to specify a partial or complete order of subfields
Example
  • Subfield schedule for MARC 21 bibliographic field 250 (Edition Statement):

    {
      "a": {
        "label": "Edition statement",
        "repeatable": false,
        "pattern": "\\.$"
      },
      "b": {
        "label": "Remainder of edition statement",
        "repeatable": false
      },
      "3": {
        "label": "Materials specified",
        "repeatable": false
      },
      "6": {
        "label": "Field link and sequence number",
        "repeatable": true
      }
    }

Indicator definition

An indicator definition is a JSON object that SHOULD contain key

  • label with the name of the indicator

and further MAY contain keys:

  • url with an URL link to documentation
  • description with additional description of the indicator
  • codes with a codelist of single character codes
  • pattern with a a regular expression
  • groups with pattern groups of the regular expression
Example
{
  "label": "Type",
  "codes": {
    " ": "Abbreviated key title",
    "0": "Other abbreviated title"
  }
}

Codelist

A codelist is

  • either a JSON object that maps codes to code definitions (explicit codelist)
  • or a non-empty string that SHOULD be an URI (codelist reference).

A code is a non-empty string. A code definition is either a string or a JSON object with optional keys:

  • code with the code
  • label with the name of the code
  • description with additional description of the code
  • created with a timestamp when this code was introduced
  • modified with a timestamp when this code was updated
  • url with a link to documentation of the code
  • deprecated with a boolean value, assumed as false by default

Optional key code of a code definition MUST be equal to the key of the code definition in its codelist.

A code definition being a string MUST be treated identical to a codelist definition being JSON object with only key label having the value of the string.

A codelist directory is a JSON object that maps codelist references to JSON objects each having at least the mandatory key codes with a codelist and optional keys:

  • title with a name of the codelist
  • description with additional description of the codelist
  • created with a timestamp when this codelist was introduced
  • modified with a timestamp when this codelist was updated
  • url with a homepage URL or link to documentation of the codelist

A codelist reference can be resolved by looking up its value as key in the codelist directory to get the corresponding explicit codelist.

Examples
  • Explicit, reference, and codelist directory:

    {
      " ": "No specified type",
      "a": {
        "label": "Archival",
        "created": "2022"
      },
      "x": {
        "code": "x"
      }
    }
    "http://id.loc.gov/vocabulary/languages"
    {
      "http://id.loc.gov/vocabulary/languages": {
        "title": "MARC List for Languages",
        "codes": {
          "eng": { "label": "English" },
          "fre": { "label": "French" }
        }
      }
    }

External validation rules

An Avram Schema MAY include references to additional validation rules with key rules at the root level, at field schedules, and at subfield schedules to check additional data types or integrity constraints. The value of this keys MUST be an array. The elements of this array MUST

  • either a rule identifier, being an IRI reference that MUST NOT be equal to names of validation rules,

  • or a JSON object with arbitrary contents. The object SHOULD include a key class having an IRI reference as value to specify the kind of rule.

Example
{
  "fields": {
    "birth": {
      "subfields": {
        "Y": { "label": "year" },
        "M": { "label": "month" },
        "D": { "label": "day" }
      },
      "rules": ["http://example.org/valid-date"]
    },
    "death": {
      "subfields": {
        "Y": { "label": "year" },
        "M": { "label": "month" },
        "D": { "label": "day" }
      },
      "rules": ["http://example.org/valid-date"]
    },
    "age": {
      "rules": ["xsd:nonNegativeInteger"]
    }
  },
  "rules": [
    {
      "class: "conditional-rule",
      "if": "birth?",
      "then": "birth.Y < 1950",
      "description": "birth only allowed before 1950 for privacy reasons"
    }
  ]
}

Restrictions by format family

A format family restricts the model of records than can be described by an Avram schema. Known values of schema key family imply restriction on field identifiers and field definitions.

flat formats

Field identifiers are plain tags. Field definitions MUST NOT include keys occurrence, counter, indicator1, indicator2, or subfields.

marc formats

Field identifiers are plain tags and MUST either be the string LDR or three digits. Field definitions MUST NOT include keys occurrence or counter. Field definitions of flat fields MUST NOT have keys indicator1 or indicator2.

pica formats

Field identifiers MUST NOT include a field counter if its tag starts with digit 0 or 1 and MUST NOT include a field occurrence if its tag starts with digit 2.Tags MUST match the regular expression ^[012][0-9][0-9][A-Z@]. Field definitions MUST NOT include keys indicator1 or indicator2.

mab formats

Field identifiers are plain tags and MUST consist of excactely three digits. Field definitions MUST NOT include keys indicator2, occurrence, or counter.

Metaschema

A JSON Schema to validate Avram Schemas is available at https://format.gbv.de/schema/avram/schema.json.

Applications MAY extend the metaschema for particular format families and formats, for instance by further restriction of the allowed set of field identifiers.

Validation rules

Avram schemas can be used to validate records based on validation rules specfied in this section (marked in bold and numbered from 1 to 23). Rule 1 to 19 refer to validation of individual records, fields, and subfields. Rule 20 to 22 refer to validation of sets of records. Rule 22 can refer to both.

An Avram validator MAY choose to support only a limited set of validation rules, it SHOULD allow to enable and disable selected rules and it MAY disable selected rules by default. It is RECOMMENDED to disable counting rules (18 to 20) and external rules (21) by default. Support and selection of validation rules MUST be documented.

An Avram validator MAY limit validation to selected format families.

  1. invalidRecord: A set of records is valid against a schema, if all of its records pass record validation against the field schedule of the schema.

Record validation

A record is valid against a field schedule if the following rules are met and every field passes field validation against its corresponding field definition from the field schedule. If rule undefinedField is disabled, fields without corresponding field definition are assumed to be valid.

  1. undefinedField: Every field matches a field identifier in the field schedule.

  2. deprecatedField: The matched field definition must not have key deprecated set to true.

  3. nonrepeatableField: The record does not contain more than one field matching the same field definition with repeatable being false.

  4. missingField: the record contains at least one field for each field definition with required being true.

Field validation

A field is valid against a field definition if the following rules are met:

  1. invalidfieldvalue: if the field is a flat field, its field value must be valid by value validation and by validation with record types.

  2. invalidIndicator: If the field contains indicators, their values must be valid by value validation against the corresponding indicator definition indicator1 (first indicator) and indicator2 (second indicator).

If the field is a variable field:

  1. undefinedSubfield: Every subfield has a corresponding subfield definition.

  2. deprecatedSubfield: The matched subfield definition must not have key deprecated set to true.

  3. nonrepeatableSubfield: For subfield definitions with repeatable being true, the field MUST NOT contain more than one subfield.

  4. missingSubfield: For subfield definitions with required being true, the field MUST contain at least one subfield.

  5. invalidSubfieldValue: Every subfield value is valid by value validation against its corresponding subfield definition.

Tag and occurrence of a field are not included in field validation as they are part of record validation.

Value validation

A value (given as string), is valid if it conforms to a definition (given as field definition, subfield definition, indicator definition, or data element definition) by meeting the following rules:

  1. patternMismatch: If the definition contains key pattern, the value must match its regular expression. The pattern is not anchored by default, so ^ and/or $ must be included to match start and/or end of the value.

  2. invalidPosition: If the definition contains key positions, the value must be valid against its positions.

If the definition contains key codes, the value must further be valid against its codelist (see corresponding rules below).

A value is always valid if the definition contains neither of keys pattern, positions, and codes.

Validation with record types

Record types are arbitrary strings attached to a record as flags. An Avram validator SHOULD support the following rule to enable additional validation depending on record types.

  1. recordTypes: If a field definition contains key types with a JSON object, the record types of a record must be looked up in this object to corresponding typed field definitions. All resulting typed field definitions must then be used for additional value validation with their keys pattern, positions, and codes.
Example
  • A MARC 21 Bibliographic record has a type of material such as Book (BK) and Visual Material (VM) and a category of material such as Text (t) and Motion Picture (m). Both can be encoded as Avram record types (not to be confused with MARC 21 record types). For instance a record may have the two types BK and t. An Avram Schema of MARC 21 Bibliographic format could support these types by including the following in definition of field 008 and 007:

    {
      "tag": "008",
      "url": "https://www.loc.gov/marc/bibliographic/bd008.html",
      "positions": { ...common positions for all materials... },
      "types": {
        "BK": {
          "url": "https://www.loc.gov/marc/bibliographic/bd008b.html",
          "positions": { ...additional positions for books... }
        }, ...
      }  
    },
    {
      "tag": "007",
      "url": "https://www.loc.gov/marc/bibliographic/bd007.html",
      "types": {
        "t": {
          "url": "https://www.loc.gov/marc/bibliographic/bd007t.html",
          "positions": { ... }
        }, ...
      }
    }

Validation with positions

A string value is valid against positions if all substrings defined by character positions of the positions are valid by value validation against the corresponding data element definitions. Character positions are counted by Unicode code points.

  1. invalidFlag: If a data element definition contains key flags, the substring MUST also consist of a concatenation of codes from the codelist defined by flags.

Validation with codelists

  1. undefinedCode: A string value is valid against an explicit codelist if the value is a defined code in this codelist.

  2. deprecatedCode: A string value is not valid against an explicit codelist if its code has key deprecated set to true.

  3. undefinedCodelist: A string value is valid against a codelist reference if the codelist reference can be resolved and the value is defined in the resolved explicit codelist.

Applications MAY also resolve codelist references against externally defined explicit codelists by implicitly extending the codelist directory of the schema. If so, the application MUST make clear whether codelists directly defined in the codelist directory are overriden or extened.

Counting

Avram schemas can also be used to give or expect a number of elements with keys records at root level and keys records and total at field definitions, subfield definitions and code definitions. Support of the following counting rules in Avram validators is OPTIONAL. An Avram validator MUST document whether it supports counting rules or not.

Validation rules for counting are:

  1. countRecord to enable counting the total number of records, and the total numbers or records each field with a field definition, each subfield with a subfield definition, and each code with a code definition is found in.

  2. countField to enable counting the total number each field from the field schedule is found

  3. countSubfield to enable counting the total number each subfield field from a subfield schedule is found

If selected counting rules are supported and enabled, then the following must be checked by an Avram validator:

  • the number of validated records MUST be equal to the value of schema key records if this key exist (rule countRecord).

  • if a field definition of the schema includes key records then the number of input records with this field MUST be equal to the number given by this key (combination of rules countRecord and countField).

  • if a subfield definition of the schema includes key records then the number of input records with a field with this subfield MUST be equal to the number given by this key (combination of rules countRecord and countSubfield).

  • if a field definition of the schema includes key total then the total number this field is contained in input records MUST be equal to the number given by this key (rule countField).

  • if a subfield definition of the schema includes key total then the total number this subfield is contained in input records MUST be equal to the number given by this key (rule countSubfield).

Validation with external validation rules

By default external validation rules are ignored for validation because their semantics is out of the scope of this specification. The following rule can be enabled to require records to met all external rules:

  1. externalRule: Enforces an Avram validator to process all external rules and reject input data as invalid if a rule is violated or cannot be checked.

References

Normative references

Informative references

Implementations and public Avram schemas

  • avram-js reference implementation of an Avram validator, also includes an Avram meta-validator to check whether an Avram schema conforms to this specification
  • QA catalogue Java implementation for MARC-based formats
  • PICA::Schema Perl implementation for PICA-based formats
  • MARC::Schema Perl implementation for MARC-based formats
  • marctable crawls MARC 21 Bibliographic format from Library of Congress as Avram Schema
  • K10plus Avram schemas

Related standards

Appendix

Acknowledgments

Thanks to Péter Király for picking up the idea and for collaborative development. Thanks to Carsten Klee, Ed Summers, Harry Gegic, Johann Rolschewski, Stefan Majewski, Thomas Frings, and Timothy Thompson for comments, code and contributions.

Changes

0.9.7 - work in progress

  • Add code key url
  • Add field and subfield key comment
  • Rename schema field profile to uri

0.9.6 - 2024-01-19

  • Allow labels in typed field definition
  • Allow pattern in indicator definition
  • Add flags in positions
  • Support deprecated codes
  • Add pattern groups

0.9.5 - 2024-01-12

  • Add record types
  • Clarify fixed length of codes in indicator definitions and positions
  • Allow to omit leading zeroes in ranges
  • Support deprecated fields and subfields
  • Add flags and corresponding validation rule

0.9.4 - 2024-01-02

  • Change expressing field counters in field identifiers (xNN to /$xNN)
  • Define indicator value null as placeholder for {" ":{}}
  • Rename and redefine checks as rules

0.9.3 - 2023-12-22

  • Add formal specification of URI and URL based on RFC 3986
  • Allow plain strings as code definition
  • Remove code counting
  • Disallow overlapping field identifiers of a field schedule
  • Rename validation options and replace numbered validation rules

0.9.2 (2023-11-29)

  • Change codelist directory to support codelist metadata (breaking change!)
  • Remove subfield key order and validation option check_subfield_order
  • Enumerate and better describe validation rules
  • Add examples and improve wording
  • Change semantics of counting options on validation

0.9.1 (2023-11-27)

  • Add optional code definition key code.
  • Extend Metaschema.

0.9.0 (2023-10-27)

  • Remove deprecated-fields, deprecated-subfields and deprecated-codes.
  • Allow created and modified at schema, field, subfield and code.
  • Add position keys start and end.

0.8.2 (2022-09-01)

  • Allow pattern, codes and deprecated-codes at flat field definitions
  • Allow flat field values and subfield values to be empty
  • Let dot in regular expressions also match newlines
  • Extend definition of format families

0.8.1 (2022-06-20)

  • Allow simple string for referenced codelists
  • Simplify treatment of overlapping field identifiers
  • Disallow empty string regular expressions
  • Extend formal description of validation
  • Rename "fixed fields" to "flat fields"
  • Add optional schema key family

0.8.0 (2022-04-25)

  • Add codelist directories (codelists)
  • Add external validation rules (checks)
  • Remove field types (types)
  • Allow deprecated-codes also at subfield schedules

0.7.1 (2021-10-01)

  • More explicitly specificy field occurrence and field counter
  • Textual refactoring

0.7.0 (2021-09-29)

  • Rename count to records to not confuse with counter
  • Add total and records at field definitions, subfield definitions and code definitions
  • Allow URIs as codelists and allow codes at subfield level

0.6.0 (2020-09-15)

  • Add counter for PICA-based formats
  • Modify allowed values in occurrence

0.5.0 (2020-08-04)

  • Add option field description in addition to label
  • Add schema field profile to identify schemas

0.4.0 (2019-05-09)

  • Add count and language
  • Change occurrence from three to two digits

0.3.0 (2018-03-16)

  • Add deprecated-subfields

0.2.0 (2018-03-09)

  • Add pattern at subfields and positions
  • Add position at subfields
  • Extend definition of positions
  • Disallow empty strings

0.1.0 (2018-02-20)

  • First version