RDF2MARC Conversion Language

Conversion rules for generating MARC fields are specified in a simple but flexible XML format, with elements in the namespace http://www.loc.gov/bf2marc. The conversion rules are used by the compile.xsl stylesheet to generate an XSL stylesheet for converting BIBFRAME descriptions encoded as RDF/XML to MARCXML.

Usage

You can generate the conversion stylesheet using xsltproc:

xsltproc src/compile.xsl rules.xml > bibframe2marc.xsl

The Makefile included with this project will generate a top-level rules.xml file in the xsl directory based on the files in the xsl/rules directory, and use that file to generate the bibframe2marc.xsl stylesheet.

XML elements

  • rules: the root element of any rules document. The rules element can contain the elements version, file, map, cf, df, select, and switch. The order of the rules determines the field order of the generated MARC record.
<rules xmlns="http://www.loc.gov/bf2marc">
  <version>0.1.0-SNAPSHOT</version>
  <file>rules/00-LDR.xml</file>
  <cf tag="001">
    <transform><xsl:value-of select="$pRecordId"/></transform>
  </cf>
  <df tag="500">
    <ind1 default=" "/>
    <ind2 default=" "/>
    <sf code="a">
      <transform>MARC record generated by DLC bibframe2marc <xsl:value-of select="$vCurrentVersion"/><xsl:if test="$pGenerationDatestamp != ''">: <xsl:value-of select="$pGenerationDatestamp"/></xsl:if></transform>
    </sf>
  </df>
</rules>
  • version: the version of the conversion rules. Populates the vCurrentVersion variable of the generated stylesheet. Only the top-level version element is processed (the version is not processed for any included files).
<version>0.1.0-SNAPSHOT</version>
  • file: the path to a file containing another rules document, relative to the current directory. This provides a mechanism for splitting up the conversion rules into more manageable chunks. Rules files can be included to an arbitrary depth.
<file>rules/00-LDR.xml</file>
  • map: Create a lookup table. The map element should contain a flat XML data structure and requires a name attribute. It creates a variable in the stylesheet, named by the name attribute, that holds the specified data structure. This data structure can then be referenced in XPath expressions with the exsl:node-set() function or by using the lookup element.
<bf2marc:rules xmlns:bf2marc="http://www.loc.gov/bf2marc">
  <bf2marc:map name="instruments">
    <instrument>
      <code>ba</code>
      <type>brass</type>
      <label>horn</label>
    </instrument>
    <instrument>
      <code>bb</code>
      <type>brass</type>
      <label>trumpet</label>
    </instrument>
  </bf2marc:map>
</bf2marc:rules>
  • key: Build an XSLT key from a set of elements in the source document. The key element creates an xsl:key in the output stylesheet with the same attributes. This key can then be referenced in XPath expressions or XSL fragments.
<key name="kMusicMediumSource" match="bf:MusicMedium" use="bf:source/bf:Source/rdfs:label"/>
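As a rough sketch of how such a key might be referenced, the fragment below uses the standard XSLT key() function inside a select expression. The lookup value 'lcmpt', the 382 tag, and the assumption that each bf:MusicMedium carries an rdfs:label are illustrative only and are not taken from the project's rules.

```xml
<!-- Hypothetical use of the kMusicMediumSource key defined above. -->
<df tag="382">
  <ind1 default=" "/>
  <ind2 default=" "/>
  <sf code="a">
    <!-- key() returns the bf:MusicMedium nodes whose source label is 'lcmpt' (assumed value) -->
    <select xpath="key('kMusicMediumSource', 'lcmpt')/rdfs:label"/>
  </sf>
</df>
```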

Building conversion rules

Two high-level elements are used to encode the conversion rules that generate MARC fields.

  • cf: conversion rules for generating the MARC leader and control fields. These rules should generate strings. The cf element requires a tag attribute. The special attribute value tag="LDR" will generate a MARC leader. The optional attribute chopPunct, if set to "true", will remove enclosing parentheses, brackets, braces, and quotes, and ending punctuation except for a period or ellipsis.
<cf tag="001">
  <transform><xsl:value-of select="$pRecordId"/></transform>
</cf>
  • df: conversion rules for generating MARC data fields. The df element requires a tag attribute, which should contain a 3-character value. An optional boolean attribute, repeatable, if set to "false", will prevent the data field from being generated more than once. The optional attribute lang-xpath holds an XPath expression for a single property with an xml:lang attribute that can be used to generate an xml:lang attribute on the data field. Note: using an expression like rdfs:label|bf:code will produce unexpected results! The optional attribute lang-prefer generates processing code that attempts to prefer vernacular or transliterated versions of literals, based on the lang-xpath and the cataloging language script parameter pCatScript.

The df element is more complex. In addition to the rule building blocks documented below, the following elements are required:

  • ind1: rule for generating the first indicator of the MARC data field. The ind1 element requires a default attribute. This rule should generate a single character legal for a MARC indicator.
  • ind2: rule for generating the second indicator of the MARC data field. The ind2 element requires a default attribute. This rule should generate a single character legal for a MARC indicator.
  • sf: rules for generating MARC subfield values. At least one sf element is required. The sf element requires a code attribute, which should contain a 1-character value legal for a MARC subfield code. The optional attribute chopPunct, if set to "true", will remove enclosing parentheses, brackets, braces, and quotes, and ending punctuation except for a period or ellipsis. This element is repeatable within the df element. The order of sf elements determines the order of subfields in the generated MARC data field. An optional boolean attribute, repeatable, if set to "false", will prevent the subfield from being generated more than once. These rules should generate strings.
<df tag="500" lang-xpath="rdfs:label">
  <ind1 default=" "/>
  <ind2 default=" "/>
  <sf code="a">
    <transform>MARC record generated by DLC bibframe2marc <xsl:value-of select="$vCurrentVersion"/><xsl:if test="$pGenerationDatestamp != ''">: <xsl:value-of select="$pGenerationDatestamp"/></xsl:if></transform>
  </sf>
</df>
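The optional attributes described above can be combined. The following is a minimal sketch of a non-repeatable data field with a non-repeatable subfield that has punctuation chopped; the 245 tag, the indicator defaults, and the context path are assumptions chosen for illustration, not rules taken from the project.

```xml
<!-- Sketch only: at most one 245 field, with at most one $a subfield. -->
<df tag="245" repeatable="false">
  <!-- A single context block, as required for a non-repeatable field -->
  <context xpath="/rdf:RDF/bf:Instance/bf:title/bf:Title">
    <ind1 default="0"/>
    <ind2 default="0"/>
    <!-- chopPunct strips enclosing brackets/quotes and most ending punctuation -->
    <sf code="a" repeatable="false" chopPunct="true">
      <select xpath="bf:mainTitle"/>
    </sf>
  </context>
</df>
```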

Rule building blocks

  • Bare values: A bare value in an element will generate a constant value in the MARC data element. Note that if there is a bare value in an element, any other processing rules will be ignored. If a bare value is enclosed in a text element, leading and trailing white space will be preserved.

    • Bare values can be used with the cf, ind1, ind2, sf, position, select, and case elements.
<df tag="500">
  <ind1 default=" "/>
  <ind2 default=" "/>
  <sf code="a">MARC record generated by DLC bibframe2marc</sf>
</df>
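When leading or trailing white space matters, the bare value can be wrapped in a text element, as described above. A minimal sketch, assuming the text element lives in the same bf2marc namespace as the surrounding rules:

```xml
<df tag="500">
  <ind1 default=" "/>
  <ind2 default=" "/>
  <!-- The trailing space inside <text> should be preserved in the subfield value -->
  <sf code="a"><text>Generated note: </text></sf>
</df>
```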
  • context: Set the context for the contained processing rules. Required attribute xpath contains an XPath path expression to set the context. Any XPath expressions in the contained elements will be evaluated in the context set by the containing context element. If the context is not matched in the source document, the MARC element will not be generated. Generates an xsl:template or xsl:for-each element in the output stylesheet. Note that multiple context blocks in non-repeatable fields are not allowed. If there is a context block in an element, other processing rules will be ignored.

    • This element can be used with the cf and df elements.
<cf tag="003">
  <context xpath="rdf:RDF/bf:Work/bf:adminMetadata/bf:AdminMetadata/bf:source/bf:Source/bf:code">
    <select xpath="."/>
  </context>
</cf>
  • fixed-field/position: Generate a string for use in fixed-length fields. A fixed-field element must contain one or more position elements, the results of which will be concatenated to generate the value for the target MARC data element. The position element has a required default attribute.

    • These elements can be used with the cf and sf elements.
<cf tag="LDR">
  <fixed-field>
    <position default="     "/>
    <position default="n">
      <select xpath="substring(rdf:RDF/bf:Work/bf:adminMetadata/bf:AdminMetadata/bf:status/bf:Status/bf:code,1,1)"/>
    </position>
  </fixed-field>
</cf>
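To illustrate the concatenation, here is a small sketch of a two-position fixed field. It assumes that the default attribute supplies the value when the contained rule produces nothing, which the LDR example above suggests but does not state outright. The 007 tag and the carrier XPath are hypothetical.

```xml
<cf tag="007">
  <fixed-field>
    <!-- First position: a constant bare value -->
    <position default="t">t</position>
    <!-- Second position: first character of a (hypothetical) carrier code, else "u" -->
    <position default="u">
      <select xpath="substring(bf:carrier/bf:Carrier/bf:code, 1, 1)"/>
    </position>
  </fixed-field>
</cf>
```

Under those assumptions, a source with no matching carrier code would be expected to produce the two-character value "tu".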
  • select: Use an XPath path expression to select a value for use in the target MARC data element, or a value or nodeset for processing. The output of a select element should be a string for use in a MARC data element. Note that the select element will also set the context for any contained XPath expressions. The select element generates an xsl:for-each element in the output stylesheet. In the context of a non-repeatable field or subfield, multiple select elements are not allowed.

    • This element can be used with the rules, cf, ind1, ind2, sf, position, and case elements.
<sf code="b" repeatable="false">
  <select xpath="bf:subtitle"/>
</sf>
  • var: Create a locally-scoped XSL variable within a context. The variable has the name from the name attribute and the value from either the xpath attribute or from an internal switch element. The variable can be used within the current context in transforms and XPath expressions.

    • This element can only be used with the context and select elements.
<select xpath="bf:title/*">
  <var name="vTitleClass" xpath="local-name()"/>
  <switch>
    <case test="$vTitleClass='KeyTitle'">
      <df tag="222"/>
    </case>
  </switch>
</select>
  • switch/case: These elements offer a form of conditional processing. A switch element contains one or more case elements. Each case element has a required test attribute, which contains an XPath expression. If the XPath expression evaluates to true(), the case element matches. The first case element to match is evaluated. If no case elements match, no value is generated by the switch element. The special test attribute value "default" (e.g. test="default") provides default processing, if needed.

    • These elements can be used with the rules, cf, ind1, ind2, sf, position, and select elements.
<sf code="a" repeatable="false">
  <switch>
    <case test="bf:mainTitle"><select xpath="bf:mainTitle"/></case>
    <case test="rdfs:label"><select xpath="rdfs:label"/></case>
  </switch>
</sf>
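The special "default" test value can serve as a fallback branch. A minimal sketch using bare values in an indicator rule; the indicator values chosen here are illustrative only:

```xml
<ind1 default=" ">
  <switch>
    <!-- First matching case wins -->
    <case test="bf:mainTitle">1</case>
    <!-- Evaluated only when no earlier case matches -->
    <case test="default">0</case>
  </switch>
</ind1>
```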
  • if: This element provides basic conditional processing. An if element is only supported as a child of rules but can contain any other element. Each if requires a test attribute, which contains an XPath expression. If the XPath expression evaluates to true(), the if element matches.
<if test="$xslProcessor = 'libxslt'">
  <switch>
    <case test="bf:Work/bf:hasPart[contains(@rdf:resource, 'hubs')]">
      <transform>
        <xsl:message>Record <xsl:value-of select="$vRecordId"/>: Unprocessed relationship node(s)/Hub(s) <xsl:value-of select="name()"/>. Repeatable target field 7XX.</xsl:message>
      </transform>
    </case>
  </switch>
</if>
  • lookup/lookupField: These elements can be used to look up a value in a map using values from the current context, or using a static value. The lookup element has 2 required attributes: map identifies the map for the lookup, and targetField identifies the field in the map that contains the targeted value. A lookup element contains 1 or more lookupField elements, which are and-ed together to search the map. The lookupField element has a required name attribute that identifies the map field in which to look for the value. The optional xpath attribute contains an XPath expression in the current context to use as a lookup value. If there is no xpath attribute, the text value of the lookupField element is used as the lookup value.

    • These elements can be used with the cf, ind1, ind2, sf, position, case and var elements.
<df tag="048">
  <context xpath="//bf:Work/bf:instrument/bf:MusicInstrument">
    <ind1 default=" "/>
    <ind2 default=" "/>
    <sf code="a">
      <lookup map="instruments" targetField="code">
        <lookupField name="type">brass</lookupField>
        <lookupField name="label" xpath="rdfs:label"/>
      </lookup>
    </sf>
  </context>
</df>
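A lookup does not need more than one lookupField. The sketch below searches the instruments map from the map example on the label field alone; with that map, an rdfs:label of "trumpet" in the current context would be expected to yield the code "bb".

```xml
<sf code="a">
  <lookup map="instruments" targetField="code">
    <!-- Single criterion: match the map's label field against the context's rdfs:label -->
    <lookupField name="label" xpath="rdfs:label"/>
  </lookup>
</sf>
```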
  • transform: Provide an XSL fragment for processing the current context.

    • This element can be used with the rules, cf, ind1, ind2, sf, position, case, and select elements.
<cf tag="001">
  <transform><xsl:value-of select="$pRecordId"/></transform>
</cf>

XSL variables and stylesheet parameters

The following global variables in the generated stylesheet are available for use in XSL fragments and XPath expressions:

  • vAdminMetadata: The node set that represents the bf:AdminMetadata object for the BIBFRAME description. By default, it is from the XPath /rdf:RDF/bf:Instance/bf:adminMetadata/bf:AdminMetadata (the top-level bf:Instance subject). If there is no object at that path, the variable will be created from /rdf:RDF/bf:Work/bf:adminMetadata/bf:AdminMetadata (the top-level bf:Work subject).
  • vRecordId: A record ID that is calculated for the current BIBFRAME description, with this priority:
    1. The value of the pRecordId parameter, if it is passed to the stylesheet
    2. The value in /rdf:RDF/bf:Work/bf:adminMetadata/bf:AdminMetadata/bf:identifiedBy/bf:Local/rdf:value, if there is no bf:source property or the bf:source/bf:Source/rdfs:label value is "DLC".
    3. generate-id() (default)
  • vCurrentVersion: The value of the version element of the top-level rules document.

In addition, the following string parameters can be passed to the stylesheet and used as global variables in XSL fragments and XPath expressions:

  • pRecordId: The assigned record ID for the description, e.g. for use in the MARC 001 control field.
  • pGenerationDatestamp: Defaults to date:date-time() or fn:current-dateTime() if the functions are available to the XSLT processor. Generated defaults are in YYYYMMDDhhmmss.0 format (ISO 8601) for use in an 005 or 884.
  • pConversionAgency: MARC organization code of the institution doing the data generation, e.g. for use in the 003 or the 884. Defaults to DLC.
  • pCatScript: The ISO 15924 script subtag of the cataloging language, for dealing with multi-script records. Defaults to Latn.
  • pSourceRecordId: An identifier for the source record, perhaps a URI.
  • pGenerationUri: Identifier for the generation process, e.g. a GitHub URL. Defaults to https://github.com/lcnetdev/bibframe2marc.
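As an illustration of how these parameters might be used in rules, the sketch below emits the conversion agency in the 003 and the generation datestamp in the 005, following the uses suggested in the list above. It is not taken from the project's own rules files, and it assumes the xsl prefix is declared on the enclosing rules element.

```xml
<cf tag="003">
  <!-- MARC organization code of the converting institution -->
  <transform><xsl:value-of select="$pConversionAgency"/></transform>
</cf>
<cf tag="005">
  <!-- Datestamp passed in, or the generated default when available -->
  <transform><xsl:value-of select="$pGenerationDatestamp"/></transform>
</cf>
```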

XSL named templates

The following named templates are defined in the generated stylesheet for use in XSL fragments:

  • tChopPunct: Chops enclosing quotes, parens, brackets, and end punctuation. Used internally when the stylesheet parameter pChopPunct is "true".
  • tPadRight: Return a right-padded string.
  • tPadLeft: Return a left-padded string.
  • EDTF-Date1: Return the first date from an EDTF date range. If the EDTF date is not a range, will simply return the date.
  • EDTF-Date2: Return the second date from an EDTF date range. If the EDTF date is not a range, will return an empty string.
  • EDTF-DatePart: Return the date part of a single EDTF date (not a range).
  • EDTF-TimePart: Return the time part of a single EDTF date (not a range).
  • EDTF-TimeDiff: Return the time shift part of a single EDTF date (not a range).
  • EDTF-to-033: Return a string formatted as a date for the 033/263 fields from a single EDTF date (not a range).
  • tScriptCode: Extract the script subtag from an xml:lang attribute.
  • tUriCode: Extract the code (last path element) of an id.loc.gov URI.
  • tToken2Subfields: Tokenize a string into a set of subfields.
  • tGetRelResource: Return a MARC record. The template converts a marcKey to a small MARC record or calls tGetMARCAuth. The MARC record may be an authority or BibHub. If using xsltproc, you must process the template result set with exsl:node-set().
  • tGetMARCAuth: Return a MARC record from id.loc.gov as a node set. Note that special processing for LOC authorities retrieves an SRU result set, so you may need to dig for the MARC record to use the values! See the test examples in rules/test/test/04-1XX.xspec.
  • tGetLabel: Return a label for a resource based on its URI. Expects to look up data at ID.LOC.GOV. Will not work with xsltproc.

Limitations

  • The behavior of context and select blocks in a non-repeatable field is somewhat limiting. For a non-repeatable field, there can only be one context or select block.
  • Lookup support is hampered by xsltproc limitations, principally a lack of support for HTTPS.

XML namespaces

The generated stylesheet uses the following namespace prefixes:

  • rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
  • rdfs: http://www.w3.org/2000/01/rdf-schema#
  • marc: http://www.loc.gov/MARC21/slim
  • bf: http://id.loc.gov/ontologies/bibframe/
  • bflc: http://id.loc.gov/ontologies/bflc/
  • madsrdf: http://www.loc.gov/mads/rdf/v1#
  • xsl: http://www.w3.org/1999/XSL/Transform
  • local: local:

These prefixes should be used in any XPath expressions and embedded XSL fragments to support XSL transformation. For your rules documents to validate, you may need to declare these namespaces, along with the bf2marc namespace.
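One way to do this is to declare the prefixes on the root element of each rules document. The sketch below declares the bf2marc namespace as the default and binds the prefixes listed above (the local prefix is omitted); the version value is copied from the earlier example, and only the prefixes your rules actually use are needed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rules xmlns="http://www.loc.gov/bf2marc"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
       xmlns:marc="http://www.loc.gov/MARC21/slim"
       xmlns:bf="http://id.loc.gov/ontologies/bibframe/"
       xmlns:bflc="http://id.loc.gov/ontologies/bflc/"
       xmlns:madsrdf="http://www.loc.gov/mads/rdf/v1#"
       xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <version>0.1.0-SNAPSHOT</version>
  <!-- Rules using bf:, rdfs:, and xsl: prefixes can now be written here -->
</rules>
```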