08 people places mvp

djbpitt edited this page Oct 6, 2023 · 16 revisions

Review

In earlier sessions we created model, view, and wrapper scripts that work together (because of the way the controller coordinates them) to deliver a dynamic HTML page to the user that includes a list of titles. We also created an “index” or home page that users land on, navigation links for other modules, and CSS to style the HTML pages.

Yet another acronym - MVP

With this framework in place, we can begin to develop features into their minimum viable product (MVP) form. MVP is a project management term popularized by product managers in the enterprise software-development field. The idea is that it can be useful to build some software products iteratively, with each successive phase of work bringing new value to an already functional product. While enterprise software development prefers this iterative approach primarily for profit reasons, it can be a useful model for humanists because it allows you to troubleshoot your work step by step from the perspective of a user of your application. This makes sense because in most cases you are (or are part of) your own audience. If something doesn’t work well for you, the sooner you can improve on it, the better.

How we understand the minimum and viable parts of MVP in the context of a project depends on that project’s requirements. If you plan to make your MVP available to the public, those two terms will have a fairly high threshold for acceptability. If, like us, you plan to make your MVP available only to team members, the standard can be lower. For our project we defined MVP as the stage at which a user could reasonably answer a simple research question related to that feature using the web interface. For example, “Does the corpus include any reference to Charles Dickens?” can be answered after this stage because we developed a feature that exposes the prosopography.

In the present lesson you will implement an MVP for two new features, one that will eventually contribute to the map visualization and the other to the prosopography reading annotations in the final product. By developing the model to source, format, and display the information in the persons.xml and places.xml data files, you’ll create the first usable resources on the way to deploying richer, more advanced features that support research objectives.

Goals

  • Create human-readable tables to facilitate data exploration of the prosopography and gazetteer
  • Lay groundwork for future features like reading views and maps
  • Learn typeswitch, an XQuery expression used in many of the view modules in the application

Building the model

To start building the model we first revisited the data in the prosopography and gazetteer files to determine what information and structure would be most useful. We settled on a model that treated each individual and each place as a single entity, but if you have more hierarchical data you may need to represent the relationships in a more complex way. The model for the gazetteer includes a bit of hierarchy: a place can sometimes have a “parent place” (e.g., a building, which is a type of place, may be inside a city, which is a different type of place), and we represent that by nesting the <place> element for the child (e.g., building) inside the <place> element for the parent (e.g., city). When creating the model, you’ll want to identify aspects of the data that are relevant to the research goal and to the features that you want to build. If the information you need is not present in the XML files, this is a good moment to revisit your markup and create that data. Not every view will include all information in the source XML; typically the source XML contains all information that you might need for all views, but each individual view might use only some of that information.

Places

A reference resource about places is traditionally called a gazetteer; the TEI uses the term placeography instead.

Source XML

The places.xml file contains entries like:

<place xml:id="bowstreet" type="court">
  <placeName>Bow Street Magistrates Court</placeName>
  <location>
    <geo>51.513611 -0.1225</geo>
  </location>
</place>

All <place> elements have one or more <placeName> children. The <location> is optional, but if it is present, it has a single <geo> child that contains a whitespace-separated pair of numerical values, the first representing the latitude of the place and the second the longitude. <place> elements may be nested inside other <place> elements where the outer place (such as a city) contains the inner place (such as a building). We’ll say more about the structure of <place> elements in the next lesson, but the preceding information is enough to develop an MVP perspective.

places.xql

We decided to render the list of places as an HTML table, with columns for name, latitude, longitude, and, where applicable, parent place. If a place has more than one <placeName> we’ll provide all of them in the same cell. The first step toward this representation is to create the model, which we do with the following XQuery:

xquery version "3.1";
(:=====
Declare namespaces
=====:)
declare namespace hoax = "http://www.obdurodon.org/hoaxed";
declare namespace m = "http://www.obdurodon.org/model";
declare namespace tei = "http://www.tei-c.org/ns/1.0";
declare namespace html="http://www.w3.org/1999/xhtml";
(:=====
Declare global variables to path
=====:)
declare variable $exist:root as xs:string := 
    request:get-parameter("exist:root", "xmldb:exist:///db/apps");
declare variable $exist:controller as xs:string := 
    request:get-parameter("exist:controller", "/hoaXed");
declare variable $path-to-data as xs:string := 
    $exist:root || $exist:controller || '/data';

declare variable $gazeteer as document-node() := 
    doc($exist:root || $exist:controller || '/data/aux_xml/places.xml');

<m:places>{
for $entry in $gazeteer/descendant::tei:place
let $place-name as xs:string+ := $entry/tei:placeName ! string()
let $geo as element(tei:geo)? := $entry/tei:location/tei:geo
let $lat as xs:string := substring-before($geo, " ")
let $long as xs:string := substring-after($geo, " ")
let $parent as xs:string? := $entry/parent::tei:place/tei:placeName[1] ! string()
return
  <m:placeEntry>
    {$place-name !  <m:placeName>{.}</m:placeName>}
    <m:geo>
      <m:lat>{$lat}</m:lat>
      <m:long>{$long}</m:long>
    </m:geo>
    {$parent ! <m:parentPlace>{.}</m:parentPlace>}
  </m:placeEntry>
}</m:places>

Much of this should look familiar by this stage, as we’re reusing a lot of code to declare global variables for paths, using the m: model namespace, and writing a FLWOR expression. Here are a few details:

  1. A place must have at least one <placeName> element and may have more. The XPath expression selects all such elements and uses the simple mapping operator (!) to compute the string value of each of them.
  2. The element(tei:geo)? construction says that the item, if present, must be an element named <geo> in the TEI namespace, and there cannot be more than one. We could, alternatively, have specified just element() as the datatype, which would have matched any element, but there’s no reason not to use the strictest available datatyping, which provides the greatest protection against error. (Well … almost no reason. Stricter datatypes may impose a performance penalty, so our practice is to start with strict typing and consider relaxing it only if we notice meaningful performance problems.)
  3. We use substring-before() and substring-after() to break the <geo> value into separate strings for latitude and longitude.
  4. Since not all places have parent places, we make $parent optional. But because a parent place may have more than one <placeName>, we use a numerical predicate to say that for this purpose we want only the first <placeName>.
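
The four details above can be tried in isolation, for example in eXide, without touching the real data. The following is a minimal sketch under stated assumptions: the $sample variable is a hypothetical stand-in for places.xml, and the parent city and the second <placeName> are invented for illustration. The <m:check> wrapper and its children are likewise illustrative names, not part of the project model.

```xquery
xquery version "3.1";
declare namespace tei = "http://www.tei-c.org/ns/1.0";
declare namespace m = "http://www.obdurodon.org/model";

(: Hypothetical sample data; the nesting and the second name are invented :)
declare variable $sample as element(tei:place) :=
  <place xmlns="http://www.tei-c.org/ns/1.0" xml:id="london" type="city">
    <placeName>London</placeName>
    <place xml:id="bowstreet" type="court">
      <placeName>Bow Street Magistrates Court</placeName>
      <placeName>Bow Street</placeName>
      <location>
        <geo>51.513611 -0.1225</geo>
      </location>
    </place>
  </place>;

for $entry in $sample/descendant-or-self::tei:place
(: optional: the outer place has no geo element at all :)
let $geo as element(tei:geo)? := $entry/tei:location/tei:geo
return
  <m:check>
    {(: one string per placeName, via the simple mapping operator :)
     $entry/tei:placeName ! <m:name>{string()}</m:name>}
    {(: both functions return "" when $geo is an empty sequence :)
     <m:lat>{substring-before($geo, " ")}</m:lat>,
     <m:long>{substring-after($geo, " ")}</m:long>}
    {(: only the first name of the parent, and only if there is a parent :)
     $entry/parent::tei:place/tei:placeName[1] ! <m:parent>{string()}</m:parent>}
  </m:check>
```

The outer place yields one <m:name>, empty <m:lat> and <m:long> elements, and no <m:parent>. The inner place yields two <m:name> elements, both coordinate values, and a single <m:parent> containing London.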

Sample model output looks like:

<m:placeEntry>
  <m:placeName>Bow Street Magistrates Court</m:placeName>
  <m:geo>
    <m:lat>51.513611</m:lat>
    <m:long>-0.1225</m:long>
  </m:geo>
</m:placeEntry>

Two design decisions that influenced the model XML merit further comment:

  1. The model is flatter than the source XML because it omits the <location> element, which contributes nothing to our goals for place data.
  2. In the <geo> element the TEI follows the World Geodetic System 1984 (WGS84) in using the first value to represent the latitude and the second to represent the longitude. This isn’t the only available convention, though, and many other systems order longitude before latitude. Our model avoids confusion or uncertainty by using explicit element names instead of largely arbitrary order to distinguish the two values.

We’ll discuss the transformation from a model for places to a view later, but before that we’ll explore how we created the model for persons.

People

A reference resource about persons is traditionally called a prosopography; the TEI uses the term personography instead.

Source XML

The persons.xml file contains entries like the following:

<person sex="M" xml:id="johnrussell" role="politician">
  <occupation>Prime Minister</occupation>
  <persName>
    <addName role="honorary">Lord</addName>
    <surname>Russell</surname>
    <forename>John</forename>
  </persName>
  <bibl>
    <ref target="https://www.gov.uk/government/history/past-prime-ministers/lord-john-russell-1st-earl-russell">Lord 
        John Russell was British Prime Minister (Whig) from 1846 - 1852, and again from 1865 to 1866.</ref>
  </bibl>
</person>

We decided that we’d like the eventual output to be in the form of an HTML table with columns for name, “about” (prose from inside the <ref> element), job, role, and sex. Most of these types of information are optional, that is, not required for all persons.

people.xql

Below is the XQuery that creates the model for our table of persons:

xquery version "3.1";
(:=====
Declare namespaces
=====:)
declare namespace hoax = "http://www.obdurodon.org/hoaxed";
declare namespace m = "http://www.obdurodon.org/model";
declare namespace tei = "http://www.tei-c.org/ns/1.0";
declare namespace html="http://www.w3.org/1999/xhtml";
(:=====
Declare global variables to path
=====:)
declare variable $exist:root as xs:string :=
    request:get-parameter("exist:root", "xmldb:exist:///db/apps");
declare variable $exist:controller as xs:string :=
    request:get-parameter("exist:controller", "/hoaXed");
declare variable $path-to-data as xs:string :=
    $exist:root || $exist:controller || '/data';
declare variable $pros as xs:string := $exist:root || $exist:controller || '/data/aux_xml/persons.xml';


<m:persons>
{
    for $person in doc($pros)/descendant::tei:person
    let $surname as xs:string? := $person/tei:persName/tei:surname ! string(.)
    let $forename as xs:string? := ($person/tei:persName/tei:forename
        => string-join(' '))[boolean(.)]
    let $abt as xs:string? := $person//tei:bibl ! normalize-space(.)
    let $job as xs:string? := $person//tei:occupation ! normalize-space(.)
    let $role as xs:string? := $person/@role ! string()
    let $gm as xs:string? := $person/@sex ! string()
    return
        <m:entry>
            <m:name>{string-join(($surname, $forename), ', ')}</m:name>
            <m:about>{$abt}</m:about>
            <m:job>{$job}</m:job>
            <m:role>{$role}</m:role>
            <m:gm>{$gm}</m:gm>
        </m:entry>
}
</m:persons>

The model includes one element in the model namespace for each of the five types of information we identified as important for our purposes. Our handling of names merits a couple of comments:

  1. Some persons have multiple forenames, and we string-join those across a space character when we declare the $forename variable. The string-join() function always returns a string, so if a person had no forenames and we omitted the [boolean(.)] predicate, the value of the $forename variable for that person would be an empty string. We actually want it to be an empty sequence, rather than an empty string, for reasons we explain below, and the function boolean() evaluates to false() for an empty string and true() for any non-empty string. This lets us use a predicate to say “select the result of the string-join() operation only if it is not an empty string; if it is an empty string, the result is an empty sequence”. There are other ways to implement this logic, but we find the use of boolean() inside a predicate easiest to understand.
  2. When we create the <m:name> element we string-join the surname and forename across a comma and space. If $forename is an empty sequence, there’s nothing to join, so no comma plus space are output. This would not be the case were $forename an empty string, which would count as an item for string-join purposes. That difference is why we cared about representing the absence of forenames as an empty sequence, rather than an empty string.
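
The difference between an empty string and an empty sequence is easy to verify on its own. The following is a minimal sketch that uses literal values in place of data from persons.xml; the variable names are ours, chosen only for this illustration.

```xquery
xquery version "3.1";

let $surname := "Brown"
(: stands in for a person with no forename elements :)
let $no-forenames := ()
(: string-join() on an empty sequence returns a zero-length string :)
let $as-string := string-join($no-forenames, ' ')
(: the predicate discards the zero-length string, leaving an empty sequence :)
let $as-sequence := string-join($no-forenames, ' ')[boolean(.)]
return (
  string-join(($surname, $as-string), ', '),   (: "Brown, " with a stray comma and space :)
  string-join(($surname, $as-sequence), ', ')  (: "Brown" :)
)
```

The first result carries the unwanted trailing comma and space because the empty string still counts as an item to be joined; the second does not because the sequence being joined has only one member.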

Sample output looks like:

<m:entry>
  <m:name>Russell, John</m:name>
  <m:about>Lord John Russell was British Prime Minister (Whig) from 1846 - 1852, and again from 1865 to 1866.</m:about>
  <m:job>Prime Minister</m:job>
  <m:role>politician</m:role>
  <m:gm>M</m:gm>
</m:entry>

What happens if a value is missing, e.g., if a person lacks a job or role or anything else? Because we always output all five children of an <m:entry>, those values will be empty elements, e.g.:

<m:entry>
  <m:name>Brown</m:name>
  <m:about/>
  <m:job/>
  <m:role>reporter</m:role>
  <m:gm>M</m:gm>
</m:entry>

Whether to represent the absence of information by including an empty element, as we do above, or, alternatively, by omitting the element entirely is up to you. Some developers may choose to omit an element entirely if it isn’t relevant (for example, omit the <m:job> for an infant who can be assumed not to have had a job) and to include an empty element to indicate that the category is relevant but the value is unknown (at least to the developer). If you make that sort of distinction in your own project you’ll want to document it in your process metadata because the meaning will not be self-evident to those who look at your code in the future.
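
If you prefer to omit elements for missing values, one option is the simple mapping operator pattern that places.xql already uses for <m:parentPlace>. The following is a minimal sketch with literal values standing in for data from persons.xml; it is an alternative to, not a description of, what people.xql does.

```xquery
xquery version "3.1";
declare namespace m = "http://www.obdurodon.org/model";

(: hypothetical values for a person with a role but no recorded job :)
let $job as xs:string? := ()
let $role as xs:string? := "reporter"
return
  <m:entry>
    {(: the constructor runs once per item, so an empty sequence emits nothing :)
     $job ! <m:job>{.}</m:job>}
    {$role ! <m:role>{.}</m:role>}
  </m:entry>
```

The result contains an <m:role> element but no <m:job> element at all. A view that builds table rows from such a model would need to account for the missing cell, which is one practical argument for always emitting all five children as people.xql does.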

Constructing the view

We decided to display this data in an HTML table with one column for each of the five types of elements. A table is a type of data visualization, so it is important to understand the implicit assumptions you inherit when showing the data to a user in this way. Here are a few considerations:

  1. How should the data be ordered? Table rows are rendered in an order, and for this type of table it would be natural to sort the rows by the value of the first column, that is, by the name. We don’t regard that as essential for our MVP, but in a real project we would probably want to use alphabetical order so that users could learn easily whether a particular person is present.
  2. The model includes information about all persons mentioned, but no information about how often they are mentioned, which might be important for some research purposes. In future iterations of this visualization it could be helpful to add a column with calculated counts of how often a particular person is mentioned. For now, though, we want only to be able to answer questions about which persons (and places) are present as a way of generating avenues for further data exploration, and to lay the groundwork for enhanced reading and mapping features. There is no neutral or universally correct way to display data, including in a table; the way you present information in the views for your project inevitably incorporates decisions about what to include or exclude and how to organize and arrange the contents, which means that you want to make mindful decisions.

Using the XQuery typeswitch expression

Recursive typeswitch expressions are a useful XQuery feature that mimics some of the XML-to-XML transformation power of XSLT. It is possible, and in some cases preferable, to use XSLT in your eXist-db application. In the Institute for which we developed this project we chose to avoid using XSLT because we did not have time to teach it adequately and it would therefore have become a limiting prerequisite. Like XSLT templates, recursive typeswitch statements let you declare how you want to process specific node types, with the limitation that recursive typeswitch can match only node types and element names, while XSLT can incorporate predicates into its @match attributes. Whether you use XQuery or XSLT is a project-dependent choice, but for many purposes, including this project, typeswitch expressions are able to replicate the speed and ease of XSLT when transforming XML documents. We introduce typeswitch here, and then use it more extensively in stage 13-reading-view.

We recommend reading 8.4.1 Transforming data with recursive typeswitch starting on page 179 of XQuery for humanists before proceeding in this section. We will follow the same steps they do to construct our view pages, but our code differs in places, so feel free to compare the code examples below to those in XQuery for humanists as you read along.

declare function local:dispatch($node as node()) as item()* {
  typeswitch($node)
    case text() return $node (: change nothing, return the text:)
    case element(m:places) return local:table($node) (: apply the local:table function to an m:places element:)
    case element(m:placeEntry) return local:row($node) (:apply the local:row function to an m:placeEntry element:)
    case element(m:placeName) return local:placeName($node) (: apply the local:placeName function to an m:placeName element:)
    case element (m:lat) return local:cell($node) (: apply the local:cell function to an m:lat element:)
    case element (m:long) return local:cell($node) (: apply the local:cell function to an m:long element:)
    case element (m:parentPlace) return local:cell($node) (: apply the local:cell function to an m:parentPlace element:)
    default return local:passthru($node) 
      (: apply the local:passthru function to any element not matched by a case statement above :)
};

First let’s look at the function signature, that is, the function name, the type of input it accepts (called the function parameters), and the type of result it produces. XQuery pre-declares the local: namespace, which means that you can use it without declaring it yourself, and it’s the customary namespace for functions that are used only in a single file. The local:dispatch() function expects a single node (of any type) as its input and it returns zero or more items. item() is the most general datatype and the * occurrence indicator means “zero or more”, so the local:dispatch() function can ultimately return any number of nodes or atomic values of any type, or nothing at all.

The local:dispatch() function invokes a typeswitch expression, which tests the type of a node and either outputs it directly (if it’s a text node) or applies another function to it. Our goal, then, is to use typeswitch to process every node in the model namespace by passing that node to a helper function that will transform it to an appropriate HTML node. Several types of nodes in the model namespace are transformed in the same way; specifically, latitude, longitude, and parent place are all handled by the same local:cell() function, and by saying “invoke the local:cell() function for all of these types” we can avoid having to repeat the code for each one separately.
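
XQuery 3.0 and later also allow a single case clause to list several sequence types separated by |, which is another way to express “invoke the same function for all of these types”. The following is a minimal sketch of that syntax, not the code used in this project. Support depends on the processor and its version, so treat its availability in your own eXist-db installation as an assumption to verify.

```xquery
xquery version "3.1";
declare namespace m = "http://www.obdurodon.org/model";
declare namespace html = "http://www.w3.org/1999/xhtml";

declare function local:dispatch($node as node()) as item()* {
  typeswitch($node)
    case text() return $node
    (: one clause covers all three element types :)
    case element(m:lat) | element(m:long) | element(m:parentPlace)
      return local:cell($node)
    default return local:passthru($node)
};
declare function local:cell($node as element()) as element(html:td) {
  <html:td>{local:passthru($node)}</html:td>
};
declare function local:passthru($node as node()) as item()* {
  for $child in $node/node() return local:dispatch($child)
};

(: inline sample taken from the model output shown earlier :)
local:dispatch(
  <m:geo>
    <m:lat>51.513611</m:lat>
    <m:long>-0.1225</m:long>
  </m:geo>
)
```

The <m:geo> wrapper falls through to the default clause, so the result is two <html:td> elements, one for each coordinate. Separate case clauses, as in our code, behave identically and work in any XQuery version that supports typeswitch.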

The way we get typeswitch to see and process every node in the model is to use recursion. We start the process ourselves by passing the root of the model to local:dispatch(), which invokes a helper function. Each helper function processes one particular node type and, among other things, applies local:dispatch() to the child nodes of the node it is processing. This is called a tree traversal: every time our XQuery sees a node it processes that node and reapplies local:dispatch() to its child nodes, stopping only when there are no more children.

Let’s take a look at the local:passthru function:

declare function local:passthru($node as node()) as item()* {
    for $child in $node/node() return local:dispatch($child)
};

In human language, when this function is invoked, it says “for every child node of whatever node is passed to us (down to the very text nodes!), run the local:dispatch() function with that child node as input.” The local:dispatch() function contains our typeswitch expression, so every node processed by local:passthru() will be forwarded to local:dispatch(), where it will be checked against the cases we listed. Even for a prosopography with thousands of entries, this code would work to create an HTML table of the data.
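
To watch the traversal happen, it can help to replace the HTML output with a simple trace. The following is a minimal sketch for experimentation only: instead of building table markup, this variant of local:dispatch() reports each element and text node it visits. It assumes the default boundary-space policy (strip), so the indentation inside the inline sample does not produce text nodes.

```xquery
xquery version "3.1";
declare namespace m = "http://www.obdurodon.org/model";

(: tracing variant: report each node, then keep descending :)
declare function local:dispatch($node as node()) as item()* {
  typeswitch($node)
    case text() return "text: " || $node
    case element() return ("element: " || name($node), local:passthru($node))
    default return local:passthru($node)
};
declare function local:passthru($node as node()) as item()* {
  for $child in $node/node() return local:dispatch($child)
};

local:dispatch(
  <m:placeEntry>
    <m:placeName>Bow Street Magistrates Court</m:placeName>
    <m:geo>
      <m:lat>51.513611</m:lat>
      <m:long>-0.1225</m:long>
    </m:geo>
  </m:placeEntry>
)
```

The result is a sequence of strings in document order: the <m:placeEntry> element first, then each descendant element followed immediately by its text, which shows that every node passes through local:dispatch() exactly once.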

We also need to create our helper functions. Here’s the one for latitude, longitude, and parent places; it creates an HTML table cell (a <td> element) and then passes the node to local:passthru(), which dispatches each of its children in turn:

declare function local:cell ($node as element()) as element(html:td){
  <html:td>{local:passthru($node)}</html:td>
};

Below is the complete XQuery file that creates the view for places:

places-to-html.xql

xquery version "3.1";
(:=====
Declare namespaces
=====:)
declare namespace hoax = "http://www.obdurodon.org/hoaxed";
declare namespace m = "http://www.obdurodon.org/model";
declare namespace tei = "http://www.tei-c.org/ns/1.0";
declare namespace html="http://www.w3.org/1999/xhtml";

(:=====
the function request:get-data(); is an eXist-specific XQuery
function that we use to pass data among XQuery scripts via 
the controller.
=====:)
declare variable $data as document-node() := request:get-data();

declare function local:dispatch($node as node()) as item()* {
  typeswitch($node)
    case text() return $node
    case element(m:places) return local:table($node)
    case element(m:placeEntry) return local:row($node)
    case element(m:placeName) return local:placeName($node)
    case element (m:lat) return local:cell($node)
    case element (m:long) return local:cell($node)
    case element (m:parentPlace) return local:cell($node)
    default return local:passthru($node)
};

declare function local:table($node as element(m:places)) as element(html:table){
  <html:table id="places">
    <html:tr>
      <html:th>Placename</html:th>
      <html:th>Latitude</html:th>
      <html:th>Longitude</html:th>
      <html:th>Parent place</html:th>
    </html:tr>
    {local:passthru($node)}
  </html:table>
};
declare function local:row ($node as element(m:placeEntry)) as element(html:tr){
  <html:tr>{local:passthru($node)}</html:tr>
};
declare function local:cell ($node as element()) as element(html:td){
  <html:td>{local:passthru($node)}</html:td>
};
declare function local:placeName($node as element(m:placeName)) as element(html:td)? {
  if (not($node/preceding-sibling::m:placeName))
  then 
    <html:td>{string-join($node/../m:placeName, "; ")}</html:td>
  else ()
};
declare function local:passthru($node as node()) as item()* {
  for $child in $node/node() return local:dispatch($child)
};

local:dispatch($data)

While writing these functions, we used the XQuery/Typeswitch Transformations page from the XQuery wikibook as our guide, and you can see the patterns from that site mirrored in the code above. The XQuery for humanists example code (pages 182–91) differs from ours in two principal ways:

  1. Instead of using helper functions, the XQuery for humanists version embeds the handling for all element types directly in the case statements. With a small number of element types and simple processing this can be easier to read, but it can become cluttered as it grows larger. We prefer the separate helper functions because they help us focus on one thing at a time: our local:dispatch() is responsible for determining how each type of input node should be processed, but it doesn’t concern itself with how that processing is implemented. Meanwhile, each helper function is responsible only for processing a single type of input node.
  2. The XQuery for humanists local:transform() function accepts multiple nodes as simultaneous input, while our local:dispatch() accepts only one node at a time. Meanwhile, our local:passthru() uses a for expression to send each child separately into local:dispatch(), while the XQuery for humanists counterpart (part of each case statement instead of a separate function) sends all of the children off for processing in a single statement and puts the logic to handle them one at a time inside local:transform(). Our approach reflects our preference for thinking of local:dispatch() as receiving one node at a time, and that’s why we put the for statement inside local:passthru() instead of inside local:dispatch().

Which of these two approaches you take is a matter of personal preference; the two produce the same results.
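
For comparison, here is a minimal sketch of the single-function style described above. It is our own reconstruction of the pattern from that description, not code copied from XQuery for humanists, and it handles only two element types from the persons model to keep the example short.

```xquery
xquery version "3.1";
declare namespace m = "http://www.obdurodon.org/model";
declare namespace html = "http://www.w3.org/1999/xhtml";

(: accepts zero or more nodes, loops over them itself,
   and keeps the handling for each type inside its case clause :)
declare function local:transform($nodes as node()*) as item()* {
  for $node in $nodes
  return
    typeswitch($node)
      case text() return $node
      case element(m:entry) return
        <html:tr>{local:transform($node/node())}</html:tr>
      case element(m:name) return
        <html:td>{local:transform($node/node())}</html:td>
      default return local:transform($node/node())
};

local:transform(<m:entry><m:name>Russell, John</m:name></m:entry>)
```

The result is an <html:tr> containing one <html:td> with the text Russell, John, which is what the helper-function version produces for the same input.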

Below is the complete code we use for the people-to-html.xql, which uses the same general approach as our places-to-html.xql:

xquery version "3.1";
(:=====
Declare namespaces
=====:)
declare namespace hoax = "http://www.obdurodon.org/hoaxed";
declare namespace m = "http://www.obdurodon.org/model";
declare namespace tei = "http://www.tei-c.org/ns/1.0";
declare namespace html="http://www.w3.org/1999/xhtml";

(:=====
the function request:get-data(); is an eXist-specific XQuery
function that we use to pass data among XQuery scripts via 
the controller.
=====:)
declare variable $data as document-node() := request:get-data();

declare function local:dispatch($node as node()) as item()* {
    typeswitch($node)
        case text() return $node
        case element(m:persons) return local:table($node)
        case element(m:entry) return local:row($node)
        case element(m:name) return local:cell($node)
        case element (m:about) return local:cell($node)
        case element (m:job) return local:cell($node)
        case element (m:role) return local:cell($node)
        case element (m:gm) return local:cell($node)
        default return local:passthru($node)
};

declare function local:table($node as element(m:persons)) as element(html:table){
    <html:table>
    <html:tr>
        <html:th>Name</html:th>
        <html:th>About</html:th>
        <html:th>Job</html:th>
        <html:th>Role</html:th>
        <html:th>Sex</html:th>
    </html:tr>
    {local:passthru($node)}
    </html:table>
};
declare function local:row ($node as element(m:entry)) as element(html:tr){
    <html:tr>{local:passthru($node)}</html:tr>
};
declare function local:cell ($node as element()) as element(html:td){
    <html:td>{local:passthru($node)}</html:td>
};
declare function local:passthru($node as node()) as item()* {
    for $child in $node/node() return local:dispatch($child)
};
local:dispatch($data)

Bonus activities

  1. Enhance the tables of persons and places to sort them alphabetically.
  2. Enhance the table of persons to include a count of the number of times the person is mentioned in the corpus.
  3. Uh oh! Persons mentioned in the article files are tagged as <persName ref="#pointer">, where pointer is replaced by the @xml:id of a <person> in the prosopography. For example, a mention of John Russell (see above) would read <persName ref="#johnrussell">Lord John Russell</persName>. But there are 13 <persName> elements in the article corpus (some pointing to the same person) and 23 <person> elements in the prosopography. Are there spurious entries in the prosopography? Are there persons in the article corpus who haven’t been tagged properly? Use XQuery (in eXide) to explore the data and decide how you would address this discrepancy in a real project.
  4. Uh oh! Again! Some persons tagged as <persName> in the article corpus have no @ref attribute because they have no corresponding entries in the prosopography. Find those persons and decide how you would address this discrepancy in a real project.

Summary

In this section you learned about the concept of MVP (minimum viable product) as a way of thinking about iterative design. You also learned about the typeswitch expression in XQuery, which you used to transform data from model to HTML for the view component of your MVC architecture. All code explored in this section is available in the 08-people-places-mvp branch.