
JSON processing specs

David Megginson edited this page Jul 7, 2020 · 4 revisions

(For coders)

A JSON processing spec allows you to specify the entire processing chain for a HXL dataset as a JSON object. Here is a simple example:

{
    "input": "https://docs.google.com/spreadsheets/d/1ytPD-f4a8CbNKTfMS3EqZOpBo9LWCk_NDKxJCgmpXA8/edit#gid=1101521524",
    "recipe": [
        {
            "filter": "count",
            "patterns": "sector"
        }
    ]
}

Every filter in the HXL Proxy is also available in the JSON processing spec. The advantage of these specs is that you can store and version-control them outside of the HXL Proxy, and can easily roll back changes when there's a problem. You can execute a spec on the HXL Proxy and get the results through the Process JSON spec API endpoint, or you can use the hxlspec command from the Python libhxl package, like this:

$ hxlspec spec.json > output.csv
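The spec.json file used in the command above is just a JSON object, so it can also be generated from code. The following minimal sketch uses only Python's standard json module to build the example spec as a dict and save it to spec.json (the filename is arbitrary), ready to commit to version control or pass to hxlspec. It does not use libhxl at all.

```python
import json

# Build the same processing spec as the example above, as a plain Python dict.
spec = {
    "input": "https://docs.google.com/spreadsheets/d/1ytPD-f4a8CbNKTfMS3EqZOpBo9LWCk_NDKxJCgmpXA8/edit#gid=1101521524",
    "recipe": [
        # A single "count" filter, using the "sector" hashtag pattern.
        {"filter": "count", "patterns": "sector"},
    ],
}

# Save it as indented JSON so diffs stay readable under version control.
with open("spec.json", "w", encoding="utf-8") as output:
    json.dump(spec, output, indent=4)
```

Keeping specs as generated or hand-edited JSON files like this is what makes it easy to review changes and roll back to an earlier version.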

Alternatively, you can import a spec in your own Python code like this, and then operate on it like any other dataset:

import json
import hxl.io

# Load the processing spec, then build a HXL dataset from it
with open("my-spec.json", "r", encoding="utf-8") as spec_file:
    dataset = hxl.io.from_spec(json.load(spec_file))

Properties

The processing spec is a JSON object with the following top-level properties, many of which are similar to the options on the Source and Recipe pages (the only required property is input):

| Property | JSON datatype | Description |
|---|---|---|
| input | string (URL) | (Required) The URL of the dataset to use. |
| sheet_index | int | If input points to an Excel workbook, the sheet to use (starting from 1). If unspecified, the first sheet with HXL hashtags is used. |
| timeout | int | The number of seconds to wait before timing out a web connection. |
| verify_ssl | boolean | If false, don't verify SSL certificates (useful for self-signed certs). Defaults to true. |
| http_headers | object | Custom HTTP headers to add to the request for the dataset (such as Authorization). Each property name is a header name, and its value is the header value. |
| encoding | string | Character encoding to use for CSV data, such as "utf-8" (detected automatically by default). |
| tagger | object | A JSON tagger object defining HXL hashtags and attributes to add to an untagged dataset, similar to the Tagger page. |
| recipe | array | A list of zero or more JSON filter objects configuring the filters to apply to the dataset. |

(There is an additional property, allow_local, that is available only in the command-line script or the Python library. If true, it allows input to be a local filename instead of a URL.)
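The property rules above can also be checked mechanically before a spec is run. The function below is a hypothetical pre-flight helper: check_spec and PROPERTY_TYPES are illustrative names, not part of libhxl or the HXL Proxy, and the real libraries may validate differently. It flags a missing input, properties not listed above, values of the wrong JSON type, a sheet_index below 1, recipe entries that are not objects with a "filter" property (an assumption based on the example at the top of this page), and, unless allow_local is true, an input that is not an http(s) URL.

```python
from urllib.parse import urlparse

# Hypothetical helper, not part of libhxl: expected Python types for each
# documented top-level property (JSON object -> dict, array -> list).
PROPERTY_TYPES = {
    "input": str,
    "sheet_index": int,
    "timeout": int,
    "verify_ssl": bool,
    "http_headers": dict,
    "encoding": str,
    "tagger": dict,
    "recipe": list,
    "allow_local": bool,
}


def check_spec(spec):
    """Return a list of problems found in a processing spec (empty if none)."""
    if not isinstance(spec, dict):
        return ["spec must be a JSON object"]
    problems = []
    if "input" not in spec:
        problems.append("missing required property: input")
    for name, value in spec.items():
        expected = PROPERTY_TYPES.get(name)
        if expected is None:
            problems.append(f"unknown property: {name}")
        # bool is a subclass of int in Python, so reject True/False for int fields.
        elif not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            problems.append(f"{name} should be {expected.__name__}")
    # sheet_index counts from 1, per the table above.
    index = spec.get("sheet_index")
    if type(index) is int and index < 1:
        problems.append("sheet_index counts from 1")
    # Assumption: each filter object names its filter in a "filter" property.
    recipe = spec.get("recipe", [])
    if isinstance(recipe, list):
        for position, step in enumerate(recipe):
            if not isinstance(step, dict) or "filter" not in step:
                problems.append(f"recipe[{position}] should be an object with a 'filter' property")
    # Without allow_local, input should be a web URL rather than a local filename.
    source = spec.get("input")
    if isinstance(source, str) and spec.get("allow_local") is not True:
        if urlparse(source).scheme not in ("http", "https"):
            problems.append("input must be a URL unless allow_local is true")
    return problems
```

For example, check_spec({"input": "data.csv", "timeout": "30"}) reports both the string timeout and the local filename, while adding "allow_local": true and making timeout an integer clears both problems.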
