JSON processing specs
(For coders)
A JSON processing spec allows you to specify the entire processing chain for a HXL dataset as a JSON object. Here is a simple example:
```json
{
  "input": "https://docs.google.com/spreadsheets/d/1ytPD-f4a8CbNKTfMS3EqZOpBo9LWCk_NDKxJCgmpXA8/edit#gid=1101521524",
  "recipe": [
    {
      "filter": "count",
      "patterns": "sector"
    }
  ]
}
```
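Because a spec is just a JSON object, you can also build it in Python and write it to a file. The sketch below recreates the example above as a Python dict. `spec_to_json` and `save_spec` are hypothetical helper names, not part of libhxl. Sorting keys with a fixed indent is just one way to keep version-control diffs small.

```python
import json

# The example spec above, expressed as a Python dict.
spec = {
    "input": "https://docs.google.com/spreadsheets/d/1ytPD-f4a8CbNKTfMS3EqZOpBo9LWCk_NDKxJCgmpXA8/edit#gid=1101521524",
    "recipe": [
        {"filter": "count", "patterns": "sector"},
    ],
}


def spec_to_json(spec):
    """Serialize a spec with stable formatting (hypothetical helper).

    Sorted keys and a fixed indent mean the same spec always produces the
    same text, which keeps version-control diffs small.
    """
    return json.dumps(spec, indent=2, sort_keys=True) + "\n"


def save_spec(spec, path):
    """Write a spec to a file such as spec.json (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(spec_to_json(spec))
```

Calling `save_spec(spec, "spec.json")` would produce a file that you could commit to version control and later pass to `hxlspec`.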
Every filter in the HXL Proxy is also available in a JSON processing spec. The advantage of specs is that you can store and version-control them outside the HXL Proxy, and easily roll back changes when there's a problem. You can use the Process JSON spec API endpoint to execute a spec on the HXL Proxy and get the results. Alternatively, you can use the `hxlspec` command from the Python libhxl package, like this:
```
$ hxlspec spec.json > output.csv
```
You can also import a spec in your own Python code like this, and then operate on it like any other dataset:
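```python
import json

import hxl.io

# Load the JSON spec from a file and build a HXL dataset from it
with open("my-spec.json", "r", encoding="utf-8") as spec_file:
    dataset = hxl.io.from_spec(json.load(spec_file))
```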
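A spec arrives in Python as an ordinary dict, so you can adjust it before passing it to `hxl.io.from_spec`. For example, you might append a filter for a one-off analysis without touching the version-controlled file. The sketch below uses a hypothetical helper, `with_extra_filters`, which is not part of libhxl. It assumes that each filter is a JSON object like the `count` filter in the example at the top of this page.

```python
import copy


def with_extra_filters(spec, *filters):
    """Return a copy of a processing spec with extra filter objects appended to its recipe.

    Hypothetical helper (not part of libhxl). The input spec is not modified.
    """
    new_spec = copy.deepcopy(spec)
    # recipe is optional in a spec, so start from an empty list if it's missing.
    recipe = new_spec.setdefault("recipe", [])
    # Deep-copy the filters too, so later edits to them don't leak into the new spec.
    recipe.extend(copy.deepcopy(f) for f in filters)
    return new_spec
```

The returned dict can be passed to `hxl.io.from_spec` in the same way as the loaded spec in the snippet above.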
The processing spec is a JSON object with the following top-level properties, many of which are similar to the options on the Source and Recipe pages (the only required property is input):
Property | JSON datatype | Description |
---|---|---|
input | string (URL) | (required) the URL of the dataset to use. |
sheet_index | int | If input points to an Excel workbook, this specifies which sheet to use (starting from 1). If unspecified, use the first sheet with HXL hashtags. |
timeout | int | The number of seconds to wait before timing out a web connection. |
verify_ssl | boolean | If false, don't verify SSL certificates (useful for self-signed certs). Defaults to true. |
http_headers | object | Custom HTTP headers to add to the request for the dataset (such as Authorization). The object's properties are the headers, and the values are the header values. |
encoding | string | Character encoding to use for CSV data, such as "utf-8" (will attempt to detect by default). |
tagger | object | A JSON tagger object defining HXL hashtags and attributes to add to an untagged dataset, similar to the Tagger page. |
recipe | array | A list of zero or more JSON filter object(s) configuring filters to apply to the dataset. |
(There is an additional property, `allow_local`, that is available only in the command-line script or the Python library. If true, it allows `input` to be a local filename instead of a URL.)
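Before you send a spec to the HXL Proxy or `hxlspec`, it can help to catch simple mistakes locally. The following is a rough structural check based only on the property table above. `check_spec` and `PROPERTY_TYPES` are hypothetical names. The sketch also makes two assumptions:

* every recipe entry is an object with a `filter` property, as in the example at the top of this page;
* "URL" means an `http` or `https` URL.

It does not validate individual filter options or the tagger object. The HXL Proxy and libhxl remain the authority on what a valid spec is.

```python
from urllib.parse import urlparse

# Expected Python types for the documented top-level properties (from the table).
# allow_local is assumed to be a boolean, since the note says "if true".
PROPERTY_TYPES = {
    "input": str,
    "sheet_index": int,
    "timeout": int,
    "verify_ssl": bool,
    "http_headers": dict,
    "encoding": str,
    "tagger": dict,
    "recipe": list,
    "allow_local": bool,
}


def check_spec(spec):
    """Return a list of problems found in a processing spec (empty if none found).

    Hypothetical helper: a rough structural check only. Properties not in
    PROPERTY_TYPES are ignored rather than rejected.
    """
    if not isinstance(spec, dict):
        return ["spec must be a JSON object"]
    problems = []
    if "input" not in spec:
        problems.append("missing required property: input")
    for name, value in spec.items():
        expected = PROPERTY_TYPES.get(name)
        if expected is None:
            continue  # undocumented property: leave it to the HXL Proxy/libhxl
        # In Python, bool is a subclass of int, so reject True/False explicitly.
        if expected is int and isinstance(value, bool):
            problems.append(f"{name} should be an int, not a boolean")
        elif not isinstance(value, expected):
            problems.append(f"{name} should be of type {expected.__name__}")
    recipe = spec.get("recipe")
    if isinstance(recipe, list):
        for i, item in enumerate(recipe):
            # Assumption: every filter object names its filter in a "filter" property.
            if not isinstance(item, dict) or "filter" not in item:
                problems.append(f"recipe[{i}] should be an object with a 'filter' property")
    source = spec.get("input")
    if isinstance(source, str) and spec.get("allow_local") is not True:
        # Assumption: a URL here means an http:// or https:// URL.
        if urlparse(source).scheme not in ("http", "https"):
            problems.append("input should be an http(s) URL unless allow_local is true")
    return problems
```

An empty list means the spec passed these basic checks. It does not guarantee that the HXL Proxy will accept the spec.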
Learn more about the HXL standard at http://hxlstandard.org