Releases: climatepolicyradar/navigator-data-ingest

v2.1.6-beta

24 May 21:16
7947119

The previous bug fix for the ext4 max filename length of 255 bytes did not take into account the addition of the content hash, so trim further to ensure we don't exceed that.
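As a rough illustration of the trimming described above, here is a minimal sketch. The function name `build_filename`, the `_hash.suffix` layout and the default `.pdf` suffix are assumptions made for the example, not the repository's actual code. The key point is that the byte budget is computed after reserving space for the content hash and the extension.

```python
MAX_FILENAME_BYTES = 255  # ext4 limit for a single filename


def build_filename(stem: str, content_hash: str, suffix: str = ".pdf") -> str:
    """Return '<stem>_<content_hash><suffix>', trimming the stem so the whole
    name fits in MAX_FILENAME_BYTES. Hypothetical helper for illustration."""
    tail = f"_{content_hash}{suffix}"
    # Reserve room for the hash and suffix before trimming the stem.
    budget = MAX_FILENAME_BYTES - len(tail.encode("utf-8"))
    if budget < 0:
        raise ValueError("content hash and suffix alone exceed the limit")
    # Trim on bytes, not characters; drop any partial multi-byte character
    # left at the cut point.
    trimmed = stem.encode("utf-8")[:budget].decode("utf-8", errors="ignore")
    return f"{trimmed}{tail}"
```

Trimming on the encoded bytes matters because the ext4 limit is in bytes, so a name containing non-ASCII characters can exceed it well before reaching 255 characters.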

v2.1.5-beta

24 May 20:12
fb629ff

Max file length was exceeded in s3.

Bug fix: the ext4 filesystem used for the Amazon Linux /tmp directory has a max filename length of 255 bytes, so trim filenames to ensure we don't exceed that.

v2.1.4-beta

24 May 15:47
c0a56c1

A further update to the request made in the api_client to try altering the source url to get the correct response.

v2.1.3-beta

24 May 14:49
116b9a8

Updating the ingest stage to alter the source url should it fail to download from source.

This involves simply removing any percent symbols from the source url string.
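The fallback described above could look roughly like the following sketch. The names `strip_percent` and `download_with_fallback`, and the convention that the injected `fetch` callable returns `None` on failure, are assumptions for the example and not taken from the repository.

```python
from typing import Callable, Optional


def strip_percent(url: str) -> str:
    """Remove every percent symbol from the url string."""
    return url.replace("%", "")


def download_with_fallback(
    source_url: str,
    fetch: Callable[[str], Optional[bytes]],
) -> Optional[bytes]:
    """Try the original url first; if that fails and the url contains a
    percent symbol, retry once with the percent symbols removed.

    `fetch` is a hypothetical downloader that returns None on failure.
    """
    content = fetch(source_url)
    if content is None and "%" in source_url:
        content = fetch(strip_percent(source_url))
    return content
```

Passing the downloader in as a callable keeps the retry logic separate from the HTTP client, so it can be exercised without network access.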

v2.1.2-beta

24 May 11:04
4e213ed

Updating the Updates type to have s3_value as optional.

v2.1.1-beta

22 May 16:06
cd93e95

Change typing for download_url and source_url from AnyHttpUrl to str.

v2.1.0-beta

19 May 14:47
7420545

Ingest Stage to Use Execution Data File

Updating the ingest stage to get the new_and_updated_documents.json file from the input/${timestamp} directory by reading the execution data file which points to this directory.

Explanation:
We use a json file in s3 to store information on the step functions run. It is named using the unique stepfunctions execution id, because the execution id is accessible at all points in the stepfunctions flow, whereas the run directory we create via the backend would have to be carted through the entire json path.

This means that all steps in the stepfunctions flow can now access the input json (db-state.json).
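The lookup described above can be sketched as follows. The `execution_data` prefix, the `input_dir_path` key and both function names are hypothetical and chosen only for illustration; the real file layout and schema may differ. The sketch works on the file's json text so that it stays independent of any s3 client.

```python
import json
from pathlib import PurePosixPath


def execution_data_key(execution_id: str, prefix: str = "execution_data") -> str:
    """Key of the execution data file, named after the execution id.
    The prefix is an assumption for this sketch."""
    return f"{prefix}/{execution_id}.json"


def resolve_updates_key(
    execution_data_json: str,
    filename: str = "new_and_updated_documents.json",
) -> str:
    """Read the input directory from the execution data file contents and
    return the key of the updates file inside it."""
    data = json.loads(execution_data_json)
    input_dir = data["input_dir_path"]  # hypothetical key name
    return str(PurePosixPath(input_dir) / filename)
```

Because every step knows its own execution id, each one can derive the execution data key independently and then follow it to the input directory.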

v2.0.10-beta

16 May 15:43
4cb0e12

Facilitating the download of documents from a download_url such that we can cache documents in an s3 bucket whilst preserving the original source url.

v2.0.9-beta

20 Apr 11:05
a534c4f

Fixes an issue caused by not archiving indexer input json files when embeddings need to be regenerated.

v2.0.8-beta

17 Apr 15:39
b6eae6c

Sundry