Releases: climatepolicyradar/navigator-data-ingest
v2.1.6-beta
The previous bug fix for the ext4 max filename length of 255 bytes did not take into account the addition of the content hash, so trim further to ensure we don't exceed that limit.
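A minimal sketch of the trimming idea, assuming the final name has the form `<stem>_<content hash><suffix>`. The function name, the underscore separator and the argument names are hypothetical and are not taken from the repository; only the 255-byte limit comes from the notes above.

```python
MAX_FILENAME_BYTES = 255  # ext4 limit on a single filename, in bytes


def trim_filename(stem: str, content_hash: str, suffix: str) -> str:
    """Build '<stem>_<content_hash><suffix>' so the encoded name fits the limit.

    Hypothetical helper: the real naming scheme may differ.
    """
    # Reserve space for the parts that must survive intact.
    tail = f"_{content_hash}{suffix}"
    budget = MAX_FILENAME_BYTES - len(tail.encode("utf-8"))
    if budget <= 0:
        raise ValueError("hash and suffix alone exceed the filename limit")
    # Cut on bytes rather than characters, then drop any multi-byte
    # character that was split by the cut.
    trimmed = stem.encode("utf-8")[:budget].decode("utf-8", errors="ignore")
    return f"{trimmed}{tail}"
```

The budget is computed in bytes because the ext4 limit applies to the encoded name, so a stem with non-ASCII characters can hit the limit with far fewer than 255 characters.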
v2.1.5-beta
The max filename length was exceeded in s3.
Bug fix: ext4, as used for the Amazon Linux /tmp directory, has a max filename length of 255 bytes, so trim filenames to ensure we don't exceed that.
v2.1.4-beta
A further update to the request made in the api_client, which now tries altering the source url to get the correct response.
v2.1.3-beta
Updating the ingest stage to alter the source url should it fail to download from source.
This involves simply removing any percent symbols from the source url string.
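The retry described above could look roughly like the sketch below. The function names and the injected `fetch` callable are hypothetical and stand in for whatever the ingest stage uses to download; only the "remove percent symbols" step comes from the note.

```python
from typing import Callable


def strip_percent_symbols(source_url: str) -> str:
    """Return the url with every '%' character removed."""
    return source_url.replace("%", "")


def download_with_fallback(source_url: str, fetch: Callable[[str], bytes]) -> bytes:
    """Try the original url first, then the altered url if that fails.

    `fetch` is a placeholder for the real download call and is assumed
    to raise an exception on failure.
    """
    try:
        return fetch(source_url)
    except Exception:
        # Second attempt with the percent symbols removed.
        return fetch(strip_percent_symbols(source_url))
```

Note that this removes only the `%` characters themselves, so `a%20b.pdf` becomes `a20b.pdf`; it is not the same as percent-decoding the url.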
v2.1.2-beta
Updating the Updates type to have s3_value as optional.
v2.1.1-beta
Change typing for download_url and source_url from AnyHttpUrl to str.
v2.1.0-beta
Ingest Stage to Use Execution Data File
Updating the ingest stage to get the new_and_updated_documents.json file from the input/${timestamp} directory by reading the execution data file which points to this directory.
Explanation:
We use a json file in s3 to store information on the step functions run. It is named using the unique stepfunctions execution id, because the execution id is accessible at all points in the stepfunctions flow, whereas the run directory we create via the backend would have to be carried through the entire json path.
This means that all steps in the stepfunctions flow can now access the input json (db-state.json).
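As a rough illustration of the lookup, the sketch below derives the location of new_and_updated_documents.json from the contents of the execution data file. The `execution_data` prefix, the `<execution id>.json` naming and the `input_dir_path` field are assumptions made for this example; the release notes do not specify them.

```python
import json


def execution_data_key(execution_id: str, prefix: str = "execution_data") -> str:
    """Key of the execution data file, named after the execution id.

    The prefix and the '.json' extension are hypothetical.
    """
    return f"{prefix}/{execution_id}.json"


def updates_file_key(
    execution_data_json: str,
    filename: str = "new_and_updated_documents.json",
) -> str:
    """Resolve the updates file inside the directory the data file points to.

    Assumes the execution data file has an 'input_dir_path' field
    (hypothetical name) holding the input/${timestamp} directory.
    """
    data = json.loads(execution_data_json)
    input_dir = data["input_dir_path"].rstrip("/")
    return f"{input_dir}/{filename}"
```

Because every step knows its own execution id, each step can rebuild the first key without any state being passed along, and then read the directory from the file itself.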
v2.0.10-beta
Facilitating the download of documents from a download_url such that we can cache documents in an s3 bucket whilst preserving the original source url.
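One way to picture this, as a sketch only: the document keeps both urls, the download uses download_url when one is set, and source_url is never overwritten. The `Document` class and `url_to_fetch` function are hypothetical; only the two field names come from these notes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    """Hypothetical, reduced document record with only the two url fields."""

    source_url: str
    download_url: Optional[str] = None  # e.g. a cached copy in an s3 bucket


def url_to_fetch(document: Document) -> str:
    """Prefer the cached download_url, falling back to the original source."""
    return document.download_url or document.source_url
```

Keeping the two fields separate means the cached location can change without losing track of where the document originally came from.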
v2.0.9-beta
Fixes an issue caused by not archiving indexer input json files when embeddings need to be regenerated.
v2.0.8-beta
Sundry