A project containing common code for the TWI data collection projects

JessicaHenkel/twi_data_collection_s3

TWI Data Collection - AWS S3 - AGOL integration

This project includes the integration script for the TWI data collection projects. It also defines a metadata format, a folder naming convention, and an AGOL feature class structure.

A notebook demonstrating the approach is available in the docs/notebooks folder.

You need to configure your credentials in the $HOME/.aws/credentials file, with one section per project name so that each execution environment has its own keys:

[LWI]
aws_access_key_id = <an access key>
aws_secret_access_key = <a secret access key>
[GLO]
aws_access_key_id = <an access key>
aws_secret_access_key = <a secret access key>
[TEST]
aws_access_key_id = <the minio access key>
aws_secret_access_key = <the minio secret key>
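As a quick check that the file is laid out as expected, the sections can be read back with Python's standard configparser module. This is only a sketch: the helper name read_profile is hypothetical, and it assumes that the collector looks up the section named after the --project value, which this README implies but does not spell out.

```python
import configparser


def read_profile(credentials_text, project):
    """Return the key pair stored under the [project] section.

    Hypothetical helper, not part of metadata_collector: it only
    illustrates the file layout shown above.
    """
    # Disable interpolation so a '%' in a key is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(credentials_text)
    if not parser.has_section(project):
        raise KeyError(f"no [{project}] section in the credentials file")
    section = parser[project]
    return {
        "aws_access_key_id": section["aws_access_key_id"],
        "aws_secret_access_key": section["aws_secret_access_key"],
    }
```

To run it against the real file, pass in the text of $HOME/.aws/credentials together with a project name such as TEST. A KeyError means the section for that project is missing.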

Once installed (i.e. the src path is part of the PYTHONPATH), you can use the module as follows:

python -m metadata_collector collect --help
Usage: python -m metadata_collector collect [OPTIONS]

  This app will collect the metadata from the buckets for a project and
  create the feature-class map with them.

Options:
  --project TEXT                  Id of the project, default: TEST  [default:
                                  TEST]

  --process_all                   Process all the documents, ignoring the
                                  already seen lists  [default: False]

  -b, --bucket TEXT               the bucket to scan (optional if you provide
                                  the project name), you can use multiple
                                  buckets using: -b bucket1 -b bucket2 ...

  -o, --output_format [GeoJSON|GPKG|Shapefile|CSV]
                                  [default: GeoJSON]
  --output_name TEXT              output name without extension (will be added
                                  according to the format)  [default: output]

  --generate_plot                 generate a plot of the map  [default: False]
  --hucs_gdb PATH                 [default: ./data/hucs.gdb]
  --agol_credentials TEXT         a json including  username,  password,
                                  endpoint (optional, will use the twiotg
                                  instance by default),  folder (optional,
                                  will use / by default), and  tags
                                  (optional). example:

                                  --agol_credentials '{   "username":
                                  "theUsername",   "password": "thePassword",
                                  "endpoint":
                                  "https://theagolinstance.maps.arcmap.com",
                                  "folder": "/",   "tags": ["map", "metadata"]
                                  }'

  --help                          Show this message and exit.

The collect command requires the hucs.gdb file (or a shapefile). You can set its path with the option --hucs_gdb <the file location>; otherwise the default location is used.
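Quoting the --agol_credentials JSON by hand is error-prone, so one option is to assemble the command from Python. The sketch below uses only options listed in the help output above; the helper name build_collect_command and any sample values passed to it are illustrative and not part of the project.

```python
import json
import sys


def build_collect_command(project, output_format="GeoJSON",
                          output_name="output", hucs_gdb=None,
                          agol_credentials=None):
    """Assemble the argv list for `python -m metadata_collector collect`.

    Hypothetical helper: it only builds the argument list and does not
    run anything.
    """
    argv = [
        sys.executable, "-m", "metadata_collector", "collect",
        "--project", project,
        "--output_format", output_format,
        "--output_name", output_name,
    ]
    if hucs_gdb is not None:
        argv += ["--hucs_gdb", str(hucs_gdb)]
    if agol_credentials is not None:
        # json.dumps produces valid JSON; no shell quoting is needed
        # when the list is executed without a shell.
        argv += ["--agol_credentials", json.dumps(agol_credentials)]
    return argv
```

The returned list can be passed to subprocess.run(argv, check=True), or printed with shlex.join(argv) to get a line that is safe to paste into a shell.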

Translate a CSDGM file to a YAML file in the format used by the collector

Usage: python -m metadata_collector translate [OPTIONS] SOURCE_FILE

  Translates a xml file with metadata in format CSDGM to a YAML file with
  the format used by the collector.

Options:
  --target_file FILENAME  Output file. By default it will be metadata.yaml in
                          the same directory of the source file

  --schema_file FILENAME  Schema file  [default:
                          conf/metadata.jsonschema.json]

  --help                  Show this message and exit.
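The default for --target_file can be expressed in a single line of pathlib. The function below is a sketch of the documented behaviour (metadata.yaml next to the source file), not a copy of the project's code, and the name default_target is hypothetical.

```python
from pathlib import Path


def default_target(source_file):
    """Return the path of metadata.yaml in the source file's directory."""
    return Path(source_file).with_name("metadata.yaml")
```

So a source file in some folder maps to metadata.yaml in that same folder, and an existing metadata.yaml there would be the file at risk of being overwritten.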

Development environment

You will need virtualenv and Python 3.8+. On Ubuntu you can install them using:

sudo apt install python3 virtualenv

Then, you can create a virtual environment, activate it, and install the dependencies:

virtualenv -p python3 .venv
source .venv/bin/activate
pip install -r requirements.txt

This project uses pre-commit hooks, so you need to run:

pre-commit install

the first time you clone the project.

For testing, we use MinIO. Make sure to start the server, pointing to the appropriate folder, before running the tests. For convenience, you can add a minio binary to the virtual environment:

cd .venv/bin
wget https://dl.min.io/server/minio/release/linux-amd64/minio
chmod +x minio

Then, from the project root, you can start the server using:

minio server a_test_folder >logs/minio.log 2>&1 &

In MinIO, the folders inside the test folder will be used as buckets.
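A test folder can therefore be prepared before the server starts by creating one sub-folder per bucket. The sketch below assumes the folder-backed behaviour described above; the helper name create_test_buckets is hypothetical and the bucket names are left to the caller. Hidden folders are skipped when listing because MinIO keeps its own state in a dot-folder inside the data directory.

```python
from pathlib import Path


def create_test_buckets(root, bucket_names):
    """Create one sub-folder per bucket and return the visible bucket names.

    Hypothetical helper for preparing a MinIO test folder.
    """
    root = Path(root)
    for name in bucket_names:
        (root / name).mkdir(parents=True, exist_ok=True)
    # Hidden folders hold server state, not buckets, so leave them out.
    return sorted(
        p.name for p in root.iterdir()
        if p.is_dir() and not p.name.startswith(".")
    )
```

Calling it with a_test_folder and the bucket names your tests expect gives the layout that the minio server command above will serve.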

When you activate the environment again in a new session, you may get a message indicating that GDAL_DATA is not set. To avoid this, add the following line to the .venv/bin/activate script:

export GDAL_DATA=$VIRTUAL_ENV/lib/python3.8/site-packages/fiona/gdal_data/
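The path above hard-codes python3.8, so it needs adjusting for other interpreter versions. As a sketch, the directory can instead be derived from the active interpreter with the standard sysconfig module. This assumes, as the export line does, that the installed Fiona package ships a gdal_data folder; the function name gdal_data_dir is hypothetical.

```python
import sysconfig
from pathlib import Path


def gdal_data_dir():
    """Return the folder where Fiona would keep its bundled GDAL data.

    Assumption: Fiona is installed in the active environment's
    site-packages and bundles gdal_data, as the export line suggests.
    """
    # "platlib" is the site-packages directory for compiled packages.
    return Path(sysconfig.get_paths()["platlib"]) / "fiona" / "gdal_data"


if __name__ == "__main__":
    # Print a line that can be pasted into .venv/bin/activate.
    print(f"export GDAL_DATA={gdal_data_dir()}/")
```

Run it with the virtual environment activated so that the printed path points inside .venv; check that the folder exists before relying on it.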

Development standards

Code must be documented. We use click to define the CLIs. We use black as the formatter (the pre-commit hook will take care of it if you followed the previous instructions).

We use bandit to detect security problems. In the Travis configuration, bandit results are informative only (they don't mark the build as failed).

The main branch on GitHub is intended to hold code common to the different data collection projects. You should either fork the repository or create a new branch to add code or documentation specific to one project.
