Commit 98eb848

updates to

toddbruner committed Oct 31, 2024
1 parent b1c585a
Showing 398 changed files with 58,558 additions and 1 deletion.
36 changes: 36 additions & 0 deletions Dockerfile
@@ -0,0 +1,36 @@
FROM python:3.11-bullseye

WORKDIR /code

# Add necessary groups and users
RUN groupadd -g 3000 scot4api
RUN useradd -M -r -u 3000 -g 3000 -s /bin/bash scot4api

RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y \
        curl \
        mariadb-client \
        python3-dev \
        default-libmysqlclient-dev \
        build-essential \
        libxml2-dev \
        libxslt-dev

# Create the default file storage directories
RUN mkdir -p /var/scot_files/_deleted_items
RUN chown -R scot4api /var/scot_files

# Copy over the required files
COPY requirements.txt /code/requirements.txt
COPY ./src/app /code/app

# Install requirements and upgrade pip
RUN pip install --upgrade pip && pip install -r requirements.txt

# Set deployment user and give correct permissions
RUN chown -R scot4api /code
USER scot4api

# Start option
CMD ["uvicorn", "app.main:app", "--host", "127.0.0.1", "--port", "8000"]
37 changes: 37 additions & 0 deletions Dockerfile-util
@@ -0,0 +1,37 @@
FROM python:3.11-bullseye

WORKDIR /code

# Add necessary groups and users
RUN groupadd -g 3000 scot4api
RUN useradd -M -r -u 3000 -g 3000 -s /bin/bash scot4api

RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y \
        curl \
        mariadb-client \
        python3-dev \
        default-libmysqlclient-dev \
        build-essential \
        libxml2-dev \
        libxslt-dev

# Create the default file storage directories
RUN mkdir -p /var/scot_files/_deleted_items
RUN chown -R scot4api /var/scot_files

# Copy over the required files
COPY requirements.txt /code/requirements.txt
COPY requirements-test.txt /code/requirements-test.txt

COPY ./src/app /code/app
COPY ./tests /code/tests
COPY ./conversion /code/conversion

# Install requirements and upgrade pip
RUN pip install --upgrade pip && pip install -r requirements-test.txt

# Set deployment user and give correct permissions
RUN chown -R scot4api /code
USER scot4api
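
# Build sketch for this utility image (tag and test command are assumptions;
# requirements-test.txt is expected to install pytest):
#   docker build -f Dockerfile-util -t scot4api-util .
#   docker run --rm scot4api-util pytest tests/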
23 changes: 23 additions & 0 deletions LICENSE
@@ -0,0 +1,23 @@
Copyright (2024) Sandia Corporation. Under the terms of Contract DE-AC04-94AL85000, there is a non-exclusive license for use of this work by or on behalf of the U.S. Government. Export of this program may require a license from the United States Government.

NOTICE:

For five (5) years from 09/01/2024, the United States Government is granted for itself and others acting on its behalf a paid-up, nonexclusive, irrevocable worldwide license in this data to reproduce, prepare derivative works, and perform publicly and display publicly, by or on behalf of the Government. There is provision for the possible extension of the term of this license. Subsequent to that period or any extension granted, the United States Government is granted for itself and others acting on its behalf a paid-up, nonexclusive, irrevocable worldwide license in this data to reproduce, prepare derivative works, distribute copies to the public, perform publicly and display publicly, and to permit others to do so. The specific term of the license can be identified by inquiry made to Sandia Corporation or DOE.

NEITHER THE UNITED STATES GOVERNMENT, NOR THE UNITED STATES DEPARTMENT OF ENERGY, NOR SANDIA CORPORATION, NOR ANY OF THEIR EMPLOYEES, MAKES ANY WARRANTY, EXPRESS OR IMPLIED, OR ASSUMES ANY LEGAL RESPONSIBILITY FOR THE ACCURACY, COMPLETENESS, OR USEFULNESS OF ANY INFORMATION, APPARATUS, PRODUCT, OR PROCESS DISCLOSED, OR REPRESENTS THAT ITS USE WOULD NOT INFRINGE PRIVATELY OWNED RIGHTS.

Any licensee of this software has the obligation and responsibility to abide by the applicable export control laws, regulations, and general prohibitions relating to the export of technical data. Failure to obtain an export control license or other authority from the Government may result in criminal liability under U.S. laws.

Copyright [2024] Sandia Corporation.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
86 changes: 85 additions & 1 deletion README.md
@@ -1 +1,85 @@
Placeholder
# SCOT4 API

## Build and Deploy

**To build images for testing**: push code to the default branch of this repository. The CI pipeline will create a new image and push it to the [unsorted container registry](https://baltig.sandia.gov/scot/scot4/SCOT-API/container_registry/313) with a tag matching the short SHA of the commit.

**To build an image for quality**: [identify the pipeline created by your latest push](https://baltig.sandia.gov/scot/scot4/SCOT-API/-/pipelines). Click the play button on the "Tag Qual Image" job. This will push the image to the [quality container registry](https://baltig.sandia.gov/scot/scot4/SCOT-API/container_registry/326) with the tag `latest`.

**To build an image for production**: [create a new release for this project](https://baltig.sandia.gov/scot/scot4/SCOT-API/-/releases). When selecting a tag, choose a new tag name that follows a valid semantic versioning scheme (MAJOR.MINOR.PATCH), for instance 1.0.17. Make sure that this version is greater than any previous release. **Note**: only a maintainer of this repository may create a release tag for the default branch. Once the image is created, it will be placed in [the production container registry](https://baltig.sandia.gov/scot/scot4/SCOT-API/container_registry/340) with a tag name matching the git tag, and it will also overwrite `latest`.

On tag validity: a pipeline job verifies that the tag is a valid semantic version string and is greater than any previous version. This script lives in the [SCOT4 Pipeline Support Repo](https://baltig.sandia.gov/scot/scot4/pipeline-support/-/blob/main/scripts/tag_validate.py?ref_type=heads) and is bundled into a container image for use in pipelines. It uses the GitLab release API to check all of the repo's releases and the git tags associated with them.

#### Initial Setup

Create a `.env` file:

```shell
touch src/.env
```

The file needs to contain these keys:
```
# PROD or DEV
ENV=
SECRET_KEY=
SQLALCHEMY_DATABASE_URI=sqlite:///../scot4-test.db
```
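
For local development, a filled-in example might look like this (placeholder values; a SECRET_KEY can be generated with, e.g., `openssl rand -hex 32`):
```
ENV=DEV
SECRET_KEY=replace-with-a-long-random-hex-string
SQLALCHEMY_DATABASE_URI=sqlite:///../scot4-test.db
```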

Note that `main.py` is called from the top-level directory:
```shell
export PYTHONPATH=$PWD/src
python src/app/main.py
```

#### Running

Using `main.py`:
```shell
python src/app/main.py
```

**OR**
Using uvicorn:
```shell
export PYTHONPATH=$PWD/src
cd src/app
uvicorn main:app --host=127.0.0.1 --port=8080 --reload
```


#### Running Tests
Now in parallel!
- With `-n auto`, pytest-xdist will use as many processes as your computer has physical CPU cores.
- `--dist loadfile`: Tests are grouped by their containing file. Groups are distributed to available workers as whole units. This guarantees that all tests in a file run in the same worker.
- Make sure the SQLite database is in memory; otherwise tests can crash.

```shell
export PYTHONPATH=$PWD/src:$PWD/tests
export SQLALCHEMY_DATABASE_URI="sqlite://"
export ENV=TEST
pytest -n auto --dist loadfile tests/
```

To run pytest normally:
```shell
export PYTHONPATH=$PWD/src:$PWD/tests
pytest tests/
```

#### What needs to be done/thought about

Roles & Permissions
* Administrator - Full Access
* Incident Commander - View/Edit Events, Alerts
* Observer - View Events

Schemas need:
* PositiveInt
* EmailStr
* AnyUrl
* None
* Fix for

DB models
* need to be pluralized
48 changes: 48 additions & 0 deletions conversion/conversion.md
@@ -0,0 +1,48 @@
# SCOT3 to SCOT4 Conversion Utilities

This directory contains scripts for migrating data from version 3 of SCOT to version 4. They are grouped into three categories: database migrations, file migrations, and extra (optional) migrations. Three bash shell scripts have been provided for you to run the applicable migrations in each category.

## Database Conversion
This set of scripts migrates the core database data to the SCOT4 database by pulling data directly from the SCOT3 mongodb database. Almost all SCOT3 installations migrating to SCOT4 will want to do this. The `database_conversion.sh` script will run all of the necessary scripts for you.

The following environment variables should be set when running `database_conversion.sh`:
- MONGO_DB_URI - the URI used to connect to the SCOT3 mongodb database
- MONGO_DB_NAME - the name of the SCOT3 mongodb database
- SQL_URI - the URI used to connect to the SCOT4 SQL database
- SQL_PW - the password used to connect to the SCOT4 SQL database
- SCOT_MIGRATION_STAGING_DIRECTORY (optional) - the directory used to stage intermediate files for the conversion (created if it does not exist; defaults to /data/scot4_migration_sync/)
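
As a sketch, an invocation might look like the following (all connection strings and paths here are placeholders for your own environment):

```shell
export MONGO_DB_URI="mongodb://scot3-db.example.com:27017"
export MONGO_DB_NAME="scot-prod"
export SQL_URI="scot4@scot4-db.example.com"
export SQL_PW="your-sql-password"
export SCOT_MIGRATION_STAGING_DIRECTORY="/data/scot4_migration_sync/"
./database_conversion.sh
```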

## Extra Migrations
This set of scripts contains useful entries that are not necessarily required, but which will ease the transition from SCOT3 to SCOT4.

### Signature Migrations
One of the primary ways that SCOT4 differs from SCOT3 is that guides must be linked to alerts by way of a signature. In SCOT4, signatures more explicitly represent the rules that generate alerts, so guides are linked to specific signatures, and those signatures are then linked to alerts when they fire.

Because of this, in order for guides for new and past alerts to be linked properly, each must be linked to a signature. The script `guide_sigs_link.py` will attempt to link guides to signatures by name or create a new signature for a guide if there isn't already a signature with the same name. Likewise, `link_alerts_signatures.py` will attempt to link all existing alerts with a signature as if those alerts had just been generated. `signature_permissions.py` will also fix permissions on existing signatures (since signatures didn't have permissions in SCOT3).

If you would like to perform all of these extra signature migration steps, run the `signature_conversion.sh` script.

The following environment variables should be set when running `signature_conversion.sh`:
- SQLALCHEMY_DATABASE_URI - set to the SCOT4 database URI, as if running the SCOT4 API
- PYTHONPATH - set to include the src/ directory of the SCOT4 API (the scripts borrow code from the API to run)
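
For example (the database URI form and paths are illustrative placeholders):

```shell
export SQLALCHEMY_DATABASE_URI="mysql+pymysql://scot4:password@scot4-db.example.com/scot4"
export PYTHONPATH="/path/to/scot4-api/src"
./signature_conversion.sh
```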

### Admin Migration
By default, the SCOT4 migration creates a user named `scot-admin` to be the initial superuser for SCOT. You can give this user a password and an API key by setting the `SCOT_ADMIN_PASSWORD` and/or `SCOT_ADMIN_APIKEY` environment variables respectively, then running the `extra_migration/update_admin_password_and_api_key.py` script.

The following environment variables should be set when running `update_admin_password_and_api_key.py`:
- SQLALCHEMY_DATABASE_URI - set to the SCOT4 database URI, as if running the SCOT4 API
- PYTHONPATH - set to include the src/ directory of the SCOT4 API (the script borrows code from the API to run)
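
A sketch (all values are placeholders):

```shell
export SQLALCHEMY_DATABASE_URI="mysql+pymysql://scot4:password@scot4-db.example.com/scot4"
export PYTHONPATH="/path/to/scot4-api/src"
export SCOT_ADMIN_PASSWORD="choose-a-strong-password"
export SCOT_ADMIN_APIKEY="choose-a-random-api-key"
python3 extra_migration/update_admin_password_and_api_key.py
```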

## File Conversion
Finally, if you uploaded files to SCOT3 and wish to migrate them to SCOT4, they must be migrated separately. This also applies to cached images in entries that were downloaded and subsequently hosted through SCOT. These scripts upload the files and cached images to the SCOT4 file store and also rewrite existing entries to point to the new files.

Before files and images can be migrated, **you must configure a file storage mechanism on the SCOT4 instance**. This usually means that you must set up the API and frontend, and configure a file storage option through the admin panel on the frontend. Once you have done this, you can run `file_conversion.sh` to migrate both files and cached images from SCOT3.

The following environment variables should be set when running `file_conversion.sh`:
- MONGO_DB_URI - the URI of the SCOT3 mongodb database
- MONGO_DB_NAME - the name of the SCOT3 mongodb database
- SCOT4_URI - the base URI of the SCOT4 installation (e.g. https://scot4.example.com)
- SCOT_ADMIN_APIKEY - a SCOT4 API key with admin privileges (see above for one way to create one)
- SCOT3_FILE_PREFIX (needed for file migration) - the directory under which files were stored in the SCOT3 database; defaults to the SCOT3 default of `/opt/scotfiles/`
- SCOT_FILES_DIR (needed for file migration) - the directory on the current machine in which the old SCOT3 files are stored (with the same file structure that the SCOT3 installation had)
- SCOT_CACHED_IMAGES_DIR (needed for cached images migration) - the directory on the current machine that contains the SCOT3 cached images in their original file structure (this is usually the /cached_images/ directory in the SCOT3 files)
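
A sketch of a full file-and-image migration (all values are placeholders):

```shell
export MONGO_DB_URI="mongodb://scot3-db.example.com:27017"
export MONGO_DB_NAME="scot-prod"
export SCOT4_URI="https://scot4.example.com"
export SCOT_ADMIN_APIKEY="your-admin-api-key"
export SCOT3_FILE_PREFIX="/opt/scotfiles/"
export SCOT_FILES_DIR="/data/scot3_files"
export SCOT_CACHED_IMAGES_DIR="/data/scot3_files/cached_images"
./file_conversion.sh
```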
17 changes: 17 additions & 0 deletions conversion/database_conversion.sh
@@ -0,0 +1,17 @@
#! /bin/bash
set -e

CONVERSION_DIR=$(dirname "$0")
SCOT_MIGRATION_STAGING_DIRECTORY="${SCOT_MIGRATION_STAGING_DIRECTORY:-/data/scot4_migration_sync/}"

# Set up dirs
mkdir -p "$SCOT_MIGRATION_STAGING_DIRECTORY/conversion_staging"

# Create all TSVs from mongo data
cd "$CONVERSION_DIR/database_migration"
python3 ./scot3_scot4_mongo_tsv_export.py

# Tear down DB and import TSVs
mysqlsh "$SQL_URI" --password="$SQL_PW" --file ./initial_scot4_database.sql
mysqlsh "$SQL_URI" --password="$SQL_PW" --file ./scot3_scot4_tsv_import.py
mysqlsh "$SQL_URI" --password="$SQL_PW" --file ./fix_parent_entry_ids.sql
66 changes: 66 additions & 0 deletions conversion/database_migration/bulk_alert_data.py
@@ -0,0 +1,66 @@
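"""Export SCOT3 alert data as TSVs for bulk import into SCOT4.

Pass 1 assigns an id to each (schema key, alertgroup) pair and records it in
alertgroup_schema_keys.csv; pass 2 writes one row per alert cell (raw value,
flaired value, schema key id, alert id) to alert_data.csv.
"""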
import os
import csv
import json

import tqdm


def main(mongo_db=None):
    schema_id_map = {}
    staging_directory = os.getenv('SCOT_MIGRATION_STAGING_DIRECTORY')
    scot3_alertgroup_count = mongo_db.alertgroup.count_documents({})
    scot3_alertgroups = mongo_db.alertgroup.find()
    _id = 1
    with open(f'{staging_directory}/alertgroup_schema_keys.csv', 'w+') as alertgroup_schema_keys_csv:
        writer = csv.writer(alertgroup_schema_keys_csv, dialect='unix', delimiter="\t", quotechar="'")
        writer.writerow(['schema_key_name', 'alertgroup_id', 'schema_key_order', 'schema_key_id'])
        with tqdm.tqdm(total=scot3_alertgroup_count) as pbar:
            for alertgroup in scot3_alertgroups:
                alerts = mongo_db.alert.find({'alertgroup': alertgroup['id']})
                # Collect the union of (lowercased) column names across all alerts in this alertgroup
                schema_keys = set()
                for alert in alerts:
                    schema_keys.update([k.lower() for k in alert['data'].keys()])
                new_schema_keys = [[x, alertgroup['id'], c] for c, x in enumerate(schema_keys)
                                   if x not in ('_raw', 'columns', 'search')]
                for schema_key_iter in new_schema_keys:
                    schema_key_iter.append(_id)
                    schema_key_name = schema_key_iter[0]
                    alertgroup_id = schema_key_iter[1]
                    _key = f"{schema_key_name}-{alertgroup_id}"
                    schema_id_map[_key] = _id
                    writer.writerow(schema_key_iter)
                    _id += 1
                pbar.update(1)
    scot3_alerts = mongo_db.alert.find()
    scot3_alert_count = mongo_db.alert.count_documents({})
    # initialize csv file
    with tqdm.tqdm(total=scot3_alert_count) as pbar:
        with open(f'{staging_directory}/alert_data.csv', 'w+') as alert_data_csv:
            writer = csv.writer(alert_data_csv, dialect='unix', delimiter="\t", quotechar="'")
            for alert in scot3_alerts:
                alert_datas = transform_alert(alert=alert, schema_id_map=schema_id_map)
                writer.writerows(alert_datas)
                pbar.update(1)


def transform_alert(alert=None, schema_id_map=None):
    alert_datas = []
    # First transform the alert['data'] and alert['data_with_flair'] dictionaries to only
    # have lowercase keys; this eliminates duplicate keys
    alert['data'] = {k.lower(): v for k, v in alert['data'].items()}
    alert['data_with_flair'] = {k.lower(): v for k, v in alert['data_with_flair'].items()}
    alertgroup_id = alert['alertgroup']
    unique_keys = set(list(alert['data'].keys()) + list(alert['data_with_flair'].keys()))
    for k in unique_keys:
        if k in ('columns', 'search', '_raw'):
            # We don't care about these columns because they should not show up in an alertgroup table
            continue
        # Get the schema key id from the map we created beforehand
        schema_id = schema_id_map.get(f"{k}-{alertgroup_id}")
        if schema_id is None:
            continue
        data_value = json.dumps(alert['data'].get(k))
        data_value_flaired = json.dumps(alert['data_with_flair'].get(k))
        alert_datas.append([data_value, data_value_flaired, schema_id, alert['id']])
    return alert_datas
36 changes: 36 additions & 0 deletions conversion/database_migration/bulk_alertgroups.py
@@ -0,0 +1,36 @@
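"""Export SCOT3 alertgroups, plus their permissions and tag/source links, as TSVs for bulk import into SCOT4."""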
import csv
import os
from datetime import datetime
from datetime import timezone

import tqdm

from conversion_utilities import write_permission, write_tag_source_links


def main(mongo_db=None, role_lookup=None, tag_lookup=None, source_lookup=None):
    staging_directory = os.getenv('SCOT_MIGRATION_STAGING_DIRECTORY')
    permission_csv = open(f'{staging_directory}/alertgroup_permissions.csv', 'w+')
    permission_csv_writer = csv.writer(permission_csv, dialect='unix', delimiter='\t', quotechar="'")
    permission_csv_writer.writerow(['role_id', 'target_type', 'target_id', 'permission'])

    links_csv = open(f'{staging_directory}/alertgroup_links.csv', 'w+')
    link_csv_writer = csv.writer(links_csv, dialect='unix', delimiter='\t', quotechar="'")
    link_csv_writer.writerow(['v0_type', 'v0_id', 'v1_type', 'v1_id'])
    scot3_alertgroup_count = mongo_db.alertgroup.count_documents({})
    scot3_alertgroups = mongo_db.alertgroup.find()
    with open(f'{staging_directory}/alertgroups.csv', 'w+') as alertgroups_csv:
        writer = csv.writer(alertgroups_csv, dialect='unix', delimiter="\t", quotechar="'")
        writer.writerow(['alertgroup_id', 'tlp', 'subject', 'created_date', 'modified_date', 'view_count'])
        with tqdm.tqdm(total=scot3_alertgroup_count) as pbar:
            for alertgroup in scot3_alertgroups:
                view_count = alertgroup.get('views')
                if view_count is None:
                    view_count = 0
                # Timestamps are stored as epoch seconds; convert to naive UTC datetimes
                created = datetime.fromtimestamp(alertgroup['created']).astimezone(timezone.utc).replace(tzinfo=None)
                modified = datetime.fromtimestamp(alertgroup['updated']).astimezone(timezone.utc).replace(tzinfo=None)
                writer.writerow([alertgroup['id'], 'unset', alertgroup['subject'], created, modified, view_count])
                write_permission(thing=alertgroup, thing_type='alertgroup', role_lookup=role_lookup,
                                 permission_csv_writer=permission_csv_writer)
                write_tag_source_links(thing=alertgroup, thing_type='alertgroup', tag_lookup=tag_lookup,
                                       source_lookup=source_lookup, link_csv_writer=link_csv_writer)
                pbar.update(1)

    permission_csv.close()
    links_csv.close()


if __name__ == "__main__":
    main()