Skip to content

Commit

Permalink
2.6.2 Security features (#1737)
Browse files Browse the repository at this point in the history
### Feature or Bugfix
- Security

### Detail

### 🔐 Security
* Update sanitization technique for terms filtering by @noah-paige in
#1692 and in
#1693
* Move access logging to a separate environment logging bucket by
@noah-paige in #1695
* Add explicit token duration config for both JWTs by @noah-paige in
#1698
* Disable GraphQL introspection if prod sizing by @noah-paige in
#1704
* Add snyk workflow on schedule by @noah-paige in
#1705,
#1708,
#1713,
#1745 and in in
#1746
* Unify Logger Config for Tasks by @noah-paige in
#1709
* Updating overly permissive policies tagged by checkov for environment
role using least privilege principles by @mourya-33 in
#1632

Data.all permission model has been reviewed to ensure all Mutations and
Queries have proper permissions:
* Add MANAGE_SHARES permissions by @dlpzx in
#1702
* Add permission check - is tenant to update SSM parameters API by
@dlpzx in #1714
* Add GET_SHARE_OBJECT permissions to get data filters API by @dlpzx in
#1717
* Add permissions on list datasets for env group + cosmetic S3 Datasets
by @dlpzx in #1718
* Add GET_WORKSHEET permission in RUN_SQL_QUERY by @dlpzx in
#1716
* Add permissions to Quicksight monitoring service layer by @dlpzx in
#1715
* Add LIST_ENVIRONMENT_DATASETS permission for listing shared datasets
and cleanup unused code by @dlpzx in
#1719
* Add is_owner permissions to Glossary mutations + add new integration
tests by @dlpzx in #1721
* Refactor env permissions + modify getTrustAccount by @dlpzx in
#1712
* Add Feed consistent permissions by @dlpzx in
#1722
* Add Votes consistent permissions by @dlpzx in
#1724
* Consistent get_<DATA_ASSET> permissions - Dashboards by @dlpzx in
#1729


### 🧪 Test improvements
Integration tests are in sync with `main` without 2.7 planned features.
In this PR all core modules, optional modules and submodules are tested.
That includes: tenant-permissions, omics, mlstudio, votes, notifications
and backwards compatiblity of s3 shares. by @SofiaSazonova, @noah-paige
, @petrkalos and @dlpzx


In addition, the following PR adds functional tests that ensure the
permission model of data.all is not corrupted.
* ⭐ Add resource permission checks by @petrkalos in
#1711


### Dependencies
* Update FastAPI by @petrkalos in #1577 
* update fastapi dependency by @noah-paige in
#1699
* Upgrade "cross-spawn" to "7.0.5" by @dlpzx in
#1701
* Bump python runtime to bump cdk klayers cryptography version by
@noah-paige in #1707


### Relates
- List above

### Security
Please answer the questions below briefly where applicable, or write
`N/A`. Based on
[OWASP 10](https://owasp.org/Top10/en/).

- Does this PR introduce or modify any input fields or queries - this
includes
fetching data from storage outside the application (e.g. a database, an
S3 bucket)?
  - Is the input sanitized?
- What precautions are you taking before deserializing the data you
consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no `eval` or similar functions are used?
- Does this PR introduce any functionality or component that requires
authorization?
- How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features?
  - Do you use a standard proven implementations?
  - Are the used keys controlled by the customer? Where are they stored?
- Are you introducing any new policies/roles/users?
  - Have you used the least-privilege principle? How?


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

---------

Co-authored-by: mourya-33 <[email protected]>
Co-authored-by: Mourya Darivemula <[email protected]>
Co-authored-by: Noah Paige <[email protected]>
Co-authored-by: Petros Kalos <[email protected]>
Co-authored-by: Sofia Sazonova <[email protected]>
Co-authored-by: Sofia Sazonova <[email protected]>
  • Loading branch information
7 people authored Jan 15, 2025
1 parent 99dd5bb commit 749ffc3
Show file tree
Hide file tree
Showing 244 changed files with 9,412 additions and 13,591 deletions.
71 changes: 25 additions & 46 deletions .checkov.baseline
Original file line number Diff line number Diff line change
Expand Up @@ -417,7 +417,7 @@
]
},
{
"file": "/cdk.out/asset.3045cb6b4340be1e173df6dcf6248d565aa849ceda3e2cf2c2f221ccee4bc1d6/pivotRole.yaml",
"file": "/cdk.out/asset.05d71d8b69cd4483d3c9db9120b556b718c72f349debbb79d461c74c4964b350/pivotRole.yaml",
"findings": [
{
"resource": "AWS::IAM::ManagedPolicy.PivotRolePolicy0",
Expand Down Expand Up @@ -490,12 +490,6 @@
{
"file": "/checkov_environment_synth.json",
"findings": [
{
"resource": "AWS::IAM::ManagedPolicy.dataallanothergroup111111servicespolicy19AC37181",
"check_ids": [
"CKV_AWS_111"
]
},
{
"resource": "AWS::IAM::ManagedPolicy.dataallanothergroup111111servicespolicy2E85AF510",
"check_ids": [
Expand All @@ -508,24 +502,6 @@
"CKV_AWS_111"
]
},
{
"resource": "AWS::IAM::ManagedPolicy.dataallanothergroup111111servicespolicy5A19E75CA",
"check_ids": [
"CKV_AWS_109"
]
},
{
"resource": "AWS::IAM::ManagedPolicy.dataallanothergroup111111servicespolicyCC720210",
"check_ids": [
"CKV_AWS_109"
]
},
{
"resource": "AWS::IAM::ManagedPolicy.dataalltestadmins111111servicespolicy1A0C96958",
"check_ids": [
"CKV_AWS_111"
]
},
{
"resource": "AWS::IAM::ManagedPolicy.dataalltestadmins111111servicespolicy2B12D381A",
"check_ids": [
Expand All @@ -538,18 +514,6 @@
"CKV_AWS_111"
]
},
{
"resource": "AWS::IAM::ManagedPolicy.dataalltestadmins111111servicespolicy3E3CBA9E",
"check_ids": [
"CKV_AWS_109"
]
},
{
"resource": "AWS::IAM::ManagedPolicy.dataalltestadmins111111servicespolicy56D7DC525",
"check_ids": [
"CKV_AWS_109"
]
},
{
"resource": "AWS::Lambda::Function.CustomCDKBucketDeployment8693BB64968944B69AAFB0CC9EB8756C81C01536",
"check_ids": [
Expand All @@ -563,38 +527,34 @@
"resource": "AWS::Lambda::Function.GlueDatabaseLFCustomResourceHandler7FAF0F82",
"check_ids": [
"CKV_AWS_115",
"CKV_AWS_117",
"CKV_AWS_173"
"CKV_AWS_117"
]
},
{
"resource": "AWS::Lambda::Function.LakeformationDefaultSettingsHandler2CBEDB06",
"check_ids": [
"CKV_AWS_115",
"CKV_AWS_117",
"CKV_AWS_173"
"CKV_AWS_117"
]
},
{
"resource": "AWS::Lambda::Function.dataallGlueDbCustomResourceProviderframeworkonEventF8347BA7",
"check_ids": [
"CKV_AWS_115",
"CKV_AWS_116",
"CKV_AWS_117",
"CKV_AWS_173"
"CKV_AWS_117"
]
},
{
"resource": "AWS::Lambda::Function.dataallLakeformationDefaultSettingsProviderframeworkonEventBB660E32",
"check_ids": [
"CKV_AWS_115",
"CKV_AWS_116",
"CKV_AWS_117",
"CKV_AWS_173"
"CKV_AWS_117"
]
},
{
"resource": "AWS::S3::Bucket.EnvironmentDefaultBucket78C3A8B0",
"resource": "AWS::S3::Bucket.EnvironmentDefaultLogBucket7F0EFAB3",
"check_ids": [
"CKV_AWS_18"
]
Expand Down Expand Up @@ -653,6 +613,25 @@
}
]
},
{
"file": "/checkov_pipeline_synth.json",
"findings": [
{
"resource": "AWS::IAM::Role.PipelineRoleDCFDBB91",
"check_ids": [
"CKV_AWS_107",
"CKV_AWS_108",
"CKV_AWS_111"
]
},
{
"resource": "AWS::S3::Bucket.thistableartifactsbucketDB1C8C64",
"check_ids": [
"CKV_AWS_18"
]
}
]
},
{
"file": "/frontend/docker/prod/Dockerfile",
"findings": [
Expand Down
31 changes: 31 additions & 0 deletions .github/workflows/snyk.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
name: Snyk

on:
workflow_dispatch:

schedule:
- cron: "0 9 * * 1" # runs each Monday at 9:00 UTC

permissions:
contents: read
security-events: write

jobs:
security:
strategy:
matrix:
python-version: [3.9]
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: snyk/actions/setup@master
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
- name: Install All Requirements
run: make install
- name: Run Snyk to check for vulnerabilities
run: snyk test --all-projects --detection-depth=5 --severity-threshold=high
env:
SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
8 changes: 7 additions & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ venv:
@python3 -m venv "venv"
@/bin/bash -c "source venv/bin/activate"

install: upgrade-pip install-deploy install-backend install-cdkproxy install-tests
install: upgrade-pip install-deploy install-backend install-cdkproxy install-tests install-integration-tests install-custom-auth install-userguide

upgrade-pip:
pip install --upgrade pip setuptools
Expand All @@ -36,6 +36,12 @@ install-tests:
install-integration-tests:
pip install -r tests_new/integration_tests/requirements.txt

install-custom-auth:
pip install -r deploy/custom_resources/custom_authorizer/requirements.txt

install-userguide:
pip install -r documentation/userguide/requirements.txt

lint:
pip install ruff
ruff check --fix
Expand Down
10 changes: 9 additions & 1 deletion backend/api_handler.py
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
from dataall.base.db import get_engine
from dataall.base.loader import load_modules, ImportMode

from graphql.pyutils import did_you_mean

logger = logging.getLogger()
logger.setLevel(os.environ.get('LOG_LEVEL', 'INFO'))
Expand All @@ -32,6 +33,11 @@
for name in ['boto3', 's3transfer', 'botocore', 'boto']:
logging.getLogger(name).setLevel(logging.ERROR)

ALLOW_INTROSPECTION = True if os.getenv('ALLOW_INTROSPECTION') == 'True' else False

if not ALLOW_INTROSPECTION:
did_you_mean.__globals__['MAX_LENGTH'] = 0

load_modules(modes={ImportMode.API})
SCHEMA = bootstrap_schema()
TYPE_DEFS = gql(SCHEMA.gql(with_directives=False))
Expand Down Expand Up @@ -137,7 +143,9 @@ def handler(event, context):
else:
raise Exception(f'Could not initialize user context from event {event}')

success, response = graphql_sync(schema=executable_schema, data=query, context_value=app_context)
success, response = graphql_sync(
schema=executable_schema, data=query, context_value=app_context, introspection=ALLOW_INTROSPECTION
)

dispose_context()
response = json.dumps(response)
Expand Down
11 changes: 11 additions & 0 deletions backend/dataall/__init__.py
Original file line number Diff line number Diff line change
@@ -1,2 +1,13 @@
from . import core, version
from .base import utils, db, api
import logging
import os
import sys

logging.basicConfig(
level=os.environ.get('LOG_LEVEL', 'INFO'),
handlers=[logging.StreamHandler(sys.stdout)],
format='[%(levelname)s] %(message)s',
)
for name in ['boto3', 's3transfer', 'botocore', 'boto', 'urllib3']:
logging.getLogger(name).setLevel(logging.ERROR)
15 changes: 5 additions & 10 deletions backend/dataall/base/cdkproxy/requirements.txt
Original file line number Diff line number Diff line change
@@ -1,17 +1,12 @@
aws-cdk-lib==2.99.0
boto3==1.28.23
boto3-stubs==1.28.23
botocore==1.31.23
aws-cdk-lib==2.160.0
boto3==1.35.26
boto3-stubs==1.35.26
cdk-nag==2.7.2
constructs==10.0.73
starlette==0.36.3
fastapi == 0.109.2
Flask==2.3.2
fastapi == 0.115.5
PyYAML==6.0
requests==2.32.2
tabulate==0.8.9
uvicorn==0.15.0
werkzeug==3.0.3
constructs>=10.0.0,<11.0.0
werkzeug==3.0.6
git-remote-codecommit==1.16
aws-ddk-core==1.3.0
2 changes: 1 addition & 1 deletion backend/dataall/base/context.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@
that in the request scope
The class uses Flask's approach to handle request: ThreadLocal
That approach should work fine for AWS Lambdas and local server that uses Flask app
That approach should work fine for AWS Lambdas and local server that uses FastApi app
"""

from dataclasses import dataclass
Expand Down
3 changes: 3 additions & 0 deletions backend/dataall/base/feature_toggle_checker.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,8 @@
Contains decorators that check if a feature has been enabled or not
"""

import functools

from dataall.base.config import config
from dataall.base.utils.decorator_utls import process_func

Expand All @@ -10,6 +12,7 @@ def is_feature_enabled(config_property: str):
def decorator(f):
fn, fn_decorator = process_func(f)

@functools.wraps(fn)
def decorated(*args, **kwargs):
value = config.get_property(config_property)
if not value:
Expand Down
34 changes: 28 additions & 6 deletions backend/dataall/base/utils/naming_convention.py
Original file line number Diff line number Diff line change
@@ -1,28 +1,46 @@
from enum import Enum

import re
from .slugify import slugify


class NamingConventionPattern(Enum):
S3 = {'regex': '[^a-zA-Z0-9-]', 'separator': '-', 'max_length': 63}
S3 = {
'regex': '[^a-zA-Z0-9-]',
'separator': '-',
'max_length': 63,
'valid_external_regex': '(?!(^xn--|.+-s3alias$))^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$',
}
KMS = {'regex': '[^a-zA-Z0-9-]$', 'separator': '-', 'max_length': 63, 'valid_external_regex': '^[a-zA-Z0-9_-]+$'}
IAM = {'regex': '[^a-zA-Z0-9-_]', 'separator': '-', 'max_length': 63} # Role names up to 64 chars
IAM_POLICY = {'regex': '[^a-zA-Z0-9-_]', 'separator': '-', 'max_length': 128} # Policy names up to 128 chars
GLUE = {'regex': '[^a-zA-Z0-9_]', 'separator': '_', 'max_length': 240} # Limit 255 - 15 extra chars buffer
GLUE = {
'regex': '[^a-zA-Z0-9_]',
'separator': '_',
'max_length': 240,
'valid_external_regex': '^[a-zA-Z0-9_]+$',
} # Limit 255 - 15 extra chars buffer
GLUE_ETL = {'regex': '[^a-zA-Z0-9-]', 'separator': '-', 'max_length': 52}
NOTEBOOK = {'regex': '[^a-zA-Z0-9-]', 'separator': '-', 'max_length': 63}
MLSTUDIO_DOMAIN = {'regex': '[^a-zA-Z0-9-]', 'separator': '-', 'max_length': 63}
DEFAULT = {'regex': '[^a-zA-Z0-9-_]', 'separator': '-', 'max_length': 63}
DEFAULT_SEARCH = {'regex': '[^a-zA-Z0-9-_:. ]'}
OPENSEARCH = {'regex': '[^a-z0-9-]', 'separator': '-', 'max_length': 27}
OPENSEARCH_SERVERLESS = {'regex': '[^a-z0-9-]', 'separator': '-', 'max_length': 31}
DATA_FILTERS = {'regex': '[^a-z0-9_]', 'separator': '_', 'max_length': 31}
REDSHIFT_DATASHARE = {
'regex': '[^a-zA-Z0-9_]',
'separator': '_',
'max_length': 1000,
} # Maximum length of 2147483647


class NamingConventionService:
def __init__(
self,
target_label: str,
target_uri: str,
pattern: NamingConventionPattern,
resource_prefix: str,
target_uri: str = '',
resource_prefix: str = '',
):
self.target_label = target_label
self.target_uri = target_uri if target_uri else ''
Expand All @@ -37,4 +55,8 @@ def build_compliant_name(self) -> str:
separator = NamingConventionPattern[self.service].value['separator']
max_length = NamingConventionPattern[self.service].value['max_length']
suffix = f'-{self.target_uri}' if len(self.target_uri) else ''
return f"{slugify(self.resource_prefix + '-' + self.target_label[:(max_length- len(self.resource_prefix + self.target_uri))] + suffix, regex_pattern=fr'{regex}', separator=separator, lowercase=True)}"
return f"{slugify(self.resource_prefix + '-' + self.target_label[:(max_length - len(self.resource_prefix + self.target_uri))] + suffix, regex_pattern=fr'{regex}', separator=separator, lowercase=True)}"

def sanitize(self):
regex = NamingConventionPattern[self.service].value['regex']
return re.sub(regex, '', self.target_label)
1 change: 1 addition & 0 deletions backend/dataall/core/environment/api/queries.py
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@

getTrustAccount = gql.QueryField(
name='getTrustAccount',
args=[gql.Argument(name='organizationUri', type=gql.NonNullableType(gql.String))],
type=gql.String,
resolver=get_trust_account,
test_scope='Environment',
Expand Down
Loading

0 comments on commit 749ffc3

Please sign in to comment.