Manage Access to credentialed-access projects Using Access Point Policies #2293

Open
wants to merge 26 commits into
base: dev

Chrystinne
Contributor

This Pull Request implements the use of AWS S3 Access Point policies to manage access to restricted-access projects stored in S3 buckets.

The main goal of this change is scalability: access can still be managed as the number of projects and users grows.

@Chrystinne Chrystinne assigned Chrystinne and bemoody and unassigned Chrystinne and bemoody Sep 17, 2024
@Chrystinne Chrystinne changed the title Manage Access to Restricted S3 Buckets Using Access Point Policies Manage Access to credentialed-access projects Using Access Point Policies Sep 17, 2024
@bemoody
Collaborator

bemoody commented Sep 18, 2024

class AccessPoint(models.Model):
    aws = models.ForeignKey(AWS, related_name='access_points', on_delete=models.CASCADE)
    name = models.CharField(max_length=255)
    users = models.ManyToManyField('AccessPointUser', related_name='access_points')

    def __str__(self):
        return self.name

class AccessPointUser(models.Model):
    aws_id = models.CharField(max_length=20)  # Assuming AWS ID is a string like '053677451470'

    def __str__(self):
        return self.aws_id

First observation: the name of the class should probably be AWSAccessPoint or S3AccessPoint. Make it clear what this is for.

Second: we want to associate access points with particular Users, not particular AWS principals. Because:

  • If somebody removes the AWS account from their PhysioNet profile, we need to remove them from any access points that they had access to.

  • If they subsequently add a different AWS account to their PhysioNet profile, I think we want to add them back to the same access points they were using previously (so their existing scripts will work with their new AWS credentials.)

We could define users as a ManyToManyField pointing to user.User. But I think it'd be better to do something like:

class AWSAccessPointUser(models.Model):
    access_point = models.ForeignKey(AWSAccessPoint, related_name='users',
                                     on_delete=models.CASCADE)
    user = models.ForeignKey('user.User', related_name='aws_access_points',
                             on_delete=models.CASCADE)

    class Meta:
        unique_together = [('access_point', 'user')]

Which is essentially the same thing as ManyToManyField at the database level, but at the Python level it gives more flexibility for the future.
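To illustrate why membership is keyed on the user rather than on the AWS principal, here is a minimal plain-Python sketch (no Django). The `Profile` and `AccessPointSketch` classes and their fields are hypothetical stand-ins, not the models in this PR. The AWS principal is looked up only when a policy is built, so unlinking or replacing an AWS account changes the resolved principals without changing which access point the user belongs to.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Profile:
    """Hypothetical stand-in for a user and their linked AWS identity."""
    username: str
    aws_principal: Optional[str] = None  # None when no AWS account is linked


@dataclass
class AccessPointSketch:
    """Hypothetical stand-in for an access point; members are usernames."""
    name: str
    members: Set[str] = field(default_factory=set)

    def principals(self, profiles: Dict[str, Profile]) -> List[str]:
        """Resolve members to AWS principals at policy-build time."""
        return sorted(
            profiles[u].aws_principal
            for u in self.members
            if profiles[u].aws_principal is not None
        )


profiles = {"alice": Profile("alice", "111111111111")}
ap = AccessPointSketch("example-ap-01", {"alice"})

profiles["alice"].aws_principal = None            # AWS account unlinked
unlinked = ap.principals(profiles)                # no principals to authorize

profiles["alice"].aws_principal = "222222222222"  # a different account linked
relinked = ap.principals(profiles)                # same access point, new principal
```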

@bemoody
Collaborator

bemoody commented Sep 18, 2024

we want to associate access points with particular Users, not particular AWS principals

Now, I say this, but to be fair, there is still an open question of whether we want to allow one person to use multiple AWS principals. Doing that, however, could be quite messy UX-wise (we'd either have to tell people "use access point X if you're using principal A, use access point Y if you're using principal B"... or else we'd have to tell people that every time they add a new principal they might be bumped to a different AP.)

Anyway, I think if we want to add that feature down the road, we can define new models at that point to support it.

@bemoody
Collaborator

bemoody commented Sep 18, 2024

I don't like having "aws_id" as an argument to AWS.s3_uri(). What I think we want is to define an s3_uri method in the AccessPoint class.

To obtain the S3 URI for a particular authorized user, we might end up doing something like

    s3_uri = None
    if project.aws:
        if project.aws.is_private:
            ap = project.aws.access_points.filter(users__user=user).first()
            if ap:
                s3_uri = ap.s3_uri()
        else:
            s3_uri = project.aws.s3_uri()

If the region and account ID are required as part of the S3 URI for the access point, those things should be stored in the AccessPoint model (they shouldn't be hard-coded or inferred.)
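As a rough sketch of that last point, an access point record could carry its own region and account ID and build the URI from them. The class and field names below are illustrative assumptions, not the schema in this PR. The ARN layout follows the standard S3 access point form, and the AWS CLI accepts an access point ARN where a bucket name would normally go.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPointInfo:
    """Hypothetical record; region and account ID are stored, not inferred."""
    name: str
    region: str
    account_id: str

    def arn(self) -> str:
        # Standard S3 access point ARN layout.
        return f"arn:aws:s3:{self.region}:{self.account_id}:accesspoint/{self.name}"

    def s3_uri(self, prefix: str = "") -> str:
        # The access point ARN takes the place of the bucket name.
        return f"s3://{self.arn()}/{prefix}"


info = AccessPointInfo(name="example-ap", region="us-east-1",
                       account_id="123456789012")
uri = info.s3_uri("demo/1.0/")
```

With values like these, the result could be passed to a command such as `aws s3 sync <uri> <destination>`.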

@bemoody
Collaborator

bemoody commented Sep 20, 2024

The basic concept here, I think, is to automatically grant access to everyone who has a linked AWS account and has permission to access the project.

Still some details need working out, but is that broadly how we want this to work? Or should we instead grant access only to people who request it?

I don't think it makes a huge difference, but doing the latter has some advantages: fewer APs to manage, and we get some feedback about how many people are using the feature.

@bemoody
Collaborator

bemoody commented Sep 27, 2024

  • In upload_project_to_S3, the controlled-access bucket is created if it doesn't exist, but its bucket policy is not set. We need to set the bucket policy.

  • create_controlled_bucket_policy produces a policy that doesn't work; you want "s3:DataAccessPointAccount": settings.AWS_ACCOUNT_ID (not f"s3:DataAccessPointAccount": "{settings.AWS_ACCOUNT_ID}").

  • create_data_access_point_policy needs to restrict the s3:prefix for ListBucket actions.
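The second point is easy to miss, so here is a small standalone sketch of it. `ACCOUNT_ID` is a placeholder standing in for `settings.AWS_ACCOUNT_ID`, and `delegating_bucket_policy` is a hypothetical helper, not the PR's `create_controlled_bucket_policy`. The policy shape follows the usual pattern for delegating bucket access control to access points owned by a given account.

```python
import json

ACCOUNT_ID = "123456789012"  # placeholder for settings.AWS_ACCOUNT_ID


def delegating_bucket_policy(bucket_name: str, account_id: str) -> dict:
    """Bucket policy that defers access decisions to the account's access points."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": "*"},
            "Action": "*",
            "Resource": [
                f"arn:aws:s3:::{bucket_name}",
                f"arn:aws:s3:::{bucket_name}/*",
            ],
            "Condition": {
                # The value is the variable itself, not a quoted placeholder.
                "StringEquals": {"s3:DataAccessPointAccount": account_id},
            },
        }],
    }


# The broken form: the f prefix is on the key, where it has no effect, and
# the value is an ordinary string, so the braces are never interpolated.
broken = {f"s3:DataAccessPointAccount": "{ACCOUNT_ID}"}

policy = delegating_bucket_policy("example-controlled-bucket", ACCOUNT_ID)
policy_json = json.dumps(policy, indent=2)
```

In the broken form the condition compares against the literal text `{ACCOUNT_ID}`, which no real account ID can match.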

…ring projects with a 'RESTRICTED/CREDENTIALED' access policy.
…s being created for the open data bucket or the controlled data bucket. This update also changes how we grant access to users for the controlled-access dataset by using Access Points (APs). It includes creating and listing APs, creating and updating AP policies, and associating AWS users with APs.
… view to use update_data_access_point_policy instead of the old bucket policy update method. Ensure the S3 credentials exist before updating the data access point policy.
…PointUser. Associate access points with specific users instead of AWS principals. Modify the s3_uri() method to retrieve the AWS ID from the cloud information associated with the logged-in user. This information is used to properly display the AWS sync command for the logged-in user.
…ormation to be compatible with the changes made in the AWSAccessPoint and AWSAccessPointUser models.
@Chrystinne Chrystinne force-pushed the restricted-access-s3-bucket branch from 729aa5d to e7a527e Compare November 21, 2024 20:38
@Chrystinne Chrystinne force-pushed the restricted-access-s3-bucket branch from 751b827 to 1dc66d5 Compare December 3, 2024 21:56
@Chrystinne Chrystinne force-pushed the restricted-access-s3-bucket branch from 8f57fe9 to b8eaa82 Compare December 3, 2024 22:41
Collaborator

@bemoody bemoody left a comment

Here's an incomplete list of issues that need addressing:

  • Do not use thread-local storage. Do not use middleware.

  • AP policy does not restrict the s3:prefix. The prefix must be restricted to match the project slug/version.

  • Every time a new user is added, existing users are randomly reassigned to APs. After a given user has been assigned to an AP, they must not be reassigned to a different AP.

  • AP policy needs to use the aws_userid, not the aws_id.
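For the second and fourth points, a possible shape for an access point policy is sketched below. This is an assumption-heavy illustration, not the policy generated by this PR: `access_point_policy` is a hypothetical helper, the prefix is assumed to be `slug/version`, and `aws_userid` is assumed to hold the value AWS exposes through the `aws:userid` condition key. Matching users through a condition rather than through the `Principal` element is likewise a design assumption.

```python
from typing import Iterable


def access_point_policy(ap_arn: str, prefix: str,
                        aws_userids: Iterable[str]) -> dict:
    """Allow listing and reading only under one project prefix."""
    userids = sorted(aws_userids)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListProjectPrefix",
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": "s3:ListBucket",
                "Resource": ap_arn,
                "Condition": {
                    "StringLike": {
                        "aws:userid": userids,
                        # Listing is limited to keys under the project prefix.
                        "s3:prefix": [f"{prefix}/*"],
                    },
                },
            },
            {
                "Sid": "GetProjectObjects",
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": "s3:GetObject",
                # Objects reached through an access point use the /object/ form.
                "Resource": f"{ap_arn}/object/{prefix}/*",
                "Condition": {"StringLike": {"aws:userid": userids}},
            },
        ],
    }


example_arn = "arn:aws:s3:us-east-1:123456789012:accesspoint/example-ap"
example_policy = access_point_policy(
    example_arn, "demo/1.0", ["AIDAEXAMPLE2", "AIDAEXAMPLE1"])
```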

@bemoody
Collaborator

bemoody commented Dec 11, 2024

Other issues that are also important are:

  • Updating only the access-point policy that is being changed, not updating all access-point policies at once.

  • Handling AWS IDs that no longer exist (either by detecting and removing them from the policy, or using a policy condition that doesn't require IDs to be valid.)

  • Refreshing policies periodically (adding/removing/updating people whose authorization status has changed or whose AWS ID has changed.)

  • Enforcing uniqueness of AWSAccessPointUser for (user, aws), probably by adding a foreign key from AWSAccessPointUser to AWS.
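One way to get stable assignment together with single-policy updates is sketched below in plain Python. Everything here is hypothetical: `assign_user`, the naming scheme, and the tiny `MAX_USERS_PER_AP` limit are illustrative and do not reflect real AWS limits or the code in this PR. An already-assigned user is returned unchanged, and the function reports which single access point, if any, needs its policy rewritten.

```python
from typing import Dict, Set, Tuple

MAX_USERS_PER_AP = 2  # tiny illustrative limit, not a real AWS value


def assign_user(assignments: Dict[str, Set[str]], user: str) -> Tuple[str, bool]:
    """Return (access point name, whether that one policy must be rewritten)."""
    # An existing assignment is never changed.
    for ap_name, members in assignments.items():
        if user in members:
            return ap_name, False

    # Otherwise use the first access point that still has room.
    for ap_name, members in assignments.items():
        if len(members) < MAX_USERS_PER_AP:
            members.add(user)
            return ap_name, True

    # All full: create a new access point for this user.
    ap_name = f"example-ap-{len(assignments) + 1:02d}"
    assignments[ap_name] = {user}
    return ap_name, True


assignments: Dict[str, Set[str]] = {}
results = [assign_user(assignments, u) for u in ("u1", "u2", "u3", "u1")]
```

Only the access point named in a result with `True` needs a policy update; the last call, for the already-assigned `u1`, changes nothing.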

…y the specific access point policy being changed, and preventing the reassignment of existing users to different access points when a new user is added. Replacing aws_id with aws_userid in access point policies.
@Chrystinne Chrystinne force-pushed the restricted-access-s3-bucket branch 2 times, most recently from b600c69 to 1721f8a Compare January 14, 2025 21:26
… unavailable, and skip DUA signature check for open projects.
@Chrystinne Chrystinne force-pushed the restricted-access-s3-bucket branch from 1721f8a to b42d709 Compare January 14, 2025 21:38
@Chrystinne Chrystinne force-pushed the restricted-access-s3-bucket branch from ff28a70 to 187fe98 Compare January 21, 2025 17:44
@Chrystinne Chrystinne force-pushed the restricted-access-s3-bucket branch from a5892d1 to c7ee305 Compare January 22, 2025 00:05
@Chrystinne Chrystinne force-pushed the restricted-access-s3-bucket branch from 93b7cc5 to ed5c551 Compare January 22, 2025 21:21
@Chrystinne
Contributor Author

@bemoody This is a list of the changes addressed in this current PR:

  • Rename the access point classes so the names make their purpose clear (AWSAccessPoint, AWSAccessPointUser);

  • Associate access points with particular Users, not particular AWS principals;

  • Update AWSAccessPointUser model;

  • Fix the policy created by create_controlled_bucket_policy:
    Replace f"s3:DataAccessPointAccount": "{settings.AWS_ACCOUNT_ID}" with "s3:DataAccessPointAccount": settings.AWS_ACCOUNT_ID;

  • Remove the use of middleware/thread-local storage;

  • When adding a new user to an access point, ensure other users are not reassigned to different access points;

  • Use the aws_userid, not the aws_id, in access point policies;

  • Update only the access-point policy that is being changed, rather than updating all access-point policies at once;

  • Enforce uniqueness of AWSAccessPointUser for (user, aws) by adding a foreign key;

  • Remove "aws_id" as an argument to AWS.s3_uri(). Define an s3_uri method in the AccessPoint class;

  • Move the method for retrieving the S3 URI for a specific authorized user to the AWSAccessPoint model.

@Chrystinne
Contributor Author

@bemoody The following issues have been created to track the remaining features, which will be implemented as enhancements in new PRs (a new "enhancement" label has been created):

Issue #2330

  • Detect and remove invalid AWS IDs from the policy, or use a policy condition that does not require IDs to be valid;

  • Refresh policies periodically to add, remove, or update users whose authorization status or AWS ID has changed;

  • Remove users from access points when they remove their AWS account from their PhysioNet profile;

  • Reassign users to the same access points if they later add a different AWS account to their PhysioNet profile, ensuring their existing scripts continue to work with new AWS credentials;
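The refresh described in these items could be planned as a set difference, roughly as sketched here. `refresh_plan` and its arguments are hypothetical and are not part of Issue #2330. Assignments are kept even when a user has no linked AWS ID, so a user who links a new account later comes back on the same access point.

```python
from typing import Dict, Optional, Set, Tuple


def refresh_plan(
    assigned: Set[str],
    authorized: Set[str],
    linked_ids: Dict[str, Optional[str]],
    current_policy_ids: Set[str],
) -> Tuple[Set[str], Set[str]]:
    """Return (IDs to add to the policy, IDs to remove from it)."""
    # Only users who are still authorized and have a linked ID belong in
    # the policy; the assignment itself is left untouched.
    desired = {
        linked_ids[u]
        for u in assigned
        if u in authorized and linked_ids.get(u)
    }
    return desired - current_policy_ids, current_policy_ids - desired


to_add, to_remove = refresh_plan(
    assigned={"alice", "bob", "carol"},
    authorized={"alice", "carol"},            # bob lost authorization
    linked_ids={
        "alice": "AIDAEXAMPLEALICE2",         # alice linked a new identity
        "bob": "AIDAEXAMPLEBOB",
        "carol": None,                        # carol unlinked her account
    },
    current_policy_ids={"AIDAEXAMPLEALICE1", "AIDAEXAMPLEBOB"},
)
```

If both sets come back empty, the policy is already up to date and no call to AWS is needed.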

Issue #2332

  • Ensure the region and account ID, required as part of the S3 URI for the access point, are stored in the AccessPoint model rather than being hard-coded or inferred.

@Chrystinne
Contributor Author

@bemoody The PR is now ready for review.

@bemoody
Collaborator

bemoody commented Jan 23, 2025

        if (
            has_s3_credentials()
            and files_sent_to_S3(project) is not None
            and s3_bucket_has_credentialed_users(project)
        ):

This should be:

if has_s3_credentials() and files_sent_to_S3(project):

@bemoody
Collaborator

bemoody commented Jan 23, 2025

          {% if has_s3_credentials and project.aws.sent_files and s3_uri != None %}
            {% if not project.aws.is_private or has_signed_dua %}

Just write

          {% if project.aws.sent_files and s3_uri %}

(or maybe the check for sent_files should be in the view)

@bemoody
Collaborator

bemoody commented Jan 23, 2025

get_access_point_name_for_user_and_project: not used, remove it.
list_access_points: not used, remove it.
update_aws_access_point_policy: not used, remove it.

@bemoody
Collaborator

bemoody commented Jan 23, 2025

    if project.access_policy == AccessPolicy.OPEN:
        update_open_bucket_policy(project, bucket_name)
    else:
        if s3_bucket_has_credentialed_users(project):
            initialize_access_points(project)

Just write:

    if project.access_policy == AccessPolicy.OPEN:
        update_open_bucket_policy(project, bucket_name)
    else:
        initialize_access_points(project)

Then s3_bucket_has_credentialed_users is not needed and can be removed.

@bemoody
Collaborator

bemoody commented Jan 23, 2025

s3_bucket_has_access_point: not used, remove it.
get_access_point_name: not used except by s3_bucket_has_access_point, remove it.
create_first_data_access_point_policy: not used, remove it.
validade_aws_id: not used, remove it.

@Chrystinne
Contributor Author

@bemoody Your last review comments have been addressed.
