This repository has been archived by the owner on May 27, 2024. It is now read-only.

testing latest pull changes from Lyft master #9

Open
wants to merge 336 commits into
base: test_merge_lyft_master

ashwin-borneo

No description provided.

ramonpetgrave64 and others added 30 commits September 15, 2022 15:02
* convert cleanup jobs syntax

* leftovers
* patches ready

* undo driftdetect

* add syntax disabler

* add description for disable_syntax_upgrade

* linter
* fix syntax in tests

* fix some timezone issues in tests
* add option to specify request timeout in pagerduty API session.

* make explicit check against None.

* update config.md file for pagerduty. add parameter in CLI to set PD session timeout.

* update docs/root/modules/pagerduty/config.md

Co-authored-by: Ryan Lane <[email protected]>

Fix exception checking order in function `_is_common_exception` so that `NoSuchBucketPolicy` is checked before `NoSuchBucket`
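
A minimal sketch of why the order matters. It assumes the check is a substring match on the error message, which the commit message does not state; `classify_s3_error` and its return labels are hypothetical names, not cartography's code.

```python
def classify_s3_error(message: str) -> str:
    """Map an S3 error message to a coarse label (hypothetical helper)."""
    # "NoSuchBucket" is a prefix of "NoSuchBucketPolicy", so with substring
    # matching the more specific code has to be tested first.
    if "NoSuchBucketPolicy" in message:
        return "no_bucket_policy"
    if "NoSuchBucket" in message:
        return "no_bucket"
    return "other"


print(classify_s3_error("An error occurred (NoSuchBucketPolicy) when calling GetBucketPolicy"))
print(classify_s3_error("An error occurred (NoSuchBucket) when calling GetBucketAcl"))
```

With the two checks swapped, a missing bucket policy would be reported as a missing bucket.
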
* first attempt

* ingest and parse s3 bucket policies and create node

* add documentation for s3 bucket policy statements

* rm print statements

* edit firstseen for bucket policy statements

* add id for s3policystatement

* remove unused helper

* fix cleanup for s3 bucket policies

* fix cleanup for s3 bucket policies another typo

* integration test

* change ingest principal for s3 bucket policy

* test relationship btwn s3 bucket and its policy statements

* lint

* lint

* Update schema.md

edit s3policystatement id

Co-authored-by: Ramon Petgrave <[email protected]>
Resolve relationship between lambda and AWSPrincipal is always deleted #1006
* Bug fix #991

* Fix issue with foreign and in scope accounts

* test debug fix

* fix

* test module fix

* Update aws_foreign_accounts.json

* Fix based on Ramon's comments

* Update iam.py

* Update cartography/intel/aws/ec2/vpc_peerings.py

* Update cartography/intel/aws/ec2/vpc_peerings.py

* Apply suggestions from code review

* ramon rev2

* Update cartography/intel/aws/organizations.py

Co-authored-by: Ramon Petgrave <[email protected]>

* aws: inspector2: finding and package lastupdated

add the missing finding.lastupdated and package.lastupdated, so that the nodes can properly cleanup

* comma
* aws_paginate optional get

Optionally get the response object from aws_paginate()'s pages. This is useful for those opt-in regions where an API may simply not include the response object if a region is not enabled.
```
 {'ResponseMetadata': {'HTTPHeaders': {'connection': 'keep-alive',
                                      'content-length': '2',
                                      'content-type': 'application/json',
                                      'date': 'Fri, 01 Nov 2022 01:01:01 GMT',
                                      'x-amz-apigw-id': '<redacted>',
                                      'x-amzn-requestid': '<redacted>',
                                      'x-amzn-trace-id': 'Root=<redacted>'},
                      'HTTPStatusCode': 200,
                      'RequestId': '<redacted>',
                      'RetryAttempts': 0}} 
```

As opposed to

```
 {'ResponseMetadata': {'HTTPHeaders': {'connection': 'keep-alive',
                                      'content-length': '2',
                                      'content-type': 'application/json',
                                      'date': 'Fri, 01 Nov 2022 01:01:01 GMT',
                                      'x-amz-apigw-id': '<redacted>',
                                      'x-amzn-requestid': '<redacted>',
                                      'x-amzn-trace-id': 'Root=<redacted>'},
                      'HTTPStatusCode': 200,
                      'RequestId': '<redacted>',
                      'RetryAttempts': 0},
  'findings': [...]}
```
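
The optional lookup can be sketched as follows. This is only an illustration under assumptions: `collect_items` is a hypothetical helper, not the actual `aws_paginate()` implementation, and the pages are plain dictionaries shaped like the two responses above.

```python
import logging
from typing import Any, Dict, Iterable, List

logger = logging.getLogger(__name__)


def collect_items(pages: Iterable[Dict[str, Any]], object_name: str) -> List[Any]:
    """Gather `object_name` entries from each page, tolerating pages without the key."""
    items: List[Any] = []
    for page in pages:
        if object_name not in page:
            # A region that is not enabled may return only ResponseMetadata.
            logger.warning("Key %r is missing from the response page; skipping it", object_name)
            continue
        items.extend(page[object_name])
    return items


pages = [
    {"ResponseMetadata": {"HTTPStatusCode": 200}},
    {"ResponseMetadata": {"HTTPStatusCode": 200}, "findings": ["finding-1", "finding-2"]},
]
print(collect_items(pages, "findings"))
```
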

* log warning if key doesn't exist
* Fix #1030: add neo4j_database param to CLI

* 0.66.1
…nts (#1043)

* Fix #1042: AWS: only specify profile_name when syncing multiple accounts

* actually include the fix

* Remove extra logger call
* Build ingest query

* Linter

* Save cleanup query for another PR

* Implement schema

* bump mypy to 0.981 for python/mypy#13398

* linter

* make load_graph_data interface make more sense

* fix comment

* Docs and some better names

* add a todo

* Doc updates, rename some fields

* Fix pre-commit

* Code comment suggestions

Co-authored-by: Ramon Petgrave <[email protected]>

* Stackoverflow comment for clarity

* Support ingesting only parts of a schema without breaking the others

* Doc comment

* Linter

* Support matching on one or more properties

* Correctly name test

* Change key_refs to TargetNodeMatcher to enforce it as a mandatory field

* Remove use of hacky default_field()

* Support subset of schema relationships for query generation, test multiple node labels

* Docstrings

* Comments in tests

* Better comments

* Test for exception conditions

* Remove irrelevant comment

Co-authored-by: Ramon Petgrave <[email protected]>
danbrauer and others added 30 commits December 11, 2024 09:53
### Summary

This PR extends the GitHub graph with user membership in teams, for
users who are 'immediate' members of a team.

In case it is unclear, or for people newer to GitHub, note: this
focuses on
['immediate'](https://docs.github.com/en/graphql/reference/enums#teammembershiptype)
membership to a team, meaning a member is in the team directly as
opposed to being in a child team. A user could be considered a member of
a team if they are members of a child team, but this PR maps only
'immediate' membership. (In a follow-up PR we'd like to add child teams
to the graph, which we think will complete the membership picture.)

We think this is a valuable addition to the graph because our broad
intent is to understand all access a user has, and (at least in our org)
most access to repos is granted via team. If we do not know who is in
the team, then, we do not know who has access.

#### Illustration of the intention
![Cartography AMPS User Direct Team
Membership](https://github.com/user-attachments/assets/7d3d70ab-ab16-4a21-8970-3f9d2b8fe525)

#### Screencaps

**EXAMPLE USER LOOKUP**

BEFORE
(empty result because nothing exists)
![Screenshot 2024-12-03 at 5 07
00 PM](https://github.com/user-attachments/assets/908e8e96-179d-494a-ac7d-acc03ee54ab1)

AFTER
![Screenshot 2024-12-03 at 5 06
04 PM](https://github.com/user-attachments/assets/073376ee-b8d6-4b68-8d7e-246e9f9601a2)

**OVERVIEW OF COUNTS OF EACH TYPE**

BEFORE
(empty result because nothing exists)
![Screenshot 2024-12-03 at 5 09
03 PM](https://github.com/user-attachments/assets/c4758f8f-71ec-49b5-a94a-c3d464a9a51e)

AFTER
![Screenshot 2024-12-03 at 5 08
44 PM](https://github.com/user-attachments/assets/cc0be586-c1bf-440b-a8a0-6bc236272509)


### Related issues or links

None


### Checklist

Provide proof that this works (this makes reviews move faster). Please
perform one or more of the following:
- [x] Update/add unit or integration tests.
- [x] Include a screenshot showing what the graph looked like before and
after your changes.
- [ ] Include console log trace showing what happened before and after
your changes.

If you are changing a node or relationship:
- [x] Update the
[schema](https://github.com/lyft/cartography/tree/master/docs/root/modules)
and
[readme](https://github.com/lyft/cartography/blob/master/docs/schema/README.md).

**N/A** If you are implementing a new intel module:
- [ ] Use the NodeSchema [data
model](https://cartography-cncf.github.io/cartography/dev/writing-intel-modules.html#defining-a-node).

---------

Signed-off-by: Daniel Brauer <[email protected]>
…tors sync (#1406)

### Summary

Fix for #1404. Made this a separate PR from #1405 since I can't
push up to the etsy fork.

### Related issues or links

- #1404


### Checklist

Provide proof that this works (this makes reviews move faster). Please
perform one or more of the following:
- [x] Update/add unit or integration tests.
- [x] Include a screenshot showing what the graph looked like before and
after your changes.
- [ ] Include console log trace showing what happened before and after
your changes.

**Not applicable** If you are changing a node or relationship:
- [ ] Update the
[schema](https://github.com/lyft/cartography/tree/master/docs/root/modules)
and
[readme](https://github.com/lyft/cartography/blob/master/docs/schema/README.md).

**Not applicable** If you are implementing a new intel module:
- [ ] Use the NodeSchema [data
model](https://cartography-cncf.github.io/cartography/dev/writing-intel-modules.html#defining-a-node).

---------

Signed-off-by: Daniel Brauer <[email protected]>
Co-authored-by: Daniel Brauer <[email protected]>
### Summary

Docs tweak, updating docker img location



### Related issues or links

N/A

### Checklist

**N/A** Provide proof that this works (this makes reviews move faster).
Please perform one or more of the following:
- [ ] Update/add unit or integration tests.
- [ ] Include a screenshot showing what the graph looked like before and
after your changes.
- [ ] Include console log trace showing what happened before and after
your changes.

**N/A** If you are changing a node or relationship:
- [ ] Update the
[schema](https://github.com/lyft/cartography/tree/master/docs/root/modules)
and
[readme](https://github.com/lyft/cartography/blob/master/docs/schema/README.md).

**N/A** If you are implementing a new intel module:
- [ ] Use the NodeSchema [data
model](https://cartography-cncf.github.io/cartography/dev/writing-intel-modules.html#defining-a-node).

Signed-off-by: Daniel Brauer <[email protected]>
**NOTE**: Turn on "Hide whitespace" to review

### Summary

We have hit memory problems recently, especially since the NVD API became
less stable and retries became more frequent. This change attempts to reduce
the memory leaked when many requests are created, and logs each request to
make debugging easier, since further improvements may still be needed.

* Added a log statement to indicate when the NIST NVD API is being
called, which includes the URL and parameters being used.
* Wrap the `requests.get` call to use a context manager (`with`
statement) to ensure the response is properly closed after use.
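
A minimal sketch of the two changes above, under stated assumptions: `call_api` is a hypothetical helper, the HTTP function is injected so that `requests.get` can be passed in real use, the 30-second timeout is illustrative, and `_FakeResponse` is a stand-in so the sketch runs without network access.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger(__name__)


def call_api(http_get: Callable[..., Any], url: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Log the request, then read the JSON body inside a `with` block."""
    logger.info("Calling NIST NVD API at %s with params %s", url, params)
    # The response is closed when the block exits, even if json() raises.
    with http_get(url, params=params, timeout=30) as response:
        response.raise_for_status()
        return response.json()


class _FakeResponse:
    """Stand-in for a response object; the payload is made up."""

    closed = False

    def __enter__(self) -> "_FakeResponse":
        return self

    def __exit__(self, *exc_info: Any) -> bool:
        self.closed = True
        return False

    def raise_for_status(self) -> None:
        pass

    def json(self) -> Dict[str, Any]:
        return {"ok": True}


fake = _FakeResponse()
data = call_api(lambda url, **kwargs: fake, "https://nvd.example.invalid/cves", {"startIndex": 0})
```
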

---------

Signed-off-by: Khanh Le Do <[email protected]>
…ollaborators (#1409)

### Summary
> Describe your changes.

Allows the GitHub sync to continue if the user is unable to list
repository collaborators.


### Checklist

Provide proof that this works (this makes reviews move faster). Please
perform one or more of the following:
- [x] Update/add unit or integration tests.
- [x] Include a screenshot showing what the graph looked like before and
after your changes.
- [ ] Include console log trace showing what happened before and after
your changes.

If you are changing a node or relationship:
- [ ] Update the
[schema](https://github.com/lyft/cartography/tree/master/docs/root/modules)
and
[readme](https://github.com/lyft/cartography/blob/master/docs/schema/README.md).

If you are implementing a new intel module:
- [ ] Use the NodeSchema [data
model](https://cartography-cncf.github.io/cartography/dev/writing-intel-modules.html#defining-a-node).
### Summary

This PR fixes the issue described in #1374. Please see the issue and
discussion there for details; in summary:

The 'role' property on a GitHubUser node expresses whether a user is an
'ADMIN' (aka Owner) or a 'MEMBER' of a GitHubOrganization. For a graph
with multiple orgs, however, the role property will reflect only one of
the user-org relationships. If the user is an admin in one org and a
member in another, for example, the 'role' property will be wrong for
one of those relationships. To fix this, this PR shifts the role to be
expressed in the relationship between GitHubUser and GitHubOrganization.
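
The idea can be sketched as below. This is an illustration only, not the PR's ingestion code: `build_user_org_rels` and the input shape are hypothetical, while the `MEMBER_OF` and `ADMIN_OF` labels come from this description, which also notes that admins get both relationships.

```python
from typing import Dict, List, Tuple

# (user login, org login) -> role reported for that pair (made-up sample data)
memberships: Dict[Tuple[str, str], str] = {
    ("alice", "org-a"): "ADMIN",
    ("alice", "org-b"): "MEMBER",
}


def build_user_org_rels(roles: Dict[Tuple[str, str], str]) -> List[Tuple[str, str, str]]:
    """Express the role per user-org pair instead of as one property on the user."""
    rels: List[Tuple[str, str, str]] = []
    for (user, org), role in sorted(roles.items()):
        # Every affiliated user is a member, so membership can be queried uniformly.
        rels.append((user, "MEMBER_OF", org))
        if role == "ADMIN":
            rels.append((user, "ADMIN_OF", org))
    return rels


for rel in build_user_org_rels(memberships):
    print(rel)
```

Because the role lives on each user-org edge, a user who is an admin of one org and a member of another is represented correctly for both.
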

Below are a few before/after screen caps to show intent and the PR
working.

#### **USER IS AN OWNER OF ONE ORG, MEMBER OF ANOTHER**

BEFORE
Here the user is graphed as 'MEMBER_OF' to both orgs. The 'role'
property (not pictured) is set to reflect one of those relationships, so
it is partly wrong.
![Screenshot 2024-12-09 at 5 22
15 PM](https://github.com/user-attachments/assets/4b1094ba-0020-4d10-a397-52653e0214e9)

AFTER
The relationship is now updated to reflect that the user is an admin of
one of the orgs. And, as suggested by @achantavy in #1374, admins now
get two user-org relationships "so that we can determine what org a user
belongs to in a uniform way".
![Screenshot 2024-12-09 at 5 08
04 PM](https://github.com/user-attachments/assets/6c654b5f-0363-4045-9167-2f6459371cbc)

#### **MEMBERS AND UNAFFILIATEDS REMAIN THE SAME**

BEFORE & AFTER
This user is handy for demonstration purposes because they are a member
of one org, and unaffiliated to another (the user in question is an
enterprise owner, and is not a member of the second org). These
relationships remain untouched by this PR, and so the before and after
results for the query shown are identical.
![Screenshot 2024-12-09 at 5 07
20 PM](https://github.com/user-attachments/assets/4dffb9a5-691d-4099-9169-9f4195814c7b)


#### **RELATIONSHIP COUNTS**

BEFORE
Only MEMBER_OF and UNAFFILIATED are graphed, and to know who is an ADMIN
you have to look at a property:
![Screenshot 2024-12-11 at 8 44
21 PM](https://github.com/user-attachments/assets/ec8c7675-13d1-4ecd-a059-2154a3e9beb1)
![Screenshot 2024-12-11 at 9 03
52 PM](https://github.com/user-attachments/assets/c78d5db8-7ff4-44b1-b76d-67ab6bd5011f)
(Note there are 16 admins, but also note how I had to specify folks who
are 'UNAFFILIATED' in my query. If I hadn't done that, I would've gotten
a higher number, as some of those folks are admins but of a different
org. So the result would have been wrong.)

AFTER
In addition to the existing relationships that were there before, there
are 16 new ADMIN_OF relationships, one for every node that _had_ an ADMIN
role property.
![Screenshot 2024-12-11 at 8 41
56 PM](https://github.com/user-attachments/assets/d70a5583-9366-42f9-82b7-5c51d426a7f8)



### Related issues or links

- #1374 


### Checklist

Provide proof that this works (this makes reviews move faster). Please
perform one or more of the following:
- [x] Update/add unit or integration tests.
- [x] Include a screenshot showing what the graph looked like before and
after your changes.
- [ ] Include console log trace showing what happened before and after
your changes.

If you are changing a node or relationship:
- [x] Update the
[schema](https://github.com/lyft/cartography/tree/master/docs/root/modules)
and
[readme](https://github.com/lyft/cartography/blob/master/docs/schema/README.md).

**NOT APPLICABLE** If you are implementing a new intel module:
- [ ] Use the NodeSchema [data
model](https://cartography-cncf.github.io/cartography/dev/writing-intel-modules.html#defining-a-node).

Signed-off-by: Daniel Brauer <[email protected]>
### Summary

The [current
code](https://github.com/cartography-cncf/cartography/blob/4d53bce6d9f3f6703b709b70071cbcc36820a5bb/cartography/intel/cve/feed.py#L73-L113)
uses one variable, `sleep_time`, for both the retry backoff and the sleep
between requests. Furthermore, `sleep_time` was not reset for the next
request: if the first request increases it to 16 seconds, the second
request starts from that value and grows it further (the `retries` count
is reset, though, which makes it worse).

Two changes here:
- The sleep between requests is set to be a dedicated variable
`sleep_between_requests`.
- I decided to rewrite the request retry logic more properly using
`HTTPAdapter`'s retry policy.
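
The separation of the two delays can be illustrated with a hand-rolled loop. This is deliberately not the `HTTPAdapter`-based approach the PR adopts; it only shows the backoff being reset per request while the pause between requests stays fixed. `fetch_all`, its parameters, and the default values are hypothetical.

```python
from typing import Any, Callable, Iterable, List


def fetch_all(
    requests_to_make: Iterable[Any],
    do_request: Callable[[Any], Any],
    sleep: Callable[[float], None],
    sleep_between_requests: float = 1.0,
    max_retries: int = 3,
    initial_backoff: float = 2.0,
) -> List[Any]:
    """Run each request with its own backoff state and a fixed pause in between."""
    results: List[Any] = []
    for request in requests_to_make:
        backoff = initial_backoff  # reset for every request, so delays do not accumulate
        for attempt in range(max_retries + 1):
            try:
                results.append(do_request(request))
                break
            except ConnectionError:
                if attempt == max_retries:
                    raise
                sleep(backoff)
                backoff *= 2  # exponential backoff within a single request only
        sleep(sleep_between_requests)  # independent of the retry backoff
    return results
```

Passing `time.sleep` as `sleep` gives real delays; injecting it keeps the sketch easy to exercise without waiting.
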

Unfortunately, although I tried hard to keep
`test_call_cves_api_with_error`, which tests the retry logic, doing so is
not feasible. It also makes little sense to test something managed by the
Session object itself, so I had to drop it.

### Testing

I think existing integ test can confirm the code is still working. The
retry change itself will need some manual review to see if it makes
sense.

---------

Signed-off-by: Khanh Le Do <[email protected]>
### Summary
We had hardcoded `2002` as the first year of NVD CVE data to ingest.
NVD recently updated its CVE information to go back to 1999.
This PR starts the per-year ingestion from 1999 onwards.


### Related issues or links
N/A


### Checklist

Provide proof that this works (this makes reviews move faster). Please
perform one or more of the following:
- [ ] Update/add unit or integration tests.
- [ ] Include a screenshot showing what the graph looked like before and
after your changes.
- [ ] Include console log trace showing what happened before and after
your changes.

Signed-off-by: Eryx Paredes <[email protected]>
### NOTE: all of the below still holds, but the relationship between the nodes has been changed from 'CHILD_TEAM' to 'MEMBER_OF_TEAM'

### Summary

This PR extends the GitHub graph with child-team members of teams.

This is very similar to, and can be considered a follow-up to the prior
PR #1395, which added 'immediate' user members of teams. Now this PR is
adding child-teams. Between these two, the graph can now answer the
question of who is a member of a team, either directly or via child
teams.
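
The question the graph can now answer can be sketched in plain Python. This is an illustration over in-memory dictionaries, not a graph query or the PR's code; `effective_members` and the sample team names are hypothetical.

```python
from typing import Dict, List, Set


def effective_members(
    team: str,
    direct_members: Dict[str, Set[str]],
    child_teams: Dict[str, Set[str]],
) -> Set[str]:
    """Collect users who are in `team` directly or through any descendant team."""
    members: Set[str] = set()
    seen: Set[str] = set()
    stack: List[str] = [team]
    while stack:
        current = stack.pop()
        if current in seen:
            # Cheap safety net in this sketch; it keeps the walk finite regardless of input.
            continue
        seen.add(current)
        members |= direct_members.get(current, set())
        stack.extend(child_teams.get(current, set()))
    return members


direct = {"platform": {"alice"}, "platform-sre": {"bob"}}
children = {"platform": {"platform-sre"}}
print(sorted(effective_members("platform", direct, children)))
```
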

We think this is a valuable addition to the graph because our broad
intent is to understand all access a user has. Since access is
frequently granted to teams, we need the complete picture of team
membership.

#### Illustration of the intention
![Cartography AMPS Team Child Team
Membership](https://github.com/user-attachments/assets/9620f50b-310a-43d1-a15b-28fe8480bbc4)

#### Screencaps

**EXAMPLE ALL TEAMS LOOKUP**

BEFORE
(nothing is returned because the relationship does not exist)
![Screenshot 2024-12-06 at 11 23
54 AM](https://github.com/user-attachments/assets/cd2310cd-d2ff-4faf-876a-00e798cfe1e9)

AFTER
(details are not visible here, but hopefully this relays a sense of the
relationships)
![Screenshot 2024-12-06 at 11 53
11 AM](https://github.com/user-attachments/assets/5dfdd96f-526c-4299-9c1e-464f84d6e9d9)

**EXAMPLE SINGLE TEAM LOOKUP**

BEFORE
(nothing is returned because the relationship does not exist)
![Screenshot 2024-12-06 at 12 44
55 PM](https://github.com/user-attachments/assets/5dc0419b-ac4d-4038-9005-e2b41703563a)

AFTER
![Screenshot 2024-12-06 at 12 44
32 PM](https://github.com/user-attachments/assets/926776e2-c497-4c10-a80a-05cf54de532b)

#### Note on loops / cyclic graphs

GitHub does not seem to allow the creation of loops, e.g. `Team A <-- Team
B <-- Team A`, so this PR does not handle or guard against that sort
of thing.



### Related issues or links

None.


### Checklist

Provide proof that this works (this makes reviews move faster). Please
perform one or more of the following:
- [x] Update/add unit or integration tests.
- [x] Include a screenshot showing what the graph looked like before and
after your changes.
- [ ] Include console log trace showing what happened before and after
your changes.

If you are changing a node or relationship:
- [x] Update the
[schema](https://github.com/lyft/cartography/tree/master/docs/root/modules)
and
[readme](https://github.com/lyft/cartography/blob/master/docs/schema/README.md).

**Not applicable** If you are implementing a new intel module:
- [ ] Use the NodeSchema [data
model](https://cartography-cncf.github.io/cartography/dev/writing-intel-modules.html#defining-a-node).

---------

Signed-off-by: Daniel Brauer <[email protected]>
### Summary
> Describe your changes.

Fixes #1415.

When we call list-permission-sets on an account that does not have
identitycenter enabled, the AWS API raises an AccessDeniedException.

This PR wraps our call with `@aws_handle_regions` so that we no longer
raise an unhandled exception, allowing the sync to continue.
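
The general pattern can be sketched with a small decorator. This is not the implementation of `@aws_handle_regions`, whose internals are not shown here; `AccessDeniedError`, `skip_on_access_denied`, and the toy `list_permission_sets` are hypothetical stand-ins.

```python
import functools
import logging
from typing import Any, Callable, List

logger = logging.getLogger(__name__)


class AccessDeniedError(Exception):
    """Hypothetical stand-in for the AccessDeniedException raised by the AWS API."""


def skip_on_access_denied(func: Callable[..., List[Any]]) -> Callable[..., List[Any]]:
    """Return an empty list instead of raising, so the caller can keep syncing."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> List[Any]:
        try:
            return func(*args, **kwargs)
        except AccessDeniedError as exc:
            logger.warning("%s skipped: %s", func.__name__, exc)
            return []

    return wrapper


@skip_on_access_denied
def list_permission_sets(identity_center_enabled: bool) -> List[str]:
    # Toy function simulating an account without Identity Center enabled.
    if not identity_center_enabled:
        raise AccessDeniedError("Identity Center is not enabled for this account")
    return ["permission-set-1"]


print(list_permission_sets(False))
```
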

### Related issues or links
> Include links to relevant issues or other pages.

- #1415


### Checklist

Provide proof that this works (this makes reviews move faster). Please
perform one or more of the following:
- [x] Update/add unit or integration tests.

---------

Signed-off-by: Alex Chantavy <[email protected]>
0.97.0 release

Signed-off-by: Alex Chantavy <[email protected]>
### Summary
> Describe your changes.

Fix #1421. Handles permissions exceptions when enumerating
identitycenter.

### Related issues or links
> Include links to relevant issues or other pages.

- #1421


### Checklist

Provide proof that this works (this makes reviews move faster). Please
perform one or more of the following:
- [x] Update/add unit or integration tests.
- [ ] Include a screenshot showing what the graph looked like before and
after your changes.
- [ ] Include console log trace showing what happened before and after
your changes.

Signed-off-by: Alex Chantavy <[email protected]>
…rce (#1419)

### Summary
> Describe your changes.

#1413 

Removes a restriction so that cartography's data model can now
automatically clean up NodeSchemas that don't have a tenant
relationship.

In these cases, the nodes will not be deleted even if they are
considered stale, but their stale relationships will be.

We do it this way because cartography syncs assets one tenant at a time,
and stale nodes can only be safely deleted if they are tied to a tenant
so that we do not erroneously delete nodes attached to other tenants.

### Related issues or links
> Include links to relevant issues or other pages.
- #1413

We discussed this in the above issue but I'll summarize a bit here: in
the case of GitHub, GitHubUsers don't have a true "tenant" relationship
with their organization because they can exist independent of their
organizations. In this case, it makes sense for a GitHubUser to have a
node schema that does not have a tenant rel. If we want to list out all
of the users in an organization, then we simply query for the
organization and its attached users.


### Checklist

Provide proof that this works (this makes reviews move faster). Please
perform one or more of the following:
- [x] Update/add unit or integration tests.
- [ ] Include a screenshot showing what the graph looked like before and
after your changes.
- [ ] Include console log trace showing what happened before and after
your changes.

---------

Signed-off-by: Alex Chantavy <[email protected]>
… and GCP (#1425)

### Summary
> Describe your changes.

Fixes bugs where we added duplicate values to the
`exposed_internet_type` field in the AWS EC2 and GCP analysis jobs.

### Bugfix 1: Added an additional CASE condition to handle existing lists
- Add `['direct']` (as an array) instead of just the string 'direct' so
that the concatenation in Neo4j works as expected
- Added an ELSE clause to return the existing list unchanged if 'direct'
is already present

This now properly maintains a list without duplicating the 'direct'
entry.
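
The branching described above can be mirrored in Python for illustration. This is an analogy for the Cypher CASE logic, not the query from the analysis job; `add_exposure_type` is a hypothetical name.

```python
from typing import List, Optional


def add_exposure_type(existing: Optional[List[str]], new_type: str = "direct") -> List[str]:
    """Append `new_type` to the list only when it is not already present."""
    if existing is None:
        # No value yet: start a new list rather than storing a bare string.
        return [new_type]
    if new_type in existing:
        # The ELSE branch: leave the list unchanged instead of appending a duplicate.
        return existing
    # Concatenate list with list, as the fixed query does with ['direct'].
    return existing + [new_type]


print(add_exposure_type(None))
print(add_exposure_type(["elb"]))
print(add_exposure_type(["direct"]))
```

Running the function repeatedly on its own output leaves the list unchanged, which is the property the job needs when it runs on every sync.
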


### Bugfix 2: Remove a slow AllNodesScan
The query
```cypher
MATCH (n)
WHERE 
    n.exposed_internet IS NOT NULL AND 
    labels(n) IN ['AutoScalingGroup', 'EC2Instance', 'LoadBalancer', 'LoadBalancerV2'] 
WITH n LIMIT $LIMIT_SIZE 
REMOVE n.exposed_internet, n.exposed_internet_type 
return COUNT(*) as TotalCompleted
```

is supposed to remove the `exposed_internet` and `exposed_internet_type`
fields from all nodes `n` if node `n` is an AutoScalingGroup,
EC2Instance, LoadBalancer, or LoadBalancerV2. However, this query is
wrong because `labels(n)` returns a list such as `['EC2Instance']`, and
the list `['EC2Instance']` is not an element of the array `['AutoScalingGroup',
'EC2Instance', 'LoadBalancer', 'LoadBalancerV2']`.

Further, the larger issue is that it would be more performant to match
on specific labels instead of all nodes for a large graph. To fix this, we
match on specific labels.
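
The membership mistake has a direct Python analogue, shown here only as an illustration of the semantics (the fix in this PR is to match on specific labels, not to rewrite the predicate):

```python
TARGET_LABELS = ["AutoScalingGroup", "EC2Instance", "LoadBalancer", "LoadBalancerV2"]
node_labels = ["EC2Instance"]  # what labels(n) looks like for an EC2 instance node

# The whole list is tested as a single element, so this never matches.
list_in_list = node_labels in TARGET_LABELS

# Testing each label individually is what the original query intended.
any_label_matches = any(label in TARGET_LABELS for label in node_labels)

print(list_in_list, any_label_matches)
```
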


## Before

![image](https://github.com/user-attachments/assets/87ae742e-a3d4-4f40-aaf9-4fc953dfb7a2)

The query removes nothing, and `exposed_internet_type` is a list of
repeating items:


![image](https://github.com/user-attachments/assets/9e80d8f6-cc92-4939-948a-48f548d80557)


## After

![image](https://github.com/user-attachments/assets/373707d0-3373-40b6-bccd-96998514524e)

Now that I've separated it to specific labels, the query performs as
intended.


![image](https://github.com/user-attachments/assets/ce0bba63-9b7e-44a7-ba8f-5cfdef6b4e02)


### Related issues or links
> Include links to relevant issues or other pages.

- #386







### Checklist

Provide proof that this works (this makes reviews move faster). Please
perform one or more of the following:
- [ ] Update/add unit or integration tests.
- [x] Include a screenshot showing what the graph looked like before and
after your changes.
- [ ] Include console log trace showing what happened before and after
your changes.

---------

Signed-off-by: Alex Chantavy <[email protected]>
### Summary
> Describe your changes.

Updates EC2 key pair sync to use the data model.

Fixes an issue that created duplicate key pairs because the older
version of the key pair sync had extra node labels but the
EC2KeyPairInstance model did not:

![image](https://github.com/user-attachments/assets/b9a004bc-de4a-458d-a7d1-9c6728548ba8)

### Checklist

Provide proof that this works (this makes reviews move faster). Please
perform one or more of the following:
- [x] Update/add unit or integration tests.
- [ ] Include a screenshot showing what the graph looked like before and
after your changes.
- [ ] Include console log trace showing what happened before and after
your changes.

If you are changing a node or relationship:
- [ ] Update the
[schema](https://github.com/lyft/cartography/tree/master/docs/root/modules)
and
[readme](https://github.com/lyft/cartography/blob/master/docs/schema/README.md).

If you are implementing a new intel module:
- [x] Use the NodeSchema [data
model](https://cartography-cncf.github.io/cartography/dev/writing-intel-modules.html#defining-a-node).

---------

Signed-off-by: Alex Chantavy <[email protected]>
Release bump for 0.98.0rc1
### Summary
> Describe your changes.

Removes various 'this document has been moved here' strings across the
sphinx docs, as those have been in place for at least 2 years now.

Cleans up some sphinx markdown problems such as this one:

Before:
![Screenshot 2024-12-30 at 9 26
10 PM](https://github.com/user-attachments/assets/5d5b865d-8696-4a89-a334-b799f8278d81)

After:
![Screenshot 2024-12-30 at 10 07
25 PM](https://github.com/user-attachments/assets/14421954-db9b-47f8-89bc-e513c3f78363)

---------

Signed-off-by: Alex Chantavy <[email protected]>
### Summary
Fixes #1428.

Consolidates the publish to PyPI and GHCR steps so that we first publish
to PyPI _and then_ publish to GHCR.

Previously this happened in parallel. This was problematic because
although both PyPI and GHCR releases would have the same version tag,
the GHCR image would actually be installed with the _previous_ version
of cartography because the PyPI step was not complete yet.


### Related issues or links
> Include links to relevant issues or other pages.

- #1428
- #1420


### Checklist

Provide proof that this works (this makes reviews move faster). Please
perform one or more of the following:
- [ ] Update/add unit or integration tests.
- [ ] Include a screenshot showing what the graph looked like before and
after your changes.
- [x] Include console log trace showing what happened before and after
your changes.

#### Proof that this passes the VERSION field to the docker image:

(ignore the unrelated messages related to a dockerignore file; the image
built successfully)

```
➜  cartography git:(asdf) ✗ export VERSION=0.98.0rc1
➜  cartography git:(asdf) ✗ docker build --build-arg VERSION=$VERSION -t ghcr.io/your-org/cartography:$VERSION -f Dockerfile /.
[+] Building 45.9s (8/8) FINISHED                                                                                                                                  docker:desktop-linux
 => [internal] load build definition from Dockerfile                                                                                                                               0.0s
 => => transferring dockerfile: 653B                                                                                                                                               0.0s
 => [internal] load metadata for docker.io/library/python:3.10-slim                                                                                                                0.4s
 => ERROR [internal] load .dockerignore                                                                                                                                            0.0s
 => => transferring context: 45B                                                                                                                                                   0.0s
 => [1/4] FROM docker.io/library/python:3.10-slim@sha256:bdc6c5b8f725df8b009b32da65cbf46bfd24d1c86dce2e6169452c193ad660b4                                                          0.0s
 => => resolve docker.io/library/python:3.10-slim@sha256:bdc6c5b8f725df8b009b32da65cbf46bfd24d1c86dce2e6169452c193ad660b4                                                          0.0s
 => CACHED [2/4] WORKDIR /var/cartography                                                                                                                                          0.0s
 => [3/4] RUN pip install cartography==0.98.0rc1                                                                                                                                  33.1s
 => [4/4] RUN cartography -h                                                                                                                                                       0.9s
 => exporting to image                                                                                                                                                            11.5s
 => => exporting layers                                                                                                                                                            8.7s
 => => exporting manifest sha256:9133b561069dafaa94c449fa09bd32cc4342700f326b7827d839bebd824e8d59                                                                                  0.0s
 => => exporting config sha256:30eb18f3afcce2ec6c03953d15967dcb9fd9bbf41f828ebf82254c2bfeaac0a8                                                                                    0.0s
 => => exporting attestation manifest sha256:478d2472fb23c8d82630ccd765b9a242ba06a034c842c69b9569bfd4ff41bf00                                                                      0.0s
 => => exporting manifest list sha256:7544facf7a37c613fd4396db4d4d6ae07558d38c4948c76c33af008a84f004b2                                                                             0.0s
 => => naming to ghcr.io/your-org/cartography:0.98.0rc1                                                                                                                            0.0s
 => => unpacking to ghcr.io/your-org/cartography:0.98.0rc1                                                                                                                         2.8s
------
 > [internal] load .dockerignore:
------
```

#### Proof that the correct version is installed in the image

<img width="1297" alt="Screenshot 2025-01-01 at 11 55 10 AM"
src="https://github.com/user-attachments/assets/c1fa8f8a-19cb-4319-b3ee-9818eedeca80"
/>


## Other changes coming up soon
- Migrate from setup.py to pyproject.toml to modernize the build by
removing a hardcoded version field in setup.py
- Migrate from pip to uv for faster build

---------

Signed-off-by: Alex Chantavy <[email protected]>
0.98.0rc2 rc bump to test out fix for #1428 end to end
Fixes a misplaced single quote.

## Before:

![image](https://github.com/user-attachments/assets/0b73bf65-4802-49bb-a64f-0d46aa2dd250)
We set `VERSION_SPECIFIER='=='${{ env.VERSION }}`, and this evaluates to
`'=='0.98.0rc2`.

Instead, we need it to evaluate to `'==0.98.0rc2'`.

## After:
I can't test this in GitHub Actions until it is merged and deployed,
but I tried to test locally:

![image](https://github.com/user-attachments/assets/8831765c-3073-4823-b89c-2d779b7f62f0)

This PR moves the `'` character to what I think is the correct place for
the version specifier string to be interpreted correctly by pip.
### Summary

`setup.py` based builds are being deprecated in favor of
`pyproject.toml` based builds.

This PR migrates the project from setup.py to pyproject.toml.

One big benefit is that we will now no longer need to file PRs to bump
release versions in setup.py.

### Related issues

#1016 
___
Read through our [developer
docs](https://lyft.github.io/cartography/dev/developer-guide.html)

- [x] PR Title starts with "Fixes: [issue number]"

---------

Signed-off-by: Alex Chantavy <[email protected]>
Co-authored-by: Chandan <[email protected]>
Co-authored-by: Alex Chantavy <[email protected]>
### Summary

`oauth2client` has been deprecated since 2019. Updating the library we
use to authenticate with Google to the recommended alternative option,
still maintained by Google.

See:
https://google-auth.readthedocs.io/en/master/oauth2client-deprecation.html

### Related issues or links
-

### Checklist

Provide proof that this works (this makes reviews move faster). Please
perform one or more of the following:
- [ ] Update/add unit or integration tests.
- [x] Include a screenshot showing what the graph looked like before and
after your changes.
- [ ] Include console log trace showing what happened before and after
your changes.

Tested by running the gsuite module locally with some internal
credentials. Module found the appropriate creds using the legacy
delegated method and finished without a problem.

![Screenshot 2025-01-02 at 4 38 58 pm](https://github.com/user-attachments/assets/df669b12-4dc5-44fd-80a2-3fc4aca6c5c9)

![Screenshot 2025-01-02 at 4 39 39 pm](https://github.com/user-attachments/assets/7c3e6475-e807-4f84-a92f-bc6ff1b9177a)

---------

Signed-off-by: Sergio Franco <[email protected]>
The publish to GHCR step is currently failing with this error:

![image](https://github.com/user-attachments/assets/2bd96bee-3acd-49df-8ece-8a7d2920288a)

## Actual
The generated command is `RUN pip install cartography'==0.98.0rc3'`.

## Expected
The command should be `RUN pip install cartography==0.98.0rc3`.


## In this PR
I adjusted the version specifier and re-tested it locally and _think_
that this will now work in GitHub actions.


![image](https://github.com/user-attachments/assets/fe40c3bf-c12c-4a2b-94d3-78235850018b)

Signed-off-by: Alex Chantavy <[email protected]>
### Summary
> Describe your changes.

For security reasons, PyPI does not allow us to overwrite a previously
released version of cartography when publishing. Our GitHub actions
publish job currently (1) publishes to PyPI and then (2) installs that
PyPI release on a Docker image and (3) pushes that image to GHCR.

Step (2) is prone to race conditions where PyPI is not immediately ready
to serve the newly published package, causing the job to fail. Then when
we retry, step (1) will fail because that version already exists on
PyPI, leaving steps (2) and (3) unreachable.

This PR makes step (1) idempotent so that we can retry steps (2) and
(3).
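
To illustrate what "idempotent" means here, a minimal Python sketch follows. It is not the code in this PR, which appears to rely on the publish action's support for tolerating duplicate files (see the reference below). `publish_once`, `published`, and `upload` are hypothetical names, and the `published` set stands in for the state of the package index.

```python
from typing import Callable, Set


def publish_once(version: str, published: Set[str], upload: Callable[[str], None]) -> bool:
    """Upload `version` unless it is already published.

    Returns True if an upload happened and False if it was skipped.
    Hypothetical helper: `published` stands in for the package index state.
    """
    if version in published:
        # A retry lands here. Skipping instead of failing keeps later steps reachable.
        return False
    upload(version)
    published.add(version)
    return True
```

With this shape, rerunning the job after a failure in step (2) or (3) passes through step (1) without an error.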


## Reference


https://github.com/pypa/gh-action-pypi-publish?tab=readme-ov-file#tolerating-release-package-file-duplicates

Signed-off-by: Alex Chantavy <[email protected]>
### Summary
> Describe your changes.

Makes GitHub sphinx docs less ugly.

Before:
<img width="720" alt="Screenshot 2025-01-01 at 10 31 21 PM"
src="https://github.com/user-attachments/assets/6d1a7a0a-e3ae-43b9-907f-c0f724fe15fc"
/>

After:
<img width="729" alt="Screenshot 2025-01-01 at 10 31 11 PM"
src="https://github.com/user-attachments/assets/162acdde-865f-40b3-97fb-e88bf107952a"
/>

Signed-off-by: Alex Chantavy <[email protected]>
Co-authored-by: i_virus <[email protected]>
### Summary
> Describe your changes.

If the list passed to load() is empty, we now return early to save time
from checking indexes, generating a query string, and talking to the
database.

Also fixes this for AWS' resourcegroupstaggingapi sync.

Since many cartography users likely have nothing in many regions, we
might as well save ourselves some network and database calls.
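
The early return described above can be sketched as follows. This is a simplified illustration, not cartography's actual `load()` implementation: `load_if_nonempty` and `write_batch` are hypothetical names, and `write_batch` stands in for the index checks, query generation, and database write.

```python
from typing import Any, Callable, Dict, List


def load_if_nonempty(
    rows: List[Dict[str, Any]],
    write_batch: Callable[[List[Dict[str, Any]]], None],
) -> int:
    """Write `rows` with `write_batch`, skipping all work for an empty list.

    Returns the number of rows handed to `write_batch`.
    """
    if len(rows) == 0:
        # Nothing to load: skip index checks, query building, and the database call.
        return 0
    write_batch(rows)
    return len(rows)
```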


### Checklist

Provide proof that this works (this makes reviews move faster). Please
perform one or more of the following:
- [x] Update/add unit or integration tests.
- [ ] Include a screenshot showing what the graph looked like before and
after your changes.
- [ ] Include console log trace showing what happened before and after
your changes.

If you are changing a node or relationship:
- [ ] Update the
[schema](https://github.com/lyft/cartography/tree/master/docs/root/modules)
and
[readme](https://github.com/lyft/cartography/blob/master/docs/schema/README.md).

If you are implementing a new intel module:
- [ ] Use the NodeSchema [data
model](https://cartography-cncf.github.io/cartography/dev/writing-intel-modules.html#defining-a-node).

---------

Signed-off-by: Alex Chantavy <[email protected]>
### Summary
> Describe your changes.

When making a new release, we:
1. publish the release to PyPI
2. install it from PyPI in the Docker image
3. publish that image to GHCR

After publishing to PyPI, however, the package needs a few more seconds
to become available. Currently we do not wait, so the latest release is
not yet available and the build fails.

![image](https://github.com/user-attachments/assets/4d7aa967-d0e6-4ed8-8ef1-be25b573eb06)

https://github.com/cartography-cncf/cartography/actions/runs/12577359450/job/35054696773

This PR makes it so that we now wait 10 seconds for PyPI before
attempting to build the image.
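
This PR uses a fixed 10-second wait. As a point of comparison only, a polling variant could look like the Python sketch below, which retries a check a bounded number of times rather than sleeping once. `wait_until_available` and `is_available` are hypothetical names. How the check is performed (for example, by querying the package index) is left to the caller and is not part of this PR.

```python
import time
from typing import Callable


def wait_until_available(
    is_available: Callable[[], bool],
    attempts: int = 5,
    delay_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `is_available` up to `attempts` times, sleeping between tries.

    Returns True as soon as the check passes, or False if every attempt fails.
    """
    for attempt in range(attempts):
        if is_available():
            return True
        if attempt < attempts - 1:
            # Only sleep when another attempt will follow.
            sleep(delay_seconds)
    return False
```

The `sleep` parameter is injectable so that the helper can be tested without real delays.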

Signed-off-by: Alex Chantavy <[email protected]>
### Summary
> Describe your changes.

Adds a security policy.

### Related issues or links
> Include links to relevant issues or other pages.

- #1438

Credit to https://github.com/falcosecurity/falco/security, which I
plagiarized the text from. I think it accomplishes what we need it to
without introducing a heavyweight process that we aren't ready for.

Signed-off-by: Alex Chantavy <[email protected]>
### Summary
> Describe your changes.
Adds a `:MEMBER` relationship between `:AWSAccount` and `:AWSInspectorFinding` when the
finding is ingested via a delegated account.


### Related issues or links
> Include links to relevant issues or other pages.

- #1439
-
https://docs.aws.amazon.com/inspector/latest/user/admin-member-relationship.html
-
https://docs.aws.amazon.com/organizations/latest/userguide/services-that-can-integrate-inspector2.html

### Checklist

Provide proof that this works (this makes reviews move faster). Please
perform one or more of the following:
- [x] Update/add unit or integration tests.
- [x] Include a screenshot showing what the graph looked like before and
after your changes.
- [ ] Include console log trace showing what happened before and after
your changes.

If you are changing a node or relationship:
- [x] Update the
[schema](https://github.com/lyft/cartography/tree/master/docs/root/modules)
and
[readme](https://github.com/lyft/cartography/blob/master/docs/schema/README.md).

If you are implementing a new intel module:
- [ ] Use the NodeSchema [data
model](https://cartography-cncf.github.io/cartography/dev/writing-intel-modules.html#defining-a-node).

---------

Signed-off-by: Alex Chantavy <[email protected]>
Signed-off-by: Eryx Paredes <[email protected]>
Co-authored-by: Alex Chantavy <[email protected]>
### Summary
This covers an edge case where the EC2 instance exists in one region,
but the AMI is from another region.
There isn't much documentation available, but I noticed it mostly with the
Amazon Linux AMIs recommended for EKS.

### Related issues or links
- https://winder.ai/how-to-list-all-amis-for-each-region-in-aws/
- https://docs.aws.amazon.com/eks/latest/userguide/retrieve-ami-id.html
-
https://docs.aws.amazon.com/eks/latest/userguide/eks-optimized-ami.html

Without multi-region support:
![Screenshot 2025-01-16 at 2 08 54 pm](https://github.com/user-attachments/assets/0e58e4d2-e083-4886-ad82-9f4c8add624c)

With multi-region support:
![Screenshot 2025-01-16 at 2 08 43 pm](https://github.com/user-attachments/assets/8a30cb2c-8d3c-41f2-bb83-4c89a6e63bdf)


### Checklist

Provide proof that this works (this makes reviews move faster). Please
perform one or more of the following:
- [x] Update/add unit or integration tests.
- [ ] Include a screenshot showing what the graph looked like before and
after your changes.
- [ ] Include console log trace showing what happened before and after
your changes.

If you are changing a node or relationship:
- [ ] Update the
[schema](https://github.com/lyft/cartography/tree/master/docs/root/modules)
and
[readme](https://github.com/lyft/cartography/blob/master/docs/schema/README.md).

If you are implementing a new intel module:
- [ ] Use the NodeSchema [data
model](https://cartography-cncf.github.io/cartography/dev/writing-intel-modules.html#defining-a-node).

---------

Signed-off-by: Alex Chantavy <[email protected]>
Signed-off-by: Eryx Paredes <[email protected]>
Co-authored-by: Alex Chantavy <[email protected]>
Co-authored-by: i_virus <[email protected]>