feat(openchallenges): add CI workflow for updating the DB dump files on a daily cadence #2489

Merged (15 commits) on Feb 13, 2024

@vpchung vpchung commented Feb 7, 2024

Fixes #2485
Fixes #2418

Changelog

  • add script for pulling data from the OC Data Google Sheet, cleaning it up, then dumping it into CSV files
  • add CI workflow to update the DB CSV files daily and open a PR
  • add Google client (GC) credentials file (needed for gspread Python library) to .gitignore
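A minimal sketch of what the pull, clean, dump flow in the first bullet could look like. This is not the script added in this PR: the function names, the cleaning rules (trim whitespace, drop empty rows), and the sheet, tab, and file names passed in are all assumptions for illustration. Only the use of gspread with a service-account credentials file comes from the changelog.

```python
import csv
from pathlib import Path


def fetch_rows(credentials_file, sheet_title, tab_name):
    """Return all cells of one worksheet as a list of rows (header first).

    The sheet title and tab name are hypothetical inputs supplied by the caller.
    """
    import gspread  # third-party; imported lazily so the helpers below run without it

    client = gspread.service_account(filename=credentials_file)
    worksheet = client.open(sheet_title).worksheet(tab_name)
    return worksheet.get_all_values()


def clean_rows(rows):
    """Assumed cleaning rules: strip whitespace and drop fully empty rows."""
    cleaned = []
    for row in rows:
        stripped = [cell.strip() for cell in row]
        if any(stripped):
            cleaned.append(stripped)
    return cleaned


def dump_csv(rows, path):
    """Write the rows to a CSV file, creating parent directories if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
```

In a workflow run, the three helpers would be chained as `dump_csv(clean_rows(fetch_rows(...)), "<output>.csv")`, once per tab of the sheet.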

Note

I need to figure out how to add and use schematic as a step, but the workflow as-is achieves the goal of updating the CSV files (see screenshots below).

Preview

  • Example PR:
Screenshot 2024-02-07 at 9 45 51 PM
  • Files updated:
Screenshot 2024-02-07 at 9 46 00 PM

TODO for repo admin

The workflow uses two secrets:

  • secrets.GC_JSON - used to create credentials file for the Google client APIs
  • secrets.GITHUB_TOKEN - used to push changes and create PR

From what I can tell with the other workflows, GITHUB_TOKEN is already set so no further action is required here.

For GC_JSON, the credentials will be shared over LastPass (see Shared-OpenChallenges > Google Service Account JSON Key). The key will need to be added as-is (in JSON format) as a secret to this repo.

@vpchung vpchung self-assigned this Feb 8, 2024
@vpchung vpchung added the sonar-scan-approved-deprecated Ready for Sonar code analysis label Feb 8, 2024
@vpchung vpchung marked this pull request as ready for review February 8, 2024 06:11

@tschaffter tschaffter left a comment

This is cool stuff. I made a few comments to avoid stepping on the toes of other products being developed in Sage Monorepo.

An improvement would be to add steps to the workflow that check if a DB update PR already exists. If it does, the PR could be updated (preferred) or closed before opening a new one (more noise on the notification side).

Another "improvement" I have in mind would be to create an Nx project that enables developers to perform this task locally. The project would be a containerized python script that would provide the following commands:

nx build-image openchallenges-db-csv-update (name pending)
nx serve openchallenges-db-csv-update

This container could then be used by the GitHub workflow.

I'm currently exploring this approach for seeding the iAtlas database (Mongo). Let's touch base on this next week. Meanwhile, let's merge this PR once you have reviewed and addressed comments from reviewers. This is great work!!


vpchung commented Feb 12, 2024

@tschaffter : Another "improvement" I have in mind would be to create an Nx project that enables developers to perform this task locally.

Oh yes, looking back at a similar ticket (from ages ago 🤣 ), you made a note about creating a project for this resource. I will look into it after this PR - thanks!!


vpchung commented Feb 12, 2024

@tschaffter : An improvement would be to add steps to the workflow that check if a DB update PR already exists. If it does, the PR could be updated (preferred) or closed before opening a new one (more noise on the notification side).

This makes sense. Currently, a PR is created with a unique title based on the current date. Following your suggestion, I will add a step that first checks for an existing branch matching db-update-*. If one exists, the workflow pushes a new commit to it; otherwise, it creates a new branch and PR.
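The check described above could be sketched roughly as follows. This is an illustration only, not the step in the workflow: the helper names, the `origin` remote, and the ISO date format used for new branch names are assumptions.

```python
import fnmatch
import subprocess
from datetime import date


def list_remote_branches(remote="origin"):
    """Return the branch names on the remote, via `git ls-remote --heads`."""
    out = subprocess.run(
        ["git", "ls-remote", "--heads", remote],
        check=True, capture_output=True, text=True,
    ).stdout
    prefix = "refs/heads/"
    branches = []
    for line in out.splitlines():
        # Each line has the form "<sha>\trefs/heads/<branch>".
        _, _, ref = line.partition("\t")
        if ref.startswith(prefix):
            branches.append(ref[len(prefix):])
    return branches


def pick_branch(remote_branches, today, prefix="db-update-"):
    """Return (branch_name, is_new).

    Reuse an existing db-update-* branch if there is one; otherwise
    propose a new name that embeds today's date (format is an assumption).
    """
    existing = sorted(b for b in remote_branches if fnmatch.fnmatch(b, prefix + "*"))
    if existing:
        return existing[-1], False
    return prefix + today.isoformat(), True
```

When `is_new` is false, the workflow would check out the returned branch and push a commit to it; when true, it would create the branch and open a PR.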


vpchung commented Feb 12, 2024

2/12 update

Workflow has been updated so that commits are added to the existing PR if there is already a db-update-* branch open. Below are screenshots of what we can expect:

Use case: db-update branch does not exist

  • New branch created

    • branch name will include the current date
  • New PR created

    • PR title will include the current date
    • PR will be auto-labeled with sonar-scan-approved
    • commit message will also include the current date

Screenshot 2024-02-12 at 3 11 30 PM

  • Files updated

Screenshot 2024-02-12 at 3 07 59 PM

Use case: db-update branch currently exists

  • Checkout existing branch

  • Existing PR will be updated

    • commit message will include the current date when changes are pushed

Screenshot 2024-02-12 at 3 11 21 PM

Note

Because I tested the workflow twice on the same day, the commit dates are the same in the screenshot above.

  • Files updated

Screenshot 2024-02-12 at 3 12 52 PM

TODO for repo admin

@tschaffter The workflow uses two secrets:

  • secrets.GOOGLE_CLIENT_JSON - used to create credentials file for the Google client APIs
  • secrets.GITHUB_TOKEN - used to push changes and create PR

From what I can tell with the other workflows, GITHUB_TOKEN is already set so no further action is required here.

For GOOGLE_CLIENT_JSON, the credentials will be shared over LastPass (see Shared-OpenChallenges > Google Service Account JSON Key). The key will need to be added as-is (in JSON format) as a secret to this repo.
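For reference, here is one way the secret could be turned into the credentials file, assuming the workflow exposes it to the step as an environment variable named GOOGLE_CLIENT_JSON. The function name and output path are hypothetical, and the actual workflow may do this differently (for example, in a shell step).

```python
import json
import os
from pathlib import Path


def write_credentials(path, env_var="GOOGLE_CLIENT_JSON"):
    """Write the JSON key from the environment to a file readable only by the owner."""
    raw = os.environ.get(env_var)
    if not raw:
        raise RuntimeError(f"{env_var} is not set")
    # Fail early (ValueError) if the secret was not stored as valid JSON.
    json.loads(raw)
    path = Path(path)
    path.write_text(raw, encoding="utf-8")
    path.chmod(0o600)
    return path
```

Validating the JSON before writing is why the key needs to be stored as-is: a reformatted or truncated secret would be rejected here rather than later inside the Google client.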

The "Action permissions" of the repository will also need to be set to the following (if they aren't already):

Screenshot 2024-02-12 at 3 20 29 PM

These settings can be accessed by going to Settings > Actions > General

@vpchung vpchung requested a review from tschaffter February 12, 2024 23:23
@tschaffter

As in my other comments, can you prefix the feature branch name with the product name (<product>/), i.e. openchallenges/db-update-? I suggest using the slash as the separator, as in pr/vpchung/2404.

@tschaffter

Can the title of the PR be updated with the latest date whenever the PR content is updated?


vpchung commented Feb 13, 2024

The workflow has been further simplified by leveraging the create-pull-request action. By using a static branch name (openchallenges/db-update in this case), the action will auto-update the PR as needed, e.g.

Screenshot 2024-02-12 at 4 20 36 PM

@vpchung vpchung merged commit 73aeb0b into Sage-Bionetworks:main Feb 13, 2024
8 checks passed
@vpchung vpchung deleted the bug-2418 branch February 13, 2024 02:58
Labels
sonar-scan-approved-deprecated Ready for Sonar code analysis
2 participants