Adapt to tt-system-tools hugepages configuration #14396

Merged
merged 25 commits into from
Jan 2, 2025

@blozano-tt (Contributor) commented Oct 28, 2024

Tickets

metal-internal-workflows

Update provisioning to use new hugepages flow from tt-system-tools

tt-metal

Many metal machines don't have correct hugepages config
#15675

Problem description

The old hugepages configuration was janky, and the plan of record is to move to the tt-system-tools method.

What's changed

  • Remove invocation of sudo /etc/rc.local in mount weka script.
  • Use our custom action ensure-active-weka-mount consistently everywhere we try to mount weka
  • Remove setup_hugepages.py
  • Add forced garbage collection after every pytest execution to reduce pressure on system memory
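
As a rough illustration of the last item, forced garbage collection after each test could be wired in through a pytest hook in `conftest.py`. This is a minimal sketch, not the code from this PR: it assumes that "after every pytest execution" means after each test's teardown, and the choice of the `pytest_runtest_teardown` hook is an assumption about where the collection runs.

```python
# conftest.py (hypothetical placement; the PR's actual change may differ)
import gc


def pytest_runtest_teardown(item, nextitem):
    """Standard pytest hook, called during teardown of every test item.

    Forcing a full collection here frees reference cycles left behind by the
    test right away, instead of waiting for the collector's own thresholds.
    """
    gc.collect()
```

pytest discovers the hook by name, so no registration is needed. The trade-off is a small per-test cost for the collection pass in exchange for lower peak memory on the host.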

Checklist

  • Post commit CI passes
  • Blackhole Post commit (if applicable)
  • Model regression CI testing passes
  • Device performance regression CI testing passes (if applicable)
  • New/Existing tests provide coverage for changes

@blozano-tt added the P1 label Dec 12, 2024
    systems and post-tt tools systems. Add a loop to check that the
    mount worked
This reverts commit 9324ac3.

Revert "REVERT ME - use issue-15821 for all tgg"

This reverts commit c6ea4b7.

Revert "REVERT ME - use issue-15821 for all tg"

This reverts commit 655f9a2.

Revert "REVERT ME - use issue-15821 for all t3k"

This reverts commit 9450c73.

Revert "REVERT ME - use issue-15821 for single card post commit"

This reverts commit f5c6bbc.

Revert "REVERT ME - make t3k unit tests use new machines"

This reverts commit 994fabe.

Revert "REVERT ME - make demo tests use new runners"

This reverts commit 9d611fd.
@tt-rkim merged commit b4a702b into main Jan 2, 2025
10 checks passed
@tt-rkim deleted the blozano/rc_local branch January 2, 2025 08:39
mcw-anasuya pushed a commit that referenced this pull request Jan 2, 2025
### Tickets

#### metal-internal-workflows
Update provisioning to use new hugepages flow from tt-system-tools
- https://github.com/tenstorrent-metal/metal-internal-workflows/issues/278
- https://github.com/tenstorrent-metal/metal-internal-workflows/pull/327

#### tt-metal
Many metal machines don't have correct hugepages config
#15675

### Problem description
The old hugepages configuration was janky, and the plan of record is to
move to the tt-system-tools method.

### What's changed
- Remove invocation of `sudo /etc/rc.local` in mount weka script.
- Use our custom action `ensure-active-weka-mount` consistently
everywhere we try to mount weka
- Remove setup_hugepages.py
- Add forced garbage collection after every pytest execution to reduce
pressure on system memory

### Checklist
- [ ] Post commit CI passes
- [ ] Blackhole Post commit (if applicable)
- [x] [Model regression CI testing
passes](https://github.com/tenstorrent/tt-metal/actions/runs/12290324750)
- [ ] Device performance regression CI testing passes (if applicable)
- [ ] New/Existing tests provide coverage for changes

---------

Co-authored-by: Raymond Kim <[email protected]>
Labels
infra-ci infrastructure and/or CI changes P1

3 participants