Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[gpu][spark-rapids] Consolidate mig.sh Scripts and Sync Driver Installation Steps Across Copies #1259

Open
SurajAralihalli opened this issue Nov 8, 2024 · 8 comments · May be fixed by #1269

Comments

@SurajAralihalli
Copy link
Contributor

I've observed that there are multiple mig.sh scripts in both /spark-rapids/mig.sh and /gpu/mig.sh.

Additionally, the driver installation steps in /spark-rapids/mig.sh and /spark-rapids/spark-rapids.sh have diverged, with most updates happening in /spark-rapids/spark-rapids.sh

I wish we could have a single source for mig.sh

@cjac
Copy link
Contributor

cjac commented Nov 8, 2024

That's a reasonable request. We could solve in the short term with symlinks or hard links if they're on the same block device.

@SurajAralihalli
Copy link
Contributor Author

Thanks @cjac, we could start with copying driver installations enhancements from spark-rapids.sh to mig.sh

@cjac
Copy link
Contributor

cjac commented Nov 11, 2024

Hello @SurajAralihalli !

Thanks for writing. I have been thinking about creating a "parent" git repo which uses submodules to check out each related repository in a repeatable way. The utility functions that action maintainers have developed together and are most up-to-date in gpu/install_gpu_drivers.sh should be common and included in the heading of most of the init actions. I was thinking about using the m4 templating system to generate each init action, including the most up to date version of the utility functions.

But that's a bit down the road.

What you're asking about is specific to mig.sh ; are you recommending that we audit the functionality of each implementation and refactor and cross-merge all changes between implementations? We could then use a single file, and either discard the redundant copies or at least keep them all in sync with one another.

Our current implementation does not lend itself well to sharing code. I think we could solve a lot of problems by coming up with a reference implementation for sharing code between init actions. I'm afraid that generating the scripts from templates using m4 or the like would be the most effective approach.

@SurajAralihalli
Copy link
Contributor Author

My request specifically concerns mig.sh. Both mig.sh and spark-rapids.sh share common functionality, but recent updates to this functionality were made only in spark-rapids.sh, leaving mig.sh outdated. While updating mig.sh may serve as a temporary solution to quickly unblock customers, I appreciate your more efficient approach, though, as you mentioned, it may take a bit more time to implement. Thanks for detailing your solution!

@SurajAralihalli SurajAralihalli linked a pull request Dec 4, 2024 that will close this issue
@cjac
Copy link
Contributor

cjac commented Dec 16, 2024

can we merge this issue into #1276 ?

@cjac
Copy link
Contributor

cjac commented Dec 16, 2024

If not, can you rebase onto #1275 ?

If so, I'll plan work on templatizing the code, generating common headers including function definitions. The actions themselves will only be an #include statement followed by the code unique to this action.

On each release, the action will be processed by a template expansion step which re-generates each action.

This change implies a re-factor of all actions to be implemented in terms of the common functions we've been developing during the clean-up of gpu/install_gpu_drivers.sh

My personal preference would be Template::Toolkit[1], but I don't know if we have any other perl developers in the userbase.

Is there a python equivalent? It seems there is[2]. Its test suite was updated last year and claims compliance with Python 2.10

[1] https://template-toolkit.org/
[2] https://github.com/lmr/Template-Toolkit-Python

@cjac
Copy link
Contributor

cjac commented Jan 7, 2025

I'm addressing the need to update mig.sh in PR #1284

@cjac
Copy link
Contributor

cjac commented Jan 10, 2025

I'm about to run tests now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants