-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: implement preflight #502
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff @noryev ! Just a couple small comments/suggestions
Message: "NVIDIA runtime test failed", | ||
Error: err, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] Ordering of these is different in comparison to the lines above. Also maybe a worth adding a formatted error like the others?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the ordering and per Brians comments, I also moved these structs to preflight.go to get rid of the types.toml
fields := strings.Split(record, ", ") | ||
if len(fields) != 4 { | ||
continue | ||
} | ||
|
||
memoryStr := strings.Split(fields[2], " ")[0] | ||
memoryMiB, _ := strconv.ParseInt(memoryStr, 10, 64) | ||
|
||
gpu := GPUInfo{ | ||
UUID: strings.TrimSpace(fields[0]), | ||
Name: strings.TrimSpace(fields[1]), | ||
MemoryTotal: memoryMiB, | ||
DriverVersion: strings.TrimSpace(fields[3]), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe would be cleaner if you had a mapper function that took in a record and spit out a gpu object? Also, do we know for certain that the fields won't be blank?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TY! Totally agree, I have moved that parsing code into its own parseGPURecord func that is cleaner. WDYT? d1b7d78
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also added better error handling if the fields are blank too
type preflightChecker struct { | ||
gpuInfo []GPUInfo | ||
} | ||
|
||
type GPUCheckConfig struct { | ||
Required bool | ||
MinGPUs int | ||
MinMemory int64 | ||
Capabilities []string | ||
} | ||
|
||
func checkNvidiaSMI() error { | ||
_, err := exec.LookPath("nvidia-smi") | ||
return err | ||
} | ||
|
||
type nvidiaSmiResponse struct { | ||
UUID string | ||
Name string | ||
MemoryTotal string | ||
DriverVersion string | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason for leaving these in this file vs putting them in preflight/types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would have made more sense! For cleaner code, I moved types.go into preflight.go per Brians instructions getting rid of the types.go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we move these types into preflight.go
? Having the types alongside the code that uses them is a better code organization pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep! Thank you for the recommendation!
pkg/options/resource-provider.go
Outdated
Preflight: preflight.PreflightConfig{ | ||
GPU: struct { | ||
Required bool | ||
Enabled bool | ||
MinMemoryGB int64 | ||
Capabilities []string | ||
}{ | ||
Required: false, // Enable to require GPU | ||
Enabled: true, // Enable checks to detect if GPU exists | ||
MinMemoryGB: 1, // Minimum memory required for GPU (we can match this with the resourceOffer) | ||
}, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed these configs. It looks like this may run fine on CPU-only and GPU machines, and doesn't significantly slow down start up times. For now, let's leave the preflight check enabled in all cases, and we can see if there is a motivation to make it configurable.
The Required
config can also be removed now that the preflight check works fine on any machine.
The MinMemoryGB
can be moved to the code that uses it and hardcoded. In the future, there may be a reason a resource provider would want to configure this, but for a first pass we can hard code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
9cbff18
to
2e56993
Compare
pkg/options/resource-provider.go
Outdated
Preflight: preflight.PreflightConfig{ | ||
GPU: struct { | ||
MinMemoryGB int64 | ||
}{ | ||
MinMemoryGB: preflight.RequiredGPUMemoryGB, | ||
}, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this to the preflight package. We add items to the options package when they are configurable by users.
Co-authored-by: logan <[email protected]>
Co-authored-by: logan <[email protected]>
Co-authored-by: logan <[email protected]>
Co-authored-by: logan <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@@ -80,6 +80,7 @@ type ResourceProvider struct { | |||
web3SDK *web3.Web3SDK | |||
options ResourceProviderOptions | |||
controller *ResourceProviderController | |||
gpuInfo []preflight.GPUInfo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we using this?
Co-authored-by: logan <[email protected]>
Co-authored-by: logan <[email protected]>
Co-authored-by: logan <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary
This PR implements a new pre-flight check system for Lilypad resource providers. The pre-flight check verifies essential requirements including GPU availability, Docker runtime, and system resources before a provider joins the network.
This pull request makes the following changes:
These changes are made to ensure a Resource Provider(s) is prepared to run a Lilypad Module
Task/Issue reference
First iteration of preflight!
Test plan
You can test this locally within the dev environment.
Details (optional)
Add any additional details that will help to review this pull request.
Related issues or PRs (optional)
A previous PR that served as inspiration: #433