Add flux components slides #267

vsoch · 2024-03-28T22:28:44Z

This PR adds an abstractions and architecture guide, which right now is our small set of documentation slides that go over flux (high level) and explain the space of projects. This addition includes:

Various tweaks to fix build/config errors I noticed on other pages (the first commit)
The index.rst (front page) links directly to the architectures page (first picture below)
The guides page includes the new page (second picture)
The architectures page directly links the slides, and they have the flux release code (third picture)

And it's quite pleasant to click through the slides and learn about flux! Each is very simple with words / pictures highlighted to make a point.

Signed-off-by: vsoch <[email protected]>

Problem: the flux components are diverse and can be confusing. Solution: create an architecture page that includes a short set of slides that go over high level concepts and projects. Signed-off-by: vsoch <[email protected]>

vsoch · 2024-03-28T22:31:56Z

Oh neat, readthedocs has a new dashboard!

Signed-off-by: vsoch <[email protected]>

grondo

Looks good thanks! I have one question about naming the slides "architecture slides", which I think may confuse some people.

grondo · 2024-04-04T00:46:18Z

guides/architecture.rst

+.. _flux-architecture:
+
+#################
+Flux Architecture


Is "architecture" the correct term here? This seems to be describing a high level overview of Flux Framework and many of the components that are currently projects under that umbrella. It could just be me, but when I see "Flux Architecture", I would expect to see something more along the lines of what's described here

Maybe this document should be called "Flux Framework Overview and Components"?

garlick

Generally good! Here are a few nitpicky suggestions that you may or may not want to include:

Where does flux run? How about a third box that is "your laptop" since it is so easy to start a flux instance anywhere for experimentation (a nice feature)
The flux core slide is a bit skimpy. Maybe you could steal something from here and expand it into two or three slides? It does contain a lot that defines the overall architecture of Flux.
Flux-security: only required at an HPC site if using Flux as the native resource manager. Not required if running Flux under another RM like slurm. It is intended to collect all the tricky security bits of flux into one small, auditable, infrequently changing package
On the pmix slide I would specifically mention OpenMPI. I would hate to leave anyone wondering "Hey I use MPI, do I need PMIX?" Do you use openmpi? Ok maybe.
The segue into other flux projects is a little choppy and those sentences in boxes seem a little word salady to me. We just talked about components and now we're putting them inside each other? Please have another look at those and see if they still make sense to you. I'm not too sure what to suggest. Maybe given that this is a transition, something about bridging the HPC and cloud communities? It may not be bad to acknowledge their differences here?
At this point you are in your element and I wouldn't want to suggest that you do anything different!

vsoch · 2024-04-04T02:16:50Z

Thanks @garlick ! I'll get started on these changes and ping you when they are ready for a second review. I really appreciate it!

vsoch · 2024-04-04T03:29:25Z

@garlick the slides are updated - please take a look!

> The segue into other flux projects is a little choppy and those sentences in boxes seem a little word salady to me. We just talked about components and now we're putting them inside each other? Please have another look at those and see if they still make sense to you. I'm not too sure what to suggest. Maybe given that this is a transition, something about bridging the HPC and cloud communities? It may not be bad to acknowledge their differences here?

I agree, but I don't have better ideas at the moment, primarily because all of these are changing so rapidly. The way I'm viewing these slides now is that the first section on flux projects is a cohesive thing, and the remainder sections (starting with fluence, for example) are separate references for when someone asks about them specifically. I think to map these into the same space we need a larger discussion / mapping out of the components and projects. I started a flux architecture initiative but it didn't pick up any interest so I've only been doing this thinking as needed. I think we can just improve upon this over time.

Signed-off-by: vsoch <[email protected]>

garlick · 2024-04-04T13:03:17Z

Better!

Note: see @grondo's comment about the title - I thought that was a good comment

In the core slides, I think I would not use the throughput example as 1 job/s is far less than what we can do normally, and we don't provide tooling to nest flux instances to increase throughput for a single stream of work. A more down to earth performance benefit of flux recursion is that, in contrast to a monolithic resource manager like slurm, batch jobs run as full flux instances, and thus could run a taxing workload (like a high throughput one) without impacting the parent Flux instance or other batch jobs.
How do nodes communicate? In the first slide, it kinda sounds like the lead broker role is user-optional, and that followers connect directly to the leader socket. Maybe just say "one broker is designated as the leader situated at the root of a tree based overlay network"? Followers join the overlay network.
In flux-security, I think the sentence about the tricky bits and auditability should be in the text of the first slide as that's fundamental to its existence as a component
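
As an aside for readers of this thread, the leader/follower wording suggested above can be made concrete with a minimal sketch. It assumes a k-ary tree numbered breadth-first with rank 0 as the leader; the function names and the `fanout` default are hypothetical and are not taken from flux-core.

```python
from typing import List, Optional


def parent_rank(rank: int, fanout: int = 2) -> Optional[int]:
    """Return the parent's rank in a k-ary tree rooted at rank 0.

    The leader (rank 0) has no parent, so None is returned for it.
    """
    if rank < 0 or fanout < 1:
        raise ValueError("rank must be >= 0 and fanout must be >= 1")
    if rank == 0:
        return None
    return (rank - 1) // fanout


def children(rank: int, size: int, fanout: int = 2) -> List[int]:
    """Return the ranks that would attach directly below `rank`."""
    first = rank * fanout + 1
    return [r for r in range(first, first + fanout) if r < size]


def path_to_leader(rank: int, fanout: int = 2) -> List[int]:
    """Return the hops from a follower up to the leader, inclusive."""
    path = [rank]
    while (parent := parent_rank(path[-1], fanout)) is not None:
        path.append(parent)
    return path


if __name__ == "__main__":
    # Seven brokers with fanout 2: rank 6 reaches the leader through rank 2.
    print(path_to_leader(6))
    print(children(0, size=7))
```

In this sketch only ranks 1 and 2 connect to the leader directly; every other follower joins through an intermediate parent, which is the distinction the suggested rewording draws.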

Problem: we are not really talking about architecture. Solution: rename to components.

vsoch · 2024-04-04T15:00:46Z

@grondo apologies I just missed your comment! Changes:

renamed to "Flux Components" (files, and documentation)
removed throughput slides
flux security "tricky bits" moved to first slide
rephrased the TBON slides
I added one slide for fluxion, which felt way too empty!

garlick · 2024-04-04T15:08:54Z

Thanks!

Did all those changes get made? I'm still seeing the throughput example on slide 13 for example

vsoch · 2024-04-04T15:22:50Z

Is your browser caching? Here is a direct link to the slides: https://docs.google.com/presentation/d/10EchFMjJYFCZGa0CMWR1AwazGsLGXJYY5Nwj34nSZvg/edit?usp=sharing

And that set has the throughput ones removed:

garlick · 2024-04-04T15:26:25Z

I meant the "real world example" slide which is the culmination of the throughput example (I see it there in the png)

vsoch · 2024-04-04T15:30:00Z

That's from the learning guide though - https://flux-framework.readthedocs.io/en/latest/guides/learning_guide.html#fully-hierarchical-resource-management-techniques

Why is it wrong? It's mostly meant to demonstrate why the instances are useful. There isn't really anything I can find that shows it beyond that.

vsoch · 2024-04-04T15:31:49Z

ok I added back the original slides and changed to "Here is a real world example that shows increasing throughput to 500 jobs/second with three instance levels." I think the real world example is still important - instances / nesting is one of Flux's key features and we don't do a good job anywhere of telling people why they might care. The job submission example is relatively simple (easy to understand) and I think achieves that. But if there is a better example I can definitely use that, I just don't know of one.

garlick · 2024-04-04T15:45:24Z

> Why is it wrong? It's mostly meant to demonstrate why the instances are useful. There isn't really anything I can find that shows it beyond that.

We don't support it with tooling, it fragments resources, and it's more of a stunt than a real solution to a problem. Also I hate to advertise 1 job/s when we can get

 garlick@system76-pc:~/proj/flux-core$ src/cmd/flux start src/test/throughput.py  -x
number of jobs: 100
submit time:    0.134 s (743.7 job/s)
script runtime: 0.649 s
job runtime:    0.550 s
throughput:     181.8 job/s (script: 154.1 job/s)

The example I was suggesting is

> A more down to earth performance benefit of flux recursion is that, in contrast to a monolithic resource manager like slurm, batch jobs run as full flux instances, and thus could run a taxing workload (like a high throughput one) without impacting the parent Flux instance or other batch jobs.
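
The rates in the output above appear to be the job count divided by the relevant elapsed time. Here is a small sketch to check that reading, using only numbers copied from that output. The helper name `rate` is hypothetical, and this is not a claim about how throughput.py itself is written.

```python
def rate(njobs: int, seconds: float) -> float:
    """Jobs per second, rounded to one decimal place like the printed output."""
    return round(njobs / seconds, 1)


NJOBS = 100  # "number of jobs" from the output

job_rate = rate(NJOBS, 0.550)     # "job runtime" line
script_rate = rate(NJOBS, 0.649)  # "script runtime" line
submit_rate = rate(NJOBS, 0.134)  # "submit time" line

print(job_rate, script_rate, submit_rate)
```

The job and script rates reproduce the printed 181.8 and 154.1 job/s. The submit rate comes out slightly above the printed 743.7 job/s, presumably because the displayed 0.134 s is itself rounded.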

chu11 · 2024-04-04T17:18:54Z

just some nits

slide 18 - I don't think the go bindings are a part of flux-core. Perhaps say something along the lines of "other bindings like rust/go available in other projects"?

slide 26 - perhaps more generically "flux-security is needed when different users will be running jobs on the resources, such as when flux is the native resource manager on an HPC cluster". There are some people that install schedulers on clusters just for themselves and no one else.

slide 37 - sorry if it's just me, but the English here sounds weird to me "When we expose the flux sched bindings in Go, we create a plugin called Fluence" ... It sounds like the Go bindings are called Fluence. Do you mean "We exposed the flux-sched bindings in Go, which allowed us to create a plugin called Fluence"?

slide 45 - Not super knowledgeable of kubernetes speak, so it could be me ... the sentence is also a little run on, so I read this as ...

"We can map Flux components into containers AND kubernetes abstractions, this allows us to implement ...."

but I think you mean

"We can map Flux components into containers. By using kubernetes abstractions we can implement ...."

vsoch · 2024-04-04T19:44:48Z

@garlick I added that description and cut the other slides entirely. I think in the future we do want to have a convincing example, because it's hard to understand the recursive / nesting and people often need something that is more proof in the pudding than hypothetical.

@chu11

I removed the reference to the Go bindings
suggestion for flux-security applied
fluence suggestion too
I do mean flux components into abstractions. An abstraction in Kubernetes is (high level) a pod, config map, job. We have to figure out analogous Kubernetes abstractions to handle different components of Flux.

garlick

LGTM!

vsoch · 2024-04-04T20:00:34Z

@chu11 when it looks good to you we can merge.

chu11 · 2024-04-04T20:09:55Z

looks good

vsoch · 2024-04-04T20:17:20Z

Thank you to you both!

vsoch added 2 commits March 28, 2024 16:16

fix: issues with make html build

94be8fa

Signed-off-by: vsoch <[email protected]>

feat: add flux abstractions (architecture) page

999ca84

Problem: the flux components are diverse and can be confusing. Solution: create an architecture page that includes a short set of slides that go over high level concepts and projects. Signed-off-by: vsoch <[email protected]>

vsoch force-pushed the add-architecture-slides branch from c92d4a6 to 999ca84 Compare March 28, 2024 22:31

ci: test python 3.11 for readthedocs

bf12ccf

Signed-off-by: vsoch <[email protected]>

vsoch requested a review from garlick March 28, 2024 22:35

grondo reviewed Apr 4, 2024

garlick reviewed Apr 4, 2024

assets: flux-bird images

a13a036

Signed-off-by: vsoch <[email protected]>

guide: rename architecture to components

4887f9e

Problem: we are not really talking about architecture. Solution: rename to components.

vsoch force-pushed the add-architecture-slides branch from d456095 to 4887f9e Compare April 4, 2024 14:50

garlick approved these changes Apr 4, 2024

vsoch changed the title ~~Add architecture slides~~ Add flux components slides Apr 4, 2024

vsoch merged commit 07bf155 into master Apr 4, 2024
5 checks passed

vsoch deleted the add-architecture-slides branch April 4, 2024 20:17
