Let certain users allocate "broken" boards as part of a larger job #367
Comments
I'd give everyone the capability (it's not actively harmful, I believe), but it does require adding more queries. That's because the filtering out of “broken” boards is currently baked into the computation of a virtual column that characterises whether the board is available for allocation. Also, broken boards might not respond to BMP requests correctly, so there's that to worry about too!
It isn't necessarily harmful to get a machine with a board marked as broken, but it should only happen by default if the allocation would still meet the request; i.e., a request for 12 boards probably shouldn't return 12 boards with one broken, but a request for 11 could do that. An admin might then want to allow the return of the 12 boards even with the broken one if that is explicitly asked for; that could also be explicitly asked for by any user though. If the request includes a board that doesn't do the BMP bit correctly, that is OK too; if the user asked for it, they get what they asked for!
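As a rough illustration of the default rule suggested in this comment, here is a minimal sketch. The function name `allocation_ok` and its parameters are hypothetical and are not part of spalloc; the sketch only restates the "12 boards vs. 11 boards" example.

```python
def allocation_ok(total_boards, broken_boards, requested, allow_broken=False):
    """Decide whether a candidate allocation satisfies a request.

    Hypothetical helper: by default only working boards count towards the
    request; with allow_broken=True (explicitly asked for) broken boards
    count as well.
    """
    if allow_broken:
        return total_boards >= requested
    return total_boards - broken_boards >= requested
```

With 12 boards of which one is broken, a default request for 12 is rejected and a request for 11 is accepted; a request for 12 is accepted only when `allow_broken=True` is passed.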
I know: easy to ask for, harder to do. If it's not too hard, it would be nice to avoid "broken" boards in the middle of the machine, as that is more likely to create routing hotspots around them. Anyone who specifically asks for a machine with broken boards and then moans that something did not work will be laughed at!
And of course 0,0 shouldn't be broken!
I'll always require 0,0 to be alive, and a minimum number (determined by the request) of connected boards be alive too; the algorithm won't give you two disconnected chunks in an allocation (unless your request is at least satisfied by one of them I suppose). Both allocating dead boards and avoiding broken boards in the middle of the machine will require different allocation algorithms than the one we currently use.

Right now, we compute whether boards are available for allocation (= not allocated and not broken), and then run a pseudorectangle (because of triads) over the machine to pick candidates for allocation. Then we ensure we have a connected subgraph of boards (rooted at 0,0 in local coordinates) within that rectangle of sufficient actual size. If we do, that's the allocation. The advantage of this approach is that it allows us to allocate a large block of boards even if there's a small existing allocation within that large block; it's treated as if the board is dead from the perspective of the larger allocation.

It's this sort of trick that was why I very much wanted to stop using the algorithms that the old spalloc used, and instead move to the relational-set-based ones that SQL makes practical. These are also among the most complex SQL queries I've ever written. (I've given up on trying to combine them into one; it might be possible, but the idea scares me due to the way I'd need nested aggregations. With the big machine being mostly operational, the connectivity check will pass most of the time anyway.)
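The rectangle-then-connectivity approach described above can be sketched roughly as follows. This is not the actual spalloc implementation (which is done in SQL). It assumes a plain 2D grid with 4-neighbour links instead of real triads, and the names `connected_size` and `find_allocation` are hypothetical.

```python
from collections import deque


def connected_size(available, width, height):
    """Count boards reachable from local (0, 0) within a width x height rectangle.

    `available` is a set of local (x, y) coordinates that are neither
    allocated nor broken. Assumption: plain 4-neighbour grid, not triads.
    """
    if (0, 0) not in available:
        return 0  # the root board must always be alive
    seen = {(0, 0)}
    queue = deque([(0, 0)])
    while queue:
        x, y = queue.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            inside = 0 <= nxt[0] < width and 0 <= nxt[1] < height
            if inside and nxt in available and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)


def find_allocation(machine_available, machine_w, machine_h,
                    width, height, min_boards):
    """Slide a rectangle over the machine and return the first root (x, y)
    whose connected available boards number at least min_boards, else None."""
    for root_x in range(machine_w - width + 1):
        for root_y in range(machine_h - height + 1):
            # Translate the available boards in this rectangle to local coordinates.
            local = {
                (x - root_x, y - root_y)
                for (x, y) in machine_available
                if root_x <= x < root_x + width and root_y <= y < root_y + height
            }
            if connected_size(local, width, height) >= min_boards:
                return (root_x, root_y)
    return None
```

In this sketch an existing small allocation inside the rectangle is just a missing entry in `machine_available`, which is how the larger allocation ends up treating it like a dead board.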
Wow, this sounds complex, but I really like the idea that you can nest an allocation in another! That should allow some quite complex combinations of allocations.
Also, if you want to look for yourself, the queries are in their own files (the Java code to connect them together is mostly straightforward).
The sequence:

    WITH
    -- Name the arguments for sanity
    args(width, height, machine_id, max_dead_boards) AS (
        VALUES (:width, :height, :machine_id, :max_dead_boards)),

is just so that I can work with proper named arguments and JDBC, which is strictly positional (unlike with result sets); the
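To illustrate the `args` idiom from this comment, here is a minimal sketch using Python's standard `sqlite3` module with positional `?` placeholders standing in for JDBC's positional parameters. The `boards` table, its columns, and the helper names are assumptions made up for this example; they are not the real spalloc schema or queries.

```python
import sqlite3

# Each positional parameter is bound exactly once in the args CTE; the rest
# of the query then refers to it by name (args.width, args.height, ...).
QUERY = """
WITH
  -- Name the arguments for sanity
  args(width, height, machine_id, max_dead_boards) AS (
    VALUES (?, ?, ?, ?)),
  rect AS (
    SELECT boards.x, boards.y, boards.functioning
    FROM boards, args
    WHERE boards.machine_id = args.machine_id
      AND boards.x < args.width
      AND boards.y < args.height)
SELECT
  (SELECT COUNT(*) FROM rect WHERE functioning = 0) <= args.max_dead_boards
FROM args
"""


def make_demo_db():
    """Build an in-memory database with a hypothetical boards table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE boards("
        "machine_id INTEGER, x INTEGER, y INTEGER, functioning INTEGER)")
    conn.executemany(
        "INSERT INTO boards VALUES (?, ?, ?, ?)",
        [(1, 0, 0, 1), (1, 1, 0, 1), (1, 0, 1, 0), (1, 1, 1, 1), (1, 2, 0, 0)])
    return conn


def rectangle_acceptable(conn, width, height, machine_id, max_dead_boards):
    """True if the rectangle at the origin has at most max_dead_boards dead boards."""
    row = conn.execute(
        QUERY, (width, height, machine_id, max_dead_boards)).fetchone()
    return bool(row[0])
```

The point of the design is that the order of the placeholders matters in only one place; everything after the `args` CTE can use the names as often as needed.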
I should add that trying to put broken boards on the edge of the machine is non-critical (and clearly doesn't work as well with the idea of nesting allocations).
I'm not sure how to even express that in relational algebra. 😄
For debugging, it could be useful to allow certain users (admins?) to allocate boards marked as "broken" as part of a bigger job. They would have to request this specifically of course.