-
Notifications
You must be signed in to change notification settings - Fork 7
/
Copy pathallocate.txt
91 lines (74 loc) · 2.73 KB
/
allocate.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
We can estimate the total throughput of the system,
and we can break it down by resource type,
and for a given set of keywords we can see how much is not vetoed.
Goal: someone comes to SU with an allocation request, which includes
- the available app version types
(CPU, NVIDIA, AMD, platform, VM)
- the set of keywords
- # FLOPs
We can tell them whether this can be granted, and if so starting when
Parameters:
- what % of resources to use for allocations (50?)
- max % for a single allocation (10?)
Projects also get an unallocated share,
with a fraction determined by the popularity of their keywords
to process an allocation request:
select hosts that aren't ruled out by keywords
total their flops for eligible resources
------------
SU serves as a meta-scheduler.
It keeps track of targets, allocations, balances, host assignments.
SU allocation is at the granularity of project.
If a project needs app- or user-level granularity it can either
- run multiple BOINC projects, 1 per user/app
- use BOINC's built-in allocation system
An "allocation" consists of
- an initial balance
- the rate of balance accrual
- the interval of balance accrual
where "balance" is in terms of FLOPs
There are two measures of work done:
1) REC, as reported directly by clients
2) credit, as reported by projects
2) is cheat-resistant, 1) is not
1) is up-to-date, 2) is not (long jobs, validation delay)
Approach:
use 1) from SU scheduling, but don't show it to volunteers
(eliminate incentive for cheating)
show 2) to volunteers.
?? what if SU assigns a host to a project, and the project doesn't have work?
SU should work in a reasonable way for sporadic workloads
Need way for SU to check if project has work?
================
Linear bounded allocation model
Resource: N computers
each does X FLOPS
each has queueing delay of X seconds
The above are dynamically estimated.
Make the estimate conservative; e.g. 90% of data
TFLOPS = N*X
There are M "submitters"
each has:
FLOPS rate
max balance
(the above can be adjusted, but are otherwise static)
balance
(dynamic)
balance:
constantly increases at the given rate
maxes out at max
is decreased by X when a job of X FLOPs is dispatched
sum of rates is < TFLOPS
The above is for projects.
What about SU?
submitters are projects
"dispatch" is attaching a computer to project
a project's balance decreases at rate of attached computers
divided by # attachments
RPC
client is attached to a set of projects
The right thing may be to detach these and switch to new project
possibility:
find set of compatible projects based on prefs
attach to highest, detach from lowest
take into account cached data?