CP013: Clarify the requirements for the copy and move semantics of the execution_resource
#67
I wish I had been there for the Monday telecon. David Hollman and I discussed this on Friday and we came up with the following concerns:
I was thinking that it's really just enough for execution resources to be unique identifiers. That is, if I view a subset of available resources, and then request to create a context with that subset, those identifiers shouldn't change between my first view and the context creation request. David had other ideas too -- I'm not speaking for him here, I just wanted to get some ideas out there before the Monday morning meeting.
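As a rough illustration of the "unique identifier" idea, an execution resource could be as small as a trivially copyable handle that merely names the underlying resource. This is only a sketch; all names below are hypothetical, not part of the proposal:

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical sketch: an execution resource as a trivially copyable
// identifier.  Copying it duplicates only the integer, so concurrent
// traversal by many threads touches no shared state (no reference count).
class execution_resource {
public:
  explicit execution_resource(std::uint64_t id) : id_(id) {}

  // The identifier "points to" the underlying resource, like a device
  // or platform ID; it is not the resource itself.
  std::uint64_t id() const { return id_; }

  friend bool operator==(execution_resource a, execution_resource b) {
    return a.id_ == b.id_;
  }

private:
  std::uint64_t id_;
};

// Copies and moves are trivial, hence "lightweight" in the strongest sense.
static_assert(std::is_trivially_copyable<execution_resource>::value,
              "an identifier-like handle copies without side effects");
```

The design point is that the handle stays valid as a *name* even if the named resource disappears; validity is then a property of what it refers to, not of the handle's ownership semantics.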
Notes from Monday 30 July 2018 telecon:

The point of making execution resources copyable and moveable is so that we can use standard algorithms to traverse the hierarchy of resources, e.g., for depth-based views of topology. This also enables use of the "pImpl" idiom to hide the implementation. @mhoemmen agrees that this makes sense.

The argument for reference counting is that it makes the execution resource a "lightweight, opaque reference." However, reference counting is not necessarily "lightweight." For example, if users need to access the topology inside a parallel region, the parent resource (that different threads may need to traverse concurrently) may experience thread contention if it uses global reference counting. (Different threads would need copies of the resource -- that copying would touch the reference count.)

Instead of mandating reference counting, we could say "as if reference counted," or even better: just that the resource be a unique identifier. The key is that it's not the resource itself; instead, it "points to" the object, like an index (e.g., device or platform ID) or a pointer. "Unique" means that it always refers to the same resource, as long as that resource exists. This is different than

What about the lifetime of an execution resource? Some resources are dynamic. Thus, in order to see what resources are available -- at least the top-level resources like a GPU -- the system may need to load DLLs and initialize back-ends. The advantage of reference counting is that it makes the lifetime clear. However, some kinds of resources may not have "global lifetime." For example, GPU scratch or a thread block may only be available during a parallel execution. If resources are neither copyable nor moveable, it makes their lifetime easier to define, but hinders traversal (see above).

It's true that the execution context could control lifetime. However, what about memory? There is no analogous "memory context" that would control whether a memory resource is available. That could lead to trouble if users have containers that use a

Summary:
Questions:
The vast majority of standard algorithms (in terms of the subclause in the standard, at least) don't require copyability, so this requirement doesn't make sense to me. Can someone elaborate with a specific example?
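One way to see this point: most standard algorithms only read elements through iterators and references, so even a non-copyable element type works with them. The `resource` type below is purely illustrative, not proposal API:

```cpp
#include <algorithm>
#include <array>

// Illustration: a deliberately non-copyable stand-in for a resource type.
struct resource {
  int id;
  resource(int i) : id(i) {}
  resource(const resource&) = delete;            // not copyable
  resource& operator=(const resource&) = delete; // not assignable
};

// std::find_if never copies elements; it only inspects them through a
// reference, so this compiles and runs fine with the type above.
inline const resource* find_resource(const std::array<resource, 3>& tree,
                                     int id) {
  auto it = std::find_if(tree.begin(), tree.end(),
                         [id](const resource& r) { return r.id == id; });
  return it == tree.end() ? nullptr : &*it;
}
```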
The pimpl idiom can be used with move-only or copyable objects, or even objects with neither of those operations, so I don't see how this is related. For instance, libc++ implements
I don't see how reference counting clarifies the use case of something only being available during parallel execution. Does it imply that holding a reference to the execution resource would transparently prevent parallel execution from shutting down, thus (potentially) deadlocking the program? That doesn't seem desirable; something like that should be much more explicit. In every case I can think of, the accessibility of a resource (or lack thereof) is (or should be) completely unrelated to the count of owners of that resource, but maybe I'm missing something. (Regardless of whether a counterexample exists, it seems wrong to impose a reference counting requirement unless this edge case that I'm missing is somehow the common case.)
I think this may be one of the sources of fundamental disconnect for me, at least. Just because a given resource may need to have something running to work doesn't mean that the creation of a handle to that resource should transparently start a driver or something. I need to read through the current draft of this effort more carefully and perhaps try harder to make it to some of the calls, so it's totally fair to take my comments with a grain of salt at this point.
@dhollman wrote:
I should have left out the "pImpl" bit -- that wasn't essential to the discussion.
I think we all agreed that we don't need to mandate reference counting. The bigger question is about resource lifetime. There are two parts to a resource being "alive":
We could imagine these two things being separate. For example, the system might invalidate a resource between when I get it from the tree, and when I try to create a context from it. That implies the context creation should fail. (This is part of what it means for a resource to be a "unique identifier": either it works and points to the intended thing, or it's invalid and context creation with it fails.)

Could a resource become invalid during tree traversal? We've decided that resources form a tree. Do we want validity of a resource during tree traversal to follow

Here's a concrete example:
Questions:
Here's my view:
Does this sound reasonable? My answers would mostly decouple the availability and validity of resources from any driver or run-time initialization. ("Mostly," because (3) would mandate no reuse of IDs, unlike how
Update resource lifetime discussion in the Affinity proposal, based on codeplaysoftware#67 discussion. NOTE: I have not changed the rest of the proposal to make it consistent with these changes. If people like this pull request, I will then update the rest of the proposal accordingly.
My PR #70 changes the affinity lifetime section to reflect this discussion. It does not include changes to the rest of the proposal for consistency. If people like my PR, I will change the rest of the proposal accordingly. Thanks all!
Thanks very much @mhoemmen, I've put some comments on the pull request. I also had some further thoughts, following on from the discussion on the pull request, regarding how we manage the lifetime of an entire system topology. So if we were to say that calling

So in the example below, we discover the topology and assign the root node to a local variable:

```cpp
auto sysRes = this_system::resources(); // discover topology, sysRes points to root of the DAG
auto dynamicRes = sysRes[3];            // store the resource for a specific dynamic resource

/* some callback mechanism detects that the topology has changed and the
   dynamic resource is no longer available */

sysRes = this_system::resources(); // re-discover topology, sysRes now points to the new root of the DAG
```

To solve this we could require that upon calling

Another issue I see is how we define the destruction of the topology DAG itself. I see a few potential ways to approach this:
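The failure mode in the example above (a stored resource outliving the snapshot it came from) could be modelled with snapshot-owning handles. This is only an illustrative sketch of one possible design; `system_topology`, `topology_snapshot`, and `resource_handle` are hypothetical names, not proposal API:

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical sketch of snapshot semantics for topology discovery: each
// call to resources() returns a fresh snapshot and marks the previous one
// stale.  Handles keep their snapshot alive (shared ownership), so a stored
// handle never dangles -- it just reports itself invalid after re-discovery.
struct topology_snapshot {
  std::vector<int> nodes;  // stand-in for the resource DAG
  bool stale = false;
};

class system_topology {
public:
  std::shared_ptr<topology_snapshot> resources() {
    if (current_) current_->stale = true;  // invalidate the old snapshot
    current_ = std::make_shared<topology_snapshot>();
    current_->nodes = {0, 1, 2, 3};  // pretend discovery found four resources
    return current_;
  }

private:
  std::shared_ptr<topology_snapshot> current_;
};

struct resource_handle {
  std::shared_ptr<topology_snapshot> snap;
  std::size_t index;

  bool valid() const {
    return snap && !snap->stale && index < snap->nodes.size();
  }
};
```

With this shape, `dynamicRes` in the example would still be safe to query after the topology changes; it would simply answer "no longer valid" rather than pointing into a destroyed DAG.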
Thanks for the detailed explanation! @AerialMantis wrote:
We could also just have resources be unique, so that either a resource points to a thing that was valid at snapshot time, or points to nothing. Then, we wouldn't need a callback. The callback approach suggests polling, or some system facility, that could either be insufficiently general or expensive. I'd rather just have attempts to use an invalidated execution resource fail.
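A minimal sketch of that "fail on use of an invalidated identifier" behaviour, assuming a hypothetical registry that hands out identifiers monotonically and never reuses them (`resource_registry` and this `execution_context` constructor are illustrative, not proposal API):

```cpp
#include <cstdint>
#include <set>
#include <stdexcept>

// Hypothetical sketch: identifiers are never reused, so a stale handle can
// only fail -- it can never silently alias a newer resource.
struct resource_registry {
  std::uint64_t next_id = 0;
  std::set<std::uint64_t> live;

  std::uint64_t discover() { live.insert(next_id); return next_id++; }
  void invalidate(std::uint64_t id) { live.erase(id); }
  bool valid(std::uint64_t id) const { return live.count(id) != 0; }
};

// Context creation either binds to the intended resource or fails loudly;
// no callback or polling machinery is needed.
struct execution_context {
  execution_context(const resource_registry& reg, std::uint64_t id) {
    if (!reg.valid(id))
      throw std::invalid_argument("execution resource was invalidated");
  }
};
```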
Notes from 2018-09-03 call:
Currently, the requirements for the `execution_resource` are quite vague. In #40 we decided that the `execution_resource` should be copyable and moveable but must act as an opaque type with reference counting semantics.

Taken from #40:

Perhaps we want to introduce normative wording which requires certain behaviour of the `execution_resource` when being copied or moved in order to guarantee the correct behaviour.