CP013: Clarify the requirements for the copy and move semantics of the execution_resource
#67
I wish I had been there for the Monday telecon. David Hollman and I discussed this on Friday and we came up with the following concerns:
I was thinking that it's really just enough for execution resources to be unique identifiers. That is, if I view a subset of available resources, and then request to create a context with that subset, those identifiers shouldn't change between my first view and the context creation request. David had other ideas too -- I'm not speaking for him here, I just wanted to get some ideas out there before the Monday morning meeting.
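As a rough illustration of the "unique identifier" idea, an execution resource could be as small as a trivially copyable handle that merely names the underlying resource. This is only a sketch; all names below are hypothetical, not part of the proposal:

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical sketch: an execution resource as a trivially copyable
// identifier.  Copying it duplicates only the integer, so concurrent
// traversal by many threads touches no shared state (no reference count).
class execution_resource {
public:
  explicit execution_resource(std::uint64_t id) : id_(id) {}

  // The identifier "points to" the underlying resource, like a device
  // or platform ID; it is not the resource itself.
  std::uint64_t id() const { return id_; }

  friend bool operator==(execution_resource a, execution_resource b) {
    return a.id_ == b.id_;
  }

private:
  std::uint64_t id_;
};

// Copies and moves are trivial, hence "lightweight" in the strongest sense.
static_assert(std::is_trivially_copyable<execution_resource>::value,
              "an identifier-like handle copies without side effects");
```

The design point is that the handle stays valid as a *name* even if the named resource disappears; validity is then a property of what it refers to, not of the handle's ownership semantics.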
Notes from Monday 30 July 2018 telecon:

The point of making execution resources copyable and moveable is so that we can use standard algorithms to traverse the hierarchy of resources, e.g., for depth-based views of topology. This also enables use of the "pImpl" idiom to hide the implementation. @mhoemmen agrees that this makes sense.

The argument for reference counting is that it makes the execution resource a "lightweight, opaque reference." However, reference counting is not necessarily "lightweight." For example, if users need to access the topology inside a parallel region, the parent resource (that different threads may need to traverse concurrently) may experience thread contention if it uses global reference counting. (Different threads would need copies of the resource -- that copying would touch the reference count.)

Instead of mandating reference counting, we could say "as if reference counted," or even better: just that the resource be a unique identifier. The key is that it's not the resource itself; instead, it "points to" the object, like an index (e.g., device or platform ID) or a pointer. "Unique" means that it always refers to the same resource, as long as that resource exists. This is different than

What about the lifetime of an execution resource? Some resources are dynamic. Thus, in order to see what resources are available -- at least the top-level resources like a GPU -- the system may need to load DLLs and initialize back-ends. The advantage of reference counting is that it makes the lifetime clear. However, some kinds of resources may not have "global lifetime." For example, GPU scratch or a thread block may only be available during a parallel execution. If resources are neither copyable nor moveable, it makes their lifetime easier to define, but hinders traversal (see above).

It's true that the execution context could control lifetime. However, what about memory? There is no analogous "memory context" that would control whether a memory resource is available. That could lead to trouble if users have containers that use a

Summary:
Questions:
The vast majority of standard algorithms (in terms of the subclause in the standard, at least) don't require copyability, so this requirement doesn't make sense to me. Can someone elaborate with a specific example?
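One way to see this point: most standard algorithms only read elements through iterators and references, so even a non-copyable element type works with them. The `resource` type below is purely illustrative, not proposal API:

```cpp
#include <algorithm>
#include <array>

// Illustration: a deliberately non-copyable stand-in for a resource type.
struct resource {
  int id;
  resource(int i) : id(i) {}
  resource(const resource&) = delete;            // not copyable
  resource& operator=(const resource&) = delete; // not assignable
};

// std::find_if never copies elements; it only inspects them through a
// reference, so this compiles and runs fine with the type above.
inline const resource* find_resource(const std::array<resource, 3>& tree,
                                     int id) {
  auto it = std::find_if(tree.begin(), tree.end(),
                         [id](const resource& r) { return r.id == id; });
  return it == tree.end() ? nullptr : &*it;
}
```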
The pimpl idiom can be used with move-only or copyable objects, or even objects with neither of those operations, so I don't see how this is related. For instance, libc++ implements
I don't see how reference counting clarifies the use case of something only being available during parallel execution. Does it imply that holding a reference to the execution resource would transparently prevent parallel execution from shutting down, thus (potentially) deadlocking the program? That doesn't seem desirable; something like that should be much more explicit. In every case I can think of, the accessibility of a resource (or lack thereof) is (or should be) completely unrelated to the count of owners of that resource, but maybe I'm missing something. (Regardless of whether a counterexample exists, it seems wrong to impose a reference counting requirement unless this edge case that I'm missing is somehow the common case.)
I think this may be one of the sources of fundamental disconnect for me, at least. Just because a given resource may need to have something running to work doesn't mean that the creation of a handle to that resource should transparently start a driver or something. I need to read through the current draft of this effort more carefully and perhaps try harder to make it to some of the calls, so it's totally fair to take my comments with a grain of salt at this point.
@dhollman wrote:
I should have left out the "pImpl" bit -- that wasn't essential to the discussion.
I think we all agreed that we don't need to mandate reference counting. The bigger question is about resource lifetime. There are two parts to a resource being "alive":
We could imagine these two things being separate. For example, the system might invalidate a resource between when I get it from the tree, and when I try to create a context from it. That implies the context creation should fail. (This is part of what it means for a resource to be a "unique identifier": either it works and points to the intended thing, or it's invalid and context creation with it fails.)

Could a resource become invalid during tree traversal? We've decided that resources form a tree. Do we want validity of a resource during tree traversal to follow

Here's a concrete example:
Questions:
Here's my view:
Does this sound reasonable? My answers would mostly decouple the availability and validity of resources from any driver or run-time initialization. ("Mostly," because (3) would mandate no reuse of IDs, unlike how
Update resource lifetime discussion in the Affinity proposal, based on codeplaysoftware#67 discussion. NOTE: I have not changed the rest of the proposal to make it consistent with these changes. If people like this pull request, I will then update the rest of the proposal accordingly.
My PR #70 changes the affinity lifetime section to reflect this discussion. It does not include changes to the rest of the proposal for consistency. If people like my PR, I will change the rest of the proposal accordingly. Thanks all!
Thanks very much @mhoemmen, I've put some comments on the pull request. I also had some further thoughts, following on from the discussion on the pull request, regarding how we manage the lifetime of an entire system topology. So if we were to say that calling

So in the example below, we discover the topology and assign the root node to a local variable:

```cpp
auto sysRes = this_system::resources(); // discover topology, sysRes points to root of the DAG
auto dynamicRes = sysRes[3];            // store the resource for a specific dynamic resource

/* some callback mechanism detects that the topology has changed and the
   dynamic resource is no longer available */

sysRes = this_system::resources(); // re-discover topology, sysRes now points to the new root of the DAG
```

To solve this we could require that upon calling

Another issue I see is how we define the destruction of the topology DAG itself. I see a few potential ways to approach this:
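The failure mode in the example above (a stored resource outliving the snapshot it came from) could be modelled with snapshot-owning handles. This is only an illustrative sketch of one possible design; `system_topology`, `topology_snapshot`, and `resource_handle` are hypothetical names, not proposal API:

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical sketch of snapshot semantics for topology discovery: each
// call to resources() returns a fresh snapshot and marks the previous one
// stale.  Handles keep their snapshot alive (shared ownership), so a stored
// handle never dangles -- it just reports itself invalid after re-discovery.
struct topology_snapshot {
  std::vector<int> nodes;  // stand-in for the resource DAG
  bool stale = false;
};

class system_topology {
public:
  std::shared_ptr<topology_snapshot> resources() {
    if (current_) current_->stale = true;  // invalidate the old snapshot
    current_ = std::make_shared<topology_snapshot>();
    current_->nodes = {0, 1, 2, 3};  // pretend discovery found four resources
    return current_;
  }

private:
  std::shared_ptr<topology_snapshot> current_;
};

struct resource_handle {
  std::shared_ptr<topology_snapshot> snap;
  std::size_t index;

  bool valid() const {
    return snap && !snap->stale && index < snap->nodes.size();
  }
};
```

With this shape, `dynamicRes` in the example would still be safe to query after the topology changes; it would simply answer "no longer valid" rather than pointing into a destroyed DAG.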
Thanks for the detailed explanation! @AerialMantis wrote:
We could also just have resources be unique, so that either a resource points to a thing that was valid at snapshot time, or points to nothing. Then, we wouldn't need a callback. The callback approach suggests polling, or some system facility, that could either be insufficiently general or expensive. I'd rather just have attempts to use an invalidated execution resource fail.
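A minimal sketch of that "fail on use of an invalidated identifier" behaviour, assuming a hypothetical registry that hands out identifiers monotonically and never reuses them (`resource_registry` and this `execution_context` constructor are illustrative, not proposal API):

```cpp
#include <cstdint>
#include <set>
#include <stdexcept>

// Hypothetical sketch: identifiers are never reused, so a stale handle can
// only fail -- it can never silently alias a newer resource.
struct resource_registry {
  std::uint64_t next_id = 0;
  std::set<std::uint64_t> live;

  std::uint64_t discover() { live.insert(next_id); return next_id++; }
  void invalidate(std::uint64_t id) { live.erase(id); }
  bool valid(std::uint64_t id) const { return live.count(id) != 0; }
};

// Context creation either binds to the intended resource or fails loudly;
// no callback or polling machinery is needed.
struct execution_context {
  execution_context(const resource_registry& reg, std::uint64_t id) {
    if (!reg.valid(id))
      throw std::invalid_argument("execution resource was invalidated");
  }
};
```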
Notes from 2018-09-03 call:
Currently, the requirements for the `execution_resource` are quite vague. In #40 we decided that the `execution_resource` should be copyable and moveable but must act as an opaque type with reference counting semantics.

Taken from #40:

Perhaps we want to introduce normative wording which requires certain behaviour of the `execution_resource` when being copied or moved in order to guarantee the correct behaviour.