doc: describe the GC we are about to implement
runspired committed Nov 26, 2024
1 parent 431523c commit 3352349
Showing 1 changed file with 169 additions and 1 deletion.
170 changes: 169 additions & 1 deletion packages/store/src/-private/managers/resource-manager.ts
@@ -107,7 +107,175 @@ type Caches = {
};

/**
* The ResourceManager
* ## ResourceManager
*
* The ResourceManager is responsible for managing instance
* creation and retention for Managed UI Objects.
*
* Managed UI Objects include:
* - (Reactive) UIDocuments
* - (Reactive) UIArrays
* - (Reactive) UIRecords
*
* Every Managed UI Object has a well-known identity token:
* - UIDocuments => StableDocumentIdentifier
* - UIArrays => StableDocumentIdentifier
* - UIRecords => StableRecordIdentifier
*
* This identity token is a CacheKey that can be safely used
* to reference the UI Object and retrieve its associated data
* from the Cache without needing to retain the UI Object itself.
*
* Data in the cache keyed to these CacheKeys includes:
* - StableDocumentIdentifier => StructuredDocument (the request response)
* - StableDocumentIdentifier => ResourceDocument (the parsed and processed content of the request)
* - StableRecordIdentifier => Resource (the data for individual records)
* - StableRecordIdentifier => GraphEdge (data describing the relationship between one resource and another)
*
* The ResourceManager has three modes:
* - Strong (default - until v6)
* - Weak + auto GC
* - Weak + manual GC (default after v6)
*
*
* ----------------------------------------------------------------
*
* ### Strong Mode
*
* In strong mode, Managed UI Objects are retained forever unless
* explicitly destroyed by the application. Associated data in the
* cache is similarly retained forever unless explicitly removed.
*
* Strong mode comes with inherent risks:
* - memory usage may grow to a problematic size
* - manually managing release can result in unsafe teardown occurring
* - access to legacy APIs (only allowed in strong mode) like
* unloadRecord and unloadAll can result in application bugs,
* unsafe teardown, and broken relationships.
*
* While there are risks, strong mode is a great choice for applications
* that understand these risks and are able to manage them effectively.
*
* Applications that may wish to use strong mode will typically be
* those that utilize small quantities of data that changes infrequently.
*
* The primary benefit of strong mode is that it incurs less overhead
* when accessing data: UI Objects never need to be re-instantiated,
* and an instance is quicker to retrieve because no
* `<WeakRef>.deref()` resolution is required.
*
*
* ----------------------------------------------------------------
*
* ### Weak Mode
*
* In weak mode, Managed UI Objects are retained via WeakRef. Each
* WeakRef is stored under its CacheKey: the *key* is strongly retained
* while the *value* (the WeakRef's target) is weakly retained. This means
* that our CacheKey is strongly retained, but the UI Object is free to
* be collected if the application no longer has a reference to it.
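*
* A minimal sketch of this retention scheme, assuming a plain `Map` from
* CacheKey to `WeakRef`. The names `WeakInstanceCache`, `UIObject` and
* `instantiations` are hypothetical and only illustrate the idea; they are
* not the ResourceManager's actual internals.
*
```typescript
// Hypothetical types for illustration only.
type CacheKey = { lid: string };
type UIObject = { key: CacheKey };

class WeakInstanceCache {
  // The Map retains each CacheKey strongly; the WeakRef lets the
  // UI Object itself be collected once the application drops it.
  private refs = new Map<CacheKey, WeakRef<UIObject>>();
  instantiations = 0;

  get(key: CacheKey): UIObject {
    const existing = this.refs.get(key)?.deref();
    if (existing !== undefined) {
      return existing;
    }
    // Never created, or already collected: (re-)instantiate from the key.
    const created: UIObject = { key };
    this.instantiations++;
    this.refs.set(key, new WeakRef(created));
    return created;
  }
}

const instances = new WeakInstanceCache();
const userKey: CacheKey = { lid: 'user:1' };
const firstLookup = instances.get(userKey);
// `firstLookup` is still strongly held here, so deref() returns the same instance.
const secondLookup = instances.get(userKey);
```
*
* The extra `deref()` on every lookup is the overhead that strong mode avoids.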
*
* On its own, this already provides a significant reduction in
* long-term memory usage for applications that have a lot of UI Objects,
* but we can do more!
*
* In addition to utilizing WeakRefs, the ResourceManager manages
* a FinalizationRegistry and registers each Managed UI Object with it.
*
* This allows us to be notified when a UI Object is GC'd and either
* update bookkeeping or perform a more advanced GC operation of our
* own. Whenever a UI Object is GC'd, we mark its CacheKey for potential
* cleanup.
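*
* A rough sketch of that bookkeeping, assuming the CacheKey is used as the
* registry's held value. `FinalizationBookkeeping`, `track` and `mark` are
* hypothetical names, and clearing the mark when a new instance is tracked
* is an assumption of this sketch rather than documented behavior.
*
```typescript
// Hypothetical bookkeeping for illustration only.
type CacheKey = { lid: string };

class FinalizationBookkeeping {
  readonly marked = new Set<CacheKey>();

  // The held value is the CacheKey, never the UI Object: holding the
  // object here would keep it alive and defeat the weak retention.
  private registry = new FinalizationRegistry<CacheKey>((key) => this.mark(key));

  track(uiObject: object, key: CacheKey): void {
    this.registry.register(uiObject, key);
    // Assumption: a live instance means the key is not a cleanup candidate.
    this.marked.delete(key);
  }

  // Runs when the engine reports that the UI Object for `key` was collected.
  mark(key: CacheKey): void {
    this.marked.add(key);
  }
}
```
*
* Finalization callbacks run at a time chosen by the engine, if at all, so
* nothing here should depend on when `mark` is invoked.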
*
* This is where the Auto vs Manual GC comes into the picture.
*
* - Auto GC: The ResourceManager performs a GC operation using the
* FinalizationRegistry callback as a trigger to schedule the GC.
* - Manual GC: The ResourceManager only uses the FinalizationRegistry
* callback to update bookkeeping and relies on the application to
* trigger the GC operation.
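*
* The difference between the two modes can be sketched as follows. All names
* (`GCScheduler`, `onFinalized`, `gc`, the `schedule` hook) are hypothetical,
* and the microtask default is only an assumption about how work might be
* deferred.
*
```typescript
// Hypothetical scheduler for illustration only.
type CacheKey = { lid: string };
type GCMode = 'auto' | 'manual';
type Schedule = (task: () => void) => void;

class GCScheduler {
  readonly marked = new Set<CacheKey>();
  runs = 0;
  private mode: GCMode;
  private schedule: Schedule;
  private scheduled = false;

  constructor(mode: GCMode, schedule?: Schedule) {
    this.mode = mode;
    // Assumed default: defer the GC to a microtask.
    this.schedule = schedule ?? ((task) => void Promise.resolve().then(task));
  }

  // Invoked from the FinalizationRegistry callback in both modes.
  onFinalized(key: CacheKey): void {
    this.marked.add(key); // bookkeeping always happens
    if (this.mode === 'auto' && !this.scheduled) {
      this.scheduled = true; // coalesce many finalizations into one GC run
      this.schedule(() => this.gc());
    }
  }

  // In manual mode the application decides when to call this.
  gc(): number {
    this.scheduled = false;
    const processed = this.marked.size;
    this.marked.clear(); // a real GC would evaluate the marked keys here
    this.runs++;
    return processed;
  }
}
```
*
* In this sketch an application using manual GC would call `gc()` at a moment
* that suits its own workload, for example after a route transition.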
*
* Which is best primarily depends on what framework you are using,
* how well you understand your application's scheduling needs and workload,
* and how much control you want to have over the GC process. Currently,
* we think that leaving this decision to the application is the best choice,
* at least until we've had more time to observe how and when applications
* are using the GC in the wild and gathered feedback.
*
* > [!TIP]
* > A manual GC operation is always possible when using auto GC, but
* > calling the GC manually is rarely useful in that mode, as there would
* > likely be no work to do.
*
*
* ----------------------------------------------------------------
*
* ### Understanding the GC Process
*
* The GC presumes that the application has fully migrated to using request-based
* patterns and eliminated the use of all legacy resource-centric patterns.
*
* This presumption is necessary because the legacy resource-centric patterns are
* not compatible with GC: it is not possible to determine when a resource in a
* relationship is no longer needed by the application, except for the few cases
* where it is part of a group of resources that are collectively no longer
* accessible at all.
*
* We use a [Tracing GC approach](https://en.wikipedia.org/wiki/Tracing_garbage_collection)
* but with a twist: reachability refers to the request graph, not the relationship graph.
*
* Data in the WarpDrive Cache can effectively be traversed as two separate graphs:
* - the graph of which requests include which resources
* - the graph of relationships between resources
*
* In our GC, we ignore the relationship graph and focus on the request graph, treating the
* ResourceDocuments as the roots. If the CacheKey for a ResourceDocument has been marked,
* and no other ResourceDocument can reach that document via a relationship of a resource it
* contains directly, then it can be GC'd.
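*
* One possible reading of this rule, sketched under the assumption that a
* relationship on a resource may reference another document (for example one
* loaded via a related link). `RequestGraph`, `documentResources`,
* `relationshipDocuments` and `canCollectDocument` are hypothetical names.
*
```typescript
// Hypothetical request-graph model for illustration only.
type DocKey = string;
type ResourceKey = string;

interface RequestGraph {
  // Resources that each document contains directly.
  documentResources: Map<DocKey, Set<ResourceKey>>;
  // Assumption: documents referenced by a resource's relationships.
  relationshipDocuments: Map<ResourceKey, Set<DocKey>>;
}

function canCollectDocument(
  doc: DocKey,
  marked: Set<DocKey>,
  graph: RequestGraph
): boolean {
  // Only documents whose UI Object was finalized are considered at all.
  if (!marked.has(doc)) return false;

  for (const [other, resources] of graph.documentResources) {
    if (other === doc) continue;
    for (const resource of resources) {
      // Another document reaches `doc` through a directly contained resource.
      if (graph.relationshipDocuments.get(resource)?.has(doc)) return false;
    }
  }
  return true;
}
```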
*
* We ignore the relationship graph specifically because every cache insertion is an upsert
* operation on resources. Over time lots of resources become reachable from each other in
* the cache that were not originally reachable from each other in the request graph. While
* this is useful for efficient storage, retrieval, and mutation management, it makes the
* relationship graph mostly useless for GC purposes. We care not about what is "physically"
* reachable from a resource but what is "conceptually" reachable from the application's
* perspective, trusting what the application has previously told us via the request graph.
*
* Resources are a bit trickier because there are a few edge cases we have to keep in mind:
* - A resource may have been created on the client and thus not yet be part of any document
* except within the mutated state of a relationship.
* - A resource may have been added to the state of a relationship on a record within a document
* it was not originally part of.
* - A resource may have been added to the cache but never materialized into a UIRecord.
*
* The third case is the most interesting because it means that we could quite easily end up
* with orphaned state in the cache if all we rely on is the mark from the FinalizationRegistry
* to generate the list of candidates for GC. For this reason, we always consider any resource
* that was never materialized into a UIRecord as a candidate for GC and initialize its state as
* marked.
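*
* A small sketch of that initial state. The hook names (`onCacheInsert`,
* `onMaterialized`, `onFinalized`) are hypothetical, and clearing the mark on
* materialization is an assumption of this sketch.
*
```typescript
// Hypothetical bookkeeping for illustration only.
type CacheKey = { lid: string };

class ResourceMarks {
  private known = new Set<CacheKey>();
  private marked = new Set<CacheKey>();

  // Cache insertion is an upsert, so only the first insertion sets initial state.
  onCacheInsert(key: CacheKey): void {
    if (this.known.has(key)) return;
    this.known.add(key);
    // No UIRecord exists yet, so no finalization could ever mark this key:
    // start out marked instead.
    this.marked.add(key);
  }

  // Assumption: once a UIRecord exists, the key is unmarked until it is finalized.
  onMaterialized(key: CacheKey): void {
    this.marked.delete(key);
  }

  onFinalized(key: CacheKey): void {
    this.marked.add(key);
  }

  isMarked(key: CacheKey): boolean {
    return this.marked.has(key);
  }
}
```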
*
* With the above in mind, we iterate the list of marked CacheKeys for resources: if the resource
* no longer belongs to any known request, we consider it a candidate.
*
* - if the candidate is not in any relationships, we remove it from the cache
* - if the candidate is only in implicit relationships, we remove it from the cache
* - if the candidate is in a relationship with a non-candidate due to a mutation, we keep it and
* ensure it is added to that document's list of resources. We also add it to a temporary "kept" list.
* - if the candidate is only in relationships with other candidates, it continues to be a candidate.
*
* Once we have iterated all candidates, if no records were kept, then we can remove all remaining
* candidates. If records were kept, we do another pass. Kept records become "non-candidates".
*
* - if the candidate is in a relationship with a kept record, and the document the kept record
* was added to has other resources of the same type, we keep the candidate and ensure it is added
* to that document's list of resources. We also add it to a new "kept" list.
* - if the candidate is only in relationships with other candidates, it continues to be a candidate.
*
* We repeat the above process until no new records are kept in a pass, at which point we can remove
* all remaining candidates.
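*
* The passes above can be sketched as a fixpoint loop. This is a simplified
* model under several assumptions: every name is hypothetical, each
* non-candidate belongs to exactly one document (`documentOf`), each resource
* lists its own relationship edges, and "same type" is read as the type of the
* candidate being evaluated.
*
```typescript
// Hypothetical data model for illustration only.
type Key = string;
interface Edge { to: Key; implicit: boolean; mutated: boolean }
interface Resource { type: string; edges: Edge[] }
interface GCState {
  resources: Map<Key, Resource>;
  documents: Map<string, Set<Key>>; // document key -> resources it lists
  documentOf: Map<Key, string>;     // non-candidates only
  candidates: Set<Key>;             // marked resources in no known request
}

function collectResources(state: GCState): Key[] {
  const remaining = new Set(state.candidates);
  const removed: Key[] = [];
  const explicitEdges = (key: Key): Edge[] =>
    (state.resources.get(key)?.edges ?? []).filter((edge) => !edge.implicit);

  // Keeping a record attaches it to a document, making it a non-candidate.
  const keepAll = (kept: Map<Key, string>): void => {
    kept.forEach((doc, key) => {
      remaining.delete(key);
      state.documentOf.set(key, doc);
      state.documents.get(doc)?.add(key);
    });
  };

  // First pass: drop unrelated candidates, keep those tied to a
  // non-candidate through a mutation.
  let kept = new Map<Key, string>();
  for (const key of Array.from(remaining)) {
    const edges = explicitEdges(key);
    if (edges.length === 0) {
      // no relationships, or only implicit ones
      remaining.delete(key);
      removed.push(key);
      continue;
    }
    const anchor = edges.find((edge) => edge.mutated && state.documentOf.has(edge.to));
    const doc = anchor ? state.documentOf.get(anchor.to) : undefined;
    if (doc !== undefined) kept.set(key, doc);
  }
  keepAll(kept);

  // Later passes: keep candidates related to a record kept in the previous
  // pass when that record's document lists a resource of the candidate's type.
  while (kept.size > 0) {
    const next = new Map<Key, string>();
    for (const key of Array.from(remaining)) {
      const type = state.resources.get(key)?.type;
      for (const edge of explicitEdges(key)) {
        const doc = kept.get(edge.to);
        if (doc === undefined) continue;
        const members = state.documents.get(doc) ?? new Set<Key>();
        if (Array.from(members).some((m) => state.resources.get(m)?.type === type)) {
          next.set(key, doc);
          break;
        }
      }
    }
    keepAll(next);
    kept = next;
  }

  // No new records were kept: everything still remaining is removed.
  remaining.forEach((key) => removed.push(key));
  removed.forEach((key) => state.resources.delete(key));
  return removed;
}
```
*
* The loop terminates because every pass that keeps at least one record
* shrinks the set of remaining candidates.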
*
* This process may occasionally result in keeping more records than the application actually needed
* us to keep; however, it also ensures that we do not remove records that the application still needs
* and provides a way to ensure that those records are still capable of being GC'd in the future.
*
* @internal
*/