Special topic chapter for finalizers and weak references #1265

wks · 2025-01-16T12:05:15Z

This PR adds a special topic chapter to the Porting Guide on supporting finalizers and weak references. This topic is frequently asked about and somewhat complex, so it needs a dedicated chapter.

We also updated the doc comments of the Scanning::process_weak_refs API to add a code example of the intended use case, and to warn users about potential pitfalls.

Added a special topic chapter for how to implement finalizers and weak references with MMTk.

wks · 2025-01-23T07:49:49Z

I have finished the chapter, and I invited @qinsoon and @k-sareen to review it. There are some things I haven't done in this PR, but they can be added later:

- Ephemerons: We have not yet implemented ephemerons in any VM. We need ephemerons for V8, but that's not a priority at this moment. We can add a subsection for ephemerons when we implement them, or when someone needs help implementing ephemerons with MMTk. (Update: I added a section for ephemerons. But since we don't have a VM that actively uses that algorithm, we can't verify that the algorithm is correct.)
- Hyperlinks between the doc comments in the code and the Porting Guide chapter: The porting guide needs to be rendered in order to be published to https://docs.mmtk.io/, and that'll happen after this PR is merged. We can add the URL when the new chapter becomes publicly available.

Maybe I also need to define some concepts in that chapter, such as what exactly a finalizer or a weak reference is. I think the current state is clear enough for people with experience in GC and VM development. But please let me know if anything is not clear.

wks · 2025-01-23T08:43:23Z

It is still arguable whether "retain" or "resurrect" is the better word for keeping an unreachable object alive in a GC. "Resurrect" is more intuitive because we also say that "finalizers can resurrect dead objects". But the word "resurrect" may imply that the object has already died once. That is OK for finalizers because the object has already become unreachable during mutator time, which can be considered "dead". But for soft references, it is a bit awkward to say we "resurrect" the referent of a soft reference, because that implies we considered the referent "dead" before. It is "dead" in the sense that it is not strongly reachable, but softly reachable objects usually live through GCs, as long as the GC is not an emergency GC. Maybe "retain" is better when describing soft references.

In existing code, we use the word "retain" in many places in ReferenceProcessor. There is only one use of "resurrect" in FinalizerProcessor, and it was added by me.

qinsoon

The doc is written with different definitions of 'weak ref' and 'resurrect objects'. I think we should refer to well-accepted literature such as the GC Handbook for such definitions, and be consistent with them. In my opinion, we need a pass through the section to fix the definitions and related descriptions. I would like to do a detailed review after that.

qinsoon · 2025-01-23T09:12:13Z

docs/userguide/src/SUMMARY.md

@@ -36,6 +36,8 @@
    - [Performance Tuning](portingguide/perf_tuning/prefix.md)
        - [Link Time Optimization](portingguide/perf_tuning/lto.md)
        - [Optimizing Allocation](portingguide/perf_tuning/alloc.md)
+    - [Special Topics](portingguide/topics/prefix.md)


I suggest using something like "Runtime Features", or "Language Features" as the title. I don't see how this section is 'special' compared to other sections.

What about "VM-Specific Concerns"?

The GC Handbook has a chapter named Language-Specific Concerns, and it discusses only finalizers and weak references. I think that's because finalizer and weak reference semantics are part of the programming language, and are visible to programmers. Things like conservative stack scanning, object pinning and interior pointers are not all exposed at the language level; they are more related to VM implementations. So "VM-Specific Concerns" may be a better title.

I don't like calling them "Runtime/Language Features" because "features" are things meant to be used by their users, but things like stack scanning are peculiar aspects of those runtimes/languages that their implementers should care about.

And I think "VM peculiarities" also describes what the chapters in this part are about. But it sounds offensive to the VMs.

qinsoon · 2025-01-24T00:05:01Z

docs/userguide/src/portingguide/topics/prefix.md

@@ -0,0 +1,5 @@
+# Special topics
+
+Every VM is special in some way.  Because of this, some VM bindings may use MMTk features not


If we change the title, the paragraph needs to be changed accordingly. I feel it is more reasonable that this section is dedicated to how to implement different runtime features with MMTk.

qinsoon · 2025-01-24T00:10:09Z

docs/userguide/src/portingguide/topics/weakref.md

+Some VMs support **weak references**.  If an object cannot be reached from roots following only
+strong references, the object will be considered dead.  Weak references to dead objects will be
+cleared, and associated clean-up operations will be executed.  Some VMs also support more complex


The definition for weak references is incorrect.

> If an object cannot be reached from roots following only strong references, the object will be considered dead.

If an object cannot be reached from strong references, it is not "strongly reachable". It is not dead, as it can still be "weakly reachable". Sometimes "reachable" implies strongly reachable. That's fine. But this should not be used in a context that may cause confusion.

> Weak references to dead objects will be cleared, and associated clean-up operations will be executed.

This is indeed confusing. If we have a weak reference to the object, the object is not dead. It is up to the language semantics or the VM to decide whether to keep the weakly-referenced object alive or not. Only when the VM decides not to keep the referent alive is the object/referent dead; otherwise it is alive.

I will rewrite this part using definitions from the GC handbook, and probably add a "Definitions" section before the "Overview" section.

I didn't use the terms "strongly/weakly reachable" because I thought they were Java-specific. But since the GC Handbook also uses that term in the language-neutral introduction section of weak references, I believe those terms should be acceptable for other VMs in general, too.
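To make the distinction discussed above concrete, here is a minimal sketch of "strongly reachable" versus "weakly reachable" on a toy object graph. It is not MMTk code: `ObjId`, `Edge` and `reachable` are hypothetical names invented for this illustration, and the sketch assumes a single kind of weak edge.

```rust
use std::collections::{HashMap, HashSet};

/// Toy object identifier (illustrative only, not an MMTk type).
type ObjId = u32;

/// An outgoing edge of a toy object: either a strong or a weak reference.
#[derive(Clone, Copy, Debug)]
enum Edge {
    Strong(ObjId),
    Weak(ObjId),
}

/// Returns the objects reachable from `roots`.
/// With `follow_weak == false` only strong edges are followed, which yields
/// the strongly reachable set. With `follow_weak == true` weak edges are
/// followed as well.
fn reachable(
    graph: &HashMap<ObjId, Vec<Edge>>,
    roots: &[ObjId],
    follow_weak: bool,
) -> HashSet<ObjId> {
    let mut seen = HashSet::new();
    let mut stack: Vec<ObjId> = roots.to_vec();
    while let Some(obj) = stack.pop() {
        if !seen.insert(obj) {
            continue; // Already visited.
        }
        // Objects without an entry in `graph` have no outgoing edges.
        for edge in graph.get(&obj).into_iter().flatten() {
            match *edge {
                Edge::Strong(target) => stack.push(target),
                Edge::Weak(target) if follow_weak => stack.push(target),
                Edge::Weak(_) => {}
            }
        }
    }
    seen
}
```

Under these assumptions, the weakly reachable objects are those in `reachable(.., true)` but not in `reachable(.., false)`. Whether such objects survive the GC is then a decision for the VM, which is the point made in the comment above.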

qinsoon · 2025-01-24T00:18:35Z

docs/userguide/src/portingguide/topics/weakref.md

+-   **Query forwarded address**: If an object is already reached, the VM binding can further query
+    the new address of an object.  This is needed to support copying GC.
+    +   Do this with `ObjectReference::get_forwarded_object()`.
+-   **Resurrect objects**: If an object is not reached, the VM binding can optionally resurrect the


See my argument above for the definition of weak ref. We do not 'resurrect' weak references.

It is arguable whether we should say we 'resurrect' finalizable objects. The GC needs to keep finalizable objects alive until they are properly finalized. I don't want to call that 'resurrect'; the object is simply not dead from the GC's perspective. Also, 'resurrecting objects' normally refers to the specific application behavior where an object that is being finalized, and should become dead after finalization, is kept alive at the language/application level. Using 'resurrect' in the GC may make things confusing.

GC handbook's definition for 'resurrection': "an action performed by a finaliser that caused the previously unreachable object to become reachable."

I consulted @eliotmoss and read the GC handbook, and they both agree with what you said. "Resurrection" refers to the action in the finalizer that makes the object reachable from other parts of the program by, for example, assigning its reference to a global variable. This is not what tracer.trace_object does. Even if we call trace_object(object), it still doesn't guarantee the object will "resurrect" in that sense. If the finalize() function doesn't leak its reference, the object will still be collectable after finalize() returns.

"Retain" should be a better term, and it is already used in JikesRVM as well as our existing ReferenceProcessor in mmtk-core.
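The operations under discussion (querying the forwarded address of a reached object, and retaining an unreached one) can be sketched with a self-contained toy model. This is not the mmtk-core API: `ToyGc`, `retain`, `process_weak_slots` and `process_finalizable` are hypothetical names, and the `+ 0x1000` "copy" is an arbitrary stand-in for a copying GC moving an object.

```rust
use std::collections::HashMap;

/// Toy address type (illustrative only).
type Addr = usize;

/// Toy stand-in for the GC state a binding can query while processing weak
/// references. It maps every reached object to its post-GC address.
struct ToyGc {
    forwarded: HashMap<Addr, Addr>,
}

impl ToyGc {
    /// Has the object been reached in this GC?
    fn is_reached(&self, obj: Addr) -> bool {
        self.forwarded.contains_key(&obj)
    }

    /// New address of a reached object, or `None` if it was not reached.
    fn forwarded_address(&self, obj: Addr) -> Option<Addr> {
        self.forwarded.get(&obj).copied()
    }

    /// "Retain" an object: keep it alive through this GC and return its new
    /// address. A real tracer would also trace the object's descendants.
    fn retain(&mut self, obj: Addr) -> Addr {
        *self.forwarded.entry(obj).or_insert(obj + 0x1000)
    }
}

/// Weak slots: update slots whose referent was reached, clear the others.
fn process_weak_slots(gc: &ToyGc, slots: &mut [Option<Addr>]) {
    for slot in slots.iter_mut() {
        *slot = match *slot {
            Some(obj) if gc.is_reached(obj) => gc.forwarded_address(obj),
            _ => None,
        };
    }
}

/// Finalizable candidates: unreached objects are retained and returned as
/// ready for finalization; reached objects stay registered at their new address.
fn process_finalizable(gc: &mut ToyGc, candidates: &mut Vec<Addr>) -> Vec<Addr> {
    let mut ready = Vec::new();
    let mut still_registered = Vec::new();
    for obj in candidates.drain(..) {
        match gc.forwarded_address(obj) {
            Some(new_addr) => still_registered.push(new_addr),
            None => ready.push(gc.retain(obj)),
        }
    }
    *candidates = still_registered;
    ready
}
```

Note that `retain` only keeps the object alive through the current GC in this model. Whether the object is "resurrected" in the GC Handbook's sense depends on what the finalizer later does with the reference, which is outside the scope of the sketch.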

qinsoon · 2025-01-24T00:23:58Z

docs/userguide/src/portingguide/topics/weakref.md

+descendants) live through the current GC.  The typical use pattern is:
+
+```rust
+impl<VM: VMBinding> Scanning<VM> for VMScanning {


It is generally preferable to put the code into a test, and include snippets from the test in the doc. That way, the code will always be checked by the tests and the CI. Maybe we cannot run process_weak_refs in our mock tests, but we can still make sure that it compiles.

See here for examples:

mmtk-core/docs/userguide/src/portingguide/perf_tuning/alloc.md

Line 36 in 051bc74

{{#include ../../../../../src/vm/tests/mock_tests/mock_test_doc_mutator_storage.rs:mutator_storage_boxed_pointer}}
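As a rough illustration of the pattern suggested above: mdBook's `{{#include file.rs:name}}` form pulls in only the lines between `// ANCHOR: name` and `// ANCHOR_END: name` comments in the included file. The sketch below shows what such a test file could look like. The anchor name `clear_unreached` and the helper function are hypothetical and are not taken from this PR.

```rust
// Hypothetical test file; the anchor name and the helper are illustrative only.

// ANCHOR: clear_unreached
/// Clears every weak slot whose referent fails the `is_reached` predicate.
fn clear_unreached(slots: &mut [Option<usize>], is_reached: impl Fn(usize) -> bool) {
    for slot in slots.iter_mut() {
        if let Some(obj) = *slot {
            if !is_reached(obj) {
                *slot = None;
            }
        }
    }
}
// ANCHOR_END: clear_unreached

#[cfg(test)]
mod tests {
    use super::*;

    /// Running this in CI keeps the snippet shown in the book compiling and correct.
    #[test]
    fn clears_only_unreached_slots() {
        let mut slots = [Some(1), Some(2), None];
        clear_unreached(&mut slots, |obj| obj == 1);
        assert_eq!(slots, [Some(1), None, None]);
    }
}
```

The chapter would then reference the anchored region with an `{{#include ...:clear_unreached}}` directive pointing at the test file, so only the function between the anchors appears in the rendered guide.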

k-sareen · 2025-01-24T00:42:13Z

I would avoid inventing terminology and just use what is in the GC Handbook. The handbook uses "resurrect" for finalizers and "trace" for soft references. See p. 213 onwards in the first edition of the GC Handbook.

wks added 2 commits January 21, 2025 19:33

Special topic chapter for finalizer & weakref

94f192b

Added a special topic chapter for how to implement finalizers and weak references with MMTk.

State machine and deprecated things

be789b2

wks force-pushed the feature/weak-final-guide branch from b2ea9b3 to be789b2 Compare January 21, 2025 11:51

wks added 7 commits January 22, 2025 16:19

Optimization section

a176a41

Update comments

f1c0970

Minor fixes

29d443c

Use the word "resurrect" consistently

79c8398

Rewrite part of the finalizers section

fc946c8

Revise the weakref and optimization sections

160368c

Use en_US spelling consistently

b9c4086

wks changed the title ~~WIP: Special topic chapter for weak references~~ Special topic chapter for finalizers and weak references Jan 23, 2025

wks marked this pull request as ready for review January 23, 2025 06:24

wks requested review from qinsoon and k-sareen January 23, 2025 06:25

Ephemeron

9ec8d5d

qinsoon reviewed Jan 24, 2025
