Skip to content

Commit

Permalink
More thread-safe GC (#529)
Browse files Browse the repository at this point in the history
* slightly more thread safe gc

* use Channel not Vector and make disable/enable a no-op

* document GCHook

* cannot lock channels on julia 1.6

* revert to using a vector for the queue

* restore test script

* combine queue into a single item

* prefer Fix2 over anonymous function

* update docs

* test multithreaded

* test gc from python

* add gc tests

* fix test

* add deprecation warnings

* safer locking (plus explanatory comments)

* ref of weakref

* SpinLock -> ReentrantLock

* SpinLock -> ReentrantLock

* typo: testset -> testitem

* delete redundant test

* remove out of date comment

* comment erroneous test

---------

Co-authored-by: Christopher Doris <github.com/cjdoris>
  • Loading branch information
cjdoris authored Aug 2, 2024
1 parent ce8bd7a commit bcd2bbb
Show file tree
Hide file tree
Showing 8 changed files with 188 additions and 39 deletions.
1 change: 1 addition & 0 deletions .github/workflows/tests-nightly.yml
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,7 @@ jobs:
- uses: julia-actions/julia-runtest@v1
env:
JULIA_DEBUG: PythonCall
JULIA_NUM_THREADS: '2'
- uses: julia-actions/julia-processcoverage@v1
- uses: codecov/codecov-action@v1
with:
Expand Down
3 changes: 3 additions & 0 deletions .github/workflows/tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,7 @@ jobs:
uses: julia-actions/julia-runtest@v1
env:
JULIA_DEBUG: PythonCall
JULIA_NUM_THREADS: '2'
- name: Process coverage
uses: julia-actions/julia-processcoverage@v1
- name: Upload coverage to Codecov
Expand Down Expand Up @@ -82,6 +83,8 @@ jobs:
- name: Run tests
run: |
pytest -s --nbval --cov=pysrc ./pytest/
env:
PYTHON_JULIACALL_THREADS: '2'
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v2
env:
Expand Down
30 changes: 17 additions & 13 deletions docs/src/faq.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,19 +4,23 @@

No.

Some rules if you are writing multithreaded code:
- Only call Python functions from the first thread.
- You probably also need to call `PythonCall.GC.disable()` on the main thread before any
threaded block of code. Remember to call `PythonCall.GC.enable()` again afterwards.
(This is because Julia finalizers can be called from any thread.)
- Julia intentionally causes segmentation faults as part of the GC safepoint mechanism.
If unhandled, these segfaults will result in termination of the process. To enable signal handling,
set `PYTHON_JULIACALL_HANDLE_SIGNALS=yes` before any calls to import juliacall. This is equivalent
to starting julia with `julia --handle-signals=yes`, the default behavior in Julia.
See discussion [here](https://github.com/JuliaPy/PythonCall.jl/issues/219#issuecomment-1605087024) for more information.
- You may still encounter problems.

Related issues: [#201](https://github.com/JuliaPy/PythonCall.jl/issues/201), [#202](https://github.com/JuliaPy/PythonCall.jl/issues/202)
However it is safe to use PythonCall with Julia with multiple threads, provided you only
call Python code from the first thread. (Before v0.9.22, tricks such as disabling the
garbage collector were required.)

From Python, to use JuliaCall with multiple threads you probably need to set
[`PYTHON_JULIACALL_HANDLE_SIGNALS=yes`](@ref julia-config) before importing JuliaCall.
This is because Julia intentionally causes segmentation faults as part of the GC
safepoint mechanism. If unhandled, these segfaults will result in termination of the
process. This is equivalent to starting julia with `julia --handle-signals=yes`, the
default behavior in Julia. See discussion
[here](https://github.com/JuliaPy/PythonCall.jl/issues/219#issuecomment-1605087024)
for more information.

Related issues:
[#201](https://github.com/JuliaPy/PythonCall.jl/issues/201),
[#202](https://github.com/JuliaPy/PythonCall.jl/issues/202),
[#529](https://github.com/JuliaPy/PythonCall.jl/pull/529)

## Issues when Numpy arrays are expected

Expand Down
8 changes: 8 additions & 0 deletions docs/src/releasenotes.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,13 @@
# Release Notes

## Unreleased
* Finalizers are now thread-safe, meaning PythonCall now works in the presence of
multi-threaded Julia code. Previously, tricks such as disabling the garbage collector
were required. Python code must still be called on the main thread.
* `GC.disable()` and `GC.enable()` are now a no-op and deprecated since they are no
longer required for thread-safety. These will be removed in v1.
* Adds `GC.gc()`.

## 0.9.21 (2024-07-20)
* `Serialization.serialize` can use `dill` instead of `pickle` by setting the env var `JULIA_PYTHONCALL_PICKLE=dill`.
* `numpy.bool_` can now be converted to `Bool` and other number types.
Expand Down
23 changes: 23 additions & 0 deletions pytest/test_all.py
Original file line number Diff line number Diff line change
Expand Up @@ -75,3 +75,26 @@ def test_issue_433():
"""
)
assert out == 25

def test_julia_gc():
from juliacall import Main as jl
# We make a bunch of python objects with no reference to them,
# then call GC to try to finalize them.
# We want to make sure we don't segfault.
# We also programmatically check things are working by verifying the queue is empty.
# Debugging note: if you get segfaults, then run the tests with
# `PYTHON_JULIACALL_HANDLE_SIGNALS=yes python3 -X faulthandler -m pytest -p no:faulthandler -s --nbval --cov=pysrc ./pytest/`
# in order to recover a bit more information from the segfault.
jl.seval(
"""
using PythonCall, Test
let
pyobjs = map(pylist, 1:100)
Threads.@threads for obj in pyobjs
finalize(obj)
end
end
GC.gc()
@test isempty(PythonCall.GC.QUEUE.items)
"""
)
125 changes: 100 additions & 25 deletions src/GC/GC.jl
Original file line number Diff line number Diff line change
Expand Up @@ -3,77 +3,152 @@
Garbage collection of Python objects.
See `disable` and `enable`.
See [`gc`](@ref).
"""
module GC

using ..C: C

const ENABLED = Ref(true)
const QUEUE = C.PyPtr[]
const QUEUE = (; items = C.PyPtr[], lock = Threads.SpinLock())
const HOOK = Ref{WeakRef}()

"""
PythonCall.GC.disable()
Disable the PythonCall garbage collector.
Do nothing.
This means that whenever a Python object owned by Julia is finalized, it is not immediately
freed but is instead added to a queue of objects to free later when `enable()` is called.
!!! note
Like most PythonCall functions, you must only call this from the main thread.
Historically this would disable the PythonCall garbage collector. This was required
for safety in multi-threaded code but is no longer needed, so this is now a no-op.
"""
function disable()
ENABLED[] = false
return
Base.depwarn(
"disabling the PythonCall GC is no longer needed for thread-safety",
:disable,
)
nothing
end

"""
PythonCall.GC.enable()
Re-enable the PythonCall garbage collector.
Do nothing.
This frees any Python objects which were finalized while the GC was disabled, and allows
objects finalized in the future to be freed immediately.
!!! note
Like most PythonCall functions, you must only call this from the main thread.
Historically this would enable the PythonCall garbage collector. This was required
for safety in multi-threaded code but is no longer needed, so this is now a no-op.
"""
function enable()
ENABLED[] = true
if !isempty(QUEUE)
for ptr in QUEUE
Base.depwarn(
"disabling the PythonCall GC is no longer needed for thread-safety",
:enable,
)
nothing
end

"""
PythonCall.GC.gc()
Free any Python objects waiting to be freed.
These are objects that were finalized from a thread that was not holding the Python
GIL at the time.
Like most PythonCall functions, this must only be called from the main thread (i.e. the
thread currently holding the Python GIL.)
"""
function gc()
if C.CTX.is_initialized
unsafe_free_queue()
end
nothing
end

function unsafe_free_queue()
Base.@lock QUEUE.lock begin
for ptr in QUEUE.items
if ptr != C.PyNULL
C.Py_DecRef(ptr)
end
end
empty!(QUEUE.items)
end
empty!(QUEUE)
return
nothing
end

function enqueue(ptr::C.PyPtr)
# If the ptr is NULL there is nothing to free.
# If C.CTX.is_initialized is false then the Python interpreter hasn't started yet
# or has been finalized; either way attempting to free will cause an error.
if ptr != C.PyNULL && C.CTX.is_initialized
if ENABLED[]
if C.PyGILState_Check() == 1
# If the current thread holds the GIL, then we can immediately free.
C.Py_DecRef(ptr)
# We may as well also free any other enqueued objects.
if !isempty(QUEUE.items)
unsafe_free_queue()
end
else
push!(QUEUE, ptr)
# Otherwise we push the pointer onto the queue to be freed later, either:
# (a) If a future Python object is finalized on the thread holding the GIL
# in the branch above.
# (b) If the GCHook() object below is finalized in an ordinary GC.
# (c) If the user calls PythonCall.GC.gc().
Base.@lock QUEUE.lock push!(QUEUE.items, ptr)
end
end
return
nothing
end

function enqueue_all(ptrs)
if C.CTX.is_initialized
if ENABLED[]
if any(!=(C.PYNULL), ptrs) && C.CTX.is_initialized
if C.PyGILState_Check() == 1
for ptr in ptrs
if ptr != C.PyNULL
C.Py_DecRef(ptr)
end
end
if !isempty(QUEUE.items)
unsafe_free_queue()
end
else
append!(QUEUE, ptrs)
Base.@lock QUEUE.lock append!(QUEUE.items, ptrs)
end
end
return
nothing
end

"""
GCHook()
An immortal object which frees any pending Python objects when Julia's GC runs.
This works by creating it but not holding any strong reference to it, so it is eligible
to be finalized by Julia's GC. The finalizer empties the PythonCall GC queue if
possible. The finalizer also re-attaches itself, so the object does not actually get
collected and so the finalizer will run again at next GC.
"""
mutable struct GCHook
function GCHook()
finalizer(_gchook_finalizer, new())
end
end

function _gchook_finalizer(x)
if C.CTX.is_initialized
finalizer(_gchook_finalizer, x)
if !isempty(QUEUE.items) && C.PyGILState_Check() == 1
unsafe_free_queue()
end
end
nothing
end

function __init__()
HOOK[] = WeakRef(GCHook())
nothing
end

end # module GC
28 changes: 27 additions & 1 deletion test/GC.jl
Original file line number Diff line number Diff line change
@@ -1 +1,27 @@
# TODO
@testitem "GC.gc()" begin
let
pyobjs = map(pylist, 1:100)
Threads.@threads for obj in pyobjs
finalize(obj)
end
end
# The GC sometimes actually frees everything before this line.
# We can uncomment this line if we GIL.@release the above block once we have it.
# Threads.nthreads() > 1 && @test !isempty(PythonCall.GC.QUEUE.items)
PythonCall.GC.gc()
@test isempty(PythonCall.GC.QUEUE.items)
end

@testitem "GC.GCHook" begin
let
pyobjs = map(pylist, 1:100)
Threads.@threads for obj in pyobjs
finalize(obj)
end
end
# The GC sometimes actually frees everything before this line.
# We can uncomment this line if we GIL.@release the above block once we have it.
# Threads.nthreads() > 1 && @test !isempty(PythonCall.GC.QUEUE.items)
GC.gc()
@test isempty(PythonCall.GC.QUEUE.items)
end
9 changes: 9 additions & 0 deletions test/finalize_test_script.jl
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
using PythonCall

# This would consistently segfault pre-GC-thread-safety
let
pyobjs = map(pylist, 1:100)
Threads.@threads for obj in pyobjs
finalize(obj)
end
end

0 comments on commit bcd2bbb

Please sign in to comment.