Commit

Merge branch 'main' into optimize_nth_and_nthback_for_boundlistiter
Owen-CH-Leung committed Jan 9, 2025
2 parents 40d38f3 + c0f08c2 commit 1b19616
Showing 54 changed files with 1,523 additions and 376 deletions.
16 changes: 13 additions & 3 deletions .github/workflows/ci.yml
@@ -656,14 +656,17 @@ jobs:
# ubuntu x86_64 -> windows x86_64
- os: "ubuntu-latest"
target: "x86_64-pc-windows-gnu"
flags: "-i python3.12 --features abi3 --features generate-import-lib"
manylinux: off
flags: "-i python3.12 --features generate-import-lib"
# macos x86_64 -> aarch64
- os: "macos-13" # last x86_64 macos runners
target: "aarch64-apple-darwin"
# macos aarch64 -> x86_64
- os: "macos-latest"
target: "x86_64-apple-darwin"
# windows x86_64 -> aarch64
- os: "windows-latest"
target: "aarch64-pc-windows-msvc"
flags: "-i python3.12 --features generate-import-lib"
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
@@ -677,11 +680,18 @@ jobs:
- name: Setup cross-compiler
if: ${{ matrix.target == 'x86_64-pc-windows-gnu' }}
run: sudo apt-get install -y mingw-w64 llvm
- uses: PyO3/maturin-action@v1
- name: Compile version-specific library
uses: PyO3/maturin-action@v1
with:
target: ${{ matrix.target }}
manylinux: ${{ matrix.manylinux }}
args: --release -m examples/maturin-starter/Cargo.toml ${{ matrix.flags }}
- name: Compile abi3 library
uses: PyO3/maturin-action@v1
with:
target: ${{ matrix.target }}
manylinux: ${{ matrix.manylinux }}
args: --release -m examples/maturin-starter/Cargo.toml --features abi3 ${{ matrix.flags }}

test-cross-compilation-windows:
needs: [fmt]
7 changes: 6 additions & 1 deletion build.rs
@@ -1,7 +1,9 @@
use std::env;

use pyo3_build_config::pyo3_build_script_impl::{cargo_env_var, errors::Result};
use pyo3_build_config::{bail, print_feature_cfgs, InterpreterConfig};
use pyo3_build_config::{
add_python_framework_link_args, bail, print_feature_cfgs, InterpreterConfig,
};

fn ensure_auto_initialize_ok(interpreter_config: &InterpreterConfig) -> Result<()> {
if cargo_env_var("CARGO_FEATURE_AUTO_INITIALIZE").is_some() && !interpreter_config.shared {
@@ -42,6 +44,9 @@ fn configure_pyo3() -> Result<()> {
// Emit cfgs like `invalid_from_utf8_lint`
print_feature_cfgs();

// Make `cargo test` etc work on macOS with Xcode bundled Python
add_python_framework_link_args();

Ok(())
}

22 changes: 11 additions & 11 deletions guide/src/building-and-distribution.md
@@ -144,7 +144,17 @@ rustflags = [
]
```

Using the MacOS system python3 (`/usr/bin/python3`, as opposed to python installed via homebrew, pyenv, nix, etc.) may result in runtime errors such as `Library not loaded: @rpath/Python3.framework/Versions/3.8/Python3`. These can be resolved with another addition to `.cargo/config.toml`:
Using the MacOS system python3 (`/usr/bin/python3`, as opposed to python installed via homebrew, pyenv, nix, etc.) may result in runtime errors such as `Library not loaded: @rpath/Python3.framework/Versions/3.8/Python3`.

The easiest way to set the correct linker arguments is to add a `build.rs` with the following content:

```rust,ignore
fn main() {
pyo3_build_config::add_python_framework_link_args();
}
```

Alternatively, it can be resolved with another addition to `.cargo/config.toml`:

```toml
[build]
@@ -153,16 +163,6 @@ rustflags = [
]
```

Alternatively, one can include in `build.rs`:

```rust
fn main() {
println!(
"cargo:rustc-link-arg=-Wl,-rpath,/Library/Developer/CommandLineTools/Library/Frameworks"
);
}
```

For more discussion on and workarounds for MacOS linking problems [see this issue](https://github.com/PyO3/pyo3/issues/1800#issuecomment-906786649).

Finally, don't forget that on MacOS the `extension-module` feature will cause `cargo test` to fail without the `--no-default-features` flag (see [the FAQ](https://pyo3.rs/main/faq.html#i-cant-run-cargo-test-or-i-cant-build-in-a-cargo-workspace-im-having-linker-issues-like-symbol-not-found-or-undefined-reference-to-_pyexc_systemerror)).
42 changes: 31 additions & 11 deletions guide/src/free-threading.md
@@ -156,20 +156,40 @@ freethreaded build, holding a `'py` lifetime means only that the thread is
currently attached to the Python interpreter -- other threads can be
simultaneously interacting with the interpreter.

The main reason for obtaining a `'py` lifetime is to interact with Python
You still need to obtain a `'py` lifetime to interact with Python
objects or call into the CPython C API. If you are not yet attached to the
Python runtime, you can register a thread using the [`Python::with_gil`]
function. Threads created via the Python [`threading`] module do not need to
do this, but all other OS threads that interact with the Python runtime must
explicitly attach using `with_gil` and obtain a `'py` lifetime.

Since there is no GIL in the free-threaded build, releasing the GIL for
long-running tasks is no longer necessary to ensure other threads run, but you
should still detach from the interpreter runtime using [`Python::allow_threads`]
when doing long-running tasks that do not require the CPython runtime. The
garbage collector can only run if all threads are detached from the runtime (in
a stop-the-world state), so detaching from the runtime allows freeing unused
memory.
do this, and PyO3 will handle setting up the [`Python<'py>`] token when CPython
calls into your extension.

### Global synchronization events can cause hangs and deadlocks

The free-threaded build triggers global synchronization events in the following
situations:

* During garbage collection in order to get a globally consistent view of
reference counts and references between objects
* In Python 3.13, when the first background thread is started in
order to mark certain objects as immortal
* When either `sys.settrace` or `sys.setprofile` is called in order to
instrument running code objects and threads
* Before `os.fork()` is called

This is a non-exhaustive list and there may be other situations in future Python
versions that can trigger global synchronization events.

This means that you should detach from the interpreter runtime using
[`Python::allow_threads`] in exactly the same situations as you should detach
from the runtime in the GIL-enabled build: when doing long-running tasks that do
not require the CPython runtime or when doing any task that needs to re-attach
to the runtime (see the [guide
section](parallelism.md#sharing-python-objects-between-rust-threads) that
covers this). In the former case, you would observe a hang on threads that are
waiting on the long-running task to complete, and in the latter case you would
see a deadlock while a thread tries to attach after the runtime triggers a
global synchronization event, but the spawning thread prevents the
synchronization event from completing.
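The hang case can be modelled without Python at all. In the std-only sketch below, which is an analogy and not PyO3's implementation, an `RwLock` stands in for the runtime: attached threads hold a read guard, and a global synchronization event needs the write guard, so it can only proceed once every thread has detached. `Runtime`, `Attached`, `attach`, `detach`, and `synchronize` are hypothetical names chosen to echo `Python::with_gil` and `Python::allow_threads`.

```rust
use std::sync::mpsc;
use std::sync::{RwLock, RwLockReadGuard};
use std::thread;

/// Hypothetical stand-in for the interpreter runtime. Holding a read guard
/// models being attached; taking the write guard models a global
/// synchronization (stop-the-world) event.
pub struct Runtime {
    state: RwLock<()>,
}

/// Proof that the current thread is attached (loosely like `Python<'py>`).
pub struct Attached<'a> {
    rt: &'a Runtime,
    guard: Option<RwLockReadGuard<'a, ()>>,
}

impl Runtime {
    pub fn new() -> Self {
        Runtime { state: RwLock::new(()) }
    }

    /// Run `f` while attached (loosely like `Python::with_gil`).
    pub fn attach<R>(&self, f: impl FnOnce(&mut Attached<'_>) -> R) -> R {
        let mut attached = Attached { rt: self, guard: Some(self.state.read().unwrap()) };
        f(&mut attached)
    }

    /// A stop-the-world event: blocks until no thread is attached.
    pub fn synchronize<R>(&self, f: impl FnOnce() -> R) -> R {
        let _all_stopped = self.state.write().unwrap();
        f()
    }
}

impl<'a> Attached<'a> {
    /// Detach for the duration of `f`, then re-attach
    /// (loosely like `Python::allow_threads`).
    pub fn detach<R>(&mut self, f: impl FnOnce() -> R) -> R {
        self.guard = None; // drop the read guard: no longer attached
        let result = f();
        let rt = self.rt;
        self.guard = Some(rt.state.read().unwrap()); // re-attach
        result
    }
}

/// Returns true if a synchronization event completed while a worker was in
/// the middle of runtime-free work.
pub fn sync_event_completes_while_worker_detached() -> bool {
    let runtime = Runtime::new();
    let rt = &runtime;
    let (detached_tx, detached_rx) = mpsc::channel();
    let (sync_done_tx, sync_done_rx) = mpsc::channel::<()>();
    thread::scope(|s| {
        let worker = s.spawn(move || {
            rt.attach(|attached| {
                attached.detach(|| {
                    detached_tx.send(()).unwrap();
                    // Long-running work that does not touch the runtime would
                    // go here; the sketch just waits for the event to finish.
                    sync_done_rx.recv().unwrap();
                    42u64
                })
            })
        });
        detached_rx.recv().unwrap(); // the worker is now detached
        let event_ran = rt.synchronize(|| true);
        sync_done_tx.send(()).unwrap();
        worker.join().unwrap() == 42 && event_ran
    })
}
```

If the `detach` call were removed, the worker would keep its read guard while waiting, `synchronize` could never take the write guard, and the function would never return: the same shape as the hang described above.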

### Exceptions and panics for multithreaded access of mutable `pyclass` instances

59 changes: 58 additions & 1 deletion guide/src/parallelism.md
@@ -1,6 +1,6 @@
# Parallelism

CPython has the infamous [Global Interpreter Lock](https://docs.python.org/3/glossary.html#term-global-interpreter-lock), which prevents several threads from executing Python bytecode in parallel. This makes threading in Python a bad fit for [CPU-bound](https://en.wikipedia.org/wiki/CPU-bound) tasks and often forces developers to accept the overhead of multiprocessing.
CPython has the infamous [Global Interpreter Lock](https://docs.python.org/3/glossary.html#term-global-interpreter-lock) (GIL), which prevents several threads from executing Python bytecode in parallel. This makes threading in Python a bad fit for [CPU-bound](https://en.wikipedia.org/wiki/CPU-bound) tasks and often forces developers to accept the overhead of multiprocessing. There is an experimental "free-threaded" version of CPython 3.13 that does not have a GIL; see the PyO3 docs on [free-threaded Python](./free-threading.md) for more information.

In PyO3, parallelism can be easily achieved in Rust-only code. Let's take a look at our [word-count](https://github.com/PyO3/pyo3/blob/main/examples/word-count/src/lib.rs) example, where we have a `search` function that utilizes the [rayon](https://github.com/rayon-rs/rayon) crate to count words in parallel.
```rust,no_run
@@ -117,4 +117,61 @@ test_word_count_python_sequential 27.3985 (15.82) 45.452

You can see that the Python threaded version is not much slower than the Rust sequential version, which means compared to an execution on a single CPU core the speed has doubled.

## Sharing Python objects between Rust threads

In the example above we made a Python interface to a low-level Rust function,
and then leveraged the Python `threading` module to run the low-level function
in parallel. It is also possible to spawn threads in Rust that acquire the GIL
and operate on Python objects. However, care must be taken to avoid writing code
that deadlocks with the GIL in these cases.

* Note: This example is meant to illustrate how to drop and re-acquire the GIL
to avoid creating deadlocks. Unless the spawned threads subsequently
release the GIL or you are using the free-threaded build of CPython, you
will not see any speedups due to multi-threaded parallelism using `rayon`
to parallelize code that acquires and holds the GIL for the entire
execution of the spawned thread.

In the example below, we share a `Vec` of User ID objects defined using the
`pyclass` macro and spawn threads to process the collection of data into a `Vec`
of booleans based on a predicate using a rayon parallel iterator:

```rust,no_run
use pyo3::prelude::*;
// These traits provide `par_iter`, `map`, and `collect`
use rayon::iter::{IntoParallelRefIterator, ParallelIterator};
#[pyclass]
struct UserID {
id: i64,
}
let allowed_ids: Vec<bool> = Python::with_gil(|outer_py| {
let instances: Vec<Py<UserID>> = (0..10).map(|x| Py::new(outer_py, UserID { id: x }).unwrap()).collect();
outer_py.allow_threads(|| {
instances.par_iter().map(|instance| {
Python::with_gil(|inner_py| {
instance.borrow(inner_py).id > 5
})
}).collect()
})
});
assert!(allowed_ids.into_iter().filter(|b| *b).count() == 4);
```

It's important to note that there is an `outer_py` GIL lifetime token as well as
an `inner_py` token. Sharing GIL lifetime tokens between threads is not allowed
and threads must individually acquire the GIL to access data wrapped by a Python
object.

It's also important to see that this example uses [`Python::allow_threads`] to
wrap the code that spawns OS threads via `rayon`. If this example didn't use
`allow_threads`, a rayon worker thread would block on acquiring the GIL while a
thread that owns the GIL spins forever waiting for the result of the rayon
thread. Calling `allow_threads` allows the GIL to be released in the thread
collecting the results from the worker threads. You should always call
`allow_threads` in situations that spawn worker threads, but especially so in
cases where worker threads need to acquire the GIL, to prevent deadlocks.
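The deadlock described above is not specific to Python: any non-reentrant lock behaves the same way. The std-only sketch below uses a `Mutex` as a hypothetical stand-in for the GIL and `std::thread::scope` instead of rayon; `count_allowed` is an illustrative name, not part of PyO3. It mirrors only the structure of the PyO3 example (acquire, release before fanning out, re-acquire in each worker), not PyO3's actual GIL machinery.

```rust
use std::sync::Mutex;
use std::thread;

/// Counts how many values are greater than 5, checking each one on its own
/// worker thread. `gil` is a std `Mutex` standing in for the GIL: every
/// access to the "Python objects" inside it requires holding the lock.
pub fn count_allowed(gil: &Mutex<Vec<i64>>) -> usize {
    // Like entering `Python::with_gil`: the calling thread holds the lock.
    let guard = gil.lock().unwrap();
    let len = guard.len();

    // Like `allow_threads`: release the lock before waiting on workers that
    // need it. Without this `drop`, each worker would block in `lock()` while
    // this thread blocked in `join()` still holding the lock: a deadlock.
    drop(guard);

    let allowed: Vec<bool> = thread::scope(|s| {
        let handles: Vec<_> = (0..len)
            .map(|i| {
                s.spawn(move || {
                    // Each worker acquires the lock independently; the
                    // temporary guard is released at the end of the statement.
                    let value = gil.lock().unwrap()[i];
                    value > 5
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    allowed.into_iter().filter(|&ok| ok).count()
}
```

Called with `Mutex::new((0..10).collect::<Vec<i64>>())`, this counts the values 6 through 9, matching the `== 4` assertion in the PyO3 example.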

[`Python::allow_threads`]: {{#PYO3_DOCS_URL}}/pyo3/marker/struct.Python.html#method.allow_threads
13 changes: 13 additions & 0 deletions guide/src/performance.md
@@ -97,6 +97,19 @@ impl PartialEq<Foo> for FooBound<'_> {
}
```

## Calling Python callables (`__call__`)
CPython supports multiple calling protocols: [`tp_call`] and [`vectorcall`]. [`vectorcall`] is a more efficient protocol that enables faster calls.
PyO3 will try to dispatch Python `call`s using the [`vectorcall`] calling convention to achieve maximum performance where possible, falling back to [`tp_call`] otherwise.
This is implemented using the (internal) `PyCallArgs` trait. It defines how Rust types can be used as Python `call` arguments. This trait is currently implemented for:
- Rust tuples, where each member implements `IntoPyObject`
- `Bound<'_, PyTuple>`
- `Py<PyTuple>`

Rust tuples may make use of [`vectorcall`], whereas `Bound<'_, PyTuple>` and `Py<PyTuple>` can only use [`tp_call`]. For maximum performance, prefer using Rust tuples as arguments.
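The idea of letting the argument type pick the calling path at compile time can be illustrated with a small, self-contained model. This is a hypothetical sketch, not PyO3's internal `PyCallArgs`: `CallArgs`, `Protocol`, `TupleObject`, and `call` are illustrative names, integers stand in for Python objects, and the two paths only mimic the shape of [`vectorcall`] (arguments passed as a contiguous array) and [`tp_call`] (an existing tuple passed through).

```rust
/// Which calling path a call took in this model.
#[derive(Debug, PartialEq)]
pub enum Protocol {
    Vectorcall,
    TpCall,
}

/// Hypothetical stand-in for an already-built Python tuple object.
pub struct TupleObject(pub Vec<i64>);

/// Hypothetical argument-dispatch trait: each implementing type decides
/// which calling path it uses.
pub trait CallArgs {
    fn call_with(self, f: fn(&[i64]) -> i64) -> (Protocol, i64);
}

impl CallArgs for (i64, i64) {
    fn call_with(self, f: fn(&[i64]) -> i64) -> (Protocol, i64) {
        // The arguments fit in a stack array passed by pointer; no
        // intermediate tuple object is built (vectorcall-style path).
        let args = [self.0, self.1];
        (Protocol::Vectorcall, f(&args))
    }
}

impl CallArgs for TupleObject {
    fn call_with(self, f: fn(&[i64]) -> i64) -> (Protocol, i64) {
        // An existing tuple object is handed over as-is (tp_call-style path).
        (Protocol::TpCall, f(&self.0))
    }
}

/// Generic entry point, loosely analogous to `call` being bounded on a
/// calling-arguments trait.
pub fn call<A: CallArgs>(args: A, f: fn(&[i64]) -> i64) -> (Protocol, i64) {
    args.call_with(f)
}

/// Example callee used in the usage below.
pub fn sum(args: &[i64]) -> i64 {
    args.iter().sum()
}
```

Because dispatch happens through a trait bound, the choice of path costs nothing at runtime: `call((2i64, 3i64), sum)` takes the array path, while `call(TupleObject(vec![2, 3]), sum)` takes the pass-through path, and both compute the same result.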


[`tp_call`]: https://docs.python.org/3/c-api/call.html#the-tp-call-protocol
[`vectorcall`]: https://docs.python.org/3/c-api/call.html#the-vectorcall-protocol

## Disable the global reference pool

PyO3 uses global mutable state to keep track of deferred reference count updates implied by `impl<T> Drop for Py<T>` being called without the GIL being held. The necessary synchronization to obtain and apply these reference count updates when PyO3-based code next acquires the GIL is somewhat expensive and can become a significant part of the cost of crossing the Python-Rust boundary.
1 change: 1 addition & 0 deletions newsfragments/4768.added.md
@@ -0,0 +1 @@
Added `PyCallArgs` trait for arguments into the Python calling protocol. This enables a faster calling convention to be used for certain types, improving performance.
1 change: 1 addition & 0 deletions newsfragments/4768.changed.md
@@ -0,0 +1 @@
`PyAnyMethods::call` and friends now require `PyCallArgs` for their positional arguments.
4 changes: 4 additions & 0 deletions newsfragments/4788.fixed.md
@@ -0,0 +1,4 @@
* Fixed thread-unsafe access of dict internals in BoundDictIterator on the
free-threaded build.
* Avoided creating unnecessary critical sections in BoundDictIterator
implementation on the free-threaded build.
3 changes: 3 additions & 0 deletions newsfragments/4789.added.md
@@ -0,0 +1,3 @@
* Added `PyList::locked_for_each`, which is equivalent to `PyList::for_each` on
the GIL-enabled build and uses a critical section to lock the list on the
free-threaded build, similar to `PyDict::locked_for_each`.
2 changes: 2 additions & 0 deletions newsfragments/4789.changed.md
@@ -0,0 +1,2 @@
* Operations that process a PyList via an iterator now use a critical section
on the free-threaded build to amortize synchronization cost and prevent race conditions.
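The two newsfragments above describe a trade-off between synchronizing per item and locking once for the whole traversal. The std-only sketch below models that trade-off with a `Mutex`; `SharedList`, `for_each_per_item`, and the acquisition counter are hypothetical, and PyO3's free-threaded implementation uses CPython critical sections rather than a Rust mutex, so only the acquire-once versus acquire-per-item structure carries over.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Mutex, MutexGuard};

/// Hypothetical shared list: a mutex-protected `Vec` plus a counter of how
/// often the lock is taken, so the two iteration strategies can be compared.
pub struct SharedList {
    items: Mutex<Vec<i64>>,
    acquisitions: AtomicUsize,
}

impl SharedList {
    pub fn new(items: Vec<i64>) -> Self {
        SharedList { items: Mutex::new(items), acquisitions: AtomicUsize::new(0) }
    }

    fn lock(&self) -> MutexGuard<'_, Vec<i64>> {
        self.acquisitions.fetch_add(1, Ordering::Relaxed);
        self.items.lock().unwrap()
    }

    /// One lock acquisition per element (plus one to detect the end).
    /// Another thread could push or remove items between steps, so elements
    /// may be skipped or visited twice.
    pub fn for_each_per_item(&self, mut f: impl FnMut(i64)) {
        let mut i = 0;
        loop {
            // The guard is a temporary, released at the end of this statement.
            let item = match self.lock().get(i) {
                Some(&x) => x,
                None => break,
            };
            f(item);
            i += 1;
        }
    }

    /// One acquisition for the whole traversal, in the spirit of
    /// `locked_for_each`: no other thread can mutate the list mid-iteration.
    /// With a std `Mutex`, `f` must not lock this same list again.
    pub fn locked_for_each(&self, mut f: impl FnMut(i64)) {
        let guard = self.lock();
        for &x in guard.iter() {
            f(x);
        }
    }

    pub fn acquisitions(&self) -> usize {
        self.acquisitions.load(Ordering::Relaxed)
    }
}
```

For a three-element list, `for_each_per_item` takes the lock four times, while `locked_for_each` takes it once; the price of the single acquisition is that the callback runs while the lock is held.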
1 change: 1 addition & 0 deletions newsfragments/4800.fixed.md
@@ -0,0 +1 @@
fix: cross-compilation compatibility checks for Windows
1 change: 1 addition & 0 deletions newsfragments/4802.fixed.md
@@ -0,0 +1 @@
Fixed missing struct fields on GraalPy when subclassing builtin classes
1 change: 1 addition & 0 deletions newsfragments/4808.fixed.md
@@ -0,0 +1 @@
Fix generating import lib for python3.13t when `abi3` feature is enabled.
1 change: 1 addition & 0 deletions newsfragments/4814.fixed.md
@@ -0,0 +1 @@
`derive(FromPyObject)` supports raw identifiers like `r#box`.
1 change: 1 addition & 0 deletions newsfragments/4822.changed.md
@@ -0,0 +1 @@
Bumped `target-lexicon` dependency to 0.13
1 change: 1 addition & 0 deletions newsfragments/4832.fixed.md
@@ -0,0 +1 @@
`#[pyclass]` complex enums support more than 12 variant fields.
1 change: 1 addition & 0 deletions newsfragments/4833.added.md
@@ -0,0 +1 @@
Add `pyo3_build_config::add_python_framework_link_args` build script API to set rpath when using macOS system Python.
8 changes: 4 additions & 4 deletions pyo3-build-config/Cargo.toml
@@ -13,12 +13,12 @@ rust-version = "1.63"

[dependencies]
once_cell = "1"
python3-dll-a = { version = "0.2.11", optional = true }
target-lexicon = "0.12.14"
python3-dll-a = { version = "0.2.12", optional = true }
target-lexicon = "0.13"

[build-dependencies]
python3-dll-a = { version = "0.2.11", optional = true }
target-lexicon = "0.12.14"
python3-dll-a = { version = "0.2.12", optional = true }
target-lexicon = "0.13"

[features]
default = []