Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add GIL release for statistics #3077

Merged
merged 1 commit into from
Jan 9, 2025
Merged

Conversation

hanbin973
Copy link
Contributor

Description

This is a simple addition that releases GIL for a number of statistics methods including genetic_relatedness_vector.
Given this addition, one can use the threading module to enable multi-threading.

import tskit
import msprime
import numpy as np
import time
import matplotlib.pyplot as plt

import threading

def _genetic_relatedness_vector_threads(
        ts: tskit.TreeSequence,
        W: np.ndarray,
        num_threads: int = 4,
        ) -> np.ndarray:

    def _f(chunk, x):
        x += ts.genetic_relatedness_vector(
                W, mode='branch', centre=False, windows=chunk
            )[0]

    x = np.zeros_like(W)
    chunks = np.linspace(0, ts.sequence_length, num_threads+1)
    threads = [
            threading.Thread(target=_f, args=(chunks[j:j+2], x)) for j in range(num_threads)
            ]
    
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    return x

num_cols = [1, 10, 20, 50, 100, 200, 500]
time_thread = []
time_nothread = []
for num_col in num_cols:
    W = np.random.normal(size=(ts.num_samples, num_col))
    
    # threading
    t1 = time.time()
    x = _genetic_relatedness_vector_threads(ts, W)
    t2 = time.time()
    time_thread.append(t2-t1)
    
    # nothreading
    t1 = time.time()
    x = ts.genetic_relatedness_vector(W, mode='branch', centre=False)
    t2 = time.time()
    time_nothread.append(t2-t1)

fig, ax = plt.subplots(figsize=(6,4))
ax.plot(num_cols, time_thread, label='with threading')
ax.plot(num_cols, time_nothread, label='without threading')

ax.set_xlabel('Number of matrix columns', fontsize=12)
ax.set_ylabel('Time (s)', fontsize=12)
ax.set_title('#individual=%d, sequence length=%d, #thread=4' % (num_individuals, seq_length), fontsize=12)

ax.legend(fontsize=12)
plt.show()

image

Copy link

codecov bot commented Jan 8, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.86%. Comparing base (ffecae9) to head (a666b3a).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3077   +/-   ##
=======================================
  Coverage   89.86%   89.86%           
=======================================
  Files          29       29           
  Lines       32155    32157    +2     
  Branches     5768     5768           
=======================================
+ Hits        28895    28897    +2     
  Misses       1859     1859           
  Partials     1401     1401           
Flag Coverage Δ
c-tests 86.71% <ø> (ø)
lwt-tests 80.78% <ø> (ø)
python-c-tests 89.05% <100.00%> (+<0.01%) ⬆️
python-tests 98.98% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
python/_tskitmodule.c 89.05% <100.00%> (+<0.01%) ⬆️

@petrelharp
Copy link
Contributor

LGTM! This should be thread-safe.

Do we want this example in the docs (separate PR)?

Copy link
Member

@benjeffery benjeffery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems we're not touching Python objects in that section so should be good.

@jeromekelleher jeromekelleher added the AUTOMERGE-REQUESTED Ask Mergify to merge this PR label Jan 9, 2025
Copy link
Contributor

mergify bot commented Jan 9, 2025

This pull request has been removed from the queue for the following reason: pull request branch update failed.

The pull request can't be updated

You should look at the reason for the failure and decide if the pull request needs to be fixed or if you want to requeue it.

If you want to requeue this pull request, you need to post a comment with the text: @mergifyio requeue

@benjeffery
Copy link
Member

@Mergifyio requeue

Copy link
Contributor

mergify bot commented Jan 9, 2025

requeue

✅ The queue state of this pull request has been cleaned. It can be re-embarked automatically

Copy link
Contributor

mergify bot commented Jan 9, 2025

This pull request has been removed from the queue for the following reason: pull request branch update failed.

The pull request can't be updated

You should look at the reason for the failure and decide if the pull request needs to be fixed or if you want to requeue it.

If you want to requeue this pull request, you need to post a comment with the text: @mergifyio requeue

@mergify mergify bot merged commit 0427239 into tskit-dev:main Jan 9, 2025
20 of 21 checks passed
@mergify mergify bot removed the AUTOMERGE-REQUESTED Ask Mergify to merge this PR label Jan 9, 2025
@hanbin973
Copy link
Contributor Author

LGTM! This should be thread-safe.

Do we want this example in the docs (separate PR)?

I'd love to contribute. One question is why was the old threading article removed from the docs? My code was merely a lazy adaptation of the one from the old article.

@benjeffery
Copy link
Member

One question is why was the old threading article removed from the docs?

@hyanwong has been moving some things around to places like the tutorials, I assume that was why?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants