---
title: Multi-threading/processing for large array data, in PyTorch
subtitle: what I learnt reviewing cellfinder PR \#440
author: Alessandro Felder, (Matt Einhorn)
execute:
  enabled: true
format:
  revealjs:
    theme: [default, niu-dark.scss]
    logo: img/logo_niu_dark.png
    footer: "Multi-threading/processing | 2024-09-03"
    slide-number: c
    menu:
      numbers: true
    chalkboard: true
    scrollable: true
    preview-links: false
    view-distance: 10
    mobile-view-distance: 10
    auto-animate: true
    auto-play-media: true
    code-overflow: wrap
    highlight-style: atom-one
    mermaid:
      theme: neutral
      fontFamily: arial
      curve: linear
  html:
    theme: [default, niu-dark.scss]
    logo: img/logo_niu_dark.png
    date: "2023-07-05"
    toc: true
    code-overflow: scroll
    highlight-style: atom-one
    mermaid:
      theme: neutral
      fontFamily: arial
      curve: linear
    margin-left: 0
    embed-resources: true
    page-layout: full
my-custom-stuff:
  my-reuseable-variable: "some stuff"
---
## Table of contents
* Context
* Threads, Processes and Queues in Python
* Multiprocessing in PyTorch
* Redesigning multiprocessing for cellfinder, in PyTorch
## Context {.smaller}
::: {.incremental}
* cellfinder classification has moved to `pytorch` (thanks, Igor!)
* Matt (developer at Cornell) has become a regular cellfinder contributor
* knows pytorch
* his lab needs speed (e.g. for cFos-stained whole-brain samples)
* Matt translated the "cell candidate detection steps" to pytorch
* I needed to learn how parallelisation works in pytorch, to review the code.
* turns out I needed to learn Python first!
:::
## Threads versus Processes[^1]
::: {.fragment .fade-in-then-semi-out}
::: {style="margin-top: 1em; font-size: 0.5em;"}
"A process is an instance of a program (e.g. Jupyter notebook, Python interpreter). Processes spawn threads (sub-processes) to handle subtasks like reading keystrokes, loading HTML pages, saving files. Threads live inside processes and share the same memory space."
:::
:::
:::: {.columns}
::: {.column width="50%" style="font-size: 0.5em;"}
::: {.fragment .fade-in}
Processes
::: {.incremental}
* can have multiple threads
* can execute code simultaneously in the same Python program
* have more overhead than threads, as opening and closing processes takes more time
* share information more slowly than threads do, as processes do not share memory space (pickling!)
:::
:::
:::
::: {.column width="50%" style="font-size: 0.5em;"}
::: {.fragment .fade-in}
Threads
::: {.incremental}
* are like mini-processes that live inside a process
* share memory space, so they can read and write the same variables efficiently
* cannot execute Python code simultaneously in the same program, because of the Global Interpreter Lock (although there are workarounds)
:::
:::
:::
::::
[^1]: [Brendan Fortuner on Medium](https://medium.com/@bfortuner/python-multithreading-vs-multiprocessing-73072ce5600b)
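To make the shared-memory point concrete, here is a minimal sketch (not from the original slides): a thread mutates an ordinary list, and the change is immediately visible to the main thread, with no queue or pickling involved.

```{python}
#| echo: true
from threading import Thread

results = []  # an ordinary list in the process's memory

def worker():
    # the thread writes directly into the parent's list:
    # threads share the process's memory space
    results.append("written by thread")

t = Thread(target=worker)
t.start()
t.join()
print(results)  # the mutation is visible in the main thread
```

With a separate *process*, the child would receive a pickled copy of the list, and the parent's copy would stay empty.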
## A Python Queue
::: {style="text-align: center; margin-top: 1em"}
[A Python Queue](https://docs.python.org/3/library/queue.html#queue.Queue){preview-link="true" style="text-align: center"}
:::
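As a concrete illustration of the linked `queue.Queue` API (this producer/consumer sketch is mine, not from the slides): one thread consumes items while the main thread produces them, coordinated via `task_done()` and `join()`, with `None` as an assumed sentinel value.

```{python}
#| echo: true
from queue import Queue
from threading import Thread

q = Queue(maxsize=4)  # bounded queue: put() blocks when full
consumed = []

def consumer():
    while True:
        item = q.get()       # blocks until an item is available
        if item is None:     # sentinel value: stop consuming
            q.task_done()
            break
        consumed.append(item)
        q.task_done()        # mark this item as processed

c = Thread(target=consumer)
c.start()
for i in range(5):
    q.put(i)
q.put(None)  # tell the consumer to stop
q.join()     # wait until every put() item has been task_done()
c.join()
print(consumed)
```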
## Multithreading
```{python}
#| echo: true
# threads share local memory (by default)
from threading import Thread
from queue import Queue


def put_hello_in_queue(q):
    q.put('hello')


if __name__ == '__main__':
    q = Queue()
    print(type(q))
    threads = []
    for i in range(7):
        t = Thread(target=put_hello_in_queue, args=(q,))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    print([q.get() for i in range(q.qsize())])
```
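A higher-level alternative worth knowing about (my addition, not covered in the slides): `concurrent.futures.ThreadPoolExecutor` handles starting, joining and reusing the threads, and collects return values for you.

```{python}
#| echo: true
from concurrent.futures import ThreadPoolExecutor

def say_hello(i):
    return 'hello'

# the executor manages the thread lifecycle internally;
# map() returns results in submission order
with ThreadPoolExecutor(max_workers=7) as pool:
    results = list(pool.map(say_hello, range(7)))
print(results)
```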
## Multiprocessing
```{.python code-line-numbers="1|8-9|14|17"}
import multiprocessing as mp


def put_hello_in_queue(q):
    q.put('hello')


if __name__ == "__main__":
    ctx = mp.get_context('spawn')
    # multiprocessing queue contents are shared across processes
    q = ctx.Queue()
    print(type(q))
    processes = []
    for i in range(7):
        p = ctx.Process(target=put_hello_in_queue, args=(q,))
        processes.append(p)
        p.start()
    for p in processes:
        p.join()
    print([q.get() for i in range(7)])
```
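The `'spawn'` argument selects a *start method*: `spawn` launches a fresh interpreter (slower, but safe and available everywhere), while `fork` copies the parent process (fast, POSIX only) and `forkserver` forks from a clean helper process. A quick way to see what your platform supports (a sketch; the output varies by OS):

```{python}
#| echo: true
import multiprocessing as mp

# all start methods available on this platform
print(mp.get_all_start_methods())
# the method currently in effect (None if not chosen yet)
print(mp.get_start_method(allow_none=True) or "not set yet")
```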
* [Python Multiprocessing module](https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing)
* [Multiprocessing Queue](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Queue)
* [Pickling](https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled)
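Why pickling matters here: anything passed to a `Process` (or through a multiprocessing queue) must be picklable. A minimal sketch of what does and does not survive:

```{python}
#| echo: true
import pickle

# plain built-in data structures pickle fine
payload = {"greeting": "hello", "n": 3}
assert pickle.loads(pickle.dumps(payload)) == payload

# a lambda cannot be pickled, so it cannot cross a process boundary
try:
    pickle.dumps(lambda x: x)
    lambda_picklable = True
except Exception:
    lambda_picklable = False
print("lambda picklable?", lambda_picklable)
```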
## PyTorch multiprocessing
::: {style="text-align: center; margin-top: 1em"}
[torch.multiprocessing is a wrapper around the native multiprocessing module.](https://pytorch.org/docs/stable/multiprocessing.html){preview-link="true" style="text-align: center"}
:::
## PyTorch multiprocessing
::: {style="text-align: center; margin-top: 1em"}
[Sharing CUDA tensors](https://pytorch.org/docs/stable/multiprocessing.html#sharing-cuda-tensors){preview-link="true" style="text-align: center"}
:::
## Cellfinder multiprocessing/threading
New `pytorch`-friendly implementation of parallelisation in cellfinder's cell candidate detection step
::: {style="text-align: center; margin-top: 1em"}
[threading.py](https://github.com/brainglobe/cellfinder/blob/dc1d740589f697680f3868f4a4a0662c1fef1616/cellfinder/core/tools/threading.py){preview-link="true" style="text-align: center"}
:::
::: {style="text-align: center; margin-top: 1em"}
[test_threading.py](https://github.com/brainglobe/cellfinder/blob/dc1d740589f697680f3868f4a4a0662c1fef1616/tests/core/test_unit/test_tools/test_threading.py){preview-link="true" style="text-align: center"}
:::
## Volume Filter
::: {style="text-align: center; margin-top: 1em"}
[Volume filter](https://github.com/brainglobe/cellfinder/blob/dc1d740589f697680f3868f4a4a0662c1fef1616/cellfinder/core/detect/filters/volume/volume_filter.py){preview-link="true" style="text-align: center"}
:::
## Performance and results
::: {style="text-align: center; margin-top: 1em"}
[Matt's benchmarks](https://github.com/brainglobe/cellfinder/pull/440){preview-link="true" style="text-align: center"}
:::
## Performance?
cFos data from Nic Lavoie (MIT) on our HPC, with GPU:
* old version of cellfinder: 9 hours for ~3 million cell candidates
* new version of cellfinder: 2 hours for ~3 million cell candidates
## Next steps
* Turn these slides into docs with nice explanatory images
* Tweak PR 440 (expose extra parameters)
* merge and release!
## Concluding thoughts
* I still don't understand everything
* There are ways to parallelise Python (and PyTorch)
* Processes and threads are appropriate in different situations...
* ... "optimisation" of code is empirical to some extent