Anonymized stats through buckets #878

Open: wants to merge 74 commits into main

Commits (74)
28d1b06
start to add code for bucket comparison
rw0x0 Nov 5, 2024
2613064
progress on buckets
rw0x0 Nov 5, 2024
dc72804
.
rw0x0 Nov 5, 2024
e6d9d75
.
rw0x0 Nov 5, 2024
f71d428
.
rw0x0 Nov 5, 2024
0c3ff87
.
rw0x0 Nov 5, 2024
363248a
.
rw0x0 Nov 6, 2024
82ea801
send/receive u32 with chacah encryption
rw0x0 Nov 6, 2024
538d954
bitinject
rw0x0 Nov 6, 2024
03f7cd8
compare multiple thresholds function done, untested
rw0x0 Nov 6, 2024
9e03c3e
also make <=
rw0x0 Nov 8, 2024
2926cdd
fix: Also fix the threshold comparison
rw0x0 Nov 18, 2024
dfc60d7
Merge branch 'main' of github.com:worldcoin/gpu-iris-mpc into rw/buckets
philsippl Jan 3, 2025
64a3fe4
wip: keep results around
philsippl Jan 3, 2025
9bf5c6d
wip
philsippl Jan 5, 2025
70bce2e
clippy
philsippl Jan 5, 2025
0f26aa5
wip
philsippl Jan 5, 2025
c2ba1eb
fix: ignore phantom matchers
philsippl Jan 5, 2025
05528e5
wip
philsippl Jan 9, 2025
0b9a511
wip
philsippl Jan 11, 2025
2db44f3
fix: wrong kernel for assign_u32 function
rw0x0 Jan 13, 2025
e29473b
fix: use correct buffer in buckets-GPU function
rw0x0 Jan 13, 2025
c9c60a2
feat: adapt threshold test for new test strucuture
rw0x0 Jan 13, 2025
4d26e4e
minor fix
rw0x0 Jan 13, 2025
5769bef
.
rw0x0 Jan 13, 2025
5a9d060
another testcase adapted
rw0x0 Jan 13, 2025
2219f0d
another testcase adapted
rw0x0 Jan 13, 2025
d664b18
.
rw0x0 Jan 13, 2025
0f9e401
another testcase adapted
rw0x0 Jan 13, 2025
1b54a0b
.
rw0x0 Jan 13, 2025
7a902ec
another testcase adapted
rw0x0 Jan 13, 2025
b35c61a
.
rw0x0 Jan 13, 2025
58f6b5b
.
rw0x0 Jan 13, 2025
20b2cd6
add new test case for bucketing
rw0x0 Jan 13, 2025
ec1235d
.
rw0x0 Jan 13, 2025
63bbc11
.
rw0x0 Jan 13, 2025
784747a
fix an error
rw0x0 Jan 13, 2025
6679731
add another test
rw0x0 Jan 13, 2025
7c5dbed
.
rw0x0 Jan 13, 2025
92bcd5e
add buckets test
rw0x0 Jan 13, 2025
2aa8920
.
rw0x0 Jan 13, 2025
87357cc
.
rw0x0 Jan 13, 2025
d9c4c7e
wip
philsippl Jan 14, 2025
6dcc548
use the open_bucket function in the testcase
rw0x0 Jan 14, 2025
6a38c93
.
rw0x0 Jan 14, 2025
ea3eff4
make open_buckets not overwrite input
rw0x0 Jan 14, 2025
2370a6d
function for threshold translation
rw0x0 Jan 14, 2025
766ed20
.
rw0x0 Jan 14, 2025
611f058
match_distances_counter_idx
philsippl Jan 14, 2025
599d06f
minor
rw0x0 Jan 14, 2025
ceff1c1
fix an int/size_t error for kernels!
rw0x0 Jan 14, 2025
a4fd3fe
sort results for consistency across nodes
philsippl Jan 15, 2025
e09fc46
clean + clippy
philsippl Jan 15, 2025
0f64f36
cleanup and fixes
philsippl Jan 16, 2025
ca5cea0
clippy
philsippl Jan 16, 2025
4cfc360
remove keeping two results around
philsippl Jan 21, 2025
e40dc40
Merge branch 'main' of https://github.com/worldcoin/iris-mpc into ps/…
philsippl Jan 21, 2025
09b8643
up
philsippl Jan 21, 2025
802dacf
add synchronize streams after loading to GPU in gpu_dependant testcases
rw0x0 Jan 22, 2025
da60df6
clippy fix
rw0x0 Jan 22, 2025
a15e9ac
debug for testcases
rw0x0 Jan 22, 2025
fd6d3ec
.
rw0x0 Jan 22, 2025
0d6be5d
.
rw0x0 Jan 22, 2025
db43027
fix the len issues in sending/receiving
rw0x0 Jan 22, 2025
239be6f
fix of fix
rw0x0 Jan 22, 2025
03ef5c8
remove or-tree test
rw0x0 Jan 22, 2025
1681616
fix?
rw0x0 Jan 22, 2025
dff640c
buckets as config
carlomazzaferro Jan 24, 2025
a2c9626
Ps/buckets config improvements (#965)
carlomazzaferro Jan 24, 2025
7d61283
merge main
carlomazzaferro Jan 25, 2025
4d15e1a
Merge branch 'ps/buckets' of github.com:worldcoin/iris-mpc into ps/bu…
carlomazzaferro Jan 25, 2025
fad8d51
revert nccl changes
carlomazzaferro Jan 25, 2025
f33fdf4
custom image build
carlomazzaferro Jan 25, 2025
4d28ac6
custom image deployment
carlomazzaferro Jan 25, 2025
2 changes: 1 addition & 1 deletion .github/workflows/temp-branch-build-and-push.yaml
@@ -3,7 +3,7 @@ name: Branch - Build and push docker image
 on:
   push:
     branches:
-      - "reduce-size-docker-image"
+      - "ps/buckets"

 concurrency:
   group: '${{ github.workflow }} @ ${{ github.event.pull_request.head.label || github.head_ref || github.ref }}'
5 changes: 2 additions & 3 deletions Cargo.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion deploy/stage/common-values-iris-mpc.yaml
@@ -1,4 +1,4 @@
-image: "ghcr.io/worldcoin/iris-mpc:v0.13.20"
+image: "ghcr.io/worldcoin/iris-mpc:f33fdf4bd38feeb4bf0258c6c3d0226bdf4786fd"

environment: stage
replicaCount: 1
6 changes: 6 additions & 0 deletions deploy/stage/smpcv2-0-stage/values-iris-mpc.yaml
@@ -92,6 +92,12 @@ env:
- name: SMPC__MAX_BATCH_SIZE
value: "64"

- name: SMPC__MATCH_DISTANCES_BUFFER_SIZE
value: "128"

- name: SMPC__N_BUCKETS
value: "10"

- name: SMPC__SERVICE__METRICS__HOST
valueFrom:
fieldRef:
6 changes: 6 additions & 0 deletions deploy/stage/smpcv2-1-stage/values-iris-mpc.yaml
@@ -92,6 +92,12 @@ env:
- name: SMPC__MAX_BATCH_SIZE
value: "64"

- name: SMPC__MATCH_DISTANCES_BUFFER_SIZE
value: "128"

- name: SMPC__N_BUCKETS
value: "10"

- name: SMPC__SERVICE__METRICS__HOST
valueFrom:
fieldRef:
6 changes: 6 additions & 0 deletions deploy/stage/smpcv2-2-stage/values-iris-mpc.yaml
@@ -92,6 +92,12 @@ env:
- name: SMPC__MAX_BATCH_SIZE
value: "64"

- name: SMPC__MATCH_DISTANCES_BUFFER_SIZE
value: "128"

- name: SMPC__N_BUCKETS
value: "10"

- name: SMPC__SERVICE__METRICS__HOST
valueFrom:
fieldRef:
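All three stage deployments set the same two new variables. As a hedged sketch, assuming the service reads environment variables with an `SMPC` prefix and `__` as the nesting separator (the usual pattern for an environment source in the `config` crate, not confirmed by this diff), the variable names map onto `Config` field paths as below. `env_key_to_config_path` is a hypothetical helper, not part of the PR.

```rust
/// Hypothetical helper: maps an environment variable such as `SMPC__N_BUCKETS`
/// to the dotted config path it is assumed to set (here `n_buckets`).
/// Returns `None` when the variable does not carry `<prefix>__`.
fn env_key_to_config_path(var: &str, prefix: &str) -> Option<String> {
    // Expect "<PREFIX>__<rest>", e.g. "SMPC__N_BUCKETS".
    let rest = var.strip_prefix(prefix)?.strip_prefix("__")?;
    if rest.is_empty() {
        return None;
    }
    // Each "__" is one nesting level; single underscores stay inside a field name.
    let path: Vec<String> = rest.split("__").map(|s| s.to_ascii_lowercase()).collect();
    Some(path.join("."))
}

fn main() {
    for var in [
        "SMPC__MAX_BATCH_SIZE",
        "SMPC__MATCH_DISTANCES_BUFFER_SIZE",
        "SMPC__N_BUCKETS",
        "SMPC__SERVICE__METRICS__HOST",
    ] {
        println!("{var} -> {:?}", env_key_to_config_path(var, "SMPC"));
    }
}
```

Under this assumption `SMPC__N_BUCKETS` sets `n_buckets` and `SMPC__MATCH_DISTANCES_BUFFER_SIZE` sets `match_distances_buffer_size`, the two fields added to `Config` in this PR.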
14 changes: 14 additions & 0 deletions iris-mpc-common/src/config/mod.rs
@@ -111,6 +111,12 @@ pub struct Config {

#[serde(default)]
pub fixed_shared_secrets: bool,

#[serde(default = "default_match_distances_buffer_size")]
pub match_distances_buffer_size: usize,

#[serde(default = "default_n_buckets")]
pub n_buckets: usize,
}

fn default_load_chunks_parallelism() -> usize {
@@ -145,6 +151,14 @@ fn default_db_load_safety_overlap_seconds() -> i64 {
60
}

fn default_match_distances_buffer_size() -> usize {
1 << 20
}

fn default_n_buckets() -> usize {
10
}

impl Config {
pub fn load_config(prefix: &str) -> eyre::Result<Config> {
let settings = config::Config::builder();
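The two new `Config` fields fall back to `default_match_distances_buffer_size()` (1 << 20 = 1,048,576 entries) and `default_n_buckets()` (10) when the setting is absent. The std-only sketch below mirrors that fallback without serde so it runs on its own; `BucketSettings`, `read_usize` and the lookup closure are hypothetical stand-ins for whatever source `load_config` actually reads.

```rust
use std::collections::HashMap;

// Same defaults as in the diff above.
fn default_match_distances_buffer_size() -> usize {
    1 << 20
}

fn default_n_buckets() -> usize {
    10
}

/// Hypothetical subset of `Config` holding only the two new fields.
#[derive(Debug, PartialEq)]
struct BucketSettings {
    match_distances_buffer_size: usize,
    n_buckets: usize,
}

/// Missing key -> default; present but unparsable value -> error.
fn read_usize<F>(lookup: &F, key: &str, default: usize) -> Result<usize, String>
where
    F: Fn(&str) -> Option<String>,
{
    match lookup(key) {
        None => Ok(default),
        Some(raw) => raw
            .trim()
            .parse::<usize>()
            .map_err(|e| format!("{key}={raw:?}: {e}")),
    }
}

impl BucketSettings {
    fn from_lookup<F>(lookup: F) -> Result<Self, String>
    where
        F: Fn(&str) -> Option<String>,
    {
        Ok(Self {
            match_distances_buffer_size: read_usize(
                &lookup,
                "SMPC__MATCH_DISTANCES_BUFFER_SIZE",
                default_match_distances_buffer_size(),
            )?,
            n_buckets: read_usize(&lookup, "SMPC__N_BUCKETS", default_n_buckets())?,
        })
    }
}

fn main() {
    // Values set in the stage deployments.
    let stage = HashMap::from([
        ("SMPC__MATCH_DISTANCES_BUFFER_SIZE", "128"),
        ("SMPC__N_BUCKETS", "10"),
    ]);
    let from_stage = BucketSettings::from_lookup(|k| stage.get(k).map(|v| v.to_string()));
    println!("{from_stage:?}");

    // Nothing set: both defaults apply.
    println!("{:?}", BucketSettings::from_lookup(|_| None));
}
```

Note that stage overrides the buffer size down to 128 entries, far below the 1 << 20 default, while `n_buckets` stays at the default of 10.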
5 changes: 4 additions & 1 deletion iris-mpc-gpu/Cargo.toml
@@ -9,7 +9,10 @@ repository.workspace = true

[dependencies]
bincode = "1.3.3"
-cudarc = { version = "0.13.3", features = ["cuda-12020", "nccl"] }
+cudarc = { git = "https://github.com/worldcoin/cudarc-fork.git", features = [
+    "cuda-12020",
+    "nccl",
+] }
eyre.workspace = true
tracing.workspace = true
bytemuck.workspace = true
12 changes: 3 additions & 9 deletions iris-mpc-gpu/src/bin/nccl.rs
@@ -85,15 +85,9 @@ async fn main() -> eyre::Result<()> {
for i in 0..n_devices {
devs[i].bind_to_thread().unwrap();

-comms[i]
-.broadcast(slices[i].as_ref(), &mut slices1[i], 0)
-.unwrap();
-comms[i]
-.broadcast(slices[i].as_ref(), &mut slices2[i], 1)
-.unwrap();
-comms[i]
-.broadcast(slices[i].as_ref(), &mut slices3[i], 2)
-.unwrap();
+comms[i].broadcast(&slices[i], &mut slices1[i], 0).unwrap();
+comms[i].broadcast(&slices[i], &mut slices2[i], 1).unwrap();
+comms[i].broadcast(&slices[i], &mut slices3[i], 2).unwrap();
}

for dev in devs.iter() {
154 changes: 148 additions & 6 deletions iris-mpc-gpu/src/dot/distance_comparator.rs
@@ -1,23 +1,31 @@
use super::ROTATIONS;
use crate::helpers::{
device_manager::DeviceManager, launch_config_from_elements_and_threads,
DEFAULT_LAUNCH_CONFIG_THREADS,
use crate::{
helpers::{
device_manager::DeviceManager, launch_config_from_elements_and_threads,
DEFAULT_LAUNCH_CONFIG_THREADS,
},
threshold_ring::protocol::{ChunkShare, ChunkShareView},
};
use cudarc::{
driver::{CudaFunction, CudaSlice, CudaStream, CudaView, LaunchAsync},
driver::{
result::{launch_kernel, memset_d8_sync},
sys, CudaFunction, CudaSlice, CudaStream, CudaView, DevicePtr, DeviceSlice, LaunchAsync,
},
nvrtc::compile_ptx,
};
use std::{cmp::min, sync::Arc};
use std::{cmp::min, ffi::c_void, sync::Arc};

const PTX_SRC: &str = include_str!("kernel.cu");
const OPEN_RESULTS_FUNCTION: &str = "openResults";
const OPEN_RESULTS_BATCH_FUNCTION: &str = "openResultsBatch";
const MERGE_DB_RESULTS_FUNCTION: &str = "mergeDbResults";
const MERGE_BATCH_RESULTS_FUNCTION: &str = "mergeBatchResults";
const ALL_MATCHES_LEN: usize = 256;

pub struct DistanceComparator {
pub device_manager: Arc<DeviceManager>,
pub open_kernels: Vec<CudaFunction>,
pub open_batch_kernels: Vec<CudaFunction>,
pub merge_db_kernels: Vec<CudaFunction>,
pub merge_batch_kernels: Vec<CudaFunction>,
pub query_length: usize,
@@ -37,6 +45,7 @@ impl DistanceComparator {
pub fn init(query_length: usize, device_manager: Arc<DeviceManager>) -> Self {
let ptx = compile_ptx(PTX_SRC).unwrap();
let mut open_kernels: Vec<CudaFunction> = Vec::new();
let mut open_batch_kernels: Vec<CudaFunction> = Vec::new();
let mut merge_db_kernels = Vec::new();
let mut merge_batch_kernels = Vec::new();
let mut opened_results = vec![];
@@ -58,12 +67,15 @@ impl DistanceComparator {
device
.load_ptx(ptx.clone(), "", &[
OPEN_RESULTS_FUNCTION,
OPEN_RESULTS_BATCH_FUNCTION,
MERGE_DB_RESULTS_FUNCTION,
MERGE_BATCH_RESULTS_FUNCTION,
])
.unwrap();

let open_results_function = device.get_func("", OPEN_RESULTS_FUNCTION).unwrap();
let open_results_batch_function =
device.get_func("", OPEN_RESULTS_BATCH_FUNCTION).unwrap();
let merge_db_results_function = device.get_func("", MERGE_DB_RESULTS_FUNCTION).unwrap();
let merge_batch_results_function =
device.get_func("", MERGE_BATCH_RESULTS_FUNCTION).unwrap();
@@ -90,13 +102,15 @@ impl DistanceComparator {
);

open_kernels.push(open_results_function);
open_batch_kernels.push(open_results_batch_function);
merge_db_kernels.push(merge_db_results_function);
merge_batch_kernels.push(merge_batch_results_function);
}

Self {
device_manager,
open_kernels,
open_batch_kernels,
merge_db_kernels,
merge_batch_kernels,
query_length,
@@ -115,6 +129,85 @@ impl DistanceComparator {

#[allow(clippy::too_many_arguments)]
pub fn open_results(
&self,
results1: &[CudaView<u64>],
results2: &[CudaView<u64>],
results3: &[CudaView<u64>],
matches_bitmap: &[CudaSlice<u64>],
db_sizes: &[usize],
real_db_sizes: &[usize],
offset: usize,
total_db_sizes: &[usize],
ignore_db_results: &[bool],
match_distances_buffers_codes: &[ChunkShare<u16>],
match_distances_buffers_masks: &[ChunkShare<u16>],
match_distances_counters: &[CudaSlice<u32>],
match_distances_indices: &[CudaSlice<u32>],
code_dots: &[ChunkShareView<u16>],
mask_dots: &[ChunkShareView<u16>],
batch_size: usize,
max_bucket_distances: usize,
streams: &[CudaStream],
) {
for i in 0..self.device_manager.device_count() {
// Those correspond to 0 length dbs, which were just artificially increased to
// length 1 to avoid division by zero in the kernel
if ignore_db_results[i] {
continue;
}
let num_elements = (db_sizes[i] * self.query_length).div_ceil(64);
let threads_per_block = DEFAULT_LAUNCH_CONFIG_THREADS; // ON CHANGE: sync with kernel
let cfg = launch_config_from_elements_and_threads(
num_elements as u32,
threads_per_block,
&self.device_manager.devices()[i],
);
self.device_manager.device(i).bind_to_thread().unwrap();

let ptr_param = |ptr: *const sys::CUdeviceptr| ptr as *mut c_void;
let usize_param = |val: &usize| val as *const usize as *mut _;

let params = &mut [
// Results arrays
ptr_param(results1[i].device_ptr()),
ptr_param(results2[i].device_ptr()),
ptr_param(results3[i].device_ptr()),
ptr_param(matches_bitmap[i].device_ptr()),
usize_param(&db_sizes[i]),
usize_param(&(batch_size * ROTATIONS)),
usize_param(&offset),
usize_param(&num_elements),
usize_param(&real_db_sizes[i]),
usize_param(&total_db_sizes[i]),
ptr_param(match_distances_buffers_codes[i].a.device_ptr()),
ptr_param(match_distances_buffers_codes[i].b.device_ptr()),
ptr_param(match_distances_buffers_masks[i].a.device_ptr()),
ptr_param(match_distances_buffers_masks[i].b.device_ptr()),
ptr_param(match_distances_counters[i].device_ptr()),
ptr_param(match_distances_indices[i].device_ptr()),
ptr_param(code_dots[i].a.device_ptr()),
ptr_param(code_dots[i].b.device_ptr()),
ptr_param(mask_dots[i].a.device_ptr()),
ptr_param(mask_dots[i].b.device_ptr()),
usize_param(&max_bucket_distances),
];

unsafe {
launch_kernel(
self.open_kernels[i].cu_function(),
cfg.grid_dim,
cfg.block_dim,
0,
streams[i].stream,
params,
)
.unwrap();
}
}
}
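`open_results` launches through the raw `launch_kernel` call, so every scalar argument is passed as a pointer to host memory that must stay valid until the launch call has read it. As we read Rust's temporary-lifetime rules, `usize_param(&(batch_size * ROTATIONS))` takes the address of a temporary that is dropped when the `let params = ...` statement ends, before `launch_kernel` runs; binding computed scalars to named storage first avoids the question. The std-only sketch below shows that pattern and the `div_ceil(64)` word count. `ScalarArgs`, `read_back` and the `rotations` parameter are hypothetical, and the pointers are only read back on the host instead of being handed to CUDA.

```rust
use std::ffi::c_void;

/// Hypothetical holder for scalar kernel arguments. Pointers taken from it stay
/// valid as long as the struct does; raw pointers are not lifetime-checked, so the
/// caller must keep it alive until the launch call has returned.
struct ScalarArgs {
    db_size: usize,
    batch_rotations: usize,
    offset: usize,
    num_elements: usize,
}

impl ScalarArgs {
    fn new(db_size: usize, query_length: usize, batch_size: usize, rotations: usize, offset: usize) -> Self {
        Self {
            db_size,
            // Stored in a named field instead of taking `&(batch_size * rotations)`.
            batch_rotations: batch_size * rotations,
            offset,
            // Same rounding as the diff: one u64 word per 64 result bits, rounded up.
            num_elements: (db_size * query_length).div_ceil(64),
        }
    }

    /// Builds a `void**`-style parameter list whose entries point into `self`.
    fn as_params(&self) -> Vec<*mut c_void> {
        let ptr = |v: &usize| v as *const usize as *mut c_void;
        vec![
            ptr(&self.db_size),
            ptr(&self.batch_rotations),
            ptr(&self.offset),
            ptr(&self.num_elements),
        ]
    }
}

/// Host-side stand-in for the launch: reads each argument back through its pointer.
fn read_back(params: &[*mut c_void]) -> Vec<usize> {
    params
        .iter()
        // SAFETY: every pointer comes from `as_params` on a `ScalarArgs` that is still alive.
        .map(|&p| unsafe { *(p as *const usize) })
        .collect()
}

fn main() {
    let args = ScalarArgs::new(100, 10, 2, 5, 0);
    let params = args.as_params();
    println!("{:?}", read_back(&params));
}
```

In `open_results` itself the same effect would come from something like `let batch_rotations = batch_size * ROTATIONS;` before `let params = ...`, then passing `usize_param(&batch_rotations)`.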

#[allow(clippy::too_many_arguments)]
pub fn open_batch_results(
&self,
results1: &[CudaView<u64>],
results2: &[CudaView<u64>],
@@ -143,7 +236,7 @@ impl DistanceComparator {
self.device_manager.device(i).bind_to_thread().unwrap();

unsafe {
self.open_kernels[i]
self.open_batch_kernels[i]
.clone()
.launch_on_stream(
&streams[i],
@@ -347,4 +440,53 @@ impl DistanceComparator {
})
.collect::<Vec<_>>()
}

pub fn prepare_match_distances_buffer(&self, max_size: usize) -> Vec<ChunkShare<u16>> {
(0..self.device_manager.device_count())
.map(|i| {
let a = self.device_manager.device(i).alloc_zeros(max_size).unwrap();
let b = self.device_manager.device(i).alloc_zeros(max_size).unwrap();

self.device_manager.device(i).bind_to_thread().unwrap();
unsafe {
memset_d8_sync(*a.device_ptr(), 0xff, a.num_bytes()).unwrap();
memset_d8_sync(*b.device_ptr(), 0xff, b.num_bytes()).unwrap();
}

ChunkShare::new(a, b)
})
.collect::<Vec<_>>()
}

pub fn prepare_match_distances_counter(&self) -> Vec<CudaSlice<u32>> {
(0..self.device_manager.device_count())
.map(|i| self.device_manager.device(i).alloc_zeros(1).unwrap())
.collect::<Vec<_>>()
}

pub fn prepare_match_distances_index(&self, max_size: usize) -> Vec<CudaSlice<u32>> {
(0..self.device_manager.device_count())
.map(|i| {
let a = self.device_manager.device(i).alloc_zeros(max_size).unwrap();
unsafe {
memset_d8_sync(*a.device_ptr(), 0xff, a.num_bytes()).unwrap();
}
a
})
.collect::<Vec<_>>()
}
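`prepare_match_distances_buffer` and `prepare_match_distances_index` fill fresh allocations with `0xff` bytes, so every untouched `u16` slot reads as `u16::MAX` and every `u32` index as `u32::MAX`, while `prepare_match_distances_counter` starts one `u32` counter per device at zero. The kernel that writes these buffers (`kernel.cu`) is not part of this diff. Assuming it claims slots by atomically incrementing the counter and skips writes once `max_bucket_distances` is reached, a CPU-side model of that pattern looks like this; `MatchDistanceBuffer`, `push` and `unused_slots` are hypothetical.

```rust
use std::sync::atomic::{AtomicU16, AtomicU32, Ordering};
use std::thread;

/// Byte written by the `memset_d8_sync(.., 0xff, ..)` calls.
const FILL_BYTE: u8 = 0xff;

/// Hypothetical CPU model of one device's distance buffer, index buffer and
/// counter. Atomics stand in for the kernel's concurrent writers.
struct MatchDistanceBuffer {
    codes: Vec<AtomicU16>,
    indices: Vec<AtomicU32>,
    counter: AtomicU32,
}

impl MatchDistanceBuffer {
    fn new(max_size: usize) -> Self {
        // A u16/u32 whose bytes are all 0xff is u16::MAX/u32::MAX: "never written".
        let code_fill = u16::from_ne_bytes([FILL_BYTE; 2]);
        let index_fill = u32::from_ne_bytes([FILL_BYTE; 4]);
        Self {
            codes: (0..max_size).map(|_| AtomicU16::new(code_fill)).collect(),
            indices: (0..max_size).map(|_| AtomicU32::new(index_fill)).collect(),
            // Starts at zero, like `prepare_match_distances_counter`.
            counter: AtomicU32::new(0),
        }
    }

    /// Claims a slot with an atomic increment and drops the entry when the
    /// buffer is full. The counter keeps counting past capacity.
    fn push(&self, code: u16, index: u32) -> bool {
        let slot = self.counter.fetch_add(1, Ordering::Relaxed) as usize;
        if slot >= self.codes.len() {
            return false;
        }
        self.codes[slot].store(code, Ordering::Relaxed);
        self.indices[slot].store(index, Ordering::Relaxed);
        true
    }

    /// Number of slots still holding the sentinel.
    fn unused_slots(&self) -> usize {
        self.codes
            .iter()
            .filter(|c| c.load(Ordering::Relaxed) == u16::MAX)
            .count()
    }
}

fn main() {
    let buf = MatchDistanceBuffer::new(16);
    // Four writer threads each record three entries.
    thread::scope(|s| {
        for t in 0..4u32 {
            let buf = &buf;
            s.spawn(move || {
                for k in 0..3u32 {
                    let i = t * 3 + k;
                    buf.push(i as u16, i);
                }
            });
        }
    });
    println!(
        "counter = {}, unused slots = {}",
        buf.counter.load(Ordering::Relaxed),
        buf.unused_slots()
    );
}
```

Counting sentinel slots is only meaningful if no real entry can equal `u16::MAX`; the sketch assumes that.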

pub fn prepare_match_distances_buckets(&self, n_buckets: usize) -> ChunkShare<u32> {
let a = self
.device_manager
.device(0)
.alloc_zeros(n_buckets)
.unwrap();
let b = self
.device_manager
.device(0)
.alloc_zeros(n_buckets)
.unwrap();
ChunkShare::new(a, b)
}
}
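`prepare_match_distances_buckets` allocates `n_buckets` secret-shared `u32` counters (the `a` and `b` halves of a `ChunkShare`) on device 0 only. The code that fills them is not in this diff. Going by the commit messages ("compare multiple thresholds", "also make <="), one plausible plaintext reading is one cumulative counter per threshold: bucket `k` counts stored distances that are `<=` threshold `k`. The sketch below assumes `n_buckets` evenly spaced thresholds on a fractional Hamming distance up to a hypothetical `max_threshold`; the spacing, the distance representation and both function names are assumptions, and the real protocol computes these counts on shares rather than on cleartext values.

```rust
/// Hypothetical thresholds: max_threshold * k / n_buckets for k = 1..=n_buckets.
fn bucket_thresholds(n_buckets: usize, max_threshold: f64) -> Vec<f64> {
    (1..=n_buckets)
        .map(|k| max_threshold * k as f64 / n_buckets as f64)
        .collect()
}

/// Cumulative counts: counts[k] = number of distances <= thresholds[k].
fn cumulative_bucket_counts(distances: &[f64], thresholds: &[f64]) -> Vec<u32> {
    thresholds
        .iter()
        .map(|&t| distances.iter().filter(|&&d| d <= t).count() as u32)
        .collect()
}

fn main() {
    let thresholds = bucket_thresholds(4, 0.4);
    let distances = [0.05, 0.12, 0.25, 0.31, 0.39, 0.45];
    let counts = cumulative_bucket_counts(&distances, &thresholds);
    println!("{thresholds:?} -> {counts:?}");
}
```

With cumulative buckets the counts never decrease from one bucket to the next, and the last bucket equals the number of matches at the loosest threshold.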