Improve zkey load time #25

Closed
oskarth opened this issue Oct 31, 2023 · 13 comments
Labels
perf Performance related issues

Comments

@oskarth
Collaborator

oskarth commented Oct 31, 2023

Problem

Loading the zkey is slow.

cargo test --release test_generate_proof2 -- --nocapture shows that loading the zkey takes ~70-80s for a ~80 MB file (Keccak256).

For a file of that size, I would expect this to be much faster.

This seems to be the case regardless of whether we are using dylib or not.

Approaches

Two ideas:

  1. Figure out how to improve in terms of reading in the zkey file (maybe memmap or something, but this adds complexity for iOS integration)
  2. Do this separately and persist arkworks ProvingKey in a better way (e.g. serialize to disk beforehand)

Acceptance criteria

Much faster loading of the zkey, or some other way to improve this kind of load time

@oskarth
Collaborator Author

oskarth commented Nov 3, 2023

I can confirm that this is a problem on an iOS device too. My hunch is this will be improved by using dylib.

Still seems weird to me this is so slow though.

@oskarth
Collaborator Author

oskarth commented Nov 3, 2023

Same basic behavior. With this we can isolate the problem to initialization of the library.

https://github.com/oskarth/mopro/blob/main/mopro-core/src/middleware/circom/mod.rs#L87 for timing
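A minimal sketch of how each phase of the load can be timed separately, in the same "Time to ..." style as the logs in this thread. The `timed` helper is a hypothetical name, not something from mopro, and the workload in `main` is a stand-in for the actual zkey read.

```rust
use std::time::{Duration, Instant};

/// Runs `f`, prints how long it took under `label`, and returns the result
/// together with the elapsed time. `timed` is a hypothetical helper name.
fn timed<T>(label: &str, f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let value = f();
    let elapsed = start.elapsed();
    println!("Time to {}: {:?}", label, elapsed);
    (value, elapsed)
}

fn main() {
    // Stand-in workload; in real code this closure would wrap the zkey read
    // or one of its sub-steps (open file, parse proving key, parse matrices).
    let (sum, _elapsed) = timed("sum integers", || (1..=1_000u64).sum::<u64>());
    println!("sum = {}", sum);
}
```

Wrapping each sub-step in its own call makes it easier to see which one dominates before deciding what to optimize.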

Gonna leave this for now but two approaches:

  1. Figure out how to improve in terms of reading in the zkey file (maybe memmap or something, but this adds complexity for iOS integration)
  2. Do this separately and persist arkworks ProvingKey in a better way (e.g. serialize to disk beforehand)

@oskarth oskarth changed the title from "Figure out why zkey takes so long to load in" to "Improve zkey load time" Nov 3, 2023
@oskarth
Collaborator Author

oskarth commented Nov 3, 2023

We should be able to preprocess zkey and then quickly load proving key and matrices from disk:

// Reads a SnarkJS ZKey file into an Arkworks ProvingKey.
pub fn read_zkey<R: Read + Seek>(
    reader: &mut R,
) -> IoResult<(ProvingKey<Bn254>, ConstraintMatrices<Fr>)> {
    let mut binfile = BinFile::new(reader)?;
    let proving_key = binfile.proving_key()?;
    let matrices = binfile.matrices()?;
    Ok((proving_key, matrices))
}
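The preprocessing idea can be sketched as a convert-once, load-from-cache pattern. This is a std-only illustration working on raw bytes: `load_or_convert` and the cache file name are hypothetical, and the closure stands in for "read the zkey and serialize the proving key and matrices". It only pays off if the persisted form is cheaper to decode than the original zkey.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns the cached bytes at `cache_path` if the file exists; otherwise runs
/// the (slow) `convert` step once, writes its output to `cache_path`, and
/// returns it. `load_or_convert` is a hypothetical helper name.
fn load_or_convert(
    cache_path: &Path,
    convert: impl FnOnce() -> io::Result<Vec<u8>>,
) -> io::Result<Vec<u8>> {
    match fs::read(cache_path) {
        Ok(bytes) => Ok(bytes),
        Err(e) if e.kind() == io::ErrorKind::NotFound => {
            let bytes = convert()?;
            fs::write(cache_path, &bytes)?;
            Ok(bytes)
        }
        Err(e) => Err(e),
    }
}

fn main() -> io::Result<()> {
    // Hypothetical cache location, used only for this demonstration.
    let cache = std::env::temp_dir().join("example_proving_key.cache");
    let _ = fs::remove_file(&cache);

    // First call runs the conversion and writes the cache file.
    let first = load_or_convert(&cache, || Ok(vec![1, 2, 3]))?;
    // Second call finds the cache file, so this closure is never run.
    let second = load_or_convert(&cache, || Ok(vec![9, 9, 9]))?;
    assert_eq!(first, second);

    fs::remove_file(&cache)
}
```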

@oskarth
Collaborator Author

oskarth commented Nov 8, 2023

I tried out some things here #26 and it seems like the slow part is the ProvingKey. I think this is because it is doing a bunch of EC operations etc, so there might be a more efficient way to do serialization field by field.

running 1 test
Reading zkey from: ../mopro-core/examples/circom/keccak256/target/keccak256_256_test_final.zkey

test tests::test_serialization_deserialization has been running for over 60 seconds
Time to read zkey: 99.439572834s
Writing arkzkey to: ../mopro-core/examples/circom/keccak256/target/keccak256_256_test_final.arkzkey
Time to write zkey: 21.54349525s
Reading arkzkey from: ../mopro-core/examples/circom/keccak256/target/keccak256_256_test_final.arkzkey
...
Time to read arkzkey: 107.505148542s
test tests::test_serialization_deserialization ... ok

@oskarth
Collaborator Author

oskarth commented Nov 10, 2023

More complete log for ark-zkey crate. Notice the "time to deserialize proving key".

Reading zkey from: ../mopro-core/examples/circom/keccak256/target/keccak256_256_test_final.zkey
test tests::test_serialization_deserialization has been running for over 60 seconds
...Time to read zkey: 132.629060542s
Writing arkzkey to: ../mopro-core/examples/circom/keccak256/target/keccak256_256_test_final.arkzkey
Time to write zkey: 9.867594125s
Reading arkzkey from: ../mopro-core/examples/circom/keccak256/target/keccak256_256_test_final.arkzkey
Time to open arkzkey file: 55.25µs
Time to mmap: 17.958µs
 
Time to deserialize proving key: 85.626978917s
Time to deserialize matrices: 107.982625ms
Time to read arkzkey: 85.735258334s
test tests::test_serialization_deserialization ... ok

@oskarth
Collaborator Author

oskarth commented Nov 10, 2023

Improvements using custom serialization for the arkworks proving key and matrices, and using unchecked deserialization: https://github.com/oskarth/mopro/tree/main/ark-zkey

This is about 10x better (18s vs 158s for keccak256), but I can't help thinking this can be improved quite a lot more.

@oskarth oskarth mentioned this issue Nov 11, 2023
3 tasks
@oskarth oskarth added moperf Project MoPerf and removed moperf Project MoPerf labels Jan 26, 2024
@oskarth
Collaborator Author

oskarth commented Feb 21, 2024

This is the library we want to try https://github.com/rkyv/rkyv

@oskarth oskarth added the perf Performance related issues label Feb 23, 2024
@oskarth
Collaborator Author

oskarth commented Mar 6, 2024

For checking perf: https://github.com/mstange/samply

@oskarth
Collaborator Author

oskarth commented Apr 15, 2024

I think we do this already, but it might be worth sanity checking arkworks-rs/circom-compat#46 (comment)

@vivianjeng
Collaborator

Suggestions from @vimwitch

For #25 the problem isn't loading the file into memory; that happens in < 1 ms. The problem is parsing the bytes into usable data. The zkey is parsed in this file: https://github.com/arkworks-rs/circom-compat/blob/master/src/zkey.rs

The process is simple: it iterates over the bytes and extracts certain sections as bigint (32, 64, 256 bit) numbers. The problem is on lines 347 and 358. These lines call into this function: https://github.com/arkworks-rs/algebra/blob/master/ec/src/models/short_weierstrass/affine.rs#L72
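The byte-extraction step described above can be sketched with std-only Rust. This is an illustration, not the circom-compat code: the helper names are hypothetical, and little-endian byte order with 256-bit values stored as four 64-bit limbs is an assumption made for the example.

```rust
/// Returns the next `n` bytes and advances `pos`, or None if too few remain.
fn take<'a>(buf: &'a [u8], pos: &mut usize, n: usize) -> Option<&'a [u8]> {
    let end = pos.checked_add(n)?;
    let slice = buf.get(*pos..end)?;
    *pos = end;
    Some(slice)
}

/// Reads a 32-bit little-endian integer (byte order is an assumption here).
fn read_u32_le(buf: &[u8], pos: &mut usize) -> Option<u32> {
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(take(buf, pos, 4)?);
    Some(u32::from_le_bytes(bytes))
}

/// Reads a 64-bit little-endian integer.
fn read_u64_le(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(take(buf, pos, 8)?);
    Some(u64::from_le_bytes(bytes))
}

/// Reads a 256-bit little-endian integer as four u64 limbs,
/// least significant limb first.
fn read_u256_le(buf: &[u8], pos: &mut usize) -> Option<[u64; 4]> {
    let raw = take(buf, pos, 32)?;
    let mut limbs = [0u64; 4];
    for (limb, chunk) in limbs.iter_mut().zip(raw.chunks_exact(8)) {
        let mut bytes = [0u8; 8];
        bytes.copy_from_slice(chunk);
        *limb = u64::from_le_bytes(bytes);
    }
    Some(limbs)
}

fn main() {
    // Build a small buffer: a u32, a u64, then a 256-bit value equal to 5.
    let mut buf = Vec::new();
    buf.extend_from_slice(&7u32.to_le_bytes());
    buf.extend_from_slice(&9u64.to_le_bytes());
    let mut big = [0u8; 32];
    big[0] = 5;
    buf.extend_from_slice(&big);

    let mut pos = 0;
    assert_eq!(read_u32_le(&buf, &mut pos), Some(7));
    assert_eq!(read_u64_le(&buf, &mut pos), Some(9));
    assert_eq!(read_u256_le(&buf, &mut pos), Some([5, 0, 0, 0]));
    assert_eq!(pos, buf.len());
}
```

Reads like these are cheap copies, which is consistent with the cost sitting elsewhere than in the byte extraction itself.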

This function asserts that the point is on the curve and in the correct subgroup. These checks have to be performed for thousands of points and take 99% of the time in the profiler. I forked circom-compat and switched to new_unchecked (line 89), and I can parse the example zkey in about 100 ms:

warning: `ark-zkey` (example "zkey-bench") generated 1 warning
    Finished release [optimized] target(s) in 2.33s
     Running `/home/chance/work/mopro/target/release/examples/zkey-bench`
Reading zkey from: ../mopro-core/examples/circom/keccak256/target/keccak256_256_test_final.zkey
Time to load zkey into memory: 7.917µs
Time to read zkey: 116.407183ms
Serializing proving key and constraint matrices
Time to serialize proving key and constraint matrices: 101ns
[build] Writing arkzkey to: ../mopro-core/examples/circom/keccak256/target/keccak256_256_test_final.arkzkey
[build] Time to write arkzkey: 4.300008685s
Reading arkzkey from: ../mopro-core/examples/circom/keccak256/target/keccak256_256_test_final.arkzkey
Time to open arkzkey file: 7.416µs
Time to mmap arkzkey: 6.154µs
Time to deserialize proving key: 7.379545612s
Time to deserialize matrices: 24.512168ms
Time to read arkzkey: 7.40547119s

You can test this by switching the circom-compat dependency in ark-zkey to use https://github.com/vimwitch/circom-compat.git then running the tests. They should execute significantly faster.

I think using new_unchecked is safe. snarkjs does not check curve/group membership here: https://github.com/iden3/snarkjs/blob/master/src/zkey_utils.js#L195. It might be good to ask some other people about this though.
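To make the trade-off concrete, here is a toy sketch of a validating constructor next to a trusting one. It is not the arkworks API: `ToyPoint`, `P`, `new_checked`, and this `new_unchecked` are hypothetical, the field is a tiny prime instead of BN254's base field, and the subgroup check is omitted. The curve equation y^2 = x^3 + 3 is used only as an example.

```rust
/// Toy prime modulus, chosen small so u64 arithmetic cannot overflow.
const P: u64 = 101;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ToyPoint {
    x: u64,
    y: u64,
}

impl ToyPoint {
    /// True if y^2 = x^3 + 3 (mod P). Coordinates are reduced first.
    fn is_on_curve(&self) -> bool {
        let (x, y) = (self.x % P, self.y % P);
        let lhs = y * y % P;
        let rhs = (x * x % P * x + 3) % P;
        lhs == rhs
    }

    /// Validating constructor: rejects out-of-range or off-curve input.
    /// Doing this for every point is where the per-point cost comes from.
    fn new_checked(x: u64, y: u64) -> Option<Self> {
        let point = ToyPoint { x, y };
        if x < P && y < P && point.is_on_curve() {
            Some(point)
        } else {
            None
        }
    }

    /// Trusting constructor: stores the coordinates without any validation,
    /// so the caller must already know the input is well formed.
    fn new_unchecked(x: u64, y: u64) -> Self {
        ToyPoint { x, y }
    }
}

fn main() {
    // (1, 2) satisfies 2^2 = 1^3 + 3, so the checked constructor accepts it.
    assert!(ToyPoint::new_checked(1, 2).is_some());
    // (1, 3) is not on the curve and is rejected.
    assert!(ToyPoint::new_checked(1, 3).is_none());
    // The unchecked constructor accepts the same bad input silently.
    let bad = ToyPoint::new_unchecked(1, 3);
    assert!(!bad.is_on_curve());
}
```

The last assertion shows what is given up: skipping validation is only sound when the zkey comes from a trusted source.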

@oskarth
Collaborator Author

oskarth commented Apr 21, 2024

Oh awesome! I thought we used new_unchecked already. Yes indeed, the problem is those EC checks.

Out of curiosity, how exactly did you profile to find those exact lines being the problem? I didn't spend enough time to get a good profile setup but that makes sense. Easier than doing the zero-copy serialization probably.

As far as I know it should be OK to use new_unchecked, especially if snarkjs does this as well.

My suggestion would be that we merge it, and we can create an issue to understand security assumptions better.

In any case, for serious production use, we'd want to audit code and libraries more. Right now most of the stack (including dependencies like ark-groth16, afaik) is unaudited. So in the grand scheme of things I don't think this is especially unsafe (famous last words) compared to the general attack surface.

@chancehudson
Collaborator

I use samply for profiling Rust stuff. I took the test that loads the zkey, copied it into an example executable, then let it run for a few seconds with the profiler. I don't know how to run the tests with samply attached; I think it might not be possible. This is the profile: https://share.firefox.dev/3UohBEa

@vivianjeng
Collaborator

fixed by #129


3 participants