Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVM: new teal opcodes for the MiMC hash function to support Zero Knowledge Proofs #5978

Merged
merged 21 commits into from
Dec 16, 2024

Conversation

giuliop
Copy link
Contributor

@giuliop giuliop commented Apr 18, 2024

This PR adds two new teal opcode: mimc_BN254 and mimc_BLS12_381, implementing the MiMC hash function on the curve field of BN254 and BLS12-381.

Rationale

The objective of this PR is to make zero knowledge proof applications on Algorand practical.
It is now possible to define a zk-circuit with gnark and generate an Algorand smart contract verifier with AlgoPlonk.

A critical building block for interesting applications is a hash function. Consider for instance zk-proofs for inclusion in a merkle tree which are now possible on Algorand as in this example.

To manage merkle tree updates in a smart contract and merkle proofs in a zk-circuit we need a matching hash function both in the AVM and the zk-circuit.

Traditional hash functions such as the ones currently available in the AVM are not well suited for zk-circuits making them extremely large and not practical. For this reason, zk-friendly hash functions have been developed.
These hash functions keep the size of the zk-circuits implementing them small enough for practical applications.

Why MiMC

zk-friendly hash functions are an area of active research and no standard has been set.
While more efficient zk-friendly hash functions than MiMC have been presented, we propose to introduce MiMC for the following reasons:

  1. MiMC is the oldest of the group, having been presented in 2016 and the most battle-tested one, having been used in various application, including Tornado Cash. As such, it is the one offering the highest plausible security.

  2. Choosing a different one to optimize for performance feels like premature optimization given the active research in the space. MiMC is performant enough to allow practical applications on the AVM today (as we'll discuss in the benchmark section), and we are better off waiting for research to mature and standards to emerge before choosing a hash function based on performance.

  3. gnark provides an zk-circuit implementation of MiMC in its std library which can be immediately leveraged with AlgoPlonk, and at the same time gnark-crypto - which already powers the elliptical curve operations used in go-algorand - provides a matching MiMC implementation that we can use in go-algorand, as done by this PR.

For these reasons we see MiMC as the most pragmatic choice to enable zk applications on Algorand today.
Developers will be immediately able to define circuits with gnark, deploy a verifier with AlgoPlonk, and use MiMC on both ends to build applications.

Design Consideration

The implementation of MiMC in the AVM presented a number of design choices that are discussed here.

Number of opcodes.

This PR proposes to add two new opcodes: mimc_BN254 and mimc_BLS12_381.

MiMC operates in a curve field, hashing its input in blocks of the same byte size as the curve modulus and seeing each block as an integer modulo the field modulus. In a zk-circuit based on elliptic curve cryptography such as those used by the plonk protocol you would use the MiMC function operating in the chosen curve.

Since the AVM offers elliptic curve operations on the curves BN254 and BLS12_381, we propose to add MiMC opcodes for those two curves.

The design alternative would be to introduce a single opcode mimc that accepts an additional parameter representing the chosen curve. There is the EC constant already in the AVM but it represent groups on curves, not curves. So a new constant would need to be introduced which would be a perfectly fine design choice as well.

Opcodes signature

The new opcodes accept a byte array as input on the stack and result in a 32-byte array output on the stack.
The opcodes require an input of size multiple of 32 bytes or will fail.
Moreover, they split the input in 32-byte chunks, interpret them as big-endian unsigned integers, and reduce them by the curve modulus before writing them to the hasher.
In fact, a call to the underlying gnark-crypto hasher.Write would fail if the input is not a multiple of 32 bytes or if any 32-byte chunk of the input represents a number bigger than the curve modulus.

The design alternatives are two, which are orthogonal:

  • accept an input size not multiple of 32 bytes without failing and pad the last chuck with 0 bytes to the left so that it represents the same number before writing to the hasher

  • let it fail if any 32-byte chunk represents a number bigger than the curve modulus

For the former we chose not to pad since it can be done efficiently on the smart contract client.
For the latter we chose to avoid putting the onus of modulo operations on the smart contract client but implement them directly

Benchmarking

We extended the BenchmarkHashes in go-algorand/data/transactions/logic/crypto_test.go to include the mimc opcodes to benchmark them against the other hash function already provided in the AVM.

Running go test -bench=BenchmarkHashes and focusing on the output for keccak256, sha256, and mimc_BLS12_381 (mimc_BN254 has similar profile to mimc_BLS12_381) we get the following:

ns/op ns/op/32 byte
BenchmarkHashes/keccak256-32-10 618 618
BenchmarkHashes/keccak256-128-10 598 150
BenchmarkHashes/keccak256-512-10 1613 101
BenchmarkHashes/keccak256-1024-10 2981 93
BenchmarkHashes/keccak256-4096-10 10726 84
BenchmarkHashes/sha256-32-10 119 119
BenchmarkHashes/sha256-128-10 179 45
BenchmarkHashes/sha256-512-10 380 24
BenchmarkHashes/sha256-1024-10 653 20
BenchmarkHashes/sha256-4096-10 2271 18
BenchmarkHashes/mimc_BLS12_381-32-10 6909 6909
BenchmarkHashes/mimc_BLS12_381-128-10 27834 6959
BenchmarkHashes/mimc_BLS12_381-512-10 115213 7201
BenchmarkHashes/mimc_BLS12_381-1024-10 234765 7336
BenchmarkHashes/mimc_BLS12_381-4096-10 963742 7529

We can observe that:

  • mimc_BLS12_381 native cost is directly proportional to its input size, so its AVM cost should be a function of that

  • keccak256 and sha256, which have a fixed AVM cost, have a native cost which increases less than linearly with the input size, as their native "cost per 32-byte chunk" decreases with the input size, roughly "stabilizing" around 512-byte inputs.

  • For 512-byte input the mimc hash has 71x the cost of keccak and 303x the cost of sha256, and multiplying those factors for the their native teal cost of 130 and 35 respectively, we get a range of 9,300-10,600 teal cost for mimc, or 580-663 teal cost per 32-byte of input.

So we propose in this PR that the cost of the new mimc opcodes be 620 for each 32-byte of input.

Does this makes it practical to use it?

Let's consider a big merkle tree, e.g, a 32 levels one that allows for 2**32 (over four billion) insertions.
We can store it on the blockchain in compact representation by storing only the last inserted leaf, its sibling, and all the siblings of the parent nodes up to the root. Updating this representation requires 32 mimc hashes each taking two 32-byte inputs, for a total teal cost of 32 * 620 * 2 = 39,680, a manageable budget cost.

By contrast, a native teal implementation of mimc which we tried as well shows a cost of ~20,000 per 32-byte of input, making for instance managing any non-toy merkle tree inpractical.

Testing

We extended the *_test.go files in package logic to include the new opcodes and make all tests pass running
go test in data/transactions/logic

No test fails running make test

Running make integration gives error for these tests:

  • test/e2e-go/upgrades TestKeysWithoutStateProofKeyCanRegister
  • test/e2e-go/features/transactions TestAssetValidRounds

Both tests fail with:
Error: Received unexpected error: exit status 66
We suspect this is caused by a system configuration issue and look forward to feedback from the PR reviewers on it.

An integration test for the mimc opcodes exercising them in a smart contract on a private net is here:
https://github.com/giuliop/test-mimc-opcodes
We did not include that in the PR not knowing if/how it was appropriate to include them.

No issues reported by make sanity


Let's add the mimc opcodes to the AVM and bring zero knowledge proofs on Algorand today

@CLAassistant
Copy link

CLAassistant commented Apr 18, 2024

CLA assistant check
All committers have signed the CLA.

Copy link

codecov bot commented Apr 18, 2024

Codecov Report

Attention: Patch coverage is 79.03226% with 13 lines in your changes missing coverage. Please review.

Project coverage is 51.89%. Comparing base (f87ae8a) to head (59376ee).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
data/transactions/logic/fields.go 72.72% 5 Missing and 1 partial ⚠️
data/transactions/logic/crypto.go 82.60% 3 Missing and 1 partial ⚠️
data/transactions/logic/fields_string.go 50.00% 1 Missing and 1 partial ⚠️
cmd/opdoc/opdoc.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5978      +/-   ##
==========================================
+ Coverage   51.85%   51.89%   +0.04%     
==========================================
  Files         639      639              
  Lines       85508    85553      +45     
==========================================
+ Hits        44336    44394      +58     
+ Misses      38356    38343      -13     
  Partials     2816     2816              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jannotti
Copy link
Contributor

I'm broadly positive on this. You make a pretty good case for this particular hash function, but I wonder what you think of sumhash512. If I understand correctly, we used it in state proofs exactly because it was amenable to zk proofs. It's much faster than this.

@giuliop
Copy link
Contributor Author

giuliop commented Apr 18, 2024

That is of course a very good question, and here are my views.
Also let me say that I am not a cryptographer but an "applied" guy so happy to be challenged.

The short version is that sumhash512 results in much larger zk-circuits than mimc in the context of arithmetic circuits as the ones used in the plonk protocol.

The longer version is that the only gates (operations) available in these circuits are addition and multiplication gates, and the inputs/outputs of these gates are finite field elements (big numbers modulo a prime p). MIMC was designed to create the smallest possible circuits by using only these native building blocks, that is addition and multiplication operations on integers modulo the field modulus (that's why mimc_BN254 is different than mimc_BLS12_381, the modulus is different to match the native modulus of the circuit).

sumhash512 is much faster because it uses bitwise operations and a lookup table, things that are very costly (as in number of gates needed to represent them) if you only have addition and multiplication and variables that are large integers modulo p.

So in a spectrum going from slow/small circuit size to fast/large circuit size you would have:
mimc ------> sumhas512 -------> sha256

There is research ongoing for more advanced zk-circuits that offer more complex gates which can change this equation in the future but in the present, as usual, it is all about tradeoffs and I think having a hash function that optimizes for the smallest possible practical circuits while offering battle-tested plausible security is worth having on the AVM right now

@cusma
Copy link

cusma commented Apr 19, 2024

Why MiMC

zk-friendly hash functions are an area of active research and no standard has been set. While more efficient zk-friendly hash functions than MiMC have been presented, we propose to introduce MiMC for the following reasons:

  1. MiMC is the oldest of the group, having been presented in 2016 and the most battle-tested one, having been used in various application, including Tornado Cash. As such, it is the one offering the highest plausible security.

Seems that MiMC is still a very recent hash function, not part of libsodium. Do you know if it has been proven secure under traditional assumptions and it is has already undergone or currently undergoing a standardization process?

The design alternative would be to introduce a single opcode mimc that accepts an additional parameter representing the chosen curve. There is the EC constant already in the AVM but it represent groups on curves, not curves. So a new constant would need to be introduced which would be a perfectly fine design choice as well.

I think this would be cleaner design.

Moreover, they split the input in 32-byte chunks, interpret them as big-endian unsigned integers, and reduce them by the curve modulus before writing them to the hasher.

Could this modulo reduction affect the behavior of the hash function to the point of making it insecure (e.g. not collision-resistant)?

@giuliop
Copy link
Contributor Author

giuliop commented Apr 19, 2024

Seems that MiMC is still a very recent hash function, not part of libsodium. Do you know if it has been proven secure under traditional assumptions and it is has already undergone or currently undergoing a standardization process?

You are correct, still it is the oldest of the zk-friendly hash functions, so any other one would be less optimal from that perspective. My understanding, and again I am not a cryptographer, is that for practical hash functions (as in not too inefficient to be used in practice) you cannot really prove them "secure", you can only prove them "insecure" :)
So what you can have is plausible security as the research community tries to break the hash function and doesn't succeed.
What I like about mimc is that it is used in Tornado Cash offering a big bounty to try to break it and it was not broken.

The design alternative would be to introduce a single opcode mimc that accepts an additional parameter representing the chosen curve. There is the EC constant already in the AVM but it represent groups on curves, not curves. So a new constant would need to be introduced which would be a perfectly fine design choice as well.

I think this would be cleaner design.

I agree it is a cleaner design. My reasoning not to go for it immediately was that it could make sense to introduce a constant to represent elliptic curves on the AVM if it was used by more opcodes than just mimc. So I thought we could wait for that to be the case and then introduce it in a new version of teal, moving the mimc opcodes to that paradigm then.

Moreover, they split the input in 32-byte chunks, interpret them as big-endian unsigned integers, and reduce them by the curve modulus before writing them to the hasher.

Could this modulo reduction affect the behavior of the hash function to the point of making it insecure (e.g. not collision-resistant)?

The modulo reduction is inherent in the zk-circuits operations since their inputs/outputs basically represent points on an elliptic curve which is a finite field. Also in a sense all practical hash functions do "reduction" since they reduce arbitrary length inputs to a fixed size output (e.g., 32 byte). So far MiMC has shown to be collision resistant, and again is the most battle-tested of all zk-friendly alternatives.

@johnalanwoods
Copy link

Discussed this on call today with @giuliop - I'm strongly supportive of adding MiMC, it would enable Plonk based ZK systems integrated with AlgoKIt.

@jannotti
Copy link
Contributor

It doesn't look like we have any test vectors here. I would like to see them. See, for example, TestSHA3_256 or TestSHA512_156 which is even better because it explains where the test vector comes from.

We should also have boring tests that show behaviour for 0 byte strings and non-multiples of 32.

I would also prefer to see this as a single opcode, mimc that takes a group as argument. I will work on that, since it might require improving the way we calculate costs so they can depend on both fields and on size of inputs. I forget whether we can do that right now.

I would prefer to not introduce new constants, and use the existing curve names used by the ec opcodes. I would like to understand mimc better so that I understand why it wants to be be given BN256 rather than the individual curves, BN256g1 and BN256g2. Does it actually use both?

@giuliop
Copy link
Contributor Author

giuliop commented May 13, 2024

I wrote the tests separately in https://github.com/giuliop/test-mimc-opcodes not knowing how the test system worked in go-algorand but it looks straightforward and will add them.

mimc does not use points in G1 or G2 but uses properties that are linked to the curve itself, so different between BN254 and BLS12-381. In particular it uses the curve modulus and a different set of constants.
So to select the correct mimc function we only need to know the curve, not the group.
We could make the opcode work by passing the existing EC field and just looking at the curve and treating g1/g2 the same.

@giuliop
Copy link
Contributor Author

giuliop commented May 13, 2024

Your concerns are well founded in the context of a normal hash function and what is different here is that we are mirroring the way mimc has to work in the context of a zk-circuit which has inherent limitations that any operation in it has to bear.

In fact, the zk circuit only offers two gates, addition and multiplication, which operate natively in the modulo of the curve field you define them in. Any input to those gates is seen modulo the curve modulus, any output coming from the gates too.

So you cannot escape that, any input to any function defined in the zk circuit will be equivalent to any other input modulo congruent. You have to think in modulo terms from the start and define your application accordingly.

I think that's why they have designed Write to return an error, so that the developers MUST be aware of what's happening on the zk circuit side of things and don't shoot themselves on the foot accidentally.

What we want on the AVM is simply a hash function that mirrors what happens in a zk circuit, so that we can write zk applications (e.g., manage a merkle tree on the AVM mimc hashing the nodes to compute the root and using the zk circuit to compute merkle proof without revealing the node value or the node index). We would not want to use such a hash function, with its inherent limitations, outside of a zk context.

That's also why we don't want to use WriteString in the AVM: that function simply takes any arbitrary input and reduces it modulo the curve modulus, which is quite inflexible. What we want is to be able to replicate on the AVM successive calls to Write with 32 byte chunks as we would do in the zk circuit so that the resulting hash when you call Sum would not have a known collision (except of course passing a new series of chucks each modulo congruent to its corresponding chunk in the original input).

In the end you've got to put yourself in the context of a zk circuit with its limitations and all then makes sense.

Happy to hop on a quick call to chat live if useful.

@giuliop
Copy link
Contributor Author

giuliop commented May 14, 2024

I've submitted this. I don't know what they'll say, because I don't know the state of standardization of mimc hashing. Consensys/gnark-crypto#504

I saw their reply, they are between a rock and hard place, I cannot think of a better design myself.
Simply modulo reducing any arbitrary input is weak collision wise, the only other sensible "default" which is what this PR implementation does on the avm, is to take the modulo of each 32 byte chunk, which they rejected to avoid the performance penalty.

This PR is forcing that penalty on the mimc user to avoid doing the modulo checking and reducing inside a smart contract which is less efficient than doing it on the node. Of course this penalizes applications where this would be unnecessary by design.
The alternative is to reduce the cost of the mimc function on the AVM and avoid the modulo reduction by default, having the users take care of that on the AVM if needed.
If you like that more I'm happy to change the PR in that direction, I am personally torn between the two options.

@jannotti
Copy link
Contributor

I've submitted this. I don't know what they'll say, because I don't know the state of standardization of mimc hashing. Consensys/gnark-crypto#504

... Simply modulo reducing any arbitrary input is weak collision wise, the only other sensible "default" which is what this PR implementation does on the avm, is to take the modulo of each 32 byte chunk, which they rejected to avoid the performance penalty.

I disagree. They rejected that on the grounds that it allows collisions. In their response to me they noted:

We discarded the solution of reducing the blocks of 32 bytes modulo r as it would allow collisions.

The point here is that collisions in a hash function are bad (automatic modulo reduction could allow an attacker to present two different proofs of the same thing, and it's easy to think of reasons that would cause problems), so they disallow such inputs. I think that is the prudent choice. I believe the AVM should simply panic if presented with such an input, just as it would if presented with a 31 byte input. This is analogous to the many places the ec opcodes panic in similar contexts. I don't think this is pushing a performance problem into the smart contract. The smart contract shouldn't bother checking these things either (and certainly shouldn't perform the modulo reduction itself). It should let this opcode panic.

It's a little ugly to re-use the ec constants, maybe that should be
changed.

This also changes the opcode to panic on buffers than contain elements
greater than the curve's modulus.

It's unclear what mimc should do with a zero buffer.  Even gnark seems
unsure.  Their code says:

```
// TODO @ThomasPiellard shouldn't Sum() returns an error if there is no data?
// TODO: @Tabaie, @thomas Piellard Now sure what to make of this
/*if len(d.data) == 0 {
	d.data = make([]byte, BlockSize)
}*/
```
@jannotti
Copy link
Contributor

I made this PR on yours to make these changes. Happy to discuss further.
giuliop#1

@giuliop
Copy link
Contributor Author

giuliop commented May 14, 2024

I disagree. They rejected that on the grounds that it allows collisions. In their response to me they noted:

Yes, you are right.

The point here is that collisions in a hash function are bad (automatic modulo reduction could allow an attacker to present two different proofs of the same thing, and it's easy to think of reasons that would cause problems), so they disallow such inputs. I think that is the prudent choice. I believe the AVM should simply panic if presented with such an input, just as it would if presented with a 31 byte input. This is analogous to the many places the ec opcodes panic in similar contexts. I don't think this is pushing a performance problem into the smart contract. The smart contract shouldn't bother checking these things either (and certainly shouldn't perform the modulo reduction itself). It should let this opcode panic.

I agree with you, let's let the AVM fail if any 32 byte chunk represents a number higher than the modulus.

There are some niche cases where the smart contract might still want to perform the modulo reduction itself but in the vast majority it would not be needed.
An example of niche case is hashing an Algorand address since some of them have a 32 byte representation higher than the modulus. In this case a known collision might be acceptable if exploiting it also required knowing the associated private key, which would not be possible.

A final note on security: when you generate a proof you can always pass inputs higher than the modulus, the zk circuit does indeed perform automatic modulo reduction on inputs/outputs.
To exploit that in practice, you would give to the verifier the proof and (some) public inputs higher than the modulus to have a valid "collision". To defend against that, a well written verifier needs to perform a check on each public input and fail if any of those is higher than the modulus. The verifiers generated by AlgoPlonk do that of course.
The other attack vector is if the zk circuit is not well designed to take into account the automatic modulo reduction on private inputs which cannot be checked by the verifier. A simple example would be naively checking A = B + C without checking for modulo overflow inside the circuit.

@giuliop
Copy link
Contributor Author

giuliop commented May 14, 2024

I made this PR on yours to make these changes. Happy to discuss further. giuliop#1

Thank you, I will review tonight!

@giuliop
Copy link
Contributor Author

giuliop commented May 15, 2024

I incorporated your changes and made two new changes:

  1. I made the opcode fail on empty input, to be more conservative.
    In a zk circuit there is no concept of empty input, every input is an integer, and there is no way to pass to mimc an empty input.

  2. I added comprehensive test vectors for mimc. I did not copy the generation code in the comments since it's 240 lines of go code, just put a link to it and explained the approach

I agree with you that using the EC constant is a little ugly but still acceptable I think.

@jannotti
Copy link
Contributor

Could you add mimcVersion to experiments so we don't release it to mainnet by mistake if we bump LogicVersion before this is completely ready?
var experiments = []uint64{spOpcodesVersion, mimcVersion}

I also have one request around your test code, but I appreciate your nice complete set of unit tests. I'll make my comment there.

With that done, I think this is mergable. It would stay in vFuture until we develop some usage, and spend some time making sure we feel good about MiMC, but it seems basically ready.

@giuliop
Copy link
Contributor Author

giuliop commented Oct 9, 2024

Thinking about it I realized we don't really need to pass two arguments to mimc, one for the curve and one for the other configuration parameters, since the curve is also a configuration parameter and we can have just one argument for all.
Also ECGroup was not a great choice anyway to represent the curve since it really represents a group in the curve, not a curve.

So I created a new MimcConfig type that can take one of two values for now BN254_MP_110 and BLS12_381_MP_111 that we can extend with new values in the future.

I have also improved the documentation to point out the caveats discussed above.

@giuliop
Copy link
Contributor Author

giuliop commented Oct 9, 2024

I rerun the cost benchmarking and update the cost for the mimc opcode, now it is more precise.

Before it was overestimated because I didn't know about the 1 'tick' ≈ 15 ns rule of thumb and made the cost relative to the other hash functions at small sizes, but those have a fixed cost which overshoots on small sizes and undershoots on large inputs, while mimc has a variable cost based on input length.

Here's the updated benchmarking:

Function Input Size ns Ticks (ns/15) Opcode Cost
sha256 32 86.09 6 35
sha256 128 134.1 9 35
sha256 512 282 19 35
sha256 1024 475.8 32 35
sha256 4096 1645 110 35
keccak256 32 353.2 24 130
keccak256 128 316.8 21 130
keccak256 512 995.2 66 130
keccak256 1024 1909 127 130
keccak256 4096 7047 470 130
mimc_BN254_MP_110 32 4823 322 360
mimc_BN254_MP_110 128 19122 1275 1410
mimc_BN254_MP_110 512 76431 5095 5610
mimc_BN254_MP_110 1024 154827 10322 11210
mimc_BN254_MP_110 4096 640658 42711 44810
mimc_BLS12_381_MP_111 32 4938 329 360
mimc_BLS12_381_MP_111 128 19557 1304 1410
mimc_BLS12_381_MP_111 512 78415 5228 5610
mimc_BLS12_381_MP_111 1024 160037 10669 11210
mimc_BLS12_381_MP_111 4096 673366 44891 44810

@giuliop
Copy link
Contributor Author

giuliop commented Oct 9, 2024

I think this PR is good to go now !

@cpeikert
Copy link

cpeikert commented Nov 8, 2024

All of these decisions seem sensible to me from a cryptography/security perspective. I will leave it to @jannotti to comment on implementation, tests, etc. and decide on merging.

@jannotti jannotti changed the title Add new teal opcodes for the MiMC hash function to support Zero Knowledge Proofs AVM: new teal opcodes for the MiMC hash function to support Zero Knowledge Proofs Dec 13, 2024
@gmalouf gmalouf requested review from cpeikert and removed request for cpeikert December 13, 2024 22:50
Copy link
Contributor

@algorandskiy algorandskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me. @jannotti what do you think about cost test? I can't see any linearCost tests.

data/transactions/logic/pairing.go Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants