Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SnarkyJS SHA/Keccak #9

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
179 changes: 179 additions & 0 deletions 0005-snarkyjs-ecdsa-sha.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,179 @@
# Exposing ECDSA and SHA3/Keccak to SnarkyJS

## Summary

Foreign cryptography primitives such as ECDSA and SHA3 are widely used outside of Mina. For example, Ethereum uses ECDSA over secp256k1 for signatures - in order to "communicate" with the outside world and other blockchains, SnarkyJS (and, therefore Mina) needs to support these primitives as well. This RFC describes how we will leverage the custom gates implemented by the crypto team and expose them to SnarkyJS, making them accessible to smart contract developers.
Trivo25 marked this conversation as resolved.
Show resolved Hide resolved

## Motivation

The initial [Ethereum Crypto Primitive PRD](https://www.notion.so/minaprotocol/Ethereum-Primitives-Support-PRD-d89af720e1c94f7b90166709432e7bd5) describes the importance of establishing cryptographic compatibility with Ethereum. The [ECDSA PRD](https://www.notion.so/minaprotocol/ECDSA-ver-gadget-PoC-PRD-9458c38adf204d6b922deb8eed1ac193) and the
[Keccak PRD](https://www.notion.so/minaprotocol/Keccak-gadget-PoC-PRD-59b024bce9d5441c8a00a0fcc9b356ae) then go into detail and describe two of the most important building blocks to achieve Ethereum compatibility in a cryptographic sense.

This RFC describes the steps needed to expose both ECDSA and SHA3/Keccak to SnarkyJS, enabling SnarkyJS developer to use these primitives in their applications and smart contracts.

It is important to mention that this RFC only considers the work required to _expose_ already existing cryptographic primitives (gates and gadgets), which have previously been implemented by the crypto team, from the OCaml bindings layer to SnarkyJS - it is not required to implement additional cryptographic primitives. Changes will only impact [snarkyjs-bindings](https://github.com/o1-labs/snarkyjs-bindings) and [SnarkyJS](https://github.com/o1-labs/snarkyjs) itself.

Once completed, SnarkyJS users will be able to leverage ECDSA and SHA3/Keccak to build applications that integrate with Ethereum and other use cases that require the use of said cryptographic primitives.

## Detailed design

### SHA3/Keccak

The Keccak and SHA3 gadget has been implemented by the crypto team, enabling us to leverage the already existing bindings layer to and from OCaml. This design allows us to simply integrate the new gadgets into SnarkyJS by simply exposing them in `ocaml/lib/snarky_js_bindings_lib.ml`.

For Keccak/SHA3, the implementation exposes two ready-to-go functions.
Trivo25 marked this conversation as resolved.
Show resolved Hide resolved

```ocaml
(** Gagdet for NIST SHA-3 function for output lengths 224/256/384/512 *)
val nist_sha3 :
(module Snarky_backendless.Snark_intf.Run with type field = 'f)
-> int
-> 'f Snarky_backendless.Cvar.t list
-> 'f Snarky_backendless.Cvar.t array

(** Gadget for Keccak hash function for the parameters used in Ethereum *)
val eth_keccak :
Trivo25 marked this conversation as resolved.
Show resolved Hide resolved
(module Snarky_backendless.Snark_intf.Run with type field = 'f)
-> 'f Snarky_backendless.Cvar.t list
-> 'f Snarky_backendless.Cvar.t array

```

These two functions will be imported into the bindings layer and exposed via a new sub-module as part of the already existing `Snarky` module.

```ocaml

module Sha = struct
let create message nist length =
let message_array = Array.to_list message in
if Js.to_bool nist then
Trivo25 marked this conversation as resolved.
Show resolved Hide resolved
Kimchi_gadgets.Keccak.nist_sha3 (module Impl) length message_array
else Kimchi_gadgets.Keccak.eth_keccak (module Impl) message_array
Trivo25 marked this conversation as resolved.
Show resolved Hide resolved
end

```

In order to reduce the bindings surface, Keccak and SHA3 will be exposed via a `create` function, which behaves like a factory pattern.
By calling this function inside SnarkyJS, we can define and expose all possible variants of SHA3(224/256/384/512) and Keccak without the need to have an individual function in the bindings layer for each variant.
Trivo25 marked this conversation as resolved.
Show resolved Hide resolved

In order to differentiate Poseidon, which works over native Field elements, and SHA3 and Keccak which works over byte-sized Field elements, it will be beneficial to introduce a new type `UInt8` (a Field element that is exactly a byte) to draw a clan line between both hash functions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems useful. Nonetheless, the gadget already performs all necessary constraints to make sure that the inputs are indeed 8 bits at most. That means it is sound. But if you want to provide the user interface with a much more explicit type for this so they can see if it is ok at compile time, then I believe it is a great idea for user experience.

Copy link
Contributor

@mitschabaude mitschabaude Jun 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is important information! If we had a UInt8 type, it would actually be preferable to constrain its range as it is introduced into provable code. This is how we usually do it, so that any user of UInt8 in provable code can rely on the fact that this value can only be 8 bits long. (You might have seen the same done for booleans in snarky).

So, could it be a possibility to add a flag to the existing gadgets that disables the 8 bit range check on inputs? @querolita

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea @mitschabaude - makes sense.

Note typo: draw a clan line

Copy link
Member

@querolita querolita Jun 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answering your question @mitschabaude, yes, it could be easily added so that you have access to it from SnarkyJS. From a cryptographic point of view, I still have the feeling that I do prefer to leave those constraints in the circuit. Meaning, even if the JS code won't allow you to introduce anything larger than UInt8, it still relies on trusting that the types are correct rather than a mathematical proof that it is indeed the case. But if this is something that SnarkyJS does often, I guess it is fine if you can disable the check in case the user introduces a UInt8 as input.

Copy link
Contributor

@mitschabaude mitschabaude Jun 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@querolita I may have been not clear enough -- what I'm proposing is to add constraints which prove that inputs fit in 8 bits. It's just better to add those constraints when a value is introduced, and not inside a specific gadget.

Assume that we have multiple gadgets which operate on UInt8. If every gadget adds constraints to check that the Uint8 is only 8 bits long, then we will add those gates multiple times (if we use multiple of these gadgets).

If, instead, we add the constraints whenever a Uint8 is introduced (with exists), and not in each gadget, then we will have those constraints only once.

In snarkyjs and in snarky, we already have a way to add constraints to the proof everytime a variable of a certain type is introduced with exists. In snarky, every typ can have a check method which contains those "type-defining" constraints.

exists calls typ.check:
https://github.com/o1-labs/snarky/blob/14f8e2ff981a9c9ea48c94b2cc1d8c161301537b/src/base/checked_runner.ml#L277

and types such as booleans include their defining constraints as part of typ.check:
https://github.com/o1-labs/snarky/blob/14f8e2ff981a9c9ea48c94b2cc1d8c161301537b/src/base/utils.ml#L278-L281

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, I completely agree with you @querolita that we can never leave out constraints like this.

even if snarkyjs would throw an error on creating a UInt8 which doesn't fit in 8 bits, that would not prevent adversarial provers from including such a value in their proof: they can simply create their proof using a modified version of snarkyjs which doesn't throw this error. a modification like this is trivial and doesn't change the constraints, so an adversarial proof like that would be valid against an on-chain verification key, which simply means smart contracts would have a security vulnerability

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooooooh!!! Okay I get you now! Nice very nice indeed! Actually we had discussions early in the days of the design of Keccak where we were analysing the possibility to add a "Byte" type, together with existing ones like Boolean or Field. But we thought it might not be super urgent at the Snarky level, but it indeed looks interesting at the SnarkyJS level.

I really like this idea now. And yes, I can create a flag to disable these checks inside the gadget, if they have already been constrained at UInt8 creation time. I will create a subsequent PR with this, together with other improvements that have been suggested along this discussion.


In SnarkyJS, these new primitives will be declared as followed:

```ts
function buildSha(length: 224 | 256 | 385 | 512, nist: boolean) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update this with the most object too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly do you mean by "the most object"?

return {
hash(message: UInt8[]) {
let payload = [0, ...message.map((f) => f.value)];
return Snarky.sha.create(payload, nist, length).map(Field);
},
};
}

const Sha3_224 = buildSha(224, true);
const Sha3_256 = buildSha(256, true);
const Sha3_385 = buildSha(385, true);
const Sha3_512 = buildSha(512, true);
const Keccak = buildSha(256, false);
```

Another alternative to the factory pattern above could be to supply the developer with a single function that takes a range of parameters, so that the developer can choose their flavour of SHA3/Keccak on their own.

```ts
function SHA3(
length: 224 | 256 | 385 | 512,
nist?: boolean = true
Trivo25 marked this conversation as resolved.
Show resolved Hide resolved
): { hash(xs: UInt8[]) };
```

Developers can then import these functions into their project via

```ts
import { Sha3_224, Sha3_256, Sha3_385, Sha3_512, Keccak } from "snarkyjs";

Sha3_224.hash(xs);
// ..
```

### Alternative Approach
Copy link

@MartinMinkov MartinMinkov Jun 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've spent some time researching and thinking about our approach to the API, and I feel a Hash interface could be a great fit for us.

I took a look into the functionality of various popular JS runtimes like Node.JS, Deno, and Bun, I noticed they offer builder patterns for creating hash functions. While their approach is great, considering the diverse configuration options they have to deal with, our needs aren't as complex (I think). So, perhaps we might not need to follow the builder pattern for our hash functions.

I also looked into how Noir structures its approach: https://noir-lang.org/standard_library/cryptographic_primitives/hashes.

They follow a std:hash:[hash_name] style, influenced by Rust, which I found quite appealing. It feels efficient and neat to call the function directly without any additional building process. My vote goes towards an interface-styled approach, happy to hear other opinions, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with having the different hashes directly!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking... in the snark these hashes come with some circuit constants and it may be more efficient to include those in the circuit once at the start of the circuit. If the "direct hash" API isn't able to facilitate this, then the "builder" approach could have an advantage since in that approach you could add those constraints when building the hash configuration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jspada, don't we have a way to include constants only once anyway? "Constant unification"? When you use a constant that already exists in the circuit, it will just be wired to the existing one?
If that's true and we can use it, it seems to me we don't need a special setup command to avoid adding the same constants multiple times. cc @mrmr1993?


Another possibility is to combine all hash functions, including Poseidon, under a shared namespace `Hash`. Developers will then be able to use these functions by calling `Hash.[hash_name].hash(xs)`. However, this would not be equivalent to the existing `Poseidon` API.
Trivo25 marked this conversation as resolved.
Show resolved Hide resolved

However, the existing `Poseidon` API could be deprecated in favour of the new `Hash` namespace.
Trivo25 marked this conversation as resolved.
Show resolved Hide resolved

```ts
const Hash = {
Poseidon: {
hash: (xs: UInt8[]) => "digest",
},

// ..

SHA3_224: {
hash: (xs: UInt8[]) => "digest",
},

Keccak: {
hash: (xs: UInt8[]) => "digest",
},
};
```

Developers would then be able to simply import the `Hash` namespace into their project and use all available hash functions.

```ts
import { Hash } from "snarkyjs";

Hash.Poseidon.hash(xs);
Hash.Keccak.hash(xs);
// ..

// deprecated in favor of Hash.Poseidon.hash
Poseidon.hash(xs);
```

It is important to mention that we have a lot of freedom in the the implementation details of the API, as the gates and gadgets are already implemented in OCaml and just need to exposed and wrapped by a developer friendly API.

One goal of the API is to design an easy to use interface for developers as well as maintaining consistency with already existing primitives (Poseidon in this case, Schnorr in the case of ECDSA).
The usage of both Keccak/SHA3 and Poseidon is the same: The user passes an array of Field elements into the hash function and the output is returned. One important detail is that Keccak/SHA3 only accepts an array of Field element that each are no larger than 1 byte. We will also provide additional helper functions that allow conversion between Field element arrays and hexadecimal encoded strings.

```ts
function fieldBytesFromHex(hex: string): UInt8[];
function hexToFieldBytes(xs: UInt8[]): string;
```

Overall, exposing new gadgets and gates follow a strict pattern that has been used all over in the SnarkyJS bindings layer. As an example, the [Poseidon](https://github.com/o1-labs/snarkyjs-bindings/blob/main/ocaml/lib/snarky_js_bindings_lib.ml#L386) implementation behaves similarly. From the point of view of SnarkyJS, these gadgets are just another set of function calls to OCaml.

## ECDSA
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this section if there will be separate RFC?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! It will be removed with the other PR, which is stacked on top of this PR #14


**TODO** this will be a seperate PR stacked on top of this one

**Evergreen, wide-sweeping Details**

**Ephemeral details**

## Test plan and functional requirements

## SHA3/Keccak
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test plan and functional requirements and SHA3/Keccak sections are of the same level. Later one probably should be a sub-section (###).


In order to test the implementation of SHA3 and Keccak in SnarkyJS, we will follow the testing approach we already apply to other gadgets and gates.
This includes testing the out-of- and in-snark variants using our testing framework, as well as adding a SHA3 and Keccak regression test. The regression tests will also include a set of predetermined digests to make sure that the algorithm doesn't unexpectedly change over time (similar to the tests implemented for the OCaml gadget).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a set of predetermined digests

If we're considering to utilize the test vectors from organizations like NIST (to ensure the correctness of the implementation), then we should specify it.

regression test

It will be good to also mention what will form the regression testing suite (for example, functional tests, specific edge cases, errors handling, backward compatibility, etc.).

Copy link
Contributor

@mitschabaude mitschabaude Jun 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, we have a "regression test suite" in use in snarkyjs CI, which this RFC refers to. it checks that constraints generated using a given list of contracts / provable methods don't change: https://github.com/o1-labs/snarkyjs/blob/main/src/examples/vk_regression.ts

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for specific edge cases, errors handling

Since padding is important for security and there are at least two kinds of padding supported by our Keccak gadgets, from a security perspective testing different length messages (e.g. empty, zero, one less than padding lengths, one more, multiples of, etc) are great edge-cases to test imho!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks everyone! I slightly edited this section to reflect your ideas. I am a big fan of adding (maybe not all, but at least some) of the NIST test vectors - what are your thoughts on this? Is this already being done in the crypto side? (didn't explicitly see that)


In addition to that, we should provide a dedicated integration test that handles SHA3/Keccak hashing within a smart contract (proving enabled). This will allow us to not only provide developers with an example, but also ensure that SHA3 and Keccak proofs can be generated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Are we concerned about performance testing?
    • For example, hashing large amounts of data or rapidly making hash requests.
  • Are we concerned about collision testing?
  • Are we concerned about security testing?
    • Possible/known vulnerabilities, cryptographic weakness, etc.
  • Are we concerned about usability testing?
    • In a form of gathering the feedback from a small group of developers to improve the design and usability of the API before releasing it.

  • The RFC indicates that hashing with SHA3 and Keccak is more expensive than using Poseidon. Performance metrics, such as time and memory usage, should be gathered and analyzed to provide developers with a clear understanding of the subject.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question - @querolita to what extend has the original crypto implementation be tested? Do you have any recommendations to what we should test, if at all?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests performed to the implementation were:

  • Checking different output lengths and different configurations of the padding (NIST's for SHA3 and submission's for Keccak)
  • Checking different length outputs with different numbers of blocks
  • Edge cases of padding to make sure that a whole new block is created, and messages that already look padded
  • Checking different endianness of inputs and outputs
  • Check that non-byte length inputs cannot be passed
  • Can reuse constraint system for the same circuit structure
  • Cannot reuse constraint system for different output lengths
  • Cannot reuse constraint system for same output length but different padding
  • Cannot reuse constraint system for different endianness

I would create similar tests for SnarkyJS, but probably more high level since you don't have access to low level details such as constraint system creation (that I am aware of). In particular, I would make sure that inputs which are not "correct formatted hex values" (meaning an odd number of hex digits) is not passed to the function. And of course making sure that all combinations actually produce the expected value (pairs of output lengths and padding types), for some random values. I wouldn't go much deeper than that regarding the snarkyjs level alone.

In that sense, we are not testing against collision, just comparing against official outputs of the hash function on those given inputs, nor performance testing (all we know is 1 block of encryption, which is enough for 1080bits of input data, fits in 15k kimchi rows, thus 4 full blocks can be hashed without chunking), we haven't looked for keccak's potential vulnerabilities, and we have not gathered usability info from users other than ourselves.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shimkiv does this sounds good to you? Is there anything else you would change here?

Copy link
Member

@shimkiv shimkiv Jul 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will follow the testing approach we already apply to other gadgets and gates

It will be good to add some implementation example links here.

will also include a set of predetermined digests

Can we please be more specific here? What kind of set? Is it standardised set? If not then how we determine the set to be good enough?
Open question

similar to the tests implemented for the OCaml gadget

Can we please include links here?

We will include a range of edge cases in the tests (e.g. empty input, zero, etc).

It will be good if we can be more explicit here and at least highlight the major cases we're aware of and that will be included into the testing plan.

In that sense, we are not testing against collision, just comparing against official outputs of the hash function on those given inputs, nor performance testing (all we know is 1 block of encryption, which is enough for 1080bits of input data, fits in 15k kimchi rows, thus 4 full blocks can be hashed without chunking), we haven't looked for keccak's potential vulnerabilities, and we have not gathered usability info from users other than ourselves.

If we agree on this then it would be good if we can add listed items into the "out of the scope" testing section with some rationale.


## Drawbacks

Compared to Poseidon, hashing with SHA3 and Keccak is expensive. This should be made clear to the developer to avoid inefficient circuits. Additionally, it is important to educate developers of when to use SHA3/Keccak and when to use Poseidon. Additionally, the API should be secure.
Trivo25 marked this conversation as resolved.
Show resolved Hide resolved
Adding new primitives, especially cryptographic primitives, always includes risks such as the possibility of not constraining the algorithm and input enough to provide the developer with a safe API that is required to build secure applications. However, adding these primitives to SnarkyJS enables developers to explore a new range of important use cases.

## Rationale and alternatives

Keccak and SHA3 could not be exposed to SnarkyJS at all. However, this would essentially render these primitives useless since they were specifically designed to be used by developers with SnarkyJS. By adding these primitives, SnarkyJS will become an even more powerful zero-knowledge SDK that enables developers to explore a wide range of use cases.
Trivo25 marked this conversation as resolved.
Show resolved Hide resolved

## Prior art

Exposing gates and gadgets from the OCaml layer to SnarkyJS is nothing new - the same procedure has been applied to other primitives such as Poseidon, Field, and Elliptic Curve operations.

Trivo25 marked this conversation as resolved.
Show resolved Hide resolved
## Unresolved questions

No unresolved questions.