Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create payload->blob type system #1120

Open
wants to merge 28 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
9b222e0
Create type system
litt3 Jan 16, 2025
bae8843
Rename encoding
litt3 Jan 16, 2025
ebf9979
Merge branch 'master' into payload-blob-type-system
litt3 Jan 16, 2025
d66ddf2
Fix padding logic
litt3 Jan 16, 2025
f6021ea
Pad upon poly construction
litt3 Jan 16, 2025
422c2a8
Change name to ProtoBlob
litt3 Jan 16, 2025
a3d184b
Update comments
litt3 Jan 16, 2025
856f5fc
Revert "Change name to ProtoBlob"
litt3 Jan 17, 2025
1eeec85
Rename PayloadHeader to EncodedPayloadHeader
litt3 Jan 21, 2025
a30a4ad
Merge branch 'master' into payload-blob-type-system
litt3 Feb 7, 2025
5eab8dd
Rename to PayloadEncodingVersion
litt3 Feb 7, 2025
132e8de
Do a bunch of cleanup
litt3 Feb 7, 2025
da63536
Fix PayloadVersion naming
litt3 Feb 7, 2025
3016cac
Merge branch 'master' into payload-blob-type-system
litt3 Feb 7, 2025
9f98462
Use new conversion method
litt3 Feb 7, 2025
58b76c1
Add better descriptions to codec utils
litt3 Feb 10, 2025
916573f
Do test work
litt3 Feb 10, 2025
7eed618
Add length checks
litt3 Feb 11, 2025
c3a6291
Merge branch 'master' into payload-blob-type-system
litt3 Feb 11, 2025
b604c82
Write test for power of 2 util
litt3 Feb 11, 2025
2ef8e99
Write more tests
litt3 Feb 12, 2025
13d16c7
Finish utils tests
litt3 Feb 12, 2025
22e8134
Do FFT work
litt3 Feb 12, 2025
23b49c4
Clean up encoded payload
litt3 Feb 12, 2025
2add571
Merge branch 'master' into payload-blob-type-system
litt3 Feb 12, 2025
e5363e4
Make encoding version uint8 instead of byte
litt3 Feb 13, 2025
7801044
Add explanation for 0x00 padding
litt3 Feb 13, 2025
c70fad8
Use fuzz testing
litt3 Feb 13, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
79 changes: 79 additions & 0 deletions api/clients/codecs/blob.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,79 @@
package codecs

import (
"fmt"
)

// Blob is data that is dispersed on eigenDA.
//
// A Blob is represented under the hood by a coeff polynomial
type Blob struct {
coeffPolynomial *coeffPoly
// blobLength must be a power of 2, and should match the blobLength claimed in the BlobCommitment
// This is the blob length IN SYMBOLS, not in bytes
blobLength uint32
Comment on lines +13 to +14
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider renaming to blobLengthSymbols

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although is symbol standard here? I feel like our whitepaper is the only place I've ever seen it. Should we just use FE for field element everywhere? BlobLengthFE?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally used blobLength to try to keep consistency with the struct in the commitment, but I agree, it's not ideal that I have to put in comments everywhere that it's in symbols

I don't have a preference between BlobLengthFE or BlobLengthSymbols. Usage of Symbols is pretty pervasive in our codebase, but also I don't think anyone would be confused by the FE terminology. Do you have a preference?

Comment on lines +11 to +14
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unclear how coeffPolynomial and blobLength are related. Does len(coeffPolynomial) == blobLength? Guessing we allow last zeros to be truncated. We should explain that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also can't we just enforce that blobLength = nextPowerOf2(len(coeffPolynomial)) and then we don't need blobLength here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a corner case here:

  • Imagine a user disperses a very small blob, only 64 bytes, and the last 40 bytes are trailing 0s
  • When a different user fetches the blob from a relay, we've established that the relay could remove trailing 0s
  • If we were to say that blobLength = nextPowerOf2(len(coeffPolynomial)), then the user fetching and reconstructing this blob would determine that the blob length is 1 symbol, when it's actually 2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add a description of this corner case as a comment, if you confirm my logic here

}

// BlobFromBytes initializes a Blob from bytes
//
// blobLength is the length of the blob IN SYMBOLS
func BlobFromBytes(bytes []byte, blobLength uint32) (*Blob, error) {
poly, err := coeffPolyFromBytes(bytes)
if err != nil {
return nil, fmt.Errorf("polynomial from bytes: %w", err)
}

return BlobFromPolynomial(poly, blobLength)
}
Comment on lines +20 to +27
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment above, do we not want to enforce some relationship between len(bytes) and blobLength?


// BlobFromPolynomial initializes a blob from a polynomial
//
// blobLength is the length of the blob IN SYMBOLS
func BlobFromPolynomial(coeffPolynomial *coeffPoly, blobLength uint32) (*Blob, error) {
return &Blob{
coeffPolynomial: coeffPolynomial,
blobLength: blobLength,
}, nil
}

// GetBytes gets the raw bytes of the Blob
func (b *Blob) GetBytes() []byte {
return b.coeffPolynomial.getBytes()
}
Comment on lines +40 to +42
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have a preference for Bytes() without the get prefix

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or toBytes() to be consistent with ToPayload?

Copy link
Contributor Author

@litt3 litt3 Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally tend to understand a nuanced difference between To and Get

  • to (or in other contexts, compute) seems to indicate that the data is changing form, or being processed in some way
  • get implies that the method is simply fetching something that already exists, byte for byte

So, I chose ToPayload because the method is transforming the data, to produce a payload. And it's GetBytes, because we are returning the literal bytes, as they already exist.

But maybe I imagine this nuance, in which case I'm ok changing the current names. What would your preferred naming be for these two methods? Bytes() and ToPayload()?


// ToPayload converts the Blob into a Payload
//
// The payloadStartingForm indicates how payloads are constructed by the dispersing client. Based on the starting form
// of the payload, we can determine what operations must be done to the blob in order to reconstruct the original payload
func (b *Blob) ToPayload(payloadStartingForm PolynomialForm) (*Payload, error) {
Comment on lines +46 to +48
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I prefer if get rid of the starting terminology. It introduces time which I personally find confusing. For me its really just "how does a rollup interpret its payloads as polynomial, ALWAYS", aka its more of an invariant or a way of thinking, than a "starting point".

Wdyt of just payloadForm or payloadPolyForm?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think payloadForm is good. I will change it

var encodedPayload *encodedPayload
var err error
switch payloadStartingForm {
case PolynomialFormCoeff:
// the payload started off in coefficient form, so no conversion needs to be done
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again to me its not about "starting". its more about the invariant that we want to convert a blob to a payload. This requires knowing 2 piece of info:

  1. how does rollup interpret its payload
  2. how does eigenda interpret its blobs (always coeff)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// the payload started off in coefficient form, so no conversion needs to be done
// the payload is interpreted as coefficients of the polynomial, so no conversion needs to be done given that eigenda also interprets blobs as coefficients.

encodedPayload, err = b.coeffPolynomial.toEncodedPayload(b.blobLength)
if err != nil {
return nil, fmt.Errorf("coeff poly to encoded payload: %w", err)
}
case PolynomialFormEval:
// the payload started off in evaluation form, so we first need to convert the blob's coeff poly into an eval poly
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// the payload started off in evaluation form, so we first need to convert the blob's coeff poly into an eval poly
// the payload is interpreted as evaluations of a polynomial, so we need to convert the blob given that eigenda interprets blobs as coefficients.

evalPoly, err := b.coeffPolynomial.toEvalPoly(b.blobLength)
if err != nil {
return nil, fmt.Errorf("coeff poly to eval poly: %w", err)
}

encodedPayload, err = evalPoly.toEncodedPayload(b.blobLength)
if err != nil {
return nil, fmt.Errorf("eval poly to encoded payload: %w", err)
}
default:
return nil, fmt.Errorf("invalid polynomial form")
}

payload, err := encodedPayload.decode()
if err != nil {
return nil, fmt.Errorf("decode payload: %w", err)
}

return payload, nil
}
16 changes: 9 additions & 7 deletions api/clients/codecs/blob_codec.go
Original file line number Diff line number Diff line change
Expand Up @@ -4,22 +4,24 @@ import (
"fmt"
)

type BlobEncodingVersion byte
type PayloadEncodingVersion uint8

const (
// This minimal blob encoding contains a 32 byte header = [0x00, version byte, uint32 len of data, 0x00, 0x00,...]
// PayloadEncodingVersion0 entails a 32 byte header = [0x00, version byte, big-endian uint32 len of payload, 0x00, 0x00,...]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add comment as to why we add the first 0x00 byte (so that it gets parsed as a bn254 FE). Took me a second to remember this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// followed by the encoded data [0x00, 31 bytes of data, 0x00, 31 bytes of data,...]
DefaultBlobEncoding BlobEncodingVersion = 0x0
//
// Each group of 32 bytes starts with a 0x00 byte so that they can be parsed as valid bn254 field elements.
PayloadEncodingVersion0 PayloadEncodingVersion = 0x0
)

type BlobCodec interface {
DecodeBlob(encodedData []byte) ([]byte, error)
EncodeBlob(rawData []byte) ([]byte, error)
}

func BlobEncodingVersionToCodec(version BlobEncodingVersion) (BlobCodec, error) {
func BlobEncodingVersionToCodec(version PayloadEncodingVersion) (BlobCodec, error) {
switch version {
case DefaultBlobEncoding:
case PayloadEncodingVersion0:
return DefaultBlobCodec{}, nil
Comment on lines +24 to 25
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks a bit weird now. Why is payload version 0 mapping to a defaultBlobCodec? Should we use BlobCodecVersion0 instead? Or do we want separate versioning for the payloadEncoding and BlobCodec? Or are these things the same?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed the constant to what I felt made sense for the new code, and but the name doesn't work so well in the old blob_codec code. If I don't rename the constant, then the name doesn't work so well in the new code

Or are these things the same?

Yes, they are essentially the same thing.

default:
return nil, fmt.Errorf("unsupported blob encoding version: %x", version)
Expand All @@ -33,7 +35,7 @@ func GenericDecodeBlob(data []byte) ([]byte, error) {
// version byte is stored in [1], because [0] is always 0 to ensure the codecBlobHeader is a valid bn254 element
// see https://github.com/Layr-Labs/eigenda/blob/master/api/clients/codecs/default_blob_codec.go#L21
// TODO: we should prob be working over a struct with methods such as GetBlobEncodingVersion() to prevent index errors
version := BlobEncodingVersion(data[1])
version := PayloadEncodingVersion(data[1])
codec, err := BlobEncodingVersionToCodec(version)
if err != nil {
return nil, err
Expand All @@ -49,7 +51,7 @@ func GenericDecodeBlob(data []byte) ([]byte, error) {

// CreateCodec creates a new BlobCodec based on the defined polynomial form of payloads, and the desired
// BlobEncodingVersion
func CreateCodec(payloadPolynomialForm PolynomialForm, version BlobEncodingVersion) (BlobCodec, error) {
func CreateCodec(payloadPolynomialForm PolynomialForm, version PayloadEncodingVersion) (BlobCodec, error) {
lowLevelCodec, err := BlobEncodingVersionToCodec(version)
if err != nil {
return nil, fmt.Errorf("create low level codec: %w", err)
Expand Down
40 changes: 40 additions & 0 deletions api/clients/codecs/blob_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
package codecs

import (
"bytes"
"testing"

"github.com/stretchr/testify/require"
)

// TestBlobConversion checks that internal blob conversion methods produce consistent results
func FuzzBlobConversion(f *testing.F) {
for _, seed := range [][]byte{{}, {0x00}, {0xFF}, {0x00, 0x00}, {0xFF, 0xFF}, bytes.Repeat([]byte{0x55}, 1000)} {
f.Add(seed)
}

f.Fuzz(
func(t *testing.T, originalData []byte) {
testBlobConversionForForm(t, originalData, PolynomialFormEval)
testBlobConversionForForm(t, originalData, PolynomialFormCoeff)
})

}

func testBlobConversionForForm(t *testing.T, payloadBytes []byte, form PolynomialForm) {
payload := NewPayload(payloadBytes)

blob, err := payload.ToBlob(form)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The natural interpretation for this reads wrong :(
It reads "payload is getting to blob in this form" instead of "payload, interpreted in this form, gets converted to a blob".
Maybe there's a different way to structure the method such that the interpretation makes more sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like perhaps defining on Blob a method .fromPayloadInForm()? Then we could use like
new(Blob).fromPayloadInForm(payload, form). It's def more ugly but at least it reads more easily I feel?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of this idea: when constructing a payload, declare in the constructor what form it is. The sequence would then read:

payload := NewPayload(payloadBytes, form)
blob, err := payload.ToBlob()

I personally really dislike like the pattern of creating a new empty object, just to call a method that creates another instance of the object

require.NoError(t, err)

blobBytes := blob.GetBytes()
blobFromBytes, err := BlobFromBytes(blobBytes, blob.blobLength)
require.NoError(t, err)

decodedPayload, err := blobFromBytes.ToPayload(form)
require.NoError(t, err)

decodedPayloadBytes := decodedPayload.GetBytes()

require.Equal(t, payloadBytes, decodedPayloadBytes)
}
63 changes: 63 additions & 0 deletions api/clients/codecs/coeff_poly.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
package codecs

import (
"fmt"

"github.com/Layr-Labs/eigenda/encoding/fft"
"github.com/Layr-Labs/eigenda/encoding/rs"
"github.com/consensys/gnark-crypto/ecc/bn254/fr"
)

// coeffPoly is a polynomial in coefficient form.
//
// The underlying bytes represent 32 byte field elements, and each field element represents a coefficient
type coeffPoly struct {
fieldElements []fr.Element
}
Comment on lines +14 to +16
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This duplicates an existing type (currently in encoding/rs/frame_coeffs.go in master, named something different if you aren't up to date with the latest master`).

type FrameCoeffs []fr.Element

Also, there is a bunch of serialization logic in that file as well for going back and forth between serialized and deserialized form.

Is the plan to circle back and replace these old types in a follow up PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see []fr.Element used directly as a type in many places in this PR. Is there a reason why you don't use the fancy type you define here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why you put this into a struct instead of doing type coefPoly []fr.Element?


// coeffPolyFromBytes creates a new polynomial from bytes. This function performs the necessary checks to guarantee that the
// bytes are well-formed, and returns a new object if they are
func coeffPolyFromBytes(bytes []byte) (*coeffPoly, error) {
fieldElements, err := rs.ToFrArray(bytes)
if err != nil {
return nil, fmt.Errorf("deserialize field elements: %w", err)
}

return &coeffPoly{fieldElements: fieldElements}, nil
}

// coeffPolyFromElements creates a new coeffPoly from field elements.
func coeffPolyFromElements(elements []fr.Element) *coeffPoly {
return &coeffPoly{fieldElements: elements}
}

// toEvalPoly converts a coeffPoly to an evalPoly, using the FFT operation
//
// blobLength (in SYMBOLS) is required, to be able to choose the correct parameters when performing FFT
func (p *coeffPoly) toEvalPoly(blobLength uint32) (*evalPoly, error) {
// TODO (litt3): this could conceivably be optimized, so that multiple objects share an instance of FFTSettings,
// which has enough roots of unity for general use. If the following construction of FFTSettings ever proves
// to present a computational burden, consider making this change.
fftSettings := fft.FFTSettingsFromBlobLength(blobLength)

// the FFT method pads to the next power of 2, so we don't need to do that manually
fftedElements, err := fftSettings.FFT(p.fieldElements, false)
if err != nil {
return nil, fmt.Errorf("perform FFT: %w", err)
}

return evalPolyFromElements(fftedElements), nil
}

// GetBytes returns the bytes that underlie the polynomial
func (p *coeffPoly) getBytes() []byte {
return rs.FieldElementsToBytes(p.fieldElements)
}
Comment on lines +53 to +55
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My personal preference for methods like this where we convert to/from []byte is to use a serialize/deserialize naming schema. Open to discussion if you don't agree.


// toEncodedPayload converts a coeffPoly into an encoded payload
//
// blobLength is required, to be able to perform length checks on the encoded payload during construction.
// blobLength is in symbols, NOT bytes
func (p *coeffPoly) toEncodedPayload(blobLength uint32) (*encodedPayload, error) {
return encodedPayloadFromElements(p.fieldElements, blobLength)
}
35 changes: 35 additions & 0 deletions api/clients/codecs/conversion_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
package codecs

import (
"bytes"
"testing"

"github.com/stretchr/testify/require"
)

// FuzzConversionConsistency checks that data can be encoded and decoded repeatedly, always getting back the original data
func FuzzConversionConsistency(f *testing.F) {
for _, seed := range [][]byte{{}, {0x00}, {0xFF}, {0x00, 0x00}, {0xFF, 0xFF}, bytes.Repeat([]byte{0x55}, 1000)} {
f.Add(seed)
}

f.Fuzz(
func(t *testing.T, originalData []byte) {
payload := NewPayload(originalData)

blob1, err := payload.ToBlob(PolynomialFormEval)
require.NoError(t, err)

blob2, err := payload.ToBlob(PolynomialFormCoeff)
require.NoError(t, err)

decodedPayload1, err := blob1.ToPayload(PolynomialFormEval)
require.NoError(t, err)

decodedPayload2, err := blob2.ToPayload(PolynomialFormCoeff)
require.NoError(t, err)

require.Equal(t, originalData, decodedPayload1.GetBytes())
require.Equal(t, originalData, decodedPayload2.GetBytes())
})
}
2 changes: 1 addition & 1 deletion api/clients/codecs/default_blob_codec.go
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ func (v DefaultBlobCodec) EncodeBlob(rawData []byte) ([]byte, error) {
codecBlobHeader := make([]byte, 32)
// first byte is always 0 to ensure the codecBlobHeader is a valid bn254 element
// encode version byte
codecBlobHeader[1] = byte(DefaultBlobEncoding)
codecBlobHeader[1] = byte(PayloadEncodingVersion0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't this function EncodedBlob need to be changed? it returns []byte, shouldnt it return a Blob in your type system? Or whatever else this is (prob just an encodedPayload?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default_blob_codec will not be used in v2, all encoding is contained in the new structs


// encode length as uint32
binary.BigEndian.PutUint32(codecBlobHeader[2:6], uint32(len(rawData))) // uint32 should be more than enough to store the length (approx 4gb)
Expand Down
Loading
Loading