Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v6] Make hash usage extensible to enable concurrent support of multiple hash types #1362

Merged
merged 46 commits into from
Jan 27, 2025

Conversation

pjbgf
Copy link
Member

@pjbgf pjbgf commented Jan 6, 2025

Refactor the codebase so that it stops relying on plumbing.Hash's implementation details, and instead rely on exported methods.
This is a stepping stone to enable support of multiple hash types, which will be dealt with as a follow-up PR.

The hash package now contains all code regarding hash, with exception to Hasher related logic, which is spread between plumbing and hasher. plumbing.Hash now redirects to hash.SHA1Hash, which may (or may not) be dropped at a later stage.

The previous sha256 specific tests that relied on build tags were removed as they are now redundant.

Relates to #706 #899.
Supersedes to #920.

@pjbgf pjbgf added the breaking label Jan 6, 2025
@pjbgf pjbgf added this to the v6.0.0 milestone Jan 6, 2025
@pjbgf pjbgf requested a review from aymanbagabas January 6, 2025 00:10
Comment on lines 10 to 17
const (
// CryptoType defines what hash algorithm is being used.
CryptoType = crypto.SHA1
// Size defines the amount of bytes the hash yields.
Size = 20
Size = SHA1Size
// HexSize defines the strings size of the hash when represented in hexadecimal.
HexSize = 40
HexSize = SHA1HexSize
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need these?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still being used, instead of replacing all the refs to SHA1Size, to then have to change them again later, I think we are better off leaving as is and replace as the specific packages get multi-hash support.

Comment on lines 3 to 14
func Zero() SHA1Hash {
return SHA1Hash{}
}

func ZeroSHA256() SHA256Hash {
return SHA256Hash{}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we replace these with var zero hash.SHA1Hash and var zero256 hash.SHA256Hash?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reason for a func was to avoid the awkard position were API users are able to change the meaning of zero, using a single var will revert to that. However, I agree with the change for the time being, but it is likely this code will change again before it being ready for merging back to v6-exp.

Comment on lines 16 to 19

func (ih SHA256Hash) IsZero() bool {
var empty SHA256Hash
return ih == empty
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (ih SHA256Hash) IsZero() bool {
var empty SHA256Hash
return ih == empty
var empty256 SHA256Hash
func (ih SHA256Hash) IsZero() bool {
return ih == empty256

Comment on lines 26 to 29

func (ih SHA1Hash) IsZero() bool {
var empty SHA1Hash
return ih == empty
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (ih SHA1Hash) IsZero() bool {
var empty SHA1Hash
return ih == empty
}
var empty SHA1Hash
func (ih SHA1Hash) IsZero() bool {
return ih == empty
}

Comment on lines 104 to 126
// ZeroFromHash returns a zeroed hash based on the given hash.Hash.
//
// Defaults to SHA1-sized hash if the provided hash is not supported.
func ZeroFromHash(h hash.Hash) ObjectID {
switch h.Size() {
case SHA256Size:
return SHA256Hash{}
default:
return SHA1Hash{}
}
}

// ZeroFromHash returns a zeroed hash based on the given ObjectFormat.
//
// Defaults to SHA1-sized hash if the provided format is not supported.
func ZeroFromObjectFormat(f format.ObjectFormat) ObjectID {
switch f {
case format.SHA256:
return SHA256Hash{}
default:
return SHA1Hash{}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use these funcs now. We can add them later when the need arises

import "io"

// ObjectID represents a calculated hash, which is immutable by nature.
type ObjectID interface {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rename this to Hash to keep naming consistent

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am holding back on making this change while plumbing.Hash exists, to avoid additional confusion on having multiple "Hash" types. At present plumbing.Hash has more than one use which is leading me to think that potentially it could be split into two different types. Either way, let's make that decision together with the final design for the API (as per discord thread) - which will be in a follow-up PR.


// LazyObjectID represents an object hash which may not be known at the time
// the object is created. Or an objectID which changes during its lifetime.
type LazyObjectID interface {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, I'd rename this to LazyHash

plumbing/hasher/hasher.go Outdated Show resolved Hide resolved
@pjbgf
Copy link
Member Author

pjbgf commented Jan 6, 2025

@aymanbagabas changes made and code rebased. PTAL

Comment on lines 46 to 47
if len(in) < SHA1HexSize ||
len(in) > SHA256HexSize {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if len(in) < SHA1HexSize ||
len(in) > SHA256HexSize {
if len(in) != SHA1HexSize &&
len(in) != SHA256HexSize {

Comment on lines 80 to 81
if len(in) < SHA1Size ||
len(in) > SHA256Size {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if len(in) < SHA1Size ||
len(in) > SHA256Size {
if len(in) != SHA1Size &&
len(in) != SHA256Size {

hasher hash.Hash
m sync.Mutex
// both fields below are allocation optimisations.
b [20]byte
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
b [20]byte
b [hash.SHA1Size]byte

hasher hash.Hash
m sync.Mutex
// both fields below are allocation optimisations.
b [32]byte
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
b [32]byte
b [hash.SHA256Size]byte

)

type SHA1Hash struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could move the actual implementation in a sub package plumbing/hash/sha1 with a unified API for both sha1 and sha256

I could also see the use for an object format registry similar to crypto.RegisterHash

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could also see the use for an object format registry similar to crypto.RegisterHash

I had some discussions with the folks from gittuf a while back, and that seemed to support some use cases they had. Let's definitely consider this.

Some of these changes I already have locally. I am thinking on v6-transport merged first, and then getting them going. Just to avoid all the rebasing.

dependabot bot and others added 13 commits January 22, 2025 22:31
Bumps the golang-org group with 3 updates: [golang.org/x/crypto](https://github.com/golang/crypto), [golang.org/x/net](https://github.com/golang/net) and [golang.org/x/sys](https://github.com/golang/sys).


Updates `golang.org/x/crypto` from 0.31.0 to 0.32.0
- [Commits](golang/crypto@v0.31.0...v0.32.0)

Updates `golang.org/x/net` from 0.32.0 to 0.34.0
- [Commits](golang/net@v0.32.0...v0.34.0)

Updates `golang.org/x/sys` from 0.28.0 to 0.29.0
- [Commits](golang/sys@v0.28.0...v0.29.0)

---
updated-dependencies:
- dependency-name: golang.org/x/crypto
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: golang-org
- dependency-name: golang.org/x/net
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: golang-org
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: golang-org
...

Signed-off-by: dependabot[bot] <[email protected]>
Bumps [github.com/ProtonMail/go-crypto](https://github.com/ProtonMail/go-crypto) from 1.1.3 to 1.1.5.
- [Release notes](https://github.com/ProtonMail/go-crypto/releases)
- [Commits](ProtonMail/go-crypto@v1.1.3...v1.1.5)

---
updated-dependencies:
- dependency-name: github.com/ProtonMail/go-crypto
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.22.11 to 3.28.3.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Commits](github/codeql-action@v2.22.11...v3.28.3)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
…ng-org-61daae229c

build: bump the golang-org group with 3 updates
…github/codeql-action-3.28.3

build: bump github/codeql-action from 2.22.11 to 3.28.3
Bumps [github.com/pjbgf/sha1cd](https://github.com/pjbgf/sha1cd) from 0.3.0 to 0.3.2.
- [Release notes](https://github.com/pjbgf/sha1cd/releases)
- [Commits](pjbgf/sha1cd@v0.3.0...v0.3.2)

---
updated-dependencies:
- dependency-name: github.com/pjbgf/sha1cd
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
…ub.com/ProtonMail/go-crypto-1.1.5

build: bump github.com/ProtonMail/go-crypto from 1.1.3 to 1.1.5
…ub.com/pjbgf/sha1cd-0.3.2

build: bump github.com/pjbgf/sha1cd from 0.3.0 to 0.3.2
Signed-off-by: Paulo Gomes <[email protected]>
Recently the master branch was frozen, and a new releases/v5.x branch
was created to keep the maintenance of the V5 version.
The new default branch is now main.

This change update all GitHub Workflows to align with the new
branch strategy.

Signed-off-by: Paulo Gomes <[email protected]>
The previous fuzzing code was static and required each target to be defined
within oss-fuzz.sh. In addition, the Makefile target fuzz would statically
call each one of the targets.

The changes remove the no longer needed oss-fuzz.sh. Please refer to the linked
upstream for more information.

The Makefile target is mostly for convenience when smoke testing all Go native
fuzzing are working. The changes now enumerate all packages which contains at
least one fuzzing test, and then execute them for the period set in FUZZ_TIME.

Signed-off-by: Paulo Gomes <[email protected]>
This specific fuzzing test has some design issues which result in more
false positives than necessary. As V6 work progresses a replacement
will be added.

Signed-off-by: Paulo Gomes <[email protected]>
build: Fixes fuzzing issues
pjbgf and others added 24 commits January 24, 2025 08:53
build: Update branch strategy
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.28.3 to 3.28.5.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Commits](github/codeql-action@v3.28.3...v3.28.5)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
…github/codeql-action-3.28.5

build: bump github/codeql-action from 3.28.3 to 3.28.5
The introduction of both new types are a stepping stone to enable
SHA256 support concurrently - without the need for a build tag.

ObjectID provides a way to represent varied sized hashes (for SHA1
or SHA256) with a single type, while keeping similar performance of the
existing plumbing.Hash.

The names were picked in order to provide a clearer distinction on what
they do. ObjectID is the result of a hash operation and can't be
changed once calculated. ObjectHasher, adds the Git object header before
a normal hash operation that hash.Hash or plumbing/hash.Hash do.

The performance results shows that SHA1 operations could be slower,
however for SHA256 it can be over 3 times faster:

~Hasher/hasher-sha1-100B-16         	            2202226	     574.7 ns/op	 174.00 MB/s     272 B/op     7 allocs/op
~Hasher/objecthash-sha1-100B-16     	            1511851	     772.6 ns/op	 129.43 MB/s     272 B/op     9 allocs/op
~Hasher/objecthash-sha256-100B-16   	            5057584	     247.4 ns/op	 404.21 MB/s      96 B/op     7 allocs/op
~Hasher/hasher-sha1-5000B-16        	              96476	     12523 ns/op	 399.25 MB/s    5536 B/op     7 allocs/op
~Hasher/objecthash-sha1-5000B-16    	             105376	     10983 ns/op	 455.27 MB/s     272 B/op     9 allocs/op
~Hasher/objecthash-sha256-5000B-16  	             420741	      2828 ns/op	1767.77 MB/s      96 B/op     7 allocs/op
~HashFromHex/hasher-parse-sha1-valid-16         	23243548     64.65 ns/op	 618.69 MB/s      48 B/op     1 allocs/op
~HashFromHex/objecthash-fromhex-sha1-valid-16   	18120699     79.62 ns/op	 502.36 MB/s      72 B/op     2 allocs/op
~HashFromHex/objecthash-fromhex-sha256-valid-16 	 8969871    124.2 ns/op	     515.22 MB/s      96 B/op     2 allocs/op

Signed-off-by: Paulo Gomes <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>
Improves plumbing.Hash extensibility, which is part of the sha256 support.

Signed-off-by: Paulo Gomes <[email protected]>
Removes redundant logic, and keep all hash specific code inside its own package.
The refactoring of Hasher will be done separately.

Signed-off-by: Paulo Gomes <[email protected]>
The initial implementation required build tags in order to support SHA256.
This will no longer be the case, and therefore this code is redundant.

Signed-off-by: Paulo Gomes <[email protected]>
@pjbgf
Copy link
Member Author

pjbgf commented Jan 27, 2025

Latest feedback adopted and rebased from main.

@pjbgf pjbgf merged commit 139212e into go-git:v6-sha256 Jan 27, 2025
14 checks passed
@pjbgf pjbgf deleted the sha256-newhash branch January 27, 2025 23:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants