Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use CIDs for content addressing rather than plain hashes #21

Merged
merged 5 commits into from
Aug 9, 2023

Conversation

snowteamer
Copy link
Collaborator

@snowteamer snowteamer commented Aug 6, 2023

Closes #19
Closes #20

Summary of changes

Content addressing

  • chel hash now outputs content identifiers (CIDs) rather than unadorned hashes. For now, CIDs created by the command include the raw (0x55) code by default, and the json (0x0200) code for JSON files.
  • It should now be much easier to switch the hashing algo or hash encoding, by updating constants.
  • Improved chel hash help message.

Testing (related to #11)

  • Added tests for chel hash.
  • Added tests for getBackend().
  • New deno task test command, to run the test suite with suitable CLI flags.
  • Files under ./test/assets/ are no longer git-ignored.

Database backends

  • Added a read-only dataFolder property to database backends. Useful e.g. in tests.
  • Fixed an error when calling initStorage() more than once on the sqlite backend.

Other

  • Refactored chel eventsAfter and chel migrate using the new getBackend() helper (related to Make more use of getBackend to DRY things #20)
  • Added a minimal .vscode/settings.json file to make sure the Deno extension is enabled in VS Code / Codium.
  • Fixed missing --no-remote flag in deno task chel.
  • Made the isDir() and isFile() utilities async.

src/utils.ts Outdated Show resolved Hide resolved
@snowteamer snowteamer force-pushed the add-json-codec-info branch from aab6889 to 4aef606 Compare August 7, 2023 22:28
@snowteamer snowteamer self-assigned this Aug 7, 2023
@snowteamer snowteamer changed the title Use CIDs rather than plain hashes Use CIDs for content addressing rather than plain hashes Aug 7, 2023
@@ -4,7 +4,7 @@ import { path } from './deps.ts'
import { checkKey } from './utils.ts'

// Initialized in `initStorage()`.
let dataFolder = ''
export let dataFolder = ''
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, why is this being exported? 🤔

I don't see it imported anywhere... and isn't kind of a weird thing to do, to export such a variable?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see it imported anywhere...

It's available as backend.dataFolder once the whole module has been imported as backend. It's not meant to be imported separately.

why is this being exported?

We might not need this in common use cases, but in some getBackend tests where the same backend object is "recycled" to use a new data folder, I could not easily make sure it had been correctly updated without having access to this propery.

Of course this happens because in this design, the backend object is the result of exporting its whole module, like export * as fs from './src/database-fs.ts'. In other words, the module acts as a singleton instance in OOP terminology, in which case exporting a module variable is like making a property of that singleton public rather than keeping it private.
So I guess it isn't that weird in the context of that design. But maybe that design itself is a bit weird.

By the way, the ES module spec ensures an exported let binding is read-only outside its module (although mutable from within it)

Copy link
Member

@taoeffect taoeffect left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nicely done @snowteamer! Left a few change requests + questions

Comment on lines -44 to -59
let from
let options
if (isDir(src)) {
from = 'fs'
options = { internal: true, dirname: src }
} else if (isFile(src)) {
from = 'sqlite'
options = { internal: true, dirname: path.dirname(src), filename: path.basename(src) }
} else throw new Error(`invalid argument: "${src}"`)

backend = backends[from as keyof typeof backends]
try {
await backend.initStorage(options)
} catch (error) {
exit(`could not init storage backend at "${src}" to fetch events from: ${error.message}`)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always great to see massive chunks of code get deleted 😄

Comment on lines +21 to +23
const data = isURL(src)
? await readRemoteData(src, key)
: await (await getBackend(src)).readData(key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reducing lines of code is always appreciated 👍

src/hash.ts Outdated
if (!internal) {
console.log(`blake32Hash(${filename}):`, hash)
console.log(`CID for file (${filename}):`, cid)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update this to read:

Suggested change
console.log(`CID for file (${filename}):`, cid)
console.log(`CID(${filename}):`, cid)

This makes it easier to use the command with programs like awk and maintains consistency with the format we had before.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do!

Comment on lines -22 to -50

const backendFrom = backends[from as keyof typeof backends]
const backendTo = backends[to as keyof typeof backends]

if (!backendFrom) exit(`unknown storage backend: "${from}"`)
if (!backendTo) exit(`unknown storage backend: "${to}"`)
if (from === to) exit('arguments --from and --to must be different')

if (isDir(src)) {
if (from === 'sqlite') exit(`not a database file: "${src}"`)
} else if (isFile(src)) {
if (from === 'fs') exit(`not a directory: "${src}"`)
} else {
exit(`not found: "${src}"`)
}

if (isDir(out)) {
if (to === 'sqlite') exit(`argument --out is a directory: "${out}"`)
} else if (isFile(out)) {
if (to === 'fs') exit(`argument --out is a file: "${out}"`)
} else if (out.endsWith('./')) {
if (to === 'sqlite') exit(`argument --out ends with a slash: "${out}"`)
}

try {
await backendFrom.initStorage(from === 'fs' ? {dirname: src} : {dirname: path.dirname(src), filename: path.basename(src)})
} catch (error) {
exit(`could not init storage backend at "${src}" to migrate from: ${error.message}`)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The amount of a DRYing brings a tear to my eye 🥲

import * as fs from './database-fs.ts'
import * as sqlite from './database-sqlite.ts'

const backends = { fs, sqlite}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing the spacing on the brace here 👍

// We can update these constants later if we want.
const backends = { fs, sqlite }
const multibase = base58btc
const multicodes = { JSON: 0x0200, RAW: 0x55 }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a comment on the line above this one linking to where these values come from?

export function isArrayLength (arg: number): boolean {
return Number.isInteger(arg) && arg >= 0 && arg <= 2 ** 32 - 1
}

// Checks whether a path points to a directory, following symlinks if any.
export function isDir (path: string | URL): boolean {
export async function isDir (path: string | URL): Promise<boolean> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you change these to be async?

And if there's a strong reason to, can you add that as a comment?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nowadays async is generally the default way for any I/O operation in JavaScript, and the fact these functions were sync caught my eye. Agreed, not sure whether there is a real performance benefit in the CLI tool use case, at least as long as the functions aren't called in loops. But in principe it still allows the engine to not forcibly stay idle while waiting for the operation to complete. And we might start using these functions in loops sooner or later, in which case being async enables usage of e.g. Promise.all().
Also these function names have no -Sync suffix unlike native Nodejs and Deno sync IO functions

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright 👍

// nodejs as a wrapper executable to run our binary, greatly speeding up performance
// see: https://github.com/EverlastingBugstopper/binary-install/tree/main/packages/binary-install

const { existsSync, mkdirSync } = require("fs")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you delete this file?

I'm pretty sure this file must not be deleted or it will break the install

Copy link
Collaborator Author

@snowteamer snowteamer Aug 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, seems like I incidentally deleted it when re-vendoring the dependencies at some point

Comment on lines +23 to +34
await t.step('it should create a json CID for a json file', async () => {
const filepath = './test/assets/hello.json'
const actual = await hash([filepath], true)
const expected = 'zyop8PQypgycyyCEcQsEHoc9Z8S9bQt1CuHuRKnXrAXagVBGxQyvH'
assertEquals(actual, expected)

const cid = CID.parse(actual, base58btc.decoder)
assertEquals(cid.code, 0x0200)
assertEquals(cid.byteLength, 39)
assertEquals(cid.multihash.code, 45600)
assertEquals(cid.version, 1)
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏 Bravo on these tests btw!!

This is definitely very much needed for sanity checking!!

Copy link
Member

@taoeffect taoeffect left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@taoeffect taoeffect merged commit b3d653a into okTurtles:master Aug 9, 2023
@snowteamer snowteamer deleted the add-json-codec-info branch August 9, 2023 08:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make more use of getBackend to DRY things Add JSON encoding info to hash
2 participants