Support for entity changes
Support for these entity change operations:
- OPERATION_CREATE: The received entity changes are directly inserted into ClickHouse according to the provided schema.
- OPERATION_UPDATE: By default, updates are treated as new items. This allows building a history of every transaction. If required, previous records can be replaced by specifying the engine as ReplacingMergeTree (a sketch follows this list). See this article.
- OPERATION_DELETE: Entity changes are not actually deleted from the database. Again, this allows building a history of every transaction. The deleted fields are inserted into deleted_entity_changes. This table can then be used to filter out deleted data if required.
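As an illustration of the update behavior, here is a minimal sketch, not taken from the sink's documentation: a hypothetical example_versions table using ReplacingMergeTree, where rows sharing the same sorting key are collapsed to their latest version during merges, and a FINAL query that deduplicates at read time.
-- Hypothetical table for illustration: with ReplacingMergeTree, rows that share
-- the same sorting key (here: id) are collapsed to their latest version.
CREATE TABLE IF NOT EXISTS example_versions (
    id String,
    foo UInt32
)
ENGINE = ReplacingMergeTree
ORDER BY (id);

-- FINAL forces deduplication at query time, returning only the latest row per id.
SELECT id, foo FROM example_versions FINAL;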
Support for database changes
The sink accepts database change payloads. Only the newValue of every column is taken into account.
These operations are supported: OPERATION_CREATE, OPERATION_UPDATE and OPERATION_DELETE.
The behavior is the same as for entity changes. Read more here.
The sink offers several ways to set up a database to store substreams data.
SQL schemas are the default option, GraphQL entities offer the fastest setup, and materialized views are more hands-on but allow full control over ClickHouse's behavior.
SQL schema
This option is the default way to create a table.
Execute SQL against the database to set up a table.
The sink also accepts a remote schema file URL as a query parameter. In that case, the request body is ignored.
$ curl --location --request PUT "localhost:3000/schema/sql" \
--header "Content-Type: text/plain" \
--header "Authorization: Bearer <password>" \
--data "CREATE TABLE foo () ENGINE=MergeTree ORDER BY();"
$ # OR
$ curl --location --request PUT 'localhost:3000/schema/sql?schema-url=<url>' --header 'Authorization: Bearer <password>'
Here is a valid SQL schema. Notice that the sorting key contains an undefined column: chain.
CREATE TABLE IF NOT EXISTS example (
foo UInt32
)
ENGINE = MergeTree
ORDER BY (chain, foo)
This is allowed because the sink injects columns automatically when creating a table.
The injected columns are listed below. Their names cannot be used in user-provided schemas.
Column | Type |
---|---|
block_number | UInt32 |
module_hash | FixedString(40) |
timestamp | DateTime64(3, 'UTC') |
chain | LowCardinality(String) |
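As a quick illustration, the injected columns can be used like any other column. This hypothetical query assumes the example table defined above and an arbitrary chain value of 'ethereum':
-- Hypothetical query: filter by an injected column (chain) and sort by another
-- (block_number) alongside the user-defined column foo.
SELECT chain, block_number, foo
FROM example
WHERE chain = 'ethereum'
ORDER BY block_number DESC
LIMIT 10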
GraphQL entity schema
This is the fastest option to create a table, but it allows for less control.
A GraphQL entity schema can be executed against the sink. It is translated into a SQL schema beforehand. The types are converted according to the following rules.
The available data types are defined here.
GraphQL data type | ClickHouse equivalent |
---|---|
Bytes | String |
String | String |
Boolean | boolean |
Int | Int32 |
BigInt | String |
BigDecimal | String |
Float | Float64 |
ID | String |
The schema is executed as follows.
$ curl --location --request PUT "localhost:3000/schema/graphql" \
--header "Content-Type: text/plain" \
--header "Authorization: Bearer <password>" \
--data "type foo { id: ID! }"
$ # OR
$ curl --location --request PUT 'localhost:3000/schema/graphql?schema-url=<url>' --header 'Authorization: Bearer <password>'
Here is a valid entity schema.
type example @entity {
id: ID!
foo: Int!
}
The sorting key is auto-generated. It contains every ID field.
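For illustration only, the entity above would translate to a table roughly like the following, applying the conversion rules listed earlier (ID becomes String, Int becomes Int32) and using every ID field as the sorting key. The exact DDL produced by the sink, including the injected columns, may differ.
-- Approximate translation of the entity above; the engine is shown as MergeTree
-- for illustration only, since the sink controls the actual engine.
CREATE TABLE IF NOT EXISTS example (
    id String,
    foo Int32
)
ENGINE = MergeTree
ORDER BY (id)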
The table engine cannot be changed.
If more control is required, it is suggested to use SQL schemas.
Materialized views
This option is fully hands-on. Everything can be configured.
No schema is required to store data in ClickHouse. Everything can be stored in unparsed_json (see database structure).
The user must build custom views to transform the data according to their needs. Further details are available in ClickHouse's documentation.
The following code stores every key found in the data provided by the substreams in the substreams_keys table.
CREATE TABLE substreams_keys (
    source String,
    keys Array(String)
)
ENGINE = MergeTree
ORDER BY (source)
CREATE MATERIALIZED VIEW substreams_keys_mv
TO substreams_keys
AS
SELECT source, JSONExtractKeys(raw_data) AS keys
FROM unparsed_json
To store unparsed data, the sink must be started with the allow-unparsed flag enabled:
$ substreams-sink-clickhouse --allow-unparsed true
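Once the view is populated, its target table can be queried like any other table. Here is a small illustrative query, not taken from the sink's documentation, that counts key occurrences per source.
-- arrayJoin expands the keys array so each key appears on its own row,
-- then the rows are grouped and counted per source and key.
SELECT source, arrayJoin(keys) AS key, count() AS occurrences
FROM substreams_keys
GROUP BY source, key
ORDER BY occurrences DESC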
Data for each block is stored alongside every record. The fields and their structure can be found in the database structure.
The ClickHouse sink is stateless. It can be scaled as needed.
Authentication
Some endpoints are protected in the following manner.
To disable authentication, set AUTH_KEY= (an empty value) in .env. This is suggested for development only.
To enable authentication, generate a hash for a password and set it in .env as AUTH_KEY.
Subsequent PUT requests will then require the password in the Authorization header.
$ curl --location 'localhost:3000/hash' --header 'Content-Type: text/plain' --data '<password>'
ed25519 signatures for webhooks
Each incoming payload (POST /webhook) is expected to be signed with the following headers. This signature is generated by the webhook sink.
x-signature-ed25519 = <signature>
x-signature-ed25519-expiry = <expiry-unix-format>
x-signature-ed25519-public-key = <public-key>
The token must not be expired.
The public key must be set in the environment. Multiple public keys can be set as a comma-separated list.
The signature must match the payload { exp: <expiry>, id: <id> }.
If any of these criteria are not met, the sink will return an error code.
The substreams sink ensures no data loss.
This is done by saving every record locally in SQLite before sending a response to the webhook sink.
The SQLite data is then periodically saved into ClickHouse while the sink is running.
This can also be done when starting the sink by setting --resume true (enabled by default).
The sink can be paused if needed. This allows ClickHouse to be modified without losing blocks.
The sink returns an HTTP 400 error code while paused and behaves normally once unpaused.
$ # Pause
$ curl --location --request PUT 'localhost:3000/pause' --header 'Authorization: Bearer <password>'
$ # Unpause
$ curl --location --request PUT 'localhost:3000/unpause' --header 'Authorization: Bearer <password>'