List of features

Support for entity changes

Support for these entity change operations:

  • OPERATION_CREATE: The received entity changes are directly inserted into ClickHouse according to the provided schema.

  • OPERATION_UPDATE: By default, updates are treated as new items. This allows building a history of every transaction. If required, previous records can be replaced by specifying the engine as ReplacingMergeTree. See this article.

  • OPERATION_DELETE: Entity changes are not actually deleted from the database. Again, this allows building a history of every transaction. The deleted fields are inserted into deleted_entity_changes. This table can then be used to filter out deleted data if required, as sketched below.
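For instance, deleted records can be excluded with an anti-join. This is a minimal sketch; it assumes both tables expose a comparable id column, which may differ in practice.

-- Sketch: exclude deleted records from query results.
-- Assumes both tables expose a comparable id column.
SELECT *
FROM example
WHERE id NOT IN (SELECT id FROM deleted_entity_changes);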

Support for database changes

The sink accepts database change payloads. Only the newValue of each column is taken into account.
These operations are supported: OPERATION_CREATE, OPERATION_UPDATE and OPERATION_DELETE.

The behavior is the same as for entity changes. Read more here.

Table initialization

The sink offers several ways to set up a database to store substreams data.
SQL schemas are the default option, GraphQL entities offer the fastest setup, and materialized views are more hands-on but allow full control over ClickHouse's behavior.

SQL schemas

This option is the default way to create a table.

Execute SQL against the database to set up a table.
The sink also accepts a remote file via the schema-url query parameter; in that case, the request body is ignored.

$ curl --location --request PUT "localhost:3000/schema/sql" \
    --header "Content-Type: text/plain" \
    --header "Authorization: Bearer <password>" \
    --data "CREATE TABLE foo (bar String) ENGINE=MergeTree ORDER BY (bar);"
$ # OR
$ curl --location --request PUT 'localhost:3000/schema/sql?schema-url=<url>' --header 'Authorization: Bearer <password>'

Here is a valid SQL schema. Notice that the sorting key contains an undefined column: chain.

CREATE TABLE IF NOT EXISTS example (
    foo UInt32
)
ENGINE = MergeTree
ORDER BY (chain, foo)

This is allowed because the sink automatically injects the following columns when creating a table. These names cannot be used in user-provided schemas.

Column        Type
block_number  UInt32
module_hash   FixedString(40)
timestamp     DateTime64(3, 'UTC')
chain         LowCardinality(String)
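
For illustration, the effective table for the schema above would look roughly like this (a sketch; the actual column order may differ):

-- Sketch: the example table after the sink injects its columns.
CREATE TABLE IF NOT EXISTS example (
    foo UInt32,
    block_number UInt32,
    module_hash FixedString(40),
    timestamp DateTime64(3, 'UTC'),
    chain LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY (chain, foo)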

GraphQL schemas

This is the fastest way to create a table, but it allows less control.

A GraphQL entity schema can be executed against the sink. It is translated into a SQL schema beforehand. The types are converted according to the following rules.

The available data types are defined here.

GraphQL data type  ClickHouse equivalent
Bytes              String
String             String
Boolean            boolean
Int                Int32
BigInt             String
BigDecimal         String
Float              Float64
ID                 String

The schema is executed as follows.

$ curl --location --request PUT "localhost:3000/schema/graphql" \
    --header "Content-Type: text/plain" \
    --header "Authorization: Bearer <password>" \
    --data "type foo { id: ID! }"
$ # OR
$ curl --location --request PUT 'localhost:3000/schema/graphql?schema-url=<url>' --header 'Authorization: Bearer <password>'

Here is a valid entity schema.

type example @entity {
  id: ID!
  foo: Int!
}

The sorting key is auto-generated and contains every ID field.
The table engine cannot be changed.
If more control is required, use SQL schemas instead.
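
Based on the mapping table above, the entity schema from the previous example would translate into roughly the following SQL (a sketch; the exact generated schema and engine settings may differ):

-- Sketch of the generated schema; actual output may differ.
CREATE TABLE IF NOT EXISTS example (
    id String,
    foo Int32
    -- plus the injected metadata columns listed earlier
)
ENGINE = MergeTree
ORDER BY (id)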

Materialized View

This option is fully hands-on: everything can be configured.

No schema is required to store data in ClickHouse. Everything can be stored in unparsed_json (see database structure).

The user must build custom views to transform the data according to their needs. Further details are available in ClickHouse's documentation.

The following code stores every key found in the substreams-provided data in the substreams_keys table.

CREATE TABLE substreams_keys (
    source String,
    keys   Array(String)
)
ENGINE = MergeTree
ORDER BY (source);

CREATE MATERIALIZED VIEW substreams_keys_mv
TO substreams_keys
AS
SELECT source, JSONExtractKeys(raw_data) AS keys
FROM unparsed_json;

For raw data to be stored in unparsed_json, run the sink with:

$ substreams-sink-clickhouse --allow-unparsed true
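
Once data flows in, the destination table can be queried like any other. A sketch counting the most frequent keys per source:

-- Sketch: most frequent keys per source.
SELECT source, arrayJoin(keys) AS key, count() AS occurrences
FROM substreams_keys
GROUP BY source, key
ORDER BY occurrences DESC
LIMIT 10;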

Automatic block metadata

Data for each block is stored alongside every record. The fields and their structure can be found in the database structure.
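
Because every record carries this metadata, block-range queries need no extra joins. A minimal sketch against the example table defined earlier (the block range is illustrative):

-- Sketch: select rows for a given block range.
SELECT chain, block_number, timestamp, foo
FROM example
WHERE block_number BETWEEN 1000000 AND 1001000
ORDER BY block_number;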

Serverless

The ClickHouse sink is stateless. It can be scaled as needed.

Authentication

Some endpoints are protected in the following manner.

Basic authentication for users

To disable authentication for users, set AUTH_KEY= (empty) in .env. This is suggested for development only.

Generate a hash for a password and set it in .env as AUTH_KEY.
Subsequent PUT requests will require the password in the Authorization header.

$ curl --location 'localhost:3000/hash' --header 'Content-Type: text/plain' --data '<password>'
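
The returned hash can then be stored in .env; a sketch, assuming the /hash endpoint returns the hash as plain text:

$ # Sketch: append the generated hash to .env
$ # (assumes the /hash endpoint returns the hash as plain text)
$ HASH=$(curl -s --location 'localhost:3000/hash' --header 'Content-Type: text/plain' --data '<password>')
$ echo "AUTH_KEY=$HASH" >> .env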

ed25519 signatures for webhooks

Each incoming payload (POST /webhook) is expected to be signed with the following headers. This signature is generated by the webhook sink.

x-signature-ed25519 = <signature>
x-signature-ed25519-expiry = <expiry-unix-format>
x-signature-ed25519-public-key = <public-key>

The token must not be expired.
The public key must be set in the environment. Multiple public keys can be provided as a comma-separated list.
The signature must match the payload { exp: <expiry>, id: <id> }.

If any of these criteria are not met, the sink will return an error code.
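
A signed request carries all three headers. A sketch of what the webhook sink sends (placeholder values shown as-is):

$ curl --location --request POST 'localhost:3000/webhook' \
    --header 'x-signature-ed25519: <signature>' \
    --header 'x-signature-ed25519-expiry: <expiry-unix-format>' \
    --header 'x-signature-ed25519-public-key: <public-key>' \
    --header 'Content-Type: application/json' \
    --data '<payload>'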

No data loss

The substreams sink ensures no data loss.
This is done by saving every record locally in SQLite before responding to the webhook sink.

The SQLite buffer is then periodically flushed into ClickHouse while the sink is running.
A flush can also be triggered when starting the sink by setting --resume true (on by default).
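
For example, to request the startup flush explicitly (it is already on by default):

$ substreams-sink-clickhouse --resume true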

Pause

The sink can be paused if needed. This allows ClickHouse to be modified without losing blocks.
While paused, the sink returns an HTTP 400 error code; once unpaused, it behaves normally.

$ # Pause
$ curl --location --request PUT 'localhost:3000/pause' --header 'Authorization: Bearer <password>'
$ # Unpause
$ curl --location --request PUT 'localhost:3000/unpause' --header 'Authorization: Bearer <password>'