From 4e9079dac0846c053e0bd8c4d8ab24ffbc9656ce Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Benjamin=20Egelund-M=C3=BCller?=
Date: Thu, 5 Sep 2024 17:28:57 +0100
Subject: [PATCH] Docs: Update docs for cron refreshes in Rill Developer (#5619)

* Docs: Update docs for cron refreshes in Rill Developer

* Small cleanups and clarifications

---------

Co-authored-by: Andrew Tsao
---
 docs/docs/build/connect/source-refresh.md     | 13 ++++++--
 docs/docs/build/models/environments.md        | 30 ++++++++++++-------
 docs/docs/deploy/performance.md               | 10 -------
 .../docs/reference/project-files/rill-yaml.md |  2 ++
 4 files changed, 33 insertions(+), 22 deletions(-)

diff --git a/docs/docs/build/connect/source-refresh.md b/docs/docs/build/connect/source-refresh.md
index 5d1711cb7a6..ce349163502 100644
--- a/docs/docs/build/connect/source-refresh.md
+++ b/docs/docs/build/connect/source-refresh.md
@@ -11,7 +11,7 @@ When creating or updating a source in Rill Cloud, you also have the option to co

 :::tip Configuring source refreshes for Cloud deployments

-It is generally strongly recommended to configure source refreshes when [deploying a project](/deploy/existing-project/existing-project.md) to Rill Cloud to ensure that your production data (and dashboards) _remains up-to-date_. The interval that you should set really depends on how often your own source data is refreshed. Furthermore, while it is technically possible to configure source refreshes for Rill Developer as well, Rill Developer is primarily used for local development and thus typically does not require working with the most up-to-date data (local source refreshes that occur too often as well could also lead to resource constraints on your local machine). For more details, please see our pages on [environments](/build/models/environments#default-dev-and-prod-environments), [templating](/deploy/templating#environments-and-rill), and [performance optimization](/deploy/performance#configure-source-refresh-schedules-in-production-only).
+It is generally strongly recommended to configure source refreshes when [deploying a project](/deploy/existing-project/existing-project.md) to Rill Cloud to ensure that your production data (and dashboards) _remains up-to-date_. The interval that you should set really depends on how often your own source data is refreshed. Automatic scheduled refreshes are not applied in Rill Developer by default since it typically does not require working with the most up-to-date data (local source refreshes that occur too often as well could also lead to resource constraints on your local machine). For more details, please see our pages on [environments](/build/models/environments#default-dev-and-prod-environments), [templating](/deploy/templating#environments-and-rill), and [performance optimization](/deploy/performance#configure-source-refresh-schedules-in-production-only).

 :::

@@ -59,4 +59,13 @@ sources:

 If you have both a project-wide default and source specific refresh schedule _configured in the same project_, the source specific refresh will **override** the project default based on how [inheritance](/build/models/environments#specifying-environment-specific-yaml-overrides) works in Rill. Otherwise, if not specified, the project-wide default will be used instead!

-:::
\ No newline at end of file
+:::
+
+## Running scheduled refreshes in development
+
+By default, scheduled refreshes are **not** applied in Rill Developer. If you want to run or test scheduled refreshes in local development, you can override this behavior using the `run_in_dev` property:
+```yaml
+refresh:
+  cron: 0 * * * *
+  run_in_dev: true
+```
diff --git a/docs/docs/build/models/environments.md b/docs/docs/build/models/environments.md
index 721fb91beb5..2a8cb32471a 100644
--- a/docs/docs/build/models/environments.md
+++ b/docs/docs/build/models/environments.md
@@ -23,10 +23,9 @@ For the built-in `dev` and `prod` environments **specifically**, Rill provides a
 ```yaml
 env:
   dev:
-    path: s3://path/to/bucket/Y=2024/M=01/*.parquet
+    sources:
+      path: s3://path/to/bucket/Y=2024/M=01/*.parquet
   prod:
-    refresh:
-      cron: 0 * * * *
     models:
       materialize: true
 ```
@@ -34,10 +33,9 @@ env:
 This would be exactly equivalent to (within the same `rill.yaml`):
 ```yaml
 dev:
-  path: s3://path/to/bucket/Y=2024/M=01/*.parquet
+  sources:
+    path: s3://path/to/bucket/Y=2024/M=01/*.parquet
 prod:
-  refresh:
-    cron: 0 * * * *
   models:
     materialize: true
 ```
@@ -60,7 +58,7 @@ rill start --env

 ## Specifying environment specific YAML overrides

-Environment overrides can be applied to source properties in the [YAML configuration](/reference/project-files/sources.md) of a source. For example, let's say that you have a [S3](/reference/connectors/s3.md) source defined but you only wanted to read from a particular month partition during local development and make sure that [source refreshes](/build/connect/source-refresh.md) are only applied _in production_ (i.e. when a project is deployed on Rill Cloud). Then, in your `source.yaml` file, you can define it as:
+Environment overrides can be applied to source properties in the [YAML configuration](/reference/project-files/sources.md) of a source. For example, let's say that you have a [S3](/reference/connectors/s3.md) source defined but you only wanted to read from a particular month partition during local development. Then, in your `source.yaml` file, you can define it as:

 ```yaml
 type: source
@@ -69,9 +67,6 @@ path: s3://path/to/bucket/*.parquet
 env:
   dev:
     path: s3://path/to/bucket/Y=2024/M=01/*.parquet
-  prod:
-    refresh:
-      cron: 0 * * * *
 ```

 Similarly, if you wanted to set a project-wide default in `rill.yaml` where models are [materialized](/reference/project-files/models.md#model-materialization) only on Rill Cloud (i.e. `prod) and dashboards use a different default [theme](../dashboards/customize.md#changing-themes--colors) in production compared to locally, you could do this by:
@@ -94,6 +89,21 @@ As a general rule of thumb, properties that have been specified at a more _granu

 :::

+## Running scheduled source refreshes in development
+
+As an exception, scheduled source refreshes specified using `refresh:` are not applied in the `dev` environment by default. If you want to run or test scheduled refreshes in local development, you can override this behavior using the `run_in_dev` property:
+```yaml
+refresh:
+  cron: 0 * * * *
+  run_in_dev: true
+```
+
+:::tip Why are source refreshes only enabled by default for Rill Cloud?
+
+Source refreshes are primarily meant to _help keep the data in your deployed dashboards on Rill Cloud up-to-date_ (without needing to manually trigger refreshes). For more details, see our documentation on [configuring source refreshes](/build/connect/source-refresh.md).
+
+:::
+
 ## Using environments to generate custom templated SQL

 Environments are also useful when you wish to apply environment-specific SQL logic to your sources and models. One common use case would to apply a filter or limit for models automatically when developing locally (in Rill Developer) but not having these same conditions applied to production models deployed on Rill Cloud. These same principles could also be extended to apply more advanced logic and conditional statements based on your requirements. This is all possible by combining environments with Rill's ability to leverage [templating](/deploy/templating.md).
diff --git a/docs/docs/deploy/performance.md b/docs/docs/deploy/performance.md
index 570f27f69f3..2fe05c6610e 100644
--- a/docs/docs/deploy/performance.md
+++ b/docs/docs/deploy/performance.md
@@ -102,16 +102,6 @@ If you use templating in SQL models, you must replace references to tables / mod

 Some organizations might have both a development and production version of source data. In these cases, your sources should be configured to use the "dev" bucket or database for local development (in Rill Developer) and pointed to the "prod" bucket or database when in production (when deployed to Rill Cloud). Please refer to [this example](templating.md#changing-the-database-user-based-on-dev--prod) and [this example](templating.md#changing-the-bucket-location-based-on-dev--prod) for a complete walkthrough of how this can be configured.

-### Configure source refresh schedules in production only
-
-As [source refreshes](/build/connect/source-refresh.md) are generally aimed to help with ensuring data freshness in a [production environment](/build/models/environments.md) and can also take time to reconcile when a source refresh does end up triggering, they are not necessarily needed for model development and testing purposes. If you're using both Rill Developer and Cloud, it is recommended to configure source refreshes only in `prod`. This can be done by adding the following to your `rill.yaml`:
-
-```yaml
-prod:
-  refresh:
-    cron: ''
-```
-
 ## Query Optimization

 Query optimization is crucial for maintaining high performance and efficiency, especially when working with data-intensive applications. As Rill dashboards are powered by [OLAP engines](../build/olap/olap.md), designed for analytical queries, ensuring that our queries are well-optimized can help maximize the responsiveness and speed of our dashboards. There are also additional potential second-order benefits to optimizing queries in Rill, such as improving ingestion times, how long it takes to build models, how resource intensive it is to build models, how fast profiling queries run, and more.
diff --git a/docs/docs/reference/project-files/rill-yaml.md b/docs/docs/reference/project-files/rill-yaml.md
index ce5d370ba23..04e47a22953 100644
--- a/docs/docs/reference/project-files/rill-yaml.md
+++ b/docs/docs/reference/project-files/rill-yaml.md
@@ -51,6 +51,8 @@ title: My Rill Project
 sources:
   refresh:
     cron: '0 * * * *'
+    # Uncomment to run cron jobs in development:
+    # run_in_dev: true
 models:
   materialize: true
 dashboards: