Docs: Update docs for cron refreshes in Rill Developer (#5619)
* Docs: Update docs for cron refreshes in Rill Developer

* Small cleanups and clarifications

---------

Co-authored-by: Andrew Tsao <[email protected]>
begelundmuller and AndrewRTsao authored Sep 5, 2024
1 parent 5484eea commit 4e9079d
Showing 4 changed files with 33 additions and 22 deletions.
13 changes: 11 additions & 2 deletions docs/docs/build/connect/source-refresh.md
@@ -11,7 +11,7 @@ When creating or updating a source in Rill Cloud, you also have the option to co

:::tip Configuring source refreshes for Cloud deployments

-It is generally strongly recommended to configure source refreshes when [deploying a project](/deploy/existing-project/existing-project.md) to Rill Cloud to ensure that your production data (and dashboards) _remains up-to-date_. The interval that you should set really depends on how often your own source data is refreshed. Furthermore, while it is technically possible to configure source refreshes for Rill Developer as well, Rill Developer is primarily used for local development and thus typically does not require working with the most up-to-date data (local source refreshes that occur too often as well could also lead to resource constraints on your local machine). For more details, please see our pages on [environments](/build/models/environments#default-dev-and-prod-environments), [templating](/deploy/templating#environments-and-rill), and [performance optimization](/deploy/performance#configure-source-refresh-schedules-in-production-only).
+We strongly recommend configuring source refreshes when [deploying a project](/deploy/existing-project/existing-project.md) to Rill Cloud to ensure that your production data (and dashboards) _remain up-to-date_. The interval you should set depends on how often your own source data is refreshed. Automatic scheduled refreshes are not applied in Rill Developer by default, since local development typically does not require the most up-to-date data (and local source refreshes that occur too often could lead to resource constraints on your local machine). For more details, please see our pages on [environments](/build/models/environments#default-dev-and-prod-environments), [templating](/deploy/templating#environments-and-rill), and [performance optimization](/deploy/performance#configure-source-refresh-schedules-in-production-only).

:::

@@ -59,4 +59,13 @@ sources:

If you have both a project-wide default and a source-specific refresh schedule _configured in the same project_, the source-specific refresh will **override** the project default, following how [inheritance](/build/models/environments#specifying-environment-specific-yaml-overrides) works in Rill. If no source-specific schedule is set, the project-wide default is used.

-:::
+:::
+
+## Running scheduled refreshes in development
+
+By default, scheduled refreshes are **not** applied in Rill Developer. If you want to run or test scheduled refreshes in local development, you can override this behavior using the `run_in_dev` property:
+```yaml
+refresh:
+  cron: 0 * * * *
+  run_in_dev: true
+```
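
The override rule and the `run_in_dev` property described in this file can be combined. The following is a minimal sketch rather than a definitive configuration: it assumes a project-wide default in `rill.yaml` plus one source file, and the file name `my_source.yaml` and the daily schedule are hypothetical, chosen only for illustration.

```yaml
# rill.yaml: project-wide default, refresh every hour on the hour
sources:
  refresh:
    cron: '0 * * * *'
---
# sources/my_source.yaml (hypothetical file name): the source-specific
# schedule takes precedence over the project default for this source
type: source
path: s3://path/to/bucket/*.parquet
refresh:
  cron: '0 0 * * *'   # daily at midnight (illustrative schedule)
  run_in_dev: true    # also apply this schedule in Rill Developer
```

The two documents are separated by `---` only to show both files in one listing; in a real project they would live in separate files.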
30 changes: 20 additions & 10 deletions docs/docs/build/models/environments.md
@@ -23,21 +23,19 @@ For the built-in `dev` and `prod` environments **specifically**, Rill provides a
```yaml
 env:
   dev:
-    path: s3://path/to/bucket/Y=2024/M=01/*.parquet
+    sources:
+      path: s3://path/to/bucket/Y=2024/M=01/*.parquet
   prod:
-    refresh:
-      cron: 0 * * * *
     models:
       materialize: true
```
This would be exactly equivalent to (within the same `rill.yaml`):
```yaml
 dev:
-  path: s3://path/to/bucket/Y=2024/M=01/*.parquet
+  sources:
+    path: s3://path/to/bucket/Y=2024/M=01/*.parquet
 prod:
-  refresh:
-    cron: 0 * * * *
   models:
     materialize: true
```
@@ -60,7 +58,7 @@ rill start --env <name_of_environment>

## Specifying environment specific YAML overrides

-Environment overrides can be applied to source properties in the [YAML configuration](/reference/project-files/sources.md) of a source. For example, let's say that you have a [S3](/reference/connectors/s3.md) source defined but you only wanted to read from a particular month partition during local development and make sure that [source refreshes](/build/connect/source-refresh.md) are only applied _in production_ (i.e. when a project is deployed on Rill Cloud). Then, in your `source.yaml` file, you can define it as:
+Environment overrides can be applied to source properties in the [YAML configuration](/reference/project-files/sources.md) of a source. For example, let's say that you have an [S3](/reference/connectors/s3.md) source defined but only want to read from a particular month partition during local development. Then, in your `source.yaml` file, you can define it as:

```yaml
type: source
@@ -69,9 +67,6 @@ path: s3://path/to/bucket/*.parquet
 env:
   dev:
     path: s3://path/to/bucket/Y=2024/M=01/*.parquet
-  prod:
-    refresh:
-      cron: 0 * * * *
```
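
Because scheduled refreshes are skipped in `dev` by default, a refresh schedule no longer needs to be wrapped in a `prod` override. A sketch of how the source file above might look with a schedule added at the top level (the hourly cron is illustrative, and any other source properties are omitted here):

```yaml
type: source
path: s3://path/to/bucket/*.parquet
# Applied on Rill Cloud; not applied in Rill Developer unless run_in_dev is set
refresh:
  cron: 0 * * * *
env:
  dev:
    # Local development reads a single month partition only
    path: s3://path/to/bucket/Y=2024/M=01/*.parquet
```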

Similarly, if you wanted to set a project-wide default in `rill.yaml` where models are [materialized](/reference/project-files/models.md#model-materialization) only on Rill Cloud (i.e. `prod`) and dashboards use a different default [theme](../dashboards/customize.md#changing-themes--colors) in production compared to locally, you could do this by:
@@ -94,6 +89,21 @@ As a general rule of thumb, properties that have been specified at a more _granu

:::

+## Running scheduled source refreshes in development
+
+As an exception, scheduled source refreshes specified using `refresh:` are not applied in the `dev` environment by default. If you want to run or test scheduled refreshes in local development, you can override this behavior using the `run_in_dev` property:
+```yaml
+refresh:
+  cron: 0 * * * *
+  run_in_dev: true
+```
+
+:::tip Why are source refreshes only enabled by default for Rill Cloud?
+
+Source refreshes are primarily meant to _help keep the data in your deployed dashboards on Rill Cloud up-to-date_ (without needing to manually trigger refreshes). For more details, see our documentation on [configuring source refreshes](/build/connect/source-refresh.md).
+
+:::
+
## Using environments to generate custom templated SQL

Environments are also useful when you wish to apply environment-specific SQL logic to your sources and models. One common use case would be to apply a filter or limit to models automatically when developing locally (in Rill Developer) without applying these same conditions to production models deployed on Rill Cloud. These same principles could also be extended to apply more advanced logic and conditional statements based on your requirements. This is all possible by combining environments with Rill's ability to leverage [templating](/deploy/templating.md).
10 changes: 0 additions & 10 deletions docs/docs/deploy/performance.md
@@ -102,16 +102,6 @@ If you use templating in SQL models, you must replace references to tables / mod

Some organizations might have both a development and production version of source data. In these cases, your sources should be configured to use the "dev" bucket or database for local development (in Rill Developer) and pointed to the "prod" bucket or database when in production (when deployed to Rill Cloud). Please refer to [this example](templating.md#changing-the-database-user-based-on-dev--prod) and [this example](templating.md#changing-the-bucket-location-based-on-dev--prod) for a complete walkthrough of how this can be configured.

-### Configure source refresh schedules in production only
-
-As [source refreshes](/build/connect/source-refresh.md) are generally aimed to help with ensuring data freshness in a [production environment](/build/models/environments.md) and can also take time to reconcile when a source refresh does end up triggering, they are not necessarily needed for model development and testing purposes. If you're using both Rill Developer and Cloud, it is recommended to configure source refreshes only in `prod`. This can be done by adding the following to your `rill.yaml`:
-
-```yaml
-prod:
-  refresh:
-    cron: '<cron_syntax>'
-```
-
## Query Optimization

Query optimization is crucial for maintaining high performance and efficiency, especially when working with data-intensive applications. As Rill dashboards are powered by [OLAP engines](../build/olap/olap.md), designed for analytical queries, ensuring that our queries are well-optimized can help maximize the responsiveness and speed of our dashboards. There are also additional potential second-order benefits to optimizing queries in Rill, such as improving ingestion times, how long it takes to build models, how resource intensive it is to build models, how fast profiling queries run, and more.
2 changes: 2 additions & 0 deletions docs/docs/reference/project-files/rill-yaml.md
@@ -51,6 +51,8 @@ title: My Rill Project
 sources:
   refresh:
     cron: '0 * * * *'
+    # Uncomment to run cron jobs in development:
+    # run_in_dev: true
 models:
   materialize: true
 dashboards:
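
For readers unfamiliar with the `cron` value used throughout this commit, the sketch below annotates the five fields and lists a few alternative schedules. It assumes Rill accepts standard five-field cron syntax, as the `0 * * * *` examples suggest; the alternative schedules are illustrative and do not come from the commit.

```yaml
sources:
  refresh:
    # Field order: minute  hour  day-of-month  month  day-of-week
    cron: '0 * * * *'        # minute 0 of every hour
    # cron: '*/15 * * * *'   # every 15 minutes
    # cron: '0 6 * * 1'      # 06:00 every Monday
    # Uncomment to also run the schedule in Rill Developer:
    # run_in_dev: true
```

Quoting the expression, as the `rill.yaml` reference does, avoids any ambiguity about how a YAML parser treats the `*` characters.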
