From e9773b6ba6704141edb6b0f65fae6ee31b77ea0c Mon Sep 17 00:00:00 2001
From: dat-a-man <98139823+dat-a-man@users.noreply.github.com>
Date: Wed, 29 Jan 2025 09:18:04 +0000
Subject: [PATCH] Updated for docs

---
 docs/website/docs/reference/performance.md | 44 ++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/docs/website/docs/reference/performance.md b/docs/website/docs/reference/performance.md
index 436cce76b9..1bf6efb208 100644
--- a/docs/website/docs/reference/performance.md
+++ b/docs/website/docs/reference/performance.md
@@ -319,3 +319,47 @@ volumes {
   }
 }
 ```
+## Handling storage limits
+
+If dlt exhausts the available storage, you are likely running it in a cloud environment with restricted disk space. To prevent this, mount an external cloud storage location and point the `DLT_DATA_DIR` environment variable at it. dlt then uses the mounted storage as its data directory instead of the local disk.
+
+### Setting `DLT_DATA_DIR`
+
+You can set `DLT_DATA_DIR` from your pipeline code as follows:
+
+```py
+import os
+from dlt.common.known_env import DLT_DATA_DIR
+
+# Define the path to your mounted external storage
+data_dir = "/path/to/mounted/bucket/dlt_pipeline_data"
+
+# Set the DLT_DATA_DIR environment variable
+os.environ[DLT_DATA_DIR] = data_dir
+
+# Rest of your pipeline code
+```
+
+This directs dlt to use the specified external storage for all data operations, so local disk limits no longer constrain the pipeline.
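+
+For a fuller picture, the sketch below sets the variable before `dlt.pipeline()` is called, so the mounted location is picked up when the pipeline's working directory is created. The resource, pipeline name, `duckdb` destination, and mount path are illustrative placeholders:
+
+```py
+import os
+from dlt.common.known_env import DLT_DATA_DIR
+
+# Set before dlt.pipeline() is called; the path is an illustrative mount point
+os.environ[DLT_DATA_DIR] = "/path/to/mounted/bucket/dlt_pipeline_data"
+
+import dlt
+
+@dlt.resource
+def numbers():
+    # Tiny illustrative resource; replace with your own source
+    yield from ({"value": n} for n in range(10))
+
+# The pipeline name and duckdb destination are illustrative choices
+pipeline = dlt.pipeline(pipeline_name="my_pipeline", destination="duckdb")
+load_info = pipeline.run(numbers())
+print(load_info)
+```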