Follow the XDG Base Directory spec for local pipelines' data #2319

Open
goosethedev opened this issue Feb 17, 2025 · 5 comments · May be fixed by #2361

@goosethedev

Feature description

When running a dlt pipeline on Linux, a directory gets created at ~/.dlt. According to the XDG Base Directory specification, application-specific data should be stored at $XDG_DATA_HOME/dlt if the environment variable is set.

Are you a dlt user?

Yes, I'm already a dlt user.

Use case

No response

Proposed solution

Check if the $XDG_DATA_HOME env var is set. If so, use that location to store the pipelines' data.

An additional condition in the run_context.py file could be enough.
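
A minimal sketch of the check proposed above, not dlt's actual code. The function name `xdg_data_dir` and its parameters are hypothetical. It also ignores relative values, since the XDG spec treats those as invalid.

```python
import os
from pathlib import Path
from typing import Mapping


def xdg_data_dir(env: Mapping[str, str], home: Path) -> Path:
    """Return $XDG_DATA_HOME/dlt when the variable is usable, else ~/.dlt.

    Hypothetical helper for illustration; not part of dlt.
    """
    xdg = env.get("XDG_DATA_HOME", "")
    # The XDG spec requires absolute paths; relative values are ignored.
    if xdg and os.path.isabs(xdg):
        return Path(xdg) / "dlt"
    # Variable unset, empty, or invalid: keep the current location.
    return home / ".dlt"


if __name__ == "__main__":
    print(xdg_data_dir(os.environ, Path.home()))
```

Passing the environment and home directory as arguments keeps the function easy to test without touching the real `os.environ`.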

Related issues

No response

@sh-rp
Collaborator

sh-rp commented Feb 24, 2025

Hey @goosethedev, would you like to provide a PR for this? It should be only a couple of lines of code and I can point you in the right direction.

@sh-rp sh-rp moved this from Todo to Planned in dlt core library Feb 24, 2025
@goosethedev
Author

Yes, I'll prepare a PR. However, what would be the best way to handle cases where the $XDG_DATA_HOME env var is already set but ~/.dlt is currently in use? Some options I can think of:

  • Moving the directory if present.
  • Recreating the data and displaying a warning to manually remove the previous directory.

@sh-rp
Collaborator

sh-rp commented Feb 26, 2025

I would do option 2
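
A rough sketch of what the second option (recreate the data, warn about the old directory) might look like. The helper name `warn_if_legacy_dir` and the message wording are assumptions, not dlt code. Nothing is moved or deleted.

```python
import warnings
from pathlib import Path


def warn_if_legacy_dir(legacy_dir: Path, new_dir: Path) -> bool:
    """Warn when the old directory still exists next to the new location.

    Hypothetical helper. Returns True if a warning was emitted.
    The old directory is left untouched; removal is up to the user.
    """
    if legacy_dir.is_dir() and legacy_dir != new_dir:
        warnings.warn(
            f"Pipeline data is now stored in {new_dir}. "
            f"The previous directory {legacy_dir} is no longer used "
            "and can be removed manually.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False
```

Compared with moving the directory automatically, this avoids touching user data at the cost of leaving a stale directory behind.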

@rudolfix
Collaborator

implementation tip: there's already an env variable that will place data_dir of dlt wherever you want:

DLT_DATA_DIR = "DLT_DATA_DIR"
"""Sets default directory where pipelines' data (working directories) will be stored"""

my take would be to find how this is used and add a fallback to XDG_DATA_HOME. it should be just one place in the code (excl. tests), somewhere in run_context.py
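
One way the suggested fallback order could look, as a sketch only. The constant `DLT_DATA_DIR` is quoted from the comment above. The function `resolve_data_dir`, its signature, and the exact precedence are assumptions about how this might be wired in, not what run_context.py does.

```python
from pathlib import Path
from typing import Mapping

DLT_DATA_DIR = "DLT_DATA_DIR"    # env var name quoted in the thread
XDG_DATA_HOME = "XDG_DATA_HOME"  # env var name from the XDG spec


def resolve_data_dir(env: Mapping[str, str], home: Path) -> Path:
    """Hypothetical resolution order; not dlt's real implementation."""
    # 1. An explicit DLT_DATA_DIR takes priority, as it does today.
    if env.get(DLT_DATA_DIR):
        return Path(env[DLT_DATA_DIR])
    # 2. Assumed fallback: a dlt subdirectory under XDG_DATA_HOME.
    if env.get(XDG_DATA_HOME):
        return Path(env[XDG_DATA_HOME]) / "dlt"
    # 3. Otherwise keep the existing default, ~/.dlt.
    return home / ".dlt"
```

Keeping `DLT_DATA_DIR` first means users who already override the location see no change.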

btw. what about other directories? https://specifications.freedesktop.org/basedir-spec/latest/
@sh-rp maybe we should implement a dlt RunContext that stores dlt files according to those settings instead of modifying the default? otherwise I see the backward compatibility problems mentioned above.
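
For reference, a small sketch of the other user-level directories the linked spec defines, with the defaults it prescribes when a variable is unset or empty. The helper `xdg_base_dirs` is hypothetical, and which dlt files would belong in which directory is not settled in this thread.

```python
from pathlib import Path
from typing import Dict, Mapping

# Defaults relative to $HOME, as given in the XDG Base Directory spec.
XDG_DEFAULTS = {
    "XDG_DATA_HOME": ".local/share",
    "XDG_CONFIG_HOME": ".config",
    "XDG_STATE_HOME": ".local/state",
    "XDG_CACHE_HOME": ".cache",
}


def xdg_base_dirs(env: Mapping[str, str], home: Path) -> Dict[str, Path]:
    """Resolve each XDG base directory, using the spec default if unset or empty."""
    resolved = {}
    for name, default in XDG_DEFAULTS.items():
        value = env.get(name, "")
        resolved[name] = Path(value) if value else home / default
    return resolved
```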

@goosethedev
Author

@sh-rp @rudolfix I ended up using the XDG_DATA_HOME env var if it is set and ~/.dlt doesn't exist. Otherwise ~/.dlt is used, so there is no compatibility breakage.

As for other XDG dirs, I'm not familiar with what other data dlt stores globally, but I think for now the data dir is a good start.
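
The rule described in this comment can be sketched roughly as follows. This is an illustration based on the description, not the code from the linked PR. The name `choose_data_dir` and the `dlt` subdirectory under XDG_DATA_HOME are assumptions.

```python
from pathlib import Path
from typing import Mapping


def choose_data_dir(env: Mapping[str, str], home: Path) -> Path:
    """Use XDG_DATA_HOME only when it is set and ~/.dlt does not exist yet."""
    legacy = home / ".dlt"
    xdg = env.get("XDG_DATA_HOME")
    # Existing installs keep ~/.dlt, so nothing changes for current users.
    if xdg and not legacy.exists():
        return Path(xdg) / "dlt"
    return legacy
```

Because an existing ~/.dlt always wins, the new location only applies to fresh setups.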
