Reach a conclusion regarding data sharing #4

Open
velix opened this issue Feb 2, 2025 · 0 comments

We are working with many different data sources, with no single source of truth or centralized repository of clean datasets.

This is a problem for two reasons: some proprietary data sources cannot be shared publicly (e.g., uploaded to the public repo), and people working with their own local datasets makes their work very difficult to reproduce.

It would be useful to create some structure that ties each analysis to its input and output data (pipelines), and to have some kind of data store that can act as the source of truth.
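As a rough sketch of what tying an analysis to its data could look like (not a proposal for the final design): each analysis run writes a small manifest containing the SHA-256 hashes of its input and output files. The manifest holds only file names and hashes, so it could live in the public repo even when the data itself cannot. All names here (`build_manifest`, `inputs_match`, the manifest layout) are hypothetical and assume plain files on disk.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(analysis: str, inputs: list[Path], outputs: list[Path]) -> dict:
    """Describe one analysis run by the content hashes of its files.

    The layout (analysis / inputs / outputs) is an assumed, illustrative format.
    """
    return {
        "analysis": analysis,
        "inputs": {p.name: sha256_of(p) for p in inputs},
        "outputs": {p.name: sha256_of(p) for p in outputs},
    }


def write_manifest(manifest: dict, target: Path) -> None:
    """Write the manifest as stable, diff-friendly JSON."""
    target.write_text(json.dumps(manifest, indent=2, sort_keys=True))


def inputs_match(manifest: dict, inputs: list[Path]) -> bool:
    """Check that local input files are the ones the manifest was built from."""
    return manifest["inputs"] == {p.name: sha256_of(p) for p in inputs}
```

Someone reproducing an analysis would fetch the inputs from wherever the source-of-truth store ends up living, then call `inputs_match` before running anything, so a mismatch in local data shows up immediately rather than as a silently different result.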

We can look at what the Zetkin infra looks like and whether there is already a cloud database that we can piggyback on.
