persistence for ExecutionContextState? #755

Open
jimexist opened this issue Jul 19, 2021 · 7 comments
Labels
enhancement New feature or request

Comments

@jimexist
Member

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

I wonder if there is any way to persist ExecutionContextState, so that it survives across restarts of the binary?

Describe the solution you'd like

SQLite would be a good choice


@jimexist jimexist added the enhancement New feature or request label Jul 19, 2021
@alamb
Contributor

alamb commented Jul 19, 2021

Maybe using serde might be a good choice so that users could choose what particular persistence mechanism they wanted.
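
One way to read this suggestion is that the library would only turn state into bytes, and callers would decide where the bytes go. The sketch below illustrates that split using only the standard library, so a hand-written `key=value` encoding stands in for serde. `SessionSnapshot`, `StateStore`, and `MemoryStore` are hypothetical names invented for illustration, not DataFusion APIs.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for the persistable part of the state.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct SessionSnapshot {
    /// Configuration options, e.g. "batch_size" -> "8192".
    pub options: BTreeMap<String, String>,
}

impl SessionSnapshot {
    /// Encode as one `key=value` line per option (a stand-in for a serde format).
    /// Simplification: keys must not contain '=' and neither side may contain a newline.
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut out = String::new();
        for (key, value) in &self.options {
            out.push_str(key);
            out.push('=');
            out.push_str(value);
            out.push('\n');
        }
        out.into_bytes()
    }

    /// Decode the format written by `to_bytes`; returns `None` on malformed input.
    pub fn from_bytes(bytes: &[u8]) -> Option<Self> {
        let text = std::str::from_utf8(bytes).ok()?;
        let mut options = BTreeMap::new();
        for line in text.lines().filter(|l| !l.is_empty()) {
            let (key, value) = line.split_once('=')?;
            options.insert(key.to_string(), value.to_string());
        }
        Some(SessionSnapshot { options })
    }
}

/// Storage backend chosen by the user (file, SQLite, object store, ...).
pub trait StateStore {
    fn save(&mut self, bytes: &[u8]) -> std::io::Result<()>;
    fn load(&self) -> std::io::Result<Option<Vec<u8>>>;
}

/// Trivial in-memory backend, useful mainly for tests.
#[derive(Default)]
pub struct MemoryStore {
    bytes: Option<Vec<u8>>,
}

impl StateStore for MemoryStore {
    fn save(&mut self, bytes: &[u8]) -> std::io::Result<()> {
        self.bytes = Some(bytes.to_vec());
        Ok(())
    }

    fn load(&self) -> std::io::Result<Option<Vec<u8>>> {
        Ok(self.bytes.clone())
    }
}
```

A SQLite-backed or file-backed store would implement the same two methods, which keeps the choice of persistence mechanism out of the core library.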

@Dandandan
Contributor

Dandandan commented Jul 19, 2021

Serde sounds like a good option - I would not add SQLite to DataFusion.

What is the exact use case though? I think table / metadata is commonly kept in a data catalog / metastore and configuration is given on startup of the session. Any things other than that?
AFAIK Spark doesn't give an option like this?

@Dandandan Dandandan reopened this Jul 19, 2021
@EricJoy2048
Member

> Serde sounds like a good option - I would not add SQLite to DataFusion.
>
> What is the exact use case though? I think table / metadata is commonly kept in a data catalog / metastore and configuration is given on startup of the session. Any things other than that? AFAIK Spark doesn't give an option like this?

Sometimes we want to create a table with SQL and still be able to use that table after the session is restarted.
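
A minimal sketch of one way to get this behaviour without serializing the state itself: record every `CREATE ...` statement in an append-only file and replay the file on startup. `DdlLog` is a hypothetical helper written for illustration and is not part of DataFusion; how the replayed statements are executed depends on the SQL entry point of the session in use.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical append-only log of DDL statements; not a DataFusion type.
pub struct DdlLog {
    path: PathBuf,
}

impl DdlLog {
    pub fn new(path: impl AsRef<Path>) -> Self {
        DdlLog { path: path.as_ref().to_path_buf() }
    }

    /// Append one statement to the log.
    /// Simplification: every run of whitespace is collapsed to a single space so
    /// that each statement occupies exactly one line. This would also alter
    /// whitespace inside SQL string literals.
    pub fn record(&self, sql: &str) -> io::Result<()> {
        let one_line = sql.split_whitespace().collect::<Vec<_>>().join(" ");
        let mut file = OpenOptions::new()
            .create(true)
            .append(true)
            .open(&self.path)?;
        writeln!(file, "{}", one_line)
    }

    /// Statements to replay on startup, in the order they were recorded.
    /// Returns an empty list if no log file exists yet.
    pub fn statements(&self) -> io::Result<Vec<String>> {
        match fs::read_to_string(&self.path) {
            Ok(text) => Ok(text.lines().map(str::to_string).collect()),
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(Vec::new()),
            Err(e) => Err(e),
        }
    }
}
```

On startup, the application would iterate over `statements()` and run each one against a fresh session before accepting new queries. This only restores tables that can be described by SQL; providers registered from Rust code would still need separate handling.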

@alamb
Contributor

alamb commented Dec 9, 2021

It sounds like a usecase would be to save all the table providers -- since they can be user provided (in other Rust code) I am not sure serializing them in the core of DataFusion makes much sense.

Adding some sort of table / session persistence to datafusion-cli (and other users of the core DataFusion) would make sense to me

@EricJoy2048
Member

> It sounds like a usecase would be to save all the table providers -- since they can be user provided (in other Rust code) I am not sure serializing them in the core of DataFusion makes much sense.
>
> Adding some sort of table / session persistence to datafusion-cli (and other users of the core DataFusion) would make sense to me

ExecutionContext only supports creating the catalog from the default. I want to manage catalog and schema information externally, in one place, so that different ExecutionContexts can share it; that is not possible today. If an ExecutionContext could be initialized from existing state through a new(state: Arc<Mutex<ExecutionContextState>>) method, we could manage this information in the Ballista scheduler and send it to the Ballista executors, so that every DataFusion ExecutionContext created in an executor has the same ExecutionContextState content.
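
The constructor proposed above can be sketched as follows, assuming a single process. `SharedState`, `Context`, and `with_state` are hypothetical stand-ins for illustration and do not reflect DataFusion's actual types or signatures.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the catalog part of ExecutionContextState.
#[derive(Default)]
pub struct SharedState {
    /// Table name -> storage location.
    pub tables: HashMap<String, String>,
}

/// Hypothetical context that holds a handle to state it does not own exclusively.
pub struct Context {
    state: Arc<Mutex<SharedState>>,
}

impl Context {
    /// Build a context on top of externally managed state.
    pub fn with_state(state: Arc<Mutex<SharedState>>) -> Self {
        Context { state }
    }

    /// Register a table; the change is visible to every context sharing the state.
    pub fn register_table(&self, name: &str, location: &str) {
        self.state
            .lock()
            .unwrap()
            .tables
            .insert(name.to_string(), location.to_string());
    }

    /// Look up a table's location through the shared state.
    pub fn table_location(&self, name: &str) -> Option<String> {
        self.state.lock().unwrap().tables.get(name).cloned()
    }
}
```

An `Arc<Mutex<...>>` only shares state between contexts in the same process. Sending the state from a scheduler to executors on other machines would additionally require serializing it, which is the harder part discussed in this thread.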


@houqp
Member

houqp commented Dec 10, 2021

I also think serde would be a good fit for what we are trying to serialize here.

@drauschenbach
Contributor

FYI, ExecutionContextState is called SessionState nowadays. I briefly added #[derive(..., Deserialize)] to it and a few other things, and this looks like a non-trivial change.
