Trim down BallistaConfig #1104

Closed
Tracked by #1067
milenkovicm opened this issue Oct 30, 2024 · 1 comment · Fixed by #1108
Labels
enhancement New feature or request

Comments

@milenkovicm

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

I would like to propose trimming down `BallistaConfig`: make `SessionConfig` the main way to configure Ballista, and make `BallistaConfig` just a part of it, holding only Ballista-specific configuration.

Describe the solution you'd like

Merge the following configuration options. Each `ballista.*` key is followed by the DataFusion key it corresponds to:

+-------------------------------------------------------------------------+---------------------------+
| name                                                                    | value                     |
+-------------------------------------------------------------------------+---------------------------+
| ballista.batch.size                                                     | 8192                      |
| datafusion.execution.batch_size                                         | 8192                      |
| ballista.collect_statistics                                             | false                     |
| datafusion.execution.collect_statistics                                 | false                     |
| ballista.optimizer.hash_join_single_partition_threshold                 | 1048576                   |
| datafusion.optimizer.hash_join_single_partition_threshold               | 1048576                   |
| ballista.parquet.pruning                                                | true                      |
| datafusion.execution.parquet.pruning                                    | true                      |
| ballista.repartition.aggregations                                       | true                      |
| datafusion.optimizer.repartition_aggregations                           | true                      |
| ballista.repartition.joins                                              | true                      |
| datafusion.optimizer.repartition_joins                                  | true                      |
| ballista.repartition.windows                                            | true                      |
| datafusion.optimizer.repartition_windows                                | true                      |
| ballista.with_information_schema                                        | false                     |
| datafusion.catalog.information_schema                                   | true                      |
| ballista.shuffle.partitions                                             | 16                        |
| datafusion.execution.target_partitions                                  | 8                         |
| ballista.standalone.parallelism                                         | 8                         |
| datafusion.execution.target_partitions                                  | 8                         |
+-------------------------------------------------------------------------+---------------------------+
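Because every `ballista.*` key above has a direct DataFusion counterpart, migrating existing settings is essentially a key rename. The sketch below uses only the standard library; `KEY_MAP` and `migrate_settings` are hypothetical names for illustration, not Ballista APIs. Two policies are assumptions: `ballista.shuffle.partitions` takes precedence over `ballista.standalone.parallelism` (mirroring `create_datafusion_context`, which uses `default_shuffle_partitions()`), and an explicitly supplied `datafusion.*` key overrides its legacy alias.

```rust
use std::collections::HashMap;

/// Hypothetical rename table: each `ballista.*` key from the issue paired with
/// the DataFusion key it maps onto.
const KEY_MAP: &[(&str, &str)] = &[
    ("ballista.batch.size", "datafusion.execution.batch_size"),
    ("ballista.collect_statistics", "datafusion.execution.collect_statistics"),
    (
        "ballista.optimizer.hash_join_single_partition_threshold",
        "datafusion.optimizer.hash_join_single_partition_threshold",
    ),
    ("ballista.parquet.pruning", "datafusion.execution.parquet.pruning"),
    ("ballista.repartition.aggregations", "datafusion.optimizer.repartition_aggregations"),
    ("ballista.repartition.joins", "datafusion.optimizer.repartition_joins"),
    ("ballista.repartition.windows", "datafusion.optimizer.repartition_windows"),
    ("ballista.with_information_schema", "datafusion.catalog.information_schema"),
    // Two Ballista keys share one DataFusion target; the earlier entry wins.
    ("ballista.shuffle.partitions", "datafusion.execution.target_partitions"),
    ("ballista.standalone.parallelism", "datafusion.execution.target_partitions"),
];

/// Rewrite legacy `ballista.*` keys into DataFusion keys. Keys not in the
/// table (including Ballista-specific ones) are copied through unchanged.
fn migrate_settings(legacy: &HashMap<String, String>) -> HashMap<String, String> {
    let mut out = HashMap::new();
    // Renamed keys first, in table order; `or_insert_with` keeps the first match.
    for (old, new) in KEY_MAP {
        if let Some(value) = legacy.get(*old) {
            out.entry(new.to_string()).or_insert_with(|| value.clone());
        }
    }
    // Pass everything else through; an explicit `datafusion.*` key overwrites
    // a value produced by a rename above.
    for (key, value) in legacy {
        if !KEY_MAP.iter().any(|(old, _)| *old == key.as_str()) {
            out.insert(key.clone(), value.clone());
        }
    }
    out
}
```

Note that the table's defaults are not all identical (`ballista.with_information_schema` is `false`, while `datafusion.catalog.information_schema` is `true` above), so a rename carries over the user's value but not the old default.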

If we check `/ballista/scheduler/src/state/session_manager.rs`:

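```rust
use std::sync::Arc;

use ballista_core::config::BallistaConfig;
use datafusion::prelude::{SessionConfig, SessionContext};
// `SessionBuilder` (a callable turning a `SessionConfig` into a `SessionState`)
// is defined elsewhere in Ballista; its import is omitted from this excerpt.

pub fn create_datafusion_context(
    ballista_config: &BallistaConfig,
    session_builder: SessionBuilder,
) -> Arc<SessionContext> {
    // Start from the raw key/value settings held by BallistaConfig.
    let config =
        SessionConfig::from_string_hash_map(&ballista_config.settings().clone()).unwrap();
    // Copy each ballista.* option onto its datafusion.* counterpart, one to one.
    let config = config
        .with_target_partitions(ballista_config.default_shuffle_partitions())
        .with_batch_size(ballista_config.default_batch_size())
        .with_repartition_joins(ballista_config.repartition_joins())
        .with_repartition_aggregations(ballista_config.repartition_aggregations())
        .with_repartition_windows(ballista_config.repartition_windows())
        .with_collect_statistics(ballista_config.collect_statistics())
        .with_parquet_pruning(ballista_config.parquet_pruning())
        .set_usize(
            "datafusion.optimizer.hash_join_single_partition_threshold",
            ballista_config.hash_join_single_partition_threshold(),
        )
        // Round-robin repartitioning is always disabled for Ballista.
        .set_bool("datafusion.optimizer.enable_round_robin_repartition", false);
    // Build the session state and wrap it in a context.
    let session_state = session_builder(config);
    Arc::new(SessionContext::new_with_state(session_state))
}
```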

Ballista-specific configuration, which should be preserved:


+-------------------------------------------------------------------------+---------------------------+
| name                                                                    | value                     |
+-------------------------------------------------------------------------+---------------------------+
| ballista.grpc_client_max_message_size                                   | 16777216                  |
| ballista.job.name                                                       |                           |
+-------------------------------------------------------------------------+---------------------------+
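Once the merged options move to `SessionConfig`, only the two keys above would remain Ballista's own. The sketch below models that trimmed configuration with the standard library only; `TrimmedBallistaConfig` and `from_settings` are hypothetical names, and the defaults are taken from the table (16777216 bytes is 16 MiB). In the real change this would presumably be attached to `SessionConfig` (for example through DataFusion's `ConfigExtension` mechanism); that wiring is not shown here.

```rust
use std::collections::HashMap;

/// Hypothetical trimmed-down config holding only the Ballista-specific keys.
#[derive(Debug, Clone, PartialEq)]
struct TrimmedBallistaConfig {
    /// `ballista.grpc_client_max_message_size`, in bytes.
    grpc_client_max_message_size: usize,
    /// `ballista.job.name`, empty by default.
    job_name: String,
}

impl Default for TrimmedBallistaConfig {
    fn default() -> Self {
        Self {
            grpc_client_max_message_size: 16 * 1024 * 1024, // 16777216
            job_name: String::new(),
        }
    }
}

impl TrimmedBallistaConfig {
    /// Read the `ballista.*` keys out of a flat settings map; `datafusion.*`
    /// keys are ignored. Fails if the message size is not a valid integer.
    fn from_settings(settings: &HashMap<String, String>) -> Result<Self, String> {
        let mut config = Self::default();
        if let Some(v) = settings.get("ballista.grpc_client_max_message_size") {
            config.grpc_client_max_message_size = v.parse().map_err(|e| {
                format!("invalid ballista.grpc_client_max_message_size {v:?}: {e}")
            })?;
        }
        if let Some(v) = settings.get("ballista.job.name") {
            config.job_name = v.clone();
        }
        Ok(config)
    }
}
```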

We can see that the options in the first table map one-to-one onto DataFusion configuration options.

Note: `datafusion.optimizer.enable_round_robin_repartition` has to be set to `false`.
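If users configure Ballista through `SessionConfig` directly, this one option still has to be pinned, as the quoted `create_datafusion_context` already does with `set_bool(..., false)`. A minimal sketch over a plain settings map; `apply_ballista_overrides` is a hypothetical helper, and returning the overridden value (so the caller can log it) is a design assumption.

```rust
use std::collections::HashMap;

const ROUND_ROBIN_KEY: &str = "datafusion.optimizer.enable_round_robin_repartition";

/// Hypothetical helper: force round-robin repartitioning off regardless of
/// user input. Returns the user's previous value only if it was not "false".
fn apply_ballista_overrides(settings: &mut HashMap<String, String>) -> Option<String> {
    let previous = settings.insert(ROUND_ROBIN_KEY.to_string(), "false".to_string());
    // Report a real conflict, not a user who already set it to false.
    previous.filter(|v| v.as_str() != "false")
}
```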

Describe alternatives you've considered

Keep `BallistaConfig`, but at the moment I see no benefit in keeping it, nor any scenario that `SessionConfig` can't support.

Additional context

As it stands, with the introduction of `SessionContextExt`, `BallistaConfig` has been removed from all public interfaces; with this change, it will probably be removed from most internal interfaces as well.

@milenkovicm milenkovicm added the enhancement New feature or request label Oct 30, 2024
@milenkovicm

Will take this once #1103 and #1099 get merged.
