Release v24.02.3
Generated on 2024-04-24
- Cache CLI calls for node instance description (#952)
- Improve error handling in prediction code (#950)
- Support dynamic calculation of JVM resources in CLI cmd (#944)
- Sync up estimation model prediction logic updates (#946)
- Cluster inference should not run for unsupported platform (#941)
- Fix invalid values in cluster creation script (#935)
- Fix core tool doc links and user qualification tool default argument values (#931)
- Fix gpu cluster recommendation in user tools (#930)
- Bump idna from 3.4 to 3.7 in /data_validation (#932)
- Add cluster details in qualification summary output (#921)
- Refactor find_matches_for_node return values (#920)
- [FEA] Add and use g5 AWS instances as default for qualification tool output (#898)
- Add jar argument to spark_rapids CLI (#902)
- Support driverlog argument in profiler CLI (#897)
- Followups on handling Photon eventlogs (#953)
- Sync operators support timestamped 24-04-16 (#951)
- Add CheckOverflowInTableInsert support: verify absence from physical plan (#942)
- Fix Notes column in the supported ops CSV files (#933)
- Improve sync plugin supported CSV python script (#919)
- Add support for unsupported expressions reasons per Exec (#923)
- Adding more metrics and options for qual validation (#926)
- Generate cluster details in JSON output (#912)
- Add Divide and multiple interval expressions as supported (#917)
- Add support for PythonMapInArrowExec and MapInArrowExec (#913)
- Re-enable support for GetJsonObject by default (#916)
- Add support for WindowGroupLimitExec (#906)
- [FEA] Skip Spark Structured Streaming event logs for Qualification tool (#905)
- Initial version of qual tool validation script for classification metrics (#903)
- Fix Delta-core dependency for Spark35+ (#904)
- Add support for AtomicCreateTableAsSelectExec (#895)
- Add support for KnownNullable and EphemeralSubstring expressions (#894)
- Add Support for BloomFilterAggregate and BloomFilterMightContain exprs (#891)
- [DOC] Update README for sync plugin supported ops script (#893)
- Add operators to ignore list and update WindowExpr parser (#890)
- Add support for RoundCeil and RoundFloor expressions (#889)
Release v24.02.2
Generated on 2024-03-27
- Override estimated speedups when estimation model is enabled (#885)
- [FEA] Make top candidates view as the default view in user-tools (#879)
- Introduce new csv file containing output for all apps before grouping (#875)
- Fix calculation of unsupported operators stages duration and update output row (#874)
- Implement top candidate filter for user tools CLI output (#866)
- [FEA] Skip Databricks Photon jobs at app level in Qualification tool (#886)
- [FEA] Add Estimation Model to Qualification CLI (#870)
- Add rootExecutionID to output csv files (#871)
- [FEA] Generate updated supported CSV files from plugin repo (#847)
- Add action column to qual execs output (#859)
- Extend supportLevels in PluginTypeChecker (#863)
- Propagate Reason/Notes for operators disabled by default from plugin to Qualification tool unsupported operators csv file (#850)
Release v24.02.1
Generated on 2024-03-15
- Remove redundant initialization scripts from user tools output (#830)
- [DOC] Update Databricks Azure user tool setup instructions for output format (#826)
- Estimate cluster instances and generate cost savings (#803)
- Fix implementation of processSQLPlanMetrics in Profiler (#853)
- Deduplicate SQL duration wallclock time for databricks eventlog (#810)
- Consider additional factors in spark.sql.shuffle.partitions recommendation in Autotuner (#722)
- Fix case matching error in AutoTuner (#828)
- Fix ReadSchema in Qualification tool and NPE in Profiling tool (#825)
- AutoTuner does not process arguments skipList and limitedLogic (#812)
Release v24.02.0
Generated on 2024-02-24
- Fix missing config file for Dataproc GKE (#778)
- [FEA] Qualification user_tools runs AutoTuner by default (#771)
- [BUG] Fix databricks-aws user profiling tool error with --gpu_cluster argument (#707)
- [FEA] Qualification tool should mark WriteIntoDeltaCommand as supported (#801)
- Qualification tool should mark SubqueryExec as IgnoreNoPerf (#798)
- Generate cluster information from event logs in Qualification tool (#789)
- Sync up supported ops for 24.02 plugin release (#796)
- Qualification should mark empty2null as supported (#791)
- Incorrect parsing of aggregates in DB queries (#790)
- Qualification should mark WriteFiles as supported (#784)
- Introduce GpuDevice abstraction and refactor AutoTuner (#740)
- Consolidate unsupportedOperators into a single view (#766)
- Speedup generator script fails after adding runtime_properties (#776)
- Tools fail on DB10.4 clusters with IllegalArgException (#768)
- Fix SparkPlanGraphCluster constructor for DB Platforms (#765)
- Amendment to PR-763 (#764)
- Fix SQLPlanMetric constructor for DB Platforms (#763)
- Fix node constructor for DB platforms (#761)
- Add penalty for stages with UDFs (#757)
- Add support for appendDataExecV1 and overwriteByExprExecV1 (#756)
- Qualification fails to detect sortMergeJoin with arguments (#754)
- Fix Qualification crash during aggregation of stats (#753)
- [FEA] Extend the list of operators to be ignored in Qualification (#745)
- Remove ReusedSubquery from SparkPlanGraph construction (#741)
- Update unsupported operator csv file's app duration column (#748)
- [FEA] Qualification tool triggers the AutoTuner module (#739)
- Disable support of GetJsonObject in Qualification tool (#737)
- [FEA] AutoTuner warns that non-utf8 may not support some GPU expressions (#736)
- [FEA] AutoTuner should not skip non-gpu eventlogs (#728)
- Add auto-copyright for precommits (#732)
Release v23.12.3
Generated on 2024-01-12
- Add support for HiveTableScan and InsertIntoHive text-format (#723)
- Fix compilation error with JDK11 (#720)
- Generate an output file with runtime and build information (#705)
- AutoTuner should poll maven-meta to retrieve the latest jar version (#711)
- Profiling tool throws NPE when appInfo is null and unchecked (#640)
- Add support for parse_url host and protocol (#708)
- [FEA] Profiling tool auto-tuner should consider spark.databricks.adaptive.autoOptimizeShuffle.enabled (#710)
- [FEA] Profiler autotuner should only specify standard Spark versions for shuffle manager setting (#662)
- [FEA] Enable AQE related recommendations in Profiler Auto-tuner (#688)
Release v23.12.2
Generated on 2023-12-27
- Polling maven-metadata.xml to pull the latest tools jar (#703)
- Update pom to fail on warnings (#701)
Release v23.12.1
Generated on 2023-12-23
- no changes
Release v23.12.0
Generated on 2023-12-20
- Fix user qualification tool runtime error in get_platform_name for onprem platform (#684)
- [FEA] User tool should pass --platform option/argument to Profiling tool (#679)
- Fix incorrect processing of short flags for user tools cli (#677)
- Updating new CLI name from ascli to spark_rapids (#673)
- Bump pyarrow version (#664)
- Improve new CLI testing ensuring complete coverage of arguments cases (#652)
- Qualification tool: Add more information for unsupported operators (#680)
- Sync Execs and Expressions from spark-rapids resources (#691)
- Support parsing of inprogress eventlogs (#686)
- Enable features via config that are off by default in the profiler AutoTuner (#668)
- Fix platform names as string constants and reduce redundancy in unit tests (#667)
- Unified platform handling and fetching of operator score files (#661)
- Qualification tool: Ignore some of the unsupported Execs from output (#665)
- Add markdown link checker (#672)
Release v23.10.1
Generated on 2023-11-16
- Updating tools docs to remove dead links and profiling docs to not require cluster/worker info (#651)
- Updating autotuner to always generate recommendations, even without cluster info (#650)
- Updating dataproc container cost to be multiplied by number of cores (#648)
- [BUG] Support autoscaling clusters for user qualification tool on Databricks platforms (#647)
- Support extra arguments in new user tools CLI (#646)
- Improve logs with user tools and jar version details (#642)
- Profiling tool: Add support for driver log as input to generate unsupported operators report (#654)
- Qualification tool: Enhance mapping of Execs to stages (#634)
Release v23.10.0
Generated on 2023-10-30
- Fix system command processing during logging in user tools (#633)
- Fix spinner animation blocking user input in diagnostic tool (#631)
- Enable Dynamic 'Zone' Configuration for Dataproc User Tools (#629)
- Profiling tool: Update readSchema string parser (#635)
- [FEA] Fix empty softwareProperties field in worker_info.yaml file for profiling tool (#623)
Release v23.08.2
Generated on 2023-10-19
- Add unit tests for Dataproc GKE with mock GKE cluster (#618)
- Add support in user tools for running qualification on Dataproc GKE (#612)
- [BUG] Update user tools to use latest Databricks CLI version 0.200+ (#614)
- Add argprocessor unit test for checking error messages for onprem with no eventlogs (#605)
- Updating docs for custom speedup factors for scale factor (#604)
- [FEA] Add qualification user tool options to support external pricing (#595)
- [DOC] Add documentation for qualification user tool pricing discount options (#596)
- [FEA] Add user qualification tool options for specifying pricing discounts for CPU or GPU cluster, or both (#583)
- Add diagnostic capabilities for Databricks (AWS/Azure) environments (#533)
- Add verbose option to the CLI (#550)
- [FEA] Remove URLs from pydantic error messages (#560)
- Rename and change pyrapids to spark_rapids_tools (#570)
- Fix sdk_monitor exception thrown by abfs protocol (#569)
- Generating speedup factors for Dataproc GKE L4 GPU instances (#617)
- Qualification tool: Add penalty for row conversions (#471)
- Add support in core tools for running qualification on Dataproc GKE (#613)
- Sync up remaining updated execs and exprs from rapids-plugin (#602)
- Adding speedup factors for Dataproc Serverless and docs fix (#603)
- Add xxhash64 function as supported in qualification tools (#597)
- Fix ProjectExecParser to include digits in expression names (#592)
- [FEA] Add json_tuple function as supported in qualification tool (#589)
- [FEA] Add flatten function as supported in qualification tool (#587)
- [FEA] Sync up conv function with rapids-plugin resources (#573)
- Bump urllib3 from 1.26.17 to 1.26.18 in /data_validation (#622)
- Bump urllib3 from 1.26.14 to 1.26.17 in /data_validation (#606)
- Ignore pylint errors to fix python tests (#611)
Release v23.08.1
Generated on 2023-09-12
- [DOC] Fix help command in documentation (#540)
- Implement a cross-CSP storage driver (#485)
- Build tools package as single artifact for restricted environments (#516)
- Remove memoryOverhead recommendations for Standalone Spark (#557)
- [FEA] Add support for TIMESTAMP functions (#549)
- Fix handling of current_database and ArrayBuffer (#556)
- Add translate as supported expression in qualification tools (#546)
- Adding TakeOrderedAndProject and BroadcastNestedLoopJoin, removing Project from speedup generation (#548)
- Qualification should treat promote_precision as supported (#545)
- Improve tool error message for files with text extensions (#544)
- Improve parsing of aggregate expressions (#535)
- Bump default build to use Spark-333 (#537)
- Improve AutoTuner plugin recommendation for Fat mode (#543)
- Updating speedup generation for more execs from NDS + validation script (#530)
- [FEA] Reset speedup factors for qualification tool in EMR 6.12 environments (#529)
- Add min, median and max columns to AccumProfileResults (#522)
- [FEA] Reset speedup factors for qualification tool in Databricks 12.2 environments (#524)
- Filter parser should check ignored-functions (#520)
- Update speedup factors for qualification tool in Dataproc 2.1 environments (#509)
- Changing max_value to total based on profiler core changes (#555)
- Add platform encoding to plugins defined in pom (#526)
Release v23.08.0
Generated on 2023-08-25
- Support offline execution of user tools in restricted environments (#497)
- Handle deprecation errors in python packaging (#513)
- Add profiling support for EMR in user tools (#500)
- Fix unit-tests for Spark-340 and Add spark-versions to gh-workflow (#503)
- Fix gh-workflow for Python unit-tests (#505)
- Refactoring the speedup factor generation to support WholeStageCodegen parsing and environment defaults (#493)
- Try to fix push issue in release action [skip ci] (#495)
- Revert "Push to protected branch using third-party action (#492)" (#494)
- Push to protected branch using third-party action (#492)
- Add secrets in the release.yml (#491)
- Add sign-off and token in release workflow (#490)
Release v23.06.4
Generated on 2023-08-16
- Creating custom speedup factors README with generation script (#488)
- Bump dev-version to 23.06.4 (#468)