From 5836d08c8e45212e35cfa11222b51c9802266d63 Mon Sep 17 00:00:00 2001
From: Ray Douglass
Date: Wed, 11 Dec 2024 13:10:31 -0500
Subject: [PATCH] Update Changelog [skip ci]

---
 CHANGELOG.md | 329 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 329 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 7a75b2a95a4..97f7afb33a1 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,332 @@
+# cudf 24.12.00 (11 Dec 2024)
+
+## 🚨 Breaking Changes
+
+- Fix reading Parquet string cols when `nrows` and `input_pass_limit` > 0 ([#17321](https://github.com/rapidsai/cudf/pull/17321)) [@mhaseeb123](https://github.com/mhaseeb123)
+- prefer wheel-provided libcudf.so in load_library(), use RTLD_LOCAL ([#17316](https://github.com/rapidsai/cudf/pull/17316)) [@jameslamb](https://github.com/jameslamb)
+- Deprecate single component extraction methods in libcudf ([#17221](https://github.com/rapidsai/cudf/pull/17221)) [@Matt711](https://github.com/Matt711)
+- Move detail header floating_conversion.hpp to detail subdirectory ([#17209](https://github.com/rapidsai/cudf/pull/17209)) [@davidwendt](https://github.com/davidwendt)
+- Refactor Dask cuDF legacy code ([#17205](https://github.com/rapidsai/cudf/pull/17205)) [@rjzamora](https://github.com/rjzamora)
+- Make HostMemoryBuffer call into the DefaultHostMemoryAllocator ([#17204](https://github.com/rapidsai/cudf/pull/17204)) [@revans2](https://github.com/revans2)
+- Remove java reservation ([#17189](https://github.com/rapidsai/cudf/pull/17189)) [@revans2](https://github.com/revans2)
+- Separate evaluation logic from `IR` objects in cudf-polars ([#17175](https://github.com/rapidsai/cudf/pull/17175)) [@rjzamora](https://github.com/rjzamora)
+- Upgrade to polars 1.11 in cudf-polars ([#17154](https://github.com/rapidsai/cudf/pull/17154)) [@wence-](https://github.com/wence-)
+- Remove the additional host register calls initially intended for performance improvement on Grace Hopper ([#17092](https://github.com/rapidsai/cudf/pull/17092)) [@kingcrimsontianyu](https://github.com/kingcrimsontianyu)
+- Correctly set `is_device_accesible` when creating `host_span`s from other container/span types ([#17079](https://github.com/rapidsai/cudf/pull/17079)) [@vuule](https://github.com/vuule)
+- Unify treatment of `Expr` and `IR` nodes in cudf-polars DSL ([#17016](https://github.com/rapidsai/cudf/pull/17016)) [@wence-](https://github.com/wence-)
+- Deprecate support for directly accessing logger ([#16964](https://github.com/rapidsai/cudf/pull/16964)) [@vyasr](https://github.com/vyasr)
+- Made cudftestutil header-only and removed GTest dependency ([#16839](https://github.com/rapidsai/cudf/pull/16839)) [@lamarrr](https://github.com/lamarrr)
+
+## 🐛 Bug Fixes
+
+- Turn off cudf.pandas 3rd party integrations tests for 24.12 ([#17500](https://github.com/rapidsai/cudf/pull/17500)) [@Matt711](https://github.com/Matt711)
+- Ignore errors when testing glibc versions ([#17389](https://github.com/rapidsai/cudf/pull/17389)) [@vyasr](https://github.com/vyasr)
+- Adapt to KvikIO API change in the compatibility mode ([#17377](https://github.com/rapidsai/cudf/pull/17377)) [@kingcrimsontianyu](https://github.com/kingcrimsontianyu)
+- Support pivot with index or column arguments as lists ([#17373](https://github.com/rapidsai/cudf/pull/17373)) [@mroeschke](https://github.com/mroeschke)
+- Deselect failing polars tests ([#17362](https://github.com/rapidsai/cudf/pull/17362)) [@pentschev](https://github.com/pentschev)
+- Fix integer overflow in compiled binaryop ([#17354](https://github.com/rapidsai/cudf/pull/17354)) [@wence-](https://github.com/wence-)
+- Update cmake to 3.28.6 in JNI Dockerfile ([#17342](https://github.com/rapidsai/cudf/pull/17342)) [@jlowe](https://github.com/jlowe)
+- fix library-loading issues in editable installs ([#17338](https://github.com/rapidsai/cudf/pull/17338)) [@jameslamb](https://github.com/jameslamb)
+- Bug fix: restrict lines=True to JSON format in Kafka read_gdf method ([#17333](https://github.com/rapidsai/cudf/pull/17333)) [@a-hirota](https://github.com/a-hirota)
+- Fix various issues with `replace` API and add support in `datetime` and `timedelta` columns ([#17331](https://github.com/rapidsai/cudf/pull/17331)) [@galipremsagar](https://github.com/galipremsagar)
+- Do not exclude nanoarrow and flatbuffers from installation if statically linked ([#17322](https://github.com/rapidsai/cudf/pull/17322)) [@hyperbolic2346](https://github.com/hyperbolic2346)
+- Fix reading Parquet string cols when `nrows` and `input_pass_limit` > 0 ([#17321](https://github.com/rapidsai/cudf/pull/17321)) [@mhaseeb123](https://github.com/mhaseeb123)
+- Remove another reference to `FindcuFile` ([#17315](https://github.com/rapidsai/cudf/pull/17315)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA)
+- Fix reading of single-row unterminated CSV files ([#17305](https://github.com/rapidsai/cudf/pull/17305)) [@vuule](https://github.com/vuule)
+- Fixed lifetime issue in ast transform tests ([#17292](https://github.com/rapidsai/cudf/pull/17292)) [@lamarrr](https://github.com/lamarrr)
+- Switch to using `TaskSpec` ([#17285](https://github.com/rapidsai/cudf/pull/17285)) [@galipremsagar](https://github.com/galipremsagar)
+- Fix data_type ctor call in JSON_TEST ([#17273](https://github.com/rapidsai/cudf/pull/17273)) [@davidwendt](https://github.com/davidwendt)
+- Expose delimiter character in JSON reader options to JSON reader APIs ([#17266](https://github.com/rapidsai/cudf/pull/17266)) [@shrshi](https://github.com/shrshi)
+- Fix extract-datetime deprecation warning in ndsh benchmark ([#17254](https://github.com/rapidsai/cudf/pull/17254)) [@davidwendt](https://github.com/davidwendt)
+- Disallow cuda-python 12.6.1 and 11.8.4 ([#17253](https://github.com/rapidsai/cudf/pull/17253)) [@bdice](https://github.com/bdice)
+- Wrap custom iterator result ([#17251](https://github.com/rapidsai/cudf/pull/17251)) [@galipremsagar](https://github.com/galipremsagar)
+- Fix binop with LHS numpy datetimelike scalar ([#17226](https://github.com/rapidsai/cudf/pull/17226)) [@mroeschke](https://github.com/mroeschke)
+- Fix `Dataframe.__setitem__` slow-downs ([#17222](https://github.com/rapidsai/cudf/pull/17222)) [@galipremsagar](https://github.com/galipremsagar)
+- Fix groupby.get_group with length-1 tuple with list-like grouper ([#17216](https://github.com/rapidsai/cudf/pull/17216)) [@mroeschke](https://github.com/mroeschke)
+- Fix discoverability of submodules inside `pd.util` ([#17215](https://github.com/rapidsai/cudf/pull/17215)) [@galipremsagar](https://github.com/galipremsagar)
+- Fix `Schema.Builder` does not propagate precision value to `Builder` instance ([#17214](https://github.com/rapidsai/cudf/pull/17214)) [@ttnghia](https://github.com/ttnghia)
+- Mark column chunks in a PQ reader `pass` as large strings when the cumulative `offsets` exceeds the large strings threshold. ([#17207](https://github.com/rapidsai/cudf/pull/17207)) [@mhaseeb123](https://github.com/mhaseeb123)
+- [BUG] Replace `repo_token` with `github_token` in Auto Assign PR GHA ([#17203](https://github.com/rapidsai/cudf/pull/17203)) [@Matt711](https://github.com/Matt711)
+- Remove unsanitized nulls from input strings columns in reduction gtests ([#17202](https://github.com/rapidsai/cudf/pull/17202)) [@davidwendt](https://github.com/davidwendt)
+- Fix ``to_parquet`` append behavior with global metadata file ([#17198](https://github.com/rapidsai/cudf/pull/17198)) [@rjzamora](https://github.com/rjzamora)
+- Check `num_children() == 0` in `Column.from_column_view` ([#17193](https://github.com/rapidsai/cudf/pull/17193)) [@cwharris](https://github.com/cwharris)
+- Fix host-to-device copy missing sync in strings/duration convert ([#17149](https://github.com/rapidsai/cudf/pull/17149)) [@davidwendt](https://github.com/davidwendt)
+- Add JNI Support for Multi-line Delimiters and Include Test ([#17139](https://github.com/rapidsai/cudf/pull/17139)) [@SurajAralihalli](https://github.com/SurajAralihalli)
+- Ignore loud dask warnings about legacy dataframe implementation ([#17137](https://github.com/rapidsai/cudf/pull/17137)) [@galipremsagar](https://github.com/galipremsagar)
+- Fix the GDS read/write segfault/bus error when the cuFile policy is set to GDS or ALWAYS ([#17122](https://github.com/rapidsai/cudf/pull/17122)) [@kingcrimsontianyu](https://github.com/kingcrimsontianyu)
+- Fix `DataFrame._from_arrays` and introduce validations ([#17112](https://github.com/rapidsai/cudf/pull/17112)) [@galipremsagar](https://github.com/galipremsagar)
+- [Bug] Fix Arrow-FS parquet reader for larger files ([#17099](https://github.com/rapidsai/cudf/pull/17099)) [@rjzamora](https://github.com/rjzamora)
+- Fix bug in recovering invalid lines in JSONL inputs ([#17098](https://github.com/rapidsai/cudf/pull/17098)) [@shrshi](https://github.com/shrshi)
+- Reenable huge pages for arrow host copying ([#17097](https://github.com/rapidsai/cudf/pull/17097)) [@vyasr](https://github.com/vyasr)
+- Correctly set `is_device_accesible` when creating `host_span`s from other container/span types ([#17079](https://github.com/rapidsai/cudf/pull/17079)) [@vuule](https://github.com/vuule)
+- Fix ORC reader when using `device_read_async` while the destination device buffers are not ready ([#17074](https://github.com/rapidsai/cudf/pull/17074)) [@ttnghia](https://github.com/ttnghia)
+- Fix regex handling of fixed quantifier with 0 range ([#17067](https://github.com/rapidsai/cudf/pull/17067)) [@davidwendt](https://github.com/davidwendt)
+- Limit the number of keys to calculate column sizes and page starts in PQ reader to 1B ([#17059](https://github.com/rapidsai/cudf/pull/17059)) [@mhaseeb123](https://github.com/mhaseeb123)
+- Adding assertion to check for regular JSON inputs of size greater than `INT_MAX` bytes ([#17057](https://github.com/rapidsai/cudf/pull/17057)) [@shrshi](https://github.com/shrshi)
+- bug fix: use `self.ck_consumer` in `poll` method of kafka.py to align with `__init__` ([#17044](https://github.com/rapidsai/cudf/pull/17044)) [@a-hirota](https://github.com/a-hirota)
+- Disable kvikio remote I/O to avoid openssl dependencies in JNI build ([#17026](https://github.com/rapidsai/cudf/pull/17026)) [@pxLi](https://github.com/pxLi)
+- Fix `host_span` constructor to correctly copy `is_device_accessible` ([#17020](https://github.com/rapidsai/cudf/pull/17020)) [@vuule](https://github.com/vuule)
+- Add pinning for pyarrow in wheels ([#17018](https://github.com/rapidsai/cudf/pull/17018)) [@vyasr](https://github.com/vyasr)
+- Use std::optional for host types ([#17015](https://github.com/rapidsai/cudf/pull/17015)) [@robertmaynard](https://github.com/robertmaynard)
+- Fix write_json to handle empty string column ([#16995](https://github.com/rapidsai/cudf/pull/16995)) [@karthikeyann](https://github.com/karthikeyann)
+- Restore export of nvcomp outside of wheel builds ([#16988](https://github.com/rapidsai/cudf/pull/16988)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA)
+- Allow melt(var_name=) to be a falsy label ([#16981](https://github.com/rapidsai/cudf/pull/16981)) [@mroeschke](https://github.com/mroeschke)
+- Fix astype from tz-aware type to tz-aware type ([#16980](https://github.com/rapidsai/cudf/pull/16980)) [@mroeschke](https://github.com/mroeschke)
+- Use `libcudf` wheel from PR rather than nightly for `polars-polars` CI test job ([#16975](https://github.com/rapidsai/cudf/pull/16975)) [@brandon-b-miller](https://github.com/brandon-b-miller)
+- Fix order-preservation in pandas-compat unsorted groupby ([#16942](https://github.com/rapidsai/cudf/pull/16942)) [@wence-](https://github.com/wence-)
+- Fix cudf::strings::findall error with empty input ([#16928](https://github.com/rapidsai/cudf/pull/16928)) [@davidwendt](https://github.com/davidwendt)
+- Fix JsonLargeReaderTest.MultiBatch use of LIBCUDF_JSON_BATCH_SIZE env var ([#16927](https://github.com/rapidsai/cudf/pull/16927)) [@davidwendt](https://github.com/davidwendt)
+- Parse newline as whitespace character while tokenizing JSONL inputs with non-newline delimiter ([#16923](https://github.com/rapidsai/cudf/pull/16923)) [@shrshi](https://github.com/shrshi)
+- Respect groupby.nunique(dropna=False) ([#16921](https://github.com/rapidsai/cudf/pull/16921)) [@mroeschke](https://github.com/mroeschke)
+- Update all rmm imports to use pylibrmm/librmm ([#16913](https://github.com/rapidsai/cudf/pull/16913)) [@Matt711](https://github.com/Matt711)
+- Fix order-preservation in cudf-polars groupby ([#16907](https://github.com/rapidsai/cudf/pull/16907)) [@wence-](https://github.com/wence-)
+- Add a shortcut for when the input clusters are all empty for the tdigest merge ([#16897](https://github.com/rapidsai/cudf/pull/16897)) [@jihoonson](https://github.com/jihoonson)
+- Properly handle the mapped and registered regions in `memory_mapped_source` ([#16865](https://github.com/rapidsai/cudf/pull/16865)) [@vuule](https://github.com/vuule)
+- Fix performance regression for generate_character_ngrams ([#16849](https://github.com/rapidsai/cudf/pull/16849)) [@davidwendt](https://github.com/davidwendt)
+- Fix regex parsing logic handling of nested quantifiers ([#16798](https://github.com/rapidsai/cudf/pull/16798)) [@davidwendt](https://github.com/davidwendt)
+- Compute whole column variance using numerically stable approach ([#16448](https://github.com/rapidsai/cudf/pull/16448)) [@wence-](https://github.com/wence-)
+
+## 📖 Documentation
+
+- Add documentation for low memory readers ([#17314](https://github.com/rapidsai/cudf/pull/17314)) [@btepera](https://github.com/btepera)
+- Fix the example in documentation for `get_dremel_data()` ([#17242](https://github.com/rapidsai/cudf/pull/17242)) [@mhaseeb123](https://github.com/mhaseeb123)
+- Fix some documentation rendering for pylibcudf ([#17217](https://github.com/rapidsai/cudf/pull/17217)) [@mroeschke](https://github.com/mroeschke)
+- Move detail header floating_conversion.hpp to detail subdirectory ([#17209](https://github.com/rapidsai/cudf/pull/17209)) [@davidwendt](https://github.com/davidwendt)
+- Add TokenizeVocabulary to api docs ([#17208](https://github.com/rapidsai/cudf/pull/17208)) [@davidwendt](https://github.com/davidwendt)
+- Add jaccard_index to generated cuDF docs ([#17199](https://github.com/rapidsai/cudf/pull/17199)) [@davidwendt](https://github.com/davidwendt)
+- [no ci] Add empty-columns section to the libcudf developer guide ([#17183](https://github.com/rapidsai/cudf/pull/17183)) [@davidwendt](https://github.com/davidwendt)
+- Add 2-cpp approvers text to contributing guide [no ci] ([#17182](https://github.com/rapidsai/cudf/pull/17182)) [@davidwendt](https://github.com/davidwendt)
+- Changing developer guide int_64_t to int64_t ([#17130](https://github.com/rapidsai/cudf/pull/17130)) [@hyperbolic2346](https://github.com/hyperbolic2346)
+- docs: change 'CSV' to 'csv' in python/custreamz/README.md to match kafka.py ([#17041](https://github.com/rapidsai/cudf/pull/17041)) [@a-hirota](https://github.com/a-hirota)
+- [DOC] Document limitation using `cudf.pandas` proxy arrays ([#16955](https://github.com/rapidsai/cudf/pull/16955)) [@Matt711](https://github.com/Matt711)
+- [DOC] Document environment variable for failing on fallback in `cudf.pandas` ([#16932](https://github.com/rapidsai/cudf/pull/16932)) [@Matt711](https://github.com/Matt711)
+
+## 🚀 New Features
+
+- Add version config ([#17312](https://github.com/rapidsai/cudf/pull/17312)) [@vyasr](https://github.com/vyasr)
+- Java JNI for Multiple contains ([#17281](https://github.com/rapidsai/cudf/pull/17281)) [@res-life](https://github.com/res-life)
+- Add `cudf::calendrical_month_sequence` to pylibcudf ([#17277](https://github.com/rapidsai/cudf/pull/17277)) [@Matt711](https://github.com/Matt711)
+- Raise errors on specific types of fallback in `cudf.pandas` ([#17268](https://github.com/rapidsai/cudf/pull/17268)) [@Matt711](https://github.com/Matt711)
+- Add `catboost` to the third-party integration tests ([#17267](https://github.com/rapidsai/cudf/pull/17267)) [@Matt711](https://github.com/Matt711)
+- Add type stubs for pylibcudf ([#17258](https://github.com/rapidsai/cudf/pull/17258)) [@wence-](https://github.com/wence-)
+- Use pylibcudf contiguous split APIs in cudf python ([#17246](https://github.com/rapidsai/cudf/pull/17246)) [@Matt711](https://github.com/Matt711)
+- Upgrade nvcomp to 4.1.0.6 ([#17201](https://github.com/rapidsai/cudf/pull/17201)) [@bdice](https://github.com/bdice)
+- Added Arrow Interop Benchmarks ([#17194](https://github.com/rapidsai/cudf/pull/17194)) [@lamarrr](https://github.com/lamarrr)
+- Rewrite Java API `Table.readJSON` to return the output from libcudf `read_json` directly ([#17180](https://github.com/rapidsai/cudf/pull/17180)) [@ttnghia](https://github.com/ttnghia)
+- Support storing `precision` of decimal types in `Schema` class ([#17176](https://github.com/rapidsai/cudf/pull/17176)) [@ttnghia](https://github.com/ttnghia)
+- Migrate CSV writer to pylibcudf ([#17163](https://github.com/rapidsai/cudf/pull/17163)) [@Matt711](https://github.com/Matt711)
+- Add compute_shared_memory_aggs used by shared memory groupby ([#17162](https://github.com/rapidsai/cudf/pull/17162)) [@PointKernel](https://github.com/PointKernel)
+- Added ast tree to simplify expression lifetime management ([#17156](https://github.com/rapidsai/cudf/pull/17156)) [@lamarrr](https://github.com/lamarrr)
+- Add compute_mapping_indices used by shared memory groupby ([#17147](https://github.com/rapidsai/cudf/pull/17147)) [@PointKernel](https://github.com/PointKernel)
+- Add remaining datetime APIs to pylibcudf ([#17143](https://github.com/rapidsai/cudf/pull/17143)) [@Matt711](https://github.com/Matt711)
+- Added strings AST vs BINARY_OP benchmarks ([#17128](https://github.com/rapidsai/cudf/pull/17128)) [@lamarrr](https://github.com/lamarrr)
+- Use `libcudf_exception_handler` throughout `pylibcudf.libcudf` ([#17109](https://github.com/rapidsai/cudf/pull/17109)) [@brandon-b-miller](https://github.com/brandon-b-miller)
+- Include timezone file path in error message ([#17102](https://github.com/rapidsai/cudf/pull/17102)) [@bdice](https://github.com/bdice)
+- Migrate NVText Byte Pair Encoding APIs to pylibcudf ([#17101](https://github.com/rapidsai/cudf/pull/17101)) [@Matt711](https://github.com/Matt711)
+- Migrate NVText Tokenizing APIs to pylibcudf ([#17100](https://github.com/rapidsai/cudf/pull/17100)) [@Matt711](https://github.com/Matt711)
+- Migrate NVtext subword tokenizing APIs to pylibcudf ([#17096](https://github.com/rapidsai/cudf/pull/17096)) [@Matt711](https://github.com/Matt711)
+- Migrate NVText Stemming APIs to pylibcudf ([#17085](https://github.com/rapidsai/cudf/pull/17085)) [@Matt711](https://github.com/Matt711)
+- Migrate NVText Replacing APIs to pylibcudf ([#17084](https://github.com/rapidsai/cudf/pull/17084)) [@Matt711](https://github.com/Matt711)
+- Add IWYU to CI ([#17078](https://github.com/rapidsai/cudf/pull/17078)) [@vyasr](https://github.com/vyasr)
+- `cudf-polars` string/numeric casting ([#17076](https://github.com/rapidsai/cudf/pull/17076)) [@brandon-b-miller](https://github.com/brandon-b-miller)
+- Migrate NVText Normalizing APIs to Pylibcudf ([#17072](https://github.com/rapidsai/cudf/pull/17072)) [@Matt711](https://github.com/Matt711)
+- Migrate remaining nvtext NGrams APIs to pylibcudf ([#17070](https://github.com/rapidsai/cudf/pull/17070)) [@Matt711](https://github.com/Matt711)
+- Add profilers to CUDA 12 conda devcontainers ([#17066](https://github.com/rapidsai/cudf/pull/17066)) [@vyasr](https://github.com/vyasr)
+- Add conda recipe for cudf-polars ([#17037](https://github.com/rapidsai/cudf/pull/17037)) [@bdice](https://github.com/bdice)
+- Implement batch construction for strings columns ([#17035](https://github.com/rapidsai/cudf/pull/17035)) [@ttnghia](https://github.com/ttnghia)
+- Add device aggregators used by shared memory groupby ([#17031](https://github.com/rapidsai/cudf/pull/17031)) [@PointKernel](https://github.com/PointKernel)
+- Add optional column_order in JSON reader ([#17029](https://github.com/rapidsai/cudf/pull/17029)) [@karthikeyann](https://github.com/karthikeyann)
+- Migrate Min Hashing APIs to pylibcudf ([#17021](https://github.com/rapidsai/cudf/pull/17021)) [@Matt711](https://github.com/Matt711)
+- Reorganize `cudf_polars` expression code ([#17014](https://github.com/rapidsai/cudf/pull/17014)) [@brandon-b-miller](https://github.com/brandon-b-miller)
+- Migrate nvtext jaccard API to pylibcudf ([#17007](https://github.com/rapidsai/cudf/pull/17007)) [@Matt711](https://github.com/Matt711)
+- Migrate nvtext generate_ngrams APIs to pylibcudf ([#17006](https://github.com/rapidsai/cudf/pull/17006)) [@Matt711](https://github.com/Matt711)
+- Control whether a file data source memory-maps the file with an environment variable ([#17004](https://github.com/rapidsai/cudf/pull/17004)) [@vuule](https://github.com/vuule)
+- Switched BINARY_OP Benchmarks from GoogleBench to NVBench ([#16963](https://github.com/rapidsai/cudf/pull/16963)) [@lamarrr](https://github.com/lamarrr)
+- [FEA] Report all unsupported operations for a query in cudf.polars ([#16960](https://github.com/rapidsai/cudf/pull/16960)) [@Matt711](https://github.com/Matt711)
+- [FEA] Migrate nvtext/edit_distance APIs to pylibcudf ([#16957](https://github.com/rapidsai/cudf/pull/16957)) [@Matt711](https://github.com/Matt711)
+- Switched AST benchmarks from GoogleBench to NVBench ([#16952](https://github.com/rapidsai/cudf/pull/16952)) [@lamarrr](https://github.com/lamarrr)
+- Extend `device_scalar` to optionally use pinned bounce buffer ([#16947](https://github.com/rapidsai/cudf/pull/16947)) [@vuule](https://github.com/vuule)
+- Implement `cudf-polars` chunked parquet reading ([#16944](https://github.com/rapidsai/cudf/pull/16944)) [@brandon-b-miller](https://github.com/brandon-b-miller)
+- Expose streams in public round APIs ([#16925](https://github.com/rapidsai/cudf/pull/16925)) [@Matt711](https://github.com/Matt711)
+- add telemetry setup to test ([#16924](https://github.com/rapidsai/cudf/pull/16924)) [@msarahan](https://github.com/msarahan)
+- Add cudf::strings::contains_multiple ([#16900](https://github.com/rapidsai/cudf/pull/16900)) [@davidwendt](https://github.com/davidwendt)
+- Made cudftestutil header-only and removed GTest dependency ([#16839](https://github.com/rapidsai/cudf/pull/16839)) [@lamarrr](https://github.com/lamarrr)
+- Add an example to demonstrate multithreaded `read_parquet` pipelines ([#16828](https://github.com/rapidsai/cudf/pull/16828)) [@mhaseeb123](https://github.com/mhaseeb123)
+- Implement `extract_datetime_component` in `libcudf`/`pylibcudf` ([#16776](https://github.com/rapidsai/cudf/pull/16776)) [@brandon-b-miller](https://github.com/brandon-b-miller)
+- Add cudf::strings::find_re API ([#16742](https://github.com/rapidsai/cudf/pull/16742)) [@davidwendt](https://github.com/davidwendt)
+- Migrate hashing operations to `pylibcudf` ([#15418](https://github.com/rapidsai/cudf/pull/15418)) [@brandon-b-miller](https://github.com/brandon-b-miller)
+
+## 🛠️ Improvements
+
+- Simplify serialization protocols ([#17552](https://github.com/rapidsai/cudf/pull/17552)) [@vyasr](https://github.com/vyasr)
+- Add `pynvml` as a dependency for `dask-cudf` ([#17386](https://github.com/rapidsai/cudf/pull/17386)) [@pentschev](https://github.com/pentschev)
+- Enable unified memory by default in `cudf_polars` ([#17375](https://github.com/rapidsai/cudf/pull/17375)) [@galipremsagar](https://github.com/galipremsagar)
+- Support polars 1.14 ([#17355](https://github.com/rapidsai/cudf/pull/17355)) [@wence-](https://github.com/wence-)
+- Remove cudf._lib.quantiles in favor of inlining pylibcudf ([#17347](https://github.com/rapidsai/cudf/pull/17347)) [@mroeschke](https://github.com/mroeschke)
+- Remove cudf._lib.labeling in favor of inlining pylibcudf ([#17346](https://github.com/rapidsai/cudf/pull/17346)) [@mroeschke](https://github.com/mroeschke)
+- Remove cudf._lib.hash in favor of inlining pylibcudf ([#17345](https://github.com/rapidsai/cudf/pull/17345)) [@mroeschke](https://github.com/mroeschke)
+- Remove cudf._lib.concat in favor of inlining pylibcudf ([#17344](https://github.com/rapidsai/cudf/pull/17344)) [@mroeschke](https://github.com/mroeschke)
+- Extract ``GPUEngine`` config options at translation time ([#17339](https://github.com/rapidsai/cudf/pull/17339)) [@rjzamora](https://github.com/rjzamora)
+- Update java datetime APIs to match CUDF. ([#17329](https://github.com/rapidsai/cudf/pull/17329)) [@revans2](https://github.com/revans2)
+- Move strings url_decode benchmarks to nvbench ([#17328](https://github.com/rapidsai/cudf/pull/17328)) [@davidwendt](https://github.com/davidwendt)
+- Move strings translate benchmarks to nvbench ([#17325](https://github.com/rapidsai/cudf/pull/17325)) [@davidwendt](https://github.com/davidwendt)
+- Writing compressed output using JSON writer ([#17323](https://github.com/rapidsai/cudf/pull/17323)) [@shrshi](https://github.com/shrshi)
+- Test the full matrix for polars and dask wheels on nightlies ([#17320](https://github.com/rapidsai/cudf/pull/17320)) [@vyasr](https://github.com/vyasr)
+- Remove cudf._lib.avro in favor of inlining pylicudf ([#17319](https://github.com/rapidsai/cudf/pull/17319)) [@mroeschke](https://github.com/mroeschke)
+- Move cudf._lib.unary to cudf.core._internals ([#17318](https://github.com/rapidsai/cudf/pull/17318)) [@mroeschke](https://github.com/mroeschke)
+- prefer wheel-provided libcudf.so in load_library(), use RTLD_LOCAL ([#17316](https://github.com/rapidsai/cudf/pull/17316)) [@jameslamb](https://github.com/jameslamb)
+- Clean up misc, unneeded pylibcudf.libcudf in cudf._lib ([#17309](https://github.com/rapidsai/cudf/pull/17309)) [@mroeschke](https://github.com/mroeschke)
+- Exclude nanoarrow and flatbuffers from installation ([#17308](https://github.com/rapidsai/cudf/pull/17308)) [@vyasr](https://github.com/vyasr)
+- Update CI jobs to include Polars in nightlies and improve IWYU ([#17306](https://github.com/rapidsai/cudf/pull/17306)) [@vyasr](https://github.com/vyasr)
+- Move strings repeat benchmarks to nvbench ([#17304](https://github.com/rapidsai/cudf/pull/17304)) [@davidwendt](https://github.com/davidwendt)
+- Fix synchronization bug in bool parquet mukernels ([#17302](https://github.com/rapidsai/cudf/pull/17302)) [@pmattione-nvidia](https://github.com/pmattione-nvidia)
+- Move strings replace benchmarks to nvbench ([#17301](https://github.com/rapidsai/cudf/pull/17301)) [@davidwendt](https://github.com/davidwendt)
+- Support polars 1.13 ([#17299](https://github.com/rapidsai/cudf/pull/17299)) [@wence-](https://github.com/wence-)
+- Replace FindcuFile with upstream FindCUDAToolkit support ([#17298](https://github.com/rapidsai/cudf/pull/17298)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA)
+- Expose stream-ordering in public transpose API ([#17294](https://github.com/rapidsai/cudf/pull/17294)) [@shrshi](https://github.com/shrshi)
+- Replace workaround of JNI build with CUDF_KVIKIO_REMOTE_IO=OFF ([#17293](https://github.com/rapidsai/cudf/pull/17293)) [@pxLi](https://github.com/pxLi)
+- cmake option: `CUDF_KVIKIO_REMOTE_IO` ([#17291](https://github.com/rapidsai/cudf/pull/17291)) [@madsbk](https://github.com/madsbk)
+- Use more pylibcudf Python enums in cudf._lib ([#17288](https://github.com/rapidsai/cudf/pull/17288)) [@mroeschke](https://github.com/mroeschke)
+- Use pylibcudf enums in cudf Python quantile ([#17287](https://github.com/rapidsai/cudf/pull/17287)) [@mroeschke](https://github.com/mroeschke)
+- enforce wheel size limits, README formatting in CI ([#17284](https://github.com/rapidsai/cudf/pull/17284)) [@jameslamb](https://github.com/jameslamb)
+- Use numba-cuda<0.0.18 ([#17280](https://github.com/rapidsai/cudf/pull/17280)) [@gmarkall](https://github.com/gmarkall)
+- Add compute_column_expression to pylibcudf for transform.compute_column ([#17279](https://github.com/rapidsai/cudf/pull/17279)) [@mroeschke](https://github.com/mroeschke)
+- Optimize distinct inner join to use set `find` instead of `retrieve` ([#17278](https://github.com/rapidsai/cudf/pull/17278)) [@PointKernel](https://github.com/PointKernel)
+- remove WheelHelpers.cmake ([#17276](https://github.com/rapidsai/cudf/pull/17276)) [@jameslamb](https://github.com/jameslamb)
+- Plumb pylibcudf datetime APIs through cudf python ([#17275](https://github.com/rapidsai/cudf/pull/17275)) [@Matt711](https://github.com/Matt711)
+- Follow up making Python tests more deterministic ([#17272](https://github.com/rapidsai/cudf/pull/17272)) [@mroeschke](https://github.com/mroeschke)
+- Use pylibcudf.search APIs in cudf python ([#17271](https://github.com/rapidsai/cudf/pull/17271)) [@Matt711](https://github.com/Matt711)
+- Use `pylibcudf.strings.convert.convert_integers.is_integer` in cudf python ([#17270](https://github.com/rapidsai/cudf/pull/17270)) [@Matt711](https://github.com/Matt711)
+- Move strings filter benchmarks to nvbench ([#17269](https://github.com/rapidsai/cudf/pull/17269)) [@davidwendt](https://github.com/davidwendt)
+- Make constructor of DeviceMemoryBufferView public ([#17265](https://github.com/rapidsai/cudf/pull/17265)) [@liurenjie1024](https://github.com/liurenjie1024)
+- Put a ceiling on cuda-python ([#17264](https://github.com/rapidsai/cudf/pull/17264)) [@jameslamb](https://github.com/jameslamb)
+- Always prefer `device_read`s and `device_write`s when kvikIO is enabled ([#17260](https://github.com/rapidsai/cudf/pull/17260)) [@vuule](https://github.com/vuule)
+- Expose streams in public quantile APIs ([#17257](https://github.com/rapidsai/cudf/pull/17257)) [@shrshi](https://github.com/shrshi)
+- Add support for `pyarrow-18` ([#17256](https://github.com/rapidsai/cudf/pull/17256)) [@galipremsagar](https://github.com/galipremsagar)
+- Move strings/numeric convert benchmarks to nvbench ([#17255](https://github.com/rapidsai/cudf/pull/17255)) [@davidwendt](https://github.com/davidwendt)
+- Add new ``dask_cudf.read_parquet`` API ([#17250](https://github.com/rapidsai/cudf/pull/17250)) [@rjzamora](https://github.com/rjzamora)
+- Add read_parquet_metadata to pylibcudf ([#17245](https://github.com/rapidsai/cudf/pull/17245)) [@mroeschke](https://github.com/mroeschke)
+- Search for kvikio with lowercase ([#17243](https://github.com/rapidsai/cudf/pull/17243)) [@vyasr](https://github.com/vyasr)
+- KvikIO shared library ([#17239](https://github.com/rapidsai/cudf/pull/17239)) [@madsbk](https://github.com/madsbk)
+- Use more pylibcudf.io.types enums in cudf._libs ([#17237](https://github.com/rapidsai/cudf/pull/17237)) [@mroeschke](https://github.com/mroeschke)
+- Expose mixed and conditional joins in pylibcudf ([#17235](https://github.com/rapidsai/cudf/pull/17235)) [@wence-](https://github.com/wence-)
+- Add io.text APIs to pylibcudf ([#17232](https://github.com/rapidsai/cudf/pull/17232)) [@mroeschke](https://github.com/mroeschke)
+- Add `num_iterations` axis to the multi-threaded Parquet benchmarks ([#17231](https://github.com/rapidsai/cudf/pull/17231)) [@vuule](https://github.com/vuule)
+- Move strings to date/time types benchmarks to nvbench ([#17229](https://github.com/rapidsai/cudf/pull/17229)) [@davidwendt](https://github.com/davidwendt)
+- Support for polars 1.12 in cudf-polars ([#17227](https://github.com/rapidsai/cudf/pull/17227)) [@wence-](https://github.com/wence-)
+- Allow generating large strings in benchmarks ([#17224](https://github.com/rapidsai/cudf/pull/17224)) [@davidwendt](https://github.com/davidwendt)
+- Refactor gather/scatter benchmarks for strings ([#17223](https://github.com/rapidsai/cudf/pull/17223)) [@davidwendt](https://github.com/davidwendt)
+- Deprecate single component extraction methods in libcudf ([#17221](https://github.com/rapidsai/cudf/pull/17221)) [@Matt711](https://github.com/Matt711)
+- Remove `nvtext::load_vocabulary` from pylibcudf ([#17220](https://github.com/rapidsai/cudf/pull/17220)) [@Matt711](https://github.com/Matt711)
+- Benchmarking JSON reader for compressed inputs ([#17219](https://github.com/rapidsai/cudf/pull/17219)) [@shrshi](https://github.com/shrshi)
+- Expose stream-ordering in partitioning API ([#17213](https://github.com/rapidsai/cudf/pull/17213)) [@shrshi](https://github.com/shrshi)
+- Move strings::concatenate benchmark to nvbench ([#17211](https://github.com/rapidsai/cudf/pull/17211)) [@davidwendt](https://github.com/davidwendt)
+- Expose stream-ordering in subword tokenizer API ([#17206](https://github.com/rapidsai/cudf/pull/17206)) [@shrshi](https://github.com/shrshi)
+- Refactor Dask cuDF legacy code ([#17205](https://github.com/rapidsai/cudf/pull/17205)) [@rjzamora](https://github.com/rjzamora)
+- Make HostMemoryBuffer call into the DefaultHostMemoryAllocator ([#17204](https://github.com/rapidsai/cudf/pull/17204)) [@revans2](https://github.com/revans2)
+- Unified binary_ops and ast benchmarks parameter names ([#17200](https://github.com/rapidsai/cudf/pull/17200)) [@lamarrr](https://github.com/lamarrr)
+- Add in new java API for raw host memory allocation ([#17197](https://github.com/rapidsai/cudf/pull/17197)) [@revans2](https://github.com/revans2)
+- Remove java reservation ([#17189](https://github.com/rapidsai/cudf/pull/17189)) [@revans2](https://github.com/revans2)
+- Fixed unused attribute compilation error for GCC 13 ([#17188](https://github.com/rapidsai/cudf/pull/17188)) [@lamarrr](https://github.com/lamarrr)
+- Change default KvikIO parameters in cuDF: set the thread pool size to 4, and compatibility mode to ON ([#17185](https://github.com/rapidsai/cudf/pull/17185)) [@kingcrimsontianyu](https://github.com/kingcrimsontianyu)
+- Use make_device_uvector instead of cudaMemcpyAsync in inplace_bitmask_binop ([#17181](https://github.com/rapidsai/cudf/pull/17181)) [@davidwendt](https://github.com/davidwendt)
+- Make ai.rapids.cudf.HostMemoryBuffer#copyFromStream public. ([#17179](https://github.com/rapidsai/cudf/pull/17179)) [@liurenjie1024](https://github.com/liurenjie1024)
+- Separate evaluation logic from `IR` objects in cudf-polars ([#17175](https://github.com/rapidsai/cudf/pull/17175)) [@rjzamora](https://github.com/rjzamora)
+- Move nvtext ngrams benchmarks to nvbench ([#17173](https://github.com/rapidsai/cudf/pull/17173)) [@davidwendt](https://github.com/davidwendt)
+- Remove includes suggested by include-what-you-use ([#17170](https://github.com/rapidsai/cudf/pull/17170)) [@vyasr](https://github.com/vyasr)
+- Reading multi-source compressed JSONL files ([#17161](https://github.com/rapidsai/cudf/pull/17161)) [@shrshi](https://github.com/shrshi)
+- Process parquet bools with microkernels ([#17157](https://github.com/rapidsai/cudf/pull/17157)) [@pmattione-nvidia](https://github.com/pmattione-nvidia)
+- Upgrade to polars 1.11 in cudf-polars ([#17154](https://github.com/rapidsai/cudf/pull/17154)) [@wence-](https://github.com/wence-)
+- Deprecate current libcudf nvtext minhash functions ([#17152](https://github.com/rapidsai/cudf/pull/17152)) [@davidwendt](https://github.com/davidwendt)
+- Remove unused variable in internal merge_tdigests utility ([#17151](https://github.com/rapidsai/cudf/pull/17151)) [@davidwendt](https://github.com/davidwendt)
+- Use the full ref name of `rmm.DeviceBuffer` in the sphinx config file ([#17150](https://github.com/rapidsai/cudf/pull/17150)) [@Matt711](https://github.com/Matt711)
+- Move `segmented_gather` function from the copying module to the lists module ([#17148](https://github.com/rapidsai/cudf/pull/17148)) [@Matt711](https://github.com/Matt711)
+- Use async execution policy for true_if ([#17146](https://github.com/rapidsai/cudf/pull/17146)) [@PointKernel](https://github.com/PointKernel)
+- Add conversion from cudf-polars expressions to libcudf ast for parquet filters ([#17141](https://github.com/rapidsai/cudf/pull/17141)) [@wence-](https://github.com/wence-)
+- devcontainer: replace `VAULT_HOST` with `AWS_ROLE_ARN` ([#17134](https://github.com/rapidsai/cudf/pull/17134)) [@jjacobelli](https://github.com/jjacobelli)
+- Replace direct `cudaMemcpyAsync` calls with utility functions (limited to `cudf::io`) ([#17132](https://github.com/rapidsai/cudf/pull/17132)) [@vuule](https://github.com/vuule)
+- use rapids-generate-pip-constraints to pin to oldest dependencies in CI ([#17131](https://github.com/rapidsai/cudf/pull/17131)) [@jameslamb](https://github.com/jameslamb)
+- Set the default number of threads in KvikIO thread pool to 8 ([#17126](https://github.com/rapidsai/cudf/pull/17126)) [@kingcrimsontianyu](https://github.com/kingcrimsontianyu)
+- Fix clang-tidy violations for span.hpp and hostdevice_vector.hpp ([#17124](https://github.com/rapidsai/cudf/pull/17124)) [@davidwendt](https://github.com/davidwendt)
+- Disable the Parquet reader's wide lists tables GTest by default ([#17120](https://github.com/rapidsai/cudf/pull/17120)) [@mhaseeb123](https://github.com/mhaseeb123)
+- Add compile time check to ensure the `counting_iterator` type in `counting_transform_iterator` fits in `size_type` ([#17118](https://github.com/rapidsai/cudf/pull/17118)) [@mhaseeb123](https://github.com/mhaseeb123)
+- Minor I/O code quality improvements ([#17105](https://github.com/rapidsai/cudf/pull/17105)) [@kingcrimsontianyu](https://github.com/kingcrimsontianyu)
+- Remove the additional host register calls initially intended for performance improvement on Grace Hopper ([#17092](https://github.com/rapidsai/cudf/pull/17092)) [@kingcrimsontianyu](https://github.com/kingcrimsontianyu)
+- Split hash-based groupby into multiple smaller files to reduce build time ([#17089](https://github.com/rapidsai/cudf/pull/17089)) [@PointKernel](https://github.com/PointKernel)
+- build wheels
without build isolation ([#17088](https://github.com/rapidsai/cudf/pull/17088)) [@jameslamb](https://github.com/jameslamb) +- Polars: DataFrame Serialization ([#17062](https://github.com/rapidsai/cudf/pull/17062)) [@madsbk](https://github.com/madsbk) +- Remove unused hash helper functions ([#17056](https://github.com/rapidsai/cudf/pull/17056)) [@PointKernel](https://github.com/PointKernel) +- Add to_dlpack/from_dlpack APIs to pylibcudf ([#17055](https://github.com/rapidsai/cudf/pull/17055)) [@mroeschke](https://github.com/mroeschke) +- Move `flatten_single_pass_aggs` to its own TU ([#17053](https://github.com/rapidsai/cudf/pull/17053)) [@PointKernel](https://github.com/PointKernel) +- Replace deprecated cuco APIs with updated versions ([#17052](https://github.com/rapidsai/cudf/pull/17052)) [@PointKernel](https://github.com/PointKernel) +- Refactor ORC dictionary encoding to migrate to the new `cuco::static_map` ([#17049](https://github.com/rapidsai/cudf/pull/17049)) [@mhaseeb123](https://github.com/mhaseeb123) +- Move pylibcudf/libcudf/wrappers/decimals to pylibcudf/libcudf/fixed_point ([#17048](https://github.com/rapidsai/cudf/pull/17048)) [@mroeschke](https://github.com/mroeschke) +- make conda installs in CI stricter (part 2) ([#17042](https://github.com/rapidsai/cudf/pull/17042)) [@jameslamb](https://github.com/jameslamb) +- Use managed memory for NDSH benchmarks ([#17039](https://github.com/rapidsai/cudf/pull/17039)) [@karthikeyann](https://github.com/karthikeyann) +- Clean up hash-groupby `var_hash_functor` ([#17034](https://github.com/rapidsai/cudf/pull/17034)) [@PointKernel](https://github.com/PointKernel) +- Add json APIs to pylibcudf ([#17025](https://github.com/rapidsai/cudf/pull/17025)) [@mroeschke](https://github.com/mroeschke) +- Add string.replace_re APIs to pylibcudf ([#17023](https://github.com/rapidsai/cudf/pull/17023)) [@mroeschke](https://github.com/mroeschke) +- Replace old host tree algorithm with new algorithm in JSON reader 
([#17019](https://github.com/rapidsai/cudf/pull/17019)) [@karthikeyann](https://github.com/karthikeyann) +- Unify treatment of `Expr` and `IR` nodes in cudf-polars DSL ([#17016](https://github.com/rapidsai/cudf/pull/17016)) [@wence-](https://github.com/wence-) +- make conda installs in CI stricter ([#17013](https://github.com/rapidsai/cudf/pull/17013)) [@jameslamb](https://github.com/jameslamb) +- Pylibcudf: pack and unpack ([#17012](https://github.com/rapidsai/cudf/pull/17012)) [@madsbk](https://github.com/madsbk) +- Remove unneeded pylibcudf.libcudf.wrappers.duration usage in cudf ([#17010](https://github.com/rapidsai/cudf/pull/17010)) [@mroeschke](https://github.com/mroeschke) +- Add custom "fused" groupby aggregation to Dask cuDF ([#17009](https://github.com/rapidsai/cudf/pull/17009)) [@rjzamora](https://github.com/rjzamora) +- Make tests more deterministic ([#17008](https://github.com/rapidsai/cudf/pull/17008)) [@galipremsagar](https://github.com/galipremsagar) +- Remove unused import ([#17005](https://github.com/rapidsai/cudf/pull/17005)) [@Matt711](https://github.com/Matt711) +- Add string.convert.convert_urls APIs to pylibcudf ([#17003](https://github.com/rapidsai/cudf/pull/17003)) [@mroeschke](https://github.com/mroeschke) +- Add release tracking to project automation scripts ([#17001](https://github.com/rapidsai/cudf/pull/17001)) [@jarmak-nv](https://github.com/jarmak-nv) +- Implement inequality joins by translation to conditional joins ([#17000](https://github.com/rapidsai/cudf/pull/17000)) [@wence-](https://github.com/wence-) +- Add string.convert.convert_lists APIs to pylibcudf ([#16997](https://github.com/rapidsai/cudf/pull/16997)) [@mroeschke](https://github.com/mroeschke) +- Performance optimization of JSON validation ([#16996](https://github.com/rapidsai/cudf/pull/16996)) [@karthikeyann](https://github.com/karthikeyann) +- Add string.convert.convert_ipv4 APIs to pylibcudf ([#16994](https://github.com/rapidsai/cudf/pull/16994)) 
[@mroeschke](https://github.com/mroeschke) +- Add string.convert.convert_integers APIs to pylibcudf ([#16991](https://github.com/rapidsai/cudf/pull/16991)) [@mroeschke](https://github.com/mroeschke) +- Add string.convert_floats APIs to pylibcudf ([#16990](https://github.com/rapidsai/cudf/pull/16990)) [@mroeschke](https://github.com/mroeschke) +- Add string.convert.convert_fixed_type APIs to pylibcudf ([#16984](https://github.com/rapidsai/cudf/pull/16984)) [@mroeschke](https://github.com/mroeschke) +- Remove unnecessary `std::move`'s in pylibcudf ([#16983](https://github.com/rapidsai/cudf/pull/16983)) [@Matt711](https://github.com/Matt711) +- Add docstrings and test for strings.convert_durations APIs for pylibcudf ([#16982](https://github.com/rapidsai/cudf/pull/16982)) [@mroeschke](https://github.com/mroeschke) +- JSON tokenizer memory optimizations ([#16978](https://github.com/rapidsai/cudf/pull/16978)) [@shrshi](https://github.com/shrshi) +- Turn on `xfail_strict = true` for all python packages ([#16977](https://github.com/rapidsai/cudf/pull/16977)) [@wence-](https://github.com/wence-) +- Add string.convert.convert_datetime/convert_booleans APIs to pylibcudf ([#16971](https://github.com/rapidsai/cudf/pull/16971)) [@mroeschke](https://github.com/mroeschke) +- Auto assign PR to author ([#16969](https://github.com/rapidsai/cudf/pull/16969)) [@Matt711](https://github.com/Matt711) +- Deprecate support for directly accessing logger ([#16964](https://github.com/rapidsai/cudf/pull/16964)) [@vyasr](https://github.com/vyasr) +- Expunge NamedColumn ([#16962](https://github.com/rapidsai/cudf/pull/16962)) [@wence-](https://github.com/wence-) +- Add clang-tidy to CI ([#16958](https://github.com/rapidsai/cudf/pull/16958)) [@vyasr](https://github.com/vyasr) +- Address all remaining clang-tidy errors ([#16956](https://github.com/rapidsai/cudf/pull/16956)) [@vyasr](https://github.com/vyasr) +- Apply clang-tidy autofixes ([#16949](https://github.com/rapidsai/cudf/pull/16949)) 
[@vyasr](https://github.com/vyasr) +- Use nvcomp wheel instead of bundling nvcomp ([#16946](https://github.com/rapidsai/cudf/pull/16946)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA) +- Refactor the `cuda_memcpy` functions to make them more usable ([#16945](https://github.com/rapidsai/cudf/pull/16945)) [@vuule](https://github.com/vuule) +- Add string.split APIs to pylibcudf ([#16940](https://github.com/rapidsai/cudf/pull/16940)) [@mroeschke](https://github.com/mroeschke) +- clang-tidy fixes part 3 ([#16939](https://github.com/rapidsai/cudf/pull/16939)) [@vyasr](https://github.com/vyasr) +- clang-tidy fixes part 2 ([#16938](https://github.com/rapidsai/cudf/pull/16938)) [@vyasr](https://github.com/vyasr) +- clang-tidy fixes part 1 ([#16937](https://github.com/rapidsai/cudf/pull/16937)) [@vyasr](https://github.com/vyasr) +- Add string.wrap APIs to pylibcudf ([#16935](https://github.com/rapidsai/cudf/pull/16935)) [@mroeschke](https://github.com/mroeschke) +- Add string.translate APIs to pylibcudf ([#16934](https://github.com/rapidsai/cudf/pull/16934)) [@mroeschke](https://github.com/mroeschke) +- Add string.find_multiple APIs to pylibcudf ([#16920](https://github.com/rapidsai/cudf/pull/16920)) [@mroeschke](https://github.com/mroeschke) +- Batch memcpy the last offsets for output buffers of str and list cols in PQ reader ([#16905](https://github.com/rapidsai/cudf/pull/16905)) [@mhaseeb123](https://github.com/mhaseeb123) +- reduce wheel build verbosity, narrow deprecation warning filter ([#16896](https://github.com/rapidsai/cudf/pull/16896)) [@jameslamb](https://github.com/jameslamb) +- Improve aggregation device functors ([#16884](https://github.com/rapidsai/cudf/pull/16884)) [@PointKernel](https://github.com/PointKernel) +- Upgrade pandas pinnings to support `2.2.3` ([#16882](https://github.com/rapidsai/cudf/pull/16882)) [@galipremsagar](https://github.com/galipremsagar) +- Fix 24.10 to 24.12 forward merge ([#16876](https://github.com/rapidsai/cudf/pull/16876)) 
[@bdice](https://github.com/bdice) +- Manually resolve conflicts in between branch-24.12 and branch-24.10 ([#16871](https://github.com/rapidsai/cudf/pull/16871)) [@galipremsagar](https://github.com/galipremsagar) +- Add in support for setting delim when parsing JSON through java ([#16867](https://github.com/rapidsai/cudf/pull/16867)) [@revans2](https://github.com/revans2) +- Reapply `mixed_semi_join` refactoring and bug fixes ([#16859](https://github.com/rapidsai/cudf/pull/16859)) [@mhaseeb123](https://github.com/mhaseeb123) +- Add string padding and side_type APIs to pylibcudf ([#16833](https://github.com/rapidsai/cudf/pull/16833)) [@mroeschke](https://github.com/mroeschke) +- Organize parquet reader mukernel non-nullable code, introduce manual block scans ([#16830](https://github.com/rapidsai/cudf/pull/16830)) [@pmattione-nvidia](https://github.com/pmattione-nvidia) +- Remove superfluous use of std::vector for std::future ([#16829](https://github.com/rapidsai/cudf/pull/16829)) [@kingcrimsontianyu](https://github.com/kingcrimsontianyu) +- Rework `read_csv` IO to avoid reading whole input with a single `host_read` ([#16826](https://github.com/rapidsai/cudf/pull/16826)) [@vuule](https://github.com/vuule) +- Add strings.combine APIs to pylibcudf ([#16790](https://github.com/rapidsai/cudf/pull/16790)) [@mroeschke](https://github.com/mroeschke) +- Add remaining string.char_types APIs to pylibcudf ([#16788](https://github.com/rapidsai/cudf/pull/16788)) [@mroeschke](https://github.com/mroeschke) +- Add new nvtext minhash_permuted API ([#16756](https://github.com/rapidsai/cudf/pull/16756)) [@davidwendt](https://github.com/davidwendt) +- Avoid public constructors when called with columns to avoid unnecessary validation ([#16747](https://github.com/rapidsai/cudf/pull/16747)) [@mroeschke](https://github.com/mroeschke) +- Use `changed-files` shared workflow ([#16713](https://github.com/rapidsai/cudf/pull/16713)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA) +- lint: 
replace `isort` with Ruff's rule I ([#16685](https://github.com/rapidsai/cudf/pull/16685)) [@Borda](https://github.com/Borda) +- Improve the performance of low cardinality groupby ([#16619](https://github.com/rapidsai/cudf/pull/16619)) [@PointKernel](https://github.com/PointKernel) +- Parquet reader list microkernel ([#16538](https://github.com/rapidsai/cudf/pull/16538)) [@pmattione-nvidia](https://github.com/pmattione-nvidia) +- AWS S3 IO through KvikIO ([#16499](https://github.com/rapidsai/cudf/pull/16499)) [@madsbk](https://github.com/madsbk) +- Refactor `histogram` reduction using `cuco::static_set::insert_and_find` ([#16485](https://github.com/rapidsai/cudf/pull/16485)) [@srinivasyadav18](https://github.com/srinivasyadav18) +- Use numba-cuda>=0.0.13 ([#16474](https://github.com/rapidsai/cudf/pull/16474)) [@gmarkall](https://github.com/gmarkall) + # cudf 24.10.00 (9 Oct 2024) ## 🚨 Breaking Changes