-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move CompressionZstrong.cpp into fb directory #36
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
fbshipit-source-id: 7d9c3fbad922ea80c21c9f2982049c23f5f1a470
Summary: Pull Request resolved: facebookincubator/nimble#1 Reviewed By: xiaoxmeng Differential Revision: D52203900 fbshipit-source-id: 5263b9754f80d4f06aa68c15a8a5b182729aef17
…oken Summary: We have found in the S385274 that there is some con-currency issue when multiple thread try to access or update the Storage access token (BAT/SAT). This change currently introduce a lock mechanism to address this issue. Next steps: We have also seen that currently we call WS token service multiple time and override the token instead we can add a way to ensure that token is only fetched once. But this might require some changes in the way we use koski. As it might add an additional step to resolve the token once. Reviewed By: Magoja Differential Revision: D52144115 fbshipit-source-id: e49331ab51d8544969a613f8beb7405e42da0e79
Summary: Pull Request resolved: facebookincubator/nimble#2 Split the process wide memory manager creation and get. Deprecate the default memory manager and memory pool APIs. X-link: facebookincubator/velox#7168 Reviewed By: mbasmanova Differential Revision: D50513171 Pulled By: xiaoxmeng fbshipit-source-id: 0f147c1e3c971ea4f39271fe5609a2224971509d
Summary: With this change, it is now possible to use captured encoding layout file when using Alpha transform. Reviewed By: Sullivan-Patrick Differential Revision: D50957360 fbshipit-source-id: c4e82a23bad81231e426ef4a026ead318eb648d0
Summary: With this change, we now use lazy stats instead of eager stats. This affects both "ad-hoc" encoding selection, and "replay" encoding selection. I measured the performance of this change for "ad-hoc" encoding selection, and it seems negligible. Transforming a file with eager stats took 1m28s on average and the same was observed for lazy. Performance of captured encodings transform shows a big perf improvement, at 1m08s on average (~20% improvement). Reviewed By: Sullivan-Patrick Differential Revision: D50943760 fbshipit-source-id: d602480d2df125724dfef9283146e650f2d00da8
Summary: Adding the ability to dump a captured encoding layout file to Alpha Dump. Output looks as follows: ``` Node Id Parent Id Node Type Node Name Encoding Layout 0 0 Row Trivial{Uncompressed} 1 0 Scalar Trivial{Zstrong} 2 0 Scalar RLE{Uncompressed}[RunLengths:FixedBitWidth{Zstrong},RunValues:Trivial{Zstrong}] 3 0 Scalar Trivial{Zstrong}[Lengths:Constant{Uncompressed}] 4 0 Scalar Trivial{Zstrong}[Lengths:Constant{Uncompressed}] 5 0 Scalar RLE{Uncompressed}[RunLengths:Varint{Uncompressed},RunValues:FixedBitWidth{Uncompressed}] 6 0 Scalar FixedBitWidth{Zstrong} 7 0 Scalar Constant{Uncompressed} 8 0 Scalar RLE{Uncompressed}[RunLengths:FixedBitWidth{Zstrong},RunValues:Trivial{Zstrong}[Lengths:FixedBitWidth{Zstrong}]] 9 0 Scalar FixedBitWidth{Zstrong} 10 0 Map FixedBitWidth{Zstrong} 11 10 Scalar Trivial{Zstrong} 12 10 Scalar Varint{Uncompressed} 13 0 FlatMap Constant{Uncompressed} 14 13 Scalar 1154997419 Trivial{Zstrong} 15 13 Scalar 1451287068 Trivial{Zstrong} 16 13 Scalar 1135016111 Trivial{Zstrong} 17 13 Scalar 1354897182 RLE{Uncompressed}[RunLengths:FixedBitWidth{Zstrong},RunValues:Dictionary{Uncompressed}[Alphabet:Trivial{Uncompressed},Indices:FixedBitWidth{Zstrong}]] 18 13 Scalar 1138012430 Trivial{Zstrong} 19 13 Scalar 2072355203 Trivial{Zstrong} 20 13 Scalar 1467446921 Trivial{Zstrong} 21 13 Scalar 1097049326 Trivial{Zstrong} 22 13 Scalar 1227099726 Trivial{Zstrong} 23 13 Scalar 1944408639 RLE{Uncompressed}[RunLengths:FixedBitWidth{Zstrong},RunValues:Trivial{Zstrong}] 24 13 Scalar 110159 Trivial{Zstrong} 25 13 Scalar 1410611079 RLE{Uncompressed}[RunLengths:FixedBitWidth{Zstrong},RunValues:Trivial{Zstrong}] ... ``` Reviewed By: Sullivan-Patrick Differential Revision: D50957981 fbshipit-source-id: 2f4cf4b44acb24c064612dcec38cf1ba793b2a09
Summary: Allowing training of streams in parallel. Reduced the time of training from minutes to seconds (with current simple training). With this, we can now implement more demanding trainings (like brute forcing). Reviewed By: HuamengJiang Differential Revision: D51076951 fbshipit-source-id: 96acdc932116b73b7960db72790ec8fad44c7f18
The internal and external repositories are out of sync. This Pull Request attempts to brings them back in sync by patching the GitHub repository. Please carefully review this patch. You must disable ShipIt for your project in order to merge this pull request. DO NOT IMPORT this pull request. Instead, merge it directly on GitHub using the MERGE BUTTON. Re-enable ShipIt after merging.
Summary: Pull Request resolved: facebookincubator/nimble#4 Reviewed By: HuamengJiang Differential Revision: D52280685 Pulled By: pedroerp fbshipit-source-id: 46e94f7669b8d629861693985e437b64a8d4ed24
Summary: DWIO contbuild is broken, because the DPP Reader tool (in the Alpha tools folder) can no longer be built in opt mode... Until we have a fix for that, I'm moving this tool outside of the dwio folder. I'm using this opportunity to move few other tools as well. These tools have "external" (heavy) dependencies, like Koski, which make building the alpha directory slower. Reviewed By: HuamengJiang Differential Revision: D52304448 fbshipit-source-id: 85a4c86a2e36dff49b8f21a3eace47ee7d34c129
Summary: Minor cleanups in Alpha: 1. Running glean to remove unused headers 3. Fixing lint warnings Reviewed By: HuamengJiang Differential Revision: D52304447 fbshipit-source-id: 34c130688453386f560b1d291eed1dc5cc826c3c
Summary: LLVM-15 has a warning `-Wunused-but-set-variable` which we treat as an error because it's so often diagnostic of a code issue. Unused variables can compromise readability or, worse, performance. This diff either (a) removes an unused variable and, possibly, it's associated code, or (b) qualifies the variable with `[[maybe_unused]]`, mostly in cases where the variable _is_ used, but, eg, in an `assert` statement that isn't present in production code. - If you approve of this diff, please use the "Accept & Ship" button :-) Reviewed By: dmm-fb Differential Revision: D52356232 fbshipit-source-id: c47aba63a8b4d2a728de7a03e91f542cfcb4182d
Reviewed By: fadimounir Differential Revision: D52322091 fbshipit-source-id: db087755928bed065b9f336f255c3c5aa44ae64e
Summary: Unify Alpha and DWRF feature flattening utils. Reviewed By: marxhxxx Differential Revision: D52425910 fbshipit-source-id: 2b0e4bd0ebbb7caf5ad8290e7dc9b8b1a7b24c37
Summary: Add fbpkg definition such that we can use data copy in chronos jobs. Reviewed By: marxhxxx Differential Revision: D52509300 fbshipit-source-id: a79c555a432681f2782850307b3aa1d7cfa9a2c4
Summary: As title, adding a cache here that keeps track of the cache value and size. Reviewed By: kparichay, HuamengJiang Differential Revision: D50424176 fbshipit-source-id: 7645ebb67d00ec2ec36bdc7b600bedebd9e34292
Summary: Adding unit tests for caching behavior. Tests two write cases - writing one vector at once vs. writing two vectors, ensures that the two readers provide the same output result w.r.t. correctness + compression. Reviewed By: HuamengJiang Differential Revision: D51322567 fbshipit-source-id: ff75dfbb70e35dd9a9522604c2f9b22e3feafaa6
Summary: These variable names are needlessy difficult to understand imo (`offsets` and `offsets_`, `childRanges` intertwined with `filteredRanges`, `casted` for a casted `arrayVector`) Changing them to better represent what they actually are Reviewed By: kparichay Differential Revision: D50847850 fbshipit-source-id: f00fe5a6a905776ed7c94ad031f6f7e2f00d9871
Summary: algorithm header is where std::reverse is defined (which is used below in this file) Reviewed By: tanjialiang Differential Revision: D52709839 fbshipit-source-id: 274cab8085f17db32fd93162e4707c09d13215ff
Summary: Adding a first sketch for an open source cmake-based build system for Alpha. For now it only compiles alpha/common, and requires a few internal dependencies to be commented out first. In the next PRs I'll clean them up. Pull Request resolved: facebookincubator/nimble#5 Reviewed By: zzhao0 Differential Revision: D52717233 Pulled By: pedroerp fbshipit-source-id: a3c5dcef235dbcf6107002e24e3960457e00175a
Summary: Remove usages of checked_memcpy() to remove dependency on internal library. Reviewed By: zzhao0 Differential Revision: D52741439 fbshipit-source-id: 951cb24a07d8fa66a3b113218c07b694af2a75dc
Summary: Three fixes: * Setting default dependency source as AUTO (tries to find locally, bundles if not found). * Adds flag to ignore existing folly compiler warning. * Removes StopWatch.cpp and its tests from CMake, since it relies on an internal library and is not used anywhere in Alpha today. With this change, CMake builds successfully (though they only compile alpha/common today). Pull Request resolved: facebookincubator/nimble#6 Reviewed By: zzhao0 Differential Revision: D52808613 Pulled By: pedroerp fbshipit-source-id: 8c61b6eceba1360778d539477d67c721295e08dd
Summary: Remove unused include and adding an explicit include and dependency since the file uses a symbol from velox/common/file (InMemoryWriteFile). Reviewed By: HuamengJiang Differential Revision: D52823742 fbshipit-source-id: ac218a444d9128f3d6f182ae84cc12a866a135d0
Summary: Adding a compile-time dependency on flatbuffers for now, but this can be removed in the future if we decide to check in the generated files. Adding third-party cmake toolchain for flatbuffer support. Pull Request resolved: facebookincubator/nimble#7 Reviewed By: zzhao0 Differential Revision: D52823929 Pulled By: pedroerp fbshipit-source-id: 2589468a26443e26d0c2812178c65cb5240bc44c
Summary: The order specified in designator intialization should follow the declaration order. Some compilers refuse to compile otherwise. Reviewed By: HuamengJiang Differential Revision: D52823278 fbshipit-source-id: 5feb51b4b3901dacfe6cfc2d75bbcd94f8527b07
Summary: Remove unused thrift files Reviewed By: helfman Differential Revision: D52989670 fbshipit-source-id: ce4cf72d4663a9a1b87978ce4672c9a1344f7cbb
…cy.h Summary: LLVM-15 has a warning `-Wunused-but-set-variable` which we treat as an error because it's so often diagnostic of a code issue. Unused variables can compromise readability or, worse, performance. This diff either (a) removes an unused variable and, possibly, it's associated code, or (b) qualifies the variable with `[[maybe_unused]]`, mostly in cases where the variable _is_ used, but, eg, in an `assert` statement that isn't present in production code. - If you approve of this diff, please use the "Accept & Ship" button :-) Reviewed By: bunnypak, dmm-fb Differential Revision: D53011692 fbshipit-source-id: 861ca264edc6adc812c595b3a11607542a0826eb
Summary: Pull Request resolved: facebookincubator/nimble#8 Move compression/uncompression code to a different file to be able to selectively add them based on compilation flags. This is required to skip zstrong in external builds. Reviewed By: zzhao0 Differential Revision: D52897034 fbshipit-source-id: 3d9180890d957c9f66f07acaa478e2a2465e75a1
Summary: Add missing overrides Reviewed By: zzhao0 Differential Revision: D53030470 fbshipit-source-id: 8c6291a5630b648ae002a1716fc2fabe62d23f3e
Summary: Future changes to FieldWriter (removing chunking from FieldWriter) will change the way parallelism is applied during writes. To simplify following diffs, we revert the current implementation, and we will reimplement it after the field writer is updated. Reviewed By: HuamengJiang Differential Revision: D54076455 fbshipit-source-id: cb8d62e5d5585c1690af837493777c317f4ad7a8
Summary: Until now, we had one "main" stream per schema node (*). The (*) means that we actually needed 2 main streams for ArrayWithOffset, but we worked around it, by modeling the second stream as a child schema node (which is incorrect). With this change, we now allow having multiple streams within each schema node. So schema nodes now truly represent the schema tree, and each node type, describes its own stream, without affecting the schema tree. The assumption of one main stream was used throughout the code, so I had to find all this locations and fix them. Hopefully, I found them all (all tests pass...). I'm splitting this change to two. One for writer and one for reader. I'm doing so trying to make these changes smaller (although, it is still big). Because of this split, I also separated the TypeBuilder from Type (e.g. it no longer inherits from Type). I might bring it back at some point, although, currently, I think I like it... NOTE: As part of this change, the layout of an encoding tree (e.g. encoding db) has changed. I didn't implement a backward compatible deserializer, as we still don't use this thing in production. Once we land this, we should regenerate captured encodings, if we have any stored. (Code should not fail if an old version is stored. It will just ignore it.) Reviewed By: HuamengJiang Differential Revision: D54076457 fbshipit-source-id: cf29a372a6771b91e5a23022648751d2b6dc2ae8
Summary: Second part of the change to support multiple streams per schema node. See description of parent diff. Reviewed By: HuamengJiang Differential Revision: D54076456 fbshipit-source-id: 28692a2d2c42b76d996a47ab463e08aed93d7a1a
Summary: Future diff will introduce the need to be able to materialize empty encodings. With current use of the C memcpy, these cases fail in opt builds, as we call memcpy with nullptr and count of zero. We can add if statements all over to prevent calling memcpy on empty encodings, but instead of adding more branching, I opted for removing branching, by using the std::memcpy, which handles the zero size memcpy correctly. I also used this opportunity to change some places to use std::copy, as in some cases it might have some vectorized improvements internally. Reviewed By: HuamengJiang Differential Revision: D55383916 fbshipit-source-id: 3ad21688dbf1f609d57963f66bf23c736a6f4ba2
Summary: Currently, stream chunking is implemented in the FieldWriter. However, field writer shouldn't know anything about how data is layed out in the file. This is the responsibility of the VeloxWriter layer. This change creates a clear separation between how FieldWriter is accumulating data and how VeloxWriter is later encoding/chunking the data. To do so, we introduce the concept of StreamData. FieldWriter is accumulating raw data in StreamData objects, and VeloxWriter is using these objects to decide when and how to flush. As part of this change, I had to also fix chunking for string types. Previously it was easy to disable string chunking, as FieldWriter had the context to identify string sata easily. With the new implementation, it is not that easy, so I opted for fixing string chunking. With this change I also introduced min size (configurable) for chunking, which prevents creating chunks that are too small. Reviewed By: HuamengJiang Differential Revision: D54879575 fbshipit-source-id: b8fa6bca229cb13b886cfa925eb5bbddecb600a8
Summary: Enabling optional parallel encoding during writes. NOTE: Currently, we added a mutex to the Buffer class, which will affect all callsites. In a future diff, we will change encodings to accept two buffers, one for internal temporary allocations, and another for the final encoded result. The internal buffer will not be thread safe, whereas the exteranl one will be, thus reducing the mutex contention. Reviewed By: HuamengJiang Differential Revision: D55383917 fbshipit-source-id: 8de10acf5e13118fb44907ae206aedd8950de4e1
…nfusion Summary: X-link: facebookincubator/velox#9244 A `TypeWithId` must be a proper tree without diamonds because each node references its parent. The constructor is reparenting the children, so taking shared ownership on children is misleading, especially when they are `const` qualified. With this change, the constructor will take `unique_ptr` of children so they will not be shared between multiple parents. Once the tree is constructed, the individual nodes is immutable and can be shared, so they are stored as `shared_ptr<const TypeWithId>` internally. bypass-github-export-checks Reviewed By: oerling Differential Revision: D55318609 fbshipit-source-id: 12098e76110e59d364fe7eaf8448cae46103b7ed
Summary: Original commit changeset: 8de10acf5e13 Original Phabricator Diff: D55383917 Reviewed By: prasoontelang Differential Revision: D55647585 fbshipit-source-id: 2eda76bf58d9795c8f77579eeb48257939f1431c
Summary: Original commit changeset: b8fa6bca229c Original Phabricator Diff: D54879575 Reviewed By: prasoontelang Differential Revision: D55648607 fbshipit-source-id: 1f4d657a1c5e12871441760ff62ce0af0845019d
Summary: Original commit changeset: 3ad21688dbf1 Original Phabricator Diff: D55383916 Reviewed By: prasoontelang Differential Revision: D55648608 fbshipit-source-id: 596133c36ac7c1c313970da322d635f6d2b13d57
Summary: MemoryLeakCheck checks for total memory used which can vary based on seed (coming from fuzzer and repeated copies) This diff relaxes the memory used check Created from CodeHub with https://fburl.com/edit-in-codehub Reviewed By: DanielMunozT Differential Revision: D55871379 fbshipit-source-id: 5cd9b783a6a753d5552244dd9f67c2bad2928717
Summary: Previously this code conformed from clang-format 12. Reviewed By: igorsugak Differential Revision: D56065247 fbshipit-source-id: f5a985dd8f8b84f2f9e1818b3719b43c5a1b05b3
Summary: Redoing diff D55383916 which was backed out at D55648608 With fix for the issue introduced (incorrect handling of multi-buffer IOBufs). Reviewed By: HuamengJiang Differential Revision: D55822996 fbshipit-source-id: ea1910149688ac98482186d8db58ee84ecd8900f
Summary: This was originally introduced in D54879575 and reverted in D55648607 (because a base diff D55648608 was failing) Reintorducing this diff, rebased on the fixed version. Reviewed By: HuamengJiang Differential Revision: D55822998 fbshipit-source-id: e034b53abba40f4ca750d48303c41e70ac4cffa7
Summary: This was originally introduced in D55383917 and reverted in D55647585 (because a base diff D55648608 was failing) Reintorducing this diff, rebased on the fixed version. Reviewed By: HuamengJiang Differential Revision: D55822997 fbshipit-source-id: 7d7fc6e2ccf5a4333a719354a52098fd2dfa1815
Summary: Recent change broke memory accounting for strings, which led to writing huge stripes if data had huge string columns. Reviewed By: zzhao0, HuamengJiang Differential Revision: D56199608 fbshipit-source-id: b95cbefd97419dd9b21bae8dc73c5f9e7b049ceb
Summary: Explicit test for verifying correct memory accounting of string streams. Reviewed By: zzhao0 Differential Revision: D56204244 fbshipit-source-id: 316b8e6905852497a58481a018c15fbad5e3d0fb
Summary: TSIA. I explicitly not rename the following: * XldbAlphaLogger * Input/Output format strings and Serde * Serde config keys * JNI/Jave paths Reviewed By: sdruzkin, pedroerp Differential Revision: D56225190 fbshipit-source-id: 2da7e826fbbb50f3a7c212d20ac92d96cfb05b6a
The internal and external repositories are out of sync. This Pull Request attempts to brings them back in sync by patching the GitHub repository. Please carefully review this patch. You must disable ShipIt for your project in order to merge this pull request. DO NOT IMPORT this pull request. Instead, merge it directly on GitHub using the MERGE BUTTON. Re-enable ShipIt after merging. Co-authored-by: Facebook Community Bot <[email protected]>
Summary: Our reader lifecycle tests verify that internal velox vectors and buffers are being reused as much as possible (as long as the caller is not holding a reference to these buffers). In these tests we verify that the pointer before and after calling next() is the same. However, for null buffers, they might still change, depending if the data contains nulls, vs not containing any nulls (in this case, we free the nulls vector). Since we randomly fuzz input buffers, we sometimes get parts of the vector containing no nulls and sometimes containing nulls, leading to a nulls buffer release/allocation, and this is failing the test. I changed the tests to only verify null pointers, if both sides are not null. Reviewed By: sdruzkin Differential Revision: D56361752 fbshipit-source-id: f6bc6b473c289bdd6ad88cace5c221fa6bf7886d
Summary: Pull Request resolved: facebookincubator/nimble#29 Reviewed By: zzhao0 Differential Revision: D56377002 Pulled By: pedroerp fbshipit-source-id: c4eb5ff7a382f81bbef63e9407e82f6544ecd67d
Summary: Pull Request resolved: facebookincubator/nimble#31 Reviewed By: zzhao0 Differential Revision: D56381223 Pulled By: pedroerp fbshipit-source-id: 3f9c90d47e939324e907c7aec480d54fb56376fb
Summary: A few small fixes two unblock open source compilation. Reviewed By: zzhao0 Differential Revision: D56381504 fbshipit-source-id: 99ca8dafb38b61c1a1aa7d0a073cc4ee4ae75d4d
Summary: Adding missing files, fixing last compilation nit exposed by clang 16, and disabling compilation of test which currently brings too many velox dependencies. Pull Request resolved: facebookincubator/nimble#32 Reviewed By: helfman Differential Revision: D56453875 Pulled By: pedroerp fbshipit-source-id: 88cf902a897cf35cb0c48f589e54c08328c8e799
Summary: Pull Request resolved: facebookincubator/nimble#34 Reviewed By: zzhao0 Differential Revision: D56494415 Pulled By: pedroerp fbshipit-source-id: d8a6daa5649493df62c54f359b285e83c27fda6f
…nstead Summary: X-link: facebookincubator/velox#9566 As said Reviewed By: kgpai, Magoja Differential Revision: D56430332 fbshipit-source-id: 6c6698bf82c9075930630974f48d911a5b62ea8a
Summary: Moving it to a fb/ directory prevents shipIt from exposing it to open source. Differential Revision: D56523416
facebook-github-bot
added
the
CLA Signed
This label is managed by the Meta Open Source bot.
label
Apr 24, 2024
facebook-github-bot
force-pushed
the
main
branch
from
April 25, 2024 18:34
57e9c86
to
394d51c
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary:
Moving it to a fb/ directory prevents shipIt from exposing it to open
source.
Differential Revision: D56523416