Adding optional arg to BF.INSERT to allow users to check if their bloom filter can reach the desired size #41

Open
wants to merge 2 commits into
base: unstable

zackcam
Contributor

@zackcam zackcam commented Jan 17, 2025

Overview

This PR proposes either adding an optional argument to the BF.INSERT command or creating a new command, so that users can find out whether the filter they create will be able to scale to the size they want. For option 1, we would add an optional argument called “ATLEASTCAPACITY” and validate whether it is possible to reach capacity X given the bloom object memory usage limit (128MB by default), the fp rate, the tightening ratio, the expansion, and the scale-outs. For option 2, we would add a new command that outputs the capacity that can be reached before scaling out throws an error, given the bloom object memory usage limit (128MB by default), the fp rate, the tightening ratio, and the expansion.

Option 1: Amend the BF.INSERT command

Add ATLEASTCAPACITY as an optional argument of BF.INSERT. This argument lets the user specify the capacity they expect their filter to scale up to. If that capacity isn't reachable given the specifications provided, we throw an error. The new BF.INSERT command will look like:

BF.INSERT <key> [CAPACITY capacity] [ERROR fp_error] [EXPANSION expansion] [NOCREATE] [NONSCALING] [ATLEASTCAPACITY wantedcapacity] [ITEMS item]
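
One way to picture the ATLEASTCAPACITY check is as a simulation of successive scale-outs. The Rust sketch below is illustrative only and is not the code in this PR: the function and enum names are hypothetical, the per-filter size uses the textbook Bloom filter formula m = -n ln(p) / (ln 2)^2 rather than the module's actual sizing, and it assumes each scale-out multiplies the capacity by the expansion and the fp rate by the tightening ratio.

```rust
/// Hypothetical reasons a wanted capacity cannot be reached.
#[derive(Debug, PartialEq)]
pub enum WantedCapacityError {
    /// Total size would pass the bloom object memory limit.
    ExceedsMemoryLimit,
    /// The tightened fp rate underflowed before the capacity was reached.
    FalsePositiveDegraded,
}

/// Approximate bytes for one filter holding `capacity` items at rate `fp`
/// (textbook formula; an assumption, not the module's real sizing).
fn filter_bytes(capacity: u64, fp: f64) -> u64 {
    let bits = -(capacity as f64) * fp.ln() / std::f64::consts::LN_2.powi(2);
    (bits / 8.0).ceil() as u64
}

/// Simulates scale-outs until `wanted` items fit or a limit is hit.
/// Assumes capacity >= 1, expansion >= 1, 0 < fp < 1, 0 < tightening <= 1.
pub fn validate_wanted_capacity(
    capacity: u64,
    fp: f64,
    tightening: f64,
    expansion: u64,
    memory_limit_bytes: u64,
    wanted: u64,
) -> Result<(), WantedCapacityError> {
    assert!(capacity >= 1 && expansion >= 1);
    assert!(fp > 0.0 && fp < 1.0 && tightening > 0.0 && tightening <= 1.0);
    let (mut cap, mut rate) = (capacity, fp);
    let (mut total_cap, mut total_bytes) = (0u64, 0u64);
    loop {
        // Stop if the fp rate can no longer be represented as a normal f64.
        if rate < f64::MIN_POSITIVE {
            return Err(WantedCapacityError::FalsePositiveDegraded);
        }
        // Add the next sub-filter and check the memory budget.
        total_bytes = total_bytes.saturating_add(filter_bytes(cap, rate));
        if total_bytes > memory_limit_bytes {
            return Err(WantedCapacityError::ExceedsMemoryLimit);
        }
        total_cap = total_cap.saturating_add(cap);
        if total_cap >= wanted {
            return Ok(());
        }
        // Next scale-out: larger capacity, tighter fp rate.
        cap = cap.saturating_mul(expansion);
        rate *= tightening;
    }
}
```

Under these assumptions, a filter starting at capacity 100 with fp rate 0.01, tightening 0.5, and expansion 2 can reach 1,000 items well within a 128MB limit, while a wanted capacity of 1,000,000,000 is rejected for exceeding the memory limit.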

Other name proposals for ATLEASTCAPACITY name:

  • MINPOTENTIALCAPACITY
  • PLANNEDSIZE
  • VALIDATECAPACITY
  • GROWTHCAPACITY
  • TARGETCAPACITY

There are three new error messages associated with this:

pub const WANTED_CAPACITY_EXCEEDS_MAX_SIZE: &str =
    "ERR Wanted capacity would go beyond bloom object memory limit";
pub const WANTED_CAPACITY_FALSE_POSITIVE_INVALID: &str =
    "ERR False positive degrades too much to reach wanted capacity";
pub const NON_SCALING_AND_WANTED_CAPACITY_IS_INVALID: &str =
    "ERR Specifying NONSCALING and ATLEASTCAPACITY is not allowed";

Note: specifying both NONSCALING and ATLEASTCAPACITY isn't allowed, since the new argument is only useful for checking whether a filter can scale to the desired capacity.
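
The conflict rule in the note could be enforced during argument parsing. The following Rust sketch is a hypothetical illustration, not the PR's parser: `check_scaling_flags` and `ArgError` are invented names, and it assumes the slice starts with the key and that everything after ITEMS is item data rather than an option.

```rust
/// Hypothetical error for the NONSCALING + ATLEASTCAPACITY conflict.
#[derive(Debug, PartialEq)]
pub enum ArgError {
    NonScalingWithAtLeastCapacity,
}

/// Rejects argument lists that combine NONSCALING with ATLEASTCAPACITY.
/// `args` is assumed to hold everything after the command name: key first,
/// then options, then ITEMS followed by the items themselves.
pub fn check_scaling_flags(args: &[&str]) -> Result<(), ArgError> {
    // Only look at options: skip the key and stop at ITEMS so that an item
    // that happens to be named "NONSCALING" is not treated as a flag.
    let options: Vec<&str> = args
        .iter()
        .copied()
        .skip(1)
        .take_while(|a| !a.eq_ignore_ascii_case("ITEMS"))
        .collect();
    let has = |name: &str| options.iter().any(|a| a.eq_ignore_ascii_case(name));
    if has("NONSCALING") && has("ATLEASTCAPACITY") {
        Err(ArgError::NonScalingWithAtLeastCapacity)
    } else {
        Ok(())
    }
}
```

A real parser would map this error to the NON_SCALING_AND_WANTED_CAPACITY_IS_INVALID message listed above.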

Option 2: Create a new command BF.GETMAXCAPACITY

This would check what the max capacity of a bloom filter is given a capacity, error rate and expansion. This command will look like:

BF.GETMAXCAPACITY [CAPACITY capacity] [ERROR fp_error] [EXPANSION expansion]

This command would then output the capacity that could be reached before having an error thrown from scaling out. This would not create a filter given the specified arguments.
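
As a rough sketch of what BF.GETMAXCAPACITY might compute, the Rust function below keeps adding simulated sub-filters until the next one would pass the memory limit, then reports the capacity accumulated so far. It is an assumption-laden illustration, not the proposed implementation: `max_reachable_capacity` is a hypothetical name, sizes come from the textbook formula m = -n ln(p) / (ln 2)^2, and each scale-out is assumed to multiply the capacity by the expansion and the fp rate by the tightening ratio.

```rust
/// Approximate bytes for one filter holding `capacity` items at rate `fp`
/// (textbook formula; an assumption, not the module's real sizing).
fn filter_bytes(capacity: u64, fp: f64) -> u64 {
    let bits = -(capacity as f64) * fp.ln() / std::f64::consts::LN_2.powi(2);
    (bits / 8.0).ceil() as u64
}

/// Total capacity reachable before a scale-out would fail.
/// Assumes capacity >= 1, expansion >= 1, 0 < fp < 1, 0 < tightening <= 1.
pub fn max_reachable_capacity(
    capacity: u64,
    fp: f64,
    tightening: f64,
    expansion: u64,
    memory_limit_bytes: u64,
) -> u64 {
    assert!(capacity >= 1 && expansion >= 1);
    assert!(fp > 0.0 && fp < 1.0 && tightening > 0.0 && tightening <= 1.0);
    let (mut cap, mut rate) = (capacity, fp);
    let (mut total_cap, mut total_bytes) = (0u64, 0u64);
    // Stop once the tightened fp rate underflows below a normal f64.
    while rate >= f64::MIN_POSITIVE {
        // Stop if the next sub-filter would not fit in the memory budget.
        let next_bytes = match total_bytes.checked_add(filter_bytes(cap, rate)) {
            Some(b) if b <= memory_limit_bytes => b,
            _ => break,
        };
        total_bytes = next_bytes;
        total_cap = total_cap.saturating_add(cap);
        // Next scale-out: larger capacity, tighter fp rate.
        cap = cap.saturating_mul(expansion);
        rate *= tightening;
    }
    total_cap
}
```

With a deliberately tiny 500-byte limit, a filter starting at capacity 100, fp rate 0.01, tightening 0.5, and expansion 2 fits two sub-filters (100 + 200 items) under this model, so the sketch returns 300.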

Testing

Added checks for the three new error messages in test_bloom_command.py.
If we choose to implement option 2, we will create new tests for it.

@zackcam zackcam force-pushed the unstable branch 6 times, most recently from e2b5632 to 94bdd4c Compare January 21, 2025 19:20
…om filter can reach the desired size

Signed-off-by: zackcam <[email protected]>
@madolson
Member

All things considered, I think the current naming and API arguments make sense. The only alternative I thought of is that we could add it to BF.INFO, which might make sense either way. We don't need a new command, and diverging on info should be OK since it's written to be extensible.

@zackcam
Contributor Author

zackcam commented Jan 22, 2025

> All things considered, I think the current naming and API arguments make sense. The only alternative I thought of is that we could add it to BF.INFO, which might make sense either way. We don't need a new command, and diverging on info should be OK since it's written to be extensible.

Would we want to add this to both BF.INFO and BF.INSERT?
I think adding it to both makes the most sense. Adding it to BF.INSERT lets users confirm at creation time that the size they want is reachable, while adding it to BF.INFO would expose the max capacity at any point, showing how large the filter can grow rather than only whether it can grow past what the user wanted.
