Support for read-only files #13

Merged: 2 commits, Nov 25, 2019
30 changes: 15 additions & 15 deletions docs/index.rst
@@ -13,30 +13,30 @@ is a member of a set. The `wikipedia page <http://en.wikipedia.org/wiki/Bloom_fi
has further information on their nature. This module implements a Bloom filter
in python that's fast and uses mmap files for better scalability.

Here's a quick example::
Here's a quick example:

.. code:: python
from pybloomfilter import BloomFilter
.. code-block:: python
bf = BloomFilter(10000000, 0.01, 'filter.bloom')
>>> from pybloomfilter import BloomFilter
with open("/usr/share/dict/words") as f:
for word in f:
bf.add(word.rstrip())
>>> bf = BloomFilter(10000000, 0.01, 'filter.bloom')
>>> with open("/usr/share/dict/words") as f:
>>> for word in f:
>>> bf.add(word.rstrip())
print 'apple' in bf
#outputs True
>>> print 'apple' in bf
True
That wasn't so hard, was it? Now, there are a lot of other things
we can do. For instance, let's say we want to create a similar
filter with just a few pieces of fruit::
filter with just a few pieces of fruit:

.. code:: python
fruitbf = bf.copy_template("fruit.bloom")
fruitbf.update(("apple", "banana", "orange", "pear"))
print fruitbf.to_base64()
>>> fruitbf = bf.copy_template("fruit.bloom")
>>> fruitbf.update(("apple", "banana", "orange", "pear"))
>>> print(fruitbf.to_base64())
"eJzt2k13ojAUBuA9f8WFyofF5TWChlTHaPzqrlqFCtj6gQi/frqZM2N7aq3Gis59d2ye85KTRbhk"
"0lyu1NRmsQrgRda0I+wZCfXIaxuWv+jqDxA8vdaf21HIOSn1u6LRE0VL9Z/qghfbBmxZoHsqM3k8"
"N5XyPAxH2p22TJJoqwU9Q0y0dNDYrOHBIa3BwuznapG+KZZq69JUG0zu1tqI5weJKdpGq7PNJ6tB"
@@ -76,7 +76,7 @@ Install
Please have `Cython` installed. Please note that this version is for Python 3.
In case you are using Python 2, please see https://github.com/axiak/pybloomfiltermmap.

To install:
To install::

$ pip install cython
$ pip install pybloomfiltermmap3
44 changes: 24 additions & 20 deletions docs/ref.rst
@@ -10,12 +10,12 @@ BloomFilter Class Reference
.. moduleauthor:: Michael Axiak <[email protected]>


.. class:: BloomFilter(capacity : int, error_rate : float, [filename=None : string], [perm=0755])
.. class:: BloomFilter(capacity: int, error_rate: float, [filename = None: string], [mode = "rw+"], [perm=0755])

Create a new BloomFilter object with a given capacity and error_rate.
**Note that we do not check capacity.** This is important, because
I want to be able to support logical OR and AND (see below).
The capacity and error_rate then together serve as a contract---you add
we want to be able to support logical OR and AND (see below).
The capacity and error_rate then together serve as a contract --- you add
less than capacity items, and the Bloom Filter will have an error rate
less than error_rate.
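
For orientation, the contract between capacity and error_rate can be illustrated with the textbook Bloom filter sizing formulas. This is only a sketch: `bloom_parameters` and `expected_error_rate` are hypothetical helpers, not part of pybloomfilter, and the library may size its filters differently.

```python
import math


def bloom_parameters(capacity, error_rate):
    """Textbook sizing: return (num_bits, num_hashes) for a Bloom filter.

    Assumption: these are the standard formulas, not necessarily the ones
    pybloomfilter uses internally.
    """
    if capacity <= 0 or not 0.0 < error_rate < 1.0:
        raise ValueError("capacity must be > 0 and 0 < error_rate < 1")
    # m = -n * ln(p) / (ln 2)^2
    num_bits = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
    # k = (m / n) * ln 2, rounded to a whole number of hash functions
    num_hashes = max(1, round(num_bits / capacity * math.log(2)))
    return num_bits, num_hashes


def expected_error_rate(num_bits, num_hashes, items):
    """Approximate false-positive rate after `items` distinct insertions."""
    return (1.0 - math.exp(-num_hashes * items / num_bits)) ** num_hashes


num_bits, num_hashes = bloom_parameters(10_000_000, 0.01)
at_capacity = expected_error_rate(num_bits, num_hashes, 10_000_000)
over_capacity = expected_error_rate(num_bits, num_hashes, 20_000_000)
```

With the values from the quick example (10,000,000 items at a 1% error rate) this sketch gives 7 hash functions and just under 96 million bits. Doubling the number of items pushes the estimated error rate above 10%, which is why capacity is a contract the caller has to keep rather than something the filter enforces.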

@@ -24,7 +24,7 @@ Class Methods

.. classmethod:: BloomFilter.open(filename)

Return a BloomFilter object using an already-existing Bloomfilter file.
Return a BloomFilter object using an already existing BloomFilter file.

.. classmethod:: BloomFilter.from_base64(filename, string, [perm=0755])

@@ -35,11 +35,11 @@ Class Methods
Example::

>>> bf = BloomFilter.from_base64("/tmp/mike.bf",
"eJwFwcuWgiAAANC9v+JCx7By0QKt0GHEbKSknflAQ9QmTyRfP/fW5E9XTRSX"
"qcLlqGNXphAqcfVH\nRoNv0n4JlTpIvAP0e1+RyXX6I637ggA+VPZnTYR1A4"
"Um5s9geYaZZLiT208JIiG3iwhf3Fwlzb3Y\n5NRL4uNQS6/d9OvTDJbnZMnR"
"zcrplOX5kmsVIkQziM+vw4hCDQ3OkN9m3WVfPWzGfaTeRftMCLws\nPnzEzs"
"gjAW60xZTBbj/bOAgYbK50PqjdzvgHZ6FHZw==\n")
"eJwFwcuWgiAAANC9v+JCx7By0QKt0GHEbKSknflAQ9QmTyRfP/fW5E9XTRSX"
"qcLlqGNXphAqcfVH\nRoNv0n4JlTpIvAP0e1+RyXX6I637ggA+VPZnTYR1A4"
"Um5s9geYaZZLiT208JIiG3iwhf3Fwlzb3Y\n5NRL4uNQS6/d9OvTDJbnZMnR"
"zcrplOX5kmsVIkQziM+vw4hCDQ3OkN9m3WVfPWzGfaTeRftMCLws\nPnzEzs"
"gjAW60xZTBbj/bOAgYbK50PqjdzvgHZ6FHZw==\n")
>>> "MIKE" in bf
True

@@ -60,15 +60,20 @@ Instance Attributes

.. attribute:: BloomFilter.name

The file name (compatible with file objects)
The file name (compatible with file objects).

.. attribute:: BloomFilter.num_bits

The number of bits used in the filter as buckets
The number of bits used in the filter as buckets.

.. attribute:: BloomFilter.num_hashes

The number of hash functions used when computing
The number of hash functions used when computing.

.. attribute:: BloomFilter.read_only

Boolean, indicating if the opened BloomFilter is read-only.
Always ``False`` for an in-memory BloomFilter.


Instance Methods
@@ -78,8 +83,8 @@ Instance Methods

Add the item to the bloom filter.

:param item: Hashable object
:rtype: Boolean (True if item already in the filter)
:param item: hashable object
:rtype: boolean (``True`` if item already in the filter)

.. method:: BloomFilter.clear_all()

@@ -121,7 +126,7 @@ Instance Methods
this may not be too useful. I find it useful for debugging so I can
copy filters from one terminal to another in their entirety.

:rtype: Base64 encoded string representing filter
:rtype: base64 encoded string representing filter
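
The sample strings in these docs start with ``eJ``, which is what the usual zlib stream header looks like once it is base64-encoded. Assuming the serialization is roughly "compress, then base64" (an inference from those strings, not something the docs confirm), the round trip can be sketched with the standard library alone. `dump_bits` and `load_bits` are hypothetical names, not pybloomfilter functions.

```python
import base64
import zlib


def dump_bits(bits):
    """Compress raw filter bytes and return them as printable ASCII text."""
    return base64.b64encode(zlib.compress(bytes(bits))).decode("ascii")


def load_bits(text):
    """Inverse of dump_bits: decode the text and decompress the bytes."""
    return zlib.decompress(base64.b64decode(text))


# A mostly empty 1 KiB bit array with two bits set, standing in for a filter.
raw = bytearray(1024)
raw[3] |= 0x10
raw[700] |= 0x01

encoded = dump_bits(raw)
restored = load_bits(encoded)
```

A sparse filter compresses well, so the encoded text is short enough to paste between terminals, which matches the debugging use described above.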

.. method:: BloomFilter.update(iterable)

@@ -136,7 +141,7 @@ Instance Methods

The result will occur **in place**. That is, calling::

bf.union(bf2)
bf.union(bf2)

is a way to add all the elements of bf2 to bf.

@@ -147,7 +152,7 @@ Instance Methods

The same as union() above except it uses a set AND instead of a
set OR.

*N.B.: Calling this function will render future calls to len()
invalid.*
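
The in-place set OR and set AND described above can be pictured as bitwise operations over two equally sized bit arrays. The sketch below uses plain `bytearray` objects and hypothetical helper names. It assumes both filters share the same size and hash functions, and it says nothing about how the C extension actually stores its bits.

```python
def union_in_place(dst, src):
    """OR every byte of src into dst, so dst reports everything either held."""
    if len(dst) != len(src):
        raise ValueError("filters must have the same number of bits")
    for i, byte in enumerate(src):
        dst[i] |= byte


def intersection_in_place(dst, src):
    """AND every byte of src into dst, keeping only bits set in both."""
    if len(dst) != len(src):
        raise ValueError("filters must have the same number of bits")
    for i, byte in enumerate(src):
        dst[i] &= byte


a = bytearray([0b1010, 0b0001])
b = bytes([0b0110, 0b0001])

merged = bytearray(a)
union_in_place(merged, b)          # bits set in a or b

common = bytearray(a)
intersection_in_place(common, b)   # bits set in both a and b
```

After either operation the bit pattern no longer corresponds to a known number of `add()` calls, which is one plausible reason why len() stops being meaningful.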

@@ -182,11 +187,11 @@ Magic Methods

.. method:: BloomFilter.__ior__(filter) -> BloomFilter

See union(filter)
See :meth:`BloomFilter.union`.

.. method:: BloomFilter.__iand__(filter) -> BloomFilter

See intersection(filter)
See :meth:`BloomFilter.intersection`.

Exceptions
--------------
@@ -195,4 +200,3 @@ Exceptions

The exception that is raised if len() is called on a BloomFilter
object after |=, &=, intersection(), or union() is used.
5 changes: 4 additions & 1 deletion src/mmapbitarray.c
@@ -66,6 +66,7 @@ MBArray * mbarray_Create_Mmap(BTYPE num_bits, const char * file, const char * he
MBArray * array = (MBArray *)malloc(sizeof(MBArray));
uint64_t filesize;
int32_t fheaderlen;
int mmap_flags = PROT_READ;

if (!array || errno) {
return NULL;
@@ -148,9 +149,11 @@ MBArray * mbarray_Create_Mmap(BTYPE num_bits, const char * file, const char * he
}

errno = 0;
// Add PROT_WRITE if we have write permissions
mmap_flags |= (oflag & O_RDWR) ? PROT_WRITE : 0;
array->vector = (DTYPE *)mmap(NULL,
_mmap_size(array),
PROT_READ | PROT_WRITE,
mmap_flags,
MAP_SHARED,
array->fd,
0);
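
The same idea, deriving the mapping's protection from the open flags instead of always asking for write access, can be tried from Python with the standard `mmap` module. This is a Unix-only sketch for experimentation. `map_file` is a hypothetical helper and is not how pybloomfilter exposes the feature.

```python
import mmap
import os
import tempfile


def map_file(path, writable):
    """Map a whole file, adding PROT_WRITE only when it is opened read-write."""
    oflag = os.O_RDWR if writable else os.O_RDONLY
    prot = mmap.PROT_READ
    if oflag & os.O_RDWR:
        prot |= mmap.PROT_WRITE
    fd = os.open(path, oflag)
    try:
        # Length 0 maps the entire (non-empty) file.
        return mmap.mmap(fd, 0, flags=mmap.MAP_SHARED, prot=prot)
    finally:
        os.close(fd)  # the mmap object keeps its own duplicate descriptor


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "filter.bloom")
    with open(path, "wb") as f:
        f.write(bytes(16))

    # Read-only mapping: Python rejects the write before it reaches the file.
    ro = map_file(path, writable=False)
    try:
        ro[0] = 1
        write_rejected = False
    except TypeError:
        write_rejected = True
    ro.close()

    # Read-write mapping: the change is shared with the underlying file.
    rw = map_file(path, writable=True)
    rw[0] = 0xFF
    rw.flush()
    rw.close()
    with open(path, "rb") as f:
        first_byte = f.read(1)
```

Requesting only `PROT_READ` lets a file be mapped when the process has no write permission on it, which is the case the always-read-write mapping could not handle.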