Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use new hashtable for hashes #1095

Closed
Tracked by #169
zuiderkwast opened this issue Sep 30, 2024 · 0 comments · Fixed by #1502
Closed
Tracked by #169

Use new hashtable for hashes #1095

zuiderkwast opened this issue Sep 30, 2024 · 0 comments · Fixed by #1502

Comments

@zuiderkwast
Copy link
Contributor

zuiderkwast commented Sep 30, 2024

The new hashtable doesn't have a dict entry, so we need to invent one to hold field and value.

Possibly, we can embed field or value or both. The method used earlier is the function sdscopytobuffer and a byte to store the size of the sds header.

A field name never changes so it seems safe to always embed the field name, while the value can change and it can be very large so we can allow it to be non-embedded. This is very similar to string keys with the EMBSTR vs RAW encoding.

Idea of layout with both embedded:

+-----------------------------+-----------------------------+
| field                       | value                       |
| hdr-size | hdr | content \0 | hdr-size | hdr | content \0 |
+-----------------------------+-----------------------------+

Idea of layout with field embedded, value as pointer, stored first to make it aligned. (#1502 simply uses this representation for all field-value pairs.)

+-------+-----------------------------+
| value | field                       |
| ptr   | hdr-size | hdr | content \0 |
+-------+-----------------------------+

To distinguish between different representations, we could use the LSB bits in the pointer to it, the pointer stored in the hashtable. (This solution was not chosen in #1579.)

 hashtable
+----------+        +-----------+----------------+
| ptr  000 -------->| value-ptr | embedded field |
|          |        +-----------+----------------+
|          |
|          |        +----------------+----------------+
| ptr  001 -------->| embedded field | embedded value |
|          |        +----------------+----------------+
| ...      |
+----------+

Another way to indicate different kinds of entries is to use some bits in the sds header of the field. That's the chosen representation in #1579.

@zuiderkwast zuiderkwast changed the title Use hashset for hashes Use new hashtable for hashes Nov 20, 2024
proost pushed a commit to proost/valkey that referenced this issue Jan 17, 2025
This PR replaces dict with the new hashtable data structure in the HASH
datatype. There is a new struct for hashtable items which contains a
pointer to value sds string and the embedded key sds string. These
values were previously stored in dictEntry. This structure is kept
opaque so we can easily add small value embedding or other optimizations
in the future.

closes valkey-io#1095

---------

Signed-off-by: Rain Valentine <[email protected]>
Signed-off-by: proost <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant