Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extra memory consumed by embedded sds in new HashObject fields #1567

Open
ranshid opened this issue Jan 15, 2025 · 2 comments · May be fixed by #1613
Open

Extra memory consumed by embedded sds in new HashObject fields #1567

ranshid opened this issue Jan 15, 2025 · 2 comments · May be fixed by #1613
Assignees

Comments

@ranshid
Copy link
Member

ranshid commented Jan 15, 2025

Following #1502 which introduced hashtable instead of dict in hash objects we started embedding the field sds in the hashTableEntry.
The problem is that the field sds might arrive from different sources:

  1. command arguments - in the usual case the sds will be provided from the parsed command arguments.. In such cases the sds is from a stringObject which means it will always have a minimal header size of 3 bytes (sds8).
  2. listpack conversion - when we convert from listpack to hashtable the listpack is scanned and the field sds is being created from the listpack string via sdsnewsize which will use the minimal header size (ie small strings will use sds5 which is 1 byte).
  3. Modules - for example VM_HashSet will create a RAW string object which will have the sds allocated with a minimal size header (ie sds5 for small strings which is 1 byte long).

When we create the hashTable entry in hashTypeCreateEntry we will embed the field sds according to the provided sds representation, so in case the field originated at a parsed command argument it will use extra 2 bytes for the header.

While there would probably NOT be any degradation in overall memory utilization (since the new hashtable is more memory efficient) it might cause strange results following listpack conversions.
for example:
say hash1 is created when the hash_max_listpack_entries config is 0 and added with 10 small fields
and hash2 is created when the hash_max_listpack_entries config is 0 9 and added with 10 small fields

after all 10 elements were added both tables are expected to show the same memory consumption, but hash1 would show as using extra 18 bytes of memory.

NOTE - I do think the issue is minor and would probably be addressed during the work on #1551 and/or #640 So I mainly opened it in order to have a better tracking of the issue.

@zuiderkwast
Copy link
Contributor

I started looking at this. My first idea was to convert from sds8 to sds5 in sdscopytobuffer (used when we embed it in another structure), but it may be better to allow sds5 already in EMBSTR-encoded serverObject. Then sdscopytobuffer can just copy the representation as-is.

The only drawback is that sds5 doesn't know its own allocation size. With embedded sds5, I saw some weird output from debug sdslen command that we need to fix. Even for EMBSTR encoded serverObject, we track the usable size of the allocation using the sds header of the embedded sds8 string. With sds5, we'll need to rely on zmalloc_usable_size for the sds5 case, which is probably fine too.

@zuiderkwast
Copy link
Contributor

Note: It's not only embedded has fields. It's the same problem for embedded keys in serverObject.

@zuiderkwast zuiderkwast linked a pull request Jan 24, 2025 that will close this issue
@zuiderkwast zuiderkwast self-assigned this Jan 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants