Extra memory consumed by embedded sds in new HashObject fields #1567

ranshid · 2025-01-15T08:04:36Z

Following #1502, which introduced hashtable instead of dict in hash objects, we started embedding the field sds in the hashTableEntry.
The problem is that the field sds might arrive from different sources:

command arguments - in the usual case the sds is provided from the parsed command arguments. In such cases the sds comes from a stringObject, which means it will always have a minimal header size of 3 bytes (sds8).
listpack conversion - when we convert from listpack to hashtable, the listpack is scanned and the field sds is created from the listpack string via sdsnewsize, which will use the minimal header size (i.e. small strings will use sds5, which is 1 byte).
Modules - for example, VM_HashSet will create a RAW string object which will have the sds allocated with a minimal size header (i.e. sds5 for small strings, which is 1 byte long).

When we create the hashTable entry in hashTypeCreateEntry we embed the field sds according to the provided sds representation, so if the field originated from a parsed command argument it will use 2 extra bytes for the header (3 bytes for sds8 instead of 1 byte for sds5).
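To make the 2-byte difference concrete, here is a minimal sketch of the two header layouts involved. The `demo_` structs are simplified copies modelled on the sds headers, not the actual definitions used by the server, and `demo_embedded_size` is a hypothetical helper that only assumes an embedded field costs header + string bytes + null terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified copies of two sds header layouts, for illustration only. */
struct __attribute__((__packed__)) demo_sdshdr5 {
    unsigned char flags; /* type in the low bits, length in the high bits */
    char buf[];
};

struct __attribute__((__packed__)) demo_sdshdr8 {
    uint8_t len;         /* used length */
    uint8_t alloc;       /* allocated size, excluding header and terminator */
    unsigned char flags; /* type in the low bits */
    char buf[];
};

/* Sanity check of the header sizes quoted in the issue: 1 byte vs 3 bytes. */
static void demo_check_layout(void) {
    assert(sizeof(struct demo_sdshdr5) == 1);
    assert(sizeof(struct demo_sdshdr8) == 3);
}

/* Hypothetical helper: bytes needed to embed a field of field_len characters
 * (header + string + null terminator). */
static size_t demo_embedded_size(size_t hdr_size, size_t field_len) {
    return hdr_size + field_len + 1;
}

/* Extra header bytes paid per field when it is embedded as sds8 instead of sds5. */
static size_t demo_extra_bytes_per_field(void) {
    return sizeof(struct demo_sdshdr8) - sizeof(struct demo_sdshdr5);
}
```

Under these assumptions, a 5-character field costs 9 bytes when embedded with an sds8 header and 7 bytes with an sds5 header.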

While there would probably NOT be any degradation in overall memory utilization (since the new hashtable is more memory efficient), it might cause strange results following listpack conversions.
For example:
say hash1 is created when the hash_max_listpack_entries config is 0 and 10 small fields are added to it
and hash2 is created when the hash_max_listpack_entries config is 9 and 10 small fields are added to it

After all 10 elements were added, both tables are expected to show the same memory consumption, but hash1 would show as using 18 extra bytes of memory (9 fields x 2 extra header bytes).
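The 18-byte figure can be reproduced with a small calculation. This is only a sketch under a stated assumption: the fields that were sitting in the listpack at conversion time get 1-byte sds5 headers, while every field that arrives directly as a command argument gets a 3-byte sds8 header. `demo_header_bytes` is a hypothetical helper, not a function in the code base, and the exact conversion point in the real code may differ.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SDS5_HDR 1 /* header bytes for sds5 */
#define DEMO_SDS8_HDR 3 /* header bytes for sds8 */

/* Hypothetical model: total sds header bytes for a hash that ends up in
 * hashtable encoding after nfields small fields were added one by one.
 * Assumption: the first listpack_max fields live in the listpack and are
 * converted (sds5); the remaining fields come from command arguments (sds8). */
static size_t demo_header_bytes(size_t nfields, size_t listpack_max) {
    assert(nfields > listpack_max); /* otherwise the hash stays a listpack */
    size_t converted = listpack_max;
    size_t from_args = nfields - listpack_max;
    return converted * DEMO_SDS5_HDR + from_args * DEMO_SDS8_HDR;
}
```

With this model, hash1 (`demo_header_bytes(10, 0)`) spends 30 bytes on headers and hash2 (`demo_header_bytes(10, 9)`) spends 12, which gives the 18-byte gap described above.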

NOTE - I do think the issue is minor and would probably be addressed during the work on #1551 and/or #640, so I mainly opened it in order to track the issue better.

zuiderkwast · 2025-01-20T11:24:54Z

I started looking at this. My first idea was to convert from sds8 to sds5 in sdscopytobuffer (used when we embed it in another structure), but it may be better to allow sds5 already in EMBSTR-encoded serverObject. Then sdscopytobuffer can just copy the representation as-is.

The only drawback is that sds5 doesn't know its own allocation size. With embedded sds5, I saw some weird output from debug sdslen command that we need to fix. Even for EMBSTR encoded serverObject, we track the usable size of the allocation using the sds header of the embedded sds8 string. With sds5, we'll need to rely on zmalloc_usable_size for the sds5 case, which is probably fine too.
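The reason sds5 cannot report its own allocation size is visible in its encoding: the single header byte holds the type in the low 3 bits and the length in the high 5 bits, leaving no room for an alloc field. The sketch below illustrates that packing. The `DEMO_`/`demo_` names are illustrative stand-ins, not the project's own macros or functions.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SDS_TYPE_5 0
#define DEMO_SDS_TYPE_BITS 3
#define DEMO_SDS_TYPE_MASK 7
#define DEMO_SDS5_MAX_LEN 31 /* 5 bits of length */

/* Pack type and length into the one-byte sds5 header. */
static unsigned char demo_sds5_make_flags(size_t len) {
    assert(len <= DEMO_SDS5_MAX_LEN);
    return (unsigned char)(DEMO_SDS_TYPE_5 | (len << DEMO_SDS_TYPE_BITS));
}

/* Length is recovered from the high 5 bits. */
static size_t demo_sds5_len(unsigned char flags) {
    return (size_t)(flags >> DEMO_SDS_TYPE_BITS);
}

/* Type is recovered from the low 3 bits. */
static int demo_sds_type(unsigned char flags) {
    return flags & DEMO_SDS_TYPE_MASK;
}
```

Since every bit of the byte is already used, the usable size of an embedded sds5 has to come from somewhere else, such as zmalloc_usable_size on the enclosing allocation, as suggested above.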

zuiderkwast · 2025-01-24T19:00:03Z

Note: It's not only embedded hash fields. It's the same problem for embedded keys in serverObject.

zuiderkwast linked a pull request Jan 24, 2025 that will close this issue

Embed keys and hash fields as SDS type 5 #1613

Draft

zuiderkwast self-assigned this Jan 24, 2025
