Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Metal vertex format lookup logic and reduce memory used by MVKPixelFormats lookups. #2105

Merged
merged 4 commits into from
Jan 4, 2024

Conversation

billhollings
Copy link
Contributor

  • Remove MVKPixelFormats::_mtlFormatDescIndicesByMTLVertexFormats and
    index into _mtlVertexFormatDescriptions using MTLVertexFormat directly.
  • Fix assertion to test MTLVertexFormat < _mtlVertexFormatCount.
  • Recognize every MTLPixelFormat value can be held in uint16_t.
  • Reduce array sizes to minimum to hold mapped MTLPixelFormat values, and rely on assertions to validate if additional formats are added in future.

This fixes the assertion issues first identified in PR #1940.

- Remove MVKPixelFormats::_mtlFormatDescIndicesByMTLVertexFormats and
  index into _mtlVertexFormatDescriptions using MTLVertexFormat directly.
- Fix assertion to test MTLVertexFormat < _mtlVertexFormatCount.
@spnda
Copy link
Collaborator

spnda commented Dec 28, 2023

Is there any particular reason we don't use something like std::vector here instead of these statically sized C-arrays? This would get rid of all of this size nonsense, which was the only reason #1940 seemed correct but wasn't. And it would allow seamless addition of new formats in the future, while also avoiding any magic values like for example static const uint32_t _mtlPixelFormatCoreCount = MTLPixelFormatX32_Stencil8 + 2; to make older Xcode versions happy. The optimization value we get by saving one allocation is so minimal here I don't think it should be the deciding factor.

@billhollings
Copy link
Contributor Author

Is there any particular reason we don't use something like std::vector here

You're not wrong.

However, MVKPixelFormats instances hang around forever, and are inherently static, so the design was about statically allocating the appropriate amount of memory. A std::vector will over allocate (unless the capacity is set correctly). We could use an MVKSmallVector with the correct allocation, but we'd want to still have the assertion tests anyway, so that we're not expanding the vector beyond its capacity.

BTW...the constants _vkFormatCoreCount and _mtlPixelFormatCoreCount are not used for allocation. They handle the non-linear nature of VkFormat and MTLPixelFormat values, respectively. Beyond those respective values, the enums take large jumps between values. So we're stuck with those constants at least.

Copy link
Collaborator

@cdavis5e cdavis5e left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor nitpicks...

MoltenVK/MoltenVK/GPUObjects/MVKPixelFormats.mm Outdated Show resolved Hide resolved
MoltenVK/MoltenVK/GPUObjects/MVKPixelFormats.mm Outdated Show resolved Hide resolved
@spnda
Copy link
Collaborator

spnda commented Dec 28, 2023

However, MVKPixelFormats instances hang around forever, and are inherently static, so the design was about statically allocating the appropriate amount of memory. A std::vector will over allocate (unless the capacity is set correctly). We could use an MVKSmallVector with the correct allocation, but we'd want to still have the assertion tests anyway, so that we're not expanding the vector beyond its capacity.

std::vector::shrink_to_fit can be called at the end of the initialization, which will shrink the allocation to precisely fit as many elements as are currently stored, as no additional memory will ever be needed. It can then function as a statically sized array, which would solve your memory concern. Alternatively, std::vector::reserve could (also) be called with a guess or some statically known value. That wouldn't get rid of the value, but it wouldn't tie the correctness of the program to a single integer.

BTW...the constants _vkFormatCoreCount and _mtlPixelFormatCoreCount are not used for allocation. They handle the non-linear nature of VkFormat and MTLPixelFormat values, respectively. Beyond those respective values, the enums take large jumps between values. So we're stuck with those constants at least.

Yes I know. Counting the formats is inherently finicky, and having to additionally rely on magic values for newer formats makes it even worse. I was also critiquing the weird use of one statically sized array and an unordered_map for core and extension format description lookups. Very similar issue to the vector replacement at eliminating accidental errors, though not as important.

@billhollings
Copy link
Contributor Author

I was also critiquing the weird use of one statically sized array and an unordered_map for core and extension format description lookups.

I agree, it's a different design. 😉

Access to the format descriptions is the core operation of MVKPixelFormats, and is used by essentially all of its functions. So the intention was to make it as fast as possible. The array/map combo design was to handle this:

...
VK_FORMAT_ASTC_12x10_SRGB_BLOCK = 182,
VK_FORMAT_ASTC_12x12_UNORM_BLOCK = 183,
VK_FORMAT_ASTC_12x12_SRGB_BLOCK = 184,
VK_FORMAT_G8B8G8R8_422_UNORM = 1000156000,
VK_FORMAT_B8G8R8G8_422_UNORM = 1000156001,
...

The first 184 elements are consecutive from zero, lending themselves to be put into an array or vector. The remaining elements cannot be handled that way. Also, the linear elements are in general, more commonly used than the later elements, so it would be a shame to put them all in a umap (where access would be something like an order of magnitude slower).

- Add MVKInflectionMap collection to manage lookups based on enums
  that have a large set of consecutive elements, plus additional enum
  values that are more sparsely assigned.
- Recognize every MTLPixelFormat value can be held in uint16_t.
- Reduce inflection-map sizes by calling shrink_to_fit().
- runcts script log completion time (unrelated).
@billhollings
Copy link
Contributor Author

billhollings commented Dec 31, 2023

Okay. I decided to have a little design fun to incorporate @spnda's recommendations.

I've added a new MVKInflectionMap class to encapsulate the combo of linear and sparse format enum elements, remove the need for assertions, and allow the excess memory to be trimmed after population.

@billhollings
Copy link
Contributor Author

@cdavis5e @spnda Any further suggestions before I pull this in?

Copy link
Collaborator

@spnda spnda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MVKInflectionMap should probably get a reserve function to avoid a lot of reallocations, which we can then use in initMTLPixelFormatCapabilities and initVkFormatCapabilities in addition to the shrink_to_fit.

MoltenVK/MoltenVK/GPUObjects/MVKPixelFormats.mm Outdated Show resolved Hide resolved
- Add MVKInflectionMap collection to manage lookups based on enums
  that have a large set of consecutive elements, plus additional enum
  values that are more sparsely assigned.
- Recognize every MTLPixelFormat value can be held in uint16_t.
- Reduce inflection-map sizes by calling shrink_to_fit().
- runcts script log completion time (unrelated).
@spnda
Copy link
Collaborator

spnda commented Jan 4, 2024

@billhollings I also mentioned this in my review, but didn't know where to put it into the diff as a review so just put it into the main review message:

MVKInflectionMap should probably get a reserve function to avoid a lot of reallocations, which we can then use in initMTLPixelFormatCapabilities and initVkFormatCapabilities in addition to the shrink_to_fit.

That hasn't been addressed and I still think it's important. When looping through those two functions the MVKInflectionMap calls push_back() for every element, effectively letting the MVKSmallVector resize over and over again. As the size is know, having a reserve call at the start of the function is important in my opinion.

- Add MVKInflectionMap collection to manage lookups based on enums
  that have a large set of consecutive elements, plus additional enum
  values that are more sparsely assigned.
- Recognize every MTLPixelFormat value can be held in uint16_t.
- Reduce inflection-map sizes by calling shrink_to_fit().
- runcts script log completion time (unrelated).
@billhollings
Copy link
Contributor Author

@billhollings I also mentioned this in my review, but didn't know where to put it into the diff as a review so just put it into the main review message:

MVKInflectionMap should probably get a reserve function to avoid a lot of reallocations, which we can then use in initMTLPixelFormatCapabilities and initVkFormatCapabilities in addition to the shrink_to_fit.

That hasn't been addressed and I still think it's important. When looping through those two functions the MVKInflectionMap calls push_back() for every element, effectively letting the MVKSmallVector resize over and over again. As the size is know, having a reserve call at the start of the function is important in my opinion.

Uggh. Sorry. I missed your initial recommendation for some reason. Good catch.

I've added reserve(), and to future-proof, I've arbitrarily pre-allocated enough capacity for roughly double the current number of formats in each case. shrink_to_fit() will take care of collapsing the excess.

@billhollings billhollings merged commit 48adb42 into KhronosGroup:main Jan 4, 2024
6 checks passed
@billhollings billhollings deleted the fix-mtl-fmt-lookup branch January 4, 2024 19:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants