Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add vk::BufferPointer to HLSL #37

Merged
merged 2 commits into from
Apr 14, 2023
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Next Next commit
Add vk::BufferPointer to HLSL
Create a new first class type vk::BufferPointer which allows users to
efficiently reference buffer device addresses and allows tools to
analyze and report on usage in a logical rather than physical way i.e.
names rather than numeric offsets.
greg-lunarg committed Apr 13, 2023
commit ce5e089f7cce8c0339f9f0327d218f7117568c44
280 changes: 280 additions & 0 deletions proposals/0010-vk-buffer-ref.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,280 @@
<!-- {% raw %} -->
# Buffer Pointers in HLSL With vk::BufferPointer

* Author(s): [Greg Fischer](https://github.com/greg-lunarg)
* Sponsor(s): [Chris Bieneman](https://github.com/llvm-beanz), [Steven Perron](https://github.com/s-perron), [Diego Novillo](https://github.com/dnovillo)
* Status: **Under Consideration**
* Planned Version: Retroactive addition to Vulkan X.X (requires SPIR-V X.X. Some language details require HLSL 202x
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to know specifically which Vulkan/SPIR-V versions are required for this feature.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Device addresses are officially supported in core Vulkan 1.2. But some earlier versions support the extension as well, depending on the vendor.

Not sure about SPIRV

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vulkan 1.2 and SPIR-V 1.3. Done.


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add myself, @s-perron, and @dnovillo as sponsors.

## Introduction

This proposal seeks to improve tool support for Vulkan shaders doing buffer device addressing by adding the vk::BufferPointer type to HLSL.

## Motivation

vk::RawBufferLoad() is currently used to address physical storage buffer space. Unfortunately, use of this function has a number of shortcomings. One is that it generates low-level SPIR-V so that tools such as spirv-reflect, spirv-opt and renderdoc do not have the context to analyze and report on which members of a buffer are used in a logical manner. A bigger problem is that the HLSL programmer must compute the physical offsets of the members of a buffer which is error prone and difficult to maintain.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to also describe similar limitations with vk::RawBufferStore(). I'm not sure if this is a driver issue or something else, but could be related: microsoft/DirectXShaderCompiler#4620

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Thanks!


For example, here is a shader using vk::RawBufferLoad(). Note the physical offset 16 hard-coded into the shader:

```c++
// struct GlobalsTest_t
// {
// float4 g_vSomeConstantA;
// float4 g_vTestFloat4;
// float4 g_vSomeConstantB;
// };

struct TestPushConstant_t
{
uint64_t m_nBufferDeviceAddress; // GlobalsTest_t
};

[[vk::push_constant]] TestPushConstant_t g_PushConstants;

float4 MainPs(void) : SV_Target0
{
float4 vTest = vk::RawBufferLoad<float4>(g_PushConstants.m_nBufferDeviceAddress + 16);

return vTest;
}
```

The SPIR-V for this shader can be seen in Appendix A. Note the lack of logical context for the accessed buffer i.e. no declaration for the underlying structure GlobalsTest_t as is generated for other buffers.

There is another way to use RawBufferLoad which does allow logical selection of the buffer fields, but it inefficiently loads the entire buffer to do it. See https://github.com/microsoft/DirectXShaderCompiler/issues/4986.

The goal of this proposal is to have a solution that meets the following requirements:

* Removes the need for having to manually or automatically generate offsets to load structured data with BufferDeviceAddress.
* Get equivalent tooling functionality as is provided by the buffer reference feature in GLSL. Namely, tools like RenderDoc are able to introspect the type information such that its buffer inspection and shader debugger are able to properly understand and represent the type of the data.
* Make it possible through SPIR-V reflection to determine which members of a struct accessed by BufferDeviceAddress are statically referenced and at what offset. This is already possible for other data like cbuffers in order for shader tooling to be able to identify which elements are used and where to put them.

## Proposed solution

Our solution is to add a new builtin type in the vk namespace that is a pointer to a buffer of a given type, `vk::BufferPointer<T,A>`. The template argument `T` must be a struct. `A` must be an integer and is the alignment in bytes of the pointer. If `A` is not specified, the alignment is assumed to be 16 bytes.

This new type will have the following operations

* Copy assignment and copy construction - These copy the value of the pointer from one variable to another.
* Dereference Method - The get() method represents the struct rvalue pointed at by the pointer to which the get() is applied. Note that this does not necessarily imply the entire value is physically loaded at this point; it merely represents the rvalue that would be loaded, similar to the effect of the * operator when applied to a pointer in C++. As selection . operators are applied to the get(), the data that is physically loaded is reduced appropriately.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Get() capital G to be consistent with HLSL style conventions.

I think it is probably more accurate that this is a const lvalue reference than an rvalue since I assume that's how this should likely be implemented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

* Null Pointer Method - The IsNull() method returns true if the pointer is 0, false if not.

Note the operations that are not allowed:

* There is no default construction. Every vk::BufferPointer<T> is either contained in a global resource (like a cbuffer, ubo, or ssbo), or it must be constructed using the copy constructor.
* There is no conversion from uint64_t to vk:BufferPointer.
* There is no explicit pointer arithmetic. All addressing is implicitly done using the `.` pointer, or indexing an array in the struct T.
* The comparison operators == and != are not supported for buffer pointers.

Most of these restrictions are there for safety. They minimize the possibility of getting an invalid pointer. If the get() method is used on a null pointer, the behaviour is undefined.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed earlier, these restrictions prevent key functionality of pointers exposed by vk::RawBufferLoad and Store. Specifically, casting and pointer arithmetic need to be exposed somehow.

But for the sake of progress, we can move this discussion to another issue.


When used as a member in a buffer, vk::BufferPointer can be used to pass physical buffer addresses into a shader, and address and access buffer space with logical addressing, which allows tools such as spirv-opt, spirv-reflect and renderdoc to be able to better work with these shaders.

For example, here is a shader using vk::BufferPointer to do the same thing as the shader above using vk::RawBufferLoad. Note the natural, logical syntax of the reference:

```c++

struct Globals_s
{
float4 g_vSomeConstantA;
float4 g_vTestFloat4;
float4 g_vSomeConstantB;
};

typedef vk::BufferPointer<Globals_s> Globals_p;

struct TestPushConstant_t
{
Globals_p m_nBufferDeviceAddress;
};

[[vk::push_constant]] TestPushConstant_t g_PushConstants;

float4 MainPs(void) : SV_Target0
{
float4 vTest = g_PushConstants.m_nBufferDeviceAddress.g_vTestFloat4;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The accessing syntax is irregular here. A vk::buffer_ptr<T> can’t be treated as member accessing T. I think there are two options:

(1) Add a Get method to vk::buffer_ptr which returns a reference. This would be consistent with some of the existing buffer accessors, and provides regular syntax which can be supported for all HLSL versions.
(2) For HLSL 202x a dereference (->) operator could be added to indirect to the members.

Copy link
Contributor Author

@greg-lunarg greg-lunarg Apr 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For HLSL 202x a dereference (->) operator could be added to indirect to the members.

Does that mean that "->" cannot be added today? If not, what approximate calendar date could this be added and released?

Copy link
Contributor Author

@greg-lunarg greg-lunarg Apr 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding get(), would the above line then be?:

float4 vTest = g_PushConstants.m_nBufferDeviceAddress.get().g_vTestFloat4;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The support for dereference operators is tied to adding references, which hasn’t yet been done. HLSL 202x doesn’t have a schedule yet, but we’re already starting to do some preliminary work on it.

Yes, that is how the Get() method should work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it required to have the full reference support to use ->?
For now parser seems to ignore MemberExpr->isArrow() and allow . and -> (isArrow is just always false), and then fails in
tools/clang/lib/Sema/SemaExprCXX.cpp

Could we check the type here and allow arrow operators for this type without having the full reference implem?
This this feature can being used with the proper syntax, and then real reference support can roll-in without disturbances?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don’t need to have references fully implemented to add -> and * dereference operators to built-in types, but those operators will be HLSL 202x features, not available in the older language versions.

Right, just unsure how the release of the HLSL 202x relates to this. Since we add a new type to the language, how would that be different from adding allowing the -> operator for this type? Is it because you don't want to have the operator being tied to a specific type for now instead of a more generic semantic?

Agree with Greg, as the get() does seem to imply we load the full struct, and then hope for the optimizer to only load 1 field.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me get() simply reminds me of a smart pointer in C++ where you retrieve the underlying weak pointer. I don't necessarily think of the pointed-to object being loaded. Just offering a different perspective, but I also see your point.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In reverse order:

Agree with Greg, as the get() does seem to imply we load the full struct, and then hope for the optimizer to only load 1 field.

I don't think this is a concern we should design around. As @vettoreldaniele pointed out this is common syntax for C++ smart pointers so our users will be familiar with it, also the existing HLSL resource types that support subscript operators and Load methods already have the behavior where a load of the full object is not implied by the object being returned. It is 100% the expectation of our users that even if a subscript or Load method is used to store the full object to a local temporary, unused fields are optimized away and their loads removed from the final optimized code.

Right, just unsure how the release of the HLSL 202x relates to this. Since we add a new type to the language, how would that be different from adding allowing the -> operator for this type? Is it because you don't want to have the operator being tied to a specific type for now instead of a more generic semantic?

Is there a specific Vulkan/SPIR-V version requirement that this is gated on?

In general for HLSL we have two revisioning axis: language version and runtime version. In the past we've kinda blurred the two and it has caused a lot of problems. We're trying to be much more explicit about the separation.

Language versioning changes the core language of HLSL: syntax, grammar, semantics, etc.

Runtime version (for DXIL this is Shader Models) changes the library functionality of the language: data types, methods, etc.

There's an overlap here when you talk about adding something like a dereference operator which does not exist in the language. In this case you are adding a data type to the runtime version, and a new language construct on the language side. Both need to be versioned.

Having a Get method provides a syntax to use the object which is independent of the language version, so it can be exposed across all HLSL language versions supported by DXC.

From the specification perspective, HLSL 2021 is fully baked, done and out the door. We don't have a formal language spec for HLSL 2021 (although I have been writing one), but if we had a complete spec it would contain no reference to dereference operators because they don't exist in HLSL 2021 (or earlier versions). Introducing a new operator syntax is a change to the language and thus must be tied to a language version. Users should be able to know what syntaxes are supported by a compiler based on knowing what language version(s) the compiler supports. For a non-HLSL analogy: you wouldn't expect a C++98 compiler to support Lambdas, you shouldn't expect a HLSL 2021 compiler to support dereferences.

From an implementation perspective things are a little different and more nuanced. We have in the past exposed limited functionalities of new language modes in older language modes. A most recent example was the addition of the select, and and or built-in functions in versions older than HLSL 2021. We did that to ease developer transition by allowing valid HLSL 2021 code to be valid in earlier versions. Those intrinsics also did not introduce new syntax or grammar constructs. As another example, we did not expose HLSL 2021 templates in HLSL 2018 or earlier. templates did introduce new syntaxes. Because this is a completely new syntax I'm not sure this is good candidate to expose in earlier HLSL versions.

If the implementation is sufficiently simple, robust and doesn't rely on intrusive language changes, or expose other breakages in language semantics we could consider exposing it as an extension to earlier versions of HLSL. I've previously advocated internally for allowing language extensions when they are non-breaking, strictly additive changes and have a compatibility diagnostic implemented (meaning the compiler emits a warning if you use the feature while setting an old language version). That might be an option here, but I wouldn't want to commit to it until we get into the implementation and can fully explore the interactions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I am fine with get().

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thorough explanation!
for get(), missed the parallel with shared pointers. Good argument, fine with get().

Thanks for the clarification on the frontier between HLSL version and shader model version. Those arguments make sense 😊

return vTest;
}

```

In SPIR-V, Globals_p would be a pointer to the physical buffer storage class. The struct type of the push constant would contain one of those pointers. The SPIR-V for this shader can be seen in Appendix B. Note the logical context of the declaration and addressing of underlying struct Globals_s including Offset decorations all Globals_s members.

## Linked Lists and Local Variables

vk::BufferPointer can be used to program a linked list of identical buffers:

```c++

// Forward declaration
typedef struct block_s block_t;
typedef vk::BufferPointer<block_t> block_p;

struct block_s
{
float4 x;
block_p next;
};

struct TestPushConstant_t
{
block_p root;
};

[[vk::push_constant]] TestPushConstant_t g_PushConstants;

float4 MainPs(void) : SV_Target0
{
block_p g_p = g_PushConstants.root;
g_p = g_p.next;
if (uint64_t(g_p) == 0) return float4(0.0,0.0,0.0,0.0);
return g_p.x
}

```

Note also the ability to create local variables of type vk::BufferPointer such as g_p which can be read, written and dereferenced.

## Design Details

### Differences from C++ Pointers

vk::BufferPointer is different from a C++ pointer in that a selection “.” operation can and must be applied to a buffer pointer to de-reference it.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is out of date.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to Get() to dereference.


### Buffer Pointer Target Alignment

The target alignment `A` of `vk::BufferPointer(T,A)` must be at least as large as the largest component type in the buffer pointer's pointee struct type `T` or the compiler may issue an error.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: vk::BufferPointer<T,A>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


### Buffer Pointer Data Size and Alignment

For the purpose of laying out a buffer containing a vk::BufferPointer, the data size and alignment is that of a uint64_t.

### Buffer Pointer Pointee Buffer Layout

The pointee of a vk::BufferPointer is considered to be a buffer and will be laid out as the user directs all buffers to be laid out through the dxc compiler. All layouts that are supported by dxc are supported for vk::BufferPointer pointee buffers.

### Buffer Pointer Usage

vk::BufferPointer cannot be used in Input and Output variables. It also cannot be used in Unions, when those finally appear in HLSL.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: strike the word finally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


A vk::BufferPointer can otherwise be used whereever the HLSL spec does not otherwise disallow it through listing of allowed types. Specifically, buffer members, local and static variables, function argument and return types can be vk::BufferPointer. Ray tracing payloads and shader buffer table records may also contain vk::BufferPointer.

### Buffer Pointer and Semantic Annotations

Applying HLSL semantic annotations to objects of type vk::BufferPointer is disallowed.

## SPIR-V Appendices

### Appendix A: SPIR-V for RawBufferLoad

Note the lack of logical context for the accessed buffer i.e. no declaration for the underlying structure GlobalsTest_t as is generated for other buffers.

```

OpCapability Shader
OpCapability Int64
OpCapability PhysicalStorageBufferAddresses
OpExtension "SPV_KHR_physical_storage_buffer"
OpMemoryModel PhysicalStorageBuffer64 GLSL450
OpEntryPoint Fragment %MainPs "MainPs" %out_var_SV_Target0 %g_PushConstants
OpExecutionMode %MainPs OriginUpperLeft
OpSource HLSL 600
OpName %type_PushConstant_TestPushConstant_t "type.PushConstant.TestPushConstant_t"
OpMemberName %type_PushConstant_TestPushConstant_t 0 "m_nBufferDeviceAddress"
OpName %g_PushConstants "g_PushConstants"
OpName %out_var_SV_Target0 "out.var.SV_Target0"
OpName %MainPs "MainPs"
OpDecorate %out_var_SV_Target0 Location 0
OpMemberDecorate %type_PushConstant_TestPushConstant_t 0 Offset 0
OpDecorate %type_PushConstant_TestPushConstant_t Block
%int = OpTypeInt 32 1
%int_0 = OpConstant %int 0
%ulong = OpTypeInt 64 0
%ulong_16 = OpConstant %ulong 16
%type_PushConstant_TestPushConstant_t = OpTypeStruct %ulong
%_ptr_PushConstant_type_PushConstant_TestPushConstant_t = OpTypePointer PushConstant %type_PushConstant_TestPushConstant_t
%float = OpTypeFloat 32
%v4float = OpTypeVector %float 4
%_ptr_Output_v4float = OpTypePointer Output %v4float
%void = OpTypeVoid
%14 = OpTypeFunction %void
%15 = OpTypeFunction %v4float
%_ptr_Function_v4float = OpTypePointer Function %v4float
%_ptr_PushConstant_ulong = OpTypePointer PushConstant %ulong
%_ptr_PhysicalStorageBuffer_v4float = OpTypePointer PhysicalStorageBuffer %v4float
%g_PushConstants = OpVariable %_ptr_PushConstant_type_PushConstant_TestPushConstant_t PushConstant
%out_var_SV_Target0 = OpVariable %_ptr_Output_v4float Output
%MainPs = OpFunction %void None %14
%19 = OpLabel
%20 = OpVariable %_ptr_Function_v4float Function
%21 = OpVariable %_ptr_Function_v4float Function
%22 = OpAccessChain %_ptr_PushConstant_ulong %g_PushConstants %int_0
%23 = OpLoad %ulong %22
%24 = OpIAdd %ulong %23 %ulong_16
%25 = OpBitcast %_ptr_PhysicalStorageBuffer_v4float %24
%26 = OpLoad %v4float %25 Aligned 4
OpStore %20 %26
OpStore %21 %26
OpStore %out_var_SV_Target0 %26
OpReturn
OpFunctionEnd

```

### Appendix B: SPIR-V for vk::buffer_ref

Here is the SPIR-V for this shader. Note the logical context of the declaration and addressing of underlying struct Globals_s including Offset decorations all Globals_s members:

```
OpCapability Shader
OpCapability PhysicalStorageBufferAddresses
OpExtension "SPV_KHR_physical_storage_buffer"
OpMemoryModel PhysicalStorageBuffer64 GLSL450
OpEntryPoint Fragment %MainPs "MainPs" %out_var_SV_Target0 %g_PushConstants
OpExecutionMode %MainPs OriginUpperLeft
OpSource HLSL 600
OpName %type_PushConstant_TestPushConstant_t "type.PushConstant.TestPushConstant_t"
OpMemberName %type_PushConstant_TestPushConstant_t 0 "m_nBufferDeviceAddress"
OpName %Globals_s "Globals_s"
OpMemberName %Globals_s 0 "g_vSomeConstantA"
OpMemberName %Globals_s 1 "g_vTestFloat4"
OpMemberName %Globals_s 2 "g_vSomeConstantB"
OpName %g_PushConstants "g_PushConstants"
OpName %out_var_SV_Target0 "out.var.SV_Target0"
OpName %MainPs "MainPs"
OpDecorate %out_var_SV_Target0 Location 0
OpMemberDecorate %Globals_s 0 Offset 0
OpMemberDecorate %Globals_s 1 Offset 16
OpMemberDecorate %Globals_s 2 Offset 32
OpDecorate %Globals_s Block
OpMemberDecorate %type_PushConstant_TestPushConstant_t 0 Offset 0
OpDecorate %type_PushConstant_TestPushConstant_t Block
%int = OpTypeInt 32 1
%int_0 = OpConstant %int 0
%int_1 = OpConstant %int 1
%float = OpTypeFloat 32
%v4float = OpTypeVector %float 4
%Globals_s = OpTypeStruct %v4float %v4float %v4float
%_ptr_PhysicalStorageBuffer_Globals_s = OpTypePointer PhysicalStorageBuffer %Globals_s
%type_PushConstant_TestPushConstant_t = OpTypeStruct %_ptr_PhysicalStorageBuffer_Globals_s
%_ptr_PushConstant_type_PushConstant_TestPushConstant_t = OpTypePointer PushConstant %type_PushConstant_TestPushConstant_t
%_ptr_Output_v4float = OpTypePointer Output %v4float
%void = OpTypeVoid
%15 = OpTypeFunction %void
%16 = OpTypeFunction %v4float
%_ptr_Function_v4float = OpTypePointer Function %v4float
%_ptr_PushConstant__ptr_PhysicalStorageBuffer_Globals_s = OpTypePointer PushConstant %_ptr_PhysicalStorageBuffer_Globals_s
%_ptr_PhysicalStorageBuffer_v4float = OpTypePointer PhysicalStorageBuffer %v4float
%g_PushConstants = OpVariable %_ptr_PushConstant_type_PushConstant_TestPushConstant_t PushConstant
%out_var_SV_Target0 = OpVariable %_ptr_Output_v4float Output
%MainPs = OpFunction %void None %15
%20 = OpLabel
%23 = OpAccessChain %_ptr_PushConstant__ptr_PhysicalStorageBuffer_Globals_s %g_PushConstants %int_0
%24 = OpLoad %_ptr_PhysicalStorageBuffer_Globals_s %23
%25 = OpAccessChain %_ptr_PhysicalStorageBuffer_v4float %24 %int_1
%26 = OpLoad %v4float %25 Aligned 16
OpStore %out_var_SV_Target0 %26
OpReturn
OpFunctionEnd
```
<!-- {% endraw %} -->