-
Notifications
You must be signed in to change notification settings - Fork 248
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add floating-point pack/unpack proposal (#6191)
* add initial proposal * update proposal * update proposal * update proposal * update proposal * fix typo * improve wording --------- Co-authored-by: Yong He <[email protected]>
- Loading branch information
1 parent
bbaaab4
commit 78f26f0
Showing
1 changed file
with
114 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,114 @@ | ||
SP #018: Data pack/unpack intrinsics | ||
==================================== | ||
|
||
Adds intrinsics for converting numeric vector data to and from packed unsigned integer values. | ||
|
||
## Status | ||
|
||
Status: Design Review. | ||
|
||
Implementation: N/A | ||
|
||
Author: Darren Wihandi and Slang community. | ||
|
||
Reviewer: Yong He. | ||
|
||
## Background | ||
|
||
Data packing/unpacking intrinsics provide great utility. Slang's core module, which derives from HLSL, already defines integer pack/unpack intrinsics that were introduced in SM 6.6. | ||
Floating-point variants, however, are undefined. Floating-point pack/unpack intrinsics exist as built-in intrinsics in GLSL, SPIRV, Metal, and WGSL but not in HLSL and Slang, and | ||
there is no way to access the intrinsics provided by the other shader language targets. | ||
|
||
## Proposed Approach | ||
|
||
We propose to add new packed-data intrinsics to cover both floating-point and integer types. Although the HLSL integer intrinsics are already implemented, integer variants are also | ||
added to obtain independence from the HLSL specifications and syntax. For floating-point processing, different variants exist that handle conversion between unorm, snorm, and | ||
unnormalized/standard IEEE 754 floats and vectors of 16-bit and 32-bit floats. | ||
|
||
### Floating-point Unpack Functions | ||
|
||
A set of unpack intrinsics are added to decompose a 32-bit integer of packed 8-bit or 16-bit float chunks and reinterpret them as a vector of unorm, snorm, or standard floats and | ||
halfs. Each 8-bit or 16-bit chunk is converted to either a normalized float through a conversion rule or a standard IEEE-754 floating-point. | ||
``` | ||
float4 unpackUnorm4x8ToFloat(uint packedVal); | ||
half4 unpackUnorm4x8ToHalf(uint packedVal); | ||
float4 unpackSnorm4x8ToFloat(uint packedVal); | ||
half4 unpackSnorm4x8ToHalf(uint packedVal); | ||
float2 unpackUnorm2x16ToFloat(uint packedVal); | ||
half2 unpackUnorm2x16ToHalf(uint packedVal); | ||
float2 unpackSnorm2x16ToFloat(uint packedVal); | ||
half2 unpackSnorm2x16ToHalf(uint packedVal); | ||
float2 unpackHalf2x16ToFloat(uint packedVal); | ||
half2 unpackHalf2x16ToHalf(uint packedVal); | ||
``` | ||
### Floating-point Pack Functions | ||
|
||
A set of pack intrinsics are added to pack a vector of unorm, snorm, or standard floats and halves to a 32-bit integer of packed 8-bit or 16-bit float chunks. Each vector element is converted | ||
to an 8-bit or 16-bit integer chunk through conversion rules, then packed into one 32-bit integer value. | ||
|
||
``` | ||
uint packUnorm4x8(float4 unpackedVal); | ||
uint packUnorm4x8(half4 unpackedVal); | ||
uint packSnorm4x8(float4 unpackedVal); | ||
uint packSnorm4x8(half4 unpackedVal); | ||
uint packUnorm2x16(float2 unpackedVal); | ||
uint packUnorm2x16(half2 unpackedVal); | ||
uint packSnorm2x16(float2 unpackedVal); | ||
uint packSnorm2x16(half2 unpackedVal); | ||
uint packHalf2x16(float2 unpackedVal); | ||
uint packHalf2x16(half2 unpackedVal); | ||
``` | ||
|
||
### Integer Unpack Functions | ||
|
||
A set of unpack intrinsics are added to decompose a 32-bit integer containing four packed 8-bit signed or unsigned integer values and reinterpret them as vectors of 16-bit or 32-bit integers. | ||
These intrinsics support sign-extension for signed integers and zero-extension for unsigned integers. | ||
|
||
``` | ||
uint32_t4 unpackUint4x8ToUint32(uint packedVal); | ||
uint16_t4 unpackUint4x8ToUint16(uint packedVal); | ||
int32_t4 unpackInt4x8ToInt32(uint packedVal); | ||
int16_t4 unpackInt4x8ToInt16(uint packedVal); | ||
``` | ||
|
||
### Integer Pack Functions | ||
|
||
A set of pack intrinsics are added to convert a vector of 16-bit or 32-bit signed or unsigned integers into a 32-bit packed representation, storing only the lower 8 bits of each value. | ||
The clamped variants ensure each 8-bit value is clamped to [0, 255] for unsigned values and [-128, 127] for signed values. | ||
|
||
``` | ||
uint packUint4x8(uint32_t4 unpackedVal); | ||
uint packUint4x8(uint16_t4 unpackedVal); | ||
uint packInt4x8(int32_t4 unpackedVal); | ||
uint packInt4x8(int16_t4 unpackedVal); | ||
uint packUint4x8Clamp(int32_t4 unpackedVal); | ||
uint packUint4x8Clamp(int16_t4 unpackedVal); | ||
uint packInt4x8Clamp(int32_t4 unpackedVal); | ||
uint packInt4x8Clamp(int16_t4 unpackedVal); | ||
``` | ||
|
||
### Implementation Details | ||
|
||
#### Normalized float conversion rules | ||
Normalized float conversion rules are standard across GLSL/SPIRV, Metal, and WGSL. Slang follows these standards. The conversion rules for each target are detailed in: | ||
- Section 8.4, `Floating-Point Pack and Unpack Functions`, of the GLSL language specification, which is also used by the SPIR-V extended instrucsion for GLSL. | ||
- Section 7.7.1, `Conversion Rules for Normalized Integer Pixel Data Types`, of the Metal Shading language specification. | ||
- Sections [16.9 Data Packing Built-in Functions](https://www.w3.org/TR/WGSL/#pack-builtin-functions) and [16.10 Data Unpacking Built-in Functions](https://www.w3.org/TR/WGSL/#unpack-builtin-functions) of the WebGPU Shading language specification. | ||
|
||
#### Targets without built-in intrinsics | ||
For targets without built-in intrinsics, the implementation is done manually using a combination of arithmetic and bitwise operations. | ||
|
||
#### Built-in packed Datatypes | ||
Unlike HLSL's implementation, which introduces new packed datatypes (uint8_t4_packed and int8_t4_packed), unsigned 32-bit integers are used directly, and no new packed datatypes are introduced. |