cuDNN Frontend v1.10 release notes
cuDNN Frontend v1.10 is the preferred frontend for cuDNN backend 9.7.0
and later, as it adds support for Blackwell-specific features.
New API
- cuDNN Frontend v1.10 introduces two new operators,
  block_scale_quantize and block_scale_dequantize, to specify the
  scaling and de-scaling of the low-precision datatypes supported from
  Blackwell GPUs onwards.
- create_execution_plan(int64_t const engine_id, std::unordered_map<KnobType_t, int64_t> const &knobs)
  allows creation of a custom execution plan with a hardcoded engine and
  knobs. A sample in samples/cpp/misc/custom_plan.cpp shows how to work
  with different engines and knobs.
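To make the block-scaling idea concrete, here is a minimal C++ sketch. It is not the cuDNN implementation: it assumes one float scale per block of consecutive values and int8 storage in [-127, 127], whereas the Blackwell low-precision formats use their own element and scale datatypes. The struct BlockScaled and the functions block_quantize and block_dequantize are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for block-scaled data (illustration only).
// Assumption: elements are stored as int8 in [-127, 127] with one float
// scale per block; real Blackwell formats use different datatypes.
struct BlockScaled {
    std::vector<int8_t> values;  // quantized elements
    std::vector<float> scales;   // one scale per block
    std::size_t block_size;
};

// Quantize: each block gets scale = amax / 127, so its largest magnitude
// maps to +/-127.
BlockScaled block_quantize(const std::vector<float>& x, std::size_t block_size) {
    assert(block_size > 0);
    constexpr float kQMax = 127.0f;
    BlockScaled out{std::vector<int8_t>(x.size()), {}, block_size};
    for (std::size_t start = 0; start < x.size(); start += block_size) {
        const std::size_t end = std::min(start + block_size, x.size());
        float amax = 0.0f;
        for (std::size_t i = start; i < end; ++i) amax = std::max(amax, std::fabs(x[i]));
        // An all-zero block gets scale 1 to avoid dividing by zero.
        const float scale = amax > 0.0f ? amax / kQMax : 1.0f;
        out.scales.push_back(scale);
        for (std::size_t i = start; i < end; ++i) {
            const float q = std::round(x[i] / scale);
            out.values[i] = static_cast<int8_t>(std::clamp(q, -kQMax, kQMax));
        }
    }
    return out;
}

// De-quantize: multiply each element by the scale of the block it belongs to.
std::vector<float> block_dequantize(const BlockScaled& b) {
    std::vector<float> y(b.values.size());
    for (std::size_t i = 0; i < y.size(); ++i)
        y[i] = static_cast<float>(b.values[i]) * b.scales[i / b.block_size];
    return y;
}
```

Because every block has its own scale, the round-trip error of an element is bounded by about half of its block's scale, and a single large outlier only coarsens the precision of its own block rather than the whole tensor.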
Improvements
- Users can now query the behavior notes of a particular execution plan
  using the get_behavior_notes(std::vector<BehaviorNote_t> &notes) const
  and get_behavior_notes_for_plan_at_index(int64_t const index, std::vector<BehaviorNote_t> &notes) const
  functions.
- SDPA operations now accept both a left and a right window size with
  respect to the diagonal. See Attention.md for more details.
- SDPA operations now accept a diagonal alignment for the attention
  score matrix, used to describe the window above. When s_q != s_kv and
  the causal mask is on, this can be used to specify whether the
  diagonal is top left or bottom right.
- Bottom-right causal masking can now be enabled on the sdpa_fp8
  operation.
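The window and alignment options can be pictured as a mask over the s_q x s_kv attention score matrix. The C++ sketch below illustrates this under assumed conventions and is not cuDNN code: the enum Align and the function is_visible are hypothetical, the diagonal for query row q is assumed to sit at key column q (top left) or q + (s_kv - s_q) (bottom right), and a key is assumed visible when diag - left <= kv <= diag + right, with a negative bound meaning that side is unbounded. Attention.md defines the exact semantics, including whether the bounds are inclusive, which may differ from this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two diagonal alignments; not the
// cudnn_frontend type.
enum class Align { TopLeft, BottomRight };

// Returns true if query row q may attend to key column kv in an
// s_q x s_kv score matrix. Assumed conventions (illustration only):
//   - the diagonal for row q is at column q (TopLeft) or at
//     q + (s_kv - s_q) (BottomRight);
//   - key kv is visible when diag - left <= kv <= diag + right;
//   - a negative left or right bound means that side is unbounded.
// A causal mask corresponds to right = 0 with an unbounded left side.
bool is_visible(int64_t q, int64_t kv, int64_t s_q, int64_t s_kv,
                int64_t left, int64_t right, Align align) {
    assert(s_q > 0 && s_kv > 0);
    if (q < 0 || q >= s_q || kv < 0 || kv >= s_kv) return false;
    const int64_t diag = (align == Align::TopLeft) ? q : q + (s_kv - s_q);
    if (left >= 0 && kv < diag - left) return false;
    if (right >= 0 && kv > diag + right) return false;
    return true;
}
```

For example, with s_q = 2, s_kv = 4, an unbounded left side and right = 0 (a causal mask), top-left alignment lets row 0 see only key 0 and row 1 see keys 0 and 1, while bottom-right alignment lets row 0 see keys 0 to 2 and row 1 see all four keys. With bottom-right alignment, additionally setting left = 1 turns this into a sliding window in which each row sees only its diagonal key and the one before it.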
Bug fixes
- Fixed a regression in cuDNN Frontend v1.9.0 where the softmax node
  would override user-set dims and strides for softmax_stats and m_zinv.
  This also affected the sdpa_forward and sdpa_fp8_forward nodes.
New samples
- Added an example showing how native CUDA graphs can be constructed
  from the SDPA operation graph.