v1.10.0 release

@Anerudhan Anerudhan released this 28 Jan 05:43
91b7532

cuDNN frontend v1.10 release notes

cuDNN frontend v1.10 is the preferred cuDNN frontend for
cuDNN backend 9.7.0 and later, as it adds Blackwell-specific
features.

New API

  • cuDNN frontend v1.10 introduces two new operators,
    block_scale_quantize and block_scale_dequantize, to specify the scaling
    and de-scaling of low-precision datatypes supported on Blackwell GPUs
    and later.

  • create_execution_plan(int64_t const engine_id, std::unordered_map<KnobType_t, int64_t> const &knobs) allows creation
    of a custom execution plan with a hardcoded engine and knobs. Added a
    sample in samples/cpp/misc/custom_plan.cpp to showcase how to work
    with different engines and knobs.
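To illustrate the idea behind block_scale_quantize and block_scale_dequantize, here is a minimal host-side sketch of block scaling in plain C++. It is not the cuDNN frontend API: the names BlockQuantized, block_scale_quantize_ref, and block_scale_dequantize_ref are hypothetical, and int8_t is used as a stand-in for the low-precision datatypes that Blackwell supports. The only point it makes is that each block of elements shares one scale factor, which is applied on quantize and undone on dequantize.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for illustration only; not a cuDNN frontend type.
struct BlockQuantized {
    std::vector<int8_t> values;  // one quantized element per input element
    std::vector<float> scales;   // one scale factor per block
    std::size_t block_size;      // number of elements sharing a scale (must be > 0)
};

// Quantize: every block of `block_size` elements gets its own scale, chosen so
// that the largest magnitude in the block maps to 127 (assumed int8 stand-in).
inline BlockQuantized block_scale_quantize_ref(const std::vector<float>& x,
                                               std::size_t block_size) {
    BlockQuantized out{std::vector<int8_t>(x.size()), {}, block_size};
    for (std::size_t start = 0; start < x.size(); start += block_size) {
        const std::size_t end = std::min(start + block_size, x.size());
        float max_abs = 0.0f;
        for (std::size_t i = start; i < end; ++i) {
            max_abs = std::max(max_abs, std::fabs(x[i]));
        }
        // An all-zero block keeps scale 1 to avoid dividing by zero.
        const float scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
        out.scales.push_back(scale);
        for (std::size_t i = start; i < end; ++i) {
            const float q = std::round(x[i] / scale);
            out.values[i] = static_cast<int8_t>(std::max(-127.0f, std::min(127.0f, q)));
        }
    }
    return out;
}

// Dequantize: multiply each element by the scale of the block it belongs to.
inline std::vector<float> block_scale_dequantize_ref(const BlockQuantized& q) {
    std::vector<float> x(q.values.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        x[i] = static_cast<float>(q.values[i]) * q.scales[i / q.block_size];
    }
    return x;
}
```

Because the scale is chosen per block, a block containing only small values keeps more relative precision than it would under a single tensor-wide scale. The actual element and scale datatypes, block sizes, and rounding behavior of the cuDNN operators are defined by the library and may differ from this sketch.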

Improvements

  • Users can now query behavior notes of a particular execution plan
    using get_behavior_notes(std::vector<BehaviorNote_t> &notes) const and
    get_behavior_notes_for_plan_at_index(int64_t const index, std::vector<BehaviorNote_t> &notes) const functions.

  • SDPA operations now accept both left and right window sizes with
    respect to the diagonal. See Attention.md for more details.

  • SDPA operations now accept a diagonal alignment for the attention
    score matrix, used to describe the window above. When s_q != s_kv
    and the causal mask is on, this can be used to specify whether the
    diagonal is top left or bottom right.

  • Bottom right causal masking can now be enabled on the sdpa_fp8
    operation.
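The interaction between window sizes and diagonal alignment can be sketched as a plain C++ predicate over the s_q x s_kv score matrix. This is an illustration under stated assumptions, not the cuDNN implementation: the enum DiagonalAlignment and the functions diagonal_column and is_attended are hypothetical names, and the inclusive window bounds are an assumed convention (Attention.md remains the reference for the exact boundary semantics).

```cpp
#include <cstdint>

// Hypothetical enum for illustration; not the cuDNN frontend type.
enum class DiagonalAlignment { TopLeft, BottomRight };

// Column index of the diagonal for query row i.
// Top left: the diagonal starts at element (0, 0).
// Bottom right: the diagonal ends at element (s_q - 1, s_kv - 1).
inline int64_t diagonal_column(int64_t i, int64_t s_q, int64_t s_kv,
                               DiagonalAlignment alignment) {
    return alignment == DiagonalAlignment::TopLeft ? i : i + (s_kv - s_q);
}

// Assumed convention: query row i attends to key column j when j lies within
// `left` columns to the left and `right` columns to the right of the diagonal,
// both bounds inclusive. A causal mask corresponds to right == 0 with a left
// window large enough to cover every earlier column.
inline bool is_attended(int64_t i, int64_t j, int64_t s_q, int64_t s_kv,
                        DiagonalAlignment alignment, int64_t left, int64_t right) {
    const int64_t d = diagonal_column(i, s_q, s_kv, alignment);
    return j >= d - left && j <= d + right;
}
```

For example, with s_q = 2 and s_kv = 4 under a causal mask, top-left alignment lets query row 0 see only key column 0, while bottom-right alignment lets it see columns 0 through 2, so that the last query row sees all four keys.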

Bug fixes

  • Fixed a regression in cuDNN frontend v1.9.0 where the softmax node
    would override user-set dims and strides for softmax_stats and m_zinv.
    This also affected the sdpa_forward and sdpa_fp8_forward nodes.

New samples

  • Added an example to showcase how native CUDA graphs can be constructed
    from the SDPA operation graph.