Releases: DefTruth/CUDA-Learn-Notes
v2.4.15 HGEMM Up to 115 TFLOPS
What's Changed
Full Changelog: v2.4.13...v2.4.15
v2.4.13 HGEMM Up to 113 TFLOPS
What's Changed
- [Mat][Trans] Add f32/f32x4 row/col first kernel by @bear-zd in #89
- [Docs][Contribute] Add How to contribute Notes by @DefTruth in #90
- [HGEMM] optimize SMEM padding, up to 113 TFLOPS by @DefTruth in #92 (see sketch below)
- [Mat][Trans] Add f32x4_shared/bcf row/col first kernel. by @bear-zd in #91
- [Docs] rename mat_transpose -> mat-transpose by @DefTruth in #93
- [HGEMM] Add GeForce RTX 3080 Laptop benchmark by @DefTruth in #94
- [HGEMM] update HGEMM benchmark option by @DefTruth in #95
- [HGEMM] Refactor HGEMM WMMA 16x16x16 kernels by @DefTruth in #96
- [HGEMM] Update HGEMM WMMA Benchmark by @DefTruth in #97
Full Changelog: v2.4.12...v2.4.13
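
Both the bank-conflict-free (bcf) transpose kernels in #91 and the SMEM padding optimization in #92 rest on the same trick: pad each shared-memory row so that column-wise accesses spread across the 32 banks instead of serializing on one. A minimal sketch on the classic f32 transpose case; the 32x32 tile and kernel name are illustrative assumptions, not this release's actual kernels:

```cuda
#include <cuda_runtime.h>

// Launch with grid (n/32, n/32) and block dim3(32, 32); n divisible by 32.
__global__ void transpose_padded_sketch(const float* __restrict__ in,
                                        float* __restrict__ out, int n) {
  // The +1 pad makes the row stride 33 floats, so a warp reading a column
  // (fixed threadIdx.y, threadIdx.x = 0..31) hits 32 distinct banks.
  __shared__ float tile[32][32 + 1];
  int x = blockIdx.x * 32 + threadIdx.x;
  int y = blockIdx.y * 32 + threadIdx.y;
  tile[threadIdx.y][threadIdx.x] = in[y * n + x];   // coalesced global load
  __syncthreads();
  x = blockIdx.y * 32 + threadIdx.x;                // swap block coordinates
  y = blockIdx.x * 32 + threadIdx.y;
  out[y * n + x] = tile[threadIdx.x][threadIdx.y];  // conflict-free column read
}
```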
v2.4.12 SGEMM TF32 Swizzle
What's Changed
- [SGEMM] SGEMM TF32 Thread Block Swizzle by @DefTruth in #84 (see sketch below)
- [HGEMM] mma4x4_warp4x4_stages with swizzle by @DefTruth in #86
- [SWISH] support Swish F32/F16 kernel by @wangzijian1010 in #85
- [SGEMM] Update SGEMM TF32 Benchmark by @DefTruth in #87
New Contributors
- @wangzijian1010 made their first contribution in #85
Full Changelog: v2.4.11...v2.4.12
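
The thread block swizzle from #84 remaps the linear block index so that blocks scheduled together compute nearby output tiles and reuse each other's B tiles through L2. A sketch of one common grouped ordering; GROUP_M and the exact mapping are assumptions, not necessarily the repo's scheme:

```cuda
// Walk the C tile grid GROUP_M tile-rows at a time: blocks with consecutive
// ids land in the same row group and share columns of B in L2.
__device__ __forceinline__ void swizzle_tile(int grid_m, int grid_n,
                                             int* bm, int* bn) {
  const int GROUP_M = 8;                              // assumed group height
  int bid     = blockIdx.y * gridDim.x + blockIdx.x;  // linear block id
  int per_grp = GROUP_M * grid_n;                     // blocks per group
  int group   = bid / per_grp;
  int first_m = group * GROUP_M;
  int size_m  = min(grid_m - first_m, GROUP_M);       // last group may be short
  *bm = first_m + (bid % per_grp) % size_m;           // swizzled tile row
  *bn = (bid % per_grp) / size_m;                     // swizzled tile column
}
```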
v2.4.11 HGEMM Block Swizzle
v2.4.10 SGEMM TF32 Stage 2/3
What's Changed
- [HGEMM] HGEMM WMMA Stage mma4x2+warp4x4 by @DefTruth in #76
- [SGEMM] Add SGEMM WMMA TF32 Stage2/3 by @DefTruth in #77
- [SGEMM] Add cuBLAS SGEMM F32/TF32 baseline by @DefTruth in #78
- [SGEMM] Add Kernel cudaFuncSetAttribute hint by @DefTruth in #79
- [RoPE] Add minimal RoPE f32/f32x4 pack impl by @bear-zd in #80 (see sketch below)
Full Changelog: v2.4.9...v2.4.10
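
The minimal RoPE from #80 boils down to rotating each adjacent pair of features by a position-dependent angle. A sketch of the f32 version under the usual assumptions (base 10000, even dim, flat seq_len x dim layout); the f32x4 variant packs four floats per load:

```cuda
#include <math.h>

// One thread per (position, pair); launch with >= seq_len * dim / 2 threads.
__global__ void rope_f32_sketch(const float* __restrict__ x,
                                float* __restrict__ y,
                                int seq_len, int dim) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  int half_dim = dim / 2;
  if (idx >= seq_len * half_dim) return;
  int pos = idx / half_dim;
  int i   = idx % half_dim;
  // theta_i = pos * 10000^(-2i/dim), the standard RoPE frequency schedule.
  float theta = pos * powf(10000.0f, -2.0f * i / dim);
  float c = cosf(theta), s = sinf(theta);
  float x0 = x[pos * dim + 2 * i];
  float x1 = x[pos * dim + 2 * i + 1];
  y[pos * dim + 2 * i]     = x0 * c - x1 * s;  // 2D rotation of the pair
  y[pos * dim + 2 * i + 1] = x0 * s + x1 * c;
}
```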
v2.4.9 HGEMM WMMA Stage
What's Changed
- [HGEMM] Add HGEMM WMMA Double Buffers by @DefTruth in #69
- [Embedding] Add embedding kernel f32/x4/x4_pack, f16/x8/x8_pack by @bear-zd in #68
- [HGEMM] Add HGEMM mma4x2, warp2x4x2 kernel by @DefTruth in #70
- [HGEMM] HGEMM WMMA with Reg double buffers by @DefTruth in #71
- [HGEMM] Add HGEMM WMMA Stage 3/4 Kernel by @DefTruth in #74
- [Softmax] Add online softmax f32x4 pack kernel by @bear-zd in #73 (see sketch below)
- [HGEMM][Bugfix] fix HGEMM Stage cp.async error by @DefTruth in #75
Full Changelog: v2.4.8...v2.4.9
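
Online softmax (#73) fuses the usual max and sum passes: keep a running max m and a sum d that is rescaled by exp(m_old - m_new) whenever the max grows. A sketch with one thread per row for clarity; the release's kernel additionally packs f32x4 and reduces across threads (assumptions here):

```cuda
#include <math.h>
#include <float.h>

__global__ void online_softmax_sketch(const float* __restrict__ x,
                                      float* __restrict__ y,
                                      int rows, int cols) {
  int r = blockIdx.x * blockDim.x + threadIdx.x;
  if (r >= rows) return;
  float m = -FLT_MAX, d = 0.0f;
  for (int c = 0; c < cols; ++c) {              // single pass: max + sum together
    float v = x[r * cols + c];
    float m_new = fmaxf(m, v);
    d = d * expf(m - m_new) + expf(v - m_new);  // rescale old sum if max grew
    m = m_new;
  }
  for (int c = 0; c < cols; ++c) {
    y[r * cols + c] = expf(x[r * cols + c] - m) / d;
  }
}
```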
v2.4.8 HGEMM WMMA Part-1
What's Changed
- [GELU] Add f32/x4, f16/x2/x8/x8pack kernel. by @bear-zd in #66
- [HGEMM] HGEMM Tensor Cores Support Part-1 by @DefTruth in #67 (see sketch below)
Full Changelog: v2.4.7...v2.4.8
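
Part-1 of Tensor Cores support (#67) targets the WMMA API, where one warp cooperatively multiplies 16x16x16 half tiles. A minimal single-tile sketch (sm_70+); the row-major layouts and half accumulator are illustrative choices:

```cuda
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// One warp computes C(16x16) = A(16x16) * B(16x16); launch with <<<1, 32>>>.
__global__ void wmma_m16n16k16_sketch(const half* a, const half* b, half* c) {
  wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
  wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> fb;
  wmma::fragment<wmma::accumulator, 16, 16, 16, half> fc;
  wmma::fill_fragment(fc, __float2half(0.0f));
  wmma::load_matrix_sync(fa, a, 16);   // leading dimension = 16
  wmma::load_matrix_sync(fb, b, 16);
  wmma::mma_sync(fc, fa, fb, fc);      // fc += fa * fb on Tensor Cores
  wmma::store_matrix_sync(c, fc, 16, wmma::mem_row_major);
}
```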
v2.4.7 SGEMM Copy Async
What's Changed
- [SGEMM][Async] Add naive copy async SGEMM by @DefTruth in #64 (see sketch below)
- [SGEMM][Async] Add K16 + Copy Async Kernel by @DefTruth in #65
Full Changelog: v2.4.6...v2.4.7
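
The copy-async kernels use Ampere's cp.async, which copies global memory straight into shared memory without staging through registers, letting loads overlap compute. A minimal inline-PTX sketch (sm_80+); the helper names and 16-byte .cg variant are illustrative:

```cuda
#include <cuda_runtime.h>

__device__ __forceinline__ void cp_async_16B(void* smem_dst, const void* gmem_src) {
  unsigned dst = static_cast<unsigned>(__cvta_generic_to_shared(smem_dst));
  asm volatile("cp.async.cg.shared.global [%0], [%1], 16;\n"
               :: "r"(dst), "l"(gmem_src));
}

// Launch with one block of 128 threads; copies 128 float4 (2 KiB).
__global__ void copy_async_sketch(const float4* __restrict__ g,
                                  float4* __restrict__ out) {
  __shared__ float4 s[128];
  cp_async_16B(&s[threadIdx.x], &g[threadIdx.x]);
  asm volatile("cp.async.commit_group;\n");  // close the batch of copies
  asm volatile("cp.async.wait_group 0;\n");  // wait for all batches to land
  __syncthreads();
  out[threadIdx.x] = s[threadIdx.x];
}
```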
v2.4.6 HGEMM Copy Async
v2.4.5 HGEMM Double Buffers
What's Changed
- [FlashAttention] Refactor FlashAttention PyTorch bindings by @DefTruth in #55
- [SGEMM] test bank conflicts free with smem offset by @DefTruth in #56
- [HGEMM] HGEMM kernel with double buffers by @DefTruth in #57 (see sketch below)
- [Docs] Add docs for HGEMM/SGEMM double buffers by @DefTruth in #58
- [HGEMM] Add PyTorch HGEMM profile by @DefTruth in #59
Full Changelog: v2.4.4...v2.4.5
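
Double buffering (#57) alternates two shared-memory tiles so the global load of tile k+1 overlaps the math on tile k, hiding memory latency behind compute. A structural sketch; the tile size and the elided FMA work are assumptions:

```cuda
#include <cuda_fp16.h>

#define TILE 1024  // elements per K-tile, illustrative

// `a` holds k_tiles consecutive tiles; launch with one block, e.g. 256 threads.
__global__ void double_buffer_sketch(const half* __restrict__ a,
                                     int k_tiles, half* __restrict__ out) {
  __shared__ half s[2][TILE];
  int tid = threadIdx.x;
  for (int i = tid; i < TILE; i += blockDim.x) s[0][i] = a[i];  // stage tile 0
  __syncthreads();
  for (int k = 0; k < k_tiles; ++k) {
    int cur = k & 1, nxt = cur ^ 1;
    if (k + 1 < k_tiles) {  // prefetch tile k+1 into the buffer not in use
      const half* next = a + (size_t)(k + 1) * TILE;
      for (int i = tid; i < TILE; i += blockDim.x) s[nxt][i] = next[i];
    }
    // ... FMA work on s[cur] would go here ...
    __syncthreads();  // prefetch and compute both done before buffers swap
  }
  if (tid == 0) out[0] = s[0][0];  // keep the stores observable
}
```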