AMX Extension #93

Draft

marenz2569 wants to merge 44 commits into master from cyrill.amx_integration

Member

marenz2569 commented Nov 22, 2024

@cyssi-cb I merged the refactoring branch into this PR. You will probably want to reopen this PR under your own user account. You should have permission to commit to this branch directly. I'll leave the comments for you here, but I can also re-add them to a new PR.

cyssi-cb added 30 commits

September 20, 2024 12:54


          [ADD] AMX implementation and Sapphire rapids config

95f1c83


          [FIX] update asmjit calls

1fdf36f


          [FIX] include Sapphire Rapids config

2b9adf9


          [FIX] asmjit call in sapphire rapids config

a91ea44


          [FIX] add new files to cmake

e2a8731


          [FIX] adapted workload to new asmjit api

529bb7f


          [FIX] adapted workload to new asmjit api

b3b94f8


          [FIX] add missing compiler flags

8e8e82d


          [FIX] register SApphire Rapids config

892f234


          [REMOVED] unneded prints

5c5f935


          [ADD] use bf16

edf806c


          [FIX] typo

af302d6


          [ADD] limit init value

7caf6cb


          [REMOVED] unnecessary defines

38fe61d


          [ADD] use AVX512 config with AMX


          [FIX] correct CPUID model, function as static, logging behavior

9fc69b6


          [FIX] includes

b5fa958


          [FIX] includes

fb60acd


          [ADD] check for AMX during compilePayload


          [FIX] spelling

ee707cf


          [FIX] merge AMX into AVX512 workload with runtime check for AMX feature

6e66d26


          [FIX] CMakeLists

824520c


          [FIX] naming convention

87d6af9


          [FIX] spelling

b775b8c


          [FIX] spelling

ce86848


          [FIX] spelling

3a789bf


          [REMOVE] unnecesary payload definition, now merged into AVX512Payload…

83e91eb

….hpp


          Merge branch 'master' into amx

fdf01db

Added requested changes in master. Update PR branch.


          [FIX] Cmake

ce35b28


          [FIX] move __tilecfg definition into header

b1577ce

marenz2569 added 9 commits

November 22, 2024 13:51


          Merge remote-tracking branch 'origin/master' into cyrill.amx_integration

15822f2


          Squash merge branch code-style-enforcing into cyrill.amx_integration

31226ad


          Merge branch 'code-style-enforcing' into cyrill.amx_integration

e73d1ef


          reduce diff of merge

27ce53e


          Specialize AVX512 Payload for AMX extension

98efa54


          remove AMX instruction from default AVX512 payload

345ed8e


          fix merge error

0396c64


          minimize diff

c4e8ca6


          cleanup include

287d87a

marenz2569 commented

CMakeLists.txt

@@ -57,7 +57,7 @@ git_submodule_update()
               if("${CMAKE_CXX_COMPILER_ID}" STREQUAL "MSVC")
               else()
-              SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -Wextra -O2 -fdata-sections -ffunction-sections")
+              SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mamx-tile -Wall -Wextra -O2 -fdata-sections -ffunction-sections")

Member Author

marenz2569 Nov 22, 2024

This change will no longer be necessary once the ldtilecfg instruction is integrated into the assembler kernel.

src/firestarter/Environment/X86/Payload/AVX512Payload.cpp Outdated

Comment on lines 451 to 463

+                // Create tile_cfg, fill it and return
+                int i;
+                tileinfo->palette_id = 1;
+                tileinfo->start_row = 0;
+                for (i = 0; i < 8; ++i) {
+                  tileinfo->colsb[i] = MAX_COLS;
+                  tileinfo->rows[i] = MAX_ROWS;
+                }
+                _tile_loadconfig(tileinfo);
+              }

Member Author

marenz2569 Nov 22, 2024

This function can be integrated into the assembler kernel. This will also require that the tile config struct is passed to this kernel via the C-style interface. The pointer to the value returned by the getMemoryAddress function in https://github.com/tud-zih-energy/FIRESTARTER/blob/code-style-enforcing/include/firestarter/LoadWorkerMemory.hpp is saved in PointerReg. Also take a look at the X86Payload::emitDumpRegisterCode function to see how this memory is addressed in the assembler payload.
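A minimal sketch of the 64-byte memory operand that ldtilecfg reads, which is the data that would have to live in the worker memory if the configuration moves into the kernel. The type name `TileConfig`, the helper `makeMaxTileConfig`, and the values 64 and 16 (assumed here to correspond to MAX_COLS and MAX_ROWS for palette 1) are illustrative assumptions and not taken from the PR.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical struct name; the field order follows the 64-byte layout
// documented for the ldtilecfg memory operand.
struct TileConfig {
  std::uint8_t PaletteId;     // byte 0
  std::uint8_t StartRow;      // byte 1
  std::uint8_t Reserved[14];  // bytes 2..15, must be zero
  std::uint16_t Colsb[16];    // bytes 16..47, bytes per row of each tile
  std::uint8_t Rows[16];      // bytes 48..63, number of rows of each tile
};

static_assert(sizeof(TileConfig) == 64, "ldtilecfg reads exactly 64 bytes");
static_assert(offsetof(TileConfig, Colsb) == 16, "colsb starts at byte 16");
static_assert(offsetof(TileConfig, Rows) == 48, "rows start at byte 48");

// Configure the eight tiles with 16 rows of 64 bytes each (assumed values).
inline TileConfig makeMaxTileConfig() {
  TileConfig Cfg{};  // value-initialization zeroes the reserved bytes
  Cfg.PaletteId = 1;
  Cfg.StartRow = 0;
  for (int I = 0; I < 8; ++I) {
    Cfg.Colsb[I] = 64;
    Cfg.Rows[I] = 16;
  }
  return Cfg;
}
```

With the layout fixed by `static_assert`, the struct can be copied into a buffer whose address the kernel receives, independent of any compiler intrinsic.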

src/firestarter/Environment/X86/Payload/AVX512Payload.cpp Outdated

+                long rc;
+                unsigned long bitmask;
+                rc = syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, XFEATURE_XTILEDATA);

Member Author

marenz2569 Nov 22, 2024

I assume that the syscall is required to enable AMX at the OS level. However, this will cause the build to fail on Windows and macOS. Please guard this code with a Linux ifdef and call workerLog::fatal on Windows/macOS.
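One way the requested guard could look, as a sketch only. The function name `requestAmxPermission` is hypothetical, and the two constants are defined locally (with the values used by the Linux arch_prctl interface) on the assumption that older system headers may not provide them. The non-Linux branch simply reports failure so that the caller can decide how to log it.

```cpp
#if defined(__linux__) && defined(__x86_64__)
#include <sys/syscall.h>
#include <unistd.h>
#endif

// Values of ARCH_REQ_XCOMP_PERM and XFEATURE_XTILEDATA in the Linux
// arch_prctl interface, defined locally for this sketch.
constexpr long ArchReqXcompPerm = 0x1023;
constexpr long XfeatureXtiledata = 18;

// Hypothetical helper: returns true if the OS granted permission to use
// AMX tile data, false if the request failed or is not implemented here.
inline bool requestAmxPermission() {
#if defined(__linux__) && defined(__x86_64__)
  return syscall(SYS_arch_prctl, ArchReqXcompPerm, XfeatureXtiledata) == 0;
#else
  // Other platforms: no permission request is made in this sketch. The caller
  // would report this, for example through workerLog::fatal as suggested above.
  return false;
#endif
}
```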

Member Author

marenz2569 Nov 24, 2024

Looking at the PR microsoft/onnxruntime#14042, it seems that AMX is simply supported on Windows. The only open question is whether the compiler sets some flag in the binary for the operating system, or whether it also just works with JIT-generated assembler code.

src/firestarter/Environment/X86/Payload/AVX512Payload.cpp Outdated

+                request_permission();
+                create_AMX_config(&tile_data); // Create tilecfg and fill it
+                static bool init = true;

Member Author

marenz2569 Nov 22, 2024

unused variable

src/firestarter/Environment/X86/Payload/AVX512Payload.cpp Outdated

Comment on lines 486 to 510

+                // Initialize buffer with random values
+                // Multiplication always produces either 1 or -1
+                // Accumulation operation always on (1 + -1) = 0 ensures stable values
+                __bfloat16* buf1 = (__bfloat16*)src1;
+                __bfloat16* buf2 = (__bfloat16*)src2;
+                // TODO: Change MAX_ROWS/MAXC_COLS from constant to maximum size check by asmJit
+                //	   Currently not supported by asmJit
+                //	   Alternative: Manually parse CPUID
+                for (int i = 0; i < MAX_ROWS; i++) {
+                  __bfloat16 random_init = (__bfloat16)(rand() % 65536); // Limit maximum size as 1/x needs to fit bfloat16
+                  for (int j = 0; j < MAX_COLS; j++) {
+                    buf1[i * MAX_COLS + j] = (__bfloat16)(random_init);
+                    if (!(j % 2)) {
+                      buf2[i * MAX_COLS + j] = (__bfloat16)((-1) / random_init);
+                    } else if (j % 2) {
+                      buf2[i * MAX_COLS + j] = (__bfloat16)(1 / random_init);
+                    }
+                  }
+                }
+              }

Member Author

marenz2569 Nov 22, 2024

This function should be called in the AVX512WithAMXPayload::init function, which should override AVX512Payload::init and call it. The pointer to memory is the same as in the assembler kernel.
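The override-and-call pattern described here can be sketched with stand-in classes. `BasePayload`, `AmxPayload`, the `init` signature, and the constant `TileElements` are all hypothetical and only illustrate the call order; they are not the FIRESTARTER classes.

```cpp
#include <cstddef>

// Hypothetical stand-in for the AVX512 payload.
struct BasePayload {
  virtual ~BasePayload() = default;

  // Generic initialization of the whole buffer.
  virtual void init(double* Memory, std::size_t Count) const {
    for (std::size_t I = 0; I < Count; ++I) {
      Memory[I] = 1.0;
    }
  }
};

// Hypothetical stand-in for the AMX specialization.
struct AmxPayload : BasePayload {
  static constexpr std::size_t TileElements = 4;  // assumed size of the AMX region

  void init(double* Memory, std::size_t Count) const override {
    // First run the base class initialization on the same memory ...
    BasePayload::init(Memory, Count);
    // ... then apply the AMX-specific initialization. Zeroing the leading
    // region is a placeholder for the real buffer setup.
    for (std::size_t I = 0; I < TileElements && I < Count; ++I) {
      Memory[I] = 0.0;
    }
  }
};
```

Because `init` is virtual, code that only holds a reference to the base class still runs the AMX-specific part when the object is an `AmxPayload`.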

src/firestarter/Environment/X86/Payload/AVX512Payload.cpp Outdated

+                  workerLog::error() << "prctl(ARCH_GET_XCOMP_PERM) error: " << rc;
+                }
+                if (bitmask & XFEATURE_MASK_XTILE) {

Member Author

marenz2569 Nov 22, 2024

Should this check whether XFEATURE_MASK_XTILEDATA is set? The current check returns true if either or both of XFEATURE_MASK_XTILECFG and XFEATURE_MASK_XTILEDATA are set.
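The difference between the two checks can be shown on plain integers. The mask values below use bit 17 for the tile configuration and bit 18 for the tile data, as in the Linux kernel; the function names are made up for this sketch.

```cpp
#include <cstdint>

constexpr std::uint64_t XfeatureMaskXtilecfg = 1ULL << 17;
constexpr std::uint64_t XfeatureMaskXtiledata = 1ULL << 18;
constexpr std::uint64_t XfeatureMaskXtile = XfeatureMaskXtilecfg | XfeatureMaskXtiledata;

// Equivalent to `if (bitmask & XFEATURE_MASK_XTILE)`: true if at least one
// of the two bits is set.
constexpr bool anyTileBitSet(std::uint64_t Bitmask) {
  return (Bitmask & XfeatureMaskXtile) != 0;
}

// True only if the tile data bit itself is set.
constexpr bool tileDataBitSet(std::uint64_t Bitmask) {
  return (Bitmask & XfeatureMaskXtiledata) != 0;
}

// True only if both bits are set.
constexpr bool bothTileBitsSet(std::uint64_t Bitmask) {
  return (Bitmask & XfeatureMaskXtile) == XfeatureMaskXtile;
}
```

A bitmask with only the tile configuration bit set passes the first check but fails the other two, which is the case the comment is concerned about.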

src/firestarter/Environment/X86/Payload/AVX512Payload.cpp Outdated

		}

		void AVX512Payload::init_buffer_rand(uintptr_t src1, uintptr_t src2) {

Member Author

marenz2569 Nov 22, 2024

The pointer parameters of this function should be __bfloat16*. However, this type may not be supported by all compilers. We might need to initialize this memory differently.
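One compiler-independent option, sketched under the assumption that the buffer may be treated as raw 16-bit storage: convert `float` values to bfloat16 bit patterns by hand and store them as `std::uint16_t`. The function names are hypothetical, and this is not how the PR initializes the buffers.

```cpp
#include <cstdint>
#include <cstring>

// Convert a float to the bit pattern of the nearest bfloat16 value
// (round to nearest, ties to even).
inline std::uint16_t floatToBf16Bits(float Value) {
  std::uint32_t Bits;
  std::memcpy(&Bits, &Value, sizeof(Bits));
  if ((Bits & 0x7fffffffU) > 0x7f800000U) {
    // NaN: truncate and set a mantissa bit so the result stays a NaN.
    return static_cast<std::uint16_t>((Bits >> 16) | 0x0040U);
  }
  // Add 0x7fff plus the lowest kept bit, then drop the lower 16 bits.
  const std::uint32_t RoundingBias = 0x7fffU + ((Bits >> 16) & 1U);
  return static_cast<std::uint16_t>((Bits + RoundingBias) >> 16);
}

// Expand a bfloat16 bit pattern back to float (exact).
inline float bf16BitsToFloat(std::uint16_t Half) {
  const std::uint32_t Bits = static_cast<std::uint32_t>(Half) << 16;
  float Value;
  std::memcpy(&Value, &Bits, sizeof(Value));
  return Value;
}
```

Since bfloat16 is the upper half of an IEEE 754 single-precision value, no special type is needed to fill the source buffers.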

src/firestarter/Environment/X86/Payload/AVX512Payload.cpp Outdated

Comment on lines 492 to 493

__bfloat16* buf2 = (__bfloat16*)src2;

Member Author

marenz2569 Nov 22, 2024

Please do not use C-style casts.
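For the conversion between an integer address and a buffer pointer, the named cast is `reinterpret_cast`. The sketch below uses `std::uint16_t` as a stand-in element type so that it compiles without `__bfloat16` support; the alias and function names are assumptions.

```cpp
#include <cstdint>

// Stand-in for the bfloat16 element type.
using Bf16Storage = std::uint16_t;

// Integer address to typed pointer, replacing `(T*)address`.
inline Bf16Storage* toBufferPointer(std::uintptr_t Address) {
  return reinterpret_cast<Bf16Storage*>(Address);
}

// Pointer to integer address, replacing `(uintptr_t)pointer`.
inline std::uintptr_t toAddress(const void* Pointer) {
  return reinterpret_cast<std::uintptr_t>(Pointer);
}
```

Value conversions such as `(__bfloat16)x` would use `static_cast` instead.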

src/firestarter/Environment/X86/Payload/AVX512Payload.cpp Outdated

Comment on lines 191 to 202

+                unsigned int aligned_alloc_size = static_cast<unsigned int>(MAX * sizeof(__bfloat16));
+                if (aligned_alloc_size % 1024) { // aligned_alloc expects size to be multiple of alignment (aka 1024)
+                  aligned_alloc_size = aligned_alloc_size + (1024 - (aligned_alloc_size % 1024));
+                }
+                src1 = (uintptr_t)aligned_alloc(1024, aligned_alloc_size);
+                src2 = (uintptr_t)aligned_alloc(1024, aligned_alloc_size);
+                src3 = (uint64_t)aligned_alloc(1024, aligned_alloc_size);
+                if (((void*)src1 == nullptr) || (void*)src2 == nullptr ||
+                    (void*)src3 == nullptr) { // uintptr_t garantuees we can cast it to void* and back
+                  std::cout << "[ERROR]: Allocation of source and target buffer for AMX failed. Aborting...\n";
+                  exit(1);
+                }

Member Author

marenz2569 Nov 22, 2024

This memory should be allocated in the LoadWorkerMemory class. A platform-independent aligned alloc abstraction is available there. You might need to change it slightly so that these variables are aligned to 1024 B instead of 64 B. This change will allocate these arrays not only for the AVX512/AMX payload but for all payloads. However, there should be no negative effect other than an increased allocated RAM size for all payloads.
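The size rounding in the quoted diff can be written as a small helper. This sketch uses `std::aligned_alloc` from C++17, which is not available with MSVC, so it only illustrates the arithmetic and is not a replacement for the abstraction in LoadWorkerMemory. Both function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdlib>

// Round Size up to the next multiple of Alignment (Alignment must be > 0).
constexpr std::size_t roundUpToMultiple(std::size_t Size, std::size_t Alignment) {
  return ((Size + Alignment - 1) / Alignment) * Alignment;
}

// Allocate at least Size bytes aligned to Alignment, which must be a power of
// two. Returns nullptr on failure; release the memory with std::free.
inline void* allocateAligned(std::size_t Alignment, std::size_t Size) {
  return std::aligned_alloc(Alignment, roundUpToMultiple(Size, Alignment));
}
```

Returning `nullptr` instead of calling `exit(1)` leaves the error handling and logging to the caller.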

include/firestarter/Environment/X86/Payload/AVX512WithAMXPayload.hpp

Comment on lines +33 to +38

+                AVX512WithAMXPayload() noexcept {
+                  // Enable the AMX instruction in the AVX512 Payload and request AMX_TILE and AMX_BF16 feature.
+                  addInstructionFlops("AMX", 512);
+                  addFeatureRequest(asmjit::CpuFeatures::X86::kAMX_TILE);
+                  addFeatureRequest(asmjit::CpuFeatures::X86::kAMX_BF16);
+                }

Member Author

marenz2569 Nov 22, 2024

I added this wrapper to the AVX512Payload to allow for checking the AMX_TILE and AMX_BF16 features.


          rename tileconfig

997565d

marenz2569 commented

include/firestarter/Environment/X86/Platform/SapphireRapidsConfig.hpp Outdated

Comment on lines 33 to 46

+                                        environment::payload::PayloadSettings(/*Threads=*/{1, 2},
+                                                                              /*DataCacheBufferSize=*/{32768, 1048576, 1441792},
+                                                                              /*RamBufferSize=*/1048576000, /*Lines=*/1536,
+                                                                              /*InstructionGroups=*/
+                                                                              {{"RAM_S", 3},
+                                                                               {"RAM_P", 1},
+                                                                               {"L3_S", 1},
+                                                                               {"L3_P", 1},
+                                                                               {"L2_S", 4},
+                                                                               {"L2_L", 70},
+                                                                               {"L1_S", 0},
+                                                                               {"L1_L", 40},
+                                                                               {"REG", 140},
+                                                                               {"AMX", 1}}),

Member Author

marenz2569 Nov 22, 2024

These values should be updated. For example, the L1 data cache size changed from SkylakeSP to Sapphire Rapids.

Member Author

marenz2569 commented Nov 23, 2024

After taking another look, you will also need to specialize the compilePayload function so the call to CompiledX86Payload::create<AVX512Payload>(Stats, Code) uses AVX512WithAMXPayload in case of AMX and AVX512Payload otherwise. This will also cause the correct overloaded init function to be used.
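A rough sketch of the idea with stand-in types: each payload class overrides `compilePayload` so that the `create` template is instantiated with its own type. `Compiled`, `create`, `Avx512`, `Avx512WithAmx`, and `label` are hypothetical names, and the real `CompiledX86Payload::create` takes arguments (Stats, Code) that are omitted here.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the compiled payload; it only records which
// payload type it was created for.
struct Compiled {
  explicit Compiled(std::string L) : Label(std::move(L)) {}
  std::string Label;
};

// Stand-in for a create<PayloadT>() factory.
template <class PayloadT> std::unique_ptr<Compiled> create() {
  return std::make_unique<Compiled>(PayloadT::label());
}

struct Avx512 {
  virtual ~Avx512() = default;
  static std::string label() { return "AVX512"; }
  virtual std::unique_ptr<Compiled> compilePayload() const { return create<Avx512>(); }
};

struct Avx512WithAmx : Avx512 {
  static std::string label() { return "AVX512+AMX"; }
  // The specialization passes its own type, so anything selected through the
  // template parameter (such as the init function) comes from this class.
  std::unique_ptr<Compiled> compilePayload() const override { return create<Avx512WithAmx>(); }
};
```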

cyssi-cb added 4 commits

November 25, 2024 09:51


          [ADD] small fixes

872ffa1


          [ADD] small fixes>

a30f850


          [FIX] casts and types

b4618e5


          [FIX] cast and include

13202bd

marenz2569 mentioned this pull request

AMX extension #68

Closed

Base automatically changed from code-style-enforcing to master

December 5, 2024 12:47

marenz2569 changed the base branch from master to code-style-enforcing

December 5, 2024 13:36

Base automatically changed from code-style-enforcing to master

December 5, 2024 14:22
