Update to BLIS 0.9.0 #1

Merged 616 commits on May 5, 2022
bf1b578
Reduced KC on skx from 384 to 256.
fgvanzee Mar 19, 2021
57ef61f
Merge branch 'master' of github.com:flame/blis
fgvanzee Mar 19, 2021
ca83f95
CREDITS file update.
fgvanzee Mar 22, 2021
e56d9f2
ReleaseNotes.md update in advance of next version.
fgvanzee Mar 22, 2021
8535b3e
Version file update (0.8.1)
fgvanzee Mar 22, 2021
545e6c2
CHANGELOG update (0.8.1)
fgvanzee Mar 22, 2021
159ca6f
Made test/3/octave scripts robust to missing data.
fgvanzee Mar 24, 2021
36cb411
Switch allocator mutexes to static initialization.
fgvanzee Mar 27, 2021
3a6f41a
Renamed membrk files/vars/functions to pba.
fgvanzee Mar 27, 2021
0450249
Always stay initialized after BLAS compat calls.
fgvanzee Mar 29, 2021
22c6b5d
Fixed bug in power10 microkernel I/O. (#488)
nicholaiTukanov Mar 31, 2021
9050819
Update do_sde.sh (#489)
devinamatthews Mar 31, 2021
f9ad55c
Merge branch 'master' into dev
fgvanzee Mar 31, 2021
09bd4f4
Add err_t* "return" parameter to malloc functions.
fgvanzee Mar 31, 2021
ba3ba8d
Minor updates and fixes to test/3/octave scripts.
fgvanzee Apr 6, 2021
2688f21
Added Fujitsu A64fx (512-bit SVE) perf results.
fgvanzee Apr 7, 2021
1e6ed82
Additional A64fx Comments (#490)
xrq-phys Apr 7, 2021
6280757
Minor updates to a64fx section of Performance.md.
fgvanzee Apr 7, 2021
6548ceb
Allow clang for ThunderX2 config
devinamatthews Apr 14, 2021
1f3461a
Fix typo in FAQ.md
cassiersg Apr 21, 2021
40ce5fd
Merge pull request #493 from cassiersg/patch-1
devinamatthews Apr 21, 2021
f6424b5
Added dedicated Performance section to README.md.
fgvanzee Apr 23, 2021
6a4aa98
Fixed typo in Table of Contents.
fgvanzee Apr 23, 2021
4534daf
Minor API breakage in bli_pack API.
fgvanzee Apr 27, 2021
6a89c7d
Defined setijv, getijv to set/get vector elements.
fgvanzee May 1, 2021
5d46dbe
Replace bli_dlamch with something less archaic (#498)
devinamatthews May 12, 2021
f0e8634
Defined eqsc, eqv, eqm to test object equality.
fgvanzee May 12, 2021
5aa63cd
Fixed typo in cpp guard in bli_util_ft.h.
fgvanzee May 13, 2021
d4427a5
Minor preprocessor/header cleanup.
fgvanzee May 13, 2021
b683d01
Use extra #undef when including ba/ex API headers.
fgvanzee May 13, 2021
61584de
Added 512b SVE-based a64fx subconfig + SVE kernels.
xrq-phys May 19, 2021
91d3636
Travis Support Arm SVE
xrq-phys May 15, 2021
bd156a2
Adjust TravisCI
xrq-phys May 15, 2021
932dfe6
Travis CI Revert Unnecessary Extras from 91d3636
xrq-phys May 19, 2021
859fb77
Remove `rm-dupls` function in common.mk.
devinamatthews May 23, 2021
6d4ab02
Merge pull request #502 from flame/rm-rm-dupls
devinamatthews May 23, 2021
5feb04e
Add explicit compiler check for Windows.
devinamatthews May 23, 2021
cbd8d39
Merge pull request #500 from xrq-phys/armsve+travis
devinamatthews May 24, 2021
e5c85da
Merge pull request #503 from flame/windows-compiler-check
devinamatthews May 24, 2021
82af05f
Updated Fugaku (a64fx) performance results.
fgvanzee May 25, 2021
213dce3
Added a new 'gemmlike' sandbox.
fgvanzee May 28, 2021
7fabd89
Asm Flag Mingling for Darwin_Aarch64
xrq-phys May 29, 2021
916e1fa
Armv8A Rename Regs for Clang Compile: FP64 Part
xrq-phys May 29, 2021
9f4a4a3
Armv8A Rename Regs for Clang Compile: FP32 Part
xrq-phys May 29, 2021
5fc93e2
Armv8A Rename Regs for Safe Darwin Compile
xrq-phys May 29, 2021
7f7d726
Fixed bugs in cpackm kernels, gemmlike code.
fgvanzee May 31, 2021
7c3eb44
Add vhsubpd/vhsubpd.
devinamatthews Jun 2, 2021
d10e05b
Sandbox header edits trigger full library rebuild.
fgvanzee Jun 14, 2021
689fa0f
Merge branch 'master' into dev
fgvanzee Jun 14, 2021
56ffca6
Fix asm warning
Jun 15, 2021
e28f2a2
Merge pull request #513 from nicholaiTukanov/asm_warning_p9_fix
devinamatthews Jun 16, 2021
bf72763
Merge pull request #506 from xrq-phys/arm64-mac
devinamatthews Jun 18, 2021
bc10a3f
Merge pull request #492 from flame/thunderx2-clang
devinamatthews Jun 19, 2021
aaa10c8
Skip clearing temp microtile in gemmlike sandbox.
fgvanzee Jun 21, 2021
907226c
Rework POWER10 sandbox
nicholaiTukanov Jul 3, 2021
d073fc9
Update POWER10.md
nicholaiTukanov Jul 3, 2021
a201a53
Always run `make check`.
devinamatthews Jul 6, 2021
ad6231c
Fixed configure script bug.
Jul 6, 2021
5ef7f68
Merge pull request #515 from chengguosun/bug-fix
devinamatthews Jul 6, 2021
78eac6a
Revert "Always run `make check`."
devinamatthews Jul 6, 2021
f648df4
Add symlink to blis.pc.in for out-of-tree builds
awild82 Jul 6, 2021
551c6b4
Merge pull request #519 from awild82/oot_build_bugfix
devinamatthews Jul 7, 2021
174f7fc
Test installation in Travis CI
devinamatthews Jul 7, 2021
69205ac
CREDITS file update.
fgvanzee Jul 7, 2021
4651583
Merge pull request #520 from flame/travis-ci-install
devinamatthews Jul 7, 2021
75f0390
Add comment about make checkblas on Windows
devinamatthews Jul 7, 2021
9a8e649
Fix Win64 AVX512 bug.
devinamatthews Jul 7, 2021
c9a7f59
Merge pull request #522 from flame/windows-avx512
devinamatthews Jul 8, 2021
17729cf
Add vzeroupper to Haswell microkernels. (#524)
devinamatthews Jul 9, 2021
21911d6
Merge branch 'dev'
fgvanzee Jul 9, 2021
84f9dcd
Remove unnecesary windows/zen2 directory.
devinamatthews Jul 13, 2021
fab5c86
Merge pull request #516 from nicholaiTukanov/p10-sandbox-rework
devinamatthews Jul 13, 2021
cc9206d
Added Graviton2 Neoverse N1 performance results.
fgvanzee Jul 16, 2021
8dba1e7
CREDITS file update.
fgvanzee Jul 27, 2021
868b901
Fixed one-time use property of bli_init() (#525).
fgvanzee Aug 4, 2021
c8728cf
Fixed configure breakage on OSX clang.
fgvanzee Aug 5, 2021
a32257e
Fixed bli_init.c compile-time error on OSX clang.
fgvanzee Aug 5, 2021
64a1f78
Implement proposed new function pointer fields for obj_t.
devinamatthews Aug 11, 2021
a29c163
Armv8-A Add 8x4 Kernel WIP
xrq-phys May 28, 2021
6639999
Armv8A DGEMM 4x4 Kernel WIP. Slow
xrq-phys May 28, 2021
df40efe
Armv8-A Add Part of GEMMSUP 8x4m Kernel
xrq-phys Jun 1, 2021
a9ba79e
Armv8-A Add GEMMSUP 4x8n Kernel
xrq-phys Jun 2, 2021
8ed8f5e
Armv8-A Add More DGEMMSUP
xrq-phys Jun 3, 2021
3efe707
Armv8-A DGEMMSUP Adjustments
xrq-phys Jun 3, 2021
c3faf93
Armv8-A DGEMMSUP 6x8m Kernel
xrq-phys Jun 3, 2021
49b05df
Armv8-A Introduced s/d Packing Kernels
xrq-phys Jun 4, 2021
3c5f740
Armv8-A s/d Packing Kernels Fix Typo
xrq-phys Jun 4, 2021
afd0fa6
Armv8-A GEMMSUP-RD 6x8n
xrq-phys Jun 4, 2021
8a32d19
Armv8-A GEMMSUP-RD 6x8m
xrq-phys Jun 4, 2021
ce44735
Armv8-A Adjust Types for PACKM Kernels
xrq-phys Jun 4, 2021
c792d50
Armv8-A Fix GEMMSUP-RD Kernels on GNU Asm
xrq-phys Jun 4, 2021
4e7e225
Armv8-A Supplimentary GEMMSUP Sizes for RD
xrq-phys Jun 9, 2021
3df0e9b
Arm64 8x4 Kernel Use Less Regs
xrq-phys Jul 16, 2021
e38ca28
Added Apple Firestorm (A14/M1) Subconfig
xrq-phys Aug 12, 2021
e366665
Fixed stale API calls to membrk API in gemmlike.
fgvanzee Aug 12, 2021
20a1c40
Disabled sanity check in bli_pool_finalize().
fgvanzee Aug 12, 2021
ec06b6a
Add dependency on the "flat" blis.h file for the BLIS and BLAS testsu…
devinamatthews Aug 13, 2021
3cddce1
Remove schema field on obj_t (redundant) and add new API functions.
devinamatthews Aug 13, 2021
4f70eb7
Clean up some warnings that show up on clang/OSX.
devinamatthews Aug 13, 2021
1772db0
Add row- and column-strides for A/B in obj_ukr_fn_t.
devinamatthews Aug 13, 2021
e6d68bc
Merge pull request #529 from flame/fix_make_check_dependencies
devinamatthews Aug 13, 2021
c99fae5
Merge pull request #530 from flame/fix_clang_warnings
devinamatthews Aug 13, 2021
4b8ed99
Whitespace tweaks.
fgvanzee Aug 13, 2021
2c0b415
Merge pull request #527 from flame/obj_t_makeover
devinamatthews Aug 14, 2021
4a955e9
Tweaks to gemmlike to facilitate 3rd party mods.
fgvanzee Aug 16, 2021
7144230
README.md citation updates (e.g. BLIS7 bibtex).
fgvanzee Aug 18, 2021
3eccfd4
Added local _check() code to gemmlike sandbox.
fgvanzee Aug 19, 2021
3b275f8
Minor tweaks to gemmlike sandbox.
fgvanzee Aug 19, 2021
7d5903d
Arm64 Fix: Support Alpha/Beta in GEMMSUP Intrin
xrq-phys Aug 20, 2021
e6799b2
Arm: Implement GEMMSUP Fallback Method
xrq-phys Aug 20, 2021
e320ec6
Moved lang defs from _macro_def.h to _lang_defs.h.
fgvanzee Aug 20, 2021
5fc65cd
Add test to Travis using C++ compiler to make sure blis.h is C++-comp…
devinamatthews Aug 21, 2021
eaea674
Merge branch 'master' into cxx_test
devinamatthews Aug 21, 2021
a361492
Arm: DGEMMSUP ?rc(rd) Invoke Edge Size
xrq-phys Aug 22, 2021
35409eb
Arm: DGEMMSUP ??r(rv) Invoke Edge Size
xrq-phys Aug 22, 2021
4fd82b0
Header Typo
xrq-phys Aug 22, 2021
7e2951e
Arm: DGEMMSUP `Macro' Edge Cases Stop Calling Ref
xrq-phys Aug 23, 2021
2f7325b
Blacklist clang10/gcc9 and older for 'armsve'.
fgvanzee Aug 23, 2021
d6eb70f
Updated stale calls to malloc_intl() in gemmlike.
fgvanzee Aug 26, 2021
8e0c425
Define BLIS_OS_NONE when using --disable-system.
fgvanzee Aug 26, 2021
820f11a
Arm Whole GEMMSUP Call Route is Asm/Int Optimized
xrq-phys Aug 27, 2021
2be78fc
Disabled (at least temporarily) commit 8e0c425.
fgvanzee Aug 27, 2021
ade10f4
Updated travis-ci.org link in README.md to .com.
fgvanzee Aug 27, 2021
9c0064f
Fix config_name in bli_arch.c
devinamatthews Sep 10, 2021
fbb3560
Attempt to fix cxx-test for OOT builds.
devinamatthews Sep 10, 2021
e486d66
Use C++ cross-compiler for ARM tests.
devinamatthews Sep 10, 2021
c76fcad
Fix AArch64 tests and consolidate some other tests.
devinamatthews Sep 10, 2021
98ce6e8
Do a fast test on OSX. [ci skip]
devinamatthews Sep 10, 2021
9293a68
Merge pull request #534 from flame/cxx_test
devinamatthews Sep 10, 2021
bffa85b
Arm SVE: Correct PACKM Ker Name: Intrinsic Kers
xrq-phys Sep 15, 2021
30c29b2
Arm SVE Exclude SVE-Intrinsic Kernels for GCC 8-9
xrq-phys Sep 15, 2021
5191c43
Fix more copy-paste errors in the haswell gemmsup code.
devinamatthews Sep 16, 2021
e3dc195
Fix problem where uninitialized registers are included in vhaddpd in …
devinamatthews Sep 16, 2021
b6f71fd
Merge pull request #544 from flame/haswell-gemmsup-fpe
devinamatthews Sep 16, 2021
849aae0
Added new packm var3 to 'gemmlike'.
fgvanzee Sep 16, 2021
52f29f7
Removed last vestige of #define BLIS_NUM_ARCHS.
fgvanzee Sep 17, 2021
eaa554a
bli_error: more cleanup on the error strings array
Sep 15, 2021
fb93d24
Re-enable and fix 8e0c425 (BLIS_ENABLE_SYSTEM).
fgvanzee Sep 20, 2021
7b39c14
Reverted fb93d24.
fgvanzee Sep 20, 2021
1f527a9
Re-enable and fix fb93d24.
fgvanzee Sep 20, 2021
1fc23d2
Safelist 'master', 'dev', 'amd' branches.
fgvanzee Sep 21, 2021
c52c431
Merge branch 'dev'
fgvanzee Sep 26, 2021
89aaf00
Updates to FAQ.md, Sandboxes.md, and README.md.
fgvanzee Sep 28, 2021
3442d40
More minor fixes to FAQ.md and Sandboxes.md.
fgvanzee Sep 28, 2021
b36fb0f
Fixed newly broken link to CREDITS in FAQ.md.
fgvanzee Sep 28, 2021
5013a6c
More edits and fixes to docs/FAQ.md.
fgvanzee Sep 29, 2021
ae0eeea
Add explicit handling for beta == 0 in armsve sd and armv7a d gemm ukrs.
devinamatthews Sep 29, 2021
13dbd5b
Apply patch from @xrq-phys.
devinamatthews Oct 2, 2021
0a45bc0
Merge pull request #552 from flame/armsve_beta_0
devinamatthews Oct 2, 2021
abc6483
Armv8 Fix 6x8 Row-Maj Ukr
xrq-phys Oct 3, 2021
f5c03e9
Armv8 Handle *beta == 0 for GEMMSUP ?rc Case.
xrq-phys Oct 3, 2021
91408d1
Use @path-based install name on MacOS and use relocatable RPATH entri…
devinamatthews Oct 4, 2021
d0a0b4b
Arm micro-architecture dispatch (#344)
loveshack Oct 4, 2021
c4a3168
Fix $ORIGIN usage on linux.
devinamatthews Oct 4, 2021
64a421f
Add an option to control whether or not to use @rpath.
devinamatthews Oct 4, 2021
80c5366
Move unused ARM SVE kernels to "old" directory.
devinamatthews Oct 4, 2021
53377fc
Merge pull request #554 from flame/armsve-cleanup
devinamatthews Oct 4, 2021
6d3036e
Merge pull request #545 from hominhquan/clean_error
devinamatthews Oct 4, 2021
9905f44
Merge pull request #553 from flame/rpath-fix
devinamatthews Oct 4, 2021
079fbd4
Merge branch 'master' into arm64-hi-bw
devinamatthews Oct 4, 2021
40baf83
Armv8 Handle *beta == 0 for GEMMSUP ??r Case.
xrq-phys Oct 5, 2021
4bfadf9
Firestorm Block Size Fixes
xrq-phys Oct 5, 2021
353a0d8
Update .appveyor.yml
devinamatthews Oct 5, 2021
c302499
Fix data race in testsuite.
devinamatthews Oct 5, 2021
34919de
Make error checking level a thread-local variable.
devinamatthews Oct 2, 2021
b9da6d5
Armv8 GEMMSUP Edge Cases Require Signed Ints
xrq-phys Oct 6, 2021
a024715
Firestorm CPUID Dispatcher
xrq-phys Oct 6, 2021
14b1358
Add test for Apple M1 (firestorm)
devinamatthews Oct 6, 2021
2920dde
Armv8 DGEMMSUP Fix 8x4m Store Inst. Typo
xrq-phys Oct 6, 2021
d7a3372
Armv8 DGEMMSUP Fix Edge 6x4 Switch Case Typo
xrq-phys Oct 6, 2021
a4066f2
Register firestorm into arm64 Metaconfig
xrq-phys Oct 6, 2021
1e32003
Revert __has_include(). Distinguish w/ BLIS_FAMILY_**
xrq-phys Oct 6, 2021
2604f40
Config ArmSVE Unregister 12xk. Move 12xk to Old
xrq-phys Oct 6, 2021
70b52ca
Enable testing 1m in `make check`.
devinamatthews Oct 7, 2021
f44149f
Armv8 Trash New Bulk Kernels
xrq-phys Oct 7, 2021
2329d99
Update Travis CI badge
devinamatthews Oct 7, 2021
4277fec
Merge pull request #533 from xrq-phys/arm64-hi-bw
devinamatthews Oct 7, 2021
49b9d79
Arm SVE Add ZGEMM 2Vx8 Unindexed
xrq-phys Sep 13, 2021
e13abde
Arm SVE Add ZGEMM 2Vx7 Unindexed
xrq-phys Sep 14, 2021
c19db2f
Arm SVE Add ZGEMM 2Vx10 Unindexed
xrq-phys Sep 15, 2021
3f68e83
Arm SVE ZGEMM Support Gather Load / Scatt. St.
xrq-phys Sep 15, 2021
b677e0d
Arm SVE Add SGEMM 2Vx10 Unindexed
xrq-phys Sep 15, 2021
e4cabb9
Arm SVE Typo Fix ZGEMM/CGEMM C Prefetch Reg
xrq-phys Sep 15, 2021
f7c6c2b
A64FX Config Use ZGEMM/CGEMM
xrq-phys Sep 15, 2021
9e1e781
Arm SVE ZGEMM 2Vx10 Unindex Process Alpha=1.0
xrq-phys Sep 19, 2021
66a018e
Arm SVE CGEMM 2Vx10 Unindex Process Alpha=1.0
xrq-phys Sep 19, 2021
f76ea90
Arm SVE: Update Perf. Graph
xrq-phys Sep 21, 2021
4b648e4
Arm SVE Config armsve Use ZGEMM/CGEMM
xrq-phys Sep 22, 2021
1749dfa
Arm SVE C/ZGEMM Support *beta==0
xrq-phys Oct 8, 2021
82b6128
SH Kernel Unused Eigher
xrq-phys Oct 8, 2021
ccf1628
Arm SVE C/ZGEMM Fix FMOV 0 Mistake
xrq-phys Oct 8, 2021
408906f
Merge pull request #542 from xrq-phys/armsve-zgemm
devinamatthews Oct 9, 2021
32a6d93
Merge pull request #543 from xrq-phys/armsve-packm-fix
devinamatthews Oct 9, 2021
327481a
Fix insufficient pool-growing logic in bli_pool.c. (#559)
hominhquan Oct 12, 2021
81e1034
Alloc at least 1 elem in pool_t block_ptrs. (#560)
hominhquan Oct 13, 2021
e9da642
Allow use of 1m with mixing of row/col-pref ukrs.
fgvanzee Oct 13, 2021
514fd10
Fixed substitution bug in configure.
fgvanzee Oct 14, 2021
290ff4b
Disable SDE testing of old AMD microarchitectures.
fgvanzee Oct 14, 2021
e8caf20
Updated do_sde.sh to get SDE from GitHub.
fgvanzee Oct 18, 2021
f065a80
Removed support for 3m, 4m induced methods.
fgvanzee Oct 28, 2021
cfa3db3
Fixed bug in mixed-dt gemm introduced in e9da642.
fgvanzee Nov 3, 2021
28b0982
Refactored her[2]k/syr[2]k in terms of gemmt. (#531)
devinamatthews Nov 10, 2021
7bc8ab4
Added BLAS/CBLAS APIs for axpby, gemm_batch. (#566)
Meghana-vankadari Nov 11, 2021
7bde468
Added support for addons.
fgvanzee Nov 13, 2021
78cd1b0
Added 'Example Code' section to README.md.
fgvanzee Nov 16, 2021
cbc88fe
Marked some markdown shell code blocks as 'bash'.
fgvanzee Nov 16, 2021
74c0c62
Reverted cbc88fe.
fgvanzee Nov 16, 2021
26e4b6b
Added support for AMD's Zen3 microarchitecture.
dzambare Nov 17, 2021
9be97c1
Support all four dts in test/test_her[2][k].c (#578)
madanm3 Nov 17, 2021
b727645
Merge branch 'dev'
fgvanzee Nov 19, 2021
a4bc03b
Brief mention/link to Addons.md in README.md.
fgvanzee Nov 19, 2021
12c66a4
Minor updates to README.md, docs/Addons.md.
fgvanzee Nov 19, 2021
e229e04
Added recu-sed.sh script to 'build' directory.
fgvanzee Dec 1, 2021
cf7d616
Enable user-customized packm ukernel/variant. (#549)
devinamatthews Dec 2, 2021
961d9d5
Re-add BLIS_ENABLE_ZEN_BLOCK_SIZES macro for 'zen'.
kvaragan Dec 7, 2021
54fa28b
Move edge cases to gemm ukr; more user-custom mods. (#583)
devinamatthews Dec 24, 2021
08174a2
Evict <arm_sve.h> Requirement for SVE GEMM
xrq-phys Jan 1, 2022
466b68a
Add unique tag to branch labels for Apple ARM64.
devinamatthews Jan 2, 2022
864bfab
CREDITS file update.
fgvanzee Jan 4, 2022
3f2440b
Added m, n dims to gemmd/gemmlike ukernel calls.
fgvanzee Jan 6, 2022
268ce1f
Relax alignment constraints
devinamatthews Jan 10, 2022
81f93be
Fix row-/column-major pref. in 16x8 haswell sgemm ukr (unused)
devinamatthews Jan 10, 2022
0ab20c0
the Apple local label thing is required by Clang in general
jeffhammond Jan 13, 2022
0be9282
Updated zen3 macro constant names.
fgvanzee Jan 26, 2022
35195bb
Add armclang detection to configure.
devinamatthews Jan 31, 2022
b5df181
Armv8a, ArmSVE: Simplify Gen-C
xrq-phys Feb 2, 2022
9cc897f
Fix SVE Compil.
xrq-phys Feb 3, 2022
72089bb
ArmSVE Use Predicate in M-Direction
xrq-phys Feb 5, 2022
2f3872e
ArmSVE Adopts Label Wrapper
xrq-phys Feb 7, 2022
2674291
Update CC_VENDOR logic
devinamatthews Feb 13, 2022
5a4d3f5
Use -flat_namespace option to link on macOS
devinamatthews Feb 13, 2022
2506159
Don't use `-Wl,-flat-namespace`.
devinamatthews Feb 14, 2022
ee9ff98
Move edge cases to gemmtrsm ukrs; doc updates.
fgvanzee Feb 15, 2022
c9700f3
Renamed SIMD-related macro constants for clarity.
fgvanzee Feb 15, 2022
4d83523
Add armsve to arm64 Metaconfig (#614)
xrq-phys Feb 22, 2022
d514658
ArmSVE Ensure Non-zero Block Size (#615)
xrq-phys Feb 22, 2022
84732bf
Revamp how tools are handled/checked by configure.
fgvanzee Feb 28, 2022
71851a0
Fixed level-3 performance bug in haswell ukernels.
fgvanzee Mar 8, 2022
cad1041
POWER10: edge cases in microkernel (#620)
ivan23kor Mar 10, 2022
7c07b47
Avoid gemmsup barriers when not packing A or B. (#622)
fgvanzee Mar 11, 2022
f1dbb0e
Trival whitespace change; commit log addendum.
fgvanzee Mar 11, 2022
d681000
Update Multithreading.md
devinamatthews Mar 14, 2022
0db2bd5
Added BLAS/CBLAS APIs for gemm3m. (#590)
BhaskarNallani Mar 24, 2022
1ec020b
AMD kernel updates; frame-specific AMD updates. (#597)
dzambare Mar 29, 2022
cf06364
Fixed typo in BLAS gemm3m call to _check().
fgvanzee Mar 29, 2022
bee7678
CREDITS file update.
fgvanzee Mar 31, 2022
99bb900
ReleaseNotes.md update in advance of next version.
fgvanzee Apr 1, 2022
14c86f6
Version file update (0.9.0)
fgvanzee Apr 1, 2022
aeb9d30
Update to 0.9.0
danieldk Apr 4, 2022
967afeb
Add additional architectures for compilers without Zen 2/3 support
danieldk Apr 4, 2022
5 changes: 4 additions & 1 deletion .appveyor.yml
@@ -1,3 +1,5 @@
skip_branch_with_pr: true

environment:
matrix:
- LIB_TYPE: shared
@@ -39,10 +41,11 @@ build_script:
- bash -lc "cd /c/projects/blis && ./configure %CONFIGURE_OPTS% --enable-threading=%THREADING% --enable-arg-max-hack --prefix=/c/blis %CONFIG%"
- bash -lc "cd /c/projects/blis && mingw32-make -j4 V=1"
- bash -lc "cd /c/projects/blis && mingw32-make install"
- ps: Compress-Archive -Path C:\blis -DestinationPath C:\blis.zip
- 7z a C:\blis.zip C:\blis
- ps: Push-AppveyorArtifact C:\blis.zip

test_script:
# "make checkblas" does not work with shared linking Windows due to inability to override xerbla_
- if [%LIB_TYPE%]==[shared] set "TEST_TARGET=checkblis-fast"
- if [%LIB_TYPE%]==[static] set "TEST_TARGET=check"
- bash -lc "cd /c/projects/blis && mingw32-make %TEST_TARGET% -j4 V=1"
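The comment in test_script explains why the shared-library build runs a different make target. As a rough illustration only, the choice made by the two `if` lines can be written in POSIX shell as below. The function name `select_test_target` is hypothetical and is not part of BLIS or of the AppVeyor configuration.

```shell
#!/bin/sh
# Hypothetical helper (not in BLIS): mirrors the two `if` lines in
# test_script that map LIB_TYPE to a make target.
select_test_target() {
    case "$1" in
        # "make checkblas" cannot override xerbla_ with shared linking
        # on Windows, so only the fast BLIS testsuite is run.
        shared) echo "checkblis-fast" ;;
        # Static builds run the full "check" target.
        static) echo "check" ;;
        *)      echo "unknown LIB_TYPE: $1" >&2; return 1 ;;
    esac
}

select_test_target shared
select_test_target static
```

The result would then be passed to make in the same way the config passes `%TEST_TARGET%` to `mingw32-make`.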
9 changes: 9 additions & 0 deletions .dir-locals.el
@@ -0,0 +1,9 @@
;; First (minimal) attempt at configuring Emacs CC mode for the BLIS
;; layout requirements.
((c-mode . ((c-file-style . "stroustrup")
(c-basic-offset . 4)
(comment-start . "// ")
(comment-end . "")
(indent-tabs-mode . t)
(tab-width . 4)
(parens-require-spaces . nil))))
8 changes: 7 additions & 1 deletion .gitignore
@@ -31,6 +31,7 @@

config.mk
bli_config.h
bli_addon.h

# -- monolithic headers --

@@ -43,7 +44,12 @@ include/*/*.h
# -- misc. --

# BLIS testsuite output file
output.testsuite
output.testsuite.*

# BLAS test output files
out.*

# GTAGS database
GPATH
GRTAGS
GTAGS
88 changes: 44 additions & 44 deletions .travis.yml
@@ -1,83 +1,83 @@
language: c
sudo: required
dist: trusty
env:
global:
secure: "Ty3PM1xGhXwxfJG6YyY9bUZyXzw98ekHxQEqU9VnrMXTZb28IxfocPCXHjL34r9HTGosO5Pmierhal1Cs3ZKE5ZAJqJhCfck+kwlH21Uay5CNYglDtSmy2qxtbbDG4AxpEZ1UKlIZr1pNh/x+pRemSmnMEnQp/E7QJqdkhm4+aMX2bWKyLPtrdL+B9QXLVT2nT6/Fw3i05aBhpcFJpSPfvYX2KoCZYdJOSKcKci4T8nAfP/c0olkz+jAkBZxZFgO9Ptrt/lvHtVPrkh5o29GvHg2i/4vucbsMltoxlV31/2eYpdr17Ngtt41MMVn2fHV4lVhLmENc04nlm084fBtg73T6b8hNy5JlcA44xI/UrPJsQAJ+0A0ds9BbBQKPxOmaF/O8WGXhwiwdKT6DGS9lj05f3S+yZfeNE3pQhLEcvwXLO5SW3VvKXMj0t/lZyG+XCkvFjD7KEPQV4g+BZc2zzD9TwDx3ydn8Uzd6zZlq1erQUzCnODP24wuwfrNP8nqxFYG0VtI8oZW62IC9U2hcnAF5QNXXW3yDYD65k3BHbigfI28gu9iO9G8RxOglR27J7Whdqkqw3AMRaqyHt2tdbz7tM2dLZ0EatT5m8esjC+LP4EshW9C59jP2U9vJ/94YEgOfwiqk8+e6fL/7dJvOumbwu1RclRI9DS88PPYb3Q="
dist: focal
branches:
only:
- master
- dev
- amd
matrix:
include:
# full testsuite (all tests except for mixed datatype)
# full testsuite (all tests + mixed datatype (gemm_nn only) + salt + SDE + OOT)
- os: linux
compiler: gcc
env: OOT=0 TEST=1 SDE=0 THR="none" CONF="auto"
# mixed-datatype testsuite (gemm_nn only)
- os: linux
compiler: gcc
env: OOT=0 TEST=MD SDE=0 THR="none" CONF="auto"
# salt testsuite (fast set of operations+parameters)
- os: linux
compiler: gcc
env: OOT=0 TEST=SALT SDE=0 THR="none" CONF="auto"
# test x86_64 ukrs with SDE
- os: linux
compiler: gcc
env: OOT=0 TEST=0 SDE=1 THR="none" CONF="x86_64"
env: OOT=1 TEST=ALL SDE=1 THR="none" CONF="x86_64" \
PACKAGES="gcc-9 binutils"
# openmp build
- os: linux
compiler: gcc
env: OOT=0 TEST=0 SDE=0 THR="openmp" CONF="auto"
env: OOT=0 TEST=FAST SDE=0 THR="openmp" CONF="auto" \
PACKAGES="gcc-9 binutils"
# pthreads build
- os: linux
compiler: gcc
env: OOT=0 TEST=0 SDE=0 THR="pthreads" CONF="auto"
# out-of-tree build
- os: linux
compiler: gcc
env: OOT=1 TEST=0 SDE=0 THR="none" CONF="auto"
env: OOT=0 TEST=FAST SDE=0 THR="pthreads" CONF="auto" \
PACKAGES="gcc-9 binutils"
# clang build
- os: linux
compiler: clang
env: OOT=0 TEST=0 SDE=0 THR="none" CONF="auto"
env: OOT=0 TEST=FAST SDE=0 THR="none" CONF="auto"
# There seems to be some difficulty installing 2 Clang toolchains of different versions.
# Use the TravisCI default.
# PACKAGES="clang-8 binutils"
# macOS with system compiler (clang)
- os: osx
compiler: clang
env: OOT=0 TEST=1 SDE=0 THR="none" CONF="auto"
env: OOT=0 TEST=FAST SDE=0 THR="none" CONF="auto"
# cortexa15 build and fast testsuite (qemu)
- os: linux
compiler: arm-linux-gnueabihf-gcc
env: OOT=0 TEST=FAST SDE=0 THR="none" CONF="cortexa15" \
PACKAGES="gcc-arm-linux-gnueabihf qemu-system-arm qemu-user" \
CC=arm-linux-gnueabihf-gcc CXX=arm-linux-gnueabihf-g++ \
PACKAGES="gcc-arm-linux-gnueabihf g++-arm-linux-gnueabihf libc6-dev-armhf-cross qemu-system-arm qemu-user" \
TESTSUITE_WRAPPER="qemu-arm -cpu cortex-a15 -L /usr/arm-linux-gnueabihf/"
# cortexa57 build and fast testsuite (qemu)
- os: linux
compiler: aarch64-linux-gnu-gcc
env: OOT=0 TEST=FAST SDE=0 THR="none" CONF="cortexa57" \
PACKAGES="gcc-aarch64-linux-gnu qemu-system-arm qemu-user" \
CC=aarch64-linux-gnu-gcc CXX=aarch64-linux-gnu-g++ \
PACKAGES="gcc-aarch64-linux-gnu g++-aarch64-linux-gnu libc6-dev-arm64-cross qemu-system-arm qemu-user" \
TESTSUITE_WRAPPER="qemu-aarch64 -L /usr/aarch64-linux-gnu/"
# Apple M1 (firestorm) build and fast testsuite (qemu)
- os: linux
compiler: aarch64-linux-gnu-gcc
env: OOT=0 TEST=FAST SDE=0 THR="none" CONF="firestorm" \
CC=aarch64-linux-gnu-gcc CXX=aarch64-linux-gnu-g++ \
PACKAGES="gcc-aarch64-linux-gnu g++-aarch64-linux-gnu libc6-dev-arm64-cross qemu-system-arm qemu-user" \
TESTSUITE_WRAPPER="qemu-aarch64 -L /usr/aarch64-linux-gnu/"
# armsve build and fast testsuite (qemu)
- os: linux
compiler: aarch64-linux-gnu-gcc-10
env: OOT=0 TEST=FAST SDE=0 THR="none" CONF="armsve" \
CC=aarch64-linux-gnu-gcc-10 CXX=aarch64-linux-gnu-g++-10 \
PACKAGES="gcc-10-aarch64-linux-gnu g++-10-aarch64-linux-gnu libc6-dev-arm64-cross qemu-system-arm qemu-user" \
TESTSUITE_WRAPPER="qemu-aarch64 -cpu max,sve=true,sve512=true -L /usr/aarch64-linux-gnu/"
install:
- if [ "$TRAVIS_OS_NAME" = "linux" ]; then sudo rm -f /usr/bin/as; fi
- if [ "$TRAVIS_OS_NAME" = "linux" ]; then sudo ln -s /usr/lib/binutils-2.26/bin/as /usr/bin/as; fi
- if [ "$TRAVIS_OS_NAME" = "linux" ]; then sudo rm -f /usr/bin/ld; fi
- if [ "$TRAVIS_OS_NAME" = "linux" ]; then sudo ln -s /usr/lib/binutils-2.26/bin/ld /usr/bin/ld; fi
- if [ "$CC" = "gcc" ] && [ "$TRAVIS_OS_NAME" = "linux" ]; then export CC="gcc-6"; fi
- if [ -n "$PACKAGES" ]; then sudo apt-get install -y $PACKAGES; fi
addons:
apt:
sources:
- ubuntu-toolchain-r-test
packages:
- gcc-6
- binutils-2.26
- clang
- if [ "$CC" = "gcc" ] && [ "$TRAVIS_OS_NAME" = "linux" ]; then export CC="gcc-9"; fi
- if [ -n "$PACKAGES" ] && [ "$TRAVIS_OS_NAME" = "linux" ]; then sudo apt-get install -y $PACKAGES; fi
script:
- export DIST_PATH=.
- pwd
- if [ $OOT -eq 1 ]; then export DIST_PATH=`pwd`; mkdir ../oot; cd ../oot; chmod -R a-w $DIST_PATH; fi
- pwd
- $DIST_PATH/configure -t $THR CC=$CC $CONF
- $DIST_PATH/configure -p `pwd`/../install -t $THR CC=$CC $CONF
- pwd
- ls -l
- $CC --version
- make -j 2
- make install
- $DIST_PATH/travis/cxx/cxx-test.sh $DIST_PATH $(ls -1 include)
# Qemu SVE is failing sgemmt in some cases. Skip as this issue is not observed on real chip (A64fx).
- if [ "$CONF" = "armsve" ]; then sed -i 's/.*\<gemmt\>.*/0/' $DIST_PATH/testsuite/input.operations.fast; fi
- if [ "$TEST" != "0" ]; then travis_wait 30 $DIST_PATH/travis/do_testsuite.sh; fi
- if [ $SDE -eq 1 ] && [ "$TRAVIS_PULL_REQUEST" = "false" ] ; then travis_wait 30 $DIST_PATH/travis/do_sde.sh; fi
- if [ "$SDE" = "1" ]; then travis_wait 30 $DIST_PATH/travis/do_sde.sh; fi
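The armsve step in the script section uses sed to replace every line that mentions the whole word gemmt with a bare 0 before the testsuite runs. The sketch below shows what that expression does, assuming GNU sed (the `\<` and `\>` word-boundary anchors are GNU extensions). The sample input lines are made up for illustration and are not copied from input.operations.fast; the function name `disable_gemmt` is likewise hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around the sed expression used in .travis.yml.
# It reads stdin instead of editing a file in place, so nothing is modified.
disable_gemmt() {
    # Any line containing the whole word "gemmt" becomes the single character 0.
    sed 's/.*\<gemmt\>.*/0/'
}

# Made-up sample lines; the real file layout may differ.
printf '%s\n' \
    '1  # gemm' \
    '1  # gemmt' \
    '1  # gemmtrsm' | disable_gemmt
```

Because of the word-boundary anchors, only the gemmt line is rewritten; the gemm and gemmtrsm lines pass through unchanged.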