Removed support for 3m, 4m induced methods.
Details:
- Removed support for all induced methods except for 1m. This included
  removing code related to 3mh, 3m1, 4mh, 4m1a, and 4m1b as well as any
  code that existed only to support those implementations. These
  implementations were rarely used and posed code maintenance challenges
  for BLIS's maintainers going forward.
- Removed reference kernels for packm that pack 3m and 4m micropanels,
  and removed 3m/4m-related code from bli_cntx_ref.c.
- Removed support for 3m/4m from the code in frame/ind, then reorganized
  and streamlined the remaining code in that directory. The *ind(),
  *nat(), and *1m() APIs were all removed. (These additional API layers
  no longer made as much sense with only one induced method (1m) being
  supported.) The bli_ind.c file (and header) were moved to frame/base
  and bli_l3_ind.c (and header) and bli_l3_ind_tapi.h were moved to
  frame/3.
- Removed 3m/4m support from the code in frame/1m/packm.
- Removed 3m/4m support from trmm/trsm macrokernels and simplified some
  pointer arithmetic that was previously expressed in terms of the
  bli_ptr_inc_by_frac() static inline function (whose definition was
  also removed).
- Removed the following subdirectories of level-0 macro headers from
  frame/include/level0: ri3, rih, ri, ro, rpi. The level-0 scalar macros
  defined in these directories were used exclusively for 3m and 4m
  method codes.
- Simplified bli_cntx_set_blkszs() and bli_cntx_set_ind_blkszs() in
  light of 1m being the only induced method left within BLIS.
- Removed dt_on_output field within auxinfo_t and its associated
  accessor functions.
- Re-indexed the 1e/1r pack schemas after removing those associated with
  variants of the 3m and 4m methods. This leaves two bits unused within
  the pack format portion of the schema bitfield. (See bli_type_defs.h
  for more info.)
- Spun off the basic and expert interfaces to the object and typed APIs
  into separate files: bli_l3_oapi.c and bli_l3_oapi_ex.c; bli_l3_tapi.c
  and bli_l3_tapi_ex.c.
- Moved the level-3 operation-specific _check function calls from the
  operations' _front() functions to the corresponding _ex() function of
  the object API. (This change roughly maintains where the _check()
  functions are called in the call stack but lays the groundwork for
  future changes that may come to the level-3 object APIs.) Minor
  modifications to bli_l3_check.c to allow the check() functions to be
  called from the expert interface APIs.
- Removed support within the testsuite for testing the aforementioned
  induced methods, and updated the standalone test drivers in the 'test'
  directory to reflect the retirement of those induced methods.
- Modified the sandbox contract so that the user is obliged to define
  bli_gemm_ex() instead of bli_gemmnat(). (This change was made in light
  of the *nat() functions no longer existing.) Also updated the existing
  'power10' and 'gemmlike' sandboxes to come into compliance with the
  new sandbox rules.
- Updated BLISObjectAPI.md, BLISTypedAPI.md, Testsuite.md documentation
  to reflect the retirement of 3m/4m, and also modified Sandboxes.md to
  bring the document into alignment with new conventions.
- Updated various comments; removed segments of commented-out code.
- (cherry picked from commit f065a80)
fgvanzee committed Sep 10, 2022
1 parent d8adf1d commit a538807
Showing 163 changed files with 1,441 additions and 17,012 deletions.
7 changes: 0 additions & 7 deletions docs/BLISObjectAPI.md
@@ -2336,16 +2336,9 @@ char* bli_info_get_trsm_u_ukr_impl_string( ind_t method, num_t dt )
```

Possible implementation (ie: the `ind_t method` argument) types are:
* `BLIS_3MH`: Implementation based on the 3m method applied at the highest level, outside the 5th loop around the microkernel.
* `BLIS_3M1`: Implementation based on the 3m method applied within the 1st loop around the microkernel.
* `BLIS_4MH`: Implementation based on the 4m method applied at the highest level, outside the 5th loop around the microkernel.
* `BLIS_4M1B`: Implementation based on the 4m method applied within the 1st loop around the microkernel. Computation is ordered such that the 1st loop is fissured into two loops, the first of which multiplies the real part of the current micropanel of packed matrix B (against all real and imaginary parts of packed matrix A), and the second of which multiplies the imaginary part of the current micropanel of packed matrix B.
* `BLIS_4M1A`: Implementation based on the 4m method applied within the 1st loop around the microkernel. Computation is ordered such that real and imaginary components of the current micropanels are completely used before proceeding to the next virtual microkernel invocation.
* `BLIS_1M`: Implementation based on the 1m method. (This is the default induced method when real domain kernels are present but complex kernels are missing.)
* `BLIS_NAT`: Implementation based on "native" execution (ie: NOT an induced method).

**NOTE**: `BLIS_3M3` and `BLIS_3M2` have been deprecated from the `typedef enum` of `ind_t`, and `BLIS_4M1B` is also effectively no longer available, though the `typedef enum` value still exists.

Possible microkernel types (ie: the return values for `bli_info_get_*_ukr_impl_string()`) are:
* `BLIS_REFERENCE_UKERNEL` (`"refrnce"`): This value is returned when the queried microkernel is provided by the reference implementation.
* `BLIS_VIRTUAL_UKERNEL` (`"virtual"`): This value is returned when the queried microkernel is driven by the "virtual" microkernel provided by an induced method. This happens for any `method` value that is not `BLIS_NAT` (ie: native), but only applies to the complex domain.
7 changes: 0 additions & 7 deletions docs/BLISTypedAPI.md
@@ -2015,16 +2015,9 @@ char* bli_info_get_trsm_u_ukr_impl_string( ind_t method, num_t dt )
```

Possible implementation (ie: the `ind_t method` argument) types are:
* `BLIS_3MH`: Implementation based on the 3m method applied at the highest level, outside the 5th loop around the microkernel.
* `BLIS_3M1`: Implementation based on the 3m method applied within the 1st loop around the microkernel.
* `BLIS_4MH`: Implementation based on the 4m method applied at the highest level, outside the 5th loop around the microkernel.
* `BLIS_4M1B`: Implementation based on the 4m method applied within the 1st loop around the microkernel. Computation is ordered such that the 1st loop is fissured into two loops, the first of which multiplies the real part of the current micropanel of packed matrix B (against all real and imaginary parts of packed matrix A), and the second of which multiplies the imaginary part of the current micropanel of packed matrix B.
* `BLIS_4M1A`: Implementation based on the 4m method applied within the 1st loop around the microkernel. Computation is ordered such that real and imaginary components of the current micropanels are completely used before proceeding to the next virtual microkernel invocation.
* `BLIS_1M`: Implementation based on the 1m method. (This is the default induced method when real domain kernels are present but complex kernels are missing.)
* `BLIS_NAT`: Implementation based on "native" execution (ie: NOT an induced method).
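
As a rough illustration of what an induced method does, the sketch below multiplies two complex scalars using only real arithmetic, in the spirit of the 1m method. It assumes the layout usually described for 1m, in which one operand is expanded into a 2x2 real block (the "1e" schema) and the other is stored as a real/imaginary column (the "1r" schema). The function name `mul_1m_scalar` is hypothetical and is not part of BLIS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a BLIS function: computes
   (ar + i*ai) * (br + i*bi) with real arithmetic only. */
void mul_1m_scalar(double ar, double ai, double br, double bi,
                   double* cr, double* ci)
{
    assert(cr != NULL && ci != NULL);

    /* Assumed "1e"-style expansion of the left operand: [ ar -ai ; ai ar ]. */
    const double a1e[2][2] = { { ar, -ai }, { ai, ar } };

    /* Assumed "1r"-style storage of the right operand: [ br ; bi ]. */
    const double b1r[2] = { br, bi };

    /* A real 2x2 matrix-vector product yields the real and imaginary parts. */
    *cr = a1e[0][0] * b1r[0] + a1e[0][1] * b1r[1];
    *ci = a1e[1][0] * b1r[0] + a1e[1][1] * b1r[1];
}
```

For example, (1 + 2i)(3 + 4i) gives `cr = -5` and `ci = 10`. The appeal of this formulation is that the whole product is an ordinary real matrix multiplication, which is why a real domain kernel can stand in for a missing complex one.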

**NOTE**: `BLIS_3M3` and `BLIS_3M2` have been deprecated from the `typedef enum` of `ind_t`, and `BLIS_4M1B` is also effectively no longer available, though the `typedef enum` value still exists.

Possible microkernel types (ie: the return values for `bli_info_get_*_ukr_impl_string()`) are:
* `BLIS_REFERENCE_UKERNEL` (`"refrnce"`): This value is returned when the queried microkernel is provided by the reference implementation.
* `BLIS_VIRTUAL_UKERNEL` (`"virtual"`): This value is returned when the queried microkernel is driven by the "virtual" microkernel provided by an induced method. This happens for any `method` value that is not `BLIS_NAT` (ie: native), but only applies to the complex domain.
52 changes: 20 additions & 32 deletions docs/Sandboxes.md
@@ -17,13 +17,9 @@ Simply put, a sandbox in BLIS provides an alternative implementation to the
`gemm` operation.

To get a little more specific, a sandbox provides an alternative implementation
to the function `bli_gemmnat()`, which is the object-based API call for
computing the `gemm` operation via native execution.

**Note**: Native execution simply means that an induced method will not be used.
It's what you probably already think of when you think of implementing the
`gemm` operation: a series of loops around an optimized (usually assembly-based)
microkernel with some packing functions thrown in at various levels.
to the function `bli_gemm_ex()`, which is the
[expert interface](BLISObjectAPI.md#basic-vs-expert-interfaces) for calling the
[object-based API](BLISObjectAPI.md#gemm) for the `gemm` operation.

Why sandboxes? Sometimes you want to experiment with tweaks or changes to
the `gemm` operation, but you want to do so in a simple environment rather than
@@ -45,18 +41,11 @@ corresponds to a sub-directory of `sandbox` named `gemmlike`. (Reminder: the
`auto` argument is the configuration target and thus unrelated to
sandboxes.)

NOTE: If you want your sandbox implementation to handle *all* problem
sizes and shapes, you'll need to disable the skinny/unpacked "sup"
sub-framework within BLIS, which is enabled by default. This can be
done by passing the `--disable-sup-handling` option to configure:
```
$ ./configure --enable-sandbox=gemmlike --disable-sup-handling auto
```
If you leave sup enabled, the sup implementation will, at runtime, detect
and handle certain smaller problem sizes upstream of where BLIS calls
`bli_gemmnat()` while all other problems will fall to your sandbox
implementation. Thus, you should only leave sup enabled if you are fine
with those smaller problems being handled by sup.
NOTE: Using your own sandbox implementation means that BLIS will call your
sandbox for *all* problem sizes and shapes, for *all* datatypes supported
by BLIS. If you intend to only implement a subset of this functionality
within your sandbox, you should be sure to redirect execution back into
the core framework for the parts that you don't wish to reimplement yourself.

As `configure` runs, you should get output that includes lines
similar to:
@@ -67,13 +56,12 @@ configure: sandbox/gemmlike
And when you build BLIS, the last files to be compiled will be the source
code in the specified sandbox:
```
Compiling obj/haswell/sandbox/gemmlike/bli_gemmnat.o ('haswell' CFLAGS for sandboxes)
Compiling obj/haswell/sandbox/gemmlike/bls_gemm.o ('haswell' CFLAGS for sandboxes)
Compiling obj/haswell/sandbox/gemmlike/bls_gemm_bp_var1.o ('haswell' CFLAGS for sandboxes)
...
```
That's it! After the BLIS library is built, it will contain your chosen
sandbox's implementation of `bli_gemmnat()` instead of the default
sandbox's implementation of `bli_gemm_ex()` instead of the default BLIS
implementation.
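
A minimal sketch of that contract follows, assuming the expert object API signature for `bli_gemm_ex()` (alpha, a, b, beta, c, cntx, rntm). The `obj_t`, `cntx_t`, and `rntm_t` typedefs below are stand-ins so the sketch compiles without BLIS; a real sandbox would get them from `blis.h`. The `mysb_` prefix, `mysb_gemm()`, and `mysb_call_count` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types so the sketch compiles on its own. In a real sandbox
   these come from blis.h and must not be redefined. */
typedef struct { int unused; } obj_t;
typedef struct { int unused; } cntx_t;
typedef struct { int unused; } rntm_t;

/* Hypothetical counter used only to observe that the sandbox code ran. */
static int mysb_call_count = 0;

/* Sandbox-local implementation. It uses a non-bli_ prefix to avoid
   colliding with framework symbols. */
static void mysb_gemm(obj_t* alpha, obj_t* a, obj_t* b,
                      obj_t* beta, obj_t* c,
                      cntx_t* cntx, rntm_t* rntm)
{
    (void)alpha; (void)a; (void)b; (void)beta; (void)cntx; (void)rntm;
    assert(c != NULL);   /* the output operand must exist */
    mysb_call_count++;   /* placeholder for the actual gemm algorithm */
}

/* The one bli_-prefixed function the sandbox is obliged to define. */
void bli_gemm_ex(obj_t* alpha, obj_t* a, obj_t* b,
                 obj_t* beta, obj_t* c,
                 cntx_t* cntx, rntm_t* rntm)
{
    mysb_gemm(alpha, a, b, beta, c, cntx, rntm);
}
```

Keeping `bli_gemm_ex()` as a thin wrapper around a locally prefixed function leaves the rest of the sandbox free to evolve without touching the one symbol the framework expects.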

## Sandbox rules
@@ -97,15 +85,15 @@ Note that `blis.h` already contains all of its definitions inside of an
`extern "C"` block, so you should be able to `#include "blis.h"` from your
C++11 source code without any issues.

3. All of your code to replace BLIS's default implementation of `bli_gemmnat()`
3. All of your code to replace BLIS's default implementation of `bli_gemm_ex()`
should reside in the named sandbox directory, or some directory therein.
(Obviously.) For example, the "gemmlike" sandbox is located in
`sandbox/gemmlike`. All of the code associated with this sandbox will be
contained within `sandbox/gemmlike`. Note that you absolutely *may* include
additional code and interfaces within the sandbox, if you wish -- code and
interfaces that are not directly or indirectly needed for satisfying the
"contract" set forth by the sandbox (i.e., including a local definition
of`bli_gemmnat()`).
of `bli_gemm_ex()`).

4. The *only* header file that is required of your sandbox is `bli_sandbox.h`.
It must be named `bli_sandbox.h` because `blis.h` will `#include` this file
@@ -119,12 +107,12 @@ you should only place things (e.g. prototypes or type definitions) in
(b) an *application* that calls your sandbox-enabled BLIS library.
Usually, neither of these situations will require any of your local definitions
since those local definitions are only needed to define your sandbox
implementation of `bli_gemmnat()`, and this function is already prototyped by
implementation of `bli_gemm_ex()`, and this function is already prototyped by
BLIS. *But if you are adding additional APIs and/or operations to the sandbox
that are unrelated to `bli_gemmnat()`, then you'll want to `#include` those
that are unrelated to `bli_gemm_ex()`, then you'll want to `#include` those
function prototypes from within `bli_sandbox.h`*

5. Your definition of `bli_gemmnat()` should be the **only function you define**
5. Your definition of `bli_gemm_ex()` should be the **only function you define**
in your sandbox that begins with `bli_`. If you define other functions that
begin with `bli_`, you risk a namespace collision with existing framework
functions. To guarantee safety, please prefix your locally-defined sandbox
@@ -147,9 +135,9 @@ For example, with a BLIS sandbox you **can** do the following kinds of things:
kernels, which can already be customized within each sub-configuration);
- try inlining your functions manually;
- pivot away from using `obj_t` objects at higher algorithmic level (such as
immediately after calling `bli_gemmnat()`) to try to avoid some overhead;
immediately after calling `bli_gemm_ex()`) to try to avoid some overhead;
- create experimental implementations of new BLAS-like operations (provided
that you also provide an implementation of `bli_gemmnat()`).
that you also provide an implementation of `bli_gemm_ex()`).

You **cannot**, however, use a sandbox to do the following kinds of things:
- define new datatypes (half-precision, quad-precision, short integer, etc.)
@@ -167,17 +155,17 @@ Another important limitation is the fact that the build system currently uses
# Example framework CFLAGS used by 'haswell' sub-configuration
-O3 -Wall -Wno-unused-function -Wfatal-errors -fPIC -std=c99
-D_POSIX_C_SOURCE=200112L -I./include/haswell -I./frame/3/
-I./frame/ind/ukernels/ -I./frame/1m/ -I./frame/1f/ -I./frame/1/
-I./frame/include -DBLIS_VERSION_STRING=\"0.3.2-51\"
-I./frame/1m/ -I./frame/1f/ -I./frame/1/ -I./frame/include
-DBLIS_VERSION_STRING=\"0.3.2-51\"
```
which are likely more general-purpose than the `CFLAGS` used for, say,
optimized kernels or even reference kernels.
```
# Example optimized kernel CFLAGS used by 'haswell' sub-configuration
-O3 -mavx2 -mfma -mfpmath=sse -march=core-avx2 -Wall -Wno-unused-function
-Wfatal-errors -fPIC -std=c99 -D_POSIX_C_SOURCE=200112L -I./include/haswell
-I./frame/3/ -I./frame/ind/ukernels/ -I./frame/1m/ -I./frame/1f/ -I./frame/1/
-I./frame/include -DBLIS_VERSION_STRING=\"0.3.2-51\"
-I./frame/3/ -I./frame/1m/ -I./frame/1f/ -I./frame/1/ -I./frame/include
-DBLIS_VERSION_STRING=\"0.3.2-51\"
```
(To see precisely which flags are being employed for any given file, enable
verbosity at compile-time via `make V=1`.) Compiling sandboxes with these more
7 changes: 1 addition & 6 deletions docs/Testsuite.md
@@ -128,11 +128,6 @@ sdcz # Datatype(s) to test:
300 # Problem size: maximum to test
100 # Problem size: increment between experiments
# Complex level-3 implementations to test
1 # 3mh ('1' = enable; '0' = disable)
1 # 3m1 ('1' = enable; '0' = disable)
1 # 4mh ('1' = enable; '0' = disable)
1 # 4m1b ('1' = enable; '0' = disable)
1 # 4m1a ('1' = enable; '0' = disable)
1 # 1m ('1' = enable; '0' = disable)
1 # native ('1' = enable; '0' = disable)
1 # Simulate application-level threading:
@@ -169,7 +164,7 @@ _**Test gemm with mixed-precision operands?**_ This boolean determines whether `

_**Problem size.**_ These values determine the first problem size to test, the maximum problem size to test, and the increment between problem sizes. Note that the maximum problem size only bounds the range of problem sizes; it is not guaranteed to be tested. Example: If the initial problem size is 128, the maximum is 1000, and the increment is 64, then the last problem size to be tested will be 960.
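
The arithmetic behind that example can be sketched as follows. The function name `last_problem_size` is hypothetical and is not part of the testsuite; it simply computes the largest value of the form `first + k * inc` that does not exceed the maximum.

```c
#include <assert.h>

/* Hypothetical helper: largest size of the form first + k*inc (k >= 0)
   that does not exceed max. Returns -1 if first > max, in which case
   no problem size falls within the range. */
long last_problem_size(long first, long max, long inc)
{
    assert(inc > 0);
    if (first > max) return -1;
    return first + ((max - first) / inc) * inc;
}
```

With `first = 128`, `max = 1000`, and `inc = 64`, the function returns 960, since the next candidate (1024) would exceed the maximum.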

_**Complex level-3 implementations to test.**_ With the exception of the switch marked `native`, these switches control whether experimental complex domain implementations are tested (when applicable). These implementations employ induced methods complex matrix multiplication and apply to some (though not all) of the level-3 operations. If you don't know what these are, you can ignore them. The `native` switch corresponds to native execution of complex domain level-3 operations, which we test by default. We also test the `1m` method, since it is the induced method of choice when complex microkernels are not available. Note that all of these induced method tests (including `native`) are automatically disabled if the `c` and `z` datatypes are disabled.
_**Complex level-3 implementations to test.**_ This section lists which complex domain implementations of level-3 operations are tested. If you don't know what these are, you can ignore them. The `native` switch corresponds to native execution of complex domain level-3 operations, which we test by default. We also test the `1m` method, since it is the induced method of choice when optimized complex microkernels are not available. Note that all of these induced method tests (including `native`) are automatically disabled if the `c` and `z` datatypes are disabled.

_**Simulate application-level threading.**_ This setting specifies the number of threads the testsuite will spawn, and is meant to allow the user to exercise BLIS as a multithreaded application might if it were to make multiple concurrent calls to BLIS operations. (Note that the threading controlled by this option is orthogonal to, and has no effect on, whatever multithreading may be employed _within_ BLIS, as specified by the environment variables described in the [Multithreading](Multithreading.md) documentation.) When this option is set to 1, the testsuite is run with only one thread. When set to n > 1 threads, the spawned threads will parallelize (in round-robin fashion) the total set of tests specified by the testsuite input files, executing them in roughly the same order as that of a sequential execution.
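
One way to picture the round-robin distribution is the simplest possible scheme, in which tests are numbered in sequential order and dealt out to threads in turn. This is an assumption made for illustration; the helper `owner_thread` is hypothetical and the testsuite's actual bookkeeping may differ.

```c
#include <assert.h>

/* Hypothetical helper: which of n_threads threads (0 .. n_threads-1)
   runs the test with the given zero-based sequential index. */
int owner_thread(int test_index, int n_threads)
{
    assert(test_index >= 0 && n_threads > 0);
    return test_index % n_threads;
}
```

Under this scheme, four threads would take tests 0, 4, 8, ... and 1, 5, 9, ... and so on, which keeps the overall execution order close to that of a sequential run.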

26 changes: 0 additions & 26 deletions frame/1m/bli_l1m_ft_ker.h
@@ -110,28 +110,6 @@ typedef void (*PASTECH3(ch,opname,_ker,tsuf)) \

INSERT_GENTDEF( unpackm_cxk )

// packm_3mis_ker
// packm_4mi_ker

#undef GENTDEF
#define GENTDEF( ctype, ch, opname, tsuf ) \
\
typedef void (*PASTECH3(ch,opname,_ker,tsuf)) \
( \
conj_t conja, \
dim_t cdim, \
dim_t n, \
dim_t n_max, \
ctype* restrict kappa, \
ctype* restrict a, inc_t inca, inc_t lda, \
ctype* restrict p, inc_t is_p, inc_t ldp, \
cntx_t* restrict cntx \
);

INSERT_GENTDEF( packm_cxk_3mis )
INSERT_GENTDEF( packm_cxk_4mi )

// packm_rih_ker
// packm_1er_ker

#undef GENTDEF
@@ -150,12 +128,8 @@ typedef void (*PASTECH3(ch,opname,_ker,tsuf)) \
cntx_t* restrict cntx \
);

INSERT_GENTDEF( packm_cxk_rih )
INSERT_GENTDEF( packm_cxk_1er )





#endif

45 changes: 0 additions & 45 deletions frame/1m/bli_l1m_ker.h
@@ -74,51 +74,6 @@ INSERT_GENTPROT_BASIC0( unpackm_14xk_ker_name )
INSERT_GENTPROT_BASIC0( unpackm_16xk_ker_name )


// 3mis packm kernels

#undef GENTPROT
#define GENTPROT PACKM_3MIS_KER_PROT

INSERT_GENTPROT_BASIC0( packm_2xk_3mis_ker_name )
INSERT_GENTPROT_BASIC0( packm_4xk_3mis_ker_name )
INSERT_GENTPROT_BASIC0( packm_6xk_3mis_ker_name )
INSERT_GENTPROT_BASIC0( packm_8xk_3mis_ker_name )
INSERT_GENTPROT_BASIC0( packm_10xk_3mis_ker_name )
INSERT_GENTPROT_BASIC0( packm_12xk_3mis_ker_name )
INSERT_GENTPROT_BASIC0( packm_14xk_3mis_ker_name )
INSERT_GENTPROT_BASIC0( packm_16xk_3mis_ker_name )


// 4mi packm kernels

#undef GENTPROT
#define GENTPROT PACKM_4MI_KER_PROT

INSERT_GENTPROT_BASIC0( packm_2xk_4mi_ker_name )
INSERT_GENTPROT_BASIC0( packm_4xk_4mi_ker_name )
INSERT_GENTPROT_BASIC0( packm_6xk_4mi_ker_name )
INSERT_GENTPROT_BASIC0( packm_8xk_4mi_ker_name )
INSERT_GENTPROT_BASIC0( packm_10xk_4mi_ker_name )
INSERT_GENTPROT_BASIC0( packm_12xk_4mi_ker_name )
INSERT_GENTPROT_BASIC0( packm_14xk_4mi_ker_name )
INSERT_GENTPROT_BASIC0( packm_16xk_4mi_ker_name )


// rih packm kernels

#undef GENTPROT
#define GENTPROT PACKM_RIH_KER_PROT

INSERT_GENTPROT_BASIC0( packm_2xk_rih_ker_name )
INSERT_GENTPROT_BASIC0( packm_4xk_rih_ker_name )
INSERT_GENTPROT_BASIC0( packm_6xk_rih_ker_name )
INSERT_GENTPROT_BASIC0( packm_8xk_rih_ker_name )
INSERT_GENTPROT_BASIC0( packm_10xk_rih_ker_name )
INSERT_GENTPROT_BASIC0( packm_12xk_rih_ker_name )
INSERT_GENTPROT_BASIC0( packm_14xk_rih_ker_name )
INSERT_GENTPROT_BASIC0( packm_16xk_rih_ker_name )


// 1e/1r packm kernels

#undef GENTPROT
52 changes: 0 additions & 52 deletions frame/1m/bli_l1m_ker_prot.h
@@ -70,58 +70,6 @@ void PASTEMAC(ch,varname) \
);


// 3mis packm kernels

#define PACKM_3MIS_KER_PROT( ctype, ch, varname ) \
\
void PASTEMAC(ch,varname) \
( \
conj_t conja, \
dim_t cdim, \
dim_t n, \
dim_t n_max, \
ctype* restrict kappa, \
ctype* restrict a, inc_t inca, inc_t lda, \
ctype* restrict p, inc_t is_p, inc_t ldp, \
cntx_t* restrict cntx \
);


// 4mi packm kernels

#define PACKM_4MI_KER_PROT( ctype, ch, varname ) \
\
void PASTEMAC(ch,varname) \
( \
conj_t conja, \
dim_t cdim, \
dim_t n, \
dim_t n_max, \
ctype* restrict kappa, \
ctype* restrict a, inc_t inca, inc_t lda, \
ctype* restrict p, inc_t is_p, inc_t ldp, \
cntx_t* restrict cntx \
);


// rih packm kernels

#define PACKM_RIH_KER_PROT( ctype, ch, varname ) \
\
void PASTEMAC(ch,varname) \
( \
conj_t conja, \
pack_t schema, \
dim_t cdim, \
dim_t n, \
dim_t n_max, \
ctype* restrict kappa, \
ctype* restrict a, inc_t inca, inc_t lda, \
ctype* restrict p, inc_t ldp, \
cntx_t* restrict cntx \
);


// 1e/1r packm kernels

#define PACKM_1ER_KER_PROT( ctype, ch, varname ) \
6 changes: 0 additions & 6 deletions frame/1m/packm/bli_packm.h
@@ -43,15 +43,9 @@
#include "bli_packm_var.h"

#include "bli_packm_struc_cxk.h"
#include "bli_packm_struc_cxk_4mi.h"
#include "bli_packm_struc_cxk_3mis.h"
#include "bli_packm_struc_cxk_rih.h"
#include "bli_packm_struc_cxk_1er.h"

#include "bli_packm_cxk.h"
#include "bli_packm_cxk_4mi.h"
#include "bli_packm_cxk_3mis.h"
#include "bli_packm_cxk_rih.h"
#include "bli_packm_cxk_1er.h"

// Mixed datatype support.