ergo
Matrix multiplication template for architectures with SSE2 or higher and compilers that support C++ intrinsics for access to SSE instructions.
#include <mm_kernel_inner_sse2_A.h>
Classes
struct Loop
struct Loop< T_end, T_end >
class Pack
  Template for packing of matrix elements.
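The pairing of a primary Loop template with a Loop< T_end, T_end > specialization suggests a compile-time loop: the primary template handles one index and recurses to the next, and the specialization ends the recursion once the index reaches T_end. The following is a minimal sketch of that general pattern only. The names Static_loop, accumulate and static_sum are hypothetical and do not reproduce the members of the real Loop struct.

```cpp
// Hypothetical compile-time loop; it only illustrates the recursion scheme.
template <int T_loop_index, int T_end>
struct Static_loop {
    template <typename real>
    static void accumulate(real const* x, real& sum) {
        sum += x[T_loop_index];  // body for the current index
        // Recurse to the next index; the compiler can unroll this fully.
        Static_loop<T_loop_index + 1, T_end>::accumulate(x, sum);
    }
};

// Terminating case: index == end, so there is nothing left to do.
template <int T_end>
struct Static_loop<T_end, T_end> {
    template <typename real>
    static void accumulate(real const*, real&) {}
};

// Sums the first T_n elements of x with a loop resolved at compile time.
template <int T_n, typename real>
real static_sum(real const* x) {
    real sum = 0;
    Static_loop<0, T_n>::accumulate(x, sum);
    return sum;
}
```

Because the trip count is a template argument, a call such as static_sum<4>(x) involves no runtime loop counter, which is the usual motivation for this idiom in small fixed-size kernels.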
Public Types
typedef T_real real
  Real number type (usually float or double).
typedef Pack< M, K, Ordering_col_wise, 1 > Pack_type_A
  Type that can (should) be used to pack A.
typedef Pack< K, N, Ordering_row_wise, floats_per_register > Pack_type_B
  Type that can (should) be used to pack B.
typedef Pack< M, N, Ordering_col_wise, 1 > Pack_type_C
  Type that can (should) be used to pack C.
Static Public Member Functions
static void exec (real const *const *const A, real const *const *const B, real *const C, int const i=1, int const offset_A=0, int const offset_B=0, int const offset_C=0)
  Executes the matrix-matrix multiply C += A B with the three matrices A, B, and C stored according to the static members and typedefs of this class.
template<int T_offset_A, int T_offset_B, int T_offset_C>
static void exec (real const *const *const A, real const *const *const B, real *const C, int const i=1)
Static Public Attributes
static int const M = T_M
  Number of rows of A and C.
static int const N = T_N
  Number of columns of B and C.
static int const K = T_K
  Number of columns of A and rows of B.
Static Protected Attributes
static int const floats_per_register = ( sizeof(T_reg) / sizeof(real) )
  Number of real numbers that fit in one register.
Matrix multiplication template for architectures with SSE2 or higher and compilers that support C++ intrinsics for access to SSE instructions.
Choice of template parameters:
The public typedefs and static members specify how the matrices must be stored.
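As an illustration of what these storage conventions imply, the sketch below is a scalar reference for C += A B with A (M x K) and C (M x N) stored column-wise and B (K x N) stored row-wise, matching the orderings named in Pack_type_A, Pack_type_B and Pack_type_C. It rests on several assumptions: that column-wise means element (row, col) sits at index col * rows + row and row-wise means row * cols + col, and that the last Pack parameter (1 or floats_per_register) can be ignored for a scalar reference. The helpers idx_col_wise, idx_row_wise and reference_multiply_add are hypothetical. The sketch uses plain arrays rather than the pointer-to-pointer arguments of exec.

```cpp
#include <cstddef>
#include <vector>

// Assumed column-wise layout: columns are contiguous.
inline std::size_t idx_col_wise(int row, int col, int n_rows) {
    return static_cast<std::size_t>(col) * n_rows + row;
}

// Assumed row-wise layout: rows are contiguous.
inline std::size_t idx_row_wise(int row, int col, int n_cols) {
    return static_cast<std::size_t>(row) * n_cols + col;
}

// Scalar reference for C += A B.
// A is M x K (column-wise), B is K x N (row-wise), C is M x N (column-wise).
template <typename real, int M, int N, int K>
void reference_multiply_add(std::vector<real> const& A,
                            std::vector<real> const& B,
                            std::vector<real>& C) {
    for (int col = 0; col < N; ++col)
        for (int k = 0; k < K; ++k)
            for (int row = 0; row < M; ++row)
                C[idx_col_wise(row, col, M)] +=
                    A[idx_col_wise(row, k, M)] * B[idx_row_wise(k, col, N)];
}
```

With the innermost loop running over rows, consecutive iterations touch consecutive elements of a column of A and of C while reusing a single element of B, which is the access pattern a register-wide kernel would plausibly exploit.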
typedef Pack< M, K, Ordering_col_wise, 1 > MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >::Pack_type_A
Type that can (should) be used to pack A.
typedef Pack< K, N, Ordering_row_wise, floats_per_register > MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >::Pack_type_B
Type that can (should) be used to pack B.
typedef Pack< M, N, Ordering_col_wise, 1 > MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >::Pack_type_C
Type that can (should) be used to pack C.
typedef T_real MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >::real
Real number type (usually float or double).
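template<int T_offset_A, int T_offset_B, int T_offset_C>
static void MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >::exec (real const *const *const A, real const *const *const B, real *const C, int const i=1)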
References A, B, and STATIC_ASSERT_DEBUG.
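static void MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >::exec (real const *const *const A, real const *const *const B, real *const C, int const i=1, int const offset_A=0, int const offset_B=0, int const offset_C=0)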
Executes the matrix-matrix multiply C += A B with the three matrices A, B, and C stored according to the static members and typedefs of this class.
References A, B, and STATIC_ASSERT_DEBUG.
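static int const MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >::floats_per_register = ( sizeof(T_reg) / sizeof(real) ) [protected]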
Number of real numbers that fit in one register.
Referenced by MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >::Loop< T_loop_index, T_end >::add(), MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >::Loop< T_loop_index, T_end >::inner(), MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >::Loop< T_loop_index, T_end >::middle(), MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >::Loop< T_loop_index, T_end >::outer(), and MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >::Loop< T_loop_index, T_end >::store().
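To make the definition sizeof(T_reg) / sizeof(real) concrete, the sketch below evaluates it for the two standard 128-bit SSE register types. It assumes T_reg is __m128 for float and __m128d for double, which this page does not state explicitly. Register_capacity is a hypothetical stand-in, not part of the class.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics; declares __m128 and __m128d

// Hypothetical stand-in that repeats the formula used for floats_per_register.
template <typename T_real, typename T_reg>
struct Register_capacity {
    static int const floats_per_register = sizeof(T_reg) / sizeof(T_real);
};

// A 128-bit register holds four 4-byte floats ...
static_assert(Register_capacity<float, __m128>::floats_per_register == 4,
              "four floats per __m128");
// ... or two 8-byte doubles.
static_assert(Register_capacity<double, __m128d>::floats_per_register == 2,
              "two doubles per __m128d");
```

Under this assumption, the last template argument of Pack_type_B would be 4 in single precision and 2 in double precision.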
static int const MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >::K = T_K
Number of columns of A and rows of B.
static int const MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >::M = T_M
Number of rows of A and C.
static int const MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >::N = T_N
Number of columns of B and C.