ergo
MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K > Class Template Reference

Matrix multiplication template for architectures with SSE2 or higher and compilers that support C++ intrinsics for access to SSE instructions.

#include <mm_kernel_inner_sse2_A.h>

Classes

struct  Loop
 
struct  Loop< T_end, T_end >
 
class  Pack
 Template for packing of matrix elements.
 

Public Types

typedef T_real real
 Real number type (usually float or double)
 
typedef Pack< M, K, Ordering_col_wise, 1 > Pack_type_A
 Type that can (should) be used to pack A.
 
typedef Pack< K, N, Ordering_row_wise, floats_per_register > Pack_type_B
 Type that can (should) be used to pack B.
 
typedef Pack< M, N, Ordering_col_wise, 1 > Pack_type_C
 Type that can (should) be used to pack C.
 

Static Public Member Functions

static void exec (real const *const *const A, real const *const *const B, real *const C, int const i=1, int const offset_A=0, int const offset_B=0, int const offset_C=0)
 Executes the matrix-matrix multiply C += A B with the three matrices A, B, and C stored according to the static members and typedefs of this class.
 
template<int T_offset_A, int T_offset_B, int T_offset_C>
static void exec (real const *const *const A, real const *const *const B, real *const C, int const i=1)
 

Static Public Attributes

static int const M = T_M
 Number of rows of A and C.
 
static int const N = T_N
 Number of columns of B and C.
 
static int const K = T_K
 Number of columns of A and rows of B.
 

Static Protected Attributes

static int const floats_per_register = ( sizeof(T_reg) / sizeof(real) )
 Number of real numbers that fit in one register.
 

Detailed Description

template<typename T_real, typename T_reg, int T_M, int T_N, int T_K>
class MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >

Matrix multiplication template for architectures with SSE2 or higher and compilers that support C++ intrinsics for access to SSE instructions.


Choice of template parameters:

  • T_M and T_N should be chosen so that the T_M x T_N matrix C fits in registers. For example, T_M == T_N == 4.
  • T_K should be chosen so that the generated code fits in the L1 instruction cache. For example, T_K == 128.
  • T_real and T_reg must match: T_reg must be the SSE register type that holds values of type T_real. Examples:
    • <T_real, T_reg> == <double, __m128d>
    • <T_real, T_reg> == <float, __m128>
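
As a rough check on these choices, the sketch below computes floats_per_register with the same formula the class uses and counts how many SSE registers a T_M x T_N block of C occupies. The helper registers_for_C is hypothetical and assumes that each register holds consecutive elements of one column of C; the code needs an x86 or x86-64 target for the intrinsic types. With T_M == T_N == 4 the block takes 8 XMM registers for double and 4 for float. x86-64 provides 16 XMM registers (32-bit x86 provides 8), so the remainder is presumably available for elements of A and B.

```cpp
#include <xmmintrin.h>  // SSE:  __m128  (four floats)
#include <emmintrin.h>  // SSE2: __m128d (two doubles)
#include <cassert>

// Same formula as the class's floats_per_register: sizeof(T_reg) / sizeof(real).
template <typename T_real, typename T_reg>
constexpr int floats_per_register() {
  return static_cast<int>(sizeof(T_reg) / sizeof(T_real));
}

// Hypothetical helper: registers needed to keep an M x N block of C resident,
// assuming each register holds floats_per_register consecutive elements of one
// column of C and that M is a multiple of the register width.
template <typename T_real, typename T_reg, int M, int N>
constexpr int registers_for_C() {
  static_assert(M % floats_per_register<T_real, T_reg>() == 0,
                "M should be a multiple of the register width");
  return (M / floats_per_register<T_real, T_reg>()) * N;
}

// The two pairings listed above.
static_assert(floats_per_register<double, __m128d>() == 2, "two doubles per XMM register");
static_assert(floats_per_register<float, __m128>() == 4, "four floats per XMM register");

// T_M == T_N == 4: 8 registers for double, 4 for float.
static_assert(registers_for_C<double, __m128d, 4, 4>() == 8, "4x4 doubles");
static_assert(registers_for_C<float, __m128, 4, 4>() == 4, "4x4 floats");
```

Note that sizeof alone cannot reject a mismatched pair such as <double, __m128> (it also yields 2), so keeping T_real and T_reg consistent is up to the user.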

The public typedefs and static members specify how the matrices must be stored.
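
One plausible way a kernel of this shape can use those storage formats is sketched below for double precision with __m128d. It is an illustration under assumptions, not this class's code: A and C are taken to be stored column-wise, and B row-wise with every element stored twice (matching a repetition count of floats_per_register == 2). The function name mm_sketch_sse2 is hypothetical, and the pointer-to-pointer arguments, i, and offset parameters of exec() are not modeled.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics for double precision
#include <cassert>

// Hypothetical sketch, not the library's implementation: C += A B for an
// M x K block of A and a K x N block of B in double precision. Assumed layouts:
//   A: column-wise,                   A(i,k) at A[i + k*M]
//   B: row-wise, each element twice,  B(k,j) at B[2*(k*N + j)] and B[2*(k*N + j) + 1]
//   C: column-wise,                   C(i,j) at C[i + j*M]
// All three buffers must be 16-byte aligned for _mm_load_pd/_mm_store_pd.
template <int M, int N, int K>
void mm_sketch_sse2(const double* A, const double* B, double* C) {
  static_assert(M % 2 == 0, "M must be a multiple of 2 (two doubles per __m128d)");
  constexpr int R = M / 2;  // registers per column of C
  __m128d c[R][N];          // intended to stay in registers once loops are unrolled

  // Load the C block: register r of column j holds rows 2r and 2r+1.
  for (int j = 0; j < N; ++j)
    for (int r = 0; r < R; ++r)
      c[r][j] = _mm_load_pd(C + j * M + 2 * r);

  for (int k = 0; k < K; ++k) {
    __m128d a[R];  // column k of A
    for (int r = 0; r < R; ++r)
      a[r] = _mm_load_pd(A + k * M + 2 * r);
    for (int j = 0; j < N; ++j) {
      // B(k,j) is stored twice, so one aligned load fills both lanes with it.
      const __m128d b = _mm_load_pd(B + 2 * (k * N + j));
      for (int r = 0; r < R; ++r)
        c[r][j] = _mm_add_pd(c[r][j], _mm_mul_pd(a[r], b));
    }
  }

  // Write the updated C block back.
  for (int j = 0; j < N; ++j)
    for (int r = 0; r < R; ++r)
      _mm_store_pd(C + j * M + 2 * r, c[r][j]);
}
```

Storing each element of B floats_per_register times is what lets a single aligned load produce B(k,j) in every lane, so the inner loop needs only loads, multiplies and adds.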

Member Typedef Documentation

◆ Pack_type_A

template<typename T_real , typename T_reg , int T_M, int T_N, int T_K>
typedef Pack< M, K, Ordering_col_wise, 1 > MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >::Pack_type_A

Type that can (should) be used to pack A.
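
A minimal sketch of what column-wise packing means, assuming Ordering_col_wise places element (i, j) of a block with Rows rows at offset i + j*Rows, and that a repetition count of 1 means each element is stored once. The helper pack_col_wise is hypothetical and reads from an ordinary row-major array; it is not the Pack class's interface. Pack_type_C uses the same ordering for the M x N block of C.

```cpp
#include <cassert>

// Hypothetical helper: copy a Rows x Cols matrix from an ordinary row-major
// array into column-wise order, so that element (i, j) lands at dst[i + j*Rows].
template <int Rows, int Cols, typename T>
void pack_col_wise(const T* src_row_major, T* dst) {
  for (int j = 0; j < Cols; ++j)      // one column at a time
    for (int i = 0; i < Rows; ++i)    // consecutive rows are contiguous in dst
      dst[i + j * Rows] = src_row_major[i * Cols + j];
}
```

For the 2 x 3 matrix with rows {1, 2, 3} and {4, 5, 6}, pack_col_wise<2, 3> produces {1, 4, 2, 5, 3, 6}.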

◆ Pack_type_B

template<typename T_real , typename T_reg , int T_M, int T_N, int T_K>
typedef Pack< K, N, Ordering_row_wise, floats_per_register > MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >::Pack_type_B

Type that can (should) be used to pack B.
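
Pack_type_B differs from the other two packing types in its ordering and in its fourth argument. Assuming that argument is a per-element repetition count, the hypothetical helper below writes each element of B Reps times in a row, in row-wise order; with double and __m128d (Reps == floats_per_register == 2) each repeated element then occupies exactly one 16-byte register's worth of memory.

```cpp
#include <cassert>

// Hypothetical helper: copy a Rows x Cols row-major matrix into row-wise order
// with every element written Reps times in a row, so that element (i, j)
// occupies dst[(i*Cols + j)*Reps] .. dst[(i*Cols + j)*Reps + Reps - 1].
template <int Rows, int Cols, int Reps, typename T>
void pack_row_wise_repeated(const T* src_row_major, T* dst) {
  for (int i = 0; i < Rows; ++i)
    for (int j = 0; j < Cols; ++j)
      for (int r = 0; r < Reps; ++r)  // replicate the element Reps times
        dst[(i * Cols + j) * Reps + r] = src_row_major[i * Cols + j];
}
```

For the 2 x 2 matrix with rows {5, 6} and {7, 8}, pack_row_wise_repeated<2, 2, 2> produces {5, 5, 6, 6, 7, 7, 8, 8}.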

◆ Pack_type_C

template<typename T_real , typename T_reg , int T_M, int T_N, int T_K>
typedef Pack< M, N, Ordering_col_wise, 1 > MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >::Pack_type_C

Type that can (should) be used to pack C.

◆ real

template<typename T_real , typename T_reg , int T_M, int T_N, int T_K>
typedef T_real MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >::real

Real number type (usually float or double)

Member Function Documentation

◆ exec() [1/2]

template<typename real , typename T_reg , int T_M, int T_N, int T_K>
template<int T_offset_A, int T_offset_B, int T_offset_C>
void MM_kernel_inner_sse2_A< real, T_reg, T_M, T_N, T_K >::exec ( real const *const *const  A,
real const *const *const  B,
real *const  C,
int const  i = 1 
)
static

References A, B, and STATIC_ASSERT_DEBUG.

◆ exec() [2/2]

template<typename real , typename T_reg , int T_M, int T_N, int T_K>
void MM_kernel_inner_sse2_A< real, T_reg, T_M, T_N, T_K >::exec ( real const *const *const  A,
real const *const *const  B,
real *const  C,
int const  i = 1,
int const  offset_A = 0,
int const  offset_B = 0,
int const  offset_C = 0 
)
static

Executes the matrix-matrix multiply C += A B with the three matrices A, B, and C stored according to the static members and typedefs of this class.

References A, B, and STATIC_ASSERT_DEBUG.
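
For testing an optimized kernel it helps to have a plain scalar version of C += A B to compare against. The sketch below assumes A and C are stored column-wise, and B row-wise with each element repeated Reps times (only the first copy is read); the name mm_reference is hypothetical. It models a single product only: the role of the pointer-to-pointer arguments, i, and the offset parameters is not described on this page and is left out.

```cpp
#include <cassert>

// Hypothetical scalar reference for C += A B. Assumed layouts:
//   A: M x K, column-wise,                  A(i,k) at A[i + k*M]
//   B: K x N, row-wise, Reps copies each,   B(k,j) at B[(k*N + j)*Reps]
//   C: M x N, column-wise,                  C(i,j) at C[i + j*M]
template <int M, int N, int K, int Reps>
void mm_reference(const double* A, const double* B, double* C) {
  for (int j = 0; j < N; ++j)
    for (int i = 0; i < M; ++i) {
      double sum = 0.0;
      for (int k = 0; k < K; ++k)
        sum += A[i + k * M] * B[(k * N + j) * Reps];  // A(i,k) * B(k,j)
      C[i + j * M] += sum;  // accumulate, as in C += A B
    }
}
```

With A = [[1, 3], [2, 4]], B = [[5, 6], [7, 8]] and every element of C initially 1, the result is C = [[27, 31], [39, 45]], stored column-wise as {27, 39, 31, 45}.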

Member Data Documentation

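◆ floats_per_register

template<typename T_real , typename T_reg , int T_M, int T_N, int T_K>
int const MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >::floats_per_register = ( sizeof(T_reg) / sizeof(real) )
static protected

Number of real numbers that fit in one register.
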

◆ K

template<typename T_real , typename T_reg , int T_M, int T_N, int T_K>
int const MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >::K = T_K
static

Number of columns of A and rows of B.


◆ M

template<typename T_real , typename T_reg , int T_M, int T_N, int T_K>
int const MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >::M = T_M
static

Number of rows of A and C.


◆ N

template<typename T_real , typename T_reg , int T_M, int T_N, int T_K>
int const MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >::N = T_N
static

Number of columns of B and C.



The documentation for this class was generated from the following file: