Load a kernel written in Cuda or OpenCL and compiled with the gpuBuild function.
func=gpuLoadFunction(bin,"kernelname",block_height,block_width,grid_height,grid_width);
bin: Matrix of strings containing the path of the file that holds the kernel and the language the kernel is written in (Cuda or OpenCL).
kernelname: Name of the kernel.
block_height: Height of a block.
block_width: Width of a block.
grid_height: Height of the computing grid (in blocks).
grid_width: Width of the computing grid (in blocks).
func: The loaded Cuda/OpenCL kernel.
func=gpuLoadFunction(bin,kernelname) loads a kernel from a compiled file. The returned value can then be called directly in Scilab.
The arguments passed to the returned function are the input arguments of the kernel. Only device pointers, scalar doubles, or scalar integers can be used.
Optional arguments can be used to modify the block/grid dimensions at call time.
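As an illustration of how block and grid dimensions relate, the following host-side C++ sketch computes how many blocks are needed to cover one dimension of the data. The helper name blocksNeeded is hypothetical and is not part of the toolbox. It only shows the usual ceiling-division rule, for example a dimension of 100 elements with blocks of 16 threads needs 7 blocks.

```cpp
#include <cassert>

// Hypothetical helper (not part of the toolbox): number of blocks needed
// so that blocks of `blockSize` threads cover `extent` elements.
// Uses ceiling division, so a partially filled last block is counted.
int blocksNeeded(int extent, int blockSize) {
    assert(extent >= 0 && blockSize > 0);
    return (extent + blockSize - 1) / blockSize;
}
```

Because the last block may be only partially filled, the kernel still needs a bounds check such as the `if(x<M && y<N)` test in the matrixAdd example.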
bin is an absolute path to a compiled file. The user must check that the hardware meets the kernel's requirements (for example, a PTX file built for a GPU of compute capability 1.3 will return incorrect results when launched on a GPU of compute capability 1.1).
kernelname is the name of the kernel inside the PTX file. It is the name given to the corresponding function in the C source code for the GPU, but it can be mangled by the compiler. To avoid name mangling, add extern "C" before the kernel declaration.
//--matrixAdd.cu--
extern "C" __global__ void matrixAdd(double* C, double* A, double* B, int M, int N)
{
    int idx = blockIdx.x;
    int idy = blockIdx.y;
    int dx = blockDim.x;
    int dy = blockDim.y;
    int tx = threadIdx.x;
    int ty = threadIdx.y;
    int x = tx + dx*idx;
    int y = ty + dy*idy;
    // Threads that fall outside the M x N matrix do nothing.
    if (x < M && y < N)
        C[x + y*M] = A[x + y*M] + B[x + y*M];
}

//--Scilab script--
A=ones(16,16);gA=gpuSetData(A)
B=2*ones(16,16);gB=gpuSetData(B)
C=0*ones(16,16);gC=gpuSetData(C)
bin=gpuBuild(gpuPATH+"/tests/unit_tests/"+"matrixAdd");
func=gpuLoadFunction(bin,"matrixAdd",16,16,1,1)
// call the kernel with the block/grid dimensions set in the gpuLoadFunction call.
func(gC,gB,gA,int32(16),int32(16))
C=gpuGetData(gC)
// call the kernel with new block/grid dimensions.
func(gC,gB,gA,int32(10),int32(10), blockX=10, blockY=10, gridX=1, gridY=1)
C=gpuGetData(gC)
clear gA; clear gB; clear gC;
gpuExit();
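The index arithmetic of the matrixAdd kernel above can be checked on the CPU. The following C++ sketch is a host-side emulation, not toolbox code: nested loops stand in for the GPU's blocks and threads, and the function name matrixAddHost and its parameter list are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side emulation of the matrixAdd kernel (illustrative only).
// The four loops replace blockIdx.y, blockIdx.x, threadIdx.y, threadIdx.x.
std::vector<double> matrixAddHost(const std::vector<double>& A,
                                  const std::vector<double>& B,
                                  int M, int N,
                                  int blockX, int blockY,
                                  int gridX, int gridY) {
    assert(A.size() == B.size());
    assert(A.size() >= static_cast<std::size_t>(M) * N);
    std::vector<double> C(A.size(), 0.0);
    for (int idy = 0; idy < gridY; ++idy)
        for (int idx = 0; idx < gridX; ++idx)
            for (int ty = 0; ty < blockY; ++ty)
                for (int tx = 0; tx < blockX; ++tx) {
                    int x = tx + blockX * idx;  // same formula as the kernel
                    int y = ty + blockY * idy;
                    if (x < M && y < N)         // same bounds check as the kernel
                        C[x + y * M] = A[x + y * M] + B[x + y * M];
                }
    return C;
}
```

With M = N = 16, a single 16x16 block covers every element. A single 8x8 block would update only 64 of the 256 elements, which is why the grid must be large enough to cover the whole matrix.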