
CUDA Programming

A GPU consists of many blocks, each of which contains multiple threads capable of executing operations in parallel.
The GPU is optimized for throughput, not necessarily for latency: each individual GPU core is slow, but there are thousands of them.
GPUs work well for massively parallel tasks such as matrix multiplication, but they can be quite inefficient for tasks where massive parallelization is difficult or impossible.
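As a sketch of that execution model, consider a kernel (a hypothetical `add` function, not one of the repository's examples) in which each thread handles exactly one array element. `blockIdx.x` identifies the block and `threadIdx.x` the thread within it, so together they give every thread a unique global index:

```cuda
// Each thread computes one element of c = a + b.
__global__ void add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique global index
    if (i < n)        // guard: the last block may have spare threads
        c[i] = a[i] + b[i];
}
```

Launching thousands of such lightweight threads at once is what lets the slow individual cores deliver high overall throughput.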

These are the main steps to run your program on the parallel threads of a GPU:

  • Initiate the input data on the host (CPU)
  • Allocate memory on the device (GPU) for the input and output variables
  • Copy the input data from host to device
  • Launch a kernel (call the GPU code)
  • Copy the output from device back to host
  • Free the allocated memory on the GPU
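The steps above map one-to-one onto CUDA runtime API calls. A minimal sketch (squaring an array of numbers, in the spirit of the repository's first example; names like `square`, `h_in`, and `d_in` are illustrative, not taken from the repo):

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void square(float *d_out, const float *d_in) {
    int i = threadIdx.x;
    d_out[i] = d_in[i] * d_in[i];
}

int main(void) {
    const int N = 64;
    const size_t bytes = N * sizeof(float);

    // 1. Initiate the input data on the host (CPU)
    float h_in[N], h_out[N];
    for (int i = 0; i < N; i++) h_in[i] = (float)i;

    // 2. Allocate memory on the device (GPU)
    float *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);

    // 3. Copy the input data from host to device
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    // 4. Launch the kernel: 1 block of N threads
    square<<<1, N>>>(d_out, d_in);

    // 5. Copy the output from device back to host
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    // 6. Free the allocated memory on the GPU
    cudaFree(d_in);
    cudaFree(d_out);

    printf("%f\n", h_out[4]);  // expected: 16.000000
    return 0;
}
```

Note that the host and device have separate memories, which is why the explicit `cudaMemcpy` calls in both directions are required.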
Program Links

  • Square of Numbers: Click Here
  • Adding Vectors: Click Here
  • Barrier Synchronisation: Click Here
  • Vector Multiplication: Click Here
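On barrier synchronisation (the topic of the third link): threads within a block can coordinate through `__syncthreads()`, which makes every thread in the block wait at the barrier until all have reached it, so earlier shared-memory writes become visible to every thread. A hedged sketch (a hypothetical in-place `reverse` kernel, not necessarily the repository's code):

```cuda
// Reverse a 64-element array using shared memory and a barrier.
__global__ void reverse(float *d) {
    __shared__ float s[64];   // shared by all threads in the block
    int i = threadIdx.x;
    s[i] = d[i];              // each thread stages one element
    __syncthreads();          // barrier: all writes to s[] are now done
    d[i] = s[63 - i];         // safe to read another thread's slot
}
```

Without the barrier, a thread could read `s[63 - i]` before the thread responsible for that slot had written it, giving undefined results.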

About

Executing Operations in Parallel using GPU with the help of CUDA
