
CUDA Programming

A GPU consists of many blocks, each of which contains multiple threads capable of executing operations in parallel.
The GPU is optimized for throughput, not necessarily for latency: each individual GPU core is slow, but there are thousands of them.
GPUs work well for massively parallel tasks such as matrix multiplication, but they can be quite inefficient for tasks where massive parallelization is difficult or impossible.
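As a sketch of that execution model, consider a kernel (a hypothetical `add` function, not one of the repository's examples) in which each thread handles exactly one array element. `blockIdx.x` identifies the block and `threadIdx.x` the thread within it, so together they give every thread a unique global index:

```cuda
// Each thread computes one element of c = a + b.
__global__ void add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique global index
    if (i < n)        // guard: the last block may have spare threads
        c[i] = a[i] + b[i];
}
```

Launching thousands of such lightweight threads at once is what lets the slow individual cores deliver high overall throughput.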

These are the main steps to run your program on the parallel threads of a GPU:

  • Initiate the input data on the host (CPU)
  • Allocate memory on the device (GPU) for the input and output variables
  • Copy the input data from host to device
  • Launch a kernel (call the GPU code)
  • Copy the output from device back to host
  • Free the allocated memory on the GPU
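The steps above map one-to-one onto CUDA runtime API calls. A minimal sketch (squaring an array of numbers, in the spirit of the repository's first example; names like `square`, `h_in`, and `d_in` are illustrative, not taken from the repo):

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void square(float *d_out, const float *d_in) {
    int i = threadIdx.x;
    d_out[i] = d_in[i] * d_in[i];
}

int main(void) {
    const int N = 64;
    const size_t bytes = N * sizeof(float);

    // 1. Initiate the input data on the host (CPU)
    float h_in[N], h_out[N];
    for (int i = 0; i < N; i++) h_in[i] = (float)i;

    // 2. Allocate memory on the device (GPU)
    float *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);

    // 3. Copy the input data from host to device
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    // 4. Launch the kernel: 1 block of N threads
    square<<<1, N>>>(d_out, d_in);

    // 5. Copy the output from device back to host
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    // 6. Free the allocated memory on the GPU
    cudaFree(d_in);
    cudaFree(d_out);

    printf("%f\n", h_out[4]);  // expected: 16.000000
    return 0;
}
```

Note that the host and device have separate memories, which is why the explicit `cudaMemcpy` calls in both directions are required.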
Program Links

  • Square of Numbers: Click Here
  • Adding Vectors: Click Here
  • Barrier Synchronisation: Click Here
  • Vector Multiplication: Click Here
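On barrier synchronisation (the topic of the third link): threads within a block can coordinate through `__syncthreads()`, which makes every thread in the block wait at the barrier until all have reached it, so earlier shared-memory writes become visible to every thread. A hedged sketch (a hypothetical in-place `reverse` kernel, not necessarily the repository's code):

```cuda
// Reverse a 64-element array using shared memory and a barrier.
__global__ void reverse(float *d) {
    __shared__ float s[64];   // shared by all threads in the block
    int i = threadIdx.x;
    s[i] = d[i];              // each thread stages one element
    __syncthreads();          // barrier: all writes to s[] are now done
    d[i] = s[63 - i];         // safe to read another thread's slot
}
```

Without the barrier, a thread could read `s[63 - i]` before the thread responsible for that slot had written it, giving undefined results.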

About

Executing Operations in Parallel using GPU with the help of CUDA
