Matrix Transpose on GPU using CUDA

This sample demonstrates matrix transposition on the GPU. It starts with sequential code on the CPU and progresses through successively more optimized versions: first a parallel transpose on the CPU, then several transpose variants on the GPU.
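
As a point of reference, the sequential CPU starting point might look like the sketch below. It is not taken from the sample itself: the function name `transposeSequential`, the `float` element type, and the row-major `std::vector` layout are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential CPU transpose of a row-major matrix with `rows` rows and `cols` columns.
// Hypothetical helper for illustration; the name and layout are assumptions.
std::vector<float> transposeSequential(const std::vector<float>& in,
                                       std::size_t rows, std::size_t cols) {
    // The flat buffer must hold exactly rows * cols elements.
    assert(in.size() == rows * cols);

    std::vector<float> out(in.size());
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            // Element (r, c) of the input becomes element (c, r) of the output.
            // The output has `cols` rows and `rows` columns, so its row stride is `rows`.
            out[c * rows + r] = in[r * cols + c];
        }
    }
    return out;
}
```

Each output element depends on exactly one input element, so the loop iterations are independent of one another. That independence is what the parallel CPU and GPU versions rely on.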

