Wednesday, March 27, 2013

C++ AMP: How fast is it?

Category: Windows Store app, C++ AMP, GPU Programming
Prerequisites: C++, C++ AMP

Full Text in Thai (PDF 1.13 MB)

This study measures the time, in milliseconds, needed to multiply square matrices at dimensions from 256x256 to 2048x2048. Three C++ AMP engines are tested: two GPUs (Intel HD Graphics 4000 and NVIDIA GeForce GT 650M) and one software engine (Microsoft Basic Render Driver). Two C++ AMP methods are used: simple and tiled. The study also measures the time taken by ordinary sequential code as a baseline for comparison. The test software is a C++ Windows Store app running on an Intel i7 with 8 GB of RAM.
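For readers who want a concrete picture of the sequential baseline, here is a minimal sketch in standard C++. It is not the code used in the study: the names `multiply_sequential` and `time_ms`, the row-major `std::vector<float>` layout, and the use of `std::chrono::steady_clock` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical sequential baseline: C = A * B for n x n matrices
// stored row-major in flat vectors (an assumed layout).
std::vector<float> multiply_sequential(const std::vector<float>& a,
                                       const std::vector<float>& b,
                                       std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<float> c(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
    return c;
}

// Hypothetical helper: wall-clock time of one call to fn, in milliseconds.
template <typename Fn>
double time_ms(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A call such as `time_ms([&] { c = multiply_sequential(a, b, n); })` would then give one timing sample for a given size `n`. Whether the study averaged several runs is not stated, so repeating the call and averaging is left to the reader.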


The figure above shows the Windows Store app used in this study. It also renders a rotating 3D cube in the background for testing with DirectX.



This table shows the measured times in milliseconds. It also shows computed ratios (in red) comparing the MS Basic Render Driver with the sequential code, and each GPU with the MS Basic Render Driver. Size is the matrix dimension, from 256x256 to 2048x2048.
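The ratios can be read as speedups. A minimal sketch, assuming each ratio is the reference engine's time divided by the compared engine's time (the post does not spell out the formula, and `speedup` is a hypothetical name):

```cpp
#include <cassert>

// Hypothetical helper: how many times faster the compared engine is
// than the reference, given both times in milliseconds.
double speedup(double reference_ms, double compared_ms) {
    assert(reference_ms > 0.0 && compared_ms > 0.0);
    return reference_ms / compared_ms;
}
```

For example, with placeholder values that are not taken from the study, `speedup(1000.0, 100.0)` gives 10, meaning the compared engine is ten times as fast as the reference.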


From the above figure, the MS Basic Render Driver is around five to ten times as fast as the sequential code with the simple method, and five to twenty times as fast with the tiled method.


With the simple method, the NVIDIA GPU is ten to thirty times as fast as the MS Basic Render Driver, while the Intel GPU is around five times as fast.


With the tiled method, the NVIDIA GPU is fifteen to twenty-five times as fast as the MS Basic Render Driver, while the Intel GPU is around five times as fast.