Histogram Computations on GPUs Kernel using Global and Shared Memory Atomics
Keywords:
GPUs, Histograms, CUDA, Global Memory, Shared Memory
Abstract
In this paper we implement histogram computation on a graphics processing unit (GPU). The implementation uses the Compute Unified Device Architecture (CUDA), a minimal extension to C/C++. Histograms are computed using atomic operations on the GPU's global memory as well as on its shared memory. We also compute the histograms on a CPU and take that as the baseline performance. Experimental results show that the shared-memory GPU implementation gives a sevenfold speedup over the CPU baseline.
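The two GPU variants named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the 256-bin layout for 8-bit input, the kernel names `histogramGlobal` and `histogramShared`, the helper `histogramsMatchCpu`, and the 64x256 launch configuration are all assumptions chosen for illustration. The first kernel applies `atomicAdd` directly to the bins in global memory; the second accumulates a per-block histogram in shared memory and merges it into global memory once per block.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

constexpr int NUM_BINS = 256;  // assumption: one bin per 8-bit input value

// Variant 1: every thread updates the global-memory histogram directly.
__global__ void histogramGlobal(const unsigned char* data, size_t n,
                                unsigned int* bins) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (; i < n; i += stride)
        atomicAdd(&bins[data[i]], 1u);
}

// Variant 2: each block builds a private histogram in shared memory,
// then merges it into the global histogram once at the end.
__global__ void histogramShared(const unsigned char* data, size_t n,
                                unsigned int* bins) {
    __shared__ unsigned int local[NUM_BINS];
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        local[b] = 0;
    __syncthreads();

    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (; i < n; i += stride)
        atomicAdd(&local[data[i]], 1u);
    __syncthreads();

    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        if (local[b] != 0)
            atomicAdd(&bins[b], local[b]);
}

// Runs one of the two kernels and returns the resulting histogram.
static std::vector<unsigned int> runOnGpu(bool useShared,
                                          const std::vector<unsigned char>& h) {
    unsigned char* dData = nullptr;
    unsigned int* dBins = nullptr;
    const size_t binBytes = NUM_BINS * sizeof(unsigned int);
    cudaMalloc(&dData, h.size());
    cudaMalloc(&dBins, binBytes);
    cudaMemcpy(dData, h.data(), h.size(), cudaMemcpyHostToDevice);
    cudaMemset(dBins, 0, binBytes);

    if (useShared)
        histogramShared<<<64, 256>>>(dData, h.size(), dBins);
    else
        histogramGlobal<<<64, 256>>>(dData, h.size(), dBins);

    std::vector<unsigned int> out(NUM_BINS, 0);
    // A blocking copy on the default stream waits for the kernel to finish.
    cudaMemcpy(out.data(), dBins, binBytes, cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dBins);
    return out;
}

// Compares both GPU variants against a sequential CPU histogram.
bool histogramsMatchCpu(size_t n) {
    std::vector<unsigned char> h(n);
    srand(42);
    for (size_t i = 0; i < n; ++i)
        h[i] = (unsigned char)(rand() % NUM_BINS);

    std::vector<unsigned int> cpu(NUM_BINS, 0);
    for (size_t i = 0; i < n; ++i)
        ++cpu[h[i]];

    return runOnGpu(false, h) == cpu && runOnGpu(true, h) == cpu;
}

int main() {
    bool ok = histogramsMatchCpu(1 << 20);
    printf("GPU histograms %s the CPU baseline\n", ok ? "match" : "differ from");
    return ok ? 0 : 1;
}
```

The shared-memory variant keeps most atomic traffic inside each block, so contended global-memory updates drop from one per input element to at most one per bin per block. This sketch only checks correctness against the CPU result; it does not time the kernels, and any speedup depends on the hardware and the input distribution.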
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.