Skip to content

parallelcodefoundry/GPA-Benchmark

 
 

Repository files navigation

GPA-Benchmark

Benchmark applications for GPU Performance Advisor

Experimental Result

Platform: Volta V100

Application Kernel Optimization Original Optimized Speedup Estimate Speedup Error
rodinia/backprop bpnn_layerforward_CUDA Warp Balance 18.10us 15.36us 1.18x 1.21x 2%
rodinia/backprop bpnn_layerforward_CUDA Strength Reduction 15.32us 12.63us 1.21x 1.13x 7%
rodinia/bfs Kernel Loop Unrolling 578.28us 508.54us 1.14x 1.59x 28%
rodinia/b+tree findRangeK Code Reorder 53.29us 46.40us 1.15x 1.28x 10%
rodinia/cfd cuda_compute_flux Fast Math 187.53ms 128.37ms 1.46x 1.54x 5%
rodinia/gaussian Fan2 Thread Increase 116.76ms 30.21ms 3.86x 3.33x 16%
rodinia/heartwall kernel Loop Unrolling 49.03ms 42.35ms 1.16x 1.15x 1%
rodinia/hotspot calculate_temp Strength Reduction 15.45us 13.40us 1.15x 1.10x 5%
rodinia/huffman vlc_encode_kernel_sm64huff Warp Balance 133.24us 121.59us 1.10x 1.17x 6%
rodinia/kmeans kmeansPoint Loop Unrolling 787.14us 700.73us 1.12x 1.21x 7%
rodinia/lavaMD kernel_gpu_cuda Loop Unrolling 4.07ms 3.61ms 1.11x 1.12x 1%
rodinia/lud lud_diagonal Code Reorder 221.81us 162.96us 1.36x 1.48x 8%
rodinia/myocyte solver_2 Fast Math 308.55ms 259.63ms 1.19x 1.13x 5%
rodinia/myocyte solver_2 Function Spliting 259.69ms 254.47ms 1.02x 1.03x 1%
rodinia/nw needle_cuda_shared_1 Warp Balance 840.70us 762.70us 1.10x 1.09x 1%
rodinia/particlefilter likelihood_kernel Block Increase 2.34ms 1.21ms 1.92x 1.93x 1%
rodinia/streamcluster kernel_compute_cost Block Increase 21.51ms 14.17ms 1.52x 1.46x 4%
rodinia/sradv1 reduce Warp Balance 2.01ms 1.95ms 1.03x 1.16x 11%
rodinia/pathfinder dynproc_kernel Code Reorder 93.48us 88.67us 1.05x 1.23x 15%
Quicksilver CycleTrackingKernel Function Inlining 1.18s 1.05s 1.12x 1.18x 5%
Quicksilver CycleTrackingKernel Register Reuse 1.05s 1.02s 1.03x 1.04x 1%
ExaTENSOR tensor_transpose Strength Reduction 5.46ms 5.08ms 1.07x 1.06x 1%
ExaTENSOR tensor_transpose Memory Transaction Reduction 5.08ms 4.91ms 1.03x 1.05x 2%
PeleC pc_expl_reactions Block Increase 440.12ms 370.34ms 1.19x 1.23x 3%
Minimod target_pml_3d Fast Math 89.12ms 86.31ms 1.03x 1.09x 6%
Minimod target_pml_3d Code Reorder 86.31ms 82.07ms 1.05x 1.10x 5%

About

Benchmark applications for GPU Performance Advisor

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Cuda 60.6%
  • C 24.6%
  • C++ 11.8%
  • Makefile 2.3%
  • Other 0.7%