In the single instruction, multiple thread (SIMT) execution model, threads are grouped into sets of 32, and each group is called a warp. If a warp encounters a conditional statement or branch, its threads can diverge and be serialized so that each path is executed in turn. This is called branch divergence, and it impacts performance significantly.
CUDA warp divergence refers to this divergent behavior of CUDA threads within a warp. If a warp hits an if-else branch and its threads take different paths, every thread in the warp is active for one part of the branched code and inactive for the other.
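The contrast can be sketched with two hypothetical kernels (the names and values are illustrative, not from the original text). In the first, even and odd threads within the same warp take different paths, so the warp serializes both branches; in the second, the branch condition is aligned to warp boundaries, so all 32 threads of a warp follow the same path and no divergence occurs.

```cuda
#include <cstdio>

// Divergent: the even/odd split cuts across every warp, so each warp
// executes both branches serially, masking off the inactive threads.
__global__ void divergent_branch(float *out)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx % 2 == 0)
        out[idx] = 100.0f + idx;
    else
        out[idx] = 200.0f + idx;
}

// Uniform: branching on the warp index keeps all 32 threads of a warp
// on the same path, so the warp never diverges.
__global__ void uniform_branch(float *out)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if ((idx / warpSize) % 2 == 0)
        out[idx] = 100.0f + idx;
    else
        out[idx] = 200.0f + idx;
}

int main()
{
    const int n = 64;
    float *out;
    cudaMallocManaged(&out, n * sizeof(float));

    divergent_branch<<<1, n>>>(out);
    cudaDeviceSynchronize();
    printf("divergent: out[0]=%.0f out[1]=%.0f\n", out[0], out[1]);

    uniform_branch<<<1, n>>>(out);
    cudaDeviceSynchronize();
    printf("uniform:   out[0]=%.0f out[32]=%.0f\n", out[0], out[32]);

    cudaFree(out);
    return 0;
}
```

Both kernels produce results, but profiling tools such as Nsight Compute report a lower branch efficiency for the divergent version, since each warp must execute both sides of the branch.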
The following figure shows the effect of warp divergence in a CUDA warp. The inactive CUDA threads sit idle during each branch, reducing the effective utilization of GPU threads:
As the branched portion becomes more significant, the GPU scheduling...