Efficient Parallel Merge Sort for Fixed and Variable Length Keys
Andrew Davidson - University of California, Davis David Tarjan - NVidia Corporation Michael Garland - NVidia Corporation John Owens - University of California, Davis Abstract: We design a high-performance parallel merge sort for highly parallel systems. Our merge sort is designed to use more register communication (not shared memory), and does not suffer from oversegmentation as opposed to previous comparison based sorts. Using these techniques we are able to achieve a sorting rate of 250 MKeys/sec, which is about 2.5 times faster than Thrust merge sort performance, and 70% faster than non-stable state-of-the-art GPU merge sorts. Building on this sorting algorithm, we develop a scheme for sorting variable-length key/value pairs, with a special emphasis on string keys. Sorting non-uniform, unaligned data such as strings is a fundamental step in a variety of algorithms, yet it has received comparatively little attention. To our knowledge, our system is the first published description of an efficient string sort for GPUs. We are able to sort strings at a rate of 70 MStrings/s on an NVidia GTX 580 on one dataset, and up to 1.25 GB/s on another dataset.