Similar Items: Efficient and Portable Support for Overdecomposition on Distributed Memory GPGPU Platforms