Abstract: Leveraging sparsity in deep neural network (DNN) models holds significant promise for accelerating model inference. However, current GPUs can only harness sparsity in model weights, leaving ...