SST-Net: Self-Supervised Violence Detection Using Spatiotemporal Modeling and Adaptive Background Suppression