Modeling spatiotemporal chromatic energy using attention-enhanced CNN-LSTM networks for deepfake video detection