Application of an improved vision transformer and optical flow fusion algorithm for human action recognition in human–computer interaction systems