Intuitive Multi‐Scale Visual Feature Fusion for Automated Classification of Human Activity