Feature-optimized hybrid CNN–ViT architecture for sustainable vision-based condition assessment in agriculture