MViT: A vision transformer with fractal path reordering and dynamic positional encoding