Optimized graph convolutional shunted self-attention neural network for multilingual speech-to-text training using cross-language voice conversion of speech representations