Sequence based virus host prediction: a curated dataset and generalizable framework for training artificial intelligence to identify viruses of humans