Enhanced Data Sampling and Feature Generation for Machine Learning-based Lithography Hotspot Detection