Dataset selection for aggregate model implementation in predictive data mining