Language identification in a highly unbalanced dataset