Text this: Salience prediction methods for video cropping in sidewalk footage