Optimization and Generalization of Gradient Descent for Shallow ReLU Networks with Minimal Width