Similar Items: Beyond accuracy metrics: Toward responsible integration of large language models in pediatric diagnostic reasoning