Full Text Available

Comparing the Quality of AI-generated and Instructor Feedback in a University Writing Program

Feedback is an undeniably important aspect of the language learning process. It helps students recognize their strengths and weaknesses and identifies ways they can improve. Over the years, feedback has been provided by teachers, peers and Automated Writing Evaluation (AWE) tools. However, in recent...

Bibliographic Details
Main Author:	Nassar, Hana Mohamed
Format:	Thesis
Published:	AUC Knowledge Fountain 2025
Subjects:	AI-generated feedback teacher feedback language instruction writing AWE scores proficiency Applied Linguistics

_version_	1867613424495624192
access_status_str	Open Access
author	Nassar, Hana Mohamed
author_browse	Nassar, Hana Mohamed
author_facet	Nassar, Hana Mohamed
author_sort	Nassar, Hana Mohamed
collection	Thesis
description	Feedback is an undeniably important aspect of the language learning process. It helps students recognize their strengths and weaknesses and identifies ways they can improve. Over the years, feedback has been provided by teachers, peers and Automated Writing Evaluation (AWE) tools. However, in recent years, artificial intelligence applications have proliferated significantly. With abilities to analyze and generate any kind of content, these models are being used to generate scores and feedback on written assignments to help lighten teachers’ load. ChatGPT has been called “the world’s most advanced chatbot” and “a potential chance to improve second language learning and instruction” (Shabara et al., 2024). The present study aims to investigate the quality of AI-generated scores and feedback on writing in comparison to teacher scores and feedback. Using a mixed methods design, the study compared ChatGPT-generated and its regenerated scores and qualitative comments to those assigned by experienced university instructors. A total of 89 argumentative essays were collected from the archives of a private university in Egypt. ChatGPT-4o and two human raters scored them using a rubric that evaluates writing based on four criteria: content and development, organization and connection of ideas, linguistic range and control, and communicative effect. All scores were statistically analyzed to examine the consistency and accuracy of ChatGPT in scoring. Similarly, the written feedback was thematically analyzed and compared to teacher feedback. Themes identified from the data included tone of feedback, following the rubric, prioritizing certain writing features, and providing judgmental or improvement-oriented feedback. The quantitative data revealed a moderate correlation between AI-generated and teacher scores, with the only strong relationship being in the linguistic precision criterion. The results also showed a weak consistency in ChatGPT-generated and regenerated scores. In terms of qualitative feedback, it was found to be considerably close in quality to teacher feedback. Additionally, the study tapped into the effect of writing proficiency on the nature of the feedback, and the data showed that ChatGPT did not differentiate between students based on abilities whereas the teachers did, especially in terms of tone. This lack of differentiation, however, indicates that ChatGPT’s feedback may not be as personalized to students’ needs as the teacher feedback. Implications of the study include using ChatGPT for scoring language areas and generating feedback provided that teachers revise this evaluation. Study limitations such as evaluating the effectiveness of the feedback are also discussed.
format	Thesis
id	oai:fount.aucegypt.edu:etds-3514
institution	American University in Cairo (Egypt)
last_indexed	2026-06-10T12:35:55.364Z
license_str	Not specified — see source repository
provenance_str_mv	Harvested via OAI-PMH from AUC Knowledge Fountain — bepress
publishDate	2025
publishDateRange	2025
publishDateSort	2025
publisher	AUC Knowledge Fountain
publisherStr	AUC Knowledge Fountain
record_format	dspace
source_str	AUC Knowledge Fountain — bepress
spelling	oai:fount.aucegypt.edu:etds-3514 Comparing the Quality of AI-generated and Instructor Feedback in a University Writing Program Nassar, Hana Mohamed Feedback is an undeniably important aspect of the language learning process. It helps students recognize their strengths and weaknesses and identifies ways they can improve. Over the years, feedback has been provided by teachers, peers and Automated Writing Evaluation (AWE) tools. However, in recent years, artificial intelligence applications have proliferated significantly. With abilities to analyze and generate any kind of content, these models are being used to generate scores and feedback on written assignments to help lighten teachers’ load. ChatGPT has been called “the world’s most advanced chatbot” and “a potential chance to improve second language learning and instruction” (Shabara et al., 2024). The present study aims to investigate the quality of AI-generated scores and feedback on writing in comparison to teacher scores and feedback. Using a mixed methods design, the study compared ChatGPT-generated and its regenerated scores and qualitative comments to those assigned by experienced university instructors. A total of 89 argumentative essays were collected from the archives of a private university in Egypt. ChatGPT-4o and two human raters scored them using a rubric that evaluates writing based on four criteria: content and development, organization and connection of ideas, linguistic range and control, and communicative effect. All scores were statistically analyzed to examine the consistency and accuracy of ChatGPT in scoring. Similarly, the written feedback was thematically analyzed and compared to teacher feedback. Themes identified from the data included tone of feedback, following the rubric, prioritizing certain writing features, and providing judgmental or improvement-oriented feedback. The quantitative data revealed a moderate correlation between AI-generated and teacher scores, with the only strong relationship being in the linguistic precision criterion. The results also showed a weak consistency in ChatGPT-generated and regenerated scores. In terms of qualitative feedback, it was found to be considerably close in quality to teacher feedback. Additionally, the study tapped into the effect of writing proficiency on the nature of the feedback, and the data showed that ChatGPT did not differentiate between students based on abilities whereas the teachers did, especially in terms of tone. This lack of differentiation, however, indicates that ChatGPT’s feedback may not be as personalized to students’ needs as the teacher feedback. Implications of the study include using ChatGPT for scoring language areas and generating feedback provided that teachers revise this evaluation. Study limitations such as evaluating the effectiveness of the feedback are also discussed. 2025-02-19T08:00:00Z thesis application/pdf https://fount.aucegypt.edu/etds/2468 https://fount.aucegypt.edu/context/etds/article/3514/viewcontent/hana_mohamed_nassar_thesis.pdf Theses and Dissertations AUC Knowledge Fountain AI-generated feedback teacher feedback language instruction writing AWE scores proficiency Applied Linguistics
spellingShingle	AI-generated feedback teacher feedback language instruction writing AWE scores proficiency Applied Linguistics Nassar, Hana Mohamed Comparing the Quality of AI-generated and Instructor Feedback in a University Writing Program
title	Comparing the Quality of AI-generated and Instructor Feedback in a University Writing Program
title_full	Comparing the Quality of AI-generated and Instructor Feedback in a University Writing Program
title_fullStr	Comparing the Quality of AI-generated and Instructor Feedback in a University Writing Program
title_full_unstemmed	Comparing the Quality of AI-generated and Instructor Feedback in a University Writing Program
title_short	Comparing the Quality of AI-generated and Instructor Feedback in a University Writing Program
title_sort	comparing the quality of ai generated and instructor feedback in a university writing program
topic	AI-generated feedback teacher feedback language instruction writing AWE scores proficiency Applied Linguistics
url	https://fount.aucegypt.edu/etds/2468 https://fount.aucegypt.edu/context/etds/article/3514/viewcontent/hana_mohamed_nassar_thesis.pdf
work_keys_str_mv	AT nassarhanamohamed comparingthequalityofaigeneratedandinstructorfeedbackinauniversitywritingprogram

Similar Items