A Text Analysis-Based Predictive Approach for Assessing Clause Risk: An Application in Construction Contracts

Bibliographic Details
Main Author: Abouelwy, Seifeldin Ahmed
Format: Thesis
Published: AUC Knowledge Fountain 2026
Description
Summary: Identifying and evaluating risky contractual clauses during the tender phase remains a critical challenge in the construction industry. Such clauses can expose contractors and other parties to conflicts and disputes, which cause delays and cost overruns during project execution. Traditional contract review, as currently practiced, is time-consuming and relies heavily on expert judgment and manual procedures; this causes inconsistency and human error, particularly when multiple contracts must be reviewed under strict deadlines. To address these issues, this study introduces an automated, data-driven framework, referred to as the Contracts Assessment Tool (CAT), that leverages text mining techniques and machine learning algorithms to enhance the contract evaluation process. CAT integrates contractual clauses collected from multiple projects with expert assessments and generates an automated report in which each clause is classified according to its impact level, its probability of occurrence, and its similarity to a reference contract. To achieve this objective, two main paths were pursued. First, a text extraction model was developed to accurately identify, extract, and compare contractual clauses. Second, data were collected from contracts and experts, then preprocessed and visualized to extract meaningful insights, after which clause risk probability and impact classification models were developed and validated using different machine learning techniques such as Random Forest, SVM, KNN, XGBoost, Naïve Bayes, and Logistic Regression. Among these, Logistic Regression achieved the best results, with an accuracy of 0.740 and an F1-score of 0.736 for the risk model, and an accuracy of 0.710 and an F1-score of 0.707 for the probability model. However, applying resampling techniques, particularly the ADASYN approach, enhanced the models' performance, with SVM achieving an accuracy of 0.922 and an F1-score of 0.921 for the risk model, and an accuracy of 0.928 and an F1-score of 0.926 for the probability model. The CAT was then tested on two contract documents from different projects, where it successfully identified, extracted, and evaluated clauses, assigning accurate classifications for both impact and probability of occurrence. These results demonstrate CAT’s capability to support contract engineers by accelerating the review process, reducing human error, and improving efficiency, and they show the potential of automated, machine-learning-based tools to enhance contract evaluation and strengthen contract risk identification in the pre-award phase.
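
The summary says each clause is compared against a reference contract but does not name the similarity measure. As a rough illustration only, the sketch below assumes a plain bag-of-words cosine similarity; the function names (`tokenize`, `cosine_similarity`, `best_reference_match`) and the example clauses are hypothetical and are not taken from the thesis.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(clause_a, clause_b):
    """Cosine similarity between the word-count vectors of two clauses."""
    counts_a = Counter(tokenize(clause_a))
    counts_b = Counter(tokenize(clause_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty clause is treated as similar to nothing
    return dot / (norm_a * norm_b)


def best_reference_match(clause, reference_clauses):
    """Return (index, similarity) of the closest reference clause."""
    scores = [cosine_similarity(clause, ref) for ref in reference_clauses]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Invented clauses, for illustration only (not data from the study).
reference = [
    "The Contractor shall complete the Works within the Time for Completion.",
    "The Employer shall pay the Contractor within 56 days of receiving the statement.",
]
tender_clause = (
    "The Employer shall pay the Contractor within 90 days of receiving the statement."
)
index, score = best_reference_match(tender_clause, reference)
print(index, round(score, 3))
```

A high score with a small wording difference, such as the changed payment period here, is the kind of near-match a reviewer would likely want flagged. A production tool would probably use weighted or embedding-based representations rather than raw word counts.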