On Route to Sustainable Software: AI-Agent-Enabled Green Coding in Integrated Development Environments

Bibliographic Details
Main Author: Salaheldin, Sohaida
Format: Thesis
Published: AUC Knowledge Fountain 2026
Description
Summary: As software becomes central to nearly everything we use, and as technology advances at an unprecedented pace, energy consumption rises along with the carbon emissions it entails. To address these concerns, there is a growing need for sustainable software that scales to meet industrial development and consumer needs while keeping the underlying infrastructure green. The United Nations Sustainable Development Goals (SDGs) reflect this: Goal 9 directly addresses the need to build sustainable and resilient technological infrastructure [2], and Goal 13 builds on it with an urgent call to action to combat climate change by offsetting carbon emissions [3]. Recent research has begun to address the latter by exploring methods of calculating the carbon emissions of software, such as the March 2024 study by MIT, Harvard, Mercedes, and TWT that introduced the “Green Capacity” (GC) score, which quantifies the greenness of software [10]. While experimentation in Green Software continues, software development itself never stands still, and every day brings advances along several dimensions whose “greenness” requires regular assessment. The aim of this study is to evaluate how effectively AI agents embedded within an integrated development environment (IDE) suggest code improvements that reduce the carbon-equivalent emissions footprint of software while preserving functional correctness and respecting non-functional constraints. How well the codebase meets greener software measures after the AI suggestions are applied is evaluated through several metrics, primarily energy and memory consumption, runtime, carbon-equivalent emissions, and code correctness.
To achieve this, this research introduces a Python-oriented plugin called GreenMeter, which acts as a green software development agent embedded within Visual Studio Code (VS Code). The extension offers commands that developers can invoke to prompt selected AI models to improve the greenness of a selected codebase. In our study, a set of experiments is run using GreenMeter to prompt 3 LLMs (GPT 4.1, Gemini 3 Flash Preview, and Claude Haiku 4.5) with 23 different prompts to make the top solutions for 5 LeetCode problems more sustainable. Both sets of solutions, human-written code and AI-generated code, are assessed on 4 sustainability-related metrics: execution time (referred to as runtime), memory consumption, carbon-equivalent emissions, and energy consumption. Our research then introduces a new software greenness score, the Green Index (GI), which combines those 4 metrics into a single score to evaluate the overall sustainability of the AI-generated code. The GI is compared to other greenness measurements proposed in the literature, such as the Green Capacity (GC) [10] and TOPSIS [32], to assess how well each method provides an objective and reliable greenness evaluation of the underlying code. Statistics on code correctness are also gathered to check whether the LLM-generated implementation meets the functional requirements. While recent literature suggests that AI has low potential for improving the sustainability of software, our study finds that popular modern LLMs widely used in agentic software development do in fact generate green-oriented code when directed with pertinent prompts.
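As a rough illustration of the kind of measurement described above, the following Python sketch times a callable and records its peak traced memory using only the standard library. It is not GreenMeter's implementation. The names `measure` and `illustrative_score` are hypothetical, and the equal-weight score is an assumption made for illustration only; it is not the Green Index, whose formula this abstract does not define. Energy and carbon-equivalent emissions are omitted because measuring them requires hardware-specific tooling.

```python
import time
import tracemalloc
from typing import Callable, Dict


def measure(func: Callable[[], object], repeats: int = 5) -> Dict[str, float]:
    """Return the best wall-clock runtime (s) and peak traced memory (bytes)."""
    best_runtime = float("inf")
    peak_memory = 0
    for _ in range(repeats):
        tracemalloc.start()
        start = time.perf_counter()
        func()
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        best_runtime = min(best_runtime, elapsed)
        peak_memory = max(peak_memory, peak)
    return {"runtime_s": best_runtime, "peak_memory_bytes": float(peak_memory)}


def illustrative_score(baseline: Dict[str, float], candidate: Dict[str, float]) -> float:
    """Hypothetical equal-weight mean of relative reductions (NOT the thesis's GI).

    A positive value means the candidate used less of the measured resources
    on average than the baseline.
    """
    gains = [
        (baseline[key] - candidate[key]) / baseline[key]
        for key in baseline
        if baseline[key] > 0
    ]
    return sum(gains) / len(gains) if gains else 0.0
```

Note that tracing allocations slows execution, so a careful setup would likely time and trace in separate runs; the two are combined here only to keep the sketch short.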
Moreover, our results show that, among the 3 models in our experiments, GPT 4.1 showed the highest consistency in generating green code, with an almost 65% chance of producing code with GI > 0, as opposed to 60% or below for Gemini 3 Flash Preview and Claude Haiku 4.5. In contrast, Gemini 3 Flash Preview stood out in its rate of generating code that accurately solves the problem, with a correctness rate above 99%, while also leading nearly 7% of the results to a Favorably Green GI (GI_NORMALIZED > 0.8), thereby outperforming its counterparts. Although the leap in AI code generation is relatively recent, its growing awareness of green coding comes with tradeoffs, such as maintainability. Our research shows that 26% of all reported bugs in our experiments trace back to code changes that use libraries or data structures that could make the code greener yet break the program's flow of execution; moreover, 22% of the defects can be linked to AI-generated code that changes the signature of the original function, thereby breaking program execution. When AI introduces these types of failures into the software, assessing the greenness of the code is not possible. It is therefore increasingly important to embed some form of static code analysis, preferably on the serving side of the LLM, to confirm that the generated code abides by functional and non-functional requirements before it is submitted to the client side. Prompt engineering plays an essential role in driving agentic green code development. By exploring 23 prompts of different types across the 3 state-of-the-art LLMs in our scope, our study shows that Few-Shot Prompts, which include guidelines and examples that the AI can follow to improve the greenness of the code, are highly effective.
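One simple form the static analysis mentioned above could take is a check that the generated code keeps every function signature of the original. The sketch below uses Python's standard `ast` module. It is an assumed, minimal example rather than the thesis's tooling, and the helper names `function_signatures` and `changed_signatures` are hypothetical. It compares parameter names only and ignores defaults and annotations.

```python
import ast
from typing import Dict, List


def function_signatures(source: str) -> Dict[str, List[str]]:
    """Map each function or method name to its ordered parameter names."""
    signatures: Dict[str, List[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args
            names = [a.arg for a in args.posonlyargs + args.args]
            if args.vararg:
                names.append("*" + args.vararg.arg)
            names += [a.arg for a in args.kwonlyargs]
            if args.kwarg:
                names.append("**" + args.kwarg.arg)
            signatures[node.name] = names
    return signatures


def changed_signatures(original: str, generated: str) -> List[str]:
    """Return names of original functions that are missing or altered."""
    before = function_signatures(original)
    after = function_signatures(generated)
    return sorted(name for name, params in before.items() if after.get(name) != params)
```

For example, comparing `def two_sum(nums, target): ...` against a rewrite that drops `target` would report `two_sum`, so the suggestion could be rejected before any greenness measurement is attempted.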
Moreover, using the verb “optimize” in the prompts is found to be essential in maximizing the generation of green code across the different LLMs. Combining the two findings, 3 Few-Shot Prompts that used the verb “optimize” (i.e., Prompts 16, 18, and 19) led nearly 4% of the generated code across all experiments to be Favorably Green (i.e., to fall in the top GI score category). Because prompt engineering should not be a single-shot activity in green code development, it is important to capitalize on the top 4 prompts that our study advocates, particularly through prompt chaining, to maximize the greenness benefits obtained from each prompt.
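The idea of few-shot prompts combined through prompt chaining can be sketched as follows. The prompt wording here is invented for illustration and does not reproduce any of the 23 prompts used in the study. `build_few_shot_prompt`, `chain_prompts`, and the `ask_model` callable are hypothetical names, with `ask_model` standing in for whatever LLM client is actually used.

```python
from typing import Callable, List, Sequence, Tuple


def build_few_shot_prompt(
    guidelines: Sequence[str],
    examples: Sequence[Tuple[str, str]],
    code: str,
) -> str:
    """Assemble a few-shot prompt from guidelines, before/after examples, and code."""
    parts: List[str] = [
        "Optimize the following Python code for energy efficiency while "
        "keeping its behavior and function signatures unchanged.",
        "",
        "Guidelines:",
    ]
    parts += [f"- {guideline}" for guideline in guidelines]
    for index, (before, after) in enumerate(examples, start=1):
        parts += ["", f"Example {index} (before):", before]
        parts += [f"Example {index} (after):", after]
    parts += ["", "Code to optimize:", code]
    return "\n".join(parts)


def chain_prompts(
    code: str,
    prompt_builders: Sequence[Callable[[str], str]],
    ask_model: Callable[[str], str],
) -> str:
    """Feed each model response into the next prompt (prompt chaining)."""
    for build in prompt_builders:
        code = ask_model(build(code))
    return code
```

In practice each link of the chain would likely be followed by correctness tests, so that a failing suggestion is discarded before it is passed to the next prompt.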