Channels - Output format biases in the evaluation of large language models for code translation :: FRELIP Discovery

Similar Items: Output format biases in the evaluation of large language models for code translation

An evaluation study of large language models for addressing code quality issues
ArkTS code generation: A comprehensive evaluation with large language models
Evaluating large language models for multilingual vulnerability detection at dual granularities
HAFix: history-augmented large language models for bug fixing
Byam: Fixing Breaking Dependency Updates with Large Language Models
Peer-aided repairer: empowering large language models to repair advanced student assignments
Less is more: usefulness of data flow diagrams and large language models for security threat validation
Code review as decision-making - building a cognitive model from the questions asked during code review
Robustness evaluation and enhancement of LLMs in code generation: an empirical study
Mitigating omitted variable bias in empirical software engineering
The price of precision: the cost of preprocessing for automated code revision in code review
Performance analysis of AI-generated code: A case study of Copilot, Copilot Chat, CodeLlaMa, and DeepSeek-Coder models
Implicit security requirements classification with large language models using the OWASP application security verification standard: a shift-left approach
Learning to represent code changes
AdvGen-X: Transferability driven adversarial example generation for pre-trained models of code
Evaluating Large Language Models for Arduino Code Generation
DPS: Design pattern summarisation using code features
Do code LLMs do static analysis?
Evaluating Bias Detection and Mitigation Approaches Across Classical and Large Language Models
: Synthesizing and scheduling bug-triggering code segments for history-driven compiler testing
Peer code review in research software development: The research software engineer perspective
How challenging it is to identify real code authors: an empirical study
On the synchronization between Hugging Face pre-trained language models and their upstream GitHub repository
An exploratory eye tracking study on how developers classify and debug Python code in different paradigms