Full Text Available

Context-Aware Autoscaling for Cost-Efficient Large Language Model Inference With Prefix Cache Integration

Bibliographic Details
Published in: IEEE Access
Format: Online Article; RSS Article
Published: 2026
Subjects: Computer Science & Information Science; Computer Science & IT; Engineering & Technology
URL: http://ieeexplore.ieee.org/document/11455169