Full Text Available

Context-Aware Autoscaling for Cost-Efficient Large Language Model Inference With Prefix Cache Integration

Bibliographic Details
Published in: IEEE Access
Format: Online Article; RSS Article
Published: 2026
Subjects: Computer Science & Information Science; Computer Science & IT; Engineering & Technology
URL: http://ieeexplore.ieee.org/document/11455169