Supercharging LLM Inference: KV Caching with Valkey

Large language model inference is often constrained by GPU memory, and KV caching is emerging as a practical way to reuse precomputed state across text that repeats. This talk explores a database-centered question: can an open-source datastore serve as a practical tiered caching layer for KV state between fast local GPU memory and slower shared tiers? We walk through a real architecture that uses Valkey as a high-performance key/value datastore, together with LMCache for cross-instance KV cache reuse and llm-d for KV-cache-aware routing and coordination. Attendees will leave with a practical framework for evaluating when a database-backed KV cache tier improves latency, throughput, and infrastructure efficiency for LLM workloads, and when it adds more complexity than value.
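The tiered-caching idea in the abstract can be sketched in miniature. The code below is illustrative only: `TieredKVCache` and `prefix_key` are hypothetical names, and plain dicts stand in for GPU memory and the Valkey tier (a real deployment would use a Valkey client with serialized KV tensors, and tools like LMCache handle this transparently).

```python
import hashlib

def prefix_key(token_ids):
    """Derive a cache key from a token prefix (hypothetical keying scheme)."""
    raw = ",".join(map(str, token_ids)).encode()
    return "kv:" + hashlib.sha256(raw).hexdigest()

class TieredKVCache:
    """Two-tier KV cache: fast local tier backed by a slower shared tier."""

    def __init__(self):
        self.local = {}    # stand-in for GPU-resident KV state
        self.shared = {}   # stand-in for the shared Valkey tier
        self.stats = {"local_hit": 0, "shared_hit": 0, "miss": 0}

    def get(self, token_ids, compute_fn):
        key = prefix_key(token_ids)
        if key in self.local:                  # fastest path: local GPU memory
            self.stats["local_hit"] += 1
            return self.local[key]
        if key in self.shared:                 # shared tier hit: promote locally
            self.stats["shared_hit"] += 1
            self.local[key] = self.shared[key]
            return self.local[key]
        self.stats["miss"] += 1                # full miss: recompute, fill both tiers
        value = compute_fn(token_ids)
        self.local[key] = value
        self.shared[key] = value
        return value
```

The cost asymmetry between the three paths (local hit, shared hit, miss) is exactly what determines whether a database-backed tier pays off: the shared-tier round trip must be cheaper than recomputing the prefill for the same prefix.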
Speaker

Chaitanya is a Senior Technical Product Manager in AWS In-Memory Database Services, focused on Amazon ElastiCache for Valkey. Previously, he built solutions with generative AI, machine learning, and graph networks. Off …


