Supercharging LLM Inference: KV Caching with Valkey

May 27-29, 2026 • Computer History Museum, California
Date, time, and room will be announced soon.
Large language model inference is often constrained by GPU memory, and KV caching is emerging as a practical way to reuse precomputed state across repeated text, such as shared prompt prefixes. This talk explores a database-centered question: can an open-source datastore serve as a practical tiered caching layer for KV state between fast local GPU memory and slower shared tiers? We walk through a real architecture that uses Valkey as a high-performance key/value datastore, together with LMCache for cross-instance KV cache reuse and llm-d for KV-cache-aware routing and coordination. Attendees will leave with a practical framework for evaluating when a database-backed KV cache tier improves latency, throughput, and infrastructure efficiency for LLM workloads, and when it adds more complexity than value.
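To make the cross-instance reuse idea concrete, here is a minimal sketch of how a shared Valkey tier might address KV-cache blocks by content, so that any inference instance seeing the same prompt prefix derives the same key. All names (`make_block_key`, `store_block`, `fetch_block`, the block granularity, and the TTL) are illustrative assumptions, not the actual LMCache or llm-d implementation; `client` stands for any Redis/Valkey-compatible client.

```python
# Hypothetical sketch of a content-addressed KV-cache tier backed by Valkey.
# Function names, key layout, and TTL are assumptions for illustration only.
import hashlib
import pickle

BLOCK_SIZE = 16  # assumed number of tokens per cached KV block

def make_block_key(token_ids, model_tag="llama-3-8b"):
    """Derive a deterministic cache key from a token prefix, so every
    instance serving the same model computes the same key for the same
    prefix and can reuse another instance's precomputed KV state."""
    h = hashlib.sha256()
    h.update(model_tag.encode())
    h.update(repr(list(token_ids)).encode())
    return f"kv:{model_tag}:{h.hexdigest()[:24]}"

def store_block(client, token_ids, kv_tensors, ttl_s=3600):
    """Serialize a KV block into the shared tier with a TTL, so cold
    entries self-evict instead of accumulating in the datastore."""
    client.set(make_block_key(token_ids), pickle.dumps(kv_tensors), ex=ttl_s)

def fetch_block(client, token_ids):
    """Return the deserialized KV block on a hit, or None on a miss
    (the caller then recomputes the block on the GPU as usual)."""
    blob = client.get(make_block_key(token_ids))
    return pickle.loads(blob) if blob is not None else None
```

The key property this buys is that the cache hit decision reduces to a single O(1) lookup per block, which is what makes a remote shared tier viable at all: the routing layer only needs token IDs, never the tensors themselves, to decide where a request should land.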
Speaker

Chaitanya is a Senior Technical Product Manager in AWS In-Memory Database Services, focused on Amazon ElastiCache for Valkey. Previously, he built solutions with generative AI, machine learning, and graph networks. Off …

