Supercharging LLM Inference: KV Caching with Valkey

Large language model inference is often constrained by GPU memory, and KV caching is emerging as a practical way to reuse precomputed state across text that repeats. This talk explores a database-centered question: can an open-source datastore serve as a practical tiered caching layer for KV state between fast local GPU memory and slower shared tiers? We walk through a real architecture that uses Valkey as a high-performance key/value datastore, together with LMCache for cross-instance KV cache reuse and llm-d for KV-cache-aware routing and coordination. Attendees will leave with a practical framework for evaluating when a database-backed KV cache tier improves latency, throughput, and infrastructure efficiency for LLM workloads, and when it adds more complexity than value.
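The tiered-caching idea in the abstract can be sketched in miniature. The code below is illustrative only: `TieredKVCache` and `prefix_key` are hypothetical names, and plain dicts stand in for GPU memory and the Valkey tier (a real deployment would use a Valkey client with serialized KV tensors, and tools like LMCache handle this transparently).

```python
import hashlib

def prefix_key(token_ids):
    """Derive a cache key from a token prefix (hypothetical keying scheme)."""
    raw = ",".join(map(str, token_ids)).encode()
    return "kv:" + hashlib.sha256(raw).hexdigest()

class TieredKVCache:
    """Two-tier KV cache: fast local tier backed by a slower shared tier."""

    def __init__(self):
        self.local = {}    # stand-in for GPU-resident KV state
        self.shared = {}   # stand-in for the shared Valkey tier
        self.stats = {"local_hit": 0, "shared_hit": 0, "miss": 0}

    def get(self, token_ids, compute_fn):
        key = prefix_key(token_ids)
        if key in self.local:                  # fastest path: local GPU memory
            self.stats["local_hit"] += 1
            return self.local[key]
        if key in self.shared:                 # shared tier hit: promote locally
            self.stats["shared_hit"] += 1
            self.local[key] = self.shared[key]
            return self.local[key]
        self.stats["miss"] += 1                # full miss: recompute, fill both tiers
        value = compute_fn(token_ids)
        self.local[key] = value
        self.shared[key] = value
        return value
```

The cost asymmetry between the three paths (local hit, shared hit, miss) is exactly what determines whether a database-backed tier pays off: the shared-tier round trip must be cheaper than recomputing the prefill for the same prefix.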
Speaker

Chaitanya is a Senior Technical Product Manager in AWS In-Memory Database Services, focused on Amazon ElastiCache for Valkey. Previously, he built solutions with generative AI, machine learning, and graph networks. Off …


