Percona Live 2026
30 Minute Presentation

Supercharging LLM Inference: KV Caching with Valkey

Chaitanya Nuthalapati
Technical Product Manager, Amazon ElastiCache
Amazon

May 27-29, 2026 • Computer History Museum, California
Date, time, and room will be announced soon.

Valkey

Large language model inference is often constrained by GPU memory, and KV caching is emerging as a practical way to reuse precomputed state across text that repeats. This talk explores a database-centered question: can an open-source datastore serve as a practical tiered caching layer for KV state between fast local GPU memory and slower shared tiers? We walk through a real architecture that uses Valkey as a high-performance key/value datastore, together with LMCache for cross-instance KV cache reuse and llm-d for KV-cache-aware routing and coordination. Attendees will leave with a practical framework for evaluating when a database-backed KV cache tier improves latency, throughput, and infrastructure efficiency for LLM workloads, and when it adds more complexity than value.
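To make the tiered-caching idea concrete, here is a minimal sketch of a shared KV-cache tier keyed by a hash of the prompt-token prefix. The class names and the dict-backed stand-in client are illustrative assumptions for this page, not LMCache's or llm-d's actual APIs; the `get`/`set` shape mirrors what a Valkey client such as valkey-py exposes, so the stub could be swapped for a real connection.

```python
import hashlib
import pickle

class KVCacheTier:
    """Illustrative sketch: a shared KV-cache tier content-addressed by
    prompt-prefix hash. `client` is any object with get/set (e.g. a
    valkey-py client); a dict-backed stub is used here so the example
    is self-contained."""

    def __init__(self, client):
        self.client = client

    @staticmethod
    def prefix_key(token_ids):
        # Hash the token prefix so any inference instance that sees the
        # same prefix computes the same key and can reuse the KV state.
        digest = hashlib.sha256(str(token_ids).encode("utf-8")).hexdigest()
        return f"kvcache:{digest}"

    def put(self, token_ids, kv_state):
        # Serialize the precomputed KV tensors; a real deployment would
        # use a compact binary format and a TTL to bound tier memory.
        self.client.set(self.prefix_key(token_ids), pickle.dumps(kv_state))

    def get(self, token_ids):
        # Cache hit returns the deserialized KV state; miss returns None,
        # signaling the caller to recompute (and optionally put) it.
        blob = self.client.get(self.prefix_key(token_ids))
        return pickle.loads(blob) if blob is not None else None

class DictClient:
    """Minimal in-memory stand-in with the same get/set shape as a
    Valkey client, used only to keep this sketch runnable."""
    def __init__(self):
        self.store = {}
    def set(self, key, value):
        self.store[key] = value
    def get(self, key):
        return self.store.get(key)

tier = KVCacheTier(DictClient())
tier.put([1, 2, 3], {"layer0": [0.1, 0.2]})
hit = tier.get([1, 2, 3])    # same prefix: KV state reused
miss = tier.get([9, 9])      # unseen prefix: None, so recompute
```

The evaluation question the talk poses then becomes measurable: a hit avoids prefill compute at the cost of a network round trip and deserialization, so the tier pays off only when hit rates and KV-state sizes make that trade favorable.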


Speaker

Chaitanya Nuthalapati
Technical Product Manager, Amazon ElastiCache
Amazon

Chaitanya is a Senior Technical Product Manager in AWS In-Memory Database Services, focused on Amazon ElastiCache for Valkey. Previously, he built solutions with generative AI, machine learning, and graph networks. Off …