GPTCache (Open-Source Semantic Cache Library)
An open-source semantic caching library for LLM applications that stores responses and retrieves them via embedding similarity to skip repeated or paraphrased model calls.
GPTCache is an open-source semantic cache library, originally released by Zilliz in 2023, designed to reduce cost and latency for applications that call large language model APIs. It intercepts requests to providers such as OpenAI and Anthropic, embeds each query as a vector, and runs a similarity search against previously cached queries; on a hit it returns the stored response without invoking the model, and on a miss it forwards the request and caches the new response. The library is modular: embedding models, vector stores (FAISS, Milvus, Chroma, Redis), similarity evaluators, and eviction policies are pluggable and can be swapped independently. It integrates with orchestration frameworks including LangChain and LlamaIndex. Reported speedups for cache hits range from 2x to 10x relative to a live model call. Hit rates depend heavily on workload: narrow FAQ traffic reaches high ratios, while diverse open-ended agent traffic typically lands in the 30–70% range. Two failure modes are common: false positives, which arise when a single uniform similarity threshold is applied in dense regions of the embedding space where distinct queries sit close together, and stale answers, which are served when the underlying facts change after a response was cached. The project popularized the broader pattern of semantic-similarity response caching now standard in many production LLM stacks.
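The lookup path described above (embed the query, search cached queries by similarity, return the stored response on a hit, otherwise call the model and store the result) can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not GPTCache's implementation or API: `toy_embed`, `cosine`, `SemanticCache`, and `fake_llm` are hypothetical names, the bag-of-words "embedding" stands in for a real embedding model, the linear scan stands in for a vector store, and the 0.8 threshold and LRU eviction are arbitrary illustrative choices.

```python
import math
import re
from collections import Counter, OrderedDict
from typing import Callable, Optional


def toy_embed(text: str) -> Counter:
    """Stand-in embedding: lowercase word counts (assumption; real systems use a neural model)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Hypothetical similarity cache with one global threshold and LRU eviction."""

    def __init__(self, llm: Callable[[str], str], threshold: float = 0.8, max_size: int = 100):
        self.llm = llm
        self.threshold = threshold
        self.max_size = max_size
        self.entries = OrderedDict()  # cached query -> (vector, response)
        self.hits = 0
        self.misses = 0

    def _lookup(self, vec: Counter) -> Optional[str]:
        # Linear scan for the most similar cached query; a vector store would do this at scale.
        best_key, best_score = None, 0.0
        for key, (cached_vec, _) in self.entries.items():
            score = cosine(vec, cached_vec)
            if score > best_score:
                best_key, best_score = key, score
        return best_key if best_score >= self.threshold else None

    def ask(self, query: str) -> str:
        vec = toy_embed(query)
        key = self._lookup(vec)
        if key is not None:
            # Hit: return the stored response and mark the entry as recently used.
            self.hits += 1
            self.entries.move_to_end(key)
            return self.entries[key][1]
        # Miss: call the model, store the response, evict the least recently used entry if full.
        self.misses += 1
        answer = self.llm(query)
        self.entries[query] = (vec, answer)
        self.entries.move_to_end(query)
        if len(self.entries) > self.max_size:
            self.entries.popitem(last=False)
        return answer


calls = []


def fake_llm(prompt: str) -> str:
    """Placeholder for a live model call; records each invocation."""
    calls.append(prompt)
    return f"answer #{len(calls)}"


cache = SemanticCache(fake_llm, threshold=0.8)
print(cache.ask("How do I reset my password?"))   # miss: model is called
print(cache.ask("How can I reset my password?"))  # paraphrase: served from cache
print(cache.ask("What is the refund policy?"))    # unrelated: model is called
print(cache.ask("How do I reset my router?"))     # different question, yet above threshold
print(cache.hits, cache.misses, len(calls))
```

The last query shows the false-positive failure mode in miniature: "reset my router" shares five of six words with "reset my password", so its similarity (about 0.83) clears the single global threshold and the password answer is returned. Raising the threshold would reject it, but in this toy it would also reject the legitimate paraphrase, which scores the same.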