Proxy Caching

Proxy caching puts a managed response cache in front of your HTTP container. Repeat requests for the same URL are served from the cache instead of reaching your container, so origin load drops, tail latency improves, and bursty read traffic stops eating your compute budget.

When you enable it, Bahriya stands up a three-node managed cache cluster in every region the container runs in. The platform's edge checks the cache before forwarding a request. Hits come back in milliseconds and never touch your container. Misses go through, the response is stored, and the next caller benefits.

When to use it

Proxy caching is a strong fit for read-heavy public APIs, catalogue and listing endpoints, and anything whose upstream work is expensive (joins, third-party calls, LLM inference). It also works as a cheap anti-scrape layer.

When NOT to use it

Highly dynamic responses that change on almost every request (live prices, leaderboards, real-time inventory).
Per-user content — if every response depends on the caller, the cache key fans out to one entry per user and the hit rate collapses. Reach for application-level memcached instead.
Write endpoints — POST, PUT, PATCH, DELETE are never cached.
Single-digit-second freshness requirements — even a short TTL means some users can see stale data for up to the length of the TTL.

Sizing the cache

Pick a total cache size from 256 MB to 8 GB per region, in 256 MB steps.

A rough rule:

working_set = avg_response_size_kb * unique_cacheable_urls
target_size = working_set * 1.5    # headroom for metadata + churn

Worked examples:

5 KB JSON, 50,000 unique URLs → ~250 MB working set → start at 512 MB.
20 KB JSON, 200,000 URLs → ~4 GB → start at 6 GB.
80 KB thumbnails, 30,000 URLs → ~2.4 GB → start at 4 GB.

If you can't estimate it, start at 1 GB, watch the hit rate after a day of traffic, and resize.
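
The rule above can be sketched in Python as follows. This is an illustration only: `suggest_cache_size_mb` is a hypothetical helper, not part of any Bahriya API. It assumes the decimal KB-to-MB conversion the worked examples use, and that 8 GB means 8192 MB.

```python
import math

STEP_MB = 256          # sizes are offered in 256 MB steps
MIN_MB = 256           # smallest selectable size
MAX_MB = 8 * 1024      # assumption: 8 GB = 8192 MB


def suggest_cache_size_mb(avg_response_size_kb, unique_cacheable_urls, headroom=1.5):
    """Apply the rough sizing rule and round up to the next 256 MB step."""
    working_set_kb = avg_response_size_kb * unique_cacheable_urls
    # Decimal conversion (1 MB = 1000 KB), matching the worked examples.
    target_mb = working_set_kb * headroom / 1000
    steps = max(math.ceil(target_mb / STEP_MB), 1)
    # Clamp to the selectable range.
    return min(max(steps * STEP_MB, MIN_MB), MAX_MB)


print(suggest_cache_size_mb(5, 50_000))     # 5 KB JSON, 50,000 URLs
print(suggest_cache_size_mb(20, 200_000))   # 20 KB JSON, 200,000 URLs
print(suggest_cache_size_mb(80, 30_000))    # 80 KB thumbnails, 30,000 URLs
```

The first two calls give 512 MB and 6144 MB, in line with the worked examples. The third gives 3840 MB, one step below the rounder 4 GB starting point suggested above, because the sketch rounds to the nearest permitted step rather than to a whole number of gigabytes.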

Max item size

The maximum item size defaults to 1 MB and can be raised to 128 MB. Increase it if you cache large JSON, big SVGs, or rendered HTML. Anything bigger than the limit is passed through uncached. Pick the smallest value that fits what you actually want to cache, because larger limits raise memory ceilings and increase fragmentation.

TTLs

Cache TTL (default 300 s) — how long a response is considered fresh.
Storage TTL (optional, must be ≥ cache TTL) — how long the response is kept around after going stale. With storage TTL set, a stale entry can serve as a fallback if your container is briefly unreachable, without paying for a cold re-fetch.
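
One way to picture how the two TTLs interact is as a three-state lifecycle. The sketch below is a simplified model, not the platform's implementation. It assumes both TTLs are measured from the moment the response was stored, which is one reading of the "storage TTL ≥ cache TTL" constraint. The function and state names are hypothetical.

```python
def entry_state(age_s, cache_ttl_s=300, storage_ttl_s=None):
    """Classify a cached entry by its age in seconds.

    Assumption: both TTLs count from the time the response was stored.
    """
    if storage_ttl_s is not None and storage_ttl_s < cache_ttl_s:
        raise ValueError("storage TTL must be >= cache TTL")
    if age_s < cache_ttl_s:
        return "fresh"      # served directly from the cache
    if storage_ttl_s is not None and age_s < storage_ttl_s:
        return "stale"      # kept only as a fallback if the origin is unreachable
    return "expired"        # no longer stored; the next request is a miss


# Cache TTL 1800 s with storage TTL 7200 s:
for age in (100, 3000, 8000):
    print(age, entry_state(age, cache_ttl_s=1800, storage_ttl_s=7200))
```

Without a storage TTL, an entry in this model goes straight from fresh to expired, so there is nothing to fall back on during a brief origin outage.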

For most APIs, 60–600 s is the right range. Pages that change only a few times a day can comfortably use 3600 s or more.

If Honour Cache-Control is on (the default), your container can override the platform TTL per-response with standard Cache-Control headers. Turn it off if you want platform settings to win unconditionally.
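
As an illustration of the override, the sketch below resolves an effective TTL from a response's Cache-Control header. It assumes the platform behaves like a shared cache in the standard HTTP caching sense (s-maxage preferred over max-age, no-store and private treated as "do not cache"). The exact directives Bahriya honours are not stated here, so treat `effective_ttl` and its rules as hypothetical.

```python
def effective_ttl(cache_control, platform_ttl=300, honour_cache_control=True):
    """Return the TTL in seconds, or None if the response should not be cached."""
    if not honour_cache_control or not cache_control:
        return platform_ttl  # platform setting wins

    # Parse "name=value" and bare directives, case-insensitively.
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"')

    # Assumption: these directives keep the response out of a shared cache.
    if "no-store" in directives or "private" in directives:
        return None

    # Shared caches prefer s-maxage over max-age.
    for name in ("s-maxage", "max-age"):
        value = directives.get(name, "")
        if value.isdigit():
            return int(value)
    return platform_ttl


print(effective_ttl("max-age=60"))
print(effective_ttl("public, s-maxage=120, max-age=30"))
print(effective_ttl("max-age=60", honour_cache_control=False))
```

On the container side, the override is simply a matter of setting a header such as `Cache-Control: max-age=60` on the responses that need a different lifetime.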

The optional X-Proxy-Cache-Memcached-Force: true header lets a trusted client bypass the freshness check and receive the stored response even if it is stale. It is off by default; only enable it on containers whose callers you trust.

Cache key, methods, status codes, content types

Methods default to GET, HEAD. You can add OPTIONS. Write methods are rejected.
Status codes default to 200, 301, 404. Common additions: 410, 204.
Content types default to text/plain, application/json. Add text/html or image/* as needed.
Cache key defaults to method + path + query. You can extend it with specific headers (Accept-Language), an explicit query-param allow-list, or JSON body fields for POST-style search endpoints. Keep the key narrow — every dimension multiplies your entry count.
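
To see why each extra dimension multiplies the entry count, here is a minimal sketch of how a key could be derived from method, path, query, and selected headers. The normalisation steps (sorting parameters, lower-casing header names, hashing with SHA-256) are assumptions made for illustration; the platform's actual key format is not documented here, and `cache_key` is a hypothetical name.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_key(method, url, headers=None, vary_headers=(), query_allow_list=None):
    """Build a deterministic key from method + path + query (+ chosen headers)."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    if query_allow_list is not None:
        # Only allow-listed parameters take part in the key.
        params = [(k, v) for k, v in params if k in query_allow_list]
    params.sort()  # assumption: parameter order should not create new entries

    lowered = {k.lower(): v for k, v in (headers or {}).items()}
    header_part = [(h.lower(), lowered.get(h.lower(), ""))
                   for h in sorted(vary_headers, key=str.lower)]

    raw = "\n".join([method.upper(), parts.path,
                     urlencode(params), urlencode(header_part)])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


# Tracking parameters are ignored once an allow-list is in place.
a = cache_key("GET", "/items?page=2&utm_source=mail", query_allow_list={"page"})
b = cache_key("GET", "/items?page=2", query_allow_list={"page"})
print(a == b)
```

Adding Accept-Language to `vary_headers` creates one entry per language for each URL, which is the multiplication effect described above.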

Variant: before or after rate limit

You pick where the cache sits relative to the container's rate limiter.

Before rate limit — hits don't consume rate-limit tokens. Use this for public read endpoints: a scraper hitting the same URL 10,000 times in a minute gets 10,000 cached responses and your real users keep their full token budget.
After rate limit — every request, hit or miss, counts. Use this when the rate limit exists to enforce per-user fairness on an authenticated API, not to protect the origin.
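
The difference between the two variants can be shown with a toy model that only counts rate-limit tokens. It ignores TTLs and the limit itself, and assumes that in the "before" variant a miss still consumes a token because it reaches the rate limiter. `tokens_consumed` is a hypothetical helper, not a platform function.

```python
def tokens_consumed(urls, variant):
    """Count rate-limit tokens used by a sequence of GET requests."""
    if variant not in ("before", "after"):
        raise ValueError("variant must be 'before' or 'after'")
    cached = set()
    tokens = 0
    for url in urls:
        hit = url in cached
        # "after": every request counts. "before": only misses reach the limiter.
        if variant == "after" or not hit:
            tokens += 1
        cached.add(url)  # a miss populates the cache for later requests
    return tokens


scraper = ["/products/42"] * 10_000
print(tokens_consumed(scraper, "before"))
print(tokens_consumed(scraper, "after"))
```

In this model the scraper from the example above spends a single token in the "before" variant (the first miss) and 10,000 in the "after" variant.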

High availability

The cache cluster runs three nodes per region. If one fails, the cluster keeps serving; the share of keys that lived on the lost node will miss until rebalancing re-populates that slice from real traffic. Expect a temporary dip in hit rate, not an outage. If the whole cache is unavailable, the platform falls back to your container origin transparently — cache loss never causes request failures.

The managed cache appears in your memcached list as read-only ("Managed by container "). You manage it through the container's settings, not directly.

Pricing

Component	Standard	Premium
Proxy cache surcharge	$5.00 / region / month	$7.50 / region / month
Managed memcached (cache memory)	$10 / GB / month	$15 / GB / month

The surcharge is flat per region. The memcached charge scales with the cache size you pick.

Worked example

1 GB standard cache in 2 regions:

Managed memcached: 1 GB × 2 × $10 = $20
Proxy cache surcharge: 2 × $5 = $10
Total: $30 / month

4 GB premium cache in 4 regions:

Managed memcached: 4 GB × 4 × $15 = $240
Proxy cache surcharge: 4 × $7.50 = $30
Total: $270 / month
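
The same arithmetic can be written as a small Python helper for estimating other combinations. The prices are copied from the table above. The function name and the tier keys are illustrative only.

```python
from decimal import Decimal

# Monthly prices from the pricing table (USD).
PRICES = {
    "standard": {"surcharge": Decimal("5.00"), "per_gb": Decimal("10")},
    "premium": {"surcharge": Decimal("7.50"), "per_gb": Decimal("15")},
}


def monthly_cost(cache_size_gb, regions, tier="standard"):
    """Memory charge (size x regions x per-GB price) plus flat per-region surcharge."""
    price = PRICES[tier]
    memory = Decimal(str(cache_size_gb)) * regions * price["per_gb"]
    surcharge = regions * price["surcharge"]
    return memory + surcharge


print(monthly_cost(1, 2, "standard"))  # 1 GB standard cache in 2 regions
print(monthly_cost(4, 4, "premium"))   # 4 GB premium cache in 4 regions
```

Sizes below 1 GB can be passed as fractions, for example `monthly_cost(0.25, 1)` for a 256 MB standard cache in one region.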

Common configurations

Public read API, anti-scrape focus — 1 GB, cache TTL 60 s, methods GET, HEAD, status 200, 404, vary on Accept-Language, variant before rate limit.

Authenticated tenant API, per-user fairness — 2 GB, cache TTL 300 s, vary on Authorization (or a tenant header), variant after rate limit.

Catalogue / product listing — 4 GB, cache TTL 1800 s, storage TTL 7200 s, honour Cache-Control on, content types application/json, text/html, variant before rate limit.

All of these can be changed without rebuilding your container image — the cache layer picks up the new configuration on the next deployment cycle.