Resources and Autoscaling

Bahriya gives you fine-grained control over how much compute your container uses and how it scales under load.

Updated 8 Jun 2026 · 2 min read

CPU and memory requests

When you deploy a container, you set a minimum CPU (in millicores) and a minimum memory (in megabytes). These become the guaranteed floor: the amount of compute your container will always have available.

Bahriya automatically computes resource limits (the ceiling) by applying a multiplier to your minimum values. This means your container can use more CPU or memory than its minimum if the node has spare capacity, but will be throttled or evicted if it exceeds the limit.

Min CPU:    350 millicores  → guaranteed floor
Max CPU:    700 millicores  → computed ceiling (2× in this example)
Min memory: 256 MB          → guaranteed floor
Max memory: 512 MB          → computed ceiling
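
The arithmetic behind the example above can be sketched in a few lines of Python. This is only an illustration: the 2× multiplier is taken from the example, the multiplier Bahriya actually applies is not stated here, and the names `compute_range` and `ASSUMED_LIMIT_MULTIPLIER` are hypothetical rather than part of any Bahriya API.

```python
from dataclasses import dataclass

# Assumption: 2x is the multiplier used in the example above; the value
# Bahriya really applies may differ.
ASSUMED_LIMIT_MULTIPLIER = 2


@dataclass(frozen=True)
class ResourceRange:
    """Guaranteed floor and computed ceiling for one resource."""
    floor: int
    ceiling: int


def compute_range(minimum: int, multiplier: int = ASSUMED_LIMIT_MULTIPLIER) -> ResourceRange:
    """Derive the ceiling by multiplying the configured minimum."""
    if minimum <= 0:
        raise ValueError("minimum must be positive")
    return ResourceRange(floor=minimum, ceiling=minimum * multiplier)


cpu = compute_range(350)     # millicores
memory = compute_range(256)  # megabytes
print(f"CPU:    {cpu.floor} to {cpu.ceiling} millicores")
print(f"Memory: {memory.floor} to {memory.ceiling} MB")
```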

Autoscaling

Autoscaling lets Bahriya automatically add or remove replicas of your container based on current load. You configure:

Setting	Description
Autoscaling enabled	Whether autoscaling is active.
Min replicas	The minimum number of instances always running (e.g. `2`).
Max replicas	The maximum number of instances Bahriya will scale up to (e.g. `10`).
Target CPU	The CPU utilisation percentage that triggers scaling (e.g. `70` = scale up when average CPU across replicas exceeds 70%).
Target memory	The memory utilisation percentage that triggers scaling.

When average CPU across running replicas exceeds your target, a new replica is added. When load drops, replicas are removed — but never below your minimum.
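
As a rough mental model, a single scaling decision might look like the sketch below. It is not Bahriya's actual algorithm: the function name `next_replica_count` is hypothetical, and it assumes that replicas change one at a time and that "load drops" means average CPU falling below the target. Only the clamping to min and max replicas comes directly from the description above.

```python
def next_replica_count(
    current: int,
    avg_cpu_percent: float,
    target_cpu_percent: float,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Return the replica count after one (assumed) scaling step."""
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")

    if avg_cpu_percent > target_cpu_percent:
        desired = current + 1  # above target: add a replica
    elif avg_cpu_percent < target_cpu_percent:
        desired = current - 1  # assumed scale-down condition
    else:
        desired = current

    # Never go below the minimum or above the maximum.
    return max(min_replicas, min(desired, max_replicas))


# Two replicas at 85% average CPU with a 70% target: scale up by one.
print(next_replica_count(2, 85, 70, min_replicas=2, max_replicas=10))
# Two replicas at 30% average CPU: already at the minimum, so no change.
print(next_replica_count(2, 30, 70, min_replicas=2, max_replicas=10))
```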

Without autoscaling

If autoscaling is disabled, Bahriya runs exactly the number of instances set in min replicas at all times. This makes costs predictable but provides no burst capacity.

With autoscaling

With autoscaling enabled, Bahriya runs between min replicas and max replicas depending on load. Each replica is billed per minute for the time it is actually running: usage is rounded up to the next whole minute, with a 60-second minimum. For example, a 90-second burst of extra replicas is billed as 120 seconds, not as a full hour.
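
The rounding rule can be checked with a small calculation. This sketch uses a hypothetical helper, `billed_seconds`, and assumes that a replica which never ran is billed nothing; the 60-second minimum and the round-up to whole minutes are taken from the paragraph above.

```python
import math


def billed_seconds(running_seconds: float) -> int:
    """Billable seconds for one replica under per-minute billing."""
    if running_seconds <= 0:
        return 0  # assumption: no runtime means no charge
    # Round up to the next whole minute, then apply the 60-second minimum.
    whole_minutes = math.ceil(running_seconds / 60)
    return max(60, whole_minutes * 60)


for seconds in (10, 60, 90, 121):
    print(f"{seconds}s running -> {billed_seconds(seconds)}s billed")
```

A 90-second burst comes out at 120 billed seconds, matching the example in the text.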

Multi-region scaling

Resource and autoscaling settings apply per region. If your container runs in three regions with min replicas set to 2, there are at least 6 instances running globally (2 per region). Billing is calculated per region.
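
Because the settings apply per region, the global instance count is the per-region figure multiplied by the number of regions. The helper below is a hypothetical illustration of that arithmetic, using the example values from this page (min replicas `2`, max replicas `10`).

```python
def global_instance_range(regions: int, min_replicas: int, max_replicas: int) -> tuple[int, int]:
    """Lowest and highest instance counts across all regions."""
    if regions < 1:
        raise ValueError("regions must be at least 1")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    return regions * min_replicas, regions * max_replicas


lowest, highest = global_instance_range(regions=3, min_replicas=2, max_replicas=10)
print(f"Between {lowest} and {highest} instances globally")
```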

Choosing the right values

Start with a min CPU of 250 millicores and a min memory of 256 MB for lightweight services, then observe real usage on the Metrics tab.
Set autoscaling min replicas to 2 in production for redundancy: with a single replica, any instance restart causes downtime.
Set a target CPU of 70%. This leaves headroom for traffic spikes before new replicas come online.