CPU allocation (services)

By default, Cloud Run instances are only allocated CPU during request processing, container startup, and shutdown (refer to the instance lifecycle). You can change this behavior so that CPU is always allocated and available even when there are no incoming requests. Setting CPU to always allocated can be useful for running short-lived background tasks and other asynchronous processing after a response has been returned.

Illustration of CPU allocation modes

Even if CPU is always allocated, Cloud Run autoscaling remains in effect and may terminate instances if they aren't needed to handle incoming traffic or CPU utilization outside of requests. An instance never stays idle for more than 15 minutes after processing a request unless it is kept active using minimum instances.

Combining always-allocated CPU with a minimum number of instances keeps those instances up and running with full access to CPU resources, enabling background processing use cases. When you use this pattern, Cloud Run still applies instance autoscaling even if the service is using CPU outside of any requests.
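For example, assuming the gcloud CLI and an existing service, you can combine the two settings in a single update (the minimum of 2 instances is only an illustration):

gcloud run services update SERVICE --no-cpu-throttling --min-instances 2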

If you use health check probes, CPU is allocated for every probe. See container health check probes for billing details.

Pricing impact

If you choose CPU allocated only during request processing, you are charged per request and only while an instance is processing requests. If you choose CPU always allocated, you are charged for the entire lifetime of the instance. See the Cloud Run pricing tables for details.

Google Cloud's Recommender automatically analyzes the traffic your Cloud Run service received over the past month and recommends switching from CPU allocated during request processing to CPU always allocated if that is cheaper.

How to choose the appropriate CPU allocation

Choosing the appropriate CPU allocation for your use case depends on several factors, such as traffic patterns, background execution, and cost, each of which is described in the following sections.

Traffic patterns considerations

  • CPU only allocated during request processing is recommended when incoming traffic is sporadic, bursty, or spiky.
  • CPU always allocated is recommended when incoming traffic is steady or slowly varying.

Background execution considerations

Selecting CPU always allocated allows you to execute short-lived background tasks and other asynchronous processing work after returning responses. For example:

  • Using monitoring agents, such as OpenTelemetry agents, that assume they can run in the background.
  • Using Go goroutines, Node.js async operations, Java threads, or Kotlin coroutines.
  • Using application frameworks that rely on built-in scheduling/background functionalities.

Idle instances, including those kept warm using minimum instances, can be shut down at any time. If you need to finish outstanding tasks before the container is terminated, you can trap SIGTERM to give an instance 10 seconds of grace time before it is stopped, as in the sketch below.
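As a minimal sketch, assuming a Go HTTP service, a SIGTERM handler can flush outstanding work before the instance stops; flushPendingWork is a hypothetical placeholder for your own cleanup logic:

package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// flushPendingWork is a hypothetical placeholder for application-specific
// cleanup of in-flight background tasks.
func flushPendingWork(ctx context.Context) {}

func main() {
	port := os.Getenv("PORT")
	if port == "" {
		port = "8080"
	}
	srv := &http.Server{Addr: ":" + port}

	// Cloud Run sends SIGTERM before stopping an instance.
	sigCtx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM)
	defer stop()

	go func() {
		<-sigCtx.Done()
		// Roughly 10 seconds of grace time remain after SIGTERM arrives;
		// finish outstanding work, then shut the server down.
		ctx, cancel := context.WithTimeout(context.Background(), 9*time.Second)
		defer cancel()
		flushPendingWork(ctx)
		srv.Shutdown(ctx)
	}()

	if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
		log.Fatal(err)
	}
}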

Consider using Cloud Tasks to execute asynchronous tasks. Cloud Tasks automatically retries failed tasks and supports running times of up to 30 minutes.
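As an illustration, assuming the gcloud CLI and a Cloud Tasks queue that already exists, you can enqueue an HTTP task that targets your service; the queue name and URL below are placeholders:

gcloud tasks create-http-task --queue my-queue --url https://SERVICE_URL/process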

Cost considerations

If you are currently using CPU only allocated during request processing, CPU always allocated is probably more economical if:

  • Your Cloud Run service processes a high number of concurrent requests at a fairly steady rate.
  • You do not see a lot of "idle" instances when looking at the instance count metric.

You can use the pricing calculator to estimate cost differences.

Autoscaling considerations

Cloud Run autoscales the number of container instances.

For a service set to CPU only allocated during request processing, Cloud Run autoscales the number of instances based on CPU utilization only during request processing.

For a service set to CPU always allocated, Cloud Run autoscales the number of instances based on CPU utilization over the entire lifecycle of the container instance, except when scaling to and from zero, which is based only on requests.

Required roles

To get the permissions that you need to configure and deploy Cloud Run services, ask your administrator to grant you the following IAM roles:

  • Cloud Run Developer (roles/run.developer) on the Cloud Run service
  • Service Account User (roles/iam.serviceAccountUser) on the service identity

For a list of IAM roles and permissions that are associated with Cloud Run, see Cloud Run IAM roles and Cloud Run IAM permissions. If your Cloud Run service interfaces with Google Cloud APIs, such as Cloud Client Libraries, see the service identity configuration guide. For more information about granting roles, see deployment permissions and manage access.
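For example, assuming project-level access is appropriate in your environment, an administrator can grant the developer role with the gcloud CLI; the member address is a placeholder:

gcloud projects add-iam-policy-binding PROJECT_ID --member="user:developer@example.com" --role="roles/run.developer"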

Set and update CPU allocation

Any configuration change leads to the creation of a new revision. Subsequent revisions will also automatically get this configuration setting unless you make explicit updates to change it.

If you choose the always-allocated CPU option, you must specify at least 512 MiB of memory.
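For example, if your service currently has less than 512 MiB, you can raise memory in the same command that switches the allocation mode (a sketch assuming the gcloud CLI):

gcloud run services update SERVICE --no-cpu-throttling --memory 512Mi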

By default, CPU is only allocated during request processing for each container instance. You can change this using the Google Cloud console, the gcloud command line, or a YAML file when you create a new service or deploy a new revision:

Console

  1. In the Google Cloud console, go to Cloud Run:

    Go to Cloud Run

  2. Click Deploy container and select Service to configure a new service. If you are configuring an existing service, click the service, then click Edit and deploy new revision.

  3. If you are configuring a new service, fill out the initial service settings page, then click Container(s), volumes, networking, security to expand the service configuration page.

  4. Click the Container tab.

    • Under CPU allocation and pricing, select the desired CPU allocation:
      • Select CPU is only allocated during request processing if your instances should receive CPU only while they are processing requests.
      • Select CPU is always allocated to allocate CPU for the entire lifetime of instances.
  5. Click Create or Deploy.

gcloud

You can update the CPU allocation of an existing service. To set CPU to always allocated for a given service:

gcloud run services update SERVICE --no-cpu-throttling 

Replace SERVICE with the name of your service.

To set CPU allocation only during request processing:

gcloud run services update SERVICE --cpu-throttling 

You can also set CPU allocation during deployment. To set CPUs to be always allocated:

gcloud run deploy --image IMAGE_URL --no-cpu-throttling

To set CPU allocation only during request processing:

gcloud run deploy --image IMAGE_URL --cpu-throttling

Replace IMAGE_URL with a reference to the container image, for example, us-docker.pkg.dev/cloudrun/container/hello:latest. If you use Artifact Registry, the repository REPO_NAME must already be created. The URL takes the form LOCATION-docker.pkg.dev/PROJECT_ID/REPO_NAME/PATH:TAG.
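Putting the pieces together, a complete deployment with always-allocated CPU might look like the following; the service name and region are illustrative:

gcloud run deploy my-service --image us-docker.pkg.dev/cloudrun/container/hello:latest --no-cpu-throttling --region us-central1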

YAML

  1. If you are creating a new service, skip this step. If you are updating an existing service, download its YAML configuration:

    gcloud run services describe SERVICE --format export > service.yaml
  2. Update the run.googleapis.com/cpu-throttling annotation:

    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: SERVICE
    spec:
      template:
        metadata:
          annotations:
            run.googleapis.com/cpu-throttling: 'BOOLEAN'
          name: REVISION

    Replace

    • SERVICE with the name of your Cloud Run service
    • BOOLEAN with true to set CPU allocation only during request processing, or false to set CPU to always allocated.
    • REVISION with a new revision name or delete it (if present). If you supply a new revision name, it must meet the following criteria:
      • Starts with SERVICE-
      • Contains only lowercase letters, numbers and -
      • Does not end with a -
      • Does not exceed 63 characters
  3. Create or update the service using the following command:

    gcloud run services replace service.yaml
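For reference, a filled-in configuration that keeps CPU always allocated could look like this; the service name is illustrative and the revision name is omitted so that Cloud Run generates one:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-service
spec:
  template:
    metadata:
      annotations:
        run.googleapis.com/cpu-throttling: 'false'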

Terraform

To learn how to apply or remove a Terraform configuration, see Basic Terraform commands.

Add the following to a google_cloud_run_v2_service resource in your Terraform configuration, under template.containers.resources.

resource "google_cloud_run_v2_service" "default" {
  name     = "cloudrun-service-cpu-allocation"
  location = "us-central1"

  deletion_protection = false # set to "true" in production

  template {
    containers {
      image = "us-docker.pkg.dev/cloudrun/container/hello"
      resources {
        # Set cpu_idle to true to allocate CPU only during request processing
        # (CPU is throttled when idle); false keeps CPU always allocated.
        cpu_idle = false
      }
    }
  }
}

View CPU allocation settings

To view the current CPU allocation settings for your Cloud Run service:

Console

  1. In the Google Cloud console, go to Cloud Run:

    Go to Cloud Run

  2. Click the service you are interested in to open the Service details page.

  3. Click the Revisions tab.

  4. In the details panel at the right, the CPU allocation setting is listed under the Container tab.

gcloud

  1. Use the following command:

    gcloud run services describe SERVICE
  2. Locate the CPU allocation setting in the returned configuration.
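Alternatively, exporting the service configuration as YAML makes the setting easy to spot; a run.googleapis.com/cpu-throttling annotation of 'false' means CPU is always allocated. On a Unix-like shell you can filter for it, for example:

gcloud run services describe SERVICE --format export | grep cpu-throttling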