About resize requests in a MIG


This document describes how resize requests in a managed instance group (MIG) work and their limitations. Use resize requests to create virtual machine (VM) instances with GPUs all at once in a MIG.

Creating VMs all at once in a MIG through a resize request is useful in the following scenarios:

  • When you want an exact number of VMs to run a job, a resize request helps you to create VMs all at once. This helps avoid unnecessary charges for the partial capacity that Compute Engine creates while you wait for all resources to become available.

  • When you want GPU VMs for a specific time only, a resize request increases your chances of obtaining these high-demand resources.

How resize requests work

The following sections outline how resize requests work.

On creation

When creating a resize request, you must specify the following properties:

  • resizeBy: the number of VMs that you want to create all at once as part of the request.

  • requestedRunDuration: the duration for which the VMs created as part of the request must run. The run duration must be between 10 minutes and 7 days. At the end of the run duration, the MIG automatically deletes the created VMs. When you create a resize request in a MIG for Hypercompute Cluster, this property is optional. If you don't specify a run duration for a resize request in a Hypercompute Cluster, then the VMs run until the end of the reservation that the MIG uses.
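As a minimal sketch of the rules above, the following Python helper checks the two properties before a request is sent. The function name, the for_hypercompute_cluster flag, and the returned dictionary are hypothetical and only illustrate the constraints; they are not the Compute Engine API or its wire format. The sketch also assumes that the 10-minute and 7-day bounds are inclusive, that resizeBy must be at least 1, and that the bounds still apply when a run duration is given for a Hypercompute Cluster MIG.

```python
from datetime import timedelta

# Bounds stated in the text; treated as inclusive here (assumption).
MIN_RUN_DURATION = timedelta(minutes=10)
MAX_RUN_DURATION = timedelta(days=7)


def build_resize_request(resize_by, run_duration=None, for_hypercompute_cluster=False):
    """Validate the two properties and return them in a plain dict.

    Hypothetical helper: the dict is illustrative, not the API payload.
    """
    # Assumption: a request must ask for at least one VM.
    if resize_by < 1:
        raise ValueError("resize_by must be a positive number of VMs")

    if run_duration is None:
        # The run duration is optional only in a MIG for Hypercompute Cluster.
        if not for_hypercompute_cluster:
            raise ValueError("run_duration is required for this MIG")
    elif not MIN_RUN_DURATION <= run_duration <= MAX_RUN_DURATION:
        raise ValueError("run_duration must be between 10 minutes and 7 days")

    return {"resizeBy": resize_by, "requestedRunDuration": run_duration}
```

For example, build_resize_request(8, timedelta(hours=12)) passes validation, while a 5-minute run duration raises ValueError.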

After creation

After you create a resize request, Compute Engine changes its state as follows:

  • CREATING: Compute Engine is creating the resize request. The MIG's target size increases by the number of VMs specified in the request, and the MIG creates managed instances in the CREATING state. These managed instances represent the VMs that the MIG creates when the resize request succeeds.

  • ACCEPTED: the request has been created and accepted. The underlying scheduler, Dynamic Workload Scheduler (DWS), schedules the creation of the requested resources based on resource availability and the run duration specified in the request. If you lack quota for the requested resources or the resources are temporarily unavailable, then DWS keeps the request until you have sufficient quota and the resources become available.

  • SUCCEEDED: the MIG created the requested number of VMs all at once. The VMs run until the MIG deletes them after the specified run duration ends, or until you delete the VMs.

  • FAILED: the resize request failed due to a technical error and Compute Engine decreased the target size of the MIG by the number of requested VMs.

  • CANCELLED: a user canceled the resize request. Canceling a resize request stops the MIG from creating the requested resources. After canceling a resize request, Compute Engine decreases the MIG's target size by the number of requested VMs and automatically deletes the request after 14 days. Optionally, you can delete a resize request before Compute Engine automatically deletes it.
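The states above, and their effect on the MIG's target size, can be sketched as a small state machine. This is an illustrative model only: the MigModel and ResizeRequestModel classes are hypothetical, and the transition graph is an assumption inferred from the state descriptions (for example, that a request can fail from either CREATING or ACCEPTED), not a documented specification.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    CREATING = "CREATING"
    ACCEPTED = "ACCEPTED"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


# Assumed transition graph; the three final states have no outgoing edges.
TRANSITIONS = {
    State.CREATING: {State.ACCEPTED, State.FAILED},
    State.ACCEPTED: {State.SUCCEEDED, State.FAILED, State.CANCELLED},
    State.SUCCEEDED: set(),
    State.FAILED: set(),
    State.CANCELLED: set(),
}


@dataclass
class MigModel:
    """Hypothetical stand-in for a MIG; tracks only the target size."""
    target_size: int = 0


@dataclass
class ResizeRequestModel:
    """Hypothetical stand-in for a resize request in a MIG."""
    mig: MigModel
    resize_by: int
    state: State = State.CREATING

    def __post_init__(self):
        # Creating the request increases the target size right away.
        self.mig.target_size += self.resize_by

    def move_to(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state.name} to {new_state.name}")
        # A failed or canceled request gives back the requested VMs.
        if new_state in (State.FAILED, State.CANCELLED):
            self.mig.target_size -= self.resize_by
        self.state = new_state
```

In this model, a request for 8 VMs in a MIG with a target size of 2 raises the target size to 10 on creation, and canceling the accepted request returns it to 2.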

If you delete a MIG containing resize requests, then this operation also deletes any resize requests and VMs in the MIG. However, if you delete a MIG when the MIG is creating VMs to fulfill a resize request, Compute Engine waits until the MIG has finished creating the requested number of VMs and the state of the resize request transitions to SUCCEEDED before deleting the MIG.

Limitations

The following sections outline the limitations for creating resize requests in a MIG.

For resize requests

For resize requests, the following limitations apply:

  • You can use resize requests to obtain GPU VMs only.

  • You can only cancel accepted (ACCEPTED) resize requests.

  • You can only delete a resize request after it succeeds (SUCCEEDED), fails (FAILED), or a user cancels it (CANCELLED).
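The cancel and delete rules above amount to a lookup on the request's state. The following sketch shows one way a client might check them before it issues a call; the constant and function names are hypothetical, and the state names are passed as plain strings.

```python
# States in which each operation is allowed, per the limitations above.
CANCELLABLE_STATES = frozenset({"ACCEPTED"})
DELETABLE_STATES = frozenset({"SUCCEEDED", "FAILED", "CANCELLED"})


def can_cancel(state):
    """Return True if a resize request in this state can be canceled."""
    return state in CANCELLABLE_STATES


def can_delete(state):
    """Return True if a resize request in this state can be deleted."""
    return state in DELETABLE_STATES
```

Under these rules, a request in the CREATING state can be neither canceled nor deleted, and an ACCEPTED request must be canceled before it can be deleted.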

For the instance template

For the instance template used in the MIG in which you want to create resize requests, the following limitations apply:

For the MIG

For the MIG in which you want to create resize requests, the following limitations apply:

Quota for GPU VMs with requested run duration

GPU VMs that are configured to be automatically deleted after a predefined run time of 7 days or less can consume either preemptible or standard allocation quotas. This behavior is intended to make it easier to obtain allocation quota for temporary but uninterrupted workloads. For more information about this behavior, see GPU VMs and preemptible allocation quotas.

Pricing

There are no costs associated with creating, canceling, or deleting resize requests. You incur charges only for the VMs created through a resize request, from the moment the MIG creates the VMs until either the MIG automatically deletes them at the end of their run duration or you manually delete them.

If a MIG creates only some of the requested VMs and fails to create the remaining ones, then you may still incur charges for the created VMs until the MIG automatically deletes them.
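To make the billing window concrete, the following sketch computes how long a single VM accrues charges under the rules above. The function and parameter names are hypothetical, and the sketch covers only the duration; actual charges depend on the VM's machine type and GPU pricing, which are not modeled here.

```python
from datetime import datetime, timedelta


def billable_duration(created_at, run_duration, deleted_at=None):
    """Return how long a VM created by a resize request accrues charges.

    Charges run from VM creation until the earlier of the automatic
    deletion at the end of the run duration and a manual deletion.
    """
    auto_delete_at = created_at + run_duration
    if deleted_at is None:
        end = auto_delete_at
    else:
        end = min(deleted_at, auto_delete_at)
    # Guard against a deletion timestamp that precedes creation.
    return max(end - created_at, timedelta(0))
```

For example, a VM with a 24-hour run duration that you delete manually after 6 hours accrues 6 hours of charges, and one left untouched accrues the full 24 hours.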
