Use core dumps to analyze the causes of an unresponsive virtual machine (VM) instance.
To collect core dumps on Compute Engine, you must configure your
VMs to receive a Non-Maskable Interrupt (NMI) signal, and then run a
SendDiagnosticInterrupt command to prompt a kernel panic or blue screen in
your operating system. A kernel panic or blue screen starts a core dump
collection by the guest operating system. You can then use these core dumps for
debugging, especially in scenarios that are hard to reproduce, such as
a kernel freeze.
Before you begin
- Sending NMI signals counts toward the default Queries API quota. For more information, see API rate limits.
- If you haven't already, then set up authentication. Authentication is the process by which your identity is verified for access to Google Cloud services and APIs. To run code or samples from a local development environment, you can authenticate to Compute Engine by selecting one of the following options:
gcloud
- Install the Google Cloud CLI, then initialize it by running the following command:
gcloud init
- Set a default region and zone.
REST
To use the REST API samples on this page in a local development environment, you use the credentials you provide to the gcloud CLI.
Install the Google Cloud CLI, then initialize it by running the following command:
gcloud init
For more information, see Authenticate for using REST in the Google Cloud authentication documentation.
Required roles
To ensure that your user or service account has the necessary permission to send NMI signals to a VM, ask your administrator to grant your user or service account the Compute Instance Admin (v1) (roles/compute.instanceAdmin.v1) IAM role on your project.
For more information about granting roles, see Manage access to projects, folders, and organizations.
This predefined role contains the compute.instances.sendDiagnosticInterrupt permission, which is required to send NMI signals to a VM.
Your administrator might also be able to give your user or service account this permission with custom roles or other predefined roles.
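As an illustration of the role grant described above, an administrator could use the gcloud projects add-iam-policy-binding command. The following is a minimal sketch, not a prescribed procedure: the function name grant_nmi_role and its two arguments are hypothetical, and the project ID and member values shown in the comments are placeholders.

```shell
# Hypothetical helper: grant the Compute Instance Admin (v1) role, which
# contains compute.instances.sendDiagnosticInterrupt, on a project.
grant_nmi_role() {
  project_id="$1"   # placeholder, for example: my-project
  member="$2"       # placeholder, for example: user:alice@example.com
  gcloud projects add-iam-policy-binding "$project_id" \
    --member="$member" \
    --role="roles/compute.instanceAdmin.v1"
}
```

For example, grant_nmi_role my-project user:alice@example.com binds the role to a single user. A custom role that contains only the required permission is a narrower alternative.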
Overview
To use core dumps to help debug an unresponsive VM or a security issue, you need to complete the following steps:
- Configure your VM to generate core dumps
- Send an NMI signal to generate core dumps
- Review the core dumps
Limitations
For VMs that have Secure Boot enabled, you must disable Secure Boot before you send an NMI signal. For instructions, see Modifying Shielded VM options on a VM instance.
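To check whether this limitation applies to a VM, you can read its Shielded VM configuration before sending the signal. The following sketch assumes that the enableSecureBoot field of shieldedInstanceConfig is printed as True or False by the value() output format. The function name secure_boot_enabled is hypothetical.

```shell
# Hypothetical helper: exit 0 if Secure Boot is enabled on the VM, 1 otherwise.
secure_boot_enabled() {
  vm_name="$1"
  zone="$2"
  # Read only the Secure Boot flag from the instance description.
  state="$(gcloud compute instances describe "$vm_name" --zone="$zone" \
    --format='value(shieldedInstanceConfig.enableSecureBoot)')"
  case "$state" in
    [Tt]rue) return 0 ;;
    *) return 1 ;;   # False, empty, or anything unexpected
  esac
}
```

If secure_boot_enabled VM_NAME ZONE succeeds, follow the linked instructions to disable Secure Boot before you continue.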
Configure VM
A VM's response to receiving an NMI signal depends on the VM's operating system configuration.
Each operating system writes its core dump logs in a different location. For
example, in Ubuntu operating systems, the crash dump file is saved to
/var/crash/ by default.
To configure your guest OS to generate a crash dump when an NMI signal is received, review the documentation for the supported operating system.
Operating system | Links to instructions | Additional notes |
---|---|---|
Ubuntu | Ubuntu: Kernel crash dump | For Linux VMs, you must configure the kernel to crash when it receives the NMI signal. To configure the kernel to crash, add the following to your configuration file: kernel.unknown_nmi_panic=1 |
SUSE Linux Enterprise Server (SLES) | Configure crashkernel memory for kernel core dump analysis | |
Red Hat Enterprise Linux (RHEL) | Use both of the following documents: | |
Container-Optimized OS (COS) | Enabling Kernel Crash Dump on GCE COS Instances | Only COS 93 and later support kdump generation using an NMI signal. |
Windows | Generate a kernel or complete crash dump | Windows client VMs don't keep memory dump files unless they are members of an AD domain or the following is true: For more information, see Kernel dump storage and clean up behavior in Windows 7 |
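For the Linux setting in the table, the following is a minimal sketch of adding kernel.unknown_nmi_panic=1 to a sysctl configuration file without duplicating it. The function name enable_nmi_panic is hypothetical, and the file path in the usage note is an assumption. Use whichever sysctl configuration file your distribution's documentation specifies.

```shell
# Hypothetical helper: append kernel.unknown_nmi_panic=1 to the given sysctl
# configuration file unless that exact line is already present.
enable_nmi_panic() {
  conf_file="$1"
  setting="kernel.unknown_nmi_panic=1"
  # grep -qxF: quiet, whole-line, fixed-string match
  if ! grep -qxF "$setting" "$conf_file" 2>/dev/null; then
    printf '%s\n' "$setting" >> "$conf_file"
  fi
}
```

Run from a root shell, a call such as enable_nmi_panic /etc/sysctl.d/99-nmi-panic.conf followed by sysctl --system would write the setting and reload it. This only makes the kernel panic on an NMI. Capturing the dump still requires the kdump setup described in the linked instructions.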
Send NMI to generate core dumps
After you configure the VM, you can send the NMI signal to the VM by using either the Google Cloud CLI or REST.
gcloud
To send the NMI signal, use the instances send-diagnostic-interrupt command.
gcloud compute instances send-diagnostic-interrupt VM_NAME \
    --zone=ZONE
Replace the following:
- VM_NAME: instance ID or name of the VM that you want to collect core dumps from
- ZONE: the zone where your VM is located
The output is similar to the following:
<Empty Response>
For a complete list of outputs, see the "NMI command responses" section in this document.
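Because a successful call prints nothing, it can help to wrap the command so that the outcome is explicit. The following is a minimal sketch. The function name send_nmi and its messages are hypothetical, and it relies only on the exit status of the gcloud command shown above.

```shell
# Hypothetical wrapper: send the NMI signal and report the outcome.
send_nmi() {
  vm_name="$1"
  zone="$2"
  if gcloud compute instances send-diagnostic-interrupt "$vm_name" --zone="$zone"; then
    # Delivery does not guarantee that the guest OS collected a dump.
    echo "NMI delivered to $vm_name; check the guest for a crash dump."
  else
    echo "NMI request for $vm_name failed." >&2
    return 1
  fi
}
```

For example, send_nmi VM_NAME ZONE prints a confirmation on success and returns a nonzero status on failure, which makes it usable in scripts.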
REST
Optional. If not already available, create an API key. For more information about creating API keys, see Creating an API key.
To send the NMI signal, make a POST request to the sendDiagnosticInterrupt method.
POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME/sendDiagnosticInterrupt?key=API_KEY
For example, you can use the curl command to make the request as follows. The Authorization header is in double quotes so that the shell expands the gcloud auth print-access-token command substitution:
curl --request POST 'https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME/sendDiagnosticInterrupt?key=API_KEY' \
    --header "Authorization: Bearer $(gcloud auth print-access-token)" \
    --header 'Accept: application/json' \
    --compressed
Replace the following:
- PROJECT_ID: ID of the project that contains the VM
- ZONE: the zone where your VM is located
- VM_NAME: instance ID or name of the VM that you want to collect core dumps from
- API_KEY: your API key
The output is similar to the following:
<Empty Response>
For a complete list of outputs, see the "NMI command responses" section in this document.
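When you script the REST call, building the request URL in one place reduces the chance of mistakes in the long path. The following is a minimal sketch. The function name nmi_url is hypothetical, and the optional fourth argument reflects that the API key step above is marked optional.

```shell
# Hypothetical helper: print the sendDiagnosticInterrupt URL for a VM.
# The fourth argument (API key) is optional.
nmi_url() {
  project_id="$1"
  zone="$2"
  vm_name="$3"
  api_key="${4:-}"
  url="https://compute.googleapis.com/compute/v1/projects/${project_id}/zones/${zone}/instances/${vm_name}/sendDiagnosticInterrupt"
  if [ -n "$api_key" ]; then
    url="${url}?key=${api_key}"
  fi
  printf '%s\n' "$url"
}

# Example use (requires curl and an authenticated gcloud CLI):
#   curl --request POST "$(nmi_url PROJECT_ID ZONE VM_NAME)" \
#     --header "Authorization: Bearer $(gcloud auth print-access-token)" \
#     --header 'Accept: application/json'
```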
NMI command responses
One of the following responses is returned when you attempt to send an NMI signal.
State | Body | Notes |
---|---|---|
SUCCESS | <Empty Response> | SUCCESS shows that the NMI signal is delivered to the operating system. It does not guarantee that the core dump is collected, or that the VM shuts down or reboots. These behaviors are determined by the operating system configuration. |
FAIL | UNSUPPORTED_OPERATION | This occurs when the operating system fails to receive the NMI signal. There are multiple reasons for this. Common scenarios are that the VM is being live migrated or the VM is not properly configured to receive NMI signals. To resolve this, you can try the following: |
FAIL | Required 'compute.instances.sendDiagnosticInterrupt' permission for [..] | The command failed because the user making the request does not have sufficient permissions. To resolve this, you can assign a role to the user that contains the compute.instances.sendDiagnosticInterrupt permission. |
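The rows above can be turned into a small triage step in a script. The following is a sketch under assumptions: it treats an empty string as the empty success response and matches the two failure bodies by substring. The function name classify_nmi_response and the messages it prints are hypothetical.

```shell
# Hypothetical helper: map a response body to one of the rows in the table.
classify_nmi_response() {
  body="$1"
  case "$body" in
    "")
      echo "SUCCESS: signal delivered; dump collection depends on the guest OS" ;;
    *UNSUPPORTED_OPERATION*)
      echo "FAIL: VM did not receive the signal (live migration or configuration)" ;;
    *compute.instances.sendDiagnosticInterrupt*)
      echo "FAIL: caller lacks the sendDiagnosticInterrupt permission" ;;
    *)
      echo "UNKNOWN: response not listed in the table" ;;
  esac
}
```

For example, classify_nmi_response "UNSUPPORTED_OPERATION" reports the first failure case. Any body that the table doesn't cover is reported as UNKNOWN.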
Review core dumps
Review the crash dump file in the configured or default location for your operating system.
For example, in Ubuntu operating systems, the crash dump file is saved to /var/crash/ by default.
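To find the most recent dump after sending the signal, you can list the dump directory by modification time. The following is a minimal sketch. The function name latest_dumps is hypothetical, and the default of /var/crash applies only to the Ubuntu example above, so pass a different directory for other operating systems.

```shell
# Hypothetical helper: list up to five entries in the dump directory,
# newest first. Defaults to the Ubuntu location mentioned above.
latest_dumps() {
  dump_dir="${1:-/var/crash}"
  ls -1t "$dump_dir" | head -n 5
}
```

Run it inside the guest after the VM restarts, for example latest_dumps or latest_dumps /path/to/dumps. Reading the directory might require root privileges.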