This guide explains how to use Google Cloud platform logs to troubleshoot issues when you are using Cloud Storage import topics to ingest data.
About ingestion failure in Cloud Storage import topics
Cloud Storage import topics can encounter issues that prevent data from being successfully ingested. For example, when using a Cloud Storage import topic, you might face issues ingesting a Cloud Storage object or part of an object.
The following list describes reasons for ingestion failure in Cloud Storage import topics that generate platform logs:
Message size
Individual messages can't be larger than 10 MB. If they are, the entire message is skipped.
If you're using the Avro or the Pub/Sub Avro format, message blocks can't be larger than 16 MB. Larger message blocks are skipped.
Message attributes
Messages can have a maximum of 100 attributes. Any extra attribute is dropped when the message is ingested.
Attribute keys can't be larger than 256 bytes and values can't be larger than 1024 bytes. Larger keys or values are removed from the message when it is ingested.
For more information about the guidelines for using message keys and attributes, see Use attributes to publish a message.
Avro formatting
Make sure your Avro objects are correctly formatted. Incorrect formatting prevents the message from being ingested.
Data format
Make sure that you're using a supported Avro version. Unsupported formats are not processed.
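The size and attribute limits listed above can be checked before objects are uploaded to the bucket. The following is a minimal Python sketch, not part of Pub/Sub or any client library. The function name `check_message` and the constant names are hypothetical, and "10 MB" is assumed to mean 10,000,000 bytes, which is the value that appears in the sample error message in the platform logs.

```python
from __future__ import annotations

MAX_MESSAGE_BYTES = 10_000_000   # assumption: "10 MB" taken as 10,000,000 bytes
MAX_ATTRIBUTES = 100             # extra attributes are dropped on ingestion
MAX_ATTRIBUTE_KEY_BYTES = 256    # larger keys are removed from the message
MAX_ATTRIBUTE_VALUE_BYTES = 1024 # larger values are removed from the message


def check_message(data: bytes, attributes: dict[str, str]) -> list[str]:
    """Return a list of problems that would affect ingestion of one message."""
    problems = []
    if len(data) > MAX_MESSAGE_BYTES:
        problems.append("message exceeds the size limit and would be skipped")
    if len(attributes) > MAX_ATTRIBUTES:
        problems.append("more than 100 attributes; the extra ones would be dropped")
    for key, value in attributes.items():
        # Limits are expressed in bytes, so measure the UTF-8 encoding.
        if len(key.encode("utf-8")) > MAX_ATTRIBUTE_KEY_BYTES:
            problems.append(f"attribute key too large: {key[:20]}...")
        if len(value.encode("utf-8")) > MAX_ATTRIBUTE_VALUE_BYTES:
            problems.append(f"attribute value too large for key: {key[:20]}")
    return problems
```

An empty list means that none of the limits described here is exceeded. The check doesn't validate Avro formatting or Avro block sizes.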
About platform logs
A supported Google Cloud service generates its own set of platform logs, capturing events and activities relevant to that service's operation. These platform logs contain detailed information about what's happening within a service, including successful operations, errors, warnings, and other noteworthy events.
Platform logs are a part of Cloud Logging and share the same features. For example, the following is a list of important features for platform logs:
Logs are typically structured as JSON objects that allow for further querying and filtering.
You can view platform logs by using Logging in the console.
Platform logs can also be integrated with Cloud Monitoring and other monitoring tools to create dashboards, alerts, and other monitoring mechanisms.
Log storage incurs costs based on ingested volume and retention period.
For more information about platform logs, see Google Cloud platform logs.
Required roles and permissions to use platform logs
Before you begin, verify that you have access to Logging.
You require the Logs Viewer (roles/logging.viewer) Identity and Access Management (IAM) role. For more information about Logging access, see Access control with IAM.
The following pages describe how to verify and grant IAM access:
View current access to verify the access that each principal has.
Grant a role to relevant principals in your project.
Enable platform logs
Platform logs are disabled by default for import topics. You can enable platform logs when you create or update a Cloud Storage import topic.
To disable platform logs, update the Cloud Storage import topic.
Enable platform logs while creating a Cloud Storage import topic
Ensure that you have completed the prerequisites for creating a Cloud Storage import topic.
To create a Cloud Storage import topic with platform logs enabled, follow these steps:
Console
In the Google Cloud console, go to the Topics page.
Click Create topic.
The topic details page opens.
In the Topic ID field, enter an ID for your Cloud Storage import topic.
For more information about naming topics, see the naming guidelines.
Select Add a default subscription.
Select Enable ingestion.
Specify the options for ingestion by following the instructions in Create a Cloud Storage import topic.
Select Enable platform logs.
Retain the other default settings.
Click Create topic.
gcloud
In the Google Cloud console, activate Cloud Shell.
At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
To enable platform logs, ensure that the --ingestion-log-severity flag is set to WARNING or a lower severity level such as INFO or DEBUG. Run the gcloud pubsub topics create command:

gcloud pubsub topics create TOPIC_ID \
    --cloud-storage-ingestion-bucket=BUCKET_NAME \
    --cloud-storage-ingestion-input-format=INPUT_FORMAT \
    --ingestion-log-severity=WARNING
Replace the following:
TOPIC_ID: The name or ID of your topic.
BUCKET_NAME: Specifies the name of an existing bucket. For example, prod_bucket. The bucket name must not include the project ID. To create a bucket, see Create buckets.
INPUT_FORMAT: Specifies the format of the objects that are ingested. This can be text, avro, or pubsub_avro. For more information about these options, see Input format.
If you run into issues, see Troubleshooting a Cloud Storage import topic.
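If you script topic creation, the command shown above can be assembled programmatically. The following is a minimal Python sketch that only builds the argument list from the flags documented here; the function name `build_create_command` is hypothetical, and running the result assumes that the gcloud CLI is installed and authenticated.

```python
import shlex

# Input formats named in this guide.
SUPPORTED_INPUT_FORMATS = ("text", "avro", "pubsub_avro")


def build_create_command(topic_id, bucket_name, input_format,
                         log_severity="WARNING"):
    """Build the argv list for creating an import topic with platform logs."""
    if input_format not in SUPPORTED_INPUT_FORMATS:
        raise ValueError(f"unsupported input format: {input_format}")
    return [
        "gcloud", "pubsub", "topics", "create", topic_id,
        f"--cloud-storage-ingestion-bucket={bucket_name}",
        f"--cloud-storage-ingestion-input-format={input_format}",
        f"--ingestion-log-severity={log_severity}",
    ]


if __name__ == "__main__":
    # Print the command for review instead of executing it.
    print(shlex.join(build_create_command("my-topic", "prod_bucket", "text")))
```

To run the command, the list could be passed to `subprocess.run(argv, check=True)`. Passing a list rather than a single string avoids shell quoting problems.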
Enable platform logs while updating a Cloud Storage import topic
Perform the following steps:
Console
In the Google Cloud console, go to the Topics page.
Click the Cloud Storage import topic.
In the topic details page, click Edit.
Select Enable platform logs.
Click Update.
gcloud
In the Google Cloud console, activate Cloud Shell.
At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
To avoid losing your settings for the import topic, make sure to include all of them every time you update the topic. If you leave something out, Pub/Sub resets the setting to its original default value.
To enable platform logs, ensure that the --ingestion-log-severity flag is set to WARNING or a lower severity level such as INFO or DEBUG. Run the gcloud pubsub topics update command with all the flags mentioned in the following sample:

gcloud pubsub topics update TOPIC_ID \
    --cloud-storage-ingestion-bucket=BUCKET_NAME \
    --cloud-storage-ingestion-input-format=INPUT_FORMAT \
    --cloud-storage-ingestion-text-delimiter=TEXT_DELIMITER \
    --cloud-storage-ingestion-minimum-object-create-time=MINIMUM_OBJECT_CREATE_TIME \
    --cloud-storage-ingestion-match-glob=MATCH_GLOB \
    --ingestion-log-severity=WARNING
Replace the following:
TOPIC_ID: The topic ID or name. This field cannot be updated.
BUCKET_NAME: Specifies the name of an existing bucket. For example, prod_bucket. The bucket name must not include the project ID.
INPUT_FORMAT: Specifies the format of the objects that are ingested. This can be text, avro, or pubsub_avro. For more information about these options, see Input format.
TEXT_DELIMITER: Specifies the delimiter with which to split text objects into Pub/Sub messages. This should be a single character and should only be set when INPUT_FORMAT is text. It defaults to the newline character (\n).
When using the gcloud CLI to specify the delimiter, pay close attention to the handling of special characters like newline \n. Use the format '\n' to ensure the delimiter is correctly interpreted. Simply using \n without quotes or escaping results in a delimiter of "n".
MINIMUM_OBJECT_CREATE_TIME: Specifies the minimum time at which an object was created in order for it to be ingested. This should be in UTC in the format YYYY-MM-DDThh:mm:ssZ. For example, 2024-10-14T08:30:30Z.
Any date, past or future, from 0001-01-01T00:00:00Z to 9999-12-31T23:59:59Z inclusive, is valid.
MATCH_GLOB: Specifies the glob pattern to match in order for an object to be ingested. When you are using the gcloud CLI, a match glob with * characters must either have each * character escaped, as in \*\*.txt, or be enclosed in quotes, as in "**.txt" or '**.txt'. For information about supported syntax for glob patterns, see the Cloud Storage documentation.
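The quoting rules for the delimiter and match glob, and the UTC timestamp format, are easy to get wrong by hand. The following is a minimal Python sketch that builds the full update command and relies on the standard library's `shlex` module for shell quoting. The function names `format_create_time` and `build_update_command` are hypothetical; only the gcloud flags come from this guide.

```python
import shlex
from datetime import datetime, timezone


def format_create_time(moment: datetime) -> str:
    """Format a timezone-aware datetime as YYYY-MM-DDThh:mm:ssZ in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_update_command(topic_id, bucket_name, input_format, min_create_time,
                         match_glob, log_severity, text_delimiter="\\n"):
    """Return a shell-safe update command that restates every setting."""
    argv = [
        "gcloud", "pubsub", "topics", "update", topic_id,
        f"--cloud-storage-ingestion-bucket={bucket_name}",
        f"--cloud-storage-ingestion-input-format={input_format}",
    ]
    # The delimiter should only be set when the input format is text.
    if input_format == "text":
        argv.append(f"--cloud-storage-ingestion-text-delimiter={text_delimiter}")
    argv += [
        "--cloud-storage-ingestion-minimum-object-create-time="
        + format_create_time(min_create_time),
        f"--cloud-storage-ingestion-match-glob={match_glob}",
        f"--ingestion-log-severity={log_severity}",
    ]
    # shlex.join single-quotes any argument containing \ or *, so the shell
    # passes the backslash and the asterisks through unchanged.
    return shlex.join(argv)


if __name__ == "__main__":
    print(build_update_command(
        "my-topic", "prod_bucket", "text",
        datetime(2024, 10, 14, 8, 30, 30, tzinfo=timezone.utc),
        "**.txt", "WARNING"))
```

Because every setting is a required input, the sketch can't silently omit one and cause it to be reset. The same function with `log_severity="DISABLED"` produces the command for disabling platform logs.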
Disable platform logs
Perform the following steps:
Console
In the Google Cloud console, go to the Topics page.
Click the Cloud Storage import topic.
In the topic details page, click Edit.
Clear Enable platform logs.
Click Update.
gcloud
In the Google Cloud console, activate Cloud Shell.
At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
To avoid losing your settings for the import topic, make sure to include all of them every time you update the topic. If you leave something out, Pub/Sub resets the setting to its original default value.
To disable platform logs, ensure that the --ingestion-log-severity flag is set to DISABLED. Run the gcloud pubsub topics update command with all the flags mentioned in the following sample:

gcloud pubsub topics update TOPIC_ID \
    --cloud-storage-ingestion-bucket=BUCKET_NAME \
    --cloud-storage-ingestion-input-format=INPUT_FORMAT \
    --cloud-storage-ingestion-text-delimiter=TEXT_DELIMITER \
    --cloud-storage-ingestion-minimum-object-create-time=MINIMUM_OBJECT_CREATE_TIME \
    --cloud-storage-ingestion-match-glob=MATCH_GLOB \
    --ingestion-log-severity=DISABLED
Replace the following:
TOPIC_ID: The topic ID or name. This field cannot be updated.
BUCKET_NAME: Specifies the name of an existing bucket. For example, prod_bucket. The bucket name must not include the project ID.
INPUT_FORMAT: Specifies the format of the objects that are ingested. This can be text, avro, or pubsub_avro. For more information about these options, see Input format.
TEXT_DELIMITER: Specifies the delimiter with which to split text objects into Pub/Sub messages. This should be a single character and should only be set when INPUT_FORMAT is text. It defaults to the newline character (\n).
When using the gcloud CLI to specify the delimiter, pay close attention to the handling of special characters like newline \n. Use the format '\n' to ensure the delimiter is correctly interpreted. Simply using \n without quotes or escaping results in a delimiter of "n".
MINIMUM_OBJECT_CREATE_TIME: Specifies the minimum time at which an object was created in order for it to be ingested. This should be in UTC in the format YYYY-MM-DDThh:mm:ssZ. For example, 2024-10-14T08:30:30Z.
Any date, past or future, from 0001-01-01T00:00:00Z to 9999-12-31T23:59:59Z inclusive, is valid.
MATCH_GLOB: Specifies the glob pattern to match in order for an object to be ingested. When you are using the gcloud CLI, a match glob with * characters must either have each * character escaped, as in \*\*.txt, or be enclosed in quotes, as in "**.txt" or '**.txt'. For information about supported syntax for glob patterns, see the Cloud Storage documentation.
View platform logs
To view platform logs for a Cloud Storage import topic, do the following:
Google Cloud console
In the Google Cloud console, go to Logs Explorer.
Select a Google Cloud project.
If required, from the Upgrade menu, switch from Legacy Logs Viewer to Logs Explorer.
To filter your logs to show only entries for Cloud Storage import topics, type resource.type="pubsub_topic" AND severity=WARNING into the query field and click Run query.
In the Query results pane, click Edit time to change the time period for which to return results.
For more information about using the Logs Explorer, see Using the Logs Explorer.
gcloud CLI
To use the gcloud CLI to search for platform logs for Cloud Storage import topics, use the gcloud logging read command.
Specify a filter to limit your results to platform logs for Cloud Storage import topics.

gcloud logging read "resource.type=pubsub_topic AND severity=WARNING"
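The filter shown above matches every Pub/Sub topic in the project. The following is a minimal Python sketch of a helper that builds a narrower filter string for a single topic. The function name `build_log_filter` is hypothetical. The `resource.labels.topic_id` field is taken from the sample log entries in this guide, and using `>=` instead of `=` for severity is an assumption made so that entries at higher severities are also matched.

```python
def build_log_filter(topic_id=None, min_severity="WARNING"):
    """Build a Logging filter for platform logs on Pub/Sub topics."""
    clauses = [
        'resource.type="pubsub_topic"',
        f"severity>={min_severity}",
    ]
    if topic_id:
        # Restrict results to one topic by using its resource label.
        clauses.append(f'resource.labels.topic_id="{topic_id}"')
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(build_log_filter("avro_bucket_topic"))
```

The resulting string can be passed as the single argument to gcloud logging read, enclosed in single quotes so that the shell preserves the inner double quotes.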
Cloud Logging API
Use the entries.list Cloud Logging API method.
To filter your results to include only platform logs for Cloud Storage import topics, use the filter field. The following is a sample JSON request object.
{
  "resourceNames": [
    "projects/my-project-name"
  ],
  "orderBy": "timestamp desc",
  "filter": "resource.type=\"pubsub_topic\" AND severity=WARNING"
}
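Writing the escaped quotes in the filter by hand is error-prone. The following is a minimal Python sketch that builds the same request object as a dictionary and lets the `json` module handle the escaping. The function name `build_entries_list_request` is hypothetical, and the sketch only constructs the body; sending it to the entries.list method with valid credentials is left out.

```python
import json


def build_entries_list_request(project_id, log_filter):
    """Build the JSON body for an entries.list request."""
    return {
        "resourceNames": [f"projects/{project_id}"],
        "orderBy": "timestamp desc",
        "filter": log_filter,
    }


if __name__ == "__main__":
    body = build_entries_list_request(
        "my-project-name",
        'resource.type="pubsub_topic" AND severity=WARNING',
    )
    # json.dumps escapes the inner double quotes as \" automatically.
    print(json.dumps(body, indent=2))
```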
View and understand platform log format
The following section includes sample platform logs and describes the fields for platform logs.
All platform log specific fields are contained within a jsonPayload object.
Avro failure
{
  "insertId": "1xnzx8md4768",
  "jsonPayload": {
    "@type": "type.googleapis.com/google.pubsub.v1.IngestionFailureEvent",
    "cloudStorageFailure": {
      "objectGeneration": "1661148924738910",
      "bucket": "bucket_in_avro_format",
      "objectName": "counts/taxi-2022-08-15T06:10:00.000Z-2022-08-15T06:15:00.000Z-pane-0-last-00-of-01",
      "avroFailureReason": {}
    },
    "topic": "projects/interpod-p2-management/topics/avro_bucket_topic",
    "errorMessage": "Unable to parse the header of the object. The object won't be ingested."
  },
  "resource": {
    "type": "pubsub_topic",
    "labels": {
      "project_id": "interpod-p2-management",
      "topic_id": "avro_bucket_topic"
    }
  },
  "timestamp": "2024-10-07T18:55:45.650103193Z",
  "severity": "WARNING",
  "logName": "projects/interpod-p2-management/logs/pubsub.googleapis.com%2Fingestion_failures",
  "receiveTimestamp": "2024-10-07T18:55:46.678221398Z"
}
Log field | Description |
---|---|
insertId | A unique identifier for the log entry. |
jsonPayload.@type | Identifies the event type. Always type.googleapis.com/google.pubsub.v1.IngestionFailureEvent. |
jsonPayload.cloudStorageFailure.objectGeneration | The generation number of the Cloud Storage object. |
jsonPayload.cloudStorageFailure.bucket | The Cloud Storage bucket containing the object. |
jsonPayload.cloudStorageFailure.objectName | The name of the Cloud Storage object. |
jsonPayload.cloudStorageFailure.avroFailureReason | Contains more specific Avro parsing error details. This field is left empty. |
jsonPayload.topic | The Pub/Sub topic the message was intended for. |
jsonPayload.errorMessage | A human-readable error message. |
resource.type | The resource type. Always pubsub_topic. |
resource.labels.project_id | The Google Cloud project ID. |
resource.labels.topic_id | The Pub/Sub topic ID. |
timestamp | Log entry generation timestamp. |
severity | Severity level, which is WARNING. |
logName | Name of the log. |
receiveTimestamp | Log entry received timestamp. |
Text failure
{
  "insertId": "1kc4puoag",
  "jsonPayload": {
    "@type": "type.googleapis.com/google.pubsub.v1.IngestionFailureEvent",
    "cloudStorageFailure": {
      "bucket": "bucket_in_text_format",
      "apiViolationReason": {},
      "objectName": "counts/taxi-2022-08-15T06:10:00.000Z-2022-08-15T06:15:00.000Z-pane-0-last-00-of-01",
      "objectGeneration": "1727990048026758"
    },
    "topic": "projects/interpod-p2-management/topics/large_text_bucket_topic",
    "errorMessage": "The message has exceeded the maximum allowed size of 10000000 bytes. The message won't be published."
  },
  "resource": {
    "type": "pubsub_topic",
    "labels": {
      "topic_id": "large_text_bucket_topic",
      "project_id": "interpod-p2-management"
    }
  },
  "timestamp": "2024-10-09T14:09:07.760488386Z",
  "severity": "WARNING",
  "logName": "projects/interpod-p2-management/logs/pubsub.googleapis.com%2Fingestion_failures",
  "receiveTimestamp": "2024-10-09T14:09:08.483589656Z"
}
Log field | Description |
---|---|
insertId | A unique identifier for the log entry. |
jsonPayload.@type | Identifies the event type. Always type.googleapis.com/google.pubsub.v1.IngestionFailureEvent. |
jsonPayload.cloudStorageFailure.objectGeneration | The generation number of the Cloud Storage object. |
jsonPayload.cloudStorageFailure.bucket | The Cloud Storage bucket containing the object. |
jsonPayload.cloudStorageFailure.objectName | The name of the Cloud Storage object. |
jsonPayload.cloudStorageFailure.apiViolationReason | Contains details about the API violation. This field is left empty. |
jsonPayload.topic | The Pub/Sub topic. |
jsonPayload.errorMessage | A human-readable message. |
resource.type | Resource type, always pubsub_topic. |
resource.labels.project_id | Google Cloud project ID. |
resource.labels.topic_id | Pub/Sub topic ID. |
timestamp | Log entry generation timestamp. |
severity | Severity level, which is WARNING. |
logName | Name of the log. |
receiveTimestamp | Time at which the log entry was received by Logging. |
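When many failures are logged, it can help to reduce each entry to the fields described in the tables above. The following is a minimal Python sketch that works on a log entry already parsed from JSON into a dictionary. The function name `summarize_failure` is hypothetical, and identifying the failure kind by looking for a key that ends in `Reason` is an assumption based only on the two samples in this guide (`avroFailureReason` and `apiViolationReason`).

```python
def summarize_failure(entry: dict) -> dict:
    """Extract the key fields from one ingestion failure log entry."""
    payload = entry.get("jsonPayload", {})
    failure = payload.get("cloudStorageFailure", {})
    # Assumption: the failure kind is the key whose name ends in "Reason".
    reason = next((key for key in failure if key.endswith("Reason")), "unknown")
    return {
        "topic": payload.get("topic"),
        "bucket": failure.get("bucket"),
        "object": failure.get("objectName"),
        "generation": failure.get("objectGeneration"),
        "reason": reason,
        "message": payload.get("errorMessage"),
        "timestamp": entry.get("timestamp"),
    }


if __name__ == "__main__":
    sample = {
        "jsonPayload": {
            "cloudStorageFailure": {
                "bucket": "bucket_in_text_format",
                "apiViolationReason": {},
                "objectGeneration": "1727990048026758",
            },
            "topic": "projects/interpod-p2-management/topics/large_text_bucket_topic",
        },
        "timestamp": "2024-10-09T14:09:07.760488386Z",
    }
    print(summarize_failure(sample))
```

Fields missing from an entry are returned as `None`, so the function can also be applied to entries that have a different shape.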