AIP-193
Errors
Effective error communication is an important part of designing simple and intuitive APIs. Services returning standardized error responses enable API clients to construct centralized common error handling logic. This common logic simplifies API client applications and eliminates the need for cumbersome custom error handling code.
Guidance
Services must return a google.rpc.Status message when an API error occurs, and must use the canonical error codes defined in google.rpc.Code. More information about the particular codes is available in the gRPC status code documentation.
Error messages should help a reasonably technical user understand and resolve the issue, and should not assume that the user is an expert in your particular API. Additionally, error messages must not assume that the user knows anything about the API's underlying implementation.
Error messages should be brief but actionable. Any extra information
should be provided in the details
field. If even more information
is necessary, you should provide a link where a reader can get more
information or ask questions to help resolve the issue. It is also
important to set the right tone when writing messages.
The following sections describe the fields of google.rpc.Status.
Status.message
The message field is a developer-facing, human-readable "debug message" which should be in English. (Localized messages are expressed using a LocalizedMessage within the details field. See LocalizedMessage for more details.) Any dynamic aspects of the message must be included as metadata within the ErrorInfo that appears in details.
The message is considered a problem description. It is intended for developers to understand the problem and is more detailed than ErrorInfo.reason, discussed later.
Messages should use simple descriptive language that is easy to understand (without technical jargon) to clearly state the problem that results in an error, and offer an actionable resolution to it.
For pre-existing (brownfield) APIs which have previously returned errors without machine-readable identifiers, the value of message must remain the same for any given error. For more information, see Changing Error Messages.
Status.code
The code field is the status code, which must be the numeric value of one of the elements of the google.rpc.Code enum. For example, the value 5 is the numeric value of the NOT_FOUND enum element.
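As an illustration of the numeric mapping, the following Python sketch mirrors the google.rpc.Code values by hand. The class and the code_name helper are hypothetical names for this example; a real service would normally use the generated protobuf enum rather than a copy like this.

```python
from enum import IntEnum


class Code(IntEnum):
    """Hand-written mirror of the google.rpc.Code enum (illustration only)."""

    OK = 0
    CANCELLED = 1
    UNKNOWN = 2
    INVALID_ARGUMENT = 3
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    ALREADY_EXISTS = 6
    PERMISSION_DENIED = 7
    RESOURCE_EXHAUSTED = 8
    FAILED_PRECONDITION = 9
    ABORTED = 10
    OUT_OF_RANGE = 11
    UNIMPLEMENTED = 12
    INTERNAL = 13
    UNAVAILABLE = 14
    DATA_LOSS = 15
    UNAUTHENTICATED = 16


def code_name(value: int) -> str:
    """Return the enum element name for a numeric Status.code value."""
    return Code(value).name
```

With this sketch, code_name(5) yields the NOT_FOUND element name, and an unknown numeric value raises ValueError instead of being silently accepted.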
Status.details
The details
field allows messages with additional error information to
be included in the error response, each packed in a google.protobuf.Any
message.
Google defines a set of standard detail payloads for error details, which cover most common needs for API errors. Services should use these standard detail payloads when feasible.
Each type of detail payload must be included at most once. For example, there must not be more than one BadRequest message in the details, but there may be a BadRequest and a PreconditionFailure.
All error responses must include an ErrorInfo within details. This provides machine-readable identifiers so that users can write code against specific aspects of the error.
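The two rules above (each payload type at most once, and ErrorInfo always present) are easy to check mechanically. The sketch below assumes the details are available as JSON-style dictionaries carrying an "@type" key, as in the HTTP/1.1+JSON representation; check_details is a hypothetical helper name.

```python
from collections import Counter

ERROR_INFO_TYPE = "type.googleapis.com/google.rpc.ErrorInfo"


def check_details(details):
    """Return a list of rule violations for a list of detail payload dicts."""
    problems = []
    # Count how often each payload type occurs in the details list.
    counts = Counter(detail.get("@type", "") for detail in details)
    for type_url, count in sorted(counts.items()):
        if count > 1:
            problems.append(f"{type_url} appears {count} times")
    # ErrorInfo is mandatory on every error response.
    if counts[ERROR_INFO_TYPE] == 0:
        problems.append("ErrorInfo is missing")
    return problems
```

An empty result means both rules hold. A check like this could run in a service's tests so that violations are caught before release.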
The following sections describe the most common standard detail payloads.
ErrorInfo
The ErrorInfo
message is the primary way to send a
machine-readable identifier. Contextual information should be
included in metadata
in ErrorInfo
and must be included if it
appears within an error message.
The reason field is a short UPPER_SNAKE_CASE description of the cause of the error. Error reasons are unique within a particular domain of errors. The reason must be at most 63 characters and match the regular expression [A-Z][A-Z0-9_]+[A-Z0-9]. (That is, no leading or trailing underscores, and no leading digits.)
The reason should be terse, but meaningful enough for a human reader to understand what the reason refers to.
Good examples:
CPU_AVAILABILITY
NO_STOCK
CHECKED_OUT
AVAILABILITY_ERROR
Bad examples:
THE_BOOK_YOU_WANT_IS_NOT_AVAILABLE (overly verbose)
ERROR (too general)
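The format constraints on reason can be checked with a few lines of Python. This is a minimal sketch using the regular expression and length limit stated above; is_valid_reason is a hypothetical helper name. It validates the format only: a reason such as ERROR passes the check even though it is too general to be a good choice.

```python
import re

# Pattern and limit as stated in the guidance for ErrorInfo.reason.
REASON_PATTERN = re.compile(r"[A-Z][A-Z0-9_]+[A-Z0-9]")
MAX_REASON_LENGTH = 63


def is_valid_reason(reason: str) -> bool:
    """Check the length and format of an ErrorInfo.reason value."""
    if len(reason) > MAX_REASON_LENGTH:
        return False
    # fullmatch ensures the whole string conforms, not just a prefix.
    return REASON_PATTERN.fullmatch(reason) is not None
```

Note that the pattern requires at least three characters, so very short reasons are rejected as well.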
The domain field is the logical grouping to which the reason belongs. The domain must be a globally unique value, and is typically the name of the service that generated the error, e.g. pubsub.googleapis.com.
The (reason, domain) pair forms a machine-readable way of identifying a particular error. Services must use the same (reason, domain) pair for the same error, and must not use the same (reason, domain) pair for logically different errors. The decision about whether two errors are "the same" or not is not always clear, but should generally be considered in terms of the expected action a client might take to resolve them.
The metadata field is a map of key/value pairs providing additional dynamic information as context. Each key within metadata must be at most 64 characters long, and conform to the regular expression [a-z][a-zA-Z0-9-_]+.
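A key check along the same lines might look like the following sketch. The helper name is hypothetical, and the hyphen inside the character class is escaped here only to make its literal meaning explicit; the set of accepted characters is intended to be the same as in the expression above.

```python
import re

# Same character set as [a-z][a-zA-Z0-9-_]+, with the hyphen escaped for clarity.
KEY_PATTERN = re.compile(r"[a-z][a-zA-Z0-9\-_]+")
MAX_KEY_LENGTH = 64


def is_valid_metadata_key(key: str) -> bool:
    """Check the length and format of an ErrorInfo.metadata key."""
    if len(key) > MAX_KEY_LENGTH:
        return False
    return KEY_PATTERN.fullmatch(key) is not None
```

The pattern requires a lowercase first letter followed by at least one more character, so single-character keys are rejected.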
Any request-specific information which contributes to the Status.message or LocalizedMessage.message messages must be represented within metadata. This practice is critical so that machine actors do not need to parse error messages to extract information.
For example, consider the following message:
An <e2-medium> VM instance with <local-ssd=3,nvidia-t4=2> is currently unavailable in the <us-east1-a> zone. Consider trying your request in the <us-central1-f,us-central1-c> zone(s), which currently has/have capacity to accommodate your request. Alternatively, you can try your request again with a different VM hardware configuration or at a later time. For more information, see the troubleshooting documentation.
The ErrorInfo.metadata
map for the same error could be:
"zone": "us-east1-a"
"vmType": "e2-medium"
"attachment": "local-ssd=3,nvidia-t4=2"
"zonesWithCapacity": "us-central1-f,us-central1-c"
Additional contextual information that does not appear in an error message
may also be included in metadata
to allow programmatic use by the client.
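One way to keep the message and the metadata in step is to render the message from the metadata map, so that no dynamic value can appear in the text without also being a metadata entry. The following sketch assumes a shortened version of the message above; MESSAGE_TEMPLATE and build_error are hypothetical names, not part of any library.

```python
# Placeholders are named after the metadata keys they are filled from.
MESSAGE_TEMPLATE = (
    "An <{vmType}> VM instance with <{attachment}> is currently "
    "unavailable in the <{zone}> zone."
)


def build_error(metadata):
    """Render the message from metadata and return both together."""
    # format_map raises KeyError if the template needs a key that the
    # metadata does not provide, so the two cannot drift apart silently.
    message = MESSAGE_TEMPLATE.format_map(metadata)
    return {"message": message, "metadata": dict(metadata)}
```

Extra keys in the metadata that the template does not use (such as zonesWithCapacity) are still returned, which matches the allowance for additional contextual information.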
The metadata included for any given (reason,domain) pair can evolve over time:
- New keys may be included
- All keys that have been included must continue to be included (but may have empty values)
In other words, once a user has observed a given key for a (reason, domain) pair, the service must allow them to rely on it continuing to be present in the future.
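The evolution rule can be expressed as a simple set comparison. In this sketch, removed_keys is a hypothetical helper that a service team might run in a test against the key set recorded for a previous release.

```python
def removed_keys(previous_keys, current_keys):
    """Return keys that were emitted before but are no longer present.

    A non-empty result means the metadata for this (reason, domain) pair
    has changed incompatibly; adding new keys is always allowed.
    """
    return set(previous_keys) - set(current_keys)
```

A key whose value has become empty still counts as present, so it does not show up in the result.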
The set of keys provided in each (reason, domain) pair is independent from other pairs, but services should aim for consistent key naming. For example, two error reasons within the same domain should not use metadata keys of vmType and virtualMachineType.
LocalizedMessage
google.rpc.LocalizedMessage
is used to provide an error
message which should be localized to a user-specified locale where
possible.
If the Status.message
field has a sub-optimal value
which cannot be changed due to the constraints in the
Changing Error Messages section, LocalizedMessage
may be used to provide a better error message even when no user-specified
locale is available.
Regardless of how the locale for the message was determined, both the locale
and message
fields must be populated.
The locale field specifies the locale of the message, following IETF BCP 47 (Tags for Identifying Languages). Example values: "en-US", "fr-CH", "es-MX".
The message
field contains the localized text itself. This
should include a brief description of the error and a call to action
to resolve the error. The message should include contextual information
to make the message as specific as possible. Any contextual information
in the message must be included in ErrorInfo.metadata
. See
ErrorInfo
for more details of how contextual information
may be included in a message and the corresponding metadata.
The LocalizedMessage
payload should contain the complete resolution
to the error. If more information is needed than can reasonably fit in this
payload, then additional resolution information must be provided in
a Help
payload. See the Help section for guidance.
Help
When other textual error messages (in Status.message or LocalizedMessage.message) don't provide the user sufficient context or actionable next steps, or if there are multiple points of failure that need to be considered in troubleshooting, a link to supplemental troubleshooting documentation must be provided in the Help payload.
Provide this information in addition to a clear problem definition and
actionable resolution, not as an alternative to them. The linked
documentation must clearly relate to the error. If a single page
contains information about multiple errors, the
ErrorInfo.reason
value must be used to narrow down
the relevant information.
The description
field is a textual description of the linked information.
This must be suitable to display to a user as text for a hyperlink.
This must be plain text (not HTML, Markdown etc).
Example description
value: "Troubleshooting documentation for STOCKOUT errors"
The url
field is the URL to link to. This must be an absolute URL,
including scheme.
Example url
value:
"https://cloud.google.com/compute/docs/resource-error"
For publicly-documented services, even those with access controls on actual usage, the linked content must be accessible without authentication.
For privately-documented services, the linked content may require authentication.
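The requirement that url be absolute, including scheme, can be checked with the Python standard library. This is a minimal sketch; is_absolute_url is a hypothetical helper name, and it checks only the shape of the URL, not whether the page exists or is reachable without authentication.

```python
from urllib.parse import urlsplit


def is_absolute_url(url: str) -> bool:
    """Return True if the URL has both a scheme and a host component."""
    parts = urlsplit(url)
    return bool(parts.scheme) and bool(parts.netloc)
```

Relative paths and scheme-less values such as cloud.google.com/compute/docs are rejected by this check.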
Error messages
Textual error messages can be present in both Status.message
and
LocalizedMessage.message
fields. Messages should be succinct but
actionable, with request-specific information (such as a resource name
or region) providing precise details where appropriate. Any request-specific
details must be present in ErrorInfo.metadata
.
Changing error messages
Changing the content of Status.message
over time must be done carefully,
to avoid breaking clients who have previously had to rely on the message
for all information. See the rationale section
for more details.
For a given RPC:
- If the RPC has always returned ErrorInfo with machine-readable information, the content of Status.message may change over time. (For example, the API producer may provide a clearer explanation, or more request-specific information.)
- Otherwise, the content of Status.message must be stable, providing the same text with the same request-specific information. Instead of changing Status.message, the API should include a LocalizedMessage within Status.details.
Even if an RPC has always returned ErrorInfo, the API may keep the existing Status.message stable and add a LocalizedMessage within Status.details. The content of LocalizedMessage.message may change over time.
Partial errors
APIs should not support partial errors. Partial errors add significant complexity for users, because they usually sidestep the use of error codes, or move those error codes into the response message, where the user must write specialized error handling logic to address the problem.
However, occasionally partial errors are necessary, particularly in bulk operations where it would be hostile to users to fail an entire large request because of a problem with a single entry.
Methods that require partial errors should use long-running
operations, and the method should put partial failure information
in the metadata message. The errors themselves must still be
represented with a google.rpc.Status
object.
Permission Denied
If the user does not have permission to access the resource or parent,
regardless of whether or not it exists, the service must error with
PERMISSION_DENIED
(HTTP 403). Permission must be checked prior to
checking if the resource or parent exists.
If the user does have proper permission, but the requested resource or
parent does not exist, the service must error with NOT_FOUND
(HTTP
404).
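The ordering matters: if existence were checked first, an unauthorized caller could learn whether a resource exists from the difference between the two errors. The sketch below shows the required order; access_code and the two callables passed to it are hypothetical stand-ins for a service's own authorization and storage lookups.

```python
PERMISSION_DENIED = 7  # google.rpc.Code value, HTTP 403
NOT_FOUND = 5          # google.rpc.Code value, HTTP 404
OK = 0


def access_code(user, resource_name, can_access, exists):
    """Return the status code for an access attempt.

    can_access(user, resource_name) and exists(resource_name) are
    caller-supplied functions (assumptions for this sketch).
    """
    # Permission is checked first, regardless of whether the resource exists.
    if not can_access(user, resource_name):
        return PERMISSION_DENIED
    # Only callers with permission learn that the resource is missing.
    if not exists(resource_name):
        return NOT_FOUND
    return OK
```

Because the function returns before calling exists when permission is denied, the existence lookup is never performed for unauthorized callers.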
HTTP/1.1+JSON representation
When clients use HTTP/1.1 as per AIP-127, the error information is returned in the body of the response, as a JSON object. For backward compatibility reasons, this does not map precisely to google.rpc.Status, but contains the same core information. The schema is defined in the following proto:
message Error {
  message Status {
    // The HTTP status code that corresponds to `google.rpc.Status.code`.
    int32 code = 1;
    // This corresponds to `google.rpc.Status.message`.
    string message = 2;
    // This is the enum version for `google.rpc.Status.code`.
    google.rpc.Code status = 4;
    // This corresponds to `google.rpc.Status.details`.
    repeated google.protobuf.Any details = 5;
  }
  Status error = 1;
}
The most important difference is that the code field in the JSON is an HTTP status code, not the direct value of google.rpc.Status.code. For example, a google.rpc.Status message with a code value of 5 would be mapped to an object including the following code-related fields (as well as the message, details etc):
{
  "error": {
    "code": 404, // The HTTP status code for "not found"
    "status": "NOT_FOUND" // The name in google.rpc.Code for value 5
  }
}
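The conversion from a google.rpc.Status to this JSON shape can be sketched as a table lookup. The HTTP mappings below follow the comments in the google.rpc.Code definition as commonly documented; treat the table as an assumption to verify against that source. The name to_http_error is hypothetical.

```python
# Index in this list is the numeric google.rpc.Code value.
CODES = [
    ("OK", 200), ("CANCELLED", 499), ("UNKNOWN", 500),
    ("INVALID_ARGUMENT", 400), ("DEADLINE_EXCEEDED", 504),
    ("NOT_FOUND", 404), ("ALREADY_EXISTS", 409),
    ("PERMISSION_DENIED", 403), ("RESOURCE_EXHAUSTED", 429),
    ("FAILED_PRECONDITION", 400), ("ABORTED", 409),
    ("OUT_OF_RANGE", 400), ("UNIMPLEMENTED", 501), ("INTERNAL", 500),
    ("UNAVAILABLE", 503), ("DATA_LOSS", 500), ("UNAUTHENTICATED", 401),
]


def to_http_error(code, message, details=()):
    """Build the HTTP/1.1+JSON error body for a Status-like triple."""
    name, http_status = CODES[code]
    return {
        "error": {
            "code": http_status,  # HTTP status code, not the gRPC number
            "message": message,
            "status": name,       # enum element name from google.rpc.Code
            "details": list(details),
        }
    }
```

For a Status with code 5, this produces "code": 404 and "status": "NOT_FOUND", matching the example above.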
The following JSON shows a fully populated HTTP/1.1+JSON representation of an error response.
{
  "error": {
    "code": 429,
    "message": "The zone 'us-east1-a' does not have enough resources available to fulfill the request. Try a different zone, or try again later.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.ErrorInfo",
        "reason": "RESOURCE_AVAILABILITY",
        "domain": "compute.googleapis.com",
        "metadata": {
          "zone": "us-east1-a",
          "vmType": "e2-medium",
          "attachment": "local-ssd=3,nvidia-t4=2",
          "zonesWithCapacity": "us-central1-f,us-central1-c"
        }
      },
      {
        "@type": "type.googleapis.com/google.rpc.LocalizedMessage",
        "locale": "en-US",
        "message": "An <e2-medium> VM instance with <local-ssd=3,nvidia-t4=2> is currently unavailable in the <us-east1-a> zone. Consider trying your request in the <us-central1-f,us-central1-c> zone(s), which currently has/have capacity to accommodate your request. Alternatively, you can try your request again with a different VM hardware configuration or at a later time. For more information, see the troubleshooting documentation."
      },
      {
        "@type": "type.googleapis.com/google.rpc.Help",
        "links": [
          {
            "description": "Additional information on this error",
            "url": "https://cloud.google.com/compute/docs/resource-error"
          }
        ]
      }
    ]
  }
}
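On the client side, a response like this lets code branch on the (reason, domain) pair instead of parsing the message text. The following is a minimal sketch using only the standard library; error_identity is a hypothetical helper name, and it assumes the body is the JSON text of an error response in the shape shown above.

```python
import json

ERROR_INFO_TYPE = "type.googleapis.com/google.rpc.ErrorInfo"


def error_identity(body: str):
    """Extract (reason, domain, metadata) from an HTTP/1.1+JSON error body.

    Returns None if the response carries no ErrorInfo detail.
    """
    error = json.loads(body)["error"]
    for detail in error.get("details", []):
        if detail.get("@type") == ERROR_INFO_TYPE:
            return detail["reason"], detail["domain"], detail.get("metadata", {})
    return None
```

A caller could then compare the first two elements against ("RESOURCE_AVAILABILITY", "compute.googleapis.com") and read zonesWithCapacity from the metadata to decide where to retry.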
Rationale
Requiring ErrorInfo
ErrorInfo is required because it further identifies an error. With only approximately twenty available values for Status.code, it is difficult to disambiguate one error from another across an entire API Service.
Also, error messages often contain dynamic segments that express variable information, so there needs to be a machine-readable component of every error response that enables clients to use such information programmatically.
Including LocalizedMessage
LocalizedMessage
was selected as the location to present alternate
error messages. While LocalizedMessage
may use a locale specified
in the request, a service may provide a LocalizedMessage
even without
a user-specified locale, typically to provide a better error message in
situations where Status.message
cannot be changed.
Where the locale is not specified by the user, it should be en-US
(US English).
A service may include LocalizedMessage
even when the same message is
provided in Status.message
and when localization into a user-specified locale
is not supported. Reasons for this include:
- An intention to support user-specified localization in the near future, allowing clients to consistently use LocalizedMessage and not change their error-reporting code when the functionality is introduced.
- Consistency across all RPCs within a service: if some RPCs include LocalizedMessage and some only use Status.message for error messages, clients have to be aware of which RPCs will do what, or implement a fall-back mechanism. Providing LocalizedMessage on all RPCs allows simple and consistent client code to be written.
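For comparison, the fall-back mechanism that clients need when LocalizedMessage is not always present might look like the following sketch. It assumes the error is available as the "error" object of the HTTP/1.1+JSON representation; display_message is a hypothetical helper name.

```python
LOCALIZED_MESSAGE_TYPE = "type.googleapis.com/google.rpc.LocalizedMessage"


def display_message(error):
    """Pick the text to show: LocalizedMessage if present, else the message."""
    for detail in error.get("details", []):
        if detail.get("@type") == LOCALIZED_MESSAGE_TYPE and detail.get("message"):
            return detail["message"]
    # Fall back to the developer-facing message when no localized text exists.
    return error.get("message", "")
```

When every RPC in a service provides LocalizedMessage, the fall-back branch is never taken and client code can rely on the first path alone.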
Updating Status.message
If a client has ever observed an error with Status.message
populated
(which it always will be) but without ErrorInfo
, the developer of that client
may well have had to resort to parsing Status.message
in order to find out
information beyond just what Status.code
conveys. That information may be
found by matching specific text (e.g. "Connection closed with unknown cause")
or by parsing the message to find out metadata values (e.g. a region with
insufficient resources). At that point, Status.message
is implicitly part
of the API contract, so must not be updated - that would be a breaking
change. This is one reason for introducing LocalizedMessage
into the
Status.details
.
RPCs which have always included ErrorInfo
are in a better position:
the contract is then more about the stability of ErrorInfo
for any given
error. The reason and domain need to be consistent over time, and the
metadata provided for any given (reason,domain) can only be expanded.
It's still possible that clients could be parsing Status.message
instead of
using ErrorInfo
, but they will always have had a more robust option
available to them.
Further reading
- For which error codes to retry, see AIP-194.
- For how to retry errors in client libraries, see AIP-4221.
Changelog
- 2024-10-18: Rewrite/restructure for clarity.
- 2024-01-10: Incorporate guidance for writing effective messages.
- 2023-05-17: Change the recommended language for Status.message to be the service's native language rather than English.
- 2023-05-17: Specify requirements for changing error messages.
- 2023-05-10: Require ErrorInfo for all error responses.
- 2023-05-04: Require uniqueness by message type for error details.
- 2022-11-04: Added guidance around PERMISSION_DENIED errors previously found in other AIPs.
- 2022-08-12: Reworded/Simplified intro to add clarity to the intent.
- 2020-01-22: Added a reference to the ErrorInfo message.
- 2019-10-14: Added guidance restricting error message mutability to if there is a machine-readable identifier present.
- 2019-09-23: Added guidance about error message strings being able to change.