AIP-151
Long-running operations
Occasionally, an API may need to expose a method that takes a significant amount of time to complete. In these situations, it is often a poor user experience to simply block while the task runs; rather, it is better to return some kind of promise to the user and allow the user to check back in later.
The long-running operations pattern is roughly analogous to a Python Future, or a Node.js Promise. Essentially, the user is given a token that can be used to track progress and retrieve the result.
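The "check back in later" idea can be sketched as a small polling loop. This is a minimal illustration, not a real client library: the `Operation` dataclass is a simplified stand-in for the actual message, and `wait_for`, `get_operation`, `poll_interval`, and `timeout` are hypothetical names chosen for this example.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Operation:
    """Simplified stand-in for google.longrunning.Operation (illustration only)."""
    name: str
    done: bool = False
    response: Optional[Any] = None
    error: Optional[str] = None  # the real message carries a google.rpc.Status


def wait_for(op: Operation,
             get_operation: Callable[[str], Operation],
             poll_interval: float = 1.0,
             timeout: float = 60.0) -> Any:
    """Poll until the operation is done, then return its response or raise."""
    deadline = time.monotonic() + timeout
    while not op.done:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation {op.name} did not finish in {timeout}s")
        time.sleep(poll_interval)
        # get_operation is a hypothetical callable wrapping a GetOperation call.
        op = get_operation(op.name)
    if op.error is not None:
        raise RuntimeError(op.error)
    return op.response
```

The caller holds only the operation name between polls, which is the "token" role described above.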
Guidance
Individual API methods that might take a significant amount of time to complete should return a google.longrunning.Operation object instead of the ultimate response message.
// Create a book.
rpc CreateBook(CreateBookRequest) returns (google.longrunning.Operation) {
  option (google.api.http) = {
    post: "/v1/{parent=publishers/*}/books"
    body: "book"
  };
  option (google.longrunning.operation_info) = {
    response_type: "Book"
    metadata_type: "OperationMetadata"
  };
}
- The response type must be google.longrunning.Operation. The Operation proto definition must not be copied into individual APIs.
  - The response must not be a streaming response.
- The method must include a google.longrunning.operation_info annotation, which must define both response and metadata types.
  - The response and metadata types must be defined in the file where the RPC appears, or a file imported by that file.
  - If the response and metadata types are defined in another package, the fully-qualified message name must be used.
  - The response type should not be google.protobuf.Empty (except for Delete methods), unless it is certain that response data will never be needed. If response data might be added in the future, define an empty message for the RPC response and use that.
  - The metadata type is used to provide information such as progress, partial failures, and similar information on each GetOperation call. The metadata type should not be google.protobuf.Empty, unless it is certain that metadata will never be needed. If metadata might be added in the future, define an empty message for the RPC metadata and use that.
- APIs with messages that return Operation must implement the Operations service. Individual APIs must not define their own interfaces for long-running operations to avoid non-uniformity.
- If an RPC supports a validate-only mode, the response to a validation request must be one of the following:
  - A successful response with an Operation which is already complete, with the done field set to true, and a valid (but potentially empty) response message in the response field, wrapped in a google.protobuf.Any message. The name field may be empty, to avoid the service having to maintain state for successful validation.
  - An immediate error response (typically "bad request").
  - An Operation with the done field set to false, to indicate long-running validation. In this case, the name field must be set, to allow clients to poll the long-running validation operation until it has completed. Successful validation must eventually be represented by an operation with done=true and a valid (but potentially empty) wrapped response message in the response field. Unsuccessful validation must eventually be represented by an operation with done=true and the error details provided in the error field.
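A client might tell the validate-only outcomes apart roughly as follows. This is a sketch under stated assumptions: `Operation` is a simplified stand-in for the real message, and `validation_outcome` is a hypothetical helper, not part of any library. An immediate error response never reaches this function, since it surfaces as a failed RPC rather than as an operation.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Operation:
    """Simplified stand-in for google.longrunning.Operation (illustration only)."""
    name: str = ""
    done: bool = False
    response: Optional[Any] = None
    error: Optional[str] = None  # the real message carries a google.rpc.Status


def validation_outcome(op: Operation) -> str:
    """Classify the operation returned by a validate-only request."""
    if not op.done:
        # Long-running validation: the name is required so the client can poll.
        if not op.name:
            raise ValueError("a pending validation operation must have a name")
        return "pending"
    if op.error is not None:
        return "invalid"
    # Done with no error: validation succeeded. The name may be empty here.
    return "valid"
```

Because the already-complete case goes through the same `done` check as any other operation, the client needs no validation-specific branch to handle it.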
Note: User expectations can vary on what is considered "a significant amount of time" depending on what work is being done. A good rule of thumb is 10 seconds.
Standard methods
APIs may return an Operation from the Create, Update, or Delete standard methods if appropriate. In this case, the response type in the operation_info annotation must be the standard and expected response type for that standard method.
When creating or deleting a resource with a long-running operation, the resource should be included in List and Get calls; however, the resource should indicate that it is not usable, generally with a state enum.
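A state enum of this kind could look like the sketch below. The value names `CREATING`, `ACTIVE`, and `DELETING`, the `Book` fields, and the helpers `is_usable` and `list_books` are all assumptions made for illustration; an API would define its own states.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class State(Enum):
    """Hypothetical lifecycle states for a resource."""
    STATE_UNSPECIFIED = 0
    CREATING = 1
    ACTIVE = 2
    DELETING = 3


@dataclass
class Book:
    name: str
    state: State = State.STATE_UNSPECIFIED


def is_usable(book: Book) -> bool:
    """Only fully created, not-yet-deleting resources are usable."""
    return book.state is State.ACTIVE


def list_books(books: Iterable[Book]) -> List[Book]:
    """Return every book, including those still being created or deleted."""
    return list(books)
```

The listing does not filter on state; callers inspect `state` to decide whether a given resource is ready.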
Parallel operations
A resource may accept multiple operations that will work on it in parallel, but is not obligated to do so:
- Resources that accept multiple parallel operations may place them in a queue rather than work on the operations simultaneously.
- Resources that do not permit multiple operations in parallel (denying any new operation until the one that is in progress finishes) must return ABORTED if a user attempts a parallel operation, and include an error message explaining the situation.
- Resources with declarative-friendly APIs may allow subsequent updates to preempt existing operations. In this case, the latest update begins processing and previous operations are marked as ABORTED with an error message explaining the situation.
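The "deny parallel operations" option can be sketched as a per-resource guard. This is an in-memory illustration only: `AbortedError` stands in for an RPC error carrying the ABORTED status code, and `ResourceOperations` with its `start` and `finish` methods is a hypothetical name, not a real API.

```python
from typing import Dict


class AbortedError(Exception):
    """Stands in for an RPC error with status code ABORTED."""


class ResourceOperations:
    """Tracks at most one in-progress operation per resource."""

    def __init__(self) -> None:
        self._in_progress: Dict[str, str] = {}  # resource name -> operation name

    def start(self, resource: str, operation: str) -> str:
        current = self._in_progress.get(resource)
        if current is not None:
            # Explain the situation in the error message, as the guidance requires.
            raise AbortedError(
                f"{resource} is busy with {current}; retry after it finishes"
            )
        self._in_progress[resource] = operation
        return operation

    def finish(self, resource: str) -> None:
        self._in_progress.pop(resource, None)
```

A queueing or preempting service would replace the `raise` with enqueue or cancel logic; the guard above covers only the strict single-operation case.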
Expiration
APIs may allow their operation resources to expire after sufficient time has elapsed after the operation completed.
Note: A good rule of thumb for operation expiry is 30 days.
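An expiry check based on that rule of thumb might be written as below. The names `OPERATION_TTL` and `is_expired` are hypothetical, and the choice to treat an operation as expired exactly at the 30-day mark is an assumption of this sketch.

```python
from datetime import datetime, timedelta
from typing import Optional

# Rule-of-thumb retention period, counted from completion.
OPERATION_TTL = timedelta(days=30)


def is_expired(completed_at: Optional[datetime],
               now: datetime,
               ttl: timedelta = OPERATION_TTL) -> bool:
    """Return True if a completed operation is old enough to be removed."""
    if completed_at is None:
        # Still running: the expiry clock starts only at completion.
        return False
    return now - completed_at >= ttl
```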
Errors
Errors that prevent a long-running operation from starting must return an error response (AIP-193), similar to any other method.
Errors that occur over the course of an operation may be placed in the metadata message. The errors themselves must still be represented with a google.rpc.Status object.
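Errors collected during an operation could be carried in the metadata along these lines. `Status` here is a simplified stand-in for google.rpc.Status, and the `OperationMetadata` fields `progress_percent` and `partial_failures`, along with the helper `record_failure`, are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List

NOT_FOUND = 5  # numeric value of NOT_FOUND in google.rpc.Code


@dataclass
class Status:
    """Simplified stand-in for google.rpc.Status."""
    code: int
    message: str


@dataclass
class OperationMetadata:
    """Hypothetical metadata message for an operation."""
    progress_percent: int = 0
    partial_failures: List[Status] = field(default_factory=list)


def record_failure(metadata: OperationMetadata, code: int, message: str) -> None:
    """Append a failure to the metadata without ending the operation."""
    metadata.partial_failures.append(Status(code=code, message=message))
```

Each `GetOperation` call would then return the metadata with the failures accumulated so far, while the operation itself keeps running.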
Backwards compatibility
Changing either the response_type or metadata_type of a long-running operation is a breaking change.
Rationale
Validate-only behavior
The guidance for validate-only responses comes from a tension between clients, which benefit from "fully formed" operations that can be treated uniformly, and servers, which don't wish to maintain additional state for trivial operations. It seems counterintuitive that just validating a request should generate more state, but a full operation response that can be fetched later would either require that or "special" singleton operation IDs. The guidance provided is a compromise: by returning a "done" operation, clients can use existing logic to check that the operation has completed successfully (and therefore doesn't need to be fetched for an updated status), but servers don't need to maintain any additional state.
Changelog
- 2024-04-23: Provided pattern for validation on RPCs returning long-running operations.
- 2022-05-31: Added compatibility section.
- 2020-08-24: Clarified that responses are not streaming responses.
- 2020-06-24: Added guidance for parallel operations.
- 2020-03-20: Clarified that both response_type and metadata_type are required.
- 2019-11-22: Added a short explanation of what metadata_type is for.
- 2019-09-23: Added guidance on errors.
- 2019-08-23: Added guidance about fully-qualified message names when the message name is in another package.
- 2019-08-01: Changed the examples from "shelves" to "publishers", to present a better example of resource ownership.