AIP-217

Unreachable resources

Occasionally, a user may ask for a list of resources, and some of the resources in the list are temporarily unavailable. For example, a user may ask to list resources across multiple parent locations, but one of those locations is temporarily unreachable. In this situation, it is still desirable to provide the user with all the available resources, while indicating that something is missing.

Guidance

If a method to retrieve data is capable of partially failing due to one or more resources being temporarily unreachable, the response message must include a field to indicate this:

message ListBooksResponse {
  // The books matching the request.
  repeated Book books = 1;

  // The next page token, if there are more books matching the
  // request.
  string next_page_token = 2;

  // Unreachable resources.
  repeated string unreachable = 3;
}

  • The field must be a repeated string, and should be named unreachable.
  • The field must be set to the names of the resources which are the cause of the issue, such as the parent or individual resources that could not be reached. The objects listed as unreachable may be parents (or higher ancestors) rather than the individual resources being requested. For example, if a location is unreachable, the location is listed.
    • The response must not provide any other information about the issue, such as error details or codes. To discover what the underlying issue is, users should send a more specific request.
    • The service must provide a way for the user to get an error with additional information, and should allow the user to repeat the original call with more restrictive parameters in order to do so.
    • The resource names provided in this field may be heterogeneous. The field should document what potential resources may be provided in this field, and note that it might expand later.

Important: If a single unreachable location or resource prevents returning any data by definition (for example, a list request for a single publisher where that publisher is unreachable), the service must fail the entire request with an error.
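
As an illustration of the guidance above, here is a minimal Python sketch of a server-side list handler that reads across locations. The in-memory `BOOKS_BY_LOCATION` store, the `UNREACHABLE_LOCATIONS` set, the `Unavailable` exception, and the `fetch_books` and `list_books` helpers are all hypothetical names invented for this example, and treating a parent ending in `/locations/-` as the cross-collection case is likewise an assumption.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Hypothetical error raised when a location cannot be reached."""


@dataclass
class ListBooksResponse:
    books: list[str] = field(default_factory=list)
    unreachable: list[str] = field(default_factory=list)


# Hypothetical backing store: location name -> names of the books it holds.
BOOKS_BY_LOCATION = {
    "projects/example/locations/us-east1": [
        "projects/example/locations/us-east1/books/a",
    ],
    "projects/example/locations/eu-west1": [
        "projects/example/locations/eu-west1/books/b",
    ],
}
# Hypothetical set of locations that are temporarily unreachable.
UNREACHABLE_LOCATIONS = {"projects/example/locations/eu-west1"}


def fetch_books(location: str) -> list[str]:
    if location in UNREACHABLE_LOCATIONS:
        raise Unavailable(location)
    return BOOKS_BY_LOCATION[location]


def list_books(parent: str) -> ListBooksResponse:
    # A request scoped to a single location cannot return partial data,
    # so an unreachable location fails the entire request.
    if not parent.endswith("/locations/-"):
        return ListBooksResponse(books=list(fetch_books(parent)))

    response = ListBooksResponse()
    for location in sorted(BOOKS_BY_LOCATION):
        try:
            response.books.extend(fetch_books(location))
        except Unavailable:
            # Report only the name of the unreachable parent, without
            # any error code or details.
            response.unreachable.append(location)
    return response
```

In this sketch, a caller who sees a location in `unreachable` can repeat the call with that specific location as the parent, which surfaces the underlying error instead of a partial result.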

Pagination

When paginating over a list, it is likely that the service will not know that there are unreachable parents or resources initially. Further, parents may alternate between being available and unavailable in unpredictable ways throughout the process of listing all the requested resources.

These facts lead to the following guidance:

  • The response must provide any outstanding unreachable locations or resources in the unreachable field on pages following the final page that contains a resource.
    • The response should not include both requested data and unreachable resources on the same page.
      • For example, if there are two pages of books and one unavailable publisher, there should be three pages total: first the two pages of books, and then a final page with no books and the unavailable publisher.
    • If the number of unreachable resources to list is very large, the response should honor the page_size field in the same way as for resources. In this case, all pages with requested information should precede all pages with unavailable resources or locations.
    • The final page's unreachable field must only include resources or parents that were partially provided (or missing completely) across the entirety of the pagination process.
      • For example, if a parent or resource was unreachable at the beginning of pagination but became reachable again, and the entire set of previously unreachable data was provided to the user on any page, the unreachable field must not include the intermittently unreachable parent or resource.
      • On the other hand, if only some of the resources for a given parent were provided during such an incident, the parent or resource must be included in the unreachable field.
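
The page ordering described above can be sketched as follows. This is a simplified illustration, not a real pagination implementation: it builds every page at once instead of using page tokens, and the `Page` type, the `chunk` and `build_pages` helpers, and the `fully_served` bookkeeping map are hypothetical names assumed for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """One page of a hypothetical ListBooks response."""
    books: list[str] = field(default_factory=list)
    unreachable: list[str] = field(default_factory=list)


def chunk(items: list[str], size: int) -> list[list[str]]:
    """Split items into consecutive groups of at most `size` entries."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_pages(
    books: list[str], fully_served: dict[str, bool], page_size: int
) -> list[Page]:
    # Pages with requested data come first and carry no unreachable names.
    pages = [Page(books=group) for group in chunk(books, page_size)]
    # Only parents that were never fully served are reported, so a parent
    # that recovered and was served in full is left out.
    unreachable = sorted(
        name for name, done in fully_served.items() if not done
    )
    # Unreachable names follow on their own pages, honoring page_size.
    pages.extend(
        Page(unreachable=group) for group in chunk(unreachable, page_size)
    )
    return pages


pages = build_pages(
    books=["books/1", "books/2", "books/3", "books/4"],
    fully_served={
        "publishers/stable": True,
        "publishers/recovered": True,  # unreachable at first, later served in full
        "publishers/down": False,      # missing or only partially served
    },
    page_size=2,
)
```

With two pages of books and one publisher that never recovered, the sketch produces three pages, the last of which has no books and lists only that publisher.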

Adopting partial success

For an existing API whose default behavior differs from the aforementioned guidance (that is, the API call returns an error status instead of a partial result) to adopt the unreachable pattern, the API must do the following:

  • The default behavior must be retained to avoid incompatible behavioral changes.
    • For example, if the default behavior is to return an error if any location is unreachable, that default behavior must be retained.
  • The request message must have a bool return_partial_success field.
  • The response message must have the standard repeated string unreachable field.
  • The two aforementioned fields must be added simultaneously.

When the bool return_partial_success field is set to true in a request, the API must behave as described in the aforementioned guidance with regard to populating the repeated string unreachable response field.

message ListBooksRequest {
  // Standard List request fields...

  // Setting this field to `true` will opt the request into returning the
  // resources that are reachable, and into including the names of those that
  // were unreachable in the [ListBooksResponse.unreachable] field. This can
  // only be `true` when reading across collections, e.g. when `parent` is set
  // to `"projects/example/locations/-"`.
  bool return_partial_success = 4;
}

message ListBooksResponse {
  // Standard List Response fields...

  // Unreachable resources. Populated when the request opts into
  // `return_partial_success` and reads across collections, e.g. when
  // attempting to list all resources across all supported locations.
  repeated string unreachable = 3;
}
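
The opt-in behavior can be illustrated with a short Python sketch. It assumes an existing API whose default is to fail when any location is unreachable; the `Unavailable` exception and the `list_books` function with its `books_by_location` and `unreachable_locations` parameters are hypothetical names used only for illustration.

```python
class Unavailable(Exception):
    """Hypothetical stand-in for an UNAVAILABLE error status."""


def list_books(
    books_by_location: dict[str, list[str]],
    unreachable_locations: set[str],
    return_partial_success: bool = False,
) -> tuple[list[str], list[str]]:
    """Return (books, unreachable) for a list call across locations."""
    # Default behavior is retained: without the opt-in, any unreachable
    # location fails the whole request, exactly as before the change.
    if unreachable_locations and not return_partial_success:
        raise Unavailable(f"unreachable: {sorted(unreachable_locations)}")

    # With the opt-in, return what is reachable and name what is not.
    books = [
        book
        for location in sorted(books_by_location)
        if location not in unreachable_locations
        for book in books_by_location[location]
    ]
    return books, sorted(unreachable_locations)
```

Because the flag defaults to false, callers that never set it keep seeing the error they already handle, while callers that set it receive both the reachable books and the names of the unreachable locations.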

Partial success granularity

If the bool return_partial_success field is set to true in a request that is scoped beyond the granularity at which the API can reasonably report unreachable resources, the API should return an INVALID_ARGUMENT error with details explaining the issue. For example, if the API only supports return_partial_success when reading across collections (see AIP-159), it returns an INVALID_ARGUMENT error when given a request scoped to a specific parent resource collection. The supported granularity must be documented on the return_partial_success field.
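
A request validator for this rule might look like the following Python sketch. It assumes an API that supports return_partial_success only when the parent ends in the `-` wildcard for locations; the `InvalidArgument` exception and the `validate_partial_success` helper are hypothetical names, not part of any real library.

```python
class InvalidArgument(ValueError):
    """Hypothetical stand-in for an INVALID_ARGUMENT error status."""


def validate_partial_success(parent: str, return_partial_success: bool) -> None:
    """Reject the opt-in when the request scope is too narrow to honor it."""
    # Assumption: only cross-collection reads (wildcard location) are
    # supported for partial success in this example API.
    reads_across_collections = parent.endswith("/locations/-")
    if return_partial_success and not reads_across_collections:
        raise InvalidArgument(
            "return_partial_success is only supported when reading across "
            "collections, for example parent='projects/example/locations/-'"
        )
```

Failing early here means the caller never receives an empty unreachable field that could be misread as proof that everything was retrieved.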

Rationale

Using a request field to opt in

Introducing a new request field as a means of opting into the partial success behavior is the best way to communicate user intent while keeping the default behavior backwards compatible. The alternative, changing the default behavior with the introduction of the unreachable response field, is a backwards-incompatible change. Users who previously expected failure when any resource was unreachable would assume that a successful response means all resources are accounted for in the response.

Introducing fields simultaneously

Introducing the request and response fields simultaneously prevents the invalid intermediate state that arises from adding only one or the other. If only unreachable is added, then users could assume that an empty value means all resources were returned when that may not be true. If only return_partial_success is added, then the user has no means of knowing which resources were unreachable.

Partial success granularity limitations

At a certain level of request scope granularity, an API is simply unable to enumerate the resources that are unreachable. For example, global-only APIs may be unable to provide granularity at a localized collection level. In such a case, preemptively returning an error when return_partial_success=true protects the user from the risk of the alternative: expecting unreachable resources to be reported if there was an issue, not getting any, and thus falsely assuming everything was retrieved. This aligns with guidance herein that suggests preemptively failing requests that cannot be fulfilled.

Further reading

  • For listing across collections, see AIP-159.

Changelog

  • 2024-07-19: Add guidance for brownfield adoption of partial success.