AIP-4221

Client-side retry

Many APIs have error modes that a client should retry. These error modes vary across APIs, leaving users to hand-write retry logic around client libraries based on available documentation. Client libraries have the opportunity to help by implementing automatic client-side retries of well-known error modes.

Guidance

Client libraries should provide automatic, client-side retry logic. Client libraries with automatic client-side retry logic should provide a mechanism for users to specify error codes to be retried and delays for those retries.

Client libraries for API systems that support remotely-resolved client retry configuration should respect the remotely-resolved configuration. However, any retry configuration that the user supplies explicitly must take precedence over it.

Client library generators implementing this feature must accept a retry configuration. This retry configuration may be supplied via a protoc plugin option. In the absence of a given retry configuration, client library generators should not generate a default retry configuration.

Retry implementation

Client libraries should make client-side retry transparent to the user. The user should not have to opt in to client-side retry explicitly, but the user must have a way to disable client-side retry altogether.

Retry configuration mechanisms

Client libraries should surface a mechanism through which users may control the client-side retry configuration, including disabling client-side retry altogether.

For example, Go client libraries for gRPC services can supply an option, WithDisableRetry, at client initialization to disable the use of the automatic client-side retry logic.

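// creds is assumed to be a credentials.TransportCredentials value created elsewhere.
opts := []grpc.DialOption{
  // Disable automatic client-side retry, even if a service config enables it.
  grpc.WithDisableRetry(),
  grpc.WithTransportCredentials(creds),
}

cc, err := grpc.Dial("my.api.net:443", opts...)
if err != nil {
  // ...
}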

Remotely-resolved client configuration

Some API systems have built-in mechanisms for clients to retrieve a remotely-defined configuration that includes client-side retry configuration.

For example, gRPC supports the resolution of client configuration that includes configuration for automatic client-side retry.

Client libraries should respect the remotely-resolved configuration, except when a user overrides it via the aforementioned client library retry configuration mechanisms.
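
The precedence described above can be sketched as a small selection function. This is an illustrative sketch only: RetryPolicy and EffectivePolicy are hypothetical names, and the policy type is deliberately simplified.

```go
package main

// RetryPolicy is a hypothetical, simplified retry configuration.
type RetryPolicy struct {
	MaxAttempts    int
	RetryableCodes []string
}

// EffectivePolicy picks the retry policy a client should apply.
// A nil result means that the client does not retry.
func EffectivePolicy(user, remote *RetryPolicy, userDisabled bool) *RetryPolicy {
	switch {
	case userDisabled:
		// The user turned client-side retry off altogether.
		return nil
	case user != nil:
		// An explicit user configuration overrides the remote one.
		return user
	default:
		// Otherwise respect the remotely-resolved configuration, if any.
		return remote
	}
}
```

With no user input, the function returns whatever was resolved remotely, which may itself be nil when the API system supplied nothing.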

Client library generator retry configuration

Client library generators that implement client-side retry must accept a retry configuration. This is to enable API producers to supply a retry configuration that best suits their service.

For example, gRPC client-side retry is configured with a RetryPolicy within a gRPC Service Config. The following configuration applies the RetryPolicy to all methods in the google.example.library.v1.LibraryService service except those named explicitly in the second entry, which get no RetryPolicy.

{
  "methodConfig": [{
    "name": [{ "service": "google.example.library.v1.LibraryService" }],
    "waitForReady": true,
    "timeout": "60s",
    "retryPolicy": {
      "maxAttempts": 3,
      "initialBackoff": "0.01s",
      "maxBackoff": "60s",
      "backoffMultiplier": 1.3,
      "retryableStatusCodes": ["UNAVAILABLE"]
    }
  },
  {
    "name": [
      { "service": "google.example.library.v1.LibraryService", "method": "CreatePublisher" },
      { "service": "google.example.library.v1.LibraryService", "method": "DeletePublisher" },
      { "service": "google.example.library.v1.LibraryService", "method": "CreateBook" },
      { "service": "google.example.library.v1.LibraryService", "method": "DeleteBook" },
      { "service": "google.example.library.v1.LibraryService", "method": "UpdateBook" },
      { "service": "google.example.library.v1.LibraryService", "method": "MoveBook" }
    ],
    "waitForReady": true,
    "timeout": "60s"
  }]
}

The retry configuration may be supplied via a protoc plugin option. For example, a generator could accept the file path of the configuration with an option like:

  --{plugin_name}_opt="retry-config=/path/to/config.file"
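
protoc hands plugin options to the generator as a single comma-separated parameter string. A generator written in Go might extract the path as in the following sketch. RetryConfigPath is a hypothetical helper, and the option name retry-config is taken from the example above rather than from any particular generator.

```go
package main

import "strings"

// RetryConfigPath extracts the value of the "retry-config" option from a
// protoc plugin parameter string such as "retry-config=/path/to/config.file".
// The boolean result is false when the option is absent.
func RetryConfigPath(parameter string) (string, bool) {
	for _, option := range strings.Split(parameter, ",") {
		key, value, found := strings.Cut(option, "=")
		if found && strings.TrimSpace(key) == "retry-config" {
			return value, true
		}
	}
	return "", false
}
```

A false result corresponds to the case where no retry configuration was given to the generator.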

In the absence of a retry configuration, a generator should not generate a "default" retry configuration. This results in a generated client library that does not retry anything unless configured to do so by the user.

Further reading

  • For which error codes to retry, see AIP-194.

Changelog

  • 2020-09-23: Changed the examples from "shelves" to "publishers", to present a better example of resource ownership.