AIP-143

Standardized codes

Many common concepts, such as spoken languages, countries, currency, and so on, have common codes (usually formalized by the International Organization for Standardization) that are used in data communication and processing. These codes address the issue that there are often different ways to express the same concept in written language (for example, "United States" and "USA", or "EspaƱol" and "Spanish").

Guidance

For concepts where a standardized code exists and is in common use, fields representing these concepts should use the standardized code for both input and output.

// A message representing a book.
message Book {
  // Other fields...

  // The IETF BCP-47 language code representing the language in which
  // the book was originally written.
  // https://en.wikipedia.org/wiki/IETF_language_tag
  string language_code = 99;
}
  • Fields representing standardized concepts must use the appropriate data type for the standard code (usually string).
    • Fields representing standardized concepts should not use enums, even if they only allow a small subset of possible values. Using enums in this situation often leads to frustrating lookup tables when using multiple APIs together.
    • Fields representing standardized concepts must indicate which standard they follow, preferably with a link (either to the standard itself, the Wikipedia description, or something similar).
  • The field name should end in _code or _type unless the concept has an obviously clearer suffix.
  • When accepting values provided by users, validation should be case-insensitive unless this would introduce ambiguity (for example, accept both en-gb and en-GB). When providing values to users, APIs should use the canonical case (in the example above, en-GB).

Content types

Fields representing a content or media type must use IANA media types. For legacy reasons, the field should be called mime_type.

Countries and regions

Fields representing individual countries or nations must use the Unicode CLDR region codes (list), such as US or CH, and the field must be called region_code.

Important: We use region_code and not country_code to include regions distinct from any country, and avoid political disputes over whether or not some regions are countries.

Currency

Fields representing currency must use ISO-4217 currency codes, such as USD or CHF, and the field must be called currency_code.

Note: For representing an amount of money in a particular currency, rather than the currency code itself, use google.protobuf.Money.

Language

Fields representing spoken languages must use IETF BCP-47 language codes (list), such as en-US or de-CH, and the field must be called language_code.

Time zones

Fields representing a time zone should use the IANA TZ codes, and the field must be called time_zone.

Fields also may represent a UTC offset rather than a time zone (note that these are subtly different). In this case, the field must use the ISO-8601 format to represent this, and the field must be named utc_offset.

Changelog

  • 2020-05-12: Replaced country_code guidance with region_code, correcting an original error.