AIP-143
Standardized codes
Many common concepts, such as spoken languages, countries, currency, and so on, have common codes (usually formalized by the International Organization for Standardization) that are used in data communication and processing. These codes address the issue that there are often different ways to express the same concept in written language (for example, "United States" and "USA", or "EspaƱol" and "Spanish").
Guidance
For concepts where a standardized code exists and is in common use, fields representing these concepts should use the standardized code for both input and output.
// A message representing a book.
message Book {
// Other fields...
// The IETF BCP-47 language code representing the language in which
// the book was originally written.
// https://en.wikipedia.org/wiki/IETF_language_tag
string language_code = 99;
}
- Fields representing standardized concepts must use the appropriate data
type for the standard code (usually
string
).- Fields representing standardized concepts should not use enums, even if they only allow a small subset of possible values. Using enums in this situation often leads to frustrating lookup tables when using multiple APIs together.
- Fields representing standardized concepts must indicate which standard they follow, preferably with a link (either to the standard itself, the Wikipedia description, or something similar).
- The field name should end in
_code
or_type
unless the concept has an obviously clearer suffix. - When accepting values provided by users, validation should be
case-insensitive unless this would introduce ambiguity (for example, accept
both
en-gb
anden-GB
). When providing values to users, APIs should use the canonical case (in the example above,en-GB
).
Content types
Fields representing a content or media type must use IANA media types.
For legacy reasons, the field should be called mime_type
.
Countries and regions
Fields representing individual countries or nations must use the Unicode
CLDR region codes (list), such as US
or CH
, and the field
must be called region_code
.
Important: Please read the rationale for this requirement.
Currency
Fields representing currency must use ISO-4217 currency codes,
such as USD
or CHF
, and the field must be called currency_code
.
Note: For representing an amount of money in a particular currency, rather
than the currency code itself, use google.type.Money
.
Language
Fields representing spoken languages must use IETF BCP-47 language
codes (list), such as en-US
or de-CH
, and the field must
be called language_code
.
Time zones
Fields representing a time zone should use the IANA TZ codes, and the
field must be called time_zone
.
Fields also may represent a UTC offset rather than a time zone (note that
these are subtly different). In this case, the field must use the ISO-8601
format to represent this, and the field must be named utc_offset
.
Rationale
Country/region field naming
The use of region_code
instead of country_code
is critical to being able to
convey regions that are distinct from any country and to avoid any political
disputes associated with said region regarding their soverignty or affiliation.
Google and many other companies are supporters of Unicode CLDR and standardize
their product internationalization efforts on Unicode CLDR, and APIs are no
different here. Furthermore, many of the values supported by Unicode CLDR are
not countries on their own, so using a more generic name is actually more
compatible with the specification.
Changelog
- 2024-12-03: Strengthen rationale of country/region field naming
- 2024-11-12: Change country/region code list to CLDR list from IANA list
- 2020-05-12: Replaced
country_code
guidance withregion_code
, correcting an original error.