AIP-236
Policy preview
A policy is a resource that provides rules that admit or deny access to other resources. Generally, the outcome of a policy can be evaluated to a specific set of outcomes.
Changes to policies without proper validation may have unintended consequences that can severely impact a customer’s overall infrastructure setup. To safely update resources, it is beneficial to test these changes via policy rollout APIs.
Preview is a rollout safety mechanism for policy resources, which gives the customer the ability to validate the effect of their proposed changes against production traffic prior to the changes going live. The result of the policy evaluation against traffic is logged in order to give the customer the data required to test the correctness of the change.
Firewall policies exemplify a case that is suitable for previewing. A new configuration can be evaluated against traffic to observe which IPs would be allowed or denied. This gives the customer the data to guide a decision on whether to promote the proposed changes to live.
The expected flow for previewing a policy is as follows:
- The user creates an experiment containing a new policy configuration intended to replace the live policy.
- The user uses the "startPreview" method to start generating logs which compare the live and experiment policy evaluations against live traffic.
- The user inspects the logs to determine whether the experiment has the intended result.
- The user uses the "commit" method to promote the experiment to live.
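As a rough illustration of this flow, the sketch below lists the REST calls a client might issue. It assumes the `POST /v1/{name}:verb` custom-method URL shape from AIP-136, a hypothetical `policyExperimentId` query parameter on create, and a placeholder `rules` field standing in for the real policy schema. Step 3 (inspecting logs) happens in the logging product, so it is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class RestCall:
    """One HTTP request in the preview workflow (illustrative only)."""
    verb: str
    path: str
    body: dict = field(default_factory=dict)


def preview_workflow(policy_name: str, experiment_id: str, rules: list,
                     experiment_etag: str) -> list:
    """Return the ordered calls for create -> startPreview -> commit."""
    experiment_name = f"{policy_name}/experiments/{experiment_id}"
    return [
        # Step 1: create an experiment holding the proposed configuration.
        # `policyExperimentId` and `rules` are hypothetical names.
        RestCall("POST",
                 f"/v1/{policy_name}/experiments?policyExperimentId={experiment_id}",
                 {"policy": {"name": policy_name, "rules": rules}}),
        # Step 2: start logging live-vs-experiment evaluations.
        RestCall("POST", f"/v1/{experiment_name}:startPreview"),
        # Step 3 (inspecting logs) happens outside this API.
        # Step 4: promote the exact experiment version that was inspected.
        RestCall("POST", f"/v1/{experiment_name}:commit", {"etag": experiment_etag}),
    ]
```

Passing the experiment's `etag` to commit ties the promotion to the version whose logs the user actually reviewed.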
Guidance
Non-goals
This proposal is for a safety mechanism for policy rollouts only. Safe rollouts for non-policy resources are not in scope.
Experiments
A new configuration of a policy to be previewed is stored as a nested collection under the policy. These nested collections are known as experiments.
A hypothetical policy resource called `Policy` is used throughout. It has the following resource name pattern:
projects/{project}/locations/{location}/policies/{policy}
The experimental versions of the resource, used for previewing or other safe rollout practices, are represented as a nested collection under `Policy` using a new resource type. The resource type must follow the naming convention `RegularResourceTypeExperiment`.
The following pattern is used for the experiment collection:
projects/{project}/locations/{location}/policies/{policy}/experiments/{experiment}
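A small sketch of working with this pattern: building an experiment name and recovering the live policy that owns it. It assumes each segment value is non-empty and contains no slash; the helper names are hypothetical.

```python
import re

# Matches the experiment collection pattern above.
_EXPERIMENT_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/policies/(?P<policy>[^/]+)/experiments/(?P<experiment>[^/]+)$"
)


def experiment_name(project: str, location: str, policy: str, experiment: str) -> str:
    """Build the full resource name of a PolicyExperiment."""
    return (f"projects/{project}/locations/{location}"
            f"/policies/{policy}/experiments/{experiment}")


def live_policy_name(name: str) -> str:
    """Return the name of the live Policy that owns the given experiment.

    Raises ValueError if `name` does not match the experiment pattern.
    """
    match = _EXPERIMENT_NAME.match(name)
    if match is None:
        raise ValueError(f"not a PolicyExperiment name: {name!r}")
    return (f"projects/{match['project']}/locations/{match['location']}"
            f"/policies/{match['policy']}")
```

Rebuilding the parent from the captured groups, rather than string-splitting, stays correct even when a policy ID happens to be `experiments`.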
A proto used to represent an experiment must contain the following:
1. The required top-level fields for a resource, like `name` and `etag`
2. The policy message being tested
3. A `preview_metadata` field, which contains metadata specific to previewing experiments of that resource type.
```proto
message PolicyExperiment {
  // google.api.resource, name, and other annotations and fields

  // The policy experiment. This Policy will be used to preview the effects of
  // the change but will not affect live traffic.
  Policy policy = 2;

  // The metadata associated with this policy experiment.
  PolicyPreviewMetadata preview_metadata = 3
      [(google.api.field_behavior) = OUTPUT_ONLY];

  // Allows clients to store small amounts of arbitrary data.
  map<string, string> annotations = 4;
}
```
- The experiment proto must have a top-level field with the same type as the live policy.
  - It must be named after the live resource type. For example, if the experiment is for `FirewallPolicy`, then this field must be named `firewall_policy`.
  - The name inside the embedded `policy` message must be the name of the live policy.
- When the user is ready to promote an experiment, they must copy the `policy` message into the live policy and delete the experiment. This can be done manually or via a "commit" custom method.
- A product may support multiple experiments concurrently being previewed for a single live policy.
- Each experiment must generate logs in which each entry is preceded by `log_prefix`, so that the user can compare the results of the experiment with the behavior of the live policy.
- The number of experimental configurations for a given live policy may be capped at a certain number, and the cap must be documented.
- Cascading deletes must occur: if the live policy is deleted, all experiments must also be deleted.
- `map<string, string> annotations` must allow clients to store small amounts of arbitrary data.
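The collection rules above (embedded name equals the live name, a documented cap, cascading deletes) can be sketched with an in-memory store. The cap value and the single `ExperimentError` type are hypothetical; a real service would return canonical error codes. This sketch also assumes a system that requires a live parent policy.

```python
from dataclasses import dataclass, field

# Hypothetical cap. Each product picks its own limit, provided it is documented.
MAX_EXPERIMENTS_PER_POLICY = 3


class ExperimentError(Exception):
    """Hypothetical error type standing in for canonical error codes."""


@dataclass
class PolicyStore:
    """In-memory sketch of live policies and the experiments nested under them."""

    policies: dict = field(default_factory=dict)     # policy name -> policy dict
    experiments: dict = field(default_factory=dict)  # policy name -> {experiment name -> experiment}

    def create_experiment(self, policy_name: str, experiment_id: str, policy: dict) -> str:
        if policy_name not in self.policies:
            raise ExperimentError(f"live policy {policy_name} does not exist")
        if policy.get("name") != policy_name:
            raise ExperimentError("embedded policy must carry the live policy's name")
        children = self.experiments.setdefault(policy_name, {})
        if len(children) >= MAX_EXPERIMENTS_PER_POLICY:
            raise ExperimentError("documented experiment cap reached")
        name = f"{policy_name}/experiments/{experiment_id}"
        if name in children:
            raise ExperimentError(f"{name} already exists")
        children[name] = {"name": name, "policy": dict(policy), "annotations": {}}
        return name

    def delete_policy(self, policy_name: str) -> None:
        # Cascading delete: removing the live policy removes every experiment under it.
        del self.policies[policy_name]
        self.experiments.pop(policy_name, None)
```

A product that supports only one experiment per live policy would simply set the cap to 1.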
Metadata
`preview_metadata` tracks all metadata related to previewing the experiment. The message must follow the naming convention `RegularResourceTypePreviewMetadata`.
This is so the proto can be defined uniquely for each resource type with experiments in the same service.
```proto
message PolicyPreviewMetadata {
  // Possible values of the state of previewing the experiment.
  enum State {
    // Default value. This value is unused.
    STATE_UNSPECIFIED = 0;

    // The experiment is actively previewing.
    ACTIVE = 1;

    // The previewing of the experiment has been stopped.
    SUSPENDED = 2;
  }

  // The state of previewing the experiment.
  State state = 1;

  // An identifying string common to all logs generated when previewing the
  // experiment. Searching all logs for this string will isolate the results.
  string log_prefix = 2;

  // The most recent time at which this experiment started previewing.
  google.protobuf.Timestamp start_time = 3;

  // The most recent time at which this experiment stopped previewing.
  google.protobuf.Timestamp stop_time = 4;
}
```
- `PolicyPreviewMetadata` must have the fields defined in the proto above.
  - It may have additional fields if the service or resource requires them.
- When an experiment is first created, `preview_metadata` must be absent.
  - It is present on the experiment once the "startPreview" method is used.
- All `preview_metadata` fields must be output only.
- `state` changes between `ACTIVE` and `SUSPENDED` when previewing is started or stopped. This happens when the "startPreview" or "stopPreview" custom methods are invoked, respectively.
- The first time the "startPreview" custom method is used, the system must create `preview_metadata` and do the following:
  - It must set the `state` to `ACTIVE`.
  - It must populate `start_time` with the current time.
    - `start_time` must be updated every time `state` is changed to `ACTIVE`.
  - It must set a system-generated `log_prefix` string, which is a predefined constant hard-coded by the system developers.
    - The same value is used for previewing all experiments of the given resource type. For example, "FirewallPolicyPreviewLog" for `FirewallPolicy`.
- When the "stopPreview" custom method is used, the system must do the following:
  - It must set the `state` to `SUSPENDED`.
  - It must populate `stop_time` with the current time.
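The metadata rules above amount to a small state machine. The sketch below models them with plain dataclasses instead of generated proto classes; `LOG_PREFIX` is a hypothetical constant, `now` is passed in to stand for the system clock, and rejecting "stopPreview" on a never-previewed experiment is an assumption the AIP does not specify.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional

# Hypothetical value; the AIP only requires one fixed constant per resource type.
LOG_PREFIX = "PolicyPreviewLog"


class State(Enum):
    STATE_UNSPECIFIED = 0
    ACTIVE = 1
    SUSPENDED = 2


@dataclass
class PreviewMetadata:
    state: State
    log_prefix: str
    start_time: Optional[datetime] = None
    stop_time: Optional[datetime] = None


@dataclass
class Experiment:
    name: str
    # Absent until "startPreview" is called for the first time.
    preview_metadata: Optional[PreviewMetadata] = None


def start_preview(exp: Experiment, now: datetime) -> Experiment:
    """Apply the "startPreview" metadata rules."""
    if exp.preview_metadata is None:
        exp.preview_metadata = PreviewMetadata(state=State.ACTIVE, log_prefix=LOG_PREFIX)
    md = exp.preview_metadata
    md.state = State.ACTIVE
    md.log_prefix = LOG_PREFIX
    md.start_time = now  # refreshed every time state becomes ACTIVE
    return exp


def stop_preview(exp: Experiment, now: datetime) -> Experiment:
    """Apply the "stopPreview" metadata rules."""
    md = exp.preview_metadata
    if md is None:
        # Hypothetical choice: reject stopping an experiment that was never previewed.
        raise ValueError("experiment has never been previewed")
    md.state = State.SUSPENDED
    md.stop_time = now
    return exp
```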
Methods
create
- The resource must be created using long-running Create, and `google.longrunning.operation_info.response_type` must be `PolicyExperiment`.
- Creating a new experiment to preview must support the following use cases:
  - Preview a new policy.
  - Preview an update to an already live policy.
  - Preview a deletion of a current policy.
- For the update and delete use cases, the `policy` field in the experiment must have the full payload of the live policy copied into it, including the name.
  - The user must set the rules to the new intended state to preview an update.
  - The user must set the rules to represent a no-op to preview a delete.
- To preview a new policy, the system must do the following:
  - If the system does not support a nested collection without a live policy, the user must create a live policy and set the rules to represent a no-op. For example, the rules of a no-op policy may be empty.
  - An experiment is created as a child of the no-op policy.
- If the system supports previewing multiple experiments for a live policy, calling "create" more than once must create multiple experiments.
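The three create use cases differ only in how the embedded `policy` payload is prepared. This sketch assumes policies are plain dicts with a hypothetical `rules` field, and models a no-op policy as one with an empty rule list, following the example above; real schemas may represent a no-op differently.

```python
import copy

# Hypothetical no-op representation: an empty rule list.
NOOP_RULES: list = []


def experiment_for_update(live_policy: dict, new_rules: list) -> dict:
    """Preview an update: copy the full live payload (including name), swap in new rules."""
    policy = copy.deepcopy(live_policy)
    policy["rules"] = copy.deepcopy(new_rules)
    return {"policy": policy}


def experiment_for_delete(live_policy: dict) -> dict:
    """Preview a deletion: copy the full live payload and set the rules to a no-op."""
    return experiment_for_update(live_policy, NOOP_RULES)


def noop_live_policy(name: str) -> dict:
    """Preview a new policy where a live parent is required: start from a no-op live policy."""
    return {"name": name, "rules": list(NOOP_RULES)}
```

To preview a brand-new policy on such a system, a client would create `noop_live_policy(name)` as the live policy and then create `experiment_for_update(noop, rules)` under it.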
update
- The resource must be updated using long-running Update, and `google.longrunning.operation_info.response_type` must be `PolicyExperiment`.
- The name inside `policy` must not change, but the other fields can be changed to alter the experiment being previewed. This is because this `policy` is intended to replace the live policy, and the name of the live policy must not change.
- The system must set the `state` to `SUSPENDED` if the `state` was `ACTIVE` at the time of an update.
  - This is so the user can easily distinguish between different versions of the experiment being previewed.
get
- The standard method, Get, must be included for `PolicyExperiment` resource types.
list
- The standard method, List, must be included for `PolicyExperiment` resource types.
- Filtering on `preview_metadata` indicates which experiments are actively being previewed.
  - For example, the following filter string returns a List response containing only experiments being previewed: `preview_metadata.state = ACTIVE`.
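To show what the example filter selects, here is a deliberately tiny evaluator that handles only a single `field.path = VALUE` comparison over dict-shaped experiments. It is a simplification for illustration, not an implementation of AIP-160 filtering (no quoting, `AND`/`OR`, or functions).

```python
def matches_equality_filter(resource: dict, filter_string: str) -> bool:
    """Evaluate one `field.path = VALUE` comparison against a resource dict."""
    path, sep, value = filter_string.partition("=")
    if not sep:
        raise ValueError(f"unsupported filter: {filter_string!r}")
    current = resource
    for part in path.strip().split("."):
        if not isinstance(current, dict) or part not in current:
            return False  # e.g. experiments never previewed have no preview_metadata
        current = current[part]
    return str(current) == value.strip()


def list_experiments(experiments: list, filter_string: str = "") -> list:
    """Return the experiments that satisfy the filter (all of them if it is empty)."""
    if not filter_string:
        return list(experiments)
    return [e for e in experiments if matches_equality_filter(e, filter_string)]
```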
delete
- The resource must be deleted using long-running Delete, and `google.longrunning.operation_info.response_type` must be `PolicyExperiment`.
startPreview
```proto
// Starts previewing a PolicyExperiment. This triggers the system to start
// generating logs to evaluate the PolicyExperiment.
rpc StartPreviewPolicyExperiment(StartPreviewPolicyExperimentRequest)
    returns (google.longrunning.Operation) {
  option (google.api.http) = {
    post: "/v1/{name=projects/*/locations/*/policies/*/experiments/*}:startPreview"
    body: "*"
  };
  option (google.longrunning.operation_info) = {
    response_type: "PolicyExperiment"
    metadata_type: "StartPreviewPolicyExperimentMetadata"
  };
}

// The request message for the startPreview custom method.
message StartPreviewPolicyExperimentRequest {
  // The name of the PolicyExperiment.
  string name = 1;
}
```
- This custom method is required.
- `google.longrunning.Operation.metadata_type` must follow guidance on Long-running operations.
- This method must trigger the system to start generating logs to preview the experiment.
- Whenever the method is called successfully, the system must set the following values in the `PolicyPreviewMetadata`:
  - `log_prefix` to the predefined constant.
  - `start_time` to the current time.
  - `state` to `ACTIVE`.
- If the method is called on an experiment with the rules representing a no-op, then the system must preview the deletion of the live policy.
stopPreview
```proto
// Stops previewing a PolicyExperiment. This triggers the system to stop
// generating logs to evaluate the PolicyExperiment.
rpc StopPreviewPolicyExperiment(StopPreviewPolicyExperimentRequest)
    returns (google.longrunning.Operation) {
  option (google.api.http) = {
    post: "/v1/{name=projects/*/locations/*/policies/*/experiments/*}:stopPreview"
    body: "*"
  };
  option (google.longrunning.operation_info) = {
    response_type: "PolicyExperiment"
    metadata_type: "StopPreviewPolicyExperimentMetadata"
  };
}

// The request message for the stopPreview custom method.
message StopPreviewPolicyExperimentRequest {
  // The name of the PolicyExperiment.
  string name = 1;
}
```
- This custom method is required.
- `google.longrunning.Operation.metadata_type` must follow guidance on Long-running operations.
- This method must trigger the system to stop generating logs to preview the experiment.
- Whenever the method is called successfully, the system must set the following values in the `PolicyPreviewMetadata`:
  - `stop_time` to the current time.
  - `state` to `SUSPENDED`.
commit
The resource may expose a new custom method called "commit" to promote an experiment. The system copies `policy` from the experiment into the live policy and then deletes the experiment.
Declarative clients may manually copy fields from an experiment into the live policy and then delete the experiment rather than calling "commit" if preferable.
```proto
// Commits a PolicyExperiment. This copies the PolicyExperiment's policy message
// to the live policy then deletes the PolicyExperiment.
rpc CommitPolicyExperiment(CommitPolicyExperimentRequest)
    returns (google.longrunning.Operation) {
  option (google.api.http) = {
    post: "/v1/{name=projects/*/locations/*/policies/*/experiments/*}:commit"
    body: "*"
  };
  option (google.longrunning.operation_info) = {
    response_type: "google.protobuf.Empty"
    metadata_type: "CommitPolicyExperimentMetadata"
  };
}

// The request message for the commit custom method.
message CommitPolicyExperimentRequest {
  // The name of the PolicyExperiment to commit.
  string name = 1;

  // The etag of the PolicyExperiment. Must match the current experiment for
  // the commit to succeed.
  string etag = 2;

  // Optional etag of the live policy. If set, the commit succeeds only if the
  // live policy still has this etag.
  string parent_etag = 3;
}
```
- `google.longrunning.Operation.metadata_type` must follow guidance on Long-running operations.
- The method must atomically copy `policy` from the experiment into the live policy, and then delete the experiment.
  - If "commit" fails for an experiment, previewing it must not stop, and the live policy must not be updated.
- The method can be called on an experiment in any state.
- The `etag` must match that of the experiment in order for the commit to succeed. This is so the user does not commit an unintended version of the experiment.
  - If no `etag` is provided, the request must fail, so that the user cannot unintentionally commit a different version of the experiment than intended.
- A `parent_etag` may be provided to guarantee that the experiment overwrites a specific version of the live policy.
- The method is not idempotent: calling it twice on the same experiment must return a 404 NOT_FOUND, because the experiment is deleted as part of the first call.
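The commit rules can be sketched in memory as follows. A `threading.Lock` stands in for the storage transaction that makes the copy-and-delete atomic, the three exception classes are hypothetical (mapping them to canonical error codes is left out), and the fresh live etag uses `uuid4` purely for illustration.

```python
import copy
import threading
import uuid


class MissingEtag(Exception):
    """Commit was called without an experiment etag."""


class EtagMismatch(Exception):
    """The experiment or live-policy etag did not match."""


class NotFound(Exception):
    """The experiment does not exist (for example, it was already committed)."""


class CommitService:
    """In-memory sketch of the "commit" custom method."""

    def __init__(self, live: dict, experiments: dict):
        self._lock = threading.Lock()
        self.live = live                # live policy dict, including its "etag"
        self.experiments = experiments  # experiment name -> {"etag": ..., "policy": {...}}

    def commit(self, name: str, etag: str, parent_etag: str = "") -> None:
        if not etag:
            raise MissingEtag(name)
        with self._lock:
            experiment = self.experiments.get(name)
            if experiment is None:
                raise NotFound(name)
            if experiment["etag"] != etag:
                raise EtagMismatch("experiment changed since it was inspected")
            if parent_etag and self.live["etag"] != parent_etag:
                raise EtagMismatch("live policy changed since it was inspected")
            # Every check passed, so copy and delete together. Any earlier failure
            # leaves the live policy and the (still previewing) experiment untouched.
            new_live = copy.deepcopy(experiment["policy"])
            new_live["etag"] = uuid.uuid4().hex  # the live policy gets a fresh etag
            self.live = new_live
            del self.experiments[name]
```

A second commit of the same experiment raises `NotFound`, matching the non-idempotent behavior described above.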
Changes to live policy API methods
delete
- A delete of the live policy must delete all experiments.
- To maintain the experiments while negating the effect of the live policy, the live policy must be changed to a no-op policy instead of using this method.
Logging
Logging is crucial for the user to evaluate whether an experiment should be promoted to live.
Logs must contain the results of the evaluated experiment and the `etag` associated with that experiment alongside that of the live policy, and must be preceded by the value of `log_prefix`.
- The `etag` fields help the user identify which configurations of the live policy and the experiment were evaluated in the log.
- `log_prefix` helps the user separate logs specifically generated for previewing the experiment from other use cases.
Overall, these logs help the user make a decision about whether to promote the experiment to live.
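One possible shape for such a log entry is sketched below: the `log_prefix` first, followed by a JSON payload carrying both etags and both outcomes. The JSON layout, its field names, and the `differs` flag are hypothetical; the AIP only requires the prefix, the experiment result, and the two etags.

```python
import json


def preview_log_line(log_prefix: str, live_etag: str, experiment_etag: str,
                     live_outcome: str, experiment_outcome: str, request_id: str) -> str:
    """Format one preview log entry, preceded by log_prefix."""
    payload = {
        "request_id": request_id,
        "live_etag": live_etag,
        "live_outcome": live_outcome,
        "experiment_etag": experiment_etag,
        "experiment_outcome": experiment_outcome,
        "differs": live_outcome != experiment_outcome,
    }
    return f"{log_prefix} {json.dumps(payload, sort_keys=True)}"


def preview_entries(lines: list, log_prefix: str) -> list:
    """Isolate preview logs by prefix, as a user searching the logs would."""
    return [json.loads(line[len(log_prefix):]) for line in lines
            if line.startswith(log_prefix + " ")]
```

Filtering the parsed entries on `differs` is one quick way to see exactly which traffic the experiment would treat differently from the live policy.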
Changelog
- 2023-04-27: Methods for start and stop renamed. State to enum. Annotations added.
- 2023-03-30: Initial AIP written.