Configuring Key Rotation for AWS Event Handlers

As part of deploying Event Handlers on AWS, Azure or GCP, Guardrails automatically generates a JSON Web Token (JWT) with a security token embedded in it. This token should be rotated periodically. This document describes the policies, best practices and troubleshooting procedures for rotating the JWT.

Workspace Configuration Policies

These are the Turbot > Workspace policies relevant to event handling for SaaS and Enterprise customers. Ideally, they are configured before enabling event handling for the first time but can be changed at any time.

Initial Setup Process

This process assumes that Event Handlers have already been enabled and deployed. If not, follow these configuration steps, then enable Event Handling.

  1. Decide how often the JWT will rotate for the Event Handlers. The rotation period can be as short as 1 month or as long as 5 years.
  2. Set Webhook Secrets > Expiration Period to a value that meets your organizational needs.
  3. Only if specific secrets are required, set Webhook Secrets. Otherwise, Guardrails will automatically generate new secrets. Setting specific secrets is an uncommon requirement.
  4. Set Webhook Secrets > Rotation to Enforce: Rotate webhook secret. This will kick off rotation of the JWT for all the event handlers in this workspace.

Forcing a Key Rotation

In cases where a key has been compromised or a very old key needs to be refreshed, follow these steps to kick off a refresh.

Preflight Checks

  1. Examine all the event handler controls for all platforms in this workspace.
  2. Verify that all event handler controls are in an ok state.
  3. For AWS: Set the control type filter to AWS > SNS > Subscription > Configured. Verify that all Subscription Configured controls for the turbot_aws_api_handler topics are in an ok state.
  4. For Azure: Check that the Azure > Monitor > Action Group > Configured controls are in ok for each turbot_azure_event_handler_action_group action group in each turbot_rg resource group.
  5. For GCP: Check that the GCP > Turbot > Event Handlers > Pub/Sub controls are all in ok.
  6. Resolve any controls in error.
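
The preflight checks above amount to confirming that no event handler control is in a state other than ok. The following is a minimal Python sketch of that check. It assumes the control states have already been collected into a list of dictionaries; the record shape (`type`, `resource`, `state`) and the function names are hypothetical and are not part of any Guardrails API.

```python
# Hypothetical record shape: one dict per control with "type", "resource" and "state".
# This illustrates the check only; it is not a Guardrails API or export format.

def controls_not_ok(controls):
    """Return every control whose state is anything other than "ok"."""
    return [c for c in controls if c.get("state") != "ok"]


def preflight_passes(controls):
    """True only if at least one control was examined and all of them are ok."""
    return len(controls) > 0 and not controls_not_ok(controls)


# Sample data using the control types named in the preflight steps.
sample_controls = [
    {
        "type": "AWS > SNS > Subscription > Configured",
        "resource": "turbot_aws_api_handler",
        "state": "ok",
    },
    {
        "type": "Azure > Monitor > Action Group > Configured",
        "resource": "turbot_azure_event_handler_action_group",
        "state": "error",
    },
    {
        "type": "GCP > Turbot > Event Handlers > Pub/Sub",
        "resource": "example-project",  # hypothetical project name
        "state": "ok",
    },
]

# List the controls that must be resolved before rotating.
for control in controls_not_ok(sample_controls):
    print("Resolve before rotating:", control["type"], "on", control["resource"])

print("Preflight passed:", preflight_passes(sample_controls))
```

An empty list is treated as a failed preflight on purpose: if no controls were examined, nothing has been verified.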

Rotation and Verification

NOTE: In large environments, rotating the webhook secret can place significant load on Guardrails. Schedule this change for off-hours.

  1. Set Webhook Secrets > Rotation to Enforce: Rotate webhook secret if not already set.
  2. Set Webhook Secrets > Expiration Period to 1 month. This will cause an immediate recalculation of the Webhook Secrets policy.
    1. If Expiration Period is already set to 1 month, set it to 2 months, then back to 1 month. The activity described in the next step confirms that rotation was successful.
  3. Look at the Activity page of the Webhook Secrets policy setting. You should see the following activity:
    1. A Control Updated notification for the Turbot > Webhook Secrets Rotation control from ok to alarm.
    2. A Notify saying "Rotated Webhook secrets".
    3. A Policy Setting Updated notification for Webhook Secrets.
    4. A Control Updated notification for the Turbot > Webhook Secrets Rotation control from alarm to ok.
  4. Go to the Controls by Control Type report in the top Reports tab.
    1. Filter for the Event Handlers for each platform used in this workspace.
    2. Verify that all Event Handler controls are in ok. If there are controls in an error state, resolve them immediately.
  5. Perform extended verification that the webhook secret was updated. Each of the control types listed below is responsible for the cloud resource that holds the JWT. If these controls are in an error state, then the webhook secret was not rotated.
    1. For AWS: Set the control type filter to AWS > SNS > Subscription > Configured. Verify that all Subscription Configured controls for the turbot_aws_api_handler topics are in an ok state.
    2. For Azure: Check that the Azure > Monitor > Action Group > Configured controls are in ok for each turbot_azure_event_handler_action_group action group in each turbot_rg resource group.
    3. For GCP: Check that the GCP > Turbot > Event Handlers > Pub/Sub controls are all in ok.
  6. Set Webhook Secrets > Expiration Period back to your normal rotation period.
  7. Go to the Activity Ledger report. Filter for the resource notification type. In sufficiently busy environments, there should be some activity after the JWT was rotated. If there is no activity, then generate some in a testing account.
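
The four activity entries expected on the Webhook Secrets policy setting should appear in a fixed order, possibly with unrelated entries in between. The sketch below shows one way to check that ordering in Python. It assumes the activity has been copied by hand into a list of strings, oldest first; the labels are hypothetical shorthand for the entries described in the steps above, not strings emitted by Guardrails.

```python
# Hypothetical shorthand labels for the four expected activity entries.
EXPECTED_ACTIVITY = [
    "Control Updated: ok -> alarm",
    "Notify: Rotated Webhook secrets",
    "Policy Setting Updated: Webhook Secrets",
    "Control Updated: alarm -> ok",
]


def appears_in_order(expected, observed):
    """True if every expected entry occurs in observed, in the same relative order.

    Other entries may be interleaved. `observed` must be ordered oldest first.
    """
    remaining = iter(observed)
    # Each any() call resumes scanning where the previous match stopped,
    # so a later expected entry can never match an earlier observed one.
    return all(any(entry == seen for seen in remaining) for entry in expected)


# Example: the expected entries with an unrelated entry in between.
observed_activity = [
    "Control Updated: ok -> alarm",
    "Notify: Rotated Webhook secrets",
    "Some unrelated activity",  # hypothetical extra entry
    "Policy Setting Updated: Webhook Secrets",
    "Control Updated: alarm -> ok",
]

print("Rotation activity complete:", appears_in_order(EXPECTED_ACTIVITY, observed_activity))
```

If the Activity page lists the newest entry first, reverse the list before passing it in.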

Troubleshooting

If event handling has stopped after a key rotation, work through the following questions:

  • Was event handling working before the key rotation?
  • Are the event handling policies set to Enforce: Configured? Are there any exceptions where event handling is set to Skip or Enforce: Not configured?
  • Are all the event handlers in an ok state? If not, collect the logs for an Event Handler control that is in error.
  • Are all the controls listed in the extended verification step above in an ok state?
  • Have Webhook Secrets been specified?
  • Were any other Event Handler policies changed at the same time?
  • Is there any environmental change visible in the Guardrails console after the key rotation?
  • Are events missing for all cloud accounts in the workspace, or only for a specific account, subscription or project?
  • If AWS, is CloudTrail present and functional in all accounts?

If Webhook Secrets has been set and event handling isn't working, do the following:

  1. Delete the Webhook Secrets policy setting.
  2. Follow the rotation and verification steps described above.

If event handling is still not working, gather the above troubleshooting information, then send it to help@turbot.com for additional assistance.

Best Practices

  • Rotating the Webhook Secret should be done at least once per year.
  • Unless there is a very good reason, stay with the default behavior where Guardrails generates new secrets. This reduces the chance of a silent and accidental event handling outage.
  • If you do set specific secrets, be sure to set two Webhook Secrets with overlapping expiration periods.
    • Setting a single key may cause event handling to silently stop working when the secret expires.
    • Setting two keys without overlapping active periods may cause a silent break in event handling too.
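
The overlap requirement for manually set secrets can be sanity-checked before the policy is saved. The following Python sketch tests whether two active periods share at least one day. It models each period as a hypothetical `(start, end)` pair of dates; how Guardrails actually stores expiration for each secret is not described here, so treat the data model as an assumption.

```python
from datetime import date


def active_periods_overlap(first, second):
    """True if two (start, end) date ranges share at least one day.

    Each argument is a hypothetical (start_date, end_date) tuple, inclusive.
    """
    first_start, first_end = first
    second_start, second_end = second
    return first_start <= second_end and second_start <= first_end


# Hypothetical active periods for two manually set secrets.
current_key = (date(2024, 1, 1), date(2024, 12, 31))
next_key_overlapping = (date(2024, 12, 1), date(2025, 11, 30))
next_key_with_gap = (date(2025, 1, 15), date(2025, 12, 31))

print("Overlapping pair is safe:", active_periods_overlap(current_key, next_key_overlapping))
print("Pair with a gap is safe:", active_periods_overlap(current_key, next_key_with_gap))
```

A gap between the two periods corresponds to the case in which event handling may silently break.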