Restoring a Workspace

In this guide, you will:

  • Test backup and restore procedures for Turbot Guardrails workspaces within a single region.
  • Monitor and troubleshoot the disaster recovery process.

An essential part of maintaining Turbot Guardrails is testing disaster recovery. This document covers the process for restoring a destroyed workspace. Restoration should be tested at least once a year, ideally twice. The goal is for Guardrails administrators to become familiar with the restoration process and the tools involved.

Testing backup and restore procedures is critical for:

  • Validating backup integrity and restore processes
  • Meeting compliance and audit requirements
  • Training administrators on recovery procedures
  • Measuring recovery time objectives (RTO)

Note

Workspace restoration is just one of several disaster recovery scenarios. Evaluate other scenarios as part of your organization's comprehensive disaster recovery strategy.

Prerequisites

  • Administrator access to AWS Console.
  • Familiarity with Guardrails installation.
  • Understanding of database backup/restore.
  • Access to required AWS services such as RDS, CloudFormation, ECS and Route 53.

Process Summary

  • Build a New Workspace – Set up a fresh workspace for testing, install required mods, and take an RDS snapshot.
  • Simulate Disaster – Destroy the workspace by deleting its CloudFormation stack.
  • Restore the Workspace – Recover data from the latest backup, apply migrations, and restart the workspace.
  • Validate Restoration – Log in and verify the workspace is functional.

Important

  • Only test with non-production workspaces.
  • Document all parameters and configurations.
  • Time the restore process to measure RTO.
  • Test regularly (recommended twice per year).
  • Follow security best practices.
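
Timing the restore can be scripted so the measurement does not depend on someone watching a clock. The following is a minimal sketch and not part of the Guardrails tooling: the function names and the state file path `/tmp/guardrails_rto_start` are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for timing a restore test; not part of Guardrails.
RTO_STATE_FILE="${RTO_STATE_FILE:-/tmp/guardrails_rto_start}"

# Convert a number of seconds into HH:MM:SS.
format_duration() {
  local total="$1"
  printf '%02d:%02d:%02d\n' $((total / 3600)) $((total % 3600 / 60)) $((total % 60))
}

# Record the start time (seconds since the epoch) when the restore begins.
rto_start() {
  date +%s > "$RTO_STATE_FILE"
}

# Print the time elapsed since rto_start was called.
rto_stop() {
  local start now
  start="$(cat "$RTO_STATE_FILE")"
  now="$(date +%s)"
  format_duration $((now - start))
}
```

Call `rto_start` when the restore begins and `rto_stop` once the workspace has been validated. Because the start time is stored in a file, the measurement survives a dropped shell session.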

Step 1: Build a New Workspace

In this phase, create a workspace and install baseline mods. Then, import an AWS account with Event Pollers.

Note

The same process applies to Azure and GCP.

This process assumes that Route 53 is used for DNS. Customers with manually configured DNS will need to keep track of their configuration.

Steps:

  1. Select TE Version:

    • Choose a dedicated TE version for testing
    • Note: ECS container flush during restore may cause brief outages for workspaces using this TE version
    • If multiple workspaces use this TE version, pause event processing
  2. Access AWS Master Account:

    • Navigate to the alpha region of your AWS Master account
  3. Create Test Workspace:

    • Follow the workspace creation guide
    • Save all CloudFormation parameters used (needed for restoration)
    • Record credentials from CloudFormation Stack outputs
    • Note the Turbot ID of workspace Turbot Root (tmod:@turbot/turbot#/)
  4. Install Required AWS Mods:

    • aws
    • aws-iam
    • aws-kms
    • aws-s3
  5. Configure Workspace:

    • Create "AWS" folder under Turbot Root
    • Import an AWS account into the folder
    • Verify no controls/policies are in tbd state
  6. Document Initial State:

    • Take screenshots of workspace dashboard
    • Record key metrics:
      • Number of resources
      • Active controls count
      • Other relevant statistics
    • Save for post-restore validation
  7. Create Backup:

    • Wait for the automated backup that enables "Restore to point in time"
    • Or take a manual RDS snapshot
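
Saving the CloudFormation parameters and taking the manual backup can also be done from the AWS CLI rather than the console. The sketch below assumes the workspace database is a standard RDS DB instance (not an Aurora cluster) and that the AWS CLI is already configured with suitable credentials. The stack, instance, and snapshot names are placeholders, and the `run` helper with its `DRY_RUN` switch is a convenience invented for this example.

```shell
#!/usr/bin/env bash
# Sketch only: all names passed to these functions are placeholders.
# Set DRY_RUN=1 to print the AWS CLI commands instead of running them.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Print the stack's parameters and outputs as JSON (needed for restoration).
save_stack_details() {
  local stack="$1"
  run aws cloudformation describe-stacks --stack-name "$stack" \
    --query 'Stacks[0].{Parameters:Parameters,Outputs:Outputs}' \
    --output json
}

# Take a manual snapshot and block until it is available.
snapshot_db() {
  local instance="$1" snapshot="$2"
  run aws rds create-db-snapshot \
    --db-instance-identifier "$instance" \
    --db-snapshot-identifier "$snapshot"
  run aws rds wait db-snapshot-available \
    --db-snapshot-identifier "$snapshot"
}
```

For example, `save_stack_details <workspace_stack_name> > workspace-stack.json` keeps a copy of the parameters alongside your test notes. The outputs may include credentials, so store the file accordingly.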

Step 2: Drop the Workspace

Warning

Do not delete a production workspace CloudFormation Stack.

Do not delete the original database.

  1. Delete the Workspace CloudFormation stack created earlier.
  2. If necessary, force delete the workspace.
  3. Verify that the workspace URL is no longer accessible.
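
Because deleting the wrong stack is the main risk in this step, a scripted version can require the operator to retype the stack name first. This is a hedged sketch rather than an official procedure: the function names are hypothetical, the confirmation prompt is an added safeguard that the guide itself does not prescribe, and force deletion is not covered.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical and the prompt is an
# extra safeguard added for this example.

# Succeed only when the typed name is non-empty and matches the stack name.
confirm_stack_name() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Delete a test workspace stack after the operator retypes its name.
delete_test_workspace() {
  local stack="$1" typed
  printf 'Type the stack name to confirm deletion of %s: ' "$stack"
  read -r typed
  if ! confirm_stack_name "$stack" "$typed"; then
    echo "Name mismatch; nothing deleted." >&2
    return 1
  fi
  aws cloudformation delete-stack --stack-name "$stack"
  aws cloudformation wait stack-delete-complete --stack-name "$stack"
}

# Succeed when the workspace URL no longer responds successfully.
workspace_is_down() {
  ! curl --silent --fail --max-time 10 --output /dev/null "$1"
}
```

After the stack is gone, `workspace_is_down https://<workspace_url>` returns success once the URL stops answering. DNS records may take a short while to disappear.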

Step 3: Restore the Workspace

In this step, recreate the workspace, which initializes an empty database schema. Then load the data from the restored database into that empty schema, returning the workspace to its previous state. This approach keeps the database structure intact while recovering all workspace configurations, resources, and control states from the backup.

Steps:

  1. Start RTO Measurement:

    • Begin timing the restore process
    • This helps determine your Recovery Time Objective (RTO)
  2. Recreate Workspace:

    • Use original Workspace CloudFormation template
    • Apply identical parameter values from original workspace
    • Deploy the new workspace stack
  3. Restore Database:

    • Navigate to AWS RDS console
    • Choose either:
      • Restore from snapshot, or
      • Use "Restore to point in time" feature
    • Ensure restored DB configurations match original:
      • Instance class
      • Storage type/size
      • Network settings
      • Security groups
  4. Configure Temporary Database:

    • Wait for restored DB to become available
    • Record the new database endpoint
    • Verify connectivity
  5. Deploy Bastion Host:

  6. Execute Migration:

    • Run migration script to copy DB schema:
      • From (Source): The restored database
      • To (Target): The existing (actual) database

      nohup ./migration.sh <turbot_schema> <source_or_restored_DB_endpoint> <target_or_actual_db_endpoint> &

      Example:

      nohup ./migration.sh panda turbot-panda.abcxyzabcxyz.us-east-1.rds.amazonaws.com turbot-babbage.abcxyzabcxyz.us-east-1.rds.amazonaws.com &
  7. Wait for Migration to Complete:

    • Wait for the pg_dump and pg_restore processes in migration.sh to complete.
  8. Flush ECS Containers:

    • Navigate to the AWS ECS console, select the cluster, and open the Tasks tab
    • Locate the tasks for the TE version and stop them.
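
The console steps for restoring the database, recording its endpoint, and flushing the ECS containers have AWS CLI equivalents. The sketch below is one possible approach under stated assumptions: the database is a standard RDS DB instance, the instance identifiers, cluster name, and service name are placeholders you must supply, and the `run` helper with its `DRY_RUN` switch exists only in this example. Which ECS services belong to the TE version is not specified here, so confirm that in the console before stopping anything.

```shell
#!/usr/bin/env bash
# Sketch only: identifiers passed to these functions are placeholders.
# Set DRY_RUN=1 to print the mutating AWS CLI commands instead of running them.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Restore a new instance from the latest restorable time and wait for it.
restore_latest() {
  local source="$1" target="$2"
  run aws rds restore-db-instance-to-point-in-time \
    --source-db-instance-identifier "$source" \
    --target-db-instance-identifier "$target" \
    --use-latest-restorable-time
  run aws rds wait db-instance-available --db-instance-identifier "$target"
}

# Print the endpoint address of an instance (the migration script's source).
db_endpoint() {
  run aws rds describe-db-instances --db-instance-identifier "$1" \
    --query 'DBInstances[0].Endpoint.Address' --output text
}

# Stop the running tasks of one ECS service so that ECS starts fresh ones.
# The task listing is read-only and runs even when DRY_RUN=1.
flush_ecs_tasks() {
  local cluster="$1" service="$2" task
  for task in $(aws ecs list-tasks --cluster "$cluster" \
      --service-name "$service" --query 'taskArns[]' --output text); do
    run aws ecs stop-task --cluster "$cluster" --task "$task" \
      --reason "DR restore test"
  done
}
```

The restore command accepts further options such as `--db-instance-class`, `--db-subnet-group-name`, and `--vpc-security-group-ids`. Add them where needed so the restored instance matches the original configuration.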

Step 4: Clear Redis Cache

To clear the workspace from Redis, log into the bastion host and execute:

export REDISHOST=master.turbot-babbage-cache-cluster.abcxyz.use1.cache.amazonaws.com
redis-cli -h $REDISHOST --tls -p 6379 -a <password> KEYS "<turbot_schema>*" | xargs redis-cli -h $REDISHOST --tls -p 6379 -a <password> DEL

Example:

redis-cli -h $REDISHOST --tls -p 6379 -a mysecurepassword KEYS "panda*" | xargs redis-cli -h $REDISHOST --tls -p 6379 -a mysecurepassword DEL
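
The `KEYS` command blocks the Redis server while it walks the whole keyspace, and passing `-a` puts the password in the shell history. A possible alternative, shown as a sketch rather than the documented procedure, iterates with `redis-cli --scan` and reads the password from the `REDISCLI_AUTH` environment variable, which `redis-cli` supports. The function names and the `REDIS_CLI` and `REDISPORT` variables are assumptions introduced for this example.

```shell
#!/usr/bin/env bash
# Sketch only: export REDISHOST and REDISCLI_AUTH before calling these
# functions. The function names and REDIS_CLI/REDISPORT are hypothetical.
REDIS_CLI="${REDIS_CLI:-redis-cli}"
REDISPORT="${REDISPORT:-6379}"

# Run redis-cli against the workspace cache over TLS.
redis_cmd() {
  "$REDIS_CLI" -h "$REDISHOST" --tls -p "$REDISPORT" "$@"
}

# Delete every key that starts with the given schema name.
# SCAN iterates incrementally, so the server is not blocked the way KEYS blocks it.
clear_workspace_keys() {
  local schema="$1" key
  redis_cmd --scan --pattern "${schema}*" | while IFS= read -r key; do
    redis_cmd DEL "$key" > /dev/null < /dev/null
  done
}
```

For the example workspace above, the call would be `clear_workspace_keys panda`. Deleting one key per call is slow for very large caches but keeps the sketch simple, and it does nothing when no keys match.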

Step 5: Review

This step validates the restoration process.

  • Login Validation: Ensure the previous credentials still work.
  • Resource & Control Check: Verify the number of resources and controls match pre-disaster stats.
  • Test New Resource Import: Create a new S3 bucket and verify it appears in Guardrails UI.
  • Verify Control Execution: Run a control scan to confirm that all controls are in OK or Skipped state.
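
The resource and control check can be made explicit by comparing the figures recorded in Step 1 with those read from the restored workspace. The helper below is a hypothetical sketch: it compares values you enter by hand and does not query Guardrails.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compares manually recorded metrics and does not
# call any Guardrails API.

# Compare one metric recorded before the disaster with its value after restore.
compare_metric() {
  local name="$1" before="$2" after="$3"
  if [ "$before" = "$after" ]; then
    echo "OK   $name: $after"
  else
    echo "DIFF $name: before=$before after=$after"
    return 1
  fi
}
```

A call such as `compare_metric resources <count_before> <count_after>` prints an `OK` or `DIFF` line and returns a non-zero status on a mismatch, so several checks can be chained in a validation script.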

Next Steps

Explore the following resources to expand your understanding of Guardrails disaster recovery and workspace management:

Troubleshooting

| Issue | Description | Guide |
|---|---|---|
| Workspace Not Accessible | If the workspace does not restore correctly, ensure that RDS endpoints are correct in the migration script. | |
| Redis Cache Not Cleared | If controls fail to execute, verify that Redis cache clearing was performed correctly. | See Step 4: Clear Redis Cache in this guide. |
| Further Assistance | If the issue persists, open a support ticket and provide logs & screenshots for faster resolution. | Open Support Ticket |