Restoring a Workspace
In this guide, you will:
- Test backup and restore procedures for Turbot Guardrails workspaces within a single region.
- Monitor and troubleshoot the disaster recovery process.
An essential part of maintaining Turbot Guardrails is testing disaster recovery. This document covers the process for restoring a destroyed workspace. Restoration should be tested at least once a year, ideally twice. The goal is to have Guardrails Admins familiar with the restoration process and the tools involved.
Testing backup and restore procedures is critical for:
- Validating backup integrity and restore processes
- Meeting compliance and audit requirements
- Training administrators on recovery procedures
- Measuring recovery time objectives (RTO)
Note: Workspace restoration is just one of several disaster recovery scenarios. Evaluate other scenarios as part of your organization's comprehensive disaster recovery strategy.
Prerequisites
- Administrator access to AWS Console.
- Familiarity with Guardrails installation.
- Understanding of database backup/restore.
- Access to required AWS services such as RDS, CloudFormation, ECS and Route 53.
Process Summary
- Build a New Workspace – Set up a fresh workspace for testing, install required mods, and take an RDS snapshot.
- Simulate Disaster – Destroy the workspace by deleting its CloudFormation stack.
- Restore the Workspace – Recover data from the latest backup, apply migrations, and restart the workspace.
- Validate Restoration – Log in and verify the workspace is functional.
Important:
- Only test with non-production workspaces
- Document all parameters and configurations
- Time the restore process to measure RTO
- Test regularly (recommended twice per year)
- Follow security best practices
Step 1: Build a New Workspace
In this phase, create a workspace and install baseline mods. Then, import an AWS account with Event Pollers.
Note: The same process applies to Azure and GCP.
This process assumes that Route 53 is used for DNS. Customers with manually configured DNS will need to keep track of their configuration.
Steps:
Select TE Version:
- Choose a dedicated TE version for testing
- Note: ECS container flush during restore may cause brief outages for workspaces using this TE version
- If multiple workspaces use this TE version, pause event processing
Access AWS Master Account:
- Navigate to the alpha region of your AWS Master account
Create Test Workspace:
- Follow the workspace creation guide
- Save all CloudFormation parameters used (needed for restoration)
- Record credentials from CloudFormation Stack outputs
- Note the Turbot ID of workspace Turbot Root (tmod:@turbot/turbot#/)
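The parameter and output capture described above can be scripted. The following is a minimal sketch, assuming the AWS CLI is installed and configured for the account and region that host the workspace. The function name `save_stack_config` and the stack and directory names in the usage note are hypothetical.

```shell
# Save a stack's parameters and outputs to JSON files for later restoration.
# Usage: save_stack_config <stack-name> <output-dir>
# Assumption: the AWS CLI is configured for the workspace's account and region.
save_stack_config() {
  stack_name="$1"
  out_dir="$2"
  mkdir -p "$out_dir" || return 1

  # Parameters used to create the stack (needed to recreate it identically).
  aws cloudformation describe-stacks \
    --stack-name "$stack_name" \
    --query 'Stacks[0].Parameters' \
    --output json > "$out_dir/$stack_name-parameters.json" || return 1

  # Stack outputs, captured before the stack is deleted.
  aws cloudformation describe-stacks \
    --stack-name "$stack_name" \
    --query 'Stacks[0].Outputs' \
    --output json > "$out_dir/$stack_name-outputs.json" || return 1

  echo "Saved parameters and outputs to $out_dir"
}
```

For example, `save_stack_config panda-workspace ./dr-test` writes two JSON files under `./dr-test`. CloudFormation returns parameters declared with `NoEcho` in masked form, so record those values separately, and store the output files securely.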
Install Required AWS Mods:
aws
aws-iam
aws-kms
aws-s3
Configure Workspace:
- Create "AWS" folder under Turbot Root
- Import an AWS account into the folder
- Verify no controls/policies are in tbd state
Document Initial State:
- Take screenshots of workspace dashboard
- Record key metrics:
- Number of resources
- Active controls count
- Other relevant statistics
- Save for post-restore validation
Create Backup:
- Wait for automated "Restore to point in time" backup
- Or take a manual RDS backup
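A manual backup can also be taken from the command line. This is a sketch only, assuming the Guardrails database is a standard RDS DB instance (not an Aurora cluster) and that the AWS CLI is configured. The function names and the `-dr-test-` naming scheme are hypothetical.

```shell
# Take a manual RDS snapshot and block until it is available.
# Usage: snapshot_db <db-instance-id> <snapshot-id>
# Assumption: the database is an RDS DB instance, not an Aurora cluster.
snapshot_db() {
  db_instance_id="$1"
  snapshot_id="$2"

  aws rds create-db-snapshot \
    --db-instance-identifier "$db_instance_id" \
    --db-snapshot-identifier "$snapshot_id" || return 1

  # Polls until the snapshot reaches the "available" state.
  aws rds wait db-snapshot-available \
    --db-snapshot-identifier "$snapshot_id"
}

# Build a timestamped snapshot name from a prefix (hypothetical naming scheme).
snapshot_name() {
  echo "$1-dr-test-$(date -u +%Y%m%d-%H%M%S)"
}
```

A typical call would be `snapshot_db turbot-panda "$(snapshot_name turbot-panda)"`, where `turbot-panda` stands in for your DB instance identifier.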
Step 2: Drop the Workspace
Warning: Do not delete a production workspace CloudFormation stack.
Do not delete the original database.
- Delete the Workspace CloudFormation stack created earlier.
- If necessary, force delete the workspace.
- Verify that the workspace URL is no longer accessible.
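The deletion and the accessibility check can be sketched with the AWS CLI and `curl`. This assumes the stack name belongs to the non-production test workspace created in Step 1. Both function names are hypothetical, and the check treats any failed or error HTTP response as "down".

```shell
# Delete the TEST workspace stack and wait for the deletion to finish.
# Usage: delete_workspace_stack <stack-name>
# Assumption: <stack-name> is the non-production stack created in Step 1.
delete_workspace_stack() {
  aws cloudformation delete-stack --stack-name "$1" || return 1
  aws cloudformation wait stack-delete-complete --stack-name "$1"
}

# Exit 0 only when the URL no longer returns a successful HTTP response
# (connection failure, DNS failure, or an HTTP error status).
# Usage: workspace_is_down <workspace-url>
workspace_is_down() {
  ! curl --silent --fail --max-time 10 --output /dev/null "$1"
}
```

After the stack is gone, `workspace_is_down https://<your-workspace-url> && echo "workspace is down"` confirms the last bullet.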
Step 3: Restore the Workspace
In this step, you recreate the workspace, which initializes an empty database schema. You then populate that empty schema with the data from the restored database, returning the workspace to its previous state. This approach keeps the database structure intact while recovering all workspace configurations, resources, and control states from the backup.
Steps:
Start RTO Measurement:
- Begin timing the restore process
- This helps determine your Recovery Time Objective (RTO)
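One simple way to time the restore is to record a start timestamp in the shell session used for the test. This is an illustrative sketch; `RESTORE_START`, `format_duration`, and `report_rto` are hypothetical names.

```shell
# Format a duration in seconds as HH:MM:SS.
# Example: format_duration 3725 prints 01:02:05
format_duration() {
  total="$1"
  printf '%02d:%02d:%02d\n' \
    "$(( total / 3600 ))" "$(( (total % 3600) / 60 ))" "$(( total % 60 ))"
}

# Record the start time when the restore begins.
RESTORE_START=$(date +%s)

# Report elapsed time once validation (Step 5) is complete.
report_rto() {
  now=$(date +%s)
  echo "Elapsed restore time: $(format_duration "$(( now - RESTORE_START ))")"
}
```

Run `report_rto` in the same shell session after the final validation step to get the measured recovery time.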
Recreate Workspace:
- Use original Workspace CloudFormation template
- Apply identical parameter values from original workspace
- Deploy the new workspace stack
Restore Database:
- Navigate to AWS RDS console
- Choose either:
- Restore from snapshot, or
- Use "Restore to point in time" feature
- Ensure restored DB configurations match original:
- Instance class
- Storage type/size
- Network settings
- Security groups
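Both restore options are also available through the AWS CLI. The sketch below assumes an RDS DB instance (not an Aurora cluster). All identifiers passed in are placeholders for your own values, and the function names are hypothetical. Settings not passed explicitly should still be compared against the original instance.

```shell
# Option 1: restore a new DB instance from a snapshot.
# Usage: restore_from_snapshot <new-db-id> <snapshot-id> <instance-class> \
#                              <subnet-group> <security-group-id>
restore_from_snapshot() {
  aws rds restore-db-instance-from-db-snapshot \
    --db-instance-identifier "$1" \
    --db-snapshot-identifier "$2" \
    --db-instance-class "$3" \
    --db-subnet-group-name "$4" \
    --vpc-security-group-ids "$5"
}

# Option 2: restore a new DB instance to the source's latest restorable time.
# Usage: restore_to_latest <source-db-id> <new-db-id> <instance-class> \
#                          <subnet-group> <security-group-id>
restore_to_latest() {
  aws rds restore-db-instance-to-point-in-time \
    --source-db-instance-identifier "$1" \
    --target-db-instance-identifier "$2" \
    --use-latest-restorable-time \
    --db-instance-class "$3" \
    --db-subnet-group-name "$4" \
    --vpc-security-group-ids "$5"
}
```

Passing the instance class, subnet group, and security group explicitly makes it harder for the restored instance to drift from the original's configuration.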
Configure Temporary Database:
- Wait for restored DB to become available
- Record the new database endpoint
- Verify connectivity
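These three checks can be sketched as follows, assuming the AWS CLI is configured and that `pg_isready` (shipped with the PostgreSQL client tools) is available on a host with network access to the database, such as the bastion host. The function names are hypothetical.

```shell
# Wait for the restored instance to become available, then print its endpoint.
# Usage: restored_db_endpoint <db-instance-id>
restored_db_endpoint() {
  db_id="$1"
  aws rds wait db-instance-available --db-instance-identifier "$db_id" || return 1
  aws rds describe-db-instances \
    --db-instance-identifier "$db_id" \
    --query 'DBInstances[0].Endpoint.Address' \
    --output text
}

# Check that PostgreSQL accepts connections on the endpoint.
# pg_isready exits 0 when the server is accepting connections.
# Usage: check_db_connectivity <endpoint> [port]
check_db_connectivity() {
  pg_isready -h "$1" -p "${2:-5432}" -t 10
}
```

The endpoint printed by `restored_db_endpoint` is the value to pass as the source endpoint to the migration script later in this step.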
Deploy Bastion Host:
- Launch a Turbot Bastion Host instance, following the Turbot Bastion Host Setup guide
- Ensure network access to both databases
Execute Migration:
- Run the migration script to copy the DB schema:
- From (Source): The restored database
- To (Target): The existing database used by the new workspace
nohup ./migration.sh <turbot_schema> <source_or_restored_DB_endpoint> <target_or_actual_db_endpoint> &
example: nohup ./migration.sh panda turbot-panda.abcxyzabcxyz.us-east-1.rds.amazonaws.com turbot-babbage.abcxyzabcxyz.us-east-1.rds.amazonaws.com &
- Wait for the pg_dump and pg_restore processes in migration.sh to complete.
Flush ECS Containers:
- Navigate to the AWS ECS console, open the cluster, and select the Tasks tab
- Locate the TE version-related tasks and stop them.
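The same flush can be done with the AWS CLI. This is a sketch under assumptions: the cluster name and task definition family that correspond to your TE version are not given in this guide, so both are placeholders you must look up. The function name is hypothetical.

```shell
# Stop every running task in a cluster that belongs to a task definition family.
# Usage: stop_te_tasks <cluster> <task-family>
# Assumption: <task-family> identifies the tasks of the TE version under test.
stop_te_tasks() {
  cluster="$1"
  family="$2"
  task_arns=$(aws ecs list-tasks \
    --cluster "$cluster" \
    --family "$family" \
    --desired-status RUNNING \
    --query 'taskArns[]' \
    --output text) || return 1

  for task_arn in $task_arns; do
    # Skip the placeholder the CLI may print for an empty result.
    [ "$task_arn" = "None" ] && continue
    echo "Stopping $task_arn"
    aws ecs stop-task \
      --cluster "$cluster" \
      --task "$task_arn" \
      --reason "DR restore test: flush containers" > /dev/null || return 1
  done
}
```

Stopped tasks that belong to an ECS service are replaced by the service scheduler, which is what produces the fresh containers.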
Step 4: Clear Redis Cache
To clear the workspace from Redis, log into the bastion host and execute:
export REDISHOST=master.turbot-babbage-cache-cluster.abcxyz.use1.cache.amazonaws.com
redis-cli -h $REDISHOST --tls -p 6379 -a <password> KEYS "<turbot_schema>*" | xargs redis-cli -h $REDISHOST --tls -p 6379 -a <password> DEL
example: redis-cli -h $REDISHOST --tls -p 6379 -a mysecurepassword KEYS "panda*" | xargs redis-cli -h $REDISHOST --tls -p 6379 -a mysecurepassword DEL
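`KEYS` scans the whole keyspace in a single blocking call, which can briefly stall a cache shared with other workspaces. As an alternative to the command above (not the procedure this guide prescribes), the sketch below uses the `--scan` and `--pattern` options of `redis-cli` to iterate incrementally. The function name is hypothetical.

```shell
# Delete all keys with a given prefix, using incremental SCAN instead of KEYS.
# Usage: clear_workspace_cache <redis-host> <password> <turbot_schema>
clear_workspace_cache() {
  host="$1"
  password="$2"
  schema="$3"
  redis-cli -h "$host" --tls -p 6379 -a "$password" --scan --pattern "${schema}*" |
    while IFS= read -r key; do
      # Redirect stdin so the inner redis-cli never consumes the key list.
      redis-cli -h "$host" --tls -p 6379 -a "$password" DEL "$key" \
        < /dev/null > /dev/null
    done
}
```

For example, `clear_workspace_cache "$REDISHOST" mysecurepassword panda` removes every key that starts with `panda`. It issues one `DEL` per key, so it is slower than the `xargs` form but avoids the blocking `KEYS` call.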
Step 5: Review
This step validates the restoration process.
- Login Validation: Confirm that the previous credentials still work.
- Resource & Control Check: Verify the number of resources and controls match pre-disaster stats.
- Test New Resource Import: Create a new S3 bucket and verify it appears in Guardrails UI.
- Verify Control Execution: Run a control scan to confirm that all controls are in OK or Skipped state.
Next Steps
Explore the following resources to expand your understanding of Guardrails disaster recovery and workspace management:
Troubleshooting
Issue | Description | Guide |
---|---|---|
Workspace Not Accessible | If the workspace does not restore correctly, ensure that RDS endpoints are correct in the migration script. | |
Redis Cache Not Cleared | If controls fail to execute, verify that Redis cache clearing was performed correctly. | See Step 4: Clear Redis Cache in this guide. |
Further Assistance | If the issue persists, open a support ticket and provide logs & screenshots for faster resolution. | Open Support Ticket |