Restoring a Workspace

In this guide, you will:

Testing disaster recovery is an essential part of maintaining Turbot Guardrails. This document covers the process for restoring a destroyed workspace. Test restoration at least once a year, ideally twice. The goal is for Guardrails admins to be familiar with the restoration process and the tools involved.

Testing backup and restore procedures is critical for:

[!NOTE] Workspace restoration is just one of several disaster recovery scenarios. Evaluate other scenarios as part of your organization's comprehensive disaster recovery strategy.

Prerequisites

Process Summary

[!IMPORTANT]

Only test with non-production workspaces

Document all parameters and configurations

Time the restore process to measure RTO

Test regularly (recommended twice per year)

Follow security best practices
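
As one way to time the restore for the RTO measurement, the following is a minimal sketch of a timer that stores a start timestamp in a file and later reports the elapsed time. The function names and the state file path are hypothetical and are not part of Guardrails. A file is used so the measurement survives across shell sessions.

```shell
# Minimal RTO timer sketch. RTO_FILE is a hypothetical location for the
# start timestamp; change it to suit your environment.
RTO_FILE="${RTO_FILE:-/tmp/restore_rto_start}"

rto_start() {
  # Record the current time as seconds since the Unix epoch.
  date +%s > "$RTO_FILE"
}

rto_elapsed() {
  # Print the time elapsed since rto_start as HH:MM:SS.
  # An optional first argument overrides "now" (useful for testing).
  local start now secs
  start=$(cat "$RTO_FILE") || return 1
  now=${1:-$(date +%s)}
  secs=$(( now - start ))
  printf '%02d:%02d:%02d\n' $(( secs / 3600 )) $(( secs % 3600 / 60 )) $(( secs % 60 ))
}
```

Call `rto_start` when the restore begins and `rto_elapsed` once the workspace has been validated.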

Step 1: Build a New Workspace

In this phase, create a workspace and install baseline mods. Then, import an AWS account with Event Pollers.

[!NOTE] The same process applies to Azure and GCP.

This process assumes that Route53 is used for DNS. Customers with manually configured DNS will need to keep track of their configuration.

Steps:

  1. Select TE Version:
  2. Access AWS Master Account:
  3. Create Test Workspace:
  4. Install Required AWS Mods:
  5. Configure Workspace:
  6. Document Initial State:
  7. Create Backup:
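
The backup step could be scripted roughly as follows. This is a sketch that assumes the workspace database is an RDS DB instance and that a manual snapshot is an acceptable backup. The function name and both identifiers are hypothetical placeholders; the AWS CLI must already be configured for the account that hosts the database.

```shell
# Sketch: take a manual RDS snapshot and wait until it is usable.
# Assumption: the database is an RDS DB instance (not an Aurora cluster).
create_backup() {
  local db_instance="$1"   # hypothetical, e.g. the instance behind the workspace
  local snapshot_id="$2"   # hypothetical, e.g. a name containing the test date
  aws rds create-db-snapshot \
    --db-instance-identifier "$db_instance" \
    --db-snapshot-identifier "$snapshot_id" || return 1
  # Blocks until the snapshot reaches the "available" state.
  aws rds wait db-snapshot-available \
    --db-snapshot-identifier "$snapshot_id"
}
```

Recording the snapshot identifier along with the other documented parameters makes it easier to locate during the restore.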

Step 2: Drop the Workspace

[!WARNING] Do not delete a production workspace CloudFormation Stack.

Do not delete the original database.

  1. Delete the Workspace CloudFormation stack created earlier.
  2. If necessary, force delete the workspace.
  3. Verify that the workspace URL is no longer accessible.
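
The deletion and the accessibility check above can be sketched from the command line as shown below. The function names are hypothetical, and the rule for what counts as "not accessible" (a connection failure or any 4xx/5xx response) is an assumption that may need adjusting for your load balancer and DNS setup.

```shell
# Sketch: delete the workspace stack and wait for the deletion to finish.
drop_workspace_stack() {
  local stack_name="$1"
  aws cloudformation delete-stack --stack-name "$stack_name" || return 1
  aws cloudformation wait stack-delete-complete --stack-name "$stack_name"
}

# Sketch: succeed (return 0) when the workspace URL no longer serves content.
workspace_is_down() {
  local url="$1" code
  # -s: silent, -o /dev/null: discard the body, -m 10: 10 second timeout,
  # -w: print only the HTTP status. curl prints 000 if no response arrived.
  code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' "$url")
  case "$code" in
    000|4??|5??) return 0 ;;   # assumption: these mean "not accessible"
    *) return 1 ;;
  esac
}
```

For example, `workspace_is_down "https://<workspace_url>" && echo "workspace is gone"` prints the message only after the URL has stopped responding normally.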

Step 3: Restore the Workspace

In this step, we create a new workspace, which initializes an empty database schema. The goal is to load the data from the restored database into this empty schema, returning the workspace to its previous state. This approach preserves the database structure while recovering all workspace configurations, resources, and control states from the backup.

Steps:

  1. Start RTO Measurement:
  2. Recreate Workspace:
  3. Restore Database:
  4. Configure Temporary Database:
  5. Deploy Bastion Host:
  6. Execute Migration:

     nohup ./migration.sh <turbot_schema> <source_or_restored_DB_endpoint> <target_or_actual_db_endpoint> &

     Example:

     nohup ./migration.sh panda turbot-panda.abcxyzabcxyz.us-east-1.rds.amazonaws.com turbot-babbage.abcxyzabcxyz.us-east-1.rds.amazonaws.com &

  7. Wait for the pg_dump and pg_restore processes in migration.sh to complete.
  8. Flush ECS Containers:
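
Because migration.sh runs in the background, it helps to have a way to block until it exits. The sketch below polls a process ID until the process is gone. The function name and the `migration.pid` file are hypothetical, and the check assumes you run it as the same user that started the migration.

```shell
# Sketch: block until the process with the given PID has exited.
# Suggested use right after launching the migration in the same shell:
#   nohup ./migration.sh <args> &
#   echo $! > migration.pid            # hypothetical file name
#   wait_for_pid "$(cat migration.pid)"
wait_for_pid() {
  local pid="$1" interval="${2:-30}"
  # kill -0 sends no signal; it only tests whether the process exists
  # and whether the current user is allowed to signal it.
  while kill -0 "$pid" 2>/dev/null; do
    sleep "$interval"
  done
}
```

The output of the dump and restore is written to `nohup.out` by default, so checking that file for errors after the wait returns is a reasonable follow-up.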

Step 4: Clear Redis Cache

To clear the workspace from Redis, log into the bastion host and execute:

export REDISHOST=master.turbot-babbage-cache-cluster.abcxyz.use1.cache.amazonaws.com
redis-cli -h $REDISHOST --tls -p 6379 -a <password> KEYS "<turbot_schema>*" | xargs redis-cli -h $REDISHOST --tls -p 6379 -a <password> DEL

example: redis-cli -h $REDISHOST --tls -p 6379 -a mysecurepassword KEYS "panda*" | xargs redis-cli -h $REDISHOST --tls -p 6379 -a mysecurepassword DEL
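
As a possible alternative to the command above, the sketch below iterates keys with `redis-cli --scan` instead of `KEYS`, which avoids blocking the server while it enumerates a large keyspace, and reads the password from the `REDISCLI_AUTH` environment variable so it does not appear on the command line. The function name and the batch size of 100 are arbitrary choices. It assumes GNU xargs (for `-r`) and key names without whitespace or quote characters.

```shell
# Sketch: delete all keys for one workspace schema using SCAN.
# Export REDISHOST and REDISCLI_AUTH before calling this function;
# redis-cli picks up the password from REDISCLI_AUTH automatically.
clear_workspace_keys() {
  local schema="$1"
  redis-cli -h "$REDISHOST" --tls -p 6379 --scan --pattern "${schema}*" \
    | xargs -r -n 100 redis-cli -h "$REDISHOST" --tls -p 6379 DEL
  # -r: do nothing if no keys matched; -n 100: delete in batches of 100.
}
```

For the example workspace, the call would be `clear_workspace_keys panda`.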

Step 5: Review

This step validates the restoration process.
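
One simple way to validate is to compare the state documented in Step 1 with the same information captured after the restore. The sketch below assumes both were saved as plain-text files with one fact per line; the function name, file names, and file format are hypothetical.

```shell
# Sketch: compare the documented initial state with the restored state.
# Both arguments are paths to plain-text files captured the same way.
compare_state() {
  local before="$1" after="$2"
  # diff exits 0 when the files are identical and prints differences otherwise.
  if diff -u "$before" "$after"; then
    echo "MATCH: restored state equals the documented initial state"
  else
    echo "MISMATCH: review the differences above" >&2
    return 1
  fi
}
```

For example, `compare_state initial_state.txt restored_state.txt` returns a non-zero status if any line differs.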

Next Steps

Explore the following resources to expand your understanding of Guardrails disaster recovery and workspace management:

Troubleshooting

| Issue | Description | Guide |
|-------|-------------|-------|
| Workspace Not Accessible | If the workspace does not restore correctly, ensure that RDS endpoints are correct in the migration script. | |
| Redis Cache Not Cleared | If controls fail to execute, verify that Redis cache clearing was performed correctly. | See Step 4: Clear Redis Cache in this guide. |
| Further Assistance | If the issue persists, open a support ticket and provide logs & screenshots for faster resolution. | Open Support Ticket |