
Making Guardrails more efficient: ARM64, cache optimization, and more

Learn how we cut Guardrails' runtime costs by 30% while boosting performance with a suite of new optimizations for running Guardrails efficiently.

Turbot Team
Oct 21, 2024

We continually work to improve the performance and cost efficiency of Guardrails so that our customers can reduce their running costs without sacrificing performance. Starting with Turbot Guardrails Enterprise v5.45.0, we focused on runtime efficiency by migrating to ARM64 and optimizing Redis memory usage. Subsequent releases added an upgraded Node.js runtime for Lambda functions, a new Activity Retention policy to better manage record versions in the database, and a guide to resizing the PostgreSQL database without downtime. Together, these changes have delivered significant cost savings while maintaining the performance and reliability our customers depend on.

ARM64: Reducing ECS and Lambda costs by 30%

A major part of our strategy has been the transition from x86_64 to ARM64 for AWS ECS and Lambda workloads. ARM-based AWS Graviton processors deliver performance comparable to x86_64 at a lower price, making them an ideal way to reduce operational expenses without sacrificing efficiency.

By migrating ECS and Lambda workloads to ARM64, we’ve cut those infrastructure costs by 30% while continuing to offer the same high level of service. The transition is still ongoing: we’re already seeing significant benefits, and we project similar savings in the months ahead.

Figure: Cost savings from transitioning to ARM64

Cache optimization: Reducing Redis memory usage by 50%

Another area of focus has been how we use Redis in our AWS Lambda processing. Previously, large message backlogs caused Redis memory usage to spike because processing created numerous keys, such as those for processes, locks, and logs. When Redis reached its memory limit, it began evicting these keys at random, leading to failures and inefficiencies.

To solve this, we restructured the key creation process. Now, during polling, only a minimal lock key is created with a 24-hour time-to-live (TTL), and full process keys are generated only when a message is actively handled by a worker node.
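The restructured key lifecycle can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Guardrails' implementation: the key names (lock:<id>, process:<id>, log:<id>), the poll, start_handling and finish_handling functions, and the FakeRedis class are all hypothetical. FakeRedis is an in-memory stand-in for Redis SET with the NX and EX options, so the example runs without a Redis server.

```python
import time


class FakeRedis:
    """Hypothetical in-memory stand-in for the few Redis commands used here."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expiry timestamp or None)

    def _live(self, key):
        entry = self._data.get(key)
        if entry is None:
            return False
        if entry[1] is not None and self._clock() >= entry[1]:
            del self._data[key]  # expire lazily when the key is accessed
            return False
        return True

    def set(self, key, value, nx=False, ex=None):
        if nx and self._live(key):
            return None  # SET with NX returns nil when the key already exists
        self._data[key] = (value, None if ex is None else self._clock() + ex)
        return True

    def get(self, key):
        return self._data[key][0] if self._live(key) else None

    def delete(self, *keys):
        removed = [k for k in keys if self._live(k)]
        for k in removed:
            del self._data[k]
        return len(removed)

    def dbsize(self):
        return sum(1 for k in list(self._data) if self._live(k))


LOCK_TTL = 24 * 60 * 60  # 24-hour lock TTL, as described above


def poll(store, message_id):
    """Polling: claim a message with one small lock key and nothing else."""
    return store.set(f"lock:{message_id}", "1", nx=True, ex=LOCK_TTL) is True


def start_handling(store, message_id, worker_id):
    """A worker is actively handling the message: only now create process and log keys."""
    store.set(f"process:{message_id}", worker_id)
    store.set(f"log:{message_id}", "started")


def finish_handling(store, message_id):
    """Remove every key for the message once processing completes."""
    store.delete(f"lock:{message_id}", f"process:{message_id}", f"log:{message_id}")


# A hypothetical backlog of 1,000 messages, of which workers are handling only 10.
now = [0.0]
store = FakeRedis(clock=lambda: now[0])
claimed = [m for m in (f"msg-{i}" for i in range(1000)) if poll(store, m)]
for m in claimed[:10]:
    start_handling(store, m, worker_id="worker-a")
print(store.dbsize())  # 1020: 1,000 lock keys + 10 process keys + 10 log keys
```

With a real client such as redis-py, the lock would be taken with r.set(key, value, nx=True, ex=LOCK_TTL). Because the heavier process and log keys exist only for messages a worker is actively handling, a large backlog adds only small lock keys in this sketch, and each of those expires on its own after 24 hours if nothing cleans it up.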

Figure: Memory usage after optimizations

This change has dramatically reduced memory usage, allowing us to halve the size of our Amazon ElastiCache for Redis cluster and cut its runtime costs by 50%. By optimizing how and when keys are created, we’ve not only reduced costs but also made the system more efficient and reliable under high load.

Activity log retention: New log lifecycle policies to manage activity records

The @turbot/turbot-5.46.0 mod introduces two new policies: Turbot > Workspace > Retention > Activity Retention and Turbot > Workspace > Retention > Activity Purge Limit. These policies let administrators configure how long activity records, such as actions, events, and notifications, are retained before being purged from the database. Built-in Smart Retention logic preserves important record versions while automatically cleaning up outdated records, preventing excessive table growth and reducing storage overhead.

When you update the mod, the Activity Retention policy defaults to 'None' so that no data is deleted. You can adjust this value to suit your needs, and older activity is then purged accordingly; we recommend 90 days. Customers who have already implemented this policy have seen significant improvements: an average 60% reduction in activity records and a 40% decrease in database storage size. With millions of unnecessary records gone, queries also run faster.
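How a retention window and a purge limit could interact is sketched below. The exact Smart Retention rules are not described here, so the rule used (always keep the newest record for each resource, purge other records older than the window, and cap each run at the purge limit) is an assumption, as are the ActivityRecord type and the select_for_purge function. Treating the purge limit as a per-run cap is likewise assumed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ActivityRecord:
    record_id: int
    resource_id: str
    created: datetime


def select_for_purge(records, retention_days, purge_limit, now):
    """Return the IDs of activity records to delete in a single purge run.

    Assumed rules (illustrative only):
      * retention_days=None keeps everything, mirroring the 'None' default;
      * the newest record for each resource is always kept;
      * at most purge_limit records are returned, oldest first.
    """
    if retention_days is None:
        return []
    cutoff = now - timedelta(days=retention_days)

    # Find the newest record per resource so it is never purged.
    newest = {}
    for rec in records:
        current = newest.get(rec.resource_id)
        if current is None or rec.created > current.created:
            newest[rec.resource_id] = rec
    protected = {rec.record_id for rec in newest.values()}

    expired = sorted(
        (rec for rec in records if rec.created < cutoff and rec.record_id not in protected),
        key=lambda rec: rec.created,
    )
    return [rec.record_id for rec in expired[:purge_limit]]


now = datetime(2024, 10, 21)
records = [
    ActivityRecord(1, "bucket-a", now - timedelta(days=200)),
    ActivityRecord(2, "bucket-a", now - timedelta(days=120)),
    ActivityRecord(3, "bucket-a", now - timedelta(days=10)),
    ActivityRecord(4, "bucket-b", now - timedelta(days=150)),  # only record for bucket-b
]
print(select_for_purge(records, retention_days=90, purge_limit=1000, now=now))    # [1, 2]
print(select_for_purge(records, retention_days=90, purge_limit=1, now=now))       # [1]
print(select_for_purge(records, retention_days=None, purge_limit=1000, now=now))  # []
```

In this sketch, capping each run keeps individual delete batches small, so a first run against a large backlog of old activity would spread the cleanup over several passes rather than one long delete.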

PostgreSQL optimization: Reclaiming space with zero downtime

A new guide shows Turbot Guardrails Enterprise administrators how to resize PostgreSQL databases and reclaim unused storage with minimal manual intervention and downtime. It walks through using pg_dump and pg_restore to recreate all schemas on a new instance and setting up logical replication to sync data from the original database. Once both databases are synchronized, the new instance seamlessly takes over.

While the entire process can take hours depending on the size of the environment, the bulk of the operation runs in the background, reducing the impact on normal operations. The end result is a more efficient database. Some customers have already reclaimed up to 50% of unused storage with no downtime.
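The ordering of these steps matters because PostgreSQL logical replication copies table rows but not schema definitions (DDL) or sequence values. The sketch below only builds an ordered list of commands for review and does not execute anything. It is not the procedure from Turbot's guide; the connection strings, publication and subscription names, the dump file name, and the Step and resize_plan names are placeholders.

```python
import shlex
from typing import NamedTuple


class Step(NamedTuple):
    run_on: str  # "shell", "source" (original database) or "target" (new instance)
    command: str


def sql_literal(text):
    """Quote text as a SQL string literal by doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def resize_plan(source_dsn, target_dsn, publication="resize_pub", subscription="resize_sub"):
    """Build (but do not run) the ordered steps for moving to a new, smaller instance.

    Assumes the source already has wal_level = logical and that application
    cutover is handled separately.
    """
    return [
        # 1. Recreate the definitions of all schemas on the new instance first:
        #    logical replication copies rows, not DDL.
        Step("shell", "pg_dump --schema-only --format=custom --file=schema.dump "
                      f"--dbname={shlex.quote(source_dsn)}"),
        Step("shell", f"pg_restore --no-owner --dbname={shlex.quote(target_dsn)} schema.dump"),
        # 2. Publish every table on the original database.
        Step("source", f"CREATE PUBLICATION {publication} FOR ALL TABLES;"),
        # 3. Subscribe from the new instance: existing rows are copied in the
        #    background, then ongoing changes stream until cutover.
        Step("target", f"CREATE SUBSCRIPTION {subscription} "
                       f"CONNECTION {sql_literal(source_dsn)} PUBLICATION {publication};"),
        # 4. After writes to the source stop and sequence values (which logical
        #    replication does not copy) are updated, detach the new instance.
        Step("target", f"DROP SUBSCRIPTION {subscription};"),
    ]


for step in resize_plan("host=old-db dbname=guardrails", "host=new-db dbname=guardrails"):
    print(f"[{step.run_on}] {step.command}")
```

Printing the plan lists the two shell commands first, then the SQL to run on the original database and on the new instance, which makes the dependency between the schema restore and the subscription explicit.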

Node.js upgrade: Enhancing Lambda performance

We’ve upgraded our AWS Lambda functions from Node.js 18 to Node.js 20 to improve performance and efficiency, with better memory management, faster startup times, and more responsive functions. The upgrade is handled behind the scenes as part of the latest Turbot Enterprise (TE) version, so administrators get quicker execution of Guardrails controls and more efficient handling of workloads without any extra work.

PostgreSQL 16 support: Unlocking new database performance gains

Turbot Guardrails Enterprise now supports PostgreSQL 16. Upgrading to this version of Postgres brings significant performance enhancements:

  • 40% faster query execution through improved parallelism
  • 20% more efficient indexing with B-tree deduplication (available since PostgreSQL 13)
  • 30% better write performance during large data syncs

Guardrails Enterprise customers can update their TED stack to PostgreSQL 16 to take advantage of these latest performance improvements.

A collaborative effort for greater impact

These optimizations, from the transition to ARM64 and the Redis memory improvements to the PostgreSQL upgrades and the new Activity Retention policy, are the result of close collaboration between Turbot’s product, operations, and customer success teams. By identifying areas where performance and cost can be improved, we ensure that both our SaaS infrastructure and self-hosted Guardrails environments benefit from these changes.

For our Turbot Guardrails Enterprise customers, these enhancements mean not only cost savings but also improved performance and operational efficiency. Whether it's running faster queries, reclaiming storage, or reducing the overhead of managing millions of records, we continue to focus on delivering a smarter, more cost-effective cloud governance solution.

We encourage all Guardrails Enterprise customers to take advantage of these optimizations. Reach out to your Customer Success team lead, who can help you get started.