
Auto Scaling refresher

A WWT customer, well along on their cloud journey, wanted to optimize performance for their thousands of users and keep ballooning infrastructure costs under control at the same time. The initial and obvious choice to achieve this goal was Amazon EC2 Auto Scaling.

Amazon EC2 Auto Scaling lets businesses maximize the benefits of the AWS cloud platform from multiple angles: better availability of instances across Availability Zones, fault tolerance and cost optimization. Auto Scaling keeps a minimum number of active servers handling your organization's load and automatically adjusts that number up and down as demand fluctuates.

Auto scaling visual

WWT's customer explored ways to implement Amazon EC2 Auto Scaling natively and quickly discovered that both developer and infrastructure teams would have to do a significant amount of extra work to ensure Auto Scaling was implemented in accordance with certain company requirements. This heavy workload made them question whether Auto Scaling servers in the cloud would actually save any money.

WWT's cloud experts

The customer engaged me, their embedded WWT Cloud Architect, at this point. WWT has played an integral role in this customer's cloud efforts from early on, working hand in hand to resolve many issues over the years. For this latest problem, I wanted to use an outside-the-box approach.

The solution? An innovative "pilot lighting" methodology that used cloud-native services such as AWS Lambda, Amazon SNS and Amazon CloudWatch to enhance native Auto Scaling. This approach would let the customer reap the cost-saving benefits they sought without extra development and infrastructure work.

But before we dive into the details of my solution, we need to understand the challenges that prompted the customer to engage WWT in the first place.

Challenges of native Auto Scaling

To configure Amazon EC2 Auto Scaling natively, you first select an Amazon Machine Image (AMI) and create what is known as a Launch Configuration (or Launch Template). This is where you specify parameters such as the AMI ID, instance type, Amazon EBS volumes to attach on launch, security groups and key pair. If you have ever launched an Amazon EC2 instance, you have specified this same information.

When Auto Scaling scales out, each new instance is launched from that AMI and built from scratch. The AMI might run a bootstrap user data script at launch; it could be a custom AMI with many components of an application built into the image; or it could rely on a configuration management tool like Puppet to configure the server at boot.
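For reference, here is a minimal sketch of the launch template inputs listed above, built as the `LaunchTemplateData` argument for boto3's `ec2.create_launch_template`. The key names follow the EC2 API; every value (AMI ID, instance type, key pair, security group, volume size, root device name) is a hypothetical placeholder, and the root device name in particular depends on the AMI (Amazon Linux commonly uses `/dev/xvda`, Windows AMIs `/dev/sda1`). Launch templates expect user data to be base64-encoded, which is why the bootstrap script is encoded here.

```python
import base64


def launch_template_data(ami_id, instance_type, key_name, security_group_ids,
                         user_data_script, volume_gb=30, root_device="/dev/xvda"):
    """Build a LaunchTemplateData dict; all argument values are caller-supplied placeholders."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "KeyName": key_name,
        "SecurityGroupIds": list(security_group_ids),
        # EBS root volume created at launch; the device name must match the AMI.
        "BlockDeviceMappings": [{
            "DeviceName": root_device,
            "Ebs": {"VolumeSize": volume_gb, "VolumeType": "gp3", "DeleteOnTermination": True},
        }],
        # Bootstrap script run at first boot; launch templates require base64 here.
        "UserData": base64.b64encode(user_data_script.encode("utf-8")).decode("ascii"),
    }
```

The result would then be passed as `ec2.create_launch_template(LaunchTemplateName="web-pool", LaunchTemplateData=data)`, where the template name is again a placeholder.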

Amazon EC2 Auto Scaling is a great solution path for a large majority of workloads. But it can become challenging for developers and cloud teams when you introduce enterprise requirements for server builds and/or additional setup complexities.

Challenge No. 1: Enterprise requirements

One challenge with native Auto Scaling arises when your organization has specific requirements that must be met during implementation. For example, say developers are required to join instances to their company's own Active Directory domain when implementing Amazon EC2 Auto Scaling. While AWS and other vendors have published blog posts about joining Auto Scaling instances to a domain, those solutions usually rely exclusively on AWS Managed Microsoft AD.

To meet their company's Active Directory requirement, developers would have to take the following steps: configure a launch lifecycle hook so that each new instance is held in the Pending:Wait state; use a bootstrap script to join the instance to the domain; reboot the instance; verify that the domain join succeeded; finish any other custom automation; and then complete the lifecycle action. Moreover, they would need to create a termination script to clean up Active Directory after servers were removed from the Auto Scaling Group and terminated.
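To make the extra work concrete, here is a minimal sketch of the final piece of a launch-hook bootstrap script, assuming a boto3 Auto Scaling client is passed in. `complete_lifecycle_action` and its `CONTINUE`/`ABANDON` results are part of the EC2 Auto Scaling API; the function name, the step list and the group and hook names are hypothetical. The sketch ignores the domain-join reboot, which a real script must survive (for example, by resuming on the next boot), and the matching terminate-hook cleanup script would still be needed on top of it.

```python
def complete_launch_hook(autoscaling, asg_name, hook_name, instance_id, steps):
    """Run bootstrap steps for an instance held in Pending:Wait, then release it.

    autoscaling is expected to be boto3.client("autoscaling"). steps is a list of
    (name, callable) pairs, hypothetical stand-ins for the customer's automation
    (domain join, post-reboot verification, app setup); each returns True on success.
    Returns (result, name of the failed step or None).
    """
    result, failed = "CONTINUE", None
    for name, step in steps:
        if not step():
            # ABANDON on a launch hook tells Auto Scaling to terminate the instance.
            result, failed = "ABANDON", name
            break
    autoscaling.complete_lifecycle_action(
        LifecycleHookName=hook_name,
        AutoScalingGroupName=asg_name,
        LifecycleActionResult=result,
        InstanceId=instance_id,
    )
    return result, failed
```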

In short, meeting this first challenge with native Auto Scaling would require at least two additional scripts.

Challenge No. 2: Complex/custom server setup

Another challenge for developers can arise if the application hosted on the instance has a complex setup process that requires customization on each individual instance, meaning the application cannot simply be a copy of an image.

Overcoming this challenge would likely require another script or a complex configuration management process, bringing the number of extra automation scripts or steps needed for native Auto Scaling to at least three.

Challenge No. 3: Server availability

Server availability can become a problem if your teams face the above challenges. After joining the domain, the server requires a reboot. Rebooting the server and completing any other infrastructure automation takes at least one to two minutes. If the application is difficult to configure, with many steps (and potential reboots), the delay before the server can receive requests will be even longer.

While this time delta can be lowered/optimized over time, doing so would involve additional time and investment from development teams who may prefer to focus on optimizing their application rather than engaging in bootstrapping efforts.

In the end, implementing Amazon EC2 Auto Scaling natively under the above scenarios can lead to a significant amount of extra work from both developer teams (for each app) and infrastructure teams (to ensure company requirements are in place in the infrastructure).

Is there a better way?

Pilot lighting cloud servers

I came up with a unique solution to these challenges for my customer. In brief, I developed a custom workflow using AWS Lambda, Amazon SNS and Amazon CloudWatch that could be deployed alongside the customer's Terraform-managed cloud infrastructure in multiple accounts to "pilot light" existing servers in an Auto Scaling manner. The benefit of this approach is that there is only one script to manage, and it works with any pool of similar servers that are fully configured and ready to serve traffic.

The solution allowed cloud teams to set up their servers to meet infrastructure requirements; it enabled development teams to configure the application on each server exactly how they wanted; and the customer reaped the benefits of Auto Scaling on demand (i.e., improved fault tolerance and cost optimization) without tapping development and infrastructure teams for bootstrapping work.

Code snapshot

Below is a flow diagram of the existing pilot lighting code I deployed, which runs every minute and constantly checks the environment:

Pilot light coding flow diagram

The workflow in the diagram follows this logic:

  1. Upon start, gather variables and validate. If these validations fail, exit the script to prevent any unintended consequences.
  2. Once the variables are gathered, determine our pool set. Check all the servers in the pool for dynamically generated "Scale Down" tags that potentially were applied by the solution on a previous run.
  3. If there is a "low CPU" tag, it carries a timestamp. The solution checks whether the time elapsed since that timestamp exceeds the configured power-off delay. For example, the solution could be configured so that if a server has not been added back to the Target Group within 10 minutes of a Scale Down event, the instance is powered off because the scaling event is deemed to be over.
  4. If there is a "high CPU" tag, it carries a count: the number of times the server was removed from the load balancer for individually exceeding a CPU threshold over a specified period.
    1. If there is a high CPU tag and the count exceeds a threshold of removals from the load balancer, the solution will reboot the instance. This typically fixes CPUs that get stuck near 100% and recovers the instance.
    2. If there is a high CPU tag and the count is less than the threshold of removals, check the CPU again. If the CPU is below the "high" limit, then remove the tag from the instance.
    3. All instances tagged with "high CPU" are considered last when servers are added back to the Target Group from the pool. This usually gives the CPU enough time to recover.
  5. The solution always guarantees a minimum number of servers in the Target Group, even if no scaling event has occurred (Scale Up or Scale Down). This is a configurable count.
  6. After deciding whether to reboot or power off any servers, the solution evaluates the average CPU across all instances in the Target Group. If this is above a threshold, it adds a configurable number of servers to the Target Group with a Scale Up event, balancing their placement across Availability Zones to keep the application highly available.
  7. If no Scale Up event is required, a separate set of data points is evaluated to determine whether the solution can Scale Down. If it can, it removes one instance from the Target Group and tags it with "low CPU" and a timestamp of the removal time. Because the solution runs every minute, this ensures a slow drain of servers in case of burst traffic. The solution also evaluates Availability Zone balance when draining: it removes an instance from an unbalanced Availability Zone or, otherwise, picks one from an Availability Zone at random.
  8. Lastly, we can mark servers with a maintenance tag to ensure that they are neither added to the Target Group nor powered off in case an engineer wants to investigate potential issues with the server.
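The decision logic above can be sketched as a pure function that takes a snapshot of the pool and returns the actions for one run. This is a minimal illustration under stated assumptions, not the code WWT deployed: the tag keys, thresholds, the `Server`/`Config` shapes and the action names are hypothetical, and the scale-down test uses a single average-CPU threshold as a stand-in for the article's "separate set of data points". Keeping AWS calls out of the planner makes each rule easy to unit test; a Lambda handler would gather the snapshot from EC2, Elastic Load Balancing and CloudWatch, call the planner, then carry out the actions.

```python
import random
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical tag keys: the article does not publish the real tag names.
LOW_CPU_TAG = "pilot-light:low-cpu"          # value: ISO-8601 time the server was drained
HIGH_CPU_TAG = "pilot-light:high-cpu"        # value: number of high-CPU removals
MAINTENANCE_TAG = "pilot-light:maintenance"  # server is left alone entirely


@dataclass
class Server:
    instance_id: str
    az: str
    in_target_group: bool
    running: bool
    cpu: float                    # recent average CPU %, e.g. from CloudWatch
    tags: dict = field(default_factory=dict)


@dataclass
class Config:                     # illustrative defaults, not the customer's values
    min_in_service: int = 2
    scale_up_cpu: float = 70.0
    scale_down_cpu: float = 30.0  # simplified stand-in for the scale-down data points
    scale_up_step: int = 2
    high_cpu_limit: float = 90.0
    max_high_cpu_removals: int = 3
    power_off_after: timedelta = timedelta(minutes=10)


def pick_for_scale_up(standby, in_service, n):
    """Prefer servers without a high-CPU tag, then the AZ with the fewest in-service servers."""
    counts = Counter(s.az for s in in_service)
    remaining, chosen = list(standby), []
    while remaining and len(chosen) < n:
        best = min(remaining, key=lambda s: (HIGH_CPU_TAG in s.tags, counts[s.az]))
        chosen.append(best)
        counts[best.az] += 1
        remaining.remove(best)
    return chosen


def pick_for_drain(in_service, rng):
    """Drain from the most crowded AZ if AZs are unbalanced, else from a random AZ."""
    counts = Counter(s.az for s in in_service)
    most = max(counts.values())
    if most > min(counts.values()):
        az = next(a for a, c in counts.items() if c == most)
    else:
        az = rng.choice(sorted(counts))
    return next(s for s in in_service if s.az == az)


def plan_actions(servers, cfg, now, rng=random):
    """Return (action, instance_id) tuples for one scheduled run."""
    actions = []
    pool = [s for s in servers if MAINTENANCE_TAG not in s.tags]  # step 8
    for s in pool:
        # Step 3: power off servers drained longer ago than the configured delay.
        if LOW_CPU_TAG in s.tags and s.running and not s.in_target_group:
            if now - datetime.fromisoformat(s.tags[LOW_CPU_TAG]) > cfg.power_off_after:
                actions.append(("stop", s.instance_id))
        # Step 4: reboot repeat offenders, otherwise clear the tag once CPU recovers.
        if HIGH_CPU_TAG in s.tags:
            if int(s.tags[HIGH_CPU_TAG]) > cfg.max_high_cpu_removals:
                actions.append(("reboot", s.instance_id))
            elif s.cpu < cfg.high_cpu_limit:
                actions.append(("clear_high_cpu_tag", s.instance_id))

    in_service = [s for s in pool if s.in_target_group]
    standby = [s for s in pool if not s.in_target_group]
    avg_cpu = sum(s.cpu for s in in_service) / len(in_service) if in_service else 0.0

    # Steps 5 and 6: enforce the minimum, and scale up when average CPU is high.
    needed = max(cfg.min_in_service - len(in_service), 0)
    if avg_cpu > cfg.scale_up_cpu:
        needed = max(needed, cfg.scale_up_step)
    if needed:
        actions += [("add", s.instance_id) for s in pick_for_scale_up(standby, in_service, needed)]
        return actions

    # Step 7: drain at most one server per run so burst traffic is not cut off.
    if avg_cpu < cfg.scale_down_cpu and len(in_service) > cfg.min_in_service:
        actions.append(("remove_and_tag_low_cpu", pick_for_drain(in_service, rng).instance_id))
    return actions
```

Passing `rng` in (for example, `random.Random(seed)`) keeps the random Availability Zone choice reproducible in tests.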

Limitations of pilot lighting

WWT's pilot lighting solution does have limitations. First, the pool of servers must be fully configured before they can be added to the Target Group, so the solution can only scale up to the total number of servers that are "ready" to take application traffic. With Amazon EC2 Auto Scaling, by contrast, the number of servers can grow dynamically to handle the load. However, if you have done adequate performance testing and traffic analysis on your servers, you can size the pool for expected load plus a margin for unforeseen traffic.
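The sizing exercise described above reduces to simple arithmetic. This is a minimal sketch with hypothetical inputs: peak load and per-server capacity would come from your own performance tests (in any consistent unit, such as requests per second), and headroom is the extra fraction reserved for unforeseen traffic. Rounding up to a multiple of the Availability Zone count is an assumption that keeps the pool evenly spread across zones.

```python
import math


def pool_size(peak_load, per_server_capacity, headroom, num_azs):
    """Servers to pre-build: peak load plus headroom, rounded up to a multiple of num_azs."""
    if per_server_capacity <= 0 or headroom < 0 or num_azs < 1:
        raise ValueError("capacity must be > 0, headroom >= 0 and num_azs >= 1")
    # Servers needed to carry the peak plus the safety margin.
    needed = math.ceil(peak_load * (1 + headroom) / per_server_capacity)
    # Round up so every Availability Zone holds the same number of servers.
    return math.ceil(needed / num_azs) * num_azs
```

For example, a peak of 1,200 requests per second, 100 per server, 25% headroom and three zones gives `pool_size(1200, 100, 0.25, 3) == 15`.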

Second, pilot lighting currently evaluates only CPU utilization. However, because the solution is written in a modular way, another function could be added to the Lambda code to evaluate other CloudWatch metrics (e.g., request count). All inputs are controlled through environment variables, and nothing else in the code is hard-coded.
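A minimal sketch of that modular design, assuming hypothetical environment variable names: configuration is read and validated up front (matching step 1 of the workflow, so a bad value stops the run before any server is touched), and each supported metric maps to one small evaluator function. `CPUUtilization` and `RequestCount` are real CloudWatch metric names for EC2 and Application Load Balancers respectively, but fetching their datapoints is out of scope here; the function and variable names are illustrative only.

```python
import os
from typing import Callable, Dict, Mapping, Sequence

# Hypothetical variable names; the article says only that all inputs come from
# Lambda environment variables.
REQUIRED_VARS = ("TARGET_GROUP_ARN", "POOL_NAME", "MIN_IN_SERVICE", "SCALE_UP_THRESHOLD")

# One function per supported metric turns raw datapoints into a single load figure.
# Supporting a new metric means adding an entry here.
EVALUATORS: Dict[str, Callable[[Sequence[float], int], float]] = {
    # Average CPU % across the in-service instances.
    "CPUUtilization": lambda values, n_targets: sum(values) / len(values),
    # Total requests spread across the in-service targets.
    "RequestCount": lambda values, n_targets: sum(values) / n_targets,
}


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Gather and validate inputs; raise so the run exits before acting."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    metric = env.get("SCALING_METRIC", "CPUUtilization")
    if metric not in EVALUATORS:
        raise ValueError(f"unsupported SCALING_METRIC: {metric}")
    min_in_service = int(env["MIN_IN_SERVICE"])
    threshold = float(env["SCALE_UP_THRESHOLD"])
    if min_in_service < 1 or threshold <= 0:
        raise ValueError("MIN_IN_SERVICE must be >= 1 and SCALE_UP_THRESHOLD > 0")
    return {
        "target_group_arn": env["TARGET_GROUP_ARN"],
        "pool_name": env["POOL_NAME"],
        "min_in_service": min_in_service,
        "threshold": threshold,
        "metric": metric,
    }


def should_scale_up(cfg: dict, values: Sequence[float], n_targets: int) -> bool:
    """Apply the configured metric's evaluator; no data means no scaling decision."""
    if not values or n_targets < 1:
        return False
    return EVALUATORS[cfg["metric"]](values, n_targets) > cfg["threshold"]
```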

Benefits of pilot lighting

WWT's pilot lighting solution brings capacity online far faster than native Amazon EC2 Auto Scaling in this scenario because the servers are already fully built. The Lambda function simply powers them on (if they are stopped) and adds them to the Target Group. The time before a server can serve traffic depends only on its boot time and its initial registration with the Target Group, with no custom lifecycle actions, server preparation or application preparation.
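A minimal sketch of that scale-out path, assuming boto3 clients for EC2 and Elastic Load Balancing v2 are passed in. `start_instances`, the `instance_running` waiter, `delete_tags` and `register_targets` are standard boto3 operations; the function name and the low-CPU tag key are hypothetical. In a real Lambda function the waiter's polling counts against the function timeout, and the target only receives traffic after it passes the target group's health checks.

```python
def bring_into_service(ec2, elbv2, instance_id, target_group_arn, is_running,
                       low_cpu_tag_key="pilot-light:low-cpu"):
    """Power on a pre-built server if needed and add it to the target group.

    ec2 / elbv2 are expected to be boto3.client("ec2") / boto3.client("elbv2").
    low_cpu_tag_key is a hypothetical tag name for the scale-down marker.
    """
    if not is_running:
        ec2.start_instances(InstanceIds=[instance_id])
        # Poll until EC2 reports the instance as running.
        ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    # Clear the scale-down marker so a later run does not power the server off.
    ec2.delete_tags(Resources=[instance_id], Tags=[{"Key": low_cpu_tag_key}])
    elbv2.register_targets(TargetGroupArn=target_group_arn, Targets=[{"Id": instance_id}])
```

It would be called as `bring_into_service(boto3.client("ec2"), boto3.client("elbv2"), instance_id, target_group_arn, is_running)`; injecting the clients keeps the function testable with simple stand-ins.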

In addition, the customer enjoys the benefits of Amazon EC2 Auto Scaling while having their enterprise requirements met. As such, they can realize tremendous cost savings (tens of thousands of dollars per month across multiple environments), with the potential for even more if they pilot light other applications with similar requirements.


The ability of WWT architects and engineers to think outside the box, encouraged by our company's deep expertise in multicloud, software development and many other technology domains, can lead to truly impactful innovations for organizations. Other companies that are deploying Auto Scaling natively and run into similar issues can use this pilot lighting methodology too. If you have questions about our methodology, approach or code, please reach out to your WWT Account Manager or contact us to get in touch with our Cloud or Application Services experts.