
ECS Service Auto Scaling

  • Dynamically adjusts the desired number of running ECS tasks based on load or metrics.
  • Powered by AWS Application Auto Scaling.

Key Metrics for Scaling

  • ECS Service Average CPU Utilization
  • ECS Service Average Memory Utilization
  • ALB Request Count per Target (from Application Load Balancer)
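
In the Application Auto Scaling API, these three metrics correspond to predefined metric types. The following is a minimal Python sketch of that mapping. The helper name `predefined_metric_spec`, the short keys (`cpu`, `memory`, `alb_requests`), and the example resource label are illustrative assumptions, not something defined by these notes.

```python
# Predefined metric types that Application Auto Scaling accepts for ECS services.
PREDEFINED_METRICS = {
    "cpu": "ECSServiceAverageCPUUtilization",
    "memory": "ECSServiceAverageMemoryUtilization",
    "alb_requests": "ALBRequestCountPerTarget",
}


def predefined_metric_spec(kind, resource_label=None):
    """Build a PredefinedMetricSpecification dict (hypothetical helper)."""
    spec = {"PredefinedMetricType": PREDEFINED_METRICS[kind]}
    if kind == "alb_requests":
        # The ALB metric must name the load balancer and target group it refers to.
        if resource_label is None:
            raise ValueError("ALBRequestCountPerTarget requires a resource label")
        spec["ResourceLabel"] = resource_label
    return spec


# Example with a made-up label of the form app/<alb>/<id>/targetgroup/<tg>/<id>.
example = predefined_metric_spec(
    "alb_requests",
    "app/demo-alb/0123456789abcdef/targetgroup/demo-tg/fedcba9876543210",
)
```

The CPU and memory metrics need no extra identifier because they are reported per service; only the ALB metric needs the label.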

Scaling Methods

  • Target Tracking Scaling – Keeps a CloudWatch metric at a specific target value.
  • Step Scaling – Responds to CloudWatch alarms with scaling actions based on thresholds.
  • Scheduled Scaling – Runs scaling actions at fixed times/dates (good for predictable traffic patterns).
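
To make the three methods concrete, the sketch below builds the request arguments that each one might use with the Application Auto Scaling API (`put_scaling_policy` and `put_scheduled_action`). It only constructs dictionaries and makes no AWS calls. The cluster and service names, policy names, step sizes, cooldown, and schedule are all assumed values chosen for illustration.

```python
# Identifies the ECS service being scaled; cluster/service names are hypothetical.
SERVICE = {
    "ServiceNamespace": "ecs",
    "ResourceId": "service/demo-cluster/demo-service",
    "ScalableDimension": "ecs:service:DesiredCount",
}


def target_tracking_policy(target_cpu_percent):
    """Arguments for put_scaling_policy: hold average CPU near the target."""
    return {
        **SERVICE,
        "PolicyName": "cpu-target-tracking",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(target_cpu_percent),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
            },
        },
    }


def step_scaling_policy():
    """Arguments for put_scaling_policy: add tasks in steps once an alarm fires."""
    return {
        **SERVICE,
        "PolicyName": "cpu-step-scale-out",
        "PolicyType": "StepScaling",
        "StepScalingPolicyConfiguration": {
            "AdjustmentType": "ChangeInCapacity",
            "MetricAggregationType": "Average",
            "Cooldown": 60,
            "StepAdjustments": [
                # Bounds are offsets from the alarm threshold, not absolute values.
                {"MetricIntervalLowerBound": 0.0,
                 "MetricIntervalUpperBound": 15.0,
                 "ScalingAdjustment": 1},
                {"MetricIntervalLowerBound": 15.0, "ScalingAdjustment": 2},
            ],
        },
    }


def scheduled_action(min_tasks, max_tasks):
    """Arguments for put_scheduled_action: change capacity limits daily at 08:00 UTC."""
    return {
        **SERVICE,
        "ScheduledActionName": "daily-morning-scale-up",
        "Schedule": "cron(0 8 * * ? *)",
        "ScalableTargetAction": {"MinCapacity": min_tasks, "MaxCapacity": max_tasks},
    }
```

With boto3 installed and credentials configured, these dictionaries could be passed as keyword arguments to the `application-autoscaling` client, for example `client.put_scaling_policy(**target_tracking_policy(70))`, after the service has been registered as a scalable target. A step scaling policy does nothing on its own; a CloudWatch alarm has to name it as an alarm action.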

Scope of Scaling

  • ECS Service Auto Scaling → Adjusts task count.
  • EC2 Auto Scaling → Adjusts container instance count (EC2 launch type).
  • Fargate launch type → Only task count is adjusted, via ECS Service Auto Scaling (no instance scaling needed).

EC2 Launch Type – Auto Scaling EC2 Instances

When running ECS with the EC2 launch type, scaling ECS tasks may require scaling the underlying EC2 instances.

Auto Scaling Group (ASG)

  • Manages the number of EC2 instances in the cluster.
  • Scales based on CloudWatch metrics such as CPU utilization.
  • Ensures additional EC2 instances are launched when needed to run more tasks.

ECS Cluster Capacity Provider

  • Integrates ECS with an ASG for automated infrastructure scaling.
  • Monitors ECS task capacity needs (CPU, RAM).
  • Automatically provisions EC2 instances when there's insufficient cluster capacity.
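
As a rough illustration of how a capacity provider ties an ASG to ECS, the sketch below builds the arguments for the ECS `create_capacity_provider` call with managed scaling enabled. It makes no AWS calls. The function name, the provider name, and the ARN are placeholders assumed for the example.

```python
def capacity_provider_request(name, asg_arn, target_capacity=100):
    """Arguments for the ECS create_capacity_provider call (sketch only)."""
    if not 1 <= target_capacity <= 100:
        raise ValueError("target_capacity must be between 1 and 100")
    return {
        "name": name,
        "autoScalingGroupProvider": {
            "autoScalingGroupArn": asg_arn,
            "managedScaling": {
                # ENABLED lets ECS adjust the ASG size on the cluster's behalf.
                "status": "ENABLED",
                # Target utilization (percent) of the ASG's capacity.
                "targetCapacity": target_capacity,
            },
            "managedTerminationProtection": "DISABLED",
        },
    }


# Placeholder ARN; a real one comes from the Auto Scaling group itself.
request = capacity_provider_request(
    "demo-capacity-provider",
    "arn:aws:autoscaling:us-east-1:123456789012:autoScalingGroup:"
    "00000000-0000-0000-0000-000000000000:autoScalingGroupName/demo-asg",
)
```

A target capacity below 100 leaves spare headroom on the instances, which can shorten the wait when new tasks are placed, at the cost of paying for idle capacity.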

ECS Scaling – CPU Usage Example

Scenario: ECS Service Auto Scaling based on CPU utilization.
  1. CloudWatch Metric
      • ECS service reports Average CPU Utilization to CloudWatch.
  2. CloudWatch Alarm
      • Alarm triggers when CPU exceeds a set threshold (e.g., 70%).
  3. Service Auto Scaling Action
      • Policy adds new ECS tasks (e.g., from 2 to 3 tasks) to handle increased demand.
  4. Infrastructure Scaling (Optional)
      • If the EC2 launch type is used and the cluster lacks resources, the Capacity Provider + ASG launches more EC2 instances.
This combined scaling approach ensures both the application layer (tasks) and the infrastructure layer (instances) can respond to traffic spikes.
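
The scenario can be walked through with a toy simulation. This is a simplified model, not how AWS implements scaling: it assumes every instance fits the same fixed number of tasks, adds one task per evaluation, and uses hypothetical names (`Cluster`, `scale_step`) and limits chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    instances: int
    tasks_per_instance: int  # simplification: identical capacity on every instance


def scale_step(avg_cpu, tasks, cluster, threshold=70.0, max_tasks=10):
    """Return (task_count, instance_count) after one scaling evaluation."""
    # Application layer: the alarm fires above the threshold and one task is added.
    if avg_cpu > threshold and tasks < max_tasks:
        tasks += 1
    # Infrastructure layer: add instances until every task has a place to run.
    instances = cluster.instances
    while instances * cluster.tasks_per_instance < tasks:
        instances += 1
    return tasks, instances


cluster = Cluster(instances=1, tasks_per_instance=2)
print(scale_step(85.0, 2, cluster))  # (3, 2): one more task, one more instance
print(scale_step(40.0, 2, cluster))  # (2, 1): below threshold, nothing changes
```

In the first call, CPU at 85% pushes the service from 2 to 3 tasks, and since one instance only fits 2 tasks in this model, a second instance is added.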