© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Paul Maddox, Amazon Web Services
Alfonso Acosta, Weaveworks
December 1, 2016
Operational Management
with Amazon ECS
What to Expect from the Session
• Shared model of operational responsibility
• Deployment
• Availability
• Cost optimization
• Scaling
• Security
• Monitoring & logging
• Weaveworks: Networking and monitoring in ECS
• Weave Net
• Weave Scope
What *not* to Expect from the Session
• CON302 - Development Workflow with Docker and
Amazon ECS (CI/CD)
• CON309 - Running Microservices on Amazon ECS
(service discovery)
Key Components
Development cluster
Container instance Container instance
Container instance
Production cluster
Container instance Container instance
Container instance
Amazon EC2 Container Service
(Amazon ECS)
Task definition
Amazon EC2 Container Registry
(Amazon ECR)
Component: ECS
AWS is responsible for
operations of the cloud
You are responsible for operations in the cloud
using the building blocks provided.
Cost Control
$ aws ecs create-cluster --cluster-name dev
Component: ECR
AWS is responsible for
operations of the cloud
You are responsible for operations in the cloud
using the building blocks provided.
Cost Control
Component: Container Instances
Development cluster
Cluster instance Cluster instance
Cluster instance
AWS is responsible for
operations of the cloud
Deployment Cost Control
Patching Monitoring
Scaling Availability
You are responsible for operations in the cloud
using the building blocks provided.
Component: Container Instances
• An EC2 instance (or collection of)
• Running Docker
• With the open-source ECS agent running
Tip: Use ECS-optimized AMIs
echo "ECS_CLUSTER=dev" >> /etc/ecs/ecs.config
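As a minimal sketch of how that one-liner might sit in an instance's user data, the helper below appends the cluster name to the agent configuration. The function name `write_ecs_config` and the optional path argument are assumptions made for illustration (the path override only exists so the sketch can be tried outside an ECS-optimized AMI).

```shell
#!/bin/bash
# Hypothetical helper: register this instance with a named ECS cluster by
# appending ECS_CLUSTER to the agent config file read by the ECS agent.
# The second argument is an assumption for local experimentation; on a real
# container instance the default path is the one the slide uses.
write_ecs_config() {
  local cluster_name="$1"
  local config_path="${2:-/etc/ecs/ecs.config}"
  echo "ECS_CLUSTER=${cluster_name}" >> "${config_path}"
}
```

In user data this would be called as `write_ecs_config dev`, before the ECS agent starts.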
Container Instances: Building Blocks Provided
Cost Control
Update your AMI, replace instances
Auto Scaling group
Reserved Instances
CLI SDKs etc...
IAM Inspector VPC Flow Logs etc...
Spot Fleet
Component: Tasks & Containers
AWS is responsible for
operations of the cloud
You are responsible for operations in the cloud
using the building blocks provided.
How Should I Set This Up?
Use the AWS
Management Console?
Not repeatable
How Should I Set This Up?
Flex your scripting skills?
What happens if
my script fails
halfway through?
How long should I pause?
How do I upgrade /
roll back?
#!/bin/bash
set -e
CLUSTER_NAME="dev"
CLUSTER_ID=$(aws ecs create-cluster --cluster-name "$CLUSTER_NAME" | jq -r '.cluster.clusterArn')
# TODO: Don't forget to add error checks here
aws ec2 run-instances \
  --instance-type t2.medium \
  --image-id ami-1924770e \
  --user-data "echo ECS_CLUSTER=$CLUSTER_NAME >> /etc/ecs/ecs.config"
# ???
sleep 120
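One way to answer "how long should I pause?" without a fixed sleep is to poll the cluster until instances have registered. The sketch below is illustrative only: `wait_for_container_instances` is a hypothetical helper, and the timeout and interval defaults are arbitrary assumptions. It relies on the `registeredContainerInstancesCount` field that `aws ecs describe-clusters` returns.

```shell
#!/bin/bash
# Hypothetical helper: poll until at least <wanted> container instances have
# registered with <cluster>, or give up after <timeout> seconds.
wait_for_container_instances() {
  local cluster="$1" wanted="$2" timeout="${3:-300}" interval="${4:-10}"
  local waited=0 count
  while [ "$waited" -lt "$timeout" ]; do
    # Ask ECS how many instances are currently registered with the cluster.
    count=$(aws ecs describe-clusters --clusters "$cluster" \
      --query 'clusters[0].registeredContainerInstancesCount' \
      --output text) || return 1
    # A non-numeric answer (for example "None") simply counts as "not yet".
    if [ "$count" -ge "$wanted" ] 2>/dev/null; then
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 1
}
```

Even with polling, a script like this still has no rollback story, which is the point the slide goes on to make.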
AWS CloudFormation
Infrastructure as Code
This is Alice…
She needs to build a new environment.
It needs to be:
- A self-contained, deployable unit
- Repeatable
- Auditable
- Self-documenting
Luckily, Alice knows about CloudFormation…
Time to deploy!
alice@macbook:~$ aws cloudformation create-stack \
    --stack-name preprod \
    --template-body file:///Users/alice/env.yaml
Time to update…
alice@macbook:~$ aws cloudformation update-stack \
    --stack-name preprod \
    --template-body file:///Users/alice/env.yaml
When a new environment is required…
alice@macbook:~$ aws cloudformation create-stack \
    --stack-name production \
    --template-body file:///Users/alice/env.yaml
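Alice's create and update commands can be folded into one step. The following is a rough sketch under stated assumptions: `deploy_stack` is a hypothetical wrapper, it expects an absolute template path, and it treats any failure of `describe-stacks` as "stack does not exist". The `wait` subcommands block until CloudFormation reports completion.

```shell
#!/bin/bash
# Hypothetical wrapper: create the stack if it does not exist yet,
# otherwise update it, then wait for CloudFormation to finish.
deploy_stack() {
  local stack_name="$1" template="$2"
  if aws cloudformation describe-stacks --stack-name "$stack_name" >/dev/null 2>&1; then
    aws cloudformation update-stack \
      --stack-name "$stack_name" \
      --template-body "file://${template}" &&
    aws cloudformation wait stack-update-complete --stack-name "$stack_name"
  else
    aws cloudformation create-stack \
      --stack-name "$stack_name" \
      --template-body "file://${template}" &&
    aws cloudformation wait stack-create-complete --stack-name "$stack_name"
  fi
}
```

For example, `deploy_stack preprod /Users/alice/env.yaml`. Note that `update-stack` returns an error when the template contains no changes; reviewing a change set before applying an update is a more careful alternative.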
AWS CLI
$ aws ecr create-repository \
    --repository-name myapp
{
    "repository": {
        "registryId": "123456789012",
        "repositoryName": "myapp",
        "repositoryArn": "arn:aws:ecr:us-east...",
        "repositoryUri": "123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp"
    }
}
ECR
CloudFormation (YAML)
Resources:
  ECRRepository:
    Type: AWS::ECR::Repository
    Properties:
      RepositoryName: myapp
Using ECR
Use AWS CLI to perform ‘docker login’
Tip: Use the Amazon ECR Credential Helper for automatic logins
https://github.com/awslabs/amazon-ecr-credential-helper
$ $(aws ecr get-login)
$ docker pull <repo-url>/<image>:<version>
AWS CLI
$ aws ecs create-cluster \
    --cluster-name preprod
{
    "cluster": {
        "status": "ACTIVE",
        "clusterName": "preprod",
        "registeredContainerInstancesCount": 0,
        "pendingTasksCount": 0,
        "runningTasksCount": 0,
        "activeServicesCount": 0,
        "clusterArn": "arn:aws:ecs:us-east…"
    }
}
ECS Cluster
CloudFormation (YAML)
Resources:
  ECSCluster:
    Type: AWS::ECS::Cluster
    Properties:
      ClusterName: preprod
ECS Container Instances
• Highly available architecture, distributed
across multiple Availability Zones
• VPC with public and private subnets
• Application Load Balancer with path-based
routing for inbound traffic
• NAT gateways for outbound traffic
• Auto Scaling group of container instances
• CloudWatch Logs for centralized
container logging
[Diagram: two Availability Zones, each with a public subnet containing a NAT gateway and a private subnet containing container instances in an Auto Scaling group; a load balancer handles inbound traffic, and container logs go to CloudWatch Logs]
Inbound Traffic
$ curl -v https://api.example.com/v1/products/1
> GET /v1/products/1 HTTP/1.1
> Host: api.example.com
> User-Agent: curl/7.43.0
> Accept: */*
• Incoming HTTP/HTTPS traffic comes in
via the Application Load Balancer (ALB)
in public subnets
• The ALB uses path-based routing to
route /products/* to the container
instances in private subnets running our
product’s service
• Supports dynamic host port mapping,
allowing multiple containers of the same
type on each host
[Diagram: load balancer in front of container instances in an Auto Scaling group]
Outbound Traffic
• Our container instances are in private
subnets, with no direct internet access
• At some point, they might need access
to external services
• NAT gateways provide a highly scalable
and available solution
[Diagram: container instances in private subnets reach external services through NAT gateways in the public subnets]
Logging
[Diagram: container instances sending container logs to CloudWatch Logs]
• ECS integrates directly with CloudWatch
Logs (as well as others)
• Centralized collection of container logs
• Search, filter, and alert on log conditions
• (more to come later…)
tl;dr - ECS Reference Architecture on GitHub
https://github.com/awslabs/ecs-refarch-cloudformation
Cost Optimization
Reserved Instances
Up to 75% Savings*
• Use Auto Scaling groups
• Reserve ECS container
instances when you have
known baseline capacity
requirements.
• Use On-Demand pricing for
capacity peaks.
* Dependent on specific AWS service, size/type, and region
Spot Instances
Up to 90% Savings*
• Use Spot Fleet to maintain
instance availability and
define cluster based on
required CPU/memory.
* Compared to On-Demand price based on specific EC2 instance type, region, and Availability Zone
Multiple ECS Clusters
Creating multiple ECS clusters is easy, and often more cost
efficient. Consider availability and compute requirements.
Example: Development Cluster
Spot Fleet
Example: Production Cluster
Auto Scaling group with Reserved Instances for baseline and
On-Demand for capacity peaks
Example: Batch Processing Cluster
Spot Fleet of GPU Instances
Scaling ECS Container Instances Automatically
Scale out as needed
• Use Auto Scaling groups
• Set Auto Scaling group
min, max, desired
• Scale in and out based
on CloudWatch alarms
Scaling ECS Container Instances Automatically
Tip: Use the ECS cluster MemoryReservation
CloudWatch metric
Tutorial: Scaling Container Instances with CloudWatch Alarms
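As a sketch of the alarm side of this setup, the helper below creates a CloudWatch alarm on the cluster-level `MemoryReservation` metric that ECS publishes in the `AWS/ECS` namespace. The helper name, the alarm name, the 75% default threshold, and the three one-minute evaluation periods are illustrative assumptions; `policy_arn` must be the ARN of a scale-out policy that already exists on the Auto Scaling group.

```shell
#!/bin/bash
# Hypothetical helper: alarm when the cluster's average memory reservation
# stays above <threshold> percent, and trigger an existing scaling policy.
create_scale_out_alarm() {
  local cluster="$1" policy_arn="$2" threshold="${3:-75}"
  aws cloudwatch put-metric-alarm \
    --alarm-name "${cluster}-memory-reservation-high" \
    --namespace AWS/ECS \
    --metric-name MemoryReservation \
    --dimensions "Name=ClusterName,Value=${cluster}" \
    --statistic Average \
    --period 60 \
    --evaluation-periods 3 \
    --threshold "$threshold" \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions "$policy_arn"
}
```

A matching scale-in alarm would use `LessThanThreshold` with a lower threshold and a scale-in policy.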
Application Auto Scaling for ECS Services
Application Auto Scaling for ECS Services
Patching ECS Container Instances
ECSLaunchConfiguration:
  Type: AWS::AutoScaling::LaunchConfiguration
  Properties:
    ImageId: ami-1924770e
ECSAutoScalingGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    MinSize: 2
    MaxSize: 8
    DesiredCapacity: 2
  UpdatePolicy:
    AutoScalingRollingUpdate:
      MinInstancesInService: 2
      MaxBatchSize: 2
      PauseTime: PT15M
      WaitOnResourceSignals: true
1. Ensure you have an
AutoScalingRollingUpdate policy
on your Auto Scaling group
2. Update the AMI in your
CloudFormation template
3. aws cloudformation update-stack
4. Let CloudFormation perform a rolling
update to your ECS container instances
Patching Containers
Minimal Containers
• Use the smallest FROM
base container to minimize
attack surface
• FROM scratch is ideal for
Go and other languages
that compile a (near) static
binary
IAM Roles
IAM roles for container instances:
• Bound to the ECS container instance
• Applies to all containers running on the host
• Pulling images from ECR
• CloudWatch Logs
IAM roles for tasks:
• Bound to specific ECS tasks
• Task-specific access to AWS services
Tip: Use the principle of least privilege; prefer IAM roles for tasks where applicable
Environment Variables
• Quick and easy
• Configuration stored in task definition (or passed in)
• Version in immutable definition; easy rollback
• Good for configuration items
• Bad for secrets (API keys, passwords, etc.)
Configuration & Secrets Management
KMS + S3 / DynamoDB
• Use environment variables to provide
pointer to encrypted data in S3/DynamoDB
• Use KMS or AWS encryption clients to
encrypt secrets at rest
• Use VPC endpoints, IAM policies, and IAM
roles to restrict decryption
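A container entrypoint could resolve such a pointer roughly as follows. This is a sketch under assumptions: `fetch_secret` is a hypothetical helper, the object in S3 is assumed to hold raw ciphertext produced by KMS `encrypt` with a symmetric key, and the task's IAM role is assumed to allow `s3:GetObject` on the object and `kms:Decrypt` on the key.

```shell
#!/bin/bash
# Hypothetical helper: download a KMS-encrypted blob from S3 and print the
# decrypted plaintext on stdout. Bucket and key arrive as arguments, for
# example from environment variables set in the task definition.
fetch_secret() {
  local bucket="$1" key="$2"
  local tmp
  tmp=$(mktemp) || return 1
  # kms decrypt returns base64-encoded plaintext, so decode it afterwards.
  aws s3 cp "s3://${bucket}/${key}" "$tmp" --quiet &&
    aws kms decrypt --ciphertext-blob "fileb://${tmp}" \
      --query Plaintext --output text | base64 --decode
  local status=$?
  rm -f "$tmp"
  return "$status"
}
```

Usage might look like `DB_PASSWORD=$(fetch_secret "$SECRET_BUCKET" "$SECRET_KEY")`, where `SECRET_BUCKET` and `SECRET_KEY` are hypothetical environment variable names carrying the pointer.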
Configuration & Secrets Management
Monitoring & Logging
Monitoring with CloudWatch
Monitoring with CloudWatch
Centralized Logging with CloudWatch Logs
{
    "image": "nginx:latest",
    ...
    "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": "nginx",
            "awslogs-region": "us-east-1"
        }
    }
}
• Defined within the task definition
• Available log drivers
• awslogs
• fluentd
• gelf
• journald
• json-file
• splunk
• syslog
• Submit a pull request on the ECS agent
GitHub repo if you would like others
Centralized Logging with CloudWatch Logs
Tip: Use Metric Filters with CloudWatch Logs
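A metric filter turns matching log lines into a CloudWatch metric that alarms can watch. The sketch below is one possible setup, not a prescribed one: `configure_log_group` is a hypothetical helper, the `Custom/Logs` namespace and the 30-day default are assumptions, and the filter pattern is left to the caller. It also sets a retention policy so that logs expire rather than accumulate indefinitely.

```shell
#!/bin/bash
# Hypothetical helper: count log events matching <pattern> as a custom metric
# and expire old events from the log group after <retention_days>.
configure_log_group() {
  local log_group="$1" pattern="$2" metric_name="$3" retention_days="${4:-30}"
  aws logs put-metric-filter \
    --log-group-name "$log_group" \
    --filter-name "${metric_name}-filter" \
    --filter-pattern "$pattern" \
    --metric-transformations \
      "metricName=${metric_name},metricNamespace=Custom/Logs,metricValue=1" &&
  aws logs put-retention-policy \
    --log-group-name "$log_group" \
    --retention-in-days "$retention_days"
}
```

For instance, `configure_log_group nginx ERROR NginxErrors 14` would count lines containing the term ERROR in the `nginx` log group and keep events for 14 days.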
AWS is responsible for operations
of the cloud.
You are responsible for operations
in the cloud using the building
blocks provided.
Networking and Monitoring
Weave Net and Weave Scope
Weave Net
• Overlay network between hosts
• First container networking solution
• Automatic DNS-based service discovery
• Automatic IP allocation (IPAM)
• Minimum overhead (VXLAN)
• Gossip protocol to share updates (no explicit DB)
• Multi DC
• Encryption
Weave Net: Overlay Network
Weave Net: Service Discovery
Weave Net: Service Discovery
Weave Net on ECS
Weave Net on ECS
Weave Scope
Weave Scope
Scope Probe (host 1)
Scope Probe (host 2)
Scope Probe (host n)
Scope App
Reports (CRDT-like semantics)
Controls
Weave Cloud
Scope Probe (host 1)
Scope Probe (host 2)
Scope Probe (host n)
https://cloud.weave.works
Weave Net + Scope on ECS
Thank you!
Remember to complete
your evaluations!

Weaveworks at AWS re:Invent 2016: Operations Management with Amazon ECS

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Paul Maddox, Amazon Web Services Alfonso Acosta, Weaveworks December 1, 2016 Operational Management with Amazon ECS CON301
  • 2. What to Expect from the Session • Shared model of operational responsibility • Deployment • Availability • Cost optimization • Scaling • Security • Monitoring & logging • Weaveworks: Networking and monitoring in ECS • Weave Net • Weave Scope
  • 3. What *not* to Expect from the Session • CON302 - Development Workflow with Docker and Amazon ECS (CI/CD) • CON309 - Running Microservices on Amazon ECS (service discovery)
  • 4. Key Components Development cluster Container instance Container instance Container instance Production cluster Container instance Container instance Container instance Amazon EC2 Container Service (Amazon ECS) Container Container Volume Task definition Amazon EC2 Container Registry (Amazon ECR)
  • 5. Component: ECS AWS is responsible for operations of the cloud You are responsible for operations in the cloud using the building blocks provided. Deployment Security Patching Monitoring Scaling Availability Cost Control $ aws ecs create-cluster --cluster-name dev AWS Customer
  • 6. Component: ECR AWS is responsible for operations of the cloud You are responsible for operations in the cloud using the building blocks provided. Deployment Security Cost Control AWS Customer Monitoring Scaling Availability Patching
  • 7. Component: Container Instances Development cluster Cluster instance Cluster instance Cluster instance AWS is responsible for operations of the cloud Deployment Cost Control Patching Monitoring Scaling Availability Security AWS Customer You are responsible for operations in the cloud using the building blocks provided.
  • 8. Component: Container Instances • An EC2 instance (or collection of) • Running Docker • With the open-source ECS agent running Tip: Use ECS-optimized AMIs echo “ECS_CLUSTER=dev” >> /etc/ecs/ecs.config https://github.com/aws/amazon-ecs-agent http://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-optimized_AMI.html
  • 9. Container Instances: Building Blocks Provided Deployment Security Patching Monitoring Scaling Availability Cost Control CloudFormation Update your AMI, replace instances CloudWatch Auto Scaling group Reserved Instances CLI SDKs etc... IAM Inspector VPC Flow Logs etc... Spot Fleet
  • 10. Component: Tasks & Containers Container Container Volume AWS is responsible for operations of the cloud Deployment Security Patching Monitoring Scaling Availability Logging AWS Customer You are responsible for operations in the cloud using the building blocks provided.
  • 11. How Should I Set This Up? Use the AWS Management Console? Time-consuming Error-prone Not repeatable
  • 12. How Should I Set This Up? Flex your scripting skills? What happens if my script fails halfway through? How long should I pause? How do I upgrade / roll back? #!/bin/bash set -e CLUSTER_NAME="dev" AMI="ami-c8337dbb" CLUSTER_ID=$(aws ecs create-cluster --cluster-name $CLUSTER_NAME | jq '.cluster.clusterArn'); # TODO: Don't forget to add error checks here aws ec2 run-instances --instance-type t2.medium --image-id ami-1924770e --user-data "echo ECS_CLUSTER=$CLUSTER_NAME >> /etc/ecs/ecs.config" # ??? sleep 120
  • 14. This is Alice… She needs to build a new environment. It needs to be: - A self-contained, deployable unit - Repeatable - Auditable - Self-documenting
  • 15. Luckily, Alice knows about CloudFormation…
  • 16. Time to deploy! alice@macbook:~$ aws cloudformation create-stack --stack-name preprod --template-body file:///Users/alice/env.yaml …or…
  • 17. Time to update… alice@macbook:~$ aws cloudformation update-stack --stack-name preprod --template-body file:///Users/alice/env.yaml …or…
  • 18. When a new environment is required… alice@macbook:~$ aws cloudformation create-stack --stack-name production --template-body file:///Users/alice/env.yaml …or…
  • 19. AWS CLI $ aws ecr create-repository --repository-name myapp { "repository": { "registryId": "123456789012", "repositoryName": "myapp", "repositoryArn": "arn:aws:ecr:us-east...", "repositoryUri": "123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp" } } ECR CloudFormation (YAML) Resources: ECRRepository: Type: AWS::ECR::Repository Properties: RepositoryName: myapp
  • 20. Using ECR Use AWS CLI to perform ‘docker login’ Tip: Use the Amazon ECR Credential Helper for automatic logins https://github.com/awslabs/amazon-ecr-credential-helper $ $(aws ecr get-login) $ docker pull <repo-url>/<image>:<version>
  • 21. AWS CLI $ aws ecs create-cluster --cluster-name preprod { "cluster": { "status": "ACTIVE", "clusterName": "preprod", "registeredContainerInstancesCount": 0, "pendingTasksCount": 0, "runningTasksCount": 0, "activeServicesCount": 0, "clusterArn": "arn:aws:ecs:us-east…" } } ECS Cluster CloudFormation (YAML) Resources: ECSCluster: Type: AWS::ECS::Cluster Properties: ClusterName: preprod
  • 22. ECS Container Instances • Highly available architecture, distributed across multiple Availability Zones • VPC with public and private subnets • Application Load Balancer with path based routing for inbound traffic • NAT gateways for outbound traffic • Auto Scaling group of container instances • CloudWatch Logs for centralized container logging Private Subnet Availability Zone Availability Zone Internet Gateway Public Subnet Public Subnet Private Subnet Nat GatewayNat Gateway AutoScaling GroupContainer InstanceContainer Instance Container InstanceContainer Instance Application Load Balancer CloudWatch Logs (container logs)
  • 23. Inbound Traffic $ curl -v https://api.example.com/v1/products/1 > GET /v1/products/1 HTTP/1.1 > Host: api.example.com > User-Agent: curl/7.43.0 > Accept: */* • Incoming HTTP/HTTPS traffic comes in via the Application Load Balancer (ALB) in public subnets • The ALB uses path-based routing to route /products/* to the container instances in private subnets running our product’s service • Supports dynamic host port mapping, allowing multiple containers of the same type on each host Internet Gateway AutoScaling GroupContainer Instance Container Instance Application Load Balancer
  • 24. Outbound Traffic • Our container instances are in private subnets, with no direct internet access • At some point, they might need access to external services • NAT gateways provide a highly scalable and available solution Private Subnet Internet Gateway Public Subnet Public Subnet Private Subnet Nat GatewayNat Gateway Container Instance Container Instance
  • 25. Logging Container Instance Container Instance CloudWatch Logs (container logs) • ECS integrates directly with CloudWatch Logs (as well as others) • Centralized collection of container logs • Search, filter, and alert on log conditions • (more to come later…)
  • 26. tl;dr - ECS Reference Architecture on GitHub https://github.com/awslabs/ecs-refarch-cloudformation
  • 28. Reserved Instances Up to 75% Savings* • Use Auto Scaling groups • Reserve ECS container instances when you have known baseline capacity requirements. • Use On-Demand pricing for capacity peaks. * Dependent on specific AWS service, size/type, and region
  • 29. Spot Instances Up to 90% Savings* • Use Spot Fleet to maintain instance availability and define cluster based on required CPU/memory. * Compared to On-Demand price based on specific EC2 instance type, region, and Availability Zone
  • 30. Multiple ECS Clusters Creating multiple ECS clusters is easy, and often more cost efficient. Consider availability and compute requirements. Example: Development Cluster Spot Fleet Example: Production Cluster Auto Scaling group with Reserved Instances for baseline and On-Demand for capacity peaks Example: Batch Processing Cluster Spot Fleet of GPU Instances
  • 32. Scaling ECS Container Instances Automatically Min Desired Scale out as needed Max • Use Auto Scaling groups • Set Auto Scaling group min, max, desired • Scale in and out based on CloudWatch alarms
  • 33. Scaling ECS Container Instances Automatically Tip Use the ECS cluster MemoryReservation CloudWatch metric Tutorial: Scaling Container Instances with CloudWatch Alarms
  • 34. Application Auto Scaling for ECS Services
  • 35. Application Auto Scaling for ECS Services
  • 37. Patching ECS Container Instances ECSLaunchConfiguration: Type: AWS::AutoScaling::LaunchConfiguration Properties: ImageId: ami-1924770e ECSAutoScalingGroup: Type: AWS::AutoScaling::AutoScalingGroup Properties: MinSize: 2 MaxSize: 8 DesiredCapacity: 2 UpdatePolicy: AutoScalingRollingUpdate: MinInstancesInService: 2 MaxBatchSize: 2 PauseTime: PT15M WaitOnResourceSignals: true 1. Ensure you have an AutoScalingRollingUpdate policy on your Auto Scaling group 2. Update the AMI in your CloudFormation template 3. aws cloudformation update-stack 4. Let CloudFormation perform a rolling update to your ECS container instances
  • 39. Minimal Containers • Use the smallest FROM base container to minimize attack surface • FROM scratch is ideal for Go and other languages that compile a (near) static binary
  • 40. IAM Roles IAM roles for container instances: • Bound to the ECS container instance • Applies to all containers running on the host • Pulling images from ECR • CloudWatch Logs IAM roles for tasks: • Bound to specific ECS tasks • Task-specific access to AWS services Tip Use principle of least privilege – prefer IAM roles for tasks where applicable
  • 41. Environment Variables • Quick and easy • Configuration stored in task definition (or passed in) • Version in immutable definition; easy rollback • Good for configuration items • Bad for secrets (API keys, passwords, etc.) Configuration & Secrets Management
  • 42. KMS + S3 / DynamoDB • Use environment variables to provide pointer to encrypted data in S3/DynamoDB • Use KMS or AWS encryption clients to encrypt secrets at rest • Use VPC endpoints, IAM policies, and IAM roles to restrict decryption Configuration & Secrets Management
  • 46. Centralized Logging with CloudWatch Logs { "image": "nginx:latest", ... "logConfiguration": { "logDriver": "awslogs", "options": { "awslogs-group": "nginx", "awslogs-region": "us-east-1" } } } • Defined within the task definition • Available log drivers • awslogs • fluentd • gelf • journald • json-file • splunk • syslog • Submit a pull request on the ECS agent GitHub repo if you would like others
  • 47. Centralized Logging with CloudWatch Logs
  • 48. Tip: Use Metric Filters with CloudWatch Logs 5
  • 49. Summary AWS is responsible for operations of the cloud. You are responsible for operations in the cloud using the building blocks provided.
  • 50. Networking and Monitoring Weave Net and Weave Scope
  • 51. Weave Net • Overlay network between hosts • First container networking solution • Automatic DNS-based service discovery • Automatic IP allocation (IPAM) • Minimum overhead (VxLan) • Gossip protocol to share updates (no explicit DB) • Multi DC • Encryption
  • 53. Weave Net: Service Discovery
  • 54. Weave Net: Service Discovery
  • 55. Weave Net on ECS ??
  • 58. Weave Scope Scope Probe (host 1) Scope Probe (host 2) Scope Probe (host n) Scope App Reports (CRDT-like semantics) Controls
  • 59. Weave Cloud Scope Probe (host 1) Scope Probe (host 2) Scope Probe (host n) https://cloud.weave.works
  • 60. Weave Net + Scope on ECS https://cloud.weave.works

Editor's Notes

  1. SIMPLY not JUST
  2. /
  3. Mention Tagging
  4. Mention Change Sets
  5. Quite a lot of text
  6. Security is #1 priority
  7. Mention expiring the logs
  8. What’s weave, goal How it complements ECS AMIs / CloudFormation
  9. Each container gets an IP Also: Multicast, AWS VPC Data-center agnostic
  10. Each hexagon is a container Each container gets its own IP (no port clashes) Non-fully connected topology Multi cloud, multi region, even multi-orchestrator Routing and naming information is propagated through gossip without a central DB tolerant to partitions
  11. All the containers are created with name NAME Weave creates DNS records for each container and propagates it through Gossip A client can access the containers by that name and requests will be load balanced, randomly client-side
  12. Sample 2-tier application
  13. Explain ECS infrastructure How is service discovery done? * Statically (list of IPs associated to each service) ELB ALB This requires management
  14. This is what we provide in the AMIs and CloudFormation This is how we solve service discovery with Weave Net Explain how: Each node is equipped with Weave Router/DNS, propagating routing and DNS information Traffic itself doesn’t normally go through Weave: VXLAN How Weave Proxy intercepts calls
  15. Visualization monitoring and control solution NO INSTRUMENTATION!!! Weave Scope describes and lets you interact with your microservice application without any instrumentation, you just need to run an agent (probe) in each of your hosts
  16. Weave Scope standalone is open source
  17. Weave cloud hosts Scope for you * Providing enterprise features: authentication, team management Zero management and firewall problems
  18. The Weave AMIs and Cloud formation Templates also come equipped with Weave Scope