• Six Pillars
  • Introduced at re:Invent in 2015
  • A set of questions you should ask about the architecture of your system
  • General Design Principles
    • Stop guessing your capacity needs
    • Test systems at production scale
    • Automate to make architectural experimentation easier
    • Allow for evolutionary architectures
    • Data-driven architecture decisions
    • Improve through “game days” to simulate production events

Security

  • Design Principles
    • Apply security at all layers
    • Enable traceability
    • Automate responses to security events
    • Focus on securing your system
    • Automate security best practices
  • Shared Responsibility Model
    • Customers responsible for data, platform, applications, IAM, OS/Firewall Configuration, Client Side Encryption, Network Traffic Protection
    • AWS responsible for Compute, Storage, Database, Network, Regions, AZs, Edge Locations
  • Definition
    • Security consists of 4 areas
      • Data Protection
        • Basic data classification should be in place. Organize and classify data in segments such as publicly available, available to certain members, etc.
        • You should implement a least privilege access system so people can only access what they need
        • You should encrypt everything wherever possible
      • Privilege Management
        • Ensure only authorized and authenticated users can access your resources
          • Access Control Lists
          • Role Based Access Control
          • Password Management
      • Infrastructure Protection
        • How you protect your VPC from the world
      • Detective Controls
        • Use detective controls to identify security breaches
        • CloudTrail, CloudWatch, S3, Glacier
  • Key Services
    • ELB, EBS, S3, RDS for data protection
    • IAM w/ MFA for privilege management
    • VPC for infrastructure protection
    • CloudTrail, CloudWatch, Config for detective controls
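
The least-privilege idea under Data Protection can be sketched as an IAM policy document built in Python. This is a minimal sketch: the helper `read_only_bucket_policy` and the bucket name `example-reports-bucket` are hypothetical, and the code only builds the JSON; it does not call AWS.

```python
import json


def read_only_bucket_policy(bucket_name: str) -> dict:
    """Build an IAM policy document granting read-only access to one S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Grant only the actions needed to list and read objects.
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",    # s3:ListBucket applies to the bucket
                    f"arn:aws:s3:::{bucket_name}/*",  # s3:GetObject applies to its objects
                ],
            }
        ],
    }


if __name__ == "__main__":
    # "example-reports-bucket" is a hypothetical bucket name.
    print(json.dumps(read_only_bucket_policy("example-reports-bucket"), indent=2))
```

Anything not listed, such as writing or deleting objects, stays denied by default, which is the point of starting from least privilege.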

Reliability

  • Covers the ability of the system to recover from either service or infrastructure outage, as well as dynamically scale
  • Design Principles
    • Test your recovery procedures (think Netflix Simian Army)
    • Automatically recover from failure. Monitor KPIs to detect failures, then trigger automated recovery
    • Scale horizontally to increase aggregate system availability
    • Stop guessing capacity
  • Definitions
    • Reliability in the cloud consists of the following:
      • Foundations - ensure your foundations are in place before you “lay the first brick”. Make sure the prerequisite infrastructure is in place before jumping into code!
        • AWS handles a lot of this for you, but they do set up service limits to stop customers from accidentally over-provisioning
      • Change Management - be aware of how change affects a system. Use monitoring to detect changes and react.
        • Use CloudWatch to monitor and auto-scaling to react to those changes
      • Failure Management - you should always architect your system with the assumption that failure will occur
  • Key Services
    • IAM and VPC for foundations
    • CloudTrail for change management
    • CloudFormation for failure management
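
The monitor-and-react loop under Change Management can be sketched as a proportional scaling rule. This is a simplified illustration and not the actual Auto Scaling algorithm; the function `desired_capacity`, the 50% CPU target, and the size limits are assumptions made for the example.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     min_size: int = 1, max_size: int = 10) -> int:
    """Return the instance count that would bring the metric back to target.

    Simplified proportional rule: if average CPU is twice the target,
    roughly twice as many instances are needed.
    """
    if current < 1 or target <= 0:
        raise ValueError("current must be >= 1 and target must be > 0")
    wanted = math.ceil(current * metric / target)
    # Clamp to the group's limits so a metric spike cannot over-provision.
    return max(min_size, min(max_size, wanted))


if __name__ == "__main__":
    # Hypothetical readings: average CPU of 80% and 20% against a 50% target.
    print(desired_capacity(current=4, metric=80.0, target=50.0))  # scales out
    print(desired_capacity(current=4, metric=20.0, target=50.0))  # scales in
```

The `max_size` clamp plays the same role as the service limits mentioned under Foundations: it bounds how far an automated reaction can go.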

Performance Efficiency

  • Focuses on how to use compute resources efficiently to meet business needs, and how to change as demand evolves
  • Design Principles
    • Democratize advanced technologies: rather than learning how to manage advanced services yourself, use hosted services in the cloud
    • Go global in minutes
    • Use serverless architectures
    • Experiment more often
  • Definition
    • Compute
      • Choose the right kind of server
      • With AWS you can change the type of server with almost a click of a button, or go serverless with Lambda
    • Storage
      • Understand the access needs of your system before selecting a storage solution
    • Database
      • Understand how to select the proper type of database based on application access needs, consistency requirements, etc.
    • Space/Time Tradeoff
      • Add read replicas to RDS to reduce load on databases by creating multiple copies
      • Use Direct Connect to provide predictable latency between your on-premises environment and AWS
  • Key Services
    • Compute: Auto Scaling
    • Storage: EBS, S3, Glacier
    • Database: RDS, DynamoDB, Redshift
    • Space/Time: CloudFront, ElastiCache, etc.
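
The read-replica trade-off under Space/Time can be sketched as a small router that sends writes to the primary and spreads reads across the copies. The class `ReplicaRouter` and the endpoint names are hypothetical, and detecting reads by a leading `SELECT` is a deliberate simplification.

```python
import itertools
from typing import Sequence


class ReplicaRouter:
    """Send writes to the primary and spread reads across replicas."""

    def __init__(self, primary: str, replicas: Sequence[str]):
        self.primary = primary
        # Fall back to the primary when no replicas exist.
        self._read_cycle = itertools.cycle(list(replicas) or [primary])

    def endpoint_for(self, sql: str) -> str:
        """Pick an endpoint: round-robin over replicas for reads, primary otherwise."""
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._read_cycle) if is_read else self.primary


if __name__ == "__main__":
    # Hypothetical endpoint names.
    router = ReplicaRouter("primary-db", ["replica-1", "replica-2"])
    print(router.endpoint_for("SELECT * FROM orders"))
    print(router.endpoint_for("INSERT INTO orders VALUES (1)"))
```

The extra copies cost storage and can serve slightly stale data, since RDS read replicas are updated asynchronously; that is the space spent to buy read throughput.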

Cost Optimization

  • Reduce your costs to a minimum and use those savings for other parts of your business
  • Design Principles
    • Transparently attribute expenditure - identify the ROI of each system and use it as an incentive to save cost
    • Use managed services to reduce cost of ownership and maintenance
    • Trade capital expense for operating expense - pay only for what you consume instead of purchasing expensive equipment and data centers
    • Benefit from economies of scale
    • Stop spending money on data center operations
  • Definition
    • Matched supply and demand
      • Try to optimally align supply with demand
      • Don’t over-provision your resources, auto scale instead
      • Or use Lambda/Serverless
    • Cost effective resources
      • Using the correct instance type is key to cost savings
      • Understand that sometimes the cheapest instance type isn’t the right answer. A t2.micro that runs for 10 hours to complete a task can cost more than an xlarge instance that finishes it in minutes
    • Expenditure Awareness
      • You no longer have to get quotes on physical servers
      • Use tags to track which business unit each cost belongs to, and set up billing alerts
    • Optimize over time
      • AWS moves incredibly fast, with hundreds of new services and features released per year.
      • Look at Trusted Advisor for recommendations
  • Key Services
    • Matched supply and demand: Auto Scaling
    • Cost Effective Resources: EC2, AWS Trusted Advisor
    • Expenditure Awareness: CloudWatch, SNS
    • Optimizing Over Time: AWS Trusted Advisor, AWS Blog
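
The instance-type comparison under Cost effective resources comes down to simple arithmetic. The sketch below uses made-up hourly rates and runtimes, not real AWS prices, and assumes billing is proportional to runtime; the function `job_cost` is hypothetical.

```python
def job_cost(hourly_rate: float, runtime_hours: float) -> float:
    """Cost of one job, assuming billing is proportional to runtime."""
    return hourly_rate * runtime_hours


# Hypothetical rates in dollars per hour; these are NOT real AWS prices.
SMALL_RATE = 0.01
LARGE_RATE = 0.20

small_cost = job_cost(SMALL_RATE, 10.0)  # small instance needs 10 hours
large_cost = job_cost(LARGE_RATE, 0.1)   # large instance needs 6 minutes

if __name__ == "__main__":
    print(f"small instance: ${small_cost:.2f}")
    print(f"large instance: ${large_cost:.2f}")
```

With these assumed numbers the larger instance is cheaper even though its hourly rate is 20 times higher. The conclusion flips if the speedup is smaller than the price ratio, so measure the actual runtime before choosing.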

Operational Excellence

  • Design Principles
    • Perform operations with code
    • Align operational processes to business objectives
    • Make small, regular, incremental changes
    • Test for responses to unexpected events
    • Learn from operational events and failures
    • Keep operations procedures current
  • Definition
    • Preparation
      • Should have runbooks and playbooks. Runbooks offer operations guidance for normal operations, playbooks offer guidance for unexpected events
      • Use CloudFormation
      • Implement Auto Scaling
      • Use AWS Config rules to create mechanisms to automatically track and respond to changes in your AWS workload
    • Operation
      • Should be standardized and manageable on a routine basis
      • Focus on automation and small, frequent changes
      • Changes should not require scheduled downtime, nor should they require manual execution
      • Can set up CI/CD pipelines within AWS
    • Response
      • Responses to unexpected events should also be automated.
      • Alerts should be timely and invoke escalations as needed
  • Key Services
    • Preparation: AWS Service Catalog, AWS Config, Auto Scaling, SQS
    • Operation: CodeCommit, CodeDeploy, CodePipeline, CloudTrail
    • Response: CloudWatch, SNS
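
The escalation idea under Response can be sketched as a lookup over time thresholds. The ladder below (contacts and minute thresholds) and the function `escalation_target` are hypothetical; in practice the notification itself would go through something like SNS.

```python
# Hypothetical escalation ladder: (minutes unacknowledged, who to notify).
# Thresholds must be listed in ascending order.
ESCALATION_STEPS = [
    (0, "on-call engineer"),
    (15, "team lead"),
    (30, "engineering manager"),
]


def escalation_target(minutes_unacknowledged: float) -> str:
    """Return the highest rung of the ladder whose threshold has been reached."""
    target = ESCALATION_STEPS[0][1]
    for threshold, contact in ESCALATION_STEPS:
        if minutes_unacknowledged >= threshold:
            target = contact
    return target


if __name__ == "__main__":
    for minutes in (0, 20, 45):
        print(minutes, "->", escalation_target(minutes))
```

Keeping the ladder as data rather than code makes it easy to keep current, in line with the principle of keeping operations procedures up to date.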

Sustainability

  • Introduced at re:Invent 2021
  • Focuses on sustainable solutions to reduce carbon footprint of products and services