AWS Interview Cheat Sheet

The Big Six — Know These Cold

EC2 — Elastic Compute Cloud

What it is:Virtual machines (instances) running in AWS data centers

Purchasing:On-Demand, Reserved (1/3yr), Spot, Savings Plans, Dedicated Host

Billing:Per second with a 60-second minimum (Linux incl. Ubuntu and RHEL, Windows); some other software (e.g. hourly-priced Marketplace AMIs) is billed per hour

Families:t/m (general), c (compute), r/x (memory), p/g (GPU), i/d (storage)

Key feature:User Data scripts run on first boot to auto-configure instances

AMI:Amazon Machine Image — template (OS + software) used to launch instances

S3 — Simple Storage Service

What it is:Object storage for files, images, backups, logs, static websites

Durability:99.999999999% (eleven nines) — replicated across ≥ 3 AZs (One Zone classes use a single AZ)

Max object size:5 TB per object (multipart upload required above 5 GB)

Storage classes:Standard, IA, One Zone-IA, Glacier Instant/Flexible/Deep Archive, Intelligent-Tiering

Not for:Block storage or OS drives — use EBS for that

Buckets:Globally unique name; data stored in a specific region

IAM — Identity & Access Management

What it is:Controls who can access which AWS resources and how

Entities:Users (people/apps), Groups (collections of users), Roles (temporary credentials), Policies (JSON permission docs)

Root account:Never use for daily tasks; lock with MFA immediately

Principle:Least privilege — grant only the permissions required

Roles:Used by AWS services (e.g., EC2 to access S3); avoids hardcoding credentials

Global:IAM is not region-specific — applies across all regions

VPC — Virtual Private Cloud

What it is:Your own logically isolated network within AWS

Subnets:Public (has route to IGW) vs Private (no direct internet access)

IGW:Internet Gateway — attaches to VPC, enables internet access for public subnets

NAT Gateway:Lets private subnet instances reach internet outbound; blocks inbound

Default VPC:Each AWS account gets one default VPC per region (CIDR: 172.31.0.0/16)

CIDR:IP range of the VPC, e.g. 10.0.0.0/16 (65,536 IPs)
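
CIDR arithmetic can be checked with Python's standard `ipaddress` module. This is a small sketch that assumes AWS's rule of reserving five addresses in every subnet (the first four and the last); the /24 and /20 splits are just example layouts.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # first four addresses + the last address of every subnet

def subnet_plan(vpc_cidr: str, new_prefix: int):
    """Split a VPC CIDR into equal subnets; return (VPC size, subnet count, usable IPs per subnet)."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = list(vpc.subnets(new_prefix=new_prefix))
    usable = subnets[0].num_addresses - AWS_RESERVED_PER_SUBNET
    return vpc.num_addresses, len(subnets), usable

print(subnet_plan("10.0.0.0/16", 24))    # (65536, 256, 251)
print(subnet_plan("172.31.0.0/16", 20))  # (65536, 16, 4091)
```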

RDS — Relational Database Service

What it is:Fully managed SQL databases — AWS handles patching, backups, HA

Engines:MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, IBM Db2, Aurora

Multi-AZ:Synchronous standby replica in another AZ; auto failover (HA, not read scaling)

Read Replicas:Asynchronous; for read scaling; can be cross-region; no automatic failover (manual promotion only)

Backups:Automated backups 1–35 day retention + manual snapshots

Not for:NoSQL or key-value workloads — use DynamoDB instead

Lambda

What it is:Run code without provisioning or managing servers (serverless/FaaS)

Triggers:S3 events, API Gateway, SQS, SNS, DynamoDB Streams, CloudWatch, EventBridge

Pricing:Pay per request ($0.20 per 1M requests) + duration billed in 1 ms increments, priced per GB-second of configured memory

Limits:Max 15 min timeout; 128 MB – 10 GB memory; 512 MB–10 GB ephemeral /tmp

Concurrency:Default 1,000 concurrent executions per region (soft limit; can increase)

Cold start:Extra latency when Lambda must initialize a new execution environment (first invocation or scale-out); mitigate with Provisioned Concurrency
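
The pricing line turns into simple arithmetic. This is a rough estimator, not AWS's billing logic: it assumes x86 us-east-1 list prices ($0.20 per million requests, about $0.0000166667 per GB-second), ignores the free tier and extra ephemeral storage, and rounds the average duration rather than each invocation.

```python
import math

# Assumed list prices (x86, us-east-1); check the Lambda pricing page for your region/architecture.
PRICE_PER_MILLION_REQUESTS = 0.20
PRICE_PER_GB_SECOND = 0.0000166667

def lambda_monthly_cost(invocations: int, avg_duration_ms: float, memory_mb: int) -> float:
    """Estimate monthly Lambda cost in USD (no free tier, default 512 MB /tmp)."""
    billed_ms = math.ceil(avg_duration_ms)  # duration rounds up to the next 1 ms
    gb_seconds = invocations * (billed_ms / 1000) * (memory_mb / 1024)
    request_cost = invocations / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    compute_cost = gb_seconds * PRICE_PER_GB_SECOND
    return round(request_cost + compute_cost, 2)

# 1M invocations x 100 ms x 512 MB = 50,000 GB-s
print(lambda_monthly_cost(1_000_000, 100, 512))  # 1.03
```

Doubling memory doubles the GB-second price per millisecond, but because CPU scales with memory the duration often drops, so the cheapest setting is usually found by measuring.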

Interview tip: Know the compute model differences — EC2 (IaaS, you manage OS), Elastic Beanstalk (PaaS, AWS manages infra), Lambda (FaaS, serverless), ECS/EKS (containers). They'll ask "which would you use for X?"

EC2 Purchasing Options

Type	Savings	Commitment	Best for
On-Demand	—	None	Short-term, unpredictable workloads; testing
Reserved Instances (Standard)	Up to 72%	1 or 3 years	Steady-state, predictable usage (specific instance type/region)
Reserved Instances (Convertible)	Up to 54%	1 or 3 years	Steady-state but need flexibility to change instance type
Savings Plans (Compute)	Up to 66%	1 or 3 years	Flexible — applies to EC2, Lambda, Fargate across any region/family
Savings Plans (EC2 Instance)	Up to 72%	1 or 3 years	Like Standard RI but more flexible (any size/OS in a family/region)
Spot Instances	Up to 90%	None (interruptible)	Fault-tolerant batch jobs, CI/CD, big data, stateless web
Dedicated Instances	—	None (On-Demand, Reserved, or Spot)	Hardware dedicated to your account (may be shared with your other instances); no host-level control
Dedicated Hosts	—	On-Demand or Reserved	Compliance, BYOL (Bring Your Own License) requirements

EC2 Instance Families

Instance Types

t, m:General purpose — balanced CPU/memory/network; default choice

c:Compute optimized — high CPU-to-memory ratio; gaming, HPC, batch

r, x, z:Memory optimized — high RAM; in-memory DBs, real-time big data

p, g, inf, trn:Accelerated computing — GPU, ML training/inference

i, d, h:Storage optimized — high local storage I/O; i = NVMe SSD (high random IOPS), d/h = HDD (high sequential throughput, e.g. Hadoop)

Naming:e.g. m5.xlarge — family(m) + generation(5) + size(xlarge)
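
The naming scheme can be parsed with a regular expression. This hypothetical helper only covers the common `<family><generation><attributes>.<size>` pattern and deliberately rejects outliers such as `u-6tb1.metal` instead of guessing.

```python
import re

# Hypothetical helper; covers the common pattern only (outliers raise ValueError).
INSTANCE_TYPE = re.compile(
    r"^(?P<family>[a-z]+)(?P<generation>\d+)(?P<attributes>[a-z]*)\.(?P<size>[a-z0-9]+)$"
)

def parse_instance_type(name: str) -> dict:
    """Split an EC2 instance type such as 'c7gn.2xlarge' into its parts."""
    match = INSTANCE_TYPE.match(name)
    if match is None:
        raise ValueError(f"unrecognized instance type: {name}")
    return match.groupdict()

# Common attribute letters: g = AWS Graviton, a = AMD, i = Intel,
# d = local NVMe instance store, n = enhanced networking.
print(parse_instance_type("m5.xlarge"))
# {'family': 'm', 'generation': '5', 'attributes': '', 'size': 'xlarge'}
print(parse_instance_type("c7gn.2xlarge"))
# {'family': 'c', 'generation': '7', 'attributes': 'gn', 'size': '2xlarge'}
```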

Other Compute Services

Elastic Beanstalk:PaaS — deploy code, AWS handles load balancing/scaling/patching

ECS:Elastic Container Service — managed Docker orchestration (AWS proprietary)

EKS:Elastic Kubernetes Service — managed Kubernetes control plane

Fargate:Serverless containers — no EC2 instances to manage; works with ECS & EKS

Lightsail:Simple VPS — fixed monthly price; good for small projects/beginners

AWS Batch:Fully managed batch computing at any scale using EC2/Spot/Fargate

Outposts:AWS hardware in your on-prem data center for hybrid cloud

EC2 Storage Options

Storage type	Persistence	Scope	Use case
Instance Store	Ephemeral — lost on stop/terminate	Local to host	Temp buffer, cache; highest I/O performance
EBS gp3 (SSD)	Persistent	One AZ, one instance at a time*	OS volumes, databases, general workloads
EBS io2 Block Express	Persistent	One AZ	Latency-sensitive databases (SAP HANA, Oracle)
EBS st1 / sc1 (HDD)	Persistent	One AZ	Throughput-heavy, sequential (log processing, cold data)
EFS (NFS)	Persistent	Multi-AZ, multi-instance	Shared file system across many EC2 instances
S3	Persistent	Regional, globally accessible	Object storage — backups, media, static content

Key distinction: EBS is locked to one AZ and normally attaches to one instance at a time (io1/io2 Multi-Attach lets up to 16 Nitro instances in the same AZ share a volume). EFS spans multiple AZs automatically. S3 is not mounted like a drive; it's accessed via API/URL.

S3 Storage Classes — Cost vs Access Speed

Class	Retrieval speed	Min storage	Best for
S3 Standard	Instant (ms)	None	Frequently accessed data; web assets, active content
S3 Intelligent-Tiering	Instant (ms)	None	Unknown or changing access patterns; auto-optimizes cost
S3 Standard-IA	Instant (ms, retrieval fee applies)	30 days	Infrequent access; disaster recovery backups
S3 One Zone-IA	Instant (ms, retrieval fee applies)	30 days	Infrequent, reproducible data; only one AZ (risk of loss)
S3 Glacier Instant Retrieval	Milliseconds	90 days	Archived data accessed ~once a quarter
S3 Glacier Flexible Retrieval	Minutes to 12 hours	90 days	Long-term backup archives with occasional retrieval
S3 Glacier Deep Archive	12–48 hours	180 days	7–10 year regulatory retention; lowest cost storage in AWS
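
The "Min storage" column matters because an object deleted, overwritten, or transitioned early is still charged (pro-rated) for the remaining days. A minimal sketch, keyed by the storage-class identifiers used in the S3 API:

```python
# Minimum storage durations in days, taken from the table above.
MIN_STORAGE_DAYS = {
    "STANDARD": 0,
    "INTELLIGENT_TIERING": 0,
    "STANDARD_IA": 30,
    "ONEZONE_IA": 30,
    "GLACIER_IR": 90,
    "GLACIER": 90,        # Glacier Flexible Retrieval
    "DEEP_ARCHIVE": 180,
}

def billable_days(storage_class: str, days_stored: int) -> int:
    """Storage days you pay for when an object is removed after `days_stored` days."""
    return max(days_stored, MIN_STORAGE_DAYS[storage_class])

print(billable_days("STANDARD_IA", 7))     # 30
print(billable_days("DEEP_ARCHIVE", 200))  # 200
```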

S3 Key Features

Access & Security

Bucket policies:JSON resource-based policies — grant/deny access to bucket/objects

Block Public Access:Account or bucket level setting — always enable unless you have a specific reason not to

Server-side encryption:SSE-S3 (AWS-managed keys), SSE-KMS (your KMS key), SSE-C (customer-provided key)

Presigned URLs:Time-limited, temporary access URLs for private objects (e.g. download link that expires)

ACLs:Legacy per-object access control; AWS recommends bucket policies instead
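
A common bucket-policy pattern is denying any request that is not made over HTTPS via the `aws:SecureTransport` condition key. The sketch below just builds that JSON document in Python; the bucket name is hypothetical, and the output could be applied with `aws s3api put-bucket-policy`.

```python
import json

def deny_insecure_transport_policy(bucket: str) -> dict:
    """Resource-based policy that rejects every non-HTTPS request to the bucket and its objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",    # bucket-level actions (e.g. ListBucket)
                    f"arn:aws:s3:::{bucket}/*",  # object-level actions (e.g. GetObject)
                ],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }

print(json.dumps(deny_insecure_transport_policy("example-bucket"), indent=2))
```

Because it is an explicit Deny, it wins over any Allow granted elsewhere, which is why it is safe to layer on top of existing access rules.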

Functionality

Versioning:Keep all versions of every object; protects against accidental deletion/overwrites

Lifecycle rules:Automatically transition objects to cheaper classes or expire them by age

Replication:CRR (cross-region replication) for compliance/latency; SRR (same-region) for log aggregation

Static website hosting:Serve HTML/CSS/JS files directly from S3 with a public endpoint

Event notifications:Trigger Lambda, SQS, or SNS on object PUT/DELETE events

S3 Select:Query CSV/JSON/Parquet data in place with SQL, no download needed; closed to new customers since July 2024 (Athena is the usual alternative)

Other Storage Services

Block & File Storage

EBS volumes:gp2/gp3 (SSD, general), io1/io2 (provisioned IOPS), st1/sc1 (HDD)

EBS Snapshots:Point-in-time backups stored in S3; incremental; can copy cross-region

EFS:Elastic NFS file system; auto-scales; Linux workloads; Regional (multi-AZ) or One Zone file systems

FSx for Windows:Managed Windows File Server (SMB protocol, Active Directory integration)

FSx for Lustre:High-performance parallel file system for HPC, ML, video processing

Data Transfer & Hybrid

Storage Gateway:Bridge on-prem to AWS — File Gateway (S3), Volume Gateway (EBS), Tape Gateway (Glacier)

AWS DataSync:Automate data transfer between on-prem/S3/EFS/FSx; AWS cites up to 10x faster than open-source tools

Snowcone:Smallest Snow device — 8 TB usable; portable, rugged

Snowball Edge:80 TB usable; Storage Optimized or Compute Optimized variant; edge processing

Snowmobile:Exabyte-scale — 100 PB per truck; for massive data center migrations

AWS Backup:Centralized, policy-based backup across EC2, RDS, DynamoDB, EFS, S3, etc.

Choosing the Right Database

Service	Type	Best for	Key fact
RDS	Relational (SQL)	OLTP; traditional apps needing ACID compliance	Managed; choice of several commercial and open-source engines
Aurora	Relational (MySQL/PostgreSQL compatible)	High-throughput SQL; production workloads	Up to 5× faster than MySQL; auto-healing, 6-copy replication across 3 AZs
Aurora Serverless v2	Relational	Variable/unpredictable workloads	Scales in fractions of ACUs; pay per use
DynamoDB	NoSQL (key-value + document)	Serverless, single-digit ms latency, massive scale	Fully managed; auto-scales; no server to manage
ElastiCache for Redis	In-memory cache	Session management, leaderboards, pub/sub, caching	Sub-millisecond; supports rich data types
ElastiCache for Memcached	In-memory cache	Simple distributed caching (key-value only)	Multi-threaded; simpler than Redis
Redshift	Data warehouse (columnar SQL)	OLAP — analytics, BI, large-scale reporting	Petabyte-scale; Redshift Spectrum queries S3 directly
DocumentDB	Document (MongoDB-compatible)	JSON document storage; content catalogs, user profiles	Not actual MongoDB — AWS-built compatible engine
Neptune	Graph	Social networks, fraud detection, knowledge graphs	Supports Gremlin, openCypher, and SPARQL
Timestream	Time-series	IoT sensor data, telemetry, operational metrics	Auto-scales; faster and cheaper than relational for time-series
QLDB	Ledger (immutable)	Financial audit trails, supply chain provenance	Cryptographically verifiable transaction log; AWS announced end of support (July 31, 2025)
Keyspaces	Wide column (Cassandra-compatible)	Cassandra workloads without managing infrastructure	Serverless; pay per use

RDS vs Aurora vs DynamoDB Deep Dive

RDS — Multi-AZ vs Read Replicas

Multi-AZ:Synchronous standby; auto-failover typically within 60–120s; for HA, not performance

Read Replicas:Asynchronous copy; scale read traffic; up to 5 per DB

Cross-region RR:Read Replicas can be in different regions (disaster recovery + low latency reads)

Promote RR:Can promote a Read Replica to a standalone DB (breaks replication)

Aurora Extras

Storage:Auto-grows in 10 GB increments up to 128 TB; 6 copies across 3 AZs

Read Replicas:Up to 15 Aurora Replicas; replication lag typically under 100 ms (replicas share the cluster storage volume)

Failover:Automatic; typically under 30s, faster than RDS Multi-AZ

Global Database:Primary + up to 5 read-only regions; <1s replication lag

DynamoDB Key Concepts

Primary key:Partition key alone, or partition key + sort key (composite)

Capacity modes:Provisioned (RCU/WCU) or On-Demand (pay per request)

GSI/LSI:Global Secondary Index (different partition and sort key; can be added anytime); Local Secondary Index (same partition key, alternate sort key; defined at table creation)

DynamoDB Streams:Ordered log of item-level changes — trigger Lambda in real time

DAX:DynamoDB Accelerator — in-memory cache; microsecond reads

TTL:Automatically delete items after the timestamp in a TTL attribute; TTL deletes consume no write capacity
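
Provisioned capacity sizing is a common interview calculation. One RCU covers one strongly consistent read per second of an item up to 4 KB (two eventually consistent reads, or half a transactional read); one WCU covers one write per second of up to 1 KB, and transactional writes cost double. A minimal sketch of those rules:

```python
import math

READ_FACTOR = {"strong": 1.0, "eventual": 0.5, "transactional": 2.0}

def read_capacity_units(item_kb: float, reads_per_sec: int, consistency: str = "strong") -> int:
    units_per_read = math.ceil(item_kb / 4)  # item size rounds up to the next 4 KB
    return math.ceil(units_per_read * READ_FACTOR[consistency] * reads_per_sec)

def write_capacity_units(item_kb: float, writes_per_sec: int, transactional: bool = False) -> int:
    units_per_write = math.ceil(item_kb)     # item size rounds up to the next 1 KB
    return units_per_write * writes_per_sec * (2 if transactional else 1)

print(read_capacity_units(6, 10))              # 20  (6 KB -> 2 units x 10 reads/s)
print(read_capacity_units(6, 10, "eventual"))  # 10
print(write_capacity_units(2.5, 5))            # 15  (2.5 KB -> 3 units x 5 writes/s)
```

On-Demand mode bills read and write request units using the same 4 KB / 1 KB size rules, so the same arithmetic helps estimate its cost.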

Remember: RDS = managed SQL (you choose the engine). Aurora = AWS-optimized MySQL/PostgreSQL (faster, better HA). DynamoDB = NoSQL, serverless, virtually unlimited horizontal scale. ElastiCache sits in front of any DB to cache hot reads.

VPC Core Components

VPC Building Blocks

Subnet:Divides VPC CIDR into smaller ranges; tied to one AZ; public (route to IGW) or private

Route Table:Rules controlling where traffic is directed; each subnet associates with one route table

Internet Gateway (IGW):Horizontally scaled, HA gateway; attaches to VPC for internet access

NAT Gateway:Managed service in public subnet; lets private instances initiate internet connections (outbound only)

Elastic IP:Static public IPv4 address; associated with an instance or NAT Gateway

VPC Endpoints:Private connection to AWS services (S3, DynamoDB) without internet; Gateway or Interface type

VPC Connectivity

VPC Peering:1-to-1 private connection between VPCs (same or different account/region); not transitive

Transit Gateway:Hub-and-spoke router; connect thousands of VPCs + on-prem; IS transitive

VPN (Site-to-Site):Encrypted IPSec tunnel from on-prem to AWS over public internet

Client VPN:OpenVPN-based; individual users connect to VPC securely

Direct Connect:Dedicated private connection to AWS; consistent bandwidth; NOT over the internet; dedicated ports of 1, 10, or 100 Gbps, or smaller hosted connections via partners

PrivateLink:Expose your service to other VPCs privately via Interface VPC Endpoints

Security Groups vs NACLs — Critical Distinction

Feature	Security Groups	Network ACLs (NACLs)
Level	Instance level (ENI)	Subnet level
State	Stateful — return traffic automatically allowed	Stateless — must explicitly allow inbound AND outbound
Rules	Allow rules only; no deny rules	Allow AND deny rules; evaluated in order by rule number
Default behavior	New SG: no inbound rules (all inbound denied), all outbound allowed	Default NACL allows all; custom NACL denies all until rules added
Rule evaluation	All rules evaluated together	Rules evaluated lowest number first; stops at first match
Association	Multiple SGs per instance	One NACL per subnet
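
NACL evaluation order is easy to get wrong, so here is a small model of it: rules are checked from the lowest number up, the first match decides, and the implicit `*` rule denies everything else. It is a simplified sketch that assumes TCP only and ignores protocol fields; the outbound check illustrates why a stateless NACL needs a separate rule for reply traffic on ephemeral ports.

```python
import ipaddress
from dataclasses import dataclass

@dataclass
class NaclRule:
    number: int
    action: str      # "allow" or "deny"
    port_from: int
    port_to: int
    cidr: str

def evaluate_nacl(rules, ip: str, port: int) -> str:
    """Action of the lowest-numbered matching rule, else the implicit '*' deny.

    `ip` is the peer address: the source for inbound rules, the destination for outbound.
    """
    addr = ipaddress.ip_address(ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port_from <= port <= rule.port_to and addr in ipaddress.ip_network(rule.cidr):
            return rule.action   # first match wins; higher-numbered rules are never checked
    return "deny"

inbound = [
    NaclRule(100, "allow", 443, 443, "0.0.0.0/0"),
    NaclRule(90, "deny", 443, 443, "203.0.113.0/24"),   # listed second but evaluated first
]
outbound = [NaclRule(100, "allow", 1024, 65535, "0.0.0.0/0")]

print(evaluate_nacl(inbound, "203.0.113.7", 443))      # deny
print(evaluate_nacl(inbound, "198.51.100.7", 443))     # allow
print(evaluate_nacl(inbound, "198.51.100.7", 22))      # deny (no rule matches)
# Stateless: the HTTPS reply goes back to the client's ephemeral port and needs its own rule.
print(evaluate_nacl(outbound, "198.51.100.7", 50000))  # allow
print(evaluate_nacl([], "198.51.100.7", 50000))        # deny without an outbound rule
```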

Traffic Distribution & CDN

Elastic Load Balancer (ELB)

ALB:HTTP/HTTPS Layer 7; path-based & host-based routing; ideal for microservices

NLB:TCP/UDP Layer 4; ultra-low latency; handles millions of requests/sec; static IP

GWLB:Layer 3; distributes traffic to 3rd-party virtual appliances (firewalls, IDS/IPS)

CLB:Classic LB (legacy) — Layer 4 & 7; avoid for new architectures

CloudFront (CDN)

What:Content Delivery Network — caches content at 400+ global Edge Locations

Origins:S3, ALB, EC2, or any custom HTTP endpoint

Cache:Reduce latency + offload origin traffic; configurable TTL per behavior

Security:Integrates with WAF and Shield; can enforce HTTPS (viewer protocol policy); OAC (recommended) or legacy OAI restricts direct S3 access

Route 53

What:Managed authoritative DNS; domain registration; health checks

Simple:One record with one or more values (all returned, in random order); no health checks

Weighted:Split traffic by percentage — A/B testing, canary deployments

Latency-based:Route to region with lowest network latency

Failover:Active-passive; switch to secondary if health check fails

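Geolocation:Route based on the user's location (continent, country, or US state)

Geoproximity:Route based on the distance between users and your resources, adjustable with a bias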
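
Weighted routing sends each record weight ÷ (sum of weights) of the DNS answers, which is how a 90/10 canary works. A minimal simulation with hypothetical record names; real splits are only approximate because resolvers cache answers.

```python
import random

def traffic_shares(weights: dict) -> dict:
    """Fraction of DNS answers each weighted record should receive."""
    total = sum(weights.values())
    if total == 0:
        # If every record has weight 0, Route 53 answers with them in equal proportion.
        return {name: 1 / len(weights) for name in weights}
    return {name: w / total for name, w in weights.items()}

def pick_record(weights: dict, rng: random.Random) -> str:
    """Simulate one DNS answer under weighted routing."""
    shares = traffic_shares(weights)
    names = list(shares)
    return rng.choices(names, weights=[shares[n] for n in names], k=1)[0]

canary = {"blue": 90, "green": 10}   # hypothetical record identifiers
print(traffic_shares(canary))        # {'blue': 0.9, 'green': 0.1}
rng = random.Random(7)
answers = [pick_record(canary, rng) for _ in range(1000)]
print(answers.count("green"))        # roughly 100; the exact count depends on the seed
```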

Stateless vs Stateful explained: Security Groups are stateful — if you allow port 443 inbound, the response traffic on ephemeral ports is automatically allowed back out. NACLs are stateless — you must explicitly create both the inbound allow rule AND the outbound allow rule for the same connection.

Shared Responsibility Model

AWS RESPONSIBILITY — "Security OF the Cloud"

Physical data centers & buildings · Power & cooling & network hardware · Hypervisor & host OS · Managed service software (e.g. RDS engine patching, S3 hardware) · Global fiber network & edge infrastructure

CUSTOMER RESPONSIBILITY — "Security IN the Cloud"

Guest OS patching on EC2 · Application code security · IAM users, roles & permissions · Data encryption (at rest and in transit) · Security Group & NACL configuration · Customer data & classification · Network configuration (VPC, subnets, routing)

The line moves with managed services: For RDS, AWS patches the DB engine (their responsibility), but you configure Security Groups, IAM access, and encryption (your responsibility). For Lambda, AWS manages everything except your code and IAM permissions.

IAM — Core Concepts

IAM Entities

Root user:Created with AWS account; has full access; lock away with MFA; never use for daily tasks

IAM User:Long-term credentials (username/password + access keys); represents a person or application

IAM Group:Logical collection of IAM Users; attach policies to group, not individual users

IAM Role:Temporary credentials; assumed by services, users, or cross-account principals; no long-term keys

IAM Policy:JSON document with Effect (Allow/Deny), Action, Resource, Condition fields

Explicit Deny:Always overrides any Allow, no matter how specific the Allow is; anything not explicitly allowed is implicitly denied
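
The deny/allow order can be modeled in a few lines. This is a deliberately simplified sketch of identity-policy evaluation: it ignores Conditions, NotAction/NotResource, resource policies, SCPs, and permission boundaries, and it borrows `fnmatch` for `*`/`?` wildcards (fnmatch's `[...]` classes are not IAM syntax). Actions compare case-insensitively, resources case-sensitively.

```python
from fnmatch import fnmatchcase

def _matches(patterns, value: str, case_sensitive: bool = True) -> bool:
    """IAM-style wildcard match: '*' = any run of characters, '?' = one character."""
    if isinstance(patterns, str):
        patterns = [patterns]
    if not case_sensitive:
        value = value.lower()
        patterns = [p.lower() for p in patterns]
    return any(fnmatchcase(value, p) for p in patterns)

def evaluate(policies, action: str, resource: str) -> str:
    """Simplified evaluation: explicit deny > explicit allow > implicit deny."""
    allowed = False
    for policy in policies:
        for stmt in policy["Statement"]:
            if _matches(stmt["Action"], action, case_sensitive=False) and _matches(stmt["Resource"], resource):
                if stmt["Effect"] == "Deny":
                    return "explicit deny"   # no Allow anywhere can override this
                allowed = True
    return "allow" if allowed else "implicit deny"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:*", "Resource": "arn:aws:s3:::reports/*"},
        {"Effect": "Deny", "Action": "s3:DeleteObject", "Resource": "arn:aws:s3:::reports/*"},
    ],
}
print(evaluate([policy], "s3:GetObject", "arn:aws:s3:::reports/q1.csv"))     # allow
print(evaluate([policy], "s3:DeleteObject", "arn:aws:s3:::reports/q1.csv"))  # explicit deny
print(evaluate([policy], "ec2:StartInstances", "arn:aws:ec2:us-east-1:111122223333:instance/i-0abc"))  # implicit deny
```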

IAM Best Practices

Enable MFA:On root account and all privileged/admin users; virtual, hardware, or U2F key

Least privilege:Start with no permissions; grant only what is required for the task

No root for daily use:Use IAM Identity Center (temporary credentials) or an admin IAM user for day-to-day administration

Rotate access keys:Regularly rotate programmatic access keys; delete unused ones

Use Roles, not keys:Attach IAM Role to EC2 instead of embedding access keys in code

SCP (Organizations):Service Control Policies, org-wide guardrails that cap what any principal in a member account can do, even admins; SCPs never grant permissions

Security Services

Service	What it does	Key detail
KMS	Create, manage, and control encryption keys (CMKs)	Integrated with most AWS services; audit usage via CloudTrail
CloudHSM	Dedicated hardware security module in your VPC	FIPS 140-2 Level 3; you manage keys; KMS can use CloudHSM as backing store
Secrets Manager	Store, rotate, and retrieve secrets (DB passwords, API keys)	Auto-rotates RDS passwords; native Lambda rotation; preferred over SSM Parameter Store when secrets need rotation
SSM Parameter Store	Hierarchical key-value store for config + secrets	Free tier for standard params; SecureString uses KMS; no auto-rotation
GuardDuty	Intelligent threat detection — ML-based anomaly detection	Analyzes CloudTrail, VPC Flow Logs, DNS logs; no agents; reads these through its own independent streams, so you don't need to enable the logs yourself
Inspector	Automated vulnerability scanning	Scans EC2 (via SSM agent) and ECR container images for CVEs and network exposure
Macie	Discover and protect sensitive data in S3	ML-powered; finds PII, financial data, credentials; sends findings to Security Hub
Shield Standard	DDoS protection for all AWS customers	Free; protects against common L3/L4 attacks (SYN floods, reflection attacks)
Shield Advanced	Enhanced DDoS protection with 24/7 DRT access	~$3,000/month; cost protection; works with ALB, CloudFront, Route 53, EC2, EIP
WAF	Web Application Firewall — filter HTTP/S traffic	Protects against SQLi, XSS, rate limiting, IP blocking; applies to ALB, CloudFront, API Gateway
Security Hub	Central security findings aggregator and compliance dashboard	Aggregates from GuardDuty, Inspector, Macie; maps to CIS, PCI-DSS, NIST standards
Detective	Investigate and analyze security findings	Uses ML + graph analysis on CloudTrail, VPC Flow, GuardDuty data; for forensics
Cognito	User identity and authentication for web/mobile apps	User Pools (user directory, sign-up/in); Identity Pools (federate access to AWS services)

The Big Three — CloudWatch vs CloudTrail vs Config

CloudWatch — Performance Monitoring

Metrics:Collect & track time-series data (CPU, NetworkIn, DiskOps) from AWS services

Custom Metrics:Push your own app/infra metrics via PutMetricData API or CloudWatch Agent

Alarms:Alert when metric breaches threshold; trigger SNS, Auto Scaling, EC2 actions

Logs:Collect, store, query log data; Log Groups → Log Streams; Logs Insights for SQL-like queries

Dashboards:Cross-region, cross-account customizable monitoring views

Events/EventBridge:React to state changes in real time; schedule cron-like tasks

Agent:CloudWatch Agent required to collect memory/disk metrics from EC2 (not built-in)
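
Alarms evaluate "M out of N" datapoints: the parameters below mirror `EvaluationPeriods` and `DatapointsToAlarm` from the PutMetricAlarm API. This is a simplified model for a GreaterThanThreshold alarm only; missing-data handling and the INSUFFICIENT_DATA state are not modeled.

```python
def alarm_state(datapoints, threshold, evaluation_periods, datapoints_to_alarm):
    """ALARM if at least M of the last N datapoints breach the threshold, else OK."""
    window = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in window if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"

cpu = [70, 85, 90, 60, 95]          # hypothetical CPUUtilization samples, oldest first
print(alarm_state(cpu, 80, 5, 3))   # ALARM (3 of the last 5 breach)
print(alarm_state(cpu, 80, 3, 3))   # OK    (only 2 of the last 3 breach)
```

Requiring M of N instead of a single breach is the usual way to stop a one-off spike from paging someone or triggering Auto Scaling.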

CloudTrail — API Audit Logging

What:Records every API call made in your AWS account — who did what, when, from where

Captures:Management events (control plane) + Data events (S3 object ops, Lambda invocations) + Insight events

Retention:90-day event history free in console; deliver to S3 for indefinite retention

Integrity:Log file validation — detects if logs were tampered with (SHA-256 digest files)

Multi-region:Single trail can cover all regions; always enable in all regions

Use for:Security audits, compliance, "who deleted that resource?" investigations

AWS Config — Resource Compliance

What:Track and record configuration changes to AWS resources over time

Config Rules:Evaluate resources against desired configurations; AWS-managed or custom Lambda rules

Timeline:See config history for any resource — what changed, when, who triggered it

Remediation:Auto-remediate non-compliant resources via SSM Automation documents

Aggregation:Config Aggregator — multi-account, multi-region compliance view

Use for:Compliance (PCI-DSS, HIPAA, SOC2), drift detection, change management

Additional Monitoring & Operations Tools

Service	Purpose	Key detail
X-Ray	Distributed request tracing for microservices and Lambda	Visualize service maps; find bottlenecks; debug latency; requires X-Ray SDK in app
Trusted Advisor	Automated best practice recommendations	Checks in 6 categories: Cost Optimization, Performance, Security, Fault Tolerance, Service Limits, Operational Excellence; Business/Enterprise Support plans unlock all checks
AWS Health / Personal Health Dashboard	AWS service health + your account-specific events	Service Health Dashboard = global AWS status; Personal Health = your resources affected by AWS events
Compute Optimizer	Right-sizing recommendations to reduce cost/improve performance	Analyzes EC2, ASG, Lambda, EBS using ML; shows over-provisioned resources
Systems Manager (SSM)	Operations management for EC2 and on-prem instances	Session Manager (shell access without open SSH ports or a bastion host), Run Command, Patch Manager, Parameter Store, Automation
CloudFormation	Infrastructure as Code — define AWS resources in JSON or YAML	Stack = group of resources; drift detection; rollback on failure; Change Sets to preview changes
CDK (Cloud Development Kit)	Define cloud infrastructure using Python, TypeScript, Java, etc.	Synthesizes to CloudFormation; construct levels: L1 = raw CFN resources, L2 = curated resources with sensible defaults, L3 = multi-resource patterns

Classic interview trap: CloudWatch = monitor HOW your resources are performing (CPU, latency, errors). CloudTrail = WHO made API calls (audit log). Config = WHAT changed in your config over time (compliance/drift). All three are different and complementary.

Serverless Architecture Stack

Lambda — Deep Dive

Runtimes:Node.js, Python, Java, Go, Ruby, .NET; or Custom Runtime (any language via bootstrap binary)

Memory:128 MB – 10,240 MB; CPU scales proportionally with memory allocation

Timeout:Max 15 minutes (900 seconds) per invocation

Ephemeral storage:/tmp — 512 MB (default) up to 10 GB; use for temp files during execution

Layers:Share code/dependencies across functions; up to 5 layers per function

Concurrency:Default 1,000 concurrent executions per region; Reserved or Provisioned Concurrency available

Invocation types:Synchronous (API GW, CLI) — wait for response; Asynchronous (S3, SNS) — fire and forget; Event Source Mapping (SQS, DynamoDB Streams, Kinesis)
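
For an asynchronous trigger such as S3, the handler receives an event with a `Records` list. This sketch uses Python's conventional `lambda_handler(event, context)` entry point; the bucket name and key in the sample event are hypothetical, and object keys arrive URL-encoded, hence `unquote_plus`.

```python
from urllib.parse import unquote_plus

def lambda_handler(event, context):
    """Entry point for an S3-triggered (asynchronous) invocation."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # e.g. '+' -> space
        size = record["s3"]["object"].get("size", 0)
        objects.append({"bucket": bucket, "key": key, "size": size})
    return {"processed": len(objects), "objects": objects}

sample_event = {  # trimmed S3 notification with hypothetical names
    "Records": [
        {
            "eventSource": "aws:s3",
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-uploads"},
                "object": {"key": "uploads/my+report.pdf", "size": 1024},
            },
        }
    ]
}
print(lambda_handler(sample_event, None))
```

For asynchronous invocations the return value is discarded; failures are retried by Lambda and can be routed to a DLQ or on-failure destination.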

API Gateway

What:Create, deploy, manage, and secure REST, HTTP, and WebSocket APIs at scale

REST API:Feature-rich — request/response transformation, usage plans, API keys, caching

HTTP API:Lower latency, lower cost; fewer features; good for Lambda proxy & HTTP backends

WebSocket API:Persistent connections for real-time apps (chat, dashboards, gaming)

Auth options:IAM, Amazon Cognito User Pools, Lambda Authorizer (custom JWT/OAuth)

Throttling:10,000 requests/sec default steady-state (5,000 burst); configurable per stage

Stages:Deploy to named stages (dev, staging, prod) with independent settings
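
AWS documents that API Gateway throttles with a token bucket: the steady-state rate refills the bucket and the burst limit caps its size. The class below is a simplified model for intuition, not API Gateway's implementation; the small numbers in the demo stand in for the 10,000 rps / 5,000 burst defaults.

```python
class TokenBucket:
    """Simplified token bucket: refills at `rate` tokens/s, holds at most `burst` tokens."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)   # starts full
        self.last = 0.0

    def allow(self, now: float) -> bool:
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                 # API Gateway would answer 429 Too Many Requests

bucket = TokenBucket(rate=10, burst=5)
print([bucket.allow(0.0) for _ in range(6)])  # [True, True, True, True, True, False]
print(bucket.allow(1.0))                      # True: one second refilled the bucket
```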

Messaging & Eventing

SQS — Simple Queue Service

Standard queue:At-least-once delivery; best-effort ordering; nearly unlimited throughput

FIFO queue:Exactly-once processing; strict ordering within a message group; 300 API calls/s per action (3,000 msg/s with 10-message batches); high-throughput mode raises these limits

Visibility timeout:Period a received message is hidden from other consumers (default 30s, max 12hr)

Message retention:Default 4 days; configurable 1 minute to 14 days

Max message size:256 KB (use S3 + Extended Client Library for larger payloads)

DLQ:Dead Letter Queue — messages that fail processing N times are moved here for inspection

Long Polling:Wait up to 20s for messages to arrive — reduces empty responses and cost
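
Visibility timeout and DLQ redrive interact in a way worth walking through. `ToyQueue` is a hypothetical in-memory model, not the SQS API: a received message is hidden for the timeout, reappears if the consumer never deletes it, and is moved to the dead-letter list once its receive count has reached `max_receive_count`.

```python
from dataclasses import dataclass

@dataclass(eq=False)  # compare by identity so remove() drops the exact message object
class Message:
    body: str
    receive_count: int = 0
    visible_at: float = 0.0

class ToyQueue:
    """Hypothetical in-memory model of SQS visibility timeout plus a redrive policy."""

    def __init__(self, visibility_timeout: float = 30.0, max_receive_count: int = 3):
        self.visibility_timeout = visibility_timeout
        self.max_receive_count = max_receive_count
        self.messages: list[Message] = []
        self.dead_letter: list[Message] = []

    def send(self, body: str) -> None:
        self.messages.append(Message(body))

    def receive(self, now: float):
        for msg in list(self.messages):
            if msg.visible_at > now:
                continue                   # in flight: hidden from other consumers
            if msg.receive_count >= self.max_receive_count:
                self.messages.remove(msg)  # too many failed attempts: move to the DLQ
                self.dead_letter.append(msg)
                continue
            msg.receive_count += 1
            msg.visible_at = now + self.visibility_timeout
            return msg
        return None

    def delete(self, msg: Message) -> None:
        self.messages.remove(msg)          # success path: the consumer deletes the message

q = ToyQueue(visibility_timeout=30, max_receive_count=2)
q.send("order-42")
first = q.receive(now=0)     # delivered (receive_count=1), hidden until t=30
print(q.receive(now=10))     # None: the message is still in flight
second = q.receive(now=31)   # redelivered after the timeout (receive_count=2)
print(q.receive(now=62), [m.body for m in q.dead_letter])  # None ['order-42']
```

Set the visibility timeout longer than the worst-case processing time; otherwise a slow but healthy consumer causes duplicate deliveries.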

SNS — Simple Notification Service

What:Pub/Sub — publish a message once, deliver to many subscribers simultaneously

Subscribers:Lambda, SQS, HTTP/S endpoints, Email, SMS, Mobile Push (APNs, FCM)

Fan-out pattern:SNS topic → multiple SQS queues — parallel processing with different consumers

FIFO topics:Ordered, deduplication; only SQS FIFO queues can subscribe

Message filtering:Subscription filter policies — each subscriber receives only relevant messages

SQS vs SNS:SQS = pull-based queue (one consumer processes each message). SNS = push-based, all subscribers get every message.
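
Fan-out with filter policies can be sketched as follows. It is a simplified model of attribute-based filtering: keys in a policy are ANDed, the listed values for each key are ORed, and an empty policy receives everything. Numeric, prefix, anything-but, and payload-based filtering are not modeled; queue names are hypothetical.

```python
def matches_filter_policy(policy: dict, attributes: dict) -> bool:
    """Every policy key must be present (AND) with a value from its list (OR)."""
    return all(attributes.get(key) in allowed for key, allowed in policy.items())

subscriptions = {
    "orders-queue": {"event_type": ["order_placed", "order_cancelled"]},
    "billing-queue": {"event_type": ["order_placed"], "region": ["eu"]},
    "audit-queue": {},                     # no filter policy: receives every message
}
message_attributes = {"event_type": "order_placed", "region": "us"}

delivered = [name for name, policy in subscriptions.items()
             if matches_filter_policy(policy, message_attributes)]
print(delivered)  # ['orders-queue', 'audit-queue']
```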

Kinesis — Real-time Streaming

Kinesis Data Streams:Real-time data ingestion; shards (1 MB/s in, 2 MB/s out each); retention 1–365 days

Amazon Data Firehose (formerly Kinesis Data Firehose):Fully managed delivery to S3, Redshift, OpenSearch, Splunk; near real-time (buffers by size and time before delivery)

Managed Service for Apache Flink (formerly Kinesis Data Analytics):Real-time processing of streaming data with Apache Flink; the legacy SQL applications are being discontinued

vs SQS:Kinesis = multiple consumers, ordered per shard, replay-able. SQS = one consumer group processes & deletes.
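
Shard sizing for provisioned mode follows from the per-shard limits: 1 MB/s or 1,000 records/s in, 2 MB/s out shared by all standard consumers. A rough estimator under those assumptions; enhanced fan-out (a dedicated 2 MB/s per consumer) and On-Demand mode change the read side and are not modeled.

```python
import math

SHARD_WRITE_MB_S = 1.0
SHARD_WRITE_RECORDS_S = 1000
SHARD_READ_MB_S = 2.0   # shared by all standard (non enhanced fan-out) consumers

def shards_needed(write_mb_s: float, write_records_s: int, read_consumers: int) -> int:
    for_write_bytes = math.ceil(write_mb_s / SHARD_WRITE_MB_S)
    for_write_records = math.ceil(write_records_s / SHARD_WRITE_RECORDS_S)
    # Each standard consumer reads the whole stream, so total egress = ingress x consumers.
    for_reads = math.ceil(write_mb_s * read_consumers / SHARD_READ_MB_S)
    return max(for_write_bytes, for_write_records, for_reads, 1)

print(shards_needed(3.5, 2000, 1))  # 4 (ingress bytes dominate)
print(shards_needed(3.5, 2000, 3))  # 6 (three readers share the 2 MB/s egress)
print(shards_needed(0.2, 5000, 1))  # 5 (many small records)
```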

Orchestration & Events

Step Functions

What:Orchestrate multi-step workflows as JSON-defined state machines; visual workflow designer

State types:Task, Choice (branching), Wait, Parallel, Map (iterate over array), Pass, Succeed, Fail

Standard:Up to 1 year duration; exactly-once workflow execution; full execution history; for long-running workflows

Express:Up to 5 min; at-least-once (asynchronous) or at-most-once (synchronous) execution; higher throughput; for high-volume, short-lived workflows
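
State machines are written in the Amazon States Language (JSON). The sketch below builds a small hypothetical order workflow as a Python dict (the Lambda ARNs and state names are placeholders) and adds a helper that flags transitions pointing at states that do not exist, a common definition bug.

```python
import json

definition = {
    "Comment": "Hypothetical order workflow",
    "StartAt": "ValidateOrder",
    "States": {
        "ValidateOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:111122223333:function:validate-order",
            "Next": "IsLargeOrder",
        },
        "IsLargeOrder": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.total", "NumericGreaterThan": 1000, "Next": "HoldLargeOrder"}
            ],
            "Default": "ChargeCard",
        },
        "HoldLargeOrder": {"Type": "Wait", "Seconds": 3600, "Next": "ChargeCard"},
        "ChargeCard": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:111122223333:function:charge-card",
            "End": True,
        },
    },
}

def dangling_transitions(defn: dict) -> list:
    """Transition targets (StartAt/Next/Default/Choices) that do not name a state."""
    states = defn["States"]
    targets = [defn["StartAt"]]
    for state in states.values():
        targets += [state[k] for k in ("Next", "Default") if k in state]
        targets += [choice["Next"] for choice in state.get("Choices", [])]
    return [t for t in targets if t not in states]

print(dangling_transitions(definition))     # []
asl_json = json.dumps(definition, indent=2)  # the string a CreateStateMachine call expects
```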

EventBridge

What:Serverless event bus; route events between AWS services, SaaS apps, custom apps

Event buses:Default (AWS services), custom (your app), partner (SaaS: Zendesk, Stripe, etc.)

Rules:Match events by pattern; route to Lambda, SQS, SNS, Step Functions, API Gateway, etc.

Scheduler:EventBridge Scheduler runs cron, rate, and one-time schedules; recommended over scheduled rules (formerly CloudWatch Events)

Schema Registry:Discover, create, and manage event schemas; auto-generates code bindings

Serverless pattern to know: API Gateway → Lambda → DynamoDB is the classic serverless CRUD backend. Add Cognito for auth, CloudFront in front of API GW for caching/global edge, and SQS between Lambda functions for decoupling.

AWS Global Infrastructure

Region

What:Geographic cluster of data centers; 33+ regions worldwide

Choosing a region:Data residency laws, latency to users, service availability, pricing

Isolated:Regions are completely independent; disaster in one does NOT affect another

Availability Zone (AZ)

What:One or more discrete, isolated data centers within a region; 105+ AZs globally

Connected:Low-latency, high-bandwidth private fiber links between AZs in a region

Best practice:Deploy across ≥ 2 AZs for high availability

Edge Locations

What:CloudFront CDN PoPs; 400+ globally; cache and serve content close to users

Local Zones:AWS infra extensions to metro areas for ultra-low latency (gaming, live streaming)

Wavelength:AWS compute embedded in 5G networks; sub-10ms latency for mobile apps

Auto Scaling Groups (ASG)

ASG Core Concepts

Purpose:Automatically add or remove EC2 instances based on demand or schedule

Launch Template:Defines what instance to launch — AMI, instance type, security groups, user data, etc.

Min / Max / Desired:Always configure all three; Desired is current target; Min protects against scale-in removing everything

Multi-AZ:ASG distributes instances across AZs and rebalances automatically

ELB integration:ASG registers new instances with ELB; removes and drains unhealthy ones

Cooldown period:Default 300s after scaling action; prevents rapid repeated scaling

Scaling Policies

Target Tracking:Maintain a target metric (e.g. CPU = 50%); AWS adjusts capacity automatically — most common

Step Scaling:Scale by a configured amount when a CloudWatch alarm triggers; different steps for different breach levels

Simple Scaling:Single step on alarm; waits for cooldown before re-evaluating; legacy option

Scheduled Scaling:Scale based on known patterns (e.g. add 5 instances every weekday at 9am)

Predictive Scaling:ML-based; proactively scales before anticipated demand spikes
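
Target tracking behaves roughly proportionally: if load is spread evenly, the capacity that brings the metric back to target is about current × metric ÷ target, clamped to Min/Max. AWS does not publish the exact algorithm (it also scales in more cautiously and honors warm-up), so treat this as an intuition aid only.

```python
import math

def target_tracking_capacity(current: int, metric: float, target: float,
                             minimum: int, maximum: int) -> int:
    """Proportional estimate of the capacity that returns the metric to its target."""
    desired = math.ceil(current * metric / target)
    return max(minimum, min(maximum, desired))

print(target_tracking_capacity(4, 80, 50, 2, 10))   # 7  (4 x 80 / 50 = 6.4 -> 7)
print(target_tracking_capacity(4, 20, 50, 2, 10))   # 2  (1.6 -> 2, also the Min)
print(target_tracking_capacity(8, 100, 50, 2, 10))  # 10 (16 capped at Max)
```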

Disaster Recovery Strategies

Strategy	RTO	RPO	Cost	How it works
Backup & Restore	Hours	Hours	$	Regular backups to S3/Glacier; restore infra from scratch via CloudFormation on disaster
Pilot Light	Tens of minutes	Minutes	$$	Core systems (DBs) always running in DR region; scale out app servers only after failover
Warm Standby	Minutes	Seconds	$$$	Reduced-scale replica running continuously in DR region; scale to full size on failover
Multi-Site Active/Active	Near zero (<60s)	Near zero	$$$$	Full-scale environment in 2+ regions simultaneously; Route 53 Weighted or Latency routing splits traffic

HA & DR Key Terms

Recovery Objectives

RPO (Recovery Point Objective):Max acceptable data loss measured in time — "how old can the data be when we recover?"

RTO (Recovery Time Objective):Max acceptable time to restore service — "how long can we be down?"

Fault Tolerant:System continues operating with zero downtime despite component failure (harder, more expensive)

Highly Available:System recovers quickly from failure with minimal downtime (practical HA target)

HA Patterns

Multi-AZ RDS:Synchronous standby; auto failover; typically <60–120 seconds RTO

S3 CRR:Cross-Region Replication — async; enables cross-region HA for object storage

Route 53 Failover:Active-passive with health checks; automatically routes DNS to secondary endpoint

Aurora Global DB:RPO < 1s; RTO < 1 min; global write forwarding; cross-region reads

DynamoDB Global Tables:Active-active multi-region; auto-replication; single-digit ms reads anywhere

HA ≠ DR: High Availability addresses AZ-level failures (one data center goes down). Disaster Recovery addresses region-level failures (entire region is unavailable). Both are separate design concerns and interviewers love this distinction.

The 6 Pillars of the AWS Well-Architected Framework

PILLAR 01

Operational Excellence

Run and monitor systems to deliver business value while continuously improving supporting processes. Key practices: Infrastructure as Code (CloudFormation/CDK), CI/CD pipelines, small & frequent reversible changes, anticipate failure (game days), runbooks and playbooks, annotate documentation. Key services: CodePipeline, CodeDeploy, Systems Manager, CloudFormation.

PILLAR 02

Security

Protect data, systems, and assets. Key practices: Strong identity foundation (IAM least privilege, MFA), enable traceability (CloudTrail, Config), apply security at all layers (SGs, NACLs, WAF), automate security best practices, protect data in transit and at rest (KMS), keep people away from data (automation instead of direct access), prepare for security events (incident response runbooks). Key services: IAM, KMS, GuardDuty, Security Hub, Shield, WAF, Macie.

PILLAR 03

Reliability

Recover from infrastructure or service failures, dynamically acquire computing resources to meet demand. Key practices: Test recovery procedures, automatically recover from failure (Auto Scaling, Multi-AZ), scale horizontally, stop guessing capacity, manage change with automation. Key services: Auto Scaling, ELB, Multi-AZ RDS, Route 53, Backup, CloudFormation.

PILLAR 04

Performance Efficiency

Use computing resources efficiently and maintain efficiency as demand changes. Key practices: Democratize advanced technologies (use managed services), go global in minutes (CloudFront, Multi-Region), use serverless architecture, experiment more often, mechanical sympathy (choose the right resource type). Key services: Lambda, Fargate, CloudFront, ElastiCache, DynamoDB, Compute Optimizer.

PILLAR 05

Cost Optimization

Avoid unnecessary costs. Key practices: Implement cloud financial management, adopt a consumption model (pay only for what you use), measure overall efficiency (CloudWatch, Cost Explorer), stop spending money on undifferentiated heavy lifting (use managed services), analyze and attribute expenditure (tagging, Cost Allocation Tags). Key services: Cost Explorer, Budgets, Savings Plans, Reserved Instances, Compute Optimizer, Trusted Advisor.

PILLAR 06

Sustainability

Minimize environmental impacts of running cloud workloads. Key practices: Understand your impact, establish sustainability goals, maximize utilization (right-size, reduce idle resources), anticipate and adopt more efficient offerings (Graviton/ARM processors), use managed services (AWS achieves higher utilization than individual customers), reduce downstream impact. Key services: Graviton instances, serverless, Compute Optimizer, Customer Carbon Footprint Tool, and the Sustainability pillar questions in the Well-Architected Tool.

Well-Architected Tool

What it is: Free AWS Console tool to review your architecture against the 6 pillars using questionnaires

Output: Improvement plan with prioritized recommendations and links to guidance

Lens library: Specialized lenses for SaaS, Serverless, Machine Learning, Analytics, Government, etc.

Partner programs: AWS Well-Architected Partner Program; APN partners can run formal reviews

Mnemonic: O · S · R · P · C · S → "Oh So Reliable, Performance Costs Something" — Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, Sustainability

AWS Pricing Principles

Pay for what you use

No upfront costs for most services; pay per unit of consumption (seconds, GB, requests)

Compute is billed per second for Linux/Ubuntu EC2 (60-second minimum), per millisecond for Lambda, and per hour for some commercial OS and licensing combinations
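To see why billing granularity matters for short-lived instances, compare the two models for a 61-minute run. The $0.10/hour rate is hypothetical; the 60-second minimum reflects EC2's per-second billing rule.

```python
import math

def per_second_cost(hourly_rate, runtime_seconds, minimum_seconds=60):
    # Per-second billing with a 60-second minimum charge
    billed = max(runtime_seconds, minimum_seconds)
    return hourly_rate * billed / 3600

def per_hour_cost(hourly_rate, runtime_seconds):
    # Per-hour billing: every started hour is charged in full
    return hourly_rate * math.ceil(runtime_seconds / 3600)

rate = 0.10        # hypothetical On-Demand price in $/hour
runtime = 61 * 60  # 61 minutes

print(round(per_second_cost(rate, runtime), 4))  # 0.1017
print(round(per_hour_cost(rate, runtime), 4))    # 0.2: two full hours
print(round(per_second_cost(rate, 10), 4))       # 0.0017: a 10 s run is billed as 60 s
```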

Pay less as you use more

Tiered pricing — the more you use, the less you pay per unit

Applies to S3 storage, EC2 data transfer, and many other services
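Tiered pricing means each unit is charged at the rate of the tier it falls into, so the effective per-unit price drops as volume grows. The tier sizes and prices below are illustrative placeholders that loosely echo S3 Standard's first 50 TB / next 450 TB / over 500 TB breakpoints; they are not current list prices.

```python
# Illustrative tiers: (tier size in GB, price per GB-month); None = no upper bound.
TIERS = [(50_000, 0.023), (450_000, 0.022), (None, 0.021)]

def tiered_cost(gb, tiers=TIERS):
    total, remaining = 0.0, gb
    for size, price in tiers:
        chunk = remaining if size is None else min(remaining, size)
        total += chunk * price  # each GB pays the rate of the tier it falls in
        remaining -= chunk
        if remaining <= 0:
            break
    return total

stored = 60_000  # GB stored this month
cost = tiered_cost(stored)
print(round(cost, 2))           # 1370.0: 50,000 x 0.023 + 10,000 x 0.022
print(round(cost / stored, 6))  # 0.022833: effective rate, below the first-tier 0.023
```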

Save with reservations

Commit to 1 or 3 years for significant discounts vs On-Demand

Reserved Instances (up to 72%), Savings Plans (up to 72% for EC2 Instance Savings Plans, up to 66% for Compute Savings Plans)
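Whether a 1- or 3-year commitment pays off depends on utilization, because the commitment is charged every hour whether or not the instance runs. Both hourly prices below are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760

def annual_cost(hourly_rate, hours=HOURS_PER_YEAR):
    return hourly_rate * hours

def savings_pct(on_demand_rate, committed_rate):
    return 100 * (1 - committed_rate / on_demand_rate)

def break_even_utilization(on_demand_rate, committed_rate):
    # The commitment is paid every hour, used or not, so it only beats
    # On-Demand if the instance would otherwise run more than this share of hours
    return committed_rate / on_demand_rate

on_demand = 0.10  # hypothetical On-Demand price, $/hour
reserved = 0.062  # hypothetical effective 1-year reserved price, $/hour

print(round(annual_cost(on_demand), 2))                             # 876.0
print(round(annual_cost(reserved), 2))                              # 543.12
print(round(savings_pct(on_demand, reserved), 1))                   # 38.0
print(round(100 * break_even_utilization(on_demand, reserved), 1))  # 62.0
```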

Cost Management Tools

Tool	Purpose	Key capability
Cost Explorer	Visualize, analyze, and forecast spend & usage	12-month historical view; 12-month forecast; RI/SP recommendations; filter by service/tag/account
AWS Budgets	Set cost & usage thresholds with alerts	Alert at X% of budget; trigger SNS/email; Budgets Actions can stop EC2/RDS automatically
Cost & Usage Report (CUR)	Most granular cost data available	Hourly, daily, or monthly line items delivered to S3 as CSV or Parquet; every resource/usage type; query with Athena or load into Redshift
Pricing Calculator	Estimate cost for new architectures before building	calculator.aws — model any combination of services; export as CSV or share link
Savings Plans	Flexible discount model: commit to a $/hr compute spend	Compute SP (most flexible, up to 66%: EC2/Lambda/Fargate in any region or family); EC2 Instance SP (up to 72%: one instance family in one region)
Reserved Instances	Commit to specific instance for 1 or 3 years	Standard RI (up to 72%, fixed); Convertible RI (up to 54%, can change family/OS/tenancy)
Cost Allocation Tags	Tag resources to attribute costs to teams/projects	AWS-generated + user-defined; activated in Billing Console; appears in CUR
Compute Optimizer	Right-sizing recommendations	Identifies over- and under-provisioned EC2, Auto Scaling groups, EBS, Lambda; estimates potential savings
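The Savings Plans mechanic (commit to a $/hr amount, discounted usage draws down the commitment, anything beyond it is billed On-Demand) can be sketched for a single hour. The flat 25% discount and $6/hour commitment are hypothetical; real Savings Plans rates differ by instance family, region and service.

```python
def hourly_bill(on_demand_usage, commitment, discount):
    """Simplified Savings Plan bill for one hour.

    on_demand_usage: eligible usage this hour, valued at On-Demand prices ($)
    commitment:      Savings Plan commitment ($/hour), charged even if unused
    discount:        hypothetical flat discount vs On-Demand
    """
    sp_rate_factor = 1 - discount
    usage_at_sp_rates = on_demand_usage * sp_rate_factor
    covered = min(usage_at_sp_rates, commitment)
    # Anything beyond the commitment falls back to On-Demand pricing
    overflow_on_demand = (usage_at_sp_rates - covered) / sp_rate_factor
    return commitment + overflow_on_demand

for usage in (4.0, 8.0, 12.0):
    print(usage, hourly_bill(usage, commitment=6.0, discount=0.25))
# 4.0 6.0   half of the commitment goes unused but is still paid
# 8.0 6.0   usage exactly fills the commitment
# 12.0 10.0 $6 covers $8 of On-Demand usage; the other $4 is billed On-Demand
```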

AWS Support Plans

Plan	Starting price	Critical case response	Key benefits
Basic	Free (all accounts)	No technical support	Documentation, forums, core Trusted Advisor checks (security and service quotas), AWS Health Dashboard
Developer	$29/month (or 3% of monthly spend, whichever is greater)	Not available	1 primary contact, business-hours email support, general guidance <24 business hours, system impaired <12 business hours
Business	$100/month (or tiered % of monthly spend, whichever is greater)	Production system down: <1 hour	Unlimited contacts, 24/7 phone/chat/email, full Trusted Advisor, AWS Health API, Infrastructure Event Management (extra fee)
Enterprise On-Ramp	$5,500/month (or % of monthly spend, whichever is greater)	Business-critical system down: <30 min	Pool of Technical Account Managers, Concierge Support Team, annual architecture reviews, proactive programs
Enterprise	$15,000/month (or tiered % of monthly spend, whichever is greater)	Business-critical system down: <15 min	Dedicated TAM, Concierge Support, proactive guidance, Well-Architected Reviews, training credits
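Because the paid plans charge the greater of a flat minimum or a share of monthly spend, the fee is straightforward to compute. The Business tiers coded here (10% of the first $10K, 7% of $10K to $80K, 5% of $80K to $250K, 3% above $250K) are as published at the time of writing and should be checked against the AWS Support pricing page; the function names are hypothetical.

```python
def _tiered(amount, tiers):
    """Apply marginal rates: each slice of `amount` pays its own tier's rate."""
    total, remaining = 0.0, amount
    for size, rate in tiers:
        chunk = remaining if size is None else min(remaining, size)
        total += chunk * rate
        remaining -= chunk
        if remaining <= 0:
            break
    return total

# (slice size in $, rate); None means "everything above the previous slices"
BUSINESS_TIERS = [(10_000, 0.10), (70_000, 0.07), (170_000, 0.05), (None, 0.03)]

def developer_fee(monthly_spend):
    return max(29.0, 0.03 * monthly_spend)  # greater of $29 or 3%

def business_fee(monthly_spend):
    return max(100.0, _tiered(monthly_spend, BUSINESS_TIERS))  # greater of $100 or tiered %

print(round(developer_fee(500), 2))     # 29.0
print(round(developer_fee(2_000), 2))   # 60.0
print(round(business_fee(500), 2))      # 100.0
print(round(business_fee(50_000), 2))   # 3800.0: $1,000 on the first $10K + $2,800 on the next $40K
```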

Free Tier — Always Free vs 12-Month Free

Always Free (no expiry)

Lambda: 1 million requests/month + 400,000 GB-seconds compute/month

DynamoDB: 25 GB storage + 25 WCU + 25 RCU

CloudFront: 1 TB data transfer out + 10 million HTTP/S requests/month

CloudWatch: 10 custom metrics + 10 alarms + 1 million API requests

SNS: 1 million publishes + 100,000 HTTP/S deliveries + 1,000 email deliveries per month

SQS: 1 million requests/month

Cognito: 50,000 monthly active users in User Pools
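The always-free Lambda allowance is measured in two units: requests, and GB-seconds (memory in GB multiplied by execution time in seconds). The helper below is hypothetical and uses an average duration, whereas Lambda actually rounds each invocation's duration up to the nearest millisecond.

```python
def lambda_billable(requests, avg_duration_ms, memory_mb,
                    free_requests=1_000_000, free_gb_seconds=400_000):
    """Return (billable requests, billable GB-seconds) after the monthly free tier."""
    gb_seconds = requests * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return max(0, requests - free_requests), max(0.0, gb_seconds - free_gb_seconds)

# 3M invocations of a 512 MB function averaging 200 ms: 300,000 GB-s, inside the allowance
print(lambda_billable(3_000_000, 200, 512))     # (2000000, 0.0)
# 10M invocations of a 1,024 MB function averaging 1 s: 10,000,000 GB-s
print(lambda_billable(10_000_000, 1000, 1024))  # (9000000, 9600000.0)
```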

12-Month Free (new accounts only)

EC2: 750 hours/month of t2.micro or t3.micro, depending on region (Linux and Windows each have their own 750 hours)

S3: 5 GB Standard storage + 20,000 GET + 2,000 PUT requests

RDS: 750 hours/month of db.t2.micro or db.t3.micro (MySQL, PostgreSQL, MariaDB)

EBS: 30 GB total of General Purpose SSD (gp2/gp3) or Magnetic storage

Elastic Load Balancer: 750 hours/month shared between Classic and Application Load Balancers
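The 750-hour EC2 allowance is roughly one micro instance running 24/7 (a 31-day month is 744 hours). Because the hours are pooled across all eligible instances, a second always-on instance is mostly billed. The helper below is hypothetical and models a single OS allowance.

```python
def ec2_free_tier_overage(instance_hours, free_hours=750):
    """instance_hours: hours each eligible micro instance ran this month.
    The allowance is pooled, so hours from all eligible instances add up."""
    return max(0, sum(instance_hours) - free_hours)

print(ec2_free_tier_overage([744]))       # 0: one instance running all of a 31-day month
print(ec2_free_tier_overage([744, 744]))  # 738: the second instance is mostly billed
print(ec2_free_tier_overage([300, 400]))  # 0: two part-time instances still fit
```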

Keyword → Service (Fast-Fire Recall)

If the question says…

"serverless"→ Lambda, Fargate, DynamoDB, Aurora Serverless, S3, API Gateway

"containers"→ ECS (Docker, AWS orchestrator), EKS (Kubernetes), Fargate (serverless containers), ECR (registry)

"shared file system / NFS"→ EFS (not EBS — EBS is one AZ, one instance)

"decouple / async / buffer"→ SQS (queue for decoupling), SNS (pub/sub fan-out)

"real-time data streaming"→ Kinesis Data Streams; Kinesis Firehose to deliver to S3/Redshift

"CDN / edge caching"→ CloudFront

"DNS / routing"→ Route 53

"cache / reduce DB load"→ ElastiCache (Redis or Memcached); DAX for DynamoDB specifically

"data warehouse / BI / OLAP"→ Redshift; Redshift Spectrum to query S3

"audit trail / who called what API"→ CloudTrail (NOT CloudWatch)

"performance monitoring / metrics"→ CloudWatch (metrics, alarms, logs, dashboards)

"config drift / compliance"→ AWS Config

"best practice recommendations"→ Trusted Advisor

"graph database"→ Neptune

"time-series / IoT data"→ Timestream

"immutable audit ledger"→ QLDB

More keywords…

"encrypt / manage keys"→ KMS (managed, multi-tenant; keys protected by AWS-operated FIPS-validated HSMs); CloudHSM (dedicated single-tenant HSMs, FIPS 140-2 Level 3)

"detect threats / intrusion detection"→ GuardDuty (ML and threat-intelligence analysis of CloudTrail, VPC Flow Logs, and DNS logs)

"scan for vulnerabilities / CVE"→ Amazon Inspector (EC2 instances, ECR container images, Lambda functions)

"find PII in S3"→ Amazon Macie

"DDoS protection"→ Shield Standard (free, automatic); Shield Advanced (paid, 24/7 Shield Response Team)

"block SQLi / XSS / web attacks"→ WAF (Web Application Firewall)

"store secrets / DB password rotation"→ Secrets Manager (auto-rotates); SSM Parameter Store (manual, cheaper)

"IaC / infrastructure as code"→ CloudFormation (JSON/YAML); CDK (Python/TypeScript/etc. compiles to CFN)

"hybrid cloud / on-prem integration"→ Direct Connect (fiber), Storage Gateway, Outposts (AWS in your DC)

"migrate database to AWS"→ DMS (Database Migration Service); Schema Conversion Tool (SCT) for heterogeneous migrations

"move petabytes physically"→ Snowball Edge (80 TB); Snowmobile (100 PB per truck)

"orchestrate multi-step workflow"→ Step Functions

"SSH without bastion / SSH keys"→ Systems Manager Session Manager

"user sign-up / authentication for your app"→ Cognito (User Pools = user directory; Identity Pools = AWS access federation)

"cross-account access"→ IAM Roles (assume role from another account); AWS Organizations + SCPs for guardrails
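The "move petabytes physically" rule of thumb comes from simple arithmetic: transfer time = data size ÷ effective bandwidth. The 80% link efficiency is an assumption, decimal units are used (1 TB = 10^12 bytes), and `transfer_days` is a hypothetical helper.

```python
SECONDS_PER_DAY = 86_400

def transfer_days(terabytes, gbps, efficiency=0.8):
    """Days needed to move `terabytes` (decimal TB) over a `gbps` link.

    efficiency is an assumed fraction of line rate actually achieved.
    """
    bits = terabytes * 1e12 * 8
    seconds = bits / (gbps * 1e9 * efficiency)
    return seconds / SECONDS_PER_DAY

print(round(transfer_days(80, 1), 1))      # 9.3: one Snowball Edge worth over 1 Gbps
print(round(transfer_days(1_000, 1), 1))   # 115.7: 1 PB over 1 Gbps
print(round(transfer_days(1_000, 10), 1))  # 11.6: 1 PB over a 10 Gbps link
```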

Common Interview Q&As

Q: What's the difference between S3 and EBS?

S3 = object storage, accessed via HTTP API/URL, not mountable as a drive, virtually unlimited capacity, reachable from anywhere over HTTPS (each bucket still lives in one region). EBS = block storage (like a hard drive), attached to one EC2 instance, tied to one AZ, low-latency random I/O. Choose EBS for OS and databases; choose S3 for static files, backups, media, and large-scale data.

Q: What's the difference between SQS and SNS?

SQS = pull-based queue; one consumer (or consumer group) processes and deletes each message; great for decoupling services. SNS = push-based pub/sub; one message published → all subscribers receive it simultaneously. Often used together: SNS fan-out to multiple SQS queues for parallel processing.
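The pull-versus-push difference and the SNS-to-SQS fan-out pattern can be modelled in memory. `Queue` and `Topic` are hypothetical toy classes, not the boto3 API; they ignore visibility timeouts, retries, and the at-least-once delivery of SQS standard queues.

```python
from collections import deque

class Queue:
    """Toy stand-in for SQS: consumers pull, and each message is handled once."""

    def __init__(self):
        self._messages = deque()

    def send(self, body):
        self._messages.append(body)

    def receive(self):
        # Real SQS hides a received message for a visibility timeout and the
        # consumer deletes it after processing; here receive simply removes it
        return self._messages.popleft() if self._messages else None

class Topic:
    """Toy stand-in for SNS: publish pushes a copy to every subscriber."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, queue):
        self._subscribers.append(queue)

    def publish(self, body):
        for queue in self._subscribers:
            queue.send(body)

# Fan-out: one topic feeding two queues that are processed independently
orders = Topic()
billing, shipping = Queue(), Queue()
orders.subscribe(billing)
orders.subscribe(shipping)
orders.publish("order-42")

print(billing.receive(), shipping.receive())  # order-42 order-42
print(billing.receive())                      # None: already consumed
```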

Q: What's a Region vs an Availability Zone?

Region = independent geographic location (e.g. us-east-1 = Northern Virginia). Has multiple AZs. Data does not leave the region unless you explicitly configure it. AZ = one or more physically separate data centers within a region, connected by low-latency links. Deploy across ≥2 AZs for HA; across ≥2 Regions for DR.
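Spreading across AZs helps because an outage must hit every copy at once. Under the strong assumption that AZ failures are independent (correlated, region-wide events break it), the availability of n copies is 1 - (1 - a)^n. The 99.5% single-AZ figure and the function name are hypothetical.

```python
def combined_availability(single, copies):
    """Probability that at least one of `copies` independent deployments is up."""
    return 1 - (1 - single) ** copies

for n in (1, 2, 3):
    print(n, round(combined_availability(0.995, n), 9))
# 1 0.995
# 2 0.999975
```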

Q: How do EC2 instances securely access S3 without hardcoding credentials?

Attach an IAM role with the appropriate S3 permissions to the EC2 instance through an instance profile. The instance retrieves temporary, automatically rotated credentials from the Instance Metadata Service (IMDS; prefer IMDSv2). The AWS SDK's default credential chain finds these credentials automatically, so the application code holds no keys. Never store access keys in code, in environment variables baked into AMIs, or in S3 buckets.

Q: What's "stateless" application design and why does it matter in AWS?

A stateless app doesn't store session data in the instance's local memory or disk. Any server can handle any request. This enables Auto Scaling — you can add/remove instances freely. Store state externally in ElastiCache (sessions), DynamoDB (data), or S3 (files). The opposite (stateful) makes scaling and failover much harder.
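The stateless pattern can be sketched with web servers sharing an external session store. A plain dict stands in for ElastiCache or DynamoDB here (an assumption for illustration; no AWS calls are made), and `WebServer` is a hypothetical class.

```python
import uuid

# A plain dict stands in for an external session store such as ElastiCache
# or DynamoDB; nothing here talks to AWS.
SESSION_STORE = {}

class WebServer:
    """Hypothetical stateless web server: keeps no session data in itself."""

    def __init__(self, name, store):
        self.name = name
        self.store = store  # shared, external session store

    def login(self, user):
        session_id = str(uuid.uuid4())
        self.store[session_id] = {"user": user, "cart": []}
        return session_id

    def add_to_cart(self, session_id, item):
        self.store[session_id]["cart"].append(item)

    def cart(self, session_id):
        return list(self.store[session_id]["cart"])

server_a = WebServer("a", SESSION_STORE)
server_b = WebServer("b", SESSION_STORE)

sid = server_a.login("alice")             # first request lands on instance a
server_b.add_to_cart(sid, "book")         # next request lands on instance b
del server_a                              # scale-in: instance a goes away
server_c = WebServer("c", SESSION_STORE)  # scale-out: a fresh instance joins
print(server_c.cart(sid))                 # ['book']
```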

Q: What is the AWS Shared Responsibility Model?

AWS is responsible for "Security OF the Cloud" — the physical infrastructure, hardware, hypervisor, managed service software. Customers are responsible for "Security IN the Cloud" — OS patching on EC2, application code, IAM configuration, data encryption, network/firewall config. The boundary shifts for managed services: RDS means AWS patches the DB engine, but you configure Security Groups and encryption.

Q: Multi-AZ vs Read Replicas in RDS — what's the difference?

Multi-AZ = synchronous replication to a standby in another AZ; automatic failover; in the classic Multi-AZ DB instance deployment the standby cannot serve read traffic (Multi-AZ DB cluster deployments add two readable standbys); primarily for high availability. Read Replicas = asynchronous replication; can serve read queries to scale read throughput; can be cross-region; can be promoted to standalone DB; NOT for automatic failover.
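The practical consequence of synchronous versus asynchronous replication is what a reader sees right after a write. The `ToyDatabase` class below is hypothetical and collapses both mechanisms into one object purely to contrast them.

```python
class ToyDatabase:
    """Hypothetical model contrasting a synchronous standby with an async replica."""

    def __init__(self):
        self.primary = {}
        self.standby = {}   # Multi-AZ standby: written before the commit is acknowledged
        self.replica = {}   # read replica: catches up later
        self._pending = []  # changes not yet applied to the replica

    def write(self, key, value):
        self.primary[key] = value
        self.standby[key] = value           # synchronous: no committed data lost on failover
        self._pending.append((key, value))  # asynchronous: replica lags behind

    def replicate(self):
        # Replica catch-up; RDS reports this delay as the ReplicaLag metric
        for key, value in self._pending:
            self.replica[key] = value
        self._pending.clear()

db = ToyDatabase()
db.write("balance", 100)
print(db.standby.get("balance"))  # 100
print(db.replica.get("balance"))  # None: a read here would be stale
db.replicate()
print(db.replica.get("balance"))  # 100
```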

Numbers to Memorize

Global infrastructure: 33+ Regions, 105+ AZs, 400+ edge locations

S3: 11 nines durability (99.999999999%); max object size 5 TB; multipart upload required above 5 GB

Lambda: max timeout 15 minutes; memory 128 MB to 10 GB; default concurrency quota 1,000 per Region

Discounts vs On-Demand: Reserved Instances up to 72%; Spot up to 90%; Savings Plans up to 72% (EC2 Instance) or 66% (Compute)

DynamoDB: single-digit millisecond latency

RDS: automated backup retention 1–35 days; Aurora supports up to 15 read replicas

SQS: max retention 14 days; max message size 256 KB; FIFO 300 API calls/s per action (3,000 messages/s with batches of 10)

CloudTrail: 90 days of event history free

Snow family: Snowball Edge ~80 TB usable; Snowmobile 100 PB per truck
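The Lambda concurrency quota is easiest to reason about with the rule of thumb from the Lambda documentation: concurrency ≈ average requests per second × average duration in seconds. The traffic figures below are hypothetical.

```python
import math

DEFAULT_REGION_QUOTA = 1_000  # default concurrent executions per account per Region

def required_concurrency(requests_per_second, avg_duration_seconds):
    # Rule of thumb: concurrency = request rate x average duration
    return math.ceil(requests_per_second * avg_duration_seconds)

for rps, duration in [(100, 0.5), (500, 2.0), (800, 1.5)]:
    needed = required_concurrency(rps, duration)
    print(rps, duration, needed, needed <= DEFAULT_REGION_QUOTA)
# 100 0.5 50 True
# 500 2.0 1000 True
# 800 1.5 1200 False
```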