S3
Solutions Architect Associate
Developer Associate
Security Specialty
- Object based storage
- Can use multipart upload to make uploading large files faster (required for objects larger than 5 GB)
- 5TB limit per object, but unlimited storage in S3 in general
- Files are stored in buckets
- Folders can exist within a bucket
- Buckets live in a universal namespace and get their own DNS address you can connect to; folders do not
- Bucket names have to be unique globally
- URL format: https://BUCKET_NAME.s3.REGION.amazonaws.com (This is likely on the exam!!)
- ex: https://my-bucket.s3.us-east-1.amazonaws.com
- When you upload a file, you’ll get an HTTP 200 code upon success (This is likely on the exam!!)
- Data Consistency Model (This is likely on the exam!!)
- Read after Write consistency for PUTS for new objects
- Eventual consistency for overwrite PUTS and DELETES
- Objects consist of the following
- Key (name of the object)
- Value (the data made up of a sequence of bytes, i.e. the file)
- Version ID
- Metadata
- Subresources
- Access Control Lists
- Torrent (not an exam topic)
- 99.9% availability SLA, but designed for 99.99% availability
- Amazon guarantees 99.999999999% durability for S3 information (11 x 9s)
- Supports encryption
- Secure data using ACL and Bucket Policies
- New objects are private by default; you must explicitly make them public
- The Standard-IA and One Zone-IA classes bill a minimum object size of 128KB (know for exam!!)
- If you notice an increased number of HTTP 503 slow down responses, you can use the S3 inventory tool to generate a report on the number of versions for objects. Objects with millions of versions are automatically throttled to protect customers from an excessive amount of request traffic.
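The URL format above is worth memorizing; a small helper makes the pattern explicit (a sketch; the bucket and region names below are placeholders):

```python
def s3_object_url(bucket: str, region: str, key: str = "") -> str:
    """Build the virtual-hosted-style URL for a bucket or object."""
    base = f"https://{bucket}.s3.{region}.amazonaws.com"
    return f"{base}/{key}" if key else base

# Example from the notes:
print(s3_object_url("my-bucket", "us-east-1"))
# https://my-bucket.s3.us-east-1.amazonaws.com
```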
Storage Classes
- Standard: 99.99% availability, 11x9s durability, stored redundantly. Designed to sustain the concurrent loss of 2 facilities. Objects are always stored in at least 3 Availability Zones.
- Standard - IA (Infrequently Accessed): 99.9% availability. For data accessed less frequently but that still requires rapid access when needed. Lower storage fee than Standard, but you are charged a per-GB retrieval fee
- One Zone - IA: 99.5% availability. Infrequently accessed data stored in only a single AZ. Lower cost than Standard-IA, but the data does not survive the loss of that AZ
- Intelligent Tiering: Small monthly monitoring and auditing fee. But for that, it automatically transitions objects between tiers based upon access patterns. 11x9s durability, 99.9% availability
- Glacier: Very cheap, but for archival only. Three retrieval modes: Expedited, Standard, and Bulk. Standard restore times are 3-5 hours (know for exam!!)
- Expedited Retrieval (1-5 min)
- Standard Retrieval (3-5 hours)
- Bulk Retrieval (5-12 hours)
- Minimum storage duration of 90 days
- Glacier Deep Archive: For longer term storage, provides cheaper storage
- Standard Retrieval (12 hours)
- Bulk Retrieval (48 hours)
- Minimum storage duration is 180 days
Billing
- Charged for storage
- Charged for requests
- Storage Management Pricing (i.e. metadata/tags)
- Data Transfer Pricing: cross region replication
- Transfer Acceleration: Enables fast, easy, and secure transfer of files over long distances between your end users and an S3 bucket. Uses the CloudFront edge network
Versioning
- Once you’ve enabled versioning on a bucket, you cannot disable it. You can only suspend it.
- Every time you add a new version of a file, it will be private by default. Just because you make the first version public doesn’t mean that subsequent versions will be public by default
- When you delete a versioned object, you’re not deleting the object; you’re placing a delete marker over it, essentially creating a “deleted” version
- To actually delete, you must delete all versions of the file.
- To restore a deleted, versioned file, you can delete the delete marker and it will appear restored.
- You can require MFA to delete an object version in a bucket (MFA Delete)
- A great way to prevent accidental deletion of objects in a bucket is to enable both versioning and MFA on delete. This is a typical exam question!!
- You have to have specific permission to view old versions of objects in a bucket. You don’t inherit this by having read access to the current version
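Versioning is turned on with a small configuration document. A sketch of the payload boto3’s put_bucket_versioning expects (the bucket name and MFA device serial in the commented call are placeholders):

```python
# VersioningConfiguration payload as passed to s3.put_bucket_versioning.
versioning_config = {
    "Status": "Enabled",     # once enabled you can only set this to "Suspended", never disable
    "MFADelete": "Enabled",  # must be set by the bucket owner using an MFA device
}
# import boto3
# boto3.client("s3").put_bucket_versioning(
#     Bucket="my-bucket",                                        # placeholder
#     MFA="arn:aws:iam::123456789012:mfa/root-device 123456",    # placeholder serial + token
#     VersioningConfiguration=versioning_config,
# )
```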
Cross Region Replication
- Buckets for replication can exist in another account
- Cross region replication requires versioning in both source and destination buckets
- You can enable object ownership of replicated objects to the target bucket owner if needed
- Must create/use an IAM role for this
- Objects in your source bucket are not automatically replicated to the target bucket when CRR is enabled. Only subsequent changes/additions
- To back up existing files, you’ll need to copy them yourself with the AWS CLI (e.g. aws s3 sync)
- Deleting a file in a primary bucket doesn’t automatically replicate down to the backup bucket! (Important for test)
- You cannot replicate to multiple buckets, or use daisy chaining at this time (Important)
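The CRR settings above come together in a single replication document. A sketch of the ReplicationConfiguration as passed to boto3’s put_bucket_replication (the role ARN and bucket names are placeholders):

```python
# Sketch of a ReplicationConfiguration document for cross region replication.
replication_config = {
    "Role": "arn:aws:iam::123456789012:role/s3-crr-role",  # IAM role S3 assumes (placeholder)
    "Rules": [
        {
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {"Prefix": ""},  # empty prefix = replicate the whole bucket
            # Deletes are not replicated unless you opt in:
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {
                # Destination bucket can live in another region or account:
                "Bucket": "arn:aws:s3:::my-backup-bucket",
                # Uncomment to hand object ownership to the destination account:
                # "AccessControlTranslation": {"Owner": "Destination"},
            },
        }
    ],
}
```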
Lifecycle Management
- Useful when you want to automatically change the storage tier of files after a certain amount of time/inactivity has occurred
- Glacier is not available for every region at this time
- You need to give each rule a name, and you can also add a filter/tag so that the rule only applies to objects carrying those tags
- You can transition current or previous versions with the transition rule
- All transition times are days after creation, not days after previous transition
- Used to minimize storage costs (likely on the exam)
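Putting the pieces together, a lifecycle rule is just a named document of transitions. A sketch of the payload boto3’s put_bucket_lifecycle_configuration expects, moving tagged objects to Standard-IA after 30 days and Glacier after 90 days, both counted from creation (rule ID and tag values are placeholders):

```python
# Sketch of a lifecycle configuration for put_bucket_lifecycle_configuration.
lifecycle_config = {
    "Rules": [
        {
            "ID": "archive-logs",   # the required rule name (placeholder)
            "Status": "Enabled",
            # Only objects tagged type=log get this rule (placeholder tag):
            "Filter": {"Tag": {"Key": "type", "Value": "log"}},
            # Days are counted from object creation, not from the previous transition:
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            # Previous (noncurrent) versions can transition on their own schedule:
            "NoncurrentVersionTransitions": [
                {"NoncurrentDays": 30, "StorageClass": "GLACIER"}
            ],
        }
    ],
}
```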
Security and Encryption
- You can setup access control in two ways (Likely on the exam!)
- Bucket policies (apply to entire buckets)
- Access Control Lists (can apply down to individual objects)
- By default, buckets are set up with “Block all public access” as enabled. Meaning, buckets are private by default.
- S3 buckets can be configured to create access logs. These logs can be written to other buckets, and even across accounts
- Four different methods of encryption for S3
- In Transit
- SSL/TLS based encryption
- At Rest
- Server Side Encryption
- S3 Managed Keys (SSE-S3). Each object encrypted with a unique key, AWS encrypts keys with master key. AWS handles all keys for you; Most Common
- AWS Key Management Service, Managed Keys (SSE-KMS). Provides an audit trail of who decrypted what and when. You can create and manage the keys yourself if you’d like.
- Server Side Encryption with Customer Provided Keys (SSE-C). You manage the encryption keys; AWS handles the encryption/decryption
- Client Side Encryption
- You encrypt the data on the client and upload the already-encrypted data to S3
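The three server-side modes are selected per upload with extra request parameters. A sketch of what each looks like in a boto3 put_object call (the bucket, key, and KMS key alias are placeholders):

```python
# Extra put_object parameters that select each server-side encryption mode.
sse_s3 = {"ServerSideEncryption": "AES256"}   # SSE-S3: AWS manages every key
sse_kms = {
    "ServerSideEncryption": "aws:kms",
    "SSEKMSKeyId": "alias/my-key",            # placeholder key alias; gives a KMS audit trail
}
sse_c = {
    "SSECustomerAlgorithm": "AES256",
    "SSECustomerKey": "<base64-encoded-256-bit-key>",  # you supply the key on every request
}
# e.g. boto3.client("s3").put_object(Bucket="my-bucket", Key="doc.txt",
#                                    Body=b"hello", **sse_s3)
```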
Storage Gateway (Popular Exam Topic, know when to use each of the 4)
- Service that connects an on-premises software appliance with cloud-based storage to provide seamless, secure integration between an organization’s on-premises IT and the cloud
- Virtual appliance you install into your hypervisor on site, and then it can upload data into AWS (typically S3 or Glacier)
- Available to be downloaded as a VM
- Four Types of Storage Gateways
- File Gateway (NFS): Store flat files in S3 with this (Word docs, Images, etc.).
- Volume Gateway (iSCSI): Block based storage. Virtual hard disk that you have an OS running on, or something like a database running on. Typically not stored in S3.
- Think of this as a virtual hard disk. Data can be asynchronously backed up as point-in-time snapshots of your volumes and stored in the cloud as Amazon EBS snapshots
- Snapshots are incremental, so only changes from previous backup are captured.
- All snapshot storage is automatically compressed
- Two Types
- Stored Volumes (store an entire copy of your dataset on-premises). Previously called Gateway Stored Volumes
- Cached Volumes (store the most recent version of data on-premises, rest backed up to cloud). Previously called Gateway Cached Volumes
- Tape Gateway
- Create virtual tapes and send to S3, then use lifecycle settings to back up to Glacier or Glacier Deep Archive
- Gateways can connect typically through the internet, but also through Direct Connect
S3 Transfer Acceleration
- Uses the CloudFront edge network to accelerate uploads to S3. Instead of uploading directly to S3, you’ll upload to an edge location which will propagate the file to S3 for you
- You get a specific URL for this
- You can set this up in Properties. You just click enable and hit save
- The endpoint is listed for you in the Properties card for Transfer Acceleration
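The accelerate endpoint has a distinctive shape: no region in the hostname, because uploads go to the nearest edge location. A small helper capturing the pattern (a sketch; the bucket name is a placeholder):

```python
def s3_accelerate_url(bucket: str, key: str = "") -> str:
    """Transfer Acceleration endpoint; note there is no region in the
    hostname -- requests are routed to the nearest edge location."""
    base = f"https://{bucket}.s3-accelerate.amazonaws.com"
    return f"{base}/{key}" if key else base
```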
Hosted Websites
- URL Format (could be on the exam)
- http://BUCKETNAME.s3-website-REGION_NAME.amazonaws.com
- ex: http://mybucket.s3-website-us-east-1.amazonaws.com
- Dynamic websites cannot be hosted by S3 (could be on the exam!!)
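The website endpoint differs from the regular bucket URL in two exam-relevant ways: it is http (not https), and it uses s3-website with a dash before the region. A helper capturing the format from these notes (a sketch; some regions use a dot instead of the dash, so treat the dash form as the one these notes cover):

```python
def s3_website_url(bucket: str, region: str) -> str:
    """Static website endpoint: http scheme, and 's3-website-' (dash)
    before the region name."""
    return f"http://{bucket}.s3-website-{region}.amazonaws.com"
```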
S3 Bucket Event Notification
- SNS, SQS, Lambda targets available
- Must enable versioning on the bucket to get all possible notifications
- Events are typically delivered in seconds, but can take up to a minute
- If two writes are made to a single, non-versioned object at the same time, you may only get one event. This is why versioning is important
- You can create as many events on a bucket as you’d like
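A notification target is wired up with a NotificationConfiguration document. A sketch of one routing new-object events to a Lambda function, as passed to boto3’s put_bucket_notification_configuration (the function ARN and suffix filter are placeholders; SNS and SQS use the analogous TopicConfigurations / QueueConfigurations keys):

```python
# Sketch of a NotificationConfiguration: invoke a Lambda function when
# .jpg objects are created in the bucket.
notification_config = {
    "LambdaFunctionConfigurations": [
        {
            # Placeholder function ARN:
            "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:on-upload",
            "Events": ["s3:ObjectCreated:*"],  # fires for PUT, POST, COPY, multipart completes
            "Filter": {
                "Key": {"FilterRules": [{"Name": "suffix", "Value": ".jpg"}]}
            },
        }
    ],
}
```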
Pre-Signed URLs
- Allows an object owner to temporarily grant access to a typically private object in S3
- Anyone who receives the pre-signed URL can then access the object
- Must specify your credentials, the bucket and object key, and an expiration time for the pre-signed URL
- Can also use signed cookies (a CloudFront feature), saved in the user’s browser, letting them access multiple protected objects without a separate URL for each