S3
Solutions Architect Associate
Developer Associate
Security Specialty
- Object based storage
- Can use multipart upload to make uploading large files faster (required for objects larger than 5 GB)
- 5TB limit per object, but unlimited storage in S3 in general
- Files are stored in buckets
- Folders can exist within a bucket
- Buckets live in a universal namespace and get their own DNS address you can connect to; folders do not
- Bucket names have to be unique globally
- URL format: https://BUCKET_NAME.s3.REGION.amazonaws.com (This is likely on the exam!!)
- ex: https://my-bucket.s3.us-east-1.amazonaws.com
- When you upload a file, you’ll get an HTTP 200 code upon success (This is likely on the exam!!)
- Data Consistency Model (This is likely on the exam!!)
- Read after Write consistency for PUTS for new objects
- Eventual consistency for overwrite PUTS and DELETES
- Objects consist of the following
- Key (name of the object)
- Value (the data made up of a sequence of bytes, i.e. the file)
- Version ID
- Metadata
- Subresources
- Access Control Lists
- Torrent (not an exam topic)
- 99.9% availability SLA, but designed for 99.99% availability
- Amazon guarantees 99.999999999% durability for S3 information (11 x 9s)
- Supports encryption
- Secure data using ACL and Bucket Policies
- New objects are private by default; you must explicitly make them public
- The Standard-IA and One Zone-IA classes bill a minimum object size of 128KB (know for exam!!)
- If you notice an increased number of HTTP 503 slow down responses, you can use the S3 inventory tool to generate a report on the number of versions for objects. Objects with millions of versions are automatically throttled to protect customers from an excessive amount of request traffic.
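The URL format above is worth memorizing; a small helper makes the pattern explicit (a sketch; the bucket and region names below are placeholders):

```python
def s3_object_url(bucket: str, region: str, key: str = "") -> str:
    """Build the virtual-hosted-style URL for a bucket or object."""
    base = f"https://{bucket}.s3.{region}.amazonaws.com"
    return f"{base}/{key}" if key else base

# Example from the notes:
print(s3_object_url("my-bucket", "us-east-1"))
# https://my-bucket.s3.us-east-1.amazonaws.com
```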
Storage Classes
- Standard: 99.99% availability, 11x9s durability, stored redundantly. Designed to sustain the concurrent loss of 2 facilities. Objects are always stored in at least 3 Availability Zones.
- Standard - IA (Infrequently Accessed): 99.9% availability. For data accessed less frequently but that still requires rapid access when needed. Lower storage fee than Standard, but you are charged a per-GB retrieval fee
- One Zone - IA: 99.5% availability. Infrequently accessed data stored in only a single AZ. Lower cost than Standard-IA, but the data does not survive the loss of that AZ
- Intelligent Tiering: Small monthly monitoring and auditing fee. But for that, it automatically transitions objects between tiers based upon access patterns. 11x9s durability, 99.9% availability
- Glacier: Very cheap, but for archival only. Three retrieval modes: Expedited, Standard, and Bulk. Standard restore times are 3-5 hours (know for exam!!)
- Expedited Retrieval (1-5 min)
- Standard Retrieval (3-5 hours)
- Bulk Retrieval (5-12 hours)
- Minimum storage duration of 90 days
- Glacier Deep Archive: For longer term storage, provides cheaper storage
- Standard Retrieval (12 hours)
- Bulk Retrieval (48 hours)
- Minimum storage duration is 180 days
Billing
- Charged for storage
- Charged for requests
- Storage Management Pricing (i.e. metadata/tags)
- Data Transfer Pricing: cross region replication
- Transfer Acceleration: Enables fast, easy, and secure transfer of files over long distances between your end users and an S3 bucket. Uses the CloudFront edge network
Versioning
- Once you’ve enabled versioning on a bucket, you cannot disable it. You can only suspend it.
- Every time you add a new version of a file, it will be private by default. Just because you make the first version public doesn’t mean that subsequent versions will be public by default
- When you delete a versioned object, you’re not deleting the object; you’re placing a delete marker over it, essentially creating a “deleted” version
- To actually delete, you must delete all versions of the file.
- To restore a deleted, versioned file, you can delete the delete marker and it will appear restored.
- You can require MFA to delete an object version in a bucket (MFA Delete)
- A great way to prevent accidental deletion of objects in a bucket is to enable both versioning and MFA on delete. This is a typical exam question!!
- You have to have specific permission to view old versions of objects in a bucket. You don’t inherit this by having read access to the current version
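Versioning is turned on with a small configuration document. A sketch of the payload boto3’s put_bucket_versioning expects (the bucket name and MFA device serial in the commented call are placeholders):

```python
# VersioningConfiguration payload as passed to s3.put_bucket_versioning.
versioning_config = {
    "Status": "Enabled",     # once enabled you can only set this to "Suspended", never disable
    "MFADelete": "Enabled",  # must be set by the bucket owner using an MFA device
}
# import boto3
# boto3.client("s3").put_bucket_versioning(
#     Bucket="my-bucket",                                        # placeholder
#     MFA="arn:aws:iam::123456789012:mfa/root-device 123456",    # placeholder serial + token
#     VersioningConfiguration=versioning_config,
# )
```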
Cross Region Replication
- Buckets for replication can exist in another account
- Cross region replication requires versioning in both source and destination buckets
- You can enable object ownership of replicated objects to the target bucket owner if needed
- Must create/use an IAM role for this
- Objects in your source bucket are not automatically replicated to the target bucket when CRR is enabled. Only subsequent changes/additions
- To back up existing files, you’ll need to copy them yourself with the AWS CLI (e.g. aws s3 sync)
- Deleting a file in a primary bucket doesn’t automatically replicate down to the backup bucket! (Important for test)
- You cannot replicate to multiple buckets, or use daisy chaining at this time (Important)
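The CRR settings above come together in a single replication document. A sketch of the ReplicationConfiguration as passed to boto3’s put_bucket_replication (the role ARN and bucket names are placeholders):

```python
# Sketch of a ReplicationConfiguration document for cross region replication.
replication_config = {
    "Role": "arn:aws:iam::123456789012:role/s3-crr-role",  # IAM role S3 assumes (placeholder)
    "Rules": [
        {
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {"Prefix": ""},  # empty prefix = replicate the whole bucket
            # Deletes are not replicated unless you opt in:
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {
                # Destination bucket can live in another region or account:
                "Bucket": "arn:aws:s3:::my-backup-bucket",
                # Uncomment to hand object ownership to the destination account:
                # "AccessControlTranslation": {"Owner": "Destination"},
            },
        }
    ],
}
```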
Lifecycle Management
- Useful when you want to automatically change the storage tier of files after a certain amount of time/inactivity has occurred
- Glacier is not available for every region at this time
- You need to give each rule a name, and you can also add a filter/tag so that the rule only applies to objects carrying those tags
- You can transition current or previous versions with the transition rule
- All transition times are days after creation, not days after previous transition
- Used to minimize storage costs (likely on the exam)
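Putting the pieces together, a lifecycle rule is just a named document of transitions. A sketch of the payload boto3’s put_bucket_lifecycle_configuration expects, moving tagged objects to Standard-IA after 30 days and Glacier after 90 days, both counted from creation (rule ID and tag values are placeholders):

```python
# Sketch of a lifecycle configuration for put_bucket_lifecycle_configuration.
lifecycle_config = {
    "Rules": [
        {
            "ID": "archive-logs",   # the required rule name (placeholder)
            "Status": "Enabled",
            # Only objects tagged type=log get this rule (placeholder tag):
            "Filter": {"Tag": {"Key": "type", "Value": "log"}},
            # Days are counted from object creation, not from the previous transition:
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            # Previous (noncurrent) versions can transition on their own schedule:
            "NoncurrentVersionTransitions": [
                {"NoncurrentDays": 30, "StorageClass": "GLACIER"}
            ],
        }
    ],
}
```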
Security and Encryption
- You can setup access control in two ways (Likely on the exam!)
- Bucket policies (apply to entire buckets)
- Access Control Lists (can apply down to individual objects)
- By default, buckets are set up with “Block all public access” as enabled. Meaning, buckets are private by default.
- S3 buckets can be configured to create access logs. These logs can be written to other buckets, and even across accounts
- Four different methods of encryption for S3
- In Transit
- SSL/TLS based encryption
- At Rest
- Server Side Encryption
- S3 Managed Keys (SSE-S3). Each object encrypted with a unique key, AWS encrypts keys with master key. AWS handles all keys for you; Most Common
- AWS Key Management Service, Managed Keys (SSE-KMS). Provides an audit trail of who decrypted what and when. You can create and manage the keys yourself if you’d like.
- Server Side Encryption with Customer Provided Keys (SSE-C). You manage the encryption keys; AWS handles the encryption/decryption
- Client Side Encryption
- You encrypt the data on the client and upload the already-encrypted data to S3
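The three server-side modes are selected per upload with extra request parameters. A sketch of what each looks like in a boto3 put_object call (the bucket, key, and KMS key alias are placeholders):

```python
# Extra put_object parameters that select each server-side encryption mode.
sse_s3 = {"ServerSideEncryption": "AES256"}   # SSE-S3: AWS manages every key
sse_kms = {
    "ServerSideEncryption": "aws:kms",
    "SSEKMSKeyId": "alias/my-key",            # placeholder key alias; gives a KMS audit trail
}
sse_c = {
    "SSECustomerAlgorithm": "AES256",
    "SSECustomerKey": "<base64-encoded-256-bit-key>",  # you supply the key on every request
}
# e.g. boto3.client("s3").put_object(Bucket="my-bucket", Key="doc.txt",
#                                    Body=b"hello", **sse_s3)
```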
Storage Gateway (Popular Exam Topic, know when to use each of the 4)
- Service that connects an on-premises software appliance with cloud-based storage to provide seamless, secure integration between an organization’s on-premises IT and the cloud
- Virtual appliance you install into your hypervisor on site, and then it can upload data into AWS (typically S3 or Glacier)
- Available to be downloaded as a VM
- Four Types of Storage Gateways
- File Gateway (NFS): Store flat files in S3 with this (Word docs, Images, etc.).
- Volume Gateway (iSCSI): Block based storage. Virtual hard disk that you have an OS running on, or something like a database running on. Typically not stored in S3.
- Think of this as a virtual hard disk. Data can be asynchronously backed up as point-in-time snapshots of your volumes and stored in the cloud as Amazon EBS snapshots
- Snapshots are incremental, so only changes from previous backup are captured.
- All snapshot storage is automatically compressed
- Two Types
- Stored Volumes (store an entire copy of your dataset on-premises). Previously called Gateway Stored Volumes
- Cached Volumes (store the most recent version of data on-premises, rest backed up to cloud). Previously called Gateway Cached Volumes
- Tape Gateway
- Create virtual tapes and send to S3, then use lifecycle settings to back up to Glacier or Glacier Deep Archive
- Gateways can connect typically through the internet, but also through Direct Connect
S3 Transfer Acceleration
- Uses the CloudFront edge network to accelerate uploads to S3. Instead of uploading directly to S3, you’ll upload to an edge location which will propagate the file to S3 for you
- You get a specific URL for this
- You can set this up in Properties. You just click enable and hit save
- The endpoint is listed for you in the Properties card for Transfer Acceleration
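The accelerate endpoint has a distinctive shape: no region in the hostname, because uploads go to the nearest edge location. A small helper capturing the pattern (a sketch; the bucket name is a placeholder):

```python
def s3_accelerate_url(bucket: str, key: str = "") -> str:
    """Transfer Acceleration endpoint; note there is no region in the
    hostname -- requests are routed to the nearest edge location."""
    base = f"https://{bucket}.s3-accelerate.amazonaws.com"
    return f"{base}/{key}" if key else base
```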
Hosted Websites
- URL Format (could be on the exam)
- http://BUCKETNAME.s3-website-REGION_NAME.amazonaws.com
- ex: http://mybucket.s3-website-us-east-1.amazonaws.com
- Dynamic websites cannot be hosted by S3 (could be on the exam!!)
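The website endpoint differs from the regular bucket URL in two exam-relevant ways: it is http (not https), and it uses s3-website with a dash before the region. A helper capturing the format from these notes (a sketch; some regions use a dot instead of the dash, so treat the dash form as the one these notes cover):

```python
def s3_website_url(bucket: str, region: str) -> str:
    """Static website endpoint: http scheme, and 's3-website-' (dash)
    before the region name."""
    return f"http://{bucket}.s3-website-{region}.amazonaws.com"
```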
S3 Bucket Event Notification
- SNS, SQS, Lambda targets available
- Must enable versioning on the bucket to get all possible notifications
- Events are typically delivered in seconds, but can take up to a minute
- If two writes are made to a single, non-versioned object at the same time, you may only get one event. This is why versioning is important
- You can create as many events on a bucket as you’d like
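A notification target is wired up with a NotificationConfiguration document. A sketch of one routing new-object events to a Lambda function, as passed to boto3’s put_bucket_notification_configuration (the function ARN and suffix filter are placeholders; SNS and SQS use the analogous TopicConfigurations / QueueConfigurations keys):

```python
# Sketch of a NotificationConfiguration: invoke a Lambda function when
# .jpg objects are created in the bucket.
notification_config = {
    "LambdaFunctionConfigurations": [
        {
            # Placeholder function ARN:
            "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:on-upload",
            "Events": ["s3:ObjectCreated:*"],  # fires for PUT, POST, COPY, multipart completes
            "Filter": {
                "Key": {"FilterRules": [{"Name": "suffix", "Value": ".jpg"}]}
            },
        }
    ],
}
```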
Pre-Signed URLs
- Allows an object owner to temporarily grant access to a typically private object in S3
- Anyone who receives the pre-signed URL can then access the object
- Must specify your credentials, the bucket and object key, and an expiration time for the pre-signed URL
- Can also use signed cookies (a CloudFront feature), saved in the user’s browser, letting them access multiple protected objects without a separate URL for each