DynamoDB

Solution Architect Associate

Developer Associate

Security Specialty

  • Is a fast and flexible NoSQL database service for applications that need consistent, single-digit millisecond latency at any scale
  • Supports both Document and Key/Value data models
  • You don’t need to define your data models up front
  • Tables are stored on SSD storage, giving consistently fast performance for reads and writes
  • Spread across 3 geographically distinct data centers
  • Choice of 2 consistency models
    • Eventually consistent reads (default)
      • Consistency is usually reached within 1 second
    • Strongly consistent reads
      • Reads will always return the result of all successful writes across all 3 locations
  • Item = row
  • Attribute = column
  • Documents can be written in XML, JSON or HTML
  • Primary Keys
    • 2 types of primary keys
      • Partition Key - unique attribute
        • The value of the partition key is the input to an internal hash function. The output of that function determines the physical location where that data will be stored
      • Composite Key (Partition Key + Sort Key)
        • Used when partition key may not be unique
        • 2 items may have the same partition key, but they would have to have a different sort key
    • All items with the same partition key are stored together, and then sorted according to the sort key value
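The role of the partition key in physical placement can be illustrated with a toy hash function (DynamoDB's real hash function is internal and not exposed; `md5` here is purely for demonstration):

```python
import hashlib

def partition_for(partition_key: str, num_partitions: int = 3) -> int:
    # Toy stand-in for DynamoDB's internal hash-based placement:
    # the partition key value is hashed, and the result picks the
    # physical partition where the item lives.
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

# Items sharing a partition key always hash to the same partition,
# where they are stored together and sorted by the sort key.
assert partition_for("user-123") == partition_for("user-123")
```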
  • Access control is managed via IAM
  • You can create IAM users/roles which have access to specific tables
  • You can use a special IAM Condition to restrict users' access to only their own items in a table
    • Can attach the condition to an IAM policy to allow access only when items have their specific partition key value equal to their user id
    • The condition parameter for this is dynamodb:LeadingKeys
  • If you want to return metrics on the consumed capacity for a Query operation, set the ReturnConsumedCapacity parameter in the query request to TOTAL
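A sketch of what such a LeadingKeys policy might look like, written here as a Python dict. The table ARN and the `${www.amazon.com:user_id}` web-identity variable are placeholders; substitute your own table and identity provider:

```python
# Hypothetical IAM policy restricting users to items whose leading
# (partition) key equals their own user id from a web identity provider.
leading_keys_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem"],
            "Resource": "arn:aws:dynamodb:us-east-1:111122223333:table/MyTable",
            "Condition": {
                "ForAllValues:StringEquals": {
                    "dynamodb:LeadingKeys": ["${www.amazon.com:user_id}"]
                }
            },
        }
    ],
}
```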

Indexes

  • An index is a data structure which allows you to perform really fast queries
  • DynamoDB supports two types of indices

Local Secondary Index

  • Can only be created when you are creating your table (know this for exam!!)
  • You cannot add, remove, or modify it later
  • It has the same Partition Key as your original table, but it has a different Sort Key (know this for exam!!)
  • Any queries based on the sort key when using this index are much faster than when using the main table

Global Secondary Index

  • You can create when you create table, or any time after (know this for exam!!)
  • Different partition key and different sort key (know this for exam!!)
  • Gives you a completely different view of the data
  • Speeds up any queries related to this alternative partition and sort key
  • BE SURE TO KNOW DIFFERENCES BETWEEN THESE TWO
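One way to remember the difference: a GSI can be added to an existing table (e.g. via UpdateTable), while an LSI can only be declared at CreateTable time. A sketch of a GSI definition you might pass, with hypothetical index and attribute names:

```python
# Hypothetical GSI with a different partition key ("status") and
# different sort key ("created_at") from the base table.
gsi_create = {
    "Create": {
        "IndexName": "StatusCreatedAtIndex",
        "KeySchema": [
            {"AttributeName": "status", "KeyType": "HASH"},       # partition key
            {"AttributeName": "created_at", "KeyType": "RANGE"},  # sort key
        ],
        "Projection": {"ProjectionType": "ALL"},
    }
}
# The actual call needs AWS credentials and an existing table, e.g.:
# boto3.client("dynamodb").update_table(
#     TableName="Orders",
#     AttributeDefinitions=[...],
#     GlobalSecondaryIndexUpdates=[gsi_create])
```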

Scan vs. Query API Call

Query

  • A Query operation finds items in your table based upon the Primary Key attribute and a distinct value to search for
  • Selects all attributes for the item that is found in the query
  • You can use an optional sort key name and value to refine your results
    • Ex: if your sort key is a timestamp, you can restrict results based upon time
  • By default, a Query returns all attributes, but you can use a ProjectionExpression if you only want the query to return specific attributes
  • Results of a query are always sorted by the sort key
    • If numeric, sorted in ascending order by default
    • You can reverse the order by setting the ScanIndexForward parameter (know this for exam!! refers to Query not Scan…)
  • By default, all queries are eventually consistent, but can be set for strongly consistent
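The Query options above map onto request parameters like these (low-level API shape; table and attribute names are made up, and the actual `query` call needs AWS credentials, so it is left commented out):

```python
query_params = {
    "TableName": "UserOrders",
    "KeyConditionExpression": "user_id = :uid AND order_ts BETWEEN :start AND :end",
    "ExpressionAttributeValues": {
        ":uid":   {"S": "u123"},
        ":start": {"S": "2023-01-01T00:00:00Z"},
        ":end":   {"S": "2023-01-31T23:59:59Z"},
    },
    "ProjectionExpression": "order_ts, total",  # only return these attributes
    "ScanIndexForward": False,                  # reverse sort order (newest first)
    "ConsistentRead": True,                     # opt in to strongly consistent reads
    "ReturnConsumedCapacity": "TOTAL",          # report consumed-capacity metrics
}
# boto3.client("dynamodb").query(**query_params)
```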

Scan

  • A scan operation examines every item in the table
  • By default, it also returns all data attributes
  • Use the ProjectionExpression parameter to refine the scan to only return the attributes you want
  • You can refine results of the scan by adding filters
    • You can filter on any Attribute, and you can specify comparisons like >, <, etc.
  • Remember though, the scan examines every single item in the table first, then applies filters on top. So you’re always working with the entire table
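A small pure-Python sketch of why Scan filters don't save read capacity: every item in the table is examined first, and the filter is only applied afterwards:

```python
def scan_with_filter(table_items, predicate):
    # Mimics Scan semantics: the whole table is read (and billed),
    # then the filter trims the result set.
    items_examined = len(table_items)  # always the full table
    matches = [item for item in table_items if predicate(item)]
    return matches, items_examined

# Toy table of 100 items.
table = [{"id": i, "total": i * 10} for i in range(100)]
matches, examined = scan_with_filter(table, lambda item: item["total"] > 900)
assert examined == 100    # read capacity consumed for all 100 items...
assert len(matches) == 9  # ...even though only 9 were returned
```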

Differences

  • Query is much more efficient than a Scan
  • Scan dumps the entire table then filters
  • As the table grows, scans take longer, and if large enough the scan can use up the provisioned throughput for a large table in just a single operation

Performance Improvements of Scans

  • You can reduce the impact of the query/scan by setting a smaller page size, which uses fewer read operations
    • This results in a larger number of smaller operations, which in turn allows other requests to succeed without throttling
  • But in general, avoid using scan operations if you can. Design your tables in such a way that you can use Query, Get, or BatchGetItem APIs
  • By default, a scan operation processes data sequentially, returning results in 1MB increments before moving on to retrieve the next 1MB of data. It can only scan one partition at a time
    • You can configure DynamoDB to use Parallel scans instead by logically dividing a table or index into segments and scanning each segment in parallel
    • Best to avoid parallel scans if your table or index is already incurring heavy read/write activity from other applications
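The parallel-scan idea boils down to issuing one Scan request per segment, each with Segment and TotalSegments set (and optionally Limit for a smaller page size). A sketch of the request parameters, with a hypothetical table name:

```python
TOTAL_SEGMENTS = 4

# One Scan request per segment; in practice each would run in its own
# worker/thread via client.scan(**params) with AWS credentials.
segment_requests = [
    {
        "TableName": "BigTable",
        "TotalSegments": TOTAL_SEGMENTS,  # how many logical slices of the table
        "Segment": segment,               # which slice this worker scans
        "Limit": 100,                     # smaller page size, gentler on throughput
    }
    for segment in range(TOTAL_SEGMENTS)
]
```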

Provisioned Throughput (know this for exam!!!)

  • The mechanism we use to define the capacity and performance requirements for DynamoDB
  • Throughput is measured in Capacity Units
  • When you create your table, you need to define Read and Write capacity units
  • 1 Write Capacity Unit = 1x 1KB write per second
  • 1 Read Capacity Unit = 1x Strongly Consistent Read of 4KB per second OR 2x Eventually Consistent Reads of 4KB per second (this is default)
  • Be sure to know how to do this math for the exam!!!
  • If your app reads and writes larger records, this will cost you more
  • Ex: Your application needs to read 80 items per second, each 3KB in size, and you need Strongly Consistent Reads
    • Start by calculating the number of Read Capacity Units you need for each Read
    • Here we’d start with 3KB / 4KB = 0.75
    • Round the result up to the nearest whole number. So you’d need 1 RCU per Item
    • Multiply by the number of desired reads per second, here 80 x 1 = 80 RCU needed for this table (for Strongly Consistent Reads)
      • If this were for Eventually Consistent Reads, you’d divide the final number by 2 (here 80 / 2 = 40 RCU)
  • Ex: Your application needs to write 100 items per second into your table, and each item is 512B in size
    • First, calculate how many Capacity Units are needed for each write
    • Here we’d start with 512B / 1KB = 0.5
    • Round up to the nearest whole number, so here we would need 1x Write Capacity Unit per write
    • Multiply by number of desired writes per second, here 100 x 1 = 100 WCU needed
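The two worked examples above can be captured in a couple of helper functions (my own sketch of the exam math, not an AWS API):

```python
import math

def read_capacity_units(reads_per_sec, item_size_kb, strongly_consistent=True):
    # Each read consumes ceil(item size / 4KB) units; eventually
    # consistent reads cost half as much.
    units_per_read = math.ceil(item_size_kb / 4)
    rcu = reads_per_sec * units_per_read
    return rcu if strongly_consistent else math.ceil(rcu / 2)

def write_capacity_units(writes_per_sec, item_size_kb):
    # Each write consumes ceil(item size / 1KB) units.
    return writes_per_sec * math.ceil(item_size_kb)

assert read_capacity_units(80, 3) == 80                             # example 1
assert read_capacity_units(80, 3, strongly_consistent=False) == 40  # halved
assert write_capacity_units(100, 0.5) == 100                        # example 2
```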

On-Demand Capacity Pricing

  • New as of re:Invent 2018
  • Charges for reading, writing and storing data
  • With On-Demand, you don’t need to specify your requirements
  • DynamoDB instantly scales up and down based upon activity
  • Great for unpredictable workloads, as you pay for only what you use
  • You can switch between provisioned and on-demand capacity modes once per 24 hours

DynamoDB Accelerator (DAX)

  • Fully managed, clustered, in-memory cache for DynamoDB
  • Gives massive READ performance (up to 10x)
  • Ideal for read-heavy and bursty read workloads
  • DAX is a write-through caching service, meaning data is written to the cache and the DynamoDB table at the same time
  • Allows you to point your DynamoDB API call to the DAX cluster instead of the Table directly
  • If the item you’re querying is in the cache (cache-hit), DAX returns the item to the app
  • If the item is not in the cache (cache-miss), DAX performs an eventually consistent GetItem operation on the DynamoDB Table
  • Remember, this caters to apps that support eventually consistent reads. Strongly consistent reads are not supported

Transactions

  • ACID (Atomic, Consistent, Isolated, Durable)
  • Read or Write multiple items across multiple tables as an all or nothing approach to database transactions
  • Check for a pre-requisite condition before writing to a table
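A sketch of the request shape for an all-or-nothing transaction with a pre-requisite condition check (hypothetical table and attribute names; the actual `transact_write_items` call needs AWS credentials, so it is left commented out):

```python
transact_params = {
    "TransactItems": [
        {   # Pre-requisite: the source account must have enough balance...
            "ConditionCheck": {
                "TableName": "Accounts",
                "Key": {"account_id": {"S": "acct-1"}},
                "ConditionExpression": "balance >= :amt",
                "ExpressionAttributeValues": {":amt": {"N": "50"}},
            }
        },
        {   # ...otherwise this credit is not applied either (all or nothing).
            "Update": {
                "TableName": "Accounts",
                "Key": {"account_id": {"S": "acct-2"}},
                "UpdateExpression": "SET balance = balance + :amt",
                "ExpressionAttributeValues": {":amt": {"N": "50"}},
            }
        },
    ]
}
# boto3.client("dynamodb").transact_write_items(**transact_params)
```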

DynamoDB TTL (Time to Live)

  • Time to Live attribute defines an expiry time for your data
  • Expired items are marked for deletion once the TTL timestamp has passed, and are removed within 48 hours
  • Great for removing irrelevant or old data
    • Session data, event logs, temporary data, etc.
  • Helps to reduce costs for storage of data in DynamoDB
  • TTL is expressed in epoch time/UNIX/POSIX time
    • Number of seconds elapsed since the start of the Unix epoch (00:00:00 UTC, January 1, 1970)
  • You can filter out expired items from your queries and scans
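Setting a TTL on an item just means writing a Number attribute containing an epoch-seconds timestamp. A sketch (the attribute name `expires_at` is hypothetical; it would be whichever attribute you configured as the table's TTL attribute):

```python
import time

def ttl_epoch(seconds_from_now):
    # TTL attributes hold epoch time in seconds (UTC).
    return int(time.time()) + seconds_from_now

session_item = {
    "session_id": "sess-42",
    "expires_at": ttl_epoch(3600),  # expire one hour from now
}
# To hide expired-but-not-yet-deleted items from queries and scans,
# filter on the TTL attribute, e.g. FilterExpression "expires_at > :now".
```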

DynamoDB Streams

  • A time-ordered sequence/stream of item level modifications. Changes to items are recorded automatically in your stream for insert/update/deletes
  • Logs are encrypted at rest and stored for 24 hours
  • Accessed using a dedicated endpoint (and a different API to access the stream data)
  • By default the Primary Key is recorded, but you can also store before and after images of the Item with relation to the change being made
  • A DynamoDB Stream can be a Lambda trigger, which can be quite useful
  • Events are recorded in near real time
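A minimal sketch of a Lambda handler consuming DynamoDB Stream records (the event shape follows the stream record format with eventName and a dynamodb payload; the processing logic itself is made up):

```python
def handler(event, context):
    # Each stream record carries eventName (INSERT / MODIFY / REMOVE) and,
    # depending on the stream view type, Keys / NewImage / OldImage.
    changes = []
    for record in event.get("Records", []):
        ddb = record["dynamodb"]
        changes.append((record["eventName"], ddb.get("Keys")))
    return changes

# Hypothetical stream event containing a single INSERT:
sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {"Keys": {"user_id": {"S": "u123"}}},
        }
    ]
}
```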

ProvisionedThroughputExceeded Exception

  • Means your request rate is too high for the read/write capacity provisioned on the table
  • The AWS SDKs will automatically retry the requests until they are successful (up to the configured retry limit)
  • If you are not using the SDK, you can:
    • Reduce the request frequency
    • Or use Exponential Backoff

Exponential Backoff

  • Many components in a network can generate errors from being overloaded
  • All AWS SDKs use simple retries as well as Exponential Backoff
  • Progressively longer waits between consecutive retries
    • i.e. 50ms, 100ms, 200ms, etc. for subsequent requests
  • If retries still fail after about a minute, your request sizes may be exceeding the throughput of your provisioned read/write capacity
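A sketch of the retry-with-exponential-backoff pattern the SDKs implement (the delays, jitter, and retry count here are illustrative, not the SDKs' exact values):

```python
import random
import time

def with_backoff(operation, max_retries=5, base_delay_ms=50):
    # Waits 50ms, 100ms, 200ms, ... (plus jitter) between attempts,
    # re-raising the last error once retries are exhausted.
    for attempt in range(max_retries):
        try:
            return operation()
        except Exception:
            if attempt == max_retries - 1:
                raise
            delay = (base_delay_ms * 2 ** attempt) / 1000.0
            time.sleep(delay + random.uniform(0, delay))

# Simulated flaky call: fails twice, then succeeds on the third attempt.
calls = {"count": 0}
def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("ProvisionedThroughputExceededException (simulated)")
    return "ok"

result = with_backoff(flaky)
```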