DynamoDB

Solution Architect Associate

Developer Associate

Security Specialty

  • Is a fast and flexible NoSQL database service for applications that need consistent, single-digit millisecond latency at any scale
  • Supports both Document and Key/Value data models
  • You don’t need to define your data models up front
  • Tables are stored on SSD storage, giving consistently fast performance for reads and writes
  • Spread across 3 geographically distinct data centers
  • Choice of 2 consistency models
    • Eventually consistent reads (default)
      • Consistency is usually reached within 1 second
    • Strongly consistent reads
      • Reads will always return the result of all successful writes across all 3 locations
  • Item = row
  • Attribute = column
  • Documents can be written in XML, JSON or HTML
  • Primary Keys
    • 2 types of primary keys
      • Partition Key - unique attribute
        • The value of the partition key is the input to an internal hash function. The output of that function determines the physical location where that data will be stored
      • Composite Key (Partition Key + Sort Key)
        • Used when partition key may not be unique
        • 2 items may have the same partition key, but they would have to have a different sort key
    • All items with the same partition key are stored together, and then sorted according to the sort key value
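The role of the partition key in physical placement can be illustrated with a toy hash function (DynamoDB's real hash function is internal and not exposed; `md5` here is purely for demonstration):

```python
import hashlib

def partition_for(partition_key: str, num_partitions: int = 3) -> int:
    # Toy stand-in for DynamoDB's internal hash-based placement:
    # the partition key value is hashed, and the result picks the
    # physical partition where the item lives.
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

# Items sharing a partition key always hash to the same partition,
# where they are stored together and sorted by the sort key.
assert partition_for("user-123") == partition_for("user-123")
```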
  • Access control is managed via IAM
  • You can create IAM users/roles which have access to specific tables
  • You can use a special IAM Condition to restrict users' access to only their own items in a table
    • Can attach the condition to an IAM policy to allow access only when items have their specific partition key value equal to their user id
    • The condition parameter for this is dynamodb:LeadingKeys
  • If you want to return metrics on the consumed capacity for a Query operation, set the ReturnConsumedCapacity parameter in the query request to TOTAL
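A sketch of what such a LeadingKeys policy might look like, written here as a Python dict. The table ARN and the `${www.amazon.com:user_id}` web-identity variable are placeholders; substitute your own table and identity provider:

```python
# Hypothetical IAM policy restricting users to items whose leading
# (partition) key equals their own user id from a web identity provider.
leading_keys_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem"],
            "Resource": "arn:aws:dynamodb:us-east-1:111122223333:table/MyTable",
            "Condition": {
                "ForAllValues:StringEquals": {
                    "dynamodb:LeadingKeys": ["${www.amazon.com:user_id}"]
                }
            },
        }
    ],
}
```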

Indexes

  • An index is a data structure which allows you to perform really fast queries
  • DynamoDB supports two types of indices

Local Secondary Index

  • Can only be created when you are creating your table (know this for exam!!)
  • You cannot add, remove, or modify it later
  • It has the same Partition Key as your original table, but it has a different Sort Key (know this for exam!!)
  • Any queries based on the sort key when using this index are much faster than when using the main table

Global Secondary Index

  • You can create when you create table, or any time after (know this for exam!!)
  • Different partition key and different sort key (know this for exam!!)
  • Gives you a completely different view of the data
  • Speeds up any queries related to this alternative partition and sort key
  • BE SURE TO KNOW DIFFERENCES BETWEEN THESE TWO
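One way to remember the difference: a GSI can be added to an existing table (e.g. via UpdateTable), while an LSI can only be declared at CreateTable time. A sketch of a GSI definition you might pass, with hypothetical index and attribute names:

```python
# Hypothetical GSI with a different partition key ("status") and
# different sort key ("created_at") from the base table.
gsi_create = {
    "Create": {
        "IndexName": "StatusCreatedAtIndex",
        "KeySchema": [
            {"AttributeName": "status", "KeyType": "HASH"},       # partition key
            {"AttributeName": "created_at", "KeyType": "RANGE"},  # sort key
        ],
        "Projection": {"ProjectionType": "ALL"},
    }
}
# The actual call needs AWS credentials and an existing table, e.g.:
# boto3.client("dynamodb").update_table(
#     TableName="Orders",
#     AttributeDefinitions=[...],
#     GlobalSecondaryIndexUpdates=[gsi_create])
```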

Scan vs. Query API Call

Query

  • A Query operation finds items in your table based upon the Primary Key attribute and a distinct value to search for
  • Selects all attributes for the item that is found in the query
  • You can use an optional sort key name and value to refine your results
    • Ex: if your sort key is a timestamp, you can restrict results based upon time
  • By default, a Query returns all attributes, but you can use a ProjectionExpression if you only want the query to return specific attributes
  • Results of a query are always sorted by the sort key
    • If numeric, sorted in ascending order by default
    • You can reverse the order by setting the ScanIndexForward parameter (know this for exam!! refers to Query not Scan…)
  • By default, all queries are eventually consistent, but can be set for strongly consistent
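The Query options above map onto request parameters like these (low-level API shape; table and attribute names are made up, and the actual `query` call needs AWS credentials, so it is left commented out):

```python
query_params = {
    "TableName": "UserOrders",
    "KeyConditionExpression": "user_id = :uid AND order_ts BETWEEN :start AND :end",
    "ExpressionAttributeValues": {
        ":uid":   {"S": "u123"},
        ":start": {"S": "2023-01-01T00:00:00Z"},
        ":end":   {"S": "2023-01-31T23:59:59Z"},
    },
    "ProjectionExpression": "order_ts, total",  # only return these attributes
    "ScanIndexForward": False,                  # reverse sort order (newest first)
    "ConsistentRead": True,                     # opt in to strongly consistent reads
    "ReturnConsumedCapacity": "TOTAL",          # report consumed-capacity metrics
}
# boto3.client("dynamodb").query(**query_params)
```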

Scan

  • A scan operation examines every item in the table
  • By default, it also returns all data attributes
  • Use the ProjectionExpression parameter to refine the scan to only return the attributes you want
  • You can refine results of the scan by adding filters
    • You can filter on any Attribute, and you can specify comparisons like >, <, etc.
  • Remember though, the scan examines every single item in the table first, then applies filters on top. So you’re always working with the entire table
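A small pure-Python sketch of why Scan filters don't save read capacity: every item in the table is examined first, and the filter is only applied afterwards:

```python
def scan_with_filter(table_items, predicate):
    # Mimics Scan semantics: the whole table is read (and billed),
    # then the filter trims the result set.
    items_examined = len(table_items)  # always the full table
    matches = [item for item in table_items if predicate(item)]
    return matches, items_examined

# Toy table of 100 items.
table = [{"id": i, "total": i * 10} for i in range(100)]
matches, examined = scan_with_filter(table, lambda item: item["total"] > 900)
assert examined == 100    # read capacity consumed for all 100 items...
assert len(matches) == 9  # ...even though only 9 were returned
```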

Differences

  • Query is much more efficient than a Scan
  • Scan dumps the entire table then filters
  • As the table grows, scans take longer, and if large enough the scan can use up the provisioned throughput for a large table in just a single operation

Performance Improvements of Scans

  • You can reduce the impact of the query/scan by setting a smaller page size, which uses fewer read operations
    • This results in a larger number of smaller operations, which in turn allows other requests to succeed without throttling
  • But in general, avoid using scan operations if you can. Design your tables in such a way that you can use Query, Get, or BatchGetItem APIs
  • By default, a scan operation processes data sequentially, returning results in 1MB increments before moving on to retrieve the next 1MB of data. It can only scan one partition at a time
    • You can configure DynamoDB to use Parallel scans instead by logically dividing a table or index into segments and scanning each segment in parallel
    • Best to avoid parallel scans if your table or index is already incurring heavy read/write activity from other applications
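The parallel-scan idea boils down to issuing one Scan request per segment, each with Segment and TotalSegments set (and optionally Limit for a smaller page size). A sketch of the request parameters, with a hypothetical table name:

```python
TOTAL_SEGMENTS = 4

# One Scan request per segment; in practice each would run in its own
# worker/thread via client.scan(**params) with AWS credentials.
segment_requests = [
    {
        "TableName": "BigTable",
        "TotalSegments": TOTAL_SEGMENTS,  # how many logical slices of the table
        "Segment": segment,               # which slice this worker scans
        "Limit": 100,                     # smaller page size, gentler on throughput
    }
    for segment in range(TOTAL_SEGMENTS)
]
```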

Provisioned Throughput (know this for exam!!!)

  • The mechanism we use to define the capacity and performance requirements for DynamoDB
  • Throughput is measured in Capacity Units
  • When you create your table, you need to define Read and Write capacity units
  • 1 Write Capacity Unit = 1x 1KB write per second
  • 1 Read Capacity Unit = 1x Strongly Consistent Read of 4KB per second OR 2x Eventually Consistent Reads of 4KB per second (this is default)
  • Be sure to know how to do this math for the exam!!!
  • If your app reads and writes larger records, this will cost you more
  • Ex: Your application needs to read 80 items per second, each 3KB in size, and you need Strongly Consistent Reads
    • Start by calculating the number of Read Capacity Units you need for each Read
    • Here we’d start with 3KB / 4KB = 0.75
    • Round the result up to the nearest whole number. So you’d need 1 RCU per Item
    • Multiply by the number of desired reads per second, here 80 x 1 = 80 RCU needed for this table (for Strongly Consistent Reads)
      • If this were for Eventually Consistent Reads, you’d divide the final number by 2 (here 80 / 2 = 40 RCU)
  • Ex: Your application needs to write 100 items per second into your table, and each item is 512B in size
    • First, calculate how many Capacity Units are needed for each write
    • Here we’d start with 512B / 1KB = 0.5
    • Round up to the nearest whole number, so here we would need 1x Write Capacity Unit per write
    • Multiply by number of desired writes per second, here 100 x 1 = 100 WCU needed
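The two worked examples above can be captured in a couple of helper functions (my own sketch of the exam math, not an AWS API):

```python
import math

def read_capacity_units(reads_per_sec, item_size_kb, strongly_consistent=True):
    # Each read consumes ceil(item size / 4KB) units; eventually
    # consistent reads cost half as much.
    units_per_read = math.ceil(item_size_kb / 4)
    rcu = reads_per_sec * units_per_read
    return rcu if strongly_consistent else math.ceil(rcu / 2)

def write_capacity_units(writes_per_sec, item_size_kb):
    # Each write consumes ceil(item size / 1KB) units.
    return writes_per_sec * math.ceil(item_size_kb)

assert read_capacity_units(80, 3) == 80                             # example 1
assert read_capacity_units(80, 3, strongly_consistent=False) == 40  # halved
assert write_capacity_units(100, 0.5) == 100                        # example 2
```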

On-Demand Capacity Pricing

  • New as of re:Invent 2018
  • Charges for reading, writing and storing data
  • With On-Demand, you don’t need to specify your requirements
  • DynamoDB instantly scales up and down based upon activity
  • Great for unpredictable workloads, as you pay for only what you use
  • You can switch between provisioned and on-demand capacity modes once per 24 hours

DynamoDB Accelerator (DAX)

  • Fully managed, clustered, in-memory cache for DynamoDB
  • Gives massive READ performance (up to 10x)
  • Ideal for read-heavy and bursty read workloads
  • DAX is a write-through caching service, meaning data is written to the cache and the DynamoDB table at the same time
  • Allows you to point your DynamoDB API call to the DAX cluster instead of the Table directly
  • If the item you’re querying is in the cache (cache-hit), DAX returns the item to the app
  • If the item is not in the cache (cache-miss), DAX performs an eventually consistent GetItem operation on the DynamoDB Table
  • Remember, this caters to apps that support eventually consistent reads. Strongly consistent reads are not supported

Transactions

  • ACID (Atomic, Consistent, Isolated, Durable)
  • Read or Write multiple items across multiple tables as an all or nothing approach to database transactions
  • Check for a pre-requisite condition before writing to a table
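A sketch of the request shape for an all-or-nothing transaction with a pre-requisite condition check (hypothetical table and attribute names; the actual `transact_write_items` call needs AWS credentials, so it is left commented out):

```python
transact_params = {
    "TransactItems": [
        {   # Pre-requisite: the source account must have enough balance...
            "ConditionCheck": {
                "TableName": "Accounts",
                "Key": {"account_id": {"S": "acct-1"}},
                "ConditionExpression": "balance >= :amt",
                "ExpressionAttributeValues": {":amt": {"N": "50"}},
            }
        },
        {   # ...otherwise this credit is not applied either (all or nothing).
            "Update": {
                "TableName": "Accounts",
                "Key": {"account_id": {"S": "acct-2"}},
                "UpdateExpression": "SET balance = balance + :amt",
                "ExpressionAttributeValues": {":amt": {"N": "50"}},
            }
        },
    ]
}
# boto3.client("dynamodb").transact_write_items(**transact_params)
```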

DynamoDB TTL (Time to Live)

  • Time to Live attribute defines an expiry time for your data
  • Expired items are marked for deletion once the TTL timestamp has passed, and are removed within 48 hours
  • Great for removing irrelevant or old data
    • Session data, event logs, temporary data, etc.
  • Helps to reduce costs for storage of data in DynamoDB
  • TTL is expressed in epoch time/UNIX/POSIX time
    • Number of seconds elapsed since the start of the Unix epoch (00:00:00 UTC, January 1, 1970)
  • You can filter out expired items from your queries and scans
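Setting a TTL on an item just means writing a Number attribute containing an epoch-seconds timestamp. A sketch (the attribute name `expires_at` is hypothetical; it would be whichever attribute you configured as the table's TTL attribute):

```python
import time

def ttl_epoch(seconds_from_now):
    # TTL attributes hold epoch time in seconds (UTC).
    return int(time.time()) + seconds_from_now

session_item = {
    "session_id": "sess-42",
    "expires_at": ttl_epoch(3600),  # expire one hour from now
}
# To hide expired-but-not-yet-deleted items from queries and scans,
# filter on the TTL attribute, e.g. FilterExpression "expires_at > :now".
```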

DynamoDB Streams

  • A time-ordered sequence/stream of item level modifications. Changes to items are recorded automatically in your stream for insert/update/deletes
  • Logs are encrypted at rest and stored for 24 hours
  • Accessed using a dedicated endpoint (and a different API to access the stream data)
  • By default the Primary Key is recorded, but you can also store before and after images of the Item with relation to the change being made
  • A DynamoDB Stream can be a Lambda trigger, which can be quite useful
  • Events are recorded in near real time
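A minimal sketch of a Lambda handler consuming DynamoDB Stream records (the event shape follows the stream record format with eventName and a dynamodb payload; the processing logic itself is made up):

```python
def handler(event, context):
    # Each stream record carries eventName (INSERT / MODIFY / REMOVE) and,
    # depending on the stream view type, Keys / NewImage / OldImage.
    changes = []
    for record in event.get("Records", []):
        ddb = record["dynamodb"]
        changes.append((record["eventName"], ddb.get("Keys")))
    return changes

# Hypothetical stream event containing a single INSERT:
sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {"Keys": {"user_id": {"S": "u123"}}},
        }
    ]
}
```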

ProvisionedThroughputExceeded Exception

  • Means your request rate is too high for the read/write capacity provisioned on the table
  • The AWS SDKs will automatically retry the requests until they are successful (up to the configured retry limit)
  • If you are not using the SDK, you can:
    • Reduce the request frequency
    • Or use Exponential Backoff

Exponential Backoff

  • Many components in a network can generate errors from being overloaded
  • All AWS SDKs use simple retries as well as Exponential Backoff
  • Progressively longer waits between consecutive retries
    • i.e. 50ms, 100ms, 200ms, etc. for subsequent requests
  • If retries still fail after about a minute, your request sizes may be exceeding the throughput of your provisioned read/write capacity
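A sketch of the retry-with-exponential-backoff pattern the SDKs implement (the delays, jitter, and retry count here are illustrative, not the SDKs' exact values):

```python
import random
import time

def with_backoff(operation, max_retries=5, base_delay_ms=50):
    # Waits 50ms, 100ms, 200ms, ... (plus jitter) between attempts,
    # re-raising the last error once retries are exhausted.
    for attempt in range(max_retries):
        try:
            return operation()
        except Exception:
            if attempt == max_retries - 1:
                raise
            delay = (base_delay_ms * 2 ** attempt) / 1000.0
            time.sleep(delay + random.uniform(0, delay))

# Simulated flaky call: fails twice, then succeeds on the third attempt.
calls = {"count": 0}
def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("ProvisionedThroughputExceededException (simulated)")
    return "ok"

result = with_backoff(flaky)
```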