Elastic MapReduce

Solutions Architect Associate

  • Managed big data platform for processing vast amounts of data with open source tools such as Spark, HBase, and Hudi. Commonly used for ETL in the cloud.
    • ETL - Extract, Transform, Load
  • EMR runs workloads on clusters, which can be deployed on EC2, EKS, or Outposts
  • Transformed data is typically stored in S3
  • You can use Reserved Instances and Spot Instances to save money
  • Instances always live in a VPC
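
The points above (EC2-based cluster, Spot capacity, VPC placement, S3 storage) can be sketched as a request for boto3's EMR `run_job_flow` call. This is a minimal sketch, not a recommended configuration: the function name `build_emr_cluster_request`, the cluster name, the subnet ID, the bucket name, the release label, and the instance types and counts are all placeholder assumptions. The code only builds the request dictionary and does not contact AWS.

```python
def build_emr_cluster_request(subnet_id, log_bucket, task_count=2):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    Hypothetical helper for illustration; all values are placeholders.
    """
    return {
        "Name": "etl-sketch",                      # placeholder cluster name
        "ReleaseLabel": "emr-6.15.0",              # assumed EMR release
        "Applications": [{"Name": "Spark"}],       # open source tool to install
        "LogUri": f"s3://{log_bucket}/emr-logs/",  # cluster logs go to S3
        "Instances": {
            # The cluster's EC2 instances are launched into a VPC subnet.
            "Ec2SubnetId": subnet_id,
            # Terminate the cluster once its steps finish.
            "KeepJobFlowAliveWhenNoSteps": False,
            "InstanceGroups": [
                {
                    "Name": "primary",
                    "InstanceRole": "MASTER",
                    "Market": "ON_DEMAND",
                    "InstanceType": "m5.xlarge",
                    "InstanceCount": 1,
                },
                {
                    "Name": "core",
                    "InstanceRole": "CORE",
                    "Market": "ON_DEMAND",
                    "InstanceType": "m5.xlarge",
                    "InstanceCount": 2,
                },
                {
                    # Task nodes on Spot capacity to reduce cost.
                    "Name": "task-spot",
                    "InstanceRole": "TASK",
                    "Market": "SPOT",
                    "InstanceType": "m5.xlarge",
                    "InstanceCount": task_count,
                },
            ],
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # default EC2 instance profile
        "ServiceRole": "EMR_DefaultRole",      # default EMR service role
    }


# Placeholder subnet ID and bucket name, not real resources.
request = build_emr_cluster_request("subnet-0123456789abcdef0", "example-bucket")
```

With AWS credentials configured, the dictionary could be passed as `boto3.client("emr").run_job_flow(**request)`. Keeping the primary and core groups on-demand and putting only task nodes on Spot is one common way to limit the impact of Spot interruptions; other layouts are possible.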