Amazon Web Services – Big Data on AWS

In this three-day course you will learn about cloud-based big data solutions, including Amazon Elastic MapReduce (EMR) and the wider AWS big data platform. You will learn how to use Amazon EMR to process data with the broad ecosystem of Hadoop tools, such as Pig and Hive. The course also covers how to create big data environments using Amazon DynamoDB, Amazon Redshift, and Amazon Kinesis.

Who needs to attend
This course is aimed at data scientists and analysts who want to learn more about big data solutions on AWS, as well as Solutions Architects and SysOps Administrators responsible for the design and implementation of big data solutions.

What you will learn
Upon completion you will know how to:

Understand Apache Hadoop in the context of Amazon EMR

Understand the architecture of an Amazon EMR cluster

Launch an Amazon EMR cluster using an appropriate Amazon Machine Image and Amazon EC2 instance types

Choose appropriate AWS data storage options for use with Amazon EMR

Identify your options for ingesting, transferring, and compressing data for use with Amazon EMR

Use common programming frameworks available for Amazon EMR including Hive, Pig, and Streaming

Work with Amazon Redshift to implement a big data solution

Leverage big data visualization software

Choose appropriate security options for Amazon EMR and your data

Perform in-memory data analysis with Spark and Spark SQL on Amazon EMR

Choose appropriate options to manage your Amazon EMR environment cost-effectively

Understand the benefits of using Amazon Kinesis for big data


Students need to have:

Basic familiarity with big data technologies, including Apache Hadoop and HDFS

Working knowledge of core AWS services and public cloud implementation

Basic understanding of data warehousing, relational database systems, and database design

Course outline

1. Overview of Big Data
2. Data Ingestion, Transfer, and Compression
3. AWS Data Storage Options
4. Using DynamoDB with Amazon EMR
5. Using Kinesis for Near Real-Time Big Data Processing
6. Introduction to Apache Hadoop and Amazon EMR
7. Using Amazon Elastic MapReduce
8. The Hadoop Ecosystem
9. Using Hive for Advertising Analytics
10. Using Streaming for Life Sciences Analytics
11. Using Hue with Amazon EMR
12. Running Pig Scripts with Hue on Amazon EMR
13. Spark on Amazon EMR
14. Running Spark and Spark SQL Interactively on Amazon EMR
15. Using Spark and Spark SQL for In-Memory Analytics
16. Managing Amazon EMR Costs
17. Securing Your Amazon EMR Deployments
18. Data Warehouses and Columnar Datastores
19. Introduction to Amazon Redshift
20. Optimizing Your Amazon Redshift Environment
21. The Big Data Ecosystem on AWS
22. Visualizing and Orchestrating Big Data
23. Using TIBCO Spotfire to Visualize Big Data

Follow on
There are no follow-ons for this course

Certification Programs
There are no certifications associated with this course