AWS EC2

How to deploy an AWS EC2 cluster with Spark and Cassandra

Context:

This tutorial is a step-by-step guide to configuring an AWS EC2 cluster with Apache Spark and Apache Cassandra.

Architecture we will create:

First of all, I would like to present the architecture I want to deploy.

[Image: architecture diagram]

This architecture uses 8 AWS EC2 instances:

(Slaves are also called Workers)

In terms of resilience:

If you want to build this architecture, I invite you to follow the 4 tutorials below, in order.

  1. The first one explains how to create instances on AWS EC2.
  2. The second one covers how to install and configure Apache Cassandra.
  3. The third one explains how to install Apache ZooKeeper, and why.
  4. The last one shows you how to install and configure Apache Spark.

Of course, you can increase the number of Master nodes and Slave nodes.
In that case, you just need to adapt your configuration files accordingly.

AWS EC2 Instances Tutorial (Step 1 of 4)

Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides secure, resizable compute capacity in the cloud. It is designed to make web-scale cloud computing easier for developers. This tutorial will help you get started with AWS EC2.

Apache Cassandra Tutorial (Step 2 of 4)

The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. We’ll see how to configure Cassandra on an AWS EC2 cluster and create a resilient architecture that is big-data ready.

ZooKeeper Tutorial (Step 3 of 4)

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. ZooKeeper is useful if you want to make your architecture more resilient and protect it from the consequences of a Master failure, for example.

Apache Spark Tutorial (Step 4 of 4)

This tutorial will help you install Apache Spark on your AWS EC2 cluster.
I’ll show you how to set up a standard configuration that allows your elected Master to distribute its jobs across the Worker nodes.
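
To give a rough idea of how the ZooKeeper and Spark steps fit together, here is a small Python sketch that builds the two strings a Spark standalone cluster with several Masters typically needs: the master URL listing every Master, and the JVM options that enable ZooKeeper-based recovery. The host names (`master1`, `zk1`, and so on) and the helper function names are hypothetical placeholders, not values from the tutorials. The ports and the `/spark` directory are the usual Spark and ZooKeeper defaults, assumed here. Treat it as an illustration, not the exact configuration used later.

```python
# Sketch only: host names below are hypothetical placeholders.

def spark_master_url(masters, port=7077):
    """Build a standalone master URL that lists every Master, in the form
    spark://host1:7077,host2:7077, so clients can reach whichever is leader."""
    if not masters:
        raise ValueError("at least one Master host is required")
    return "spark://" + ",".join(f"{host}:{port}" for host in masters)


def zookeeper_recovery_opts(zk_hosts, zk_port=2181, zk_dir="/spark"):
    """Build the JVM options that enable ZooKeeper-based Master recovery.
    The result is meant for SPARK_DAEMON_JAVA_OPTS in conf/spark-env.sh."""
    zk_url = ",".join(f"{host}:{zk_port}" for host in zk_hosts)
    return " ".join([
        "-Dspark.deploy.recoveryMode=ZOOKEEPER",
        f"-Dspark.deploy.zookeeper.url={zk_url}",
        f"-Dspark.deploy.zookeeper.dir={zk_dir}",
    ])


if __name__ == "__main__":
    masters = ["master1", "master2"]   # hypothetical Master hosts
    zk_nodes = ["zk1", "zk2", "zk3"]   # hypothetical ZooKeeper hosts
    print(spark_master_url(masters))
    print(zookeeper_recovery_opts(zk_nodes))
```

If you add more Master or ZooKeeper nodes, only the two host lists change; the generated strings follow automatically.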