Spark
Spark-Docker-Akka-Cassandra-Kafka (SDACK) is an architecture stack well suited to large-scale data processing.
The following tutorials use Hadoop 3.1.0, Java 1.8.0, and Spark 2.3.1.
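If you build the tutorial code with sbt, a build definition pinned to these versions might look like the sketch below. It is an assumption-laden example, not part of the original tutorials: the project name is hypothetical, and it assumes Scala 2.11, the Scala line that the default Spark 2.3.1 artifacts are published for.

```scala
// build.sbt: a minimal sketch, assuming an sbt project on Scala 2.11
// (the default Scala line for Spark 2.3.1 artifacts).
name := "sdack-tutorial" // hypothetical project name

version := "0.1.0"

scalaVersion := "2.11.12"

// Compile for Java 1.8, matching the JDK used in the tutorials.
javacOptions ++= Seq("-source", "1.8", "-target", "1.8")

// Pin Spark to 2.3.1; marked "provided" because spark-submit
// supplies Spark's own jars at runtime.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.3.1" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.3.1" % "provided"
)
```

The `%%` operator appends the Scala binary version to the artifact name, so these resolve to `spark-core_2.11` and `spark-sql_2.11`.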
Content
Resources
- Spark (https://spark.apache.org/): a fast, distributed engine for large-scale data processing
- Docker (https://www.docker.com/): a platform for building and running applications in containers
- Akka (https://akka.io/): a toolkit for building concurrent, distributed, message-driven applications on the JVM
- Cassandra (http://cassandra.apache.org/): a distributed NoSQL database
- Kafka (https://kafka.apache.org/): a distributed streaming platform