Spark

Spark-Docker-Akka-Cassandra-Kafka (SDACK) is an architecture well suited to large-scale data processing.

The following tutorials use Hadoop 3.1.0, Java 1.8.0, and Spark 2.3.1.

Content

Resources

Spark (https://spark.apache.org/) : a fast, distributed engine for large-scale data processing.
Docker (https://www.docker.com/) : a platform for building and running applications in containers.
Akka (https://akka.io/) : a toolkit for building concurrent, distributed applications on the JVM.
Cassandra (http://cassandra.apache.org/) : a distributed database system.
Kafka (https://kafka.apache.org/) : a distributed streaming platform.
