Amazon Web Services’ EMR (Elastic MapReduce) service has been upgraded to handle Spark applications, giving enterprises that want to use the increasingly popular processing engine a way to do so without building their own infrastructure.
Apache Spark is an open-source distributed processing engine used for big data workloads. It’s a good fit for batch processing, streaming, graph databases and machine learning thanks to in-memory caching and optimized execution for fast performance, according to Amazon.
EMR supports Spark version 1.3.1 and uses Hadoop YARN as the cluster manager. Running Spark on top of EMR was already possible, but the integrated support should make using the engine more straightforward. IT staff can, for example, create a cluster from the AWS Management Console. Spark applications written in Scala, Python, Java and SQL can all run on EMR.
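
To make the YARN point concrete, here is a minimal sketch of how a Spark job might be submitted once such a cluster is running. It only assembles and prints a `spark-submit` command line and does not run anything. The helper name `build_spark_submit`, the script `wordcount.py` and the bucket `s3://example-bucket/input/` are hypothetical, and the `yarn-client` master string is an assumption based on Spark 1.x conventions (later Spark releases use `--master yarn` with a separate `--deploy-mode` flag).

```python
import shlex


def build_spark_submit(app_path, app_args=(), master="yarn-client",
                       executor_memory=None):
    """Return the argument list for a spark-submit call (hypothetical helper).

    master: "yarn-client" is assumed here, following Spark 1.x naming.
    executor_memory: optional value for spark-submit's --executor-memory flag.
    """
    cmd = ["spark-submit", "--master", master]
    if executor_memory:
        cmd += ["--executor-memory", executor_memory]
    cmd.append(app_path)      # the application script or JAR
    cmd.extend(app_args)      # arguments passed through to the application
    return cmd


# Hypothetical application and input location, for illustration only.
cmd = build_spark_submit(
    "wordcount.py",
    ["s3://example-bucket/input/"],
    executor_memory="2g",
)

# Print a shell-safe version of the command instead of executing it.
print(" ".join(shlex.quote(part) for part in cmd))
```

The printed command would typically be run from a shell on the cluster's master node, so that YARN schedules the job across the cluster. Returning a list rather than a string leaves the caller free to pass it to `subprocess.run` without shell quoting issues.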