In 2009, the AMPLab at the University of California, Berkeley developed the Spark framework, which went open source a year later as an
First, it significantly improves the performance of data-dependent applications. Second, it radically simplifies the development of intelligent applications powered by data. In its effort to accelerate innovation in the Spark ecosystem, IBM decided to include Spark in its own platforms of
IBM will put more than 3,500 researchers and developers to work on Spark-related projects in more than a dozen laboratories worldwide. Big Blue also plans to open a Spark Technology Center in San Francisco for the data science and developer community, and to train more than one million data scientists and data engineers on Spark through partnerships with DataCamp, AMPLab, Galvanize, MetiStream, and Big Data University.
A typical large corporation has hundreds or thousands of data sets spread across different databases in its computer systems. A data scientist can design an algorithm to plumb the depths of any one of those databases, but developing that algorithm takes some 90 working days of data science work. Today, implementing it on another system takes another quarter's worth of work to adapt the algorithm so that it runs there. Spark cuts that time in half: a Spark-based system can access and analyze any database without new development and without additional delay.
Spark's other virtue is ease of use: developers can concentrate on designing the solution rather than building an engine from scratch. Spark advances large-scale data processing because it improves the performance of data-dependent applications, radically simplifies the development of intelligent solutions, and provides a platform capable of unifying all kinds of information in real-world workflows.
Many experts consider Spark the successor to Hadoop, but its adoption remains slow. Spark works very well for machine learning tasks that normally require large clusters of computers. The latest version of the platform, released recently, extends the range of machine learning algorithms it can run.