Spark seems to outperform Hadoop in every metric, or so it appears. It processes data in memory rather than on disk, it handles both real-time and batch workloads, and it also provides a layer for integrating ML.

So right now I'm starting a role at a big investment bank. We operate on petabytes of data, and our data pipeline is built on top of Hadoop. I just graduated, but my friend at Oracle says that a lot of companies still use Hadoop as well.

Forgive me if I am naive, but in what ways is Hadoop better? Spark seems to be the best choice if you want a quick answer from real-time analytics. I’ve never used Hadoop before. What capabilities does Hadoop provide that Spark doesn’t? Doesn’t Spark provide batch processing as well?
