Spark seems to outperform Hadoop in every metric, or so it appears. In memory processing vs disk, spark is capable of streaming and batch streaming. And also provides a layer for integrating ML as well.

So right now I’m starting a role in a big investment bank. We operate on petabytes of and our pipeline is built on of hadoop. I just graduated, but my friend at Oracle says that a lot of tech companies still use hadoop as well.

Forgive me if I am naive, but in ways is Hadoop better? Spark seems to be the best if you want a quick answer during real time analytics. I’ve never used Hadoop before. What capabilities does hadoop provide that spark doesnt? Doesn’t Spark provide batch processing as well?

Source link

No tags for this post.


Please enter your comment!
Please enter your name here