Spark seems to outperform Hadoop on every metric, or so it appears. It processes data in memory rather than on disk, it handles both real-time streaming and batch processing, and it also provides a layer for integrating ML.
So right now I’m starting a role at a big investment bank. We operate on petabytes of data, and our data pipeline is built on top of Hadoop. I just graduated, but my friend at Oracle says that a lot of tech companies still use Hadoop as well.
Forgive me if I am naive, but in what ways is Hadoop better? Spark seems to be the best choice if you want a quick answer for real-time analytics. I’ve never used Hadoop before. What capabilities does Hadoop provide that Spark doesn’t? Doesn’t Spark provide batch processing as well?