I’ve got a few low-level questions about Spark. If anyone can answer even one of them, that would still be super helpful.

  1. What does “small enough to fit in memory” mean? In the context of Spark (heh), does this mean the memory of one executor? Or one entire node?

  2. What’s the best way to calculate the number of partitions to use? Is it still to take the total size of the dataset and divide it by the number of executors?

  3. If I’m using DataFrames (Spark 2.3), should I still pay attention to older articles that talk about RDDs?
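For context on question 2, here’s the kind of back-of-the-envelope calculation I mean, sketched in plain Python. The ~128 MB target per partition is just the rule of thumb I’ve seen floating around (roughly an HDFS block), not anything official from the Spark docs:

```python
def estimate_num_partitions(total_size_bytes, target_partition_bytes=128 * 1024 * 1024):
    """Rule-of-thumb sketch: aim for roughly target_partition_bytes per partition.

    This is NOT a Spark API, just the arithmetic behind the heuristic.
    """
    # Ceiling division, so a trailing partial chunk still gets its own partition.
    return max(1, -(-total_size_bytes // target_partition_bytes))

# e.g. a 10 GB dataset at ~128 MB per partition
print(estimate_num_partitions(10 * 1024**3))  # -> 80
```

Is that still the right mental model, or should I be sizing off executor count instead?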
