I’ve got a few low-level questions about Spark. If anyone can answer even one of them, that would still be super helpful.
What does “small enough to fit in memory” actually mean? In the context of Spark (heh), does that mean the memory of a single executor, or of an entire node?
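To make that concrete, here’s roughly how I’m configuring things (all numbers made up, and meant for spark-submit against a cluster). I’m asking whether “fits in memory” is relative to the per-executor setting or the whole machine:

```scala
import org.apache.spark.sql.SparkSession

object MemoryQuestion {
  def main(args: Array[String]): Unit = {
    // Made-up numbers, just to frame the question.
    val spark = SparkSession.builder()
      .appName("memory-question")
      .config("spark.executor.memory", "8g")   // is "fits in memory" relative to this 8 GB per executor...
      .config("spark.executor.instances", "4") // ...or to the total RAM of the node these executors land on?
      .getOrCreate()
    spark.stop()
  }
}
```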
What’s the best way to determine the number of partitions to use? Is it still to look at the dataset and divide the total size by the number of executors? (See the sketch below for the arithmetic I mean.)
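Here’s the arithmetic I have in mind, just to be sure I’m describing it right (sizes and counts invented for the example):

```scala
object PartitionMath {
  def main(args: Array[String]): Unit = {
    // Invented numbers for illustration.
    val totalSizeBytes = 64L * 1024 * 1024 * 1024 // say, a 64 GB dataset
    val numExecutors   = 16

    // The rule of thumb I described: total size / number of executors.
    val perExecutorBytes = totalSizeBytes / numExecutors
    println(f"per-executor share: ${perExecutorBytes / (1024.0 * 1024 * 1024)}%.1f GB")

    // Versus targeting ~128 MB per partition, which I've also seen recommended.
    val targetPartitionBytes = 128L * 1024 * 1024
    val numPartitions = (totalSizeBytes / targetPartitionBytes).toInt
    println(s"partitions at ~128 MB each: $numPartitions") // prints 512
  }
}
```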
If I’m using DataFrames (Spark 2.3), should I still pay attention to older articles that talk about RDDs?
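For context, by “DataFrames” I mean I’m writing the first style below, while the older articles describe the second (column names and data are just placeholders):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.avg

object DfVsRdd {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("df-vs-rdd").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq(("a", 1), ("a", 3), ("b", 2)).toDF("key", "value")

    // DataFrame style (what I'm writing today): average value per key.
    df.groupBy("key").agg(avg("value")).show()

    // Equivalent RDD style (what the older articles describe).
    df.as[(String, Int)].rdd
      .mapValues(v => (v, 1))
      .reduceByKey { case ((s1, c1), (s2, c2)) => (s1 + s2, c1 + c2) }
      .mapValues { case (sum, count) => sum.toDouble / count }
      .collect()
      .foreach(println)

    spark.stop()
  }
}
```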