What Are the Spark Transformations That Cause a Shuffle?
Apache Spark uses transformations and actions to manipulate and analyze data. Some transformations trigger a shuffle, which is the redistribution of data across the cluster. Shuffling is expensive in both time and resources because it involves data serialization, disk I/O, and network transfer. Below, we’ll look more closely at the transformations that cause shuffling, with examples in PySpark.
Transformations Causing Shuffling
1. `repartition` …
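To make "redistributing data across the cluster" concrete, here is a minimal plain-Python sketch of the idea behind a shuffle. It is not Spark's implementation and does not use PySpark. The function `simulate_shuffle`, the sample records, and the modulo rule for choosing a destination partition are all assumptions made for illustration. Each inner list stands in for a partition held by a different executor.

```python
def simulate_shuffle(partitions, num_partitions):
    """Toy model of a shuffle: route every (key, value) record to the
    output partition chosen by its key, so equal keys end up together.

    Assumes integer keys; a real engine would hash arbitrary keys.
    """
    shuffled = [[] for _ in range(num_partitions)]
    moved = 0  # records that change partition (network traffic in a real cluster)
    for src_index, partition in enumerate(partitions):
        for key, value in partition:
            dest_index = key % num_partitions
            shuffled[dest_index].append((key, value))
            if dest_index != src_index:
                moved += 1
    return shuffled, moved


# Three input partitions; the same key appears in more than one of them.
before = [
    [(1, "a"), (2, "b")],
    [(1, "c"), (3, "d")],
    [(2, "e"), (3, "f")],
]

after, moved = simulate_shuffle(before, num_partitions=2)
print(after)  # [[(2, 'b'), (2, 'e')], [(1, 'a'), (1, 'c'), (3, 'd'), (3, 'f')]]
print(moved)  # 3
```

In this toy run, half of the records have to leave the partition they started in. In a real cluster each such move means writing, transferring, and reading data between executors, which is where the cost of a shuffle comes from.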