distinct() vs dropDuplicates() in Apache Spark
What’s the difference between distinct() and dropDuplicates() in Spark?
3 min read · Feb 21, 2021
The Spark DataFrame API comes with two functions that can be used to remove duplicates from a given DataFrame: distinct() and dropDuplicates(). Even though both methods do pretty much the same job, they come with one difference, which is…