5 Factors to Consider Before Choosing a Sorting Algorithm

Know what sorting algorithm is aligned with your problem space

Pooya Amini
Towards Data Science

--

Photo by Aliaksandr Bahdanovich on iStockPhoto

Thanks to the evolution of new technologies and huge amounts of data constantly being generated, the demand for faster and more efficient sorting algorithms always exists. If you look up sorting algorithms, you will be amazed by the sheer amounts of algorithms even with the same space and time complexity. However, looking deeper into each algorithm reveals that slight implementation variances can cause huge differences in the applications. In this article, I’m going to briefly discuss the main factors to consider when choosing a sorting algorithm aligned with your needs.

1. Simplicity

In the technology industry considering the large amount of data, simplicity of implementation usually has nothing to say when it comes to factors like performance and speed. Algorithms like the Bubble sort, Insertion sort, and Selection sort are easy to understand and implement but considering the quadratic (N²) average and worst-case time complexity of the algorithms, they are not very popular in the computer world. However, due to their simplicity, they are used for educational purposes, small-sized data, and real-life scenarios. For example, a clothing shop employee will choose insertion sort over merge sort to organize t-shirts based on their sizes. For very small-size input, basic sorts like Insertion might outperform Quick sort and Merge sort due to the lack of need to instantiate new memory units or making recursive calls. Consequently, basic sorts such as Insertion sort might be used in hybrid algorithms like Radix to sort the bins.

2. Running time

Running time is the main factor to classify sorting algorithms. In a coding interview, when asked about the time complexity of your algorithm, the interviewer is usually looking for worst-case time complexity. In practice, however, the average case and the performance of the algorithm on all sets of data is what’s mostly considered. A great example is Quick sort which is a common sorting algorithm with an O(N²) worst-case time complexity compared to the O(NlogN) for the average case. However, the worst case is a very special and rare circumstance (whenever the pivot is always minimum or maximum). In addition, the actual running time can greatly change depending on other factors like the number of recursive calls, access to the memory, and required comparisons. For instance, let’s look at Heap sort vs. Quick sort which both possess the same average time complexity but in practice, Quick sort performs much better on large data. Another popular fast sorting algorithm is Merge sort that is used in general cases (Arrays.sort() in Java for objects) or special cases like in e-commerce when a website wants to sort the results fetched from other websites.

There are certain use-cases in critical systems such as air-crafts and life monitoring systems where a guaranteed response time is required. In these cases, Heap sort is a better option than Quick sort because of its better worst-case time complexity.

3. Memory consumption

Space complexity states how much memory is being used regarding the size of input data. Some basic algorithms like Insertion or Bubble sort require no additional memory and can sort the data in place. On the other hand, more efficient algorithms like Quick sort and Merge sort require O(logN) and O(N) time complexity respectively (meaning that extra space is required to complete the sorting). This can be problematic in situations where the input data is huge or the available memory is limited. Due to memory limitations, embedded systems and Linux kernel rather in-place but efficient enough sorts like Heap sort. When we are talking about memory consumption, our assumption is that the algorithm can fit all data in RAM (internal sorting). However, in the scenarios such as sorting data on file systems and databases where the input data cannot fit in RAM, we need other sorting algorithms like external merge sort to sort data by using external storage (external sorting).

3. Parallel processing

In real-world scenarios, terabytes of data still need to be sorted in a resourceful way. Procuring super-strong machines is not feasible due to significant costs. Instead, data can be divided and assigned to cheaper machines to sort independently and then merged later to create the final sorted result. Algorithms that incorporate divide-and-conquer approaches such as Quick sort, Merge sort, Map sort, and Tera sort are good candidates to be used in parallel processing.

4. Stability

A sorting algorithm is deemed stable whenever two objects with equal sorting keys appear in the same order in the sorted output as they were in the original input. This is essential in applications where other attributes of an object are important to us. Imagine that you have a list of students’ information such as their grades already sorted based on their names. If we want to sort the list based on the grades without affecting the name-sort attribute whenever grades are identical, we need to look for stable sorts such as insertion sort, merge sort, etc. Remember that although some sorts like Quick sort and Heap sort are not stable by nature, they can be modified to support stability.

5. Assumptions about input data

In many cases, having extra information about the data and its range allows you to choose a reliable sorting algorithm for that specific purpose. Consider non-comparison-based algorithms such as Radix and Counting sorts that give a linear time complexity for sorting numbers. Sometimes, due to special structures of data, applying regular sorting algorithms is impossible. A well-known example is using Topological sort to put graph nodes in order depending on their relations.

Conclusion: Know the problem space

In the end, there is no best or worst sorting algorithm. Each algorithm can be useful for certain applications. Also, the same algorithm might change its behavior depending on the size of the input data. The very first approach for choosing a sorting solution is to determine the problem space and your requirements. Occasionally, a single algorithm cannot satisfy the requirements and a hybrid approach produces a better outcome. Once the problem space is known, you can then narrow down the list of potential good-enough algorithms and perform a practical analysis on the input data to reach a final solution.

--

--

Sr. Software Engineer at Meta (ex-Amazon/AWS). I write to share my experience regarding personal and career growth.