Abstract
- Basically re-arranging a collection of items or data elements in ascending or descending order
Not all sorting algorithms are the same
Different algorithms have different inputs that they are good or bad on.
For example, Bubble Sort, Selection Sort and Insertion Sort have O(n²) time complexity. But on a sorted or almost sorted array, Bubble Sort and Insertion Sort have O(n) time complexity while Selection Sort still has O(n²) time complexity.
In-Place
- Performs sorting with constant Main Memory usage regardless of the input size
- Worst Space Complexity - O(1)
Stability of Sorting
- Stable if the items with the same value stay in the same relative order after sorting
- Unstable if the items with the same value don’t stay in the same relative order after sorting
Sorting algorithms that are stable
Bubble Sort
- Only swaps elements that are different
Insertion Sort
- Be careful with the implementation! Only keep shifting the element to be inserted towards the front while it is strictly smaller than the element at the insertion point, so equal elements keep their original order
Merge Sort
- Stable if ‘merge’ is carefully implemented. When merging 2 subarrays and the candidate elements are equal, the one from the first half always comes first, the one from the second half always comes second
Sorting algorithms that are unstable
Bogo Sort
- Random permutation may swap elements with the same value
Selection Sort
- Imagine in iteration i, the element at position i is x and the element at a later position j is also x. Suppose the smallest element in the range [i, n-1] is m, located after position j. We swap m with the x at position i, moving that x past the x at position j. As you can see, the order of the 2 same values is changed. Thus, unstable
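To make the instability concrete, here is a small self-contained demo (my own example, not from the original notes): selection sort applied to labelled items with equal keys, where the two equal items end up in swapped order.

```java
// Demonstrates selection sort's instability using labelled keys:
// "3a" and "3b" share the key '3'; "3a" appears first in the input.
import java.util.Arrays;

public class SelectionInstability {
    // Selection sort comparing only the first character (the key) of each item.
    static String[] selectionSortByDigit(String[] items) {
        int n = items.length;
        for (int i = 0; i < n - 1; i++) {
            int min = i;
            for (int j = i + 1; j < n; j++) {
                if (items[j].charAt(0) < items[min].charAt(0)) min = j;
            }
            // This swap can jump an element past an equal-keyed element.
            String tmp = items[i]; items[i] = items[min]; items[min] = tmp;
        }
        return items;
    }

    public static void main(String[] args) {
        String[] items = {"3a", "1x", "3b", "2y"};
        System.out.println(Arrays.toString(selectionSortByDigit(items)));
        // prints [1x, 2y, 3b, 3a] -- "3b" now comes before "3a": unstable
    }
}
```

In iteration 1 (0-indexed), swapping "2y" into position 1 pushes "3a" behind "3b", exactly the situation described above.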
Divide-and-Conquer Sorting
- Divide: split array into two halves
- Recurse: keep dividing until the base case (a single element) is hit, which kickstarts sorting the two halves on the way back up
- Combine: merge the sorted halves
Bubble Sort
- The idea here is that at each iteration i, we are going to ‘bubble up’ the biggest element in the range [0, n-i] to position n-i. If no ‘bubbling’ happens in that particular iteration, it means the array is fully sorted and it is safe to terminate the sorting early
- Bubble sort implementation in Java
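A minimal sketch of the bubble sort described above, with the early-termination check (class and method names are my own):

```java
import java.util.Arrays;

public class BubbleSort {
    static void bubbleSort(int[] arr) {
        int n = arr.length;
        for (int i = 0; i < n - 1; i++) {
            boolean swapped = false;
            // Bubble the biggest element of arr[0..n-1-i] to position n-1-i.
            for (int j = 0; j < n - 1 - i; j++) {
                if (arr[j] > arr[j + 1]) {  // swap only strictly bigger: keeps the sort stable
                    int tmp = arr[j];
                    arr[j] = arr[j + 1];
                    arr[j + 1] = tmp;
                    swapped = true;
                }
            }
            if (!swapped) break;  // no bubbling happened: array is sorted, terminate early
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 1, 4, 2, 8};
        bubbleSort(a);
        System.out.println(Arrays.toString(a));  // prints [1, 2, 4, 5, 8]
    }
}
```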
Loop Invariant
At the end of iteration i, the biggest i items are correctly sorted in the final i positions of the array.
So for example, at the end of iteration 2, the biggest 2 items are correctly sorted in the final 2 positions of the array.
So if we have n items, we need to perform n-1 iterations to have the biggest n-1 items sorted correctly in the final n-1 positions of the array (the last remaining item is then automatically in place). Each iteration takes O(n), and there are O(n) items we need to bubble to the right hand side. Thus, the Worst Time Complexity is O(n²).
Time Complexity
Best-case
- O(n), already sorted, only one iteration is needed
Average-case
- O(n²), assume inputs are chosen at random
Worst-case
- O(n²), when we have n elements in the given array and all of them aren’t correctly sorted in their final positions of the array
Selection Sort
- The idea here is that at each iteration i, we are going to select the smallest element in the range [i, n-1] of the array, then swap it with the element at position i
- Selection sort implementation in Java
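A minimal sketch of the selection sort described above (class and method names are my own):

```java
import java.util.Arrays;

public class SelectionSort {
    static void selectionSort(int[] arr) {
        int n = arr.length;
        for (int i = 0; i < n - 1; i++) {
            // Find the smallest element in the range arr[i..n-1].
            int min = i;
            for (int j = i + 1; j < n; j++) {
                if (arr[j] < arr[min]) min = j;
            }
            // Swap it with the element at position i.
            int tmp = arr[i];
            arr[i] = arr[min];
            arr[min] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] a = {64, 25, 12, 22, 11};
        selectionSort(a);
        System.out.println(Arrays.toString(a));  // prints [11, 12, 22, 25, 64]
    }
}
```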
Loop Invariant
At the end of iteration i, the smallest i items are correctly sorted in the first i positions of the array.
Time Complexity
Best-case
- O(n²), because at each iteration, we can only find the current smallest element. Even if we are given a fully sorted array, we need to perform n-1 iterations in order to have the confidence to say that the array is sorted
Average-case
- O(n²), assume inputs are chosen at random
Worst-case
- O(n²), when we have n elements in the given array and all of them aren’t correctly sorted in their final positions of the array
Insertion Sort
- The idea here is that at each iteration i, we are going to insert the element at position i into the prefix of the array that is i elements long, such that the new prefix, which is i + 1 elements long, remains sorted
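A minimal sketch of the insertion sort described above (class and method names are my own):

```java
import java.util.Arrays;

public class InsertionSort {
    static void insertionSort(int[] arr) {
        for (int i = 1; i < arr.length; i++) {
            int key = arr[i];  // element to insert into the sorted prefix arr[0..i-1]
            int j = i - 1;
            // Shift right only while strictly bigger than key: keeps the sort stable.
            while (j >= 0 && arr[j] > key) {
                arr[j + 1] = arr[j];
                j--;
            }
            arr[j + 1] = key;  // the prefix arr[0..i] is now sorted
        }
    }

    public static void main(String[] args) {
        int[] a = {12, 11, 13, 5, 6};
        insertionSort(a);
        System.out.println(Arrays.toString(a));  // prints [5, 6, 11, 12, 13]
    }
}
```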
Loop Invariant
At the end of iteration i, the first i + 1 items in the array are in sorted order.
Time Complexity
Best-case
- O(n), already sorted, no element needs to be shifted, so each iteration takes constant time
Average-case
- O(n²), assume inputs are chosen at random
Worst-case
- O(n²), when we have n elements in the given array and all of them aren’t correctly sorted in their final positions of the array
Merge Sort
- A Divide-and-Conquer sorting algorithm that is NOT In-Place
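A sketch of merge sort with the stability-preserving merge described earlier (the naming and the half-open index convention are my own choices):

```java
import java.util.Arrays;

public class MergeSort {
    // Sorts the half-open range arr[lo..hi).
    static void mergeSort(int[] arr, int lo, int hi) {
        if (hi - lo <= 1) return;  // base case: 0 or 1 element is already sorted
        int mid = (lo + hi) / 2;   // divide: split into two halves
        mergeSort(arr, lo, mid);   // conquer the left half
        mergeSort(arr, mid, hi);   // conquer the right half
        merge(arr, lo, mid, hi);   // combine: merge the sorted halves
    }

    static void merge(int[] arr, int lo, int mid, int hi) {
        int[] tmp = new int[hi - lo];  // O(n) extra space: merge sort is not in-place
        int i = lo, j = mid, k = 0;
        while (i < mid && j < hi) {
            // '<=' takes from the first half on ties, keeping the sort stable.
            tmp[k++] = (arr[i] <= arr[j]) ? arr[i++] : arr[j++];
        }
        while (i < mid) tmp[k++] = arr[i++];
        while (j < hi) tmp[k++] = arr[j++];
        System.arraycopy(tmp, 0, arr, lo, tmp.length);
    }

    public static void main(String[] args) {
        int[] a = {38, 27, 43, 3, 9, 82, 10};
        mergeSort(a, 0, a.length);
        System.out.println(Arrays.toString(a));  // prints [3, 9, 10, 27, 38, 43, 82]
    }
}
```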
Time Complexity
Best-case
- O(n log n), already sorted, a full divide-and-conquer process is still needed to have the confidence to claim the array is sorted
Average-case
- O(n log n), assume inputs are chosen at random
Worst-case
- O(n log n), when we have n elements in the given array and all of them aren’t correctly sorted in their final positions of the array
Time Complexity Proof
Recursion Tree Approach
Merge at each level takes O(n) and there will be O(log n) levels at the conquer stage where we perform the merging, since we divide the array in half at each level. So the overall time complexity is O(n log n). c in the diagram above indicates a constant value.
Induction Approach
Guess the time complexity and verify it with the recurrence we obtained, T(n) = 2T(n/2) + cn.
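As a sketch of that approach, assuming the standard merge sort recurrence T(n) = 2T(n/2) + cn (where cn is the merging cost at one level), guess T(n) ≤ kn log₂ n and verify the inductive step:

```latex
\begin{align}
T(n) &= 2\,T(n/2) + cn \\
     &\le 2 \cdot k\,\frac{n}{2}\log_2\frac{n}{2} + cn \\
     &= kn(\log_2 n - 1) + cn \\
     &= kn\log_2 n - (k - c)\,n \\
     &\le kn\log_2 n \quad \text{whenever } k \ge c,
\end{align}
```

so the guess holds and T(n) = O(n log n).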
Handle large dataset well
The performance gains on a large dataset with O(n log n) time complexity are huge.
And we can perform sorting on a subset of the dataset at a time. For example, suppose our dataset is 10GB and we need to load the data into the RAM to perform the sorting, but there is only 2GB of RAM. With merge sort, we are able to load in 2GB at a time, sort each chunk, and eventually merge the sorted chunks to sort the whole dataset.
Slow on small arrays!
The allocations of the different arrays are scattered in the Main Memory. Merge sort has a space complexity of O(n), with different temporary arrays at each merge layer. Working on multiple arrays means we sacrifice the performance gain from Cache Locality.
The recursive nature of the algorithm comes with extra overhead too. Recursion is also less predictable, thus impacting the Branch Prediction negatively.
When the array is small, such hardware-level negative impacts outweigh the performance gains from the better time complexity.
Slow on almost sorted array!
Merge sort’s performance is still O(n log n) when the array is almost sorted, because it needs to perform the full divide-and-conquer process regardless of how chaotic the given array is, while Bubble Sort and Insertion Sort have a time complexity of O(n) in that case.
Quick Sort
- Divide: Select a ‘pivot’ element from the array
- Partition: Rearrange the array so that all elements smaller than the pivot are placed to its left and all elements greater than the pivot are placed to its right.
- Conquer: Recursively apply the same process (steps 1 and 2) to the sub-array on the left of the pivot and the sub-array on the right of the pivot. The base case for recursion occurs when an array or sub-array has one or zero elements, as they are already sorted by default
Important
Quick sort is an In-Place sorting algorithm.
How is partition implemented?
Essentially, we have two indices, i and j, both starting at index 0. The index i keeps track of the last position where an element smaller than the pivot was placed. Meanwhile, j scans the array from index 0 to the last index.
Here is a short video showing how partition is carried out.