Quicksort

Developed by Hardeep Singh
Copyright © Hardeep Singh, 2000
EMail seeingwithc@hotmail.com
Website seeingwithc.org
All rights reserved.
The code may not be used commercially without permission.
The code does not come with any warranties, explicit or implied.
The code cannot be distributed without this header.

Problem: We are given a set of numbers that must be arranged in nondecreasing order. We need to use quicksort for the solution.

Note: In the text below, lg(n) means log of n to base 2.

Solution #1: Basic quicksort, recursive.

Quicksort was developed by C.A.R. Hoare and is a widely used sorting technique, based on divide and conquer. It is an O(n*n) algorithm in the worst case, which is as bad as insertion and selection sorts. However, its popularity derives from the fact that it has an excellent average case behaviour of O(n*log(n)). In practice, quicksort almost always achieves this behaviour.

Quicksort first selects an element that is to be used as split-point from the list of given numbers. All the numbers smaller than the split-point are brought to one side of the list and the rest on the other. This operation is called splitting.

Consider the list: 4 3 1 7 5 9 6 2 8

Now, assume the first element of the list, 4 as the split-point. Then, after splitting the list will be arranged as:

3 1 2 4 7 5 9 6 8

Or in some other order, say: 2 3 1 4 ...

That is, the exact order is immaterial. What is important is that the order should be:

  1. Elements less than split-point element.
  2. Split-point element.
  3. Elements greater than split-point element.

After this, the list is divided into lists, one containing the elements less than the split-point element and the other containing the elements more than the split-point element. These two lists are again individually sorted using quicksort, that is by again finding a split-point and dividing into two parts etc. In the program given below, we will let recursion sort the smaller lists.

In the example, the two sub-lists, are:

2 3 1 and 7 5 9 6 8.

Now let us see the program:

#include <stdio.h>

#define MAXELT 50

typedef int key;
typedef int index;

key list[MAXELT];

void main()
{
        int i=-1,j,n;
        char t[10];
        void quicksort(index,index);
        
        do {                                     //read the list of numbers
                if (i!=-1)
                        list[i++]=n;
                else
                        i++;
                printf("\nEnter the numbers <End by #>");
                fflush(stdin);
                scanf("%[^\n]",t);
                if (sscanf(t,"%d",&n)<1)
                        break;
        } while (1);
        
        quicksort(0,i-1);                        //sort
        
        printf("The list obtained is ");         //print
        
        for (j=0;j<i;j++)
                printf("\n %d",list[j]);
}

void quicksort(index first,index last)
{
        void split(index,index,index*);
        index splitpoint;
        
        if (first<last) {
                split(first,last,&splitpoint);   //split
                quicksort(first,splitpoint-1);   //sort the first sub-list
                quicksort(splitpoint+1,last);    //sort the second sub-list
        }
}

void split(index first,index last,index *splitpoint)
{
        key x;
        index unknown;
        void interchange(int*,int*);
        
        x=list[first];                           //find the split-point element
        *splitpoint=first;                       //find the splitpoint
        
        for (unknown=first+1;unknown<=last;unknown++) {
                                                 //move through the list
                if (list[unknown]<x) {           //comparing each element 
                                                 //with the split-point element
                        *splitpoint=(*splitpoint)+1;
                                                 //move the splitpoint
                        interchange(&list[*splitpoint],&list[unknown]);
                                                 //move the split-point element
                }
        }
        interchange(&list[first],&list[*splitpoint]);
                                                 //get the split-point element
                                                 //in the middle
        return;
}

void interchange(int *x,int *y)                  //interchange
{
        int temp;
        
        temp=*x;
        *x=*y;
        *y=temp;
}

//solution 1 ends

As you can see, quicksort is not an inplace sort, since it needs extra space to store the recursion stack. Also, it has a very bad worst case behaviour. As implemented above, the worst case occurs when the list is already in order, or if it is in the reverse order. This makes quicksort an unnatural sort, since it has to do most work when all the work is already done! However, in practice, quicksort works quite well. You can include counters to count the number of key-comparisons/swaps done to find its behaviour and test it with different types of input data. The amount of extra storage space needed is O(n).

Quicksort can keep doing swaps of elements with equal keys. At times this may not be acceptable. So, to avoid this, the following scheme may be used:

'Attach to each record its sequence number in the orignal file. This number is made a part of the key in such a way that it is the least significant part. This keeps the records in the same relative order as they were in input.'

In particular, all keys can be changed from A(i) to A(i)*n+i (i starts from zero) to make all keys distinct. This, after sorting can be changed to floor(A(i)/n). The equal keys will be in the same relative order as before.

Solution #1: Modifed quicksort, non-recursive.

In this implementation we will make a number of changes. First we will see how we can use a stack to remove recursion. Also, we will choose the split- point more carefully, as a median of the first, the middle and the last elements. This will make the worst case behaviour more difficult to achieve. Also, we will try a different splitting strategy. We will also look at the sizes of the sub-lists and stack only one, smaller sub-list. This will make the worst case stack space come down to O(lg(n)) from O(n). Also, to sort the sub-lists of size less than 10, we will use insertion sort. This will also speed up the process, since for small lists, insertion sort is better than quicksort.

Let us look at the program:

#include <stdio.h>

#define MAXELT          100
#define INFINITY        32760                    //numbers in list should not exceed
                                                 //this. change the value to suit your
                                                 //needs
#define SMALLSIZE       10                       //not less than 3
#define STACKSIZE       100                      //should be ceiling(lg(MAXSIZE)+1)

int list[MAXELT+1];                              //one extra, to hold INFINITY

struct {                                         //stack element.
        int a,b;
} stack[STACKSIZE];

int top=-1;                                      //initialise stack

void main()                                      //overhead!
{
        int i=-1,j,n;
        char t[10];
        void quicksort(int);
        
        do {
                if (i!=-1)
                        list[i++]=n;
                else
                        i++;
                printf("\nEnter the numbers <End by #>");
                fflush(stdin);
                scanf("%[^\n]",t);
                if (sscanf(t,"%d",&n)<1)
                        break;
        } while (1);
        
        quicksort(i-1);
        
        printf("The list obtained is ");
        
        for (j=0;j<i;j++)
                printf("\n %d",list[j]);
}

void interchange(int *x,int *y)                  //swap
{
        int temp;
        
        temp=*x;
        *x=*y;
        *y=temp;
}

void split(int first,int last,int *splitpoint)
{
        int x,i,j,s,g;
        
        //here, atleast three elements are needed
        if (list[first]<list[(first+last)/2]) {  //find median
                s=first;
                g=(first+last)/2;
        }
        else {
                g=first;
                s=(first+last)/2;
        }
        if (list[last]<=list[s]) 
                x=s;
        else if (list[last]<=list[g])
                x=last;
        else
                x=g;
        interchange(&list[x],&list[first]);      //swap the split-point element
                                                 //with the first
        x=list[first];
        i=first;                                 //initialise
        j=last+1;
        while (i<j) {
                do {                             //find j 
                        j--;
                } while (list[j]>x);
                do {
                        i++;                     //find i
                } while (list[i]<x);
                interchange(&list[i],&list[j]);  //swap
        }
        interchange(&list[i],&list[j]);          //undo the extra swap
        interchange(&list[first],&list[j]);      //bring the split-point 
                                                 //element to the first
        *splitpoint=j;
}

void push(int a,int b)                           //push
{
        top++;
        stack[top].a=a;
        stack[top].b=b;
}

void pop(int *a,int *b)                          //pop
{
        *a=stack[top].a;
        *b=stack[top].b;
        top--;
}

void insertion_sort(int first,int last)
{
        int i,j,c;
        
        for (i=first;i<=last;i++) {
                j=list[i];
                c=i;
                while ((list[c-1]>j)&&(c>first)) {
                        list[c]=list[c-1];
                        c--;
                }
                list[c]=j;
        }
}

void quicksort(int n)
{
        int first,last,splitpoint;
        
        push(0,n);
        while (top!=-1) {
                pop(&first,&last);
                for (;;) {
                        if (last-first>SMALLSIZE) {
                                                 //find the larger sub-list
                                split(first,last,&splitpoint);
                                                 //push the smaller list
                                if (last-splitpoint<splitpoint-first) {
                                        push(first,splitpoint-1);
                                        first=splitpoint+1;
                                }
                                else {
                                        push(splitpoint+1,last);
                                        last=splitpoint-1;
                                }
                        }
                        else {                   //sort the smaller sub-lists
                                                 //through insertion sort
                                insertion_sort(first,last);
                                break;
                        }
                }
        }                                        //iterate for larger list
}
//end of solution 2

The program removes many disadvantages of the previous program. Still, it is not necessarily an improvement. While using quicksort for your own problem, think before using the second program. If your language implements recursion well, you may decide to go for a recursive version. However, you may decide to use the median-of-three and the look-before-pushing-in-stack improvements. Also, you may decide to keep one recursive call and remove only tail recursion which is more in-efficient and easier to remove without stack. You can also consider which of the two split functions to use. The first one does not need the INFINITY mark at the end of the list. However, it is slightly slower.

The way this program is written, the stack never requires more than 2*ceiling(lg(n)). If we want to save the extra comparison in determining the larger sub-list, we need to store only the 'first' of the new list on the stack. (If we work immediately on the second sub-list, the second of the first sub-list can be known from the first of the second sub-list by subtracting 2).

Even in this case, one of the first and the last can be saved (depending on the sub-list on which we work immediately) with either a boolean variable to tell us what is stored or using a negative value of the index.