Searching Techniques

Developed by Hardeep Singh
Copyright © Hardeep Singh, 2000
EMail seeingwithc@hotmail.com
Website seeingwithc.org
All rights reserved.
The code may not be used commercially without permission.
The code does not come with any warranties, explicit or implied.
The code cannot be distributed without this header.

Problem: Given a list of distinct numbers (LIST), and another number (A), find the index of A in LIST if it is present or indicate that it is not present.

Although we have narrated the problem for numbers, it can actually refer to any datatype that can be compared for equality and arranged in order, such as strings.

Note: In the text below, lg(n) means log of n to base 2.

Solution #1: The naive approach

The simplest way, of course, is to compare the given number to each of the numbers in the list. If a number in the list matches the given number A, we can return the index of that number. If we reach the end of the list, we can indicate that A does not exist in LIST. Here is an implementation of this simple algorithm:

#include <stdio.h>

#define MAXELT 50

void main()
{
        int i=-1,n=0,list[MAXELT],a;
        char t[10];
        int linear_search(int list[],int n,int a);

        do {                                    //read the list
                if (i!=-1)
                        list[i++]=n;
                else
                        i++;
                printf("\nEnter the numbers <End by #>");
                fflush(stdin);
                scanf("%[^\n]",t);
                if (sscanf(t,"%d",&n)<1)
                        break;
        } while (1);

        printf("\nEnter the number to be searched for >");
                                                //read the number
        scanf("%d",&a);

        int pos=linear_search(list,i,a);        //do linear search

        if (pos==-1)                            //print the result
                printf("\nThe number does not exist in the list.");
        else
                printf("\nThe number exists at location %d.",pos+1);
}

int linear_search(int list[],int n,int a)
{
        for (int i=0;i<n;i++)                   //step through the list
                if (list[i]==a)                 //if found, return
                        return i;
        return -1;                              //else return not found
}

//Note that we did not even need to store the numbers in a list in this 
//case. We could have kept comparing as we received. However, if multiple
//searches are to be done on a single list, this is not possible. In this
//case, we used this method to ensure conformity with other methods to be 
//shown next.

Since this algorithm compares the given number to all N elements of the list, it is an O(n) algorithm. Due to this reason, this algorithm is known as Linear Search.

Although better search algorithms do exist, as we shall see later, at times this algorithm may be the only one applicable. If the medium on which the list is stored is sequential, or if the list is not sorted, this algorithm can be used.

Solution #2: Binary Search

This algorithm goes as follows:

  1. Find the middle element of the list.
  2. Compare the middle element with A (the given element).
  3. If both are equal, return the index of the middle element.
  4. f they are not equal, then if A is smaller than the middle element, then we must search for A in the upper half of the list (remember: the list is sorted) otherwise, we must search in the lower half.
  5. Now in the smaller list, find the middle element and repeat the process.

An implementation of the algorithm is given below:

(Here, we show only the changed function binary_search. The rest of the program is the same as in linear search, just change linear_search function with binary_search and change all references to linear_search.)

int binary_search(int list[],int n,int a)
{
        int first=0,last=n-1,mid;
        
        while (first<=last) {                   //iterate while first<=last
                mid=(first+last)/2;             //calculate mid=trunc((f+l)/2)
                if (list[mid]==a)               //found
                        return mid;
                else if (a<list[mid])           //not found - look in the
                        last=mid-1;             //upper half of list
                else
                        first=mid+1;            //look in lower half
        }
        return -1;                              //return "not found"
}

Some people also implement binary search in the following recursive way:

int binary_search_dash(int list[],int first,int last,int a)
{
        int mid;

        if (first<=last) {   
                mid=(first+last)/2;
                if (list[mid]==a)
                        return mid;
                else if (a<list[mid])
                        return binary_search_dash(list,first,mid-1,a);
                else
                        return binary_search_dash(list,mid+1,last,a);
        }
        return -1;
}

Although this is a more direct implementation of the above description, it uses needless stack space, and is much slower on most systems. Also, this form of recursion is called 'Tail Recursion', which is the most wasteful form of recursion. Recursion is a powerful tool which must be used with care. Recursion is advised only in certain cases, where it does not impose unnecessary action, as in Quicksort.

Binary search is O(lg(n)) as it halves the list size in each step. It is a large improvement over linear search; for a list with 10 million entries, while linear search will need 10 million key comparisons, binary search will need just about 24. Although binary search is already very good, at times it can be slightly improved, lets see how.

Solution #3: Fibonacci Search

Who does not know about fibonacci numbers? They crop up in so many diverse problems that cant even be listed! From estimation of the number of rabbits in successive generations, to the number of leaves on branches, fibonacci numbers have far-reaching uses. In future Seeing With C might feature a write-up on different ways of generating them, here we will see ways of using them! (Generation of Fibonacci numbers is another misuse of recursion.)

The fibonacci series has 1 as the first two terms and each successive term is the sum of two previous terms. So, it goes 1,1,2,3,5,8,13,21,44,...etc. Each number appearing in the fibonacci series is called fibonacci number.

Fibonacci search changes the binary search algorithm slightly: instead of halving the index for a search, a fibonacci number is subtracted from it. The fibonacci number to be subtracted decreases as the size of the list decreases.

Look at its implementation:

int fibonacci_search(int list[],int n,int a)
{
        int f1,f2,t,mid;
        
        f1=1; f2=1;                             //for (j=1;fib(j)<n;j++);
        while (f1<n) {                          //find f1 such that f1>=n
                f1=f1+f2;                       //next fibonacci number
                f2=f1-f2;                       //store old f1
        }
        f1=f1-f2;                               //find lower fibonacci numbers
        f2=f2-f1;                               //f1=fib(j-2), f2=fib(j-3)
        mid=n-f1+1;
        while (a!=list[mid])                    //if not found
                if (mid<0 || a>list[mid]) {     //look in lower half
                        if (f1==1)
                                return -1;
                        mid=mid+f2;             //decrease fibonacci numbers
                        f1=f1-f2;
                        f2=f2-f1;
                }
                else {                          //look in upper half
                        if (f2==0)              //if not found return -1
                                return -1;
                        mid=mid-f2;             //decrease fibonacci numbers
                        t=f1-f2;                //this time, decrease more
                        f1=f2;                  //for smaller list
                        f2=t;
                }
        return mid;
}

Whew, this one has the longest code so far. It is also O(lgn) algorithm. To do a comparison, we used a list of 10 numbers, and searched for each of the 10 numbers in the list once. Then, we searched for three un-present(?) numbers in the list, a total of 13 searches. We counted the total number of key comparisons using fibonacci_search and binary_search. The total for binary_search was 40, for fibonacci_search it was 41. Since this was a small scale example, binary search scored, in larger instances, it may be the other way around.

ADDENDUM

In this part, I will express the linear and the binary search algorithms in 80386 assembly language. This, is towards the following goals:

  1. For people who don't know assembly/machine languages, I want them to understand assembly and the way the programs that they have been writing runs on the machine, after compilation. I want the to specially notice the fact that what look like two distinct key comparisons in the mail loop of binary search can be realised with just one CoMPare instruction at the assembly level, therefore it is always counted as one.
  2. For those who know a little assembly but work mainly in C/C++, I want to provide them with more practice in the native computer language. It always helps to know assembly!
  3. For those who already know assembly, I want to provide them with ready made timesaver routines. Even while writing C/C++ software, these assembly routines can be used to speed things up, since all good C/C++ compilers allow facilities to write inline assembly code.

Linear Search

Precondition: CX holds the number of entries, SI has been initialised to first array element, D=0.

AGAIN: LODSD            ; The instruction reads the next number from memory 
                        ; into a CPU register(EAX)
       CMP EAX,EBX      ; Compare registers EAX and EBX. EBX already holds 
                        ; the value to be found
       LOOPNE AGAIN     ; Loop if EAX and EBX Not Equal and all entries have
                        ; not been exhausted - to statement labeled AGAIN
       CMP EAX,EBX      ; Out of the loop - Two things are possible: EAX and
                        ; EBX were found equal, or all list entries exhausted
                        ; So, compare to make sure
       JNE LABEL1       ; Jump if Not Equal to LABEL1
       ...              ; Code to announce the fact that the entry has been
                        ; found at location marked by CX and exit
LABEL1:...              ; Code to announce that the number has not been
                        ; found and exit.

Binary Search

AGAIN: MOV AL,FIRST     ; Register AL <- Memory location containing FIRST
       CMP AL,LAST      ; CoMPare AL (FIRST) with LAST
       JA NOTFOUND      ; Jump if Above - to NOTFOUND since FIRST>LAST
       ADD AL,LAST      ; ADD LAST to AL. Al now contains FIRST+LAST
       SHR AL,1         ; SHift Right AL once, this will divide it by 2
       MOV MID,AL       ; Send current value of AL to memory location MID
       MOVZX AX,AL      ; AX <- AL. (This is called MOV Zero eXpanded)
       SHL AX,2         ; SH Left AX by 2. This multiplies it by 4. This is
                          done because each array value is of $ bytes.
       ADD AX,OFFSET ARRAY
                        ; Now AX contains the address of MID element
       MOV SI,AX        ; SI <- AX
       LODSD            ; Read value from memory location SI into EAX
       CMP EAX,N        ; CoMPare EAX with N, the number to be found
       JE FOUND         ; Jump if Equal to FOUND
       JL LESS          ; Jump if Less to LESS (to lower list). Note that
                        ; no comparison is done here, just a jump. Just one
                        ; comparison was done in CMP statement above. The flags
                        ; hold the result which is checked by the jump
                        ; instruction. Jump is equivalent to a simple goto
                        ; in high level language; it does not do any inherent
                        ; comparison
       MOV AL,MID       ; AL <- MID
       DEC AL           ; AL <- AL-1
       MOV LAST,AL      ; LAST <- AL. These three instructions accomplish
                        ; LAST=MID-1.
       JMP AGAIN        ; Iterate
LESS:  MOV AL,MID       ; AL <- MID
       INC AL           ; AL <- AL+1
       MOV FIRST,AL     ; FIRST <- AL. These three instructions accomplish
                        ; FIRST=MID+1
       JMP AGAIN        ; JuMP
       
FOUND: ...              ; Code to indicate "Found number" at location contained
                          by SI and exit
NOTFOUND: ...           ; Code to indicate "Not found"

That's all folks!