Searching Techniques
| Developed by Hardeep Singh | |
| Copyright | © Hardeep Singh, 2000 |
| seeingwithc@hotmail.com | |
| Website | seeingwithc.org |
| All rights reserved. | |
| The code may not be used commercially without permission. | |
| The code does not come with any warranties, explicit or implied. | |
| The code cannot be distributed without this header. | |
Problem: Given a list of distinct numbers (LIST), and another number (A), find the index of A in LIST if it is present or indicate that it is not present.
Although we have narrated the problem for numbers, it can actually refer to any datatype that can be compared for equality and arranged in order, such as strings.
Note: In the text below, lg(n) means log of n to base 2.
Solution #1: The naive approach
The simplest way, of course, is to compare the given number to each of the numbers in the list. If a number in the list matches the given number A, we can return the index of that number. If we reach the end of the list, we can indicate that A does not exist in LIST. Here is an implementation of this simple algorithm:
#include <stdio.h>
#define MAXELT 50
void main()
{
int i=-1,n=0,list[MAXELT],a;
char t[10];
int linear_search(int list[],int n,int a);
do { //read the list
if (i!=-1)
list[i++]=n;
else
i++;
printf("\nEnter the numbers <End by #>");
fflush(stdin);
scanf("%[^\n]",t);
if (sscanf(t,"%d",&n)<1)
break;
} while (1);
printf("\nEnter the number to be searched for >");
//read the number
scanf("%d",&a);
int pos=linear_search(list,i,a); //do linear search
if (pos==-1) //print the result
printf("\nThe number does not exist in the list.");
else
printf("\nThe number exists at location %d.",pos+1);
}
int linear_search(int list[],int n,int a)
{
for (int i=0;i<n;i++) //step through the list
if (list[i]==a) //if found, return
return i;
return -1; //else return not found
}
//Note that we did not even need to store the numbers in a list in this
//case. We could have kept comparing as we received. However, if multiple
//searches are to be done on a single list, this is not possible. In this
//case, we used this method to ensure conformity with other methods to be
//shown next.
Since this algorithm compares the given number to all N elements of the list, it is an O(n) algorithm. Due to this reason, this algorithm is known as Linear Search.
Although better search algorithms do exist, as we shall see later, at times this algorithm may be the only one applicable. If the medium on which the list is stored is sequential, or if the list is not sorted, this algorithm can be used.
Solution #2: Binary Search
This algorithm goes as follows:
An implementation of the algorithm is given below:
(Here, we show only the changed function binary_search. The rest of the program is the same as in linear search, just change linear_search function with binary_search and change all references to linear_search.)
int binary_search(int list[],int n,int a)
{
int first=0,last=n-1,mid;
while (first<=last) { //iterate while first<=last
mid=(first+last)/2; //calculate mid=trunc((f+l)/2)
if (list[mid]==a) //found
return mid;
else if (a<list[mid]) //not found - look in the
last=mid-1; //upper half of list
else
first=mid+1; //look in lower half
}
return -1; //return "not found"
}
Some people also implement binary search in the following recursive way:
int binary_search_dash(int list[],int first,int last,int a)
{
int mid;
if (first<=last) {
mid=(first+last)/2;
if (list[mid]==a)
return mid;
else if (a<list[mid])
return binary_search_dash(list,first,mid-1,a);
else
return binary_search_dash(list,mid+1,last,a);
}
return -1;
}
Although this is a more direct implementation of the above description, it uses needless stack space, and is much slower on most systems. Also, this form of recursion is called 'Tail Recursion', which is the most wasteful form of recursion. Recursion is a powerful tool which must be used with care. Recursion is advised only in certain cases, where it does not impose unnecessary action, as in Quicksort.
Binary search is O(lg(n)) as it halves the list size in each step. It is a large improvement over linear search; for a list with 10 million entries, while linear search will need 10 million key comparisons, binary search will need just about 24. Although binary search is already very good, at times it can be slightly improved, lets see how.
Solution #3: Fibonacci Search
Who does not know about fibonacci numbers? They crop up in so many diverse problems that cant even be listed! From estimation of the number of rabbits in successive generations, to the number of leaves on branches, fibonacci numbers have far-reaching uses. In future Seeing With C might feature a write-up on different ways of generating them, here we will see ways of using them! (Generation of Fibonacci numbers is another misuse of recursion.)
The fibonacci series has 1 as the first two terms and each successive term is the sum of two previous terms. So, it goes 1,1,2,3,5,8,13,21,44,...etc. Each number appearing in the fibonacci series is called fibonacci number.
Fibonacci search changes the binary search algorithm slightly: instead of halving the index for a search, a fibonacci number is subtracted from it. The fibonacci number to be subtracted decreases as the size of the list decreases.
Look at its implementation:
int fibonacci_search(int list[],int n,int a)
{
int f1,f2,t,mid;
f1=1; f2=1; //for (j=1;fib(j)<n;j++);
while (f1<n) { //find f1 such that f1>=n
f1=f1+f2; //next fibonacci number
f2=f1-f2; //store old f1
}
f1=f1-f2; //find lower fibonacci numbers
f2=f2-f1; //f1=fib(j-2), f2=fib(j-3)
mid=n-f1+1;
while (a!=list[mid]) //if not found
if (mid<0 || a>list[mid]) { //look in lower half
if (f1==1)
return -1;
mid=mid+f2; //decrease fibonacci numbers
f1=f1-f2;
f2=f2-f1;
}
else { //look in upper half
if (f2==0) //if not found return -1
return -1;
mid=mid-f2; //decrease fibonacci numbers
t=f1-f2; //this time, decrease more
f1=f2; //for smaller list
f2=t;
}
return mid;
}
Whew, this one has the longest code so far. It is also O(lgn) algorithm. To do a comparison, we used a list of 10 numbers, and searched for each of the 10 numbers in the list once. Then, we searched for three un-present(?) numbers in the list, a total of 13 searches. We counted the total number of key comparisons using fibonacci_search and binary_search. The total for binary_search was 40, for fibonacci_search it was 41. Since this was a small scale example, binary search scored, in larger instances, it may be the other way around.
ADDENDUM
In this part, I will express the linear and the binary search algorithms in 80386 assembly language. This, is towards the following goals:
Linear Search
Precondition: CX holds the number of entries, SI has been initialised to first array element, D=0.
AGAIN: LODSD ; The instruction reads the next number from memory
; into a CPU register(EAX)
CMP EAX,EBX ; Compare registers EAX and EBX. EBX already holds
; the value to be found
LOOPNE AGAIN ; Loop if EAX and EBX Not Equal and all entries have
; not been exhausted - to statement labeled AGAIN
CMP EAX,EBX ; Out of the loop - Two things are possible: EAX and
; EBX were found equal, or all list entries exhausted
; So, compare to make sure
JNE LABEL1 ; Jump if Not Equal to LABEL1
... ; Code to announce the fact that the entry has been
; found at location marked by CX and exit
LABEL1:... ; Code to announce that the number has not been
; found and exit.
Binary Search
AGAIN: MOV AL,FIRST ; Register AL <- Memory location containing FIRST
CMP AL,LAST ; CoMPare AL (FIRST) with LAST
JA NOTFOUND ; Jump if Above - to NOTFOUND since FIRST>LAST
ADD AL,LAST ; ADD LAST to AL. Al now contains FIRST+LAST
SHR AL,1 ; SHift Right AL once, this will divide it by 2
MOV MID,AL ; Send current value of AL to memory location MID
MOVZX AX,AL ; AX <- AL. (This is called MOV Zero eXpanded)
SHL AX,2 ; SH Left AX by 2. This multiplies it by 4. This is
done because each array value is of $ bytes.
ADD AX,OFFSET ARRAY
; Now AX contains the address of MID element
MOV SI,AX ; SI <- AX
LODSD ; Read value from memory location SI into EAX
CMP EAX,N ; CoMPare EAX with N, the number to be found
JE FOUND ; Jump if Equal to FOUND
JL LESS ; Jump if Less to LESS (to lower list). Note that
; no comparison is done here, just a jump. Just one
; comparison was done in CMP statement above. The flags
; hold the result which is checked by the jump
; instruction. Jump is equivalent to a simple goto
; in high level language; it does not do any inherent
; comparison
MOV AL,MID ; AL <- MID
DEC AL ; AL <- AL-1
MOV LAST,AL ; LAST <- AL. These three instructions accomplish
; LAST=MID-1.
JMP AGAIN ; Iterate
LESS: MOV AL,MID ; AL <- MID
INC AL ; AL <- AL+1
MOV FIRST,AL ; FIRST <- AL. These three instructions accomplish
; FIRST=MID+1
JMP AGAIN ; JuMP
FOUND: ... ; Code to indicate "Found number" at location contained
by SI and exit
NOTFOUND: ... ; Code to indicate "Not found"
That's all folks!