Searching in Data Structures

Leave a Comment

Hey Guys! How're you doing?
Today we would look at another data structure operation, performed on linear arrays; Searching. This operation can be imagined as a person trying to search for a record amongst a list of records. Naturally we would also try to perform our search in the fastest possible time. Hence let us study more about searching in data structure, along with the two different approaches we can follow to perform searching in data structures, namely Linear Search and Binary Search.



Introduction to Searching

Let DATA be a collection of data elements in memory, and suppose a specific ITEM of information is given. Searching refers to the operation of finding the location LOC of ITEM in DATA, or printing some message that ITEM does not appear there. The search is said to be successful if ITEM does appear in DATA and unsuccessful otherwise.

Frequently, one may want to add the element  ITEM to DATA after an unsuccessful search for ITEM in DATA. One then uses a search and insertion algorithm, rather than simply a search algorithm. This would amount to searching and insertion operations combined. Hence we can see that apart from the search operation being used just for the sake of searching, it is also a prerequisite operation to various other useful data structure operations.

Now there are many different searching algorithms that can be used for the same purpose. The algorithm that one chooses generally depends on the way the information in DATA is organized. In this lesson, we shall study two different search algorithms, both with their own independent strengths and weaknesses, namely Linear and Binary Search.

The complexity of searching algorithms is measured in terms of the number f(n) of the comparisons required to find ITEM in DATA where DATA consists of n elements. We shall show that while linear search is a linear time algorithm, but that binary search is a much more efficient algorithm, proportional in time to log n with the base 2.
On the other hand, we will also discuss  the drawback of  relying only on binary search algorithm.

Linear Search

Suppose DATA is a linear array with n elements. Given no other information about DATA, the most intuitive way to search for a given ITEM in DATA is to compare ITEM with each element of DATA, one by one. That is, first we test whether DATA[1]=ITEM, and then we test whether DATA[2]=ITEM, and so on. This method, which traverses DATA sequentially to locate ITEM, is called linear search or sequential search.
The algorithm for the same is given as following:

Algorithm for Linear Search

A linear array DATA with N elements and a specific ITEM to be found are given. This algorithm finds the location LOC of ITEM in the array DATA.

1. Set K=0, LOC=0 [Initialize]
2. Repeat steps 3 & 4 while LOC=0 and K<N
3. If ITEM=DATA[K], then set LOC:=K
4. Set K=K+1 [Increment counter]
5. [Start of If Structure]
If LOC=0, then write ITEM is not in the array
Else write LOC is the location of ITEM [Successful]
[End of If Structure]
6. EXIT


Now that we have a clear idea about Linear Search, let us spare some minutes to understand how its complexity is found out.

Complexity of Linear Search

Wait, wait.
Comes out that we have already discussed the complexity through an example in a previous post. You can see it here. However, for the sake of continuity and in my pursuit to have every lesson complete in all terms, I'll repeat myself here.

Suppose we are implementing an algorithm that helps us to search for an record amongst a list of records. We can have the following three cases which relate to the relative success our algorithm can achieve with respect to time:
  • Best Case: The record we are trying to search is the first record of the list. If f(n) is the function which gives the running time and/ or storage space requirement of the algorithm in terms of the size n of the input data, this particular case of the algorithm will produce a complexity C(n)=1 for our algorithm f(n) as the algorithm will run only 1 time until it finds the desired record.
  • Worst Case: The record we are trying to search is the last record of the list. If f(n) is the function which gives the running time and/ or storage space requirement of the algorithm in terms of the size n of the input data, this particular case of the algorithm will produce a complexity C(n)=n for our algorithm f(n) as the algorithm will run n times until it finds the desired record.
  • Average Case: The record we are trying to search can be any record in the list. In this case we do not know at which position it might be. Hence we take an average of all the possible times our algorithm may run. Hence assuming for n data, we have a probability of finding any one of them is 1/n. Multiplying each of these with the number of times our algorithm might run for finding each of them and then taking a sum of all those multiples, we can obtain the complexity C(n) for our algorithm f(n) in case of an average case as following:


Hence in this way, we can find the complexity of our linear search algorithm. Lets move on to Binary Search now.

Binary Search

Suppose DATA is a sorted linear array. Then there is a very efficient searching algorithm that exists for these kind of arrays, by the name Binary Search. Let us get an idea of the same through an example.

Suppose one wants to find the location of some name in a telephone directory (or some word in a dictionary). Obviously, one does not perform a linear search. Rather one opens the directory in the middle to determine which half contains the name being sought. Then one opens that half in the middle to determine which quarter of the directory contains the name. Then one opens that quarter in the middle to determine which eighth of the directory contains the name. And so on. Eventually one finds the location of the name, since one is reducing the number of the possible locations for it in the directory. Binary Search follows the same approach in practise.

The Binary Search Algorithm applied to our array DATA works as following.
During each stage of our Algorithm, our search for ITEM is reduced to a segment of elements of DATA.
DATA[BEG], DATA[BEG+1], DATA[BEG+2],..., DATA[END]

Note that variables BEG and END denote, respectively, the beginning and end location of the segment under consideration. The algorithm compares ITEM with the middl element DATA[MID] of the segment, where MID is obtained by
MID=INT((BEG+END)/2)
(We use INT(A) for the integer value of A)

If DATA[MID] = ITEM, then the search is successful and we set LOC := MID. Otherwise a new segment of DATA is obtained as following:
  • If ITEM < DATA[MID], then ITEM can appear only in the left half of the segment. So we reset END := MID-1 an begin searching again as below:
    • DATA[BEG], DATA[BEG+1],...,, DATA[MID-1]
  • If ITEM > DATA[MID], then item can only appear in the right half of the segment. So we reset BEG := MID+1 and begin searching again as below:
    • DATA[MID+1], DATA[MID+2], ... , DATA[END]
Initially we began with the entire array DATA, ie. we begin with BEG=1 and END=n, or, more generally, with BEG=LB and END=UB.

If ITEM is not in DATA, then eventually we obtain
END < BEG
This condition signals that the search is unsuccessful.

Hence let us see the algorithm for Binary Search.

Algorithm for Binary Search

Here DATA is a sorted array with lower bound LB and upper bound UB, and ITEM is a given item of information. The variables BEG, END and MID denote, respectively, the beginning, end and middle locations of a segment of elements of DATA. This algorithm finds the location LOC of ITEM in DATA or displays an "Element Not Found" message.

BINARY(DATA, LB, UB, ITEM, LOC)
1. Set BEG := LB, END := UB and MID = INT((BEG+END)/2). [Initialize segment variables.]
2. Repeat steps 3 and 4 while BEG <= END and DATA[MID] != ITEM.
3. If ITEM < DATA[MID], then 
Set END := MID - 1
Else
Set BEG := MID + 1
[End of If Structure]
4. Set MID := INT((BEG+END)/2)
[End of Step 2 Loop]
5. If DATA[MID] = ITEM, then set LOC := MID.
Else display a "Element Not Found" message.
[End of If Structure]
6. EXIT.


Now that we have a fair idea of Binary Search Algorithm, let us calculate its complexity.

Complexity of Binary Search

The complexity is measured by the number f(n) of comparisons to locate ITEM in DATA where DATA contains n elements. Observe that each comparison reduces the sample size in half. Hence we require at most at most f(n) comparisons to locate ITEM where


That is, the running time for the worst case is approximately equal to log n with the base 2. Consequently, as this algorithm works, one might also understand the running time for the average case is approximately equal to the running time for the worst case, the complexity for which we just calculated above.

Now that we have studied both the algorithms and their respective complexities, let us have a look at their respective advantages and disadvantages.

When NOT to use Binary Search?

Since the binary search algorithm is very efficient ( eg. it requires only about 20 comparisons with an initial list of 1,000,000 elements), why would one want to use any other search algorithm? Observe that the algorithm requires two conditions:
  • The list must be sorted
  • One must have direct access to the middle element in any sublist
This means that one must essentially use a sorted array to hold the data. But keeping data in a sorted array is normally very expensive when there are many insertions and deletions. Accordingly, in such situations, one may use a different  data structure, such as a linked list, which take very less time for insertions or deletions and also have their middle element directly accessible.

With this let us have a look at their codes (implementation) in computers.

Linear Search in C


//Shubham Mehta, Write a program to perform linear search on an array
#include &ltstdio.h>
void main()
{
 int arr[100], i=0, num=0, digits=0, element;
 printf("%s\n", "Please enter the numeric array (PRESS 10101 TO TERMINATE) : ");
 for(i=0; num!=10101; i++)
 {
  scanf("%d", &num);
  arr[i]=num;
  ++digits;
 }
 --digits;
 printf("\n\n%s\n", "So the entered numeric array is as following: \n");
 for(i=0; i<digits; i++)
 printf("%d\t", arr[i]);
 printf("\n\nPlease enter the element you want to search for: ");
 scanf("%d", &element);
 for(i=0; i<digits; i++)
 {
  if(arr[i]==element)
  goto a;
  else
  continue;
 }
 printf("%s", "The entered element is not found, exiting!");
 getch();
 exit(0);
 a:
 printf("%s%d", "The entered element is found at index ", (i+1));
 getch();
}


Binary Search in C (using loops/ iteration)


//Shubham Mehta, Write a program to perform binary search using iteration on an array.
#include &ltstdio.h>
#include &ltprocess.h>
void main()
{
 int arr[100], i=0, num=0, digits=0, j=0, temp=0, element=0, beg=0, mid=0, last=0, index=0, flag=0;
 printf("%s\n", "Please enter the numeric array (Enter 10101 to terminate) : ");
 for(i=0; ; i++)
 {
  scanf("%d", &num);
  if(num==10101)
  break;
  else
  {
   arr[i]=num;
   ++digits;
  }
 }
 printf("\n\nSorting the numeric array...");
 for(i=0; i &lt (digits-1); i++)
 {
  for(j=(i+1); j<digits; j++)
  {
   if(arr[i]>arr[j])
   {
    temp=arr[j];
    arr[j]=arr[i];
    arr[i]=temp;
   }
  }
 }
 printf("%s", "\nHence the entered numeric array is: \n\n");
 for(i=0; i<digits; i++)
 printf("%d\t", arr[i]);
 printf("\n\nEnter the element you want to search for: ");
 scanf("%d", &element);
 printf("\n\nPerforming binary search...\n\n");
 beg=0, last=digits, mid=(beg+last)/2;
 while(beg &lt =last)
 {
  if(element &lt arr[mid])
  last=(mid-1);
  else if(element>arr[mid])
  beg=(mid+1);
  if(element==arr[mid])
  {
   index=mid;
   goto a;
  }
  mid=(beg+last)/2;
 }
        printf("%s", "Element not found, exiting!");
        getch();
        exit(0);
 a:
 printf("%s%d\n\n", "Hence the element being searched for lies at index number ", (index+1));
 getch();
}


Binary Search in C (using recursion)


//Shubham Mehta, Write a program to perform binary search on an array using recursion.
#include &lt stdio.h>
#include &lt process.h>
int arr[100];
int retindex(int, int, int);
void main()
{
 int i=0, num=0, digits=0, j=0, temp=0, element=0, beg=0, mid=0, last=0, index=0, flag=0;
 printf("%s\n", "Please enter the numeric array (Enter 10101 to terminate) : ");
 for(i=0; ; i++)
 {
  scanf("%d", &num);
  if(num==10101)
  break;
  else
  {
   arr[i]=num;
   ++digits;
  }
 }
 printf("\n\nSorting the numeric array...\nHence the entered numeric array is: \n\n");
 for(i=0; i &lt (digits-1); i++)
 {
  for(j=(i+1); j <digits; j++)
  {
   if(arr[i]>arr[j])
   {
    temp=arr[j];
    arr[j]=arr[i];
    arr[i]=temp;
   }
  }
 }
 for(i=0; i<digits; i++)
 printf("%d\t", arr[i]);
 printf("\n\nEnter the element you want to search for: ");
 scanf("%d", &element);
 printf("\n\nPerforming binary search...\n\n");
 beg=0, last=digits, mid=(beg+last)/2;
 index=retindex(element, beg, last);
 if(index==-1)
        printf("%s", "Element not found, exiting!");
 else
        printf("%s%d\n\n", "Hence the element being searched for lies at index number ", (index+1));
        getch();
}
int retindex(int ele, int be, int la)
{
        int mid=(be+la)/2;
        if(ele &lt arr[mid])
        return retindex(ele, be, (mid-1));
        else if(ele>arr[mid])
        return retindex(ele, (mid+1), la);
        return mid;
}
With this we finish our lesson on Searching as a Data Operation on Linear Arrays. Hope you learnt something that'll help you in life. Take care and stay tuned for the next lesson :)



0 comments:

Post a Comment

Powered by Blogger.