Tuesday, 30 July 2013

Sorting And Searching

                                      Introduction

In my next few blogs, I'll discuss sorting and searching in detail. Sorting is the process of putting a group of items in a specific order (increasing or decreasing, according to some linear relationship among the elements of the list), whereas searching refers to finding an item with specified properties among a collection of items. The following figure shows the topics that I'll discuss one after another.


                                                  Linear Search

It is a very straightforward method in which we compare each element of the array with the key. If the element is found, we report that it is found (or report its location in the array); otherwise we report that the element is not present in the list.
The worst case (the element is at the end of the array or not present at all) and average case complexity of linear search are proportional to the number of elements in the list, but the best case complexity is O(1) (the key is at the beginning of the array). The algorithm is best suited when the number of elements in the array is small. Advantages of linear search are:
  1. It is easy to implement and understand.
  2. Unlike binary and ternary search, it does not impose additional requirements (such as the array being sorted).
The pseudo code for the algorithm is:
                         For each element in the Array:
                                    if the item has the desired value,
                                                 stop the search and return the location.

I have already shown you code for linear search in my previous blog.

                                               Elementary Sorts

Here I will discuss three sorting algorithms. They are:
  1. Bubble Sort or Sink Sort.
  2. Selection Sort.
  3. Insertion sort.

                                                    Bubble Sort or Sink Sort

   This technique is called Bubble sort or Sinking sort because the smaller values gradually "bubble" their way up to the top of the array, like air bubbles rising in water, while the larger values sink to the bottom of the array.
Pseudo code for the algorithm is shown below:
                      procedure BubbleSort(A : array of sortable items)
                          for i = 0 to length(A) - 2:
                              for j = i + 1 to length(A) - 1:
                                  if A[i] > A[j]:
                                      swap(A[i], A[j])
                                  end if
                              end for
                          end for
                      end procedure

Complexity Analysis:
In the worst, average and best cases, the inner index j runs from i+1 to the end of the array, so the complexity in all three cases is clearly O(N^2).
Explanation:
for i = 0, j runs (N-1) times
for i = 1, j runs (N-2) times
for i = 2, j runs (N-3) times
......................................
for i = N-2, j runs 1 time
Hence total cost = (N-1) + (N-2) + ..... + 2 + 1
                 = (N-1)*(N-1+1)/2
                 = N(N-1)/2
                 = O(N^2)
An Example:
Now let's see how the algorithm works on an array A (in the figures, the selected entries are underlined and the swapped elements are highlighted; figures omitted):
 1st pass (i = 0, j = 1 to 4)
 2nd pass (i = 1, j = 2 to 4)
 3rd pass (i = 2, j = 3 to 4)
 4th pass (i = 3, j = 4 to 4)

After the last pass the array is sorted, completing the algorithm.

Here is C++ code for the above algorithm.
Bubble Sort
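A minimal sketch following the pseudocode directly (my own rendering, not necessarily the exact original listing):

#include <iostream>
#include <utility>   // std::swap

// Sorts A[0..n-1] in non-decreasing order, exactly as in the pseudocode:
// compare A[i] with every later A[j] and swap whenever they are out of order.
void bubbleSort(int A[], int n)
{
    for (int i = 0; i <= n - 2; ++i)
        for (int j = i + 1; j <= n - 1; ++j)
            if (A[i] > A[j])
                std::swap(A[i], A[j]);
}

int main()
{
    int A[] = {5, 1, 4, 2, 3};              // a small hypothetical input
    int n = sizeof(A) / sizeof(A[0]);

    bubbleSort(A, n);

    for (int i = 0; i < n; ++i)             // prints: 1 2 3 4 5
        std::cout << A[i] << ' ';
    std::cout << '\n';
    return 0;
}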


That's all about Bubble Sort. In the next blog, I'll explain the two other elementary sorts. If you have any doubts, drop me an email at ernstbinoy@gmail.com.









Sunday, 14 July 2013

How To Solve Recurrence Relations

                                   Introduction

A recurrence relation can be defined in the following way:
1. T(0) or T(1) = c1;
2. T(n) = c2 + T(n-1); where
T(0) or T(1) = cost of solving a problem of size 0 or 1 respectively,
T(n) = cost of solving a problem of size n,
T(n-1) = cost of solving a problem of size n-1,
c1 and c2 are constants.
Here line 1 describes the base step: it gives the cost of the problem for the trivial case(s), whereas line 2 describes the cost for input size n in terms of input size (n-1).
When an algorithm contains a recursive call to itself, its running time can often be described by a recurrence relation.
Solving a recurrence relation means eliminating the recursion from the function definition and finding a closed form of the function. There are several techniques for solving recurrence relations. They are:
  • Substitution Method
  • Iterative Method or Recursion Tree Method
  • Master Method
Here I'll discuss these one by one.

                            Substitution Method

It is one of the most powerful methods for solving recurrence relations. The method comprises two steps:

  1. Guess the form of the solution.
  2. Then use mathematical induction to establish the correctness of the guess.
The name Substitution Method comes from the fact that we substitute the guessed solution for the function when applying the inductive hypothesis to smaller values (you will see this clearly in the examples below). The key idea of this method lies in guessing the form of the solution. If the initial guess is wrong, it will need to be corrected later. One thing I want to mention is that there is no predefined rule on how to make the initial guess, but one can use experience while guessing.
Let's see how it works with examples.
 T(n) = 1, if n = 1
 otherwise T(n) = 2*T(floor(n/2)) + n
=> Guess T(n) = O(n*log(n)). Next we will prove it by induction.
To show the upper bound, we will try to find two constants c and n(0) such that T(n) <= c*n*log(n) for all n >= n(0). First let's compute the values for n = 2 and n = 3 (small n).
Base case:

  1. T(2) = 2*T(1) + 2 = 4
  2. T(3) = 2*T(1) + 3 = 5

Our aim is to establish T(n) <= c*n*log(n), so now we focus on finding a value of c; one can easily check that c = 2 works for n = 2 and n = 3 (taking logarithms to base 2, 2*2*log(2) = 4 >= T(2) and 2*3*log(3) > T(3)). But we need induction to establish the guess for all n.

Inductive step: Let's assume that the guess is true for all n = 2, 3, ..., K.
For n = K+1 we have T(n) = 2*T(floor(n/2)) + n
                          <= 2*c*(floor(n/2))*log(floor(n/2)) + n
                          <= c*n*log(n/2) + n = c*n*log(n) - c*n + n
                          <= c*n*log(n)          (since c >= 1)
Hence the inductive step holds, and so the solution of the above recurrence is O(n*log(n)).
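To get a concrete feel for the bound, here is a small check (my own sketch) that computes T(n) directly from the recurrence and compares it with 2*n*log2(n), i.e. the guess with c = 2:

#include <cstdio>
#include <cmath>

// T(1) = 1, T(n) = 2*T(floor(n/2)) + n
long long T(long long n)
{
    if (n == 1) return 1;
    return 2 * T(n / 2) + n;   // integer division gives floor(n/2)
}

int main()
{
    // Check T(n) <= 2*n*log2(n) for a few values of n.
    for (long long n = 2; n <= 1000000; n = 3 * n + 1) {
        double bound = 2.0 * n * std::log2((double)n);
        std::printf("n = %-8lld  T(n) = %-10lld  2*n*log2(n) = %.0f\n",
                    n, T(n), bound);
    }
    return 0;
}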

Be careful: errors can creep in when using asymptotic notation.
For example, in the previous question, suppose one tries to establish T(n) = O(n) by taking the initial guess T(n) <= c*n and applying it to the recurrence in the following manner:
T(n) <= 2*(c*floor(n/2)) + n
      <= c*n + n
      = (1+c)*n
      = "O(n)"
But this is not correct, because (1+c)*n is not the exact form of the inductive hypothesis; the hypothesis is T(n) <= c*n, and we must prove exactly that form (with the same constant c) to conclude that the complexity is O(n).
Next I will show that O(n^2) is also a (looser) upper bound for this recurrence. (Because if F(n) = n^2 and G(n) = n*log(n), then G(n) = O(F(n)); see the previous blog.)
Guess T(n) <= c*n^2.
Applying it in the recurrence we get T(n) <= 2*c*(floor(n/2))^2 + n
                                           <= (c*n^2)/2 + n
                                           <= c*n^2          (for c >= 2 and n >= 1)
Hence T(n) = O(n^2) as well.
Try to prove that T(n) != O(log(n)) by taking the initial guess T(n) <= c*log(n).
Also try to solve T(n) = T(n^(0.5)) + 1.

            Iterative Or Recursion Tree Method

The basic idea of this method is to expand the terms and then work down to the base case. Generally we ignore floors and ceilings here. I'm not going into a formal definition, because you all know the concept of iteration; just look at the examples I'm explaining and you can definitely follow them.
Ex 1. Let T(n) = T(n/2) + c, T(1) = 1 be the given recurrence relation. Here we repeatedly substitute the recursive part of the R.H.S. using the definition of the expression on the L.H.S.
            So, T(n) = T(n/2) + c can be written as
                     = [ T((n/2)/2) + c ] + c
                     = T(n/(2^2)) + c + c
                     = [ T(n/(2^2)/2) + c ] + c + c
                     = T(n/(2^3)) + c + c + c
                     = .................................
                     = T(n/(2^k)) + c + c + c ......... (k times)        -- eq(1)
Now, let's take 2^k = n, i.e. k = log(n). Substituting in eq(1), we get
                    T(n) = T(1) + k*c
                         = 1 + c*log(n)
                         = O(log(n))
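For intuition, code whose running time satisfies this recurrence simply keeps halving its input, doing a constant amount of work per level; a tiny sketch:

#include <iostream>

// Each iteration is the 'c' in T(n) = T(n/2) + c, and the problem size halves,
// so the loop body executes about log2(n) times.
int halvingSteps(long long n)
{
    int steps = 0;
    while (n > 1) {
        n /= 2;       // shrink the problem to half its size
        ++steps;      // constant work per level
    }
    return steps;
}

int main()
{
    std::cout << halvingSteps(1024) << '\n';   // prints 10, i.e. log2(1024)
    return 0;
}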
Recursion tree for T(n) = T(n-1) + 1 (figure omitted).
Another example: T(n) = n^2 + 2*T(n/2)
                       = n^2 + 2*[ (n/2)^2 + 2*T(n/(2^2)) ]
                       = n^2 + 2*(n^2/4) + 4*T(n/4)
                       = n^2 + 2*(n^2/4) + 4*(n^2/16) + 8*T(n/8)
                       = ................................................
                       = Σ[k=0 to log(n)-1] n^2*(1/2)^k  +  n*T(1)
Proceeding in the same manner (the geometric sum is at most 2*n^2), we get complexity = O(n^2).
The recursion tree is shown below (figure omitted).

Try to solve the previous question using this technique; it's quite simple. Also try to solve this one:
T(n) = 2*T(n^(0.5)) + log(n),  T(1) = 1.

                                     Master Method

This is a faster method for solving recurrence relations. Here we express the inductive step of the recurrence as T(n) = a*T(n/b) + f(n), where a >= 1, b > 1 and f(n) is some asymptotically positive function.
The Master theorem has the following three cases (write p = log(a)/log(b), i.e. p = log_b(a)):

  1. If f(n) = O(n^(p-e)) for some constant e > 0, then T(n) = Θ(n^p).
  2. If f(n) = Θ(n^p), then T(n) = Θ(n^p * log(n)).
  3. If f(n) = Ω(n^(p+e)) for some constant e > 0, and a*f(n/b) <= c*f(n) for some constant c < 1 and all sufficiently large n, then T(n) = Θ(f(n)).

The basic idea of the theorem is that it compares f(n) with n^p and the larger of the two wins. You can easily see that in case 1 n^p is larger and in case 3 f(n) is larger; in case 2 both have the same order, so we multiply by a logarithmic factor and the solution is Θ(n^p * log(n)).
Now we will solve some problems.
Example 1: T(n) = 9*T(n/3) + n^2
=> a = 9, b = 3, log(a)/log(b) = 2, hence n^(log(a)/log(b)) = n^2. So it matches the second case, and the complexity is Θ(n^2 * log(n)).
Example 2: T(n) = 3*T(n/4) + n*log(n)
=> Here log(a)/log(b) = log(3)/log(4) < 1 and f(n) = n*log(n) grows polynomially faster, so it falls under case 3 (the regularity condition also holds). Hence the complexity is Θ(n*log(n)).
Example 3: T(n) = 2*T(n/2) + n/log(n)
=> The Master theorem is not applicable here: n^(log(a)/log(b)) = n, and n/log(n) is smaller than n only by a logarithmic factor, i.e. there is a non-polynomial difference between the two terms.
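When the driving function is a plain polynomial f(n) = n^k, the case check reduces to comparing k with log(a)/log(b). Here is a small helper sketch (my own, restricted to this polynomial special case; it does not handle logarithmic factors such as those in Examples 2 and 3):

#include <cstdio>
#include <cmath>

// Solves T(n) = a*T(n/b) + n^k with the Master theorem (polynomial f(n) only).
void masterPolynomial(double a, double b, double k)
{
    double p = std::log(a) / std::log(b);   // p = log_b(a)
    const double eps = 1e-9;

    if (k < p - eps)
        std::printf("Case 1: T(n) = Theta(n^%.2f)\n", p);
    else if (k > p + eps)                   // regularity holds for n^k when k > p
        std::printf("Case 3: T(n) = Theta(n^%.2f)\n", k);
    else
        std::printf("Case 2: T(n) = Theta(n^%.2f * log n)\n", p);
}

int main()
{
    masterPolynomial(9, 3, 2);   // Example 1 above: Case 2, Theta(n^2 * log n)
    masterPolynomial(8, 2, 2);   // a = 8, b = 2, f = n^2: Case 1, Theta(n^3)
    masterPolynomial(2, 2, 2);   // a = 2, b = 2, f = n^2: Case 3, Theta(n^2)
    return 0;
}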


So that's all about techniques for solving recurrence relations. If you have any doubt regarding this article, drop me a mail at ernstbinoy@gmail.com. You can also follow these links.

  1. http://net.pku.edu.cn/~course/cs101/2007/resource/Intro2Algorithm/IntroductionToAlgorithm_v2_en.pdf
  2. http://www.csd.uwo.ca/~moreno//CS424/Ressources/master.pdf
  3. http://cs.anu.edu.au/student/comp3600/lec_print/alg_lec05_recurrences_print.pdf
  4. http://www.cs.nthu.edu.tw/~wkhon/algo09/lectures/lecture3.pdf
  5. http://www.myedupages.com/all-topics/165-fundamentals-of-algorithms/merge-sort/381-the-iteration-method-for-solving-recurrence-relations

Friday, 28 June 2013

Analysis of Time Complexity

Introduction: In order to analyse the time complexity of an algorithm, we do not measure the amount of time (in milliseconds or seconds) the algorithm takes, simply because that time depends on the CPU clock speed and other hardware factors which are not in our (programmers') control. A CPU with better processing power than another takes less time to produce the output even though the same code runs on both machines.
Benchmark Testing

(Benchmark table omitted; the measured times were given in nanoseconds.)
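For reference, such measurements can be taken roughly as follows (a sketch using std::chrono; the numbers it prints will differ from machine to machine and from run to run, which is exactly the problem described below):

#include <chrono>
#include <iostream>
#include <numeric>
#include <vector>

int main()
{
    std::vector<long long> v(1000000);
    std::iota(v.begin(), v.end(), 1);        // fill with 1, 2, ..., 1000000

    auto start = std::chrono::steady_clock::now();
    long long sum = std::accumulate(v.begin(), v.end(), 0LL);
    auto stop = std::chrono::steady_clock::now();

    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
    std::cout << "sum = " << sum << ", time = " << ns.count() << " ns\n";
    return 0;
}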

Hence it is clear that the empirical method is not a reliable yardstick for measuring the time complexity of an algorithm. That's why asymptotic analysis is used: it does not depend on the machine or compiler; instead it considers the number of steps the algorithm takes.
In order to analyse an algorithm using asymptotic notation, we use the uniform cost model of computation because of its simplicity. We assign unit cost to each mathematical operation (unary or binary arithmetic operations, trigonometric functions, and so on), initialization, declaration, condition check and assignment.
Now we are in a position to discuss:


  • Asymptotic upper bound or Big Oh Notation.
  • Asymptotic lower bound or Big Omega Notation.
  • Asymptotic tight bound or Theta notation.
There are two other notations, but I will not discuss them because they are used less often than the three above. They are:
  • Small Oh Notation.
  • Small Omega Notation.

                  Asymptotic Upper Bound or Big Oh Notation
The notation is read as "f of n is big oh of g of n". Informally, f(n) = O(g(n)) means "f(n) grows no faster than g(n)". Before coming to the mathematical definition, let's assume f(n) is a real-world function (say 100*n^2 + 3*n + 27 or 7*2^(n-1) + 5*n^2 + 47) and g(n) is another, comparatively simple, function (like n^2 or 2^n). Now, we say f(n) = O(g(n)) if there exist positive constants c and n(0) such that f(n) <= c*g(n) for all n >= n(0). The following figure is a graphical representation of the definition.
Fig-I


Let's see an example first.
Ex-1

Let f(n) = 10000 + 10*n be the time taken by n transactions, and consider g(n) = n. Now we will try to find c and n(0) such that f(n) <= c*g(n). If we take c = 20, then for all n >= 1000, c*g(n) dominates f(n), i.e. g(n) is an asymptotic upper bound of f(n). Look at the picture below.
Fig-II
      
                        Few notes on Big Oh Notation
  • One thing we need to remember is that, for f(n) = O(g(n)), g(n) is not unique; O(·) really refers to the whole set of functions that satisfy the definition. As f(n) = 10000 + 10*n
       <= 10000*n^2 + 10*n^2   (for n >= 1)
       <= 10010*n^2
        = O(n^2)   [take c = 10010, n(0) = 1],
       g(n) = n^2 is also an asymptotic upper bound of the function f(n). The same is true for g(n) = n^3, g(n) = 2^n or g(n) = n!.
  • Big Oh does not care about constant factors: there is no difference between O(2*n) and O(n). But O(e^(2*n)) != O(e^n) and O(5^n) != O(2^n), because one term is not a constant multiple of the other.
  • Big Oh notation by itself doesn't say what the functions being bounded are. I'll explain this point later.
                              Calculation of Upper Bound 
Now let's try to calculate the upper bound of an algorithm with the help of a few basic pieces of code. The code written here is all in C++. If you don't know C++ very well, follow http://www.cplusplus.com/reference/ ; it is really helpful. You can also read "Thinking in C++" by Bruce Eckel.

  •  First try to understand the code having constant time complexity.

 Source Code 1
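A sketch with three fixed-cost steps (my own example; any fixed sequence of statements behaves the same way):

#include <iostream>

int main()
{
    int a = 10, b = 32;         // declaration and initialization  -> cost c1
    int sum = a + b;            // one addition and one assignment -> cost c2
    std::cout << sum << '\n';   // one output operation            -> cost c3
    return 0;
}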


So it's clear that the total time taken is T = c1 + c2 + c3 = C (say). We call it a constant time algorithm (a constant time algorithm is one that always takes some constant amount of time regardless of the input size).


  • Let's see an example of a linear time algorithm.
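A sketch along those lines (my own example: scan an array for a value 'X'):

#include <iostream>

// Returns the index of x in a[0..n-1], or -1 if x is not present.
// One comparison per element, so the time is proportional to N in the worst case.
int linearSearch(const int a[], int n, int x)
{
    for (int i = 0; i < n; ++i)
        if (a[i] == x)
            return i;           // found: stop immediately
    return -1;                  // not found after scanning all N elements
}

int main()
{
    int a[] = {7, 3, 9, 1, 5};                    // hypothetical input
    int n = sizeof(a) / sizeof(a[0]);
    std::cout << linearSearch(a, n, 9) << '\n';   // prints 2
    std::cout << linearSearch(a, n, 4) << '\n';   // prints -1
    return 0;
}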


It's easy to see that the total time taken by the algorithm is proportional to the size of the input set. One thing I want to mention is that the complexity of the algorithm is linear not because it takes N items as input, but because it traverses the array elements. If we are lucky, we may find 'X' at the beginning, in which case the complexity is O(1); in the worst case it takes O(N) time. So, for the above code:
Best case running time = O(1)
Average case running time = O(N)   [O((N+1)/2) = O(N)]
Worst case running time = O(N). Now you can see why I mentioned earlier that Big Oh notation by itself doesn't say what the functions are.





  • Now I will show the code for printing all subsets of a given set. A set having N elements has 2^N subsets, since the power set of an N-element set contains 2^N elements.

  • Source Code 3
    The outer loop iterates 2^N times and, for each value of 'i', 'j' runs at least once and 'k' runs exactly N times in the inner loop. So, roughly (avoiding the smaller terms and keeping the dominating one), the complexity of the above algorithm is O(N*2^N). One important thing about this algorithm is that its performance partly depends on the output; such algorithms are also called output-sensitive algorithms. Here, printing 2^N subsets contributes significantly to the running time of the code. A sketch of such a routine is shown below.
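    A common bitmask version of this idea (my own rendering; the original listing may have used different loop variables, but the O(N*2^N) behaviour is the same):

#include <iostream>

// Prints all 2^N subsets of set[0..N-1]; bit k of i decides whether set[k] is taken.
void printSubsets(const int set[], int n)
{
    for (long long i = 0; i < (1LL << n); ++i) {   // outer loop: 2^N iterations
        std::cout << "{ ";
        for (int k = 0; k < n; ++k)                // inner loop: N iterations each time
            if (i & (1LL << k))
                std::cout << set[k] << ' ';
        std::cout << "}\n";
    }
}

int main()
{
    int set[] = {1, 2, 3};        // hypothetical input: prints all 8 subsets
    printSubsets(set, 3);
    return 0;
}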
    That ends our discussion of Big Oh notation. Now I will explain Big Omega notation.
             Asymptotic lower bound or Big Omega Notation(Ω)
    Big Omega notation defines the asymptotic lower bound of an algorithm. Mathematically, g(n) is said to be an asymptotic lower bound of f(n) if there exist positive constants c and n(0) such that f(n) >= c*g(n) for all n >= n(0). You may have noticed that it is just the reverse of the asymptotic upper bound (Big Oh) notation, so f(n) = Ω(g(n)) <=> g(n) = O(f(n)). For example, n^2 = Ω(n) as n = O(n^2).

    See the example shown below. It finds out whether any two elements of a sorted array sum up to X or not. Here I have used the two-pointer algorithm design technique (later we will see more examples of it).
    Source Code 4
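    A sketch of the two-pointer idea (my own rendering; it assumes the array is already sorted in increasing order):

#include <iostream>

// Returns true if some pair a[i] + a[j] (i < j) equals x, for a sorted array a.
bool pairWithSum(const int a[], int n, int x)
{
    int lo = 0, hi = n - 1;
    while (lo < hi) {
        int sum = a[lo] + a[hi];
        if (sum == x)  return true;   // best case: found on the very first test
        if (sum < x)   ++lo;          // need a bigger sum: move the left pointer right
        else           --hi;          // need a smaller sum: move the right pointer left
    }
    return false;
}

int main()
{
    int a[] = {1, 3, 5, 8, 12};                   // hypothetical sorted input
    std::cout << pairWithSum(a, 5, 13) << '\n';   // prints 1 (1 + 12)
    std::cout << pairWithSum(a, 5, 2)  << '\n';   // prints 0
    return 0;
}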
    The algorithm is pretty simple. One may think that the running time is proportional to the input size, and in general that is true. But in the best case we find the value X by adding just the first and last elements (just as in linear search we may find 'X' at the very beginning). Hence a lower bound for the algorithm is Ω(1). Look at the figure below.


    For the given input set, the loop runs only once (figure omitted).
            
                Asymptotic Tight Bound Or Theta Notation (Θ)
    Let f(n) = O(g(n)) and f(n) = Ω(h(n)) (i.e. f(n) lies between d*h(n) and c*g(n), where c and d are constants and n > n(0)). Now if g(n) = h(n), then we say g(n) is an asymptotic tight bound of f(n), i.e. f(n) = Θ(g(n)).
    Hence f(n) = Θ(g(n))  <=>  f(n) = O(g(n)) and f(n) = Ω(g(n)).
    Fig 2
    The most important property of Θ notation is that it is symmetric in nature. If n^3 ∈ Θ(3*n^3 - 7*n + 6), then (3*n^3 - 7*n + 6) ∈ Θ(n^3). Similarly, n^3 ∉ Θ(3*n^7 - 7*n + 6), as (3*n^7 - 7*n + 6) ∉ Θ(n^3).
    It is not always possible to find a tight bound for a function, e.g. f(n) = n*(1 + sin(n)).


    That ends the topic Analysis of Time Complexity. Follow these links to know more.

    1. http://www.youtube.com/watch?v=ca3e7UVmeUc
    2. http://www.youtube.com/watch?v=VIS4YDpuP98
    3. http://www.youtube.com/watch?v=NfRp1Yhw8EQ
    4. https://en.wikipedia.org/wiki/Big_O_notation
    5. http://en.wikipedia.org/wiki/Analysis_of_algorithms


    Monday, 17 June 2013

    Correctness Proving Methods

    As I mentioned in my previous blog, an algorithm is said to be correct if it halts and produces the correct output for all of its input values. Now, in this blog, I'm going to explain how to prove an algorithm correct.
    To prove the correctness of an algorithm, we generally use the two most common approaches. They are
    1. Mathematical Induction
    2. Contradiction 
                       Mathematical Induction
    Mathematical Induction can be broadly divided into two category.
    1. Weak Induction
    2. Strong Induction
    Weak Induction:
    In weak induction, we commonly use two steps to prove an algorithm correct. They are:
    • Base case or trivial case, where we claim that the algorithm holds for the trivial case(s).
    • Inductive step, where we prove that the algorithm is correct for some natural number n+1 (or n-1), given that it holds true for n.
    P: Premises,C: Conclusion.
    The above figure illustrates the definition in terms of mathematical logic. The premise P(K) means the claim is true for the trivial case (K could be 0 or any other trivial value); the premise "for all n, P(n) => P(n+1)" specifies that if P(n) is true then P(n+1) is also true. If these two premises are true, weak induction concludes that P(x) is true for all x greater than or equal to K.
    Now we will prove by weak induction that the sum S(n) = 1 + 3 + 5 + 7 + ... + (2n-1) is equal to the square of n for all n > 0.
    Base case: S(1) = 1 = 1^2 (true); S(2) = 1 + (2*2 - 1) = 1 + 3 = 4 = 2^2 (holds good!)
    Inductive step: Let S(k) = k^2, i.e. 1 + 3 + 5 + 7 + ... + (2k-1) = k^2.
    Now we can write S(k+1) = 1 + 3 + 5 + ... + (2k-1) + (2(k+1) - 1)
                            = S(k) + 2(k+1) - 1       [as S(k) = 1 + 3 + 5 + ... + (2k-1) = k^2]
                            = k^2 + 2k + 2 - 1
                            = k^2 + 2k + 1
                            = (k+1)^2   (proved)
    Try to prove that an n-sided polygon can have at most n(n-3)/2 diagonals.
    Strong Induction:
    In strong induction, to prove that P(n) is true, we assume that P(k) is true for all k < n and, using all of these, we prove that the claim holds for n.
    The above figure shows the mathematical definition of strong induction.
    Now I will use the idea of strong induction to prove that the sum of the interior angles of an n-sided polygon is
    (n-2)*Pi, where Pi = 180 degrees.

                             
    n = 3: the sum of the interior angles is 180 degrees = (3-2)*Pi.
    n = 4: the sum of the interior angles is 360 degrees = (4-2)*Pi.
    (Figures omitted.)
    Now let's assume that the result holds for all k less than n. To prove the result holds for an n-sided polygon, first draw an n-sided polygon and cut it along a diagonal.
    We have divided it into two parts having (K+1) and (n-K+1) edges. Now both (K+1) and (n-K+1) are less than n. So the sum of the interior angles of polygon 1 is
    = (K+1-2)*Pi, and the sum of the interior angles of polygon 2 is
    = (n-K+1-2)*Pi.
    Hence the sum of the interior angles of the n-sided polygon is
    = (K+1-2)*Pi + (n-K+1-2)*Pi
    = (K+1-2+n-K+1-2)*Pi
    = (n-2)*Pi .... proved.

    You may go to this link to learn more about strong induction: http://courses.engr.illinois.edu/cs173/sp2009/lectures/lect_18.pdf

    **Weak induction is also called the 1st Principle of Mathematical Induction, and strong induction is called the 2nd Principle of Mathematical Induction.
       
                                                  Contradiction
    In proof by contradiction, we first assume that the statement is false, and then we show that this assumption implies that some known, true statement is false. Thus we conclude that our initial assumption was wrong, i.e. the statement is true.
    In terms of mathematical logic, we assume that statement S is false, i.e. (not S) is true, and then we show that (not S) => (T and not T), where 'and' and 'not' denote the logical conjunction and negation operations respectively. Since (T and not T) is always false, (not S) must be false, hence S is true.
    Let's see one of the most common examples.
    The theorem says that "there are infinitely many primes".
    Proof: Suppose the above statement is false. That means there are only finitely many primes; let all the prime numbers be the elements of the set P = {p1, p2, ..., pn}.
    Now consider Q = p1*p2*...*pn + 1. Obviously Q is either prime or composite. Q is not divisible by any of the prime numbers listed in P (each division leaves remainder 1), so Q cannot be composite (a composite number would have a prime factor, which by our assumption would lie in P). Hence Q is prime, yet it does not belong to the set P. So our assumption was wrong.
    Hence there are infinitely many primes.
    You will find plenty of examples on the Internet.

    Sunday, 16 June 2013

    Introduction To Algorithm


                            What Is An Algorithm?
    Before defining an algorithm, let's consider a function y = f(x). What does it actually mean? Here f is nothing but a machine which takes a value of x as raw material and produces y as a product depending on that raw material (i.e. x). For example, f(x) = x^2 (where x belongs to the set of integers) is a function (or machine) that outputs the square of the integer given to it as input.
    In computer science (and in mathematics too), an algorithm is a set of instructions (or, you could say, a collection of procedures) which takes some value(s) as input and, after processing the value(s) step by step (as specified by the instructions), produces output in finite time. It can be written in English, as a computer program, or even as a hardware design.

                                   Why Do We Learn Algorithm?
    No doubt, design and analysis of algorithms is one of the most challenging and interesting areas of computer science. It helps you think about a problem more mathematically as well as abstractly. To become a class creator or application programmer in a company, one should have good knowledge of this subject. Algorithms are required in almost all fields of computer science: in theoretical computer science you need the K.M.P. string matching algorithm, in computer networks you need routing algorithms (static and dynamic), in cryptography and security you use the R.S.A. algorithm, and so on.

               Characteristics Of A Good Algorithm

    • Finiteness: The most basic and necessary characteristic of an algorithm is finiteness. It must terminate after a finite number of steps, and each step must be executable in a finite amount of time. An algorithm is said to be incorrect if it fails to halt for some input value.
    • Effectiveness: Each step of a good algorithm should be effective. It must have some significance for the solution of the problem (i.e. it should contribute something towards reaching the actual solution); otherwise it unnecessarily increases time and memory overhead.
    • Besides, a good algorithm should be unambiguous, divided into modules and easy to understand.