CM10135 / Programming II:   Lecture 4


More on Algorithms and Complexity:

Logarithmic Complexity and the Big O


Notes say this lecture ran a little short last year but students still didn't get logs well enough -- recommend having them guess the complexities in part VI.  This year there is a little more material that had been in Lectures 3 & 5, plus reverse.

-I.  Reverse

(defun reverse-help (oldlist newlist)
  ;; Move items one at a time from the front of oldlist onto the front of
  ;; newlist; when oldlist is empty, newlist holds the items in reverse order.
  (cond ((null oldlist) newlist)
        (t (reverse-help (cdr oldlist) (cons (car oldlist) newlist)))))

(defun my-reverse (list)
  ;; Reverse a list by handing it to the helper with an empty accumulator.
  (reverse-help list nil))
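
For example (a test call I've added for illustration, not in the original notes):

(my-reverse '(1 2 3 4))   =>  (4 3 2 1)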

I. Levels of Complexity (review from Last Week + examples)

  1. How does the number of operations an algorithm performs change as its parameters change?
    1. This question is referred to as  scaling.
    2. [draw graph of constant, linear & quadratic again; save board]
    3. Scaling happens with respect to some parameter.
    4. Example [mentioned in lecture 3]:  As an animal gets taller, its weight typically scales as height^3.
      1. This is because weight is directly related to volume, not height.
      2. volume = height x width x depth.  If you assume width & depth are also correlated with height, then volume is correlated with height^3 -- so doubling an animal's height multiplies its weight by roughly 2^3 = 8.
      3. Bone strength can grow this same way, but exoskeletons can't, so vertebrates can grow bigger than insects.
  2. Example in algorithms:  finding out about arrays or lists (a short Lisp sketch follows this list)
    1. the length of an array:  How does this scale with the number of items in a collection?  (Redraw diagram)
      1. Just look up a variable in the collection object that tells you its length.
      2. Always takes the same number of steps however many items there are.
      3. This is called a constant time algorithm.
    2. the length of a list:
      1. Start at the beginning and count until you reach the last item (which must be marked somehow)
      2. The number of steps is dependent on the number of objects.
      3. This is a linear algorithm.
    3. how many unique items are in an array or list
      1. For each item of the list you have to check whether any of the other items are the same, so you go through the list once for each item in the list.
      2. The number of steps is the square of the number of items in the collection.
      3. This is said to scale quadratically.
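
A minimal Lisp sketch of these three cases (my own addition; the names my-length and count-distinct are made up for illustration):

;; Constant: an array object stores its size, so asking for it is a single
;; lookup, however many items it holds.  In Common Lisp, for instance,
;; (array-total-size arr) just reads that stored value.

;; Linear: a list doesn't store its length, so you walk it and count,
;; one step per item.
(defun my-length (lst)
  (if (null lst)
      0
      (+ 1 (my-length (cdr lst)))))

;; Quadratic: to count the distinct items, each item is checked against the
;; rest of the list for a duplicate, so the work grows as roughly n * n.
(defun count-distinct (lst)
  (if (null lst)
      0
      (+ (if (member (car lst) (cdr lst)) 0 1)
         (count-distinct (cdr lst)))))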

II. Logarithms & Logarithmic Complexity

  1. You need to remember what logarithms are to understand complexity analysis.
    1. More review about logarithms.  
    2. A slightly heavier logarithm page.
    3. But don't worry  -- if you understand this page, that's all you need for this unit.
  2. The default base for log is 10.  So log(100) = 2 is just another way to say 10^2 = 100.
  3. In computer science, we normally use base 2.  E.g. log_2(8) = 3.
    1. We like base 2 because of bits.  For example, the number of bits you need to represent an integer is logarithmic (base 2) in its value:

      binary   power of 2   decimal value
      0                     0
      1        2^0          1
      10       2^1          2
      11                    3
      100      2^2          4
      101                   5
      110                   6
      111                   7
      1000     2^3          8
  4. If you think about it, a logarithmic complexity would be a good thing for an algorithm -- not as good as constant time, but much better than linear.
  5. In fact, a log_2 complexity would be exactly as much better than linear as quadratic is worse.  [draw on graph]
  6. But can we write an algorithm of logarithmic complexity?
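
A tiny sketch (my own addition, not part of the original notes) of point 3.1: counting how many times you can halve a number is the same as counting its bits, and that count grows as log_2 of the value -- a first taste of logarithmic behaviour.

(defun bits-needed (n)
  ;; How many binary digits does the non-negative integer n need?
  ;; Each recursive call halves n, so the number of calls is about log_2(n).
  (if (zerop n)
      0
      (+ 1 (bits-needed (floor n 2)))))

;; (bits-needed 8)   => 4     ; "1000" in binary
;; (bits-needed 255) => 8     ; "11111111"
;; Common Lisp's built-in INTEGER-LENGTH returns the same count for positive n.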

III. Trees for Sorting and Searching

  1. A tree is like a list, except that instead of a next reference, each node has two references, left and right.
    1. the objects at left & right are called children.
    2. the object that points directly to another object is called its parent.
    3. the top parent is called the root (back to the tree metaphor.)
  2. An algorithm for sorting a list using a tree (a Lisp sketch of this and of the search in point 6 appears at the end of this section).
    1. Take the first number in the list, make it the root.
    2. For each number in the list, start at the root object
      1. If the new number is less than the current object, go to its left child.
      2. If the new number is larger, go to its right child.
      3. (assume you can ignore duplicates.)
      4. If the child you've gone to is empty, insert your number in a new object.
      5. Otherwise, go back to step 1 of this sub-list and repeat from the child you just moved to.
  3. Someone asked in class about the duplicate case (I'd actually skipped step 3 above on the board.)  If you are only looking for the existence of an object (see our search below) duplicates can be ignored / skipped.  If you do care about how many items you have of each type, you can have your node objects contain a variable that counts how many instances you found of that object.  You could even imagine keeping a list of objects at each node in the tree if you like!  You'll learn more about this when data structures are covered more formally later in the course.
  4. Example:  15, 23, 14, 27, 3, 18, 5 gives:

                15
               /  \
             14    23
             /    /  \
            3   18    27
             \
              5
  5. The depth of this tree grows roughly as log_2 of the number of items.
    1. Notice that no number will ever come to the right of the 14 (& not just because I ran out of space.)
    2. So log_2 is the best case; it's unlikely to be the actual case.
    3. There are algorithms for rebalancing trees though, so if the number of items you have happens to be one less than a power of 2, you could get the optimal, perfectly balanced case.
  6. To find out whether a number is in the tree, almost the same algorithm works:
    1. Start at the root.
    2. If you find your number, return #true; if you find a null node, return #false. 
    3. Assuming you didn't just finish, if the number you are looking for is less than your current node, move to the left, otherwise to the right.
    4. Go back to step 2.
  7. If the tree is perfectly balanced, then you will only have to look at a maximum of log_2 N items (where N is the length of the original list, the Number of items.)
  8. Obviously the base of the logarithm here is dependent on the number of children, so you could do even better if you had more children.
    1. But the algorithm gets more complicated if you can't just use </left vs >/right.
      1. You'll learn such an algorithm, B-trees, next year in databases.
    2. Another reason computer scientists like log2.
  9. You'll learn a lot more about trees (and other data structures) from Dr. Paddon in the second half of this unit.
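
A minimal Lisp sketch (my own addition, written to match the reverse example; the names node, tree-insert, list-to-tree and tree-member-p are mine, not from the lecture) of the insertion algorithm in point 2 and the search in point 6:

(defstruct node value left right)

;; Insert one number: go left if it's smaller, right if it's larger, ignore
;; duplicates, and make a new node when you reach an empty child (steps 1-5).
(defun tree-insert (tree number)
  (cond ((null tree) (make-node :value number))
        ((< number (node-value tree))
         (setf (node-left tree) (tree-insert (node-left tree) number))
         tree)
        ((> number (node-value tree))
         (setf (node-right tree) (tree-insert (node-right tree) number))
         tree)
        (t tree)))                       ; duplicate: skip it

;; Build a tree from a whole list; the first number becomes the root.
(defun list-to-tree (numbers)
  (let ((tree nil))
    (dolist (n numbers tree)
      (setf tree (tree-insert tree n)))))

;; Search: one comparison per level, so roughly log_2 N steps when balanced.
(defun tree-member-p (tree number)
  (cond ((null tree) nil)                          ; empty child: not found
        ((= number (node-value tree)) t)           ; found it
        ((< number (node-value tree))
         (tree-member-p (node-left tree) number))
        (t (tree-member-p (node-right tree) number))))

;; (tree-member-p (list-to-tree '(15 23 14 27 3 18 5)) 18) => T
;; (tree-member-p (list-to-tree '(15 23 14 27 3 18 5)) 4)  => NIL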

IV. Big O Notation.

  1. Normally, certain aspects of the algorithm (for example, its complexity, or the duration of its longest operations) have the most impact on how long it takes.  These aspects are said to dominate.
  2. To make it clear that we are only talking about the dominant factor when we analyze an algorithm, we talk about the order of the algorithm. 
    1. The order has its own notation, called "big O" and written like this:  O(n) (read "Order n").
    2. Order notation throws out constants.  For example, if an algorithm takes 2n time (for example, reads through a whole list twice) it is still O(n), the same order as if you only went through the list once.
    3. Sometimes constants matter --- for example, if it's going to take a year to go through the list --- but usually they don't.
  3. Since "N is the length of the original list, the Number of items", a function of linear complexity is said to be O(n).
  4. Just as constants don't matter, neither do lower-order components of an algorithm.  So if an algorithm actually takes n^5 + n^3 + 7, we say its complexity is O(n^5).
  5. This is only for addition!  Obviously if the algorithm takes n^5 * n^3 its complexity is O(n^8)!
  6. Since ultimately the most important thing about algorithms isn't their exact rate of growth, but rather what the curves look like (remember the graph) we often don't even bother to specify the base for a logarithm or the exponent for a polynomial-time algorithm.  We just say "logarithmic" or "polynomial".
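
A quick worked check (my own numbers, not from the notes) of why both the lower-order terms and the constant disappear: for every n >= 1 we have n^3 <= n^5 and 7 <= 7n^5, so

      n^5 + n^3 + 7  <=  n^5 + n^5 + 7n^5  =  9n^5.

The running time is bounded by a constant (9) times n^5; order notation throws the constant away, leaving O(n^5).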

V. Types of Complexity & Their Dominance Relations

This table is shamelessly cribbed from Gerald Kruse's page on Algorithm Efficiency (which also tells you about big-O arithmetic!  Something I've never used, but then I do AI & psychology, not theory.)

Let  a, b, c, and d denote constants

In order of dominance (top is most dominant).  Realize that algorithms of higher dominance are bad!

Function     condition                   common name
-----------  --------------------------  ------------------------------------
N^n
N!                                       N factorial
a^n          dominates b^n if a > b      exponential
N^c          dominates N^d if c > d      polynomial (cubic, quadratic, etc.)
n log n                                  n log n
n                                        linear
log n                                    logarithmic (regardless of base)
1                                        constant
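
A small sketch (my own, not from Kruse's page) that prints a few of these functions side by side, so you can watch the dominance order emerge as n grows:

(defun show-growth ()
  ;; Tabulate log n, n log n, n^2 and 2^n for a few values of n.
  (dolist (n '(2 4 8 16 32))
    (format t "n=~3d  log n=~5,1f  n log n=~7,1f  n^2=~6d  2^n=~12d~%"
            n (log n 2) (* n (log n 2)) (* n n) (expt 2 n))))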



VI. Criteria for Evaluating an Algorithm, Revisited 

  1. What do you need to know about algorithms and complexity?
  2. You will be thinking about several different cases:
    1. Worst case,
      1. If everything is organized in the worst possible way, how long will it take?
    2. Best case,
      1. If you are really lucky, what's the fastest it could run?
    3. Average or expected case.
      1. What is the most probable situation?
  3. For example, with our tree-searching algorithm, the best case is log_2 n, the worst case (if everything was already sorted when we started, so it just makes one big list) is n.  But if we know that the original ordering is random, then we can be fairly certain that the real order is pretty close to log_2 n.
  4. In the old days, people were mostly obsessed with the worst case, but nowadays we often care more about the expected case.  This is because if we approach the problem as a system issue rather than strictly an algorithmic one, we can often recognize & terminate a worst-case scenario.  But this depends on how bad and how frequent worst cases are.
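
To see the worst case from point 3 concretely (a sketch using the illustrative functions from section III, so again my own addition, not the lecture's): inserting numbers that are already sorted gives a tree in which every node has only a right child, so searching it takes n steps rather than log_2 n.

;; (list-to-tree '(3 5 14 15 18 23 27))  ; degenerates into one long chain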

page author: Joanna Bryson
6 Feb 2007