II. Logarithms & Logarithmic Complexity
- You need to remember what logarithms are to understand complexity analysis.
- More review about logarithms.
- A slightly heavier logarithm page.
- But don't worry -- if you understand this page, that's all you need for
  this unit.
- The default base for log is 10. So log(100) = 2 is just another way to say
  10^2 = 100.
- In computer science, we normally use base 2. E.g. log_2(8) = 3.
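- A quick check in Python (a side note of mine, not from the lecture;
  math.log10 and math.log2 are in the standard library):

      import math

      # log base 10: log10(100) = 2 because 10^2 = 100
      print(math.log10(100))   # 2.0

      # log base 2: log2(8) = 3 because 2^3 = 8
      print(math.log2(8))      # 3.0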
- We like base 2 because of bits. For example, the number of bits you need to
  represent an integer is logarithmic (base 2) in its value:

      binary:    0 |  1  | 10  | 11  | 100 | 101 | 110 | 111 | 1000
      powers:      | 2^0 | 2^1 |     | 2^2 |     |     |     | 2^3
      decimal:   0 |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8
- If you think about it, a logarithmic complexity would be a good thing for an
  algorithm -- not as good as constant time, but much better than linear.
- In fact, a log_2 complexity would be exactly as much better than linear as
  quadratic is worse (see the numbers below). [draw on graph]
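- Some illustrative numbers (mine, just to make the gap concrete):

      import math

      # log2(n) vs n vs n^2 as n grows.
      for n in (8, 64, 1024):
          print(f"n={n}: log2(n)={math.log2(n):.0f}, n={n}, n^2={n * n}")
      # At n=1024: log2 is 10, linear is 1024, quadratic is 1048576.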
- But can we write an algorithm of logarithmic complexity?
III. Trees for Sorting and Searching
- A tree is like a list, except that instead of a next reference, each node
  has two references, left and right.
  - The objects at left & right are called children.
  - The object that points directly to another object is called its parent.
  - The top parent is called the root (back to the tree metaphor.)
- An algorithm for sorting a list using a tree (see the sketch below):
  - Take the first number in the list, make it the root.
  - For each number in the list, start at the root object:
    1. If the new number is less than the current object, go to its left child.
    2. If the new number is larger, go to its right child.
    3. (Assume you can ignore duplicates.)
    4. If the child you've gone to is empty, insert your number in a new object.
    5. Otherwise go back to step 1 (of the list of steps that this is step 5 of.)
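- A minimal Python sketch of the insertion steps above (the Node class and
  insert function are my naming, not something we defined in class):

      class Node:
          def __init__(self, value):
              self.value = value
              self.left = None    # child for smaller values
              self.right = None   # child for larger values

      def insert(root, value):
          """Walk down from the root until an empty child slot is found."""
          current = root
          while True:
              if value == current.value:
                  return                           # step 3: ignore duplicates
              if value < current.value:            # step 1: smaller goes left
                  if current.left is None:
                      current.left = Node(value)   # step 4: empty slot, insert
                      return
                  current = current.left           # step 5: repeat from child
              else:                                # step 2: larger goes right
                  if current.right is None:
                      current.right = Node(value)  # step 4: empty slot, insert
                      return
                  current = current.right          # step 5: repeat from child

      # Build the tree from the example below:
      numbers = [15, 23, 14, 27, 3, 18, 5]
      root = Node(numbers[0])   # the first number becomes the root
      for n in numbers[1:]:
          insert(root, n)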
- Someone asked in class about the duplicate case (I'd actually skipped step 3
  above on the board.) If you are only looking for the existence of an object
  (see our search below) duplicates can be ignored / skipped. If you do care
  about how many items you have of each type, you can have your node objects
  contain a variable that counts how many instances you found of that object.
  You could even imagine keeping a list of objects at each node in the tree if
  you like! You'll learn more about this when data structures are covered more
  formally later in the course.
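- One way to picture the counting variant (a hypothetical tweak to the Node
  sketch above, not code from class):

      class CountingNode:
          def __init__(self, value):
              self.value = value
              self.count = 1      # how many copies of this value we've seen
              self.left = None
              self.right = None

      # On a duplicate, instead of ignoring it (step 3), you would do
      # current.count += 1 and stop.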
- Example: 15, 23, 14, 27, 3, 18, 5 gives:

              15
             /  \
           14    23
          /     /  \
         3    18    27
          \
           5
- The depth of this tree grows roughly at log_2 relative to the number of
  items.
- Notice that no number will ever come to the right of the 14 (& not just
  because I ran out of space.)
- So log_2 is the best case; it's unlikely to be the actual case.
- There are algorithms for rebalancing trees though, so if the number of items
  you have happens to be exactly one less than a power of 2 (a perfectly
  balanced tree with k levels holds 2^k - 1 items), you could get the optimal
  case.
- To find out whether a number is in the tree, almost the same algorithm works
  (see the sketch below):
  1. Start at the root.
  2. If you find your number, return #true; if you find a null node, return
     #false.
  3. Assuming you didn't just finish, if the number you are looking for is
     less than your current node, move to the left, otherwise to the right.
  4. Go back to step 2.
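- A matching Python sketch of the search (reusing my Node/insert sketch from
  the sorting section; True/False stand in for #true/#false):

      def contains(root, value):
          """Return True if value is in the tree, False otherwise."""
          current = root                    # step 1: start at the root
          while current is not None:        # a null node means "not found"
              if value == current.value:    # step 2: found it
                  return True
              # step 3: smaller goes left, otherwise right; step 4: loop
              current = current.left if value < current.value else current.right
          return False                      # step 2: hit a null node

      print(contains(root, 18))   # True  (18 is in the example tree)
      print(contains(root, 99))   # False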
- If the tree is perfectly balanced, then you will only have to look at a
  maximum of log_2 N items (where N is the length of the original list, the
  Number of items.)
- Obviously the base of the logarithm here is dependent on the number of
  children, so you could do even better if you had more children.
- But the algorithm gets more complicated if you can't just use
  "< means left" vs. "> means right."
- You'll learn such an algorithm, B-trees, next year in databases.
- Another reason computer scientists like log_2.
- You'll learn a lot more about trees (and other data structures)
from Dr. Paddon in the second half of this unit.
IV. Order Notation ("Big O")
- Normally, certain aspects of the algorithm (for example, its complexity, or
  the duration of its longest operations) have the most impact on how long it
  takes. These aspects are said to dominate.
- To make it clear that we are only talking about the dominant factor when we
  analyze an algorithm, we talk about the order of the algorithm.
- The order has its own notation, called "big O" and written like
this: O(n) (read "Order n").
- Order notation throws out constants. For example, if an algorithm takes 2n
  time (for example, reads through a whole list twice) it is still O(n), the
  same order as if you only went through the list once.
  - Sometimes constants matter --- for example, if it's going to take a year
    to go through the list --- but usually they don't.
- Since "N
is the length of the original list, the Number of
items", a function of linear complexity is said to be O(n).
- Just as constants don't matter, neither do lower-order components
of an algorithm. So if an algorithm actually takes n5
+ n3 + 7, we say its complexity is O(n5).
- This is only for addition! Obviously if the algorithm
takes n5 * n3 its complexity is O(n8)!
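- A tiny example of why the constant gets dropped (my own, not from the
  lecture):

      def min_and_max(items):
          """Two full passes over the list: 2n steps, but still O(n)."""
          lo = min(items)    # pass 1: n steps
          hi = max(items)    # pass 2: n more steps
          return lo, hi      # 2n total; order notation drops the 2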
- Since ultimately the most important thing about algorithms isn't their exact
  rate of growth, but rather what the curves look like (remember the graph) we
  often don't even bother to specify the base for a logarithm or the exponent
  for a polynomial-time algorithm. We just say "logarithmic" or "polynomial".
V. Types of Complexity & Their Dominance Relations
This table is shamelessly cribbed from Gerald Kruse's page on Algorithm
Efficiency (which also tells you about big-O arithmetic! Something I've never
used, but then I do AI & psychology, not theory.)
Let a, b, c, and d denote constants.
In order of dominance (top is most dominant). Realize that algorithms of
higher dominance are bad!
Function  | Condition               | Common name
----------+-------------------------+-------------------------------------
N^n       |                         |
N!        |                         | N factorial
a^n       | dominates b^n if a > b  | exponential
N^c       | dominates N^d if c > d  | polynomial (cubic, quadratic, etc.)
n log n   |                         | n log n
n         |                         | linear
log n     |                         | logarithmic (regardless of base)
1         |                         | constant
VI. Criteria for Evaluating an Algorithm, Revisited
- What do you need to know about algorithms and complexity?
- You will be thinking about several different cases:
  - Worst case: if everything is organized in the worst possible way, how
    long will it take?
  - Best case: if you are really lucky, what's the fastest it could run?
  - Average or expected case: what is the most probable situation?
- For example, with our tree-searching algorithm, the best case is log_2 n,
  the worst case (if everything was already sorted when we started, so it
  just makes one big list) is n. But if we know that the original ordering is
  random, then we can be fairly certain that the real order is pretty close
  to log_2 n.
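- A quick experiment backing this up (my sketch, reusing the hypothetical
  Node/insert functions from the sorting section):

      import random

      def depth(node):
          """Longest root-to-leaf path: the most nodes a search can visit."""
          if node is None:
              return 0
          return 1 + max(depth(node.left), depth(node.right))

      # Insert 1023 values in random order.  A perfectly balanced tree
      # holding them would have depth 10 (since 2^10 - 1 = 1023).
      values = list(range(1023))
      random.shuffle(values)
      root = Node(values[0])
      for v in values[1:]:
          insert(root, v)
      print(depth(root))  # usually a small multiple of 10, nowhere near 1023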
- In the old days, people were mostly obsessed with the worst case, but
  nowadays we often care more about the expected case. This is because if we
  approach the problem as a system issue rather than strictly an algorithmic
  one, we can often recognize & terminate a worst-case scenario. But this
  depends on how bad and how frequent worst-cases are.