II. Logarithms & Logarithmic Complexity
- You need to remember what logarithms are to understand complexity analysis.
- More review about logarithms.
- A slightly heavier logarithm page.
- But don't worry -- if you understand this page, that's all you need for
  this unit.
- The default base for log is 10. So log(100) = 2 is just another way to say
  10^2 = 100.
- In computer science, we normally use base 2. E.g. log_2(8) = 3.
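- A quick check in Python (a side note of mine, not from the lecture;
  math.log10 and math.log2 are in the standard library):

      import math

      # log base 10: log10(100) = 2 because 10^2 = 100
      print(math.log10(100))   # 2.0

      # log base 2: log2(8) = 3 because 2^3 = 8
      print(math.log2(8))      # 3.0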
- We like base 2 because of bits. For example, the number of bits you need to
  represent an integer is logarithmic (base 2) in its value:

      binary:    0 |  1  | 10  | 11  | 100 | 101 | 110 | 111 | 1000
      powers:      | 2^0 | 2^1 |     | 2^2 |     |     |     | 2^3
      decimal:   0 |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8
- If you think about it, a logarithmic complexity would be a good thing for an
  algorithm -- not as good as constant time, but much better than linear.
- In fact, a log_2 complexity would be exactly as much better than linear as
  quadratic is worse (see the numbers below). [draw on graph]
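- Some illustrative numbers (mine, just to make the gap concrete):

      import math

      # log2(n) vs n vs n^2 as n grows.
      for n in (8, 64, 1024):
          print(f"n={n}: log2(n)={math.log2(n):.0f}, n={n}, n^2={n * n}")
      # At n=1024: log2 is 10, linear is 1024, quadratic is 1048576.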
- But can we write an algorithm of logarithmic complexity?
III. Trees for Sorting and Searching
- A tree is like a list, except that instead of a next reference, each node
  has two references, left and right.
  - The objects at left & right are called children.
  - The object that points directly to another object is called its parent.
  - The top parent is called the root (back to the tree metaphor.)
- An algorithm for sorting a list using a tree (see the sketch below):
  - Take the first number in the list, make it the root.
  - For each number in the list, start at the root object:
    1. If the new number is less than the current object, go to its left child.
    2. If the new number is larger, go to its right child.
    3. (Assume you can ignore duplicates.)
    4. If the child you've gone to is empty, insert your number in a new object.
    5. Otherwise go back to step 1 (of the list of steps that this is step 5 of.)
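- A minimal Python sketch of the insertion steps above (the Node class and
  insert function are my naming, not something we defined in class):

      class Node:
          def __init__(self, value):
              self.value = value
              self.left = None    # child for smaller values
              self.right = None   # child for larger values

      def insert(root, value):
          """Walk down from the root until an empty child slot is found."""
          current = root
          while True:
              if value == current.value:
                  return                           # step 3: ignore duplicates
              if value < current.value:            # step 1: smaller goes left
                  if current.left is None:
                      current.left = Node(value)   # step 4: empty slot, insert
                      return
                  current = current.left           # step 5: repeat from child
              else:                                # step 2: larger goes right
                  if current.right is None:
                      current.right = Node(value)  # step 4: empty slot, insert
                      return
                  current = current.right          # step 5: repeat from child

      # Build the tree from the example below:
      numbers = [15, 23, 14, 27, 3, 18, 5]
      root = Node(numbers[0])   # the first number becomes the root
      for n in numbers[1:]:
          insert(root, n)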
- Someone asked in class about the duplicate case (I'd actually skipped step 3
  above on the board.) If you are only looking for the existence of an object
  (see our search below) duplicates can be ignored / skipped. If you do care
  about how many items you have of each type, you can have your node objects
  contain a variable that counts how many instances you found of that object.
  You could even imagine keeping a list of objects at each node in the tree if
  you like! You'll learn more about this when data structures are covered more
  formally later in the course.
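- One way to picture the counting variant (a hypothetical tweak to the Node
  sketch above, not code from class):

      class CountingNode:
          def __init__(self, value):
              self.value = value
              self.count = 1      # how many copies of this value we've seen
              self.left = None
              self.right = None

      # On a duplicate, instead of ignoring it (step 3), you would do
      # current.count += 1 and stop.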
- Example: 15, 23, 14, 27, 3, 18, 5 gives:

              15
             /  \
           14    23
          /     /  \
         3    18    27
          \
           5
- The depth of this tree grows roughly at log_2 relative to the number of
  items.
- Notice that no number will ever come to the right of the 14 (& not just
  because I ran out of space.)
- So log_2 is the best case; it's unlikely to be the actual case.
- There are algorithms for rebalancing trees though, so if the number of items
  you have happens to be exactly one less than a power of 2 (a perfectly
  balanced tree with k levels holds 2^k - 1 items), you could get the optimal
  case.
- To find out whether a number is in the tree, almost the same algorithm works
  (see the sketch below):
  1. Start at the root.
  2. If you find your number, return #true; if you find a null node, return
     #false.
  3. Assuming you didn't just finish, if the number you are looking for is
     less than your current node, move to the left, otherwise to the right.
  4. Go back to step 2.
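- A matching Python sketch of the search (reusing my Node/insert sketch from
  the sorting section; True/False stand in for #true/#false):

      def contains(root, value):
          """Return True if value is in the tree, False otherwise."""
          current = root                    # step 1: start at the root
          while current is not None:        # a null node means "not found"
              if value == current.value:    # step 2: found it
                  return True
              # step 3: smaller goes left, otherwise right; step 4: loop
              current = current.left if value < current.value else current.right
          return False                      # step 2: hit a null node

      print(contains(root, 18))   # True  (18 is in the example tree)
      print(contains(root, 99))   # False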
- If the tree is perfectly balanced, then you will only have to look at a
  maximum of log_2 N items (where N is the length of the original list, the
  Number of items.)
- Obviously the base of the logarithm here is dependent on the number of
  children, so you could do even better if you had more children.
- But the algorithm gets more complicated if you can't just use
  "< means left" vs. "> means right."
- You'll learn such an algorithm, B-trees, next year in databases.
- Another reason computer scientists like log_2.
- You'll learn a lot more about trees (and other data structures)
from Dr. Paddon in the second half of this unit.
IV. Order Notation ("Big O")
- Normally, certain aspects of the algorithm (for example, its complexity, or
  the duration of its longest operations) have the most impact on how long it
  takes. These aspects are said to dominate.
- To make it clear that we are only talking about the dominant factor when we
  analyze an algorithm, we talk about the order of the algorithm.
- The order has its own notation, called "big O" and written like
this: O(n) (read "Order n").
- Order notation throws out constants. For example, if an algorithm takes 2n
  time (for example, reads through a whole list twice) it is still O(n), the
  same order as if you only went through the list once.
  - Sometimes constants matter --- for example, if it's going to take a year
    to go through the list --- but usually they don't.
- Since "N
is the length of the original list, the Number of
items", a function of linear complexity is said to be O(n).
- Just as constants don't matter, neither do lower-order components
of an algorithm. So if an algorithm actually takes n5
+ n3 + 7, we say its complexity is O(n5).
- This is only for addition! Obviously if the algorithm
takes n5 * n3 its complexity is O(n8)!
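- A tiny example of why the constant gets dropped (my own, not from the
  lecture):

      def min_and_max(items):
          """Two full passes over the list: 2n steps, but still O(n)."""
          lo = min(items)    # pass 1: n steps
          hi = max(items)    # pass 2: n more steps
          return lo, hi      # 2n total; order notation drops the 2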
- Since ultimately the most important thing about algorithms isn't their exact
  rate of growth, but rather what the curves look like (remember the graph) we
  often don't even bother to specify the base for a logarithm or the exponent
  for a polynomial-time algorithm. We just say "logarithmic" or "polynomial".
V. Types of Complexity & Their Dominance Relations
This table is shamelessly cribbed from Gerald Kruse's page on Algorithm
Efficiency (which also tells you about big-O arithmetic! Something I've never
used, but then I do AI & psychology, not theory.)
Let a, b, c, and d denote constants.
In order of dominance (top is most dominant). Realize that algorithms of
higher dominance are bad!
Function  | Condition               | Common name
----------+-------------------------+-------------------------------------
N^n       |                         |
N!        |                         | N factorial
a^n       | dominates b^n if a > b  | exponential
N^c       | dominates N^d if c > d  | polynomial (cubic, quadratic, etc.)
n log n   |                         | n log n
n         |                         | linear
log n     |                         | logarithmic (regardless of base)
1         |                         | constant
VI. Criteria for Evaluating an Algorithm, Revisited
- What do you need to know about algorithms and complexity?
- You will be thinking about several different cases:
  - Worst case: if everything is organized in the worst possible way, how
    long will it take?
  - Best case: if you are really lucky, what's the fastest it could run?
  - Average or expected case: what is the most probable situation?
- For example, with our tree-searching algorithm, the best case is log_2 n,
  the worst case (if everything was already sorted when we started, so it
  just makes one big list) is n. But if we know that the original ordering is
  random, then we can be fairly certain that the real order is pretty close
  to log_2 n.
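- A quick experiment backing this up (my sketch, reusing the hypothetical
  Node/insert functions from the sorting section):

      import random

      def depth(node):
          """Longest root-to-leaf path: the most nodes a search can visit."""
          if node is None:
              return 0
          return 1 + max(depth(node.left), depth(node.right))

      # Insert 1023 values in random order.  A perfectly balanced tree
      # holding them would have depth 10 (since 2^10 - 1 = 1023).
      values = list(range(1023))
      random.shuffle(values)
      root = Node(values[0])
      for v in values[1:]:
          insert(root, v)
      print(depth(root))  # usually a small multiple of 10, nowhere near 1023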
- In the old days, people were mostly obsessed with the worst case, but
  nowadays we often care more about the expected case. This is because if we
  approach the problem as a system issue rather than strictly an algorithmic
  one, we can often recognize & terminate a worst-case scenario. But this
  depends on how bad and how frequent worst-cases are.