The books don't cover algorithms, complexity, or data structures, but see the front web page for excellent notes on data structures, sorting (with animations!), searching and complexity.
An Introduction to Network Programming with Java, by Jan Graba (new Sept 2006), also looks good. It doesn't only talk about networking; it also covers file handling, threads, servlets, CORBA (middleware), etc. I haven't read it thoroughly, but it looked good enough that I've ordered a few copies (bookstore & library).
I. Criteria for Evaluating an Algorithm
Main Criteria
Speed
Size
Risk of failure
Ease of maintenance
Speed is the main criterion that we'll talk about.
Speed and size are the two things computer scientists might be talking about when they talk formally about complexity.
Of course, the conventional meaning of complexity (how
complicated
it is to understand the algorithm) affects both risk and maintenance.
Often worth going with something slightly slower if it will be
easier to maintain.
Software development is often more of a bottleneck than
processor speed.
Good programmers are more expensive than fast computers.
Brooks' Law: Adding manpower to a late software project makes it later.
This law is so old it's not even gender neutral (in fact, it's from "The Mythical Man-Month", 1975), but it's still true.
But sometimes raw speed really does matter, e.g.:
Social or political simulations of millions of agents.
Science: the evolution of life, the evolution of culture, the big bang, hydrogen atoms, brain cells, etc.
If you are a good programmer with spare time and interested in modelling the evolution of culture, come visit me during my office hours (AmonI).
Right after the first time I gave this lecture (2004), I got a talk announcement from Bruce R. Donald on the importance of algorithms for molecular biology / drug discovery.
Whether you care about helping humanity or making money (not an xor), that's an important research field.
How do you measure Speed?
A stopwatch usually isn't a practical way to check (though see the quote above!).
The speed of one run of an algorithm depends on:
the processor it's run on,
other components (e.g. graphics card, bus),
what else is happening on the computer,
the amount of RAM available.
The last point is only true because read & write operations take time, even to memory.
But they take more time if they have to go to disk.
If a computer runs out of space in RAM, it swaps memory onto disk.
This is a very bad thing: if most of the time is spent swapping, little is spent on the computation.
It can happen if you are working with very large data sets, or not processing the data efficiently.
But this isn't what most computer scientists are talking about when they talk about time complexity.
Algorithms are normally analyzed in terms of:
The number of operations they perform.
The types of operations they perform.
How
the number of operations they perform changes if parameters change.
The key point!!
These criteria are the same for both time and space.
Usually we ignore most of the operations and focus on the few that are most significant:
e.g. for time: disk reads, hard arithmetic;
e.g. for space: `new' commands (things that allocate more memory).
How does the number of operations they perform change if parameters change?
This question is referred to as scaling.
Scaling happens with respect to some parameter.
Example: As an animal gets taller, its weight typically scales as height³.
This is because weight is directly related to volume, not height.
volume = height × width × depth. If you assume width & depth are also correlated to height, then volume is correlated to height³.
Bone strength can grow as height² but exoskeletons can't, so vertebrates can grow bigger than insects.
Example in algorithms: finding the length of an array.
How does this scale with the number of items in a collection? (The three approaches below are sketched in code after this example.)
Just look up a variable in the collection object that tells you
its length.
Always takes the same number of steps however many items
there are.
This is called a constant
time algorithm.
Start at the beginning and count until you reach the last item
(which must be marked somehow, like in the lists.)
The number of steps is proportional to the number of objects.
This is a linear algorithm.
If you are checking for how many unique items are in the
collection,
then for each item of the list you will have to check if any of the
other
items are the same, so you go through the list once for each item in
the
list.
The number of steps grows as the square of the number of items in the collection.
This is said to scale quadratically.
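Here is a minimal Java sketch of those three approaches (my own illustration, not from any course text; the class and method names are hypothetical): a constant-time lookup of a stored size, a linear count over the items, and a quadratic count of the unique items.

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    // Hypothetical illustration of constant, linear and quadratic scaling.
    public class ScalingExamples {

        // Constant time: the collection stores its size in a field, so this
        // takes the same number of steps however many items there are.
        static int lengthConstant(List<Integer> items) {
            return items.size();
        }

        // Linear time: count the items one by one; the number of steps grows
        // in direct proportion to the number of items.
        static int lengthLinear(List<Integer> items) {
            int count = 0;
            for (Integer ignored : items) {
                count++;
            }
            return count;
        }

        // Quadratic time: for each item, scan the items before it to see
        // whether it has already appeared, so the number of comparisons
        // grows roughly as the square of the number of items.
        static int countUnique(List<Integer> items) {
            int unique = 0;
            for (int i = 0; i < items.size(); i++) {
                boolean seenBefore = false;
                for (int j = 0; j < i; j++) {
                    if (items.get(j).equals(items.get(i))) {
                        seenBefore = true;
                        break;
                    }
                }
                if (!seenBefore) {
                    unique++;
                }
            }
            return unique;
        }

        public static void main(String[] args) {
            List<Integer> items = new ArrayList<>(Arrays.asList(3, 1, 4, 1, 5, 9, 2, 6, 5));
            System.out.println("constant-time length: " + lengthConstant(items));
            System.out.println("linear-time length:   " + lengthLinear(items));
            System.out.println("unique items:         " + countUnique(items));
        }
    }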
What do you need to know about algorithms and complexity?
You'll get a longer list of these next week; for now I just want you to get a feel for it.
You should be able to plot a graph with the number of items on
the X axis, and time (or space) on the Y axis.
You will be thinking about several different cases:
Worst case,
Best case,
Average or expected case.
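As a hedged illustration (not from the original notes; the names are hypothetical), a plain linear search makes the three cases concrete: the best case finds the target with one comparison, the worst case compares against every item, and the average (expected) case needs about half that many.

    // Hypothetical example of best / worst / average cases for a linear search.
    public class SearchCases {

        // Best case:    the target is the first element  -> 1 comparison.
        // Worst case:   the target is last or absent     -> N comparisons.
        // Average case: about N/2 comparisons, if the target is equally
        //               likely to be at any position.
        static int linearSearch(int[] items, int target) {
            for (int i = 0; i < items.length; i++) {
                if (items[i] == target) {
                    return i;   // found: return its index
                }
            }
            return -1;          // not found
        }

        public static void main(String[] args) {
            int[] items = {7, 3, 9, 4, 8};
            System.out.println(linearSearch(items, 7));  // best case: index 0
            System.out.println(linearSearch(items, 8));  // worst case: index 4
            System.out.println(linearSearch(items, 5));  // worst case: -1 (absent)
        }
    }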
Notice that an algorithm may look very good at a low N, but then turn out to be a nightmare at higher N! On the other hand, if you know for certain that you will only have low N for an application, you may still want to consider that algorithm.
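To make that trade-off concrete, here is a small hedged sketch (the step-count formulas are invented purely for illustration): a hypothetical algorithm costing 5*N + 200 steps versus one costing N*N steps. The quadratic one looks better at small N but loses badly as N grows.

    // Hypothetical step counts, made up purely to show the crossover point.
    public class CrossoverDemo {

        // An algorithm with a big constant overhead but linear growth.
        static long linearWithOverhead(long n) {
            return 5 * n + 200;
        }

        // An algorithm with almost no overhead but quadratic growth.
        static long quadratic(long n) {
            return n * n;
        }

        public static void main(String[] args) {
            // The quadratic algorithm wins at small N; the linear one wins later.
            long[] sizes = {5, 10, 20, 100, 1000, 100000};
            System.out.println("      N   5N+200          N*N");
            for (long n : sizes) {
                System.out.printf("%7d %8d %12d%n",
                        n, linearWithOverhead(n), quadratic(n));
            }
        }
    }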