CM10228 / Programming II: Lecture 2

Example Algorithms:

Sorting

I. Housekeeping:

More about the tutorial

Understand programming better if you get a feel for what the components are.
Understand memory better if you can reverse it --

You should be able to draw what you are doing (like I did yesterday).
Here's a web page with similar diagrams.

Understanding process & memory is central to understanding algorithms & complexity.
Understanding algorithms & complexity is central to the exam!
The main reason to learn Lisp is to learn to think about programming differently.

Languages affect how you think!
Lisp has also been on the exam the past several years & probably will be again this year.
and you'll also use it in your second year
But those facts shouldn't be necessary to motivate you.

Text books also not mandatory, covered next week, or see the course web page...
All of the stuff from lecture & much, much more are available from the course web page!

http://www.cs.bath.ac.uk/~jjb/here/CM10228/course-sched.html
Links to pages with movies of sort algorithms, graphs, other people's explanations of the same material, etc.

Anyone not assigned to a tutorial?
Rough structure for the next few lectures:

Sorting algorithms today.
Intro to algorithms & complexity tomorrow.
Next week we'll look at the complexity of the algorithms we're learning today.
Today's algorithms are in pseudo code. In next week's labs you can write them in Java.

pseudo code is a fake generic programming language used to communicate in papers & talks.
Some high-level languages are designed to be like pseudo code, e.g. python.

II. Sorting in General

Sorting is a useful skill, because it makes searching faster.
Almost 60 years of research in artificial intelligence indicate that searching is most of the problem of intelligence.

You have to find the right thing to do.
But looking for it takes time.
And the right thing to do right now can be quite different from the right thing to do right now.
If things are sorted, then individual things are easier to find (though still not trivial.)

Phone book.
Google.

More about searching next week.

You are mostly learning about how to search so you are familiar with some example algorithms.

Gives us examples for learning how to analyze and compare algorithms.
Helps you practice / exercise your brain / think like a programmer.
These sometimes crop up on Computer Science tests

e.g. the American Graduate Record Exam (GRE)
standard training for CS majors since the days before there were easily accessible libraries of functions.

Items are sorted on the basis of a key.

The key is an attribute or set of attributes about a class of objects that can be used to create a useful ordering.
Set of attributes may be necessary if no one attribute creates a sufficiently predictable ordering.
Examples:

Dictionary definitions

Word they define is used as the key.
Sorted alphabetically on the key.

Students

Student ID is most useful key, guaranteed unique.
May be sorted by name, marks, age etc.
Using non-unique keys can result in errors (e.g. misdelivery of post, pay).

Google

No obvious ordering for web pages.
Creates indecies based on compression of content words.
Also uses access information, links to pages, and other secret things.

For the algorithms below, we'll just sort numbers

But they'll work fine on more interesting objects...
If you can figure out good keys for ordering them.

III. Selection Sort

Basic idea:

You have an array of unsorted numbers.
You have an array big enough to stick each thing in one slot.
You put the smallest available number in the lowest available index.

What do we need to do this?

An integer to keep track of our next available lowest index. sortedIx
Another integer to keep track of where we are while we're searching for the lowest number. searchIx
Another integer to keep track of what the current minimum value is. min
Another integer to keep track of where that value was. minLoc

Initial pseudo code;

array int messy, sorted;  // these have come from somewhere & must have equal length
int sortedIx, searchIx, min, minLoc;
// this also assumes the system defines a number BIGNUM that's bigger
// than anything in messy.

// outer loop over the sorted array -- when everything's sorted we're done
for (sortedIx = 0; sortedIx < sorted.length; sortedIx++) {
    // initialize min to something bogus
    min = BIGNUM; minLoc = -1;
    // inner loop over messy array: find the smallest number in messy. 
    for (searchIx = 0; searchIx < messy.length; searchIx++) {
       // if it's the smallest yet seen.
       if (messy[searchIx] < min) { 
	    min = messy[searchIx];
            minLoc = searchIx;
            }
        } // for searchIx
    // now stick that small value in sorted
    sorted[sortedIx] = min;
    // and make sure it won't get found again
    messy[minLoc] = BIGNUM;
    } // for sortedIx

This isn't actually the standard algorithm for selection sort

We can save space by doing it in just one array
We can save time by not looking at the values we already sorted.
Does anyone care if we save this much space or time?

Probably not, but this is practice.
Maybe, depending on how big the array is.

Here's the standard one. The difference is that now we swap the smallest value with whatever was at the current point in the list:

array int sorta;  // this comes from somewhere only one now!
int sortedIx, searchIx, min, minLoc;
int temp; // add a temporary location for swapping values.
// don't need BIGNUM anymore...

// outer loop over the sorted array -- when everything's sorted we're done
for (sortedIx = 0; sortedIx < sorta.length; sortedIx++) {
    // now min is set to the thing in the slot we're currently trying
    // to fill.
    min = sorta[sortedIx]; minLoc = sortedIx;
    // inner loop over messy part of the array: find the smallest number
    for (searchIx = sortedIx; searchIx < search.length; searchIx++) {
       // if it's the smallest yet seen.
       if (sorta[searchIx] < min) { 
	    min = sorta[searchIx];
            minLoc = searchIx;
            }
        } // for searchIx
    // move the old value to temp 
    temp = sorta[sortedIx];
    // now stick that small value in the right place
    sorta[sortedIx] = min;
    // finish the swap
    sorta[minLoc] = temp;
    } // for sortedIx

IV. Insertion Sort

Basic idea:

You have an array of unsorted numbers.
Split the array between sorted & unsorted parts (initially, take the first element as sorted).
Stick the first number of the unsorted part of the array into the right place in the sorted part.

What do we need to do this?

An integer to keep track of sorted/unsorted boundary. boundaryIx
A place for the item we're currently trying to insert into the sorted bit. currentItem
Another index to keep track of where in the sorted bit we are. searchingIx

Pseudo code;

array int sorta;  // this comes from somewhere in a messy state
int boundaryIx, searchingIx; // these are indecies
int currentItem  // this is the temporary storage for th item we're moving...
                 // it's old home gets filled with stuff making space for it.

// outer loop over the sorted array -- when everything's sorted we're done
for (boundaryIx gets the values 1 through sorta.length) {
    // save new item
    currentItem = sorta[boundaryIx]    
    // inner loop starts at boundary and works down
    for (searchingIx = boundaryIx - 1; (searchingIx >= 0) AND (sorta[searchingIx] > currentItem); 
                                        searchIx--) {
        // haven't found the right place yet or wouldn't be in here, so 
        // just move stuff.
        sorta[searchingIx+1] = sorta[searchingIx];
        } // for searchingIx
    // if we're out of the loop, so searchingIx is pointing just neg. of where we want to be
    sorta[searchingIx+1] = currentItem;
    } // for boundaryIx

This part will come in the third lecture, really.

V. Quick Sort

Basic idea:

You have an array of unsorted numbers.
Pick one number, the pivot point.
Put all the numbers smaller than the pivot to its left, & all the larger numbers to its right.
Call quick sort recursively on the numbers on each side of the pivot.

What do we need to do this?

A place for remembering the pivot value: pivotValue
And where it lives: pivotIx
This time we need a way to do that splitting thing --- the program is really in two parts.
Algorithm & variables for splitting:

Start one index, topIx from the top of the array, move it til it finds a number smaller than pivotValue.
Start one index, botIx from the bottom of the array, move it til it finds a number larger than pivotValue.
Swap these two values (requires a temp variable.)
Terminate when the two Indecies cross, this is the new-found index location for the pivot value.

Actually need some more variables to keep track of the ends when we do this recursively.

Pseudo code main sort;


QuickSort(int sorta[], first, last)

int pivotIx, Split();

// when first >= last, we're through -- we've sorted the smallest possible sized bin
if  (first < last) {
    // This is the real work, and it returns the location for the pivot point
    pivotIx = Split(sorta, first, last);
    // Now the recursive calls
    QuickSort(sorta, first, pivotIx-1);
    QuickSort(sorta, pivotIx+1, last);
    } // if first < last -- all done!

Pseudo code split;


// must have type integer since returns pivotIx
int Split(int sorta[], first, last)

int topIx, botIx; // indecies
int pivotVal, temp; // items

pivotVal = sorta[first]; // start with the first number (arbitrarily: as good a guess as any)
botIx = first, topIx = last+1;  // start looking just past each end (see why below)
while (true) {
 	// look for a small number above the pivot (or the pivot)
	while (sorta[--topIx] > pivotVal AND topix > botix) {}; 
 	// now look for a big number below the pivot (or the pivot)
 	while (sorta[++botIx] < pivotVal AND topix > botix) {};
 	// if botix and topix have met, you're done
 	if (botIx  >= topIx) { break;} // stops the while (true) loop
 		
 	//otherwise, switch the two misaligned things you have found
        temp = sorta[topIx];
        sorta[topIx] = sorta[botIx];
	sorta[botIx] = temp;
 		
 	} // while forever (well, until the break)
//now we've also found the location for the pivot...
temp = sorta[topIx];  		// we know this is smaller than or equal to the pivot
sorta[topIx] = sorta[first];  	// so we swap it with the pivot
sorta[first] = temp;
return(topIx);

Coding quick sort is a great time to learn about using corner / edge cases for debugging

What happens if the first number is the smallest one?
what if the first number is the biggest one?

VI Extra Bonus Knowledge

This isn't in the course description, so you don't need to know it. But...
If you enjoy thinking about the above algorithms (if you are a geek like me), you might really enjoy those algorithms in parallel. Very mind bending.
For the record, here's split in lisp (how I tested the above). This just returns the array though (not the pivot point), so I could quickly check it was right. I have to admit, this isn't the kind of thing lisp does best.

(defun split (sorta first last)
  (let ((topix (+ last 1))
        (botix first)
        (pivotval (aref sorta first)))
    (catch 'done
      (loop 
            (setf topix (do ((tix (- topix 1) (- tix 1)))
                            ((or (>= botix tix) 
                                 (<= (aref sorta tix) pivotval))
                             tix)))
            (setf botix (do ((bix (+ botix 1) (+ bix 1)))
                            ((or (>= bix topix) 
                                 (>= (aref sorta bix) pivotval))
                             bix)))
            (if (>= botix topix) (throw 'done sorta))
            (let ((temp (aref sorta topix)))
              (setf (aref sorta topix) (aref sorta botix))
              (setf (aref sorta botix) temp))
            ))
    (let ((temp (aref sorta first)))
      (setf (aref sorta first) (aref sorta topix))
      (setf (aref sorta topix) temp))
      sorta ; should return topix not sorta to be part of qsort
    ))

page author: Joanna Bryson
5 February 2015