CM10135 / Programming II:   Assignment 1 (2007)

Manipulating Memory:  Hash Tables to Sort & Search

Assignment Date:  Tuesday,  14 February  Due date:  Tuesday, 28 February at 4:00pm


This assignment is the first of three you will have in this course.  Your mark in Programming II is composed of 40% coursework, 60% exam.  This assignment is worth 10% of your mark.

Objectives:

This assignment is in two parts.  First, we will ask you to write a proper hash class to make certain you understand hashing (as well as programming in Java).  Then we will ask you to manipulate it into an improper hash class to make certain you understand what is going on with the computer's memory when your programs you've written are running.  We also are going to make very sure that you know how to properly format & document your code, including giving proper credit for any external help you get.

Please note that you should be certain your code runs outside of BlueJ (or eclipse) --- that is, on the command line.  We want to know that you can compile your code on your own, and that you know what the main function of your program is.  BlueJ is nice for helping you learn Java, but we need to take the training wheels off now & make sure you've really learned to ride your bicycle.

This assignment is expected to take the average student (one who got about 58 in Programming I) about 25 hours.  As mentioned in class, this is a double unit, so it is expected you spend about 20 hours a week on the course, including lectures, lecture preperation, labs and self study.  We expect you will spend all the self-study time in the next two weeks on this assignment (though note you will still have time for labs & lecture preparation.)   Of course, programming takes different people radically different amounts of time, so some students will need to spend considerably more or less than 25 hours on this assignment.

What to Sort & Search, and a note on plagiarism:

This assignment will build on some of the code you wrote for last week's tutorial.  In particular, you should reuse the objects that you created to be sorted, and also the code that generates arrays full of those objects.  If you have not completed last week's tutorial, you need to start off by completing instructions 3 & 4 of that tutorial.  If you have completed that tutorial but the code is in your partner's directory, either rewrite it or ask your partner to email it to you.  Note: Don't worry, we will not be marking the sorts you wrote last week.

An important note on plagiarism:  Plagiarism occurs any time that you use someone else's ideas without acknowledging them.  Plagiarism is much, much worse than asking for help.  If you get help on any part of your assignment you should acknowledge this in the comments.  This includes asking coursemates, tutors, or other staff questions, and of course any code or ideas you get from  books or the Internet.  Say exactly where your ideas came from, and exactly how much of the code you hand in is your own.  In particular, in this assignment, you should acknowledge your partner from last week for helping you to build your initial object, or else explain why you didn't use the code you wrote in tutorial.  We will be looking for this comment!


Phase I:  What to do

In the first phase, you will be writing a new class MyHash which will let you sort & search your objects you created last week.  You will also need to add a new driver class for your main routine.
  1. Write an informal specification of your hash & hashable object classes. 
    1. MySortableObject (from the objects you sorted in your tutorial last week) should include:
      1. The key attribute you are using for your sort (e.g. age if you used last week's defaults).  Only this time, make sure the maximum key size is exactly 35 (even if this means you have old children!).
      2. A method key that returns the key value.
    2. MyHash should include:
      1. An instance variable hashTable.
      2. A class method called hash which generates a hash-value for any integer key.  
        1. Your hash function should be:  hash(key) = key mod 22.  Make sure hashTable is the appropriate size for using this function!
        2. Be sure to implement some simple method for dealing with collisions!  Two possibilities were described in last week's lecture, I expect you will probably use the one described in the notes where that link points to.  
          1. Note: There's no reason to sort elements that are in a bucket together --- there should always be few enough that you can look through them in roughly constant time.  (Look at the analysis part of the lecture.)
          2. Note: Because you are generating your key values randomly, you may have two objects with the same key value!  Feel free to ignore this if you want; you can get a good pass if you store all the instances but only ever retrieve the first one you stored.  If you want to move towards distinction, feel free to think of a more clever way to deal with this, such as employing a second key, or rejecting key values that already exist in the hash, etc.
        3. A method called hashStore which puts an object into the hash table, using the hash function.
        4. A method called hashFind which takes a key and returns an object or null if no such object is in the table.
    3. The driver function should include:
      1. your main function,
      2. a method for generating an array of different lengths of objects with random key values (this is also from last week's tutorial), and
      3. it should accept a command line argument for how many objects you will be sorting.
        1. See below for "test your classes" for more details of what this will need to do.
  2. Write the code for your new class.
  3. Test your classes!
    1. You will be testing your function on arrays of size 14, 16, 18 & 20 of your class.
      1. Don't forget, you should be able to do this using a command line argument.  We will test your program with other values when we mark it!
      2. Sorry, I used the word "arrays" here in the ordinary language sense.  N, the number of objects is 14, 16, 18 & 20 on different test runs.  You don't actually have to put these in an array.  However, you do need to record what the objects are somehow in order to run and check the tests below (step 2.5).
    2. For each array size (really, value of N), do the following
      1. Generate the array.
      2. Generate an instance of MyHash to sort the instances.
      3. For each element in the array:
        1. print out the key value as part of the results,
        2. use hashStore on each object in the array.
      4. Now each instance of MySortableObject should also live in a single instance of MyHash.
      5. For each number from 1-10, check to see:
        1. Does hashFind correctly say whether the number is in the array?
        2. How many objects did hashFind look at before it gave you the right answer?  (you will have to make hashFind report this number to you.)
  4. Analyze your results:
    1. What was the average number of items that hashFind needed to check for each array size?
    2. How close did hashFind come to a constant-time search?
    3. Did the analysis of hash table statistics you learned in lecture predict your results?

Phase II:  What to do

For certain applications we want to know how many instances we have created of a particular class, and to be able to find those instances quickly.  This could be useful if (for example) each instance represents a connection to a user, and the users sometimes login & logout.  Being able to find the object respresenting logged-in users might be one way to let them communicate to each other, or to let a system operator communicate to particular users.  This could be useful for a large internet multi-player game, for example.

In Phase II you will create a new class, MyHashedClass, where a hash is used to keep track of each instance of your class.  This involves making a copy of your SortableObject class, and then sticking in some of the code from your MyHash class.  So it shouldn't take you too long as long as you understand what is going on.
  1. Create a class variable for MyHashedClass.
  2. Make sure hashStore & hashFind are class methods.
  3. Make certain that whenever you add a new instance of MyHashedClass, a call to hashStore is made.
  4. Be sure to write a main function for this class!  This class is its own program all itself.
  5. Be certain not to make any calls to any of the classes you were using in Phase I --- MyHashedClass should be able to stand on its own.
  6. Test your class on at least two of the array sizes you used before --- it should work just the same, except that you will be forced to start the program over to create a new hash.

What to hand in:

  1. Your informal description of your MySortableObject, MyHash & their algorithms.
  2. A printout of your fully-commented code for all four classes
  3. A short write up of the results of your tests, including some analysis about how well the different-sized arrays worked with your hash function. 
  4. A zip file containing uncompiled versions of your code.
Remember, marks are not given just because a program compiles or runs.  It should run robustly, and it should be well commented & well documented as specified above.

How you will be marked:


The marks allocation for this assignment is as follows:
  1. Informal Specification 10% 
  2. Fully commented, working programs implementing your design 70% 
  3. Results and analysis 20 % 
Total 100%

Threshold for Pass (40%+)
  1. Brief discussion of requirements and program design.
  2. Simple program. Basic level of commenting. Code layout is mostly correct.  Code does the minimum possible functionality (maybe even less if exceptionally well commented & analyzed); much running and testing must be done by hand.
  3. Minimum of results, but results are largely correct.  Some conditions not tested.
  4. Little or no analysis.
Good Pass (~55%)
  1. Specification covers all aspects of problem. Good program design. Algorithms are given at an appropriate level of detail and are well explained.
  2. Program is a good implementation of design, or possibly better than the design. Layout of code is correct and clear. Code is generally concise but comprehensible, identifiers well named.
  3. Results are correct, clearly described, and include good analysis. 
  4. Some discussion of the program and the development process you undertook, ideas for improvement. 
Distinction (70%+)
  1. Specification covers all aspects of problem. Algorithms are given at an appropriate level of detail and are well explained. 
  2. A sophisticated program showing good use of Java, but not overly complex. Commenting is appropriate and gives sufficient information. Layout of code is correct and clear. 
  3. Results are clear and presented in an interesting way. Results are sufficient to give scope for a strong analysis. 
  4. Excellent analysis of results, possibly considering alternative implementations.  Shows a thoughtful discussion of the program and the development process you undertook. Detailed and thoughtful criticism of your work, and well-thought-out ideas for improvement.

Marks Table:



Specific Criteria Marks
Out Of
Informal Specification

                       
10




Code
Formatting

10

Commenting

15

Correctness of hash

5

Correctness of hashStore, hashFind

15

Handling of collisions

10

Correct decomposition of 4 Classes

15




Results & Analysis Correct, well-formatted Results

10

Quality of Analysis

10




Penalties
e.g. no instructions,
doesn't compile,
difficult to test

Explanation:


(20)




Total





page author: Joanna Bryson
10 February 2007 (checked by AMB 9 Feb 07)
clarifications added in bright orange on 20 Feb.