Module 8: Hash Tables

Lab


Abstract: We have seen hashing as a technique for obtaining expected constant-time operations on finding or inserting elements in a set. Hashing is also used to compute digital signatures for confirming the authenticity of messages.

In this assignment, you will apply a hashing scheme in pursuit of finding occurences of one string in another, using the Rabin–Karp algorithm.

That algorithm is discussed in Section 32.2 of our text, on page 990.


Introduction

Suppose we have two strings of interest:
target
is a relatively long string of length n.
query
is a relatively short string of length m.
We are interested in finding all occurrences of query in target. For example, consider the query aaba and the target aabaacaadaabaabaa. The last character of the query occurs within the target as shown below:
a a b a a c a a d a a b a a b a a
       |                 |     |
a a b a                  |     |
                  a a b a      |
                        a a b a
Notice that the second and third matches overlap. As discused in class, the most straightforward algorithm is as follows:
   for each i in the interval [0, n)
      match = true
      for each j in the interval [0, m)
          if target[i+j] ≠ query[j]
             match = false
          end if
      end for
      if match
         // definite match of query starting at index i of target
       end if
   end for
The worst-case complexity of this approach is Θ(mn).

Rabin–Karp Algorithm

Instead of trying to match the query at each location in the target, the Rabin–Karp Algorithm computes a rolling hash of a window of m characters of the target. Near the beginning of the target, where there may not be m characters in the window, the value 0 is used for those entries outside of the target.

We will use in particular the following hash function for a window of size m in a character array, with the first character of the window at index start:

private static int hashFor(char[] chars, int start, int m) {
   int ans = 0;
   for (int i=start; i < start+m; ++i) {
      int val = 0;
      //
      // Within the bounds of the chars array?
      //
      if (i ≥ 0 && i < chars.length) {}
         val = chars[i];                        // then use the character value there
      }
      ans = mod( mod(31*ans, 511) + mod(val, 511), 511 );
   }
   return ans;
}
The above function is similar to how Java computes .hashCode() for a String, except that we confine the results to being strictly less than 511 using mod (the % operator). The following identities related to mod should be useful in this assignment:
You should investigate how large the intermediate results of addition and multiplication can get before the final mod p is applied.

Applied to our running example, the rolling hash values at the end of each window within our target string are as follows:
a a b a a c a a d a a b a a b a a
97 38 254 306 214 430 337 153 73 368 92 224 306 214 429 306 214

The Rabin–Karp (RK) algorithm is essentially as follows:

Your assignment

You must complete the constructor and the nextCh method, subject to the constraints stated below.

In terms of our running example, the following calls after instantiation of a KR(4) object would return values as follows:

To receive full credit for this lab, your solution must:

Having trouble?

Post problems on piazza, but do not post code publicly there!

How to submit your work


Submitting your work (read carefully)



Last modified 09:46:24 CDT 22 August 2016