Rabin-Karp Algorithm in Java: A Step-by-Step Guide

In the world of text searching and pattern matching, the Rabin-Karp algorithm stands out for its elegant use of hashing to find patterns efficiently. This guide will walk you through the Rabin-Karp algorithm, explain its logic, and provide a complete Java implementation.

Introduction to Rabin-Karp Algorithm

Rabin-Karp is a string matching algorithm that uses hashing to find patterns in text. It’s particularly effective for detecting multiple patterns simultaneously and works well for patterns with similar lengths.

How It Works

The algorithm relies on hashing to compare the pattern with substrings of the text. Instead of comparing strings directly, it compares their hash values, which can be computed quickly.

Step-by-Step Execution of Rabin-Karp Algorithm

Let’s break down the Rabin-Karp algorithm into detailed steps:

1. Compute Initial Hashes

Compute the hash value for the pattern and the first substring of the text of the same length as the pattern.

2. Sliding Window Comparison

Slide the window over the text one character at a time, updating the hash value for the current window, and compare it with the pattern’s hash.

3. Check for Hash Collisions

If the hash values match, compare the actual substring with the pattern to confirm the match. Hash collisions may occur, so this step ensures accuracy.

4. Recompute Hash Efficiently

Use a rolling hash technique to efficiently update the hash value when the window slides.

Java Implementation of Rabin-Karp Algorithm

Here’s a complete Java program implementing the Rabin-Karp algorithm:

import java.util.ArrayList;
import java.util.List;

public class RabinKarp {
    // A large prime number for hashing
    private static final int prime = 101;

    public static List<Integer> rabinKarp(String text, String pattern) {
        List<Integer> result = new ArrayList<>();
        int patternLength = pattern.length();
        int textLength = text.length();

        // Compute the hash for the pattern and the first window of text
        long patternHash = createHash(pattern, patternLength);
        long textHash = createHash(text, patternLength);

        // Slide the window over the text
        for (int i = 0; i <= textLength - patternLength; i++) {
            // If the hash values match, check the actual substrings
            if (patternHash == textHash && checkEqual(text, pattern, i, i + patternLength - 1)) {
                result.add(i); // Record the starting index of the match
            }

            // Compute the hash for the next window
            if (i < textLength - patternLength) {
                textHash = recalculateHash(text, i, i + patternLength, textHash, patternLength);
            }
        }

        return result;
    }

    // Create the initial hash value for a string of given length
    private static long createHash(String str, int length) {
        long hash = 0;
        for (int i = 0; i < length; i++) {
            hash += str.charAt(i) * Math.pow(prime, i);
        }
        return hash;
    }

    // Recalculate the hash value by sliding the window
    private static long recalculateHash(String str, int oldIndex, int newIndex, long oldHash, int patternLength) {
        long newHash = oldHash - str.charAt(oldIndex);
        newHash /= prime;
        newHash += str.charAt(newIndex) * Math.pow(prime, patternLength - 1);
        return newHash;
    }

    // Check if the actual substrings are equal
    private static boolean checkEqual(String str1, String str2, int start1, int end1) {
        for (int i = start1, j = 0; i <= end1; i++, j++) {
            if (str1.charAt(i) != str2.charAt(j)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String text = "ababcabcabababd";
        String pattern = "ababd";
        List<Integer> positions = rabinKarp(text, pattern);

        System.out.println("Pattern found at positions: " + positions); // Output: [10]
    }
}

Detailed Explanation

1. `createHash` Function

This function computes the initial hash value for a given substring. It multiplies each character by a power of a prime number to spread out the hash values, reducing the chance of collisions.

private static long createHash(String str, int length) {
    long hash = 0;
    for (int i = 0; i < length; i++) {
        hash += str.charAt(i) * Math.pow(prime, i);
    }
    return hash;
}

2. `recalculateHash` Function

This function updates the hash value when the window slides by one character. It subtracts the contribution of the outgoing character, divides by the prime to shift the window, and adds the contribution of the incoming character.

private static long recalculateHash(String str, int oldIndex, int newIndex, long oldHash, int patternLength) {
    long newHash = oldHash - str.charAt(oldIndex);
    newHash /= prime;
    newHash += str.charAt(newIndex) * Math.pow(prime, patternLength - 1);
    return newHash;
}

3. `checkEqual` Function

This function verifies that the actual characters of the substring match the pattern when the hash values are equal. This check helps avoid false positives due to hash collisions.

private static boolean checkEqual(String str1, String str2, int start1, int end1) {
    for (int i = start1, j = 0; i <= end1; i++, j++) {
        if (str1.charAt(i) != str2.charAt(j)) {
            return false;
        }
    }
    return true;
}

4. `rabinKarp` Function

This function integrates the previous functions to perform the Rabin-Karp algorithm. It calculates the initial hashes, slides the window, and checks for matches.

public static List<Integer> rabinKarp(String text, String pattern) {
    List<Integer> result = new ArrayList<>();
    int patternLength = pattern.length();
    int textLength = text.length();
    long patternHash = createHash(pattern, patternLength);
    long textHash = createHash(text, patternLength);

    for (int i = 0; i <= textLength - patternLength; i++) {
        if (patternHash == textHash && checkEqual(text, pattern, i, i + patternLength - 1)) {
            result.add(i);
        }
        if (i < textLength - patternLength) {
            textHash = recalculateHash(text, i, i + patternLength, textHash, patternLength);
        }
    }

    return result;
}

Performance Considerations

Time Complexity

Best Case: ( O(n + m) ) (when there are few or no hash collisions)
Worst Case: ( O(n \cdot m) ) (when there are many hash collisions and all substrings need to be compared)

Space Complexity

The space complexity is ( O(1) ) in addition to the input strings since we only use a few extra variables for hash values and pointers.

Use Cases

Plagiarism Detection: Efficiently find repeated phrases.
Database Search: Quickly locate entries in large datasets.
Text Editors: Implement find-and-replace functionalities.

Conclusion

The Rabin-Karp algorithm is a powerful tool for pattern matching in strings, leveraging hashing to improve efficiency. While it shines in scenarios with multiple patterns or long texts, be mindful of hash collisions which can affect performance.

Try experimenting with different texts and patterns to see the Rabin-Karp algorithm in action. Happy coding!

Tags: #Java #Algorithm #RabinKarp #PatternMatching #Hashing