In the world of text searching and pattern matching, the Rabin-Karp algorithm stands out for its elegant use of hashing to find patterns efficiently. This guide will walk you through the Rabin-Karp algorithm, explain its logic, and provide a complete Java implementation.
Introduction to Rabin-Karp Algorithm
Rabin-Karp is a string-matching algorithm that uses hashing to find occurrences of a pattern in a text. It is particularly useful when searching for several patterns at once, especially patterns of the same length, because a single rolling hash of the text can be checked against the hash of every pattern.
How It Works
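The algorithm relies on hashing to compare the pattern with substrings of the text. Instead of comparing strings character by character at every position, it compares their hash values: comparing two hashes takes constant time, and the hash of each new substring can be derived from the previous one in constant time.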
Step-by-Step Execution of Rabin-Karp Algorithm
Let’s break down the Rabin-Karp algorithm into detailed steps:
1. Compute Initial Hashes
Compute the hash value for the pattern and the first substring of the text of the same length as the pattern.
2. Sliding Window Comparison
Slide the window over the text one character at a time, updating the hash value for the current window, and compare it with the pattern’s hash.
3. Check for Hash Collisions
If the hash values match, compare the actual substring with the pattern to confirm the match. Hash collisions may occur, so this step ensures accuracy.
4. Recompute Hash Efficiently
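Use a rolling hash to update the hash value in constant time when the window slides: remove the contribution of the character leaving the window and add the contribution of the character entering it, rather than rehashing the whole window.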
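The four steps above can be traced on a tiny input. The sketch below is illustrative only: the class and method names (RollingHashTrace, directHash, roll) are made up for this example, and it uses exact long arithmetic, which is safe only for short windows like this one. It hashes the first window of "abcd", rolls it one position, and checks that the rolled value equals a hash computed from scratch.

```java
class RollingHashTrace {
    static final long BASE = 101; // same value as the article's `prime`

    // Direct hash: sum of s[start + i] * BASE^i for i in [0, len)
    static long directHash(String s, int start, int len) {
        long hash = 0;
        long power = 1;
        for (int i = 0; i < len; i++) {
            hash += s.charAt(start + i) * power;
            power *= BASE;
        }
        return hash;
    }

    // Rolling update: drop the outgoing character, shift every remaining
    // term down by one power of BASE, then add the incoming character
    // at the highest power (topPower = BASE^(len - 1)).
    static long roll(long oldHash, char outgoing, char incoming, long topPower) {
        return (oldHash - outgoing) / BASE + incoming * topPower;
    }

    public static void main(String[] args) {
        String text = "abcd";
        int len = 3;
        long topPower = BASE * BASE; // BASE^(len - 1) for len = 3

        long first = directHash(text, 0, len);                        // hash of "abc"
        long rolled = roll(first, text.charAt(0), text.charAt(3), topPower);

        System.out.println(first);                                    // 1019894
        System.out.println(rolled);                                   // 1030197
        System.out.println(rolled == directHash(text, 1, len));       // true
    }
}
```

The rolled value matches the hash of "bcd" computed directly, which is what lets the search skip rehashing every window.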
Java Implementation of Rabin-Karp Algorithm
Here’s a complete Java program implementing the Rabin-Karp algorithm:
import java.util.ArrayList;
import java.util.List;

public class RabinKarp {
    // A small prime used as the base of the polynomial hash
    private static final int prime = 101;

    public static List<Integer> rabinKarp(String text, String pattern) {
        List<Integer> result = new ArrayList<>();
        int patternLength = pattern.length();
        int textLength = text.length();
        // An empty pattern, or one longer than the text, is treated as "no match"
        if (patternLength == 0 || patternLength > textLength) {
            return result;
        }
        // Compute the hash for the pattern and the first window of text
        long patternHash = createHash(pattern, patternLength);
        long textHash = createHash(text, patternLength);
        // Slide the window over the text
        for (int i = 0; i <= textLength - patternLength; i++) {
            // If the hash values match, check the actual substrings
            if (patternHash == textHash && checkEqual(text, pattern, i, i + patternLength - 1)) {
                result.add(i); // Record the starting index of the match
            }
            // Compute the hash for the next window
            if (i < textLength - patternLength) {
                textHash = recalculateHash(text, i, i + patternLength, textHash, patternLength);
            }
        }
        return result;
    }

    // Create the initial hash value for the first `length` characters of a string.
    // Note: Math.pow returns a double, so the hash is exact only while it stays
    // below 2^53, which in practice limits this version to short patterns.
    private static long createHash(String str, int length) {
        long hash = 0;
        for (int i = 0; i < length; i++) {
            hash += str.charAt(i) * Math.pow(prime, i);
        }
        return hash;
    }

    // Recalculate the hash value by sliding the window one character to the right
    private static long recalculateHash(String str, int oldIndex, int newIndex, long oldHash, int patternLength) {
        long newHash = oldHash - str.charAt(oldIndex); // remove the outgoing character
        newHash /= prime;                              // shift remaining terms down one power
        newHash += str.charAt(newIndex) * Math.pow(prime, patternLength - 1); // add the incoming character
        return newHash;
    }

    // Check whether str1[start1..end1] equals str2 character by character
    private static boolean checkEqual(String str1, String str2, int start1, int end1) {
        for (int i = start1, j = 0; i <= end1; i++, j++) {
            if (str1.charAt(i) != str2.charAt(j)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String text = "ababcabcabababd";
        String pattern = "ababd";
        List<Integer> positions = rabinKarp(text, pattern);
        System.out.println("Pattern found at positions: " + positions); // prints "Pattern found at positions: [10]"
    }
}
Detailed Explanation
1. createHash Function
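This function computes the initial hash value for the first length characters of a string. It multiplies the character at index i by the i-th power of the prime, so the same characters in a different order produce different hashes, which reduces the chance of collisions. Because Math.pow returns a double, the result is exact only while the sum stays below 2^53, so this version is reliable for short patterns only.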
private static long createHash(String str, int length) {
    long hash = 0;
    for (int i = 0; i < length; i++) {
        hash += str.charAt(i) * Math.pow(prime, i);
    }
    return hash;
}
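Written as a formula, the hash computed by createHash looks as follows. The symbols are notation introduced here for illustration, not taken from the code: s_i is the character code at index i, m is the window length, and p is the prime (101). The worked line applies it to the pattern "ababd" from the example program, using the ASCII codes a = 97, b = 98, d = 100.

```latex
\begin{align*}
H(s) &= \sum_{i=0}^{m-1} s_i \cdot p^{\,i}, \qquad p = 101 \\
H(\texttt{ababd}) &= 97 \cdot 101^0 + 98 \cdot 101^1 + 97 \cdot 101^2 + 98 \cdot 101^3 + 100 \cdot 101^4 \\
                  &= 97 + 9898 + 989497 + 100969498 + 10406040100 \\
                  &= 10508009090
\end{align*}
```

Even a five-character pattern produces a ten-digit hash, which shows how quickly the values grow with pattern length.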
2. recalculateHash Function
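This function updates the hash value when the window slides by one character. It subtracts the contribution of the outgoing character, divides by the prime to shift every remaining term down by one power, and adds the contribution of the incoming character at the highest power. The division is exact because, once the outgoing character is subtracted, every remaining term is a multiple of the prime.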
private static long recalculateHash(String str, int oldIndex, int newIndex, long oldHash, int patternLength) {
    long newHash = oldHash - str.charAt(oldIndex);
    newHash /= prime;
    newHash += str.charAt(newIndex) * Math.pow(prime, patternLength - 1);
    return newHash;
}
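Because the version above grows the hash without bound and relies on Math.pow, it loses precision on longer patterns. A common alternative, sketched below, keeps every value bounded by taking remainders modulo a large prime and weights the first character of the window with the highest power, so no division is needed. This is a different hashing scheme from the one in the article, not a drop-in patch: the class name ModularRabinKarp, the base 256, and the modulus 1,000,000,007 are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ModularRabinKarp {
    // Assumed parameters for this sketch
    private static final long BASE = 256;
    private static final long MOD = 1_000_000_007L;

    static List<Integer> search(String text, String pattern) {
        List<Integer> result = new ArrayList<>();
        int m = pattern.length();
        int n = text.length();
        if (m == 0 || m > n) {
            return result; // nothing to match
        }

        // highPower = BASE^(m - 1) mod MOD: the weight of the window's first character
        long highPower = 1;
        for (int i = 1; i < m; i++) {
            highPower = (highPower * BASE) % MOD;
        }

        // Hash the pattern and the first window with the same recurrence
        long patternHash = 0;
        long windowHash = 0;
        for (int i = 0; i < m; i++) {
            patternHash = (patternHash * BASE + pattern.charAt(i)) % MOD;
            windowHash = (windowHash * BASE + text.charAt(i)) % MOD;
        }

        for (int i = 0; i <= n - m; i++) {
            // Equal hashes are only a hint; confirm with a character comparison
            if (patternHash == windowHash && text.regionMatches(i, pattern, 0, m)) {
                result.add(i);
            }
            if (i < n - m) {
                // Remove the outgoing character (adding MOD keeps the value non-negative),
                // shift the window by one power of BASE, then append the incoming character
                long withoutFirst = (windowHash - text.charAt(i) * highPower % MOD + MOD) % MOD;
                windowHash = (withoutFirst * BASE + text.charAt(i + m)) % MOD;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(search("ababcabcabababd", "ababd")); // [10]
    }
}
```

Every intermediate value stays far below the range of long, so the hash remains exact for patterns of any length; the price is that distinct windows can share a hash, which is why the character comparison is still required.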
3. checkEqual Function
This function verifies that the actual characters of the substring match the pattern when the hash values are equal. This check helps avoid false positives due to hash collisions.
private static boolean checkEqual(String str1, String str2, int start1, int end1) {
    for (int i = start1, j = 0; i <= end1; i++, j++) {
        if (str1.charAt(i) != str2.charAt(j)) {
            return false;
        }
    }
    return true;
}
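The same verification can also be delegated to the JDK. The sketch below uses String.regionMatches, which compares a region of one string against a region of another and returns false, rather than throwing, when the region falls outside either string. The class and method names (RegionMatchDemo, windowEquals) are hypothetical.

```java
class RegionMatchDemo {
    // Returns true when the window of `text` starting at `start`
    // equals `pattern` character for character.
    static boolean windowEquals(String text, String pattern, int start) {
        return text.regionMatches(start, pattern, 0, pattern.length());
    }

    public static void main(String[] args) {
        String text = "ababcabcabababd";
        String pattern = "ababd";
        System.out.println(windowEquals(text, pattern, 10)); // true
        System.out.println(windowEquals(text, pattern, 0));  // false ("ababc" differs in its last character)
        System.out.println(windowEquals(text, pattern, 12)); // false (window runs past the end of the text)
    }
}
```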
4. rabinKarp Function
This function integrates the previous functions to perform the Rabin-Karp algorithm. It calculates the initial hashes, slides the window, and checks for matches.
public static List<Integer> rabinKarp(String text, String pattern) {
    List<Integer> result = new ArrayList<>();
    int patternLength = pattern.length();
    int textLength = text.length();
    // An empty pattern, or one longer than the text, is treated as "no match"
    if (patternLength == 0 || patternLength > textLength) {
        return result;
    }
    long patternHash = createHash(pattern, patternLength);
    long textHash = createHash(text, patternLength);
    for (int i = 0; i <= textLength - patternLength; i++) {
        if (patternHash == textHash && checkEqual(text, pattern, i, i + patternLength - 1)) {
            result.add(i);
        }
        if (i < textLength - patternLength) {
            textHash = recalculateHash(text, i, i + patternLength, textHash, patternLength);
        }
    }
    return result;
}
Performance Considerations
Time Complexity
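- Best and Average Case: O(n + m), where n is the text length and m is the pattern length (when hash matches are rare, so few windows need a character-by-character check)
- Worst Case: O(n · m) (when most windows share the pattern's hash, whether through collisions or genuine matches, so nearly every window is compared character by character)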
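The gap between the two cases can be made visible by counting the character comparisons spent on verification. The sketch below is a rough illustration rather than a benchmark: the class name WorstCaseDemo and its hash parameters (base 256, modulus 1,000,000,007) are assumptions, and String.repeat requires Java 11 or later. With a text made only of the letter a, a pattern of a's matches at every window and forces a full comparison each time, while a pattern of b's never matches a hash and costs no comparisons at all.

```java
class WorstCaseDemo {
    // Assumed parameters for this sketch
    private static final long BASE = 256;
    private static final long MOD = 1_000_000_007L;

    // Returns how many character comparisons are spent verifying hash matches
    static long countVerificationComparisons(String text, String pattern) {
        int n = text.length();
        int m = pattern.length();
        if (m == 0 || m > n) {
            return 0;
        }
        long highPower = 1; // BASE^(m - 1) mod MOD
        for (int i = 1; i < m; i++) {
            highPower = (highPower * BASE) % MOD;
        }
        long patternHash = 0;
        long windowHash = 0;
        for (int i = 0; i < m; i++) {
            patternHash = (patternHash * BASE + pattern.charAt(i)) % MOD;
            windowHash = (windowHash * BASE + text.charAt(i)) % MOD;
        }
        long comparisons = 0;
        for (int i = 0; i <= n - m; i++) {
            if (patternHash == windowHash) {
                // Verification step: stop at the first mismatch
                for (int j = 0; j < m; j++) {
                    comparisons++;
                    if (text.charAt(i + j) != pattern.charAt(j)) {
                        break;
                    }
                }
            }
            if (i < n - m) {
                long withoutFirst = (windowHash - text.charAt(i) * highPower % MOD + MOD) % MOD;
                windowHash = (withoutFirst * BASE + text.charAt(i + m)) % MOD;
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        String text = "a".repeat(1000);
        // Every one of the 901 windows matches, so each costs 100 comparisons
        System.out.println(countVerificationComparisons(text, "a".repeat(100))); // 90100
        // No window shares the pattern's hash, so verification never runs
        System.out.println(countVerificationComparisons(text, "b".repeat(100))); // 0
    }
}
```

The first call performs (n - m + 1) · m comparisons, which is the O(n · m) behaviour described above; the second does only the O(n + m) hashing work.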
Space Complexity
- The algorithm uses O(1) extra space beyond the input strings and the list of match positions, since it keeps only a few variables for hash values and indices.
Use Cases
- Plagiarism Detection: Efficiently find repeated phrases.
- Database Search: Quickly locate entries in large datasets.
- Text Editors: Implement find-and-replace functionalities.
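For use cases such as plagiarism detection, where many phrases are looked up in one text, the search can be extended to several patterns at once. The sketch below is one possible way to do it, under stated assumptions: all patterns must have the same length, and the class name MultiPatternSearch, the base 256, and the modulus 1,000,000,007 are illustrative choices. Pattern hashes are stored in a map, so each window of the text needs one hash lookup regardless of how many patterns there are.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MultiPatternSearch {
    // Assumed parameters for this sketch
    private static final long BASE = 256;
    private static final long MOD = 1_000_000_007L;

    // Hash of the first `len` characters of `s`
    private static long hash(String s, int len) {
        long h = 0;
        for (int i = 0; i < len; i++) {
            h = (h * BASE + s.charAt(i)) % MOD;
        }
        return h;
    }

    // Maps each pattern that occurs in `text` to its start positions.
    // All patterns must share one length.
    static Map<String, List<Integer>> search(String text, List<String> patterns) {
        Map<String, List<Integer>> found = new TreeMap<>();
        if (patterns.isEmpty()) {
            return found;
        }
        int m = patterns.get(0).length();
        int n = text.length();
        if (m == 0 || m > n) {
            return found;
        }

        // Group patterns by hash; patterns that collide share a bucket
        Map<Long, List<String>> byHash = new HashMap<>();
        for (String p : patterns) {
            if (p.length() != m) {
                throw new IllegalArgumentException("patterns must have equal length");
            }
            List<String> bucket = byHash.computeIfAbsent(hash(p, m), k -> new ArrayList<>());
            if (!bucket.contains(p)) {
                bucket.add(p); // ignore duplicate patterns
            }
        }

        long highPower = 1; // BASE^(m - 1) mod MOD
        for (int i = 1; i < m; i++) {
            highPower = (highPower * BASE) % MOD;
        }

        long windowHash = hash(text, m);
        for (int i = 0; i <= n - m; i++) {
            // One lookup per window, then verify each candidate in the bucket
            for (String p : byHash.getOrDefault(windowHash, List.of())) {
                if (text.regionMatches(i, p, 0, m)) {
                    found.computeIfAbsent(p, k -> new ArrayList<>()).add(i);
                }
            }
            if (i < n - m) {
                long withoutFirst = (windowHash - text.charAt(i) * highPower % MOD + MOD) % MOD;
                windowHash = (withoutFirst * BASE + text.charAt(i + m)) % MOD;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> patterns = List.of("cat", "mat", "dog");
        System.out.println(search("the cat sat on the mat", patterns)); // {cat=[4], mat=[19]}
    }
}
```

Patterns of different lengths would need one pass per distinct length, since the rolling hash is tied to a fixed window size.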
Conclusion
The Rabin-Karp algorithm is a powerful tool for pattern matching in strings, leveraging hashing to improve efficiency. While it shines in scenarios with multiple patterns or long texts, be mindful of hash collisions, which can push the running time toward the O(n · m) worst case.
Try experimenting with different texts and patterns to see the Rabin-Karp algorithm in action. Happy coding!
Tags: #Java #Algorithm #RabinKarp #PatternMatching #Hashing