The Boyer-Moore algorithm is a widely used string matching algorithm that offers efficient searching by skipping sections of the text where the pattern cannot possibly match. This blog post will explain the Boyer-Moore algorithm, its core heuristics, and provide a detailed Java implementation.
Introduction to Boyer-Moore Algorithm
The Boyer-Moore algorithm is designed to search for a pattern within a text by utilizing two main heuristics: Bad Character Rule and Good Suffix Rule. These heuristics enable the algorithm to skip over parts of the text, leading to potential improvements in performance compared to naive methods.
How It Works
- Bad Character Rule: If a mismatch occurs, the algorithm shifts the pattern to align the next occurrence of the mismatched character in the pattern with the text.
- Good Suffix Rule: If a mismatch occurs, the algorithm uses the longest suffix of the matched portion to determine the shift distance.
Step-by-Step Execution of Boyer-Moore Algorithm
1. Build the Bad Character Table
This table indicates how far to shift the pattern when a mismatch occurs based on the mismatched character.
2. Search Using Heuristics
Use the Bad Character Rule and Good Suffix Rule to shift the pattern over the text efficiently, skipping sections where the pattern cannot match.
3. Match Verification
If the characters match, continue to verify the remaining characters of the pattern.
Java Implementation of Boyer-Moore Algorithm
Here’s a complete Java program implementing the Boyer-Moore algorithm:
import java.util.ArrayList;
import java.util.List;
public class BoyerMoore {
public static List<Integer> boyerMoore(String text, String pattern) {
List<Integer> result = new ArrayList<>();
int[] badCharTable = buildBadCharTable(pattern);
int textLength = text.length();
int patternLength = pattern.length();
int shift = 0;
while (shift <= (textLength - patternLength)) {
int j = patternLength - 1;
// Move backwards through the pattern and text
while (j >= 0 && pattern.charAt(j) == text.charAt(shift + j)) {
j--;
}
// If the pattern is present at current shift, record the position
if (j < 0) {
result.add(shift);
shift += (shift + patternLength < textLength) ? patternLength - badCharTable[text.charAt(shift + patternLength)] : 1;
} else {
// Shift the pattern
shift += Math.max(1, j - badCharTable[text.charAt(shift + j)]);
}
}
return result;
}
// Build the bad character table
private static int[] buildBadCharTable(String pattern) {
int[] table = new int[256]; // ASCII character set size
int patternLength = pattern.length();
for (int i = 0; i < 256; i++) {
table[i] = -1; // Initialize all occurrences as -1
}
for (int i = 0; i < patternLength; i++) {
table[pattern.charAt(i)] = i; // Fill in the actual last occurrence
}
return table;
}
public static void main(String[] args) {
String text = "ababcabcabababd";
String pattern = "ababd";
List<Integer> positions = boyerMoore(text, pattern);
System.out.println("Pattern found at positions: " + positions); // Output: [10]
}
}
Detailed Explanation
1. buildBadCharTable
Function
This function creates the bad character table, which stores the last occurrence index of each character in the pattern. If a mismatch occurs, this table helps decide how far to shift the pattern.
private static int[] buildBadCharTable(String pattern) {
int[] table = new int[256]; // ASCII character set size
int patternLength = pattern.length();
for (int i = 0; i < 256; i++) {
table[i] = -1; // Initialize all occurrences as -1
}
for (int i = 0; i < patternLength; i++) {
table[pattern.charAt(i)] = i; // Fill in the actual last occurrence
}
return table;
}
Explanation
- Initialization: Set all entries to
-1
indicating no occurrence. - Filling: For each character in the pattern, store its last occurrence index in the table.
2. boyerMoore
Function
This function uses the bad character table and searches the text for occurrences of the pattern.
public static List<Integer> boyerMoore(String text, String pattern) {
List<Integer> result = new ArrayList<>();
int[] badCharTable = buildBadCharTable(pattern);
int textLength = text.length();
int patternLength = pattern.length();
int shift = 0;
while (shift <= (textLength - patternLength)) {
int j = patternLength - 1;
// Move backwards through the pattern and text
while (j >= 0 && pattern.charAt(j) == text.charAt(shift + j)) {
j--;
}
// If the pattern is present at current shift, record the position
if (j < 0) {
result.add(shift);
shift += (shift + patternLength < textLength) ? patternLength - badCharTable[text.charAt(shift + patternLength)] : 1;
} else {
// Shift the pattern
shift += Math.max(1, j - badCharTable[text.charAt(shift + j)]);
}
}
return result;
}
Explanation
- Pattern Matching: Start from the rightmost character of the pattern and compare with the text.
- Match Found: If the pattern matches the substring, record the index.
- Mismatch Handling: If a mismatch occurs, use the bad character table to shift the pattern accordingly.
Performance Considerations
Time Complexity
- Best Case: ( O(n/m) ) when characters mismatch frequently and the bad character rule leads to large shifts.
- Worst Case: ( O(n \cdot m) ) when the text has many repeated patterns leading to minimal shifts.
Space Complexity
- Space Complexity: ( O(m + \sigma) ) where ( \sigma ) is the size of the alphabet (256 for ASCII).
Use Cases
- Text Editors: Efficient search functionalities.
- Network Security: Fast pattern matching in network packets.
- Bioinformatics: Identifying patterns in DNA sequences.
Conclusion
The Boyer-Moore algorithm provides a powerful and efficient method for pattern matching by leveraging two heuristics to skip unnecessary comparisons. Its ability to perform well on large texts with patterns makes it a valuable tool in various applications.
Experiment with different patterns and texts to see how the Boyer-Moore algorithm performs. Happy coding!
Tags: #Java #Algorithm #BoyerMoore #PatternMatching #StringSearch