Knuth Morris Pratt Algorithm

7 min read Oct 13, 2024
Knuth Morris Pratt Algorithm

The Knuth-Morris-Pratt (KMP) algorithm is a powerful string searching algorithm that efficiently finds occurrences of a pattern within a larger text. Unlike naive string search algorithms, KMP cleverly leverages information about the pattern itself to avoid unnecessary comparisons, resulting in significant performance gains.

How Does KMP Work?

Imagine searching for the pattern "AABA" within the text "AAAAABAAABA". A naive approach would compare characters one by one, leading to many redundant comparisons. KMP, however, preprocesses the pattern to create a prefix table, which stores the lengths of the longest proper prefixes of the pattern that are also suffixes.

Let's break down the prefix table creation for "AABA":

Index Character Longest Proper Prefix Suffix
0 A 0
1 A 1
2 B 0
3 A 1
  • At index 0, the longest proper prefix suffix is empty (length 0).
  • At index 1, the longest proper prefix suffix is "A" (length 1).
  • At index 2, there is no proper prefix suffix (length 0).
  • At index 3, the longest proper prefix suffix is "A" (length 1).

This table is crucial for efficient pattern matching.

The Matching Process

  1. Initialization: Start comparing the pattern and text from their beginning positions.
  2. Comparison: Iterate through the text, comparing characters with the pattern.
  3. Match: If a mismatch occurs, use the prefix table to shift the pattern instead of starting from the beginning.
  4. Shifting: The shift value is determined by the length of the longest proper prefix suffix of the pattern at the mismatch position.
  5. Continue: Repeat steps 2-4 until the end of the text is reached.

Example:

Text: "AAAAABAAABA" Pattern: "AABA"

  1. Initial Comparison: "AABA" matches "AAAA" at the beginning.
  2. Mismatch: Mismatch occurs at the fifth position in the text ("B" in the text, "A" in the pattern).
  3. Shifting: The prefix table for index 3 of the pattern (mismatch position) indicates a length of 1. So, shift the pattern one position to the right, aligning "A" (index 1 of the pattern) with the "B" (index 5 of the text).
  4. Continue: The process continues with comparison and shifting until a match is found or the end of the text is reached.

Advantages of KMP

  • Efficiency: KMP eliminates unnecessary comparisons, leading to a faster searching process, especially for patterns with repeating substrings.
  • Complexity: It has a time complexity of O(n), where n is the length of the text, making it significantly faster than the naive approach (O(m*n), where m is the length of the pattern).

Implementation

KMP can be implemented in various programming languages, including Python, Java, C++, and others.

Python Example:

def kmp_search(text, pattern):
  """
  Searches for a pattern in a text using the KMP algorithm.

  Args:
      text: The text to search in.
      pattern: The pattern to search for.

  Returns:
      A list of indices where the pattern is found in the text.
  """
  # Create the prefix table for the pattern
  prefix_table = build_prefix_table(pattern)

  # Initialize indices for the text and pattern
  i = 0  # Index for the text
  j = 0  # Index for the pattern

  # Search for the pattern in the text
  matches = []
  while i < len(text):
    if text[i] == pattern[j]:
      i += 1
      j += 1
      if j == len(pattern):
        matches.append(i - j)
        j = prefix_table[j - 1]
    else:
      if j != 0:
        j = prefix_table[j - 1]
      else:
        i += 1

  return matches

def build_prefix_table(pattern):
  """
  Builds the prefix table for a pattern.

  Args:
      pattern: The pattern for which to build the table.

  Returns:
      A list representing the prefix table.
  """
  prefix_table = [0] * len(pattern)
  k = 0
  for i in range(1, len(pattern)):
    while k > 0 and pattern[i] != pattern[k]:
      k = prefix_table[k - 1]
    if pattern[i] == pattern[k]:
      k += 1
    prefix_table[i] = k
  return prefix_table

# Example usage:
text = "AAAAABAAABA"
pattern = "AABA"
matches = kmp_search(text, pattern)
print(f"Pattern found at indices: {matches}")

Conclusion

The Knuth-Morris-Pratt (KMP) algorithm is a valuable tool for string searching, providing significant efficiency gains over naive approaches. By pre-processing the pattern and using a prefix table to guide comparisons, KMP reduces redundant steps, leading to faster search times. Understanding and implementing KMP can be beneficial for various applications involving string manipulation and pattern matching.

Featured Posts