KMP Algorithm

Goal

If P[1: q] is matched with T[s+1: s+q] but P[q+1] != T[s+q+1],
then we want to find a smallest s' > s such that with some q' < q
we have P[1: q'] matched with T[s'+1: s'+q'].

KMP mismatch
KMP shift

in other words

Find the longest prefix of P[1: q]
that is also a suffix of T[1: s+q], if P[1: q] is matched with T[s+1: s+q].

Prefix Function

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
vector<int> prefix_func(string &s){
vector<int> pi(s.size());// pi[i] = the length of the longest proper prefix of s[0:i] which is also a suffix of s[0:i]
pi[0] = -1;
for(int i = 1, prev = -1; i < s.size(); i++){
while(prev >= 0 && s[prev+1] != s[i]){
prev = pi[prev];// If there is a mismatch, we fall back to the next longest prefix-suffix match using `pi[prev]`. <important!>
}
if(s[prev+1] == s[i]){
// If we have a match, we can extend the current prefix-suffix match by one character.
prev++;
pi[i] = prev;
}
else{
pi[i] = -1;
}
}
}
return pi;

Complexity

  • Time: O(m), m = length of pattern
    Q: Why not O(m^2)?
    A: for each prev, it can only be incremented once every iteration. prev can be decremented at most m times during m iteration of the for loop, since the increment operation happens at most m times.

Matching

1
2
3
4
5
6
7
8
9
10
11
12
13
14
void kmp(string &text, string &pattern){
vector<int> pi = prefix_func(pattern);
int t = 0, p = -1;
while(t < n){
if(text[t] == pattern[p + 1]){
t++, p++;
if(p == m - 1) p = lps[p];// found a match, continue to search for the next match
}
else{
if(p >= 0) p = lps[p];
else t++;
}
}
}

Credits

  • images from slides of DSA course by Prof. Michael Tsai, NTU CSIE