KMP Algorithm

KMP is essentially finding common prefix within the substring.

Create a next (index) array, for each character in substring, find the previous character that also share the same prefix as the current substring. In the process of finding substring, when we find a mismatch, we can use this next (index) array to jump back to the previous character that matches current substring without needing to traverse the whole substring again.

substring: `abeabc`
next: `[0,0,0,1,2,0]`

substring: `aaa`
next: `[0,1,2]`

substring: `abcabc`
next: `[0,0,0,1,2,3]`

E.g., in abcabc, substring [0:3] (abc) doesn’t share any prefix, when we find a mismatch, we always go back to 0 index. For index 3 (a), it shares the same prefix with substring [0:1] (a), so when we find a mismatch at index 4 (abcab), we jump back to next[4-1] = 1, because we know previous character is a and the first character is also a (by building the next array). For index 4 (b, substring[3:5], ab), it shares the same prefix with substring [0:2] (ab), so the value of next[4] is 2.

Java:

public static int kmp(String source, String sub) {
    int[] next = new int[sub.length()];
    for (int i = 1, j = 0; i < sub.length(); i ++) {
        while (j > 0 && sub.charAt(i) != sub.charAt(j)) {
            j = next[j - 1];
        }
        if (sub.charAt(i) == sub.charAt(j)) {
            j++;
        }
        next[i] = j;
    }

    for (int i = 0, j = 0; i < source.length(); i++) {
        while (j > 0 && source.charAt(i) != sub.charAt(j)) {
            j = next[j-1];
        }
        if (source.charAt(i) == sub.charAt(j)) {
            j++;
        }
        if (j == sub.length()) {
            return i - sub.length() + 1;
        }
    }
    return -1;
}

Go:

func Kmp(source string, sub string) int {
	sour := []rune(source)
	subr := []rune(sub)
	next := make([]int, len(subr))
	for i, j := 1, 0; i < len(subr); i++ {
		for j > 0 && subr[i] != subr[j] {
			j = next[j-1]
		}

		if subr[i] == subr[j] {
			j++
		}
		next[i] = j
	}

	for i, j := 0, 0; i < len(sour); i++ {
		for j > 0 && sour[i] != sour[j] {
			j = next[j-1]
		}
		if sour[i] == sour[j] {
			j++
		}
		if j == len(subr) {
			return i - len(subr) + 1
		}
	}
	return -1
}