String Searching Algorithms
Just a Collection of String Searching Algorithms
The "Naive" Method
A very common algorithm that checks each element of the array in turn for a match.
A brute-force substring search algorithm checks every possible starting position.
For a text of length n and a pattern of length m, the brute-force algorithm checks, at every position in the text between 0 and n-m, whether an occurrence of the pattern starts there. After each attempt, it shifts the pattern exactly one position to the right.
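The description above can be sketched in Python roughly as follows. This is a minimal illustration, not a tuned implementation; the function name `naive_search` and the choice to return -1 when nothing is found are assumptions made for this example.

```python
def naive_search(text: str, pattern: str) -> int:
    """Return the first index at which pattern occurs in text, or -1."""
    n, m = len(text), len(pattern)
    # Try every starting position from 0 to n - m inclusive.
    for start in range(n - m + 1):
        j = 0
        # Compare the pattern against the text, character by character.
        while j < m and text[start + j] == pattern[j]:
            j += 1
        if j == m:  # every character of the pattern matched
            return start
        # Otherwise the loop shifts the pattern one position to the right.
    return -1
```

In the worst case this performs on the order of n*m character comparisons, which is what the other algorithms in this collection try to avoid.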
The Rabin–Karp algorithm is a string searching algorithm created by Michael O. Rabin and Richard M. Karp in 1987 that uses hashing to find any one of a set of pattern strings in a text.
H = s[0] * B^(m - 1) + s[1] * B^(m - 2) + … + s[m - 2] * B^1 + s[m - 1] * B^0
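One way to use this polynomial hash for a single pattern is sketched below. It is an illustration under stated assumptions rather than the canonical version: the base B = 256, the modulus 1,000,000,007, the function name `rabin_karp`, and the -1 "not found" value are all choices made for this example. The hash is kept modulo a prime so the numbers stay small, and each window's hash is derived from the previous one instead of being recomputed from scratch.

```python
def rabin_karp(text: str, pattern: str,
               base: int = 256, mod: int = 1_000_000_007) -> int:
    """Return the first index of pattern in text, or -1 (single-pattern sketch)."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1
    high = pow(base, m - 1, mod)  # B^(m-1): weight of the leading character
    p_hash = w_hash = 0
    # Horner's rule evaluates s[0]*B^(m-1) + ... + s[m-1]*B^0 (mod `mod`).
    for k in range(m):
        p_hash = (p_hash * base + ord(pattern[k])) % mod
        w_hash = (w_hash * base + ord(text[k])) % mod
    for start in range(n - m + 1):
        # Equal hashes can be a collision, so confirm with a direct comparison.
        if w_hash == p_hash and text[start:start + m] == pattern:
            return start
        if start + m < n:
            # Roll the window: drop text[start], append text[start + m].
            w_hash = ((w_hash - ord(text[start]) * high) * base
                      + ord(text[start + m])) % mod
    return -1
```

Searching for a set of patterns, as the description mentions, would typically store the pattern hashes in a set and look each window hash up in it; that extension is not shown here.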
The Knuth–Morris–Pratt string searching algorithm (or KMP algorithm) searches for occurrences of a "word" W within a main "text string" S by employing the observation that when a mismatch occurs, the word itself embodies sufficient information to determine where the next match could begin, thus bypassing re-examination of previously matched characters.
algorithm kmp_search:
    input:
        an array of characters, S (the text to be searched)
        an array of characters, W (the word sought)
    output:
        an integer (the zero-based position in S at which W is found)

    define variables:
        an integer, m ← 0 (the beginning of the current match in S)
        an integer, i ← 0 (the position of the current character in W)
        an array of integers, T (the table, computed elsewhere)

    while m + i is less than the length of S, do:
        if W[i] = S[m + i],
            if i equals the (length of W) - 1,
                return m
            let i ← i + 1
        otherwise,
            let m ← m + i - T[i],
            if T[i] is greater than -1,
                let i ← T[i]
            otherwise,
                let i ← 0

    (if we reach here, we have searched all of S unsuccessfully)
    return the length of S
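A possible Python rendering of the pseudocode above is sketched here. The pseudocode says the table T is "computed elsewhere", so the `kmp_table` helper is an assumption: it follows one common formulation in which T[0] = -1 and T[i] is the length of the longest proper prefix of W that is also a suffix of W[0..i-1]. Returning 0 for an empty word is also an assumption of this sketch.

```python
def kmp_table(word: str) -> list[int]:
    """Build the partial-match table T (assumed formulation, T[0] = -1)."""
    table = [0] * len(word)
    if not word:
        return table
    table[0] = -1
    pos, cnd = 2, 0  # pos: entry being filled; cnd: current candidate prefix length
    while pos < len(word):
        if word[pos - 1] == word[cnd]:
            cnd += 1
            table[pos] = cnd
            pos += 1
        elif cnd > 0:
            cnd = table[cnd]  # fall back to a shorter candidate prefix
        else:
            table[pos] = 0
            pos += 1
    return table


def kmp_search(text: str, word: str) -> int:
    """Return the zero-based index of word in text, or len(text) if absent."""
    if not word:
        return 0
    table = kmp_table(word)
    m = i = 0  # m: start of the current match in text; i: position in word
    while m + i < len(text):
        if word[i] == text[m + i]:
            if i == len(word) - 1:
                return m
            i += 1
        else:
            # Skip ahead using what the table says about the matched prefix.
            m = m + i - table[i]
            i = table[i] if table[i] > -1 else 0
    return len(text)  # searched all of text unsuccessfully
```

As in the pseudocode, an unsuccessful search returns the length of the text rather than -1, so callers compare the result against `len(text)` to detect a miss.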
TopCoder's Algorithm Tutorials