Levenshtein Distance
The Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.
Definition
Mathematically, the Levenshtein distance between two strings
a and b (of length |a| and |b| respectively) is given by
where
where
is the indicator function equal to
0 when
and equal to 1 otherwise, and
is the distance between the first
i characters of a and the first
j characters of b.
Note that the first element in the minimum corresponds to
deletion (from a to b), the second to insertion and
the third to match or mismatch, depending on whether the
respective symbols are the same.
Example
For example, the Levenshtein distance between kitten and
sitting is 3, since the following three edits change one
into the other, and there is no way to do it with fewer than
three edits:
- kitten → sitten (substitution of "s" for "k")
- sitten → sittin (substitution of "i" for "e")
- sittin → sitting (insertion of "g" at the end).