Calculate the Levenshtein distance between
two strings str1
and str2
.
In Ruby 1.8, str1
and str2
should be ASCII,
UTF-8, or a one-byte-per character encoding such as ISO-8859-*. They will
be treated as UTF-8 if $KCODE is set appropriately (i.e. 'u').
Otherwise, the comparison will be performed byte-by-byte. There is no
specific support for Shift-JIS or EUC strings.
In Ruby 1.9+, the strings will be processed as UTF-8.
When using Unicode text, be aware that this algorithm does not perform normalisation. If there is a possibility of different normalised forms being used, normalisation should be performed beforehand.
# File lib/text/levenshtein.rb, line 32 def distance(str1, str2) prepare = if "ruby".respond_to?(:encoding) lambda { |str| str.encode(Encoding::UTF_8).unpack("U*") } else rule = $KCODE.match(/^U/) ? "U*" : "C*" lambda { |str| str.unpack(rule) } end s, t = [str1, str2].map(&prepare) n = s.length m = t.length return m if n.zero? return n if m.zero? d = (0..m).to_a x = nil n.times do |i| e = i + 1 m.times do |j| cost = (s[i] == t[j]) ? 0 : 1 x = [ d[j+1] + 1, # insertion e + 1, # deletion d[j] + cost # substitution ].min d[j] = e e = x end d[m] = x end return x end