## Edit Distance

The edit distance of two strings commonly refers to the Levenshtein Distance of two strings- the amount of work it takes to transform one string to the other by either inserting, deleting, or changing one character at a time. For example, changing sitting -> kitten takes 3

moves: sitting -> kitting, kitting -> kittin, and finally kittin -> kitten. The algorithm to solve this in reasonable time takes dynamic programming, and has many applications such as word checking, and typing speed tests. Algorithm below:

public static int editDistance(String s, String t){

int m=s.length();

int n=t.length();

int[][]d=new int[m+1][n+1];

for(int i=0;i<=m;i++){

d[i][0]=i;

}

for(int j=0;j<=n;j++){

d[0][j]=j;

}

for(int j=1;j<=n;j++){

for(int i=1;i<=m;i++){

if(s.charAt(i-1)==t.charAt(j-1)){

d[i][j]=d[i-1][j-1];

}

else{

d[i][j]=min((d[i-1][j]+1),(d[i][j-1]+1),(d[i-1][j-1]+1));

}

}

}

return(d[m][n]);

}

public static int min(int a,int b,int c){

return(Math.min(Math.min(a,b),c));

}

It is important to note that there are only three possibilities one string can get to another: either deleting, inserting, or changing. Thus, if we let L(i,j) indicate the cost it takes from changing i to j, if s and t are the original strings, the cost is:

Math.min(1+L(i-1,j-1),1+L(i,j-1),1+L(i-1,j)) unless the last characters of the strings are the same, then the minimum is L(i-1,j-1). This solves the problem, but note that it is very slow because of all the recursive calls, which repeat many similiar problems. Thus, we use dynamic programming: Construct a matrix d[m][n] with sizes s.length() and t.length(). instead of recursive calls, we will use a matrix lookup which will store the smallest current cost to change a smaller i into j. Thus, with he java algorithm code for edit distance above, the speed is O(mn) where m and n are the sizes of the strings.

moves: sitting -> kitting, kitting -> kittin, and finally kittin -> kitten. The algorithm to solve this in reasonable time takes dynamic programming, and has many applications such as word checking, and typing speed tests. Algorithm below:

public static int editDistance(String s, String t){

int m=s.length();

int n=t.length();

int[][]d=new int[m+1][n+1];

for(int i=0;i<=m;i++){

d[i][0]=i;

}

for(int j=0;j<=n;j++){

d[0][j]=j;

}

for(int j=1;j<=n;j++){

for(int i=1;i<=m;i++){

if(s.charAt(i-1)==t.charAt(j-1)){

d[i][j]=d[i-1][j-1];

}

else{

d[i][j]=min((d[i-1][j]+1),(d[i][j-1]+1),(d[i-1][j-1]+1));

}

}

}

return(d[m][n]);

}

public static int min(int a,int b,int c){

return(Math.min(Math.min(a,b),c));

}

It is important to note that there are only three possibilities one string can get to another: either deleting, inserting, or changing. Thus, if we let L(i,j) indicate the cost it takes from changing i to j, if s and t are the original strings, the cost is:

Math.min(1+L(i-1,j-1),1+L(i,j-1),1+L(i-1,j)) unless the last characters of the strings are the same, then the minimum is L(i-1,j-1). This solves the problem, but note that it is very slow because of all the recursive calls, which repeat many similiar problems. Thus, we use dynamic programming: Construct a matrix d[m][n] with sizes s.length() and t.length(). instead of recursive calls, we will use a matrix lookup which will store the smallest current cost to change a smaller i into j. Thus, with he java algorithm code for edit distance above, the speed is O(mn) where m and n are the sizes of the strings.

Provided by website-hit-counters.com site. |