Language edit distance, $(\\min,+)$-matrix multiplication & beyond
Presenter
November 6, 2017
Abstract
The language edit distance is a significant generalization of two basic problems in computer science: parsing and string edit distance computation. Given any context free grammar, it computes the minimum number of insertions, deletions and substitutions required to convert a given input string into a valid member of the language. In 1972, Aho and Peterson gave a dynamic programming algorithm that solves this problem in time cubic in the string length. Despite its vast number of applications, in forty years there has been no improvement over this running time.
Computing $(\min,+)$-product of two n by n matrices in truly subcubic time is an outstanding open question, as it is equivalent to the famous All-Pairs-Shortest-Paths problem. Even when matrices have entries bounded in $[1,n]$, obtaining a truly subcubic $(\min,+)$-product algorithm will be a major breakthrough in computer science.
In this presentation, I will explore the connection between these two problems which led us to develop the first truly subcubic algorithms for the following problems with improvements coming for each of these problems after several decades: (1) language edit distance, (2) RNA-folding-a basic computational biology problem and a special case of language edit distance computation, (3) stochastic grammar parsing—fundamental to natural language processing, and (4) $(\min,+)$-product of integer matrices with entries bounded in $n^{3-\omega-c}$ where $c > 0$ is any constant and, $\omega$ is the exponent of the fast matrix multiplication, widely believed to be 2.
Time permitting, we will also discuss developing highly efficient linear time approximation algorithms for language edit distance for important subclasses of context free grammars.