Abstract
We propose a method based on sparse representation to cluster data drawn from multiple low-dimensional linear or affine subspaces embedded in a high-dimensional space. Our method is based on the fact that each point in a union of subspaces has a sparse representation with respect to a dictionary formed by all other data points. In general, finding such a spare representation is NP hard. Our key contribution is to show that, under mild assumptions, the sparse representation can be obtained 'exactly' by using
1 optimization. The segmentation of the data is obtained by applying spectral clustering to a similarity matrix built from this sparse representation. Our method can be extended to handle noise, outliers as well as missing data by exploiting sparsity. Experiments on the Hopkins155 motion segmentation database and other motion sequences with outliers and missing data show that our approach significantly outperforms state-of-the-art methods.