Wednesday, September 26, 2012

Sparse Representation, Compressed Sensing, Single Measurement Vector, Multiple Measurement Vector: connections and differences

Purpose of this blog entry

Sparse representation (SR), compressed sensing (CS), single measurement vector (SMV), and multiple measurement vector (MMV): I was once confused by these terms. They seem to be the same, yet the ways they are used to describe problems differ. In this blog, I aim to clarify the relationships among them and the development of these theories.

The Big Picture

To my understanding, the following figure describes the relationship among them.
To see the inherent relationship, I'd like to break CS into two kinds: 1st generation CS (CS1G) and 2nd generation CS (CS2G). The difference will be discussed later. Nevertheless, this notation is not used in the literature, so please do not be confused when reading papers.
In the literature, CS usually refers to CS2G, and SMV refers to CS1G.
The notion of SMV is first mentioned in [10]. (And in [11], "one measurement vector" is used to describe the SMV problem.)
We will return to the big picture at the end of this blog.

SMV: SR and CS1G

The SMV model basically presents the following problem: $y = \Phi x$
  •  sparse representation: $y$ is viewed as a real-world signal, $\Phi$ is called an over-complete dictionary, and $x$ is the sparse representation
    •  we want to find a representation of the signal over a dictionary, and we want the representation to be as sparse as possible
    •  $\Phi \in R^{M \times N}$: if M < N, the dictionary is over-complete; if M = N, the dictionary is complete; if M > N, the dictionary is under-complete. "Completeness" is viewed from the perspective of the signal $y$. Simply speaking, if the length of the representation is larger than that of the signal, then the signal is "over-represented", and thus the dictionary is "over-complete".
    •  2D DCT, DFT, etc., are sparse representations of the signal using a complete dictionary
    •  In this blog, we consider sparse representation of the signal using an over-complete dictionary
  • 1st generation compressed sensing: $x$ is a sparse signal in itself, i.e., there are only a few non-zero entries in $x$; $\Phi$ is called the measurement matrix, and $y$ is called the measurements
    • we want to sample a sparse signal using fewer measurements than its length, in other words, compressing the signal
    • here we only require $\Phi$ to be over-complete in terms of $y$
    • note that "$x$ is a sparse signal in itself" is the identity of CS1G. For CS2G, as we will discuss in the next section, $x$ is sparse in a domain $\Psi$, which is a generalization of CS1G
Therefore, SMV is a single model underlying both SR and CS1G, with each describing a different application.
We can also view SR as the reconstruction step of CS1G. In other words, in a sampling-reconstruction system, CS1G is the sampling process, and SR is the reconstruction process, taking the measurements as the signal.
With this in mind, you will probably have a better understanding of the references discussed in this section. A small sketch of the SMV model and its $\ell_1$ recovery follows.
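To make the model concrete, here is a minimal sketch of the SMV problem and basis-pursuit ($\ell_1$) recovery. The sizes $M$, $N$, $K$ and the Gaussian $\Phi$ are my own illustrative choices, not values from any reference.

```python
# Minimal SMV sketch: y = Phi @ x with a K-sparse x, recovered by
# basis pursuit (min ||x||_1 s.t. Phi x = y), written as a linear
# program via the split x = u - v with u >= 0, v >= 0.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
N, M, K = 128, 40, 5                      # signal length, measurements, sparsity

x = np.zeros(N)                           # "sparse in itself" (the CS1G view)
x[rng.choice(N, K, replace=False)] = rng.standard_normal(K)

Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # measurement matrix / dictionary
y = Phi @ x                               # M < N: the SMV model

c = np.ones(2 * N)                        # objective: sum(u) + sum(v) = ||x||_1
A_eq = np.hstack([Phi, -Phi])             # Phi @ (u - v) = y
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
x_hat = res.x[:N] - res.x[N:]

print("recovery error:", np.linalg.norm(x_hat - x))
```

With $M = 40$ Gaussian measurements of a $5$-sparse length-$128$ signal, the recovered $x$ typically matches the original up to numerical precision.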

The problem discussed in [1] is that a signal can be reconstructed if certain requirements are met, in either of two cases:
  1. the signal is sparse in the frequency domain and under-sampled (in other words, missing samples) in the time domain,
  2. or the signal is sparse in the time domain and under-sampled in the frequency domain.
These are actually CS1G problems, i.e., sampling a signal that is sparse in itself, and reconstructing it by solving an $\ell_1$ minimization.
For the first case, $x$ contains the frequency components of the signal, $\tilde{\Phi}$ is the inverse DFT basis, i.e., $\mathrm{IDFT}(x) = \tilde{\Phi} x$, and $\Phi = R\tilde{\Phi}$ is the measurement matrix, where $R$ is a selection matrix selecting $M$ rows of $\tilde{\Phi}$; $y$ is the under-sampled time-domain signal.
For the second case, $x$ is the time-domain signal, $\Phi = R \tilde{\Phi}$ is the measurement matrix, where $\tilde{\Phi}$ is the DFT basis, and $y$ is the under-sampled frequency-domain signal.
Thus, by taking the frequency-domain components (first case) or the time-domain signal (second case) as the original signal, the problem discussed is essentially a CS1G problem. A sketch of how such a partial-basis measurement matrix is built follows.
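Here is how the first case's measurement matrix $\Phi = R\tilde{\Phi}$ can be built. To keep the real-valued LP from the sketch above applicable, I substitute an orthonormal DCT for the DFT; this substitution is my own simplification, not from [1].

```python
# Case 1 of [1] (simplified): a signal sparse in a transform domain,
# observed through M randomly selected time samples. The DCT stands in
# for the DFT so that everything stays real-valued.
import numpy as np
from scipy.fft import idct

rng = np.random.default_rng(1)
N, M, K = 128, 40, 5

Phi_tilde = idct(np.eye(N), norm="ortho", axis=0)  # inverse transform basis: signal = Phi_tilde @ x
rows = rng.choice(N, M, replace=False)             # R: keep M of the N time samples
Phi = Phi_tilde[rows, :]                           # Phi = R @ Phi_tilde

x = np.zeros(N)                                    # sparse transform-domain coefficients
x[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
y = Phi @ x                                        # the under-sampled time-domain signal
# Recovery proceeds exactly as in the basis-pursuit sketch above, with this Phi.
```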

I'd also like to mention the following principles proposed in [1], which, I think, are the basis of the theoretical results that followed.
  1. Classical Uncertainty Principle: a function $f$ and its Fourier transform $\hat{f}$ cannot both be highly concentrated: $\Delta t \cdot \Delta w \geq 1$
  2. [1] shows that a more general principle holds: it is not necessary to suppose that $f$ and $\hat{f}$ are concentrated on intervals; they can instead be concentrated on arbitrary measurable sets: $|T| |W| \geq 1-\delta$. It also applies to sequences: $N_t \cdot N_w \geq N$ (see the numeric check after this list).
    1. CT principle: missing segments of a bandlimited function can be restored stably in the presence of noise if (total measure of the missing segments) $\cdot$ (total bandwidth) < 1.
    2. DT principle: a wideband signal can be reconstructed from narrow-band data, provided the wideband signal to be recovered is sparse or "impulsive".
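The discrete principle $N_t \cdot N_w \geq N$ is easy to verify numerically: a Dirac comb with spacing $\sqrt{N}$ meets the bound with equality, since its DFT is again a Dirac comb. (This illustration is mine, not code from [1].)

```python
# Numeric check of the discrete uncertainty principle N_t * N_w >= N.
import numpy as np

N = 256                               # a perfect square, so sqrt(N) = 16
f = np.zeros(N)
f[::16] = 1.0                         # Dirac comb: N_t = 16 nonzero samples

F = np.fft.fft(f)                     # the DFT of a comb is again a comb
N_t = np.count_nonzero(f)
N_w = np.count_nonzero(np.abs(F) > 1e-9)
print(N_t, N_w, N_t * N_w, N)         # 16 16 256 256 -> equality
```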
Then I have an intuitive understanding of the SMV/SR/CS1G theory:
  1. a signal cannot be sparse in two incoherent bases at the same time
  2. under-sample the signal in a non-sparse domain, and recover the signal in a sparse domain
    • then, from the recovered sparse-domain signal, we are able to get the original signal
And this is actually the CS2G theory, which does not require $x$ to be sparse in itself. We leave the discussion to the next section.
[2]-[5] are a sub-group of works related to both CS1G and CS2G theory.
[2] actually discusses the SMV problem from SR's point of view, and it proposes more incoherent basis pairs besides (time, frequency). These incoherent basis pairs actually belong to CS2G theory.
[3] improved the conditions given in [2] for replacing $\ell_0$ minimization with $\ell_1$ minimization.
[4] proved that the condition in [3] is both sufficient and necessary, whereas [3] only proved sufficiency.
[5] should be an extension of [2]; however, I didn't go into the details.

[6] is about CS1G and can be viewed as a direct extension of [1]. The problem discussed in the paper can be phrased as follows:
  • an $N$-length discrete-time signal $f$ is sparse in the time domain, i.e., consists of a superposition of $|T|$ spikes.
  • sub-sample the signal in the frequency domain, i.e., only sample $|\Omega|$ frequency components instead of $N$.
  • if $|T| \leq C_M \cdot (\log{N})^{-1} \cdot |\Omega|$, then the reconstruction is exact with probability at least $1 - O(N^{-M})$.
  • the reconstruction is via solving an $\ell_1$ minimization problem.
As you can see, it is almost the same as [1] in this sense.

In addition, [6] shows that the min-TV problem is actually an instance of the problem stated in the paper, as spelled out below.
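Concretely, the TV-minimization problem in [6] can be stated as follows (restated from memory, so the notation may differ slightly from the paper):

$$\min_g \|g\|_{TV} \quad \text{subject to} \quad \hat{g}(\omega) = \hat{f}(\omega) \;\; \text{for all } \omega \in \Omega.$$

Since $\|g\|_{TV}$ is essentially the $\ell_1$ norm of the discrete gradient of $g$, this is again an $\ell_1$-type sparse recovery problem, with the gradient playing the role of the sparse object.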


CS2G

As I stated before,
  1. a signal cannot be sparse in two incoherent bases at the same time
  2. under-sample the signal in a non-sparse domain, and recover the signal in a sparse domain
The most classical CS theory expresses this idea:
$y = \Phi x = \Phi \Psi \theta$.
To better understand the essence, let $(\tilde{\Phi}, \Psi)$ be a pair of incoherent bases, and let $\Phi = R \tilde{\Phi}^T$.
Then, in the $\tilde{\Phi}$ domain, $x$ can be expressed as $\gamma = \tilde{\Phi}^T x$; in the $\Psi$ domain, $x$ can be expressed as $\theta = \Psi^T x$.
So $y$ is actually obtained by under-sampling the signal in the non-sparse domain, i.e., the $\tilde{\Phi}$ domain.
In other words, $y = R \gamma$.
And the signal is reconstructed in the sparse domain, i.e., the $\Psi$ domain.
Then CS1G is actually CS2G with $\Psi$ taken as the identity basis. A sketch of this reduction follows.
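The reduction from CS2G to the SMV/CS1G form is just a matrix product: solve for $\theta$ with the effective matrix $\Phi \Psi$, then synthesize $x = \Psi \theta$. In this sketch the DCT plays $\Psi$ and a Gaussian matrix plays $\Phi$; both are my own illustrative choices.

```python
# CS2G sketch: x is sparse in the basis Psi (not in itself). Measuring
# with Phi and running basis pursuit on A = Phi @ Psi reduces the
# problem to the CS1G/SMV sketch given earlier.
import numpy as np
from scipy.fft import idct
from scipy.optimize import linprog

rng = np.random.default_rng(2)
N, M, K = 128, 48, 5

Psi = idct(np.eye(N), norm="ortho", axis=0)   # sparsifying basis: x = Psi @ theta
theta = np.zeros(N)
theta[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
x = Psi @ theta                                # x itself is NOT sparse

Phi = rng.standard_normal((M, N)) / np.sqrt(M)
y = Phi @ x                                    # = (Phi @ Psi) @ theta

A = Phi @ Psi                                  # effective CS1G matrix
res = linprog(np.ones(2 * N), A_eq=np.hstack([A, -A]), b_eq=y, bounds=(0, None))
theta_hat = res.x[:N] - res.x[N:]
print("signal error:", np.linalg.norm(Psi @ theta_hat - x))
```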

[7]-[9], together with [2]-[5], are the theoretical foundation of the CS2G theory.
I didn't find paper [7]. However, it is stated in [8] that [7] extended the result in [6] and showed that exact recovery holds for other synthesis/measurement pairs. [8] describes these results, making reading [7] less necessary. The results are as follows:
  • an $N$-length signal $f$ is sparse in domain $\Phi$, i.e., $\theta = \Phi f$ has only a few nonzero entries.
  • sub-sample the signal in domain $\Psi$, i.e., only sample $|\Omega|$ coefficients instead of $N$.
  • the reconstruction can be exact, via solving an $\ell_1$ minimization problem.
  • the requirement for exact reconstruction concerns the incoherence between $\Phi$ and $\Psi$.
[8] extended the work of [6] and [7], showing that a signal that is compressible, not necessarily strictly sparse, in a certain domain can also be optimally reconstructed, i.e., the reconstruction error using $K$ measurements is as good as the best $K$-term approximation, with overwhelming probability.
[9] discussed the problem of decoding a linear code corrupted by noise. Notably, the RIP condition was proposed in this paper, although the paper does not spell out the relationship between decoding and CS. Besides, Gaussian ensembles are proved to satisfy the RIP with overwhelming probability. On the other hand, it should be noticed that if the RIP is satisfied, the reconstruction is exact deterministically, whereas in [6][7][8] the reconstruction is exact only with overwhelming probability.
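For reference, the RIP of order $K$ (in its now-standard form; [9] uses slightly different notation) requires a constant $\delta_K \in (0, 1)$ such that

$$(1-\delta_K)\|x\|_2^2 \;\leq\; \|\Phi x\|_2^2 \;\leq\; (1+\delta_K)\|x\|_2^2$$

holds for every $K$-sparse $x$, i.e., $\Phi$ acts as a near-isometry on all sparse vectors.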

Then [13][14] organized all the related theoretical results and formulated the CS problem.
And [12] gives a tutorial on CS theory, which I recommend reading.


MMV

The only thing left is MMV. As shown in the big picture, MMV is a direct extension of SMV, in a different direction from the evolution from SMV to CS.
[11] formulates the MMV problem: $Y = \Phi X$, where the columns of $X$ share a common support (row sparsity). Similar to SMV, MMV can be viewed as either a 2D sparse representation problem or a 2D CS1G problem.
[10] and [15] give theoretical results on MMV, which are derived from those on SMV, i.e., [1]-[6]. A greedy-recovery sketch for the MMV model follows.
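A minimal sketch of MMV recovery via simultaneous orthogonal matching pursuit, a common greedy MMV approach; this is my illustrative choice, not necessarily the specific algorithm of [10] or [11].

```python
# Simultaneous OMP for the MMV model Y = Phi @ X, where the columns of X
# share one support. The shared support is selected greedily using the
# residual correlations aggregated across ALL measurement vectors.
import numpy as np

def somp(Phi, Y, K):
    """Recover a row-K-sparse X from Y = Phi @ X."""
    M, N = Phi.shape
    support, R = [], Y.copy()
    for _ in range(K):
        scores = np.linalg.norm(Phi.T @ R, axis=1)   # aggregate over vectors
        scores[support] = -np.inf                     # don't reselect atoms
        support.append(int(np.argmax(scores)))
        sub = Phi[:, support]
        coef, *_ = np.linalg.lstsq(sub, Y, rcond=None)
        R = Y - sub @ coef                            # update joint residual
    X_hat = np.zeros((N, Y.shape[1]))
    X_hat[support, :] = coef
    return X_hat

rng = np.random.default_rng(3)
N, M, K, L = 64, 24, 4, 6                 # L measurement vectors
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
X = np.zeros((N, L))
X[rng.choice(N, K, replace=False), :] = rng.standard_normal((K, L))
Y = Phi @ X
print("error:", np.linalg.norm(somp(Phi, Y, K) - X))
```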


Discussions


  1. In this blog, I didn't discuss the development from the noiseless to the noisy case, which is also an interesting and important evolution.
  2. In this blog, I didn't focus on the development from sparse signals to compressible signals.
  3. In this blog, I only discussed the discrete-time situation. However, CS actually can be, and should be, addressed for analog signals, which is the meaning of "sensing".
  4. Since I'm working on CS theory, my knowledge of SR and MMV is limited. Thus, the goal is to distinguish CS from SR, SMV, and MMV, rather than to introduce SR and MMV, which I believe are far more complicated.
  5. Returning to the big picture, it is interesting to ask: can we extend MMV in a way similar to the extension from SMV to CS? To the best of my knowledge, the recent work of Duarte and Baraniuk in [16] discusses high-dimensional CS, of which 2D CS is a special case. However, it differs from MMV in the reconstruction method, since it still solves the reconstruction problem in 1D. I would appreciate it if you could share your opinions and references on this issue.
  6. At last, I really hope this blog can ease your perplexity when reading references in which you don't know what SR, CS, SMV, and MMV refer to. They are, on the one hand, synonyms and inherently similar, and on the other hand, slightly different from each other in terms of their formulations and applications. With the help of this blog, you may have in mind what a reference is discussing as you read it, at least for the references given in this blog.

[1] Uncertainty principles and signal recovery
[2] Uncertainty principles and ideal atomic decomposition
[3] A generalized uncertainty principle and sparse representation
[4] On sparse representation in pairs of bases
[5] Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization
[6] Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information
[7] The role of sparsity and incoherence for exactly reconstructing a signal from limited measurements
[8] Near-optimal signal recovery from random projections: universal encoding strategies?
[9] Decoding by linear programming
[10] Sparse representation for multiple measurement vectors (MMV) in an over-complete dictionary
[11] Sparse solutions to linear inverse problems with multiple measurement vectors
[12] An introduction to compressive sampling
[13] Compressive sampling
[14] Compressed sensing
[15] Theoretical results on sparse representations of multiple-measurement vectors
[16] Kronecker compressed sensing
