Thursday, September 27, 2012

Sampling Rate is Not Only about Pixels: How to compare the sampling rate between your camera and a "single-pixel" camera


To begin, I'd like to clarify two terms: "Nyquist frequency" and "Nyquist rate".
Some of you may take them to be the same thing, and even some textbooks do so. However, although they are closely related, they are actually different.
Reference: http://en.wikipedia.org/wiki/Nyquist_frequency
Nyquist frequency: a property of the system. For a given system, the sampling rate $f_s$ is fixed. The Nyquist frequency of the system is the highest frequency a signal may contain and still be sampled without aliasing, which is $f_s / 2$.
Nyquist rate: a property of the signal. For a given band-limited or band-pass signal, the bandwidth $B$ is fixed. The Nyquist rate of the signal is the lowest sampling frequency at which a system can sample the signal without aliasing, which is $2B$.
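To make the distinction concrete, here is a minimal numpy sketch (my own illustration, not from any reference): a system sampling at $f_s = 4$ Hz has a Nyquist frequency of 2 Hz, so a 3 Hz tone, whose Nyquist rate is 6 Hz, aliases onto a 1 Hz tone and the two become indistinguishable from their samples.

```python
import numpy as np

fs = 4.0                   # system sampling rate (Hz); its Nyquist frequency is fs/2 = 2 Hz
n = np.arange(32)          # sample indices
t = n / fs                 # sampling instants

x_hi = np.cos(2 * np.pi * 3.0 * t)   # 3 Hz tone: its Nyquist rate (6 Hz) exceeds fs, so it aliases
x_lo = np.cos(2 * np.pi * 1.0 * t)   # 1 Hz tone: its Nyquist rate (2 Hz) is safely below fs

print(np.allclose(x_hi, x_lo))       # True: from the samples alone the two tones are identical
```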

For an imaging system and image signals, you can interpret the sampling rate as the number of pixels, and the highest frequency of the image as the resolution of the image.
Nyquist then says: for an imaging system, the number of pixels (the sampling rate) is fixed, and the highest resolution it can capture is determined by, and proportional to, that number of pixels. The proportionality factor depends on which measure of resolution you use.
Meanwhile, for an image signal whose resolution (bandwidth) is fixed, the required number of pixels is determined by, and proportional to, that resolution.

So the number of pixels is actually the number of samples a system takes for a given image.
Does "single-pixel" then mean that the number of samples is 1? Definitely not.
So how should we interpret "single-pixel"?

It's a little tricky, but it can still be understood. The only thing you need to do is add a time axis.
The traditional imaging system in your camera takes all of its samples at one time, and each sample is a pixel. So the total number of samples is (number of pixels per unit of time) x (units of time). Since the number of time units is 1 here, we usually ignore it.
For the "single-pixel" camera, the number of samples is likewise (number of pixels per unit of time) x (units of time). Here the number of pixels per unit of time is 1, whereas the number of time units is not 1, and it is the time axis that determines the number of samples.
Image sensors are silicon devices that capture and read light. So a traditional imaging system needs thousands or millions of sensor elements, one per pixel, whereas a single-pixel camera uses only one photodetector to capture and read light.
That's the trade-off: number of pixels (sensor elements) versus units of time (exposure time).

Therefore, an improved interpretation of Nyquist sampling here is to count the number of samples as (number of pixels per unit of time) x (units of time), i.e., the product of spatial sampling and temporal sampling.
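As a toy illustration of this space-time trade-off (my own sketch, not the actual hardware design of the Rice single-pixel camera), think of an N-pixel scene measured one number at a time: at each time step a pseudo-random binary mask is applied and the single photodetector reports one inner product, so M time steps yield M samples.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 64 * 64                  # number of pixels in the scene
M = 2000                     # number of time steps (exposures), i.e., number of samples
scene = rng.random(N)        # the unknown scene, flattened into a vector

# At each time step the mirror array shows a random 0/1 mask and the single
# photodetector integrates the reflected light into one number.
masks = rng.integers(0, 2, size=(M, N))   # one mask per time step
measurements = masks @ scene              # one scalar reading per exposure

# Conventional camera: N sensor elements x 1 time step  -> N samples.
# Single-pixel camera: 1 sensor element  x M time steps -> M samples.
print(measurements.shape)    # (2000,)
```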

How to understand "Nyquist sample v.s. CS"?

In this blog, I will discuss Nyquist sampling and CS theory from a communication system point of view. A communication system usually samples and processes a 1D signal, so here we talk about the 1D case.


What does Nyquist say for a 1D signal?


As shown in the figure, for an analog signal, we first sample it into a digital signal, then process it, and finally reconstruct it back into an analog signal.
The "process" part is called digital signal processing (DSP).
The Nyquist sampling happens in the "sampling" part, before DSP.
Then what does Nyquist say?
It says: given a signal whose highest frequency component is $f_m$, a sufficient condition for recovering $x(t)$ from the digital signal $x[n]$ is that the sampling frequency $f_s$ is at least $2 f_m$.
Note that this is only a sufficient condition, not a necessary one.
In certain cases the condition is also necessary. Suppose the whole band up to the highest frequency is occupied by the signal, or we don't know which portion of the band the signal occupies; in other words, the only information we have is the highest frequency of the signal
(we call this kind of signal a strict-sense band-limited signal). Then the only option is to sample at the Nyquist rate.
However, you may argue that in practice the signal may not be band-limited; what should we do then?
The answer is to make the signal band-limited before sampling, which can be done with a low-pass filter (LPF). How well the band-limited signal approximates the real signal is another story; here we focus only on signals that are already band-limited.
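Here is a minimal numpy sketch of the reconstruction half of that statement (my own illustration, with finite-length edge effects acknowledged): a band-limited signal sampled at $f_s \geq 2 f_m$ can be rebuilt on a fine grid by sinc interpolation, $x(t) = \sum_n x[n] \, \mathrm{sinc}((t - nT)/T)$ with $T = 1/f_s$.

```python
import numpy as np

f_m = 3.0                  # highest frequency in the signal (Hz)
fs = 10.0                  # sampling rate, above the Nyquist rate 2*f_m = 6 Hz
T = 1.0 / fs

n = np.arange(200)                      # 20 seconds of samples
x_n = np.sin(2 * np.pi * f_m * n * T)   # the sampled band-limited signal

t = np.linspace(5.0, 15.0, 500)         # fine grid, away from the edges of the record
# Ideal reconstruction: x(t) = sum_n x[n] * sinc((t - n*T) / T)
x_rec = np.array([np.sum(x_n * np.sinc((ti - n * T) / T)) for ti in t])
x_true = np.sin(2 * np.pi * f_m * t)

print(np.max(np.abs(x_rec - x_true)))   # small, nonzero only because the sample record is finite
```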
On the other hand, you may also ask: if we have a band-pass signal (which is of course band-limited, but not strict-sense band-limited), can we use a lower sampling frequency?
The answer is yes. To see why, imagine moving the band-pass signal down to baseband, where it becomes a strict-sense band-limited signal; then it is clear how the Nyquist rate applies. In effect we are given more information, namely that the signal occupies the band from $f_L$ to $f_H$, so we can sample it at a frequency well below $2 f_H$, roughly as low as $2(f_H - f_L)$.
Such a sampling process is often called under-sampling (or band-pass sampling), but it can still be viewed as Nyquist sampling after a suitable conversion.

  • It should also be noticed that the sampling here is uniform sampling. If you are interested in this topic, there is a body of literature on non-uniform sampling.
  • Also, for 2D signals, Nyquist sampling is a little trickier to see; that is beyond the scope of this blog.


What does CS say?



Before we go into CS, I'd like to show a more complete communication system.
As you can see in the figure, all "information" after the dashed line is in the form of a bit stream. If we ignore all imperfections of the system in between, e.g., assume a perfect channel and a perfect modulation scheme, then the processing after the dashed line is distortion-free.
Moving to the left of the dashed line, as we showed in the last section, the sampling and reconstruction processes can also be distortion-free.
What about quantization? It does introduce distortion, and the precision of the quantizer determines how severe that distortion is.
However, you can set all of this aside in order to understand CS.

As shown in the figure, the A/D converter is responsible for sampling and quantization, and the D/A converter does the reverse.
The source encoder is responsible for compression.
Now consider: if we could sub-sample the signal, we might not need to compress it at all; the only remaining question is how to reconstruct it. CS theory asserts exactly this: the signal can be recovered faithfully, provided it is sparse in some domain that is highly incoherent with the sampling (time) domain. For instance, the frequency domain is incoherent with the time domain.
The CS system can be imagined as follows:

$x[n]$ corresponds to $r[n]$; however, in the real system we do not actually have $x[n]$. Instead, we have the already-compressed $p[n]$, which corresponds to $s[n]$.
Going from $r[n]$ to $r(t)$ is not a problem. But how do we get $r[n]$ from $s[n]$? That is where CS comes in.
And as you can see, the A/D (D/A) converter becomes an A/I (I/A) converter.
The sampling requirement is reduced, while the reconstruction has to be more powerful. This is a different kind of asymmetry from that of traditional source encoding.
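To make the "sub-sample, then reconstruct" idea concrete, here is a toy sketch (my own example, using only numpy and scipy, not any specific system from the text): a sparse vector is recovered from far fewer random measurements than its length by solving the basis pursuit linear program $\min \|x\|_1$ subject to $\Phi x = y$.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)

N, M, K = 128, 40, 5                     # signal length, number of measurements, sparsity
x = np.zeros(N)
x[rng.choice(N, K, replace=False)] = rng.standard_normal(K)   # K-sparse signal

Phi = rng.standard_normal((M, N)) / np.sqrt(M)    # random Gaussian measurement matrix
y = Phi @ x                                       # M << N linear measurements

# Basis pursuit as a linear program: write x = u - v with u, v >= 0,
# minimize sum(u) + sum(v) subject to [Phi, -Phi] @ [u; v] = y.
c = np.ones(2 * N)
res = linprog(c, A_eq=np.hstack([Phi, -Phi]), b_eq=y, bounds=(0, None))
x_hat = res.x[:N] - res.x[N:]

print(np.max(np.abs(x_hat - x)))   # tiny (typically < 1e-8): exact recovery from 40 of 128 "samples"
```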

How can CS be more powerful?
Say the signal is sparse in the time domain, or in a domain coherent with the time domain. Then, according to CS theory, we should not under-sample in the time domain as before; we need to under-sample in a domain incoherent with the sparsity domain.
How do we design such a sampling domain? The most beautiful part of CS is the restricted isometry property (RIP). Roughly, the RIP says that if you design the "compression" step as under-sampling the signal in some domain, and the retained part of that domain satisfies the RIP, then the signal can be reconstructed faithfully.
To my understanding (and you can certainly have your own interpretation), a matrix satisfying the RIP is essentially a subset of a domain, i.e., a few basis vectors of a domain that is incoherent with almost every other domain. That is the "universality" of CS theory.
An example is the Gaussian random matrix, which is incoherent with both the time domain and the frequency domain.
It is a little tricky, but if you keep in mind the role of "overwhelming probability", the inner connection becomes easier to understand.
I won't go into the mathematics of these relationships, but even if you treat them as a black box, this interpretation does help.
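As a quick numerical check of this incoherence claim (my own sketch; the coherence definition $\mu(\Phi, \Psi) = \sqrt{N} \max_{i,j} |\langle \varphi_i, \psi_j \rangle|$ between orthonormal bases follows the usual CS convention), one can compare the spike basis, the DFT basis, and a random orthonormal basis:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 256

I = np.eye(N)                                     # spike (time-domain) basis
F = np.fft.fft(np.eye(N)) / np.sqrt(N)            # orthonormal DFT basis
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))  # random orthonormal basis

def coherence(A, B):
    # mu(A, B) = sqrt(N) * max |<a_i, b_j>| over all column pairs of two orthonormal bases
    return np.sqrt(N) * np.max(np.abs(A.conj().T @ B))

print(coherence(I, F))   # 1.0: spikes and sinusoids are maximally incoherent
print(coherence(Q, I))   # small (a few), far below the worst case sqrt(N) = 16
print(coherence(Q, F))   # similarly small: the random basis is incoherent with both
```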

The remaining problem is how to first transform the signal into the $\Phi$ domain, so that the subsequent under-sampling becomes possible.
This is the story of the A/I converter. I won't go into the topic here, but intuitively speaking it relies on convolution and integration; see the rough sketch below.
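A very rough discrete sketch of that intuition (modeled loosely on a random-demodulator-style architecture; the sizes are made up for illustration): multiply the incoming Nyquist-rate samples by a fast pseudo-random $\pm 1$ sequence, integrate over short windows, and read out one number per window, so the effective measurement matrix is a block-sum matrix times a diagonal sign matrix.

```python
import numpy as np

rng = np.random.default_rng(3)

N = 400            # "Nyquist-rate" samples within one observation window
M = 40             # low-rate outputs, one per block of N // M samples
block = N // M

chips = rng.choice([-1.0, 1.0], size=N)   # pseudo-random chipping sequence
D = np.diag(chips)                        # mixing stage: x[n] -> chips[n] * x[n]

# H sums each consecutive block of samples (the integrate-and-dump stage).
H = np.zeros((M, N))
for m in range(M):
    H[m, m * block:(m + 1) * block] = 1.0

Phi = H @ D        # effective M x N measurement matrix of the A/I converter

# A frequency-sparse input measured through this chain:
t = np.arange(N)
x = np.cos(2 * np.pi * 17 * t / N) + 0.5 * np.sin(2 * np.pi * 43 * t / N)
y = Phi @ x        # M low-rate measurements; the Fourier coefficients could then be
                   # recovered with an l1 solver applied to Phi times the IDFT basis
print(Phi.shape, y.shape)   # (40, 400) (40,)
```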

Again, for 2D signals such as those in imaging systems, and even for higher-dimensional signals, Nyquist sampling and CS can be interpreted in yet another way; the most vivid example is the "single-pixel camera".
I will discuss 2D signal sampling, especially imaging systems, in the future.

Wednesday, September 26, 2012

Sparse Representation, Compressed Sensing, Single Measurement Vector, Multiple Measurement Vector: connections and differences

Purpose of this blog entry

Sparse representation (SR), compressed sensing (CS), single measurement vector (SMV), multiple measurement vector (MMV): I used to be confused by these terms. They seem to be the same, yet the ways they are used to describe problems differ. In this blog, I aim to clarify the inner relationships among them and the development of these theories.

The Big Picture

To my understanding, the following figure describes the relationship among them.
To see the inherent relationship, I prefer to split CS into two kinds: 1st-generation CS (CS1G) and 2nd-generation CS (CS2G). The difference will be discussed later. Note that this notation is not used in the literature, so please don't be confused when reading papers.
In the papers, "CS" refers to what I call CS2G, and "SMV" refers to CS1G.
The term SMV is first mentioned in [10]. (In [11], "one measurement vector" is used to describe the SMV problem.)
We will return to the big picture at the end of this blog.

SMV: SR and CS1G

The SMV model basically poses the following problem: $y = \Phi x$
  • sparse representation: $y$ is viewed as a real-world signal, $\Phi$ is called an over-complete dictionary, and $x$ is the sparse representation
    • we want to find a representation of the signal over a dictionary, and we want the representation to be as sparse as possible
    • $\Phi \in R^{M \times N}$: if M < N, the dictionary is over-complete; if M = N, the dictionary is complete; if M > N, the dictionary is under-complete. "Completeness" is judged from the point of view of the signal $y$: simply speaking, if the length of the representation is larger than that of the signal, then the signal is "over-represented", and thus the dictionary is "over-complete"
    • 2D DCT, DFT, etc., are representations of the signal over a complete dictionary
    • in this blog, we consider sparse representation of the signal over an over-complete dictionary
  • 1st-generation compressed sensing: $x$ is a signal that is sparse in itself, i.e., it has only a few non-zero entries; $\Phi$ is called the measurement matrix, and $y$ is called the measurements
    • we want to sample a sparse signal using fewer measurements than its length, in other words, to compress the signal
    • here we only need $\Phi$ to be over-complete with respect to $y$, i.e., M < N
    • note that "$x$ is sparse in itself" is the defining feature of CS1G. For CS2G, as we will discuss in the next section, $x$ is sparse in some domain $\Psi$, which is a generalization of CS1G
Therefore, SMV is actually the common model behind both SR and CS1G, with each describing a different application.
We can also view SR as the reconstruction step of CS1G: in a sampling-reconstruction system, CS1G describes the sampling process and SR describes the reconstruction process, with the measurements playing the role of the signal.
With this in mind, you will probably have a better understanding of the references discussed in this section.
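For readers who want to see the SMV model solved end to end, here is a compact sketch (my own, not taken from the references below) using orthogonal matching pursuit, a greedy alternative to the $\ell_1$ programs discussed in these papers, to recover a sparse $x$ from $y = \Phi x$:

```python
import numpy as np

def omp(Phi, y, k):
    """Greedy recovery of a k-sparse x from y = Phi @ x (orthogonal matching pursuit)."""
    M, N = Phi.shape
    residual, support = y.copy(), []
    for _ in range(k):
        # pick the column most correlated with the current residual
        j = int(np.argmax(np.abs(Phi.T @ residual)))
        support.append(j)
        # re-fit on the selected columns and update the residual
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x_hat = np.zeros(N)
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(4)
N, M, k = 256, 80, 5
x = np.zeros(N)
x[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # CS1G: measurement matrix / SR: dictionary
y = Phi @ x                                      # CS1G: measurements      / SR: the "signal"

print(np.max(np.abs(omp(Phi, y, k) - x)))        # typically essentially zero: exact recovery
```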

The problem discussed in [1] is that a signal can be reconstructed, provided certain requirements are met, in both of the following cases:
  1. the signal is sparse in the frequency domain and under-sampled (i.e., has missing samples) in the time domain,
  2. or the signal is sparse in the time domain and under-sampled in the frequency domain.
Both are actually CS1G problems, i.e., sampling a signal that is sparse in itself and reconstructing it by solving an $\ell_1$ minimization.
In the first case, $x$ is the vector of frequency components of the signal, $\tilde{\Phi}$ is the inverse DFT basis, i.e., $IDFT(x) = \tilde{\Phi} x$, and $\Phi = R\tilde{\Phi}$ is the measurement matrix, where $R$ is a selection matrix picking $M$ rows of $\tilde{\Phi}$; $y$ is the under-sampled time-domain signal.
In the second case, $x$ is the time-domain signal, $\Phi = R \tilde{\Phi}$ is the measurement matrix, where $\tilde{\Phi}$ is the DFT basis, and $y$ is the under-sampled frequency-domain signal.
Thus, by taking the frequency-domain components (first case) or the time-domain components (second case) as the original signal, the problem discussed is essentially a CS1G problem.
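A few lines of numpy make this identification explicit for the first case (my own sketch, with made-up sizes): selecting $M$ rows of the IDFT matrix and applying them to the frequency components is exactly the same as keeping $M$ time-domain samples of the signal.

```python
import numpy as np

rng = np.random.default_rng(5)
N, M = 64, 20

F = np.fft.fft(np.eye(N))             # DFT matrix
Phi_tilde = np.conj(F) / N            # IDFT matrix: IDFT(x) = Phi_tilde @ x

x_freq = np.zeros(N, dtype=complex)   # sparse frequency-domain signal
x_freq[rng.choice(N, 3, replace=False)] = 1.0

rows = rng.choice(N, M, replace=False)    # R: the time samples we keep
Phi = Phi_tilde[rows, :]                  # measurement matrix Phi = R * Phi_tilde

y = Phi @ x_freq                          # the observed (incomplete) time samples
x_time = np.fft.ifft(x_freq)              # the full time-domain signal
print(np.allclose(y, x_time[rows]))       # True: y is just the kept time-domain samples
```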

I'd also like to mention the following principles proposed in [1], which I think are the basis of the subsequent theoretical results.
  1. Classical uncertainty principle: a function $f$ and its Fourier transform $\hat{f}$ cannot both be highly concentrated: $\Delta t \cdot \Delta \omega \geq 1$.
  2. [1] shows that a more general principle holds: it is not necessary to suppose that $f$ and $\hat{f}$ are concentrated on intervals; they can instead be concentrated on arbitrary measurable sets, giving $|T| \, |W| \geq 1 - \delta$. The principle also applies to sequences: $N_t \cdot N_w \geq N$ (a concrete example follows this list).
    1. CT principle: missing segments of a band-limited function can be restored stably in the presence of noise if (total measure of the missing segments) $\cdot$ (total bandwidth) < 1.
    2. DT principle: a wideband signal can be reconstructed from narrow-band data, provided the wideband signal to be recovered is sparse or "impulsive".
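As a concrete instance of the discrete principle (a standard extremal example, stated here under the assumption that $N$ is a perfect square), the Dirac comb meets the bound $N_t \cdot N_w \geq N$ with equality:

$$
f[n] = \sum_{k=0}^{\sqrt{N}-1} \delta\big[n - k\sqrt{N}\big]
\quad\Longrightarrow\quad
\hat{f}[m] = \sqrt{N} \sum_{k=0}^{\sqrt{N}-1} \delta\big[m - k\sqrt{N}\big],
$$

so both $f$ and $\hat{f}$ have exactly $\sqrt{N}$ nonzero entries, and $N_t \cdot N_w = \sqrt{N} \cdot \sqrt{N} = N$.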
From this I draw an intuitive understanding of the SMV/SR/CS1G theory:
  1. a signal cannot be sparse in two incoherent bases at the same time;
  2. under-sample the signal in a domain where it is not sparse, and recover it in the domain where it is sparse
    • then, from the recovered sparse-domain representation, we are able to get the original signal.
And this is actually the CS2G theory, which does not require $x$ to be sparse in itself. We leave that discussion to the next section.
[2]-[5] form a sub-group of work related to both the CS1G and CS2G theories.
[2] discusses the SMV problem from SR's point of view, and proposes more incoherent basis pairs besides (time, frequency). These incoherent basis pairs really belong to CS2G theory.
[3] improved the constraints related to the replacement of $\ell_0$ minimization with $\ell_1$ minimization in [2].
[4] proved that the condition in [3] is both sufficient and necessary, whereas [3] only proved sufficiency.
[5] should be an extension of [2]; however, I haven't gone into the details.

[6] is about CS1G and can be viewed as a direct extension of [1]. The problem discussed in the paper can be phrased as follows:
  • an $N$-length discrete-time signal $f$ is sparse in the time domain, i.e., consists of a superposition of $|T|$ spikes;
  • we sub-sample the signal in the frequency domain, i.e., observe only $|\Omega|$ frequency components instead of $N$;
  • if $|T| \leq C_M \cdot (\log{N})^{-1} \cdot |\Omega|$, then the reconstruction is exact with probability at least $1 - O(N^{-M})$;
  • the reconstruction is obtained by solving an $\ell_1$ minimization problem.
In this sense it is almost the same as [1].

In addition, [6] shows that the minimum total-variation (min-TV) problem is actually an instance of the problem stated in the paper.


CS2G

As I stated before, 
  1. a signal cannot be sparse in two incoherent bases at the same time;
  2. under-sample the signal in a domain where it is not sparse, and recover it in the domain where it is sparse.
The most classical CS theory expresses exactly this idea.
$y = \Phi x = \Phi \Psi \theta$.
To better see the essence, let $(\tilde{\Phi}, \Psi)$ be a pair of incoherent bases and $\Phi = R \tilde{\Phi}^T$.
Then, in the $\tilde{\Phi}$ domain, $x$ is expressed as $\gamma = \tilde{\Phi}^T x$; in the $\Psi$ domain, $x$ is expressed as $\theta = \Psi^T x$.
So $y$ is obtained by under-sampling the signal in the non-sparse domain, i.e., the $\tilde{\Phi}$ domain.
In other words, $y = R \gamma$. 
And the signal is reconstructed in the sparse domain, i.e., the $\Psi$ domain.
CS1G is then just the special case in which $\Psi$ is the identity basis.
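Putting these pieces together, here is a toy end-to-end CS2G sketch (my own example, using scipy's DCT for $\Psi$ and random time samples for $R \tilde{\Phi}^T$; the sizes are made up): a signal that is sparse in the DCT domain is recovered from a fraction of its time samples by the same kind of $\ell_1$ program as in CS1G.

```python
import numpy as np
from scipy.fft import idct
from scipy.optimize import linprog

rng = np.random.default_rng(6)
N, M, K = 128, 48, 5

Psi = idct(np.eye(N), axis=0, norm='ortho')     # orthonormal sparsity basis (columns)
theta = np.zeros(N)
theta[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
x = Psi @ theta                                 # the signal: sparse in the DCT domain

rows = rng.choice(N, M, replace=False)          # R: keep M random time-domain samples
y = x[rows]                                     # y = R x = (R Psi) theta
A = Psi[rows, :]                                # effective sensing matrix Phi Psi

# Basis pursuit: min ||theta||_1 s.t. A theta = y, written as an LP with theta = u - v.
c = np.ones(2 * N)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=y, bounds=(0, None))
theta_hat = res.x[:N] - res.x[N:]
x_hat = Psi @ theta_hat

print(np.max(np.abs(x_hat - x)))                # tiny (typically < 1e-8): the full signal is recovered
```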

[7]-[9], together with [2]-[5], are the theoretical foundation of the CS2G theory.
I couldn't find paper [7]. However, [8] states that [7] extended the result in [6] and showed that exact recovery holds for other synthesis/measurement pairs. [8] describes these results, so reading [7] is not strictly necessary. The results are as follows:
  • an $N$-length signal $f$ is sparse in the domain $\Phi$, i.e., $\theta = \Phi f$ has only a few nonzero entries;
  • we sub-sample the signal in the domain $\Psi$, i.e., observe only $|\Omega|$ coefficients instead of $N$;
  • the reconstruction can be exact, via solving an $\ell_1$ minimization problem;
  • the requirement for exact reconstruction is expressed in terms of the incoherence between $\Phi$ and $\Psi$.
[8] extends the work of [6] and [7], showing that a signal that is compressible, not necessarily strictly sparse, in some domain can also be near-optimally reconstructed: with overwhelming probability, the reconstruction error from K measurements is as good as that of the best K-term approximation.
[9] discusses the problem of decoding a linear code corrupted by noise. The RIP condition is proposed in this paper, although the paper does not spell out the relationship between decoding and CS. In addition, Gaussian ensembles are proved to satisfy the RIP with overwhelming probability. It should also be noticed that when the RIP is satisfied, the reconstruction is exact deterministically, whereas in [6], [7], and [8] the reconstruction is exact only with overwhelming probability.

Then [13] and [14] organized the related theoretical results and formalized the CS problem.
And [12] gives a tutorial on CS theory, which I particularly recommend reading.


MMV

The only thing left is MMV. As shown in the big picture, MMV is a direct extension of SMV, in a different direction from the evolution of SMV into CS.
[11] formulates the MMV problem. As with SMV, MMV can be viewed as either a 2D sparse representation problem or a 2D CS1G problem.
[10] and [15] give theoretical results on MMV, which are derived from those on SMV, i.e., [1]-[6].
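To make the MMV model concrete, here is a small sketch (my own, using simultaneous orthogonal matching pursuit as the solver rather than the algorithms of [10], [11], [15]): $Y = \Phi X$, where the columns of $X$ share a common sparse support, and that shared support is found by ranking atoms by the row norms of $\Phi^T R$, with $R$ the residual matrix.

```python
import numpy as np

def somp(Phi, Y, k):
    """Recover a row-sparse X from Y = Phi @ X (simultaneous orthogonal matching pursuit)."""
    M, N = Phi.shape
    L = Y.shape[1]
    R, support = Y.copy(), []
    for _ in range(k):
        # rank atoms by how strongly they correlate with *all* residual columns at once
        j = int(np.argmax(np.linalg.norm(Phi.T @ R, axis=1)))
        support.append(j)
        C, *_ = np.linalg.lstsq(Phi[:, support], Y, rcond=None)
        R = Y - Phi[:, support] @ C
    X_hat = np.zeros((N, L))
    X_hat[support, :] = C
    return X_hat

rng = np.random.default_rng(7)
N, M, L, k = 128, 40, 8, 5                   # L measurement vectors sharing one support
rows = rng.choice(N, k, replace=False)
X = np.zeros((N, L))
X[rows, :] = rng.standard_normal((k, L))     # row-sparse: all columns share the same support
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
Y = Phi @ X

print(np.max(np.abs(somp(Phi, Y, k) - X)))   # typically essentially zero: joint recovery succeeds
```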


Discussions


  1. In this blog, I didn't discuss the development from the noiseless to the noisy case, which is also an interesting and important evolution.
  2. In this blog, I didn't focus on the development from sparse signals to compressible signals.
  3. In this blog, I only discussed the discrete-time situation. However, CS can, and should, also be addressed for analog signals, which is the real meaning of "sensing".
  4. Since I work on CS theory, my knowledge of SR and MMV is limited. Thus, the goal here is to distinguish CS from SR, SMV, and MMV, rather than to introduce SR and MMV, which I believe are far more involved.
  5. Returning to the big picture, it is interesting to ask: can we extend MMV in a way similar to the extension from SMV to CS? To the best of my knowledge, recent work by Duarte and Baraniuk in [16] discusses high-dimensional CS, of which 2D CS is a special case. However, it differs from MMV in the reconstruction method, since it still solves the reconstruction problem in 1D. I would appreciate it if you could share your opinions and references on this issue.
  6. Finally, I really hope this blog can ease your perplexity when reading references where it is unclear what SR, CS, SMV, and MMV refer to. On the one hand they are near-synonyms and inherently similar; on the other hand they differ slightly in their formulations and applications. With the help of this blog, you should have a sense of what a reference is discussing, at least for the references given here.

[1] Uncertainty principles and signal recovery
[2] Uncertainty principles and ideal atomic decomposition
[3] A generalized uncertainty principle and sparse representation
[4] On sparse representation in pairs of bases
[5] Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization
[6] Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information
[7] The role of sparsity and incoherence for exactly reconstructing a signal from limited measurements
[8] Near-optimal signal recovery from random projections: universal encoding strategies?
[9] Decoding by linear programming
[10] Sparse representation for multiple measurement vectors (MMV) in an over-complete dictionary
[11] Sparse solutions to linear inverse problems with multiple measurement vectors
[12] An introduction to compressive sampling
[13] Compressive sampling
[14] Compressed sensing
[15] Theoretical results on sparse representations of multiple-measurement vectors
[16] Kronecker compressed sensing