DON’T WORRY: I will talk about short, fat matrices in a bit (specifically, about a paper I recently posted on the arXiv), but first I want to give some cool historical context.
In 1948, Claude E. Shannon authored the founding document of modern information theory: a journal article entitled “A mathematical theory of communication.” I find it to be a very interesting paper (everyone should read it), and I’d like to summarize the first few sections:
Given a class of messages and a communication channel, the main idea is to exploit their “statistical structure.” Specifically, each message is modeled as a discrete random variable $X$ with distribution $p(x)$. From here, entropy $H(X) = -\sum_x p(x)\log_2 p(x)$ is proposed as a measure of uncertainty in the message $X$. For example, if $X$ is uniformly distributed over $2^n$ outcomes, then $H(X) = n$. This illustrates that entropy also quantifies the amount of information in a realization of $X$, where information is measured in bits (binary digits). The channel is then viewed as a machine that outputs a random variable $Y$ after receiving the message $X$, and this machine is completely characterized by the conditional distribution $p(y|x)$. To quantify the channel’s noise, conditional entropy is proposed: $H(X|Y) = -\sum_{x,y} p(x,y)\log_2 p(x|y)$; this measures how much uncertainty tends to remain in the original message $X$ after receiving the noisy message $Y$. Certainly, a noiseless channel would have $H(X|Y) = 0$, and so the amount of information received in one realization of $Y$ (aka $X$) would be $H(X)$. Intuitively, any remaining uncertainty in $X$ after receiving $Y$ would decrease this amount, and indeed, the average transmission rate is given by the mutual information $I(X;Y) = H(X) - H(X|Y)$. You could tweak the message distribution (i.e., use code words) in order to increase the average transmission rate, but the best you can do is the channel capacity $C = \max_{p(x)} I(X;Y)$. The paper’s main result, now called the noisy-channel coding theorem, says that for every transmission rate smaller than $C$, there exist codes which enable arbitrarily small error probabilities at the receiver.
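To make these quantities concrete, here’s a quick numerical sketch (my own illustration, not anything from Shannon’s paper) that computes $H(X)$, $H(X|Y)$ and $I(X;Y)$ for a binary symmetric channel with crossover probability $0.1$:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                                  # convention: 0 log 0 = 0
    return -np.sum(p * np.log2(p))

# Binary symmetric channel: Y equals X with prob 1 - eps, is flipped with prob eps
eps = 0.1
p_x = np.array([0.5, 0.5])                        # uniform message distribution
p_y_given_x = np.array([[1 - eps, eps],           # rows indexed by x, columns by y
                        [eps, 1 - eps]])

p_xy = p_x[:, None] * p_y_given_x                 # joint distribution p(x, y)
p_y = p_xy.sum(axis=0)                            # marginal distribution of Y

H_X = entropy(p_x)
H_X_given_Y = entropy(p_xy.ravel()) - entropy(p_y)   # chain rule: H(X|Y) = H(X,Y) - H(Y)
I_XY = H_X - H_X_given_Y                              # mutual information

print(H_X, H_X_given_Y, I_XY)   # 1.0, ~0.469, ~0.531 bits
```

Since the uniform input is optimal for this channel, the mutual information here is also the channel capacity: the theorem promises codes that communicate reliably at any rate below roughly $0.53$ bits per channel use.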
The insights in this paper laid the foundation for modern communication technology. To date, a whole host of codes have been tailored for various applications, including cell phones, ethernet, wireless communication, digital satellite television, and deep space communication. In these and all other applications I can think of, one can establish some sort of statistical distribution on the set of plausible messages. For example, the letter “e” probably makes up about 13% of the alphabetical characters in this blog entry. Also, the difference between neighboring pixels in a natural image is typically small, and an empirical distribution of these differences can be easily determined from a training set. However, unlike the message, it is not always appropriate to model the communication channel as a random process.
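As a quick illustration of the pixel-difference claim (just a sketch of mine; it assumes scikit-image and its bundled test image are available), one could estimate that empirical distribution as follows:

```python
import numpy as np
from skimage import data                        # assumes scikit-image is installed

img = data.camera().astype(int)                 # 512 x 512 grayscale test image
diffs = (img[:, 1:] - img[:, :-1]).ravel()      # horizontal neighboring-pixel differences

values, counts = np.unique(diffs, return_counts=True)
probs = counts / counts.sum()                   # empirical distribution of the differences
print("fraction with |difference| <= 5:", probs[np.abs(values) <= 5].sum())
```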
Suppose, for example, that the communication channel is a post office, and suppose that the post office employs my arch-nemesis, who plots to destroy my long-distance relationship with his sister. I write letters to her, and he intends to discard or even rewrite some of my words and phrases to make me look bad. I could get in cahoots with his sister by coming up with some sort of cipher (perhaps we’d devise one over the phone), but he’s so evil that he’d likely eavesdrop on the phone call! For such an adversarial channel, I have to assume that my message will arrive with worst-case (as opposed to average-case) noise. Thus, a random model for the channel would be inappropriate, and so Shannon’s theory is not directly applicable.
In that scenario, assuming my enemy will leave more than 75% of the characters untouched, we could defeat him using a Hadamard code. In general, error-correcting codes are the way to go here. Recently, I posted a paper on the arXiv that discusses how frames might be used in the case where a transmitted signal faces an adversary along with additive noise:
http://arxiv.org/abs/1202.4525
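Before getting to the paper, here’s a toy sketch of the Hadamard-code idea mentioned above (my own illustration over $\pm 1$ symbols, with an arbitrary code length of 64): any two distinct codewords differ in at least half of their entries, so flipping fewer than a quarter of them never fools a maximum-correlation decoder.

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an n x n {+1,-1} Hadamard matrix (n a power of 2)."""
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

n = 64                                   # codeword length
H = hadamard(n)
codebook = np.vstack([H, -H])            # 2n codewords with pairwise Hamming distance >= n/2

rng = np.random.default_rng(0)
msg = rng.integers(len(codebook))        # pick a message
sent = codebook[msg]

# Adversary flips just under 25% of the entries (i.e., leaves > 75% untouched)
received = sent.copy()
flip = rng.choice(n, size=n // 4 - 1, replace=False)
received[flip] *= -1

# Decode by maximum correlation with the codebook
decoded = int(np.argmax(codebook @ received))
print(decoded == msg)                    # True: fewer than n/4 flips are always corrected
```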
Here’s the problem: I’d like to convey a vector $x$ to my friend, and I plan to do this by transmitting some vector $y$ of larger dimension, but there’s an adversary between us who can remove any of the entries of $y$. This noise process is sometimes called active jamming. Of course, we have to impose some constraint on the adversary, since otherwise he could remove all of the entries and succeed in thwarting my communication. As such, we assume he is only capable of removing a proportion $p$ of the entries. In addition to adversarial erasures, we can expect additive noise due to round-off error, atmospheric effects, etc. To make life easier, let’s decide up front to build $y$ from $x$ according to a linear process: $y = \Phi^* x$, where $\Phi$ is a short, fat matrix (see, I told you it would come up eventually). To defeat both the adversary and the additive noise, we seek short, fat matrices with the following property (note that $x$ and $y$ here are different from Shannon’s):
Definition. Given $p \in (0,1)$ and $\kappa \geq 1$, an $M \times N$ matrix $\Phi$ is a $(p,\kappa)$-numerically erasure-robust frame (NERF) if for every $\mathcal{K} \subseteq \{1,\ldots,N\}$ of size $K := (1-p)N$, the corresponding $M \times K$ submatrix $\Phi_{\mathcal{K}}$ has condition number $\leq \kappa$.
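To get a feel for the definition, here’s a brute-force check (my own toy example, not from the paper) of the worst-case condition number over all size-$K$ column subsets of a small random matrix:

```python
import numpy as np
from itertools import combinations

def worst_condition_number(Phi, K):
    """Largest condition number over all M x K column submatrices of Phi."""
    N = Phi.shape[1]
    return max(np.linalg.cond(Phi[:, list(S)]) for S in combinations(range(N), K))

M, N, p = 4, 10, 0.3
K = int((1 - p) * N)                       # the receiver keeps K = (1-p)N entries
Phi = np.random.default_rng(1).standard_normal((M, N)) / np.sqrt(M)

kappa = worst_condition_number(Phi, K)
print(kappa)   # Phi is a (p, kappa)-NERF for this value of kappa (and no smaller)
```

Of course, this brute-force certification is only feasible for tiny $N$; the point of explicit constructions is to guarantee a good $\kappa$ without enumerating subsets.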
The idea is that the adversary will erase at most $pN$ of the entries of $y$, and so my friend at the receiver will only consider the other $K = (1-p)N$ entries, indexed by $\mathcal{K}$. Having $y_{\mathcal{K}} = \Phi_{\mathcal{K}}^* x + e_{\mathcal{K}}$ (where $e$ denotes the additive noise), my friend will estimate $x$ by applying the Moore-Penrose pseudoinverse: $\tilde{x} = (\Phi_{\mathcal{K}}^*)^\dagger y_{\mathcal{K}}$. Since $\Phi$ is a NERF, we can rest assured that $\Phi_{\mathcal{K}}$ will be well-conditioned regardless of which entries the adversary erases, and so $\tilde{x}$ will be a decent estimate of $x$ despite the additive noise.
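Putting the pieces together, here’s a toy end-to-end simulation of the scheme described above (the sizes, noise level, and erasure pattern are arbitrary choices of mine, and the erasures are random rather than adversarial):

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, p = 8, 32, 0.25
Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # short, fat encoding matrix

x = rng.standard_normal(M)                       # message vector
y = Phi.T @ x + 0.01 * rng.standard_normal(N)    # transmit with additive noise

# Adversary erases a proportion p of the entries (chosen at random for this demo)
erased = rng.choice(N, size=int(p * N), replace=False)
kept = np.setdiff1d(np.arange(N), erased)

# Receiver reconstructs from the surviving entries via the pseudoinverse
x_hat = np.linalg.pinv(Phi[:, kept].T) @ y[kept]
print(np.linalg.norm(x - x_hat) / np.linalg.norm(x))   # small relative error
```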
In this paper, we consider an array of methods to construct NERFs. Random matrices are NERFs, as are asymptotically maximal equiangular tight frames (ETFs); other constructions, such as group frames, also do the job for certain parameters, but they aren’t as successful. Overall, our constructions achieve erasure rates strictly below $\frac{1}{2}$. Interestingly, $p$ can approach $\frac{1}{2}$ in the ETF construction, but not in our random construction.
At the moment, we aren’t convinced that this “one-half barrier” is a mere artifact of our constructions.
Here’s the reasoning: No matrix with $\pm 1$ entries can be a NERF with $p \geq \frac{1}{2}$. Why? Because the adversary can always delete at most half of the columns and leave a rank-deficient (let alone well-conditioned) submatrix: observe the first two rows, in which corresponding entries are either equal or opposite, and simply delete the columns corresponding to the less popular relationship! The first two rows of whatever survives are then equal up to a global sign, so the surviving submatrix fails to have full rank. What’s more, $\pm 1$ matrices are particularly representative of all possible matrices. For instance, random matrix methods tend to apply to $\pm 1$ matrices without loss of effectiveness. It remains to be seen whether the one-half barrier is a true fundamental limit of NERFs. In our paper, we prove some (weaker) fundamental limits, but the one-half barrier remains elusive.
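To illustrate the argument, here’s a quick sketch of how the adversary would exploit an arbitrary $\pm 1$ matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 5, 20
Phi = rng.choice([-1, 1], size=(M, N))     # an arbitrary +/-1 matrix

# Columns where the first two rows agree vs. disagree
agree = np.flatnonzero(Phi[0] == Phi[1])
disagree = np.flatnonzero(Phi[0] != Phi[1])

# Erase the less popular type (at most half of the columns); keep the rest
kept = agree if len(agree) >= len(disagree) else disagree

sub = Phi[:, kept]                         # first two rows are now equal up to sign
print(np.linalg.matrix_rank(sub) < M)      # True: rank-deficient, so infinite condition number
```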