This was meant to be the final post in the series but while writing it I realised that I had too much to say and as such, I decided to split this into more parts (I still don't know how many...). If you feel lost while reading, try to remind the basic concepts from the previous posts:
- Statistical Physics of Error-Correcting Codes #1:
- Statistical Physics of Error-Correcting Codes #2: Statistical Physics Overview
Also, take a look at the references therein. Now, at the end of post #2 on this subject I said that when working with statistical physics we are always interested in what happens in the
thermodynamic limit, by which we understand the limit when the number of units
N in our system is very large. This is not always true actually. There is a lot of interest in what is called
finite size effects in statphys, which are the effects that appear when we consider
N not being large. Although the difference between large and not large may seem not very well defined, mathematically it can be translated as taking the limit of
N being infinite.
Taking the system size to infinity may seem radical and even unnatural, but a bit of practice with calculations is enough for everyone to see that one
mol of atoms is
for all practical purposes (
fapp, according to
John Bell's definition) infinite. Even smaller numbers, like one-billion or one-million may be close enough to infinite to render any deviation completely negligible in most practical situations. Now think about the last film you downloaded (I know you did it, but I will not tell anyone). It was probably of the order of 1G, right? This means that it has about 10 billion bits. This may not be as big as a mol, but still is big enough to allow us to consider this system already in the thermodynamic limit fapp.
Consider then some very large file. To be consistent with post #1 let us call it a message
t. In the same way as the electron spin, bits can assume only two values. The former can assume +1 and -1, while the latter can assume 0 and 1. For calculational convenience, it is usual to work with the former representation (specially because statistical physicists were used to do that way before they started to look at bits). We can transform between these two representations in two ways. The first is the linear relation
\[\sigma=2x-1,\]
where $\sigma\in\{\pm1\}$ is the spin variable while $x\in\{0,1\}$ is the binary variable. You can see that 0 is mapped to -1 and 1 to +1 as would seem more natural. However, there is a second way to map these variables which seems less natural, but actually is much more beautiful, namely
\[\sigma=(-1)^x.\]
And where is the beauty?, you may ask as it is not only non-linear but also maps 0 to +1 and 1 to -1, which is kind of weird. In fact, the above mapping is a
homomorphism between the
Galois field of order 2 and the square roots of unity!
Let me explain this better. Isolated bits are usually summed using a sum mod 2 operation, also known as
exclusive OR or simply
XOR. The operation is defined by the following table at the right and many times it is expressed using the symbol $\oplus$ to differentiate it from the normal + sign. That's the notation I am going to use. By also defining the product of two bits to be the usual product of two numbers, the bits form a mathematical structure called a
finite field or a
Galois field and its order is the number of elements, in this case 2. But under this addition mod 2, this so-called
binary field is also a
group. The two square roots of 1 also form a group under the usual multiplication, and the non-linear mapping defined above maps one group to the other preserving the group operation. This can be easily seen by writing the mapping in a more formal way as $\sigma(x) = (-1)^x$ and observing that
\[\sigma(x_1\oplus x_2) = (-1)^{x_1\oplus x_2} = (-1)^{x_1}(-1)^{x_2}=\sigma(x_1)\sigma(x_2).\]
Homeomorphisms are always beautiful. This one in special can be even generalised to higher orders. When the Galois field has a prime order p, the addition operation of the field can be taken to be the addition mod p (be careful, as it doesn't work for a general order p!) therefore representing the elements by the integers from 0 to p-1. The mapping can then be generalised to
\[\sigma(x)=\exp\left(\frac{2\pi i}{p} x\right).\]
I will leave to you to check that this is indeed a homeomorphism. This formula can be interpreted as a mapping from the Galois field to spins pointing in some direction, as the exponentials are nothing more than the p-th complex roots of unit. I guess that all this beautiful mathematical structure is enough to justify using the non-linear mapping instead of the linear one, and that is what I am going to do from now on.
Keeping the potential for generalisation in the back of our minds for now, let's come back to the case of bits, the binary field. There are three classic
ensembles in statistical physics. The word ensemble refers to a particular situation of study where some microscopic variables are allowed to vary around a mean value and others are kept fixed. When we fix the number of particles (or units, I will use it particles many times as I am a physicist and some habits are difficult to lose) and the
average value of the energy, we call this the
canonical ensemble, and this will imply that the temperature of the system will be kept constant. There are many ways to show (my favorite being the maximum entropy that you can find in
Jaynes book) that when the system is in equilibrium the canonical ensemble is describedby the Gibbs distribution
\[P(\sigma)=\frac{e^{-\beta H(\sigma)}}{Z},\]
where $H$ is the Hamiltonian (energy, cost function etc) of the system. $Z$ is the
partition function and is one of the stars of the show. Actually, from the partition function we can derive all important properties of the system.
Then, our first task will be to find out what is the analogue of the Hamiltonian in our case. Let's first understand what is that we want when analysing error-correcting codes. The interesting thing is to understand how much noise the code can stand before we cannot perfectly recover the message. We will not accept errors in the recovered message because if we do, the message will degrade each time we decode it. Imagine that in your HD. If you access to much a document, you loose it. Just to give the right credits here, the first person who noticed the mapping between codes and spin systems was Nicola Sourlas back in 1989:
However, we will deal with a slightly different model from his. Our basic idea will be to use
Bayes' rule to infer the original codeword from the corrupted one. Once we have the codeword, the original message is automatically retrieved as the correspondence is 1-to-1. Let us call the dynamical variable representing the possible codewords by $\tau$ and the received codeword, already corrupted by noise, by $r$. Then, according to Bayes' rule we have
\[P(\tau|r)=\frac{P(r|\tau)P(\tau)}{Z}.\]
The first term in the numerator is the so-called likelihood of $\tau$ and represents the action of the noise in the codeword. Therefore, this term contains the noise model of the channel. The second term is a prior distribution over $\tau$, where we include any information we have that may help in the decoding. Finally, $Z$ is just the normalisation factor, although you probably already noticed that I am using the same notation as for the partition function because that is exactly what it will be.
In our case of parity-check codes, we know that the codeword is in the kernel of the parity-check matrix. That's the only prior information we have about the codeword and we will symbolise it by the indicator function $\chi(A,\tau)$ which will be 1 when $\tau$ is in the kernel of $A$ and zero otherwise. There are many ways to include this term in the calculation, but the most physical one is to see it as an interaction term between each coordinate of $\tau$. Then, we can write the probability of $\tau$ as
\[P(\tau)=\frac{e^{-\beta H}}{Z},\]
where
\[H=-\ln P(r|\tau)-\ln \chi(A,\tau).\]
This finally has the form of a Gibbs probability. The inverse temperature $\beta$ was introduced for convenience and can be taken to be 1 to coincide with the original problem. The first term of the Hamiltonian usually factorises in the spins, with each component of $r$ acting in a different component of $\tau$, which confers them the role of local fields.
There are many subtleties in what I wrote above, but the general idea is that. Bayes' rule allows us to make the connection with the statistical physics of the problem. In the next post I will enter into the details of this formulation and show how the methods of statistical physics allow us to extract what we want from the problem.