This document describes and implements several of the most important operations in image processing: the discrete wavelet transform, quantization, and smoothing. I also propose adding noise to images so as to remedy quantization artifacts. Everything here is based on the Cohen-Daubechies-Feauveau 9/7 wavelet filter, the transform used in the JPEG-2000 standard, implemented with an efficient lifting transformation. I did not follow the standard exactly, but attempted to simplify it.
The core idea of the ennoising strategy is to estimate the amount of quantization noise in each subband, approximate it with normally distributed random noise, and re-introduce it during decompression. The result can be seen in the image below. On the left is the compressed image, suffering from blurring artifacts. In the center is the original image, and on the right is the compressed image in which Gaussian noise was added to the wavelet coefficients after de-quantization and before the IDWT (the images are slightly enlarged).

Downloadables:
You may view Mathematica notebooks with the MathReader.
Please note that this document doesn't cover all the features, only an extremely limited (but important) subset of them, mostly related to the basics of wavelet transforms and quantization. I still hope that it will be useful and educational as an introduction to applying wavelets to image compression and denoising.
This is the Cohen-Daubechies-Feauveau 9/7 wavelet filter, the transform used in the JPEG-2000 standard. We implement it with an efficient lifting transformation.
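For readers following along without the notebooks, here is a minimal sketch of the forward lifting step on a vector of even length. The lifting constants are the standard published CDF 9/7 values; the routine name dwt97 and the exact coding are my own. Boundary samples are mirrored inside the routine, which is equivalent to the symmetric vector extension discussed next; the low-pass and high-pass coefficients are returned re-interleaved.

    dwt97[v_List] := Module[
       {a = -1.586134342, b = -0.05298011854, g = 0.8829110762,
        d = 0.4435068522, k = 1.149604398, s, t},
       s = v[[1 ;; ;; 2]];  (* even-indexed samples x[2i]   *)
       t = v[[2 ;; ;; 2]];  (* odd-indexed samples  x[2i+1] *)
       t = t + a (s + Append[Rest[s], s[[-1]]]);   (* predict 1 *)
       s = s + b (Prepend[Most[t], t[[1]]] + t);   (* update 1  *)
       t = t + g (s + Append[Rest[s], s[[-1]]]);   (* predict 2 *)
       s = s + d (Prepend[Most[t], t[[1]]] + t);   (* update 2  *)
       Riffle[k s, t/k]]                           (* scale and re-interleave *)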
Vectors are extended to allow proper transformation with filters that reference the surroundings of every sample. The extension adds 4 elements, as required by the 9/7 filter above.
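For illustration, a hypothetical helper that performs such an extension explicitly: 4 mirrored samples on each side, without repeating the edge sample (whole-sample symmetry).

    extend[v_List] := Join[Reverse[v[[2 ;; 5]]], v, Reverse[v[[-5 ;; -2]]]]

    extend[{1, 2, 3, 4, 5, 6, 7, 8}]
    (* {5, 4, 3, 2, 1, 2, 3, 4, 5, 6, 7, 8, 7, 6, 5, 4} *)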
Note that the transformation is reversible regardless of the vector extension.
But note that the coefficient magnitudes are lower with vector extension.
The high-pass and low-pass subbands are interleaved after the transform, but it is often desirable to deinterleave them into separate vectors for further processing.
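A pair of hypothetical helpers for this, assuming the low-pass samples occupy the odd positions of the interleaved vector:

    deinterleave[v_List] := {v[[1 ;; ;; 2]], v[[2 ;; ;; 2]]}  (* {low, high} *)
    interleave[{lo_List, hi_List}] := Riffle[lo, hi]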
The wavelet transform is especially useful for transforming images. For this, we apply it twice, as in the JPEG-2000 standard: first on the columns, then on the rows. After this, we deinterleave the image matrix and may recursively transform each subband further.
We can now import the standard benchmark picture, sized 512x512:
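A sketch of the import step; the file name is only a placeholder, and a greyscale image is assumed (ImageData is available in Mathematica 7 and later):

    img = Import["lena512.png"];
    data = ImageData[img, "Byte"];   (* 512 x 512 matrix of values in [0, 256) *)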
We will only work with a small but diverse part of the image, combining sharp features, smooth gradients, and complex textures. Note that we convert the unsigned byte values [0, 256) to signed values in the interval [-128, 128). This way we can exploit gradient symmetry in quantization.
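A sketch of this step; the offsets of the 128 x 128 section are arbitrary placeholders:

    part = data[[129 ;; 256, 129 ;; 256]] - 128.;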
By transposing the image, we can easily implement the 2D DWT:
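A minimal sketch of the separable 2-D step, built on the dwt97 sketch above: transform the columns first, then the rows.

    dwt2D[m_] := Map[dwt97, Transpose[Map[dwt97, Transpose[m]]]]
    coeffs = dwt2D[part];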
It is very useful to interleave and deinterleave images too. We do this by creating four subimages, one for each of the LL, LH, HL, and HH subbands.
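Hypothetical helpers for the 2-D case: pull the four subbands out of the interleaved coefficient matrix and reassemble them. Which corner holds LL, and which of the mixed subbands is called LH or HL, depends on the interleaving convention of the 1-D transform; the labels below follow one common choice.

    subbands[m_] := {m[[1 ;; ;; 2, 1 ;; ;; 2]],   (* LL *)
                     m[[1 ;; ;; 2, 2 ;; ;; 2]],   (* LH *)
                     m[[2 ;; ;; 2, 1 ;; ;; 2]],   (* HL *)
                     m[[2 ;; ;; 2, 2 ;; ;; 2]]}   (* HH *)
    assemble[{ll_, lh_, hl_, hh_}] :=
      Riffle[MapThread[Riffle, {ll, lh}], MapThread[Riffle, {hl, hh}]]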
Now that we have everything ready, we can embark on processing the chosen section of the image.
Let's show all four subbands:
The transformation is reversible, as we now show.
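A round-trip check of this kind, assuming an inverse routine idwt2D that undoes dwt2D (not shown here); the residual should be at machine precision:

    Max[Abs[idwt2D[dwt2D[part]] - part]]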
Now, let's try to denoise the image by chopping away near-zero values (low-power white noise) in the LH, HL and HH subbands, and subsequently reconstructing the image. Note that we chop away more in the HH subband than in the LH and HL, but we didn't have to. This technique is called hard thresholding.
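A minimal sketch of hard thresholding on one subband; the threshold t is chosen per subband:

    hardThreshold[band_, t_] := Map[If[Abs[#] < t, 0., #] &, band, {2}]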
Soft thresholding instead shrinks the magnitude of the coefficients that exceed the threshold, but it performs worse in our example.
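The corresponding soft-thresholding sketch, which shrinks every coefficient towards zero by t:

    softThreshold[band_, t_] := Map[Sign[#] Max[Abs[#] - t, 0.] &, band, {2}]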
Image compression is based on a slightly different concept: quantization. Instead of having 256 levels of grey, we might have only 16. Such an image would look bad on its own, but we can do just this on the LH, HL and HH subbands without losing much. All we're going to do is round, with a different amount depending on the subband. This is called deadzone (magnitudes are rounded down) uniform (division by a single number) scalar (it operates on individual color components of individual pixels, which are scalars) quantization. Let's see:
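A sketch of the quantizer and dequantizer for one subband with step delta; reconstructing at the middle of each bin is one common convention, not the only one:

    quantize[band_, delta_] := Map[Sign[#] Floor[Abs[#]/delta] &, band, {2}]
    dequantize[qband_, delta_] :=
      Map[If[# == 0, 0., Sign[#] (Abs[#] + 0.5) delta] &, qband, {2}]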
As you can see, not that much quality was lost, especially considering that the above picture is enlarged on the screen! But the final file size, thanks to the skipped bits, is (compare with 100%):
This might not seem much, but note that the remaining information is much more (i.e. orders of magnitude more) compressible with ordinary entropy coding techniques (usually preceded by run-length coding), as we also denoised the image! Furthermore, we can recursively filter the LL subband again, and repeat the procedure on LLLH, LLHL, and LLHH.
Examine the subbands after quantization. Note that the LH, HL and HH subbands have become extremely simple and very well compressible.
We notice that the sharp edge information remains in the LH, HL and HH subbands, but most of the lower amplitude noise, as well as the exact magnitudes of the edges have been compromised.
Let us automate some of the earlier operations. Specifically, we will perform the DWT recursively in each subband. This is sometimes called a packet decomposition (or a SPACL decomposition, if we recursively subdivided the LLLL once or twice more), unlike the above 1-level Mallat decomposition, in which the high-pass subbands remain undecomposed. Decompositions can be performed recursively: if we performed a Mallat decomposition again on the LL subband, we would obtain a 2-level Mallat decomposition. The decomposition below is actually a 2-level packet decomposition: 2-level because we performed it twice recursively, and a packet decomposition because it was performed on all subbands.
We can see that the LH*, HL* and HH* subbands appear to be quite uninformative. It's time to try to understand what the wavelet transform is really about. The low-pass filter is a kind of predictor: it tells us what we can say about the average value of a group of pixels. The high-pass filters, on the other hand, store the errors made by the low-pass predictor. LH and HL store the errors we make when we try to get the average values of column errors across rows, or vice versa, whereas HH stores the errors of LH and HL.
We see that the LL pass over the LH, HL and HH subbands tends towards an average of 0. This is a good indicator of noise: irregularities that appear only at the highest resolution and cannot be generalized to a wider area. However, we must note that people do recognize the absence of noise. The main difference between the original and the compressed image is that the compressed one is too smooth.
In the following experiments, we will try to capture the regularities of the noise. To do this, we will need a different kind of transform.
Let's examine the error introduced by quantization. We notice that the error is pretty much random, with the exception of a few features in the HH band. The idea we will now pursue is to try to describe this error and re-introduce it in the decompressed image.
Let us now examine the frequency distributions of error:
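A minimal way to look at one such distribution; lhBand and lhDequantized are placeholder names for the original and dequantized LH coefficients (Histogram requires Mathematica 7 or later):

    Histogram[Flatten[lhBand - lhDequantized]]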
The LH and HL band distributions are narrower because of the lower level of quantization, and they are not very Gaussian: this is because some noise got through quantization in those two bands. We could have been more radical when quantizing, discarding more than 4 bits without losing much information. But let's not dwell on that; let's now capture the noise (assuming it is Gaussian) and reintroduce it into the image.
There is an important consideration to be made: a part of the quantization noise is indeed noise, but a part is lost information. The lost information shouldn't be considered part of the noise, as we cannot reintroduce it. Thus, so as not to overestimate the noise, we divide the estimated standard deviation by 2.
The noise description is very compact and adds very little to the compressed file size, especially considering that the mean is always approximately 0 and does not need to be stored.
Here we add the noise to the compressed image:
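A sketch of the ennoising step for one subband, with hhBand and hhDequantized as placeholder names for the original and dequantized HH coefficients: measure the residual's standard deviation, halve it as described above, and add zero-mean Gaussian noise to the dequantized coefficients.

    sigma = StandardDeviation[Flatten[hhBand - hhDequantized]]/2;
    hhEnnoised = hhDequantized +
       RandomVariate[NormalDistribution[0, sigma], Dimensions[hhDequantized]];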
And now we compare the "en-noised" compressed image to the original image and the normal "de-noised" compressed image:
Although the en-noised image does look better than the de-noised one, we note that there is more noise than there was in the beginning. This is partly because we equated some information lost in quantization with noise, and partly because there are noisier and less noisy areas in the image. For example, the skin, as a less noisy part of the image, has received more noise than it originally had, whereas the edges and sharp features received too little. We corrected for this with the factor of 2 (dividing the standard deviation by 2), which makes the skin appropriately smooth but the hat feathers excessively smooth.
We would have achieved better results by decomposing in the spatial domain as well. For example, the texture of the hat is noisier than the texture of the skin, so the noise distribution should be computed separately there. There is another important but as yet unsolved problem: a part of the quantization noise is indeed noise, but a part is lost information. The lost information shouldn't be considered part of the noise, as we cannot reintroduce it.
Now, we compute the standard deviations of quantization residuals at different resolutions. We will call these "variance maps".
We will need a routine that bunches together neighboring pixels:
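A hypothetical routine of this kind: cut the residual into non-overlapping b x b blocks and return the standard deviation inside each block, i.e. the variance map at resolution b.

    varianceMap[residual_, b_] :=
      Map[StandardDeviation[Flatten[#]] &, Partition[residual, {b, b}], {2}]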
It seems that the variance map at 16-pixel resolution works best: not too much detail, yet not too little. Let's now apply the variance map at resolution 16 for noise reconstruction. We divide the standard deviations by 2, or else the amount of noise is overestimated. It is important to realize that only the noise should be reintroduced, yet noise is not the only thing lost in quantization.
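A sketch of the blockwise ennoising at resolution 16, reusing the placeholder names from above: each 16 x 16 block of the dequantized subband receives Gaussian noise with half of that block's residual standard deviation; blocks with zero residual are left untouched.

    blocks = Partition[hhDequantized, {16, 16}];
    sigmas = varianceMap[hhBand - hhDequantized, 16]/2;
    hhEnnoised = ArrayFlatten[MapThread[
        If[#2 == 0., #1,
           #1 + RandomVariate[NormalDistribution[0, #2], {16, 16}]] &,
        {blocks, sigmas}, 2]];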
Although the noise levels on the skin and in the feathers are now more correct, there are still visible transitions between different noise levels. More troubling, there is too much noise in the areas around sharp edges. Further work should address proper computation of noise levels around sharp edges.
We notice that we do not need to compute the standard deviations in such an expensive way: it is merely a matter of averaging absolute residuals. We might also wonder whether we could describe the noise better with a SPACL decomposition.
One interesting approach would be discarding the noisy high-pass subband of variance and quantizing the low-pass subbands of variance.
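For completeness, a sketch of the cheaper per-block estimate mentioned above: the mean absolute residual, which for zero-mean Gaussian noise equals sigma Sqrt[2/Pi], so a constant factor recovers the standard deviation.

    absResidualMap[residual_, b_] :=
      Map[Mean[Abs[Flatten[#]]] &, Partition[residual, {b, b}], {2}]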