CS180 Project 1

Images of the Russian Empire

This project aims to colorize the Prokudin-Gorskii photo collection. In the early 1900s, Prokudin-Gorskii captured color photographs by recording three exposures of each scene on a glass plate through red, green, and blue filters. These plates were later digitized by the Library of Congress, and this project reconstructs the color images by separating the three exposures into color channels and then aligning them using various techniques.

Simple implementation

The most basic implementation is to divide the plate into three equal parts and overlay the corresponding channels. However, doing this makes it clear that the channels are not aligned.
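As a rough sketch of this step (assuming an 8-bit scan, skimage for I/O, and the usual top-to-bottom blue/green/red ordering of the exposures on the plate), the naive version might look like this:

```python
import numpy as np
import skimage.io as skio

def naive_color(plate_path):
    """Split the scanned plate into thirds and overlay them without alignment."""
    im = skio.imread(plate_path).astype(np.float64) / 255.0  # assumes 8-bit scan
    h = im.shape[0] // 3                        # height of one exposure
    b, g, r = im[:h], im[h:2*h], im[2*h:3*h]    # top, middle, bottom thirds
    return np.dstack([r, g, b])                 # stack into an RGB image
```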

Alignment: Naive implementation

So, how do we align these images? The simplest method is to brute-force check all x and y translations within a window (say -15 to +15 pixels) and see which resulting image is "best". To define "best", we have to choose an image matching metric. A simple criterion is the L2 norm, or Euclidean distance, where "best" is the shift that minimizes the distance between the channel being aligned and the alignment target. The Euclidean distance is defined as follows: \[\text{dist}=\sqrt{\sum(image_1-image_2)^2}\] where the sum of squared differences runs over all corresponding pixels of the two images. In our implementation, we drop the square root because it does not affect which shift minimizes the distance. Another metric is the normalized cross-correlation (NCC), defined as follows: \[\text{NCC} = \frac{\sum(image_1-\mu_1)\cdot(image_2-\mu_2)}{\sqrt{\sum(image_1-\mu_1)^2\sum(image_2-\mu_2)^2}}\] where \(\mu_1\) and \(\mu_2\) are the mean pixel values of the two images. The NCC is less sensitive to changes in illumination and contrast, so brightness offsets between the channels affect the score less than they would the Euclidean distance. Moving forward, all of the algorithms use NCC for this reason.
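As a minimal NumPy sketch of the metric and the brute-force search (function names and the window parameter are ours; note that np.roll wraps pixels around the border, which is one reason the cropping described next helps):

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two same-sized images."""
    a = a - a.mean()
    b = b - b.mean()
    return np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))

def align_exhaustive(channel, target, window=15):
    """Try every (dy, dx) shift in [-window, window]^2 and return the one
    that maximizes NCC against the target channel."""
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ncc(shifted, target)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```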

Alignment: Cropping

Upon inspection, it's clear that some plates are still misaligned. This is because of artifacts at the edges of the plates (blank borders and damage that differ between channels), which skew the score. To attain a much better alignment, we can simply crop out these edges. Moreover, we don't need the entire image to align the plates, so we can crop more aggressively and only score a smaller portion of the image. In our implementation, we crop 25% off of each side, effectively using only the central 25% of the image area for alignment. This results in faster and much better alignment. We also crop the outer 5% of the final image to partly remove the blank borders.
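A small helper along these lines (a sketch; the crop fraction is the 25% per side described above) can be applied to both images before scoring:

```python
def crop_interior(im, frac=0.25):
    """Keep only the central portion of a 2D image, discarding `frac`
    of the height and width on every side."""
    h, w = im.shape
    return im[int(frac * h): int((1 - frac) * h),
              int(frac * w): int((1 - frac) * w)]
```

In the exhaustive search above, the score then becomes `ncc(crop_interior(shifted), crop_interior(target))`, so the borders never influence the alignment.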

Image Pyramids

So cropping and exhaustively searching all alignments seems to work pretty well for the images so far. However, this method does not scale well and becomes incredibly expensive for large displacements (which is often the case for higher-resolution images). Thus, to accommodate higher-resolution images as well as make the algorithm more efficient in general, we implemented an image pyramid. An image pyramid represents the image at many different levels of resolution. This allows us to align at the coarsest resolution and then slowly step down the levels, performing finer adjustments at each one. In our implementation, we repeatedly blur and subsample the plate being aligned and the alignment target until either the height or width is below 32 pixels. We then align over a 5-pixel window and use the displacement vector (with its values doubled to match the resolution of the previous level) as a prior for aligning the previous pyramid level. We keep using the shift obtained from one level as a prior for aligning the previous level until we reach level 0, the original image. This yielded significantly faster results that are as good as the exhaustive search.
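A recursive sketch of this coarse-to-fine search (reusing the `ncc` and `crop_interior` helpers above, and assuming `skimage.transform.rescale` with `anti_aliasing=True` for the blur-and-subsample step):

```python
import numpy as np
from skimage.transform import rescale

def align_pyramid(channel, target, window=5, min_size=32):
    """Coarse-to-fine alignment: recurse on a half-resolution copy, double
    the resulting shift, then refine it over a small window at this level."""
    if min(channel.shape) < min_size:
        prior = (0, 0)                              # coarsest level: no prior
    else:
        small_c = rescale(channel, 0.5, anti_aliasing=True)
        small_t = rescale(target, 0.5, anti_aliasing=True)
        dy, dx = align_pyramid(small_c, small_t, window, min_size)
        prior = (2 * dy, 2 * dx)                    # scale shift up one level

    best_score, best_shift = -np.inf, prior
    for dy in range(prior[0] - window, prior[0] + window + 1):
        for dx in range(prior[1] - window, prior[1] + window + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ncc(crop_interior(shifted), crop_interior(target))
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```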

Results

These are the results of the final algorithm using NCC as the metric, aggressive cropping for better alignment, and an image pyramid for efficiency. One may notice that our alignment for Emir did not quite work out. We believe this is because our metric is based on the correlation of raw pixel values, and the red, green, and blue channels do not follow the same brightness trends in this photo. Looking at the three separate channels before alignment, the red and blue channels appear almost inverted relative to each other. Moreover, because we maximize the NCC, we are looking for a positive correlation rather than a negative one; it may have been preferable to maximize its magnitude instead.