Part 1.1: Convolutions From Scratch
Convolutions are central to image processing, so we implemented one using numpy! Because this is a naive implementation that doesn't take advantage of the Fourier Transform, it runs in \(O(N^2k^2)\) time for an \(N \times N\) image and a \(k \times k\) kernel, whereas an FFT-based convolution would take \(O(N^2 \log N)\).
We implemented zero padding with support for each of the modes ("full", "valid", and "same") that scipy.signal.convolve2d supports. The scipy function also offers several options for boundary handling, but it defaults to a zero fill. We opted to implement only the zero fill, so our function has slightly less functionality.
import numpy as np
import skimage as sk
import skimage.color  # ensure the color submodule is loaded
import skimage.io as skio

def load_image_as_grey(path: str) -> np.ndarray:
    """Read the image at path, convert it to greyscale, and return it as a float numpy array."""
    im = skio.imread(path)
    # Drop the alpha channel if present, then convert RGB to greyscale
    if im.ndim == 3 and im.shape[-1] == 4:
        im = sk.color.rgba2rgb(im)
    if im.ndim == 3:
        im = sk.color.rgb2gray(im)
    im = sk.img_as_float(im)
    return im
def convolve(img: str, kernel: np.ndarray, shape: str = 'same') -> np.ndarray:
    """
    Convolves an image with a specified kernel.

    Args:
        img (str): path to image file
        kernel (ndarray): 2D matrix representing the convolution kernel
        shape (str): 'full', 'same', or 'valid'. Determines how to pad the image and the output dimensions
    """
    img = load_image_as_grey(img)
    imgH, imgW = img.shape
    kH, kW = kernel.shape
    padH = kH // 2
    padW = kW // 2
    match shape:
        case 'full':
            # Every position where the kernel overlaps the image at all
            img_padded = np.pad(img, ((kH - 1, kH - 1), (kW - 1, kW - 1)), mode='constant', constant_values=0)
            outH, outW = imgH + kH - 1, imgW + kW - 1
        case 'valid':
            # Only positions where the kernel fits entirely inside the image
            img_padded = img
            outH, outW = imgH - kH + 1, imgW - kW + 1
        case _:
            # 'same' (also the default): output matches the input size
            img_padded = np.pad(img, ((padH, padH), (padW, padW)), mode='constant', constant_values=0)
            outH, outW = imgH, imgW
    outImg = np.zeros((outH, outW))
    # Flip the kernel, or else this is a cross-correlation
    kernel = np.flip(kernel, (0, 1))
    for i in range(outH):
        for j in range(outW):
            region = img_padded[i:i + kH, j:j + kW]
            outImg[i, j] = np.sum(region * kernel)
    return outImg
We ran our convolution algorithm using a 9x9 box filter as well as the finite difference kernels, which are defined below. \[D_x=\begin{bmatrix}1 & 0 & -1\end{bmatrix}\] \[D_y=\begin{bmatrix}1 \\ 0 \\ -1\end{bmatrix}\]
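For reference, a minimal sketch of how we invoke our function is below; the image path is a placeholder.

box = np.ones((9, 9)) / 81.0           # 9x9 box filter, normalized to sum to 1
D_x = np.array([[1, 0, -1]])           # horizontal finite difference
D_y = np.array([[1], [0], [-1]])       # vertical finite difference

blurred = convolve('path/to/image.png', box, shape='same')
grad_x = convolve('path/to/image.png', D_x, shape='same')
grad_y = convolve('path/to/image.png', D_y, shape='same')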




We then ran the same convolutions using scipy.signal.convolve2d and generated the following difference images between our output and scipy's output.

MSE = 5.8336224878156714e-33

MSE = 0.0

MSE = 0.0
We see the output is identical for the finite difference operators and almost identical for the box filter, except in the exceptionally bright parts of the image; we believe this error is the result of floating-point rounding.
Moreover, because our implementation loops over every pixel in pure Python, it is much slower than scipy's compiled alternative.
For the box filter, our implementation took 50.9s while the scipy function took 1.7s. That's roughly 30 times slower!
Moving forward, we will be using scipy.signal.convolve2d with the mode set to "same".
Part 1.2: Finite Difference Operator
Sometimes, we are interested in finding the edges in an image. A typical metric used to find edges is the gradient magnitude. \[\text{Gradient Magnitude}=\sqrt{\left(\frac{\partial}{\partial{x}}\text{Image}\right)^2+\left(\frac{\partial}{\partial{y}}\text{Image}\right)^2}\] We can calculate the partial derivatives using the finite difference operators defined earlier; because images are discrete, convolving with these kernels computes a numerical approximation of the partial derivatives. Then, to create an edge detector, we threshold the gradient magnitude to binarize the image. This threshold was determined qualitatively to balance suppressing noise against detecting real edges.
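A sketch of this edge detector is below, assuming a greyscale float image img; the threshold of 0.1 is purely illustrative, not the value we settled on.

import numpy as np
from scipy.signal import convolve2d

D_x = np.array([[1, 0, -1]])
D_y = np.array([[1], [0], [-1]])

def edge_map(img: np.ndarray, threshold: float = 0.1) -> np.ndarray:
    # Numerical partial derivatives via the finite difference kernels
    dx = convolve2d(img, D_x, mode='same')
    dy = convolve2d(img, D_y, mode='same')
    # Gradient magnitude, binarized with the chosen threshold
    return np.sqrt(dx ** 2 + dy ** 2) > threshold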





Part 1.3: Derivative of Gaussian (DoG) Filter
The results from the difference operators are rather noisy. To make our edge detector more robust to this noise, we can blur the image before convolving it with the finite difference operators.
Thus, we convolve it with a Gaussian kernel made by taking the outer product of the 1D kernel from cv2.getGaussianKernel (kernel size 9, sigma 1.5) with itself.
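A sketch of this step, reusing img, D_x, and D_y from the earlier sketch:

import cv2

# 1D Gaussian; its outer product with itself gives the 2D kernel
g1d = cv2.getGaussianKernel(ksize=9, sigma=1.5)
gauss2d = g1d @ g1d.T

blurred = convolve2d(img, gauss2d, mode='same')
dx = convolve2d(blurred, D_x, mode='same')
dy = convolve2d(blurred, D_y, mode='same')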






We can see that the edges are much stronger and slightly smoother.
Performing two convolutions on a large image can take time, but there's a way to make this calculation much cheaper.
Because convolution is associative, we can convolve the Gaussian and finite difference kernels together first before applying the result to the image. The difference and Gaussian kernels are much smaller than the image, so this is much more efficient.
This is known as the Derivative of Gaussian (DoG) filter.
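A sketch of this optimization, reusing gauss2d, D_x, and D_y from above; convolve2d's default 'full' mode keeps the entire combined kernel.

# Combine the small kernels first, then convolve the large image only once per direction
DoG_x = convolve2d(gauss2d, D_x)   # 9x11 kernel
DoG_y = convolve2d(gauss2d, D_y)   # 11x9 kernel
dx = convolve2d(img, DoG_x, mode='same')
dy = convolve2d(img, DoG_y, mode='same')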






Comparing the results of blurring and then differentiating against the single Derivative of Gaussian filter, we see the outputs are essentially identical, with only very slight differences.

MSE=0.0008325068109545628
Part 2.1: Image "Sharpening"
High frequencies are what make images look sharp. Thus, we can make an image look sharper by increasing the weight of the high frequencies. To do this, we first apply a low pass filter (i.e., convolve the image with a Gaussian) and then subtract the result from the original image. This leaves us with only the high frequencies that the filter removed. We can then multiply this high frequency image by some sharpening factor \(\alpha\) and add it back to the original image. Combining this whole process into a single convolution results in what is known as the unsharp mask filter. \[\text{Unsharp mask filter}=(1+\alpha)e-\alpha g\] \[\text{Sharpened image}=f*((1+\alpha)e-\alpha g)\] Where \(\alpha\) is the sharpening factor, \(f\) is the image, \(e\) is a unit impulse, and \(g\) is a Gaussian kernel.
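A sketch of the single-convolution version, reusing gauss2d and convolve2d from above; the clip to [0, 1] is our own addition to keep the result a valid float image.

def unsharp_mask(img: np.ndarray, gauss2d: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    # Unit impulse the same size as the Gaussian kernel
    impulse = np.zeros_like(gauss2d)
    impulse[gauss2d.shape[0] // 2, gauss2d.shape[1] // 2] = 1.0
    # The unsharp mask filter: (1 + alpha) * e - alpha * g
    usm = (1 + alpha) * impulse - alpha * gauss2d
    return np.clip(convolve2d(img, usm, mode='same'), 0, 1)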








We can see that the Taj Mahal image is definitely much sharper. The moon photo has fewer high frequencies, so it may not seem to have sharpened as much. We can fix this by adjusting the sharpening factor \(\alpha\).






The craters are much more defined now at \(\alpha=5\), but the Taj Mahal looks a lot worse at \(\alpha=5\). Thus, it's important to choose the right \(\alpha\) for your image. Now, can we use this method to recover images? We attempted this by blurring an image and then applying the unsharp mask filter to see whether the lost details could be recovered. Intuitively, this shouldn't work: the unsharp mask filter only amplifies existing high frequencies, so once we remove some high frequencies, there is no way for them to be recovered.



This does make the branches more defined, but all the high frequencies in the leaves are lost.
Part 2.2: Hybrid Images
We can create hybrid images by combining the low frequencies of one image with the high frequencies of another. When low frequencies dominate (i.e., the viewer is far away) we see one image, and when high frequencies dominate (i.e., the viewer is close) we see the other. We can do this quite easily by putting one image through a low pass filter (convolving with a Gaussian), putting the second through a high pass filter (the image minus the image convolved with a Gaussian), and averaging the pixels to produce a hybrid image.
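A sketch of this process; g_low and g_high are stand-ins for 2D Gaussian kernels whose sizes and sigmas set the two cutoff frequencies.

def hybrid(im_far: np.ndarray, im_near: np.ndarray,
           g_low: np.ndarray, g_high: np.ndarray) -> np.ndarray:
    # Low frequencies of the image seen from far away
    low = convolve2d(im_far, g_low, mode='same')
    # High frequencies of the image seen up close: image minus its blurred self
    high = im_near - convolve2d(im_near, g_high, mode='same')
    # Average the pixels to produce the hybrid
    return (low + high) / 2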
















Part 2.3: Gaussian and Laplacian Stacks
In this part we create Gaussian and Laplacian stacks. A Gaussian stack is a stack of images, all kept at the original size, where each level is a progressively more blurred version of the previous one; unlike a pyramid, there is no downsampling. We build it by applying progressively bigger Gaussian kernels to each level to create the next. A Laplacian stack is created by subtracting the next level of the Gaussian stack from the current level, which essentially separates the frequencies of an image into multiple layers of the stack. The final layer of the Laplacian stack is simply the final layer of the Gaussian stack, as there is nothing to subtract from it.
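A sketch of both stacks; the sigma-doubling schedule is one simple choice, not necessarily the exact kernels we used.

def gaussian_stack(img: np.ndarray, levels: int) -> list:
    # Every level stays at full size; each is a blurrier copy of the last
    stack = [img]
    for i in range(1, levels):
        sigma = 2 ** i                 # progressively bigger Gaussians
        ksize = int(6 * sigma) | 1     # odd kernel size covering ~3 sigma per side
        g1d = cv2.getGaussianKernel(ksize, sigma)
        stack.append(convolve2d(stack[-1], g1d @ g1d.T, mode='same'))
    return stack

def laplacian_stack(g_stack: list) -> list:
    # Each layer holds the frequencies between adjacent Gaussian levels;
    # the final layer is the final Gaussian level itself
    diffs = [g_stack[i] - g_stack[i + 1] for i in range(len(g_stack) - 1)]
    return diffs + [g_stack[-1]]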














Part 2.4: Multiresolution Blending
Now we can blend images using the Laplacian stacks we just made. To do this, we create stacks for both images and the mask (a Gaussian stack for the mask).
We combine them by going through each layer of the stacks, multiplying each layer of image1 by the corresponding layer of the mask and each layer of image2 by one minus that layer of the mask.
We then sum all the layers together to get our blended image.
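A sketch of the blending loop, reusing the stack helpers from above; five levels is an illustrative default.

def blend(im1: np.ndarray, im2: np.ndarray, mask: np.ndarray, levels: int = 5) -> np.ndarray:
    l1 = laplacian_stack(gaussian_stack(im1, levels))
    l2 = laplacian_stack(gaussian_stack(im2, levels))
    gm = gaussian_stack(mask, levels)   # blurred masks soften the seam
    # Blend each layer with the mask, then sum the layers to reconstruct
    layers = [gm[i] * l1[i] + (1 - gm[i]) * l2[i] for i in range(levels)]
    return np.clip(sum(layers), 0, 1)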
































