Part 1.1: Convolutions From Scratch
Convolutions are central to image processing, so we implemented one using numpy! Because this is a naive implementation that doesn't take advantage of the Fourier Transform, it runs in \(O(N^2k^2)\) time for an \(N \times N\) image and a \(k \times k\) kernel, whereas an FFT-based convolution would take \(O(N^2 \log N)\).
We implemented zero padding with support for each of the modes ("full", "valid", and "same") that scipy.signal.convolve2d supports. The scipy function also offers several options for boundary handling, but it defaults to a zero fill. We opted to implement only the zero fill, so our function has slightly less functionality.
import numpy as np
import skimage as sk
import skimage.color  # ensure the color submodule is loaded
import skimage.io as skio

def load_image_as_grey(path: str) -> np.ndarray:
    """Read the image at path, convert it to greyscale, and return it as a float numpy array."""
    im = skio.imread(path)
    # Drop the alpha channel if present, then convert RGB to greyscale
    if im.ndim == 3 and im.shape[-1] == 4:
        im = sk.color.rgba2rgb(im)
    if im.ndim == 3:
        im = sk.color.rgb2gray(im)
    im = sk.img_as_float(im)
    return im
def convolve(img: str, kernel: np.ndarray, shape: str = 'same') -> np.ndarray:
    """
    Convolves an image with a specified kernel.

    Args:
        img (str): path to image file
        kernel (ndarray): 2D matrix representing the convolution kernel
        shape (str): 'full', 'same', or 'valid'. Determines how to pad the image and the output dimensions
    """
    img = load_image_as_grey(img)
    imgH, imgW = img.shape
    kH, kW = kernel.shape
    padH = kH // 2
    padW = kW // 2
    match shape:
        case 'full':
            # Every position where the kernel overlaps the image at all
            img_padded = np.pad(img, ((kH - 1, kH - 1), (kW - 1, kW - 1)), mode='constant', constant_values=0)
            outH, outW = imgH + kH - 1, imgW + kW - 1
        case 'valid':
            # Only positions where the kernel fits entirely inside the image
            img_padded = img
            outH, outW = imgH - kH + 1, imgW - kW + 1
        case _:
            # 'same' (also the default): output matches the input size
            img_padded = np.pad(img, ((padH, padH), (padW, padW)), mode='constant', constant_values=0)
            outH, outW = imgH, imgW
    outImg = np.zeros((outH, outW))
    # Flip the kernel, or else this is a cross-correlation
    kernel = np.flip(kernel, (0, 1))
    for i in range(outH):
        for j in range(outW):
            region = img_padded[i:i + kH, j:j + kW]
            outImg[i, j] = np.sum(region * kernel)
    return outImg
We ran our convolution algorithm using a 9x9 box filter as well as the finite difference kernels, which are defined below. \[D_x=\begin{bmatrix}1 & 0 & -1\end{bmatrix}\] \[D_y=\begin{bmatrix}1 \\ 0 \\ -1\end{bmatrix}\]
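For reference, a minimal sketch of how we invoke our function is below; the image path is a placeholder.

box = np.ones((9, 9)) / 81.0           # 9x9 box filter, normalized to sum to 1
D_x = np.array([[1, 0, -1]])           # horizontal finite difference
D_y = np.array([[1], [0], [-1]])       # vertical finite difference

blurred = convolve('path/to/image.png', box, shape='same')
grad_x = convolve('path/to/image.png', D_x, shape='same')
grad_y = convolve('path/to/image.png', D_y, shape='same')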




We then ran the same convolutions using scipy.signal.convolve2d and generated the following difference images between our output and scipy's output.

MSE = 5.8336224878156714e-33

MSE = 0.0

MSE = 0.0
We see the output is identical for the finite difference operators and almost identical for the box filter, except in the exceptionally bright parts of the image; we believe this error is the result of floating-point rounding.
Moreover, because our implementation loops over every pixel in pure Python, it is much slower than scipy's compiled alternative.
For the box filter, our implementation took 50.9s while the scipy function took 1.7s. That's roughly 30 times slower!
Moving forward, we will be using scipy.signal.convolve2d with the mode set to "same".
Part 1.2: Finite Difference Operator
Sometimes, we are interested in finding the edges in an image. A typical metric used to find edges is the gradient magnitude. \[\text{Gradient Magnitude}=\sqrt{\left(\frac{\partial}{\partial{x}}\text{Image}\right)^2+\left(\frac{\partial}{\partial{y}}\text{Image}\right)^2}\] We can calculate the partial derivatives using the finite difference operators defined earlier; because images are discrete, convolving with these kernels computes a numerical approximation of the partial derivatives. Then, to create an edge detector, we threshold the gradient magnitude to binarize the image. This threshold was determined qualitatively to balance suppressing noise against detecting real edges.
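A sketch of this edge detector is below, assuming a greyscale float image img; the threshold of 0.1 is purely illustrative, not the value we settled on.

import numpy as np
from scipy.signal import convolve2d

D_x = np.array([[1, 0, -1]])
D_y = np.array([[1], [0], [-1]])

def edge_map(img: np.ndarray, threshold: float = 0.1) -> np.ndarray:
    # Numerical partial derivatives via the finite difference kernels
    dx = convolve2d(img, D_x, mode='same')
    dy = convolve2d(img, D_y, mode='same')
    # Gradient magnitude, binarized with the chosen threshold
    return np.sqrt(dx ** 2 + dy ** 2) > threshold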





Part 1.3: Derivative of Gaussian (DoG) Filter
The results from the difference operators are rather noisy. To make our edge detector more robust to this noise, we can blur the image before convolving it with the finite difference operators.
Thus, we convolve it with a Gaussian kernel made by taking the outer product of the 1D kernel from cv2.getGaussianKernel (kernel size 9, sigma 1.5) with itself.
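A sketch of this step, reusing img, D_x, and D_y from the earlier sketch:

import cv2

# 1D Gaussian; its outer product with itself gives the 2D kernel
g1d = cv2.getGaussianKernel(ksize=9, sigma=1.5)
gauss2d = g1d @ g1d.T

blurred = convolve2d(img, gauss2d, mode='same')
dx = convolve2d(blurred, D_x, mode='same')
dy = convolve2d(blurred, D_y, mode='same')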






We can see that the edges are much stronger and slightly smoother.
Performing two convolutions on a large image can take time, but there's a way to make this calculation much cheaper.
Because convolution is associative, we can convolve the Gaussian and finite difference kernels together first before applying the result to the image. The difference and Gaussian kernels are much smaller than the image, so this is much more efficient.
This is known as the Derivative of Gaussian (DoG) filter.
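A sketch of this optimization, reusing gauss2d, D_x, and D_y from above; convolve2d's default 'full' mode keeps the entire combined kernel.

# Combine the small kernels first, then convolve the large image only once per direction
DoG_x = convolve2d(gauss2d, D_x)   # 9x11 kernel
DoG_y = convolve2d(gauss2d, D_y)   # 11x9 kernel
dx = convolve2d(img, DoG_x, mode='same')
dy = convolve2d(img, DoG_y, mode='same')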






Comparing the results of blurring and then differentiating against the single Derivative of Gaussian filter, we see the outputs are essentially identical, with only very slight differences.

MSE=0.0008325068109545628
Part 2.1: Image "Sharpening"
High frequencies are what make images look sharp. Thus, we can make an image look sharper by increasing the weight of the high frequencies. To do this, we first apply a low pass filter (i.e., convolve the image with a Gaussian) and then subtract the result from the original image. This leaves us with only the high frequencies that the filter removed. We can then multiply this high frequency image by some sharpening factor \(\alpha\) and add it back to the original image. Combining this whole process into a single convolution results in what is known as the unsharp mask filter. \[\text{Unsharp mask filter}=(1+\alpha)e-\alpha g\] \[\text{Sharpened image}=f*((1+\alpha)e-\alpha g)\] Where \(\alpha\) is the sharpening factor, \(f\) is the image, \(e\) is a unit impulse, and \(g\) is a Gaussian kernel.
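A sketch of the single-convolution version, reusing gauss2d and convolve2d from above; the clip to [0, 1] is our own addition to keep the result a valid float image.

def unsharp_mask(img: np.ndarray, gauss2d: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    # Unit impulse the same size as the Gaussian kernel
    impulse = np.zeros_like(gauss2d)
    impulse[gauss2d.shape[0] // 2, gauss2d.shape[1] // 2] = 1.0
    # The unsharp mask filter: (1 + alpha) * e - alpha * g
    usm = (1 + alpha) * impulse - alpha * gauss2d
    return np.clip(convolve2d(img, usm, mode='same'), 0, 1)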








We can see that the Taj Mahal image is definitely much sharper. The moon photo has fewer high frequencies, so it may not seem to have sharpened as much. We can fix this by adjusting the sharpening factor \(\alpha\).






The craters are much more defined now at \(\alpha=5\), but the Taj Mahal looks a lot worse at \(\alpha=5\). Thus, it's important to choose the right \(\alpha\) for your image. Now, can we use this method to recover images? We attempted this by blurring an image and then applying the unsharp mask filter to see whether the lost details could be recovered. Intuitively, this shouldn't work: the unsharp mask filter only amplifies existing high frequencies, so once we remove some high frequencies, there is no way for them to be recovered.



This does make the branches more defined, but all the high frequencies in the leaves are lost.
Part 2.2: Hybrid Images
We can create hybrid images by combining the low frequencies of one image with the high frequencies of another. When low frequencies dominate (i.e., the viewer is far away) we see one image, and when high frequencies dominate (i.e., the viewer is close) we see the other. We can do this quite easily by putting one image through a low pass filter (convolving with a Gaussian), putting the second through a high pass filter (the image minus the image convolved with a Gaussian), and averaging the pixels to produce a hybrid image.
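A sketch of this process; g_low and g_high are stand-ins for 2D Gaussian kernels whose sizes and sigmas set the two cutoff frequencies.

def hybrid(im_far: np.ndarray, im_near: np.ndarray,
           g_low: np.ndarray, g_high: np.ndarray) -> np.ndarray:
    # Low frequencies of the image seen from far away
    low = convolve2d(im_far, g_low, mode='same')
    # High frequencies of the image seen up close: image minus its blurred self
    high = im_near - convolve2d(im_near, g_high, mode='same')
    # Average the pixels to produce the hybrid
    return (low + high) / 2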
















Part 2.3: Gaussian and Laplacian Stacks
In this part we create Gaussian and Laplacian stacks. A Gaussian stack is a stack of images, all kept at the original size, where each level is a progressively more blurred version of the previous one; unlike a pyramid, there is no downsampling. We build it by applying progressively bigger Gaussian kernels to each level to create the next. A Laplacian stack is created by subtracting the next level of the Gaussian stack from the current level, which essentially separates the frequencies of an image into multiple layers of the stack. The final layer of the Laplacian stack is simply the final layer of the Gaussian stack, as there is nothing to subtract from it.
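A sketch of both stacks; the sigma-doubling schedule is one simple choice, not necessarily the exact kernels we used.

def gaussian_stack(img: np.ndarray, levels: int) -> list:
    # Every level stays at full size; each is a blurrier copy of the last
    stack = [img]
    for i in range(1, levels):
        sigma = 2 ** i                 # progressively bigger Gaussians
        ksize = int(6 * sigma) | 1     # odd kernel size covering ~3 sigma per side
        g1d = cv2.getGaussianKernel(ksize, sigma)
        stack.append(convolve2d(stack[-1], g1d @ g1d.T, mode='same'))
    return stack

def laplacian_stack(g_stack: list) -> list:
    # Each layer holds the frequencies between adjacent Gaussian levels;
    # the final layer is the final Gaussian level itself
    diffs = [g_stack[i] - g_stack[i + 1] for i in range(len(g_stack) - 1)]
    return diffs + [g_stack[-1]]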














Part 2.4: Multiresolution Blending
Now we can blend images using the Laplacian stacks we just made. To do this, we create stacks for both images and the mask (a Gaussian stack for the mask).
We combine them by going through each layer of the stacks, multiplying each layer of image1 by the corresponding layer of the mask and each layer of image2 by one minus that layer of the mask.
We then sum all the layers together to get our blended image.
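A sketch of the blending loop, reusing the stack helpers from above; five levels is an illustrative default.

def blend(im1: np.ndarray, im2: np.ndarray, mask: np.ndarray, levels: int = 5) -> np.ndarray:
    l1 = laplacian_stack(gaussian_stack(im1, levels))
    l2 = laplacian_stack(gaussian_stack(im2, levels))
    gm = gaussian_stack(mask, levels)   # blurred masks soften the seam
    # Blend each layer with the mask, then sum the layers to reconstruct
    layers = [gm[i] * l1[i] + (1 - gm[i]) * l2[i] for i in range(levels)]
    return np.clip(sum(layers), 0, 1)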
































