B657: Final Project Report Holistically-Nested Edge Detection


Mingze Xu & Hanfei Mei
May 4, 2016

Abstract

Holistically-Nested Edge Detection (HED), a novel edge detection method based on fully convolutional neural networks, performs very well on edge detection for natural scenes. The method makes two important improvements: (1) image-to-image training and prediction; and (2) the use of multi-scale and multi-level deep learning architectures. Inspired by this, we re-implement HED and try to solve an Optical Character Recognition (OCR) problem by matching edge maps between templates and input images.

Introduction

Figure 1. Illustration of the proposed HED algorithm. First row: (a) an example test image from the BSD500 dataset; (b) its corresponding edges as annotated by human subjects; (c) the HED results. Second row: (d), (e), and (f) show side edge responses from layers 2, 3, and 4 of the convolutional neural network, respectively. Third row: (g), (h), and (i) show edge responses from the Canny detector at scales σ=2.0, σ=4.0, and σ=8.0, respectively. Note: figure reprinted from Xie and Tu [1].


1. Edge detection is fundamental to other computer vision areas, such as segmentation, classification, recognition, and 3D reconstruction. However, Figure 1 shows that classic methods based on gradient variation, such as Canny, cannot provide satisfactory results. This limits the accuracy of further work that depends on edge maps.
2. This paper provides a method for image-to-image prediction using a deep learning model in the form of a fully convolutional neural network.
3. This paper uses a multi-scale and multi-level structure to generate 5 side outputs, which improve the final fusion result to state-of-the-art accuracy.

Holistically-Nested Edge Detection

1. Multi-scale and Multi-level NN

Figure 2. (a) multi-stream architecture; (b) skip-layer net architecture; (c) a single model running on multi-scale inputs; (d) separate training of different networks; (e) holistically-nested architecture. Note: figure reprinted from Xie and Tu [1].

(a) The holistically-nested network is a relatively simple architecture that takes a single image as input and produces multiple side outputs.
(b) In contrast to using multi-scale inputs, as in (c) of Figure 2, it generates side outputs at different scales, which together contribute to the fusion result. One can think of this as performing edge detection on one image at multiple scales, which has the same effect as multi-scale inputs.
(c) The current architecture generates 5 side outputs, each representing an edge map at a particular scale. HED is flexible if more side outputs are desired in a particular circumstance.

2. Network Architecture

Table 1. The receptive field and stride size in VGGNet [4] used in HED. The bolded convolutional layers are linked to additional side-output layers. Note: table reused from Xie and Tu [1].

This paper adopts the VGGNet architecture but makes the following modifications:
(a) A side-output layer is connected to the last convolutional layer in each stage, namely conv1_2, conv2_2, conv3_3, conv4_3, and conv5_3. The receptive field size of each of these convolutional layers is identical to that of the corresponding side-output layer [1].
(b) The last stage of VGGNet is cut. Each side output is expected to be meaningful, but a layer with stride 32 would make the output so small that it hardly contributes to the final fusion output.

We can see from Table 1 that each bolded convolutional layer is followed by a pooling layer. After one side output is generated, HED sub-samples the feature map to make it smaller. At a smaller scale, the network focuses more on contours than on fine details.

3. Result Generation

Figure 4. Illustration of our network architecture for edge detection, highlighting the error back-propagation paths. Side-output layers are inserted after convolutional layers. Deep supervision is imposed at each side-output layer, guiding the side outputs towards edge predictions with the characteristics we desire. The outputs of HED are multi-scale and multi-level, with the side-output-plane size becoming smaller and the receptive field size becoming larger. One weighted-fusion layer is added to automatically learn how to combine outputs from multiple scales. The entire network is trained with multiple error propagation paths (dashed lines). Note: figure reprinted from Xie and Tu [1].
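The receptive field and stride values that Table 1 refers to can be reproduced with a short calculation. The sketch below is only an illustration: it assumes the standard VGG-16 configuration (3x3 convolutions with stride 1, 2x2 max pooling with stride 2), and the layer list and the function name `receptive_fields` are hypothetical, not taken from our implementation.

```python
# Assumed VGG-16 layer configuration: (name, kernel size, stride).
VGG16_LAYERS = [
    ("conv1_1", 3, 1), ("conv1_2", 3, 1), ("pool1", 2, 2),
    ("conv2_1", 3, 1), ("conv2_2", 3, 1), ("pool2", 2, 2),
    ("conv3_1", 3, 1), ("conv3_2", 3, 1), ("conv3_3", 3, 1), ("pool3", 2, 2),
    ("conv4_1", 3, 1), ("conv4_2", 3, 1), ("conv4_3", 3, 1), ("pool4", 2, 2),
    ("conv5_1", 3, 1), ("conv5_2", 3, 1), ("conv5_3", 3, 1), ("pool5", 2, 2),
]


def receptive_fields(layers):
    """Return {layer name: (receptive field size, cumulative stride)}."""
    rf, jump = 1, 1  # one input pixel sees itself; neighbouring outputs are 1 px apart
    table = {}
    for name, kernel, stride in layers:
        rf += (kernel - 1) * jump  # each extra kernel tap widens the field by one jump
        jump *= stride             # striding spreads neighbouring outputs further apart
        table[name] = (rf, jump)
    return table


table = receptive_fields(VGG16_LAYERS)
for name in ("conv1_2", "conv2_2", "conv3_3", "conv4_3", "conv5_3", "pool5"):
    print(name, table[name])
```

Under these assumptions the five side-output layers have strides 1, 2, 4, 8, and 16, while pool5 reaches stride 32, which is the stage that modification (b) removes.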


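(a) Instead of having only one loss function, each side output has its own loss function, and each one contributes to the whole back-propagation process.

i. $$\mathcal{L}_{\text{side}}(\mathbf{W}, \mathbf{w}) = \sum_{m=1}^{M} \alpha_m \, \ell_{\text{side}}^{(m)}(\mathbf{W}, \mathbf{w}^{(m)})$$

ii. $$\ell_{\text{side}}^{(m)}(\mathbf{W}, \mathbf{w}^{(m)}) = -\beta \sum_{j \in Y_+} \log \Pr(y_j = 1 \mid X; \mathbf{W}, \mathbf{w}^{(m)}) - (1 - \beta) \sum_{j \in Y_-} \log \Pr(y_j = 0 \mid X; \mathbf{W}, \mathbf{w}^{(m)})$$

iii. $$\mathcal{L}_{\text{fuse}}(\mathbf{W}, \mathbf{w}, \mathbf{h}) = \mathrm{Dist}(Y, \hat{Y}_{\text{fuse}})$$

iv. $$(\mathbf{W}, \mathbf{w}, \mathbf{h})^{\star} = \arg\min \left( \mathcal{L}_{\text{side}}(\mathbf{W}, \mathbf{w}) + \mathcal{L}_{\text{fuse}}(\mathbf{W}, \mathbf{w}, \mathbf{h}) \right)$$

(b) Given an image X, we obtain edge predictions from both the side-output layers and the weighted-fusion layer as follows:

i. $$(\hat{Y}_{\text{fuse}}, \hat{Y}_{\text{side}}^{(1)}, \hat{Y}_{\text{side}}^{(2)}, \ldots, \hat{Y}_{\text{side}}^{(M)}) = \mathrm{CNN}(X, (\mathbf{W}, \mathbf{w}, \mathbf{h})^{\star})$$

ii. $$\hat{Y}_{\text{HED}} = \mathrm{Average}(\hat{Y}_{\text{fuse}}, \hat{Y}_{\text{side}}^{(1)}, \hat{Y}_{\text{side}}^{(2)}, \ldots, \hat{Y}_{\text{side}}^{(M)})$$

Implementation and Problems

Figure 5. First row: left) original image; right) Canny edges. Second row: left) sample side output; right) HED edges.

1. Figure 5 shows a sample result of HED edge detection. We use a model trained by the HED network and a natural scene from our surroundings as the test image. Compared with the Canny edges, the HED edges focus more on the outlines of the objects in the picture and are closer to what the edges are supposed to be.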
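The side-output loss and the averaged prediction from the Result Generation equations can be sketched in NumPy as follows. This is a minimal illustration rather than the code we ran: the function names are hypothetical, the inputs are assumed to be per-pixel edge probabilities already resized to the label resolution, and the balancing weight β is assumed to be the fraction of non-edge pixels, |Y-|/|Y|, following [1].

```python
import numpy as np


def class_balanced_loss(prob, label, eps=1e-7):
    """Class-balanced cross-entropy for one side output (equation ii).

    prob  : predicted edge probabilities, shape (H, W)
    label : binary ground truth, 1 = edge, 0 = non-edge
    """
    prob = np.clip(prob, eps, 1.0 - eps)   # avoid log(0)
    edge = label.astype(bool)
    beta = 1.0 - edge.mean()               # assumed beta = |Y-| / |Y|, as in [1]
    pos = -beta * np.log(prob[edge]).sum()                    # edge pixels, Y+
    neg = -(1.0 - beta) * np.log(1.0 - prob[~edge]).sum()     # non-edge pixels, Y-
    return pos + neg


def side_loss(side_probs, label, alphas):
    """Weighted sum of the per-side-output losses (equation i)."""
    return sum(a * class_balanced_loss(p, label)
               for a, p in zip(alphas, side_probs))


def hed_average(fuse_prob, side_probs):
    """Final HED map: mean of the fusion output and all side outputs."""
    return np.mean([fuse_prob, *side_probs], axis=0)


label = np.array([[1, 0], [0, 0]])
uniform = np.full((2, 2), 0.5)
loss = class_balanced_loss(uniform, label)
total = side_loss([uniform, uniform], label, alphas=[1.0, 1.0])
averaged = hed_average(np.ones((2, 2)), [np.zeros((2, 2))])
```

Because edge pixels are rare, the weight β on the edge term is close to 1 and the weight on the non-edge term is small, so the few edge pixels are not drowned out by the background.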

Figure 6. Edges for musical notation with noise.

2. In Figure 6, HED edge detection does not work as well as it does on natural scenes. There are two likely reasons:
(a) The training data are all natural scenes, so the trained model fits the edges of natural scenes well. Musical notes look quite different, so these items are unfamiliar to the trained model.
(b) The training data are all high-definition pictures, so every pixel in the training images matters to the final result. The test image, however, is blurred and covered with many spots, which makes it difficult for our model to find accurate edge pixels.

3. Although HED edge detection does not work perfectly on these musical notes, it performs better than the Sobel edge detection that we used in assignment 1. However, the result of Optical Character Recognition (OCR) is still not very good. We think this is mostly because character recognition involves more steps: besides edge detection, we still have to choose a distance function and match the templates against the characters in the test image. A good edge detection method contributes to the result, but it does not change it significantly, because the result depends on a combination of many different steps.

Conclusion

1. HED works very well for edge detection on natural scenes but not perfectly on other data such as musical notation.
2. HED still has difficulty distinguishing actual objects from noise.

References

[1] Saining Xie and Zhuowen Tu. Holistically-Nested Edge Detection. In Proc. ICCV, 2015.
[2] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
[3] Liang Chen and Kun Duan. MIDI-Assisted Egocentric Optical Music Recognition. In WACV.
[4] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR.
