arxiv: v1 [cs.cv] 28 Dec 2018


Salient Object Detection via High-to-Low Hierarchical Context Aggregation

Yun Liu 1, Yu Qiu 1, Le Zhang 2, JiaWang Bian 1, Guang-Yu Nie 3, Ming-Ming Cheng 1
1 Nankai University, 2 A*STAR, 3 Beijing Institute of Technology

Abstract

Recent progress on salient object detection mainly focuses on how to effectively integrate the convolutional side-output features of convolutional neural networks (CNN). Based on this, most existing state-of-the-art saliency detectors design complex network structures to fuse the side-output features of the backbone feature extraction networks. However, should the fusion strategies become more and more complex for accurate salient object detection? In this paper, we observe that the contexts of a natural image can be well expressed by a high-to-low self-learning of side-output convolutional features. As we know, the contexts of an image usually refer to the global structures, and the top layers of a CNN usually learn to convey global information. On the other hand, it is difficult for the intermediate side-output features to express contextual information. Here, we design an hourglass network with intermediate supervision to learn contextual features in a high-to-low manner. The learned hierarchical contexts are aggregated to generate the hybrid contextual expression for an input image. Finally, the hybrid contextual features can be used for accurate saliency estimation. We extensively evaluate our method on six challenging saliency datasets, and our simple method achieves state-of-the-art performance under various evaluation metrics. Code will be released upon paper acceptance.

1. Introduction

Salient object detection, also known as saliency detection, aims at simulating the human vision system to detect the most conspicuous and eye-attracting objects or regions in a natural image [1, 7].
The progress in saliency detection has benefited a wide range of vision applications, including image retrieval [11], visual tracking [33], scene classification [36], content-aware video compression [61], and weakly supervised learning [46, 47]. Although numerous valuable models have been presented [25, 4, 57, 29, 17, 53, 15] and significant progress has been made, it remains an open problem to accurately detect salient objects in static images, especially in complicated scenarios.

M.M. Cheng (cmm@nankai.edu.cn) is the corresponding author.

Figure 1. Visualization of our learned contexts at different sides of the neural network. Panels: (a) Image, (b) GT, (c) Side 6, (d) Side 5, (e) Side 4, (f) Side 3, (g) Side 2, (h) Side 1, (i) Aggregated Contexts. The contexts at lower sides are learned under the guidance of top global contexts to only emphasize the details of salient objects.

Conventional saliency detection methods [7, 19, 39] usually design hand-crafted low-level features and heuristic priors, which have difficulty describing semantic objects and scenes. Recent progress on saliency detection has mainly benefited from convolutional neural networks (CNN) [32, 26, 57, 45, 54, 21, 22]. The backbone of a CNN usually consists of several blocks of stacked convolutional and pooling layers, in which the blocks near the network input are called bottom sides and the others top sides. It is well accepted that the top sides of a CNN contain semantically meaningful information while the bottom sides contain complementary spatial details [48, 30, 16]. Therefore, current state-of-the-art saliency detectors [4, 51, 45, 54, 29, 44, 55, 43, 16] mainly aim at designing complex network structures to fuse the features or results from various side-outputs. For example, Hou et al. [16] carefully selected several combination sets of various side-output results and fused the combination results for accurate saliency segmentation. Wang et al. [44] proposed a recurrent module to filter out noisy

information for side-output features. Although significant progress has been made in this direction [16, 55, 44], the side-output fusion strategies have become more and more complex. Do we have to continue in this direction to further improve saliency detection?

To answer this question, we note that some recent studies [58, 52] find that a CNN can learn global contextual information for input images at the top convolution layers by enlarging receptive fields. This is not directly applicable to saliency detection, because saliency detection requires not only global contextual information but also local spatial details. Instead of fusing side-output features in complicated ways as in [4, 57, 51], we consider constructing hierarchical contextual features. Specifically, we flow the global contextual information obtained at the top sides into the bottom sides. The top contextual information learns to guide the bottom sides to construct contextual features at fine spatial scales that only emphasize salient objects. Hence the obtained contexts differ from side-output features, or combinations of them, which only contain or at least emphasize local representations of an image. A visualization of the contexts learned by our model can be found in Figure 1.

Intuitively, the hierarchical contexts should be learned in a high-to-low manner, which means the top sides should learn contexts first, and then the bottom sides can learn contexts at large spatial resolutions using the information flowing from the top sides. Hence we build an hourglass network and add intermediate supervision after the context module at each side. During training, we find that the top sides are automatically optimized first, which is consistent with our hypothesis. This will be demonstrated in Section 4. Finally, we simply aggregate the hierarchical contexts for accurate salient object detection.
The experimental results demonstrate that our simple idea favorably outperforms recent state-of-the-art methods that use heavily engineered networks. Our contributions are threefold. First, we build an hourglass network with intermediate supervision to learn hierarchical contexts, which are generated with the guidance of global contextual information and thus only emphasize salient objects at different scales. Second, we propose a hierarchical context aggregation module to ensure that the network is optimized from the top sides to the bottom sides; we aggregate the learned hierarchical contexts at different scales to perform accurate salient object detection, unlike previous studies [16, 55, 43] that fuse side-output features or some complex combinations of side-outputs. Third, we extensively compare our method with recent state-of-the-art methods on six popular datasets. Our simple method favorably outperforms these competitors under various metrics.

2. Related Work

Salient object detection is a very active research field due to its wide applications and challenging scenarios. Here, we briefly divide the related work into four parts to review the development of saliency detection and context learning.

Heuristic saliency detection methods usually extract hand-crafted low-level features and apply machine learning models to classify these features. Some heuristic saliency priors are utilized to ensure accuracy, such as color contrast [1, 7], center prior [20, 19] and background prior [50, 60]. DRFI [19] is a representative method of this kind that integrates various features and priors. However, it is difficult for low-level features to describe semantic information, and the saliency priors are not robust enough for complicated scenarios. Hence deep learning based methods have come to dominate this field due to their powerful representation capability.

Region-based saliency detection appeared in the early era of deep learning based saliency.
These approaches view each image patch as a basic processing unit to perform saliency detection. Lee et al. [21] utilized both low-level hand-crafted features and high-level deep features to classify candidate regions as salient or not. The low-level features are compared with other parts of an image to form a distance map that is then encoded by the CNN. Wang et al. [40] presented a two-stage training strategy to sort the segmented object proposals, in which the first stage extracts features and the second stage predicts the saliency score for each region. Li et al. [23] extracted multi-scale deep features which are used to infer the saliency scores for image segments.

CNN-based image-to-image saliency detection models [4, 57, 51, 45, 54, 29, 44, 17, 55, 5, 43, 27, 28, 16, 24, 32, 26] treat saliency detection as a pixel-wise binary classification task and perform image-to-image predictions. For example, Chen et al. [5] proposed a two-stream network which consists of a fixation stream and a semantic stream. Zhang et al. [57] introduced an attention guided network that progressively integrates multiple layer-wise attention for saliency detection. Islam et al. [17] introduced a new deep learning solution with a hierarchical representation of relative saliency and stage-wise refinement. How to effectively fuse multi-level CNN features is the main research direction for CNN-based saliency detection methods [4, 51, 45, 54, 29, 44, 55, 43, 16, 24, 32, 26]. There are too many studies to list here, but the general trend is that recent designs are becoming more and more complicated. We will provide a detailed discussion of these methods in Section 4. Compared with them, we focus on a simple yet effective design in this paper.

Figure 2. Overall framework of our proposed method. (Diagram legend: DeConv, Crop, Hierarchical Context Aggregation, 2 Conv, Element-wise Sum.) Our effort starts from the VGG16 network [38]. We add an additional convolution block at the end of the convolution layers of VGG16, resulting in six convolution blocks in total. The contexts at each convolution block are learned in a high-to-low manner to ensure that each block is guided by all higher layers to generate scale-aware contexts. The Hierarchical Context Aggregation (HCA) module guarantees that the optimization order is high-to-low and aggregates the generated hierarchical contexts to predict the final saliency maps.

Context learning has recently been explored in semantic segmentation [58, 52]. Zhao et al. [58] added a pyramid pooling module for global context construction upon the final layer of the deep network, by which they significantly improved the performance of semantic segmentation. Zhang et al. [52] built a context encoding module using the encoding layer [9] on top of the neural network to conduct accurate semantic segmentation. In saliency detection, Wang et al. [43] followed [58] and used the pyramid pooling module to extract contextual information. Zhao et al. [59] proposed a global context module and a local context module to extract the global and local contexts. The global context module is fed with a superpixel-centered large window including the full image, while the local context module takes a superpixel-centered small window with a small image patch. Hence the goal of extracting multiple contexts in [59] is achieved by multi-scale inputs.

The full literature review of salient object detection is beyond the scope of this paper. Please refer to [2, 8, 12] for more comprehensive surveys. In this paper, we focus on context learning rather than the previous multi-level feature fusion for improving saliency detection.
Different from [43], which uses multiple networks, each with a pyramid pooling module [58] at the top, we propose an elegant single network. Different from [59], which uses multi-scale inputs, we use single-scale inputs to extract multi-level contexts. The resulting model is simple yet effective.

3. Approach

In this section, we elaborate on our proposed framework for salient object detection. We first introduce our base network in Section 3.1. Then, we present a Mirror-linked Hourglass Network (MLHN) in Section 3.2. A detailed description of the Hierarchical Context Aggregation (HCA) module is finally provided in Section 3.3. We show the overall network architecture in Figure 2.

3.1. Base Network

To tackle salient object detection, we follow recent studies [5, 43, 16] and use fully convolutional networks. Specifically, we use the well-known VGG16 network [38] as our backbone net, whose final fully connected layers are removed to serve image-to-image translation. Salient object detection usually requires global information to judge which objects are salient [7], so enlarging the receptive field of the network is helpful. To this end, we retain the final pooling layer as in [16] and follow [3] to transform the last two fully connected layers into convolution layers, one of which has a kernel size of 3 × 3 with 1024 channels and the other a kernel size of 1 × 1, also with 1024 channels. Therefore, there are five pooling layers in the backbone net. They divide the convolution layers into six convolution blocks, which are denoted as $\{S_1, S_2, S_3, S_4, S_5, S_6\}$ from bottom to top, respectively. We consider $S_6$ as the top valve that controls the overall contextual information flow in the network. The resolution of the feature maps in each convolution block is half

of the preceding one. Following [16, 48], the side-output of each convolution block means the connection from the last layer of this block.

3.2. Mirror-linked Hourglass Network

Based on the backbone net, we build a Mirror-linked Hourglass Network (MLHN). An overview of MLHN is displayed in Figure 2. More concretely, we upsample the convolution block $S_6$ by a factor of two and connect a 1 × 1 convolution layer (without non-linearity) after $S_5$. The resulting two feature maps are fused using element-wise summation. For the upsampling, the side-output of $S_6$ is first connected to a 1 × 1 convolution layer (without non-linearity), which is followed by a deconvolution layer. This deconvolution upsamples a feature map by a factor of 2 using bilinear interpolation. A crop operation is performed to ensure that the upsampled feature map of $S_6$ has the same size as the feature map of $S_5$. To convert the fused feature map into contextual information, two sequential convolution layers are then connected to obtain the contextual features $\tilde{S}_5$. These two convolution layers play the role of a transform function, which uses the contextual information of $S_6$ to guide the features of $S_5$ to generate the contexts $\tilde{S}_5$. The contextual features $\{\tilde{S}_4, \tilde{S}_3, \tilde{S}_2, \tilde{S}_1\}$ can be obtained in a similar way. For a clear presentation, this can be formulated as

$$\tilde{S}_i = \phi(\varphi_1(S_i) + \varphi_2(\tilde{S}_{i+1})), \quad i \in \{1, 2, 3, 4, 5\},$$
$$\varphi_1(\cdot) = \mathrm{Conv}(\cdot), \quad \varphi_2(\cdot) = \mathrm{Crop}(\mathrm{Upsample}(\mathrm{Conv}(\cdot))), \quad \phi(\cdot) = \mathrm{ReLU}(\mathrm{Conv}(\mathrm{ReLU}(\mathrm{Conv}(\cdot)))), \qquad (1)$$

where, for $i = 5$, $\tilde{S}_6$ denotes the top block $S_6$ itself. A standard encoder-decoder network can be formulated as

$$\tilde{S}_i = \phi(\tilde{S}_{i+1}), \quad i \in \{1, 2, 3, 4, 5\}. \qquad (2)$$

In this way, the proposed MLHN gradually flows the top contextual information into the lower sides, so the lower sides are expected to only emphasize the details of salient regions in an image. The two sequential convolution layers (orange box in Figure 2) have kernel size 5 × 5 for $\{\tilde{S}_5, \tilde{S}_4, \tilde{S}_3\}$ and kernel size 3 × 3 for $\{\tilde{S}_2, \tilde{S}_1\}$. The numbers of output channels are 512, 256, 256, 128 and 128 from $\tilde{S}_5$ to $\tilde{S}_1$, respectively.
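To make the role of the crop in Eq. (1) concrete, the following minimal sketch traces spatial sizes only. It assumes that each pooling layer halves the resolution with ceiling rounding (as Caffe's pooling layer does; the text does not state the rounding) and that upsampling exactly doubles the size. The names `side_sizes`, `crop_margin` and `DECODER_SPEC`, and the 300 × 400 input, are hypothetical and chosen for illustration.

```python
import math

# Decoder settings quoted from the text: (output channels, kernel size) of the
# two sequential convolutions that produce the contexts of sides 5 down to 1.
DECODER_SPEC = {5: (512, 5), 4: (256, 5), 3: (256, 5), 2: (128, 3), 1: (128, 3)}


def side_sizes(height, width, num_sides=6):
    """Spatial size of S_1..S_6, assuming each pooling halves with ceil rounding."""
    sizes = [(height, width)]
    for _ in range(num_sides - 1):
        height, width = math.ceil(height / 2), math.ceil(width / 2)
        sizes.append((height, width))
    return sizes


def crop_margin(upper, lower, factor=2):
    """Rows/columns to crop after upsampling `upper` by `factor` to match `lower`."""
    up_h, up_w = upper[0] * factor, upper[1] * factor
    if up_h < lower[0] or up_w < lower[1]:
        raise ValueError("upsampled map is smaller than the target map")
    return up_h - lower[0], up_w - lower[1]


if __name__ == "__main__":
    sizes = side_sizes(300, 400)  # hypothetical input resolution
    for i in range(5, 0, -1):  # high-to-low order: side 5 first, side 1 last
        channels, kernel = DECODER_SPEC[i]
        margin = crop_margin(sizes[i], sizes[i - 1])
        print(f"side {i}: size {sizes[i - 1]}, crop {margin}, "
              f"2 x conv {kernel}x{kernel} -> {channels} channels")
```

Whenever a side has an odd height or width, the doubled map from the side above is one pixel too large, which is why a crop is needed before the element-wise sum.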
On the one hand, the encoded features in the base network are connected to the decoder part in a mirror-linked way. On the other hand, the proposed network is symmetric with $S_6$ as its center, just like an hourglass. Hence we call our network the Mirror-linked Hourglass Network (MLHN).

3.3. Hierarchical Context Aggregation

Intuitively, the proposed MLHN should be optimized from the top sides to the bottom sides, because the global contextual information is contained in the top sides and flows to the bottom sides gradually. Therefore, unlike previous encoder-decoder networks [37, 31] that impose supervision at the final layer of the decoder, we adopt supervision at all context learning stages, i.e. $\{\tilde{S}_6, \tilde{S}_5, \tilde{S}_4, \tilde{S}_3, \tilde{S}_2, \tilde{S}_1\}$, through a Hierarchical Context Aggregation (HCA) module. The HCA module is shown in Figure 3.

Figure 3. Hierarchical Context Aggregation (HCA) module used in our proposed network. (Diagram labels: Side 1 to Side 6, Contexts, 2 Conv, 3 × 3 Conv, DeConv, 1 × 1 Conv.) All sides of the backbone have intermediate supervision to ensure that the optimization is performed from high sides to lower sides, so that every side can learn the contextual information. The hierarchical contexts from all sides are concatenated for final saliency map prediction.

The side-output of each decoder side is first connected to two convolution layers, with kernel size 7 × 7 for $\tilde{S}_6$, 5 × 5 for $\{\tilde{S}_5, \tilde{S}_4, \tilde{S}_3\}$ and 3 × 3 for $\{\tilde{S}_2, \tilde{S}_1\}$. The numbers of channels are 512, 512, 256, 256, 128 and 128, respectively. Then, we add a 3 × 3 convolution layer without non-linearity to reduce the number of channels to 25 for all sides. The 25-channel map is the context map at each side. A deconvolution layer with a fixed bilinear kernel is employed to upsample the context map to the size of the original image.
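A fixed bilinear deconvolution kernel is commonly built as in the FCN line of work. The sketch below follows that common construction; it is an assumption about the implementation rather than something taken from the paper. The function names are hypothetical, and the per-side factors 2^(i-1) assume that every side has half the resolution of the preceding one.

```python
def bilinear_kernel_1d(factor):
    """1-D weights of a bilinear upsampling kernel for an integer `factor`."""
    size = 2 * factor - factor % 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    return [1 - abs(x - center) / factor for x in range(size)]


def bilinear_kernel_2d(factor):
    """2-D kernel as the outer product of the 1-D weights."""
    row = bilinear_kernel_1d(factor)
    return [[a * b for b in row] for a in row]


# Upsampling factor needed to bring the context map of side i back to the
# input resolution, assuming side i is downsampled by 2 ** (i - 1).
SIDE_FACTORS = {i: 2 ** (i - 1) for i in range(1, 7)}

if __name__ == "__main__":
    print(bilinear_kernel_1d(2))  # weights used for 2x upsampling
    print(SIDE_FACTORS)
```

Because the weights are fixed rather than learned, such a layer behaves as plain bilinear interpolation and adds no trainable parameters.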
To make this process precise, we formulate it as

$$C_i = \mathrm{Crop}(\mathrm{Upsample}(\omega(\psi(\tilde{S}_i)))), \quad i \in \{1, 2, 3, 4, 5, 6\},$$
$$\omega(\cdot) = \mathrm{Conv}(\cdot), \quad \psi(\cdot) = \mathrm{ReLU}(\mathrm{Conv}(\mathrm{ReLU}(\mathrm{Conv}(\cdot)))), \qquad (3)$$

in which $\omega(\cdot)$ is a linear transformation for channel reorganization and $\psi(\cdot)$ transforms the fused features at each stage into contexts at various scales. The saliency prediction map can be obtained by simply adding a 1 × 1 convolution (without non-linearity). We put the intermediate supervision here for each side to help the top sides be optimized first. The upsampled context maps ($C_i$, $i = 1, 2, \ldots, 6$) of all sides are aggregated using standard concatenation. A 7 × 7 convolution and a

3 × 3 convolution then follow to further fuse the hierarchical contexts for the final high-quality prediction of saliency maps. We empirically find that large kernel sizes help slightly here, but they also slow down the network because the aggregated context map has the size of the original image. Therefore, we do not use two 7 × 7 convolutions or larger kernel sizes.

The essential function of HCA lies in three aspects. Firstly, the intermediate supervision of HCA helps MLHN be optimized from top to bottom, so that the global contextual information at the top sides flows to the bottom sides gradually. Secondly, the added convolution layers encourage each side to generate contexts at the corresponding scale. Thirdly, the hierarchical contexts at all sides are aggregated for the final saliency map prediction, unlike previous methods [16, 48, 30] that compute final results by fusing the results of various side-outputs.

Figure 4. Illustration of different multi-scale deep learning architectures: (a) hyper feature learning; (b) FCN style; (c) HED style; (d) DSS style; (e) encoder-decoder networks; (f) our HCA network. (Legend: Hidden Layer, Loss Layer, Connection.) The connections in the above figures can be any network configurations, e.g. any types of CNN layers or combinations of them.

4. Architectural Analyses

Due to the multi-scale and multi-level nature of learning in deep neural networks, a large number of architectures have emerged that are designed to utilize hierarchical deep features. For example, multi-scale learning can use skip-layer connections [13, 31], which are widely accepted owing to their strong capability to fuse hierarchical deep features inside the networks. On the other hand, multi-scale learning can use encoder-decoder networks that progressively decode the hierarchical deep representation learned in the encoder backbone net. We have seen these two structures applied in various vision tasks.
We continue our discussion by briefly categorizing multi-scale learning inside deep networks into five classes: hyper feature learning, FCN style, HED style, DSS style and encoder-decoder networks. An overall illustration of them is given in Figure 4. The following discussion will clearly show the differences between our proposed HCA network and previous efforts on multi-scale learning.

Hyper feature learning: Hyper feature learning [13] is the most intuitive way to pursue multi-scale information, as illustrated in Figure 4(a). Examples of this structure for saliency include [24, 51, 5, 43, 27]. These models concatenate or sum multi-scale deep features from multiple levels of backbone nets [24, 51] or from branches of multi-stream nets [5, 43, 27]. The fused hyper features are then used for final predictions.

Figure 5. Side loss at the first 2000 training iterations (axes: Loss versus #Iteration; curves: loss-side1 to loss-side6 and loss-fuse). At the beginning, the losses of the top sides drop quickly, but the bottom sides manage to reach smaller losses in the end.

FCN style: Since the top sides of neural networks usually contain more reliable semantic information, a reasonable revision of hyper feature learning is to progressively fuse deep features from upper layers to lower layers [31, 37], as shown in Figure 4(b). The top semantic features are combined with bottom low-level features to capture fine-grained details. The feature fusion can be a simple element-wise summation [31], a simple feature map concatenation (U-Net) [37], or more complex designs based on them. Most recent saliency models fall into this category [57, 45, 54, 29, 44, 17, 55]. They differ from each other in the fusion strategies they apply. One notable similarity of these models is that the final prediction is produced using the fused feature maps at the largest scale. Hence the final fused features are expected to learn both global semantic information and local low-level details. To better achieve this goal, recent state-of-the-art models have designed very complex fusion strategies [29, 44, 4].

HED style: HED-like networks [48, 30] add deep supervision at the intermediate sides to perform predictions, and the final result is a combination of the predictions at all sides (shown in Figure 4(c)). Unlike multi-scale feature fusion, HED performs multi-scale prediction fusion. Chen et al. [4] followed this style to perform saliency detection.

DSS style: The DSS network [16] is an extension of the HED architecture. The side-output of each network side is fused with side-outputs from some of the upper sides. For each side, the upper sides to fuse are carefully selected by experiments. The difference between HED and DSS can be clearly seen in Figure 4(d).

Encoder-decoder networks: To benefit from the powerful representation capability of deep networks, one can also decode the high-level representation at the top layers [35], as displayed in Figure 4(e). The decoder gradually enlarges its resolution to decode local information from the upper layers.

HCA network: We show a streamlined diagram of our proposed HCA network in Figure 4(f). Its left part looks a bit like an FCN (Figure 4(b)) or an encoder-decoder network (Figure 4(e)) with parallel connections. Unlike the FCN and encoder-decoder nets, which perform predictions using the final fused hybrid features, our HCA network aggregates hierarchical contexts to perform predictions. The contexts are learned in a high-to-low manner through the proposed HCA module, so that the top sides, which are optimized first, can generate global contextual information to guide the lower layers to produce scale-specific contexts. We demonstrate this high-to-low optimization in Figure 5, which shows the loss curves of all sides during training. We can clearly see that $C_6$ is optimized first, and then $C_5$, $C_4$, $C_3$, $C_2$ and $C_1$ follow sequentially.
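The loss curves in Figure 5 track one loss per side plus a fused loss. As a rough sketch of such intermediate supervision, the code below sums a pixel-wise binary cross-entropy over six side predictions and one fused prediction. The choice of cross-entropy, the equal weighting, and the names `bce_loss` and `total_loss` are assumptions made for illustration; the text does not specify them.

```python
import math


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy between two flat lists of values in [0, 1]."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(pred)


def total_loss(side_preds, fused_pred, target):
    """Per-side losses, fused loss, and their sum (equal weights assumed)."""
    side_losses = [bce_loss(p, target) for p in side_preds]
    fuse_loss = bce_loss(fused_pred, target)
    return side_losses, fuse_loss, sum(side_losses) + fuse_loss


if __name__ == "__main__":
    target = [1.0, 0.0, 1.0, 0.0]        # toy ground truth with 4 pixels
    sides = [[0.5, 0.5, 0.5, 0.5]] * 6   # six uninformative side outputs
    fused = [0.9, 0.1, 0.8, 0.2]         # toy fused prediction
    side_losses, fuse_loss, loss = total_loss(sides, fused, target)
    print(side_losses[0], fuse_loss, loss)
```

Logging each entry of `side_losses` separately during training is what would produce per-side curves of the kind plotted in Figure 5.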
Without carefully designed feature fusion strategies [29, 55, 44, 4], the simple HCA can learn high-quality contexts for accurate salient object detection.

5. Experiments

5.1. Experimental Setup

Implementation Details. We implement the proposed network using the well-known Caffe [18] framework. The convolution layers contained in the original VGG16 [38] are initialized using the publicly available model pretrained on ImageNet [10]. The weights of the other layers are initialized from a zero-mean Gaussian distribution with a fixed standard deviation. The upsampling operations are implemented by deconvolution layers with bilinear interpolation kernels, which are frozen during training. The network is optimized using SGD with the poly learning rate policy, in which the current learning rate equals the base one multiplied by $(1 - \mathrm{curr\_iter}/\mathrm{max\_iter})^{\mathrm{power}}$. The hyper-parameters power and max_iter are set to 0.9 and 20000, respectively, so that the training takes 20000 iterations in total. The initial learning rate is set to 1e-7. The momentum is set to 0.9, and weight decay is applied as well. All the experiments in this paper are performed on a TITAN Xp GPU.

Datasets. We extensively evaluate our method on six popular datasets, including DUTS [41], ECSSD [49], SOD [34], HKU-IS [23], THUR15K [6] and DUT-OMRON [50]. These six datasets consist of 15572, 1000, 300, 4447, 6232 and 5168 natural complex images, respectively, with corresponding pixel-wise ground truth labeling. Among them, the DUTS dataset [41] is a recently released challenging dataset consisting of 10553 training images and 5019 test images in very complex scenarios. For fair comparison, we follow recent studies [44, 29, 43, 51] and use the DUTS training set for training, and we test on the DUTS test set and the other five datasets.

Evaluation Criteria. We use two metrics to evaluate our method as well as other state-of-the-art salient object detectors: the max F-measure score and the mean absolute error (MAE).
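Both metrics can be sketched in a few lines for a single image. The code below uses the standard definitions with β² = 0.3, works on flat lists of values in [0, 1], and sweeps 256 evenly spaced thresholds. The function names, the threshold grid and the per-image (rather than dataset-averaged) computation are simplifying assumptions of this sketch.

```python
def mae(saliency, truth):
    """Mean absolute error between two flat lists of values in [0, 1]."""
    return sum(abs(s - g) for s, g in zip(saliency, truth)) / len(saliency)


def precision_recall(saliency, truth, threshold):
    """Precision and recall after binarising `saliency` at `threshold`."""
    tp = sum(1 for s, g in zip(saliency, truth) if s >= threshold and g >= 0.5)
    predicted = sum(1 for s in saliency if s >= threshold)
    positives = sum(1 for g in truth if g >= 0.5)
    precision = tp / predicted if predicted else 0.0
    recall = tp / positives if positives else 0.0
    return precision, recall


def f_measure(precision, recall, beta2=0.3):
    """Weighted harmonic mean of precision and recall."""
    denom = beta2 * precision + recall
    return (1 + beta2) * precision * recall / denom if denom else 0.0


def max_f_measure(saliency, truth, num_thresholds=256):
    """Largest F-measure over evenly spaced thresholds in [0, 1]."""
    best = 0.0
    for k in range(num_thresholds):
        p, r = precision_recall(saliency, truth, k / (num_thresholds - 1))
        best = max(best, f_measure(p, r))
    return best


if __name__ == "__main__":
    saliency = [0.9, 0.8, 0.2, 0.1]  # toy prediction with 4 pixels
    truth = [1.0, 1.0, 0.0, 0.0]     # toy ground truth
    print(mae(saliency, truth), max_f_measure(saliency, truth))
```

With β² below 1, precision carries more weight than recall, so a detector that over-segments is penalised more than one that misses part of the object.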
Given a predicted saliency map with continuous probability values, we can convert it into binary maps with arbitrary thresholds and compute the corresponding precision/recall values. Taking the average of the precision/recall values over all images in a dataset, we get many mean precision/recall pairs. Moreover, the F-measure score is an overall performance indicator:

$$F_\beta = \frac{(1 + \beta^2) \times \mathrm{Precision} \times \mathrm{Recall}}{\beta^2 \times \mathrm{Precision} + \mathrm{Recall}}, \qquad (4)$$

in which $\beta^2$ is usually set to 0.3 to put more emphasis on precision. We follow recent studies [32, 16, 55, 56, 29, 25, 4] and report the max $F_\beta$ across different thresholds. Given a saliency map $S$ and the corresponding ground truth $G$, both normalized to [0, 1], MAE can be calculated as

$$\mathrm{MAE} = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} |S(i, j) - G(i, j)|, \qquad (5)$$

where $H$ and $W$ represent the height and width, respectively. $S(i, j)$ denotes the saliency score at location $(i, j)$, and similarly for $G(i, j)$.

5.2. Performance Comparison

We compare our proposed salient object detector with 16 recent state-of-the-art saliency models, including DRFI [19], MDF [23], LEGS [40], DCL [24], DHS [26], ELD [21], RFCN [42], NLDF [32], DSS [16], SRM [43], Amulet [55], UCF [56], BRN [44], PiCA [29], C2S [25] and RAS

[4].

Table 1. Comparison of the proposed HCA and 16 competitors in terms of the metrics $F_\beta$ and MAE on six datasets (DUTS-test, ECSSD, SOD, HKU-IS, DUT-OMRON, THUR15K). Rows, by group: non-deep learning: DRFI [19]; VGG16 [38] backbone: MDF [23], LEGS [40], DCL [24], DHS [26], ELD [21], RFCN [42], NLDF [32], DSS [16], Amulet [55], UCF [56], PiCA [29], C2S [25], RAS [4], HCA (ours); ResNet [14] backbone: SRM [43], BRN [44], PiCA [29], HCA (ours). We report results on both the VGG16 [38] backbone and the ResNet [14] backbone. The top three models in each column are highlighted in red, green and blue, respectively. For ResNet based methods, we only highlight the top performance.

Figure 6. Qualitative comparison of HCA and 13 state-of-the-art methods. Columns: Image, DRFI, MDF, DCL, DHS, RFCN, DSS, SRM, Amulet, UCF, BRN, PiCA, C2S, RAS, Ours, GT.

Among them, DRFI [19] is the state-of-the-art non-deep-learning based method, and the other 15 models are all based on deep learning. We do not report MDF [23] results on the HKU-IS [23] dataset because MDF uses a part of HKU-IS for training. For the same reason, we do not report DHS [26] results on DUT-OMRON [50]. For fair comparison, all these models are tested using the publicly available code and pretrained models released by their authors, with default settings. We also report the results of the ResNet-101 [14] version of our proposed HCA. Since

ResNet is deep enough to capture global contexts, we exclude the sixth side ($\tilde{S}_6$) in HCA.

Table 2. Experimental settings of ablation studies. Each entry reads (number of channels, kernel size) × number of convolution layers. * means the default settings used in this paper. The Module column indicates which module is changed, while the other module keeps the default settings.

No.  Module  Side 1          Side 2          Side 3          Side 4          Side 5           Side 6
1    MLHN    (128, 3×3) ×1   (128, 3×3) ×1   (256, 5×5) ×1   (256, 5×5) ×1   (512, 5×5) ×1    -
2    HCA     (128, 3×3) ×1   (128, 3×3) ×1   (256, 5×5) ×1   (256, 5×5) ×1   (512, 5×5) ×1    (512, 7×7) ×1
3    MLHN    (128, 3×3) ×2   (128, 3×3) ×2   (256, 3×3) ×2   (256, 3×3) ×2   (512, 3×3) ×2    -
4    MLHN    (128, 3×3) ×2   (128, 3×3) ×2   (256, 3×3) ×2   (256, 3×3) ×2   (512, 3×3) ×2    (512, 7×7) ×2
5    HCA     (128, 3×3) ×2   (128, 3×3) ×2   (256, 5×5) ×2   (256, 5×5) ×2   (512, 5×5) ×2    (512, 5×5) ×2
6    MLHN    (128, 3×3) ×2   (128, 3×3) ×2   (256, 5×5) ×2   (512, 5×5) ×2   (1024, 5×5) ×2   -
7    HCA     (128, 3×3) ×2   (128, 3×3) ×2   (256, 5×5) ×2   (512, 5×5) ×2   (1024, 5×5) ×2   (1024, 7×7) ×2
*    MLHN    (128, 3×3) ×2   (128, 3×3) ×2   (256, 5×5) ×2   (256, 5×5) ×2   (512, 5×5) ×2    -
*    HCA     (128, 3×3) ×2   (128, 3×3) ×2   (256, 5×5) ×2   (256, 5×5) ×2   (512, 5×5) ×2    (512, 7×7) ×2

Table 3. Evaluation results of ablation studies, reported as $F_\beta$ and MAE on DUTS-test, ECSSD, SOD, HKU-IS, DUT-OMRON and THUR15K for settings No. 1 to 7 and the default (*). See Table 2 for experimental settings with corresponding numbers.

Table 1 summarizes the numeric comparison in terms of $F_\beta$ and MAE on six datasets. HCA significantly outperforms the other competitors in most cases, which demonstrates its effectiveness. With the VGG16 [38] backbone, the $F_\beta$ values of HCA are 2.1%, 1.0%, 0.9%, 1.1%, 0.6% and 0.5% higher than those of the second best method on the DUTS, ECSSD, SOD, HKU-IS, DUT-OMRON and THUR15K datasets, respectively. On the SOD dataset, HCA performs slightly worse than the best result in terms of the MAE metric. PiCA [29] seems to achieve the second place. With the ResNet backbone, the performance gap between the proposed HCA and the other ResNet based competitors is much larger than with the VGG16 backbone net.
Specifically, the $F_\beta$ values of HCA are 2.2%, 1.3%, 1.3%, 1.7%, 3.0% and 0.8% higher than those of the second best method on the six datasets, respectively. We also provide a qualitative comparison in Figure 6. For objects with various shapes and scales, HCA segments the entire objects well with fine details (rows 1-2). HCA is also robust to complicated backgrounds (rows 3-5), multiple objects (rows 6-7) and confusing stuff (row 8).

5.3. Ablation Studies

To evaluate the influence of various design choices of MLHN and HCA (the 2 Conv blocks in Figure 2 and Figure 3), we perform seven ablation studies with the VGG16 backbone. The detailed experimental settings and corresponding evaluation results are shown in Table 2 and Table 3, respectively. We observe that our proposed method is not sensitive to different parameter settings, and the default design achieves slightly better results. These ablation studies also reflect some interesting phenomena. For example, experiment #5 suggests that a larger convolution kernel at the sixth side helps to obtain accurate global contexts. Experiments #6 and #7 demonstrate that introducing more convolution channels does not improve performance. Interestingly, we observe that the default convolution parameter settings are similar to those of DSS [16], although we have a different network architecture (see Section 4). Perhaps this is due to the intrinsic properties of the backbone nets.

6. Conclusion

Salient object detection is highly related to global contextual information, which can be used to judge which parts of an image are salient. Motivated by this, we propose a simple yet effective method in this paper. Our method starts from the top sides of neural networks and gradually flows the top global contexts into the lower sides to obtain hierarchical contexts. These hierarchical contexts are aggregated for the final salient object detection. Our method reaches a new state of the art on six datasets when compared with 16 recent saliency models.
In the future, we plan to apply the proposed network architecture to other vision tasks that need global information.
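
As an illustration of how the ablation settings in Table 2 relate to the default design, the following minimal sketch writes the default (*) rows down as a small configuration and derives one ablation variant from it. It assumes that each Table 2 entry reads as (channels, kernel size) times the number of stacked convolution layers; the names DEFAULT_SETTINGS, with_override and expand are hypothetical and do not come from the paper or its code.

```python
# Hypothetical encoding of the default (*) rows of Table 2.
# Assumption: each entry is (channels, kernel_size, num_layers) for one side;
# None marks a side that the module does not use.
DEFAULT_SETTINGS = {
    "MLHN": [(128, 3, 2), (128, 3, 2), (256, 5, 2), (256, 5, 2), (512, 5, 2), None],
    "HCA": [(128, 3, 2), (128, 3, 2), (256, 5, 2), (256, 5, 2), (512, 5, 2), (512, 7, 2)],
}


def with_override(settings, module, side, spec):
    """Return a copy of settings with one side (1-based) of one module replaced."""
    new_settings = {name: list(specs) for name, specs in settings.items()}
    new_settings[module][side - 1] = spec
    return new_settings


def expand(specs):
    """List one (side, channels, kernel_size) tuple per convolution layer."""
    layers = []
    for side, spec in enumerate(specs, start=1):
        if spec is None:
            continue  # this side is not used by the module
        channels, kernel_size, num_layers = spec
        layers.extend((side, channels, kernel_size) for _ in range(num_layers))
    return layers


# Ablation #5 in Table 2: HCA with a 5x5 kernel at side 6 instead of 7x7,
# while MLHN keeps its default settings.
ablation_5 = with_override(DEFAULT_SETTINGS, "HCA", 6, (512, 5, 2))
```

Keeping every variant as an override of the default mirrors the table's convention that only one module changes per experiment, and calling expand(ablation_5["HCA"]) lists the individual convolution layers that such a variant would contain.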

References

[1] R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk. Frequency-tuned salient region detection. In IEEE CVPR.
[2] A. Borji, M.-M. Cheng, H. Jiang, and J. Li. Salient object detection: A benchmark. IEEE TIP, 12(24).
[3] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE TPAMI, 40(4).
[4] S. Chen, X. Tan, B. Wang, and X. Hu. Reverse attention for salient object detection. In ECCV.
[5] X. Chen, A. Zheng, J. Li, and F. Lu. Look, perceive and segment: Finding the salient objects in images via two-stream fixation-semantic CNNs. In IEEE ICCV.
[6] M.-M. Cheng, N. J. Mitra, X. Huang, and S.-M. Hu. Salientshape: Group saliency in image collections. The Visual Computer, 30(4).
[7] M.-M. Cheng, N. J. Mitra, X. Huang, P. H. Torr, and S.-M. Hu. Global contrast based salient region detection. IEEE TPAMI, 37(3).
[8] R. Cong, J. Lei, H. Fu, M.-M. Cheng, W. Lin, and Q. Huang. Review of visual saliency detection with comprehensive information. IEEE TCSVT.
[9] H. Z. J. X. K. Dana. Deep TEN: Texture encoding network. In IEEE CVPR.
[10] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In IEEE CVPR.
[11] Y. Gao, M. Wang, Z.-J. Zha, J. Shen, X. Li, and X. Wu. Visual-textual joint relevance learning for tag-based social image search. IEEE TIP, 22(1).
[12] J. Han, D. Zhang, G. Cheng, N. Liu, and D. Xu. Advanced deep-learning techniques for salient and category-specific object detection: a survey. IEEE Signal Processing Magazine, 35(1):84-100.
[13] B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik. Hypercolumns for object segmentation and fine-grained localization. In IEEE CVPR.
[14] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In IEEE CVPR.
[15] S. He, J. Jiao, X. Zhang, G. Han, and R. W. Lau. Delving into salient object subitizing and detection. In IEEE ICCV.
[16] Q. Hou, M.-M. Cheng, X. Hu, A. Borji, Z. Tu, and P. Torr. Deeply supervised salient object detection with short connections. In IEEE CVPR.
[17] M. A. Islam, M. Kalash, and N. D. Bruce. Revisiting salient object detection: Simultaneous detection, ranking, and subitizing of multiple salient objects. In IEEE CVPR.
[18] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. In ACM MM.
[19] H. Jiang, J. Wang, Z. Yuan, Y. Wu, N. Zheng, and S. Li. Salient object detection: A discriminative regional feature integration approach. In IEEE CVPR.
[20] Z. Jiang and L. S. Davis. Submodular salient region detection. In IEEE CVPR.
[21] G. Lee, Y.-W. Tai, and J. Kim. Deep saliency with encoded low level distance map and high level features. In IEEE CVPR.
[22] G. Li, Y. Xie, L. Lin, and Y. Yu. Instance-level salient object segmentation. In IEEE CVPR.
[23] G. Li and Y. Yu. Visual saliency based on multiscale deep features. In IEEE CVPR.
[24] G. Li and Y. Yu. Deep contrast learning for salient object detection. In IEEE CVPR.
[25] X. Li, F. Yang, H. Cheng, W. Liu, and D. Shen. Contour knowledge transfer for salient object detection. In ECCV.
[26] N. Liu and J. Han. DHSNet: Deep hierarchical saliency network for salient object detection. In IEEE CVPR.
[27] N. Liu and J. Han. A deep spatial contextual long-term recurrent convolutional network for saliency detection. IEEE TIP, 27(7).
[28] N. Liu, J. Han, T. Liu, and X. Li. Learning to predict eye fixations via multiresolution convolutional neural networks. IEEE TNNLS, 29(2).
[29] N. Liu, J. Han, and M.-H. Yang. PiCANet: Learning pixelwise contextual attention for saliency detection. In IEEE CVPR.
[30] Y. Liu, M.-M. Cheng, X. Hu, K. Wang, and X. Bai. Richer convolutional features for edge detection. In IEEE CVPR.
[31] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In IEEE CVPR.
[32] Z. Luo, A. K. Mishra, A. Achkar, J. A. Eichel, S. Li, and P.-M. Jodoin. Non-local deep features for salient object detection. In IEEE CVPR.
[33] V. Mahadevan and N. Vasconcelos. Saliency-based discriminant tracking. In IEEE CVPR.
[34] V. Movahedi and J. H. Elder. Design and perceptual validation of performance measures for salient object segmentation. In IEEE Conf. Comput. Vis. Pattern Recog. Worksh., pages 49-56.
[35] H. Noh, S. Hong, and B. Han. Learning deconvolution network for semantic segmentation. In IEEE ICCV.
[36] Z. Ren, S. Gao, L.-T. Chia, and I. W.-H. Tsang. Region-based saliency detection and its application in object recognition. IEEE TCSVT, 24(5).
[37] O. Ronneberger, P. Fischer, and T. Brox. U-Net: convolutional networks for biomedical image segmentation. In MICCAI.
[38] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR.
[39] N. Tong, H. Lu, X. Ruan, and M.-H. Yang. Salient object detection via bootstrap learning. In IEEE CVPR.
[40] L. Wang, H. Lu, X. Ruan, and M.-H. Yang. Deep networks for saliency detection via local estimation and global search. In IEEE CVPR.
[41] L. Wang, H. Lu, Y. Wang, M. Feng, D. Wang, B. Yin, and X. Ruan. Learning to detect salient objects with image-level supervision. In IEEE CVPR.
[42] L. Wang, L. Wang, H. Lu, P. Zhang, and X. Ruan. Saliency detection with recurrent fully convolutional networks. In ECCV.
[43] T. Wang, A. Borji, L. Zhang, P. Zhang, and H. Lu. A stagewise refinement model for detecting salient objects in images. In IEEE ICCV.
[44] T. Wang, L. Zhang, S. Wang, H. Lu, G. Yang, X. Ruan, and A. Borji. Detect globally, refine locally: A novel approach to saliency detection. In IEEE CVPR.
[45] W. Wang, J. Shen, X. Dong, and A. Borji. Salient object detection driven by fixation prediction. In IEEE CVPR.
[46] Y. Wei, J. Feng, X. Liang, M.-M. Cheng, Y. Zhao, and S. Yan. Object region mining with adversarial erasing: A simple classification to semantic segmentation approach. In IEEE CVPR.
[47] Y. Wei, H. Xiao, H. Shi, Z. Jie, J. Feng, and T. S. Huang. Revisiting dilated convolution: A simple approach for weakly- and semi-supervised semantic segmentation. In IEEE CVPR.
[48] S. Xie and Z. Tu. Holistically-nested edge detection. In IEEE ICCV.
[49] Q. Yan, L. Xu, J. Shi, and J. Jia. Hierarchical saliency detection. In IEEE CVPR.
[50] C. Yang, L. Zhang, H. Lu, X. Ruan, and M.-H. Yang. Saliency detection via graph-based manifold ranking. In IEEE CVPR.
[51] Y. Zeng, H. Lu, L. Zhang, M. Feng, and A. Borji. Learning to promote saliency detectors. In IEEE CVPR.
[52] H. Zhang, K. Dana, J. Shi, Z. Zhang, X. Wang, A. Tyagi, and A. Agrawal. Context encoding for semantic segmentation. In IEEE CVPR.
[53] J. Zhang, T. Zhang, Y. Dai, M. Harandi, and R. Hartley. Deep unsupervised saliency detection: A multiple noisy labeling perspective. In IEEE CVPR.
[54] L. Zhang, J. Dai, H. Lu, Y. He, and G. Wang. A bi-directional message passing model for salient object detection. In IEEE CVPR.
[55] P. Zhang, D. Wang, H. Lu, H. Wang, and X. Ruan. Amulet: Aggregating multi-level convolutional features for salient object detection. In IEEE ICCV.
[56] P. Zhang, D. Wang, H. Lu, H. Wang, and B. Yin. Learning uncertain convolutional features for accurate saliency detection. In IEEE ICCV.
[57] X. Zhang, T. Wang, J. Qi, H. Lu, and G. Wang. Progressive attention guided recurrent network for salient object detection. In IEEE CVPR.
[58] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia. Pyramid scene parsing network. In IEEE CVPR.
[59] R. Zhao, W. Ouyang, H. Li, and X. Wang. Saliency detection by multi-context deep learning. In IEEE CVPR.
[60] W. Zhu, S. Liang, Y. Wei, and J. Sun. Saliency optimization from robust background detection. In IEEE CVPR.
[61] F. Zund, Y. Pritch, A. Sorkine-Hornung, S. Mangold, and T. Gross. Content-aware compression using saliency-driven image retargeting. In ICIP.


More information

Quantifying Radiographic Knee Osteoarthritis Severity using Deep Convolutional Neural Networks

Quantifying Radiographic Knee Osteoarthritis Severity using Deep Convolutional Neural Networks Quantifying Radiographic Knee Osteoarthritis Severity using Deep Convolutional Neural Networks Joseph Antony, Kevin McGuinness, Noel E O Connor, Kieran Moran Insight Centre for Data Analytics, Dublin City

More information

arxiv: v1 [cs.cv] 24 Jul 2018

arxiv: v1 [cs.cv] 24 Jul 2018 Multi-Class Lesion Diagnosis with Pixel-wise Classification Network Manu Goyal 1, Jiahua Ng 2, and Moi Hoon Yap 1 1 Visual Computing Lab, Manchester Metropolitan University, M1 5GD, UK 2 University of

More information

A NEW HUMANLIKE FACIAL ATTRACTIVENESS PREDICTOR WITH CASCADED FINE-TUNING DEEP LEARNING MODEL

A NEW HUMANLIKE FACIAL ATTRACTIVENESS PREDICTOR WITH CASCADED FINE-TUNING DEEP LEARNING MODEL A NEW HUMANLIKE FACIAL ATTRACTIVENESS PREDICTOR WITH CASCADED FINE-TUNING DEEP LEARNING MODEL Jie Xu, Lianwen Jin*, Lingyu Liang*, Ziyong Feng, Duorui Xie South China University of Technology, Guangzhou

More information

Edge Detection Techniques Using Fuzzy Logic

Edge Detection Techniques Using Fuzzy Logic Edge Detection Techniques Using Fuzzy Logic Essa Anas Digital Signal & Image Processing University Of Central Lancashire UCLAN Lancashire, UK eanas@uclan.a.uk Abstract This article reviews and discusses

More information

A Review of Co-saliency Detection Technique: Fundamentals, Applications, and Challenges

A Review of Co-saliency Detection Technique: Fundamentals, Applications, and Challenges REGULAR PAPER 1 A Review of Co-saliency Detection Technique: Fundamentals, Applications, and Challenges Dingwen Zhang, Huazhu Fu, Junwei Han, Senior Member, IEEE, Feng Wu, Fellow, IEEE arxiv:1604.07090v2

More information

Lateral Inhibition-Inspired Convolutional Neural Network for Visual Attention and Saliency Detection

Lateral Inhibition-Inspired Convolutional Neural Network for Visual Attention and Saliency Detection The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18) Lateral Inhibition-Inspired Convolutional Neural Network for Visual Attention and Saliency Detection Chunshui Cao, 1,2 Yongzhen Huang,

More information

Development of novel algorithm by combining Wavelet based Enhanced Canny edge Detection and Adaptive Filtering Method for Human Emotion Recognition

Development of novel algorithm by combining Wavelet based Enhanced Canny edge Detection and Adaptive Filtering Method for Human Emotion Recognition International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 12, Issue 9 (September 2016), PP.67-72 Development of novel algorithm by combining

More information

arxiv: v1 [cs.cv] 4 Jul 2018

arxiv: v1 [cs.cv] 4 Jul 2018 Noname manuscript No. (will be inserted by the editor) An Integration of Bottom-up and Top-Down Salient Cues on RGB-D Data: Saliency from Objectness vs. Non-Objectness Nevrez Imamoglu Wataru Shimoda Chi

More information

Dual Path Network and Its Applications

Dual Path Network and Its Applications Learning and Vision Group (NUS), ILSVRC 2017 - CLS-LOC & DET tasks Dual Path Network and Its Applications National University of Singapore: Yunpeng Chen, Jianan Li, Huaxin Xiao, Jianshu Li, Xuecheng Nie,

More information

Weakly Supervised Coupled Networks for Visual Sentiment Analysis

Weakly Supervised Coupled Networks for Visual Sentiment Analysis Weakly Supervised Coupled Networks for Visual Sentiment Analysis Jufeng Yang, Dongyu She,Yu-KunLai,PaulL.Rosin, Ming-Hsuan Yang College of Computer and Control Engineering, Nankai University, Tianjin,

More information

A Study on Automatic Age Estimation using a Large Database

A Study on Automatic Age Estimation using a Large Database A Study on Automatic Age Estimation using a Large Database Guodong Guo WVU Guowang Mu NCCU Yun Fu BBN Technologies Charles Dyer UW-Madison Thomas Huang UIUC Abstract In this paper we study some problems

More information

THE human visual system has the ability to zero-in rapidly onto

THE human visual system has the ability to zero-in rapidly onto 1 Weakly Supervised Top-down Salient Object Detection Hisham Cholakkal, Jubin Johnson, and Deepu Rajan arxiv:1611.05345v2 [cs.cv] 17 Nov 2016 Abstract Top-down saliency models produce a probability map

More information

arxiv: v1 [cs.cv] 12 Jun 2018

arxiv: v1 [cs.cv] 12 Jun 2018 Multiview Two-Task Recursive Attention Model for Left Atrium and Atrial Scars Segmentation Jun Chen* 1, Guang Yang* 2, Zhifan Gao 3, Hao Ni 4, Elsa Angelini 5, Raad Mohiaddin 2, Tom Wong 2,Yanping Zhang

More information

Network Dissection: Quantifying Interpretability of Deep Visual Representation

Network Dissection: Quantifying Interpretability of Deep Visual Representation Name: Pingchuan Ma Student number: 3526400 Date: August 19, 2018 Seminar: Explainable Machine Learning Lecturer: PD Dr. Ullrich Köthe SS 2018 Quantifying Interpretability of Deep Visual Representation

More information

3D Deep Learning for Multi-modal Imaging-Guided Survival Time Prediction of Brain Tumor Patients

3D Deep Learning for Multi-modal Imaging-Guided Survival Time Prediction of Brain Tumor Patients 3D Deep Learning for Multi-modal Imaging-Guided Survival Time Prediction of Brain Tumor Patients Dong Nie 1,2, Han Zhang 1, Ehsan Adeli 1, Luyan Liu 1, and Dinggang Shen 1(B) 1 Department of Radiology

More information

Research Article Multiscale CNNs for Brain Tumor Segmentation and Diagnosis

Research Article Multiscale CNNs for Brain Tumor Segmentation and Diagnosis Computational and Mathematical Methods in Medicine Volume 2016, Article ID 8356294, 7 pages http://dx.doi.org/10.1155/2016/8356294 Research Article Multiscale CNNs for Brain Tumor Segmentation and Diagnosis

More information

Action Recognition. Computer Vision Jia-Bin Huang, Virginia Tech. Many slides from D. Hoiem

Action Recognition. Computer Vision Jia-Bin Huang, Virginia Tech. Many slides from D. Hoiem Action Recognition Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem This section: advanced topics Convolutional neural networks in vision Action recognition Vision and Language 3D

More information

IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING 1

IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING 1 IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING Joint Classification and Regression via Deep Multi-Task Multi-Channel Learning for Alzheimer s Disease Diagnosis Mingxia Liu, Jun Zhang, Ehsan Adeli, Dinggang

More information

Efficient Salient Region Detection with Soft Image Abstraction

Efficient Salient Region Detection with Soft Image Abstraction Efficient Salient Region Detection with Soft Image Abstraction Ming-Ming Cheng Jonathan Warrell Wen-Yan Lin Shuai Zheng Vibhav Vineet Nigel Crook Vision Group, Oxford Brookes University Abstract Detecting

More information

Revisiting RCNN: On Awakening the Classification Power of Faster RCNN

Revisiting RCNN: On Awakening the Classification Power of Faster RCNN Revisiting RCNN: On Awakening the Classification Power of Faster RCNN Bowen Cheng 1, Yunchao Wei 1, Honghui Shi 2, Rogerio Feris 2, Jinjun Xiong 2, and Thomas Huang 1 1 University of Illinois at Urbana-Champaign,

More information

A HMM-based Pre-training Approach for Sequential Data

A HMM-based Pre-training Approach for Sequential Data A HMM-based Pre-training Approach for Sequential Data Luca Pasa 1, Alberto Testolin 2, Alessandro Sperduti 1 1- Department of Mathematics 2- Department of Developmental Psychology and Socialisation University

More information

A Deep Multi-Level Network for Saliency Prediction

A Deep Multi-Level Network for Saliency Prediction A Deep Multi-Level Network for Saliency Prediction Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra and Rita Cucchiara Dipartimento di Ingegneria Enzo Ferrari Università degli Studi di Modena e Reggio

More information

The Impact of Visual Saliency Prediction in Image Classification

The Impact of Visual Saliency Prediction in Image Classification Dublin City University Insight Centre for Data Analytics Universitat Politecnica de Catalunya Escola Tècnica Superior d Enginyeria de Telecomunicacions de Barcelona Eric Arazo Sánchez The Impact of Visual

More information

DeepMiner: Discovering Interpretable Representations for Mammogram Classification and Explanation

DeepMiner: Discovering Interpretable Representations for Mammogram Classification and Explanation DeepMiner: Discovering Interpretable Representations for Mammogram Classification and Explanation Jimmy Wu 1, Bolei Zhou 1, Diondra Peck 2, Scott Hsieh 3, Vandana Dialani, MD 4 Lester Mackey 5, and Genevieve

More information

LSTD: A Low-Shot Transfer Detector for Object Detection

LSTD: A Low-Shot Transfer Detector for Object Detection LSTD: A Low-Shot Transfer Detector for Object Detection Hao Chen 1,2, Yali Wang 1, Guoyou Wang 2, Yu Qiao 1,3 1 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China 2 Huazhong

More information

Motivation: Attention: Focusing on specific parts of the input. Inspired by neuroscience.

Motivation: Attention: Focusing on specific parts of the input. Inspired by neuroscience. Outline: Motivation. What s the attention mechanism? Soft attention vs. Hard attention. Attention in Machine translation. Attention in Image captioning. State-of-the-art. 1 Motivation: Attention: Focusing

More information

arxiv: v1 [cs.lg] 4 Feb 2019

arxiv: v1 [cs.lg] 4 Feb 2019 Machine Learning for Seizure Type Classification: Setting the benchmark Subhrajit Roy [000 0002 6072 5500], Umar Asif [0000 0001 5209 7084], Jianbin Tang [0000 0001 5440 0796], and Stefan Harrer [0000

More information