Captioning Videos Using Large-Scale Image Corpus

Du X Y, Yang Y, Yang L et al. Captioning videos using large-scale image corpus. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 32(3): 480-493 May 2017. DOI 10.1007/s…

Captioning Videos Using Large-Scale Image Corpus

Xiao-Yu Du 1,2, Member, CCF, Yang Yang 3,4, Member, CCF, ACM, IEEE, Liu Yang 1,5, Fu-Min Shen 3,4, Member, CCF, ACM, IEEE, Zhi-Guang Qin 1, Senior Member, CCF, Member, ACM, and Jin-Hui Tang 1,6,*, Senior Member, CCF, IEEE, Member, ACM

1 School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China
2 School of Software Engineering, Chengdu University of Information Technology, Chengdu 610225, China
3 Center for Future Media, University of Electronic Science and Technology of China, Chengdu 611731, China
4 School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
5 Sichuan University West China Hospital of Stomatology, Chengdu 610041, China
6 School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China

E-mail: duxiaoyu@cuit.edu.cn; dlyyang@gmail.com; yangliu988322@163.com; fumin.shen@gmail.com
E-mail: qinzg@uestc.edu.cn; jinhuitang@njust.edu.cn

Received December 26, 2016; revised April 7, 2017.

Abstract   Video captioning is the task of assigning complex high-level semantic descriptions (e.g., sentences or paragraphs) to video data. Different from previous video analysis techniques such as video annotation, video event detection and action recognition, video captioning is much closer to human cognition, with a smaller semantic gap. However, the scarcity of captioned video data severely limits the development of video captioning. In this paper, we propose a novel video captioning approach that describes videos by leveraging freely-available image corpora with abundant literal knowledge. There are two key aspects of our approach: 1) an effective integration strategy bridging videos and images, and 2) high efficiency in handling ever-increasing training data. To achieve these goals, we adopt sophisticated visual hashing techniques to efficiently index and search large-scale image collections for relevant captions, which is highly extensible to evolving data and the corresponding semantics. Extensive experimental results on various real-world visual datasets show the effectiveness of our approach with different hashing techniques, e.g., LSH (locality-sensitive hashing), PCA-ITQ (principal component analysis iterative quantization) and supervised discrete hashing, as compared with the state-of-the-art methods. It is worth noting that the empirical computational cost of our approach is much lower than that of an existing method, i.e., it takes 1/256 of the memory requirement and 1/64 of the time cost of the method of Devlin et al.

Keywords   video captioning, hashing, image captioning

1 Introduction

In recent years, driven by the rapid development of mobile devices, the Internet and social sharing platforms, a tremendous amount of user-generated video clips have been emerging on the Web. As an important alternative to words and images, such UGC videos have significantly changed the way people live and communicate. Every minute, … hours of video are uploaded to YouTube and 39 million videos are uploaded to Vine.

Regular Paper. Special Section of CVM 2017.
This work was partially supported by the National Basic Research 973 Program of China under Grant No. 2014CB347600, the National Natural Science Foundation of China under Grant Nos. …, …, …, and 6208, the National Ten-Thousand Talents Program of China (Young Top-Notch Talent), the National Thousand Young Talents Program of China, the Fundamental Research Funds for the Central Universities of China under Grant Nos. ZYGX2014Z007 and ZYGX2015J055, and the Natural Science Foundation of Jiangsu Province of China under Grant No. BK….
* Corresponding Author
©2017 Springer Science + Business Media, LLC & Science Press, China

Besides, on average 1.5 million videos are uploaded to Miaopai every day. Such a large amount of videos mostly lack descriptions. How to semantically index these video data and make them easier to see and comprehend is a significant problem [1].

As shown in Fig.1, video annotation [2] focuses on deriving simple semantic concepts from low-level features, i.e., describing videos in natural language words rather than numeric features. To comprehend more semantic information from video data, event detection [3], action recognition [1], refined frame tags [4], deduced tags [5] and many other semantic descriptions [6-7] were designed for learning more complex semantics [8]. Nevertheless, these approaches are based on classification, their results are confined to certain pre-defined concept/event sets, and the descriptive power of such results is very limited. To provide users with comprehensive descriptions, video captioning, which describes videos with sentences or paragraphs, is in urgent need.

Fig.1. Describing video clips using video annotation, action recognition, event detection and video captioning.

In existing video captioning approaches, convolutional neural networks (CNN) and long short-term memory (LSTM) are widely used. The conventional way is to first adapt a CNN model to extract video features and then utilize LSTM to generate video captions. Nonetheless, there are two disadvantages. 1) Extremely Complex Neural Networks for Video Data. The complex structure slows down execution, which limits the extensibility to variable large-scale datasets. 2) Scarcity Problem in Existing Video Corpora. For instance, one of the most popular video captioning datasets, YouTube2Text (Y2T), only contains 1970 video clips with captions. Such a scale of training data can hardly guarantee reliable models that generalize to new video data.

In order to address the above problems, we turn to seeking auxiliary captioned data. Image datasets always contain various corpora (e.g., imSitu provides situation annotations [9]), and they are often used as auxiliary information in video processing [10-11]. Moreover, we notice that there are many image captioning datasets with sufficiently relevant captions. Integrating image captioning corpora may be a good way to augment the scale of corpora for video captioning.

There are two major image captioning directions. The first is to extract image features (e.g., CNN features, and the supervised and unsupervised features [12]) and then generate captions by LSTM. The other is to generate captions from the descriptions of the images similar to the query image. As mentioned before, the first direction is less extensible and does not perform well under data scarcity and data updating. Therefore, in this paper, we choose to follow the second direction and refer to it as the retrieval approach in the subsequent content. With frequently-updated training data, the retrieval approach only needs to process the new data; in contrast, the deep learning approach may have to retrain the complex model, which can hardly be afforded. Meanwhile, even if there are only a very small number of images similar to a query image, the retrieval approach can still perform well by accurately discovering the most relevant samples to guarantee the performance.

It is intuitive that when the size of the training data is large enough, the retrieval approach is able to achieve good performance. However, the computational cost of k-nearest neighbor (kNN) search is extremely high, as the dimensionality of visual features is normally large, e.g., the fc7 layer (4096-D) from AlexNet or VGG. Moreover, the original visual features may contain redundancy and/or noise, which degrades the performance.

In order to compensate for the drawbacks of traditional kNN search, we propose to employ hashing techniques to index visual samples. With its excellent ability to reduce storage and computational cost, hashing [13] has been extensively studied and applied to multimedia retrieval in large-scale databases. The main idea is to utilize a small set of binary bits to represent a sample instead of a large number of real-valued features [14]. The similarity between samples can then be efficiently computed with the Hamming distance between the corresponding hash codes. Locality-sensitive hashing (LSH) is known as one of the most popular data-independent hashing methods. In recent years, data-dependent or learning-based hashing methods have become more popular since they perform much better in accuracy and compression rate. Iterative quantization (ITQ) is a typical unsupervised optimization algorithm. The popular unsupervised hashing method PCA-ITQ [15] compresses features by principal component analysis (PCA) and then optimizes the features by ITQ. The recently proposed supervised discrete hashing (SDH) [16] is a well-performed supervised hashing method.

In this paper, we propose a novel video captioning approach using hashing techniques. First of all, we utilize visual hashing methods to preprocess the visual elements, including images and videos; we instantiate our method with SDH in this paper. Then we retrieve the images similar to the query video and collect the corresponding captions as candidate captions. At last, we re-rank the candidate captions and select the consensus one as our captioning result.

The main contributions of this paper are summarized as follows.

We propose a novel video captioning approach by leveraging well-captioned image data. The sufficient semantic descriptions in large-scale image datasets help to overcome the deficiency of captioned videos.

We devise a simple yet effective approach to align image and video data to alleviate the influence of domain difference, which bridges images and videos by extracting representative video frames.

By utilizing hashing techniques, compared with the approach proposed in [17], the proposed retrieval captioning approach achieves a 64x speedup with only 1/256 of the memory overhead. Hash codes are used for effective and efficient feature compression, which makes the retrieval approach easily extensible to emerging data.

The remainder of this paper is organized as follows. Section 2 presents related work. Section 3 elaborates the details of our proposed approach. Section 4 demonstrates the experimental results, followed by the conclusions in Section 5.

2 Related Work

The rapid development of deep neural networks (DNN) motivates the progress of multimedia processing. The convolutional neural network (CNN) [18] is a kind of DNN used to process image information [19]. The recurrent neural network (RNN) [20] is another kind of DNN, widely used in natural language generation [21]. To reduce the complexity of DNN implementation, various DNN tools were developed. Caffe [22] is a famous one, which can construct, train, test and improve networks in a simple way.
Therefore many classical CNN models are trained on Caffe, including AlexNet [18], GoogLeNet [23] and VGG [24]. Since CNN models always perform well across domains [25], the trained models are frequently selected to extract semantic features. The fully connected layers named fc7 in AlexNet and VGG16 are often treated as semantic features of a given image; their performance in image classification and similarity judgment is better than that of traditional features.

Once features are extracted, captioning approaches generate sentences word by word. Some existing works utilized probability models such as Markov chain [26], expectation maximization [27] and maximum entropy [28] models. Some directly utilized neural network models, RNN [29] and LSTM [30], and some mixed CNN and RNN [31-32] to train an integrative model. In the above approaches, the descriptions are generated mechanically, and thus we call them generation approaches.

Another kind of captioning approach takes existing human captions as the captioning result; such methods are called retrieval approaches. One of the retrieval approaches performs well in the MSCOCO challenge [33]. It is a simple and even rude solution. Given a query image, several similar images are collected as candidate images. Then the candidate captions, which refer to the candidate images, are rearranged, and the best caption is treated as the consensus caption. The retrieval approach was first

proposed in [34]. Ordonez et al. [34] constructed a corpus of one million captioned images and designed captioning experiments. Their experiments demonstrated that the more images are appended to the dataset, the more effective the results are. To retrieve similar images, Devlin et al. [17] used the k-nearest neighbor (kNN) method.

Hashing is another famous retrieval method, especially for large-scale datasets. The main idea is to effectively compress features into a set of binary codes (hash codes) so as to efficiently calculate the similarity using Hamming distance [35]. Some hashing methods have been accelerated [36] and some can extend the identifications [37]. To generalize the process, some hashing frameworks were proposed [38]. There are three kinds of hashing: 1) data-independent methods such as locality-sensitive hashing (LSH) [39], 2) unsupervised methods such as RDSH [40], IMH [41] and PCA-ITQ [15], and 3) supervised methods such as LDA-HASH [42], CCA-ITQ [15], RDCM [43] and SDH [16]. PCA-ITQ and CCA-ITQ are dimensionality reduction methods with iterative quantization, and they perform well as presented in [15]. Shen et al. [16] demonstrated that SDH performs much better than other supervised hashing methods. Generally the supervised methods perform better than the unsupervised ones, and the unsupervised ones perform better than the data-independent ones. It is notable that the number of hash bits also significantly affects the performance; for example, LSH empirically needs at least 512 bits to retrieve fairly similar images.

Many image captioning approaches have been applied to video captioning. Improving the image generation approaches has been a widely-used way. Ballas et al. [44] arranged several CNNs in a CNN matrix to process the adjacent frames synchronously. Mazloom et al. [45] took bag-of-words with SIFT features [46], MFCC [47] and attention points as video features. Shetty and Laaksonen [48] incorporated frame features extracted by CNN, classification from CNN and video features as the input of LSTM. Yao et al. [49] formed a 3D-CNN encoder-decoder framework, with which spatio-temporal vectors are divided into chunks as the input of LSTM. Pan et al. [50] proposed a hierarchical recurrent neural network to extract video features. Sener et al. [51] put information from videos and subtitles into bag-of-words and took it as the input of LSTM. These studies demonstrate that ideas from image captioning approaches benefit videos. Therefore, the retrieval approaches, so far rarely used in video captioning, are another captioning choice.

3 Proposed Approach

In this section, we describe the proposed video captioning approach. As shown in Fig.2, we first integrate the video corpus and image corpus by mapping them into a common visual semantic space, and then compress them into binary codes. Given a query video, we retrieve the similar samples from the mapped corpus and collect the captions related to the similar elements as our candidate captions. Finally, the candidate captions are rearranged, and the best caption, also called the consensus caption, is selected as the caption for the query video.

3.1 Data Integration

Note that each video may consist of a sequence of frames, which makes it difficult to directly compare videos and images. In this part, we integrate videos and images in the same space where we can calculate their similarity, and we design a simple yet effective approach to achieve this goal. Based on the fact that current UGC videos are very short (normally a few seconds), it is reasonable to assume that the semantics of a video is focused.
In fact, we have analyzed the distribution of video lengths in the Y2T dataset, and the average length is 9 s. As illustrated in Fig.3, the representative videos are semantically focused. Meanwhile, if we used multiple frames to represent videos, the common part of these frames (e.g., background) might be over-emphasized. Therefore, it is possible for us to use only one frame to represent a video, which also provides a straightforward way to connect image representation and video representation. In this work, we adopt the fc7 layer of VGG16 as the visual feature for both images and videos.

3.2 Visual Hashing

The visual features usually lie in a high-dimensional space (4096-D for AlexNet CNN features). We compress the features into a set of binary codes with hashing methods. There are various hashing methods; we select several typical methods of different kinds as our benchmark.

LSH. Locality-sensitive hashing (LSH) [39] is a fundamental hashing method and it is data-independent. Codes are generated using random projection. Randomly constructing a weight matrix W ∈ R^{d×b}, where d is the number of source feature dimensions and b is the number of hash bits, the codes can be calculated by C = (FW > 0). Here F ∈ R^{n×d} indicates n samples with d-dimensional features.
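To make the projection concrete, the following is a minimal sketch of this step (an illustration rather than the authors' code; the fc7 features are assumed to be precomputed, with one feature vector per image or per representative video frame):

```python
import numpy as np

def lsh_codes(features, n_bits=512, seed=0):
    """Data-independent LSH: binary codes C = (F W > 0) via random projection."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((features.shape[1], n_bits))  # W in R^{d x b}
    return (features @ W > 0).astype(np.uint8)            # n x b code matrix

# Toy usage: five 4096-D "fc7" vectors -> 512-bit codes.
F = np.random.rand(5, 4096).astype(np.float32)
print(lsh_codes(F).shape)  # (5, 512)
```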

Fig.2. Video clip captioning flow.

Fig.3. Frames extracted from video clips. The main ideas to be expressed are clear in the frames.

PCA-ITQ. PCA-ITQ [15] is one of the most popular unsupervised hashing methods. Different from LSH, it is optimized with the data features to get better performance. PCA-ITQ contains two steps: 1) utilizing principal component analysis (PCA) to reduce the feature dimensions, and 2) utilizing ITQ to optimize the features by shifting and rotating. As shown in [15], we would get a transform matrix T for the PCA reduction and a transform matrix R for the ITQ operation. Therefore, the weight matrix of PCA-ITQ is W = TR, and the hash codes can be calculated by C = (FW > 0) as well.
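A compact sketch of these two steps is given below (illustrative only; the rotation update is the orthogonal-Procrustes step of ITQ as described in [15], and the input is again assumed to be a precomputed feature matrix):

```python
import numpy as np

def pca_itq(F, n_bits=64, n_iter=50, seed=0):
    """PCA-ITQ sketch: PCA projection T, then an ITQ rotation R; returns W = T R."""
    rng = np.random.default_rng(seed)
    Fc = F - F.mean(axis=0)                       # center the training features
    _, _, Vt = np.linalg.svd(Fc, full_matrices=False)
    T = Vt[:n_bits].T                             # d x b: top principal directions
    V = Fc @ T                                    # PCA-reduced data
    R, _ = np.linalg.qr(rng.standard_normal((n_bits, n_bits)))  # random orthogonal init
    for _ in range(n_iter):                       # alternate codes and rotation
        B = np.sign(V @ R)                        # fix R, update binary codes
        U, _, Pt = np.linalg.svd(V.T @ B)         # fix B: orthogonal Procrustes update
        R = U @ Pt
    return T @ R                                  # codes are then (F_centered @ W > 0)
```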

SDH. The recently proposed supervised discrete hashing (SDH) [16] is a well-performed supervised hashing method, and we utilize SDH as our video hashing module in this paper. SDH learns the binary codes and hash functions through the following problem:

min_{B,W,F} \sum_{i=1}^{n} \|y_i - W^T b_i\|^2 + \lambda\|W\|^2 + \nu \sum_{i=1}^{n} \|b_i - H(x_i)\|^2,  s.t. b_i \in \{-1, 1\}^L.

That is,

min_{B,W,F} \|Y - W^T B\|^2 + \lambda\|W\|^2 + \nu\|B - H(X)\|^2,  s.t. B \in \{-1, 1\}^{L \times n}.

Here H(·) is the hash function to be learned, λ and ν are penalty parameters, X = {x_i}_{i=1}^{n} is the data matrix, Y = {y_i}_{i=1}^{n} ∈ {0,1}^{C×n} is the ground-truth label matrix, W = {w_k}_{k=1}^{C} ∈ R^{L×C} is the classification matrix, and B = {b_i}_{i=1}^{n} ∈ {−1,1}^{L×n} holds the learned binary codes, where the i-th column b_i is the binary code for x_i. The key of SDH is that the binary code optimization is solved not by resorting to continuous relaxations but directly by discrete optimization. In SDH, the optimization proceeds with three alternating steps: the W-step, the F-step and the B-step [16]. With the obtained hash function H(·), our hash codes can be generated as C = (H(X) > 0).

3.3 Video Captioning

In this part, we describe how to generate captions for videos using hashing and auxiliary images. The overall flowchart is illustrated in Fig.2, and it is comprised of three major components. Preliminarily, we transform all the training images and videos into hash codes using the learned hash functions; all the hash codes can be directly stored in main memory. Given a target video v, we go through the following steps.

Step 1. We first export the representative frame and extract the fc7 layer of VGG as the visual feature. Then, we convert the visual feature to hash codes, denoted as q, using the learned hash functions.

Step 2. Given q, we search the whole database of binary codes to obtain similar samples. Note that the similarity between two samples is calculated according to the Hamming distance. Given the code of a retrieved sample p ∈ {0,1}^L, the Hamming distance of q and p is defined as

hd(q, p) = \sum_{l=1}^{L} XOR(q_l, p_l),

where q_l and p_l are the l-th elements of q and p, respectively. The k elements with the smallest Hamming distances are taken as the similar samples.

Step 3. Finally, we collect the captions of all the retrieved samples to form a candidate caption set, denoted as C. To generate the desired caption for the target, we first find a subset of size m, denoted as M ⊆ C, and then take the most representative caption c, i.e., the one most similar to the elements in M. Following [17], we take the BLEU-4 [52] score as our caption similarity, and our consensus caption is formally generated by

c* = \arg\max_{c \in C} \max_{M \subseteq C} \sum_{c' \in M} Sim(c, c'),

where Sim(·, ·) computes the similarity of two captions.
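Steps 2 and 3 can be summarized in a short sketch (hypothetical helper names; the codes are assumed to be stored bit-packed as uint8 arrays, and `sim` stands for the BLEU-4 caption similarity, for which any off-the-shelf implementation could be substituted):

```python
import numpy as np

def hamming_topk(q, codes, k=50):
    """Step 2: indices of the k nearest codes to query q under Hamming distance.
    q: (n_bytes,) uint8; codes: (N, n_bytes) uint8, bit-packed."""
    dists = np.unpackbits(np.bitwise_xor(codes, q), axis=1).sum(axis=1)
    return np.argsort(dists)[:k]

def consensus_caption(candidates, sim, m=10):
    """Step 3: return the caption most similar to its m closest peer captions."""
    best, best_score = None, float("-inf")
    for i, c in enumerate(candidates):
        sims = sorted((sim(c, c2) for j, c2 in enumerate(candidates) if j != i),
                      reverse=True)
        score = sum(sims[:m])        # the best size-m subset M for this caption
        if score > best_score:
            best, best_score = c, score
    return best

# Sketched usage, assuming corpus_codes, captions and a bleu4() helper exist:
# idx = hamming_topk(query_code, corpus_codes, k=50)
# result = consensus_caption([c for i in idx for c in captions[i]], sim=bleu4)
```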
4 Experiments

In this section, we discuss the performance of our proposed approach applied to video captioning. The experiments contain two steps. In the first step, we improve the image retrieval captioning approach and compare the effects of different kinds of hashing methods and the kNN method. In the second step, we apply our video captioning approach and illustrate the interesting results. We score the approaches in the experiments by BLEU [52], METEOR [53], CIDEr [54] and ROUGE [55], which are calculated with the MSCOCO evaluation server.

4.1 Datasets

Our experiments are designed on images and videos. Therefore, the image and video datasets shown below are selected to illustrate the multiple aspects of our approach.

MSCOCO. It is an image captioning dataset provided by Microsoft Corporation. Each image has at least five captions. The training set, validation set and test set have been separated, and an evaluation system is provided for evaluating results uniformly. There are in total 82,783 images in the training set (one of them is damaged), 40,504 images in the validation set and 40,775 in the test set. The test set is confidential for us; therefore we take the validation set as our test data.

YouTube2Text (Y2T). It is a user-generated video dataset with various clean and noisy captions. There are in total 1970 videos and … captions, of which … are English captions and … are cleaned English captions. Statistically, the average length of the videos is 9 s, and most of them can be described using one sentence. Following [44, 56-57], we take 1300 videos as the training samples and the other 670 videos as the test data.

In our experiments, the official evaluation system of MSCOCO is used to score our results. The MSCOCO dataset is used to evaluate our image captioning approach, and the Y2T dataset is used to evaluate our video captioning approach; MSCOCO is also used as an extended corpus for video captioning. The numbers of n-grams in the mentioned corpora are shown in Table 1. Obviously, the n-gram counts in the image corpus are much larger than those in the video corpus; using the image corpus may be a way to extend the video corpus.

Table 1. Statistics of n-grams in Y2T and MSCOCO

Dataset              1-Gram  2-Gram  3-Gram  4-Gram  Total
Y2T                  …       …       …       …       …
MSCOCO Training      …       …       …       …       …
MSCOCO Validation    …       …       …       …       …
Total (unique)       …       …       …       …       …
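For illustration, per-order counts like those in Table 1 can be gathered with a few lines (a toy sketch on whitespace-tokenized captions; the paper does not specify its tokenization):

```python
from collections import Counter

def ngram_counts(captions, n):
    """Count n-grams over whitespace-tokenized, lower-cased captions."""
    counts = Counter()
    for caption in captions:
        tokens = caption.lower().split()
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts

corpus = ["a cat is sitting on the table", "a dog plays with a ball"]
for n in range(1, 5):
    print(n, len(ngram_counts(corpus, n)))  # number of unique n-grams per order
```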

4.2 Corpus Codes

In the first step of our experiments, the visual elements are converted into binary codes using VGG16 and hashing methods. We explore the performance of typical hashing methods including the data-independent hashing method LSH, the unsupervised hashing method PCA-ITQ, and the supervised hashing method SDH. To evaluate the performance on a large-scale corpus, we report the results using 512-bit hash codes and 64-bit hash codes. Using 512-bit hash codes costs only 64 bytes of memory per image, so storing all the images costs less than 10 MB of memory, and using 64-bit hash codes costs much less. In contrast, taking the 4096-dimensional fc7 layer of VGG16 as an image feature costs 16 KB per image and nearly 2 GB in total, which is about 256 times more than using 512-bit codes. The saved storage grows even more pronounced on larger-scale datasets. Table 2 demonstrates the memory cost on our test datasets: 4096-D indicates the features using 4096 floats, such as the original feature exported from AlexNet, while 512-bit and 64-bit indicate the features using 512 bits and 64 bits, respectively. Obviously, the compressed features are much smaller than the original features.

Table 2. Memory Requirements Using Different Sizes of Features for Visual Elements in Y2T and MSCOCO

Dataset    4096-D     512-Bit    64-Bit
Y2T        31 MB      124.0 KB   16 KB
MSCOCO     1894 MB    7.4 MB     1 MB
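These storage figures follow from simple arithmetic; the snippet below is a quick check (a sketch; the exact image count behind Table 2 is an assumption based on the MSCOCO totals given elsewhere in the paper):

```python
# Back-of-the-envelope check of the storage claims above.
n_images = 123287                  # assumed: MSCOCO training + validation images
fc7     = n_images * 4096 * 4      # 4096 floats, 4 bytes each
bits512 = n_images * 512 // 8      # 512 bits = 64 bytes per image
bits64  = n_images * 64 // 8       # 64 bits = 8 bytes per image

for name, size in [("4096-D", fc7), ("512-bit", bits512), ("64-bit", bits64)]:
    print(f"{name:8s}{size / 2**20:10.1f} MB")
# 4096-D ~1926 MB, 512-bit ~7.5 MB, 64-bit ~0.9 MB: the ~256x gap cited above.
```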
4.3 Similar Image Retrieval

Fig.4 demonstrates the similar-image retrieval results using kNN, LSH, PCA-ITQ and SDH. From the five most similar images, we can hardly judge which method is better: the hashing methods get candidate images as effective as kNN's. Therefore, hashing methods work well in retrieving similar images, while costing much less execution time, as shown in Fig.5. With an increasing amount of data, kNN costs more and more time compared with the hashing methods. When there are about 10^6 images, using kNN costs about 10 minutes; in contrast, the hashing methods still cost only microseconds. Obviously, after our improvements, the retrieval step is far more appropriate for large-scale datasets.

4.4 Image Captioning Result

To explore the effects of image captioning using kNN, LSH, PCA-ITQ and SDH, we design several tests on the MSCOCO dataset. Following [17], we take half of the validation set (20,252 images) as tuning data and the remainder as test data, and we score the results using the MSCOCO evaluation system. There are two variables, k and m, for image captioning; we utilize the tuning data to explore the best k and m for our approaches. Fig.6 shows the effect of the k-m parameters. Generally, the variances of the different approaches are similar. Table 3 shows that the retrieval approaches using hashing methods perform as well as those using the kNN method. The scores are based on BLEU [52], METEOR [53], CIDEr [54] and ROUGE [55]. This demonstrates that our proposed image captioning approach is effective and efficient; because of the lower memory and time cost, our approach can be used on much larger datasets. The results also illustrate that although our approaches using different hashing methods have similar performance, there are still fine distinctions: the supervised method performs better than the unsupervised method, and the unsupervised method is better than the data-independent method, as shown by the descending order of scores. Besides, we compare the 512-bit codes with the 64-bit codes. As shown in Table 3, using 64-bit codes, SDH still performs well, but LSH performs much worse. When resources are limited, using 64-bit SDH must be a better choice.

4.5 Video Captioning Result

Considering that some frame in a video can denote the main contents of the video, we select one of the frames

Fig.4. Similar images retrieved by SDH, kNN, LSH, and PCA-ITQ.

as the video's indicator. Here we take the first frames and the middle frames of the videos for comparison. We take the test set of Y2T as the ground truth, and the entire MSCOCO dataset and the Y2T training set as our labeled corpora, respectively. Table 4 presents the results using our SDH retrieval captioning approach. This simple and even rude method performs as well as the approach proposed in [57]. The results also demonstrate that the first frames and the middle frames perform similarly, and both contain the main video contents.

Fig.5. Time cost for corpora with X images (curves: the 4096-D CNN fc7 feature vs. the 512-bit hashing code).

For more detail, we list some captioned videos in Fig.7 to present the captioning results. Obviously, the generated captions are related to the query videos, and the inaccuracies are mainly caused by the deficiency of the related corpus. The captions for the videos in Fig.7(a) use "puppy" and "dog" to describe the same animal; even though both of them are correct, the outer corpus obtains a lower score. Most situations similar to the videos in Fig.7(b) stand for a party; thus the caption based on MSCOCO describes some details of a party, which causes a low score. In the videos of Fig.7(c), the focuses of the two corpora are quite different: Y2T describes the process while MSCOCO describes the status. Fig.7(d) demonstrates that only when there are enough phrases in MSCOCO may it perform better than Y2T.

Generally, the Y2T training dataset is a better training dataset than the MSCOCO dataset for captioning the Y2T test data, as demonstrated in Table 4, even though there are fewer descriptions in the Y2T corpus. The BLEU-1 scores present that the results on MSCOCO contain similar information, but the much lower BLEU-4 scores present that the phrases and sentence structures in MSCOCO are quite different. With an increasing corpus size, the phrases and sentences would cover most of the grammar, and this problem would diminish. Moreover, MSCOCO is not large enough, so some image types are deficient for our application. Fig.8 shows two unmatched samples. The first one is a girl putting some sticker on her face: there are 464 captions in MSCOCO related to "sticker", but only three captions also relate to "woman", "women" or "girl". The second sample presents a similar problem: the captions including the words "cat" and "turtle" are listed there. Besides, the results are limited by the extracted features. Although VGG16 provides almost the best

Fig.6. Captioning scores while selecting the consensus caption from subsets of the captions for the similar images. (a) BLEU-4. (b) ROUGE_L. (c) METEOR. (d) CIDEr.

Table 3. Scores of Our Approaches with Different Hashing Methods on MSCOCO Validation

Approach                 BLEU-1  BLEU-2  BLEU-3  BLEU-4  CIDEr  METEOR  ROUGE
MS kNN [17]              …       …       …       …       …      …       …
512-bit  Ours-LSH        …       …       …       …       …      …       …
512-bit  Ours-PCA-ITQ    …       …       …       …       …      …       …
512-bit  Ours-SDH        …       …       …       …       …      …       …
64-bit   Ours-LSH        …       …       …       …       …      …       …
64-bit   Ours-PCA-ITQ    …       …       …       …       …      …       …
64-bit   Ours-SDH        …       …       …       …       …      …       …

Table 4. Results of the Video Retrieval Captioning Approach on Y2T Using the First Frame and the Middle Frame

Frame   Corpus   BLEU-1  BLEU-2  BLEU-3  BLEU-4  CIDEr  METEOR  ROUGE
First   Y2T      …       …       …       …       …      …       …
First   MSCOCO   …       …       …       …       …      …       …
Middle  Y2T      …       …       …       …       …      …       …
Middle  MSCOCO   …       …       …       …       …      …       …
Thomason et al. [57]     …       …       …       …       …      …       …

Fig.7. Captions for four sample videos generated by our approaches with the Y2T and MSCOCO corpora. (a) Y2T: a puppy plays with a ball. MSCOCO: a dog laying on the floor in a room. (b) Y2T: a man is playing a guitar onstage. MSCOCO: a group of people standing around and dancing. (c) Y2T: someone is slicing bread into slices. MSCOCO: a plate of food on a table. (d) Y2T: a man is riding a horse bareback. MSCOCO: a man riding on the back of an elephant.

features for image classification, it lacks salient points. When we look for a panda appearing in a clip, there is only one panda image among the 10 most similar images; most of the retrieval results focus on the background, and the rest are related to the color. To eliminate the effect of the hashing methods, we also retrieve the k most similar images using kNN, and the results shown in Fig.9 confirm our discovery.

Fig.8. No appropriate captions found for the query clip. The blue caption is a user-generated caption for the clip. The red ones are all the captions related to the keywords.

Fig.9. Images similar to a panda image.

Finally, we explore the relation between the corpus size and the captioning results. We integrate the training set and the validation set of MSCOCO and get 123,287 images in total. Then we design a 13-step experiment (s1-s13): at the first step s1, we take 3,287 images with their captions as our corpus; at each of the following steps we append 10,000 more images to the corpus; at the last step s13 we use all 123,287 images. We then score the results at each step. Fig.10 presents that as the corpus size increases from s1 to s13, our approach gets a higher and higher score. Obviously, a large-scale corpus does help in video captioning.

5 Conclusions

In this paper, we proposed a novel video captioning approach to describe videos by leveraging freely-available image corpora with abundant literal knowledge. Hashing techniques are utilized to significantly improve both computation and storage efficiency. The results demonstrated that our approach is both effective and efficient, and our simple captioning approach adapts to large-scale datasets. The volume of multimedia data on the Internet is explosively increasing; since noisy data disturbs our approach, how to clean large-scale data, integrate it, and use other auxiliary features may be the next topic.

Fig.10. Scores of video captioning using different scales of the MSCOCO corpus, for the first frame and the middle frame. (a) BLEU-1. (b) BLEU-4. (c) METEOR. (d) CIDEr.

References

[1] Song Y, Tang J H, Liu F, Yan S C. Body surface context: A new robust feature for action recognition from depth videos. IEEE Transactions on Circuits and Systems for Video Technology, 2014, 24(6).

[2] Qi G J, Hua X S, Rui Y, Tang J H, Mei T, Zhang H J. Correlative multi-label video annotation. In Proc. the 15th

12 Xiao-Yu Du et al.: Captioning Videos Using Large-Scale Iage Corpus 49 ACM International Conference on Multiedia, Sept. 2007, pp [3] Chen J W, Cui Y, Ye G N, Liu D, Chang S F. Event-driven seantic concept discovery by exploiting wealy tagged internet iages. In Proc. International Conference on Multiedia Retrieval, Apr [4] Tang J H, Shu X B, Qi G J, Li Z C, Wang M, Yan S C, Jain R. Tri-clustered tensor copletion for social-aware iage tag refineent. IEEE Transactions on Pattern Analysis and Machine Intelligence, 206, doi: 0.09/TPAMI [5] Tang J H, Shu X B, Li Z C, Qi G J, Wang J D. Generalized deep transfer networs for nowledge propagation in heterogeneous doains. ACM Transactions on Multiedia Coputing, Counications, and Applications, 206, 2(4s): Article No. 68. [6] Li Z C, Tang J H. Wealy supervised deep atrix factorization for social iage understanding. IEEE Transactions on Iage Processing, 207, 26(): [7] Li Z C, Liu J, Tang J H, Lu H Q. Robust structured subspace learning for data representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 205, 37(0): [8] Yang Y, Zhang H W, Zhang M X, Shen F M, Li X L. Visual coding in a seantic hierarchy. In Proc. the 23rd ACM International Conference on Multiedia, Oct. 205, pp [9] Yatsar M, Zettleoyer L, Farhadi A. Situation recognition: Visual seantic role labeling for iage understanding. In Proc. IEEE Conference on Coputer Vision and Pattern Recognition, Jun pp [0] Yang Y, Zha Z J, Gao Y, Zhu X F, Chua T S. Exploiting web iages for seantic video indexing via robust saple-specific loss. IEEE Transactions on Multiedia, 204, 6(6): [] Yang Y, Yang Y, Shen H T. Effective transfer tagging fro iage to video. ACM Transactions on Multiedia Coputing, Counications, and Applications, 203, 9(2): Article No. 4. [2] Li Z C, Liu J, Yang Y, Zhou X F, Lu H Q. Clusteringguided sparse structural learning for unsupervised feature selection. IEEE Transactions on Knowledge and Data Engineering, 204, 26(9): [3] Wang J D, Shen H T, Song J K, Ji J Q. Hashing for siilarity search: A survey. arxiv: , Apr [4] Tang J H, Li Z C, Wang M, Zhao R Z. Neighborhood discriinant hashing for large-scale iage retrieval. IEEE Transactions on Iage Processing, 205, 24(9): [5] Gong Y C, Lazebni S. Iterative quantization: A procrustean approach to learning binary codes. In Proc. IEEE Conference on Coputer Vision and Pattern Recognition, Jun. 20 pp [6] Shen F M, Shen C H, Liu W, Shen H T. Supervised discrete hashing. In Proc. IEEE Conference on Coputer Vision and Pattern Recognition, Jun. 205, pp [7] Devlin J, Gupta S, Girshic R, Mitchell M, Zitnic C L. Exploring nearest neighbor approaches for iage captioning. arxiv preprint arxiv: , Apr [8] Krizhevsy A, Sutsever I, Hinton G E. Iagenet classification with deep convolutional neural networs. In Proc. the 25th International Conference on Neural Inforation Processing Systes, Dec. 202, pp [9] Zhu Z, Liang D, Zhang S H, Huang X L, Li B, Hu S M. Traffic-sign detection and classification in the wild. In Proc. IEEE Conference on Coputer Vision and Pattern Recognition, Jun. 206, pp [20] Miolov T, Karafiát M, Burget L, Černocý J, Khudanpur S. Recurrent neural networ based language odel. In Proc. the th Annual Conference of the International Speech Counication Association, Sep. 200, pp [2] Song J, Tang S L, Xiao J, Wu F, Zhang Z F. LSTM-in- LSTM for generating long descriptions of iages. Coputational Visual Media, 206, 2(4): [22] Jia Y Q, Shelhaer E, Donahue J, Karayev S, Long J, Girshic R, Guadarraa S, Darrell T. 
Caffe: Convolutional architecture for fast feature embedding. In Proc. the 22nd ACM International Conference on Multimedia, Nov. 2014.

[23] Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2015, pp.1-9.

[24] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556, Apr. 2017.

[25] Razavian A S, Azizpour H, Sullivan J, Carlsson S. CNN features off-the-shelf: An astounding baseline for recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2014.

[26] Norris J R. Markov Chains. Cambridge University Press, 1998.

[27] Moon T K. The expectation-maximization algorithm. IEEE Signal Processing Magazine, 1996, 13(6).

[28] Berger A L, Pietra V J D, Pietra S A D. A maximum entropy approach to natural language processing. Computational Linguistics, 1996, 22(1).

[29] Kiros R, Salakhutdinov R, Zemel R. Multimodal neural language models. In Proc. the 31st International Conference on Machine Learning, Jun. 2014.

[30] Wu Q, Shen C H, van den Hengel A, Liu L Q, Dick A. Image captioning with an intermediate attributes layer. arXiv:…, Apr. 2017.

[31] Gao H Y, Mao J H, Zhou J, Huang Z H, Wang L, Xu W. Are you talking to a machine? Dataset and methods for multilingual image question answering. arXiv:1505.05612, Apr. 2017.

[32] Donahue J, Hendricks L A, Guadarrama S, Rohrbach M, Venugopalan S, Saenko K, Darrell T. Long-term recurrent convolutional networks for visual recognition and description. arXiv:1411.4389, Apr. 2017.

[33] Chen X L, Fang H, Lin T Y, Vedantam R, Gupta S, Dollár P, Zitnick C L. Microsoft COCO captions: Data collection and evaluation server. arXiv:1504.00325, Apr. 2017.

[34] Ordonez V, Kulkarni G, Berg T L. Im2Text: Describing images using 1 million captioned photographs. In Proc. Neural Information Processing Systems, Dec. 2011, pp.1143-1151.

[35] Charikar M S. Similarity estimation techniques from rounding algorithms. In Proc. the 34th Annual ACM Symposium on Theory of Computing, May 2002.

[36] Shen F M, Zhou X, Yang Y, Song J K, Shen H T, Tao D C. A fast optimization method for general binary code learning. IEEE Transactions on Image Processing, 2016, 25(12).

[37] Yang Y, Luo Y S, Chen W L, Shen F M, Shao J, Shen H T. Zero-shot hashing via transferring supervised knowledge. In Proc. ACM Conference on Multimedia, Oct. 2016.

[38] Shen F M, Shen C H, Shi Q F, van den Hengel A, Tang Z M, Shen H T. Hashing on nonlinear manifolds. IEEE Transactions on Image Processing, 2015, 24(6).

[39] Gionis A, Indyk P, Motwani R. Similarity search in high dimensions via hashing. In Proc. the 25th International Conference on Very Large Data Bases, Sept. 1999.

[40] Yang Y, Shen F M, Shen H T, Li H X, Li X L. Robust discrete spectral hashing for large-scale image semantic indexing. IEEE Transactions on Big Data, 2015, 1(4).

[41] Song J K, Yang Y, Yang Y, Huang Z, Shen H T. Inter-media hashing for large-scale retrieval from heterogeneous data sources. In Proc. ACM SIGMOD International Conference on Management of Data, Jun. 2013.

[42] Strecha C, Bronstein A, Bronstein M, Fua P. LDAHash: Improved matching with smaller descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(1).

[43] Luo Y D, Yang Y, Shen F M, Huang Z, Zhou P, Shen H T. Robust discrete code modeling for supervised hashing. Pattern Recognition, 2017, doi: 10.1016/j.patcog….

[44] Ballas N, Yao L, Pal C, Courville A. Delving deeper into convolutional networks for learning video representations. arXiv:1511.06432, Apr. 2017.

[45] Mazloom M, Li X R, Snoek C G M. TagBook: A semantic video representation without supervision for event detection. arXiv:1510.02899v2, Apr. 2017.

[46] Lowe D G. Object recognition from local scale-invariant features. In Proc. the 7th IEEE International Conference on Computer Vision, Sep. 1999, pp.1150-1157.

[47] Xu M, Duan L Y, Cai J F, Chia L T, Xu C S, Tian Q. HMM-based audio keyword generation. In Proc. Pacific-Rim Conference on Multimedia, Dec. 2004.

[48] Shetty R, Laaksonen J. Video captioning with recurrent networks based on frame- and video-level features and visual content classification. arXiv:1512.02949, Apr. 2017.

[49] Yao L, Torabi A, Cho K, Ballas N, Pal C, Larochelle H, Courville A. Describing videos by exploiting temporal structure. In Proc. IEEE International Conference on Computer Vision, Dec. 2015.

[50] Pan P B, Xu Z W, Yang Y, Wu F, Zhuang Y T. Hierarchical recurrent neural encoder for video representation with application to captioning. arXiv:1511.03476, Apr. 2017.

[51] Sener O, Zamir A R, Savarese S, Saxena A. Unsupervised semantic parsing of video collections. In Proc. IEEE International Conference on Computer Vision, Dec. 2015.

[52] Papineni K, Roukos S, Ward T, Zhu W J. BLEU: A method for automatic evaluation of machine translation. In Proc. the 40th Annual Meeting of the Association for Computational Linguistics, Jul. 2002.

[53] Denkowski M, Lavie A. Meteor Universal: Language specific translation evaluation for any target language. In Proc. the 9th Workshop on Statistical Machine Translation, Apr. 2014.

[54] Vedantam R, Zitnick C L, Parikh D. CIDEr: Consensus-based image description evaluation. arXiv:1411.5726, Apr. 2017.

[55] Lin C Y. ROUGE: A package for automatic evaluation of summaries. In Proc. the ACL-04 Workshop on Text Summarization Branches Out, Jul. 2004.

[56] Guadarrama S, Krishnamoorthy N, Malkarnenkar G, Venugopalan S, Mooney R, Darrell T, Saenko K.
YouTube2Text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition. In Proc. IEEE International Conference on Computer Vision, Dec. 2013.

[57] Thomason J, Venugopalan S, Guadarrama S, Saenko K, Mooney R J. Integrating language and vision to generate natural language descriptions of videos in the wild. In Proc. the 25th International Conference on Computational Linguistics, Aug. 2014.

Xiao-Yu Du is currently a lecturer in the School of Software Engineering of Chengdu University of Information Technology, Chengdu, and a Ph.D. candidate at the University of Electronic Science and Technology of China, Chengdu. He received his M.E. degree in computer software and theory in 2011 and his B.S. degree in computer science and technology in 2008, both from Beijing Normal University, Beijing. His research interests include multimedia analysis and retrieval, computer vision, and machine learning.

Yang Yang is currently with the University of Electronic Science and Technology of China, Chengdu. He joined the National University of Singapore (NUS) as a research fellow, working with Prof. Tat-Seng Chua. He was conferred his Ph.D. degree in information technology from the University of Queensland (UQ) in 2012. He obtained his Master's degree in computer science in 2009 from Peking University, Beijing, and his Bachelor's degree in computer science in 2006 from Jilin University, Changchun. His research interests mainly focus on multimedia search, social media analysis, and machine learning.

Liu Yang is a staff member of Sichuan University West China Hospital of Stomatology, Chengdu. He is pursuing his Master's degree in the School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu. His research interests include pattern recognition and computer vision.

Fu-Min Shen is currently a lecturer in the School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu. He received his Ph.D. degree from the School of Computer Science, Nanjing University of Science and Technology, Nanjing, in 2014. With the support of the China Scholarship Council from Apr. 2011 to Nov. 2012, he visited ACVT (Australian Centre for Visual Technology) and the School of Computer Science, University of Adelaide, Australia. He received his Bachelor's degree in applied mathematics from Shandong University, Jinan.

Zhi-Guang Qin received his Ph.D. degree from the School of Computer Science and Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu, in 1996. He is currently a full professor of the School of Information and Software Engineering at UESTC, Chengdu. His research interests include computer networking, information security, cryptography, and pattern analysis.

Jin-Hui Tang is a professor in the School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing. He received his B.E. degree in communication engineering and Ph.D. degree in signal and information processing from the University of Science and Technology of China, Hefei, in 2003 and 2008 respectively. His current research interests include multimedia search and computer vision. He has authored over 60 journal and conference papers in these areas. He is a co-recipient of the Best Student Paper Award in MMM 2016, the Best Paper Finalist in ACM MM 2015, and Best Paper Awards in ACM MM 2007, PCM 2011 and ICIMCS 2011. He is a senior member of CCF and IEEE, and a member of ACM.


More information

Toll Pricing. Computational Tests for Capturing Heterogeneity of User Preferences. Lan Jiang and Hani S. Mahmassani

Toll Pricing. Computational Tests for Capturing Heterogeneity of User Preferences. Lan Jiang and Hani S. Mahmassani Toll Pricing Coputational Tests for Capturing Heterogeneity of User Preferences Lan Jiang and Hani S. Mahassani Because of the increasing interest in ipleentation and exploration of a wider range of pricing

More information

Keywords: meta-epidemiology; randomised trials; heterogeneity; Bayesian methods; Cochrane

Keywords: meta-epidemiology; randomised trials; heterogeneity; Bayesian methods; Cochrane Label-invariant odels for the analysis of eta-epideiological data KM Rhodes 1, D Mawdsley, RM Turner 1,3, HE Jones 4, J Savović 4,5, JPT Higgins 4 1 MRC Biostatistics Unit, School of Clinical Medicine,

More information

Comparison of Two Approaches for Direct Food Calorie Estimation

Comparison of Two Approaches for Direct Food Calorie Estimation Comparison of Two Approaches for Direct Food Calorie Estimation Takumi Ege and Keiji Yanai Department of Informatics, The University of Electro-Communications, Tokyo 1-5-1 Chofugaoka, Chofu-shi, Tokyo

More information

Identification of Consumer Adverse Drug Reaction Messages on Social Media

Identification of Consumer Adverse Drug Reaction Messages on Social Media Association for Inforation Systes AIS Electronic Library (AISeL) PACIS 2013 Proceedings Pacific Asia Conference on Inforation Systes (PACIS) 6-18-2013 Identification of Consuer Adverse Drug Reaction Messages

More information

Direct in situ measurement of specific capacitance, monolayer tension, and bilayer tension in a droplet interface bilayer

Direct in situ measurement of specific capacitance, monolayer tension, and bilayer tension in a droplet interface bilayer Electronic Suppleentary Material (ESI) for Soft Matter. This journal is The Royal Society of Cheistry 2015 Taylor et al. Electronic Supporting Inforation Direct in situ easureent of specific capacitance,

More information

Application of Factor Analysis on Academic Performance of Pupils

Application of Factor Analysis on Academic Performance of Pupils Aerican Journal of Applied Matheatics and Statistics, 07, Vol. 5, No. 5, 64-68 Available online at http://pubs.sciepub.co/aas/5/5/ Science and Education Publishing DOI:0.69/aas-5-5- Application of Factor

More information

Development of new muscle contraction sensor to replace semg for using in muscles analysis fields

Development of new muscle contraction sensor to replace semg for using in muscles analysis fields Loughborough University Institutional Repository Developent of new uscle contraction sensor to replace semg for using in uscles analysis fields This ite was subitted to Loughborough University's Institutional

More information

THYROID SEGMENTATION IN ULTRASOUND IMAGES USING SUPPORT VECTOR MACHINE

THYROID SEGMENTATION IN ULTRASOUND IMAGES USING SUPPORT VECTOR MACHINE International Journal of Neural Networks and Applications, 4(1), 2011, pp. 7-12 THYROID SEGMENTATION IN ULTRASOUND IMAGES USING SUPPORT VECTOR MACHINE D. Selvathi 1 and V. S. Sharnitha 2 Mepco Schlenk

More information

Dendritic Inhibition Enhances Neural Coding Properties

Dendritic Inhibition Enhances Neural Coding Properties Dendritic Inhibition Enhances Neural Coding Properties M.W. Spratling and M.H. Johnson Centre for Brain and Cognitive Developent, Birkbeck College, London, UK The presence of a large nuber of inhibitory

More information

Design of Palm Acupuncture Points Indicator

Design of Palm Acupuncture Points Indicator Design of Palm Acupuncture Points Indicator Wen-Yuan Chen, Shih-Yen Huang and Jian-Shie Lin Abstract The acupuncture points are given acupuncture or acupressure so to stimulate the meridians on each corresponding

More information

Liu Jing and Liu Jing Diagnosis System in Classical TCM Discussions of Six Divisions or Six Confirmations Diagnosis System in Classical TCM Texts

Liu Jing and Liu Jing Diagnosis System in Classical TCM Discussions of Six Divisions or Six Confirmations Diagnosis System in Classical TCM Texts Liu Jing and Liu Jing Diagnosis System in Classical TCM Discussions of Six Divisions or Six Confirmations Diagnosis System in Classical TCM Texts Liu Jing Bian Zheng system had developed about 1800 years

More information

Acoustic Signal Processing Based on Deep Neural Networks

Acoustic Signal Processing Based on Deep Neural Networks Acoustic Signal Processing Based on Deep Neural Networks Chin-Hui Lee School of ECE, Georgia Tech chl@ece.gatech.edu Joint work with Yong Xu, Yanhui Tu, Qing Wang, Tian Gao, Jun Du, LiRong Dai Outline

More information

The Level of Participation in Deliberative Arenas

The Level of Participation in Deliberative Arenas The Level of Participation in Deliberative Arenas Autoria: Leonardo Secchi, Fabrizio Plebani Abstract This paper ais to contribute to the discussion on levels of participation in collective decision-aking

More information

Multi-attention Guided Activation Propagation in CNNs

Multi-attention Guided Activation Propagation in CNNs Multi-attention Guided Activation Propagation in CNNs Xiangteng He and Yuxin Peng (B) Institute of Computer Science and Technology, Peking University, Beijing, China pengyuxin@pku.edu.cn Abstract. CNNs

More information

Medical Knowledge Attention Enhanced Neural Model. for Named Entity Recognition in Chinese EMR

Medical Knowledge Attention Enhanced Neural Model. for Named Entity Recognition in Chinese EMR Medical Knowledge Attention Enhanced Neural Model for Named Entity Recognition in Chinese EMR Zhichang Zhang, Yu Zhang, Tong Zhou College of Computer Science and Engineering, Northwest Normal University,

More information

Physical Activity Training for

Physical Activity Training for Physical Activity Training for Functional Mobility in Older Persons To Hickey Fredric M. Wolf Lynne S. Robins University of Michigan Marilyn B. Wagner Cleveland State University Wafa Harik Case Western

More information

Strain-rate Dependent Stiffness of Articular Cartilage in Unconfined Compression

Strain-rate Dependent Stiffness of Articular Cartilage in Unconfined Compression L. P. Li* Biosyntech Inc., 475 Arand-Frappier Blvd., Park of Science and High Technology Laval, Quebec, Canada H7V 4B3 M. D. Buschann Departent of Cheical Engineering and Institute of Bioedical Engineering,

More information

OBESITY EPIDEMICS MODELLING BY USING INTELLIGENT AGENTS

OBESITY EPIDEMICS MODELLING BY USING INTELLIGENT AGENTS OBESITY EPIDEMICS MODELLING BY USING INTELLIGENT AGENTS Agostino G. Bruzzone, MISS DIPTEM University of Genoa Eail agostino@iti.unige.it - URL www.iti.unige.i t Vera Novak BIDMC, Harvard Medical School

More information

A TWO-DIMENSIONAL THERMODYNAMIC MODEL TO PREDICT HEART THERMAL RESPONSE DURING OPEN CHEST PROCEDURES

A TWO-DIMENSIONAL THERMODYNAMIC MODEL TO PREDICT HEART THERMAL RESPONSE DURING OPEN CHEST PROCEDURES A TWO-DIMENSIONAL THERMODYNAMIC MODEL TO PREDICT HEART THERMAL RESPONSE DURING OPEN CHEST PROCEDURES F. G. Dias, J. V. C. Vargas, and M. L. Brioschi Universidade Federal do Paraná Departaento de Engenharia

More information

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering SUPPLEMENTARY MATERIALS 1. Implementation Details 1.1. Bottom-Up Attention Model Our bottom-up attention Faster R-CNN

More information

Deep Learning based Information Extraction Framework on Chinese Electronic Health Records

Deep Learning based Information Extraction Framework on Chinese Electronic Health Records Deep Learning based Information Extraction Framework on Chinese Electronic Health Records Bing Tian Yong Zhang Kaixin Liu Chunxiao Xing RIIT, Beijing National Research Center for Information Science and

More information

Policy Trap and Optimal Subsidization Policy under Limited Supply of Vaccines

Policy Trap and Optimal Subsidization Policy under Limited Supply of Vaccines olicy Trap and Optial Subsidization olicy under Liited Supply of Vaccines Ming Yi 1,2, Achla Marathe 1,3 * 1 Networ Dynaics and Siulation Science Laboratory, VBI, Virginia Tech, Blacsburg, Virginia, United

More information

Genome-wide association study of esophageal squamous cell carcinoma in Chinese subjects identifies susceptibility loci at PLCE1 and C20orf54

Genome-wide association study of esophageal squamous cell carcinoma in Chinese subjects identifies susceptibility loci at PLCE1 and C20orf54 CORRECTION NOTICE Nat. Genet. 42, 759 763 (2010); published online 22 August 2010; corrected online 27 August 2014 Genome-wide association study of esophageal squamous cell carcinoma in Chinese subjects

More information

PathGAN: Visual Scanpath Prediction with Generative Adversarial Networks

PathGAN: Visual Scanpath Prediction with Generative Adversarial Networks PathGAN: Visual Scanpath Prediction with Generative Adversarial Networks Marc Assens 1, Kevin McGuinness 1, Xavier Giro-i-Nieto 2, and Noel E. O Connor 1 1 Insight Centre for Data Analytic, Dublin City

More information

Deep Compression and EIE: Efficient Inference Engine on Compressed Deep Neural Network

Deep Compression and EIE: Efficient Inference Engine on Compressed Deep Neural Network Deep Compression and EIE: Efficient Inference Engine on Deep Neural Network Song Han*, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark Horowitz, Bill Dally Stanford University Our Prior Work: Deep

More information

CRITICAL REVIEW OF EPILEPTIC PREDICTION MODEL USING EEG

CRITICAL REVIEW OF EPILEPTIC PREDICTION MODEL USING EEG CRITICAL REVIEW OF EPILEPTIC PREDICTION MODEL USING EEG Anju Shaikh 1 and Dr.Mukta Dhopeshwarkar 2 1 Assistant Professor,Departent Of Coputer Science & IT,Deogiri College, Aurangabad, India. 2 Assistant

More information

Outcome measures in palliative care for advanced cancer patients: a review

Outcome measures in palliative care for advanced cancer patients: a review Journal of Public Health Medicine Vol. 19, No. 2, pp. 193-199 Printed in Great Britain Outcoe easures in for advanced cancer s: a review Julie Hearn and Irene J. Higginson Suary Inforation generated using

More information

Transient Evoked Otoacoustic Emissions and Pseudohypacusis

Transient Evoked Otoacoustic Emissions and Pseudohypacusis A Acad Audiol 6 : 293-31 (1995) Transient Evoked Otoacoustic Eissions and Pseudohypacusis Frank E. Musiek* Steven P. Bornsteint Willia F. Rintelann$ Abstract The audiologic diagnosis of pseudohypacusis

More information

The Roles of Beliefs, Information, and Convenience. in the American Diet

The Roles of Beliefs, Information, and Convenience. in the American Diet The Roles of Beliefs, Inforation, and Convenience in the Aerican Diet Selected Paper Presented at the AAEA Annual Meeting 2002 Long Beach, July 28 th -31 st Lisa Mancino PhD Candidate University of Minnesota

More information

Learning to Disambiguate by Asking Discriminative Questions Supplementary Material

Learning to Disambiguate by Asking Discriminative Questions Supplementary Material Learning to Disambiguate by Asking Discriminative Questions Supplementary Material Yining Li 1 Chen Huang 2 Xiaoou Tang 1 Chen Change Loy 1 1 Department of Information Engineering, The Chinese University

More information

IDENTIFYING STRESS BASED ON COMMUNICATIONS IN SOCIAL NETWORKS

IDENTIFYING STRESS BASED ON COMMUNICATIONS IN SOCIAL NETWORKS IDENTIFYING STRESS BASED ON COMMUNICATIONS IN SOCIAL NETWORKS 1 Manimegalai. C and 2 Prakash Narayanan. C manimegalaic153@gmail.com and cprakashmca@gmail.com 1PG Student and 2 Assistant Professor, Department

More information

LOOK, LISTEN, AND DECODE: MULTIMODAL SPEECH RECOGNITION WITH IMAGES. Felix Sun, David Harwath, and James Glass

LOOK, LISTEN, AND DECODE: MULTIMODAL SPEECH RECOGNITION WITH IMAGES. Felix Sun, David Harwath, and James Glass LOOK, LISTEN, AND DECODE: MULTIMODAL SPEECH RECOGNITION WITH IMAGES Felix Sun, David Harwath, and James Glass MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA {felixsun,

More information

Reader s Emotion Prediction Based on Partitioned Latent Dirichlet Allocation Model

Reader s Emotion Prediction Based on Partitioned Latent Dirichlet Allocation Model Reader s Emotion Prediction Based on Partitioned Latent Dirichlet Allocation Model Ruifeng Xu, Chengtian Zou, Jun Xu Key Laboratory of Network Oriented Intelligent Computation, Shenzhen Graduate School,

More information

Vector Learning for Cross Domain Representations

Vector Learning for Cross Domain Representations Vector Learning for Cross Domain Representations Shagan Sah, Chi Zhang, Thang Nguyen, Dheeraj Kumar Peri, Ameya Shringi, Raymond Ptucha Rochester Institute of Technology, Rochester, NY 14623, USA arxiv:1809.10312v1

More information

An Artificial Neural Network Architecture Based on Context Transformations in Cortical Minicolumns

An Artificial Neural Network Architecture Based on Context Transformations in Cortical Minicolumns An Artificial Neural Network Architecture Based on Context Transformations in Cortical Minicolumns 1. Introduction Vasily Morzhakov, Alexey Redozubov morzhakovva@gmail.com, galdrd@gmail.com Abstract Cortical

More information

Social Image Captioning: Exploring Visual Attention and User Attention

Social Image Captioning: Exploring Visual Attention and User Attention sensors Article Social Image Captioning: Exploring and User Leiquan Wang 1 ID, Xiaoliang Chu 1, Weishan Zhang 1, Yiwei Wei 1, Weichen Sun 2,3 and Chunlei Wu 1, * 1 College of Computer & Communication Engineering,

More information

Chapter 4: Nutrition. Teacher s Guide

Chapter 4: Nutrition. Teacher s Guide Chapter 4: Nutrition Teacher s Guide 55 Chapter 4: Nutrition Teacher s Guide Learning Objectives Students will explain two ways that nutrition affects health Students will describe the function of 5 iportant

More information

CSE Introduction to High-Perfomance Deep Learning ImageNet & VGG. Jihyung Kil

CSE Introduction to High-Perfomance Deep Learning ImageNet & VGG. Jihyung Kil CSE 5194.01 - Introduction to High-Perfomance Deep Learning ImageNet & VGG Jihyung Kil ImageNet Classification with Deep Convolutional Neural Networks Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton,

More information

Deep Networks and Beyond. Alan Yuille Bloomberg Distinguished Professor Depts. Cognitive Science and Computer Science Johns Hopkins University

Deep Networks and Beyond. Alan Yuille Bloomberg Distinguished Professor Depts. Cognitive Science and Computer Science Johns Hopkins University Deep Networks and Beyond Alan Yuille Bloomberg Distinguished Professor Depts. Cognitive Science and Computer Science Johns Hopkins University Artificial Intelligence versus Human Intelligence Understanding

More information

AUC Optimization vs. Error Rate Minimization

AUC Optimization vs. Error Rate Minimization AUC Optiization vs. Error Rate Miniization Corinna Cortes and Mehryar Mohri AT&T Labs Research 180 Park Avenue, Florha Park, NJ 0793, USA {corinna, ohri}@research.att.co Abstract The area under an ROC

More information

DIET QUALITY AND CALORIES CONSUMED: THE IMPACT OF BEING HUNGRIER, BUSIER AND EATING OUT

DIET QUALITY AND CALORIES CONSUMED: THE IMPACT OF BEING HUNGRIER, BUSIER AND EATING OUT Working Paper 04-02 The Food Industry Center University of Minnesota Printed Copy $25.50 DIET QUALITY AND CALORIES CONSUMED: THE IMPACT OF BEING HUNGRIER, BUSIER AND EATING OUT Lisa Mancino and Jean Kinsey

More information

Convolutional Neural Networks (CNN)

Convolutional Neural Networks (CNN) Convolutional Neural Networks (CNN) Algorithm and Some Applications in Computer Vision Luo Hengliang Institute of Automation June 10, 2014 Luo Hengliang (Institute of Automation) Convolutional Neural Networks

More information

An original approach to the diagnosis of scolineinduced

An original approach to the diagnosis of scolineinduced J. clin Path., 1972, 25, 422-426 An original approach to the diagnosis of scolineinduced apnoea A. FSHTAL, R. T. EVANS, AND C. N. CHAPMAN Fro the Departent ofpathology, Southead General Hospital, Bristol

More information

Chair for Computer Aided Medical Procedures (CAMP) Seminar on Deep Learning for Medical Applications. Shadi Albarqouni Christoph Baur

Chair for Computer Aided Medical Procedures (CAMP) Seminar on Deep Learning for Medical Applications. Shadi Albarqouni Christoph Baur Chair for (CAMP) Seminar on Deep Learning for Medical Applications Shadi Albarqouni Christoph Baur Results of matching system obtained via matching.in.tum.de 108 Applicants 9 % 10 % 9 % 14 % 30 % Rank

More information

arxiv: v3 [stat.ml] 27 Mar 2018

arxiv: v3 [stat.ml] 27 Mar 2018 ATTACKING THE MADRY DEFENSE MODEL WITH L 1 -BASED ADVERSARIAL EXAMPLES Yash Sharma 1 and Pin-Yu Chen 2 1 The Cooper Union, New York, NY 10003, USA 2 IBM Research, Yorktown Heights, NY 10598, USA sharma2@cooper.edu,

More information

AIDS Epidemiology. Min Shim Math 164: Scientific Computing. April 30, 2004

AIDS Epidemiology. Min Shim Math 164: Scientific Computing. April 30, 2004 AIDS Epideiology Min Shi Math 64: Scientiic Coputing April 30, 004 Abstract Thopson s AIDS epideic odel, which is orulated in his article AIDS: The Misanageent o an Epideic, published in Great Britain,

More information

ASBESTOSIS AND PRIMARY INTRATHORACIC NEOPLASMS

ASBESTOSIS AND PRIMARY INTRATHORACIC NEOPLASMS ASBESTOSIS AND PRIMARY INTRATHORACIC NEOPLASMS Willia D. Buchanan Ministry of Labour, London, Gwat Britain The purpose of this paper is to describe one aspect of a study, which has been continued over

More information

Video Saliency Detection via Dynamic Consistent Spatio- Temporal Attention Modelling

Video Saliency Detection via Dynamic Consistent Spatio- Temporal Attention Modelling AAAI -13 July 16, 2013 Video Saliency Detection via Dynamic Consistent Spatio- Temporal Attention Modelling Sheng-hua ZHONG 1, Yan LIU 1, Feifei REN 1,2, Jinghuan ZHANG 2, Tongwei REN 3 1 Department of

More information

Parameter Identification using δ Decisions for Hybrid Systems in Biology

Parameter Identification using δ Decisions for Hybrid Systems in Biology CMACS/AVACS Worshop Paraeter Identification using δ Decisions for Hbrid Sstes in Biolog Bing Liu Joint wor with Soonho Kong, Sean Gao, Ed Clare Biological Sste Bolins et al. SIGGRAPH, 26 2 Coputational

More information

DEEP LEARNING BASED VISION-TO-LANGUAGE APPLICATIONS: CAPTIONING OF PHOTO STREAMS, VIDEOS, AND ONLINE POSTS

DEEP LEARNING BASED VISION-TO-LANGUAGE APPLICATIONS: CAPTIONING OF PHOTO STREAMS, VIDEOS, AND ONLINE POSTS SEOUL Oct.7, 2016 DEEP LEARNING BASED VISION-TO-LANGUAGE APPLICATIONS: CAPTIONING OF PHOTO STREAMS, VIDEOS, AND ONLINE POSTS Gunhee Kim Computer Science and Engineering Seoul National University October

More information

Deep Learning for Lip Reading using Audio-Visual Information for Urdu Language

Deep Learning for Lip Reading using Audio-Visual Information for Urdu Language Deep Learning for Lip Reading using Audio-Visual Information for Urdu Language Muhammad Faisal Information Technology University Lahore m.faisal@itu.edu.pk Abstract Human lip-reading is a challenging task.

More information

arxiv: v2 [cs.cv] 19 Dec 2017

arxiv: v2 [cs.cv] 19 Dec 2017 An Ensemble of Deep Convolutional Neural Networks for Alzheimer s Disease Detection and Classification arxiv:1712.01675v2 [cs.cv] 19 Dec 2017 Jyoti Islam Department of Computer Science Georgia State University

More information

Recurrent Neural Networks

Recurrent Neural Networks CS 2750: Machine Learning Recurrent Neural Networks Prof. Adriana Kovashka University of Pittsburgh March 14, 2017 One Motivation: Descriptive Text for Images It was an arresting face, pointed of chin,

More information

Age Estimation based on Multi-Region Convolutional Neural Network

Age Estimation based on Multi-Region Convolutional Neural Network Age Estimation based on Multi-Region Convolutional Neural Network Ting Liu, Jun Wan, Tingzhao Yu, Zhen Lei, and Stan Z. Li 1 Center for Biometrics and Security Research & National Laboratory of Pattern

More information

Research on Digital Testing System of Evaluating Characteristics for Ultrasonic Transducer

Research on Digital Testing System of Evaluating Characteristics for Ultrasonic Transducer Sensors & Transducers 2014 by IFSA Publishing, S. L. http://www.sensorsportal.com Research on Digital Testing System of Evaluating Characteristics for Ultrasonic Transducer Qin Yin, * Liang Heng, Peng-Fei

More information

A Comparison of Poisson Model and Modified Poisson Model in Modelling Relative Risk of Childhood Diabetes in Kenya

A Comparison of Poisson Model and Modified Poisson Model in Modelling Relative Risk of Childhood Diabetes in Kenya Aerican Journal of Theoretical and Applied Statistics 2018; 7(5): 193-199 http://www.sciencepublishinggroup.co/j/ajtas doi: 10.11648/j.ajtas.20180705.15 ISSN: 2326-8999 (Print); ISSN: 2326-9006 (Online)

More information

Investigation of Binaural Interference in Normal-Hearing and Hearing-Impaired Adults

Investigation of Binaural Interference in Normal-Hearing and Hearing-Impaired Adults J A Acad Audiol 11 : 494-500 (2000) Investigation of Binaural Interference in Noral-Hearing and Hearing-Ipaired Adults Rose L. Allen* Brady M. Schwab* Jerry L. Cranford* Michael D. Carpenter* Abstract

More information

Perceived similarity and visual descriptions in content-based image retrieval

Perceived similarity and visual descriptions in content-based image retrieval University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2007 Perceived similarity and visual descriptions in content-based image

More information