Learning-based Composite Metrics for Improved Caption Evaluation
|
|
- Lawrence Anthony
- 5 years ago
- Views:
Transcription
1 Learning-based Composite Metrics for Improved Caption Evaluation Naeha Sharif, Lyndon White, Mohammed Bennamoun and Syed Afaq Ali Shah, {naeha.sharif, and {mohammed.bennamoun, The University of Western Australia. 35 Stirling Highway, Crawley, Western Australia Abstract The evaluation of image caption quality is a challenging task, which requires the assessment of two main aspects in a caption: adequacy and fluency. These quality aspects can be judged using a combination of several linguistic features. However, most of the current image captioning metrics focus only on specific linguistic facets, such as the lexical or semantic, and fail to meet a satisfactory level of correlation with human judgements at the sentence-level. We propose a learning-based framework to incorporate the scores of a set of lexical and semantic metrics as features, to capture the adequacy and fluency of captions at different linguistic levels. Our experimental results demonstrate that composite metrics draw upon the strengths of standalone measures to yield improved correlation and accuracy. 1 Introduction Automatic image captioning requires the understanding of the visual aspects of images to generate human-like descriptions (Bernardi et al., 2016). The evaluation of the generated captions is crucial for the development and fine-grained analysis of image captioning systems (Vedantam et al., 2015). Automatic evaluation metrics aim at providing efficient, cost-effective and objective assessments of the caption quality. Since these automatic measures serve as an alternative to the manual evaluation, the major concern is that such measures should correlate well with human assessments. In other words, automatic metrics are expected to mimic the human judgement process by taking into account various aspects that humans consider when they assess the captions. The evaluation of image captions can be characterized as having two major aspects: adequacy and fluency. Adequacy is how well the caption reflects the source image, and fluency is how well the caption conforms to the norms and conventions of human language (Toury, 2012). In the case of manual evaluation, both adequacy and fluency tend to shape the human perception of the overall caption quality. Most of the automatic evaluation metrics tend to capture these aspects of quality based on the idea that the closer the candidate description to the professional human caption, the better it is in quality (Papineni et al., 2002). The output in such case is a score (the higher the better) reflecting the similarity. The majority of the commonly used metrics for image captioning such as BLEU (Papineni et al., 2002) and METEOR (Banerjee and Lavie, 2005) are based on the lexical similarity. Lexical measures (n-gram based) work by rewarding the n- gram overlaps between the candidate and the reference captions. Thus, measuring the adequacy by counting the n-gram matches and assessing the fluency by implicitly using the reference n-grams as a language model (Mutton et al., 2007). However, a high number of n-gram matches cannot always be indicative of a high caption quality, nor a low number of n-gram matches can always be reflective of a low caption quality (Giménez and Màrquez, 2010). A recently proposed semantic metric SPICE (Anderson et al., 2016), overcomes this deficiency of lexical measures by measuring the semantic similarity of candidate and reference captions using Scene Graphs. However, the major drawback of SPICE is that it ignores the fluency of the output caption. Integrating assessment scores of different measures is an intuitive and reasonable way to improve the current image captioning evaluation methods. Through this methodology, each metric plays the role of a judge, assessing the quality of captions in terms of lexical, grammatical or semantic accuracy. For this research, we use the scores conferred
2 by a set of measures that are commonly used for captioning and combine them through a learningbased framework. In this work: 1. We evaluate various combinations of a chosen set of metrics and show that the proposed composite metrics correlate better with human judgements. 2. We analyse the accuracy of composite metrics in terms of differentiating between pairs of captions in reference to the ground truth captions. 2 Literature Review The success of any captioning system depends on how well it transforms the visual information to natural language. Therefore, the significance of reliable automatic evaluation metrics is undeniable for the fine-grained analysis and advancement of image captioning systems. While image captioning has drawn inspiration from the Machine Translation (MT) domain for encoderdecoder based captioning networks (Vinyals et al., 2015), (Xu et al., 2015), (Yao et al., 2016), (You et al., 2016), (Lu et al., 2017), it has also benefited from automatic metrics which were initially proposed to evaluate machine translations/text summaries, such as BLEU (Papineni et al., 2002), ME- TEOR (Denkowski and Lavie, 2014) and ROUGE (Lin, 2004). In the past few years, two metrics CIDEr (Vedantam et al., 2015) and SPICE (Anderson et al., 2016) were developed specifically for image captioning. Compared to the previously used metrics, these two show a better correlation with human judgements. The authors in (Liu et al., 2016) proposed a linear combination of SPICE and CIDEr called SPIDEr and showed that optimizing image captioning models for SPIDEr s score can lead to better quality captions. However, SPIDEr was not evaluated for its correlation with human judgements. Recently, (Kusner et al., 2015) proposed the use of a document similarity metric Word Mover s Distance (WMD), which uses the word2vec (Mikolov et al., 2013) embedding space to determine the distance between two texts. The metrics used for caption evaluation can be broadly categorized as lexical and semantic measures. Lexical metrics reward the n-gram matches between candidate captions and human generated reference texts (Giménez and Màrquez, 2010), and can be further categorized as unigram and n-gram based measures. Unigram based methods such as BLEU-1 (Papineni et al., 2002), assess only the lexical correctness of the candidate. However, in the case of METEOR or WMD, where some sort of synonym-matching/stemming is also involved, unigram-overlaps help to evaluate both the lexical and to some degree the semantic aptness of the output caption. N-gram based metrics such as ROUGE and CIDEr primarily assess the lexical correctness of the caption, but also measure some amount of syntactic accuracy by capturing the word order. The lexical measures have received criticism based on the argument that the n-gram overlap is neither an adequate nor a necessary indicative measure of the caption quality (Anderson et al., 2016). To overcome this limitation, semantic metrics such as SPICE, capture the sentence meaning to evaluate the candidate captions. Their performance however is highly dependent on a successful semantic parsing. Purely syntactic measures, which capture the grammatical correctness, exist, and have been used in MT (Mutton et al., 2007), but not in the captioning domain. While fluency (well-formedness) of a candidate caption can be attributed to the syntactic and lexical correctness (Fomicheva et al., 2016), adequacy (informativeness) depends on the lexical and semantic correctness (Rios et al., 2011). We hypothesize that by combining scores from different metrics, which have different strengths in measuring adequacy and fluency, a composite metric that is of overall higher quality is created (Sec. 5). Machine learning offers a systematic approach to integrate the scores of stand-alone metrics. In the MT evaluation, various successful learning paradigms have been proposed (Bojar et al., 2016), (Bojar et al., 2017) and the existing learning-based metrics can be categorized as binary functions which classify the candidate translation as good or bad (Kulesza and Shieber, 2004), (Guzmán et al., 2015) or continuous functions which score the quality of translation on an absolute scale (Song and Cohn, 2011), (Albrecht and Hwa, 2008). Our research is conceptually similar to the work in (Kulesza and Shieber, 2004), which induces a human-likeness criteria. However, our approach differs in terms of the learning algorithm as well as the features used. Moreover, the focus of this work is to assess various combinations of metrics (that capture the caption quality at
3 Figure 1: Overall framework of the proposed Composite Metrics different linguistic levels) in terms of their correlation with human judgements at the sentence level. 3 Proposed Approach In our approach, we use scores conferred by a set of existing metrics as an input to a multi-layer feed-forward neural network. We adopt a training criteria based on a simple question: is the caption machine or human generated? Our trained classifier sets a boundary between good and bad quality captions, thus classifying them as human or machine produced. Furthermore, we obtain a continuous output score by using the class-probability, which can be considered as some measure of believability that the candidate caption is human generated. Framing our learning problem as a classification task allows us to create binary training data using the human generated captions and machine generated captions as positive and negative training examples respectively. Our proposed framework shown in Figure 1 first extracts a set of numeric features using the candidate C and the reference sentences S. The extracted feature vector is then fed as an input to our multi-layer neural network. Each entity of the feature vector corresponds to the score generated by one of the four measures: METEOR, CIDEr, WMD 1 and SPICE respectively. We chose these measures because they show a relatively better correlation with human judgements compared to 1 We convert the WMD distance score to similarity by using a negative exponential, to use it as a feature. the other commonly used ones for captioning (Kilickaya et al., 2016). Our composite metrics are named Eval MS, Eval CS, Eval MCS, Eval W CS, Eval MW S and Eval MW CS. The subscript letters in each name corresponds to the first letter of each individual metric. For example, Eval MS corresponds to the combination of METEOR and SPICE. Figure 2 shows the linguistic aspects captured by the stand-alone 2 and the composite metrics. SPICE is based on sentence meanings, thus it evaluates the semantics. CIDEr covers the syntactic and lexical aspects, whereas Meteor and WMD assess the lexical and semantic components. The learning-based metrics mostly fall in the region formed by the overlap of all three major linguistics facets, leading to better a evaluation. We train our metrics to maximise the classification accuracy on the training dataset. Since we are primarily interested in maximizing the correlation with human judgements, we perform early stopping based on Kendalls τ (rank correlation) with the validation set. 4 Experimental Setup To train our composite metrics, we source data from Flicker30k dataset (Plummer et al., 2015) and three image captioning models namely: (1) show and tell (Vinyals et al., 2015), (2) show, attend and tell (soft-attention) (Xu et al., 2015), and (3) adaptive attention (Lu et al., 2017). Flicker30k 2 The stand-alone metrics marked with an * in the Figure 2 are used as features for this work.
4 Table 1: Kendall s correlation co-efficient of automatic evaluation metrics and proposed composite metrics against human quality judgements. All correlations are significant at p<0.001 Individual Metrics Kendall τ Composite Metrics Kendall τ BLEU Eval MS ROUGE-L Eval CS METEOR Eval MCS CIDEr Eval W CS SPICE Eval MW S WMD Eval MW CS Figure 2: Various automatic measures (standalone and combined) and their respective linguistic levels. See Sec. 3 for more details. dataset consists of 31,783 photos acquired from Flicker 3, each paired with 5 captions obtained through the Amazon Mechanical Turk (AMT) (Turk, 2012). For each image in Flicker30k, we randomly select three of the human generated captions as positive training examples, and three machine generated (one from each image captioning model) captions as negative training examples. We combined the Microsoft COCO (Chen et al., 2015) training and validation set (containing 123,287 images in total, each paired with 5 or more captions), to train the image captioning models using their official codes. These image captioning models achieved state-of-the-art performance when they were published. In order to obtain reference captions for each training example, we again use the human written descriptions of Flicker30k. For each negative training example (machine-generated caption), we randomly choose 4 out of 5 human written captions originally associated with each image. Whereas, for each positive training example (human-generated caption), we use the 5 human written captions associated with each image, selecting one of these as a human candidate caption (positive example) and the remaining 4 as references. In Figure 3, a possible pairing scenario is shown for further clarification. For our validation set, we source data from Flicker8k (Young et al., 2014). This dataset contains 5,822 captions assessed by three expert judges on a scale of 1 (the caption is unrelated to the image) to 4 (the caption describes the im- 3 age without any errors). From our training set, we remove the captions of images which overlap with the captions in the validation and test sets (discussed in Sec. 5), leaving us with a total of 132,984 non-overlapping captions for the training of the composite metrics. 5 Results and Discussion 5.1 Correlation The most desirable characteristic of an automatic evaluation metric is its strong correlation with human scores (Zhang and Vogel, 2010). A stronger correlation with human judgements indicates that a metric captures the information that humans use to assess a candidate caption. To evaluate the sentence-level correlation of our composite metrics with human judgements, we source data from a dataset collected by the authors in (Aditya et al., 2017). We use 6993 manually evaluated human and machine generated captions from this set, which were scored by AMT workers for correctness on the scale of 1 (low relevance to image) to 5 (high relevance to image). Each caption in the dataset is accompanied by a single judgement. In Table 1, we report the Kendalls τ correlation coefficient for the proposed composite metrics and other commonly used caption evaluation metrics. It can be observed from Table 1 that composite metrics outperform stand-alone metrics in terms of sentence-level correlation. The combination of Meteor and SPICE (Eval MS ) and METEOR, CIDEr and SPICE (Eval MCS ) showed the most promising results. The success of these composite metrics can be attributed to the individual strengths of Meteor, CIDEr and SPICE. ME- TEOR is a strong lexical measure based on unigram matching, which uses additional linguistic knowledge for word matching, such as the morphological variation in words via stemming and
5 Figure 3: Shows an example of a candidate and reference pairing that is used in the training set. (a) Image, (b) human and machine generated captions for the image, and (c) candidate and reference pairings for the image. dictionary based look-up for synonyms and paraphrases (Banerjee and Lavie, 2005). CIDEr uses higher order n-grams to account for fluency and down-weighs the commonly occurring (less informative) n-grams by performing Term Frequency Inverse Document Frequency (TF-IDF) weighting for each n-gram in the dataset (Vedantam et al., 2015). SPICE on the other hand is a strong indicator of the semantic correctness of a caption. Together these metrics assess the lexical, semantic and syntactic information. The composite metrics which included WMD in the combination achieved a lower performance, compared to the ones in which WMD was not included. One possible reason is that WMD heavily penalizes shorter candidate captions when the number of words between the output and the reference captions are not equal (Kusner et al., 2015). This penalty might not be consistently useful as it is possible for a shorter candidate caption to be both fluent and adequate. Therefore, WMD is a better suited metric for measuring document distance. 5.2 Accuracy We follow the framework introduced in (Vedantam et al., 2015) to analyse the ability of a metric to discriminate between pairs of captions with reference to the ground truth caption. A metric is considered accurate if it assigns a higher score to the caption preferred by humans. For this experiment, we use PASCAL-50s (Vedantam et al., 2015), which contains human judgements for 4000 triplets of descriptions (one reference caption with two candidate captions). Based on the pairing, the triplets are grouped into four categories (comprising of 1000 triplets each) i.e., Table 2: Comparative accuracy results (in percentage) on four kinds of pairs tested on PASCAL-50s Metrics HC HI HM MM AVG BLEU ROUGE-L METEOR CIDEr SPICE WMD Eval MS Eval CS Eval MCS Eval W CS Eval MW S Eval MW CS Human-Human Correct (HC), Human-Human Incorrect (HI), Human-Machine (HM), Machine- Machine (MM). We follow the original approach of (Vedantam et al., 2015) and use 5 reference captions per candidate to assess the accuracy of the metrics and report them in Table 2. Table 2 shows that on average composite measures produce better accuracy compared to the individual metrics. Amongst the four categories, HC is the hardest, in which all metrics show the worst performance. Differentiating between two good quality (human generated) correct captions is challenging as it involves a fine-grained analysis of the two candidates. Eval MS achieves the highest accuracy in HC category which shows that as captioning systems continue to improve, this combination of lexical and semantic metrics will continue to perform well. Moreover, human generated captions are usually fluent. Therefore, a combination of strong indicators of adequacy such as SPICE and METEOR is the most suitable for this task. Eval MCS shows the highest accuracy in differ-
6 entiating between machine captions, which is another important category as one of the main goals of automatic evaluation is to distinguish between two machine algorithms. Amongst the composite metrics, Eval MS is again the best in distinguishing human captions (good quality) from machine captions (bad quality) which was our basic training criteria. 6 Conclusion and Future Works In this paper we propose a learning-based approach to combine various metrics to improve caption evaluation. Our experimental results show that metrics operating along different linguistic dimensions can be successfully combined through a learning-based framework, and they outperform the existing metrics for caption evaluation in term of correlation and accuracy, with Eval MS and Eval MCS giving the best overall performance. Our study reveals that the proposed approach is promising and has a lot of potential to be used for evaluation in the captioning domain. In the future, we plan to integrate features (components) of metrics instead of their scores for a better performance. We also intend to use syntactic measures, which to the best of our knowledge have not yet been used for caption evaluation (except in an indirect way by the n-gram measures which capture the word order) and study how they can improve the correlation at the sentence level. Majority of the metrics for captioning focus more on adequacy as compared to fluency. This aspect also needs further attention and a combination of metrics/features that can specifically assess the fluency of captions needs to be devised. Acknowledgements The authors are grateful to Nvidia for providing Titan-Xp GPU, which was used for the experiments. This research is supported by Australian Research Council, ARC DP References Somak Aditya, Yezhou Yang, Chitta Baral, Yiannis Aloimonos, and Cornelia Fermüller Image understanding using vision and reasoning through scene description graph. Computer Vision and Image Understanding. Joshua S Albrecht and Rebecca Hwa Regression for machine translation evaluation at the sentence level. Machine Translation, 22(1-2):1. Peter Anderson, Basura Fernando, Mark Johnson, and Stephen Gould Spice: Semantic propositional image caption evaluation. In European Conference on Computer Vision, pages Springer. Satanjeev Banerjee and Alon Lavie Meteor: An automatic metric for mt evaluation with improved correlation with human judgments. In Proceedings of the acl workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization, pages Raffaella Bernardi, Ruket Cakici, Desmond Elliott, Aykut Erdem, Erkut Erdem, Nazli Ikizler-Cinbis, Frank Keller, Adrian Muscat, Barbara Plank, et al Automatic description generation from images: A survey of models, datasets, and evaluation measures. J. Artif. Intell. Res.(JAIR), 55: Ondřej Bojar, Yvette Graham, Amir Kamran, and Miloš Stanojević Results of the wmt16 metrics shared task. In Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers, volume 2, pages Ondřej Bojar, Jindřich Helcl, Tom Kocmi, Jindřich Libovickỳ, and Tomáš Musil Results of the wmt17 neural mt training task. In Proceedings of the Second Conference on Machine Translation, pages Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollár, and C Lawrence Zitnick Microsoft coco captions: Data collection and evaluation server. arxiv preprint arxiv: Michael Denkowski and Alon Lavie Meteor universal: Language specific translation evaluation for any target language. In Proceedings of the ninth workshop on statistical machine translation, pages Marina Fomicheva, Núria Bel, Lucia Specia, Iria da Cunha, and Anton Malinovskiy Cobaltf: a fluent metric for mt evaluation. In Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers, volume 2, pages Jesús Giménez and Lluís Màrquez Linguistic measures for automatic machine translation evaluation. Machine Translation, 24(3-4): Francisco Guzmán, Shafiq Joty, Lluís Màrquez, and Preslav Nakov Pairwise neural machine translation evaluation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), volume 1, pages Mert Kilickaya, Aykut Erdem, Nazli Ikizler-Cinbis, and Erkut Erdem Re-evaluating automatic metrics for image captioning. arxiv preprint arxiv:
7 Alex Kulesza and Stuart M Shieber A learning approach to improving sentence-level mt evaluation. In Proceedings of the 10th International Conference on Theoretical and Methodological Issues in Machine Translation, pages Matt Kusner, Yu Sun, Nicholas Kolkin, and Kilian Weinberger From word embeddings to document distances. In International Conference on Machine Learning, pages Chin-Yew Lin Rouge: A package for automatic evaluation of summaries. Text Summarization Branches Out. Siqi Liu, Zhenhai Zhu, Ning Ye, Sergio Guadarrama, and Kevin Murphy Improved image captioning via policy gradient optimization of spider. arxiv preprint arxiv: Jiasen Lu, Caiming Xiong, Devi Parikh, and Richard Socher Knowing when to look: Adaptive attention via a visual sentinel for image captioning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 6. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages Andrew Mutton, Mark Dras, Stephen Wan, and Robert Dale Gleu: Automatic evaluation of sentence-level fluency. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages Kishore Papineni, Salim Roukos, Todd Ward, and Wei- Jing Zhu Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics, pages Association for Computational Linguistics. Bryan A Plummer, Liwei Wang, Chris M Cervantes, Juan C Caicedo, Julia Hockenmaier, and Svetlana Lazebnik Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. In Computer Vision (ICCV), 2015 IEEE International Conference on, pages IEEE. Gideon Toury Descriptive Translation Studies and beyond: revised edition, volume 100. John Benjamins Publishing. Amazon Mechanical Turk Amazon mechanical turk. Retrieved August, 17:2012. Ramakrishna Vedantam, C Lawrence Zitnick, and Devi Parikh Cider: Consensus-based image description evaluation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan Show and tell: A neural image caption generator. In Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on, pages IEEE. Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio Show, attend and tell: Neural image caption generation with visual attention. In International Conference on Machine Learning, pages Ting Yao, Yingwei Pan, Yehao Li, Zhaofan Qiu, and Tao Mei Boosting image captioning with attributes. OpenReview, 2(5):8. Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, and Jiebo Luo Image captioning with semantic attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages Peter Young, Alice Lai, Micah Hodosh, and Julia Hockenmaier From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Transactions of the Association for Computational Linguistics, 2: Ying Zhang and Stephan Vogel Significance tests of automatic machine translation evaluation metrics. Machine Translation, 24(1): Miguel Rios, Wilker Aziz, and Lucia Specia Tine: A metric to assess mt adequacy. In Proceedings of the Sixth Workshop on Statistical Machine Translation, pages Association for Computational Linguistics. Xingyi Song and Trevor Cohn Regression and ranking based optimisation for sentence level machine translation evaluation. In Proceedings of the Sixth Workshop on Statistical Machine Translation, pages Association for Computational Linguistics.
NNEval: Neural Network based Evaluation Metric for Image Captioning
NNEval: Neural Network based Evaluation Metric for Image Captioning Naeha Sharif 1, Lyndon White 1, Mohammed Bennamoun 1, and Syed Afaq Ali Shah 1,2 1 The University of Western Australia, 35 Stirling Highway,
More informationSPICE: Semantic Propositional Image Caption Evaluation
SPICE: Semantic Propositional Image Caption Evaluation Presented to the COCO Consortium, Sept 2016 Peter Anderson1, Basura Fernando1, Mark Johnson2 and Stephen Gould1 1 Australian National University 2
More informationTowards image captioning and evaluation. Vikash Sehwag, Qasim Nadeem
Towards image captioning and evaluation Vikash Sehwag, Qasim Nadeem Overview Why automatic image captioning? Overview of two caption evaluation metrics Paper: Captioning with Nearest-Neighbor approaches
More informationTraining for Diversity in Image Paragraph Captioning
Training for Diversity in Image Paragraph Captioning Luke Melas-Kyriazi George Han Alexander M. Rush School of Engineering and Applied Sciences Harvard University {lmelaskyriazi@college, hanz@college,
More informationLOOK, LISTEN, AND DECODE: MULTIMODAL SPEECH RECOGNITION WITH IMAGES. Felix Sun, David Harwath, and James Glass
LOOK, LISTEN, AND DECODE: MULTIMODAL SPEECH RECOGNITION WITH IMAGES Felix Sun, David Harwath, and James Glass MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA {felixsun,
More informationarxiv: v1 [cs.cl] 13 Apr 2017
Room for improvement in automatic image description: an error analysis Emiel van Miltenburg Vrije Universiteit Amsterdam emiel.van.miltenburg@vu.nl Desmond Elliott ILLC, Universiteit van Amsterdam d.elliott@uva.nl
More informationImage Captioning using Reinforcement Learning. Presentation by: Samarth Gupta
Image Captioning using Reinforcement Learning Presentation by: Samarth Gupta 1 Introduction Summary Supervised Models Image captioning as RL problem Actor Critic Architecture Policy Gradient architecture
More informationA Neural Compositional Paradigm for Image Captioning
A Neural Compositional Paradigm for Image Captioning Bo Dai 1 Sanja Fidler 2,3,4 Dahua Lin 1 1 CUHK-SenseTime Joint Lab, The Chinese University of Hong Kong 2 University of Toronto 3 Vector Institute 4
More informationarxiv: v1 [cs.cv] 23 Oct 2018
A Neural Compositional Paradigm for Image Captioning arxiv:1810.09630v1 [cs.cv] 23 Oct 2018 Bo Dai 1 Sanja Fidler 2,3,4 Dahua Lin 1 1 CUHK-SenseTime Joint Lab, The Chinese University of Hong Kong 2 University
More informationPragmatically Informative Image Captioning with Character-Level Inference
Pragmatically Informative Image Captioning with Character-Level Inference Reuben Cohn-Gordon, Noah Goodman, and Chris Potts Stanford University {reubencg, ngoodman, cgpotts}@stanford.edu Abstract We combine
More informationAttention Correctness in Neural Image Captioning
Attention Correctness in Neural Image Captioning Chenxi Liu 1 Junhua Mao 2 Fei Sha 2,3 Alan Yuille 1,2 Johns Hopkins University 1 University of California, Los Angeles 2 University of Southern California
More informationUnsupervised Measurement of Translation Quality Using Multi-engine, Bi-directional Translation
Unsupervised Measurement of Translation Quality Using Multi-engine, Bi-directional Translation Menno van Zaanen and Simon Zwarts Division of Information and Communication Sciences Department of Computing
More informationarxiv: v3 [cs.cv] 23 Jul 2018
Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data Xihui Liu 1, Hongsheng Li 1, Jing Shao 2, Dapeng Chen 1, and Xiaogang Wang 1 arxiv:1803.08314v3 [cs.cv] 23 Jul
More informationarxiv: v2 [cs.cv] 17 Jun 2016
Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions? Abhishek Das 1, Harsh Agrawal 1, C. Lawrence Zitnick 2, Devi Parikh 1, Dhruv Batra 1 1 Virginia Tech,
More information. Semi-automatic WordNet Linking using Word Embeddings. Kevin Patel, Diptesh Kanojia and Pushpak Bhattacharyya Presented by: Ritesh Panjwani
Semi-automatic WordNet Linking using Word Embeddings Kevin Patel, Diptesh Kanojia and Pushpak Bhattacharyya Presented by: Ritesh Panjwani January 11, 2018 Kevin Patel WordNet Linking via Embeddings 1/22
More informationExploring Normalization Techniques for Human Judgments of Machine Translation Adequacy Collected Using Amazon Mechanical Turk
Exploring Normalization Techniques for Human Judgments of Machine Translation Adequacy Collected Using Amazon Mechanical Turk Michael Denkowski and Alon Lavie Language Technologies Institute School of
More informationChoosing the Right Evaluation for Machine Translation: an Examination of Annotator and Automatic Metric Performance on Human Judgment Tasks
Choosing the Right Evaluation for Machine Translation: an Examination of Annotator and Automatic Metric Performance on Human Judgment Tasks Michael Denkowski and Alon Lavie Language Technologies Institute
More informationLearning to Evaluate Image Captioning
Learning to Evaluate Image Captioning Yin Cui 1,2 Guandao Yang 1 Andreas Veit 1,2 Xun Huang 1,2 Serge Belongie 1,2 1 Department of Computer Science, Cornell University 2 Cornell Tech Abstract Evaluation
More informationIntelligent Machines That Act Rationally. Hang Li Bytedance AI Lab
Intelligent Machines That Act Rationally Hang Li Bytedance AI Lab Four Definitions of Artificial Intelligence Building intelligent machines (i.e., intelligent computers) Thinking humanly Acting humanly
More informationComparing Automatic Evaluation Measures for Image Description
Comparing Automatic Evaluation Measures for Image Description Desmond Elliott and Frank Keller Institute for Language, Cognition, and Computation School of Informatics, University of Edinburgh d.elliott@ed.ac.uk,
More informationarxiv: v1 [cs.cv] 12 Dec 2016
Text-guided Attention Model for Image Captioning Jonghwan Mun, Minsu Cho, Bohyung Han Department of Computer Science and Engineering, POSTECH, Korea {choco1916, mscho, bhhan}@postech.ac.kr arxiv:1612.03557v1
More informationIntelligent Machines That Act Rationally. Hang Li Toutiao AI Lab
Intelligent Machines That Act Rationally Hang Li Toutiao AI Lab Four Definitions of Artificial Intelligence Building intelligent machines (i.e., intelligent computers) Thinking humanly Acting humanly Thinking
More informationBLEU: A Discriminative Metric for Generation Tasks with Intrinsically Diverse Targets
BLEU: A Discriminative Metric for Generation Tasks with Intrinsically Diverse Targets Michel Galley 1 Chris Brockett 1 Alessandro Sordoni 2 Yangfeng Ji 3 Michael Auli 4 Chris Quirk 1 Margaret Mitchell
More informationarxiv: v1 [cs.cl] 22 May 2018
Joint Image Captioning and Question Answering Jialin Wu and Zeyuan Hu and Raymond J. Mooney Department of Computer Science The University of Texas at Austin {jialinwu, zeyuanhu, mooney}@cs.utexas.edu arxiv:1805.08389v1
More informationComparing Automatic Evaluation Measures for Image Description
Comparing Automatic Evaluation Measures for Image Description Desmond Elliott and Frank Keller Institute for Language, Cognition, and Computation School of Informatics, University of Edinburgh d.elliott@ed.ac.uk,
More informationSocial Image Captioning: Exploring Visual Attention and User Attention
sensors Article Social Image Captioning: Exploring and User Leiquan Wang 1 ID, Xiaoliang Chu 1, Weishan Zhang 1, Yiwei Wei 1, Weichen Sun 2,3 and Chunlei Wu 1, * 1 College of Computer & Communication Engineering,
More informationStudy of Translation Edit Rate with Targeted Human Annotation
Study of Translation Edit Rate with Targeted Human Annotation Matthew Snover (UMD) Bonnie Dorr (UMD) Richard Schwartz (BBN) Linnea Micciulla (BBN) John Makhoul (BBN) Outline Motivations Definition of Translation
More informationOn the Automatic Generation of Medical Imaging Reports
On the Automatic Generation of Medical Imaging Reports Baoyu Jing * Pengtao Xie * Eric P. Xing Petuum Inc, USA * School of Computer Science, Carnegie Mellon University, USA {baoyu.jing, pengtao.xie, eric.xing}@petuum.com
More informationWhen to Finish? Optimal Beam Search for Neural Text Generation (modulo beam size)
When to Finish? Optimal Beam Search for Neural Text Generation (modulo beam size) Liang Huang and Kai Zhao and Mingbo Ma School of Electrical Engineering and Computer Science Oregon State University Corvallis,
More informationSmaller, faster, deeper: University of Edinburgh MT submittion to WMT 2017
Smaller, faster, deeper: University of Edinburgh MT submittion to WMT 2017 Rico Sennrich, Alexandra Birch, Anna Currey, Ulrich Germann, Barry Haddow, Kenneth Heafield, Antonio Valerio Miceli Barone, Philip
More informationMan is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. Md Musa Leibniz University of Hannover
Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings Md Musa Leibniz University of Hannover Agenda Introduction Background Word2Vec algorithm Bias in the data generated from
More informationDiscriminability objective for training descriptive image captions
Discriminability objective for training descriptive image captions Ruotian Luo TTI-Chicago Joint work with Brian Price Scott Cohen Greg Shakhnarovich (Adobe) (Adobe) (TTIC) Discriminability objective for
More informationVector Learning for Cross Domain Representations
Vector Learning for Cross Domain Representations Shagan Sah, Chi Zhang, Thang Nguyen, Dheeraj Kumar Peri, Ameya Shringi, Raymond Ptucha Rochester Institute of Technology, Rochester, NY 14623, USA arxiv:1809.10312v1
More informationBeam Search Strategies for Neural Machine Translation
Beam Search Strategies for Neural Machine Translation Markus Freitag and Yaser Al-Onaizan IBM T.J. Watson Research Center 1101 Kitchawan Rd, Yorktown Heights, NY 10598 {freitagm,onaizan}@us.ibm.com Abstract
More informationChittron: An Automatic Bangla Image Captioning System
Chittron: An Automatic Bangla Image Captioning System Motiur Rahman 1, Nabeel Mohammed 2, Nafees Mansoor 3 and Sifat Momen 4 1,3 Department of Computer Science and Engineering, University of Liberal Arts
More informationSentiment Analysis of Reviews: Should we analyze writer intentions or reader perceptions?
Sentiment Analysis of Reviews: Should we analyze writer intentions or reader perceptions? Isa Maks and Piek Vossen Vu University, Faculty of Arts De Boelelaan 1105, 1081 HV Amsterdam e.maks@vu.nl, p.vossen@vu.nl
More informationarxiv: v1 [cs.cv] 19 Jan 2018
Describing Semantic Representations of Brain Activity Evoked by Visual Stimuli arxiv:1802.02210v1 [cs.cv] 19 Jan 2018 Eri Matsuo Ichiro Kobayashi Ochanomizu University 2-1-1 Ohtsuka, Bunkyo-ku, Tokyo 112-8610,
More informationarxiv: v1 [cs.cv] 7 Dec 2018
An Attempt towards Interpretable Audio-Visual Video Captioning Yapeng Tian 1, Chenxiao Guan 1, Justin Goodman 2, Marc Moore 3, and Chenliang Xu 1 arxiv:1812.02872v1 [cs.cv] 7 Dec 2018 1 Department of Computer
More informationA HMM-based Pre-training Approach for Sequential Data
A HMM-based Pre-training Approach for Sequential Data Luca Pasa 1, Alberto Testolin 2, Alessandro Sperduti 1 1- Department of Mathematics 2- Department of Developmental Psychology and Socialisation University
More informationTranslation Quality Assessment: Evaluation and Estimation
Translation Quality Assessment: Evaluation and Estimation Lucia Specia University of Sheffield l.specia@sheffield.ac.uk MTM - Prague, 12 September 2016 Translation Quality Assessment: Evaluation and Estimation
More informationDeepDiary: Automatically Captioning Lifelogging Image Streams
DeepDiary: Automatically Captioning Lifelogging Image Streams Chenyou Fan David J. Crandall School of Informatics and Computing Indiana University Bloomington, Indiana USA {fan6,djcran}@indiana.edu Abstract.
More informationBottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering SUPPLEMENTARY MATERIALS 1. Implementation Details 1.1. Bottom-Up Attention Model Our bottom-up attention Faster R-CNN
More informationBuilding Evaluation Scales for NLP using Item Response Theory
Building Evaluation Scales for NLP using Item Response Theory John Lalor CICS, UMass Amherst Joint work with Hao Wu (BC) and Hong Yu (UMMS) Motivation Evaluation metrics for NLP have been mostly unchanged
More informationarxiv: v3 [cs.cv] 11 Aug 2017 Abstract
human G-GAN G-MLE human G-GAN G-MLE Towards Diverse and Natural Image Descriptions via a Conditional GAN Bo Dai 1 Sanja Fidler 23 Raquel Urtasun 234 Dahua Lin 1 1 Department of Information Engineering,
More informationExploiting Patent Information for the Evaluation of Machine Translation
Exploiting Patent Information for the Evaluation of Machine Translation Atsushi Fujii University of Tsukuba Masao Utiyama National Institute of Information and Communications Technology Mikio Yamamoto
More informationAdjudicator Agreement and System Rankings for Person Name Search
Adjudicator Agreement and System Rankings for Person Name Search Mark D. Arehart, Chris Wolf, Keith J. Miller The MITRE Corporation 7515 Colshire Dr., McLean, VA 22102 {marehart, cwolf, keith}@mitre.org
More informationContext Gates for Neural Machine Translation
Context Gates for Neural Machine Translation Zhaopeng Tu Yang Liu Zhengdong Lu Xiaohua Liu Hang Li Noah s Ark Lab, Huawei Technologies, Hong Kong {tu.zhaopeng,lu.zhengdong,liuxiaohua3,hangli.hl}@huawei.com
More informationMulti-attention Guided Activation Propagation in CNNs
Multi-attention Guided Activation Propagation in CNNs Xiangteng He and Yuxin Peng (B) Institute of Computer Science and Technology, Peking University, Beijing, China pengyuxin@pku.edu.cn Abstract. CNNs
More informationShu Kong. Department of Computer Science, UC Irvine
Ubiquitous Fine-Grained Computer Vision Shu Kong Department of Computer Science, UC Irvine Outline 1. Problem definition 2. Instantiation 3. Challenge and philosophy 4. Fine-grained classification with
More informationShu Kong. Department of Computer Science, UC Irvine
Ubiquitous Fine-Grained Computer Vision Shu Kong Department of Computer Science, UC Irvine Outline 1. Problem definition 2. Instantiation 3. Challenge 4. Fine-grained classification with holistic representation
More informationAge Estimation based on Multi-Region Convolutional Neural Network
Age Estimation based on Multi-Region Convolutional Neural Network Ting Liu, Jun Wan, Tingzhao Yu, Zhen Lei, and Stan Z. Li 1 Center for Biometrics and Security Research & National Laboratory of Pattern
More informationDifferential Attention for Visual Question Answering
Differential Attention for Visual Question Answering Badri Patro and Vinay P. Namboodiri IIT Kanpur { badri,vinaypn }@iitk.ac.in Abstract In this paper we aim to answer questions based on images when provided
More informationDEEP LEARNING BASED VISION-TO-LANGUAGE APPLICATIONS: CAPTIONING OF PHOTO STREAMS, VIDEOS, AND ONLINE POSTS
SEOUL Oct.7, 2016 DEEP LEARNING BASED VISION-TO-LANGUAGE APPLICATIONS: CAPTIONING OF PHOTO STREAMS, VIDEOS, AND ONLINE POSTS Gunhee Kim Computer Science and Engineering Seoul National University October
More informationarxiv: v2 [cs.lg] 1 Jun 2018
Shagun Sodhani 1 * Vardaan Pahuja 1 * arxiv:1805.11016v2 [cs.lg] 1 Jun 2018 Abstract Self-play (Sukhbaatar et al., 2017) is an unsupervised training procedure which enables the reinforcement learning agents
More informationDecoding Strategies for Neural Referring Expression Generation
Decoding Strategies for Neural Referring Expression Generation Sina Zarrieß and David Schlangen Dialogue Systems Group // CITEC // Faculty of Linguistics and Literary Studies Bielefeld University, Germany
More informationarxiv: v1 [cs.cv] 20 Dec 2018
nocaps: novel object captioning at scale arxiv:1812.08658v1 [cs.cv] 20 Dec 2018 Harsh Agrawal 1 Karan Desai 1 Xinlei Chen 2 Rishabh Jain 1 Dhruv Batra 1,2 Devi Parikh 1,2 Stefan Lee 1 Peter Anderson 1
More informationDo explanations make VQA models more predictable to a human?
Do explanations make VQA models more predictable to a human? Arjun Chandrasekaran,1 Viraj Prabhu,1 Deshraj Yadav,1 Prithvijit Chattopadhyay,1 Devi Parikh1,2 1 2 Georgia Institute of Technology Facebook
More informationFormulating Emotion Perception as a Probabilistic Model with Application to Categorical Emotion Classification
Formulating Emotion Perception as a Probabilistic Model with Application to Categorical Emotion Classification Reza Lotfian and Carlos Busso Multimodal Signal Processing (MSP) lab The University of Texas
More informationTranslation Quality Assessment: Evaluation and Estimation
Translation Quality Assessment: Evaluation and Estimation Lucia Specia University of Sheffield l.specia@sheffield.ac.uk 9 September 2013 Translation Quality Assessment: Evaluation and Estimation 1 / 33
More informationMotivation: Attention: Focusing on specific parts of the input. Inspired by neuroscience.
Outline: Motivation. What s the attention mechanism? Soft attention vs. Hard attention. Attention in Machine translation. Attention in Image captioning. State-of-the-art. 1 Motivation: Attention: Focusing
More informationCase-based reasoning using electronic health records efficiently identifies eligible patients for clinical trials
Case-based reasoning using electronic health records efficiently identifies eligible patients for clinical trials Riccardo Miotto and Chunhua Weng Department of Biomedical Informatics Columbia University,
More informationarxiv: v2 [cs.cv] 19 Dec 2017
An Ensemble of Deep Convolutional Neural Networks for Alzheimer s Disease Detection and Classification arxiv:1712.01675v2 [cs.cv] 19 Dec 2017 Jyoti Islam Department of Computer Science Georgia State University
More informationAssigning Relative Importance to Scene Elements
Assigning Relative Importance to Scene Elements Igor L. O Bastos, William Robson Schwartz Smart Surveillance Interest Group, Department of Computer Science Universidade Federal de Minas Gerais, Minas Gerais,
More informationExploiting Task-Oriented Resources to Learn Word Embeddings for Clinical Abbreviation Expansion
Exploiting Task-Oriented Resources to Learn Word Embeddings for Clinical Abbreviation Expansion Yue Liu 1, Tao Ge 2, Kusum Mathews 3, Heng Ji 1, Deborah L. McGuinness 1 1 Department of Computer Science,
More informationAutomatic Arabic Image Captioning using RNN-LSTM-Based Language Model and CNN
Automatic Arabic Image Captioning using RNN-LSTM-Based Language Model and CNN Huda A. Al-muzaini, Tasniem N. Al-yahya, Hafida Benhidour Dept. of Computer Science College of Computer and Information Sciences
More informationObject Detectors Emerge in Deep Scene CNNs
Object Detectors Emerge in Deep Scene CNNs Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba Presented By: Collin McCarthy Goal: Understand how objects are represented in CNNs Are
More informationarxiv: v1 [cs.ai] 28 Nov 2017
: a better way of the parameters of a Deep Neural Network arxiv:1711.10177v1 [cs.ai] 28 Nov 2017 Guglielmo Montone Laboratoire Psychologie de la Perception Université Paris Descartes, Paris montone.guglielmo@gmail.com
More informationSpatial Prepositions in Context: The Semantics of near in the Presence of Distractor Objects
Spatial Prepositions in Context: The Semantics of near in the Presence of Distractor Objects Fintan J. Costello, School of Computer Science and Informatics, University College Dublin, Dublin, Ireland.
More informationMeasurement of Progress in Machine Translation
Measurement of Progress in Machine Translation Yvette Graham Timothy Baldwin Aaron Harwood Alistair Moffat Justin Zobel Department of Computing and Information Systems The University of Melbourne {ygraham,tbaldwin,aharwood,amoffat,jzobel}@unimelb.edu.au
More informationSegmentation of Cell Membrane and Nucleus by Improving Pix2pix
Segmentation of Membrane and Nucleus by Improving Pix2pix Masaya Sato 1, Kazuhiro Hotta 1, Ayako Imanishi 2, Michiyuki Matsuda 2 and Kenta Terai 2 1 Meijo University, Siogamaguchi, Nagoya, Aichi, Japan
More informationNeural Response Generation for Customer Service based on Personality Traits
Neural Response Generation for Customer Service based on Personality Traits Jonathan Herzig, Michal Shmueli-Scheuer, Tommy Sandbank and David Konopnicki IBM Research - Haifa Haifa 31905, Israel {hjon,shmueli,tommy,davidko}@il.ibm.com
More informationarxiv: v1 [cs.cv] 28 Mar 2016
arxiv:1603.08507v1 [cs.cv] 28 Mar 2016 Generating Visual Explanations Lisa Anne Hendricks 1 Zeynep Akata 2 Marcus Rohrbach 1,3 Jeff Donahue 1 Bernt Schiele 2 Trevor Darrell 1 1 UC Berkeley EECS, CA, United
More informationEmotion Recognition using a Cauchy Naive Bayes Classifier
Emotion Recognition using a Cauchy Naive Bayes Classifier Abstract Recognizing human facial expression and emotion by computer is an interesting and challenging problem. In this paper we propose a method
More informationarxiv: v3 [cs.cv] 29 Jan 2019
Large-Scale Visual Relationship Understanding Ji Zhang 1,2, Yannis Kalantidis 1, Marcus Rohrbach 1, Manohar Paluri 1, Ahmed Elgammal 2, Mohamed Elhoseiny 1 1 Facebook Research 2 Department of Computer
More informationRumor Detection on Twitter with Tree-structured Recursive Neural Networks
1 Rumor Detection on Twitter with Tree-structured Recursive Neural Networks Jing Ma 1, Wei Gao 2, Kam-Fai Wong 1,3 1 The Chinese University of Hong Kong 2 Victoria University of Wellington, New Zealand
More informationGuided Open Vocabulary Image Captioning with Constrained Beam Search
Guided Open Vocabulary Image Captioning with Constrained Beam Search Peter Anderson 1, Basura Fernando 1, Mark Johnson 2, Stephen Gould 1 1 The Australian National University, Canberra, Australia firstname.lastname@anu.edu.au
More informationFramework for Comparative Research on Relational Information Displays
Framework for Comparative Research on Relational Information Displays Sung Park and Richard Catrambone 2 School of Psychology & Graphics, Visualization, and Usability Center (GVU) Georgia Institute of
More informationToward the Evaluation of Machine Translation Using Patent Information
Toward the Evaluation of Machine Translation Using Patent Information Atsushi Fujii Graduate School of Library, Information and Media Studies University of Tsukuba Mikio Yamamoto Graduate School of Systems
More informationarxiv: v3 [stat.ml] 27 Mar 2018
ATTACKING THE MADRY DEFENSE MODEL WITH L 1 -BASED ADVERSARIAL EXAMPLES Yash Sharma 1 and Pin-Yu Chen 2 1 The Cooper Union, New York, NY 10003, USA 2 IBM Research, Yorktown Heights, NY 10598, USA sharma2@cooper.edu,
More informationExploiting Task-Oriented Resources to Learn Word Embeddings for Clinical Abbreviation Expansion
Exploiting Task-Oriented Resources to Learn Word Embeddings for Clinical Abbreviation Expansion Yue Liu 1, Tao Ge 2, Kusum S. Mathews 3, Heng Ji 1, Deborah L. McGuinness 1 1 Department of Computer Science,
More informationExploiting Task-Oriented Resources to Learn Word Embeddings for Clinical Abbreviation Expansion
Exploiting Task-Oriented Resources to Learn Word Embeddings for Clinical Abbreviation Expansion Yue Liu 1, Tao Ge 2, Kusum S. Mathews 3, Heng Ji 1, Deborah L. McGuinness 1 1 Department of Computer Science,
More informationAn Overview and Comparative Analysis on Major Generative Models
An Overview and Comparative Analysis on Major Generative Models Zijing Gu zig021@ucsd.edu Abstract The amount of researches on generative models has been grown rapidly after a period of silence due to
More informationarxiv: v2 [cs.cv] 6 Nov 2017
Published as a conference paper in International Conference of Computer Vision (ICCV) 2017 Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training Rakshith Shetty 1 Marcus
More informationarxiv: v1 [cs.cl] 4 Apr 2019
Evaluating Style Transfer for Text Remi Mir 1, Bjarke Felbo 2, Nick Obradovich 2, Iyad Rahwan 2 1 Department of EECS, Massachusetts Institute of Technology 2 Media Lab, Massachusetts Institute of Technology
More informationComparison of Two Approaches for Direct Food Calorie Estimation
Comparison of Two Approaches for Direct Food Calorie Estimation Takumi Ege and Keiji Yanai Department of Informatics, The University of Electro-Communications, Tokyo 1-5-1 Chofugaoka, Chofu-shi, Tokyo
More informationDeep Learning based Information Extraction Framework on Chinese Electronic Health Records
Deep Learning based Information Extraction Framework on Chinese Electronic Health Records Bing Tian Yong Zhang Kaixin Liu Chunxiao Xing RIIT, Beijing National Research Center for Information Science and
More informationAutomated Femoral Neck Shaft Angle Measurement Using Neural Networks
SIIM 2017 Scientific Session Posters & Demonstrations Automated Femoral Neck Shaft Angle Using s Simcha Rimler, MD, SUNY Downstate; Srinivas Kolla; Mohammad Faysel; Menachem Rimler (Presenter); Gabriel
More informationCSE Introduction to High-Perfomance Deep Learning ImageNet & VGG. Jihyung Kil
CSE 5194.01 - Introduction to High-Perfomance Deep Learning ImageNet & VGG Jihyung Kil ImageNet Classification with Deep Convolutional Neural Networks Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton,
More informationDeep Architectures for Neural Machine Translation
Deep Architectures for Neural Machine Translation Antonio Valerio Miceli Barone Jindřich Helcl Rico Sennrich Barry Haddow Alexandra Birch School of Informatics, University of Edinburgh Faculty of Mathematics
More informationMemory-Augmented Active Deep Learning for Identifying Relations Between Distant Medical Concepts in Electroencephalography Reports
Memory-Augmented Active Deep Learning for Identifying Relations Between Distant Medical Concepts in Electroencephalography Reports Ramon Maldonado, BS, Travis Goodwin, PhD Sanda M. Harabagiu, PhD The University
More informationarxiv: v1 [cs.cv] 18 May 2016
BEYOND CAPTION TO NARRATIVE: VIDEO CAPTIONING WITH MULTIPLE SENTENCES Andrew Shin, Katsunori Ohnishi, and Tatsuya Harada Grad. School of Information Science and Technology, The University of Tokyo, Japan
More informationNeural Baby Talk. Jiasen Lu 1 Jianwei Yang 1 Dhruv Batra 1,2 Devi Parikh 1,2 1 Georgia Institute of Technology 2 Facebook AI Research.
Neural Baby Talk Jiasen Lu 1 Jianwei Yang 1 Dhruv Batra 1,2 Devi Parikh 1,2 1 Georgia Institute of Technology 2 Facebook AI Research {jiasenlu, jw2yang, dbatra, parikh}@gatech.edu Abstract We introduce
More informationDeep Multimodal Fusion of Health Records and Notes for Multitask Clinical Event Prediction
Deep Multimodal Fusion of Health Records and Notes for Multitask Clinical Event Prediction Chirag Nagpal Auton Lab Carnegie Mellon Pittsburgh, PA 15213 chiragn@cs.cmu.edu Abstract The advent of Electronic
More informationMinimum Risk Training For Neural Machine Translation. Shiqi Shen, Yong Cheng, Zhongjun He, Wei He, Hua Wu, Maosong Sun, and Yang Liu
Minimum Risk Training For Neural Machine Translation Shiqi Shen, Yong Cheng, Zhongjun He, Wei He, Hua Wu, Maosong Sun, and Yang Liu ACL 2016, Berlin, German, August 2016 Machine Translation MT: using computer
More informationPartially-Supervised Image Captioning
Partially-Supervised Image Captioning Peter Anderson Macquarie University Sydney, Australia p.anderson@mq.edu.au Stephen Gould Australian National University Canberra, Australia stephen.gould@anu.edu.au
More informationarxiv: v1 [cs.cv] 24 Jul 2018
Multi-Class Lesion Diagnosis with Pixel-wise Classification Network Manu Goyal 1, Jiahua Ng 2, and Moi Hoon Yap 1 1 Visual Computing Lab, Manchester Metropolitan University, M1 5GD, UK 2 University of
More informationBayesian models of inductive generalization
Bayesian models of inductive generalization Neville E. Sanjana & Joshua B. Tenenbaum Department of Brain and Cognitive Sciences Massachusetts Institute of Technology Cambridge, MA 239 nsanjana, jbt @mit.edu
More informationPathGAN: Visual Scanpath Prediction with Generative Adversarial Networks
PathGAN: Visual Scanpath Prediction with Generative Adversarial Networks Marc Assens 1, Kevin McGuinness 1, Xavier Giro-i-Nieto 2, and Noel E. O Connor 1 1 Insight Centre for Data Analytic, Dublin City
More informationPsychological testing
Psychological testing Lecture 12 Mikołaj Winiewski, PhD Test Construction Strategies Content validation Empirical Criterion Factor Analysis Mixed approach (all of the above) Content Validation Defining
More information