HOW AI WILL IMPACT SUBTITLE PRODUCTION

WHAT IS AI IN A BROADCAST CONTEXT?

What is AI in a Broadcast Context?

"I'm sorry, Dave. I'm afraid I can't do that."

Image by Cryteria [CC BY 3.0 (https://creativecommons.org/licenses/by/3.0)], from Wikimedia Commons

What is AI in a Broadcast Context?

AUTOMATIC SPEECH RECOGNITION: What words are being said?
- Used for offline captioning, live captioning and QC.

ALIGNMENT: When was this said?
- Used to produce timed text from untimed scripts.

DIARISATION / SPEAKER SEGMENTATION: Who is speaking?
- Used to generate colour changes and new-speaker marks (sketched below).

MACHINE TRANSLATION: How would this be said in another language?
- Assists the production of interlingual subtitles.

NATURAL LANGUAGE PROCESSING
- Entity tagging
- Keyword extraction
- Summarisation
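To make the pipeline concrete, here is a minimal sketch of how two of these components fit together: diarisation segments assign a speaker label to each timed word from ASR, which is the precursor to generating colour changes. The data shapes and function below are illustrative assumptions, not any particular engine's API.

```python
# Hypothetical sketch: tag ASR word timings with diarisation speaker labels.
# Input shapes are assumed, not taken from any specific ASR/diarisation API.

def label_words_with_speakers(words, segments):
    """words: [(start_sec, end_sec, text)]; segments: [(start, end, speaker)]."""
    labelled = []
    for w_start, w_end, text in words:
        # Find the diarisation segment containing this word's start time.
        speaker = next(
            (spk for s_start, s_end, spk in segments if s_start <= w_start < s_end),
            "unknown",
        )
        labelled.append((speaker, text))
    return labelled

words = [(0.0, 0.4, "Hello"), (0.5, 0.9, "there."), (1.2, 1.6, "Hi!")]
segments = [(0.0, 1.0, "SPEAKER_1"), (1.0, 2.0, "SPEAKER_2")]
print(label_words_with_speakers(words, segments))
# [('SPEAKER_1', 'Hello'), ('SPEAKER_1', 'there.'), ('SPEAKER_2', 'Hi!')]
```

A subtitle formatter would then map each speaker label to a colour or a new-speaker mark.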

What is AI in a Broadcast Context?

Most of these use cases are best served by sequence models. Recurrent Neural Networks (RNNs), bidirectional Long Short-Term Memory (LSTM) models and Seq2Seq with attention are currently the cutting edge, but the algorithms involved are evolving quickly.

Basic RNN algorithm, unfolded. Image by François Deloche - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=60109157
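The figure's "unfolded" view corresponds to a simple loop: the same weights are applied at every timestep, with the hidden state carrying context forward. A minimal sketch in Python/NumPy, for illustration only; production systems use LSTM or Seq2Seq implementations from a deep learning framework.

```python
import numpy as np

def rnn_forward(inputs, W_x, W_h, b, h0):
    """Basic RNN, unfolded over time.

    inputs: sequence of input vectors x_t
    W_x, W_h, b: weights and bias, shared across all timesteps
    h0: initial hidden state
    """
    h, states = h0, []
    for x_t in inputs:
        # h_t depends on the current input and the previous hidden state.
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return states
```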

AUTOMATIC SPEECH RECOGNITION: Evaluation & Integration

Evaluation and Integration

WHY EVALUATE?
- Enormous number of ASR solutions and potential configurations.
- Need for a method to choose the most appropriate solution for a given problem.
- Need for an evaluation method which is quick, comparable, replicable and low cost.

EVALUATION METHODS
- A range of test material covering a variety of dialects, subjects and acoustic environments.
- High-quality verbatim transcripts for all test material.
- An automated solution which:
  - submits material to a variety of ASR engines;
  - normalises output for comparability;
  - compares it with the verbatim transcript to generate an accuracy score (sketched below).
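The accuracy score in the last step is conventionally word error rate (WER): the word-level edit distance between the normalised engine output and the verbatim reference. A minimal sketch, assuming a deliberately simple normalisation; production pipelines (and subtitling-specific metrics such as the NER model) are more nuanced.

```python
import re

def normalise(text):
    """Lowercase and strip punctuation so engines are compared fairly."""
    return re.sub(r"[^\w\s']", "", text.lower()).split()

def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = normalise(reference), normalise(hypothesis)
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[-1][-1] / max(len(ref), 1)

print(wer("carry on with your work please", "carry on with you work"))  # 0.333...
```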

Engine Comparison by Programme Type and Accent (chart)

Variation in Performance

Sample engine output on challenging material, reproduced verbatim:

"He's on detention organised the cupboard we'll. Carry on with your work please. So what am I meant to do Jordan. I meant to had a baby in the school because because what are you you do. don't you tobacco. Don't. Forget."

Variation in Performance

Sample engine output on well-suited material, reproduced verbatim:

"Welcome to The Great Fire of London uncovering the truths behind the most terrible blaze in British history. We're following in the footsteps of the flames witnessing the devastation the fire unleashed on the capital. 350 years ago. The flames raged for four days and nights through the streets behind us. The blaze burned down nearly all the buildings within the city walls."

Evaluation and Integration

GENERAL CONSIDERATIONS
- Automatic speech recognition is not automatic subtitling: the timed text must be converted in order to apply regulatory standards, best practice and formatting/encoding (a conversion sketch follows below).
- Secure use of ASR solutions is essential in order to protect customer IP and fulfil contractual obligations.
- Ability to identify and make use of the most appropriate engine for each piece of media/output.
- Apply custom vocabulary and training material as per the results of evaluation.
- Use production staff's knowledge and expertise, and include them in planning and experimentation.
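To make the first point concrete: ASR emits a stream of timed words, whereas broadcast subtitles are cues constrained by line length and display rate. A minimal, hypothetical sketch of that conversion; the 37-character limit is a common value in UK subtitle guidelines, and real conversion also handles display rate, sensible line breaks, positioning and encoding.

```python
MAX_CHARS = 37  # common single-line limit in UK subtitle guidelines (assumed here)

def words_to_cues(words):
    """Pack timed ASR words [(start_sec, end_sec, text)] into subtitle cues."""
    cues, current = [], []
    cue_start = prev_end = None
    for w_start, w_end, text in words:
        # Flush the current cue if adding this word would overflow the line.
        if current and len(" ".join(current + [text])) > MAX_CHARS:
            cues.append((cue_start, prev_end, " ".join(current)))
            current = []
        if not current:
            cue_start = w_start
        current.append(text)
        prev_end = w_end
    if current:
        cues.append((cue_start, prev_end, " ".join(current)))
    return cues
```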

Evaluation and Integration

OFFLINE/FILE SUBTITLING
- Significant QC will be required for all mainstream broadcast material.
- Most material is still quicker to produce without ASR, so we need to be able to identify which parts benefit.

LIVE SUBTITLING
- Output varies significantly from respoken subtitling: no summarising, no cueing, no post-corrections.
- Accuracy is closing in on respeaking for ideal material (a single slow speaker, constrained vocabulary) but is significantly worse for other material.
- Most appropriate for DR or low-profile online material for now.

AI and Subtitling

Automatic Speech Recognition

Use of context:
- Past: Trigrams
- Current: LSTMs
- Future (~5 years): Extremely long-range context
- Some day: Full comprehension

Speaker coverage:
- Past: Speaker-dependent
- Current: Dialect-dependent
- Future (~5 years): Multi-dialect
- Some day: Comprehensive language coverage

Automatic Speech Recognition

Live subtitling:
- Current: Respeaking and cueing
- 1-2 years: Automated subtitling for DR and low-profile material
- 3-6 years: Wide use outside of high-profile output
- Some day: All live output automated

File/offline subtitling:
- Current: 30% assisted by ASR
- 1-2 years: 60% assisted by ASR
- 3-6 years: Fully automated increasingly common
- Some day: All offline subtitling fully automated