HOW AI WILL IMPACT SUBTITLE PRODUCTION

WHAT IS AI IN A BROADCAST CONTEXT?

"I'm sorry, Dave. I'm afraid I can't do that."
Image by Cryteria [CC BY 3.0 (https://creativecommons.org/licenses/by/3.0)], from Wikimedia Commons

AUTOMATIC SPEECH RECOGNITION
What words are being said?
- Used for offline captioning, live captioning and QC

ALIGNMENT
When was this said?
- Used to produce timed text from untimed scripts

DIARISATION / SPEAKER SEGMENTATION
Who is speaking?
- Used to generate colour changes and new-speaker marks

MACHINE TRANSLATION
How would this be said in another language?
- Assisting production of interlingual subtitles

NATURAL LANGUAGE PROCESSING
- Entity tagging
- Keyword extraction
- Summarisation
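To make the diarisation use case concrete, here is a minimal sketch of how speaker segments might be turned into colour changes and new-speaker marks. The input format (a list of segments carrying a speaker label) and the four-colour palette are assumptions for illustration, not the output of any particular engine or a statement of house style. When the palette runs out, later speakers reuse the first colour, and a "- " mark is added wherever colour alone cannot signal the change.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical palette: first speaker white, then yellow, cyan, green.
PALETTE = ["white", "yellow", "cyan", "green"]


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    speaker: str  # diarisation label (assumed format, e.g. "spk0")
    text: str


def mark_speakers(segments: list[Segment]) -> list[tuple[float, float, str, str]]:
    """Attach a colour to each segment and add "- " marks where colour alone fails.

    Speakers receive palette colours in order of first appearance. Once the
    palette is exhausted, further speakers reuse the first colour, so a change
    between two same-coloured speakers is flagged with a "- " new-speaker mark.
    """
    colours: dict[str, str] = {}
    marked = []
    prev_speaker = prev_colour = None
    for seg in segments:
        if seg.speaker not in colours:
            n = len(colours)
            colours[seg.speaker] = PALETTE[n] if n < len(PALETTE) else PALETTE[0]
        colour = colours[seg.speaker]
        text = seg.text
        # Different speaker but same colour: the viewer needs an explicit mark.
        if prev_speaker is not None and seg.speaker != prev_speaker and colour == prev_colour:
            text = "- " + text
        marked.append((seg.start, seg.end, colour, text))
        prev_speaker, prev_colour = seg.speaker, colour
    return marked
```

A real system would also need to handle diarisation errors (one person split into two labels, or two people merged), which is one reason the output still needs checking.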

Most of these use cases are best served by sequence models. Recurrent Neural Networks (RNNs), bidirectional Long Short-Term Memory (LSTM) models and sequence-to-sequence (Seq2Seq) models with attention are currently cutting edge, but the algorithms involved are evolving quickly.

Figure: Basic RNN algorithm, unfolded. Image by François Deloche - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=60109157
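The "unfolded" RNN in the figure applies the same weights at every time step, using the standard Elman recurrence h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h). The sketch below is a minimal pure-Python forward pass of that recurrence, plus a bidirectional wrapper that runs a second RNN over the reversed sequence. It omits the LSTM gating mentioned above and uses tiny hand-picked weights; it is an illustration of the idea, not a trained model or how production ASR is implemented.

```python
from __future__ import annotations

import math


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_forward(xs, w_xh, w_hh, b_h, h0):
    """Run an Elman RNN over the sequence xs, one time step at a time.

    h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h). The same parameters are
    reused at every step, which is what the unfolded diagram depicts.
    Returns the list of hidden states h_1..h_T.
    """
    h = h0
    states = []
    for x in xs:
        pre = [a + r + b for a, r, b in zip(matvec(w_xh, x), matvec(w_hh, h), b_h)]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states


def birnn_forward(xs, fwd_params, bwd_params):
    """Bidirectional variant: a second RNN reads the sequence in reverse;
    its states are re-aligned and concatenated with the forward states."""
    fwd = rnn_forward(xs, *fwd_params)
    bwd = rnn_forward(xs[::-1], *bwd_params)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]
```

The bidirectional form lets each time step see both earlier and later input, which suits offline tasks such as file subtitling where the whole utterance is available.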

AUTOMATIC SPEECH RECOGNITION: EVALUATION & INTEGRATION

WHY EVALUATE?
- Enormous number of ASR solutions and potential configurations
- Need for a method to choose the most appropriate solution for a given problem
- Need for an evaluation method which is quick, comparable, replicable and low cost

EVALUATION METHODS
- A range of test material covering a variety of dialects, subjects and acoustic environments
- High-quality verbatim transcripts for all test material
- An automated solution which:
  - submits material to a variety of ASR engines
  - normalises output for comparability
  - compares the output with the verbatim transcript to generate an accuracy score
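One common way to turn the comparison step into an accuracy score is word error rate (WER): the minimum number of word substitutions, deletions and insertions needed to turn the ASR output into the verbatim transcript, divided by the number of words in the transcript. The sketch below assumes a deliberately simple normalisation (lower-casing and stripping punctuation other than apostrophes); a real pipeline would also need rules for numbers, spelling variants and hesitations so that engines are compared fairly.

```python
import re


def normalise(text: str) -> list[str]:
    """Lower-case, replace punctuation (except apostrophes) with spaces, split into words."""
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)
    return text.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = normalise(reference)
    hyp = normalise(hypothesis)
    # d[i][j] = edits needed to turn the first j hyp words into the first i ref words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # substitution or match
            )
    # Guard against an empty reference to avoid division by zero.
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


print(word_error_rate("The cat sat on the mat.", "the cat sat on mat"))
```

An accuracy figure can then be reported as 1 - WER; note that WER can exceed 1 when an engine inserts many spurious words.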

Figure: Engine comparison by programme type and accent

Variation in Performance

Sample ASR output:
"He's on detention organised the cupboard we'll. Carry on with your work please. So what am I meant to do Jordan. I meant to had a baby in the school because because what are you you do. don't you tobacco. Don't. Forget."

Sample ASR output:
"Welcome to The Great Fire of London uncovering the truths behind the most terrible blaze in British history. We're following in the footsteps of the flames witnessing the devastation the fire unleashed on the capital. 350 years ago. The flames raged for four days and nights through the streets behind us. The blaze burned down nearly all the buildings within the city walls."

GENERAL CONSIDERATIONS
- Automatic speech recognition is not automatic subtitling: the timed text must be converted in order to apply regulatory standards, best practice and formatting/encoding.
- Secure use of ASR solutions is essential to protect customer IP and fulfil contractual obligations.
- Ability to identify and make use of the most appropriate engine for each piece of media/output.
- Apply custom vocabulary and training material according to the results of the evaluation.
- Use production staff's knowledge and expertise, and include them in planning and experimentation.
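To illustrate why raw ASR output is not yet a subtitle file, the sketch below takes word-level timings in an assumed (word, start, end) format and greedily packs them into subtitle blocks. The limits used (37 characters per line, two lines per subtitle, a 1-second pause forcing a new subtitle) are hypothetical values chosen for the example, not a statement of any regulator's or broadcaster's rules. Reading speed, minimum duration, line breaks at linguistic boundaries and shot changes are not modelled.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MAX_CHARS = 37  # hypothetical maximum characters per line
MAX_LINES = 2   # hypothetical maximum lines per subtitle
MAX_GAP = 1.0   # hypothetical pause (seconds) that forces a new subtitle


@dataclass
class Subtitle:
    start: float
    end: float
    lines: list[str] = field(default_factory=list)


def build_subtitles(words: list[tuple[str, float, float]]) -> list[Subtitle]:
    """Greedily pack (word, start, end) ASR tokens into subtitle blocks.

    A word goes on the current line if it fits within MAX_CHARS, otherwise on a
    new line; when MAX_LINES is reached, or the pause before the word exceeds
    MAX_GAP, a new subtitle starts. A single word longer than MAX_CHARS is
    kept whole on its own line.
    """
    subs: list[Subtitle] = []
    current = None
    last_end = None
    for word, start, end in words:
        long_pause = last_end is not None and start - last_end > MAX_GAP
        if current is None or long_pause:
            current = Subtitle(start, end, [word])
            subs.append(current)
        elif len(current.lines[-1]) + 1 + len(word) <= MAX_CHARS:
            current.lines[-1] += " " + word
            current.end = end
        elif len(current.lines) < MAX_LINES:
            current.lines.append(word)
            current.end = end
        else:
            current = Subtitle(start, end, [word])
            subs.append(current)
        last_end = end
    return subs
```

A greedy packer like this is easy to reason about but breaks lines wherever the character limit falls; meeting best practice would need a smarter segmentation step on top.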

OFFLINE/FILE SUBTITLING
- Significant QC will be required for all mainstream broadcast material.
- Most material is still quicker to produce without ASR; we need to be able to identify which material is quicker to produce with it.

LIVE SUBTITLING
- Output differs significantly from respoken subtitling:
  - no summarising, no cueing, no post-corrections.
- Accuracy is closing in on respeaking for ideal material (a single slow speaker, constrained vocabulary) but is significantly worse for other material.
- Most appropriate for DR or low-profile online material for now.

AI and Subtitling

Automatic Speech Recognition

USE OF CONTEXT
- Past: Trigrams
- Current: LSTMs
- Future (5 years): Extremely long-range context
- Some day: Full comprehension

SPEAKER COVERAGE
- Past: Speaker dependent
- Current: Dialect dependent
- Future (5 years): Multi-dialect
- Some day: Comprehensive language coverage
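The "use of context" progression starts from trigram language models, which predict each word from only the two words before it. The sketch below is a toy, unsmoothed maximum-likelihood trigram model; real ASR language models add smoothing or back-off and are trained on far larger corpora. The two-sentence corpus is invented for illustration.

```python
from __future__ import annotations

from collections import Counter


def train_trigrams(sentences: list[str]) -> tuple[Counter, Counter]:
    """Count trigrams and their two-word histories in a toy corpus.

    Each sentence is padded with two <s> start tokens and one </s> end token.
    """
    trigrams: Counter = Counter()
    histories: Counter = Counter()
    for sentence in sentences:
        words = ["<s>", "<s>"] + sentence.lower().split() + ["</s>"]
        for a, b, c in zip(words, words[1:], words[2:]):
            trigrams[(a, b, c)] += 1
            histories[(a, b)] += 1
    return trigrams, histories


def prob(trigrams: Counter, histories: Counter, a: str, b: str, c: str) -> float:
    """Unsmoothed maximum-likelihood estimate of P(c | a, b)."""
    seen = histories[(a, b)]
    return trigrams[(a, b, c)] / seen if seen else 0.0


corpus = ["the fire raged for four days", "the fire burned for four nights"]
tri, hist = train_trigrams(corpus)
print(prob(tri, hist, "for", "four", "days"))  # 0.5
```

Because the model only sees "for four", it splits probability evenly between "days" and "nights", even though "raged" or "burned" earlier in the sentence would be informative. Capturing that longer-range context is what recurrent models such as LSTMs are used for.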

LIVE
- Current: Respoken subtitling and cueing
- 1-2 years: Automated subtitling for DR and low-profile material
- 3-6 years: Wide use outside of high-profile material
- Some day?: All live output automated

FILE/OFFLINE
- Current: 30% assisted ASR
- 1-2 years: 60% assisted ASR
- 3-6 years: Fully automated increasingly common
- Some day?: All offline subtitling fully automated