Seamless Audio Splicing for ISO/IEC Transport Streams

Similar documents
Jitter-aware time-frequency resource allocation and packing algorithm

CEA Technical Report. Closed Captioning in IP-delivered Video Programming CEA-TR-3

40 GbE and 100 GbE PCS Considerations November 2007 Update

Complete a large project that embodies the major course topics Project should be simple but expandable The project should include:

40 GbE and 100 GbE PCS Considerations Key Questions to be Answered concerning OTN mapping for MLD (CTBI) architecture

Outline. Teager Energy and Modulation Features for Speech Applications. Dept. of ECE Technical Univ. of Crete

Complete a large project that embodies the major course topics Project should be simple but expandable The project should include:

Two Themes. MobileASL: Making Cell Phones Accessible to the Deaf Community. Our goal: Challenges: Current Technology for Deaf People (text) ASL

Main Study: Summer Methods. Design

TOWARDS CERTIFICATION OF 3D VIDEO QUALITY ASSESSMENT

A Radioteletype Over-Sampling Software Decoder for Amateur Radio

A model of parallel time estimation

Saliency Inspired Modeling of Packet-loss Visibility in Decoded Videos

MAC Sleep Mode Control Considering Downlink Traffic Pattern and Mobility

Course Syllabus. Operating Systems, Spring 2016, Meni Adler, Danny Hendler and Amnon Meisels 1 3/14/2016

TTY/TDD Minimum Performance Specification

How the LIVELab helped us evaluate the benefits of SpeechPro with Spatial Awareness

Mark J. Anderson, Patrick J. Whitcomb Stat-Ease, Inc., Minneapolis, MN USA

Fundamental Clinical Trial Design

Image Enhancement and Compression using Edge Detection Technique

Content Scope & Sequence

Power Management for Networks to Reduce Energy Consumption

EVRC TTY/TDD Extension

South Australian Research and Development Institute. Positive lot sampling for E. coli O157

SNJB College of Engineering Department of Computer Engineering

A Brief Introduction to Bayesian Statistics

ANALYSIS OF MOTION-BASED CODING FOR SIGN LANGUAGE VIDEO COMMUNICATION APPLICATIONS

To open a CMA file > Download and Save file Start CMA Open file from within CMA

Ergonomic Test of the Kinesis Contoured Keyboard

Robust Optimization accounting for Uncertainties

SUPPLEMENTAL MATERIAL

Tier 3 and 4 healthy weight and obesity services in Kent

MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and. Lord Equating Methods 1,2

Set Up SOS Video Chat and Screen-Sharing

Discrimination and Generalization in Pattern Categorization: A Case for Elemental Associative Learning

Internet Hub WLAN/3G/4G. Internet Cloud. Figure 1: U-Healthcare system overview

ERA: Architectures for Inference

Discovering Meaningful Cut-points to Predict High HbA1c Variation

! Can hear whistle? ! Where are we on course map? ! What we did in lab last week. ! Psychoacoustics

Statistical analysis DIANA SAPLACAN 2017 * SLIDES ADAPTED BASED ON LECTURE NOTES BY ALMA LEORA CULEN

Level 2 Basics Workshop Healthcare

Tackling Random Blind Spots with Strategy-Driven Stimulus Generation

Optimal control of an emergency room triage and treatment process

Placebo and Belief Effects: Optimal Design for Randomized Trials

Alternative splicing. Biosciences 741: Genomics Fall, 2013 Week 6

AUG Expert responses to issues raised by British Gas during the query period for the first draft 2017/18 AUG Statement, 14 March 2017.

Color Repeatability of Spot Color Printing

Types of questions. You need to know. Short question. Short question. Measurement Scale: Ordinal Scale

DC Unbalance Tolerance Testing in PSE s

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Supporting Information

CHAPTER 4 RESULTS. In this chapter the results of the empirical research are reported and discussed in the following order:

Framework for Comparative Research on Relational Information Displays

Proceedings of Meetings on Acoustics

Operational Flexibility

Duplexing and Scheduling for 5G Systems

Observational studies; descriptive statistics

Feasibility Evaluation of a Novel Ultrasonic Method for Prosthetic Control ECE-492/3 Senior Design Project Fall 2011

CAS Seminar - Spiking Neurons Network (SNN) Jakob Kemi ( )

Supplemental Figure S1. Expression of Cirbp mrna in mouse tissues and NIH3T3 cells.

A Decision-Theoretic Approach to Evaluating Posterior Probabilities of Mental Models

Yunhyoung Kim, KBS (Korean Broadcasting System) KBS`

The Economics of Obesity

A Spreadsheet for Deriving a Confidence Interval, Mechanistic Inference and Clinical Inference from a P Value

Continuous/Discrete Non Parametric Bayesian Belief Nets with UNICORN and UNINET

CHAPTER 3. Methodology

For all of the following, you will have to use this website to determine the answers:

19th AWCBR (Australian Winter Conference on Brain Research), 2001, Queenstown, AU

Robust Neural Encoding of Speech in Human Auditory Cortex

Dutch Multimodal Corpus for Speech Recognition

A Brief Introduction to Queuing Theory

Constrained Keys for Invertible Pseudorandom Functions. Dan Boneh, Sam Kim, and David J. Wu Stanford University

Incorporation of Imaging-Based Functional Assessment Procedures into the DICOM Standard Draft version 0.1 7/27/2011

Chapter IV. Information Transfer Rate

Perceptual-Based Objective Picture Quality Measurements

Conditional spectrum-based ground motion selection. Part II: Intensity-based assessments and evaluation of alternative target spectra

Issues Surrounding the Normalization and Standardisation of Skin Conductance Responses (SCRs).

Quick Start Training

PROVIDER CONTRACT ISSUES

Title. Author(s)Aoki, Naofumi. Issue Date Doc URL. Type. Note. File Information. A Lossless Steganography Technique for G.

Human-Centered Approach Evaluating Mobile Sign Language Video Communication

USING STATCRUNCH TO CONSTRUCT CONFIDENCE INTERVALS and CALCULATE SAMPLE SIZE

Evaluation of CBT for increasing threat detection performance in X-ray screening

Evaluation of CBT for increasing threat detection performance in X-ray screening

AT THE MARKET (PART II)

NEW AUTOMATED PERIMETERS NEW. Fast and precise perimetry at your fingertips. ZETA strategy EyeSee recording DPA analysis

Contour Diabetes app User Guide

Chapter 1: Introduction to Statistics

5 REASONS TO CHOOSE THE E-LINK 1000EXR OVER COMPETING 70/80 GIGABIT WIRELESS SOLUTIONS

Natural Scene Statistics and Perception. W.S. Geisler

The reality.1. Project IT89, Ravens Advanced Progressive Matrices Correlation: r = -.52, N = 76, 99% normal bivariate confidence ellipse

FROM H.264 TO HEVC: CODING GAIN PREDICTED BY OBJECTIVE VIDEO QUALITY ASSESSMENT MODELS. Kai Zeng, Abdul Rehman, Jiheng Wang and Zhou Wang

Calibration Standards

MITOCW conditional_probability

Chapter 1. Picturing Distributions with Graphs

To open a CMA file > Download and Save file Start CMA Open file from within CMA

Minimizing Uncertainty in Property Casualty Loss Reserve Estimates Chris G. Gross, ACAS, MAAA

Previously, when making inferences about the population mean,, we were assuming the following simple conditions:

Transcription:

Seamless Audio Splicing for ISO/IEC 13818 Transport Streams A New Framework for Audio Elementary Stream Tailoring and Modeling Seyfullah Halit Oguz, Ph.D. and Sorin Faibish EMC Corporation Media Solutions Group Engineering Page. 1

EMC Media Solutions Group Profile The EMC Media Solutions Group is part of EMC Engineering EMC Engineering will spend well over $1 Billion Dollars on Research and Development in FY 2001. Over 300 engineers concentrating on Celerra Server development and Rich Media products. The Media Solutions Group Lab has in excess of $300 Million dollars of hardware for development of Rich Media solutions. Page. 2

Mission Statement The EMC Media Solutions Group is tasked with: Deploying EMC Products into the Rich Media market. Develop products and solutions which meet the requirements of Rich Media customers. Developing partnerships with key companies to develop and deploy customer solutions. Provide Professional Services in the Rich Media environment. Page. 3

Outline Splicing in brief Objective Problem description Basic algorithm A model for the audio elementary streams Enhanced algorithm Additional implementation details Conclusions Page. 4

Splicing Splicing is the act of switching from one MPEG-2 program (embedded in a transport stream) to another MPEG-2 program (again embedded in a transport stream). Commercial insertion, camera or content switching and content editing all require splicing to be performed on compressed bit-streams. The structure of the compressed data makes a seamless splicing algorithm be far from trivial. Page. 5

Objective A generic method to process the audio elementary streams during the splicing of ITU-T Rec. H.222.0 ISO/IEC 13818-1 transport streams (TS) to achieve a seamless audio splice. * generic : No constraining assumptions are made about signal formats (e.g. the video frame rate (PAL, NTSC), the audio sampling frequency), or various encoding parameters (e.g. the audio bit rate, the layer of audio encoding algorithm employed). * audio elementary streams. Current focus will be on the audio elementary streams. Ultimately audio and video splicing should be considered jointly. * transport streams. Achieve audio elementary stream splicing directly on transport streams with lowest possible complexity. Page. 6

Definitions and Notation Encoded Data Domain Decoded Data Domain Audio Access Unit AAU (Audio Frame) (576 bytes) * Video Access Unit VAU (variable size) Audio Decoder Video Decoder Audio Presentation Unit, APU, (a block of contiguous audio samples) (24 ms) * Video Presentation Unit, VPU, (a video frame) (1/29.97 s) ** * (ISO/IEC 11172-3 Layer-II Audio coding with sampling frequency 48kHz and audio bitrate 192kbits/s assumed only for illustrative purposes) ** (NTSC frame rate assumed only for illustrative purposes) Page. 7

AT THE BEGINNING VPU and APU alignment Start End S E S E S Start VPU 0 VPU 1 VPU 2 APU 0 (1 / 29.97) seconds End S E S E S APU 1 APU 2 0.024 seconds ~= 0.03337 seconds The start of a VPU will be aligned with the start of an APU possibly at the beginning of a stream and then only at multiples of 5 minutes increments in time. This implies that later they will not be aligned again for all practical purposes. E S E S ES E S E S E S VPU (k-2) VPU (k-1) VPU k VPU (k+1) VPU (k+2) LATER E S E S E S E S E S E S E S APU (j-2) APU (j-1) APU j APU (j+1) APU (j+2) APU (j+3) Page. 8

The setting for splicing Ending stream E S E S E S E S E S E S VPU (k-2) VPU (k-1) VPU k VPU (k+1) VPU (k+2) time base #1 E S E S E S E S E S E S E S APU (j-2) APU (j-1) APU j APU (j+1) APU (j+2) APU (j+3) Splicing point is naturally defined with respect to VPUs. E S E S E S E S E S E S VPU (n-2) VPU (n-1) VPU n VPU (n+1) VPU (n+2) time base #2 E S E S E S E S E S E S E S APU (m-2) APU (m-1) APU m APU (m+1) APU (m+2) APU (m+3) Beginning stream Page. 9

Audio processing at splicing Time base of the beginning stream is shifted to achieve video presentation continuity. E S E S E S E S E S E S VPU (k-2) VPU (k-1) VPU k VPU (k+1) VPU (k+2) E S E S E S E S E S E S E S APU (j-2) APU (j-1) APU j APU (j+1) APU (j+2) APU (j+3) E S E S E S E S E S E S E S APU (m-3) APU (m-2) APU (m-1) APU m APU (m+1) APU (m+2) APUs are available only through the decoding of their corresponding AAUs. Fractional (i.e. truncated) AAUs in the encoded data domain are useless. Page. 10

So far Decoding, time domain editing and re-encoding. High computational complexity. Gaps in the audio stream. Audio mutes, uncontrolled audio-visual skew. Overlaps in the scopes of APUs. Uncontrolled audio-visual skew, inconsistent ES structure. Page. 11

Observations Audio truncation should always be done at AAU boundaries i.e. no fractional AAUs! Audio truncation for the ending stream should be done with respect to the end of its last VPU s presentation interval. Audio truncation for the beginning stream should be done relative to the beginning of its first VPU s presentation interval. BEST ALIGNED APUs Page. 12

Best aligned APUs Best aligned final APU APU (j+1) short long 12 msec. 12 msec. The APU of the ending stream whose presentation interval ends within the identified 24 ms interval is called the best aligned final APU. VPU (k-1) VPU k VPU (k+1) VPU (k+2) 12 msec. 12 msec. The APU of the beginning stream whose presentation interval starts within the identified 24 ms interval is called the best aligned initial APU. long short APU m Best aligned initial APU There is a comprehensive list of 8 possible cases that can be identified regarding the alignment of ending and beginning audio streams based on the above definitions. Page. 13

How to make use of best aligned APUs ACTION Truncate the ending audio stream at the end of the best aligned final APU. Start the beginning audio stream at the beginning of the best aligned initial APU. Re-stamp the audio PTSs of the beginning stream to generate an immediate continuation of the ending audio stream. REQUIRED PROCESSING AT ELEMENTARY STREAM LEVEL In the audio PES packet carrying the best aligned final APU - truncate after the AAU associated with the best aligned final APU, - modify the PES packet size information accordingly. In the audio PES packet carrying the best aligned initial APU - delete the AAU data preceding the AAU associated with the best aligned initial APU, - modify the PES packet size information accordingly. Modify the PTS values associated with the first and all consequent audio PES packets accordingly. Page. 14

Case 6) b. a. final APU long, b. a. initial APU short and 0 msec. < audio overlap < 12 msec. Best aligned final APU APU (j+1) APU (j+2) 12 msec. 12 msec. VPU (k-1) VPU k VPU (k+1) VPU (k+2) 12 msec. 12 msec. APU (m-1) APU m Best aligned initial APU (m) (m+1) SOLUTION: APU (j+1) APU (j+2) APU (j+3) A/V skew of at most 12 msec. VPU (k-1) VPU k VPU (k+1) VPU (k+2) Page. 15

Minimal Achievable Skew Algorithm Immediately applicable to 6 out of the 8 possible best aligned APU relative position classes. In the remaining 2 classes of relative position, a slight modification to the proposed algorithm is needed to achieve an A/V skew bounded by half APU duration. Page. 16

Case 1) Both best aligned APUs are short and 12 msec. < audio gap < 24 msec. Best aligned final APU APU j+1 APU (j+2) 12 msec. 12 msec. VPU (k-1) VPU k VPU (k+1) VPU (k+2) 12 msec. 12 msec. APU (m-1) APU m Best aligned initial APU (m) SOLUTION (a): APU (j+1) APU (j+2) APU (j+3) A/V skew of at most 12 msec. VPU (k-1) VPU k VPU (k+1) VPU (k+2) (m-1) A/V skew of at most 12 msec. (m) SOLUTION (b): APU (j+1) APU (j+2) APU (j+3) Page. 17

Facts - I An audio elementary stream construction with no holes and no audio PTS discontinuity is possible. As a consequence, an A/V skew of magnitude at most half APU duration will be induced in the beginning stream. This is below the sensitivity limits of human perception. The proposed algorithm can be repeatedly applied an arbitrary number of times with neither a failure to meet its structural assumptions nor a degradation in its promised A/V skew performance. Page. 18

Facts - II The A/V skews induced as a result of the proposed processing do not accumulate, i.e. irrespective of the number of consecutive splices the worst A/V skew at any point in time will be half of the APU duration. At each splice point, at the termination of the PUs of the ending stream, we always have total audio and video presentation durations up to that point almost matching each other, i.e. video_duration - audio_duration < = (1/2) APU_duration i.e. always correct amount of audio data provided. Page. 19

Facts - III The resulting audio stream is error-free and totally ISO/IEC 11172-3 and ISO/IEC 13818-3 compliant. Page. 20

Implementation at TS level TS level implementation necessitates further considerations: 1) Editing of transport packets 2) Transport packet buffering and re-multiplexing 3) Audio buffer management 4) Meta-data generation Page. 21

Audio buffer management We have to control the dynamic behavior of the audio buffer during the transient induced by the splicing as well as consequent to splicing. We need a simple model to characterize the audio elementary stream within a TS, such that 1) model parameters should be easy to estimate, 2) model should accurately provide the following desired information a) the mean audio buffer fullness, b) the extent of buffer fullness variation around its mean value. Page. 22

Audio elementary stream model Audio elementary stream will be modeled on the basis of AAUs which determines its granularity with respect to audio decoder actions. The proposed framework for the model is an arrival process to a FIFO queue with a deterministic service time. p j p (j+1) p (j+2) p (j+3) a j a a a (j+1) (j+2) (j+3) Page. 23 a j : arrival time of j th AAU p j : presentation time of j th AAU service time 0.024 sec.

Audio elementary stream model The presentation times p j can be easily computed for each AAU. The arrival times a j however are not uniquely defined since each AAU arrives in a distributed fashion owing to transport packet encapsulation. Solution: Use weighted-average arrival times. Page. 24

Audio elementary stream model Weighted-average arrival time is defined for each AAU as follows: TS PACKET TYPE VISUALIZATION PLOT TS packet types (color coded) 0.65 0.6 0.55 0.5 0.45 850 900 950 1000 1050 TS packet indices Page. 25 (TS packet arrival time) * (fraction of the j th a j = AAU in the payload data) all TS packets whose payloads have some of the Note: TS packet arrival time is defined with respect to th j AAU data packet s 94th byte through a PCR extrapolation process.

Audio elementary stream model Based on these definitions we let w = p - a j j j where w is the waiting time for j AAU in the audio buffer. j The mean and variance of w j as a random variable w, provide very important information about the audio elementary stream within the TS multiplex and hence the audio buffer. th Page. 26

Specifically, Audio elementary stream model E[w].(audio bitrate) = mean audio buffer fullness σ w.(audio bitrate) = a measure of the variation in the audio buffer fullness around its mean value E[.]: expectation operation. σ : standard deviation. Page. 27

Audio elementary stream model Example 1. Encoder B characteristics: - Long mean waiting time - Highly irregular, bursty arrival regime Page. 28

Example 1 1 TS PACKET TYPE VISUALIZATION PLOT Interpolated play-out times for individual TS packets, [seconds] TS packet types (color coded) 0.8 0.6 0.4 0.2 7.9442 7.9441 7.9441 0 6500 6600 6700 6800 6900 7000 7100 7200 7300 7400 7500 TS packet indices 7.9443 x 104 INTERLEAVED AUDIO-VIDEO ALIGNMENT IN THE BITSTREAM WRT PTS VALUES 7.944 0 5000 10000 15000 TS packet indices Page. 29

Example 1 110 100 90 AUDIO BUFFER ANALYSIS Predicted mean audio buffer fullness: 15585.8 bits = 54.36 %. 2 STD interval around the mean: 4504.2 bits = 15.71 %. 70 Audio buffer level [% of max] 60 50 40 30 20 10 0-10 0 5 10 15 20 25 30 Time [seconds] Page. 30

An important observation PTS re-stamping in the beginning audio stream The waiting times of AAUs will be modified in the beginning stream. The mean waiting time of AAUs in the beginning stream will change (decrease or increase) by at most half APU duration. A corresponding change in the mean audio buffer fullness level for the beginning stream will be induced. Page. 31

Improvement to the proposed processing For audio elementary streams structured with mean audio buffer fullness level bounded away from both underflows and overflows, the already proposed methodology with the minimal achievable A/V skew is the best to choose. For audio elementary streams with mean audio buffer fullness level close to either underflow or overflow states, the requirement of minimal resultant A/V skew should be relaxed and instead of best aligned APUs those APUs introducing the minimal safe A/V skew should be employed. Page. 32

Outline of the improvement - I Let the type of A/V skew in which the audio signal component is delayed with respect to its associated video signal, be referred to as the forward (in time) skew. Forward skew is a result of a uniform increment in the beginning stream s AAU presentation time stamps and is therefore associated with an increase in the mean audio buffer fullness level. Hence, for beginning audio elementary streams with mean audio buffer fullness levels close to underflow, the minimal achievable forward skew is the minimal safe skew and should be the one to be employed. Page. 33

Outline of the improvement - II Let the type of A/V skew in which the audio signal component moves to earlier in time with respect to its associated video signal, be referred to as the backward (in time) skew. Backward skew is a result of a uniform decrement in the beginning stream s AAU presentation time stamps and is therefore associated with a decrease in the mean audio buffer fullness level. Hence, for beginning audio elementary streams with mean audio buffer fullness levels close to overflow, the minimal achievable backward skew is the minimal safe skew and should be the one to be employed. Page. 34

Minimal Safe Skew Algorithm Summary For all 8 possible relative position classes of best aligned APUs, three solutions are defined 1. the minimal achievable skew solution 2. the minimal safe forward skew solution 3. the minimal safe backward skew solution Minimal achievable skew is upper-bounded by half APU duration. Minimal safe skew (forward or backward) is upper bounded by one APU duration. (Still below the sensitivity limits of human perception.) Page. 35

Case 1) Both best aligned APUs are short and 12 msec. < audio gap < 24 msec. Best aligned final APU APU (j+1) APU (j+2) 12 msec. 12 msec. VPU (k-1) VPU k VPU (k+1) VPU (k+2) 12 msec. 12 msec. APU (m-1) APU m Best aligned initial APU Page. 36

(m) SOLUTION I: APU (j+1) APU (j+2) APU (j+3) A/V skew of at most 12 msec. VPU (k-1) VPU k VPU (k+1) VPU (k+2) Minimal achievable skew implementation for clips with originally moderate audio buffer levels. Note: An alternate implementation by omitting APU (j+2) and including APU (m-1) is also possible and this approach achieves exactly the same resultant A/V skew. This latter solution is preferable in terms of ease of implementation. (m) SOLUTION II: APU (j+1) APU (j+2) APU (j+3) A/V skew of at most 12 msec. VPU (k-1) VPU k VPU (k+1) VPU (k+2) Minimal achievable safe (forward) skew implementation for clips with originally low audio buffer levels. Note: An alternate implementation by omitting APU (j+2) and including APU (m-1) is also possible and this approach achieves exactly the same resultant A/V skew. This latter solution is preferable in terms of ease of implementation. (m) (m+1) SOLUTION III: APU (j+1) APU (j+2) APU (j+3) A/V skew of at most 24 msec. VPU (k-1) VPU k VPU (k+1) VPU (k+2) Minimal achievable safe (backward) skew implementation for clips with originally high audio buffer levels. Page. 37

Case 6) b. a. final APU long, b. a. initial APU short and 0 msec. < audio overlap < 12 msec. Best aligned final APU APU (j+1) APU (j+2) 12 msec. 12 msec. VPU k VPU (k+1) VPU (k-1) VPU k VPU (k+1) VPU (k+2) 12 msec. 12 msec. APU (m-1) APU m Best aligned initial APU Page. 38

(m) (m+1) SOLUTION I: APU (j+1) APU (j+2) APU (j+3) A/V skew of at most 12 msec. VPU (k-1) VPU k VPU (k+1) VPU (k+2) Minimal achievable skew implementation for clips with originally moderate audio buffer levels. (m) (m+1) SOLUTION II: APU (j+1) APU (j+2) APU (j+3) A/V skew of at most 12 msec. VPU (k-1) VPU k VPU (k+1) VPU (k+2) Minimal achievable safe (forward) skew implementation for clips with originally low audio buffer levels. (m+1) (m+2) SOLUTION III: APU (j+1) APU (j+2) APU (j+3) A/V skew of at most 24 msec. VPU (k-1) VPU k VPU (k+1) VPU (k+2) Minimal achievable safe (backward) skew implementation for clips with originally high audio buffer levels. Note: An alternate implementation by omitting APU (j+1) and including APU m is also possible and this approach achieves exactly the same resultant A/V skew. This latter solution is preferable in terms of ease of implementation. Page. 39

Implementation at TS level TS level implementation necessitates further considerations: 1) Editing of transport packets 2) Transport packet buffering and re-multiplexing 3) Audio buffer management 4) Metadata generation Page. 40

Editing of transport packets The truncation of the final PES packet of the ending audio stream will typically necessitate the insertion of some adaptation field padding into its last transport packet. The deletion of some AAU data from the front end of the beginning audio stream s first PES packet will typically necessitate the editing of at most two audio transport packets. Possible use of a causal bit-reservoir in Layer III encoding, typically dictates a structural constraint on the beginning audio stream. Page. 41

Transport packet buffering and re-multiplexing In the transport stream the audio bit rate is a constant the VAUs are of varying sizes therefore the relative positions of VAUs and AAUs associated with VPUs and APUs almost aligned in time cannot be maintained constant. Almost always it is the case that the AAUs are significantly delayed with respect to the VAUs for which the decoded representations are almost synchronous. Page. 42

Legend Black: TS video packets initiating I type pictures. Blue: TS video packets initiating P or B type pictures. Cyan: TS video packets. Red: TS audio packets initiating PES packets. Magenta: TS audio packets, Yellow: Null TS packets. Green: TS packets carrying various system information. Page. 43

Example 1. Encoder B. 0.03 TS PACKET TYPE VISUALIZATION PLOT TS packet types (color coded) 0.025 0.02 0.015 0.01 0.005 Page. 44 Interpolated play-out times for individual TS packets, [seconds] 0 x 10 4 7.9442 7.9442 7.9442 7.9442 7.9442 7.9442 7.9442 7.9442 9800 9805 9810 9815 9820 9825 TS packet indices INTERLEAVED AUDIO-VIDEO ALIGNMENT IN THE BITSTREAM WRT PTS VALUES 0.95 1 1.05 1.1 1.15 TS packet index TS packet indices TS packet index 9812 11005 x 10 4

Transport packet buffering and re-multiplexing This TS multiplex structure necessitates 1) locating and temporarily storing (buffering) the delayed audio packets when the ending stream is truncated based on the last VAU, 2) TS re-multiplexing in the form of a) deletion of some obsolete green audio packets in the beginning stream, b) insertion of the above blue audio packets into the beginning stream. Page. 45

Metadata Generation During the ingest of MPEG-2 transport streams to storage systems, 1. estimate the parameters of the proposed audio elementary stream model, 2. record this descriptive information within the metadata associated with the asset. In splicing scenarios with live streams, known characteristics and/or settings of the encoder employed can be used. Page. 46

Conclusions Audio elementary stream tailoring based on minimal achievable or minimal safe A/V skew concepts. Audio splicing without any artifacts made possible. A simple and efficient model to characterize the audio elementary stream structure embedded in the transport stream. (Prediction and possible control of audio buffer behavior within reach.) Audio splicing cannot be considered independently from the video splicing. Page. 47

Page. 48