CEA Technical Report Closed Captioning in IP-delivered Video Programming CEA-TR-3 February 2013
NOTICE

Consumer Electronics Association (CEA) Standards, Bulletins and other technical publications are designed to serve the public interest through eliminating misunderstandings between manufacturers and purchasers, facilitating interchangeability and improvement of products, and assisting the purchaser in selecting and obtaining with minimum delay the proper product for his particular need. Existence of such Standards, Bulletins and other technical publications shall not in any respect preclude any member or nonmember of CEA from manufacturing or selling products not conforming to such Standards, Bulletins or other technical publications, nor shall the existence of such Standards, Bulletins and other technical publications preclude their voluntary use by those other than CEA members, whether the standard is to be used either domestically or internationally.

Standards, Bulletins and other technical publications are adopted by CEA in accordance with the American National Standards Institute (ANSI) patent policy. By such action, CEA does not assume any liability to any patent owner, nor does it assume any obligation whatever to parties adopting the Standard, Bulletin or other technical publication.

This document does not purport to address all safety problems associated with its use or all applicable regulatory requirements. It is the responsibility of the user of this document to establish appropriate safety and health practices and to determine the applicability of regulatory limitations before its use.

This document is copyrighted by the Consumer Electronics Association (CEA) and may not be reproduced, in whole or part, without written permission. Federal copyright law prohibits unauthorized reproduction of this document by any means. Organizations may obtain permission to reproduce a limited number of copies by entering into a license agreement. Requests to reproduce text, data, charts, figures or other material should be made to CEA.
(Formulated under the cognizance of the CEA R4.3 Television Data Systems Subcommittee.) Published by CONSUMER ELECTRONICS ASSOCIATION 2013 Technology & Standards Department www.ce.org All rights reserved
CEA Technical Report on Closed Captioning in IP-delivered Video Programming

Contents

1. Purpose
2. References
3. Executive Summary
4. Background
5. Technical Requirements Reference File Format
  5.1. Audio/Video Format
  5.2. Caption Format
  5.3. Signaling of SDH Subtitles (Caption Tracks)
6. Technical Requirements Reference Media Player
  6.1. SMPTE-TT Feature Subset
  6.2. Captioning Functionality
  6.3. Caption Synchronization to Video
  6.4. Progressive Downloads
1. Purpose

The purpose of this report is to document a common understanding of an implementation of SMPTE Timed Text (SMPTE ST 2052-1) [9] that meets the FCC's safe harbor provision in 47 CFR 79.103(c)(11) [7]. This report describes the technical details of a media player capable of rendering closed captioning from file-based content when the captioning is delivered via IP protocols (e.g., via the Internet), for synchronized presentation with audio and video.

2. References

The following standards, recommended practices and other documents are referenced in this report.

[1] DECE: Common File Format & Media Formats Specification Version 1.0.4, Digital Entertainment Content Ecosystem (DECE) LLC, August 2012. http://www.uvvu.com/uv-for-business.php

[2] DECE: Content Metadata Specification Version 1.0.4, Digital Entertainment Content Ecosystem (DECE) LLC, August 2012. http://www.uvvu.com/uv-for-business.php

[3] DECE: Content Publishing Specification Version 1.0.4, Digital Entertainment Content Ecosystem (DECE) LLC, August 2012. http://www.uvvu.com/uv-for-business.php

[4] DECE: Device Specification Version 1.0.4, Digital Entertainment Content Ecosystem (DECE) LLC, August 2012. http://www.uvvu.com/uv-for-business.php

[5] FCC Report and Order 12-9, Closed Captioning of Internet Protocol-Delivered Video Programming: Implementation of the Twenty-First Century Communications and Video Accessibility Act of 2010, released January 2012, Federal Communications Commission. http://transition.fcc.gov/daily_releases/daily_business/2012/db0130/fcc-12-9a1.pdf

[6] US 47 CFR 79.102 Closed caption decoder requirements for digital television receivers and converter boxes.

[7] US 47 CFR 79.103 Closed Captioning and Video Description of Video Programming. http://www.gpo.gov/fdsys/pkg/fr-2012-03-30/pdf/2012-7247.pdf

[8] Movielabs: Common Metadata md namespace, Version 1.2d, September 2012, Motion Picture Laboratories, Inc. http://www.movielabs.com/md/md/v1.2/common%20metadata%20v1.2d.pdf

[9] SMPTE ST 2052-1:2010, Timed Text Format (SMPTE-TT), Society of Motion Picture and Television Engineers, December 2010.

[10] SMPTE RP 2052-10:2010, Conversion from CEA-608 Caption Data to SMPTE-TT, Society of Motion Picture and Television Engineers, November 2010.
[11] Draft SMPTE RP 2052-11:201x, Conversion from CEA-708 Caption Data to SMPTE-TT, Society of Motion Picture and Television Engineers, <work in progress>.

[12] FCC VPAAC: First Report of the Video Programming Accessibility Advisory Committee on the Twenty-First Century Communications and Video Accessibility Act of 2010: Closed Captioning of Video Programming Delivered Using Internet Protocol, Video Programming and Emergency Access Advisory Committee, July 2011. http://transition.fcc.gov/cgb/dro/vpaac/first_vpaac_report_to_the_fcc_7-11-11_FINAL.pdf

[13] W3C Recommendation, Timed Text Markup Language (TTML) 1.0, 18 November 2010, World Wide Web Consortium. http://www.w3.org/tr/ttaf1-dfxp/ and Editor's Draft TTML 1.0 2nd Edition <work in progress>, http://dvcs.w3.org/hg/ttml/rawfile/tip/ttml10/spec/ttaf1-dfxp.html

[14] Editor's Draft W3C Recommendation, Simple Delivery Profile for Closed Captions (US), World Wide Web Consortium, <work in progress>. http://dvcs.w3.org/hg/ttml/rawfile/tip/ttml10-sdp-us/overview.html

3. Executive Summary

This report defines two aspects of a method for delivering closed captions over IP-transported programming. The first is the definition of an MP4-based media file format, called the Reference File Format (RFF), in which the file contains video, audio, and captioning in SMPTE-TT format. The second is a specification of the technical details of a player capable of rendering the video and subtitles contained in files conforming to that format. This player is called the Reference Media Player (RMP). Further, this report describes certain features and capabilities of the RMP, in particular its ability to render closed captioning and to offer the user certain controls over caption display and operation.

While a variety of file formats and device implementations may satisfy the FCC rules for support of IP-delivered closed captioning, this report specifies the detailed structure of a particular type of media file in which captioning is conveyed using the SMPTE-TT standard. The Reference File Format is suited to IP transport and Internet delivery, and under some restrictions, files of this format can be structured to allow the receiving device to begin playback prior to download of the complete file. The chosen RFF is based on the DECE Common File Format (CFF) [1] specification, which already specifies many of the needed transport and signaling parameters for the closed captioning component. DECE CFF itself references and builds on a number of other standards, including members of the ISO/IEC 14496 (MPEG) family, specifically the ISO Base Media File Format, on which the MP4 and 3GPP (see footnote 1) file formats are based, and another DECE standard called the Content Metadata Specification [2]. Guidelines for publishing are found in the DECE Content Publishing Specification [3], and guidelines for device behavior are found in the DECE Device Specification [4].

Footnote 1: See http://3gpp.org
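
Because the RFF builds on the ISO Base Media File Format, a file of this kind is, at the top level, a sequence of boxes, each starting with a 32-bit big-endian size and a four-character type. The following is a minimal sketch of walking those top-level boxes, which a player could use to see what has arrived so far during a download. The function name iter_top_level_boxes and the synthetic sample bytes are assumptions made for this illustration; the sketch does not reproduce the CFF-specific box layout defined in [1].

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple


def iter_top_level_boxes(stream: BinaryIO) -> Iterator[Tuple[str, int]]:
    """Yield (box_type, total_box_size_in_bytes) for each top-level box."""
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return  # end of data, or a truncated header
        size, box_type = struct.unpack(">I4s", header)
        name = box_type.decode("ascii", "replace")
        header_len = 8
        if size == 1:
            # A 64-bit "largesize" field follows the box type.
            largesize = stream.read(8)
            if len(largesize) < 8:
                return
            (size,) = struct.unpack(">Q", largesize)
            header_len = 16
        elif size == 0:
            # A size of 0 means the box extends to the end of the file.
            payload_start = stream.tell()
            end = stream.seek(0, io.SEEK_END)
            yield name, end - payload_start + header_len
            return
        if size < header_len:
            return  # malformed box; stop rather than loop forever
        yield name, size
        stream.seek(size - header_len, io.SEEK_CUR)


if __name__ == "__main__":
    # Synthetic example data (hypothetical, not a valid media file).
    sample = (
        struct.pack(">I4s", 16, b"ftyp") + b"\x00" * 8
        + struct.pack(">I4s", 8, b"moov")
        + struct.pack(">I4s", 0, b"mdat") + b"\x01\x02\x03\x04"
    )
    for box_name, box_size in iter_top_level_boxes(io.BytesIO(sample)):
        print(box_name, box_size)
```

The sketch only reads box headers and skips payloads, so it stays cheap even for large files; parsing the contents of individual boxes would require the definitions in the referenced specifications.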

This report defines two RMP profiles. One handles standard-definition video formats and is called the RMP SD Profile, while the second handles high-definition video formats and is called the RMP HD Profile. The two are identical except for the constraints on the video encoding. An RMP may be built that can handle both profiles.

In addition to specifying the RFF and RMP profiles, this report also describes the signaling methods in the RFF's metadata by which the player identifies and locates the caption tracks. Finally, this report describes in detail the CEA-608/CEA-708 constrained subset of features and capabilities of SMPTE-TT that should be supported by the RMP.

4. Background

Following receipt of the IP Captioning report [12] from the Video Programming Accessibility Advisory Committee (VPAAC), the FCC issued a Report and Order on IP Closed Captioning [5]. The Order adopts SMPTE Timed Text (SMPTE-TT) [9] as a safe harbor interchange and delivery format. The safe harbor provision is interpreted herein as meaning that if a receiver has implemented SMPTE-TT, it has satisfied its obligations under the IP Closed Captioning rules. However, SMPTE-TT does not describe a signaling/discovery/carriage mechanism, so even if a receiver has SMPTE-TT decode capabilities, there is no standard mechanism for carrying the SMPTE-TT content to it. There is a need for guidelines that describe how a receiver should expect to discover, extract, decode, and present the SMPTE-TT content. In fact, the VPAAC WG1 report to the FCC [12] anticipated ongoing industry activity to define the method whereby SMPTE-TT is encapsulated in a file wrapper.

This Technical Report represents an industry effort towards a common solution before multiple, competing technical solutions gain traction, fragment the market and complicate implementation.

Methods for signaling and transport of the SMPTE-TT essence in an MP4 file wrapper were studied.
The Ultraviolet (DECE) Common File Format, which is based on the well-established MP4 format and is appropriate for IP-delivered video programming, was considered since it defines an encapsulation of SMPTE-TT. In addition, the VPAAC SMPTE-TT recommendation was considered. The VPAAC recommended use of SMPTE RP 2052-10:2010 [10], constrained to support the limited capabilities of CEA-608 and CEA-708.

5. Technical Requirements Reference File Format

This section describes the specification of the RFF, which contains SMPTE-TT captions. The Reference Media Player described here is capable of rendering audio, video and closed captioning when given an RFF file as input. The RFF has the following characteristics:

- File structure conforming to the DECE Common File Format (CFF) [1] Section 2;
- Content is unencrypted;
- Audio, video, and subtitles conforming to either the SD or HD Media Profile definitions (CFF [1] Annex B or Annex C). Video is encoded as AVC; audio is encoded as MPEG-4 AAC (2-channel);
- Subtitle encoding conforming to the CFF Text Subtitle Profile defined in CFF [1] Section 6, with other limitations as specified below;
- Captions are transported as one or more separate subtitle tracks labeled as Type "SDH" (subtitles for deaf and hard-of-hearing) in the required metadata section of the MP4 file.

5.1. Audio/Video Format

As mentioned, two RMP profiles are defined here. Both use AVC video coded to the constraints of the main body of CFF [1]. The RMP SD Profile is constrained per Annex B of DECE CFF [1], SD Media Profile Definition, while the RMP HD Profile is constrained per Annex C of DECE CFF [1], HD Media Profile Definition. Other than that distinction, the two profiles are identical. The required audio in both cases is 2-channel AAC with a maximum bit rate of 192 kbps and a sample rate of 48 kHz.

5.2. Caption Format

The caption format conforms to the CFF Text Subtitle Profile specified in CFF [1] Section 6, which specifies the coordinate system, track structure, and data structures to implement SMPTE-TT [9]. The caption content in the file is required to be formatted to be decodable on a player built according to the CFF-TT Text subtitle hypothetical render model described in CFF [1] Section 6.5.4.

5.3. Signaling of SDH Subtitles (Caption Tracks)

The CFF [1] specification describes subtitles and indicates that subtitle tracks can carry normal captions (see footnote 2). The signaling method whereby a subtitle track is identified to be a closed caption track is not defined in the CFF document but can be found by following a trail of normative references as follows.

DECE Content Metadata Specification (CMS) [2] requires each track in the Container to be described by an XML type called md:digitalassetmetadata-type that is specified in the Movielabs Common Metadata [8] document in Section 5.2.1. The Movielabs specification [8] defines the md namespace. The description of each track is given by instances of the Track element. Each track will be Audio, Video, or Subtitle. Metadata describing Subtitle tracks are carried in a data type called DigitalAssetSubtitleData-type that is specified in the Movielabs Common Metadata [8] document in Section 5.2.7. Characteristics of each subtitle track are specified via the following elements:

- Format: specifies whether the subtitle track is textual, image-based, or a combination; in the present application the value will be "Text";
- Type: specifies whether the subtitle track is a regular subtitle track (specified as "normal"), or is "SDH" (subtitles for deaf and hard-of-hearing, otherwise known as closed captions);
- FormatType: CMS [2] specifies that this field is set to "SMPTE 2052-1 Timed Text"; and
- Language: specifies the language of the subtitle track as an instance of xs:language.

Footnote 2: CFF Sec. 1.7.6.

6. Technical Requirements Reference Media Player

This section establishes the technical requirements of the RMP. The Reference Media Player (either SD or HD Profile) is capable of the following functions when playing back a file conforming to the RFF specification:

- Decoding and presenting synchronized audio and video;
- Rendering SDH subtitles over video, time synchronized to decoded video;
- Offering user controls over the closed caption functions as described in the FCC rules (see Section 6.2 below);
- Retaining the default caption configuration until changed by the user.

6.1. SMPTE-TT Feature Subset

The RMP implements the DECE Text Subtitle Profile (CFF [1] Section 6), which is based on the SMPTE 2052-1 Timed Text [9] specification. To support the DECE Text Subtitle Profile, the RMP:

- Supports at least four simultaneous active caption windows;
- Need not support overlapping active caption windows (see footnote 3).

In addition, in the RMP:

- Support is optional for the #data and #information extensions;
- The RMP need only support rendering of SDH subtitles derived from CEA-608/708 content playable on an FCC minimum decoder as defined in 47 CFR 79.102 [6].

Footnote 3: Note that caption windows are referred to as regions in DECE CFF [1].
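
As a rough illustration of the four-window requirement above, the following sketch computes how many distinct caption windows (regions) are active at the same instant in a list of timed cues. The cue tuple layout, the function name max_active_windows, and the sample values are assumptions made for this example; extracting begin/end times and region identifiers from an actual SMPTE-TT document is not shown.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical cue representation: (begin_seconds, end_seconds, region_id).
Cue = Tuple[float, float, str]

# From the list above: the RMP supports at least four simultaneous windows.
MIN_SUPPORTED_WINDOWS = 4


def max_active_windows(cues: Iterable[Cue]) -> int:
    """Return the peak number of distinct regions active at one instant.

    Each cue is treated as active over the half-open interval [begin, end).
    """
    events: List[Tuple[float, int, str]] = []
    for begin, end, region in cues:
        if end > begin:  # ignore empty or inverted intervals
            events.append((begin, 1, region))  # 1 marks a begin event
            events.append((end, 0, region))    # 0 marks an end event
    # At equal times, end events (0) sort before begin events (1), so a
    # window that closes exactly when another opens is not double counted.
    events.sort(key=lambda event: (event[0], event[1]))

    active: Dict[str, int] = {}  # region id -> number of cues currently active
    peak = 0
    for _, kind, region in events:
        if kind == 1:
            active[region] = active.get(region, 0) + 1
        else:
            active[region] -= 1
            if active[region] == 0:
                del active[region]
        peak = max(peak, len(active))
    return peak


if __name__ == "__main__":
    sample_cues = [
        (0.0, 4.0, "r1"),
        (1.0, 3.0, "r2"),
        (2.0, 5.0, "r3"),
        (4.0, 6.0, "r4"),
    ]
    peak = max_active_windows(sample_cues)
    print("peak windows:", peak, "within minimum:", peak <= MIN_SUPPORTED_WINDOWS)
```

Content whose peak stays at or below four should fit within the minimum window count listed above; content exceeding it may still play on a more capable RMP, since four is a lower bound on what the player supports.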

Optionally, the RMP may support additional capabilities such as:

- Certain Unicode character codes required by CFF [1], as these are outside the set required to render captions originating from CEA-708 content. SMPTE RP 2052-11 [11] lists the set of Unicode character codes corresponding to defined CEA-708 character codes and can be used for guidance, and W3C provides a character code mapping in a document called Simple Delivery Profile for Closed Captions (US) [14], available at http://dvcs.w3.org/hg/ttml/raw-file/tip/ttml10-sdp-us/overview.html;
- More than four rows of caption text per screen.

6.2. Captioning Functionality

The RMP implements certain captioning functionality in accordance with 47 CFR 79.103(c) [7], which is copied here for reference:

(c) Specific technical capabilities. All apparatus subject to this section shall implement the following captioning functionality:

(1) Presentation. All apparatus shall implement captioning such that the caption text may be displayed within one or separate caption windows and supporting the following modes: text that appears all at once (pop-on), text that scrolls up as new text appears (roll-up), and text where each new letter or word is displayed as it arrives (paint-on).

(2) Character color. All apparatus shall implement captioning such that characters may be displayed in the 64 colors defined in CEA-708 and such that users are provided with the ability to override the authored color for characters and select from a palette of at least 8 colors including: white, black, red, green, blue, yellow, magenta, and cyan.

(3) Character opacity. All apparatus shall implement captioning such that users are provided with the ability to vary the opacity of captioned text and select between opaque and semi-transparent opacities.

(4) Character size. All apparatus shall implement captioning such that users are provided with the ability to vary the size of captioned text and shall provide a range of such sizes from 50% of the default character size to 200% of the default character size.

(5) Fonts. All apparatus shall implement captioning such that fonts are available to implement the eight fonts required by CEA-708 and 79.102(k). Users must be provided with the ability to assign the fonts included on their apparatus as the default font for each of the eight styles contained in 79.102(k).

(6) Caption background color and opacity. All apparatus shall implement captioning such that the caption background may be displayed in the 64 colors defined in CEA-708 and such that users are provided with the ability to override the authored color for the caption background and select from a palette of at least 8 colors including: white, black, red, green, blue, yellow, magenta, and cyan. All apparatus shall implement captioning such that users are provided with the ability to vary the opacity of the caption background and select between opaque, semi-transparent, and transparent background opacities.

(7) Character edge attributes. All apparatus shall implement captioning such that character edge attributes may be displayed and users are provided the ability to select character edge attributes including: no edge attribute, raised edges, depressed edges, uniform edges, and drop shadowed edges.

(8) Caption window color. All apparatus shall implement captioning such that the caption window color may be displayed in the 64 colors defined in CEA-708 and such that users are provided with the ability to override the authored color for the caption window and select from a palette of at least 8 colors including: white, black, red, green, blue, yellow, magenta, and cyan. All apparatus shall implement captioning such that users are provided with the ability to vary the opacity of the caption window and select between opaque, semi-transparent, and transparent background opacities.

(9) Language. All apparatus must implement the ability to select between caption tracks in additional languages when such tracks are present and provide the ability for the user to select simplified or reduced captions when such captions are available and identify such a caption track as easy reader.

(10) Preview and setting retention. All apparatus must provide the ability for the user to preview default and user selection of the caption features required by this section, and must retain such settings as the default caption configuration until changed by the user.

6.3. Caption Synchronization to Video

The RMP supports presentation of captions synchronized with video to a precision of the source video frame.

6.4. Progressive Downloads

The RFF may be delivered in a way that allows the RMP to begin decoding and rendering prior to the completion of the download. CFF [1] defines such progressive download mechanisms.
If the RMP supports progressive download capability, the captioning functionality described above is applicable.
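
The frame-level synchronization stated in Section 6.3 can be illustrated by mapping a caption's begin and end times onto video frame indices. The sketch below is only one possible policy and is an assumption of this example: frame n is taken to be presented at time n / frame_rate, and a caption is shown on every frame whose presentation time falls in the half-open interval [begin, end). The function name caption_frame_span is hypothetical, and exact rational arithmetic is used so that rates such as 30000/1001 do not accumulate rounding error.

```python
import math
from fractions import Fraction


def caption_frame_span(begin: Fraction, end: Fraction, frame_rate: Fraction) -> range:
    """Return the indices of the frames on which a caption is displayed.

    Assumes frame n is presented at n / frame_rate seconds and the caption
    is active over [begin, end) seconds (an assumed policy, see lead-in).
    """
    if frame_rate <= 0:
        raise ValueError("frame_rate must be positive")
    # First frame whose presentation time is at or after `begin`.
    first = math.ceil(begin * frame_rate)
    # First frame whose presentation time is at or after `end` (exclusive).
    stop = math.ceil(end * frame_rate)
    return range(first, max(first, stop))


if __name__ == "__main__":
    ntsc_rate = Fraction(30000, 1001)  # approximately 29.97 frames per second
    span = caption_frame_span(Fraction(1), Fraction(2), ntsc_rate)
    print("first frame:", span.start, "frames shown:", len(span))
```

A different rounding rule (for example, snapping to the nearest frame) would also be frame-accurate; the choice matters mainly at caption boundaries, where it decides which of two adjacent frames first shows the text.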
CEA Document Improvement Proposal If in the review or use of this document a potential change is made evident for safety, health or technical reasons, please email your reason/rationale for the recommended change to standards@ce.org. Consumer Electronics Association Technology & Standards Department 1919 S Eads Street, Arlington, VA 22202 FAX: (703) 907-7693 standards@ce.org