This site has been archived

Caption File Formats

There’s been some discussion on the Matterhorn list recently about caption file formats, and I thought it might be useful to describe what we’re doing with file formats for SpokenMedia.

SpokenMedia uses two file formats: our original .wrd files, which are output by the recognition process, and Timed Text Markup Language (TTML). We also need to handle two other caption file formats: .srt and .sbv.

There is a nice discussion of the YouTube format at “SBV file format for Youtube Subtitles and Captions,” along with a link to a web-based tool that converts .srt files to .sbv files.

We’ll cover our implementation of TTML in a separate post.

.wrd: A “time-aligned word transcription” file, which is the output format of SpokenMedia’s speech recognizer. Each line gives a start time and an end time in milliseconds, followed by the corresponding recognized word. (More Info)

Format:

startTime endTime word

Example:

666 812 i'm
812 1052 walter
1052 1782 lewin
1782 1912 i
1912 2017 will
2017 2192 be
2192 2337 your
2337 2817 lecturer
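As a minimal sketch of reading this format, the Python below parses .wrd lines into (start, end, word) records. It assumes whitespace-separated fields and one word per line, as in the example above; `parse_wrd` and `Word` are hypothetical names, not part of SpokenMedia’s code.

```python
from typing import List, NamedTuple


class Word(NamedTuple):
    """One .wrd entry; times are in milliseconds."""
    start_ms: int
    end_ms: int
    word: str


def parse_wrd(text: str) -> List[Word]:
    """Parse 'startTime endTime word' lines, skipping blank lines."""
    words: List[Word] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        # Split into at most three fields: start, end, and the word itself.
        parts = line.split(maxsplit=2)
        if len(parts) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {line!r}")
        start, end, word = parts
        words.append(Word(int(start), int(end), word))
    return words


sample = """666 812 i'm
812 1052 walter
1052 1782 lewin"""

for w in parse_wrd(sample):
    print(w.start_ms, w.end_ms, w.word)
```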

.srt: SubRip’s caption file format. Each caption consists of a caption number, a start time and an end time in hh:mm:ss,mmm (hours, minutes, seconds, milliseconds) separated by “-->”, and the caption text. (Note the use of a comma to separate seconds from milliseconds.) Captions are separated by a single blank line. (More Info)

Format:

Caption Number
hh:mm:ss,mmm --> hh:mm:ss,mmm
Text of subtitle (one or more lines, including punctuation and optionally sound effects)
Blank Line

Example:

1
00:00:00,766 --> 00:00:02,033
I'm Walter Lewin.

2
00:00:02,033 --> 00:00:04,766
I will be your lecturer
this term.
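To illustrate how recognizer output relates to .srt, here is a hypothetical sketch that groups .wrd-style word timings into numbered SRT cues. The fixed `max_words` grouping is an assumption made only for illustration (a real segmenter would more likely break on pauses or sentence boundaries), and the output keeps the recognizer’s lowercase, unpunctuated words rather than the edited text shown above.

```python
from typing import List, Sequence, Tuple


def _srt_time(ms: int) -> str:
    """Format milliseconds as hh:mm:ss,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def words_to_srt(words: Sequence[Tuple[int, int, str]], max_words: int = 7) -> str:
    """Group (start_ms, end_ms, word) tuples into numbered SRT cues.

    Each cue holds at most max_words words, starts when its first word
    starts, and ends when its last word ends.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    blocks: List[str] = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start, end = chunk[0][0], chunk[-1][1]
        text = " ".join(word for _, _, word in chunk)
        number = len(blocks) + 1
        blocks.append(f"{number}\n{_srt_time(start)} --> {_srt_time(end)}\n{text}")
    if not blocks:
        return ""
    # Cues are separated by one blank line; the file ends with a newline.
    return "\n\n".join(blocks) + "\n"


words = [(666, 812, "i'm"), (812, 1052, "walter"), (1052, 1782, "lewin"),
         (1782, 1912, "i"), (1912, 2017, "will"), (2017, 2192, "be"),
         (2192, 2337, "your"), (2337, 2817, "lecturer")]
print(words_to_srt(words, max_words=3))
```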

.sbv: Google/YouTube’s caption file format. This format is similar to .srt but differs in syntax: it uses a period, not a comma, to separate seconds from milliseconds, and a comma, not “-->”, to separate the start and end times. Additionally, both formats support identification of the speaker and other cues like laughter and applause, though of course each does so in slightly different ways.
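The timestamp differences can be shown with a small sketch that formats the same time in milliseconds both ways. `format_timestamp` and `format_timing` are hypothetical helper names; the zero-padded SRT hours and unpadded SBV hours follow the format descriptions in this post.

```python
def format_timestamp(ms: int, style: str) -> str:
    """Format a time in milliseconds as an SRT or SBV timestamp."""
    if ms < 0:
        raise ValueError("time must be non-negative")
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    if style == "srt":
        # SubRip: zero-padded hours and a comma before milliseconds.
        return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"
    if style == "sbv":
        # YouTube SBV: unpadded hours and a period before milliseconds.
        return f"{hours}:{minutes:02d}:{seconds:02d}.{millis:03d}"
    raise ValueError(f"unknown style: {style!r}")


def format_timing(start_ms: int, end_ms: int, style: str) -> str:
    """Build the timing line: 'start --> end' for SRT, 'start,end' for SBV."""
    sep = " --> " if style == "srt" else ","
    return format_timestamp(start_ms, style) + sep + format_timestamp(end_ms, style)


print(format_timing(766, 2033, "srt"))
print(format_timing(766, 2033, "sbv"))
```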

According to Google (More Info):

We currently support a simple caption format that is compatible with the formats known as SubViewer (*.SUB) and SubRip (*.SRT). Although you can upload your captions/subtitles in any format, only supported formats will be displayed properly on the playback page.

Here’s what a (*.SBV) caption file might look like:

0:00:03.490,0:00:07.430
>> FISHER: All right. So, let's begin.
This session is: Going Social

0:00:07.430,0:00:11.600
with the YouTube APIs. I am
Jeff Fisher,

0:00:11.600,0:00:14.009
and this is Johann Hartmann,
we're presenting today.

0:00:14.009,0:00:15.889
[pause]

Here are also some common captioning practices that help readability:

  • Descriptions inside square brackets like [music] or [laughter] can help people with hearing disabilities to understand what is happening in your video.
  • You can also add tags like >> at the beginning of a new line to identify speakers or change of speaker.
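These two conventions are simple enough to detect mechanically. The sketch below is a hypothetical helper that flags a leading “>>”, pulls out an optional speaker name, and collects bracketed descriptions. It assumes speaker names are upper case and followed by a colon, as in the “>> FISHER:” example; YouTube does not require that.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# A leading ">>" marks a new speaker; an optional upper-case name
# followed by a colon (as in ">> FISHER:") names that speaker.
_SPEAKER = re.compile(r"^>>\s*(?:([A-Z][A-Z .'-]*?):\s*)?(.*)$")
# Square brackets hold non-speech descriptions such as [music] or [pause].
_DESCRIPTION = re.compile(r"\[([^\]]+)\]")


@dataclass
class CaptionLine:
    speaker_change: bool
    speaker: Optional[str]
    descriptions: List[str] = field(default_factory=list)
    text: str = ""


def annotate(line: str) -> CaptionLine:
    """Split one caption line into speaker tag, bracketed descriptions and spoken text."""
    m = _SPEAKER.match(line)
    body = m.group(2) if m else line
    descriptions = _DESCRIPTION.findall(body)
    # Remove the bracketed descriptions and collapse leftover whitespace.
    text = " ".join(_DESCRIPTION.sub(" ", body).split())
    return CaptionLine(
        speaker_change=m is not None,
        speaker=m.group(1) if m else None,
        descriptions=descriptions,
        text=text,
    )


for line in [">> FISHER: All right. So, let's begin.", "[pause]"]:
    print(annotate(line))
```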

Based on YouTube’s examples, the format can be described as follows. Unlike .srt, captions are not numbered.

Format:

H:MM:SS.mmm,H:MM:SS.mmm
Text of subtitle (one or more lines, including punctuation and optionally sound effects)
Blank Line

Example:

0:00:00.766,0:00:02.033
I'm Walter Lewin.

0:00:02.033,0:00:04.766
I will be your lecturer
this term.
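As a rough local counterpart to the web-based .srt to .sbv converter mentioned earlier, the hypothetical `srt_to_sbv` below drops the caption numbers (which YouTube’s example does not use), rewrites each timing line (the comma before milliseconds becomes a period, “-->” becomes a comma, and hours lose their zero padding), and copies the text lines unchanged. It is a sketch assuming well-formed SRT input; anything after the end time, such as position data, is ignored.

```python
import re
from typing import List

# SRT timing line: hh:mm:ss,mmm --> hh:mm:ss,mmm. One-digit hours are also
# accepted; anything after the end time is ignored.
_SRT_TIMING = re.compile(
    r"^(\d{1,2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{1,2}):(\d{2}):(\d{2}),(\d{3})"
)


def _sbv_time(h: str, m: str, s: str, ms: str) -> str:
    # SBV uses unpadded hours and a period before the milliseconds.
    return f"{int(h)}:{m}:{s}.{ms}"


def srt_to_sbv(srt_text: str) -> str:
    """Convert SRT captions to SBV, dropping caption numbers."""
    cues: List[str] = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        rows = block.splitlines()
        # The first line of an SRT cue is normally its caption number.
        if rows and rows[0].strip().isdigit():
            rows = rows[1:]
        if not rows:
            continue
        m = _SRT_TIMING.match(rows[0].strip())
        if m is None:
            raise ValueError(f"bad SRT timing line: {rows[0]!r}")
        g = m.groups()
        timing = f"{_sbv_time(*g[:4])},{_sbv_time(*g[4:])}"
        # Caption text lines are copied unchanged.
        cues.append("\n".join([timing] + rows[1:]))
    if not cues:
        return ""
    return "\n\n".join(cues) + "\n"


srt = """1
00:00:00,766 --> 00:00:02,033
I'm Walter Lewin.

2
00:00:02,033 --> 00:00:04,766
I will be your lecturer
this term.
"""
print(srt_to_sbv(srt))
```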

Unless otherwise specified, the Spoken Media Website by the MIT Office of Digital Learning, Strategic Education Initiatives is licensed under a Creative Commons Attribution 4.0 International License.