This site has been archived

Converting .sbv to .trans/continuous text

As a step in comparing the output from YouTube’s Autocaptioning, we needed to transform their .sbv file into something we can use in our comparison tests (a .trans file). That meant stripping the hours out of the timecode, dropping the end time, and bringing everything onto a single line.

Update: It turns out we needed a continuous text file, so these have been updated accordingly.
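
The transformation described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual script. It assumes the usual .sbv layout (a timing line such as `0:00:01.000,0:00:03.500`, followed by one or more caption lines and a blank line), and it assumes a .trans line is simply the start time followed by the caption text, which the post does not spell out. The function names are made up for this example.

```python
import re

# Matches an .sbv timing line such as "0:00:01.000,0:00:03.500".
TIMING = re.compile(r"^(\d+):(\d{2}):(\d{2}\.\d{3}),\d+:\d{2}:\d{2}\.\d{3}$")


def parse_sbv(text):
    """Return (start, caption) pairs, with start formatted as MM:SS.mmm."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        match = TIMING.match(lines[0])
        if not match:
            continue  # skip anything that is not a caption block
        _hours, minutes, seconds = match.groups()
        start = f"{minutes}:{seconds}"  # hours stripped, end time dropped
        caption = " ".join(lines[1:])   # multi-line captions joined into one
        cues.append((start, caption))
    return cues


def to_trans_lines(text):
    """One line per caption: start time, a space, then the caption (assumed layout)."""
    return [f"{start} {caption}" for start, caption in parse_sbv(text)]


def to_continuous_text(text):
    """All caption text joined into a single line, with timecodes removed."""
    return " ".join(caption for _start, caption in parse_sbv(text) if caption)
```

Note that simply dropping the hours field, as the post describes, only makes sense for recordings shorter than an hour; longer files would need the hours folded into the minutes instead.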

Caption File Formats

There’s been some discussion on the Matterhorn list recently about caption file formats, and I thought it might be useful to describe what we’re doing with file formats for SpokenMedia.

SpokenMedia uses two file formats: our original .wrd files output from the recognition process, and Timed Text Markup Language (TTML). We also need to handle two other caption file formats, .srt and .sbv.

There is a nice discussion of the YouTube format at “SBV file format for Youtube Subtitles and Captions”, along with a link to a web-based tool to convert .srt files to .sbv files.
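
For readers who would rather script the .srt to .sbv conversion than use a web tool, here is a minimal sketch. It is not the tool linked above. It assumes the common layouts of both formats: .srt blocks with an index line, a `00:00:01,000 --> 00:00:03,500` timing line and caption text, and .sbv blocks with a `0:00:01.000,0:00:03.500` timing line and caption text. The function name `srt_to_sbv` is made up for this example.

```python
import re

# Matches an .srt timing line such as "00:00:01,000 --> 00:00:03,500".
SRT_TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)


def srt_to_sbv(srt_text):
    """Convert .srt caption text to .sbv caption text."""
    blocks = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        # Locate the timing line; the numeric index line before it is dropped.
        index = next(
            (i for i, line in enumerate(lines) if SRT_TIMING.search(line)), None
        )
        if index is None:
            continue  # not a caption block
        h1, m1, s1, ms1, h2, m2, s2, ms2 = SRT_TIMING.search(lines[index]).groups()
        # .sbv uses "." before the milliseconds, "," between start and end,
        # and no zero-padding on the hours field.
        timing = f"{int(h1)}:{m1}:{s1}.{ms1},{int(h2)}:{m2}:{s2}.{ms2}"
        blocks.append("\n".join([timing] + lines[index + 1:]))
    return "\n\n".join(blocks) + "\n"
```

Any styling tags or positioning hints inside the .srt caption text are passed through unchanged in this sketch.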

We’ll cover our implementation of TTML in a separate post.

YouTube Auto-Captions

YouTube announced in early March that they would be extending their pilot program to enable auto-captioning for all channels.

The highlights: YouTube has announced that they’re doing this to improve accessibility, and…

  • Captions will initially only be available for English videos.
  • Auto-captions require clearly spoken audio.
  • Auto-captions aren’t perfect; the owner will need to check that they’re accurate.
  • Auto-captions will be available for all channels.

We think this is great. If YouTube can automatically caption files at scale and with high accuracy, that’s a great step forward for all videos, and definitely for the lecture videos that we’ve been interested in for the SpokenMedia project.

Though, as with SpokenMedia’s approach, which builds on Jim Glass’ Spoken Language Systems research, they still have a ways to go on accuracy.

At this early date though, we can still see some significant advantages to our approach:

  • You don’t have to host your videos through YouTube to use the service SpokenMedia is developing. (YouTube locks the videos you upload into their player and service.)
  • SpokenMedia will provide a time-aligned transcript file that you can download and use in other applications. (YouTube allows the channel publishers to download a transcript, edit it, and then reupload it for time code alignment. However, they don’t allow the public at large to download the transcript.)
  • SpokenMedia will provide an editor to improve the accuracy of the transcripts.
  • SpokenMedia will enable you to use the transcripts in other applications like search, and will let you start playing a segment within a video. (Though I’m pretty sure YouTube will be using transcripts to help users find videos, and I personally think that’s the real driver behind auto-captions: search and keyword advertising. And if you know how to do it, you can construct a URL to link to a particular timepoint in a YouTube-hosted video.)
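
As an illustration of the last point, a link to a timepoint in a YouTube-hosted video can be built by adding a start offset to the watch URL. The sketch below uses the `t` query parameter with an offset in seconds, which is one form YouTube watch URLs accept; the post does not say which form was meant, and both the helper name and the video ID here are placeholders.

```python
from urllib.parse import urlencode


def youtube_link_at(video_id, seconds):
    """Build a watch URL that starts playback at the given offset in seconds."""
    # "t" carries the start offset; "VIDEO_ID" in the usage below is a placeholder.
    query = urlencode({"v": video_id, "t": f"{int(seconds)}s"})
    return f"https://www.youtube.com/watch?{query}"


print(youtube_link_at("VIDEO_ID", 90))
```

Pairing a helper like this with the start times in a time-aligned transcript is one way to jump from a search hit straight to the matching segment.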

In any event, if you’ve watched the recent slidecasts of the last couple of SpokenMedia presentations, you’ll see that we’ve included the impact of Auto-Captions on SpokenMedia.

Unless otherwise specified, the Spoken Media Website by the MIT Office of Digital Learning, Strategic Education Initiatives is licensed under a Creative Commons Attribution 4.0 International License.