This site has been archived

Archives for April 2010

Universal Subtitles

Here’s the problem: web video is beginning to rival television, but there isn’t a good open resource for subtitling. Here’s our mission: we’re trying to make captioning, subtitling, and translating video publicly accessible in a way that’s free and open, just like the Web.

The SpokenMedia project was born out of the research into automatic lecture transcription from the Spoken Language Systems group at MIT. Our approach has been twofold. First, we have been working with researchers to improve the automatic creation of transcripts, to enable search and perhaps accessible captions, and doing what we can from a process standpoint to improve accuracy. Second, we have been working on tools to address accuracy from a human editing perspective. In this approach we would provide these tools to lecture video publishers, but we have also considered setting up a process to enable crowdsourced editing.

Recently we learned of a new project, Universal Subtitles (now Amara) and their Mozilla design challenge for Collaborative Subtitles. Both (?) projects/approaches are interesting and we’ll be keeping our eye on their progress. (Similarly with UToronto’s OpenCaps project that’s part of the Opencast Matterhorn suite).

Here’s a screenshot from the Universal Subtitles project.

Universal Subtitle Project

Here’s a screenshot of the caption widget from the Collaborative Subtitling project.

Collaborative Subtitling Mockup
Source: Brandon Muramatsu/Collaborative Subtitling Challenge Mockup

HTML5 Video

In a recent email from the Opencast community, I received a link to a post titled, “HTML5 video Libraries, Toolkits and Players” that gathers some of the currently available info on HTML5 Video. HTML5 Video is something that the SpokenMedia project will begin investigating “soon”.

To help you understand and get the most from this new tag, we have listed below a selection of the best HTML5 video libraries, frameworks, toolkits and players.

Source: Speckboy Design Magazine. (2010, April 23). HTML5 video Libraries, Toolkits and Players. Retrieved on April 25, 2010 from Speckboy Design Magazine Website:

Editing Protocol

We have settled on an editing protocol for communication between our player/transcript editor and the service that stores transcripts and videos.  The document in PDF format is attached below:

SpokenMedia Editing Protocol (PDF)

The protocol conforms to the W3C’s proposed Timed Text Markup Language (TTML) 1.0 specification.[1]  We selected this specification because our primary data is time-aligned text and this specification is a standard used by our collaborators.
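
To give a feel for what time-aligned text looks like in TTML, here is a minimal Python sketch that serializes a few timed phrases. The element names (tt, body, div, p), the begin/end attributes and the namespace come from the TTML specification. The function names and sample phrases are made up for illustration. This is not the SpokenMedia editing protocol itself, which is described in the PDF above.

```python
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def format_time(seconds: float) -> str:
    """Format seconds as a TTML clock-time value (hh:mm:ss.mmm)."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def phrases_to_ttml(phrases, lang="en"):
    """Serialize (begin_seconds, end_seconds, text) tuples as a TTML string.

    Hypothetical helper for illustration only.
    """
    ET.register_namespace("", TTML_NS)  # emit TTML as the default namespace
    tt = ET.Element(f"{{{TTML_NS}}}tt")
    tt.set(f"{{{XML_NS}}}lang", lang)
    body = ET.SubElement(tt, f"{{{TTML_NS}}}body")
    div = ET.SubElement(body, f"{{{TTML_NS}}}div")
    for begin, end, text in phrases:
        p = ET.SubElement(
            div,
            f"{{{TTML_NS}}}p",
            begin=format_time(begin),
            end=format_time(end),
        )
        p.text = text
    return ET.tostring(tt, encoding="unicode")


if __name__ == "__main__":
    # Sample phrases are invented, not taken from a real transcript.
    sample = [
        (0.0, 2.5, "Welcome to the lecture."),
        (2.5, 6.0, "Today we talk about units."),
    ]
    print(phrases_to_ttml(sample))
```

Each p element carries one phrase with its begin and end times, which is the piece of information a player or editor needs to keep text and video in sync.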


YouTube Auto-Captions

YouTube announced in early March that they would be extending their pilot program to enable auto-captioning for all channels.

The highlights…YouTube has announced that they’re doing this to improve accessibility, and…

  • Captions will initially only be available for English videos.
  • Auto-captions require clearly spoken audio.
  • Auto-captions aren’t perfect; the owner will need to check that they’re accurate.
  • Auto-captions will be available for all channels.

We think this is great. If YouTube can automatically caption files at scale and with high accuracy, that’s a great step forward for all videos, and definitely for the lecture videos that we’ve been interested in for the SpokenMedia project.

Though, as with SpokenMedia’s approach that builds on Jim Glass’ Spoken Language Systems research, they still have a ways to go on accuracy.

At this early date though, we can still see some significant advantages to our approach:

  • You don’t have to host your videos through YouTube to use the service SpokenMedia is developing. (YouTube locks the videos you upload into their player and service.)
  • SpokenMedia will provide a time-aligned transcript file that you can download and use in other applications. (YouTube allows the channel publishers to download a transcript, edit it, and then reupload it for time code alignment. However, they don’t allow the public at large to download the transcript.)
  • SpokenMedia will provide an editor to improve the accuracy of the transcripts.
  • SpokenMedia will enable you to use the transcripts in other applications like search, and will let you start playing a segment within a video. (Though I’m pretty sure YouTube will be using transcripts to help users find videos, and I personally think that’s the real driver behind auto-captions: search and keyword advertising. And if you know how to do it, you can construct a URL to link to a particular timepoint in a YouTube-hosted video.)
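
As a small illustration of the last point, here is a Python sketch that builds a link to a timepoint in a YouTube-hosted video. It assumes the watch-URL form with a `t` query parameter given in seconds, which YouTube accepts at present but could change. The function name and the video ID are placeholders.

```python
from urllib.parse import urlencode


def youtube_link_at(video_id: str, start_seconds: int) -> str:
    """Build a YouTube watch URL that starts playback at start_seconds.

    Assumes the `t=<seconds>s` query parameter; hypothetical helper.
    """
    query = urlencode({"v": video_id, "t": f"{int(start_seconds)}s"})
    return f"https://www.youtube.com/watch?{query}"


if __name__ == "__main__":
    # "VIDEO_ID" is a placeholder, not a real video.
    print(youtube_link_at("VIDEO_ID", 95))
```

Paired with a time-aligned transcript, the start time of any matching phrase could be passed in as `start_seconds`.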

In any event, if you’ve watched the recent slidecasts of the last couple of SpokenMedia presentations, you’ll see that we’ve included the impact of Auto-Captions on SpokenMedia.

Integrating the SpokenMedia Player with MIT OCW

We’re testing out the SpokenMedia Player interface on a mirror of MIT OCW through the OEIT Greenfield Project.

We’ve taken one video by Prof. Walter Lewin and modified the corresponding MIT video page with the player. It works pretty well, right out of the box–there’s a bit of conflict with the CSS namespace. (Right now the demo only includes the first two minutes of the transcript since I was doing the editing by hand.)

8.01 Classical Physics I: Powers of Ten – Units – Dimensions – Measurements – Uncertainties – Dimensional Analysis – Scaling Arguments

Updating the OCW Video Player with SpokenMedia
Credit: Brandon Muramatsu

We also have a demo online at the SpokenMedia site.

SpokenMedia at the Hewlett Grantees Meeting 2010

Brandon Muramatsu presented a poster on SpokenMedia at the 2010 Hewlett Grantees Meeting in April 2010 at Yale University in New Haven, CT.

SpokenMedia Poster at 2010 Hewlett Grantees Meeting
Source: Brandon Muramatsu

Cite as: Muramatsu, B., McKinney, A., Wilkins, P. (2010, April). SpokenMedia. Poster at the 2010 Hewlett Grantees Meeting, Yale University, New Haven, CT. April 9, 2010

OER10 presentation updated as a slidecast

Quick update: I added the audio for the OER10 presentation so it’s now a slidecast. Check it out.

SpokenMedia Transcript Editor

We’re working on a JavaScript-based transcript editor with our developer Ryan Lee at

The goals of the editor project are:

  • Low and high accuracy editors: We believe the best approach to transcript editing involves separating the editing into two distinct phases. In cases where the transcript is mostly accurate, we want to retain the time/word relationships. That is, for every word, we want to make sure we retain the timecode associated with that word. In cases where the transcript is mostly inaccurate, we believe it’s best to just edit the transcript as a single block of text, and then align the edited transcript to the audio after the editing is completed. Unfortunately, this realignment requires a time delay (best case is about 1:1.5) to reprocess the video.
  • Be simple and intuitive to use.
  • Be a clean design.
  • Support the user with a limited amount of extra mousing and/or clicking (this is the one compelling reason for us to have the “low” and “high” accuracy editors).
  • Integrate an audio/video player within the UI of the transcript editor (instead of running the video/audio as a separate application, or in a separate window, from the editor).
  • Implement an editing communication protocol between the server and the client browser.
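
To make the "retain the time/word relationships" goal concrete, here is a minimal Python sketch. It uses a word-level diff (Python's difflib) to carry timecodes over from the original words to an edited transcript. This is an illustrative stand-in, not how the SpokenMedia editor works, and the function name and sample data are hypothetical. Words that cannot be matched get no timecode, which is the situation where realigning the text to the audio would be needed.

```python
from difflib import SequenceMatcher


def carry_over_timecodes(timed_words, edited_words):
    """Keep timecodes for words that survive an edit.

    timed_words: list of (word, start_seconds) from the original transcript.
    edited_words: list of words after human editing.
    Returns a list of (word, start_seconds or None).
    """
    original = [word for word, _ in timed_words]
    matcher = SequenceMatcher(a=original, b=edited_words, autojunk=False)
    result = []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        same_length = (a2 - a1) == (b2 - b1)
        if tag == "equal" or (tag == "replace" and same_length):
            # Unchanged words, or one-for-one corrections: reuse the timecode.
            for offset in range(b2 - b1):
                start = timed_words[a1 + offset][1]
                result.append((edited_words[b1 + offset], start))
        elif tag in ("replace", "insert"):
            # New or restructured text: timecode unknown until realignment.
            for word in edited_words[b1:b2]:
                result.append((word, None))
        # "delete": the removed words simply drop out.
    return result


if __name__ == "__main__":
    # Invented sample: the recognizer heard "mast" instead of "mass".
    timed = [("the", 0.0), ("mast", 0.4), ("of", 0.9), ("the", 1.1), ("block", 1.3)]
    edited = ["the", "mass", "of", "the", "wooden", "block"]
    for word, start in carry_over_timecodes(timed, edited):
        print(word, start)
```

A one-word correction keeps its timecode, while the inserted word has none. That difference is roughly why a mostly accurate transcript can be edited in place, while a heavily rewritten one is better edited as a block and realigned afterwards.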

We’ve seen some initial designs from Ryan, and once we have this design phase completed, we’ll post the editors with transcripts and go into a testing phase.

Extending the Spiral Connect Player

Christophe Battier, Jean Baptiste Nallet and Alexandre Louys from ICAP at the Universite de Lyon 1 in France visited the SpokenMedia team in February 2010.

They are working on a new version of their virtual learning environment (VLE) — a learning management system (LMS) in American-speak — that has an integrated video player with a number of features of interest to SpokenMedia.

The player is Flash-based and provides the ability for users to create “bubbles” — or annotations/bookmarks — that overlay the video. These bubbles can be seen along a timeline, and can be used to provide feedback from teacher to student or highlight interesting aspects of the video.

Here’s a screenshot from the current version of the Spiral player:

Spiral Player with Bubbles
Source: Christophe Battier/Spiral

We discussed with them integrating aspects of the transcript display from the player we’ve been developing.

The user can watch the video and see the transcript with a “bouncing ball” highlighting the phrase being said. The user can switch between transcripts in multiple languages. And the user can search through the transcript and play back the video by double-clicking on a search result.
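
The two behaviors described above, highlighting the current phrase and jumping to a search hit, both come down to lookups on a time-aligned transcript. Here is a minimal Python sketch of that logic. The class, its method names and the sample phrases are hypothetical, and the actual player runs in the browser rather than in Python.

```python
from bisect import bisect_right


class Transcript:
    """A time-aligned transcript: phrases with their start times in seconds."""

    def __init__(self, phrases):
        # phrases: iterable of (start_seconds, text)
        self.phrases = sorted(phrases)
        self.starts = [start for start, _ in self.phrases]

    def active_index(self, playback_seconds):
        """Index of the phrase to highlight at this playback time.

        Returns None if playback is before the first phrase.
        """
        index = bisect_right(self.starts, playback_seconds) - 1
        return index if index >= 0 else None

    def search(self, query):
        """Return (start_seconds, text) for phrases containing the query.

        Matching is case-insensitive; the start time is where a player
        would seek to when the user picks a result.
        """
        needle = query.lower()
        return [(s, t) for s, t in self.phrases if needle in t.lower()]


if __name__ == "__main__":
    # Invented sample phrases, not a real lecture transcript.
    transcript = Transcript([
        (0.0, "Welcome to the lecture"),
        (2.5, "Today we talk about units"),
        (6.0, "and dimensional analysis"),
    ])
    print(transcript.active_index(3.0))
    print(transcript.search("units"))
```

Switching languages would then amount to swapping in another Transcript whose phrases share the same timeline.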

We talked about how the SpiralConnect team might extend their player to integrate transcripts and also create annotations that could be displayed below (or to the side of) the video and not just overlay the video image.

Here’s a mockup of what we discussed.

SpiralConnect plus SpokenMedia Transcript Mockup
Source: Brandon Muramatsu

Creative Commons License Unless otherwise specified, the Spoken Media Website by the MIT Office of Digital Learning, Strategic Education Initiatives is licensed under a Creative Commons Attribution 4.0 International License.