News – SpokenMedia https://spokenmedia.mit.edu Transcripts for Lecture Video

How Google Translate Works https://spokenmedia.mit.edu/2010/08/how-google-translate-works/ Thu, 12 Aug 2010 23:11:08 +0000
Google posted a high-level overview of how Google Translate works.

Source: Google
An interesting hack from Yahoo! Openhack India https://spokenmedia.mit.edu/2010/07/an-interesting-hack-from-yahoo-openhack-india/ Wed, 28 Jul 2010 22:53:17 +0000
Sound familiar?

Automatic, real-time closed captioning/translation for Flickr videos.

How?
We captured the audio stream coming out of the speaker and fed it as input to the mic. We used the Microsoft Speech API and Julius to convert the speech to text, and a GreaseMonkey script to sync with the transcription server (our local box) and the video, displaying the transcribed text on the video. Before displaying the text, we translate it based on the user’s choice. (We used Google’s Translate API for this.)

Srithar, B. (2010). Yahoo! Openhack India 2010- FlicksubZ. Retrieved on July 28, 2010 from Srithar’s Blog Website: http://babusri.blogspot.com/2010/07/yahoo-openhack-india-2010-flicksubz.html

Check out the whole post.
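
For readers curious about the plumbing, here is a minimal sketch of that pipeline in Python. It is a sketch only: the recognize() and translate() functions are stand-ins for the Microsoft Speech API, Julius, and Google’s Translate API calls the hack actually used, and the GreaseMonkey overlay is reduced to a comment.

    # Hypothetical sketch of the FlicksubZ pipeline: capture audio, convert it to
    # text, optionally translate it, and emit timed captions. The recognize() and
    # translate() functions are placeholders, not the real Microsoft Speech API,
    # Julius, or Google Translate calls used in the hack.
    import time

    def recognize(audio_chunk: bytes) -> str:
        """Placeholder for the speech recognizer (Julius / MS Speech API in the hack)."""
        return "recognized text for this chunk"

    def translate(text: str, target_lang: str) -> str:
        """Placeholder for the translation step (Google's Translate API in the hack)."""
        return f"[{target_lang}] {text}"

    def caption_stream(audio_chunks, target_lang="en"):
        """Yield (seconds_since_start, caption) pairs as chunks of audio arrive."""
        start = time.time()
        for chunk in audio_chunks:
            text = recognize(chunk)
            if target_lang != "en":
                text = translate(text, target_lang)
            yield time.time() - start, text

    if __name__ == "__main__":
        # A GreaseMonkey-style script would poll a server emitting these pairs and
        # overlay each caption on the Flickr video player.
        for ts, caption in caption_stream([b"chunk1", b"chunk2"], target_lang="hi"):
            print(f"{ts:6.2f}s  {caption}")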

Making Progress https://spokenmedia.mit.edu/2010/06/making-progress/ Thu, 17 Jun 2010 23:48:25 +0000
Over the last month or two we’ve made good progress getting additional parts of the SpokenMedia workflow into a working state.

Here’s a workflow diagram showing what we can do with SpokenMedia today.

SpokenMedia Workflow, June 2010 (Source: Brandon Muramatsu)

(The bright yellow indicates features working in the last two months, the gray indicates features we’ve had working since December 2009, and the light yellow indicates features on which we’ve just started working.)

To recap, since December 2009, we’ve been able to (see the sketch after this list):

  • Rip audio from video files and prepare it for the speech recognizer.
  • Process the audio through the speech recognizer locally within the SpokenMedia project using domain and acoustic models.
  • Present output transcript files (.WRD) through the SpokenMedia player.
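
Here is a rough Python sketch of the first and last of those steps: ripping audio into the mono, 16 kHz WAV a recognizer typically expects (using ffmpeg) and parsing a time-aligned word file for the player. The recognizer itself isn’t shown, and the “start end word” layout is an assumption rather than the exact .WRD format our tools emit.

    # Sketch of two ends of the workflow, under two assumptions: (1) ffmpeg is
    # available and mono 16 kHz WAV is an acceptable recognizer input; (2) the
    # transcript file is whitespace-delimited "start end word" lines. The exact
    # .WRD format and the recognizer invocation used by SpokenMedia may differ.
    import subprocess
    from pathlib import Path

    def rip_audio(video_path: str, wav_path: str) -> None:
        """Extract mono, 16 kHz audio from a video file for the recognizer."""
        subprocess.run(
            ["ffmpeg", "-y", "-i", video_path, "-ac", "1", "-ar", "16000", wav_path],
            check=True,
        )

    def parse_word_file(wrd_path: str):
        """Parse assumed 'start end word' lines into (start, end, word) tuples."""
        words = []
        for line in Path(wrd_path).read_text().splitlines():
            parts = line.split()
            if len(parts) == 3:
                start, end, word = parts
                words.append((float(start), float(end), word))
        return words

    if __name__ == "__main__":
        rip_audio("lecture.mp4", "lecture.wav")   # hypothetical filenames
        # ...run the recognizer on lecture.wav to produce lecture.wrd...
        print(parse_word_file("lecture.wrd")[:5])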

Recently, we’ve added the ability to:

  • Create domain models (or augment existing domain models) from files (see the sketch after this list).
  • Create unsupervised acoustic models from input audio files. (Typically about 10 hours of audio from the same speaker are required to create a “good” acoustic model–at least for Americans speaking English. We’re still not sure how well this capability will allow us to handle Indian-English speakers.)
  • Use a selected domain or acoustic model from a pre-existing set, in addition to creating a new one.
  • Process audio through an “upgraded” speech recognizer, using the custom domain and acoustic models. (This recognition is currently performed on Jim Glass’ research cluster.)
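
To make the first bullet concrete, here is a toy sketch of building a “domain model” from files: it simply harvests vocabulary and word counts from a folder of lecture notes. The real tooling builds a proper statistical language model, so the function names, file layout, and output here are illustrative assumptions.

    # Toy "domain model" builder: collect vocabulary and word counts from text
    # files (e.g., lecture notes or slides). The actual SpokenMedia/SLS tools
    # build a real language model; this only illustrates the idea.
    from collections import Counter
    from pathlib import Path
    import re

    def build_domain_vocab(text_dir: str, top_n: int = 5000) -> Counter:
        """Count word frequencies across all .txt files in a directory."""
        counts = Counter()
        for path in Path(text_dir).glob("*.txt"):
            words = re.findall(r"[a-z']+", path.read_text(encoding="utf-8").lower())
            counts.update(words)
        return Counter(dict(counts.most_common(top_n)))

    def augment_vocab(base: Counter, extra_text: str) -> Counter:
        """Augment an existing vocabulary with counts from additional text."""
        updated = base.copy()
        updated.update(re.findall(r"[a-z']+", extra_text.lower()))
        return updated

    if __name__ == "__main__":
        vocab = build_domain_vocab("lecture_notes/")   # hypothetical directory
        vocab = augment_vocab(vocab, "cepstral coefficients and spectrograms")
        print(vocab.most_common(10))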

We still have a ways to go–in particular, we need to better understand the potential accuracy of our approach. The critical blocker right now is a means to compare a known-accurate transcript with the output of the speech recognizer (it’s a matter of transforming existing transcripts into time-aligned ones in the right format). And then there are the two challenges of automating the software and getting it running on OEIT servers (we’ve reverted to using Jim Glass’ research cluster to get some of the other pieces up and running).
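
The comparison we need is essentially a word error rate: align the known-accurate transcript against the recognizer output and count substitutions, insertions, and deletions. A standard edit-distance sketch (not code from our toolchain) looks like this:

    # Standard word-error-rate (WER) calculation via edit distance. This is the
    # kind of comparison we still need to wire up; it is not SpokenMedia code.
    def word_error_rate(reference: str, hypothesis: str) -> float:
        """Return (substitutions + insertions + deletions) / reference length."""
        ref, hyp = reference.split(), hypothesis.split()
        # d[i][j] = edit distance between ref[:i] and hyp[:j]
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution or match
        return d[len(ref)][len(hyp)] / max(len(ref), 1)

    if __name__ == "__main__":
        ref = "the spectrogram shows the formant structure of the vowel"
        hyp = "the spectrogram shows a formant structure of the bowel"
        print(f"WER: {word_error_rate(ref, hyp):.1%}")  # 2 substitutions / 9 words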

Universal Subtitles https://spokenmedia.mit.edu/2010/04/universal-subtitles/ Tue, 27 Apr 2010 14:30:26 +0000

From the Universal Subtitles project’s mission statement: “Here’s the problem: web video is beginning to rival television, but there isn’t a good open resource for subtitling. Here’s our mission: we’re trying to make captioning, subtitling, and translating video publicly accessible in a way that’s free and open, just like the Web.”

The SpokenMedia project was born out of research into automatic lecture transcription by the Spoken Language Systems group at MIT. Our approach has been twofold. We have been focusing on working with researchers to improve the automatic creation of transcripts (to enable search, and perhaps accessible captions) and doing what we can from a process standpoint to improve accuracy. We have also been working on tools to address accuracy from a human-editing perspective: we would provide these tools to lecture video publishers, and have also considered setting up a process to enable crowdsourced editing.

Recently we learned of a new project, Universal Subtitles (now Amara), and their Mozilla design challenge for Collaborative Subtitles. Both projects/approaches are interesting and we’ll be keeping an eye on their progress. (Similarly with UToronto’s OpenCaps project, part of the Opencast Matterhorn suite.)

Here’s a screenshot from the Universal Subtitles project.

Universal Subtitles Project

Here’s a screenshot of the caption widget from the Collaborative Subtitling project.

Collaborative Subtitling Mockup (Source: Brandon Muramatsu/Collaborative Subtitling Challenge Mockup)

HTML5 Video https://spokenmedia.mit.edu/2010/04/html5-video/ Mon, 26 Apr 2010 00:22:20 +0000
In a recent email from the Opencast community, I received a link to a post titled “HTML5 video Libraries, Toolkits and Players” that gathers some of the currently available info on HTML5 Video. HTML5 Video is something that the SpokenMedia project will begin investigating “soon”.

To help you understand and get the most from this new tag, we have listed below a selection of the best HTML5 video libraries, frameworks, toolkits and players.

Source: Speckyboy Design Magazine. (2010, April 23). HTML5 video Libraries, Toolkits and Players. Retrieved on April 25, 2010 from Speckyboy Design Magazine Website: http://speckyboy.com/2010/04/23/html5-video-libraries-toolkits-and-players/
YouTube Auto-Captions https://spokenmedia.mit.edu/2010/04/youtube-auto-captions/ Tue, 20 Apr 2010 14:30:36 +0000
YouTube announced in early March that they would be extending their pilot program to enable auto-captioning for all channels.

The highlights: YouTube says they’re doing this to improve accessibility, and…

  • Captions will initially only be available for English videos.
  • Auto-captions require clearly spoken audio.
  • Auto-captions aren’t perfect; the owner will need to check that they’re accurate.
  • Auto-captions will be available for all channels.

We think this is great: if YouTube can automatically caption videos at scale and with high accuracy, that’s a great step forward for all videos, and certainly for the lecture videos we’ve been interested in with the SpokenMedia project.

Though, as with SpokenMedia’s approach that builds on Jim Glass’ Spoken Language Systems research, they still have a ways to go on accuracy.

At this early date though, we can still see some significant advantages to our approach:

  • You don’t have to host your videos through YouTube to use the service SpokenMedia is developing. (YouTube locks the videos you upload into their player and service.)
  • SpokenMedia will provide a time-aligned transcript file that you can download and use in other applications. (YouTube allows channel publishers to download a transcript, edit it, and then re-upload it for time-code alignment. However, they don’t allow the public at large to download the transcript.)
  • SpokenMedia will provide an editor to improve the accuracy of the transcripts.
  • SpokenMedia will enable you to use the transcripts in other applications like search, and will let you start playing a segment within a video. (Though I’m pretty sure YouTube will be using transcripts to help users find videos–and I personally think that the real driver behind auto-captions is search and keyword advertising. And if you know how to do it, you can construct a URL to link to a particular timepoint in a YouTube-hosted video; see the sketch after this list.)
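
As an example of that last point, here is a sketch of how a time-aligned transcript enables both search and segment-level deep linking. The transcript layout is an assumed (start, end, word) list, and the t= query parameter shown is the usual way to link to a timepoint in a YouTube-hosted video.

    # Sketch: search a time-aligned transcript and build a deep link that starts
    # playback near the match. The transcript layout and video id are assumed.
    def find_first_occurrence(words, query):
        """Return the start time (seconds) of the first matching word, or None."""
        query = query.lower()
        for start, _end, word in words:
            if word.lower() == query:
                return start
        return None

    def youtube_deep_link(video_id: str, seconds: float) -> str:
        """Build a URL that starts playback at (roughly) the given time."""
        return f"https://www.youtube.com/watch?v={video_id}&t={int(seconds)}s"

    if __name__ == "__main__":
        # Toy transcript: (start_seconds, end_seconds, word)
        transcript = [(12.0, 12.4, "today"), (95.2, 95.9, "entropy"), (96.0, 96.3, "is")]
        t = find_first_occurrence(transcript, "entropy")
        if t is not None:
            print(youtube_deep_link("VIDEO_ID", t))  # VIDEO_ID is a placeholder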

In any event, if you’ve watched the recent slidecasts of the last couple of SpokenMedia presentations, you’ll see that we’ve addressed the impact of auto-captions on SpokenMedia.

YouTube EDU and iTunesU https://spokenmedia.mit.edu/2010/03/youtube-edu-and-itunesu/ Thu, 25 Mar 2010 21:25:00 +0000
An interesting article on TechCrunch today about YouTube EDU and iTunesU.

YouTube has reported on the one year anniversary of the launch of YouTube EDU:

MIT on YouTube EDU (Source: Brandon/YouTube EDU)

YouTube EDU is now one of the largest online video repositories of higher education content in the world. We have tripled our partner base to over 300 universities and colleges, including University of Cambridge, Yale, Stanford, MIT, University of Chicago and The Indian Institutes of Technology. We have grown to include university courses in seven languages across 10 countries. We now have over 350 full courses, a 75% increase from a year ago and thousands of aspiring students have viewed EDU videos tens of millions of times. And today, the EDU video library stands at over 65,000 videos.

Source: YouTube. (2010, March 25). More Courses and More Colleges – YouTube EDU Turns One. Retrieved on March 25, 2010 from YouTube Website: http://youtube-global.blogspot.com/2010/03/more-courses-and-more-colleges-youtube.html

The TechCrunch article also lists the stats of iTunesU as 600 university partners and 250,000 videos.

IIHS Demo: How’d we do it? https://spokenmedia.mit.edu/2010/03/iihs-demo-howd-we-do-it/ Tue, 09 Mar 2010 01:18:07 +0000

Workflow Used in IIHS Demo (Source: Brandon Muramatsu)

SpokenMedia and NPTEL: Initial Thoughts https://spokenmedia.mit.edu/2010/01/spokenmedia-and-nptel-initial-thoughts/ Sun, 31 Jan 2010 16:42:00 +0000
During our trip to India in early January 2010, Brandon Muramatsu, Andrew McKinney and Vijay Kumar met with Prof. Mangala Sunder and the Indian National Programme on Technology Enhanced Learning (NPTEL) team at the Indian Institute of Technology-Madras.

The SpokenMedia project and NPTEL are in discussions to bring the automated lecture transcription process under development at MIT to NPTEL to:

  • Radically reduce transcription and captioning time (from 26 hours to as little as 2 hours).
  • Improve initial transcription accuracy via a research and development program.
  • Improve search and discovery of lecture video via transcripts.
  • Improve accessibility of video lectures, via captioned video, for learners of diverse backgrounds in India and worldwide.


NPTEL, the National Programme on Technology Enhanced Learning, is a program funded by the Indian Ministry of Human Resource Development and a collaboration among a number of participating Indian Institutes of Technology. As part of Phase I, they have published approximately 4,500 hours of lecture videos in engineering courses that comply with the model curriculum suggested by the All India Council for Technical Education. In an even more ambitious Phase II, they plan to add approximately 40,000 additional hours of lecture video for science and engineering courses.

The current NPTEL transcription and captioning process is labor- and time-intensive. During our discussions, we learned that it takes approximately 26 hours on average to transcribe and caption a single hour of video. Even with the initial hand transcription, they are averaging 50% accuracy.

NPTEL Current Transcription and Captioning Process (Source: Brandon Muramatsu)

We discussed using the untrained SpokenMedia software to improve the efficiency of this initial process. Our initial experiments suggest that the untrained recognizer can achieve 40-60% accuracy, which is in the same range as the current NPTEL hand process. Thus we propose replacing the hand transcription and captioning steps with a two-step recognition and editing process. Using a single processor, the recognition step takes on the order of 1.5 hours per hour of video. Coupling this estimate with the existing editing time in use at NPTEL, the overall process might be reduced from 26 hours to approximately 10 hours.
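
To make that estimate concrete, here is a back-of-the-envelope calculation. The 1.5x recognition factor is the single-processor figure above; the editing-hours figure is an assumed placeholder for NPTEL’s existing editing time, chosen only so the total lands near 10 hours.

    # Back-of-the-envelope estimate of hours needed per hour of lecture video.
    # recognition_factor = 1.5 is the single-processor figure discussed above;
    # editing_hours_per_video_hour is an assumed placeholder, not an NPTEL number.
    def captioning_hours(video_hours: float,
                         recognition_factor: float = 1.5,
                         editing_hours_per_video_hour: float = 8.5) -> float:
        """Estimate total hours: automatic recognition plus human editing."""
        return video_hours * (recognition_factor + editing_hours_per_video_hour)

    if __name__ == "__main__":
        # With these assumed numbers, one hour of video takes about 10 hours in
        # total, compared with ~26 hours for the current hand process.
        print(f"{captioning_hours(1.0):.1f} hours per hour of video")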

Initial Improved NPTEL Transcription and Captioning Process (Source: Brandon Muramatsu)

And we discussed initiating an applied research and development project to create baseline acoustic (speaker) models for Indian English for use in the automated lecture transcription process, with the goal of improving automated transcription accuracy into the same range as American English transcription (as high as 80-85% accuracy). The use of improved acoustic models and parallelizing the recognition process might reduce the total transcription time to approximately 2 hours.

Goal NPTEL Transcription and Captioning Process (Source: Brandon Muramatsu)

SpokenMedia at the IIHS Curriculum Conference https://spokenmedia.mit.edu/2010/01/spokenmedia-at-iihs-curriculum-conference/ Sat, 23 Jan 2010 16:41:26 +0000
Brandon Muramatsu and Andrew McKinney presented on SpokenMedia at the Indian Institute for Human Settlements (IIHS) Curriculum Conference in Bangalore, India on January 5, 2010.

Along with Peter Wilkins, we developed a demonstration of SpokenMedia technology, using automatic lecture transcription to transcribe videos from IIHS. We also built a new JavaScript player that allows us to view and search transcripts, and that supports transcripts in multiple languages. View the demo.

Cite as: Muramatsu, B., McKinney, A. & Wilkins, P. (2010, January 6). IIHS Open Framework-SpokenMedia. Presentation at the Indian Institute for Human Settlements Curriculum Conference: Bangalore, India, January 5, 2010. Retrieved January 23, 2010 from Slideshare Web site: http://www.slideshare.net/bmuramatsu/iihs-open-frameworkspoken-media