How to Display Timed Transcript Alongside Played Audio

An audio transcript is the text version of speech. It helps make material such as recorded lectures and seminars accessible to people who are deaf or hard of hearing. Transcripts are also used to keep textual records of events like interviews, court hearings and meetings.

Speech audio on webpages (as in podcasts) is typically accompanied by a transcript, for the benefit of people who are hard of hearing or cannot hear at all. They can view the text "playing" alongside the audio. The best way to create an audio transcript is to transcribe the recording manually.

In this post, we’re going to see how to display a running audio transcript alongside the audio. To get started, we need to have the transcript ready. For this post, I’ve downloaded a sample audio file and its transcript from voxtab.

I use an HTML ul list to display the dialogues on the webpage, like this:

<div id="transcriptWrapper">
    <ul id="transcript">
        <li class="speaker1"><strong class="speakerName">Justin</strong>: What I am trying to say is that the appeal and the settlement are separate.</li>
        <li class="speaker2"><strong class="speakerName">Alistair</strong>: You mean that communications and announcements internal and external would be brought into the appeal process.</li>
        <li class="speaker1"><strong class="speakerName">Justin</strong>: Right, because they connect to the appeal.</li>
        ...
    </ul>
</div>

Next, I want all the text to be blurred, and to unblur only the dialogue that matches the speech currently being played in the recording. To blur the dialogues, I use the CSS filter "blur".

#transcript > li{
    -webkit-filter: blur(3px);
    filter: blur(3px);
    transition: all .8s ease;
    ...
}

For IE 10+ you can add text-shadow to create a blurred effect. You can use the code below to detect whether the CSS blur has been applied, and to load your IE-specific stylesheet if it has not (dialogues is the list of transcript li elements, defined further down). Once the text was blurred, I went ahead and added some style to the transcript.

if (getComputedStyle(dialogues[0]).webkitFilter === undefined && getComputedStyle(dialogues[0]).filter === "none") {
    var headEle = document.querySelector('head'),
        linkEle = document.createElement('link');
    linkEle.type = 'text/css';
    linkEle.rel = 'stylesheet';
    linkEle.href = 'ie.css';
    headEle.appendChild(linkEle);
}

Now, let’s add the audio to the page:

<audio id="audio" ontimeupdate="playTranscript()" controls>
    <source src="sample.mp3" type="audio/mpeg">
    Your browser does not support the audio element.
</audio> 

The timeupdate event of the audio element is fired every time its currentTime is updated, so we’ll use the ontimeupdate handler to check the current running time of the audio and highlight the corresponding dialogue in the transcript. Let’s first create some global variables we’ll be needing.

var dialogueTimings = [0,4,9,11,18,24,29,31,44,45,47];
var dialogues = document.querySelectorAll('#transcript>li');
var transcriptWrapper = document.querySelector('#transcriptWrapper');
var audio = document.querySelector('#audio');
var previousDialogueTime = -1;

dialogueTimings is an array of numbers representing the seconds when each dialogue in the transcript begins. The first dialogue starts at 0s, second at 4s, and so on. previousDialogueTime will be used to track dialogue changes.
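
Transcripts often list timestamps as m:ss rather than as raw seconds. As a minimal sketch, assuming timestamps in that form, the array could be built as follows. The toSeconds helper and the timestamps list are hypothetical and not part of the original code.

```javascript
// Hypothetical helper: converts a "m:ss" or "h:mm:ss" timestamp to seconds.
function toSeconds(timestamp) {
    return timestamp.split(':').reduce(function (total, part) {
        // Each step shifts the running total up one unit (x60) and adds the next part.
        return total * 60 + parseInt(part, 10);
    }, 0);
}

// Assumed sample timestamps; they mirror the first entries of dialogueTimings.
var timestamps = ['0:00', '0:04', '0:09', '0:11', '0:18'];
var timings = timestamps.map(toSeconds);
console.log(timings); // [ 0, 4, 9, 11, 18 ]
```

The resulting array must stay sorted in ascending order, since the lookup described next relies on taking the largest start time that has already passed.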

Let’s finally move on to the function hooked to ontimeupdate, which is named "playTranscript". Since playTranscript is fired several times a second while the audio is playing, we first need to identify which dialogue is currently being played. Suppose the audio is at 0:14; then it’s playing the dialogue that started at 0:11 (refer to the dialogueTimings array). If the current time is 0:30, then it’s the dialogue that started at 0:29.

Hence, to find out when the current dialogue began, we first filter the dialogueTimings array for all the times that are at or below the current time of the audio. If the current time is 0:14, we keep the numbers in the array that are at or below 14 (which are 0, 4, 9 and 11) and find the maximum of those, which is 11 (thus the dialogue started at 0:11).

function playTranscript(){
    var currentDialogueTime = Math.max.apply(Math, dialogueTimings.filter(function(v){return v <= audio.currentTime}));
}
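
To try the lookup without an audio element, here is a small sketch that takes the time as an argument instead of reading audio.currentTime. The findDialogueStart name and the empty-list guard are my own additions for illustration.

```javascript
// Same timings as in the post.
var dialogueTimings = [0, 4, 9, 11, 18, 24, 29, 31, 44, 45, 47];

// Hypothetical pure version of the lookup used in playTranscript.
function findDialogueStart(timings, currentTime) {
    var started = timings.filter(function (t) { return t <= currentTime; });
    // Math.max of an empty list is -Infinity, so guard that case.
    return started.length ? Math.max.apply(Math, started) : -1;
}

console.log(findDialogueStart(dialogueTimings, 14));   // 11
console.log(findDialogueStart(dialogueTimings, 30.2)); // 29
console.log(findDialogueStart(dialogueTimings, 4));    // 4 (the boundary is inclusive)
```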

Once currentDialogueTime is calculated, we check whether it’s the same as previousDialogueTime (that is, whether the dialogue in the audio has changed). If it’s not a match (that is, the dialogue has changed), then currentDialogueTime is assigned to previousDialogueTime.

function playTranscript(){
    var currentDialogueTime = Math.max.apply(Math, dialogueTimings.filter(function(v){return v <= audio.currentTime}));

    if(previousDialogueTime !== currentDialogueTime) {
        previousDialogueTime = currentDialogueTime;
    }
}
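
The point of this check is that the work inside the if block runs once per dialogue, not once per event. The sketch below simulates a series of time updates to show that; onTimeUpdate and the changes array are hypothetical stand-ins for playTranscript and its side effects.

```javascript
var dialogueTimings = [0, 4, 9, 11, 18, 24, 29, 31, 44, 45, 47];
var previousDialogueTime = -1;
var changes = []; // records the start time each time the dialogue changes

// Hypothetical stand-in for playTranscript that takes the time as an argument.
function onTimeUpdate(currentTime) {
    var currentDialogueTime = Math.max.apply(Math, dialogueTimings.filter(function (v) {
        return v <= currentTime;
    }));
    if (previousDialogueTime !== currentDialogueTime) {
        previousDialogueTime = currentDialogueTime;
        changes.push(currentDialogueTime);
    }
}

// Simulated currentTime values, several per dialogue.
[0, 0.25, 0.5, 3.9, 4.1, 4.4, 8.8, 9.2, 10.9, 11.3].forEach(onTimeUpdate);
console.log(changes); // [ 0, 4, 9, 11 ]
```

Ten simulated events produce only four changes, one for each dialogue that starts.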

Now let’s use the index of currentDialogueTime in the dialogueTimings array to find out which dialogue in the list of transcript dialogues needs to be highlighted. For example, if currentDialogueTime is 11, the index of 11 in the dialogueTimings array is 3, which means we have to highlight the dialogue at index 3 in the dialogues list.

function playTranscript(){
    var currentDialogueTime = Math.max.apply(Math, dialogueTimings.filter(function(v){return v <= audio.currentTime}));

    if(previousDialogueTime !== currentDialogueTime) {
        previousDialogueTime = currentDialogueTime;
        var currentDialogue = dialogues[dialogueTimings.indexOf(currentDialogueTime)];
    }
}
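
The index mapping can be checked on its own. The sketch below also shows a possible alternative, under the assumption that the timings are sorted in ascending order: a single pass that returns the index directly. The findDialogueIndex function is hypothetical and not part of the original code.

```javascript
var dialogueTimings = [0, 4, 9, 11, 18, 24, 29, 31, 44, 45, 47];

// indexOf maps a start time back to its position in the array.
console.log(dialogueTimings.indexOf(11)); // 3
console.log(dialogueTimings.indexOf(29)); // 6

// Hypothetical alternative: walk the sorted timings once and keep the last
// index whose start time is at or below the current time.
function findDialogueIndex(timings, currentTime) {
    var index = -1;
    for (var i = 0; i < timings.length && timings[i] <= currentTime; i++) {
        index = i;
    }
    return index; // -1 if no dialogue has started yet
}

console.log(findDialogueIndex(dialogueTimings, 14)); // 3
console.log(findDialogueIndex(dialogueTimings, 50)); // 10
```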

Once we’ve found the dialogue to highlight (that is, currentDialogue), we scroll transcriptWrapper (if scrollable) until currentDialogue is 50px below the wrapper’s top. Then we find the previously highlighted dialogue, remove the class speaking from it and add that class to currentDialogue.

function playTranscript(){
    var currentDialogueTime = Math.max.apply(Math, dialogueTimings.filter(function(v){return v <= audio.currentTime}));

    if(previousDialogueTime !== currentDialogueTime) {
        previousDialogueTime = currentDialogueTime;
        var currentDialogue = dialogues[dialogueTimings.indexOf(currentDialogueTime)];
        transcriptWrapper.scrollTop  = currentDialogue.offsetTop - 50;  
        var previousDialogue = document.getElementsByClassName('speaking')[0];
        if(previousDialogue !== undefined)
            previousDialogue.className = previousDialogue.className.replace('speaking','');
        currentDialogue.className +=' speaking';
    }
}
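
One caveat with the string-based class handling above: replace matches the text "speaking" anywhere in className, so it would also alter a class such as speaking-loud, and it can leave stray spaces behind. As a rough sketch, the same toggle can be done on whole class tokens. The removeClass and addClass helpers are hypothetical and work on plain strings so they can run without a DOM.

```javascript
// Hypothetical helper: removes one class name from a className string.
function removeClass(className, name) {
    return className.split(/\s+/).filter(function (token) {
        return token !== '' && token !== name;
    }).join(' ');
}

// Hypothetical helper: adds a class name, avoiding duplicates.
function addClass(className, name) {
    var rest = removeClass(className, name);
    return rest ? rest + ' ' + name : name;
}

console.log(removeClass('speaker1 speaking', 'speaking')); // speaker1
console.log(addClass('speaker2', 'speaking'));             // speaker2 speaking
console.log(removeClass('speaking-loud', 'speaking'));     // speaking-loud (left alone)
```

In browsers that support it (IE 10 and later included), element.classList.add and element.classList.remove do the same job directly on the element.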

The element with class speaking will display unblurred text.

.speaking{
  -webkit-filter: blur(0px);
  filter:blur(0px);
}

And that’s it. Here’s the full HTML and JS code.

<div id="transcriptWrapper">
    <ul id="transcript">
        <li class="speaker1"><strong class="speakerName">Justin</strong>: What I am trying to say is that the appeal and the settlement are separate.</li>
        <li class="speaker2"><strong class="speakerName">Alistair</strong>: You mean that communications and announcements internal and external would be brought into the appeal process.</li>
        <li class="speaker1"><strong class="speakerName">Justin</strong>: Right, because they connect to the appeal.</li>
        ...
    </ul>
</div>

<br>

<audio id="audio" ontimeupdate="playTranscript()" controls>
    <source src="sample.mp3" type="audio/mpeg">
    Your browser does not support the audio element.
</audio>

<br>

<script>
var dialogueTimings = [0,4,9,11,18,24,29,31,44,45,47];
var dialogues = document.querySelectorAll('#transcript>li');
var transcriptWrapper = document.querySelector('#transcriptWrapper');
var audio = document.querySelector('#audio');
var previousDialogueTime = -1;

function playTranscript(){
    var currentDialogueTime = Math.max.apply(Math, dialogueTimings.filter(function(v){return v <= audio.currentTime}));

    if(previousDialogueTime !== currentDialogueTime) {
        previousDialogueTime = currentDialogueTime;
        var currentDialogue = dialogues[dialogueTimings.indexOf(currentDialogueTime)];
        transcriptWrapper.scrollTop  = currentDialogue.offsetTop - 50;  
        var previousDialogue = document.getElementsByClassName('speaking')[0];
        if(previousDialogue !== undefined)
            previousDialogue.className = previousDialogue.className.replace('speaking','');
        currentDialogue.className +=' speaking';
    }
}
</script>

Demo

Check out the demo below to get an idea of how it works when all the code is put together.