How to Add Text-to-Speech Feature on Any Web Page

By Preethi Ranjit. in Coding. Updated on October 27, 2017.

Text-to-speech refers to the spoken narration of text displayed on a device. Laptops, tablets, and mobile phones already ship with this feature, and any application running on them, such as a web browser, can use it to extend its own functionality. Narration is a useful aid for an application that displays a lot of text, as it gives website visitors the option of listening instead of reading.

The Web Speech API

The Web Speech JavaScript API is a web browser's gateway to the text-to-speech feature. So, if you want to introduce text-to-speech functionality on a text-heavy web page, and allow your readers to listen to the content, you can make use of this handy API, or, to be more specific, its SpeechSynthesis interface.

Initial code & support check

To get started, let’s create a web page with some sample text to be narrated, and three buttons.

<div>
    <button id=play></button>
    <button id=pause></button>
    <button id=stop></button>
</div>
<article>
    <h1>The Hare With Many Friends</h1>
    <img src="hare-and-friends.jpg">
    <p>A hare was very popular with...</p>
    <p>But he declined, stating that...</p>
    <!-- More text... -->
    <blockquote>Moral of the story...</blockquote>
</article>

The buttons will be the controls for the narration. Now we need to check whether the user agent (the browser) supports the SpeechSynthesis interface. To do so, we use JavaScript to test whether the window object has the 'speechSynthesis' property.

onload = function() {
  if ('speechSynthesis' in window) {
      /* speech synthesis supported */
  }
  else {
      /* speech synthesis not supported */
  }
}
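As a rough sketch, the same check can be wrapped in a small helper that also disables the controls when narration isn't available. The function names supportsSpeechSynthesis() and disableIfUnsupported() are made up for this illustration and are not part of the Web Speech API; the only thing assumed about the buttons is that they have the standard disabled property.

```javascript
/* Hypothetical helper: true if the given global object
   (window, in a browser) exposes speechSynthesis. */
function supportsSpeechSynthesis(globalObject) {
    return typeof globalObject === 'object' && globalObject !== null &&
        'speechSynthesis' in globalObject;
}

/* Hypothetical helper: disables every button in the list when
   speech synthesis is missing, and enables them otherwise.
   Returns the result of the support check. */
function disableIfUnsupported(globalObject, buttons) {
    var supported = supportsSpeechSynthesis(globalObject);
    for (var i = 0; i < buttons.length; i++) {
        buttons[i].disabled = !supported;
    }
    return supported;
}

/* Browser usage (not executed here):
   disableIfUnsupported(window, document.querySelectorAll('button')); */
```

Passing the global object in as an argument, instead of reading window directly, keeps the helper usable outside a browser too.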

If speechSynthesis is available, we first create a reference to it and assign it to the synth variable. We also initialize a flag with the value false (we’ll see its purpose later in the post), and we create references and click event handlers for the three buttons (Play, Pause, Stop) as well.

When the user clicks one of the buttons, its respective function (onClickPlay(), onClickPause(), onClickStop()) will be called.

if ('speechSynthesis' in window){
    var synth = speechSynthesis;
    var flag = false;

    /* references to the buttons */
    var playEle = document.querySelector('#play');
    var pauseEle = document.querySelector('#pause');
    var stopEle = document.querySelector('#stop');

    /* click event handlers for the buttons */
    playEle.addEventListener('click', onClickPlay);
    pauseEle.addEventListener('click', onClickPause);
    stopEle.addEventListener('click', onClickStop);

    function onClickPlay() {
    }
    function onClickPause() {
    }
    function onClickStop() {
    }
}

Create the custom functions

Now let’s build the click functions of the three individual buttons that will be called by the event handlers.

1. Play/Resume

When the Play button is clicked, we first check the flag. If it’s false, we set it to true, so that if the button is clicked again later, the code inside the first if block won’t execute (not until the flag is false again).

Then we create a new instance of the SpeechSynthesisUtterance interface, which holds information about the speech, such as the text to be read, and the volume, voice, speed, pitch, and language of the speech. We pass the article text to the constructor as a parameter, and assign the new instance to the utterance variable.

function onClickPlay() {
    if(!flag){
        flag = true;
        utterance = new SpeechSynthesisUtterance(
              document.querySelector('article').textContent);
        utterance.voice = synth.getVoices()[0];
        utterance.onend = function(){
            flag = false;
        };
        synth.speak(utterance);
    }
    if(synth.paused) { /* unpause/resume narration */
        synth.resume();
    }
}

We use the SpeechSynthesis.getVoices() method to designate a voice for the speech from the voices available on the user’s device. As this method returns an array of all the voice options available on the device, we assign the first one by using the utterance.voice = synth.getVoices()[0]; statement.
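The first voice in the list isn't necessarily in the language of the article, and in some browsers the list can still be empty when getVoices() is first called, because voices are loaded asynchronously. A minimal sketch of a more defensive choice follows. The pickVoice() helper and the 'en' language prefix are assumptions made for this example; the helper relies only on the lang property that each voice object has.

```javascript
/* Hypothetical helper: returns the first voice whose language tag
   starts with the given prefix (for example 'en' matches 'en-GB').
   Falls back to the first voice, or null if the list is empty. */
function pickVoice(voices, langPrefix) {
    var prefix = langPrefix.toLowerCase();
    for (var i = 0; i < voices.length; i++) {
        if (voices[i].lang.toLowerCase().indexOf(prefix) === 0) {
            return voices[i];
        }
    }
    return voices.length > 0 ? voices[0] : null;
}

/* Browser usage (not executed here):
   utterance.voice = pickVoice(synth.getVoices(), 'en');

   If the list is still empty at that point, the choice can be
   repeated once the browser reports that the voices have loaded:
   synth.onvoiceschanged = function () {
       var voice = pickVoice(synth.getVoices(), 'en');
   };
*/
```

When pickVoice() returns null, the voice property can simply be left unset, and the browser falls back to its default voice.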

The onend property represents an event handler that is executed when the speech is finished. Inside of it, we change the value of the flag variable back to false so that the code that starts the speech can be executed when the button is clicked again.

Then we call the SpeechSynthesis.speak() method in order to start the narration. We also need to check if the narration is paused, for which we use the read-only SpeechSynthesis.paused property. If the narration is paused, we need to resume it on the button click, which we can achieve by using the SpeechSynthesis.resume() method.
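Besides the voice, the utterance also accepts the other settings mentioned earlier: volume, speed (the rate property), pitch, and language (the lang property). The sketch below shows one way to set them before speak() is called. The applySpeechSettings() and clampNumber() helpers are invented for this example; the value ranges in the comments are the ones the Web Speech API defines for these properties.

```javascript
/* Hypothetical helper: keeps a number inside the range [min, max]. */
function clampNumber(value, min, max) {
    return Math.min(max, Math.max(min, value));
}

/* Hypothetical helper: copies optional settings onto an utterance.
   Settings that are missing or not finite numbers are skipped,
   so the browser defaults stay in place for them. */
function applySpeechSettings(utterance, settings) {
    settings = settings || {};
    function isNumber(value) {
        return typeof value === 'number' && isFinite(value);
    }
    if (isNumber(settings.volume)) { /* 0 to 1 */
        utterance.volume = clampNumber(settings.volume, 0, 1);
    }
    if (isNumber(settings.rate)) { /* 0.1 to 10, where 1 is normal speed */
        utterance.rate = clampNumber(settings.rate, 0.1, 10);
    }
    if (isNumber(settings.pitch)) { /* 0 to 2, where 1 is normal pitch */
        utterance.pitch = clampNumber(settings.pitch, 0, 2);
    }
    if (typeof settings.lang === 'string') { /* for example 'en-US' */
        utterance.lang = settings.lang;
    }
    return utterance;
}

/* Browser usage (not executed here):
   applySpeechSettings(utterance, { rate: 0.9, pitch: 1, lang: 'en-US' });
   synth.speak(utterance); */
```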

2. Pause

Now let’s create the onClickPause() function in which we first check if the narration is ongoing and not paused. We can test these conditions by making use of the SpeechSynthesis.speaking and the SpeechSynthesis.paused properties. If both conditions are true, our onClickPause() function pauses the speech by calling the SpeechSynthesis.pause() method.

function onClickPause() {
    if(synth.speaking && !synth.paused){ /* pause narration */
        synth.pause();
    }
}
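The speaking and paused properties can also be combined into a single value, which may come in handy for things like updating button labels. This is only a sketch: the narrationState() helper and its three state names are made up for this example, and it reads nothing but the two properties used above.

```javascript
/* Hypothetical helper: summarizes the synthesizer's condition as
   'paused', 'speaking' or 'idle'. The paused property on its own
   isn't treated as 'paused', because it can be true while
   there is nothing left to speak. */
function narrationState(synth) {
    if (synth.speaking && synth.paused) {
        return 'paused';
    }
    if (synth.speaking) {
        return 'speaking';
    }
    return 'idle';
}

/* Browser usage (not executed here):
   pauseEle.disabled = narrationState(synth) !== 'speaking'; */
```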

3. Stop

The onClickStop() function is built similarly to onClickPause(). If the speech is ongoing, we stop it by calling the SpeechSynthesis.cancel() method that removes all utterances.

function onClickStop() {
    if(synth.speaking){ /* stop narration */
        /* for safari */
        flag = false;
        synth.cancel();
    }
}

Note that when speech is cancelled, the end event is fired automatically, and we have already added the flag reset code to its onend handler. However, a bug in the Safari browser prevents this event from firing, which is why we also reset the flag in the onClickStop() function. You don’t have to do this if you don’t want to support Safari.

Browser support

The latest versions of all modern browsers have full or partial support for the speech synthesis API. WebKit browsers have a few quirks: they don’t play speech from multiple tabs, pausing works but is buggy, and speech isn’t reset when the user reloads the page.

Working demo

Have a look at the live demo below, or check out the full code on Github.

See the Pen Text to Speech – JavaScript by HONGKIAT (@hkdc) on CodePen.