Text To Speech Engine with pitch and speech rate adjustment settings ~ Hamad's blog

About Text To Speech:

A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech. Android uses a text-to-speech engine to read text and convert it into speech using downloaded language data. I have noticed that few tutorials explain what the parts of a text-to-speech system are, so here is some brief information about them.

Parts of Text To Speech:

A text-to-speech system (or "engine") is composed of two parts:

A Front-End

The front-end has two major tasks. First, it converts raw text containing symbols like numbers and abbreviations into the equivalent of written-out words. This process is often called text normalization, pre-processing, or tokenization. The front-end then assigns phonetic transcriptions to each word, and divides and marks the text into prosodic units, like phrases, clauses, and sentences. The process of assigning phonetic transcriptions to words is called text-to-phoneme or grapheme-to-phoneme conversion. Phonetic transcriptions and prosody information together make up the symbolic linguistic representation that is output by the front-end.
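To make the text-normalization step a little more concrete, here is a small plain-Java sketch. It is only an illustration under simple assumptions: the class name `TextNormalizer`, the tiny abbreviation table and the rule of spelling out only the numbers 0 to 99 are all hypothetical, and it stops before grapheme-to-phoneme conversion. Real engines use far larger rule sets plus context (for example to decide whether "St." means Street or Saint).

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a TTS front-end's text-normalization step:
 * splits text into tokens, expands a few abbreviations and spells out
 * whole numbers 0..99 as words. Not taken from any real engine.
 */
class TextNormalizer {
    private static final String[] ONES = {
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"
    };
    private static final String[] TENS = {
        "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"
    };
    // tiny, made-up abbreviation table for illustration only
    private static final Map<String, String> ABBREVIATIONS = new HashMap<>();
    static {
        ABBREVIATIONS.put("Mr.", "Mister");
        ABBREVIATIONS.put("etc.", "et cetera");
    }

    /** Spells out 0..99; any other number is returned as digits in this sketch. */
    static String numberToWords(int n) {
        if (n < 0 || n > 99) return Integer.toString(n);
        if (n < 20) return ONES[n];
        String tens = TENS[n / 10];
        return n % 10 == 0 ? tens : tens + "-" + ONES[n % 10];
    }

    /** Tokenizes on whitespace and rewrites each token into written-out words. */
    static String normalize(String text) {
        StringBuilder out = new StringBuilder();
        for (String token : text.trim().split("\\s+")) {
            String word = ABBREVIATIONS.getOrDefault(token, token);
            if (token.matches("\\d{1,2}")) {
                word = numberToWords(Integer.parseInt(token));
            }
            if (out.length() > 0) out.append(' ');
            out.append(word);
        }
        return out.toString();
    }
}
```

For example, `TextNormalizer.normalize("Mr. Smith has 42 cats")` returns "Mister Smith has forty-two cats", which is the kind of written-out text the front-end would then pass on to phonetic transcription.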

A Back-End

The back-end—often referred to as the synthesizer—then converts the symbolic linguistic representation into sound. In certain systems, this part includes the computation of the target prosody (pitch contour, phoneme durations), which is then imposed on the output speech.
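As a rough illustration of imposing target prosody, the sketch below treats the user's pitch and speech rate as plain multipliers on each phoneme's pitch target and duration. This is a simplified, hypothetical model (the `ProsodyModel` and `Phoneme` names and all numbers are made up); the Android documentation only says that 1.0 is the normal pitch and rate, and each engine decides internally how to apply them.

```java
/**
 * Simplified, hypothetical model of a back-end imposing target prosody:
 * each phoneme has a base pitch (Hz) and duration (ms), and the user's
 * pitch and speech-rate factors (1.0 = normal) are applied as multipliers.
 */
class ProsodyModel {
    static final class Phoneme {
        final String symbol;
        final double pitchHz;
        final double durationMs;

        Phoneme(String symbol, double pitchHz, double durationMs) {
            this.symbol = symbol;
            this.pitchHz = pitchHz;
            this.durationMs = durationMs;
        }
    }

    /** Scales every pitch target by 'pitch' and divides every duration by 'speechRate'. */
    static Phoneme[] impose(Phoneme[] target, double pitch, double speechRate) {
        if (pitch <= 0 || speechRate <= 0) {
            throw new IllegalArgumentException("pitch and speechRate must be positive");
        }
        Phoneme[] out = new Phoneme[target.length];
        for (int i = 0; i < target.length; i++) {
            Phoneme p = target[i];
            // higher pitch factor -> higher frequency; higher rate -> shorter sound
            out[i] = new Phoneme(p.symbol, p.pitchHz * pitch, p.durationMs / speechRate);
        }
        return out;
    }

    /** Total length of the utterance in milliseconds. */
    static double totalDurationMs(Phoneme[] phonemes) {
        double total = 0;
        for (Phoneme p : phonemes) total += p.durationMs;
        return total;
    }
}
```

In this model, pitch 1.5 and speech rate 2.0 turn a phoneme with a 120 Hz target lasting 80 ms into one at 180 Hz lasting 40 ms: a higher voice that finishes in half the time.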

If the language data is not already available on your device, a popup dialog may ask you to download the text-to-speech language data. So make sure the data for the language you need is already downloaded to your device, or download it by following these steps before using the application:

Settings -> Language & input -> scroll down to Text-to-speech output -> under “Preferred Engine” click the settings icon next to Google Text-to-speech Engine -> Install voice data -> select whichever language you like -> click the download icon next to the “high quality” version (should be around 240MB) -> once downloaded it should already be selected for you.

I have used the US locale in this post, so download the United States voices.


Create new Android Project

Project Name: TextToSpeak

(tested from Android 2.3.3 up to the current Android SDK)

Build Target: Android 2.3.3 (or higher)

Application Name: TextToSpeak

Package Name: com.shaikhhamadali.blogspot.texttospeech

Create Layout file: activity_text_to_speech

1. Code of Layout:

<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:orientation="vertical"
    tools:context=".TextToSpeak" >

    <TextView
        android:id="@+id/tVSpeechRate"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:text="Set Speech Rate" />

    <!-- progress 0..19 is mapped in code to (progress + 1) / 10, i.e. 0.1..2.0; progress 9 gives the normal value 1.0 -->
    <SeekBar
        android:id="@+id/sBSpeechRate"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:max="19"
        android:progress="9" />

    <TextView
        android:id="@+id/tVPitchRate"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:text="Set Pitch" />

    <SeekBar
        android:id="@+id/sBPitchRate"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:max="19"
        android:progress="9" />

    <EditText
        android:id="@+id/eTPronounce"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:ems="10"
        android:hint="Enter Text to Speak" >

        <requestFocus />
    </EditText>

    <Button
        android:id="@+id/btnSpeak"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:text="Speak" />

</LinearLayout>
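Both seek bars go from 0 to 19 and start at 9. The activity below turns a progress value p into (p + 1) / 10, so the bars cover 0.1 to 2.0 with the normal value 1.0 at the default position. Here is that mapping pulled out into a small plain-Java helper, plus an inverse that could be used to restore a saved value with setProgress(). The `RateMapper` class and its method names are hypothetical, not part of the original project.

```java
/**
 * Hypothetical helper mirroring the activity's seek bar listeners:
 * a SeekBar with android:max="19" reports progress 0..19, mapped to
 * (progress + 1) / 10, i.e. 0.1..2.0, where progress 9 gives 1.0.
 */
class RateMapper {
    static final int MAX_PROGRESS = 19;

    /** Same formula as onProgressChanged: 0 -> 0.1, 9 -> 1.0, 19 -> 2.0. */
    static float progressToRate(int progress) {
        if (progress < 0 || progress > MAX_PROGRESS) {
            throw new IllegalArgumentException("progress out of range: " + progress);
        }
        return (progress + 1) / 10f;
    }

    /** Inverse mapping, e.g. to put a saved rate back on the seek bar. */
    static int rateToProgress(float rate) {
        int progress = Math.round(rate * 10f) - 1;
        // clamp so rates outside 0.1..2.0 still land on the bar
        return Math.max(0, Math.min(MAX_PROGRESS, progress));
    }
}
```

For example, `RateMapper.progressToRate(9)` returns 1.0f and `RateMapper.rateToProgress(1.5f)` returns 14. The helper returns float because setPitch() and setSpeechRate() take float arguments.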

2. Code of Activity:

package com.shaikhhamadali.blogspot.texttospeech;

import java.util.Locale;

import android.os.Bundle;
import android.app.Activity;
import android.view.View;
import android.view.View.OnClickListener;
import android.widget.Button;
import android.widget.EditText;
import android.widget.SeekBar;
import android.widget.SeekBar.OnSeekBarChangeListener;
import android.widget.Toast;
import android.speech.tts.TextToSpeech;

public class TextToSpeak extends Activity implements TextToSpeech.OnInitListener{
 //pitch and speech rate factors; 1.0 is normal and matches the seek bars' default progress of 9
 double pitch=1.0,speechRate=1.0;
 //declare views/controls 
 private TextToSpeech tts;
 SeekBar sBSpeechRate,sBPitchRate;
 EditText eTPronounce;
 Button btnSpeak;

 @Override
 protected void onCreate(Bundle savedInstanceState) {
  super.onCreate(savedInstanceState);
  setContentView(R.layout.activity_text_to_speech);
  initializeControls();
  /*Initialize the Text to speech engine using the default TTS engine.
   *This will also initialize the associated TextToSpeech engine if it isn't already running.
   */
  tts = new TextToSpeech(this, this);
 }
 private void initializeControls() {
  //get reference of the UI Controls
  sBSpeechRate=(SeekBar)findViewById(R.id.sBSpeechRate);
  sBPitchRate=(SeekBar)findViewById(R.id.sBPitchRate);
  eTPronounce=(EditText)findViewById(R.id.eTPronounce);
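  btnSpeak=(Button)findViewById(R.id.btnSpeak);
  //keep the Speak button disabled until onInit() reports that the engine is ready
  btnSpeak.setEnabled(false);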
  /*set a change listener on each seek bar so that every
   * increment or decrement updates the stored value*/
  sBSpeechRate.setOnSeekBarChangeListener(new OnSeekBarChangeListener() {

   @Override
   public void onStopTrackingTouch(SeekBar seekBar) {}
   @Override
   public void onStartTrackingTouch(SeekBar seekBar) {}
   @Override
   public void onProgressChanged(SeekBar seekBar, int progress,
     boolean fromUser) {
    //map progress 0..19 to a speech rate of 0.1..2.0 (progress 9 gives the normal rate 1.0)
    speechRate=((double)progress+1)/10;
   }
  });

  sBPitchRate.setOnSeekBarChangeListener(new OnSeekBarChangeListener() {
   @Override
   public void onStopTrackingTouch(SeekBar seekBar) {}
   @Override
   public void onStartTrackingTouch(SeekBar seekBar) {}
   @Override
   public void onProgressChanged(SeekBar seekBar, int progress,
     boolean fromUser) {
    //map progress 0..19 to a pitch of 0.1..2.0 (progress 9 gives the normal pitch 1.0)
    pitch=((double)progress+1)/10;
   }
  });
  //set default text as Welcome to shaikhhamadali.blogspot.com
  eTPronounce.setText("Welcome to shaikhhamadali.blogspot.com");
  //when the Speak button is clicked, call speakOut() to speak the text
  btnSpeak.setOnClickListener(new OnClickListener() {
   @Override
   public void onClick(View v) {
    speakOut();
   }
  });
 }
 @Override
 public void onInit(int status) {
  //check the status
  if (status == TextToSpeech.SUCCESS) {
   //set language Locale to US
   int result = tts.setLanguage(Locale.US);
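   //check that the language data is available on the device and supported
   if (result == TextToSpeech.LANG_MISSING_DATA
     || result == TextToSpeech.LANG_NOT_SUPPORTED) {
    //tell the user that the voice data must be installed from Settings
    Toast.makeText(getBaseContext(), "US English voice data is missing or not supported", Toast.LENGTH_LONG).show();
   } else {
    //the engine is ready, so enable the Speak button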
    btnSpeak.setEnabled(true);
    //and speak by calling speakOut
    speakOut();
   }

  } else {
   //show toast if initialization failed
   Toast.makeText(getBaseContext(), "TTS Engine Initialization Failed!",Toast.LENGTH_SHORT).show();
  }

 }

 private void speakOut() {
  //get entered text to speak
  String text = eTPronounce.getText().toString();
  //set pitch rate adjusted by user
  tts.setPitch((float)pitch);
  //set speech rate adjusted by user
  tts.setSpeechRate((float)speechRate);
  /*pass the text to the engine with queue mode QUEUE_FLUSH: all entries in the playback queue
   *(media to be played and text to be synthesized) are dropped and
   *replaced by the new entry. Queues are flushed with respect to
   *a given calling app. Entries in the queue from other callers are not discarded.
   *This speak(String, int, HashMap) overload works back to Android 2.3.3 but is deprecated
   *since API 21, where speak(CharSequence, int, Bundle, String) replaces it*/
  tts.speak(text, TextToSpeech.QUEUE_FLUSH, null);

 }
 @Override
 public void onDestroy() {
  // Don't forget to stop and shutdown text to speech engine!
  if (tts != null) {
   tts.stop();
   tts.shutdown();
  }
  super.onDestroy();
 }

}
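The QUEUE_FLUSH comment in speakOut() is easier to picture with a toy model. The plain-Java class below is a hypothetical simulation, not Android code: `SpeechQueueModel` and its `Mode` enum only mimic the difference between TextToSpeech.QUEUE_FLUSH and TextToSpeech.QUEUE_ADD for one app's pending utterances. The real engine also interrupts whatever is currently being spoken when it flushes, which this model ignores.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical model of the two queue modes for a single calling app:
 * FLUSH drops everything still waiting and then queues the new text,
 * ADD appends the new text after the pending entries.
 */
class SpeechQueueModel {
    enum Mode { FLUSH, ADD }

    private final Deque<String> pending = new ArrayDeque<>();

    void speak(String text, Mode mode) {
        if (mode == Mode.FLUSH) {
            pending.clear(); // like QUEUE_FLUSH: discard entries not yet spoken
        }
        pending.addLast(text); // both modes end with the new entry queued
    }

    /** Snapshot of what would still be spoken, in order. */
    List<String> pendingUtterances() {
        return new ArrayList<>(pending);
    }
}
```

After speak("a", ADD), speak("b", ADD) and speak("c", FLUSH), only "c" is left. That is why pressing Speak a second time in this app replaces the first text instead of reading both; QUEUE_ADD would read them one after the other.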

3. Note that:

It is good practice to always stop and shut down the text-to-speech engine in onDestroy() so that it releases its resources.

Above I have used speech rate, which is simply the speed at which the text is spoken. 1.0 is the normal rate; lower values slow the speech down (0.5 is half speed) and higher values speed it up (2.0 is twice the normal rate).

Pitch is the frequency (tone) of the voice. 1.0 is the normal pitch; lower values give a deeper voice and higher values give a higher, thinner voice, like the voices of some people that sound thin.

You can also learn more about using intents for voice search, speech recognition, and web search.

4. Conclusion:

Some information about the text-to-speech engine and its parts.

Some information about pitch and speech rate settings.

How to use the SeekBar control and how to convert its progress values.

How to download text-to-speech voices for any language from Settings.

5. About the post:

The code should explain itself through its comments, but if you have any questions, feel free to ask!

Feel free to leave a comment with anything you would like to ask, know, suggest, or recommend.

Hope you enjoy it!

6. Source Code:

You can download the source code from: Google Drive, GitHub

Cheers,

Hamad Ali Shaikh

Friday, 2 May 2014