RoboCatz.com

Talking Robots

Do you have an NXT with a blank screen?
Would you like your robot to talk to you?


This page describes an experiment to bring speech to LEGO robots.

Sound on the NXT and EV3 Robots

Sound is stored in audio files (like an MP3). An audio file is a document read by a computer (or your robot) and interpreted to produce sound. There are many formats for audio files. The NXT and EV3 have their own formats for audio files.

RSF File Format

EV3 robots use sound files in the RSF format. What does RSF stand for? I don't know, but it could be "Raw Sound File". I suggest that because the format is essentially a RAW (uncompressed) sound format that uses an 8000 Hz sampling rate. The sampling rate is the number of times per second that the sound level is measured.
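To make the idea of a sampling rate concrete, here is a minimal sketch in VB.NET that counts how many measurements an 8000 Hz recording contains. The module and function names are made up for this illustration, and the half-second clip length is just an example value.

```vbnet
Imports System

Module SamplingDemo
    ' Sampling rate stated above for RSF files.
    Const SampleRateHz As Integer = 8000

    ' Number of sound-level measurements in a clip of the given length.
    Function SampleCount(seconds As Double) As Integer
        Return CInt(Math.Round(seconds * SampleRateHz))
    End Function

    Sub Main()
        ' A spoken word lasting half a second (example value).
        Console.WriteLine("Samples in 0.5 seconds: " & SampleCount(0.5))
        ' Time between two consecutive measurements, in milliseconds.
        Console.WriteLine("Milliseconds between samples: " & (1000.0 / SampleRateHz))
    End Sub
End Module
```

At 8000 Hz, a half-second word is described by 4000 measurements, taken one eighth of a millisecond apart.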


Figure 1

Sound Waves

Sound is a "compression" wave that travels through air. Though the wave exists in 3 dimensions in our real world, we often draw (or render) a slice of the wave in 2 dimensions, as shown in Figure 1.

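Two characteristics of the wave are "frequency" and "amplitude". Amplitude is the height of the wave (y-axis). Frequency is the number of complete waves that pass each second, which shows up in the drawing as how closely the peaks are spaced along the x-axis.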

As the sound radiates from the source (for example, a speaker) it dissipates (or attenuates), which means that the volume of the sound decreases as the distance from the source increases. This is because the wave of molecules covers a larger area as it radiates (spreads) from the source. A jet engine sounds much louder if you are standing next to it than if you are several miles away.
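The spreading described above can be put into numbers. The sketch below assumes an idealized point source in open air, with no echoes and no absorption by the air; under that assumption the sound energy is spread over a sphere, so intensity falls with the square of the distance (the inverse-square law). The function names are made up for this illustration.

```vbnet
Imports System

Module AttenuationDemo
    ' Intensity at distanceMeters relative to the intensity at referenceMeters,
    ' assuming an ideal point source (inverse-square law).
    Function RelativeIntensity(referenceMeters As Double, distanceMeters As Double) As Double
        Return (referenceMeters / distanceMeters) ^ 2
    End Function

    ' The same change expressed in decibels (a negative value means quieter).
    Function ChangeInDecibels(referenceMeters As Double, distanceMeters As Double) As Double
        Return 10 * Math.Log10(RelativeIntensity(referenceMeters, distanceMeters))
    End Function

    Sub Main()
        ' Moving from 1 meter to 10 meters away from the speaker.
        Console.WriteLine(RelativeIntensity(1, 10))
        Console.WriteLine(ChangeInDecibels(1, 10))
    End Sub
End Module
```

Under these assumptions, moving ten times farther away leaves about one hundredth of the intensity, a drop of about 20 decibels. Real rooms reflect sound, so a robot indoors will usually not fade this quickly.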

Windows Speech

In the early 2000s, Microsoft created a set of software libraries that let developers write programs that talk to, and listen to, the user. These capabilities are known as Text-To-Speech and Speech Recognition.

The software libraries made it easy to incorporate these capabilities into programs. If you want the computer to say something, all you need to do is execute the command .Speak(words) and the computer will speak those words. For example, you could code: .Speak("Hello World") and the computer would say it.

If you want to create an audio sound file of the computer speaking, set the output to a sound file with the function .SetOutputToWaveFile() before calling .Speak().
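Put together, the two commands look roughly like the sketch below. It assumes a Windows project that references the System.Speech assembly; the file name hello.wav is a made-up example.

```vbnet
Imports System.Speech.Synthesis

Module SpeakDemo
    Sub Main()
        Using synth As New SpeechSynthesizer()
            ' Speak through the computer's default speakers.
            synth.SetOutputToDefaultAudioDevice()
            synth.Speak("Hello World")

            ' Send the same words to a sound file instead of the speakers.
            synth.SetOutputToWaveFile("hello.wav")   ' hypothetical file name
            synth.Speak("Hello World")

            ' Release the file so other programs can open it.
            synth.SetOutputToNull()
        End Using
    End Sub
End Module
```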

Speech in NXTs and EV3s

Getting the EV3 robot to speak requires a sound file for each word that you want the robot to say. For example, if you want the robot to say "Hello World", then you need a sound file for "Hello" and a sound file for "World". The robot then plays the two sound files in order. If you want your robot to be able to say any word in the dictionary, then you need a sound file for each word in the dictionary. This might 'sound' like an impossible task, but it isn't: a computer can do billions of calculations per second, so it can create the files for you.
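The word-by-word idea can be sketched as a function that turns a sentence into the ordered list of sound files to play. This is only an illustration: it assumes one file per word named after the word with an .rsf extension, and the function name is made up. Playing the files on the robot is a separate step that is not shown.

```vbnet
Imports System
Imports System.Collections.Generic

Module SentenceDemo
    ' Turn a sentence into the ordered list of sound files to play.
    ' Assumption: one file per word, named <word>.rsf.
    Function SoundFilesFor(sentence As String) As List(Of String)
        Dim files As New List(Of String)
        Dim separators As Char() = {" "c}
        For Each word As String In sentence.Split(separators, StringSplitOptions.RemoveEmptyEntries)
            files.Add(word & ".rsf")
        Next
        Return files
    End Function

    Sub Main()
        ' Prints Hello.rsf and then World.rsf, one per line.
        For Each fileName As String In SoundFilesFor("Hello World")
            Console.WriteLine(fileName)
        Next
    End Sub
End Module
```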

Creating RSF Sound Files using Microsoft Software

The program below is written in Microsoft's Visual Basic .NET (or VB.NET). In this code, a list of words is scanned. For each word in the list, a computer sound file is created (if it doesn't already exist). The sound file is set to the encoding format ".Pcm", which is the format for a RAW file, and the sampling rate is set to 8000 Hz, which is the rate used by an RSF file. With this small program you can create an RSF file of the computer speaking a word. This can be for just one word, for tens of thousands of words, or even for millions of words.

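' Needed at the top of the source file (the project must reference System.Speech):
'   Imports System.Speech.AudioFormat
'   Imports System.Speech.Synthesis
'   Imports System.Text.RegularExpressions
' The code runs inside a form with the text boxes txtPath, txtSayThis and txtWorkingOnThis.
Dim Syn As New SpeechSynthesizer
Dim rootDir As String = txtPath.Text & "Fall2015\RSFFiles\"
' PCM, 8000 samples per second, 8 bits per sample, 1 channel,
' 8000 bytes per second (8000 samples x 1 byte each), block alignment of 1 byte
Dim rsfFormat As New SpeechAudioFormatInfo(EncodingFormat.Pcm, 8000, 8, 1, 8000, 1, Nothing)
' Get each word (one word per line of the text box)
Dim WordList As String()
WordList = txtSayThis.Text.Split(New Char() {ControlChars.Lf, ControlChars.Cr}, StringSplitOptions.RemoveEmptyEntries)
For Each word In WordList
    ' Strip punctuation, then surrounding spaces, so the word can be used as a file name
    word = Regex.Replace(word, "[;:?.,\]\[}{'()#!`]", "")
    word = Trim(word)
    If word = "" Then Continue For
    Dim myPath As String = rootDir & word & ".rsf"
    If Not My.Computer.FileSystem.FileExists(myPath) Then
        Try
            Syn.SetOutputToWaveFile(myPath, rsfFormat)
            Syn.Volume = 100
            Syn.Speak(word)
            Syn.SetOutputToNull()
            ' Show progress on the form
            txtWorkingOnThis.Text = word
            txtWorkingOnThis.Refresh()
            Refresh()
        Catch ex As Exception
            ' Report the failure instead of silently ignoring it, and release the file
            System.Diagnostics.Debug.WriteLine("Could not create " & myPath & ": " & ex.Message)
            Syn.SetOutputToNull()
        End Try
    End If
Next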
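The numeric arguments given to SpeechAudioFormatInfo in the program above are, in order: samples per second, bits per sample, channel count, average bytes per second, and block alignment. For uncompressed PCM sound, the last two values normally follow from the first three. The sketch below shows that arithmetic; the function names are made up for this illustration.

```vbnet
Imports System

Module PcmFormatDemo
    ' Bytes needed to hold one sample for every channel.
    Function BlockAlign(bitsPerSample As Integer, channels As Integer) As Integer
        Return (bitsPerSample \ 8) * channels
    End Function

    ' Bytes of sound data produced for each second of audio.
    Function AverageBytesPerSecond(samplesPerSecond As Integer, bitsPerSample As Integer, channels As Integer) As Integer
        Return samplesPerSecond * BlockAlign(bitsPerSample, channels)
    End Function

    Sub Main()
        ' 8000 Hz, 8 bits per sample, one channel.
        Console.WriteLine(BlockAlign(8, 1))                   ' prints 1
        Console.WriteLine(AverageBytesPerSecond(8000, 8, 1))  ' prints 8000
    End Sub
End Module
```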

Data Mining