Using voice commands to control a servo

Posted by Marco Minerva

I’m very interested in combining .NET Gadgeteer devices with other technologies. In this post I’ll show how to use the Microsoft Speech Platform to recognize speech and send commands to control a servo.

Let’s start with the Gadgeteer device. We’ll use the HiTec HS-311 servo, the same one described in the post Using a Servo in a .NET Gadgeteer Camera Device. The system will be able to receive commands via TCP and move the servo accordingly.

Connect the following modules to a FEZ Spider Mainboard (these are the modules the code below references):

  • an Extender module, used to drive the servo via its PWM pin;
  • an OLED Display module, used to show status messages;
  • a WiFi RS21 module, used to receive commands over the network.

You can see the result in the following screenshot.

The Gadgeteer device schema in the Designer

First of all, we need to configure the servo and the Wi-Fi module:

private const string SSID = "YOUR_NETWORK_SSID";
private const string PASSPHRASE = "YOUR_NETWORK_PASSPHRASE";

private SocketServer servoControlServer;

// All Hitec servos require a 3-4V peak to peak square wave pulse.
// Pulse duration is from 0.9ms to 2.1ms with 1.5ms as center.
// The pulse refreshes at 50Hz (20ms).
private GT.Interfaces.PWMOutput servo;

private static uint high = 2100000;
private static uint low = 900000;
private static uint delta = high - low;

void ProgramStarted()
{
    // Connects the servo to the extender.
    servo = extender.SetupPWMOutput(GT.Socket.Pin.Nine);

    oledDisplay.SimpleGraphics.Clear();
    oledDisplay.SimpleGraphics.DisplayText("Acquiring",
        Resources.GetFont(Resources.FontResources.NinaB), GT.Color.Yellow, 30, 40);
    oledDisplay.SimpleGraphics.DisplayText("network address...",
        Resources.GetFont(Resources.FontResources.NinaB), GT.Color.Yellow, 5, 60);

    servoControlServer = new SocketServer(8080);
    servoControlServer.DataReceived += new DataReceivedEventHandler(servoControlServer_DataReceived);

    WiFi_RS21.WiFiNetworkInfo info = new WiFi_RS21.WiFiNetworkInfo();
    info.SSID = SSID;
    info.SecMode = WiFi_RS21.SecurityMode.WPA2;
    info.networkType = WiFi_RS21.NetworkType.AccessPoint;

    wifi.NetworkUp += new GTM.Module.NetworkModule.NetworkEventHandler(wifi_NetworkUp);

    wifi.UseDHCP();
    wifi.Join(info, PASSPHRASE);

    // Use Debug.Print to show messages in Visual Studio's "Output" window during debugging.
    Debug.Print("Program Started");
}

We control the servo using the PWM output interface through the Extender module, as in the post mentioned above; you can refer to that article for hardware and implementation details. Next, we show a waiting message on the display, instantiate the usual SocketServer class, and set up the Wi-Fi connection (as described in many articles on this blog, for example Wi-Fi Gadgeteer Robot controlled by Windows Phone with image streaming).
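If you don’t have the SocketServer class from the previous posts at hand, here is a minimal sketch of the shape the code above relies on: a TCP listener running on a background thread that raises DataReceived for every chunk of data it reads. This is a hypothetical, simplified stand-in, not the original class.

using System;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading;

public delegate void DataReceivedEventHandler(object sender, DataReceivedEventArgs e);

public class DataReceivedEventArgs : EventArgs
{
    public byte[] Data { get; private set; }

    public DataReceivedEventArgs(byte[] data)
    {
        this.Data = data;
    }
}

public class SocketServer
{
    private readonly int port;

    public event DataReceivedEventHandler DataReceived;

    public SocketServer(int port)
    {
        this.port = port;
    }

    public void Start()
    {
        // Listens on a background thread, so ProgramStarted isn't blocked.
        new Thread(this.Listen).Start();
    }

    private void Listen()
    {
        Socket listener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        listener.Bind(new IPEndPoint(IPAddress.Any, this.port));
        listener.Listen(1);

        while (true)
        {
            // Serves one client at a time; each Receive call is treated as one command.
            Socket client = listener.Accept();

            byte[] buffer = new byte[256];
            int read;
            while ((read = client.Receive(buffer)) > 0)
            {
                byte[] data = new byte[read];
                Array.Copy(buffer, data, read);

                if (this.DataReceived != null)
                    this.DataReceived(this, new DataReceivedEventArgs(data));
            }

            client.Close();
        }
    }

    public static string BytesToString(byte[] data)
    {
        return new string(Encoding.UTF8.GetChars(data));
    }
}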

When the network is available, the NetworkUp event is raised: we show the IP address on the display and start the server, making it available to accept connections:

private void wifi_NetworkUp(GTM.Module.NetworkModule sender, GTM.Module.NetworkModule.NetworkState state)
{
    oledDisplay.SimpleGraphics.Clear();
    oledDisplay.SimpleGraphics.DisplayText(wifi.NetworkSettings.IPAddress,
        Resources.GetFont(Resources.FontResources.NinaB), GT.Color.Yellow, 15, 50);

    servoControlServer.Start();
}

To simplify the control of the servo, we write a simple method that takes a percentage as a parameter and moves the servo accordingly:

private void MoveServo(uint percent)
{
    uint pulse = low + (delta * percent / 100);
    servo.SetPulse(20000000, pulse);
}

We use the following conventions:

  1. valid values range from 0 to 100;
  2. 50 represents the center position;
  3. values less than 50 make the servo turn left;
  4. values greater than 50 make the servo turn right.
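For example, percent = 50 yields pulse = 900000 + (1200000 × 50 / 100) = 1500000 nanoseconds, that is, the 1.5 ms center pulse mentioned in the code comments, refreshed every 20 ms.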

To make this work, we need to set the initial position of the servo to 50 when the application starts. So, we need to add this code:

private uint percent = 50;

void ProgramStarted()
{
    // ...

    // Sets startup position of the servo.
    this.MoveServo(percent);

    // ...
}

Finally, when a message is received on port 8080, the DataReceived event of SocketServer is raised. Here we write the code that handles the command and moves the servo accordingly:

void servoControlServer_DataReceived(object sender, DataReceivedEventArgs e)
{
    try
    {
        string command = SocketServer.BytesToString(e.Data).ToUpper().Trim();
        Debug.Print("Received command: " + command);

        switch (command.Substring(0, 1))
        {
            case "C":
                percent = 50;
                break;

            case "L":
                if (command.Length == 1)
                {
                    // Step isn't specified. Sets the percentage to 0, making the servo
                    // turn all the way left.
                    percent = 0;
                }
                else
                {
                    // Extracts the step from the received command and uses it to update the
                    // percentage of the servo position.
                    int step = int.Parse(command.Substring(1));
                    percent = (uint)System.Math.Max(0, (int)percent - step);
                }
                break;

            case "R":
                if (command.Length == 1)
                {
                    // Step isn't specified. Sets the percentage to 100, making the servo
                    // turn all the way right.
                    percent = 100;
                }
                else
                {
                    // Extracts the step from the received command and uses it to update the
                    // percentage of the servo position.
                    int step = int.Parse(command.Substring(1));
                    percent = (uint)System.Math.Min(100, (int)percent + step);
                }
                break;

            default:
                break;
        }

        // Moves the servo using the computed percentage.
        this.MoveServo(percent);
    }
    catch
    {
        // Ignores malformed commands (for example, a non-numeric step value).
    }
}

The device accepts three commands:

  • C, to make the servo return to its center (initial) position;
  • Lxx, to make the servo turn left by xx percentage points (all the way left if xx is omitted);
  • Rxx, to make the servo turn right by xx percentage points (all the way right if xx is omitted).

So, we check the first character of the command. If it is C, we set the percent variable to 50. If it is L or R, we check whether it is the only character received: in that case, we set the percentage to 0 or 100, to make the servo turn all the way left or right, respectively; otherwise, we extract the step value that follows the direction character and use it to update the percentage variable.

Finally, we call the MoveServo method to make the servo actually move.
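If you want to try the protocol before building the speech application, a few lines of desktop C# are enough. This is just a quick sketch: use the IP address shown on the device display, and the command string "L25" is an arbitrary example.

using System;
using System.Net.Sockets;
using System.Text;

class ServoTest
{
    static void Main()
    {
        // Connects to the Gadgeteer device (use the IP address shown on its display).
        using (TcpClient client = new TcpClient("192.168.1.103", 8080))
        using (NetworkStream stream = client.GetStream())
        {
            // "L25" turns the servo left by 25 percentage points; "C" re-centers it.
            byte[] data = Encoding.ASCII.GetBytes("L25");
            stream.Write(data, 0, data.Length);
            Console.WriteLine("Command sent.");
        }
    }
}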

THE SPEECH RECOGNITION APPLICATION

As said before, the speech recognition application uses the Microsoft Speech Platform. In particular, we need the following components:

  • Microsoft Speech Platform – Runtime (Version 11);
  • Microsoft Speech Platform – Software Development Kit (SDK) (Version 11);
  • Microsoft Speech Platform – Runtime Languages (Version 11).

The first two are necessary to develop and execute applications that use speech recognition functionalities. The last one consists of a separate download for each supported language (currently 26); for our example we only need the English language recognizer, MSSpeech_SR_en-US_TELE.msi.

After installing all the necessary components, we create a Console application, which is sufficient for our purpose. Now we need to add a reference to the assembly that contains the speech recognition and text-to-speech engines: its name is Microsoft.Speech.dll and it is located in the C:\Program Files\Microsoft SDKs\Speech\v11.0\Assembly folder.
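Before going further, you can check that the runtime and the en-US language pack are correctly installed by enumerating the recognizers available to the engine (a small sanity check, not part of the original sample):

using System;
using Microsoft.Speech.Recognition;

class ListRecognizers
{
    static void Main()
    {
        // Prints every speech recognizer installed on the machine; an en-US
        // entry must be present for our application to work.
        foreach (RecognizerInfo info in SpeechRecognitionEngine.InstalledRecognizers())
            Console.WriteLine(info.Culture + " - " + info.Name);
    }
}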

Developing a speech recognition enabled application requires two steps: defining a grammar that describes the valid speech patterns, and writing the code that handles the recognized results.

Creating a grammar is quite straightforward. In the following example, we’ll define it in code, but it is also possible to use a file that conforms to the Speech Recognition Grammar Specification (SRGS) Version 1.0. You can refer to the online documentation for details.
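For instance, loading the rules from an SRGS XML file would look like the following sketch (the file name commands.grxml is hypothetical):

using System.Globalization;
using Microsoft.Speech.Recognition;

class SrgsGrammarExample
{
    static void Main()
    {
        using (SpeechRecognitionEngine recognizer =
            new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            // "commands.grxml" is a hypothetical SRGS XML file containing the
            // same rules we build in code in the CreateGrammar method below.
            recognizer.LoadGrammar(new Grammar("commands.grxml"));
        }
    }
}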

This is the grammar we’ll use in our application:

private static Grammar CreateGrammar()
{
    Choices center = new Choices(new string[] { "center" });
    SemanticResultKey centerKeys = new SemanticResultKey("center", center);

    Choices directions = new Choices(new string[] { "left", "right" });
    SemanticResultKey directionKeys = new SemanticResultKey("direction", directions);

    Choices values = new Choices();
    for (int i = 0; i <= 100; i++)
        values.Add(i.ToString());
    SemanticResultKey valueKeys = new SemanticResultKey("values", values);

    // 1. center command.
    GrammarBuilder centerGrammarBuilder = new GrammarBuilder();
    centerGrammarBuilder.Append(centerKeys);

    // 2. direction commands.
    GrammarBuilder directionGrammarBuilder = new GrammarBuilder();
    directionGrammarBuilder.Append("Turn", 0, 1);
    directionGrammarBuilder.Append(directionKeys);

    // 3. direction commands with step values.
    GrammarBuilder valueGrammarBuilder = new GrammarBuilder();
    valueGrammarBuilder.Append("Turn", 0, 1);
    valueGrammarBuilder.Append(directionKeys);
    valueGrammarBuilder.Append("Step", 0, 1);
    valueGrammarBuilder.Append(valueKeys);

    // Assembles the permutations (only one of these is recognized at a time).
    Choices permutations = new Choices();
    permutations.Add(centerGrammarBuilder);
    permutations.Add(directionGrammarBuilder);
    permutations.Add(valueGrammarBuilder);

    GrammarBuilder commands = new GrammarBuilder();
    commands.Culture = CultureInfo.GetCultureInfo("en-US");
    commands.Append(permutations);
    commands.Append("Go");

    // Create a Grammar object from the GrammarBuilder.
    return new Grammar(commands);
}

We want to recognize three speech patterns: the “center” command alone, the “left” or “right” commands alone, or the “left” or “right” commands plus a value that corresponds to the percentage of the servo position. For each group, we create a Choices object that contains a string array with all the relative values. Then, we define the corresponding SemanticResultKey objects, which we need later to check the recognized speech.

After that, we create the GrammarBuilder variables, one for each pattern. Note in particular the second and third ones: before directionKeys, we append the “Turn” string; the following values (0 and 1) specify that this word may or may not be present in the recognized speech. In the same way, the third GrammarBuilder also contains the optional “Step” word and the values contained in valueKeys.

Then, we instantiate a Choices object that contains all the permutations of the patterns that can be recognized; only one of these is recognized at a time. This object is used to create another GrammarBuilder, which appends the “Go” word at the end. Finally, we pass the GrammarBuilder to the constructor of the Grammar class, to create the actual grammar with all the patterns.

In conclusion, our application will be able to recognize commands like the following:

  1. Center Go
  2. Left Go
  3. Right Go
  4. Left 13 Go
  5. Right 34 Go
  6. Turn Left 27 Go
  7. Turn Right 45 Go
  8. Turn Left Step 12 Go
  9. Turn Right Step 50 Go

We need a SpeechRecognitionEngine object to perform speech recognition and a TcpClient to send commands to the device:

const string SERVER_IP = "192.168.1.103";
const int SERVER_PORT = 8080;

private static TcpClient client;
private static Stream tcpStream;

static void Main(string[] args)
{
    client = new TcpClient();
    Console.Write("Connecting to {0}... ", SERVER_IP);

    client.Connect(SERVER_IP, SERVER_PORT);
    Console.WriteLine("Connected\n");

    tcpStream = client.GetStream();

    // Create a SpeechRecognitionEngine object for the en-US locale.
    using (SpeechRecognitionEngine recognizer = new SpeechRecognitionEngine(new CultureInfo("en-US")))
    {
        // Loads grammar in the recognizer.
        Grammar grammar = CreateGrammar();
        recognizer.LoadGrammar(grammar);

        // Add a handler for the speech recognized event.
        recognizer.SpeechRecognized +=
          new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);

        // Configure the input to the speech recognizer.
        recognizer.SetInputToDefaultAudioDevice();

        // Start asynchronous, continuous speech recognition.
        recognizer.RecognizeAsync(RecognizeMode.Multiple);

        Console.WriteLine("Waiting for commands (press Enter to exit)...\n");

        // Keep the console window open.
        Console.ReadLine();
    }

    // Sends the command to center the servo.
    byte[] data = Encoding.Default.GetBytes("C");
    tcpStream.Write(data, 0, data.Length);

    tcpStream.Close();
    client.Close();
}

We create an instance of SpeechRecognitionEngine specifying the language to use, then we call the CreateGrammar method to get the grammar defined before and load it with the LoadGrammar method. After that, we register the SpeechRecognized event, which is raised every time speech is recognized. With recognizer.RecognizeAsync(RecognizeMode.Multiple), we tell the application to start continuous speech recognition.

Finally, in the recognizer_SpeechRecognized event handler we process the recognized text:

static void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    string command = string.Empty;
    var result = e.Result;
    Console.WriteLine("Recognized text: " + result.Text);

    if (result.Semantics != null && result.Semantics.Count != 0)
    {
        if (result.Semantics.ContainsKey("center"))
        {
            // The "center" command has been recognized.
            command = "C";
        }
        else if (result.Semantics.ContainsKey("direction"))
        {
            // "left" or "right" command has been recognized.
            command = (result.Semantics["direction"].Value.ToString() == "left") ? "L" : "R";

            // Checks whether a step has been specified.
            if (result.Semantics.ContainsKey("values"))
                command += result.Semantics["values"].Value.ToString();
        }

        // Sends command to device.
        Console.Write("Sending command {0}... ", command);

        byte[] data = Encoding.Default.GetBytes(command);
        tcpStream.Write(data, 0, data.Length);

        Console.WriteLine("Done.\n");
    }
}

The e.Result.Semantics property contains information about the recognized semantic result keys, as defined in the CreateGrammar method. By checking it, we set the command variable accordingly and then send it to the device via TCP.

Our system is ready: we can turn on the Gadgeteer device and then start the speech recognition application.

Both the Gadgeteer application and the speech recognition application are available for download:

ServoSpeechControl.zip


  1. #1 by sean on August 28, 2012 - 12:35 AM

    Hi Marco,
    I have implemented your code. It seems like there is some fixed buffer period… if I say “left” instead of “left go”, the response time is still the same (as measured from when I say “left”), as though the speech is handled in 2-second (say) chunks.
    I would like to speed up the response time, and I was wondering if you had come across this.

    My aim is to control an RC car by voice (which “works”, just too slowly).

    Also, my speech recognition seems much worse than yours… have you done much training? Are you using a headset microphone? (I am using my laptop’s built-in mic.)


  2. #4 by mnmighri on March 22, 2014 - 8:27 AM

    Hi, I am working on a similar project but using the STM32F4 Discovery board. I have not tried it yet, but I want to know if the Gadgeteer tools and software code are usable with the STM32F4 and CC motors. Thanks!
