Lab 2. Speech recognition and semantic interpretation

This version is preliminary and subject to change

In this lab session you will practice writing speech recognition grammars in the style of the Speech Recognition Grammar Specification (SRGS) and Semantic Interpretation for Speech Recognition (SISR). It is assumed that you have read the relevant literature on the subject before attempting the assignments.

In order to test your grammars, you should have a small “testing environment” (fetch the repository!). There are two files: ./vxml-server/templates/lab2.xml (VoiceXML) and ./vxml-server/grammars/lab2.xml (SRGS).

Recognizing funny quotes

Here is a well-known but still funny list of short quotes:

| to do is to be | Socrates |
| to be is to do | Sartre   |
| do be do be do | Sinatra  |

Your task here is to:

  1. Write an SRGS grammar that covers exactly these quotes, and no other sentences/utterances. Test it by using the code above.
  2. Add semantic tags to your grammar, so as to enable the ASR to return the name corresponding to the source of the recognized quote (e.g. ‘socrates’). Again test it by using the code above.
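
As a reminder of the general shape of an SRGS grammar with SISR tags, here is a minimal sketch on an unrelated yes/no example. The rule name `answer` and the return values 'positive' and 'negative' are made up for illustration; the header attributes follow the SRGS 1.0 and SISR 1.0 recommendations, and the header in your own lab2.xml may differ.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US"
         mode="voice" root="answer"
         tag-format="semantics/1.0">

  <!-- Hypothetical root rule: several phrasings map to the same value -->
  <rule id="answer" scope="public">
    <one-of>
      <item> yes <tag> out = 'positive'; </tag></item>
      <item> yeah <tag> out = 'positive'; </tag></item>
      <item> no <tag> out = 'negative'; </tag></item>
    </one-of>
  </rule>
</grammar>
```

Only the three listed phrases are accepted, and the tag inside each item decides what the recognizer returns as the interpretation.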

Recognizing simple commands

Suppose that you want to voice control your intelligent home, and be able to say things like:

turn off the light
turn on the light
turn off the air conditioning
turn on the AC
turn on the heat
open the window
close the door
open the door
close the window

Your task here is to:

  1. Write a grammar that covers the above utterances. In principle, you could solve this by writing a grammar that simply lists them all. However, this is not allowed in this exercise. Instead, I want you to write a grammar that makes use of at least three rules. To give you a good start, I have written one rule for you:
    <rule id="object">
      <one-of>
         <item> light </item>
         <item> heat </item>
         <item> A C <tag> out = 'air conditioning'; </tag></item>
         <item> air conditioning </item>
         <item> window </item>
         <item> door </item>
      </one-of>
    </rule>
        

    You will need at least two more rules: one rule for “action”, and one rule that ties “object” and “action” together.

  2. Add semantic tags to your grammar, in such a way that the interpretation of what is being said becomes an object (a JavaScript object, to be more precise). For example, if you say “turn off the AC” the ASR should return the object:
    {"object": "air conditioning",
     "action": "off"}
        

    When testing this, I recommend that you change your definition of <filled> by adding:

    Object was: <value expr="foo$.interpretation.object"/>
    Action was: <value expr="foo$.interpretation.action"/>
        
  3. Now, you want to be able to be polite to your intelligent home, don’t you? Add rule(s) that will allow you to start your utterances with an optional “please”, as in “please close the door”.
  4. Finally, your grammar most likely suffers from the problem that “open the air conditioning” is recognized. This is not optimal. Try to revise your grammar in a way that will fix this problem.
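
The techniques these tasks rely on (rule references, an optional item, and building an object from sub-rule results) can be sketched on an unrelated drink-ordering example. All rule names and words here are hypothetical, and the sketch assumes an SISR 1.0 processor, where `rules.<name>` holds the value of a referenced rule and a rule without tags returns its matched text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US"
         mode="voice" root="order"
         tag-format="semantics/1.0">

  <!-- Top-level rule: an optional "a", then a size, then a drink -->
  <rule id="order" scope="public">
    <item repeat="0-1"> a </item>
    <ruleref uri="#size"/>
    <ruleref uri="#drink"/>
    <!-- Copy the values of the referenced rules into the result object -->
    <tag> out.size = rules.size; out.drink = rules.drink; </tag>
  </rule>

  <rule id="size">
    <one-of>
      <item> small </item>
      <item> large </item>
      <!-- A synonym normalized to the same value -->
      <item> big <tag> out = 'large'; </tag></item>
    </one-of>
  </rule>

  <rule id="drink">
    <one-of>
      <item> coffee </item>
      <item> tea </item>
    </one-of>
  </rule>
</grammar>
```

Under these assumptions, saying “a big coffee” should give an interpretation with `size` set to 'large' and `drink` set to 'coffee'. Restricting which words may combine is then a matter of how the alternatives are split across rules.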

(VG part) Working with confidence threshold

  1. Does the recognition work well? Try to provoke some false acceptances and note the confidence scores you get.
  2. Add the property for adjusting the confidence threshold. Your application will now return nomatch for inputs with confidence below the threshold.
  3. How can the selection of an optimal threshold be automated? Write your proposal in the GUL’s comment box.
  4. Extend your VoiceXML according to the table below in order to:
    • Acknowledge the action if the confidence score is high (“Opening the window”).
    • Ask for confirmation if the confidence score is not high enough but above the threshold (“You asked to open the window. Correct?”).
    | Confidence | Action               |
    |------------+----------------------|
    | High       | Acknowledge          |
    | Mid        | Ask for confirmation |
    | Low        | Reprompt             |
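
For orientation, here is a rough VoiceXML 2.0 sketch of the two mechanisms involved: the `confidencelevel` property and the `confidence` shadow variable of a field. It uses a hypothetical yes/no field named `answer` and a hypothetical grammar file `yesno.xml`; the values 0.4 and 0.8 are arbitrary placeholders, not recommended settings.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <!-- Results below this confidence (range 0.0-1.0, default 0.5) raise nomatch -->
  <property name="confidencelevel" value="0.4"/>

  <form id="demo">
    <field name="answer">
      <!-- Hypothetical grammar file, not part of the lab repository -->
      <grammar src="yesno.xml" type="application/srgs+xml"/>
      <prompt>Do you want to continue?</prompt>

      <nomatch>
        <prompt>Sorry, I did not catch that.</prompt>
        <reprompt/>
      </nomatch>

      <filled>
        <!-- answer$.confidence is the shadow variable for this field -->
        <if cond="answer$.confidence &gt; 0.8">
          <prompt>Got it: <value expr="answer"/>.</prompt>
        <else/>
          <prompt>I think you said <value expr="answer"/>, but I am not sure.</prompt>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

The sketch only speaks a different prompt in the uncertain case; an actual confirmation step would need a further field or subdialog, which is left open here.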