Microsoft Speech Platform Software Development Kit (SDK) Release Notes

Welcome to the Microsoft Speech Platform Software Development Kit (SDK) Release Notes. Refer to this file for information regarding known issues about the application programming interface (API) and tools for the Microsoft Speech Platform.

About This Document

Installation Notes

Known Issues in Installation

Known Issues in Tools

Known Issues in the VoiceXML API and Interpreter

Other Known Issues

Copyright

© 2010 Microsoft Corporation. All rights reserved.

About This Document

This document contains important information that you should know before you use the Microsoft Speech Platform SDK. For additional information about this SDK and API, refer to the documentation available at: http://go.microsoft.com/fwlink/?LinkId=198424

Installation Notes

Before installing the MicrosoftSpeechPlatform.msi, be sure to uninstall any previous versions of the Speech Platform Runtime or the Speech Platform SDK. After uninstalling the previous versions, install the following:

  1. MicrosoftSpeechPlatformSDK.msi (version 10.2)
  2. SpeechPlatformRuntime.msi (version 10.2)

Note that there are 32-bit and 64-bit versions of both of the above installation files.

File locations

Known Issues in Installation

After runtime installation, no speech recognition or text-to-speech languages are available.

Issue: After runtime installation, no speech recognition or text-to-speech languages are available.

Resolution: The speech recognition (SR) and text-to-speech (TTS) languages are not installed by default. To install them, download the needed SR or TTS language installer from here: http://go.microsoft.com/fwlink/?LinkID=189279

Windows XP is not supported for this release.

Issue: This version of the Microsoft Speech Platform SDK has not been validated or tested on Windows XP.

Resolution: Developers on Windows XP can upgrade to Windows Vista or Windows 7.

The VoiceXML API cannot be used as is.

Issue: The VoiceXML API included in the Microsoft Speech Platform requires you to develop a class derived from the included abstract Browser class.

Resolution: The Browser class included in this API is a generic base class on which other, platform-specific classes can be developed. The UCMA 3.0 SDK includes a UCMA-specific assembly and derived class that works with UCMA and OCS. It also includes sample code illustrating how to use the API.

Known Issues in Tools

  1. In the Excel template "SampleAnalysisReport.xltm" for viewing results from the SimulatorResultsAnalyzer tool, on the first worksheet's graph, the X axis is incorrectly labeled "FR/all" and should be labeled "FA/all".
  2. To use either the Simulator tool or the PrepareGrammar tool, a local speech engine and language pack must first be installed. The speech engine is then configured with a configuration XML file that is passed to the tool as a command-line parameter. See the help documentation for details.
  3. The Simulator and PrepareGrammar tools can be configured to work only with a recognizer that is local to the SDK tools. The SDK installation does provide a sample configuration file for using the tools with a web service recognizer; however, the tools in this version do not support that configuration. Use the sample configuration file "RecoConfig_Local_Engine_Example.xml" instead.

Regarding the SampleAnalysisReport.xltm Excel template (e.g., viewing a CA/FA graph)

The "SampleAnalysisReport.xltm" file is an Excel template with a set of macros that load and display the output file generated by the SimulatorResultsAnalyzer.exe tool. For example, the template generates summary statistics, including a CA/FA graph that can give insight into the impact of varying confidence thresholds for a particular dialogue state. Considerations and known issues are:

Here's a step-by-step process that takes you from using the Simulator tool to viewing the results in the Excel template:

  1. Run a set of utterances (at least 500) through the Simulator tool. Transcriptions are required.
  2. Run the output file that the Simulator tool generates through the SimulatorResultsAnalyzer tool.
  3. If you have not already done so, save a copy of the original "SampleAnalysisReport.xltm" file to a user-specific location.
  4. Open the saved copy of the template in Excel. At the top of the first worksheet (named "analysis"), click the button labeled "Add analysis results".
  5. From the file dialog, load the output file generated in step 2 above. Loading a second set of results from SimulatorResultsAnalyzer.exe into an Excel template that already has a first set of results loaded will display both sets in the same graph (e.g., indicating changes to the CA/all vs. FA/all confidence threshold setting).
  6. If you change your grammars (e.g., after improving them), you can repeat steps 1 and 2 above before loading the results into the Excel template.
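
To make the CA/FA graph more concrete, the following Python sketch computes one point of such a curve for a given confidence threshold. It is not the SimulatorResultsAnalyzer implementation. It assumes that CA means "correct accept", that FA means "false accept", and that "/all" means dividing by the total number of utterances. The (confidence, is_correct) pairs are a hypothetical input format, not the tool's actual output.

```python
def ca_fa_rates(results, threshold):
    """Return (CA/all, FA/all) for one confidence threshold.

    results: list of (confidence, is_correct) pairs, one per utterance
             (hypothetical format, assumed for illustration only).
    """
    total = len(results)
    if total == 0:
        raise ValueError("at least one result is required")
    # An utterance is accepted when its confidence reaches the threshold.
    accepted = [ok for confidence, ok in results if confidence >= threshold]
    correct_accepts = sum(1 for ok in accepted if ok)
    false_accepts = len(accepted) - correct_accepts
    return correct_accepts / total, false_accepts / total


def sweep(results, thresholds):
    """Compute one (threshold, CA/all, FA/all) row per threshold."""
    return [(t, *ca_fa_rates(results, t)) for t in thresholds]
```

Calling sweep with a range of thresholds gives the kind of series that can be plotted as CA/all against FA/all, so that two grammar versions can be compared at the same threshold.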

Known Issues in the VoiceXML API and Interpreter

File name and other details are not available when a file referenced in a VoiceXML document or grammar file is not found

Issue: When a grammar or other file referenced from within a VoiceXML document or grammar file is not found, the error message does not include details such as the name of the file that could not be loaded. In particular, this makes it difficult to debug missing files referenced from within a grammar in binary (cfg) format, since the binary file is not human-readable.

Resolution: During development, use human-readable grammar files (grxml) when referencing other grammars. Also be sure to check the log files if the VoiceXML interpreter throws an error.badfetch and you do not understand the cause.

Unable to properly record some utterance files

Issue: In a VoiceXML session, a user utterance can be recorded only if it matches a grammar rule closely enough for the engine to form a hypothesis. If there is no match at all, the audio will not be recorded.

Resolution: If reliable recording is always needed, for example for legal reasons, a UCMA Recorder object can be attached to the AudioVideoCall object and started before the VoiceXML browser object session is started.

Malformed VoiceXML documents are not always reported as such.

Issue: Some VoiceXML document parsing errors are not caught at document load time, but are caught later in the process. In such cases the ExitReason on the VoiceXmlResult object may be incorrectly set to DialogEndEncountered rather than MalformedXml.

Resolution: If you suspect that the page was not processed properly, make sure that logging is enabled and check the appropriate log file. The file will generally contain the appropriate error message and the line in the VoiceXML document where the error occurred.

DTMF "#" input key is not recognized in a VoiceXML application.

Issue: The "#" key is not recognized as a valid key in DTMF mode even after it is added to the DTMF grammar and the termchar attribute is set to the empty string.

Resolution: Application developers must assign one or more DTMF characters to the termchar attribute (assigning the empty string to termchar results in the default value of "#"). One possible workaround is to set termchar to an uncommonly used DTMF character such as DTMF-A, DTMF-B, DTMF-C, or DTMF-D in place of "#", which is the default value.

Assigning a bargein type hotword may not be honored.

Issue: Assigning a bargein type hotword may not be honored.

Resolution: The attributes of the first prompt to be played during a new recognition are used for all prompts in the field. If bargeintype hotword is needed, it is recommended that the bargeintype hotword attribute be assigned to the first prompt to be played during a recognition.

Setting the Speech Recognition Sensitivity Property in a VoiceXML Document Has No Effect

Issue: This version of the VoiceXML API does not support setting the sensitivity property of the Speech Recognizer, as explained in the W3C VoiceXML Standard Version 2.0. Setting the sensitivity property will have no effect.

Resolution: The default value is 0.5, which works well in most situations.

Recordings that begin during a prompt may have an initial period of silence

Issue: When a recording of voice input begins during a prompt due to bargein, there may be an initial period of silence.

Resolution: In these cases the recording begins at the beginning of the prompt, but the duration shadow variable on the recording variable is computed from when speech is detected. A VoiceXML application author can therefore compute the duration of the initial silence in the recording by subtracting the duration from the total length of the recording.
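
The subtraction described above can be sketched in Python as follows. This is an illustration only: it assumes that the recording has been saved as a WAV file and that the value of the duration shadow variable, in milliseconds, has been passed to the hosting application. The function names are hypothetical.

```python
import wave


def wav_length_ms(wav_file):
    """Total length of a WAV recording in milliseconds.

    wav_file may be a path or a binary file-like object.
    """
    with wave.open(wav_file, "rb") as recording:
        return round(1000 * recording.getnframes() / recording.getframerate())


def initial_silence_ms(total_length_ms, speech_duration_ms):
    """Initial silence = total recording length - reported speech duration."""
    if speech_duration_ms > total_length_ms:
        raise ValueError("duration cannot exceed the total recording length")
    return total_length_ms - speech_duration_ms
```

For example, a 2000 ms recording whose duration shadow variable reports 1500 ms starts with 500 ms of silence.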

A semantic interpretation returned from an SRGS grammar as an ECMAScript object with a single property of "_value" is not returned as expected.

Issue: If the semantic interpretation from an SRGS grammar is an ECMAScript object with a single property of "_value", the contents of the "_value" property will be returned rather than the entire object.

Resolution: Grammar developers should avoid using code of the following form in SRGS grammars, which returns an object with a single property named "_value": <tag>out._value=3</tag>. Instead, use one of the following forms to return the primitive value directly: <tag>3</tag> or <tag>out=3</tag>.
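
The behavior described in this issue can be modeled with a few lines of Python. This is only a model of the rule as stated above, using a dict as a stand-in for the ECMAScript object; it is not the interpreter's code, and the function name is hypothetical.

```python
def returned_interpretation(out):
    """Model of the described rule: an object whose only property is
    "_value" is replaced by the contents of that property."""
    if isinstance(out, dict) and set(out) == {"_value"}:
        return out["_value"]
    # Primitive values and objects with other properties are returned as-is.
    return out
```

Under this model, {"_value": 3} and the primitive 3 produce the same result, which is why returning the primitive directly avoids any surprise.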

Calling Dispose before the VoiceXML session ends may cause unpredictable behavior.

Issue: Calling Dispose before the VoiceXML session ends may cause unpredictable behavior including second-chance exceptions which result in an application crash.

Resolution: The VoiceXML hosting application should not call Dispose on a VoiceXML Browser object before receiving the SessionCompleted event. Moreover, in general, the application should not call Dispose from within Browser event handlers, including the handler for SessionCompleted.
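
One way to follow both rules is to let the event handler do nothing but signal, and to dispose from the owning thread once the handler has returned. The Python sketch below illustrates that pattern only. FakeBrowser is a hypothetical stand-in and does not reflect the real Browser API, whose members are not described in this document.

```python
import threading


class FakeBrowser:
    """Hypothetical stand-in for a VoiceXML Browser object (not the real API)."""

    def __init__(self):
        self.session_completed = None  # callback slot, playing the role of an event
        self.disposed = False
        self._in_handler = False

    def start_session(self):
        """Simulate a session that ends on a worker thread and raises the event."""
        def work():
            self._in_handler = True
            try:
                if self.session_completed is not None:
                    self.session_completed()
            finally:
                self._in_handler = False

        worker = threading.Thread(target=work)
        worker.start()
        return worker

    def dispose(self):
        if self._in_handler:
            raise RuntimeError("dispose() called while an event handler is running")
        self.disposed = True


def run_then_dispose(browser):
    completed = threading.Event()
    browser.session_completed = completed.set  # the handler only signals
    worker = browser.start_session()
    completed.wait()   # wait for the session-completed notification
    worker.join()      # make sure the handler itself has returned
    browser.dispose()  # dispose from the owning thread, outside any handler
```

The design choice here is that disposal is always owned by the code that created the object, never by a callback that the object itself invokes.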

There is a default limit of 200 on the number of VoiceXML documents ("pages") that can be processed in one session.

Issue: Loading and processing more than 200 pages in one session will cause the VoiceXML browser to exit.

Resolution: If your VoiceXML application loads more than 200 VoiceXML pages and encounters this issue you can change the default page limit to a higher value by creating a new key in the registry

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech Server\VxmlPageLoadLimit

of type DWORD with the value you wish to use.
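
As an illustration, the Python sketch below builds the text of a .reg file for this setting. It assumes that VxmlPageLoadLimit is a DWORD value under the "Speech Server" key (the wording above is ambiguous on this point), and the limit of 500 in the usage note is an arbitrary example. The function name is hypothetical.

```python
def page_limit_reg_file(limit):
    """Return .reg file text that sets VxmlPageLoadLimit to the given limit.

    Assumption: VxmlPageLoadLimit is a DWORD value under the key
    HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Speech Server.
    """
    if not isinstance(limit, int) or not 0 < limit <= 0xFFFFFFFF:
        raise ValueError("limit must be a positive 32-bit integer")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        r"[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech Server]",
        # DWORD data in a .reg file is written as eight hexadecimal digits.
        '"VxmlPageLoadLimit"=dword:{:08x}'.format(limit),
        "",
    ]
    return "\n".join(lines)
```

For example, page_limit_reg_file(500) can be written to a file and imported with the Registry Editor. Changing values under HKEY_LOCAL_MACHINE requires administrative rights.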

Notes:

Other Known Issues

Guidelines for securely deploying a VoiceXML hosting application

Issue: Deploying a VoiceXML hosting application may make it vulnerable to attacks such as denial of service, tampering, or information disclosure.

Resolution: General guidelines for deploying a VoiceXML hosting application are as follows

Please see the Microsoft Speech Platform Runtime Release Notes for more information

Issue: The Microsoft Speech Platform SDK is the software development kit for the Microsoft Speech Platform Runtime. The Microsoft Speech Platform Runtime has separate release notes and they should be reviewed.

Resolution: To view the Microsoft Speech Platform Runtime Release Notes please go to http://go.microsoft.com/fwlink/?LinkID=198426

Copyright

Information in this document, including URL and other Internet Web site references, is subject to change without notice. Unless otherwise noted, the companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted in examples herein are fictitious. No association with any real company, organization, product, domain name, e-mail address, logo, person, place, or event is intended or should be inferred. Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

© 2010 Microsoft Corporation. All rights reserved.

Microsoft, Windows, Windows Live, Active Directory, Internet Explorer, MSN, Outlook, and SQL Server are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.

All other trademarks are property of their respective owners.
