Archive for the ‘Track D’ Category

D202: Managing Phone Calls

Tuesday, August 9th, 2011

Track D: Technology Advances – Tue, August 9
11:45 a.m. – 12:30 p.m.

Learn how to use W3C’s Call Control eXtensible Markup Language (CCXML) for managing calls, including routing calls based on data collected via a dialogue with the caller. CCXML can help establish multi-party conferences and add and remove participants. It enables organizations to place one or more outbound calls. Organizations can use the markup language to create “follow me” and “find me” applications that find the people you are trying to call by dialing their cell phone, home phone, and office phone in parallel. CCXML can also enable call center applications to intelligently gather information from the caller and then pass that information on to the call center agent.

Presented by: R.J. Auburn, Paolo Baggia, Daniel C Burnett

Attendee Lunch

Tuesday, August 9th, 2011

Track A: Business Strategies – Tue, August 9 
12:30 p.m. – 1:45 p.m.

D203: Speaking With Your Home Entertainment Devices

Tuesday, August 9th, 2011

Track D: Technology Advances – Tue, August 9
1:45 p.m. – 2:45 p.m.

Presented by: Patrick Nguyen

Talking to Your TV: Tales From the Design of Xbox Kinect

Kinect eliminates the need for a controller and instead relies on a combination of speech and body gestures to interact with the game. How can these two input modes be combined to enhance one another? How do you deal with the challenges of multiple speakers, background noise, and distance from the microphone? How do you use visual feedback to support error recovery? Gaming is all about having fun. How can we ensure that speech enhances this experience? Attend this session for the answers to these questions.

Presented by: Matt Klee

Model User Behavior for Controlling Home Entertainment

As home entertainment systems start to offer hundreds of channels and movies on demand, subscribers can no longer practically search for content using program listings. We describe experiments using a voice-activated system that helps users find programs by genre, title, cast names, time/date, etc. Hierarchical statistical language models enable a combination of historical behavior with current popularity ratings to increase recognition accuracy. A demo of a working prototype will show the interaction between the user, the speech recognizer, and the video display.

Presented by: Michael Johnston

D204: Automatically Generate Call Flows

Tuesday, August 9th, 2011

Track D: Technology Advances – Tue, August 9
3:00 p.m. – 3:45 p.m.

Presented by: Emmett Coin

Optimize the Obvious: Automatic Call Flow Generation

In commercial spoken dialogue systems, call flows are traditionally built by call flow designers with a predefined business logic. This talk presents a method for automatically deriving a call flow, minimizing the average number of user turns given a business logic and a frequency distribution of call reasons. As an example, the method was applied to a call routing application whose manually built call flow is processing about 4 million calls per month and whose call reason distribution served to measure the impact of the automatic call flow generation.

Presented by: David Suendermann

Automatically Generating Call Flows

When the number of possible caller intents reaches a certain point in applications such as call steering, voice search, and FAQ, few enterprises can afford the required months of utterance analysis, grammar development, and tuning. This session presents an innovative solution to the problem of handling complex caller intents. A dialogue manager automatically generates the call flow from a model of possible intents. Examples discussed include a large natural language application deployed in the travel industry.

Presented by: Patrick Nguyen

Break in the Exhibit Hall

Tuesday, August 9th, 2011

Track A: Business Strategies – Tue, August 9 
3:45 p.m. – 4:30 p.m.

D205: Standard Languages for Implementing Voice Applications

Tuesday, August 9th, 2011

Track D: Technology Advances – Tue, August 9
4:30 p.m. – 5:30 p.m.

While VoiceXML 2.0 was the most significant speech standard in recent history, it is not the last. The convergence of the phone (as a voice device) and the internet (as a data medium) has driven standards to support both with easy-to-use designs. This talk presents the variety of speech and related standards that have been developed and are under development now, including both the W3C HTML Speech Incubator Group’s efforts and VoiceXML 3.0, explaining where they fit into the converged world we all now experience.

Presented by: Daniel C Burnett, Paolo Baggia

Networking Reception

Tuesday, August 9th, 2011

Track D: Technology Advances – Tue, August 9
5:30 p.m. – 7:30 p.m.

During the reception you can visit the consultants’ lounge for one-on-one discussions over drinks. 

D301: Innovative Uses of Speech Technology

Wednesday, August 10th, 2011

Track D: Technology Advances – Wed, August 10
9:00 a.m. – 10:00 a.m.

Presented by: David L Thomson

Using Speech Applications in Robotics

The use of robotics, such as the VGo telepresence robot, is becoming widespread. These robots represent the user in remote locations. The robot uses text-to-speech synthesis to communicate with those around it. Find out how managers, executives, and students use the robot to navigate through a workspace and interact with its inhabitants.

Presented by: Brad Kayton

Language Learning With Speech and 3D Video

The benefit of using speech-enabled animated video for teaching language has been well-demonstrated. Animated video captures the attention of the student, and the professionally recorded audio track provides a pronunciation target, which students attempt to match. Speech recognition provides an indication of their degree of mastery. Combining 3D video with speech technology provides a more compelling vehicle for language instruction. This presentation focuses on techniques for integrating cloud-based speech technology with 3D video as a platform for teaching language over the public internet.

Presented by: K.W. Bill Scholz

Break in the Exhibit Hall

Wednesday, August 10th, 2011

Track A: Business Strategies – Wed, August 10 
10:00 a.m. – 10:45 a.m.

D302: Improving Speech Input

Wednesday, August 10th, 2011

Track D: Technology Advances – Wed, August 10
10:45 a.m. – 11:45 a.m.

Presented by: Peter Voss

Measuring the Impact of Headset Design on Recognition Accuracy

This session describes a series of experiments to quantify the impact of headset features on speech recognition accuracy. These experiments, conducted both in the lab and on the road, show that improved perceptual quality from noise reduction, as judged by human listeners, does not necessarily translate into higher recognition accuracy.

Presented by: David L Thomson

Hearing the Whole Voice: How Wideband Audio Dramatically Improves Speech Recognition

This session explains the difference between traditional narrowband audio and wideband audio, outlines the current state of available wideband audio options, and discusses what you need to consider when evaluating wideband audio for both servers and clients/devices. Participants will leave with an understanding of how and where they can use wideband audio to improve their speech applications and create better customer experiences.

Presented by: R.J. Auburn