D302: An Intelligent Assistant for High-Level Task Understanding

Track D: INFRASTRUCTURE – Wednesday 26 April 
11:45 – 12:30

Current intelligent agents (IAs) are limited to specific domains, yet people often engage in activities that span multiple domains and must manage context and information transfer on their own. An ideal personal IA would be able to discover such (recurring) activities and learn their structure in order to support interaction with the user. The result would be custom applications supporting personal activities. We discuss our work creating agents that autonomously configure spoken language interfaces for this purpose.

Presented by: Alexander Rudnicky

B302: PANEL: Adding Visuals to Voice

Track B: SELF-SERVICE TECHNOLOGIES – Wednesday 26 April 
11:45 – 12:30

Traditional IVR systems limit users to speaking and listening. Enhancing voice-only communications with visual information, including menus, directories, photos, diagrams, fill-in forms, receipts, and tickets, adds new capabilities to self-help systems. Security may also be strengthened by combining voice speaker identification with face recognition. Developers who have built visual/voice systems relate their experiences developing and using them and offer advice on adopting such a system for an organization.

Presented by: Crispin Reedy, Thomas Wilson, Chris du Toit, Jo Roman

A302: Speech Technology for Augmenting Language Learning Experiences

Track A: INNOVATIVE USES OF ASR – Wednesday 26 April 
11:45 – 12:30

Gaining oral language proficiency without an instructor can be difficult. We present practical issues surrounding the creation of computer-assisted language learning software that incorporates speech technology and describe how breaking oral language instruction down into machine-solvable problems allows speech interfaces to play the role of instructor. We also discuss how to provide computer-generated feedback for pronunciation training. A tight interplay between UI/UX design and core speech technology is key to creating immersive speech experiences for users.
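
As one concrete illustration of a "machine-solvable" piece of pronunciation training (a sketch of a common approach, not necessarily the presenter's method), the TypeScript below aligns the phone sequence a recognizer heard against a reference pronunciation using edit distance and turns the differences into feedback. The phone labels and example word are hypothetical.

```typescript
// Minimal sketch: flag likely mispronunciations by aligning the phones a
// recognizer heard against a reference pronunciation (hypothetical data).

type Edit = { op: 'match' | 'sub' | 'del' | 'ins'; ref?: string; hyp?: string };

// Standard edit-distance alignment with a backtrace over the DP table.
function alignPhones(ref: string[], hyp: string[]): Edit[] {
  const d: number[][] = [];
  for (let i = 0; i <= ref.length; i++) {
    d[i] = [];
    for (let j = 0; j <= hyp.length; j++) {
      if (i === 0) d[i][j] = j;
      else if (j === 0) d[i][j] = i;
      else {
        const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
        d[i][j] = Math.min(d[i - 1][j - 1] + cost, d[i - 1][j] + 1, d[i][j - 1] + 1);
      }
    }
  }
  // Walk back from the corner to recover the edit operations.
  const edits: Edit[] = [];
  let i = ref.length, j = hyp.length;
  while (i > 0 || j > 0) {
    if (i > 0 && j > 0 && d[i][j] === d[i - 1][j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1)) {
      edits.unshift({ op: ref[i - 1] === hyp[j - 1] ? 'match' : 'sub', ref: ref[i - 1], hyp: hyp[j - 1] });
      i--; j--;
    } else if (i > 0 && d[i][j] === d[i - 1][j] + 1) {
      edits.unshift({ op: 'del', ref: ref[i - 1] }); i--;  // learner dropped a phone
    } else {
      edits.unshift({ op: 'ins', hyp: hyp[j - 1] }); j--;  // learner added a phone
    }
  }
  return edits;
}

// Hypothetical example: "th" in "think" realized as "s" (a common substitution).
const reference = ['TH', 'IH', 'NG', 'K'];
const heard = ['S', 'IH', 'NG', 'K'];
for (const e of alignPhones(reference, heard)) {
  if (e.op !== 'match') console.log(`feedback: ${e.op} ref=${e.ref ?? '-'} heard=${e.hyp ?? '-'}`);
}
```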

Presented by: Emily Soward

STKU-4: Using a Data-Driven Approach to Design, Build, & Tune Spoken Dialogue Systems

SpeechTEK University – Wednesday 26 April 
13:30 – 16:30

This workshop addresses the whole lifecycle of using data-driven approaches to design, train, and tune practical dialogue systems. The workshop focuses on natural language solutions in call center applications, but many of the techniques are equally applicable to building robust intelligent assistants. Topics covered in the workshop include using live Wizard-of-Oz techniques to test dialogue strategies and gather early customer language for semantic design; managing data collections; semantic annotation (including multi-dimensional semantics); training, testing, and tuning grammars; and data-driven approaches to optimizing dialogue and system performance.
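
To make the "tune with data" loop concrete, here is a minimal, hypothetical TypeScript harness in the spirit of the workshop (not its actual materials): it scores a stub grammar against semantically annotated utterances and reports exact-match accuracy, so each grammar change can be measured against the same test set rather than guessed at.

```typescript
// Illustrative tuning harness: measure a grammar's semantic accuracy on
// annotated utterances (all rules and data below are hypothetical).

type Semantics = Record<string, string>;
interface AnnotatedUtterance { text: string; expected: Semantics }

// Stand-in for a real grammar/NLU parse: here, a couple of regex rules.
function parse(text: string): Semantics {
  const m = text.match(/pay (?:my )?(\w+) bill/i);
  if (m) return { intent: 'pay_bill', account: m[1].toLowerCase() };
  if (/balance/i.test(text)) return { intent: 'check_balance' };
  return { intent: 'unknown' };
}

function sameSemantics(a: Semantics, b: Semantics): boolean {
  const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
  return [...keys].every(k => a[k] === b[k]);
}

// A test set would normally come from transcribed, annotated call data.
// The third utterance is deliberately outside the rules' coverage, so the
// harness surfaces it as a gap to fix in the next tuning pass.
const testSet: AnnotatedUtterance[] = [
  { text: 'I want to pay my phone bill', expected: { intent: 'pay_bill', account: 'phone' } },
  { text: "what's my balance", expected: { intent: 'check_balance' } },
  { text: 'pay the gas bill please', expected: { intent: 'pay_bill', account: 'gas' } },
];

let correct = 0;
for (const u of testSet) {
  const got = parse(u.text);
  if (sameSemantics(got, u.expected)) correct++;
  else console.log(`miss: "${u.text}" -> ${JSON.stringify(got)}`);
}
console.log(`semantic accuracy: ${(100 * correct / testSet.length).toFixed(1)}%`);
```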

Presented by: David Attwater

STKU-5: Deep Neural Networks in Speech Recognition

SpeechTEK University – Wednesday 26 April 
13:30 – 16:30

Deep learning is setting new standards of accuracy for financial projections, image processing, advertising, translation, games, and virtually every field where massive databases are used to train systems for estimation, classification, and prediction. This tutorial reviews recent advances in machine learning with a focus on deep neural networks (DNNs) for speech recognition and natural language processing. The session includes demonstrations and hands-on exercises; we recommend that participants bring a laptop. Attendees gain an understanding of DNN fundamentals, how DNNs are used in acoustic and language modeling, and where the technology appears to be headed.
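
For a feel of the fundamentals, here is a toy TypeScript forward pass for a DNN acoustic model: one feature frame in, a posterior distribution over phone-like classes out. The layer sizes and random weights are placeholders (real systems learn the weights from large transcribed corpora); this is an illustrative sketch, not the tutorial's code.

```typescript
// Toy DNN forward pass for acoustic modeling: map one feature frame
// (e.g. MFCCs) to a distribution over phone-like classes.

type Matrix = number[][];

function relu(v: number[]): number[] { return v.map(x => Math.max(0, x)); }

function softmax(v: number[]): number[] {
  const m = Math.max(...v);
  const exps = v.map(x => Math.exp(x - m));  // subtract max for numerical stability
  const s = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / s);
}

// One fully connected layer: y = W x + b.
function layer(W: Matrix, b: number[], x: number[]): number[] {
  return W.map((row, i) => row.reduce((acc, w, j) => acc + w * x[j], b[i]));
}

// Random stand-in weights; a trained model would load these from disk.
function randomMatrix(rows: number, cols: number): Matrix {
  return Array.from({ length: rows }, () =>
    Array.from({ length: cols }, () => (Math.random() - 0.5) * 0.2));
}

// Hypothetical sizes: 39-dim MFCC frame, two hidden layers, 40 phone classes.
const [inDim, hidden, classes] = [39, 128, 40];
const W1 = randomMatrix(hidden, inDim), b1 = new Array(hidden).fill(0);
const W2 = randomMatrix(hidden, hidden), b2 = new Array(hidden).fill(0);
const W3 = randomMatrix(classes, hidden), b3 = new Array(classes).fill(0);

const frame = Array.from({ length: inDim }, () => Math.random());
const posteriors = softmax(layer(W3, b3, relu(layer(W2, b2, relu(layer(W1, b1, frame))))));
console.log('most likely class:', posteriors.indexOf(Math.max(...posteriors)));
```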

Presented by: David L Thomson

STKU-6: Developing Multimodal Applications for New Platforms

SpeechTEK University – Wednesday 26 April 
13:30 – 16:30

Multimodal interfaces, combining speech, graphics, and sensor input, are becoming increasingly important for interaction with the rapidly expanding variety of nontraditional platforms, including mobile, wearables, robots, and devices in the Internet of Things. User interfaces on these platforms will need to be much more varied than traditional user interfaces. We demonstrate how to develop multimodal clients using standards such as WebRTC, WebAudio, and Web Sockets and the Open Web Platform, including open technologies such as HTML5, JavaScript, and CSS. We also discuss integration with cloud resources for technologies such as speech recognition and natural language understanding. Attendees should have access to a browser that supports the Open Web Platform standards, for example, the current versions of Chrome, Firefox, or Opera. Basic knowledge of HTML5 and JavaScript would be very helpful.
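
A minimal browser-side sketch of the capture-and-stream pattern the session describes, written in TypeScript against the standard WebAudio and WebSocket APIs; the recognizer endpoint and its message format are placeholders, not a real service.

```typescript
// Browser sketch: capture microphone audio with WebAudio and stream raw
// frames to a cloud recognizer over a WebSocket.

async function streamMicrophone(): Promise<void> {
  const socket = new WebSocket('wss://example.com/asr');  // hypothetical ASR endpoint
  socket.onmessage = (ev) => console.log('transcript:', ev.data);

  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext();
  const source = ctx.createMediaStreamSource(stream);

  // ScriptProcessorNode is the older, widely supported capture hook;
  // newer code would use an AudioWorklet instead.
  const processor = ctx.createScriptProcessor(4096, 1, 1);
  processor.onaudioprocess = (e: AudioProcessingEvent) => {
    const samples = e.inputBuffer.getChannelData(0);  // Float32 PCM frame
    if (socket.readyState === WebSocket.OPEN) {
      socket.send(samples.slice(0));  // copy before sending; buffer is reused
    }
  };
  source.connect(processor);
  processor.connect(ctx.destination);  // keep the node alive in the audio graph
}

// Typically started from a user gesture, e.g. a push-to-talk button:
// document.querySelector('button')?.addEventListener('click', streamMicrophone);
```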

Presented by: Deborah Dahl

STKU-7: Voice Experience Design for Alexa Skills

SpeechTEK University – Wednesday 26 April 
13:30 – 16:30

Join us to learn about creating within the Alexa ecosystem using the Alexa Skills Kit. We cover general capabilities and use real-world example skills to illustrate voice experience design best practices. Attendees experience prototyping techniques and work in groups to define and prototype a skill. Before coming, please sign up at developer.amazon.com. And be sure to bring your laptop!
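
For orientation, a minimal skill handler is sketched below using the ASK SDK v2 for Node.js (ask-sdk-core); the SDK has evolved since the session, and the skill's wording is invented. The voice-design point is the speak/reprompt pair, which keeps the session open and suggests what to say next instead of going silent.

```typescript
// Minimal Alexa skill sketch (ASK SDK v2 for Node.js), deployed as an
// AWS Lambda handler. All prompts here are hypothetical.
import * as Alexa from 'ask-sdk-core';

const LaunchRequestHandler: Alexa.RequestHandler = {
  canHandle(handlerInput) {
    return Alexa.getRequestType(handlerInput.requestEnvelope) === 'LaunchRequest';
  },
  handle(handlerInput) {
    return handlerInput.responseBuilder
      .speak('Welcome to the demo skill. You can ask for a tip.')
      .reprompt('Try saying: give me a tip.')  // re-ask instead of ending the session
      .getResponse();
  },
};

exports.handler = Alexa.SkillBuilders.custom()
  .addRequestHandlers(LaunchRequestHandler)
  .lambda();
```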

Presented by: David Bliss, Phillip Hunter