Spoken dialogue system performance can vary widely for different users, as well as for the same user during different dialogues. Since the dialogue strategies used by a system strongly influence performance, an ideal system would not use a single fixed strategy, but would instead adapt to the circumstances at hand. To do so, a system must be able to identify dialogue properties that suggest when adaptation is appropriate. For instance, a system that detects poor speech recognition performance for a particular user might switch to a dialogue strategy with more explicit prompting.
We have used a machine learning approach to predict when a system should adapt its dialogue strategies, using only features automatically available at system run-time. In particular, based on human-machine training dialogues, we have learned rules for identifying situations where speech recognition is performing poorly, and where the system should "bail out" to a human operator. Our results show a significant improvement over baseline approaches, and illustrate how the use of different knowledge sources (e.g., lower-level acoustic features, higher-level dialogue features) can impact performance.
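To illustrate what such run-time features and learned rules might look like, the following is a minimal sketch in Python. The feature names (mean ASR confidence, fraction of rejected utterances, number of help requests) and the training examples are hypothetical, and a standard decision-tree classifier stands in for the rule learner; the actual feature set and learning algorithm used here may differ.

# Hypothetical sketch: learning to flag dialogues with poor ASR
# performance from features available at run-time. Feature names
# and data are illustrative, not the actual feature set.
from sklearn.tree import DecisionTreeClassifier, export_text

# Each row: [mean ASR confidence, fraction of rejected utterances,
#            number of user help requests]
X_train = [
    [0.92, 0.00, 0],   # smooth dialogue
    [0.85, 0.05, 1],
    [0.40, 0.30, 4],   # repeated misrecognitions
    [0.55, 0.20, 3],
    [0.88, 0.02, 0],
    [0.35, 0.40, 5],
]
# Label: 1 = poor recognition (adapt or bail out), 0 = acceptable
y_train = [0, 0, 1, 1, 0, 1]

clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(X_train, y_train)

# Decision trees yield rule-like tests over the features, loosely
# analogous to learned rules such as "if confidence < t, flag a problem".
print(export_text(clf, feature_names=[
    "mean_confidence", "pct_rejected", "help_requests"]))

# At run-time, the system computes the same features for the current
# dialogue and decides whether to adapt its strategy.
print(clf.predict([[0.45, 0.25, 3]]))  # -> [1]: likely recognition trouble

In this sketch, lower-level acoustic evidence (confidence scores) and higher-level dialogue evidence (help requests) enter as separate features, so their individual contributions to prediction accuracy can be compared by training on different feature subsets.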
To apply our learned rules, we have designed and evaluated an adaptive version of TOOT, a spoken dialogue system for retrieving online train schedules. Based on rules learned from a set of training dialogues, adaptive TOOT constructs a user model representing whether the user is having speech recognition problems as a particular dialogue progresses. Adaptive TOOT then automatically adapts its dialogue strategies based on this dynamically changing user model. An empirical evaluation of the system demonstrates the utility of the approach. In particular, by adapting the dialogue strategies of TOOT in response to inferences regarding repeated ASR misrecognitions, we significantly improve the task success rate.
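To make the adaptation loop concrete, the sketch below shows one plausible shape for a dynamically updated user model driving a strategy switch. All names here (UserModel, choose_strategy, the confidence threshold, the strategy labels) are hypothetical illustrations, not adaptive TOOT's actual implementation, which is described in the body of the paper.

# Hypothetical sketch of the adaptation loop: a per-dialogue user model
# is updated after each turn, and the dialogue strategy is chosen from
# the current model state. Names and thresholds are illustrative.
from dataclasses import dataclass

@dataclass
class UserModel:
    """Tracks evidence of ASR trouble as the dialogue progresses."""
    misrecognitions: int = 0
    turns: int = 0

    def update(self, asr_confidence: float, threshold: float = 0.5) -> None:
        self.turns += 1
        if asr_confidence < threshold:
            self.misrecognitions += 1

    @property
    def having_trouble(self) -> bool:
        # Infer trouble after repeated low-confidence turns.
        return self.misrecognitions >= 2

def choose_strategy(model: UserModel) -> str:
    # Switch from open-ended prompts to explicit, system-initiative
    # prompting once repeated misrecognitions are inferred.
    if model.having_trouble:
        return "system_initiative_explicit"
    return "user_initiative_open"

model = UserModel()
for confidence in [0.9, 0.4, 0.35, 0.8]:   # per-turn ASR confidence scores
    model.update(confidence)
    print(model.turns, choose_strategy(model))

Under this sketch, the strategy changes mid-dialogue as soon as the model accumulates enough evidence of recognition problems, mirroring the kind of run-time adaptation evaluated here.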