Does Treating Student Uncertainty as a Learning Impasse Improve Learning in Spoken Dialogue Tutoring?
Kate Forbes-Riley and Diane Litman
|PI||Kate Forbes-Riley and Diane Litman|
|Other Contributors||Scott Silliman, Amruta Purandare (Pitt)|
|Study Start Date||10-1-06|
|Study End Date||5-30-07|
|Number of Students||N = 60|
|Total Participant Hours||2.5 hrs per participant.|
|DataShop||Target date: June 15, 2007|
Most existing tutoring systems respond based on the correctness of student answers. The tutoring community has argued that student incorrectness and student uncertainty both represent learning impasses (and thus opportunities to learn), and has also shown correlations between uncertainty and learning. Even so, very few controlled experiments to date have investigated whether system responses to student uncertainty improve learning. This controlled experiment therefore tests, under “ideal” system conditions, the hypothesis that responding to student uncertainty improves learning.
This study uses a Wizard of Oz (WOZ) version of a qualitative physics spoken dialogue tutoring system, called ITSPOKE, which shares technology with TuTalk. This version of ITSPOKE tutors one qualitative physics problem involving basic physics concepts (e.g. Newton's Second Law); student and tutor interact via spoken dialogue that has a (fixed) Tutor Question – (expected) Student Answer format. A human “Wizard” performs speech recognition, natural language understanding, and recognition of uncertainty, for each student answer.
This study has 3 conditions. In the experimental condition, the Wizard tells the system that all correct but uncertain student answers are incorrect. This causes the system to respond to uncertain and incorrect student answers in the same way, namely with further dialogue to reinforce the student’s understanding of the principle(s) under discussion. In the first control condition, the system responds in this way only to incorrect student answers. In the second control condition, the system also responds in this way to a randomly selected percentage of correct answers, to control for the additional tutoring in the experimental condition.
Data collection for this experiment began in December 2006. Because this is a small 8-month experiment, we use an existing WOZ infrastructure and measure only normal learning, with the expectation that a larger subsequent study measuring robust learning can be performed either in the Physics LearnLab or in other LearnLabs developing spoken dialogue systems.
Background and significance
With the underlying hypothesis that increasing the amount of student information available to the computer will increase the effectiveness of the tutoring, a number of tutoring systems have begun adding spoken language capabilities (e.g. Aist et al. (2002); Pon-Barry et al. (2006); Litman and Silliman (2004)). Adding speech is also supported by Hausmann and Chi (2002), who found that spontaneous self-explanation occurs much more frequently in spoken tutoring than in text-based tutoring, and by Chi et al. (1994), who found that spontaneous self-explanation improves learning gains during tutoring. In our prior work (Litman et al. (2006)), we found that using spoken dialogue (as opposed to typed) significantly improved learning in human tutoring, but not in computer tutoring. We hypothesize that just changing the communication modality is not enough; the system also needs to make use of the additional information in speech. Responding to student uncertainty represents one way of using this information.
Prior correlational studies have investigated the relationship between student uncertainty and learning. For example, Craig et al. (2004) observe that student uncertainty positively correlates with learning during interactions with the AutoTutor system (Graesser et al. (2005)). They theorize that uncertainty can accompany cognitive disequilibrium (Graesser and Olde (2003)), a state in which learners confront obstacles to goals, salient contrasts, equivalent alternatives, or other experiences that fail to match their expectations. Cognitive disequilibrium, and the uncertainty that accompanies it, is likely to trigger deliberation and inquiry aimed at restoring cognitive equilibrium.
However, to our knowledge, only one prior controlled experiment has investigated whether adapting to student uncertainty over and above correctness improves student learning. Pon-Barry et al. (2006) implemented two different tutor responses to uncertainty, both derived from human tutoring studies, in the SCoT-DC Shipboard Damage Control spoken dialogue computer tutor. First, their system responded to incorrect and uncertain student answers with a tutor turn that referred back to past dialogue (reminding the student of a point previously discussed). Second, their system responded to correct but uncertain student answers with a tutor turn that paraphrased the student’s correct answer. They also enhanced the SCoT-DC tutor to automatically detect a small set of signals of student uncertainty: a list of lexical hedges (e.g. “I think”), filled pauses (e.g. “um”), and high response latencies. They then conducted a controlled experiment comparing three versions of SCoT-DC: one that employed the two adaptive tutor responses only when uncertainty was detected, one that employed them after all correct or incorrect student answers, and one that did not employ them at all (instead responding to correct turns with simple acknowledgements and to incorrect turns with generic hints). They found that using the adaptive tutor responses after all student turns significantly improved learning, but they did not find significant improvements when the responses were contingent on the detection of uncertainty. However, their study had several empirical limitations that likely contributed to this null result. First, uncertainty detection was performed automatically by the system rather than by a human, and was based on only three linguistic cues.
Second, although their study varied the content of system responses to uncertainty based on how human tutors adapt to uncertainty over and above correctness, each implemented response consisted of only a single tutor turn. Our study addresses these limitations: a human Wizard detects uncertainty based on a wide range of linguistic cues, and our implemented responses to uncertainty often consist of multiple tutor turns (depending on how the answer would have been treated if it were incorrect). Our study also differs from the Pon-Barry study in the hypothesis being investigated. We are investigating whether treating student uncertainty as a learning impasse (i.e., responding to uncertain answers with the same response they would receive if they were incorrect) increases learning, whereas Pon-Barry et al. (2006) investigated whether changing the content of the response to uncertain answers in a particular way (i.e., from an acknowledgement to a paraphrase, or from a hint to a reminder) increases learning. Finally, our adaptive system adapts only to correct but uncertain student answers, because incorrect and uncertain student turns are already treated as learning impasses in our system.
The hypothesized result of our study is supported by pilot correlation studies of our previously collected and annotated ITSPOKE corpora. These correlations suggest that, with respect to increasing student learning, responding to student uncertainty will add value over responding only to correctness. In particular, we found that student correctness or incorrectness does not significantly correlate with learning in our ITSPOKE corpora. However, when correct and incorrect turns are further distinguished by uncertainty, we found a significant negative correlation between the proportion of incorrect but certain turns and learning (R = -.40, p < .01), as well as a trend for the proportion of correct but uncertain turns to correlate negatively with learning (R = -.37, p = .07). These correct but uncertain answers are learning impasses, but they are currently ignored in our system. This result suggests that ignoring these impasses can have a negative impact on learning, which in turn suggests that reacting to impasses identified by uncertainty detection could have a positive impact on learning.
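The pilot correlations above relate, for each student, the proportion of turns carrying a given correctness/uncertainty label to a learning score. A minimal sketch of that computation follows, assuming a plain Pearson correlation and a per-student learning score supplied by the caller (the exact learning measure used in the pilot studies is not given here). The helper names, label strings, and numbers are hypothetical toy values, not ITSPOKE data.

```python
from math import sqrt

# Each turn is a hypothetical (correctness, certainty) label pair, e.g.
# ("incorrect", "certain"); these strings are illustrative, not ITSPOKE's schema.


def label_proportion(turns, correctness, certainty):
    """Fraction of one student's turns that carry the given label pair."""
    if not turns:
        raise ValueError("student has no turns")
    hits = sum(1 for c, u in turns if c == correctness and u == certainty)
    return hits / len(turns)


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Toy data: four students with four turns each, plus a made-up learning score.
students = [
    [("incorrect", "certain"), ("correct", "certain"), ("correct", "uncertain"), ("correct", "certain")],
    [("incorrect", "certain"), ("incorrect", "certain"), ("correct", "certain"), ("correct", "certain")],
    [("correct", "certain"), ("correct", "certain"), ("correct", "certain"), ("correct", "certain")],
    [("incorrect", "certain"), ("incorrect", "uncertain"), ("correct", "uncertain"), ("correct", "certain")],
]
learning = [0.20, 0.05, 0.30, 0.15]

ic = [label_proportion(t, "incorrect", "certain") for t in students]
print(ic)  # [0.25, 0.5, 0.0, 0.25]
print(round(pearson_r(ic, learning), 2))  # -0.98
```

With real data, the significance of each R would also need to be tested; the sketch only computes the coefficient.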
Note that a correctness label for each student turn is automatically available from the NLU component of ITSPOKE's backend system (the Why2-Atlas system described in VanLehn et al. (2002)). In addition, a paid annotator labeled each student turn in our ITSPOKE corpora as either uncertain or certain. A second annotator separately annotated a subset of the turns, yielding an inter-annotator agreement of 90% (Kappa = 0.68). The annotation scheme derives from a pilot study (Litman and Forbes-Riley (2004a)) in which we annotated student affective states, including uncertain, frustrated, bored, and sad, in subsets of our corpora. We found that uncertainty occurs much more frequently than other affective states in our corpora, and that expressions of uncertainty typically relate to the material being learned, whereas the other states often also relate to other aspects of the tutoring process (e.g., frustration with speech recognition errors). In our annotation scheme, the uncertain label is also used for turns that express confusion or frustration about the material being learned since, as Rozin and Cohen (2003) note, student confusion and frustration indicate uncertainty about what to do next or how to act, or a need for clarification or more information. The certain label is used for all turns that do not express uncertainty, and so covers both turns that explicitly express certainty and turns that are neutral with respect to certainty.
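The agreement figures above can be illustrated with Cohen's kappa, which discounts raw agreement by the agreement expected by chance from each annotator's label frequencies. A minimal sketch, assuming two aligned lists of uncertain/certain labels; the function name and the ten-turn toy annotations are hypothetical, not the study's annotations:

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("annotations must be non-empty and aligned")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Toy annotations of ten turns (U = uncertain, C = certain); not study data.
annotator_1 = ["U", "U", "C", "C", "C", "C", "C", "C", "C", "C"]
annotator_2 = ["U", "C", "C", "C", "C", "C", "C", "C", "C", "U"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.375
```

Applying the same formula in reverse to the reported figures, an observed agreement of 0.90 with Kappa = 0.68 implies a chance agreement of about (0.90 - 0.68) / (1 - 0.68) ≈ 0.69, the kind of value expected when one label is much more frequent than the other.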
How is normal learning affected by responding to all learning impasses (i.e. student uncertainty and incorrectness) with further instructional dialogue vs. responding only to student incorrectness in this way?
Since student incorrectness and student uncertainty both represent learning impasses, responding to student uncertainty in the same way as to incorrectness should significantly increase student learning during computer tutoring. Note that in our previously collected and annotated ITSPOKE corpora, correct but uncertain student answers represent about 20% of all learning impasses (where learning impasses comprise incorrect answers and correct but uncertain answers).
We hypothesize that the response to uncertain student answers in the experimental condition will yield significantly higher learning gains than either not responding (first control condition) or treating a random subset of correct student answers as if they were incorrect (second control condition). In other words, additional tutoring should be most effective at points of student uncertainty, where students are motivated both to resolve their uncertainty and to engage in constructive learning.
If a student answer is correct but uncertain, does the computer tutor respond with further dialogue (i.e., treat the answer as if it were incorrect) to reinforce the student’s understanding of the principle(s) under discussion? Note that in all conditions, incorrect answers (uncertain or not) always receive additional sub-dialogue. This instructional method of adapting to uncertainty is similar to other kinds of error correction support.
The first control condition uses a non-adaptive version of the system, which responds with further dialogue only to incorrect student answers. The second control condition controls for the additional tutoring given in the experimental condition by using a version of the system that responds with further dialogue to incorrect student answers and also to 15% of correct answers. This percentage represents the proportion of correct but uncertain student turns in our previously collected and annotated ITSPOKE corpora, as noted below.
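The decision each condition makes for a single student answer can be summarized as a small function. This is a sketch under stated assumptions, not ITSPOKE code: in the study the Wizard's labels drive the system, and the condition names, the function `should_remediate`, and the choice to sample each correct answer independently at 15% in the second control condition are all hypothetical.

```python
import random
from enum import Enum


class Condition(Enum):
    EXPERIMENTAL = "experimental"  # remediate incorrect and correct-but-uncertain answers
    NONADAPTIVE = "control_1"      # remediate incorrect answers only
    RANDOM = "control_2"           # remediate incorrect answers plus a random share of correct ones


# Assumption: each correct answer is independently sampled at this rate.
RANDOM_REMEDIATION_RATE = 0.15


def should_remediate(condition, correct, uncertain, rng=random):
    """Return True if the tutor should launch the sub-dialogue it uses for incorrect answers."""
    if not correct:
        return True  # incorrect answers are remediated in every condition
    if condition is Condition.EXPERIMENTAL:
        return uncertain  # correct but uncertain answers are treated as incorrect
    if condition is Condition.RANDOM:
        return rng.random() < RANDOM_REMEDIATION_RATE
    return False  # non-adaptive control: correct answers get no extra dialogue


# Usage: estimate how often the random control remediates correct answers.
rng = random.Random(0)
trials = 10_000
share = sum(should_remediate(Condition.RANDOM, True, False, rng) for _ in range(trials)) / trials
print(f"{share:.3f}")  # close to 0.15
```

Sampling each correct answer independently gives the intended rate only on average; an alternative would be to pre-select a fixed 15% of correct answers per dialogue, and the description above does not say which mechanism was used.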
The experimental procedure for this study is as follows. Students (native speakers of American English) who have never taken college-level physics 1) read a short document of background material overviewing the physics to be tutored, 2) take a “fill in the blank” pretest measuring their initial (post-reading) knowledge of that material, 3) work through the first qualitative physics problem with the ITSPOKE WOZ, 4) take a posttest isomorphic to the pretest, and 5) work through a second qualitative physics problem with the ITSPOKE WOZ (isomorphic to the first problem). Based on studies of two prior ITSPOKE corpora, each physics problem takes about 20-25 minutes to complete and requires between 7 and 58 student turns, of which on average 15% are correct but uncertain.
Normal post-test: We will measure normal learning (near transfer, immediate testing) via comparisons of pretest and posttest scores.
We will also compare the proportion of correct answers on the second physics problem across conditions. (Note: this does not measure learning as a consequence of the instructional conditions, because these instructional variations are back in place during this phase.)
We hypothesized that the response to uncertain student answers in the experimental condition would yield significantly higher learning gains on the post-test than either not responding (first control condition), or treating a random subset of correct student answers as if they were incorrect (second control condition).
A two-way ANOVA with condition as a between-subjects factor and test phase (pretest vs. posttest) as a repeated measure showed a significant main effect of test phase (F(1,57) = 34.88, p < 0.001, MSe = 0.032), indicating that students in all conditions learned a significant amount during tutoring. However, there was no significant interaction between condition and test phase, indicating that how much students learned did not depend on condition.
Based on these results, we hypothesize that the tutoring treatment was too short to yield significant differences in learning between conditions, as measured by our pretest and posttest.
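For a design like this one, with condition as a between-subjects factor and exactly two test phases, the F test for the condition × test phase interaction is equivalent to a one-way ANOVA on gain scores (posttest minus pretest) across conditions; with 60 students in 3 conditions it has 2 and 57 degrees of freedom. The sketch below computes that F statistic in plain Python on made-up scores. It illustrates the standard equivalence and is not the authors' analysis script; the function and condition names are hypothetical. Turning F into a p-value needs the F distribution, e.g. `scipy.stats.f.sf(f, df_between, df_within)`.

```python
def gain_scores(pretest, posttest):
    """Per-student posttest-minus-pretest gains."""
    if len(pretest) != len(posttest):
        raise ValueError("pretest and posttest lists must be aligned")
    return [post - pre for pre, post in zip(pretest, posttest)]


def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a one-way between-groups ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more scores than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("F is undefined when there is no within-group variance")
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Toy pretest/posttest scores for three conditions of three students each.
conditions = {
    "experimental": ([2, 3, 4], [3, 5, 7]),
    "control_1": ([2, 3, 4], [4, 6, 8]),
    "control_2": ([2, 3, 4], [5, 7, 9]),
}
gains = [gain_scores(pre, post) for pre, post in conditions.values()]
print(gains)  # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(one_way_anova_f(gains))  # (3.0, 2, 6)
```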
We used the isomorphic second problem as an additional test of how the uncertainty adaptation in the first problem affected student answers to the corresponding isomorphic questions in this test problem. The non-adaptive system was used in all conditions for the test problem, so that all students received the same "test". We hypothesized that the conditions would differ on various dialogue-based performance metrics measured in the test problem, depending on whether or not students received the uncertainty adaptation in the first problem. Our performance metrics include correctness, uncertainty, and learning impasse severity. Our results are described in detail in Forbes-Riley, Litman, and Rotaru (2008). Among other results, we found that correct+uncertain answers are more likely to become incorrect in the second problem if the uncertainty adaptation is not received during the tutoring treatment (p < 1.0), but that only in the random control condition are these answers also more likely to become nonuncertain (p < 0.05).
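One way to express findings like these is as a transition proportion: among answers with a given label in the first problem, the share whose isomorphic counterpart in the second problem has some property. A minimal sketch, assuming each first-problem answer can be paired with the answer to its isomorphic counterpart; the pairing, label tuples, and toy data are hypothetical, and the significance tests reported in Forbes-Riley, Litman, and Rotaru (2008) are not reproduced here.

```python
CU = ("correct", "uncertain")  # hypothetical label for a correct+uncertain answer


def transition_rate(pairs, from_label, to_predicate):
    """Among (problem 1, problem 2) answer pairs whose first answer had from_label,
    return the fraction whose second answer satisfies to_predicate."""
    relevant = [second for first, second in pairs if first == from_label]
    if not relevant:
        return None  # no answers with this label in problem 1
    return sum(1 for second in relevant if to_predicate(second)) / len(relevant)


def became_incorrect(label):
    return label[0] == "incorrect"


# Toy (problem 1 answer, problem 2 answer) label pairs per condition; not study data.
pairs_by_condition = {
    "experimental": [(CU, ("correct", "certain")), (CU, ("correct", "certain")),
                     (CU, ("incorrect", "uncertain")), (("correct", "certain"), ("incorrect", "certain"))],
    "control_1": [(CU, ("incorrect", "certain")), (CU, ("correct", "uncertain")),
                  (("correct", "certain"), ("correct", "certain"))],
}
for name, pairs in pairs_by_condition.items():
    print(name, transition_rate(pairs, CU, became_incorrect))
```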
This study is part of the Interactive Communication (IC) cluster. Its hypothesis is essentially a precursor to investigating the IC cluster’s central hypothesis (that robust learning occurs when the collaboration somehow appropriately balances the work done by the agents and their communication).
In particular, this study investigates whether normal learning increases when system responses are enhanced to depend on both the correctness and the uncertainty of the student answer. If uncertainty is a learning impasse, then the tutoring should bridge this impasse by giving the student an opportunity to better learn the material about which s/he is uncertain. Without this bridge, students must resolve their uncertainty without collaborating with the tutor. This study investigates one such bridge, namely the further dialogue that is normally invoked when the student answer is incorrect.
Aist, Gregory, Barry Kort, Rob Reilly, Jack Mostow, and Rosalind Picard. 2002. Experimentally augmenting an intelligent tutoring system with human-supplied capabilities: Adding human-provided emotional scaffolding to an automated reading tutor that listens. In Proceedings of Intelligent Tutoring Systems Conference (ITS) Workshop on Empirical Methods for Tutorial Dialogue Systems, pages 16-28, San Sebastian, Spain. (This paper shows that responding to student emotional states can increase student persistence.)
Bhatt, K., M. Evens, and S. Argamon. 2004. Hedged responses and expressions of affect in human/human and human/computer tutorial interactions. In Proceedings of Cognitive Science. (This paper describes emotion/uncertainty annotation in tutoring.)
Chi, Michelene, Nicholas De Leeuw, Mei-Hung Chiu, and Christian Lavancher. 1994. Eliciting self-explanations improves understanding. Cognitive Science, 18:439-477. (This paper shows that spontaneous self-explanation improves learning gains during tutoring.)
Craig, Scotty, Arthur Graesser, Jeremiah Sullins, and Barry Gholson. 2004. Affect and learning: an exploratory look into the role of affect in learning with AutoTutor. Journal of Educational Media, 29(3):241-250. (This paper describes correlations between uncertainty and learning in tutoring.)
Forbes-Riley, Kate, Diane Litman, and Mihai Rotaru. 2008. Responding to Student Uncertainty during Computer Tutoring: A Preliminary Evaluation. In Proceedings of the 9th International Conference on Intelligent Tutoring Systems (ITS), Montreal, Canada, June. (This paper describes in detail the results of the experiment overviewed in this wiki page.)
Forbes-Riley, Kate, Diane Litman, Scott Silliman, and Amruta Purandare. 2008. Uncertainty Corpus: Resource to Study User Affect in Complex Spoken Dialogue Systems. In Proceedings of the 6th Language Resources and Evaluation Conference (LREC 2008), Marrakech, Morocco, May-June. (This paper describes in detail the corpus collected from the experiment overviewed in this wiki page.)
Forbes-Riley, Kate and Diane Litman. 2004. Predicting emotion in spoken dialogue from multiple knowledge sources. In Proceedings of the Human Language Technology Conference: 4th Meeting of the North American Chapter of the Association for Computational Linguistics (HLT/NAACL), pages 201-208, Boston, MA. (This paper describes some of our prior work on automatically predicting student emotions in human-human tutoring.)
Forbes-Riley, Kate, Mihai Rotaru, Diane J. Litman, and Joel Tetreault. 2007. Exploring Affect-Context Dependencies for Adaptive System Development. In Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), Rochester, NY, April. (This paper explores analyses leading to plausible automatic adaptations to student uncertainty.)
Hausmann, Robert and Michelene Chi. 2002. Can a computer interface support self-explaining? The International Journal of Cognitive Technology, 7(1):4-14. (This paper shows that spontaneous self-explanation occurs more frequently in spoken tutoring than in text-based tutoring.)
Litman, Diane. 2008. Detecting and Adapting to Student Uncertainty in a Spoken Tutorial Dialogue System. Invited talk at the Affective Language in Human and Machine Symposium, AISB Convention, Aberdeen, Scotland, April. (This is an invited talk about the experiment described in this wiki page, as well as another experiment.)
Litman, Diane and Kate Forbes-Riley. 2004a. Annotating student emotional states in spoken tutoring dialogues. In Proceedings of 5th SIGdial Workshop on Discourse and Dialogue (SIGdial), pages 144-153, Boston, MA, April. (This paper describes some of our prior work on annotating student uncertainty and other emotions.)
Litman, Diane and Kate Forbes-Riley. 2004b. Predicting student emotions in computer-human tutoring dialogues. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pages 352-359, Barcelona, Spain. (This paper describes some of our prior work on automatically predicting student emotions in human-computer tutoring.)
Litman, Diane and Scott Silliman. 2004. ITSPOKE: An intelligent tutoring spoken dialogue system. In Proceedings of the Human Language Technology Conference / Third Meeting of the North American Chapter of the Association for Computational Linguistics (HLT/NAACL) (Companion Vol.), pages 233-236, Boston, MA. (This paper describes our spoken dialogue tutoring system.)
Litman, Diane, Carolyn P. Rosé, Kate Forbes-Riley, Kurt VanLehn, Dumisizwe Bhembe, and Scott Silliman. 2006. Spoken Versus Typed Human and Computer Dialogue Tutoring. International Journal of Artificial Intelligence in Education, 16:145-170. (This paper compares learning across spoken and typed human-human and human-computer tutoring.)
Pon-Barry, Heather, Karl Schultz, Elizabeth Owen Bratt, Brady Clark, and Stanley Peters. 2006. Responding to student uncertainty in spoken tutorial dialogue systems. International Journal of Artificial Intelligence in Education. In Press. (This paper describes a related controlled experiment on responding to student uncertainty during computer tutoring.)
VanLehn, Kurt, Pamela W. Jordan, Carolyn Rosé, Dumisizwe Bhembe, Michael Böttner, Andy Gaydos, Maxim Makatchev, Umarani Pappuswamy, Michael Ringenberg, Antonio Roque, Stephanie Siler, Ramesh Srivastava, and Roy Wilson. 2002. The architecture of Why2-Atlas: A coach for qualitative physics essay writing. In Proceedings of the 6th International Intelligent Tutoring Systems Conference, pages 158-167. (This paper describes the Why2-Atlas system, which is the text-based backend for our spoken dialogue tutoring system.)
VanLehn, Kurt, Stephanie Siler, and Charles Murray. 2003. Why do only some events cause learning during human tutoring? Cognition and Instruction, 21(3):209-249. (This paper defines learning impasses and describes studies correlating learning impasses and learning.)