Extending Automated Dialogue Support for Robust Learning of Physics

Sandra Katz

Summary Table

PIs:                      Sandra Katz & John Connelly
Study Start Date:         10/1/07
Study End Date:           9/30/08
LearnLab Site:            USNA
LearnLab Course:          General Physics I
Number of Students:       N = 75
Total Participant Hours:  approx. 125 hrs
DataShop:                 Yes

Abstract

Research on student understanding and problem-solving ability in first-year college physics courses shows that some students become adept at solving quantitative problems but do poorly on tests of conceptual knowledge and qualitative problem-solving ability, while other students show at least a glimmer of understanding of basic physics concepts and principles but are unable to use this knowledge to solve quantitative problems. The present research seeks to integrate quantitative and qualitative knowledge via post-practice reflection dialogues that guide students in learning and practicing the concepts and principles associated with a just-solved physics problem. It builds upon our 2006 LearnLab study (Katz, Connelly, & Treacy) by trying to better support robust learning via a third condition in which students work through longer dialogues designed to foster transfer by specifically tying together qualitative and quantitative knowledge in different contexts.

Software errors and students' circumvention of both the system and instructors’ scoring rubrics rendered our intended detailed analyses impossible. However, despite the various manifestations of “cheating” and otherwise gaming the system that were discovered, we found that dialogue exposure significantly boosted gains on a post-test and on a largely quantitative final exam two months after the end of our intervention, providing evidence of some robust learning and transfer to related problem solving contexts.

Glossary

Research question

Does explicit training in the three main components of problem-solving knowledge (i.e., knowledge about what principles apply to a problem, how to apply these principles, and why to apply them), combined with quantitative practice in applying these knowledge components in different contexts, enhance students’ problem-solving ability more than additional problem solving and better foster transfer and robust learning?

Background and Significance

Research on student understanding and problem-solving ability in first-year college physics courses shows that instructors deal with a double-edged sword. Some students become adept at solving quantitative problems but do poorly on tests of conceptual knowledge and qualitative problem-solving ability. Other students display the reverse problem: they show at least a glimmer of understanding of basic physics concepts and principles, but are unable to use this knowledge to solve quantitative problems. Still other students master neither qualitative nor quantitative understanding of physics; very few master both. Thus, the instructional challenge motivating this project is to find effective pedagogical strategies to integrate quantitative and qualitative knowledge. Our scientific goal is to determine whether explicit and implicit learning can be effectively combined via post-practice dialogues that guide students in reflecting on the concepts and principles associated with a just-solved physics problem. The main hypothesis tested is that, in the context of tutored problem solving, integrative reflective dialogues that explicitly tie qualitative knowledge to quantitative knowledge can improve quantitative problem-solving ability and retention of qualitative knowledge better than problem-solving practice (implicit learning) alone.

To test this hypothesis, we conducted an experiment in the PSLC Physics LearnLab at the US Naval Academy, in sections that use the Andes physics tutoring system (VanLehn et al., 2005a, 2005b). We compared students who were randomly assigned to one of three conditions on measures of qualitative and quantitative problem-solving performance. The two treatment conditions engaged in automated reflective dialogues after solving quantitative physics problems, while the control condition solved the same set of problems (plus a few additional problems to balance time on task) without any reflective dialogues, using the standard version of Andes. In one treatment condition, the reflective dialogues individually targeted the three main types of knowledge that experts employ during problem solving, according to Leonard, Dufresne, & Mestre (1996): knowledge about what principle(s) to apply to a given problem, how to apply these principles (e.g., what equations to use), and why to apply them (i.e., what the applicability conditions are).

In a prior LearnLab study (Katz, Connelly, & Treacy), this intervention significantly improved students' qualitative understanding of basic mechanics, as measured by pre-test to post-test gain scores. However, students did not outperform standard Andes users on more robust learning measures of transfer (e.g., performance on quantitative course exams) or on a measure of retention of qualitative problem-solving ability (Katz, Connelly, & Wilson, 2007).

The extended dialogue condition we implemented differs from the other dialogue condition in three main ways: (1) the reflective dialogues contained more problem variations ("what if" scenarios), designed to support both qualitative and quantitative knowledge (most of our previous "what if" scenarios were qualitative only); (2) these "what if" scenarios were tied both to the corresponding Andes problem-solving context and to new contexts, to help support near and far transfer; and (3) students were prompted to state the rules (knowledge components) they applied in solving the problem variations, in order to promote a principle-based approach to learning, and they were given feedback that makes these rules explicit. Our goal is to determine whether reflective dialogues that make the links between qualitative and quantitative physics knowledge explicit are more effective than both our previous dialogues and an implicit learning condition based on problem-solving practice alone.

Dependent Variables

  • Gains in qualitative and quantitative knowledge. Post-test score, and pre-test to post-test gain scores, on near and far transfer items.
  • Long-term retention. Performance on final exam, taken several weeks after the intervention.

We had also intended to measure short-term retention via student performance on course exams that covered target topics (statics; translational dynamics, including circular motion; work-energy; power; and linear momentum, including impulse). However, critical omissions in the data provided to us by course instructors rendered such measurements impossible.

Independent Variables

Students within each course section were block-randomly assigned to one of three dialogue conditions in which the usage of Knowledge Construction Dialogues (KCDs) differed. The short-KCD condition approximated the treatment condition from last year's study (Katz, Connelly, & Treacy), in which mostly conceptual dialogues followed problem solving on selected Andes problems.

[Image: 6kcd.jpg (example short KCD)]

The new long-KCD condition appended more quantitative practice and transfer content (via additional "what if" scenarios) to the short KCDs; students in this condition were assigned five fewer Andes problems than those in the short-KCD condition to attempt to equate time on task.

[Image: 7lkcd.jpg (example long KCD)]

The control condition was identical to last year's; students saw no KCDs and were assigned five more Andes problems than the short-KCD students.

[Image: E1b-50.jpg (standard Andes problem, control condition)]

The following student variables were entered into a regression analysis, with post-test score as the dependent variable (a model sketch follows the list):

  • Number of viable target problems completed (before the post-test, or before the final exam)
  • Number of viable (non-gibberish) dialogues completed
  • Grade point average (CQPR)
  • Pre-test score
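
For concreteness, the following is a minimal sketch of how such a model can be fit with ordinary least squares in Python using statsmodels. The data file and column names (post, pre, cqpr, kcds, probs) are hypothetical placeholders for the salvaged per-student records, not the study's actual variable names.

  # Sketch only: fit the full and reduced models reported under Findings.
  # The file and column names are hypothetical placeholders.
  import pandas as pd
  import statsmodels.formula.api as smf

  students = pd.read_csv("salvaged_students.csv")  # one row per student

  # Full model: post-test score on pre-test score, CQPR, and completion counts.
  full = smf.ols("post ~ pre + cqpr + kcds + probs", data=students).fit()
  print(full.summary())  # reports R-squared, F, and per-predictor t and p values

  # Reduced model omitting CQPR, as in the follow-up analyses.
  reduced = smf.ols("post ~ pre + kcds + probs", data=students).fit()
  print(reduced.summary())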

Hypothesis

The main hypotheses tested were (a) that explicit, part-task training on the "what? how? and why?" components of planning, via strategically staged reflective dialogues, would be more effective and efficient than the traditional approach of letting students acquire these components implicitly through extensive problem-solving practice (replicating last year's study); and (b) that extended dialogues providing additional quantitative practice in applying these knowledge components in different problem-solving contexts would better foster transfer and robust learning.

Findings

Preliminary analyses showed marginal support for a replication of last year's finding that student performance on the post-test relative to the pre-test was significantly influenced by the number of dialogues students completed, rather than the number of target problems they completed. However, due to software glitches during data collection, low student participation (completing assigned homework and/or dialogues) in some course sections, and evidence of cheating and general "gaming the system" by some students, more detailed analyses were possible only after identifying and omitting noisy data from our overall corpus. This lengthy and painstaking process required us to examine KCD and Andes logs at a much finer level of detail than in prior studies, in an attempt to determine whether each student's list of assigned problems and dialogues represented viable data (i.e., whether the problems and KCDs were legitimately completed).

In the end, the various problems plaguing our data rendered impossible any comparisons between our two treatment conditions. However, we were able to salvage enough data to compare student performance relative to degrees of viable problem and dialogue completion. The salvaged exposure measures were the number of viable KCDs completed (vs. KCDs that were "passed through" with gibberish responses) and the number of viable problems completed (vs. those on which students likely "cheated", as determined by answer-only "solutions" or minimal Andes inputs to attain a score of 50, the minimal criterion for full credit used by one course instructor).
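
The screening can be summarized as two per-record viability filters. The sketch below illustrates their spirit in Python; the function and field names and the gibberish heuristic are hypothetical stand-ins (only the credit threshold of 50 comes from the rubric described above), not our actual screening procedure.

  # Illustrative sketch of the two viability filters described above.
  # Names and thresholds are hypothetical, except the credit threshold of 50.

  def kcd_is_viable(responses):
      # Treat a KCD as "passed through" if most of the student's typed
      # responses are too short to be substantive (a crude stand-in for
      # gibberish detection).
      if not responses:
          return False
      substantive = [r for r in responses if len(r.split()) >= 3]
      return len(substantive) >= len(responses) / 2

  def problem_is_viable(score, num_inputs, answer_only):
      # Flag likely "cheating": an answer-only solution, or minimal Andes
      # inputs that just clear the instructor's credit threshold of 50.
      if answer_only:
          return False
      if score == 50 and num_inputs <= 2:
          return False
      return True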

Pre- and Post-Tests

Regressing post-test score on pre-test score, CQPR, number of KCDs completed, and number of target problems completed (R² = .52, F(4, 54) = 14.82, p < .00001) showed positive contributions of all factors, but only pre-test score and CQPR were statistically significant (ps < .0005 & .05, respectively); KCD completion was marginal (p = .08) and problem completion was ns (t < 1). After omitting CQPR from the regression model (R² = .50, F(3, 57) = 18.74, p < .00001), the effect of KCD completion reached statistical significance (p = .012) while that of problem completion remained ns (t < 1).

Regressing post-test qualitative subscore on pre-test qualitative subscore, CQPR, number of KCDs completed, and number of target problems completed (R² = .37, F(4, 54) = 7.96, p < .00001) showed positive contributions of all factors, but only pre-test subscore was statistically significant (p < .005); KCD completion reached marginal status (p = .07) only after dropping CQPR from the regression model (R² = .35, F(3, 57) = 10.33, p < .00001). The effect of problem completion was ns (t < 1) in both models.

Regressing post-test quantitative subscore on pre-test quantitative subscore, CQPR, number of KCDs completed, and number of target problems completed (R² = .46, F(4, 54) = 11.53, p < .00001) showed positive contributions of all factors, but only pre-test subscore and CQPR were statistically significant (ps < .0005 & .01, respectively); KCD completion reached statistical significance (p = .015) only after dropping CQPR from the regression model (R² = .41, F(3, 57) = 13.31, p < .00001). The effect of problem completion was ns in both models.

A marginal regression of normalized (Estes) gain scores on CQPR, number of KCDs completed, and number of target problems completed (R² = .12, F(3, 55) = 2.59, p = .062) showed positive contributions of all factors, but the only factor that was even marginally significant was KCD completion (p = .09). However, omitting CQPR from the model resulted in a significant regression (R² = .13, F(2, 58) = 4.24, p < .05) with a significant effect of KCD completion (p = .016). The effect of problem completion was ns (t < 1) in both models.
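
For reference, a normalized gain score expresses each student's improvement as a fraction of the improvement available to that student. Assuming the usual normalized-gain formulation (the exact Estes variant is not spelled out here), with all scores on the same scale:

  gain = (post - pre) / (max - pre)

where max is the maximum attainable score.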

Final Exam & Course Grades

A regression of exam score on CQPR, KCD completion counts, and problem completion counts was significant (R² = .47, F(3, 68) = 20.17, p < .00001) and showed positive contributions of all three factors, but only CQPR had a statistically significant effect (p < .00001). However, when we omitted CQPR from the model (R² = .16, F(2, 72) = 7.08, p < .005), both KCD and problem completion counts reached statistical significance (ps < .05). A regression of final grade on all three factors was significant (R² = .56, F(3, 53) = 22.10, p < .00001), but again only CQPR had a statistically significant effect (p < .00001). When we omitted CQPR from this model (R² = .13, F(2, 55) = 3.99, p < .05), neither factor of interest reached significance, but KCD completion had a stronger marginal effect (p = .073) than did problem completion (p = .106).

Explanation

Despite being unable to perform many of our intended analyses, including examinations of group differences between our old Short KCDs and new Long KCDs, we were able to replicate our prior finding (Katz et al., 2007) that it was KCD completion, rather than completion of the target homework problems, that significantly improved post-test performance (for most scores and subscores). In other words, whether the KCDs were Short or Long, completing more of them was a better predictor of post-test scores than was solving homework problems.

Moreover, we found that both factors significantly improved scores on the final exam. That is, the more KCDs and target homework problems students completed, the better they performed on the final exam. Although neither factor had a statistically significant effect on final course grades, the same trend held, with KCD completion being a stronger predictor of course grades than was homework problem completion.

This marked the first time in our line of work with KCDs (Connelly & Katz, 2006; Katz et al., 2005, 2007) that learning benefits of our dialogues transferred to longer-term performance measures. Future work could investigate the relative benefits of our Short, largely qualitative KCDs versus our Long KCDs with both qualitative and quantitative knowledge, as well as explicit ties between them.

This study also has implications for other studies of learning gains from instructional interventions, which tend to show small (if any) effects. Investigators' attempts to "clean" their data after being confronted with "cheating" and "gaming" behaviors may be worthwhile, in that they might increase the signal-to-noise ratio enough for intended (or perhaps even unintended) learning effects to emerge.

Further Information

Annotated bibliography

  • Presentation to Advisory Board, January 2008
  • Virtual brief paper (results with salvaged data) accepted at ED-MEDIA 2009:
    • Connelly, J., & Katz, S. (in press). Toward more robust learning of physics via reflective dialogue extensions. To appear in Proceedings of ED-MEDIA 2009.

References

  • Connelly, J., & Katz, S. (2006). Intelligent dialogue support for physics problem solving: Some preliminary mixed results. Technology, Instruction, Cognition and Learning, 4, 1-29. Philadelphia, PA: Old City Publishing.
  • Katz, S., Connelly, J., & Wilson, C. (2005). When should dialogues in a scaffolded learning environment take place? In P. Kommers & G. Richards (Eds.), Proceedings of ED-MEDIA 2005 (pp. 2850-2855). Norfolk, VA: AACE.
  • Katz, S., Connelly, J., & Wilson, C. (2007). Out of the lab and into the classroom: An evaluation of reflective dialogue in Andes. In R. Luckin, K. R. Koedinger, & J. Greer (Eds.), Artificial Intelligence in Education: Building Technology Rich Learning Contexts that Work (pp. 425-432). Amsterdam: IOS Press.
  • Leonard, W. J., Dufresne, R. J., & Mestre, J. P. (1996). Using qualitative problem-solving strategies to highlight the role of conceptual knowledge in solving problems. American Journal of Physics, 64(12), 1495-1503.
  • VanLehn, K., Lynch, C., Schulze, K., Shapiro, J.A., Shelby, R., Taylor, L., Treacy, D., Weinstein, A., & Wintersgill, M. (2005a). The Andes physics tutoring system: Lessons learned. International Journal of Artificial Intelligence in Education, 15(3).
  • VanLehn, K., Lynch, C., Schulze, K., Shapiro, J.A., Shelby, R., Taylor, L., Treacy, D., Weinstein, A., & Wintersgill, M. (2005b). The Andes physics tutoring system: Five years of evaluation. In G. McCalla, C.K. Looi, B. Bredeweg, & J. Breuker (Eds.), Artificial Intelligence in Education (pp. 678-685). Amsterdam, Netherlands: IOS Press.

Connections

This project shares features with the following research projects:

  • Use of Questions during learning
  • Self explanations during learning

Future plans

Our future plans to wrap up the project:

  • resubmit rejected conference paper detailing the process by which we salvaged usable data