== Building Generalizable Fine-grained Detectors ==

=== Summary Table ===

====Study 1====

{| border="1" cellspacing="0" cellpadding="5" style="text-align: left;"
| '''PIs''' || Ryan Baker, Vincent Aleven
|-
| '''Other Contributors''' || Sidney D'Mello (Consultant, University of Memphis), Ma. Mercedes T. Rodrigo (Consultant, Ateneo de Manila University)
|-
| '''Study Start Date''' || February, 2010
|-
| '''Study End Date''' || February, 2011
|-
| '''LearnLab Site''' || TBD
|-
| '''LearnLab Course''' || Algebra, Geometry, Chemistry, MathTutor, ScienceAssistments
|-
| '''Number of Students''' || 78 so far; total TBD
|-
| '''Total Participant Hours''' || 444 so far; total TBD
|-
| '''Data available in DataShop''' || [https://pslcdatashop.web.cmu.edu/DatasetInfo?datasetId=431 Dataset: CMU VlabHomeworks F2010]<br>
[https://pslcdatashop.web.cmu.edu/DatasetInfo?datasetId=448 Dataset: Affect Detectors and Questionnaires Greenville 2010-11]<br>
* '''Pre/Post Test Score Data:''' TBD
* '''Paper or Online Tests:''' TBD
* '''Scanned Paper Tests:''' TBD
* '''Blank Tests:''' TBD
* '''Answer Keys:''' TBD
|}

=== Abstract ===

This project, joint between M&M and CMDM, will create a set of fine-grained detectors of affect and M&M behaviors. These detectors will be usable by future projects in these two thrusts to study the impact of learning interventions on these dimensions of students’ learning experiences, and to study the inter-relationships between these constructs and other key PSLC constructs (such as measures of robust learning, and motivational questionnaire data). It will be possible to apply these detectors retrospectively to existing PSLC data in [[DataShop]], in order to re-interpret prior work in the light of relevant evidence on students’ affect and M&M behaviors.

=== Background & Significance ===

=== Glossary ===

[[Metacognition and Motivation]]

[[Computational Modeling and Data Mining]]

[[Gaming the system]]

[[Off-Task Behavior]]

[[Affect]]

[[Frustration]]

[[Boredom]]

[[Flow]]

[[Engaged Concentration]]

=== Hypotheses ===

H1: We hypothesize that it will be possible to develop reasonably accurate detectors of student affect for four LearnLabs that detect affect using only data from the student's interactions with the keyboard and mouse.

H2: We hypothesize that models of behaviors such as gaming the system and off-task behavior, in combination with models of affect/behavior dynamics, can make affect detectors more accurate.

H3: We hypothesize that models created using data from three LearnLabs will perform significantly better than chance on data from a fourth LearnLab, with no re-training (or with only limited EM-based modification that requires no new labeled data).

H4: We hypothesize that these affect models will become a valuable component of future research in the M&M and CMDM thrusts.

=== Research Process ===

We will develop detectors of the M&M (metacognitive and motivational) behaviors of gaming the system, off-task behavior, proper help use, on-task conversation, help avoidance, and self-explanation without scaffolding. This set of behaviors has already been detected effectively in mathematics LearnLabs. We will model the dynamics between these behaviors and student affect (following on work in the PSLC and at Memphis), in order to leverage these detectors to create detectors of the affective states of engaged concentration, boredom, confusion, and frustration; the dynamics models will enable us to set Bayesian priors for how likely an affective state is at a given time (see the sketch below).
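
As an illustration of this step, the sketch below (Python, with entirely hypothetical observation data and function names, not project code) shows how behavior-to-affect transition frequencies estimated from coded observations could be turned into priors and combined with a detector's per-state likelihoods in a simple Bayesian update:

<pre>
# Minimal sketch (hypothetical data and names): estimate P(affect now | behavior
# in the previous clip) from coded observation pairs, then use it as a prior
# that is combined with a detector's per-state likelihoods.
from collections import Counter, defaultdict

# (behavior observed in the previous clip, affect observed in the current clip);
# in practice these pairs would come from synchronized field observations.
observations = [
    ("GAMING", "BOREDOM"), ("ON_TASK", "ENGAGED_CONCENTRATION"),
    ("OFF_TASK", "BOREDOM"), ("ON_TASK", "FRUSTRATION"),
    ("GAMING", "FRUSTRATION"), ("ON_TASK", "ENGAGED_CONCENTRATION"),
]

STATES = ("ENGAGED_CONCENTRATION", "BOREDOM", "CONFUSION", "FRUSTRATION")

counts = defaultdict(Counter)
for prev_behavior, affect in observations:
    counts[prev_behavior][affect] += 1

def prior(affect, prev_behavior, smoothing=1.0):
    """P(affect now | behavior in previous clip), with Laplace smoothing."""
    c = counts[prev_behavior]
    total = sum(c[s] + smoothing for s in STATES)
    return (c[affect] + smoothing) / total

def posterior(likelihoods, prev_behavior):
    """Combine per-state detector likelihoods with the dynamics-based prior
    and renormalize (a naive Bayes-style update)."""
    unnorm = {s: lik * prior(s, prev_behavior) for s, lik in likelihoods.items()}
    z = sum(unnorm.values())
    return {s: v / z for s, v in unnorm.items()}

# Example: the detector's likelihoods for the current clip, given that the
# student was coded as gaming the system in the previous clip.
print(posterior({"ENGAGED_CONCENTRATION": 0.2, "BOREDOM": 0.5,
                 "CONFUSION": 0.1, "FRUSTRATION": 0.2}, "GAMING"))
</pre>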

These detectors will be developed for multiple LearnLabs, and the generalizability of detectors across LearnLabs will be one focus of study during this project. We anticipate developing detectors for Algebra and Geometry, the Chemistry Virtual Lab, MathTutor, and Science ASSISTments. Each of these learning environments presents a context where complex learning occurs, fine-grained interaction behavior is logged, and the outputs of the detectors will provide leverage on a number of research questions of interest.

“Ground truth” for the M&M behavior categories will be established through quantitative field observations. “Ground truth” for the affect categories will be established through field observations and infrequent pop-up questions. We will work to increase the reliability of quantitative field observations of affect to a standard considered appropriate by psychology journals, through repeated coding and discussion sessions and through the development of a detailed coding manual that builds on prior work coding affect in field settings and coding emotions from facial expressions (a reliability-check sketch follows).
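
A minimal sketch of the kind of reliability check these coding sessions target, computing Cohen's kappa between two observers who coded the same sequence of observation clips (hypothetical codes and data; a real analysis would also examine kappa per affective state):

<pre>
# Minimal sketch (hypothetical codes): Cohen's kappa between two observers
# coding the same sequence of observation clips.
from collections import Counter

coder_a = ["ENG", "ENG", "BOR", "FRU", "ENG", "BOR", "ENG", "CON"]
coder_b = ["ENG", "BOR", "BOR", "FRU", "ENG", "ENG", "ENG", "CON"]

def cohens_kappa(a, b):
    """Agreement corrected for chance: (observed - expected) / (1 - expected)."""
    assert len(a) == len(b)
    n = len(a)
    labels = set(a) | set(b)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = Counter(a), Counter(b)
    expected = sum((pa[l] / n) * (pb[l] / n) for l in labels)
    return (observed - expected) / (1 - expected)

print(round(cohens_kappa(coder_a, coder_b), 2))  # ~0.62 for this toy example
</pre>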

Models will be developed solely using distilled log file data of the sort currently collected in [[DataShop]]; more sophisticated sensors will NOT be included in this project. The models will be built with a combination of machine learning and knowledge engineering (specifically, by leveraging and adapting existing knowledge-engineered models such as Aleven et al.'s help-seeking model and Shih et al.'s self-explanation model). Generalization of models across learning environments will involve expectation maximization to adapt models to new data sets, and/or leveraging the CTLVS1 taxonomy to develop meta-models that relate prediction features to design features. We will first develop models for individual learning environments and then extend them across environments (a feature-distillation sketch follows).
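
A minimal sketch, under the assumption that features are summarized per 20-second observation clip, of distilling DataShop-style transaction logs into detector features (hypothetical field names and data; an actual distillation would use a much richer feature set):

<pre>
# Minimal sketch (hypothetical fields and data): distill transaction-log data
# into per-clip features of the kind a detector could be trained on.
from statistics import mean

# Each transaction: (student, time in seconds, outcome, used_hint)
transactions = [
    ("s1", 0, "INCORRECT", False), ("s1", 4, "HINT", True),
    ("s1", 9, "CORRECT", False), ("s1", 61, "INCORRECT", False),
    ("s1", 95, "CORRECT", False),
]

CLIP_LENGTH = 20  # seconds, matching the grain size of the field observations

def distill(txns, clip_start):
    """Summarize one student's transactions within a single observation clip."""
    clip = [t for t in txns if clip_start <= t[1] < clip_start + CLIP_LENGTH]
    if not clip:
        return None
    times = [t[1] for t in clip]
    pauses = [b - a for a, b in zip(times, times[1:])]
    return {
        "actions": len(clip),
        "errors": sum(t[2] == "INCORRECT" for t in clip),
        "hints": sum(t[3] for t in clip),
        "mean_pause": mean(pauses) if pauses else 0.0,
    }

print(distill(transactions, clip_start=0))  # features for the first clip
</pre>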

=== Research Plan ===

1. Develop software for conducting field observations (cf. Baker et al., 2004) with PDAs, and synchronize the observations with [[DataShop]] data -- software development completed; as of August 2010, synchronization verification is in progress

2. Study and improve quantitative field coding of student affect states

* The Research Associate and Assistant will conduct multiple coding and discussion sessions with the PI, and develop a detailed coding manual (including some video examples)

3. Collect training data (months 4-7) -- as of August 2010, the first data set has been collected and further data collection is in progress

* Start in one LearnLab and roll across LearnLabs, so that all the data for one LearnLab is available first, collecting data on all constructs at once; the programmer/PI can then start developing detectors for constructs in the first LearnLab while the RAs keep collecting data in the second and subsequent LearnLabs
* Quantitative field observations (cf. Baker et al., 2004)

4. Develop detectors (months 5-8)

* Utilize a combination of existing data mining tools and code previously used by Baker to create Latent Response Model-based detectors of [[Gaming the System]] and [[Off-Task Behavior]]
* Develop and leverage behavior-affect temporal dynamics models (cf. D'Mello et al., 2007; Baker, Rodrigo, & Xolocotzin, 2007) to create priors for predicting affect
* Use log data to predict field observations and student responses
* Use student-level cross-validation to assess the goodness of the detectors (see the sketch after this list)
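
The sketch below illustrates student-level cross-validation on synthetic data (hypothetical feature and variable names; logistic regression stands in for the Latent Response Model detectors named above). The point is that folds are split by student, so no student contributes clips to both the training and the test set:

<pre>
# Minimal sketch: student-level cross-validation on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
n_clips = 400                                   # one row per observed clip
students = rng.integers(0, 40, size=n_clips)    # hypothetical student IDs
X = rng.normal(size=(n_clips, 5))               # distilled log features per clip
y = rng.integers(0, 2, size=n_clips)            # 1 = clip labeled with the target state

aucs = []
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=students):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    probs = model.predict_proba(X[test_idx])[:, 1]
    aucs.append(roc_auc_score(y[test_idx], probs))

print("student-level cross-validated AUC:", round(float(np.mean(aucs)), 3))
</pre>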

5. Develop meta-detectors (months 9-12)

* Use expectation maximization to adapt models to new data sets
* Leverage the CTLVS1 taxonomy to develop meta-models that relate prediction features to design features
* Cross-validate at the grain size of transfer between units (within each LearnLab) to validate appropriateness for the whole LearnLab
* Test the goodness of models when training on three tutors and transferring to a fourth tutor, to evaluate effectiveness for entirely new tutors (a leave-one-LearnLab-out sketch follows this list)
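
A minimal sketch of the {train on 3 tutors, transfer to tutor #4} evaluation named above, again on synthetic data with a stand-in classifier; AUC above chance (0.5) on the held-out LearnLab would be the kind of evidence relevant to H3:

<pre>
# Minimal sketch: leave-one-LearnLab-out transfer evaluation on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
learnlabs = ["Algebra", "Geometry", "Chemistry", "MathTutor"]
# Hypothetical (features, labels) per LearnLab; real data would be distilled logs.
data = {ll: (rng.normal(size=(200, 5)), rng.integers(0, 2, size=200))
        for ll in learnlabs}

for held_out in learnlabs:
    X_train = np.vstack([data[ll][0] for ll in learnlabs if ll != held_out])
    y_train = np.concatenate([data[ll][1] for ll in learnlabs if ll != held_out])
    X_test, y_test = data[held_out]
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"train on 3, test on {held_out}: AUC = {auc:.3f}")
</pre>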
  
 
=== Independent Variables ===

n/a (see Research Plan)

=== Dependent Variables ===

n/a (see Research Plan)

=== Affective States and M&M Behaviors to be Modeled ===

Affective States:

* Engaged Concentration (a subset of [[Flow]]) (cf. Baker et al., 2010)
* Boredom (Kapoor, Burleson, & Picard, 2007)
* Frustration (Kapoor, Burleson, & Picard, 2007)

M&M Behaviors:

* [[Gaming the system]] (Baker et al., 2004)
* [[Off-Task Behavior]] (Baker, 2007)
* Proper Help Use (Aleven et al., 2006)
* On-Task Conversation
* [[Help Avoidance]] (Aleven et al., 2006)
* [[Self-Explanation]] without scaffolding (Shih et al., 2008)

=== Planned Studies ===

In 2010, data will be collected in the Algebra, Geometry, Chemistry, MathTutor, and Science ASSISTments LearnLabs.

 
=== Explanation ===

=== Further Information ===

=== Connections ===

=== Annotated Bibliography ===

=== References ===

Aleven, V., McLaren, B., Roll, I., & Koedinger, K. (2006). Toward meta-cognitive tutoring: A model of help seeking with a Cognitive Tutor. International Journal of Artificial Intelligence and Education, 16, 101-128.

Baker, R.S.J.d. (2007). Modeling and Understanding Students' Off-Task Behavior in Intelligent Tutoring Systems. Proceedings of ACM CHI 2007: Computer-Human Interaction, 1059-1068.

Baker, R.S., Corbett, A.T., Koedinger, K.R., & Wagner, A.Z. (2004). Off-Task Behavior in the Cognitive Tutor Classroom: When Students "Game The System". Proceedings of ACM CHI 2004: Computer-Human Interaction, 383-390.

Baker, R.S.J.d., Rodrigo, M.M.T., & Xolocotzin, U.E. (2007). The Dynamics of Affective Transitions in Simulation Problem-Solving Environments. Proceedings of the Second International Conference on Affective Computing and Intelligent Interaction.

D'Mello, S.K., Picard, R.W., & Graesser, A.C. (2007). Towards an Affect-Sensitive AutoTutor. IEEE Intelligent Systems (Special Issue on Intelligent Educational Systems), 22(4), 53-61.

Kapoor, A., Burleson, W., & Picard, R.W. (2007). Automatic prediction of frustration. International Journal of Human-Computer Studies, 65, 724-736.

Shih, B., Koedinger, K., & Scheines, R. (2008). A Response Time Model for Bottom-Out Hints as Worked Examples. Proceedings of the 1st International Conference on Educational Data Mining, 117-126.

=== Future Plans ===
