Baker - Building Generalizable Fine-grained Detectors

From LearnLab
Revision as of 13:31, 29 August 2011 by Mbett (talk | contribs) (Reverted edits by Woolerystixmaker (Talk); changed back to last version by Alida)

Building Generalizable Fine-grained Detectors

Summary Table

Study 1

PIs: Ryan Baker, Vincent Aleven
Other Contributors: Sidney D'Mello (Consultant, University of Memphis), Ma. Mercedes T. Rodrigo (Consultant, Ateneo de Manila University)
Study Start Date: February 2010
Study End Date: February 2011
LearnLab Site: TBD
LearnLab Course: Algebra, Geometry, Chemistry, MathTutor, Science ASSISTments
Number of Students: 78 so far; total TBD
Total Participant Hours: 444 so far; total TBD
Data available in DataShop:

  • Dataset: CMU VlabHomeworks F2010
  • Dataset: Affect Detectors and Questionnaires Greenville 2010-11

  • Pre/Post Test Score Data: TBD
  • Paper or Online Tests: TBD
  • Scanned Paper Tests: TBD
  • Blank Tests: TBD
  • Answer Keys: TBD

Abstract

This project, a joint effort between the Metacognition and Motivation (M&M) and Computational Modeling and Data Mining (CMDM) thrusts, will create a set of fine-grained detectors of affect and M&M behaviors. Future projects in these two thrusts will be able to use these detectors to study the impact of learning interventions on these dimensions of students’ learning experiences, and to study the inter-relationships between these constructs and other key PSLC constructs (such as measures of robust learning and motivational questionnaire data). The detectors can also be applied retrospectively to existing PSLC data in DataShop, in order to re-interpret prior work in light of relevant evidence on students’ affect and M&M behaviors.

Background & Significance

Glossary

Metacognition and Motivation

Computational Modeling and Data Mining

Gaming the system

Off-Task Behavior

Affect

Frustration

Boredom

Flow

Engaged Concentration

Hypotheses

H1: We hypothesize that it will be possible to develop reasonably accurate detectors of student affect for four LearnLabs, using only data from the student’s interaction with the keyboard and mouse.

H2: We hypothesize that models of behaviors such as gaming the system and off-task behavior, in combination with models of affect/behavior dynamics, can make affect detectors more accurate.

H3: We hypothesize that models created using data from three LearnLabs will perform significantly better than chance in data from a fourth LearnLab, with no re-training (or limited EM-based modification that requires no new labeled data).

H4: We hypothesize that these affect models will become a valuable component of future research in the M&M and CMDM thrusts.

Research Process

We will develop detectors of the M&M (metacognitive & motivational) behaviors of gaming the system, off-task behavior, proper help use, on-task conversation, help avoidance and self-explanation without scaffolding. This set of behaviors has already been effectively detected in mathematics LearnLabs. We will model the dynamics between these behaviors and student affect (following on work in the PSLC and at Memphis), in order to be able to leverage these detectors to create detectors of the affective states of engaged concentration, boredom, confusion, and frustration (the dynamics models will enable us to set Bayesian priors for how likely an affective state is at a given time).
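As a minimal sketch of the idea in the parenthetical above, a dynamics model could supply a prior over the next affective state, which is then combined with detector evidence through Bayes' rule. The function name, the state labels as identifiers, and all probabilities below are hypothetical placeholders chosen for illustration; they are not estimates or code from this project.

```python
# Hypothetical illustration: every probability here is a made-up placeholder.
AFFECT_STATES = ["engaged_concentration", "boredom", "confusion", "frustration"]

def affect_posterior(prior, likelihood):
    """Combine a dynamics-based prior with detector evidence via Bayes' rule.

    prior:      dict state -> P(state | preceding behavior/affect), from a dynamics model
    likelihood: dict state -> P(observed log features | state), from a detector
    """
    unnormalized = {s: prior[s] * likelihood[s] for s in AFFECT_STATES}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("prior and likelihood share no state with nonzero probability")
    return {s: v / total for s, v in unnormalized.items()}

# Assumed prior for the next state, given the student was just observed gaming the system.
prior_after_gaming = {
    "engaged_concentration": 0.2,
    "boredom": 0.5,
    "confusion": 0.2,
    "frustration": 0.1,
}
# Assumed likelihood of the current clip's log features under each state.
likelihood = {
    "engaged_concentration": 0.3,
    "boredom": 0.3,
    "confusion": 0.3,
    "frustration": 0.1,
}

posterior = affect_posterior(prior_after_gaming, likelihood)
most_likely_state = max(posterior, key=posterior.get)
```

In this toy case the log features alone cannot separate three of the states, so the prior from the dynamics model is what tips the estimate toward boredom.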

These detectors will be developed for multiple LearnLabs, and the generalizability of detectors across LearnLabs will be one of the focuses of study during this project. We anticipate developing detectors for Algebra and Geometry, the Chemistry Virtual Lab, MathTutor, and Science ASSISTments. Each of these learning environments presents a context where complex learning occurs, fine-grained interaction behavior is logged, and the outputs of the detectors will provide leverage on a number of research questions of interest.

“Ground truth” for the M&M behavior categories will be established through quantitative field observations. “Ground truth” for the affect categories will be established by field observations and infrequent pop-up questions. Work will be conducted to increase the reliability of quantitative field observations of affect to a standard considered appropriate by psychology journals, through repeated coding and discussion sessions and the development of a detailed coding manual based on prior work to code affect in field settings and work to code emotions from facial expressions.
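One common way to quantify agreement between two field coders is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The text above does not name a specific reliability statistic, so the choice of kappa here is an assumption, and the coder labels are invented for illustration.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labeled the same observations."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of observations")
    n = len(labels_a)
    # Proportion of observations on which the coders agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each coder's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both coders use a single identical label")
    return (observed - expected) / (1 - expected)

# Invented labels from two hypothetical coders observing the same six students.
coder_a = ["bored", "bored", "engaged", "engaged", "frustrated", "engaged"]
coder_b = ["bored", "engaged", "engaged", "engaged", "frustrated", "engaged"]

kappa = cohens_kappa(coder_a, coder_b)
```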

Models will be developed solely using distilled log file data of the sort currently collected in DataShop (more sophisticated sensors will NOT be included in this project). The models will be built with a combination of machine learning and knowledge engineering (specifically, by leveraging and adapting existing knowledge-engineered models such as Aleven et al.’s help-seeking model and Shih et al.’s self-explanation model). Generalization of models across learning environments will involve expectation maximization to adapt models to new data sets, and/or leveraging the CTLVS1 taxonomy to develop meta-models that relate prediction features to design features. We will first develop models for individual learning environments and then extend them across environments.
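The text does not specify which form of expectation maximization will be used. As one illustrative possibility, the sketch below uses a generic EM procedure that re-estimates class priors on a new, unlabeled data set, given the posteriors produced by a detector trained elsewhere. This is a swapped-in example technique, not the project's method; the function name and all numbers are hypothetical.

```python
def em_adapt_priors(posteriors, train_priors, n_iter=50):
    """Re-estimate class priors on unlabeled data with EM.

    posteriors:   list of rows, each row = P(class | x) from the source-trained detector
    train_priors: class priors in the data the detector was trained on (all nonzero)
    Returns (new_priors, adjusted_posteriors).
    """
    n_classes = len(train_priors)
    new_priors = list(train_priors)
    adjusted = [list(row) for row in posteriors]
    for _ in range(n_iter):
        # E-step: reweight each posterior by the ratio of new to training priors.
        adjusted = []
        for row in posteriors:
            weights = [row[c] * new_priors[c] / train_priors[c] for c in range(n_classes)]
            total = sum(weights)
            adjusted.append([w / total for w in weights])
        # M-step: the new prior for each class is its mean adjusted posterior.
        new_priors = [sum(r[c] for r in adjusted) / len(adjusted) for c in range(n_classes)]
    return new_priors, adjusted

# Hypothetical detector outputs [P(not off-task), P(off-task)] in a new learning environment.
source_posteriors = [
    [0.2, 0.8], [0.3, 0.7], [0.1, 0.9],
    [0.6, 0.4], [0.8, 0.2], [0.7, 0.3],
]
new_priors, adjusted = em_adapt_priors(source_posteriors, train_priors=[0.5, 0.5])
```

No labels from the new environment are used: only the detector's outputs on the new data drive the update, which matches the "no new labeled data" condition in H3.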

Research Plan

1. Develop software for conducting field observations (cf. Baker et al., 2004) with PDAs and synchronizing them with DataShop data. Status as of Aug 2010: software development completed; synchronization verification in progress.

2. Study and improve quantitative field coding of student affect states

  • The Research Associate and Assistant will conduct multiple coding and discussion sessions with the PI, and develop a detailed coding manual (including some video examples)

3. Collect training data (months 4-7). Status as of Aug 2010: first data set collected; other data collection in progress.

  • Data collection will start in one LearnLab and roll across LearnLabs, covering all constructs at once, so that all the data for one LearnLab is available first. The programmer/PI can then start developing detectors for constructs in the first LearnLab while the RAs continue collecting data in the second and subsequent LearnLabs
  • Quantitative field observations (cf. Baker et al, 2004)

4. Develop detectors (months 5-8)

  • Utilize a combination of existing data mining tools and code previously used by Baker to create Latent Response Model-based detectors of Gaming the System and Off-Task Behavior
  • Develop and leverage behavior-affect temporal dynamics models (cf. D’Mello et al, 2007; Baker, Rodrigo, & Xolocotzin, 2007) to create priors for predicting affect
  • Use log data to predict field observations and student responses
  • Assess goodness of detectors using student-level cross-validation
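Student-level cross-validation means that all of a student's data falls in the same fold, so a detector is always tested on students it never saw during training. A minimal sketch, assuming each row of distilled log data carries a student identifier (the function name and IDs below are hypothetical):

```python
def student_level_folds(student_ids, n_folds):
    """Yield (train_indices, test_indices) with no student in both sets.

    student_ids: one entry per data row, giving the student that row belongs to.
    """
    unique_students = sorted(set(student_ids))
    if n_folds < 2 or n_folds > len(unique_students):
        raise ValueError("n_folds must be between 2 and the number of students")
    # Assign whole students (not individual rows) to folds, round-robin.
    fold_of = {s: i % n_folds for i, s in enumerate(unique_students)}
    for fold in range(n_folds):
        test_idx = [i for i, s in enumerate(student_ids) if fold_of[s] == fold]
        train_idx = [i for i, s in enumerate(student_ids) if fold_of[s] != fold]
        yield train_idx, test_idx

# Hypothetical rows: three students contribute more than one observation.
row_students = ["s1", "s1", "s2", "s3", "s3", "s3", "s4"]
folds = list(student_level_folds(row_students, n_folds=2))
```

The round-robin assignment is deterministic for clarity; shuffling the student list before assignment would be a natural variation.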

5. Develop meta-detectors (months 9-12)

  • Use expectation maximization to adapt models to new data sets
  • Leverage the CTLVS1 taxonomy to develop meta-models that relate prediction features to design features
  • Cross-validation at grain-size of transfer between units or corresponding (within each LearnLab) to validate appropriateness for whole LearnLab
  • Test goodness of models when trained on three tutors and transferred to a fourth, to evaluate effectiveness for entirely new tutors
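The train-on-three, transfer-to-the-fourth evaluation can be sketched as a leave-one-tutor-out loop. The threshold "detector" below is a toy stand-in chosen only to keep the example short, and the tutor names, the single feature (idle seconds), and all data values are hypothetical.

```python
def fit_threshold(rows):
    """Toy stand-in detector: threshold halfway between the two class means."""
    positives = [x for x, y in rows if y == 1]
    negatives = [x for x, y in rows if y == 0]
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2

def accuracy(threshold, rows):
    """Fraction of rows where 'feature above threshold' matches the label."""
    return sum((x > threshold) == bool(y) for x, y in rows) / len(rows)

def leave_one_tutor_out(data_by_tutor):
    """Train on all tutors but one, test on the held-out tutor, for each tutor."""
    results = {}
    for held_out in data_by_tutor:
        train_rows = [
            row
            for tutor, rows in data_by_tutor.items()
            if tutor != held_out
            for row in rows
        ]
        model = fit_threshold(train_rows)  # no data from the held-out tutor is used
        results[held_out] = accuracy(model, data_by_tutor[held_out])
    return results

# Hypothetical rows of (idle_seconds, off_task_label) for four tutors.
data_by_tutor = {
    "tutor_a": [(5, 0), (40, 1), (8, 0), (35, 1)],
    "tutor_b": [(6, 0), (50, 1), (10, 0)],
    "tutor_c": [(4, 0), (45, 1), (38, 1)],
    "tutor_d": [(7, 0), (42, 1), (12, 0), (30, 1)],
}
transfer_accuracy = leave_one_tutor_out(data_by_tutor)
```

A real evaluation would replace accuracy with a chance-corrected measure, since H3 is stated in terms of performing better than chance.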

Independent Variables

n/a (see Research Plan)

Dependent Variables

n/a (see Research Plan)

Affective States and M&M Behaviors to be Modeled

Affective States:

  • Engaged Concentration (a subset of Flow) (cf. Baker et al, 2010)
  • Boredom (Kapoor, Burleson, & Picard, 2007)
  • Frustration (Kapoor, Burleson, & Picard, 2007)

M&M Behaviors:

Planned Studies

In 2010, data will be collected in Algebra, Geometry, Chemistry, MathTutor, and Science ASSISTments.

Explanation

Further Information

Connections

Annotated Bibliography

References

Aleven, V., McLaren, B., Roll, I., & Koedinger, K. (2006). Toward meta-cognitive tutoring: A model of help seeking with a Cognitive Tutor. International Journal of Artificial Intelligence in Education, 16, 101-128.

Baker, R.S.J.d. (2007). Modeling and Understanding Students' Off-Task Behavior in Intelligent Tutoring Systems. Proceedings of ACM CHI 2007: Computer-Human Interaction, 1059-1068.

Baker, R.S., Corbett, A.T., Koedinger, K.R., & Wagner, A.Z. (2004). Off-Task Behavior in the Cognitive Tutor Classroom: When Students "Game The System". Proceedings of ACM CHI 2004: Computer-Human Interaction, 383-390.

Baker, R.S.J.d., Rodrigo, M.M.T., & Xolocotzin, U.E. (2007). The Dynamics of Affective Transitions in Simulation Problem-Solving Environments. Proceedings of the Second International Conference on Affective Computing and Intelligent Interaction.

D'Mello, S. K., Picard, R. W., & Graesser, A. C. (2007). Towards an Affect-Sensitive AutoTutor. IEEE Intelligent Systems (Special Issue on Intelligent Educational Systems), 22(4), 53-61.

Kapoor, A., Burleson, W., & Picard, R. W. (2007). Automatic prediction of frustration. International Journal of Human-Computer Studies, 65, 724-736.

Shih, B., Koedinger, K., & Scheines, R. (2008). A Response Time Model for Bottom-Out Hints as Worked Examples. Proceedings of the 1st International Conference on Educational Data Mining, 117-126.

Future Plans