Computational Modeling and Data Mining Research Thrust

In our Computational Modeling and Data Mining (CMDM) Thrust, we demonstrated the application of fine-grained data to build accurate models of student performance and learning. We developed 1) domain-specific models of student knowledge representation, acquisition, and transfer (e.g., Falakmasir, Pardos, Gordon, & Brusilovsky, 2013; Koedinger, Yudelson, & Pavlik, in press), 2) domain-general models of learning (e.g., Li et al., 2015; Matsuda, Cohen, & Koedinger, 2015) and of the metacognitive and motivational processes that support learning (e.g., Aleven, McLaren, Roll, & Koedinger, 2006, cited by 247), and 3) predictive engineering models and methods that enable the design of large-impact instructional interventions (e.g., Koedinger, Stamper, McLaughlin, & Nixon, 2013; Koedinger & Aleven, 2007, cited by 335).

CMDM was a key contributor to center-wide outcomes such as the KLI Framework (Koedinger, Corbett, & Perfetti, 2012, cited by 128) and the Instructional Complexity analysis (Koedinger, Booth, & Klahr, 2013), which illuminates the vastness of the instructional design space (see Figure 1). CMDM has also been critical to the formation of the field of educational data mining, providing many award-winning results (see the section below) as well as influential reviews of the field (Baker & Yacef, 2009, cited by 590; Koedinger, D’Mello, McLaughlin, Pardos, & Rosé, 2015). Consistent with our interdisciplinary goals, CMDM researchers made fundamental contributions to machine learning (Singh & Gordon, 2008, cited by 356) and artificial intelligence (Li et al., 2015).

Domain-specific models of student knowledge. Through an interdisciplinary collaboration of cognitive science and psychometrics, we invented a new algorithm for automatically searching for better cognitive models of learning (Cen, Koedinger, & Junker, 2006, cited by 191) and demonstrated its effectiveness across 11 different datasets from DataShop in an award-winning paper (Koedinger, McLaughlin, & Stamper, 2012). A key to this approach was formulating a cognitive model, historically represented in discrete symbols, as a statistical model. Many variations on such statistical models have been explored since (Lee & Brunskill, 2012; Baker et al., 2008a, cited by 183; Pavlik, Cen, & Koedinger, 2009, cited by 166; Chi, Koedinger, Gordon, Jordon, & VanLehn, 2011; Yudelson, Koedinger, & Gordon, 2013; Pardos & Heffernan, 2010; …).
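The statistical reformulation at the heart of this approach, the Additive Factors Model used in Learning Factors Analysis, can be sketched as a logistic model over knowledge components (KCs). The KC names and parameter values below are invented for illustration, not fitted to any DataShop dataset:

```python
import math

# A minimal sketch of an Additive Factors Model (AFM), the logistic
# reformulation of a symbolic cognitive model. All parameters below are
# illustrative, not estimated from real data.

def afm_p_correct(theta, skills, beta, gamma, opportunities):
    """Probability a student answers a step correctly, given:
    theta         -- student proficiency (logit scale)
    skills        -- knowledge components (KCs) the step exercises
    beta          -- per-KC easiness (logit scale)
    gamma         -- per-KC learning rate per practice opportunity
    opportunities -- this student's prior practice counts, per KC
    """
    logit = theta + sum(beta[k] + gamma[k] * opportunities[k] for k in skills)
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical parameters for two KCs in an algebra unit.
beta = {"combine-terms": -0.5, "divide-both-sides": 0.2}
gamma = {"combine-terms": 0.15, "divide-both-sides": 0.10}

# The same student on the same kind of step, before and after practice:
p_early = afm_p_correct(0.0, ["combine-terms"], beta, gamma,
                        {"combine-terms": 0, "divide-both-sides": 0})
p_later = afm_p_correct(0.0, ["combine-terms"], beta, gamma,
                        {"combine-terms": 8, "divide-both-sides": 0})
print(p_early, p_later)  # practice on a KC raises predicted P(correct)
```

Model search then amounts to proposing alternative KC-to-step mappings and keeping the mapping whose fitted model predicts student performance best.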

In another demonstration of the great value of the data amassed in DataShop, Koedinger, Yudelson, & Pavlik (in press) used eight of the stored datasets to address a long-standing debate between competing theories of transfer of learning: faculty theory versus an identical elements or component theory (cf., Singley & Anderson, 1989). They developed statistical models of these alternative theories and found that the component theory provides better prediction accuracy and better explanatory power than the faculty theory. These results provide further support for the KLI Framework construct of knowledge components.
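A comparison like this pits the two theories' statistical models against each other on fit and parsimony. The sketch below shows one standard recipe for such comparisons (AIC/BIC); the log-likelihoods, parameter counts, and dataset size are invented, not taken from the paper:

```python
import math

# Hedged sketch: comparing two transfer theories as statistical models.
# The faculty theory explains performance with a few general abilities;
# the component (identical-elements) theory uses one learning curve per
# shared knowledge component, so it has more parameters.

def aic(log_likelihood, n_params):
    """Akaike information criterion: penalizes parameters by 2 each."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: heavier penalty, ln(n) per parameter."""
    return math.log(n_obs) * n_params - 2 * log_likelihood

# Hypothetical fits on one dataset (all numbers illustrative):
n_obs = 5000
faculty = {"ll": -3100.0, "k": 12}
component = {"ll": -2950.0, "k": 40}

for name, fit in [("faculty", faculty), ("component", component)]:
    print(name, aic(fit["ll"], fit["k"]), bic(fit["ll"], fit["k"], n_obs))
# Lower AIC/BIC wins: here the component theory's better likelihood
# outweighs its extra parameters, mirroring the paper's conclusion.
```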

Domain-general models of learning. One fundamental contribution of the CMDM team was an AI journal paper on the integration of representation learning into skill learning (Li et al., 2015). The paper presented a computational model that extended our machine learning simulation of student learning, SimStudent, providing a precise theoretical account of inductive learning processes in the KLI framework. This effort continued a LearnLab theme of demonstrating the relevance of language learning processes (i.e., grammar learning as a form of representation learning) to science and mathematics learning (cf., Koedinger & McLaughlin, 2010; Koedinger & McLaughlin, 2016).

Predictive Engineering Models and Methods. Our progress in developing models and methods for engineering more effective courses came through work on the Assistance Dilemma formulation (Koedinger & Aleven, 2007, cited by 335; Koedinger, Pavlik, McLaren, & Aleven, 2008; Pavlik & Anderson, 2008; Pavlik, Bolster, Wu, Koedinger, & MacWhinney, 2008; Borek, McLaren, Karabinos, & Yaron, 2009; McLaren, Timms, Weihnacht, & Brenner, 2012; McLaren, van Gog, Ganoe, Yaron, & Karabinos, 2014; Walkington & Maull, 2011) and through extending the KLI framework (see above). We pushed the KLI framework beyond its core focus on cognitive factors to touch on issues related to the Social Communicative thrust (e.g., Koedinger & Wiese, 2015) and the Metacognition and Motivation thrust. For example, an ongoing collaboration between CMDM and the Metacognition and Motivation thrust produced a series of highly influential papers on machine-learning-based detectors of student motivation and affect (e.g., Baker et al., 2013, cited by 175; Baker & Rossi, 2013; Baker et al., 2008c, cited by 126; Rodrigo & Baker, 2011).
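Detectors of this kind are classifiers trained on human-coded observations of student behavior. As a minimal sketch, the logistic detector below scores a hypothetical "gaming the system" construct; the feature names and weights are invented for illustration, whereas real detectors learn them from labeled classroom data:

```python
import math

# Hedged sketch of an affect/"gaming the system" detector as a logistic
# model over interaction-log features. Features and weights are invented;
# actual detectors are fit to human-coded observations.

WEIGHTS = {
    "fast_actions_frac": 2.0,   # fraction of actions faster than ~1 s
    "hint_requests":     0.8,   # hints requested on the current step
    "errors_in_a_row":   0.6,   # consecutive incorrect attempts
}
BIAS = -3.0

def p_gaming(features):
    """Probability the student is gaming, from a weighted feature sum."""
    z = BIAS + sum(WEIGHTS[f] * v for f, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))

engaged = {"fast_actions_frac": 0.1, "hint_requests": 0, "errors_in_a_row": 1}
gaming = {"fast_actions_frac": 0.9, "hint_requests": 3, "errors_in_a_row": 4}
print(p_gaming(engaged), p_gaming(gaming))  # rapid guessing scores far higher
```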

Large-impact Instructional Interventions. The learner models developed in CMDM led to large-impact instructional interventions. We provide two examples. First, Davenport et al. (2008) performed a micro-level domain analysis, using verbal protocol studies to identify the knowledge components that differentiate novice and expert performance in solving chemical equilibrium problems. These analyses revealed a central conceptual structure, progress of reaction, that was not being directly addressed in instruction. A cross-semester comparison demonstrated that the consequent redesign of the course materials led to a 2.5x increase in student performance on the targeted equilibrium content. Second, Chi et al. (2011) applied a machine learning method, reinforcement learning, to automatically derive adaptive pedagogical strategies directly from pre-existing student-computer interaction data. Results showed that the induced pedagogical strategies significantly improved students’ learning gains by up to 60% compared to control students.
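The second example can be sketched as offline reinforcement learning: estimate a Markov decision process from logged tutor interactions, then solve it for a teaching policy. Everything below (the states, the elicit/tell actions, the transition probabilities, and the reward) is an invented toy MDP, not the actual model from that work:

```python
# Hedged sketch: deriving a pedagogical policy from logged tutor data.
# The MDP below is a two-state toy; real systems estimate transitions
# and rewards by counting over pre-existing interaction logs.

states = ["low", "high"]          # coarse student-knowledge states
actions = ["elicit", "tell"]      # ask the student vs. tell the answer

# Transition probabilities P[s][a] = {next_state: prob}, as if estimated
# from interaction logs (invented numbers).
P = {
    "low":  {"elicit": {"low": 0.6, "high": 0.4},
             "tell":   {"low": 0.8, "high": 0.2}},
    "high": {"elicit": {"low": 0.1, "high": 0.9},
             "tell":   {"low": 0.3, "high": 0.7}},
}
# Immediate reward: a learning-gain proxy, the chance of reaching "high".
R = {s: {a: P[s][a]["high"] for a in actions} for s in states}

def value_iteration(discount=0.9, iters=200):
    """Solve the toy MDP, returning state values and a greedy policy."""
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        V = {s: max(R[s][a] + discount * sum(p * V[s2]
                    for s2, p in P[s][a].items())
                    for a in actions)
             for s in states}
    policy = {s: max(actions,
                     key=lambda a: R[s][a] + discount *
                     sum(p * V[s2] for s2, p in P[s][a].items()))
              for s in states}
    return V, policy

V, policy = value_iteration()
print(policy)  # in this toy MDP, eliciting dominates in both states
```

The appeal of the approach is that the policy is induced directly from data students have already generated, with no new experiments required until the final evaluation.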