== How Content and Interface Features Influence Student Choices Within the Learning Spaces ==

Ryan S.J.d. Baker, Albert T. Corbett, Kenneth R. Koedinger, Ma. Mercedes T. Rodrigo

===Overview===

PIs: Ryan S.J.d. Baker

Co-PIs: Albert T. Corbett, Kenneth R. Koedinger

Others who have contributed 160 hours or more:
* Jay Raspat, Carnegie Mellon University, taxonomy development
* Adriana M.J.A. de Carvalho, Carnegie Mellon University, data coding

Other significant personnel:
* Ma. Mercedes T. Rodrigo, Ateneo de Manila University, data coding methods
* Vincent Aleven, Carnegie Mellon University, taxonomy development

===Abstract===
We are investigating what factors lead students to make specific path choices in the learning space, focusing specifically on the shallow strategy known as [[gaming the system]], and on [[Off-Task Behavior]]. Prior PSLC research has shown that a variety of motivations, attitudes, and affective states are associated with the choice to game the system (Baker et al, 2004; Baker, 2007b; Rodrigo et al, 2007) and the choice of off-task behavior (Baker, 2007b) within intelligent tutoring systems. However, other recent research has found that differences between lessons are on the whole better predictors of gaming than differences between students (Baker, 2007a), suggesting that contextual factors associated with a specific tutor unit may be the most important reason why students game the system. Hence, this project is investigating how the content and presentational/interface aspects of a learning environment influence whether students tend to choose a gaming the system strategy. An extension to this project in 2008-2009 also investigated how the content and presentational/interface aspects of a learning environment influence whether students tend to choose to go off-task.
  
To this end, we have annotated a large proportion of the learning events/transactions in a set of twenty units in the [[Algebra]] LearnLab with descriptions of each unit's content and interface features, using a combination of human coding and educational data mining. We then used data mining to predict gaming and off-task behavior from the content and interface features of the units in which they occur. This gives us new insight into why students make specific path choices in the learning space, and explains the prior finding that path choices differ considerably between tutor units.
 
 
 
===Glossary===
*[[Gaming the system]]
*[[Help abuse]]
*[[Systematic Guessing]]
*[[Off-Task Behavior]]
  
 
===Research Questions===

What aspects of tutor lesson design lead to the choice to game the system?
  
What aspects of tutor lesson design lead to the choice to go off-task?
 
 
 
===Hypothesis===

;H1
:Content or interface features explain differences in gaming frequency better than stable between-student differences do
  
;H2
:Specific content or interface features will be replicably associated with differences in gaming the system across students
  
 
;H3
:Specific content or interface features will be replicably associated with differences in off-task behavior across students
  
 
===Background and Significance===

In recent years, there has been considerable interest in how students choose to interact with learning environments. At any given learning event, a student may choose from a variety of learning-oriented "deep" paths, including attempting to construct knowledge to solve a problem on one’s own (Brown and vanLehn, 1980), self-explaining (Chi et al, 1989; Siegler, 2002), and seeking help and thinking about it carefully (Aleven et al, 2003). Alternatively, the student may choose from a variety of non-learning oriented "shallow" strategies, such as [[Help Abuse]] (Aleven & Koedinger, 2001), [[Systematic Guessing]] (Baker et al, 2004), and the failure to engage in [[Self-explanation]]. A student may also leave the learning event space entirely by engaging in various forms of off-task behavior.
  
One analytical tool with considerable power to help learning scientists explain the ways students choose to use a learning environment is the [[learning event space]]. In a learning event space, the different paths a student could take are enumerated, and the effects of each path are detailed, both in terms of how the path influences the student’s success within the environment, and the student’s learning. The learning event space model provides a simple way to identify the possible paths and effects; it also provides a concrete way to break down complex research questions into simpler and more concrete questions.
  
[[Gaming the system]] is an active and strategic type of shallow strategy known to occur in many types of learning environments (cf. Baker et al, 2004; Cheng and Vassileva, 2005; Rodrigo et al, 2007), including the Cognitive Tutors used in LearnLab courses (Baker et al, 2004). It was earlier hypothesized that gaming stemmed from stable differences in student goals, motivation, and attitudes; however, multiple studies have now suggested that these constructs play only a small role in predicting gaming behavior (Baker et al, 2005; Walonoski & Heffernan, 2006; Baker et al, 2008). By contrast, variation in short-term affective states and in the tutor lesson itself appears to play a much larger role in the choice to game (Rodrigo et al, 2007; Baker, 2007a).
  
In this project, we investigate what it is about some tutor lessons that encourages or discourages gaming. This project helps explain why students choose shallow gaming strategies at some learning events and not at others. This contributes to our understanding of learning event spaces, and makes a significant contribution to the PSLC Theoretical Framework, by providing an account of why students choose shallow learning strategies in many of its learning event space models. The study of which lesson features predicted gaming was anticipated to jump-start the process of studying why students choose other shallow learning strategies beyond gaming the system, by providing a methodological template that can be directly applied in future research, as well as initial hypotheses to investigate. It did so, enabling analysis of which lesson features are associated with the choice to go off-task. This study has influenced the upcoming PSLC project [[Baker Closing the Loop on Gaming]].
  
===Independent Variables===

We have developed a taxonomy of the ways Cognitive Tutor lessons can differ from one another, the Cognitive Tutor Lesson Variation Space, version 1.1 (CTLVS1.1). The CTLVS1.1 was developed by a six-member design team with a variety of perspectives and expertise, including three Cognitive Tutor designers (with expertise in cognitive psychology and artificial intelligence), a researcher specializing in the study of gaming the system, a mathematics teacher with several years of experience using Cognitive Tutors in class, and a designer of non-computerized curricula who had not previously used a Cognitive Tutor. Full detail on the CTLVS1.1's design is given in Baker et al (2009).

The CTLVS1.1's features are as follows:

'''Difficulty, Complexity of Material, and Time-Consumingness'''
* 1. Average percent error
* 2. Lesson consists solely of review of material encountered in previous lessons
* 3. Average probability that student will learn a skill at each opportunity to practice skill (cf. Corbett & Anderson, 1995)
* 4. Average initial probability that student will know a skill when starting tutor (cf. Corbett & Anderson, 1995)
* 5. Average number of extraneous “distractor” values per problem
* 6. Proportion of problems where extraneous “distractor” values are given
* 7. Maximum number of mathematical operators needed to give correct answer on any step in lesson
* 8. Maximum number of mathematical operators mentioned in hint on any step in lesson
* 9. Intermediate calculations must be done outside of software (mentally or on paper) for some problem steps (ever occurs)
* 10. Proportion of hints that discuss intermediate calculations that must be done outside of software (mentally or on paper)
* 11. Total number of skills in lesson
* 12. Average time per problem step
* 13. Proportion of problem statements that incorporate multiple representations (for example: diagram as well as text)
* 14. Proportion of problem statements that use same numeric value for two constructs
* 15. Average number of distinct/separable questions or problem-solving tasks per problem
* 16. Maximum number of distinct/separable questions or problem-solving tasks in any problem
* 17. Average number of numerical quantities manipulated per step
* 18. Average number of times each skill is repeated per problem
* 19. Number of problems in lesson
* 20. Average time spent in lesson
* 21. Average number of problem steps per problem
* 22. Minimum number of answers or interface actions required to complete problem

'''Quality of Help Features'''
* 23. Average amount that reading on-demand hints improves performance on future opportunities to use skill (cf. Beck, 2006)
* 24. Average Flesch-Kincaid Grade Reading Level of hints
* 25. Proportion of hints using inductive support, going from example to abstract description of concept/principle (Koedinger & Anderson, 1998)
* 26. Proportion of hints that explicitly explain concepts or principles underlying current problem-solving step
* 27. Proportion of hints that explicitly refer to abstract principles
* 28. On average, how many hints must student request before concrete features of problems are discussed
* 29. Average number of hint messages per hint sequence that orient student to mathematical sub-goal
* 30. Proportion of hints that explicitly refer to scenario content (instead of referring solely to mathematical constructs)
* 31. Proportion of hint sequences that use terminology specific to this software
* 32. Proportion of hint messages which refer solely to interface features
* 33. Proportion of hint messages that cannot be understood by teacher
* 34. Proportion of hint messages with complex noun phrases
* 35. Proportion of skills where the only hint message explicitly tells student what to do

'''Usability'''
* 36. First problem step in first problem of lesson is either clearly indicated, or follows established convention (such as top-left cell in worksheet)
* 37. Problem-solving task in lesson is not made immediately clear
* 38. After student completes step, system indicates where in interface next action should occur
* 39. Proportion of steps where it is necessary to request hint to figure out what to do next
* 40. Not immediately apparent what icons in toolbar mean
* 41. Screen is sufficiently cluttered with interface widgets that it is difficult to determine where to enter answers
* 42. Proportion of steps where student must change a value in a cell that was previously treated as correct (examples: self-detection of errors; refinement of answers)
* 43. Format of answer changes between problem steps without clear indication
* 44. If student has skipped step, and asks for hint, hints refer to skipped step without explicitly highlighting it in interface (ever seen)
* 45. If student has skipped step, and asks for hint, skipped step is explicitly highlighted in interface (ever seen)

'''Relevance and Interestingness'''
* 46. Proportion of problem statements which involve concrete people/places/things, rather than just numerical quantities
* 47. Proportion of problem statements with story content
* 48. Proportion of problem statements which involve scenarios relevant to the "World of Work"
* 49. Proportion of problem statements which involve scenarios relevant to students’ current daily life
* 50. Proportion of problem statements which involve fantasy (example: being a rock star)
* 51. Proportion of problem statements which involve concrete details unfamiliar to population of students (example: dog-sleds)
* 52. Proportion of problems which use (or appear to use) genuine data
* 53. Proportion of problem statements with text not directly related to problem-solving task
* 54. Average number of person proper names in problem statements

'''Aspects of “buggy” messages notifying student why action was incorrect'''
* 55. Proportion of buggy messages that indicate which concept student demonstrated misconception in
* 56. Proportion of buggy messages that indicate how student’s action was the result of a procedural error
* 57. Proportion of buggy messages that refer solely to interface action
* 58. Buggy messages are not immediately given; instead icon appears, which can be hovered over to receive bug message

'''Design Choices Which Make It Easier to Game the System'''
* 59. Proportion of steps which are explicitly multiple-choice
* 60. Average number of choices in multiple-choice step
* 61. Proportion of hint sequences with final hint that explicitly tells student what the answer is, but not what/how to enter it in the tutor software
* 62. Hint gives directional feedback (example: “try a larger number”) (ever seen)
* 63. Average number of feasible answers for each problem step

'''Meta-Cognition and Complex Conceptual Thinking (or features that make them easy to avoid)'''
* 64. Student is prompted to give [[self-explanations]]
* 65. Hints give explicit metacognitive advice (ever seen)
* 66. Proportion of problem statements that use common word to indicate mathematical operation to use (example: “increase”)
* 67. Proportion of problem statements that indicate mathematical operation to use, but with uncommon terminology (example: “pounds below normal” to indicate subtraction)
* 68. Proportion of problem statements that explicitly tell student which mathematical operation to use (example: “add”)

'''Software Bugs/Implementation Flaws (rare)'''
* 69. Percent of problems where grammatical error is found in problem statement
* 70. Reference in problem statement to interface component that does not exist (ever occurs)
* 71. Proportion of problem steps where hints are unavailable
* 72. Hint recommends student do something which is incorrect or non-optimal (ever occurs)
* 73. Student can advance to new problem despite still visible errors on intermediate problem-solving steps

'''Miscellaneous'''
* 74. Hint requests that student perform some action
* 75. Value of answer is very large (over four significant digits) (ever seen)
* 76. Average length of text in multiple-choice popup widgets
* 77. Proportion of problem statements which include question or imperative
* 78. Student selects action from menu, tutor software performs action (as opposed to typing in answers, or direct manipulation)
* 79. Lesson is an "equation-solver" unit

We then labeled a large proportion of units in the [[Algebra]] LearnLab with these taxonomic features. These features make up the independent variables in this project.
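
Some of these features were coded by hand from inspection of the lessons, while others (such as features 1, 12, and 19) can be distilled automatically from log data, consistent with the combination of human coding and educational data mining described in the abstract. The sketch below is a minimal illustration of the automated side; the file name and column names are assumptions made for illustration, not the actual [[DataShop]] export schema.

<pre>
# Illustrative sketch only: compute a few log-derived CTLVS1.1 features per
# lesson from a DataShop-style transaction export. The file name and column
# names below are assumptions for illustration, not the actual DataShop schema.
import pandas as pd

transactions = pd.read_csv("algebra_transactions.csv")  # hypothetical export

def lesson_features(df):
    """Aggregate transaction-level data into one row of features per lesson."""
    grouped = df.groupby("lesson_id")
    return pd.DataFrame({
        # Feature 1 (approx.): percent of first attempts that were errors
        "avg_percent_error": 100.0 * (1.0 - grouped["first_attempt_correct"].mean()),
        # Feature 12: mean seconds elapsed per problem step
        "avg_time_per_step": grouped["step_duration_seconds"].mean(),
        # Feature 19 (approx.): number of distinct problems seen in the lesson
        "num_problems": grouped["problem_name"].nunique(),
    })

print(lesson_features(transactions).head())
</pre>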
  
 
===Dependent Variables===

We labeled approximately 1.2 million transactions in [[Algebra]] tutor data from the [[DataShop]] with predictions as to whether each transaction is an instance of gaming the system. These predictions were created by using text replay observations (Baker, Corbett, & Wagner, 2006) to label a representative set of transactions, and then using those labels to build gaming detectors (cf. Baker, Corbett, & Koedinger, 2004; Baker et al, 2008), which were then used to label the remaining transactions.
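
The sketch below illustrates this label-then-generalize step in simplified form. The published detectors (Baker et al, 2004, 2008) are Latent Response Models validated across students and lessons; the generic logistic-regression classifier and the file and column names here are assumptions, standing in purely for illustration.

<pre>
# Illustrative sketch: train a detector on the transactions hand-labeled via
# text replays, then apply it to the remaining (unlabeled) transactions.
# The classifier choice, file name, and column names are assumptions.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

data = pd.read_csv("transaction_features.csv")   # hypothetical feature table
feature_cols = [c for c in data.columns
                if c not in ("transaction_id", "gaming_label")]

labeled = data[data["gaming_label"].notna()]      # text-replay-coded subset
unlabeled = data[data["gaming_label"].isna()]

detector = LogisticRegression(max_iter=1000)
# Check generalization on the hand-labeled subset before trusting the model
print("cross-validated accuracy:",
      cross_val_score(detector, labeled[feature_cols],
                      labeled["gaming_label"].astype(int), cv=5).mean())

detector.fit(labeled[feature_cols], labeled["gaming_label"].astype(int))
data.loc[unlabeled.index, "gaming_label"] = detector.predict(
    unlabeled[feature_cols])
</pre>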
  
===Findings and Explanation===
  
The text below is adapted from Baker (2007b), Baker et al (2009), and Baker (2009).
  
The difference between lessons is a significantly better predictor than the difference between students of how much gaming behavior a student will engage in within a given lesson. Put more simply, knowing which lesson a student is using is a better predictor of how much gaming will occur than knowing which student it is.
  
In the Middle School Tutor, a model using lesson identity has 35 parameters and achieves an r-squared of 0.55, whereas a model using student identity has 240 parameters and achieves an r-squared of only 0.16. In the Algebra Tutor, the lesson model has 21 parameters and achieves an r-squared of 0.18; the student model achieves an equal r-squared, but with 58 parameters (one per student). Hence, lesson is a statistically better predictor, because it achieves equal or significantly better fit with considerably fewer parameters.
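
A minimal sketch of this kind of comparison appears below, fitting one dummy-coded regression that predicts gaming frequency from lesson identity alone and another from student identity alone. The data file and column names are assumptions for illustration.

<pre>
# Illustrative sketch: compare lesson identity vs. student identity as
# predictors of gaming frequency, via dummy-coded regressions.
# File and column names are assumptions for illustration.
import pandas as pd
import statsmodels.formula.api as smf

# One row per (student, lesson) pair with the observed proportion of gamed actions
rates = pd.read_csv("gaming_rates.csv")

lesson_model = smf.ols("gaming_rate ~ C(lesson)", data=rates).fit()
student_model = smf.ols("gaming_rate ~ C(student)", data=rates).fit()

print("lesson-only model:  r2 = %.2f with %d parameters"
      % (lesson_model.rsquared, lesson_model.df_model))
print("student-only model: r2 = %.2f with %d parameters"
      % (student_model.rsquared, student_model.df_model))
</pre>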
  
We empirically grouped the 79 features of the CTLVS1.1 into 6 factors, using Principal Component Analysis (PCA). We then analyzed whether the correlation between each of these 6 factors and the frequency of gaming the system was significant.
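
A simplified version of this analysis is sketched below: standardize the lesson-level feature matrix, extract six components, and regress each lesson's gaming frequency on each component score in turn. The exact factoring procedure used in the published analysis may differ; the file and column names here are assumptions.

<pre>
# Illustrative sketch: group the 79 lesson-level CTLVS1.1 features into 6
# components and test each component's association with gaming frequency.
# The factoring procedure, file name, and column names are assumptions.
import pandas as pd
import statsmodels.api as sm
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

lessons = pd.read_csv("ctlvs_lesson_features.csv")  # hypothetical: one row per lesson
gaming = lessons.pop("gaming_frequency")            # observed gaming rate per lesson

scores = PCA(n_components=6).fit_transform(
    StandardScaler().fit_transform(lessons))

for k in range(6):
    model = sm.OLS(gaming, sm.add_constant(scores[:, k])).fit()
    print("factor %d: r2 = %.2f, F = %.2f, p = %.3f"
          % (k + 1, model.rsquared, model.fvalue, model.f_pvalue))
</pre>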
  
Of these 6 factors, one was statistically significantly associated with the choice to game the system, r2 = 0.29 (i.e., accounting for 29% of the variance in gaming), F(1,19) = 7.84, p = 0.01. The factor loaded strongly on eight features associated with more gaming:
* 14: The same number being used for multiple constructs
* 23-inverse-direction: Reading hints does not positively influence performance on future opportunities to use skill
* 27: Proportion of hints in each hint sequence that refer to abstract principles
* 40: Not immediately apparent what icons in toolbar mean
* 53-inverse-direction: Lack of text in problem statements not directly related to the problem-solving task, generally there to increase interest
* 63-inverse-direction: Hints do not give directional feedback such as “try a larger number”
* 71-inverse-direction: Lack of implementation flaw in hint message, where there is a reference to a non-existent interface component
* 75: Hint requests that student perform some action
  
In general, several of the features in this factor appear to correspond to a lack of clarity in the presentation of the content or task (23-inverse, 40, 63-inverse), as well as abstractness (27) and ambiguity (14). Curiously, feature 71-inverse (the lack of a specific type of implementation flaw in hint messages, which would make things very unclear) appears to point in the opposite direction – however, this implementation flaw was only common in a single rarely gamed lesson, so this result is probably a statistical artifact.

Feature 53-inverse appears to represent a different construct – interestingness (or the attempt to increase interestingness). The fact that feature 53 was associated with less gaming whereas more specific interest-increasing features (features 46-52) were not so strongly related may suggest that it is less important exactly how a problem scenario attempts to increase interest than it is important that the problem scenario has some content in it that is not strictly mathematical.

Taken individually, two of the constructs in this factor were significantly (or marginally significantly) associated with gaming. Feature 53-inverse (text in the problem statement not directly related to the problem-solving task) was associated with significantly less gaming, r2 = 0.19, F(1,19) = 4.59, p = 0.04. Feature 40 (when it is not immediately apparent what icons in the toolbar mean) was marginally significantly associated with more gaming, r2 = 0.15, F(1,19) = 3.52, p = 0.08. The fact that other top features in the factor were not independently associated with gaming, while the factor as a whole was fairly strongly associated with gaming, suggests that gaming may occur primarily when more than one of these features are present.

Two features that were not present in the significant factor were statistically significantly associated with gaming. Feature 36, where the location of the first problem step does not follow conventions (such as being the top-left cell of a worksheet) and is not directly indicated, was associated with more gaming, r2 = 0.20, F(1,19) = 4.97, p = 0.04. This feature, like many of those in the gaming-related factor, represents an unclear or confusing lesson. Feature 79, whether or not the lesson was an equation-solver unit, was also statistically significantly associated with gaming, r2 = 0.30, F(1,19) = 8.55, p < 0.01. Note, however, that although a lower amount of interesting text is generally associated with more gaming (Feature 53), equation-solver units (which have no text) have less gaming in general (Feature 79). This result may suggest that interest-increasing text is only beneficial (for reducing gaming) above a certain threshold -- alternatively, other aspects of the equation-solver units may have reduced gaming even though the lack of interest-increasing text would generally be expected to increase it.

When the gaming-related factor, Feature 36, and Feature 79 were included in a model together, all remained statistically significant, and the combined model explained over half of the variance in gaming (r2 = 0.55).
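
A sketch of such a combined model appears below, regressing lesson-level gaming frequency on the gaming-related factor score together with Features 36 and 79 simultaneously. The file and variable names are assumptions for illustration.

<pre>
# Illustrative sketch: the combined lesson-level model, with the gaming-related
# factor score and Features 36 and 79 entered simultaneously.
# File and column names are assumptions for illustration.
import pandas as pd
import statsmodels.formula.api as smf

lessons = pd.read_csv("lesson_level_data.csv")  # hypothetical: one row per lesson
combined = smf.ols(
    "gaming_frequency ~ gaming_factor_score + feature_36 + feature_79",
    data=lessons).fit()
print(combined.summary())   # reports overall r-squared and per-term significance
</pre>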

Five other features that were not strongly loaded in the significant factor were marginally associated with gaming. None of these other features is statistically significant in a model that already includes the gaming-related factor and Feature 36. Due to the non-conclusiveness of the evidence relevant to these features, we will not discuss all of them in detail, but will briefly mention one that has appeared in prior discussions of gaming. Lessons where a higher proportion of hint sequences told students what to do on the last hint (Feature 61) had marginally significantly more gaming, r2 = 0.14, F(1,19) = 3.28, p = 0.09. This result is unsurprising, as drilling through hints and typing in a bottom-out hint is one of the easiest and most frequently reported types of [[gaming the system]].

The off-task behavior model achieved similar predictive power, but was a much less complex model. None of the 6 factors was statistically significantly associated with off-task behavior. Only one of the features was individually statistically significantly associated with off-task behavior: Feature 79, whether or not the lesson was an equation-solver unit. Equation-solver units had significantly less off-task behavior, just as they had significantly less gaming the system, and the effect was large in magnitude, r2 = 0.55, F(1,21) = 27.29, p < 0.001, Bonferroni-adjusted p < 0.001.

To put this relationship into better context, we can look at the proportion of time students spent off-task in equation-solver lessons as compared to other lessons. On average, students spent 4.4% of their time off-task within the equation-solver lessons, much lower than is generally seen in intelligent tutor classrooms or, for that matter, in traditional classrooms. By contrast, students spent 14.1% of their time off-task within the other lessons, a proportion of time off-task which is much more in line with previous observations. The difference in time spent off-task per type of lesson is, as would be expected, statistically significant, t(22) = 4.48, p < 0.001.
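
A minimal sketch of this final comparison is below, using an independent-samples t-test across lessons. The file and column names are assumptions for illustration.

<pre>
# Illustrative sketch: compare the percentage of time spent off-task in
# equation-solver lessons versus all other lessons with an independent-samples
# t-test across lessons. File and column names are assumptions; the
# is_equation_solver column is assumed to be boolean.
import pandas as pd
from scipy import stats

lessons = pd.read_csv("offtask_by_lesson.csv")   # hypothetical: one row per lesson
solver = lessons.loc[lessons["is_equation_solver"], "pct_time_off_task"]
other = lessons.loc[~lessons["is_equation_solver"], "pct_time_off_task"]

t, p = stats.ttest_ind(solver, other)
print("equation-solver: %.1f%% off-task, other: %.1f%% off-task, t = %.2f, p = %.4f"
      % (solver.mean(), other.mean(), t, p))
</pre>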
  
 
===Connections to Other PSLC Studies===

This study inspired and led to the upcoming Year 6 study, [[Baker - Closing the Loop]].
  
===Annotated Bibliography===
  
===References===

Aleven, V., Koedinger, K.R. (2001) Investigations into Help Seeking and Learning with a Cognitive Tutor. In R. Luckin (Ed.), Papers of the AIED-2001 Workshop on Help Provision and Help Seeking in Interactive Learning Environments, 47-58.

Aleven, V., Stahl, E., Schworm, S., Fischer, F., Wallace, R. (2003) Help seeking and help design in interactive learning environments. Review of Educational Research, 73 (3), 277-320.

Baker, R.S.J.d. (2007a) Is Gaming the System State-or-Trait? Educational Data Mining Through the Multi-Contextual Application of a Validated Behavioral Model. Complete On-Line Proceedings of the Workshop on Data Mining for User Modeling at the 11th International Conference on User Modeling 2007, 76-80. [http://www.joazeirodebaker.net/ryan/B2007B.pdf pdf]

Baker, R.S.J.d. (2007b) Modeling and Understanding Students' Off-Task Behavior in Intelligent Tutoring Systems. Proceedings of ACM CHI 2007: Computer-Human Interaction, 1059-1068.

Baker, R.S.J.d. (2009) Differences Between Intelligent Tutor Lessons, and the Choice to Go Off-Task. Proceedings of the 2nd International Conference on Educational Data Mining, 11-20.

Baker, R.S.J.d., Corbett, A.T., Koedinger, K.R., Aleven, V., de Carvalho, A., Raspat, J. (2009) Educational Software Features that Encourage and Discourage "Gaming the System". Proceedings of the 14th International Conference on Artificial Intelligence in Education, 475-482.

Baker, R.S., Corbett, A.T., Koedinger, K.R., Wagner, A.Z. (2004) Off-Task Behavior in the Cognitive Tutor Classroom: When Students “Game the System”. Proceedings of ACM CHI 2004: Computer-Human Interaction, 383-390. [http://www.joazeirodebaker.net/ryan/p383-baker-rev.pdf pdf]

Baker, R.S.J.d., Corbett, A.T., Roll, I., Koedinger, K.R. (2008) Developing a Generalizable Detector of When Students Game the System. User Modeling and User-Adapted Interaction, 18, 3, 287-314. [http://www.joazeirodebaker.net/ryan/USER475.pdf pdf]

Baker, R.S., Corbett, A.T., Wagner, A.Z. (2006) Human Classification of Low-Fidelity Replays of Student Actions. Proceedings of the Educational Data Mining Workshop at the 8th International Conference on Intelligent Tutoring Systems, 29-36. [http://www.joazeirodebaker.net/ryan/BCWFinal.pdf pdf]

Baker, R.S., Roll, I., Corbett, A.T., Koedinger, K.R. (2005) Do Performance Goals Lead Students to Game the System? Proceedings of the 12th International Conference on Artificial Intelligence in Education, 57-64. [http://www.joazeirodebaker.net/ryan/BRCKAIED2005Final.pdf pdf]

Baker, R.S.J.d., Walonoski, J.A., Heffernan, N.T., Roll, I., Corbett, A.T., Koedinger, K.R. (2008) Why Students Engage in "Gaming the System" Behavior in Interactive Learning Environments. Journal of Interactive Learning Research, 19 (2), 185-224. [http://www.joazeirodebaker.net/ryan/BWHRKC-JILR-draft.pdf pdf]

Beck, J.E. (2006) Using Learning Decomposition to Analyze Student Fluency Development. Workshop on Educational Data Mining at the 8th International Conference on Intelligent Tutoring Systems, 21-28.

Brown, J.S., vanLehn, K. (1980) Repair theory: A generative theory of bugs in procedural skills. Cognitive Science, 4, 379-426.

Cheng, R., Vassileva, J. (2005) Adaptive Reward Mechanism for Sustainable Online Learning Community. Proceedings of the 12th International Conference on Artificial Intelligence in Education, 152-159.

Chi, M.T.H., Bassok, M., Lewis, M.W., Reimann, P., Glaser, R. (1989) Self-Explanations: How Students Study and Use Examples in Learning to Solve Problems. Cognitive Science, 13, 145-182.

Corbett, A.T., Anderson, J.R. (1995) Knowledge tracing: Modeling the acquisition of procedural knowledge. User Modeling and User-Adapted Interaction, 4, 253-278.

Koedinger, K.R., Anderson, J.R. (1998) Illustrating principled design: The early evolution of a cognitive tutor for algebra symbolization. Interactive Learning Environments, 5, 161-180.

Rodrigo, M.M.T., Baker, R.S.J.d., Lagud, M.C.V., Lim, S.A.L., Macapanpan, A.F., Pascua, S.A.M.S., Santillano, J.Q., Sevilla, L.R.S., Sugay, J.O., Tep, S., Viehland, N.J.B. (2007) Affect and Usage Choices in Simulation Problem Solving Environments. Proceedings of Artificial Intelligence in Education 2007, 145-152. [http://www.joazeirodebaker.net/ryan/RodrigoBakeretal2006Final.pdf pdf]

Siegler, R.S. (2002) Microgenetic Studies of Self-Explanations. In N. Granott & J. Parziale (Eds.), Microdevelopment: Transition processes in development and learning, 31-58. New York: Cambridge University Press.

Walonoski, J.A., Heffernan, N.T. (2006) Detection and Analysis of Off-Task Gaming Behavior in Intelligent Tutoring Systems. Proceedings of the 8th International Conference on Intelligent Tutoring Systems, 382-391.
 

Latest revision as of 21:26, 5 December 2009

How Content and Interface Features Influence Student Choices Within the Learning Spaces

Ryan S.J.d. Baker, Albert T. Corbett, Kenneth R. Koedinger, Ma. Mercedes T. Rodrigo

Overview

PIs: Ryan S.J.d Baker

Co-PIs: Albert T. Corbett, Kenneth R. Koedinger

Others who have contributed 160 hours or more:

  • Jay Raspat, Carnegie Mellon University, taxonomy development
  • Adriana M.J.A. de Carvalho, Carnegie Mellon University, data coding

Others significant personnel :

  • Ma. Mercedes T. Rodrigo, Ateneo de Manila University, data coding methods
  • Vincent Aleven, Carnegie Mellon University, taxonomy development

Abstract

We are investigating what factors lead students to make specific path choices in the learning space, focusing specifically on the shallow strategy known as gaming the system, and on Off-Task Behavior. Prior PSLC research has shown that a variety of motivations, attitudes, and affective states are associated with the choice to game the system (Baker et al, 2004; Baker, 2007b; Rodrigo et al, 2007) and the choice of off-task behavior (Baker, 2007b) within intelligent tutoring systems. However, other recent research has found that differences between lessons are on the whole better predictors of gaming than differences between students (Baker, 2007), suggesting that contextual factors associated with a specific tutor unit may be the most important reason why students game the system. Hence, this project is investigating how the content and presentational/interface aspects of a learning environment influence whether students tend to choose a gaming the system strategy. An extension to this project in 2008-2009 also investigated how the content and presentational/interface aspects of a learning environment influence whether students tend to choose a gaming the system strategy.

To this end, we have annotated a large proportion of the learning events/transactions in a set of twenty units in the Algebra LearnLab with descriptions of each unit's content and interface features, using a combination of human coding and educational data mining. We then used data mining to predict gaming and off-task behavior with the content and interface features of the units they occur in. This gives us new insight into why students make specific path choices in the learning space, and explains the prior finding that path choices differ considerably between tutor units.

Glossary

Research Questions

What aspects of tutor lesson design lead to the choice to game the system?

What aspects of tutor lesson design lead to the choice to go off-task?

Hypothesis

H1
Content or interface features better explain differences in gaming frequency than stable between-student differences
H2
Specific content or interface features will be replicably associated with differences in gaming the system across students
H3
Specific content or interface features will be replicably associated with differences in off-task behavior across students

Background and Significance

In recent years, there has been considerable interest in how students choose to interact with learning environments. At any given learning event, a student may choose from a variety of learning-oriented "deep" paths, including attempting to construct knowledge to solve a problem on one’s own (Brown and vanLehn, 1980), self-explaining (Chi et al, 1989; Siegler, 2002), and seeking help and thinking about it carefully (Aleven et al, 2003). Alternatively, the student may choose from a variety of non-learning oriented "shallow" strategies, such as Help Abuse (Aleven & Koedinger, 2001), Systematic Guessing (Baker et al, 2004), and the failure to engage in Self-explanation. A student may also leave the learning event space entirely by engaging in various forms of off-task behavior.

One analytical tool with considerable power to help learning scientists explain the ways students choose to use a learning environment is the learning event space. In a learning event space, the different paths a student could take are enumerated, and the effects of each path are detailed, both in terms of how the path influences the student’s success within the environment, and the student’s learning. The learning event space model provides a simple way to identify the possible paths and effects; it also provides a concrete way to break down complex research questions into simpler and more concrete questions.

Gaming the system is an active and strategic type of shallow strategy known to occur in many types of learning environments (cf. Baker et al, 2004; Cheng and Vassileva, 2005; Rodrigo et al, 2007), including the Cognitive Tutors used in LearnLab courses (Baker et al, 2004). It was earlier hypothesized that gaming stemmed from stable differences in student goals, motivation, and attitudes -- however multiple studies have now suggested that these constructs play only a small role in predicting gaming behavior (Baker et al, 2005; Walonoski & Heffernan, 2006; Baker et al, 2008). By contrast, variation in short-term affective states and the tutor lesson itself appear to play a much larger role in the choice to game (Rodrigo et al, 2007; Baker, 2007a).

In this project, we investigate what it is about some tutor lessons that encourages or discourages gaming. This project helps explain why students choose shallow gaming strategies at some learning events and not at others. This contributes to our understanding of learning event spaces, and makes a significant contribution to the PSLC Theoretical Framework, by providing an account for why students choose the shallow learning strategies in many of the learning event space models in the PSLC Theoretical Framework. The study of what lesson features predicted gaming was anticipated to jump-start the process of studying why students choose other shallow learning strategies beyond gaming the system, by providing a methodological template that can be directly applied in future research, as well as initial hypotheses to investigate. It did so, enabling analysis of which lesson features are associated with the choice to go off-task. This study has influenced the upcoming PSLC project Baker Closing the Loop on Gaming.

Independent Variables

We have developed a taxonomy for how Cognitive Tutor lessons can differ from one another, the Cognitive Tutor Lesson Variation Space, version 1.1 (CTLVS1.1). The CTLVS1 was developed by a six member design team with a variety of perspectives and expertise, including three Cognitive Tutor designers (with expertise in cognitive psychology and artificial intelligence), a researcher specializing in the study of gaming the system, a mathematics teacher with several years of experience using Cognitive Tutors in class, and a designer of non-computerized curricula who had not previously used a Cognitive Tutor. Full detail on the CTLVS1's design is given in Baker et al (in press a).

The CTLVS1's features are as follows:

Difficulty, Complexity of Material, and Time-Consumingness

  • 1. Average percent error
  • 2. Lesson consists solely of review of material encountered in previous lessons
  • 3. Average probability that student will learn a skill at each opportunity to practice skill (cf. Corbett & Anderson, 1995)
  • 4. Average initial probability that student will know a skill when starting tutor (cf. Corbett & Anderson, 1995)
  • 5. Average number of extraneous “distractor” values per problem
  • 6. Proportion of problems where extraneous “distractor” values are given
  • 7. Maximum number of mathematical operators needed to give correct answer on any step in lesson
  • 8. Maximum number of mathematical operators mentioned in hint on any step in lesson
  • 9. Intermediate calculations must be done outside of software (mentally or on paper) for some problem steps (ever occurs)
  • 10. Proportion of hints that discuss intermediate calculations that must be done outside of software (mentally or on paper)
  • 11. Total number of skills in lesson
  • 12. Average time per problem step
  • 13. Proportion of problem statements that incorporate multiple representations (for example: diagram as well as text)
  • 14. Proportion of problem statements that use same numeric value for two constructs
  • 15. Average number of distinct/separable questions or problem-solving tasks per problem
  • 16. Maximum number of distinct/separable questions or problem-solving tasks in any problem
  • 17. Average number of numerical quantities manipulated per step
  • 18. Average number of times each skill is repeated per problem
  • 19. Number of problems in lesson
  • 20. Average time spent in lesson
  • 21. Average number of problem steps per problem
  • 22. Minimum number of answers or interface actions required to complete problem

Quality of Help Features

  • 23. Average amount that reading on-demand hints improves performance on future opportunities to use skill (cf. Beck, 2006)
  • 24. Average Flesch-Kincaid Grade Reading Level of hints
  • 25. Proportion of hints using inductive support, going from example to abstract description of concept/principle (Koedinger & Anderson, 1998)
  • 26. Proportion of hints that explicitly explain concepts or principles underlying current problem-solving step
  • 27. Proportion of hints that explicitly refer to abstract principles
  • 28. On average, how many hints must student request before concrete features of problems are discussed
  • 29. Average number of hint messages per hint sequence that orient student to mathematical sub-goal
  • 30. Proportion of hints that explicitly refer to scenario content (instead of referring solely to mathematical constructs)
  • 31. Proportion of hint sequences that use terminology specific to this software
  • 32. Proportion of hint messages which refer solely to interface features
  • 33. Proportion of hint messages that cannot be understood by teacher
  • 34. Proportion of hint messages with complex noun phrases
  • 35. Proportion of skills where the only hint message explicitly tells student what to do

Usability

  • 36. First problem step in first problem of lesson is either clearly indicated, or follows established convention (such as top-left cell in worksheet)
  • 37. Problem-solving task in lesson is not made immediately clear
  • 38. After student completes step, system indicates where in interface next action should occur
  • 39. Proportion of steps where it is necessary to request hint to figure out what to do next
  • 40. Not immediately apparent what icons in toolbar mean
  • 41. Screen is sufficiently cluttered with interface widgets, that it is difficult to determine where to enter answers
  • 42. Proportion of steps where student must change a value in a cell that was previously treated as correct (examples: self-detection of errors; refinement of answers)
  • 43. Format of answer changes between problem steps without clear indication
  • 44. If student has skipped step, and asks for hint, hints refer to skipped step without explicitly highlighting in interface (ever seen)
  • 45. If student has skipped step, and asks for hint, skipped step is explicitly highlighted in interface (ever seen)

Relevance and Interestingness

  • 46. Proportion of problem statements which involve concrete people/places/things, rather than just numerical quantities
  • 47. Proportion of problem statements with story content
  • 48. Proportion of problem statements which involve scenarios relevant to the "World of Work"
  • 49. Proportion of problem statements which involve scenarios relevant to students’ current daily life
  • 50. Proportion of problem statements which involve fantasy (example: being a rock star)
  • 51. Proportion of problem statements which involve concrete details unfamiliar to population of students (example: dog-sleds)
  • 52. Proportion of problems which use (or appear to use) genuine data
  • 53. Proportion of problem statements with text not directly related to problem-solving task
  • 54. Average number of person proper names in problem statements

Aspects of “buggy” messages notifying student why action was incorrect

  • 55. Proportion of buggy messages that indicate which concept student demonstrated misconception in
  • 56. Proportion of buggy messages that indicate how student’s action was the result of a procedural error
  • 57. Proportion of buggy messages that refer solely to interface action
  • 58. Buggy messages are not immediately given; instead, an icon appears which can be hovered over to see the buggy message

Design Choices Which Make It Easier to Game the System

  • 59. Proportion of steps which are explicitly multiple-choice
  • 60. Average number of choices in multiple-choice step
  • 61. Proportion of hint sequences with final hint that explicitly tells student what the answer is, but not what/how to enter it in the tutor software
  • 62. Hint gives directional feedback (example: “try a larger number”) (ever seen)
  • 63. Average number of feasible answers for each problem step

Meta-Cognition and Complex Conceptual Thinking (or features that make them easy to avoid)

  • 64. Student is prompted to give self-explanations
  • 65. Hints give explicit metacognitive advice (ever seen)
  • 66. Proportion of problem statements that use common word to indicate mathematical operation to use (example: “increase”)
  • 67. Proportion of problem statements that indicate mathematical operation to use, but with uncommon terminology (example: “pounds below normal” to indicate subtraction)
  • 68. Proportion of problem statements that explicitly tell student which mathematical operation to use (example: “add”)

Software Bugs/Implementation Flaws (rare)

  • 69. Percent of problems where grammatical error is found in problem statement
  • 70. Reference in problem statement to interface component that does not exist (ever occurs)
  • 71. Proportion of problem steps where hints are unavailable
  • 72. Hint recommends student do something which is incorrect or non-optimal (ever occurs)
  • 73. Student can advance to new problem despite still visible errors on intermediate problem-solving steps

Miscellaneous

  • 74. Hint requests that student perform some action
  • 75. Value of answer is very large (over four significant digits) (ever seen)
  • 76. Average length of text in multiple-choice popup widgets
  • 77. Proportion of problem statements which include question or imperative
  • 78. Student selects action from menu, tutor software performs action (as opposed to typing in answers, or direct manipulation)
  • 79. Lesson is an "equation-solver" unit

We then labeled a large proportion of units in the Algebra LearnLab with these taxonomic features. These features make up the independent variables in this project.

Dependent Variables

We labeled approximately 1.2 million transactions in Algebra tutor data from the DataShop with predictions as to whether each transaction was an instance of gaming the system. These predictions were created by using text replay observations (Baker, Corbett, & Wagner, 2006) to label a representative set of transactions, and then using these labels to create gaming detectors (cf. Baker, Corbett, & Koedinger, 2004; Baker et al., 2008) which can be used to label the remaining transactions.
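The two-stage labeling process can be illustrated with a short sketch. The code below is not the authors' actual pipeline; the file name, feature columns, and classifier choice are hypothetical stand-ins for whatever transaction-level features and machine-learned detector were actually used. It only shows the pattern: train a detector on the text-replay-coded sample, then apply it to the remaining transactions.

 # Minimal sketch of two-stage labeling (hypothetical file and column names).
 import pandas as pd
 from sklearn.ensemble import RandomForestClassifier
 
 transactions = pd.read_csv("algebra_transactions.csv")        # all ~1.2M transactions
 feature_cols = ["duration", "is_help_request", "is_error", "prior_errors_on_step"]
 
 labeled = transactions[transactions["gaming_label"].notna()]   # text-replay-coded sample
 unlabeled = transactions[transactions["gaming_label"].isna()]  # everything else
 
 # Train a detector on the hand-labeled sample.
 detector = RandomForestClassifier(n_estimators=200, random_state=0)
 detector.fit(labeled[feature_cols], labeled["gaming_label"])
 
 # Label every remaining transaction with the detector's prediction.
 transactions.loc[unlabeled.index, "gaming_label"] = detector.predict(unlabeled[feature_cols])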

Findings and Explanation

The text below is taken from (Baker, 2007b; Baker et al, in press a, accepted).

The difference between lessons is a significantly better predictor than the difference between students of how much gaming behavior a student will engage in within a given lesson. Put more simply, knowing which lesson a student is working in is a better predictor of how much gaming will occur than knowing which student it is.

In the Middle School Tutor, the lesson model has 35 parameters and achieves an r-squared of 0.55, while the student model has 240 parameters and achieves an r-squared of 0.16. In the Algebra Tutor, the lesson model has 21 parameters and achieves an r-squared of 0.18; the student model achieves an equal r-squared, but with 58 parameters (one per student). Hence, lesson is the statistically better predictor, because it achieves equal or significantly better fit with considerably fewer parameters.
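As an illustration of this model comparison, the following sketch assumes a data layout of one row per student-lesson pair, with hypothetical columns named student, lesson, and gaming_freq. It fits a lesson-only and a student-only regression model and reports each model's r-squared alongside the number of coefficients it uses.

 # Minimal sketch of the lesson-vs-student comparison (assumed data layout).
 import pandas as pd
 import statsmodels.formula.api as smf
 
 df = pd.read_csv("gaming_by_student_lesson.csv")   # columns: student, lesson, gaming_freq
 
 lesson_model = smf.ols("gaming_freq ~ C(lesson)", data=df).fit()
 student_model = smf.ols("gaming_freq ~ C(student)", data=df).fit()
 
 # df_model is the number of fitted coefficients (excluding the intercept).
 print("lesson:  r2=%.2f, params=%d" % (lesson_model.rsquared, lesson_model.df_model))
 print("student: r2=%.2f, params=%d" % (student_model.rsquared, student_model.df_model))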

We empirically grouped the 79 features of the CTLVS1.1 into 6 factors using Principal Component Analysis (PCA). We then analyzed whether any of these 6 factors was significantly correlated with the frequency of gaming the system.
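A sketch of this analysis step is given below, under the assumption that the lesson-level data live in a table with one row per lesson and hypothetical column names (feature_1 through feature_79, plus gaming_freq): the features are standardized, reduced to 6 components with PCA, and gaming frequency is then regressed on each component in turn.

 # Minimal sketch of the PCA + per-factor regression step (assumed column names).
 import pandas as pd
 import statsmodels.api as sm
 from sklearn.decomposition import PCA
 from sklearn.preprocessing import StandardScaler
 
 lessons = pd.read_csv("lesson_features.csv")       # one row per lesson
 X = StandardScaler().fit_transform(lessons[[f"feature_{i}" for i in range(1, 80)]])
 factors = PCA(n_components=6).fit_transform(X)     # 79 features -> 6 factor scores
 
 # Regress gaming frequency on each factor separately and report the F-test.
 for k in range(6):
     model = sm.OLS(lessons["gaming_freq"], sm.add_constant(factors[:, k])).fit()
     print(f"factor {k + 1}: r2={model.rsquared:.2f}, F={model.fvalue:.2f}, p={model.f_pvalue:.3f}")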

Of these 6 factors, one was statistically significantly associated with the choice to game the system, r2 = 0.29 (i.e., accounting for 29% of the variance in gaming), F(1,19) = 7.84, p = 0.01. The factor loaded strongly on eight features associated with more gaming:

  • 14: The same number being used for multiple constructs
  • 23-inverse-direction: Reading hints does not positively influence performance on future opportunities to use skill
  • 27: Proportion of hints in each hint sequence that refer to abstract principles
  • 40: Not immediately apparent what icons in toolbar mean
  • 53-inverse-direction: Lack of text in problem statements not directly related to the problem-solving task, generally there to increase interest
  • 63-inverse-direction: Hints do not give directional feedback such as “try a larger number”
  • 71-inverse-direction: Lack of implementation flaw in hint message, where there is a reference to a non-existent interface component
  • 75: Hint requests that student perform some action

In general, several of the features in this factor appear to correspond to a lack of clarity in the presentation of the content or task (23-inverse, 40, 63-inverse), as well as abstractness (27) and ambiguity (14). Curiously, feature 71-inverse (the lack of a specific type of implementation flaw in hint messages, which would make things very unclear) appears to point in the opposite direction – however, this implementation flaw was only common in a single rarely gamed lesson, so this result is probably a statistical artifact.

Feature 53-inverse appears to represent a different construct: interestingness (or the attempt to increase interest). The fact that feature 53 was associated with less gaming, whereas the more specific interest-increasing features (features 46-52) were not so strongly related, may suggest that exactly how a problem scenario attempts to increase interest matters less than whether the scenario contains some content that is not strictly mathematical.

Taken individually, two of the constructs in this factor were significantly (or marginally significantly) associated with gaming. Feature 53 (text in the problem statement not directly related to the problem-solving task) was associated with significantly less gaming, r2 = 0.19, F(1,19) = 4.59, p = 0.04. Feature 40 (when it is not immediately apparent what the icons in the toolbar mean) was marginally significantly associated with more gaming, r2 = 0.15, F(1,19) = 3.52, p = 0.08. The fact that the other top features in the factor were not independently associated with gaming, while the factor as a whole was fairly strongly associated with gaming, suggests that gaming may occur primarily when more than one of these features is present.

Two features that were not present in the significant factor were statistically significantly associated with gaming. Feature 36, where the location of the first problem step neither follows established conventions (such as being the top-left cell of a worksheet) nor is directly indicated, was significantly associated with gaming, r2 = 0.20, F(1,19) = 4.97, p = 0.04. This feature, like many of those in the gaming-related factor, represents an unclear or confusing lesson. Also, Feature 79, whether or not the lesson was an equation-solver unit, was statistically significantly associated with gaming, r2 = 0.30, F(1,19) = 8.55, p < 0.01. Note, however, that although a lower amount of interest-increasing text is generally associated with more gaming (Feature 53), equation-solver units (which have no such text) have less gaming in general (Feature 79). This result may suggest that interest-increasing text is only beneficial (for reducing gaming) above a certain threshold; alternatively, other aspects of the equation-solver units may have reduced gaming even though the lack of interest-increasing text would generally be expected to increase it.

When the gaming-related factor, Feature 36, and Feature 79 were included in a model together, all remained statistically significant, and the combined model explained 56% of the variance in gaming (r2 = 0.55).
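A sketch of such a combined model is shown below, again with hypothetical column names (gaming_factor for each lesson's score on the gaming-related factor, plus feature_36 and feature_79).

 # Minimal sketch of the combined lesson-level model (assumed column names).
 import pandas as pd
 import statsmodels.formula.api as smf
 
 lessons = pd.read_csv("lesson_features.csv")   # one row per lesson
 combined = smf.ols("gaming_freq ~ gaming_factor + feature_36 + feature_79",
                    data=lessons).fit()
 print(combined.summary())                      # overall r2 and per-predictor significance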

Five other features that were not strongly loaded on the significant factor were marginally associated with gaming. None of these features was statistically significant in a model that already included the gaming-related factor and Feature 36. Because the evidence relevant to these features is inconclusive, we will not discuss them all in detail, but will briefly mention one that has appeared in prior discussions of gaming. Lessons where a higher proportion of hint sequences told students what to do in the last hint (Feature 61) had marginally significantly more gaming, r2 = 0.14, F(1,19) = 3.28, p = 0.09. This result is unsurprising, as drilling through hints and typing in the bottom-out hint is one of the easiest and most frequently reported forms of gaming the system.

The off-task behavior model achieved similar predictive power, but was much less complex. None of the 6 factors was statistically significantly associated with off-task behavior. Only one of the features was individually statistically significantly associated with off-task behavior: Feature 79, whether or not the lesson was an equation-solver unit. Equation-solver units had significantly less off-task behavior, just as they had significantly less gaming, and the effect was large in magnitude, r2 = 0.55, F(1,21) = 27.29, p < 0.001, Bonferroni-adjusted p < 0.001.

To put this relationship into better context, we can look at the proportion of time students spent off-task in equation-solver lessons as compared to other lessons. On average, students spent 4.4% of their time off-task within the equation-solver lessons, much lower than is generally seen in intelligent tutor classrooms or, for that matter, in traditional classrooms. By contrast, students spent 14.1% of their time off-task within the other lessons, a proportion of off-task time much more in line with previous observations. The difference in off-task time between the two types of lessons is, as would be expected, statistically significant, t(22) = 4.48, p < 0.001.
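The comparison reported above amounts to a two-sample t-test on per-lesson off-task proportions; a sketch, with an assumed file and column names, is given below.

 # Minimal sketch of the equation-solver vs. other-lesson comparison (assumed layout).
 import pandas as pd
 from scipy import stats
 
 lessons = pd.read_csv("lesson_offtask.csv")   # columns: lesson, is_equation_solver, offtask_prop
 solver = lessons.loc[lessons["is_equation_solver"] == 1, "offtask_prop"]
 other = lessons.loc[lessons["is_equation_solver"] == 0, "offtask_prop"]
 
 t, p = stats.ttest_ind(solver, other)
 print(f"solver mean={solver.mean():.3f}  other mean={other.mean():.3f}  t={t:.2f}  p={p:.4f}")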

Connections to Other PSLC Studies

This study inspired and led to the upcoming Year 6 study, Baker - Closing the Loop.

Annotated Bibliography

References

Aleven, V., Koedinger, K.R. (2001) Investigations into Help Seeking and Learning with a Cognitive Tutor. In R. Luckin (Ed.), Papers of the AIED-2001 Workshop on Help Provision and Help Seeking in Interactive Learning Environments, 47-58.

Aleven, V., Stahl, E., Schworm, S., Fischer, F., Wallace, R. (2003) Help seeking and help design in interactive learning environments. Review of Educational Research, 73 (3), 277-320.

Baker, R.S.J.d. (2007a) Is Gaming the System State-or-Trait? Educational Data Mining Through the Multi-Contextual Application of a Validated Behavioral Model. Complete On-Line Proceedings of the Workshop on Data Mining for User Modeling at the 11th International Conference on User Modeling 2007, 76-80.

Baker, R.S.J.d. (2007b) Modeling and Understanding Students' Off-Task Behavior in Intelligent Tutoring Systems. Proceedings of ACM CHI 2007: Computer-Human Interaction, 1059-1068.

Baker, R.S.J.d. (2009) Differences Between Intelligent Tutor Lessons, and the Choice to Go Off-Task. Proceedings of the 2nd International Conference on Educational Data Mining, 11-20.

Baker, R.S.J.d., Corbett, A.T., Koedinger, K.R., Aleven, V., de Carvalho, A., Raspat, J. (2009) Educational Software Features that Encourage and Discourage "Gaming the System". Proceedings of the 14th International Conference on Artificial Intelligence in Education, 475-482.

Baker, R.S., Corbett, A.T., Koedinger, K.R., Wagner, A.Z. (2004) Off-Task Behavior in the Cognitive Tutor Classroom: When Students "Game the System". Proceedings of ACM CHI 2004: Computer-Human Interaction, 383-390.

Baker, R.S.J.d., Corbett, A.T., Roll, I., Koedinger, K.R. (2008) Developing a Generalizable Detector of When Students Game the System. User Modeling and User-Adapted Interaction, 18 (3), 287-314.

Baker, R.S., Corbett, A.T., Wagner, A.Z. (2006) Human Classification of Low-Fidelity Replays of Student Actions. Proceedings of the Educational Data Mining Workshop at the 8th International Conference on Intelligent Tutoring Systems, 29-36.

Baker, R.S., Roll, I., Corbett, A.T., Koedinger, K.R. (2005) Do Performance Goals Lead Students to Game the System. Proceedings of the 12th International Conference on Artificial Intelligence in Education, 57-64.

Baker, R.S.J.d., Walonoski, J.A., Heffernan, N.T., Roll, I., Corbett, A.T., Koedinger, K.R. (2008) Why Students Engage in "Gaming the System" Behavior in Interactive Learning Environments. Journal of Interactive Learning Research, 19 (2), 185-224.

Beck, J.E. (2006) Using Learning Decomposition to Analyze Student Fluency Development. Workshop on Educational Data Mining at the 8th International Conference on Intelligent Tutoring Systems, 21-28.

Brown, J.S., vanLehn, K. (1980) Repair theory: A generative theory of bugs in procedural skills. Cognitive Science, 4, 379-426.

Cheng, R., Vassileva, J. (2005) Adaptive Reward Mechanism for Sustainable Online Learning Community. Proceedings of the 12th International Conference on Artificial Intelligence in Education, 152-159.

Chi, M.T.H., Bassok, M., Lewis, M.W., Reimann, P., Glaser, R. (1989) Self-Explanations: How Students Study and Use Examples in Learning to Solve Problems. Cognitive Science, 13, 145-182.

Corbett,A.T., & Anderson, J.R. (1995). Knowledge tracing: Modeling the acquisition of procedural knowledge. User Modeling and User-Adapted Interaction, 4, 253-278.

Koedinger, K. R., & Anderson, J. R. (1998). Illustrating principled design: The early evolution of a cognitive tutor for algebra symbolization. Interactive Learning Environments, 5, 161-180.

Rodrigo, M.M.T., Baker, R.S.J.d., Lagud, M.C.V., Lim, S.A.L., Macapanpan, A.F., Pascua, S.A.M.S., Santillano, J.Q., Sevilla, L.R.S., Sugay, J.O., Tep, S., Viehland, N.J.B. (2007) Affect and Usage Choices in Simulation Problem Solving Environments. Proceedings of Artificial Intelligence in Education 2007, 145-152.

Siegler, R.S. (2002) Microgenetic Studies of Self-Explanations. In N. Granott & J. Parziale (Eds.), Microdevelopment: Transition processes in development and learning, 31-58. New York: Cambridge University Press.

Walonoski, J.A., Heffernan, N.T. (2006) Detection and Analysis of Off-Task Gaming Behavior in Intelligent Tutoring Systems. Proceedings of the 8th International Conference on Intelligent Tutoring Systems, 382-391.