Difference between revisions of "REAP Study on Personalization of Readings by Topic (Fall 2006)"

From LearnLab

Revision as of 14:52, 19 March 2007

REAP Study on Personalization of Readings for Increased Interest

Abstract

Intrinsic motivation and personal interest have been shown to lead to deeper processing, the use of learning strategies, and better learning outcomes (Lepper, 1988). Therefore, it seems reasonable to try to increase interest in a tutoring environment. Previous studies have shown both positive (Cordova & Lepper, 1996) and negative (Clark & Mayer, 2003) effects of attempts to increase interest. Clark and Mayer (2003) state that adding interesting but irrelevant material may distract or disrupt the learner.

This study investigated the effects of personalizing practice readings by topic. The REAP tutor for ESL vocabulary can prefer practice readings that match personal interests as specified in a questionnaire. However, if this preference is included as a factor in reading selection, then other factors, such as the density of practice opportunities in a reading, are necessarily given less weight. Finding a reading for a student is like searching for an optimal point in a multidimensional space. In practice, the tutor never finds optimal readings, but must weigh different factors against each other (e.g., reading difficulty, length, density of practice opportunities, etc.). Adding personalization as another factor may significantly affect the tutor's ability to find readings of high quality according to other factors.

This study investigated the use of personalization of readings by topic as an attempt to reduce gaming and increase the likelihood that students would more deeply process the context around target words, rather than just access definitions for them. As such, this study primarily addresses the category of passive, implicit instruction in the following table of types of instruction and learning.

                      Passive                                          Active   Interactive
Explicit (general)
Implicit (instance)   Interpreting meaning in context while reading

Glossary

Intrinsic Motivation: Motivation to learn for learning's own sake rather than some external goal.

Extrinsic Motivation: Motivation to learn in order to satisfy an external goal, such as completing a task or passing an assessment.

Personalization: The preferential selection of practice materials that match the personal interests of the learner. In this context, the term is used as it is in information retrieval rather than as in Mayer's work (Clark & Mayer, 2003), where it means using casual, direct language.

Research question

Do the benefits of personalization of practice readings by topics of interest outweigh the costs in a tutoring system for ESL vocabulary practice?


Dependent variables

Immediate post-test scores for practiced words.

Number of words practiced

Overall post-test scores (essentially a product of the previous two)

Long-term retention test scores similar to post-test but administered months later.

Transfer of knowledge: sentence production tasks for target words, correct use of words in writing assignments for other courses.

Independent variables

Personalization of readings by topics of interest. In the control condition, the tutor did not use potential personal interest as a factor in its selection of reading materials. In the treatment condition, the tutor did use interest as a factor. All other selection criteria were the same in both conditions. Time on task was also the same.
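The trade-off between the two conditions can be illustrated with a small sketch. This is not REAP's actual selection algorithm: the factor names, the equal weights, the 0-to-1 factor values, and the two example readings are all hypothetical, chosen only to show how adding interest as a factor reduces the relative weight of the other criteria.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    # All factor values are hypothetical scores in [0, 1]; higher is better.
    title: str
    difficulty_fit: float
    length_fit: float
    target_word_density: float
    topic_interest: float


def selection_score(reading, weights):
    """Weighted average of the factors named in `weights`."""
    total = sum(weights.values())
    return sum(w * getattr(reading, name) for name, w in weights.items()) / total


def pick_reading(candidates, weights):
    """Return the candidate with the highest weighted score."""
    return max(candidates, key=lambda r: selection_score(r, weights))


# Control: interest is not a factor. Treatment: interest is added, so each of
# the other factors drops from 1/3 to 1/4 of the total weight.
CONTROL_WEIGHTS = {"difficulty_fit": 1.0, "length_fit": 1.0, "target_word_density": 1.0}
TREATMENT_WEIGHTS = {**CONTROL_WEIGHTS, "topic_interest": 1.0}

candidates = [
    Reading("dense but off-topic", 0.8, 0.8, 0.9, 0.1),
    Reading("on-topic but sparse", 0.8, 0.8, 0.6, 0.9),
]

control_choice = pick_reading(candidates, CONTROL_WEIGHTS)
treatment_choice = pick_reading(candidates, TREATMENT_WEIGHTS)
print(control_choice.title, "|", treatment_choice.title)
```

With these made-up values, the control weights favor the denser reading, while the treatment weights favor the on-topic reading even though it offers fewer practice opportunities.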

Hypotheses

Since intrinsic motivation seems to be important in language learning, the benefits of personalization will outweigh the costs.

Findings

Students in the treatment condition with personalization performed better on average (M=35.5%, SD=14.9%) in terms of overall post-test scores compared to students in the control condition (M=27.1%, SD=17.2%). However, the improvement of average overall post-test scores in the treatment condition was only 8.4% (95% CI = -2.8%, 19.5%), which corresponds to a medium effect size of 0.51. This difference was not statistically significant (p=0.14). Therefore, the null hypothesis that personalization has no effect on overall post-test scores cannot be rejected.
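As a rough check on these figures, the effect size and confidence interval can be recomputed from the summary statistics alone. The sketch below makes several assumptions not stated in this paragraph: group sizes of 16 (treatment) and 19 (control), taken from the practiced-word analysis; a pooled-variance (Student's) t-test; Cohen's d with the pooled standard deviation as the standardizer; and a hardcoded two-sided 95% critical t value for 33 degrees of freedom.

```python
import math


def summary_comparison(m1, s1, n1, m2, s2, n2, t_crit):
    """Compare two independent groups using only means, SDs, and sizes."""
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    pooled_sd = math.sqrt(pooled_var)
    diff = m1 - m2
    se = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    return {
        "diff": diff,
        "cohens_d": diff / pooled_sd,
        "t": diff / se,
        "df": df,
        "ci": (diff - t_crit * se, diff + t_crit * se),
    }


# Two-sided 95% critical value of Student's t for df = 33 (assumed constant).
T_CRIT_DF33 = 2.0345

# Overall post-test scores in percent: treatment first, then control.
result = summary_comparison(35.5, 14.9, 16, 27.1, 17.2, 19, T_CRIT_DF33)
print(result)
```

Under these assumptions the recomputed effect size and interval come out close to the reported values. The same function can be applied to the other group comparisons in this section.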


File:Graph40.PNG

Explanation

There is evidence that the difference in post-test scores is due to increased interest leading to deeper processing of the reading practice texts.

Responses to questionnaires following each reading show the interest level of students using the REAP tutor. The questionnaires asked students to indicate on a scale from one to five their interest in the preceding text. The distributions of post-reading interest ratings for students in the treatment and control conditions are shown in Figures 1 and 2.

File:Interest combined.PNG

Students were also given an exit survey during their last week of practice with the tutor that asked them, among other questions, to indicate whether they agreed with the statement, “Most of the readings were interesting.” The ratings were on a scale from one to five, with five indicating strong agreement and one indicating strong disagreement. Exit survey interest ratings by students in the treatment condition were significantly higher (p<0.05) than the ratings by students in the control condition. The mean response for students who received personalized readings was 3.18, while it was 2.65 for students in the control condition.


The effects of this increased interest were measured by time spent on readings and by scores on reading-check questions. These questions were designed to verify that the student at least read the text rather than just accessing definitions for highlighted target words, a gaming behavior witnessed in previous studies; they were not detailed tests of comprehension. Students in the treatment condition spent slightly (though not significantly) longer on each reading and also scored higher on the post-reading reading-check questions. The reading-check questions were multiple-choice questions of the form, "Which set of words occurred in the passage?" The correct answer contained only salient words (defined by the tf.idf measure from information retrieval) that appeared in the text. Distractors contained some salient words from the text, but also words that were not in the text. There is some evidence from REAP studies that performance on this type of question correlates with post-test vocabulary scores (which are unrelated to the content of readings). Thus, it seems that the students in the treatment group were processing the context around the target words to a greater degree. However, the difference in reading-check question performance is only marginally significant (2-sided independent samples t-test, p<0.10).
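The construction of such a question can be sketched as follows. This is an illustration of the idea, not the REAP implementation: the function names, the unsmoothed tf-idf weighting, the choice of four words per option, the way distractor words are drawn from other documents, and the three-document toy corpus are all assumptions.

```python
import math
import random
from collections import Counter


def tf_idf_scores(tokens, corpus):
    """Return {word: tf-idf} for one tokenized document that belongs to `corpus`."""
    counts = Counter(tokens)
    scores = {}
    for word, count in counts.items():
        doc_freq = sum(1 for doc in corpus if word in doc)
        # Words that occur in every document get idf 0 and are never salient.
        scores[word] = (count / len(tokens)) * math.log(len(corpus) / doc_freq)
    return scores


def salient_words(tokens, corpus, k):
    """Top-k words by tf-idf, with ties broken alphabetically."""
    scores = tf_idf_scores(tokens, corpus)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


def make_reading_check(corpus, passage_index, k=4, n_distractors=3, seed=0):
    """Build one multiple-choice item; assumes other documents supply foreign words."""
    rng = random.Random(seed)
    passage = corpus[passage_index]
    correct = salient_words(passage, corpus, k)
    in_passage = set(passage)
    # Salient words of the other documents that never occur in the passage.
    foreign = sorted({
        w
        for i, doc in enumerate(corpus) if i != passage_index
        for w in salient_words(doc, corpus, k)
        if w not in in_passage
    })
    options = [correct]
    for _ in range(n_distractors):
        # Each distractor keeps some passage words and swaps in foreign ones.
        n_swap = rng.randint(1, min(len(correct) - 1, len(foreign)))
        kept = rng.sample(correct, len(correct) - n_swap)
        options.append(kept + rng.sample(foreign, n_swap))
    rng.shuffle(options)
    return {
        "prompt": "Which set of words occurred in the passage?",
        "options": options,
        "answer": options.index(correct),
    }


# Hypothetical toy corpus of tokenized readings.
corpus = [
    "the volcano erupted while ash covered the village near the volcano".split(),
    "the team won the match after the striker scored a late goal".split(),
    "the senate passed the budget after a long debate on taxes".split(),
]
question = make_reading_check(corpus, 0)
print(question)
```

Because every distractor contains at least one word that is absent from the passage, a student who actually read the text can eliminate it, while a student who only looked up target-word definitions has little to go on.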

File:Readingcheck0.PNG

Further analysis of post-test scores reveals that students did learn more of the words that they actually practiced in REAP. The post-test contained 40 questions for target vocabulary words. Many of the students did not practice 40 words, so performance on practiced words alone was analyzed. Students in the treatment condition scored higher (N=16, M=50.3, SD=20.1) on questions for words seen in readings than did students in the control condition (N=19, M=32.4, SD=18.9). A two-tailed t-test for independent means verified that this result is statistically significant (t=2.719, df=33, p=0.005). The difference of scores between the two groups was 17.9% (95% CI = 4.5%, 31.3%), which corresponds to a large effect size of 0.85. This result indicates that personalization improved learning for the words that students saw in readings, which is in line with previous findings that intrinsic motivation leads to improved learning.


File:Post just practice.PNG

However, students in the treatment condition, which included personalization, saw fewer words in their training sessions (N=16, M=12.0, SD=1.13) than students in the control condition (N=19, M=16.3, SD=0.87) (t=-2.9, df=33, p=0.006). Average time on task was essentially the same for students in both conditions. Students in the treatment condition spent slightly longer on each reading. The main reason, however, for the difference in the average total number of words practiced was that students for whom the tutor provided personalized instruction saw fewer words (M=3.41, SD=0.55) per practice reading passage than students in the control condition (M=4.07, SD=0.83) (t=2.929, df=33, p=0.006). Thus, when the tutor used personalization as a factor in the selection of readings, it chose readings that were less valuable according to other factors. Specifically, this result shows that by personalizing instruction, the tutor was not able to provide practice for as many words. Of course, the practice that it did provide was better, as shown by the previous result: for the words students did practice, personalization appeared to increase learning.


File:Words per reading.PNG

There is a possibility that the students in the treatment condition who were seeing fewer words in each reading were learning more of the words simply because they had fewer to learn per reading. To rule out this hypothesis, multiple linear regression analyses were performed with overall post-test performance and performance for practiced words as the dependent variables. In both regression analyses, the number of target words per reading was not a significant predictor of performance. In fact, the number of target words per document was slightly positively correlated with post-test performance in both cases. This result seems to rule out the possibility that students were learning more target words in the treatment condition because they were seeing fewer words.


NOTE: Long-term retention test results are pending.

Descendents

Submitted paper to appear here.

Annotated bibliography

Clark, R. C. and Mayer, R. E. (2003). e-Learning and the Science of Instruction. Jossey-Bass/Pfeiffer.

Cordova, D. I. & Lepper, M. R. (1996). Intrinsic Motivation and the Process of Learning: Beneficial Effects of Contextualization, Personalization, and Choice. Journal of Educational Psychology. Vol. 88, No. 4, 715-730.

Lepper, M. (1988). Motivational Considerations in the Study of Instruction. Cognition and Instruction. 5(4), 289-309.

Heilman, M., Juffs, A., & Eskenazi, M. (To Appear). Choosing Reading Passages for Vocabulary Learning by Topic to Increase Intrinsic Motivation. Proceedings of the 13th International Conference on Artificial Intelligence in Education. Marina del Rey, CA. (poster)