Difference between revisions of "Applying optimal scheduling of practice in the Chinese Learnlab"

(Research question)
 
(14 intermediate revisions by the same user not shown)
Line 11: Line 11:
 
|-
 
|-
 
! Others with > 160 hours
 
! Others with > 160 hours
| Dozzi
+
| Dozzi, Lili Wu
 
|-
 
|-
 
! Learnlab
 
! Learnlab
Line 17: Line 17:
 
|-
 
|-
 
! Number of students
 
! Number of students
| 300
+
| 450
 
|-
 
|-
 
! Total Participant Hours
 
! Total Participant Hours
| >700
+
| >1150
 
|-
 
|-
 
! Datashop?
 
! Datashop?
| Expected date 4/15
+
| Current to Spring 2007
 
|}
 
|}
  
 
=== Abstract ===
 
=== Abstract ===
* Chronology
+
==== Chronology ====
**Spring 2006
+
----
***Software debugging and testing
+
*Spring 2006
***Parameterization data collected from approximately 80 students and 160 hours in Elementary Chinese II
+
**Software debugging and testing
***Parameterization data collected from approximately 20 students and 40 hours in Elementary Spanish I
+
**Parameterization data collected from approximately 80 students and 160 hours in Elementary Chinese II
 
+
**Parameterization data collected from approximately 20 students and 40 hours in Elementary Spanish I
**Summer 2006
+
----
***Multi-unit tutor and experiment piloted
+
*Summer 2006
 
+
**Multi-unit tutor and experiment piloted
**Fall 2006
+
----
***Multi-unit tutor applied in the following experiment:
+
*Fall 2006
The vocabulary tutor will be deployed in both Online and Classroom Chinese I classes for an efficacy test. The first 8 units (excluding Unit 1) of each class will be split into two tutors each with content for 4 units. Each of these 4 unit tutors will be an experiment replication, so that the experiment design is replicated twice for each class track. During these 4 unit in-vivo experiments, the tutor will alternate between required units and voluntary units, and the order of this alternation will be randomly assigned by the tutor software for each student. In each tutor, the first unit will be assessed before the 3rd unit and the 2nd unit will be assessed before the 4th unit. This design will allow a comparison of whether requiring the tutor provides an advantage to learning at a long-term interval. The tutor will also administer a brief survey of students to get self-reports of vocabulary study time from students (both inside and outside the tutor). This survey will be given from within the tutor and will take less than 5 minutes total for each 4 unit tutor. The hypothesis is that students will do better when required to use the tutor despite not spending greater overall time studying vocabulary (both inside and outside the tutor). Further, Sue-mei has offered to administer an in class assessment of vocabulary using a paper and pencil test after each 4 unit tutor. This will give a measure of [[transfer]] outside the tutor that is hypothesized to reveal similar effects.
+
**Multi-unit tutor applied in the following experiment:
Another aspect of the design is caused by the fact that many students quit after 15 minutes, which is before the tutor introduces all the items. Since item introduction will be randomized within-subjects, this means that we will be able to conduct within-subjects tests of learning as a function of practice with individual words by the student. This will give another direct measure of the benefit of supplementary practice using the tutor.
+
::  The vocabulary tutor will be deployed in both Online and Classroom Chinese I classes for an efficacy test. The first 8 units (excluding Unit 1) of each class will be split into two tutors each with content for 4 units. Each of these 4 unit tutors will be an experiment replication, so that the experiment design is replicated twice for each class track. During these 4 unit in-vivo experiments, the tutor will alternate between required units and voluntary units, and the order of this alternation will be randomly assigned by the tutor software for each student. In each tutor, the first unit will be assessed before the 3rd unit and the 2nd unit will be assessed before the 4th unit. This design will allow a comparison of whether requiring the tutor provides an advantage to learning at a long-term interval. The tutor will also administer a brief survey of students to get self-reports of vocabulary study time from students (both inside and outside the tutor). This survey will be given from within the tutor and will take less than 5 minutes total for each 4 unit tutor. The hypothesis is that students will do better when required to use the tutor despite not spending greater overall time studying vocabulary (both inside and outside the tutor). Further, Sue-mei has offered to administer an in class assessment of vocabulary using a paper and pencil test after each 4 unit tutor. This will give a measure of [[transfer]] outside the tutor that is hypothesized to reveal similar effects. The probable benefit to students is from learning Chinese vocabulary more easily. All tutor curriculum is matched one-for-one with the words taught in the respective courses.
The probable benefit to students is from learning Chinese vocabulary more easily. All tutor curriculum is matched one-for-one with the words taught in the respective courses.
+
----
 
+
*Spring 2007
**Spring 2007
+
**Multi-unit tutor made cumulative
***Multi-unit tutor made cumulative
+
**Comparison "flashcard" ecological control created
***Comparison "flashcard" ecological control created
+
**Tutor applied to directly compare the flashcard version with the cumulative [[optimized scheduling]] version
***Tutor applied to directly compare the flashcard version with the cumulative [[optimized scheduling]] version
+
----
 +
*Fall 2007 -- Vocabulary practice
 +
**Multi-unit tutor now allows flexible student choice of unit or cumulative practice
 +
**Students may choose a flashcard version or the optimized version
 +
**Between-subjects preference experiment for flashcard or optimized version
 +
**Prequiz/postquiz design to measure long-term learning and transfer.
 +
--
 +
*Fall 2007 -- Radical practice
 +
**Between-subjects comparison in which students practiced Chinese radicals or Hanzi characters that were not on the pre/post quizzes
 +
**Randomized assignment
 +
**Prequiz/postquiz design to measure accelerated future learning on previously unstudied Hanzi characters
  
 
=== Glossary ===
 
=== Glossary ===
Line 57: Line 67:
  
 
=== Dependent variables ===
 
=== Dependent variables ===
*[[Normal post-test]] - The tutor functions using an "assistments" type task where every drill practice is also a measure of normal learning.
+
;[[Normal post-test]]:The tutor functions using an "assistments" type task where every drill practice is also a measure of normal learning.
*[[Long-term retention]] - The experiment includes long-term assessments at various intervals. This includes both in tutor and paper and pencil tests of long-term vocabulary performance.
+
;[[Long-term retention]]:The experiment includes long-term assessments at various intervals. This includes both in tutor and paper and pencil tests of long-term vocabulary performance.
*[[Transfer]] learning - Long-term assessments may be given (50% of the time) using pairings not drilled by tutor. These transfer tests will show whether and to what extent students can use what is learned int he tutor flexibly in new contexts.
+
;[[Transfer]] learning:Long-term assessments may be given (50% of the time) using pairings not drilled by the tutor. These transfer tests will show whether and to what extent students can use what is learned in the tutor flexibly in new contexts.
Accelerated future learning - Measures of accelerated future learning will be gathered by examining ...
+
Accelerated future learning - In the radical study (Fall 2007).
  
 
=== Independent variables ===
 
=== Independent variables ===
 
The amount of practice for a particular group of subjects. Also, within subjects, the amount of practice for any individual item.
 
The amount of practice for a particular group of subjects. Also, within subjects, the amount of practice for any individual item.
 +
 +
Radical experiment (Fall 2007) -- we manipulated whether students got radical practice or Hanzi practice.
  
 
=== Hypothesis ===
 
=== Hypothesis ===
 
The dependent variables will reveal benefits for individuals using the tutor as compared to individuals studying with other methods.
 
The dependent variables will reveal benefits for individuals using the tutor as compared to individuals studying with other methods.
 +
 +
The radical study hypothesis (Fall 2007) was that radical training would allow faster learning of previously unlearned Hanzi characters by providing knowledge components that would transfer to accelerate future Hanzi learning.
  
 
=== Findings ===
 
=== Findings ===
In Chinese, 7 sections of Chinese I class participated in an experiment in which students were randomized to either a) have unit 3 voluntary and unit 4 required or b) have unit 3 required and unit 4 voluntary. This crossover within-subjects experiment tested whether there was an advantage for requiring students to use the system 15 minutes compared to not requiring usage. For each student we computed the score advantage for the required unit vs. voluntary unit on a paper and pencil test of both units (10 items for each unit given approximately one month later). Errors were less (M = 0.90, SD = 1.4) for required compared to voluntary usage (M = 1.5, SD = 1.7) t(53) = 3.0, p < .005, with a Cohen’s d effect size = 0.41.
+
In Chinese, 7 sections of Chinese I class participated in an experiment in which students were randomized to either a) have unit 3 voluntary and unit 4 required or b) have unit 3 required and unit 4 voluntary. This crossover within-subjects experiment tested whether there was an advantage for requiring students to use the system for 15 minutes compared to not requiring usage. For each student we computed the score advantage for the required unit vs. voluntary unit on a paper and pencil test of both units (10 items for each unit given approximately one month later). Results were not significant after a careful reanalysis of the data.
  
For the Spring 2007 semester, the classroom version results so far (2/7/07) look good. There are differences in practice amounts between the control (flashcard) and experimental (optimized) between-subjects conditions that are of a similar magnitude to those that caused the positive results above. Specifically, students get about twice as many drill trials in the optimized condition (significant p<.001), about twice as many correct responses per minute (p<.001), a reduction in errors of 36% (p<.001), and about 2 minutes longer practice (p<.05). The longer practice and somewhat less attrition (not yet significant) for optimized subjects suggest they prefer the optimized conditions.  
+
For the Spring 2007 semester, the classroom version results were interesting. There are differences in practice amounts between the control (flashcard) and experimental (optimized) between-subjects conditions. Specifically, students get about twice as many drill trials in the optimized condition (significant p<.001), about twice as many correct responses per minute (p<.001), a reduction in errors of 36% (p<.001), and about 2 minutes longer practice (p<.05). The longer practice and somewhat less attrition (significant p<.05 when subjects with performance of less than 10% correct were excluded) for optimized subjects suggest they prefer the optimized condition. Results on the final quiz indicated a small advantage for the optimized subjects (p<.05) for the earlier units in the course. Not surprisingly, these early units were also the ones that showed the greater attrition for the flashcard session. Unfortunately, examination of learning curves for this dataset shows that the optimization model was flawed and not optimal. Specifically, the learning curves show a U-shaped dip (quite visible in the DataShop) where performance was strangely low. Conditional analysis showed that the model was overly optimistic about the learning following a failed drill and prematurely widened schedules. It was surprising that despite this problem the optimized condition did as well as it did.
  
 
Of course, the spacing of practice tends to be wider for the control subjects, since they are moving through a random order of the stimuli. This probably accounts for a large portion of the difference above. Further, the control condition allows more metacognitive control, since subjects must decide after each test whether they want that item repeated during the following pass through the set or not. However, both of these procedures might make the differences above during practice unrepresentative of any long-term effects of the conditions, since the wider spacing and metacognitive control of the flashcard control condition might improve long-term efficiency. Further, there is also a cumulative component to the comparison, since the optimization condition allows more efficient review of prior units. In the control condition, subjects are allowed the option of going through the full cumulative set after they finish each pass through the current unit set. Although this allows cumulative review for control subjects, it does not provide it in the efficient manner of the optimized condition, in which cumulative review is interleaved with current practice using an expanding spacing for each old item.
 
Of course, the spacing of practice tends to be wider for the control subjects, since they are moving through a random order of the stimuli. This probably accounts for a large portion of the difference above. Further, the control condition allows more metacognitive control, since subjects must decide after each test whether they want that item repeated during the following pass through the set or not. However, both of these procedures might make the differences above during practice unrepresentative of any long-term effects of the conditions, since the wider spacing and metacognitive control of the flashcard control condition might improve long-term efficiency. Further, there is also a cumulative component to the comparison, since the optimization condition allows more efficient review of prior units. In the control condition, subjects are allowed the option of going through the full cumulative set after they finish each pass through the current unit set. Although this allows cumulative review for control subjects, it does not provide it in the efficient manner of the optimized condition, in which cumulative review is interleaved with current practice using an expanding spacing for each old item.
  
For these reasons only long-term assessment (including a transfer component to show robustness) is adequate to assess the effects of the conditions on robust learning. Long-term assessment allows possible difference in the conditions to emerge and show the practical utility of the approaches. Intermediate term and final exam related assessments  for this Spring 2007 study are in planning stages and suggestions are welcome.
+
In Fall 2007 vocabulary classroom work we are currently seeing a strong preference for the optimized condition. Considering the care that was taken to make this comparison unbiased, this seems to indicate that students perceive a greater advantage in using the optimized version.
 +
 
 +
In Fall 2007 radical classroom work we are finding a significant advantage for radical training. This advantage amounts to twice as much improvement (approx. 14% vs. 7%) in the learning rate for subjects who were assigned the one-hour radical practice session.
  
 
=== Explanation ===
 
=== Explanation ===
Line 87: Line 103:
  
 
=== Annotated bibliography ===
 
=== Annotated bibliography ===
Pavlik Jr., P. I. (2006). Transfer effects in Chinese vocabulary learning. In R. Sun (Ed.), Proceedings of the Twenty-Eighth Annual Conference of the Cognitive Science Society (pp. 2579). Mahwah, NJ: Lawrence Erlbaum.
+
*Pavlik Jr., P. I. (2006). Transfer effects in Chinese vocabulary learning. In R. Sun (Ed.), Proceedings of the Twenty-Eighth Annual Conference of the Cognitive Science Society (pp. 2579). Mahwah, NJ: Lawrence Erlbaum. [http://www.learnlab.org/uploads/mypslc/publications/pavlik-transfereffects.pdf (Article)]
 
+
*Pavlik Jr., P. I., Presson, N., Dozzi, G., Wu, S., MacWhinney, B. & Koedinger, K. (2007, accepted). The FaCT (Fact and Concept Training) System: A new tool linking cognitive science with educators. In D. McNamara & G. Trafton (Eds.), Proceedings of the Twenty-Ninth Annual Conference of the Cognitive Science Society. Mahwah, NJ: Lawrence Erlbaum. [http://www.learnlab.org/uploads/mypslc/publications/pavlik_1_31.pdf (Article)]
Pavlik Jr., P. I. (in press-a). Timing is an order: Modeling order effects in the learning of information. In F. E., Ritter, J. Nerb, E. Lehtinen & T. O'Shea (Eds.), In order to learn: How order effects in machine learning illuminate human learning. New York: Oxford University Press.
+
*Pavlik Jr., P. I. (in press-a). Timing is an order: Modeling order effects in the learning of information. In F. E., Ritter, J. Nerb, E. Lehtinen & T. O'Shea (Eds.), In order to learn: How order effects in machine learning illuminate human learning. New York: Oxford University Press.
 
+
*Pavlik Jr., P. I. (in press-b). Understanding and applying the dynamics of test practice and study practice. Instructional Science.
Pavlik Jr., P. I. (in press-b). Understanding and applying the dynamics of test practice and study practice. Instructional Science.
+
*Pavlik Jr., P. I., & Anderson, J. R. (2004,November). Optimizing Paired-Associate Learning. Poster presented at the 45th Annual Meeting of the Psychonomic Society, Minneapolis, MN.
 
+
*Pavlik Jr., P. I., & Anderson, J. R. (2005). Practice and forgetting effects on vocabulary memory: An activation-based model of the spacing effect. Cognitive Science, 29(4), 559-586.
Pavlik Jr., P. I., & Anderson, J. R. (2004,November). Optimizing Paired-Associate Learning. Poster presented at the 45th Annual Meeting of the Psychonomic Society, Minneapolis, MN.
 
 
 
Pavlik Jr., P. I., & Anderson, J. R. (2005). Practice and forgetting effects on vocabulary memory: An activation-based model of the spacing effect. Cognitive Science, 29(4), 559-586.
 
  
  
 
[[Category:Study]]
 
[[Category:Study]]

Latest revision as of 17:00, 3 December 2007

PIs Pavlik, MacWhinney, Wu, Koedinger
Faculty MacWhinney, Wu, Koedinger
Postdocs Pavlik
Others with > 160 hours Dozzi, Lili Wu
Learnlab Chinese
Number of students 450
Total Participant Hours >1150
Datashop? Current to Spring 2007

Abstract

Chronology


  • Spring 2006
    • Software debugging and testing
    • Parameterization data collected from approximately 80 students and 160 hours in Elementary Chinese II
    • Parameterization data collected from approximately 20 students and 40 hours in Elementary Spanish I

  • Summer 2006
    • Multi-unit tutor and experiment piloted

  • Fall 2006
    • Multi-unit tutor applied in the following experiment:
The vocabulary tutor will be deployed in both Online and Classroom Chinese I classes for an efficacy test. The first 8 units (excluding Unit 1) of each class will be split into two tutors each with content for 4 units. Each of these 4 unit tutors will be an experiment replication, so that the experiment design is replicated twice for each class track. During these 4 unit in-vivo experiments, the tutor will alternate between required units and voluntary units, and the order of this alternation will be randomly assigned by the tutor software for each student. In each tutor, the first unit will be assessed before the 3rd unit and the 2nd unit will be assessed before the 4th unit. This design will allow a comparison of whether requiring the tutor provides an advantage to learning at a long-term interval. The tutor will also administer a brief survey of students to get self-reports of vocabulary study time from students (both inside and outside the tutor). This survey will be given from within the tutor and will take less than 5 minutes total for each 4 unit tutor. The hypothesis is that students will do better when required to use the tutor despite not spending greater overall time studying vocabulary (both inside and outside the tutor). Further, Sue-mei has offered to administer an in class assessment of vocabulary using a paper and pencil test after each 4 unit tutor. This will give a measure of transfer outside the tutor that is hypothesized to reveal similar effects. The probable benefit to students is from learning Chinese vocabulary more easily. All tutor curriculum is matched one-for-one with the words taught in the respective courses.

  • Spring 2007
    • Multi-unit tutor made cumulative
    • Comparison "flashcard" ecological control created
    • Tutor applied to directly compare the flashcard version with the cumulative optimized scheduling version

  • Fall 2007 -- Vocabulary practice
    • Multi-unit tutor now allows flexible student choice of unit or cumulative practice
    • Students may choose a flashcard version or the optimized version
    • Between-subjects preference experiment for flashcard or optimized version
    • Prequiz/postquiz design to measure long-term learning and transfer.

--

  • Fall 2007 -- Radical practice
    • Between-subjects comparison in which students practiced Chinese radicals or Hanzi characters that were not on the pre/post quizzes
    • Randomized assignment
    • Prequiz/postquiz design to measure accelerated future learning on previously unstudied Hanzi characters
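
The per-student random ordering of required and voluntary units described for the Fall 2006 experiment could be sketched roughly as follows. This is only an illustration under stated assumptions, not the tutor's actual code: the function name `assign_unit_requirements`, the `seed` parameter, and the 50/50 coin flip for the starting label are all hypothetical.

```python
import random


def assign_unit_requirements(student_id, n_units=4, seed=0):
    """Return one 'required'/'voluntary' label per unit of a 4-unit tutor.

    Hypothetical helper: the first unit's label is chosen at random for
    each student, and the label then alternates from unit to unit.
    """
    # Seed from the student id so the assignment is reproducible per student.
    rng = random.Random(f"{seed}-{student_id}")
    first_required = rng.random() < 0.5  # assumed 50/50 split

    labels = []
    for unit_index in range(n_units):
        # Even-indexed units share the first unit's label; odd ones flip it.
        required = first_required if unit_index % 2 == 0 else not first_required
        labels.append("required" if required else "voluntary")
    return labels


if __name__ == "__main__":
    for student in ["student-1", "student-2", "student-3"]:
        print(student, assign_unit_requirements(student))
```

Seeding from the student identifier keeps a student's assignment stable across sessions, which matters if the tutor is restarted between units.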

Glossary

Research question

Does the optimized scheduling of practice produced by the Chinese vocabulary tutor result in a measurable difference in performance for students?

Background and significance

Efforts to use practice scheduling algorithms date to the early 1960s. One seminal example is Atkinson's (1972) German vocabulary tutor. While these efforts have often produced positive results, such programs have never been employed in the classroom in a consistent fashion. Perhaps this is due to the many practical issues involved with integrating such a system into the context of a course curriculum.

Dependent variables

Normal post-test
The tutor functions using an "assistments" type task where every drill practice is also a measure of normal learning.
Long-term retention
The experiment includes long-term assessments at various intervals. This includes both in-tutor and paper-and-pencil tests of long-term vocabulary performance.
Transfer learning
Long-term assessments may be given (50% of the time) using pairings not drilled by the tutor. These transfer tests will show whether and to what extent students can use what is learned in the tutor flexibly in new contexts.

Accelerated future learning - Measured in the radical study (Fall 2007) with a prequiz/postquiz design on previously unstudied Hanzi characters.

Independent variables

The amount of practice for a particular group of subjects. Also, within subjects, the amount of practice for any individual item.

Radical experiment (Fall 2007) -- we manipulated whether students got radical practice or Hanzi practice.

Hypothesis

The dependent variables will reveal benefits for individuals using the tutor as compared to individuals studying with other methods.

The radical study hypothesis (Fall 2007) was that radical training would allow faster learning of previously unlearned Hanzi characters by providing knowledge components that would transfer to accelerate future Hanzi learning.

Findings

In Chinese, 7 sections of Chinese I class participated in an experiment in which students were randomized to either a) have unit 3 voluntary and unit 4 required or b) have unit 3 required and unit 4 voluntary. This crossover within-subjects experiment tested whether there was an advantage for requiring students to use the system for 15 minutes compared to not requiring usage. For each student we computed the score advantage for the required unit vs. voluntary unit on a paper and pencil test of both units (10 items for each unit given approximately one month later). Results were not significant after a careful reanalysis of the data.
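
The per-student score advantage in a crossover design like this one is typically tested with a paired comparison. The sketch below shows one way that computation might look. It is not the analysis actually run for this study: the function name `paired_advantage` is hypothetical, the error counts are made-up toy values, and the effect size shown (mean difference divided by the standard deviation of the differences, often written d_z) is only one of several conventions.

```python
import math
import statistics


def paired_advantage(required_errors, voluntary_errors):
    """Paired comparison of per-student error counts (hypothetical helper).

    Each position in the two lists refers to the same student. A positive
    difference means fewer errors on the required unit.
    """
    diffs = [v - r for r, v in zip(required_errors, voluntary_errors)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD; needs n >= 2 and non-zero spread

    t_stat = mean_diff / (sd_diff / math.sqrt(n))  # paired t statistic
    d_z = mean_diff / sd_diff                      # one paired effect-size convention
    return {"n": n, "df": n - 1, "mean_diff": mean_diff, "t": t_stat, "d_z": d_z}


if __name__ == "__main__":
    # Toy data for four imaginary students, not results from the study.
    print(paired_advantage([1, 0, 2, 1], [2, 1, 2, 3]))
```

The t statistic would then be compared against a t distribution with `df` degrees of freedom to obtain a p value.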

For the Spring 2007 semester, the classroom version results were interesting. There are differences in practice amounts between the control (flashcard) and experimental (optimized) between-subjects conditions. Specifically, students get about twice as many drill trials in the optimized condition (significant p<.001), about twice as many correct responses per minute (p<.001), a reduction in errors of 36% (p<.001), and about 2 minutes longer practice (p<.05). The longer practice and somewhat less attrition (significant p<.05 when subjects with performance of less than 10% correct were excluded) for optimized subjects suggest they prefer the optimized condition. Results on the final quiz indicated a small advantage for the optimized subjects (p<.05) for the earlier units in the course. Not surprisingly, these early units were also the ones that showed the greater attrition for the flashcard session. Unfortunately, examination of learning curves for this dataset shows that the optimization model was flawed and not optimal. Specifically, the learning curves show a U-shaped dip (quite visible in the DataShop) where performance was strangely low. Conditional analysis showed that the model was overly optimistic about the learning following a failed drill and prematurely widened schedules. It was surprising that despite this problem the optimized condition did as well as it did.

Of course, the spacing of practice tends to be wider for the control subjects, since they are moving through a random order of the stimuli. This probably accounts for a large portion of the difference above. Further, the control condition allows more metacognitive control, since subjects must decide after each test whether they want that item repeated during the following pass through the set or not. However, both of these procedures might make the differences above during practice unrepresentative of any long-term effects of the conditions, since the wider spacing and metacognitive control of the flashcard control condition might improve long-term efficiency. Further, there is also a cumulative component to the comparison, since the optimization condition allows more efficient review of prior units. In the control condition, subjects are allowed the option of going through the full cumulative set after they finish each pass through the current unit set. Although this allows cumulative review for control subjects, it does not provide it in the efficient manner of the optimized condition, in which cumulative review is interleaved with current practice using an expanding spacing for each old item.
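
To make "cumulative review interleaved with current practice using an expanding spacing" concrete, here is a deliberately simplified sketch. It is not the tutor's optimization model. The doubling rule, the starting gap of two trials, and the assumption that every review is answered correctly are all illustrative choices, and the function name `interleaved_schedule` is hypothetical.

```python
def interleaved_schedule(old_items, current_items, n_trials, first_gap=2, growth=2):
    """Return one item name per trial, mixing old-unit review into new practice.

    Simplifying assumptions (not from the study): every review succeeds, and
    the gap before an old item's next review is multiplied by `growth` after
    each review. `current_items` must be non-empty.
    """
    gap = {name: first_gap for name in old_items}
    # Stagger the first reviews so old items do not all come due at once.
    due_at = {name: index for index, name in enumerate(old_items)}

    schedule = []
    next_current = 0
    for trial in range(n_trials):
        due = [name for name in old_items if due_at[name] <= trial]
        if due:
            # Review the old item that has been due the longest.
            name = min(due, key=lambda item: due_at[item])
            due_at[name] = trial + gap[name]
            gap[name] *= growth  # expanding spacing
        else:
            # Otherwise continue cycling through the current unit.
            name = current_items[next_current % len(current_items)]
            next_current += 1
        schedule.append(name)
    return schedule


if __name__ == "__main__":
    print(interleaved_schedule(["old1"], ["a", "b"], 10))
```

With one old item and two current items over ten trials, the old item is reviewed on trials 0, 2, and 6, so the gaps between its reviews grow from 2 to 4 trials. A more realistic scheduler would also shrink the gap after a failed drill, which is the case the Spring 2007 model reportedly handled too optimistically.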

In Fall 2007 vocabulary classroom work we are currently seeing a strong preference for the optimized condition. Considering the care that was taken to make this comparison unbiased, this seems to indicate that students perceive a greater advantage in using the optimized version.

In Fall 2007 radical classroom work we are finding a significant advantage for radical training. This advantage amounts to twice as much improvement (approx. 14% vs. 7%) in the learning rate for subjects who were assigned the one-hour radical practice session.

Explanation

Assuming the tutor is more efficient than other methods, one would expect that students using it would perform better in less time, perform the same in less time, or perform better in the same amount of time.

Transfer results have not yet been analyzed.

Descendants

Optimizing the practice schedule

Annotated bibliography

  • Pavlik Jr., P. I. (2006). Transfer effects in Chinese vocabulary learning. In R. Sun (Ed.), Proceedings of the Twenty-Eighth Annual Conference of the Cognitive Science Society (p. 2579). Mahwah, NJ: Lawrence Erlbaum. (Article)
  • Pavlik Jr., P. I., Presson, N., Dozzi, G., Wu, S., MacWhinney, B., & Koedinger, K. (2007, accepted). The FaCT (Fact and Concept Training) System: A new tool linking cognitive science with educators. In D. McNamara & G. Trafton (Eds.), Proceedings of the Twenty-Ninth Annual Conference of the Cognitive Science Society. Mahwah, NJ: Lawrence Erlbaum. (Article)
  • Pavlik Jr., P. I. (in press-a). Timing is an order: Modeling order effects in the learning of information. In F. E. Ritter, J. Nerb, E. Lehtinen, & T. O'Shea (Eds.), In order to learn: How order effects in machine learning illuminate human learning. New York: Oxford University Press.
  • Pavlik Jr., P. I. (in press-b). Understanding and applying the dynamics of test practice and study practice. Instructional Science.
  • Pavlik Jr., P. I., & Anderson, J. R. (2004, November). Optimizing Paired-Associate Learning. Poster presented at the 45th Annual Meeting of the Psychonomic Society, Minneapolis, MN.
  • Pavlik Jr., P. I., & Anderson, J. R. (2005). Practice and forgetting effects on vocabulary memory: An activation-based model of the spacing effect. Cognitive Science, 29(4), 559-586.