Intelligent Writing Tutor

PI	Teruko Mitamura
Graduate Students	Ruth Wylie
Faculty	Ken Koedinger, Brian MacWhinney
Others with > 160 hours	Jim Rankin
Start date study 1	March 2006
End date study 1	June 2007
Learnlab	ESL
Number of students	10 (current), 100 (expected total)
Total Participant Hours	~30 (current), ~300 (expected total)
Datashop?	Yes

Teruko Mitamura, Ruth Wylie, and Jim Rankin
Project Advisors: Brian MacWhinney and Ken Koedinger

Abstract

In our first study, motivated by both classroom needs and learning science questions, we developed two computer-based systems to help students learn the English article system (a, an, the, null). The first system, a menu-based task, mimics cloze activities found in many ESL textbooks. The second, a controlled-editing task, gives students practice with both detecting errors and producing the correct response. Results from a think-aloud study show significant performance differences between the two tasks. Students find the controlled-editing task more challenging but appear more motivated and engaged when using the system.

The log data from Study 1, along with a Knowledge Component Analysis, has led to a Difficulty Factors Analysis of the domain. In order to determine which article constructions pose the largest challenges to students, a coding scheme was developed and each noun phrase of the target paragraphs was analyzed for both the type of article used (a, an, the, or null) and the reason for its use (e.g. ‘a’ because non-specific, singular, count noun; ‘null’ because non-specific, mass noun; ‘the’ because unique-for-all noun, ‘the’ because unique-by-definition noun; etc.).
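One way to picture the coding scheme is as a small record per noun phrase, holding the article used and the reason for it. The sketch below is illustrative only: the class name `NounPhraseCode`, its field names, the helper `count_by_construction`, and the example phrases are assumptions made for this sketch, not the project's actual scheme. The reason labels are the ones quoted above.

```python
from collections import Counter
from dataclasses import dataclass

ARTICLES = ("a", "an", "the", "null")


@dataclass(frozen=True)
class NounPhraseCode:
    """One coded noun phrase (hypothetical record layout)."""
    phrase: str
    article: str   # one of ARTICLES
    reason: str    # why that article is used

    def __post_init__(self):
        if self.article not in ARTICLES:
            raise ValueError(f"unknown article: {self.article!r}")


def count_by_construction(codes):
    """Tally how often each (article, reason) construction occurs."""
    return Counter((c.article, c.reason) for c in codes)


# Made-up phrases, for illustration only.
codes = [
    NounPhraseCode("book", "a", "non-specific, singular, count noun"),
    NounPhraseCode("water", "null", "non-specific, mass noun"),
    NounPhraseCode("sun", "the", "unique-for-all noun"),
    NounPhraseCode("pen", "a", "non-specific, singular, count noun"),
]
counts = count_by_construction(codes)
```

Tallies of this kind could then be set against error rates from the log data to see which constructions are hardest.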

Finally, our current work addresses the Assistance Dilemma: when should we provide assistance to students, and when should we withhold it? Our planned studies examine whether the added difficulty imposed by the editing tutor adds extraneous or germane cognitive load.

Glossary

L1: a student's first language
Extraneous Cognitive Load
Germane Cognitive Load

Background

Research Motivation

We are interested in the effects of transfer from L1 in language learning, and in learning English in particular. This study will contribute to the theory of robust learning by providing experimental data related to the often cited but little researched educational principle of “build on prior knowledge”. We hypothesize that elements of English that correspond to elements of a student's L1 will be easier to learn than those that do not (positive transfer). Moreover, elements of English that have no corresponding element in the L1 will be harder to learn (negative transfer). Our study will go beyond simple Contrastive Analysis by examining, at a detailed level, the various features and their relative validity for a given knowledge component. For example, instead of looking simply at whether articles exist in the native language, we will examine specific instances of article usage (e.g. immediate situation, general knowledge, sporadic reference, etc. (Sand, 2004)).

Educational Motivation

While error-free article use may not be necessary for most communication to occur (e.g. people usually understand what is meant by “Give me a book on the table” even when “Give me the book on the table” is correct), the problem becomes more severe when students submit written work, especially high-level work such as technical and academic reports. Errors in writing at this level may suggest to readers, whether consciously or unconsciously, that errors exist in the work itself (Master, 1997). Furthermore, students at this level are often aware of their difficulties with article use and are motivated to learn how to use articles correctly. A tutoring system that helps them develop a better understanding, and therefore produce error-free text and speech, is thus likely to be well received.

Research question

How is robust learning of English affected by L1 transfer?

Independent variables

Study 1:

The independent variable for this quasi-experiment is the student’s first language (L1). Students will be divided into groups based on the type of determiners present in their L1. Students whose L1s do not have an article system (e.g. Chinese, Japanese, Korean) will be placed in one group, while students whose L1s do have an article system (e.g. Spanish, Arabic, French) will be placed in another.

Hypothesis

Study 1:

Hypothesis 1: Knowledge components subject to negative transfer will be harder to learn than knowledge components for which there is positive or no transfer.

Pcno – Probability of correct given no transfer
Pc- – Probability of correct given negative transfer

Hypothesis 2: Knowledge components subject to negative transfer will exhibit a greater rate of decay than knowledge components for which there is positive or no transfer.

Pcno – Probability of correct given no transfer
Pc- – Probability of correct given negative transfer
PI – Probability of Incorrect

Dependent variables

This study will utilize a series of post-tests which measure both normal and robust learning, including:

Normal post-test, immediate: Immediately following instruction, students will complete their first post-test in order to measure the effectiveness of the training itself. These tasks will be a measure of normal learning (near transfer, immediate testing).
Normal post-test, long-term retention: Additional post-tests will be administered 3, 10, 20, and 35 days after initial instruction. These measures will be similar to the ones students encountered during training but will assess the more robust learning measure of long-term retention.
Transfer: In addition, we will collect student writing samples in order to determine if the instructional activities succeeded in enabling students to produce text with fewer errors.
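
The testing schedule described above (an immediate post-test followed by retention tests at 3, 10, 20, and 35 days) can be sketched as a small date calculation. This is a minimal illustration only: the function name `posttest_dates` and the example instruction date are assumptions, not part of the study materials.

```python
from datetime import date, timedelta

# Retention intervals, in days after initial instruction, as listed above.
RETENTION_OFFSETS_DAYS = (3, 10, 20, 35)


def posttest_dates(instruction_day):
    """Return the immediate post-test date followed by the retention test dates."""
    return [instruction_day] + [
        instruction_day + timedelta(days=d) for d in RETENTION_OFFSETS_DAYS
    ]


# Hypothetical instruction date, chosen only to show the calculation.
schedule = posttest_dates(date(2007, 3, 1))
```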

Explanation

This study is part of the Fluency and Refinement cluster. The main hypothesis of this cluster is that the structure of instructional activities and students’ prior knowledge play critical roles in developing robust learning. Since students learning English are already fluent in at least one language, we can use this fact to better understand how a student’s prior knowledge affects acquisition of new knowledge components. The learning event space is described as follows:

Start

1. Guess
   1.1 Entry is correct --> Exit, with little learning
   1.2 Entry is incorrect --> Start
2. Use the article of one’s first language
   2.1 Entry is correct --> Exit, with possibly mistaken learning
   2.2 Entry is incorrect --> Start
3. Try to apply knowledge of English article grammar
   3.1 Entry is correct --> Exit, with learning
   3.2 Entry is incorrect --> Start

The second set of paths (2, 2.1, 2.2) is available only to students whose first language has articles. Thus, this analysis is an instance of the explanation schema adding new paths.
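
The learning event space above can be written down as a small lookup, which makes the "adding new paths" point concrete. This is a minimal sketch; the function and path names (`available_paths`, `step`, `"use_l1_article"`, and so on) are labels invented for this illustration.

```python
def available_paths(l1_has_articles):
    """Return the strategies open to a student at the Start state."""
    paths = ["guess", "apply_english_grammar"]
    if l1_has_articles:
        # Path 2 exists only when the first language has articles to borrow.
        paths.insert(1, "use_l1_article")
    return paths


# What a correct entry leads to on each path, as listed in the event space.
OUTCOME_IF_CORRECT = {
    "guess": "exit, with little learning",
    "use_l1_article": "exit, with possibly mistaken learning",
    "apply_english_grammar": "exit, with learning",
}


def step(path, entry_is_correct):
    """One pass through the event space: correct exits, incorrect returns to Start."""
    return OUTCOME_IF_CORRECT[path] if entry_is_correct else "Start"
```

For example, `available_paths(False)` omits `"use_l1_article"`, matching the claim that students whose L1 lacks articles cannot take the second path.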

Although this study seems on the surface to be a simple matter of measuring negative transfer, the learning event space analysis suggests that learning is contingent on students’ choices of paths. In particular, if the grammatical system of the first language is quite different from the English system, students may rapidly learn that choice 2 leads only to errors (2.2) so they may stop using their first language as the default solution. In that case, the expected negative transfer may not occur.

The study includes retention and accelerated future learning measures. This allows testing of the path independence hypothesis (Klahr & Nigam; Nokes & Ohlsson), which holds that once students reach a certain level of competence, it doesn’t matter how they got there; their subsequent performance, including both retention and acceleration, will be the same. The PSLC theoretical framework suggests that the path to competence does make a difference, albeit a small one. If students make several errors per learning event, and thus have to cycle through the paths above several times, then when they do eventually produce the correct response, the encoding context is cluttered with features due to the errors and feedback messages. During testing, those features will be absent. Thus, students may be less able to retrieve the appropriate knowledge components. This predicts that students who make multiple errors during training, who are likely to be the students with path 2 available to them, will have less robust learning.

Findings

Think-Aloud Study

A pilot study was conducted to examine differences between the two task types and to gather verbal protocols showing which rules and heuristics students employ when solving these problems. If students performed equally well (or poorly) using the different interfaces, one would not expect to see differences in learning. To answer these questions, we performed a think-aloud study in which we invited English language learner (ELL) participants into the lab and asked them to complete tasks using the two systems. Students were told beforehand that the only errors present within the paragraphs were article errors.

Task Content

The problem paragraphs came from intermediate and advanced-level ESL textbooks (e.g. [2], [3]). The first problem was shorter and used simpler vocabulary than the second. Students were randomly assigned to one of two groups: those in Group 1 completed the intermediate-level problem using the editing (production and detection) interface and the advanced-level problem using the menu (production-only) interface. Students in Group 2 did the opposite (see Table 1).

Table 1. Type of interface used for each problem

	Intermediate Level Problem	Advanced Level Problem
Group 1	Production + Detection Task	Production Only Task
Group 2	Production Only Task	Production + Detection Task

Participants

Participants were recruited from Carnegie Mellon University’s InterCultural Communication Center’s Academic Culture and Communication (ACC) program. The program is a six-week summer program designed for newly admitted nonnative English-speaking students. Traditionally, students entering the program have high TOEFL scores (greater than 580) and at least an intermediate level of spoken fluency. ACC students are a highly educated and highly motivated group.

In total, there were six participants (three female, three male), all native Chinese speakers. The average age was 28 years, and all students had been learning English since middle school (average = 14.4 years). On self-report scales, participants gave an average proficiency score of 3.4 for reading, 3.0 for writing, and 2.5 for speaking (where 1 represents absolute beginner and 5 represents native or fluent).

The participants of this study represent a population that is perhaps more advanced than the overall target population of these systems. This is due to the think-aloud methodology employed for the pilot study. We needed students whose English ability was high enough that they were able to verbalize what they were doing. However, because English articles are often one of the last grammar points for students to master, we were able to use texts that were challenging even for these advanced participants.

Results

The data reveal a strong performance difference between the two tasks. Again, the difference between the two interfaces was that in the production-only version, students only had to choose which article to insert in each given box, whereas in the production and detection version, students had to identify and correct the errors. If students were able to correctly locate errors but had difficulty correcting them, we would expect the results to be the same regardless of the interface. However, if error identification is the true obstacle, we would expect students in the production and detection condition to perform worse than students using the production-only tutor.

For both problems, the production-only task resulted in higher accuracy, as measured by the percent of necessary changes that were correctly made. While the difference between the two interfaces was not significant for the easier, intermediate-level problem (t = 1.45, p = 0.13), it was significant for the advanced problem (t = 2.13, p = 0.05), suggesting that when students are presented with level-appropriate texts, detecting errors is a formidable challenge.
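
The accuracy measure used here, the percent of necessary changes that were correctly made, could be computed along the following lines. This is a sketch under stated assumptions: the function name, the position-to-article dictionaries, and the example values are hypothetical, and the sketch ignores unnecessary edits, which the text above does not say how to score.

```python
def percent_correct_changes(necessary, made):
    """Percent of necessary changes that the student made correctly.

    `necessary` maps a position in the paragraph to the article required there.
    `made` maps positions to the article the student ended up with.
    """
    if not necessary:
        raise ValueError("no necessary changes to score")
    correct = sum(
        1 for pos, article in necessary.items() if made.get(pos) == article
    )
    return 100.0 * correct / len(necessary)


# Made-up example: four changes were needed; the student got two right,
# got one wrong (position 8), missed one (position 15), and made one
# unnecessary edit (position 30) that this measure does not count.
necessary = {3: "the", 8: "a", 15: "null", 21: "the"}
made = {3: "the", 8: "an", 21: "the", 30: "a"}
score = percent_correct_changes(necessary, made)  # 50.0
```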

Further Information

Proposed Future Work

Proposed future work for the IWT project includes conducting a controlled in vivo experiment to answer the following question: how do scaffolding and feedback timing during learning affect transfer to authentic production?

During the unit on articles, students will be assigned to one of four conditions (See Table 2). In addition to studying the main effects of the manipulation, we will look for interactions between condition and student level (intermediate vs. advanced).

Table 2. Proposed 2x2 design

	Menu-Based Tutor	Controlled-Editing Tutor
Immediate Feedback	Group 1	Group 2
Delayed Feedback	Group 3	Group 4
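
The 2x2 design in Table 2 can be enumerated programmatically, with group numbers matching the table. The assignment procedure below (a seeded shuffle followed by round-robin dealing) is an assumption made for illustration; the proposal does not specify how students will be assigned, and the names `CONDITIONS` and `assign` are hypothetical.

```python
import itertools
import random

FEEDBACK = ("immediate", "delayed")
TUTORS = ("menu-based", "controlled-editing")

# Group numbers follow Table 2: groups 1-2 immediate, groups 3-4 delayed.
CONDITIONS = dict(enumerate(itertools.product(FEEDBACK, TUTORS), start=1))


def assign(students, seed=0):
    """Shuffle students, then deal them round-robin into the four groups."""
    rng = random.Random(seed)
    shuffled = list(students)
    rng.shuffle(shuffled)
    return {student: (i % 4) + 1 for i, student in enumerate(shuffled)}


groups = assign([f"student_{i}" for i in range(8)])
```

Dealing round-robin after shuffling keeps the group sizes balanced, which matters when the expected number of participants is small.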

Menu-Based Tutor

The menu-based tutor, built using CTAT, is similar to the cloze activities found in many English textbooks. Using this interface, students select an article from drop-down menus in order to complete the paragraph. Students do not have to identify where the errors exist but must produce the correct answer for each box.

Controlled-Editing Tutor

The controlled-editing tutor is also implemented in CTAT, using a widget developed this year. It allows students to insert, remove, or change articles anywhere in the text. However, students can edit only articles, which prevents them from completely rewriting sentences in order to avoid certain grammar constructions. In this interface, students must first identify and then fix the errors in the paragraph.
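
The articles-only constraint can be illustrated with a simple check: an edit is acceptable if the text is unchanged once all articles are removed from both versions. This is a sketch of the idea, not the CTAT widget's implementation; the function names are hypothetical, and it uses plain whitespace tokenization with no handling of punctuation or capitalization changes.

```python
ARTICLES = {"a", "an", "the"}


def strip_articles(text):
    """Drop every article token, leaving the words students may not touch."""
    return [word for word in text.split() if word.lower() not in ARTICLES]


def is_article_only_edit(original, edited):
    """True if the two texts differ only by inserted, removed, or changed articles."""
    return strip_articles(original) == strip_articles(edited)
```

For instance, changing “Give me a book on the table” to “Give me the book on the table” passes the check, while changing “Give” to “Hand” does not.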

Feedback Conditions

In the immediate feedback condition, student edits turn green if they are correct and red if they are incorrect. In the delayed feedback condition, all edits turn blue until the student indicates that they are finished working with the paragraph. At that point, the edits are graded and labeled correct (green) or incorrect (red). Students are required to fix all incorrect edits before moving to the next problem.
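
The two feedback conditions differ only in when grading becomes visible, which a short sketch makes explicit. The function names and the boolean-list representation of edits are assumptions made for this illustration, not the tutor's actual code.

```python
def edit_colors(edits_correct, condition, finished=False):
    """Return a display color for each edit.

    `edits_correct` is a list of booleans, one per student edit.
    `condition` is "immediate" or "delayed".
    `finished` is True once the student says they are done with the paragraph.
    """
    if condition not in ("immediate", "delayed"):
        raise ValueError(f"unknown condition: {condition!r}")
    if condition == "delayed" and not finished:
        # Grading is withheld until the student indicates they are finished.
        return ["blue"] * len(edits_correct)
    return ["green" if ok else "red" for ok in edits_correct]


def may_advance(edits_correct):
    """Students move to the next problem only once no incorrect edits remain."""
    return all(edits_correct)
```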

Expected Contributions

In addition to the software contributions of developing two article tutors, the study will provide a rigorous evaluation of the role of feedback and scaffolding in language learning and production. The practical implications of this work include a better understanding of which instructional techniques lead to robust learning of article use. This work will widen the scope of previous learning science research by bringing the language learning domain into the debates over when to present feedback and over the risks and benefits of scaffolding during learning.

Papers and Presentations

Papers

Wylie, R. (accepted) Are we asking the right questions? Understanding which tasks lead to the robust learning of English grammar. Accepted as a Young Researchers Track paper at the 13th International Conference on Artificial Intelligence in Education (2007).

Presentations and Posters

Computer Assisted Language Instruction Consortium (CALICO)
To be presented May 24, 2007
Title: Developing Tutoring Systems for Classroom and Research Use: A look at two English Article Tutors

Three Rivers Teachers of English to Speakers of Other Languages (3R TESOL) Conference
Date: October 28, 2006
Title: From Practice to Production: Developing Tutoring Systems for English Article Use

Society for Neuroscience, SLC Symposium
Date: October 13, 2006
Title: Developing Intelligent Tutoring Systems for Language Learning

University of Pittsburgh’s Multimedia Showcase
Date: September 27, 2006
Title: Two Tutors, One Goal: Two tutoring systems for teaching English articles
