DataShop Pipeline

From LearnLab
Revision as of 15:46, 29 October 2007 by Kcunning (talk | contribs) (LIVE!)

This page provides information on datasets in various stages of progress. If you see an error in any of this information, please feel free to correct it by editing this page. If you have a comment or concern about where a dataset sits in the DataShop pipeline, please contact the DataShop staff.

ANALYSIS REPORTED

(on learnlab.org - ordered alphabetically by project; researchers get a gold star when their datasets move up into this table!)


Project Dataset Name DB ID Tool LearnLab P.I. School(s) Date Notes/Status Remarks

Geometry Course Geometry Area (1996-97) 76 CL Geometry Ken Koedinger unknown 1996-97

Paper written -- get reference!!
8/17: Primary analysis complete. Cen (2006), Cen (2007)


Geometry Course Geometry Angles - Fox Chapel 1998 122 CL Geometry Vincent Aleven unknown unknown Files-only

7/23: Low priority to do: get raw logs and distill to transaction level.
8/17: Two best paper awards!

LIVE!

(on learnlab.org - ordered alphabetically by project)

Project Dataset Name DB ID Tool LearnLab P.I. School(s) Date Notes/Status Remarks
A Multimodal Interface for Solving Equations Handwriting Examples - Winter 2006 145 CL-2006 (Munger) Algebra Lisa Anthony Winter 2006 07/06/07: Being QA'd by Tristan

09/22/07: Data loaded to production.

1 Algebra Course Algebra I 2005
Algebra I 2005-2006 (Hampton only)
123,110 CL-2005 Algebra Albert Corbett CWCTC
Hampton
Wilkinsburg HSs
2005-2006 2/20: Received DataMunger.

2/27: Started running data munger for wilkinsburg-algebra i.
3/5: Munger done, have 1,378,891 txs instead of 1,187,841.
5/4: Munger fixed, Jon Steinhart running the conversion, Tristan to QA. Supposed to have data week of 5/7
Load of Wilkinsburg student data delayed due to problems copying the student files.
6/25: Hampton numbers on production not right. CWCTC numbers on the-cooker not right. Wilks data doesn't load. Requested a new complete CD from CL.
7/9: Modifying analysis database to support improved pruning, waiting for new CD with latest version of munger.
7/23: Loaded onto production with all 3 schools, one student from Wilkinsburg failed to load.

1 Chemistry Buffer Chemistry_Buffer_Study 63 OLI Chemistry Jodi Davenport CMU Spring 2006  
Chemistry Buffer CMU_sp07_BUFFERS 94 OLI Chemistry Jodi Davenport CMU unknown  
3 Chemistry Buffer Chemistry_Buffer_Study_2007 84 OLI Chemistry Jodi Davenport CMU, UBC? Spring 2007
4 Chemistry Collaboration Effects of collaboration in virtual laboratory environments 95 VLAB Chemistry Bruce McLaren Spring 2007 Files only.
5 Chinese Tone Study Chinese_tonestudy 1 unknown Chinese Ying Liu CMU Summer 2006
6 Chinese Tone Study Two Chinese_toneperception 64 unknown Chinese Ying Liu CMU 2005-2006
8 Contiguity-CWCTC Contiguity Difficulty Factors Analysis Winter 2006 153 Geometry Vincent Aleven Winter 2006 Files only
8 Contiguity-CWCTC Contiguity CWCTC Spring 2006 79 CL Geometry Vincent Aleven CWCTC HS Spring 2006
9 Contiguity-CWCTC Contiguity CWCTC Winter 2006 80 CL Geometry Vincent Aleven CWCTC HS Winter 2006
10 Contiguity-CWCTC Contiguity CWCTC Fall 2006 102 CL + CTAT Geometry Kirsten Butcher, Vincent Aleven CWCTC Fall 2006 6/21: Reloaded.

07/18/07: v4.13 Loaded on test machine for reload verify
7/23: Received missing CTAT data from Frank. Kyle to work on 7/24.
7/26: Missing CTAT uploaded to production.

11 Contiguity-CWCTC Contiguity CMU Winter 2007 113 CL Geometry Kirsten Butcher, Vincent Aleven CMU Winter 2007 6/21: Reloaded.
12 Division Tutor division_study 98 CTAT-Flash Geometry? Stefan King Perrysville Elem Summer 2007
Does Treating Student Uncertainty as a Learning Impasse Improve Learning in Spoken Dialogue Tutoring? WOZ Uncertainty Adaptation 128 ITSPOKE Physics (intended) Kate Forbes-Riley; Diane Litman Lab experiment with Pitt students Winter-Spring 2006-7 7/31/07: Project & Dataset created.
13 Elementary Chinese Course ElemChineseFA06 75 OLI Chinese
14 Elementary Chinese Course ElemChineseSU07 96 OLI Chinese
15 Elementary Chinese Course ElemChineseSP07 83 OLI Chinese
16 Example Study - Freiburg and CMU Example Freiburg Spring 2006 77 CL Geometry Vincent Aleven CMU Spring 2006
16 Example Study - Freiburg and CMU Example CMU Summer 2006 88 CL-modified Geometry Vincent Aleven CMU Summer 2006
18 Example Study - Freiburg and CMU Example Freiburg Summer 2006 78 CL Geometry Alexander Renkl unknown Summer 2006
19 Example Study - Freiburg and CMU Example CWCTC Winter 2007 99 CL+CTAT Geometry Vincent Aleven CMU Summer 2006 6/19: Need to reload CL data. Need to reload CTAT data.

6/21: Reloading CTAT to QA machine. CL will follow.
6/21: Data reloaded to production.
6/25: Bad CTAT logs (not all students were anonymized correctly) removed from production.
7/18: CL data: v4.13 Loaded on test machine for reload verify by Octav.
7/24: CTAT data: ready to load
7/31: CTAT data reloaded.

20 Example Study - Freiburg and CMU Example Steel Valley Spring 2007 114 CL Geometry Vincent Aleven Steel Valley HS Spring 2006 6/19: Need to reload CL data.

6/21: Data reloaded to production.

Example Study - Freiburg and CMU Example CWCTC Spring 2007 125 CL-Lisp tutor (no CTAT) Geometry Vincent Aleven, Ron Salden CWCTC & Wilkinsburg Spring 2007 07/06/07: Will receive from CL week of 7/10/07

07/09/07: Octav received data
07/18/07: Loaded on cooker
07/21/07: QUESTION: is this 3 datasets or 1 with 3 schools?
7/29/07: Loaded to production.

Example Study - Freiburg and CMU Example Wilkinsburg Spring 2007 124 CL-Lisp tutor (no CTAT) Geometry Vincent Aleven, Ron Salden CWCTC & Wilkinsburg Spring 2007 7/29/07: Loaded to production.
Example Study - Freiburg and CMU Example Freiburg Spring 2007 127 CL-Lisp tutor (no CTAT) Geometry Vincent Aleven, Ron Salden Gymnasium Spring 2007 07/10/07: Octav verifying conversion

7/29/07: Loaded to production.

Fostering fluency in second language learning: Testing two types of instruction Repetition in fluency training, Study 1 129 4/3/2 training English De Jong English Language Institute Fall 2006 7/31/07: Dataset created on production. No files yet.
Fostering fluency in second language learning: Testing two types of instruction Formulaic sequences in fluency training, Study 2 130 4/3/2 training English De Jong English Language Institute Spring 2007 7/31/07: Dataset created on production. No files yet.
Fostering fluency in second language learning: Testing two types of instruction Formulaic sequences in fluency training, Study 3 131 shadowing training English De Jong English Language Institute Spring 2007 7/31/07: Dataset created on production. No files yet.
21 French Course FrenchLanguag2 82 OLI French
22 French Course FrenchLanguage 74 OLI French
23 French Course FrenchLanguage2 81 OLI French
French Course pitteiffel 151 OLI French
French Course toureiffel 152 OLI French
28 Geometry Course Geometry Angles - North Hills Spring 2003 63 CL Geometry Vincent Aleven North Hills HS Spring 2003 Paper: Aleven & Koedinger, 2002 in Cognitive Science?? Or Popescu
29 Geometry Course Hampton Fall 2005 66 CL Geometry Vincent Aleven Hampton HS 2005-2006 Redoing this dataset with bug fixes in Octav's converter.
38 Geometry Course Geometry-AllStudents 6 CL Geometry Ken Koedinger unknown unknown Bad Version
Geometry Course Geometry Angles - Fox Chapel 1998 122 CL Geometry Vincent Aleven Fox Chapel 1998 Files Only
Geometry Course Geometry Area (1996-97) 76 CL Geometry Vincent Aleven unknown
30 IERI: Learning Oriented Dialogue Project Learning Oriented Dialogue Project - original 62
2 Improving Algebra Learning and Collaboration CPS Algebra I 2005 109 CL-2005 Algebra Bruce McLaren, Nikol Rummel Hampton HS 2005-2006
2 Improving Algebra Learning and Collaboration PTS Algebra I 2005 112 CL-2005 Algebra Bruce McLaren, Erin Walker 2005-2006 raw data files only
Improving Skill at Solving Equations via Better Encoding of Algebraic Concepts Corrective Self Explanation 147, 149 CL-2006 (Munger) + CTAT Algebra Julie Booth Golden Valley High School Feb, 2007 07/06/07: Being QA'd by Tristan

8/13/07: CTAT files received from Frank. Need to insert course_name, check for errors.
8/15/07: CTAT logs loaded to production. Some dataset info updated as well.
9/21/07: Munger failure due to CTAT logs already loaded. Need to clean out CTAT logs and try again.
9/22/07: Data loaded to production.
9/25/07: CTAT logs loaded to production (were removed for munger reload)

24 Improving cultural learning by predicting in French film FrenchOnline 32 CTAT? French Amy Ogan CMU Fall 2005
25 Improving cultural learning by predicting in French film FrenchTutor_Demo 16 CTAT? French Amy Ogan CMU Spring 2005
26 Improving cultural learning by predicting in French film French_Culture_Tutor 9 CTAT? French Amy Ogan CMU Spring 2005
27 Improving cultural learning by predicting in French film French_Culture_Tutor_Fall_2005 8 CTAT? French Amy Ogan CMU Fall 2005
31 Intelligent Writing Tutor iwt_course 86 english Ruth Wylie
32 Intelligent Writing Tutor iwt retention course 89 english Ruth Wylie 10/9/2007: this dataset is garbage.
32 Intelligent Writing Tutor eli_study 148 english Ruth Wylie Fall 2007 10/9/2007: should be renamed to IWT Article Tutor Level 5 Study Fall 2007 when data collection is complete
40 Knowledge Tracking Chinese Vocabulary Spring 2006 107 Phil's own Chinese Phil Pavlik CMU Spring 2006
Knowledge Tracking Spanish Vocabulary Spring 2006 108 Phil's own Spanish Phil Pavlik Winter 2007
Knowledge Tracking Chinese Vocabulary Transfer Lab Study Spring 2006 115 Phil's own Chinese Phil Pavlik Spring 2006 6/22: Loaded to production
33 MacWhinney Dictation Studies Chinese Dictation Fall 2005 73 unknown Chinese Brian MacWhinney unknown Fall 2005
34 MacWhinney Dictation Studies French Dictation Fall 2005 71 unknown French Brian MacWhinney unknown Fall 2005
35 MacWhinney Dictation Studies Spanish Dictation Fall 2005 72 unknown Other Brian MacWhinney unknown Fall 2005
36 OLI Statistics 07Meyer201 103 OLI Other CMU 2007 Loaded on Jun-08-2007
Physics Physics - USNA - Fall 2006 126 Andes Physics Kurt VanLehn US Naval Academy Fall 2006 6/19/07: Received 18,000 files from Anders.
6/20/07: Loaded to cooker, awaiting verification.

6/22/07: Tim to work on better skill model with Brett van de Sande & Anders. Waiting on result of this.
6/27/07: Anders regenerating raw logs
6/28/07: Files distilled; ~100 have invalid XML. Anders to fix.
7/3/07: Reloading to the-cooker, some tutor semantic names are off. Also asked Anders to include units.
7/24/07: Received new data from Anders. Fixing bugs in distiller.
7/29/07: Data loaded to production - will be reloaded for changes to distiller.

39 Public Pre_Summer_School_01_Jun_05 10 CTAT-Flash Chemistry unknown unknown June 2005 Used to show DataShop features, example data only.
Robust Learning of Vocabulary REAP ELI Reading 4 Summer 2006 118 REAP English Maxine Eskenazi Pitt-ELI Summer 2006 3/29/2007 - Sent message to Michael, Maxine to determine status

4/4/07 - Waiting for Michael's studies to finish, he will then convert to DS format.
6/12/07 - Waiting for Michael to add <conditions> to the datasets
6/25/07 - Loading datasets to the-cooker
6/26/07 - Ready to load to production
6/27/07 - Loaded to production

Robust Learning of Vocabulary REAP ELI Reading 4 Spring 2006 117 REAP English Maxine Eskenazi Pitt-ELI Spring 2006 3/29/2007 - Sent message to Michael, Maxine to determine status

4/4/07 - Waiting for Michael's studies to finish, he will then convert to DS format.
6/12/07 - Waiting for Michael to add <conditions> to the datasets
6/25/07 - Loading datasets to the-cooker
6/26/07 - Reloading to the-cooker
6/27/07 - Ready to load to production
6/27/07 - Loaded to production

Robust learning with a Meta-Cognitive Tutor Help Tutor CWCTC Spring 2006 (cognitive) 134 CTAT+CL Geometry Ido Roll CWCTC HS 2005-2006

7/18/07: Files received from Octav, modifying DataShop code to improve import speed.
7/26/07: Loaded to the-cooker.
8/13/07: Loaded onto production!

Robust learning with a Meta-Cognitive Tutor Help Tutor CWCTC Spring 2006 (meta) 135 CTAT+CL Geometry Ido Roll CWCTC HS 2005-2006

7/18/07: Files received from Octav, modifying DataShop code to improve import speed.
7/26/07: Loaded to the-cooker.
8/13/07: Loaded onto production!

Robust learning with a Meta-Cognitive Tutor Short Hints Wilkinsburg Spring 2006 (cognitive) 137 CTAT+CL Geometry Ido Roll Wilkinsburg HS 2005-2006

7/18/07: Files received from Octav, modifying DataShop code to improve import speed.
7/26/07: Loaded to the-cooker.
8/13/07: Loaded onto production!

Robust learning with a Meta-Cognitive Tutor Short Hints Wilkinsburg Spring 2006 (meta) 136 CTAT+CL Geometry Ido Roll Wilkinsburg HS 2005-2006

7/18/07: Files received from Octav, modifying DataShop code to improve import speed.
7/26/07: Loaded to the-cooker.
8/13/07: Loaded onto production!

43 Stoichometry Study PSLC Stoichiometry Study 1 2 CTAT-Flash Chemistry Bruce McLaren UBC, a NJ HS, Hampton HS 2005+ Holds data for Studies 1, 2, 3.

Paper written. Problem name labels include the condition and probably should not, makes error report difficult. Also missing problem descriptions.

44 Stoichometry Study SummerSchool2005 11 Chemistry Bruce McLaren
45 Stoichometry Study Winter_Workshop01 59 Chemistry Bruce McLaren
46 Stoichometry Study PSLC Stoichiometry Study Demo 3 Chemistry Bruce McLaren
42 The Effect of Generation and Interaction on Robust Learning Self Explanation - Electric Fields - USNA - Spring 2006 104 Andes Physics Bob Hausmann USNA Spring 2006 Loaded on Jun-08-2007.


Previously known as 'Hausmann-Experiment'.


47 Thermodynamics Thermo Fall 2005 61 unknown Other Vincent Aleven unknown Fall 2005
Training oral production in learning second language grammar The order of French pronouns 132 Flash online audio recording French De Jong Regular French courses at Pitt and CMU Spring and Summer 2007 7/31/07: Dataset created on production. No files yet.
Training oral production in learning second language grammar The use of French conditionals 133 Flash online audio recording French De Jong Regular French courses at Pitt and CMU Spring and Summer 2007 7/31/07: Dataset created on production. No files yet.
48 Unclassified TWS_Group_01 65
49 WPI-Assistments Assistments - 8th Grade Math - 2004-2005 (200 students) 90 Assistments Other Neil Heffernan Mass Public 2004-2005 6/20: Need to reload.
6/25: Reloaded.
50 WPI-Assistments Assistments - 8th Grade Math - 2004-2005 (762 students) 92 Assistments Other Neil Heffernan Mass Public 2004-2005 6/20: Need to reload.
6/25: Ready to load (on production)
6/26: Reloaded.
LFA - OutOfMemory on all 4 KC models.
WPI-Assistments Assistments - 8th Grade Math - 2005-2006 (200 students) 119 Assistments Other Neil Heffernan Mass Public 2005-2006 Loaded on 7/03/07
10 WPI-Assistments Assistments - 8th Grade Math - 2005-2006 (1582 students) 120 Assistments Other Neil Heffernan Mass Public 2005-2006 Loaded on 7/03/07

READY TO GO LIVE

(datasets that are ready to be loaded to learnlab.web, highest priority on top)

Project Dataset Name Tool LearnLab P.I. School(s) Date Notes/Status Remarks
Physics Physics - USNA - Fall 2006 Andes Physics Kurt VanLehn US Naval Academy Fall 2006 Waiting to receive files from Anders.
8/14/07: files on Cooker, waiting for verification.
Physics Self Explanation - Electric Fields - Spring 2006 Andes Physics Kurt VanLehn US Naval Academy Spring 2006 Waiting to receive files from Anders.

8/14/07: files on Cooker, waiting for verification.
09/28/07: Loaded on the cooker under v2.3, waiting for verification

Physics Physics - USNA - Spring 2007 Andes Physics Kurt VanLehn US Naval Academy Spring 2006 Waiting to receive files from Anders.

8/14/07: files on Cooker, waiting for verification.

09/28/07: Loaded on the cooker under v2.3, waiting for verification

IN TESTING

(datasets that have been loaded and are to be reviewed for correctness, highest priority on top)

Project Dataset Name Tool LearnLab P.I. School(s) Date Notes/Status Remarks
A Multimodal Interface for Solving Equations Handwriting - Spring 2007 CL-2006 (Munger) Algebra Lisa Anthony Spring 2007 07/06/07: Being QA'd by Tristan

09/24/07: Data loaded to the-cooker.
09/28/07: Updated on cooker to v2.3, waiting for new version of munger to load on production with v2.3 analysis_db

Contiguity - CWCTC Contiguity Spring 2007 CL-Lisp tutor (no CTAT) Geometry Vincent Aleven, Kirsten Butcher CWCTC 07/06/07: Will receive from CL week of 7/10/07

07/09/07: Octav received data
07/18/07: Loaded on cooker

09/28/07: Loaded on the cooker under v2.3, waiting for verification.
French Discussion Pilot French Discussion Board CTAT

Discussion Board

French Amy Ogan CMU 8/2/07: Received log files from Jonathan Sewall. Will load to QA.

8/07/07: Files loaded to QA. Some still need to be fixed (contain invalid XML).
8/16/07: All CTAT logs free of errors. All have been loaded to QA - waiting for researcher verification.
8/24/07: Need to get logs from Discussion Board from Erin Walker. Let's ping them again late Sept.
9/25/07: Files are on QA - waiting for go-ahead to move to prod from the researchers.

15 Knowledge Tracking French Vocabulary Spring 2007 Phil's own French Phil Pavlik/Nora Presson Fall 2006 07/06/07: Nora Presson to send data, no ETA yet.

8/13/07: Nora has import file ready. Waiting to receive from her to load onto the-cooker.
8/15/07: Data loaded to the-cooker. Vetting in process.
10/8/07: Problem with import file, Nora to reprocess and send to DataShop team.

Knowledge Tracking Chinese Vocabulary Fall 2006 Phil's own Chinese Phil Pavlik Fall 2006 7/06/07: Phil working on conversion. ETA 4-6 weeks.

09/25/07: DataShop has the dataset file. Waiting to upgrade the-cooker before loading for vetting.
09/28/07: Loaded on the cooker under v2.3, waiting for verification
10/9/07: Problem with the number of observations in Learning Curves. DS team looking into it.

Knowledge Tracking Chinese Vocabulary Winter 2007 Phil's own Other Phil Pavlik Spring 2006 7/06/07: Phil working on conversion. ETA 4-6 weeks.

9/26/07: Loaded to the-cooker
10/9/07: Problem with the number of observations in Learning Curves. DS team looking into it.

IN PROGRESS

(datasets that require additional conversion work or tweaking, highest priority on top)

Project Dataset Name Tool LearnLab P.I. School(s) Date Notes/Status Remarks
Geometry 2005-2006 CL-2005 (lisp) Geometry Vincent Aleven CWCTC, Hampton, Wilkinsburg HSs 2005-2006 Octav needs to convert this data into XML.

UPCOMING

(datasets we expect to receive soon - may or may not require additional processing on our part)

Project Dataset Name Tool LearnLab P.I. School(s) Date Notes/Status Remarks
Algebra Bridge to Algebra 2005/2006 CL-Munger Algebra 6/6/07: Data is good to munge.

Request sent to CL to break the dataset into smaller pieces
07/09/2007: Confirmed that the munger can load sections only. Steve Ritter wishes to speak with Ken/Kurt to cover legal issues with this dataset.

7 Stoichiometry VLAB data from Stoich studies Chemistry Bruce McLaren Oliver Scheuer to oversee conversion in Germany.

6/20/07: Mike Karabinos to work up examples for each type of VLAB Action, DataShop to fill in dummy and real <tutor_message>. Will then send this info to Oliver.
7/27/07: Kyle completed transformation to XML. Sent out for review.
8/14/07: Working on storage of replay data - waiting for response from CTAT and Germany
9/25/07: Jonathan Sewall wants to see what logs look like in DataShop.

8 ESL ESL data 7/27/07: Moving the Online Search System to PSLC servers in the near future. Will tie in to DataShop with single-sign on. No current plan to move any ESL data into DataShop itself.
Improving Algebra Learning and Collaboration PTS Algebra I 2005 CL Algebra Tutor Algebra Erin Walker ? 2005-2006 (not sure) Researcher has raw logs and has written a converter for analysis. Need to find out if it's in DataShop format.
Improving Algebra Learning and Collaboration PTS Algebra I Spring 2007 CL Algebra Tutor Algebra Erin Walker CWCTC Spring 2007 Researcher has raw logs and is responsible for submitting to DataShop for import.

7/06/07: Frank sent anonymized raw logs to Erin.

Algebra I Course Algebra I 2006-2007 CL Algebra - Mungable Algebra Albert Corbett CWCTC, Hampton, Wilkinsburg, 3 LA schools 2006-2007 school year Waiting for harvest from CL. Also need a new version of munger to load.
Geometry Course Geometry 2006-2007 CL Geometry Geometry Vincent Aleven CWCTC, Hampton, Wilkinsburg 2006-2007 school year Waiting for harvest from CL. Will need a conversion from Octav.
Robust Learning of Vocabulary Fall 2006 Personalization Study REAP Tutor English Juffs English Language Institute Fall 2006 No ETA from Michael yet.

9/25/07: Sent follow-up email to Michael
10/03/07: Waiting for files from REAP Team.

Robust Learning of Vocabulary Spring 2007 Reading 4 REAP Tutor English Juffs English Language Institute Spring 2007 No ETA from Michael yet.

9/25/07: Sent follow-up email to Michael
10/03/07: Waiting for files from REAP Team.

Robust Learning of Vocabulary Summer 2007 Word Sense and Pronunciation Audio Study REAP Tutor English Juffs English Language Institute Summer 2007 No ETA from Michael yet.

9/25/07: Sent follow-up email to Michael
10/03/07: Waiting for files from REAP Team.

Verb aspect Verb aspect CTAT French Amy Ogan 2007 8/24/07: Ping in late Sept to get these logs.
09/28/07: E-mail sent to Amy asking for data status update.
Physics Physics - USNA - Spring 2006 Andes Physics Kurt VanLehn US Naval Academy Spring 2006 Waiting to receive files from Anders.

Octav to regenerate all of his datasets?