Difference between revisions of "Collected User Requests"

From LearnLab

Revision as of 20:34, 16 September 2009



Annotations on Transaction Level

  • I have models which can annotate things like: gaming, bored, etc. on the transaction level. -- Ryan Baker, ET Mtg 12/5/2007

Annotations on Student Level

  • Would annotate on student level. -- Ido Roll, User Meeting, 1/19/2009

Annotations on Pages

  • See the cool thing created by Jeffery Heer where all the settings of the page were recorded with the comment. -- Ryan Baker, DS Team Mtg 5/23/2008

Dataset Discussion - Capture data-integrity issues

  • As a stakeholder in the DataShop project, I want to capture and publicize the data-integrity issues discovered with data sets so that data is better documented (and so we've fulfilled a promise to our funders to better document data). -- Ken Koedinger, Team Meeting, 8/15/2009
  • As a user of DataShop, I want to discuss datasets and have that discussion attached to the dataset so that others can comment and better understand any data-integrity issues I've found.

Linking to internal pages

Have a link from the DataShop to the Theory Wiki (Dataset to Project Page)

  • Can we link from the dataset to the project page on the Theory Wiki? In the pipeline have a clickable link to the project page (make project name clickable). -- Michael Bett, ET Mtg 11/14/2007
    • Link to a dataset directly? Is that obvious to users? Click on dataset link -> log in -> redirected back to dataset. -- Brett Leber

Data Modeling

Non-KC Modeling

Automatic Distillation

  • As an educational data miner wishing to develop a machine learned model with PSLC data, I would like to be able to automatically distill data features (e.g. custom fields) commonly used in past educational data mining research for a new data set (see, for instance, Baker, Corbett, Roll, & Koedinger, 2008 in UMUAI) -- Ryan, Summer 2008, Startup Memo
    • Could be implemented as a plug-in
  • Also interested in this feature idea:
    • Dan Franklin, Oct 2008

Upload model and apply it to new data set

  • As an EDM researcher, I would like to take a model, expressible as a linear formula on DataShop fields, or a simple code procedure (e.g. Bayesian Knowledge Tracing, which Ryan has code for), and apply it to a new data set, so that I can ... -- Ryan Baker, Sept 2008
  • Also interested in this. -- Maxine Eskenazi, Sept 2008
  • May work best as a plug-in
    • Code to display GUI to choose which data sets to use, calls model code, re-import to DataShop
    • Good to have a way to apply many models, as soon as you import a data set
  • Phil has an idea that maybe fits within this one. Please move if there's a better category [Brett Leber]

This [transaction? kc? --ed.] relabeling is really mostly about enabling modeling in DataShop, right? With this in mind, I think that it is actually a higher priority to have model alternatives in DataShop.... E.g. Investigators should be able to give you chunks of Java code according to a certain specification, and DataShop should then be able to run these over datasets (perhaps after a certain series of QA occurs according to an SOP) when the investigator clicks some button in DataShop.... Obviously this is a much larger project than adding columns, but it is also much more important in my mind.
--Phil Pavlik, email to Brett on 1/14/2009

  • Examples:
    • Example: running gaming detector in multiple tutors and comparing gaming frequencies
    • Example: applying Bayesian Knowledge Tracing to a new data set from the same LearnLab
    • Example: applying Ben Shih's models to many data sets [Ben Shih should be included in design of this feature; he is interested, and has a lot of good ideas]

Add Different Predicted Values

  • Would also like to add statistics, different predicted values than what LFA produces. -- Ken Koedinger, ET Meeting, 10/10/2007

Bayesian Knowledge Tracing

  • Bayesian Knowledge-Tracing built into DataShop like LFA is. -- Ryan Baker, Startup Memo, Summer 2008

Richer statistics for KC modeling

  • In addition to the model stats and estimates generated for learning factor models, we should also create difficulty factor models (i.e., ones with "Slope" parameter). The latter is particularly relevant for the Unique-Step model where the slope parameter is meaningless (but still counts against the BIC value). -- Ken Koedinger, Email "new feature request", 1/22/2009
    • Relatedly, we should report significance values on the Slope parameters -- that is, when is the Slope significantly different from 0.
    • The KC models page perhaps should also report the log-likelihood and number of parameters (in addition to BIC) and leave out AIC. We might also consider other metrics of model generality, like the "adjusted R2" (if I have this name right -- Joe Beck mentioned in the Assistments meeting yesterday).
    • These changes will be part of meeting the CMDM goal of improvement in (or at least demonstrate acceptability of) the cognitive models in 90+ units in our LearnLab courses (or affiliates).

KC Modeling

Create KC Models through Web Services

  • For John Stamper's CMDM project, it would be nice to automatically update KC Models through web services. -- Ken Koedinger, DataShop Team Meeting, Sept 11, 2009

Automatically discovering new KC model

  • Would it be possible to run some code (perhaps Hao's KC model selection code, perhaps something else generated by CMDM thrust) to find new best KC model. -- Vincent Aleven, Sept 2008
  • As a learning sciences researcher, I would like DataShop to discover a new/better KC model for me.
  • Could be done as a plug-in

Same Skill Twice on Same Step

  • Would like to be able to apply the same skill to a step twice during a KC Model Import. -- Ken Koedinger, email, 2/4/2009

Save KC Model Import Files

  • KC Model Import - save the file used to create the KCMs in case we need to recreate them. -- Ken Koedinger, email 3/4/2009

Generate new KC Models with LFA

  • Not sure who asked for this.
  • It would be nice to generate new KC Models with Hao's LFA code
  • Would need to specify factors.
  • Ideas on where this could run?
    • On a separate server? Request it to be run, specify duration. Have separate server queue up requests, email user when done.
    • In Java Applet on client machine? -- Phil Pavlik

Log Likelihood and MAD

  • Log Likelihood, MAD problem, MAD step (store and show) -- Hao Cen
    • This is a variation on "Richer statistics for KC modeling" above. Probably should be merged. -Ken

Better naming for KCs in auto-generated Unique-Step KCM

As a researcher, I want the KCs in the Unique-Step model to have better names than KC1, KC2, etc, so that I can easily tell which generated KCs go with which unique step.

  • Hui Cheng (Email 1/20/2009), Ken Koedinger (Email 1/22/2009)
  • Could you put this in your new feature request list: could the “Unique-step model” be better labeled than just “KC1”, “KC2”, etc.? For example, for Assistments data, you could use part of the “Step Name”. -- Hui Cheng, Email 1/20/2009
  • But, anything is better than "KC". -- Ken, email, 1/22/2009

A simple alternative that preserves uniqueness and addresses length is to concatenate:

    1. the first K letters of the step name
    2. a unique numerical increment (same as the "3" in "KC3")

Note that (2) guarantees uniqueness just as it does in the current "KC<num>" scheme. Or, perhaps better given that step names are often scoped within problems, concatenate:

    1. the first L letters of the problem name
    2. the first M letters of the step name
    3. a unique numerical increment (just like the "3" in "KC3")

I think K or L+M should be as big as possible without making the KC names indistinguishable (because they run off the right margin) in the KC list on the Learning Curve and other pages.
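A minimal Python sketch of the second scheme described above (problem prefix, step prefix, unique number). The function name, the underscore separator, the default lengths, and the sample problem and step names are all assumptions made for illustration; none of this is DataShop code.

```python
def name_unique_step_kcs(steps, problem_chars=8, step_chars=12):
    """Build KC names from (problem_name, step_name) pairs.

    Each name is: problem prefix + step prefix + running number.
    The running number alone guarantees uniqueness, as in "KC<num>".
    """
    names = []
    for number, (problem_name, step_name) in enumerate(steps, start=1):
        problem_part = problem_name[:problem_chars].strip()
        step_part = step_name[:step_chars].strip()
        names.append(f"{problem_part}_{step_part}_{number}")
    return names


# Hypothetical problem and step names, for illustration only.
steps = [
    ("EQUATION-SOLVING-01", "divide both sides"),
    ("EQUATION-SOLVING-01", "divide both sides by x"),
    ("AREA-02", "enter base"),
]
print(name_unique_step_kcs(steps, problem_chars=4, step_chars=6))
# ['EQUA_divide_1', 'EQUA_divide_2', 'AREA_enter_3']
```

The first two steps share both prefixes, so only the trailing number tells them apart; that is the case the numerical increment is there for.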

Visualize Learning Curve Split

  • Be able to visualize a learning curve split into 2 based on a specification of a subset of problems. -- Albert Corbett, Math CCM, November 2008
  • Harder: make me a new KCM out of it. -- Ken Koedinger, Team Mtg, Dec 5, 2008, while trying to describe Albert's request to Alida

Statistical Significance

  • Can DataShop determine if the difference between conditions or learning curves is statistically significant? -- general theme at workshop, probably mentioned by Bob Hausmann in his talk, Winter Workshop 1/23/2008
    • Can't do it yet in DataShop, but I can show you how to do it in R (or SPSS...) after you've exported the data -- export the "student-step rollup" rather than the whole transaction table. --Koedinger 16:30, 16 September 2009 (EDT)
      • This comment perhaps belongs (exists?) elsewhere: The current student-step rollup export (really, all exports) should be such that I can immediately load it into R (and other packages) without error. Right now errors occur, for instance, because there are "#" characters in the variable names in the student-step rollup. This is currently a roadblock for helping folks like Bob do the analyses they want to do. --Koedinger 16:30, 16 September 2009 (EDT)
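Until the export itself is fixed, one possible workaround is to rewrite the header row before loading the file into a stats package. The Python sketch below is only an illustration: the function name, the choice of "Num" as the replacement for "#", and the sample headers are assumptions, not the actual student-step rollup columns.

```python
import re


def r_safe_names(headers):
    """Rewrite column headers so they are plain identifiers.

    '#' becomes 'Num', any other run of non-alphanumerics becomes '_',
    and repeated names get a numeric suffix.
    """
    seen = {}
    result = []
    for header in headers:
        name = header.replace("#", "Num")
        name = re.sub(r"[^0-9A-Za-z]+", "_", name).strip("_")
        # Identifiers should not be empty or start with a digit.
        if not name or name[0].isdigit():
            name = "X" + name
        count = seen.get(name, 0) + 1
        seen[name] = count
        result.append(name if count == 1 else f"{name}_{count}")
    return result


# Hypothetical headers, for illustration only.
headers = ["Anon Student Id", "# Incorrects", "# Hints", "Step Name", "# Hints"]
print(r_safe_names(headers))
# ['Anon_Student_Id', 'Num_Incorrects', 'Num_Hints', 'Step_Name', 'Num_Hints_2']
```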


Specialize Label of Help Button

  • Since this help is better than in most applications, it should say more than just 'Help'. -- Ken Koedinger & Ryan Baker, Team Mtg, May 30, 2008
    • Ideas:
      • 'Page Help'
      • 'Help with this page'
      • 'Help with Learning Curve page' (Ken's favorite)
      • 'Help with this tool'

Home Page

Redesign the Home Page

  • In the menu of data sets at the top, include the N (=20 or as many as fit on the screen?) data sets that I have visited starting with the ones I've visited most recently. --Koedinger 16:16, 16 September 2009 (EDT)
  • There needs to be a better ordering for the datasets (DS364)
  • Maybe a search to filter the list of datasets since the list is so long. -- Brett Leber, 6/14/2007
  • Going back to the home page always goes to 'My Datasets' (DS313)
  • Maybe show more high level stats on this page, like how many transactions, students, skill models

Last Transaction Time on Home Page

As a researcher collecting data with a tutoring application right now, I want to easily determine, at a glance, if my tutor is sending data to DataShop so that I know this isn't all for naught.

  • Vincent Aleven, CTAT Meeting, 9/17/2008
  • Add last transaction time to home page
  • Note that there would be a delay between when data was logged and when DataShop was aware of it.
  • Alternative: # of transactions in last x hours.



Analyses by LearnLab

  • Organize data by LearnLab, not by "data set". -- Ryan Baker, Aug 2008
  • Also: Bob, Sep 2008; Maxine, Sep 2008
  • Essentially, current data sets become samples, but the top-level unit is the LearnLab. You can take every data set in a LearnLab together as a sample.
  • Implies being able to run analyses across data sets, and export multiple data sets together; to create multi-data set samples
  • As a user of DataShop, I would like to look at learning curves for all Algebra data together (for example), or export all Algebra data
  • Important long-term, but is a lot of work -- in particular, we need to solve scalability issues first.

Save Settings Between Sessions

  • It would be useful if DataShop could save settings between sessions. -- Bob Hausmann, User Meeting, 2/1/2008
    • "I do a lot of redoing the same steps" (e.g., set cutoffs, select a KC model, select students).

Multiple steps per transaction

  • Needed so that we do not have to create multiple transactions for the same actual action for Andes logs. -- Kurt van Lehn, Feb 2007

Demographic data

  • This has been mentioned by NSF visitors, AB, ESL, and some researchers.
  • Also mentioned at Winter Workshop 1/23/2008.
    • Derek/Sue-mei: Student background information not in DataShop. Would like to see a student or set of students from a particular demographic, and view them across datasets!
  • Note that Gail added demographic data to Additional Notes field on the Dataset Info page for many datasets. The idea here is to put that data into the database somewhere.

Single Sign On

  • Michael Bett, email, 10/8/2007
  • It would be nice if the following services have a single login account/password:
    1. Theory Wiki
    2. Learnlab.org
    3. ESL's OSS
    4. DataShop

Reveal unanonymized student IDs

As a researcher/PI performing research assistant tasks, I want to see easily the unanonymized student IDs of subjects in DataShop so that I can email my subjects telling them when to use my system.

  • Ruth Wylie, July 3, 2008
  • As the honest broker of the PSLC data, I have promised to not reveal the student IDs and to protect the identity of the students. This is part of the DataShop IRB. Therefore, I do not see this request as possible. Alida 09:53, 4 September 2009 (EDT)
    • At some point, some coordination would be good with OLI's Digital Dashboard project that Marsha Lovett (and the OLI team and sometimes me) is working on. This is meant to provide usage information more quickly to instructors. It could also perhaps be used by researchers (with the right IRB rights) in situations like Ruth's.

Navigation Bar

Filter KCs by Name

As a researcher working with KCs, I want to filter KCs based on their names, so that I can...

  • Vincent Aleven, Email, 2/3/2008
  • "Since Alida said you cannot have two mechanisms for putting together your KC set (i.e., cannot have both the selecting-by-clicking and selecting-by-filtering), I would probably opt for the latter."
  • Alida: I thought Vincent mentioned that he'd like to select which KCs are in a set by filtering on the name. Example: Include KCs with '*reason*' in the name and exclude KCs with '*given*' in the name.
  • This could be an addition to our v3.0 KC-selection mechanism--filter by name.
  • Vincent, Email, 5/6/2009: Expressed another need for this feature. Could just allow for a wider area and longer list so that more items can be checked at once. The number of characters we show right now is not enough because in many cases that number of characters is the same across many of the skills. Reference data set: Geometry CWCTC 2005-2006
  • Status: Design Started
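The include/exclude example above ('*reason*' in, '*given*' out) can be sketched in Python with shell-style wildcards. This is an assumption about how the filter might behave, not the v3.0 design; the function name, the case-insensitive matching, and the KC names are made up for illustration.

```python
from fnmatch import fnmatchcase


def filter_kcs(kc_names, include=("*",), exclude=()):
    """Keep KCs matching any include pattern and no exclude pattern.

    Matching is case-insensitive; patterns use shell-style wildcards.
    """
    kept = []
    for name in kc_names:
        lowered = name.lower()
        if not any(fnmatchcase(lowered, p.lower()) for p in include):
            continue
        if any(fnmatchcase(lowered, p.lower()) for p in exclude):
            continue
        kept.append(name)
    return kept


# Hypothetical KC names, for illustration only.
kcs = ["triangle-sum-reason", "triangle-sum-given",
       "Vertical-Angles-Reason", "circle-area"]
print(filter_kcs(kcs, include=["*reason*"], exclude=["*given*"]))
# ['triangle-sum-reason', 'Vertical-Angles-Reason']
```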

Facebook-style KC Selection

As a researcher working with KCs, I want to select KCs based on the learning curve thumbnail, so that I can see quickly which ones I'm interested in exploring more deeply.

  • Feature already designed for v3.0, not implemented due to time constraints.
  • Agreed this would be really useful [Kirsten Butcher, User Mtg, 1/31/2008]
  • Status: Guesstimate: 20 days, need to revisit requirements document

Feedback after clicking a large sample on a large dataset

As a user of DataShop (first-time or not), I want some feedback and the ability to cancel after I do something that might take a long time (e.g., clicking "All Data" on a large dataset) so that I do not get stuck.

  • Part of the Susan Goldman story
  • After clicking a sample for a large dataset, there is no "Loading..." text, no feedback that the click was even registered by the app (besides the sometimes busy cursor and small browser "loading" text), nor the ability to cancel the action. 
  • We will always have similar problems even if performance is improved, so providing feedback and the opportunity to cancel is critical.

Save Button in Problem Navigation Box

  • Save buttons in the sidebar. [Ken, Mtg 2006]
    • Could also put one in the Problem selection box in the sidebar.

Make Nav Bar Wider

  • Make the Sample name and description fields much wider. [John LaPlante, email 7/10/2007]

New Visualizations/Reports

Student-KC Rollup

As a researcher, I want to see KCs rolled up by student, so that I can ...

  • Vincent Aleven, User Mtg, 1/29/2008
    • By Student-KC would be more useful than by Student-Problem
    • Example: # Steps asking for a hint or error or what proportion had help
    • How often bottom out hint occurs

Instructor Reports

  • Phil said he received a lot of positive reactions to providing reports on units for instructors. [Phil Pavlik, ET Mtg 10/10/2007]

Manage Authorizations/Projects Page

  • Lisa Anthony, email 10/23/2007
  • Allow PI to change permissions on the datasets.
"Actually, I couldn't see how to change permissions on the datasets from the website. Is this possible? If not, it might be a nice feature..."

Calculate Time Spent on Different Study Activities

As a researcher, I want to know how much time, on average, students spend on study activities, so that I can ...

  • Bruce McLaren, Email, 4/7/2009

For my most recent stoich study, Shawn and I are interested in calculating timing information such as:
(a) how long students spent, on average, working on individual tutors
(b) how long students spent, on average, on all items in an intervention
(c) how long students worked, on average, on post-tests.

Timing information is very commonly required for studies, and can be calculated from DataShop logs relatively easily, so even if we don't have it, might be worth considering. (And we don't want to re-invent the wheel, if you already have it or are planning it...)
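As a rough illustration of the kind of calculation described in Bruce's email, the Python sketch below averages time per activity across students. It assumes each log row can be reduced to a (student, activity, timestamp) tuple and approximates time on an activity as last timestamp minus first timestamp, so breaks within an activity are counted as working time. The function name, the row layout, and the sample data are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def mean_minutes_per_activity(rows):
    """rows: iterable of (student, activity, 'YYYY-MM-DD HH:MM:SS') tuples.

    Time on an activity is approximated per student as last timestamp
    minus first timestamp; the result is the mean over students.
    """
    spans = defaultdict(lambda: [None, None])  # (student, activity) -> [first, last]
    for student, activity, stamp in rows:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        span = spans[(student, activity)]
        span[0] = when if span[0] is None else min(span[0], when)
        span[1] = when if span[1] is None else max(span[1], when)

    per_activity = defaultdict(list)  # activity -> minutes, one entry per student
    for (student, activity), (first, last) in spans.items():
        per_activity[activity].append((last - first).total_seconds() / 60)
    return {a: sum(m) / len(m) for a, m in per_activity.items()}


# Made-up rows, for illustration only.
rows = [
    ("s1", "tutor", "2009-04-07 10:00:00"),
    ("s1", "tutor", "2009-04-07 10:20:00"),
    ("s2", "tutor", "2009-04-07 10:00:00"),
    ("s2", "tutor", "2009-04-07 10:10:00"),
    ("s1", "post-test", "2009-04-07 11:00:00"),
    ("s1", "post-test", "2009-04-07 11:30:00"),
]
print(mean_minutes_per_activity(rows))
# {'tutor': 15.0, 'post-test': 30.0}
```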

Incorrect Step Duration and Hint Step Duration

As a researcher, I want to be able to see total step duration if the student's first attempt was an incorrect attempt, and total step duration if the student's first attempt was a hint request, so that I can do some analyses that I can't do with "Error Step Duration".

  • Bob Hausmann, email, 11/11/2008.
  • Updated title and story with 'step duration' instead of 'time'. --Alida 10:36, 4 September 2009 (EDT)


Dataset Info

Pointers to Hard-copy Data

  • Brett van de Sande, NSF Site Visit, 5/28/2008
  • Pointers to hard-copy data such as paper tests and/or homework.  Include contact information.  It doesn't seem to make sense to scan a whole filing cabinet of paper if no one wants to look at it.  And any secondary researchers don't know about the filing cabinet to ask for it.

Sort Problem Breakdown Table

  • Problem Breakdown Table [Bruce McLaren, User Mtg, 11/5/2007]
    • Ability to sort the table by clicking on column heading

Rename dataset

As a researcher, I want to rename my dataset so that it makes more sense to other people. I also want to make sure the dataset doesn't become polluted later by new data not associated with my study.

  • Ruth Wylie, July 3, 2008
  • There are reasons she would want to do this (current name is worthless, other researchers might try her tutor and pollute her data) but also reasons for not doing it (log more data later).
  • There are risks in changing a dataset name that might not be apparent. For example, if you want the new data in the same dataset. Alida 10:27, 4 September 2009 (EDT)

Error Report

View By Student

  • Marsha Lovett, 10/11/2007
    • Would like to see what a couple of students saw in the feedback


  • I would like the ability to export this data. -- John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
  • Also interested in this feature idea:
    • Bruce McLaren, User Mtg, 11/5/2007


  • Was planned for but not implemented in v2.1 (estimated to be a 4 day task)
    • By Correctness %, starting with the least correct
    • By Hints %
    • Step (or KC if view by KC)
    • Number of Students
  • Ability to sort problems by their average experienced position within the curriculum [Ken, 02/16/2007]
    • Which problem did students most often experience first, then the one experienced second most often, ...
  • Order steps by the order they typically are executed by students. [Ken, email 11/7/2008]
"Searching through the steps in a problem to get a sense of what is going on is currently hard because the steps are ordered alphabetically, not by the order in which most students did them. While not all students do all steps in the same order, there is some regularity there. It would be quite useful if the steps could be ordered in a "typical order". This could be accomplished by using the time stamps (of the first (correct?) transaction?) for each step to determine rank order of each step for each student in a problem and then for each problem average the rank order of each step across all students. Then arrange the steps in the Error Report by their average rank order -- that is, roughly speaking, the step that is most likely to be first across students (closest on average to first) goes first, the step with the next lowest rank goes next, etc."
  • The capability to count the number of errors of each message type and sort in different ways, for instance by all errors that had no messages. [Bruce, email 10/22/2007]
"This is an error analysis I recently did in Excel, using pivot tables, that might be handy if in the DataShop. This one is very important for tutors because the errors that occur most frequently, yet don't elicit messages to the students, are good candidates to become errors with feedback."

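The "typical order" idea from Ken's 11/7/2008 email above can be sketched in Python as follows. This is an illustration under stated assumptions: transactions for one problem are given as (student, step, timestamp) tuples, the first transaction on a step is used (not the first correct one, which the email leaves open), and ties in average rank are broken alphabetically. The function name and sample data are hypothetical.

```python
from collections import defaultdict


def typical_step_order(transactions):
    """transactions: iterable of (student, step, timestamp) for one problem.

    Returns steps sorted by mean per-student rank of first occurrence.
    """
    first_seen = defaultdict(dict)  # student -> {step: earliest timestamp}
    for student, step, stamp in transactions:
        steps = first_seen[student]
        if step not in steps or stamp < steps[step]:
            steps[step] = stamp

    ranks = defaultdict(list)  # step -> list of ranks, one per student
    for steps in first_seen.values():
        ordered = sorted(steps, key=steps.get)
        for rank, step in enumerate(ordered, start=1):
            ranks[step].append(rank)

    mean_rank = {step: sum(r) / len(r) for step, r in ranks.items()}
    # Lowest average rank first; step name breaks ties.
    return sorted(mean_rank, key=lambda step: (mean_rank[step], step))


# Made-up transactions; timestamps can be any comparable values.
tx = [
    ("s1", "A", 1), ("s1", "B", 2), ("s1", "C", 3),
    ("s2", "B", 1), ("s2", "A", 2), ("s2", "C", 3),
    ("s3", "A", 1), ("s3", "C", 2), ("s3", "B", 3),
]
print(typical_step_order(tx))
# ['A', 'B', 'C']
```

Here step A averages rank 1.33, B averages 2.0, and C averages 2.67, so A is listed first even though one student started with B.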

Elapsed Time

  • Include the elapsed time in preview and transaction export. It is more valuable than the transaction time as an absolute reference. Possible to keep both. --Ken Koedinger, Team Mtg 04/18/2008

SQL Format

  • Option to export as an SQL file. -- Ken Koedinger, 03/26/2007, also brought up in June ET Meeting
    • Ability to export an SQL dump of a dataset. --Kyle Cunningham, 04/03/2007

Specify Character for Blanks

  • Ability to specify what character, if any, is used for blanks. --Ryan Baker, email 8/9/2007
"Not all tools handle TABTABTAB correctly on import. The period '.' is used to mean missing data in most stats packages. The word 'BLANK' is used in some other ones." Not an issue for Ryan as he wrote a preprocessor to convert blanks.

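A preprocessor of the kind mentioned in the "Specify Character for Blanks" request above can be very small. The Python sketch below is not Ryan's code; the function name and the sample line are assumptions, and it treats whitespace-only fields as blank.

```python
def fill_blanks(line, placeholder="."):
    """Replace empty tab-separated fields in one line with a placeholder."""
    fields = line.rstrip("\n").split("\t")
    return "\t".join(f if f.strip() else placeholder for f in fields)


# Hypothetical export line with three blank fields.
print(fill_blanks("s1\t\t\tstep-3\t"))
```

Passing placeholder="BLANK" would cover the packages that expect the word instead of a period.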
Learning Curve

Reduce Scrolling

  • Add a forward and back button to the graph to reduce scrolling. [John LaPlante, email 7/10/2007]
"The learning curve page could use a forward and back button to cycle through the learning curves. Going through them one by one requires a lot of scrolling."

Turn On Point Labels

  • It would be nice to have the option to turn on point labels. It is nice that I can mouse over a point and view the data but it would be nicer if it appeared automatically. [John LaPlante, email 7/10/2007]

Option for Bigger Graph

  • Allow user to see bigger graph. [Derek Chan, Winter Workshop 1/23/2008]
    • Potential solution: enable user to set x, y scale manually

Normalize Scale of Thumbnails

  • Allow user to normalize scale of thumbnails. [Bob Hausmann, Winter Workshop 1/23/2008]
"How do individual KCs compare? Which has the lower error rate? Hard to tell with varying scales. Would like to normalize across a scale that shows the set of KCs"
    • I agree! In fact, I might go as far as saying they should all go from 0-100% as the default option and perhaps leave it at that. --Koedinger 16:21, 16 September 2009 (EDT)

Performance Profiler

Rename Performance Profiler

  • John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
    • Did not use this report because he thought it had something to do with improving the performance of DataShop itself. This report might have been much better to use than the Error Report, but it would still need an export, as using the data in tabular form was still necessary. Note that the pivot tables created were added to the dataset (Pittsburgh Science of Learning Center Stoichiometry Study 1).
  • Lisa Anthony, email 8/2008
    • Didn't know to go to the report. Export would be useful.
    • Needed a better definition of Error Rate with respect to Problem and Unit rows.


  • John LaPlante (see comments in Rename Performance Profiler)
  • Lisa Anthony (see comments in Rename Performance Profiler)
  • Yes, I too can imagine wanting to export the results of a particular performance profiler output (i.e., to a table) so that I can graph it my own way. --Koedinger 16:24, 16 September 2009 (EDT)

Table View

  • Add option to switch to a table view.
  • Columns are: Problem Name, Steps, % incorrect, Incorrect Steps, % hint, Hint Steps, etc. Include all values shown in the pop-up.

Union of KCs/Problems/Students

  • Allow user to get the union of KCs/Problems/Students etc. so they can compare across samples more easily. [Kirsten Butcher, Winter Workshop 1/23/2008]
"It is difficult to compare performance profiler graphs across samples because the KCs (or problems, or whatever) aren't necessarily in both of those samples."

Show Details In Report

  • Click on bar to see details in report and not just in pop-up. It disappears too quickly. [Alida, Brett]

Show More Information in the Graph

  • Show more information in the graph: [Bruce M, User Mtg, 11/5/2007]
    • Had drilled down by a certain skill - skill is not listed in the graph, user has to check the skill list on the LHS to see what skill was selected
  • Maybe related to Bruce's "show more info in graph": show the actual value of the range variable. e.g., when range is error rate, show the error rate number somewhere (right now you can see, via mouse-over, the incorrect, hint, and correct percentages, but not the error rate) [Brett]
  • Design idea to show # steps incorrect, # steps hint, # steps correct to clarify how the percentages are calculated. [Alida, Brett]

Add Latency Y-axis Options

  • Add Assistance Time and Correct Step Time to y-axis options. [Ken, Mtg with Connie, 8/13/2008]
  • Revising for DataShop v3.5.7: Add Step Duration, Correct Step Duration and Error Step Duration to the y-axis options. [Alida 9/3/2009]

Sample Selector


  • Sub Samples would be helpful [John LaPlante, email 7/10/2007]
"A nice solution would be to have sub-samples where one property varies. When I'm doing this analysis, I've changed my sample many times, renaming it sometimes, tweaking it to get variations on the data. The samples are really useful but they could help me a lot more with this kind of experimentation."

Filter out students

As a researcher, I want to filter out test users (including myself) from my data so that I see less noise in the data.

  • Ruth Wylie, July 3, 2008
  • You can already filter out test users by using a test user id that starts with 'weirdCMUuser_xxx'. Then create a sample that excludes students with a name like 'Test_%'. --Alida 10:29, 4 September 2009 (EDT)

Web Services

Use Custom Fields in Performance Profiler

  • Use all discrete variables/custom fields on the left and all the continuous variables on the bottom of the Performance Profiler. -- Ken Koedinger, DataShop Team Meeting, Sept 11, 2009

See prioritized items on DataShop Feature Wish List.