Theory Wiki - User contributions [en]

Collected User Requests

2013-08-21T12:41:20Z

Bleber: /* Learning Curve */

See prioritized items on [[DataShop Feature Wish List]].

== Annotations ==
* Some links from Ruogo Kang's (CMU PhD student, Sara Kiesler) recent talk. -- Ken, email, 8/24/2011
** http://vis.stanford.edu/papers/senseus
** http://vis.berkeley.edu/papers/commentspace/

==== Annotations on Transaction Level ====
* I have models which can annotate things like: gaming, bored, etc. on the transaction level. -- Ryan Baker, ET Mtg 12/5/2007

==== Annotations on Student Level ====
* Would annotate on student level. -- Ido Roll, User Meeting, 1/19/2009

==== Annotations on Pages ====
* See the cool thing created by Jeffrey Heer where all the settings of the page were recorded with the comment. -- Ryan Baker, DS Team Mtg 5/23/2008

==== Dataset Discussion - Capture data-integrity issues ====
* As a stakeholder in the DataShop project, I want to capture and publicize the data-integrity issues discovered with data sets so that data is better documented (and so we've fulfilled a promise to our funders to better document data). -- Ken Koedinger, Team Meeting, 8/15/2009
* As a user of DataShop, I want to discuss datasets and have that discussion attached to the dataset so that others can comment and better understand any data-integrity issues I've found.

==== Linking to internal pages ====

* It would be handy if a link could be saved to any dataset, sample, or page in DataShop. -- [[Alida]], 10/18/2007
** Currently, https://pslc-qa.andrew.cmu.edu/datashop/DatasetInfo?datasetId=793 works if you are already logged in.

==== Have a link from the DataShop to the Theory Wiki (Dataset to Project Page) ====

* Can we link from the dataset to the project page on the Theory Wiki? In the pipeline have a clickable link to the project page (make project name clickable). -- Michael Bett, ET Mtg 11/14/2007
** Link to a dataset directly? Is that obvious to users? Click on dataset link -> log in -> redirected back to dataset. -- Brett Leber

== Data Format ==
==== Capture the question prompt and answer choices the student chose from ====
* As a researcher, I want to be able to identify steps based on the question prompt, not the difficult-to-understand step names that come from the selection and action of my tutor, so that I can analyze the data more easily. For multiple-choice questions, I also want to be able to see all of the choices that were available to the student. -- Eli Silk - February 1, 2011 (meeting with Brett and Ross Higashi of the FIRE project)
** Near-term solution is to create a table locally that maps steps to prompts
** Another solution is to ask CTAT team to modify their Flash components so that they log this information as custom fields
** Long-term solution is to make these fields into standard fields in the tutor message format
** In the email thread "Cognitive Tutor Development and Evaluation Requests", there is support for this general idea from Ken Koedinger, Albert Corbett, and Christian Schunn.
** Ken added that "Ideally, we may want to store any images that the student can see and where they reside on the screen ..."

== Data Modeling ==
=== Non-KC Modeling ===

==== Automatic Distillation ====
* As an educational data miner wishing to develop a machine learned model with PSLC data, I would like to be able to automatically distill data features (e.g. custom fields) commonly used in past educational data mining research for a new data set (see, for instance, Baker, Corbett, Roll, & Koedinger, 2008 in UMUAI) -- Ryan Baker, Summer 2008, Startup Memo
** Could be implemented as a plug-in
* Also interested in this feature idea. -- Dan Franklin, Oct 2008

==== Upload model and apply it to new data set ====

* As an EDM researcher, I would like to take a model, expressible as a linear formula on DataShop fields or as a simple code procedure (e.g., Bayesian Knowledge Tracing, which Ryan has code for), and apply it to a new data set, so that I can ... -- Ryan Baker, Sept 2008
* Also interested in this. -- Maxine Eskenazi, Sept 2008
* May work best as a plug-in
** Code to display a GUI for choosing which data sets to use, call the model code, and re-import the results into DataShop
** Good to have a way to apply many models, as soon as you import a data set
* Phil has an idea that maybe fits within this one. Please move if there's a better category. -- Brett Leber
<blockquote>This [''transaction? kc? --ed.''] relabeling is really mostly about enabling modeling in DataShop, right? With this in mind, I think that it is actually a higher priority to have model alternatives in DataShop.... E.g. Investigators should be able to give you chunks of Java code according to a certain specification, and DataShop should be then able to run these over datasets (perhaps after a certain series of QA occurs according to an SOP) when the investigator clicks some button in DataShop.... Obviously this is a much larger project than adding columns, but it is also much more important in my mind. ''-- Phil Pavlik, email to Brett on 1/14/2009''</blockquote>
* Examples:
** Example: running gaming detector in multiple tutors and comparing gaming frequencies
** Example: applying Bayesian Knowledge Tracing to a new data set from the same LearnLab
** Example: applying Ben Shih's models to many data sets. Note that Ben Shih should be included in design of this feature; he is interested, and has a lot of good ideas.

==== Add Different Predicted Values ====
* Would also like to add statistics, and predicted values other than the ones LFA produces. -- Ken Koedinger, ET Meeting, 10/10/2007

==== Bayesian Knowledge Tracing ====
* Bayesian Knowledge-Tracing built into DataShop like LFA is. -- Ryan Baker, Startup Memo, Summer 2008

==== Richer statistics for KC modeling ====

* In addition to the model stats and estimates generated for learning factor models, we should also create difficulty factor models (i.e., ones without a "Slope" parameter). The latter are particularly relevant for the Unique-Step model, where the slope parameter is meaningless (but still counts against the BIC value). -- Ken Koedinger, Email "new feature request", 1/22/2009
** Relatedly, we should report significance values on the Slope parameters -- that is, when is the Slope significantly different from 0.
** The KC models page perhaps should also report the log-likelihood and number of parameters (in addition to BIC) and leave out AIC. We might also consider other metrics of model generality, like the "adjusted R2" (if I have this name right -- Joe Beck mentioned in the Assistments meeting yesterday).
** These changes will be part of meeting the CMDM goal of improvement in (or at least demonstrate acceptability of) the cognitive models in 90+ units in our LearnLab courses (or affiliates).

=== KC Modeling ===

==== LFA/AFM: Check if enough memory using formula ====
* The LFA/AFM code could calculate how much RAM would be needed to run the algorithm on a given skill model using the formula provided by Hao. This formula is based on the number of transactions, number of students and number of skills. Right now it will not schedule itself to run on a model with over 300 skills, though there is a manual override. [[User:Alida|Alida]] 13:35, 29 November 2010 (EST)

==== Create KC Models through Web Services ====
* For John Stamper's CMDM project, it would be nice to automatically update KC Models through web services. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Automatically discovering new KC model ====

* Would it be possible to run some code (perhaps Hao's KC model selection code, perhaps something else generated by the CMDM thrust) to find a new, better KC model? -- Vincent Aleven, Sept 2008
* As a learning sciences researcher, I would like DataShop to discover a new/better KC model for me.
* Could be done as a plug-in

==== Generate new KC Models with LFA ====
* Not sure who asked for this.
* It would be nice to generate new KC Models with Hao's LFA code
* Would need to specify factors.
* Ideas on where this could run?
** On a separate server? Request it to be run, specify duration. Have separate server queue up requests, email user when done.
** In Java Applet on client machine? -- Phil Pavlik

==== Same Skill Twice on Same Step ====
* Would like to be able to apply the same skill to a step twice during a KC Model Import. -- Ken Koedinger, email, 2/4/2009

==== Save KC Model Import Files ====
* KC Model Import - save the file used to create the KCMs in case we need to recreate them. -- Ken Koedinger, email 3/4/2009

==== Log Likelihood and MAD ====
* Log Likelihood, MAD (mean absolute deviation) problem, MAD step (store and show) -- Hao Cen
** This is a variation on "Richer statistics for KC modeling" above. Probably should be merged. - Ken Koedinger
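A minimal Python sketch of the statistics named above, assuming outcomes are 0/1 first-attempt correctness and predictions are a model's probability of a correct first attempt. Reading "MAD problem" and "MAD step" as MAD grouped by problem and by step is an interpretation of the request, not a confirmed definition; the function names are hypothetical.

```python
import math
from collections import defaultdict


def log_likelihood(outcomes, predictions, eps=1e-12):
    """Bernoulli log-likelihood of 0/1 outcomes given predicted P(correct)."""
    total = 0.0
    for y, p in zip(outcomes, predictions):
        p = min(max(p, eps), 1 - eps)  # clamp so log() never sees 0
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def mad(outcomes, predictions):
    """Mean absolute deviation between observed outcomes and predictions."""
    return sum(abs(y - p) for y, p in zip(outcomes, predictions)) / len(outcomes)


def mad_by(keys, outcomes, predictions):
    """MAD computed separately for each key, e.g. each problem or each step."""
    groups = defaultdict(lambda: ([], []))
    for key, y, p in zip(keys, outcomes, predictions):
        groups[key][0].append(y)
        groups[key][1].append(p)
    return {key: mad(ys, ps) for key, (ys, ps) in groups.items()}
```

Storing these per KC model alongside BIC would cover both this request and the "Richer statistics" one.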

==== Better naming for KCs in auto-generated Unique-Step KCM ====

As a researcher, I want the KCs in the Unique-Step model to have better names than KC1, KC2, etc, so that I can easily tell which generated KCs go with which unique step.

* Hui Cheng (Email 1/20/2009), Ken Koedinger (Email 1/22/2009)
* Could you put this in your new feature request list: could the “Unique-step model” be better labeled than just “KC1”, “KC2”, etc.? For example, for Assistments data, you could use part of the “Step Name”. -- Hui Cheng, Email 1/20/2009
* But, anything is better than "KC". -- Ken Koedinger, email, 1/22/2009
<blockquote>A simple alternative, that preserves uniqueness and addresses length, is to concatenate: 1) the first K letters of the step name 2) a unique numerical increment (same as the "3" in "KC3"). Note that (2) guarantees uniqueness just as it does in the current "KC<num>" scheme. Or perhaps better given that step names are often scoped within problems, is to concatenate: 1) the first L letters of the problem name 2) the first M letters of the step name 3) a unique numerical increment (just like the "3" in "KC3") I think K or L+M should be as big as possible without making the KC names indistinguishable (because they run off the right margin) in the KC list on the Learning Curve and other pages. </blockquote>
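The naming scheme in the quote can be sketched in Python as follows. The values of K, L and M (here the hypothetical defaults `k=12`, `l=8`, `m=8`) and the `_` separator are placeholders to tune against the width of the KC list. The trailing index is what keeps names unique: it is the only text after the last separator, so two different indices can never produce the same name.

```python
def unique_step_kc_name(index, step_name, problem_name=None, k=12, l=8, m=8):
    """Readable, unique KC name for the auto-generated Unique-Step model.

    Without a problem name: first k letters of the step name, then the index.
    With a problem name: first l letters of the problem name, first m letters
    of the step name, then the index (the same number as the "3" in "KC3").
    """
    if problem_name is None:
        prefix = step_name[:k]
    else:
        prefix = f"{problem_name[:l]}_{step_name[:m]}"
    return f"{prefix}_{index}"
```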

==== Visualize Learning Curve Split ====
* Be able to visualize a learning curve split into 2 based on a specification of a subset of problems. -- Albert Corbett, Math CCM, November 2008
* Harder: make me a new KCM out of it. -- Ken Koedinger, Team Mtg, Dec 5, 2008, while trying to describe Albert's request to Alida

==== Statistical Significance ====
* Can DataShop determine if the difference between conditions or learning curves is statistically significant? -- general theme at workshop, probably mentioned by Bob Hausmann in his talk, Winter Workshop 1/23/2008
** Can't do it yet in DataShop, but I can show you how to do it in R (or SPSS...) after you've exported the data -- export the "student-step rollup" rather than the whole transaction table. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** This comment perhaps belongs (exists?) elsewhere: the current student-step rollup export (really, all exports) should be such that I can immediately load it into R (and other packages) without error. Right now errors occur, for instance, because there are "#" characters in the variable names in the student-step rollup. This is currently a roadblock for helping folks like Bob do the analyses they want to do. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** See [[Condition in Student-Step Rollup]] -- [[User:Alida|Alida]] 10:35, 17 September 2009 (EDT)
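As one illustration of what such an analysis on an exported student-step rollup might look like, here is a Python permutation test on the difference in mean per-student scores between two conditions (for example, each student's proportion of correct first attempts). This is a swapped-in technique chosen because it needs only the standard library; a t-test or mixed model in R, as Ken suggests, is equally valid. The function name and parameters are hypothetical.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed_difference, p_value). The (count + 1) / (n + 1) form
    keeps the estimated p-value from ever being exactly zero.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    observed = _mean(group_a) - _mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign students to conditions
        diff = _mean(pooled[:n_a]) - _mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_permutations + 1)
```

Each input list holds one value per student, so students (not transactions) are the unit being shuffled between conditions.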

==== Split and Merge Skills ====

* Hand searches through a p-matrix for a dataset to split and merge skills. (Pie in the Sky) -- Ken Koedinger, Team Mtg, 02/22/2008

==== Notes on new KCMs ====

* It would be good if I could add a note to a KC model that was newly imported. -- Noboru Matsuda, email, Nov 19, 2009

==== Display Learning Curves Grouped by Interestingness ====

* The page displaying all the learning curves today seems to be alphabetically sorted by KC name, which is not necessarily meaningful.
* An alternative presentation is to group the curves into 4 sets, breaking up the page. Set 1 has curves that contain significant spikes, and therefore seem to be "low-hanging fruit" for purposes of breaking up into KCs. Set 2 has curves with few spikes, but they have a long X axis, suggesting that students are presented with too many opportunities to acquire those KCs. Set 3 has the "good" curves, i.e., nicely decreasing curves that are not too long. Set 4 is "other". -- [http://www.pitt.edu/~goldin Ilya Goldin] 7 December 2009
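A rough Python sketch of Ilya's four-way grouping, taking each curve as a list of error rates by opportunity. The spike height (`jump`), the length cutoff (`long_x`), and the tolerance for "nicely decreasing" are hypothetical thresholds, not values from the request.

```python
def count_spikes(curve, jump=0.2):
    """Count interior points that sit at least `jump` above both neighbours."""
    return sum(
        1
        for i in range(1, len(curve) - 1)
        if curve[i] - curve[i - 1] >= jump and curve[i] - curve[i + 1] >= jump
    )


def classify_curve(curve, jump=0.2, long_x=20, tolerance=0.05):
    """Return set 1-4 as described above for one KC's error-rate curve.

    All three thresholds are hypothetical and would need tuning.
    """
    if not curve:
        return 4  # nothing to judge
    if count_spikes(curve, jump) >= 1:
        return 1  # spiky: "low-hanging fruit" for splitting into more KCs
    if len(curve) > long_x:
        return 2  # few spikes but a long X axis: possibly too many opportunities
    # "Nicely decreasing": no rise bigger than `tolerance`, and ends lower than it starts.
    if curve[-1] < curve[0] and all(b <= a + tolerance for a, b in zip(curve, curve[1:])):
        return 3
    return 4  # other
```

Sorting the KC list by `classify_curve` would then lay the page out as the four sets, in order.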

==== Order KC models according to AIC====
* This is based on results from multiple dataset analyses that compared AIC, BIC and loglikelihood to cross validation RMSE. -- Mimi McLaughlin, 2/9/2011
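For reference, with log-likelihood ln L, k parameters and n observations, AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L; lower is better for both. A minimal Python sketch of ordering models by AIC, with a hypothetical `KCModelFit` record standing in for whatever DataShop stores per model:

```python
import math
from dataclasses import dataclass


@dataclass
class KCModelFit:
    """Hypothetical summary of one fitted KC model."""
    name: str
    log_likelihood: float
    num_parameters: int
    num_observations: int

    @property
    def aic(self) -> float:
        return 2 * self.num_parameters - 2 * self.log_likelihood

    @property
    def bic(self) -> float:
        return self.num_parameters * math.log(self.num_observations) - 2 * self.log_likelihood


def order_by_aic(models):
    """Best (lowest-AIC) model first."""
    return sorted(models, key=lambda m: m.aic)
```

Because BIC's penalty grows with ln(n), the two orderings can disagree on large datasets, which is why the choice of criterion matters here.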

====Use log of opportunity count for AIC and BIC calculations====
* We compared using the log of opportunity count to whole number opportunity count in multiple datasets. We found the results for log of opportunity count to be consistently better, though small. -- Mimi McLaughlin, 2/9/2011

== Developer Requests ==

==== Store Converted Date and Converter Info ====

As a DataShop administrator, I'd like to see the converter information (version and date) stored in the database, so that I do not have to store that data manually in the Additional Notes field each time I load a dataset. -- Kyle, 8/5/2008

* DTD new fields:
** Store conversion and converter information in database (anything else?)
*** conversion_time
*** converter_info
* Email from Octav, 10/5/2007

I see there's also a note about the converter version in the Dataset
Info. Which is good, but it seems it's taken from the directory name
when I submitted the set. I don't know how reliable that is.  :-) It
would be better if it's taken from the new converter info field.

==== Convert from XML to tab-delimited format ====

* If users agree that the export format is valuable, a converter from XML to the export format would let them see the data in Excel, where they could look at the Selection column and spot blanks more easily. -- Jonathan Sewall, ET Mtg, 10/10/2007
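A minimal Python sketch of such a converter, using `xml.etree.ElementTree`. The element names (`event_descriptor`, `selection`, `action`, `input`) are assumptions modeled loosely on tutor message logs and should be checked against the actual DTD; a real converter would carry many more columns. Missing or empty selections come out as empty cells, which is what makes them easy to spot in Excel.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_tsv(xml_text):
    """Flatten each <event_descriptor> into one Selection/Action/Input row.

    Element names are assumptions; adjust them to the real log format.
    """
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Selection", "Action", "Input"])
    for event in root.iter("event_descriptor"):
        # findtext() gives None for a missing element and "" for an empty one.
        writer.writerow([event.findtext(tag) or "" for tag in ("selection", "action", "input")])
    return out.getvalue()
```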

==== Plug-ins (general issues) ====

* Please please support Ruby on Rails. -- Ben Shih, December 2008

==== Create UI to grant DataShop user roles ====

* Granting roles without a UI is already tedious.
* Alida, User Meeting AAR on December 9, 2009

== Help ==

==== Specialize Label of Help Button ====
* Since this help is better than in most applications, it should say more than just 'Help'. -- Ken Koedinger & Ryan Baker, Team Mtg, May 30, 2008
** Ideas:
*** 'Page Help'
*** 'Help with this page'
*** 'Help with Learning Curve page' (Ken's favorite)
*** 'Help with this tool'

== Home Page ==

==== Redesign the Home Page ====

* In the menu of data sets at the top, include the N (=20 or as many as fit on the screen?) data sets that I have visited starting with the ones I've visited most recently. --[[User:Koedinger|Koedinger]] 16:16, 16 September 2009 (EDT)
** As of today, the menu shows the last 10 data sets, most recently visited at the top. I think the feature is good enough, but let's ask Ken --[[User:Bleber|Bleber]] 10:56, 6 August 2010 (EDT)
* There needs to be a better ordering for the datasets (DS364) -- but search/filtering/sorting would take care of that
* Maybe a search to filter the list of datasets since the list is so long. -- Brett Leber, 6/14/2007
* Going back to the home page always goes to 'My Datasets' (DS313) [fixed May 2012]
* Maybe show more high level stats on this page, like how many transactions [done Jan 2012], students, skill models
** "I am particularly looking for data from courses that contain large numbers of students (e.g., thousands or more). Does the Datashop have any such data? I perused the datasets but couldn't tell from the list how many students each course contains." - Kate Forbes-Riley, email on 8/9/2012
* Allow users to post and share project documentation (files, papers, other meta-info) -- Ruth Wylie, suggested during meeting on 8/4/2010. She had a file that is relevant to multiple datasets in her project.
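The "most recently visited" data set menu described above amounts to a capped most-recently-used list. A Python sketch, assuming data sets are identified by ID; `limit` defaults to 10 to match the current menu and could be raised to the N=20 Ken suggests. The class name is hypothetical.

```python
from collections import OrderedDict


class RecentDatasets:
    """Most-recently-visited datasets, newest first, capped at `limit`."""

    def __init__(self, limit=10):
        self.limit = limit
        self._visits = OrderedDict()  # oldest visit first, newest last

    def visit(self, dataset_id):
        self._visits.pop(dataset_id, None)  # a revisit moves the dataset to the top
        self._visits[dataset_id] = True
        while len(self._visits) > self.limit:
            self._visits.popitem(last=False)  # drop the least recently visited

    def menu(self):
        return list(reversed(self._visits))
```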

==== News Feed ====
* Bruce McLaren, email, 4/20/2012
**I wonder whether it would be possible to have a scrolling "News" feed somewhere on the DataShop site that would keep researchers informed about what is happening with the DataShop. For instance, the "News" feed could always show the last time a data conversion finished, announce an upcoming workshop, or inform everyone of critical DataShop issues, such as the fact that the DataShop had a server go down recently (which I know caused you guys a lot of headaches - but which I didn't hear about until a couple weeks after it happened). This kind of thing could be a great communication tool and, as an added bonus for DataShop personnel, avoid lots of email with questions like "where is my data?" or "when is the next conversion going to finish?" Perhaps it would even be possible to have the data conversion routine automatically update the "News" feed each time it begins and/or finishes processing?
** Take it with a grain of salt -- I know you have lots of things on your plate -- but I have been in the situation often, especially just before and during my studies, where I wasn't sure what was going on with DataShop conversions and issues and had to track down someone -- typically Alida -- to figure things out. I have the advantage of sitting right next door to Alida, but I wonder how many other researchers within the PSLC, those not in close proximity to Alida, deal with this issue of not really knowing what is happening with the DataShop at any given time.
* Jonathan Sewall also requested a page that shows the status of the log conversion process, including how much data was processed and for which datasets

== Import ==
== Miscellaneous ==

==== Analyses by LearnLab ====

* Organize data by LearnLab, not by "data set". -- Ryan Baker, Aug 2008
* Also: Bob Hausmann, Sep 2008; Maxine Eskenazi, Sep 2008
* Essentially, current data sets become samples, but the top-level unit is the LearnLab. You can take every data set in a LearnLab together as a sample.
* Implies being able to run analyses across data sets, and export multiple data sets together; to create multi-data set samples
* As a user of DataShop, I would like to look at learning curves for all Algebra data together (for example), or export all Algebra data
* Important long-term, but is a lot of work -- in particular, we need to solve scalability issues first.

==== Save Settings Between Sessions ====
* It would be useful if DataShop could save settings between sessions. -- Bob Hausmann, User Meeting, 2/1/2008
** "I do a lot of redoing the same steps" (e.g., set cutoffs, select a KC model, select students).

==== Multiple steps per transaction ====

* Needed so that we do not have to create multiple transactions for the same actual action for Andes logs. -- Kurt van Lehn, Feb 2007

==== Demographic data ====

* This has been mentioned by NSF visitors, AB, ESL, and some researchers.
* Also mentioned at Winter Workshop 1/23/2008.
** Derek/Sue-mei: Student background information not in DataShop. Would like to see a student or set of students from a particular demographic, and view them across datasets!
* Note that Gail added demographic data to Additional Notes field on the Dataset Info page for many datasets. The idea here is to put that data into the database somewhere.

==== Single Sign On ====
* Michael Bett, email, 10/8/2007
* It would be nice if the following services have a single login account/password:
*# Theory Wiki
*# Learnlab.org
*# ESL's OSS
*# DataShop

==== Reveal unanonymized student IDs ====

As a researcher/PI performing research assistant tasks, I want to see easily the unanonymized student IDs of subjects in DataShop so that I can email my subjects telling them when to use my system.

* Ruth Wylie, July 3, 2008
* As the honest broker of the PSLC data, I have promised to not reveal the student IDs and to protect the identity of the students. This is part of the DataShop IRB. Therefore, I do not see this request as possible. [[User:Alida|Alida]] 09:53, 4 September 2009 (EDT)
** At some point, some coordination would be good with OLI's Digital Dashboard project that Marsha Lovett (and the OLI team and sometimes me) is working on. This is meant to provide usage information more quickly to instructors. It could also perhaps be used by researchers (with the right IRB rights) in situations like Ruth's.

==== Knight Timeline ====
* Developed by Andrea Knight, 2004

==== Buggy Skills ====
* Ken Koedinger, prototype walkthrough 9/11/2006

==== Confusion Matrix ====
* Brian MacWhinney, prototype walkthrough 9/11/2006

== Navigation Bar ==
==== Filter KCs by Name ====

As a researcher working with KCs, I want to filter KCs based on their names, so that I can...
* Vincent Aleven, Email, 2/3/2008
* "Since Alida said you cannot have two mechanisms for putting together your KC set (i.e., cannot have both the selecting-by-clicking and selecting-by-filtering), I would probably opt for the latter."
* Alida: I thought Vincent mentioned that he'd like to select which KCs are in a set by filtering on the name. Example: Include KCs with '*reason*' in the name and exclude KCs with '*given*' in the name.
* This could be an addition to our v3.0 KC-selection mechanism--filter by name.
* Vincent, Email, 5/6/2009: Expressed another need for this feature. Alternatively, we could just allow a wider area and a longer list so that more items can be checked at once. The number of characters we show right now is not enough, because in many cases those leading characters are the same across many of the skills. Reference data set: Geometry CWCTC 2005-2006

* Status: Design Started
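Vincent's include/exclude example can be sketched with shell-style wildcards from Python's `fnmatch` module. Case-insensitive matching and the function name are assumptions:

```python
from fnmatch import fnmatchcase


def filter_kcs(kc_names, include=("*",), exclude=()):
    """Keep KCs matching any include pattern and none of the exclude patterns.

    Patterns use shell-style wildcards; matching is case-insensitive.
    """
    def matches(name, patterns):
        return any(fnmatchcase(name.lower(), p.lower()) for p in patterns)

    return [n for n in kc_names if matches(n, include) and not matches(n, exclude)]
```

For example, `filter_kcs(names, include=["*reason*"], exclude=["*given*"])` expresses the rule from Alida's note above.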

==== Facebook-style KC Selection ====

As a researcher working with KCs, I want to select KCs based on the learning curve thumbnail, so that I can see quickly which ones I'm interested in exploring more deeply.

* Feature already designed for v3.0, not implemented due to time constraints.
* Agreed this would be really useful. -- Kirsten Butcher, User Mtg, 1/31/2008
* Status: Guesstimate: 20 days, need to revisit requirements document

==== Feedback after clicking a large sample on a large dataset ====

As a user of DataShop (first-time or not), I want some feedback and the ability to cancel after I do something that might take a long time (e.g., clicking "All Data" on a large dataset) so that I do not get stuck.

* Part of the Susan Goldman story
* After clicking a sample for a large dataset, there is no "Loading..." text, no feedback that the click was even registered by the app (besides the sometimes busy cursor and small browser "loading" text), nor the ability to cancel the action. 
* We will always have similar problems even if performance is improved, so providing feedback and the opportunity to cancel is critical.

==== Save Button in Problem Navigation Box ====

* Save buttons in the sidebar. -- Ken Koedinger, Mtg 2006
** Could also put one in the Problem selection box in the sidebar.

==== Make Nav Bar Wider ====
* Make the Sample name and description fields much wider. -- John LaPlante, email 7/10/2007

== New Visualizations/Reports ==

==== Student-KC Rollup ====
As a researcher, I want to see KCs rolled up by student, so that I can ...

* Vincent Aleven, User Mtg, 1/29/2008
** By Student-KC would be more useful than by Student-Problem
** Example: number of steps where the student asked for a hint or made an error, or the proportion of steps that involved help
** How often bottom out hint occurs
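A Python sketch of a student-KC rollup built from student-step rows. The field names (`student`, `kc`, `first_attempt`, `bottom_out_hints`) are hypothetical, and each row is assumed to carry a single KC:

```python
from collections import defaultdict


def student_kc_rollup(step_rows):
    """One summary per (student, KC) from student-step rows."""
    totals = defaultdict(lambda: {"steps": 0, "hint_or_error_steps": 0, "bottom_out_hints": 0})
    for row in step_rows:
        summary = totals[(row["student"], row["kc"])]
        summary["steps"] += 1
        if row["first_attempt"] in ("hint", "incorrect"):
            summary["hint_or_error_steps"] += 1
        summary["bottom_out_hints"] += row["bottom_out_hints"]
    # Add the proportion of steps where the student needed help or erred.
    return {
        key: {**s, "hint_or_error_rate": s["hint_or_error_steps"] / s["steps"]}
        for key, s in totals.items()
    }
```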

==== Instructor Reports ====
* Phil said he received a lot of positive reactions to providing reports on units for instructors. -- Phil Pavlik, ET Mtg 10/10/2007

==== Manage Authorizations/Projects Page ====
* Lisa Anthony, email 10/23/2007
* Allow PI to change permissions on the datasets.
:"Actually, I couldn't see how to change permissions on the datasets from the website. Is this possible? If not, it might be a nice feature..."

==== Calculate Time Spent on Different Study Activities ====

As a researcher, I want to know how much time, on average, students spend on study activities, so that I can ...

* Bruce McLaren, Email, 4/7/2009
<blockquote>
For my most recent stoich study, Shawn and I are interested in calculating timing information such as: (a) how long students spent, on average, working on individual tutors (b) how long students spent, on average, on all items in an intervention (c) how long students worked, on average, on post-tests. 

Timing information is very commonly required for studies, and can be calculated from DataShop logs relatively easily, so even if we don't have it, might be worth considering. (And we don't want to re-invent the wheel, if you already have it or are planning it...)
</blockquote>
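One simple way to get (a) to (c), sketched in Python: treat the time a student spent in a unit (tutor, intervention, or post-test) as the span between that student's first and last transaction there, then average over students. This assumes one continuous session per student per unit and ignores idle gaps, so it overstates time when students leave and come back. Unit names in the example are made up.

```python
from collections import defaultdict
from datetime import datetime


def average_time_per_unit(transactions):
    """Mean seconds per student in each unit, from (student, unit, timestamp) tuples.

    A student's time in a unit is the span from their first to last transaction there.
    """
    stamps = defaultdict(list)
    for student, unit, timestamp in transactions:
        stamps[(student, unit)].append(timestamp)
    spans = defaultdict(list)
    for (student, unit), times in stamps.items():
        spans[unit].append((max(times) - min(times)).total_seconds())
    return {unit: sum(s) / len(s) for unit, s in spans.items()}


example = [
    ("s1", "tutor-A", datetime(2009, 4, 7, 10, 0)),
    ("s1", "tutor-A", datetime(2009, 4, 7, 10, 10)),
    ("s2", "tutor-A", datetime(2009, 4, 7, 9, 0)),
    ("s2", "tutor-A", datetime(2009, 4, 7, 9, 20)),
    ("s1", "post-test", datetime(2009, 4, 7, 11, 0)),
    ("s1", "post-test", datetime(2009, 4, 7, 11, 5)),
]
print(average_time_per_unit(example))  # {'tutor-A': 900.0, 'post-test': 300.0}
```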

==== Incorrect Step Duration and Hint Step Duration ====

As a researcher, I want to be able to see total step duration if the student's first attempt was an incorrect attempt, and total step duration if the student's first attempt was a hint request, so that I can do some analyses that I can't do with "Error Step Duration".

* Bob Hausmann, email, 11/11/2008.
* Updated title and story with 'step duration' instead of 'time'. -- [[User:Alida|Alida]] 10:36, 4 September 2009 (EDT)
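A minimal Python sketch of the two requested columns, assuming each student-step row has a `first_attempt` value (`correct`, `incorrect`, or `hint`) and a `step_duration` in seconds, and assuming "Error Step Duration" lumps incorrect attempts and hint requests together. Field names are hypothetical.

```python
def add_error_type_durations(step_rows):
    """Return copies of student-step rows with two extra duration columns.

    Each new column holds the step duration when the first attempt was of that
    kind, and None (a blank cell in an export) otherwise.
    """
    result = []
    for row in step_rows:
        first, duration = row["first_attempt"], row["step_duration"]
        result.append({
            **row,
            "incorrect_step_duration": duration if first == "incorrect" else None,
            "hint_step_duration": duration if first == "hint" else None,
        })
    return result
```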

==== Grading ====

[[Grading]]

====Display number of steps and number of observations for skills====
* How and where to be determined by developers. -- Ken (entered by Mimi)

== Reports ==
=== Dataset Info ===

==== Pointers to Hard-copy Data ====
* Brett van de Sande, NSF Site Visit, 5/28/2008
* Pointers to hard-copy data such as paper tests and/or homework.  Include contact information.  It doesn't seem to make sense to scan a whole filing cabinet of paper if no one wants to look at it.  And any secondary researchers don't know about the filing cabinet to ask for it.

==== Sort Problem Breakdown Table ====

* Would like the ability to sort the table by clicking on the column headings of the Problem Breakdown Table on the Dataset Info Tab. -- Bruce McLaren, User Mtg, 11/5/2007

==== Rename dataset ====

As a researcher, I want to rename my dataset so that it makes more sense to other people. I also want to make sure the dataset doesn't become polluted later by new data not associated with my study.
* Ruth Wylie, July 3, 2008
* There are reasons she would want to do this (current name is worthless, other researchers might try her tutor and pollute her data) but also reasons for not doing it (log more data later).
* There are risks in changing a dataset name that might not be apparent, for example if you later want new data added to the same dataset. [[User:Alida|Alida]] 10:27, 4 September 2009 (EDT)

==== Average time per problem, average number of problems, total number of sections ====

In addition to showing student hours per dataset, it would be useful to know the average time spent per problem, average number of problems completed, and the total number of sections.
* Noboru Matsuda, June 18, 2010

=== Error Report ===

==== View By Student ====

* Would like to see what a couple of students saw in the feedback. -- Marsha Lovett, 10/11/2007

==== Export ====
* I would like the ability to export this data. -- John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
* Also interested in this feature idea. -- Bruce McLaren, User Mtg, 11/5/2007

==== Sort ====

* Was planned for but not implemented in v2.1 (estimated to be a 4 day task)
** By Correctness %, starting with the least correct
** By Hints %
** Step (or KC if view by KC)
** Number of Students
* Ability to sort problems by their average experienced position within the curriculum. -- Ken Koedinger, 02/16/2007
** Which problem did students most often experience first, then the one experienced second most often, ...
* Order steps by the order they typically are executed by students. -- Ken Koedinger, email 11/7/2008
:"Searching through the steps in a problem to get a sense of what is going on is currently hard because the steps are ordered alphabetically, not by the order in which most students did them. While not all students do all steps in the same order, there is some regularity there. It would be quite useful if the steps could be ordered in a "typical order". This could be accomplished by using the time stamps (of the first (correct?) transaction?) for each step to determine the rank order of each step for each student in a problem and then for each problem average the rank order of each step across all students. Then arrange the steps in the Error Report by their average rank order -- that is, roughly speaking, the step that is most likely to be first across students (closest on average to first) goes first, the step with the next lowest rank goes next, etc."
* The capability to count the number of errors of each message type and sort in different ways, for instance by all errors that had no messages. -- Bruce McLaren, email 10/22/2007
:"This is an error analysis I recently did in Excel, using pivot tables, that might be handy if in the DataShop. This one is very important for tutors because the errors that occur most frequently, yet don't elicit messages to the students, are good candidates to become errors with feedback."
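Ken's "typical order" procedure can be sketched in Python as follows, using each step's earliest transaction timestamp (his question of whether to use only the first correct transaction is left open). Input is assumed to be `(student, problem, step, timestamp)` tuples; steps a student never did get no rank from that student. The function name is hypothetical.

```python
from collections import defaultdict


def typical_step_order(transactions):
    """Order each problem's steps by their average per-student rank."""
    first_seen = {}  # (student, problem, step) -> earliest timestamp
    for student, problem, step, ts in transactions:
        key = (student, problem, step)
        if key not in first_seen or ts < first_seen[key]:
            first_seen[key] = ts
    # Rank each student's steps within a problem by first timestamp (1 = first).
    per_student = defaultdict(list)
    for (student, problem, step), ts in first_seen.items():
        per_student[(student, problem)].append((ts, step))
    ranks = defaultdict(list)  # (problem, step) -> ranks across students
    for (student, problem), items in per_student.items():
        for rank, (ts, step) in enumerate(sorted(items), start=1):
            ranks[(problem, step)].append(rank)
    # Average the ranks and sort each problem's steps by that average.
    averaged = defaultdict(list)
    for (problem, step), rs in ranks.items():
        averaged[problem].append((sum(rs) / len(rs), step))
    return {problem: [step for _, step in sorted(items)] for problem, items in averaged.items()}
```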

==== Show more than 500 problems ====

In the error report, can we see more than 500 problems? See the dataset "Cog Model Discovery Experiment Spring 2010", where the report shows "2/2888 selected. (Showing the first 500)". -- Ken via email on 2/15/2011

=== Export ===

==== Include Step Start Time in transaction format ====

* The transaction-level export already includes problem start time; could it also include step start time? I can easily compute it myself, but it seems there's a specific algorithm that the student-step rollup uses, and it might be nice to include the same value here. --Ilya Goldin, email on 01/16/2012

==== Last attempt on step? column for transaction format ====

* Include a new column that shows whether the row is the last attempt on a step for a student or not. Could be 0 or 1 as value. Helpful for researchers who are doing grading of data. Transaction format. --Vincent Aleven, CTAT mtg 11/5/2010
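A Python sketch of the proposed 0/1 column, assuming transactions are already in time order and that a step is identified by (student, problem, step); repeated visits to the same problem would need the problem view in the key as well. Field names are hypothetical.

```python
def flag_last_attempts(transactions):
    """Copy rows, adding last_attempt_on_step = 1 on each step's final row, else 0."""
    last_index = {}
    for i, row in enumerate(transactions):
        # Later rows overwrite earlier ones, leaving the index of the final attempt.
        last_index[(row["student"], row["problem"], row["step"])] = i
    final = set(last_index.values())
    return [{**row, "last_attempt_on_step": 1 if i in final else 0}
            for i, row in enumerate(transactions)]
```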

==== Elapsed Time ====

* Include the elapsed time in the preview and transaction export. It is more valuable than the absolute transaction time. It is possible to keep both. --Ken Koedinger, Team Mtg 04/18/2008

==== SQL Format ====

* Option to export as an SQL file. -- Ken Koedinger, 03/26/2007, also brought up in June ET Meeting
** Ability to export an SQL dump of a dataset. --Kyle Cunningham, 04/03/2007
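One way to produce such a file, sketched with Python's built-in `sqlite3` module: load the exported rows into an in-memory table and emit `Connection.iterdump()`. The output is SQLite-flavored SQL, so loading it into another database may need adjustments; the function name, table name, and all-TEXT column types are assumptions.

```python
import sqlite3


def rows_to_sql_dump(columns, rows, table="transactions"):
    """Return an SQL script (SQLite dialect) that recreates `rows` as a table.

    Column names must not contain double quotes; every column is stored as TEXT.
    """
    conn = sqlite3.connect(":memory:")
    try:
        column_defs = ", ".join(f'"{c}" TEXT' for c in columns)
        conn.execute(f'CREATE TABLE "{table}" ({column_defs})')
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
        conn.commit()
        return "\n".join(conn.iterdump())  # consume the generator before closing
    finally:
        conn.close()
```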

==== Specify Character for Blanks ====

* Ability to specify what character if any is used for blank. --Ryan Baker, email 8/9/2007
:"Not all tools handle TABTABTAB correctly on import. The period '.' is used to mean missing data in most stats packages. The word 'BLANK' is used in some other ones." Not an issue for Ryan, as he wrote a preprocessor to convert blanks.
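A preprocessor like the one Ryan describes can be quite small. This Python sketch assumes plain tab-separated text with no quoting, and replaces each empty or whitespace-only cell with a chosen marker (`.` by default); the function name is hypothetical.

```python
def fill_blanks(tsv_text, blank="."):
    """Replace empty or whitespace-only cells in tab-separated text with `blank`."""
    out_lines = []
    for line in tsv_text.splitlines():
        if not line:
            out_lines.append(line)  # leave genuinely empty lines alone
            continue
        cells = line.split("\t")
        out_lines.append("\t".join(cell if cell.strip() else blank for cell in cells))
    return "\n".join(out_lines) + "\n"
```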

==== Opportunity (at KC) Count in Transaction Export ====
* Include the opportunity count in the transaction export (it's only in the student-step rollup) -- Noboru Matsuda, 10/08/2009
** Would be repetitive
** Would need one Opportunity column per KC

==== Export only rows that have KCs tagged ====
* Is it possible, when exporting a dataset, to include only the [transaction?] rows that have knowledge components tagged? And likewise for the KC models export, to include only the items that have KCs tagged? -- Hui Cheng, 01/19/2010
** We have the inverse of this option on the Performance Profiler, 'Include steps without a knowledge component', and with the Student-Problem export, 'Include Steps without KCs'.

==== Include Custom Fields in Student-Step Rollup ====
* Is it possible to include the custom field in the datashop [step] export? -- Hui Cheng, 03/01/2010
** "I am thinking about the student-step level export. We have a student from Statistics wanting to use assistment data from DataShop. She wants to have problem set id (=curriculum id in DataShop). For Assistment data, we decided to put problem set id in the custom field because in the Assistment a lot of problems are repeated in different problem sets. She needs data at the student-step level." (Hui email)
* Vote from Phil Pavlik too -- see [[Collected_User_Requests#Include_transaction_custom_fields_in_web_services_student-step_export|Phil's comment under Web Services]]

==== Don't duplicate rows in student-step format when not showing KCs ====
* If the checkbox to show knowledge components is not checked, maybe it doesn't make sense to show rows more than once if they have more than one KC associated with the step. (Mimi (and Brett) stumbled on this. 8/16/2010)
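One way to avoid the duplicates is to drop the KC-specific columns and then keep the first row of each remaining combination. A minimal Python sketch; it assumes rows for the same step are identical apart from their KC columns (so any per-KC column, such as an opportunity count, must be listed in <code>kc_columns</code>). Column names are hypothetical.

```python
def collapse_kc_rows(rows, kc_columns):
    """Drop KC-specific columns and keep the first row of each remaining duplicate set."""
    seen = set()
    collapsed = []
    for row in rows:
        reduced = {k: v for k, v in row.items() if k not in kc_columns}
        # Column order is irrelevant, so sort the items into a stable, hashable key.
        key = tuple(sorted(reduced.items()))
        if key not in seen:
            seen.add(key)
            collapsed.append(reduced)
    return collapsed


rows = [
    {"student": "s1", "step": "a", "first_attempt": "correct", "kc": "add"},
    {"student": "s1", "step": "a", "first_attempt": "correct", "kc": "carry"},
    {"student": "s1", "step": "b", "first_attempt": "hint", "kc": "add"},
]
print(collapse_kc_rows(rows, {"kc"}))
```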

==== Student-Step Rollup include Success Column ====
* Add a column to the step rollup called Success: 1 if correct, 0 if incorrect or hint, blank otherwise. ~~ Ken Koedinger, DataShop Team Meeting, Oct 22, 2010
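The proposed column is a simple mapping. This Python sketch assumes it is derived from the rollup's first-attempt outcome with the lowercase values "correct", "incorrect" and "hint" (an assumption about the input); <code>None</code> stands for a blank cell.

```python
def success(first_attempt):
    """Proposed Success value: 1 if correct, 0 if incorrect or hint, blank otherwise."""
    if first_attempt == "correct":
        return 1
    if first_attempt in ("incorrect", "hint"):
        return 0
    return None  # written as an empty cell in the export


steps = [{"first_attempt": v} for v in ("correct", "incorrect", "hint", "")]
for step in steps:
    step["Success"] = success(step["first_attempt"])
print([s["Success"] for s in steps])  # [1, 0, 0, None]
```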

=== Learning Curve ===

==== Default sort by observation on LCPID ====

* Default sort by observation on LCPID. ~~ Ken Koedinger, DataShop Team Meeting, 10/22/2010

==== Purple Point ====
* Purple Point: if a point on the LC has more than one KC associated with it but you have drilled down to a given KC, then the blue line is off. We could put a purple point that takes this into account.
* Simpler thing: display a warning message that some points in the display are driven by other KCs
* Pearson may be interested
* This was mentioned during the PSLC Summer School 2010.
* For a step with multiple skills, attribute the error only to the skill with the highest overall error rate. ~~ Alida, meeting with Ken, November 18, 2010
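The last suggestion (charging a multi-skill error only to the skill with the highest overall error rate) can be sketched as follows. This is a minimal Python illustration, not an existing DataShop option; it treats any first attempt other than "correct" as an error, breaks ties by KC order, and uses hypothetical field names.

```python
from collections import defaultdict


def overall_error_rates(steps):
    """Error rate per KC across every step tagged with it."""
    errors, totals = defaultdict(int), defaultdict(int)
    for s in steps:
        for kc in s["kcs"]:
            totals[kc] += 1
            errors[kc] += s["first_attempt"] != "correct"
    return {kc: errors[kc] / totals[kc] for kc in totals}


def attribute_errors(steps):
    """Yield (kc, counted_as_error) pairs for learning-curve points.

    An error on a multi-KC step is charged only to the KC with the highest
    overall error rate; the other KCs on that step record it as a non-error.
    """
    rates = overall_error_rates(steps)
    for s in steps:
        is_error = s["first_attempt"] != "correct"
        blamed = max(s["kcs"], key=lambda kc: rates[kc])
        for kc in s["kcs"]:
            yield kc, is_error and kc == blamed


steps = [
    {"kcs": ["a"], "first_attempt": "incorrect"},
    {"kcs": ["b"], "first_attempt": "correct"},
    {"kcs": ["a", "b"], "first_attempt": "hint"},
]
print(list(attribute_errors(steps)))
```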

==== Reduce Scrolling ====
* Add a forward and back button to the graph to reduce scrolling. -- John LaPlante, email 7/10/2007

:"The learning curve page could use a forward and back button to cycle through the learning curves. Going through them one by one requires a lot of scrolling."

==== Turn On Point Labels ====

* It would be nice to have the option to turn on point labels. It is nice that I can mouse over a point and view the data but it would be nicer if it appeared automatically. -- John LaPlante, email 7/10/2007

==== Option for Bigger Graph ====

* Allow user to see bigger graph. -- Derek Chan, Winter Workshop 1/23/2008
** Potential solution: enable user to set x, y scale manually

==== LC Normalize Scale of Thumbnails ====
* [[LC Normalize Scale of Thumbnails]]

==== Export Learning Curves ====

* Is it possible to export the learning curves themselves? I could do so simply by copying the point values one-by-one into Excel, which would be very doable, but it would be easier if you could export the curves directly. (Not super important - I was thinking of making a single chart with multiple metrics although that may not pan out anyway because of different y axes needed.) -- Vincent Aleven, email 8/20/2013
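As a rough illustration of what such an export could contain, this Python sketch computes the points of one KC's error-rate learning curve from student-step rows and writes them as CSV, which Excel opens directly. It assumes each row already carries its opportunity number and first-attempt outcome (hypothetical field names) and counts hints as errors; it is not DataShop's own curve code.

```python
import csv
import io
from collections import defaultdict


def learning_curve_points(steps):
    """(opportunity, error rate, observations) for one KC, in opportunity order."""
    errors, totals = defaultdict(int), defaultdict(int)
    for s in steps:
        totals[s["opportunity"]] += 1
        errors[s["opportunity"]] += s["first_attempt"] != "correct"
    return [(opp, errors[opp] / totals[opp], totals[opp]) for opp in sorted(totals)]


def curve_to_csv(points):
    """Serialize curve points with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["opportunity", "error_rate", "observations"])
    writer.writerows(points)
    return out.getvalue()


steps = [
    {"opportunity": 1, "first_attempt": "incorrect"},
    {"opportunity": 1, "first_attempt": "correct"},
    {"opportunity": 2, "first_attempt": "correct"},
]
print(curve_to_csv(learning_curve_points(steps)))
```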

=== Performance Profiler ===

==== Rename Performance Profiler ====
* John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
** Did not use this report, thinking it had something to do with improving the performance of DataShop itself. This report might have been much better to use than the Error Report, but it would still need an export, since working with the data in tabular form was necessary. Note that the pivot tables created were added to the dataset (Pittsburgh Science of Learning Center Stoichiometry Study 1).
* Lisa Anthony, email 8/2008
** Didn't know to go to the report. Export would be useful.
** Needed a better definition of Error Rate with respect to Problem and Unit rows.

==== Export ====
* John LaPlante (see comments in Rename Performance Profiler)
* Lisa Anthony (see comments in Rename Performance Profiler)
* Yes, I too can imagine wanting to export the results of a particular performance profiler output (i.e., to a table) so that I can graph it my own way. --[[User:Koedinger|Koedinger]] 16:24, 16 September 2009 (EDT)

==== Table View ====
* Add option to switch to a table view.
* Columns: Problem Name, Steps, % incorrect, Incorrect Steps, % hint, Hint Steps, etc.; include all values shown in the pop-up.
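A table view amounts to a group-by over step rows. The Python sketch below produces the columns listed above, one row per problem, assuming rows with a problem name and a lowercase first-attempt outcome (hypothetical field names). If error rate is taken to mean first attempts that were incorrect or hints (our reading, not a confirmed definition), it is simply % incorrect plus % hint.

```python
from collections import Counter, defaultdict


def profiler_table(steps):
    """One row per problem: step count plus incorrect/hint counts and percentages."""
    by_problem = defaultdict(Counter)
    for s in steps:
        counts = by_problem[s["problem"]]
        counts["steps"] += 1
        counts[s["first_attempt"]] += 1  # missing outcomes default to 0
    table = []
    for problem in sorted(by_problem):
        c = by_problem[problem]
        n = c["steps"]
        table.append({
            "Problem Name": problem,
            "Steps": n,
            "% incorrect": 100 * c["incorrect"] / n,
            "Incorrect Steps": c["incorrect"],
            "% hint": 100 * c["hint"] / n,
            "Hint Steps": c["hint"],
        })
    return table


steps = [{"problem": "p1", "first_attempt": v} for v in ("incorrect", "hint", "correct", "correct")]
print(profiler_table(steps))
```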

==== Union of KCs/Problems/Students ====
* Allow user to get the union of KCs/Problems/Students, etc., so they can compare across samples more easily. -- Kirsten Butcher, Winter Workshop 1/23/2008
:"It is difficult to compare performance profiler graphs across samples because the KCs (or problems, or whatever) aren't necessarily in both of those samples."
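Aligning two samples on the union of their KCs (or problems, or students) is straightforward. This Python sketch assumes each sample has been summarized as a mapping from KC name to a value such as error rate, and marks entries missing from a sample with <code>None</code> so both graphs can share one axis. The names are hypothetical.

```python
def align_samples(sample_a, sample_b, missing=None):
    """Put two samples' per-KC values on the sorted union of their KCs."""
    kcs = sorted(set(sample_a) | set(sample_b))
    return [(kc, sample_a.get(kc, missing), sample_b.get(kc, missing)) for kc in kcs]


print(align_samples({"add": 0.2, "carry": 0.4}, {"add": 0.1, "borrow": 0.3}))
```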

==== Show Details In Report ====
* Click on a bar to see details in the report, not just in the pop-up, which disappears too quickly. -- Alida, Brett

==== Show More Information in the Graph ====
* Show more information in the graph: [Bruce M, User Mtg, 11/5/2007]
** Had drilled down by a certain skill - skill is not listed in the graph, user has to check the skill list on the LHS to see what skill was selected
* Maybe related to Bruce's "show more info in graph": show the actual value of the range variable. e.g., when range is error rate, show the error rate number somewhere (right now you can see, via mouse-over, the incorrect, hint, and correct percentages, but not the error rate) -- Brett
* Design idea to show # steps incorrect, # steps hint, # steps correct to clarify how the percentages are calculated. -- Alida, Brett

== Sample Selector ==

==== Sub-Samples ====

* Sub-samples would be helpful. -- John LaPlante, email 7/10/2007
:"A nice solution would be to have sub-samples where one property varies. When I'm doing this analysis, I've changed my sample many times, renaming it sometimes, tweaking it to get variations on the data. The samples are really useful but they could help me a lot more with this kind of experimentation."

==== Filter out students ====

As a researcher, I want to filter out test users (including myself) from my data so that I see less noise in the data.

* Ruth Wylie, July 3, 2008
* You can already filter out test users by using a test user id that starts with 'weirdCMUuser_xxx'. Then create a sample that excludes students with a name like 'Test_%'. --[[User:Alida|Alida]] 10:29, 4 September 2009 (EDT)

==== Filter by "Class" ====

As a researcher, I want to create samples based on "class" since class is how I've encoded my conditions.

* Maaike Waalkens, July 30, 2010
* This is what made sense using Mathtutor for tutor delivery.
* What other fields are we missing in the Sample Selector?

==== Filter by "Step" ====

As a researcher, I want to create samples based on "step" since I'm only interested in one particular step.

* Mimi McLaughlin, August 17, 2010

==== Filter by "Problem View" ====

As a researcher, I want to exclude transactions from my sample where problem view is greater than 1.

* Michael Yudelson and Summer School participant, August 8, 2012

==== Create Sample Automatically ====

* Would it be possible for me to get a random sample from the 'Bridge to Algebra 2006-2007' dataset of 100 students? I am having trouble looking at the data because it takes too long to load, and my adviser thought that was because the dataset was too large. ~~ DataShop User, 10/19/2010
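A random subset should sample students rather than rows, so that each selected student's history stays complete. A minimal Python sketch over exported transaction rows (the <code>student</code> field name is hypothetical); passing a fixed <code>seed</code> makes the sample reproducible.

```python
import random


def sample_students(transactions, n, seed=None):
    """Keep every transaction from n randomly chosen students."""
    students = sorted({t["student"] for t in transactions})
    chosen = set(random.Random(seed).sample(students, min(n, len(students))))
    return [t for t in transactions if t["student"] in chosen]


tx = [{"student": f"s{i % 10}", "row": i} for i in range(50)]
subset = sample_students(tx, 3, seed=1)
print(sorted({t["student"] for t in subset}), len(subset))
```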

== Web Services ==

==== Include transaction custom fields in web services student-step export ====

* "I noticed in the web services guide that cfs (which provides the custom fields) is not yet implemented for step roll-up tables. It is marked [coming soon] in the manual... I could write a workaround to pull in the transactions and lookup the custom fields, but I'd really rather not." -- Phil Pavlik, 4/27/2011

* "It would be best if the CF for step rollup was drawn from the primary transaction for the first attempt that is correct or incorrect. At least that seems like the generally best value. Basically, there may be many cases where the custom field is the same across all transactions for a step. In this case ... you could just use the first one since they are all the same." -- Phil Pavlik, 4/29/2011
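Phil's suggested rule can be sketched as below. This is a Python illustration, not DataShop's implementation: it groups transactions by student, problem and step, takes the custom field from the first transaction whose outcome is CORRECT or INCORRECT, and (an added assumption) falls back to the step's first transaction when there is none, for instance when the student only requested hints. Outcome spellings and field names are assumed.

```python
def step_custom_field(transactions, field):
    """Choose one custom-field value per (student, problem, step)."""
    by_step = {}
    for tx in sorted(transactions, key=lambda t: t["time"]):
        by_step.setdefault((tx["student"], tx["problem"], tx["step"]), []).append(tx)
    values = {}
    for key, txs in by_step.items():
        # First correct/incorrect attempt wins; otherwise use the first transaction.
        chosen = next((t for t in txs if t["outcome"] in ("CORRECT", "INCORRECT")), txs[0])
        values[key] = chosen["custom_fields"].get(field)
    return values


tx = [
    {"time": 1, "student": "s1", "problem": "p", "step": "a", "outcome": "HINT", "custom_fields": {"set": "A"}},
    {"time": 2, "student": "s1", "problem": "p", "step": "a", "outcome": "INCORRECT", "custom_fields": {"set": "B"}},
    {"time": 3, "student": "s1", "problem": "p", "step": "b", "outcome": "HINT", "custom_fields": {"set": "C"}},
]
print(step_custom_field(tx, "set"))
```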

==== Use Custom Fields Graphs/Reports ====
In Graphs

As a researcher creating custom fields and assigning values at the transaction level, I want DataShop to perform the aggregation to the step level so that I can do other things with my custom-field variable such as graph it. -- Ryan Baker, mtg w/Alida & Brett, 12/15/2008

Performance Profiler
* Use all discrete variables/custom fields on the left and all the continuous variables on the bottom of the Performance Profiler. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Ad-hoc queries ====

* As the next web service feature (after CFs), allow limited filtering of steps and transactions on whatever users can filter on in the navigation boxes. (User Meeting AAR, December 9, 2009)

==== Sample creation as a web service ====

* Sample creation is still too slow. (User Meeting AAR, December 9, 2009)

==== Ability to directly query the database ====

* I am looking for a unique identifier for student->problem-step->problem-view. I am able to compute this in R, but it would be better to be able to query the DataShop database to get this unique identifier instead of having to recreate DataShop (essentially) in R. -- Ilya Goldin, 5/21/2012
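Until direct queries are possible, such an identifier can be derived from exported rows. A minimal Python sketch (field names hypothetical) that assigns a dense integer to each distinct student, problem, step and problem-view combination, in order of first appearance:

```python
def add_step_view_ids(rows):
    """Give each distinct (student, problem, step, problem view) a dense integer ID."""
    ids = {}
    for row in rows:
        key = (row["student"], row["problem"], row["step"], row["problem_view"])
        # setdefault assigns the next number only the first time a key appears.
        row["step_view_id"] = ids.setdefault(key, len(ids) + 1)
    return rows


rows = [
    {"student": "s1", "problem": "p", "step": "a", "problem_view": 1},
    {"student": "s1", "problem": "p", "step": "a", "problem_view": 1},
    {"student": "s1", "problem": "p", "step": "a", "problem_view": 2},
]
print([r["step_view_id"] for r in add_step_view_ids(rows)])
```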

 
----
See completed [[DataShop 3.x Features]] 
See on-going [[DataShop 4.x Features]] 
See prioritized [[DataShop Feature Wish List]]
[[Category:Protected]]
[[Category:DataShop]]

Collected User Requests

2012-08-13T15:07:03Z

Bleber: /* Redesign the Home Page */


== Data Format ==
==== Capture the question prompt and answer choices the student chose from ====
* As a researcher, I want to be able to identify steps based on the question prompt (not the difficult-to-understand step names that come from the selection and action of my tutor) so that I can analyze the data more easily. For multiple-choice questions, I also want to be able to see all of the choices that were available to the student. -- Eli Silk - February 1, 2011 (meeting with Brett and Ross Higashi of the FIRE project)
** Near-term solution is to create a table locally that maps steps to prompts
** Another solution is to ask CTAT team to modify their Flash components so that they log this information as custom fields
** Long-term solution is to make these fields into standard fields in the tutor message format
** In the email thread "Cognitive Tutor Development and Evaluation Requests", there is support for this general idea from Ken Koedinger, Albert Corbett, and Christian Schunn.
** Ken added that "Ideally, we may want to store any images that the student can see and where they reside on the screen ..."
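The near-term workaround (a local table mapping steps to prompts) could look like the Python sketch below. The two-column layout, the "Step Name" and "Prompt" headers, and the sample step names are all hypothetical; steps missing from the table get a blank prompt.

```python
import csv
import io


def load_prompt_map(tsv_text):
    """Read a local tab-delimited table of step name -> question prompt."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["Step Name"]: row["Prompt"] for row in reader}


def label_steps(steps, prompt_map):
    """Return copies of the step rows with a 'prompt' field (None if unmapped)."""
    return [dict(s, prompt=prompt_map.get(s["step"])) for s in steps]


table = "Step Name\tPrompt\ngear_q1 UpdateRadioButton\tWhich gear turns faster?\n"
prompts = load_prompt_map(table)
labeled = label_steps([{"step": "gear_q1 UpdateRadioButton"}, {"step": "gear_q2 UpdateText"}], prompts)
print(labeled)
```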

== Data Modeling ==
=== Non-KC Modeling ===

==== Automatic Distillation ====
* As an educational data miner wishing to develop a machine learned model with PSLC data, I would like to be able to automatically distill data features (e.g. custom fields) commonly used in past educational data mining research for a new data set (see, for instance, Baker, Corbett, Roll, & Koedinger, 2008 in UMUAI) -- Ryan Baker, Summer 2008, Startup Memo
** Could be implemented as a plug-in
* Also interested in this feature idea. -- Dan Franklin, Oct 2008

==== Upload model and apply it to new data set ====

* As an EDM researcher, I would like to take a model, expressible as a linear formula on DataShop fields or as a simple code procedure (e.g., Bayesian Knowledge Tracing, which Ryan has code for), and apply it to a new data set, so that I can ... -- Ryan Baker, Sept 2008
* Also interested in this. -- Maxine Eskenazi, Sept 2008
* May work best as a plug-in
** Code to display GUI to choose which data sets to use, calls model code, re-import to DataShop
** Good to have a way to apply many models, as soon as you import a data set
* Phil has an idea that maybe fits within this one. Please move if there's a better category. -- Brett Leber
<blockquote>This [''transaction? kc? --ed.''] relabeling is really mostly about enabling modeling in DataShop, right? With this in mind, I think that it is actually a higher priority to have model alternatives in DataShop.... E.g. Investigators should be able to give you chunks of Java code according to a certain specification, and DataShop should then be able to run these over datasets (perhaps after a certain series of QA occurs according to an SOP) when the investigator clicks some button in DataShop.... Obviously this is a much larger project than adding columns, but it is also much more important in my mind. ''-- Phil Pavlik, email to Brett on 1/14/2009''</blockquote>
* Examples:
** Example: running gaming detector in multiple tutors and comparing gaming frequencies
** Example: applying Bayesian Knowledge Tracing to a new data set from the same LearnLab
** Example: applying Ben Shih's models to many data sets. Note that Ben Shih should be included in design of this feature; he is interested, and has a lot of good ideas.
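Phil's proposal implies a fixed contract that submitted model code must follow. The Python sketch below shows one hypothetical shape for it (the <code>ModelPlugin</code> protocol, <code>LinearFormula</code> and <code>run_plugins</code> are illustrative names, not an existing DataShop API, and Phil's email mentions Java rather than Python): a plug-in receives step rows and returns one prediction per row, and the host stores each plug-in's output as a new column. The coefficients in the example are arbitrary.

```python
from typing import Dict, Iterable, List, Protocol

Row = Dict[str, object]


class ModelPlugin(Protocol):
    """Hypothetical contract: map step rows to one prediction per row."""
    name: str

    def predict(self, rows: List[Row]) -> List[float]: ...


class LinearFormula:
    """A model expressible as a linear formula on existing fields."""

    def __init__(self, name: str, intercept: float, weights: Dict[str, float]):
        self.name = name
        self.intercept = intercept
        self.weights = weights

    def predict(self, rows: List[Row]) -> List[float]:
        return [self.intercept + sum(w * float(r[f]) for f, w in self.weights.items())
                for r in rows]


def run_plugins(plugins: Iterable[ModelPlugin], rows: List[Row]) -> List[Row]:
    """Apply every plug-in and store its output in a new column named after it."""
    out = [dict(r) for r in rows]  # leave the input rows untouched
    for plugin in plugins:
        for row, value in zip(out, plugin.predict(rows)):
            row[plugin.name] = value
    return out


rows = [{"hints": 2, "incorrects": 1}, {"hints": 0, "incorrects": 0}]
model = LinearFormula("risk", 0.1, {"hints": 0.2, "incorrects": 0.3})
print(run_plugins([model], rows))
```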

==== Add Different Predicted Values ====
* Would also like to add statistics and predicted values other than those LFA produces. -- Ken Koedinger, ET Meeting, 10/10/2007

==== Bayesian Knowledge Tracing ====
* Bayesian Knowledge-Tracing built into DataShop like LFA is. -- Ryan Baker, Startup Memo, Summer 2008
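For reference, the standard four-parameter Bayesian Knowledge Tracing update (Corbett and Anderson's model) fits in a few lines. This Python sketch is a textbook version, not Ryan's code and not a DataShop feature; it only produces predictions for given parameters, and fitting those parameters per KC is left out. The parameter values in the example are arbitrary.

```python
def bkt_predictions(outcomes, p_init, p_learn, p_guess, p_slip):
    """Standard four-parameter BKT for one student and one KC.

    outcomes: first attempts in order, True for correct.
    Returns the predicted P(correct) before each attempt.
    """
    p_known = p_init
    predictions = []
    for correct in outcomes:
        # Probability of a correct response given the current knowledge estimate.
        predictions.append(p_known * (1 - p_slip) + (1 - p_known) * p_guess)
        # Bayes' rule: update P(known) on the observed response.
        if correct:
            evidence = p_known * (1 - p_slip)
            posterior = evidence / (evidence + (1 - p_known) * p_guess)
        else:
            evidence = p_known * p_slip
            posterior = evidence / (evidence + (1 - p_known) * (1 - p_guess))
        # Transition: an unknown skill may be learned before the next opportunity.
        p_known = posterior + (1 - posterior) * p_learn
    return predictions


# Arbitrary illustrative parameters; in practice they are fit per KC.
print(bkt_predictions([False, True, True], p_init=0.3, p_learn=0.2, p_guess=0.2, p_slip=0.1))
```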

==== Richer statistics for KC modeling ====

* In addition to the model stats and estimates generated for learning factor models, we should also create difficulty factor models (i.e., ones without a "Slope" parameter). The latter is particularly relevant for the Unique-Step model, where the slope parameter is meaningless (but still counts against the BIC value). -- Ken Koedinger, Email "new feature request", 1/22/2009
** Relatedly, we should report significance values on the Slope parameters -- that is, when is the Slope significantly different from 0.
** The KC models page perhaps should also report the log-likelihood and number of parameters (in addition to BIC) and leave out AIC. We might also consider other metrics of model generality, like the "adjusted R2" (if I have this name right -- Joe Beck mentioned in the Assistments meeting yesterday).
** These changes will be part of meeting the CMDM goal of improvement in (or at least demonstrate acceptability of) the cognitive models in 90+ units in our LearnLab courses (or affiliates).
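For reference, the contrast can be written in the usual Additive Factor Model notation (our notation: student i, step j, KC k, q_{jk} = 1 if step j uses KC k, T_{ik} = student i's prior opportunities on KC k). A difficulty factor model is the same model with every slope fixed at zero, and the information criteria show why unused slopes still cost something:

```latex
\begin{aligned}
\text{learning factors:}\quad
  \ln\frac{p_{ij}}{1-p_{ij}} &= \theta_i + \sum_k q_{jk}\,\beta_k + \sum_k q_{jk}\,\gamma_k\,T_{ik} \\
\text{difficulty factors:}\quad
  \ln\frac{p_{ij}}{1-p_{ij}} &= \theta_i + \sum_k q_{jk}\,\beta_k \qquad (\gamma_k = 0) \\
\mathrm{BIC} &= -2\ln\hat{L} + m\ln n, \qquad \mathrm{AIC} = -2\ln\hat{L} + 2m
\end{aligned}
```

Here m is the number of free parameters and n the number of observations. Dropping K slopes lowers BIC by K ln n whenever the log-likelihood does not change, which is roughly the Unique-Step situation: there the slopes can add little or nothing to the fit.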

=== KC Modeling ===

==== LFA/AFM: Check if enough memory using formula ====
* The LFA/AFM code could calculate how much RAM would be needed to run the algorithm on a given skill model using the formula provided by Hao. This formula is based on the number of transactions, number of students and number of skills. Right now it will not schedule itself to run on a model with over 300 skills, though there is a manual override. [[User:Alida|Alida]] 13:35, 29 November 2010 (EST)

==== Create KC Models through Web Services ====
* For John Stamper's CMDM project, it would be nice to automatically update KC Models through web services. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Automatically discovering new KC model ====

* Would it be possible to run some code (perhaps Hao's KC model selection code, perhaps something else generated by the CMDM thrust) to find a new, better KC model? -- Vincent Aleven, Sept 2008
* As a learning sciences researcher, I would like DataShop to discover a new/better KC model for me.
* Could be done as a plug-in

==== Generate new KC Models with LFA ====
* Not sure who asked for this.
* It would be nice to generate new KC Models with Hao's LFA code
* Would need to specify factors.
* Ideas on where this could run?
** On a separate server? Request it to be run, specify duration. Have separate server queue up requests, email user when done.
** In Java Applet on client machine? -- Phil Pavlik

==== Same Skill Twice on Same Step ====
* Would like to be able to apply the same skill to a step twice during a KC Model Import. -- Ken Koedinger, email, 2/4/2009

==== Save KC Model Import Files ====
* KC Model Import - save the file used to create the KCMs in case we need to recreate them. -- Ken Koedinger, email 3/4/2009

==== Log Likelihood and MAD ====
* Log Likelihood, MAD (mean absolute deviation) problem, MAD step (store and show) -- Hao Cen
** This is a variation on "Richer statistics for KC modeling" above. Probably should be merged. - Ken Koedinger
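Since Hao's exact definitions are not recorded here, the Python sketch below uses common ones as assumptions: log-likelihood of the observed first attempts under the model's predicted probability of a correct answer; step-level MAD as the mean of |observed - predicted| over step rows; and problem-level MAD as the mean, over problems, of the gap between observed and predicted proportion correct. Field names are hypothetical.

```python
import math
from collections import defaultdict


def fit_stats(rows):
    """Return (log-likelihood, step-level MAD, problem-level MAD).

    rows: dicts with 'problem', 'correct' (0 or 1) and 'predicted' (0 < p < 1).
    """
    log_lik = sum(math.log(r["predicted"] if r["correct"] else 1 - r["predicted"])
                  for r in rows)
    mad_step = sum(abs(r["correct"] - r["predicted"]) for r in rows) / len(rows)
    observed, predicted, count = defaultdict(float), defaultdict(float), defaultdict(int)
    for r in rows:
        observed[r["problem"]] += r["correct"]
        predicted[r["problem"]] += r["predicted"]
        count[r["problem"]] += 1
    # Compare proportions correct per problem, then average over problems.
    mad_problem = sum(abs(observed[p] - predicted[p]) / count[p] for p in count) / len(count)
    return log_lik, mad_step, mad_problem


rows = [
    {"problem": "p1", "correct": 1, "predicted": 0.8},
    {"problem": "p1", "correct": 0, "predicted": 0.4},
    {"problem": "p2", "correct": 1, "predicted": 0.5},
]
print(fit_stats(rows))
```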

==== Better naming for KCs in auto-generated Unique-Step KCM ====

As a researcher, I want the KCs in the Unique-Step model to have better names than KC1, KC2, etc, so that I can easily tell which generated KCs go with which unique step.

* Hui Cheng (Email 1/20/2009), Ken Koedinger (Email 1/22/2009)
* Could you put this in your new feature request list: could the “Unique-step model” be better labeled than just “KC1”, “KC2”, etc.? For example, for Assistments data, you could use part of the “Step Name”. -- Hui Cheng, Email 1/20/2009
* But, anything is better than "KC". -- Ken Koedinger, email, 1/22/2009
<blockquote>A simple alternative, that preserves uniqueness and addresses length, is to concatenate: 1) the first K letters of the step name 2) a unique numerical increment (same as the "3" in "KC3"). Note that (2) guarantees uniqueness just as it does in the current "KC<num>" scheme. Or perhaps better given that step names are often scoped within problems, is to concatenate: 1) the first L letters of the problem name 2) the first M letters of the step name 3) a unique numerical increment (just like the "3" in "KC3") I think K or L+M should be as big as possible without making the KC names indistinguishable (because they run off the right margin) in the KC list on the Learning Curve and other pages. </blockquote>
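Both naming schemes in the quote are easy to generate. A Python sketch with K, L and M exposed as parameters (the default of 12 characters and the "-" separator are arbitrary choices); the running number keeps names unique exactly as "KC3" does:

```python
def unique_step_kc_names(steps, k=12, problem_chars=None, step_chars=None):
    """Readable but still unique names for auto-generated Unique-Step KCs.

    steps: (problem_name, step_name) pairs, one per unique step, in KC order.
    By default the name is the first k characters of the step name plus the
    running number; if problem_chars (L) and step_chars (M) are both given,
    it is a problem-name prefix, a step-name prefix and the running number.
    """
    names = []
    for number, (problem, step) in enumerate(steps, start=1):
        if problem_chars is not None and step_chars is not None:
            stem = f"{problem[:problem_chars]}-{step[:step_chars]}"
        else:
            stem = step[:k]
        # The running number alone guarantees uniqueness, as in "KC3".
        names.append(f"{stem}-{number}")
    return names


steps = [("Mixture1", "Volume UpdateTable"), ("Mixture2", "Volume UpdateTable")]
print(unique_step_kc_names(steps, k=6))
print(unique_step_kc_names(steps, problem_chars=4, step_chars=3))
```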

==== Visualize Learning Curve Split ====
* Be able to visualize a learning curve split into 2 based on a specification of a subset of problems. -- Albert Corbett, Math CCM, November 2008
* Harder: make me a new KCM out of it. -- Ken Koedinger, Team Mtg, Dec 5, 2008, while trying to describe Albert's request to Alida

==== Statistical Significance ====
* Can DataShop determine if the difference between conditions or learning curves is statistically significant? -- general theme at workshop, probably mentioned by Bob Hausmann in his talk, Winter Workshop 1/23/2008
** Can't do it yet in DataShop, but I can show you how to do it in R (or SPSS...) after you've exported the data -- export the "student-step rollup" rather than the whole transaction table. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** This comment perhaps belongs (exists?) elsewhere: The current student-step rollup export (really, all exports) should be such that I can immediately load it into R (and other packages) without error. Now errors occur, for instance, because there are "#" characters in the variable names in the student-step rollup. This is currently a road-block for helping folks like Bob do the analyses they want to do. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** See [[Condition in Student-Step Rollup]] -- [[User:Alida|Alida]] 10:35, 17 September 2009 (EDT)
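As one illustration of such a test on exported data, here is a permutation test on per-student values (a technique chosen for this sketch because it needs no statistics library; it is not necessarily what Ken would run in R or SPSS). The Python code compares two conditions' mean values, for example each student's error rate; the numbers in the example are made up.

```python
import random


def permutation_test(group_a, group_b, iterations=10000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Each value should summarize one student (e.g. that student's error rate),
    so students, not steps, are the unit of analysis.
    """
    def mean_gap(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = mean_gap(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(iterations):
        rng.shuffle(pooled)
        # Small tolerance so relabelings equal to the observed gap still count.
        if mean_gap(pooled[:n_a], pooled[n_a:]) >= observed - 1e-12:
            extreme += 1
    # Add-one correction: the observed labeling is itself one permutation.
    return (extreme + 1) / (iterations + 1)


control = [0.32, 0.41, 0.28, 0.35, 0.39]  # made-up per-student error rates
treatment = [0.18, 0.22, 0.25, 0.15, 0.21]
print(permutation_test(control, treatment))
```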

==== Split and Merge Skills ====

* Hand searches through a p-matrix for a dataset to split and merge skills. (Pie in the Sky) -- Ken Koedinger, Team Mtg, 02/22/2008
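As a starting point for hand edits, the step-to-KC mapping (the p-matrix above) can be held as a dictionary from step to a set of KCs, so that splits and merges become simple rewrites. A hypothetical Python sketch; the edited mapping could then be written out as a KC model import file.

```python
def merge_kcs(q, kcs, merged_name):
    """Replace every KC in `kcs` with a single merged KC."""
    return {step: {merged_name if kc in kcs else kc for kc in tagged}
            for step, tagged in q.items()}


def split_kc(q, kc, assign):
    """Split one KC; `assign(step)` returns the new KC name for that step."""
    return {step: {assign(step) if t == kc else t for t in tagged}
            for step, tagged in q.items()}


q = {"s1": {"add"}, "s2": {"add", "carry"}, "s3": {"subtract"}}
print(merge_kcs(q, {"add", "subtract"}, "arith"))
print(split_kc(q, "add", lambda step: "add-simple" if step == "s1" else "add-complex"))
```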

==== Notes on new KCMs ====

* It would be good if I could add a note to a KC model that was newly imported. -- Noboru Matsuda, email, Nov 19, 2009

==== Display Learning Curves Grouped by Interestingness ====

* The page displaying all the learning curves today seems to be alphabetically sorted by KC name, which is not necessarily meaningful.
* An alternative presentation is to group the curves into 4 sets, breaking up the page. Set 1 has curves that contain significant spikes, and therefore seem to be "low-hanging fruit" for purposes of breaking up into KCs. Set 2 has curves with few spikes, but they have a long X axis, suggesting that students are presented with too many opportunities to acquire those KCs. Set 3 has the "good" curves, i.e., nicely decreasing curves that are not too long. Set 4 is "other". -- [http://www.pitt.edu/~goldin Ilya Goldin] 7 December 2009
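Ilya's four groups could be approximated with simple heuristics. In this Python sketch the thresholds (what counts as a spike, how many are allowed, how long is too long) are illustrative guesses, not validated values, and "nicely decreasing" is reduced to: no spikes and ending lower than it starts.

```python
def classify_curve(error_rates, spike_jump=0.2, max_spikes=1, long_length=20):
    """Assign one KC's learning curve to set 1-4 using illustrative thresholds.

    error_rates: error rate at opportunity 1, 2, 3, ... for one KC.
    """
    rises = [later - earlier for earlier, later in zip(error_rates, error_rates[1:])]
    spikes = sum(1 for r in rises if r > spike_jump)
    if spikes > max_spikes:
        return 1  # spiky: candidate for splitting into more KCs
    if len(error_rates) > long_length:
        return 2  # few spikes but long: possibly too many opportunities
    if error_rates and spikes == 0 and error_rates[-1] < error_rates[0]:
        return 3  # "good": decreasing overall and not too long
    return 4      # other


def group_curves(curves):
    """curves: KC name -> list of error rates. Returns set number -> KC names."""
    groups = {1: [], 2: [], 3: [], 4: []}
    for kc, rates in sorted(curves.items()):
        groups[classify_curve(rates)].append(kc)
    return groups


print(group_curves({"a": [0.6, 0.4, 0.3, 0.2], "b": [0.2, 0.3, 0.35]}))
```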

==== Order KC models according to AIC====
* This is based on results from multiple dataset analyses that compared AIC, BIC and loglikelihood to cross validation RMSE. -- Mimi McLaughlin, 2/9/2011
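Ordering by AIC needs only each model's log-likelihood and parameter count. A Python sketch using the standard definitions of AIC and BIC (lower is better); the model names and numbers in the example are made up for illustration.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Return (AIC, BIC) for a fitted model."""
    aic = -2 * log_likelihood + 2 * n_params
    bic = -2 * log_likelihood + n_params * math.log(n_obs)
    return aic, bic


def order_by_aic(models, n_obs):
    """models: name -> (log_likelihood, n_params). Returns names, best AIC first."""
    return sorted(models, key=lambda name: information_criteria(*models[name], n_obs)[0])


models = {"Default": (-5000.0, 120), "Unique-step": (-4900.0, 400), "Single-KC": (-5600.0, 60)}
print(order_by_aic(models, n_obs=10000))
```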

====Use log of opportunity count for AIC and BIC calculations====
* We compared using the log of opportunity count to using the whole-number opportunity count in multiple datasets. We found the results for the log of opportunity count to be consistently better, though only slightly. -- Mimi McLaughlin, 2/9/2011

== Developer Requests ==

==== Store Converted Date and Converter Info ====

As a DataShop administrator, I'd like to see the converter information (version and date) stored in the database, so that I do not have to store that data manually in the Additional Notes field each time I load a dataset. -- Kyle, 8/5/2008

* DTD new fields:
** Store conversion and converter information in database (anything else?)
*** conversion_time
*** converter_info
* Email from Octav, 10/5/2007

I see there's also a note about the converter version in the Dataset
Info. Which is good, but it seems it's taken from the directory name
when I submitted the set. I don't know how reliable that is.  :-) It
would be better if it's taken from the new converter info field.

==== Convert from XML to tabbed-delimited format ====

* If users agree that the export format is valuable, then perhaps they could convert from XML to the export format to see the data in Excel, where they could look at the Selection column and spot blanks more easily. -- Jonathan Sewall, ET Mtg, 10/10/2007
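A converter of this kind could be a short script. The Python sketch below assumes a simplified subset of the tutor message XML (a root element whose children are messages, each with an <code>event_descriptor</code> holding <code>selection</code>, <code>action</code> and <code>input</code>); the element names follow our understanding of the format but are simplified, so treat them as assumptions. Empty or missing elements become empty cells, which makes blanks easy to spot in Excel.

```python
import csv
import io
import xml.etree.ElementTree as ET


def messages_to_tsv(xml_text):
    """Flatten logged messages into Message/Selection/Action/Input rows."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Message", "Selection", "Action", "Input"])
    for message in root:
        event = message.find("event_descriptor")
        if event is None:
            continue  # skip messages without an event descriptor
        cells = [(event.findtext(tag) or "").strip() for tag in ("selection", "action", "input")]
        writer.writerow([message.tag] + cells)
    return out.getvalue()


log = """<tutor_related_message_sequence>
  <tool_message><event_descriptor>
    <selection>cell_1</selection><action>UpdateText</action><input>3</input>
  </event_descriptor></tool_message>
  <tutor_message><event_descriptor>
    <selection></selection><action>UpdateText</action><input>3</input>
  </event_descriptor></tutor_message>
</tutor_related_message_sequence>"""
print(messages_to_tsv(log))
```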

==== Plug-ins (general issues) ====

* Please please support Ruby on Rails. -- Ben Shih, December 2008

==== Create UI to grant DataShop user roles ====

* Already tedious.
* Alida, User Meeting AAR on December 9, 2009

== Help ==

==== Specialize Label of Help Button ====
* Since this help is better than in most applications, it should say more than just 'Help'. -- Ken Koedinger & Ryan Baker, Team Mtg, May 30, 2008
** Ideas:
*** 'Page Help'
*** 'Help with this page'
*** 'Help with Learning Curve page' (Ken's favorite)
*** 'Help with this tool'

== Home Page ==

==== Redesign the Home Page ====

* In the menu of data sets at the top, include the N (=20 or as many as fit on the screen?) data sets that I have visited starting with the ones I've visited most recently. --[[User:Koedinger|Koedinger]] 16:16, 16 September 2009 (EDT)
** As of today, the menu shows the last 10 data sets, most recently visited at the top. I think the feature is good enough, but let's ask Ken --[[User:Bleber|Bleber]] 10:56, 6 August 2010 (EDT)
* There needs to be a better ordering for the datasets (DS364) -- but search/filtering/sorting would take care of that
* Maybe a search to filter the list of datasets since the list is so long. -- Brett Leber, 6/14/2007
* Going back to the home page always goes to 'My Datasets' (DS313) [fixed May 2012]
* Maybe show more high level stats on this page, like how many transactions [done Jan 2012], students, skill models
** "I am particularly looking for data from courses that contain large numbers of students (e.g., thousands or more). Does the Datashop have any such data? I perused the datasets but couldn't tell from the list how many students each course contains." - Kate Forbes-Riley, email on 8/9/2012
* Allow users to post and share project documentation (files, papers, other meta-info) -- Ruth Wylie, suggested during meeting on 8/4/2010. She had a file that is relevant to multiple datasets in her project.

==== News Feed ====
* Bruce McLaren, email, 4/20/2012
**I wonder whether it would be possible to have a scrolling "News" feed somewhere on the DataShop site that would keep researchers informed about what is happening with the DataShop. For instance, the "News" feed could always show the last time a data conversion finished, announce an upcoming workshop, or inform everyone of critical DataShop issues, such as the fact that the DataShop had a server go down recently (which I know caused you guys a lot of headaches - but which I didn't hear about until a couple weeks after it happened). This kind of thing could be a great communication tool and, as an added bonus for DataShop personnel, avoid lots of email with questions like "where is my data?" or "when is the next conversion going to finish?" Perhaps it would even be possible to have the data conversion routine automatically update the "News" feed each time it begins and/or finishes processing?
** Take it with a grain of salt -- I know you have lots of things on your plate -- but I have been in the situation often, especially just before and during my studies, where I wasn't sure what was going on with DataShop conversions and issues and had to track down someone -- typically Alida -- to figure things out. I have the advantage of sitting right next door to Alida, but I wonder how many other researchers within the PSLC, those not in close proximity to Alida, deal with this issue of not really knowing what is happening with the DataShop at any given time.
* Jonathan Sewall also requested a page that shows the status of the log conversion process, including how much data was processed and for which datasets

== Import ==
== Miscellaneous ==

==== Analyses by LearnLab ====

* Organize data by LearnLab, not by "data set". -- Ryan Baker, Aug 2008
* Also: Bob Hausmann, Sep 2008; Maxine Eskenazi, Sep 2008
* Essentially, current data sets become samples, but the top-level unit is the LearnLab. You can take every data set in a LearnLab together as a sample.
* Implies being able to run analyses across data sets, export multiple data sets together, and create multi-data-set samples
* As a user of DataShop, I would like to look at learning curves for all Algebra data together (for example), or export all Algebra data
* Important long-term, but is a lot of work -- in particular, we need to solve scalability issues first.

==== Save Settings Between Sessions ====
* It would be useful if DataShop could save settings between sessions. -- Bob Hausmann, User Meeting, 2/1/2008
** "I do a lot of redoing the same steps" (e.g., set cutoffs, select a KC model, select students).

==== Multiple steps per transaction ====

* Needed so that we do not have to create multiple transactions for the same actual action for Andes logs. -- Kurt VanLehn, Feb 2007

==== Demographic data ====

* This has been mentioned by NSF visitors, AB, ESL, and some researchers.
* Also mentioned at Winter Workshop 1/23/2008.
** Derek/Sue-mei: Student background information not in DataShop. Would like to see a student or set of students from a particular demographic, and view them across datasets!
* Note that Gail added demographic data to Additional Notes field on the Dataset Info page for many datasets. The idea here is to put that data into the database somewhere.

==== Single Sign On ====
* Michael Bett, email, 10/8/2007
* It would be nice if the following services have a single login account/password:
*# Theory Wiki
*# Learnlab.org
*# ESL's OSS
*# DataShop

==== Reveal unanonymized student IDs ====

As a researcher/PI performing research assistant tasks, I want to see easily the unanonymized student IDs of subjects in DataShop so that I can email my subjects telling them when to use my system.

* Ruth Wylie, July 3, 2008
* As the honest broker of the PSLC data, I have promised to not reveal the student IDs and to protect the identity of the students. This is part of the DataShop IRB. Therefore, I do not see this request as possible. [[User:Alida|Alida]] 09:53, 4 September 2009 (EDT)
** At some point, some coordination would be good with OLI's Digital Dashboard project that Marsha Lovett (and the OLI team and sometimes me) is working on. This is meant to provide usage information more quickly to instructors. It could also perhaps be used by researchers (with the right IRB rights) in situations like Ruth's.

==== Knight Timeline ====
* Developed by Andrea Knight, 2004

==== Buggy Skills ====
* Ken Koedinger, prototype walkthrough 9/11/2006

==== Confusion Matrix ====
* Brian MacWhinney, prototype walkthrough 9/11/2006

== Navigation Bar ==
==== Filter KCs by Name ====

As a researcher working with KCs, I want to filter KCs based on their names, so that I can...
* Vincent Aleven, Email, 2/3/2008
* "Since Alida said you cannot have two mechanisms for putting together your KC set (i.e., cannot have both the selecting-by-clicking and selecting-by-filtering), I would probably opt for the latter."
* Alida: I thought Vincent mentioned that he'd like to select which KCs are in a set by filtering on the name. Example: Include KCs with '*reason*' in the name and exclude KCs with '*given*' in the name.
* This could be an addition to our v3.0 KC-selection mechanism--filter by name.
* Vincent, Email, 5/6/2009: Expressed another need for this feature. Alternatively, could just allow for a wider area and a longer list so that more items can be checked at once. The number of characters we show right now is not enough, because in many cases those first characters are the same across many of the skills. Reference data set: Geometry CWCTC 2005-2006

* Status: Design Started
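Alida's example (include KCs with '*reason*' in the name, exclude those with '*given*') maps directly onto shell-style wildcards. A minimal Python sketch using the standard <code>fnmatch</code> module; matching here is case-sensitive, and the KC names in the example are hypothetical.

```python
from fnmatch import fnmatchcase


def filter_kcs(kc_names, include=("*",), exclude=()):
    """Keep KCs matching any include pattern and no exclude pattern."""
    return [kc for kc in kc_names
            if any(fnmatchcase(kc, p) for p in include)
            and not any(fnmatchcase(kc, p) for p in exclude)]


kcs = ["reason-parallel-lines", "given-angle", "reason-given-angle", "compute-area"]
print(filter_kcs(kcs, include=["*reason*"], exclude=["*given*"]))
```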

==== Facebook-style KC Selection ====

As a researcher working with KCs, I want to select KCs based on the learning curve thumbnail, so that I can see quickly which ones I'm interested in exploring more deeply.

* Feature already designed for v3.0, not implemented due to time constraints.
* Agreed this would be really useful. -- Kirsten Butcher, User Mtg, 1/31/2008
* Status: Guestimate: 20 days, need to revisit requirements document

==== Feedback after clicking a large sample on a large dataset ====

As a user of DataShop (first-time or not), I want some feedback and the ability to cancel after I do something that might take a long time (e.g., clicking "All Data" on a large dataset) so that I do not get stuck.

* Part of the Susan Goldman story
* After clicking a sample for a large dataset, there is no "Loading..." text, no feedback that the click was even registered by the app (besides the sometimes busy cursor and small browser "loading" text), nor the ability to cancel the action. 
* We will always have similar problems even if performance is improved, so providing feedback and the opportunity to cancel is critical.

==== Save Button in Problem Navigation Box ====

* Save buttons in the sidebar. -- Ken Koedinger, Mtg 2006
** Could also put one in the Problem selection box in the sidebar.

==== Make Nav Bar Wider ====
* Make the Sample name and description fields much wider. -- John LaPlante, email 7/10/2007

== New Visualizations/Reports ==

==== Student-KC Rollup ====
As a researcher, I want to see KCs rolled up by student, so that I can ...

* Vincent Aleven, User Mtg, 1/29/2008
** By Student-KC would be more useful than by Student-Problem
** Example: the number of steps on which the student asked for a hint or made an error, or the proportion of steps on which the student got help
** How often the bottom-out hint is reached
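A sketch of a Student-KC rollup in Python, built from student-step rows. It assumes each row already records the first-attempt outcome and whether the bottom-out hint was reached. The `StepRow` fields are hypothetical names, and a step tagged with several KCs would appear once per KC.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StepRow:
    """One student-step row; the field names are hypothetical."""
    student: str
    kc: str
    first_attempt: str   # "correct", "incorrect", or "hint"
    bottom_out: bool     # True if the student reached the bottom-out hint


def student_kc_rollup(rows):
    """Aggregate step rows into per-(student, KC) counts and rates."""
    totals = defaultdict(lambda: {"steps": 0, "error_or_hint": 0, "bottom_out": 0})
    for row in rows:
        t = totals[(row.student, row.kc)]
        t["steps"] += 1
        if row.first_attempt in ("incorrect", "hint"):
            t["error_or_hint"] += 1
        if row.bottom_out:
            t["bottom_out"] += 1
    for t in totals.values():
        # Proportion of this student's steps on this KC that needed help.
        t["error_or_hint_rate"] = t["error_or_hint"] / t["steps"]
    return dict(totals)


rows = [
    StepRow("s1", "kc1", "correct", False),
    StepRow("s1", "kc1", "hint", True),
    StepRow("s1", "kc2", "incorrect", False),
    StepRow("s2", "kc1", "correct", False),
]
rollup = student_kc_rollup(rows)
print(rollup[("s1", "kc1")])
```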

==== Instructor Reports ====
* Phil said he received a lot of positive reactions to providing reports on units for instructors. -- Phil Pavlik, ET Mtg 10/10/2007

==== Manage Authorizations/Projects Page ====
* Lisa Anthony, email 10/23/2007
* Allow PI to change permissions on the datasets.
:"Actually, I couldn't see how to change permissions on the datasets from the website. Is this possible? If not, it might be a nice feature..."

==== Calculate Time Spent on Different Study Activities ====

As a researcher, I want to know how much time, on average, students spend on study activities, so that I can ...

* Bruce McLaren, Email, 4/7/2009
<blockquote>
For my most recent stoich study, Shawn and I are interested in calculating timing information such as: (a) how long students spent, on average, working on individual tutors (b) how long students spent, on average, on all items in an intervention (c) how long students worked, on average, on post-tests. 

Timing information is very commonly required for studies, and can be calculated from DataShop logs relatively easily, so even if we don't have it, might be worth considering. (And we don't want to re-invent the wheel, if you already have it or are planning it...)
</blockquote>
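One simple way to get (a) through (c) from transaction logs is sketched below in Python, under stated assumptions. Each transaction is reduced to a hypothetical (student, activity, timestamp in seconds) tuple. A student's time on an activity is taken as the span from their first to their last transaction in it. This ignores idle gaps, so long breaks inflate the estimate.

```python
from collections import defaultdict
from statistics import mean


def average_time_per_activity(transactions):
    """Average, across students, of each student's time on each activity.

    transactions: iterable of (student, activity, timestamp_seconds) tuples.
    A student's time on an activity is last timestamp minus first timestamp,
    so idle breaks inside the activity are counted (a simplification).
    """
    spans = {}  # (student, activity) -> [first_ts, last_ts]
    for student, activity, ts in transactions:
        span = spans.setdefault((student, activity), [ts, ts])
        span[0] = min(span[0], ts)
        span[1] = max(span[1], ts)

    per_activity = defaultdict(list)
    for (_student, activity), (first, last) in spans.items():
        per_activity[activity].append(last - first)
    return {activity: mean(times) for activity, times in per_activity.items()}


log = [
    ("s1", "tutor-A", 0), ("s1", "tutor-A", 120),
    ("s2", "tutor-A", 10), ("s2", "tutor-A", 70),
    ("s1", "posttest", 500), ("s1", "posttest", 800),
]
averages = average_time_per_activity(log)
print(averages)  # {'tutor-A': 90, 'posttest': 300}
```

For (b), each item could first be mapped to an intervention label and that label passed as the activity.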

==== Incorrect Step Duration and Hint Step Duration ====

As a researcher, I want to be able to see total step duration if the student's first attempt was an incorrect attempt, and total step duration if the student's first attempt was a hint request, so that I can do some analyses that I can't do with "Error Step Duration".

* Bob Hausmann, email, 11/11/2008.
* Updated title and story with 'step duration' instead of 'time'. -- [[User:Alida|Alida]] 10:36, 4 September 2009 (EDT)
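A sketch of how the two requested columns could be derived per student-step, in Python. It assumes each step row already carries its first-attempt outcome ("correct", "incorrect" or "hint") and its total step duration in seconds. The idea is to split the existing error case into its two parts. The function name is hypothetical.

```python
from statistics import mean


def split_error_duration(first_attempt, step_duration):
    """Return (incorrect_step_duration, hint_step_duration) for one student-step.

    Exactly one of the two is filled in when the first attempt was an error;
    both are None (exported as blanks) when the first attempt was correct.
    """
    if first_attempt == "incorrect":
        return step_duration, None
    if first_attempt == "hint":
        return None, step_duration
    return None, None


rows = [("incorrect", 42.0), ("hint", 30.0), ("correct", 5.0), ("hint", 50.0)]
split = [split_error_duration(a, d) for a, d in rows]
mean_incorrect = mean(i for i, _ in split if i is not None)
mean_hint = mean(h for _, h in split if h is not None)
print(mean_incorrect, mean_hint)  # 42.0 40.0
```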

==== Grading ====

[[Grading]]

====Display number of steps and number of observations for skills====
* How and where to be determined by developers. -- Ken (entered by Mimi)

== Reports ==
=== Dataset Info ===

==== Pointers to Hard-copy Data ====
* Brett van de Sande, NSF Site Visit, 5/28/2008
* Pointers to hard-copy data such as paper tests and/or homework.  Include contact information.  It doesn't seem to make sense to scan a whole filing cabinet of paper if no one wants to look at it.  And any secondary researchers don't know about the filing cabinet to ask for it.

==== Sort Problem Breakdown Table ====

* Would like the ability to sort the table by clicking on the column headings of the Problem Breakdown Table on the Dataset Info Tab. -- Bruce McLaren, User Mtg, 11/5/2007

==== Rename dataset ====

As a researcher, I want to rename my dataset so that it makes more sense to other people. I also want to make sure the dataset doesn't become polluted later by new data not associated with my study.
* Ruth Wylie, July 3, 2008
* There are reasons she would want to do this (current name is worthless, other researchers might try her tutor and pollute her data) but also reasons for not doing it (log more data later).
* There are risks in changing a dataset name that might not be apparent, for example if you want new data to go into the same dataset later. [[User:Alida|Alida]] 10:27, 4 September 2009 (EDT)

==== Average time per problem, average number of problems, total number of sections ====

In addition to showing student hours per dataset, it would be useful to know the average time spent per problem, average number of problems completed, and the total number of sections.
* Noboru Matsuda, June 18, 2010

=== Error Report ===

==== View By Student ====

* Would like to see what a couple of students saw in the feedback. -- Marsha Lovett, 10/11/2007

==== Export ====
* I would like the ability to export this data. -- John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
* Also interested in this feature idea. -- Bruce McLaren, User Mtg, 11/5/2007

==== Sort ====

* Was planned for but not implemented in v2.1 (estimated to be a 4 day task)
** By Correctness %, starting with the least correct
** By Hints %
** Step (or KC if view by KC)
** Number of Students
* Ability to sort problems by their average experienced position within the curriculum. -- Ken Koedinger, 02/16/2007
** Which problem did students most often experience first, then the one experienced second most often, ...
* Order steps by the order they typically are executed by students. -- Ken Koedinger, email 11/7/2008
:"Searching through the steps in a problem to get a sense of what is going on is currently hard because the steps are ordered alphabetically, not by the order in which most students did them. While not all students do all steps in the same order, there is some regularity there. It would be quite useful if the steps could be ordered in a "typical order". This could be accomplished by using the time stamps (of the first (correct?) transaction?) for each step to determine rank order of the each step for each student in a problem and then for each problem average the rank order of each step across all students. Then arrange the steps in the Error Report by their average rank order -- that is, roughly speaking, the step that is most likely to be first across students (closest on average to first) goes first, the step with the next lowest rank goes next, etc."
* The capability to count the number of errors of each message type and sort in different ways, for instance by all errors that had no messages. -- Bruce McLaren, email 10/22/2007
:"This is an error analysis I recently did in Excel, using pivot tables, that might be handy if in the DataShop. This one is very important for tutors because the errors that occur most frequently, yet don't elicit messages to the students, are good candidates to become errors with feedback."
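Ken's "typical order" procedure, sketched in Python. It assumes transactions are available as hypothetical (student, problem, step, timestamp) tuples. It uses each step's first transaction rather than its first correct one, since the quote leaves that choice open. Ties in average rank fall back to alphabetical order.

```python
from collections import defaultdict
from statistics import mean


def typical_step_order(transactions):
    """Order each problem's steps by their average rank across students.

    transactions: iterable of (student, problem, step, timestamp) tuples.
    A step's time is its earliest transaction for that student and problem.
    """
    first_seen = {}
    for student, problem, step, ts in transactions:
        key = (student, problem, step)
        if key not in first_seen or ts < first_seen[key]:
            first_seen[key] = ts

    # Rank the steps within each student's work on each problem (1 = first).
    per_student_problem = defaultdict(list)
    for (student, problem, step), ts in first_seen.items():
        per_student_problem[(student, problem)].append((ts, step))
    ranks = defaultdict(list)
    for (_student, problem), timed_steps in per_student_problem.items():
        for rank, (_ts, step) in enumerate(sorted(timed_steps), start=1):
            ranks[(problem, step)].append(rank)

    # Average each step's rank across students, then sort within the problem.
    by_problem = defaultdict(list)
    for (problem, step), step_ranks in ranks.items():
        by_problem[problem].append((mean(step_ranks), step))
    return {problem: [step for _avg, step in sorted(items)]
            for problem, items in by_problem.items()}


log = [
    ("s1", "P1", "B", 1), ("s1", "P1", "A", 2), ("s1", "P1", "C", 3),
    ("s2", "P1", "B", 5), ("s2", "P1", "C", 6), ("s2", "P1", "A", 7),
    ("s3", "P1", "A", 1), ("s3", "P1", "B", 2), ("s3", "P1", "C", 3),
]
print(typical_step_order(log))  # {'P1': ['B', 'A', 'C']}
```

Steps that only some students attempted are averaged over fewer students. A fuller version might weight or flag them.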

==== Show more than 500 problems ====

* In the Error Report, can we see more than 500 problems? See the dataset "Cog Model Discovery Experiment Spring 2010", where the problem list shows "2/2888 selected (Showing the first 500)". -- Ken via email on 2/15/2011

=== Export ===

==== Include Step Start Time in transaction format ====

* The transaction-level export already includes problem start time; could it also include step start time? I can easily compute it myself, but it seems there's a specific algorithm that the student-step rollup uses, and it might be nice to include the same value here. --Ilya Goldin, email on 01/16/2012

==== Last attempt on step? column for transaction format ====

* Include a new column that shows whether the row is the last attempt on a step for a student or not. Could be 0 or 1 as value. Helpful for researchers who are doing grading of data. Transaction format. --Vincent Aleven, CTAT mtg 11/5/2010
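A sketch of how the 0/1 column could be computed, in Python. It assumes the transactions are already in chronological order and each is reduced to a (student, step) key. In a real export the step key would probably also need to include the problem and problem view, so that a step revisited in a later problem view is treated separately. That scoping is an assumption here.

```python
def last_attempt_flags(transactions):
    """Return 1 for each transaction that is the student's last attempt on
    that step, else 0.

    transactions: chronologically ordered list of (student, step_key) pairs.
    """
    last_index = {}
    for i, key in enumerate(transactions):
        last_index[key] = i  # later occurrences overwrite earlier ones
    return [1 if last_index[key] == i else 0 for i, key in enumerate(transactions)]


tx = [("s1", "p1/x"), ("s1", "p1/x"), ("s2", "p1/x"), ("s1", "p1/y")]
print(last_attempt_flags(tx))  # [0, 1, 1, 1]
```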

==== Elapsed Time ====

* Include the elapsed time in preview and transaction export. It is more valuable than the transaction time as an absolute reference. Possible to keep both. --Ken Koedinger, Team Mtg 04/18/2008

==== SQL Format ====

* Option to export as an SQL file. -- Ken Koedinger, 03/26/2007, also brought up in June ET Meeting
** Ability to export an SQL dump of a dataset. --Kyle Cunningham, 04/03/2007

==== Specify Character for Blanks ====

* Ability to specify what character if any is used for blank. --Ryan Baker, email 8/9/2007
:"Not all tools handle TABTABTAB correctly on import. The period '.' is used to mean missing data in most stats packages. The word 'BLANK' is used in some other ones." This is not an issue for Ryan, as he wrote a preprocessor to convert blanks.
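A preprocessor of the kind Ryan describes can be very small. The Python sketch below replaces empty fields in tab-delimited text with a chosen token. The function name, the default of '.', and the example header names are assumptions for illustration.

```python
def fill_blanks(tsv_text, token="."):
    """Replace empty fields in tab-delimited text with a chosen token.

    Only truly empty fields are replaced; whitespace-only fields are kept.
    """
    out_lines = []
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        out_lines.append("\t".join(f if f != "" else token for f in fields))
    return "\n".join(out_lines)


print(fill_blanks("Anon Student Id\tKC\tOutcome\ns1\t\tHINT", token="BLANK"))
```

Offered as an export option, the same substitution would just be applied while each row is written.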

==== Opportunity (at KC) Count in Transaction Export ====
* Include the opportunity count in the transaction export (it's only in the student-step rollup) -- Noboru Matsuda, 10/08/2009
** Would be repetitive
** Would need one Opportunity column per KC
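A sketch of per-transaction opportunity counts in Python. It assumes, as in the student-step rollup, that each new step tagged with a KC counts as one opportunity. All transactions on that step then share the same number. Transactions are hypothetical (student, step_key, kcs) tuples in chronological order, and the step key would likely need to include the problem view. Each returned dict maps naturally onto one Opportunity column per KC.

```python
from collections import defaultdict


def transaction_opportunities(transactions):
    """Attach a per-KC opportunity number to every transaction.

    transactions: chronologically ordered (student, step_key, kcs) tuples.
    All transactions on the same student-step share one opportunity number,
    which increments the first time the student reaches a new step with that KC.
    """
    counts = defaultdict(int)  # (student, kc) -> opportunities seen so far
    step_opportunity = {}      # (student, step_key, kc) -> that step's number
    result = []
    for student, step_key, kcs in transactions:
        opportunities = {}
        for kc in kcs:
            key = (student, step_key, kc)
            if key not in step_opportunity:
                counts[(student, kc)] += 1
                step_opportunity[key] = counts[(student, kc)]
            opportunities[kc] = step_opportunity[key]
        result.append(opportunities)
    return result


tx = [
    ("s1", "p1:x", ["add"]),
    ("s1", "p1:x", ["add"]),          # second attempt on the same step
    ("s1", "p1:y", ["add", "carry"]),
    ("s2", "p1:x", ["add"]),
]
print(transaction_opportunities(tx))
```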

==== Export only rows that have KCs tagged ====
* Is it possible, when exporting a dataset, to include only the [transaction?] rows that have knowledge components tagged? Likewise for the KC models export: could it include only the items that have KCs tagged? -- Hui Cheng, 01/19/2010
** We have the inverse of this option on the Performance Profiler, 'Include steps without a knowledge component', and with the Student-Problem export, 'Include Steps without KCs'.

==== Include Custom Fields in Student-Step Rollup ====
* Is it possible to include the custom field in the datashop [step] export? -- Hui Cheng, 03/01/2010
** "I am thinking about the student-step level export. We have a student from Statistics wanting to use assistment data from datashop. She wants to have problem set id (=curriculum id in datshop). For Assistment data, we decided to put problem set id in the custom field because in the Assistment a lot of problems are repeated in different problem sets. She needs data at the student-step level." (Hui email)
* Vote from Phil Pavlik too -- see [[Collected_User_Requests#Include_transaction_custom_fields_in_web_services_student-step_export|Phil's comment under Web Services]]

==== Don't duplicate rows in student-step format when not showing KCs ====
* If the checkbox to show knowledge components is not checked, maybe it doesn't make sense to show rows more than once if they have more than one KC associated with the step. (Mimi (and Brett) stumbled on this. 8/16/2010)

==== Student-Step Rollup include Success Column ====
* In the student-step rollup, add a column called Success: 1 if correct, 0 if incorrect or hint, blank otherwise. ~~ Ken Koedinger, DataShop Team Meeting, Oct 22, 2010

=== Learning Curve ===

==== Default sort by observation on LCPID ====

* Default sort by observation on LCPID. ~~ Ken Koedinger, DataShop Team Meeting, 10/22/2010

==== Purple Point ====
* Purple Point: if a point on the LC has more than one KC associated with it but you have drilled down to a given KC, then the blue line is off. We could put a purple point that takes this into account.
* Simpler thing: display a warning message that some points in the display are driven by other KCs
* Pearson may be interested
* This was mentioned during the PSLC Summer School 2010.
* For a step with multiple skills, attribute the error only to the skill with the highest overall error rate. ~~ Alida, meeting with Ken, November 18, 2010
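The "highest overall error rate" rule above, sketched in Python. Each KC's error rate is computed over every step tagged with it. Each multi-KC step is then credited to the KC with the highest rate. Steps are hypothetical (kcs, is_error) pairs, and breaking ties alphabetically is an added assumption to keep the result deterministic.

```python
from collections import defaultdict


def attribute_to_hardest_kc(steps):
    """steps: list of (kcs, is_error) pairs. Returns one KC per step."""
    errors, totals = defaultdict(int), defaultdict(int)
    for kcs, is_error in steps:
        for kc in kcs:
            totals[kc] += 1
            errors[kc] += int(is_error)
    rate = {kc: errors[kc] / totals[kc] for kc in totals}
    # max() keeps the first of equal candidates, so sorting first breaks
    # ties alphabetically.
    return [max(sorted(kcs), key=lambda kc: rate[kc]) for kcs, _ in steps]


steps = [(["a"], True), (["a"], True), (["b"], False), (["b"], True), (["a", "b"], False)]
print(attribute_to_hardest_kc(steps))  # ['a', 'a', 'b', 'b', 'a']
```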

==== Reduce Scrolling ====
* Add a forward and back button to the graph to reduce scrolling. -- John LaPlante, email 7/10/2007

:"The learning curve page could use a forward and back button to cycle through the learning curves. Going through them one by one requires a lot of scrolling."

==== Turn On Point Labels ====

* It would be nice to have the option to turn on point labels. It is nice that I can mouse over a point and view the data but it would be nicer if it appeared automatically. -- John LaPlante, email 7/10/2007

==== Option for Bigger Graph ====

* Allow user to see bigger graph. -- Derek Chan, Winter Workshop 1/23/2008
** Potential solution: enable user to set x, y scale manually

==== LC Normalize Scale of Thumbnails ====
* [[LC Normalize Scale of Thumbnails]]

=== Performance Profiler ===

==== Rename Performance Profiler ====
* John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
** Did not use this report because he thought it had something to do with improving the performance of the DataShop itself. This report might have been much better to use than the Error Report, but he would still have needed an export, because working with the data in tabular form was necessary. Note that the pivot tables he created were added to the dataset (Pittsburgh Science of Learning Center Stoichiometry Study 1).
* Lisa Anthony, email 8/2008
** Didn't know to go to the report. Export would be useful.
** Needed a better definition of Error Rate with respect to Problem and Unit rows.

==== Export ====
* John LaPlante (see comments in Rename Performance Profiler)
* Lisa Anthony (see comments in Rename Performance Profiler)
* Yes, I too can imagine wanting to export the results of a particular performance profiler output (i.e., to a table) so that I can graph it my own way. --[[User:Koedinger|Koedinger]] 16:24, 16 September 2009 (EDT)

==== Table View ====
* Add option to switch to a table view.
* Columns are: Problem Name, Steps, % incorrect, Incorrect Steps, % hint, Hint Steps, etc.; include all of the values shown in the pop-up.

==== Union of KCs/Problems/Students ====
* Allow user to get the union of KCs/Problems/Students etc so they can compare across samples easier. -- Kirsten Butcher, Winter Workshop 1/23/2008
:"It is difficult to compare performance profiler graphs across samples because the KCs (or problems, or whatever) aren't necessarily in both of those samples."

==== Show Details In Report ====
* Click on bar to see details in report and not just in pop-up. It disappears too quickly. -- Alida, Brett

==== Show More Information in the Graph ====
* Show more information in the graph: [Bruce M, User Mtg, 11/5/2007]
** Had drilled down by a certain skill - skill is not listed in the graph, user has to check the skill list on the LHS to see what skill was selected
* Maybe related to Bruce's "show more info in graph": show the actual value of the range variable. e.g., when range is error rate, show the error rate number somewhere (right now you can see, via mouse-over, the incorrect, hint, and correct percentages, but not the error rate) -- Brett
* Design idea to show # steps incorrect, # steps hint, # steps correct to clarify how the percentages are calculated. -- Alida, Brett
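The design idea above, sketched in Python. Given counts of steps whose first attempt was incorrect, a hint, or correct, it reports each count next to its percentage. The error rate is treated as the incorrect and hint first attempts combined. That is an assumption about how the Performance Profiler defines it, and worth confirming against the help pages, since Lisa Anthony asked about exactly that definition.

```python
def first_attempt_breakdown(n_incorrect, n_hint, n_correct):
    """Counts and percentages of steps by first-attempt outcome."""
    total = n_incorrect + n_hint + n_correct

    def pct(n):
        return 100.0 * n / total

    return {
        "steps": total,
        "incorrect": (n_incorrect, pct(n_incorrect)),
        "hint": (n_hint, pct(n_hint)),
        "correct": (n_correct, pct(n_correct)),
        # Assumed definition: error rate = incorrect + hint first attempts.
        "error rate": (n_incorrect + n_hint, pct(n_incorrect + n_hint)),
    }


print(first_attempt_breakdown(n_incorrect=2, n_hint=1, n_correct=5))
```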

== Sample Selector ==

==== Sub-Samples ====

* Sub Samples would be helpful. -- John LaPlante, email 7/10/2007
:"A nice solution would be to have sub-samples where one property varies. When I'm doing this analysis, I've changed my sample many times, renaming it sometimes, tweaking it to get variations on the data. The samples are really useful but they could help me a lot more with this kind of experimentation."

==== Filter out students ====

As a researcher, I want to filter out test users (including myself) from my data so that I see less noise in the data.

* Ruth Wylie, July 3, 2008
* You can already filter out test users by giving them user ids with a recognizable prefix (e.g., 'weirdCMUuser_xxx') and then creating a sample that excludes students with a name like that prefix (e.g., 'weirdCMUuser_%', or 'Test_%' for ids that start with 'Test_'). --[[User:Alida|Alida]] 10:29, 4 September 2009 (EDT)
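The Sample Selector's name filter appears to follow SQL LIKE semantics, where '%' matches any run of characters and '_' matches exactly one character. Assuming that is right, the Python sketch below mimics such an exclusion filter; the function names are hypothetical. It also shows a subtlety: in 'Test_%' the underscore is itself a wildcard, so the pattern excludes 'TestX' as well as 'Test_01'.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern ('%' = any run, '_' = any one char)."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def exclude_like(students, pattern):
    """Drop students whose whole name matches the LIKE pattern."""
    regex = like_to_regex(pattern)
    return [s for s in students if not regex.fullmatch(s)]


print(exclude_like(["Test_01", "TestX", "alice", "Tes"], "Test_%"))  # ['alice', 'Tes']
```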

==== Filter by "Class" ====

As a researcher, I want to create samples based on "class" since class is how I've encoded my conditions.

* Maaike Waalkens, July 30, 2010
* This is what made sense using Mathtutor for tutor delivery.
* What other fields are we missing in the Sample Selector?

==== Filter by "Step" ====

As a researcher, I want to create samples based on "step" since I'm only interested in one particular step.

* Mimi McLaughlin, August 17, 2010

==== Filter by "Problem View" ====

As a researcher, I want to exclude transactions from my sample where problem view is greater than 1.

* Michael Yudelson and Summer School participant, August 8, 2012

==== Create Sample Automatically ====

* Would it be possible for me to get a random sample from the 'Bridge to Algebra 2006-2007' dataset of 100 students? I am having trouble looking at the data because it takes too long to load, and my adviser thought that was because the dataset was too large. ~~ DataShop User, 10/19/2010
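A reproducible random draw of 100 students is simple once the dataset's student ids are in hand. Below is a Python sketch; the function name and the optional seed are assumptions. The resulting ids would still need to be turned into a sample filter.

```python
import random


def random_student_sample(student_ids, k=100, seed=None):
    """Pick up to k distinct students at random; a fixed seed makes it repeatable."""
    rng = random.Random(seed)
    unique_ids = sorted(set(student_ids))
    return sorted(rng.sample(unique_ids, min(k, len(unique_ids))))


ids = [f"s{i}" for i in range(500)] * 2   # duplicates, as in a transaction list
picked = random_student_sample(ids, k=100, seed=1)
print(len(picked), picked[:3])
```

Recording the seed with the sample would let the same draw be recreated later.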

== Web Services ==

==== Include transaction custom fields in web services student-step export ====

* "I noticed in the web services guide that cfs (which provides the custom fields) is not yet implemented for step roll-up tables. It is marked [coming soon] in the manual... I could write a workaround to pull in the transactions and lookup the custom fields, but I'd really rather not." -- Phil Pavlik, 4/27/2011

* "It would be best if the CF for step rollup was drawn from the primary transaction for the first attempt that is correct or incorrect. At least that seems like the generally best value. Basically, there may be many cases where the custom field is the same across all transactions for a step. In this case ... you could just use the first one since they are all the same." -- Phil Pavlik, 4/29/2011
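Phil's rule, sketched in Python. For each student-step, it takes the custom-field value from the first transaction whose outcome is correct or incorrect. Transactions are hypothetical (outcome, cf_value) pairs in chronological order. The fallback for hint-only steps is an added assumption.

```python
def step_custom_field(transactions):
    """Pick one custom-field value for a student-step.

    transactions: chronologically ordered (outcome, cf_value) pairs for the step.
    Uses the first transaction whose outcome is correct or incorrect, following
    Phil's suggestion; hint-only steps fall back to the first transaction
    (that fallback is an assumption).
    """
    for outcome, value in transactions:
        if outcome in ("correct", "incorrect"):
            return value
    return transactions[0][1] if transactions else None


print(step_custom_field([("hint", "cf-h"), ("incorrect", "cf-1"), ("correct", "cf-2")]))  # cf-1
```

When the value is constant across a step's transactions, as Phil notes is common, any of these choices gives the same answer.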

==== Use Custom Fields Graphs/Reports ====
In Graphs

As a researcher creating custom fields and assigning values at the transaction level, I want DataShop to perform the aggregation to the step level so that I can do other things with my custom-field variable such as graph it. -- Ryan Baker, mtg w/Alida & Brett, 12/15/2008

Performance Profiler
* Use all discrete variables/custom fields on the left and all the continuous variables on the bottom of the Performance Profiler. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Ad-hoc queries ====

* Allow restricted filtering on steps and transactions as the next web service feature (after CFs), whatever they can filter on in the navigation boxes (User Meeting AAR, December 9, 2009)

==== Sample creation as a web service ====

* Sample creation is still too slow. (User Meeting AAR, December 9, 2009)

==== Ability to directly query the database ====

* I am looking for a unique identifier for student->problem-step->problem-view. I am able to compute this in R, but it would be better to be able to query the DataShop database to get this unique identifier instead of having to recreate DataShop (essentially) in R. -- Ilya Goldin, 5/21/2012

 
----
See completed [[DataShop 3.x Features]] 
See on-going [[DataShop 4.x Features]] 
See prioritized [[DataShop Feature Wish List]]
[[Category:Protected]]
[[Category:DataShop]]

Collected User Requests

2012-08-13T14:59:40Z

Bleber: /* Redesign the Home Page */

== Data Format ==
==== Capture the question prompt and answer choices the student chose from ====
* As a researcher, I want to be able to identify steps based on the question prompt, not the difficult-to-understand step names that come from the selection and action of my tutor, so that I can analyze the data more easily. For multiple-choice questions, I also want to be able to see all of the choices that were available to the student. -- Eli Silk - February 1, 2011 (meeting with Brett and Ross Higashi of the FIRE project)
** Near-term solution is to create a table locally that maps steps to prompts
** Another solution is to ask CTAT team to modify their Flash components so that they log this information as custom fields
** Long-term solution is to make these fields into standard fields in the tutor message format
** In the email thread "Cognitive Tutor Development and Evaluation Requests", there is support for this general idea from Ken Koedinger, Albert Corbett, and Christian Schunn.
** Ken added that "Ideally, we may want to store any images that the student can see and where they reside on the screen ..."

== Data Modeling ==
=== Non-KC Modeling ===

==== Automatic Distillation ====
* As an educational data miner wishing to develop a machine learned model with PSLC data, I would like to be able to automatically distill data features (e.g. custom fields) commonly used in past educational data mining research for a new data set (see, for instance, Baker, Corbett, Roll, & Koedinger, 2008 in UMUAI) -- Ryan Baker, Summer 2008, Startup Memo
** Could be implemented as a plug-in
* Also interested in this feature idea. -- Dan Franklin, Oct 2008

==== Upload model and apply it to new data set ====

* As an EDM researcher, I would like to take a model, expressible as a linear formula on DataShop fields or as a simple code procedure (e.g., Bayesian Knowledge Tracing, which Ryan has code for), and apply it to a new data set, so that I can ... -- Ryan Baker, Sept 2008
* Also interested in this. -- Maxine Eskenazi, Sept 2008
* May work best as a plug-in
** Code to display GUI to choose which data sets to use, calls model code, re-import to DataShop
** Good to have a way to apply many models, as soon as you import a data set
* Phil has an idea that maybe fits within this one. Please move if there's a better category. -- Brett Leber
<blockquote>This [''transaction? kc? --ed.''] relabeling is really mostly about enabling modeling in DataShop right? With this in mind, I think that it is actually a higher priority to have model alternatives in DataShop.... E.g. Investigators should be able to give you chunks of Java code according to a certain specification, and DataShop should be then able to run these over datasets (perhaps after a certain series of QA occurs according to an SOP) when the investigator clicks some button in DataShop.... Obviously this is a much large project than adding columns, but it is also much more important in my mind. ''-- Phil Pavlik, email to Brett on 1/14/2009''</blockquote>
* Examples:
** Example: running gaming detector in multiple tutors and comparing gaming frequencies
** Example: applying Bayesian Knowledge Tracing to a new data set from the same LearnLab
** Example: applying Ben Shih's models to many data sets. Note that Ben Shih should be included in design of this feature; he is interested, and has a lot of good ideas.

==== Add Different Predicted Values ====
* Would also like to add statistics, different predicted values than what LFA produces. -- Ken Koedinger, ET Meeting, 10/10/2007

==== Bayesian Knowledge Tracing ====
* Bayesian Knowledge-Tracing built into DataShop like LFA is. -- Ryan Baker, Startup Memo, Summer 2008
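For reference, the standard four-parameter Bayesian Knowledge Tracing update is sketched in Python. This is the textbook formulation, not Ryan's code or anything DataShop currently runs. The parameter names are illustrative. Fitting the four parameters, for example by grid search or expectation-maximization, is left out.

```python
from dataclasses import dataclass


@dataclass
class BKTParams:
    p_init: float     # P(L0): probability the KC is known before practice
    p_transit: float  # P(T): probability of learning at each opportunity
    p_guess: float    # P(G): probability of a correct answer when not known
    p_slip: float     # P(S): probability of an error when known


def knowledge_trace(params, outcomes):
    """Run BKT over one student's 0/1 first-attempt outcomes on one KC.

    Returns the predicted P(correct) before each opportunity and the final P(known).
    """
    p_known = params.p_init
    predictions = []
    for correct in outcomes:
        p_correct = p_known * (1 - params.p_slip) + (1 - p_known) * params.p_guess
        predictions.append(p_correct)
        # Bayesian update of P(known) given the observed outcome.
        if correct:
            posterior = p_known * (1 - params.p_slip) / p_correct
        else:
            posterior = p_known * params.p_slip / (1 - p_correct)
        # Chance to learn the KC after this opportunity.
        p_known = posterior + (1 - posterior) * params.p_transit
    return predictions, p_known


params = BKTParams(p_init=0.5, p_transit=0.1, p_guess=0.2, p_slip=0.1)
print(knowledge_trace(params, [0, 1, 1]))
```

Extreme parameters that make P(correct) exactly 0 or 1 would divide by zero, so a built-in version would need to constrain them.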

==== Richer statistics for KC modeling ====

* In addition to the model stats and estimates generated for learning factor models, we should also create difficulty factor models (i.e., ones without a "Slope" parameter). The latter is particularly relevant for the Unique-Step model, where the slope parameter is meaningless (but still counts against the BIC value). -- Ken Koedinger, Email "new feature request", 1/22/2009
** Relatedly, we should report significance values on the Slope parameters -- that is, when is the Slope significantly different from 0.
** The KC models page perhaps should also report the log-likelihood and number of parameters (in addition to BIC) and leave out AIC. We might also consider other metrics of model generality, like the "adjusted R2" (if I have this name right -- Joe Beck mentioned in the Assistments meeting yesterday).
** These changes will be part of meeting the CMDM goal of improvement in (or at least demonstrate acceptability of) the cognitive models in 90+ units in our LearnLab courses (or affiliates).
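For reference, here is the commonly published form of the Additive Factor Model (AFM) behind LFA, with and without the per-KC slope, plus the AIC and BIC mentioned above. The notation is our assumption, and DataShop's exact parameterization may differ. In the formulas:
* p_ij is the probability that student i gets step j right on the first attempt.
* θ_i is student proficiency.
* q_jk is 1 if step j is tagged with KC k, else 0.
* β_k is KC easiness.
* γ_k is the KC learning rate (slope).
* T_ik is the number of prior opportunities student i has had on KC k.
* L is the maximized likelihood, P is the number of parameters, and N is the number of observations.

```latex
\begin{aligned}
\text{with slope:}\quad \ln\frac{p_{ij}}{1-p_{ij}} &= \theta_i + \sum_k q_{jk}\,\beta_k + \sum_k q_{jk}\,\gamma_k\,T_{ik} \\
\text{no slope:}\quad \ln\frac{p_{ij}}{1-p_{ij}} &= \theta_i + \sum_k q_{jk}\,\beta_k \\
\mathrm{AIC} &= -2\ln L + 2P, \qquad \mathrm{BIC} = -2\ln L + P\ln N
\end{aligned}
```

Each γ_k adds one parameter. With K KCs, the slope model therefore pays K ln N more in BIC than the no-slope model at the same likelihood, even when the slopes carry no information.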

=== KC Modeling ===

==== LFA/AFM: Check if enough memory using formula ====
* The LFA/AFM code could calculate how much RAM would be needed to run the algorithm on a given skill model using the formula provided by Hao. This formula is based on the number of transactions, number of students and number of skills. Right now it will not schedule itself to run on a model with over 300 skills, though there is a manual override. [[User:Alida|Alida]] 13:35, 29 November 2010 (EST)

==== Create KC Models through Web Services ====
* For John Stamper's CMDM project, it would be nice to automatically update KC Models through web services. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Automatically discovering new KC model ====

* Would it be possible to run some code (perhaps Hao's KC model selection code, perhaps something else generated by the CMDM thrust) to find a new, better KC model? -- Vincent Aleven, Sept 2008
* As a learning sciences researcher, I would like DataShop to discover a new/better KC model for me.
* Could be done as a plug-in

==== Generate new KC Models with LFA ====
* Not sure who asked for this.
* It would be nice to generate new KC Models with Hao's LFA code
* Would need to specify factors.
* Ideas on where this could run?
** On a separate server? Request it to be run, specify duration. Have separate server queue up requests, email user when done.
** In Java Applet on client machine? -- Phil Pavlik

==== Same Skill Twice on Same Step ====
* Would like to be able to apply the same skill to a step twice during a KC Model Import. -- Ken Koedinger, email, 2/4/2009

==== Save KC Model Import Files ====
* KC Model Import - save the file used to create the KCMs in case we need to recreate them. -- Ken Koedinger, email 3/4/2009

==== Log Likelihood and MAD ====
* Log Likelihood, MAD (mean absolute deviation) problem, MAD step (store and show) -- Hao Cen
** This is a variation on "Richer statistics for KC modeling" above. Probably should be merged. - Ken Koedinger
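A sketch of these statistics in Python. It assumes predictions are probabilities of a correct first attempt and outcomes are coded 1 (correct) or 0. The "MAD problem" reading used here, averaging within each problem first and then taking the MAD across problems, is a guess at what was meant.

```python
import math
from collections import defaultdict


def log_likelihood(predicted, observed):
    """Bernoulli log-likelihood of 0/1 outcomes under predicted P(correct)."""
    return sum(math.log(p) if y == 1 else math.log(1 - p)
               for p, y in zip(predicted, observed))


def mad(predicted, observed):
    """Mean absolute deviation between predictions and 0/1 outcomes."""
    return sum(abs(p - y) for p, y in zip(predicted, observed)) / len(observed)


def mad_by_group(groups, predicted, observed):
    """MAD after averaging predictions and outcomes within each group
    (e.g. per problem); this grouping is an assumed reading of "MAD problem"."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for g, p, y in zip(groups, predicted, observed):
        s = sums[g]
        s[0] += p
        s[1] += y
        s[2] += 1
    means = [(sp / n, sy / n) for sp, sy, n in sums.values()]
    return mad([m[0] for m in means], [m[1] for m in means])


pred, obs, problems = [0.8, 0.5, 0.9], [1, 0, 1], ["p1", "p1", "p2"]
print(log_likelihood(pred, obs), mad(pred, obs), mad_by_group(problems, pred, obs))
```

A prediction of exactly 0 or 1 that turns out wrong makes the log-likelihood undefined, so stored predictions are usually clipped slightly away from the extremes.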

==== Better naming for KCs in auto-generated Unique-Step KCM ====

As a researcher, I want the KCs in the Unique-Step model to have better names than KC1, KC2, etc, so that I can easily tell which generated KCs go with which unique step.

* Hui Cheng (Email 1/20/2009), Ken Koedinger (Email 1/22/2009)
* Could you put this in your new feature request list: could the “Unique-step model” be better labeled than just “KC1”, “KC2”, etc.? For example, for Assistments data, you could use part of the “Step Name”. -- Hui Cheng, Email 1/20/2009
* But, anything is better than "KC". -- Ken Koedinger, email, 1/22/2009
<blockquote>A simple alternative, that preserves uniqueness and addresses length, is to concatenate: 1) the first K letters of the step name 2) a unique numerical increment (same as the "3" in "KC3"). Note that (2) guarantees uniqueness just as it does in the current "KC<num>" scheme. Or perhaps better given that step names are often scoped within problems, is to concatenate: 1) the first L letters of the problem name 2) the first M letters of the step name 3) a unique numerical increment (just like the "3" in "KC3") I think K or L+M should be as big as possible without making the KC names indistinguishable (because they run off the right margin) in the KC list on the Learning Curve and other pages. </blockquote>
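Ken's second naming scheme, sketched in Python. Each name joins the first L characters of the problem name, the first M characters of the step name, and a unique increment. The "_" separator and the default lengths are hypothetical choices. In practice they would be tuned to the width of the KC list.

```python
def unique_step_kc_names(problem_step_pairs, l=8, m=12, sep="_"):
    """Name Unique-Step KCs as <first l chars of problem><sep><first m chars
    of step><sep><increment>. The increment keeps names unique, as in "KC3"."""
    return [f"{problem[:l]}{sep}{step[:m]}{sep}{i}"
            for i, (problem, step) in enumerate(problem_step_pairs, start=1)]


pairs = [("FRACTION-ADD-1", "numerator-answer"), ("FRACTION-ADD-1", "denominator-answer")]
print(unique_step_kc_names(pairs, l=4, m=5))  # ['FRAC_numer_1', 'FRAC_denom_2']
```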

==== Visualize Learning Curve Split ====
* Be able to visualize a learning curve split into 2 based on a specification of a subset of problems. -- Albert Corbett, Math CCM, November 2008
* Harder: make me a new KCM out of it. -- Ken Koedinger, Team Mtg, Dec 5, 2008, while trying to describe Albert's request to Alida
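A sketch of the split in Python. It tallies first-attempt errors by opportunity separately for problems inside and outside the specified subset, giving two curves to plot together. Observations are hypothetical (problem, opportunity, is_error) tuples for one KC. The sketch keeps the KC's original opportunity numbering. Renumbering opportunities within each half would be closer to the "new KCM" version.

```python
from collections import defaultdict


def split_learning_curve(observations, subset):
    """observations: (problem, opportunity, is_error) tuples for one KC.
    Returns (curve_in_subset, curve_outside); each maps opportunity -> error rate."""
    tallies = {True: defaultdict(lambda: [0, 0]), False: defaultdict(lambda: [0, 0])}
    for problem, opportunity, is_error in observations:
        tally = tallies[problem in subset][opportunity]
        tally[0] += int(is_error)  # errors at this opportunity
        tally[1] += 1              # observations at this opportunity

    def to_curve(counts):
        return {opp: errs / n for opp, (errs, n) in sorted(counts.items())}

    return to_curve(tallies[True]), to_curve(tallies[False])


obs = [("p1", 1, True), ("p1", 1, False), ("p2", 1, True), ("p1", 2, False), ("p2", 2, True)]
print(split_learning_curve(obs, {"p1"}))  # ({1: 0.5, 2: 0.0}, {1: 1.0, 2: 1.0})
```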

==== Statistical Significance ====
* Can DataShop determine if the difference between conditions or learning curves is statistically significant? -- general theme at workshop, probably mentioned by Bob Hausmann in his talk, Winter Workshop 1/23/2008
** Can't do it yet in DataShop, but I can show you how to do it in R (or SPSS...) after you've exported the data. Export the "student-step rollup" rather than the whole transaction table. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** This comment perhaps belongs (exists?) elsewhere: The current student-step rollup export (really, all exports) should be such that I can immediately load it into R (and other packages) without error. Currently errors occur, for instance, because there are "#" characters in the variable names in the student-step rollup. This is currently a road-block for helping folks like Bob do the analyses they want to do. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** See [[Condition in Student-Step Rollup]] -- [[User:Alida|Alida]] 10:35, 17 September 2009 (EDT)
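One way an export (or a user-side preprocessor) could make headers safe for R is sketched below in Python. It is only a rough analogue of R's make.names(unique = TRUE), not an exact reimplementation; for example, R's reserved words are not handled. The example headers are hypothetical.

```python
import re

_VALID_START = re.compile(r"[A-Za-z]|\.(?![0-9])")


def r_friendly_names(headers):
    """Make column headers safe to use as R variable names.

    Prefixes "X" when a header cannot start an R name, replaces any character
    other than a letter, digit, "." or "_" with ".", and de-duplicates
    repeats with ".1", ".2", ...
    """
    seen = {}
    result = []
    for header in headers:
        name = header if _VALID_START.match(header) else "X" + header
        name = re.sub(r"[^A-Za-z0-9._]", ".", name)
        if name in seen:
            seen[name] += 1
            name = f"{name}.{seen[name]}"
        else:
            seen[name] = 0
        result.append(name)
    return result


print(r_friendly_names(["Anon Student Id", "# Steps", "KC (Default)", "KC (Default)", "2nd"]))
```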

==== Split and Merge Skills ====

* Hand searches through a p-matrix for a dataset to split and merge skills. (Pie in the Sky) -- Ken Koedinger, Team Mtg, 02/22/2008

==== Notes on new KCMs ====

* It would be good if I could add a note to a KC model that was newly imported. -- Noboru Matsuda, email, Nov 19, 2009

==== Display Learning Curves Grouped by Interestingness ====

* The page displaying all the learning curves today seems to be alphabetically sorted by KC name, which is not necessarily meaningful.
* An alternative presentation is to group the curves into 4 sets, breaking up the page. Set 1 has curves that contain significant spikes, and therefore seem to be "low-hanging fruit" for purposes of breaking up into KCs. Set 2 has curves with few spikes, but they have a long X axis, suggesting that students are presented with too many opportunities to acquire those KCs. Set 3 has the "good" curves, i.e., nicely decreasing curves that are not too long. Set 4 is "other". -- [http://www.pitt.edu/~goldin Ilya Goldin] 7 December 2009
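Ilya's four groups could be approximated with simple heuristics, sketched here in Python. Everything quantitative is an assumption:
* A "spike" is a rise of more than 0.2 in error rate from one opportunity to the next.
* "Long" means more than 20 opportunities.
* "Good" means the last point is below the first.

The thresholds would need tuning against real curves.

```python
def classify_curve(error_rates, spike_threshold=0.2, min_spikes=2, long_length=20):
    """Assign a learning curve (error rate per opportunity) to one of Ilya's
    four groups. All thresholds are hypothetical and would need tuning."""
    spikes = sum(1 for prev, cur in zip(error_rates, error_rates[1:])
                 if cur - prev > spike_threshold)
    if spikes >= min_spikes:
        return "spiky"   # set 1: candidates for splitting into more KCs
    if len(error_rates) > long_length:
        return "long"    # set 2: possibly too many opportunities
    if len(error_rates) >= 2 and error_rates[-1] < error_rates[0]:
        return "good"    # set 3: decreasing and not too long
    return "other"       # set 4


def group_curves(curves, **thresholds):
    """curves: {kc_name: [error rates]}. Returns {group: [kc names]}."""
    groups = {"spiky": [], "long": [], "good": [], "other": []}
    for kc, rates in sorted(curves.items()):
        groups[classify_curve(rates, **thresholds)].append(kc)
    return groups


curves = {
    "kcA": [0.5, 0.2, 0.6, 0.3, 0.7],
    "kcB": [0.6, 0.4, 0.3, 0.2],
    "kcC": [0.2, 0.3, 0.35],
    "kcD": [0.5 - 0.01 * i for i in range(30)],
}
print(group_curves(curves))
```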

==== Order KC models according to AIC====
* This is based on results from multiple dataset analyses that compared AIC, BIC and loglikelihood to cross validation RMSE. -- Mimi McLaughlin, 2/9/2011
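A sketch of ordering KC models by AIC in Python, using AIC = 2P - 2 ln L, where P is the number of parameters and L the maximized likelihood. Lower is better. The class and field names are hypothetical, and the numbers in the example are made up for illustration, not results from any dataset.

```python
from dataclasses import dataclass


@dataclass
class KCModelFit:
    """Fit summary for one KC model; the field names are hypothetical."""
    name: str
    log_likelihood: float
    n_params: int

    @property
    def aic(self):
        # AIC = 2P - 2 ln L; lower is better.
        return 2 * self.n_params - 2 * self.log_likelihood


def order_by_aic(models):
    """Return the models sorted from best (lowest AIC) to worst."""
    return sorted(models, key=lambda m: m.aic)


fits = [
    KCModelFit("Default", log_likelihood=-1000.0, n_params=50),
    KCModelFit("Unique-step", log_likelihood=-950.0, n_params=120),
    KCModelFit("Split-KC", log_likelihood=-960.0, n_params=60),
]
print([m.name for m in order_by_aic(fits)])  # ['Split-KC', 'Default', 'Unique-step']
```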

====Use log of opportunity count for AIC and BIC calculations====
* We compared using the log of opportunity count to whole number opportunity count in multiple datasets. We found the results for log of opportunity count to be consistently better, though small. -- Mimi McLaughlin, 2/9/2011
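In AFM-style notation, the log variant replaces the whole-number opportunity term γ_k T_ik with a log term. Here p_ij is the probability that student i gets step j right, θ_i is student proficiency, q_jk marks whether step j uses KC k, β_k is KC easiness, γ_k is the KC learning rate, and T_ik counts student i's prior opportunities on KC k. Writing ln(1 + T_ik) keeps a first opportunity (T_ik = 0) defined. If the analysis counted opportunities from 1 and used ln T directly, the two forms are the same up to that shift.

```latex
\ln\frac{p_{ij}}{1-p_{ij}} = \theta_i + \sum_k q_{jk}\,\beta_k + \sum_k q_{jk}\,\gamma_k \ln\left(1 + T_{ik}\right)
```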

== Developer Requests ==

==== Store Converted Date and Converter Info ====

As a DataShop administrator, I'd like to see the converter information (version and date) stored in the database, so that I do not have to store that data manually in the Additional Notes field each time I load a dataset. -- Kyle, 8/5/2008

* DTD new fields:
** Store conversion and converter information in database (anything else?)
*** conversion_time
*** converter_info
* Email from Octav, 10/5/2007

<blockquote>I see there's also a note about the converter version in the Dataset Info. Which is good, but it seems it's taken from the directory name when I submitted the set. I don't know how reliable that is. :-) It would be better if it's taken from the new converter info field.</blockquote>

==== Convert from XML to tabbed-delimited format ====

* If the users agree that export format is valuable, then maybe if they could convert from XML to export format to see data in Excel, could look at Selection column and see blanks more easily. -- Jonathan Sewall, ET Mtg, 10/10/2007

==== Plug-ins (general issues) ====

* Please please support Ruby on Rails. -- Ben Shih, December 2008

==== Create UI to grant DataShop user roles ====

* Granting roles without a UI is already tedious.
* Alida, User Meeting AAR on December 9, 2009

== Help ==

==== Specialize Label of Help Button ====
* Since this help is better than in most applications, it should say more than just 'Help'. -- Ken Koedinger & Ryan Baker, Team Mtg, May 30, 2008
** Ideas:
*** 'Page Help'
*** 'Help with this page'
*** 'Help with Learning Curve page' (Ken's favorite)
*** 'Help with this tool'

== Home Page ==

==== Redesign the Home Page ====

* In the menu of data sets at the top, include the N (=20 or as many as fit on the screen?) data sets that I have visited starting with the ones I've visited most recently. --[[User:Koedinger|Koedinger]] 16:16, 16 September 2009 (EDT)
** As of today, the menu shows the last 10 data sets, most recently visited at the top. I think the feature is good enough, but let's ask Ken --[[User:Bleber|Bleber]] 10:56, 6 August 2010 (EDT)
* There needs to be a better ordering for the datasets (DS364)
* Maybe a search to filter the list of datasets since the list is so long. -- Brett Leber, 6/14/2007
* Going back to the home page always goes to 'My Datasets' (DS313)
* Maybe show more high-level statistics on this page, such as the number of transactions [done Jan 2012], students, and skill models
** "I am particularly looking for data from courses that contain large numbers of students (e.g., thousands or more). Does the Datashop have any such data? I perused the datasets but couldn't tell from the list how many students each course contains." - Kate Forbes-Riley, email on 8/9/2012
* Allow users to post and share project documentation (files, papers, other meta-info) -- Ruth Wylie, suggested during meeting on 8/4/2010. She had a file that is relevant to multiple datasets in her project.
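The recent-datasets menu behavior noted above (the last 10 data sets, with the most recently visited at the top) is a capped most-recently-used list. The sketch below keeps the list in memory. The class name and the choice to key entries by dataset ID are illustrative assumptions, and a real implementation would store the list per user.

```python
from collections import OrderedDict


class RecentDatasets:
    """Most-recently-visited datasets, newest first, capped at `limit`."""

    def __init__(self, limit: int = 10):
        self.limit = limit
        self._visits = OrderedDict()  # dataset_id -> name, oldest first

    def visit(self, dataset_id: int, name: str) -> None:
        # Re-visiting a dataset moves it to the most-recent end.
        self._visits.pop(dataset_id, None)
        self._visits[dataset_id] = name
        while len(self._visits) > self.limit:
            self._visits.popitem(last=False)  # drop the least-recent visit

    def menu(self):
        """Dataset names for the menu, most recently visited first."""
        return list(self._visits.values())[::-1]
```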

==== News Feed ====
* Bruce McLaren, email, 4/20/2012
**I wonder whether it would be possible to have a scrolling "News" feed somewhere on the DataShop site that would keep researchers informed about what is happening with the DataShop. For instance, the "News" feed could always show the last time a data conversion finished, announce an upcoming workshop, or inform everyone of critical DataShop issues, such as the fact that the DataShop had a server go down recently (which I know caused you guys a lot of headaches - but which I didn't hear about until a couple weeks after it happened). This kind of thing could be a great communication tool and, as an added bonus for DataShop personnel, avoid lots of email with questions like "where is my data?" or "when is the next conversion going to finish?" Perhaps it would even be possible to have the data conversion routine automatically update the "News" feed each time it begins and/or finishes processing?
** Take it with a grain of salt -- I know you have lots of things on your plate -- but I have been in the situation often, especially just before and during my studies, where I wasn't sure what was going on with DataShop conversions and issues and had to track down someone -- typically Alida -- to figure things out. I have the advantage of sitting right next door to Alida, but I wonder how many other researchers within the PSLC, those not in close proximity to Alida, deal with this issue of not really knowing what is happening with the DataShop at any given time.
* Jonathan Sewall also requested a page that shows the status of the log conversion process, including how much data was processed and for which datasets

== Import ==
== Miscellaneous ==

==== Analyses by LearnLab ====

* Organize data by LearnLab, not by "data set". -- Ryan Baker, Aug 2008
* Also: Bob Hausmann, Sep 2008; Maxine Eskenazi, Sep 2008
* Essentially, current data sets become samples, but the top-level unit is the LearnLab. You can take every data set in a LearnLab together as a sample.
* This implies being able to run analyses across data sets, to export multiple data sets together, and to create multi-data-set samples.
* As a user of DataShop, I would like to look at learning curves for all Algebra data together (for example), or export all Algebra data
* Important long-term, but is a lot of work -- in particular, we need to solve scalability issues first.

==== Save Settings Between Sessions ====
* It would be useful if DataShop could save settings between sessions. -- Bob Hausmann, User Meeting, 2/1/2008
** "I do a lot of redoing the same steps" (e.g., set cutoffs, select a KC model, select students).

==== Multiple steps per transaction ====

* Needed so that we do not have to create multiple transactions for the same actual action for Andes logs. -- Kurt van Lehn, Feb 2007

==== Demographic data ====

* This has been mentioned by NSF visitors, AB, ESL, and some researchers.
* Also mentioned at Winter Workshop 1/23/2008.
** Derek/Sue-mei: Student background information not in DataShop. Would like to see a student or set of students from a particular demographic, and view them across datasets!
* Note that Gail added demographic data to the Additional Notes field on the Dataset Info page for many datasets. The idea here is to put that data into the database somewhere.

==== Single Sign On ====
* Michael Bett, email, 10/8/2007
* It would be nice if the following services have a single login account/password:
*# Theory Wiki
*# Learnlab.org
*# ESL's OSS
*# DataShop

==== Reveal unanonymized student IDs ====

As a researcher/PI performing research assistant tasks, I want to easily see the unanonymized student IDs of subjects in DataShop so that I can email my subjects to tell them when to use my system.

* Ruth Wylie, July 3, 2008
* As the honest broker of the PSLC data, I have promised to not reveal the student IDs and to protect the identity of the students. This is part of the DataShop IRB. Therefore, I do not see this request as possible. [[User:Alida|Alida]] 09:53, 4 September 2009 (EDT)
** At some point, some coordination would be good with OLI's Digital Dashboard project that Marsha Lovett (and the OLI team and sometimes me) is working on. This is meant to provide usage information more quickly to instructors. It could also perhaps be used by researchers (with the right IRB rights) in situations like Ruth's.

==== Knight Timeline ====
* Developed by Andrea Knight, 2004

==== Buggy Skills ====
* Ken Koedinger, prototype walkthrough 9/11/2006

==== Confusion Matrix ====
* Brian MacWhinney, prototype walkthrough 9/11/2006

== Navigation Bar ==
==== Filter KCs by Name ====

As a researcher working with KCs, I want to filter KCs based on their names, so that I can...
* Vincent Aleven, Email, 2/3/2008
* "Since Alida said you cannot have two mechanisms for putting together your KC set (i.e., cannot have both the selecting-by-clicking and selecting-by-filtering), I would probably opt for the latter."
* Alida: I thought Vincent mentioned that he'd like to select which KCs are in a set by filtering on the name. Example: Include KCs with '*reason*' in the name and exclude KCs with '*given*' in the name.
* This could be an addition to our v3.0 KC-selection mechanism--filter by name.
* Vincent, Email, 5/6/2009: Expressed another need for this feature. Alternatively, we could allow a wider area and a longer list so that more items can be checked at once. The number of characters we currently show is not enough, because in many cases those leading characters are identical across many of the skills. Reference data set: Geometry CWCTC 2005-2006

* Status: Design Started
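Below is a sketch of the include/exclude name filter from Alida's example. It uses shell-style wildcards from Python's `fnmatch` module. `filter_kcs` and the KC names in the test are hypothetical.

```python
from fnmatch import fnmatchcase


def filter_kcs(kc_names, include=("*",), exclude=()):
    """Keep KCs that match any include pattern and no exclude pattern.

    Patterns are shell-style wildcards such as '*reason*'; matching is
    case-sensitive (fnmatchcase does not normalize case by platform).
    """
    return [
        name for name in kc_names
        if any(fnmatchcase(name, p) for p in include)
        and not any(fnmatchcase(name, p) for p in exclude)
    ]
```

For example, `filter_kcs(kcs, include=["*reason*"], exclude=["*given*"])` keeps KCs whose names contain "reason" but not "given".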

==== Facebook-style KC Selection ====

As a researcher working with KCs, I want to select KCs based on the learning curve thumbnail, so that I can see quickly which ones I'm interested in exploring more deeply.

* Feature already designed for v3.0, not implemented due to time constraints.
* Agreed this would be really useful. -- Kirsten Butcher, User Mtg, 1/31/2008
* Status: Guesstimate: 20 days, need to revisit requirements document

==== Feedback after clicking a large sample on a large dataset ====

As a user of DataShop (first-time or not), I want some feedback and the ability to cancel after I do something that might take a long time (e.g., clicking "All Data" on a large dataset) so that I do not get stuck.

* Part of the Susan Goldman story
* After clicking a sample for a large dataset, there is no "Loading..." text, no feedback that the click was even registered by the app (besides the sometimes busy cursor and small browser "loading" text), nor the ability to cancel the action. 
* We will always have similar problems even if performance is improved, so providing feedback and the opportunity to cancel is critical.

==== Save Button in Problem Navigation Box ====

* Save buttons in the sidebar. -- Ken Koedinger, Mtg 2006
** Could also put one in the Problem selection box in the sidebar.

==== Make Nav Bar Wider ====
* Make the Sample name and description fields much wider. -- John LaPlante, email 7/10/2007

== New Visualizations/Reports ==

==== Student-KC Rollup ====
As a researcher, I want to see KCs rolled up by student, so that I can ...

* Vincent Aleven, User Mtg, 1/29/2008
** By Student-KC would be more useful than by Student-Problem
** Example: the number of steps on which the student asked for a hint or made an error, or the proportion of steps on which the student needed help
** How often the bottom-out hint occurs
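Below is a sketch of a Student-KC rollup built from student-step rows. It assumes that each row records the first-attempt outcome and a hypothetical flag indicating whether the student reached the bottom-out hint. `StepRow` and its field names are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StepRow:
    student: str
    kc: str
    first_attempt: str    # "correct", "incorrect" or "hint"
    saw_bottom_out: bool  # hypothetical: student reached the bottom-out hint


def student_kc_rollup(rows):
    """Aggregate student-step rows into one summary per (student, KC)."""
    totals = defaultdict(lambda: {"steps": 0, "error_or_hint_steps": 0, "bottom_out": 0})
    for r in rows:
        t = totals[(r.student, r.kc)]
        t["steps"] += 1
        if r.first_attempt in ("incorrect", "hint"):
            t["error_or_hint_steps"] += 1
        if r.saw_bottom_out:
            t["bottom_out"] += 1
    # Add the proportion of steps on which the student needed help.
    return {
        key: {**t, "help_proportion": t["error_or_hint_steps"] / t["steps"]}
        for key, t in totals.items()
    }
```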

==== Instructor Reports ====
* Phil said he received a lot of positive reactions to providing reports on units for instructors. -- Phil Pavlik, ET Mtg 10/10/2007

==== Manage Authorizations/Projects Page ====
* Lisa Anthony, email 10/23/2007
* Allow PI to change permissions on the datasets.
:"Actually, I couldn't see how to change permissions on the datasets from the website. Is this possible? If not, it might be a nice feature..."

==== Calculate Time Spent on Different Study Activities ====

As a researcher, I want to know how much time, on average, students spend on study activities, so that I can ...

* Bruce McLaren, Email, 4/7/2009
<blockquote>
For my most recent stoich study, Shawn and I are interested in calculating timing information such as: (a) how long students spent, on average, working on individual tutors (b) how long students spent, on average, on all items in an intervention (c) how long students worked, on average, on post-tests. 

Timing information is very commonly required for studies, and can be calculated from DataShop logs relatively easily, so even if we don't have it, might be worth considering. (And we don't want to re-invent the wheel, if you already have it or are planning it...)
</blockquote>
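Timing of the kind Bruce describes can be approximated from transaction timestamps. The minimal sketch below assumes that time on a problem (for example, one tutor) is the span from a student's first transaction on it to the student's last. It therefore ignores time spent before the first logged action and does not subtract idle gaps, so the result is a rough estimate rather than DataShop's step-duration algorithm.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def average_time_per_problem(transactions):
    """Average seconds per problem (e.g. per tutor), across students.

    transactions: iterable of (student, problem, datetime) tuples.
    Time on a problem = last minus first transaction for that student.
    """
    spans = defaultdict(lambda: [None, None])  # (student, problem) -> [first, last]
    for student, problem, ts in transactions:
        span = spans[(student, problem)]
        span[0] = ts if span[0] is None else min(span[0], ts)
        span[1] = ts if span[1] is None else max(span[1], ts)

    per_problem = defaultdict(list)
    for (student, problem), (first, last) in spans.items():
        per_problem[problem].append((last - first).total_seconds())
    return {problem: mean(secs) for problem, secs in per_problem.items()}
```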

==== Incorrect Step Duration and Hint Step Duration ====

As a researcher, I want to be able to see total step duration if the student's first attempt was an incorrect attempt, and total step duration if the student's first attempt was a hint request, so that I can do some analyses that I can't do with "Error Step Duration".

* Bob Hausmann, email, 11/11/2008.
* Updated title and story with 'step duration' instead of 'time'. -- [[User:Alida|Alida]] 10:36, 4 September 2009 (EDT)
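Below is a sketch of the two requested columns. It assumes that "Error Step Duration" is the step duration whenever the first attempt was either an incorrect attempt or a hint request, and blank otherwise. The function splits that value according to which of the two the first attempt was. The function name is hypothetical.

```python
from typing import Optional, Tuple


def split_error_duration(first_attempt: str,
                         step_duration: Optional[float]
                         ) -> Tuple[Optional[float], Optional[float]]:
    """Return (incorrect_step_duration, hint_step_duration) for one step.

    None stands for a blank cell in the export.
    """
    if first_attempt == "incorrect":
        return step_duration, None
    if first_attempt == "hint":
        return None, step_duration
    return None, None
```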

==== Grading ====

[[Grading]]

==== Display number of steps and number of observations for skills ====
* How and where to be determined by developers. -- Ken (entered by Mimi)

== Reports ==
=== Dataset Info ===

==== Pointers to Hard-copy Data ====
* Brett van de Sande, NSF Site Visit, 5/28/2008
* Pointers to hard-copy data such as paper tests and/or homework.  Include contact information.  It doesn't seem to make sense to scan a whole filing cabinet of paper if no one wants to look at it.  And any secondary researchers don't know about the filing cabinet to ask for it.

==== Sort Problem Breakdown Table ====

* Would like the ability to sort the table by clicking on the column headings of the Problem Breakdown Table on the Dataset Info Tab. -- Bruce McLaren, User Mtg, 11/5/2007

==== Rename dataset ====

As a researcher, I want to rename my dataset so that it makes more sense to other people. I also want to make sure the dataset doesn't become polluted later by new data not associated with my study.
* Ruth Wylie, July 3, 2008
* There are reasons she would want to do this (current name is worthless, other researchers might try her tutor and pollute her data) but also reasons for not doing it (log more data later).
* There are risks in changing a dataset name that might not be apparent, for example, if you later want new data to go into the same dataset. [[User:Alida|Alida]] 10:27, 4 September 2009 (EDT)

==== Average time per problem, average number of problems, total number of sections ====

In addition to showing student hours per dataset, it would be useful to know the average time spent per problem, average number of problems completed, and the total number of sections.
* Noboru Matsuda, June 18, 2010

=== Error Report ===

==== View By Student ====

* Would like to see what a couple of students saw in the feedback. -- Marsha Lovett, 10/11/2007

==== Export ====
* I would like the ability to export this data. -- John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
* Also interested in this feature idea. -- Bruce McLaren, User Mtg, 11/5/2007

==== Sort ====

* Was planned for but not implemented in v2.1 (estimated to be a 4 day task)
** By Correctness %, starting with the least correct
** By Hints %
** Step (or KC if view by KC)
** Number of Students
* Ability to sort problems by their average experienced position within the curriculum. -- Ken Koedinger, 02/16/2007
** Which problem students most often experienced first, then the one experienced second most often, and so on.
* Order steps by the order they typically are executed by students. -- Ken Koedinger, email 11/7/2008
:"Searching through the steps in a problem to get a sense of what is going on is currently hard because the steps are ordered alphabetically, not by the order in which most students did them. While not all students do all steps in the same order, there is some regularity there. It would be quite useful if the steps could be ordered in a 'typical order'. This could be accomplished by using the time stamps (of the first (correct?) transaction?) for each step to determine the rank order of each step for each student in a problem and then for each problem average the rank order of each step across all students. Then arrange the steps in the Error Report by their average rank order -- that is, roughly speaking, the step that is most likely to be first across students (closest on average to first) goes first, the step with the next lowest rank goes next, etc."
* The capability to count the number of errors of each message type and sort in different ways, for instance by all errors that had no messages. -- Bruce McLaren, email 10/22/2007
:"This is an error analysis I recently did in Excel, using pivot tables, that might be handy if in the DataShop. This one is very important for tutors because the errors that occur most frequently, yet don't elicit messages to the students, are good candidates to become errors with feedback."
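Ken's proposed "typical order" can be sketched directly from his description. For each student and problem, rank the steps by the time of their first transaction. Then average each step's rank across students and sort the steps by that average. The email leaves open whether only correct transactions should count. This sketch uses the first transaction of any outcome and breaks ties alphabetically. The input tuple layout is an assumption.

```python
from collections import defaultdict
from statistics import mean


def typical_step_order(transactions):
    """Order each problem's steps by their average per-student rank.

    transactions: iterable of (student, problem, step, timestamp) tuples;
    timestamps only need to be mutually comparable.
    Returns {problem: [steps, typically-first step first]}.
    """
    first_seen = {}  # (student, problem, step) -> earliest timestamp
    for student, problem, step, ts in transactions:
        key = (student, problem, step)
        if key not in first_seen or ts < first_seen[key]:
            first_seen[key] = ts

    by_student_problem = defaultdict(list)
    for (student, problem, step), ts in first_seen.items():
        by_student_problem[(student, problem)].append((ts, step))

    ranks = defaultdict(lambda: defaultdict(list))  # problem -> step -> ranks
    for (student, problem), seen in by_student_problem.items():
        for rank, (_, step) in enumerate(sorted(seen), start=1):
            ranks[problem][step].append(rank)

    # Lowest average rank first; ties broken by step name.
    return {
        problem: sorted(steps, key=lambda s: (mean(steps[s]), s))
        for problem, steps in ranks.items()
    }
```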

==== Show more than 500 problems ====

In the Error Report, can we see more than 500 problems? See the dataset "Cog Model Discovery Experiment Spring 2010", which shows "2/2888 selected. (Showing the first 500)". -- Ken via email on 2/15/2011

=== Export ===

==== Include Step Start Time in transaction format ====

* The transaction-level export already includes problem start time; could it also include step start time? I can easily compute it myself, but it seems there's a specific algorithm that the student-step rollup uses, and it might be nice to include the same value here. --Ilya Goldin, email on 01/16/2012

==== Last attempt on step? column for transaction format ====

* Include a new column in the transaction format that indicates whether the row is the student's last attempt on a step (value 0 or 1). This would help researchers who grade data. --Vincent Aleven, CTAT mtg 11/5/2010
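Below is a sketch of how the proposed 0/1 column could be computed from a transaction list. The field names are hypothetical. A step is identified here by (student, problem, step), and the sketch does not decide whether repeated problem views should count as separate steps.

```python
def mark_last_attempts(transactions):
    """Return copies of the transactions with an 'is_last_attempt' 0/1 field.

    transactions: list of dicts with 'student', 'problem', 'step' and a
    comparable 'time' (hypothetical field names). Original order is kept.
    """
    last_index = {}  # (student, problem, step) -> index of latest transaction
    for i, tx in enumerate(transactions):
        key = (tx["student"], tx["problem"], tx["step"])
        if key not in last_index or tx["time"] >= transactions[last_index[key]]["time"]:
            last_index[key] = i
    flagged = set(last_index.values())
    return [{**tx, "is_last_attempt": int(i in flagged)}
            for i, tx in enumerate(transactions)]
```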

==== Elapsed Time ====

* Include the elapsed time in the preview and the transaction export. It is more valuable than the transaction time, which is an absolute reference. Both could be kept. --Ken Koedinger, Team Mtg 04/18/2008
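Below is a sketch of the elapsed-time computation. It assumes that elapsed time means seconds since the problem start time and that both values use the same timestamp format, which here is a hypothetical default.

```python
from datetime import datetime


def elapsed_seconds(problem_start: str, transaction_time: str,
                    fmt: str = "%Y-%m-%d %H:%M:%S") -> float:
    """Seconds from the problem start to this transaction (sketch)."""
    start = datetime.strptime(problem_start, fmt)
    when = datetime.strptime(transaction_time, fmt)
    return (when - start).total_seconds()
```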

==== SQL Format ====

* Option to export as an SQL file. -- Ken Koedinger, 03/26/2007, also brought up in June ET Meeting
** Ability to export an SQL dump of a dataset. --Kyle Cunningham, 04/03/2007

==== Specify Character for Blanks ====

* Ability to specify what character if any is used for blank. --Ryan Baker, email 8/9/2007
:"Not all tools handle TABTABTAB correctly on import. The period '.' is used to mean missing data in most stats packages. The word 'BLANK' is used in some other ones." This was not an issue for Ryan, as he wrote a preprocessor to convert blanks.
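A preprocessor of the kind mentioned can be very small. The sketch below replaces every empty field in tab-delimited text with a chosen marker, '.' by default. It assumes newline-separated rows with no tabs or newlines embedded inside fields, and it leaves the header row and any empty lines unchanged.

```python
def fill_blanks(tsv_text: str, marker: str = ".") -> str:
    """Replace empty fields in tab-delimited text with `marker`."""
    out = []
    for i, line in enumerate(tsv_text.split("\n")):
        if i == 0 or line == "":
            out.append(line)  # keep the header and empty lines as-is
        else:
            out.append("\t".join(f if f else marker for f in line.split("\t")))
    return "\n".join(out)
```

Passing `marker="BLANK"` produces the other convention mentioned above.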

==== Opportunity (at KC) Count in Transaction Export ====
* Include the opportunity count in the transaction export (it's only in the student-step rollup) -- Noboru Matsuda, 10/08/2009
** Would be repetitive
** Would need one Opportunity column per KC
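Below is a sketch of the "one column per KC" idea. It assumes that an opportunity is counted once each time a student encounters a step tagged with that KC, and that transactions are processed in time order. Later transactions on the same step repeat the same count, which is where the repetition comes from. The field names, including `step_id`, are hypothetical.

```python
from collections import defaultdict


def opportunity_columns(transactions):
    """Per-KC opportunity counts for each transaction (time-ordered input).

    transactions: list of dicts with 'student', 'step_id' (one student's
    encounter with a step) and 'kcs' (list of KC names).
    Returns a list of {kc: opportunity_count} dicts, one per transaction.
    """
    counts = defaultdict(int)  # (student, kc) -> opportunities so far
    seen_steps = set()
    result = []
    for tx in transactions:
        key = (tx["student"], tx["step_id"])
        if key not in seen_steps:  # count each step encounter once
            seen_steps.add(key)
            for kc in tx["kcs"]:
                counts[(tx["student"], kc)] += 1
        result.append({kc: counts[(tx["student"], kc)] for kc in tx["kcs"]})
    return result
```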

==== Export only rows that have KCs tagged ====
* Is it possible, when exporting a dataset, to include only the [transaction?] rows that have knowledge components tagged? Likewise, could the KC model export include only the items that have a KC tagged? -- Hui Cheng, 01/19/2010
** We have the inverse of this option on the Performance Profiler, 'Include steps without a knowledge component', and with the Student-Problem export, 'Include Steps without KCs'.

==== Include Custom Fields in Student-Step Rollup ====
* Is it possible to include the custom field in the datashop [step] export? -- Hui Cheng, 03/01/2010
** "I am thinking about the student-step level export. We have a student from Statistics wanting to use assistment data from datashop. She wants to have problem set id (=curriculum id in datashop). For Assistment data, we decided to put problem set id in the custom field because in the Assistment a lot of problems are repeated in different problem sets. She needs data at the student-step level." (Hui email)
* Vote from Phil Pavlik too -- see [[Collected_User_Requests#Include_transaction_custom_fields_in_web_services_student-step_export|Phil's comment under Web Services]]

==== Don't duplicate rows in student-step format when not showing KCs ====
* If the checkbox to show knowledge components is not checked, maybe it doesn't make sense to show rows more than once if they have more than one KC associated with the step. (Mimi (and Brett) stumbled on this. 8/16/2010)

==== Student-Step Rollup include Success Column ====
* In the step rollup, add a column called Success: 1 if correct, 0 if incorrect or hint, blank otherwise. ~~ Ken Koedinger, DataShop Team Meeting, Oct 22, 2010
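Below is a sketch of the proposed Success value. It assumes the value is based on the step's first attempt, since the note does not say which attempt to use, and that "blank" means an empty string in the export.

```python
from typing import Optional


def success(first_attempt: Optional[str]) -> str:
    """Success column: '1' if correct, '0' if incorrect or hint, else ''."""
    if first_attempt == "correct":
        return "1"
    if first_attempt in ("incorrect", "hint"):
        return "0"
    return ""
```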

=== Learning Curve ===

==== Default sort by observation on LCPID ====

* Default sort by observation on LCPID. ~~ Ken Koedinger, DataShop Team Meeting, 10/22/2010

==== Purple Point ====
* Purple Point: if a point on the LC has more than one KC associated with it but you have drilled down to a given KC, then the blue line is off. We could put a purple point that takes this into account.
* A simpler alternative: display a warning message that some points in the display are driven by other KCs.
* Pearson may be interested
* This was mentioned during the PSLC Summer School 2010.
* For a step with multiple skills, attribute the error only to the skill with the highest overall error rate. ~~ Alida, meeting with Ken, November 18, 2010
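Below is a sketch of the attribution rule in the last bullet. First compute each KC's overall error rate across all steps. Then attribute an error on a multi-KC step to whichever of its KCs has the highest rate, breaking ties by name. The `(kcs, is_error)` pair layout is an assumption.

```python
from collections import defaultdict


def overall_error_rates(steps):
    """steps: list of (kcs, is_error) pairs. Returns KC -> error rate."""
    errors, totals = defaultdict(int), defaultdict(int)
    for kcs, is_error in steps:
        for kc in kcs:
            totals[kc] += 1
            errors[kc] += int(is_error)
    return {kc: errors[kc] / totals[kc] for kc in totals}


def blame_kc(kcs, rates):
    """The single KC on a multi-KC step that an error is attributed to."""
    return max(kcs, key=lambda kc: (rates[kc], kc))
```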

==== Reduce Scrolling ====
* Add a forward and back button to the graph to reduce scrolling. -- John LaPlante, email 7/10/2007

:"The learning curve page could use a forward and back button to cycle through the learning curves. Going through them one by one requires a lot of scrolling."

==== Turn On Point Labels ====

* It would be nice to have the option to turn on point labels. It is nice that I can mouse over a point and view the data but it would be nicer if it appeared automatically. -- John LaPlante, email 7/10/2007

==== Option for Bigger Graph ====

* Allow user to see bigger graph. -- Derek Chan, Winter Workshop 1/23/2008
** Potential solution: enable user to set x, y scale manually

==== LC Normalize Scale of Thumbnails ====
* [[LC Normalize Scale of Thumbnails]]

=== Performance Profiler ===

==== Rename Performance Profiler ====
* John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
** Did not use this report, thinking it had something to do with improving the performance of DataShop itself. This report might have been much better to use than the Error Report, though an export would still be needed because working with the data in tabular form was necessary. Note that the pivot tables created were added to the dataset (Pittsburgh Science of Learning Center Stoichiometry Study 1).
* Lisa Anthony, email 8/2008
** Didn't know to go to the report. Export would be useful.
** Needed a better definition of Error Rate with respect to Problem and Unit rows.

==== Export ====
* John LaPlante (see comments in Rename Performance Profiler)
* Lisa Anthony (see comments in Rename Performance Profiler)
* Yes, I too can imagine wanting to export the results of a particular performance profiler output (i.e., to a table) so that I can graph it my own way. --[[User:Koedinger|Koedinger]] 16:24, 16 September 2009 (EDT)

==== Table View ====
* Add option to switch to a table view.
* Columns are: Problem Name, Steps, % incorrect, Incorrect Steps, % hint, Hint Steps, etc. include all values in pop-up.

==== Union of KCs/Problems/Students ====
* Allow user to get the union of KCs/Problems/Students etc so they can compare across samples easier. -- Kirsten Butcher, Winter Workshop 1/23/2008
:"It is difficult to compare performance profiler graphs across samples because the KCs (or problems, or whatever) aren't necessarily in both of those samples."

==== Show Details In Report ====
* Click on bar to see details in report and not just in pop-up. It disappears too quickly. -- Alida, Brett

==== Show More Information in the Graph ====
* Show more information in the graph: [Bruce M, User Mtg, 11/5/2007]
** After drilling down to a particular skill, the skill is not listed in the graph; the user has to check the skill list on the left-hand side to see which skill was selected.
* Maybe related to Bruce's "show more info in graph": show the actual value of the range variable. e.g., when range is error rate, show the error rate number somewhere (right now you can see, via mouse-over, the incorrect, hint, and correct percentages, but not the error rate) -- Brett
* Design idea to show # steps incorrect, # steps hint, # steps correct to clarify how the percentages are calculated. -- Alida, Brett
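Below is a sketch of the count breakdown and the error rate it implies. It assumes that each step is counted once, by its first attempt, and that error rate = (incorrect + hint) / total. Both definitions are assumptions about the Performance Profiler and are not taken from this page.

```python
def first_attempt_breakdown(n_incorrect: int, n_hint: int, n_correct: int):
    """Percentages for the pop-up plus the error rate they imply."""
    total = n_incorrect + n_hint + n_correct
    if total == 0:
        raise ValueError("no steps to summarize")

    def pct(n: int) -> float:
        return 100.0 * n / total

    return {
        "% incorrect": pct(n_incorrect),
        "% hint": pct(n_hint),
        "% correct": pct(n_correct),
        "error rate %": pct(n_incorrect + n_hint),
    }
```

Showing the raw counts next to these percentages would let users verify how each number was derived.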

== Sample Selector ==

==== Sub-Samples ====

* Sub Samples would be helpful. -- John LaPlante, email 7/10/2007
:"A nice solution would be to have sub-samples where one property varies. When I'm doing this analysis, I've changed my sample many times, renaming it sometimes, tweaking it to get variations on the data. The samples are really useful but they could help me a lot more with this kind of experimentation."

==== Filter out students ====

As a researcher, I want to filter out test users (including myself) from my data so that I see less noise in the data.

* Ruth Wylie, July 3, 2008
* You can already filter out test users by using a test user id that starts with 'weirdCMUuser_xxx'. Then create a sample that excludes students with a name like 'Test_%'. --[[User:Alida|Alida]] 10:29, 4 September 2009 (EDT)

==== Filter by "Class" ====

As a researcher, I want to create samples based on "class" since class is how I've encoded my conditions.

* Maaike Waalkens, July 30, 2010
* This is what made sense using Mathtutor for tutor delivery.
* What other fields are we missing in the Sample Selector?

==== Filter by "Step" ====

As a researcher, I want to create samples based on "step" since I'm only interested in one particular step.

* Mimi McLaughlin, August 17, 2010

==== Filter by "Problem View" ====

As a researcher, I want to exclude transactions from my sample where problem view is greater than 1.

* Michael Yudelson and Summer School participant, August 8, 2012

==== Create Sample Automatically ====

* Would it be possible for me to get a random sample from the 'Bridge to Algebra 2006-2007' dataset of 100 students? I am having trouble looking at the data because it takes too long to load, and my adviser thought that was because the dataset was too large. ~~ DataShop User, 10/19/2010
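A reproducible random subset of students could be drawn as in the sketch below, and the resulting ID list could then serve as a student filter when creating a sample. `sample_students` is hypothetical. Using a fixed seed lets someone else recreate the same sample.

```python
import random


def sample_students(student_ids, k=100, seed=None):
    """Pick k distinct students at random (all of them if there are fewer)."""
    ids = sorted(set(student_ids))  # dedupe and fix order before sampling
    rng = random.Random(seed)
    return sorted(rng.sample(ids, min(k, len(ids))))
```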

== Web Services ==

==== Include transaction custom fields in web services student-step export ====

* "I noticed in the web services guide that cfs (which provides the custom fields) is not yet implemented for step roll-up tables. It is marked [coming soon] in the manual... I could write a workaround to pull in the transactions and lookup the custom fields, but I'd really rather not." -- Phil Pavlik, 4/27/2011

* "It would be best if the CF for step rollup was drawn from the primary transaction for the first attempt that is correct or incorrect. At least that seems like the generally best value. Basically, there may be many cases where the custom field is the same across all transactions for a step. In this case ... you could just use the first one since they are all the same." -- Phil Pavlik, 4/29/2011
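Below is a sketch of Phil's suggested rule. The custom-field value comes from the first transaction on the step whose outcome is correct or incorrect, and hint requests are skipped. The dictionary layout and outcome labels are assumptions, and the sketch does not model the "primary transaction" distinction.

```python
from typing import Optional


def step_custom_field(step_transactions, cf_name: str) -> Optional[str]:
    """Custom-field value for one student-step rollup row.

    step_transactions: one student's transactions on one step, in time
    order; each is a dict with 'outcome' and 'custom_fields' (name -> value).
    Returns None if there is no correct/incorrect attempt or no value.
    """
    for tx in step_transactions:
        if tx["outcome"] in ("CORRECT", "INCORRECT"):
            return tx["custom_fields"].get(cf_name)
    return None
```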

==== Use Custom Fields Graphs/Reports ====
In Graphs

As a researcher creating custom fields and assigning values at the transaction level, I want DataShop to perform the aggregation to the step level so that I can do other things with my custom-field variable such as graph it. -- Ryan Baker, mtg w/Alida & Brett, 12/15/2008

Performance Profiler
* Use all discrete variables/custom fields on the left and all the continuous variables on the bottom of the Performance Profiler. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Ad-hoc queries ====

* Allow restricted filtering on steps and transactions as the next web service feature (after CFs), covering whatever users can filter on in the navigation boxes. (User Meeting AAR, December 9, 2009)

==== Sample creation as a web service ====

* Sample creation is still too slow. (User Meeting AAR, December 9, 2009)

==== Ability to directly query the database ====

* I am looking for a unique identifier for student->problem-step->problem-view. I am able to compute this in R, but it would be better to be able to query the DataShop database to get this unique identifier instead of having to recreate DataShop (essentially) in R. -- Ilya Goldin, 5/21/2012

 
----
See completed [[DataShop 3.x Features]] 
See on-going [[DataShop 4.x Features]] 
See prioritized [[DataShop Feature Wish List]]
[[Category:Protected]]
[[Category:DataShop]]

Collected User Requests

2012-08-08T15:32:06Z

Bleber: /* Sample Selector */

See prioritized items on [[DataShop Feature Wish List]].

== Annotations ==
* Some links from a recent talk by Ruogu Kang (CMU PhD student working with Sara Kiesler). -- Ken, email, 8/24/2011
** http://vis.stanford.edu/papers/senseus
** http://vis.berkeley.edu/papers/commentspace/

==== Annotations on Transaction Level ====
* I have models which can annotate things like: gaming, bored, etc. on the transaction level. -- Ryan Baker, ET Mtg 12/5/2007

==== Annotations on Student Level ====
* Would annotate on student level. -- Ido Roll, User Meeting, 1/19/2009

==== Annotations on Pages ====
* See the cool thing created by Jeffrey Heer where all the settings of the page were recorded with the comment. -- Ryan Baker, DS Team Mtg 5/23/2008

==== Dataset Discussion - Capture data-integrity issues ====
* As a stakeholder in the DataShop project, I want to capture and publicize the data-integrity issues discovered with data sets so that data is better documented (and so we've fulfilled a promise to our funders to better document data). -- Ken Koedinger, Team Meeting, 8/15/2009
* As a user of DataShop, I want to discuss datasets and have that discussion attached to the dataset so that others can comment and better understand any data-integrity issues I've found.

==== Linking to internal pages ====

* It would be handy if a link could be saved to any dataset, sample, or page in DataShop. -- [[Alida]], 10/18/2007
** Currently, https://pslc-qa.andrew.cmu.edu/datashop/DatasetInfo?datasetId=793 works if you are already logged in.

==== Have a link from the DataShop to the Theory Wiki (Dataset to Project Page) ====

* Can we link from the dataset to the project page on the Theory Wiki? In the pipeline have a clickable link to the project page (make project name clickable). -- Michael Bett, ET Mtg 11/14/2007
** Link to a dataset directly? Is that obvious to users? Click on dataset link -> log in -> redirected back to dataset. -- Brett Leber

== Data Format ==
==== Capture the question prompt and answer choices the student chose from ====
* As a researcher, I want to be able to identify steps based on the question prompt (not the difficult-to-understand step names that come from the selection and action of my tutor) so that I can analyze the data more easily. For multiple-choice questions, I also want to be able to see all of the choices that were available to the student. -- Eli Silk - February 1, 2011 (meeting with Brett and Ross Higashi of the FIRE project)
** Near-term solution is to create a table locally that maps steps to prompts
** Another solution is to ask CTAT team to modify their Flash components so that they log this information as custom fields
** Long-term solution is to make these fields into standard fields in the tutor message format
** In the email thread "Cognitive Tutor Development and Evaluation Requests", there is support for this general idea from Ken Koedinger, Albert Corbett, and Christian Schunn.
** Ken added that "Ideally, we may want to store any images that the student can see and where they reside on the screen ..."
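The near-term solution of a local table could look like the sketch below: a two-column tab-delimited file that maps step names to prompts and is joined onto exported rows. The column names 'Step Name' and 'Prompt', as well as the sample step, are hypothetical.

```python
import csv
import io


def attach_prompts(step_rows, mapping_tsv: str):
    """Add a 'Prompt' field to each row from a local step -> prompt table.

    Steps missing from the table get an empty prompt. QUOTE_NONE keeps any
    quote characters in the prompt text literal.
    """
    reader = csv.DictReader(io.StringIO(mapping_tsv), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    prompts = {row["Step Name"]: row["Prompt"] for row in reader}
    return [{**row, "Prompt": prompts.get(row["Step Name"], "")}
            for row in step_rows]
```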

== Data Modeling ==
=== Non-KC Modeling ===

==== Automatic Distillation ====
* As an educational data miner wishing to develop a machine learned model with PSLC data, I would like to be able to automatically distill data features (e.g. custom fields) commonly used in past educational data mining research for a new data set (see, for instance, Baker, Corbett, Roll, & Koedinger, 2008 in UMUAI) -- Ryan Baker, Summer 2008, Startup Memo
** Could be implemented as a plug-in
* Also interested in this feature idea. -- Dan Franklin, Oct 2008

==== Upload model and apply it to new data set ====

* As an EDM researcher, I would like to take a model, expressible as a linear formula on DataShop fields or as a simple code procedure (e.g., Bayesian Knowledge Tracing, which Ryan has code for), and apply it to a new data set, so that I can ... -- Ryan Baker, Sept 2008
* Also interested in this. -- Maxine Eskenazi, Sept 2008
* May work best as a plug-in
** Code to display a GUI for choosing which data sets to use, call the model code, and re-import the results into DataShop
** Good to have a way to apply many models, as soon as you import a data set
* Phil has an idea that maybe fits within this one. Please move if there's a better category. -- Brett Leber
<blockquote>This [''transaction? kc? --ed.''] relabeling is really mostly about enabling modeling in DataShop, right? With this in mind, I think that it is actually a higher priority to have model alternatives in DataShop.... E.g. Investigators should be able to give you chunks of Java code according to a certain specification, and DataShop should be then able to run these over datasets (perhaps after a certain series of QA occurs according to an SOP) when the investigator clicks some button in DataShop.... Obviously this is a much larger project than adding columns, but it is also much more important in my mind. ''-- Phil Pavlik, email to Brett on 1/14/2009''</blockquote>
* Examples:
** Example: running gaming detector in multiple tutors and comparing gaming frequencies
** Example: applying Bayesian Knowledge Tracing to a new data set from the same LearnLab
** Example: applying Ben Shih's models to many data sets. Note that Ben Shih should be included in design of this feature; he is interested, and has a lot of good ideas.

==== Add Different Predicted Values ====
* Would also like to add statistics, different predicted values than what LFA produces. -- Ken Koedinger, ET Meeting, 10/10/2007

==== Bayesian Knowledge Tracing ====
* Bayesian Knowledge-Tracing built into DataShop like LFA is. -- Ryan Baker, Startup Memo, Summer 2008
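For reference, the standard four-parameter Bayesian Knowledge Tracing update (Corbett and Anderson's formulation) can be sketched as below. This is a minimal illustration, not Ryan's code and not a DataShop API; the function name `bkt_predictions` is hypothetical, and fitting the four parameters to data is left out.

```python
from typing import Iterable, List


def bkt_predictions(observations: Iterable[int], p_init: float, p_transit: float,
                    p_guess: float, p_slip: float) -> List[float]:
    """Predicted P(correct) before each opportunity for one student and one KC.

    observations: first-attempt outcomes in order (1 = correct, 0 = incorrect).
    """
    p_known = p_init
    predictions = []
    for correct in observations:
        # Probability of a correct response at this opportunity.
        predictions.append(p_known * (1 - p_slip) + (1 - p_known) * p_guess)
        # Posterior probability that the skill is known, given the observation.
        if correct:
            numerator = p_known * (1 - p_slip)
            denominator = numerator + (1 - p_known) * p_guess
        else:
            numerator = p_known * p_slip
            denominator = numerator + (1 - p_known) * (1 - p_guess)
        posterior = numerator / denominator
        # Chance of learning the skill before the next opportunity.
        p_known = posterior + (1 - posterior) * p_transit
    return predictions
```

Running such a function per student and KC would yield one more set of predicted values that could sit alongside the LFA/AFM predictions.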

==== Richer statistics for KC modeling ====

* In addition to the model stats and estimates generated for learning factor models, we should also create difficulty factor models (i.e., ones without a "Slope" parameter). The latter is particularly relevant for the Unique-Step model, where the slope parameter is meaningless (but still counts against the BIC value). -- Ken Koedinger, Email "new feature request", 1/22/2009
** Relatedly, we should report significance values on the Slope parameters -- that is, when is the Slope significantly different from 0.
** The KC models page perhaps should also report the log-likelihood and number of parameters (in addition to BIC) and leave out AIC. We might also consider other metrics of model generality, like the "adjusted R2" (if I have this name right -- Joe Beck mentioned in the Assistments meeting yesterday).
** These changes will be part of meeting the CMDM goal of improvement in (or at least demonstrate acceptability of) the cognitive models in 90+ units in our LearnLab courses (or affiliates).
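The statistics mentioned above relate in the usual way: with log-likelihood LL, k parameters and n observations, AIC = 2k - 2·LL and BIC = k·ln(n) - 2·LL. The minimal sketch below assumes that a fitted model's predicted probability of a correct first attempt is already available for every observation; the function names are hypothetical, and how DataShop counts parameters and observations is not assumed here.

```python
import math
from typing import Dict, Sequence


def log_likelihood(outcomes: Sequence[int], predicted: Sequence[float]) -> float:
    """Bernoulli log-likelihood of 0/1 first-attempt outcomes under predicted P(correct)."""
    return sum(math.log(p) if y else math.log(1.0 - p)
               for y, p in zip(outcomes, predicted))


def model_fit_stats(outcomes: Sequence[int], predicted: Sequence[float],
                    n_params: int) -> Dict[str, float]:
    """Log-likelihood, parameter count, AIC and BIC for one KC model."""
    ll = log_likelihood(outcomes, predicted)
    n = len(outcomes)
    return {
        "log_likelihood": ll,
        "n_params": n_params,
        "AIC": 2 * n_params - 2 * ll,
        "BIC": n_params * math.log(n) - 2 * ll,
    }
```

Reporting the log-likelihood and parameter count next to BIC, as suggested, lets a reader recompute either criterion.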

=== KC Modeling ===

==== LFA/AFM: Check if enough memory using formula ====
* The LFA/AFM code could calculate how much RAM would be needed to run the algorithm on a given skill model using the formula provided by Hao. This formula is based on the number of transactions, number of students and number of skills. Right now it will not schedule itself to run on a model with over 300 skills, though there is a manual override. [[User:Alida|Alida]] 13:35, 29 November 2010 (EST)

==== Create KC Models through Web Services ====
* For John Stamper's CMDM project, it would be nice to automatically update KC Models through web services. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Automatically discovering new KC model ====

* Would it be possible to run some code (perhaps Hao's KC model selection code, perhaps something else generated by the CMDM thrust) to find a new, better KC model? -- Vincent Aleven, Sept 2008
* As a learning sciences researcher, I would like DataShop to discover a new/better KC model for me.
* Could be done as a plug-in

==== Generate new KC Models with LFA ====
* Not sure who asked for this.
* It would be nice to generate new KC Models with Hao's LFA code
* Would need to specify factors.
* Ideas on where this could run?
** On a separate server? Request it to be run, specify duration. Have separate server queue up requests, email user when done.
** In Java Applet on client machine? -- Phil Pavlik

==== Same Skill Twice on Same Step ====
* Would like to be able to apply the same skill to a step twice during a KC Model Import. -- Ken Koedinger, email, 2/4/2009

==== Save KC Model Import Files ====
* KC Model Import - save the file used to create the KCMs in case we need to recreate them. -- Ken Koedinger, email 3/4/2009

==== Log Likelihood and MAD ====
* Log Likelihood, MAD (mean absolute deviation) problem, MAD step (store and show) -- Hao Cen
** This is a variation on "Richer statistics for KC modeling" above. Probably should be merged. - Ken Koedinger

==== Better naming for KCs in auto-generated Unique-Step KCM ====

As a researcher, I want the KCs in the Unique-Step model to have better names than KC1, KC2, etc., so that I can easily tell which generated KCs go with which unique step.

* Hui Cheng (Email 1/20/2009), Ken Koedinger (Email 1/22/2009)
* Could you put this in your new feature request list: could the “Unique-step model” be better labeled than just “KC1”, “KC2”, etc.? For example, for Assistments data, you could use part of the “Step Name”. -- Hui Cheng, Email 1/20/2009
* But, anything is better than "KC". -- Ken Koedinger, email, 1/22/2009
<blockquote>A simple alternative that preserves uniqueness and addresses length is to concatenate: 1) the first K letters of the step name, and 2) a unique numerical increment (same as the "3" in "KC3"). Note that (2) guarantees uniqueness just as it does in the current "KC<num>" scheme. Or, perhaps better, given that step names are often scoped within problems, concatenate: 1) the first L letters of the problem name, 2) the first M letters of the step name, and 3) a unique numerical increment (just like the "3" in "KC3"). I think K or L+M should be as big as possible without making the KC names indistinguishable (because they run off the right margin) in the KC list on the Learning Curve and other pages.</blockquote>
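The naming scheme proposed above can be sketched as follows. It is a minimal illustration under two assumptions that are not in the request: the input is an already de-duplicated list of (problem name, step name) pairs, and the parts are joined with an underscore. The separator matters: without one, a step name ending in a digit followed by the increment could collide with another name. The function name is hypothetical.

```python
from typing import List, Tuple


def unique_step_kc_names(steps: List[Tuple[str, str]], problem_chars: int = 8,
                         step_chars: int = 12, sep: str = "_") -> List[str]:
    """Build one readable KC name per unique (problem name, step name) pair.

    Set problem_chars=0 for the step-name-only variant
    (first K letters of the step name + increment).
    """
    names = []
    for number, (problem, step) in enumerate(steps, start=1):
        parts = []
        if problem_chars > 0:
            parts.append(problem[:problem_chars])
        parts.append(step[:step_chars])
        # The increment plays the role of the "3" in "KC3". It contains only
        # digits and always follows the last separator, so two different
        # increments can never yield the same name.
        parts.append(str(number))
        names.append(sep.join(parts))
    return names
```

For example, two steps named "solve x" in problems "EQ-1" and "EQ-2", with L=4 and M=5, become "EQ-1_solve_1" and "EQ-2_solve_2".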

==== Visualize Learning Curve Split ====
* Be able to visualize a learning curve split into 2 based on a specification of a subset of problems. -- Albert Corbett, Math CCM, November 2008
* Harder: make me a new KCM out of it. -- Ken Koedinger, Team Mtg, Dec 5, 2008, while trying to describe Albert's request to Alida

==== Statistical Significance ====
* Can DataShop determine if the difference between conditions or learning curves is statistically significant? -- general theme at workshop, probably mentioned by Bob Hausmann in his talk, Winter Workshop 1/23/2008
** Can't do it yet in DataShop, but I can show you how to do it in R (or SPSS...) after you've exported the data -- export the "student-step rollup" rather than the whole transaction table. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** This comment perhaps belongs (exists?) elsewhere: The current student-step rollup export (really, all exports) should be such that I can immediately load it into R (and other packages) without error. Now errors occur, for instance, because there are "#" characters in the variable names in the student-step rollup. Either way, it is currently a roadblock for helping folks like Bob do the analyses they want to do. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** See [[Condition in Student-Step Rollup]] -- [[User:Alida|Alida]] 10:35, 17 September 2009 (EDT)
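Until the exports themselves change, a small preprocessing step can make column names safe to load. One likely cause of the "#" problem is that "#" is the default comment character in R's read.table(), so a header line containing it gets cut short. The sketch below is hypothetical (neither a DataShop feature nor the names it actually exports); the example headers are made up. Within R, reading with comment.char = "" and passing names through make.names(unique = TRUE) is an alternative.

```python
import re
from typing import List


def r_friendly_headers(headers: List[str]) -> List[str]:
    """Rename export columns to letters, digits and underscores, keeping names unique."""
    result, used = [], set()
    for header in headers:
        name = header.replace("#", "Num")  # avoid R's default comment character
        name = re.sub(r"[^A-Za-z0-9_]+", "_", name).strip("_")
        if not name or name[0].isdigit():
            name = "X" + name  # R names may not start with a digit
        # Append _2, _3, ... when two headers collapse to the same name.
        candidate, suffix = name, 2
        while candidate in used:
            candidate = f"{name}_{suffix}"
            suffix += 1
        used.add(candidate)
        result.append(candidate)
    return result
```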

==== Split and Merge Skills ====

* Hand searches through a p-matrix for a dataset to split and merge skills. (Pie in the Sky) -- Ken Koedinger, Team Mtg, 02/22/2008

==== Notes on new KCMs ====

* It would be good if I could add a note to a KC model that was newly imported. -- Noboru Matsuda, email, Nov 19, 2009

==== Display Learning Curves Grouped by Interestingness ====

* The page displaying all the learning curves today seems to be alphabetically sorted by KC name, which is not necessarily meaningful.
* An alternative presentation is to group the curves into 4 sets, breaking up the page. Set 1 has curves that contain significant spikes, and therefore seem to be "low-hanging fruit" for purposes of breaking up into KCs. Set 2 has curves with few spikes, but they have a long X axis, suggesting that students are presented with too many opportunities to acquire those KCs. Set 3 has the "good" curves, i.e., nicely decreasing curves that are not too long. Set 4 is "other". -- [http://www.pitt.edu/~goldin Ilya Goldin] 7 December 2009

==== Order KC models according to AIC====
* This is based on results from multiple dataset analyses that compared AIC, BIC and loglikelihood to cross validation RMSE. -- Mimi McLaughlin, 2/9/2011

====Use log of opportunity count for AIC and BIC calculations====
* We compared using the log of opportunity count to whole number opportunity count in multiple datasets. We found the results for log of opportunity count to be consistently better, though small. -- Mimi McLaughlin, 2/9/2011

== Developer Requests ==

==== Store Converted Date and Converter Info ====

As a DataShop administrator, I'd like to see the converter information (version and date) stored in the database, so that I do not have to store that data manually in the Additional Notes field each time I load a dataset. -- Kyle, 8/5/2008

* DTD new fields:
** Store conversion and converter information in database (anything else?)
*** conversion_time
*** converter_info
* Email from Octav, 10/5/2007

I see there's also a note about the converter version in the Dataset
Info. Which is good, but it seems it's taken from the directory name
when I submitted the set. I don't know how reliable that is.  :-) It
would be better if it's taken from the new converter info field.

==== Convert from XML to tabbed-delimited format ====

* If users agree that the export format is valuable, then being able to convert from XML to the export format would let them view the data in Excel, where they could look at the Selection column and see blanks more easily. -- Jonathan Sewall, ET Mtg, 10/10/2007

==== Plug-ins (general issues) ====

* Please please support Ruby on Rails. -- Ben Shih, December 2008

==== Create UI to grant DataShop user roles ====

* Already tedious.
* Alida, User Meeting AAR on December 9, 2009

== Help ==

==== Specialize Label of Help Button ====
* Since this help is better than in most applications, it should say more than just 'Help'. -- Ken Koedinger & Ryan Baker, Team Mtg, May 30, 2008
** Ideas:
*** 'Page Help'
*** 'Help with this page'
*** 'Help with Learning Curve page' (Ken's favorite)
*** 'Help with this tool'

== Home Page ==

==== Redesign the Home Page ====

* In the menu of data sets at the top, include the N (=20 or as many as fit on the screen?) data sets that I have visited starting with the ones I've visited most recently. --[[User:Koedinger|Koedinger]] 16:16, 16 September 2009 (EDT)
** As of today, the menu shows the last 10 data sets, most recently visited at the top. I think the feature is good enough, but let's ask Ken --[[User:Bleber|Bleber]] 10:56, 6 August 2010 (EDT)
* There needs to be a better ordering for the datasets (DS364)
* Maybe a search to filter the list of datasets since the list is so long. -- Brett Leber, 6/14/2007
* Going back to the home page always goes to 'My Datasets' (DS313)
* Maybe show more high level stats on this page, like how many transactions [done Jan 2012], students, skill models
* Allow users to post and share project documentation (files, papers, other meta-info) -- Ruth Wylie, suggested during meeting on 8/4/2010. She had a file that is relevant to multiple datasets in her project.

==== News Feed ====
* Bruce McLaren, email, 4/20/2012
**I wonder whether it would be possible to have a scrolling "News" feed somewhere on the DataShop site that would keep researchers informed about what is happening with the DataShop. For instance, the "News" feed could always show the last time a data conversion finished, announce an upcoming workshop, or inform everyone of critical DataShop issues, such as the fact that the DataShop had a server go down recently (which I know caused you guys a lot of headaches - but which I didn't hear about until a couple weeks after it happened). This kind of thing could be a great communication tool and, as an added bonus for DataShop personnel, avoid lots of email with questions like "where is my data?" or "when is the next conversion going to finish?" Perhaps it would even be possible to have the data conversion routine automatically update the "News" feed each time it begins and/or finishes processing?
** Take it with a grain of salt -- I know you have lots of things on your plate -- but I have been in the situation often, especially just before and during my studies, where I wasn't sure what was going on with DataShop conversions and issues and had to track down someone -- typically Alida -- to figure things out. I have the advantage of sitting right next door to Alida, but I wonder how many other researchers within the PSLC, those not in close proximity to Alida, deal with this issue of not really knowing what is happening with the DataShop at any given time.
* Jonathan Sewall also requested a page that shows the status of the log conversion process, including how much data was processed and for which datasets

== Import ==
== Miscellaneous ==

==== Analyses by LearnLab ====

* Organize data by LearnLab, not by "data set". -- Ryan Baker, Aug 2008
* Also: Bob Hausmann, Sep 2008; Maxine Eskenazi, Sep 2008
* Essentially, current data sets become samples, but the top-level unit is the LearnLab. You can take every data set in a LearnLab together as a sample.
* Implies being able to run analyses across data sets, export multiple data sets together, and create multi-data-set samples
* As a user of DataShop, I would like to look at learning curves for all Algebra data together (for example), or export all Algebra data
* Important long-term, but is a lot of work -- in particular, we need to solve scalability issues first.

==== Save Settings Between Sessions ====
* It would be useful if DataShop could save settings between sessions. -- Bob Hausmann, User Meeting, 2/1/2008
** "I do a lot of redoing the same steps" (e.g., set cutoffs, select a KC model, select students).

==== Multiple steps per transaction ====

* Needed so that we do not have to create multiple transactions for the same actual action for Andes logs. -- Kurt VanLehn, Feb 2007

==== Demographic data ====

* This has been mentioned by NSF visitors, AB, ESL, and some researchers.
* Also mentioned at Winter Workshop 1/23/2008.
** Derek/Sue-mei: Student background information not in DataShop. Would like to see a student or set of students from a particular demographic, and view them across datasets!
* Note that Gail added demographic data to Additional Notes field on the Dataset Info page for many datasets. The idea here is to put that data into the database somewhere.

==== Single Sign On ====
* Michael Bett, email, 10/8/2007
* It would be nice if the following services have a single login account/password:
*# Theory Wiki
*# Learnlab.org
*# ESL's OSS
*# DataShop

==== Reveal unanonymized student IDs ====

As a researcher/PI performing research assistant tasks, I want to see easily the unanonymized student IDs of subjects in DataShop so that I can email my subjects telling them when to use my system.

* Ruth Wylie, July 3, 2008
* As the honest broker of the PSLC data, I have promised to not reveal the student IDs and to protect the identity of the students. This is part of the DataShop IRB. Therefore, I do not see this request as possible. [[User:Alida|Alida]] 09:53, 4 September 2009 (EDT)
** At some point, some coordination would be good with OLI's Digital Dashboard project that Marsha Lovett (and the OLI team and sometimes me) is working on. This is meant to provide usage information more quickly to instructors. It could also perhaps be used by researchers (with the right IRB rights) in situations like Ruth's.

==== Knight Timeline ====
* Developed by Andrea Knight, 2004

==== Buggy Skills ====
* Ken Koedinger, prototype walkthrough 9/11/2006

==== Confusion Matrix ====
* Brian MacWhinney, prototype walkthrough 9/11/2006

== Navigation Bar ==
==== Filter KCs by Name ====

As a researcher working with KCs, I want to filter KCs based on their names, so that I can...
* Vincent Aleven, Email, 2/3/2008
* "Since Alida said you cannot have two mechanisms for putting together your KC set (i.e., cannot have both the selecting-by-clicking and selecting-by-filtering), I would probably opt for the latter."
* Alida: I thought Vincent mentioned that he'd like to select which KCs are in a set by filtering on the name. Example: Include KCs with '*reason*' in the name and exclude KCs with '*given*' in the name.
* This could be an addition to our v3.0 KC-selection mechanism--filter by name.
* Vincent, Email, 5/6/2009: Expressed another need for this feature. Could just allow for a wider area and longer list so that more items can be checked at once. The number of characters we show right now is not enough because in many cases that number of characters is the same across many of the skills. Reference data set: Geometry CWCTC 2005-2006

* Status: Design Started
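The include/exclude example in Alida's note can be sketched with shell-style wildcards. This is a minimal illustration, assuming case-insensitive matching and the '*' wildcard syntax from the example. The function name and the sample KC names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def filter_kcs(kc_names: Iterable[str], include: Iterable[str] = ("*",),
               exclude: Iterable[str] = ()) -> List[str]:
    """Keep KCs matching any include pattern and no exclude pattern (case-insensitive)."""
    include = [p.lower() for p in include]
    exclude = [p.lower() for p in exclude]
    return [kc for kc in kc_names
            if any(fnmatchcase(kc.lower(), p) for p in include)
            and not any(fnmatchcase(kc.lower(), p) for p in exclude)]
```

For example, `filter_kcs(names, include=["*reason*"], exclude=["*given*"])` keeps the reason-giving KCs but drops those about given information. This also sidesteps Vincent's point that truncated names look alike, because matching uses the full KC name.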

==== Facebook-style KC Selection ====

As a researcher working with KCs, I want to select KCs based on the learning curve thumbnail, so that I can see quickly which ones I'm interested in exploring more deeply.

* Feature already designed for v3.0, not implemented due to time constraints.
* Agreed this would be really useful. -- Kirsten Butcher, User Mtg, 1/31/2008
* Status: Guestimate: 20 days, need to revisit requirements document

==== Feedback after clicking a large sample on a large dataset ====

As a user of DataShop (first-time or not), I want some feedback and the ability to cancel after I do something that might take a long time (e.g., clicking "All Data" on a large dataset) so that I do not get stuck.

* Part of the Susan Goldman story
* After clicking a sample for a large dataset, there is no "Loading..." text, no feedback that the click was even registered by the app (besides the sometimes busy cursor and small browser "loading" text), nor the ability to cancel the action. 
* We will always have similar problems even if performance is improved, so providing feedback and the opportunity to cancel is critical.

==== Save Button in Problem Navigation Box ====

* Save buttons in the sidebar. -- Ken Koedinger, Mtg 2006
** Could also put one in the Problem selection box in the sidebar.

==== Make Nav Bar Wider ====
* Make the Sample name and description fields much wider. -- John LaPlante, email 7/10/2007

== New Visualizations/Reports ==

==== Student-KC Rollup ====
As a researcher, I want to see KCs rolled up by student, so that I can ...

* Vincent Aleven, User Mtg, 1/29/2008
** By Student-KC would be more useful than by Student-Problem
** Example: # Steps asking for a hint or error or what proportion had help
** How often the bottom-out hint occurs
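A Student-KC rollup with the measures Vincent lists could be sketched as follows. It assumes each input record describes one student-step, using "correct", "incorrect" or "hint" for the first attempt and a flag for whether the bottom-out hint was shown. Those field choices and the function name are hypothetical, not the DataShop export schema. A step tagged with several KCs counts once for each of them.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

StepRecord = Tuple[str, List[str], str, bool]


def student_kc_rollup(steps: Iterable[StepRecord]) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Roll student-step records up to one row per (student, KC).

    Each record is (student, kcs, first_attempt, saw_bottom_out_hint).
    """
    rollup = defaultdict(lambda: {"steps": 0, "hint_or_error": 0, "bottom_out": 0})
    for student, kcs, first_attempt, saw_bottom_out in steps:
        for kc in kcs:  # a multi-KC step counts once for each of its KCs
            row = rollup[(student, kc)]
            row["steps"] += 1
            row["hint_or_error"] += int(first_attempt != "correct")
            row["bottom_out"] += int(saw_bottom_out)
    for row in rollup.values():
        # Proportion of this student's steps on this KC that needed help.
        row["help_proportion"] = row["hint_or_error"] / row["steps"]
    return dict(rollup)
```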

==== Instructor Reports ====
* Phil said he received a lot of positive reactions to providing reports on units for instructors. -- Phil Pavlik, ET Mtg 10/10/2007

==== Manage Authorizations/Projects Page ====
* Lisa Anthony, email 10/23/2007
* Allow PI to change permissions on the datasets.
:"Actually, I couldn't see how to change permissions on the datasets from the website. Is this possible? If not, it might be a nice feature..."

==== Calculate Time Spent on Different Study Activities ====

As a researcher, I want to know how much time, on average, students spend on study activities, so that I can ...

* Bruce McLaren, Email, 4/7/2009
<blockquote>
For my most recent stoich study, Shawn and I are interested in calculating timing information such as: (a) how long students spent, on average, working on individual tutors (b) how long students spent, on average, on all items in an intervention (c) how long students worked, on average, on post-tests. 

Timing information is very commonly required for studies, and can be calculated from DataShop logs relatively easily, so even if we don't have it, might be worth considering. (And we don't want to re-invent the wheel, if you already have it or are planning it...)
</blockquote>
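Averages like those Bruce describes could be computed from transaction timestamps roughly as follows. The sketch assumes a student's time on an activity (a tutor, an intervention, a post-test) is the span between their first and last transaction on it, which ignores idle gaps and time spent before the first logged action. The tuple layout and function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def average_time_per_activity(transactions: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """transactions: (student, activity, timestamp_in_seconds) tuples in any order."""
    spans = defaultdict(lambda: [float("inf"), float("-inf")])
    for student, activity, t in transactions:
        span = spans[(student, activity)]
        span[0] = min(span[0], t)  # earliest transaction
        span[1] = max(span[1], t)  # latest transaction
    per_activity = defaultdict(list)
    for (_, activity), (first, last) in spans.items():
        per_activity[activity].append(last - first)
    # Average over the students who worked on each activity.
    return {activity: mean(times) for activity, times in per_activity.items()}
```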

==== Incorrect Step Duration and Hint Step Duration ====

As a researcher, I want to be able to see total step duration if the student's first attempt was an incorrect attempt, and total step duration if the student's first attempt was a hint request, so that I can do some analyses that I can't do with "Error Step Duration".

* Bob Hausmann, email, 11/11/2008.
* Updated title and story with 'step duration' instead of 'time'. -- [[User:Alida|Alida]] 10:36, 4 September 2009 (EDT)

==== Grading ====

[[Grading]]

====Display number of steps and number of observations for skills====
* How and where to be determined by developers. -- Ken (entered by Mimi)

== Reports ==
=== Dataset Info ===

==== Pointers to Hard-copy Data ====
* Brett van de Sande, NSF Site Visit, 5/28/2008
* Pointers to hard-copy data such as paper tests and/or homework.  Include contact information.  It doesn't seem to make sense to scan a whole filing cabinet of paper if no one wants to look at it.  And any secondary researchers don't know about the filing cabinet to ask for it.

==== Sort Problem Breakdown Table ====

* Would like the ability to sort the table by clicking on the column headings of the Problem Breakdown Table on the Dataset Info Tab. -- Bruce McLaren, User Mtg, 11/5/2007

==== Rename dataset ====

As a researcher, I want to rename my dataset so that it makes more sense to other people. I also want to make sure the dataset doesn't become polluted later by new data not associated with my study.
* Ruth Wylie, July 3, 2008
* There are reasons she would want to do this (current name is worthless, other researchers might try her tutor and pollute her data) but also reasons for not doing it (log more data later).
* There are risks in changing a dataset name that might not be apparent. For example, if you want the new data in the same dataset. [[User:Alida|Alida]] 10:27, 4 September 2009 (EDT)

==== Average time per problem, average number of problems, total number of sections ====

In addition to showing student hours per dataset, it would be useful to know the average time spent per problem, average number of problems completed, and the total number of sections.
* Noboru Matsuda, June 18, 2010

=== Error Report ===

==== View By Student ====

* Would like to see what a couple of students saw in the feedback. -- Marsha Lovett, 10/11/2007

==== Export ====
* I would like the ability to export this data. -- John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
* Also interested in this feature idea. -- Bruce McLaren, User Mtg, 11/5/2007

==== Sort ====

* Was planned for but not implemented in v2.1 (estimated to be a 4 day task)
** By Correctness %, starting with the least correct
** By Hints %
** Step (or KC if view by KC)
** Number of Students
* Ability to sort problems by their average experienced position within the curriculum. -- Ken Koedinger, 02/16/2007
** Which problem did students most often experience first, then the one experienced second most often, ...
* Order steps by the order they typically are executed by students. -- Ken Koedinger, email 11/7/2008
:"Searching through the steps in a problem to get a sense of what is going on is currently hard because the steps are ordered alphabetically, not by the order in which most students did them. While not all students do all steps in the same order, there is some regularity there. It would be quite useful if the steps could be ordered in a "typical order". This could be accomplished by using the time stamps (of the first (correct?) transaction?) for each step to determine the rank order of each step for each student in a problem and then for each problem average the rank order of each step across all students. Then arrange the steps in the Error Report by their average rank order -- that is, roughly speaking, the step that is most likely to be first across students (closest on average to first) goes first, the step with the next lowest rank goes next, etc."
* The capability to count the number of errors of each message type and sort in different ways, for instance by all errors that had no messages. -- Bruce McLaren, email 10/22/2007
:"This is an error analysis I recently did in Excel, using pivot tables, that might be handy if in the DataShop. This one is very important for tutors because the errors that occur most frequently, yet don't elicit messages to the students, are good candidates to become errors with feedback."
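The "typical order" computation in Ken's 11/7/2008 request can be sketched for a single problem as follows. The input is assumed to be one timestamp per (student, step), for example from the student's first transaction on that step; whether to use the first correct transaction instead is left open, as in the request. Ties in timestamp or average rank are broken by step name. The function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Hashable, Iterable, List, Tuple


def typical_step_order(first_times: Iterable[Tuple[Hashable, str, float]]) -> List[str]:
    """first_times: (student, step, timestamp) tuples for one problem."""
    by_student = defaultdict(list)
    for student, step, timestamp in first_times:
        by_student[student].append((timestamp, step))
    ranks = defaultdict(list)
    for entries in by_student.values():
        # Rank each step within this student's own sequence (1 = done first).
        for rank, (_, step) in enumerate(sorted(entries), start=1):
            ranks[step].append(rank)
    # Order steps by their average rank across students.
    return sorted(ranks, key=lambda step: (mean(ranks[step]), step))
```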

==== Show more than 500 problems ====

* In the Error Report, can we see more than 500 problems? See the dataset "Cog Model Discovery Experiment Spring 2010", which shows "2/2888 selected. (Showing the first 500)". -- Ken, email, 2/15/2011

=== Export ===

==== Include Step Start Time in transaction format ====

* The transaction-level export already includes problem start time; could it also include step start time? I can easily compute it myself, but it seems there's a specific algorithm that the student-step rollup uses, and it might be nice to include the same value here. --Ilya Goldin, email on 01/16/2012

==== Last attempt on step? column for transaction format ====

* Include a new column that shows whether the row is the last attempt on a step for a student or not. Could be 0 or 1 as value. Helpful for researchers who are doing grading of data. Transaction format. --Vincent Aleven, CTAT mtg 11/5/2010
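The proposed 0/1 column could be derived as follows. The sketch assumes the rows are in chronological order and identifies a step by (student, problem, step). Adding a problem-view field to the key would flag the last attempt within each visit to a problem instead. Field and function names are hypothetical.

```python
from typing import Dict, List, Sequence


def last_attempt_flags(rows: List[Dict[str, str]],
                       key_fields: Sequence[str] = ("student", "problem", "step")) -> List[int]:
    """Return 1 for each row that is a student's last transaction on a step, else 0."""
    last_row = {}
    for index, row in enumerate(rows):
        # Later rows overwrite earlier ones, leaving the final index per step.
        last_row[tuple(row[field] for field in key_fields)] = index
    final = set(last_row.values())
    return [1 if index in final else 0 for index in range(len(rows))]
```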

==== Elapsed Time ====

* Include the elapsed time in preview and transaction export. It is more valuable than the transaction time as an absolute reference. Possible to keep both. --Ken Koedinger, Team Mtg 04/18/2008
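A minimal sketch of such a column follows. It assumes "elapsed time" means seconds since the problem start time already present in the transaction export, and that timestamps use the format shown in `TIME_FORMAT`. Both are assumptions; the request does not specify the reference point or the format.

```python
from datetime import datetime

# Assumed timestamp format; adjust to match the actual export.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def elapsed_seconds(transaction_time: str, problem_start_time: str) -> float:
    """Seconds between the start of the problem and this transaction."""
    start = datetime.strptime(problem_start_time, TIME_FORMAT)
    current = datetime.strptime(transaction_time, TIME_FORMAT)
    return (current - start).total_seconds()
```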

==== SQL Format ====

* Option to export as an SQL file. -- Ken Koedinger, 03/26/2007, also brought up in June ET Meeting
** Ability to export an SQL dump of a dataset. --Kyle Cunningham, 04/03/2007

==== Specify Character for Blanks ====

* Ability to specify what character if any is used for blank. --Ryan Baker, email 8/9/2007
:"Not all tools handle TABTABTAB correctly on import. The period '.' is used to mean missing data in most stats packages. The word 'BLANK' is used in some other ones." Not an issue for Ryan, as he wrote a preprocessor to convert blanks.
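A preprocessor of the kind Ryan describes might look like the sketch below. It is not Ryan's actual code. It assumes plain tab-delimited lines with no quoting, and treats only truly empty cells (not whitespace-only ones) as blank. The function name is hypothetical.

```python
from typing import Iterable, List


def fill_blanks(lines: Iterable[str], placeholder: str = ".") -> List[str]:
    """Replace empty cells in tab-delimited lines with a placeholder such as '.' or 'BLANK'."""
    filled = []
    for line in lines:
        cells = line.rstrip("\r\n").split("\t")
        filled.append("\t".join(cell if cell != "" else placeholder for cell in cells))
    return filled
```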

==== Opportunity (at KC) Count in Transaction Export ====
* Include the opportunity count in the transaction export (it's only in the student-step rollup) -- Noboru Matsuda, 10/08/2009
** Would be repetitive
** Would need one Opportunity column per KC

==== Export only rows that have KCs tagged ====
* Is it possible, when exporting a dataset, to include only the [transaction?] rows that have knowledge components tagged? And the same for the KC models export: include only the items that have a KC tagged? -- Hui Cheng, 01/19/2010
** We have the inverse of this option on the Performance Profiler, 'Include steps without a knowledge component', and with the Student-Problem export, 'Include Steps without KCs'.

==== Include Custom Fields in Student-Step Rollup ====
* Is it possible to include the custom field in the datashop [step] export? -- Hui Cheng, 03/01/2010
** "I am thinking about the student-step level export. We have a student from Statistics wanting to use Assistment data from DataShop. She wants to have problem set id (=curriculum id in DataShop). For Assistment data, we decided to put problem set id in the custom field because in the Assistment a lot of problems are repeated in different problem sets. She needs data at the student-step level." (Hui email)
* Vote from Phil Pavlik too -- see [[Collected_User_Requests#Include_transaction_custom_fields_in_web_services_student-step_export|Phil's comment under Web Services]]

==== Don't duplicate rows in student-step format when not showing KCs ====
* If the checkbox to show knowledge components is not checked, maybe it doesn't make sense to show rows more than once if they have more than one KC associated with the step. (Mimi (and Brett) stumbled on this. 8/16/2010)
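Collapsing the duplicates could be sketched as follows. The sketch assumes the remaining (non-KC) columns identify a student-step uniquely, and that any per-row counter column has been removed first, because otherwise the duplicates would differ and never collapse. KC-specific columns such as opportunity counts belong in `kc_columns`. Names are hypothetical.

```python
from typing import Dict, Iterable, List


def drop_kc_columns(rows: Iterable[Dict[str, str]],
                    kc_columns: Iterable[str]) -> List[Dict[str, str]]:
    """Remove KC-specific columns and keep one copy of rows that become identical."""
    kc_columns = set(kc_columns)
    seen, result = set(), []
    for row in rows:
        reduced = {k: v for k, v in row.items() if k not in kc_columns}
        key = tuple(sorted(reduced.items()))  # order-independent identity of the row
        if key not in seen:
            seen.add(key)
            result.append(reduced)
    return result
```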

==== Student-Step Rollup include Success Column ====
* Step rollup, 1 if correct, 0 if incorrect/hint, blank otherwise call it Success. ~~ Ken Koedinger, DataShop Team Meeting, Oct 22, 2010
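The proposed Success value could be derived from each row's first-attempt outcome roughly as follows. The sketch assumes the rollup records that outcome as "correct", "incorrect" or "hint" (case ignored), and returns strings so that the blank case stays blank in a tab-delimited file. The function name is hypothetical.

```python
def success_value(first_attempt: str) -> str:
    """Map a student-step first-attempt outcome to the proposed Success column."""
    outcome = first_attempt.strip().lower()
    if outcome == "correct":
        return "1"
    if outcome in ("incorrect", "hint"):
        return "0"
    return ""  # blank for anything else
```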

=== Learning Curve ===

==== Default sort by observation on LCPID ====

* Default sort by observation on LCPID. ~~ Ken Koedinger, DataShop Team Meeting, 10/22/2010

==== Purple Point ====
* Purple Point: if a point on the LC has more than one KC associated with it but you have drilled down to a given KC, then the blue line is off. We could put a purple point that takes this into account.
* Simpler thing: display a warning message that some points in the display are driven by other KCs
* Pearson may be interested
* This was mentioned during the PSLC Summer School 2010.
* For a step with multiple skills, attribute the error only to the skill with the highest overall error rate. ~~ Alida, meeting with Ken, November 18, 2010
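The simpler attribution rule from the November 2010 meeting can be sketched as follows. It assumes each observation is a step's KC list plus whether the first attempt was an error, and computes overall error rates by counting a multi-KC step once for each of its KCs. That counting choice is an assumption, as are the function names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def kc_error_rates(observations: Iterable[Tuple[List[str], bool]]) -> Dict[str, float]:
    """Overall error rate per KC from (kcs_on_step, first_attempt_was_error) pairs."""
    errors: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for kcs, is_error in observations:
        for kc in kcs:
            totals[kc] += 1
            errors[kc] += int(is_error)
    return {kc: errors[kc] / totals[kc] for kc in totals}


def kc_to_blame(step_kcs: List[str], rates: Dict[str, float]) -> str:
    """Attribute an error on a multi-KC step to the KC with the highest overall error rate."""
    # Ties are broken by KC name so the choice is deterministic.
    return max(step_kcs, key=lambda kc: (rates.get(kc, 0.0), kc))
```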

==== Reduce Scrolling ====
* Add a forward and back button to the graph to reduce scrolling. -- John LaPlante, email 7/10/2007

:"The learning curve page could use a forward and back button to cycle through the learning curves. Going through them one by one requires a lot of scrolling."

==== Turn On Point Labels ====

* It would be nice to have the option to turn on point labels. It is nice that I can mouse over a point and view the data but it would be nicer if it appeared automatically. -- John LaPlante, email 7/10/2007

==== Option for Bigger Graph ====

* Allow user to see bigger graph. -- Derek Chan, Winter Workshop 1/23/2008
** Potential solution: enable user to set x, y scale manually

==== LC Normalize Scale of Thumbnails ====
* [[LC Normalize Scale of Thumbnails]]

=== Performance Profiler ===

==== Rename Performance Profiler ====
* John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
** Did not use this report, thinking it had something to do with improving the performance of DataShop itself. This report might have been much better to use than the Error Report, though an export would still be needed because using the data in tabular form was still necessary. Note that the pivot tables created were added to the dataset (Pittsburgh Science of Learning Center Stoichiometry Study 1).
* Lisa Anthony, email 8/2008
** Didn't know to go to the report. Export would be useful.
** Needed a better definition of Error Rate with respect to Problem and Unit rows.

==== Export ====
* John LaPlante (see comments in Rename Performance Profiler)
* Lisa Anthony (see comments in Rename Performance Profiler)
* Yes, I too can imagine wanting to export the results of a particular performance profiler output (i.e., to a table) so that I can graph it my own way. --[[User:Koedinger|Koedinger]] 16:24, 16 September 2009 (EDT)

==== Table View ====
* Add option to switch to a table view.
* Columns are: Problem Name, Steps, % incorrect, Incorrect Steps, % hint, Hint Steps, etc. include all values in pop-up.

==== Union of KCs/Problems/Students ====
* Allow user to get the union of KCs/Problems/Students etc so they can compare across samples easier. -- Kirsten Butcher, Winter Workshop 1/23/2008
:"It is difficult to compare performance profiler graphs across samples because the KCs (or problems, or whatever) aren't necessarily in both of those samples."

==== Show Details In Report ====
* Click on bar to see details in report and not just in pop-up. It disappears too quickly. -- Alida, Brett

==== Show More Information in the Graph ====
* Show more information in the graph: [Bruce M, User Mtg, 11/5/2007]
** Had drilled down by a certain skill - skill is not listed in the graph, user has to check the skill list on the LHS to see what skill was selected
* Maybe related to Bruce's "show more info in graph": show the actual value of the range variable. e.g., when range is error rate, show the error rate number somewhere (right now you can see, via mouse-over, the incorrect, hint, and correct percentages, but not the error rate) -- Brett
* Design idea to show # steps incorrect, # steps hint, # steps correct to clarify how the percentages are calculated. -- Alida, Brett

== Sample Selector ==

==== Sub-Samples ====

* Sub Samples would be helpful. -- John LaPlante, email 7/10/2007
:"A nice solution would be to have sub-samples where one property varies. When I'm doing this analysis, I've changed my sample many times, renaming it sometimes, tweaking it to get variations on the data. The samples are really useful but they could help me a lot more with this kind of experimentation."

==== Filter out students ====

As a researcher, I want to filter out test users (including myself) from my data so that I see less noise in the data.

* Ruth Wylie, July 3, 2008
* You can already filter out test users by using a test user id that starts with 'weirdCMUuser_xxx'. Then create a sample that excludes students with a name like 'Test_%'. --[[User:Alida|Alida]] 10:29, 4 September 2009 (EDT)

==== Filter by "Class" ====

As a researcher, I want to create samples based on "class" since class is how I've encoded my conditions.

* Maaike Waalkens, July 30, 2010
* This is what made sense using Mathtutor for tutor delivery.
* What other fields are we missing in the Sample Selector?

==== Filter by "Step" ====

As a researcher, I want to create samples based on "step" since I'm only interested in one particular step.

* Mimi McLaughlin, August 17, 2010

==== Filter by "Problem View" ====

As a researcher, I want to exclude transactions from my sample where problem view is greater than 1.

* Michael Yudelson and Summer School participant, August 8, 2012

==== Create Sample Automatically ====

* Would it be possible for me to get a random sample from the 'Bridge to Algebra 2006-2007' dataset of 100 students? I am having trouble looking at the data because it takes too long to load, and my adviser thought that was because the dataset was too large. ~~ DataShop User, 10/19/2010
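Until samples can be created this way, a random subset of students can be drawn from exported data. A minimal Python sketch, assuming rows are read as dicts with an 'Anon Student Id' column (the column name is an assumption here). It samples whole students rather than individual transactions, so each sampled student's history stays intact.

```python
import random


def sample_students(rows, n, seed=None):
    """Return only the rows that belong to n randomly chosen students.

    rows: iterable of dicts with an 'Anon Student Id' key (assumed column name).
    If the dataset has n or fewer students, every row is returned.
    """
    rows = list(rows)
    students = sorted({row["Anon Student Id"] for row in rows})
    rng = random.Random(seed)  # seeded so the same sample can be recreated
    chosen = set(rng.sample(students, min(n, len(students))))
    return [row for row in rows if row["Anon Student Id"] in chosen]


# Toy data: 5 students with 3 rows each; keep 2 students.
rows = [{"Anon Student Id": f"s{i}", "Step Name": f"step{j}"}
        for i in range(5) for j in range(3)]
subset = sample_students(rows, 2, seed=42)
```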

== Web Services ==

==== Include transaction custom fields in web services student-step export ====

* "I noticed in the web services guide that cfs (which provides the custom fields) is not yet implemented for step roll-up tables. It is marked [coming soon] in the manual... I could write a workaround to pull in the transactions and lookup the custom fields, but I'd really rather not." -- Phil Pavlik, 4/27/2011

* "It would be best if the CF for step rollup was drawn from the primary transaction for the first attempt that is correct or incorrect. At least that seems like the generally best value. Basically, there may be many cases where the custom field is the same across all transactions for a step. In this case ... you could just use the first one since they are all the same." -- Phil Pavlik, 4/29/2011
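Phil's rule can be sketched in Python as follows. This is an illustration under assumptions, not DataShop's implementation: the transaction layout (an 'Outcome' key plus a 'cf' dict) is hypothetical, the "primary transaction" distinction is not modeled, and the fallback to the first transaction carrying the field is our addition for steps with only hints.

```python
def step_custom_field(transactions, cf_name):
    """Choose one step-level value for a custom field.

    transactions: one student's transactions on one step, in time order.
    Each is a dict with an 'Outcome' key ('CORRECT', 'INCORRECT', 'HINT', ...)
    and a 'cf' dict of custom field values (an assumed layout).
    """
    # Preferred: the first correct or incorrect attempt that carries the field.
    for tx in transactions:
        if tx["Outcome"] in ("CORRECT", "INCORRECT") and cf_name in tx["cf"]:
            return tx["cf"][cf_name]
    # Fallback: the first transaction of any kind that carries the field.
    for tx in transactions:
        if cf_name in tx["cf"]:
            return tx["cf"][cf_name]
    return None


txs = [
    {"Outcome": "HINT", "cf": {"condition": "A"}},
    {"Outcome": "INCORRECT", "cf": {"condition": "B"}},
    {"Outcome": "CORRECT", "cf": {"condition": "C"}},
]
print(step_custom_field(txs, "condition"))  # B
```

When the field has the same value on every transaction of a step, both rules return that value, which matches Phil's observation.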

==== Use Custom Fields Graphs/Reports ====
In Graphs

As a researcher creating custom fields and assigning values at the transaction level, I want DataShop to perform the aggregation to the step level so that I can do other things with my custom-field variable such as graph it. -- Ryan Baker, mtg w/Alida & Brett, 12/15/2008

Performance Profiler
* Use all discrete variables/custom fields on the left and all the continuous variables on the bottom of the Performance Profiler. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Ad-hoc queries ====

* Allow restricted filtering on steps and transactions as the next web service feature (after CFs), whatever they can filter on in the navigation boxes (User Meeting AAR, December 9, 2009)

==== Sample creation as a web service ====

* Sample creation is still too slow. (User Meeting AAR, December 9, 2009)

==== Ability to directly query the database ====

* I am looking for a unique identifier for student->problem-step->problem-view. I am able to compute this in R, but it would be better to be able to query the DataShop database to get this unique identifier instead of having to recreate DataShop (essentially) in R. -- Ilya Goldin, 5/21/2012

 
----
See completed [[DataShop 3.x Features]] 
See on-going [[DataShop 4.x Features]] 
See prioritized [[DataShop Feature Wish List]]
[[Category:Protected]]
[[Category:DataShop]]

Collected User Requests

2012-05-21T17:32:23Z

Bleber: /* Web Services */

== Data Format ==
==== Capture the question prompt and answer choices the student chose from ====
* As a researcher, I want to be able to identify steps based on the question prompt (not the difficult-to-understand step names that come from the selection and action of my tutor) so that I can analyze the data more easily. For multiple-choice questions, I also want to be able to see all of the choices that were available to the student. -- Eli Silk - February 1, 2011 (meeting with Brett and Ross Higashi of the FIRE project)
** Near-term solution is to create a table locally that maps steps to prompts
** Another solution is to ask CTAT team to modify their Flash components so that they log this information as custom fields
** Long-term solution is to make these fields into standard fields in the tutor message format
** In the email thread "Cognitive Tutor Development and Evaluation Requests", there is support for this general idea from Ken Koedinger, Albert Corbett, and Christian Schunn.
** Ken added that "Ideally, we may want to store any images that the student can see and where they reside on the screen ..."
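The near-term option, a local table mapping steps to prompts, could look like the following Python sketch. The tab-delimited mapping file, its column names, and the step names are hypothetical; answer choices are stored as a '|'-separated list.

```python
import csv
import io

# Hypothetical local mapping file (tab-delimited): step name -> prompt, choices.
MAPPING_TSV = (
    "Step Name\tPrompt\tChoices\n"
    "q1_gear_choice\tWhich gear turns fastest?\tA|B|C\n"
    "q2_rotations\tHow many rotations?\t\n"
)


def load_prompt_map(text):
    """Parse the mapping into {step name: (prompt, [choices])}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {
        row["Step Name"]: (row["Prompt"],
                           row["Choices"].split("|") if row["Choices"] else [])
        for row in reader
    }


def annotate(rows, prompt_map):
    """Yield copies of exported rows with 'Prompt' and 'Choices' added."""
    for row in rows:
        prompt, choices = prompt_map.get(row["Step Name"], (None, []))
        yield {**row, "Prompt": prompt, "Choices": choices}
```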

== Data Modeling ==
=== Non-KC Modeling ===

==== Automatic Distillation ====
* As an educational data miner wishing to develop a machine learned model with PSLC data, I would like to be able to automatically distill data features (e.g. custom fields) commonly used in past educational data mining research for a new data set (see, for instance, Baker, Corbett, Roll, & Koedinger, 2008 in UMUAI) -- Ryan Baker, Summer 2008, Startup Memo
** Could be implemented as a plug-in
* Also interested in this feature idea. -- Dan Franklin, Oct 2008

==== Upload model and apply it to new data set ====

* As an EDM researcher, I would like to take a model, expressible as a linear formula on DataShop fields or as a simple code procedure (e.g., Bayesian Knowledge Tracing, which Ryan has code for), and apply it to a new data set, so that I can ... -- Ryan Baker, Sept 2008
* Also interested in this. -- Maxine Eskenazi, Sept 2008
* May work best as a plug-in
** Code to display GUI to choose which data sets to use, calls model code, re-import to DataShop
** Good to have a way to apply many models, as soon as you import a data set
* Phil has an idea that maybe fits within this one. Please move if there's a better category. -- Brett Leber
<blockquote>This [''transaction? kc? --ed.''] relabeling is really mostly about enabling modeling in DataShop, right? With this in mind, I think that it is actually a higher priority to have model alternatives in DataShop.... E.g. Investigators should be able to give you chunks of Java code according to a certain specification, and DataShop should then be able to run these over datasets (perhaps after a certain series of QA occurs according to an SOP) when the investigator clicks some button in DataShop.... Obviously this is a much larger project than adding columns, but it is also much more important in my mind. ''-- Phil Pavlik, email to Brett on 1/14/2009''</blockquote>
* Examples:
** Example: running gaming detector in multiple tutors and comparing gaming frequencies
** Example: applying Bayesian Knowledge Tracing to a new data set from the same LearnLab
** Example: applying Ben Shih's models to many data sets. Note that Ben Shih should be included in design of this feature; he is interested, and has a lot of good ideas.

==== Add Different Predicted Values ====
* Would also like to add statistics, different predicted values than what LFA produces. -- Ken Koedinger, ET Meeting, 10/10/2007

==== Bayesian Knowledge Tracing ====
* Bayesian Knowledge-Tracing built into DataShop like LFA is. -- Ryan Baker, Startup Memo, Summer 2008
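For reference, the standard four-parameter BKT model (initial knowledge, learning/transit, guess, slip) fits in a few lines of Python. This is a generic textbook formulation, not Ryan's code or a DataShop implementation; the parameter values below are illustrative, and parameter fitting is not shown.

```python
def bkt_update(p_known, correct, p_transit, p_guess, p_slip):
    """One Bayesian Knowledge Tracing step for a single student and KC.

    Conditions P(known) on the observed first attempt, then applies the
    chance of learning (p_transit) at this opportunity.
    """
    if correct:
        evidence = p_known * (1 - p_slip)
        posterior = evidence / (evidence + (1 - p_known) * p_guess)
    else:
        evidence = p_known * p_slip
        posterior = evidence / (evidence + (1 - p_known) * (1 - p_guess))
    return posterior + (1 - posterior) * p_transit


def bkt_trace(observations, p_init, p_transit, p_guess, p_slip):
    """Return P(known) after each observation in a sequence of first attempts."""
    p_known, history = p_init, []
    for correct in observations:
        p_known = bkt_update(p_known, correct, p_transit, p_guess, p_slip)
        history.append(p_known)
    return history


def p_correct(p_known, p_guess, p_slip):
    """Predicted probability of a correct first attempt at the next opportunity."""
    return p_known * (1 - p_slip) + (1 - p_known) * p_guess


# Illustrative parameter values, not fitted to any dataset.
trace = bkt_trace([False, True, True], p_init=0.5, p_transit=0.1, p_guess=0.2, p_slip=0.1)
```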

==== Richer statistics for KC modeling ====

* In addition to the model stats and estimates generated for learning factor models, we should also create difficulty factor models (i.e., ones without a "Slope" parameter). The latter is particularly relevant for the Unique-Step model, where the slope parameter is meaningless (but still counts against the BIC value). -- Ken Koedinger, Email "new feature request", 1/22/2009
** Relatedly, we should report significance values on the Slope parameters -- that is, when is the Slope significantly different from 0.
** The KC models page perhaps should also report the log-likelihood and number of parameters (in addition to BIC) and leave out AIC. We might also consider other metrics of model generality, like the "adjusted R2" (if I have this name right -- Joe Beck mentioned in the Assistments meeting yesterday).
** These changes will be part of meeting the CMDM goal of improvement in (or at least demonstrate acceptability of) the cognitive models in 90+ units in our LearnLab courses (or affiliates).
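The statistics discussed above come from one log-likelihood plus a parameter count. A minimal Python sketch using the standard definitions (AIC = 2k - 2 ln L, BIC = k ln n - 2 ln L), assuming the model outputs a probability of a correct first attempt for each observation and that n is the number of observations (e.g., student-steps). Function names are hypothetical.

```python
import math


def log_likelihood(outcomes, predictions):
    """Bernoulli log-likelihood of first-attempt outcomes (1 = correct, 0 = not).

    Each prediction is the model's probability of a correct first attempt
    and must lie strictly between 0 and 1.
    """
    return sum(math.log(p) if y == 1 else math.log(1 - p)
               for y, p in zip(outcomes, predictions))


def fit_stats(outcomes, predictions, num_params):
    """Report log-likelihood, parameter count, AIC and BIC together."""
    ll = log_likelihood(outcomes, predictions)
    n = len(outcomes)  # number of observations, e.g. student-steps
    return {
        "log_likelihood": ll,
        "num_params": num_params,
        "AIC": 2 * num_params - 2 * ll,
        "BIC": num_params * math.log(n) - 2 * ll,
    }


stats = fit_stats([1, 0], [0.5, 0.5], num_params=1)
```

Each extra Slope parameter costs 2 in AIC and ln(n) in BIC, so a Unique-Step model that carries meaningless slopes is penalized without any gain in fit.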

=== KC Modeling ===

==== LFA/AFM: Check if enough memory using formula ====
* The LFA/AFM code could calculate how much RAM would be needed to run the algorithm on a given skill model using the formula provided by Hao. This formula is based on the number of transactions, number of students and number of skills. Right now it will not schedule itself to run on a model with over 300 skills, though there is a manual override. [[User:Alida|Alida]] 13:35, 29 November 2010 (EST)

==== Create KC Models through Web Services ====
* For John Stamper's CMDM project, it would be nice to automatically update KC Models through web services. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Automatically discovering new KC model ====

* Would it be possible to run some code (perhaps Hao's KC model selection code, perhaps something else generated by the CMDM thrust) to find a new, better KC model? -- Vincent Aleven, Sept 2008
* As a learning sciences researcher, I would like DataShop to discover a new/better KC model for me.
* Could be done as a plug-in

==== Generate new KC Models with LFA ====
* Not sure who asked for this.
* It would be nice to generate new KC Models with Hao's LFA code
* Would need to specify factors.
* Ideas on where this could run?
** On a separate server? Request it to be run, specify duration. Have separate server queue up requests, email user when done.
** In Java Applet on client machine? -- Phil Pavlik

==== Same Skill Twice on Same Step ====
* Would like to be able to apply the same skill to a step twice during a KC Model Import. -- Ken Koedinger, email, 2/4/2009

==== Save KC Model Import Files ====
* KC Model Import - save the file used to create the KCMs in case we need to recreate them. -- Ken Koedinger, email 3/4/2009

==== Log Likelihood and MAD ====
* Log Likelihood, MAD (mean absolute deviation) problem, MAD step (store and show) -- Hao Cen
** This is a variation on "Richer statistics for KC modeling" above. Probably should be merged. - Ken Koedinger

==== Better naming for KCs in auto-generated Unique-Step KCM ====

As a researcher, I want the KCs in the Unique-Step model to have better names than KC1, KC2, etc., so that I can easily tell which generated KCs go with which unique step.

* Hui Cheng (Email 1/20/2009), Ken Koedinger (Email 1/22/2009)
* Could you put this in your new feature request list: could the “Unique-step model” be better labeled than just “KC1”, “KC2”, etc.? For example, for Assistments data, you could use part of the “Step Name”. -- Hui Cheng, Email 1/20/2009
* But, anything is better than "KC". -- Ken Koedinger, email, 1/22/2009
<blockquote>A simple alternative, that preserves uniqueness and addresses length, is to concatenate: 1) the first K letters of the step name 2) a unique numerical increment (same as the "3" in "KC3"). Note that (2) guarantees uniqueness just as it does in the current "KC<num>" scheme. Or perhaps better given that step names are often scoped within problems, is to concatenate: 1) the first L letters of the problem name 2) the first M letters of the step name 3) a unique numerical increment (just like the "3" in "KC3") I think K or L+M should be as big as possible without making the KC names indistinguishable (because they run off the right margin) in the KC list on the Learning Curve and other pages. </blockquote>
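The second scheme in the quote (problem prefix, step prefix, counter) is easy to sketch in Python. The default L and M values (8 and 12) and the underscore separator are arbitrary placeholders, not agreed choices.

```python
def unique_step_kc_names(steps, l_chars=8, m_chars=12):
    """Name Unique-Step KCs as <problem prefix>_<step prefix>_<counter>.

    steps: (problem_name, step_name) pairs in the order KCs are created.
    The trailing counter alone guarantees uniqueness, just as in 'KC<num>';
    the prefixes only make the names readable.
    """
    names = {}
    for problem, step in steps:
        if (problem, step) not in names:
            counter = len(names) + 1
            names[(problem, step)] = f"{problem[:l_chars]}_{step[:m_chars]}_{counter}"
    return names


kc_names = unique_step_kc_names(
    [("ProblemAlpha", "StepNumberOne"), ("ProblemBeta", "StepTwo")],
    l_chars=4, m_chars=4,
)
# {('ProblemAlpha', 'StepNumberOne'): 'Prob_Step_1', ('ProblemBeta', 'StepTwo'): 'Prob_Step_2'}
```

With very short prefixes the readable parts collide ('Prob_Step'), which is why the counter is kept.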

==== Visualize Learning Curve Split ====
* Be able to visualize a learning curve split into 2 based on a specification of a subset of problems. -- Albert Corbett, Math CCM, November 2008
* Harder: make me a new KCM out of it. -- Ken Koedinger, Team Mtg, Dec 5, 2008, while trying to describe Albert's request to Alida

==== Statistical Significance ====
* Can DataShop determine if the difference between conditions or learning curves is statistically significant? -- general theme at workshop, probably mentioned by Bob Hausmann in his talk, Winter Workshop 1/23/2008
** Can't do it yet in DataShop, but I can show you how to do it in R (or SPSS...) after you've exported the data -- export the "student-step rollup" rather than the whole transaction table. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** This comment perhaps belongs (exists?) elsewhere: The current student-step rollup export (really, all exports) should be such that I can immediately load it into R (and other packages) without error. Currently errors occur, for instance, because there are "#" characters in the variable names in the student-step rollup. This is a roadblock for helping folks like Bob do the analyses they want to do. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** See [[Condition in Student-Step Rollup]] -- [[User:Alida|Alida]] 10:35, 17 September 2009 (EDT)
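One plausible culprit, offered as an assumption: R's read.table() treats '#' as a comment character by default (comment.char = "#"), so a '#' in the header row cuts the header short, while read.delim() defaults to comment.char = "". Renaming headers before loading avoids the problem. A minimal Python sketch that approximates R's make.names() rules for ASCII headers; the example headers are hypothetical.

```python
import re


def r_safe_name(name):
    """Approximate R's make.names() for one column header (ASCII only).

    'X' is prepended if the original name does not start with a letter, or
    starts with '.' followed by a digit; then every character other than a
    letter, digit, '.' or '_' becomes '.'. R reserved words are not handled.
    """
    needs_prefix = re.match(r"[A-Za-z]|\.(?!\d)", name) is None
    safe = re.sub(r"[^A-Za-z0-9._]", ".", name)
    return "X" + safe if needs_prefix else safe


headers = ["Anon Student Id", "# Hints", "Opportunity (Default)"]
print([r_safe_name(h) for h in headers])
# ['Anon.Student.Id', 'X..Hints', 'Opportunity..Default.']
```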

==== Split and Merge Skills ====

* Hand searches through a p-matrix for a dataset to split and merge skills. (Pie in the Sky) -- Ken Koedinger, Team Mtg, 02/22/2008

==== Notes on new KCMs ====

* It would be good if I could add a note to a KC model that was newly imported. -- Noboru Matsuda, email, Nov 19, 2009

==== Display Learning Curves Grouped by Interestingness ====

* The page displaying all the learning curves today seems to be alphabetically sorted by KC name, which is not necessarily meaningful.
* An alternative presentation is to group the curves into 4 sets, breaking up the page. Set 1 has curves that contain significant spikes, and therefore seem to be "low-hanging fruit" for purposes of breaking up into KCs. Set 2 has curves with few spikes, but they have a long X axis, suggesting that students are presented with too many opportunities to acquire those KCs. Set 3 has the "good" curves, i.e., nicely decreasing curves that are not too long. Set 4 is "other". -- [http://www.pitt.edu/~goldin Ilya Goldin] 7 December 2009
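A rough Python sketch of the four-way grouping, assuming each curve is a list of error rates by opportunity. How to detect a "significant spike" and what counts as "long" are open questions; every threshold below is an illustrative assumption, not a validated value.

```python
def classify_curve(error_rates, spike_jump=0.2, min_spikes=2, long_length=20, min_drop=0.1):
    """Assign a learning curve to one of the four proposed groups (1-4).

    error_rates: error rate at opportunity 1, 2, 3, ...
    All thresholds are illustrative assumptions.
    """
    # A "spike" is a rise of at least spike_jump from one opportunity to the next.
    spikes = sum(1 for prev, cur in zip(error_rates, error_rates[1:])
                 if cur - prev >= spike_jump)
    if spikes >= min_spikes:
        return 1  # significant spikes: candidate for splitting into more KCs
    if len(error_rates) > long_length:
        return 2  # few spikes but a long X axis: possibly too many opportunities
    if spikes == 0 and error_rates and error_rates[0] - error_rates[-1] >= min_drop:
        return 3  # "good": no spikes, an overall drop, not too long
    return 4  # other


curves = {"kcA": [0.6, 0.4, 0.3, 0.2], "kcB": [0.5, 0.8, 0.3, 0.7, 0.2]}
page_order = sorted(curves, key=lambda kc: (classify_curve(curves[kc]), kc))
```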

==== Order KC models according to AIC====
* This is based on results from multiple dataset analyses that compared AIC, BIC and loglikelihood to cross validation RMSE. -- Mimi McLaughlin, 2/9/2011

====Use log of opportunity count for AIC and BIC calculations====
* We compared using the log of opportunity count to whole number opportunity count in multiple datasets. We found the results for log of opportunity count to be consistently better, though small. -- Mimi McLaughlin, 2/9/2011
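The Additive Factor Model is commonly written as logit(p_ij) = θ_i + Σ_k q_jk (β_k + γ_k T_ik), where T_ik counts student i's opportunities on KC k. The Python sketch below shows the only change the log variant makes: T is replaced by log(T). It assumes opportunity counts start at 1 so the log is defined (with counts starting at 0, something like log(1 + T) would be needed). It illustrates the model form only and is not DataShop's AFM code.

```python
import math


def afm_p_correct(theta, kcs, beta, gamma, opportunity, use_log=False):
    """AFM-style probability of a correct first attempt on one step.

    theta: the student's proficiency; kcs: KCs tagged on the step;
    beta[k], gamma[k]: easiness and learning rate of KC k;
    opportunity[k]: the student's opportunity count for k at this step,
    assumed to start at 1 so that log(opportunity) is defined.
    """
    logit = theta
    for k in kcs:
        practice = math.log(opportunity[k]) if use_log else opportunity[k]
        logit += beta[k] + gamma[k] * practice
    return 1.0 / (1.0 + math.exp(-logit))
```

The log variant flattens the learning term as practice accumulates, so later opportunities change the prediction less than early ones.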

== Developer Requests ==

==== Store Converted Date and Converter Info ====

As a DataShop administrator, I'd like to see the converter information (version and date) stored in the database, so that I do not have to store that data manually in the Additional Notes field each time I load a dataset. -- Kyle, 8/5/2008

* DTD new fields:
** Store conversion and converter information in database (anything else?)
*** conversion_time
*** converter_info
* Email from Octav, 10/5/2007

<blockquote>I see there's also a note about the converter version in the Dataset Info, which is good, but it seems it's taken from the directory name when I submitted the set. I don't know how reliable that is. :-) It would be better if it's taken from the new converter info field.</blockquote>

==== Convert from XML to tab-delimited format ====

* If users agree that the export format is valuable, a way to convert XML logs to the export format would let them view the data in Excel, where they could look at the Selection column and spot blanks more easily. -- Jonathan Sewall, ET Mtg, 10/10/2007
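A minimal Python sketch of such a converter, using only the standard library. It assumes a simplified message layout (tool_message elements containing an event_descriptor with selection, action and input children), which covers only a small part of the tutor message format; contexts, semantic events and tutor responses are ignored.

```python
import csv
import io
import xml.etree.ElementTree as ET


def tool_messages_to_tsv(xml_text):
    """Flatten tool_message events into tab-delimited Selection/Action/Input rows.

    Assumes a simplified layout: tool_message > event_descriptor >
    selection, action, input. Missing or empty elements become empty cells,
    which makes blank selections easy to spot once opened in Excel.
    """
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Selection", "Action", "Input"])
    for msg in root.iter("tool_message"):
        desc = msg.find("event_descriptor")
        row = []
        for tag in ("selection", "action", "input"):
            el = desc.find(tag) if desc is not None else None
            row.append((el.text or "").strip() if el is not None else "")
        writer.writerow(row)
    return out.getvalue()


sample = (
    "<tutor_related_message_sequence>"
    "<tool_message><event_descriptor><selection>field1</selection>"
    "<action>UpdateTextField</action><input>42</input></event_descriptor></tool_message>"
    "<tool_message><event_descriptor><selection/>"
    "<action>ButtonPressed</action><input>-1</input></event_descriptor></tool_message>"
    "</tutor_related_message_sequence>"
)
print(tool_messages_to_tsv(sample))
```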

==== Plug-ins (general issues) ====

* Please please support Ruby on Rails. -- Ben Shih, December 2008

==== Create UI to grant DataShop user roles ====

* Granting roles is already tedious.
* Alida, User Meeting AAR on December 9, 2009

== Help ==

==== Specialize Label of Help Button ====
* Since this help is better than in most applications, it should say more than just 'Help'. -- Ken Koedinger & Ryan Baker, Team Mtg, May 30, 2008
** Ideas:
*** 'Page Help'
*** 'Help with this page'
*** 'Help with Learning Curve page' (Ken's favorite)
*** 'Help with this tool'

== Home Page ==

==== Redesign the Home Page ====

* In the menu of data sets at the top, include the N (=20 or as many as fit on the screen?) data sets that I have visited starting with the ones I've visited most recently. --[[User:Koedinger|Koedinger]] 16:16, 16 September 2009 (EDT)
** As of today, the menu shows the last 10 data sets, most recently visited at the top. I think the feature is good enough, but let's ask Ken --[[User:Bleber|Bleber]] 10:56, 6 August 2010 (EDT)
* There needs to be a better ordering for the datasets (DS364)
* Maybe a search to filter the list of datasets since the list is so long. -- Brett Leber, 6/14/2007
* Going back to the home page always goes to 'My Datasets' (DS313)
* Maybe show more high-level stats on this page, like how many transactions [done Jan 2012], students, and skill models
* Allow users to post and share project documentation (files, papers, other meta-info) -- Ruth Wylie, suggested during meeting on 8/4/2010. She had a file that is relevant to multiple datasets in her project.

==== News Feed ====
* Bruce McLaren, email, 4/20/2012
**I wonder whether it would be possible to have a scrolling "News" feed somewhere on the DataShop site that would keep researchers informed about what is happening with the DataShop. For instance, the "News" feed could always show the last time a data conversion finished, announce an upcoming workshop, or inform everyone of critical DataShop issues, such as the fact that the DataShop had a server go down recently (which I know caused you guys a lot of headaches - but which I didn't hear about until a couple weeks after it happened). This kind of thing could be a great communication tool and, as an added bonus for DataShop personnel, avoid lots of email with questions like "where is my data?" or "when is the next conversion going to finish?" Perhaps it would even be possible to have the data conversion routine automatically update the "News" feed each time it begins and/or finishes processing?
** Take it with a grain of salt -- I know you have lots of things on your plate -- but I have been in the situation often, especially just before and during my studies, where I wasn't sure what was going on with DataShop conversions and issues and had to track down someone -- typically Alida -- to figure things out. I have the advantage of sitting right next door to Alida, but I wonder how many other researchers within the PSLC, those not in close proximity to Alida, deal with this issue of not really knowing what is happening with the DataShop at any given time.
* Jonathan Sewall also requested a page that shows the status of the log conversion process, including how much data was processed and for which datasets

== Import ==
== Miscellaneous ==

==== Analyses by LearnLab ====

* Organize data by LearnLab, not by "data set". -- Ryan Baker, Aug 2008
* Also: Bob Hausmann, Sep 2008; Maxine Eskenazi, Sep 2008
* Essentially, current data sets become samples, but the top-level unit is the LearnLab. You can take every data set in a LearnLab together as a sample.
* Implies being able to run analyses across data sets, export multiple data sets together, and create multi-data-set samples
* As a user of DataShop, I would like to look at learning curves for all Algebra data together (for example), or export all Algebra data
* Important long-term, but is a lot of work -- in particular, we need to solve scalability issues first.

==== Save Settings Between Sessions ====
* It would be useful if DataShop could save settings between sessions. -- Bob Hausmann, User Meeting, 2/1/2008
** "I do a lot of redoing the same steps" (e.g., set cutoffs, select a KC model, select students).

==== Multiple steps per transaction ====

* Needed so that we do not have to create multiple transactions for the same actual action for Andes logs. -- Kurt van Lehn, Feb 2007

==== Demographic data ====

* This has been mentioned by NSF visitors, AB, ESL, and some researchers.
* Also mentioned at Winter Workshop 1/23/2008.
** Derek/Sue-mei: Student background information not in DataShop. Would like to see a student or set of students from a particular demographic, and view them across datasets!
* Note that Gail added demographic data to Additional Notes field on the Dataset Info page for many datasets. The idea here is to put that data into the database somewhere.

==== Single Sign On ====
* Michael Bett, email, 10/8/2007
* It would be nice if the following services have a single login account/password:
*# Theory Wiki
*# Learnlab.org
*# ESL's OSS
*# DataShop

==== Reveal unanonymized student IDs ====

As a researcher/PI performing research assistant tasks, I want to see easily the unanonymized student IDs of subjects in DataShop so that I can email my subjects telling them when to use my system.

* Ruth Wylie, July 3, 2008
* As the honest broker of the PSLC data, I have promised to not reveal the student IDs and to protect the identity of the students. This is part of the DataShop IRB. Therefore, I do not see this request as possible. [[User:Alida|Alida]] 09:53, 4 September 2009 (EDT)
** At some point, some coordination would be good with OLI's Digital Dashboard project that Marsha Lovett (and the OLI team and sometimes me) is working on. This is meant to provide usage information more quickly to instructors. It could also perhaps be used by researchers (with the right IRB rights) in situations like Ruth's.

==== Knight Timeline ====
* Developed by Andrea Knight, 2004

==== Buggy Skills ====
* Ken Koedinger, prototype walkthrough 9/11/2006

==== Confusion Matrix ====
* Brian MacWhinney, prototype walkthrough 9/11/2006

== Navigation Bar ==
==== Filter KCs by Name ====

As a researcher working with KCs, I want to filter KCs based on their names, so that I can...
* Vincent Aleven, Email, 2/3/2008
* "Since Alida said you cannot have two mechanisms for putting together your KC set (i.e., cannot have both the selecting-by-clicking and selecting-by-filtering), I would probably opt for the latter."
* Alida: I thought Vincent mentioned that he'd like to select which KCs are in a set by filtering on the name. Example: Include KCs with '*reason*' in the name and exclude KCs with '*given*' in the name.
* This could be an addition to our v3.0 KC-selection mechanism--filter by name.
* Vincent, Email, 5/6/2009: Expressed another need for this feature. Could just allow for a wider area and longer list so that more items can be checked at once. The number of characters we show right now is not enough because in many cases that number of characters is the same across many of the skills. Reference data set: Geometry CWCTC 2005-2006

* Status: Design Started
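Name filtering with include and exclude patterns, as in Alida's example, can be sketched with Python's fnmatch module. Shell-style wildcards are an assumption; the actual feature might use SQL-style patterns or regular expressions instead. The KC names in the example are made up.

```python
from fnmatch import fnmatchcase


def filter_kcs(kc_names, include=("*",), exclude=()):
    """Keep KCs whose names match any include pattern and no exclude pattern."""
    return [
        kc for kc in kc_names
        if any(fnmatchcase(kc, p) for p in include)
        and not any(fnmatchcase(kc, p) for p in exclude)
    ]


kcs = ["angle-reason-given", "angle-reason-derived", "triangle-sum", "vertical-angles-given"]
print(filter_kcs(kcs, include=["*reason*"], exclude=["*given*"]))  # ['angle-reason-derived']
```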

==== Facebook-style KC Selection ====

As a researcher working with KCs, I want to select KCs based on the learning curve thumbnail, so that I can see quickly which ones I'm interested in exploring more deeply.

* Feature already designed for v3.0, not implemented due to time constraints.
* Agreed this would be really useful. -- Kirsten Butcher, User Mtg, 1/31/2008
* Status: Guesstimate: 20 days; need to revisit requirements document

==== Feedback after clicking a large sample on a large dataset ====

As a user of DataShop (first-time or not), I want some feedback and the ability to cancel after I do something that might take a long time (e.g., clicking "All Data" on a large dataset) so that I do not get stuck.

* Part of the Susan Goldman story
* After clicking a sample for a large dataset, there is no "Loading..." text, no feedback that the click was even registered by the app (besides the sometimes busy cursor and small browser "loading" text), nor the ability to cancel the action. 
* We will always have similar problems even if performance is improved, so providing feedback and the opportunity to cancel is critical.

==== Save Button in Problem Navigation Box ====

* Save buttons in the sidebar. -- Ken Koedinger, Mtg 2006
** Could also put one in the Problem selection box in the sidebar.

==== Make Nav Bar Wider ====
* Make the Sample name and description fields much wider. -- John LaPlante, email 7/10/2007

== New Visualizations/Reports ==

==== Student-KC Rollup ====
As a researcher, I want to see KCs rolled up by student, so that I can ...

* Vincent Aleven, User Mtg, 1/29/2008
** By Student-KC would be more useful than by Student-Problem
** Example: # Steps asking for a hint or error or what proportion had help
** How often bottom out hint occurs
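A Student-KC rollup of the kind Vincent describes could be computed from student-step rows. A minimal Python sketch, assuming one row per (student-step, KC), so a step tagged with two KCs counts toward both; the key names, including the bottom-out flag, are hypothetical.

```python
from collections import defaultdict


def student_kc_rollup(step_rows):
    """Summarize student-step rows per (student, KC).

    step_rows: dicts with 'student', 'kc', 'first_attempt' ('correct',
    'incorrect' or 'hint') and 'bottom_out' (True if the student reached the
    bottom-out hint on that step). Key names are illustrative.
    """
    totals = defaultdict(lambda: {"steps": 0, "hint_or_error": 0, "bottom_out": 0})
    for row in step_rows:
        t = totals[(row["student"], row["kc"])]
        t["steps"] += 1
        if row["first_attempt"] in ("incorrect", "hint"):
            t["hint_or_error"] += 1
        if row["bottom_out"]:
            t["bottom_out"] += 1
    # Add the proportion of steps whose first attempt was a hint or an error.
    return {key: {**t, "hint_or_error_rate": t["hint_or_error"] / t["steps"]}
            for key, t in totals.items()}


rollup = student_kc_rollup([
    {"student": "s1", "kc": "kcA", "first_attempt": "correct", "bottom_out": False},
    {"student": "s1", "kc": "kcA", "first_attempt": "hint", "bottom_out": True},
    {"student": "s1", "kc": "kcA", "first_attempt": "incorrect", "bottom_out": False},
    {"student": "s2", "kc": "kcA", "first_attempt": "correct", "bottom_out": False},
])
```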

==== Instructor Reports ====
* Phil said he received a lot of positive reactions to providing reports on units for instructors. -- Phil Pavlik, ET Mtg 10/10/2007

==== Manage Authorizations/Projects Page ====
* Lisa Anthony, email 10/23/2007
* Allow PI to change permissions on the datasets.
:"Actually, I couldn't see how to change permissions on the datasets from the website. Is this possible? If not, it might be a nice feature..."

==== Calculate Time Spent on Different Study Activities ====

As a researcher, I want to know how much time, on average, students spend on study activities, so that I can ...

* Bruce McLaren, Email, 4/7/2009
<blockquote>
For my most recent stoich study, Shawn and I are interested in calculating timing information such as: (a) how long students spent, on average, working on individual tutors (b) how long students spent, on average, on all items in an intervention (c) how long students worked, on average, on post-tests. 

Timing information is very commonly required for studies, and can be calculated from DataShop logs relatively easily, so even if we don't have it, might be worth considering. (And we don't want to re-invent the wheel, if you already have it or are planning it...)
</blockquote>
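One way to compute such timing from transaction logs is sketched below in Python, assuming each transaction is reduced to (student, unit, ISO timestamp), where a unit is a tutor, an intervention, or a post-test. Time before a student's first transaction in a unit is not counted, and the 5-minute idle cutoff is an assumption that would need to be chosen per study.

```python
from collections import defaultdict
from datetime import datetime


def average_minutes_per_unit(transactions, max_gap_s=300):
    """Average minutes students spent on each unit.

    transactions: (student, unit, ISO-8601 timestamp) tuples in any order.
    A student's time on a unit is the sum of gaps between consecutive
    transactions; gaps longer than max_gap_s are treated as idle time and
    dropped.
    """
    times = defaultdict(list)
    for student, unit, ts in transactions:
        times[(student, unit)].append(datetime.fromisoformat(ts))
    minutes = defaultdict(list)
    for (student, unit), stamps in times.items():
        stamps.sort()
        gaps = ((b - a).total_seconds() for a, b in zip(stamps, stamps[1:]))
        minutes[unit].append(sum(g for g in gaps if g <= max_gap_s) / 60)
    return {unit: sum(vals) / len(vals) for unit, vals in minutes.items()}


log = [
    ("s1", "tutor1", "2009-04-07T10:00:00"),
    ("s1", "tutor1", "2009-04-07T10:02:00"),
    ("s1", "tutor1", "2009-04-07T10:04:00"),
    ("s2", "tutor1", "2009-04-07T10:00:00"),
    ("s2", "tutor1", "2009-04-07T10:01:00"),
]
print(average_minutes_per_unit(log))  # {'tutor1': 2.5}
```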

==== Incorrect Step Duration and Hint Step Duration ====

As a researcher, I want to be able to see total step duration if the student's first attempt was an incorrect attempt, and total step duration if the student's first attempt was a hint request, so that I can do some analyses that I can't do with "Error Step Duration".

* Bob Hausmann, email, 11/11/2008.
* Updated title and story with 'step duration' instead of 'time'. -- [[User:Alida|Alida]] 10:36, 4 September 2009 (EDT)
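As we understand it, the existing Error Step Duration covers steps whose first attempt was either an incorrect attempt or a hint request; the requested columns would split it by which of the two came first. A Python sketch over student-step rows, with illustrative key names and step durations assumed to be already computed:

```python
def add_split_durations(step_rows):
    """Add 'incorrect_step_duration' and 'hint_step_duration' to each row.

    step_rows: dicts with 'first_attempt' ('correct', 'incorrect' or 'hint')
    and 'step_duration' (seconds). A column holds the step duration only when
    the first attempt matches it, and None otherwise (a blank in an export).
    """
    for row in step_rows:
        first, duration = row["first_attempt"], row["step_duration"]
        row["incorrect_step_duration"] = duration if first == "incorrect" else None
        row["hint_step_duration"] = duration if first == "hint" else None
    return step_rows


rows = add_split_durations([
    {"first_attempt": "incorrect", "step_duration": 12.0},
    {"first_attempt": "hint", "step_duration": 30.0},
    {"first_attempt": "correct", "step_duration": 5.0},
])
```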

==== Grading ====

[[Grading]]

====Display number of steps and number of observations for skills====
* How and where to display these is to be determined by developers. -- Ken (entered by Mimi)

== Reports ==
=== Dataset Info ===

==== Pointers to Hard-copy Data ====
* Brett van de Sande, NSF Site Visit, 5/28/2008
* Pointers to hard-copy data such as paper tests and/or homework.  Include contact information.  It doesn't seem to make sense to scan a whole filing cabinet of paper if no one wants to look at it.  And any secondary researchers don't know about the filing cabinet to ask for it.

==== Sort Problem Breakdown Table ====

* Would like the ability to sort the table by clicking on the column headings of the Problem Breakdown Table on the Dataset Info Tab. -- Bruce McLaren, User Mtg, 11/5/2007

==== Rename dataset ====

As a researcher, I want to rename my dataset so that it makes more sense to other people. I also want to make sure the dataset doesn't become polluted later by new data not associated with my study.
* Ruth Wylie, July 3, 2008
* There are reasons she would want to do this (current name is worthless, other researchers might try her tutor and pollute her data) but also reasons for not doing it (log more data later).
* There are risks in changing a dataset name that might not be apparent, for example, if you later want new data to go into the same dataset. [[User:Alida|Alida]] 10:27, 4 September 2009 (EDT)

==== Average time per problem, average number of problems, total number of sections ====

In addition to showing student hours per dataset, it would be useful to know the average time spent per problem, average number of problems completed, and the total number of sections.
* Noboru Matsuda, June 18, 2010

=== Error Report ===

==== View By Student ====

* Would like to see what a couple of students saw in the feedback. -- Marsha Lovett, 10/11/2007

==== Export ====
* I would like the ability to export this data. -- John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
* Also interested in this feature idea. -- Bruce McLaren, User Mtg, 11/5/2007

==== Sort ====

* Was planned for but not implemented in v2.1 (estimated to be a 4 day task)
** By Correctness %, starting with the least correct
** By Hints %
** Step (or KC if view by KC)
** Number of Students
* Ability to sort problems by their average experienced position within the curriculum. -- Ken Koedinger, 02/16/2007
** Which problem did students most often experience first, then the one experienced second most often, ...
* Order steps by the order they typically are executed by students. -- Ken Koedinger, email 11/7/2008
:"Searching through the steps in a problem to get a sense of what is going on is currently hard because the steps are ordered alphabetically, not by the order in which most students did them. While not all students do all steps in the same order, there is some regularity there. It would be quite useful if the steps could be ordered in a "typical order". This could be accomplished by using the time stamps (of the first (correct?) transaction?) for each step to determine rank order of each step for each student in a problem and then for each problem average the rank order of each step across all students. Then arrange the steps in the Error Report by their average rank order -- that is, roughly speaking, the step that is most likely to be first across students (closest on average to first) goes first, the step with the next lowest rank goes next, etc."
* The capability to count the number of errors of each message type and sort in different ways, for instance by all errors that had no messages. -- Bruce McLaren, email 10/22/2007
:"This is an error analysis I recently did in Excel, using pivot tables, that might be handy if in the DataShop. This one is very important for tutors because the errors that occur most frequently, yet don't elicit messages to the students, are good candidates to become errors with feedback."
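Ken's "typical order" procedure for a single problem can be sketched in Python. The input layout is an assumption: for each student, a map from step to the timestamp of that student's first transaction on the step (whether to use the first correct transaction instead is left open, as in the request). Ties in average rank fall back to step name.

```python
from collections import defaultdict


def typical_step_order(first_times):
    """Order a problem's steps by their average rank across students.

    first_times: {student: {step: timestamp of the student's first
    transaction on that step}} for one problem. Timestamps can be any
    comparable values (datetimes, seconds, ...). Steps a student never
    attempted simply contribute no rank for that student.
    """
    rank_sums = defaultdict(float)
    rank_counts = defaultdict(int)
    for steps in first_times.values():
        ordered = sorted(steps, key=steps.get)  # this student's step order
        for rank, step in enumerate(ordered, start=1):
            rank_sums[step] += rank
            rank_counts[step] += 1
    avg_rank = {step: rank_sums[step] / rank_counts[step] for step in rank_sums}
    return sorted(avg_rank, key=lambda s: (avg_rank[s], s))


order = typical_step_order({
    "s1": {"a": 1, "b": 2, "c": 3},
    "s2": {"b": 1, "a": 2, "c": 3},
    "s3": {"a": 5, "b": 9, "c": 7},
})
print(order)  # ['a', 'b', 'c']
```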

==== Show more than 500 problems ====

In the error report, can we see more than 500 problems? See the dataset "Cog Model Discovery Experiment Spring 2010", where the problem list shows "2/2888 selected. (Showing the first 500)". -- Ken via email on 2/15/2011

=== Export ===

==== Include Step Start Time in transaction format ====

* The transaction-level export already includes problem start time; could it also include step start time? I can easily compute it myself, but it seems there's a specific algorithm that the student-step rollup uses, and it might be nice to include the same value here. --Ilya Goldin, email on 01/16/2012

==== Last attempt on step? column for transaction format ====

* Include a new column that shows whether or not the row is the student's last attempt on a step. Values could be 0 or 1. This would help researchers who are grading data. Transaction format. --Vincent Aleven, CTAT mtg 11/5/2010
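One way the proposed 0/1 column could be computed, sketched in Python. The row keys are hypothetical; if a student can see the same problem more than once, the problem view would probably need to be part of the key as well.

```python
def mark_last_attempt(rows):
    """Add a 0/1 'last_attempt' value to each transaction row.

    rows: list of dicts with the hypothetical keys 'student', 'problem',
    'step', and 'time'. The latest row for each (student, problem, step)
    gets 1; if two rows share the latest time, the one listed later wins.
    """
    last_index = {}
    for i, row in enumerate(rows):
        key = (row["student"], row["problem"], row["step"])
        j = last_index.get(key)
        if j is None or row["time"] >= rows[j]["time"]:
            last_index[key] = i
    winners = set(last_index.values())
    for i, row in enumerate(rows):
        row["last_attempt"] = 1 if i in winners else 0
    return rows
```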

==== Elapsed Time ====

* Include the elapsed time in the preview and the transaction export. It is more valuable than the absolute transaction time, though it would be possible to keep both. --Ken Koedinger, Team Mtg 04/18/2008

==== SQL Format ====

* Option to export as an SQL file. -- Ken Koedinger, 03/26/2007, also brought up in June ET Meeting
** Ability to export an SQL dump of a dataset. --Kyle Cunningham, 04/03/2007

==== Specify Character for Blanks ====

* Ability to specify what character, if any, is used for blank values. --Ryan Baker, email 8/9/2007
:"Not all tools handle TABTABTAB correctly on import. The period '.' is used to mean missing data in most stats packages. The word 'BLANK' is used in some other ones." Not an issue for Ryan, as he wrote a preprocessor to convert blanks.
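A sketch of a configurable blank token for tab-delimited output, using Python's standard csv module. The function and parameter names are illustrative only; `blank="."` corresponds to the stats-package convention Ryan mentions, and `blank=""` reproduces the current TAB-TAB output.

```python
import csv
import io


def write_tsv(rows, header, blank="."):
    """Write rows as tab-delimited text, replacing empty values with `blank`."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        # None and "" are both treated as missing; 0 is kept as a real value
        writer.writerow([blank if value in (None, "") else value for value in row])
    return out.getvalue()
```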

==== Opportunity (at KC) Count in Transaction Export ====
* Include the opportunity count in the transaction export (it's only in the student-step rollup) -- Noboru Matsuda, 10/08/2009
** Would be repetitive
** Would need one Opportunity column per KC
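Assuming each student-step counts as one opportunity for every KC it is tagged with (our reading of how the student-step rollup counts), per-KC opportunity counts could be derived as below. The function and input layout are hypothetical. In a transaction export, every transaction would copy the counts of its student-step, which is where the repetition noted above comes from.

```python
from collections import defaultdict


def opportunity_counts(steps):
    """Assign per-KC opportunity counts to student-step rows.

    steps: list of (student, step_id, kcs) tuples in chronological order,
    one per student-step (not per transaction). Returns a list of
    {kc: opportunity} dicts parallel to `steps`; a step tagged with several
    KCs yields several values (one Opportunity "column" per KC).
    """
    seen = defaultdict(int)  # (student, kc) -> opportunities so far
    result = []
    for student, _step_id, kcs in steps:
        counts = {}
        for kc in kcs:
            seen[(student, kc)] += 1
            counts[kc] = seen[(student, kc)]
        result.append(counts)
    return result
```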

==== Export only rows that have KCs tagged ====
* Is it possible, when exporting a dataset, to include only the [transaction?] rows that have knowledge components tagged? And likewise for the KC model export: could it include only the items that have KCs tagged? -- Hui Cheng, 01/19/2010
** We have the inverse of this option on the Performance Profiler, 'Include steps without a knowledge component', and with the Student-Problem export, 'Include Steps without KCs'.

==== Include Custom Fields in Student-Step Rollup ====
* Is it possible to include the custom field in the DataShop [step] export? -- Hui Cheng, 03/01/2010
** "I am thinking about the student-step level export. We have a student from Statistics wanting to use Assistment data from DataShop. She wants to have problem set id (=curriculum id in DataShop). For Assistment data, we decided to put problem set id in the custom field because in the Assistment a lot of problems are repeated in different problem sets. She needs data at the student-step level." (Hui email)
* Vote from Phil Pavlik too -- see [[Collected_User_Requests#Include_transaction_custom_fields_in_web_services_student-step_export|Phil's comment under Web Services]]

==== Don't duplicate rows in student-step format when not showing KCs ====
* If the checkbox to show knowledge components is not checked, it may not make sense to show a step's row more than once just because the step has more than one KC associated with it. (Mimi and Brett stumbled on this, 8/16/2010.)

==== Student-Step Rollup include Success Column ====
* In the student-step rollup, add a column called Success: 1 if correct, 0 if incorrect or hint, blank otherwise. ~~ Ken Koedinger, DataShop Team Meeting, Oct 22, 2010
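The proposed Success column as a small mapping, sketched under the assumption that it is derived from the rollup's first-attempt value (the request does not say which attempt). The function name is illustrative.

```python
def success_value(first_attempt):
    """Map a step's first-attempt outcome to the proposed Success column.

    Returns 1 for "correct", 0 for "incorrect" or "hint", and "" (blank) otherwise.
    """
    outcome = (first_attempt or "").strip().lower()
    if outcome == "correct":
        return 1
    if outcome in ("incorrect", "hint"):
        return 0
    return ""
```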

=== Learning Curve ===

==== Default sort by observation on LCPID ====

* Default sort by observation on LCPID. ~~ Ken Koedinger, DataShop Team Meeting, 10/22/2010

==== Purple Point ====
* Purple Point: if a point on the LC has more than one KC associated with it but you have drilled down to a given KC, then the blue line is off. We could put a purple point that takes this into account.
* Simpler thing: display a warning message that some points in the display are driven by other KCs
* Pearson may be interested
* This was mentioned during the PSLC Summer School 2010.
* For a step with multiple skills, attribute the error only to the skill with the highest overall error rate. ~~ Alida, meeting with Ken, November 18, 2010
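A sketch of Alida's suggestion for multi-skill steps: compute each KC's overall error rate, then charge each error on a multi-KC step only to the KC with the highest rate. The input format and names are assumptions. Whether the other KCs should still count the step as an opportunity is a design choice; this sketch keeps the opportunity for all of them.

```python
def attribute_errors(steps):
    """Recount errors so a multi-KC step blames only one KC.

    steps: list of (kcs, is_error) pairs, one per student-step first attempt.
    Every KC on a step still gets the opportunity, but an error is charged
    only to the step's KC with the highest overall error rate (ties go to the
    KC name that sorts last). Returns {kc: (errors, opportunities)}.
    """
    # Overall error rate per KC, charging every KC on a step (current behavior)
    totals = {}
    for kcs, is_error in steps:
        for kc in kcs:
            errors, n = totals.get(kc, (0, 0))
            totals[kc] = (errors + int(is_error), n + 1)
    rate = {kc: errors / n for kc, (errors, n) in totals.items()}

    # Recount, charging each error only to the highest-error-rate KC
    result = {kc: [0, 0] for kc in totals}
    for kcs, is_error in steps:
        for kc in kcs:
            result[kc][1] += 1
        if is_error and kcs:
            blamed = max(kcs, key=lambda kc: (rate[kc], kc))
            result[blamed][0] += 1
    return {kc: (errors, n) for kc, (errors, n) in result.items()}
```

In the usage check below, KC "B" would otherwise be charged the error on the shared step; here it goes to "A", whose overall error rate is higher.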

==== Reduce Scrolling ====
* Add a forward and back button to the graph to reduce scrolling. -- John LaPlante, email 7/10/2007

:"The learning curve page could use a forward and back button to cycle through the learning curves. Going through them one by one requires a lot of scrolling."

==== Turn On Point Labels ====

* It would be nice to have the option to turn on point labels. It is nice that I can mouse over a point and view the data but it would be nicer if it appeared automatically. -- John LaPlante, email 7/10/2007

==== Option for Bigger Graph ====

* Allow user to see bigger graph. -- Derek Chan, Winter Workshop 1/23/2008
** Potential solution: enable user to set x, y scale manually

==== LC Normalize Scale of Thumbnails ====
* [[LC Normalize Scale of Thumbnails]]

=== Performance Profiler ===

==== Rename Performance Profiler ====
* John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
** Did not use this report, thinking it had something to do with improving the performance of the DataShop itself. This report might have been much better to use than the Error Report, but an export would still be needed, since using the data in tabular form was still necessary. Note that the pivot tables created were added to the dataset (Pittsburgh Science of Learning Center Stoichiometry Study 1).
* Lisa Anthony, email 8/2008
** Didn't know to go to the report. Export would be useful.
** Needed a better definition of Error Rate with respect to Problem and Unit rows.

==== Export ====
* John LaPlante (see comments in Rename Performance Profiler)
* Lisa Anthony (see comments in Rename Performance Profiler)
* Yes, I too can imagine wanting to export the results of a particular performance profiler output (i.e., to a table) so that I can graph it my own way. --[[User:Koedinger|Koedinger]] 16:24, 16 September 2009 (EDT)

==== Table View ====
* Add option to switch to a table view.
* Columns: Problem Name, Steps, % Incorrect, Incorrect Steps, % Hint, Hint Steps, etc. Include all the values shown in the pop-up.
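A possible shape for the table view, computed from first attempts, sketched in Python. The column names follow the list above. The error-rate column assumes error rate means incorrect plus hint over all steps, which should be checked against DataShop's own definition before use; the function name is hypothetical.

```python
from collections import Counter, defaultdict


def profiler_table(step_rows):
    """Build Performance Profiler rows as a table instead of a bar chart.

    step_rows: iterable of (problem_name, first_attempt) pairs, where
    first_attempt is "correct", "incorrect", or "hint".
    """
    counts = defaultdict(Counter)
    for problem, outcome in step_rows:
        counts[problem][outcome] += 1
    table = []
    for problem in sorted(counts):
        c = counts[problem]
        steps = sum(c.values())
        table.append({
            "Problem Name": problem,
            "Steps": steps,
            "Incorrect Steps": c["incorrect"],
            "% Incorrect": round(100 * c["incorrect"] / steps, 1),
            "Hint Steps": c["hint"],
            "% Hint": round(100 * c["hint"] / steps, 1),
            "Correct Steps": c["correct"],
            "% Correct": round(100 * c["correct"] / steps, 1),
            # Assumed definition: error rate = (incorrect + hint) / steps
            "% Error Rate": round(100 * (c["incorrect"] + c["hint"]) / steps, 1),
        })
    return table
```

Showing the raw step counts next to the percentages would also address the design idea below about clarifying how the percentages are calculated.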

==== Union of KCs/Problems/Students ====
* Allow user to get the union of KCs/Problems/Students etc so they can compare across samples easier. -- Kirsten Butcher, Winter Workshop 1/23/2008
:"It is difficult to compare performance profiler graphs across samples because the KCs (or problems, or whatever) aren't necessarily in both of those samples."
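Comparing samples on the union of their items, with blanks where an item is missing from a sample, could look like the hypothetical helper below; the values could be error rates, counts, or anything else the profiler plots.

```python
def align_samples(*samples):
    """Line up per-item values from several samples on the union of their items.

    Each sample is a {item: value} dict (item = KC, problem, student, ...).
    Items missing from a sample get None, so bars can be compared side by side.
    """
    items = sorted(set().union(*samples))
    return [(item, [sample.get(item) for sample in samples]) for item in items]
```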

==== Show Details In Report ====
* Click on a bar to see details in the report, not just in the pop-up, which disappears too quickly. -- Alida, Brett

==== Show More Information in the Graph ====
* Show more information in the graph: [Bruce M, User Mtg, 11/5/2007]
** Had drilled down by a certain skill, but the skill is not listed in the graph; the user has to check the skill list on the left-hand side to see which skill was selected
* Maybe related to Bruce's "show more info in graph": show the actual value of the range variable. e.g., when range is error rate, show the error rate number somewhere (right now you can see, via mouse-over, the incorrect, hint, and correct percentages, but not the error rate) -- Brett
* Design idea to show # steps incorrect, # steps hint, # steps correct to clarify how the percentages are calculated. -- Alida, Brett

== Sample Selector ==

==== Sub-Samples ====

* Sub Samples would be helpful. -- John LaPlante, email 7/10/2007
:"A nice solution would be to have sub-samples where one property varies. When I'm doing this analysis, I've changed my sample many times, renaming it sometimes, tweaking it to get variations on the data. The samples are really useful but they could help me a lot more with this kind of experimentation."

==== Filter out students ====

As a researcher, I want to filter out test users (including myself) from my data so that I see less noise in the data.

* Ruth Wylie, July 3, 2008
* You can already filter out test users by using a test user id that starts with 'weirdCMUuser_xxx'. Then create a sample that excludes students with a name like 'Test_%'. --[[User:Alida|Alida]] 10:29, 4 September 2009 (EDT)
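If the exclusion is expressed as a SQL-style LIKE pattern such as 'Test_%', note that '_' is itself a single-character wildcard, so the pattern also matches names like 'Test1' or 'Testing'. The Python sketch below mimics LIKE matching (case-sensitively, unlike some database collations) so that behavior is easy to check; the helper names are hypothetical, and escape sequences such as '\_' are not handled.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into an anchored regular expression.

    '%' matches any run of characters and '_' matches exactly one character.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts) + r"\Z", re.DOTALL)


def exclude_students(student_ids, pattern):
    """Drop student IDs that match a LIKE pattern."""
    rx = like_to_regex(pattern)
    return [s for s in student_ids if not rx.match(s)]
```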

==== Filter by "Class" ====

As a researcher, I want to create samples based on "class" since class is how I've encoded my conditions.

* Maaike Waalkens, July 30, 2010
* This is what made sense using Mathtutor for tutor delivery.
* What other fields are we missing in the Sample Selector?

==== Filter by "Step" ====

As a researcher, I want to create samples based on "step" since I'm only interested in one particular step.

* Mimi McLaughlin, August 17, 2010

==== Create Sample Automatically ====

* Would it be possible for me to get a random sample from the 'Bridge to Algebra 2006-2007' dataset of 100 students? I am having trouble looking at the data because it takes too long to load, and my adviser thought that was because the dataset was too large. ~~ DataShop User, 10/19/2010
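Until samples can be created automatically, a random subset of students can be drawn outside DataShop and then used to build a sample by hand. A Python sketch (hypothetical helper), seeded so the same 100 students can be drawn again:

```python
import random


def sample_students(student_ids, n=100, seed=None):
    """Pick n distinct students at random (all of them if there are fewer than n).

    A fixed seed makes the sample reproducible, so it can be recreated later.
    """
    ids = sorted(set(student_ids))
    rng = random.Random(seed)
    return sorted(rng.sample(ids, min(n, len(ids))))
```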

== Web Services ==

==== Include transaction custom fields in web services student-step export ====

* "I noticed in the web services guide that cfs (which provides the custom fields) is not yet implemented for step roll-up tables. It is marked [coming soon] in the manual... I could write a workaround to pull in the transactions and lookup the custom fields, but I'd really rather not." -- Phil Pavlik, 4/27/2011

* "It would be best if the CF for step rollup was drawn from the primary transaction for the first attempt that is correct or incorrect. At least that seems like the generally best value. Basically, there may be many cases where the custom field is the same across all transactions for a step. In this case ... you could just use the first one since they are all the same." -- Phil Pavlik, 4/29/2011
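Phil's suggested rule for rolling a transaction custom field up to the step level, sketched in Python. The transaction dicts and their keys are hypothetical, and falling back to the first transaction that carries the field at all is our assumption, not part of his proposal.

```python
def step_custom_field(transactions, field):
    """Pick a step-level custom-field value for one student-step.

    transactions: the step's transactions in time order, as dicts with the
    hypothetical keys 'outcome' ("CORRECT", "INCORRECT", "HINT", ...) and
    'custom_fields' ({name: value}).
    """
    # Phil's rule: use the first attempt that is correct or incorrect
    first_attempt = next(
        (tx for tx in transactions if tx["outcome"].upper() in ("CORRECT", "INCORRECT")),
        None,
    )
    if first_attempt is not None and field in first_attempt["custom_fields"]:
        return first_attempt["custom_fields"][field]
    # Fallback (an assumption): the first transaction that has the field at all
    return next(
        (tx["custom_fields"][field] for tx in transactions if field in tx["custom_fields"]),
        None,
    )
```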

==== Use Custom Fields Graphs/Reports ====
In Graphs

As a researcher creating custom fields and assigning values at the transaction level, I want DataShop to perform the aggregation to the step level so that I can do other things with my custom-field variable such as graph it. -- Ryan Baker, mtg w/Alida & Brett, 12/15/2008

Performance Profiler
* Use all discrete variables/custom fields on the left and all the continuous variables on the bottom of the Performance Profiler. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Ad-hoc queries ====

* Allow restricted filtering on steps and transactions (on whatever users can filter on in the navigation boxes) as the next web service feature, after CFs. (User Meeting AAR, December 9, 2009)

==== Sample creation as a web service ====

* Sample creation is still too slow. (User Meeting AAR, December 9, 2009)

==== Ability to directly query the database ====

* I am looking for a unique identifier for student->problem-step->problem-view. I am able to compute this in R, but it would be better to be able to query the DataShop database to get this unique identifier instead of having to recreate DataShop (essentially) in R. -- Ilya Goldin, 5/21/2012

 
----
See completed [[DataShop 3.x Features]] 
See on-going [[DataShop 4.x Features]] 
See prioritized [[DataShop Feature Wish List]]
[[Category:Protected]]
[[Category:DataShop]]

Collected User Requests

2012-04-30T18:50:52Z

Bleber: /* News Feed */

See prioritized items on [[DataShop Feature Wish List]].

== Annotations ==
* Some links from Ruogo Kang's (CMU PhD student, Sara Kiesler) recent talk. -- Ken, email, 8/24/2011
** http://vis.stanford.edu/papers/senseus
** http://vis.berkeley.edu/papers/commentspace/

==== Annotations on Transaction Level ====
* I have models which can annotate things like: gaming, bored, etc. on the transaction level. -- Ryan Baker, ET Mtg 12/5/2007

==== Annotations on Student Level ====
* Would annotate on student level. -- Ido Roll, User Meeting, 1/19/2009

==== Annotations on Pages ====
* See the cool thing created by Jeffery Heer where all the settings of the page were recorded with the comment. -- Ryan Baker, DS Team Mtg 5/23/2008

==== Dataset Discussion - Capture data-integrity issues ====
* As a stakeholder in the DataShop project, I want to capture and publicize the data-integrity issues discovered with data sets so that data is better documented (and so we've fulfilled a promise to our funders to better document data). -- Ken Koedinger, Team Meeting, 8/15/2009
* As a user of DataShop, I want to discuss datasets and have that discussion attached to the dataset so that others can comment and better understand any data-integrity issues I've found.

==== Linking to internal pages ====

* It would be handy if a link can be saved to any dataset, sample, page in the DataShop. -- [[Alida]], 10/18/2007
** Currently, https://pslc-qa.andrew.cmu.edu/datashop/DatasetInfo?datasetId=793 works if you are already logged in.

==== Have a link from the DataShop to the Theory Wiki (Dataset to Project Page) ====

* Can we link from the dataset to the project page on the Theory Wiki? In the pipeline have a clickable link to the project page (make project name clickable). -- Michael Bett, ET Mtg 11/14/2007
** Link to a dataset directly? Is that obvious to users? Click on dataset link -> log in -> redirected back to dataset. -- Brett Leber

== Data Format ==
==== Capture the question prompt and answer choices the student chose from ====
* As a researcher, I want to be able to identify steps based on the question prompt, not the difficult-to-understand step names that come from the selection and action of my tutor, so that I can analyze the data more easily. For multiple-choice questions, I also want to be able to see all of the choices that were available to the student. -- Eli Silk - February 1, 2011 (meeting with Brett and Ross Higashi of the FIRE project)
** Near-term solution is to create a table locally that maps steps to prompts
** Another solution is to ask CTAT team to modify their Flash components so that they log this information as custom fields
** Long-term solution is to make these fields into standard fields in the tutor message format
** In the email thread "Cognitive Tutor Development and Evaluation Requests", there is support for this general idea from Ken Koedinger, Albert Corbett, and Christian Schunn.
** Ken added that "Ideally, we may want to store any images that the student can see and where they reside on the screen ..."

== Data Modeling ==
=== Non-KC Modeling ===

==== Automatic Distillation ====
* As an educational data miner wishing to develop a machine learned model with PSLC data, I would like to be able to automatically distill data features (e.g. custom fields) commonly used in past educational data mining research for a new data set (see, for instance, Baker, Corbett, Roll, & Koedinger, 2008 in UMUAI) -- Ryan Baker, Summer 2008, Startup Memo
** Could be implemented as a plug-in
* Also interested in this feature idea. -- Dan Franklin, Oct 2008

==== Upload model and apply it to new data set ====

* As an EDM researcher, I would like to take a model, expressible as a linear formula on DataShop fields, or a simple code procedure (e.g. Bayesian Knowledge Tracing, which Ryan has code for), and apply it to a new data set, so that I can ... -- Ryan Baker, Sept 2008
* Also interested in this. -- Maxine Eskenazi, Sept 2008
* May work best as a plug-in
** Code to display GUI to choose which data sets to use, calls model code, re-import to DataShop
** Good to have a way to apply many models, as soon as you import a data set
* Phil has an idea that maybe fits within this one. Please move if there's a better category. -- Brett Leber
<blockquote>This [''transaction? kc? --ed.''] relabeling is really mostly about enabling modeling in DataShop, right? With this in mind, I think that it is actually a higher priority to have model alternatives in DataShop.... E.g. Investigators should be able to give you chunks of Java code according to a certain specification, and DataShop should then be able to run these over datasets (perhaps after a certain series of QA occurs according to an SOP) when the investigator clicks some button in DataShop.... Obviously this is a much larger project than adding columns, but it is also much more important in my mind. ''-- Phil Pavlik, email to Brett on 1/14/2009''</blockquote>
* Examples:
** Example: running gaming detector in multiple tutors and comparing gaming frequencies
** Example: applying Bayesian Knowledge Tracing to a new data set from the same LearnLab
** Example: applying Ben Shih's models to many data sets. Note that Ben Shih should be included in design of this feature; he is interested, and has a lot of good ideas.

==== Add Different Predicted Values ====
* Would also like to add statistics, different predicted values than what LFA produces. -- Ken Koedinger, ET Meeting, 10/10/2007

==== Bayesian Knowledge Tracing ====
* Bayesian Knowledge-Tracing built into DataShop like LFA is. -- Ryan Baker, Startup Memo, Summer 2008
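For reference, the standard Bayesian Knowledge Tracing update (four parameters: initial knowledge, learning, guess, slip) applied to one student's sequence of responses on one KC. This is a plain sketch of the textbook model, not Ryan's code or anything in DataShop; fitting the parameters (for example by EM or grid search) is left out.

```python
def bkt_trace(responses, p_init, p_learn, p_guess, p_slip):
    """Run standard Bayesian Knowledge Tracing over one student's responses to one KC.

    responses: sequence of booleans (True = correct) in opportunity order.
    Returns (predicted P(correct) before each response, final P(known)).
    """
    p_known = p_init
    predictions = []
    for correct in responses:
        # Predicted probability of a correct response before observing it
        predictions.append(p_known * (1 - p_slip) + (1 - p_known) * p_guess)
        # Bayesian update of P(known) given the observed response
        if correct:
            num = p_known * (1 - p_slip)
            den = num + (1 - p_known) * p_guess
        else:
            num = p_known * p_slip
            den = num + (1 - p_known) * (1 - p_guess)
        posterior = num / den
        # Chance to learn the KC at this opportunity
        p_known = posterior + (1 - posterior) * p_learn
    return predictions, p_known
```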

==== Richer statistics for KC modeling ====

* In addition to the model stats and estimates generated for learning factor models, we should also create difficulty factor models (i.e., ones without a "Slope" parameter). The latter is particularly relevant for the Unique-Step model, where the slope parameter is meaningless (but still counts against the BIC value). -- Ken Koedinger, Email "new feature request", 1/22/2009
** Relatedly, we should report significance values on the Slope parameters -- that is, when is the Slope significantly different from 0.
** The KC models page perhaps should also report the log-likelihood and number of parameters (in addition to BIC) and leave out AIC. We might also consider other metrics of model generality, like the "adjusted R2" (if I have this name right -- Joe Beck mentioned in the Assistments meeting yesterday).
** These changes will be part of meeting the CMDM goal of improvement in (or at least demonstrate acceptability of) the cognitive models in 90+ units in our LearnLab courses (or affiliates).
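The statistics mentioned above relate as follows for a model that predicts the probability of a correct first attempt: AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where L is the likelihood, k the number of fitted parameters, and n the number of observations (assumed here to be student-step first attempts). A Python sketch with hypothetical names; it shows why a meaningless slope parameter still costs ln n per parameter in BIC even when it leaves the log-likelihood unchanged.

```python
import math


def model_fit_stats(observed, predicted, n_params):
    """Log-likelihood, AIC, and BIC for a model predicting binary outcomes.

    observed: non-empty sequence of 0/1 outcomes (1 = correct first attempt)
    predicted: model probabilities of a correct response, each in (0, 1)
    n_params: number of fitted parameters (k)
    """
    loglik = sum(
        math.log(p) if y else math.log(1 - p)
        for y, p in zip(observed, predicted)
    )
    n = len(observed)
    aic = 2 * n_params - 2 * loglik
    bic = n_params * math.log(n) - 2 * loglik
    return {"loglik": loglik, "params": n_params, "AIC": aic, "BIC": bic}
```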

=== KC Modeling ===

==== LFA/AFM: Check if enough memory using formula ====
* The LFA/AFM code could calculate how much RAM would be needed to run the algorithm on a given skill model using the formula provided by Hao. This formula is based on the number of transactions, number of students and number of skills. Right now it will not schedule itself to run on a model with over 300 skills, though there is a manual override. [[User:Alida|Alida]] 13:35, 29 November 2010 (EST)

==== Create KC Models through Web Services ====
* For John Stamper's CMDM project, it would be nice to automatically update KC Models through web services. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Automatically discovering new KC model ====

* Would it be possible to run some code (perhaps Hao's KC model selection code, perhaps something else generated by the CMDM thrust) to find a new, better KC model? -- Vincent Aleven, Sept 2008
* As a learning sciences researcher, I would like DataShop to discover a new/better KC model for me.
* Could be done as a plug-in

==== Generate new KC Models with LFA ====
* Not sure who asked for this.
* It would be nice to generate new KC Models with Hao's LFA code
* Would need to specify factors.
* Ideas on where this could run?
** On a separate server? Request it to be run, specify duration. Have separate server queue up requests, email user when done.
** In Java Applet on client machine? -- Phil Pavlik

==== Same Skill Twice on Same Step ====
* Would like to be able to apply the same skill to a step twice during a KC Model Import. -- Ken Koedinger, email, 2/4/2009

==== Save KC Model Import Files ====
* KC Model Import - save the file used to create the KCMs in case we need to recreate them. -- Ken Koedinger, email 3/4/2009

==== Log Likelihood and MAD ====
* Log Likelihood, MAD (mean absolute deviation) problem, MAD step (store and show) -- Hao Cen
** This is a variation on "Richer statistics for KC modeling" above. Probably should be merged. - Ken Koedinger

==== Better naming for KCs in auto-generated Unique-Step KCM ====

As a researcher, I want the KCs in the Unique-Step model to have better names than KC1, KC2, etc, so that I can easily tell which generated KCs go with which unique step.

* Hui Cheng (Email 1/20/2009), Ken Koedinger (Email 1/22/2009)
* Could you put this in your new feature request list: could the “Unique-step model” be better labeled than just “KC1”, “KC2”, etc.? For example, for Assistments data, you could use part of the “Step Name”. -- Hui Cheng, Email 1/20/2009
* But, anything is better than "KC". -- Ken Koedinger, email, 1/22/2009
<blockquote>A simple alternative, that preserves uniqueness and addresses length, is to concatenate: 1) the first K letters of the step name 2) a unique numerical increment (same as the "3" in "KC3"). Note that (2) guarantees uniqueness just as it does in the current "KC<num>" scheme. Or perhaps better given that step names are often scoped within problems, is to concatenate: 1) the first L letters of the problem name 2) the first M letters of the step name 3) a unique numerical increment (just like the "3" in "KC3") I think K or L+M should be as big as possible without making the KC names indistinguishable (because they run off the right margin) in the KC list on the Learning Curve and other pages. </blockquote>
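A sketch of the second naming scheme in the blockquote (first L letters of the problem name, first M letters of the step name, then the unique increment). The lengths 8 and 12 and the '-' separator are placeholder choices, not values from the proposal.

```python
def unique_step_kc_names(steps, problem_chars=8, step_chars=12):
    """Name auto-generated Unique-Step KCs from problem and step names.

    steps: list of (problem_name, step_name) pairs, one per unique step.
    The trailing increment (like the "3" in "KC3") alone guarantees
    uniqueness, even when the truncated prefixes collide.
    """
    return [
        f"{problem[:problem_chars]}-{step[:step_chars]}-{i}"
        for i, (problem, step) in enumerate(steps, start=1)
    ]
```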

==== Visualize Learning Curve Split ====
* Be able to visualize a learning curve split into 2 based on a specification of a subset of problems. -- Albert Corbett, Math CCM, November 2008
* Harder: make me a new KCM out of it. -- Ken Koedinger, Team Mtg, Dec 5, 2008, while trying to describe Albert's request to Alida

==== Statistical Significance ====
* Can DataShop determine if the difference between conditions or learning curves is statistically significant? -- general theme at workshop, probably mentioned by Bob Hausmann in his talk, Winter Workshop 1/23/2008
** Can't do it yet in DataShop, but I can show you how to do it in R (or SPSS...) after you've exported the data -- export the "student-step rollup" rather than the whole transaction table. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** This comment perhaps belongs (exists?) elsewhere: the current student-step rollup export (really, all exports) should be such that I can immediately load it into R (and other packages) without error. Right now errors occur, for instance, because there are "#" characters in the variable names in the student-step rollup. This is currently a roadblock for helping folks like Bob do the analyses they want to do. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** See [[Condition in Student-Step Rollup]] -- [[User:Alida|Alida]] 10:35, 17 September 2009 (EDT)
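Until the exports are R-safe, one workaround is to rewrite the header row before loading the file (R's read.table() treats "#" as a comment character by default, for example). The Python sketch below is loosely modeled on R's make.names() and is not an exact reimplementation of it; the function name is hypothetical.

```python
import re


def r_friendly_header(names):
    """Rewrite export column names into syntactically valid R names.

    Loosely modeled on R's make.names(unique = TRUE): a leading character that
    is not a letter (or a dot not followed by a digit) gets an "X" prefix,
    every character other than letters, digits, '.', and '_' becomes '.',
    and duplicates get ".1", ".2", ... suffixes.
    """
    used = set()
    result = []
    for name in names:
        if not re.match(r"[A-Za-z]|\.(?!\d)", name):
            name = "X" + name
        clean = re.sub(r"[^A-Za-z0-9._]", ".", name)
        candidate, suffix = clean, 0
        while candidate in used:
            suffix += 1
            candidate = f"{clean}.{suffix}"
        used.add(candidate)
        result.append(candidate)
    return result
```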

==== Split and Merge Skills ====

* Hand searches through a p-matrix for a dataset to split and merge skills. (Pie in the Sky) -- Ken Koedinger, Team Mtg, 02/22/2008

==== Notes on new KCMs ====

* It would be good if I could add a note to a KC model that was newly imported. -- Noboru Matsuda, email, Nov 19, 2009

==== Display Learning Curves Grouped by Interestingness ====

* The page displaying all the learning curves today seems to be alphabetically sorted by KC name, which is not necessarily meaningful.
* An alternative presentation is to group the curves into 4 sets, breaking up the page. Set 1 has curves that contain significant spikes, and therefore seem to be "low-hanging fruit" for purposes of breaking up into KCs. Set 2 has curves with few spikes, but they have a long X axis, suggesting that students are presented with too many opportunities to acquire those KCs. Set 3 has the "good" curves, i.e., nicely decreasing curves that are not too long. Set 4 is "other". -- [http://www.pitt.edu/~goldin Ilya Goldin] 7 December 2009

==== Order KC models according to AIC====
* This is based on results from multiple dataset analyses that compared AIC, BIC and loglikelihood to cross validation RMSE. -- Mimi McLaughlin, 2/9/2011

====Use log of opportunity count for AIC and BIC calculations====
* We compared using the log of opportunity count to whole number opportunity count in multiple datasets. We found the results for log of opportunity count to be consistently better, though small. -- Mimi McLaughlin, 2/9/2011
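To make the comparison concrete: in an Additive Factors Model style prediction, each KC on a step contributes an easiness term plus a learning-rate term times the opportunity count; the log variant replaces the count with its logarithm, so each additional opportunity helps a little less. The sketch assumes counts start at 1 (as in an Opportunity column), so log(1) = 0 on the first opportunity; parameter names are illustrative, and fitting the parameters is out of scope.

```python
import math


def afm_p_correct(theta, kc_params, opportunities, use_log=False):
    """Predicted probability of a correct first attempt under an AFM-style model.

    theta: student proficiency
    kc_params: {kc: (beta, gamma)}, easiness and learning rate per KC on the step
    opportunities: {kc: count}, counted from 1 as in an Opportunity column
    use_log: if True, the learning term uses log(count) instead of count
    """
    logit = theta
    for kc, (beta, gamma) in kc_params.items():
        t = opportunities[kc]
        logit += beta + gamma * (math.log(t) if use_log else t)
    return 1 / (1 + math.exp(-logit))
```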

== Developer Requests ==

==== Store Converted Date and Converter Info ====

As a DataShop administrator, I'd like to see the converter information (version and date) stored in the database, so that I do not have to store that data manually in the Additional Notes field each time I load a dataset. -- Kyle, 8/5/2008

* DTD new fields:
** Store conversion and converter information in database (anything else?)
*** conversion_time
*** converter_info
* Email from Octav, 10/5/2007

:"I see there's also a note about the converter version in the Dataset Info, which is good, but it seems it's taken from the directory name when I submitted the set. I don't know how reliable that is. :-) It would be better if it's taken from the new converter info field."

==== Convert from XML to tabbed-delimited format ====

* If users agree that the export format is valuable, then perhaps they could convert from XML to the export format to view the data in Excel, where they could look at the Selection column and spot blanks more easily. -- Jonathan Sewall, ET Mtg, 10/10/2007

==== Plug-ins (general issues) ====

* Please please support Ruby on Rails. -- Ben Shih, December 2008

==== Create UI to grant DataShop user roles ====

* Already tedious.
* Alida, User Meeting AAR on December 9, 2009

== Help ==

==== Specialize Label of Help Button ====
* Since this help is better than in most applications, it should say more than just 'Help'. -- Ken Koedinger & Ryan Baker, Team Mtg, May 30, 2008
** Ideas:
*** 'Page Help'
*** 'Help with this page'
*** 'Help with Learning Curve page' (Ken's favorite)
*** 'Help with this tool'

== Home Page ==

==== Redesign the Home Page ====

* In the menu of data sets at the top, include the N (=20 or as many as fit on the screen?) data sets that I have visited starting with the ones I've visited most recently. --[[User:Koedinger|Koedinger]] 16:16, 16 September 2009 (EDT)
** As of today, the menu shows the last 10 data sets, most recently visited at the top. I think the feature is good enough, but let's ask Ken --[[User:Bleber|Bleber]] 10:56, 6 August 2010 (EDT)
* There needs to be a better ordering for the datasets (DS364)
* Maybe a search to filter the list of datasets since the list is so long. -- Brett Leber, 6/14/2007
* Going back to the home page always goes to 'My Datasets' (DS313)
* Maybe show more high level stats on this page, like how many transactions [done Jan 2012], students, skill models
* Allow users to post and share project documentation (files, papers, other meta-info) -- Ruth Wylie, suggested during meeting on 8/4/2010. She had a file that is relevant to multiple datasets in her project.

==== News Feed ====
* Bruce McLaren, email, 4/20/2012
**I wonder whether it would be possible to have a scrolling "News" feed somewhere on the DataShop site that would keep researchers informed about what is happening with the DataShop. For instance, the "News" feed could always show the last time a data conversion finished, announce an upcoming workshop, or inform everyone of critical DataShop issues, such as the fact that the DataShop had a server go down recently (which I know caused you guys a lot of headaches - but which I didn't hear about until a couple weeks after it happened). This kind of thing could be a great communication tool and, as an added bonus for DataShop personnel, avoid lots of email with questions like "where is my data?" or "when is the next conversion going to finish?" Perhaps it would even be possible to have the data conversion routine automatically update the "News" feed each time it begins and/or finishes processing?
** Take it with a grain of salt -- I know you have lots of things on your plate -- but I have been in the situation often, especially just before and during my studies, where I wasn't sure what was going on with DataShop conversions and issues and had to track down someone -- typically Alida -- to figure things out. I have the advantage of sitting right next door to Alida, but I wonder how many other researchers within the PSLC, those not in close proximity to Alida, deal with this issue of not really knowing what is happening with the DataShop at any given time.
* Jonathan Sewall also requested a page that shows the status of the log conversion process, including how much data was processed and for which datasets

== Import ==
== Miscellaneous ==

==== Analyses by LearnLab ====

* Organize data by LearnLab, not by "data set". -- Ryan Baker, Aug 2008
* Also: Bob Hausmann, Sep 2008; Maxine Eskenazi, Sep 2008
* Essentially, current data sets become samples, but the top-level unit is the LearnLab. You can take every data set in a LearnLab together as a sample.
* Implies being able to run analyses across data sets, and export multiple data sets together; to create multi-data set samples
* As a user of DataShop, I would like to look at learning curves for all Algebra data together (for example), or export all Algebra data
* Important long-term, but is a lot of work -- in particular, we need to solve scalability issues first.

==== Save Settings Between Sessions ====
* It would be useful if DataShop could save settings between sessions. -- Bob Hausmann, User Meeting, 2/1/2008
** "I do a lot of redoing the same steps" (e.g., set cutoffs, select a KC model, select students).

==== Multiple steps per transaction ====

* Needed so that we do not have to create multiple transactions for the same actual action for Andes logs. -- Kurt VanLehn, Feb 2007

==== Demographic data ====

* This has been mentioned by NSF visitors, AB, ESL, and some researchers.
* Also mentioned at Winter Workshop 1/23/2008.
** Derek/Sue-mei: Student background information not in DataShop. Would like to see a student or set of students from a particular demographic, and view them across datasets!
* Note that Gail added demographic data to Additional Notes field on the Dataset Info page for many datasets. The idea here is to put that data into the database somewhere.

==== Single Sign On ====
* Michael Bett, email, 10/8/2007
* It would be nice if the following services have a single login account/password:
*# Theory Wiki
*# Learnlab.org
*# ESL's OSS
*# DataShop

==== Reveal unanonymized student IDs ====

As a researcher/PI performing research assistant tasks, I want to see easily the unanonymized student IDs of subjects in DataShop so that I can email my subjects telling them when to use my system.

* Ruth Wylie, July 3, 2008
* As the honest broker of the PSLC data, I have promised to not reveal the student IDs and to protect the identity of the students. This is part of the DataShop IRB. Therefore, I do not see this request as possible. [[User:Alida|Alida]] 09:53, 4 September 2009 (EDT)
** At some point, some coordination would be good with OLI's Digital Dashboard project that Marsha Lovett (and the OLI team and sometimes me) is working on. This is meant to provide usage information more quickly to instructors. It could also perhaps be used by researchers (with the right IRB rights) in situations like Ruth's.

==== Knight Timeline ====
* Developed by Andrea Knight, 2004

==== Buggy Skills ====
* Ken Koedinger, prototype walkthrough 9/11/2006

==== Confusion Matrix ====
* Brian MacWhinney, prototype walkthrough 9/11/2006

== Navigation Bar ==
==== Filter KCs by Name ====

As a researcher working with KCs, I want to filter KCs based on their names, so that I can...
* Vincent Aleven, Email, 2/3/2008
* "Since Alida said you cannot have two mechanisms for putting together your KC set (i.e., cannot have both the selecting-by-clicking and selecting-by-filtering), I would probably opt for the latter."
* Alida: I thought Vincent mentioned that he'd like to select which KCs are in a set by filtering on the name. Example: Include KCs with '*reason*' in the name and exclude KCs with '*given*' in the name.
* This could be an addition to our v3.0 KC-selection mechanism--filter by name.
* Vincent, Email, 5/6/2009: Expressed another need for this feature. Alternatively, we could simply allow a wider area and a longer list so that more items can be checked at once. The number of characters we show right now is not enough, because in many cases those leading characters are the same across many of the skills. Reference data set: Geometry CWCTC 2005-2006
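Alida's include/exclude example maps naturally onto shell-style wildcards. A Python sketch using the standard fnmatch module (hypothetical function name; matching made case-insensitive as a convenience):

```python
from fnmatch import fnmatchcase


def filter_kcs(kc_names, include=("*",), exclude=()):
    """Keep KCs whose names match an include pattern and no exclude pattern.

    Patterns are shell-style wildcards ('*', '?', '[...]'), compared
    case-insensitively here.
    """
    def matches(name, patterns):
        return any(fnmatchcase(name.lower(), p.lower()) for p in patterns)

    return [
        name for name in kc_names
        if matches(name, include) and not matches(name, exclude)
    ]
```

For example, `filter_kcs(kcs, include=("*reason*",), exclude=("*given*",))` expresses the request above.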

* Status: Design Started

==== Facebook-style KC Selection ====

As a researcher working with KCs, I want to select KCs based on the learning curve thumbnail, so that I can see quickly which ones I'm interested in exploring more deeply.

* Feature already designed for v3.0, not implemented due to time constraints.
* Agreed this would be really useful. -- Kirsten Butcher, User Mtg, 1/31/2008
* Status: Guesstimate: 20 days; need to revisit requirements document

==== Feedback after clicking a large sample on a large dataset ====

As a user of DataShop (first-time or not), I want some feedback and the ability to cancel after I do something that might take a long time (e.g., clicking "All Data" on a large dataset) so that I do not get stuck.

* Part of the Susan Goldman story
* After clicking a sample for a large dataset, there is no "Loading..." text, no feedback that the click was even registered by the app (besides the sometimes busy cursor and small browser "loading" text), nor the ability to cancel the action. 
* We will always have similar problems even if performance is improved, so providing feedback and the opportunity to cancel is critical.

==== Save Button in Problem Navigation Box ====

* Save buttons in the sidebar. -- Ken Koedinger, Mtg 2006
** Could also put one in the Problem selection box in the sidebar.

==== Make Nav Bar Wider ====
* Make the Sample name and description fields much wider. -- John LaPlante, email 7/10/2007

== New Visualizations/Reports ==

==== Student-KC Rollup ====
As a researcher, I want to see KCs rolled up by student, so that I can ...

* Vincent Aleven, User Mtg, 1/29/2008
** By Student-KC would be more useful than by Student-Problem
** Example: number of steps on which the student asked for a hint or made an error, or the proportion of steps on which the student received help
** How often the bottom-out hint occurs

==== Instructor Reports ====
* Phil said he received a lot of positive reactions to providing reports on units for instructors. -- Phil Pavlik, ET Mtg 10/10/2007

==== Manage Authorizations/Projects Page ====
* Lisa Anthony, email 10/23/2007
* Allow PI to change permissions on the datasets.
:"Actually, I couldn't see how to change permissions on the datasets from the website. Is this possible? If not, it might be a nice feature..."

==== Calculate Time Spent on Different Study Activities ====

As a researcher, I want to know how much time, on average, students spend on study activities, so that I can ...

* Bruce McLaren, Email, 4/7/2009
<blockquote>
For my most recent stoich study, Shawn and I are interested in calculating timing information such as: (a) how long students spent, on average, working on individual tutors (b) how long students spent, on average, on all items in an intervention (c) how long students worked, on average, on post-tests. 

Timing information is very commonly required for studies, and can be calculated from DataShop logs relatively easily, so even if we don't have it, might be worth considering. (And we don't want to re-invent the wheel, if you already have it or are planning it...)
</blockquote>
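As Bruce notes, such averages can be computed from the transaction log. A minimal Python sketch follows, assuming each transaction has been reduced to a hypothetical (student, activity, timestamp-in-seconds) tuple and that a student's time on an activity is approximated by the span from their first to their last transaction. This ignores idle gaps and any time before the first logged action, so it is a rough estimate, not DataShop's own duration calculation.

```python
from collections import defaultdict
from statistics import mean

def average_time_per_activity(transactions):
    """transactions: iterable of (student, activity, timestamp_seconds) tuples.

    Approximates a student's time on an activity (a tutor, an intervention,
    a post-test) as last timestamp minus first timestamp, then averages
    those spans across students. Returns {activity: mean seconds}.
    """
    # (student, activity) -> [earliest timestamp, latest timestamp]
    bounds = defaultdict(lambda: [float("inf"), float("-inf")])
    for student, activity, ts in transactions:
        b = bounds[(student, activity)]
        b[0] = min(b[0], ts)
        b[1] = max(b[1], ts)

    # Collect each student's span per activity, then average.
    per_activity = defaultdict(list)
    for (student, activity), (first, last) in bounds.items():
        per_activity[activity].append(last - first)
    return {activity: mean(spans) for activity, spans in per_activity.items()}
```

Averaging over "all items in an intervention" would just mean mapping each item to the same activity label before calling the function.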

==== Incorrect Step Duration and Hint Step Duration ====

As a researcher, I want to be able to see total step duration if the student's first attempt was an incorrect attempt, and total step duration if the student's first attempt was a hint request, so that I can do some analyses that I can't do with "Error Step Duration".

* Bob Hausmann, email, 11/11/2008.
* Updated title and story with 'step duration' instead of 'time'. -- [[User:Alida|Alida]] 10:36, 4 September 2009 (EDT)

==== Grading ====

[[Grading]]

====Display number of steps and number of observations for skills====
* How and where to be determined by developers. -- Ken (entered by Mimi)

== Reports ==
=== Dataset Info ===

==== Pointers to Hard-copy Data ====
* Brett van de Sande, NSF Site Visit, 5/28/2008
* Pointers to hard-copy data such as paper tests and/or homework. Include contact information. It doesn't seem to make sense to scan a whole filing cabinet of paper if no one wants to look at it, but secondary researchers who don't know the filing cabinet exists won't know to ask for it.

==== Sort Problem Breakdown Table ====

* Would like the ability to sort the table by clicking on the column headings of the Problem Breakdown Table on the Dataset Info Tab. -- Bruce McLaren, User Mtg, 11/5/2007

==== Rename dataset ====

As a researcher, I want to rename my dataset so that it makes more sense to other people. I also want to make sure the dataset doesn't become polluted later by new data not associated with my study.
* Ruth Wylie, July 3, 2008
* There are reasons she would want to do this (current name is worthless, other researchers might try her tutor and pollute her data) but also reasons for not doing it (log more data later).
* There are risks in changing a dataset name that might not be apparent; for example, if you later want new data to go into the same dataset. [[User:Alida|Alida]] 10:27, 4 September 2009 (EDT)

==== Average time per problem, average number of problems, total number of sections ====

In addition to showing student hours per dataset, it would be useful to know the average time spent per problem, average number of problems completed, and the total number of sections.
* Noboru Matsuda, June 18, 2010

=== Error Report ===

==== View By Student ====

* Would like to see what a couple of students saw in the feedback. -- Marsha Lovett, 10/11/2007

==== Export ====
* I would like the ability to export this data. -- John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
* Also interested in this feature idea. -- Bruce McLaren, User Mtg, 11/5/2007

==== Sort ====

* Was planned for but not implemented in v2.1 (estimated to be a 4 day task)
** By Correctness %, starting with the least correct
** By Hints %
** Step (or KC if view by KC)
** Number of Students
* Ability to sort problems by the average position at which students experienced them within the curriculum. -- Ken Koedinger, 02/16/2007
** Which problem did students most often experience first, then the one experienced second most often, ...
* Order steps by the order they typically are executed by students. -- Ken Koedinger, email 11/7/2008
:"Searching through the steps in a problem to get a sense of what is going on is currently hard because the steps are ordered alphabetically, not by the order in which most students did them. While not all students do all steps in the same order, there is some regularity there. It would be quite useful if the steps could be ordered in a "typical order". This could be accomplished by using the time stamps (of the first (correct?) transaction?) for each step to determine the rank order of each step for each student in a problem and then, for each problem, averaging the rank order of each step across all students. Then arrange the steps in the Error Report by their average rank order -- that is, roughly speaking, the step that is most likely to be first across students (closest on average to first) goes first, the step with the next lowest rank goes next, etc."
* The capability to count the number of errors of each message type and sort in different ways, for instance by all errors that had no messages. -- Bruce McLaren, email 10/22/2007
:"This is an error analysis I recently did in Excel, using pivot tables, that might be handy if in the DataShop. This one is very important for tutors because the errors that occur most frequently, yet don't elicit messages to the students, are good candidates to become errors with feedback."
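Ken's proposed "typical order" can be sketched in Python. This is a minimal sketch assuming a hypothetical input of (student, problem, step, timestamp) tuples; it ranks steps by each student's first transaction on them (to use the first correct transaction instead, filter the input beforehand). Ties in average rank are broken alphabetically, so the result falls back to the current alphabetical ordering when no pattern exists.

```python
from collections import defaultdict

def typical_step_order(transactions):
    """Order the steps of each problem by their average per-student rank.

    transactions: iterable of (student, problem, step, timestamp) tuples
    (a hypothetical layout; timestamps only need to be comparable).
    Returns {problem: [step, ...]}, with the step most often done first leading.
    """
    # Earliest timestamp at which each student touched each step.
    first_seen = {}
    for student, problem, step, ts in transactions:
        key = (student, problem, step)
        if key not in first_seen or ts < first_seen[key]:
            first_seen[key] = ts

    # Group each student's steps within a problem.
    per_student = defaultdict(list)
    for (student, problem, step), ts in first_seen.items():
        per_student[(student, problem)].append((ts, step))

    # Rank steps per student (1 = first), collecting ranks across students.
    ranks = defaultdict(lambda: defaultdict(list))
    for (student, problem), entries in per_student.items():
        for rank, (ts, step) in enumerate(sorted(entries), start=1):
            ranks[problem][step].append(rank)

    # Sort steps by mean rank; ties broken alphabetically by step name.
    return {
        problem: sorted(steps, key=lambda s: (sum(steps[s]) / len(steps[s]), s))
        for problem, steps in ranks.items()
    }
```

A student who skips a step contributes no rank for it, so rarely attempted steps are averaged over fewer observations.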

==== Show more than 500 problems ====

In the Error Report, can we see more than 500 problems? See the dataset "Cog Model Discovery Experiment Spring 2010", where the list reads "2/2888 selected. (Showing the first 500)". -- Ken via email on 2/15/2011

=== Export ===

==== Include Step Start Time in transaction format ====

* The transaction-level export already includes problem start time; could it also include step start time? I can easily compute it myself, but it seems there's a specific algorithm that the student-step rollup uses, and it might be nice to include the same value here. --Ilya Goldin, email on 01/16/2012

==== Last attempt on step? column for transaction format ====

* Include a new column that shows whether the row is the last attempt on a step for a student or not. Could be 0 or 1 as value. Helpful for researchers who are doing grading of data. Transaction format. --Vincent Aleven, CTAT mtg 11/5/2010
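A minimal Python sketch of how such a 0/1 column could be computed, assuming transactions are already sorted chronologically and reduced to hypothetical (student, problem, step) keys. It treats every transaction as an attempt; if hint requests should not count as attempts, they would need to be filtered out first.

```python
def last_attempt_flags(rows):
    """rows: list of (student, problem, step) keys in chronological order.

    Returns a list of 0/1 flags parallel to rows, where 1 marks the
    student's final transaction on that step.
    """
    flags = [0] * len(rows)
    seen = set()
    # Walk backwards: the first time a key appears is its last occurrence.
    for i in range(len(rows) - 1, -1, -1):
        if rows[i] not in seen:
            seen.add(rows[i])
            flags[i] = 1
    return flags
```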

==== Elapsed Time ====

* Include the elapsed time in the preview and the transaction export. Elapsed time is more valuable than the absolute transaction time, though it is possible to keep both. --Ken Koedinger, Team Mtg 04/18/2008

==== SQL Format ====

* Option to export as an SQL file. -- Ken Koedinger, 03/26/2007, also brought up in June ET Meeting
** Ability to export an SQL dump of a dataset. --Kyle Cunningham, 04/03/2007

==== Specify Character for Blanks ====

* Ability to specify what character, if any, is used for blank values. --Ryan Baker, email 8/9/2007
:"Not all tools handle TABTABTAB correctly on import. The period '.' is used to mean missing data in most stats packages. The word 'BLANK' is used in some other ones." This is not an issue for Ryan, as he wrote a preprocessor to convert blanks.
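A minimal Python sketch of such an option, using the standard csv module to write tab-delimited output. The function name and the blank parameter are hypothetical, and treating both None and the empty string as missing is an assumption.

```python
import csv
import io

def export_tsv(rows, header, blank="."):
    """Write rows as tab-delimited text, replacing empty or None cells with `blank`."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        # Substitute the user's chosen placeholder for missing values.
        writer.writerow([blank if cell in (None, "") else cell for cell in row])
    return out.getvalue()
```

Passing blank="BLANK" covers the other convention Ryan mentions, and blank="" reproduces the current TABTABTAB behavior.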

==== Opportunity (at KC) Count in Transaction Export ====
* Include the opportunity count in the transaction export (it's only in the student-step rollup) -- Noboru Matsuda, 10/08/2009
** Would be repetitive
** Would need one Opportunity column per KC
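The two notes above can be made concrete with a minimal Python sketch. It assumes a hypothetical input of chronological (student, step, KC list) tuples, where step uniquely identifies a step instance (for example, problem plus step name), and it counts an opportunity as the first time a student reaches a new step involving a KC. That definition is an illustrative assumption, not a statement of how the student-step rollup computes it. Each transaction gets a dict keyed by KC, so a flat export would need one Opportunity column per KC, and every transaction on the same step repeats the same value.

```python
from collections import defaultdict

def opportunity_columns(transactions):
    """transactions: chronological list of (student, step, kcs) tuples.

    Returns one dict per transaction mapping each of its KCs to an
    opportunity count: the 1-based number of distinct steps with that KC
    the student has reached so far.
    """
    next_opp = defaultdict(int)  # (student, kc) -> steps with this KC seen so far
    assigned = {}                # (student, step, kc) -> opportunity number
    result = []
    for student, step, kcs in transactions:
        row = {}
        for kc in kcs:
            key = (student, step, kc)
            if key not in assigned:
                # First transaction on this step for this KC: a new opportunity.
                next_opp[(student, kc)] += 1
                assigned[key] = next_opp[(student, kc)]
            row[kc] = assigned[key]
        result.append(row)
    return result
```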

==== Export only rows that have KCs tagged ====
* Is it possible, when exporting a dataset, to include only the [transaction?] rows that have knowledge components tagged? Likewise, could the KC model export include only the items that have KCs tagged? -- Hui Cheng, 01/19/2010
** We have the inverse of this option on the Performance Profiler, 'Include steps without a knowledge component', and with the Student-Problem export, 'Include Steps without KCs'.

==== Include Custom Fields in Student-Step Rollup ====
* Is it possible to include the custom field in the DataShop [step] export? -- Hui Cheng, 03/01/2010
** "I am thinking about the student-step level export. We have a student from Statistics wanting to use Assistment data from DataShop. She wants to have problem set id (=curriculum id in DataShop). For Assistment data, we decided to put problem set id in the custom field because in the Assistment a lot of problems are repeated in different problem sets. She needs data at the student-step level." (Hui email)
* Vote from Phil Pavlik too -- see [[Collected_User_Requests#Include_transaction_custom_fields_in_web_services_student-step_export|Phil's comment under Web Services]]

==== Don't duplicate rows in student-step format when not showing KCs ====
* If the checkbox to show knowledge components is not checked, maybe it doesn't make sense to show rows more than once if they have more than one KC associated with the step. (Mimi (and Brett) stumbled on this. 8/16/2010)

==== Student-Step Rollup include Success Column ====
* Student-step rollup: add a column called Success, with 1 if correct, 0 if incorrect or hint, and blank otherwise. ~~ Ken Koedinger, DataShop Team Meeting, Oct 22, 2010
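A minimal Python sketch of the proposed mapping. The request does not say which attempt decides the value; this sketch assumes it is the step's first attempt, passed in as a hypothetical lowercase outcome label ("correct", "incorrect", "hint"), and represents blank as an empty string.

```python
def success_value(first_attempt):
    """Map a step's first-attempt outcome to the proposed Success column."""
    if first_attempt == "correct":
        return 1
    if first_attempt in ("incorrect", "hint"):
        return 0
    return ""  # blank for any other or missing outcome
```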

=== Learning Curve ===

==== Default sort by observation on LCPID ====

* Default sort by observation on LCPID. ~~ Ken Koedinger, DataShop Team Meeting, 10/22/2010

==== Purple Point ====
* Purple Point: if a point on the LC has more than one KC associated with it but you have drilled down to a given KC, then the blue line is off. We could put a purple point that takes this into account.
* A simpler alternative: display a warning message that some points in the display are driven by other KCs.
* Pearson may be interested
* This was mentioned during the PSLC Summer School 2010.
* For a step with multiple skills, attribute the error only to the skill with the highest overall error rate. ~~ Alida, meeting with Ken, November 18, 2010
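The last suggestion can be sketched in Python. This assumes a hypothetical list of (KC list, is_error) pairs, one per step, with at least one KC per step. Overall error rates are computed over every step a KC is tagged on, and each erroneous step is then counted only against its highest-rate KC. Ties are broken alphabetically, an arbitrary choice made for determinism. Whether correct observations should be reassigned the same way is left open, as in the note.

```python
def attribute_errors(steps):
    """steps: list of (kcs, is_error), where kcs is a non-empty list of KC names.

    Returns {kc: number of errors attributed to it}, charging each
    erroneous multi-KC step only to the KC with the highest overall error rate.
    """
    totals, errors = {}, {}
    for kcs, is_error in steps:
        for kc in kcs:
            totals[kc] = totals.get(kc, 0) + 1
            errors[kc] = errors.get(kc, 0) + (1 if is_error else 0)
    rate = {kc: errors[kc] / totals[kc] for kc in totals}

    attributed = {kc: 0 for kc in totals}
    for kcs, is_error in steps:
        if is_error:
            # Highest overall error rate wins; ties go to the first name alphabetically.
            owner = min(kcs, key=lambda kc: (-rate[kc], kc))
            attributed[owner] += 1
    return attributed
```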

==== Reduce Scrolling ====
* Add a forward and back button to the graph to reduce scrolling. -- John LaPlante, email 7/10/2007

:"The learning curve page could use a forward and back button to cycle through the learning curves. Going through them one by one requires a lot of scrolling."

==== Turn On Point Labels ====

* It would be nice to have the option to turn on point labels. It is nice that I can mouse over a point and view the data but it would be nicer if it appeared automatically. -- John LaPlante, email 7/10/2007

==== Option for Bigger Graph ====

* Allow user to see bigger graph. -- Derek Chan, Winter Workshop 1/23/2008
** Potential solution: enable user to set x, y scale manually

==== LC Normalize Scale of Thumbnails ====
* [[LC Normalize Scale of Thumbnails]]

=== Performance Profiler ===

==== Rename Performance Profiler ====
* John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
** Did not use this report because he thought it had something to do with improving the performance of DataShop itself. This report might have been much better to use than the Error Report, but an export would still be needed, since working with the data in tabular form was necessary. Note that the pivot tables he created were added to the dataset (Pittsburgh Science of Learning Center Stoichiometry Study 1).
* Lisa Anthony, email 8/2008
** Didn't know to go to the report. Export would be useful.
** Needed a better definition of Error Rate with respect to Problem and Unit rows.

==== Export ====
* John LaPlante (see comments in Rename Performance Profiler)
* Lisa Anthony (see comments in Rename Performance Profiler)
* Yes, I too can imagine wanting to export the results of a particular performance profiler output (i.e., to a table) so that I can graph it my own way. --[[User:Koedinger|Koedinger]] 16:24, 16 September 2009 (EDT)

==== Table View ====
* Add option to switch to a table view.
* Columns: Problem Name, Steps, % Incorrect, Incorrect Steps, % Hint, Hint Steps, etc.; include all values shown in the pop-up.

==== Union of KCs/Problems/Students ====
* Allow users to get the union of KCs/Problems/Students etc. so they can compare across samples more easily. -- Kirsten Butcher, Winter Workshop 1/23/2008
:"It is difficult to compare performance profiler graphs across samples because the KCs (or problems, or whatever) aren't necessarily in both of those samples."

==== Show Details In Report ====
* Click on bar to see details in report and not just in pop-up. It disappears too quickly. -- Alida, Brett

==== Show More Information in the Graph ====
* Show more information in the graph: [Bruce M, User Mtg, 11/5/2007]
** Had drilled down by a certain skill - skill is not listed in the graph, user has to check the skill list on the LHS to see what skill was selected
* Maybe related to Bruce's "show more info in graph": show the actual value of the range variable. e.g., when range is error rate, show the error rate number somewhere (right now you can see, via mouse-over, the incorrect, hint, and correct percentages, but not the error rate) -- Brett
* Design idea to show # steps incorrect, # steps hint, # steps correct to clarify how the percentages are calculated. -- Alida, Brett

== Sample Selector ==

==== Sub-Samples ====

* Sub Samples would be helpful. -- John LaPlante, email 7/10/2007
:"A nice solution would be to have sub-samples where one property varies. When I'm doing this analysis, I've changed my sample many times, renaming it sometimes, tweaking it to get variations on the data. The samples are really useful but they could help me a lot more with this kind of experimentation."

==== Filter out students ====

As a researcher, I want to filter out test users (including myself) from my data so that I see less noise in the data.

* Ruth Wylie, July 3, 2008
* You can already filter out test users by using a test user id that starts with 'weirdCMUuser_xxx'. Then create a sample that excludes students with a name like 'Test_%'. --[[User:Alida|Alida]] 10:29, 4 September 2009 (EDT)

==== Filter by "Class" ====

As a researcher, I want to create samples based on "class" since class is how I've encoded my conditions.

* Maaike Waalkens, July 30, 2010
* This is what made sense when using Mathtutor for tutor delivery.
* What other fields are we missing in the Sample Selector?

==== Filter by "Step" ====

As a researcher, I want to create samples based on "step" since I'm only interested in one particular step.

* Mimi McLaughlin, August 17, 2010

==== Create Sample Automatically ====

* Would it be possible for me to get a random sample from the 'Bridge to Algebra 2006-2007' dataset of 100 students? I am having trouble looking at the data because it takes too long to load, and my adviser thought that was because the dataset was too large. ~~ DataShop User, 10/19/2010
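Until such a feature exists, the selection step itself is simple. A minimal Python sketch, assuming the dataset's student IDs are already available as a list; the function name and the optional seed parameter (for reproducible samples) are hypothetical. The returned IDs would then be used to build a sample that includes only those students.

```python
import random

def sample_students(student_ids, n=100, seed=None):
    """Return n distinct student IDs chosen uniformly at random (all of them if fewer)."""
    ids = sorted(set(student_ids))
    if len(ids) <= n:
        return ids
    rng = random.Random(seed)  # a seeded generator makes the sample repeatable
    return sorted(rng.sample(ids, n))
```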

== Web Services ==

==== Include transaction custom fields in web services student-step export ====

* "I noticed in the web services guide that cfs (which provides the custom fields) is not yet implemented for step roll-up tables. It is marked [coming soon] in the manual... I could write a workaround to pull in the transactions and lookup the custom fields, but I'd really rather not." -- Phil Pavlik, 4/27/2011

* "It would be best if the CF for step rollup was drawn from the primary transaction for the first attempt that is correct or incorrect. At least that seems like the generally best value. Basically, there may be many cases where the custom field is the same across all transactions for a step. In this case ... you could just use the first one since they are all the same." -- Phil Pavlik, 4/29/2011
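Phil's rule for choosing a step-level custom field value can be sketched in Python. This assumes a hypothetical chronological list of (outcome, custom field value) pairs for one student-step, treats only "correct" and "incorrect" outcomes as attempts, and does not model the primary/secondary transaction distinction he mentions.

```python
def step_custom_field(transactions):
    """transactions: chronological list of (outcome, cf_value) for one student-step.

    Returns the custom field value from the first attempt that is correct or
    incorrect (hint requests are skipped), or None if there is no such attempt.
    """
    for outcome, cf_value in transactions:
        if outcome in ("correct", "incorrect"):
            return cf_value
    return None
```

When the custom field is constant across a step's transactions, as Phil expects in many cases, this returns the same value as simply taking the first transaction.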

==== Use Custom Fields Graphs/Reports ====
In Graphs

As a researcher creating custom fields and assigning values at the transaction level, I want DataShop to perform the aggregation to the step level so that I can do other things with my custom-field variable such as graph it. -- Ryan Baker, mtg w/Alida & Brett, 12/15/2008

Performance Profiler
* Use all discrete variables/custom fields on the left and all the continuous variables on the bottom of the Performance Profiler. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Ad-hoc queries ====

* As the next web service feature (after custom fields), allow restricted filtering on steps and transactions, on whatever users can filter on in the navigation boxes. (User Meeting AAR, December 9, 2009)

==== Sample creation as a web service ====

* Sample creation is still too slow. (User Meeting AAR, December 9, 2009)

 
----
See completed [[DataShop 3.x Features]] 
See on-going [[DataShop 4.x Features]] 
See prioritized [[DataShop Feature Wish List]]
[[Category:Protected]]
[[Category:DataShop]]

Make Dataset Info Overview public

2012-04-27T20:56:24Z

Bleber: New page: '''Status: Requirements Needed''' == User Story == As a researcher, I would like to be able to follow a link from the PSLC theory wiki (or Google search results) to the metadata page of ...

'''Status: Requirements Needed'''

== User Story ==

As a researcher, I would like to be able to follow a link from the PSLC theory wiki (or Google search results) to the metadata page of a dataset so that I can learn more about the data and log in to gain access to it (if it's public).

== Notes/Comments ==

* Discussed during 4/26/2012 team meeting

----
See completed [[DataShop 3.x Features]] 
See on-going [[DataShop 4.x Features]] 
See prioritized [[DataShop Feature Wish List]] 
See unordered [[Collected User Requests]]
[[Category:Protected]]
[[Category:DataShop]]
[[Category:DataShop Feature]]

DataShop Feature Wish List

2012-04-27T20:44:06Z

Bleber:

Below are two lists of features. The features that we have prioritized and decided to implement are in the first, ordered list. The features that the DataShop team and community are discussing are in an unordered list on the page [[Collected User Requests]]. Click on a feature to get more information about it, such as a description, rationale for building it, and its status.

'''You can help!'''

If you think a feature is important, vote for it by putting your name to the right of the feature. Discuss the feature on the comments section of that feature's page. We'll use these votes and the dialogue that develops to prioritize features.

Don't see a feature on the prioritized list? There's a good chance it's on the '''[[Collected User Requests]]''' page. You can add feature ideas there and discuss the existing ones. Include your comment, name, and date to vote on feature ideas there.

'''Tip:''' Easily sign your username and the current date/time by inserting four tildes (<nowiki>~~~~</nowiki>); insert just your username with three tildes.

''See [[DataShop On-going Features|features we are building now]], and [[DataShop Completed Features|ones we have built]].''

== Prioritized Features ==

# [[Speed up Aggregator]]
# [[Make Dataset Info Overview public]]
# [[Push Button Import]] — Carnegie Learning, John Stamper
# [[Web Services - Add Custom Fields]] (add custom fields to transactions) — Vote: Ryan Baker (1), John Stamper (1)
# [[Error Bars]] — Vote: Ken Koedinger (4)
# [[Add Latency Y-axis Options]] — Vote: Ken Koedinger (3)
# [[Adding Custom Fields through Web Application]] — Vote: Ryan Baker (2)
# [[Scalability]] — Vote: Ryan Baker (4)
# [[KC Model in Transaction Export]] — Vote: Vincent Aleven (2)
# [[Student Filter Dialog]]
# [[Milliseconds]] — Vote: [[User:Mostow|Mostow]] 23:07, 13 December 2011 (UTC)
# [[LFA-AFM on Sample]] — Vote: Ken Koedinger (5)
# [[Place for General Papers]]
# [[Performance Metrics]]
# [[Ability to display step-custom-fields in graphs]]
# [[Add Problem Content]]
# [[Dialogue Message Format]]
# [[Web application student-step format should be the same as web services version]] — Vote: Mike Yudelson

== Unordered Features ==
We have a long list of feature requests that have not been prioritized. Please see the
'''[[Collected User Requests]]'''.

== How to Request a Feature ==
* [[Write a User Story]]
* [[Create a Feature Page]]
* Add Link to Feature on [[Collected User Requests]] page.
 
----
See [[DataShop Completed Features|completed features]] 
See [[DataShop On-going Features|on-going features]] 
See unordered [[Collected User Requests]] 
See the [[:Category:DataShop Glossary|DataShop Glossary]]
[[Category:Protected]]
[[Category:DataShop]]

Other Analysis Outputs

2012-04-18T14:38:00Z

Bleber:

'''Status: In QA'''

== User Story ==

As a user of DataShop, I want to store my model results in DataShop so that I can collaborate with other researchers.

== Notes/Comments ==

* Talked about having "Other Analysis Outputs" as a tab in the DataShop interface so that users of web services can create a simple text file (some sort of free-form document) and upload it back to DataShop. -- From a meeting with Ken on 3/24/2011
* Feature entailed the creation of a Files tab with Papers, External Analyses, and Files subtabs

 
----
See completed [[DataShop 3.x Features]] 
See on-going [[DataShop 4.x Features]] 
See prioritized [[DataShop Feature Wish List]] 
See unordered [[Collected User Requests]]
[[Category:Protected]]
[[Category:DataShop]]
[[Category:DataShop Feature]]

Access Requests

2012-04-18T14:37:49Z

Bleber:

'''Status: In QA'''

== User Story ==

As a PI or data provider of a project, I want the ability to authorize requests to access my projects myself so that I don’t need to go through DataShop staff.

== Notes/Comments ==

* Feature includes the ability for any registered user to request access to any private project.
* PI and data provider (if specified) must agree for a user to receive access
* PIs and data providers of projects can view and export an access report

 
----
See completed [[DataShop 5.x Features]] 
See prioritized [[DataShop Feature Wish List]] 
See unordered [[Collected User Requests]]
[[Category:Protected]]
[[Category:DataShop]]
[[Category:DataShop Feature]]

Access Requests

2012-04-18T14:36:44Z

Bleber: New page: '''Status: ''' == User Story == As a PI or data provider of a project, I want the ability to authorize requests to access my projects myself so that I don’t need to go through DataShop...

'''Status: '''

== User Story ==

As a PI or data provider of a project, I want the ability to authorize requests to access my projects myself so that I don’t need to go through DataShop staff.

== Notes/Comments ==

* Feature includes the ability for any registered user to request access to any private project.
* PI and data provider (if specified) must agree for a user to receive access
* PIs and data providers of projects can view and export an access report

 
----
See completed [[DataShop 5.x Features]] 
See prioritized [[DataShop Feature Wish List]] 
See unordered [[Collected User Requests]]
[[Category:Protected]]
[[Category:DataShop]]
[[Category:DataShop Feature]]

DataShop Completed Features

2012-04-18T14:28:01Z

Bleber:

* [[DataShop 3.x Features]]
* [[DataShop 4.x Features]]
* [[DataShop 5.x Features]]

 
----
See [[DataShop On-going Features|on-going features]] 
See unordered [[Collected User Requests]] 
See the [[:Category:DataShop Glossary|DataShop Glossary]]
[[Category:Protected]]
[[Category:DataShop]]

DataShop Completed Features

2012-04-18T14:27:21Z

Bleber:

* [[DataShop 3.x Features]]
* [[DataShop 4.x Features]]
* [[DataShop_5.x_Features]]

 
----
See [[DataShop On-going Features|on-going features]] 
See unordered [[Collected User Requests]] 
See the [[:Category:DataShop Glossary|DataShop Glossary]]
[[Category:Protected]]
[[Category:DataShop]]

DataShop Feature Wish List

2012-04-18T14:26:16Z

Bleber:

Below are two lists of features. The features that we have prioritized and decided to implement are in the first, ordered list. The features that the DataShop team and community are discussing are in an unordered list on the page [[Collected User Requests]]. Click on a feature to get more information about it, such as a description, rationale for building it, and its status.

'''You can help!'''

If you think a feature is important, vote for it by putting your name to the right of the feature. Discuss the feature on the comments section of that feature's page. We'll use these votes and the dialogue that develops to prioritize features.

Don't see a feature on the prioritized list? There's a good chance it's on the '''[[Collected User Requests]]''' page. You can add feature ideas there and discuss the existing ones. Include your comment, name, and date to vote on feature ideas there.

'''Tip:''' Easily sign your username and the current date/time by inserting four tildes (<nowiki>~~~~</nowiki>); insert just your username with three tildes.

''See [[DataShop On-going Features|features we are building now]], and [[DataShop Completed Features|ones we have built]].''

== Prioritized Features ==

# [[Speed up Aggregator]]
# [[Push Button Import]] — Carnegie Learning, John Stamper
# [[Web Services - Add Custom Fields]] (add custom fields to transactions) — Vote: Ryan Baker (1), John Stamper (1)
# [[Error Bars]] — Vote: Ken Koedinger (4)
# [[Add Latency Y-axis Options]] — Vote: Ken Koedinger (3)
# [[Adding Custom Fields through Web Application]] — Vote: Ryan Baker (2)
# [[Scalability]] — Vote: Ryan Baker (4)
# [[KC Model in Transaction Export]] — Vote: Vincent Aleven (2)
# [[Student Filter Dialog]]
# [[Milliseconds]] — Vote: [[User:Mostow|Mostow]] 23:07, 13 December 2011 (UTC)
# [[LFA-AFM on Sample]] — Vote: Ken Koedinger (5)
# [[Place for General Papers]]
# [[Performance Metrics]]
# [[Ability to display step-custom-fields in graphs]]
# [[Add Problem Content]]
# [[Dialogue Message Format]]
# [[Web application student-step format should be the same as web services version]] — Vote: Mike Yudelson

== Unordered Features ==
We have a long list of feature requests that have not been prioritized. Please see the
'''[[Collected User Requests]]'''.

== How to Request a Feature ==
* [[Write a User Story]]
* [[Create a Feature Page]]
* Add Link to Feature on [[Collected User Requests]] page.
 
----
See [[DataShop Completed Features|completed features]] 
See [[DataShop On-going Features|on-going features]] 
See unordered [[Collected User Requests]] 
See the [[:Category:DataShop Glossary|DataShop Glossary]]
[[Category:Protected]]
[[Category:DataShop]]

DataShop 5.x Features

2012-04-18T14:23:52Z

Bleber:

== v5.0 April/May 2011 ==
# [[Flat File Importer]]
# [[Pre and Post Test Data]]

== v5.1 October 2011 ==
# [[Add Problem View to Export and Import]]
# [[KC Model Sort]]
# [[Order KCs by BIC on Primary Model]]
# [[Citations]]
# Changes to Cross Validation

== v5.2 January 2012 ==
# [[Terms of Use]]
# Project pages and project PIs

== v5.3 April 2012 ==
# [[Other Analysis Outputs]]
# [[Access Requests]]

 
----
See completed [[DataShop 3.x Features]] 
See completed [[DataShop 4.x Features]] 
See prioritized [[DataShop Feature Wish List]] 
See unordered [[Collected User Requests]] 
See the [[:Category:DataShop Glossary|DataShop Glossary]]
[[Category:Protected]]
[[Category:DataShop]]

Other Analysis Outputs

2012-04-18T14:23:31Z

Bleber: /* Notes/Comments */

'''Status: Prioritization Needed'''

== User Story ==

As a user of DataShop, I want to store my model results in DataShop so that I can collaborate with other researchers.

== Notes/Comments ==

* Talked about having "Other Analysis Outputs" as a tab in the DataShop interface so that users of web services can create a simple text file (some sort of free-form document) and upload it back to DataShop. -- From a meeting with Ken on 3/24/2011
* Feature entailed the creation of a Files tab with Papers, External Analyses, and Files subtabs

 
----
See completed [[DataShop 3.x Features]] 
See on-going [[DataShop 4.x Features]] 
See prioritized [[DataShop Feature Wish List]] 
See unordered [[Collected User Requests]]
[[Category:Protected]]
[[Category:DataShop]]
[[Category:DataShop Feature]]

Collected User Requests

2012-01-18T23:06:48Z

Bleber: /* Redesign the Home Page */

See prioritized items on [[DataShop Feature Wish List]].

== Annotations ==
* Some links from Ruogo Kang's (CMU PhD student, Sara Kiesler) recent talk. -- Ken, email, 8/24/2011
** http://vis.stanford.edu/papers/senseus
** http://vis.berkeley.edu/papers/commentspace/

==== Annotations on Transaction Level ====
* I have models which can annotate things like: gaming, bored, etc. on the transaction level. -- Ryan Baker, ET Mtg 12/5/2007

==== Annotations on Student Level ====
* Would annotate on student level. -- Ido Roll, User Meeting, 1/19/2009

==== Annotations on Pages ====
* See the cool thing created by Jeffery Heer where all the settings of the page were recorded with the comment. -- Ryan Baker, DS Team Mtg 5/23/2008

==== Dataset Discussion - Capture data-integrity issues ====
* As a stakeholder in the DataShop project, I want to capture and publicize the data-integrity issues discovered with data sets so that data is better documented (and so we've fulfilled a promise to our funders to better document data). -- Ken Koedinger, Team Meeting, 8/15/2009
* As a user of DataShop, I want to discuss datasets and have that discussion attached to the dataset so that others can comment and better understand any data-integrity issues I've found.

==== Linking to internal pages ====

* It would be handy if a link can be saved to any dataset, sample, page in the DataShop. -- [[Alida]], 10/18/2007
** Currently, https://pslc-qa.andrew.cmu.edu/datashop/DatasetInfo?datasetId=793 works if you are already logged in.

==== Have a link from the DataShop to the Theory Wiki (Dataset to Project Page) ====

* Can we link from the dataset to the project page on the Theory Wiki? In the pipeline have a clickable link to the project page (make project name clickable). -- Michael Bett, ET Mtg 11/14/2007
** Link to a dataset directly? Is that obvious to users? Click on dataset link -> log in -> redirected back to dataset. -- Brett Leber

== Data Format ==
==== Capture the question prompt and answer choices the student chose from ====
* As a researcher, I want to be able to identify steps based on the question prompt (rather than the difficult-to-understand step names that come from my tutor's selection and action) so that I can analyze the data more easily. For multiple-choice questions, I also want to be able to see all of the choices that were available to the student. -- Eli Silk, February 1, 2011 (meeting with Brett and Ross Higashi of the FIRE project)
** Near-term solution is to create a table locally that maps steps to prompts
** Another solution is to ask CTAT team to modify their Flash components so that they log this information as custom fields
** Long-term solution is to make these fields into standard fields in the tutor message format
** In the email thread "Cognitive Tutor Development and Evaluation Requests", there is support for this general idea from Ken Koedinger, Albert Corbett, and Christian Schunn.
** Ken added that "Ideally, we may want to store any images that the student can see and where they reside on the screen ..."

== Data Modeling ==
=== Non-KC Modeling ===

==== Automatic Distillation ====
* As an educational data miner wishing to develop a machine learned model with PSLC data, I would like to be able to automatically distill data features (e.g. custom fields) commonly used in past educational data mining research for a new data set (see, for instance, Baker, Corbett, Roll, & Koedinger, 2008 in UMUAI) -- Ryan Baker, Summer 2008, Startup Memo
** Could be implemented as a plug-in
* Also interested in this feature idea. -- Dan Franklin, Oct 2008

==== Upload model and apply it to new data set ====

* As an EDM researcher, I would like to take a model, expressible as a linear formula on DataShop fields, or a simple code procedure (e.g. Bayesian Knowledge Tracing, which Ryan has code for), and apply it to a new data set, so that I can ... -- Ryan Baker, Sept 2008
* Also interested in this. -- Maxine Eskenazi, Sept 2008
* May work best as a plug-in
** Code to display a GUI for choosing which data sets to use, call the model code, and re-import the results to DataShop
** Good to have a way to apply many models, as soon as you import a data set
* Phil has an idea that maybe fits within this one. Please move if there's a better category. -- Brett Leber
<blockquote>This [''transaction? kc? --ed.''] relabeling is really mostly about enabling modeling in DataShop right? With this in mind, I think that it is actually a higher priority to have model alternatives in DataShop.... E.g. Investigators should be able to give you chunks of Java code according to a certain specification, and DataShop should be then able to run these over datasets (perhaps after a certain series of QA occurs according to an SOP) when the investigator clicks some button in DataShop.... Obviously this is a much larger project than adding columns, but it is also much more important in my mind. ''-- Phil Pavlik, email to Brett on 1/14/2009''</blockquote>
* Examples:
** Example: running gaming detector in multiple tutors and comparing gaming frequencies
** Example: applying Bayesian Knowledge Tracing to a new data set from the same LearnLab
** Example: applying Ben Shih's models to many data sets. Note that Ben Shih should be included in design of this feature; he is interested, and has a lot of good ideas.

==== Add Different Predicted Values ====
* Would also like to add statistics, different predicted values than what LFA produces. -- Ken Koedinger, ET Meeting, 10/10/2007

==== Bayesian Knowledge Tracing ====
* Bayesian Knowledge-Tracing built into DataShop like LFA is. -- Ryan Baker, Startup Memo, Summer 2008
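
For reference, the core of the standard four-parameter BKT model (prior, learn, slip, guess) is small. A minimal sketch for one student and one KC, assuming first-attempt outcomes coded 1 = correct and 0 = incorrect/hint; the parameter values in any use would have to be fitted, and nothing here reflects how DataShop would actually host it.

```python
from dataclasses import dataclass


@dataclass
class BKTParams:
    p_init: float     # P(L0): KC already known before the first opportunity
    p_transit: float  # P(T): KC learned after an opportunity
    p_slip: float     # P(S): error even though the KC is known
    p_guess: float    # P(G): correct even though the KC is not known


def trace(params, outcomes):
    """Return P(known) before the first opportunity and after each one."""
    p_known = params.p_init
    history = [p_known]
    for correct in outcomes:
        # Bayesian update of P(known) given the observed outcome
        if correct:
            num = p_known * (1 - params.p_slip)
            den = num + (1 - p_known) * params.p_guess
        else:
            num = p_known * params.p_slip
            den = num + (1 - p_known) * (1 - params.p_guess)
        posterior = num / den
        # Chance to learn the KC on this opportunity
        p_known = posterior + (1 - posterior) * params.p_transit
        history.append(p_known)
    return history


def predict_correct(params, p_known):
    """Predicted P(correct) at an opportunity, given current P(known)."""
    return p_known * (1 - params.p_slip) + (1 - p_known) * params.p_guess
```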

==== Richer statistics for KC modeling ====

* In addition to the model stats and estimates generated for learning factor models, we should also create difficulty factor models (i.e., ones without a "Slope" parameter). The latter is particularly relevant for the Unique-Step model where the slope parameter is meaningless (but still counts against the BIC value). -- Ken Koedinger, Email "new feature request", 1/22/2009
** Relatedly, we should report significance values on the Slope parameters -- that is, when is the Slope significantly different from 0.
** The KC models page perhaps should also report the log-likelihood and number of parameters (in addition to BIC) and leave out AIC. We might also consider other metrics of model generality, like the "adjusted R2" (if I have this name right -- Joe Beck mentioned in the Assistments meeting yesterday).
** These changes will be part of meeting the CMDM goal of improvement in (or at least demonstrate acceptability of) the cognitive models in 90+ units in our LearnLab courses (or affiliates).
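
One common way to report whether a Slope is significantly different from 0 is a Wald z-test on the estimate and its standard error. A minimal sketch, assuming the model-fitting code can supply a standard error for each Slope estimate (the numbers in the test are made up):

```python
import math


def slope_wald_test(estimate, std_error):
    """Two-sided Wald z-test of H0: slope == 0 (normal approximation).

    Returns (z, p_value)."""
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    z = estimate / std_error
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```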

=== KC Modeling ===

==== LFA/AFM: Check if enough memory using formula ====
* The LFA/AFM code could calculate how much RAM would be needed to run the algorithm on a given skill model using the formula provided by Hao. This formula is based on the number of transactions, number of students and number of skills. Right now it will not schedule itself to run on a model with over 300 skills, though there is a manual override. [[User:Alida|Alida]] 13:35, 29 November 2010 (EST)

==== Create KC Models through Web Services ====
* For John Stamper's CMDM project, it would be nice to automatically update KC Models through web services. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Automatically discovering new KC model ====

* Would it be possible to run some code (perhaps Hao's KC model selection code, perhaps something else generated by the CMDM thrust) to find a new best KC model? -- Vincent Aleven, Sept 2008
* As a learning sciences researcher, I would like DataShop to discover a new/better KC model for me.
* Could be done as a plug-in

==== Generate new KC Models with LFA ====
* Not sure who asked for this.
* It would be nice to generate new KC Models with Hao's LFA code
* Would need to specify factors.
* Ideas on where this could run?
** On a separate server? Request it to be run, specify duration. Have separate server queue up requests, email user when done.
** In Java Applet on client machine? -- Phil Pavlik

==== Same Skill Twice on Same Step ====
* Would like to be able to apply the same skill to a step twice during a KC Model Import. -- Ken Koedinger, email, 2/4/2009

==== Save KC Model Import Files ====
* KC Model Import - save the file used to create the KCMs in case we need to recreate them. -- Ken Koedinger, email 3/4/2009

==== Log Likelihood and MAD ====
* Log Likelihood, MAD (mean absolute deviation) problem, MAD step (store and show) -- Hao Cen
** This is a variation on "Richer statistics for KC modeling" above. Probably should be merged. - Ken Koedinger
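
A sketch of the two statistics over a set of first-attempt outcomes and model predictions. It assumes "MAD" here means the mean absolute difference between observed correctness (1/0) and predicted probability of correct; how the problem-level and step-level variants would aggregate is not specified above.

```python
import math


def log_likelihood(observed, predicted):
    """Bernoulli log-likelihood of 0/1 outcomes given predicted P(correct)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted differ in length")
    return sum(
        math.log(p) if y == 1 else math.log(1 - p)
        for y, p in zip(observed, predicted)
    )


def mean_absolute_deviation(observed, predicted):
    """Mean of |observed - predicted| over all opportunities."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need equal-length, non-empty inputs")
    return sum(abs(y - p) for y, p in zip(observed, predicted)) / len(observed)
```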

==== Better naming for KCs in auto-generated Unique-Step KCM ====

As a researcher, I want the KCs in the Unique-Step model to have better names than KC1, KC2, etc., so that I can easily tell which generated KCs go with which unique step.

* Hui Cheng (Email 1/20/2009), Ken Koedinger (Email 1/22/2009)
* Could you put this in your new feature request list: could the “Unique-step model” be better labeled than just “KC1”, “KC2” and etc? For example, for Assistments data, you could use part of the “Step Name”. -- Hui Cheng, Email 1/20/2009
* But, anything is better than "KC". -- Ken Koedinger, email, 1/22/2009
<blockquote>A simple alternative, that preserves uniqueness and addresses length, is to concatenate: 1) the first K letters of the step name 2) a unique numerical increment (same as the "3" in "KC3"). Note that (2) guarantees uniqueness just as it does in the current "KC<num>" scheme. Or perhaps better given that step names are often scoped within problems, is to concatenate: 1) the first L letters of the problem name 2) the first M letters of the step name 3) a unique numerical increment (just like the "3" in "KC3") I think K or L+M should be as big as possible without making the KC names indistinguishable (because they run off the right margin) in the KC list on the Learning Curve and other pages. </blockquote>
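
A minimal sketch of the second scheme in the quote (problem-name prefix, step-name prefix, unique increment). The prefix lengths (L = 8, M = 12) and the "_" separator are hypothetical choices for illustration, not values anyone agreed on.

```python
def unique_step_kc_names(steps, problem_chars=8, step_chars=12, sep="_"):
    """Name one KC per unique (problem, step) pair.

    The trailing counter keeps names unique even when the truncated
    prefixes collide, just as the number does in "KC1", "KC2", ...
    """
    names = {}
    for problem, step in steps:
        if (problem, step) in names:
            continue
        counter = len(names) + 1
        names[(problem, step)] = (
            f"{problem[:problem_chars]}{sep}{step[:step_chars]}{sep}{counter}"
        )
    return names
```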

==== Visualize Learning Curve Split ====
* Be able to visualize a learning curve split into 2 based on a specification of a subset of problems. -- Albert Corbett, Math CCM, November 2008
* Harder: make me a new KCM out of it. -- Ken Koedinger, Team Mtg, Dec 5, 2008, while trying to describe Albert's request to Alida
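
One way to make the split concrete: compute the error-rate curve (error rate by opportunity number, counting a hint or incorrect first attempt as an error) separately for steps in the chosen problems and for all other steps. A sketch; the row fields are assumptions loosely modeled on student-step rollup columns.

```python
from collections import defaultdict


def split_learning_curves(rows, subset_problems):
    """rows: dicts with 'problem', 'opportunity' (1-based) and
    'first_attempt' ('correct', 'incorrect' or 'hint').

    Returns {'subset': {opp: error_rate}, 'rest': {opp: error_rate}}.
    """
    counts = {
        "subset": defaultdict(lambda: [0, 0]),  # opp -> [errors, total]
        "rest": defaultdict(lambda: [0, 0]),
    }
    for row in rows:
        group = "subset" if row["problem"] in subset_problems else "rest"
        tally = counts[group][row["opportunity"]]
        tally[0] += row["first_attempt"] != "correct"  # incorrect or hint
        tally[1] += 1
    return {
        group: {opp: errs / n for opp, (errs, n) in sorted(by_opp.items())}
        for group, by_opp in counts.items()
    }
```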

==== Statistical Significance ====
* Can DataShop determine if the difference between conditions or learning curves is statistically significant? -- general theme at workshop, probably mentioned by Bob Hausmann in his talk, Winter Workshop 1/23/2008
** Can't do it yet in DataShop, but I can show you how to do it in R (or SPSS...) after you've exported the data; export the "student-step rollup" rather than the whole transaction table. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** This comment perhaps belongs (exists?) elsewhere: the current student-step rollup export (really, all exports) should be such that I can immediately load it into R (and other packages) without error. Currently errors occur, for instance, because there are "#" characters in the variable names in the student-step rollup. This is a road-block for helping folks like Bob do the analyses they want to do. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** See [[Condition in Student-Step Rollup]] -- [[User:Alida|Alida]] 10:35, 17 September 2009 (EDT)
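
Until the exports load cleanly, a user-side workaround is to rewrite the header row before loading (for example, R's read.table() treats "#" as a comment character by default). A minimal sketch, assuming a tab-delimited export with a single header row; the renaming rule ("#" becomes "Num", other non-alphanumeric runs become "_") is an arbitrary choice.

```python
import csv
import io
import re


def sanitize_header(name):
    """Make a column name like '# Hints' or 'Opportunity (Default)' safe."""
    name = name.replace("#", "Num")
    name = re.sub(r"[^0-9A-Za-z]+", "_", name).strip("_")
    if not name or name[0].isdigit():
        name = "X" + name  # names must not be empty or start with a digit
    return name


def sanitize_tsv(text):
    """Return the tab-delimited text with only its header row rewritten."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([sanitize_header(h) for h in next(reader)])
    writer.writerows(reader)
    return out.getvalue()
```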

==== Split and Merge Skills ====

* Hand searches through a p-matrix for a dataset to split and merge skills. (Pie in the Sky) -- Ken Koedinger, Team Mtg, 02/22/2008

==== Notes on new KCMs ====

* It would be good if I could add a note to a KC model that was newly imported. -- Noboru Matsuda, email, Nov 19, 2009

==== Display Learning Curves Grouped by Interestingness ====

* The page displaying all the learning curves today seems to be alphabetically sorted by KC name, which is not necessarily meaningful.
* An alternative presentation is to group the curves into 4 sets, breaking up the page. Set 1 has curves that contain significant spikes, and therefore seem to be "low-hanging fruit" for purposes of breaking up into KCs. Set 2 has curves with few spikes, but they have a long X axis, suggesting that students are presented with too many opportunities to acquire those KCs. Set 3 has the "good" curves, i.e., nicely decreasing curves that are not too long. Set 4 is "other". -- [http://www.pitt.edu/~goldin Ilya Goldin] 7 December 2009

==== Order KC models according to AIC====
* This is based on results from multiple dataset analyses that compared AIC, BIC and loglikelihood to cross validation RMSE. -- Mimi McLaughlin, 2/9/2011
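
For reference, the standard definitions are AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, with k the number of parameters, n the number of observations and ln L the maximized log-likelihood. A sketch of ordering KC models by AIC (breaking ties by BIC, which is an added assumption); the model names and numbers in the test are made up.

```python
import math
from typing import NamedTuple


class ModelFit(NamedTuple):
    name: str
    log_likelihood: float  # maximized ln L
    n_params: int          # k
    n_obs: int             # n


def aic(m):
    return 2 * m.n_params - 2 * m.log_likelihood


def bic(m):
    return m.n_params * math.log(m.n_obs) - 2 * m.log_likelihood


def order_by_aic(models):
    """Best (lowest AIC) first; ties broken by BIC."""
    return sorted(models, key=lambda m: (aic(m), bic(m)))
```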

====Use log of opportunity count for AIC and BIC calculations====
* We compared using the log of opportunity count to whole number opportunity count in multiple datasets. We found the results for log of opportunity count to be consistently better, though small. -- Mimi McLaughlin, 2/9/2011

== Developer Requests ==

==== Store Converted Date and Converter Info ====

As a DataShop administrator, I'd like to see the converter information (version and date) stored in the database, so that I do not have to store that data manually in the Additional Notes field each time I load a dataset. -- Kyle, 8/5/2008

* DTD new fields:
** Store conversion and converter information in database (anything else?)
*** conversion_time
*** converter_info
* Email from Octav, 10/5/2007

I see there's also a note about the converter version in the Dataset
Info. Which is good, but it seems it's taken from the directory name
when I submitted the set. I don't know how reliable that is.  :-) It
would be better if it's taken from the new converter info field.

==== Convert from XML to tab-delimited format ====

* If users agree that the export format is valuable, maybe they could convert from XML to the export format to see the data in Excel, where they could look at the Selection column and spot blanks more easily. -- Jonathan Sewall, ET Mtg, 10/10/2007

==== Plug-ins (general issues) ====

* Please please support Ruby on Rails. -- Ben Shih, December 2008

==== Create UI to grant DataShop user roles ====

* Already tedious.
* Alida, User Meeting AAR on December 9, 2009

== Help ==

==== Specialize Label of Help Button ====
* Since this help is better than in most applications, it should say more than just 'Help'. -- Ken Koedinger & Ryan Baker, Team Mtg, May 30, 2008
** Ideas:
*** 'Page Help'
*** 'Help with this page'
*** 'Help with Learning Curve page' (Ken's favorite)
*** 'Help with this tool'

== Home Page ==

==== Redesign the Home Page ====

* In the menu of data sets at the top, include the N (=20 or as many as fit on the screen?) data sets that I have visited starting with the ones I've visited most recently. --[[User:Koedinger|Koedinger]] 16:16, 16 September 2009 (EDT)
** As of today, the menu shows the last 10 data sets, most recently visited at the top. I think the feature is good enough, but let's ask Ken --[[User:Bleber|Bleber]] 10:56, 6 August 2010 (EDT)
* There needs to be a better ordering for the datasets (DS364)
* Maybe a search to filter the list of datasets since the list is so long. -- Brett Leber, 6/14/2007
* Going back to the home page always goes to 'My Datasets' (DS313)
* Maybe show more high-level stats on this page, like how many transactions [done Jan 2012], students, skill models
* Allow users to post and share project documentation (files, papers, other meta-info) -- Ruth Wylie, suggested during meeting on 8/4/2010. She had a file that is relevant to multiple datasets in her project.

== Import ==
== Miscellaneous ==

==== Analyses by LearnLab ====

* Organize data by LearnLab, not by "data set". -- Ryan Baker, Aug 2008
* Also: Bob Hausmann, Sep 2008; Maxine Eskenazi, Sep 2008
* Essentially, current data sets become samples, but the top-level unit is the LearnLab. You can take every data set in a LearnLab together as a sample.
* Implies being able to run analyses across data sets, export multiple data sets together, and create multi-data-set samples
* As a user of DataShop, I would like to look at learning curves for all Algebra data together (for example), or export all Algebra data
* Important long-term, but is a lot of work -- in particular, we need to solve scalability issues first.

==== Save Settings Between Sessions ====
* It would be useful if DataShop could save settings between sessions. -- Bob Hausmann, User Meeting, 2/1/2008
** "I do a lot of redoing the same steps" (e.g., set cutoffs, select a KC model, select students).

==== Multiple steps per transaction ====

* Needed so that we do not have to create multiple transactions for the same actual action for Andes logs. -- Kurt van Lehn, Feb 2007

==== Demographic data ====

* This has been mentioned by NSF visitors, AB, ESL, and some researchers.
* Also mentioned at Winter Workshop 1/23/2008.
** Derek/Sue-mei: Student background information not in DataShop. Would like to see a student or set of students from a particular demographic, and view them across datasets!
* Note that Gail added demographic data to Additional Notes field on the Dataset Info page for many datasets. The idea here is to put that data into the database somewhere.

==== Single Sign On ====
* Michael Bett, email, 10/8/2007
* It would be nice if the following services have a single login account/password:
*# Theory Wiki
*# Learnlab.org
*# ESL's OSS
*# DataShop

==== Reveal unanonymized student IDs ====

As a researcher/PI performing research assistant tasks, I want to see easily the unanonymized student IDs of subjects in DataShop so that I can email my subjects telling them when to use my system.

* Ruth Wylie, July 3, 2008
* As the honest broker of the PSLC data, I have promised to not reveal the student IDs and to protect the identity of the students. This is part of the DataShop IRB. Therefore, I do not see this request as possible. [[User:Alida|Alida]] 09:53, 4 September 2009 (EDT)
** At some point, some coordination would be good with OLI's Digital Dashboard project that Marsha Lovett (and the OLI team and sometimes me) is working on. This is meant to provide usage information more quickly to instructors. It could also perhaps be used by researchers (with the right IRB rights) in situations like Ruth's.

==== Knight Timeline ====
* Developed by Andrea Knight, 2004

==== Buggy Skills ====
* Ken Koedinger, prototype walkthrough 9/11/2006

==== Confusion Matrix ====
* Brian MacWhinney, prototype walkthrough 9/11/2006

== Navigation Bar ==
==== Filter KCs by Name ====

As a researcher working with KCs, I want to filter KCs based on their names, so that I can...
* Vincent Aleven, Email, 2/3/2008
* "Since Alida said you cannot have two mechanisms for putting together your KC set (i.e., cannot have both the selecting-by-clicking and selecting-by-filtering), I would probably opt for the latter."
* Alida: I thought Vincent mentioned that he'd like to select which KCs are in a set by filtering on the name. Example: Include KCs with '*reason*' in the name and exclude KCs with '*given*' in the name.
* This could be an addition to our v3.0 KC-selection mechanism--filter by name.
* Vincent, Email, 5/6/2009: Expressed another need for this feature. Alternatively, we could just allow a wider area and a longer list so that more items can be checked at once. The number of characters we show right now is not enough, because in many cases the displayed characters are the same across many of the skills. Reference data set: Geometry CWCTC 2005-2006

* Status: Design Started
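
A sketch of the include/exclude name filter Alida describes, using shell-style wildcards as in her example ('*reason*', '*given*'). The function name and the case-insensitive matching are assumptions.

```python
from fnmatch import fnmatchcase


def filter_kcs(kc_names, include=("*",), exclude=()):
    """Keep KCs matching any include pattern and no exclude pattern
    (case-insensitive shell-style wildcards)."""
    def matches(name, patterns):
        lowered = name.lower()
        return any(fnmatchcase(lowered, p.lower()) for p in patterns)

    return [n for n in kc_names if matches(n, include) and not matches(n, exclude)]
```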

==== Facebook-style KC Selection ====

As a researcher working with KCs, I want to select KCs based on the learning curve thumbnail, so that I can see quickly which ones I'm interested in exploring more deeply.

* Feature already designed for v3.0, not implemented due to time constraints.
* Agreed this would be really useful. -- Kirsten Butcher, User Mtg, 1/31/2008
* Status: Guestimate: 20 days, need to revisit requirements document

==== Feedback after clicking a large sample on a large dataset ====

As a user of DataShop (first-time or not), I want some feedback and the ability to cancel after I do something that might take a long time (e.g., clicking "All Data" on a large dataset) so that I do not get stuck.

* Part of the Susan Goldman story
* After clicking a sample for a large dataset, there is no "Loading..." text, no feedback that the click was even registered by the app (besides the sometimes busy cursor and small browser "loading" text), nor the ability to cancel the action. 
* We will always have similar problems even if performance is improved, so providing feedback and the opportunity to cancel is critical.

==== Save Button in Problem Navigation Box ====

* Save buttons in the sidebar. -- Ken Koedinger, Mtg 2006
** Could also put one in the Problem selection box in the sidebar.

==== Make Nav Bar Wider ====
* Make the Sample name and description fields much wider. -- John LaPlante, email 7/10/2007

== New Visualizations/Reports ==

==== Student-KC Rollup ====
As a researcher, I want to see KCs rolled up by student, so that I can ...

* Vincent Aleven, User Mtg, 1/29/2008
** By Student-KC would be more useful than by Student-Problem
** Example: number of steps with a hint request or an error, or the proportion of steps that had help
** How often the bottom-out hint occurs
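
A sketch of how a Student-KC rollup could compute Vincent's measures from student-step rows. The field names are assumptions, and 'bottom_out' (whether the last hint level was reached) is a hypothetical field, not a known export column.

```python
from collections import defaultdict


def student_kc_rollup(step_rows):
    """step_rows: dicts with 'student', 'kc', 'first_attempt'
    ('correct'/'incorrect'/'hint'), 'hints' (hint count on the step) and
    'bottom_out' (hypothetical: True if the bottom-out hint was reached)."""
    acc = defaultdict(lambda: {"steps": 0, "help_steps": 0, "bottom_out": 0})
    for r in step_rows:
        a = acc[(r["student"], r["kc"])]
        a["steps"] += 1
        # A step "had help" if the first attempt was an error or a hint
        # request, or if any hint was requested on the step.
        if r["first_attempt"] != "correct" or r["hints"] > 0:
            a["help_steps"] += 1
        a["bottom_out"] += bool(r["bottom_out"])
    return {
        key: {**a, "help_rate": a["help_steps"] / a["steps"]}
        for key, a in acc.items()
    }
```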

==== Instructor Reports ====
* Phil said he received a lot of positive reactions to providing reports on units for instructors. -- Phil Pavlik, ET Mtg 10/10/2007

==== Manage Authorizations/Projects Page ====
* Lisa Anthony, email 10/23/2007
* Allow PI to change permissions on the datasets.
:"Actually, I couldn't see how to change permissions on the datasets from the website. Is this possible? If not, it might be a nice feature..."

==== Calculate Time Spent on Different Study Activities ====

As a researcher, I want to know how much time, on average, students spend on study activities, so that I can ...

* Bruce McLaren, Email, 4/7/2009
<blockquote>
For my most recent stoich study, Shawn and I are interested in calculating timing information such as: (a) how long students spent, on average, working on individual tutors (b) how long students spent, on average, on all items in an intervention (c) how long students worked, on average, on post-tests. 

Timing information is very commonly required for studies, and can be calculated from DataShop logs relatively easily, so even if we don't have it, might be worth considering. (And we don't want to re-invent the wheel, if you already have it or are planning it...)
</blockquote>
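
A rough sketch of (a) through (c) from transaction timestamps. It assumes time on a unit is the span from a student's first to last transaction in that unit (idle gaps are not removed), which is only one of several reasonable definitions; the field names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def average_time_on(transactions, unit_field):
    """Mean, across students, of seconds from each student's first to last
    transaction within each value of unit_field (e.g. a tutor or test name)."""
    spans = defaultdict(dict)  # unit -> student -> [first, last]
    for t in transactions:
        ts = datetime.fromisoformat(t["time"])
        first_last = spans[t[unit_field]].setdefault(t["student"], [ts, ts])
        first_last[0] = min(first_last[0], ts)
        first_last[1] = max(first_last[1], ts)
    return {
        unit: mean((last - first).total_seconds() for first, last in by_student.values())
        for unit, by_student in spans.items()
    }
```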

==== Incorrect Step Duration and Hint Step Duration ====

As a researcher, I want to be able to see total step duration if the student's first attempt was an incorrect attempt, and total step duration if the student's first attempt was a hint request, so that I can do some analyses that I can't do with "Error Step Duration".

* Bob Hausmann, email, 11/11/2008.
* Updated title and story with 'step duration' instead of 'time'. -- [[User:Alida|Alida]] 10:36, 4 September 2009 (EDT)
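
A sketch of the two requested columns, by analogy with Error Step Duration: the step duration goes into the column matching the first attempt, and the other column stays blank (None). Field names are assumptions.

```python
def split_error_duration(step):
    """Return (incorrect_step_duration, hint_step_duration) for one student-step.

    step: dict with 'first_attempt' ('correct'/'incorrect'/'hint') and
    'step_duration' in seconds. Columns that do not apply are None (blank).
    """
    duration = step["step_duration"]
    first = step["first_attempt"]
    return (
        duration if first == "incorrect" else None,
        duration if first == "hint" else None,
    )


def mean_ignoring_blanks(values):
    """Average of the non-blank values, or None if all are blank."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None
```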

==== Grading ====

[[Grading]]

====Display number of steps and number of observations for skills====
* How and where to be determined by developers. -- Ken (entered by Mimi)

== Reports ==
=== Dataset Info ===

==== Pointers to Hard-copy Data ====
* Brett van de Sande, NSF Site Visit, 5/28/2008
* Pointers to hard-copy data such as paper tests and/or homework.  Include contact information.  It doesn't seem to make sense to scan a whole filing cabinet of paper if no one wants to look at it.  And any secondary researchers don't know about the filing cabinet to ask for it.

==== Sort Problem Breakdown Table ====

* Would like the ability to sort the table by clicking on the column headings of the Problem Breakdown Table on the Dataset Info Tab. -- Bruce McLaren, User Mtg, 11/5/2007

==== Rename dataset ====

As a researcher, I want to rename my dataset so that it makes more sense to other people. I also want to make sure the dataset doesn't become polluted later by new data not associated with my study.
* Ruth Wylie, July 3, 2008
* There are reasons she would want to do this (current name is worthless, other researchers might try her tutor and pollute her data) but also reasons for not doing it (log more data later).
* There are risks in changing a dataset name that might not be apparent, for example, if you later want new data to go into the same dataset. [[User:Alida|Alida]] 10:27, 4 September 2009 (EDT)

==== Average time per problem, average number of problems, total number of sections ====

In addition to showing student hours per dataset, it would be useful to know the average time spent per problem, average number of problems completed, and the total number of sections.
* Noboru Matsuda, June 18, 2010

=== Error Report ===

==== View By Student ====

* Would like to see what a couple of students saw in the feedback. -- Marsha Lovett, 10/11/2007

==== Export ====
* I would like the ability to export this data. -- John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
* Also interested in this feature idea. -- Bruce McLaren, User Mtg, 11/5/2007

==== Sort ====

* Was planned for but not implemented in v2.1 (estimated to be a 4 day task)
** By Correctness %, starting with the least correct
** By Hints %
** Step (or KC if view by KC)
** Number of Students
* Ability to sort problems by their average experienced position within the curriculum. -- Ken Koedinger, 02/16/2007
** Which problem did students most often experience first, then the one experienced second most often, ...
* Order steps by the order they typically are executed by students. -- Ken Koedinger, email 11/7/2008
:"Searching through the steps in a problem to get a sense of what is going on is currently hard because the steps are ordered alphabetically, not by the order in which most students did them. While not all students do all steps in the same order, there is some regularity there. It would be quite useful if the steps could be ordered in a "typical order". This could be accomplished by using the time stamps (of the first (correct?) transaction?) for each step to determine the rank order of each step for each student in a problem and then, for each problem, averaging the rank order of each step across all students. Then arrange the steps in the Error Report by their average rank order -- that is, roughly speaking, the step that is most likely to be first across students (closest on average to first) goes first, the step with the next lowest rank goes next, etc."
* The capability to count the number of errors of each message type and sort in different ways, for instance by all errors that had no messages. -- Bruce McLaren, email 10/22/2007
:"This is an error analysis I recently did in Excel, using pivot tables, that might be handy if in the DataShop. This one is very important for tutors because the errors that occur most frequently, yet don't elicit messages to the students, are good candidates to become errors with feedback."
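
Ken's "typical order" procedure translates fairly directly into code. A minimal sketch, assuming each transaction has student, problem, step and a sortable time, and using each student's first transaction on a step (the quote leaves open whether it should be the first correct one).

```python
from collections import defaultdict
from statistics import mean


def typical_step_order(transactions):
    """Return {problem: [steps sorted by average rank across students]}."""
    first_seen = {}  # (student, problem, step) -> earliest time
    for t in transactions:
        key = (t["student"], t["problem"], t["step"])
        if key not in first_seen or t["time"] < first_seen[key]:
            first_seen[key] = t["time"]

    per_student = defaultdict(list)  # (student, problem) -> [(time, step)]
    for (student, problem, step), time in first_seen.items():
        per_student[(student, problem)].append((time, step))

    # Rank steps 1, 2, ... within each student's work on a problem
    ranks = defaultdict(lambda: defaultdict(list))  # problem -> step -> ranks
    for (_, problem), seen in per_student.items():
        for rank, (_, step) in enumerate(sorted(seen), start=1):
            ranks[problem][step].append(rank)

    # Average rank per step; step name breaks ties deterministically
    return {
        problem: sorted(by_step, key=lambda s: (mean(by_step[s]), s))
        for problem, by_step in ranks.items()
    }
```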

==== Show more than 500 problems ====

In the error report, can we see more than 500 problems? See the dataset "Cog Model Discovery Experiment Spring 2010", which shows "2/2888 selected. (Showing the first 500)". -- Ken via email on 2/15/2011

=== Export ===

==== Include Step Start Time in transaction format ====

* The transaction-level export already includes problem start time; could it also include step start time? I can easily compute it myself, but it seems there's a specific algorithm that the student-step rollup uses, and it might be nice to include the same value here. --Ilya Goldin, email on 01/16/2012

==== Last attempt on step? column for transaction format ====

* Include a new column that shows whether the row is the last attempt on a step for a student or not. Could be 0 or 1 as value. Helpful for researchers who are doing grading of data. Transaction format. --Vincent Aleven, CTAT mtg 11/5/2010
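
A minimal sketch of the proposed 0/1 column, assuming transactions carry student, problem and step identifiers plus a sortable time; it flags the latest transaction per student-problem-step. Field names are hypothetical, and repeated visits to the same problem are not told apart.

```python
def add_last_attempt_flag(transactions):
    """Return copies of the transactions with 'last_attempt' = 1 on the final
    transaction for each (student, problem, step), else 0."""
    last_index = {}
    # Stable sort by time; the last index written per key is the latest one
    order = sorted(range(len(transactions)), key=lambda i: transactions[i]["time"])
    for i in order:
        t = transactions[i]
        last_index[(t["student"], t["problem"], t["step"])] = i
    final = set(last_index.values())
    return [{**t, "last_attempt": int(i in final)} for i, t in enumerate(transactions)]
```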

==== Elapsed Time ====

* Include the elapsed time in preview and transaction export. It is more valuable than the transaction time as an absolute reference. Possible to keep both. --Ken Koedinger, Team Mtg 04/18/2008

==== SQL Format ====

* Option to export as an SQL file. -- Ken Koedinger, 03/26/2007, also brought up in June ET Meeting
** Ability to export an SQL dump of a dataset. --Kyle Cunningham, 04/03/2007

==== Specify Character for Blanks ====

* Ability to specify what character if any is used for blank. --Ryan Baker, email 8/9/2007
:"Not all tools handle TABTABTAB correctly on import. The period '.' is used to mean missing data in most stats packages. The word 'BLANK' is used in some other ones." Not an issue for Ryan, as he wrote a preprocessor to convert blanks.
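
The conversion Ryan describes is only a few lines. A sketch that replaces empty fields in a tab-delimited export with a chosen placeholder ('.' by default) and leaves the header row alone; the function name is illustrative.

```python
def fill_blanks(tsv_text, placeholder="."):
    """Replace empty tab-separated fields with placeholder, skipping the header."""
    lines = tsv_text.split("\n")
    out = lines[:1]
    for line in lines[1:]:
        if line == "":
            out.append(line)  # keep the trailing newline / empty lines as-is
            continue
        out.append("\t".join(f if f != "" else placeholder for f in line.split("\t")))
    return "\n".join(out)
```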

==== Opportunity (at KC) Count in Transaction Export ====
* Include the opportunity count in the transaction export (it's only in the student-step rollup) -- Noboru Matsuda, 10/08/2009
** Would be repetitive
** Would need one Opportunity column per KC
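
A sketch of attaching one opportunity column per KC to transaction rows. It assumes a step counts as one opportunity for each of its KCs at the time of the student's first transaction on that step, and that every transaction on a step repeats that count (hence "repetitive"). The field names and the "Opportunity (<kc>)" column naming are assumptions.

```python
from collections import defaultdict


def add_opportunity_counts(transactions):
    """Return transactions (sorted by time) with 'Opportunity (<kc>)' fields.

    Each transaction: dict with 'student', 'problem', 'step', 'time' and
    'kcs' (list of KC names on the step)."""
    counts = defaultdict(int)  # (student, kc) -> opportunities so far
    step_opps = {}             # (student, problem, step) -> {kc: count}
    result = []
    for t in sorted(transactions, key=lambda tx: tx["time"]):
        step_key = (t["student"], t["problem"], t["step"])
        if step_key not in step_opps:
            opps = {}
            for kc in t["kcs"]:
                counts[(t["student"], kc)] += 1
                opps[kc] = counts[(t["student"], kc)]
            step_opps[step_key] = opps
        row = dict(t)
        for kc, n in step_opps[step_key].items():
            row[f"Opportunity ({kc})"] = n
        result.append(row)
    return result
```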

==== Export only rows that have KCs tagged ====
* Is it possible, when exporting a dataset, to include only the [transaction?] rows that have knowledge components tagged? And the same goes for KC models export, only include the items that have KC tagged? -- Hui Cheng, 01/19/2010
** We have the inverse of this option on the Performance Profiler, 'Include steps without a knowledge component', and with the Student-Problem export, 'Include Steps without KCs'.

==== Include Custom Fields in Student-Step Rollup ====
* Is it possible to include the custom field in the DataShop [step] export? -- Hui Cheng, 03/01/2010
** "I am thinking about the student-step level export. We have a student from Statistics wanting to use assistment data from DataShop. She wants to have problem set id (=curriculum id in DataShop). For Assistment data, we decided to put problem set id in the custom field because in the Assistment a lot of problems are repeated in different problem sets. She needs data at the student-step level." (Hui email)
* Vote from Phil Pavlik too -- see [[Collected_User_Requests#Include_transaction_custom_fields_in_web_services_student-step_export|Phil's comment under Web Services]]

==== Don't duplicate rows in student-step format when not showing KCs ====
* If the checkbox to show knowledge components is not checked, maybe it doesn't make sense to show rows more than once if they have more than one KC associated with the step. (Mimi (and Brett) stumbled on this. 8/16/2010)

==== Student-Step Rollup include Success Column ====
* In the student-step rollup, add a column called Success: 1 if correct, 0 if incorrect or hint, blank otherwise. ~~ Ken Koedinger, DataShop Team Meeting, Oct 22, 2010
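
A sketch of the proposed Success value, assuming it is derived from the student's first attempt on the step (the request does not say which attempt); "blank" is represented as None.

```python
def success(first_attempt):
    """1 for a correct first attempt, 0 for incorrect or hint, None (blank) otherwise."""
    if first_attempt == "correct":
        return 1
    if first_attempt in ("incorrect", "hint"):
        return 0
    return None
```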

=== Learning Curve ===

==== Default sort by observation on LCPID ====

* Default sort by observation on LCPID. ~~ Ken Koedinger, DataShop Team Meeting, 10/22/2010

==== Purple Point ====
* Purple Point: if a point on the LC has more than one KC associated with it but you have drilled down to a given KC, then the blue line is off. We could put a purple point that takes this into account.
* Simpler thing: display a warning message that some points in the display are driven by other KCs
* Pearson may be interested
* This was mentioned during the PSLC Summer School 2010.
* For a step with multiple skills, attribute the error only to the skill with the highest overall error rate. ~~ Alida, meeting with Ken, November 18, 2010
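
A sketch of the rule in the last bullet: for a step tagged with several KCs, count an error only against the KC with the highest overall error rate. The overall error rates would come from the whole sample; the tie-breaking rule here is an added assumption.

```python
def attribute_error(step_kcs, overall_error_rate):
    """Return the single KC that an error on this step should count against.

    step_kcs: KC names tagged on the step.
    overall_error_rate: {kc: error rate across all steps tagged with kc}.
    Ties go to the alphabetically first KC so the choice is deterministic.
    """
    if not step_kcs:
        raise ValueError("step has no KCs")
    return min(step_kcs, key=lambda kc: (-overall_error_rate[kc], kc))
```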

==== Reduce Scrolling ====
* Add a forward and back button to the graph to reduce scrolling. -- John LaPlante, email 7/10/2007

:"The learning curve page could use a forward and back button to cycle through the learning curves. Going through them one by one requires a lot of scrolling."

==== Turn On Point Labels ====

* It would be nice to have the option to turn on point labels. It is nice that I can mouse over a point and view the data but it would be nicer if it appeared automatically. -- John LaPlante, email 7/10/2007

==== Option for Bigger Graph ====

* Allow user to see bigger graph. -- Derek Chan, Winter Workshop 1/23/2008
** Potential solution: enable user to set x, y scale manually

==== LC Normalize Scale of Thumbnails ====
* [[LC Normalize Scale of Thumbnails]]

=== Performance Profiler ===

==== Rename Performance Profiler ====
* John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
** Did not use this report, thinking it had something to do with improving the performance of DataShop itself. This report might have been much better to use than the Error Report, though an export would still be needed, since working with the data in tabular form was necessary. Note that the pivot tables created were added to the dataset (Pittsburgh Science of Learning Center Stoichiometry Study 1).
* Lisa Anthony, email 8/2008
** Didn't know to go to the report. Export would be useful.
** Needed a better definition of Error Rate with respect to Problem and Unit rows.

==== Export ====
* John LaPlante (see comments in Rename Performance Profiler)
* Lisa Anthony (see comments in Rename Performance Profiler)
* Yes, I too can imagine wanting to export the results of a particular performance profiler output (i.e., to a table) so that I can graph it my own way. --[[User:Koedinger|Koedinger]] 16:24, 16 September 2009 (EDT)

==== Table View ====
* Add option to switch to a table view.
* Columns are: Problem Name, Steps, % incorrect, Incorrect Steps, % hint, Hint Steps, etc. include all values in pop-up.

==== Union of KCs/Problems/Students ====
* Allow users to get the union of KCs/Problems/Students etc. so they can compare across samples more easily. -- Kirsten Butcher, Winter Workshop 1/23/2008
:"It is difficult to compare performance profiler graphs across samples because the KCs (or problems, or whatever) aren't necessarily in both of those samples."

==== Show Details In Report ====
* Click on bar to see details in report and not just in pop-up. It disappears too quickly. -- Alida, Brett

==== Show More Information in the Graph ====
* Show more information in the graph: [Bruce M, User Mtg, 11/5/2007]
** Had drilled down by a certain skill - skill is not listed in the graph, user has to check the skill list on the LHS to see what skill was selected
* Maybe related to Bruce's "show more info in graph": show the actual value of the range variable. e.g., when range is error rate, show the error rate number somewhere (right now you can see, via mouse-over, the incorrect, hint, and correct percentages, but not the error rate) -- Brett
* Design idea to show # steps incorrect, # steps hint, # steps correct to clarify how the percentages are calculated. -- Alida, Brett
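A minimal sketch of how the pop-up percentages and the error rate could be derived from step counts. It assumes, as we understand DataShop's definition (check the help pages), that error rate is the share of steps whose first attempt was incorrect or a hint request; the class and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class StepCounts:
    """Number of steps by first-attempt outcome for one bar (hypothetical structure)."""
    incorrect: int
    hint: int
    correct: int

    def summary(self):
        """Return % incorrect, % hint, % correct, and error rate (% incorrect + % hint)."""
        total = self.incorrect + self.hint + self.correct
        if total == 0:
            return {"% incorrect": 0.0, "% hint": 0.0, "% correct": 0.0, "error rate": 0.0}

        def pct(n):
            return 100.0 * n / total

        return {
            "% incorrect": pct(self.incorrect),
            "% hint": pct(self.hint),
            "% correct": pct(self.correct),
            "error rate": pct(self.incorrect + self.hint),
        }

print(StepCounts(incorrect=3, hint=1, correct=16).summary())
```

Showing the raw counts next to these percentages would make it clear that all three share the same denominator.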

== Sample Selector ==

==== Sub-Samples ====

* Sub Samples would be helpful. -- John LaPlante, email 7/10/2007
:"A nice solution would be to have sub-samples where one property varies. When I'm doing this analysis, I've changed my sample many times, renaming it sometimes, tweaking it to get variations on the data. The samples are really useful but they could help me a lot more with this kind of experimentation."

==== Filter out students ====

As a researcher, I want to filter out test users (including myself) from my data so that I see less noise in the data.

* Ruth Wylie, July 3, 2008
* You can already filter out test users by using a test user id that starts with 'weirdCMUuser_xxx'. Then create a sample that excludes students with a name like 'Test_%'. --[[User:Alida|Alida]] 10:29, 4 September 2009 (EDT)

==== Filter by "Class" ====

As a researcher, I want to create samples based on "class" since class is how I've encoded my conditions.

* Maaike Waalkens, July 30, 2010
* This is what made sense using Mathtutor for tutor delivery.
* What other fields are we missing in the Sample Selector?

==== Filter by "Step" ====

As a researcher, I want to create samples based on "step" since I'm only interested in one particular step.

* Mimi McLaughlin, August 17, 2010

==== Create Sample Automatically ====

* Would it be possible for me to get a random sample from the 'Bridge to Algebra 2006-2007' dataset of 100 students? I am having trouble looking at the data because it takes too long to load, and my adviser thought that was because the dataset was too large. ~~ DataShop User, 10/19/2010
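A minimal sketch of drawing a reproducible random sample of students from an exported file, assuming each row has an `Anon Student Id` column. The function name and `seed` parameter are illustrative, not an existing DataShop feature.

```python
import random

def sample_students(rows, n=100, seed=None):
    """Keep only the rows that belong to n randomly chosen students.

    rows: list of dicts, each with an 'Anon Student Id' key (assumed column name).
    Using the same seed returns the same sample, so the sample can be recreated.
    """
    students = sorted({row["Anon Student Id"] for row in rows})
    chosen = set(random.Random(seed).sample(students, min(n, len(students))))
    return [row for row in rows if row["Anon Student Id"] in chosen]

# Example with made-up data: 250 students, two rows each.
rows = [{"Anon Student Id": f"Stu_{i:03d}", "Step Name": s} for i in range(250) for s in ("a", "b")]
subset = sample_students(rows, n=100, seed=42)
print(len(subset), len({r["Anon Student Id"] for r in subset}))
```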

== Web Services ==

==== Include transaction custom fields in web services student-step export ====

* "I noticed in the web services guide that cfs (which provides the custom fields) is not yet implemented for step roll-up tables. It is marked [coming soon] in the manual... I could write a workaround to pull in the transactions and lookup the custom fields, but I'd really rather not." -- Phil Pavlik, 4/27/2011

* "It would be best if the CF for step rollup was drawn from the primary transaction for the first attempt that is correct or incorrect. At least that seems like the generally best value. Basically, there may be many cases where the custom field is the same across all transactions for a step. In this case ... you could just use the first one since they are all the same." -- Phil Pavlik, 4/29/2011
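A minimal sketch of the rule Phil suggests for choosing a step-level custom-field value. The outcome labels and dictionary layout are assumptions, and the fallback for steps with no correct or incorrect attempt (for example, only hints) is our addition.

```python
def step_custom_field(transactions, field):
    """Choose one custom-field value for a student-step.

    transactions: the step's transactions in time order, as dicts with an 'outcome'
    ('CORRECT', 'INCORRECT', 'HINT', ...) and custom fields as extra keys (assumed layout).
    Uses the first attempt that is CORRECT or INCORRECT; if there is none,
    falls back to the first transaction that carries the field.
    """
    for t in transactions:
        if t.get("outcome") in ("CORRECT", "INCORRECT") and field in t:
            return t[field]
    for t in transactions:
        if field in t:
            return t[field]
    return None

step = [
    {"outcome": "HINT", "problem_set_id": "PS-1"},
    {"outcome": "INCORRECT", "problem_set_id": "PS-2"},
    {"outcome": "CORRECT", "problem_set_id": "PS-2"},
]
print(step_custom_field(step, "problem_set_id"))
```

When the value is the same on every transaction of a step, as Phil notes is common, any of the branches returns the same answer.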

==== Use Custom Fields in Graphs/Reports ====
In Graphs

As a researcher creating custom fields and assigning values at the transaction level, I want DataShop to perform the aggregation to the step level so that I can do other things with my custom-field variable such as graph it. -- Ryan Baker, mtg w/Alida & Brett, 12/15/2008

Performance Profiler
* Use all discrete variables/custom fields on the left and all the continuous variables on the bottom of the Performance Profiler. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Ad-hoc queries ====

* Allow restricted filtering on steps and transactions as the next web service feature (after CFs), whatever they can filter on in the navigation boxes (User Meeting AAR, December 9, 2009)

==== Sample creation as a web service ====

* Sample creation is still too slow. (User Meeting AAR, December 9, 2009)

 
----
See completed [[DataShop 3.x Features]] 
See on-going [[DataShop 4.x Features]] 
See prioritized [[DataShop Feature Wish List]]
[[Category:Protected]]
[[Category:DataShop]]

Collected User Requests

2012-01-18T23:01:34Z

Bleber: /* Export */

See prioritized items on [[DataShop Feature Wish List]].

== Annotations ==
* Some links from Ruogo Kang's (CMU PhD student, Sara Kiesler) recent talk. -- Ken, email, 8/24/2011
** http://vis.stanford.edu/papers/senseus
** http://vis.berkeley.edu/papers/commentspace/

==== Annotations on Transaction Level ====
* I have models which can annotate things like: gaming, bored, etc. on the transaction level. -- Ryan Baker, ET Mtg 12/5/2007

==== Annotations on Student Level ====
* Would annotate on student level. -- Ido Roll, User Meeting, 1/19/2009

==== Annotations on Pages ====
* See the cool thing created by Jeffery Heer where all the settings of the page were recorded with the comment. -- Ryan Baker, DS Team Mtg 5/23/2008

==== Dataset Discussion - Capture data-integrity issues ====
* As a stakeholder in the DataShop project, I want to capture and publicize the data-integrity issues discovered with data sets so that data is better documented (and so we've fulfilled a promise to our funders to better document data). -- Ken Koedinger, Team Meeting, 8/15/2009
* As a user of DataShop, I want to discuss datasets and have that discussion attached to the dataset so that others can comment and better understand any data-integrity issues I've found.

==== Linking to internal pages ====

* It would be handy if a link can be saved to any dataset, sample, page in the DataShop. -- [[Alida]], 10/18/2007
** Currently, https://pslc-qa.andrew.cmu.edu/datashop/DatasetInfo?datasetId=793 works if you are already logged in.

==== Have a link from the DataShop to the Theory Wiki (Dataset to Project Page) ====

* Can we link from the dataset to the project page on the Theory Wiki? In the pipeline have a clickable link to the project page (make project name clickable). -- Michael Bett, ET Mtg 11/14/2007
** Link to a dataset directly? Is that obvious to users? Click on dataset link -> log in -> redirected back to dataset. -- Brett Leber

== Data Format ==
==== Capture the question prompt and answer choices the student chose from ====
* As a researcher, I want to be able to identify steps based on the question prompt (not the difficult-to-understand step names that come from the selection and action of my tutor) so that I can analyze the data more easily. For multiple-choice questions, I also want to be able to see all of the choices that were available to the student. -- Eli Silk - February 1, 2011 (meeting with Brett and Ross Higashi of the FIRE project)
** Near-term solution is to create a table locally that maps steps to prompts
** Another solution is to ask CTAT team to modify their Flash components so that they log this information as custom fields
** Long-term solution is to make these fields into standard fields in the tutor message format
** In the email thread "Cognitive Tutor Development and Evaluation Requests", there is support for this general idea from Ken Koedinger, Albert Corbett, and Christian Schunn.
** Ken added that "Ideally, we may want to store any images that the student can see and where they reside on the screen ..."

== Data Modeling ==
=== Non-KC Modeling ===

==== Automatic Distillation ====
* As an educational data miner wishing to develop a machine learned model with PSLC data, I would like to be able to automatically distill data features (e.g. custom fields) commonly used in past educational data mining research for a new data set (see, for instance, Baker, Corbett, Roll, & Koedinger, 2008 in UMUAI) -- Ryan Baker, Summer 2008, Startup Memo
** Could be implemented as a plug-in
* Also interested in this feature idea. -- Dan Franklin, Oct 2008

==== Upload model and apply it to new data set ====

* As an EDM researcher, I would like to take a model, expressible as a linear formula on DataShop fields, or a simple code procedure (e.g. Bayesian Knowledge Tracing, which Ryan has code for), and apply it to a new data set, so that I can ... -- Ryan Baker, Sept 2008
* Also interested in this. -- Maxine Eskenazi, Sept 2008
* May work best as a plug-in
** Code to display GUI to choose which data sets to use, calls model code, re-import to DataShop
** Good to have a way to apply many models, as soon as you import a data set
* Phil has an idea that maybe fits within this one. Please move if there's a better category. -- Brett Leber
<blockquote>This [''transaction? kc? --ed.''] relabeling is really mostly about enabling modeling in DataShop, right? With this in mind, I think that it is actually a higher priority to have model alternatives in DataShop.... E.g. Investigators should be able to give you chunks of Java code according to a certain specification, and DataShop should then be able to run these over datasets (perhaps after a certain series of QA occurs according to an SOP) when the investigator clicks some button in DataShop.... Obviously this is a much larger project than adding columns, but it is also much more important in my mind. ''-- Phil Pavlik, email to Brett on 1/14/2009''</blockquote>
* Examples:
** Example: running gaming detector in multiple tutors and comparing gaming frequencies
** Example: applying Bayesian Knowledge Tracing to a new data set from the same LearnLab
** Example: applying Ben Shih's models to many data sets. Note that Ben Shih should be included in design of this feature; he is interested, and has a lot of good ideas.

==== Add Different Predicted Values ====
* Would also like to add statistics, different predicted values than what LFA produces. -- Ken Koedinger, ET Meeting, 10/10/2007

==== Bayesian Knowledge Tracing ====
* Bayesian Knowledge-Tracing built into DataShop like LFA is. -- Ryan Baker, Startup Memo, Summer 2008
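A minimal sketch of the standard Bayesian Knowledge Tracing update (Corbett and Anderson's four-parameter form) for one student on one KC. Parameter fitting is not shown, the parameter values below are made up, and this is not Ryan's code.

```python
from dataclasses import dataclass

@dataclass
class BKTParams:
    p_init: float     # P(L0): KC already known before the first opportunity
    p_transit: float  # P(T): KC learned at an opportunity
    p_guess: float    # P(G): correct first attempt although the KC is unknown
    p_slip: float     # P(S): incorrect first attempt although the KC is known

def knowledge_trace(outcomes, params):
    """Predict P(correct) at each opportunity for one student on one KC.

    outcomes: first-attempt results in opportunity order (1 = correct, 0 = not).
    Returns the prediction made before each outcome is observed.
    """
    p_known = params.p_init
    predictions = []
    for correct in outcomes:
        predictions.append(p_known * (1 - params.p_slip) + (1 - p_known) * params.p_guess)
        # Bayesian update of P(known) given the observed outcome.
        if correct:
            evidence = p_known * (1 - params.p_slip)
            posterior = evidence / (evidence + (1 - p_known) * params.p_guess)
        else:
            evidence = p_known * params.p_slip
            posterior = evidence / (evidence + (1 - p_known) * (1 - params.p_guess))
        # Chance to learn the KC at this opportunity.
        p_known = posterior + (1 - posterior) * params.p_transit
    return predictions

params = BKTParams(p_init=0.3, p_transit=0.1, p_guess=0.2, p_slip=0.1)  # made-up values
print(knowledge_trace([1, 1, 0, 1], params))
```

The predicted values play the same role as the predicted error rates LFA produces, so both could be compared on the same learning curve.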

==== Richer statistics for KC modeling ====

* In addition to the model stats and estimates generated for learning factor models, we should also create difficulty factor models (i.e., ones without a "Slope" parameter). The latter is particularly relevant for the Unique-Step model, where the slope parameter is meaningless (but still counts against the BIC value). -- Ken Koedinger, Email "new feature request", 1/22/2009
** Relatedly, we should report significance values on the Slope parameters -- that is, when is the Slope significantly different from 0.
** The KC models page perhaps should also report the log-likelihood and number of parameters (in addition to BIC) and leave out AIC. We might also consider other metrics of model generality, like the "adjusted R2" (if I have this name right -- Joe Beck mentioned in the Assistments meeting yesterday).
** These changes will be part of meeting the CMDM goal of improvement in (or at least demonstrate acceptability of) the cognitive models in 90+ units in our LearnLab courses (or affiliates).
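A minimal sketch of reporting log-likelihood and parameter count alongside AIC and BIC, assuming binary first-attempt outcomes and taking n as the number of observations. Every slope counts toward the parameter count, which is why meaningless slopes inflate BIC. The function names are hypothetical, and the adjusted R² is not shown.

```python
import math

def bernoulli_log_likelihood(predicted, observed):
    """Log-likelihood of first-attempt outcomes (1 = correct) under predicted P(correct)."""
    eps = 1e-12  # keeps log() finite if a prediction is exactly 0 or 1
    total = 0.0
    for p, y in zip(predicted, observed):
        p = min(max(p, eps), 1 - eps)
        total += math.log(p) if y else math.log(1 - p)
    return total

def fit_statistics(predicted, observed, n_params):
    """Report log-likelihood and number of parameters together with AIC and BIC."""
    ll = bernoulli_log_likelihood(predicted, observed)
    n = len(observed)
    return {
        "log_likelihood": ll,
        "n_params": n_params,
        "AIC": 2 * n_params - 2 * ll,
        "BIC": n_params * math.log(n) - 2 * ll,
    }

# Made-up predictions for four observations from a model with 3 parameters.
print(fit_statistics([0.9, 0.7, 0.4, 0.8], [1, 1, 0, 1], n_params=3))
```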

=== KC Modeling ===

==== LFA/AFM: Check if enough memory using formula ====
* The LFA/AFM code could calculate how much RAM would be needed to run the algorithm on a given skill model using the formula provided by Hao. This formula is based on the number of transactions, number of students and number of skills. Right now it will not schedule itself to run on a model with over 300 skills, though there is a manual override. [[User:Alida|Alida]] 13:35, 29 November 2010 (EST)

==== Create KC Models through Web Services ====
* For John Stamper's CMDM project, it would be nice to automatically update KC Models through web services. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Automatically discovering new KC model ====

* Would it be possible to run some code (perhaps Hao's KC model selection code, perhaps something else generated by the CMDM thrust) to find a new best KC model? -- Vincent Aleven, Sept 2008
* As a learning sciences researcher, I would like DataShop to discover a new/better KC model for me.
* Could be done as a plug-in

==== Generate new KC Models with LFA ====
* Not sure who asked for this.
* It would be nice to generate new KC Models with Hao's LFA code
* Would need to specify factors.
* Ideas on where this could run?
** On a separate server? Request it to be run, specify duration. Have separate server queue up requests, email user when done.
** In Java Applet on client machine? -- Phil Pavlik

==== Same Skill Twice on Same Step ====
* Would like to be able to apply the same skill to a step twice during a KC Model Import. -- Ken Koedinger, email, 2/4/2009

==== Save KC Model Import Files ====
* KC Model Import - save the file used to create the KCMs in case we need to recreate them. -- Ken Koedinger, email 3/4/2009

==== Log Likelihood and MAD ====
* Log Likelihood, MAD (mean absolute deviation) problem, MAD step (store and show) -- Hao Cen
** This is a variation on "Richer statistics for KC modeling" above. Probably should be merged. - Ken Koedinger
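"MAD problem" and "MAD step" are not defined in the request. A minimal sketch of one plausible reading follows: the mean absolute difference between observed and predicted error rates, computed per step or per problem. The field names are assumptions. Because step names are often scoped within problems, a real step-level key would probably need to include the problem as well.

```python
from collections import defaultdict

def mean_absolute_deviation(records, key):
    """MAD between observed and predicted error rates, aggregated at the grain given by key.

    records: dicts with 'error' (1 if the first attempt was an error, else 0),
    'predicted_error' (the model's predicted error probability), and grouping
    fields such as 'step' or 'problem'. records must not be empty.
    """
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    deviations = []
    for rows in groups.values():
        observed = sum(r["error"] for r in rows) / len(rows)
        predicted = sum(r["predicted_error"] for r in rows) / len(rows)
        deviations.append(abs(observed - predicted))
    return sum(deviations) / len(deviations)

records = [
    {"problem": "P1", "step": "s1", "error": 1, "predicted_error": 0.5},
    {"problem": "P1", "step": "s2", "error": 0, "predicted_error": 0.5},
    {"problem": "P2", "step": "s3", "error": 1, "predicted_error": 0.25},
]
print(mean_absolute_deviation(records, "problem"), mean_absolute_deviation(records, "step"))
```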

==== Better naming for KCs in auto-generated Unique-Step KCM ====

As a researcher, I want the KCs in the Unique-Step model to have better names than KC1, KC2, etc, so that I can easily tell which generated KCs go with which unique step.

* Hui Cheng (Email 1/20/2009), Ken Koedinger (Email 1/22/2009)
* Could you put this in your new feature request list: could the “Unique-step model” be better labeled than just “KC1”, “KC2”, etc.? For example, for Assistments data, you could use part of the “Step Name”. -- Hui Cheng, Email 1/20/2009
* But, anything is better than "KC". -- Ken Koedinger, email, 1/22/2009
<blockquote>A simple alternative, that preserves uniqueness and addresses length, is to concatenate: 1) the first K letters of the step name 2) a unique numerical increment (same as the "3" in "KC3"). Note that (2) guarantees uniqueness just as it does in the current "KC<num>" scheme. Or perhaps better given that step names are often scoped within problems, is to concatenate: 1) the first L letters of the problem name 2) the first M letters of the step name 3) a unique numerical increment (just like the "3" in "KC3") I think K or L+M should be as big as possible without making the KC names indistinguishable (because they run off the right margin) in the KC list on the Learning Curve and other pages. </blockquote>
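A minimal sketch of the second naming scheme above: the first L characters of the problem name, the first M characters of the step name, and the existing numeric increment. The default lengths and the `_` separator are placeholders; the right lengths would depend on how much of the KC list's width is available.

```python
def unique_step_kc_names(steps, problem_chars=8, step_chars=12):
    """Return readable, unique KC names for an auto-generated Unique-Step model.

    steps: (problem_name, step_name) pairs in the order the KCs are currently numbered.
    problem_chars and step_chars play the roles of L and M (placeholder values).
    The trailing number is the same increment as in "KC3", so names stay unique.
    """
    return [
        f"{problem[:problem_chars]}_{step[:step_chars]}_{i}"
        for i, (problem, step) in enumerate(steps, start=1)
    ]

names = unique_step_kc_names([("LinearEq1", "x-intercept"), ("LinearEq1", "y-intercept")], 6, 3)
print(names)
```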

==== Visualize Learning Curve Split ====
* Be able to visualize a learning curve split into 2 based on a specification of a subset of problems. -- Albert Corbett, Math CCM, November 2008
* Harder: make me a new KCM out of it. -- Ken Koedinger, Team Mtg, Dec 5, 2008, while trying to describe Albert's request to Alida

==== Statistical Significance ====
* Can DataShop determine if the difference between conditions or learning curves is statistically significant? -- general theme at workshop, probably mentioned by Bob Hausmann in his talk, Winter Workshop 1/23/2008
** Can't do it yet in DataShop, but I can show you how to do it in R (or SPSS...) after you've exported the data -- export the "student-step rollup" rather than the whole transaction table. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** This comment perhaps belongs (exists?) elsewhere: The current student-step rollup export (really, all exports) should be such that I can immediately load it into R (and other packages) without error. Currently errors occur, for instance, because there are "#" characters in the variable names in the student-step rollup. This is a roadblock for helping folks like Bob do the analyses they want to do. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** See [[Condition in Student-Step Rollup]] -- [[User:Alida|Alida]] 10:35, 17 September 2009 (EDT)
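R's `read.table` treats `#` as a comment character by default (`comment.char = "#"`), so a header containing `#` gets cut off. Below is a minimal sketch of rewriting export headers into R-friendly names. It is a rough approximation of R's `make.names(unique = TRUE)`, not an exact copy, and the sample headers are illustrative.

```python
import re

def r_friendly_names(headers):
    """Rewrite export column headers so R can read them without errors.

    Runs of characters other than letters, digits, '.', or '_' become '_',
    a leading digit gets an 'X' prefix, and repeated names get '_1', '_2', ...
    """
    used = set()
    result = []
    for header in headers:
        name = re.sub(r"[^A-Za-z0-9._]+", "_", header).strip("_") or "X"
        if name[0].isdigit():
            name = "X" + name
        base, k = name, 1
        while name in used:
            name = f"{base}_{k}"
            k += 1
        used.add(name)
        result.append(name)
    return result

print(r_friendly_names(["Anon Student Id", "# Hints", "Opportunity (Default)", "1st Attempt", "Hints"]))
```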

==== Split and Merge Skills ====

* Hand searches through a p-matrix for a dataset to split and merge skills. (Pie in the Sky) -- Ken Koedinger, Team Mtg, 02/22/2008

==== Notes on new KCMs ====

* It would be good if I could add a note to a KC model that was newly imported. -- Noboru Matsuda, email, Nov 19, 2009

==== Display Learning Curves Grouped by Interestingness ====

* The page displaying all the learning curves today seems to be alphabetically sorted by KC name, which is not necessarily meaningful.
* An alternative presentation is to group the curves into 4 sets, breaking up the page. Set 1 has curves that contain significant spikes, and therefore seem to be "low-hanging fruit" for purposes of breaking up into KCs. Set 2 has curves with few spikes, but they have a long X axis, suggesting that students are presented with too many opportunities to acquire those KCs. Set 3 has the "good" curves, i.e., nicely decreasing curves that are not too long. Set 4 is "other". -- [http://www.pitt.edu/~goldin Ilya Goldin] 7 December 2009
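A minimal sketch of the four-set grouping, assuming each curve is a list of error rates by opportunity. The spike size, the "long" length, and the simple "last point below first point" test for a good curve are all placeholder heuristics that would need tuning on real data.

```python
def count_spikes(error_rates, jump=0.2):
    """Count opportunities where the error rate rises by more than `jump` from the previous one."""
    return sum(1 for prev, cur in zip(error_rates, error_rates[1:]) if cur - prev > jump)

def classify_curve(error_rates, jump=0.2, long_length=20):
    """Place one KC's learning curve in one of the four proposed sets (placeholder thresholds)."""
    if count_spikes(error_rates, jump) > 0:
        return 1  # significant spikes: candidates for splitting the KC
    if len(error_rates) > long_length:
        return 2  # few spikes but a long X axis: possibly too many opportunities
    if len(error_rates) >= 2 and error_rates[-1] < error_rates[0]:
        return 3  # decreasing and not too long: a "good" curve
    return 4      # other

def group_curves(curves, **thresholds):
    """Map set number -> sorted KC names, given {kc_name: error_rates}."""
    groups = {1: [], 2: [], 3: [], 4: []}
    for kc, rates in sorted(curves.items()):
        groups[classify_curve(rates, **thresholds)].append(kc)
    return groups

curves = {
    "kc-spiky": [0.5, 0.3, 0.8, 0.2],
    "kc-long": [0.5 - 0.01 * i for i in range(25)],
    "kc-good": [0.6, 0.4, 0.3, 0.2],
    "kc-flat": [0.2, 0.3, 0.35],
}
print(group_curves(curves))
```

Within each set, the curves could still be listed alphabetically, which preserves the current ordering as a secondary sort.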

==== Order KC models according to AIC====
* This is based on results from multiple dataset analyses that compared AIC, BIC and loglikelihood to cross validation RMSE. -- Mimi McLaughlin, 2/9/2011

====Use log of opportunity count for AIC and BIC calculations====
* We compared using the log of opportunity count to whole number opportunity count in multiple datasets. We found the results for log of opportunity count to be consistently better, though small. -- Mimi McLaughlin, 2/9/2011

== Developer Requests ==

==== Store Converted Date and Converter Info ====

As a DataShop administrator, I'd like to see the converter information (version and date) stored in the database, so that I do not have to store that data manually in the Additional Notes field each time I load a dataset. -- Kyle, 8/5/2008

* DTD new fields:
** Store conversion and converter information in database (anything else?)
*** conversion_time
*** converter_info
* Email from Octav, 10/5/2007

I see there's also a note about the converter version in the Dataset
Info. Which is good, but it seems it's taken from the directory name
when I submitted the set. I don't know how reliable that is.  :-) It
would be better if it's taken from the new converter info field.

==== Convert from XML to tab-delimited format ====

* If users agree that the export format is valuable, it might help to let them convert from XML to the export format so they can view the data in Excel, where blanks in the Selection column are easier to spot. -- Jonathan Sewall, ET Mtg, 10/10/2007

==== Plug-ins (general issues) ====

* Please please support Ruby on Rails. -- Ben Shih, December 2008

==== Create UI to grant DataShop user roles ====

* Already tedious.
* Alida, User Meeting AAR on December 9, 2009

== Help ==

==== Specialize Label of Help Button ====
* Since this help is better than in most applications, it should say more than just 'Help'. -- Ken Koedinger & Ryan Baker, Team Mtg, May 30, 2008
** Ideas:
*** 'Page Help'
*** 'Help with this page'
*** 'Help with Learning Curve page' (Ken's favorite)
*** 'Help with this tool'

== Home Page ==

==== Redesign the Home Page ====

* In the menu of data sets at the top, include the N (=20 or as many as fit on the screen?) data sets that I have visited starting with the ones I've visited most recently. --[[User:Koedinger|Koedinger]] 16:16, 16 September 2009 (EDT)
** As of today, the menu shows the last 10 data sets, most recently visited at the top. I think the feature is good enough, but let's ask Ken --[[User:Bleber|Bleber]] 10:56, 6 August 2010 (EDT)
* There needs to be a better ordering for the datasets (DS364)
* Maybe a search to filter the list of datasets since the list is so long. -- Brett Leber, 6/14/2007
* Going back to the home page always goes to 'My Datasets' (DS313)
* Maybe show more high level stats on this page, like how many transactions, students, skill models
* Allow users to post and share project documentation (files, papers, other meta-info) -- Ruth Wylie, suggested during meeting on 8/4/2010. She had a file that is relevant to multiple datasets in her project.

== Import ==
== Miscellaneous ==

==== Analyses by LearnLab ====

* Organize data by LearnLab, not by "data set". -- Ryan Baker, Aug 2008
* Also: Bob Hausmann, Sep 2008; Maxine Eskenazi, Sep 2008
* Essentially, current data sets become samples, but the top-level unit is the LearnLab. You can take every data set in a LearnLab together as a sample.
* Implies being able to run analyses across data sets, and export multiple data sets together; to create multi-data set samples
* As a user of DataShop, I would like to look at learning curves for all Algebra data together (for example), or export all Algebra data
* Important long-term, but is a lot of work -- in particular, we need to solve scalability issues first.

==== Save Settings Between Sessions ====
* It would be useful if DataShop could save settings between sessions. -- Bob Hausmann, User Meeting, 2/1/2008
** "I do a lot of redoing the same steps" (e.g., set cutoffs, select a KC model, select students).

==== Multiple steps per transaction ====

* Needed so that we do not have to create multiple transactions for the same actual action for Andes logs. -- Kurt VanLehn, Feb 2007

==== Demographic data ====

* This has been mentioned by NSF visitors, AB, ESL, and some researchers.
* Also mentioned at Winter Workshop 1/23/2008.
** Derek/Sue-mei: Student background information not in DataShop. Would like to see a student or set of students from a particular demographic, and view them across datasets!
* Note that Gail added demographic data to Additional Notes field on the Dataset Info page for many datasets. The idea here is to put that data into the database somewhere.

==== Single Sign On ====
* Michael Bett, email, 10/8/2007
* It would be nice if the following services have a single login account/password:
*# Theory Wiki
*# Learnlab.org
*# ESL's OSS
*# DataShop

==== Reveal unanonymized student IDs ====

As a researcher/PI performing research assistant tasks, I want to see easily the unanonymized student IDs of subjects in DataShop so that I can email my subjects telling them when to use my system.

* Ruth Wylie, July 3, 2008
* As the honest broker of the PSLC data, I have promised to not reveal the student IDs and to protect the identity of the students. This is part of the DataShop IRB. Therefore, I do not see this request as possible. [[User:Alida|Alida]] 09:53, 4 September 2009 (EDT)
** At some point, some coordination would be good with OLI's Digital Dashboard project that Marsha Lovett (and the OLI team and sometimes me) is working on. This is meant to provide usage information more quickly to instructors. It could also perhaps be used by researchers (with the right IRB rights) in situations like Ruth's.

==== Knight Timeline ====
* Developed by Andrea Knight, 2004

==== Buggy Skills ====
* Ken Koedinger, prototype walkthrough 9/11/2006

==== Confusion Matrix ====
* Brian MacWhinney, prototype walkthrough 9/11/2006

== Navigation Bar ==
==== Filter KCs by Name ====

As a researcher working with KCs, I want to filter KCs based on their names, so that I can...
* Vincent Aleven, Email, 2/3/2008
* "Since Alida said you cannot have two mechanisms for putting together your KC set (i.e., cannot have both the selecting-by-clicking and selecting-by-filtering), I would probably opt for the latter."
* Alida: I thought Vincent mentioned that he'd like to select which KCs are in a set by filtering on the name. Example: Include KCs with '*reason*' in the name and exclude KCs with '*given*' in the name.
* This could be an addition to our v3.0 KC-selection mechanism--filter by name.
* Vincent, Email, 5/6/2009: Expressed another need for this feature. Could just allow for a wider area and longer list so that more items can be checked at once. The number of characters we show right now is not enough because in many cases that number of characters is the same across many of the skills. Reference data set: Geometry CWCTC 2005-2006

* Status: Design Started
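A minimal sketch of name-based KC selection with include and exclude patterns, using Python's standard `fnmatch` wildcards and case-insensitive matching. The function name and KC names are made up; the example reproduces Alida's include `*reason*`, exclude `*given*` case.

```python
from fnmatch import fnmatchcase

def filter_kcs(kc_names, include=("*",), exclude=()):
    """Keep KCs whose names match any include pattern and no exclude pattern.

    Patterns use shell-style wildcards ('*', '?'); matching ignores case.
    """
    def matches_any(name, patterns):
        return any(fnmatchcase(name.lower(), pattern.lower()) for pattern in patterns)

    return [kc for kc in kc_names
            if matches_any(kc, include) and not matches_any(kc, exclude)]

kcs = ["reason-angle-sum", "given-reason-parallel", "calc-area"]
print(filter_kcs(kcs, include=["*reason*"], exclude=["*given*"]))
```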

==== Facebook-style KC Selection ====

As a researcher working with KCs, I want to select KCs based on the learning curve thumbnail, so that I can see quickly which ones I'm interested in exploring more deeply.

* Feature already designed for v3.0, not implemented due to time constraints.
* Agreed this would be really useful. -- Kirsten Butcher, User Mtg, 1/31/2008
* Status: Guesstimate: 20 days, need to revisit requirements document

==== Feedback after clicking a large sample on a large dataset ====

As a user of DataShop (first-time or not), I want some feedback and the ability to cancel after I do something that might take a long time (e.g., clicking "All Data" on a large dataset) so that I do not get stuck.

* Part of the Susan Goldman story
* After clicking a sample for a large dataset, there is no "Loading..." text, no feedback that the click was even registered by the app (besides the sometimes busy cursor and small browser "loading" text), nor the ability to cancel the action. 
* We will always have similar problems even if performance is improved, so providing feedback and the opportunity to cancel is critical.

==== Save Button in Problem Navigation Box ====

* Save buttons in the sidebar. -- Ken Koedinger, Mtg 2006
** Could also put one in the Problem selection box in the sidebar.

==== Make Nav Bar Wider ====
* Make the Sample name and description fields much wider. -- John LaPlante, email 7/10/2007

== New Visualizations/Reports ==

==== Student-KC Rollup ====
As a researcher, I want to see KCs rolled up by student, so that I can ...

* Vincent Aleven, User Mtg, 1/29/2008
** By Student-KC would be more useful than by Student-Problem
** Example: # Steps asking for a hint or error or what proportion had help
** How often bottom out hint occurs
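A minimal sketch of a Student-KC rollup built from student-step rows. The field names and the `bottom_out` flag are hypothetical. It assumes a step tagged with several KCs appears once per KC.

```python
from collections import defaultdict

def student_kc_rollup(step_rows):
    """Aggregate student-step rows to one summary per (student, KC).

    step_rows: dicts with 'student', 'kc', 'first_attempt' ('correct', 'incorrect',
    or 'hint'), and 'bottom_out' (True if the student reached the bottom-out hint).
    """
    totals = defaultdict(lambda: {"steps": 0, "hint_or_error": 0, "bottom_out": 0})
    for row in step_rows:
        t = totals[(row["student"], row["kc"])]
        t["steps"] += 1
        t["hint_or_error"] += int(row["first_attempt"] in ("incorrect", "hint"))
        t["bottom_out"] += int(bool(row["bottom_out"]))
    return {key: {**t, "hint_or_error_rate": t["hint_or_error"] / t["steps"]}
            for key, t in totals.items()}

rows = [
    {"student": "s1", "kc": "find-slope", "first_attempt": "hint", "bottom_out": True},
    {"student": "s1", "kc": "find-slope", "first_attempt": "correct", "bottom_out": False},
    {"student": "s1", "kc": "find-slope", "first_attempt": "incorrect", "bottom_out": False},
]
print(student_kc_rollup(rows))
```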

==== Instructor Reports ====
* Phil said he received a lot of positive reactions to providing reports on units for instructors. -- Phil Pavlik, ET Mtg 10/10/2007

==== Manage Authorizations/Projects Page ====
* Lisa Anthony, email 10/23/2007
* Allow PI to change permissions on the datasets.
:"Actually, I couldn't see how to change permissions on the datasets from the website. Is this possible? If not, it might be a nice feature..."

==== Calculate Time Spent on Different Study Activities ====

As a researcher, I want to know how much time, on average, students spend on study activities, so that I can ...

* Bruce McLaren, Email, 4/7/2009
<blockquote>
For my most recent stoich study, Shawn and I are interested in calculating timing information such as: (a) how long students spent, on average, working on individual tutors (b) how long students spent, on average, on all items in an intervention (c) how long students worked, on average, on post-tests. 

Timing information is very commonly required for studies, and can be calculated from DataShop logs relatively easily, so even if we don't have it, might be worth considering. (And we don't want to re-invent the wheel, if you already have it or are planning it...)
</blockquote>
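A minimal sketch of estimating average time per tutor problem from transaction timestamps. Each student's time on a problem is taken as the last minus the first transaction time, so reading time before the first logged action is missed, and repeated visits to a problem are merged. The problem start time in the transaction export would be a better starting point. Grouping by a unit or level field instead of `problem` would give per-intervention or post-test totals. The field names are assumptions.

```python
from collections import defaultdict
from datetime import datetime

def average_seconds_per_problem(transactions):
    """Estimate the average time students spent on each problem.

    transactions: dicts with 'student', 'problem', and 'time' (datetime).
    """
    spans = {}  # (student, problem) -> (earliest, latest)
    for t in transactions:
        key = (t["student"], t["problem"])
        first, last = spans.get(key, (t["time"], t["time"]))
        spans[key] = (min(first, t["time"]), max(last, t["time"]))
    per_problem = defaultdict(list)
    for (_, problem), (first, last) in spans.items():
        per_problem[problem].append((last - first).total_seconds())
    return {problem: sum(secs) / len(secs) for problem, secs in per_problem.items()}

log = [
    {"student": "s1", "problem": "P1", "time": datetime(2009, 4, 7, 10, 0, 0)},
    {"student": "s1", "problem": "P1", "time": datetime(2009, 4, 7, 10, 1, 0)},
    {"student": "s2", "problem": "P1", "time": datetime(2009, 4, 7, 11, 0, 0)},
    {"student": "s2", "problem": "P1", "time": datetime(2009, 4, 7, 11, 2, 0)},
]
print(average_seconds_per_problem(log))
```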

==== Incorrect Step Duration and Hint Step Duration ====

As a researcher, I want to be able to see total step duration if the student's first attempt was an incorrect attempt, and total step duration if the student's first attempt was a hint request, so that I can do some analyses that I can't do with "Error Step Duration".

* Bob Hausmann, email, 11/11/2008.
* Updated title and story with 'step duration' instead of 'time'. -- [[User:Alida|Alida]] 10:36, 4 September 2009 (EDT)
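A minimal sketch of the two proposed columns, computed per student-step row. The input headers `Step Duration (sec)` and `First Attempt` are assumed to match the student-step rollup. The two new column names are hypothetical.

```python
def add_split_durations(step_row):
    """Return a copy of a student-step row with two proposed duration columns added.

    'Incorrect Step Duration (sec)' is filled only when the first attempt was incorrect,
    and 'Hint Step Duration (sec)' only when it was a hint request; otherwise None (blank).
    """
    row = dict(step_row)
    duration = row.get("Step Duration (sec)")
    first_attempt = row.get("First Attempt")
    row["Incorrect Step Duration (sec)"] = duration if first_attempt == "incorrect" else None
    row["Hint Step Duration (sec)"] = duration if first_attempt == "hint" else None
    return row

print(add_split_durations({"Step Name": "x1", "First Attempt": "hint", "Step Duration (sec)": 42.0}))
```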

==== Grading ====

[[Grading]]

====Display number of steps and number of observations for skills====
* How and where to be determined by developers. -- Ken (entered by Mimi)

== Reports ==
=== Dataset Info ===

==== Pointers to Hard-copy Data ====
* Brett van de Sande, NSF Site Visit, 5/28/2008
* Pointers to hard-copy data such as paper tests and/or homework.  Include contact information.  It doesn't seem to make sense to scan a whole filing cabinet of paper if no one wants to look at it.  And any secondary researchers don't know about the filing cabinet to ask for it.

==== Sort Problem Breakdown Table ====

* Would like the ability to sort the table by clicking on the column headings of the Problem Breakdown Table on the Dataset Info Tab. -- Bruce McLaren, User Mtg, 11/5/2007

==== Rename dataset ====

As a researcher, I want to rename my dataset so that it makes more sense to other people. I also want to make sure the dataset doesn't become polluted later by new data not associated with my study.
* Ruth Wylie, July 3, 2008
* There are reasons she would want to do this (current name is worthless, other researchers might try her tutor and pollute her data) but also reasons for not doing it (log more data later).
* There are risks in changing a dataset name that might not be apparent. For example, if you want the new data in the same dataset. [[User:Alida|Alida]] 10:27, 4 September 2009 (EDT)

==== Average time per problem, average number of problems, total number of sections ====

In addition to showing student hours per dataset, it would be useful to know the average time spent per problem, average number of problems completed, and the total number of sections.
* Noboru Matsuda, June 18, 2010

=== Error Report ===

==== View By Student ====

* Would like to see what a couple of students saw in the feedback. -- Marsha Lovett, 10/11/2007

==== Export ====
* I would like the ability to export this data. -- John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
* Also interested in this feature idea. -- Bruce McLaren, User Mtg, 11/5/2007

==== Sort ====

* Was planned for but not implemented in v2.1 (estimated to be a 4 day task)
** By Correctness %, starting with the least correct
** By Hints %
** Step (or KC if view by KC)
** Number of Students
* Ability to sort problems by their average experienced position within the curriculum. -- Ken Koedinger, 02/16/2007
** Which problem did students most often experience first, then the one experienced second most often, ...
* Order steps by the order they typically are executed by students. -- Ken Koedinger, email 11/7/2008
:"Searching through the steps in a problem to get a sense of what is going on is currently hard because the steps are ordered alphabetically, not by the order in which most students did them. While not all students do all steps in the same order, there is some regularity there. It would be quite useful if the steps could be ordered in a "typical order". This could be accomplished by using the time stamps (of the first (correct?) transaction?) for each step to determine the rank order of each step for each student in a problem and then for each problem average the rank order of each step across all students. Then arrange the steps in the Error Report by their average rank order -- that is, roughly speaking, the step that is most likely to be first across students (closest on average to first) goes first, the step with the next lowest rank goes next, etc."
* The capability to count the number of errors of each message type and sort in different ways, for instance by all errors that had no messages. -- Bruce McLaren, email 10/22/2007
:"This is an error analysis I recently did in Excel, using pivot tables, that might be handy if in the DataShop. This one is very important for tutors because the errors that occur most frequently, yet don't elicit messages to the students, are good candidates to become errors with feedback."
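A minimal sketch of Ken's "typical order" procedure, assuming each step's rank is based on its first transaction time. It ranks steps per student within a problem, averages each step's rank across students, and sorts by that average. Students who skip steps contribute fewer ranks, so averages from partial data are only approximate. Field names are assumptions.

```python
from collections import defaultdict

def typical_step_order(transactions):
    """Order each problem's steps by their average rank across students.

    transactions: dicts with 'student', 'problem', 'step', and 'time' (any sortable value).
    """
    first_seen = {}  # (student, problem, step) -> earliest time
    for t in transactions:
        key = (t["student"], t["problem"], t["step"])
        if key not in first_seen or t["time"] < first_seen[key]:
            first_seen[key] = t["time"]

    by_student_problem = defaultdict(list)
    for (student, problem, step), time in first_seen.items():
        by_student_problem[(student, problem)].append((time, step))

    ranks = defaultdict(list)  # (problem, step) -> that step's rank for each student
    for (_, problem), steps in by_student_problem.items():
        for rank, (_, step) in enumerate(sorted(steps), start=1):
            ranks[(problem, step)].append(rank)

    order = defaultdict(list)
    for (problem, step), rs in ranks.items():
        order[problem].append((sum(rs) / len(rs), step))
    return {problem: [step for _, step in sorted(items)] for problem, items in order.items()}

log = [
    {"student": s, "problem": "P", "step": step, "time": time}
    for s, seq in {"s1": "ABC", "s2": "BAC", "s3": "ABC"}.items()
    for time, step in enumerate(seq)
]
print(typical_step_order(log))
```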

==== Show more than 500 problems ====

In the error report, can we see more than 500 problems? See set "Cog Model Discovery Experiment Spring 2010"
2/2888 selected.
(Showing the first 500)
-- Ken via email on 2/15/2011

=== Export ===

==== Include Step Start Time in transaction format ====

* The transaction-level export already includes problem start time; could it also include step start time? I can easily compute it myself, but it seems there's a specific algorithm that the student-step rollup uses, and it might be nice to include the same value here. --Ilya Goldin, email on 01/16/2012

==== Last attempt on step? column for transaction format ====

* Include a new column that shows whether the row is the last attempt on a step for a student or not. Could be 0 or 1 as value. Helpful for researchers who are doing grading of data. Transaction format. --Vincent Aleven, CTAT mtg 11/5/2010
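A minimal sketch of the proposed 0/1 column. The column name `Is Last Attempt` is hypothetical. Steps are keyed by problem as well as step name, since step names are often scoped within problems. Repeated visits to the same problem are not distinguished here.

```python
def flag_last_attempts(transactions):
    """Add an 'Is Last Attempt' column (1 or 0) to each transaction row.

    transactions: dicts with 'student', 'problem', 'step', and 'time' (any sortable value).
    The latest transaction for each student-problem-step gets 1; ties go to the later row.
    """
    last_row = {}
    for i, t in enumerate(transactions):
        key = (t["student"], t["problem"], t["step"])
        if key not in last_row or t["time"] >= transactions[last_row[key]]["time"]:
            last_row[key] = i
    last_indexes = set(last_row.values())
    return [{**t, "Is Last Attempt": int(i in last_indexes)} for i, t in enumerate(transactions)]

log = [
    {"student": "s1", "problem": "P1", "step": "x", "time": 1},
    {"student": "s1", "problem": "P1", "step": "x", "time": 2},
    {"student": "s1", "problem": "P1", "step": "y", "time": 3},
]
print([row["Is Last Attempt"] for row in flag_last_attempts(log)])
```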

==== Elapsed Time ====

* Include the elapsed time in preview and transaction export. It is more valuable than the transaction time as an absolute reference. Possible to keep both. --Ken Koedinger, Team Mtg 04/18/2008

==== SQL Format ====

* Option to export as an SQL file. -- Ken Koedinger, 03/26/2007, also brought up in June ET Meeting
** Ability to export an SQL dump of a dataset. --Kyle Cunningham, 04/03/2007

==== Specify Character for Blanks ====

* Ability to specify what character if any is used for blank. --Ryan Baker, email 8/9/2007
:"Not all tools handle TABTABTAB (consecutive tabs) correctly on import. The period '.' is used to mean missing data in most stats packages; the word 'BLANK' is used in some others." Not an issue for Ryan, as he wrote a preprocessor to convert blanks.
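A preprocessor like the one Ryan describes can be very short. This Python sketch replaces every empty tab-separated field with a placeholder ('.' by default, or e.g. 'BLANK'). It assumes plain tab-delimited text with "\n" line endings and no quoted fields that themselves contain tabs; `fill_blanks` is a hypothetical name, not Ryan's actual tool.

```python
def fill_blanks(tsv_text: str, blank: str = ".") -> str:
    """Replace empty fields in tab-delimited text with a placeholder.

    "a<TAB><TAB><TAB>b" becomes "a<TAB>.<TAB>.<TAB>b". Empty lines are left untouched.
    Assumes plain tab-separated text with no quoted fields containing tabs.
    """
    out_lines = []
    for line in tsv_text.split("\n"):
        if line == "":
            out_lines.append(line)  # keep blank lines (e.g., a trailing newline)
            continue
        fields = line.split("\t")
        out_lines.append("\t".join(f if f != "" else blank for f in fields))
    return "\n".join(out_lines)
```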

==== Opportunity (at KC) Count in Transaction Export ====
* Include the opportunity count in the transaction export (it's only in the student-step rollup) -- Noboru Matsuda, 10/08/2009
** Would be repetitive
** Would need one Opportunity column per KC
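A sketch of what per-KC opportunity columns could look like on transaction rows; it also shows why the values would be repetitive, since every transaction on the same step repeats that step's count. Assumption: a student's opportunity count for a KC goes up by one the first time the student reaches a new step tagged with that KC. This mirrors the idea of opportunity in the student-step rollup but has not been checked against DataShop's code. Rows are (student, step, kcs) tuples in time order, with step identifiers assumed unique across problems; `opportunity_columns` is hypothetical.

```python
from collections import defaultdict


def opportunity_columns(rows):
    """Per-transaction opportunity counts, one value per KC on the row.

    rows: (student, step, kcs) tuples in time order; kcs is a list of KC names.
    Assumption: a student's count for a KC rises by one the first time the student
    reaches a new step tagged with that KC; later transactions on that step repeat it.
    """
    counts = defaultdict(int)   # (student, kc) -> opportunities so far
    step_count = {}             # (student, step, kc) -> count assigned to that step
    result = []
    for student, step, kcs in rows:
        row_counts = []
        for kc in kcs:
            key = (student, step, kc)
            if key not in step_count:
                counts[(student, kc)] += 1
                step_count[key] = counts[(student, kc)]
            row_counts.append(step_count[key])
        result.append(row_counts)
    return result
```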

==== Export only rows that have KCs tagged ====
* Is it possible, when exporting a dataset, to include only the [transaction?] rows that have knowledge components tagged? And the same for the KC models export: only include the items that have KCs tagged? -- Hui Cheng, 01/19/2010
** We have the inverse of this option on the Performance Profiler, 'Include steps without a knowledge component', and with the Student-Problem export, 'Include Steps without KCs'.

==== Include Custom Fields in Student-Step Rollup ====
* Is it possible to include the custom field in the DataShop [step] export? -- Hui Cheng, 03/01/2010
** "I am thinking about the student-step level export. We have a student from Statistics wanting to use Assistment data from DataShop. She wants to have the problem set id (= curriculum id in DataShop). For Assistment data, we decided to put the problem set id in the custom field because in Assistment a lot of problems are repeated in different problem sets. She needs data at the student-step level." (Hui email)
* Vote from Phil Pavlik too -- see [[Collected_User_Requests#Include_transaction_custom_fields_in_web_services_student-step_export|Phil's comment under Web Services]]

==== Don't duplicate rows in student-step format when not showing KCs ====
* If the checkbox to show knowledge components is not checked, maybe it doesn't make sense to show rows more than once if they have more than one KC associated with the step. (Mimi (and Brett) stumbled on this. 8/16/2010)

==== Student-Step Rollup include Success Column ====
* In the student-step rollup, add a column called Success: 1 if correct, 0 if incorrect or hint, blank otherwise. ~~ Ken Koedinger, DataShop Team Meeting, Oct 22, 2010
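The Success rule is small enough to state as code. This sketch assumes it is computed from the rollup's first-attempt outcome (correct, incorrect, or hint), which the request does not say explicitly; `success_value` is a hypothetical helper, and "blank" is represented as an empty string.

```python
def success_value(first_attempt):
    """Success column value for one student-step row.

    Assumption: computed from the row's first-attempt outcome.
    Returns 1 for 'correct', 0 for 'incorrect' or 'hint', and '' (blank) otherwise.
    """
    outcome = (first_attempt or "").strip().lower()
    if outcome == "correct":
        return 1
    if outcome in ("incorrect", "hint"):
        return 0
    return ""
```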

=== Learning Curve ===

==== Default sort by observation on LCPID ====

* Default sort by observation on LCPID. ~~ Ken Koedinger, DataShop Team Meeting, 10/22/2010

==== Purple Point ====
* Purple Point: if a point on the LC has more than one KC associated with it but you have drilled down to a given KC, then the blue line is off. We could put a purple point that takes this into account.
* Simpler thing: display a warning message that some points in the display are driven by other KCs
* Pearson may be interested
* This was mentioned during the PSLC Summer School 2010.
* For a step with multiple skills, attribute the error only to the skill with the highest overall error rate. ~~ Alida, meeting with Ken, November 18, 2010
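A sketch of the single-KC attribution rule in the last bullet. Overall error rates are computed here from (kcs, is_error) observations, one per student-step first attempt, counting a multi-KC step toward every KC tagged on it. Both that counting choice and the alphabetical tie-break are assumptions, and the function names are hypothetical.

```python
from collections import defaultdict


def overall_error_rates(observations):
    """Error rate per KC from (kcs, is_error) pairs, one per student-step first attempt.

    A step tagged with several KCs counts toward every one of them.
    """
    errors = defaultdict(int)
    totals = defaultdict(int)
    for kcs, is_error in observations:
        for kc in kcs:
            totals[kc] += 1
            errors[kc] += int(is_error)
    return {kc: errors[kc] / totals[kc] for kc in totals}


def attribute_error(step_kcs, rates):
    """Pick the single KC blamed for an error on a multi-KC step.

    Chooses the KC with the highest overall error rate; ties go to the
    alphabetically first KC name.
    """
    return min(step_kcs, key=lambda kc: (-rates[kc], kc))
```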

==== Reduce Scrolling ====
* Add a forward and back button to the graph to reduce scrolling. -- John LaPlante, email 7/10/2007

:"The learning curve page could use a forward and back button to cycle through the learning curves. Going through them one by one requires a lot of scrolling."

==== Turn On Point Labels ====

* It would be nice to have the option to turn on point labels. It is nice that I can mouse over a point and view the data but it would be nicer if it appeared automatically. -- John LaPlante, email 7/10/2007

==== Option for Bigger Graph ====

* Allow user to see bigger graph. -- Derek Chan, Winter Workshop 1/23/2008
** Potential solution: enable user to set x, y scale manually

==== LC Normalize Scale of Thumbnails ====
* [[LC Normalize Scale of Thumbnails]]

=== Performance Profiler ===

==== Rename Performance Profiler ====
* John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
** Did not use this report, thinking it had something to do with improving the performance of DataShop itself. This report might have been much better to use than the Error Report, though an export would still be needed, since working with the data in tabular form was necessary. Note that the pivot tables created were added to the dataset (Pittsburgh Science of Learning Center Stoichiometry Study 1).
* Lisa Anthony, email 8/2008
** Didn't know to go to the report. Export would be useful.
** Needed a better definition of Error Rate with respect to Problem and Unit rows.

==== Export ====
* John LaPlante (see comments in Rename Performance Profiler)
* Lisa Anthony (see comments in Rename Performance Profiler)
* Yes, I too can imagine wanting to export the results of a particular performance profiler output (i.e., to a table) so that I can graph it my own way. --[[User:Koedinger|Koedinger]] 16:24, 16 September 2009 (EDT)

==== Table View ====
* Add option to switch to a table view.
* Columns are: Problem Name, Steps, % incorrect, Incorrect Steps, % hint, Hint Steps, etc. include all values in pop-up.

==== Union of KCs/Problems/Students ====
* Allow user to get the union of KCs/Problems/Students etc. so they can compare across samples more easily. -- Kirsten Butcher, Winter Workshop 1/23/2008
:"It is difficult to compare performance profiler graphs across samples because the KCs (or problems, or whatever) aren't necessarily in both of those samples."

==== Show Details In Report ====
* Click on bar to see details in report and not just in pop-up. It disappears too quickly. -- Alida, Brett

==== Show More Information in the Graph ====
* Show more information in the graph: [Bruce M, User Mtg, 11/5/2007]
** Had drilled down by a certain skill, but the skill is not listed in the graph; the user has to check the skill list on the left-hand side to see what skill was selected
* Maybe related to Bruce's "show more info in graph": show the actual value of the range variable. e.g., when range is error rate, show the error rate number somewhere (right now you can see, via mouse-over, the incorrect, hint, and correct percentages, but not the error rate) -- Brett
* Design idea to show # steps incorrect, # steps hint, # steps correct to clarify how the percentages are calculated. -- Alida, Brett
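To make the "show the counts and the error rate" idea concrete, here is a small Python sketch that derives the pop-up percentages and an error rate from raw step counts. It assumes each step is classified once, by its first attempt, as incorrect, hint, or correct, and that error rate means (incorrect + hint) / all steps; that should be checked against DataShop's own definition. `profiler_summary` is a hypothetical name.

```python
def profiler_summary(n_incorrect, n_hint, n_correct):
    """Percentages for a Performance Profiler bar plus the error rate.

    Assumption: every step is classified once (by its first attempt) as incorrect,
    hint, or correct, and error rate = (incorrect + hint) / all steps.
    """
    total = n_incorrect + n_hint + n_correct
    if total == 0:
        raise ValueError("no steps to summarize")

    def pct(n):
        return 100.0 * n / total

    return {
        "steps": total,
        "% incorrect": pct(n_incorrect),
        "% hint": pct(n_hint),
        "% correct": pct(n_correct),
        "error rate (%)": pct(n_incorrect + n_hint),
    }
```

Showing `steps` alongside the percentages is what lets a reader see how each percentage was calculated.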

== Sample Selector ==

==== Sub-Samples ====

* Sub Samples would be helpful. -- John LaPlante, email 7/10/2007
:"A nice solution would be to have sub-samples where one property varies. When I'm doing this analysis, I've changed my sample many times, renaming it sometimes, tweaking it to get variations on the data. The samples are really useful but they could help me a lot more with this kind of experimentation."

==== Filter out students ====

As a researcher, I want to filter out test users (including myself) from my data so that I see less noise in the data.

* Ruth Wylie, July 3, 2008
* You can already filter out test users by giving them user IDs that start with a common prefix, such as 'Test_', and then creating a sample that excludes students with a name like 'Test_%'. --[[User:Alida|Alida]] 10:29, 4 September 2009 (EDT)

==== Filter by "Class" ====

As a researcher, I want to create samples based on "class" since class is how I've encoded my conditions.

* Maaike Waalkens, July 30, 2010
* This is what made sense using Mathtutor for tutor delivery.
* What other fields are we missing in the Sample Selector?

==== Filter by "Step" ====

As a researcher, I want to create samples based on "step" since I'm only interested in one particular step.

* Mimi McLaughlin, August 17, 2010

==== Create Sample Automatically ====

* Would it be possible for me to get a random sample from the 'Bridge to Algebra 2006-2007' dataset of 100 students? I am having trouble looking at the data because it takes too long to load, and my adviser thought that was because the dataset was too large. ~~ DataShop User, 10/19/2010
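Drawing such a sample is straightforward once the data are exported. A sketch in Python, assuming rows are dicts keyed by export column headers and that the student column is named "Anon Student Id" (an assumption; adjust if the export uses a different header). Passing a seed makes the sample reproducible; both function names are hypothetical.

```python
import random


def sample_students(student_ids, k=100, seed=None):
    """Pick k distinct students uniformly at random (all of them if there are fewer)."""
    unique = sorted(set(student_ids))  # sorted so a given seed is reproducible
    rng = random.Random(seed)
    return set(rng.sample(unique, min(k, len(unique))))


def rows_for_students(rows, chosen, student_column="Anon Student Id"):
    """Keep only the rows (dicts) whose student is in the chosen set.

    The default column name is an assumption about the export's header.
    """
    return [row for row in rows if row[student_column] in chosen]
```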

== Web Services ==

==== Include transaction custom fields in web services student-step export ====

* "I noticed in the web services guide that cfs (which provides the custom fields) is not yet implemented for step roll-up tables. It is marked [coming soon] in the manual... I could write a workaround to pull in the transactions and lookup the custom fields, but I'd really rather not." -- Phil Pavlik, 4/27/2011

* "It would be best if the CF for step rollup was drawn from the primary transaction for the first attempt that is correct or incorrect. At least that seems like the generally best value. Basically, there may be many cases where the custom field is the same across all transactions for a step. In this case ... you could just use the first one since they are all the same." -- Phil Pavlik, 4/29/2011
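Phil's suggested rule can be sketched as follows. Each transaction is assumed to be a dict with an "Outcome" key and one key per custom field (assumed names). The fallback to the first transaction when a step has only hints is an assumption, not part of his suggestion. `step_custom_field` is hypothetical.

```python
def step_custom_field(transactions, field):
    """Roll a transaction-level custom field up to one student-step.

    transactions: one student's transactions on one step, in time order, as dicts
    with an "Outcome" key (assumed name) and custom-field keys.
    Rule: use the value from the first transaction whose outcome is CORRECT or
    INCORRECT. Assumption: if there is none (e.g., only hints), fall back to the
    first transaction's value; return None for an empty list.
    """
    for tx in transactions:
        if (tx.get("Outcome") or "").upper() in ("CORRECT", "INCORRECT"):
            return tx.get(field)
    return transactions[0].get(field) if transactions else None
```

When the custom field has the same value on every transaction for the step, any of these choices gives the same answer, which matches Phil's observation above.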

==== Use Custom Fields Graphs/Reports ====
In Graphs

As a researcher creating custom fields and assigning values at the transaction level, I want DataShop to perform the aggregation to the step level so that I can do other things with my custom-field variable such as graph it. -- Ryan Baker, mtg w/Alida & Brett, 12/15/2008

Performance Profiler
* Use all discrete variables/custom fields on the left and all the continuous variables on the bottom of the Performance Profiler. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Ad-hoc queries ====

* Allow restricted filtering on steps and transactions, on whatever fields users can filter on in the navigation boxes, as the next web service feature (after CFs). (User Meeting AAR, December 9, 2009)

==== Sample creation as a web service ====

* Sample creation is still too slow. (User Meeting AAR, December 9, 2009)

 
----
See completed [[DataShop 3.x Features]] 
See on-going [[DataShop 4.x Features]] 
See prioritized [[DataShop Feature Wish List]]
[[Category:Protected]]
[[Category:DataShop]]

Web application student-step format should be the same as web services version

2011-09-06T20:05:33Z

Bleber: New page: '''Status: Prioritization Needed''' == User Story == == Notes/Comments == * The student-step format in the webapp is less useful than the one in web services. It suffers from the follo...

'''Status: Prioritization Needed'''

== User Story ==

== Notes/Comments ==

* The student-step format in the webapp is less useful than the one in web services. It suffers from the following shortfalls:
** It has two header rows instead of one, which trips up some analysis programs such as R
** It displays additional rows to capture steps with more than one KC, instead of using the web services approach (see note below)
** It is not cached, while the web services version is.
* Michael Yudelson ran into the multiple-rows issue on 9/6/2011

''From the web services API:''

Important: The format of the KC model columns returned by web services Get Student-Step Records is different from the format of these columns in the web application and from the Get Transactions web service. In the web application’s current step format, multiple KCs associated with a step result in multiple rows. In the web services version, multiple KCs are contained in a single value and delimited with two tildes (“~~”), resulting in a single row for the student-step. The same rule is applied to the Opportunity and Predicted Error Rate columns.
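A sketch of the single-row, tilde-delimited layout described in the API excerpt, turning the web application's one-row-per-KC output into one row per student-step. Rows are assumed to be Python dicts. The caller passes which columns identify a student-step and which hold per-KC values, since the exact export headers are not listed here. `collapse_kc_rows` is a hypothetical helper.

```python
def collapse_kc_rows(rows, key_fields, kc_fields, sep="~~"):
    """Merge rows that differ only in their KC-related fields into one row per key.

    rows: dicts in export order. key_fields identify a student-step; kc_fields are
    the per-KC columns (e.g., KC, Opportunity, Predicted Error Rate), whose values
    are joined with `sep` in first-seen order. Other fields come from the first row.
    """
    merged = {}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        if key not in merged:
            merged[key] = {f: row[f] for f in row if f not in kc_fields}
            merged[key].update({f: [] for f in kc_fields})
        for f in kc_fields:
            merged[key][f].append(str(row[f]))
    for out in merged.values():
        for f in kc_fields:
            out[f] = sep.join(out[f])
    return list(merged.values())
```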

 
----
See completed [[DataShop 3.x Features]] 
See on-going [[DataShop 4.x Features]] 
See prioritized [[DataShop Feature Wish List]] 
See unordered [[Collected User Requests]]
[[Category:Protected]]
[[Category:DataShop]]
[[Category:DataShop Feature]]

DataShop Feature Wish List

2011-09-06T19:43:33Z

Bleber: /* Prioritized Features */

Below are two lists of features. The features that we have prioritized and decided to implement are in the first, ordered list. The features that the DataShop team and community are discussing are in an unordered list on the page [[Collected User Requests]]. Click on a feature to get more information about it, such as a description, rationale for building it, and its status.

'''You can help!'''

If you think a feature is important, vote for it by putting your name to the right of the feature. Discuss the feature on the comments section of that feature's page. We'll use these votes and the dialogue that develops to prioritize features.

Don't see a feature on the prioritized list? There's a good chance it's on the '''[[Collected User Requests]]''' page. You can add feature ideas there and discuss the existing ones. Include your comment, name, and date to vote on feature ideas there.

'''Tip:''' Easily sign your username and the current date/time by inserting four tildes (<nowiki>~~~~</nowiki>); insert just your username with three tildes.

''See [[DataShop On-going Features|features we are building now]], and [[DataShop Completed Features|ones we have built]].''

== Prioritized Features ==

# [[Terms of Use]]
# [[Speed up Aggregator]]
# [[Other Analysis Outputs]]
# [[Web Services - Add Custom Fields]] (add custom fields to transactions) — Vote: Ryan Baker (1), John Stamper (1)
# [[Push Button Import]] — Carnegie Learning, John Stamper
# [[Error Bars]] — Vote: Ken Koedinger (4)
# [[Add Latency Y-axis Options]] — Vote: Ken Koedinger (3)
# [[Adding Custom Fields through Web Application]] — Vote: Ryan Baker (2)
# [[Scalability]] — Vote: Ryan Baker (4)
# [[KC Model in Transaction Export]] — Vote: Vincent Aleven (2)
# [[Student Filter Dialog]]
# [[Milliseconds]]
# [[LFA-AFM on Sample]] — Vote: Ken Koedinger (5)
# [[Place for General Papers]]
# [[Performance Metrics]]
# [[Ability to display step-custom-fields in graphs]]
# [[Add Problem Content]]
# [[Dialogue Message Format]]
# [[Web application student-step format should be the same as web services version]] — Vote: Mike Yudelson

== Unordered Features ==
We have a long list of feature requests that have not been prioritized. Please see the
'''[[Collected User Requests]]'''.

== How to Request a Feature ==
* [[Write a User Story]]
* [[Create a Feature Page]]
* Add Link to Feature on [[Collected User Requests]] page.
 
----
See [[DataShop Completed Features|completed features]] 
See [[DataShop On-going Features|on-going features]] 
See unordered [[Collected User Requests]] 
See the [[:Category:DataShop Glossary|DataShop Glossary]]
[[Category:Protected]]
[[Category:DataShop]]

DataShop Feature Wish List

2011-09-06T16:54:53Z

Bleber: /* Prioritized Features */

Below are two lists of features. The features that we have prioritized and decided to implement are in the first, ordered list. The features that the DataShop team and community are discussing are in an unordered list on the page [[Collected User Requests]]. Click on a feature to get more information about it, such as a description, rationale for building it, and its status.

'''You can help!'''

If you think a feature is important, vote for it by putting your name to the right of the feature. Discuss the feature on the comments section of that feature's page. We'll use these votes and the dialogue that develops to prioritize features.

Don't see a feature on the prioritized list? There's a good chance it's on the '''[[Collected User Requests]]''' page. You can add feature ideas there and discuss the existing ones. Include your comment, name, and date to vote on feature ideas there.

'''Tip:''' Easily sign your username and the current date/time by inserting four tildes (<nowiki>~~~~</nowiki>); insert just your username with three tildes.

''See [[DataShop On-going Features|features we are building now]], and [[DataShop Completed Features|ones we have built]].''

== Prioritized Features ==

# [[Terms of Use]]
# [[Speed up Aggregator]]
# [[Other Analysis Outputs]]
# [[Web Services - Add Custom Fields]] (add custom fields to transactions) — Vote: Ryan Baker (1), John Stamper (1)
# [[Push Button Import]] — Carnegie Learning, John Stamper
# [[Error Bars]] — Vote: Ken Koedinger (4)
# [[Add Latency Y-axis Options]] — Vote: Ken Koedinger (3)
# [[Adding Custom Fields through Web Application]] — Vote: Ryan Baker (2)
# [[Scalability]] — Vote: Ryan Baker (4)
# [[KC Model in Transaction Export]] — Vote: Vincent Aleven (2)
# [[Student Filter Dialog]]
# [[Milliseconds]]
# [[LFA-AFM on Sample]] — Vote: Ken Koedinger (5)
# [[Place for General Papers]]
# [[Performance Metrics]]
# [[Ability to display step-custom-fields in graphs]]
# [[Add Problem Content]]
# [[Dialogue Message Format]]
# Web application student-step format should be the same as web services version — Vote: Mike Yudelson

== Unordered Features ==
We have a long list of feature requests that have not been prioritized. Please see the
'''[[Collected User Requests]]'''.

== How to Request a Feature ==
* [[Write a User Story]]
* [[Create a Feature Page]]
* Add Link to Feature on [[Collected User Requests]] page.
 
----
See [[DataShop Completed Features|completed features]] 
See [[DataShop On-going Features|on-going features]] 
See unordered [[Collected User Requests]] 
See the [[:Category:DataShop Glossary|DataShop Glossary]]
[[Category:Protected]]
[[Category:DataShop]]

DataShop Feature Wish List

2011-09-06T16:43:14Z

Bleber: /* Prioritized Features */

Below are two lists of features. The features that we have prioritized and decided to implement are in the first, ordered list. The features that the DataShop team and community are discussing are in an unordered list on the page [[Collected User Requests]]. Click on a feature to get more information about it, such as a description, rationale for building it, and its status.

'''You can help!'''

If you think a feature is important, vote for it by putting your name to the right of the feature. Discuss the feature on the comments section of that feature's page. We'll use these votes and the dialogue that develops to prioritize features.

Don't see a feature on the prioritized list? There's a good chance it's on the '''[[Collected User Requests]]''' page. You can add feature ideas there and discuss the existing ones. Include your comment, name, and date to vote on feature ideas there.

'''Tip:''' Easily sign your username and the current date/time by inserting four tildes (<nowiki>~~~~</nowiki>); insert just your username with three tildes.

''See [[DataShop On-going Features|features we are building now]], and [[DataShop Completed Features|ones we have built]].''

== Prioritized Features ==

# [[Terms of Use]]
# [[Speed up Aggregator]]
# [[Other Analysis Outputs]]
# [[Web Services - Add Custom Fields]] (add custom fields to transactions) — Vote: Ryan Baker (1), John Stamper (1)
# [[Push Button Import]] — Carnegie Learning, John Stamper
# [[Error Bars]] — Vote: Ken Koedinger (4)
# [[Add Latency Y-axis Options]] — Vote: Ken Koedinger (3)
# [[Adding Custom Fields through Web Application]] — Vote: Ryan Baker (2)
# [[Scalability]] — Vote: Ryan Baker (4)
# [[KC Model in Transaction Export]] — Vote: Vincent Aleven (2)
# [[Student Filter Dialog]]
# [[Milliseconds]]
# [[LFA-AFM on Sample]] — Vote: Ken Koedinger (5)
# [[Place for General Papers]]
# [[Performance Metrics]]
# [[Ability to display step-custom-fields in graphs]]
# [[Add Problem Content]]
# [[Dialogue Message Format]]
# Web application student-step format should be the same as web services version

== Unordered Features ==
We have a long list of feature requests that have not been prioritized. Please see the
'''[[Collected User Requests]]'''.

== How to Request a Feature ==
* [[Write a User Story]]
* [[Create a Feature Page]]
* Add Link to Feature on [[Collected User Requests]] page.
 
----
See [[DataShop Completed Features|completed features]] 
See [[DataShop On-going Features|on-going features]] 
See unordered [[Collected User Requests]] 
See the [[:Category:DataShop Glossary|DataShop Glossary]]
[[Category:Protected]]
[[Category:DataShop]]

Other Analysis Outputs

2011-05-25T17:50:16Z

Bleber: /* User Story */

'''Status: Prioritization Needed'''

== User Story ==

As a user of DataShop, I want to store my model results in DataShop so that I can collaborate with other researchers.

== Notes/Comments ==

* Talked about having "Other Analysis Outputs" as a tab in the DataShop interface so the users of web services can create a simple text file (some sort of free form document) and store it back to DataShop. -- From a meeting with Ken on 3/24/2011

 
----
See completed [[DataShop 3.x Features]] 
See on-going [[DataShop 4.x Features]] 
See prioritized [[DataShop Feature Wish List]] 
See unordered [[Collected User Requests]]
[[Category:Protected]]
[[Category:DataShop]]
[[Category:DataShop Feature]]

Collected User Requests

2011-05-18T14:12:15Z

Bleber: /* Include transaction custom fields in web services student-step export */

See prioritized items on [[DataShop Feature Wish List]].

== Annotations ==

==== Annotations on Transaction Level ====
* I have models which can annotate things like: gaming, bored, etc. on the transaction level. -- Ryan Baker, ET Mtg 12/5/2007

==== Annotations on Student Level ====
* Would annotate on student level. -- Ido Roll, User Meeting, 1/19/2009

==== Annotations on Pages ====
* See the cool thing created by Jeffery Heer where all the settings of the page were recorded with the comment. -- Ryan Baker, DS Team Mtg 5/23/2008

==== Dataset Discussion - Capture data-integrity issues ====
* As a stakeholder in the DataShop project, I want to capture and publicize the data-integrity issues discovered with data sets so that data is better documented (and so we've fulfilled a promise to our funders to better document data). -- Ken Koedinger, Team Meeting, 8/15/2009
* As a user of DataShop, I want to discuss datasets and have that discussion attached to the dataset so that others can comment and better understand any data-integrity issues I've found.

==== Linking to internal pages ====

* It would be handy if a link can be saved to any dataset, sample, page in the DataShop. -- [[Alida]], 10/18/2007
** Currently, https://pslc-qa.andrew.cmu.edu/datashop/DatasetInfo?datasetId=793 works if you are already logged in.

==== Have a link from the DataShop to the Theory Wiki (Dataset to Project Page) ====

* Can we link from the dataset to the project page on the Theory Wiki? In the pipeline have a clickable link to the project page (make project name clickable). -- Michael Bett, ET Mtg 11/14/2007
** Link to a dataset directly? Is that obvious to users? Click on dataset link -> log in -> redirected back to dataset. -- Brett Leber

== Data Format ==
==== Capture the question prompt and answer choices the student chose from ====
* As a researcher, I want to be able to identify steps based on the question prompt—not the difficult-to-understand step names that come from selection and action of my tutor—so that I can analyze the data more easily. For multiple-choice questions, I also want to be able to see all of the choices that were available to the student. -- Eli Silk - February 1, 2011 (meeting with Brett and Ross Higashi of the FIRE project)
** Near-term solution is to create a table locally that maps steps to prompts
** Another solution is to ask CTAT team to modify their Flash components so that they log this information as custom fields
** Long-term solution is to make these fields into standard fields in the tutor message format
** In the email thread "Cognitive Tutor Development and Evaluation Requests", there is support for this general idea from Ken Koedinger, Albert Corbett, and Christian Schunn.
** Ken added that "Ideally, we may want to store any images that the student can see and where they reside on the screen ..."

== Data Modeling ==
=== Non-KC Modeling ===

==== Automatic Distillation ====
* As an educational data miner wishing to develop a machine learned model with PSLC data, I would like to be able to automatically distill data features (e.g. custom fields) commonly used in past educational data mining research for a new data set (see, for instance, Baker, Corbett, Roll, & Koedinger, 2008 in UMUAI) -- Ryan Baker, Summer 2008, Startup Memo
** Could be implemented as a plug-in
* Also interested in this feature idea. -- Dan Franklin, Oct 2008

==== Upload model and apply it to new data set ====

* As an EDM researcher, I would like to take a model, expressible as a linear formula on DataShop fields, or a simple code procedure (e.g. Bayesian Knowledge Tracing, which Ryan has code for), and apply it to a new data set, so that I can ... -- Ryan Baker, Sept 2008
* Also interested in this. -- Maxine Eskenazi, Sept 2008
* May work best as a plug-in
** Code to display GUI to choose which data sets to use, calls model code, re-import to DataShop
** Good to have a way to apply many models, as soon as you import a data set
* Phil has an idea that maybe fits within this one. Please move if there's a better category. -- Brett Leber
<blockquote>This [''transaction? kc? --ed.''] relabeling is really mostly about enabling modeling in DataShop, right? With this in mind, I think that it is actually a higher priority to have model alternatives in DataShop.... E.g. Investigators should be able to give you chunks of Java code according to a certain specification, and DataShop should then be able to run these over datasets (perhaps after a certain series of QA occurs according to an SOP) when the investigator clicks some button in DataShop.... Obviously this is a much larger project than adding columns, but it is also much more important in my mind. ''-- Phil Pavlik, email to Brett on 1/14/2009''</blockquote>
* Examples:
** Example: running gaming detector in multiple tutors and comparing gaming frequencies
** Example: applying Bayesian Knowledge Tracing to a new data set from the same LearnLab
** Example: applying Ben Shih's models to many data sets. Note that Ben Shih should be included in design of this feature; he is interested, and has a lot of good ideas.

==== Add Different Predicted Values ====
* Would also like to add statistics, different predicted values than what LFA produces. -- Ken Koedinger, ET Meeting, 10/10/2007

==== Bayesian Knowledge Tracing ====
* Bayesian Knowledge-Tracing built into DataShop like LFA is. -- Ryan Baker, Startup Memo, Summer 2008
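For reference, the standard four-parameter Bayesian Knowledge Tracing update (prior knowledge, learning/transit, guess, slip) can be sketched in Python as below. This is the textbook formulation, not Ryan's code or anything that currently exists in DataShop. Fitting the four parameters (e.g., by grid search or EM) is left out, and all parameters are assumed to lie strictly between 0 and 1.

```python
def bkt_trace(observations, p_init, p_transit, p_guess, p_slip):
    """Standard Bayesian Knowledge Tracing for one student and one KC.

    observations: booleans in practice order (True = correct first attempt).
    Returns the predicted probability of a correct response before each observation.
    """
    p_known = p_init
    predictions = []
    for correct in observations:
        # Probability of a correct answer given the current knowledge estimate.
        predictions.append(p_known * (1 - p_slip) + (1 - p_known) * p_guess)
        # Bayesian update of P(known) given the observed response.
        if correct:
            num = p_known * (1 - p_slip)
            den = num + (1 - p_known) * p_guess
        else:
            num = p_known * p_slip
            den = num + (1 - p_known) * (1 - p_guess)
        p_given_obs = num / den
        # Chance of learning the KC at this practice opportunity.
        p_known = p_given_obs + (1 - p_given_obs) * p_transit
    return predictions
```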

==== Richer statistics for KC modeling ====

* In addition to the model stats and estimates generated for learning factor models, we should also create difficulty factor models (i.e., ones without a "Slope" parameter). The latter are particularly relevant for the Unique-Step model, where the slope parameter is meaningless (but still counts against the BIC value). -- Ken Koedinger, Email "new feature request", 1/22/2009
** Relatedly, we should report significance values on the Slope parameters -- that is, when is the Slope significantly different from 0.
** The KC models page perhaps should also report the log-likelihood and number of parameters (in addition to BIC) and leave out AIC. We might also consider other metrics of model generality, like the "adjusted R2" (if I have this name right -- Joe Beck mentioned in the Assistments meeting yesterday).
** These changes will be part of meeting the CMDM goal of improvement in (or at least demonstrate acceptability of) the cognitive models in 90+ units in our LearnLab courses (or affiliates).
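The statistics mentioned above relate to each other as sketched below, where k is the number of parameters, n the number of observations (assumed here to be the student-step first attempts the model is fit to), and ln L the log-likelihood. `model_fit_stats` is a hypothetical name.

```python
import math


def model_fit_stats(log_likelihood, n_params, n_obs):
    """AIC and BIC from a fitted model's log-likelihood (lower is better for both).

    AIC = 2k - 2 ln L;  BIC = k ln(n) - 2 ln L.
    """
    if n_obs <= 0:
        raise ValueError("n_obs must be positive")
    aic = 2 * n_params - 2 * log_likelihood
    bic = n_params * math.log(n_obs) - 2 * log_likelihood
    return {"log_likelihood": log_likelihood, "n_params": n_params, "AIC": aic, "BIC": bic}
```

The ln(n) penalty per parameter is why a meaningless slope parameter still costs the Unique-Step model: with 10,000 observations each extra parameter adds about 9.2 to BIC but only 2 to AIC.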

=== KC Modeling ===

==== LFA/AFM: Check if enough memory using formula ====
* The LFA/AFM code could calculate how much RAM would be needed to run the algorithm on a given skill model using the formula provided by Hao. This formula is based on the number of transactions, number of students and number of skills. Right now it will not schedule itself to run on a model with over 300 skills, though there is a manual override. [[User:Alida|Alida]] 13:35, 29 November 2010 (EST)

==== Create KC Models through Web Services ====
* For John Stamper's CMDM project, it would be nice to automatically update KC Models through web services. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Automatically discovering new KC model ====

* Would it be possible to run some code (perhaps Hao's KC model selection code, perhaps something else generated by the CMDM thrust) to find a new, better KC model? -- Vincent Aleven, Sept 2008
* As a learning sciences researcher, I would like DataShop to discover a new/better KC model for me.
* Could be done as a plug-in

==== Generate new KC Models with LFA ====
* Not sure who asked for this.
* It would be nice to generate new KC Models with Hao's LFA code
* Would need to specify factors.
* Ideas on where this could run?
** On a separate server? Request it to be run, specify duration. Have separate server queue up requests, email user when done.
** In Java Applet on client machine? -- Phil Pavlik

==== Same Skill Twice on Same Step ====
* Would like to be able to apply the same skill to a step twice during a KC Model Import. -- Ken Koedinger, email, 2/4/2009

==== Save KC Model Import Files ====
* KC Model Import - save the file used to create the KCMs in case we need to recreate them. -- Ken Koedinger, email 3/4/2009

==== Log Likelihood and MAD ====
* Log Likelihood, MAD (mean absolute deviation) problem, MAD step (store and show) -- Hao Cen
** This is a variation on "Richer statistics for KC modeling" above. Probably should be merged. - Ken Koedinger
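A sketch of log-likelihood and a step-level MAD, given observed first-attempt errors and a model's predicted error rates for the same student-step rows. How "MAD problem" aggregates is not specified here (averaging within each problem first would be one option) and is not shown. Predictions must lie strictly between 0 and 1 for the log-likelihood to be finite; `fit_measures` is hypothetical.

```python
import math


def fit_measures(observed_errors, predicted_error_rates):
    """Log-likelihood and mean absolute deviation of predictions against outcomes.

    observed_errors: 0/1 per student-step (1 = error on the first attempt).
    predicted_error_rates: the model's predicted error probability for the same rows.
    """
    if len(observed_errors) != len(predicted_error_rates) or not observed_errors:
        raise ValueError("need two non-empty sequences of equal length")
    log_likelihood = 0.0
    abs_dev = 0.0
    for y, p in zip(observed_errors, predicted_error_rates):
        log_likelihood += math.log(p) if y else math.log(1 - p)
        abs_dev += abs(y - p)
    return log_likelihood, abs_dev / len(observed_errors)
```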

==== Better naming for KCs in auto-generated Unique-Step KCM ====

As a researcher, I want the KCs in the Unique-Step model to have better names than KC1, KC2, etc, so that I can easily tell which generated KCs go with which unique step.

* Hui Cheng (Email 1/20/2009), Ken Koedinger (Email 1/22/2009)
* Could you put this in your new feature request list: could the “Unique-step model” be better labeled than just “KC1”, “KC2”, etc.? For example, for Assistments data, you could use part of the “Step Name”. -- Hui Cheng, Email 1/20/2009
* But, anything is better than "KC". -- Ken Koedinger, email, 1/22/2009
<blockquote>A simple alternative, that preserves uniqueness and addresses length, is to concatenate: 1) the first K letters of the step name 2) a unique numerical increment (same as the "3" in "KC3"). Note that (2) guarantees uniqueness just as it does in the current "KC<num>" scheme. Or perhaps better given that step names are often scoped within problems, is to concatenate: 1) the first L letters of the problem name 2) the first M letters of the step name 3) a unique numerical increment (just like the "3" in "KC3") I think K or L+M should be as big as possible without making the KC names indistinguishable (because they run off the right margin) in the KC list on the Learning Curve and other pages. </blockquote>
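The naming scheme in the quote can be sketched in Python as follows. The default lengths (10 and 20 characters) and the "-" separator are hypothetical choices, as is the name `unique_step_kc_names`; setting `problem_letters=0` gives the simpler step-name-only variant.

```python
def unique_step_kc_names(steps, problem_letters=10, step_letters=20):
    """Readable, unique KC names for an auto-generated Unique-Step model.

    steps: (problem_name, step_name) pairs in the order the KCs are generated.
    Each name joins the first `problem_letters` characters of the problem name,
    the first `step_letters` characters of the step name, and a running number
    (the same number the current "KC<num>" scheme uses), so names stay unique.
    Empty parts are skipped, so problem_letters=0 uses the step name alone.
    """
    names = []
    for number, (problem, step) in enumerate(steps, start=1):
        parts = [problem[:problem_letters], step[:step_letters], str(number)]
        names.append("-".join(part for part in parts if part))
    return names
```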

==== Visualize Learning Curve Split ====
* Be able to visualize a learning curve split into 2 based on a specification of a subset of problems. -- Albert Corbett, Math CCM, November 2008
* Harder: make me a new KCM out of it. -- Ken Koedinger, Team Mtg, Dec 5, 2008, while trying to describe Albert's request to Alida

==== Statistical Significance ====
* Can DataShop determine if the difference between conditions or learning curves is statistically significant? -- general theme at workshop, probably mentioned by Bob Hausmann in his talk, Winter Workshop 1/23/2008
** Can't do it yet in DataShop, but I can show you how to do it in R (or SPSS...) after you've exported the data -- export the "student-step rollup" rather than the whole transaction table. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** This comment perhaps belongs (exists?) elsewhere: The current student-step rollup export (really, all exports) should be such that I can immediately load it into R (and other packages) without error. Currently errors occur, for instance, because there are "#" characters in the variable names in the student-step rollup. This is a roadblock to helping folks like Bob do the analyses they want to do. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** See [[Condition in Student-Step Rollup]] -- [[User:Alida|Alida]] 10:35, 17 September 2009 (EDT)
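As an alternative to the R/SPSS route, one simple option after exporting the student-step rollup is a permutation test on per-student values (for example, each student's error rate), sketched below in Python. The permutation test is a swapped-in choice of technique, not something the comments above prescribe. It assumes students are the unit of analysis and that the students in the two conditions are independent; `permutation_test` is hypothetical.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means between two groups.

    group_a, group_b: one value per student (e.g., error rate) in each condition.
    Returns (observed difference in means, approximate p-value).
    """
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabeling of students into the two groups
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (extreme + 1) / (n_permutations + 1)
```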

==== Split and Merge Skills ====

* Hand searches through a p-matrix for a dataset to split and merge skills. (Pie in the Sky) -- Ken Koedinger, Team Mtg, 02/22/2008

==== Notes on new KCMs ====

* It would be good if I could add a note to a KC model that was newly imported. -- Noboru Matsuda, email, Nov 19, 2009

==== Display Learning Curves Grouped by Interestingness ====

* The page displaying all the learning curves today seems to be alphabetically sorted by KC name, which is not necessarily meaningful.
* An alternative presentation is to group the curves into 4 sets, breaking up the page. Set 1 has curves that contain significant spikes, and therefore seem to be "low-hanging fruit" for purposes of breaking up into KCs. Set 2 has curves with few spikes, but they have a long X axis, suggesting that students are presented with too many opportunities to acquire those KCs. Set 3 has the "good" curves, i.e., nicely decreasing curves that are not too long. Set 4 is "other". -- [http://www.pitt.edu/~goldin Ilya Goldin] 7 December 2009
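A rough sketch of the four-set grouping. The request does not define "significant spike", "long X axis", or "nicely decreasing", so spike_jump, max_spikes, and long_length are hypothetical thresholds, and the "good" check only compares the last point with the first. Each curve is assumed to be a list of error rates, one per opportunity.

```python
def classify_curve(error_rates, spike_jump=0.15, max_spikes=1, long_length=20):
    """Return set 1-4 for a learning curve given as error rate per opportunity.

    All thresholds are hypothetical defaults, not values from DataShop.
    """
    # Count rises larger than spike_jump between consecutive opportunities.
    spikes = sum(
        1 for prev, cur in zip(error_rates, error_rates[1:]) if cur - prev > spike_jump
    )
    if spikes > max_spikes:
        return 1  # spiky: candidate for splitting into more KCs
    if len(error_rates) > long_length:
        return 2  # few spikes but long: possibly too many opportunities
    if len(error_rates) > 1 and error_rates[-1] < error_rates[0]:
        return 3  # decreasing and not too long: a "good" curve
    return 4  # other


def group_curves(curves):
    """curves: {kc_name: [error rates]} -> {set number: [kc names]}."""
    groups = {1: [], 2: [], 3: [], 4: []}
    for kc, rates in sorted(curves.items()):
        groups[classify_curve(rates)].append(kc)
    return groups
```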

==== Order KC models according to AIC ====
* This is based on results from multiple dataset analyses that compared AIC, BIC and log-likelihood to cross-validation RMSE. -- Mimi McLaughlin, 2/9/2011
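For reference, a sketch of the standard AIC and BIC formulas and an AIC-based ordering. The model names echo DataShop's usual ones, but the log-likelihoods and parameter counts are made-up numbers for illustration.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def order_by_aic(models):
    """models: list of (name, log_likelihood, n_params); best (lowest AIC) first."""
    return sorted(models, key=lambda m: aic(m[1], m[2]))


models = [
    ("Default", -5200.0, 150),
    ("Unique-step", -4900.0, 900),
    ("Single-KC", -5600.0, 120),
]
print([name for name, _, _ in order_by_aic(models)])
# ['Default', 'Single-KC', 'Unique-step']
```

Here Unique-step has the best log-likelihood but pays for its 900 parameters, which is the trade-off AIC is meant to capture.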

==== Use log of opportunity count for AIC and BIC calculations ====
* We compared using the log of the opportunity count with using the whole-number opportunity count in multiple datasets. The log of the opportunity count gave consistently better results, though the differences were small. -- Mimi McLaughlin, 2/9/2011
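For reference, the Additive Factors Model (AFM) behind the LFA/AFM analysis is commonly written as the first equation below; the log-opportunity variant replaces only the opportunity term. The comparison above does not say how zero opportunity counts were handled, so the +1 inside the logarithm is an assumption.

```latex
% theta_i = proficiency of student i;  beta_k = intercept (easiness) of KC k
% gamma_k = learning rate (slope) of KC k;  q_{jk} = 1 if step j uses KC k, else 0
% T_{ik}  = prior opportunities student i has had on KC k
\ln\frac{p_{ij}}{1-p_{ij}} = \theta_i + \sum_k q_{jk}\,\beta_k + \sum_k q_{jk}\,\gamma_k\,T_{ik}

% Log-opportunity variant (+1 assumed so that T_{ik} = 0 is defined):
\ln\frac{p_{ij}}{1-p_{ij}} = \theta_i + \sum_k q_{jk}\,\beta_k + \sum_k q_{jk}\,\gamma_k\,\ln(1 + T_{ik})
```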

== Developer Requests ==

==== Store Converted Date and Converter Info ====

As a DataShop administrator, I'd like to see the converter information (version and date) stored in the database, so that I do not have to store that data manually in the Additional Notes field each time I load a dataset. -- Kyle, 8/5/2008

* DTD new fields:
** Store conversion and converter information in database (anything else?)
*** conversion_time
*** converter_info
* Email from Octav, 10/5/2007

I see there's also a note about the converter version in the Dataset
Info. Which is good, but it seems it's taken from the directory name
when I submitted the set. I don't know how reliable that is.  :-) It
would be better if it's taken from the new converter info field.

==== Convert from XML to tab-delimited format ====

* If users find the export format valuable, it might help to let them convert XML to the export format so they can view the data in Excel, look at the Selection column, and spot blanks more easily. -- Jonathan Sewall, ET Mtg, 10/10/2007
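A minimal sketch of the conversion, assuming a hypothetical flattened XML layout with one <transaction> element per row and fields stored as attributes. This is not the actual PSLC tutor message format, whose structure is richer; only the flattening idea carries over.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical column set; the real export has many more columns.
FIELDS = ["student", "problem", "step", "selection", "action", "input", "outcome"]


def xml_to_tsv(xml_text):
    """Flatten <transaction .../> elements into tab-delimited text.

    Missing attributes become empty cells, so a blank Selection shows up as an
    empty column when the file is opened in Excel.
    """
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(FIELDS)
    for tx in root.iter("transaction"):
        writer.writerow([tx.get(field, "") for field in FIELDS])
    return out.getvalue()
```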

==== Plug-ins (general issues) ====

* Please please support Ruby on Rails. -- Ben Shih, December 2008

==== Create UI to grant DataShop user roles ====

* Granting roles is already tedious.
* Alida, User Meeting AAR on December 9, 2009

== Help ==

==== Specialize Label of Help Button ====
* Since this help is better than in most applications, it should say more than just 'Help'. -- Ken Koedinger & Ryan Baker, Team Mtg, May 30, 2008
** Ideas:
*** 'Page Help'
*** 'Help with this page'
*** 'Help with Learning Curve page' (Ken's favorite)
*** 'Help with this tool'

== Home Page ==

==== Redesign the Home Page ====

* In the menu of data sets at the top, include the N (=20 or as many as fit on the screen?) data sets that I have visited starting with the ones I've visited most recently. --[[User:Koedinger|Koedinger]] 16:16, 16 September 2009 (EDT)
** As of today, the menu shows the last 10 data sets, most recently visited at the top. I think the feature is good enough, but let's ask Ken --[[User:Bleber|Bleber]] 10:56, 6 August 2010 (EDT)
* There needs to be a better ordering for the datasets (DS364)
* Maybe a search to filter the list of datasets since the list is so long. -- Brett Leber, 6/14/2007
* Going back to the home page always goes to 'My Datasets' (DS313)
* Maybe show more high level stats on this page, like how many transactions, students, skill models
* Allow users to post and share project documentation (files, papers, other meta-info) -- Ruth Wylie, suggested during meeting on 8/4/2010. She had a file that is relevant to multiple datasets in her project.

== Import ==
== Miscellaneous ==

==== Analyses by LearnLab ====

* Organize data by LearnLab, not by "data set". -- Ryan Baker, Aug 2008
* Also: Bob Hausmann, Sep 2008; Maxine Eskenazi, Sep 2008
* Essentially, current data sets become samples, but the top-level unit is the LearnLab. You can take every data set in a LearnLab together as a sample.
* Implies being able to run analyses across data sets, to export multiple data sets together, and to create multi-data-set samples.
* As a user of DataShop, I would like to look at learning curves for all Algebra data together (for example), or export all Algebra data
* Important long-term, but is a lot of work -- in particular, we need to solve scalability issues first.

==== Save Settings Between Sessions ====
* It would be useful if DataShop could save settings between sessions. -- Bob Hausmann, User Meeting, 2/1/2008
** "I do a lot of redoing the same steps" (e.g., set cutoffs, select a KC model, select students).

==== Multiple steps per transaction ====

* Needed so that we do not have to create multiple transactions for the same actual action in Andes logs. -- Kurt VanLehn, Feb 2007

==== Demographic data ====

* This has been mentioned by NSF visitors, AB, ESL, and some researchers.
* Also mentioned at Winter Workshop 1/23/2008.
** Derek/Sue-mei: Student background information not in DataShop. Would like to see a student or set of students from a particular demographic, and view them across datasets!
* Note that Gail added demographic data to Additional Notes field on the Dataset Info page for many datasets. The idea here is to put that data into the database somewhere.

==== Single Sign On ====
* Michael Bett, email, 10/8/2007
* It would be nice if the following services have a single login account/password:
*# Theory Wiki
*# Learnlab.org
*# ESL's OSS
*# DataShop

==== Reveal unanonymized student IDs ====

As a researcher/PI performing research assistant tasks, I want to easily see the unanonymized student IDs of subjects in DataShop so that I can email my subjects telling them when to use my system.

* Ruth Wylie, July 3, 2008
* As the honest broker of the PSLC data, I have promised to not reveal the student IDs and to protect the identity of the students. This is part of the DataShop IRB. Therefore, I do not see this request as possible. [[User:Alida|Alida]] 09:53, 4 September 2009 (EDT)
** At some point, some coordination would be good with OLI's Digital Dashboard project that Marsha Lovett (and the OLI team and sometimes me) is working on. This is meant to provide usage information more quickly to instructors. It could also perhaps be used by researchers (with the right IRB rights) in situations like Ruth's.

==== Knight Timeline ====
* Developed by Andrea Knight, 2004

==== Buggy Skills ====
* Ken Koedinger, prototype walkthrough 9/11/2006

==== Confusion Matrix ====
* Brian MacWhinney, prototype walkthrough 9/11/2006

== Navigation Bar ==
==== Filter KCs by Name ====

As a researcher working with KCs, I want to filter KCs based on their names, so that I can...
* Vincent Aleven, Email, 2/3/2008
* "Since Alida said you cannot have two mechanisms for putting together your KC set (i.e., cannot have both the selecting-by-clicking and selecting-by-filtering), I would probably opt for the latter."
* Alida: I thought Vincent mentioned that he'd like to select which KCs are in a set by filtering on the name. Example: Include KCs with '*reason*' in the name and exclude KCs with '*given*' in the name.
* This could be an addition to our v3.0 KC-selection mechanism--filter by name.
* Vincent, Email, 5/6/2009: Expressed another need for this feature. Could just allow for a wider area and longer list so that more items can be checked at once. The number of characters we show right now is not enough because in many cases that number of characters is the same across many of the skills. Reference data set: Geometry CWCTC 2005-2006

* Status: Design Started
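A small sketch of Vincent's include/exclude example using shell-style wildcards from Python's fnmatch module. Case-insensitive matching and the sample KC names are assumptions made for illustration; DataShop's eventual filter syntax is not specified here.

```python
from fnmatch import fnmatchcase


def filter_kcs(kc_names, include=("*",), exclude=()):
    """Keep KCs matching any include pattern and no exclude pattern.

    Patterns use shell-style wildcards ('*', '?'); matching ignores case here.
    """
    def matches(name, patterns):
        return any(fnmatchcase(name.lower(), p.lower()) for p in patterns)

    return [n for n in kc_names if matches(n, include) and not matches(n, exclude)]


kcs = ["reason-angle-sum", "given-reason-parallel", "find-area", "Reason-vertical"]
print(filter_kcs(kcs, include=["*reason*"], exclude=["*given*"]))
# ['reason-angle-sum', 'Reason-vertical']
```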

==== Facebook-style KC Selection ====

As a researcher working with KCs, I want to select KCs based on the learning curve thumbnail, so that I can see quickly which ones I'm interested in exploring more deeply.

* Feature already designed for v3.0, not implemented due to time constraints.
* Agreed this would be really useful. -- Kirsten Butcher, User Mtg, 1/31/2008
* Status: Guesstimate: 20 days, need to revisit requirements document

==== Feedback after clicking a large sample on a large dataset ====

As a user of DataShop (first-time or not), I want some feedback and the ability to cancel after I do something that might take a long time (e.g., clicking "All Data" on a large dataset) so that I do not get stuck.

* Part of the Susan Goldman story
* After clicking a sample for a large dataset, there is no "Loading..." text, no feedback that the click was even registered by the app (besides the sometimes busy cursor and small browser "loading" text), nor the ability to cancel the action. 
* We will always have similar problems even if performance is improved, so providing feedback and the opportunity to cancel is critical.

==== Save Button in Problem Navigation Box ====

* Save buttons in the sidebar. -- Ken Koedinger, Mtg 2006
** Could also put one in the Problem selection box in the sidebar.

==== Make Nav Bar Wider ====
* Make the Sample name and description fields much wider. -- John LaPlante, email 7/10/2007

== New Visualizations/Reports ==

==== Student-KC Rollup ====
As a researcher, I want to see KCs rolled up by student, so that I can ...

* Vincent Aleven, User Mtg, 1/29/2008
** By Student-KC would be more useful than by Student-Problem
** Example: the number of steps with a hint request or an error, or the proportion of steps where help was requested
** How often the bottom-out hint occurs
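A sketch of the per-student, per-KC counts described above, assuming hypothetical input rows of (student, kc, first_attempt, saw_bottom_out_hint), where first_attempt is "correct", "incorrect", or "hint". The output field names are illustrative only.

```python
from collections import defaultdict


def student_kc_rollup(step_rows):
    """Aggregate student-step rows into per-(student, KC) summary counts."""
    totals = defaultdict(lambda: {"steps": 0, "hint_or_error": 0, "bottom_out": 0})
    for student, kc, first_attempt, saw_bottom_out in step_rows:
        cell = totals[(student, kc)]
        cell["steps"] += 1
        if first_attempt in ("incorrect", "hint"):
            cell["hint_or_error"] += 1
        if saw_bottom_out:
            cell["bottom_out"] += 1
    for cell in totals.values():
        # Share of this student's steps on this KC that began with help or an error.
        cell["hint_or_error_rate"] = cell["hint_or_error"] / cell["steps"]
    return dict(totals)
```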

==== Instructor Reports ====
* Phil said he received a lot of positive reactions to providing reports on units for instructors. -- Phil Pavlik, ET Mtg 10/10/2007

==== Manage Authorizations/Projects Page ====
* Lisa Anthony, email 10/23/2007
* Allow PI to change permissions on the datasets.
:"Actually, I couldn't see how to change permissions on the datasets from the website. Is this possible? If not, it might be a nice feature..."

==== Calculate Time Spent on Different Study Activities ====

As a researcher, I want to know how much time, on average, students spend on study activities, so that I can ...

* Bruce McLaren, Email, 4/7/2009
<blockquote>
For my most recent stoich study, Shawn and I are interested in calculating timing information such as: (a) how long students spent, on average, working on individual tutors (b) how long students spent, on average, on all items in an intervention (c) how long students worked, on average, on post-tests. 

Timing information is very commonly required for studies, and can be calculated from DataShop logs relatively easily, so even if we don't have it, might be worth considering. (And we don't want to re-invent the wheel, if you already have it or are planning it...)
</blockquote>
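A sketch of the first two timing measures in Bruce's list: total each student's time per activity, then average across the students who did that activity. It assumes rows of (student, activity, seconds), where seconds is a per-step or per-transaction duration and "activity" is a hypothetical label (for example, a tutor, intervention, or post-test) taken from the dataset's problem hierarchy or a custom field.

```python
from collections import defaultdict
from statistics import mean


def average_time_per_activity(rows):
    """rows: (student, activity, seconds) -> {activity: mean total seconds per student}."""
    per_student = defaultdict(float)
    for student, activity, seconds in rows:
        per_student[(activity, student)] += seconds
    by_activity = defaultdict(list)
    for (activity, _student), total in per_student.items():
        by_activity[activity].append(total)
    # Students with no rows for an activity are not counted in its average.
    return {activity: mean(totals) for activity, totals in by_activity.items()}
```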

==== Incorrect Step Duration and Hint Step Duration ====

As a researcher, I want to be able to see total step duration if the student's first attempt was an incorrect attempt, and total step duration if the student's first attempt was a hint request, so that I can do some analyses that I can't do with "Error Step Duration".

* Bob Hausmann, email, 11/11/2008.
* Updated title and story with 'step duration' instead of 'time'. -- [[User:Alida|Alida]] 10:36, 4 September 2009 (EDT)
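A sketch of the two requested columns, assuming, as the request implies, that "Error Step Duration" does not distinguish an incorrect first attempt from a hint request. Each new value is filled only when the first attempt matches, and is otherwise left blank (None). The function name is hypothetical.

```python
def split_step_duration(first_attempt, step_duration):
    """Return (incorrect_step_duration, hint_step_duration) for one student-step."""
    incorrect = step_duration if first_attempt == "incorrect" else None
    hint = step_duration if first_attempt == "hint" else None
    return incorrect, hint
```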

==== Grading ====

[[Grading]]

====Display number of steps and number of observations for skills====
* How and where to be determined by developers. -- Ken (entered by Mimi)

== Reports ==
=== Dataset Info ===

==== Pointers to Hard-copy Data ====
* Brett van de Sande, NSF Site Visit, 5/28/2008
* Pointers to hard-copy data such as paper tests and/or homework. Include contact information. It doesn't seem to make sense to scan a whole filing cabinet of paper if no one wants to look at it, but secondary researchers can't ask for material they don't know exists.

==== Sort Problem Breakdown Table ====

* Would like the ability to sort the table by clicking on the column headings of the Problem Breakdown Table on the Dataset Info Tab. -- Bruce McLaren, User Mtg, 11/5/2007

==== Rename dataset ====

As a researcher, I want to rename my dataset so that it makes more sense to other people. I also want to make sure the dataset doesn't become polluted later by new data not associated with my study.
* Ruth Wylie, July 3, 2008
* There are reasons she would want to do this (current name is worthless, other researchers might try her tutor and pollute her data) but also reasons for not doing it (log more data later).
* There are risks in changing a dataset name that might not be apparent. For example, it matters if you later want new data to go into the same dataset. [[User:Alida|Alida]] 10:27, 4 September 2009 (EDT)

==== Average time per problem, average number of problems, total number of sections ====

In addition to showing student hours per dataset, it would be useful to know the average time spent per problem, average number of problems completed, and the total number of sections.
* Noboru Matsuda, June 18, 2010

=== Error Report ===

==== View By Student ====

* Would like to see what a couple of students saw in the feedback. -- Marsha Lovett, 10/11/2007

==== Export ====
* I would like the ability to export this data. -- John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
* Also interested in this feature idea. -- Bruce McLaren, User Mtg, 11/5/2007

==== Sort ====

* Was planned for but not implemented in v2.1 (estimated to be a 4 day task)
** By Correctness %, starting with the least correct
** By Hints %
** Step (or KC if view by KC)
** Number of Students
* Ability to sort problems by their average experienced position within the curriculum. -- Ken Koedinger, 02/16/2007
** Which problem did students most often experience first, then the one experienced second most often, ...
* Order steps by the order they typically are executed by students. -- Ken Koedinger, email 11/7/2008
:"Searching through the steps in a problem to get a sense of what is going on is currently hard because the steps are ordered alphabetically, not by the order in which most students did them. While not all students do all steps in the same order, there is some regularity there. It would be quite useful if the steps could be ordered in a "typical order". This could be accomplished by using the time stamps (of the first (correct?) transaction?) for each step to determine the rank order of each step for each student in a problem, and then, for each problem, averaging the rank order of each step across all students. Then arrange the steps in the Error Report by their average rank order -- that is, roughly speaking, the step that is most likely to be first across students (closest on average to first) goes first, the step with the next lowest rank goes next, etc."
* The capability to count the number of errors of each message type and sort in different ways, for instance by all errors that had no messages. -- Bruce McLaren, email 10/22/2007
:"This is an error analysis I recently did in Excel, using pivot tables, that might be handy if in the DataShop. This one is very important for tutors because the errors that occur most frequently, yet don't elicit messages to the students, are good candidates to become errors with feedback."
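A sketch of the average-rank ordering Ken describes. It uses each student's first transaction of any kind on a step; the quote leaves open whether only the first correct transaction should count. Input tuples and timestamps are hypothetical. Timestamps only need to sort correctly, so numbers or ISO-format strings both work.

```python
from collections import defaultdict
from statistics import mean


def typical_step_order(transactions):
    """Order each problem's steps by their average within-student rank.

    transactions: iterable of (student, problem, step, timestamp) tuples.
    Returns {problem: [steps ordered from typically-first to typically-last]}.
    """
    # Earliest timestamp at which each student touched each step.
    first_seen = {}
    for student, problem, step, ts in transactions:
        key = (student, problem, step)
        if key not in first_seen or ts < first_seen[key]:
            first_seen[key] = ts

    # Rank the steps within each student's work on each problem (1 = first).
    by_student = defaultdict(list)
    for (student, problem, step), ts in first_seen.items():
        by_student[(student, problem)].append((ts, step))
    ranks = defaultdict(lambda: defaultdict(list))
    for (_student, problem), timed_steps in by_student.items():
        for rank, (_ts, step) in enumerate(sorted(timed_steps), start=1):
            ranks[problem][step].append(rank)

    # Average each step's rank across students; ties fall back to the step name.
    return {
        problem: sorted(step_ranks, key=lambda s: (mean(step_ranks[s]), s))
        for problem, step_ranks in ranks.items()
    }
```

One caveat of plain ranks: a student who skips a step gives lower ranks to the steps after it, so sparse steps may need separate handling.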

==== Show more than 500 problems ====

In the Error Report, can we see more than 500 problems? See the dataset "Cog Model Discovery Experiment Spring 2010", where the problem list reads "2/2888 selected. (Showing the first 500)".
-- Ken via email on 2/15/2011

=== Export ===

==== Last attempt on step? column for transaction format ====

* Include a new column that shows whether or not the row is the student's last attempt on a step, with a value of 0 or 1. This would help researchers who grade data. Transaction format. --Vincent Aleven, CTAT mtg 11/5/2010
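A sketch of how such a 0/1 column could be derived. It assumes transactions are dicts with hypothetical 'student', 'problem', 'step', and 'time' keys, and treats (student, problem, step) as the step key; repeated visits to the same problem would also need the problem view in that key.

```python
def last_attempt_flags(transactions):
    """Return 1/0 per transaction: 1 if it is the student's last attempt on the step.

    Flags come back in the same order as the input list.
    """
    latest = {}
    # Walk in time order; sorted() is stable, so equal times keep input order.
    for i in sorted(range(len(transactions)), key=lambda j: transactions[j]["time"]):
        tx = transactions[i]
        latest[(tx["student"], tx["problem"], tx["step"])] = i
    last = set(latest.values())
    return [1 if i in last else 0 for i in range(len(transactions))]
```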

==== Elapsed Time ====

* Include the elapsed time in preview and transaction export. It is more valuable than the transaction time as an absolute reference. Possible to keep both. --Ken Koedinger, Team Mtg 04/18/2008

==== SQL Format ====

* Option to export as an SQL file. -- Ken Koedinger, 03/26/2007, also brought up in June ET Meeting
** Ability to export an SQL dump of a dataset. --Kyle Cunningham, 04/03/2007

==== Specify Character for Blanks ====

* Ability to specify what character if any is used for blank. --Ryan Baker, email 8/9/2007
:"Not all tools handle TABTABTAB correctly on import. The period '.' is used to mean missing data in most stats packages. The word 'BLANK' is used in some other ones." Not an issue for Ryan, as he wrote a preprocessor to convert blanks.
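A sketch of an export writer with a configurable blank marker, using Python's csv module. The function name and its blank parameter are hypothetical; they only illustrate the option being requested.

```python
import csv
import io


def write_tsv(header, rows, blank="."):
    """Write rows as tab-delimited text, replacing empty cells with `blank`.

    blank="." suits packages that read '.' as missing; blank="BLANK" or
    blank="" (plain TABTABTAB) are the other options mentioned above.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        writer.writerow([blank if cell in (None, "") else cell for cell in row])
    return out.getvalue()
```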

==== Opportunity (at KC) Count in Transaction Export ====
* Include the opportunity count in the transaction export (it's only in the student-step rollup) -- Noboru Matsuda, 10/08/2009
** Would be repetitive
** Would need one Opportunity column per KC
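A sketch that shows both caveats above: every transaction on a step repeats the same value, and each KC needs its own column. It assumes a time-ordered list of (student, step_id, kcs) tuples, where step_id identifies one step instance for that student. It also assumes counts start at 1 on the first step involving a KC; that is an assumption here, not a statement of DataShop's definition.

```python
from collections import defaultdict


def opportunity_columns(transactions):
    """Return one {kc: opportunity} dict per transaction (one column per KC)."""
    counts = defaultdict(int)   # (student, kc) -> distinct steps seen so far
    step_counts = {}            # (student, step_id) -> {kc: opportunity}
    result = []
    for student, step_id, kcs in transactions:
        key = (student, step_id)
        if key not in step_counts:
            # First transaction on this step: advance each of its KCs once.
            for kc in kcs:
                counts[(student, kc)] += 1
            step_counts[key] = {kc: counts[(student, kc)] for kc in kcs}
        # Later transactions on the same step repeat the same values.
        result.append(step_counts[key])
    return result
```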

==== Export only rows that have KCs tagged ====
* Is it possible when exporting a dataset to include only the [transaction?] rows that have knowledge components tagged? And the same goes for KC models export, only include the items that have KC tagged? -- Hui Cheng, 01/19/2010
** We have the inverse of this option on the Performance Profiler, 'Include steps without a knowledge component', and with the Student-Problem export, 'Include Steps without KCs'.

==== Include Custom Fields in Student-Step Rollup ====
* Is it possible to include the custom field in the datashop [step] export? -- Hui Cheng, 03/01/2010
** "I am thinking about the student-step level export. We have a student from Statistics wanting to use assistment data from DataShop. She wants to have problem set id (=curriculum id in DataShop). For Assistment data, we decided to put problem set id in the custom field because in the Assistment a lot of problems are repeated in different problem sets. She needs data at the student-step level." (Hui email)
* Vote from Phil Pavlik too -- see [[Collected_User_Requests#Include_transaction_custom_fields_in_web_services_student-step_export|Phil's comment under Web Services]]

==== Don't duplicate rows in student-step format when not showing KCs ====
* If the checkbox to show knowledge components is not checked, maybe it doesn't make sense to show rows more than once if they have more than one KC associated with the step. (Mimi (and Brett) stumbled on this. 8/16/2010)

==== Student-Step Rollup include Success Column ====
* Add a column to the student-step rollup called Success: 1 if correct, 0 if incorrect or hint, blank otherwise. ~~ Ken Koedinger, DataShop Team Meeting, Oct 22, 2010
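A sketch of the proposed mapping. The request does not say which outcome drives it, so the sketch assumes the step's first-attempt value ("correct", "incorrect", or "hint"). The function name is hypothetical.

```python
def success_value(first_attempt):
    """Map a student-step first-attempt value to the proposed Success column."""
    if first_attempt == "correct":
        return 1
    if first_attempt in ("incorrect", "hint"):
        return 0
    return ""  # blank otherwise
```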

=== Learning Curve ===

==== Default sort by observation on LCPID ====

* Default sort by observation on LCPID. ~~ Ken Koedinger, DataShop Team Meeting, 10/22/2010

==== Purple Point ====
* Purple Point: if a point on the LC has more than one KC associated with it but you have drilled down to a given KC, then the blue line is off. We could put a purple point that takes this into account.
* Simpler thing: display a warning message that some points in the display are driven by other KCs
* Pearson may be interested
* This was mentioned during the PSLC Summer School 2010.
* For a step with multiple skills, attribute the error only to the skill with the highest overall error rate. ~~ Alida, meeting with Ken, November 18, 2010
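A sketch of Alida's attribution rule. Each KC's overall error rate is computed over every step it appears on, then each error is credited only to the step's KC with the highest rate. Ties go to the alphabetically later KC name (an arbitrary but deterministic choice). The input format of (kcs, is_error) pairs is hypothetical.

```python
from collections import defaultdict


def attribute_errors(steps):
    """steps: list of (kcs, is_error). Returns {kc: errors credited to it}."""
    # Overall error rate per KC, counting every step the KC appears on.
    seen = defaultdict(int)
    errors = defaultdict(int)
    for kcs, is_error in steps:
        for kc in kcs:
            seen[kc] += 1
            errors[kc] += int(is_error)
    rate = {kc: errors[kc] / seen[kc] for kc in seen}
    # Credit each error to the single KC with the highest overall error rate.
    credited = defaultdict(int)
    for kcs, is_error in steps:
        if is_error:
            blamed = max(kcs, key=lambda kc: (rate[kc], kc))
            credited[blamed] += 1
    return dict(credited)
```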

==== Reduce Scrolling ====
* Add a forward and back button to the graph to reduce scrolling. -- John LaPlante, email 7/10/2007

:"The learning curve page could use a forward and back button to cycle through the learning curves. Going through them one by one requires a lot of scrolling."

==== Turn On Point Labels ====

* It would be nice to have the option to turn on point labels. It is nice that I can mouse over a point and view the data but it would be nicer if it appeared automatically. -- John LaPlante, email 7/10/2007

==== Option for Bigger Graph ====

* Allow user to see bigger graph. -- Derek Chan, Winter Workshop 1/23/2008
** Potential solution: enable user to set x, y scale manually

==== LC Normalize Scale of Thumbnails ====
* [[LC Normalize Scale of Thumbnails]]

=== Performance Profiler ===

==== Rename Performance Profiler ====
* John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
** Did not use this report because he thought it had something to do with improving the performance of DataShop itself. This report might have been much better to use than the Error Report, but it would still need an export, since working with the data in tabular form was still necessary. Note that the pivot tables he created were added to the dataset (Pittsburgh Science of Learning Center Stoichiometry Study 1).
* Lisa Anthony, email 8/2008
** Didn't know to go to the report. Export would be useful.
** Needed a better definition of Error Rate with respect to Problem and Unit rows.

==== Export ====
* John LaPlante (see comments in Rename Performance Profiler)
* Lisa Anthony (see comments in Rename Performance Profiler)
* Yes, I too can imagine wanting to export the results of a particular performance profiler output (i.e., to a table) so that I can graph it my own way. --[[User:Koedinger|Koedinger]] 16:24, 16 September 2009 (EDT)

==== Table View ====
* Add option to switch to a table view.
* Columns are: Problem Name, Steps, % incorrect, Incorrect Steps, % hint, Hint Steps, etc. include all values in pop-up.

==== Union of KCs/Problems/Students ====
* Allow users to get the union of KCs/Problems/Students, etc., so they can compare across samples more easily. -- Kirsten Butcher, Winter Workshop 1/23/2008
:"It is difficult to compare performance profiler graphs across samples because the KCs (or problems, or whatever) aren't necessarily in both of those samples."
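A sketch of aligning two samples on the union of their KC (or problem, or student) names, so items present in only one sample appear with a blank (None) on the other side. Representing each sample as a dict of name to value (for example, error rate) is an assumption for illustration.

```python
def align_on_union(sample_a, sample_b):
    """Return (name, value_in_a, value_in_b) for every name in either sample."""
    names = sorted(set(sample_a) | set(sample_b))
    return [(name, sample_a.get(name), sample_b.get(name)) for name in names]
```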

==== Show Details In Report ====
* Click on bar to see details in report and not just in pop-up. It disappears too quickly. -- Alida, Brett

==== Show More Information in the Graph ====
* Show more information in the graph: [Bruce M, User Mtg, 11/5/2007]
** Had drilled down by a certain skill - skill is not listed in the graph, user has to check the skill list on the LHS to see what skill was selected
* Maybe related to Bruce's "show more info in graph": show the actual value of the range variable. e.g., when range is error rate, show the error rate number somewhere (right now you can see, via mouse-over, the incorrect, hint, and correct percentages, but not the error rate) -- Brett
* Design idea to show # steps incorrect, # steps hint, # steps correct to clarify how the percentages are calculated. -- Alida, Brett

== Sample Selector ==

==== Sub-Samples ====

* Sub Samples would be helpful. -- John LaPlante, email 7/10/2007
:"A nice solution would be to have sub-samples where one property varies. When I'm doing this analysis, I've changed my sample many times, renaming it sometimes, tweaking it to get variations on the data. The samples are really useful but they could help me a lot more with this kind of experimentation."

==== Filter out students ====

As a researcher, I want to filter out test users (including myself) from my data so that I see less noise in the data.

* Ruth Wylie, July 3, 2008
* You can already filter out test users by giving them IDs with a recognizable prefix (e.g., 'weirdCMUuser_xxx' or 'Test_xxx') and then creating a sample that excludes students whose names match that prefix (e.g., 'Test_%'). --[[User:Alida|Alida]] 10:29, 4 September 2009 (EDT)
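A sketch of how such an exclusion pattern behaves under standard SQL LIKE semantics ('%' matches any run of characters, '_' matches exactly one). One consequence worth knowing: 'Test_%' also matches an ID like 'Tester', because '_' is a wildcard there. Whether DataShop's sample selector follows exactly these semantics is an assumption; the helper names are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Compile a SQL LIKE pattern: '%' = any run of characters, '_' = exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def exclude_like(student_ids, pattern):
    """Drop IDs that match the whole LIKE pattern (case-sensitive in this sketch)."""
    regex = like_to_regex(pattern)
    return [sid for sid in student_ids if not regex.fullmatch(sid)]
```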

==== Filter by "Class" ====

As a researcher, I want to create samples based on "class" since class is how I've encoded my conditions.

* Maaike Waalkens, July 30, 2010
* This is what made sense using Mathtutor for tutor delivery.
* What other fields are we missing in the Sample Selector?

==== Filter by "Step" ====

As a researcher, I want to create samples based on "step" since I'm only interested in one particular step.

* Mimi McLaughlin, August 17, 2010

==== Create Sample Automatically ====

* Would it be possible for me to get a random sample from the 'Bridge to Algebra 2006-2007' dataset of 100 students? I am having trouble looking at the data because it takes too long to load, and my adviser thought that was because the dataset was too large. ~~ DataShop User, 10/19/2010
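A sketch of drawing a reproducible random subset of students, which could then be used to define a smaller sample. The function is hypothetical, and how the resulting IDs would be fed into the Sample Selector is not covered here. Passing a seed makes the draw repeatable.

```python
import random


def random_student_sample(student_ids, k=100, seed=None):
    """Return k distinct student IDs chosen uniformly at random (sorted)."""
    ids = sorted(set(student_ids))
    if k >= len(ids):
        return ids
    return sorted(random.Random(seed).sample(ids, k))
```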

== Web Services ==

==== Include transaction custom fields in web services student-step export ====

* "I noticed in the web services guide that cfs (which provides the custom fields) is not yet implemented for step roll-up tables. It is marked [coming soon] in the manual... I could write a workaround to pull in the transactions and lookup the custom fields, but I'd really rather not." -- Phil Pavlik, 4/27/2011

* "It would be best if the CF for step rollup was drawn from the primary transaction for the first attempt that is correct or incorrect. At least that seems like the generally best value. Basically, there may be many cases where the custom field is the same across all transactions for a step. In this case ... you could just use the first one since they are all the same." -- Phil Pavlik, 4/29/2011
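A sketch of Phil's suggested rule for one student-step row. It takes the custom-field value from the first correct or incorrect attempt, skipping hints. It falls back to the first transaction when no such attempt exists (an assumption, for example for hint-only steps), and treats every transaction as primary. Transactions are hypothetical dicts with an 'outcome' key plus custom-field keys.

```python
def step_custom_field(step_transactions, field):
    """Pick a custom-field value for one student-step from its time-ordered transactions."""
    for tx in step_transactions:
        if tx.get("outcome") in ("correct", "incorrect"):
            return tx.get(field)
    return step_transactions[0].get(field) if step_transactions else None
```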

==== Use Custom Fields Graphs/Reports ====
In Graphs

As a researcher creating custom fields and assigning values at the transaction level, I want DataShop to perform the aggregation to the step level so that I can do other things with my custom-field variable such as graph it. -- Ryan Baker, mtg w/Alida & Brett, 12/15/2008

Performance Profiler
* Use all discrete variables/custom fields on the left and all the continuous variables on the bottom of the Performance Profiler. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Ad-hoc queries ====

* As the next web service feature (after CFs), allow restricted filtering on steps and transactions using whatever users can filter on in the navigation boxes. (User Meeting AAR, December 9, 2009)

==== Sample creation as a web service ====

* Sample creation is still too slow. (User Meeting AAR, December 9, 2009)

 
----
See completed [[DataShop 3.x Features]] 
See on-going [[DataShop 4.x Features]] 
See prioritized [[DataShop Feature Wish List]]
[[Category:Protected]]
[[Category:DataShop]]

Collected User Requests

2011-04-28T16:28:06Z

Bleber: /* Include Custom Fields in Student-Step Rollup */

See prioritized items on [[DataShop Feature Wish List]].

== Annotations ==

==== Annotations on Transaction Level ====
* I have models which can annotate things like: gaming, bored, etc. on the transaction level. -- Ryan Baker, ET Mtg 12/5/2007

==== Annotations on Student Level ====
* Would annotate on student level. -- Ido Roll, User Meeting, 1/19/2009

==== Annotations on Pages ====
* See the cool thing created by Jeffery Heer where all the settings of the page were recorded with the comment. -- Ryan Baker, DS Team Mtg 5/23/2008

==== Dataset Discussion - Capture data-integrity issues ====
* As a stakeholder in the DataShop project, I want to capture and publicize the data-integrity issues discovered with data sets so that data is better documented (and so we've fulfilled a promise to our funders to better document data). -- Ken Koedinger, Team Meeting, 8/15/2009
* As a user of DataShop, I want to discuss datasets and have that discussion attached to the dataset so that others can comment and better understand any data-integrity issues I've found.

==== Linking to internal pages ====

* It would be handy if a link can be saved to any dataset, sample, page in the DataShop. -- [[Alida]], 10/18/2007
** Currently, https://pslc-qa.andrew.cmu.edu/datashop/DatasetInfo?datasetId=793 works if you are already logged in.

==== Have a link from the DataShop to the Theory Wiki (Dataset to Project Page) ====

* Can we link from the dataset to the project page on the Theory Wiki? In the pipeline have a clickable link to the project page (make project name clickable). -- Michael Bett, ET Mtg 11/14/2007
** Link to a dataset directly? Is that obvious to users? Click on dataset link -> log in -> redirected back to dataset. -- Brett Leber

== Data Format ==
==== Capture the question prompt and answer choices the student chose from ====
* As a researcher, I want to be able to identify steps based on the question prompt (not the difficult-to-understand step names that come from the selection and action of my tutor) so that I can analyze the data more easily. For multiple-choice questions, I also want to be able to see all of the choices that were available to the student. -- Eli Silk, February 1, 2011 (meeting with Brett and Ross Higashi of the FIRE project)
** Near-term solution is to create a table locally that maps steps to prompts
** Another solution is to ask CTAT team to modify their Flash components so that they log this information as custom fields
** Long-term solution is to make these fields into standard fields in the tutor message format
** In the email thread "Cognitive Tutor Development and Evaluation Requests", there is support for this general idea from Ken Koedinger, Albert Corbett, and Christian Schunn.
** Ken added that "Ideally, we may want to store any images that the student can see and where they reside on the screen ..."

== Data Modeling ==
=== Non-KC Modeling ===

==== Automatic Distillation ====
* As an educational data miner wishing to develop a machine learned model with PSLC data, I would like to be able to automatically distill data features (e.g. custom fields) commonly used in past educational data mining research for a new data set (see, for instance, Baker, Corbett, Roll, & Koedinger, 2008 in UMUAI) -- Ryan Baker, Summer 2008, Startup Memo
** Could be implemented as a plug-in
* Also interested in this feature idea. -- Dan Franklin, Oct 2008

==== Upload model and apply it to new data set ====

* As an EDM researcher, I would like to take a model, expressible as a linear formula on DataShop fields or as a simple code procedure (e.g. Bayesian Knowledge Tracing, which Ryan has code for), and apply it to a new data set, so that I can ... -- Ryan Baker, Sept 2008
* Also interested in this. -- Maxine Eskenazi, Sept 2008
* May work best as a plug-in
** Code to display GUI to choose which data sets to use, calls model code, re-import to DataShop
** Good to have a way to apply many models, as soon as you import a data set
* Phil has an idea that maybe fits within this one. Please move if there's a better category. -- Brett Leber
<blockquote>This [''transaction? kc? --ed.''] relabeling is really mostly about enabling modeling in DataShop, right? With this in mind, I think that it is actually a higher priority to have model alternatives in DataShop.... E.g. Investigators should be able to give you chunks of Java code according to a certain specification, and DataShop should then be able to run these over datasets (perhaps after a certain series of QA occurs according to an SOP) when the investigator clicks some button in DataShop.... Obviously this is a much larger project than adding columns, but it is also much more important in my mind. ''-- Phil Pavlik, email to Brett on 1/14/2009''</blockquote>
* Examples:
** Example: running gaming detector in multiple tutors and comparing gaming frequencies
** Example: applying Bayesian Knowledge Tracing to a new data set from the same LearnLab
** Example: applying Ben Shih's models to many data sets. Note that Ben Shih should be included in design of this feature; he is interested, and has a lot of good ideas.

==== Add Different Predicted Values ====
* Would also like to add statistics, different predicted values than what LFA produces. -- Ken Koedinger, ET Meeting, 10/10/2007

==== Bayesian Knowledge Tracing ====
* Bayesian Knowledge-Tracing built into DataShop like LFA is. -- Ryan Baker, Startup Memo, Summer 2008
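For reference, a sketch of the standard Bayesian Knowledge Tracing update for one student on one KC. It produces the predicted probability of a correct answer before each observed attempt. The four parameter values are made-up defaults, and fitting them (for example by EM or grid search) is not shown; a built-in version would run this per student-KC sequence.

```python
def bkt_trace(observations, p_init=0.3, p_transit=0.1, p_guess=0.2, p_slip=0.1):
    """Return P(correct) predicted before each observation (1 = correct, 0 = not)."""
    p_known = p_init
    predictions = []
    for correct in observations:
        # Known and not slipping, or unknown and guessing.
        predictions.append(p_known * (1 - p_slip) + (1 - p_known) * p_guess)
        # Bayesian posterior that the KC was known, given this observation.
        if correct:
            evidence = p_known * (1 - p_slip)
            posterior = evidence / (evidence + (1 - p_known) * p_guess)
        else:
            evidence = p_known * p_slip
            posterior = evidence / (evidence + (1 - p_known) * (1 - p_guess))
        # Chance to learn the KC at this opportunity.
        p_known = posterior + (1 - posterior) * p_transit
    return predictions
```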

==== Richer statistics for KC modeling ====

* In addition to the model stats and estimates generated for learning factor models, we should also create difficulty factor models (i.e., ones without the "Slope" parameter). The latter are particularly relevant for the Unique-Step model, where the slope parameter is meaningless (but still counts against the BIC value). -- Ken Koedinger, Email "new feature request", 1/22/2009
** Relatedly, we should report significance values on the Slope parameters -- that is, when is the Slope significantly different from 0.
** The KC models page perhaps should also report the log-likelihood and number of parameters (in addition to BIC) and leave out AIC. We might also consider other metrics of model generality, like the "adjusted R2" (if I have this name right -- Joe Beck mentioned in the Assistments meeting yesterday).
** These changes will be part of meeting the CMDM goal of improvement in (or at least demonstrate acceptability of) the cognitive models in 90+ units in our LearnLab courses (or affiliates).
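A minimal sketch of one way to report slope significance, assuming the fitting code exposes a standard error for each slope estimate (the function name slope_wald_test is hypothetical, not part of DataShop). It uses a two-sided Wald test of the slope against 0 with the normal approximation:

```python
import math


def slope_wald_test(estimate: float, std_error: float) -> tuple:
    """Two-sided Wald test of H0: slope == 0; returns (z, p_value)."""
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    z = estimate / std_error
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p = slope_wald_test(0.10, 0.04)  # z = 2.5, p is about 0.012
```

A slope would then be flagged as significantly different from 0 when p falls below the chosen threshold (for example 0.05).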

=== KC Modeling ===

==== LFA/AFM: Check if enough memory using formula ====
* The LFA/AFM code could calculate how much RAM would be needed to run the algorithm on a given skill model using the formula provided by Hao. This formula is based on the number of transactions, number of students and number of skills. Right now it will not schedule itself to run on a model with over 300 skills, though there is a manual override. [[User:Alida|Alida]] 13:35, 29 November 2010 (EST)

==== Create KC Models through Web Services ====
* For John Stamper's CMDM project, it would be nice to automatically update KC Models through web services. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Automatically discovering new KC model ====

* Would it be possible to run some code (perhaps Hao's KC model selection code, perhaps something else generated by the CMDM thrust) to find a new best KC model? -- Vincent Aleven, Sept 2008
* As a learning sciences researcher, I would like DataShop to discover a new/better KC model for me.
* Could be done as a plug-in

==== Generate new KC Models with LFA ====
* Not sure who asked for this.
* It would be nice to generate new KC Models with Hao's LFA code
* Would need to specify factors.
* Ideas on where this could run?
** On a separate server? Request it to be run, specify duration. Have separate server queue up requests, email user when done.
** In Java Applet on client machine? -- Phil Pavlik

==== Same Skill Twice on Same Step ====
* Would like to be able to apply the same skill to a step twice during a KC Model Import. -- Ken Koedinger, email, 2/4/2009

==== Save KC Model Import Files ====
* KC Model Import - save the file used to create the KCMs in case we need to recreate them. -- Ken Koedinger, email 3/4/2009

==== Log Likelihood and MAD ====
* Log Likelihood, MAD (mean absolute deviation) problem, MAD step (store and show) -- Hao Cen
** This is a variation on "Richer statistics for KC modeling" above. Probably should be merged. - Ken Koedinger
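A sketch of both statistics for a model that predicts the probability of a correct first attempt, assuming MAD here means the mean absolute difference between the observed 0/1 outcome and the predicted probability. Computing it over rows grouped by problem or by step would give the "MAD problem" and "MAD step" variants; the function names are illustrative:

```python
import math


def log_likelihood(observed, predicted):
    """Bernoulli log-likelihood of 0/1 outcomes given predicted P(correct)."""
    if len(observed) != len(predicted):
        raise ValueError("sequences must have equal length")
    return sum(math.log(p) if y == 1 else math.log(1.0 - p)
               for y, p in zip(observed, predicted))


def mean_absolute_deviation(observed, predicted):
    """Mean absolute difference between 0/1 outcomes and predicted P(correct)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(abs(y - p) for y, p in zip(observed, predicted)) / len(observed)


obs = [1, 0, 1]
pred = [0.8, 0.3, 0.6]
ll = log_likelihood(obs, pred)            # ln 0.8 + ln 0.7 + ln 0.6
mad = mean_absolute_deviation(obs, pred)  # (0.2 + 0.3 + 0.4) / 3 = 0.3
```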

==== Better naming for KCs in auto-generated Unique-Step KCM ====

As a researcher, I want the KCs in the Unique-Step model to have better names than KC1, KC2, etc, so that I can easily tell which generated KCs go with which unique step.

* Hui Cheng (Email 1/20/2009), Ken Koedinger (Email 1/22/2009)
* Could you put this in your new feature request list: could the “Unique-step model” be better labeled than just “KC1”, “KC2”, etc.? For example, for Assistments data, you could use part of the “Step Name”. -- Hui Cheng, Email 1/20/2009
* But, anything is better than "KC". -- Ken Koedinger, email, 1/22/2009
<blockquote>A simple alternative, that preserves uniqueness and addresses length, is to concatenate: 1) the first K letters of the step name 2) a unique numerical increment (same as the "3" in "KC3"). Note that (2) guarantees uniqueness just as it does in the current "KC<num>" scheme. Or perhaps better given that step names are often scoped within problems, is to concatenate: 1) the first L letters of the problem name 2) the first M letters of the step name 3) a unique numerical increment (just like the "3" in "KC3") I think K or L+M should be as big as possible without making the KC names indistinguishable (because they run off the right margin) in the KC list on the Learning Curve and other pages. </blockquote>
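A minimal sketch of the second scheme (problem prefix, step prefix, running number). The function name, the default prefix lengths and the "-" separator are assumptions chosen for illustration; K, L and M would be tuned to the width of the KC list:

```python
def unique_step_kc_names(steps, problem_chars=10, step_chars=15):
    """Name one KC per unique (problem_name, step_name) pair.

    Each name joins the first `problem_chars` characters of the problem
    name, the first `step_chars` characters of the step name, and a
    running number; the number keeps names unique, as in "KC1", "KC2".
    """
    names = []
    for number, (problem, step) in enumerate(steps, start=1):
        names.append(f"{problem[:problem_chars]}-{step[:step_chars]}-{number}")
    return names
```

For example, with made-up names, unique_step_kc_names([("EQUATION-1", "divide-both-sides")], 8, 6) yields ["EQUATION-divide-1"].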

==== Visualize Learning Curve Split ====
* Be able to visualize a learning curve split into 2 based on a specification of a subset of problems. -- Albert Corbett, Math CCM, November 2008
* Harder: make me a new KCM out of it. -- Ken Koedinger, Team Mtg, Dec 5, 2008, while trying to describe Albert's request to Alida

==== Statistical Significance ====
* Can DataShop determine if the difference between conditions or learning curves is statistically significant? -- general theme at workshop, probably mentioned by Bob Hausmann in his talk, Winter Workshop 1/23/2008
** Can't do it yet in DataShop, but I can show you how to do it in R (or SPSS...) after you've exported the data -- export the "student-step rollup" rather than the whole transaction table. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** This comment perhaps belongs (exists?) elsewhere: The current student-step rollup export (really, all exports) should be such that I can immediately load it into R (and other packages) without error. Now errors occur, for instance, because there are "#" characters in the variable names in the student-step rollup. This is currently a road-block for helping folks like Bob do the analyses they want to do. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** See [[Condition in Student-Step Rollup]] -- [[User:Alida|Alida]] 10:35, 17 September 2009 (EDT)

==== Split and Merge Skills ====

* Hand searches through a p-matrix for a dataset to split and merge skills. (Pie in the Sky) -- Ken Koedinger, Team Mtg, 02/22/2008

==== Notes on new KCMs ====

* It would be good if I could add a note to a KC model that was newly imported. -- Noboru Matsuda, email, Nov 19, 2009

==== Display Learning Curves Grouped by Interestingness ====

* The page displaying all the learning curves today seems to be alphabetically sorted by KC name, which is not necessarily meaningful.
* An alternative presentation is to group the curves into 4 sets, breaking up the page. Set 1 has curves that contain significant spikes, and therefore seem to be "low-hanging fruit" for purposes of breaking up into KCs. Set 2 has curves with few spikes, but they have a long X axis, suggesting that students are presented with too many opportunities to acquire those KCs. Set 3 has the "good" curves, i.e., nicely decreasing curves that are not too long. Set 4 is "other". -- [http://www.pitt.edu/~goldin Ilya Goldin] 7 December 2009

==== Order KC models according to AIC====
* This is based on results from multiple dataset analyses that compared AIC, BIC and loglikelihood to cross validation RMSE. -- Mimi McLaughlin, 2/9/2011
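A sketch of ordering fitted KC models by AIC, using the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where ln L is the maximized log-likelihood, k the number of parameters and n the number of observations. The class and field names are hypothetical:

```python
import math
from dataclasses import dataclass


@dataclass
class KcModelFit:
    name: str
    log_likelihood: float  # maximized log-likelihood, ln L
    n_params: int          # k, number of estimated parameters
    n_obs: int             # n, number of observations in the fit

    @property
    def aic(self) -> float:
        return 2 * self.n_params - 2 * self.log_likelihood

    @property
    def bic(self) -> float:
        return self.n_params * math.log(self.n_obs) - 2 * self.log_likelihood


def order_by_aic(fits):
    """Return the fits best-first, i.e. lowest AIC first."""
    return sorted(fits, key=lambda fit: fit.aic)


# Made-up numbers, for illustration only.
fits = [KcModelFit("Unique-Step", -5000.0, 400, 10000),
        KcModelFit("Default", -5200.0, 30, 10000)]
ranked = [fit.name for fit in order_by_aic(fits)]  # ["Default", "Unique-Step"]
```

Here the Unique-Step fit has the higher likelihood but pays for its 400 parameters, so it ranks below the smaller model.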

====Use log of opportunity count for AIC and BIC calculations====
* We compared using the log of opportunity count to whole number opportunity count in multiple datasets. We found the results for log of opportunity count to be consistently better, though small. -- Mimi McLaughlin, 2/9/2011

== Developer Requests ==

==== Store Converted Date and Converter Info ====

As a DataShop administrator, I'd like to see the converter information (version and date) stored in the database, so that I do not have to store that data manually in the Additional Notes field each time I load a dataset. -- Kyle, 8/5/2008

* DTD new fields:
** Store conversion and converter information in database (anything else?)
*** conversion_time
*** converter_info
* Email from Octav, 10/5/2007

I see there's also a note about the converter version in the Dataset
Info. Which is good, but it seems it's taken from the directory name
when I submitted the set. I don't know how reliable that is.  :-) It
would be better if it's taken from the new converter info field.

==== Convert from XML to tabbed-delimited format ====

* If users find the export format valuable, then being able to convert from XML to the export format could let them view the data in Excel, where blanks in the Selection column are easier to spot. -- Jonathan Sewall, ET Mtg, 10/10/2007

==== Plug-ins (general issues) ====

* Please please support Ruby on Rails. -- Ben Shih, December 2008

==== Create UI to grant DataShop user roles ====

* Already tedious.
* Alida, User Meeting AAR on December 9, 2009

== Help ==

==== Specialize Label of Help Button ====
* Since this help is better than in most applications, it should say more than just 'Help'. -- Ken Koedinger & Ryan Baker, Team Mtg, May 30, 2008
** Ideas:
*** 'Page Help'
*** 'Help with this page'
*** 'Help with Learning Curve page' (Ken's favorite)
*** 'Help with this tool'

== Home Page ==

==== Redesign the Home Page ====

* In the menu of data sets at the top, include the N (=20 or as many as fit on the screen?) data sets that I have visited starting with the ones I've visited most recently. --[[User:Koedinger|Koedinger]] 16:16, 16 September 2009 (EDT)
** As of today, the menu shows the last 10 data sets, most recently visited at the top. I think the feature is good enough, but let's ask Ken --[[User:Bleber|Bleber]] 10:56, 6 August 2010 (EDT)
* There needs to be a better ordering for the datasets (DS364)
* Maybe a search to filter the list of datasets since the list is so long. -- Brett Leber, 6/14/2007
* Going back to the home page always goes to 'My Datasets' (DS313)
* Maybe show more high level stats on this page, like how many transactions, students, skill models
* Allow users to post and share project documentation (files, papers, other meta-info) -- Ruth Wylie, suggested during meeting on 8/4/2010. She had a file that is relevant to multiple datasets in her project.

== Import ==
== Miscellaneous ==

==== Analyses by LearnLab ====

* Organize data by LearnLab, not by "data set". -- Ryan Baker, Aug 2008
* Also: Bob Hausmann, Sep 2008; Maxine Eskenazi, Sep 2008
* Essentially, current data sets become samples, but the top-level unit is the LearnLab. You can take every data set in a LearnLab together as a sample.
* Implies being able to run analyses across data sets, and export multiple data sets together; to create multi-data set samples
* As a user of DataShop, I would like to look at learning curves for all Algebra data together (for example), or export all Algebra data
* Important long-term, but is a lot of work -- in particular, we need to solve scalability issues first.

==== Save Settings Between Sessions ====
* It would be useful if DataShop could save settings between sessions. -- Bob Hausmann, User Meeting, 2/1/2008
** "I do a lot of redoing the same steps" (e.g., set cutoffs, select a KC model, select students).

==== Multiple steps per transaction ====

* Needed so that we do not have to create multiple transactions for the same actual action for Andes logs. -- Kurt van Lehn, Feb 2007

==== Demographic data ====

* This has been mentioned by NSF visitors, AB, ESL, and some researchers.
* Also mentioned at Winter Workshop 1/23/2008.
** Derek/Sue-mei: Student background information not in DataShop. Would like to see a student or set of students from a particular demographic, and view them across datasets!
* Note that Gail added demographic data to Additional Notes field on the Dataset Info page for many datasets. The idea here is to put that data into the database somewhere.

==== Single Sign On ====
* Michael Bett, email, 10/8/2007
* It would be nice if the following services have a single login account/password:
*# Theory Wiki
*# Learnlab.org
*# ESL's OSS
*# DataShop

==== Reveal unanonymized student IDs ====

As a researcher/PI performing research assistant tasks, I want to see easily the unanonymized student IDs of subjects in DataShop so that I can email my subjects telling them when to use my system.

* Ruth Wylie, July 3, 2008
* As the honest broker of the PSLC data, I have promised to not reveal the student IDs and to protect the identity of the students. This is part of the DataShop IRB. Therefore, I do not see this request as possible. [[User:Alida|Alida]] 09:53, 4 September 2009 (EDT)
** At some point, some coordination would be good with OLI's Digital Dashboard project that Marsha Lovett (and the OLI team and sometimes me) is working on. This is meant to provide usage information more quickly to instructors. It could also perhaps be used by researchers (with the right IRB rights) in situations like Ruth's.

==== Knight Timeline ====
* Developed by Andrea Knight, 2004

==== Buggy Skills ====
* Ken Koedinger, prototype walkthrough 9/11/2006

==== Confusion Matrix ====
* Brian MacWhinney, prototype walkthrough 9/11/2006

== Navigation Bar ==
==== Filter KCs by Name ====

As a researcher working with KCs, I want to filter KCs based on their names, so that I can...
* Vincent Aleven, Email, 2/3/2008
* "Since Alida said you cannot have two mechanisms for putting together your KC set (i.e., cannot have both the selecting-by-clicking and selecting-by-filtering), I would probably opt for the latter."
* Alida: I thought Vincent mentioned that he'd like to select which KCs are in a set by filtering on the name. Example: Include KCs with '*reason*' in the name and exclude KCs with '*given*' in the name.
* This could be an addition to our v3.0 KC-selection mechanism--filter by name.
* Vincent, Email, 5/6/2009: Expressed another need for this feature. Could just allow for a wider area and longer list so that more items can be checked at once. The number of characters we show right now is not enough because in many cases that number of characters is the same across many of the skills. Reference data set: Geometry CWCTC 2005-2006

* Status: Design Started
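A sketch of the include/exclude filter from Alida's example, using shell-style wildcards via Python's fnmatch module. Case-insensitive matching and the function name are assumptions:

```python
from fnmatch import fnmatchcase


def filter_kcs(kc_names, include=("*",), exclude=()):
    """Keep KC names that match some include pattern and no exclude pattern.

    Patterns are shell-style wildcards such as "*reason*"; both names and
    patterns are lower-cased first, so matching is case-insensitive.
    """
    def matches(name, patterns):
        return any(fnmatchcase(name.lower(), pattern.lower()) for pattern in patterns)

    return [kc for kc in kc_names
            if matches(kc, include) and not matches(kc, exclude)]


# Hypothetical KC names.
kcs = ["angle-sum-reason", "Given-angle-reason", "find-area"]
selected = filter_kcs(kcs, include=["*reason*"], exclude=["*given*"])  # ["angle-sum-reason"]
```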

==== Facebook-style KC Selection ====

As a researcher working with KCs, I want to select KCs based on the learning curve thumbnail, so that I can see quickly which ones I'm interested in exploring more deeply.

* Feature already designed for v3.0, not implemented due to time constraints.
* Agreed this would be really useful. -- Kirsten Butcher, User Mtg, 1/31/2008
* Status: Guestimate: 20 days, need to revisit requirements document

==== Feedback after clicking a large sample on a large dataset ====

As a user of DataShop (first-time or not), I want some feedback and the ability to cancel after I do something that might take a long time (e.g., clicking "All Data" on a large dataset) so that I do not get stuck.

* Part of the Susan Goldman story
* After clicking a sample for a large dataset, there is no "Loading..." text, no feedback that the click was even registered by the app (besides the sometimes busy cursor and small browser "loading" text), nor the ability to cancel the action. 
* We will always have similar problems even if performance is improved, so providing feedback and the opportunity to cancel is critical.

==== Save Button in Problem Navigation Box ====

* Save buttons in the sidebar. -- Ken Koedinger, Mtg 2006
** Could also put one in the Problem selection box in the sidebar.

==== Make Nav Bar Wider ====
* Make the Sample name and description fields much wider. -- John LaPlante, email 7/10/2007

== New Visualizations/Reports ==

==== Student-KC Rollup ====
As a researcher, I want to see KCs rolled up by student, so that I can ...

* Vincent Aleven, User Mtg, 1/29/2008
** By Student-KC would be more useful than by Student-Problem
** Example: # Steps asking for a hint or error or what proportion had help
** How often bottom out hint occurs

==== Instructor Reports ====
* Phil said he received a lot of positive reactions to providing reports on units for instructors. -- Phil Pavlik, ET Mtg 10/10/2007

==== Manage Authorizations/Projects Page ====
* Lisa Anthony, email 10/23/2007
* Allow PI to change permissions on the datasets.
:"Actually, I couldn't see how to change permissions on the datasets from the website. Is this possible? If not, it might be a nice feature..."

==== Calculate Time Spent on Different Study Activities ====

As a researcher, I want to know how much time, on average, students spend on study activities, so that I can ...

* Bruce McLaren, Email, 4/7/2009
<blockquote>
For my most recent stoich study, Shawn and I are interested in calculating timing information such as: (a) how long students spent, on average, working on individual tutors (b) how long students spent, on average, on all items in an intervention (c) how long students worked, on average, on post-tests. 

Timing information is very commonly required for studies, and can be calculated from DataShop logs relatively easily, so even if we don't have it, might be worth considering. (And we don't want to re-invent the wheel, if you already have it or are planning it...)
</blockquote>
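One rough way to compute average time per unit from transaction timestamps, assuming each transaction carries a student, a unit label (a tutor, an intervention item, a post-test) and a timestamp. The function is hypothetical, and measuring from first to last timestamp counts idle gaps as working time:

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def average_time_per_unit(transactions):
    """Average minutes per student spent in each unit.

    `transactions` is an iterable of (student, unit, timestamp) tuples.
    One student's time in a unit is last timestamp minus first timestamp.
    """
    stamps = defaultdict(list)  # (student, unit) -> timestamps
    for student, unit, ts in transactions:
        stamps[(student, unit)].append(ts)
    per_unit = defaultdict(list)  # unit -> minutes, one entry per student
    for (student, unit), times in stamps.items():
        per_unit[unit].append((max(times) - min(times)).total_seconds() / 60)
    return {unit: mean(minutes) for unit, minutes in per_unit.items()}


t = datetime(2009, 4, 7, 10, 0)
log = [("s1", "tutor-1", t), ("s1", "tutor-1", t.replace(minute=20)),
       ("s2", "tutor-1", t), ("s2", "tutor-1", t.replace(minute=10))]
averages = average_time_per_unit(log)  # {"tutor-1": 15.0}
```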

==== Incorrect Step Duration and Hint Step Duration ====

As a researcher, I want to be able to see total step duration if the student's first attempt was an incorrect attempt, and total step duration if the student's first attempt was a hint request, so that I can do some analyses that I can't do with "Error Step Duration".

* Bob Hausmann, email, 11/11/2008.
* Updated title and story with 'step duration' instead of 'time'. -- [[User:Alida|Alida]] 10:36, 4 September 2009 (EDT)
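A sketch of how the two requested columns would split the existing Error Step Duration, assuming the first attempt at a step is recorded as "correct", "incorrect" or "hint" (labels assumed here; the function name is hypothetical):

```python
from typing import Optional, Tuple


def split_error_step_duration(
    first_attempt: str, step_duration: Optional[float]
) -> Tuple[Optional[float], Optional[float]]:
    """Return (incorrect_step_duration, hint_step_duration) for one student-step.

    At most one of the two is filled in; None stands for a blank cell.
    """
    if first_attempt == "incorrect":
        return step_duration, None
    if first_attempt == "hint":
        return None, step_duration
    return None, None  # correct first attempt (or unknown): both blank
```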

==== Grading ====

[[Grading]]

====Display number of steps and number of observations for skills====
* How and where to be determined by developers. -- Ken (entered by Mimi)

== Reports ==
=== Dataset Info ===

==== Pointers to Hard-copy Data ====
* Brett van de Sande, NSF Site Visit, 5/28/2008
* Pointers to hard-copy data such as paper tests and/or homework.  Include contact information.  It doesn't seem to make sense to scan a whole filing cabinet of paper if no one wants to look at it.  And any secondary researchers don't know about the filing cabinet to ask for it.

==== Sort Problem Breakdown Table ====

* Would like the ability to sort the table by clicking on the column headings of the Problem Breakdown Table on the Dataset Info Tab. -- Bruce McLaren, User Mtg, 11/5/2007

==== Rename dataset ====

As a researcher, I want to rename my dataset so that it makes more sense to other people. I also want to make sure the dataset doesn't become polluted later by new data not associated with my study.
* Ruth Wylie, July 3, 2008
* There are reasons she would want to do this (current name is worthless, other researchers might try her tutor and pollute her data) but also reasons for not doing it (log more data later).
* There are risks in changing a dataset name that might not be apparent. For example, if you want the new data in the same dataset. [[User:Alida|Alida]] 10:27, 4 September 2009 (EDT)

==== Average time per problem, average number of problems, total number of sections ====

In addition to showing student hours per dataset, it would be useful to know the average time spent per problem, average number of problems completed, and the total number of sections.
* Noboru Matsuda, June 18, 2010

=== Error Report ===

==== View By Student ====

* Would like to see what a couple of students saw in the feedback. -- Marsha Lovett, 10/11/2007

==== Export ====
* I would like the ability to export this data. -- John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
* Also interested in this feature idea. -- Bruce McLaren, User Mtg, 11/5/2007

==== Sort ====

* Was planned for but not implemented in v2.1 (estimated to be a 4 day task)
** By Correctness %, starting with the least correct
** By Hints %
** Step (or KC if view by KC)
** Number of Students
* Ability to sort problems by their average experienced position within the curriculum. -- Ken Koedinger, 02/16/2007
** Which problem did students most often experience first, then the one experienced second most often, ...
* Order steps by the order they typically are executed by students. -- Ken Koedinger, email 11/7/2008
:"Searching through the steps in a problem to get a sense of what is going on is currently hard because the steps are ordered alphabetically, not by the order in which most students did them. While not all students do all steps in the same order, there is some regularity there. It would be quite useful if the steps could be ordered in a "typical order". This could be accomplished by using the time stamps (of the first (correct?) transaction?) for each step to determine rank order of each step for each student in a problem and then for each problem average the rank order of each step across all students. Then arrange the steps in the Error Report by their average rank order -- that is, roughly speaking, the step that is most likely to be first across students (closest on average to first) goes first, the step with the next lowest rank goes next, etc."
* The capability to count the number of errors of each message type and sort in different ways, for instance by all errors that had no messages. -- Bruce McLaren, email 10/22/2007
:"This is an error analysis I recently did in Excel, using pivot tables, that might be handy if in the DataShop. This one is very important for tutors because the errors that occur most frequently, yet don't elicit messages to the students, are good candidates to become errors with feedback."
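A sketch of the typical-order procedure described in Ken's email, for a single problem. It ranks each student's steps by the time of that student's first transaction on the step (the email leaves open whether only correct transactions should count; this sketch uses any transaction) and sorts steps by average rank. Names are illustrative:

```python
from collections import defaultdict
from statistics import mean


def typical_step_order(transactions):
    """Order one problem's steps by their average rank across students.

    `transactions` holds (student, step, timestamp) tuples for a single
    problem. Rank 1 is the step a student touched first.
    """
    first_seen = {}  # (student, step) -> earliest timestamp
    for student, step, ts in transactions:
        key = (student, step)
        if key not in first_seen or ts < first_seen[key]:
            first_seen[key] = ts
    by_student = defaultdict(list)
    for (student, step), ts in first_seen.items():
        by_student[student].append((ts, step))
    ranks = defaultdict(list)  # step -> its rank for each student who did it
    for visits in by_student.values():
        for rank, (_, step) in enumerate(sorted(visits), start=1):
            ranks[step].append(rank)
    # Lowest average rank first; ties broken alphabetically by step name.
    return sorted(ranks, key=lambda step: (mean(ranks[step]), step))


# Hypothetical log: (student, step, seconds since problem start)
log = [("s1", "A", 1), ("s1", "B", 2), ("s1", "C", 3), ("s1", "A", 9),
       ("s2", "B", 1), ("s2", "A", 2), ("s2", "C", 3),
       ("s3", "A", 1), ("s3", "C", 2), ("s3", "B", 3)]
order = typical_step_order(log)  # ["A", "B", "C"]
```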

==== Show more than 500 problems ====

In the error report, can we see more than 500 problems? See set "Cog Model Discovery Experiment Spring 2010"
2/2888 selected.
(Showing the first 500)
-- Ken via email on 2/15/2011

=== Export ===

==== Last attempt on step? column for transaction format ====

* Include a new column that shows whether the row is the last attempt on a step for a student or not. Could be 0 or 1 as value. Helpful for researchers who are doing grading of data. Transaction format. --Vincent Aleven, CTAT mtg 11/5/2010
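A sketch of computing the proposed 0/1 column, assuming the transaction rows are already in time order and a step is identified by whatever key the export uses (for example problem plus step name). The function name is hypothetical:

```python
def last_attempt_flags(rows):
    """Return 1 for each row that is the student's last attempt on its step, else 0.

    `rows` is a list of (student, step) keys sorted by transaction time.
    """
    last_index = {}
    for i, key in enumerate(rows):
        last_index[key] = i  # later rows overwrite earlier ones
    return [1 if last_index[key] == i else 0 for i, key in enumerate(rows)]


flags = last_attempt_flags([("s1", "x"), ("s1", "x"), ("s2", "x"), ("s1", "y")])  # [0, 1, 1, 1]
```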

==== Elapsed Time ====

* Include the elapsed time in preview and transaction export. It is more valuable than the transaction time as an absolute reference. Possible to keep both. --Ken Koedinger, Team Mtg 04/18/2008

==== SQL Format ====

* Option to export as an SQL file. -- Ken Koedinger, 03/26/2007, also brought up in June ET Meeting
** Ability to export an SQL dump of a dataset. --Kyle Cunningham, 04/03/2007

==== Specify Character for Blanks ====

* Ability to specify what character if any is used for blank. --Ryan Baker, email 8/9/2007
:"Not all tools handle TABTABTAB correctly on import. The period '.' is used to mean missing data in most stats packages. The word 'BLANK' is used in some other ones." Not an issue for Ryan, as he wrote a preprocessor to convert blanks.
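A sketch of a tab-delimited writer with a configurable blank marker, using Python's csv module; the function name and the "." default are assumptions:

```python
import csv
import io


def write_tab_delimited(rows, blank="."):
    """Serialize rows as tab-delimited text, writing `blank` for empty cells."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in rows:
        # Only "" and None count as blank; values such as 0 are written as-is.
        writer.writerow([blank if cell in ("", None) else cell for cell in row])
    return out.getvalue()


text = write_tab_delimited([["Student", "Step", "Outcome"], ["s1", "", "CORRECT"]])
```

Passing blank="BLANK" would produce the marker some other packages expect.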

==== Opportunity (at KC) Count in Transaction Export ====
* Include the opportunity count in the transaction export (it's only in the student-step rollup) -- Noboru Matsuda, 10/08/2009
** Would be repetitive
** Would need one Opportunity column per KC

==== Export only rows that have KCs tagged ====
* Is it possible, when exporting a dataset, to include only the [transaction?] rows that have knowledge components tagged? And the same goes for the KC models export: only include the items that have KCs tagged? -- Hui Cheng, 01/19/2010
** We have the inverse of this option on the Performance Profiler, 'Include steps without a knowledge component', and with the Student-Problem export, 'Include Steps without KCs'.

==== Include Custom Fields in Student-Step Rollup ====
* Is it possible to include the custom field in the datashop [step] export? -- Hui Cheng, 03/01/2010
** "I am thinking about the student-step level export. We have a student from Statistics wanting to use assistment data from DataShop. She wants to have problem set id (=curriculum id in DataShop). For Assistment data, we decided to put problem set id in the custom field because in the Assistment a lot of problems are repeated in different problem sets. She needs data at the student-step level." (Hui email)
* Vote from Phil Pavlik too -- see [[Collected_User_Requests#Include_transaction_custom_fields_in_web_services_student-step_export|Phil's comment under Web Services]]

==== Don't duplicate rows in student-step format when not showing KCs ====
* If the checkbox to show knowledge components is not checked, maybe it doesn't make sense to show rows more than once if they have more than one KC associated with the step. (Mimi (and Brett) stumbled on this. 8/16/2010)

==== Student-Step Rollup include Success Column ====
* Step rollup, 1 if correct, 0 if incorrect/hint, blank otherwise call it Success. ~~ Ken Koedinger, DataShop Team Meeting, Oct 22, 2010
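The proposed rule as a small function, assuming first-attempt labels "correct", "incorrect" and "hint" (the labels and the function name are assumptions):

```python
def success_value(first_attempt: str) -> str:
    """Success column for a student-step row.

    "1" if the first attempt was correct, "0" if it was incorrect or a
    hint request, and "" (blank) otherwise.
    """
    if first_attempt == "correct":
        return "1"
    if first_attempt in ("incorrect", "hint"):
        return "0"
    return ""
```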

=== Learning Curve ===

==== Default sort by observation on LCPID ====

* Default sort by observation on LCPID. ~~ Ken Koedinger, DataShop Team Meeting, 10/22/2010

==== Purple Point ====
* Purple Point: if a point on the LC has more than one KC associated with it but you have drilled down to a given KC, then the blue line is off. We could put a purple point that takes this into account.
* Simpler thing: display a warning message that some points in the display are driven by other KCs
* Pearson may be interested
* This was mentioned during the PSLC Summer School 2010.
* For a step with multiple skills, attribute the error only to the skill with the highest overall error rate. ~~ Alida, meeting with Ken, November 18, 2010
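A sketch of that attribution rule: compute each KC's overall error rate (a multi-KC step counts toward every KC it is tagged with), then charge a multi-KC step's error to the KC with the highest rate. Function names and the input shape are assumptions:

```python
from collections import defaultdict


def overall_error_rates(step_rows):
    """step_rows: iterable of (kcs, first_attempt_was_error), one per student-step."""
    errors = defaultdict(int)
    totals = defaultdict(int)
    for kcs, is_error in step_rows:
        for kc in kcs:
            totals[kc] += 1
            errors[kc] += int(is_error)
    return {kc: errors[kc] / totals[kc] for kc in totals}


def attribute_to_hardest_kc(step_kcs, kc_error_rate):
    """Pick the single KC that is charged with a multi-KC step's error."""
    if not step_kcs:
        raise ValueError("step has no KCs")
    return max(step_kcs, key=lambda kc: kc_error_rate[kc])


# Hypothetical KCs and outcomes.
rates = overall_error_rates([(["add"], False), (["add"], False),
                             (["add", "carry"], True), (["carry"], True)])
blamed = attribute_to_hardest_kc(["add", "carry"], rates)  # "carry"
```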

==== Reduce Scrolling ====
* Add a forward and back button to the graph to reduce scrolling. -- John LaPlante, email 7/10/2007

:"The learning curve page could use a forward and back button to cycle through the learning curves. Going through them one by one requires a lot of scrolling."

==== Turn On Point Labels ====

* It would be nice to have the option to turn on point labels. It is nice that I can mouse over a point and view the data but it would be nicer if it appeared automatically. -- John LaPlante, email 7/10/2007

==== Option for Bigger Graph ====

* Allow user to see bigger graph. -- Derek Chan, Winter Workshop 1/23/2008
** Potential solution: enable user to set x, y scale manually

==== LC Normalize Scale of Thumbnails ====
* [[LC Normalize Scale of Thumbnails]]

=== Performance Profiler ===

==== Rename Performance Profiler ====
* John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
** Did not use this report because he thought it had something to do with improving the performance of DataShop itself. This report might have been much better to use than the Error Report, but would still need an export, since using the data in tabular form was still necessary. Note that the pivot tables created were added to the dataset (Pittsburgh Science of Learning Center Stoichiometry Study 1).
* Lisa Anthony, email 8/2008
** Didn't know to go to the report. Export would be useful.
** Needed a better definition of Error Rate with respect to Problem and Unit rows.

==== Export ====
* John LaPlante (see comments in Rename Performance Profiler)
* Lisa Anthony (see comments in Rename Performance Profiler)
* Yes, I too can imagine wanting to export the results of a particular performance profiler output (i.e., to a table) so that I can graph it my own way. --[[User:Koedinger|Koedinger]] 16:24, 16 September 2009 (EDT)

==== Table View ====
* Add option to switch to a table view.
* Columns are: Problem Name, Steps, % incorrect, Incorrect Steps, % hint, Hint Steps, etc. include all values in pop-up.

==== Union of KCs/Problems/Students ====
* Allow user to get the union of KCs/Problems/Students etc so they can compare across samples easier. -- Kirsten Butcher, Winter Workshop 1/23/2008
:"It is difficult to compare performance profiler graphs across samples because the KCs (or problems, or whatever) aren't necessarily in both of those samples."

==== Show Details In Report ====
* Click on bar to see details in report and not just in pop-up. It disappears too quickly. -- Alida, Brett

==== Show More Information in the Graph ====
* Show more information in the graph: [Bruce M, User Mtg, 11/5/2007]
** Had drilled down by a certain skill - skill is not listed in the graph, user has to check the skill list on the LHS to see what skill was selected
* Maybe related to Bruce's "show more info in graph": show the actual value of the range variable. e.g., when range is error rate, show the error rate number somewhere (right now you can see, via mouse-over, the incorrect, hint, and correct percentages, but not the error rate) -- Brett
* Design idea to show # steps incorrect, # steps hint, # steps correct to clarify how the percentages are calculated. -- Alida, Brett

== Sample Selector ==

==== Sub-Samples ====

* Sub Samples would be helpful. -- John LaPlante, email 7/10/2007
:"A nice solution would be to have sub-samples where one property varies. When I'm doing this analysis, I've changed my sample many times, renaming it sometimes, tweaking it to get variations on the data. The samples are really useful but they could help me a lot more with this kind of experimentation."

==== Filter out students ====

As a researcher, I want to filter out test users (including myself) from my data so that I see less noise in the data.

* Ruth Wylie, July 3, 2008
* You can already filter out test users by using a test user id that starts with 'weirdCMUuser_xxx'. Then create a sample that excludes students with a name like 'Test_%'. --[[User:Alida|Alida]] 10:29, 4 September 2009 (EDT)

==== Filter by "Class" ====

As a researcher, I want to create samples based on "class" since class is how I've encoded my conditions.

* Maaike Waalkens, July 30, 2010
* This is what made sense using Mathtutor for tutor delivery.
* What other fields are we missing in the Sample Selector?

==== Filter by "Step" ====

As a researcher, I want to create samples based on "step" since I'm only interested in one particular step.

* Mimi McLaughlin, August 17, 2010

==== Create Sample Automatically ====

* Would it be possible for me to get a random sample from the 'Bridge to Algebra 2006-2007' dataset of 100 students? I am having trouble looking at the data because it takes too long to load, and my adviser thought that was because the dataset was too large. ~~ DataShop User, 10/19/2010
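A sketch of drawing such a random sample of student IDs, which could then be used to define a sample that includes only those students. The function name and the optional seed are assumptions:

```python
import random


def random_student_sample(student_ids, size=100, seed=None):
    """Return `size` distinct student IDs chosen uniformly at random.

    Duplicates in the input are ignored; if there are fewer students than
    `size`, all of them are returned. Pass a seed to make the draw repeatable.
    """
    ids = sorted(set(student_ids))
    rng = random.Random(seed)
    return sorted(rng.sample(ids, min(size, len(ids))))
```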

== Web Services ==

==== Include transaction custom fields in web services student-step export ====

* "I noticed in the web services guide that cfs (which provides the custom fields) is not yet implemented for step roll-up tables. It is marked [coming soon] in the manual... I could write a workaround to pull in the transactions and lookup the custom fields, but I'd really rather not." -- Phil Pavlik, 4/27/2011

==== Use Custom Fields Graphs/Reports ====
In Graphs

As a researcher creating custom fields and assigning values at the transaction level, I want DataShop to perform the aggregation to the step level so that I can do other things with my custom-field variable such as graph it. -- Ryan Baker, mtg w/Alida & Brett, 12/15/2008
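A sketch of the step-level aggregation for a numeric custom field, assuming each transaction has a student, a step and the field's value. Averaging is an assumption; sum, maximum or last value are equally plausible choices depending on the field, and the function name is hypothetical:

```python
from collections import defaultdict
from statistics import mean


def rollup_custom_field(transactions):
    """Average a numeric transaction-level custom field per (student, step).

    Transactions whose value is None (field not set) are skipped.
    """
    values = defaultdict(list)
    for student, step, value in transactions:
        if value is not None:
            values[(student, step)].append(value)
    return {key: mean(vals) for key, vals in values.items()}


rolled = rollup_custom_field([("s1", "x", 1), ("s1", "x", 0), ("s1", "y", 1), ("s2", "x", None)])
```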

Performance Profiler
* Use all discrete variables/custom fields on the left and all the continuous variables on the bottom of the Performance Profiler. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Ad-hoc queries ====

* Allow restricted filtering on steps and transactions as the next web service feature (after CFs), whatever they can filter on in the navigation boxes (User Meeting AAR, December 9, 2009)

==== Sample creation as a web service ====

* Sample creation is still too slow. (User Meeting AAR, December 9, 2009)

 
----
See completed [[DataShop 3.x Features]] 
See on-going [[DataShop 4.x Features]] 
See prioritized [[DataShop Feature Wish List]]
[[Category:Protected]]
[[Category:DataShop]]

Collected User Requests

2011-04-28T16:26:25Z

Bleber: /* Include Custom Fields in Student-Step Rollup */

See prioritized items on [[DataShop Feature Wish List]].

== Annotations ==

==== Annotations on Transaction Level ====
* I have models which can annotate things like: gaming, bored, etc. on the transaction level. -- Ryan Baker, ET Mtg 12/5/2007

==== Annotations on Student Level ====
* Would annotate on student level. -- Ido Roll, User Meeting, 1/19/2009

==== Annotations on Pages ====
* See the cool thing created by Jeffery Heer where all the settings of the page were recorded with the comment. -- Ryan Baker, DS Team Mtg 5/23/2008

==== Dataset Discussion - Capture data-integrity issues ====
* As a stakeholder in the DataShop project, I want to capture and publicize the data-integrity issues discovered with data sets so that data is better documented (and so we've fulfilled a promise to our funders to better document data). -- Ken Koedinger, Team Meeting, 8/15/2009
* As a user of DataShop, I want to discuss datasets and have that discussion attached to the dataset so that others can comment and better understand any data-integrity issues I've found.

==== Linking to internal pages ====

* It would be handy if a link could be saved to any dataset, sample, or page in DataShop. -- [[Alida]], 10/18/2007
** Currently, https://pslc-qa.andrew.cmu.edu/datashop/DatasetInfo?datasetId=793 works if you are already logged in.

==== Have a link from the DataShop to the Theory Wiki (Dataset to Project Page) ====

* Can we link from the dataset to the project page on the Theory Wiki? In the pipeline: a clickable link to the project page (make the project name clickable). -- Michael Bett, ET Mtg 11/14/2007
** Link to a dataset directly? Is that obvious to users? Click on dataset link -> log in -> redirected back to dataset. -- Brett Leber

== Data Format ==
==== Capture the question prompt and answer choices the student chose from ====
* As a researcher, I want to be able to identify steps based on the question prompt (not the difficult-to-understand step names that come from the selection and action in my tutor) so that I can analyze the data more easily. For multiple-choice questions, I also want to be able to see all of the choices that were available to the student. -- Eli Silk - February 1, 2011 (meeting with Brett and Ross Higashi of the FIRE project)
** Near-term solution is to create a table locally that maps steps to prompts
** Another solution is to ask CTAT team to modify their Flash components so that they log this information as custom fields
** Long-term solution is to make these fields into standard fields in the tutor message format
** In the email thread "Cognitive Tutor Development and Evaluation Requests", there is support for this general idea from Ken Koedinger, Albert Corbett, and Christian Schunn.
** Ken added that "Ideally, we may want to store any images that the student can see and where they reside on the screen ..."
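As a minimal sketch of the near-term solution above, a locally maintained two-column tab-delimited table (step name, prompt) could be joined onto exported rows. This assumes the export rows carry a "Step Name" field. The function names and the example step names are hypothetical and are not part of DataShop.

```python
import csv
import io


def load_prompt_map(tsv_text):
    """Read a two-column tab-delimited table (step name, prompt) into a dict."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def annotate_rows(rows, prompt_map, step_col="Step Name"):
    """Return copies of exported rows with a "Prompt" field (blank if unmapped)."""
    return [{**row, "Prompt": prompt_map.get(row[step_col], "")} for row in rows]


# Hypothetical mapping and rows, for illustration only.
prompts = load_prompt_map("q1_textfield_UpdateText\tWhat is the molar mass of H2O?\n")
rows = [{"Step Name": "q1_textfield_UpdateText"}, {"Step Name": "q2_radio_UpdateRadio"}]
annotated = annotate_rows(rows, prompts)
```

Unmapped steps get a blank prompt, so gaps in the local table are easy to spot.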

== Data Modeling ==
=== Non-KC Modeling ===

==== Automatic Distillation ====
* As an educational data miner wishing to develop a machine learned model with PSLC data, I would like to be able to automatically distill data features (e.g. custom fields) commonly used in past educational data mining research for a new data set (see, for instance, Baker, Corbett, Roll, & Koedinger, 2008 in UMUAI) -- Ryan Baker, Summer 2008, Startup Memo
** Could be implemented as a plug-in
* Also interested in this feature idea. -- Dan Franklin, Oct 2008

==== Upload model and apply it to new data set ====

* As an EDM researcher, I would like to take a model, expressible as a linear formula on DataShop fields or as a simple code procedure (e.g. Bayesian Knowledge Tracing, which Ryan has code for), and apply it to a new data set, so that I can ... -- Ryan Baker, Sept 2008
* Also interested in this. -- Maxine Eskenazi, Sept 2008
* May work best as a plug-in
** Code to display a GUI for choosing which data sets to use, call the model code, and re-import the results into DataShop
** Good to have a way to apply many models, as soon as you import a data set
* Phil has an idea that maybe fits within this one. Please move if there's a better category. -- Brett Leber
<blockquote>This [''transaction? kc? --ed.''] relabeling is really mostly about enabling modeling in DataShop, right? With this in mind, I think that it is actually a higher priority to have model alternatives in DataShop.... E.g. Investigators should be able to give you chunks of Java code according to a certain specification, and DataShop should be then able to run these over datasets (perhaps after a certain series of QA occurs according to an SOP) when the investigator clicks some button in DataShop.... Obviously this is a much larger project than adding columns, but it is also much more important in my mind. ''-- Phil Pavlik, email to Brett on 1/14/2009''</blockquote>
* Examples:
** Example: running gaming detector in multiple tutors and comparing gaming frequencies
** Example: applying Bayesian Knowledge Tracing to a new data set from the same LearnLab
** Example: applying Ben Shih's models to many data sets. Note that Ben Shih should be included in design of this feature; he is interested, and has a lot of good ideas.

==== Add Different Predicted Values ====
* Would also like to add statistics and predicted values other than those LFA produces. -- Ken Koedinger, ET Meeting, 10/10/2007

==== Bayesian Knowledge Tracing ====
* Bayesian Knowledge-Tracing built into DataShop like LFA is. -- Ryan Baker, Startup Memo, Summer 2008
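For reference, the standard four-parameter Bayesian Knowledge Tracing update (initial knowledge, transition/learning, guess, slip) can be sketched as below. This illustrates the textbook model only. It is not Ryan's code and not a description of how DataShop would implement it. The function and parameter names are assumptions.

```python
def bkt_trace(observations, p_init, p_transit, p_guess, p_slip):
    """Trace one student's knowledge of one skill with standard BKT.

    observations: sequence of booleans (True = correct first attempt).
    Returns (predicted P(correct) before each observation, final P(known)).
    """
    p_known = p_init
    predictions = []
    for correct in observations:
        # Probability of a correct response at this opportunity.
        predictions.append(p_known * (1 - p_slip) + (1 - p_known) * p_guess)
        # Posterior P(known) given the observed response (Bayes' rule).
        if correct:
            evidence = p_known * (1 - p_slip)
            posterior = evidence / (evidence + (1 - p_known) * p_guess)
        else:
            evidence = p_known * p_slip
            posterior = evidence / (evidence + (1 - p_known) * (1 - p_guess))
        # Chance of learning the skill before the next opportunity.
        p_known = posterior + (1 - posterior) * p_transit
    return predictions, p_known


preds, final_known = bkt_trace([True, False, True],
                               p_init=0.4, p_transit=0.1, p_guess=0.2, p_slip=0.1)
```

Fitting the four parameters per KC, for example by grid search or expectation maximization, is the part that would make this comparable to the LFA integration.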

==== Richer statistics for KC modeling ====

* In addition to the model stats and estimates generated for learning factor models, we should also create difficulty factor models (i.e., ones without a "Slope" parameter). The latter is particularly relevant for the Unique-Step model, where the slope parameter is meaningless (but still counts against the BIC value). -- Ken Koedinger, Email "new feature request", 1/22/2009
** Relatedly, we should report significance values on the Slope parameters, that is, whether each Slope is significantly different from 0.
** The KC models page perhaps should also report the log-likelihood and number of parameters (in addition to BIC) and leave out AIC. We might also consider other metrics of model generality, like the "adjusted R²" (if I have this name right; Joe Beck mentioned it in the Assistments meeting yesterday).
** These changes will be part of meeting the CMDM goal of improvement in (or at least demonstrate acceptability of) the cognitive models in 90+ units in our LearnLab courses (or affiliates).
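The statistics mentioned above follow standard definitions. With log-likelihood ln L, k parameters, and n observations, AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. Dropping the Slope parameters lowers k, and each parameter costs ln n in BIC, which is why a meaningless slope still hurts the Unique-Step model's BIC. The adjusted R² below is the usual linear-regression formula. How it would carry over to a logistic model is left open. The function names are hypothetical.

```python
import math


def fit_stats(log_likelihood, num_params, num_obs):
    """AIC and BIC from a fitted model's log-likelihood (lower is better)."""
    aic = 2 * num_params - 2 * log_likelihood
    bic = num_params * math.log(num_obs) - 2 * log_likelihood
    return {"LL": log_likelihood, "params": num_params, "AIC": aic, "BIC": bic}


def adjusted_r2(r2, num_obs, num_predictors):
    """Adjusted R-squared for a linear regression with num_predictors predictors."""
    return 1 - (1 - r2) * (num_obs - 1) / (num_obs - num_predictors - 1)


stats = fit_stats(log_likelihood=-500.0, num_params=10, num_obs=1000)
```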

=== KC Modeling ===

==== LFA/AFM: Check if enough memory using formula ====
* The LFA/AFM code could calculate how much RAM would be needed to run the algorithm on a given skill model using the formula provided by Hao. This formula is based on the number of transactions, number of students and number of skills. Right now it will not schedule itself to run on a model with over 300 skills, though there is a manual override. [[User:Alida|Alida]] 13:35, 29 November 2010 (EST)

==== Create KC Models through Web Services ====
* For John Stamper's CMDM project, it would be nice to automatically update KC Models through web services. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Automatically discovering new KC model ====

* Would it be possible to run some code (perhaps Hao's KC model selection code, perhaps something else generated by the CMDM thrust) to find a new, better KC model? -- Vincent Aleven, Sept 2008
* As a learning sciences researcher, I would like DataShop to discover a new/better KC model for me.
* Could be done as a plug-in

==== Generate new KC Models with LFA ====
* Not sure who asked for this.
* It would be nice to generate new KC Models with Hao's LFA code
* Would need to specify factors.
* Ideas on where this could run?
** On a separate server? The user requests a run and specifies a duration; the server queues up requests and emails the user when done.
** In Java Applet on client machine? -- Phil Pavlik

==== Same Skill Twice on Same Step ====
* Would like to be able to apply the same skill to a step twice during a KC Model Import. -- Ken Koedinger, email, 2/4/2009

==== Save KC Model Import Files ====
* KC Model Import - save the file used to create the KCMs in case we need to recreate them. -- Ken Koedinger, email 3/4/2009

==== Log Likelihood and MAD ====
* Log likelihood, MAD (mean absolute deviation) by problem, and MAD by step (store and show) -- Hao Cen
** This is a variation on "Richer statistics for KC modeling" above. Probably should be merged. - Ken Koedinger
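Here is a minimal sketch of the two statistics. It assumes each row carries an observed 0/1 outcome ("actual") and a model prediction ("predicted"). It reads "MAD problem" and "MAD step" as the mean absolute deviation grouped by problem and by step. The field names are hypothetical.

```python
import math
from collections import defaultdict


def log_likelihood(outcomes, predictions):
    """Bernoulli log-likelihood of 0/1 outcomes under predicted P(correct)."""
    return sum(math.log(p) if y else math.log(1 - p)
               for y, p in zip(outcomes, predictions))


def mad_by(rows, key):
    """Mean absolute deviation |actual - predicted|, grouped by each row's `key` field."""
    totals, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        totals[row[key]] += abs(row["actual"] - row["predicted"])
        counts[row[key]] += 1
    return {group: totals[group] / counts[group] for group in totals}


rows = [
    {"problem": "P1", "step": "s1", "actual": 1, "predicted": 0.8},
    {"problem": "P1", "step": "s2", "actual": 0, "predicted": 0.3},
    {"problem": "P2", "step": "s1", "actual": 1, "predicted": 0.6},
]
mad_problem = mad_by(rows, "problem")
mad_step = mad_by(rows, "step")
```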

==== Better naming for KCs in auto-generated Unique-Step KCM ====

As a researcher, I want the KCs in the Unique-Step model to have better names than KC1, KC2, etc., so that I can easily tell which generated KC goes with which unique step.

* Hui Cheng (Email 1/20/2009), Ken Koedinger (Email 1/22/2009)
* Could you put this in your new feature request list: could the “Unique-step model” be better labeled than just “KC1”, “KC2”, etc.? For example, for Assistments data, you could use part of the “Step Name”. -- Hui Cheng, Email 1/20/2009
* But, anything is better than "KC". -- Ken Koedinger, email, 1/22/2009
<blockquote>A simple alternative, that preserves uniqueness and addresses length, is to concatenate: 1) the first K letters of the step name 2) a unique numerical increment (same as the "3" in "KC3"). Note that (2) guarantees uniqueness just as it does in the current "KC<num>" scheme. Or perhaps better given that step names are often scoped within problems, is to concatenate: 1) the first L letters of the problem name 2) the first M letters of the step name 3) a unique numerical increment (just like the "3" in "KC3") I think K or L+M should be as big as possible without making the KC names indistinguishable (because they run off the right margin) in the KC list on the Learning Curve and other pages. </blockquote>
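The second proposal above can be sketched as follows. It truncates the problem name to L characters and the step name to M characters, then appends the running number that already guarantees uniqueness in the "KC<num>" scheme. The hyphen separator and the default lengths are assumptions. In practice L and M would be tuned to the width of the KC list on the Learning Curve page.

```python
def unique_step_kc_names(steps, problem_len=10, step_len=15):
    """Name Unique-Step KCs as <problem prefix>-<step prefix>-<running number>.

    steps: list of (problem_name, step_name), in the order the KCs are created.
    The running number alone guarantees uniqueness, as in "KC<num>".
    """
    return [f"{problem[:problem_len]}-{step[:step_len]}-{i}"
            for i, (problem, step) in enumerate(steps, start=1)]


names = unique_step_kc_names(
    [("LinearEquations1", "solve-for-x"), ("LinearEquations1", "solve-for-y")],
    problem_len=6, step_len=5)
```

With these deliberately short lengths both prefixes collapse to "Linear-solve", and the trailing number still tells the two KCs apart.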

==== Visualize Learning Curve Split ====
* Be able to visualize a learning curve split into 2 based on a specification of a subset of problems. -- Albert Corbett, Math CCM, November 2008
* Harder: make me a new KCM out of it. -- Ken Koedinger, Team Mtg, Dec 5, 2008, while trying to describe Albert's request to Alida

==== Statistical Significance ====
* Can DataShop determine if the difference between conditions or learning curves is statistically significant? -- general theme at workshop, probably mentioned by Bob Hausmann in his talk, Winter Workshop 1/23/2008
** Can't do it yet in DataShop, but I can show you how to do it in R (or SPSS...) after you've exported the data; export the "student-step rollup" rather than the whole transaction table. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** This comment perhaps belongs (exists?) elsewhere: the current student-step rollup export (really, all exports) should be such that I can immediately load it into R (and other packages) without error. Currently errors occur, for instance, because there are "#" characters in the variable names in the student-step rollup. This is a roadblock for helping folks like Bob do the analyses they want to do. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** See [[Condition in Student-Step Rollup]] -- [[User:Alida|Alida]] 10:35, 17 September 2009 (EDT)
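Until the exports themselves change, one workaround is to rewrite the header row before loading the file into R. The sketch below is hypothetical. It is not DataShop's behavior and only approximates the spirit of R's make.names: it spells "#" as "Num", replaces other non-alphanumeric characters with underscores, and de-duplicates the results. The example column names are illustrative.

```python
import re


def r_safe_header(names):
    """Rewrite column names so R can load them as syntactic, unique names.

    "#" becomes "Num", other characters outside [A-Za-z0-9_.] become "_",
    a leading non-letter gets an "X" prefix, and repeats get "_1", "_2", ...
    """
    used, result = set(), []
    for name in names:
        clean = re.sub(r"[^A-Za-z0-9_.]", "_", name.replace("#", "Num ").strip())
        clean = re.sub(r"_+", "_", clean).strip("_") or "X"
        if not clean[0].isalpha():
            clean = "X" + clean
        candidate, n = clean, 1
        while candidate in used:
            candidate, n = f"{clean}_{n}", n + 1
        used.add(candidate)
        result.append(candidate)
    return result


header = r_safe_header(["# Steps", "Step Name", "KC (Default)", "# Steps"])
```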

==== Split and Merge Skills ====

* Hand searches through a p-matrix for a dataset to split and merge skills. (Pie in the Sky) -- Ken Koedinger, Team Mtg, 02/22/2008

==== Notes on new KCMs ====

* It would be good if I could add a note to a KC model that was newly imported. -- Noboru Matsuda, email, Nov 19, 2009

==== Display Learning Curves Grouped by Interestingness ====

* The page displaying all the learning curves today seems to be alphabetically sorted by KC name, which is not necessarily meaningful.
* An alternative presentation is to group the curves into 4 sets, breaking up the page. Set 1 has curves that contain significant spikes, and therefore seem to be "low-hanging fruit" for purposes of breaking up into KCs. Set 2 has curves with few spikes, but they have a long X axis, suggesting that students are presented with too many opportunities to acquire those KCs. Set 3 has the "good" curves, i.e., nicely decreasing curves that are not too long. Set 4 is "other". -- [http://www.pitt.edu/~goldin Ilya Goldin] 7 December 2009

==== Order KC models according to AIC ====
* This is based on results from multiple dataset analyses that compared AIC, BIC, and log-likelihood to cross-validation RMSE. -- Mimi McLaughlin, 2/9/2011

==== Use log of opportunity count for AIC and BIC calculations ====
* We compared using the log of opportunity count to the whole-number opportunity count in multiple datasets. We found the results for the log of opportunity count to be consistently better, though the differences were small. -- Mimi McLaughlin, 2/9/2011

== Developer Requests ==

==== Store Converted Date and Converter Info ====

As a DataShop administrator, I'd like to see the converter information (version and date) stored in the database, so that I do not have to store that data manually in the Additional Notes field each time I load a dataset. -- Kyle, 8/5/2008

* DTD new fields:
** Store conversion and converter information in database (anything else?)
*** conversion_time
*** converter_info
* Email from Octav, 10/5/2007:
<blockquote>I see there's also a note about the converter version in the Dataset Info. Which is good, but it seems it's taken from the directory name when I submitted the set. I don't know how reliable that is. :-) It would be better if it's taken from the new converter info field.</blockquote>

==== Convert from XML to tabbed-delimited format ====

* If users agree that the export format is valuable, then maybe they could convert from XML to the export format to view the data in Excel, where they could look at the Selection column and spot blanks more easily. -- Jonathan Sewall, ET Mtg, 10/10/2007

==== Plug-ins (general issues) ====

* Please please support Ruby on Rails. -- Ben Shih, December 2008

==== Create UI to grant DataShop user roles ====

* Granting roles by hand is already tedious.
* Alida, User Meeting AAR on December 9, 2009

== Help ==

==== Specialize Label of Help Button ====
* Since this help is better than in most applications, it should say more than just 'Help'. -- Ken Koedinger & Ryan Baker, Team Mtg, May 30, 2008
** Ideas:
*** 'Page Help'
*** 'Help with this page'
*** 'Help with Learning Curve page' (Ken's favorite)
*** 'Help with this tool'

== Home Page ==

==== Redesign the Home Page ====

* In the menu of data sets at the top, include the N (=20 or as many as fit on the screen?) data sets that I have visited starting with the ones I've visited most recently. --[[User:Koedinger|Koedinger]] 16:16, 16 September 2009 (EDT)
** As of today, the menu shows the last 10 data sets, most recently visited at the top. I think the feature is good enough, but let's ask Ken --[[User:Bleber|Bleber]] 10:56, 6 August 2010 (EDT)
* There needs to be a better ordering for the datasets (DS364)
* Maybe a search to filter the list of datasets since the list is so long. -- Brett Leber, 6/14/2007
* Going back to the home page always goes to 'My Datasets' (DS313)
* Maybe show more high-level stats on this page, like how many transactions, students, and skill models
* Allow users to post and share project documentation (files, papers, other meta-info) -- Ruth Wylie, suggested during meeting on 8/4/2010. She had a file that is relevant to multiple datasets in her project.

== Import ==
== Miscellaneous ==

==== Analyses by LearnLab ====

* Organize data by LearnLab, not by "data set". -- Ryan Baker, Aug 2008
* Also: Bob Hausmann, Sep 2008; Maxine Eskenazi, Sep 2008
* Essentially, current data sets become samples, but the top-level unit is the LearnLab. You can take every data set in a LearnLab together as a sample.
* Implies being able to run analyses across data sets, export multiple data sets together, and create multi-data-set samples
* As a user of DataShop, I would like to look at learning curves for all Algebra data together (for example), or export all Algebra data
* Important long-term, but is a lot of work -- in particular, we need to solve scalability issues first.

==== Save Settings Between Sessions ====
* It would be useful if DataShop could save settings between sessions. -- Bob Hausmann, User Meeting, 2/1/2008
** "I do a lot of redoing the same steps" (e.g., set cutoffs, select a KC model, select students).

==== Multiple steps per transaction ====

* Needed so that we do not have to create multiple transactions for the same actual action in Andes logs. -- Kurt VanLehn, Feb 2007

==== Demographic data ====

* This has been mentioned by NSF visitors, AB, ESL, and some researchers.
* Also mentioned at Winter Workshop 1/23/2008.
** Derek/Sue-mei: Student background information not in DataShop. Would like to see a student or set of students from a particular demographic, and view them across datasets!
* Note that Gail added demographic data to Additional Notes field on the Dataset Info page for many datasets. The idea here is to put that data into the database somewhere.

==== Single Sign On ====
* Michael Bett, email, 10/8/2007
* It would be nice if the following services have a single login account/password:
*# Theory Wiki
*# Learnlab.org
*# ESL's OSS
*# DataShop

==== Reveal unanonymized student IDs ====

As a researcher/PI performing research assistant tasks, I want to easily see the unanonymized student IDs of subjects in DataShop so that I can email my subjects to tell them when to use my system.

* Ruth Wylie, July 3, 2008
* As the honest broker of the PSLC data, I have promised to not reveal the student IDs and to protect the identity of the students. This is part of the DataShop IRB. Therefore, I do not see this request as possible. [[User:Alida|Alida]] 09:53, 4 September 2009 (EDT)
** At some point, some coordination would be good with OLI's Digital Dashboard project that Marsha Lovett (and the OLI team and sometimes me) is working on. This is meant to provide usage information more quickly to instructors. It could also perhaps be used by researchers (with the right IRB rights) in situations like Ruth's.

==== Knight Timeline ====
* Developed by Andrea Knight, 2004

==== Buggy Skills ====
* Ken Koedinger, prototype walkthrough 9/11/2006

==== Confusion Matrix ====
* Brian MacWhinney, prototype walkthrough 9/11/2006

== Navigation Bar ==
==== Filter KCs by Name ====

As a researcher working with KCs, I want to filter KCs based on their names, so that I can...
* Vincent Aleven, Email, 2/3/2008
* "Since Alida said you cannot have two mechanisms for putting together your KC set (i.e., cannot have both the selecting-by-clicking and selecting-by-filtering), I would probably opt for the latter."
* Alida: I thought Vincent mentioned that he'd like to select which KCs are in a set by filtering on the name. Example: Include KCs with '*reason*' in the name and exclude KCs with '*given*' in the name.
* This could be an addition to our v3.0 KC-selection mechanism--filter by name.
* Vincent, Email, 5/6/2009: Expressed another need for this feature. Alternatively, we could just allow for a wider area and a longer list so that more items can be checked at once. The number of characters we show right now is not enough because in many cases those leading characters are the same across many of the skills. Reference data set: Geometry CWCTC 2005-2006

* Status: Design Started
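Vincent's include/exclude example maps naturally onto shell-style wildcards. Below is a minimal sketch using Python's fnmatch module. Matching case-insensitively is an assumption, and the KC names in the example are made up.

```python
from fnmatch import fnmatchcase


def filter_kcs(kc_names, include=("*",), exclude=()):
    """Keep KCs matching any include pattern and no exclude pattern (case-insensitive)."""
    def matches(name, patterns):
        return any(fnmatchcase(name.lower(), p.lower()) for p in patterns)

    return [kc for kc in kc_names if matches(kc, include) and not matches(kc, exclude)]


kcs = ["angle-reason-vertical", "Given-Reason-parallel", "find-angle-measure"]
selected = filter_kcs(kcs, include=["*reason*"], exclude=["*given*"])
```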

==== Facebook-style KC Selection ====

As a researcher working with KCs, I want to select KCs based on the learning curve thumbnail, so that I can see quickly which ones I'm interested in exploring more deeply.

* Feature already designed for v3.0, not implemented due to time constraints.
* Agreed this would be really useful. -- Kirsten Butcher, User Mtg, 1/31/2008
* Status: Guesstimate: 20 days; need to revisit requirements document

==== Feedback after clicking a large sample on a large dataset ====

As a user of DataShop (first-time or not), I want some feedback and the ability to cancel after I do something that might take a long time (e.g., clicking "All Data" on a large dataset) so that I do not get stuck.

* Part of the Susan Goldman story
* After clicking a sample for a large dataset, there is no "Loading..." text, no feedback that the click was even registered by the app (besides the sometimes busy cursor and small browser "loading" text), nor the ability to cancel the action. 
* We will always have similar problems even if performance is improved, so providing feedback and the opportunity to cancel is critical.

==== Save Button in Problem Navigation Box ====

* Save buttons in the sidebar. -- Ken Koedinger, Mtg 2006
** Could also put one in the Problem selection box in the sidebar.

==== Make Nav Bar Wider ====
* Make the Sample name and description fields much wider. -- John LaPlante, email 7/10/2007

== New Visualizations/Reports ==

==== Student-KC Rollup ====
As a researcher, I want to see KCs rolled up by student, so that I can ...

* Vincent Aleven, User Mtg, 1/29/2008
** By Student-KC would be more useful than by Student-Problem
** Example: number of steps on which the student asked for a hint or made an error, or the proportion of steps on which the student got help
** How often the bottom-out hint occurs
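Here is a sketch of the requested rollup. It assumes student-step rows, one per step-KC pair, that already carry hint and incorrect counts and a bottom-out-hint flag. These field names are hypothetical, not DataShop's export headers. For each student and KC, the sketch counts steps, the steps with any hint or error, the proportion of such steps, and how often the bottom-out hint was reached.

```python
from collections import defaultdict


def student_kc_rollup(step_rows):
    """Aggregate student-step rows (one per step-KC pair) to one record per (student, KC)."""
    acc = defaultdict(lambda: {"steps": 0, "help_or_error_steps": 0, "bottom_outs": 0})
    for r in step_rows:
        a = acc[(r["student"], r["kc"])]
        a["steps"] += 1
        if r["hints"] > 0 or r["incorrects"] > 0:
            a["help_or_error_steps"] += 1
        if r["bottom_out"]:
            a["bottom_outs"] += 1
    for a in acc.values():
        a["help_or_error_rate"] = a["help_or_error_steps"] / a["steps"]
    return dict(acc)


rollup = student_kc_rollup([
    {"student": "s1", "kc": "K1", "hints": 0, "incorrects": 0, "bottom_out": False},
    {"student": "s1", "kc": "K1", "hints": 3, "incorrects": 1, "bottom_out": True},
    {"student": "s2", "kc": "K1", "hints": 0, "incorrects": 2, "bottom_out": False},
])
```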

==== Instructor Reports ====
* Phil said he received a lot of positive reactions to providing reports on units for instructors. -- Phil Pavlik, ET Mtg 10/10/2007

==== Manage Authorizations/Projects Page ====
* Lisa Anthony, email 10/23/2007
* Allow PI to change permissions on the datasets.
:"Actually, I couldn't see how to change permissions on the datasets from the website. Is this possible? If not, it might be a nice feature..."

==== Calculate Time Spent on Different Study Activities ====

As a researcher, I want to know how much time, on average, students spend on study activities, so that I can ...

* Bruce McLaren, Email, 4/7/2009
<blockquote>
For my most recent stoich study, Shawn and I are interested in calculating timing information such as: (a) how long students spent, on average, working on individual tutors (b) how long students spent, on average, on all items in an intervention (c) how long students worked, on average, on post-tests. 

Timing information is very commonly required for studies, and can be calculated from DataShop logs relatively easily, so even if we don't have it, might be worth considering. (And we don't want to re-invent the wheel, if you already have it or are planning it...)
</blockquote>
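One relatively simple way to compute such averages from transaction timestamps is sketched below. Within each activity, it sums the gaps between a student's consecutive transactions, capping each gap at an idle cutoff so long breaks do not inflate the total. It then averages over students. The 300-second cutoff and the input shape are assumptions, not DataShop definitions.

```python
from collections import defaultdict
from datetime import datetime


def mean_time_per_activity(transactions, max_gap_s=300):
    """Average seconds per student spent on each activity.

    transactions: iterable of (student, activity, ISO-8601 timestamp).
    Gaps between a student's consecutive transactions in an activity are
    summed, each capped at max_gap_s so idle periods are not counted in full.
    """
    stamps = defaultdict(list)
    for student, activity, ts in transactions:
        stamps[(student, activity)].append(datetime.fromisoformat(ts))
    per_activity = defaultdict(list)
    for (_student, activity), times in stamps.items():
        times.sort()
        total = sum(min((b - a).total_seconds(), max_gap_s)
                    for a, b in zip(times, times[1:]))
        per_activity[activity].append(total)
    return {act: sum(v) / len(v) for act, v in per_activity.items()}


means = mean_time_per_activity([
    ("s1", "tutor A", "2009-04-07T10:00:00"),
    ("s1", "tutor A", "2009-04-07T10:01:00"),
    ("s1", "tutor A", "2009-04-07T10:20:00"),  # 19-minute gap, capped at 300 s
    ("s2", "tutor A", "2009-04-07T10:00:00"),
    ("s2", "tutor A", "2009-04-07T10:02:00"),
    ("s1", "post-test", "2009-04-07T11:00:00"),
    ("s1", "post-test", "2009-04-07T11:04:00"),
])
```

A student with a single transaction in an activity contributes zero seconds, which is one of the choices a real definition would need to settle.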

==== Incorrect Step Duration and Hint Step Duration ====

As a researcher, I want to be able to see total step duration if the student's first attempt was an incorrect attempt, and total step duration if the student's first attempt was a hint request, so that I can do some analyses that I can't do with "Error Step Duration".

* Bob Hausmann, email, 11/11/2008.
* Updated title and story with 'step duration' instead of 'time'. -- [[User:Alida|Alida]] 10:36, 4 September 2009 (EDT)
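Here is a minimal sketch of the two requested columns. It assumes each student-step row provides its step duration and a first-attempt value of "correct", "incorrect", or "hint". None stands for a blank cell, and averages skip blanks. The function names are hypothetical.

```python
def split_error_duration(step_duration, first_attempt):
    """Return (incorrect_step_duration, hint_step_duration); None means a blank cell."""
    if first_attempt == "incorrect":
        return step_duration, None
    if first_attempt == "hint":
        return None, step_duration
    return None, None


def mean_ignoring_blanks(values):
    """Mean of the non-blank values, or None if every value is blank."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


steps = [(12.0, "incorrect"), (30.0, "hint"), (5.0, "correct"), (20.0, "incorrect")]
incorrect_col, hint_col = zip(*(split_error_duration(d, a) for d, a in steps))
```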

==== Grading ====

[[Grading]]

==== Display number of steps and number of observations for skills ====
* How and where to be determined by developers. -- Ken (entered by Mimi)

== Reports ==
=== Dataset Info ===

==== Pointers to Hard-copy Data ====
* Brett van de Sande, NSF Site Visit, 5/28/2008
* Pointers to hard-copy data such as paper tests and/or homework.  Include contact information.  It doesn't seem to make sense to scan a whole filing cabinet of paper if no one wants to look at it.  And any secondary researchers don't know about the filing cabinet to ask for it.

==== Sort Problem Breakdown Table ====

* Would like the ability to sort the table by clicking on the column headings of the Problem Breakdown Table on the Dataset Info Tab. -- Bruce McLaren, User Mtg, 11/5/2007

==== Rename dataset ====

As a researcher, I want to rename my dataset so that it makes more sense to other people. I also want to make sure the dataset doesn't become polluted later by new data not associated with my study.
* Ruth Wylie, July 3, 2008
* There are reasons she would want to do this (current name is worthless, other researchers might try her tutor and pollute her data) but also reasons for not doing it (log more data later).
* There are risks in changing a dataset name that might not be apparent, for example, if you later want new data to go into the same dataset. [[User:Alida|Alida]] 10:27, 4 September 2009 (EDT)

==== Average time per problem, average number of problems, total number of sections ====

In addition to showing student hours per dataset, it would be useful to know the average time spent per problem, average number of problems completed, and the total number of sections.
* Noboru Matsuda, June 18, 2010

=== Error Report ===

==== View By Student ====

* Would like to see what a couple of students saw in the feedback. -- Marsha Lovett, 10/11/2007

==== Export ====
* I would like the ability to export this data. -- John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
* Also interested in this feature idea. -- Bruce McLaren, User Mtg, 11/5/2007

==== Sort ====

* Was planned for but not implemented in v2.1 (estimated to be a 4 day task)
** By Correctness %, starting with the least correct
** By Hints %
** Step (or KC if view by KC)
** Number of Students
* Ability to sort problems by their average experienced position within the curriculum. -- Ken Koedinger, 02/16/2007
** Which problem did students most often experience first, then the one experienced second most often, ...
* Order steps by the order they typically are executed by students. -- Ken Koedinger, email 11/7/2008
:"Searching through the steps in a problem to get a sense of what is going on is currently hard because the steps are ordered alphabetically, not by the order in which most students did them. While not all students do all steps in the same order, there is some regularity there. It would be quite useful if the steps could be ordered in a "typical order". This could be accomplished by using the time stamps (of the first (correct?) transaction?) for each step to determine rank order of each step for each student in a problem and then for each problem average the rank order of each step across all students. Then arrange the steps in the Error Report by their average rank order -- that is, roughly speaking, the step that is most likely to be first across students (closest on average to first) goes first, the step with the next lowest rank goes next, etc."
* The capability to count the number of errors of each message type and sort in different ways, for instance by all errors that had no messages. -- Bruce McLaren, email 10/22/2007
:"This is an error analysis I recently did in Excel, using pivot tables, that might be handy if in the DataShop. This one is very important for tutors because the errors that occur most frequently, yet don't elicit messages to the students, are good candidates to become errors with feedback."
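Ken's proposed "typical order" can be sketched directly. For one problem, rank each student's steps by the time of their first transaction, average each step's rank over the students who attempted it, and sort by that average. Averaging only over students who did the step and breaking ties by step name are assumptions.

```python
from collections import defaultdict


def typical_step_order(first_times):
    """Order one problem's steps by their average per-student rank.

    first_times: {student: {step: timestamp of the step's first transaction}}.
    """
    rank_sums, rank_counts = defaultdict(float), defaultdict(int)
    for steps in first_times.values():
        # Rank 1 = the step this student reached first.
        for rank, step in enumerate(sorted(steps, key=steps.get), start=1):
            rank_sums[step] += rank
            rank_counts[step] += 1
    avg = {s: rank_sums[s] / rank_counts[s] for s in rank_sums}
    return sorted(avg, key=lambda s: (avg[s], s))


order = typical_step_order({
    "s1": {"x": 10, "y": 20, "z": 30},
    "s2": {"y": 5, "x": 6, "z": 7},
    "s3": {"x": 1, "z": 2, "y": 3},
})
```

Here the average ranks are about 1.33 for x, 2.0 for y, and 2.67 for z, so the report would list x, y, z.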

==== Show more than 500 problems ====

In the error report, can we see more than 500 problems? See the dataset "Cog Model Discovery Experiment Spring 2010", where the report shows "2/2888 selected. (Showing the first 500)". -- Ken via email on 2/15/2011

=== Export ===

==== Last attempt on step? column for transaction format ====

* Include a new column that shows whether the row is the last attempt on a step for a student or not. Could be 0 or 1 as value. Helpful for researchers who are doing grading of data. Transaction format. --Vincent Aleven, CTAT mtg 11/5/2010
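Here is a sketch of how the 0/1 column could be derived. It assumes transactions carry student, problem, step, and a sortable time. A step is identified here by (student, problem, step), which ignores repeated visits to the same problem; that is an assumption.

```python
def last_attempt_flags(transactions):
    """Return 0/1 flags, aligned with the input, marking each student's
    final transaction on each (problem, step)."""
    last_index = {}
    # Walk transactions in time order; later ones overwrite earlier ones.
    for i in sorted(range(len(transactions)), key=lambda i: transactions[i]["time"]):
        t = transactions[i]
        last_index[(t["student"], t["problem"], t["step"])] = i
    final = set(last_index.values())
    return [1 if i in final else 0 for i in range(len(transactions))]


tx = [
    {"student": "s1", "problem": "p1", "step": "a", "time": 1},
    {"student": "s1", "problem": "p1", "step": "a", "time": 2},
    {"student": "s1", "problem": "p1", "step": "b", "time": 3},
    {"student": "s2", "problem": "p1", "step": "a", "time": 4},
]
flags = last_attempt_flags(tx)
```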

==== Elapsed Time ====

* Include the elapsed time in preview and transaction export. It is more valuable than the transaction time as an absolute reference. Possible to keep both. --Ken Koedinger, Team Mtg 04/18/2008

==== SQL Format ====

* Option to export as an SQL file. -- Ken Koedinger, 03/26/2007, also brought up in June ET Meeting
** Ability to export an SQL dump of a dataset. --Kyle Cunningham, 04/03/2007

==== Specify Character for Blanks ====

* Ability to specify what character, if any, is used for blanks. --Ryan Baker, email 8/9/2007
:"Not all tools handle TABTABTAB correctly on import. The period '.' is used to mean missing data in most stats packages. The word 'BLANK' is used in some other ones." Not an issue for Ryan, as he wrote a preprocessor to convert blanks.
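If the export keeps tabs as separators, a small post-processing step like Ryan's preprocessor can substitute a marker for empty cells. Below is a minimal sketch. The function name is hypothetical, and the sketch assumes fields contain no embedded tabs or newlines.

```python
def fill_blanks(tsv_text, blank="."):
    """Replace empty (or whitespace-only) cells in tab-delimited text with `blank`."""
    out_lines = []
    for line in tsv_text.splitlines():
        if line == "":
            out_lines.append(line)  # leave genuinely empty lines alone
            continue
        cells = line.split("\t")
        out_lines.append("\t".join(cell if cell.strip() else blank for cell in cells))
    return "\n".join(out_lines) + "\n"
```

For example, `fill_blanks(text, blank="BLANK")` produces the marker some other packages expect.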

==== Opportunity (at KC) Count in Transaction Export ====
* Include the opportunity count in the transaction export (it's only in the student-step rollup) -- Noboru Matsuda, 10/08/2009
** Would be repetitive
** Would need one Opportunity column per KC
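Here is a sketch of computing per-KC opportunity numbers for transaction rows. It assumes each transaction lists its KCs. It also assumes a student's opportunity count for a KC increases once per distinct (problem, step) tagged with that KC. DataShop's own opportunity definition may treat cases such as problem revisits differently. Every transaction on the same step repeats the same number, which is the repetition noted above, and each result has one entry per KC.

```python
from collections import defaultdict


def opportunity_counts(transactions):
    """For each transaction (in input order), return {kc: opportunity number}."""
    seen = defaultdict(dict)  # (student, kc) -> {(problem, step): opportunity}
    result = [None] * len(transactions)
    for i in sorted(range(len(transactions)), key=lambda i: transactions[i]["time"]):
        t = transactions[i]
        counts = {}
        for kc in t["kcs"]:
            steps = seen[(t["student"], kc)]
            key = (t["problem"], t["step"])
            if key not in steps:
                steps[key] = len(steps) + 1  # first time this step is seen for this KC
            counts[kc] = steps[key]
        result[i] = counts
    return result


opps = opportunity_counts([
    {"student": "s1", "problem": "p1", "step": "a", "kcs": ["K1"], "time": 1},
    {"student": "s1", "problem": "p1", "step": "a", "kcs": ["K1"], "time": 2},
    {"student": "s1", "problem": "p1", "step": "b", "kcs": ["K1", "K2"], "time": 3},
    {"student": "s1", "problem": "p2", "step": "a", "kcs": ["K1"], "time": 4},
])
```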

==== Export only rows that have KCs tagged ====
* Is it possible, when exporting a dataset, to include only the [transaction?] rows that have knowledge components tagged? And likewise for the KC models export, to include only the items that have a KC tagged? -- Hui Cheng, 01/19/2010
** We have the inverse of this option on the Performance Profiler, 'Include steps without a knowledge component', and with the Student-Problem export, 'Include Steps without KCs'.

==== Include Custom Fields in Student-Step Rollup ====
* Is it possible to include the custom field in the datashop [step] export? -- Hui Cheng, 03/01/2010
** "I am thinking about the student-step level export. We have a student from Statistics wanting to use assistment data from datashop. She wants to have problem set id (=curriculum id in datashop). For Assistment data, we decided to put problem set id in the custom field because in the Assistment a lot of problems are repeated in different problem sets. She needs data at the student-step level." (Hui email)
* Vote from Phil Pavlik too -- see [[Collected_User_Requests#Include_transaction_custom_fields_in_web_services_student-step_export]]

==== Don't duplicate rows in student-step format when not showing KCs ====
* If the checkbox to show knowledge components is not checked, maybe it doesn't make sense to show rows more than once if they have more than one KC associated with the step. (Mimi (and Brett) stumbled on this. 8/16/2010)

==== Student-Step Rollup include Success Column ====
* In the step rollup, add a column called Success: 1 if correct, 0 if incorrect/hint, blank otherwise. ~~ Ken Koedinger, DataShop Team Meeting, Oct 22, 2010
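The proposed column is a direct mapping from the first-attempt outcome. Below is a minimal sketch, assuming a "First Attempt" field with the values "correct", "incorrect", or "hint". The function names are hypothetical.

```python
def success_value(first_attempt):
    """1 if the first attempt was correct, 0 if incorrect or a hint, "" (blank) otherwise."""
    return {"correct": 1, "incorrect": 0, "hint": 0}.get(first_attempt, "")


def add_success_column(rows, source="First Attempt"):
    """Return copies of student-step rows with a "Success" field added."""
    return [{**row, "Success": success_value(row.get(source, ""))} for row in rows]


rows = add_success_column([{"First Attempt": "correct"}, {"First Attempt": "hint"}, {}])
```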

=== Learning Curve ===

==== Default sort by observation on LCPID ====

* Default sort by observation on LCPID. ~~ Ken Koedinger, DataShop Team Meeting, 10/22/2010

==== Purple Point ====
* Purple Point: if a point on the LC has more than one KC associated with it but you have drilled down to a given KC, then the blue line is off. We could put a purple point that takes this into account.
* Simpler thing: display a warning message that some points in the display are driven by other KCs
* Pearson may be interested
* This was mentioned during the PSLC Summer School 2010.
* For a step with multiple skills, attribute the error only to the skill with the highest overall error rate. ~~ Alida, meeting with Ken, November 18, 2010
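The last idea above can be sketched as a two-pass computation. First compute each KC's overall error rate across all steps it tags. Then charge each erroneous step only to its highest-rate KC. The input shape and tie-breaking by KC name are assumptions for illustration.

```python
from collections import defaultdict


def attribute_errors(steps):
    """steps: list of (kcs, is_error). Returns {kc: attributed error count}.

    Pass 1: overall error rate per KC over every step it is tagged on.
    Pass 2: each erroneous step is charged only to its highest-rate KC
    (ties broken by KC name).
    """
    errors, totals = defaultdict(int), defaultdict(int)
    for kcs, is_error in steps:
        for kc in kcs:
            totals[kc] += 1
            errors[kc] += int(is_error)
    rate = {kc: errors[kc] / totals[kc] for kc in totals}
    attributed = defaultdict(int)
    for kcs, is_error in steps:
        if is_error:
            attributed[max(kcs, key=lambda kc: (rate[kc], kc))] += 1
    return dict(attributed)


blame = attribute_errors([
    (["A"], True), (["A"], True), (["B"], False), (["B"], True), (["A", "B"], True),
])
```

In this example A's overall error rate is 3/3 and B's is 2/3, so the error on the two-KC step goes to A alone.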

==== Reduce Scrolling ====
* Add a forward and back button to the graph to reduce scrolling. -- John LaPlante, email 7/10/2007

:"The learning curve page could use a forward and back button to cycle through the learning curves. Going through them one by one requires a lot of scrolling."

==== Turn On Point Labels ====

* It would be nice to have the option to turn on point labels. It is nice that I can mouse over a point and view the data but it would be nicer if it appeared automatically. -- John LaPlante, email 7/10/2007

==== Option for Bigger Graph ====

* Allow user to see bigger graph. -- Derek Chan, Winter Workshop 1/23/2008
** Potential solution: enable user to set x, y scale manually

==== LC Normalize Scale of Thumbnails ====
* [[LC Normalize Scale of Thumbnails]]

=== Performance Profiler ===

==== Rename Performance Profiler ====
* John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
** Did not use this report because he thought it had something to do with improving the performance of DataShop itself. This report might have been much better to use than the Error Report, though it would still need an export, since working with the data in tabular form was still necessary. Note that the pivot tables created were added to the dataset (Pittsburgh Science of Learning Center Stoichiometry Study 1).
* Lisa Anthony, email 8/2008
** Didn't know to go to the report. Export would be useful.
** Needed a better definition of Error Rate with respect to Problem and Unit rows.

==== Export ====
* John LaPlante (see comments in Rename Performance Profiler)
* Lisa Anthony (see comments in Rename Performance Profiler)
* Yes, I too can imagine wanting to export the results of a particular performance profiler output (i.e., to a table) so that I can graph it my own way. --[[User:Koedinger|Koedinger]] 16:24, 16 September 2009 (EDT)

==== Table View ====
* Add option to switch to a table view.
* Columns are: Problem Name, Steps, % incorrect, Incorrect Steps, % hint, Hint Steps, etc.; include all values shown in the pop-up.

==== Union of KCs/Problems/Students ====
* Allow the user to get the union of KCs/Problems/Students, etc., so that they can compare across samples more easily. -- Kirsten Butcher, Winter Workshop 1/23/2008
:"It is difficult to compare performance profiler graphs across samples because the KCs (or problems, or whatever) aren't necessarily in both of those samples."

==== Show Details In Report ====
* Click on a bar to see details in the report and not just in the pop-up, which disappears too quickly. -- Alida, Brett

==== Show More Information in the Graph ====
* Show more information in the graph: [Bruce M, User Mtg, 11/5/2007]
** Had drilled down by a certain skill, but the skill is not listed in the graph; the user has to check the skill list on the left-hand side to see which skill was selected
* Maybe related to Bruce's "show more info in graph": show the actual value of the range variable. e.g., when range is error rate, show the error rate number somewhere (right now you can see, via mouse-over, the incorrect, hint, and correct percentages, but not the error rate) -- Brett
* Design idea to show # steps incorrect, # steps hint, # steps correct to clarify how the percentages are calculated. -- Alida, Brett

== Sample Selector ==

==== Sub-Samples ====

* Sub Samples would be helpful. -- John LaPlante, email 7/10/2007
:"A nice solution would be to have sub-samples where one property varies. When I'm doing this analysis, I've changed my sample many times, renaming it sometimes, tweaking it to get variations on the data. The samples are really useful but they could help me a lot more with this kind of experimentation."

==== Filter out students ====

As a researcher, I want to filter out test users (including myself) from my data so that I see less noise in the data.

* Ruth Wylie, July 3, 2008
* You can already filter out test users by using a test user id that starts with 'weirdCMUuser_xxx'. Then create a sample that excludes students with a name like 'Test_%'. --[[User:Alida|Alida]] 10:29, 4 September 2009 (EDT)
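The exclusion above relies on SQL `LIKE` semantics, where `%` matches any run of characters and `_` matches exactly one character. The Python sketch below is illustrative only: the function names are hypothetical (not DataShop's sample-selector code), and it compares case-sensitively, whereas a real database's behavior depends on its collation. Because `_` is itself a wildcard, a pattern like 'Test_%' also excludes a name such as 'Testing'.

```python
import re

def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern into a regular expression for full matching.

    '%' matches any sequence of characters (including none), '_' matches exactly
    one character, and everything else is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def exclude_like(students, pattern):
    """Return the students whose names do NOT match the LIKE pattern."""
    rx = like_to_regex(pattern)
    return [s for s in students if not rx.fullmatch(s)]

# Hypothetical student names; 'Testing' is excluded too because '_' is a wildcard.
print(exclude_like(["Test_alida", "Stu_0042", "Test_brett", "Testing"], "Test_%"))
# ['Stu_0042']
```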

==== Filter by "Class" ====

As a researcher, I want to create samples based on "class" since class is how I've encoded my conditions.

* Maaike Waalkens, July 30, 2010
* This is what made sense using Mathtutor for tutor delivery.
* What other fields are we missing in the Sample Selector?

==== Filter by "Step" ====

As a researcher, I want to create samples based on "step" since I'm only interested in one particular step.

* Mimi McLaughlin, August 17, 2010

==== Create Sample Automatically ====

* Would it be possible for me to get a random sample from the 'Bridge to Algebra 2006-2007' dataset of 100 students? I am having trouble looking at the data because it takes too long to load, and my adviser thought that was because the dataset was too large. ~~ DataShop User, 10/19/2010

== Web Services ==

==== Include transaction custom fields in web services student-step export ====

* "I noticed in the web services guide that cfs (which provides the custom fields) is not yet implemented for step roll-up tables. It is marked [coming soon] in the manual... I could write a workaround to pull in the transactions and lookup the custom fields, but I'd really rather not." -- Phil Pavlik, 4/27/2011

==== Use Custom Fields Graphs/Reports ====
In Graphs

As a researcher creating custom fields and assigning values at the transaction level, I want DataShop to perform the aggregation to the step level so that I can do other things with my custom-field variable such as graph it. -- Ryan Baker, mtg w/Alida & Brett, 12/15/2008

Performance Profiler
* Use all discrete variables/custom fields on the left and all the continuous variables on the bottom of the Performance Profiler. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Ad-hoc queries ====

* Allow restricted filtering on steps and transactions (on whatever can be filtered in the navigation boxes) as the next web service feature, after CFs. (User Meeting AAR, December 9, 2009)

==== Sample creation as a web service ====

* Sample creation is still too slow. (User Meeting AAR, December 9, 2009)

 
----
See completed [[DataShop 3.x Features]] 
See on-going [[DataShop 4.x Features]] 
See prioritized [[DataShop Feature Wish List]]
[[Category:Protected]]
[[Category:DataShop]]

Collected User Requests

2011-04-28T16:25:14Z

Bleber: /* Include Custom Fields in Student-Step Rollup */

See prioritized items on [[DataShop Feature Wish List]].

== Annotations ==

==== Annotations on Transaction Level ====
* I have models which can annotate things like: gaming, bored, etc. on the transaction level. -- Ryan Baker, ET Mtg 12/5/2007

==== Annotations on Student Level ====
* Would annotate on student level. -- Ido Roll, User Meeting, 1/19/2009

==== Annotations on Pages ====
* See the cool thing created by Jeffery Heer where all the settings of the page were recorded with the comment. -- Ryan Baker, DS Team Mtg 5/23/2008

==== Dataset Discussion - Capture data-integrity issues ====
* As a stakeholder in the DataShop project, I want to capture and publicize the data-integrity issues discovered with data sets so that data is better documented (and so we've fulfilled a promise to our funders to better document data). -- Ken Koedinger, Team Meeting, 8/15/2009
* As a user of DataShop, I want to discuss datasets and have that discussion attached to the dataset so that others can comment and better understand any data-integrity issues I've found.

==== Linking to internal pages ====

* It would be handy if a link can be saved to any dataset, sample, page in the DataShop. -- [[Alida]], 10/18/2007
** Currently, https://pslc-qa.andrew.cmu.edu/datashop/DatasetInfo?datasetId=793 works if you are already logged in.

==== Have a link from the DataShop to the Theory Wiki (Dataset to Project Page) ====

* Can we link from the dataset to the project page on the Theory Wiki? In the pipeline have a clickable link to the project page (make project name clickable). -- Michael Bett, ET Mtg 11/14/2007
** Link to a dataset directly? Is that obvious to users? Click on dataset link -> log in -> redirected back to dataset. -- Brett Leber

== Data Format ==
==== Capture the question prompt and answer choices the student chose from ====
* As a researcher, I want to be able to identify steps based on the question prompt (not the difficult-to-understand step names that come from the selection and action of my tutor) so that I can analyze the data more easily. For multiple-choice questions, I also want to be able to see all of the choices that were available to the student. -- Eli Silk - February 1, 2011 (meeting with Brett and Ross Higashi of the FIRE project)
** Near-term solution is to create a table locally that maps steps to prompts
** Another solution is to ask CTAT team to modify their Flash components so that they log this information as custom fields
** Long-term solution is to make these fields into standard fields in the tutor message format
** In the email thread "Cognitive Tutor Development and Evaluation Requests", there is support for this general idea from Ken Koedinger, Albert Corbett, and Christian Schunn.
** Ken added that "Ideally, we may want to store any images that the student can see and where they reside on the screen ..."

== Data Modeling ==
=== Non-KC Modeling ===

==== Automatic Distillation ====
* As an educational data miner wishing to develop a machine learned model with PSLC data, I would like to be able to automatically distill data features (e.g. custom fields) commonly used in past educational data mining research for a new data set (see, for instance, Baker, Corbett, Roll, & Koedinger, 2008 in UMUAI) -- Ryan Baker, Summer 2008, Startup Memo
** Could be implemented as a plug-in
* Also interested in this feature idea. -- Dan Franklin, Oct 2008

==== Upload model and apply it to new data set ====

* As an EDM researcher, I would like to take a model, expressible as a linear formula on DataShop fields or as a simple code procedure (e.g. Bayesian Knowledge Tracing, which Ryan has code for), and apply it to a new data set, so that I can ... -- Ryan Baker, Sept 2008
* Also interested in this. -- Maxine Eskenazi, Sept 2008
* May work best as a plug-in
** Code to display GUI to choose which data sets to use, calls model code, re-import to DataShop
** Good to have a way to apply many models, as soon as you import a data set
* Phil has an idea that maybe fits within this one. Please move if there's a better category. -- Brett Leber
<blockquote>This [''transaction? kc? --ed.''] relabeling is really mostly about enabling modeling in DataShop, right? With this in mind, I think that it is actually a higher priority to have model alternatives in DataShop.... E.g. Investigators should be able to give you chunks of Java code according to a certain specification, and DataShop should then be able to run these over datasets (perhaps after a certain series of QA occurs according to an SOP) when the investigator clicks some button in DataShop.... Obviously this is a much larger project than adding columns, but it is also much more important in my mind. ''-- Phil Pavlik, email to Brett on 1/14/2009''</blockquote>
* Examples:
** Example: running gaming detector in multiple tutors and comparing gaming frequencies
** Example: applying Bayesian Knowledge Tracing to a new data set from the same LearnLab
** Example: applying Ben Shih's models to many data sets. Note that Ben Shih should be included in design of this feature; he is interested, and has a lot of good ideas.

==== Add Different Predicted Values ====
* Would also like to add statistics and predicted values other than the ones LFA produces. -- Ken Koedinger, ET Meeting, 10/10/2007

==== Bayesian Knowledge Tracing ====
* Bayesian Knowledge-Tracing built into DataShop like LFA is. -- Ryan Baker, Startup Memo, Summer 2008
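A sketch of what built-in knowledge tracing would compute, using the standard four-parameter BKT formulation (initial knowledge, transit/learning, slip, guess) over one student's sequence of first attempts on one KC. Parameter fitting, which a DataShop feature would also need, is not shown; the function and argument names are assumptions.

```python
def bkt_trace(observations, p_init, p_transit, p_slip, p_guess):
    """Standard four-parameter Bayesian Knowledge Tracing for one student on one KC.

    observations: sequence of booleans, True if the first attempt was correct.
    Returns the predicted P(correct) before each observation.
    """
    p_known = p_init
    predictions = []
    for correct in observations:
        # Probability of a correct first attempt given the current knowledge estimate.
        p_correct = p_known * (1 - p_slip) + (1 - p_known) * p_guess
        predictions.append(p_correct)
        # Bayes' rule: update P(known) using the observed response.
        if correct:
            posterior = p_known * (1 - p_slip) / p_correct
        else:
            posterior = p_known * p_slip / (1 - p_correct)
        # Learning: an unknown KC may become known after this opportunity.
        p_known = posterior + (1 - posterior) * p_transit
    return predictions

# Hypothetical parameters; fitting them (e.g., by EM or grid search) is not shown.
print(bkt_trace([False, True, True], p_init=0.5, p_transit=0.1, p_slip=0.1, p_guess=0.2))
```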

==== Richer statistics for KC modeling ====

* In addition to the model stats and estimates generated for learning factor models, we should also create difficulty factor models (i.e., ones without a "Slope" parameter). The latter is particularly relevant for the Unique-Step model, where the slope parameter is meaningless (but still counts against the BIC value). -- Ken Koedinger, Email "new feature request", 1/22/2009
** Relatedly, we should report significance values on the Slope parameters -- that is, when is the Slope significantly different from 0.
** The KC models page perhaps should also report the log-likelihood and number of parameters (in addition to BIC) and leave out AIC. We might also consider other metrics of model generality, like the "adjusted R2" (if I have this name right -- Joe Beck mentioned in the Assistments meeting yesterday).
** These changes will be part of meeting the CMDM goal of improvement in (or at least demonstrate acceptability of) the cognitive models in 90+ units in our LearnLab courses (or affiliates).
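For reference, a Python sketch of the standard definitions behind these statistics, with ln L the maximized log-likelihood, k the number of free parameters, and n the number of observations: AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. The toy numbers (not from any dataset) show how slope parameters that do not improve the fit still raise BIC. The class is a hypothetical structure, not DataShop's.

```python
import math
from dataclasses import dataclass

@dataclass
class ModelFit:
    """Summary statistics for one fitted KC model (illustrative structure)."""
    name: str
    log_likelihood: float  # maximized ln L
    n_params: int          # number of free parameters k
    n_obs: int             # number of observations n

    @property
    def aic(self) -> float:
        # Akaike information criterion: 2k - 2 ln L
        return 2 * self.n_params - 2 * self.log_likelihood

    @property
    def bic(self) -> float:
        # Bayesian information criterion: k ln n - 2 ln L
        return self.n_params * math.log(self.n_obs) - 2 * self.log_likelihood

# Toy numbers: same likelihood, but one model spends 20 extra parameters on slopes.
with_slopes = ModelFit("Unique-Step (with slopes)", -1000.0, 60, 5000)
without_slopes = ModelFit("Unique-Step (no slopes)", -1000.0, 40, 5000)

for fit in sorted([with_slopes, without_slopes], key=lambda f: f.bic):
    print(f"{fit.name}: k={fit.n_params}, lnL={fit.log_likelihood}, "
          f"AIC={fit.aic:.1f}, BIC={fit.bic:.1f}")
```

Sorting with `key=lambda f: f.aic` instead would order the models by AIC.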

=== KC Modeling ===

==== LFA/AFM: Check if enough memory using formula ====
* The LFA/AFM code could calculate how much RAM would be needed to run the algorithm on a given skill model using the formula provided by Hao. This formula is based on the number of transactions, number of students and number of skills. Right now it will not schedule itself to run on a model with over 300 skills, though there is a manual override. [[User:Alida|Alida]] 13:35, 29 November 2010 (EST)

==== Create KC Models through Web Services ====
* For John Stamper's CMDM project, it would be nice to automatically update KC Models through web services. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Automatically discovering new KC model ====

* Would it be possible to run some code (perhaps Hao's KC model selection code, perhaps something else generated by the CMDM thrust) to find a new, better KC model? -- Vincent Aleven, Sept 2008
* As a learning sciences researcher, I would like DataShop to discover a new/better KC model for me.
* Could be done as a plug-in

==== Generate new KC Models with LFA ====
* Not sure who asked for this.
* It would be nice to generate new KC Models with Hao's LFA code
* Would need to specify factors.
* Ideas on where this could run?
** On a separate server? Request it to be run, specify duration. Have separate server queue up requests, email user when done.
** In Java Applet on client machine? -- Phil Pavlik

==== Same Skill Twice on Same Step ====
* Would like to be able to apply the same skill to a step twice during a KC Model Import. -- Ken Koedinger, email, 2/4/2009

==== Save KC Model Import Files ====
* KC Model Import - save the file used to create the KCMs in case we need to recreate them. -- Ken Koedinger, email 3/4/2009

==== Log Likelihood and MAD ====
* Log Likelihood, MAD (mean absolute deviation) problem, MAD step (store and show) -- Hao Cen
** This is a variation on "Richer statistics for KC modeling" above. Probably should be merged. - Ken Koedinger
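A Python sketch of these statistics, assuming binary correctness outcomes and model predictions of P(correct). "MAD step" is computed over individual student-steps. For "MAD problem", the sketch assumes that observed and predicted values are first averaged within each problem; this is one plausible reading, not a confirmed definition. Function names are hypothetical.

```python
import math
from collections import defaultdict

def log_likelihood(observed, predicted, eps=1e-12):
    """Bernoulli log-likelihood of 0/1 outcomes under predicted P(correct)."""
    total = 0.0
    for y, p in zip(observed, predicted):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += math.log(p) if y else math.log(1 - p)
    return total

def mad(observed, predicted):
    """Mean absolute deviation between observed outcomes and predictions (step level)."""
    return sum(abs(y - p) for y, p in zip(observed, predicted)) / len(observed)

def mad_by_group(groups, observed, predicted):
    """MAD after averaging observed and predicted values within each group (e.g., problem)."""
    obs_sum, pred_sum, count = defaultdict(float), defaultdict(float), defaultdict(int)
    for g, y, p in zip(groups, observed, predicted):
        obs_sum[g] += y
        pred_sum[g] += p
        count[g] += 1
    return sum(abs(obs_sum[g] / count[g] - pred_sum[g] / count[g]) for g in count) / len(count)

obs = [1, 0, 1, 1]
pred = [0.8, 0.3, 0.6, 0.9]
problems = ["p1", "p1", "p2", "p2"]
print(log_likelihood(obs, pred), mad(obs, pred), mad_by_group(problems, obs, pred))
```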

==== Better naming for KCs in auto-generated Unique-Step KCM ====

As a researcher, I want the KCs in the Unique-Step model to have better names than KC1, KC2, etc., so that I can easily tell which generated KCs go with which unique step.

* Hui Cheng (Email 1/20/2009), Ken Koedinger (Email 1/22/2009)
* Could you put this in your new feature request list: could the “Unique-step model” be better labeled than just “KC1”, “KC2” and etc? For example, for Assistments data, you could use part of the “Step Name”. -- Hui Cheng, Email 1/20/2009
* But, anything is better than "KC". -- Ken Koedinger, email, 1/22/2009
<blockquote>A simple alternative, that preserves uniqueness and addresses length, is to concatenate: 1) the first K letters of the step name 2) a unique numerical increment (same as the "3" in "KC3"). Note that (2) guarantees uniqueness just as it does in the current "KC<num>" scheme. Or perhaps better given that step names are often scoped within problems, is to concatenate: 1) the first L letters of the problem name 2) the first M letters of the step name 3) a unique numerical increment (just like the "3" in "KC3") I think K or L+M should be as big as possible without making the KC names indistinguishable (because they run off the right margin) in the KC list on the Learning Curve and other pages. </blockquote>
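A minimal Python sketch of the second scheme in the quote: concatenate the first L letters of the problem name, the first M letters of the step name, and a running increment. The hyphen separator, default lengths, and function name are assumptions. The increment alone keeps names unique when truncated prefixes collide, as with the two problems in the example.

```python
def unique_step_kc_names(steps, l_chars=8, m_chars=12):
    """Name one KC per unique (problem, step) pair.

    Concatenates the first l_chars of the problem name, the first m_chars of
    the step name, and a running counter that guarantees uniqueness.
    """
    names = {}
    counter = 0
    for problem, step in steps:
        key = (problem, step)
        if key in names:
            continue  # this unique step already has a name
        counter += 1
        names[key] = f"{problem[:l_chars]}-{step[:m_chars]}-{counter}"
    return names

# Hypothetical problem and step names.
steps = [
    ("LinearEq1", "solve-x-field"),
    ("LinearEq1", "solve-y-field"),
    ("LinearEq1", "solve-x-field"),
    ("LinearEq2", "solve-x-field"),
]
for key, name in unique_step_kc_names(steps, l_chars=8, m_chars=7).items():
    print(key, "->", name)
```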

==== Visualize Learning Curve Split ====
* Be able to visualize a learning curve split into 2 based on a specification of a subset of problems. -- Albert Corbett, Math CCM, November 2008
* Harder: make me a new KCM out of it. -- Ken Koedinger, Team Mtg, Dec 5, 2008, while trying to describe Albert's request to Alida

==== Statistical Significance ====
* Can DataShop determine if the difference between conditions or learning curves is statistically significant? -- general theme at workshop, probably mentioned by Bob Hausmann in his talk, Winter Workshop 1/23/2008
** Can't do it yet in DataShop, but I can show you how to do it in R (or SPSS...) after you've exported the data. Export the "student-step rollup" rather than the whole transaction table. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** This comment perhaps belongs (exists?) elsewhere: The current student-step rollup export (really, all exports) should be such that I can immediately load it into R (and other packages) without error. Currently, errors occur, for instance, because there are "#" characters in the variable names in the student-step rollup. This is a roadblock for helping folks like Bob do the analyses they want to do. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** See [[Condition in Student-Step Rollup]] -- [[User:Alida|Alida]] 10:35, 17 September 2009 (EDT)
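Until exports load cleanly, one workaround is to rewrite column headers before loading. This Python sketch (a hypothetical helper, loosely in the spirit of R's `make.names` but not identical to it) replaces characters such as '#', spaces, and parentheses with underscores, makes each name start with a letter, and de-duplicates the results. The example headers are illustrative.

```python
import re

def r_safe_names(headers):
    """Rewrite column headers into unique names that R can use as variable names.

    Characters other than letters, digits, '.', and '_' become '_'; names that
    do not start with a letter get an 'X' prefix; duplicates get '_1', '_2', ...
    """
    used = set()
    result = []
    for header in headers:
        name = re.sub(r"[^A-Za-z0-9._]", "_", header.strip())
        if not name or not name[0].isalpha():
            name = "X" + name
        candidate, i = name, 1
        while candidate in used:
            candidate = f"{name}_{i}"
            i += 1
        used.add(candidate)
        result.append(candidate)
    return result

print(r_safe_names(["Problem Name", "# Of Hints", "KC (Default)"]))
```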

==== Split and Merge Skills ====

* Hand-search through a p-matrix for a dataset to split and merge skills. (Pie in the Sky) -- Ken Koedinger, Team Mtg, 02/22/2008

==== Notes on new KCMs ====

* It would be good if I could add a note to a KC model that was newly imported. -- Noboru Matsuda, email, Nov 19, 2009

==== Display Learning Curves Grouped by Interestingness ====

* The page displaying all the learning curves today seems to be alphabetically sorted by KC name, which is not necessarily meaningful.
* An alternative presentation is to group the curves into 4 sets, breaking up the page. Set 1 has curves that contain significant spikes, and therefore seem to be "low-hanging fruit" for purposes of breaking up into KCs. Set 2 has curves with few spikes, but they have a long X axis, suggesting that students are presented with too many opportunities to acquire those KCs. Set 3 has the "good" curves, i.e., nicely decreasing curves that are not too long. Set 4 is "other". -- [http://www.pitt.edu/~goldin Ilya Goldin] 7 December 2009

==== Order KC models according to AIC====
* This is based on results from multiple dataset analyses that compared AIC, BIC and loglikelihood to cross validation RMSE. -- Mimi McLaughlin, 2/9/2011

====Use log of opportunity count for AIC and BIC calculations====
* We compared using the log of opportunity count to whole number opportunity count in multiple datasets. We found the results for log of opportunity count to be consistently better, though small. -- Mimi McLaughlin, 2/9/2011
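To make the comparison concrete, here is a Python sketch of an Additive Factors Model prediction: logit P(correct) = theta (student proficiency) plus, for each KC on the step, beta_kc + gamma_kc * T, where T is the student's prior opportunities on that KC. The log variant replaces T with log(1 + T). Using 1 + T so that the first opportunity (T = 0) stays defined is an assumption, as are all names; the AIC and BIC values would then come from fitting this transformed model.

```python
import math

def afm_predict(theta, kcs, beta, gamma, opportunities, use_log=False):
    """Additive Factors Model prediction of P(correct) for one student on one step.

    theta: student proficiency; kcs: KCs tagged on the step;
    beta / gamma: per-KC easiness and learning-rate dicts;
    opportunities: dict of prior practice counts T for this student per KC.
    """
    logit = theta
    for kc in kcs:
        t = opportunities.get(kc, 0)
        practice = math.log1p(t) if use_log else t  # log(1 + T) keeps T = 0 defined
        logit += beta[kc] + gamma[kc] * practice
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical parameters: same KC, linear vs. log opportunity term after 3 prior opportunities.
print(afm_predict(0.0, ["k"], {"k": 0.0}, {"k": 0.5}, {"k": 3}))
print(afm_predict(0.0, ["k"], {"k": 0.0}, {"k": 0.5}, {"k": 3}, use_log=True))
```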

== Developer Requests ==

==== Store Converted Date and Converter Info ====

As a DataShop administrator, I'd like to see the converter information (version and date) stored in the database, so that I do not have to store that data manually in the Additional Notes field each time I load a dataset. -- Kyle, 8/5/2008

* DTD new fields:
** Store conversion and converter information in database (anything else?)
*** conversion_time
*** converter_info
* Email from Octav, 10/5/2007

I see there's also a note about the converter version in the Dataset
Info. Which is good, but it seems it's taken from the directory name
when I submitted the set. I don't know how reliable that is.  :-) It
would be better if it's taken from the new converter info field.

==== Convert from XML to tabbed-delimited format ====

* If users agree that the export format is valuable, maybe they could convert from XML to the export format to view the data in Excel, where they could look at the Selection column and spot blanks more easily. -- Jonathan Sewall, ET Mtg, 10/10/2007
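A rough Python sketch of such a converter, using the standard-library `xml.etree.ElementTree` and `csv` modules. The element names (`tutor_message`, `event_descriptor`, `selection`, `action`, `input`) are a simplified stand-in for the logging format and are assumptions; a real converter would need to follow the actual DTD. Blank selections show up as empty leading columns in the output.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Simplified, hypothetical log snippet; the second message has a blank selection.
SAMPLE = """<log>
  <tutor_message>
    <event_descriptor><selection>numerator</selection><action>UpdateText</action><input>3</input></event_descriptor>
  </tutor_message>
  <tutor_message>
    <event_descriptor><selection></selection><action>ButtonPressed</action><input>-1</input></event_descriptor>
  </tutor_message>
</log>"""

FIELDS = ["selection", "action", "input"]

def xml_to_tsv(xml_text):
    """Flatten each tutor_message into one tab-delimited row of selection/action/input."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(FIELDS)
    for msg in root.iter("tutor_message"):
        desc = msg.find("event_descriptor")
        row = []
        for field in FIELDS:
            el = desc.find(field) if desc is not None else None
            # Missing elements and empty text both become blank cells.
            row.append((el.text or "").strip() if el is not None else "")
        writer.writerow(row)
    return out.getvalue()

print(xml_to_tsv(SAMPLE))
```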

==== Plug-ins (general issues) ====

* Please please support Ruby on Rails. -- Ben Shih, December 2008

==== Create UI to grant DataShop user roles ====

* Already tedious.
* Alida, User Meeting AAR on December 9, 2009

== Help ==

==== Specialize Label of Help Button ====
* Since this help is better than in most applications, it should say more than just 'Help'. -- Ken Koedinger & Ryan Baker, Team Mtg, May 30, 2008
** Ideas:
*** 'Page Help'
*** 'Help with this page'
*** 'Help with Learning Curve page' (Ken's favorite)
*** 'Help with this tool'

== Home Page ==

==== Redesign the Home Page ====

* In the menu of data sets at the top, include the N (=20 or as many as fit on the screen?) data sets that I have visited starting with the ones I've visited most recently. --[[User:Koedinger|Koedinger]] 16:16, 16 September 2009 (EDT)
** As of today, the menu shows the last 10 data sets, most recently visited at the top. I think the feature is good enough, but let's ask Ken --[[User:Bleber|Bleber]] 10:56, 6 August 2010 (EDT)
* There needs to be a better ordering for the datasets (DS364)
* Maybe a search to filter the list of datasets since the list is so long. -- Brett Leber, 6/14/2007
* Going back to the home page always goes to 'My Datasets' (DS313)
* Maybe show more high-level stats on this page, like how many transactions, students, and skill models a dataset has
* Allow users to post and share project documentation (files, papers, other meta-info) -- Ruth Wylie, suggested during meeting on 8/4/2010. She had a file that is relevant to multiple datasets in her project.

== Import ==
== Miscellaneous ==

==== Analyses by LearnLab ====

* Organize data by LearnLab, not by "data set". -- Ryan Baker, Aug 2008
* Also: Bob Hausmann, Sep 2008; Maxine Eskenazi, Sep 2008
* Essentially, current data sets become samples, but the top-level unit is the LearnLab. You can take every data set in a LearnLab together as a sample.
* Implies being able to run analyses across data sets, and export multiple data sets together; to create multi-data set samples
* As a user of DataShop, I would like to look at learning curves for all Algebra data together (for example), or export all Algebra data
* Important long-term, but is a lot of work -- in particular, we need to solve scalability issues first.

==== Save Settings Between Sessions ====
* It would be useful if DataShop could save settings between sessions. -- Bob Hausmann, User Meeting, 2/1/2008
** "I do a lot of redoing the same steps" (e.g., set cutoffs, select a KC model, select students).

==== Multiple steps per transaction ====

* Needed so that we do not have to create multiple transactions for the same actual action for Andes logs. -- Kurt VanLehn, Feb 2007

==== Demographic data ====

* This has been mentioned by NSF visitors, AB, ESL, and some researchers.
* Also mentioned at Winter Workshop 1/23/2008.
** Derek/Sue-mei: Student background information not in DataShop. Would like to see a student or set of students from a particular demographic, and view them across datasets!
* Note that Gail added demographic data to Additional Notes field on the Dataset Info page for many datasets. The idea here is to put that data into the database somewhere.

==== Single Sign On ====
* Michael Bett, email, 10/8/2007
* It would be nice if the following services have a single login account/password:
*# Theory Wiki
*# Learnlab.org
*# ESL's OSS
*# DataShop

==== Reveal unanonymized student IDs ====

As a researcher/PI performing research assistant tasks, I want to easily see the unanonymized student IDs of subjects in DataShop so that I can email my subjects to tell them when to use my system.

* Ruth Wylie, July 3, 2008
* As the honest broker of the PSLC data, I have promised to not reveal the student IDs and to protect the identity of the students. This is part of the DataShop IRB. Therefore, I do not see this request as possible. [[User:Alida|Alida]] 09:53, 4 September 2009 (EDT)
** At some point, some coordination would be good with OLI's Digital Dashboard project that Marsha Lovett (and the OLI team and sometimes me) is working on. This is meant to provide usage information more quickly to instructors. It could also perhaps be used by researchers (with the right IRB rights) in situations like Ruth's.

==== Knight Timeline ====
* Developed by Andrea Knight, 2004

==== Buggy Skills ====
* Ken Koedinger, prototype walkthrough 9/11/2006

==== Confusion Matrix ====
* Brian MacWhinney, prototype walkthrough 9/11/2006

== Navigation Bar ==
==== Filter KCs by Name ====

As a researcher working with KCs, I want to filter KCs based on their names, so that I can...
* Vincent Aleven, Email, 2/3/2008
* "Since Alida said you cannot have two mechanisms for putting together your KC set (i.e., cannot have both the selecting-by-clicking and selecting-by-filtering), I would probably opt for the latter."
* Alida: I thought Vincent mentioned that he'd like to select which KCs are in a set by filtering on the name. Example: Include KCs with '*reason*' in the name and exclude KCs with '*given*' in the name.
* This could be an addition to our v3.0 KC-selection mechanism--filter by name.
* Vincent, Email, 5/6/2009: Expressed another need for this feature. Could just allow for a wider area and longer list so that more items can be checked at once. The number of characters we show right now is not enough because in many cases that number of characters is the same across many of the skills. Reference data set: Geometry CWCTC 2005-2006

* Status: Design Started
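Vincent's example (include '*reason*', exclude '*given*') maps naturally onto shell-style wildcards. A Python sketch using the standard-library `fnmatch.fnmatchcase`, which is case-sensitive; the function name and KC names are made up for illustration.

```python
from fnmatch import fnmatchcase

def filter_kcs(kc_names, include=("*",), exclude=()):
    """Keep KCs whose names match any include pattern and no exclude pattern."""
    return [
        kc for kc in kc_names
        if any(fnmatchcase(kc, p) for p in include)
        and not any(fnmatchcase(kc, p) for p in exclude)
    ]

kcs = ["angle-reason-given", "triangle-sum-reason", "vertical-angles", "isosceles-reason"]
print(filter_kcs(kcs, include=["*reason*"], exclude=["*given*"]))
# ['triangle-sum-reason', 'isosceles-reason']
```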

==== Facebook-style KC Selection ====

As a researcher working with KCs, I want to select KCs based on the learning curve thumbnail, so that I can see quickly which ones I'm interested in exploring more deeply.

* Feature already designed for v3.0, not implemented due to time constraints.
* Agreed this would be really useful. -- Kirsten Butcher, User Mtg, 1/31/2008
* Status: Guesstimate: 20 days, need to revisit requirements document

==== Feedback after clicking a large sample on a large dataset ====

As a user of DataShop (first-time or not), I want some feedback and the ability to cancel after I do something that might take a long time (e.g., clicking "All Data" on a large dataset) so that I do not get stuck.

* Part of the Susan Goldman story
* After clicking a sample for a large dataset, there is no "Loading..." text, no feedback that the click was even registered by the app (besides the sometimes busy cursor and small browser "loading" text), nor the ability to cancel the action. 
* We will always have similar problems even if performance is improved, so providing feedback and the opportunity to cancel is critical.

==== Save Button in Problem Navigation Box ====

* Save buttons in the sidebar. -- Ken Koedinger, Mtg 2006
** Could also put one in the Problem selection box in the sidebar.

==== Make Nav Bar Wider ====
* Make the Sample name and description fields much wider. -- John LaPlante, email 7/10/2007

== New Visualizations/Reports ==

==== Student-KC Rollup ====
As a researcher, I want to see KCs rolled up by student, so that I can ...

* Vincent Aleven, User Mtg, 1/29/2008
** By Student-KC would be more useful than by Student-Problem
** Example: # Steps asking for a hint or error or what proportion had help
** How often bottom out hint occurs
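A Python sketch of the roll-up, assuming one input record per student-step-KC (so a step tagged with two KCs counts toward both) and a first-attempt value of "correct", "incorrect", or "hint". Field and function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class StepRecord:
    student: str
    kc: str
    first_attempt: str  # "correct", "incorrect", or "hint"
    bottom_out: bool    # whether the student reached the bottom-out hint on this step

def student_kc_rollup(records):
    """Aggregate step counts, help/error counts, and bottom-out hints per (student, KC)."""
    stats = defaultdict(lambda: {"steps": 0, "help_or_error": 0, "bottom_out": 0})
    for r in records:
        s = stats[(r.student, r.kc)]
        s["steps"] += 1
        if r.first_attempt != "correct":
            s["help_or_error"] += 1
        if r.bottom_out:
            s["bottom_out"] += 1
    return {
        key: {**s, "help_or_error_rate": s["help_or_error"] / s["steps"]}
        for key, s in stats.items()
    }

records = [
    StepRecord("s1", "kcA", "correct", False),
    StepRecord("s1", "kcA", "hint", True),
    StepRecord("s1", "kcA", "incorrect", False),
    StepRecord("s2", "kcA", "correct", False),
]
for key, row in student_kc_rollup(records).items():
    print(key, row)
```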

==== Instructor Reports ====
* Phil said he received a lot of positive reactions to providing reports on units for instructors. -- Phil Pavlik, ET Mtg 10/10/2007

==== Manage Authorizations/Projects Page ====
* Lisa Anthony, email 10/23/2007
* Allow PI to change permissions on the datasets.
:"Actually, I couldn't see how to change permissions on the datasets from the website. Is this possible? If not, it might be a nice feature..."

==== Calculate Time Spent on Different Study Activities ====

As a researcher, I want to know how much time, on average, students spend on study activities, so that I can ...

* Bruce McLaren, Email, 4/7/2009
<blockquote>
For my most recent stoich study, Shawn and I are interested in calculating timing information such as: (a) how long students spent, on average, working on individual tutors (b) how long students spent, on average, on all items in an intervention (c) how long students worked, on average, on post-tests. 

Timing information is very commonly required for studies, and can be calculated from DataShop logs relatively easily, so even if we don't have it, might be worth considering. (And we don't want to re-invent the wheel, if you already have it or are planning it...)
</blockquote>
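Bruce's measures can be approximated from transaction timestamps. The Python sketch below is one simple approach, not an existing DataShop calculation. For each student and activity (a tutor, an intervention, or a post-test, depending on how activities are labeled), it takes the last timestamp minus the first, then averages across students. It ignores idle time within a session and time before the first logged action, both of which a real implementation would need to address. Names and sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def avg_minutes_per_activity(transactions):
    """Average minutes students spent on each activity.

    transactions: iterable of (student, activity, ISO-8601 timestamp) tuples.
    Time for one student on one activity is approximated as last minus first timestamp.
    """
    times = defaultdict(list)
    for student, activity, ts in transactions:
        times[(student, activity)].append(datetime.fromisoformat(ts))
    per_activity = defaultdict(list)
    for (student, activity), stamps in times.items():
        per_activity[activity].append((max(stamps) - min(stamps)).total_seconds() / 60)
    return {activity: mean(minutes) for activity, minutes in per_activity.items()}

tx = [
    ("s1", "tutor1", "2009-04-07T10:00:00"), ("s1", "tutor1", "2009-04-07T10:10:00"),
    ("s2", "tutor1", "2009-04-07T11:00:00"), ("s2", "tutor1", "2009-04-07T11:20:00"),
    ("s1", "posttest", "2009-04-07T12:00:00"), ("s1", "posttest", "2009-04-07T12:05:00"),
]
print(avg_minutes_per_activity(tx))
```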

==== Incorrect Step Duration and Hint Step Duration ====

As a researcher, I want to be able to see total step duration if the student's first attempt was an incorrect attempt, and total step duration if the student's first attempt was a hint request, so that I can do some analyses that I can't do with "Error Step Duration".

* Bob Hausmann, email, 11/11/2008.
* Updated title and story with 'step duration' instead of 'time'. -- [[User:Alida|Alida]] 10:36, 4 September 2009 (EDT)
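Assuming that Error Step Duration is the step duration when the first attempt was either an incorrect attempt or a hint request, the requested columns simply split it by first-attempt type. A Python sketch with hypothetical field names; `None` marks a blank cell.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class StudentStep:
    first_attempt: str    # "correct", "incorrect", or "hint"
    step_duration: float  # seconds

def duration_columns(step: StudentStep) -> Dict[str, Optional[float]]:
    """Return error, incorrect, and hint step durations; None where the cell would be blank."""
    d = step.step_duration
    return {
        "error_step_duration": d if step.first_attempt in ("incorrect", "hint") else None,
        "incorrect_step_duration": d if step.first_attempt == "incorrect" else None,
        "hint_step_duration": d if step.first_attempt == "hint" else None,
    }

print(duration_columns(StudentStep("hint", 12.5)))
```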

==== Grading ====

[[Grading]]

====Display number of steps and number of observations for skills====
* How and where to be determined by developers. -- Ken (entered by Mimi)

== Reports ==
=== Dataset Info ===

==== Pointers to Hard-copy Data ====
* Brett van de Sande, NSF Site Visit, 5/28/2008
* Pointers to hard-copy data such as paper tests and/or homework.  Include contact information.  It doesn't seem to make sense to scan a whole filing cabinet of paper if no one wants to look at it.  And any secondary researchers don't know about the filing cabinet to ask for it.

==== Sort Problem Breakdown Table ====

* Would like the ability to sort the table by clicking on the column headings of the Problem Breakdown Table on the Dataset Info Tab. -- Bruce McLaren, User Mtg, 11/5/2007

==== Rename dataset ====

As a researcher, I want to rename my dataset so that it makes more sense to other people. I also want to make sure the dataset doesn't become polluted later by new data not associated with my study.
* Ruth Wylie, July 3, 2008
* There are reasons she would want to do this (current name is worthless, other researchers might try her tutor and pollute her data) but also reasons for not doing it (log more data later).
* There are risks in changing a dataset name that might not be apparent, for example if you later want new data to go into the same dataset. [[User:Alida|Alida]] 10:27, 4 September 2009 (EDT)

==== Average time per problem, average number of problems, total number of sections ====

In addition to showing student hours per dataset, it would be useful to know the average time spent per problem, average number of problems completed, and the total number of sections.
* Noboru Matsuda, June 18, 2010

=== Error Report ===

==== View By Student ====

* Would like to see what a couple of students saw in the feedback. -- Marsha Lovett, 10/11/2007

==== Export ====
* I would like the ability to export this data. -- John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
* Also interested in this feature idea. -- Bruce McLaren, User Mtg, 11/5/2007

==== Sort ====

* Was planned for but not implemented in v2.1 (estimated to be a 4 day task)
** By Correctness %, starting with the least correct
** By Hints %
** Step (or KC if view by KC)
** Number of Students
* Ability to sort problems by their average experienced position within the curriculum. -- Ken Koedinger, 02/16/2007
** Which problem did students most often experience first, then the one experienced second most often, ...
* Order steps by the order they typically are executed by students. -- Ken Koedinger, email 11/7/2008
:"Searching through the steps in a problem to get a sense of what is going on is currently hard because the steps are ordered alphabetically, not by the order in which most students did them. While not all students do all steps in the same order, there is some regularity there. It would be quite useful if the steps could be ordered in a "typical order". This could be accomplished by using the time stamps (of the first (correct?) transaction?) for each step to determine rank order of each step for each student in a problem and then for each problem average the rank order of each step across all students. Then arrange the steps in the Error Report by their average rank order -- that is, roughly speaking, the step that is most likely to be first across students (closest on average to first) goes first, the step with the next lowest rank goes next, etc."
* The capability to count the number of errors of each message type and sort in different ways, for instance by all errors that had no messages. -- Bruce McLaren, email 10/22/2007
:"This is an error analysis I recently did in Excel, using pivot tables, that might be handy if in the DataShop. This one is very important for tutors because the errors that occur most frequently, yet don't elicit messages to the students, are good candidates to become errors with feedback."
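Ken's proposed "typical order" can be sketched directly in Python: rank each student's steps within a problem by the time of the first transaction on each step, average those ranks across students, and sort. In this sketch, a step's average covers only the students who attempted it, ties are broken by step name, and the function and step names are hypothetical.

```python
from collections import defaultdict

def typical_step_order(first_times):
    """Order each problem's steps by average rank across students.

    first_times: dict mapping (student, problem) -> {step: time of the student's
    first transaction on that step}; times only need to be comparable.
    Returns {problem: [steps sorted by mean rank, ties broken by step name]}.
    """
    ranks = defaultdict(lambda: defaultdict(list))
    for (student, problem), steps in first_times.items():
        ordered = sorted(steps, key=lambda step: steps[step])
        for rank, step in enumerate(ordered, start=1):
            ranks[problem][step].append(rank)
    return {
        problem: sorted(step_ranks, key=lambda s: (sum(step_ranks[s]) / len(step_ranks[s]), s))
        for problem, step_ranks in ranks.items()
    }

# Alphabetical order would be answer, moles, units; the typical order differs.
data = {
    ("s1", "p1"): {"units": 1, "moles": 2, "answer": 3},
    ("s2", "p1"): {"moles": 1, "units": 2, "answer": 3},
    ("s3", "p1"): {"units": 1, "answer": 2, "moles": 3},
}
print(typical_step_order(data))
```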

==== Show more than 500 problems ====

In the error report, can we see more than 500 problems? See the dataset "Cog Model Discovery Experiment Spring 2010", where the display reads "2/2888 selected. (Showing the first 500)". -- Ken via email on 2/15/2011

=== Export ===

==== Last attempt on step? column for transaction format ====

* Include a new column that shows whether the row is the last attempt on a step for a student or not. Could be 0 or 1 as value. Helpful for researchers who are doing grading of data. Transaction format. --Vincent Aleven, CTAT mtg 11/5/2010
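A Python sketch of how the 0/1 column could be derived, assuming transactions carry a student, problem, step, and comparable time. It marks the latest transaction per student-problem-step (ties go to the later row). If a student revisits a problem, the key would probably also need the problem view, which this sketch leaves out. Names are hypothetical.

```python
def mark_last_attempt(transactions):
    """Add a 0/1 'is_last_attempt' flag to each transaction row.

    transactions: list of dicts with 'student', 'problem', 'step', and 'time' keys.
    The latest transaction for each student-problem-step gets 1; all others get 0.
    Rows are returned in input order.
    """
    last_index = {}
    for i, row in enumerate(transactions):
        key = (row["student"], row["problem"], row["step"])
        if key not in last_index or row["time"] >= transactions[last_index[key]]["time"]:
            last_index[key] = i
    winners = set(last_index.values())
    return [{**row, "is_last_attempt": int(i in winners)} for i, row in enumerate(transactions)]

rows = [
    {"student": "s1", "problem": "p1", "step": "x", "time": 1},
    {"student": "s1", "problem": "p1", "step": "x", "time": 3},
    {"student": "s1", "problem": "p1", "step": "y", "time": 2},
]
print([r["is_last_attempt"] for r in mark_last_attempt(rows)])
```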

==== Elapsed Time ====

* Include the elapsed time in preview and transaction export. It is more valuable than the transaction time as an absolute reference. Possible to keep both. --Ken Koedinger, Team Mtg 04/18/2008

==== SQL Format ====

* Option to export as an SQL file. -- Ken Koedinger, 03/26/2007, also brought up in June ET Meeting
** Ability to export an SQL dump of a dataset. --Kyle Cunningham, 04/03/2007

==== Specify Character for Blanks ====

* Ability to specify what character if any is used for blank. --Ryan Baker, email 8/9/2007
:"Not all tools handle TABTABTAB correctly on import. The period '.' is used to mean missing data in most stats packages. The word 'BLANK' is used in some other ones." Not an issue for Ryan, as he wrote a preprocessor to convert blanks.
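A preprocessor like the one Ryan describes can be quite small. This minimal Python sketch assumes a tab-delimited export whose first line is the header; the function name fill_blanks is hypothetical. Data cells that are empty or whitespace-only become the chosen placeholder, and the header is left alone.

```python
def fill_blanks(tsv_text, placeholder="."):
    """Replace empty data cells in a tab-delimited export with a placeholder."""
    lines = tsv_text.splitlines()
    out = lines[:1]  # keep the header row unchanged
    for line in lines[1:]:
        cells = line.split("\t")
        out.append("\t".join(cell if cell.strip() else placeholder for cell in cells))
    return "\n".join(out) + "\n"
```

Calling fill_blanks(text, "BLANK") would serve the packages that expect the word rather than the period.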

==== Opportunity (at KC) Count in Transaction Export ====
* Include the opportunity count in the transaction export (it's only in the student-step rollup) -- Noboru Matsuda, 10/08/2009
** Would be repetitive
** Would need one Opportunity column per KC

==== Export only rows that have KCs tagged ====
* Is it possible, when exporting a dataset, to include only the [transaction?] rows that have knowledge components tagged? And likewise for the KC models export: only include the items that have KCs tagged? -- Hui Cheng, 01/19/2010
** We have the inverse of this option on the Performance Profiler, 'Include steps without a knowledge component', and with the Student-Problem export, 'Include Steps without KCs'.

==== Include Custom Fields in Student-Step Rollup ====
* Is it possible to include the custom field in the DataShop [step] export? -- Hui Cheng, 03/01/2010
** "I am thinking about the student-step level export. We have a student from Statistics wanting to use Assistment data from DataShop. She wants to have problem set id (=curriculum id in DataShop). For Assistment data, we decided to put problem set id in the custom field because in the Assistment a lot of problems are repeated in different problem sets. She needs data at the student-step level." (Hui email)
* Vote from Phil Pavlik too -- see [[Collected_User_Requests#]]

==== Don't duplicate rows in student-step format when not showing KCs ====
* If the checkbox to show knowledge components is not checked, maybe it doesn't make sense to show rows more than once if they have more than one KC associated with the step. (Mimi (and Brett) stumbled on this. 8/16/2010)

==== Student-Step Rollup include Success Column ====
* In the student-step rollup, add a column called Success: 1 if correct, 0 if incorrect or hint, blank otherwise. ~~ Ken Koedinger, DataShop Team Meeting, Oct 22, 2010
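The proposed mapping is simple enough to state as code. This sketch assumes the rollup already records each step's first-attempt outcome as correct, incorrect, or hint; the function name success is hypothetical.

```python
def success(first_attempt):
    """Map a step's first-attempt outcome to the proposed Success value.

    Returns 1 for 'correct', 0 for 'incorrect' or 'hint', and '' (blank)
    for anything else, such as a missing or unrecognized value.
    """
    outcome = (first_attempt or "").strip().lower()
    if outcome == "correct":
        return 1
    if outcome in ("incorrect", "hint"):
        return 0
    return ""
```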

=== Learning Curve ===

==== Default sort by observation on LCPID ====

* Default sort by observation on LCPID. ~~ Ken Koedinger, DataShop Team Meeting, 10/22/2010

==== Purple Point ====
* Purple Point: if a point on the LC has more than one KC associated with it but you have drilled down to a given KC, then the blue line is off. We could put a purple point that takes this into account.
* Simpler thing: display a warning message that some points in the display are driven by other KCs
* Pearson may be interested
* This was mentioned during the PSLC Summer School 2010.
* For a step with multiple skills, attribute the error only to the skill with the highest overall error rate. ~~ Alida, meeting with Ken, November 18, 2010
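The last idea above (charging a multi-KC step's error only to the KC with the highest overall error rate) could be sketched as below. It assumes hypothetical (kcs, is_error) pairs, one per student-step, and computes each KC's overall error rate over every step it is tagged on; the function name is illustrative.

```python
from collections import defaultdict


def attribute_errors(steps):
    """Credit each step's error to only one of its KCs.

    steps: list of (kcs, is_error) pairs, where kcs is a non-empty list of
    KC names and is_error is True or False.
    Returns {kc: number of errors attributed to it}.
    """
    seen = defaultdict(int)
    errors = defaultdict(int)
    for kcs, is_error in steps:
        for kc in kcs:
            seen[kc] += 1
            errors[kc] += is_error
    rate = {kc: errors[kc] / seen[kc] for kc in seen}

    attributed = defaultdict(int)
    for kcs, is_error in steps:
        if is_error:
            # Ties go to the alphabetically first KC so results are repeatable.
            blamed = max(sorted(kcs), key=lambda kc: rate[kc])
            attributed[blamed] += 1
    return dict(attributed)
```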

==== Reduce Scrolling ====
* Add a forward and back button to the graph to reduce scrolling. -- John LaPlante, email 7/10/2007

:"The learning curve page could use a forward and back button to cycle through the learning curves. Going through them one by one requires a lot of scrolling."

==== Turn On Point Labels ====

* It would be nice to have the option to turn on point labels. It is nice that I can mouse over a point and view the data but it would be nicer if it appeared automatically. -- John LaPlante, email 7/10/2007

==== Option for Bigger Graph ====

* Allow user to see bigger graph. -- Derek Chan, Winter Workshop 1/23/2008
** Potential solution: enable user to set x, y scale manually

==== LC Normalize Scale of Thumbnails ====
* [[LC Normalize Scale of Thumbnails]]

=== Performance Profiler ===

==== Rename Performance Profiler ====
* John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
** Did not use this report, thinking it had something to do with improving the performance of the DataShop itself. This report might have been much better to use than the Error Report, but it would still need an export, since using the data in tabular form was still necessary. Note that the pivot tables created were added to the dataset (Pittsburgh Science of Learning Center Stoichiometry Study 1).
* Lisa Anthony, email 8/2008
** Didn't know to go to the report. Export would be useful.
** Needed a better definition of Error Rate with respect to Problem and Unit rows.

==== Export ====
* John LaPlante (see comments in Rename Performance Profiler)
* Lisa Anthony (see comments in Rename Performance Profiler)
* Yes, I too can imagine wanting to export the results of a particular performance profiler output (i.e., to a table) so that I can graph it my own way. --[[User:Koedinger|Koedinger]] 16:24, 16 September 2009 (EDT)

==== Table View ====
* Add option to switch to a table view.
* Columns could be: Problem Name, Steps, % incorrect, Incorrect Steps, % hint, Hint Steps, etc.; include all values shown in the pop-up.

==== Union of KCs/Problems/Students ====
* Allow the user to get the union of KCs/Problems/Students etc. so they can compare across samples more easily. -- Kirsten Butcher, Winter Workshop 1/23/2008
:"It is difficult to compare performance profiler graphs across samples because the KCs (or problems, or whatever) aren't necessarily in both of those samples."

==== Show Details In Report ====
* Click on bar to see details in report and not just in pop-up. It disappears too quickly. -- Alida, Brett

==== Show More Information in the Graph ====
* Show more information in the graph: [Bruce M, User Mtg, 11/5/2007]
** Had drilled down by a certain skill - skill is not listed in the graph, user has to check the skill list on the LHS to see what skill was selected
* Maybe related to Bruce's "show more info in graph": show the actual value of the range variable. e.g., when range is error rate, show the error rate number somewhere (right now you can see, via mouse-over, the incorrect, hint, and correct percentages, but not the error rate) -- Brett
* Design idea to show # steps incorrect, # steps hint, # steps correct to clarify how the percentages are calculated. -- Alida, Brett

== Sample Selector ==

==== Sub-Samples ====

* Sub Samples would be helpful. -- John LaPlante, email 7/10/2007
:"A nice solution would be to have sub-samples where one property varies. When I'm doing this analysis, I've changed my sample many times, renaming it sometimes, tweaking it to get variations on the data. The samples are really useful but they could help me a lot more with this kind of experimentation."

==== Filter out students ====

As a researcher, I want to filter out test users (including myself) from my data so that I see less noise in the data.

* Ruth Wylie, July 3, 2008
* You can already filter out test users by using a test user id that starts with 'weirdCMUuser_xxx'. Then create a sample that excludes students with a name like 'Test_%'. --[[User:Alida|Alida]] 10:29, 4 September 2009 (EDT)

==== Filter by "Class" ====

As a researcher, I want to create samples based on "class" since class is how I've encoded my conditions.

* Maaike Waalkens, July 30, 2010
* This is what made sense using Mathtutor for tutor delivery.
* What other fields are we missing in the Sample Selector?

==== Filter by "Step" ====

As a researcher, I want to create samples based on "step" since I'm only interested in one particular step.

* Mimi McLaughlin, August 17, 2010

==== Create Sample Automatically ====

* Would it be possible for me to get a random sample from the 'Bridge to Algebra 2006-2007' dataset of 100 students? I am having trouble looking at the data because it takes too long to load, and my adviser thought that was because the dataset was too large. ~~ DataShop User, 10/19/2010
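Drawing such a sample is straightforward once the student IDs are known. A minimal Python sketch, with the function name and seed parameter as illustrative assumptions; a fixed seed makes the sample reproducible.

```python
import random


def sample_students(student_ids, n=100, seed=None):
    """Return n distinct student IDs chosen at random, sorted.

    If there are n or fewer distinct students, all of them are returned.
    """
    ids = sorted(set(student_ids))
    if len(ids) <= n:
        return ids
    rng = random.Random(seed)
    return sorted(rng.sample(ids, n))
```

The resulting IDs could then be used to define a sample restricted to those students.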

== Web Services ==

==== Include transaction custom fields in web services student-step export ====

* "I noticed in the web services guide that cfs (which provides the custom fields) is not yet implemented for step roll-up tables. It is marked [coming soon] in the manual... I could write a workaround to pull in the transactions and lookup the custom fields, but I'd really rather not." -- Phil Pavlik, 4/27/2011

==== Use Custom Fields Graphs/Reports ====
In Graphs

As a researcher creating custom fields and assigning values at the transaction level, I want DataShop to perform the aggregation to the step level so that I can do other things with my custom-field variable such as graph it. -- Ryan Baker, mtg w/Alida & Brett, 12/15/2008

Performance Profiler
* Use all discrete variables/custom fields on the left and all the continuous variables on the bottom of the Performance Profiler. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Ad-hoc queries ====

* Allow restricted filtering on steps and transactions as the next web service feature (after CFs), whatever they can filter on in the navigation boxes (User Meeting AAR, December 9, 2009)

==== Sample creation as a web service ====

* Sample creation is still too slow. (User Meeting AAR, December 9, 2009)

 
----
See completed [[DataShop 3.x Features]] 
See on-going [[DataShop 4.x Features]] 
See prioritized [[DataShop Feature Wish List]]
[[Category:Protected]]
[[Category:DataShop]]

Collected User Requests

2011-04-28T16:24:28Z

Bleber: /* Web Services */

See prioritized items on [[DataShop Feature Wish List]].

== Annotations ==

==== Annotations on Transaction Level ====
* I have models which can annotate things like: gaming, bored, etc. on the transaction level. -- Ryan Baker, ET Mtg 12/5/2007

==== Annotations on Student Level ====
* Would annotate on student level. -- Ido Roll, User Meeting, 1/19/2009

==== Annotations on Pages ====
* See the cool thing created by Jeffery Heer where all the settings of the page were recorded with the comment. -- Ryan Baker, DS Team Mtg 5/23/2008

==== Dataset Discussion - Capture data-integrity issues ====
* As a stakeholder in the DataShop project, I want to capture and publicize the data-integrity issues discovered with data sets so that data is better documented (and so we've fulfilled a promise to our funders to better document data). -- Ken Koedinger, Team Meeting, 8/15/2009
* As a user of DataShop, I want to discuss datasets and have that discussion attached to the dataset so that others can comment and better understand any data-integrity issues I've found.

==== Linking to internal pages ====

* It would be handy if a link can be saved to any dataset, sample, page in the DataShop. -- [[Alida]], 10/18/2007
** Currently, https://pslc-qa.andrew.cmu.edu/datashop/DatasetInfo?datasetId=793 works if you are already logged in.

==== Have a link from the DataShop to the Theory Wiki (Dataset to Project Page) ====

* Can we link from the dataset to the project page on the Theory Wiki? In the pipeline have a clickable link to the project page (make project name clickable). -- Michael Bett, ET Mtg 11/14/2007
** Link to a dataset directly? Is that obvious to users? Click on dataset link -> log in -> redirected back to dataset. -- Brett Leber

== Data Format ==
==== Capture the question prompt and answer choices the student chose from ====
* As a researcher, I want to be able to identify steps based on the question prompt (not the difficult-to-understand step names that come from the selection and action of my tutor), so that I can analyze the data more easily. For multiple choice questions, I also want to be able to see all of the choices that were available to the student. -- Eli Silk - February 1, 2011 (meeting with Brett and Ross Higashi of the FIRE project)
** Near-term solution is to create a table locally that maps steps to prompts
** Another solution is to ask CTAT team to modify their Flash components so that they log this information as custom fields
** Long-term solution is to make these fields into standard fields in the tutor message format
** In the email thread "Cognitive Tutor Development and Evaluation Requests", there is support for this general idea from Ken Koedinger, Albert Corbett, and Christian Schunn.
** Ken added that "Ideally, we may want to store any images that the student can see and where they reside on the screen ..."

== Data Modeling ==
=== Non-KC Modeling ===

==== Automatic Distillation ====
* As an educational data miner wishing to develop a machine learned model with PSLC data, I would like to be able to automatically distill data features (e.g. custom fields) commonly used in past educational data mining research for a new data set (see, for instance, Baker, Corbett, Roll, & Koedinger, 2008 in UMUAI) -- Ryan Baker, Summer 2008, Startup Memo
** Could be implemented as a plug-in
* Also interested in this feature idea. -- Dan Franklin, Oct 2008

==== Upload model and apply it to new data set ====

* As an EDM researcher, I would like to take a model, expressible as a linear formula on DataShop fields or as a simple code procedure (e.g. Bayesian Knowledge Tracing, which Ryan has code for), and apply it to a new data set, so that I can ... -- Ryan Baker, Sept 2008
* Also interested in this. -- Maxine Eskenazi, Sept 2008
* May work best as a plug-in
** Code to display GUI to choose which data sets to use, calls model code, re-import to DataShop
** Good to have a way to apply many models, as soon as you import a data set
* Phil has an idea that maybe fits within this one. Please move if there's a better category. -- Brett Leber
<blockquote>This [''transaction? kc? --ed.''] relabeling is really mostly about enabling modeling in DataShop, right? With this in mind, I think that it is actually a higher priority to have model alternatives in DataShop.... E.g. Investigators should be able to give you chunks of Java code according to a certain specification, and DataShop should then be able to run these over datasets (perhaps after a certain series of QA occurs according to an SOP) when the investigator clicks some button in DataShop.... Obviously this is a much larger project than adding columns, but it is also much more important in my mind. ''-- Phil Pavlik, email to Brett on 1/14/2009''</blockquote>
* Examples:
** Example: running gaming detector in multiple tutors and comparing gaming frequencies
** Example: applying Bayesian Knowledge Tracing to a new data set from the same LearnLab
** Example: applying Ben Shih's models to many data sets. Note that Ben Shih should be included in design of this feature; he is interested, and has a lot of good ideas.

==== Add Different Predicted Values ====
* Would also like to add statistics, different predicted values than what LFA produces. -- Ken Koedinger, ET Meeting, 10/10/2007

==== Bayesian Knowledge Tracing ====
* Bayesian Knowledge-Tracing built into DataShop like LFA is. -- Ryan Baker, Startup Memo, Summer 2008
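For reference, the standard Bayesian Knowledge Tracing update could be sketched as follows. It assumes the usual four parameters (initial knowledge, learn/transit, guess, slip) have been fitted elsewhere; fitting them, arguably the harder part of building BKT into DataShop, is not shown, and the function name is hypothetical.

```python
def bkt_trace(observations, p_init, p_transit, p_guess, p_slip):
    """Bayesian Knowledge Tracing for one student on one KC.

    observations: sequence of booleans (True = correct first attempt).
    Returns P(known) just before each opportunity.
    """
    p_known = p_init
    history = []
    for correct in observations:
        history.append(p_known)
        # Bayes' rule: condition P(known) on the observed outcome.
        if correct:
            num = p_known * (1 - p_slip)
            den = num + (1 - p_known) * p_guess
        else:
            num = p_known * p_slip
            den = num + (1 - p_known) * (1 - p_guess)
        posterior = num / den
        # Chance to learn the KC at this opportunity.
        p_known = posterior + (1 - posterior) * p_transit
    return history
```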

==== Richer statistics for KC modeling ====

* In addition to the model stats and estimates generated for learning factor models, we should also create difficulty factor models (i.e., ones without a "Slope" parameter). The latter is particularly relevant for the Unique-Step model, where the slope parameter is meaningless (but still counts against the BIC value). -- Ken Koedinger, Email "new feature request", 1/22/2009
** Relatedly, we should report significance values on the Slope parameters -- that is, when is the Slope significantly different from 0.
** The KC models page perhaps should also report the log-likelihood and number of parameters (in addition to BIC) and leave out AIC. We might also consider other metrics of model generality, like the "adjusted R2" (if I have this name right -- Joe Beck mentioned in the Assistments meeting yesterday).
** These changes will be part of meeting the CMDM goal of improvement in (or at least demonstrate acceptability of) the cognitive models in 90+ units in our LearnLab courses (or affiliates).
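The statistics mentioned above can all be derived from a model's predicted probabilities. A minimal Python sketch for binary (correct/incorrect) outcomes, assuming n in the BIC penalty is the number of observations; the function name is hypothetical. Dropping a meaningless Slope parameter reduces n_params, and with it the penalty term in both AIC and BIC.

```python
import math


def model_fit_stats(observed, predicted, n_params):
    """Log-likelihood, AIC and BIC for a model of binary outcomes.

    observed: 0/1 outcomes (1 = correct); predicted: the model's
    probability of a correct outcome for each observation.
    """
    ll = sum(
        math.log(p) if y == 1 else math.log(1 - p)
        for y, p in zip(observed, predicted)
    )
    n = len(observed)
    return {
        "log_likelihood": ll,
        "n_params": n_params,
        "AIC": 2 * n_params - 2 * ll,
        "BIC": n_params * math.log(n) - 2 * ll,
    }
```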

=== KC Modeling ===

==== LFA/AFM: Check if enough memory using formula ====
* The LFA/AFM code could calculate how much RAM would be needed to run the algorithm on a given skill model using the formula provided by Hao. This formula is based on the number of transactions, number of students and number of skills. Right now it will not schedule itself to run on a model with over 300 skills, though there is a manual override. [[User:Alida|Alida]] 13:35, 29 November 2010 (EST)

==== Create KC Models through Web Services ====
* For John Stamper's CMDM project, it would be nice to automatically update KC Models through web services. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Automatically discovering new KC model ====

* Would it be possible to run some code (perhaps Hao's KC model selection code, perhaps something else generated by the CMDM thrust) to find a new best KC model? -- Vincent Aleven, Sept 2008
* As a learning sciences researcher, I would like DataShop to discover a new/better KC model for me.
* Could be done as a plug-in

==== Generate new KC Models with LFA ====
* Not sure who asked for this.
* It would be nice to generate new KC Models with Hao's LFA code
* Would need to specify factors.
* Ideas on where this could run?
** On a separate server? Request it to be run, specify duration. Have separate server queue up requests, email user when done.
** In Java Applet on client machine? -- Phil Pavlik

==== Same Skill Twice on Same Step ====
* Would like to be able to apply the same skill to a step twice during a KC Model Import. -- Ken Koedinger, email, 2/4/2009

==== Save KC Model Import Files ====
* KC Model Import - save the file used to create the KCMs in case we need to recreate them. -- Ken Koedinger, email 3/4/2009

==== Log Likelihood and MAD ====
* Log Likelihood, MAD (mean absolute deviation) problem, MAD step (store and show) -- Hao Cen
** This is a variation on "Richer statistics for KC modeling" above. Probably should be merged. - Ken Koedinger
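The exact MAD definitions are not spelled out here. One plausible reading, sketched in Python under that assumption, averages the actual and predicted error rates within each problem (or step) and then takes the mean absolute difference across groups; the field names and function name are hypothetical.

```python
from collections import defaultdict


def grouped_mad(rows, key):
    """Mean absolute deviation between actual and predicted error rates.

    rows: dicts with hypothetical fields 'error' (0/1) and
    'predicted_error' (model probability of an error), plus grouping
    fields such as 'problem' or 'step'; key names the grouping field.
    """
    actual = defaultdict(list)
    predicted = defaultdict(list)
    for r in rows:
        actual[r[key]].append(r["error"])
        predicted[r[key]].append(r["predicted_error"])
    # Compare per-group average rates, then average the gaps over groups.
    diffs = [
        abs(sum(actual[g]) / len(actual[g]) - sum(predicted[g]) / len(predicted[g]))
        for g in actual
    ]
    return sum(diffs) / len(diffs)
```

Calling it with key="problem" and key="step" would give the "MAD problem" and "MAD step" variants under this reading.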

==== Better naming for KCs in auto-generated Unique-Step KCM ====

As a researcher, I want the KCs in the Unique-Step model to have better names than KC1, KC2, etc., so that I can easily tell which generated KCs go with which unique step.

* Hui Cheng (Email 1/20/2009), Ken Koedinger (Email 1/22/2009)
* Could you put this in your new feature request list: could the “Unique-step model” be better labeled than just “KC1”, “KC2”, etc.? For example, for Assistments data, you could use part of the “Step Name”. -- Hui Cheng, Email 1/20/2009
* But, anything is better than "KC". -- Ken Koedinger, email, 1/22/2009
<blockquote>A simple alternative, that preserves uniqueness and addresses length, is to concatenate: 1) the first K letters of the step name 2) a unique numerical increment (same as the "3" in "KC3"). Note that (2) guarantees uniqueness just as it does in the current "KC<num>" scheme. Or perhaps better given that step names are often scoped within problems, is to concatenate: 1) the first L letters of the problem name 2) the first M letters of the step name 3) a unique numerical increment (just like the "3" in "KC3") I think K or L+M should be as big as possible without making the KC names indistinguishable (because they run off the right margin) in the KC list on the Learning Curve and other pages. </blockquote>
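The second naming scheme in the quote could be sketched as below. The underscore separators, default prefix lengths, and function name are assumptions for illustration; the trailing counter is what guarantees uniqueness.

```python
def unique_step_kc_names(steps, l=8, m=12):
    """Name Unique-Step KCs as <problem prefix>_<step prefix>_<counter>.

    steps: ordered list of (problem_name, step_name) pairs, one per
    unique step. l and m are the problem and step prefix lengths.
    """
    return [
        f"{problem[:l]}_{step[:m]}_{i}"
        for i, (problem, step) in enumerate(steps, start=1)
    ]
```

Larger l and m make names more readable, at the cost of width in the KC list on the Learning Curve page.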

==== Visualize Learning Curve Split ====
* Be able to visualize a learning curve split into 2 based on a specification of a subset of problems. -- Albert Corbett, Math CCM, November 2008
* Harder: make me a new KCM out of it. -- Ken Koedinger, Team Mtg, Dec 5, 2008, while trying to describe Albert's request to Alida

==== Statistical Significance ====
* Can DataShop determine if the difference between conditions or learning curves is statistically significant? -- general theme at workshop, probably mentioned by Bob Hausmann in his talk, Winter Workshop 1/23/2008
** Can't do it yet in DataShop, but I can show you how to do it in R (or SPSS...) after you've exported the data; export the "student-step rollup" rather than the whole transaction table. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** This comment perhaps belongs (exists?) elsewhere: The current student-step rollup export (really, all exports) should be such that I can immediately load it into R (and other packages) without error. Now errors occur, for instance, because there are "#" characters in the variable names in the student-step rollup. This is currently a road-block for helping folks like Bob do the analyses they want to do. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** See [[Condition in Student-Step Rollup]] -- [[User:Alida|Alida]] 10:35, 17 September 2009 (EDT)

==== Split and Merge Skills ====

* Hand searches through a p-matrix for a dataset to split and merge skills. (Pie in the Sky) -- Ken Koedinger, Team Mtg, 02/22/2008

==== Notes on new KCMs ====

* It would be good if I could add a note to a KC model that was newly imported. -- Noboru Matsuda, email, Nov 19, 2009

==== Display Learning Curves Grouped by Interestingness ====

* The page displaying all the learning curves today seems to be alphabetically sorted by KC name, which is not necessarily meaningful.
* An alternative presentation is to group the curves into 4 sets, breaking up the page. Set 1 has curves that contain significant spikes, and therefore seem to be "low-hanging fruit" for purposes of breaking up into KCs. Set 2 has curves with few spikes, but they have a long X axis, suggesting that students are presented with too many opportunities to acquire those KCs. Set 3 has the "good" curves, i.e., nicely decreasing curves that are not too long. Set 4 is "other". -- [http://www.pitt.edu/~goldin Ilya Goldin] 7 December 2009

==== Order KC models according to AIC====
* This is based on results from multiple dataset analyses that compared AIC, BIC and loglikelihood to cross validation RMSE. -- Mimi McLaughlin, 2/9/2011

====Use log of opportunity count for AIC and BIC calculations====
* We compared using the log of opportunity count to whole number opportunity count in multiple datasets. We found the results for log of opportunity count to be consistently better, though small. -- Mimi McLaughlin, 2/9/2011

== Developer Requests ==

==== Store Converted Date and Converter Info ====

As a DataShop administrator, I'd like to see the converter information (version and date) stored in the database, so that I do not have to store that data manually in the Additional Notes field each time I load a dataset. -- Kyle, 8/5/2008

* DTD new fields:
** Store conversion and converter information in database (anything else?)
*** conversion_time
*** converter_info
* Email from Octav, 10/5/2007

I see there's also a note about the converter version in the Dataset
Info. Which is good, but it seems it's taken from the directory name
when I submitted the set. I don't know how reliable that is.  :-) It
would be better if it's taken from the new converter info field.

==== Convert from XML to tabbed-delimited format ====

* If users agree that the export format is valuable, then perhaps they could convert from XML to the export format to view the data in Excel, where they could look at the Selection column and see blanks more easily. -- Jonathan Sewall, ET Mtg, 10/10/2007

==== Plug-ins (general issues) ====

* Please please support Ruby on Rails. -- Ben Shih, December 2008

==== Create UI to grant DataShop user roles ====

* Already tedious.
* Alida, User Meeting AAR on December 9, 2009

== Help ==

==== Specialize Label of Help Button ====
* Since this help is better than in most applications, it should say more than just 'Help'. -- Ken Koedinger & Ryan Baker, Team Mtg, May 30, 2008
** Ideas:
*** 'Page Help'
*** 'Help with this page'
*** 'Help with Learning Curve page' (Ken's favorite)
*** 'Help with this tool'

== Home Page ==

==== Redesign the Home Page ====

* In the menu of data sets at the top, include the N (=20 or as many as fit on the screen?) data sets that I have visited starting with the ones I've visited most recently. --[[User:Koedinger|Koedinger]] 16:16, 16 September 2009 (EDT)
** As of today, the menu shows the last 10 data sets, most recently visited at the top. I think the feature is good enough, but let's ask Ken --[[User:Bleber|Bleber]] 10:56, 6 August 2010 (EDT)
* There needs to be a better ordering for the datasets (DS364)
* Maybe a search to filter the list of datasets since the list is so long. -- Brett Leber, 6/14/2007
* Going back to the home page always goes to 'My Datasets' (DS313)
* Maybe show more high level stats on this page, like how many transactions, students, skill models
* Allow users to post and share project documentation (files, papers, other meta-info) -- Ruth Wylie, suggested during meeting on 8/4/2010. She had a file that is relevant to multiple datasets in her project.

== Import ==
== Miscellaneous ==

==== Analyses by LearnLab ====

* Organize data by LearnLab, not by "data set". -- Ryan Baker, Aug 2008
* Also: Bob Hausmann, Sep 2008; Maxine Eskenazi, Sep 2008
* Essentially, current data sets become samples, but the top-level unit is the LearnLab. You can take every data set in a LearnLab together as a sample.
* Implies being able to run analyses across data sets, and export multiple data sets together; to create multi-data set samples
* As a user of DataShop, I would like to look at learning curves for all Algebra data together (for example), or export all Algebra data
* Important long-term, but is a lot of work -- in particular, we need to solve scalability issues first.

==== Save Settings Between Sessions ====
* It would be useful if DataShop could save settings between sessions. -- Bob Hausmann, User Meeting, 2/1/2008
** "I do a lot of redoing the same steps" (e.g., set cutoffs, select a KC model, select students).

==== Multiple steps per transaction ====

* Needed so that we do not have to create multiple transactions for the same actual action for Andes logs. -- Kurt VanLehn, Feb 2007

==== Demographic data ====

* This has been mentioned by NSF visitors, AB, ESL, and some researchers.
* Also mentioned at Winter Workshop 1/23/2008.
** Derek/Sue-mei: Student background information not in DataShop. Would like to see a student or set of students from a particular demographic, and view them across datasets!
* Note that Gail added demographic data to Additional Notes field on the Dataset Info page for many datasets. The idea here is to put that data into the database somewhere.

==== Single Sign On ====
* Michael Bett, email, 10/8/2007
* It would be nice if the following services have a single login account/password:
*# Theory Wiki
*# Learnlab.org
*# ESL's OSS
*# DataShop

==== Reveal unanonymized student IDs ====

As a researcher/PI performing research assistant tasks, I want to see easily the unanonymized student IDs of subjects in DataShop so that I can email my subjects telling them when to use my system.

* Ruth Wylie, July 3, 2008
* As the honest broker of the PSLC data, I have promised to not reveal the student IDs and to protect the identity of the students. This is part of the DataShop IRB. Therefore, I do not see this request as possible. [[User:Alida|Alida]] 09:53, 4 September 2009 (EDT)
** At some point, some coordination would be good with OLI's Digital Dashboard project that Marsha Lovett (and the OLI team and sometimes me) is working on. This is meant to provide usage information more quickly to instructors. It could also perhaps be used by researchers (with the right IRB rights) in situations like Ruth's.

==== Knight Timeline ====
* Developed by Andrea Knight, 2004

==== Buggy Skills ====
* Ken Koedinger, prototype walkthrough 9/11/2006

==== Confusion Matrix ====
* Brian MacWhinney, prototype walkthrough 9/11/2006

== Navigation Bar ==
==== Filter KCs by Name ====

As a researcher working with KCs, I want to filter KCs based on their names, so that I can...
* Vincent Aleven, Email, 2/3/2008
* "Since Alida said you cannot have two mechanisms for putting together your KC set (i.e., cannot have both the selecting-by-clicking and selecting-by-filtering), I would probably opt for the latter."
* Alida: I thought Vincent mentioned that he'd like to select which KCs are in a set by filtering on the name. Example: Include KCs with '*reason*' in the name and exclude KCs with '*given*' in the name.
* This could be an addition to our v3.0 KC-selection mechanism--filter by name.
* Vincent, Email, 5/6/2009: Expressed another need for this feature. Could just allow for a wider area and longer list so that more items can be checked at once. The number of characters we show right now is not enough because in many cases that number of characters is the same across many of the skills. Reference data set: Geometry CWCTC 2005-2006

* Status: Design Started
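Vincent's include/exclude example maps naturally onto shell-style wildcards. A minimal sketch using Python's fnmatch module, with the function name filter_kcs as a hypothetical; matching here is case-sensitive, which a real UI might relax.

```python
from fnmatch import fnmatchcase


def filter_kcs(kc_names, include=("*",), exclude=()):
    """Keep KCs matching any include pattern and no exclude pattern.

    Patterns use shell-style wildcards, e.g. '*reason*'.
    """
    return [
        kc for kc in kc_names
        if any(fnmatchcase(kc, p) for p in include)
        and not any(fnmatchcase(kc, p) for p in exclude)
    ]
```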

==== Facebook-style KC Selection ====

As a researcher working with KCs, I want to select KCs based on the learning curve thumbnail, so that I can see quickly which ones I'm interested in exploring more deeply.

* Feature already designed for v3.0, not implemented due to time constraints.
* Agreed this would be really useful. -- Kirsten Butcher, User Mtg, 1/31/2008
* Status: Guesstimate: 20 days, need to revisit requirements document

==== Feedback after clicking a large sample on a large dataset ====

As a user of DataShop (first-time or not), I want some feedback and the ability to cancel after I do something that might take a long time (e.g., clicking "All Data" on a large dataset) so that I do not get stuck.

* Part of the Susan Goldman story
* After clicking a sample for a large dataset, there is no "Loading..." text, no feedback that the click was even registered by the app (besides the sometimes busy cursor and small browser "loading" text), nor the ability to cancel the action. 
* We will always have similar problems even if performance is improved, so providing feedback and the opportunity to cancel is critical.

==== Save Button in Problem Navigation Box ====

* Save buttons in the sidebar. -- Ken Koedinger, Mtg 2006
** Could also put one in the Problem selection box in the sidebar.

==== Make Nav Bar Wider ====
* Make the Sample name and description fields much wider. -- John LaPlante, email 7/10/2007

== New Visualizations/Reports ==

==== Student-KC Rollup ====
As a researcher, I want to see KCs rolled up by student, so that I can ...

* Vincent Aleven, User Mtg, 1/29/2008
** By Student-KC would be more useful than by Student-Problem
** Example: # Steps asking for a hint or error or what proportion had help
** How often bottom-out hints occur

==== Instructor Reports ====
* Phil said he received a lot of positive reactions to providing reports on units for instructors. -- Phil Pavlik, ET Mtg 10/10/2007

==== Manage Authorizations/Projects Page ====
* Lisa Anthony, email 10/23/2007
* Allow PI to change permissions on the datasets.
:"Actually, I couldn't see how to change permissions on the datasets from the website. Is this possible? If not, it might be a nice feature..."

==== Calculate Time Spent on Different Study Activities ====

As a researcher, I want to know how much time, on average, students spend on study activities, so that I can ...

* Bruce McLaren, Email, 4/7/2009
<blockquote>
For my most recent stoich study, Shawn and I are interested in calculating timing information such as: (a) how long students spent, on average, working on individual tutors (b) how long students spent, on average, on all items in an intervention (c) how long students worked, on average, on post-tests. 

Timing information is very commonly required for studies, and can be calculated from DataShop logs relatively easily, so even if we don't have it, might be worth considering. (And we don't want to re-invent the wheel, if you already have it or are planning it...)
</blockquote>
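As Bruce notes, these averages can be computed from the logs. One minimal sketch, assuming hypothetical (student, activity, duration_seconds) records, for example one per step with its step duration; it sums each student's time per activity and averages over the students who worked on that activity.

```python
from collections import defaultdict


def average_time_per_activity(records):
    """Average total time per student for each activity.

    records: iterable of (student, activity, duration_seconds) tuples.
    Returns {activity: average seconds per student}.
    """
    totals = defaultdict(float)
    for student, activity, seconds in records:
        totals[(activity, student)] += seconds
    per_activity = defaultdict(list)
    for (activity, _student), total in totals.items():
        per_activity[activity].append(total)
    return {activity: sum(t) / len(t) for activity, t in per_activity.items()}
```

Summing step durations leaves out idle time between steps; using the span from each student's first to last timestamp would be the alternative if idle time should count.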

==== Incorrect Step Duration and Hint Step Duration ====

As a researcher, I want to be able to see total step duration if the student's first attempt was an incorrect attempt, and total step duration if the student's first attempt was a hint request, so that I can do some analyses that I can't do with "Error Step Duration".

* Bob Hausmann, email, 11/11/2008.
* Updated title and story with 'step duration' instead of 'time'. -- [[User:Alida|Alida]] 10:36, 4 September 2009 (EDT)
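The two proposed columns can be sketched as a split of the existing step duration by the kind of first attempt. The field names and values below are hypothetical, not DataShop's export schema.

```python
def split_step_duration(step_row):
    """Return (incorrect_step_duration, hint_step_duration) for one student-step row.

    Hypothetical keys: 'first_attempt' ('correct', 'incorrect' or 'hint') and
    'step_duration' (seconds). The duration appears only in the column that
    matches the first attempt; the other column is None (a blank cell).
    """
    duration = step_row["step_duration"]
    first = step_row["first_attempt"]
    incorrect = duration if first == "incorrect" else None
    hint = duration if first == "hint" else None
    return incorrect, hint
```

Averaging the non-blank values in each column then gives per-student or per-KC summaries.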

==== Grading ====

[[Grading]]

====Display number of steps and number of observations for skills====
* How and where to be determined by developers. -- Ken (entered by Mimi)

== Reports ==
=== Dataset Info ===

==== Pointers to Hard-copy Data ====
* Brett van de Sande, NSF Site Visit, 5/28/2008
* Pointers to hard-copy data such as paper tests and/or homework. Include contact information. It doesn't seem to make sense to scan a whole filing cabinet of paper if no one wants to look at it, but secondary researchers who don't know the filing cabinet exists can't ask for it.

==== Sort Problem Breakdown Table ====

* Would like the ability to sort the table by clicking on the column headings of the Problem Breakdown Table on the Dataset Info Tab. -- Bruce McLaren, User Mtg, 11/5/2007

==== Rename dataset ====

As a researcher, I want to rename my dataset so that it makes more sense to other people. I also want to make sure the dataset doesn't become polluted later by new data not associated with my study.
* Ruth Wylie, July 3, 2008
* There are reasons she would want to do this (current name is worthless, other researchers might try her tutor and pollute her data) but also reasons for not doing it (log more data later).
* There are risks in changing a dataset name that might not be apparent. For example, renaming could cause problems if you later want new data added to the same dataset. [[User:Alida|Alida]] 10:27, 4 September 2009 (EDT)

==== Average time per problem, average number of problems, total number of sections ====

In addition to showing student hours per dataset, it would be useful to know the average time spent per problem, average number of problems completed, and the total number of sections.
* Noboru Matsuda, June 18, 2010

=== Error Report ===

==== View By Student ====

* Would like to see what a couple of students saw in the feedback. -- Marsha Lovett, 10/11/2007

==== Export ====
* I would like the ability to export this data. -- John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
* Also interested in this feature idea. -- Bruce McLaren, User Mtg, 11/5/2007

==== Sort ====

* Was planned for, but not implemented in, v2.1 (estimated to be a 4-day task)
** By Correctness %, starting with the least correct
** By Hints %
** Step (or KC if view by KC)
** Number of Students
* Ability to sort problems by their average experienced position within the curriculum. -- Ken Koedinger, 02/16/2007
** Which problem did students most often experience first, then the one experienced second most often, ...
* Order steps by the order they typically are executed by students. -- Ken Koedinger, email 11/7/2008
:"Searching through the steps in a problem to get a sense of what is going on is currently hard because the steps are ordered alphabetically, not by the order in which most students did them. While not all students do all steps in the same order, there is some regularity there. It would be quite useful if the steps could be ordered in a "typical order". This could be accomplished by using the time stamps (of the first (correct?) transaction?) for each step to determine rank order of the each step for each student in a problem and then for each problem average the rank order of each step across all students. Then arrange the steps in the Error Report by their average rank order -- that is, roughly speaking, the step that is most likely to be first across students (closest on average to first) goes first, the step with the next lowest rank goes next, etc."
* The capability to count the number of errors of each message type and sort in different ways, for instance by all errors that had no messages. -- Bruce McLaren, email 10/22/2007
:"This is an error analysis I recently did in Excel, using pivot tables, that might be handy if in the DataShop. This one is very important for tutors because the errors that occur most frequently, yet don't elicit messages to the students, are good candidates to become errors with feedback."
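The "typical order" procedure in Ken's email above can be sketched as follows. This is an illustrative Python version, not DataShop code. It uses each student's first transaction on a step (the email leaves open whether that should be the first correct one), and the field names are hypothetical.

```python
from collections import defaultdict


def typical_step_order(transactions):
    """Order the steps of one problem by their average rank across students.

    `transactions` holds dicts with hypothetical keys 'student', 'step' and
    'time' (any sortable timestamp). Each student's steps are ranked by the
    time of that student's first transaction on the step (1 = first).
    """
    first_seen = defaultdict(dict)  # student -> {step: earliest time}
    for tx in transactions:
        steps = first_seen[tx["student"]]
        if tx["step"] not in steps or tx["time"] < steps[tx["step"]]:
            steps[tx["step"]] = tx["time"]

    rank_sums = defaultdict(float)
    rank_counts = defaultdict(int)
    for steps in first_seen.values():
        ordered = sorted(steps, key=steps.get)
        for rank, step in enumerate(ordered, start=1):
            rank_sums[step] += rank
            rank_counts[step] += 1

    averages = {step: rank_sums[step] / rank_counts[step] for step in rank_sums}
    # Lowest average rank first; ties are broken by step name.
    return sorted(averages, key=lambda step: (averages[step], step))
```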

==== Show more than 500 problems ====

In the error report, can we see more than 500 problems? See the dataset "Cog Model Discovery Experiment Spring 2010", where the problem list shows "2/2888 selected. (Showing the first 500)".
-- Ken via email on 2/15/2011

=== Export ===

==== Last attempt on step? column for transaction format ====

* Include a new column that shows whether or not the row is the student's last attempt on a step, with a value of 0 or 1. Helpful for researchers who are grading data. Transaction format. --Vincent Aleven, CTAT mtg 11/5/2010
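Below is a sketch of how the proposed 0/1 column could be derived from a transaction table. It assumes hypothetical 'student', 'problem', 'step' and 'time' fields. It also treats repeated visits to the same problem as one sequence, which a real implementation might need to handle differently (for example, per problem view).

```python
def mark_last_attempts(transactions):
    """Add a 0/1 'last_attempt' field to each transaction row.

    Rows are dicts with hypothetical keys 'student', 'problem', 'step' and
    'time'. A row gets 1 if no later row exists for the same student,
    problem and step, and 0 otherwise. Rows are returned in time order.
    """
    ordered = sorted(transactions, key=lambda tx: tx["time"])
    last_index = {}
    for i, tx in enumerate(ordered):
        # Later rows overwrite earlier ones, leaving the index of the last row.
        last_index[(tx["student"], tx["problem"], tx["step"])] = i
    return [
        {**tx, "last_attempt": int(last_index[(tx["student"], tx["problem"], tx["step"])] == i)}
        for i, tx in enumerate(ordered)
    ]
```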

==== Elapsed Time ====

* Include the elapsed time in preview and transaction export. It is more valuable than the transaction time, which serves only as an absolute reference; it's possible to keep both. --Ken Koedinger, Team Mtg 04/18/2008

==== SQL Format ====

* Option to export as an SQL file. -- Ken Koedinger, 03/26/2007, also brought up in June ET Meeting
** Ability to export an SQL dump of a dataset. --Kyle Cunningham, 04/03/2007

==== Specify Character for Blanks ====

* Ability to specify what character, if any, is used for blank values. --Ryan Baker, email 8/9/2007
:"Not all tools handle TABTABTAB correctly on import. The period '.' is used to mean missing data in most stats packages. The word 'BLANK' is used in some other ones." Not an issue for Ryan, as he wrote a preprocessor to convert blanks.
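A preprocessor like the one Ryan describes could be as small as the Python sketch below. It assumes plain tab-delimited text with a header row and no quoted fields. The `marker` parameter stands in for the user-chosen blank character.

```python
def fill_blanks(tsv_text, marker="."):
    """Replace empty fields in tab-delimited text with `marker`.

    The first line is treated as the header and left unchanged. '.' is the
    default because many statistics packages read it as missing; pass
    marker="BLANK" (or anything else) for tools that expect a different value.
    """
    out_lines = []
    for i, line in enumerate(tsv_text.splitlines()):
        if i == 0:
            out_lines.append(line)  # keep the header row as-is
        else:
            fields = line.split("\t")
            out_lines.append("\t".join(f if f != "" else marker for f in fields))
    return "\n".join(out_lines) + "\n"
```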

==== Opportunity (at KC) Count in Transaction Export ====
* Include the opportunity count in the transaction export (it's only in the student-step rollup) -- Noboru Matsuda, 10/08/2009
** Would be repetitive
** Would need one Opportunity column per KC
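Computing one Opportunity value per KC at the transaction level might look like the sketch below. It assumes time-ordered transactions with hypothetical 'student', 'step' and 'kcs' fields. It further assumes that all transactions on a step share the opportunity number assigned when the student first reached that step; this is an assumption about how the rollup's counts would carry over to transactions.

```python
from collections import defaultdict


def opportunity_counts(transactions):
    """Attach per-KC opportunity numbers to time-ordered transactions.

    Rows are dicts with hypothetical keys 'student', 'step' (a step id that
    is unique across problems) and 'kcs' (list of KC names). The counter for
    a (student, KC) pair advances the first time the student reaches a new
    step tagged with that KC, so every transaction on that step repeats the
    same value. Returns one {kc: opportunity} dict per transaction.
    """
    counts = defaultdict(int)  # (student, kc) -> opportunities so far
    assigned = {}  # (student, step, kc) -> opportunity number
    result = []
    for tx in transactions:
        opportunities = {}
        for kc in tx["kcs"]:
            key = (tx["student"], tx["step"], kc)
            if key not in assigned:
                counts[(tx["student"], kc)] += 1
                assigned[key] = counts[(tx["student"], kc)]
            opportunities[kc] = assigned[key]
        result.append(opportunities)
    return result
```

Each dict maps directly onto the "one Opportunity column per KC" layout mentioned above.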

==== Export only rows that have KCs tagged ====
* Is it possible, when exporting a dataset, to include only the [transaction?] rows that have knowledge components tagged? And the same goes for KC models export: only include the items that have KCs tagged? -- Hui Cheng, 01/19/2010
** We have the inverse of this option on the Performance Profiler, 'Include steps without a knowledge component', and with the Student-Problem export, 'Include Steps without KCs'.

==== Include Custom Fields in Student-Step Rollup ====
* Is it possible to include the custom field in the datashop [step] export? -- Hui Cheng, 03/01/2010
** "I am thinking about the student-step level export. We have a student from Statistics wanting to use assistment data from datashop. She wants to have problem set id (=curriculum id in datshop). For Assistment data, we decided to put problem set id in the custom field because in the Assistment a lot of problems are repeated in different problem sets. She needs data at the student-step level." (Hui email)

==== Don't duplicate rows in student-step format when not showing KCs ====
* If the checkbox to show knowledge components is not checked, maybe it doesn't make sense to show rows more than once if they have more than one KC associated with the step. (Mimi (and Brett) stumbled on this. 8/16/2010)

==== Student-Step Rollup include Success Column ====
* Student-step rollup: add a column called Success, with 1 if correct, 0 if incorrect or hint, and blank otherwise. ~~ Ken Koedinger, DataShop Team Meeting, Oct 22, 2010
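Below is a sketch of the proposed Success value. The request does not say which attempt decides it, so basing it on the step's first attempt, as done here, is an assumption.

```python
def success_value(first_attempt):
    """Map a step's first attempt to the proposed Success column value.

    Assumes the first attempt decides the value: 1 for 'correct', 0 for
    'incorrect' or 'hint', and None (written as a blank cell) for anything
    else, such as a missing attempt.
    """
    if first_attempt == "correct":
        return 1
    if first_attempt in ("incorrect", "hint"):
        return 0
    return None
```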

=== Learning Curve ===

==== Default sort by observation on LCPID ====

* Default sort by observation on LCPID. ~~ Ken Koedinger, DataShop Team Meeting, 10/22/2010

==== Purple Point ====
* Purple Point: if a point on the LC has more than one KC associated with it but you have drilled down to a given KC, then the blue line is off. We could put a purple point that takes this into account.
* Simpler thing: display a warning message that some points in the display are driven by other KCs
* Pearson may be interested
* This was mentioned during the PSLC Summer School 2010.
* For a step with multiple skills, attribute the error only to the skill with the highest overall error rate. ~~ Alida, meeting with Ken, November 18, 2010
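The last suggestion can be sketched as a lookup: given each KC's overall error rate, credit a multi-KC step only to its most error-prone KC. The field names are hypothetical, and the tie-breaking rule is an arbitrary choice.

```python
def attribute_errors(step_rows, kc_error_rate):
    """Assign each multi-KC step's outcome to a single KC.

    `step_rows` are dicts with hypothetical keys 'step' and 'kcs' (list of
    KC names); `kc_error_rate` maps each KC to its overall error rate in the
    dataset. Each step is credited only to the KC with the highest rate;
    ties go to the alphabetically first KC so results are deterministic.
    """
    return {
        row["step"]: max(sorted(row["kcs"]), key=lambda kc: kc_error_rate[kc])
        for row in step_rows
    }
```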

==== Reduce Scrolling ====
* Add a forward and back button to the graph to reduce scrolling. -- John LaPlante, email 7/10/2007

:"The learning curve page could use a forward and back button to cycle through learning the learning curves. Going through them one by one requires a lot of scrolling."

==== Turn On Point Labels ====

* It would be nice to have the option to turn on point labels. It is nice that I can mouse over a point and view the data but it would be nicer if it appeared automatically. -- John LaPlante, email 7/10/2007

==== Option for Bigger Graph ====

* Allow user to see bigger graph. -- Derek Chan, Winter Workshop 1/23/2008
** Potential solution: enable user to set x, y scale manually

==== LC Normalize Scale of Thumbnails ====
* [[LC Normalize Scale of Thumbnails]]

=== Performance Profiler ===

==== Rename Performance Profiler ====
* John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
** Did not use this report because he thought it had something to do with improving the performance of DataShop itself. This report might have been much better to use than the Error Report, but it would still need an export, since working with the data in tabular form was necessary. Note that the pivot tables he created were added to the dataset (Pittsburgh Science of Learning Center Stoichiometry Study 1).
* Lisa Anthony, email 8/2008
** Didn't know to go to the report. Export would be useful.
** Needed a better definition of Error Rate with respect to Problem and Unit rows.

==== Export ====
* John LaPlante (see comments in Rename Performance Profiler)
* Lisa Anthony (see comments in Rename Performance Profiler)
* Yes, I too can imagine wanting to export the results of a particular performance profiler output (i.e., to a table) so that I can graph it my own way. --[[User:Koedinger|Koedinger]] 16:24, 16 September 2009 (EDT)

==== Table View ====
* Add option to switch to a table view.
* Columns are: Problem Name, Steps, % incorrect, Incorrect Steps, % hint, Hint Steps, etc.; include all values shown in the pop-up.

==== Union of KCs/Problems/Students ====
* Allow user to get the union of KCs/Problems/Students etc so they can compare across samples easier. -- Kirsten Butcher, Winter Workshop 1/23/2008
:"It is difficult to compare performance profiler graphs across samples because the KCs (or problems, or whatever) aren't necessarily in both of those samples."

==== Show Details In Report ====
* Click on bar to see details in report and not just in pop-up. It disappears too quickly. -- Alida, Brett

==== Show More Information in the Graph ====
* Show more information in the graph: [Bruce M, User Mtg, 11/5/2007]
** Had drilled down by a certain skill - skill is not listed in the graph, user has to check the skill list on the LHS to see what skill was selected
* Maybe related to Bruce's "show more info in graph": show the actual value of the range variable. e.g., when range is error rate, show the error rate number somewhere (right now you can see, via mouse-over, the incorrect, hint, and correct percentages, but not the error rate) -- Brett
* Design idea to show # steps incorrect, # steps hint, # steps correct to clarify how the percentages are calculated. -- Alida, Brett

== Sample Selector ==

==== Sub-Samples ====

* Sub Samples would be helpful. -- John LaPlante, email 7/10/2007
:"A nice solution would be to have sub-samples where one property varies. When I'm doing this analysis, I've changed my sample many times, renaming it sometimes, tweaking it to get variations on the data. The samples are really useful but they could help me a lot more with this kind of experimentation."

==== Filter out students ====

As a researcher, I want to filter out test users (including myself) from my data so that I see less noise in the data.

* Ruth Wylie, July 3, 2008
* You can already filter out test users by giving them user IDs with a recognizable prefix (such as 'weirdCMUuser_xxx' or 'Test_'). Then create a sample that excludes students whose names match that prefix, e.g., names like 'Test_%'. --[[User:Alida|Alida]] 10:29, 4 September 2009 (EDT)

==== Filter by "Class" ====

As a researcher, I want to create samples based on "class" since class is how I've encoded my conditions.

* Maaike Waalkens, July 30, 2010
* This is what made sense using Mathtutor for tutor delivery.
* What other fields are we missing in the Sample Selector?

==== Filter by "Step" ====

As a researcher, I want to create samples based on "step" since I'm only interested in one particular step.

* Mimi McLaughlin, August 17, 2010

==== Create Sample Automatically ====

* Would it be possible for me to get a random sample from the 'Bridge to Algebra 2006-2007' dataset of 100 students? I am having trouble looking at the data because it takes too long to load, and my adviser thought that was because the dataset was too large. ~~ DataShop User, 10/19/2010
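Drawing such a sample is straightforward once the student IDs are available. Below is a minimal Python sketch using the standard `random` module. Where the IDs come from, and how the result becomes a DataShop sample, are left out.

```python
import random


def random_student_sample(student_ids, size=100, seed=None):
    """Draw `size` distinct students uniformly at random (all of them if fewer).

    IDs are de-duplicated and sorted before sampling, so passing the same
    `seed` reproduces the same sample later.
    """
    ids = sorted(set(student_ids))
    if len(ids) <= size:
        return ids
    return sorted(random.Random(seed).sample(ids, size))
```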

== Web Services ==

==== Include transaction custom fields in web services student-step export ====

* "I noticed in the web services guide that cfs (which provides the custom fields) is not yet implemented for step roll-up tables. It is marked [coming soon] in the manual... I could write a workaround to pull in the transactions and lookup the custom fields, but I'd really rather not." -- Phil Pavlik, 4/27/2011

==== Use Custom Fields Graphs/Reports ====
In Graphs

As a researcher creating custom fields and assigning values at the transaction level, I want DataShop to perform the aggregation to the step level so that I can do other things with my custom-field variable such as graph it. -- Ryan Baker, mtg w/Alida & Brett, 12/15/2008

Performance Profiler
* Use all discrete variables/custom fields on the left and all the continuous variables on the bottom of the Performance Profiler. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Ad-hoc queries ====

* Allow restricted filtering on steps and transactions as the next web service feature (after CFs), supporting whatever users can filter on in the navigation boxes. (User Meeting AAR, December 9, 2009)

==== Sample creation as a web service ====

* Sample creation is still too slow. (User Meeting AAR, December 9, 2009)

 
----
See completed [[DataShop 3.x Features]] 
See on-going [[DataShop 4.x Features]] 
See prioritized [[DataShop Feature Wish List]]
[[Category:Protected]]
[[Category:DataShop]]

Collected User Requests

2011-03-02T14:02:50Z

Bleber: /* Error Report */

See prioritized items on [[DataShop Feature Wish List]].

== Annotations ==

==== Annotations on Transaction Level ====
* I have models which can annotate things like: gaming, bored, etc. on the transaction level. -- Ryan Baker, ET Mtg 12/5/2007

==== Annotations on Student Level ====
* Would annotate on student level. -- Ido Roll, User Meeting, 1/19/2009

==== Annotations on Pages ====
* See the cool thing created by Jeffery Heer where all the settings of the page were recorded with the comment. -- Ryan Baker, DS Team Mtg 5/23/2008

==== Dataset Discussion - Capture data-integrity issues ====
* As a stakeholder in the DataShop project, I want to capture and publicize the data-integrity issues discovered with data sets so that data is better documented (and so we've fulfilled a promise to our funders to better document data). -- Ken Koedinger, Team Meeting, 8/15/2009
* As a user of DataShop, I want to discuss datasets and have that discussion attached to the dataset so that others can comment and better understand any data-integrity issues I've found.

==== Linking to internal pages ====

* It would be handy if a link can be saved to any dataset, sample, page in the DataShop. -- [[Alida]], 10/18/2007
** Currently, https://pslc-qa.andrew.cmu.edu/datashop/DatasetInfo?datasetId=793 works if you are already logged in.

==== Have a link from the DataShop to the Theory Wiki (Dataset to Project Page) ====

* Can we link from the dataset to the project page on the Theory Wiki? In the pipeline have a clickable link to the project page (make project name clickable). -- Michael Bett, ET Mtg 11/14/2007
** Link to a dataset directly? Is that obvious to users? Click on dataset link -> log in -> redirected back to dataset. -- Brett Leber

== Data Format ==
==== Capture the question prompt and answer choices the student chose from ====
* As a researcher, I want to be able to identify steps based on the question prompt (not the difficult-to-understand step names that come from the selection and action of my tutor), so that I can analyze the data more easily. For multiple choice questions, I also want to be able to see all of the choices that were available to the student. -- Eli Silk - February 1, 2011 (meeting with Brett and Ross Higashi of the FIRE project)
** Near-term solution is to create a table locally that maps steps to prompts
** Another solution is to ask CTAT team to modify their Flash components so that they log this information as custom fields
** Long-term solution is to make these fields into standard fields in the tutor message format
** In the email thread "Cognitive Tutor Development and Evaluation Requests", there is support for this general idea from Ken Koedinger, Albert Corbett, and Christian Schunn.
** Ken added that "Ideally, we may want to store any images that the student can see and where they reside on the screen ..."

== Data Modeling ==
=== Non-KC Modeling ===

==== Automatic Distillation ====
* As an educational data miner wishing to develop a machine learned model with PSLC data, I would like to be able to automatically distill data features (e.g. custom fields) commonly used in past educational data mining research for a new data set (see, for instance, Baker, Corbett, Roll, & Koedinger, 2008 in UMUAI) -- Ryan Baker, Summer 2008, Startup Memo
** Could be implemented as a plug-in
* Also interested in this feature idea. -- Dan Franklin, Oct 2008

==== Upload model and apply it to new data set ====

* As an EDM researcher, I would like to take a model, expressible as a linear formula on DataShop fields or as a simple code procedure (e.g. Bayesian Knowledge Tracing, which Ryan has code for), and apply it to a new data set, so that I can ... -- Ryan Baker, Sept 2008
* Also interested in this. -- Maxine Eskenazi, Sept 2008
* May work best as a plug-in
** Code to display GUI to choose which data sets to use, calls model code, re-import to DataShop
** Good to have a way to apply many models, as soon as you import a data set
* Phil has an idea that maybe fits within this one. Please move if there's a better category. -- Brett Leber
<blockquote>This [''transaction? kc? --ed.''] relabeling is really mostly about enabling modeling in DataShop right? With this in mind, I think that it is actually a higher priority to have model alternatives in DataShop.... E.g. Investigators should be able to give you chunks of Java code according to a certain specification, and DataShop should be then able to run these over datasets (perhaps after a certain series of QA occurs according to an SOP) when the investigator clicks some button in DataShop.... Obviously this is a much large project than adding columns, but it is also much more important in my mind. ''-- Phil Pavlik, email to Brett on 1/14/2009''</blockquote>
* Examples:
** Example: running gaming detector in multiple tutors and comparing gaming frequencies
** Example: applying Bayesian Knowledge Tracing to a new data set from the same LearnLab
** Example: applying Ben Shih's models to many data sets. Note that Ben Shih should be included in design of this feature; he is interested, and has a lot of good ideas.

==== Add Different Predicted Values ====
* Would also like to add statistics, different predicted values than what LFA produces. -- Ken Koedinger, ET Meeting, 10/10/2007

==== Bayesian Knowledge Tracing ====
* Bayesian Knowledge-Tracing built into DataShop like LFA is. -- Ryan Baker, Startup Memo, Summer 2008

==== Richer statistics for KC modeling ====

* In addition to the model stats and estimates generated for learning factor models, we should also create difficulty factor models (i.e., ones without a "Slope" parameter). The latter are particularly relevant for the Unique-Step model, where the slope parameter is meaningless (but still counts against the BIC value). -- Ken Koedinger, Email "new feature request", 1/22/2009
** Relatedly, we should report significance values on the Slope parameters -- that is, when is the Slope significantly different from 0.
** The KC models page perhaps should also report the log-likelihood and number of parameters (in addition to BIC) and leave out AIC. We might also consider other metrics of model generality, like the "adjusted R2" (if I have this name right -- Joe Beck mentioned in the Assistments meeting yesterday).
** These changes will be part of meeting the CMDM goal of improvement in (or at least demonstrate acceptability of) the cognitive models in 90+ units in our LearnLab courses (or affiliates).
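For reference, the statistics mentioned above are related by the standard formulas AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of parameters and n the number of observations. The sketch below computes them from a fitted model's log-likelihood. What counts as n for a KC model is an assumption left to the caller. Because BIC charges ln(n) per parameter, dropping meaningless Slope parameters lowers it directly.

```python
import math


def model_fit_stats(log_likelihood, num_params, num_observations):
    """Return log-likelihood, parameter count, AIC and BIC for a fitted model.

    Standard definitions: AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L.
    Lower is better for both.
    """
    aic = 2 * num_params - 2 * log_likelihood
    bic = num_params * math.log(num_observations) - 2 * log_likelihood
    return {"log_likelihood": log_likelihood, "params": num_params, "aic": aic, "bic": bic}
```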

=== KC Modeling ===

==== LFA/AFM: Check if enough memory using formula ====
* The LFA/AFM code could calculate how much RAM would be needed to run the algorithm on a given skill model using the formula provided by Hao. This formula is based on the number of transactions, number of students and number of skills. Right now it will not schedule itself to run on a model with over 300 skills, though there is a manual override. [[User:Alida|Alida]] 13:35, 29 November 2010 (EST)

==== Create KC Models through Web Services ====
* For John Stamper's CMDM project, it would be nice to automatically update KC Models through web services. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Automatically discovering new KC model ====

* Would it be possible to run some code (perhaps Hao's KC model selection code, perhaps something else generated by the CMDM thrust) to find a new best KC model? -- Vincent Aleven, Sept 2008
* As a learning sciences researcher, I would like DataShop to discover a new/better KC model for me.
* Could be done as a plug-in

==== Generate new KC Models with LFA ====
* Not sure who asked for this.
* It would be nice to generate new KC Models with Hao's LFA code
* Would need to specify factors.
* Ideas on where this could run?
** On a separate server? Request it to be run, specify duration. Have separate server queue up requests, email user when done.
** In Java Applet on client machine? -- Phil Pavlik

==== Same Skill Twice on Same Step ====
* Would like to be able to apply the same skill to a step twice during a KC Model Import. -- Ken Koedinger, email, 2/4/2009

==== Save KC Model Import Files ====
* KC Model Import - save the file used to create the KCMs in case we need to recreate them. -- Ken Koedinger, email 3/4/2009

==== Log Likelihood and MAD ====
* Log Likelihood, MAD (mean absolute deviation) problem, MAD step (store and show) -- Hao Cen
** This is a variation on "Richer statistics for KC modeling" above. Probably should be merged. - Ken Koedinger
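Hao's note does not define "MAD problem" and "MAD step" precisely. One plausible reading is the mean absolute deviation between observed and predicted error rates after averaging within each problem (or each step), which is what the sketch below computes. The field names are hypothetical.

```python
from collections import defaultdict


def grouped_mad(rows, group_key):
    """Mean absolute deviation between observed and predicted error rates.

    Rows (non-empty) are dicts with hypothetical keys 'observed' (1 = error,
    0 = correct), 'predicted' (the model's predicted error probability) and
    grouping fields such as 'problem' or 'step'. Values are averaged within
    each group first; the absolute differences are then averaged across groups.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # group -> [observed, predicted, count]
    for row in rows:
        s = sums[row[group_key]]
        s[0] += row["observed"]
        s[1] += row["predicted"]
        s[2] += 1
    deviations = [abs(obs / n - pred / n) for obs, pred, n in sums.values()]
    return sum(deviations) / len(deviations)
```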

==== Better naming for KCs in auto-generated Unique-Step KCM ====

As a researcher, I want the KCs in the Unique-Step model to have better names than KC1, KC2, etc., so that I can easily tell which generated KCs go with which unique step.

* Hui Cheng (Email 1/20/2009), Ken Koedinger (Email 1/22/2009)
* Could you put this in your new feature request list: could the “Unique-step model” be better labeled than just “KC1”, “KC2”, etc.? For example, for Assistments data, you could use part of the “Step Name”. -- Hui Cheng, Email 1/20/2009
* But, anything is better than "KC". -- Ken Koedinger, email, 1/22/2009
<blockquote>A simple alternative, that preserves uniqueness and addresses length, is to concatenate: 1) the first K letters of the step name 2) a unique numerical increment (same as the "3" in "KC3"). Note that (2) guarantees uniqueness just as it does in the current "KC<num>" scheme. Or perhaps better given that step names are often scoped within problems, is to concatenate: 1) the first L letters of the problem name 2) the first M letters of the step name 3) a unique numerical increment (just like the "3" in "KC3") I think K or L+M should be as big as possible without making the KC names indistinguishable (because they run off the right margin) in the KC list on the Learning Curve and other pages. </blockquote>
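The second scheme in Ken's email can be sketched as below. The default lengths and the "_" and "-" separators are arbitrary placeholders for L and M. As the email notes, the running number alone keeps the names unique.

```python
def unique_step_kc_names(steps, problem_chars=10, step_chars=15):
    """Build readable, unique KC names for the Unique-Step model.

    `steps` is a list of (problem_name, step_name) pairs in the order the KCs
    are created. Each name joins the first `problem_chars` letters of the
    problem name, the first `step_chars` letters of the step name and a
    running number starting at 1.
    """
    return [
        f"{problem[:problem_chars]}_{step[:step_chars]}-{i}"
        for i, (problem, step) in enumerate(steps, start=1)
    ]
```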

==== Visualize Learning Curve Split ====
* Be able to visualize a learning curve split into 2 based on a specification of a subset of problems. -- Albert Corbett, Math CCM, November 2008
* Harder: make me a new KCM out of it. -- Ken Koedinger, Team Mtg, Dec 5, 2008, while trying to describe Albert's request to Alida

==== Statistical Significance ====
* Can DataShop determine if the difference between conditions or learning curves is statistically significant? -- general theme at workshop, probably mentioned by Bob Hausmann in his talk, Winter Workshop 1/23/2008
** Can't do it yet in DataShop, but I can show you how to do it in R (or SPSS...) after you've exported the data -- export the "student-step rollup" rather than the whole transaction table. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** This comment perhaps belongs (exists?) elsewhere: The current student-step rollup export (really, all exports) should be such that I can immediately load it into R (and other packages) without error. Now errors occur, for instance, because there are "#" characters in the variable names in the student-step rollup. This is currently a road-block for helping folks like Bob do the analyses they want to do. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** See [[Condition in Student-Step Rollup]] -- [[User:Alida|Alida]] 10:35, 17 September 2009 (EDT)

==== Split and Merge Skills ====

* Hand searches through a p-matrix for a dataset to split and merge skills. (Pie in the Sky) -- Ken Koedinger, Team Mtg, 02/22/2008

==== Notes on new KCMs ====

* It would be good if I could add a note to a KC model that was newly imported. -- Noboru Matsuda, email, Nov 19, 2009

==== Display Learning Curves Grouped by Interestingness ====

* The page displaying all the learning curves today seems to be alphabetically sorted by KC name, which is not necessarily meaningful.
* An alternative presentation is to group the curves into 4 sets, breaking up the page. Set 1 has curves that contain significant spikes, and therefore seem to be "low-hanging fruit" for purposes of breaking up into KCs. Set 2 has curves with few spikes, but they have a long X axis, suggesting that students are presented with too many opportunities to acquire those KCs. Set 3 has the "good" curves, i.e., nicely decreasing curves that are not too long. Set 4 is "other". -- [http://www.pitt.edu/~goldin Ilya Goldin] 7 December 2009

==== Order KC models according to AIC====
* This is based on results from multiple dataset analyses that compared AIC, BIC and loglikelihood to cross validation RMSE. -- Mimi McLaughlin, 2/9/2011

====Use log of opportunity count for AIC and BIC calculations====
* We compared using the log of opportunity count to whole number opportunity count in multiple datasets. We found the results for log of opportunity count to be consistently better, though small. -- Mimi McLaughlin, 2/9/2011

== Developer Requests ==

==== Store Converted Date and Converter Info ====

As a DataShop administrator, I'd like to see the converter information (version and date) stored in the database, so that I do not have to store that data manually in the Additional Notes field each time I load a dataset. -- Kyle, 8/5/2008

* DTD new fields:
** Store conversion and converter information in database (anything else?)
*** conversion_time
*** converter_info
* Email from Octav, 10/5/2007

I see there's also a note about the converter version in the Dataset
Info. Which is good, but it seems it's taken from the directory name
when I submitted the set. I don't know how reliable that is.  :-) It
would be better if it's taken from the new converter info field.

==== Convert from XML to tabbed-delimited format ====

* If users agree that the export format is valuable, then being able to convert from XML to the export format would let them view the data in Excel, where they could look at the Selection column and see blanks more easily. -- Jonathan Sewall, ET Mtg, 10/10/2007

==== Plug-ins (general issues) ====

* Please please support Ruby on Rails. -- Ben Shih, December 2008

==== Create UI to grant DataShop user roles ====

* The current process is already tedious.
* Alida, User Meeting AAR on December 9, 2009

== Help ==

==== Specialize Label of Help Button ====
* Since this help is better than in most applications, it should say more than just 'Help'. -- Ken Koedinger & Ryan Baker, Team Mtg, May 30, 2008
** Ideas:
*** 'Page Help'
*** 'Help with this page'
*** 'Help with Learning Curve page' (Ken's favorite)
*** 'Help with this tool'

== Home Page ==

==== Redesign the Home Page ====

* In the menu of data sets at the top, include the N (=20 or as many as fit on the screen?) data sets that I have visited starting with the ones I've visited most recently. --[[User:Koedinger|Koedinger]] 16:16, 16 September 2009 (EDT)
** As of today, the menu shows the last 10 data sets, most recently visited at the top. I think the feature is good enough, but let's ask Ken --[[User:Bleber|Bleber]] 10:56, 6 August 2010 (EDT)
* There needs to be a better ordering for the datasets (DS364)
* Maybe a search to filter the list of datasets since the list is so long. -- Brett Leber, 6/14/2007
* Going back to the home page always goes to 'My Datasets' (DS313)
* Maybe show more high level stats on this page, like how many transactions, students, skill models
* Allow users to post and share project documentation (files, papers, other meta-info) -- Ruth Wylie, suggested during meeting on 8/4/2010. She had a file that is relevant to multiple datasets in her project.

== Import ==
== Miscellaneous ==

==== Analyses by LearnLab ====

* Organize data by LearnLab, not by "data set". -- Ryan Baker, Aug 2008
* Also: Bob Hausmann, Sep 2008; Maxine Eskenazi, Sep 2008
* Essentially, current data sets become samples, but the top-level unit is the LearnLab. You can take every data set in a LearnLab together as a sample.
* Implies being able to run analyses across data sets, and export multiple data sets together; to create multi-data set samples
* As a user of DataShop, I would like to look at learning curves for all Algebra data together (for example), or export all Algebra data
* Important long-term, but is a lot of work -- in particular, we need to solve scalability issues first.

==== Save Settings Between Sessions ====
* It would be useful if DataShop could save settings between sessions. -- Bob Hausmann, User Meeting, 2/1/2008
** "I do a lot of redoing the same steps" (e.g., set cutoffs, select a KC model, select students).

==== Multiple steps per transaction ====

* Needed so that we do not have to create multiple transactions for the same actual action for Andes logs. -- Kurt van Lehn, Feb 2007

==== Demographic data ====

* This has been mentioned by NSF visitors, AB, ESL, and some researchers.
* Also mentioned at Winter Workshop 1/23/2008.
** Derek/Sue-mei: Student background information not in DataShop. Would like to see a student or set of students from a particular demographic, and view them across datasets!
* Note that Gail added demographic data to Additional Notes field on the Dataset Info page for many datasets. The idea here is to put that data into the database somewhere.

==== Single Sign On ====
* Michael Bett, email, 10/8/2007
* It would be nice if the following services have a single login account/password:
*# Theory Wiki
*# Learnlab.org
*# ESL's OSS
*# DataShop

==== Reveal unanonymized student IDs ====

As a researcher/PI performing research assistant tasks, I want to see easily the unanonymized student IDs of subjects in DataShop so that I can email my subjects telling them when to use my system.

* Ruth Wylie, July 3, 2008
* As the honest broker of the PSLC data, I have promised to not reveal the student IDs and to protect the identity of the students. This is part of the DataShop IRB. Therefore, I do not see this request as possible. [[User:Alida|Alida]] 09:53, 4 September 2009 (EDT)
** At some point, some coordination would be good with OLI's Digital Dashboard project that Marsha Lovett (and the OLI team and sometimes me) is working on. This is meant to provide usage information more quickly to instructors. It could also perhaps be used by researchers (with the right IRB rights) in situations like Ruth's.

==== Knight Timeline ====
* Developed by Andrea Knight, 2004

==== Buggy Skills ====
* Ken Koedinger, prototype walkthrough 9/11/2006

==== Confusion Matrix ====
* Brian MacWhinney, prototype walkthrough 9/11/2006

== Navigation Bar ==
==== Filter KCs by Name ====

As a researcher working with KCs, I want to filter KCs based on their names, so that I can...
* Vincent Aleven, Email, 2/3/2008
* "Since Alida said you cannot have two mechanisms for putting together your KC set (i.e., cannot have both the selecting-by-clicking and selecting-by-filtering), I would probably opt for the latter."
* Alida: I thought Vincent mentioned that he'd like to select which KCs are in a set by filtering on the name. Example: Include KCs with '*reason*' in the name and exclude KCs with '*given*' in the name.
* This could be an addition to our v3.0 KC-selection mechanism--filter by name.
* Vincent, Email, 5/6/2009: Expressed another need for this feature. An alternative would be to allow a wider area and a longer list so that more items can be checked at once. The number of characters we currently show is not enough, because in many cases the visible characters are the same across many of the skills. Reference data set: Geometry CWCTC 2005-2006

* Status: Design Started

==== Facebook-style KC Selection ====

As a researcher working with KCs, I want to select KCs based on the learning curve thumbnail, so that I can see quickly which ones I'm interested in exploring more deeply.

* Feature already designed for v3.0, not implemented due to time constraints.
* Agreed this would be really useful. -- Kirsten Butcher, User Mtg, 1/31/2008
* Status: Guesstimate: 20 days; need to revisit requirements document

==== Feedback after clicking a large sample on a large dataset ====

As a user of DataShop (first-time or not), I want some feedback and the ability to cancel after I do something that might take a long time (e.g., clicking "All Data" on a large dataset) so that I do not get stuck.

* Part of the Susan Goldman story
* After clicking a sample for a large dataset, there is no "Loading..." text, no feedback that the click was even registered by the app (besides the sometimes busy cursor and small browser "loading" text), nor the ability to cancel the action. 
* We will always have similar problems even if performance is improved, so providing feedback and the opportunity to cancel is critical.

==== Save Button in Problem Navigation Box ====

* Save buttons in the sidebar. -- Ken Koedinger, Mtg 2006
** Could also put one in the Problem selection box in the sidebar.

==== Make Nav Bar Wider ====
* Make the Sample name and description fields much wider. -- John LaPlante, email 7/10/2007

== New Visualizations/Reports ==

==== Student-KC Rollup ====
As a researcher, I want to see KCs rolled up by student, so that I can ...

* Vincent Aleven, User Mtg, 1/29/2008
** By Student-KC would be more useful than by Student-Problem
** Example: number of steps with a hint request or an error, or the proportion of steps that had help
** How often the bottom-out hint occurs

==== Instructor Reports ====
* Phil said he received a lot of positive reactions to providing reports on units for instructors. -- Phil Pavlik, ET Mtg 10/10/2007

==== Manage Authorizations/Projects Page ====
* Lisa Anthony, email 10/23/2007
* Allow PI to change permissions on the datasets.
:"Actually, I couldn't see how to change permissions on the datasets from the website. Is this possible? If not, it might be a nice feature..."

==== Calculate Time Spent on Different Study Activities ====

As a researcher, I want to know how much time, on average, students spend on study activities, so that I can ...

* Bruce McLaren, Email, 4/7/2009
<blockquote>
For my most recent stoich study, Shawn and I are interested in calculating timing information such as: (a) how long students spent, on average, working on individual tutors (b) how long students spent, on average, on all items in an intervention (c) how long students worked, on average, on post-tests. 

Timing information is very commonly required for studies, and can be calculated from DataShop logs relatively easily, so even if we don't have it, might be worth considering. (And we don't want to re-invent the wheel, if you already have it or are planning it...)
</blockquote>
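
Bruce's timing measures could be approximated from transaction logs roughly as follows. This is a minimal sketch, assuming each transaction has been reduced to a (student, activity, timestamp in seconds) tuple, where "activity" stands for whatever grouping is wanted (a tutor, an intervention, a post-test); the function name is hypothetical and this is not an existing DataShop report. Time on an activity is approximated as the span between a student's first and last transaction there, so idle gaps inside that span are counted.

```python
from collections import defaultdict
from statistics import mean


def average_time_per_activity(transactions):
    """Average seconds per student spent on each activity.

    transactions: iterable of (student, activity, timestamp_seconds) tuples.
    A student's time on an activity is approximated as last minus first
    timestamp, which includes any idle gaps inside the activity.
    """
    # (student, activity) -> [earliest timestamp, latest timestamp]
    spans = defaultdict(lambda: [float("inf"), float("-inf")])
    for student, activity, ts in transactions:
        span = spans[(student, activity)]
        span[0] = min(span[0], ts)
        span[1] = max(span[1], ts)

    # Collect one duration per student for each activity, then average.
    per_activity = defaultdict(list)
    for (_, activity), (start, end) in spans.items():
        per_activity[activity].append(end - start)
    return {activity: mean(durations) for activity, durations in per_activity.items()}
```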

==== Incorrect Step Duration and Hint Step Duration ====

As a researcher, I want to be able to see total step duration if the student's first attempt was an incorrect attempt, and total step duration if the student's first attempt was a hint request, so that I can do some analyses that I can't do with "Error Step Duration".

* Bob Hausmann, email, 11/11/2008.
* Updated title and story with 'step duration' instead of 'time'. -- [[User:Alida|Alida]] 10:36, 4 September 2009 (EDT)

==== Grading ====

[[Grading]]

====Display number of steps and number of observations for skills====
* How and where to be determined by developers. -- Ken (entered by Mimi)

== Reports ==
=== Dataset Info ===

==== Pointers to Hard-copy Data ====
* Brett van de Sande, NSF Site Visit, 5/28/2008
* Pointers to hard-copy data such as paper tests and/or homework.  Include contact information.  It doesn't seem to make sense to scan a whole filing cabinet of paper if no one wants to look at it.  And any secondary researchers don't know about the filing cabinet to ask for it.

==== Sort Problem Breakdown Table ====

* Would like the ability to sort the table by clicking on the column headings of the Problem Breakdown Table on the Dataset Info Tab. -- Bruce McLaren, User Mtg, 11/5/2007

==== Rename dataset ====

As a researcher, I want to rename my dataset so that it makes more sense to other people. I also want to make sure the dataset doesn't become polluted later by new data not associated with my study.
* Ruth Wylie, July 3, 2008
* There are reasons she would want to do this (current name is worthless, other researchers might try her tutor and pollute her data) but also reasons for not doing it (log more data later).
* There are risks in changing a dataset name that might not be apparent, for example if you later want new data to go into the same dataset. [[User:Alida|Alida]] 10:27, 4 September 2009 (EDT)

==== Average time per problem, average number of problems, total number of sections ====

In addition to showing student hours per dataset, it would be useful to know the average time spent per problem, average number of problems completed, and the total number of sections.
* Noboru Matsuda, June 18, 2010

=== Error Report ===

==== View By Student ====

* Would like to see what a couple of students saw in the feedback. -- Marsha Lovett, 10/11/2007

==== Export ====
* I would like the ability to export this data. -- John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
* Also interested in this feature idea. -- Bruce McLaren, User Mtg, 11/5/2007

==== Sort ====

* Was planned for but not implemented in v2.1 (estimated to be a 4 day task)
** By Correctness %, starting with the least correct
** By Hints %
** Step (or KC if view by KC)
** Number of Students
* Ability to sort problems by their average experienced position within the curriculum. -- Ken Koedinger, 02/16/2007
** Which problem students most often experienced first, then the one experienced second most often, ...
* Order steps by the order they typically are executed by students. -- Ken Koedinger, email 11/7/2008
:"Searching through the steps in a problem to get a sense of what is going on is currently hard because the steps are ordered alphabetically, not by the order in which most students did them. While not all students do all steps in the same order, there is some regularity there. It would be quite useful if the steps could be ordered in a "typical order". This could be accomplished by using the time stamps (of the first (correct?) transaction?) for each step to determine the rank order of each step for each student in a problem and then for each problem average the rank order of each step across all students. Then arrange the steps in the Error Report by their average rank order -- that is, roughly speaking, the step that is most likely to be first across students (closest on average to first) goes first, the step with the next lowest rank goes next, etc."
* The capability to count the number of errors of each message type and sort in different ways, for instance by all errors that had no messages. -- Bruce McLaren, email 10/22/2007
:"This is an error analysis I recently did in Excel, using pivot tables, that might be handy if in the DataShop. This one is very important for tutors because the errors that occur most frequently, yet don't elicit messages to the students, are good candidates to become errors with feedback."
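
Ken's "typical order" proposal can be sketched as follows. This is a minimal illustration, assuming the transactions for one problem are available as (student, step, timestamp) tuples; typical_step_order is a hypothetical name. It ranks each student's steps by that student's first transaction of any kind on the step (Ken left open whether to use the first correct one), averages the ranks per step, and sorts by that average, breaking ties by step name. A student who skipped some steps is ranked only among the steps they did.

```python
from collections import defaultdict


def typical_step_order(transactions):
    """Order the steps of one problem by their average rank across students.

    transactions: iterable of (student, step, timestamp) for a single problem.
    Each student's steps are ranked by the time of that student's first
    transaction on the step (rank 1 = earliest).
    """
    first_seen = defaultdict(dict)  # student -> {step: earliest timestamp}
    for student, step, ts in transactions:
        steps = first_seen[student]
        if step not in steps or ts < steps[step]:
            steps[step] = ts

    ranks = defaultdict(list)  # step -> list of per-student ranks
    for steps in first_seen.values():
        ordered = sorted(steps, key=steps.get)
        for rank, step in enumerate(ordered, start=1):
            ranks[step].append(rank)

    average_rank = {step: sum(r) / len(r) for step, r in ranks.items()}
    return sorted(average_rank, key=lambda s: (average_rank[s], s))
```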

==== Show more than 500 problems ====

In the error report, can we see more than 500 problems? See set "Cog Model Discovery Experiment Spring 2010"
2/2888 selected.
(Showing the first 500)
-- Ken via email on 2/15/2011

=== Export ===

==== Last attempt on step? column for transaction format ====

* Include a new column that shows whether the row is the student's last attempt on a step. The value could be 0 or 1. Helpful for researchers who are grading data. Transaction format. --Vincent Aleven, CTAT mtg 11/5/2010

==== Elapsed Time ====

* Include the elapsed time in preview and transaction export. Elapsed time is more valuable than the absolute transaction time. It is possible to keep both. --Ken Koedinger, Team Mtg 04/18/2008

==== SQL Format ====

* Option to export as an SQL file. -- Ken Koedinger, 03/26/2007, also brought up in June ET Meeting
** Ability to export an SQL dump of a dataset. --Kyle Cunningham, 04/03/2007

==== Specify Character for Blanks ====

* Ability to specify what character, if any, is used for blank values. --Ryan Baker, email 8/9/2007
:"Not all tools handle TABTABTAB correctly on import. The period '.' is used to mean missing data in most stats packages. The word 'BLANK' is used in some other ones." Not an issue for Ryan, as he wrote a preprocessor to convert blanks.
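
A preprocessor like the one Ryan describes could be as small as the following sketch, assuming plain tab-delimited text with no quoted fields; fill_blanks is a hypothetical helper, not a DataShop export option. Fields that are empty or whitespace-only are replaced with the chosen placeholder, such as '.' or 'BLANK'.

```python
def fill_blanks(tsv_text, placeholder="."):
    """Replace empty (or whitespace-only) fields in tab-delimited text."""
    out_lines = []
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        out_lines.append("\t".join(f if f.strip() else placeholder for f in fields))
    # Preserve a trailing newline if the input had one.
    return "\n".join(out_lines) + ("\n" if tsv_text.endswith("\n") else "")
```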

==== Opportunity (at KC) Count in Transaction Export ====
* Include the opportunity count in the transaction export (it's only in the student-step rollup) -- Noboru Matsuda, 10/08/2009
** Would be repetitive
** Would need one Opportunity column per KC
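
One way to produce those per-KC counts at the transaction level is sketched below. It assumes the rows are already sorted by time and that the step value identifies one step within one problem (e.g. problem name plus step name); the function and field names are hypothetical, and DataShop's own opportunity computation in the student-step rollup may differ in details. All transactions on the same step share that step's count, which is the repetition noted above, and a dict per row stands in for one Opportunity column per KC.

```python
from collections import defaultdict


def add_opportunity_counts(transactions):
    """Attach per-KC opportunity counts to time-ordered transaction rows.

    transactions: list of dicts with 'student', 'step' and 'kcs' keys,
    already sorted by time. A student's count for a KC goes up the first
    time that student reaches a new step tagged with the KC; later
    transactions on the same step reuse that step's count.
    """
    counts = defaultdict(int)  # (student, kc) -> opportunities so far
    assigned = {}              # (student, step, kc) -> count for that step
    rows = []
    for tx in transactions:
        opportunities = {}
        for kc in tx["kcs"]:
            key = (tx["student"], tx["step"], kc)
            if key not in assigned:
                counts[(tx["student"], kc)] += 1
                assigned[key] = counts[(tx["student"], kc)]
            opportunities[kc] = assigned[key]
        rows.append({**tx, "opportunity": opportunities})
    return rows
```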

==== Export only rows that have KCs tagged ====
* Is it possible, when exporting a dataset, to include only the [transaction?] rows that have knowledge components tagged? Likewise for the KC model export: can it include only the items that have a KC tagged? -- Hui Cheng, 01/19/2010
** We have the inverse of this option on the Performance Profiler, 'Include steps without a knowledge component', and with the Student-Problem export, 'Include Steps without KCs'.

==== Include Custom Fields in Student-Step Rollup ====
* Is it possible to include the custom field in the DataShop [step] export? -- Hui Cheng, 03/01/2010
** "I am thinking about the student-step level export. We have a student from Statistics wanting to use Assistment data from DataShop. She wants to have problem set id (=curriculum id in DataShop). For Assistment data, we decided to put problem set id in the custom field because in the Assistment a lot of problems are repeated in different problem sets. She needs data at the student-step level." (Hui email)

==== Don't duplicate rows in student-step format when not showing KCs ====
* If the checkbox to show knowledge components is not checked, maybe it doesn't make sense to show rows more than once if they have more than one KC associated with the step. (Mimi (and Brett) stumbled on this. 8/16/2010)

==== Student-Step Rollup include Success Column ====
* Student-step rollup: add a column called Success, with 1 if correct, 0 if incorrect/hint, and blank otherwise. ~~ Ken Koedinger, DataShop Team Meeting, Oct 22, 2010
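
Computing the Success value could be a simple mapping, sketched here under the assumption that it is derived from each student-step row's first-attempt outcome recorded as 'correct', 'incorrect' or 'hint' (the request does not say which attempt; the function name is hypothetical):

```python
def success_value(first_attempt):
    """Map a student-step row's first-attempt outcome to Success (1, 0 or '')."""
    outcome = (first_attempt or "").strip().lower()
    if outcome == "correct":
        return 1
    if outcome in ("incorrect", "hint"):
        return 0
    return ""  # missing or unrecognized outcome stays blank
```

Applied to an export, this would be something like row["Success"] = success_value(row["First Attempt"]) for each row, assuming that column name.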

=== Learning Curve ===

==== Default sort by observation on LCPID ====

* Default sort by observation on LCPID. ~~ Ken Koedinger, DataShop Team Meeting, 10/22/2010

==== Purple Point ====
* Purple Point: if a point on the LC has more than one KC associated with it but you have drilled down to a given KC, then the blue line is off. We could put a purple point that takes this into account.
* Simpler thing: display a warning message that some points in the display are driven by other KCs
* Pearson may be interested
* This was mentioned during the PSLC Summer School 2010.
* For a step with multiple skills, attribute the error only to the skill with the highest overall error rate. ~~ Alida, meeting with Ken, November 18, 2010
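
Alida's attribution rule could look like the following minimal sketch. It assumes each step is summarized as its list of KCs plus whether the first attempt was an error (incorrect or hint); attribute_errors is a hypothetical name. Overall error rates are computed first with every KC on a step counted, then each multi-KC error is charged only to the KC with the highest overall rate, with ties broken alphabetically.

```python
from collections import defaultdict


def attribute_errors(steps):
    """Assign each erroneous step to a single KC.

    steps: list of (kcs, is_error) pairs, where kcs is a list of KC names.
    Returns a dict mapping KC -> number of errors attributed to it.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for kcs, is_error in steps:
        for kc in kcs:
            totals[kc] += 1
            errors[kc] += int(is_error)
    rate = {kc: errors[kc] / totals[kc] for kc in totals}

    attributed = defaultdict(int)
    for kcs, is_error in steps:
        if is_error and kcs:
            # max() keeps the first maximum, so sorting breaks ties by name.
            worst = max(sorted(kcs), key=rate.get)
            attributed[worst] += 1
    return dict(attributed)
```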

==== Reduce Scrolling ====
* Add a forward and back button to the graph to reduce scrolling. -- John LaPlante, email 7/10/2007

:"The learning curve page could use a forward and back button to cycle through the learning curves. Going through them one by one requires a lot of scrolling."

==== Turn On Point Labels ====

* It would be nice to have the option to turn on point labels. It is nice that I can mouse over a point and view the data but it would be nicer if it appeared automatically. -- John LaPlante, email 7/10/2007

==== Option for Bigger Graph ====

* Allow user to see bigger graph. -- Derek Chan, Winter Workshop 1/23/2008
** Potential solution: enable user to set x, y scale manually

==== LC Normalize Scale of Thumbnails ====
* [[LC Normalize Scale of Thumbnails]]

=== Performance Profiler ===

==== Rename Performance Profiler ====
* John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
** Did not use this report because he thought it had something to do with improving the performance of DataShop itself. This report might have been much better to use than the Error Report, but he would still have needed an export, since using the data in tabular form was still necessary. Note that the pivot tables he created were added to the dataset (Pittsburgh Science of Learning Center Stoichiometry Study 1).
* Lisa Anthony, email 8/2008
** Didn't know to go to the report. Export would be useful.
** Needed a better definition of Error Rate with respect to Problem and Unit rows.

==== Export ====
* John LaPlante (see comments in Rename Performance Profiler)
* Lisa Anthony (see comments in Rename Performance Profiler)
* Yes, I too can imagine wanting to export the results of a particular performance profiler output (i.e., to a table) so that I can graph it my own way. --[[User:Koedinger|Koedinger]] 16:24, 16 September 2009 (EDT)

==== Table View ====
* Add option to switch to a table view.
* Columns are: Problem Name, Steps, % incorrect, Incorrect Steps, % hint, Hint Steps, etc.; include all values shown in the pop-up.

==== Union of KCs/Problems/Students ====
* Allow user to get the union of KCs/Problems/Students etc so they can compare across samples easier. -- Kirsten Butcher, Winter Workshop 1/23/2008
:"It is difficult to compare performance profiler graphs across samples because the KCs (or problems, or whatever) aren't necessarily in both of those samples."

==== Show Details In Report ====
* Click on bar to see details in report and not just in pop-up. It disappears too quickly. -- Alida, Brett

==== Show More Information in the Graph ====
* Show more information in the graph: [Bruce M, User Mtg, 11/5/2007]
** When drilled down by a certain skill, the skill is not listed in the graph; the user has to check the skill list on the left-hand side to see which skill was selected.
* Maybe related to Bruce's "show more info in graph": show the actual value of the range variable. e.g., when range is error rate, show the error rate number somewhere (right now you can see, via mouse-over, the incorrect, hint, and correct percentages, but not the error rate) -- Brett
* Design idea to show # steps incorrect, # steps hint, # steps correct to clarify how the percentages are calculated. -- Alida, Brett
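
The table view, including the step counts and an explicit error rate, might be built per row as in this sketch. It assumes the profiler has first-attempt counts of incorrect, hint and correct steps, and that error rate means the share of steps whose first attempt was incorrect or a hint; the function name and exact column labels are illustrative.

```python
def profiler_row(name, incorrect, hint, correct):
    """Build one table-view row from first-attempt step counts."""
    steps = incorrect + hint + correct

    def pct(count):
        # Percentage of steps, rounded to one decimal; 0.0 for an empty row.
        return round(100 * count / steps, 1) if steps else 0.0

    return {
        "Problem Name": name,
        "Steps": steps,
        "% incorrect": pct(incorrect),
        "Incorrect Steps": incorrect,
        "% hint": pct(hint),
        "Hint Steps": hint,
        "% correct": pct(correct),
        "Correct Steps": correct,
        # Assumed definition: first attempt was incorrect or a hint.
        "Error Rate %": pct(incorrect + hint),
    }
```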

== Sample Selector ==

==== Sub-Samples ====

* Sub Samples would be helpful. -- John LaPlante, email 7/10/2007
:"A nice solution would be to have sub-samples where one property varies. When I'm doing this analysis, I've changed my sample many times, renaming it sometimes, tweaking it to get variations on the data. The samples are really useful but they could help me a lot more with this kind of experimentation."

==== Filter out students ====

As a researcher, I want to filter out test users (including myself) from my data so that I see less noise in the data.

* Ruth Wylie, July 3, 2008
* You can already filter out test users by giving them IDs with a recognizable prefix (e.g., 'weirdCMUuser_xxx') and then creating a sample that excludes students whose names match that prefix (e.g., a name like 'Test_%'). --[[User:Alida|Alida]] 10:29, 4 September 2009 (EDT)
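
Since 'Test_%' reads like a SQL LIKE pattern, the sketch below shows how such a pattern matches student names, assuming LIKE semantics ('%' matches any run of characters, '_' matches exactly one). The helper names are hypothetical, and matching here is case-sensitive, whereas a database may compare case-insensitively depending on its collation. Note that '_' is itself a wildcard, so 'Test_%' also matches names such as 'Tester'.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern ('%' = any run, '_' = one char) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts) + r"\Z", re.DOTALL)


def exclude_students(students, pattern):
    """Drop student names that match the LIKE pattern."""
    regex = like_to_regex(pattern)
    return [s for s in students if not regex.match(s)]
```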

==== Filter by "Class" ====

As a researcher, I want to create samples based on "class" since class is how I've encoded my conditions.

* Maaike Waalkens, July 30, 2010
* This is what made sense using Mathtutor for tutor delivery.
* What other fields are we missing in the Sample Selector?

==== Filter by "Step" ====

As a researcher, I want to create samples based on "step" since I'm only interested in one particular step.

* Mimi McLaughlin, August 17, 2010

==== Create Sample Automatically ====

* Would it be possible for me to get a random sample of 100 students from the 'Bridge to Algebra 2006-2007' dataset? I am having trouble looking at the data because it takes too long to load, and my adviser thought that was because the dataset was too large. ~~ DataShop User, 10/19/2010
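
Drawing such a sample could be sketched like this, assuming the student IDs and rows have been loaded into memory; sample_students and rows_for_sample are hypothetical helpers, not a DataShop feature. Passing a seed makes the draw reproducible.

```python
import random


def sample_students(student_ids, k=100, seed=None):
    """Pick k distinct students at random (all of them if there are fewer)."""
    unique = sorted(set(student_ids))
    rng = random.Random(seed)
    return sorted(rng.sample(unique, min(k, len(unique))))


def rows_for_sample(rows, chosen):
    """Keep only rows whose 'student' value is in the chosen sample."""
    keep = set(chosen)
    return [row for row in rows if row["student"] in keep]
```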

== Web Services ==

==== Use Custom Fields Graphs/Reports ====
In Graphs

As a researcher creating custom fields and assigning values at the transaction level, I want DataShop to perform the aggregation to the step level so that I can do other things with my custom-field variable such as graph it. -- Ryan Baker, mtg w/Alida & Brett, 12/15/2008
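
The step-level aggregation Ryan asks for might look like the sketch below. It assumes a numeric custom field on each transaction; which summary is right (mean, max, sum) depends on the variable, so it is a parameter. The function and field names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate_to_steps(transactions, field, how=mean):
    """Roll a numeric transaction-level custom field up to student-step level.

    transactions: iterable of dicts with 'student', 'step' and the custom field.
    Transactions where the field is missing (None) are skipped.
    how: function applied to each step's list of values (mean, max, sum, ...).
    """
    values = defaultdict(list)
    for tx in transactions:
        value = tx.get(field)
        if value is not None:
            values[(tx["student"], tx["step"])].append(value)
    return {key: how(vals) for key, vals in values.items()}
```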

Performance Profiler
* Use all discrete variables/custom fields on the left and all the continuous variables on the bottom of the Performance Profiler. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Ad-hoc queries ====

* Allow restricted filtering on steps and transactions as the next web service feature (after CFs), i.e., whatever users can filter on in the navigation boxes (User Meeting AAR, December 9, 2009)

==== Sample creation as a web service ====

* Sample creation is still too slow. (User Meeting AAR, December 9, 2009)

 
----
See completed [[DataShop 3.x Features]] 
See on-going [[DataShop 4.x Features]] 
See prioritized [[DataShop Feature Wish List]]
[[Category:Protected]]
[[Category:DataShop]]

Collected User Requests

2011-02-04T16:12:44Z

Bleber: /* Capture the question prompt and answer choices the student chose from */

See prioritized items on [[DataShop Feature Wish List]].

== Annotations ==

==== Annotations on Transaction Level ====
* I have models which can annotate things like: gaming, bored, etc. on the transaction level. -- Ryan Baker, ET Mtg 12/5/2007

==== Annotations on Student Level ====
* Would annotate on student level. -- Ido Roll, User Meeting, 1/19/2009

==== Annotations on Pages ====
* See the cool thing created by Jeffery Heer where all the settings of the page were recorded with the comment. -- Ryan Baker, DS Team Mtg 5/23/2008

==== Dataset Discussion - Capture data-integrity issues ====
* As a stakeholder in the DataShop project, I want to capture and publicize the data-integrity issues discovered with data sets so that data is better documented (and so we've fulfilled a promise to our funders to better document data). -- Ken Koedinger, Team Meeting, 8/15/2009
* As a user of DataShop, I want to discuss datasets and have that discussion attached to the dataset so that others can comment and better understand any data-integrity issues I've found.

==== Linking to internal pages ====

* It would be handy if a link could be saved to any dataset, sample, or page in DataShop. -- [[Alida]], 10/18/2007
** Currently, https://pslc-qa.andrew.cmu.edu/datashop/DatasetInfo?datasetId=793 works if you are already logged in.

==== Have a link from the DataShop to the Theory Wiki (Dataset to Project Page) ====

* Can we link from the dataset to the project page on the Theory Wiki? In the pipeline have a clickable link to the project page (make project name clickable). -- Michael Bett, ET Mtg 11/14/2007
** Link to a dataset directly? Is that obvious to users? Click on dataset link -> log in -> redirected back to dataset. -- Brett Leber

== Data Format ==
==== Capture the question prompt and answer choices the student chose from ====
* As a researcher, I want to be able to identify steps based on the question prompt (not the difficult-to-understand step names that come from the selection and action of my tutor) so that I can analyze the data more easily. For multiple-choice questions, I also want to be able to see all of the choices that were available to the student. -- Eli Silk - February 1, 2011 (meeting with Brett and Ross Higashi of the FIRE project)
** Near-term solution is to create a table locally that maps steps to prompts
** Another solution is to ask CTAT team to modify their Flash components so that they log this information as custom fields
** Long-term solution is to make these fields into standard fields in the tutor message format
** In the email thread "Cognitive Tutor Development and Evaluation Requests", there is support for this general idea from Ken Koedinger, Albert Corbett, and Christian Schunn.
** Ken added that "Ideally, we may want to store any images that the student can see and where they reside on the screen ..."

== Data Modeling ==
=== Non-KC Modeling ===

==== Automatic Distillation ====
* As an educational data miner wishing to develop a machine learned model with PSLC data, I would like to be able to automatically distill data features (e.g. custom fields) commonly used in past educational data mining research for a new data set (see, for instance, Baker, Corbett, Roll, & Koedinger, 2008 in UMUAI) -- Ryan Baker, Summer 2008, Startup Memo
** Could be implemented as a plug-in
* Also interested in this feature idea. -- Dan Franklin, Oct 2008

==== Upload model and apply it to new data set ====

* As an EDM researcher, I would like to take a model, expressible as a linear formula on DataShop fields or a simple code procedure (e.g. Bayesian Knowledge Tracing, which Ryan has code for), and apply it to a new data set, so that I can ... -- Ryan Baker, Sept 2008
* Also interested in this. -- Maxine Eskenazi, Sept 2008
* May work best as a plug-in
** Code to display GUI to choose which data sets to use, calls model code, re-import to DataShop
** Good to have a way to apply many models, as soon as you import a data set
* Phil has an idea that maybe fits within this one. Please move if there's a better category. -- Brett Leber
<blockquote>This [''transaction? kc? --ed.''] relabeling is really mostly about enabling modeling in DataShop, right? With this in mind, I think that it is actually a higher priority to have model alternatives in DataShop.... E.g. Investigators should be able to give you chunks of Java code according to a certain specification, and DataShop should then be able to run these over datasets (perhaps after a certain series of QA occurs according to an SOP) when the investigator clicks some button in DataShop.... Obviously this is a much larger project than adding columns, but it is also much more important in my mind. ''-- Phil Pavlik, email to Brett on 1/14/2009''</blockquote>
* Examples:
** Example: running gaming detector in multiple tutors and comparing gaming frequencies
** Example: applying Bayesian Knowledge Tracing to a new data set from the same LearnLab
** Example: applying Ben Shih's models to many data sets. Note that Ben Shih should be included in design of this feature; he is interested, and has a lot of good ideas.
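
A plug-in contract of the kind Phil describes (he proposed Java; Python is used here for brevity) could be as small as the sketch below. StepModel, LinearFormulaModel and apply_models are hypothetical names, not an existing DataShop API; the linear model treats a missing field as 0, which is an assumption a real specification would need to settle.

```python
from typing import Dict, Iterable, List, Protocol

Row = Dict[str, float]


class StepModel(Protocol):
    """Hypothetical plug-in contract: turn export rows into predictions."""

    name: str

    def predict(self, rows: Iterable[Row]) -> List[float]: ...


class LinearFormulaModel:
    """A model expressed as intercept + sum(weight * field) over named fields."""

    def __init__(self, name: str, intercept: float, weights: Dict[str, float]):
        self.name = name
        self.intercept = intercept
        self.weights = weights

    def predict(self, rows: Iterable[Row]) -> List[float]:
        # Missing fields count as 0.0 (an assumption, see above).
        return [
            self.intercept + sum(w * row.get(field, 0.0) for field, w in self.weights.items())
            for row in rows
        ]


def apply_models(models: Iterable[StepModel], rows: List[Row]) -> Dict[str, List[float]]:
    """Run every registered model over the same dataset rows."""
    return {model.name: model.predict(rows) for model in models}
```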

==== Add Different Predicted Values ====
* Would also like to add statistics and predicted values other than those LFA produces. -- Ken Koedinger, ET Meeting, 10/10/2007

==== Bayesian Knowledge Tracing ====
* Bayesian Knowledge-Tracing built into DataShop like LFA is. -- Ryan Baker, Startup Memo, Summer 2008
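
For reference, standard Bayesian Knowledge Tracing for a single student and KC can be sketched as follows. The four parameters are the usual ones (initial knowledge, learning, guess, slip); fitting them, which a DataShop feature would also need, is not shown, and the names here are illustrative. Guess and slip are assumed to lie strictly between 0 and 1 so the divisions are safe.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BKTParams:
    p_init: float     # P(L0): KC known before the first opportunity
    p_transit: float  # P(T): chance of learning at each opportunity
    p_guess: float    # P(G): correct despite not knowing the KC
    p_slip: float     # P(S): incorrect despite knowing the KC


def knowledge_trace(params: BKTParams, outcomes: List[int]) -> List[float]:
    """Return P(correct) predicted before each opportunity for one student-KC.

    outcomes: first-attempt results in opportunity order (1 = correct, 0 = not).
    """
    p_known = params.p_init
    predictions = []
    for correct in outcomes:
        p_correct = p_known * (1 - params.p_slip) + (1 - p_known) * params.p_guess
        predictions.append(p_correct)
        # Bayesian update of P(known) given the observed outcome.
        if correct:
            posterior = p_known * (1 - params.p_slip) / p_correct
        else:
            posterior = p_known * params.p_slip / (1 - p_correct)
        # Chance to learn the KC at this opportunity.
        p_known = posterior + (1 - posterior) * params.p_transit
    return predictions
```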

==== Richer statistics for KC modeling ====

* In addition to the model stats and estimates generated for learning factor models, we should also create difficulty factor models (i.e., ones without the "Slope" parameter). The latter are particularly relevant for the Unique-Step model, where the slope parameter is meaningless (but still counts against the BIC value). -- Ken Koedinger, Email "new feature request", 1/22/2009
** Relatedly, we should report significance values on the Slope parameters -- that is, when is the Slope significantly different from 0.
** The KC models page perhaps should also report the log-likelihood and number of parameters (in addition to BIC) and leave out AIC. We might also consider other metrics of model generality, like the "adjusted R2" (if I have this name right -- Joe Beck mentioned in the Assistments meeting yesterday).
** These changes will be part of meeting the CMDM goal of improvement in (or at least demonstrate acceptability of) the cognitive models in 90+ units in our LearnLab courses (or affiliates).
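
The quantities Ken mentions relate through the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of parameters and n the number of observations. The sketch below computes them from predicted probabilities of a correct first attempt (the function names are illustrative). It also shows why a meaningless slope parameter matters: with the log-likelihood unchanged, dropping s slope parameters lowers BIC by s ln n.

```python
import math
from typing import Sequence


def bernoulli_log_likelihood(predicted: Sequence[float], observed: Sequence[int]) -> float:
    """Sum of log P(observation) given predicted P(correct) values in (0, 1)."""
    return sum(math.log(p if y else 1 - p) for p, y in zip(predicted, observed))


def aic(log_likelihood: float, n_params: int) -> float:
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    return n_params * math.log(n_obs) - 2 * log_likelihood
```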

=== KC Modeling ===

==== LFA/AFM: Check if enough memory using formula ====
* The LFA/AFM code could calculate how much RAM would be needed to run the algorithm on a given skill model using the formula provided by Hao. This formula is based on the number of transactions, number of students and number of skills. Right now it will not schedule itself to run on a model with over 300 skills, though there is a manual override. [[User:Alida|Alida]] 13:35, 29 November 2010 (EST)

==== Create KC Models through Web Services ====
* For John Stamper's CMDM project, it would be nice to automatically update KC Models through web services. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Automatically discovering new KC model ====

* Would it be possible to run some code (perhaps Hao's KC model selection code, perhaps something else generated by the CMDM thrust) to find a new best KC model? -- Vincent Aleven, Sept 2008
* As a learning sciences researcher, I would like DataShop to discover a new/better KC model for me.
* Could be done as a plug-in

==== Generate new KC Models with LFA ====
* Not sure who asked for this.
* It would be nice to generate new KC Models with Hao's LFA code
* Would need to specify factors.
* Ideas on where this could run?
** On a separate server? Request it to be run, specify duration. Have separate server queue up requests, email user when done.
** In Java Applet on client machine? -- Phil Pavlik

==== Same Skill Twice on Same Step ====
* Would like to be able to apply the same skill to a step twice during a KC Model Import. -- Ken Koedinger, email, 2/4/2009

==== Save KC Model Import Files ====
* KC Model Import - save the file used to create the KCMs in case we need to recreate them. -- Ken Koedinger, email 3/4/2009

==== Log Likelihood and MAD ====
* Log Likelihood, MAD (mean absolute deviation) problem, MAD step (store and show) -- Hao Cen
** This is a variation on "Richer statistics for KC modeling" above. Probably should be merged. - Ken Koedinger
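
Hao did not spell out the exact definitions here, so the sketch below is one plausible reading, labeled as an assumption: average the predicted and observed error rates within each group (each step or each problem), then take the mean absolute difference across groups. grouped_mad is a hypothetical name; for step-level grouping the key should identify a step within its problem.

```python
from collections import defaultdict
from statistics import mean


def grouped_mad(rows, key):
    """Mean absolute deviation between predicted and observed error rates,
    after averaging within each group (e.g. each step or each problem).

    rows: iterable of dicts with 'predicted' (0..1), 'observed' (0 or 1)
    and the grouping field named by key.
    """
    groups = defaultdict(lambda: ([], []))
    for row in rows:
        predicted, observed = groups[row[key]]
        predicted.append(row["predicted"])
        observed.append(row["observed"])
    return mean(abs(mean(p) - mean(o)) for p, o in groups.values())
```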

==== Better naming for KCs in auto-generated Unique-Step KCM ====

As a researcher, I want the KCs in the Unique-Step model to have better names than KC1, KC2, etc, so that I can easily tell which generated KCs go with which unique step.

* Hui Cheng (Email 1/20/2009), Ken Koedinger (Email 1/22/2009)
* Could you put this in your new feature request list: could the “Unique-step model” be better labeled than just “KC1”, “KC2”, etc.? For example, for Assistments data, you could use part of the “Step Name”. -- Hui Cheng, Email 1/20/2009
* But, anything is better than "KC". -- Ken Koedinger, email, 1/22/2009
<blockquote>A simple alternative, that preserves uniqueness and addresses length, is to concatenate: 1) the first K letters of the step name 2) a unique numerical increment (same as the "3" in "KC3"). Note that (2) guarantees uniqueness just as it does in the current "KC<num>" scheme. Or perhaps better given that step names are often scoped within problems, is to concatenate: 1) the first L letters of the problem name 2) the first M letters of the step name 3) a unique numerical increment (just like the "3" in "KC3") I think K or L+M should be as big as possible without making the KC names indistinguishable (because they run off the right margin) in the KC list on the Learning Curve and other pages. </blockquote>
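
Ken's second proposal (problem prefix, step prefix, running number) could be sketched as follows; the default prefix lengths and the '-' separator are arbitrary choices for illustration, and unique_step_kc_names is a hypothetical name. As Ken notes, the running number alone guarantees uniqueness.

```python
def unique_step_kc_names(steps, problem_chars=8, step_chars=12):
    """Name one KC per unique (problem, step) pair, in first-seen order.

    Concatenates the first problem_chars letters of the problem name, the
    first step_chars letters of the step name, and a running number.
    """
    names = {}
    # dict.fromkeys removes duplicate pairs while keeping their order.
    for number, (problem, step) in enumerate(dict.fromkeys(steps), start=1):
        names[(problem, step)] = f"{problem[:problem_chars]}-{step[:step_chars]}-{number}"
    return names
```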

==== Visualize Learning Curve Split ====
* Be able to visualize a learning curve split into 2 based on a specification of a subset of problems. -- Albert Corbett, Math CCM, November 2008
* Harder: make me a new KCM out of it. -- Ken Koedinger, Team Mtg, Dec 5, 2008, while trying to describe Albert's request to Alida
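
Visualizing the split amounts to computing two error-rate curves, one for problems in the chosen subset and one for the rest. This minimal sketch assumes observations of (problem, opportunity, error) for one KC and keeps the original opportunity numbers rather than renumbering within each half; whether to renumber is an open design question. The function name is hypothetical.

```python
from collections import defaultdict


def split_learning_curves(observations, subset):
    """Error-rate learning curves for problems inside vs outside a subset.

    observations: iterable of (problem, opportunity, is_error) for one KC.
    subset: collection of problem names defining the first curve.
    Returns two dicts mapping opportunity number -> error rate.
    """
    # Per side: opportunity -> [error count, observation count]
    tallies = {True: defaultdict(lambda: [0, 0]), False: defaultdict(lambda: [0, 0])}
    for problem, opportunity, is_error in observations:
        tally = tallies[problem in subset][opportunity]
        tally[0] += int(is_error)
        tally[1] += 1

    def curve(counts):
        return {opp: errs / total for opp, (errs, total) in sorted(counts.items())}

    return curve(tallies[True]), curve(tallies[False])
```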

==== Statistical Significance ====
* Can DataShop determine if the difference between conditions or learning curves is statistically significant? -- general theme at workshop, probably mentioned by Bob Hausmann in his talk, Winter Workshop 1/23/2008
** Can't do it yet in DataShop, but I can show you how to do it in R (or SPSS...) after you've exported the data -- export the "student-step rollup" rather than the whole transaction table. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** This comment perhaps belongs (exists?) elsewhere: the current student-step rollup export (really, all exports) should be such that I can immediately load it into R (and other packages) without error. Currently errors occur, for instance, because there are "#" characters in the variable names in the student-step rollup. This is a roadblock for helping folks like Bob do the analyses they want to do. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** See [[Condition in Student-Step Rollup]] -- [[User:Alida|Alida]] 10:35, 17 September 2009 (EDT)
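
Until the exports are R-friendly, headers can be cleaned before loading; R's read.table, for one, treats '#' as the start of a comment by default. The sketch below is loosely modeled on R's make.names(unique = TRUE) but does not reproduce every rule (for example, reserved words are not handled); r_safe_names is a hypothetical helper.

```python
import re


def r_safe_names(headers):
    """Rewrite column headers into names R accepts without quoting.

    Characters other than letters, digits, '.' and '_' become '.', names
    that do not start with a letter or '.' get an 'X' prefix, and repeated
    names get a numeric suffix ('.1', '.2', ...).
    """
    seen = {}
    result = []
    for header in headers:
        name = re.sub(r"[^A-Za-z0-9._]", ".", header)
        if not re.match(r"[A-Za-z.]", name):
            name = "X" + name
        if name in seen:
            seen[name] += 1
            name = f"{name}.{seen[name]}"
        else:
            seen[name] = 0
        result.append(name)
    return result
```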

==== Split and Merge Skills ====

* Hand searches through a p-matrix for a dataset to split and merge skills. (Pie in the Sky) -- Ken Koedinger, Team Mtg, 02/22/2008

==== Notes on new KCMs ====

* It would be good if I could add a note to a KC model that was newly imported. -- Noboru Matsuda, email, Nov 19, 2009

==== Display Learning Curves Grouped by Interestingness ====

* The page displaying all the learning curves today seems to be alphabetically sorted by KC name, which is not necessarily meaningful.
* An alternative presentation is to group the curves into 4 sets, breaking up the page. Set 1 has curves that contain significant spikes, and therefore seem to be "low-hanging fruit" for purposes of breaking up into KCs. Set 2 has curves with few spikes, but they have a long X axis, suggesting that students are presented with too many opportunities to acquire those KCs. Set 3 has the "good" curves, i.e., nicely decreasing curves that are not too long. Set 4 is "other". -- [http://www.pitt.edu/~goldin Ilya Goldin] 7 December 2009
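
Ilya's four groups could be assigned with a simple heuristic like the one below. The spike threshold and the length cutoff are illustrative guesses, not DataShop settings, and classify_curve and group_curves are hypothetical names. A curve goes in Set 1 if its error rate ever jumps up by more than spike_jump between consecutive opportunities, Set 2 if it has more than max_length points, Set 3 if it ends lower than it starts, and Set 4 otherwise.

```python
def classify_curve(error_rates, spike_jump=0.2, max_length=15):
    """Return the set number (1-4) for one learning curve (error rate per opportunity)."""
    rises = [later - earlier for earlier, later in zip(error_rates, error_rates[1:])]
    if any(rise > spike_jump for rise in rises):
        return 1  # spikes: candidates for splitting the KC
    if len(error_rates) > max_length:
        return 2  # few spikes but many opportunities
    if len(error_rates) >= 2 and error_rates[-1] < error_rates[0]:
        return 3  # decreasing and not too long: a "good" curve
    return 4      # other


def group_curves(curves, **thresholds):
    """Group KC names by set number; curves maps KC name -> error rates."""
    groups = {1: [], 2: [], 3: [], 4: []}
    for kc, rates in sorted(curves.items()):
        groups[classify_curve(rates, **thresholds)].append(kc)
    return groups
```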

== Developer Requests ==

==== Store Converted Date and Converter Info ====

As a DataShop administrator, I'd like to see the converter information (version and date) stored in the database, so that I do not have to store that data manually in the Additional Notes field each time I load a dataset. -- Kyle, 8/5/2008

* DTD new fields:
** Store conversion and converter information in database (anything else?)
*** conversion_time
*** converter_info
* Email from Octav, 10/5/2007

<blockquote>I see there's also a note about the converter version in the Dataset Info. Which is good, but it seems it's taken from the directory name when I submitted the set. I don't know how reliable that is. :-) It would be better if it's taken from the new converter info field.</blockquote>

==== Convert from XML to tabbed-delimited format ====

* If users agree that the export format is valuable, a converter from XML to the export format would let them view the data in Excel, where blanks in the Selection column are easier to spot. -- Jonathan Sewall, ET Mtg, 10/10/2007

==== Plug-ins (general issues) ====

* Please please support Ruby on Rails. -- Ben Shih, December 2008

==== Create UI to grant DataShop user roles ====

* Already tedious.
* Alida, User Meeting AAR on December 9, 2009

== Help ==

==== Specialize Label of Help Button ====
* Since this help is better than in most applications, it should say more than just 'Help'. -- Ken Koedinger & Ryan Baker, Team Mtg, May 30, 2008
** Ideas:
*** 'Page Help'
*** 'Help with this page'
*** 'Help with Learning Curve page' (Ken's favorite)
*** 'Help with this tool'

== Home Page ==

==== Redesign the Home Page ====

* In the menu of data sets at the top, include the N (=20 or as many as fit on the screen?) data sets that I have visited starting with the ones I've visited most recently. --[[User:Koedinger|Koedinger]] 16:16, 16 September 2009 (EDT)
** As of today, the menu shows the last 10 data sets, most recently visited at the top. I think the feature is good enough, but let's ask Ken --[[User:Bleber|Bleber]] 10:56, 6 August 2010 (EDT)
* There needs to be a better ordering for the datasets (DS364)
* Maybe a search to filter the list of datasets since the list is so long. -- Brett Leber, 6/14/2007
* Going back to the home page always goes to 'My Datasets' (DS313)
* Maybe show more high level stats on this page, like how many transactions, students, skill models
* Allow users to post and share project documentation (files, papers, other meta-info) -- Ruth Wylie, suggested during meeting on 8/4/2010. She had a file that is relevant to multiple datasets in her project.

== Import ==
== Miscellaneous ==

==== Analyses by LearnLab ====

* Organize data by LearnLab, not by "data set". -- Ryan Baker, Aug 2008
* Also: Bob Hausmann, Sep 2008; Maxine Eskenazi, Sep 2008
* Essentially, current data sets become samples, but the top-level unit is the LearnLab. You can take every data set in a LearnLab together as a sample.
* Implies being able to run analyses across data sets, and export multiple data sets together; to create multi-data set samples
* As a user of DataShop, I would like to look at learning curves for all Algebra data together (for example), or export all Algebra data
* Important long-term, but is a lot of work -- in particular, we need to solve scalability issues first.

==== Save Settings Between Sessions ====
* It would be useful if DataShop could save settings between sessions. -- Bob Hausmann, User Meeting, 2/1/2008
** "I do a lot of redoing the same steps" (e.g., set cutoffs, select a KC model, select students).

==== Multiple steps per transaction ====

* Needed so that we do not have to create multiple transactions for the same actual action for Andes logs. -- Kurt VanLehn, Feb 2007

==== Demographic data ====

* This has been mentioned by NSF visitors, AB, ESL, and some researchers.
* Also mentioned at Winter Workshop 1/23/2008.
** Derek/Sue-mei: Student background information not in DataShop. Would like to see a student or set of students from a particular demographic, and view them across datasets!
* Note that Gail added demographic data to Additional Notes field on the Dataset Info page for many datasets. The idea here is to put that data into the database somewhere.

==== Single Sign On ====
* Michael Bett, email, 10/8/2007
* It would be nice if the following services have a single login account/password:
*# Theory Wiki
*# Learnlab.org
*# ESL's OSS
*# DataShop

==== Reveal unanonymized student IDs ====

As a researcher/PI performing research assistant tasks, I want to easily see the unanonymized student IDs of subjects in DataShop so that I can email my subjects telling them when to use my system.

* Ruth Wylie, July 3, 2008
* As the honest broker of the PSLC data, I have promised to not reveal the student IDs and to protect the identity of the students. This is part of the DataShop IRB. Therefore, I do not see this request as possible. [[User:Alida|Alida]] 09:53, 4 September 2009 (EDT)
** At some point, some coordination would be good with OLI's Digital Dashboard project that Marsha Lovett (and the OLI team and sometimes me) is working on. This is meant to provide usage information more quickly to instructors. It could also perhaps be used by researchers (with the right IRB rights) in situations like Ruth's.

==== Knight Timeline ====
* Developed by Andrea Knight, 2004

==== Buggy Skills ====
* Ken Koedinger, prototype walkthrough 9/11/2006

==== Confusion Matrix ====
* Brian MacWhinney, prototype walkthrough 9/11/2006

== Navigation Bar ==
==== Filter KCs by Name ====

As a researcher working with KCs, I want to filter KCs based on their names, so that I can...
* Vincent Aleven, Email, 2/3/2008
* "Since Alida said you cannot have two mechanisms for putting together your KC set (i.e., cannot have both the selecting-by-clicking and selecting-by-filtering), I would probably opt for the latter."
* Alida: I thought Vincent mentioned that he'd like to select which KCs are in a set by filtering on the name. Example: Include KCs with '*reason*' in the name and exclude KCs with '*given*' in the name.
* This could be an addition to our v3.0 KC-selection mechanism--filter by name.
* Vincent, Email, 5/6/2009: Expressed another need for this feature. A simpler option would be a wider area and a longer list, so that more items can be checked at once. The number of characters shown right now is not enough, because many skill names are identical over that many leading characters. Reference data set: Geometry CWCTC 2005-2006

* Status: Design Started
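
Filtering by name with wildcards, as in the '*reason*' / '*given*' example above, can be sketched with Python's standard `fnmatch` module. The function name and the KC names in the usage are hypothetical. Matching is made case-insensitive by lowercasing both sides, which is a design choice rather than part of the request. An empty include list means "keep everything not excluded".

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def filter_kcs(kc_names: Iterable[str], include: List[str], exclude: List[str]) -> List[str]:
    """Keep KCs that match any include pattern and no exclude pattern."""
    def matches(name: str, patterns: List[str]) -> bool:
        # Shell-style wildcards: '*' = any run of characters, '?' = one character.
        return any(fnmatchcase(name.lower(), p.lower()) for p in patterns)

    return [kc for kc in kc_names
            if (not include or matches(kc, include)) and not matches(kc, exclude)]
```

For example, `filter_kcs(names, include=["*reason*"], exclude=["*given*"])` keeps KCs whose names contain "reason" but not "given".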

==== Facebook-style KC Selection ====

As a researcher working with KCs, I want to select KCs based on the learning curve thumbnail, so that I can see quickly which ones I'm interested in exploring more deeply.

* Feature already designed for v3.0, not implemented due to time constraints.
* Agreed this would be really useful. -- Kirsten Butcher, User Mtg, 1/31/2008
* Status: Guesstimate: 20 days, need to revisit requirements document

==== Feedback after clicking a large sample on a large dataset ====

As a user of DataShop (first-time or not), I want some feedback and the ability to cancel after I do something that might take a long time (e.g., clicking "All Data" on a large dataset) so that I do not get stuck.

* Part of the Susan Goldman story
* After clicking a sample for a large dataset, there is no "Loading..." text, no feedback that the click was even registered by the app (besides the sometimes busy cursor and small browser "loading" text), nor the ability to cancel the action. 
* We will always have similar problems even if performance is improved, so providing feedback and the opportunity to cancel is critical.

==== Save Button in Problem Navigation Box ====

* Save buttons in the sidebar. -- Ken Koedinger, Mtg 2006
** Could also put one in the Problem selection box in the sidebar.

==== Make Nav Bar Wider ====
* Make the Sample name and description fields much wider. -- John LaPlante, email 7/10/2007

== New Visualizations/Reports ==

==== Student-KC Rollup ====
As a researcher, I want to see KCs rolled up by student, so that I can ...

* Vincent Aleven, User Mtg, 1/29/2008
** By Student-KC would be more useful than by Student-Problem
** Example: number of steps with a hint request or an error, or the proportion of steps that had help
** How often the bottom-out hint occurs
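
A Student-KC rollup of this kind could be computed from student-step rows roughly as follows. This is a minimal sketch: the `StepRow` fields (including a per-step bottom-out flag) are hypothetical and are not DataShop's export headers. It counts steps whose first attempt was a hint or an error, their proportion, and bottom-out hint steps per (student, KC).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class StepRow:
    """One student-step row; field names are hypothetical."""
    student: str
    kc: str
    first_attempt: str  # "correct", "incorrect", or "hint"
    bottom_out: bool    # True if the student reached the bottom-out hint on this step


@dataclass
class StudentKCSummary:
    steps: int = 0
    hint_or_error_steps: int = 0
    bottom_out_steps: int = 0

    @property
    def hint_or_error_proportion(self) -> float:
        return self.hint_or_error_steps / self.steps if self.steps else 0.0


def rollup_by_student_kc(rows: List[StepRow]) -> Dict[Tuple[str, str], StudentKCSummary]:
    """Aggregate step rows into one summary per (student, KC) pair."""
    summary: Dict[Tuple[str, str], StudentKCSummary] = defaultdict(StudentKCSummary)
    for row in rows:
        s = summary[(row.student, row.kc)]
        s.steps += 1
        if row.first_attempt in ("incorrect", "hint"):
            s.hint_or_error_steps += 1
        if row.bottom_out:
            s.bottom_out_steps += 1
    return dict(summary)
```

A step tagged with several KCs would appear as several rows here, once per KC, so it counts toward each of them.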

==== Instructor Reports ====
* Phil said he received a lot of positive reactions to providing reports on units for instructors. -- Phil Pavlik, ET Mtg 10/10/2007

==== Manage Authorizations/Projects Page ====
* Lisa Anthony, email 10/23/2007
* Allow PI to change permissions on the datasets.
:"Actually, I couldn't see how to change permissions on the datasets from the website. Is this possible? If not, it might be a nice feature..."

==== Calculate Time Spent on Different Study Activities ====

As a researcher, I want to know how much time, on average, students spend on study activities, so that I can ...

* Bruce McLaren, Email, 4/7/2009
<blockquote>
For my most recent stoich study, Shawn and I are interested in calculating timing information such as: (a) how long students spent, on average, working on individual tutors (b) how long students spent, on average, on all items in an intervention (c) how long students worked, on average, on post-tests. 

Timing information is very commonly required for studies, and can be calculated from DataShop logs relatively easily, so even if we don't have it, might be worth considering. (And we don't want to re-invent the wheel, if you already have it or are planning it...)
</blockquote>
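
The averages Bruce describes could be computed from transaction timestamps along these lines. This is a minimal sketch under a stated assumption: a student's time on an activity (a tutor, an intervention, a post-test) is taken as the span from their first to their last transaction in it, so idle periods inside that span are not removed. The `(student, activity, timestamp)` tuple layout is hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Transaction = Tuple[str, str, datetime]  # (student, activity, timestamp): hypothetical layout


def average_time_per_activity(transactions: Iterable[Transaction]) -> Dict[str, float]:
    """Average seconds per student spent on each activity (first-to-last transaction span)."""
    first: Dict[Tuple[str, str], datetime] = {}
    last: Dict[Tuple[str, str], datetime] = {}
    for student, activity, ts in transactions:
        key = (student, activity)
        if key not in first or ts < first[key]:
            first[key] = ts
        if key not in last or ts > last[key]:
            last[key] = ts

    spans: Dict[str, List[float]] = defaultdict(list)
    for (student, activity), start in first.items():
        spans[activity].append((last[(student, activity)] - start).total_seconds())
    return {activity: mean(values) for activity, values in spans.items()}
```

A more careful version might cap long gaps between consecutive transactions so that a student who walks away does not inflate the average.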

==== Incorrect Step Duration and Hint Step Duration ====

As a researcher, I want to be able to see total step duration if the student's first attempt was an incorrect attempt, and total step duration if the student's first attempt was a hint request, so that I can do some analyses that I can't do with "Error Step Duration".

* Bob Hausmann, email, 11/11/2008.
* Updated title and story with 'step duration' instead of 'time'. -- [[User:Alida|Alida]] 10:36, 4 September 2009 (EDT)

==== Grading ====

[[Grading]]

== Reports ==
=== Dataset Info ===

==== Pointers to Hard-copy Data ====
* Brett van de Sande, NSF Site Visit, 5/28/2008
* Pointers to hard-copy data such as paper tests and/or homework.  Include contact information.  It doesn't seem to make sense to scan a whole filing cabinet of paper if no one wants to look at it.  And any secondary researchers don't know about the filing cabinet to ask for it.

==== Sort Problem Breakdown Table ====

* Would like the ability to sort the table by clicking on the column headings of the Problem Breakdown Table on the Dataset Info Tab. -- Bruce McLaren, User Mtg, 11/5/2007

==== Rename dataset ====

As a researcher, I want to rename my dataset so that it makes more sense to other people. I also want to make sure the dataset doesn't become polluted later by new data not associated with my study.
* Ruth Wylie, July 3, 2008
* There are reasons she would want to do this (current name is worthless, other researchers might try her tutor and pollute her data) but also reasons for not doing it (log more data later).
* There are risks in changing a dataset name that might not be apparent, for example if you later want to add new data to the same dataset. [[User:Alida|Alida]] 10:27, 4 September 2009 (EDT)

==== Average time per problem, average number of problems, total number of sections ====

In addition to showing student hours per dataset, it would be useful to know the average time spent per problem, average number of problems completed, and the total number of sections.
* Noboru Matsuda, June 18, 2010

=== Error Report ===

==== View By Student ====

* Would like to see what a couple of students saw in the feedback. -- Marsha Lovett, 10/11/2007

==== Export ====
* I would like the ability to export this data. -- John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
* Also interested in this feature idea. -- Bruce McLaren, User Mtg, 11/5/2007

==== Sort ====

* Was planned for but not implemented in v2.1 (estimated to be a 4 day task)
** By Correctness %, starting with the least correct
** By Hints %
** Step (or KC if view by KC)
** Number of Students
* Ability to sort problems by their average experienced position within the curriculum. -- Ken Koedinger, 02/16/2007
** Which problem did students most often experience first, then the one most often experienced second, ...
* Order steps by the order they typically are executed by students. -- Ken Koedinger, email 11/7/2008
:"Searching through the steps in a problem to get a sense of what is going on is currently hard because the steps are ordered alphabetically, not by the order in which most students did them. While not all students do all steps in the same order, there is some regularity there. It would be quite useful if the steps could be ordered in a "typical order". This could be accomplished by using the time stamps (of the first (correct?) transaction?) for each step to determine the rank order of each step for each student in a problem, and then, for each problem, averaging the rank order of each step across all students. Then arrange the steps in the Error Report by their average rank order -- that is, roughly speaking, the step that is most likely to be first across students (closest on average to first) goes first, the step with the next lowest rank goes next, etc."
* The capability to count the number of errors of each message type and sort in different ways, for instance by all errors that had no messages. -- Bruce McLaren, email 10/22/2007
:"This is an error analysis I recently did in Excel, using pivot tables, that might be handy if in the DataShop. This one is very important for tutors because the errors that occur most frequently, yet don't elicit messages to the students, are good candidates to become errors with feedback."
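
Ken's average-rank ordering can be sketched as follows. This is a minimal sketch, not DataShop's implementation: it uses each step's first transaction of any kind (the email leaves open whether only correct transactions should count), a step's average is taken over the students who actually did that step, and ties are broken by step name. The tuple layout is hypothetical. The same function could order problems by their average position in the curriculum if the "step" slot holds problem names and the "problem" slot holds a curriculum unit.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Tx = Tuple[str, str, str, float]  # (student, problem, step, timestamp): hypothetical layout


def typical_step_order(transactions: Iterable[Tx]) -> Dict[str, List[str]]:
    """Order the steps of each problem by their average per-student rank."""
    # Earliest time each student touched each step of each problem.
    first_seen: Dict[Tuple[str, str, str], float] = {}
    for student, problem, step, ts in transactions:
        key = (student, problem, step)
        if key not in first_seen or ts < first_seen[key]:
            first_seen[key] = ts

    # Collect (time, step) pairs per student and problem.
    per_student: Dict[Tuple[str, str], List[Tuple[float, str]]] = defaultdict(list)
    for (student, problem, step), ts in first_seen.items():
        per_student[(student, problem)].append((ts, step))

    # Rank steps per student (1 = first) and gather ranks per problem and step.
    ranks: Dict[str, Dict[str, List[int]]] = defaultdict(lambda: defaultdict(list))
    for (student, problem), entries in per_student.items():
        for rank, (_, step) in enumerate(sorted(entries), start=1):
            ranks[problem][step].append(rank)

    # Lowest average rank first; ties broken alphabetically.
    return {problem: sorted(steps, key=lambda s: (mean(steps[s]), s))
            for problem, steps in ranks.items()}
```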

=== Export ===

==== Last attempt on step? column for transaction format ====

* Include a new column that shows whether the row is the last attempt on a step for a student or not. Could be 0 or 1 as value. Helpful for researchers who are doing grading of data. Transaction format. --Vincent Aleven, CTAT mtg 11/5/2010
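
The proposed 0/1 column could be derived as below. This is a minimal sketch: the `(student, problem, step, timestamp)` layout is hypothetical, rows with equal timestamps keep their original order (Python's sort is stable), and the key ignores repeated visits to the same problem, so "last attempt" means last across all visits.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str, str, float]  # (student, problem, step, timestamp): hypothetical layout


def last_attempt_flags(rows: List[Row]) -> List[int]:
    """Return 1 for each row that is the student's last attempt on that step, else 0."""
    last_index: Dict[Tuple[str, str, str], int] = {}
    # Visit rows in time order; a later row overwrites earlier ones for the same key.
    for i in sorted(range(len(rows)), key=lambda j: rows[j][3]):
        student, problem, step, _ = rows[i]
        last_index[(student, problem, step)] = i
    winners = set(last_index.values())
    return [1 if i in winners else 0 for i in range(len(rows))]
```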

==== Elapsed Time ====

* Include the elapsed time in preview and transaction export. It is more valuable than the transaction time as an absolute reference. Possible to keep both. --Ken Koedinger, Team Mtg 04/18/2008

==== SQL Format ====

* Option to export as an SQL file. -- Ken Koedinger, 03/26/2007, also brought up in June ET Meeting
** Ability to export an SQL dump of a dataset. --Kyle Cunningham, 04/03/2007

==== Specify Character for Blanks ====

* Ability to specify what character, if any, is used for blanks. --Ryan Baker, email 8/9/2007
:"Not all tools handle TABTABTAB correctly on import. The period '.' is used to mean missing data in most stats packages. The word 'BLANK' is used in some other ones." Not an issue for Ryan, as he wrote a preprocessor to convert blanks.
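
A blank-converting preprocessor of the kind mentioned above can be very small. This sketch is not Ryan's code; it assumes plain tab-delimited lines with no embedded tabs or quoting, replaces each empty field with a chosen marker ('.' by default, or e.g. 'BLANK'), and leaves completely empty lines alone.

```python
from typing import Iterable, Iterator


def fill_blanks(lines: Iterable[str], marker: str = ".") -> Iterator[str]:
    """Yield tab-delimited lines (without line endings) with empty fields replaced by marker."""
    for line in lines:
        text = line.rstrip("\r\n")
        if not text:
            yield text  # keep completely empty lines as they are
            continue
        yield "\t".join(field if field else marker for field in text.split("\t"))
```

For a file, something like `"\n".join(fill_blanks(open(path)))` would produce the converted text.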

==== Opportunity (at KC) Count in Transaction Export ====
* Include the opportunity count in the transaction export (it's only in the student-step rollup) -- Noboru Matsuda, 10/08/2009
** Would be repetitive
** Would need one Opportunity column per KC

==== Export only rows that have KCs tagged ====
* Is it possible, when exporting a dataset, to include only the [transaction?] rows that have knowledge components tagged? And likewise for the KC models export: only include the items that have KCs tagged? -- Hui Cheng, 01/19/2010
** We have the inverse of this option on the Performance Profiler, 'Include steps without a knowledge component', and with the Student-Problem export, 'Include Steps without KCs'.

==== Include Custom Fields in Student-Step Rollup ====
* Is it possible to include the custom field in the DataShop [step] export? -- Hui Cheng, 03/01/2010
** "I am thinking about the student-step level export. We have a student from Statistics wanting to use Assistment data from DataShop. She wants to have problem set id (=curriculum id in DataShop). For Assistment data, we decided to put problem set id in the custom field because in the Assistment a lot of problems are repeated in different problem sets. She needs data at the student-step level." (Hui email)

==== Don't duplicate rows in student-step format when not showing KCs ====
* If the checkbox to show knowledge components is not checked, maybe it doesn't make sense to show rows more than once if they have more than one KC associated with the step. (Mimi (and Brett) stumbled on this. 8/16/2010)

==== Student-Step Rollup include Success Column ====
* In the student-step rollup, add a column called Success: 1 if correct, 0 if incorrect or hint, blank otherwise. ~~ Ken Koedinger, DataShop Team Meeting, Oct 22, 2010
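
The Success value could be derived from each row's outcome with a small mapping like the one below. This is a minimal sketch. It assumes the outcome string comes from the rollup's first-attempt value ("correct", "incorrect" or "hint"), which the request does not state explicitly. Values are returned as strings so that "blank" stays an empty field in a tab-delimited export.

```python
from typing import Optional


def success_value(outcome: Optional[str]) -> str:
    """Map a step outcome to the proposed Success column: '1', '0', or '' (blank)."""
    if outcome is None:
        return ""
    outcome = outcome.strip().lower()
    if outcome == "correct":
        return "1"
    if outcome in ("incorrect", "hint"):
        return "0"
    return ""  # any other or unknown outcome stays blank
```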

=== Learning Curve ===

==== Default sort by observation on LCPID ====

* Default sort by observation on LCPID. ~~ Ken Koedinger, DataShop Team Meeting, 10/22/2010

==== Purple Point ====
* Purple Point: if a point on the LC has more than one KC associated with it but you have drilled down to a given KC, then the blue line is off. We could put a purple point that takes this into account.
* Simpler thing: display a warning message that some points in the display are driven by other KCs
* Pearson may be interested
* This was mentioned during the PSLC Summer School 2010.
* For a step with multiple skills, attribute the error only to the skill with the highest overall error rate. ~~ Alida, meeting with Ken, November 18, 2010
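
The rule in the last bullet (blame an error on a multi-skill step only on the skill with the highest overall error rate) can be sketched as two passes over the steps. This is a minimal sketch with a hypothetical `(KCs, is_error)` step layout. Overall rates count every step toward every KC it is tagged with, and ties go to the alphabetically first KC, which is an arbitrary choice.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Step = Tuple[Sequence[str], bool]  # (KCs tagged on the step, True if the step was an error)


def overall_error_rates(steps: List[Step]) -> Dict[str, float]:
    """Error rate per KC, counting every step toward every KC it is tagged with."""
    seen: Counter = Counter()
    errors: Counter = Counter()
    for kcs, is_error in steps:
        for kc in kcs:
            seen[kc] += 1
            errors[kc] += int(is_error)
    return {kc: errors[kc] / seen[kc] for kc in seen}


def attribute_errors(steps: List[Step]) -> List[Optional[str]]:
    """For each error step, the single KC it is blamed on; None for non-error steps."""
    rates = overall_error_rates(steps)
    blamed: List[Optional[str]] = []
    for kcs, is_error in steps:
        if is_error and kcs:
            # max() keeps the first maximum, so sorting makes ties go to the first name.
            blamed.append(max(sorted(kcs), key=lambda kc: rates[kc]))
        else:
            blamed.append(None)
    return blamed
```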

==== Reduce Scrolling ====
* Add a forward and back button to the graph to reduce scrolling. -- John LaPlante, email 7/10/2007

:"The learning curve page could use a forward and back button to cycle through the learning curves. Going through them one by one requires a lot of scrolling."

==== Turn On Point Labels ====

* It would be nice to have the option to turn on point labels. It is nice that I can mouse over a point and view the data but it would be nicer if it appeared automatically. -- John LaPlante, email 7/10/2007

==== Option for Bigger Graph ====

* Allow user to see bigger graph. -- Derek Chan, Winter Workshop 1/23/2008
** Potential solution: enable user to set x, y scale manually

==== LC Normalize Scale of Thumbnails ====
* [[LC Normalize Scale of Thumbnails]]

=== Performance Profiler ===

==== Rename Performance Profiler ====
* John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
** Did not use this report, thinking it had something to do with improving the performance of DataShop itself. This report might have been much better to use than the Error Report, but an export would still be needed, since using the data in tabular form was still necessary. Note that the pivot tables created were added to the dataset (Pittsburgh Science of Learning Center Stoichiometry Study 1).
* Lisa Anthony, email 8/2008
** Didn't know to go to the report. Export would be useful.
** Needed a better definition of Error Rate with respect to Problem and Unit rows.

==== Export ====
* John LaPlante (see comments in Rename Performance Profiler)
* Lisa Anthony (see comments in Rename Performance Profiler)
* Yes, I too can imagine wanting to export the results of a particular performance profiler output (i.e., to a table) so that I can graph it my own way. --[[User:Koedinger|Koedinger]] 16:24, 16 September 2009 (EDT)

==== Table View ====
* Add option to switch to a table view.
* Columns are: Problem Name, Steps, % incorrect, Incorrect Steps, % hint, Hint Steps, etc. include all values in pop-up.

==== Union of KCs/Problems/Students ====
* Allow user to get the union of KCs/Problems/Students etc so they can compare across samples easier. -- Kirsten Butcher, Winter Workshop 1/23/2008
:"It is difficult to compare performance profiler graphs across samples because the KCs (or problems, or whatever) aren't necessarily in both of those samples."

==== Show Details In Report ====
* Click on bar to see details in report and not just in pop-up. It disappears too quickly. -- Alida, Brett

==== Show More Information in the Graph ====
* Show more information in the graph: [Bruce M, User Mtg, 11/5/2007]
** After drilling down by a certain skill, the skill is not listed in the graph; the user has to check the skill list on the left-hand side to see which skill was selected
* Maybe related to Bruce's "show more info in graph": show the actual value of the range variable. e.g., when range is error rate, show the error rate number somewhere (right now you can see, via mouse-over, the incorrect, hint, and correct percentages, but not the error rate) -- Brett
* Design idea to show # steps incorrect, # steps hint, # steps correct to clarify how the percentages are calculated. -- Alida, Brett
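
Showing the counts next to the percentages, plus the error rate itself, could look like this. This is a minimal sketch with a hypothetical `BarCounts` structure. It assumes error rate means the percentage of steps whose first attempt was incorrect or a hint; that definition should be checked against DataShop's own help before use.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class BarCounts:
    """Step counts behind one Performance Profiler bar (hypothetical structure)."""
    incorrect: int
    hint: int
    correct: int

    @property
    def total(self) -> int:
        return self.incorrect + self.hint + self.correct

    def percentages(self) -> Dict[str, float]:
        if self.total == 0:
            return {"incorrect": 0.0, "hint": 0.0, "correct": 0.0}
        return {name: 100.0 * getattr(self, name) / self.total
                for name in ("incorrect", "hint", "correct")}

    def error_rate(self) -> float:
        """Assumed definition: percent of steps that were incorrect or a hint."""
        return 100.0 * (self.incorrect + self.hint) / self.total if self.total else 0.0

    def label(self) -> str:
        p = self.percentages()
        return (f"{self.incorrect} incorrect ({p['incorrect']:.1f}%), "
                f"{self.hint} hint ({p['hint']:.1f}%), "
                f"{self.correct} correct ({p['correct']:.1f}%); "
                f"error rate {self.error_rate():.1f}%")
```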

== Sample Selector ==

==== Sub-Samples ====

* Sub Samples would be helpful. -- John LaPlante, email 7/10/2007
:"A nice solution would be to have sub-samples where one property varies. When I'm doing this analysis, I've changed my sample many times, renaming it sometimes, tweaking it to get variations on the data. The samples are really useful but they could help me a lot more with this kind of experimentation."

==== Filter out students ====

As a researcher, I want to filter out test users (including myself) from my data so that I see less noise in the data.

* Ruth Wylie, July 3, 2008
* You can already filter out test users by giving them IDs with a common prefix (e.g., 'weirdCMUuser_xxx') and then creating a sample that excludes students whose names match that prefix (e.g., 'weirdCMUuser_%'). --[[User:Alida|Alida]] 10:29, 4 September 2009 (EDT)

==== Filter by "Class" ====

As a researcher, I want to create samples based on "class" since class is how I've encoded my conditions.

* Maaike Waalkens, July 30, 2010
* This is what made sense using Mathtutor for tutor delivery.
* What other fields are we missing in the Sample Selector?

==== Filter by "Step" ====

As a researcher, I want to create samples based on "step" since I'm only interested in one particular step.

* Mimi McLaughlin, August 17, 2010

==== Create Sample Automatically ====

* Would it be possible for me to get a random sample from the 'Bridge to Algebra 2006-2007' dataset of 100 students? I am having trouble looking at the data because it takes too long to load, and my adviser thought that was because the dataset was too large. ~~ DataShop User, 10/19/2010
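
Drawing the random subset of students is straightforward once the student IDs are known. This is a minimal sketch with a hypothetical function name. It shows only the sampling step (uniform, without replacement, optionally seeded so the same subset can be recreated), not how the resulting list would be turned into a DataShop sample.

```python
import random
from typing import List, Optional, Sequence


def sample_students(student_ids: Sequence[str], n: int = 100,
                    seed: Optional[int] = None) -> List[str]:
    """Pick n distinct students uniformly at random (all of them if there are fewer)."""
    unique = sorted(set(student_ids))  # sort first so a given seed is reproducible
    rng = random.Random(seed)
    return sorted(rng.sample(unique, min(n, len(unique))))
```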

== Web Services ==

==== Use Custom Fields Graphs/Reports ====
In Graphs

As a researcher creating custom fields and assigning values at the transaction level, I want DataShop to perform the aggregation to the step level so that I can do other things with my custom-field variable such as graph it. -- Ryan Baker, mtg w/Alida & Brett, 12/15/2008

Performance Profiler
* Use all discrete variables/custom fields on the left and all the continuous variables on the bottom of the Performance Profiler. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Ad-hoc queries ====

* Allow restricted filtering on steps and transactions as the next web service feature (after CFs), whatever they can filter on in the navigation boxes (User Meeting AAR, December 9, 2009)

==== Sample creation as a web service ====

* Sample creation is still too slow. (User Meeting AAR, December 9, 2009)

 
----
See completed [[DataShop 3.x Features]] 
See on-going [[DataShop 4.x Features]] 
See prioritized [[DataShop Feature Wish List]]
[[Category:Protected]]
[[Category:DataShop]]

Collected User Requests

2011-02-04T16:01:13Z

Bleber:

See prioritized items on [[DataShop Feature Wish List]].

== Annotations ==

==== Annotations on Transaction Level ====
* I have models which can annotate things like: gaming, bored, etc. on the transaction level. -- Ryan Baker, ET Mtg 12/5/2007

==== Annotations on Student Level ====
* Would annotate on student level. -- Ido Roll, User Meeting, 1/19/2009

==== Annotations on Pages ====
* See the cool thing created by Jeffery Heer where all the settings of the page were recorded with the comment. -- Ryan Baker, DS Team Mtg 5/23/2008

==== Dataset Discussion - Capture data-integrity issues ====
* As a stakeholder in the DataShop project, I want to capture and publicize the data-integrity issues discovered with data sets so that data is better documented (and so we've fulfilled a promise to our funders to better document data). -- Ken Koedinger, Team Meeting, 8/15/2009
* As a user of DataShop, I want to discuss datasets and have that discussion attached to the dataset so that others can comment and better understand any data-integrity issues I've found.

==== Linking to internal pages ====

* It would be handy if a link can be saved to any dataset, sample, page in the DataShop. -- [[Alida]], 10/18/2007
** Currently, https://pslc-qa.andrew.cmu.edu/datashop/DatasetInfo?datasetId=793 works if you are already logged in.

==== Have a link from the DataShop to the Theory Wiki (Dataset to Project Page) ====

* Can we link from the dataset to the project page on the Theory Wiki? In the pipeline have a clickable link to the project page (make project name clickable). -- Michael Bett, ET Mtg 11/14/2007
** Link to a dataset directly? Is that obvious to users? Click on dataset link -> log in -> redirected back to dataset. -- Brett Leber

== Data Format ==
==== Capture the question prompt and answer choices the student chose from ====
* As a researcher, I want to be able to identify steps based on the question prompt (not the difficult-to-understand step names that come from the selection and action of my tutor) so that I can analyze the data more easily. For multiple-choice questions, I also want to be able to see all of the choices that were available to the student. -- Eli Silk - February 1, 2011 (meeting with Brett and Ross Higashi of the FIRE project)
** Near-term solution is to create a table locally that maps steps to prompts
** Another solution is to ask CTAT team to modify their Flash components so that they log this information as custom fields
** Long-term solution is to make these fields into standard fields in the tutor message format

== Data Modeling ==
=== Non-KC Modeling ===

==== Automatic Distillation ====
* As an educational data miner wishing to develop a machine learned model with PSLC data, I would like to be able to automatically distill data features (e.g. custom fields) commonly used in past educational data mining research for a new data set (see, for instance, Baker, Corbett, Roll, & Koedinger, 2008 in UMUAI) -- Ryan Baker, Summer 2008, Startup Memo
** Could be implemented as a plug-in
* Also interested in this feature idea. -- Dan Franklin, Oct 2008

==== Upload model and apply it to new data set ====

* As an EDM researcher, I would like to take a model, expressible as a linear formula on DataShop fields, or a simple code procedure (e.g. Bayesian Knowledge Tracing, which Ryan has code for), and apply it to a new data set, so that I can ... -- Ryan Baker, Sept 2008
* Also interested in this. -- Maxine Eskenazi, Sept 2008
* May work best as a plug-in
** Code to display GUI to choose which data sets to use, calls model code, re-import to DataShop
** Good to have a way to apply many models, as soon as you import a data set
* Phil has an idea that maybe fits within this one. Please move if there's a better category. -- Brett Leber
<blockquote>This [''transaction? kc? --ed.''] relabeling is really mostly about enabling modeling in DataShop, right? With this in mind, I think that it is actually a higher priority to have model alternatives in DataShop.... E.g. Investigators should be able to give you chunks of Java code according to a certain specification, and DataShop should then be able to run these over datasets (perhaps after a certain series of QA occurs according to an SOP) when the investigator clicks some button in DataShop.... Obviously this is a much larger project than adding columns, but it is also much more important in my mind. ''-- Phil Pavlik, email to Brett on 1/14/2009''</blockquote>
* Examples:
** Example: running gaming detector in multiple tutors and comparing gaming frequencies
** Example: applying Bayesian Knowledge Tracing to a new data set from the same LearnLab
** Example: applying Ben Shih's models to many data sets. Note that Ben Shih should be included in design of this feature; he is interested, and has a lot of good ideas.

==== Add Different Predicted Values ====
* Would also like to add statistics, different predicted values than what LFA produces. -- Ken Koedinger, ET Meeting, 10/10/2007

==== Bayesian Knowledge Tracing ====
* Bayesian Knowledge-Tracing built into DataShop like LFA is. -- Ryan Baker, Startup Memo, Summer 2008
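
For reference, the standard four-parameter Bayesian Knowledge Tracing update (initial knowledge, learning, guess and slip probabilities) can be sketched as below. This is not Ryan's code or anything in DataShop. It only applies given parameters to one student's sequence of correct/incorrect first attempts on a KC; fitting the parameters to data is a separate step not shown here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class BKTParams:
    p_init: float     # P(L0): KC already known before the first opportunity
    p_transit: float  # P(T): KC learned at an opportunity
    p_guess: float    # P(G): correct answer although the KC is not known
    p_slip: float     # P(S): error although the KC is known


def predict_correct(params: BKTParams, p_known: float) -> float:
    """Probability of a correct response given the current P(known)."""
    return p_known * (1 - params.p_slip) + (1 - p_known) * params.p_guess


def knowledge_trace(params: BKTParams, observations: Iterable[bool]) -> List[float]:
    """P(known) before the first opportunity and after each observed response."""
    p_known = params.p_init
    trace = [p_known]
    for correct in observations:
        # Bayesian update of P(known) given the observed response.
        if correct:
            num = p_known * (1 - params.p_slip)
            den = num + (1 - p_known) * params.p_guess
        else:
            num = p_known * params.p_slip
            den = num + (1 - p_known) * (1 - params.p_guess)
        posterior = num / den
        # Chance of learning the KC at this opportunity.
        p_known = posterior + (1 - posterior) * params.p_transit
        trace.append(p_known)
    return trace
```

With made-up parameters `BKTParams(0.5, 0.1, 0.2, 0.1)`, one correct answer raises P(known) from 0.5 to about 0.836, and one error lowers it to 0.2.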

==== Richer statistics for KC modeling ====

* In addition to the model stats and estimates generated for learning factor models, we should also create difficulty factor models (i.e., ones without a "Slope" parameter). The latter is particularly relevant for the Unique-Step model, where the slope parameter is meaningless (but still counts against the BIC value). -- Ken Koedinger, Email "new feature request", 1/22/2009
** Relatedly, we should report significance values on the Slope parameters -- that is, when is the Slope significantly different from 0.
** The KC models page perhaps should also report the log-likelihood and number of parameters (in addition to BIC) and leave out AIC. We might also consider other metrics of model generality, like the "adjusted R2" (if I have this name right -- Joe Beck mentioned in the Assistments meeting yesterday).
** These changes will be part of meeting the CMDM goal of improvement in (or at least demonstrate acceptability of) the cognitive models in 90+ units in our LearnLab courses (or affiliates).
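
Reporting the log-likelihood and number of parameters makes the penalty terms explicit. The standard definitions are AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. This is a minimal sketch with a hypothetical `FitSummary` type. It assumes n is the number of observations the model was fit to, which may not match what DataShop uses. The numbers in the usage note are made up.

```python
import math
from dataclasses import dataclass


@dataclass
class FitSummary:
    log_likelihood: float  # maximized log-likelihood ln L of the fitted model
    n_params: int          # k: number of free parameters, including any Slope parameters
    n_obs: int             # n: number of observations the model was fit to

    @property
    def aic(self) -> float:
        return 2 * self.n_params - 2 * self.log_likelihood

    @property
    def bic(self) -> float:
        return self.n_params * math.log(self.n_obs) - 2 * self.log_likelihood
```

For example, if dropping 10 meaningless Slope parameters leaves the log-likelihood unchanged, BIC falls by exactly 10 ln n, which is the penalty Ken notes the Unique-Step model pays for them.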

=== KC Modeling ===

==== LFA/AFM: Check if enough memory using formula ====
* The LFA/AFM code could calculate how much RAM would be needed to run the algorithm on a given skill model using the formula provided by Hao. This formula is based on the number of transactions, number of students and number of skills. Right now it will not schedule itself to run on a model with over 300 skills, though there is a manual override. [[User:Alida|Alida]] 13:35, 29 November 2010 (EST)

==== Create KC Models through Web Services ====
* For John Stamper's CMDM project, it would be nice to automatically update KC Models through web services. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Automatically discovering new KC model ====

* Would it be possible to run some code (perhaps Hao's KC model selection code, perhaps something else generated by the CMDM thrust) to find a new best KC model? -- Vincent Aleven, Sept 2008
* As a learning sciences researcher, I would like DataShop to discover a new/better KC model for me.
* Could be done as a plug-in

==== Generate new KC Models with LFA ====
* Not sure who asked for this.
* It would be nice to generate new KC Models with Hao's LFA code
* Would need to specify factors.
* Ideas on where this could run?
** On a separate server? Request it to be run, specify duration. Have separate server queue up requests, email user when done.
** In Java Applet on client machine? -- Phil Pavlik

==== Same Skill Twice on Same Step ====
* Would like to be able to apply the same skill to a step twice during a KC Model Import. -- Ken Koedinger, email, 2/4/2009

==== Save KC Model Import Files ====
* KC Model Import - save the file used to create the KCMs in case we need to recreate them. -- Ken Koedinger, email 3/4/2009

==== Log Likelihood and MAD ====
* Log Likelihood, MAD (mean absolute deviation) problem, MAD step (store and show) -- Hao Cen
** This is a variation on "Richer statistics for KC modeling" above. Probably should be merged. - Ken Koedinger

==== Better naming for KCs in auto-generated Unique-Step KCM ====

As a researcher, I want the KCs in the Unique-Step model to have better names than KC1, KC2, etc., so that I can easily tell which generated KCs go with which unique step.

* Hui Cheng (Email 1/20/2009), Ken Koedinger (Email 1/22/2009)
* Could you put this in your new feature request list: could the “Unique-step model” be better labeled than just “KC1”, “KC2”, etc.? For example, for Assistments data, you could use part of the “Step Name”. -- Hui Cheng, Email 1/20/2009
* But, anything is better than "KC". -- Ken Koedinger, email, 1/22/2009
<blockquote>A simple alternative, that preserves uniqueness and addresses length, is to concatenate: 1) the first K letters of the step name 2) a unique numerical increment (same as the "3" in "KC3"). Note that (2) guarantees uniqueness just as it does in the current "KC<num>" scheme. Or perhaps better given that step names are often scoped within problems, is to concatenate: 1) the first L letters of the problem name 2) the first M letters of the step name 3) a unique numerical increment (just like the "3" in "KC3") I think K or L+M should be as big as possible without making the KC names indistinguishable (because they run off the right margin) in the KC list on the Learning Curve and other pages. </blockquote>
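
Ken's second naming scheme (problem prefix, step prefix, running number) could be sketched as follows. The prefix lengths are hypothetical defaults, and the '-' separator is an added readability choice (the proposal says only "concatenate"). The input is assumed to be the list of unique (problem, step) pairs, one per KC of the Unique-Step model.

```python
from typing import List, Tuple


def unique_step_kc_names(steps: List[Tuple[str, str]], problem_chars: int = 10,
                         step_chars: int = 15, sep: str = "-") -> List[str]:
    """Name one KC per (problem, step) pair: problem prefix, step prefix, running number.

    The running number plays the role of the "3" in "KC3", so names stay unique
    even when the truncated prefixes collide.
    """
    return [f"{problem[:problem_chars]}{sep}{step[:step_chars]}{sep}{i}"
            for i, (problem, step) in enumerate(steps, start=1)]
```

Following the suggestion above, `problem_chars + step_chars` would be made as large as the KC list on the Learning Curve page can display without running off the margin.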

==== Visualize Learning Curve Split ====
* Be able to visualize a learning curve split into 2 based on a specification of a subset of problems. -- Albert Corbett, Math CCM, November 2008
* Harder: make me a new KCM out of it. -- Ken Koedinger, Team Mtg, Dec 5, 2008, while trying to describe Albert's request to Alida

==== Statistical Significance ====
* Can DataShop determine if the difference between conditions or learning curves is statistically significant? -- general theme at workshop, probably mentioned by Bob Hausmann in his talk, Winter Workshop 1/23/2008
** Can't do it yet in DataShop, but I can show you how to do it in R (or SPSS...) after you've exported the data -- export the "student-step rollup" rather than the whole transaction table. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** This comment perhaps belongs (exists?) elsewhere: The current student-step rollup export (really, all exports) should be such that I can immediately load it into R (and other packages) without error. Currently, errors occur, for instance, because there are "#" characters in the variable names in the student-step rollup. This is a roadblock for helping folks like Bob do the analyses they want to do. --[[User:Koedinger|Koedinger]] 16:30, 16 September 2009 (EDT)
*** See [[Condition in Student-Step Rollup]] -- [[User:Alida|Alida]] 10:35, 17 September 2009 (EDT)

==== Split and Merge Skills ====

* Hand searches through a p-matrix for a dataset to split and merge skills. (Pie in the Sky) -- Ken Koedinger, Team Mtg, 02/22/2008

==== Notes on new KCMs ====

* It would be good if I could add a note to a KC model that was newly imported. -- Noboru Matsuda, email, Nov 19, 2009

==== Display Learning Curves Grouped by Interestingness ====

* The page displaying all the learning curves today seems to be alphabetically sorted by KC name, which is not necessarily meaningful.
* An alternative presentation is to group the curves into 4 sets, breaking up the page. Set 1 has curves that contain significant spikes, and therefore seem to be "low-hanging fruit" for purposes of breaking up into KCs. Set 2 has curves with few spikes, but they have a long X axis, suggesting that students are presented with too many opportunities to acquire those KCs. Set 3 has the "good" curves, i.e., nicely decreasing curves that are not too long. Set 4 is "other". -- [http://www.pitt.edu/~goldin Ilya Goldin] 7 December 2009

== Developer Requests ==

==== Store Converted Date and Converter Info ====

As a DataShop administrator, I'd like to see the converter information (version and date) stored in the database, so that I do not have to store that data manually in the Additional Notes field each time I load a dataset. -- Kyle, 8/5/2008

* DTD new fields:
** Store conversion and converter information in database (anything else?)
*** conversion_time
*** converter_info
* Email from Octav, 10/5/2007

I see there's also a note about the converter version in the Dataset
Info. Which is good, but it seems it's taken from the directory name
when I submitted the set. I don't know how reliable that is.  :-) It
would be better if it's taken from the new converter info field.

==== Convert from XML to tab-delimited format ====

* If users agree that the export format is valuable, then a way to convert from XML to the export format would let them see the data in Excel, where they could look at the Selection column and spot blanks more easily. -- Jonathan Sewall, ET Mtg, 10/10/2007

==== Plug-ins (general issues) ====

* Please please support Ruby on Rails. -- Ben Shih, December 2008

==== Create UI to grant DataShop user roles ====

* Already tedious.
* Alida, User Meeting AAR on December 9, 2009

== Help ==

==== Specialize Label of Help Button ====
* Since this help is better than in most applications, it should say more than just 'Help'. -- Ken Koedinger & Ryan Baker, Team Mtg, May 30, 2008
** Ideas:
*** 'Page Help'
*** 'Help with this page'
*** 'Help with Learning Curve page' (Ken's favorite)
*** 'Help with this tool'

== Home Page ==

==== Redesign the Home Page ====

* In the menu of data sets at the top, include the N (=20 or as many as fit on the screen?) data sets that I have visited starting with the ones I've visited most recently. --[[User:Koedinger|Koedinger]] 16:16, 16 September 2009 (EDT)
** As of today, the menu shows the last 10 data sets, most recently visited at the top. I think the feature is good enough, but let's ask Ken --[[User:Bleber|Bleber]] 10:56, 6 August 2010 (EDT)
* There needs to be a better ordering for the datasets (DS364)
* Maybe a search to filter the list of datasets since the list is so long. -- Brett Leber, 6/14/2007
* Going back to the home page always goes to 'My Datasets' (DS313)
* Maybe show more high level stats on this page, like how many transactions, students, skill models
* Allow users to post and share project documentation (files, papers, other meta-info) -- Ruth Wylie, suggested during meeting on 8/4/2010. She had a file that is relevant to multiple datasets in her project.

== Import ==
== Miscellaneous ==

==== Analyses by LearnLab ====

* Organize data by LearnLab, not by "data set". -- Ryan Baker, Aug 2008
* Also: Bob Hausmann, Sep 2008; Maxine Eskenazi, Sep 2008
* Essentially, current data sets become samples, but the top-level unit is the LearnLab. You can take every data set in a LearnLab together as a sample.
* Implies being able to run analyses across data sets, and export multiple data sets together; to create multi-data set samples
* As a user of DataShop, I would like to look at learning curves for all Algebra data together (for example), or export all Algebra data
* Important long-term, but is a lot of work -- in particular, we need to solve scalability issues first.

==== Save Settings Between Sessions ====
* It would be useful if DataShop could save settings between sessions. -- Bob Hausmann, User Meeting, 2/1/2008
** "I do a lot of redoing the same steps" (e.g., set cutoffs, select a KC model, select students).

==== Multiple steps per transaction ====

* Needed so that we do not have to create multiple transactions for the same actual action for Andes logs. -- Kurt van Lehn, Feb 2007

==== Demographic data ====

* This has been mentioned by NSF visitors, AB, ESL, and some researchers.
* Also mentioned at Winter Workshop 1/23/2008.
** Derek/Sue-mei: Student background information not in DataShop. Would like to see a student or set of students from a particular demographic, and view them across datasets!
* Note that Gail added demographic data to Additional Notes field on the Dataset Info page for many datasets. The idea here is to put that data into the database somewhere.

==== Single Sign On ====
* Michael Bett, email, 10/8/2007
* It would be nice if the following services shared a single login account/password:
*# Theory Wiki
*# Learnlab.org
*# ESL's OSS
*# DataShop

==== Reveal unanonymized student IDs ====

As a researcher/PI performing research assistant tasks, I want to see easily the unanonymized student IDs of subjects in DataShop so that I can email my subjects telling them when to use my system.

* Ruth Wylie, July 3, 2008
* As the honest broker of the PSLC data, I have promised to not reveal the student IDs and to protect the identity of the students. This is part of the DataShop IRB. Therefore, I do not see this request as possible. [[User:Alida|Alida]] 09:53, 4 September 2009 (EDT)
** At some point, some coordination would be good with OLI's Digital Dashboard project that Marsha Lovett (and the OLI team and sometimes me) is working on. This is meant to provide usage information more quickly to instructors. It could also perhaps be used by researchers (with the right IRB rights) in situations like Ruth's.

==== Knight Timeline ====
* Developed by Andrea Knight, 2004

==== Buggy Skills ====
* Ken Koedinger, prototype walkthrough 9/11/2006

==== Confusion Matrix ====
* Brian MacWhinney, prototype walkthrough 9/11/2006

== Navigation Bar ==
==== Filter KCs by Name ====

As a researcher working with KCs, I want to filter KCs based on their names, so that I can...
* Vincent Aleven, Email, 2/3/2008
* "Since Alida said you cannot have two mechanisms for putting together your KC set (i.e., cannot have both the selecting-by-clicking and selecting-by-filtering), I would probably opt for the latter."
* Alida: I thought Vincent mentioned that he'd like to select which KCs are in a set by filtering on the name. Example: Include KCs with '*reason*' in the name and exclude KCs with '*given*' in the name.
* This could be an addition to our v3.0 KC-selection mechanism--filter by name.
* Vincent, Email, 5/6/2009: Expressed another need for this feature. Alternatively, DataShop could allow a wider area and a longer list so that more items can be checked at once. The number of characters we show right now is not enough, because in many cases the visible prefix is the same across many of the skills. Reference data set: Geometry CWCTC 2005-2006

* Status: Design Started

==== Facebook-style KC Selection ====

As a researcher working with KCs, I want to select KCs based on the learning curve thumbnail, so that I can see quickly which ones I'm interested in exploring more deeply.

* Feature already designed for v3.0, not implemented due to time constraints.
* Agreed this would be really useful. -- Kirsten Butcher, User Mtg, 1/31/2008
* Status: Guesstimate: 20 days, need to revisit requirements document

==== Feedback after clicking a large sample on a large dataset ====

As a user of DataShop (first-time or not), I want some feedback and the ability to cancel after I do something that might take a long time (e.g., clicking "All Data" on a large dataset) so that I do not get stuck.

* Part of the Susan Goldman story
* After clicking a sample for a large dataset, there is no "Loading..." text, no feedback that the click was even registered by the app (besides the sometimes busy cursor and small browser "loading" text), nor the ability to cancel the action. 
* We will always have similar problems even if performance is improved, so providing feedback and the opportunity to cancel is critical.

==== Save Button in Problem Navigation Box ====

* Save buttons in the sidebar. -- Ken Koedinger, Mtg 2006
** Could also put one in the Problem selection box in the sidebar.

==== Make Nav Bar Wider ====
* Make the Sample name and description fields much wider. -- John LaPlante, email 7/10/2007

== New Visualizations/Reports ==

==== Student-KC Rollup ====
As a researcher, I want to see KCs rolled up by student, so that I can ...

* Vincent Aleven, User Mtg, 1/29/2008
** By Student-KC would be more useful than by Student-Problem
** Example: # Steps asking for a hint or error or what proportion had help
** How often bottom out hint occurs

==== Instructor Reports ====
* Phil said he received a lot of positive reactions to providing reports on units for instructors. -- Phil Pavlik, ET Mtg 10/10/2007

==== Manage Authorizations/Projects Page ====
* Lisa Anthony, email 10/23/2007
* Allow PI to change permissions on the datasets.
:"Actually, I couldn't see how to change permissions on the datasets from the website. Is this possible? If not, it might be a nice feature..."

==== Calculate Time Spent on Different Study Activities ====

As a researcher, I want to know how much time, on average, students spend on study activities, so that I can ...

* Bruce McLaren, Email, 4/7/2009
<blockquote>
For my most recent stoich study, Shawn and I are interested in calculating timing information such as: (a) how long students spent, on average, working on individual tutors (b) how long students spent, on average, on all items in an intervention (c) how long students worked, on average, on post-tests. 

Timing information is very commonly required for studies, and can be calculated from DataShop logs relatively easily, so even if we don't have it, might be worth considering. (And we don't want to re-invent the wheel, if you already have it or are planning it...)
</blockquote>
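As the request notes, this kind of timing can be derived from transaction logs. The sketch below covers item (a), average time per tutor (problem). It is an illustration under stated assumptions, not an existing DataShop feature: the input is a hypothetical list of (student, problem, timestamp-in-seconds) tuples, and a student's time on a problem is approximated as the span from their first to their last transaction on it. That approximation overcounts when a student leaves a problem and returns later.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def average_time_per_problem(
    transactions: Iterable[Tuple[str, str, float]]
) -> Dict[str, float]:
    """Average seconds spent per problem across students (hypothetical helper).

    transactions: (student_id, problem_name, timestamp_in_seconds) tuples.
    Time on a problem = last timestamp minus first timestamp for that
    student on that problem; these spans are then averaged per problem.
    """
    # Collect every timestamp for each (student, problem) pair.
    stamps: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for student, problem, ts in transactions:
        stamps[(student, problem)].append(ts)

    # Turn each pair's timestamps into one duration, grouped by problem.
    durations: Dict[str, List[float]] = defaultdict(list)
    for (_student, problem), times in stamps.items():
        durations[problem].append(max(times) - min(times))

    return {problem: mean(spans) for problem, spans in durations.items()}
```

Items (b) and (c) would follow the same pattern, grouping on an intervention or post-test label instead of the problem name.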

==== Incorrect Step Duration and Hint Step Duration ====

As a researcher, I want to be able to see total step duration if the student's first attempt was an incorrect attempt, and total step duration if the student's first attempt was a hint request, so that I can do some analyses that I can't do with "Error Step Duration".

* Bob Hausmann, email, 11/11/2008.
* Updated title and story with 'step duration' instead of 'time'. -- [[User:Alida|Alida]] 10:36, 4 September 2009 (EDT)

==== Grading ====

[[Grading]]

== Reports ==
=== Dataset Info ===

==== Pointers to Hard-copy Data ====
* Brett van de Sande, NSF Site Visit, 5/28/2008
* Pointers to hard-copy data such as paper tests and/or homework.  Include contact information.  It doesn't seem to make sense to scan a whole filing cabinet of paper if no one wants to look at it.  And any secondary researchers don't know about the filing cabinet to ask for it.

==== Sort Problem Breakdown Table ====

* Would like the ability to sort the table by clicking on the column headings of the Problem Breakdown Table on the Dataset Info Tab. -- Bruce McLaren, User Mtg, 11/5/2007

==== Rename dataset ====

As a researcher, I want to rename my dataset so that it makes more sense to other people. I also want to make sure the dataset doesn't become polluted later by new data not associated with my study.
* Ruth Wylie, July 3, 2008
* There are reasons she would want to do this (current name is worthless, other researchers might try her tutor and pollute her data) but also reasons for not doing it (log more data later).
* There are risks in changing a dataset name that might not be apparent. For example, if you want the new data in the same dataset. [[User:Alida|Alida]] 10:27, 4 September 2009 (EDT)

==== Average time per problem, average number of problems, total number of sections ====

In addition to showing student hours per dataset, it would be useful to know the average time spent per problem, average number of problems completed, and the total number of sections.
* Noboru Matsuda, June 18, 2010

=== Error Report ===

==== View By Student ====

* Would like to see what a couple of students saw in the feedback. -- Marsha Lovett, 10/11/2007

==== Export ====
* I would like the ability to export this data. -- John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
* Also interested in this feature idea. -- Bruce McLaren, User Mtg, 11/5/2007

==== Sort ====

* Was planned for but not implemented in v2.1 (estimated to be a 4 day task)
** By Correctness %, starting with the least correct
** By Hints %
** Step (or KC if view by KC)
** Number of Students
* Ability to sort problems by their average experienced position within the curriculum. -- Ken Koedinger, 02/16/2007
** Which problem did students most often experience first, then the one experienced second most often, ...
* Order steps by the order they typically are executed by students. -- Ken Koedinger, email 11/7/2008
:"Searching through the steps in a problem to get a sense of what is going on is currently hard because the steps are ordered alphabetically, not by the order in which most students did them. While not all students do all steps in the same order, there is some regularity there. It would be quite useful if the steps could be ordered in a "typical order". This could be accomplished by using the time stamps (of the first (correct?) transaction?) for each step to determine rank order of each step for each student in a problem and then for each problem average the rank order of each step across all students. Then arrange the steps in the Error Report by their average rank order -- that is, roughly speaking, the step that is most likely to be first across students (closest on average to first) goes first, the step with the next lowest rank goes next, etc."
* The capability to count the number of errors of each message type and sort in different ways, for instance by all errors that had no messages. -- Bruce McLaren, email 10/22/2007
:"This is an error analysis I recently did in Excel, using pivot tables, that might be handy if in the DataShop. This one is very important for tutors because the errors that occur most frequently, yet don't elicit messages to the students, are good candidates to become errors with feedback."
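The "typical order" procedure in Ken's quote (rank each student's steps by time, average the ranks per step, then sort) can be sketched as follows. This is a minimal illustration, not DataShop code. The input is assumed to be hypothetical (student, problem, step, timestamp) tuples. The quote leaves open whether to use the first transaction or the first correct one, and this sketch uses the first transaction of any outcome. Ties in average rank fall back to alphabetical order.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def typical_step_order(
    transactions: Iterable[Tuple[str, str, str, float]]
) -> Dict[str, List[str]]:
    """Order each problem's steps by their average rank across students.

    transactions: (student_id, problem_name, step_name, timestamp) tuples.
    A step's rank for one student is its position when that student's
    steps in the problem are sorted by the time of their first transaction.
    """
    # Earliest timestamp per step, for each (student, problem) pair.
    first_seen: Dict[Tuple[str, str], Dict[str, float]] = defaultdict(dict)
    for student, problem, step, ts in transactions:
        seen = first_seen[(student, problem)]
        if step not in seen or ts < seen[step]:
            seen[step] = ts

    # Collect each step's per-student rank (1 = done first).
    ranks: Dict[str, Dict[str, List[int]]] = defaultdict(lambda: defaultdict(list))
    for (_student, problem), seen in first_seen.items():
        for rank, step in enumerate(sorted(seen, key=seen.get), start=1):
            ranks[problem][step].append(rank)

    # Sort steps by mean rank, breaking ties alphabetically.
    return {
        problem: sorted(steps, key=lambda s: (mean(steps[s]), s))
        for problem, steps in ranks.items()
    }
```

Because ranks are computed per student before averaging, a student who skips a step still contributes a consistent ordering for the steps they did attempt.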

=== Export ===

==== Last attempt on step? column for transaction format ====

* Include a new column that shows whether the row is the last attempt on a step for a student or not. Could be 0 or 1 as value. Helpful for researchers who are doing grading of data. Transaction format. --Vincent Aleven, CTAT mtg 11/5/2010
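A 0/1 "last attempt" flag like the one requested could be computed as in the sketch below. It is a hypothetical helper, not part of DataShop. It assumes each transaction row is reduced to a (student, problem, step, timestamp) tuple and that a step is identified by problem plus step name, since step names are often scoped within problems. If two attempts share a timestamp, the one listed later wins.

```python
from typing import Dict, List, Tuple


def last_attempt_flags(rows: List[Tuple[str, str, str, float]]) -> List[int]:
    """Return 1 for each row that is the student's last attempt on a step, else 0.

    rows: (student_id, problem_name, step_name, timestamp) tuples, in any
    order. Flags are returned in the same order as `rows`.
    """
    last_index: Dict[Tuple[str, str, str], int] = {}
    for i, (student, problem, step, ts) in enumerate(rows):
        key = (student, problem, step)
        # Keep the row with the latest timestamp; ties go to the later row.
        if key not in last_index or ts >= rows[last_index[key]][3]:
            last_index[key] = i
    winners = set(last_index.values())
    return [1 if i in winners else 0 for i in range(len(rows))]
```

For example, two attempts by one student on step `a` at times 1.0 and 2.0 yield flags 0 and 1.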

==== Elapsed Time ====

* Include the elapsed time in preview and transaction export. It is more valuable than the transaction time as an absolute reference. Possible to keep both. --Ken Koedinger, Team Mtg 04/18/2008

==== SQL Format ====

* Option to export as an SQL file. -- Ken Koedinger, 03/26/2007, also brought up in June ET Meeting
** Ability to export an SQL dump of a dataset. --Kyle Cunningham, 04/03/2007

==== Specify Character for Blanks ====

* Ability to specify what character if any is used for blank. --Ryan Baker, email 8/9/2007
:"Not all tools handle TABTABTAB correctly on import. The period '.' is used to mean missing data in most stats packages. The word 'BLANK' is used in some other ones." Not an issue for Ryan, as he wrote a preprocessor to convert blanks.
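A preprocessor of the kind described could look like the sketch below. This is an illustration, not Ryan's actual tool. It assumes the export is plain tab-delimited text and splits on tabs only, so any quote characters inside cells are left untouched. The placeholder defaults to `.`, and `BLANK` or another token can be passed instead.

```python
def fill_blanks(tsv_text: str, blank: str = ".") -> str:
    """Replace empty cells in tab-delimited text with `blank` (hypothetical helper)."""
    out_lines = []
    for line in tsv_text.splitlines():
        if line == "":
            # Leave fully empty lines alone rather than turning them into one cell.
            out_lines.append(line)
            continue
        cells = line.split("\t")
        out_lines.append("\t".join(c if c != "" else blank for c in cells))
    # Preserve a trailing newline if the input had one.
    return "\n".join(out_lines) + ("\n" if tsv_text.endswith("\n") else "")


print(fill_blanks("a\t\tc\n\tb\t\n", blank="BLANK"), end="")
```

A built-in export option would make this step unnecessary for users who have not written their own converter.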

==== Opportunity (at KC) Count in Transaction Export ====
* Include the opportunity count in the transaction export (it's only in the student-step rollup) -- Noboru Matsuda, 10/08/2009
** Would be repetitive
** Would need one Opportunity column per KC

==== Export only rows that have KCs tagged ====
* Is it possible when exporting a dataset to include only the [transaction?] rows that have knowledge components tagged? And the same goes for the KC models export: only include the items that have KCs tagged? -- Hui Cheng, 01/19/2010
** We have the inverse of this option on the Performance Profiler, 'Include steps without a knowledge component', and with the Student-Problem export, 'Include Steps without KCs'.

==== Include Custom Fields in Student-Step Rollup ====
* Is it possible to include the custom field in the datashop [step] export? -- Hui Cheng, 03/01/2010
** "I am thinking about the student-step level export. We have a student from Statistics wanting to use Assistment data from DataShop. She wants to have problem set id (=curriculum id in DataShop). For Assistment data, we decided to put problem set id in the custom field because in the Assistment a lot of problems are repeated in different problem sets. She needs data at the student-step level." (Hui email)

==== Don't duplicate rows in student-step format when not showing KCs ====
* If the checkbox to show knowledge components is not checked, maybe it doesn't make sense to show rows more than once if they have more than one KC associated with the step. (Mimi (and Brett) stumbled on this. 8/16/2010)

==== Student-Step Rollup include Success Column ====
* In the student-step rollup, add a column called Success: 1 if correct, 0 if incorrect/hint, blank otherwise. ~~ Ken Koedinger, DataShop Team Meeting, Oct 22, 2010
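The proposed Success value is a simple mapping, sketched below. The request does not say which outcome "correct" refers to. This sketch assumes the value comes from the step's first-attempt outcome, given as a string such as `correct`, `incorrect`, or `hint`, and the function name is hypothetical. Matching is case-insensitive, and any other or missing value maps to a blank.

```python
from typing import Optional


def success(first_attempt: Optional[str]) -> str:
    """Map a step's first-attempt outcome to the proposed Success value.

    Returns "1" for "correct", "0" for "incorrect" or "hint", and "" (blank)
    for anything else, including a missing value.
    """
    outcome = (first_attempt or "").strip().lower()
    if outcome == "correct":
        return "1"
    if outcome in ("incorrect", "hint"):
        return "0"
    return ""


print([success(v) for v in ["correct", "hint", "incorrect", None]])
```

Returning strings keeps the blank case distinct from 0 when the column is written into a tab-delimited export.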

=== Learning Curve ===

==== Default sort by observation on LCPID ====

* Default sort by observation on LCPID. ~~ Ken Koedinger, DataShop Team Meeting, 10/22/2010

==== Purple Point ====
* Purple Point: if a point on the LC has more than one KC associated with it but you have drilled down to a given KC, then the blue line is off. We could put a purple point that takes this into account.
* Simpler thing: display a warning message that some points in the display are driven by other KCs
* Pearson may be interested
* This was mentioned during the PSLC Summer School 2010.
* For a step with multiple skills, attribute the error only to the skill with the highest overall error rate. ~~ Alida, meeting with Ken, November 18, 2010
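The attribution rule in the last item (blame only the skill with the highest overall error rate) can be sketched as below. This is an illustration under assumptions, not DataShop behavior. Overall error rates per KC are assumed to be precomputed and passed in as a dictionary. A KC missing from that dictionary is treated as having a 0.0 rate, and ties go to the alphabetically first KC so the choice is deterministic.

```python
from typing import Dict, Iterable


def blame_kc(step_kcs: Iterable[str], kc_error_rate: Dict[str, float]) -> str:
    """Pick the single KC that an error on a multi-KC step is attributed to.

    Chooses the KC with the highest overall error rate; ties go to the
    alphabetically first KC name. Unknown KCs count as a 0.0 error rate.
    """
    kcs = list(step_kcs)
    if not kcs:
        raise ValueError("step has no KCs")
    return min(kcs, key=lambda kc: (-kc_error_rate.get(kc, 0.0), kc))


rates = {"add": 0.20, "carry": 0.35, "align": 0.35}
print(blame_kc(["add", "carry"], rates))
```

Under this rule, a point drilled down to one KC would only include errors that were attributed to that KC, which addresses the "purple point" concern without a separate display.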

==== Reduce Scrolling ====
* Add a forward and back button to the graph to reduce scrolling. -- John LaPlante, email 7/10/2007

:"The learning curve page could use a forward and back button to cycle through the learning curves. Going through them one by one requires a lot of scrolling."

==== Turn On Point Labels ====

* It would be nice to have the option to turn on point labels. It is nice that I can mouse over a point and view the data but it would be nicer if it appeared automatically. -- John LaPlante, email 7/10/2007

==== Option for Bigger Graph ====

* Allow user to see bigger graph. -- Derek Chan, Winter Workshop 1/23/2008
** Potential solution: enable user to set x, y scale manually

==== LC Normalize Scale of Thumbnails ====
* [[LC Normalize Scale of Thumbnails]]

=== Performance Profiler ===

==== Rename Performance Profiler ====
* John LaPlante, email thread 'Suggestions for Improvement' 7/10/2007
** Did not use this report because he thought it had something to do with improving the performance of DataShop itself. This report might have been much better to use than the Error Report, though an export would still be needed because working with the data in tabular form was necessary. Note that the pivot tables created were added to the dataset (Pittsburgh Science of Learning Center Stoichiometry Study 1).
* Lisa Anthony, email 8/2008
** Didn't know to go to the report. Export would be useful.
** Needed a better definition of Error Rate with respect to Problem and Unit rows.

==== Export ====
* John LaPlante (see comments in Rename Performance Profiler)
* Lisa Anthony (see comments in Rename Performance Profiler)
* Yes, I too can imagine wanting to export the results of a particular performance profiler output (i.e., to a table) so that I can graph it my own way. --[[User:Koedinger|Koedinger]] 16:24, 16 September 2009 (EDT)

==== Table View ====
* Add option to switch to a table view.
* Columns are: Problem Name, Steps, % incorrect, Incorrect Steps, % hint, Hint Steps, etc. include all values in pop-up.

==== Union of KCs/Problems/Students ====
* Allow user to get the union of KCs/Problems/Students etc so they can compare across samples easier. -- Kirsten Butcher, Winter Workshop 1/23/2008
:"It is difficult to compare performance profiler graphs across samples because the KCs (or problems, or whatever) aren't necessarily in both of those samples."

==== Show Details In Report ====
* Click on bar to see details in report and not just in pop-up. It disappears too quickly. -- Alida, Brett

==== Show More Information in the Graph ====
* Show more information in the graph: [Bruce M, User Mtg, 11/5/2007]
** Had drilled down by a certain skill - skill is not listed in the graph, user has to check the skill list on the LHS to see what skill was selected
* Maybe related to Bruce's "show more info in graph": show the actual value of the range variable. e.g., when range is error rate, show the error rate number somewhere (right now you can see, via mouse-over, the incorrect, hint, and correct percentages, but not the error rate) -- Brett
* Design idea to show # steps incorrect, # steps hint, # steps correct to clarify how the percentages are calculated. -- Alida, Brett

== Sample Selector ==

==== Sub-Samples ====

* Sub Samples would be helpful. -- John LaPlante, email 7/10/2007
:"A nice solution would be to have sub-samples where one property varies. When I'm doing this analysis, I've changed my sample many times, renaming it sometimes, tweaking it to get variations on the data. The samples are really useful but they could help me a lot more with this kind of experimentation."

==== Filter out students ====

As a researcher, I want to filter out test users (including myself) from my data so that I see less noise in the data.

* Ruth Wylie, July 3, 2008
* You can already filter out test users by using a test user id that starts with 'weirdCMUuser_xxx'. Then create a sample that excludes students with a name like 'Test_%'. --[[User:Alida|Alida]] 10:29, 4 September 2009 (EDT)

==== Filter by "Class" ====

As a researcher, I want to create samples based on "class" since class is how I've encoded my conditions.

* Maaike Waalkens, July 30, 2010
* This is what made sense using Mathtutor for tutor delivery.
* What other fields are we missing in the Sample Selector?

==== Filter by "Step" ====

As a researcher, I want to create samples based on "step" since I'm only interested in one particular step.

* Mimi McLaughlin, August 17, 2010

==== Create Sample Automatically ====

* Would it be possible for me to get a random sample from the 'Bridge to Algebra 2006-2007' dataset of 100 students? I am having trouble looking at the data because it takes too long to load, and my adviser thought that was because the dataset was too large. ~~ DataShop User, 10/19/2010
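Drawing such a random subset of students is straightforward once the student IDs are available, as in the sketch below. The helper is hypothetical, not an existing DataShop feature. It assumes a list of student IDs has already been obtained, for example from an export. The resulting IDs could then be used to define a sample by student. Sorting the de-duplicated IDs before sampling, together with a fixed seed, makes the draw repeatable.

```python
import random
from typing import Iterable, List, Optional


def sample_students(
    student_ids: Iterable[str], n: int = 100, seed: Optional[int] = None
) -> List[str]:
    """Draw n distinct students uniformly at random (all of them if fewer exist)."""
    unique_ids = sorted(set(student_ids))
    rng = random.Random(seed)
    return rng.sample(unique_ids, min(n, len(unique_ids)))


ids = [f"stu_{i}" for i in range(250)]
picked = sample_students(ids, n=100, seed=42)
print(len(picked))
```

Sampling whole students, rather than individual transactions, keeps each selected student's learning history intact, which matters for learning curves.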

== Web Services ==

==== Use Custom Fields Graphs/Reports ====
In Graphs

As a researcher creating custom fields and assigning values at the transaction level, I want DataShop to perform the aggregation to the step level so that I can do other things with my custom-field variable such as graph it. -- Ryan Baker, mtg w/Alida & Brett, 12/15/2008

Performance Profiler
* Use all discrete variables/custom fields on the left and all the continuous variables on the bottom of the Performance Profiler. -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Sept 11, 2009

==== Ad-hoc queries ====

* Allow restricted filtering on steps and transactions as the next web service feature (after CFs), whatever they can filter on in the navigation boxes (User Meeting AAR, December 9, 2009)

==== Sample creation as a web service ====

* Sample creation is still too slow. (User Meeting AAR, December 9, 2009)

 
----
See completed [[DataShop 3.x Features]] 
See on-going [[DataShop 4.x Features]] 
See prioritized [[DataShop Feature Wish List]]
[[Category:Protected]]
[[Category:DataShop]]

Web Services - Add Custom Fields

2011-01-07T19:33:58Z

Bleber: /* Notes/Comments */

'''status:''' pslc member review of design proposal

== We are seeking your feedback on our current design proposal. ==
We've been discussing and designing a new feature for DataShop—the ability to add and modify custom fields for existing data in DataShop. '''We'd like your feedback!''' If you have a moment, take a look at the PDF below, pages 1-9 (the rest is an archive of our discussion that you're also welcome to read). Then send us feedback or add it to this page below.

In the document we talk a lot about web services (a way to programmatically retrieve and annotate data), but the idea of custom fields would be universal. Would this proposal meet your needs for annotating (adding columns to) your data? If not, how can it be improved?

'''Please add your comments by Wed January 19 2011''' (after which we'll be writing low-level requirements)

* [[media:WebServicesProposal2011-public.pdf|Web Services Custom Fields Proposal 2011 (PDF, 827 KB)]]

Feel free to add comments in the document and email it to datashop-help@lists.andrew.cmu.edu, or add comments on this wiki page.

== User Stories ==

# As a researcher (EDM, M&M, CMDM), I want to retrieve transaction data programmatically and append custom fields to transactions in DataShop so that I can more easily run my models on the data outside of DataShop and put the results back in.
# As a researcher who has created a custom field for an existing data set, I want to import that new field into the step rollup table so that it is both preserved and available in DataShop tools. - Ryan Baker, 12/15/2008

== Notes/Comments ==
* For permissions: it seems to me that anyone with only "view" access should at most be able to create private custom fields, i.e., fields viewable only by the creator. It seems like it should require "edit" access to create publicly-viewable custom fields. -- Geoff Gordon, 1/6/2011

* It seems like there should be an error when someone tries to add a custom field value for a key that doesn't exist, and at least a warning when someone tries to delete an object (e.g., a KC) which has custom fields attached. In the latter case, telling the system to delete the KC anyway should presumably lead to deletion of the corresponding custom field values. -- Geoff Gordon, 1/6/2011

----
See on-going [[DataShop 4.x Features]] 
See unordered [[Collected User Requests]] 
See [[Web Services]]
[[Category:Protected]]
[[Category:DataShop]]

Web Services - Add Custom Fields

2011-01-07T19:04:46Z

Bleber: /* Notes/Comments */


Web Services - Add Custom Fields

2011-01-06T15:02:13Z

Bleber: /* We are seeking your feedback on our current design proposal. */

'''status:''' pslc member review of design proposal

== We are seeking your feedback on our current design proposal. ==
We've been discussing and designing a new feature for DataShop—the ability to add and modify custom fields for existing data in DataShop. '''We'd like your feedback!''' If you have a moment, take a look at the PDF below, pages 1-9 (the rest is an archive of our discussion that you're also welcome to read). Then send us feedback or add it to this page below.

In the document we talk a lot about web services (a way to programmatically retrieve and annotate data), but the idea of custom fields would be universal. Would this proposal meet your needs for annotating (adding columns to) your data? If not, how can it be improved?

'''Please add your comments by Wed January 19 2011''' (after which we'll be writing low-level requirements)

* [[media:WebServicesProposal2011-public.pdf|Web Services Custom Fields Proposal 2011 (PDF, 827 KB)]]

Feel free to add comments in the document and email it to datashop-help@lists.andrew.cmu.edu, or add comments on this wiki page.

== User Stories ==

# As a researcher (EDM, M&M, CMDM), I want to retrieve transaction data programmatically and append custom fields to transactions in DataShop so that I can more easily run my models on the data outside of DataShop and put the results back in.
# As a researcher who has created a custom field for an existing data set, I want to import that new field into the step rollup table so that it is both preserved and available in DataShop tools. - Ryan Baker, 12/15/2008
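The first user story describes a round trip: export transactions, run a model outside DataShop, and bring the results back as a custom field. The sketch below covers the outside-DataShop half, under several assumptions:
* transactions are parsed into dictionaries with columns named `Transaction Id`, `Anon Student Id`, and `Step Name` (assumed names);
* rows are already sorted by time;
* results are written in a hypothetical two-column upload layout.
The "model" is only a count of the student's earlier attempts on the same step. It stands in for a real detector or knowledge-tracing model.

```python
import csv
import io
from collections import defaultdict


def prior_attempt_counts(transactions):
    """Toy model: for each transaction, count earlier attempts by the same
    student on the same step. Assumes rows are already sorted by time."""
    seen = defaultdict(int)
    results = {}
    for row in transactions:
        key = (row["Anon Student Id"], row["Step Name"])  # assumed column names
        results[row["Transaction Id"]] = seen[key]
        seen[key] += 1
    return results


def to_custom_field_tsv(field_name, values):
    """Serialize model output as a tab-delimited table with one row per
    transaction (hypothetical upload layout: transaction id, field value)."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Transaction Id", field_name])
    for tx_id, value in values.items():
        writer.writerow([tx_id, value])
    return out.getvalue()
```

Keying the output by transaction id keeps the upload independent of column order and of any other fields in the export.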

== Notes/Comments ==
''Your feedback here''

----
See on-going [[DataShop 4.x Features]] 
See unordered [[Collected User Requests]] 
See [[Web Services]]
[[Category:Protected]]
[[Category:DataShop]]

Web Services - Add Custom Fields

2011-01-06T15:00:07Z

Bleber: /* We are seeking your feedback on our current design proposal. */

File:WebServicesProposal2011-public.pdf

2011-01-06T14:55:35Z

Bleber: DataShop design document for PSLC member review

DataShop design document for PSLC member review

DataShop

2011-01-05T20:55:59Z

Bleber:

The [https://pslcdatashop.web.cmu.edu PSLC DataShop] provides two main services to the learning science community:

* a central repository to secure and store research data
* a set of analysis and reporting tools

Researchers can rapidly access standard reports such as learning curves, as well as browse data using the interactive web application. To support other analyses, DataShop can export data in a tab-delimited format for use in statistical software and other analysis packages. Keep up to date on the latest DataShop news on our [http://pslcdatashop.org/about about] page.
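The sketch below illustrates working with the tab-delimited export outside DataShop by computing percent correct per student. It makes three assumptions:
* the export has `Anon Student Id` and `Outcome` columns;
* `CORRECT` is one of the outcome values;
* fields are unquoted, hence `csv.QUOTE_NONE`.
Check these against an actual export before relying on the sketch.

```python
import csv
import io
from collections import Counter


def percent_correct_by_student(tsv_text):
    """Percent of transactions whose Outcome is CORRECT, per student.
    Column names and outcome values are assumptions about the export."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    total, correct = Counter(), Counter()
    for row in reader:
        student = row["Anon Student Id"]
        total[student] += 1
        if row["Outcome"] == "CORRECT":
            correct[student] += 1
    return {s: 100.0 * correct[s] / total[s] for s in total}
```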

== We need your help! ==

'''We are seeking your feedback on [[Web Services - Add Custom Fields|our current custom fields design proposal]].''' We've been discussing and designing a new feature for DataShop—the ability to add and modify custom fields for existing data in DataShop. We'd like your feedback '''by January 19, 2011'''! Please [[Web Services - Add Custom Fields|take a look]].

Please also help us decide which features to add to DataShop, as well as the order we'll build them. The more votes a feature has, the sooner we'll build it. Go to [[DataShop Feature Wish List]] to join the conversation.

== Features We Are Building or Have Built ==
* [[DataShop 4.x Features]] — ''Sep 2009 – present''
* [[DataShop 3.x Features]] — ''Nov 2008 – Aug 2009''

== How to Request a Feature ==
* [[Write a User Story]]
* [[Create a Feature Page]]
* Add Link to Feature on [[Collected User Requests]] page.
 
----
See completed [[DataShop 3.x Features]] 
See on-going [[DataShop 4.x Features]] 
See prioritized [[DataShop Feature Wish List]] 
See unordered [[Collected User Requests]] 
See the [[:Category:DataShop Glossary|DataShop Glossary]] 
See [[Web Services]]
[[Category:Protected]]
[[Category:DataShop]]

Web Services - Get Transactions and Student-Step Records

2011-01-05T20:49:06Z

Bleber:

'''status:''' done, released with DataShop v4.0 December 2009

== User Stories ==

# As a researcher (EDM, M&M, CMDM), I want to retrieve transaction data programmatically and append custom fields to transactions in DataShop so that I can more easily run my models on the data outside of DataShop and put the results back in.
# As a researcher with a machine learning background, I want to retrieve student-step data programmatically so that I can more easily run my analyses on the data without much human intervention.
# As a researcher who has created a custom field for an existing data set, I want to import that new field into the step rollup table so that it is both preserved and available in DataShop tools. - Ryan Baker, 12/15/2008
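The third user story asks for a transaction-level custom field to be carried into the step rollup. One plausible approach is to group transactions by student and step and then aggregate. The sketch below assumes column names (`Anon Student Id`, `Step Name`) and a hypothetical numeric field called `Gaming`. The mean is only a default; the right aggregate depends on what the field measures.

```python
from collections import defaultdict
from statistics import mean


def rollup_custom_field(transactions, field_name, aggregate=mean):
    """Collapse a per-transaction custom field to one value per (student, step).
    Column names are assumptions; field_name is a hypothetical custom field."""
    groups = defaultdict(list)
    for row in transactions:
        value = row.get(field_name)
        if value is None or value == "":
            continue  # transactions without a value for the field are skipped
        groups[(row["Anon Student Id"], row["Step Name"])].append(float(value))
    return {key: aggregate(values) for key, values in groups.items()}
```

Passing a different `aggregate` (for example `max`, to flag any gaming on a step) changes the rollup without touching the grouping logic.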

== Notes/Comments ==

* Adding custom fields to existing datasets
* Needed for M&M and CMDM clusters [Ryan Baker, October 2008]
* Will enable researchers to use output from existing models
** Gaming Detector [Arnon Hershkovitz, Mihaela Cocea, Summer 2008]
** Bayesian Knowledge Tracing [Hao Cen, migrated from feature wish list]
* Key to collaboration.
* Ability to automatically add columns to DataShop data sets at transaction level [Ryan, Startup Memo, summer 2008]
** Transaction level: need a user incentive to upload values, need associated notes, need graphing of
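Bayesian Knowledge Tracing, listed above as one model whose output could be stored as a custom field, uses a standard four-parameter update. The sketch below estimates, for one student and one skill, the probability that the skill is known before each opportunity. These per-opportunity values are the kind of numbers that could be attached back to transactions. The parameter values would have to come from a separate fitting step, which is not shown.

```python
from dataclasses import dataclass


@dataclass
class BKTParams:
    p_init: float     # P(L0): probability the skill is known before practice
    p_transit: float  # P(T): probability of learning after an opportunity
    p_guess: float    # P(G): probability of a correct answer when not known
    p_slip: float     # P(S): probability of an incorrect answer when known


def bkt_trace(outcomes, params):
    """Return P(known) before each opportunity for one student and one skill.

    outcomes: sequence of booleans (True = correct first attempt).
    """
    p_known = params.p_init
    estimates = []
    for correct in outcomes:
        estimates.append(p_known)
        # Bayesian posterior given the observed outcome.
        if correct:
            num = p_known * (1 - params.p_slip)
            den = num + (1 - p_known) * params.p_guess
        else:
            num = p_known * params.p_slip
            den = num + (1 - p_known) * (1 - params.p_guess)
        posterior = num / den
        # Learning may occur between opportunities.
        p_known = posterior + (1 - posterior) * params.p_transit
    return estimates
```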

* Get Step Rollup
** How should the KCs be shown: multiple columns or one column? Labels or 1s and 0s? -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Aug 28, 2009
*** Multi-mapped: one column per KC, each containing 1 or 0 (this is the Q-matrix structure).
*** Alternatively, use a single column that lists all of the KCs.
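The two KC layouts discussed above can be converted into each other. This sketch expands a single KC column into Q-matrix-style 0/1 columns. It assumes the column is named `KC (Default)` and that multiple KCs in one cell are separated by `~~`; check both against an actual export.

```python
def kc_column_to_q_matrix(rows, kc_column="KC (Default)", sep="~~"):
    """Expand one KC label column into 0/1 indicator columns (a Q matrix).
    The column name and separator are assumptions about the export format."""
    # Split each cell into its KCs, dropping empty strings from blank cells.
    parsed = [[kc for kc in row.get(kc_column, "").split(sep) if kc] for row in rows]
    all_kcs = sorted({kc for kcs in parsed for kc in kcs})
    q_matrix = [{kc: int(kc in kcs) for kc in all_kcs} for kcs in parsed]
    return all_kcs, q_matrix
```

The single-column form is compact and keeps KC names readable. The Q-matrix form is what most modeling code consumes directly, at the cost of one column per KC.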

----
See completed [[DataShop 3.x Features]] 
See on-going [[DataShop 4.x Features]] 
See unordered [[Collected User Requests]] 
See [[Web Services]]
[[Category:Protected]]
[[Category:DataShop]]
[[Category:DataShop Completed Feature]]

Web Services - Add Custom Fields

2011-01-05T20:48:40Z

Bleber: /* Notes/Comments */

Adding Custom Fields through Web Application

2011-01-05T20:46:21Z

Bleber: /* Notes/Comments */

'''Status: Definition Needed'''
== User Story ==
As a researcher, I want to add custom fields through the web application instead of using a program, so that I can blah blah blah.

== Notes/Comments ==
* This is related to the [[Web Services - Add Custom Fields]] feature.
* First, implement at the '''Transaction Level'''.

 
----
See completed [[DataShop 3.x Features]] 
See on-going [[DataShop 4.x Features]] 
See prioritized [[DataShop Feature Wish List]] 
See unordered [[Collected User Requests]]
[[Category:Protected]]
[[Category:DataShop]]
[[Category:DataShop Feature]]

Web Services - Add Custom Fields

2011-01-05T20:30:28Z

Bleber:

DataShop 4.x Features

2011-01-05T20:28:22Z

Bleber: /* v4.0 December 2009 */

== v4.0 December 2009 ==
# [[Web Services - Get Transactions and Student-Step Records]]
# [[Project Announcements in DataShop Web Application]]
# [[Learning Curve Point Info Details: Add Frequency]]
# [[Rename Pound Sign|Rename Pound Sign to Row in all 4 exports]]
# [[Condition in Student-Step Rollup]] — Vote: Ken Koedinger(2), Vincent Aleven(3)
# [[DS995]] information on cached transaction files

== v4.1 May 2010 ==
# Bug fixes

== v4.2 August 2010 ==
# [[Metrics]]

== v4.3 September 2010 ==

# Bug fixes

== v4.4 December 2010 ==
# [[DB Merge]]
# [[Unique File Name on Export]]
# [[Rename LFA to AFM]]
# [[KCM Cross Validation Values]]

 
----
See completed [[DataShop 3.x Features]] 
See prioritized [[DataShop Feature Wish List]] 
See unordered [[Collected User Requests]]
[[Category:Protected]]
[[Category:DataShop]]

Web Services

2011-01-05T20:27:02Z

Bleber:

This page collects links to web services features.

* [[DataShop_3.x_Features#Web_Services_Feature_.28Authentication.2C_Get_Dataset_Metadata.2C_Get_Sample_Metadata.29 | Web Services - Authentication, Get Dataset Metadata, Get Sample Metadata]]
* [[Web Services - Get Transactions and Student-Step Records]]
* [[Web Services - Add Custom Fields]]

== User Stories ==

# As a researcher (EDM, M&M, CMDM), I want to retrieve transaction data programmatically and append custom fields to transactions in DataShop so that I can more easily run my models on the data outside of DataShop and put the results back in.
# As a researcher with a machine learning background, I want to retrieve student-step data programmatically so that I can more easily run my analyses on the data without much human intervention.
# As a researcher who has created a custom field for an existing data set, I want to import that new field into the step rollup table so that it is both preserved and available in DataShop tools. - Ryan Baker, 12/15/2008

----
See completed [[DataShop 3.x Features]] 
See on-going [[DataShop 4.x Features]] 
See unordered [[Collected User Requests]]

[[Category:Protected]]
[[Category:DataShop]]
[[Category:DataShop Completed Feature]]

Web Services

2011-01-05T20:23:17Z

Bleber:

This page collects links to web services features.

* [[Web Services - Get Transactions and Student-Step Records]]
* [[Web Services - Add Custom Fields]]

== User Stories ==

# As a researcher (EDM, M&M, CMDM), I want to retrieve transaction data programmatically and append custom fields to transactions in DataShop so that I can more easily run my models on the data outside of DataShop and put the results back in.
# As a researcher with a machine learning background, I want to retrieve student-step data programmatically so that I can more easily run my analyses on the data without much human intervention.
# As a researcher who has created a custom field for an existing data set, I want to import that new field into the step rollup table so that it is both preserved and available in DataShop tools. - Ryan Baker, 12/15/2008

----
See completed [[DataShop 3.x Features]] 
See on-going [[DataShop 4.x Features]] 
See unordered [[Collected User Requests]]

[[Category:Protected]]
[[Category:DataShop]]
[[Category:DataShop Completed Feature]]

Web Services

2011-01-05T20:22:27Z

Bleber:

This page collects links to web services features.

* [[Web Services - Get Transactions and Student-Step Records]]
* [[Web Services - Add Custom Fields]]

== User Stories ==

# As a researcher (EDM, M&M, CMDM), I want to retrieve transaction data programmatically and append custom fields to transactions in DataShop so that I can more easily run my models on the data outside of DataShop and put the results back in.
# As a researcher with a machine learning background, I want to retrieve student-step data programmatically so that I can more easily run my analyses on the data without much human intervention.
# As a researcher who has created a custom field for an existing data set, I want to import that new field into the step rollup table so that it is both preserved and available in DataShop tools. - Ryan Baker, 12/15/2008

----
See completed [[DataShop 3.x Features]] 
See on-going [[DataShop 4.x Features]] 
See unordered [[Collected User Requests]]
[[Category:Protected]]
[[Category:DataShop]]
[[Category:DataShop Completed Feature]]

Web Services - Get Transactions and Student-Step Records

2011-01-05T20:19:51Z

Bleber:

'''status:''' done, released with DataShop v4.0 December 2009

== User Stories ==

# As a researcher (EDM, M&M, CMDM), I want to retrieve transaction data programmatically and append custom fields to transactions in DataShop so that I can more easily run my models on the data outside of DataShop and put the results back in.
# As a researcher with a machine learning background, I want to retrieve student-step data programmatically so that I can more easily run my analyses on the data without much human intervention.
# As a researcher who has created a custom field for an existing data set, I want to import that new field into the step rollup table so that it is both preserved and available in DataShop tools. - Ryan Baker, 12/15/2008

== Notes/Comments ==

* Adding custom fields to existing datasets
* Needed for M&M and CMDM clusters [Ryan Baker, October 2008]
* Will enable researchers to use output from existing models
** Gaming Detector [Arnon Hershkovitz, Mihaela Cocea, Summer 2008]
** Bayesian Knowledge Tracing [Hao Cen, migrated from feature wish list]
* Key to collaboration.
* Ability to automatically add columns to DataShop data sets at transaction level [Ryan, Startup Memo, summer 2008]
** Transaction level: need a user incentive to upload values, need associated notes, need graphing of

* Get Step Rollup
** How should the KCs be shown: multiple columns or one column? Labels or 1s and 0s? -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Aug 28, 2009
*** Multi-mapped: one column per KC, each containing 1 or 0 (this is the Q-matrix structure).
*** Alternatively, use a single column that lists all of the KCs.
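A set of 0/1 KC indicator columns, as in the Q-matrix layout, can be collapsed into one label column. The sketch assumes `~~` as the separator, which is not confirmed here. It accepts indicator values as either integers or the strings "0"/"1" read from a tab-delimited file.

```python
def q_matrix_to_kc_column(kc_names, q_rows, sep="~~"):
    """Collapse 0/1 KC indicator columns into one KC label per row.
    The separator is an assumption; missing indicators are treated as 0."""
    return [
        sep.join(kc for kc in kc_names if int(row.get(kc, 0)) == 1)
        for row in q_rows
    ]
```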

----
See completed [[DataShop 3.x Features]] 
See on-going [[DataShop 4.x Features]] 
See unordered [[Collected User Requests]]
[[Category:Protected]]
[[Category:DataShop]]
[[Category:DataShop Completed Feature]]

Web Services - Get Transactions and Student-Step Records

2011-01-05T20:16:05Z

Bleber:

'''status: done, released with DataShop v4.0 December 2009'''

== User Stories ==

# As a researcher (EDM, M&M, CMDM), I want to retrieve transaction data programmatically and append custom fields to transactions in DataShop so that I can more easily run my models on the data outside of DataShop and put the results back in.
# As a researcher with a machine learning background, I want to retrieve student-step data programmatically so that I can more easily run my analyses on the data without much human intervention.
# As a researcher who has created a custom field for an existing data set, I want to import that new field into the step rollup table so that it is both preserved and available in DataShop tools. - Ryan Baker, 12/15/2008

== Notes/Comments ==

* Adding custom fields to existing datasets
* Needed for M&M and CMDM clusters [Ryan Baker, October 2008]
* Will enable researchers to use output from existing models
** Gaming Detector [Arnon Hershkovitz, Mihaela Cocea, Summer 2008]
** Bayesian Knowledge Tracing [Hao Cen, migrated from feature wish list]
* Key to collaboration.
* Ability to automatically add columns to DataShop data sets at transaction level [Ryan, Startup Memo, summer 2008]
** Transaction level: need a user incentive to upload values, need associated notes, need graphing of

* Get Step Rollup
** How should the KCs be shown: multiple columns or one column? Labels or 1s and 0s? -- [[User:Koedinger|Ken Koedinger]], DataShop Team Meeting, Aug 28, 2009
*** Multi-mapped: one column per KC, each containing 1 or 0 (this is the Q-matrix structure).
*** Alternatively, use a single column that lists all of the KCs.

----
See completed [[DataShop 3.x Features]] 
See on-going [[DataShop 4.x Features]] 
See unordered [[Collected User Requests]]
[[Category:Protected]]
[[Category:DataShop]]
[[Category:DataShop Completed Feature]]

Web Services - Get Transactions and Student-Step Records

2011-01-05T20:13:08Z

Bleber:

'''status: done, released with DataShop v4.0 December 2009'''

[[Category:Protected]]
[[Category:DataShop]]
[[Category:DataShop Completed Feature]]

Web Services - Add Custom Fields

2011-01-05T20:12:10Z

Bleber:

Web Services - Get Transactions and Student-Step Records

2011-01-05T20:11:39Z

Bleber:

'''status: done, released with DataShop v4.0 December 2009'''