Flat File Importer

Revision as of 14:10, 16 May 2011

Status: Requirements done. Estimate: 18 weeks

User Story

As a DataShop administrator, I want to redesign the Dataset Import Tool to load a dataset into the database by processing it column by column, so that I can speed up the import process.

Summary from Release Notes

This release of DataShop concludes work improving the tool used to import tab-delimited text files into DataShop. With these improvements, loading large tab-delimited text files of transaction data is now possible. It's fast, too.

As part of this release, we have used the new import tool to load six datasets that we had previously been unable to load. These datasets range in size from 122,000 to 870,000 transactions.

Notes/Comments

  • The current Dataset Import Tool processes a dataset row by row through the Hibernate layer, which makes imports slow.
  • Imports have sometimes failed for several large datasets, and the tool has other bugs as well.
  • We'd like to rewrite the import tool to process data column by column and bypass the Hibernate layer, to make imports faster.
  • The goal is to process 1 million rows per minute for the import alone. Shanwen 10:09, 12 October 2010 (EDT)
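
To illustrate the column-by-column idea in the notes above, here is a minimal Python sketch. It is not the DataShop implementation: the function names (read_columns, assign_ids, encode_column), the sample column headers, and the integer ID scheme are all assumptions made for illustration. It reads tab-delimited text, transposes it into columns, and resolves each column's distinct values to IDs in a single pass, instead of doing a lookup per row.

```python
import csv
import io


def read_columns(tsv_text):
    """Parse tab-delimited text and return {header: [values, ...]} (column-major)."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    rows = list(reader)
    header, data = rows[0], rows[1:]
    # zip(*data) transposes the row-major data into one tuple per column.
    columns = [list(col) for col in zip(*data)] if data else [[] for _ in header]
    return dict(zip(header, columns))


def assign_ids(values):
    """Map each distinct value in one column to an integer ID, in first-seen order."""
    ids = {}
    for value in values:
        if value not in ids:
            ids[value] = len(ids) + 1
    return ids


def encode_column(values):
    """Resolve a whole column at once: build one lookup table, then map the column."""
    ids = assign_ids(values)
    return ids, [ids[v] for v in values]


# Hypothetical sample data; the headers are illustrative only.
sample = (
    "Anon Student Id\tProblem Name\tOutcome\n"
    "s1\tp1\tCORRECT\n"
    "s2\tp1\tINCORRECT\n"
    "s1\tp2\tCORRECT\n"
)
columns = read_columns(sample)
student_ids, student_column = encode_column(columns["Anon Student Id"])
print(student_ids)     # {'s1': 1, 's2': 2}
print(student_column)  # [1, 2, 1]
```

In a real importer, each lookup table would presumably be written to the database with one bulk statement per column rather than one object save per row, which is where bypassing an ORM layer such as Hibernate would be expected to save time. That step is omitted here because it depends on the schema.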



See completed DataShop 3.x Features
See on-going DataShop 4.x Features
See prioritized DataShop Feature Wish List
See unordered Collected User Requests