Flat File Importer

From Theory Wiki

'''Status: Done (DataShop v5.0 May 2011)'''


== User Story ==


As a DataShop administrator, I want to redesign the Dataset Import Tool to load a dataset into the database by processing it column by column, so that I can speed up the import process.
== Summary from Release Notes ==
This release of DataShop concludes work on improving the tool used to import tab-delimited text files into DataShop. With these improvements, loading large tab-delimited text files of transaction data is now possible. It's fast, too.
As part of this release, we have used the new import tool to load 6 datasets that we had been unable to load. These datasets range in size from 122,000 to 870,000 transactions.


== Notes/Comments ==


* The current Dataset Import Tool processes the dataset row by row and uses the Hibernate layer, which makes importing a dataset take a long time.
* The import has sometimes failed for several large datasets, and it has some bugs as well.
* We'd like to rewrite this import tool to process the data column by column and avoid the Hibernate layer, to make the import faster.
* The goal is to process 1 million rows per minute on import alone. [[User:Shanwen|Shanwen]] 10:09, 12 October 2010 (EDT)
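The column-by-column idea above can be sketched as follows: instead of persisting each transaction row through Hibernate one at a time, the importer makes a single pass over the tab-delimited file, deduplicates the values of each column, and can then bulk-insert each column's distinct values (students, problems, and so on) before linking transactions to them. This is a hypothetical illustration of the approach, not the actual DataShop implementation; the class and column names are assumptions.

```java
import java.util.*;

public class ColumnImportSketch {

    // Collect the distinct values of each column in one pass over
    // tab-delimited rows. Each resulting set can then be loaded with a
    // single multi-row INSERT, instead of one Hibernate save() per row.
    public static Map<String, Set<String>> distinctByColumn(
            String[] header, List<String> tsvRows) {
        Map<String, Set<String>> columns = new LinkedHashMap<>();
        for (String name : header) {
            columns.put(name, new LinkedHashSet<>());
        }
        for (String row : tsvRows) {
            // -1 keeps trailing empty fields instead of dropping them.
            String[] fields = row.split("\t", -1);
            for (int i = 0; i < header.length && i < fields.length; i++) {
                columns.get(header[i]).add(fields[i]);
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        // Hypothetical column names for illustration only.
        String[] header = {"Student", "Problem", "Outcome"};
        List<String> rows = Arrays.asList(
                "s1\tp1\tCORRECT",
                "s2\tp1\tINCORRECT",
                "s1\tp2\tCORRECT");
        Map<String, Set<String>> cols = distinctByColumn(header, rows);
        System.out.println(cols.get("Student")); // [s1, s2]
        System.out.println(cols.get("Problem")); // [p1, p2]
    }
}
```

The win comes from the insert pattern this enables: a handful of bulk statements sized by the number of distinct values per column, rather than millions of per-row ORM operations.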
<br>
----
See [[DataShop Completed Features|completed features]]<br>
See [[DataShop On-going Features|on-going features]]<br>
See unordered [[Collected User Requests]]<br>
See the [[:Category:DataShop Glossary|DataShop Glossary]]
[[Category:Protected]]
[[Category:DataShop]]
[[Category:DataShop Feature]]

Latest revision as of 12:14, 28 September 2011
