Thursday, October 15, 2009

Why it is better to load data into flat files first?


If you have dealed with DWH project you will know that in most of the project instead of directly loading data into tables,first data is loaded into flat files.The advantages of loading data into flat files are lot.In some cases even it is loaded into intermediate tables called staging tables.

1. Generally the ETL tool is installed on UNIX server.It is 5 to 6 times faster for ETL tool to perform processing on flat files rather than on tables.Let take an example suppose you have date format that you want to convert in other date format.If the ETL tool perform operation on the table it will be slow but the same operation on flat file will be faster.

2.Once the load ready files is created the data can be loaded directly into tables which is much speedier process.

Note this approach is useful for DWH tables with huge amount of data.This approach is not useful tables with small tables in which you can load data directtly.

Snowflake Schema in data warehouse