So the company you worked has decided to have a data warehouse.The big question is to choose the right database.Although there are numerous databases in market the choice depend on various factors.
1. Price : If you are planning to have huge data warehouse the best choice is to have database like teradata which can handle huge amount of data with ease.The concern here will be price.Teradata comes with its own hardware so total cost builds up.
2. Ease of use: Db2 and Oracle are the most user friendly databases available.The amount of computer professionals and tech support avialable for these databases is huge.
3. ETL features : If you choose certain tools like informatica for ETL it is good choice to have oracle as database.
So far in my career I had seen these databases used mostly
1. DB2
2. Oracle
3. Teradata
4. Sybase
Please post your comment as it encourages me to write more.
Offbeat topic : What is google page rank
Tuesday, November 3, 2009
Thursday, October 15, 2009
Why it is better to load data into flat files first?

If you have dealed with DWH project you will know that in most of the project instead of directly loading data into tables,first data is loaded into flat files.The advantages of loading data into flat files are lot.In some cases even it is loaded into intermediate tables called staging tables.
1. Generally the ETL tool is installed on UNIX server.It is 5 to 6 times faster for ETL tool to perform processing on flat files rather than on tables.Let take an example suppose you have date format that you want to convert in other date format.If the ETL tool perform operation on the table it will be slow but the same operation on flat file will be faster.
2.Once the load ready files is created the data can be loaded directly into tables which is much speedier process.
Note this approach is useful for DWH tables with huge amount of data.This approach is not useful tables with small tables in which you can load data directtly.
Snowflake Schema in data warehouse
Tuesday, July 7, 2009
Data Warehousing What is it?

Data Warehouse is big database which contain huge amount of data for analysis .
The data in data warehouse can come from heterogeneous sources as well as from homogeneous sources.
Heterogeneous sources mean data coming from differnet databases like DB2,Oracle etc.
Homogeneous sources mean data coming from same database.
So there are plenty of challenges to be overcome while building a data warehouse.
The data in data warehouse can come from heterogeneous sources as well as from homogeneous sources.
Heterogeneous sources mean data coming from differnet databases like DB2,Oracle etc.
Homogeneous sources mean data coming from same database.
So there are plenty of challenges to be overcome while building a data warehouse.
In the picture above you will data coming from various sources like OLTP etc.
The purpose of ETL layer after that(Informatica,Abinitio) is the data may not be in right format to be loaded into data warehouse.
Example will be data coming from source is in 'YYYY-MM-DD' format and we want another format like 'MM-DD-YYYY'.
After ETL we load data into Data Warehouse.
Plenty of Data bases are available which are present in current market to fulfil the requirement of data warehouse like Oracle 10g,Teradata and DB@.
The data base you choose depend on cost,skills available and few other factor.
Aprt from Data Warehouse we have data mart.
Data mart are subset of Data Warwehouse.In an organization you may have different lot of data mart.Data Mart specifically deal with one domain example finance.
There are two approaches of building data warehouse.
1. Top Down: Build Data Warehouse first and than Data mart
2. Bottom up:Build Data mart first and than Data warehouse.
After the data warehouse is build it can be used for various purpose like data reporting and data mining.
Data reporting can be done by various tools available in the market like cognos,Business Obects etc
Please post you comment.
Thanks for reading
Mapplets and Transformation
Here I continue to write about Informatica.
Mapplet and Transformation are two important things that you will use a lot while developing your mappings.
What Exactly is Mapplet
A mapplet is reusable component of mapping.Suppose you feel that you are going to change date to '10/10/2009' whatver date may come for all the tables.
For loading data into different tables you will have differnt mapping.
You can create a mapplet for such purpose.
You can drag this mapplet than to mapping.
Mapplet will contain various transformation like expression,filter,lookup
A mapplet will have input as well ouput.
When you drag a mapplet to mapping you will connect one of its port to input and other to output.
Expression
To perform various operation in informatica you need to have transformation.Transformation can be of several type like expression,filter(Each transformation will be covered in detail).
Transformation can be of two
1. Active : They change the number of rows passing through them like filter
2. Passive : They dont change the number of rows passing throught them like expression.
I know some of you may not be getting what I write.
Please share your view and let me know on which topic I can expand more.
Please read my torrent blog to get good ebooks http://torrent-tut.blogspot.com/.
Keep reading and Keep posting
Thanks
Mapplet and Transformation are two important things that you will use a lot while developing your mappings.
What Exactly is Mapplet
A mapplet is reusable component of mapping.Suppose you feel that you are going to change date to '10/10/2009' whatver date may come for all the tables.
For loading data into different tables you will have differnt mapping.
You can create a mapplet for such purpose.
You can drag this mapplet than to mapping.
Mapplet will contain various transformation like expression,filter,lookup
A mapplet will have input as well ouput.
When you drag a mapplet to mapping you will connect one of its port to input and other to output.
Expression
To perform various operation in informatica you need to have transformation.Transformation can be of several type like expression,filter(Each transformation will be covered in detail).
Transformation can be of two
1. Active : They change the number of rows passing through them like filter
2. Passive : They dont change the number of rows passing throught them like expression.
I know some of you may not be getting what I write.
Please share your view and let me know on which topic I can expand more.
Please read my torrent blog to get good ebooks http://torrent-tut.blogspot.com/.
Keep reading and Keep posting
Thanks
Monday, July 6, 2009
Import and Export XML objects in Informatica
Most of us at one time will like to export an XML object in informatica to take it backup or need to import somone else XML object as workflow or mapping.
That is quite simple to do.
I will take example of mapping and you can use it for other things like session workfow.
Suppose you want to export abc mapping to desktop.
You just need to open that mapping in desinger and than go to file tab and click on export and choose the path where you want to save it.
Same goes for import if you want to import a mapping you just need to click file tab and than import .You will be asked to file the source of XML object .Choose it and than Save.
It is done.
Right now I dont have snapshot but will add it latter
I hope you are interested in finding some great stuff on internet . Torrent is answer fot it
Read Further http://torrent-tut.blogspot.com/
Thanks for Reading.
Please post Comment.
That is quite simple to do.
I will take example of mapping and you can use it for other things like session workfow.
Suppose you want to export abc mapping to desktop.
You just need to open that mapping in desinger and than go to file tab and click on export and choose the path where you want to save it.
Same goes for import if you want to import a mapping you just need to click file tab and than import .You will be asked to file the source of XML object .Choose it and than Save.
It is done.
Right now I dont have snapshot but will add it latter
I hope you are interested in finding some great stuff on internet . Torrent is answer fot it
Read Further http://torrent-tut.blogspot.com/
Thanks for Reading.
Please post Comment.
Careers in Data Warehousing
Well if you reading this you must be interested in a career in data warehousing.
Data Warehousing is vast field.
So even if you are involved with data warehousing it not necessary that you work in main field that is creating the data warehouse.
You can take several roles in the project.
I will try to explain in detail two real time projects in latter blog
Well there are plenty of career which you can take depending on your skills set,experience or interest.
1. ETL developer : This is main person who works on creating mapping(in informatica).That is he develop the code which move the data from source to target.
2.DBA : Generally for Data Warehousing DBA will be allocated.His main job will be to maintain tables in data warehouse and deals with other DBA issues like indexes,keys etc.
3.UNIX administrator : Well generally information server is installed on UNIX server.So we need an admin to take care of various issues like space allocation in unix etc.
4.Data Modeller : Well this job is coolest.You are required to be the first to analyze the source data and build the data warehouse accordingly.
5.Testing: Well yeah we have testing too.Testers will analyze the data that you loaded into Data Warehouse and check the consistency of data with source data.
6.Repors Developer : Ok.The data warehouse has been build but who is going to use it.Here come the person who will prepare reports on data loaded into datawarehouse like "sales of shop in north region" in graph format so that Business Users Can use it
Depending on the project you may need few other people too.
Please leave your comment and view
Thanks for Reading
Data Warehousing is vast field.
So even if you are involved with data warehousing it not necessary that you work in main field that is creating the data warehouse.
You can take several roles in the project.
I will try to explain in detail two real time projects in latter blog
Well there are plenty of career which you can take depending on your skills set,experience or interest.
1. ETL developer : This is main person who works on creating mapping(in informatica).That is he develop the code which move the data from source to target.
2.DBA : Generally for Data Warehousing DBA will be allocated.His main job will be to maintain tables in data warehouse and deals with other DBA issues like indexes,keys etc.
3.UNIX administrator : Well generally information server is installed on UNIX server.So we need an admin to take care of various issues like space allocation in unix etc.
4.Data Modeller : Well this job is coolest.You are required to be the first to analyze the source data and build the data warehouse accordingly.
5.Testing: Well yeah we have testing too.Testers will analyze the data that you loaded into Data Warehouse and check the consistency of data with source data.
6.Repors Developer : Ok.The data warehouse has been build but who is going to use it.Here come the person who will prepare reports on data loaded into datawarehouse like "sales of shop in north region" in graph format so that Business Users Can use it
Depending on the project you may need few other people too.
Please leave your comment and view
Thanks for Reading
Sunday, July 5, 2009
Informatica Desinger

This is how Mapping Designer look like.As you see on there are two panels .
The one on the left is used to give details about folders,mappings,source,target,transformation.
The one on the right contain actual mapping.
You drag items from left panel like source and target and connect them using various transformation.
There are several tabs in the desinger as you can see.
They are as follows
1. Repository
2.Edit
3.View
4. Tools
5.Layout
6.Versioning:Yes that is true informatica has versioning tool too.if you dont know what exactly is versioning or want to know more about it.Just send me a Comment.
Versioning is quite important and is used in almost all project.
RMS is one of the best tool available in project.
7.Mapping
8.Transformation
9.Windows
10.Help
I will cover each of these tab in details in coming blogs.Please give comment or leave your view on the blog. It will help me continue the blog
Thanks
Informatica Beginner Tutorial
Informatica is ETL tool used to load data into data warehouse.It is one of the widely used tools in the market apart from abinitio and IBM datastage.
Informatica has got capability to connect to various databases.
There are four tools in informatica
1. Mapping Desinger : This is used to create Mapping(I will explain latter what exactly you mean by mappings.)
2. Workflow Manager: Once you have created the Mapping you will need to run it through session
and workflow.These tasks are performed in manager
3. Workflow Monitor:When you run the workflow or session.You will see the status i.e whether it is successful or not in workflow Monitor
4. Repository Manager : This is used for various admin tasks like expoting importing mapping worklfow etc
Informatica has got capability to connect to various databases.
There are four tools in informatica
1. Mapping Desinger : This is used to create Mapping(I will explain latter what exactly you mean by mappings.)
2. Workflow Manager: Once you have created the Mapping you will need to run it through session
and workflow.These tasks are performed in manager
3. Workflow Monitor:When you run the workflow or session.You will see the status i.e whether it is successful or not in workflow Monitor
4. Repository Manager : This is used for various admin tasks like expoting importing mapping worklfow etc
Subscribe to:
Posts (Atom)