Monday, May 3, 2010

Slowly changing dimension

As part of the data mart design process, you should identify which of your dimensions will change over time and which will remain static. Once you have identified the changing dimensions, the next design decision centers on the tracking of history in those dimensions. You have two options: store the history of changes in a dimension, or simply keep the most recent information and discard the history. The right choice depends on the nature of the change and on how the data mart should reflect it. Sometimes storing history is appropriate; in other cases it's unnecessary. Whichever decision you make, it shouldn't be influenced by your ability to create the necessary mappings.

There are five basic implementations for dimension tables in a data mart, and each has a corresponding mapping model in PowerCenter. This table contains a summary of the characteristics of these basic types.
| Dimension Type | Change Processing | History Tracking | Notes |
|---|---|---|---|
| A. Static | None | None | |
| B. Slowly Growing | New records added | None | |
| C. Slowly Changing, Update with Changes | Content updated | None | Commonly known as a "Type 1" slowly changing dimension |
| D. Slowly Changing, Versioning of Changes | Content updated, new records added | Full | Commonly known as a "Type 2" slowly changing dimension |
| E. Slowly Changing, Keep Current and Previous Values | Content updated | Partial | Commonly known as a "Type 3" slowly changing dimension |


Usually, a dimension like time will be static (although it may need to be refreshed occasionally to extend it with new entries). Examples of some other common static dimensions are transaction types, shipping method, and status dimensions of various types. These dimensions fall under type A in the table above.

Examples of dimensions that change over time are the customer dimension, the product dimension, and many dimensions that include organizational hierarchy information. This last group is important, since companies regularly reorganize themselves. In the event of a reorganization, the dimension must be updated to reflect the new organization, and sometimes the old one must be preserved as well.

Changes in dimensions fall into two subcategories. The first type of change is growth - simply adding new dimension records over time, like new products or customers. This is type B in the table. The second type of change is a change in the content of the dimension records, and this is the case where a decision needs to be made about tracking history. Types C, D, and E in the table all deal with changes in the content of the dimension, and provide different options for tracking history.

To demonstrate the mapping logic used in these situations, this article will cover examples of types B, C, D, and E. In this case, they will all be variations of a customer dimension, since it's probably the most easily understood example, allowing us to focus on the implementation. In addition, a customer dimension is often required in a data mart project, so it provides a real world reference.

Basic Building Blocks for Dimensions in PowerCenter
The next issue we have to deal with is how to implement these mappings in PowerCenter. Let's take a look at the same dimension types again, but this time in terms of the per row operations required at transformation time for each case.

| Dimension Type | Row Level Operation | Change Detection | Notes |
|---|---|---|---|
| A. Static | Insert | None | |
| B. Slowly Growing | Insert if new | None | |
| C. Slowly Changing, Update with Changes | Insert if new, update if changed | Optional | Can update the entire record instead of detecting changes |
| D. Slowly Changing, Versioning of Changes | Insert if new, update and insert if changed | Required | Requires up to two operations at the target per source row |
| E. Slowly Changing, Keep Current and Previous Values | Insert if new, update if changed | Required | |

All these dimensions share some characteristics that provide clues about the general structure of the mappings. First of all, on a row-by-row basis, the mapping needs to decide the appropriate operation at the target: either insert or update. This means the mapping needs to take advantage of the per-row operation control available in PowerCenter, using an Update Strategy transformation. In addition, when updates of an existing row are involved, the primary key of the row already in the target must be determined. Normally, you use a Lookup transformation to find out whether a record already exists in the target.
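To make the per-row decision concrete, here is a minimal Python sketch of the lookup-then-decide step. A plain dict stands in for PowerCenter's Lookup transformation on the target, and the returned string stands in for the Update Strategy's DD_INSERT/DD_UPDATE flags; the column name `customer_id` is illustrative, not a PowerCenter API.

```python
def decide_operation(row, existing_keys):
    """Return 'update' for rows whose natural key is already in the
    target (found via the lookup), and 'insert' for new rows."""
    if row["customer_id"] in existing_keys:
        return "update"
    return "insert"

# Stand-in for the Lookup cache: natural key -> surrogate key in the target.
existing_keys = {101: 1, 102: 2}
source_rows = [{"customer_id": 101}, {"customer_id": 103}]

ops = [decide_operation(r, existing_keys) for r in source_rows]
# ops == ["update", "insert"]
```

In the real mapping, this decision is made inside an Expression or Update Strategy transformation rather than in external code.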

One of the trickier issues is identifying whether there has been a change since the last refresh. If the source provides a change indicator, this is simple. It's also rare: in most situations, you must compare the new source data with the existing dimension data to determine if and where a change has occurred. We can do this with a Lookup transformation that retrieves the 'old' data from the target, and an Expression transformation to compare the current and old values. As a general rule, not all changes in the source are relevant, only some, so 'change' is application-specific. For example, a customer note field in the source might be irrelevant, but an address change is important.
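The comparison the Expression transformation performs can be sketched as follows; the column names are hypothetical, and the point is simply that only an application-chosen subset of columns counts as a "change".

```python
# Only these columns count as a relevant change; a note field, say, is ignored.
RELEVANT_COLUMNS = ["name", "address", "city"]

def has_changed(source_row, existing_row):
    """Compare only the columns that matter for this dimension."""
    return any(source_row[c] != existing_row[c] for c in RELEVANT_COLUMNS)

old = {"name": "Acme", "address": "1 Main St", "city": "Springfield", "note": "x"}
new = dict(old, note="updated note")   # change in an irrelevant column
moved = dict(old, address="2 Elm St")  # change in a relevant column

# has_changed(new, old) is False; has_changed(moved, old) is True
```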

Finally, for type D in the table, we may need to perform two operations on the target for a single source row. This means bringing two instances of the same target into the mapping, and splitting the data flow so it goes to both instances. Filter transformations will provide control over which paths the data takes.

This gives us our basic building blocks for dimensional mappings:

- Update Strategy transformation to decide insert or update
- Lookup transformation to determine whether records already exist in the target
- Lookup transformation to get the previous values from the target
- Filter transformations to control the routing of the data to the target
- Expression transformations to implement decision logic like comparisons
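Putting the building blocks together, here is a hedged, minimal sketch of a Type 2 (versioning) load: look up the natural key, compare the relevant content, and then either insert a new row, or perform the two operations per source row (expire the old version, insert the new one). All table and column names here are illustrative; a real implementation runs inside PowerCenter transformations, not Python.

```python
def scd2_load(source_rows, dim, next_key):
    """Apply Type 2 slowly-changing-dimension logic to an in-memory
    list of dimension rows (a stand-in for the target table)."""
    # Lookup: current version of each customer, keyed by natural key.
    by_natural = {r["customer_id"]: r for r in dim if r["current"]}
    for row in source_rows:
        old = by_natural.get(row["customer_id"])
        if old is None:
            # New customer: single insert.
            dim.append({**row, "sk": next_key, "current": True})
            next_key += 1
        elif row["address"] != old["address"]:
            # Changed: two operations per source row.
            old["current"] = False                               # update old version
            dim.append({**row, "sk": next_key, "current": True}) # insert new version
            next_key += 1
    return dim

dim = [{"customer_id": 1, "address": "1 Main St", "sk": 100, "current": True}]
dim = scd2_load([{"customer_id": 1, "address": "2 Elm St"},
                 {"customer_id": 2, "address": "9 Oak Ave"}], dim, 101)
# dim now holds three rows: one expired version and two current ones
```

The split into "update old version" and "insert new version" is exactly why the Type D mapping needs two instances of the target and Filter transformations to route rows down each path.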

Friday, January 22, 2010

Starting with Informatica 8.6

PowerCenter provides an environment that allows you to load data into a centralized location, such as a data warehouse or operational data store (ODS). You can extract data from multiple sources, transform the data according to business logic you build in the client application, and load the transformed data into file and relational targets.

Some important components in Informatica are:

Repository Manager. You use the Repository Manager to create a folder to store the metadata you create in the lessons.

Designer. Use the Designer to create mappings that contain transformation instructions for the Integration Service. Before you can create mappings, you must add source and target definitions to the repository. In this tutorial, you use the following tools in the Designer:

- Source Analyzer. Import or create source definitions.

- Target Designer. Import or create target definitions. You also create tables in the target database based on the target definitions.

- Mapping Designer. Create mappings that the Integration Service uses to extract, transform, and load data.

Workflow Manager. Use the Workflow Manager to create and run workflows and tasks. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data.

Workflow Monitor. Use the Workflow Monitor to monitor scheduled and running workflows for each Integration Service.

Monday, January 18, 2010

Starting with tools in DWH

A data warehouse is a repository of data extracted from transaction systems, then filtered and transformed to make it available for data analysis and reporting.


A data warehouse can provide most or all of the data and information requirements of an enterprise, which means it pulls data from production and other sources. Once the data is pulled into an offline staging area, it is cleansed, transformed, and loaded in a sanitized, uniform, and well-organized manner, so that you can run queries, reports, and all kinds of analysis on the data.


There are a number of important tools connected to data warehouses, and one of these is data aggregation. Data warehousing requires a hardware platform, a database, a data loading method (ETL tools), and a data extraction method (reporting and business intelligence tools). Informatica is a well-known ETL tool, and Business Objects is a reporting tool.

GEN AI

  Stay Tuned....