Wednesday, December 9, 2009

Importance of Metadata in DWH

Metadata is data about data.The word meta comes from Greek and means among,beside or after.It is like a card index describing how information is structured within data warehouse.Meta data is not only technical data, which describes dimensions and facts, but also data, which describes the warehouse process (ETL), and, most important, “political” data like target groups, stakeholders, team members, and other important people.
Metadata is used for a variety of purposes.
As a part of extracting and loading,it is used to map data sources to the common view of information within DWH.(data transformation and load)
As a part of warehouse management process,it is used to automate the production of summary tables.(data management)
As a part of query management process,it is used to direct a query to the appropriate data source.(query generation)
Metadata is required to describe the data as it resides in DWH. Every object in database needs to be described.Metadata is needed for :
  • tables 
  • indexes
  • views
  • constraints
For all this tables,there will be primary key,foreign key and normal attributes.The common attributes are columns,name and type.

Aggregations are a special case of tables and they require the following metadata to be stored.
  • aggregation(table name)
  • aggregation_name
  • columns
Mathematical Functions like min,max,average and sum are examples of aggregation.
       
The important Metadata goals are:
1)business term definition
2)business rule definition
3)enforcement of information consistency
Eg:
Descriptions of the properties or characteristics of the data, including data types, field sizes, allowable values, and documentation
•Data about the data
•Data about the database structure
•Security and access rules
Eg:
Data Dictionary Storage

Data dictionary(also called system catalog) stores metadata: that is, data about data, such as

  • Information about relations


  1. names of relations
  2. names and types of attributes of each relation
  3. names and definitions of views
  4. integrity constraints

  • User and accounting information, including passwords
  • Statistical and descriptive data
    1. number of tuples in each relation
    • Physical file organization information
    1. How relation is stored (sequential/hash/heap)
    2. Physical location of relation
    • Information about indices

    1 comment:

    GEN AI

      Stay Tuned....