Enterprise Software Solutions  [Company Logo Image]What is a data warehouse

Home Up Feedback Contents Search  Jump to: DW University, Employment

[Under Construction]

News
Products
Programming
DW University
Services
Employment

 

 

What is a data warehouse? According to Stanford University,

"A Data Warehouse is a repository of integrated information, available for queries and analysis. Data and information are extracted from heterogeneous sources as they are generated....This makes it much easier and more efficient to run queries over data that originally came from different sources."

One characteristic of a data warehouse is that its data is organized in a multidimensional schema, not a normalized schema.  A Dimension is typically a qualifiable text value, such as a region, product, and includes date values like a time dimension. It defines the secondary headings or labels that make up the body of the report.

View multidimensional model

 

OLTP vs OLAP

On Line Transaction Processing (OLTP) systems and On Line Analytical Processing (OLAP) systems are different in many ways, and similar in some.  For instance, both OLTP and OLAP use a database to store data in an organized, queryable fashion.  Both are “on line” which means that users interact with the system’s database real-time.  They are, however, fundamentally different in scope, design, and purpose.

 The scope of an Enterprise Data Warehouse (EDW) encompasses data from many sources in a single database and draws relationships between the data of these many systems.  With this, analysis can be done that would otherwise not be possible.  An OLTP system, on the other hand, is a source in and of itself to capture and store data in its own database.  Although an OLTP system is often integrated with other systems, this is done for the purpose of data integrity among external systems and to eliminate duplicate effort of data capture / entry.  In a data warehouse, there is often little or no data entry beyond simply making relationship and groupings that might not exists in legacy, or source, systems.

 The design of an OLAP database is different from OLTP database design in many ways.  OLTP is normalized and OLAP is multidimensional.  In a multidimensional data model, data is often duplicated in logical hierarchies called dimensions.  Dimensions might include product, customer, and time.  In a dimension like product, the hierarchy might include levels such as product group, product item, and product skew.  A product group might be t-shirts, a product item might be men’s large, white v-neck t-shirts, and product skew might be a package of three as apposed to a package of five.  Note that each level in the hierarchy has a one-to-many relationship to the child level in the hierarchy.  Figure 1.1 demonstrates an example of a logical data model of a multidimensional model.  Note that in this model there is a dimension called “Cart” that duplicates data from the Product dimension.  This is done for the purpose of performing market basket analysis, a data-mining tool.  In an OLTP system, this data would not be duplicated because there would be a table for transaction header and a table for transaction details, or scanned items with quantity. 

A few basic modeling concepts of OLAP models are that they dimensions only join through fact tables.  Dimensional (hierarchy) attributes can have multiple parents, but never multiple children.  The bottom of each dimension should have exactly one integer filed that joins to the fact able.  A fact table is the only place where numerical “measures” are stored.  When designing a data warehouse, the data should be already calculated in the Electronic transformation Load (ETL) process.  Ideally, only simple column summation is done at time of query; no calculations need to be worked out.  Another important fact is that data warehouses include complex business rules in the ETL process.  This might include rules about how calculations are made, what elements are filtered out, etc.  When a user is running queries against the OLAP database, it should be primarily simple data retrieval, and not include complex calculations, filtering, and other rule applications. 

The purpose of an OLAP system is to perform high-level analysis without OLAP users having to worry about complex business rules.  These systems are often used by back-office people such as marketing analysts to identify marketing opportunities.  By using analytics, business analysts might be able to identify trends in customer demographics over time that might provide insight into why sales might be higher or lower given a certain set of criteria.  This information is highly powerful and valuable.  By contrast, OLTP systems are often used by people such as cash register clerks, order entry clerks, shipping and receiving clerks, etc.

Another important consideration in a data warehouse is the timeliness of the data.  In a data warehouse, a great deal of history is often kept actively on line for querying.  This can include not only the measure (fact) data, such as sales, but also slowly changing dimensional data such as sales force alignment and management hierarchies.  These slowly changing dimensions can be reported on an “as of” basis.  For example, if store 123 used to be part of the Midwest region, but is now part of the central region, the summarized sales data by region can be reported “as of” today or “as of” last year with respect to the regional alignment.  This is a powerful toll when doing “what if” analysis and tracking trends over time.

 OLTP and OLTP databases have their distinct purposes for capturing and reporting data.  Although OLTP systems almost always have reporting capabilities, data warehouses provide much more capabilities that are not available in single OLTP system reporting.  The field of data warehousing is growing in popularity, as the power of such systems becomes more widely known.

Figure 1.1

 

 

 

Home ] Up ]

Send mail to Webmaster@ESScorporation.com with questions or comments about this web site.
Copyright © 2004 Enterprise Software Solutions, Inc.
Last modified: July 29, 2004