|Kimball and Inmon |
Numerous data warehouse practitioners have questioned the different approaches to data warehousing as taught by Bill Inmon and Ralph Kimball. As the industry has matured, the two philosophies have become much more aligned with each other. However, the language used to describe their methodology leads to questions of compatibility.
Actually, their approaches are quite compatible. It's just a question of how Kimball and Inmon name their structures. Both experts agree that the success of the warehouse/marts depends on effectively gathering the business requirements first. These requirements drive the design of the mart which, in turn, drives the data required in the warehouse. The big bang theory of building the warehouse based on the operational systems has proven excessively time- and resource-consuming with limited return to the business.
Both experts agree that business-user validation of the data mart design ensures that expectations are managed. Once this has occurred, the data analyst researches the most appropriate source system(s) and obtains the model (or designs their own if one does not exist). This model is the starting point for the design of the staging area (or warehouse). Kimball calls it the backroom, and Inmon calls it the data warehouse. This is where the referential integrity rules are applied (via the DBMS or software validation) and transformation of disparate values is performed. Other business rules are applied to the data to come up with a consistent format and derived values. Depending on slowly changing dimension requirements, the entity will be defined differently. Type I: Just overwrite; Type II: We need versioning and dates; and Type III: Add old and new copies of certain important fields. Surrogate keys are also generated here. This is necessary for conforming dimensions. Both Kimball and Inmon recommend extracting from a single source once. Some warehouse consultants may have interpreted Kimball's approach to extract multiple times from the same source but I don't think that was his intention. This is all done in the staging area.
Once the data has been loaded into the staging area, the next process is to load the data marts. All parties agree that the most suitable business structure is the star schema. For data mining the data needs to remain in third normal form so hidden relationships can be discovered. Star schema hides those due to extensive denormalization. The star schema closely resembles an Excel spreadsheet along with the use of pivot table. This is the best presentation layer structure.
Regarding the ODS, Inmon's definition is an integrated, volatile, up-to-the-minute picture of the business. This is a useful structure for on one-to-one marketing and customer relations, in addition to other areas where only the most recent transactions are important to the operational business process. Kimball describes the ODS as the structure to store detailed transaction data including history which is a reaction to Ěno detailed data in the warehouse. My impression is that if the warehouse were initially designed to hold only lightly summarized data and no detail transactions this was due to limited computer resources and the fact that most business users do not need that detailed information for their analysis. As the power, speed and storage capacity of computers has increased, the transaction detail history belongs in the warehouse where the users can access that level of detail if need be.