Online Analytical Processing (OLAP) Hweichao Lu CS157B-02 Spring 2007 What is OLAP п‚§ Basic idea: converting data into information that decision makers need п‚§ Concept to analyze data by multiple dimension in a structure called data cube History п‚§ In 1993, E. F. Codd came up with the term online analytical processing (OLAP) and proposed 12 criteria to define an OLAP database п‚§ the term OLAP seems perfect to describe databases designed to facilitate decision making (analysis) in an organization Purpose of OLAP п‚§ To derive summarized information from large volume database п‚§ To generate automated reports for human view Why need OLAP over Relational Database I п‚§ Consistently fast response п‚§ OLAP obtains a consistently fast response is by prestoring calculated values Why need OLAP over Relational Database II п‚§ Metadata-based queries п‚§ provide analysis functions that are difficult or impossible to express in SQL п‚§ SQL was developed primarily for transaction systems, not for reporting applications Why need OLAP over Relational Database III п‚§ Spreadsheet-style formulas п‚§ design the data structure with users in mind. п‚§ Spreadsheets are key components of business management because they are intuitive to create Step I 1. identify multidimensional data п‚§ measure attribute (measure some value, can be aggregated upon) п‚§ dimension attribute (define the dimension and summary of measure attribute) (Cont.) п‚§ Each dimension is typically expressed as a вЂњhierarchyвЂќ п‚§ Hierarchy: Analyst is interested in different level of detail of a dimension Step II 2. Analyze multidimensional data into cross-tabulation row header: value for one attribute column header: value for another attr. individual cell: value aggregation Step III 3. Visualize n-dimensional cube - data cube the word CUBE describe what in the relational world would be the integration of the fact table with dimension tables Step IV п‚§ After you design the cube, you will use the cube's structure to build a relational database (known as a star schema) to house the data for the cube Step V п‚§ Once you load data into the relational database, and then into the cube, you'll be able to see how attributes, dimensions, measures, and measure groups fit together within a cube to create a powerful analytical tool. Star Schema п‚§ Cubes are easily stored in relational databases, using a denormalized data structure called the star schema, developed by Ralph Kimball п‚§ starts with a central fact table п‚§ Each row in the central fact table contains some combination of keys that makes it unique. These keys are called dimensions. Slicing & Dicing п‚§ Additional Functionality that can be thought of as viewing a slice of the data cube, particularly when values for multiple dimensions are fixed. п‚§ Slicing/Dicing simply consists of selecting specific values for these attributes, which are then displayed on top of the cross-tab Rollup & Drill-down п‚§ OLAP permit users to view data at ay desired level of granularity. п‚§ Rollup: moving from finer-granularity data to coarser granularity п‚§ Drill-down: opposite to Rollup OLAP Inplementation п‚§ Multidimensional OLAP (MOLAP) п‚§ Relational OLAP (ROLAP) п‚§ Hybrid OLAP (HOLAP) MOLAP п‚§ The database is stored in a special, usually proprietary, structure that is optimized for multidimensional analysis. п‚§ + : very fast query response time because data is mostly pre-calculated п‚§ -: practical limit on the size because the time taken to calculate the database and the space required to hold these pre-calculated values ROLAP п‚§ The database is a standard relational database and the database model is a multidimensional model, often referred to as a star or snowflake model or schema. п‚§ +: more scalable solution п‚§ -: performance of the queries will be largely governed by the complexity of the SQL and the number and size of the tables being joined in the query HOLAP п‚§ a hybrid of ROLAP and MOLAP п‚§ can be thought of as a virtual database whereby the higher levels of the database are implemented as MOLAP and the lower levels of the database as ROLAP DOLAP п‚§ The previous terms are used to refer to server based OLAP technologies п‚§ DOLAP (Desktop OLAP) п‚§ DOLAP enables users to quickly pull together small cubes that run on their desktops or laptops Conclusion п‚§ OLAP is a significant improvement over query systems п‚§ OLAP is an interactive system to show different summaries of multidimensional data by interactively selecting the attributes in a multidimensional data cube References п‚§ IBM Redbooks. DB2 Cube Views: A Primer. Durham, NC, USA: IBM, 2003. ebrary collections. San Jose State University. <http://site.ebrary.com/lib/sjsu/Doc?id=10113016&ppg=43> п‚§ Jacobson, Reed, MicrosoftВ® SQL Serverв„ў 2005 Analysis Services Step by Step. Microsoft Press. п‚§ Berry, Michael J. A. Data Mining Techniques : For Marketing, Sales, and Customer Relationship Management. Hoboken, NJ, USA: John Wiley & Sons, Incorporated, 2004. ebrary collections. San Jose State University. <http://site.ebrary.com/lib/sjsu/Doc?id=10114278&ppg=522> .