Home >
Course Catalog >
Business Intelligence and Reporting Tools >
Data Warehousing
The Data Warehouse ETL Toolkit Training
Course #: |
DBDB-415 |
Format: |
On-Site, Virtual |
Duration: |
4 days |
Professional Development Units:
Continuing Development Units:
Course Tier: ??Tier??
Per Student Kit Price: ??KitPricePerStudent??
See all
Location
Location
Dates
Price
Virtual
06/17/13-06/20/13
$2,600.00
Description
This course is designed to provide students with the skills necessary to plan, design, build, and run the ETL processes which are needed to build and maintain a data warehouse. It is based on the Ralph Kimball and Joe Caserta book The Data Warehouse ETL Toolkit published in 2004 by Wiley Publishing, Inc, ISBN: 0-7645-6757-8. The course also uses the following book: Mastering Data Warehouse Aggregates published in 2006 by Wiley Publishing, ISBN: 0-471-77709-9. If the students will be using SQL Server 2005 Integration Services (SSIS) the following book may be substituted for the Ralph Kimball book: The Microsoft Data Warehouse Toolkit: With SQL Server 2005 and the Microsoft Business Intelligence Toolset by Joy Mundy, Warren Thornthwaite, and Ralph Kimball published by Wiley, February 13, 2006, ISBN 0471267155
Objectives
Audience
- Technical Staff
- Team Leaders
- Project Managers
PreRequisites
- Basic experience with any relational database management system
Topics
- Surrounding the Requirements
- Requirements
- Business Needs
- Compliance Requirements
- Data Profiling
- Security Requirements
- Data Integration
- Data Latency
- Archiving and Lineage
- End User Delivery Interfaces
- Available Skills
- Legacy Licenses
- Architecture
- ETL Tool Versus Hand Coding (Buy a Tool Suite or Roll Your Own)
- The Back Room—Preparing the Data
- The Front Room—Data Access
- The Mission of the Data Warehouse
- What the Data Warehouse Is
- What the Data Warehouse Is Not
- Industry Terms Not Used Consistently
- Resolving Architectural Conflict: A Hybrid Approach
- How the Data warehouse Is Changing
- The Mission of the ETL Team
- ETL Data Structures
- To Stage or Not to Stage
- Designing the Staging Area
- Data Structures in the ETL System
- Flat Files
- XML Data Sets
- Relational Tables
- Independent DBMS Working Tables
- Third Normal Form Entity/Relation Models
- Nonrelational Data Sources
- Dimensional Data Models: The Handoff from the Back Room to the Front Room
- Fact Tables
- Dimension Tables
- Atomic and Aggregate Fact Tables
- Surrogate Key Mapping Tables
- Planning and Design Standards
- Impact Analysis
- Metadata Capture
- Naming Conventions
- Auditing Data Transformation Steps
- Summary
- Extracting
- The Logical Data Map
- Designing Logical Before Physical
- Inside the Logical Data Map
- Components of the Logical Data Map
- Using Tools for the Logical Data Map
- Building the Logical Data Map
- Data Discovery Phase
- Data Content Analysis
- Collecting Business Rules in the ETL Process
- Integrating Heterogeneous Data Sources
- The Challenge of Extracting from Disparate Platforms
- Connecting to Diverse Sources Through ODBC
- Mainframe Sources
- Working with COBOL Copybooks
- EBCDIC Character Set
- Converting EBCDIC to ASCII
- Tranferring Data Between Platforms
- Handling Mainframe Numeric Data
- Using PICtures
- Unpacking Packed Decimal
- Working with Redefined Fields
- Multiple OCCURS
- Managing Multiple Mainframe Record Type Files
- Handling Mainframe Variable Record Lengths
- Flat Files
- Processing Fixed Length Flat Files
- Processing Delimited Flat Files
- XML Sources
- Character Sets
- XML Meta Data
- Web Log Sources
- W3C Common and Extended Formats
- Name Value Pairs in Web Logs
- ERP System Sources
- Extracting Changed Data
- Detecting Changes
- Extraction Tips
- Detecting Deleted or Overwritten Fact Records at the Source
- Summary
- Cleaning and Conforming
- Defining Data Quality
- Assumptions
- Part 1: Design Objectives
- Understand Your Key Constituencies
- Competing Factors
- Balancing Conflicting Priorities
- Formulate a Policy
- Part 2: Cleaning Deliverables
- Data Profiling Deliverable
- Cleaning Deliverable #1: Error Event Table
- Cleaning Deliverable #2: Audit Dimension
- Audit Dimension Fine Points
- Part 3: Screens and Their Measurements
- Anomaly Detection Phase
- Types of Enforcement
- Column Property Enforcement
- Structure Enforcement
- Data and Value Rule Enforcement
- Measurements Driving Screen Design
- Overall Process Flow
- The Show Must Go On—Usually
- Screens
- Known Table Row Counts
- Column Nullity
- Column Numeric and Date Ranges
- Column Length Restriction
- Column Explicit Valid Values
- Column Explicit Invalid Values
- Checking Table Row Count Reasonability
- Checking Column Distribution Reasonability
- General Data and Value Rule Reasonability
- Part 4: Conforming Deliverables
- Conformed Dimensions
- Designing the Conformed Dimensions
- Taking the Pledge
- Permissible Variations of Conformed Dimensions
- Conformed Facts
- The Fact Table Provider
- The Dimension Manager: Publishing Conformed Dimensions to Affected Fact Tables
- Detailed Delivery Steps for Conformed Dimensions
- Implementing the Conforming Modules
- Matching Drives Deduplication
- Surviving: Final Step of Conforming
- Delivering
- Summary
- Delivering Dimension Tables
- The Basic Structure of a Dimension
- The Grain of a Dimension
- The Basic Load Plan for a Dimension
- Flat Dimensions and Snowflaked Dimensions
- Date and Time Dimensions
- Big Dimensions
- Small Dimensions
- One Dimension or Two
- Dimensional Roles
- Dimensions as Subdimensions of Another Dimension
- Degenerate Dimensions
- Slowly Changing Dimensions
- Type 1 Slowly Changing Dimension (Overwrite)
- Type 2 Slowly Changing Dimension (Partitioning History)
- Precise Time Stamping of a Type 2 Slowly Changing Dimension
- Type 3 Slowly Changing Dimension (Alternate Realities)
- Hybrid Slowly Changing Dimensions
- Late-Arriving Dimension Records and Correcting Bad Data
- Multivalued Dimensions and Bridge Tables
- Ragged Hierarchies and Bridge Tables
- Populating Hierarchy Bridge Tables
- Using Positional Attributes in a Dimension to Represent Text Facts
- Summary
- Delivering Fact Tables
- The Basic Structure of a Fact Table
- Guaranteeing Referential Integrity
- Surrogate Key Pipeline
- Using the Dimension Instead of a Lookup Table
- Fundamental Grains
- Transaction Grain Fact Tables
- Periodic Snapshot Fact Tables
- Accumulating Snapshot Fact Tables
- Preparing for Loading Fact Tables
- Managing Indexes
- Managing Partions
- Outwitting the Rollback Log
- Loading the Data
- Incremental Loading
- Inserting Facts
- Updating and Correcting Facts
- Negating Facts
- Updating Facts
- Deleting Facts
- Physically Deleting Facts
- Logically Deleting Facts
- Factless Fact Tables
- Augmenting a Type 1 Fact Table with Type 2 History
- Graceful Modifications
- Multiple Units of Measure in a Fact Table
- Collecting Revenue in Multiple Currencies
- Late Arriving Facts
- Aggregations
- Design Requirements #1 Through #4
- Administering Aggregations, Including Materialized Views
- Delivering Dimensional Data to OLAP Cubes
- Cube Data Sources
- Processing Dimensions
- Changes in Dimension Data
- Processing Facts
- Integrating OLAP Processing into the ETL System
- OLAP Wrapup
- Summary
- Development
- Current Marketplace ETL Tool Suite Offerings
- Current Scripting Languages
- Time Is of the Essence
- Push Me or Pull Me
- Ensuring Transfers with Sentinels
- Sorting Data During a Preload
- Sorting on Mainframe Systems
- Sorting on UNIX and Windows Systems
- Trimming the Fat (Filtering)
- Extracting a Subset of the Source File Records on Mainframe Systems
- Extracting a Subset of the Source File Fields
- Extracting a Subset of the Source File Records on UNIX and Windows Systems
- Extracting a Subset of the Source File Fields
- Creating Aggregated Extracts on Mainframe Systems
- Creating Aggregated Extracts on UNIX and Windows Systems
- Using Database Bulk Loader Utilities to Speed Inserts
- Managing Database Features to Improve Performance
- The Order of Things
- The Effect of Aggregates and Group Bys on Performance
- Performance Impact of Using Scalar Functions
- Avoiding Triggers
- Overcoming ODBC Bottlenecks
- Benefiting from Parallel Processing
- Troubleshooting Performance Problems
- Increasing ETL Throughput
- Reducing Input/Output Contention
- Eliminating Database Reads/Writes
- Filtering as Soon as Possible
- Partiioning and Parellelizing
- Updating Aggregates Incrementally
- Taking Only What You Need
- Bulk Loading/Eliminating Logging
- Dropping Database Constraints and Indexes
- Eliminating Network Traffic
- Letting the ETL Engine Do the Work
- Summary
- Operations
- Scheduling and Support
- Reliability, Availability, Manageability Analysis for ETL
- ETL Scheduling 101
- Scheduling Tools
- Load Dependencies
- Metadata
- Migrating to Production
- Operational Support for the Data Warehouse
- Bundling Version Releases
- Supporting the ETL System in Production
- Achieving Optimal ETL Performance
- Estimating Load Time
- Vulnerabilities of Long-Running ETL Processes
- Minimizing the Risk of Load Failures
- Purging Historic Data
- Monitoring the ETL System
- Measuring ETL Specific Performance Indicators
- Measuring Infrastructure Performance Indicators
- Measuring Data Warehouse Usage to Help Manage ETL Processes
- Tuning ETL Processes
- Explaining Database Overhead
- ETL System Security
- Securing the Development Environment
- Securing the Production Environment
- Short-Term Archiving and Recovery
- Long-Term Archiving and Recovery
- Media, Formats, Software, and Hardware
- Obsolete Formats and Archaic Formats
- Hard Copy, Standards, and Museums
- Refreshing, Migrating, Emulating, and Encapsulating
- Summary
- Metadata
- Defining Metadata
- Metadata—What Is It?
- Source System Metadata
- Data-Staging Metadata
- DBMS Metadata
- Front Room Metadata
- Business Metadata
- Business Definitions
- Source System Information
- Data Warehouse Data Dictionary
- Logical Data Maps
- Technical Metadata
- System Inventory
- Data Models
- Data Definitions
- Business Rules
- ETL-Generated Metadata
- ETL Job Metadata
- Transformation Metadata
- Batch Metadata
- Data Quality Error Event Metadata
- Process Execution Metadata
- Metadata Standards and Practices
- Establishing Rudimentary Standards
- Naming Conventions
- Impact Analysis
- Summary
- Responsibilities
- Planning and Leadership
- Having Dedicated Leadership
- Planning Large, Building Small
- Hiring Qualified Developers
- Building Teams with Database Expertise
- Don’t Try to Save the World
- Enforcing Standardization
- Monitoring, Auditing, and Publishing Statistics
- Maintaining Documentation
- Providing and Utilizing Metadata
- Keeping It Simple
- Optimizing Throughput
- Managing the Project
- Responsibility of the ETL Team
- Defining the Project
- Planning the Project
- Determining the Tool Set
- Staffing Your Project
- Project Plan Guidelines
- Managing Scope
- Summary
- Real-Time ETL Systems
- Why Real-Time ETL?
- Defining Real-Time ETL
- Challenges and Opportunities of Real-Time Data Warehousing
- Real-Time Data Warehousing Review
- Generation 1—The Operational Data Store
- Generation 2—The Real-Time Partition
- Recent CRM Trends
- The Strategic Role of the Dimension Manager
- Categorizing the Requirement
- Data Freshness and Historical Needs
- Reporting Only or Integration, Too?
- Just the Facts or Dimension Changes, Too?
- Alerts, Continuous Polling, or Nonevents?
- Data Integration or Application Integration?
- Point-to-Point Versus Hub-and-Spoke
- Customer Data Cleanup Considerations
- Real-Time ETL Approaches
- Microbatch ETL
- Enterprise Application Integration
- Capture, Transform, and Flow
- Enterprise Information Integration
- The Real-Time Dimension Manager
- Microbatch Processing
- Choosing an Approach—A decision Guide
- Summary
- Conclusions
- Deepening the Definition of ETL
- The Future of Data Warehousing and ETL in Particular
- Ongoing Evolution of ETL Systems
Testimonials
??Testimonials??
Certifications
Roadmaps
This course is included in the following Roadmaps: