Optimization of Data Warehouse – Reporting Design

Problem Definition:- As per any standard Data Warehouse Project the entities <Transactional DB Source, ETL(Extract, Transform & Load)  Process, Analytical DB Destination, UI for Reporting>  are being used in the current Process/Project. Before looking in to the problems in the current system design let us first see the current system design:

 

Current Design with ETL and User Reporting on Same Server
 
Following were some problems that exist with the above current system design:-
  1. ETL Performance: ETLs were taking longer time to Complete.
  2. Resource Conflict: ETLs and User Reporting were happening at same time at the same Server (OLAP Analytical Server in the above example) is affecting the ETL and Report execution performance because of conflict for same resource.
  3. Non-Flexible ETLs Schedule: ETLs can not be executed in Business hours because it was affecting the User Report performance.
  4. Users Reports Performance: As most of the Users are Redmond Users so ETLs were executed in Non- Redmond Hours but for the other regions Users the ETL execution was impacting their reports because the ETL and Reporting Server were same.
Solution Proposed: To overcome the above stated problems and for better performance of ETLs and Users reporting the following design change has been done: 
 
Phase 1 (Implemented):
New Design with ETL and User Reporting separated on different Server
 
As shown above, in the new design approach the ETL server and Reporting Server are now separated. The data movement from the ETL Server to the Reporting Server is through Replication. The replication job is scheduled to run for every 15 minutes. ETL Server is specifically used for ETL purpose only and Reporting Server is used for User reporting only. So following are the performance benefit after implementing the above design approach:-
 
  1. ETL Performance Gain: ETLs are now completing as expected. 
  2. No Resource Conflict: ETLs and User Reporting can now happen at same time because the ETL Server and Reporting Server are now different. So there is no Resource Conflict hence no performance issues. 
  3. Flexible ETLs Schedule: ETLs now can  be executed in Business hours also.
  4. Users Reports Performance Gain: The users are now not experiencing any query performance difference in between the regions. Irrespective of the Region the Reports/Queries execution time is same.
 Following graphs depicts the improvement in the ETL and Reporting Performance:
 
Reports Execution time improvement
 
 
 
ETL Execution time improvement
 
 
Phase 2 (Yet to Implement): Following design approach is proposed for Phase 2 approach. In this design the ETL source will be the Replicated OLTP rather then the main transactional Data base. The advantages with this approach will be:
  1. Independent ETL Source: No interfere with the Transactional Application load. The ETLs data pull will be from dedicated server that will get replicated data from OLTP Source.
  2. Indexes Creation:- In the Old design its very difficult to create any new index on the OLT P data base because it may impact the Transactional performance but in the new design new indexes can be created on the dedicated ETL Server that can help in enhancing the ETL data pull Performance. Because these indexes now won’t affect the Transactional performance.
  3. Performance Gain:-  As new replicated server will be a dedicated server for ETL pull data process so there will be less load on the server that in turns increase the performance for ETL Process.
 
 Proposed Design with Transactional DB and ETL Source separated on different Server