DataStage Interview Questions and Answers,Solution and Explanation - Part 2

What are conformed dimensions?
Ans:
A conformed dimension is a single, coherent view of the same piece of data throughout the organization. The same dimension is used in all subsequent star schemas defined. This enables reporting across the complete data warehouse in a simple format.

Why fact table is in normal form?
Ans:
Basically the fact table consists of the Index keys of the dimension/ook up tables and the measures.
so when ever we have the keys in a table .that itself implies that the table is in the normal form.

What is a linked cube?
Ans:
A cube can be stored on a single analysis server and then defined as a linked cube on other Analysis servers. End users connected to any of these analysis servers can then access the cube. This arrangement avoids the more costly alternative of storing and maintaining copies of a cube on multiple analysis servers. linked cubes can be connected using TCP/IP or HTTP. To end users a linked cube looks like a regular cube.

What is degenerate dimension table?
Ans:
The values of dimension which is stored in fact table is called degenerate dimensions. these dimensions doesn,t have its own dimensions.

What is ODS?
Ans:
ODS stands for Online Data Storage.

What is a general purpose scheduling tool?
Ans:
The basic purpose of the scheduling tool in a DW Application is to stream line the flow of data from Source To Target at specific time or based on some condition.

What is the need of surrogate key;why primary key not used as surrogate key?
Ans:
Surrogate Key is an artificial identifier for an entity. In surrogate key values are generated by the system sequentially(Like Identity property in SQL Server and Sequence in Oracle). They do not describe anything. Primary Key is a natural identifier for an entity. In Primary keys all the values are entered manually by the user which are uniquely identified. There will be no repetition of data

What is Hash file stage and what is it used for?
Ans:
Used for Look-ups. It is like a reference table. It is also used in-place of ODBC, OCI tables for better performance.

What is the utility you use to schedule the jobs on a UNIX server other than using Ascential Director?
Ans:
Use crontab utility along with d***ecute() function along with proper parameters passed.

What is the OCI? and how to use the ETL Tools?
Ans:
OCI means orabulk data which used client having bulk data its retrive time is much more ie., your used to orabulk data the divided and retrived

What are OConv () and Iconv () functions and where are they used?
Ans:
IConv() - Converts a string to an internal storage formatOConv() - Converts an expression to an output format.

What is Fact table?
Ans:
Fact Table contains the measurements or metrics or facts of business process. If your business process is "Sales" , then a measurement of this business process such as "monthly sales number" is captured in the Fact table. Fact table also contains the foriegn keys for the dimension tables.

What are the steps In Building the Data Model
Ans:
While ER model lists and defines the constructs required to build a data model, there is no standard process for doing so. Some methodologies, such as IDEFIX, specify a bottom-up

What is Dimensional Modelling?
Ans:
Dimensional Modelling is a design concept used by many data warehouse desginers to build thier datawarehouse. In this design model all the data is stored in two types of tables - Facts table and Dimension table. Fact table contains the facts/measurements of the business and the dimension table contains the context of measuremnets ie, the dimensions on which the facts are calculated.

What type of Indexing mechanism do we need to use for a typical datawarehouse?
Ans:
On the fact table it is best to use bitmap indexes. Dimension tables can use bitmap and/or the other types of clustered/non-clustered, unique/non-unique indexes.

What is Normalization, First Normal Form, Second Normal Form , Third Normal Form?
Ans:
Normalization can be defined as segregating of table into two different tables, so as to avoid duplication of values.

Is it correct/feasible develop a Data Mart using an ODS?
Ans:
Yes it is correct to develop a Data Mart using an ODS.becoz ODS which is used to?store transaction data and few Days (less historical data) this is what datamart is required so it is coct to develop datamart using ODS.

What are other Performance tunings you have done in your last project to increase the performance of slowly running jobs?
Ans:
1. Staged the data coming from ODBC/OCI/DB2UDB stages or any database on the server using Hash/Sequential files for optimum performance also for data recovery in case job aborts.
2. Tuned the OCI stage for 'Array Size' and 'Rows per Transaction' numerical values for faster inserts, updates and selects.
3. Tuned the 'Project Tunables' in Administrator for better performance.
4. Used sorted data for Aggregator.
5. Sorted the data as much as possible in DB and reduced the use of DS-Sort for better performance of jobs
6. Removed the data not used from the source as early as possible in the job.
7. Worked with DB-admin to create appropriate Indexes on tables for better performance of DS queries
8. Converted some of the complex joins/business in DS to Stored Procedures on DS for faster execution of the jobs.
9. If an input file has an excessive number of rows and can be split-up then use standard logic to run jobs in parallel.
10. Before writing a routine or a transform, make sure that there is not the functionality required in one of the standard routines supplied in the sdk or ds utilities categories.
Constraints are generally CPU intensive and take a significant amount of time to process. This may be the case if the constraint calls routines or
external macros but if it is inline code then the overhead will be minimal.

Try to have the constraints in the 'Selection' criteria of the jobs itself. This will eliminate the unnecessary records even getting in before joins
are made.
12. Tuning should occur on a job-by-job basis.
13. Use the power of DBMS.
14. Try not to use a sort stage when you can use an ORDER BY clause in the database.
15. Using a constraint to filter a record set is much slower than performing a SELECT … WHERE….
16. Make every attempt to use the bulk loader for your particular database. Bulk loaders are generally faster than using ODBC or OLE.

Tags