Thursday, October 15, 2009


PART A – (10 x 2 = 20 marks)

1. What is the difference between view and materialized view?

2. Explain the difference between the star schema and the snowflake schema.

3. Mention the various tasks to be accomplished as part of data pre-processing.

4. Define Data Mining.

5. What is overfitting and what can you do to prevent it?

6. In classification trees, what are surrogate splits, and how are they used?

7. What is the objective function of the K-Means algorithm?

8. What assumption does the naïve Bayes classifier make that motivates its name?

9. What is the frequent itemset property?

10. Mention the advantages of Hierarchical clustering.
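For question 7, the K-means objective is the within-cluster sum of squared distances from each point to the centroid of its assigned cluster. A minimal sketch (the data, names, and cluster assignment below are my own illustration, not part of the question):

```python
# Illustrative sketch: the K-means objective is the within-cluster
# sum of squared Euclidean distances to each cluster's centroid.
def kmeans_objective(points, assignments, centroids):
    """Sum over all points of the squared distance to the centroid
    of the cluster the point is assigned to."""
    return sum(
        sum((p - c) ** 2 for p, c in zip(point, centroids[k]))
        for point, k in zip(points, assignments)
    )

# Two tight clusters around (0, 0) and (10, 10).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
assignments = [0, 0, 1, 1]
centroids = [(0.0, 0.5), (10.0, 10.5)]
print(kmeans_objective(points, assignments, centroids))  # 1.0
```

K-means alternates reassigning points to the nearest centroid and recomputing centroids, each step of which can only decrease this objective.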

PART B – (5 x 16 = 80 marks)

11. (a) Enumerate the building blocks of a data warehouse. Explain the importance of metadata in a data warehouse environment. What are the challenges in metadata management? [Marks 16]

Or

(b) (i) Distinguish between the entity-relationship modeling technique and dimensional modeling. Why is the entity-relationship modeling technique not suitable for the data warehouse? [Marks 8]

(ii) Create a star schema diagram that will enable FIT-WORLD GYM INC. to analyze their revenue. The fact table will include – for every instance of revenue taken – attribute(s) useful for analyzing revenue. The star schema will include all dimensions that can be useful for analyzing revenue. Formulate the query: “Find the percentage of revenue generated by members in the last year.” How many cuboids are there in the complete data cube? [Marks 8]

12. (a) Explain the 5 steps in the Knowledge Discovery in Databases (KDD) process. Discuss in brief the characterization of data mining algorithms. Discuss in brief important implementation issues in data mining. [Marks 5 + 6 + 5]

Or


(b) Distinguish between statistical inference and exploratory data analysis. Enumerate and discuss various statistical techniques and methods for data analysis. Write a short note on machine learning. What is supervised and unsupervised learning? Write a short note on regression and correlation. [Marks 16]

13. (a) Decision tree induction is a popular classification method. Taking one typical decision tree induction algorithm, briefly outline the method of decision tree classification. [Marks 16]

Or


(b) Consider the following training dataset and the original decision tree induction algorithm (ID3). Risk is the class label attribute. The Height values have already been discretized into disjoint ranges. Calculate the information gain if Gender is chosen as the test attribute. Calculate the information gain if Height is chosen as the test attribute. Draw the final decision tree (without any pruning) for the training dataset. Generate all the “IF-THEN” rules from the decision tree.

Gender  Height      Risk
F       (1.5, 1.6)  Low
M       (1.9, 2.0)  High
F       (1.8, 1.9)  Medium
F       (1.8, 1.9)  Medium
F       (1.6, 1.7)  Low
M       (1.8, 1.9)  Medium
F       (1.5, 1.6)  Low
M       (1.6, 1.7)  Low
M       (2.0, ∞)    High
M       (2.0, ∞)    High
F       (1.7, 1.8)  Medium
M       (1.9, 2.0)  Medium
F       (1.8, 1.9)  Medium
F       (1.7, 1.8)  Medium
F       (1.7, 1.8)  Medium

[Marks 16]
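The two information gains asked for in part (b) can be cross-checked with a short script. This is a sketch of the standard ID3 gain computation applied to the table above (entropy of the class labels minus the weighted entropy after splitting on the attribute):

```python
import math
from collections import Counter

# Training data from the question: (Gender, Height range, Risk).
data = [
    ("F", "(1.5, 1.6)", "Low"),    ("M", "(1.9, 2.0)", "High"),
    ("F", "(1.8, 1.9)", "Medium"), ("F", "(1.8, 1.9)", "Medium"),
    ("F", "(1.6, 1.7)", "Low"),    ("M", "(1.8, 1.9)", "Medium"),
    ("F", "(1.5, 1.6)", "Low"),    ("M", "(1.6, 1.7)", "Low"),
    ("M", "(2.0, inf)", "High"),   ("M", "(2.0, inf)", "High"),
    ("F", "(1.7, 1.8)", "Medium"), ("M", "(1.9, 2.0)", "Medium"),
    ("F", "(1.8, 1.9)", "Medium"), ("F", "(1.7, 1.8)", "Medium"),
    ("F", "(1.7, 1.8)", "Medium"),
]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr_index):
    total = entropy([r[2] for r in rows])
    # Partition the class labels by the attribute's value,
    # then subtract the weighted entropy of the partitions.
    by_value = {}
    for r in rows:
        by_value.setdefault(r[attr_index], []).append(r[2])
    weighted = sum(len(v) / len(rows) * entropy(v) for v in by_value.values())
    return total - weighted

print(f"Gain(Gender) = {info_gain(data, 0):.4f}")  # 0.3219
print(f"Gain(Height) = {info_gain(data, 1):.4f}")  # 1.3232
```

Since Gain(Height) > Gain(Gender), ID3 would pick Height as the root test attribute of the tree.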

14. (a) Given the following transactional database:

TID  Items
1    C, B, H
2    B, F, S
3    A, F, G
4    C, B, H
5    B, F, G
6    B, E, O

(i) We want to mine all the frequent itemsets in the data using the Apriori algorithm. Assume the minimum support level is 30%. (You need to give the set of frequent itemsets in L1, L2, … and the candidate itemsets in C1, C2, ….) [Marks 9]

(ii) Find all the association rules that involve only B, C, H (in either the left- or right-hand side of the rule). The minimum confidence is 70%. [Marks 7]

Or
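Part (i) can be cross-checked with a minimal level-wise Apriori sketch over the six transactions above (with 6 transactions, 30% minimum support means a support count of at least 2):

```python
from itertools import combinations

# Transactions from the question (TIDs 1-6).
transactions = [
    {"C", "B", "H"}, {"B", "F", "S"}, {"A", "F", "G"},
    {"C", "B", "H"}, {"B", "F", "G"}, {"B", "E", "O"},
]
min_count = 2  # 30% of 6 transactions, rounded up

def support(itemset):
    """Number of transactions containing every item of itemset."""
    return sum(itemset <= t for t in transactions)

# L1: frequent individual items.
items = sorted(set().union(*transactions))
L = [{frozenset({i}) for i in items if support({i}) >= min_count}]

# Level-wise pass: join L[k] with itself to form size-(k+1) candidates,
# prune any candidate with an infrequent subset, then count support.
while L[-1]:
    prev = L[-1]
    size = len(next(iter(prev))) + 1
    candidates = {a | b for a in prev for b in prev if len(a | b) == size}
    Lk = {c for c in candidates
          if all(frozenset(s) in prev for s in combinations(c, size - 1))
          and support(c) >= min_count}
    L.append(Lk)

for k, level in enumerate(L[:-1], start=1):
    print(f"L{k}:", sorted(sorted(s) for s in level))
```

This reports L1 = {B, C, F, G, H}, L2 = {BC, BF, BH, CH, FG}, and L3 = {BCH}, which is the itemset that part (ii)'s rules over B, C, H are generated from.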


(b) Describe the multi-dimensional association rule, giving a suitable example. [Marks 16]

15. (a) BIRCH and CLARANS are two interesting clustering algorithms that perform effective clustering in large data sets.

(i) Outline how BIRCH performs clustering in large data sets. [Marks 10]

(ii) Compare and outline the major differences between the two scalable clustering algorithms: BIRCH and CLARANS. [Marks 6]

Or

(b) Write a short note on web mining taxonomy. Explain the different activities of text mining. Discuss and elaborate the current trends in data mining. [Marks 6 + 5 + 5]
