Thursday, October 15, 2009
DATA WAREHOUSING AND MINING, NOVEMBER/DECEMBER 2008
PART A – (10 x 2 = 20 marks)
1. What is the difference between a view and a materialized view?
2. Explain the difference between star and snowflake schemas.
3. Mention the various tasks to be accomplished as part of data pre-processing.
4. Define data mining.
5. What is overfitting and what can you do to prevent it?
6. In classification trees, what are surrogate splits, and how are they used?
7. What is the objective function of the K-Means algorithm?
8. What assumption does the naïve Bayes classifier make that motivates its name?
9. What is the frequent itemset property?
10. Mention the advantages of hierarchical clustering.
PART B – (5 x 16 = 80 marks)
11. (a) Enumerate the building blocks of a data warehouse. Explain the importance of
metadata in a data warehouse environment. What are the challenges in metadata
management? [Marks 16]
Or
(b) (i) Distinguish between the entity-relationship modeling technique and dimensional
modeling. Why is the entity-relationship modeling technique not suitable for the data
warehouse? [Marks 8]
(ii) Create a star schema diagram that will enable FIT-WORLD GYM INC. to analyze their
revenue. The fact table will include – for every instance of revenue taken – attribute(s)
useful for analyzing revenue. The star schema will include all dimensions that can be
useful for analyzing revenue. Formulate the query: “Find the percentage of revenue
generated by members in the last year”. How many cuboids are there in the complete
data cube? [Marks 8]
12. (a) Explain the 5 steps in the Knowledge Discovery in Databases (KDD) process. Discuss
in brief the characterization of data mining algorithms. Discuss in brief important
implementation issues in data mining. [Marks 5 + 6 + 5]
Or
(b) Distinguish between statistical inference and exploratory data analysis. Enumerate and
discuss various statistical techniques and methods for data analysis. Write a short note on
machine learning. What are supervised and unsupervised learning? Write a short note on
regression and correlation. [Marks 16]
13. (a) Decision tree induction is a popular classification method. Taking one typical
decision tree induction algorithm, briefly outline the method of decision tree
classification. [Marks 16]
Or
(b) Consider the following training dataset and the original decision tree induction
algorithm (ID3). Risk is the class label attribute. The Height values have already been
discretized into disjoint ranges. Calculate the information gain if Gender is chosen as the
test attribute. Calculate the information gain if Height is chosen as the test attribute.
Draw the final decision tree (without any pruning) for the training dataset. Generate all
the “IF-THEN” rules from the decision tree.
Gender Height Risk |
F (1.5, 1.6) Low |
M (1.9, 2.0) High |
F (1.8, 1.9) Medium |
F (1.8, 1.9) Medium |
F (1.6, 1.7) Low |
M (1.8, 1.9) Medium |
F (1.5, 1.6) Low |
M (1.6, 1.7) Low |
M (2.0, ∞) High
M (2.0, ∞) High
F (1.7, 1.8) Medium |
M (1.9, 2.0) Medium |
F (1.8, 1.9) Medium |
F (1.7, 1.8) Medium |
F (1.7, 1.8) Medium
[Marks 16]
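The information-gain part of question 13 (b) can be checked with a short script. What follows is only a minimal sketch, not a model answer: it assumes the ID3 definition of gain (entropy of the class label minus the weighted entropy after the split, using base-2 logarithms), and it assumes the last Height range in the table is the open-ended top bucket, written here as "(2.0, inf)". The names `rows`, `entropy` and `information_gain` are illustrative.

```python
from collections import Counter
from math import log2

# Training data from question 13 (b): (Gender, Height range, Risk).
# Assumption: the top Height bucket is open-ended, labelled "(2.0, inf)" here.
rows = [
    ("F", "(1.5, 1.6)", "Low"),
    ("M", "(1.9, 2.0)", "High"),
    ("F", "(1.8, 1.9)", "Medium"),
    ("F", "(1.8, 1.9)", "Medium"),
    ("F", "(1.6, 1.7)", "Low"),
    ("M", "(1.8, 1.9)", "Medium"),
    ("F", "(1.5, 1.6)", "Low"),
    ("M", "(1.6, 1.7)", "Low"),
    ("M", "(2.0, inf)", "High"),
    ("M", "(2.0, inf)", "High"),
    ("F", "(1.7, 1.8)", "Medium"),
    ("M", "(1.9, 2.0)", "Medium"),
    ("F", "(1.8, 1.9)", "Medium"),
    ("F", "(1.7, 1.8)", "Medium"),
    ("F", "(1.7, 1.8)", "Medium"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(data, attr_index):
    """Entropy of Risk minus the weighted entropy after splitting on one attribute."""
    labels = [r[-1] for r in data]
    groups = {}
    for r in data:
        groups.setdefault(r[attr_index], []).append(r[-1])
    remainder = sum(len(g) / len(data) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


gain_gender = information_gain(rows, 0)
gain_height = information_gain(rows, 1)
print(f"Gain(Gender) = {gain_gender:.4f}")
print(f"Gain(Height) = {gain_height:.4f}")
```

Whichever attribute yields the larger gain would be chosen by ID3 as the root test; the same function can be re-applied to each resulting subset to grow the rest of the tree.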
14. (a) Given the following transactional database:
TID Items
1 C, B, H
2 B, F, S
3 A, F, G
4 C, B, H
5 B, F, G
6 B, E, O
(i) We want to mine all the frequent itemsets in the data using the Apriori algorithm.
Assume the minimum support level is 30%. (You need to give the sets of frequent itemsets
L1, L2, … and the candidate itemsets C1, C2, …) [Marks 9]
(ii) Find all the association rules that involve only B, C, H (on either the left or right
hand side of the rule). The minimum confidence is 70%. [Marks 7]
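The Apriori steps asked for in question 14 (a) can be sketched in a few lines. This is a hedged illustration rather than a model answer: it assumes the six transactions are {C, B, H}, {B, F, S}, {A, F, G}, {C, B, H}, {B, F, G} and {B, E, O}, it generates candidates by uniting pairs of frequent (k-1)-itemsets and then pruning any candidate with an infrequent subset, and the names `apriori` and `rules_within` are illustrative.

```python
from itertools import combinations

# Assumed reading of the six transactions in question 14 (a).
transactions = [
    {"C", "B", "H"},
    {"B", "F", "S"},
    {"A", "F", "G"},
    {"C", "B", "H"},
    {"B", "F", "G"},
    {"B", "E", "O"},
]
MIN_SUPPORT = 0.30     # 30% of 6 transactions, so a count of at least 2
MIN_CONFIDENCE = 0.70


def support_count(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def apriori():
    """Return {frequent itemset: support count}, built level by level."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    current = [frozenset([i]) for i in items
               if support_count(frozenset([i])) / n >= MIN_SUPPORT]
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support_count(s)
        k += 1
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        prev = set(current)
        candidates = {c for c in candidates
                      if all(frozenset(sub) in prev for sub in combinations(c, k - 1))}
        current = [c for c in candidates if support_count(c) / n >= MIN_SUPPORT]
    return frequent


def rules_within(frequent, allowed):
    """Rules lhs -> rhs drawn only from `allowed` items that meet MIN_CONFIDENCE."""
    found = []
    for itemset, count in frequent.items():
        if len(itemset) < 2 or not itemset <= allowed:
            continue
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = count / frequent[lhs]
                if confidence >= MIN_CONFIDENCE:
                    found.append((lhs, itemset - lhs, confidence))
    return found


frequent = apriori()
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
for lhs, rhs, conf in rules_within(frequent, frozenset("BCH")):
    print(sorted(lhs), "->", sorted(rhs), f"confidence={conf:.2f}")
```

Uniting arbitrary pairs produces more raw candidates than the textbook prefix join, but after the prune step the surviving candidate set is the same, which keeps the sketch short.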
Or
(b) Describe the multi-dimensional association rule, giving a suitable example. [Marks 16]
15. (a) BIRCH and CLARANS are two interesting clustering algorithms that perform
effective clustering in large data sets.
(i) Outline how BIRCH performs clustering in large data sets. [Marks 10]
(ii) Compare and outline the major differences between the two scalable clustering
algorithms: BIRCH and CLARANS. [Marks 6]
Or
(b) Write a short note on web mining taxonomy. Explain the different activities of text
mining. Discuss and elaborate the current trends in data mining. [Marks 6 + 5 + 5]
http://www.ziddu.com/download/7201500/DATAWAREHOUSINGANDMINING.pdf.html