Thursday, October 15, 2009
DATA WAREHOUSING AND MINING, NOVEMBER/DECEMBER 2008
PART A – (10 x 2 = 20 marks)
1. What is the difference between a view and a materialized view?
2. Explain the difference between the star and snowflake schemas.
3. Mention the various tasks to be accomplished as part of data pre-processing.
4. Define data mining.
5. What is overfitting, and what can you do to prevent it?
6. In classification trees, what are surrogate splits, and how are they used?
7. What is the objective function of the K-Means algorithm?
8. The naïve Bayes classifier makes what assumption that motivates its name?
9. What is the frequent itemset property?
10. Mention the advantages of hierarchical clustering.
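As a quick aid for question 7: the K-Means objective is the within-cluster sum of squared distances from each point to its assigned centroid. A minimal Python sketch follows; the data points, centroids, and labels are made-up illustrative values, not part of the question paper.

```python
import numpy as np

def kmeans_objective(X, centroids, labels):
    """Within-cluster sum of squared Euclidean distances (the K-Means objective)."""
    return sum(np.sum((X[labels == k] - c) ** 2) for k, c in enumerate(centroids))

# Tiny made-up example: two obvious clusters in 1-D, each point 0.1 from its centroid.
X = np.array([[0.0], [0.2], [10.0], [10.2]])
centroids = np.array([[0.1], [10.1]])
labels = np.array([0, 0, 1, 1])
print(kmeans_objective(X, centroids, labels))  # 4 * 0.1**2 = 0.04
```

K-Means alternates assignment and centroid-update steps, and each step can only decrease this quantity, which is why the algorithm converges.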
PART B – (5 x 16 = 80 marks)
11. (a) Enumerate the building blocks of a data warehouse. Explain the importance of metadata in a data warehouse environment. What are the challenges in metadata management? [Marks 16]
Or
(b) (i) Distinguish between the entity-relationship modeling technique and dimensional modeling. Why is the entity-relationship modeling technique not suitable for the data warehouse? [Marks 8]
(ii) Create a star schema diagram that will enable FIT-WORLD GYM INC. to analyze their revenue. The fact table will include – for every instance of revenue taken – attribute(s) useful for analyzing revenue. The star schema will include all dimensions that can be useful for analyzing revenue. Formulate the query: “Find the percentage of revenue generated by members in the last year.” How many cuboids are there in the complete data cube? [Marks 8]
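A hint for the cuboid count in 11(b): ignoring concept hierarchies, a data cube over n dimensions has 2^n cuboids – one per subset of the dimensions, including the fully aggregated apex. (With hierarchies of L_i levels per dimension, the count becomes the product of (L_i + 1).) The dimension names below are hypothetical stand-ins, not part of the question.

```python
# Without concept hierarchies, an n-dimensional cube has one cuboid per
# subset of its dimensions (including the empty "apex" subset): 2**n.
dims = ["Member", "Time", "Service", "Branch"]  # hypothetical star-schema dimensions
print(2 ** len(dims))  # 16 cuboids for a 4-dimensional star schema
```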
12. (a) Explain the five steps in the Knowledge Discovery in Databases (KDD) process. Discuss in brief the characterization of data mining algorithms. Discuss in brief important implementation issues in data mining. [Marks 5 + 6 + 5]
Or
(b) Distinguish between statistical inference and exploratory data analysis. Enumerate and discuss the various statistical techniques and methods for data analysis. Write a short note on machine learning. What are supervised and unsupervised learning? Write a short note on regression and correlation. [Marks 16]
13. (a) Decision tree induction is a popular classification method. Taking one typical decision tree induction algorithm, briefly outline the method of decision tree classification. [Marks 16]
Or
(b) Consider the following training dataset and the original decision tree induction algorithm (ID3). Risk is the class label attribute. The Height values have already been discretized into disjoint ranges. Calculate the information gain if Gender is chosen as the test attribute. Calculate the information gain if Height is chosen as the test attribute. Draw the final decision tree (without any pruning) for the training dataset. Generate all the “IF-THEN” rules from the decision tree.
Gender  Height      Risk
F       (1.5, 1.6)  Low
M       (1.9, 2.0)  High
F       (1.8, 1.9)  Medium
F       (1.8, 1.9)  Medium
F       (1.6, 1.7)  Low
M       (1.8, 1.9)  Medium
F       (1.5, 1.6)  Low
M       (1.6, 1.7)  Low
M       (2.0, ∞)    High
M       (2.0, ∞)    High
F       (1.7, 1.8)  Medium
M       (1.9, 2.0)  Medium
F       (1.8, 1.9)  Medium
F       (1.7, 1.8)  Medium
F       (1.7, 1.8)  Medium
[Marks 16]
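The two information-gain calculations in 13(b) can be checked mechanically. The sketch below implements the standard ID3 entropy and gain formulas in Python over the training set exactly as given (reading the last two Height rows as the open-ended range (2.0, ∞)).

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr_index, labels):
    """ID3 information gain from splitting `rows` on the attribute at `attr_index`."""
    n = len(rows)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder

# The training set from question 13(b): (Gender, Height range) -> Risk.
data = [
    ("F", "(1.5,1.6)", "Low"),    ("M", "(1.9,2.0)", "High"),
    ("F", "(1.8,1.9)", "Medium"), ("F", "(1.8,1.9)", "Medium"),
    ("F", "(1.6,1.7)", "Low"),    ("M", "(1.8,1.9)", "Medium"),
    ("F", "(1.5,1.6)", "Low"),    ("M", "(1.6,1.7)", "Low"),
    ("M", "(2.0,inf)", "High"),   ("M", "(2.0,inf)", "High"),
    ("F", "(1.7,1.8)", "Medium"), ("M", "(1.9,2.0)", "Medium"),
    ("F", "(1.8,1.9)", "Medium"), ("F", "(1.7,1.8)", "Medium"),
    ("F", "(1.7,1.8)", "Medium"),
]
rows = [(g, h) for g, h, _ in data]
risk = [r for _, _, r in data]
print(f"Gain(Gender) = {info_gain(rows, 0, risk):.4f}")  # ~0.3219
print(f"Gain(Height) = {info_gain(rows, 1, risk):.4f}")  # ~1.3232
```

Height's gain is far larger than Gender's, so ID3 puts Height at the root. Five of its six ranges are then class-pure; the remaining (1.9, 2.0) partition holds one High and one Medium example that share the same Gender value, so no available attribute can separate them further.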
14. (a) Given the following transactional database:
1: C, B, H
2: B, F, S
3: A, F, G
4: C, B, H
5: B, F, G
6: B, E, O
(i) We want to mine all the frequent itemsets in the data using the Apriori algorithm. Assume the minimum support level is 30%. (You need to give the sets of frequent itemsets L1, L2, … and candidate itemsets C1, C2, ….) [Marks 9]
(ii) Find all the association rules that involve only B, C, H (on either the left- or right-hand side of the rule). The minimum confidence is 70%. [Marks 7]
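For cross-checking 14(a), here is a small level-wise Apriori sketch in Python run over the six transactions above. With a minimum support of 30% of 6 transactions, an itemset must occur in at least 2 transactions.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Level-wise Apriori: returns {frozenset(itemset): support_count}."""
    def frequent(candidates):
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_count}

    items = {i for t in transactions for i in t}
    level = frequent({frozenset([i]) for i in items})  # L1
    all_frequent = dict(level)
    k = 2
    while level:
        # Join step: unions of (k-1)-itemsets that form k-itemsets,
        # then prune any candidate with an infrequent (k-1)-subset.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = frequent(candidates)
        all_frequent.update(level)
        k += 1
    return all_frequent

# Transactions from question 14(a); 30% of 6 transactions -> count >= 2.
db = [{"C", "B", "H"}, {"B", "F", "S"}, {"A", "F", "G"},
      {"C", "B", "H"}, {"B", "F", "G"}, {"B", "E", "O"}]
freq = apriori(db, min_count=2)
for itemset, count in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The run yields L1 = {B, C, F, G, H}, L2 = {BC, BF, BH, CH, FG}, and L3 = {BCH}. For part (ii), a rule such as C ⇒ B has confidence 2/2 = 100% and is kept, while B ⇒ C has confidence 2/5 = 40% and is dropped at the 70% threshold.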
| Or |
(b) Describe the multi-dimensional association rule, giving a suitable example. [Marks 16]
15. (a) BIRCH and CLARANS are two interesting clustering algorithms that perform effective clustering in large data sets.
(i) Outline how BIRCH performs clustering in large data sets. [Marks 10]
(ii) Compare and outline the major differences between the two scalable clustering algorithms: BIRCH and CLARANS. [Marks 6]
Or
(b) Write a short note on the web mining taxonomy. Explain the different activities of text mining. Discuss and elaborate on current trends in data mining. [Marks 6 + 5 + 5]
http://www.ziddu.com/download/7201500/DATAWAREHOUSINGANDMINING.pdf.html



