# Intro to Data Mining

**Alexis91**

Chapter 3, exercises in Section 3.11

5. Consider the following data set for a binary class problem.

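| A | B | Class Label |
|---|---|-------------|
| T | F | + |
| T | T | + |
| T | T | + |
| T | F | − |
| T | T | + |
| F | F | − |
| F | F | − |
| F | F | − |
| T | T | − |
| T | F | − |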

a. Calculate the information gain when splitting on A and B. Which attribute would the decision tree induction algorithm choose?
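A minimal Python sketch of the computation part (a) asks for, assuming base-2 logarithms. The ten records from the table are encoded as tuples, with ASCII `-` standing in for the `−` label. The helper names `entropy` and `information_gain` are illustrative, not something defined by the exercise.

```python
from collections import Counter
from math import log2

# Records from exercise 5 as (A, B, class label); "-" stands in for the "−" label.
DATA = [
    ("T", "F", "+"), ("T", "T", "+"), ("T", "T", "+"), ("T", "F", "-"),
    ("T", "T", "+"), ("F", "F", "-"), ("F", "F", "-"), ("F", "F", "-"),
    ("T", "T", "-"), ("T", "F", "-"),
]
ATTRS = {"A": 0, "B": 1}  # attribute name -> column index in each record


def entropy(labels):
    """Base-2 entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, col):
    """Parent entropy minus the size-weighted entropy of the children of a split on `col`."""
    parent = entropy([row[-1] for row in rows])
    weighted = 0.0
    for value in {row[col] for row in rows}:
        child = [row[-1] for row in rows if row[col] == value]
        weighted += len(child) / len(rows) * entropy(child)
    return parent - weighted


gains = {name: information_gain(DATA, col) for name, col in ATTRS.items()}
for name, gain in gains.items():
    print(f"Information gain for {name}: {gain:.4f}")
print("Attribute with the highest gain:", max(gains, key=gains.get))
```

With these records the sketch reports a gain of about 0.281 for A and about 0.256 for B, so A has the higher information gain.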

b. Calculate the gain in the Gini index when splitting on A and B. Which attribute would the decision tree induction algorithm choose?
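The same split evaluation with the Gini index, again as a hedged Python sketch over the exercise 5 records (ASCII `-` stands in for `−`). The gain is taken as the parent's Gini index minus the size-weighted Gini index of the children; `gini` and `gini_gain` are illustrative helper names.

```python
from collections import Counter

# Records from exercise 5 as (A, B, class label); "-" stands in for the "−" label.
DATA = [
    ("T", "F", "+"), ("T", "T", "+"), ("T", "T", "+"), ("T", "F", "-"),
    ("T", "T", "+"), ("F", "F", "-"), ("F", "F", "-"), ("F", "F", "-"),
    ("T", "T", "-"), ("T", "F", "-"),
]
ATTRS = {"A": 0, "B": 1}  # attribute name -> column index in each record


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_gain(rows, col):
    """Parent Gini index minus the size-weighted Gini index of the children of a split on `col`."""
    parent = gini([row[-1] for row in rows])
    weighted = 0.0
    for value in {row[col] for row in rows}:
        child = [row[-1] for row in rows if row[col] == value]
        weighted += len(child) / len(rows) * gini(child)
    return parent - weighted


gains = {name: gini_gain(DATA, col) for name, col in ATTRS.items()}
for name, gain in gains.items():
    print(f"Gini gain for {name}: {gain:.4f}")
print("Attribute with the highest gain:", max(gains, key=gains.get))
```

Here the gain is about 0.137 for A and about 0.163 for B, so the Gini criterion prefers B.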

c. Figure 3.11 shows that entropy and the Gini index are both monotonically increasing on the range [0, 0.5] and they are both monotonically decreasing on the range [0.5, 1]. Is it possible that information gain and the gain in the Gini index favor different attributes? Explain.

7. Consider the following set of training examples.

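| X | Y | Z | No. of Class C1 Examples | No. of Class C2 Examples |
|---|---|---|--------------------------|--------------------------|
| 0 | 0 | 0 | 5 | 40 |
| 0 | 0 | 1 | 0 | 15 |
| 0 | 1 | 0 | 10 | 5 |
| 0 | 1 | 1 | 45 | 0 |
| 1 | 0 | 0 | 10 | 5 |
| 1 | 0 | 1 | 25 | 0 |
| 1 | 1 | 0 | 5 | 20 |
| 1 | 1 | 1 | 0 | 15 |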

a. Compute a two-level decision tree using the greedy approach described in this chapter. Use the classification error rate as the criterion for splitting. What is the overall error rate of the induced tree?

b. Repeat part (a) using X as the first splitting attribute and then choose the best remaining attribute for splitting at each of the two successor nodes. What is the error rate of the induced tree?
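A hedged Python sketch covering parts (a) and (b). It assumes each leaf predicts its majority class, scores a split by the total number of misclassified training examples (equivalent to the error rate, since the total of 200 is fixed), and breaks ties by taking the first candidate in X, Y, Z order; the exercise does not specify tie-breaking, so that last rule is an assumption. `two_level_tree` and the other helpers are hypothetical names.

```python
# Exercise 7 counts: (X, Y, Z) -> (number of C1 examples, number of C2 examples)
COUNTS = {
    (0, 0, 0): (5, 40), (0, 0, 1): (0, 15),
    (0, 1, 0): (10, 5), (0, 1, 1): (45, 0),
    (1, 0, 0): (10, 5), (1, 0, 1): (25, 0),
    (1, 1, 0): (5, 20), (1, 1, 1): (0, 15),
}
ATTRS = {"X": 0, "Y": 1, "Z": 2}  # attribute name -> position in the key tuple


def node_errors(cells):
    """Misclassified examples when a node predicts its majority class."""
    c1 = sum(counts[0] for _, counts in cells)
    c2 = sum(counts[1] for _, counts in cells)
    return min(c1, c2)


def partition(cells, attr):
    """Group (key, counts) cells by the value of attribute `attr`."""
    groups = {}
    for key, counts in cells:
        groups.setdefault(key[ATTRS[attr]], []).append((key, counts))
    return groups


def split_errors(cells, attr):
    """Total misclassifications after splitting on `attr` (majority class in each child)."""
    return sum(node_errors(group) for group in partition(cells, attr).values())


def best_attribute(cells, candidates):
    """Greedy step: the candidate with the fewest errors after the split (first one wins ties)."""
    return min(candidates, key=lambda attr: split_errors(cells, attr))


def two_level_tree(cells, root=None):
    """Return (root attribute, {root value: child attribute}, total errors).

    If `root` is None the root is chosen greedily; otherwise it is forced.
    """
    if root is None:
        root = best_attribute(cells, list(ATTRS))
    remaining = [attr for attr in ATTRS if attr != root]
    children, total = {}, 0
    for value, group in sorted(partition(cells, root).items()):
        child = best_attribute(group, remaining)
        children[value] = child
        total += split_errors(group, child)
    return root, children, total


cells = list(COUNTS.items())
n = sum(c1 + c2 for c1, c2 in COUNTS.values())

for attr in ATTRS:
    print(f"Root split on {attr}: {split_errors(cells, attr)} errors")

greedy = two_level_tree(cells)         # part (a): greedy root
forced_x = two_level_tree(cells, "X")  # part (b): X forced at the root
print("Greedy tree:", greedy, "error rate =", greedy[2] / n)
print("X-first tree:", forced_x, "error rate =", forced_x[2] / n)
```

Under these assumptions the greedy run picks Z at the root and ends with 60 errors out of 200 (error rate 0.3), while forcing X at the root and then splitting both children on Y leaves 20 errors (error rate 0.1).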

c. Compare the results of parts (a) and (b). Comment on the suitability of the greedy heuristic used for splitting attribute selection.

8. The following table summarizes a data set with three attributes A, B, C and two class labels +, −. Build a two-level decision tree.

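| A | B | C | Number of Instances (+) | Number of Instances (−) |
|---|---|---|-------------------------|-------------------------|
| T | T | T | 5 | 0 |
| F | T | T | 0 | 20 |
| T | F | T | 20 | 0 |
| F | F | T | 0 | 5 |
| T | T | F | 0 | 0 |
| F | T | F | 25 | 0 |
| T | F | F | 0 | 0 |
| F | F | F | 0 | 25 |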

a. According to the classification error rate, which attribute would be chosen as the first splitting attribute? For each attribute, show the contingency table and the gains in classification error rate.
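A small Python sketch of part (a), assuming each child predicts its majority class and measuring the gain as the drop in error rate relative to the unsplit root. Rows with zero counts are kept, so every attribute value appears in each table, and ASCII `-` stands in for `−`. `contingency_table` and `error_rate_gain` are illustrative names.

```python
# Exercise 8 counts: (A, B, C) -> (number of + instances, number of - instances)
COUNTS = {
    ("T", "T", "T"): (5, 0),  ("F", "T", "T"): (0, 20),
    ("T", "F", "T"): (20, 0), ("F", "F", "T"): (0, 5),
    ("T", "T", "F"): (0, 0),  ("F", "T", "F"): (25, 0),
    ("T", "F", "F"): (0, 0),  ("F", "F", "F"): (0, 25),
}
ATTRS = {"A": 0, "B": 1, "C": 2}  # attribute name -> position in the key tuple


def contingency_table(attr):
    """Map each value of `attr` to its [+ count, - count]."""
    table = {"T": [0, 0], "F": [0, 0]}
    for key, (pos, neg) in COUNTS.items():
        row = table[key[ATTRS[attr]]]
        row[0] += pos
        row[1] += neg
    return table


def error_rate_gain(attr):
    """Root error rate minus the error rate after splitting on `attr` (majority class per child)."""
    n = sum(pos + neg for pos, neg in COUNTS.values())
    total_pos = sum(pos for pos, _ in COUNTS.values())
    root_errors = min(total_pos, n - total_pos)
    child_errors = sum(min(row) for row in contingency_table(attr).values())
    return (root_errors - child_errors) / n


for attr in ATTRS:
    print(attr, contingency_table(attr), f"gain = {error_rate_gain(attr):.2f}")
print("First splitting attribute:", max(ATTRS, key=error_rate_gain))
```

The tables it prints are A: T → 25+/0−, F → 25+/50−; B: T → 30+/20−, F → 20+/30−; C: T → 25+/25−, F → 25+/25−, giving gains of 0.25, 0.10 and 0.00, so A would be chosen.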

b. Repeat for the two children of the root node.

c. How many instances are misclassified by the resulting decision tree?

d. Repeat parts (a), (b), and (c) using C as the splitting attribute.
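A hedged Python sketch of the full two-level trees for parts (b) through (d), assuming each leaf predicts its majority class and that ties go to the first attribute in A, B, C order (the exercise does not say how to break ties). It builds the greedy tree and the tree with C forced at the root, then counts misclassified instances; `two_level_tree` and the other helpers are hypothetical names.

```python
# Exercise 8 counts: (A, B, C) -> (number of + instances, number of - instances)
COUNTS = {
    ("T", "T", "T"): (5, 0),  ("F", "T", "T"): (0, 20),
    ("T", "F", "T"): (20, 0), ("F", "F", "T"): (0, 5),
    ("T", "T", "F"): (0, 0),  ("F", "T", "F"): (25, 0),
    ("T", "F", "F"): (0, 0),  ("F", "F", "F"): (0, 25),
}
ATTRS = {"A": 0, "B": 1, "C": 2}  # attribute name -> position in the key tuple


def node_errors(cells):
    """Misclassified instances when a node predicts its majority class."""
    pos = sum(counts[0] for _, counts in cells)
    neg = sum(counts[1] for _, counts in cells)
    return min(pos, neg)


def partition(cells, attr):
    """Group (key, counts) cells by the value of attribute `attr`."""
    groups = {}
    for key, counts in cells:
        groups.setdefault(key[ATTRS[attr]], []).append((key, counts))
    return groups


def split_errors(cells, attr):
    """Total misclassifications after splitting on `attr`."""
    return sum(node_errors(group) for group in partition(cells, attr).values())


def best_attribute(cells, candidates):
    """Greedy step: fewest errors after the split; the first candidate wins ties."""
    return min(candidates, key=lambda attr: split_errors(cells, attr))


def two_level_tree(cells, root=None):
    """Return (root attribute, {root value: child attribute}, misclassified instances)."""
    if root is None:
        root = best_attribute(cells, list(ATTRS))
    remaining = [attr for attr in ATTRS if attr != root]
    children, total = {}, 0
    for value, group in sorted(partition(cells, root).items()):
        child = best_attribute(group, remaining)
        children[value] = child
        total += split_errors(group, child)
    return root, children, total


cells = list(COUNTS.items())
greedy = two_level_tree(cells)        # parts (a) to (c): greedy root choice
c_first = two_level_tree(cells, "C")  # part (d): C forced at the root
print("Greedy tree:", greedy)
print("C-first tree:", c_first)
```

Under those assumptions the greedy tree puts A at the root and B under each child (the A = T child is already pure) and misclassifies 20 instances, while the tree rooted at C, with A under C = T and B under C = F, misclassifies none.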

e. Use the results in parts (c) and (d) to draw a conclusion about the greedy nature of the decision tree induction algorithm.
