Introduction to Data Mining


1. Consider once again the coffee-tea example, presented in Example 10.9. The following two tables are the same as the one presented in Example 10.9 except that each entry has been divided by 10 (left table) or multiplied by 10 (right table).

Table 10.7. Beverage preferences among a group of 100 people (left) and 10,000 people (right).

a. Compute the p-value of the observed support count for each table, i.e., for 15 and 1500. What pattern do you observe as the sample size increases?

p-value for the left table (N = 100) = 0.5319

p-value for the right table (N = 10,000) = 4.104E-10

As the sample size increases, the p-value decreases, even though the cell proportions are identical in both tables.

 

Observed, left table (N = 100)

            Coffee   No Coffee   Total
Tea             15           5      20
No Tea          65          15      80
Total           80          20     100

Observed, right table (N = 10,000)

            Coffee   No Coffee   Total
Tea           1500         500    2000
No Tea        6500        1500    8000
Total         8000        2000   10000

Expected, left table (N = 100)

            Coffee   No Coffee
Tea             16           4
No Tea          64          16

p-value = 0.531971

Expected, right table (N = 10,000)

            Coffee   No Coffee
Tea           1600         400
No Tea        6400        1600

p-value = 4.10453E-10

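In Excel, we first compute the expected count of each cell under independence (row total * column total / N), then pass the observed and expected ranges to CHITEST, which runs a chi-square test of independence (1 degree of freedom for a 2x2 table).

Expected table (left table, N = 100)

20*80/100 = 16 (Tea, Coffee)

20*20/100 = 4 (Tea, No Coffee)

80*80/100 = 64 (No Tea, Coffee)

80*20/100 = 16 (No Tea, No Coffee)

p-value = CHITEST(observed, expected)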

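The Excel calculation can be cross-checked with a short Python sketch. This is only an illustration, assuming the same Pearson chi-square test without continuity correction that CHITEST applies; the function names `expected_counts` and `chi_square_p_value_2x2` are hypothetical helpers, not part of any library. For 1 degree of freedom the upper-tail probability can be written with the complementary error function, so only the standard library is needed.

```python
import math

def expected_counts(observed):
    """Expected cell counts under independence: row_total * col_total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_p_value_2x2(observed):
    """Pearson chi-square statistic and p-value for a 2x2 table (1 d.o.f.)."""
    expected = expected_counts(observed)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Rows: Tea, No Tea. Columns: Coffee, No Coffee.
left = [[15, 5], [65, 15]]
right = [[1500, 500], [6500, 1500]]

for name, table in (("left", left), ("right", right)):
    stat, p = chi_square_p_value_2x2(table)
    print(f"{name}: chi2 = {stat:.6f}, p = {p:.6g}")
```

Under these assumptions the sketch should agree with the p-values reported above (about 0.532 for N = 100 and about 4.1E-10 for N = 10,000). The chi-square statistic grows by a factor of 100 when every cell is multiplied by 100, which is why the p-value collapses while the proportions stay fixed.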

b. Compute the odds ratio and interest factor for the two contingency tables presented in this problem and the original table of Example 10.9. (See Section 5.7.1 for definitions of these two measures.) What pattern do you observe?

c. The odds ratio and interest factor are measures of effect size. Are these two effect sizes significant from a practical point of view?

d. What would you conclude about the relationship between p-values and effect size for this situation?
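For part (b), the two measures can be computed directly from the cell counts. The sketch below assumes the usual definitions for a 2x2 table with cells f11, f10, f01, f00: odds ratio = (f11 * f00) / (f10 * f01) and interest factor = N * f11 / (f1+ * f+1). It also assumes the original table of Example 10.9 is the left table multiplied by 10, as the problem statement implies. The function names are hypothetical helpers.

```python
def odds_ratio(table):
    """Odds ratio (f11 * f00) / (f10 * f01) for a 2x2 contingency table."""
    (f11, f10), (f01, f00) = table
    return (f11 * f00) / (f10 * f01)

def interest_factor(table):
    """Interest factor N * f11 / (row-1 total * column-1 total)."""
    (f11, f10), (f01, f00) = table
    n = f11 + f10 + f01 + f00
    return n * f11 / ((f11 + f10) * (f11 + f01))

# Rows: Tea, No Tea. Columns: Coffee, No Coffee.
tables = {
    "left (N = 100)": [[15, 5], [65, 15]],
    # Assumption: original table = left table * 10, per the problem statement.
    "original (N = 1,000)": [[150, 50], [650, 150]],
    "right (N = 10,000)": [[1500, 500], [6500, 1500]],
}

for name, table in tables.items():
    print(f"{name}: odds ratio = {odds_ratio(table):.4f}, "
          f"interest factor = {interest_factor(table):.4f}")
```

Both measures depend only on ratios of cell counts, so multiplying every cell by the same constant leaves them unchanged across the three tables, in contrast to the p-value.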

2. Consider the different combinations of effect size and p-value applied to an experiment where we want to determine the efficacy of a new drug.

(i) effect size small, p-value small

(ii) effect size small, p-value large

(iii) effect size large, p-value small

(iv) effect size large, p-value large

Whether effect size is small or large depends on the domain, which in this case is medical. For this problem consider a small p-value to be less than 0.001, while a large p-value is above 0.05. Assume that the sample size is relatively large, e.g., thousands of patients with the condition that the drug hopes to treat.

a. Which combination(s) would very likely be of interest?

b. Which combination(s) would very likely not be of interest?

c. If the sample size were small, would that change your answers?
