Data Analysis and Application Problem using SPSS
Running head: DATA ANALYSIS AND APPLICATION
Data Analysis and Application (DAA): U02A1
MYSTERY STUDENT OF THE WEEK!
Capella University
Data Analysis and Application (DAA): U02A1
In conducting an independent samples t test, it is important to consider the test's assumptions and to analyze the data to determine whether those assumptions have been met, especially with regard to the homogeneity of variance across groups. Additionally, for the purpose of this assignment, post-hoc and a priori power analyses will assist in determining whether the data can be credibly used in research. Type I and Type II errors can be detrimental to research in the field of psychology and should be carefully avoided; this assignment will address measures that can be taken to reduce the risk of such errors.
Section 1: Reporting the t Test Results
In this particular analysis of the bpstudy.sav data set, researchers investigated data from 65 participants. With gender as the predictor variable and heart rate (HR1) as the outcome variable, a t test analysis compared mean heart rates, using interval level data, for male and female participants. This data set also contained participants’ smoking status (categorical, nominal data), as well as their weight and systolic/diastolic pressure (interval level data).
Using an independent samples t test, researchers were able to compare mean female heart rates with mean male heart rates, to determine whether a significant difference exists between mean heart rates as it relates to the gender variable. Gender, traditionally treated as a dichotomous variable, is a nominal variable whose numeric codes have no quantitative meaning, and as Warner (2013) explains, “it would be nonsense to add up scores for a nominal variable… and calculate a mean… based on the sum of those scores”; therefore, mean gender score was not analyzed in this research (p. 7). Field (2013) explains that variables like gender, ethnicity and other characteristic variables used to identify participants are often collected as nominal data, and that descriptive statistics for such variables are not usually relevant to the analysis process (p. 8). Of the 65 participants, 28 were male (N₁) and 36 were female (N₂); one participant's gender was miscoded. For the purpose of this research, participant 11, whose gender is listed as “3”, is excluded and will not be considered for this t test analysis (Warner, 2013, p. 137). As outlined in Table 1 below, the mean heart rate for males was 73.68 beats per minute (BPM) and the mean heart rate for females was 74.97 BPM. The standard deviation for the male heart rate was 9.77 (s₁), rounded to the nearest hundredth, and for the female heart rate it was 7.87 (s₂). The mean difference (M₁-M₂) is -1.29. These values are essential for computing effect size.
Table 1
Descriptive statistics for heart rate by gender
Group Statistics
| | GENDER | N | Mean | Std. Deviation | Std. Error Mean |
| HR1 | male | 28 | 73.68 | 9.772 | 1.847 |
| | female | 36 | 74.97 | 7.865 | 1.311 |
The effect size is calculated using Cohen’s d. Here, the mean difference is divided by each group's standard deviation in turn, (M₁-M₂)/s₁ and (M₁-M₂)/s₂, and the two results are averaged. Therefore, -1.29/9.77 = -0.13 for s₁, rounded to the nearest hundredth, and -1.29/7.87 = -0.16 for s₂. Averaging these two values yields -0.145, or an absolute value of 0.15, rounded. According to Warner (2013), this value indicates a small effect size (p. 208). Therefore, the data analysis indicates that there is little difference between mean heart rates for male and female participants, and that the effect is small in magnitude.
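As a cross-check on the hand calculation above, the following Python sketch computes Cohen's d from the Table 1 summary statistics in two ways: by averaging the two single-SD ratios, as done in the text, and by dividing by the pooled standard deviation. The pooled version is a common alternative and is an assumption here, not the method used in this report; all function and variable names are illustrative.

```python
import math

# Summary statistics from Table 1 (heart rate by gender).
n_male, mean_male, sd_male = 28, 73.68, 9.772
n_female, mean_female, sd_female = 36, 74.97, 7.865


def cohens_d_averaged(m1, s1, m2, s2):
    """Average of the two ratios (M1 - M2)/s1 and (M1 - M2)/s2, as in the text."""
    diff = m1 - m2
    return (diff / s1 + diff / s2) / 2


def cohens_d_pooled(m1, s1, n1, m2, s2, n2):
    """Mean difference divided by the pooled standard deviation (assumed alternative)."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)


d_avg = cohens_d_averaged(mean_male, sd_male, mean_female, sd_female)
d_pooled = cohens_d_pooled(mean_male, sd_male, n_male, mean_female, sd_female, n_female)
print(f"averaged d = {d_avg:.3f}, pooled d = {d_pooled:.3f}")
```

Both versions come out at roughly 0.15 in absolute value, so the choice of denominator does not change the small-effect interpretation for these data.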
There are several assumptions that must be met in order to conduct an independent samples t test. First, the outcome variable should be quantitative and normally distributed. Second, the variance should be similar or equal across groups, also referred to as homogeneity of variance. Finally, observations should be independent both between and within groups, meaning that each participant's score is unrelated to every other participant's score (Warner, 2013, pp. 189-190). As explained in the last assignment, and again detailed in the histogram below in Table 2, the data for heart rate is quantitative and normally distributed with a mostly symmetrical, mesokurtic shape; there are no extreme outliers. The assumption of independence of observations is satisfied, both between and within groups, because there is no group overlap with regard to the participants’ gender.
Table 2
Heart rate histogram
Levene’s test for equality of variance, as detailed in Table 3, yields a significance value of 0.125, which is above the alpha value of 0.05. The variances for males and females therefore do not differ significantly, the assumption of homogeneity of variance is satisfied, and the equal variances assumed row of the output should be used. Based on the equal variances assumed data, the t value is -0.59, rounded to the nearest hundredth, with 62 df.
Table 3
Levene’s test for homogeneity of variance
Independent Samples Test
| | | Levene's Test for Equality of Variances: F | Levene's Test: Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI of the Difference: Lower | 95% CI of the Difference: Upper |
| HR1 | Equal variances assumed | 2.421 | .125 | -.587 | 62 | .559 | -1.294 | 2.204 | -5.699 | 3.112 |
| | Equal variances not assumed | | | -.571 | 51.062 | .570 | -1.294 | 2.265 | -5.840 | 3.253 |
The null hypothesis for the purpose of this research is that there is no difference between males and females with regard to their heart rate, while the alternative hypothesis states there is a difference between male and female heart rate. The alpha level for the purpose of this analysis is 0.05 (Warner, 2013; George & Mallery, 2014).
According to Table 3, the t ratio is -0.587 with 62 degrees of freedom. The degrees of freedom are calculated as N-2, here 64-2 = 62 (Warner, 2013, p. 199). The critical t value, according to Warner (2013), is 2.00 for a two-tailed test (p. 1057). The absolute value of the obtained t ratio (0.587) is below this critical value, and the p value of 0.559 is well above the alpha value of 0.05; both justify a failure to reject the null hypothesis.
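The t ratios, standard errors and degrees of freedom reported in Table 3 can be reproduced from the summary statistics alone. The sketch below is a minimal illustration using only the Python standard library; it takes the group sizes and standard deviations from Table 1 and the mean difference from Table 3, and the variable names are illustrative rather than part of the SPSS output.

```python
import math

# Group sizes and standard deviations from Table 1; mean difference from Table 3.
n1, sd1 = 28, 9.772   # males
n2, sd2 = 36, 7.865   # females
mean_diff = -1.294

# Equal variances assumed: pooled variance and df = N - 2.
df_pooled = n1 + n2 - 2
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df_pooled
se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_pooled = mean_diff / se_pooled

# Equal variances not assumed (Welch): separate variances and adjusted df.
v1, v2 = sd1**2 / n1, sd2**2 / n2
se_welch = math.sqrt(v1 + v2)
t_welch = mean_diff / se_welch
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(f"pooled: t({df_pooled}) = {t_pooled:.3f}, SE = {se_pooled:.3f}")
print(f"Welch:  t({df_welch:.2f}) = {t_welch:.3f}, SE = {se_welch:.3f}")
```

The p values are not computed here because they require the t distribution, which is not available in the standard library.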
Females have slightly higher mean heart rates than their male counterparts in this data set; however, the difference between the means is not statistically significant. The data analysis in this report results in a failure to reject the null hypothesis, which means the data provide no evidence of a difference between males and females with regard to mean heart rate. The alternative hypothesis, that there is a difference between male and female heart rates, is therefore not supported by these data.
Section 2: Post-hoc Power Analysis
Warner (2013) defines statistical power as “the probability of correctly rejecting the H₀ when H₀ is false” (p. 107). The G*Power post-hoc power analysis gives the researcher insight into how the two gender groups vary with regard to mean differences in the outcome variable, in this case heart rate. Observation of the distribution graph below in Table 4 shows a great deal of overlap between males and females in this data set. Table 4 suggests that there is little difference between male and female heart rates, and that failing to reject the null hypothesis is justifiable, based on visual interpretation of this output graph.
Table 4
G*Power distribution graph for post-hoc power analysis
Table 5 below shows the critical t value is -1.9989715, or -2.00 rounded, which is in accordance with the findings in section 1. The noncentrality parameter δ computes to a value of -0.52 and the Power (1-β err prob) is 0.08, which is far below the conventional target of 0.80. A Type II error would be committed if it was determined there was no difference between male and female heart rates when, in fact, there was a difference between the mean heart rates for males and females, which would be indicative of a false negative (Field, 2013; Warner, 2013). With power of 0.08, β is 1 - 0.08 = 0.92, so the output in Table 5 indicates that the risk of committing such an error is high if a true effect of this size exists.
Table 5
G*Power output values for post-hoc power analysis
t tests - Means: Difference between two independent means (two groups)
Analysis: Post hoc: Compute achieved power
Input: Tail(s) = Two
Effect size d = -0.13
α err prob = 0.05
Sample size group 1 = 28
Sample size group 2 = 36
Output: Noncentrality parameter δ = -0.5159215
Critical t = -1.9989715
Df = 62
Power (1-β err prob) = 0.0800468
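The noncentrality parameter in Table 5 follows directly from the effect size and the two group sizes. The sketch below recomputes it and then estimates power with a normal approximation; that approximation is an assumption made here so the example needs only the Python standard library, whereas G*Power works with the noncentral t distribution, so the power estimate differs slightly from the Table 5 value.

```python
import math
from statistics import NormalDist

# Inputs from Table 5.
d, alpha = -0.13, 0.05
n1, n2 = 28, 36

# Noncentrality parameter for two independent groups.
delta = d * math.sqrt(n1 * n2 / (n1 + n2))

# Normal approximation to two-tailed power (assumption: G*Power itself
# uses the noncentral t distribution, so results differ slightly).
z = NormalDist()
z_crit = z.inv_cdf(1 - alpha / 2)
approx_power = z.cdf(-z_crit - delta) + (1 - z.cdf(z_crit - delta))

print(f"delta = {delta:.7f}")
print(f"approximate power = {approx_power:.3f}")
```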
Section 3: A Priori Power Analysis
The G*Power a priori power analysis gives the researcher insight into how much data must be collected to achieve adequate statistical power. In this particular research, increasing the Power (1-β err prob) to 0.80 would require data collection on a much larger scale, N₁=930 and N₂=930. The larger sample size would likely yield less overlap in the distribution of scores; however, obtaining data from a large sample is generally more difficult (Warner, 2013, pp. 103-104). Table 6 below shows the decreased overlap when sample size is significantly increased.
Table 6
G*Power distribution graph for a priori power analysis
Raising the Power (1-β err prob) to 0.80 requires a significantly larger sample, as shown in Table 7 below, where N₁ and N₂ both equal 930. This is in line with findings that increasing the sample size, effect size or alpha level increases power; when Cohen’s d increases, so will the t ratio, because a larger effect size makes a larger t value more likely (Warner, 2013, p. 109).
When N is increased, statistical power also increases; having a high N value will reduce the risk of the researcher committing a Type II error (Warner, 2013, p. 109). This is a logical conclusion because the greater the sample size, the more confidently a researcher can generalize. When the alpha level is made smaller, statistical power decreases; conversely, if the alpha level is increased, statistical power also increases. The trade-off is that the smaller the threshold for rejecting the null, the more credible the claims made by the researcher (Warner, 2013, p. 108).
Table 7
G*Power output values for a priori power analysis
t tests - Means: Difference between two independent means (two groups)
Analysis: A priori: Compute required sample size
Input: Tail(s) = Two
Effect size d = -0.13
α err prob = 0.05
Power (1-β err prob) = 0.80
Allocation ratio N2/N1 = 1
Output: Noncentrality parameter δ = -2.8033016
Critical t = -1.9612416
Df = 1858
Sample size group 1 = 930
Sample size group 2 = 930
Total sample size = 1860
Actual power = 0.8000757
When the effect size is instead set to a medium value of 0.50, Table 8 and Table 9 show that equal samples of only N₁=64 and N₂=64 would be necessary. The researcher can ensure valid data is obtained in research by being aware of these principles in advance, and compensating during data collection if necessary. For example, if Cohen’s d is small, the sample size needed to reach the desired level of statistical power (80%) can be determined prior to research; the development of tables, like Table 3.3 in Warner (2013), assists in this process (pp. 112-113).
Table 8
G*Power distribution output for a priori power analysis with medium effect size
Table 9
G*Power output values for a priori power analysis with medium effect size
t tests - Means: Difference between two independent means (two groups)
Analysis: A priori: Compute required sample size
Input: Tail(s) = Two
Effect size d = 0.50
α err prob = 0.05
Power (1-β err prob) = 0.80
Allocation ratio N2/N1 = 1
Output: Noncentrality parameter δ = 2.8284271
Critical t = 1.9789706
Df = 126
Sample size group 1 = 64
Sample size group 2 = 64
Total sample size = 128
Actual power = 0.8014596
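The relationship between effect size and required sample size in Tables 7 and 9 can be sketched with a textbook normal approximation. This is an assumption made for illustration and is not G*Power's exact noncentral t computation, so the per-group n may come out slightly lower than the G*Power values. The helper names `n_per_group_approx` and `noncentrality` are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group_approx(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-tailed independent samples t test.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2,
    rounded up. Not the exact noncentral-t solution used by G*Power.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / abs(d)) ** 2)


def noncentrality(d, n1, n2):
    """Noncentrality parameter delta = d * sqrt(n1 * n2 / (n1 + n2))."""
    return d * math.sqrt(n1 * n2 / (n1 + n2))


# Effect sizes and per-group sample sizes reported in Tables 7 and 9.
for d, n_gpower in [(-0.13, 930), (0.50, 64)]:
    approx_n = n_per_group_approx(d)
    delta = noncentrality(d, n_gpower, n_gpower)
    print(f"d = {d}: approx n = {approx_n}, G*Power n = {n_gpower}, delta = {delta:.7f}")
```

For these inputs the approximation gives 929 and 63 per group, one fewer than the 930 and 64 reported by G*Power, while the noncentrality parameters match the tabled values.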
Conclusion
In conducting psychological research, it is imperative that the data collected be not only error-free but also evaluated for statistical significance; the data analysis should also provide sound insight that will answer the research question. Setting alpha levels and threshold values and gauging statistical power are all ways of justifying findings and providing accountability through concrete, standardized values. Conducting an a priori power analysis can help ensure that ample data is collected, that alpha levels are reasonable for the sample size and that the study is able to detect an effect of the expected size.
References
Field, A. (2013). Discovering statistics using IBM SPSS statistics (4th ed.). Thousand Oaks, CA: SAGE Publications, Inc.
George, D., & Mallery, P. (2014). IBM SPSS statistics 21 step by step: A simple guide and reference (13th ed.). Boston, MA: Pearson.
Warner, R. M. (2013). Applied statistics: From bivariate through multivariate techniques (2nd ed.). Los Angeles, CA: SAGE Publications, Inc.