Graphical Analysis Techniques & Examples of Graphical Analysis Techniques in Research


Week 2 Guidance

This week, we look at graphical analysis. We learn how to select the graph that best displays a given type of data, including two-dimensional scatter plots for paired (bivariate) data. The shape of a scatter plot tells us whether the two variables are correlated with one another. If they are highly correlated, then the value of one variable may be used to predict the value of the other. This prediction process involves regression analysis and the construction of a regression equation.

As in week one, we will employ the eight elements of thought (Paul and Elder, 2006) to think critically about these topics. As you think this week, try to discern the purpose of correlation and regression. What questions might we be able to answer? What assumptions must we make? What data do we need? How does our point of view impact our ability to predict? What are the critical ideas or concepts? What conclusions can we draw and what are the consequences or implications?

Bivariate Data in Context

Bivariate data are paired data. The pairing of data does not combine them, but rather associates them according to collection. For example, suppose you collect the height and weight of each player on a high school basketball team. Each player has two measurements that describe different traits. Suppose, for example, that there are only five players:

Height (in inches)    Weight (in pounds)
67                    155
72                    220
77                    240
74                    195
69                    175

If we look at just height or just weight, we might display the data as a bar graph or (for more players) a histogram. If we sorted one column and didn't sort the other, we would unpair the data: the 67-inch-tall person could end up adjacent to the weight of the 240-pound person, for example, even though those values belong to different people. Bivariate data are coupled. In fact, we could also represent the data as a single list of ordered pairs: (67, 155), (72, 220), (77, 240), (74, 195), and (69, 175). The first number in each ordered pair represents height and the second number represents weight.
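The pairing idea can be sketched in a few lines of Python. This is only an illustration using the five players from the table above; the variable names are our own. It shows that sorting the ordered pairs keeps each player's measurements together, while sorting one column alone breaks the pairing.

```python
# Basketball data from the table above.
heights = [67, 72, 77, 74, 69]        # inches
weights = [155, 220, 240, 195, 175]   # pounds

# Pair the measurements so that each tuple describes one player.
players = list(zip(heights, weights))

# Sorting the pairs (by height) keeps each weight with its own player.
sorted_pairs = sorted(players)

# Sorting only the weight column (here, heaviest first) unpairs the data:
# the 67-inch height now sits next to the 240-pound weight.
mismatched = list(zip(heights, sorted(weights, reverse=True)))

print(sorted_pairs)
print(mismatched)
```

Every tuple in the first list still describes a real player, while the second list contains combinations, such as (67, 240), that match no one on the team.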

Bivariate data allow us to look at trends in one variable and determine if there is any relationship with trends in the other variable. Do you think that taller people in general will weigh more? If so, then you are suggesting that there is a positive correlation between height and weight. A small business owner might collect bivariate data for the price of a certain product and the number of units sold on a monthly basis. If price increases, we might expect sales to decrease. When an increase in one variable is associated with a decrease in the paired variable, we refer to the relationship as a negative correlation.

Scatter Diagrams and the Correlation Coefficient

Six Sigma is a set of statistical tools designed to improve business processes by minimizing defects, errors, and variability. On its website, Six Sigma defines scatter plots as follows:

Scatter plots are used with variable data to study possible relationships between two different variables. Even though a scatter plot depicts a relationship between variables, it does not indicate a cause and effect relationship. Use Scatter plots to determine what happens to one variable when another variable changes value. It is a tool used to visually determine whether a potential relationship exists between an input and an outcome.

So a scatter plot, or scatter diagram, is just a two-dimensional plot of the kind you may have drawn in middle school, with one variable on the horizontal axis (x-coordinate) and the other on the vertical axis (y-coordinate). For our basketball data above, each player becomes one point, with height on the horizontal axis and weight on the vertical axis.
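As a rough stand-in for a drawn figure, the sketch below prints a coarse text version of the scatter plot for the five players. The function name `text_scatter` and the grid size are our own choices for illustration; a real analysis would normally use a plotting package or a spreadsheet chart instead.

```python
# Basketball data: (height in inches, weight in pounds).
players = [(67, 155), (72, 220), (77, 240), (74, 195), (69, 175)]

def text_scatter(pairs, rows=6, cols=11):
    """Return a coarse text scatter plot; x runs left to right, y bottom to top."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * cols for _ in range(rows)]
    for x, y in pairs:
        # Scale each value onto the grid, then round to the nearest cell.
        col = round((x - x_min) / (x_max - x_min) * (cols - 1))
        row = round((y - y_min) / (y_max - y_min) * (rows - 1))
        grid[rows - 1 - row][col] = "*"   # grid row 0 is the top of the plot
    return ["|" + "".join(line) for line in grid] + ["+" + "-" * cols]

plot_lines = text_scatter(players)
print("\n".join(plot_lines))
```

The points rise from the lower left (shortest, lightest player) to the upper right (tallest, heaviest player), which is the visual signature of a positive relationship.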

The correlation coefficient, or Pearson's r-value, is a measure of how closely the points in the scatter diagram are modeled by a straight line. The correlation coefficient for any bivariate data will be a number from -1 to +1. Data with an r near -1 are highly correlated in the negative direction, which means there is the inverse relationship discussed in the price and sales example. These data will resemble a negatively sloped line in the scatter diagram, with a pattern that descends from left to right. Data with a correlation coefficient near +1 are highly correlated in the positive direction and resemble a positively sloped line in the scatter plot. Data with a correlation value near 0 (on either side) are not linearly correlated. No line fits much better than any other line, and there is practically no linear association between the values. Non-correlated bivariate data often appear as a round cloud of dots with no discernible direction or pattern.
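To make the r-value concrete, here is a minimal sketch that computes Pearson's r for the basketball data directly from its standard definition: the sum of the products of the deviations from the means, divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` is our own; statistical software performs the same calculation.

```python
from math import sqrt

heights = [67, 72, 77, 74, 69]        # inches (x)
weights = [155, 220, 240, 195, 175]   # pounds (y)

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two sums of squared deviations.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

r = pearson_r(heights, weights)
print(round(r, 3))
```

For these five players r comes out at about 0.90, a strong positive correlation. With so few data points, though, the value should be read as an illustration rather than as evidence about basketball players in general.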

Predictions with Linear Regression

If data are highly correlated, in either the positive or negative direction, then we are able to use information about one variable to make predictions about the value of the correlated variable. Since we use a straight-line approximation for the data, we call this process linear regression. The better our data fit a straight line, the better our predictions using this method. Another way of stating the same principle is that correlations with a coefficient near +1 or -1 are the most reliable as predictive linear models.

The general process for linear regression is as follows:

1. Check the strength of the correlation. As a rule of thumb, regression usually requires an r-value above 0.4 or below -0.4.

2. Use the least squares method to find the equation for the line of best fit. Often this step is completed using a software package such as Minitab, SPSS, a TI calculator, or even Excel. The resulting equation will have the form $\hat{y} = a + bx$, where x is the variable depicted on the horizontal axis (input), a is the y-intercept, b is the slope, and $\hat{y}$ is the output or predicted value for the variable on the vertical axis.

3. Substitute hypothesized values in for x to predict values for y.
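The three steps above can be sketched in Python for the basketball data. This is a minimal illustration, not a replacement for a statistics package: the function name `fit_line` and the hypothetical 70-inch player are our own assumptions, and the 0.4 cutoff is the rule of thumb from step 1.

```python
from math import sqrt

heights = [67, 72, 77, 74, 69]        # inches (x, the input)
weights = [155, 220, 240, 195, 175]   # pounds (y, the value to predict)

def fit_line(xs, ys):
    """Return (intercept, slope, r) for the least squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    slope = s_xy / s_xx                    # least squares slope
    intercept = mean_y - slope * mean_x    # line passes through the point of means
    r = s_xy / sqrt(s_xx * s_yy)           # Pearson correlation coefficient
    return intercept, slope, r

intercept, slope, r = fit_line(heights, weights)

# Step 1: check the strength of the correlation (rule-of-thumb cutoff of 0.4).
strong_enough = abs(r) > 0.4

# Step 2: the fitted line is predicted weight = intercept + slope * height.
# Step 3: substitute a hypothesized height, here an assumed 70-inch player.
predicted_weight = intercept + slope * 70

print(strong_enough, round(slope, 2), round(intercept, 2), round(predicted_weight, 1))
```

For this small data set, the slope is roughly 7.75 pounds per inch and the predicted weight for a 70-inch player is about 183 pounds. Predictions are most trustworthy for heights inside the observed range of 67 to 77 inches.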

Students should be able to: 1. Examine the value of presenting data graphically. 2. Describe guidelines for effectively using graphical tools to present numerical information.

References:

Lind, D. A., Marchal, W. G., & Wathen, S. A. (2017). Statistical techniques in business and economics (17th ed.).

Paul, R., & Elder, L. (2006). The miniature guide to critical thinking: Concepts and tools. Berkeley, CA: The Foundation for Critical Thinking.

Passy. (2012, March 13). Misleading graphs. Retrieved from http://passyworldofmathematics.com/misleading-graphs/

Pearson, K. (1924). The life, letters, and labours of Francis Galton. London: Cambridge University Press.