
ESSENTIALS OF ECONOMETRICS

FOURTH EDITION

Damodar N. Gujarati Professor Emeritus of Economics, United States Military Academy, West Point

Dawn C. Porter University of Southern California

Boston Burr Ridge, IL Dubuque, IA New York San Francisco St. Louis Bangkok Bogotá Caracas Kuala Lumpur Lisbon London Madrid Mexico City Milan Montreal New Delhi Santiago Seoul Singapore Sydney Taipei Toronto

guj75845_fm.qxd 4/16/09 12:48 PM Page i

ESSENTIALS OF ECONOMETRICS Published by McGraw-Hill/Irwin, a business unit of The McGraw-Hill Companies, Inc., 1221 Avenue of the Americas, New York, NY, 10020. Copyright © 2010, 2006, 1999, 1992 by The McGraw-Hill Companies, Inc. All rights reserved. No part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written consent of The McGraw-Hill Companies, Inc., including, but not limited to, in any network or other electronic storage or transmission, or broadcast for distance learning.

Some ancillaries, including electronic and print components, may not be available to customers outside the United States.

This book is printed on acid-free paper.

1 2 3 4 5 6 7 8 9 0 DOC/DOC 0 9

ISBN 978-0-07-337584-7 MHID 0-07-337584-5

Vice president and editor-in-chief: Brent Gordon
Publisher: Douglas Reiner
Director of development: Ann Torbert
Development editor: Anne E. Hilbert
Editorial coordinator: Noelle Fox
Vice president and director of marketing: Robin J. Zwettler
Associate marketing manager: Dean Karampelas
Vice president of editing, design and production: Sesha Bolisetty
Project manager: Kathryn D. Mikulic
Lead production supervisor: Carol A. Bielski
Design coordinator: Joanne Mennemeier
Media project manager: Suresh Babu, Hurix Systems Pvt. Ltd.
Typeface: 10/12 Palatino
Compositor: Macmillan Publishing Solutions
Printer: R. R. Donnelley

Library of Congress Cataloging-in-Publication Data

Gujarati, Damodar N. Essentials of econometrics / Damodar N. Gujarati, Dawn C. Porter.—4th ed.

p. cm. Includes index. ISBN-13: 978-0-07-337584-7 (alk. paper) ISBN-10: 0-07-337584-5 (alk. paper) 1. Econometrics. 2. Economics—Statistical methods. I. Porter, Dawn C. II. Title.

HB139.G85 2010 330.01'5195—dc22

2009010482

www.mhhe.com


For Joan Gujarati, Diane Gujarati-Chesnut, Charles Chesnut, and my grandchildren,

“Tommy” and Laura Chesnut. DNG

For Judy, Lee, Brett, Bryan, Amy, and Autumn Porter. But especially for my adoring father, Terry.

DCP


ABOUT THE AUTHORS

DAMODAR N. GUJARATI After teaching for more than 25 years at the City University of New York and 17 years in the Department of Social Sciences, U.S. Military Academy at West Point, New York, Dr. Gujarati is currently Professor Emeritus of economics at the Academy. Dr. Gujarati received his M.Com. degree from the University of Bombay in 1960, his M.B.A. degree from the University of Chicago in 1963, and his Ph.D. degree from the University of Chicago in 1965. Dr. Gujarati has published extensively in recognized national and international journals, such as the Review of Economics and Statistics, the Economic Journal, the Journal of Financial and Quantitative Analysis, and the Journal of Business. Dr. Gujarati was a member of the board of editors of the Journal of Quantitative Economics, the official journal of the Indian Econometric Society. Dr. Gujarati is also the author of Pensions and the New York City Fiscal Crisis (the American Enterprise Institute, 1978), Government and Business (McGraw-Hill, 1984), and Basic Econometrics (McGraw-Hill, 5th ed., 2009). Dr. Gujarati’s books on econometrics have been translated into several languages.

Dr. Gujarati was a Visiting Professor at the University of Sheffield, U.K. (1970–1971), a Visiting Fulbright Professor to India (1981–1982), a Visiting Professor in the School of Management of the National University of Singapore (1985–1986), and a Visiting Professor of Econometrics, University of New South Wales, Australia (summer of 1988). Dr. Gujarati has lectured extensively on micro- and macroeconomic topics in countries such as Australia, China, Bangladesh, Germany, India, Israel, Mauritius, and the Republic of South Korea.


DAWN C. PORTER Dawn Porter has been an assistant professor in the Information and Operations Management Department at the Marshall School of Business of the University of Southern California since the fall of 2006. She currently teaches undergraduate, M.B.A., and graduate elective statistics courses in the business school. Prior to joining the faculty at USC, from 2001–2006, Dawn was an assistant professor at the McDonough School of Business at Georgetown University and also served as a Visiting Professor in the Psychology Department at the Graduate School of Arts and Sciences at NYU. At NYU she taught a number of advanced statistical methods courses and was also an instructor at the Stern School of Business. Her Ph.D. is from the Stern School in Statistics, and her undergraduate degree is in mathematics from Cornell University.

Dawn’s areas of research interest include categorical analysis, agreement measures, multivariate modeling, and applications to the field of psychology. Her current research examines online auction models from a statistical perspective. She has presented her research at the Joint Statistical Meetings, the Decision Sciences Institute meetings, the International Conference on Information Systems, several universities including the London School of Economics and NYU, and various e-commerce and statistics seminar series. Dawn is also a co-author on Essentials of Business Statistics, 2nd edition and Basic Econometrics, 5th edition, both from McGraw-Hill.

Outside academics, Dawn has been employed as a statistical consultant for KPMG, Inc. She also has worked as a statistical consultant for many other major companies, including Ginnie Mae, Inc.; Toys R Us Corporation; IBM; Cosmaire, Inc; and New York University (NYU) Medical Center.


CONTENTS

PREFACE xix

1 The Nature and Scope of Econometrics 1 1.1 WHAT IS ECONOMETRICS? 1 1.2 WHY STUDY ECONOMETRICS? 2 1.3 THE METHODOLOGY OF ECONOMETRICS 3

Creating a Statement of Theory or Hypothesis 3 Collecting Data 4 Specifying the Mathematical Model of Labor Force Participation 5 Specifying the Statistical, or Econometric, Model of Labor Force

Participation 7 Estimating the Parameters of the Chosen Econometric Model 9 Checking for Model Adequacy: Model Specification Testing 9 Testing the Hypothesis Derived from the Model 11 Using the Model for Prediction or Forecasting 12

1.4 THE ROAD AHEAD 12 KEY TERMS AND CONCEPTS 13 QUESTIONS 14 PROBLEMS 14 APPENDIX 1A: ECONOMIC DATA ON

THE WORLD WIDE WEB 16

PART I THE LINEAR REGRESSION MODEL 19

2 Basic Ideas of Linear Regression: The Two-Variable Model 21 2.1 THE MEANING OF REGRESSION 21 2.2 THE POPULATION REGRESSION FUNCTION (PRF):

A HYPOTHETICAL EXAMPLE 22


2.3 STATISTICAL OR STOCHASTIC SPECIFICATION OF THE POPULATION REGRESSION FUNCTION 25

2.4 THE NATURE OF THE STOCHASTIC ERROR TERM 27 2.5 THE SAMPLE REGRESSION FUNCTION (SRF) 28 2.6 THE SPECIAL MEANING OF THE TERM “LINEAR”

REGRESSION 31 Linearity in the Variables 31 Linearity in the Parameters 32

2.7 TWO-VARIABLE VERSUS MULTIPLE LINEAR REGRESSION 33

2.8 ESTIMATION OF PARAMETERS: THE METHOD OF ORDINARY LEAST SQUARES 33

The Method of Ordinary Least Squares 34 2.9 PUTTING IT ALL TOGETHER 36

Interpretation of the Estimated Math S.A.T. Score Function 37 2.10 SOME ILLUSTRATIVE EXAMPLES 38 2.11 SUMMARY 43

KEY TERMS AND CONCEPTS 44 QUESTIONS 44 PROBLEMS 45 OPTIONAL QUESTIONS 51 APPENDIX 2A: DERIVATION OF LEAST-SQUARES

ESTIMATES 52

3 The Two-Variable Model: Hypothesis Testing 53 3.1 THE CLASSICAL LINEAR REGRESSION MODEL 54 3.2 VARIANCES AND STANDARD ERRORS OF

ORDINARY LEAST SQUARES ESTIMATORS 57 Variances and Standard Errors of the Math S.A.T. Score Example 59 Summary of the Math S.A.T. Score Function 59

3.3 WHY OLS? THE PROPERTIES OF OLS ESTIMATORS 60 Monte Carlo Experiment 61

3.4 THE SAMPLING, OR PROBABILITY, DISTRIBUTIONS OF OLS ESTIMATORS 62

3.5 HYPOTHESIS TESTING 64
Testing $H_0: B_2 = 0$ versus $H_1: B_2 \neq 0$: The Confidence Interval Approach 66
The Test of Significance Approach to Hypothesis Testing 68
Math S.A.T. Example Continued 69

3.6 HOW GOOD IS THE FITTED REGRESSION LINE: THE COEFFICIENT OF DETERMINATION, $r^2$ 71
Formulas to Compute $r^2$ 73
$r^2$ for the Math S.A.T. Example 74
The Coefficient of Correlation, r 74

3.7 REPORTING THE RESULTS OF REGRESSION ANALYSIS 75


3.8 COMPUTER OUTPUT OF THE MATH S.A.T. SCORE EXAMPLE 76

3.9 NORMALITY TESTS 77 Histograms of Residuals 77 Normal Probability Plot 78 Jarque-Bera Test 78

3.10 A CONCLUDING EXAMPLE: RELATIONSHIP BETWEEN WAGES AND PRODUCTIVITY IN THE U.S. BUSINESS SECTOR, 1959–2006 79

3.11 A WORD ABOUT FORECASTING 82 3.12 SUMMARY 85

KEY TERMS AND CONCEPTS 86 QUESTIONS 86 PROBLEMS 88

4 Multiple Regression: Estimation and Hypothesis Testing 93 4.1 THE THREE-VARIABLE LINEAR REGRESSION

MODEL 94 The Meaning of Partial Regression Coefficient 95

4.2 ASSUMPTIONS OF THE MULTIPLE LINEAR REGRESSION MODEL 97

4.3 ESTIMATION OF THE PARAMETERS OF MULTIPLE REGRESSION 99

Ordinary Least Squares Estimators 99 Variance and Standard Errors of OLS Estimators 100 Properties of OLS Estimators of Multiple Regression 102

4.4 GOODNESS OF FIT OF ESTIMATED MULTIPLE REGRESSION: MULTIPLE COEFFICIENT OF DETERMINATION, $R^2$ 102

4.5 ANTIQUE CLOCK AUCTION PRICES REVISITED 103 Interpretation of the Regression Results 103

4.6 HYPOTHESIS TESTING IN A MULTIPLE REGRESSION: GENERAL COMMENTS 104

4.7 TESTING HYPOTHESES ABOUT INDIVIDUAL PARTIAL REGRESSION COEFFICIENTS 105

The Test of Significance Approach 105 The Confidence Interval Approach to Hypothesis Testing 106

4.8 TESTING THE JOINT HYPOTHESIS THAT $B_2 = B_3 = 0$ OR $R^2 = 0$ 107
An Important Relationship between F and $R^2$ 111

4.9 TWO-VARIABLE REGRESSION IN THE CONTEXT OF MULTIPLE REGRESSION: INTRODUCTION TO SPECIFICATION BIAS 112

4.10 COMPARING TWO $R^2$ VALUES: THE ADJUSTED $R^2$ 113


4.11 WHEN TO ADD AN ADDITIONAL EXPLANATORY VARIABLE TO A MODEL 114

4.12 RESTRICTED LEAST SQUARES 116 4.13 ILLUSTRATIVE EXAMPLES 117

Discussion of Regression Results 118 4.14 SUMMARY 122

KEY TERMS AND CONCEPTS 123 QUESTIONS 123 PROBLEMS 125 APPENDIX 4A.1: DERIVATIONS OF OLS ESTIMATORS

GIVEN IN EQUATIONS (4.20) TO (4.22) 129 APPENDIX 4A.2: DERIVATION OF EQUATION (4.31) 129 APPENDIX 4A.3: DERIVATION OF EQUATION (4.50) 130 APPENDIX 4A.4: EVIEWS OUTPUT OF THE

CLOCK AUCTION PRICE EXAMPLE 131

5 Functional Forms of Regression Models 132 5.1 HOW TO MEASURE ELASTICITY: THE LOG-LINEAR

MODEL 133 Hypothesis Testing in Log-Linear Models 137

5.2 COMPARING LINEAR AND LOG-LINEAR REGRESSION MODELS 138

5.3 MULTIPLE LOG-LINEAR REGRESSION MODELS 140 5.4 HOW TO MEASURE THE GROWTH RATE: THE

SEMILOG MODEL 144 Instantaneous versus Compound Rate of Growth 147 The Linear Trend Model 148

5.5 THE LIN-LOG MODEL: WHEN THE EXPLANATORY VARIABLE IS LOGARITHMIC 149

5.6 RECIPROCAL MODELS 150 5.7 POLYNOMIAL REGRESSION MODELS 156 5.8 REGRESSION THROUGH THE ORIGIN 158 5.9 A NOTE ON SCALING AND UNITS OF MEASUREMENT 160 5.10 REGRESSION ON STANDARDIZED VARIABLES 161 5.11 SUMMARY OF FUNCTIONAL FORMS 163 5.12 SUMMARY 164

KEY TERMS AND CONCEPTS 165 QUESTIONS 166 PROBLEMS 167 APPENDIX 5A: LOGARITHMS 175

6 Dummy Variable Regression Models 178 6.1 THE NATURE OF DUMMY VARIABLES 178 6.2 ANCOVA MODELS: REGRESSION ON ONE

QUANTITATIVE VARIABLE AND ONE QUALITATIVE VARIABLE WITH TWO CATEGORIES: EXAMPLE 6.1 REVISITED 185


6.3 REGRESSION ON ONE QUANTITATIVE VARIABLE AND ONE QUALITATIVE VARIABLE WITH MORE THAN TWO CLASSES OR CATEGORIES 187

6.4 REGRESSION ON ONE QUANTITATIVE EXPLANATORY VARIABLE AND MORE THAN ONE QUALITATIVE VARIABLE 190

Interaction Effects 191 A Generalization 192

6.5 COMPARING TWO REGRESSIONS 193 6.6 THE USE OF DUMMY VARIABLES IN SEASONAL

ANALYSIS 198 6.7 WHAT HAPPENS IF THE DEPENDENT VARIABLE IS

ALSO A DUMMY VARIABLE? THE LINEAR PROBABILITY MODEL (LPM) 201

6.8 SUMMARY 204 KEY TERMS AND CONCEPTS 205 QUESTIONS 206 PROBLEMS 207

PART II REGRESSION ANALYSIS IN PRACTICE 217

7 Model Selection: Criteria and Tests 219 7.1 THE ATTRIBUTES OF A GOOD MODEL 220 7.2 TYPES OF SPECIFICATION ERRORS 221 7.3 OMISSION OF RELEVANT VARIABLE BIAS:

“UNDERFITTING” A MODEL 221 7.4 INCLUSION OF IRRELEVANT VARIABLES:

“OVERFITTING” A MODEL 225 7.5 INCORRECT FUNCTIONAL FORM 227 7.6 ERRORS OF MEASUREMENT 229

Errors of Measurement in the Dependent Variable 229 Errors of Measurement in the Explanatory Variable(s) 229

7.7 DETECTING SPECIFICATION ERRORS: TESTS OF SPECIFICATION ERRORS 230

Detecting the Presence of Unnecessary Variables 230 Tests for Omitted Variables and Incorrect Functional Forms 233 Choosing between Linear and Log-linear Regression Models:

The MWD Test 235 Regression Error Specification Test: RESET 237

7.8 SUMMARY 239 KEY TERMS AND CONCEPTS 240 QUESTIONS 240 PROBLEMS 241


8 Multicollinearity: What Happens If Explanatory Variables are Correlated? 245 8.1 THE NATURE OF MULTICOLLINEARITY: THE

CASE OF PERFECT MULTICOLLINEARITY 246 8.2 THE CASE OF NEAR, OR IMPERFECT,

MULTICOLLINEARITY 248 8.3 THEORETICAL CONSEQUENCES OF

MULTICOLLINEARITY 250 8.4 PRACTICAL CONSEQUENCES OF MULTICOLLINEARITY 251 8.5 DETECTION OF MULTICOLLINEARITY 253 8.6 IS MULTICOLLINEARITY NECESSARILY BAD? 258 8.7 AN EXTENDED EXAMPLE: THE DEMAND FOR

CHICKENS IN THE UNITED STATES, 1960 TO 1982 259 Collinearity Diagnostics for the Demand Function for

Chickens (Equation [8.15]) 260 8.8 WHAT TO DO WITH MULTICOLLINEARITY:

REMEDIAL MEASURES 261 Dropping a Variable(s) from the Model 262 Acquiring Additional Data or a New Sample 262 Rethinking the Model 263 Prior Information about Some Parameters 264 Transformation of Variables 265 Other Remedies 266

8.9 SUMMARY 266 KEY TERMS AND CONCEPTS 267 QUESTIONS 267 PROBLEMS 268

9 Heteroscedasticity: What Happens If the Error Variance Is Nonconstant? 274 9.1 THE NATURE OF HETEROSCEDASTICITY 274 9.2 CONSEQUENCES OF HETEROSCEDASTICITY 280 9.3 DETECTION OF HETEROSCEDASTICITY: HOW DO

WE KNOW WHEN THERE IS A HETEROSCEDASTICITY PROBLEM? 282

Nature of the Problem 283 Graphical Examination of Residuals 283 Park Test 285 Glejser Test 287 White’s General Heteroscedasticity Test 289 Other Tests of Heteroscedasticity 290

9.4 WHAT TO DO IF HETEROSCEDASTICITY IS OBSERVED: REMEDIAL MEASURES 291

When $\sigma_i^2$ Is Known: The Method of Weighted Least Squares (WLS) 291
When True $\sigma_i^2$ Is Unknown 292
Respecification of the Model 297


9.5 WHITE’S HETEROSCEDASTICITY-CORRECTED STANDARD ERRORS AND t STATISTICS 298

9.6 SOME CONCRETE EXAMPLES OF HETEROSCEDASTICITY 299

9.7 SUMMARY 302 KEY TERMS AND CONCEPTS 303 QUESTIONS 304 PROBLEMS 304

10 Autocorrelation: What Happens If Error Terms Are Correlated? 312 10.1 THE NATURE OF AUTOCORRELATION 313

Inertia 314 Model Specification Error(s) 315 The Cobweb Phenomenon 315 Data Manipulation 315

10.2 CONSEQUENCES OF AUTOCORRELATION 316 10.3 DETECTING AUTOCORRELATION 317

The Graphical Method 318 The Durbin-Watson d Test 320

10.4 REMEDIAL MEASURES 325

10.5 HOW TO ESTIMATE $\rho$ 327
$\rho = 1$: The First Difference Method 327
$\rho$ Estimated from Durbin-Watson d Statistic 327
$\rho$ Estimated from OLS Residuals, $e_t$ 328
Other Methods of Estimating $\rho$ 328

10.6 A LARGE SAMPLE METHOD OF CORRECTING OLS STANDARD ERRORS: THE NEWEY-WEST (NW) METHOD 332

10.7 SUMMARY 334 KEY TERMS AND CONCEPTS 335 QUESTIONS 335 PROBLEMS 336 APPENDIX 10A: THE RUNS TEST 341 Swed-Eisenhart Critical Runs Test 342 Decision Rule 342 APPENDIX 10B: A GENERAL TEST OF

AUTOCORRELATION: THE BREUSCH-GODFREY (BG) TEST 343

PART III ADVANCED TOPICS IN ECONOMETRICS 345

11 Simultaneous Equation Models 347 11.1 THE NATURE OF SIMULTANEOUS EQUATION MODELS 348 11.2 THE SIMULTANEOUS EQUATION BIAS:

INCONSISTENCY OF OLS ESTIMATORS 350


11.3 THE METHOD OF INDIRECT LEAST SQUARES (ILS) 352 11.4 INDIRECT LEAST SQUARES: AN ILLUSTRATIVE

EXAMPLE 353 11.5 THE IDENTIFICATION PROBLEM: A ROSE BY

ANY OTHER NAME MAY NOT BE A ROSE 355 Underidentification 356 Just or Exact Identification 357 Overidentification 359

11.6 RULES FOR IDENTIFICATION: THE ORDER CONDITION OF IDENTIFICATION 361

11.7 ESTIMATION OF AN OVERIDENTIFIED EQUATION: THE METHOD OF TWO-STAGE LEAST SQUARES 362

11.8 2SLS: A NUMERICAL EXAMPLE 364 11.9 SUMMARY 365

KEY TERMS AND CONCEPTS 366 QUESTIONS 367 PROBLEMS 367 APPENDIX 11A: INCONSISTENCY OF OLS ESTIMATORS 369

12 Selected Topics in Single Equation Regression Models 371 12.1 DYNAMIC ECONOMIC MODELS: AUTOREGRESSIVE AND

DISTRIBUTED LAG MODELS 371 Reasons for Lag 372 Estimation of Distributed Lag Models 374 The Koyck, Adaptive Expectations, and Stock Adjustment Models

Approach to Estimating Distributed Lag Models 377 12.2 THE PHENOMENON OF SPURIOUS REGRESSION:

NONSTATIONARY TIME SERIES 380 12.3 TESTS OF STATIONARITY 382 12.4 COINTEGRATED TIME SERIES 383 12.5 THE RANDOM WALK MODEL 384 12.6 THE LOGIT MODEL 386

Estimation of the Logit Model 390 12.7 SUMMARY 396

KEY TERMS AND CONCEPTS 397 QUESTIONS 397 PROBLEMS 398

INTRODUCTION TO APPENDIXES A, B, C, AND D: BASICS OF PROBABILITY AND STATISTICS 403

Appendix A: Review of Statistics: Probability and Probability Distributions 405

A.1 SOME NOTATION 405 The Summation Notation 405 Properties of the Summation Operator 406


A.2 EXPERIMENT, SAMPLE SPACE, SAMPLE POINT, AND EVENTS 407

Experiment 407 Sample Space or Population 407 Sample Point 408 Events 408 Venn Diagrams 408

A.3 RANDOM VARIABLES 409 A.4 PROBABILITY 410

Probability of an Event: The Classical or A Priori Definition 410 Relative Frequency or Empirical Definition of Probability 411 Probability of Random Variables 417

A.5 RANDOM VARIABLES AND THEIR PROBABILITY DISTRIBUTIONS 417

Probability Distribution of a Discrete Random Variable 417 Probability Distribution of a Continuous Random Variable 419 Cumulative Distribution Function (CDF) 420

A.6 MULTIVARIATE PROBABILITY DENSITY FUNCTIONS 422 Marginal Probability Functions 424 Conditional Probability Functions 425 Statistical Independence 427

A.7 SUMMARY AND CONCLUSIONS 428 KEY TERMS AND CONCEPTS 428 REFERENCES 429 QUESTIONS 429 PROBLEMS 430

Appendix B: Characteristics of Probability Distributions 434 B.1 EXPECTED VALUE: A MEASURE OF CENTRAL

TENDENCY 434 Properties of Expected Value 436 Expected Value of Multivariate Probability Distributions 437

B.2 VARIANCE: A MEASURE OF DISPERSION 438 Properties of Variance 439 Chebyshev’s Inequality 441 Coefficient of Variation 442

B.3 COVARIANCE 443 Properties of Covariance 444

B.4 CORRELATION COEFFICIENT 445 Properties of Correlation Coefficient 445 Variances of Correlated Variables 447

B.5 CONDITIONAL EXPECTATION 447 Conditional Variance 449

B.6 SKEWNESS AND KURTOSIS 449 B.7 FROM THE POPULATION TO THE SAMPLE 452

Sample Mean 452


Sample Variance 453 Sample Covariance 454 Sample Correlation Coefficient 455 Sample Skewness and Kurtosis 456

B.8 SUMMARY 456 KEY TERMS AND CONCEPTS 457 QUESTIONS 457 PROBLEMS 458 OPTIONAL EXERCISES 460

Appendix C: Some Important Probability Distributions 461 C.1 THE NORMAL DISTRIBUTION 462

Properties of the Normal Distribution 462
The Standard Normal Distribution 464
Random Sampling from a Normal Population 468
The Sampling or Probability Distribution of the Sample Mean $\bar{X}$ 468
The Central Limit Theorem (CLT) 472

C.2 THE t DISTRIBUTION 473 Properties of the t Distribution 474

C.3 THE CHI-SQUARE ($\chi^2$) PROBABILITY DISTRIBUTION 477 Properties of the Chi-square Distribution 478

C.4 THE F DISTRIBUTION 480 Properties of the F Distribution 481

C.5 SUMMARY 483 KEY TERMS AND CONCEPTS 483 QUESTIONS 484 PROBLEMS 484

Appendix D: Statistical Inference: Estimation and Hypothesis Testing 487

D.1 THE MEANING OF STATISTICAL INFERENCE 487 D.2 ESTIMATION AND HYPOTHESIS TESTING:

TWIN BRANCHES OF STATISTICAL INFERENCE 489 D.3 ESTIMATION OF PARAMETERS 490 D.4 PROPERTIES OF POINT ESTIMATORS 493

Linearity 494 Unbiasedness 494 Minimum Variance 495 Efficiency 496 Best Linear Unbiased Estimator (BLUE) 497 Consistency 497

D.5 STATISTICAL INFERENCE: HYPOTHESIS TESTING 498 The Confidence Interval Approach to Hypothesis Testing 499 Type I and Type II Errors: A Digression 500 The Test of Significance Approach to Hypothesis Testing 503


A Word on Choosing the Level of Significance, $\alpha$, and the p Value 506
The $\chi^2$ and F Tests of Significance 507

D.6 SUMMARY 510 KEY TERMS AND CONCEPTS 510 QUESTIONS 511 PROBLEMS 512

Appendix E: Statistical Tables 515

Appendix F: Computer Output of EViews, MINITAB, Excel, and STATA 534

SELECTED BIBLIOGRAPHY 541

INDEXES 545 Name Index 545 Subject Index 547


PREFACE

OBJECTIVE OF THE BOOK

As in the previous editions, the primary objective of the fourth edition of Essentials of Econometrics is to provide a user-friendly introduction to econometric theory and techniques. The intended audience is undergraduate economics majors, undergraduate business administration majors, MBA students, and others in social and behavioral sciences where econometric techniques, especially the techniques of linear regression analysis, are used. The book is designed to help students understand econometric techniques through extensive examples, careful explanations, and a wide variety of problem material. In each of the previous editions, I have tried to incorporate major developments in the field in an intuitive and informative way without resorting to matrix algebra, calculus, or statistics beyond the introductory level. The fourth edition continues that tradition.

Although I am in the eighth decade of my life, I have not lost my love for econometrics and I strive to keep up with the major developments in the field. To assist me in this endeavor, I am now happy to have Dr. Dawn Porter, Assistant Professor of Statistics at the Marshall School of Business at the University of Southern California in Los Angeles, as my co-author. Both of us have been deeply involved in bringing the fourth edition of Essentials of Econometrics to fruition.

MAJOR FEATURES OF THE FOURTH EDITION

Before discussing the specific changes in the various chapters, the following features of the new edition are worth noting:

1. In order to streamline topics and jump right into information about linear regression techniques, we have moved the background statistics material (formerly Chapters 2 through 5) to the appendix. This allows for easy reference to more introductory material for those who need it, without disturbing the main content of the text.

2. Practically all the data used in the illustrative examples have been updated from the previous edition.

3. Several new examples have been added.


4. In several chapters, we have included extended concluding examples that illustrate the various points made in the text.

5. Concrete computer printouts of several examples are included in the book. Most of these results are based on EViews (version 6), STATA (version 10), and MINITAB (version 15).

6. Several new diagrams and graphs are included in various chapters.

7. Several new data-based exercises are included throughout the book.

8. Small-sized data are included in the book, but large sample data are posted on the book’s Web site, thereby minimizing the size of the text. The Web site also contains all the data used in the book.

SPECIFIC CHANGES

Some of the chapter-specific changes in the fourth edition are as follows:

Chapter 1: A revised and expanded list of Web sites for economic data has been included.

Chapters 2 and 3: An interesting new data example concerning the relationship between family income and student performance on the S.A.T. is utilized to introduce the two-variable regression model.

Chapter 4: We have included a brief explanation of nonstochastic versus stochastic predictors. An additional example regarding educational expenditures among several countries adds to the explanation of regression hypothesis testing.

Chapter 5: The math S.A.T. example is revisited to illustrate various functional forms. Section 5.10 has been added to handle the topic of regression on standardized variables. Also, several new data exercises have been included.

Chapter 6: An example concerning acceptance rates among top business schools has been added to help illustrate the usefulness of dummy variable regression models. Several new data exercises also have been added.

Chapter 8: Again, we have added several new, updated data exercises dealing with the issue of multicollinearity.

Chapter 9: To illustrate the concept of heteroscedasticity, a new example relating wages to education levels and years of experience has been included, as well as more real data exercises.

Chapter 10: A new section concerning the Newey-West standard error correction method using a data example has been added. Also, a new appendix has been included at the end of the chapter to cover the Breusch-Godfrey test of autocorrelation.

Chapter 12: An expanded treatment of logistic regression has been included in this chapter with new examples to illustrate the results.

Appendixes A–D: As noted above, the material in these appendixes was formerly contained in Chapters 2–5 of the main text. Placed in the back of the book, they can more easily serve as reference sections to the main text. Data examples have been updated, and new exercises have been added.

Besides these specific changes, errors and misprints in the previous editions have been corrected. Also, our discussion of several topics in the various chapters has been streamlined.


MATHEMATICAL REQUIREMENTS

In presenting the various topics, we have used very little matrix algebra or calculus. We firmly believe that econometrics can be taught to the beginner in an intuitive manner, without a heavy dose of matrix algebra or calculus. Also, we have not given any proofs unless they are easily understood. We do not feel that the nonspecialist needs to be burdened with detailed proofs. Of course, the instructor can supply the necessary proofs as the situation demands. Some of the proofs are available in our Basic Econometrics (McGraw-Hill, 5th ed., 2009).

SUPPLEMENTS AID THE PROBLEM SOLVING APPROACH

The comprehensive Web site for the fourth edition contains the following supplementary material to assist both instructors and students:

• Data from the text, as well as additional large set data referenced in the book.

• A Solutions Manual with answers to all of the questions and problems throughout the text, for instructors to use as they wish.

• A digital image library containing all of the graphs and tables from the book.

For more information, please visit the Online Learning Center at www.mhhe.com/gujaratiess4e.

COMPUTERS AND ECONOMETRICS

It cannot be overemphasized that what has made econometrics accessible to the beginner is the availability of several user-friendly computer statistical packages. The illustrative problems in this book are solved using statistical software packages, such as EViews, Excel, MINITAB, and STATA. Student versions of some of these packages are readily available. The data posted on the Web site is in Excel format and can also be read easily by many standard statistical packages such as LIMDEP, RATS, SAS, and SPSS.

In Appendix F we show the outputs of EViews, Excel, MINITAB, and STATA, using a common data set. Each of these software packages has some unique features, although some of the statistical routines are quite similar.

IN CLOSING

To sum up, in writing Essentials of Econometrics, our primary objective has been to introduce the wonderful world of econometrics to the beginner in a relaxed but informative style. We hope the knowledge gained from this book will prove to be of lasting value in the reader’s future academic or professional career and that this knowledge can be further widened by reading some advanced and specialized books in econometrics. Some of these books can be found in the selected bibliography given at the end of the book.


ACKNOWLEDGMENTS

Our foremost thanks are to the following reviewers who made very valuable suggestions to improve the quality of the book.

Michael Allison, University of Missouri, St. Louis
Giles Bootheway, Saint Bonaventure University
Bruce Brown, California State Polytechnic University, Pomona
Kristin Butcher, Wellesley College
Juan Cabrera, Queens College
Tom Chen, Saint John’s University
Joanne Doyle, James Madison University
Barry Falk, Iowa State University
Eric Furstenberg, University of Virginia, Charlottesville
Steffen Habermalz, Northwestern University
Susan He, Washington State University, Pullman
Jerome Heavey, Lafayette College
George Jakubson, Cornell University
Elia Kacapyr, Ithaca College
Janet Kohlhase, University of Houston
Maria Kozhevnikova, Queens College
John Krieg, Western Washington University
William Latham, University of Delaware
Jinman Lee, University of Illinois, Chicago
Stephen LeRoy, University of California, Santa Barbara
Dandan Liu, Bowling Green State University
Fabio Milani, University of California, Irvine
Hillar Neumann, Northern State University
Jennifer Rice, Eastern Michigan University
Steven Stageberg, University of Mary Washington
Joseph Sulock, University of North Carolina, Asheville
Mark Tendall, Stanford University
Christopher Warburton, John Jay College
Tiemen Woutersen, Johns Hopkins University

We are very grateful to Douglas Reiner, our publisher at McGraw-Hill, for helping us through this edition of the book. We are also grateful to Noelle Fox, editorial coordinator at McGraw-Hill, for working with us through all of our setbacks. We also need to acknowledge the project management provided by Manjot Singh Dodi, and the great copy editing by Ann Sass, especially since this type of textbook incorporates so many technical formulas and symbols.

Damodar N. Gujarati United States Military Academy, West Point

Dawn C. Porter University of Southern California, Los Angeles


CHAPTER 1

THE NATURE AND SCOPE OF ECONOMETRICS

Research in economics, finance, management, marketing, and related disciplines is becoming increasingly quantitative. Beginning students in these fields are encouraged, if not required, to take a course or two in econometrics, a field of study that has become quite popular. This chapter gives the beginner an overview of what econometrics is all about.

1.1 WHAT IS ECONOMETRICS?

Simply stated, econometrics means economic measurement. Although quantitative measurement of economic concepts such as the gross domestic product (GDP), unemployment, inflation, imports, and exports is very important, the scope of econometrics is much broader, as can be seen from the following definitions:

Econometrics may be defined as the social science in which the tools of economic theory, mathematics, and statistical inference are applied to the analysis of economic phenomena.1

Econometrics, the result of a certain outlook on the role of economics, consists of the application of mathematical statistics to economic data to lend empirical support to the models constructed by mathematical economics and to obtain numerical results.2

1Arthur S. Goldberger, Econometric Theory, Wiley, New York, 1964, p. 1.

2P. A. Samuelson, T. C. Koopmans, and J. R. N. Stone, “Report of the Evaluative Committee for Econometrica,” Econometrica, vol. 22, no. 2, April 1954, pp. 141–146.

guj75845_ch01.qxd 4/16/09 10:07 AM Page 1

1.2 WHY STUDY ECONOMETRICS?

As the preceding definitions suggest, econometrics makes use of economic theory, mathematical economics, economic statistics (i.e., economic data), and mathematical statistics. Yet, it is a subject that deserves to be studied in its own right for the following reasons.

Economic theory makes statements or hypotheses that are mostly qualitative in nature. For example, microeconomic theory states that, other things remaining the same (the famous ceteris paribus clause of economics), an increase in the price of a commodity is expected to decrease the quantity demanded of that commodity. Thus, economic theory postulates a negative or inverse relationship between the price and quantity demanded of a commodity; this is the widely known law of downward-sloping demand or simply the law of demand. But the theory itself does not provide any numerical measure of the strength of the relationship between the two; that is, it does not tell by how much the quantity demanded will go up or down as a result of a certain change in the price of the commodity. It is the econometrician’s job to provide such numerical estimates. Econometrics gives empirical (i.e., based on observation or experiment) content to most economic theory. If we find in a study or experiment that when the price of a unit increases by a dollar the quantity demanded goes down by, say, 100 units, we have not only confirmed the law of demand, but in the process we have also provided a numerical estimate of the relationship between the two variables, price and quantity.

The main concern of mathematical economics is to express economic theory in mathematical form or equations (or models) without regard to measurability or empirical verification of the theory. Econometrics, as noted earlier, is primarily interested in the empirical verification of economic theory. As we will show shortly, the econometrician often uses mathematical models proposed by the mathematical economist but puts these models in forms that lend themselves to empirical testing.

Economic statistics is mainly concerned with collecting, processing, and presenting economic data in the form of charts, diagrams, and tables. This is the economic statistician’s job. He or she collects data on the GDP, employment, unemployment, prices, etc. These data constitute the raw data for econometric work. But the economic statistician does not go any further because he or she is not primarily concerned with using the collected data to test economic theories.

Although mathematical statistics provides many of the tools employed in the trade, the econometrician often needs special methods because of the unique nature of most economic data, namely, that the data are not usually generated as the result of a controlled experiment. The econometrician, like the meteorologist, generally depends on data that cannot be controlled directly. Thus, data on consumption, income, investments, savings, prices, etc., which are collected by public and private agencies, are nonexperimental in nature. The econometrician takes these data as given. This creates special problems not normally dealt with in mathematical statistics. Moreover, such data are likely to contain errors of measurement, of either omission or commission, and the econometrician may be called upon to develop special methods of analysis to deal with such errors of measurement.

For students majoring in economics and business there is a pragmatic reason for studying econometrics. After graduation, in their employment, they may be called upon to forecast sales, interest rates, and money supply or to estimate demand and supply functions or price elasticities for products. Quite often, economists appear as expert witnesses before federal and state regulatory agencies on behalf of their clients or the public at large. Thus, an economist appearing before a state regulatory commission that controls prices of gas and electricity may be required to assess the impact of a proposed price increase on the quantity demanded of electricity before the commission will approve the price increase. In situations like this the economist may need to develop a demand function for electricity for this purpose. Such a demand function may enable the economist to estimate the price elasticity of demand, that is, the percentage change in the quantity demanded for a percentage change in the price. Knowledge of econometrics is very helpful in estimating such demand functions.
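As a minimal sketch of the elasticity calculation just described, the following Python snippet divides the percentage change in quantity demanded by the percentage change in price. The function name and the numbers (a price rise from $10 to $11 and a fall in quantity from 1,000 to 950 units) are hypothetical and chosen only for illustration; they are not taken from any actual demand study.

```python
def price_elasticity(p0, p1, q0, q1):
    """Percentage change in quantity divided by percentage change in price,
    both measured relative to the starting values (p0, q0)."""
    pct_change_q = (q1 - q0) / q0
    pct_change_p = (p1 - p0) / p0
    return pct_change_q / pct_change_p


# Hypothetical illustration: price rises 10 percent, quantity falls 5 percent.
elasticity = price_elasticity(p0=10.0, p1=11.0, q0=1000.0, q1=950.0)
print(round(elasticity, 2))
```

With these made-up figures the elasticity is about -0.5: a 1 percent price increase goes with roughly a 0.5 percent fall in quantity demanded. In practice the quantities on the right-hand side would come from an estimated demand function rather than from two assumed points.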

It is fair to say that econometrics has become an integral part of training in economics and business.

1.3 THE METHODOLOGY OF ECONOMETRICS

How does one actually do an econometric study? Broadly speaking, econometric analysis proceeds along the following lines.

1. Creating a statement of theory or hypothesis.
2. Collecting data.
3. Specifying the mathematical model of theory.
4. Specifying the statistical, or econometric, model of theory.
5. Estimating the parameters of the chosen econometric model.
6. Checking for model adequacy: Model specification testing.
7. Testing the hypothesis derived from the model.
8. Using the model for prediction or forecasting.

To illustrate the methodology, consider this question: Do economic conditions affect people’s decisions to enter the labor force, that is, their willingness to work? As a measure of economic conditions, suppose we use the unemployment rate (UNR), and as a measure of labor force participation we use the labor force participation rate (LFPR). Data on UNR and LFPR are regularly published by the government. So to answer the question we proceed as follows.

Creating a Statement of Theory or Hypothesis

The starting point is to find out what economic theory has to say on the subject you want to study. In labor economics, there are two rival hypotheses about the effect of economic conditions on people’s willingness to work. The discouraged-worker hypothesis (effect) states that when economic conditions worsen, as reflected in a higher unemployment rate, many unemployed workers give up hope of finding a job and drop out of the labor force. On the other hand, the added-worker hypothesis (effect) maintains that when economic conditions worsen, many secondary workers who are not currently in the labor market (e.g., mothers with children) may decide to join the labor force if the main breadwinner in the family loses his or her job. Even if the jobs these secondary workers get are low paying, the earnings will make up some of the loss in income suffered by the primary breadwinner.

Whether, on balance, the labor force participation rate will increase or decrease will depend on the relative strengths of the added-worker and discouraged-worker effects. If the added-worker effect dominates, LFPR will increase even when the unemployment rate is high. Contrarily, if the discouraged-worker effect dominates, LFPR will decrease. How do we find this out? This now becomes our empirical question.

Collecting Data

For empirical purposes, therefore, we need quantitative information on the two variables. There are three types of data that are generally available for empirical analysis.

1. Time series.
2. Cross-sectional.
3. Pooled (a combination of time series and cross-sectional).

Time series data are collected over a period of time, such as the data on GDP, employment, unemployment, money supply, or government deficits. Such data may be collected at regular intervals: daily (e.g., stock prices), weekly (e.g., money supply), monthly (e.g., the unemployment rate), quarterly (e.g., GDP), or annually (e.g., government budget). These data may be quantitative in nature (e.g., prices, income, money supply) or qualitative (e.g., male or female, employed or unemployed, married or unmarried, white or black). As we will show, qualitative variables, also called dummy or categorical variables, can be every bit as important as quantitative variables.

Cross-sectional data are data on one or more variables collected at one point in time, such as the census of population conducted by the U.S. Census Bureau every 10 years (the most recent was on April 1, 2000); the surveys of consumer expenditures conducted by the University of Michigan; and the opinion polls such as those conducted by Gallup, Harris, and other polling organizations.

In pooled data we have elements of both time series and cross-sectional data. For example, if we collect data on the unemployment rate for 10 countries for a period of 20 years, the data will constitute an example of pooled data: data on the unemployment rate for each country for the 20-year period will form time series data, whereas data on the unemployment rate for the 10 countries for any single year will be cross-sectional data. In pooled data we will have 200 observations: 20 annual observations for each of the 10 countries.
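The structure of the pooled data set in this example can be sketched in a few lines of Python. The country labels and the particular 20-year window below are hypothetical placeholders, and the unemployment rates themselves are left out; the point is only how the time series and cross-sectional slices sit inside the pooled sample.

```python
from itertools import product

# Hypothetical labels: 10 countries observed annually over 20 years.
countries = [f"country_{i}" for i in range(1, 11)]
years = list(range(1988, 2008))

# Each observation in the pooled data set is a (country, year) pair;
# the unemployment rates themselves are omitted from this sketch.
pooled = list(product(countries, years))

# Time series: one country followed across all years.
time_series = [obs for obs in pooled if obs[0] == "country_1"]

# Cross-section: all countries in a single year.
cross_section = [obs for obs in pooled if obs[1] == 2007]

print(len(pooled), len(time_series), len(cross_section))
```

The three counts printed are 200, 20, and 10, matching the 200 pooled observations, the 20 annual observations per country, and the 10 countries in any one year.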


There is a special type of pooled data, panel data, also called longitudinal or micropanel data, in which the same cross-sectional unit, say, a family or firm, is surveyed over time. For example, the U.S. Department of Commerce conducts a census of housing at periodic intervals. At each periodic survey the same household (or the people living at the same address) is interviewed to find out if there has been any change in the housing and financial conditions of that household since the last survey. The panel data that result from repeatedly interviewing the same household at periodic intervals provide very useful information on the dynamics of household behavior.

Sources of Data

A word is in order regarding data sources. The success of any econometric study hinges on the quality, as well as the quantity, of data. Fortunately, the Internet has opened up a veritable wealth of data. In Appendix 1A we give addresses of several Web sites that have all kinds of microeconomic and macroeconomic data. Students should be familiar with such sources of data, as well as how to access or download them. Of course, these data are continually updated so the reader may find the latest available data.

For our analysis, we obtained the time series data shown in Table 1-1. This table gives data on the civilian labor force participation rate (CLFPR) and the civilian unemployment rate (CUNR), defined as the number of civilians unemployed as a percentage of the civilian labor force, for the United States for the period 1980–2007.3

Unlike physical sciences, most data collected in economics (e.g., GDP, money supply, Dow Jones index, car sales) are nonexperimental in that the data-collecting agency (e.g., government) may not have any direct control over the data. Thus, the data on labor force participation and unemployment are based on the information provided to the government by participants in the labor market. In a sense, the government is a passive collector of these data and may not be aware of the added- or discouraged-worker hypotheses, or any other hypothesis, for that matter. Therefore, the collected data may be the result of several factors affecting the labor force participation decision made by the individual person. That is, the same data may be compatible with more than one theory.

Specifying the Mathematical Model of Labor Force Participation

To see how CLFPR behaves in relation to CUNR, the first thing we should do is plot the data for these variables in a scatter diagram, or scattergram, as shown in Figure 1-1.

TABLE 1-1
U.S. CIVILIAN LABOR FORCE PARTICIPATION RATE (CLFPR), CIVILIAN UNEMPLOYMENT RATE (CUNR), AND REAL AVERAGE HOURLY EARNINGS (AHE82)* FOR THE YEARS 1980–2007

Year    CLFPR (%)    CUNR (%)    AHE82 ($)
1980    63.8         7.1         8.00
1981    63.9         7.6         7.89
1982    64.0         9.7         7.87
1983    64.0         9.6         7.96
1984    64.4         7.5         7.96
1985    64.8         7.2         7.92
1986    65.3         7.0         7.97
1987    65.6         6.2         7.87
1988    65.9         5.5         7.82
1989    66.5         5.3         7.75
1990    66.5         5.6         7.66
1991    66.2         6.8         7.59
1992    66.4         7.5         7.55
1993    66.3         6.9         7.54
1994    66.6         6.1         7.54
1995    66.6         5.6         7.54
1996    66.8         5.4         7.57
1997    67.1         4.9         7.69
1998    67.1         4.5         7.89
1999    67.1         4.2         8.01
2000    67.1         4.0         8.04
2001    66.8         4.7         8.12
2002    66.6         5.8         8.25
2003    66.2         6.0         8.28
2004    66.0         5.5         8.24
2005    66.0         5.1         8.18
2006    66.2         4.6         8.24
2007    66.0         4.6         8.32

*AHE82 represents average hourly earnings in 1982 dollars.
Source: Economic Report of the President, 2008, CLFPR from Table B-40, CUNR from Table B-43, and AHE82 from Table B-47.

The scattergram shows that CLFPR and CUNR are inversely related, perhaps suggesting that, on balance, the discouraged-worker effect is stronger than the added-worker effect.4 As a first approximation, we can draw a straight line through the scatter points and write the relationship between CLFPR and CUNR by the following simple mathematical model:

$$\text{CLFPR} = B_1 + B_2\,\text{CUNR} \tag{1.1}$$

Equation (1.1) states that CLFPR is linearly related to CUNR. B1 and B2 are known as the parameters of the linear function.5 B1 is also known as the intercept; it gives the value of CLFPR when CUNR is zero.6 B2 is known as the slope. The slope measures the rate of change in CLFPR for a unit change in CUNR, or more generally, the rate of change in the value of the variable on the left-hand side of the equation for a unit change in the value of the variable on the right-hand side. The slope coefficient B2 can be positive (if the added-worker effect dominates the discouraged-worker effect) or negative (if the discouraged-worker effect dominates the added-worker effect). Figure 1-1 suggests that in the present case it is negative.

3We consider here only the aggregate CLFPR and CUNR, but data are available by age, sex, and ethnic composition.

4On this, see Shelly Lundberg, “The Added Worker Effect,” Journal of Labor Economics, vol. 3, January 1985, pp. 11–37.

5Broadly speaking, a parameter is an unknown quantity that may vary over a certain set of values. In statistics a probability distribution function (PDF) of a random variable is often characterized by its parameters, such as its mean and variance. This topic is discussed in greater detail in Appendixes A and B.
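To make the roles of the intercept and slope in Eq. (1.1) concrete, here is a minimal Python sketch of the deterministic linear model. The parameter values B1 = 70 and B2 = -0.5 are hypothetical round numbers picked for illustration only; they are not estimates from Table 1-1.

```python
def clfpr_line(cunr, b1, b2):
    """Deterministic linear model of Eq. (1.1): CLFPR = B1 + B2 * CUNR."""
    return b1 + b2 * cunr


# Hypothetical parameter values, for illustration only (not estimates).
B1, B2 = 70.0, -0.5

# Intercept: the value of CLFPR when CUNR is zero.
intercept = clfpr_line(0.0, B1, B2)

# Slope: the change in CLFPR for a one-unit rise in CUNR.
slope = clfpr_line(6.0, B1, B2) - clfpr_line(5.0, B1, B2)

print(intercept, slope)
```

Evaluating the line at CUNR = 0 returns the intercept, and the difference between the values at any two CUNR levels one unit apart returns the slope, whatever the starting level.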

Specifying the Statistical, or Econometric, Model of Labor Force Participation

The purely mathematical model of the relationship between CLFPR and CUNR given in Eq. (1.1), although of prime interest to the mathematical economist, is of limited appeal to the econometrician, for such a model assumes an exact, or deterministic, relationship between the two variables; that is, for a given CUNR, there is a unique value of CLFPR. In reality, one rarely finds such neat relationships between economic variables. Most often, the relationships are inexact, or statistical, in nature.

This is seen clearly from the scattergram given in Figure 1-1. Although the two variables are inversely related, the relationship between them is not perfectly or exactly linear, for if we draw a straight line through the 28 data points, not all the data points will lie exactly on that straight line. Recall that to draw a straight line we need only two points.7 Why don’t the 28 data points lie exactly on the straight line specified by the mathematical model, Eq. (1.1)? Remember that our data on labor force and unemployment are nonexperimentally collected. Therefore, as noted earlier, besides the added- and discouraged-worker hypotheses, there may be other forces affecting labor force participation decisions. As a result, the observed relationship between CLFPR and CUNR is likely to be imprecise.

FIGURE 1-1 Regression plot for civilian labor force participation rate (%) and civilian unemployment rate (%). [Fitted line plot; vertical axis: CLFPR (%), horizontal axis: CUNR (%).]

6In Chapter 2 we give a more precise interpretation of the intercept in the context of regression analysis.

7We even tried to fit a parabola to the scatter points given in Fig. 1-1, but the results were not materially different from the linear specification.

Let us allow for the influence of all other variables affecting CLFPR in a catchall variable u and write Eq. (1.2) as follows:

$$\text{CLFPR} = B_1 + B_2\,\text{CUNR} + u \tag{1.2}$$

where u represents the random error term, or simply the error term.8 We let u represent all those forces (besides CUNR) that affect CLFPR but are not explicitly introduced in the model, as well as purely random forces. As we will see in Part II, the error term distinguishes econometrics from purely mathematical economics.

Equation (1.2) is an example of a statistical, or empirical or econometric, model. More precisely, it is an example of what is known as a linear regression model, which is a prime subject of this book. In such a model, the variable appearing on the left-hand side of the equation is called the dependent variable, and the variable on the right-hand side is called the independent, or explanatory, variable. In linear regression analysis our primary objective is to explain the behavior of one variable (the dependent variable) in relation to the behavior of one or more other variables (the explanatory variables), allowing for the fact that the relationship between them is inexact.

Notice that the econometric model, Eq. (1.2), is derived from the mathematical model, Eq. (1.1), which shows that mathematical economics and econometrics are mutually complementary disciplines. This is clearly reflected in the definition of econometrics given at the outset.

Before proceeding further, a warning regarding causation is in order. In the regression model, Eq. (1.2), we have stated that CLFPR is the dependent variable and CUNR is the independent, or explanatory, variable. Does that mean that the two variables are causally related; that is, is CUNR the cause and CLFPR the effect? In other words, does regression imply causation? Not necessarily. As Kendall and Stuart note, “A statistical relationship, however strong and however suggestive, can never establish causal connection: our ideas of causation must come from outside statistics, ultimately from some theory or other.”9 In our example, it is up to economic theory (e.g., the discouraged-worker hypothesis) to establish the cause-and-effect relationship, if any, between the dependent and explanatory variables. If causality cannot be established, it is better to call the relationship, Eq. (1.2), a predictive relationship: Given CUNR, can we predict CLFPR?

8In statistical lingo, the random error term is known as the stochastic error term.

9M. G. Kendall and A. Stuart, The Advanced Theory of Statistics, Charles Griffin Publishers, New York, 1961, vol. 2, Chap. 26, p. 279.

Estimating the Parameters of the Chosen Econometric Model

Given the data on CLFPR and CUNR, such as that in Table 1-1, how do we estimate the parameters of the model, Eq. (1.2), namely, B1 and B2? That is, how do we find the numerical values (i.e., estimates) of these parameters? This will be the focus of our attention in Part II, where we develop the appropriate methods of computation, especially the method of ordinary least squares (OLS). Using OLS and the data given in Table 1-1, we obtained the following results:

$$\widehat{\text{CLFPR}} = 69.4620 - 0.5814\,\text{CUNR} \tag{1.3}$$

Note that we have put the symbol ^ on CLFPR (read as “CLFPR hat”) to remind us that Eq. (1.3) is an estimate of Eq. (1.2). The estimated regression line is shown in Figure 1-1, along with the actual data points.

As Eq. (1.3) shows, the estimated value of B1 is ≈ 69.5 and that of B2 is ≈ –0.58, where the symbol ≈ means approximately. Thus, if the unemployment rate goes up by one unit (i.e., one percentage point), ceteris paribus, CLFPR is expected to decrease on the average by about 0.58 percentage points; that is, as economic conditions worsen, on average, there is a net decrease in the labor force participation rate of about 0.58 percentage points, perhaps suggesting that the discouraged-worker effect dominates. We say “on the average” because the presence of the error term u, as noted earlier, is likely to make the relationship somewhat imprecise. This is vividly seen in Figure 1-1 where the points not on the estimated regression line are the actual participation rates and the (vertical) distance between them and the points on the regression line are the estimated u’s. As we will see in Chapter 2, the estimated u’s are called residuals. In short, the estimated regression line, Eq. (1.3), gives the relationship between average CLFPR and CUNR; that is, on average how CLFPR responds to a unit change in CUNR. The value of about 69.5 suggests that the average value of CLFPR will be about 69.5 percent if the CUNR is zero; that is, about 69.5 percent of the civilian working-age population will participate in the labor force if there is full employment (i.e., zero unemployment).10

10This is, however, a mechanical interpretation of the intercept. We will see in Chapter 2 how to interpret the intercept term meaningfully in a given context.

Checking for Model Adequacy: Model Specification Testing

How adequate is our model, Eq. (1.3)? It is true that a person will take into account labor market conditions as measured by, say, the unemployment rate before entering the labor market. For example, in 1982 (a recession year) the civilian unemployment rate was about 9.7 percent. Compared to that, in 2001 it was only 4.7 percent. A person is more likely to be discouraged from entering the labor market when the unemployment rate is more than 9 percent than when it is 5 percent. But there are other factors that also enter into labor force participation decisions. For example, hourly wages, or earnings, prevailing in the labor market also will be an important decision variable. In the short run at least, a higher wage may attract more workers to the labor market, other things remaining the same (ceteris paribus). To see its importance, in Table 1-1 we have also given data on real average hourly earnings (AHE82), where real earnings are measured in 1982 dollars. To take into account the influence of AHE82, we now consider the following model:

$$\text{CLFPR} = B_1 + B_2\,\text{CUNR} + B_3\,\text{AHE82} + u \tag{1.4}$$

Equation (1.4) is an example of a multiple linear regression model, in contrast to Eq. (1.2), which is an example of a simple (two-variable or bivariate) linear regression model. In the two-variable model there is a single explanatory variable, whereas in a multiple regression there are several, or multiple, explanatory variables. Notice that in the multiple regression, Eq. (1.4), we also have included the error term, u, for no matter how many explanatory variables one introduces in the model, one cannot fully explain the behavior of the dependent variable. How many variables one introduces in the multiple regression is a decision that the researcher will have to make in a given situation. Of course, the underlying economic theory will often tell what these variables might be. However, keep in mind the warning given earlier that regression does not mean causation; the relevant theory must determine whether one or more explanatory variables are causally related to the dependent variable.

How do we estimate the parameters of the multiple regression, Eq. (1.4)? We cover this topic in Chapter 4, after we discuss the two-variable model in Chapters 2 and 3. We consider the two-variable case first because it is the building block of the multiple regression model. As we shall see in Chapter 4, the multiple regression model is in many ways a straightforward extension of the two-variable model.

For our illustrative example, the empirical counterpart of Eq. (1.4) is as follows (these results are based on OLS):

$$\widehat{\text{CLFPR}} = 81.2267 - 0.6384\,\text{CUNR} - 1.4449\,\text{AHE82} \tag{1.5}$$

These results are interesting because both the slope coefficients are negative. The negative coefficient of CUNR suggests that, ceteris paribus (i.e., holding the influence of AHE82 constant), a one-percentage-point increase in the unemployment rate leads, on average, to about a 0.64-percentage-point decrease in CLFPR, perhaps once again supporting the discouraged-worker hypothesis. On the other hand, holding the influence of CUNR constant, an increase in real average hourly earnings of one dollar, on average, leads to about a 1.44-percentage-point decline in CLFPR.11 Does the negative coefficient for AHE82 make economic sense? Would one not expect a positive coefficient: the higher the hourly earnings, the higher the attraction of the labor market? However, one could justify the negative coefficient by recalling the twin concepts of microeconomics, namely, the income effect and the substitution effect.12

11As we will discuss in Chapter 4, the coefficients of CUNR and AHE82 given in Eq. (1.5) are known as partial regression coefficients. In that chapter we will discuss the precise meaning of partial regression coefficients.

Which model do we choose, Eq. (1.3) or Eq. (1.5)? Since Eq. (1.5) encompasses Eq. (1.3) and since it adds an additional dimension (earnings) to the analysis, we may choose Eq. (1.5). After all, Eq. (1.2) was based implicitly on the assumption that variables other than the unemployment rate were held constant. But where do we stop? For example, labor force participation may also depend on family wealth, number of children under age 6 (this is especially critical for married women thinking of joining the labor market), availability of day-care centers for young children, religious beliefs, availability of welfare benefits, unemployment insurance, and so on. Even if data on these variables are available, we may not want to introduce them all in the model because the purpose of developing an econometric model is not to capture total reality, but just its salient features. If we decide to include every conceivable variable in the regression model, the model will be so unwieldy that it will be of little practical use. The model ultimately chosen should be a reasonably good replica of the underlying reality. In Chapter 7, we will discuss this question further and find out how one can go about developing a model.
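For readers who want to see how a three-variable model such as Eq. (1.4) can be fitted, the following Python sketch applies OLS to the Table 1-1 data by solving the two normal equations in deviation form. The formulas are the standard ones for a regression with two explanatory variables (the text defers them to Chapter 4), and the helper names `mean` and `cross_dev` are our own. The output can be set against Eq. (1.5); small differences in the later decimals are possible if the published estimates were computed from less rounded data.

```python
# Table 1-1, United States, 1980-2007.
cunr = [7.1, 7.6, 9.7, 9.6, 7.5, 7.2, 7.0, 6.2, 5.5, 5.3, 5.6, 6.8, 7.5, 6.9,
        6.1, 5.6, 5.4, 4.9, 4.5, 4.2, 4.0, 4.7, 5.8, 6.0, 5.5, 5.1, 4.6, 4.6]
clfpr = [63.8, 63.9, 64.0, 64.0, 64.4, 64.8, 65.3, 65.6, 65.9, 66.5, 66.5,
         66.2, 66.4, 66.3, 66.6, 66.6, 66.8, 67.1, 67.1, 67.1, 67.1, 66.8,
         66.6, 66.2, 66.0, 66.0, 66.2, 66.0]
ahe82 = [8.00, 7.89, 7.87, 7.96, 7.96, 7.92, 7.97, 7.87, 7.82, 7.75, 7.66,
         7.59, 7.55, 7.54, 7.54, 7.54, 7.57, 7.69, 7.89, 8.01, 8.04, 8.12,
         8.25, 8.28, 8.24, 8.18, 8.24, 8.32]


def mean(values):
    return sum(values) / len(values)


def cross_dev(a, b):
    """Sum of cross-products of deviations from the means of a and b."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


# Deviation-form sums for y = CLFPR, x = CUNR, z = AHE82.
s_xx = cross_dev(cunr, cunr)
s_zz = cross_dev(ahe82, ahe82)
s_xz = cross_dev(cunr, ahe82)
s_xy = cross_dev(cunr, clfpr)
s_zy = cross_dev(ahe82, clfpr)

# OLS with two explanatory variables: solve the two normal equations
#   s_xx * b2 + s_xz * b3 = s_xy
#   s_xz * b2 + s_zz * b3 = s_zy
det = s_xx * s_zz - s_xz ** 2
b2_hat = (s_xy * s_zz - s_zy * s_xz) / det
b3_hat = (s_zy * s_xx - s_xy * s_xz) / det
b1_hat = mean(clfpr) - b2_hat * mean(cunr) - b3_hat * mean(ahe82)

# Residuals: actual CLFPR minus fitted CLFPR.
residuals = [y - (b1_hat + b2_hat * x + b3_hat * z)
             for x, y, z in zip(cunr, clfpr, ahe82)]

print(round(b1_hat, 4), round(b2_hat, 4), round(b3_hat, 4))
```

Dropping AHE82 from this calculation reduces it to the two-variable case, which is one way to see that Eq. (1.5) encompasses Eq. (1.3).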

Testing the Hypothesis Derived from the Model

Having finally settled on a model, we may want to perform hypothesis testing. That is, we may want to find out whether the estimated model makes economic sense and whether the results obtained conform with the underlying economic theory. For example, the discouraged-worker hypothesis postulates a negative relationship between labor force participation and the unemployment rate. Is this hypothesis borne out by our results? Our statistical results seem to be in conformity with this hypothesis because the estimated coefficient of CUNR is negative.

However, hypothesis testing can be complicated. In our illustrative example, suppose someone told us that in a prior study the coefficient of CUNR was found to be about –1. Are our results in agreement? If we rely on the model, Eq. (1.3), we might get one answer; but if we rely on Eq. (1.5), we might get another answer. How do we resolve this question? Although we will develop the necessary tools to answer such questions, we should keep in mind that the answer to a particular hypothesis may depend on the model we finally choose.

The point worth remembering is that in regression analysis we may be interested not only in estimating the parameters of the regression model but also in testing certain hypotheses suggested by economic theory and/or prior empirical experience.

12Consult any standard textbook on microeconomics. One intuitive justification of this result is as follows. Suppose both spouses are in the labor force and the earnings of one spouse rise substantially. This may prompt the other spouse to withdraw from the labor force without substantially affecting the family income.

Using the Model for Prediction or Forecasting

Having gone through this multistage procedure, you can legitimately ask the question: What do we do with the estimated model, such as Eq. (1.5)? Quite naturally, we would like to use it for prediction, or forecasting. For instance, suppose we have 2008 data on the CUNR and AHE82. Assume these values are 6.0 and 10, respectively. If we put these values in Eq. (1.5), we obtain 62.9473 percent as the predicted value of CLFPR for 2008. That is, if the unemployment rate in 2008 were 6.0 percent and the real hourly earnings were $10, the civilian labor force participation rate for 2008 would be about 63 percent. Of course, when data on CLFPR for 2008 actually become available, we can compare the predicted value with the actual value. The discrepancy between the two will represent the prediction error. Naturally, we would like to keep the prediction error as small as possible. Whether this is always possible is a question that we will answer in Chapters 2 and 3.
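The prediction above is simple arithmetic on Eq. (1.5), and it can be sketched in Python as follows. The coefficients are those reported in Eq. (1.5) and the inputs are the assumed 2008 values from the text; the function names are our own, and the prediction-error helper is included only to show how the comparison would be made once an actual 2008 figure is in hand.

```python
# Coefficients as reported in Eq. (1.5).
B1_HAT, B2_HAT, B3_HAT = 81.2267, -0.6384, -1.4449


def predict_clfpr(cunr, ahe82):
    """Predicted CLFPR (percent) from the estimated regression, Eq. (1.5)."""
    return B1_HAT + B2_HAT * cunr + B3_HAT * ahe82


# Assumed 2008 values used in the text: CUNR = 6.0 percent, AHE82 = $10.
predicted = predict_clfpr(6.0, 10.0)
print(round(predicted, 4))


def prediction_error(actual, predicted_value):
    """Actual minus predicted; usable once the actual 2008 CLFPR is known."""
    return actual - predicted_value
```

The printed value is 62.9473, that is, 81.2267 - 0.6384(6.0) - 1.4449(10), which agrees with the figure quoted in the text.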

Let us now summarize the steps involved in econometric analysis.

Step                                Example
1. Statement of theory              The added-/discouraged-worker hypothesis
2. Collection of data               Table 1-1
3. Mathematical model of theory     CLFPR = B1 + B2CUNR
4. Econometric model of theory      CLFPR = B1 + B2CUNR + u
5. Parameter estimation             CLFPR = 69.462 - 0.5814CUNR
6. Model adequacy check             CLFPR = 81.3 - 0.638CUNR - 1.445AHE82
7. Hypothesis test                  B2 < 0 or B2 > 0
8. Prediction                       What is CLFPR, given values of CUNR and AHE82?
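Steps 2 and 5 of this summary can be sketched in Python using nothing beyond the CUNR and CLFPR columns of Table 1-1. The slope and intercept formulas used here are the standard two-variable OLS formulas in deviation form, which the text develops later; they are stated without derivation, and the variable names are our own.

```python
# Table 1-1: CUNR (%) and CLFPR (%), United States, 1980-2007.
cunr = [7.1, 7.6, 9.7, 9.6, 7.5, 7.2, 7.0, 6.2, 5.5, 5.3, 5.6, 6.8, 7.5, 6.9,
        6.1, 5.6, 5.4, 4.9, 4.5, 4.2, 4.0, 4.7, 5.8, 6.0, 5.5, 5.1, 4.6, 4.6]
clfpr = [63.8, 63.9, 64.0, 64.0, 64.4, 64.8, 65.3, 65.6, 65.9, 66.5, 66.5,
         66.2, 66.4, 66.3, 66.6, 66.6, 66.8, 67.1, 67.1, 67.1, 67.1, 66.8,
         66.6, 66.2, 66.0, 66.0, 66.2, 66.0]

n = len(cunr)
x_bar = sum(cunr) / n
y_bar = sum(clfpr) / n

# Two-variable OLS formulas in deviation form.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(cunr, clfpr))
s_xx = sum((x - x_bar) ** 2 for x in cunr)
b2_hat = s_xy / s_xx               # slope estimate
b1_hat = y_bar - b2_hat * x_bar    # intercept estimate

# Residuals: vertical distances between actual and fitted CLFPR.
residuals = [y - (b1_hat + b2_hat * x) for x, y in zip(cunr, clfpr)]

print(round(b1_hat, 2), round(b2_hat, 2))
```

The estimates come out at about 69.46 for the intercept and about -0.58 for the slope, in line with Eq. (1.3), and the residuals sum to zero apart from rounding error, as they must when the regression includes an intercept.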

Although we examined econometric methodology using an example from labor economics, we should point out that a similar procedure can be employed to analyze quantitative relationships between variables in any field of knowledge. As a matter of fact, regression analysis has been used in politics, international relations, psychology, sociology, meteorology, and many other disciplines.

1.4 THE ROAD AHEAD

Now that we have provided a glimpse of the nature and scope of econometrics, let us see what lies ahead. The book is divided into four parts.

Appendixes A, B, C, and D review the basics of probability and statistics for the benefit of those readers whose knowledge of statistics has become rusty. The reader should have some previous background in introductory statistics.

Part I introduces the reader to the bread-and-butter tool of econometrics, namely, the classical linear regression model (CLRM). A thorough understanding of CLRM is a must in order to follow research in the general areas of economics and business.

Part II considers the practical aspects of regression analysis and discusses a variety of problems that the practitioner will have to tackle when one or more assumptions of the CLRM do not hold.


Part III discusses two comparatively advanced topics: simultaneous equation regression models and time series econometrics.

This book keeps the needs of the beginner in mind. The discussion of most topics is straightforward and unencumbered with mathematical proofs, derivations, etc.13 We firmly believe that the apparently forbidding subject of econometrics can be taught to beginners in such a way that they can see the value of the subject without getting bogged down in mathematical and statistical minutiae. The student should keep in mind that an introductory econometrics course is just like the introductory statistics course he or she has already taken. As in statistics, econometrics is primarily about estimation and hypothesis testing. What is different, and generally much more interesting and useful, is that the parameters being estimated or tested are not just means and variances, but relationships between variables, which is what much of economics and other social sciences is all about.

A final word: The availability of comparatively inexpensive computer software packages has now made econometrics readily accessible to beginners. In this book we will largely use four software packages: EViews, Excel, STATA, and MINITAB. These packages are readily available and widely used. Once students get used to using such packages, they will soon realize that learning econometrics is really great fun, and they will have a better appreciation of the much maligned “dismal” science of economics.

KEY TERMS AND CONCEPTS

The key terms and concepts introduced in this chapter are

Econometrics
Mathematical economics
Discouraged-worker hypothesis (effect)
Added-worker hypothesis (effect)
Time series data
   a) quantitative
   b) qualitative
Cross-sectional data
Pooled data
Panel (or longitudinal or micropanel data)
Scatter diagram (scattergram)
Linear regression model:
   a) parameters
   b) intercept
   c) slope
Random error term (error term)
Dependent variable
Independent (or explanatory) variable
Causation
Parameter estimates
Hypothesis testing
Prediction (forecasting)

13Some of the proofs and derivations are presented in our Basic Econometrics, 5th ed., McGraw-Hill, New York, 2009.


QUESTIONS

1.1. Suppose a local government decides to increase the tax rate on residential properties under its jurisdiction. What will be the effect of this on the prices of residential houses? Follow the eight-step procedure discussed in the text to answer this question.

1.2. How do you perceive the role of econometrics in decision making in business and economics?

1.3. Suppose you are an economic adviser to the Chairman of the Federal Reserve Board (the Fed), and he asks you whether it is advisable to increase the money supply to bolster the economy. What factors would you take into account in your advice? How would you use econometrics in your advice?

1.4. To reduce the dependence on foreign oil supplies, the government is thinking of increasing the federal taxes on gasoline. Suppose the Ford Motor Company has hired you to assess the impact of the tax increase on the demand for its cars. How would you go about advising the company?

1.5. Suppose the president of the United States is thinking of imposing tariffs on imported steel to protect the interests of the domestic steel industry. As an economic adviser to the president, what would be your recommendations? How would you set up an econometric study to assess the consequences of imposing the tariff?

PROBLEMS

1.6. Table 1-2 gives data on the Consumer Price Index (CPI), S&P 500 stock index, and three-month Treasury bill rate for the United States for the years 1980–2007.
a. Plot these data with time on the horizontal axis and the three variables on the vertical axis. If you prefer, you may use a separate figure for each variable.
b. What relationships do you expect to find between the CPI and the S&P index and between the CPI and the three-month Treasury bill rate? Why?
c. For each variable, “eyeball” a regression line from the scattergram.


TABLE 1-2 CONSUMER PRICE INDEX (CPI, 1982–1984 = 100), STANDARD AND POOR’S COMPOSITE INDEX (S&P 500, 1941–1943 = 100), AND THREE-MONTH TREASURY BILL RATE (3-m T BILL, %)

Year     CPI     S&P 500    3-m T bill
1980    82.4      118.78      12.00
1981    90.9      128.05      14.00
1982    96.5      119.71      11.00
1983    99.6      160.41       8.63
1984   103.9      160.46       9.58
1985   107.6      186.84       7.48
1986   109.6      236.34       5.98
1987   113.6      286.83       5.82
1988   118.3      265.79       6.69
1989   124.0      322.84       8.12
1990   130.7      334.59       7.51
1991   136.2      376.18       5.42
1992   140.3      415.74       3.45
1993   144.5      451.41       3.02
1994   148.2      460.42       4.29
1995   152.4      541.72       5.51
1996   156.9      670.50       5.02
1997   160.5      873.43       5.07
1998   163.0    1,085.50       4.81
1999   166.6    1,327.33       4.66
2000   172.2    1,427.22       5.85
2001   177.1    1,194.18       3.45
2002   179.9      993.94       1.62
2003   184.0      965.23       1.02
2004   188.9    1,130.65       1.38
2005   195.3    1,207.23       3.16
2006   201.6    1,310.46       4.73
2007   207.3    1,477.19       4.41

Source: Economic Report of the President, 2008, Tables B-60, B-95, B-96, and B-74, respectively.


1.7. Table 1-3 gives you data on the exchange rate between the U.K. pound and the U.S. dollar (number of U.K. pounds per U.S. dollar) as well as the consumer price indexes in the two countries for the period 1985–2007.
a. Plot the exchange rate (ER) and the two consumer price indexes against time, measured in years.
b. Divide the U.S. CPI by the U.K. CPI and call it the relative price ratio (RPR).
c. Plot ER against RPR.
d. Visually sketch a regression line through the scatterpoints.

1.8. Table 1-4 on the textbook Web site contains data on 1247 cars from 2008.14 Is there a strong relationship between a car’s MPG (miles per gallon) and the number of cylinders it has?
a. Create a scatterplot of the combined MPG for the vehicles based on the number of cylinders.
b. Sketch a straight line that seems to fit the data.
c. What type of relationship is indicated by the plot?

TABLE 1-3 EXCHANGE RATE BETWEEN U.K. POUND AND U.S. DOLLAR AND THE CPI IN THE UNITED STATES AND THE U.K., 1985–2007

Period    £ / $    CPI U.S.   CPI U.K.
1985     1.2974     107.6      111.1
1986     1.4677     109.6      114.9
1987     1.6398     113.6      119.7
1988     1.7813     118.3      125.6
1989     1.6382     124.0      135.4
1990     1.7841     130.7      148.2
1991     1.7674     136.2      156.9
1992     1.7663     140.3      162.7
1993     1.5016     144.5      165.3
1994     1.5319     148.2      169.3
1995     1.5785     152.4      175.2
1996     1.5607     156.9      179.4
1997     1.6376     160.5      185.1
1998     1.6573     163.0      191.4
1999     1.6172     166.6      194.3
2000     1.5156     172.2      200.1
2001     1.4396     177.1      203.6
2002     1.5025     179.9      207.0
2003     1.6347     184.0      213.0
2004     1.8330     188.9      219.4
2005     1.8204     195.3      225.6
2006     1.8434     201.6      232.8
2007     2.0020     207.3      242.7

Source: Economic Report of the President, 2008. U.K. Pound/ $ from Table B-110; CPI (1982–1984 = 100) from Table B-108.
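As a starting point for part b of Problem 1.7, the relative price ratio is a single division per year. The following Python sketch is only an illustration, not a prescribed solution method. It uses three of the rows of Table 1-3, and the function name `relative_price_ratio` is hypothetical.

```python
# (year, exchange rate, U.S. CPI, U.K. CPI) for three of the years in Table 1-3.
rows = [
    (1985, 1.2974, 107.6, 111.1),
    (1996, 1.5607, 156.9, 179.4),
    (2007, 2.0020, 207.3, 242.7),
]


def relative_price_ratio(cpi_us, cpi_uk):
    """RPR as defined in part b: U.S. CPI divided by U.K. CPI."""
    return cpi_us / cpi_uk


# Pair each year's exchange rate (ER) with its RPR, ready for plotting.
er_vs_rpr = [(year, er, relative_price_ratio(us, uk)) for year, er, us, uk in rows]
for year, er, rpr in er_vs_rpr:
    print(year, er, round(rpr, 4))
```

Extending `rows` to all 23 years gives the (RPR, ER) pairs needed for the plot in part c.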

14Data were collected from the United States Department of Energy Web site at http://www.fueleconomy.gov/.


APPENDIX 1A: Economic Data on the World Wide Web15

Economic Statistics Briefing Room: An excellent source of data on output, income, employment, unemployment, earnings, production and business activity, prices and money, credits and security markets, and international statistics. http://www.whitehouse.gov/fsbr/esbr.htm

Federal Reserve System Beige Book: Gives a summary of current economic conditions by the Federal Reserve District. There are 12 Federal Reserve Districts. www.federalreserve.gov/FOMC/BeigeBook/2008

National Bureau of Economic Research (NBER) Home Page: This highly regarded private economic research institute has extensive data on asset prices, labor, productivity, money supply, business cycle indicators, etc. NBER has many links to other Web sites. http://www.nber.org

Panel Study: Provides data on longitudinal survey of representative sample of U.S. individuals and families. These data have been collected annually since 1968. http://www.umich.edu/~psid

The Federal Web Locator: Provides information on almost every sector of the federal government; has international links. www.lib.auburn.edu/madd/docs/fedloc.html

WebEC: WWW Resources in Economics: A most comprehensive library of economic facts and figures. www.helsinki.fi/WebEc

American Stock Exchange: Information on some 700 companies listed on the second largest stock market. http://www.amex.com/

Bureau of Economic Analysis (BEA) Home Page: This agency of the U.S. Department of Commerce, which publishes the Survey of Current Business, is an excellent source of data on all kinds of economic activities. www.bea.gov

Business Cycle Indicators: You will find data on about 256 economic time series. http://www.globalexposure.com/bci.html

CIA Publication: You will find the World Fact Book (annual). www.cia.gov/library/publications

Energy Information Administration (Department of Energy [DOE]): Economic information and data on each fuel category. http://www.eia.doe.gov/

FRED Database: Federal Reserve Bank of St. Louis publishes historical economic and social data, which include interest rates, monetary and business indicators, exchange rates, etc. http://www.stls.frb.org/fred/

15It should be noted that this list is by no means exhaustive. The sources listed here are updated continually. The best way to get information on the Internet is to search using a key word (e.g., unemployment rate). Don’t be surprised if you get a plethora of information on the topic you search.


International Trade Administration: Offers many Web links to trade statistics, cross-country programs, etc. http://www.ita.doc.gov/

STAT-USA Databases: The National Trade Data Bank provides the most comprehensive source of international trade data and export promotion information. It also contains extensive data on demographic, political, and socioeconomic conditions for several countries. http://www.stat-usa.gov/

Bureau of Labor Statistics: The home page contains data related to various aspects of employment, unemployment, and earnings and provides links to other statistical Web sites. http://stats.bls.gov

U.S. Census Bureau Home Page: Prime source of social, demographic, and economic data on income, employment, income distribution, and poverty. http://www.census.gov/

General Social Survey: Annual personal interview survey data on U.S. households that began in 1972. More than 35,000 have responded to some 2500 different questions covering a variety of data. www.norc.org/GCS+Website

Institute for Research on Poverty: Data collected by nonpartisan and nonprofit university-based research center on a variety of questions relating to poverty and social inequality. http://www.ssc.wisc.edu/irp/

Social Security Administration: The official Web site of the Social Security Administration with a variety of data. http://www.ssa.gov

Federal Deposit Insurance Corporation, Bank Data and Statistics: http://www.fdic.gov/bank/statistical/

Federal Reserve Board, Economic Research and Data: http://www.federalreserve.gov/econresdata

U.S. Census Bureau, Home Page: http://www.census.gov

U.S. Department of Energy, Energy Information Administration: www.eia.doe.gov/overview_hd.html

U.S. Department of Health and Human Services, National Center for Health Statistics: http://www.cdc.gov/nchs

U.S. Department of Housing and Urban Development, Data Sets: http://www.huduser.org/datasets/pdrdatas.html

U.S. Department of Labor, Bureau of Labor Statistics: http://www.bls.gov

U.S. Department of Transportation, TranStats: http://www.transtats.bts.gov

U.S. Department of the Treasury, Internal Revenue Service, Tax Statistics: http://www.irs.gov/taxstats


Rockefeller Institute of Government, State and Local Fiscal Data: www.rockinst.org/research/sl_finance

American Economic Association, Resources for Economists: http://www.rfe.org

American Statistical Association, Business and Economic Statistics: www.amstat.org/publications/jbes

American Statistical Association, Statistics in Sports: http://www.amstat.org/sections/sis/

European Central Bank, Statistics: http://www.ecb.int/stats

World Bank, Data and Statistics: http://www.worldbank.org/data

International Monetary Fund, Statistical Topics: http://www.imf.org/external/np/sta/

Penn World Tables: http://pwt.econ.upenn.edu

Current Population Survey: http://www.bls.census.gov/cps/

Consumer Expenditure Survey: http://www.bls.gov/cex/

Survey of Consumer Finances: http://www.federalreserve.gov/pubs/oss/

City and County Data Book: http://www.census.gov/statab/www/ccdb.html

Panel Study of Income Dynamics: http://psidonline.isr.umich.edu

National Longitudinal Surveys: http://www.bls.gov/nls/

National Association of Home Builders, Economic and Housing Data: http://www.nahb.org/page.aspx/category/sectionID=113

National Science Foundation, Division of Science Resources Statistics: http://www.nsf.gov/sbe/srs/

Economic Report of the President: http://www.gpoaccess.gov/eop/

Various Economic Data Sets: http://www.economy.com/freelunch/

The Economist Market Indicators: http://www.economist.com/markets/indicators

Statistical Resources on the Military: http://www.lib.umich.edu/govdocs/stmil.html

World Economic Indicators: http://devdata.worldbank.org/

Economic Time Series Data: http://www.economagic.com/


PART I THE LINEAR REGRESSION MODEL

The objective of Part I, which consists of five chapters, is to introduce you to the “bread-and-butter” tool of econometrics, namely, the linear regression model.

Chapter 2 discusses the basic ideas of linear regression in terms of the simplest possible linear regression model, in particular, the two-variable model. We make an important distinction between the population regression model and the sample regression model and estimate the former from the latter. This estimation is done using the method of least squares, one of the popular methods of estimation.

Chapter 3 considers hypothesis testing. As in any hypothesis testing in statistics, we try to find out whether the estimated values of the parameters of the regression model are compatible with the hypothesized values of the parameters. We do this hypothesis testing in the context of the classical linear regression model (CLRM). We discuss why the CLRM is used and point out that the CLRM is a useful starting point. In Part II we will reexamine the assumptions of the CLRM to see what happens to the CLRM if one or more of its assumptions are not fulfilled.

Chapter 4 extends the idea of the two-variable linear regression model developed in the previous two chapters to multiple regression models, that is, models having more than one explanatory variable. Although in many ways the multiple regression model is an extension of the two-variable model, there are differences when it comes to interpreting the coefficients of the model and in the hypothesis-testing procedure.

The linear regression model, whether two-variable or multivariable, only requires that the parameters of the model be linear; the variables entering the model need not themselves be linear. Chapter 5 considers a variety of models that are linear in the parameters (or can be made so) but are not necessarily linear in the variables. With several illustrative examples, we point out how and where such models can be used.

Often the explanatory variables entering into a regression model are qualitative in nature, such as sex, race, and religion. Chapter 6 shows how such variables can be measured and how they enrich the linear regression model by taking into account the influence of variables that otherwise cannot be quantified.

Part I makes an effort to “wed” practice to theory. The availability of user-friendly regression packages allows you to estimate a regression model without knowing much theory, but remember the adage that “a little knowledge is a dangerous thing.” So even though theory may be boring, it is absolutely essential in understanding and interpreting regression results. Besides, by omitting all mathematical derivations, we have made the theory “less boring.”


CHAPTER 2 BASIC IDEAS OF LINEAR REGRESSION: THE TWO-VARIABLE MODEL

In Chapter 1 we noted that in developing a model of an economic phenomenon (e.g., the law of demand) econometricians make heavy use of a statistical technique known as regression analysis. The purpose of this chapter and Chapter 3 is to introduce the basics of regression analysis in terms of the simplest possible linear regression model, namely, the two-variable model. Subsequent chapters will consider various modifications and extensions of the two-variable model.

2.1 THE MEANING OF REGRESSION

As noted in Chapter 1, regression analysis is concerned with the study of the relationship between one variable called the explained, or dependent, variable and one or more other variables called independent, or explanatory, variables.

Thus, we may be interested in studying how the quantity demanded of a commodity is related to the price of that commodity, the income of the consumer, and the prices of other commodities competing with this commodity. Or, we may be interested in finding out how sales of a product (e.g., automobiles) are related to advertising expenditure incurred on that product. Or, we may be interested in finding out how defense expenditures vary in relation to the gross domestic product (GDP). In all these examples there may be some underlying theory that specifies why we would expect one variable to be dependent on, or related to, one or more other variables. In the first example, the law of demand provides the rationale for the dependence of the quantity demanded of a product on its own price and several other variables previously mentioned.

For notational uniformity, from here on we will let Y represent the dependent variable and X the independent, or explanatory, variable. If there is more than one explanatory variable, we will show the various X’s by the appropriate subscripts (X1, X2, X3, etc.).

It is very important to bear in mind the warning given in Chapter 1 that, although regression analysis deals with the relationship between a dependent variable and one or more independent variables, it does not necessarily imply causation; that is, it does not necessarily mean that the independent variables are the cause and the dependent variable is the effect. If causality between the two exists, it must be justified on the basis of some (economic) theory. As noted earlier, the law of demand suggests that if all other variables are held constant, the quantity demanded of a commodity is (inversely) dependent on its own price. Here microeconomic theory suggests that the price may be the causal force and the quantity demanded the effect. Always keep in mind that regression does not necessarily imply causation. Causality must be justified, or inferred, from the theory that underlies the phenomenon that is tested empirically.

Regression analysis may have one of the following objectives:

1. To estimate the mean, or average, value of the dependent variable, given the values of the independent variables.

2. To test hypotheses about the nature of the dependence, that is, hypotheses suggested by the underlying economic theory. For example, in the demand function mentioned previously, we may want to test the hypothesis that the price elasticity of demand is, say, –1.0; that is, the demand curve has unitary price elasticity. If the price of the commodity goes up by 1 percent, the quantity demanded on the average goes down by 1 percent, assuming all other factors affecting demand are held constant.

3. To predict, or forecast, the mean value of the dependent variable, given the value(s) of the independent variable(s) beyond the sample range. Thus, in the S.A.T. example discussed in Appendix C, we may wish to predict the average score on the critical reasoning part of the S.A.T. for a group of students who know their scores on the math part of the test (see Table 2-15).

4. One or more of the preceding objectives combined.

2.2 THE POPULATION REGRESSION FUNCTION (PRF): A HYPOTHETICAL EXAMPLE

To illustrate what all this means, we will consider a concrete example. In the last two years of high school, most American teenagers take the S.A.T. college entrance examination. The test consists of three sections: critical reasoning (formerly called the verbal section), mathematics, and an essay portion, each scored on a scale of 0 to 800. Since the essay portion is more difficult to score, we will focus primarily on the mathematics section. Suppose we are interested in finding out whether a student’s family income is related to how well students score on the mathematics section of the test. Let Y represent the math S.A.T. score and X represent annual family income. The income variable has been broken into 10 classes: (< $10,000), ($10,000–$20,000), ($20,000–$30,000), . . . , ($80,000–$100,000), and (> $100,000). For simplicity, we have used the midpoint of each class, estimating the last class midpoint at $150,000, for the analysis. Assume that a hypothetical population of 100 high school students is reported in Table 2-1.

Table 2-1 can be interpreted as follows: For an annual family income of $5,000, one student scored a 460 on the math section of the S.A.T. Nine other students had similar family incomes, and their scores, together with that of the first student, averaged 452. For a family income of $15,000, one student scored a 480 on the section, and the average of the 10 students in that income bracket was 475. The remaining columns are similar.

A scattergram of these data is shown in Figure 2-1. For this graph, the horizontal axis represents annual family income and the vertical axis represents the students’ math S.A.T. scores. For each income level, there are several S.A.T. scores; in fact, in this instance there are 10 recorded scores.1 The points connected with the line are the mean values for each income level. It seems as though there is a general, overall upward trend in the math scores; higher income levels tend to be associated with higher math scores. This is especially evident with the connected open circles, representing the average scores per income level. These connected circles are formally called the conditional mean or conditional expected values (see Appendix B for details). Since we have assumed the data represent the population of score values, the line connecting the conditional means is called the population regression line (PRL). The PRL gives the average, or mean, value of the dependent variable (math S.A.T. scores in this example) corresponding to each value of the independent variable (here, annual family income) in the population as a whole. Thus, corresponding to an annual income of $25,000, the average math S.A.T. score is 478, whereas corresponding to an annual income of $45,000, the average math S.A.T. score is 496. In short, the PRL tells us how the mean, or average, value of Y (or any dependent variable) is related to each value of X (or any independent variable) in the whole population.

TABLE 2-1 MATHEMATICS S.A.T. SCORES IN RELATION TO ANNUAL FAMILY INCOME

Math S.A.T. Scores
                                        Family Income
Student  $5,000  $15,000  $25,000  $35,000  $45,000  $55,000  $65,000  $75,000  $90,000  $150,000
1           460      480      460      520      500      450      560      530      560       570
2           470      510      450      510      470      540      480      540      500       560
3           460      450      530      440      450      460      530      540      470       540
4           420      420      430      540      530      480      520      500      570       550
5           440      430      520      490      550      530      510      480      580       560
6           500      450      490      460      510      480      550      580      480       510
7           420      510      440      460      530      510      480      560      530       520
8           410      500      480      520      440      540      500      490      520       520
9           450      480      510      490      510      510      520      560      540       590
10          490      520      470      450      470      550      470      500      550       600
Mean        452      475      478      488      496      505      512      528      530       552

1For simplicity, we are assuming there are 10 scores for each income level. In reality, there may be a very large number of scores for each X (income) value, and each income level need not have the same number of observations.
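The conditional means that make up the PRL can be reproduced directly from Table 2-1. The following Python sketch is an illustration added here, not the authors’ procedure. It stores the table by income column (that is, transposed relative to the printed layout), and the names `scores_by_income` and `conditional_means` are hypothetical.

```python
# Table 2-1, one list of 10 math S.A.T. scores per income midpoint (dollars).
scores_by_income = {
    5_000:   [460, 470, 460, 420, 440, 500, 420, 410, 450, 490],
    15_000:  [480, 510, 450, 420, 430, 450, 510, 500, 480, 520],
    25_000:  [460, 450, 530, 430, 520, 490, 440, 480, 510, 470],
    35_000:  [520, 510, 440, 540, 490, 460, 460, 520, 490, 450],
    45_000:  [500, 470, 450, 530, 550, 510, 530, 440, 510, 470],
    55_000:  [450, 540, 460, 480, 530, 480, 510, 540, 510, 550],
    65_000:  [560, 480, 530, 520, 510, 550, 480, 500, 520, 470],
    75_000:  [530, 540, 540, 500, 480, 580, 560, 490, 560, 500],
    90_000:  [560, 500, 470, 570, 580, 480, 530, 520, 540, 550],
    150_000: [570, 560, 540, 550, 560, 510, 520, 520, 590, 600],
}


def conditional_means(data):
    """Return {X: mean of Y given X}, i.e., the points on the PRL."""
    return {x: sum(ys) / len(ys) for x, ys in data.items()}


prl_points = conditional_means(scores_by_income)
for income, mean_score in prl_points.items():
    print(f"E(Y | X = ${income:,}) = {mean_score:.0f}")
```

The printed values match the Mean row of Table 2-1 (452 at $5,000 through 552 at $150,000), which is what the connected circles in Figure 2-1 represent.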

Since the PRL in Figure 2-1 is approximately linear, we can express it mathematically in the following functional form:

E(Y|Xi) = B1 + B2Xi    (2.1)

which is the mathematical equation of a straight line. In Equation (2.1), E(Y|Xi) means the mean, or expected value, of Y corresponding to, or conditional upon, a given value of X. The subscript i refers to the ith subpopulation. Thus, in Table 2-1, E(Y|Xi = 5000) is 452, which is the mean, or expected, value of Y in the first subpopulation (i.e., corresponding to X = $5000).

The last row of Table 2-1 gives the conditional mean values of Y. It is very important to note that E(Y|Xi) is a function of Xi (linear in the present example). This means that the dependence of Y on X, technically called the regression of Y on X, can be defined simply as the mean of the distribution of Y values (as in Table 2-1), which has the given X. In other words, the population regression line (PRL) is a line that passes through the conditional means of Y. The mathematical form in which the PRL is expressed, such as Eq. (2.1), is called the population regression function (PRF), as it represents the regression line in the population as a whole. In the present instance the PRF is linear. (The more technical meaning of linearity is discussed in Section 2.6.)

FIGURE 2-1 Annual family income ($) and math S.A.T. score. [Scattergram: horizontal axis, Annual Family Income ($); vertical axis, Math S.A.T. Score.]

In Eq. (2.1), B1 and B2 are called the parameters, also known as the regression coefficients. B1 is also known as the intercept (coefficient) and B2 as the slope (coefficient). The slope coefficient measures the rate of change in the (conditional) mean value of Y per unit change in X. If, for example, the slope coefficient (B2) were 0.001, it would suggest that if annual family income were to increase by a dollar, the (conditional) mean value of Y would increase by 0.001 points. Because of the scale of the variables, it is easier to interpret the results for a one-thousand-dollar increase in annual family income; for each one-thousand-dollar increase in annual family income, we would expect to see a 1 point increase in the (conditional) mean value of the math S.A.T. score. B1 is the (conditional) mean value of Y if X is zero; it gives the average value of the math S.A.T. score if the annual family income were zero. We will have more to say about this interpretation of the intercept later in the chapter.
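The rescaling from a one-dollar to a one-thousand-dollar change follows from the linearity of Eq. (2.1): the intercept cancels when two conditional means are differenced. As a worked line, using the hypothetical slope value of 0.001 from the paragraph above and writing \Delta X for the change in income (a symbol introduced here only for illustration):

```latex
\Delta E(Y \mid X_i)
  = \bigl[B_1 + B_2 (X_i + \Delta X)\bigr] - \bigl[B_1 + B_2 X_i\bigr]
  = B_2 \, \Delta X
  = 0.001 \times 1000
  = 1 \text{ point}
```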

How do we go about finding the estimates, or numerical values, of the intercept and slope coefficients? We explore this in Section 2.8.

Before moving on, a word about terminology is in order. Since in regression analysis, as noted in Chapter 1, we are concerned with examining the behavior of the dependent variable conditional upon the given values of the independent vari- able(s), our approach to regression analysis can be termed conditional regression analysis.2 As a result, there is no need to use the adjective “conditional” all the time. Therefore, in the future expressions like will be simply written as E (Y ), with the explicit understanding that the latter in fact stands for the former. Of course, where there is cause for confusion, we will use the more extended notation.

2.3 STATISTICAL OR STOCHASTIC SPECIFICATION OF THE POPULATION REGRESSION FUNCTION

As we just discussed, the PRF gives the average value of the dependent variable corresponding to each value of the independent variable. Let us take another look at Table 2-1. We know, for example, that corresponding to X = $75,000, the average Y is 528 points. But if we pick one student at random from the 10 students corresponding to this income, we know that the math S.A.T. score for that stu- dent will not necessarily be equal to the mean value of 528. To be concrete, take the last student in this group. His or her math S.A.T. score is 500, which is below the mean value. By the same token, if you take the first student in that group, his or her score is 530, which is above the average value.

How do you explain the score of an individual student in relation to income? The best we can do is to say that any individual’s math S.A.T. score is equal to

E (Y|Xi)

CHAPTER TWO: BASIC IDEAS OF LINEAR REGRESSION: THE TWO-VARIABLE MODEL 25

2The fact that our analysis is conditional on X does not mean that X causes Y. It is just that we want to see the behavior of Y in relation to an X variable that is of interest to the analyst. For exam- ple, when the Federal Reserve Bank (the Fed) changes the Federal funds rate, it is interested in find- ing out how the economy responds. During the economic crisis of 2008 in the United States, the Fed reduced the Federal Funds rate several times to resuscitate the ailing economy. One of the key de- terminants of the demand for housing is the mortgage interest rate. It is therefore of great interest to prospective homeowners to track the mortgage interest rates. When the Fed reduces the Federal Funds rate, all other interest rates follow suit.

guj75845_ch02.qxd 4/16/09 10:13 AM Page 25

the average for that group plus or minus some quantity. Let us express this mathematically as

$$Y_i = B_1 + B_2 X_i + u_i \tag{2.2}$$

where u is known as the stochastic, or random, error term, or simply the error term.3 We have already encountered this term in Chapter 1. The error term is a random variable (r.v.), for its value cannot be controlled or known a priori. As we know from Appendix A, an r.v. is usually characterized by its probability distribution (e.g., the normal or the t distribution).

How do we interpret Equation (2.2)? We can say that a student’s math S.A.T. score, say, the ith individual, corresponding to a specific family income can be expressed as the sum of two components. The first component is (B1 + B2Xi), which is simply the mean, or average, math score in the ith subpopulation; that is, the point on the PRL corresponding to the family income. This component may be called the systematic, or deterministic, component. The second component is ui, which may be called the nonsystematic, or random, component (i.e., determined by factors other than income). The error term ui is also known as the noise component.

To see this clearly, consider Figure 2-2, which is based on the data of Table 2-1. As this figure shows, at annual family income = $5000, one student scores 470 on the test, whereas the average math score at this income level is 452. Thus, this student’s score exceeds the systematic component (i.e., the mean for the group) by 18 points. So his or her u component is +18 units. On the other hand, at income = $75,000, a randomly chosen second student scores 500 on the math test, whereas the average score for this group is 528. This person’s math score is less than the systematic component by 28 points; his or her u component is thus -28.

FIGURE 2-2 Math S.A.T. scores in relation to family income. [Figure: math S.A.T. score (vertical axis) against annual family income in dollars (horizontal axis), marking the group means 452 at $5000 and 528 at $75,000, the individual scores 470 and 500, and the corresponding u.]

3The word stochastic comes from the Greek word stokhos meaning a “bull’s eye.” The outcome of throwing darts onto a dart board is a stochastic process, that is, a process fraught with misses. In statistics, the word implies the presence of a random variable, that is, a variable whose outcome is determined by a chance experiment.
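The decomposition in Eq. (2.2) can be checked against the two students read off Figure 2-2. The following is only a minimal sketch; the function name `error_term` is an illustrative choice and does not come from the text.

```python
def error_term(observed_score, group_mean):
    """Return u_i = Y_i - E(Y | X_i): the deviation of one student's
    score from the mean score of his or her income group."""
    return observed_score - group_mean


# Values quoted in the text for Figure 2-2.
u_low_income = error_term(470, 452)    # student at income = $5000
u_high_income = error_term(500, 528)   # student at income = $75,000

print(u_low_income, u_high_income)  # 18 -28
```

Adding each u back to its group mean recovers the observed score, which is all that Eq. (2.2) asserts.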

Eq. (2.2) is called the stochastic (or statistical) PRF, whereas Eq. (2.1) is called the deterministic, or nonstochastic, PRF. The latter represents the means of the various Y values corresponding to the specified income levels, whereas the former tells us how individual math S.A.T. scores vary around their mean values due to the presence of the stochastic error term, u.

What is the nature of the u term?

2.4 THE NATURE OF THE STOCHASTIC ERROR TERM

1. The error term may represent the influence of those variables that are not explicitly included in the model. For example, in our math S.A.T. scenario it may very well represent influences such as a person’s wealth, the area where he or she lives, high school GPA, or math courses taken in school.

2. Even if we included all the relevant variables determining the math test score, some intrinsic randomness in the math score is bound to occur that cannot be explained no matter how hard we try. Human behavior, after all, is not totally predictable or rational. Thus, u may reflect this inherent randomness in human behavior.

3. u may also represent errors of measurement. For example, the data on annual family income may be rounded or the data on math scores may be suspect because in some communities few students plan to attend college and therefore don’t take the test.

4. The principle of Ockham’s razor (that descriptions be kept as simple as possible until proved inadequate) would suggest that we keep our regression model as simple as possible. Therefore, even if we know what other variables might affect Y, their combined influence on Y may be so small and nonsystematic that we can incorporate it in the random term, u. Remember that a model is a simplification of reality. If we truly want to build reality into a model it may be too unwieldy to be of any practical use. In model building, therefore, some abstraction from reality is inevitable. By the way, William Ockham (1285–1349) was an English philosopher who maintained that a complicated explanation should not be accepted without good reason and wrote “Frustra fit per plura, quod fieri potest per pauciora” (“It is vain to do with more what can be done with less”).

It is for one or more of these reasons that an individual student’s math S.A.T. score will deviate from his or her group average (i.e., the systematic component). And as we will soon discover, this error term plays a crucial role in regression analysis.


2.5 THE SAMPLE REGRESSION FUNCTION (SRF)

How do we estimate the PRF of Eq. (2.1), that is, obtain the values of B1 and B2? If we have the data from Table 2-1, the whole population, this would be a relatively straightforward task. All we have to do is to find the conditional means of Y corresponding to each X and then join these means. Unfortunately, in practice, we rarely have the entire population at our disposal. Often we have only a sample from this population. (Recall from Chapter 1 and Appendix A our discussion regarding the population and the sample.) Our task here is to estimate the PRF on the basis of the sample information. How do we accomplish this?

Pretend that you have never seen Table 2-1 but only had the data given in Table 2-2, which presumably represent a randomly selected sample of Y values corresponding to the X values shown in Table 2-1.

Unlike Table 2-1, we now have only one Y value corresponding to each X. The important question that we now face is: From the sample data of Table 2-2, can we estimate the average S.A.T. math score in the population as a whole corresponding to each X? In other words, can we estimate the PRF from the sample data? As you can well surmise, we may not be able to estimate the PRF accurately because of sampling fluctuations, or sampling error, a topic we discuss in Appendix C. To see this clearly, suppose another random sample, which is shown in Table 2-3, is drawn from the population of Table 2-1. If we plot the data of Tables 2-2 and 2-3, we obtain the scattergram shown in Figure 2-3.

Through the scatter points we have drawn visually two straight lines that fit the scatter points reasonably well. We will call these lines the sample regression lines (SRLs). Which of the two SRLs represents the true PRL? If we avoid the temptation of looking at Figure 2-1, which represents the PRL, there is no way we can be sure that either of the SRLs shown in Figure 2-3 represents the true PRL. For if we had yet another sample, we would obtain a third SRL. Supposedly, each SRL represents the PRL, but because of sampling variation, each is at best an approximation of the true PRL. In general, we would get K different SRLs for K different samples, and all these SRLs are not likely to be the same.
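The text draws the two sample regression lines by eye. As a stand-in, the sketch below fits a least squares line (the method introduced in Section 2.8) to each of the samples in Tables 2-2 and 2-3, simply to show that two samples from the same population give different intercepts and slopes. The helper `fit_line` is a hypothetical name used only for this illustration.

```python
def fit_line(x, y):
    """Least squares intercept and slope for a two-variable regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


income = [5000, 15000, 25000, 35000, 45000, 55000, 65000, 75000, 90000, 150000]
sample_1 = [410, 420, 440, 490, 530, 530, 550, 540, 570, 590]  # Table 2-2
sample_2 = [420, 520, 470, 450, 470, 550, 470, 500, 550, 600]  # Table 2-3

srl_1 = fit_line(income, sample_1)
srl_2 = fit_line(income, sample_2)

print("SRL1: intercept = %.4f, slope = %.6f" % srl_1)
print("SRL2: intercept = %.4f, slope = %.6f" % srl_2)
```

Neither line can be identified as the PRL from the sample alone; each is one possible approximation of it.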

TABLE 2-2 A RANDOM SAMPLE FROM TABLE 2-1

  Y       X
410    5000
420   15000
440   25000
490   35000
530   45000
530   55000
550   65000
540   75000
570   90000
590  150000

TABLE 2-3 A RANDOM SAMPLE FROM TABLE 2-1

  Y       X
420    5000
520   15000
470   25000
450   35000
470   45000
550   55000
470   65000
500   75000
550   90000
600  150000

Now analogous to the PRF that underlies the PRL, we can develop the concept of the sample regression function (SRF) to represent the SRL. The sample counterpart of Eq. (2.1) may be written as

$$\hat{Y}_i = b_1 + b_2 X_i \tag{2.3}$$

where ^ is read as “hat” or “cap,” and where

$\hat{Y}_i$ = estimator of E(Y|Xi), the estimator of the population conditional mean
b1 = estimator of B1
b2 = estimator of B2

As noted in Appendix D, an estimator, or a sample statistic, is a rule or a formula that suggests how we can estimate the population parameter at hand. A particular numerical value obtained by the estimator in an application, as we know, is an estimate. (See Appendix D for the discussion on point and interval estimators.)

If we look at the scattergram in Figure 2-3, we observe that not all the sample data lie exactly on the respective sample regression lines. Therefore, just as we developed the stochastic PRF of Eq. (2.2), we need to develop the stochastic version of Eq. (2.3), which we write as

$$Y_i = b_1 + b_2 X_i + e_i \tag{2.4}$$

where ei = the estimator of ui.

FIGURE 2-3 Sample regression lines based on two independent samples. [Figure: scattergram of math S.A.T. score (vertical axis, 350 to 650) against annual family income in dollars (horizontal axis, 0 to 160000) for Sample 1 and Sample 2, with the fitted lines SRL1 and SRL2.]

We call ei the residual term, or simply the residual. Conceptually, it is analogous to ui and can be regarded as the estimator of the latter. It is introduced in the SRF for the same reasons as ui was introduced in the PRF. Simply stated, ei represents the difference between the actual Y values and their estimated values from the sample regression. That is,

$$e_i = Y_i - \hat{Y}_i \tag{2.5}$$

To summarize, our primary objective in regression analysis is to estimate the (stochastic) PRF

$$Y_i = B_1 + B_2 X_i + u_i$$

on the basis of the SRF

$$Y_i = b_1 + b_2 X_i + e_i$$

because more often than not our analysis is based on a single sample from some population. But because of sampling variation, our estimate of the PRF based on the SRF is only approximate. This approximation is shown in Figure 2-4. Keep in mind that we actually do not observe B1, B2, and u. What we observe are their proxies, b1, b2, and e, once we have a specific sample.

FIGURE 2-4 The population and sample regression lines. [Figure: math S.A.T. score against annual family income ($), showing the PRF, E(Y|Xi) = B1 + B2Xi, and the SRF, Ŷi = b1 + b2Xi, with point A marked; at X1 and Xn the observed Y1 and Yn are shown together with the fitted values Ŷ1 and Ŷn, the errors u1 and un, and the residuals e1 and en.]

For a given Xi, shown in this figure, we have one (sample) observation, Yi. In terms of the SRF, the observed Yi can be expressed as

$$Y_i = \hat{Y}_i + e_i \tag{2.6}$$

and in terms of the PRF it can be expressed as

$$Y_i = E(Y|X_i) + u_i \tag{2.7}$$

Obviously, in Figure 2-4, $\hat{Y}_1$ underestimates the true mean value E(Y|X1) for the X1 shown therein. By the same token, for any Y to the right of point A in Figure 2-4 (e.g., $\hat{Y}_n$), the SRF will overestimate the true PRF. But you can readily see that such over- and underestimation is inevitable due to sampling fluctuations.

The important question now is: Granted that the SRF is only an approximation of the PRF, can we find a method or a procedure that will make this approximation as close as possible? In other words, how should we construct the SRF so that b1 is as close as possible to B1 and b2 is as close as possible to B2, because generally we do not have the entire population at our disposal? As we will show in Section 2.8, we can indeed find a “best-fitting” SRF that will mirror the PRF as faithfully as possible. It is fascinating to consider that this can be done even though we never actually determine the PRF itself.

2.6 THE SPECIAL MEANING OF THE TERM “LINEAR” REGRESSION

Since in this text we are concerned primarily with “linear” models like Eq. (2.1), it is essential to know what the term linear really means, for it can be interpreted in two different ways.

Linearity in the Variables

The first and perhaps the more “natural” meaning of linearity is that the conditional mean value of the dependent variable is a linear function of the independent variable(s) as in Eq. (2.1) or Eq. (2.2) or in the sample counterparts, Eqs. (2.3) and (2.4).4 In this interpretation, the following functions are not linear:

$$E(Y) = B_1 + B_2 X_i^2 \tag{2.8}$$

$$E(Y) = B_1 + B_2 \frac{1}{X_i} \tag{2.9}$$

because in Equation (2.8) X appears with a power of 2, and in Eq. (2.9) it appears in the inverse form. For regression models linear in the explanatory variable(s), the rate of change in the dependent variable remains constant for a unit change in the explanatory variable; that is, the slope remains constant. But for a regression model nonlinear in the explanatory variables the slope does not remain constant. This can be seen more clearly in Figure 2-5.

4A function Y = f(X) is said to be linear in X if (1) X appears with a power of 1 only; that is, terms such as X² and √X are excluded; and (2) X is not multiplied or divided by another variable (e.g., X · Z and X/Z, where Z is another variable).

As Figure 2-5 shows, for the regression (2.1), the slope—the rate of change in E(Y)—the mean of Y, remains the same, namely, B2 no matter at what value of X we measure the change. But for regression, say, Eq. (2.8), the rate of change in the mean value of Y varies from point to point on the regression line; it is actually a curve here.5
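For readers comfortable with calculus, the contrast can be written out by differentiating the three mean functions with respect to X. This is a standard derivation added here for illustration; it is not quoted from the text.

$$
\begin{aligned}
\text{Eq. (2.1):}\quad & E(Y) = B_1 + B_2 X_i & \Rightarrow\quad & \frac{dE(Y)}{dX_i} = B_2 \\
\text{Eq. (2.8):}\quad & E(Y) = B_1 + B_2 X_i^2 & \Rightarrow\quad & \frac{dE(Y)}{dX_i} = 2 B_2 X_i \\
\text{Eq. (2.9):}\quad & E(Y) = B_1 + B_2 \frac{1}{X_i} & \Rightarrow\quad & \frac{dE(Y)}{dX_i} = -\frac{B_2}{X_i^2}
\end{aligned}
$$

Only in the first case is the slope free of X; in the other two it changes with the value of X at which it is measured.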

Linearity in the Parameters

The second interpretation of linearity is that the conditional mean of the dependent variable is a linear function of the parameters, the B’s; it may or may not be linear in the variables. Analogous to a linear-in-variable function, a function is said to be linear in the parameter, say, B2, if B2 appears with a power of 1 only. On this definition, models (2.8) and (2.9) are both linear models because B1 and B2 enter the models linearly. It does not matter that the variable X enters nonlinearly in both models. However, a model of the type

$$E(Y) = B_1 + B_2^2 X_i \tag{2.10}$$

is a nonlinear-in-the-parameter model, since B2 enters with a power of 2.

In this book we are primarily concerned with models that are linear in the parameters. Therefore, from now on the term linear regression will mean a regression that is linear in the parameters, the B’s (i.e., the parameters are raised to the power of 1 only); it may or may not be linear in the explanatory variables.6
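To illustrate why a model such as Eq. (2.9) still counts as a linear regression, the sketch below generates hypothetical data from Y = 2 + 3(1/X), with the values 2 and 3 chosen purely for illustration, defines Z = 1/X, and applies the usual two-variable least squares formulas to Y and Z. The helper `fit_line` and the data are assumptions, not material from the text.

```python
def fit_line(x, y):
    """Least squares intercept and slope for a two-variable regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical, noise-free data from Y = 2 + 3 * (1 / X).
x = [1.0, 2.0, 4.0, 5.0, 10.0]
y = [2.0 + 3.0 / xi for xi in x]

# The model is nonlinear in X but linear in the parameters,
# so it becomes an ordinary straight-line regression in Z = 1/X.
z = [1.0 / xi for xi in x]
b1, b2 = fit_line(z, y)

print(round(b1, 6), round(b2, 6))  # 2.0 3.0
```

No such change of variable turns Eq. (2.10) into a straight-line problem in B2, which is why that model is treated as nonlinear.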

FIGURE 2-5 (a) Linear demand curve; (b) nonlinear demand curve. [Figure: quantity (Y) against price (X). Panel (a): Yi = B1 + B2Xi, where the slope B2 is the same at each point on the line. Panel (b): Yi = B1 + B2(1/Xi), where the slope varies from point to point on the curve.]

5Those who know calculus will recognize that in the linear model the slope, that is, the derivative of Y with respect to X, is constant, equal to B2, but in the nonlinear model Eq. (2.9) it is equal to $-B_2(1/X_i^2)$, which obviously will depend on the value of X at which the slope is measured, and is therefore not constant.

6This is not to suggest that nonlinear (in-the-parameters) models like Eq. (2.10) cannot be estimated or that they are not used in practice. As a matter of fact, in advanced courses in econometrics such models are studied in depth.

2.7 TWO-VARIABLE VERSUS MULTIPLE LINEAR REGRESSION

So far in this chapter we have considered only the two-variable, or simple, regression models in which the dependent variable is a function of just one explanatory variable. This was done just to introduce the fundamental ideas of regression analysis. But the concept of regression can be extended easily to the case where the dependent variable is a function of more than one explanatory variable. For instance, if the math S.A.T. score is a function of income (X2), number of math classes taken (X3), and age of the student (X4), we can write the extended math S.A.T. function as

$$E(Y) = B_1 + B_2 X_{2i} + B_3 X_{3i} + B_4 X_{4i} \tag{2.11}$$

[Note: E(Y) = E(Y|X2i, X3i, X4i).]

Equation (2.11) is an example of a multiple linear regression, a regression in which more than one independent, or explanatory, variable is used to explain the behavior of the dependent variable. Model (2.11) states that the (conditional) mean value of the math S.A.T. score is a linear function of income, number of math classes taken, and age of the student. The score function of a student (i.e., the stochastic PRF) can be expressed as

$$Y_i = B_1 + B_2 X_{2i} + B_3 X_{3i} + B_4 X_{4i} + u_i = E(Y) + u_i \tag{2.12}$$

which shows that the individual math S.A.T. score will differ from the group mean by the factor u, which is the stochastic error term. As noted earlier, even in a multiple regression we introduce the error term because we cannot take into account all the forces that might affect the dependent variable.

Notice that both Eqs. (2.11) and (2.12) are linear in the parameters and are therefore linear regression models. The explanatory variables themselves do not need to enter the model linearly, although in the present example they do.

2.8 ESTIMATION OF PARAMETERS: THE METHOD OF ORDINARY LEAST SQUARES

As noted in Section 2.5, we estimate the population regression function (PRF) on the basis of the sample regression function (SRF), since in practice we only have a sample (or two) from a given population. How then do we estimate the PRF? And how do we find out whether the estimated PRF (i.e., the SRF) is a “good” estimate of the true PRF? We will answer the first question in this chapter and take up the second question, that of the “goodness” of the estimated PRF, in Chapter 3.

To introduce the fundamental ideas of estimation of the PRF, we consider the simplest possible linear regression model, namely, the two-variable linear regression in which we study the relationship of the dependent variable Y to a single explanatory variable X. In Chapter 4 we extend the analysis to the multiple regression, where we will study the relationship of the dependent variable Y to more than one explanatory variable.

The Method of Ordinary Least Squares

Although there are several methods of obtaining the SRF as an estimator of the true PRF, in regression analysis the method that is used most frequently is that of least squares (LS), more popularly known as the method of ordinary least squares (OLS).7 We will use the terms LS and OLS methods interchangeably. To explain this method, we first explain the least squares principle.

The Least Squares Principle

Recall our two-variable PRF, Eq. (2.2):

$$Y_i = B_1 + B_2 X_i + u_i$$

Since the PRF is not directly observable (Why?), we estimate it from the SRF

$$Y_i = b_1 + b_2 X_i + e_i$$

which we can write as

$$
\begin{aligned}
e_i &= \text{actual } Y_i - \text{predicted } Y_i \\
    &= Y_i - \hat{Y}_i \\
    &= Y_i - b_1 - b_2 X_i \quad \text{[using Eq. (2.3)]}
\end{aligned}
$$

which shows that the residuals are simply the differences between the actual and estimated Y values, the latter obtained from the SRF, Eq. (2.3). This can be seen more vividly in Figure 2-4.

Now the best way to estimate the PRF is to choose b1 and b2, the estimators of B1 and B2, in such a way that the residuals ei are as small as possible. The method of ordinary least squares (OLS) states that b1 and b2 should be chosen in such a way that the residual sum of squares (RSS), $\sum e_i^2$, is as small as possible.8

Algebraically, the least squares principle states

$$\text{Minimize } \sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - b_1 - b_2 X_i)^2 \tag{2.13}$$

7Despite the name, there is nothing ordinary about this method. As we will show, this method has several desirable statistical properties. It is called OLS because there is another method, called the generalized least squares (GLS) method, of which OLS is a special case.

8Note that the smaller the ei is, the smaller their sum of squares will be. The reason for considering the squares of ei and not the ei themselves is that this procedure avoids the problem of the sign of the residuals. Note that ei can be positive as well as negative.

As you can observe from Eq. (2.13), once the sample values of Y and X are given, RSS is a function of the estimators b1 and b2. Choosing different values of b1 and b2 will yield different e’s and hence different values of RSS. To see this, just rotate the SRF shown in Figure 2-4 any way you like. For each rotation, you will get a different intercept (i.e., b1) and a different slope (i.e., b2). We want to choose the values of these estimators that will give the smallest possible RSS.
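The idea that RSS depends on the chosen line can be tried out on the sample of Table 2-2. In the sketch below the first candidate is the least squares line, built from the column sums reported in Table 2-4 (Section 2.9); the other three (intercept, slope) pairs are arbitrary alternatives picked only for comparison, and the function name `rss` is an illustrative choice.

```python
income = [5000, 15000, 25000, 35000, 45000, 55000, 65000, 75000, 90000, 150000]
score = [410, 420, 440, 490, 530, 530, 550, 540, 570, 590]  # Table 2-2


def rss(b1, b2, x, y):
    """Residual sum of squares for the candidate line Y-hat = b1 + b2 * X."""
    return sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))


# Least squares values from the column sums of Table 2-4:
# sum(x*y) = 21630000, sum(x^2) = 16240000000 (deviation form),
# mean of Y = 507, mean of X = 56000.
b2_ols = 21630000 / 16240000000
b1_ols = 507 - b2_ols * 56000

# The remaining pairs are arbitrary lines used only for comparison.
candidates = [(b1_ols, b2_ols), (440.0, 0.0013), (420.0, 0.0015), (450.0, 0.0010)]
rss_values = [rss(b1, b2, income, score) for b1, b2 in candidates]

for (b1, b2), value in zip(candidates, rss_values):
    print("b1 = %9.4f  b2 = %.6f  RSS = %.2f" % (b1, b2, value))
```

Each alternative line produces a larger RSS than the first, which is the sense in which the least squares line is the best-fitting one.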

How do we actually determine these values? This is now simply a matter of arithmetic and involves the technique of differential calculus. Without going into detail, it can be shown that the values of b1 and b2 that actually minimize the RSS given in Eq. (2.13) are obtained by solving the following two simultaneous equations. (The details are given in Appendix 2A at the end of this chapter.)

$$\sum Y_i = n b_1 + b_2 \sum X_i \tag{2.14}$$

$$\sum Y_i X_i = b_1 \sum X_i + b_2 \sum X_i^2 \tag{2.15}$$

where n is the sample size. These simultaneous equations are known as the (least squares) normal equations.

In Equations (2.14) and (2.15) the unknowns are the b’s and the knowns are the quantities involving sums, squared sums, and the sum of the cross-products of the variables Y and X, which can be easily obtained from the sample at hand. Now solving these two equations simultaneously (using any high school algebra trick you know), we obtain the following solutions for b1 and b2.

$$b_1 = \bar{Y} - b_2 \bar{X} \tag{2.16}$$

which is the estimator of the population intercept, B1. The sample intercept is thus the sample mean value of Y minus the estimated slope times the sample mean value of X.

$$b_2 = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{\sum X_i Y_i - n \bar{X}\,\bar{Y}}{\sum X_i^2 - n \bar{X}^2} \tag{2.17}$$

which is the estimator of the population slope coefficient B2. Note that

$$x_i = (X_i - \bar{X}) \quad \text{and} \quad y_i = (Y_i - \bar{Y})$$

that is, the small letters denote deviations from the sample mean values, a convention that we will adopt in this book. As you can see from the formula for b2, it is simpler to write the estimator using the deviation form. Expressing the values of a variable as deviations from its mean value does not change the ranking of the values, since we are subtracting the same constant from each value. Note that b1 and b2 are solely expressed in terms of quantities that can be readily computed from the sample at hand. Of course, these days the computer will do all the calculations for you.

The estimators given in Equations (2.16) and (2.17) are known as OLS estimators, since they are obtained by the method of OLS.
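As a check on the algebra, the sketch below solves the normal equations (2.14) and (2.15) as a two-by-two linear system for the sample of Table 2-2 and compares the result with the deviation-form formulas (2.16) and (2.17). The variable names are illustrative, and Cramer's rule is just one of several ways of solving the system.

```python
income = [5000, 15000, 25000, 35000, 45000, 55000, 65000, 75000, 90000, 150000]
score = [410, 420, 440, 490, 530, 530, 550, 540, 570, 590]  # Table 2-2

n = len(income)
sum_x = sum(income)
sum_y = sum(score)
sum_xy = sum(x * y for x, y in zip(income, score))
sum_x2 = sum(x * x for x in income)

# Normal equations (2.14)-(2.15) as a linear system in (b1, b2):
#   n * b1     + sum_x  * b2 = sum_y
#   sum_x * b1 + sum_x2 * b2 = sum_xy
# solved here by Cramer's rule.
det = n * sum_x2 - sum_x ** 2
b1_normal = (sum_y * sum_x2 - sum_x * sum_xy) / det
b2_normal = (n * sum_xy - sum_x * sum_y) / det

# Deviation form, Eqs. (2.16) and (2.17).
x_bar = sum_x / n
y_bar = sum_y / n
b2_dev = (sum((x - x_bar) * (y - y_bar) for x, y in zip(income, score))
          / sum((x - x_bar) ** 2 for x in income))
b1_dev = y_bar - b2_dev * x_bar

print("normal equations: b1 = %.4f, b2 = %.7f" % (b1_normal, b2_normal))
print("deviation form:   b1 = %.4f, b2 = %.7f" % (b1_dev, b2_dev))
```

Both routes give the same intercept and slope, as they must, since Eqs. (2.16) and (2.17) are simply the solution of the normal equations.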

Before proceeding further, we should note a few interesting features of the OLS estimators given in Eqs. (2.16) and (2.17):

1. The SRF obtained by the method of OLS passes through the sample mean values of X and Y, which is evident from Eq. (2.16), for it can be written as

$$\bar{Y} = b_1 + b_2 \bar{X} \tag{2.18}$$

2. The mean value of the residuals, $\bar{e}\ (= \sum e_i / n)$, is always zero, which provides a check on the arithmetical accuracy of the calculations (see Table 2-4).

3. The sum of the product of the residuals e and the values of the explanatory variable X is zero; that is, these two variables are uncorrelated (on the definition of correlation, see Appendix B). Symbolically,

$$\sum e_i X_i = 0 \tag{2.19}$$

This provides yet another check on the least squares calculations.

4. The sum of the product of the residuals ei and the estimated Yi (= $\hat{Y}_i$) is zero; that is, $\sum e_i \hat{Y}_i$ is zero (see Question 2.25).

2.9 PUTTING IT ALL TOGETHER

Let us use the sample data given in Table 2-2 to compute the values of b1 and b2. The necessary computations involved in implementing formulas (2.16) and (2.17) are laid out in Table 2-4. Keep in mind that the data given in Table 2-2 are a random sample from the population given in Table 2-1.

From the computations shown in Table 2-4, we obtain the following sample math S.A.T. score regression:

$$\hat{Y}_i = 432.4138 + 0.0013 X_i \tag{2.20}$$

where Y represents math S.A.T. score and X represents annual family income. Note that we have put a cap on Y to remind us that it is an estimator of the true population mean corresponding to the given level of X (recall Eq. 2.3). The estimated regression line is shown in Figure 2-6.
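The four features of the OLS estimators listed at the end of Section 2.8 can be verified numerically on the same sample. The sketch below is one way to do so; the names are illustrative, and the quantities that should be zero will in practice come out as very small numbers because of floating-point rounding.

```python
income = [5000, 15000, 25000, 35000, 45000, 55000, 65000, 75000, 90000, 150000]
score = [410, 420, 440, 490, 530, 530, 550, 540, 570, 590]  # Table 2-2

n = len(income)
x_bar = sum(income) / n
y_bar = sum(score) / n

# OLS estimators, Eqs. (2.16) and (2.17).
b2 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(income, score))
      / sum((x - x_bar) ** 2 for x in income))
b1 = y_bar - b2 * x_bar

fitted = [b1 + b2 * x for x in income]
residuals = [y - f for y, f in zip(score, fitted)]

checks = {
    "Y-bar - (b1 + b2 * X-bar)": y_bar - (b1 + b2 * x_bar),            # feature 1
    "mean residual": sum(residuals) / n,                               # feature 2
    "sum of e_i * X_i": sum(e * x for e, x in zip(residuals, income)), # feature 3
    "sum of e_i * Y-hat_i": sum(e * f for e, f in zip(residuals, fitted)),  # feature 4
}
for name, value in checks.items():
    print("%-28s %.8f" % (name, value))

residual_sum_of_squares = sum(e ** 2 for e in residuals)
print("RSS = %.4f" % residual_sum_of_squares)
```

The RSS printed at the end can be compared with the total of the squared-residual column in Table 2-4.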

TABLE 2-4 RAW DATA (FROM TABLE 2-2) FOR MATH S.A.T. SCORES

  Yi      Xi       YiXi          Xi²      xi   yi          xi²    yi²      yixi       Ŷi        ei        ei²          eixi
 410    5000    2050000     25000000  -51000  -97   2601000000   9409   4947000  439.073  -29.0733    845.255   1482737.069
 420   15000    6300000    225000000  -41000  -87   1681000000   7569   3567000  452.392  -32.3922   1049.257   1328081.897
 440   25000   11000000    625000000  -31000  -67    961000000   4489   2077000  465.711  -25.7112    661.066   797047.4138
 490   35000   17150000   1225000000  -21000  -17    441000000    289    357000  479.030   10.9698    120.337  -230366.3793
 530   45000   23850000   2025000000  -11000   23    121000000    529   -253000  492.349   37.6509   1417.587  -414159.4828
 530   55000   29150000   3025000000   -1000   23      1000000    529    -23000  505.668   24.3319   592.0412  -24331.89655
 550   65000   35750000   4225000000    9000   43     81000000   1849    387000  518.987   31.0129   961.8019   279116.3793
 540   75000   40500000   5625000000   19000   33    361000000   1089    627000  532.306   7.69397    59.1971   146185.3448
 570   90000   51300000   8100000000   34000   63   1156000000   3969   2142000  552.284   17.7155   313.8396   602327.5862
 590  150000   88500000  22500000000   94000   83   8836000000   6889   7802000  632.198  -42.1982   1780.694  -3966637.931
5070  560000  305550000  47600000000       0    0  16240000000  36610  21630000     5070         0  7801.0776             0

Note: The last row gives the column sums. xi = (Xi - X̄); yi = (Yi - Ȳ); X̄ = 56000; Ȳ = 507.

FIGURE 2-6 Regression line based on data from Table 2-4. [Figure: fitted line plot of Ŷ = 432.4138 + 0.0013X, with Y (350 to 650) on the vertical axis and X (0 to 160000) on the horizontal axis.]

Interpretation of the Estimated Math S.A.T. Score Function

The interpretation of the estimated math S.A.T. score function is as follows: The slope coefficient of 0.0013 means that, other things remaining the same, if annual family income goes up by a dollar, the mean or average math S.A.T. score goes up by about 0.0013 points. The intercept value of 432.4138 means that if family income is zero, the mean math score will be about 432.4138. Very often such an interpretation has no economic meaning. For example, we have no data where an annual family income is zero. As we will see throughout the book, often the intercept has no particular economic meaning. In general you have to use common sense in interpreting the intercept term, for very often the sample range of the X values (family income in our example) may not include zero as one of the observed values. Perhaps it is best to interpret the intercept term as the mean or average effect on Y of all the variables omitted from the regression model.

2.10 SOME ILLUSTRATIVE EXAMPLES

Now that we have discussed the OLS method and learned how to estimate a PRF, let us provide some concrete applications of regression analysis.

Example 2.1. Years of Schooling and Average Hourly Earnings

Based on a sample of 528 observations, Table 2-5 gives data on average hourly wage Y($) and years of schooling (X).

Suppose we want to find out how Y behaves in relation to X. From human capital theories of labor economics, we would expect average wage to increase with years of schooling. That is, we expect a positive relationship between the two variables; it would be bad news if such were not the case.

TABLE 2-5 AVERAGE HOURLY WAGE BY EDUCATION

Years of schooling   Average hourly wage ($)   Number of people
                 6                    4.4567                  3
                 7                    5.7700                  5
                 8                    5.9787                 15
                 9                    7.3317                 12
                10                    7.3182                 17
                11                    6.5844                 27
                12                    7.8182                218
                13                    7.8351                 37
                14                   11.0223                 56
                15                   10.6738                 13
                16                   10.8361                 70
                17                   13.6150                 24
                18                   13.5310                 31

Source: Arthur S. Goldberger, Introductory Econometrics, Harvard University Press, Cambridge, Mass., 1998, Table 1.1, p. 5 (adapted).

The regression results based on the data in Table 2-5 are as follows:

$$\hat{Y}_i = -0.0144 + 0.7241 X_i \tag{2.21}$$

As these results show, there is a positive association between education and earnings, which accords with prior expectations. For every additional year of schooling, the mean wage rate goes up by about 72 cents per hour.9

The negative intercept in the present instance has no particular economic meaning.
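The coefficients in Eq. (2.21) can be reproduced from Table 2-5 with the OLS formulas of Section 2.8. The sketch below treats the 13 group means as the observations and does not weight them by the number of people in each group; that is an assumption about how the published figures were obtained, though the slope it yields agrees with Eq. (2.21) and the intercept agrees up to rounding in the last reported digit.

```python
# Table 2-5: years of schooling and average hourly wage ($).
schooling = list(range(6, 19))
mean_wage = [4.4567, 5.7700, 5.9787, 7.3317, 7.3182, 6.5844, 7.8182,
             7.8351, 11.0223, 10.6738, 10.8361, 13.6150, 13.5310]

n = len(schooling)
x_bar = sum(schooling) / n
y_bar = sum(mean_wage) / n

# Deviation-form OLS estimators, Eqs. (2.16) and (2.17).
b2 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(schooling, mean_wage))
      / sum((x - x_bar) ** 2 for x in schooling))
b1 = y_bar - b2 * x_bar

print("intercept = %.5f, slope = %.5f" % (b1, b2))
```

A slope of roughly 0.72 is the source of the statement that each additional year of schooling raises the mean wage by about 72 cents per hour.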

Example 2.2. Okun’s Law

Based on the U.S. data for 1947 to 1960, the late Arthur Okun of the Brookings Institution and a former chairman of the President’s Council of Economic Advisers obtained the following regression, known as Okun’s law:

$$Y_t = -0.4(X_t - 2.5) \tag{2.22}$$

where Yt = change in the unemployment rate, percentage points
Xt = percent growth rate in real output, as measured by real GDP
2.5 = the long-term, or trend, rate of growth of output historically observed in the United States

In this regression the intercept is zero and the slope coefficient is -0.4. Okun’s law says that for every percentage point of growth in real GDP above 2.5 percent, the unemployment rate declines by 0.4 percentage points.

Okun’s law has been used to predict the required growth in real GDP to reduce the unemployment rate by a given percentage point. Thus, a growth rate of 5 percent in real GDP will reduce the unemployment rate by 1 percentage point, or a growth rate of 7.5 percent is required to reduce the unemployment rate by 2 percentage points. In Problem 2.17, which gives comparatively more recent data, you are asked to find out if Okun’s law still holds.

This example shows how sometimes a simple (i.e., two-variable) regression model can be used for policy purposes.
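The arithmetic behind these predictions follows directly from Eq. (2.22). The sketch below encodes the equation and its inverse; the function and constant names are illustrative choices, not taken from the text.

```python
TREND_GROWTH = 2.5   # percent; long-term growth rate in Eq. (2.22)
OKUN_SLOPE = -0.4    # slope coefficient in Eq. (2.22)


def unemployment_change(gdp_growth):
    """Predicted change in the unemployment rate (percentage points)
    for a given percent growth rate of real GDP, per Eq. (2.22)."""
    return OKUN_SLOPE * (gdp_growth - TREND_GROWTH)


def required_growth(target_change):
    """Real GDP growth (percent) needed for a target change in the
    unemployment rate, obtained by solving Eq. (2.22) for X."""
    return TREND_GROWTH + target_change / OKUN_SLOPE


for growth in (5.0, 7.5):
    print("growth %.1f%% -> change in unemployment %.2f points"
          % (growth, unemployment_change(growth)))

for target in (-1.0, -2.0):
    print("target %.1f points -> required growth %.1f%%"
          % (target, required_growth(target)))
```

Growth exactly at the trend rate of 2.5 percent leaves the predicted unemployment rate unchanged, which is what a zero intercept in Eq. (2.22) implies.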

9Since the data in Table 2-5 refer to the mean wage for the various categories, the slope coefficient here should strictly be interpreted as the average increase in the mean hourly earnings.

Example 2.3. Stock Prices and Interest Rates

Stock prices and interest rates are key economic indicators. Investors in stock markets, individual or institutional, watch very carefully the movements in the interest rates. Since interest rates represent the cost of borrowing money, they have a vast effect on investment and hence on the profitability of a company. Macroeconomic theory would suggest an inverse relationship between stock prices and interest rates.

As a measure of stock prices, let us use the S&P 500 composite index (1941–1943 = 10), and as a measure of interest rates, let us use the three-month Treasury bill rate (%). Table 2-6, found on the textbook’s Web site, gives data on these variables for the period 1980–2007.

Plotting these data, we obtain the scattergram as shown in Figure 2-7. The scattergram clearly shows that there is an inverse relationship between the two variables, as per theory. But the relationship between the two is not linear (i.e., straight line); it more closely resembles Figure 2-5(b). Therefore, let us maintain that the true relationship is:

$$Y_t = B_1 + B_2 (1/X_t) + u_t \tag{2.23}$$

FIGURE 2-7 S&P 500 composite index and three-month Treasury bill rate, 1980–2007. [Figure: scattergram of the S&P 500 index (vertical axis, 0 to 1600) against the Treasury bill rate in percent (horizontal axis, 0 to 16).]

Note that Eq. (2.23) is a linear regression model, as the model is linear in the parameters. It is, however, nonlinear in the variable X. If you let Z = 1/X, then the model is linear in the parameters as well as the variables Y and Z.

Using the EViews statistical package, we estimate Eq. (2.23) by OLS, giving the following results:

$$\hat{Y}_t = 404.4067 + 996.866 (1/X_t) \tag{2.24}$$

How do we interpret these results? The value of the intercept has no practical economic meaning. The interpretation of the coefficient of (1/X) is rather tricky. Literally interpreted, it suggests that if the reciprocal of the three-month Treasury bill rate goes up by one unit, the average value of the S&P 500 index will go up by about 997 units. This is, however, not a very enlightening interpretation. If you want to measure the rate of change of (mean) Y with respect to X (i.e., the derivative of Y with respect to X), then as footnote 5 shows, this rate of change is given by $-B_2(1/X_t^2)$, which depends on the value taken by X. Suppose X = 2. Knowing that the estimated B2 is 996.866, we find the rate of change at this X value as -249.22 (approx). That is, starting with a Treasury bill rate of about 2 percent, if that rate goes up by one percentage point, on average, the S&P 500 index will decline by about 249 units. Of course, an increase in the Treasury bill rate from 2 percent to 3 percent is a substantial increase.
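The slope calculation for the reciprocal model can be repeated at other interest rates to see how strongly it depends on X. The sketch below uses the estimated coefficient from Eq. (2.24); the function name `slope_at` and the additional rates of 5 and 10 percent are illustrative choices, not values discussed in the text.

```python
B2_HAT = 996.866  # estimated coefficient of (1/X) in Eq. (2.24)


def slope_at(t_bill_rate, b2=B2_HAT):
    """Rate of change dY/dX = -b2 / X**2 for the reciprocal model
    Y = b1 + b2 * (1/X), evaluated at a given Treasury bill rate (%)."""
    return -b2 / t_bill_rate ** 2


for rate in (2.0, 5.0, 10.0):
    print("X = %4.1f%%  ->  slope = %8.2f index units per percentage point"
          % (rate, slope_at(rate)))
```

The slope is negative at every positive rate even though the coefficient of (1/X) is positive, and it shrinks in magnitude as the rate rises.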

Interestingly, if you had disregarded Figure 2-5 and had simply fitted the straight line regression to the data in Table 2-6 (found on the textbook’s Web site), you would obtain the following regression:

$$\hat{Y}_t = 1229.3414 - 99.4014 X_t \tag{2.25}$$

Here the interpretation of the intercept term is that if the Treasury bill rate were zero, the average value of the S&P index would be about 1229. Again, this may not have any concrete economic meaning. The slope coefficient here suggests that if the Treasury bill rate were to increase by one unit, say, one percentage point, the average value of the S&P index would go down by about 99 units.

Regressions (2.24) and (2.25) bring out the practical problems in choosing an appropriate model for empirical analysis. Which is a better model? How do we know? What tests do we use to choose between the two models? We will provide answers to these questions as we progress through the book (see Chapter 5). A question to ponder: In Eq. (2.24) the sign of the slope coefficient is positive, whereas in Eq. (2.25) it is negative. Are these findings conflicting?

Example 2.4. Median Home Price and Mortgage Interest Rate in the United States, 1980–2007

Over the past several years there has been a surge in home prices across the United States. It is believed that this surge is due to sharply falling mortgage interest rates. To see the impact of mortgage interest rates on home prices, Table 2-7 (found on the textbook’s Web site) gives data on median home prices (1000 $) and 30-year fixed rate mortgage (%) in the United States for the period 1980–2007.

These data are plotted in Figure 2-8.

As a first approximation, if you fit a straight line regression model, you will obtain the following results, where Y = median home price (1000 $) and X = 30-year fixed rate mortgage (%):

$$\hat{Y}_t = 329.0041 - 17.3694 X_t \tag{2.26}$$

These results show that if the mortgage interest rate goes up by 1 percentage point,¹⁰ on average, the median home price goes down by about 17.4 units or about $17,400. (Note: Y is measured in thousands of dollars.) Literally interpreted, the intercept coefficient of about 329 would suggest that if the mortgage interest rate were zero, the median home price on average would be about $329,000, an interpretation that may stretch our credulity.

It seems that falling interest rates do have a substantial impact on home prices. A question: If we had taken median family income into account, would this conclusion still stand?
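The slope interpretations in Examples 2.3 and 2.4 can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original text: the function and constant names are illustrative, and the coefficients are simply copied from Eqs. (2.24), (2.25), and (2.26). It evaluates the rate of change of the reciprocal model at several Treasury bill rates, converts the home price slope into dollars, and contrasts a percentage point change with a percent change.

```python
def reciprocal_rate_of_change(b2: float, x: float) -> float:
    """Slope of Y = B1 + B2*(1/X) with respect to X, i.e. -B2 / X**2."""
    return -b2 / x**2


def percent_increase(old: float, new: float) -> float:
    """Relative change in percent (not in percentage points)."""
    return (new - old) / old * 100.0


# Coefficients copied from the text; the names are illustrative only.
B2_RECIPROCAL = 996.866  # coefficient of 1/X in Eq. (2.24)
SLOPE_LINEAR = -99.4014  # coefficient of X in Eq. (2.25)
SLOPE_HOME = -17.3694    # coefficient of X in Eq. (2.26), in thousands of dollars

for rate in (2.0, 5.0, 10.0):
    change = reciprocal_rate_of_change(B2_RECIPROCAL, rate)
    print(f"T-bill rate {rate:4.1f}%: reciprocal model {change:9.2f}, "
          f"linear model {SLOPE_LINEAR:9.2f} index units per percentage point")

print(f"Home price change per percentage point: {SLOPE_HOME * 1000:,.0f} dollars")
print(f"6% -> 7% is 1 percentage point, or {percent_increase(6.0, 7.0):.1f} percent")
```

In the reciprocal model the rate of change is negative at every positive X and shrinks in magnitude as X grows, whereas the linear model imposes the same slope at every X.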

Example 2.5. Antique Clocks and Their Prices

The Triberg Clock Company of Schonachbach, Germany, holds an annual antique clock auction. Data on about 32 clocks (the age of the clock, the number of bidders, and the price of the winning bid in marks) are given in Table 2-14 in Problem 2.19. Note that this auction took place about 25 years ago.

If we believe that the price of the winning bid depends on the age of the clock (the older the clock, the higher the price, ceteris paribus), we would expect a positive relationship between the two. Similarly, the higher the number of bidders, the higher the auction price because a large number of bidders for a particular clock would suggest that that clock is more valuable, and hence we would expect a positive relationship between the two variables.

FIGURE 2-8 Median home prices and interest rates, 1980–2007. [Scatterplot; vertical axis: Median Home Price (1000 $); horizontal axis: Interest Rate (%).]

¹⁰ Note that there is a difference between a 1 percentage point increase and a 1 percent increase. For example, if the current interest rate is 6 percent but then goes to 7 percent, this represents a 1 percentage point increase; the percentage increase is, however, \( \frac{7 - 6}{6} \times 100 \approx 16.7\% \).


Using the data given in Table 2-14, we obtained the following OLS regressions:

Price = -191.6662 + 10.4856 Age (2.27)

Price = 807.9501 + 54.5724 Bidders (2.28)

As these results show, the auction price is positively related to the age of the clock, as well as to the number of bidders present at the auction.

In Chapter 4 on multiple regression we will see what happens when we regress price on age and number of bidders together, rather than individu- ally, as in the preceding two regressions.

The regression results presented in the preceding examples can be obtained easily by applying the OLS formulas Eq. (2.16) and Eq. (2.17) to the data presented in the various tables. Of course, this would be very tedious and very time-consuming to do manually. Fortunately, there are several statistical software packages that can estimate regressions in practically no time. In this book we will use the EViews and MINITAB software packages to estimate several regression models because these packages are comprehensive, easy to use, and readily available. (Excel can also do simple and multiple regressions.) Throughout this book, we will reproduce the computer output obtained from these packages. But keep in mind that there are other software packages that can estimate all kinds of regression models. Some of these packages are LIMDEP, MICROFIT, PC-GIVE, RATS, SAS, SHAZAM, SPSS, and STATA.
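To make the point about software concrete, here is a minimal sketch in plain Python (not one of the packages named above) that applies the two-variable OLS formulas to the antique clock data of Table 2-14. It assumes that Eqs. (2.16) and (2.17) are the usual deviation-form expressions, with the slope equal to the sum of cross-products of deviations divided by the sum of squared deviations of X, and the intercept equal to the mean of Y minus the slope times the mean of X. The names `CLOCKS` and `ols_fit` are illustrative.

```python
from statistics import mean

# Table 2-14: (price, age, number of bidders) for observations 1 to 32.
CLOCKS = [
    (1235, 127, 13), (1080, 115, 12), (845, 127, 7), (1552, 150, 9),
    (1047, 156, 6), (1979, 182, 11), (1822, 156, 12), (1253, 132, 10),
    (1297, 137, 9), (946, 113, 9), (1713, 137, 15), (1024, 117, 11),
    (2131, 170, 14), (1550, 182, 8), (1884, 162, 11), (2041, 184, 10),
    (854, 143, 6), (1483, 159, 9), (1055, 108, 14), (1545, 175, 8),
    (729, 108, 6), (1792, 179, 9), (1175, 111, 15), (1593, 187, 8),
    (1147, 137, 8), (1092, 153, 6), (1152, 117, 13), (1336, 126, 10),
    (785, 111, 7), (744, 115, 7), (1356, 194, 5), (1262, 168, 7),
]


def ols_fit(x, y):
    """Return (intercept, slope) of the two-variable OLS regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sums of cross-products and squares in deviation-from-mean form.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


price = [row[0] for row in CLOCKS]
age = [row[1] for row in CLOCKS]
bidders = [row[2] for row in CLOCKS]

b1_age, b2_age = ols_fit(age, price)
b1_bid, b2_bid = ols_fit(bidders, price)
print(f"Price = {b1_age:.4f} + {b2_age:.4f} Age")
print(f"Price = {b1_bid:.4f} + {b2_bid:.4f} Bidders")
```

Up to rounding, the two printed lines should agree with Eqs. (2.27) and (2.28).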

2.11 SUMMARY

In this chapter we introduced some fundamental ideas of regression analysis. Starting with the key concept of the population regression function (PRF), we developed the concept of linear PRF. This book is primarily concerned with linear PRFs, that is, regressions that are linear in the parameters regardless of whether or not they are linear in the variables. We then introduced the idea of the stochastic PRF and discussed in detail the nature and role of the stochastic error term u. PRF is, of course, a theoretical or idealized construct because, in practice, all we have is a sample (or samples) from some population. This necessitated the discussion of the sample regression function (SRF).

We then considered the question of how we actually go about obtaining the SRF. Here we discussed the popular method of ordinary least squares (OLS) and presented the appropriate formulas to estimate the parameters of the PRF. We illustrated the OLS method with a fully worked-out numerical example as well as with several practical examples.

Our next task is to find out how good the SRF obtained by OLS is as an estimator of the true PRF. We undertake this important task in Chapter 3.


KEY TERMS AND CONCEPTS

The key terms and concepts introduced in this chapter are

Regression analysis
  a) explained, or dependent, variable
  b) independent, or explanatory, variable
Scattergram; scatter diagram
Population regression line (PRL)
  a) conditional mean, or conditional expected, values
Population regression function (PRF)
Regression coefficients; parameters
  a) intercept
  b) slope
Conditional regression analysis
Stochastic, or random, error term; error term
  a) noise component
  b) stochastic, or statistical, PRF
  c) deterministic, or nonstochastic, PRF
Sample regression line (SRL)
Sample regression function (SRF)
Estimator; sample statistic
Estimate
Residual term e; residual
Linearity in variables
Linearity in parameters
  a) linear regression
Two-variable, or simple, regression vs. multiple linear regression
Estimation of parameters
  a) the method of ordinary least squares (OLS)
  b) the least squares principle
  c) residual sum of squares (RSS)
  d) normal equations
  e) OLS estimators

QUESTIONS

2.1. Explain carefully the meaning of each of the following terms:
a. Population regression function (PRF).
b. Sample regression function (SRF).
c. Stochastic PRF.
d. Linear regression model.
e. Stochastic error term (ui).
f. Residual term (ei).
g. Conditional expectation.
h. Unconditional expectation.
i. Regression coefficients or parameters.
j. Estimators of regression coefficients.

2.2. What is the difference between a stochastic population regression function (PRF) and a stochastic sample regression function (SRF)?

2.3. Since we do not observe the PRF, why bother studying it? Comment on this statement.

2.4. State whether the following statements are true, false, or uncertain. Give your reasons. Be precise.
a. The stochastic error term ui and the residual term ei mean the same thing.
b. The PRF gives the value of the dependent variable corresponding to each value of the independent variable.
c. A linear regression model means a model linear in the variables.
d. In the linear regression model the explanatory variable is the cause and the dependent variable is the effect.
e. The conditional and unconditional mean of a random variable are the same thing.
f. In Eq. (2.2) the regression coefficients, the B’s, are random variables, whereas the b’s in Eq. (2.4) are the parameters.
g. In Eq. (2.1) the slope coefficient B2 measures the slope of Y per unit change in X.
h. In practice, the two-variable regression model is useless because the behavior of a dependent variable can never be explained by a single explanatory variable.
i. The sum of the deviation of a random variable from its mean value is always equal to zero.

2.5. What is the relationship between a. B1 and b1; b. B2 and b2; and c. ui and ei? Which of these entities can be observed and how?

2.6. Can you rewrite Eq. (2.22) to express X as a function of Y? How would you interpret the converted equation?

2.7. The following table gives pairs of dependent and independent variables. In each case state whether you would expect the relationship between the two variables to be positive, negative, or uncertain. In other words, tell whether the slope coefficient will be positive, negative, or neither. Give a brief justification in each case.

      Dependent variable                                     Independent variable
(a)   GDP                                                    Rate of interest
(b)   Personal savings                                       Rate of interest
(c)   Yield of crop                                          Rainfall
(d)   U.S. defense expenditure                               Soviet Union’s defense expenditure
(e)   Number of home runs hit by a star baseball player      Annual salary
(f)   A president’s popularity                               Length of stay in office
(g)   A student’s first-year grade-point average             S.A.T. score
(h)   A student’s grade in econometrics                      Grade in statistics
(i)   Imports of Japanese cars                               U.S. per capita income

PROBLEMS

2.8. State whether the following models are linear regression models:
a. \( Y_i = B_1 + B_2(1/X_i) \)
b. \( Y_i = B_1 + B_2 \ln X_i + u_i \)
c. \( \ln Y_i = B_1 + B_2 X_i + u_i \)
d. \( \ln Y_i = B_1 + B_2 \ln X_i + u_i \)
e. \( Y_i = B_1 + B_2 B_3 X_i + u_i \)
f. \( Y_i = B_1 + B_2^3 X_i + u_i \)
Note: ln stands for the natural log, that is, log to the base e. (More on this in Chapter 4.)

2.9. Table 2-8 gives data on weekly family consumption expenditure (Y) (in dollars) and weekly family income (X) (in dollars).

TABLE 2-8 HYPOTHETICAL DATA ON WEEKLY CONSUMPTION EXPENDITURE AND WEEKLY INCOME

Weekly income ($) (X)    Weekly consumption expenditure ($) (Y)
 80                      55, 60, 65, 70, 75
100                      65, 70, 74, 80, 85, 88
120                      79, 84, 90, 94, 98
140                      80, 93, 95, 103, 108, 113, 115
160                      102, 107, 110, 116, 118, 125
180                      110, 115, 120, 130, 135, 140
200                      120, 136, 140, 144, 145
220                      135, 137, 140, 152, 157, 160, 162
240                      137, 145, 155, 165, 175, 189
260                      150, 152, 175, 178, 180, 185, 191

a. For each income level, compute the mean consumption expenditure, \( E(Y \mid X_i) \), that is, the conditional expected value.
b. Plot these data in a scattergram with income on the horizontal axis and consumption expenditure on the vertical axis.
c. Plot the conditional means derived in part (a) in the same scattergram created in part (b).
d. What can you say about the relationship between Y and X and between mean Y and X?
e. Write down the PRF and the SRF for this example.
f. Is the PRF linear or nonlinear?

2.10. From the data given in the preceding problem, a random sample of Y was drawn against each X. The result was as follows:

Y    70   65   90   95  110  115  120  140  155  150
X    80  100  120  140  160  180  200  220  240  260

a. Draw the scattergram with Y on the vertical axis and X on the horizontal axis.
b. What can you say about the relationship between Y and X?
c. What is the SRF for this example? Show all your calculations in the manner of Table 2-4.
d. On the same diagram, show the SRF and PRF.
e. Are the PRF and SRF identical? Why or why not?

2.11. Suppose someone has presented the following regression results for your consideration:

\[ \hat{Y}_t = 2.6911 - 0.4795X_t \]

where Y = coffee consumption in the United States (cups per person per day)
X = retail price of coffee ($ per pound)
t = time period

a. Is this a time series regression or a cross-sectional regression?
b. Sketch the regression line.

c. What is the interpretation of the intercept in this example? Does it make economic sense?
d. How would you interpret the slope coefficient?
e. Is it possible to tell what the true PRF is in this example?
f. The price elasticity of demand is defined as the percentage change in the quantity demanded for a percentage change in the price. Mathematically, it is expressed as

\[ \text{Elasticity} = \text{Slope} \left( \frac{X}{Y} \right) \]

That is, elasticity is equal to the product of the slope and the ratio of X to Y, where X = the price and Y = the quantity. From the regression results presented earlier, can you tell what the price elasticity of demand for coffee is? If not, what additional information would you need to compute the price elasticity?

2.12. Table 2-9 gives data on the Consumer Price Index (CPI) for all items (1982–1984 = 100) and the Standard & Poor’s (S&P) index of 500 common stock prices (base of index: 1941–1943 = 10).

TABLE 2-9 CONSUMER PRICE INDEX (CPI) AND S&P 500 INDEX (S&P), UNITED STATES, 1978–1989

Year    CPI      S&P
1978     65.2     96.02
1979     72.6    103.01
1980     82.4    118.78
1981     90.9    128.05
1982     96.5    119.71
1983     99.6    160.41
1984    103.9    160.46
1985    107.6    186.84
1986    109.6    236.34
1987    113.6    286.83
1988    118.3    265.79
1989    124.0    322.84

Source: Economic Report of the President, 1990, Table C-58, for CPI and Table C-93 for the S&P index.

a. Plot the data on a scattergram with the S&P index on the vertical axis and CPI on the horizontal axis.
b. What can you say about the relationship between the two indexes? What does economic theory have to say about this relationship?
c. Consider the following regression model:

\[ (S\&P)_t = B_1 + B_2 CPI_t + u_t \]

Use the method of least squares to estimate this equation from the preceding data and interpret your results.
d. Do the results obtained in part (c) make economic sense?
e. Do you know why the S&P index dropped in 1988?

2.13. Table 2-10 gives data on the nominal interest rate (Y) and the inflation rate (X) for the year 1988 for nine industrial countries.

TABLE 2-10 NOMINAL INTEREST RATE (Y) AND INFLATION (X) IN NINE INDUSTRIAL COUNTRIES FOR THE YEAR 1988

Country           Y (%)    X (%)
Australia         11.9      7.7
Canada             9.4      4.0
France             7.5      3.1
Germany            4.0      1.6
Italy             11.3      4.8
Mexico            66.3     51.7
Switzerland        2.2      2.0
United Kingdom    10.3      6.8
United States      7.6      4.4

Source: Rudiger Dornbusch and Stanley Fischer, Macroeconomics, 5th ed., McGraw-Hill, New York, 1990, p. 652. The original data are from various issues of International Financial Statistics, published by the International Monetary Fund (IMF).

a. Plot these data with the interest rate on the vertical axis and the inflation rate on the horizontal axis. What does the scattergram reveal?
b. Do an OLS regression of Y on X. Present all your calculations.
c. If the real interest rate is to remain constant, what must be the relationship between the nominal interest rate and the inflation rate? That is, what must be the value of the slope coefficient in the regression of Y on X and that of the intercept? Do your results suggest that this is the case? For a theoretical discussion of the relationship among the nominal interest rate, the inflation rate, and the real interest rate, see any textbook on macroeconomics and look up the topic of the Fisher equation, named after the famous American economist, Irving Fisher.

2.14. The real exchange rate (RE) is defined as the nominal exchange rate (NE) times the ratio of the domestic price to foreign price. Thus, RE for the United States against UK is

\[ RE_{US} = NE_{US}(US_{CPI}/UK_{CPI}) \]

a. From the data given in Table 1-3 of Problem 1.7, compute REUS.
b. Using a regression package you are familiar with, estimate the following regression:

\[ NE_{US} = B_1 + B_2 RE_{US} + u \]    (1)

c. A priori, what do you expect the relationship between the nominal and real exchange rates to be? You may want to read up on the purchasing power parity (PPP) theory from any text on international trade or macroeconomics.
d. Are the a priori expectations supported by your regression results? If not, what might be the reason?

*e. Run regression (1) in the following alternative form:

\[ \ln NE_{US} = A_1 + A_2 \ln RE_{US} + u \]    (2)

where ln stands for the natural logarithm, that is, log to the base e. Interpret the results of this regression. Are the results from regressions (1) and (2) qualitatively the same?

*Optional.

2.15. Refer to problem 2.12. In Table 2-11 we have data on CPI and the S&P 500 index for the years 1990 to 2007.

TABLE 2-11 CONSUMER PRICE INDEX (CPI) AND S&P 500 INDEX (S&P), UNITED STATES, 1990–2007

Year    CPI      S&P
1990    130.7     334.59
1991    136.2     376.18
1992    140.3     415.74
1993    144.5     451.41
1994    148.2     460.42
1995    152.4     541.72
1996    156.9     670.50
1997    160.5     873.43
1998    163.0    1085.50
1999    166.6    1327.33
2000    172.2    1427.22
2001    177.1    1194.18
2002    179.9     993.94
2003    184.0     965.23
2004    188.9    1130.65
2005    195.3    1207.23
2006    201.6    1310.46
2007    207.3    1477.19

Source: Economic Report of the President, 2008.

a. Repeat questions (a) to (e) from problem 2.12.
b. Do you see any difference in the estimated regressions?
c. Now combine the two sets of data and estimate the regression of the S&P index on the CPI.
d. Are there noticeable differences in the three regressions?

2.16. Table 2-12, found on the textbook’s Web site, gives data on average starting pay (ASP), grade point average (GPA) scores (on a scale of 1 to 4), GMAT scores, annual tuition, percent of graduates employed at graduation, recruiter assessment score (5.0 highest), and percent of applicants accepted in the graduate business school for 47 well-regarded business schools in the United States for the year 2007–2008. Note: Northwestern University ranked 4th (in a tie with MIT and University of Chicago) but was removed from the data set because there was no information available about percent of applicants accepted.
a. Using a bivariate regression model, find out if GPA has any effect on ASP.
b. Using a suitable regression model, find out if GMAT scores have any relationship to ASP.

c. Does annual tuition have any relationship to ASP? How do you know? If there is a positive relationship between the two, does that mean it pays to go to the most expensive business school? Can you argue that a high-tuition business school means a high-quality MBA program? Why or why not?
d. Does the recruiter perception have any bearing on ASP?

2.17. Table 2-13 (found on the textbook’s Web site) gives data on real GDP (Y) and civilian unemployment rate (X) for the United States for period 1960 to 2006.
a. Estimate Okun’s law in the form of Eq. (2.22). Are the regression results similar to the ones shown in (2.22)? Does this suggest that Okun’s law is universally valid?
b. Now regress percentage change in real GDP on change in the civilian unemployment rate and interpret your regression results.
c. If the unemployment rate remains unchanged, what is the expected (percent) rate of growth in real GDP? (Use the regression in [b]). How would you interpret this growth rate?

2.18. Refer to Example 2.3, for which the data are as shown in Table 2-6 (on the textbook’s Web site).
a. Using a statistical package of your choice, confirm the regression results given in Eq. (2.24) and Eq. (2.25).
b. For both regressions, get the estimated values of Y (i.e., \( \hat{Y}_i \)) and compare them with the actual Y values in the sample. Also obtain the residual values, ei. From this can you tell which is a better model, Eq. (2.24) or Eq. (2.25)?

2.19. Refer to Example 2.5 on antique clock prices. Table 2-14 gives the underlying data.
a. Plot clock prices against the age of the clock and against the number of bidders. Does this plot suggest that the linear regression models shown in Eq. (2.27) and Eq. (2.28) may be appropriate?
b. Would it make any sense to plot the number of bidders against the age of the clock? What would such a plot reveal?

TABLE 2-14 AUCTION DATA ON PRICE, AGE OF CLOCK, AND NUMBER OF BIDDERS

Observation  Price  Age  Number of bidders    Observation  Price  Age  Number of bidders
 1           1235   127  13                   17            854   143   6
 2           1080   115  12                   18           1483   159   9
 3            845   127   7                   19           1055   108  14
 4           1552   150   9                   20           1545   175   8
 5           1047   156   6                   21            729   108   6
 6           1979   182  11                   22           1792   179   9
 7           1822   156  12                   23           1175   111  15
 8           1253   132  10                   24           1593   187   8
 9           1297   137   9                   25           1147   137   8
10            946   113   9                   26           1092   153   6
11           1713   137  15                   27           1152   117  13
12           1024   117  11                   28           1336   126  10
13           2131   170  14                   29            785   111   7
14           1550   182   8                   30            744   115   7
15           1884   162  11                   31           1356   194   5
16           2041   184  10                   32           1262   168   7

2.20. Refer to the math S.A.T. score example discussed in the text. Table 2-4 gives the necessary raw calculations to obtain the OLS estimators. Look at the columns Y (actual Y) and \( \hat{Y} \) (estimated Y) values. Plot the two in a scattergram. What does the scattergram reveal? If you believe that the fitted model [Eq. (2.20)] is a “good” model, what should be the shape of the scattergram? In the next chapter we will see what we mean by a “good” model.

2.21. Table 2-15 (on the textbook’s Web site) gives data on verbal and math S.A.T. scores for both males and females for the period 1972–2007.
a. You want to predict the male math score (Y) on the basis of the male verbal score (X). Develop a suitable linear regression model and estimate its parameters.
b. Interpret your regression results.
c. Reverse the roles of Y and X and regress the verbal score on the math score. Interpret this regression.
d. Let a2 be the slope coefficient in the regression of the math score on the verbal score and let b2 be the slope coefficient of the verbal score on the math score. Multiply these two values. Compare the resulting value with the r2 obtained from the regression of math score on verbal score or the r2 value obtained from the regression of verbal score on math score. What conclusion can you draw from this exercise?

2.22. Table 2-16 (on the textbook’s Web site) gives data on investment rate (ipergdp) and savings rate (spergdp), both measured as percent of GDP, for a cross-section of countries. These rates are averages for the period 1960–1974.*
a. Plot the investment rate on the vertical axis and the savings rate on the horizontal axis.
b. Eyeball a suitable curve from the scatter diagram in (a).
c. Now estimate the following model

\[ ipergdp_i = B_1 + B_2\, spergdp_i + u_i \]

d. Interpret the estimated coefficients.
e. What general conclusion do you draw from your analysis?
Note: Save your results for further analysis in the next chapter.

*Source of data: Martin Feldstein and Charles Horioka, “Domestic Savings and International Capital Flows,” Economic Journal, vol. 90, June 1980, pp. 314–329.

OPTIONAL QUESTIONS

2.23. Prove that \( \sum e_i = 0 \), and hence show that \( \bar{e} = 0 \).

2.24. Prove that \( \sum e_i x_i = 0 \).

2.25. Prove that \( \sum e_i \hat{Y}_i = 0 \), that is, that the sum of the product of residuals ei and the estimated Yi is always zero.

2.26. Prove that \( \bar{Y} = \bar{\hat{Y}} \), that is, that the means of the actual Y values and the estimated Y values are the same.

2.27. Prove that \( \sum x_i y_i = \sum x_i Y_i = \sum X_i y_i \), where \( x_i = (X_i - \bar{X}) \) and \( y_i = (Y_i - \bar{Y}) \).

2.28. Prove that \( \sum x_i = \sum y_i = 0 \), where xi and yi are as defined in Problem 2.27.

2.29. For the math S.A.T. score example data given in Table 2-4, verify that statements made in Question 2.23 hold true (save the rounding errors).

APPENDIX 2A: Derivation of Least-Squares Estimates

We start with Eq. (2.13):

\[ \sum e_i^2 = \sum (Y_i - b_1 - b_2 X_i)^2 \]    (2A.1)

Using the technique of partial differentiation from calculus, we obtain:

\[ \partial \sum e_i^2 / \partial b_1 = 2 \sum (Y_i - b_1 - b_2 X_i)(-1) \]    (2A.2)

\[ \partial \sum e_i^2 / \partial b_2 = 2 \sum (Y_i - b_1 - b_2 X_i)(-X_i) \]    (2A.3)

By the first order condition of optimization, we set these two derivatives to zero and simplify, which will give

\[ \sum Y_i = n b_1 + b_2 \sum X_i \]    (2A.4)

\[ \sum Y_i X_i = b_1 \sum X_i + b_2 \sum X_i^2 \]    (2A.5)

which are Eqs. (2.14) and (2.15), respectively, given in the text. Solving these two equations simultaneously, we get the formulas given in Eqs. (2.16) and (2.17).
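The two normal equations derived in Appendix 2A, \( \sum Y_i = n b_1 + b_2 \sum X_i \) and \( \sum Y_i X_i = b_1 \sum X_i + b_2 \sum X_i^2 \), form a 2-by-2 linear system in b1 and b2, so they can also be solved directly. The sketch below is an illustration only: the data are hypothetical, the function name is invented, and Cramer's rule is just one convenient way to solve the system. It also checks that the resulting residuals satisfy the two first-order conditions.

```python
def solve_normal_equations(x, y):
    """Solve the two OLS normal equations for (b1, b2) by Cramer's rule."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # System:  sum_y  = n * b1     + b2 * sum_x
    #          sum_xy = b1 * sum_x + b2 * sum_x2
    det = n * sum_x2 - sum_x ** 2
    b1 = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b2 = (n * sum_xy - sum_x * sum_y) / det
    return b1, b2


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]

b1, b2 = solve_normal_equations(X, Y)
residuals = [yi - b1 - b2 * xi for xi, yi in zip(X, Y)]

print(f"b1 = {b1:.4f}, b2 = {b2:.4f}")
print(f"sum of residuals         = {sum(residuals):.2e}")
print(f"sum of residuals times X = {sum(e * xi for e, xi in zip(residuals, X)):.2e}")
```

Both printed sums should be zero up to floating-point error, because setting the two partial derivatives to zero is exactly the requirement that the residuals sum to zero and are orthogonal to X.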

CHAPTER 3 THE TWO-VARIABLE MODEL: HYPOTHESIS TESTING

In Chapter 2 we showed how the method of least squares works. By applying that method to our math S.A.T. sample data given in Table 2-2, we obtained the following math S.A.T. score function:

\[ \hat{Y}_i = 432.4138 + 0.0013X_i \]    (2.20)

where Y represents math S.A.T. score and X represents annual family income, measured in dollars.

This example illustrated the estimation stage of statistical inference. We now turn our attention to its other stage, namely, hypothesis testing. The important question that we raise is: How “good” is the estimated regression line given in Equation (2.20)? That is, how can we tell that it really is a good estimator of the true population regression function (PRF)? How can we be sure just on the basis of a single sample given in Table 2-2 that the estimated regression function (i.e., the sample regression function [SRF]) is in fact a good approximation of the true PRF?

We cannot answer this question definitely unless we are a little more specific about our PRF, Eq. (2.2). As Eq. (2.2) shows, Yi depends on both Xi and ui. Now we have assumed that the Xi values are known or given; recall from Chapter 2 that our analysis is a conditional regression analysis, conditional upon the given X’s. In short, we treat the X values as nonstochastic. The (nonobservable) error term u is of course random, or stochastic. (Why?) Since a stochastic term (u) is added to a nonstochastic term (X) to generate Y, Y becomes stochastic, too. This means that unless we are willing to assume how the stochastic u terms are generated, we will not be able to tell how good an SRF is as an estimate of the true PRF.

In deriving the ordinary least squares (OLS) estimators so far, we did not say how the ui were generated, for the derivation of OLS estimators did not depend on any (probabilistic) assumption about the error term. But in testing statistical hypotheses based on the SRF, we cannot make further progress, as we will show shortly, unless we make some specific assumptions about how ui are generated. This is precisely what the so-called classical linear regression model (CLRM) does, which we will now discuss. Again, to explain the fundamental ideas, we consider the two-variable regression model introduced in Chapter 2. In Chapter 4 we extend the ideas developed here to the multiple regression models.

3.1 THE CLASSICAL LINEAR REGRESSION MODEL

The CLRM makes the following assumptions:

A3.1.

The regression model is linear in the parameters; it may or may not be linear in the variables. That is, the regression model is of the following type.

\[ Y_i = B_1 + B_2 X_i + u_i \]    (2.2)

As will be discussed in Chapter 4, this model can be extended to include more explanatory variables.

A3.2.

The explanatory variable(s) X is uncorrelated with the disturbance term u. However, if the X variable(s) is nonstochastic (i.e., its value is a fixed number), this assumption is automatically fulfilled. Even if the X value(s) is stochastic, with a large enough sample size this assumption can be relaxed without severely affecting the analysis.¹

This assumption is not a new assumption because in Chapter 2 we stated that our regression analysis is a conditional regression analysis, conditional upon the given X values. In essence, we are assuming that the X’s are nonstochastic. Assumption (3.2) is made to deal with simultaneous equation regression models, which we will discuss in Chapter 11.

A3.3.

Given the value of Xi, the expected, or mean, value of the disturbance term u is zero. That is,

\[ E(u \mid X_i) = 0 \]    (3.1)

Recall our discussion in Chapter 2 about the nature of the random term ui. It represents all those factors that are not specifically introduced in the model. What Assumption (3.3) states is that these other factors or forces are not related to Xi (the variable explicitly introduced in the model) and therefore, given the value of Xi, their mean value is zero.² This is shown in Figure 3-1.

¹ For further discussion, see Gujarati and Porter, Basic Econometrics, 5th ed., McGraw-Hill, New York, 2009.

A3.4.

The variance of each ui is constant, or homoscedastic (homo means equal and scedastic means variance). That is,

\[ \mathrm{var}(u_i) = \sigma^2 \]    (3.2)

Geometrically, this assumption is as shown in Figure 3-2(a). This assumption simply means that the conditional distribution of each Y population corresponding to the given value of X has the same variance; that is, the individual Y values are spread around their mean values with the same variance.³ If this is not the case, then we have heteroscedasticity, or unequal variance, which is depicted in Figure 3-2(b).⁴ As this figure shows, the variance of each Y population is different, which is in contrast to Figure 3-2(a), where each Y population has the same variance. The CLRM assumes that the variance of u is as shown in Figure 3-2(a).
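The contrast between equal and unequal error variance can be illustrated with a small simulation. Everything in the sketch below is an assumption made for illustration: the PRF parameters, the three fixed X values, the normal distribution of the errors, and the rule that makes the error standard deviation grow with X in the heteroscedastic case. None of these come from the text.

```python
import random
from statistics import pvariance

B1, B2 = 10.0, 0.5         # hypothetical PRF parameters
X_LEVELS = (80, 160, 240)  # hypothetical fixed (nonstochastic) X values


def conditional_variances(sigma_of_x, n=5000, seed=0):
    """Simulate Y = B1 + B2*X + u at each fixed X and return {X: variance of Y given X}."""
    rng = random.Random(seed)
    result = {}
    for x in X_LEVELS:
        ys = [B1 + B2 * x + rng.gauss(0.0, sigma_of_x(x)) for _ in range(n)]
        result[x] = pvariance(ys)
    return result


# Homoscedastic: the same error standard deviation (5) at every X.
homoscedastic = conditional_variances(lambda x: 5.0)
# Heteroscedastic: the error standard deviation grows with X (an assumed pattern).
heteroscedastic = conditional_variances(lambda x: 0.05 * x)

for x in X_LEVELS:
    print(f"X = {x:3d}: var(Y|X) homoscedastic = {homoscedastic[x]:7.2f}, "
          f"heteroscedastic = {heteroscedastic[x]:7.2f}")
```

Because X is held fixed within each group, all of the variation in Y comes from u, so the simulated conditional variance of Y stays close to 25 in the first case and rises with X in the second.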

FIGURE 3-1 Conditional distribution of disturbances ui. [Plot of Y against X showing the PRF: E(Y|Xi) = B1 + B2Xi, with disturbances +ui and −ui around the line.]

² Note that Assumption (3.2) only states that X and u are uncorrelated. Assumption (3.3) adds that not only are X and u uncorrelated, but also that given the value of X, the mean of u (which represents umpteen factors) is zero.

³ Since the X values are assumed to be given, or nonstochastic, the only source of variation in Y is from u. Therefore, given Xi, the variance of Yi is the same as that of ui. In short, the conditional variances of ui and Yi are the same, namely, \( \sigma^2 \). Note, however, that the unconditional variance of Yi, as shown in Appendix B, is \( E[Y_i - E(Y)]^2 \). As we will see, if the variable X has any impact on Y, the conditional variance of Y will be smaller than the unconditional variance of Y. Incidentally, the sample counterpart of the unconditional variance of Y is \( \sum (Y_i - \bar{Y})^2/(n - 1) \).

⁴ There is a debate in the literature regarding whether it is homoscedasticity or homoskedasticity and heteroscedasticity or heteroskedasticity. Both seem to be acceptable.

A3.5.

There is no correlation between two error terms. This is the assumption of no autocorrelation.

Algebraically, this assumption can be written as

\[ \mathrm{cov}(u_i, u_j) = 0, \quad i \neq j \]    (3.3)

Here cov stands for covariance (see Appendix B) and i and j index any two error terms. (Note: If i = j, Equation (3.3) will give the variance of u, which by Eq. (3.2) is a constant.)

Geometrically, Eq. (3.3) can be shown in Figure 3-3.

Assumption (3.5) means that there is no systematic relationship between two error terms. It rules out the case in which, if one u is above the mean value, another error term u will also be above the mean value (positive correlation), as well as the case in which, if one error term is below the mean value, another error term has to be above the mean value, or vice versa (negative correlation). In short, the assumption of no autocorrelation means the error terms ui are random.
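The three patterns described above (no, positive, and negative autocorrelation) can be generated artificially. The sketch below assumes one particular mechanism that the text does not specify: each error equals rho times the previous error plus a fresh standard normal shock, so that rho = 0 gives uncorrelated errors. The function names, the sample size, and the choice to measure correlation only between adjacent errors are all illustrative.

```python
import random


def simulate_errors(rho, n=5000, seed=0):
    """Generate u_t = rho * u_(t-1) + eps_t with eps_t drawn from N(0, 1)."""
    rng = random.Random(seed)
    errors, previous = [], 0.0
    for _ in range(n):
        previous = rho * previous + rng.gauss(0.0, 1.0)
        errors.append(previous)
    return errors


def lag1_correlation(u):
    """Sample correlation between each error and the one immediately before it."""
    current, lagged = u[1:], u[:-1]
    n = len(current)
    mean_c = sum(current) / n
    mean_l = sum(lagged) / n
    cov = sum((c - mean_c) * (l - mean_l) for c, l in zip(current, lagged))
    var_c = sum((c - mean_c) ** 2 for c in current)
    var_l = sum((l - mean_l) ** 2 for l in lagged)
    return cov / (var_c * var_l) ** 0.5


for label, rho in [("no autocorrelation", 0.0),
                   ("positive autocorrelation", 0.8),
                   ("negative autocorrelation", -0.8)]:
    r = lag1_correlation(simulate_errors(rho))
    print(f"{label:>25}: correlation of adjacent errors = {r:+.3f}")
```

Only the rho = 0 case is consistent with Assumption (3.5); in the other two cases knowing one error tells you something systematic about the next.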

FIGURE 3-2 (a) Homoscedasticity (equal variance); (b) Heteroscedasticity (unequal variance). [Each panel plots Y against X with the PRF: Yi = B1 + B2Xi.]

FIGURE 3-3 Patterns of autocorrelation: (a) No autocorrelation; (b) positive autocorrelation; (c) negative autocorrelation. [Each panel plots ui against uj.]

Since any two error terms are assumed to be uncorrelated, it means that any two Y values will also be uncorrelated; that is, \( \mathrm{cov}(Y_i, Y_j) = 0 \). This is because \( Y_i = B_1 + B_2 X_i + u_i \) and given that the B’s are fixed numbers and that X is assumed to be fixed, Y will vary as u varies. So, if the u’s are uncorrelated, the Y’s will be uncorrelated also.

A3.6.

The regression model is correctly specified. Alternatively, there is no specifi- cation bias or specification error in the model used in empirical analysis.

What this assumption implies is that we have included all the variables that affect a particular phenomenon. Thus, if we are studying the demand for auto- mobiles, if we only include prices of automobiles and consumer income and do not take into account variables such as advertising, financing costs, and gaso- line prices, we will be committing model specification errors. Of course, it is not easy to determine the “correct” model in any given case, but we will provide some guidelines in Chapter 7.

You might wonder about all these assumptions. Why are they needed? How realistic are they? What happens if they are not true? How do we know that a particular regression model in fact satisfies all these assumptions? Although these questions are certainly pertinent, at this stage of the development of our subject matter, we cannot provide totally satisfactory answers to all of them. However, as we progress through the book, we will see the utility of these assumptions. As a matter of fact, all of Part II is devoted to finding out what happens if one or more of the assumptions of CLRM are not fulfilled.

But keep in mind that in any scientific inquiry we make certain assumptions because they facilitate the development of the subject matter in gradual steps, not because they are necessarily realistic. An analogy might help here. Students of economics are generally introduced to the model of perfect competition before they are introduced to the models of imperfect competition. This is done because the implications derived from this model enable us to better appreciate the models of imperfect competition, not because the model of perfect competi- tion is necessarily realistic, although there are markets that may be reasonably perfectly competitive, such as the stock market or the foreign exchange market.

3.2 VARIANCES AND STANDARD ERRORS OF ORDINARY LEAST SQUARES ESTIMATORS

One immediate result of the assumptions just introduced is that they enable us to estimate the variances and standard errors of the ordinary least squares (OLS) estimators given in Eqs. (2.16) and (2.17). In Appendix D we discuss the basics of estimation theory, including the notions of (point) estimators, their sampling distributions, and the concepts of the variance and standard error of the estimators. Based on our knowledge of those concepts, we know that the

Yi = B1 + B2Xi + ui cov(Yi, Yj) = 0

CHAPTER THREE: THE TWO-VARIABLE MODEL: HYPOTHESIS TESTING 57

guj75845_ch03.qxd 4/16/09 11:24 AM Page 57

OLS estimators given in Eqs. (2.16) and (2.17) are random variables, for their val- ues will change from sample to sample. Naturally, we would like to know something about the sampling variability of these estimators, that is, how they vary from sample to sample. These sampling variabilities, as we know now, are measured by the variances of these estimators, or by their standard errors (se), which are the square roots of the variances. The variances and standard errors of the OLS estimators given in Eqs. (2.16) and (2.17) are as follows:5

$$\operatorname{var}(b_1) = \sigma^2_{b_1} = \frac{\sum X_i^2}{n \sum x_i^2} \cdot \sigma^2 \tag{3.4}$$

(Note: This formula involves both small x and capital X.)

$$\operatorname{se}(b_1) = \sqrt{\operatorname{var}(b_1)} \tag{3.5}$$

$$\operatorname{var}(b_2) = \sigma^2_{b_2} = \frac{\sigma^2}{\sum x_i^2} \tag{3.6}$$

$$\operatorname{se}(b_2) = \sqrt{\operatorname{var}(b_2)} \tag{3.7}$$

where var = the variance and se = the standard error, and where $\sigma^2$ is the variance of the disturbance term ui, which by the assumption of homoscedasticity is assumed to be the same for each u.

Once $\sigma^2$ is known, then all the terms on the right-hand sides of the preceding equations can be easily computed, which will give us the numerical values of the variances and standard errors of the OLS estimators. The homoscedastic $\sigma^2$ is estimated from the following formula:

$$\hat{\sigma}^2 = \frac{\sum e_i^2}{n - 2} \tag{3.8}$$

where $\hat{\sigma}^2$ is an estimator of $\sigma^2$ (recall we use ˆ to indicate an estimator) and $\sum e_i^2$ is the residual sum of squares (RSS), that is, $\sum (Y_i - \hat{Y}_i)^2$, the sum of the squared differences between the actual Y and the estimated Y. (See the next to the last column of Table 2-4.)

The expression $(n - 2)$ is known as the degrees of freedom (d.f.), which, as noted in Appendix C, is simply the number of independent observations.6

Once $\sum e_i^2$ is computed, as shown in Table 2-4, $\hat{\sigma}^2$ can be computed easily. Incidentally, in passing, note that

$$\hat{\sigma} = \sqrt{\hat{\sigma}^2} \tag{3.9}$$

which is known as the standard error of the regression (SER), which is simply the standard deviation of the Y values about the estimated regression line.7 This standard error of regression is often used as a summary measure of the goodness of fit of the estimated regression line, a topic discussed in Section 3.6. As you would suspect, the smaller the value of $\hat{\sigma}$, the closer the actual Y value is to its estimated value from the regression model.

5 The proofs can be found in Gujarati and Porter, Basic Econometrics, 5th ed., McGraw-Hill, New York, 2009, pp. 93–94.

6 Notice that we can compute ei only when $\hat{Y}_i$ is computed. But to compute the latter, we must first obtain b1 and b2. In estimating these two unknowns, we lose 2 d.f. Therefore, although we have n observations, the d.f. are only $(n - 2)$.
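The formulas in Eqs. (3.4) to (3.9) can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes that Eqs. (2.16) and (2.17) are the usual least-squares formulas (b2 as the ratio of the sum of cross-products of deviations to the sum of squared deviations of X, and b1 as the mean of Y minus b2 times the mean of X), the function name is hypothetical, and the five data points are made-up toy numbers, not the S.A.T. data of Table 2-4.

```python
import math


def ols_with_standard_errors(X, Y):
    """Two-variable OLS with the variance formulas of Eqs. (3.4)-(3.9)."""
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n

    # Small x denotes deviations from the mean; capital X denotes raw values.
    sum_x_sq = sum((Xi - x_bar) ** 2 for Xi in X)
    sum_X_sq = sum(Xi ** 2 for Xi in X)
    sum_xy = sum((Xi - x_bar) * (Yi - y_bar) for Xi, Yi in zip(X, Y))

    # Assumed forms of Eqs. (2.17) and (2.16).
    b2 = sum_xy / sum_x_sq
    b1 = y_bar - b2 * x_bar

    # Residual sum of squares and the estimator of the error variance.
    rss = sum((Yi - (b1 + b2 * Xi)) ** 2 for Xi, Yi in zip(X, Y))
    sigma2_hat = rss / (n - 2)                         # Eq. (3.8)

    var_b1 = sum_X_sq / (n * sum_x_sq) * sigma2_hat    # Eq. (3.4)
    var_b2 = sigma2_hat / sum_x_sq                     # Eq. (3.6)

    return {
        "b1": b1,
        "b2": b2,
        "sigma2_hat": sigma2_hat,
        "ser": math.sqrt(sigma2_hat),                  # Eq. (3.9)
        "var_b1": var_b1,
        "se_b1": math.sqrt(var_b1),                    # Eq. (3.5)
        "var_b2": var_b2,
        "se_b2": math.sqrt(var_b2),                    # Eq. (3.7)
    }


# Hypothetical toy data, used only to exercise the formulas.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
result = ols_with_standard_errors(X, Y)
print(result)
```

For these toy numbers the slope is 0.6, the intercept is 2.2, and the estimated error variance is 2.4/3 = 0.8, which can be confirmed by hand. As in the text, the unknown error variance is replaced by its estimator when the variances of b1 and b2 are computed.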

Variances and Standard Errors of the Math S.A.T. Score Example

Using the preceding formulas, let us compute the variances and standard errors of our math S.A.T. score example. These calculations are presented in Table 3-1. (See Eqs. [3.10] to [3.15] therein.)

TABLE 3-1 COMPUTATIONS FOR THE S.A.T. EXAMPLE

| Estimator | Formula | Answer | Equation number |
| $\hat{\sigma}^2$ | $\sum e_i^2/(n - 2)$ | 975.1347 | (3.10) |
| $\hat{\sigma}$ | $\sqrt{\hat{\sigma}^2} = \sqrt{975.1347}$ | 31.2271 | (3.11) |
| var (b1) | $\left(\dfrac{\sum X_i^2}{n \sum x_i^2}\right)\sigma^2 = \dfrac{4.76 \times 10^{10}}{10(1.624 \times 10^{10})}(975.1347)$ | 285.8153 | (3.12) |
| se (b1) | $\sqrt{\operatorname{var}(b_1)} = \sqrt{285.8153}$ | 16.9061 | (3.13) |
| var (b2) | $\dfrac{\sigma^2}{\sum x_i^2} = \dfrac{975.1347}{1.624 \times 10^{10}}$ | $6.0045 \times 10^{-8}$ | (3.14) |
| se (b2) | $\sqrt{\operatorname{var}(b_2)} = \sqrt{6.0045 \times 10^{-8}}$ | 0.000245 | (3.15) |

Note: The raw data underlying the calculations are given in Table 2-4. In computing the variances of the estimators, $\sigma^2$ has been replaced by its estimator, $\hat{\sigma}^2$.

Summary of the Math S.A.T. Score Function

Let us express the estimated S.A.T. score function in the following form:

$$\hat{Y}_i = 432.4138 + 0.0013X_i \tag{3.16}$$
$$\text{se} = (16.9061)\ \ (0.000245)$$

where the figures in parentheses are the estimated standard errors. Regression results are sometimes presented in this format (but more on this in Section 3.8). Such a presentation indicates immediately the estimated parameters and their standard errors. For example, it tells us that the estimated slope coefficient of the math S.A.T. score function (i.e., the coefficient of the annual family income variable) is 0.0013 and its standard deviation, or standard error, is 0.000245. This is a measure of variability of b2 from sample to sample.

7 Note the difference between the standard error of regression $\hat{\sigma}$ and the standard deviation of Y. The latter is measured, as usual, from its mean value, as $S_y = \sqrt{\sum (Y_i - \bar{Y})^2/(n - 1)}$, whereas the former is measured from the estimated value $\hat{Y}_i$ (i.e., from the sample regression). See also footnote 3.

What use can we make of this finding? Can we say, for example, that our computed b2 lies within a certain number of standard deviation units from the true B2? If we can do that, we can state with some confidence (i.e., probability) how good the computed SRF, Equation (3.16), is as an estimate of the true PRF. This is, of course, the topic of hypothesis testing.

But before discussing hypothesis testing, we need a bit more theory. In particular, since b1 and b2 are random variables, we must find their sampling, or probability, distributions. Recall from Appendixes C and D that a random variable (r.v.) has a probability distribution associated with it. Once we determine the sampling distributions of our two estimators, as we will show in Section 3.4, the task of hypothesis testing becomes straightforward. But even before that we answer an important question: Why do we use the OLS method?

3.3 WHY OLS? THE PROPERTIES OF OLS ESTIMATORS

The method of OLS is used popularly not only because it is easy to use but also because it has some strong theoretical properties, which are summarized in the well-known Gauss-Markov theorem.

Gauss-Markov Theorem

Given the assumptions of the classical linear regression model, the OLS estimators have minimum variance in the class of linear unbiased estimators; that is, they are BLUE (best linear unbiased estimators).

We provide an overview of the BLUE property in Appendix D. In short, the OLS estimators have the following properties:8

1. b1 and b2 are linear estimators; that is, they are linear functions of the random variable Y, which is evident from Equations (2.16) and (2.17).

2. They are unbiased; that is, E(b1) = B1 and E(b2) = B2. Therefore, in repeated applications, on average, b1 and b2 will coincide with their true values B1 and B2, respectively.

3. $E(\hat{\sigma}^2) = \sigma^2$; that is, the OLS estimator of the error variance is unbiased. In repeated applications, on average, the estimated value of the error variance will converge to its true value.

4. b1 and b2 are efficient estimators; that is, var (b1) is less than the variance of any other linear unbiased estimator of B1, and var (b2) is less than the variance of any other linear unbiased estimator of B2. Therefore, we will be able to estimate the true B1 and B2 more precisely if we use OLS rather than any other method that also gives linear unbiased estimators of the true parameters.

8 For proof, see Gujarati and Porter, Basic Econometrics, 5th ed., McGraw-Hill, New York, 2009, pp. 95–96.

The upshot of the preceding discussion is that the OLS estimators possess many desirable statistical properties that we discuss in Appendix D. It is for this reason that the OLS method has been used popularly in regression analysis, as well as for its intuitive appeal and ease of use.

Monte Carlo Experiment

In theory the OLS estimators are unbiased, but how do we know that in practice this is the case? To find out, let us conduct the following Monte Carlo experiment.

Assume that we are given the following information:

$$Y_i = B_1 + B_2X_i + u_i = 1.5 + 2.0X_i + u_i$$

where $u_i \sim N(0, 4)$. That is, we are told that the true values of the intercept and slope coefficients are 1.5 and 2.0, respectively, and that the error term follows the normal distribution with a mean of zero and a variance of 4. Now suppose you are given 10 values of X: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.

Given this information, you can proceed as follows. Using any statistical package, you generate 10 values of ui from a normal distribution with mean zero and variance 4. Given B1, B2, the 10 values of X, and the 10 values of ui generated from the normal distribution, you will then obtain 10 values of Y from the preceding equation. Call this experiment or sample number 1. Draw another 10 values of ui from the same normal distribution, generate another 10 values of Y, and call it sample number 2. In this manner obtain 21 samples.

For each sample of 10 values, regress Yi generated above on the X values and obtain b1, b2, and $\hat{\sigma}^2$. Repeat this exercise for all 21 samples. Therefore, you will have 21 values each of b1, b2, and $\hat{\sigma}^2$. We conducted this experiment and obtained the results shown in Table 3-2.

From the data given in this table, we have computed the mean, or average, values of b1, b2, and $\hat{\sigma}^2$, which are, respectively, 1.4526, 1.9665, and 4.4743, whereas the true values of the corresponding coefficients, as we know, are 1.5, 2.0, and 4.0.

What conclusion can we draw from this experiment? It seems that if we apply the method of least squares time and again, on average, the values of the estimated parameters will be equal to their true (population parameter) values. That is, OLS estimators are unbiased. In the present example, had we conducted more than 21 sampling experiments, we would expect the averages to come closer still to the true values.
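The Monte Carlo experiment described above can be reproduced with a short simulation. The sketch below is an illustration only, not the procedure used to produce Table 3-2: the function names, the seed, and the choice of Python's standard `random` module are assumptions, and because the draws are random the averages will differ from those reported in the table.

```python
import random


def ols_fit(x, y):
    """Return (b1, b2, sigma2_hat) for a two-variable regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    return b1, b2, rss / (n - 2)


def monte_carlo(n_samples=21, seed=0):
    """Average b1, b2, and sigma2_hat over repeated samples from
    Y = 1.5 + 2.0 X + u, u ~ N(0, 4), with X fixed at 1, ..., 10."""
    rng = random.Random(seed)
    x = list(range(1, 11))          # X is held fixed across samples
    estimates = []
    for _ in range(n_samples):
        # gauss takes a standard deviation: sd = 2 gives variance 4.
        u = [rng.gauss(0.0, 2.0) for _ in x]
        y = [1.5 + 2.0 * xi + ui for xi, ui in zip(x, u)]
        estimates.append(ols_fit(x, y))
    mean_b1 = sum(e[0] for e in estimates) / n_samples
    mean_b2 = sum(e[1] for e in estimates) / n_samples
    mean_sigma2 = sum(e[2] for e in estimates) / n_samples
    return mean_b1, mean_b2, mean_sigma2


print(monte_carlo(n_samples=21))     # as in the text: 21 samples
print(monte_carlo(n_samples=5000))   # many more samples
```

With only 21 samples the averages can sit noticeably away from 1.5, 2.0, and 4.0; raising `n_samples` should pull them closer, which is the sense in which the estimators are unbiased.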

3.4 THE SAMPLING, OR PROBABILITY, DISTRIBUTIONS OF OLS ESTIMATORS

Now that we have seen how to compute the OLS estimators and their standard errors and have examined some of the properties of these estimators, we need to find the sampling distributions of these estimators. Without that knowledge we will not be able to engage in hypothesis testing. The general notion of sampling distribution of an estimator is discussed in Appendix C (see Section C.2).

To derive the sampling distributions of the OLS estimators b1 and b2, we need to add one more assumption to the list of assumptions of the CLRM. This assumption is

A3.7.

In the PRF $Y_i = B_1 + B_2X_i + u_i$ the error term ui follows the normal distribution with mean zero and variance $\sigma^2$. That is,

$$u_i \sim N(0, \sigma^2) \tag{3.17}$$

TABLE 3-2 MONTE CARLO EXPERIMENT: Yi = 1.5 + 2Xi + ui; u ~ N(0, 4)

| b1 | b2 | $\hat{\sigma}^2$ |
| 2.247 | 1.840 | 2.7159 |
| 0.360 | 2.090 | 7.1663 |
| -2.483 | 2.558 | 3.3306 |
| 0.220 | 2.180 | 2.0794 |
| 3.070 | 1.620 | 4.3932 |
| 2.570 | 1.830 | 7.1770 |
| 2.551 | 1.928 | 5.7552 |
| 0.060 | 2.070 | 3.6176 |
| -2.170 | 2.537 | 3.4708 |
| 1.470 | 2.020 | 4.4479 |
| 2.540 | 1.970 | 2.1756 |
| 2.340 | 1.960 | 2.8291 |
| 0.775 | 2.050 | 1.5252 |
| 3.020 | 1.740 | 1.5104 |
| 0.810 | 1.940 | 4.7830 |
| 1.890 | 1.890 | 7.3658 |
| 2.760 | 1.820 | 1.8036 |
| -0.136 | 2.130 | 1.8796 |
| 0.950 | 2.030 | 4.9908 |
| 2.960 | 1.840 | 4.5514 |
| 3.430 | 1.740 | 5.2258 |

Means: $\bar{b}_1 = 1.4526$, $\bar{b}_2 = 1.9665$, mean of $\hat{\sigma}^2 = 4.4743$

What is the rationale for this assumption? There is a celebrated theorem in statistics, known as the central limit theorem (CLT), which we discuss in Appendix C (see Section C.1), which states that:

Central Limit Theorem

If there is a large number of independent and identically distributed random variables, then, with a few exceptions,9 the distribution of their sum tends to be a normal distribution as the number of such variables increases indefinitely.

Recall from Chapter 2 our discussion about the nature of the error term, ui. As shown in Section 2.4, the error term represents the influence of all those forces that affect Y but are not specifically included in the regression model because there are so many of them and the individual effect of any one such force (i.e., variable) on Y may be too minor. If all these forces are random, and if we let u represent the sum of all these forces, then by invoking the CLT we can assume that the error term u follows the normal distribution. We have already assumed that the mean value of ui is zero and that its variance, following the homoscedasticity assumption, is the constant $\sigma^2$. Hence, we have Equation (3.17).

But how does the assumption that u follows the normal distribution help us to find out the probability distributions of b1 and b2? Here we make use of another property of the normal distribution discussed in Appendix C, namely, any linear function of a normally distributed variable is itself normally distributed. Does this mean that if we prove that b1 and b2 are linear functions of the normally distributed variable ui, they themselves are normally distributed? That's right! You can indeed prove that these two OLS estimators are in fact linear functions of the normally distributed ui. (For proof, see Exercise 3.24.)10

Now we know from Appendix C that a normally distributed r.v. has two parameters, the mean and the variance. What are the parameters of the normally distributed b1 and b2? They are as follows:

$$b_1 \sim N\left(B_1, \sigma^2_{b_1}\right) \tag{3.18}$$

$$b_2 \sim N\left(B_2, \sigma^2_{b_2}\right) \tag{3.19}$$

where the variances of b1 and b2 are as given in Eq. (3.4) and Eq. (3.6).

In short, b1 and b2 each follow the normal distribution with their means equal to true B1 and B2 and their variances given by Eqs. (3.4) and (3.6) developed previously. Geometrically, the distributions of these estimators are as shown in Figure 3-4.

9 One exception is the Cauchy probability distribution, which has no mean or variance.

10 It may also be noted that since $Y_i = B_1 + B_2X_i + u_i$, if $u_i \sim N(0, \sigma^2)$, then $Y_i \sim N(B_1 + B_2X_i, \sigma^2)$ because Yi is a linear combination of ui. (Note that B1, B2 are constants and Xi fixed.)
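The claim that b2 is a linear function of the error terms, which underlies Eq. (3.19), can be sketched in a few lines. This is an outline rather than the full proof asked for in Exercise 3.24, and it assumes that Eq. (2.17) has the usual deviation form, with $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$; the weights $k_i$ are notation introduced here for illustration.

```latex
% Sketch only; assumes Eq. (2.17) is b_2 = \sum x_i y_i / \sum x_i^2.
\begin{align*}
b_2 &= \frac{\sum x_i y_i}{\sum x_i^2} = \sum k_i Y_i,
      \qquad k_i = \frac{x_i}{\sum x_j^2}
      \quad \Bigl(\text{since } \bar{Y}\sum x_i = 0\Bigr) \\
\sum k_i &= 0, \qquad \sum k_i X_i = \sum k_i x_i = 1 \\
b_2 &= \sum k_i \,(B_1 + B_2 X_i + u_i) = B_2 + \sum k_i u_i \\
E(b_2) &= B_2, \qquad
\operatorname{var}(b_2) = \sigma^2 \sum k_i^2 = \frac{\sigma^2}{\sum x_i^2}
\end{align*}
```

Because the X values are fixed, the weights $k_i$ are constants, so b2 is B2 plus a linear combination of the normally distributed ui. The variance in the last line uses homoscedasticity and no autocorrelation, and it agrees with Eq. (3.6). A similar argument applies to b1.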

3.5 HYPOTHESIS TESTING

Recall that estimation and hypothesis testing are the two main branches of statistical inference. In Chapter 2 we showed how OLS helps us to estimate the parameters of linear regression models. In this chapter the classical framework enabled us to examine some of the properties of OLS estimators. With the added assumption that the error term ui is normally distributed, we were able to find the sampling (or probability) distributions of the OLS estimators, namely, the normal distribution. With this knowledge we are now equipped to deal with the topic of hypothesis testing in the context of regression analysis.

Let us return to our math S.A.T. example. The estimated math S.A.T. score function is given in Eq. (2.20). Suppose someone suggests that annual family income has no relationship to a student's math S.A.T. score:

$$H_0: B_2 = 0$$

In applied regression analysis such a "zero" null hypothesis, the so-called straw man hypothesis, is deliberately chosen to find out whether Y is related to X at all. If there is no relationship between Y and X to begin with, then testing a hypothesis that $B_2 = -2$ or any other value is meaningless. Of course, if the zero null hypothesis is sustainable, there is no point at all in including X in the model. Therefore, if X really belongs in the model, you would fully expect to reject the zero null hypothesis H0 in favor of the alternative hypothesis H1, which says, for example, that $B_2 \neq 0$; that is, the slope coefficient is different from zero. It could be positive or it could be negative.

FIGURE 3-4 (Normal) sampling distributions of b1 and b2. [Panel (a): distribution of b1 centered on B1; panel (b): distribution of b2 centered on B2.]

Our numerical results show that b2 = 0.0013. You would therefore expect that the zero null hypothesis is not tenable in this case. But we cannot look at the numerical results alone, for we know that because of sampling fluctuations, the numerical value will change from sample to sample. Obviously, we need some formal testing procedure to reject or not reject the null hypothesis. How do we proceed?

This should not be a problem now, for in Equation (3.19) we have shown that b2 follows the normal distribution with mean = B2 and $\operatorname{var}(b_2) = \sigma^2/\sum x_i^2$. Then, following our discussion about hypothesis testing in Appendix D, Section D.5, we can use either:

1. The confidence interval approach or
2. The test of significance approach

to test any hypotheses about B2 as well as B1.

Since b2 follows the normal distribution, with the mean and the variance stated in expression (3.19), we know that

$$Z = \frac{b_2 - B_2}{\operatorname{se}(b_2)} = \frac{b_2 - B_2}{\sigma/\sqrt{\sum x_i^2}} \sim N(0, 1) \tag{3.20}$$

follows the standard normal distribution. From Appendix C we know the properties of the standard normal distribution, particularly, the property that $\approx 95$ percent of the area of the normal distribution lies within two standard deviation units of the mean value, where $\approx$ means approximately. Therefore, if our null hypothesis is B2 = 0 and the computed b2 = 0.0013, we can find out the probability of obtaining such a value from the Z, or standard normal, distribution (Appendix E, Table E-1). If this probability is very small, we can reject the null hypothesis, but if it is large, say, greater than 10 percent, we may not reject the null hypothesis. All this is familiar material from Appendixes C and D.

But, there is a hitch! To use Equation (3.20) we must know the true $\sigma^2$. This is not known, but we can estimate it by using $\hat{\sigma}^2$ given in Eq. (3.8). However, if we replace $\sigma$ in Eq. (3.20) by its estimator $\hat{\sigma}$, then, as shown in Appendix C, Eq. (C.8), the right-hand side of Eq. (3.20) follows the t distribution with $(n - 2)$ d.f., not the standard normal distribution; that is,

$$\frac{b_2 - B_2}{\hat{\sigma}/\sqrt{\sum x_i^2}} \sim t_{n-2} \tag{3.21}$$

Or, more generally,

$$\frac{b_2 - B_2}{\operatorname{se}(b_2)} \sim t_{n-2} \tag{3.22}$$

Note that we lose 2 d.f. in computing $\hat{\sigma}^2$ for reasons stated earlier.

Therefore, to test the null hypothesis in the present case, we have to use the t distribution in lieu of the (standard) normal distribution. But the procedure of hypothesis testing remains the same, as explained in Appendix D.

Testing H0: B2 = 0 versus H1: B2 ≠ 0: The Confidence Interval Approach

For our math S.A.T. example we have 10 observations, hence the d.f. are $(10 - 2) = 8$. Let us assume that $\alpha$, the level of significance or the probability of committing a type I error, is fixed at 5 percent. Since the alternative hypothesis is two-sided, from the t table given in Appendix E, Table E-2, we find that for 8 d.f.,

$$P(-2.306 \le t \le 2.306) = 0.95 \tag{3.23}$$

That is, the probability that a t value (for 8 d.f.) lies between the limits (-2.306, 2.306) is 0.95 or 95 percent; these, as we know, are the critical t values. Now by substituting for t from expression (3.21) into the preceding equation, we obtain

$$P\left(-2.306 \le \frac{b_2 - B_2}{\hat{\sigma}/\sqrt{\sum x_i^2}} \le 2.306\right) = 0.95 \tag{3.24}$$

Rearranging inequality (3.24), we obtain

$$P\left(b_2 - 2.306\,\frac{\hat{\sigma}}{\sqrt{\sum x_i^2}} \le B_2 \le b_2 + 2.306\,\frac{\hat{\sigma}}{\sqrt{\sum x_i^2}}\right) = 0.95 \tag{3.25}$$

Or, more generally,

$$P[b_2 - 2.306\,\operatorname{se}(b_2) \le B_2 \le b_2 + 2.306\,\operatorname{se}(b_2)] = 0.95 \tag{3.26}$$

which provides a 95% confidence interval for B2. In repeated applications 95 out of 100 such intervals will include the true B2. As noted previously, in the language of hypothesis testing such a confidence interval is known as the region of acceptance (of H0) and the area outside the confidence interval is known as the rejection region (of H0).

Geometrically, the 95% confidence interval is shown in Figure 3-5(a).

Now following our discussion in Appendix D, if this interval (i.e., the acceptance region) includes the null-hypothesized value of B2, we do not reject the hypothesis. But if the null-hypothesized value lies outside the confidence interval (i.e., it lies in the rejection region), we reject the null hypothesis, bearing in mind that in making either of these decisions we are taking a chance of being wrong a certain percent, say, 5 percent, of the time.

All that remains to be done for our math S.A.T. score example is to obtain the numerical value of this interval. But that is now easy, for we have already obtained se(b2) = 0.000245, as shown in Eq. (3.16). Substituting this value in Eq. (3.26), we now obtain the 95% confidence interval as shown in Figure 3-5(b).

That is,

$$0.0013 - 2.306(0.000245) \le B_2 \le 0.0013 + 2.306(0.000245)$$

$$0.00074 \le B_2 \le 0.00187 \tag{3.27}$$

Since this interval does not include the null-hypothesized value of 0, we can reject the null hypothesis that annual family income is not related to math S.A.T. scores. Put positively, income does have a relationship to math S.A.T. scores.

A cautionary note: As noted in Appendix D, although the statement given in Eq. (3.26) is true, we cannot say that the probability is 95 percent that the particular interval in Eq. (3.27) includes the true B2, for unlike Eq. (3.26), expression (3.27) is not a random interval; it is fixed. Therefore, the probability is either 1 or 0 that the interval in Eq. (3.27) includes B2. We can only say that if we construct 100 intervals like the interval in Eq. (3.27), 95 out of 100 such intervals will include the true B2; we cannot guarantee that this particular interval will necessarily include B2.

Following a similar procedure exactly, the reader should verify that the 95% confidence interval for the intercept term B1 is

$$393.4283 \le B_1 \le 471.3993 \tag{3.28}$$

If, for example, H0: B1 = 0 vs. H1: B1 ≠ 0, obviously this null hypothesis will be rejected too, for the preceding 95% confidence interval does not include 0. On the other hand, if the null hypothesis were that the true intercept term is 400, we would not reject this null hypothesis because the 95% confidence interval includes this value.

FIGURE 3-5 (a) 95% confidence interval for B2 (8 d.f.), running from [b2 − 2.306 se(b2)] to [b2 + 2.306 se(b2)] and centered on b2; (b) 95% confidence interval for the slope coefficient of the math S.A.T. score example
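The confidence-interval calculations in Eqs. (3.26) to (3.28) can be checked with a few lines of Python. This is a minimal sketch: the function names are hypothetical, the critical value 2.306 is taken from the text rather than computed, and the estimates and standard errors are the rounded figures reported in Eq. (3.16), so the last digit of a limit may differ slightly from the printed intervals.

```python
def confidence_interval(estimate, std_error, t_critical):
    """Interval estimate +/- t_critical * std_error, as in Eq. (3.26)."""
    margin = t_critical * std_error
    return estimate - margin, estimate + margin


def interval_contains(interval, hypothesized_value):
    """True if the hypothesized value lies inside the interval
    (the acceptance region), so the null hypothesis is not rejected."""
    lower, upper = interval
    return lower <= hypothesized_value <= upper


T_CRITICAL_8DF_95 = 2.306  # two-sided 5% critical t for 8 d.f., from the text

# Rounded values from Eq. (3.16).
slope_ci = confidence_interval(0.0013, 0.000245, T_CRITICAL_8DF_95)
intercept_ci = confidence_interval(432.4138, 16.9061, T_CRITICAL_8DF_95)

print("95% CI for B2:", slope_ci)
print("95% CI for B1:", intercept_ci)
print("H0: B2 = 0 rejected?  ", not interval_contains(slope_ci, 0.0))
print("H0: B1 = 0 rejected?  ", not interval_contains(intercept_ci, 0.0))
print("H0: B1 = 400 rejected?", not interval_contains(intercept_ci, 400.0))
```

The three decisions match the discussion above: zero lies outside both intervals, whereas 400 lies inside the interval for the intercept.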

The Test of Significance Approach to Hypothesis Testing

The key idea underlying this approach to hypothesis testing is that of a test statistic (see Appendix D) and the sampling distribution of the test statistic under the null hypothesis, H0. The decision to accept or reject H0 is made on the basis of the value of the test statistic obtained from the sample data.

To illustrate this approach, recall that

$$t = \frac{b_2 - B_2}{\operatorname{se}(b_2)} \tag{3.22}$$

follows the t distribution with $(n - 2)$ d.f. Now if we let

$$H_0: B_2 = B_2^*$$

where $B_2^*$ is a specific numerical value of B2 (e.g., $B_2^* = 0$), then

$$t = \frac{b_2 - B_2^*}{\operatorname{se}(b_2)} = \frac{\text{estimator} - \text{hypothesized value}}{\text{standard error of the estimator}} \tag{3.29}$$

can be readily computed from the sample data. Since all the quantities in Equation (3.29) are now known, we can use the t value computed from Eq. (3.29) as the test statistic, which follows the t distribution with $(n - 2)$ d.f. Appropriately, the testing procedure is called the t test.11

Now to use the t test in any concrete application, we need to know three things:

1. The d.f., which are always $(n - 2)$ for the two-variable model

2. The level of significance, $\alpha$, which is a matter of personal choice, although 1, 5, or 10 percent levels are usually used in empirical analysis. Instead of arbitrarily choosing the $\alpha$ value, you can find the p value (the exact level of significance as described in Appendix D) and reject the null hypothesis if the computed p value is sufficiently low.

3. Whether we use a one-tailed or two-tailed test (see Table D-2 and Figure D-7).

11 The difference between the confidence interval and the test of significance approaches lies in the fact that in the former we do not know what the true B2 is and therefore try to guess it by establishing a $(1 - \alpha)$ confidence interval. In the test of significance approach, on the other hand, we hypothesize what the true B2 ($= B_2^*$) is and try to find out if the sample value b2 is sufficiently close to (the hypothesized) $B_2^*$.

Math S.A.T. Example Continued

1. A Two-Tailed Test

Assume that H0: B2 = 0 and H1: B2 ≠ 0. Using Eq. (3.29), we find that

$$t = \frac{0.0013}{0.000245} = 5.4354 \tag{3.30}$$

Now from the t table given in Appendix E, Table E-2, we find that for 8 d.f. we have the following critical t values (two-tailed) (see Figure 3-6):

| Level of significance | Critical t |
| 0.01 | 3.355 |
| 0.05 | 2.306 |
| 0.10 | 1.860 |

In Appendix D, Table D-2 we stated that, in the case of the two-tailed t test, if the computed |t|, the absolute value of t, exceeds the critical t value at the chosen level of significance, we can reject the null hypothesis. Therefore, in the present case we can reject the null hypothesis that the true B2 (i.e., the income coefficient) is zero because the computed |t| of 5.4354 far exceeds the critical t value even at the 1% level of significance. We reached the same conclusion on the basis of the confidence interval shown in Eq. (3.27), which should not be surprising because the confidence interval and the test of significance approaches to hypothesis testing are merely two sides of the same coin.

Incidentally, in the present example the p value (i.e., probability value) of the t statistic of 5.4354 is about 0.0006. Thus, if we were to reject the null hypothesis that the true slope coefficient is zero at this p value, we would be wrong in six out of ten thousand occasions.

2. A One-Tailed Test

Since the income coefficient in the math S.A.T. score function is expected to be positive, a realistic set of hypotheses would be H0: B2 ≤ 0 and H1: B2 > 0; here the alternative hypothesis is one-sided.

FIGURE 3-6 The t distribution for 8 d.f. [Two-tailed critical values: ±1.860 (5% in each tail), ±2.306 (2.5% in each tail), and ±3.355 (0.5% in each tail); the computed t = 5.4354 lies in the right-hand rejection region.]

The t-testing procedure remains exactly the same as before, except, as noted in Appendix D, Table D-2, the probability of committing a type I error is not divided equally between the two tails of the t distribution but is concentrated in only one tail, either left or right. In the present case it will be the right tail. (Why?) For 8 d.f. we observe from the t table (Appendix E, Table E-2) that the critical t value (right-tailed) is

| Level of significance | Critical t |
| 0.01 | 2.896 |
| 0.05 | 1.860 |
| 0.10 | 1.397 |

For the math S.A.T. example, we first compute the t value as if the null hypothesis were that B2 = 0. We have already seen that this t value is

$$t = 5.4354 \tag{3.30}$$

Since this t value exceeds any of the critical values shown in the preceding table, following the rules laid down in Appendix D, Table D-2, we can reject the hypothesis that annual family income has no relationship to math S.A.T. scores; actually it has a positive effect (i.e., $B_2 > 0$) (see Figure 3-7).

FIGURE 3-7 One-tailed t test: (a) Right-tailed; (b) left-tailed. [t distribution with 8 d.f.; critical values 1.397 (10%), 1.860 (5%), and 2.896 (1%) in panel (a), with signs reversed in panel (b); the computed t = 5.4354 is marked in the right tail.]
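The two-tailed and one-tailed decisions above, together with the p value of about 0.0006, can be reproduced with the Python standard library. The sketch below is illustrative only: the function names are hypothetical, the critical values are copied from the tables in the text, and the p value is obtained by numerically integrating the t density with Simpson's rule (a choice made here to avoid external libraries, not a method described in the text). It takes the reported t = 5.4354 as given rather than recomputing it from the rounded figures in Eq. (3.16).

```python
import math


def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p_value(t_value, df, steps=2000):
    """P(|T| > |t_value|), using composite Simpson's rule on [0, |t_value|]."""
    b = abs(t_value)
    h = b / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central_area = total * h / 3       # area between 0 and |t_value|
    return 1 - 2 * central_area


def reject_null(t_value, t_critical, tail="two"):
    """Decision rule of the t test for a positive critical value."""
    if tail == "two":
        return abs(t_value) > t_critical
    if tail == "right":
        return t_value > t_critical
    if tail == "left":
        return t_value < -t_critical
    raise ValueError("tail must be 'two', 'right', or 'left'")


T_VALUE = 5.4354   # reported in Eq. (3.30)
DF = 8

two_tailed_critical = {0.01: 3.355, 0.05: 2.306, 0.10: 1.860}
right_tailed_critical = {0.01: 2.896, 0.05: 1.860, 0.10: 1.397}

for alpha, crit in two_tailed_critical.items():
    print("two-tailed", alpha, reject_null(T_VALUE, crit, "two"))
for alpha, crit in right_tailed_critical.items():
    print("right-tailed", alpha, reject_null(T_VALUE, crit, "right"))

p_value = two_tailed_p_value(T_VALUE, DF)
print("two-tailed p value:", p_value)
```

Halving the two-tailed p value gives the corresponding right-tailed p value, since the computed t is positive and the t distribution is symmetric.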

3.6 HOW GOOD IS THE FITTED REGRESSION LINE: THE COEFFICIENT OF DETERMINATION, r2

Our finding in the preceding section that on the basis of the t test both the estimated intercept and slope coefficients are individually statistically significant (i.e., significantly different from zero) suggests that the SRF, Eq. (3.16), shown in Figure 2-6 seems to "fit" the data "reasonably" well. Of course, not each actual Y value lies on the estimated PRF. That is, not all $e_i = (Y_i - \hat{Y}_i)$ are zero; as Table 2-4 shows, some e are positive and some are negative. Can we develop an overall measure of "goodness of fit" that will tell us how well the estimated regression line, Eq. (3.16), fits the actual Y values? Indeed, such a measure has been developed and is known as the coefficient of determination, denoted by the symbol r2 (read as r squared). To see how r2 is computed, we proceed as follows.

Recall that

$$Y_i = \hat{Y}_i + e_i \tag{Eq. 2.6}$$

Let us express this equation in a slightly different but equivalent form (see Figure 3-8) as

$$(Y_i - \bar{Y}) = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i) \quad (\text{i.e., } e_i) \tag{3.31}$$

The three terms are, respectively, the variation in Yi from its mean value; the variation in Yi explained by X ($= \hat{Y}_i$) around its mean value (Note: $\bar{Y} = \bar{\hat{Y}}$); and the unexplained or residual variation.

Now, letting small letters indicate deviations from mean values, we can write the preceding equation as

$$y_i = \hat{y}_i + e_i \tag{3.32}$$

(Note: $y_i = (Y_i - \bar{Y})$, etc.) Also, note that $\bar{e} = 0$, as a result of which $\bar{Y} = \bar{\hat{Y}}$; that is, the mean values of the actual Y and the estimated Y are the same. Or

$$y_i = b_2 x_i + e_i \tag{3.33}$$

since $\hat{y}_i = b_2 x_i$.

Now squaring Equation (3.33) on both sides and summing over the sample, we obtain, after simple algebraic manipulation (the cross-product term $2 b_2 \sum x_i e_i$ drops out because $\sum x_i e_i = 0$),

$$\sum y_i^2 = \sum \hat{y}_i^2 + \sum e_i^2 \tag{3.34}$$

Or, equivalently,

$$\sum y_i^2 = b_2^2 \sum x_i^2 + \sum e_i^2 \tag{3.35}$$

This is an important relationship, as we will see. For proof of Equation (3.35), see Problem 3.25.
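The decomposition in Eq. (3.34) can be checked numerically. The following is a minimal sketch, assuming the actual and fitted math S.A.T. scores listed in Figure 3-9; because those fitted values are rounded to three decimals, the two sides agree only approximately. The variable names are illustrative.

```python
# Actual and fitted math S.A.T. scores as listed in Figure 3-9
# (the fitted values are rounded, so the identity holds only approximately).
actual = [410, 420, 440, 490, 530, 530, 550, 540, 570, 590]
fitted = [439.073, 452.392, 465.711, 479.030, 492.349,
          505.668, 518.987, 532.306, 552.284, 632.198]

y_bar = sum(actual) / len(actual)  # the mean of Y, which also equals the mean of fitted Y

# Sums of squares in deviation form, matching the three terms of Eq. (3.34)
sum_y2 = sum((y - y_bar) ** 2 for y in actual)                # sum of y_i^2
sum_yhat2 = sum((f - y_bar) ** 2 for f in fitted)             # sum of yhat_i^2
sum_e2 = sum((y - f) ** 2 for y, f in zip(actual, fitted))    # sum of e_i^2

print(f"sum y^2            = {sum_y2:.2f}")
print(f"sum yhat^2 + sum e^2 = {sum_yhat2 + sum_e2:.2f}")
```

Any small gap between the two printed numbers comes from the rounding of the fitted values, not from the identity itself.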

The various sums of squares appearing in Eq. (3.35) can be defined as follows:

$\sum y_i^2$ = the total variation12 of the actual Y values about their sample mean $\bar{Y}$, which may be called the total sum of squares (TSS).

$\sum \hat{y}_i^2$ = the total variation of the estimated Y values about their mean value ($\bar{\hat{Y}} = \bar{Y}$), which may be called appropriately the sum of squares due to regression (i.e., due to the explanatory variable[s]), or simply the explained sum of squares (ESS).

$\sum e_i^2$ = as before, the residual sum of squares (RSS) or residual or unexplained variation of the Y values about the regression line.

12 The terms variation and variance are different. Variation means the sum of squares of deviations of a variable from its mean value. Variance is this sum divided by the appropriate d.f. In short, variance = variation/d.f.

Put simply, then, Eq. (3.35) is

$$\text{TSS} = \text{ESS} + \text{RSS} \tag{3.36}$$

and shows that the total variation in the observed Y values about their mean value can be partitioned into two parts, one attributable to the regression line and the other to random forces, because not all actual Y observations lie on the fitted line. All this can be seen clearly from Figure 3-8 (see also Fig. 2-6).

FIGURE 3-8 Breakdown of total variation in $Y_i$. [Diagram with X on the horizontal axis and Y on the vertical axis, showing the SRF and, for an observation at $X_i$: $(Y_i - \bar{Y})$ = total variation in $Y_i$; $(\hat{Y}_i - \bar{Y})$ = variation in $Y_i$ explained by regression; $e_i = (Y_i - \hat{Y}_i)$ = variation in $Y_i$ not explained by regression.]

Now if the chosen SRF fits the data quite well, ESS should be much larger than RSS. If all actual Y lie on the fitted SRF, ESS will be equal to TSS, and RSS will be zero. On the other hand, if the SRF fits the data poorly, RSS will be much larger than ESS. In the extreme, if X explains no variation at all in Y, ESS will be zero and RSS will equal TSS. These are, however, polar cases. Typically, neither ESS nor RSS will be zero. If ESS is relatively larger than RSS, the SRF will explain a substantial proportion of the variation in Y. If RSS is relatively larger than ESS, the SRF will explain only some part of the variation of Y. All these qualitative statements are intuitively easy to understand and can be readily quantified. If we divide Equation (3.36) by TSS on both sides, we obtain

$$1 = \frac{\text{ESS}}{\text{TSS}} + \frac{\text{RSS}}{\text{TSS}} \tag{3.37}$$

Now let us define

$$r^2 = \frac{\text{ESS}}{\text{TSS}} \tag{3.38}$$

The quantity $r^2$ thus defined is known as the (sample) coefficient of determination and is the most commonly used measure of the goodness of fit of a regression line. Verbally, $r^2$ measures the proportion or percentage of the total variation in Y explained by the regression model.

Two properties of $r^2$ may be noted:

1. It is a non-negative quantity. (Why?)
2. Its limits are $0 \le r^2 \le 1$ since a part (ESS) cannot be greater than the whole (TSS).13 An $r^2$ of 1 means a “perfect fit,” for the entire variation in Y is explained by the regression. An $r^2$ of zero means no relationship between Y and X whatsoever.

Formulas to Compute $r^2$

Using Equation (3.38), Equation (3.37) can be written as

$$1 = r^2 + \frac{\text{RSS}}{\text{TSS}} = r^2 + \frac{\sum e_i^2}{\sum y_i^2} \tag{3.39}$$

Therefore,

$$r^2 = 1 - \frac{\sum e_i^2}{\sum y_i^2} \tag{3.40}$$

There are several equivalent formulas to compute $r^2$, which are given in Question 3.5.

13 This statement assumes that an intercept term is included in the regression model. More on this in Chapter 5.

r2 for the Math S.A.T. Example

From the data given in Table 2-4, and using formula (3.40), we obtain the following $r^2$ value for our math S.A.T. score example:

$$r^2 = 1 - \frac{7801.0776}{36610} = 0.7869 \tag{3.41}$$

Since $r^2$ can at most be 1, the computed $r^2$ is pretty high. In our math S.A.T. example X, the income variable, explains about 79 percent of the variation in math S.A.T. scores. In this case we can say that the sample regression (3.16) gives an excellent fit.

It may be noted that $(1 - r^2)$, the proportion of variation in Y not explained by X, is called, perhaps appropriately, the coefficient of alienation.

The Coefficient of Correlation, r

In Appendix B, we introduce the sample coefficient of correlation, r, as a measure of the strength of the linear relationship between two variables Y and X and show that r can be computed from formula (B.46), which can also be written as

$$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}} \tag{3.42}$$

$$= \frac{\sum x_i y_i}{\sqrt{\sum x_i^2 \sum y_i^2}} \tag{3.43}$$

But this coefficient of correlation can also be computed from the coefficient of determination, $r^2$, as follows:

$$r = \pm\sqrt{r^2} \tag{3.44}$$

Since most regression computer packages routinely compute $r^2$, r can be computed easily. The only question is about the sign of r. However, that can be determined easily from the nature of the problem. In our math S.A.T. example, since math S.A.T. scores and annual family income are expected to be positively related, the r value in this case will be positive. In general, though, r has the same sign as the slope coefficient, which should be clear from formulas (2.17) and (3.43).

Thus, for the math S.A.T. example,

$$r = \sqrt{0.7869} = 0.8871 \tag{3.45}$$

In our example, math S.A.T. scores and annual family income are highly positively correlated, a finding that is not surprising.

Incidentally, if you use formula (3.43) to compute r between the actual Y values in the sample and the estimated $Y_i$ values ($= \hat{Y}_i$) from the given model, and square this r value, the squared r is precisely equal to the $r^2$ value obtained from Eq. (3.42). For proof, see Question 3.5. You can verify this from the data given in Table 2-4. As you would expect, the closer the estimated Y values are to the actual Y values in the sample, the higher the $r^2$ value will be.
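This equivalence is easy to check by hand or in a few lines of code. The sketch below is only an illustration: it assumes the actual and (rounded) fitted scores listed in Figure 3-9, computes $r^2$ from Eq. (3.40) and r from Eq. (3.44), and then applies formula (3.43) to the pair (Y, fitted Y). The helper name `correlation` is hypothetical.

```python
import math

# Actual and fitted math S.A.T. scores as listed in Figure 3-9 (fitted values rounded)
actual = [410, 420, 440, 490, 530, 530, 550, 540, 570, 590]
fitted = [439.073, 452.392, 465.711, 479.030, 492.349,
          505.668, 518.987, 532.306, 552.284, 632.198]

def correlation(a, b):
    """Sample correlation in deviation form, as in Eq. (3.43)."""
    a_bar, b_bar = sum(a) / len(a), sum(b) / len(b)
    s_ab = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    s_aa = sum((x - a_bar) ** 2 for x in a)
    s_bb = sum((y - b_bar) ** 2 for y in b)
    return s_ab / math.sqrt(s_aa * s_bb)

y_bar = sum(actual) / len(actual)
tss = sum((y - y_bar) ** 2 for y in actual)
rss = sum((y - f) ** 2 for y, f in zip(actual, fitted))

r2 = 1 - rss / tss                   # Eq. (3.40)
r = math.sqrt(r2)                    # Eq. (3.44); positive root because the slope is positive
r_yf = correlation(actual, fitted)   # correlation between actual and fitted Y

print(f"r^2 from (3.40):            {r2:.4f}")
print(f"r from (3.44):              {r:.4f}")
print(f"squared corr(Y, fitted Y):  {r_yf ** 2:.4f}")
```

The first and third printed values should agree up to the rounding in the fitted values.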

3.7 REPORTING THE RESULTS OF REGRESSION ANALYSIS

There are various ways of reporting the results of regression analysis. Until the advent of statistical software, regression results were presented in the format shown in Equation (3.46). Many journal articles still present regression results in this format. For our math S.A.T. score example, we have:

$$
\begin{aligned}
\hat{Y}_i &= 432.4138 + 0.0013X_i \\
\text{se} &= (16.9061)\ (0.000245) \\
t &= (25.5774)\ (5.4354) \qquad r^2 = 0.7869 \\
p\ \text{value} &= (5.85 \times 10^{-9})\ (0.0006) \qquad \text{d.f.} = 8
\end{aligned}
\tag{3.46}
$$

In Equation (3.46) the figures in the first set of parentheses are the estimated standard errors (se) of the estimated regression coefficients. Those in the second set of parentheses are the estimated t values computed from Eq. (3.22) under the null hypothesis that the true population value of each regression coefficient individually is zero (i.e., the t values given are simply the ratios of the estimated coefficients to their standard errors). And those in the third set of parentheses are the p values of the computed t values.14 As a matter of convention, from now on, if we do not specify a specific null hypothesis, then we will assume that it is the zero null hypothesis (i.e., the population parameter assumes zero value). And if we reject it (i.e., when the test statistic is significant), it means that the true population value is different from zero.

One advantage of reporting the regression results in the preceding format is that we can see at once whether each estimated coefficient is individually statistically significant, that is, significantly different from zero. By quoting the p values we can determine the exact level of significance of the estimated t value. Thus the t value of the estimated slope coefficient is 5.4354, whose p value is practically zero. As we note in Appendix D, the lower the p value, the greater the evidence against the null hypothesis.

A warning is in order here. When deciding whether to reject or not reject a null hypothesis, determine beforehand what level of the p value (call it the critical p value) you are willing to accept and then compare the computed p value with the critical p value. If the computed p value is smaller than the critical p value, the null hypothesis can be rejected. But if it is greater than the critical p value, the null hypothesis may not be rejected. If you feel comfortable with the tradition of fixing the critical p value at the conventional 1, 5, or 10 percent level, that is fine. In Eq. (3.46), the actual p value (i.e., the exact level of significance) of the t value of 5.4354 is 0.0006. If we had chosen the critical p value at 5 percent, obviously we would reject the null hypothesis, for the computed p value of 0.0006 is much smaller than 5 percent.

14 The t table in Appendix E of this book (Table E-2) can now be replaced by electronic tables that will compute the p values to several digits. This is also true of the normal, chi-square, and the F tables (Appendix E, Tables E-4 and E-3, respectively).

Of course, any null hypothesis (besides the zero null hypothesis) can be tested easily by making use of the t test discussed earlier. Thus, if the null hypothesis is that the true intercept term is 450 and if $H_1\colon B_1 \neq 450$, the t value will be

$$t = \frac{432.4138 - 450}{16.9061} = -1.0402$$

The p value of obtaining such a t value is about 0.3287, which is obtained from electronic tables. If you had fixed the critical p value at the 10 percent level, you would not reject the null hypothesis, for the computed p value is much greater than the critical p value.

The zero null hypothesis, as mentioned before, is essentially a kind of straw man. It is usually adopted for strategic reasons: to “dramatize” the statistical significance (i.e., importance) of an estimated coefficient.

3.8 COMPUTER OUTPUT OF THE MATH S.A.T. SCORE EXAMPLE

Since these days we rarely run regressions manually, it may be useful to produce the actual output of regression analysis obtained from a statistical software package. Below we give the selected output of our math S.A.T. example obtained from EViews.

Dependent Variable: Y
Method: Least Squares
Sample: 1 10
Included observations: 10

             Coefficient   Std. Error   t-Statistic   Prob.
C            432.4138      16.90607     25.57742      0.0000
X            0.001332      0.000245     5.435396      0.0006

R-squared             0.786914
S.E. of regression    31.22715
Sum squared resid     7801.078

In this output, C denotes the constant term (i.e., intercept); Prob. is the p value; sum of squared resid is the RSS ($= \sum e_i^2$); and S.E. of regression is the standard error of the regression. The t values presented in this table are computed under the (null) hypothesis that the corresponding population regression coefficients are zero.
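The t ratios and p values in this output can be reproduced without statistical tables. The following is a minimal sketch, not how EViews computes them: it assumes the coefficients and standard errors printed above and obtains two-tailed p values by integrating the Student t density numerically with Simpson's rule. The function names `t_pdf` and `two_tailed_p` are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t_value, df, steps=2000):
    """Two-tailed p value, 1 - P(-|t| <= T <= |t|), using Simpson's rule (steps must be even)."""
    b = abs(t_value)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area_0_to_b = total * h / 3
    return 1 - 2 * area_0_to_b

df = 8                                # n - 2 = 10 - 2
b1, se_b1 = 432.4138, 16.90607        # intercept and its standard error (from the output)
b2, se_b2 = 0.001332, 0.000245        # slope and its standard error (from the output)

t_b1 = b1 / se_b1                     # t ratio under H0: B1 = 0
t_b2 = b2 / se_b2                     # t ratio under H0: B2 = 0 (rounded inputs)
t_b1_450 = (b1 - 450) / se_b1         # t ratio under H0: B1 = 450

print(f"t(b1) = {t_b1:.4f}")
print(f"t(b2) = {t_b2:.4f}, two-tailed p = {two_tailed_p(t_b2, df):.4f}")
print(f"t(B1 = 450) = {t_b1_450:.4f}, two-tailed p = {two_tailed_p(t_b1_450, df):.4f}")
```

Because the printed slope and standard error are rounded, the recomputed slope t ratio differs slightly from the reported 5.435396.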

3.9 NORMALITY TESTS

Before we leave our math S.A.T. example, we need to look at the regression results given in Eq. (3.46). Remember that our statistical testing procedure is based on the assumption that the error term $u_i$ is normally distributed. How do we find out if this is the case in our example, since we do not directly observe the true errors $u_i$? We have the residuals, $e_i$, which are proxies for $u_i$. Therefore, we will have to use the $e_i$ to learn something about the normality of $u_i$. There are several tests of normality, but here we will consider only three comparatively simple tests.15

Histograms of Residuals

A histogram of residuals is a simple graphical device that is used to learn something about the shape of the probability density function (PDF) of a random variable. On the horizontal axis, we divide the values of the variable of interest (e.g., OLS residuals) into suitable intervals, and in each class interval, we erect rectangles equal in height to the number of observations (i.e., frequency) in that class interval.

If you mentally superimpose the bell-shaped normal distribution curve on this histogram, you might get some idea about the nature of the probability distribution of the variable of interest.

It is always a good practice to plot the histogram of residuals from any regression to get some rough idea about the likely shape of the underlying probability distribution.
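The binning step described above can be sketched in a few lines. This is only an illustration: it uses the ten math S.A.T. residuals listed in Figure 3-9, and the class intervals (width 20, starting at -60) are an arbitrary choice made here, not one taken from the text. The function name `histogram_counts` is hypothetical.

```python
# Residuals of the math S.A.T. regression as listed in Figure 3-9
residuals = [-29.0733, -32.3922, -25.7112, 10.9698, 37.6509,
             24.3319, 31.0129, 7.69397, 17.7155, -42.1983]

def histogram_counts(values, lower, width, n_bins):
    """Count how many values fall in each class interval [lower + k*width, lower + (k+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - lower) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

lower, width, n_bins = -60.0, 20.0, 6   # illustrative class intervals (an assumption)
counts = histogram_counts(residuals, lower, width, n_bins)

# A crude text histogram: one star per observation in the class interval
for k, c in enumerate(counts):
    left = lower + k * width
    print(f"[{left:6.1f}, {left + width:6.1f})  " + "*" * c)
```

With only ten observations the shape is necessarily rough, which is one reason a histogram gives no more than a first impression.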

We also show (in Figure 3-9) how EViews presents the actual and estimated Y values as well as the residuals (i.e., $e_i$) in graphic form:

Actual $Y_i$    Fitted $\hat{Y}_i$    Residual $e_i$
410.000        439.073        -29.0733
420.000        452.392        -32.3922
440.000        465.711        -25.7112
490.000        479.030         10.9698
530.000        492.349         37.6509
530.000        505.668         24.3319
550.000        518.987         31.0129
540.000        532.306          7.69397
570.000        552.284         17.7155
590.000        632.198        -42.1983

[Residual plot column, drawn around a zero (0) reference line, not reproduced.]

FIGURE 3-9 Actual and fitted Y values and residuals for the math S.A.T. example

15 For a detailed discussion of various normality tests, see G. Barrie Wetherhill, Regression Analysis with Applications, Chapman and Hall, London, 1986, Chap. 8.

Normal Probability Plot

Another comparatively simple graphical device to study the PDF of a random variable is the normal probability plot (NPP), which makes use of normal probability paper, a specially ruled graph paper. On the horizontal axis (X-axis), we plot values of the variable of interest (say, OLS residuals $e_i$), and on the vertical axis (Y-axis), we show the expected values of this variable if its distribution were normal. Therefore, if the variable is in fact from the normal population, the NPP will approximate a straight line. MINITAB has the capability to plot the NPP of any random variable. MINITAB also produces the Anderson-Darling normality test, known as the $A^2$ statistic. The underlying null hypothesis is that a variable is normally distributed. This hypothesis can be sustained if the computed $A^2$ is not statistically significant.
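The coordinates behind such a plot can be sketched as follows. This is a rough illustration, not MINITAB's procedure: it pairs the ordered math S.A.T. residuals from Figure 3-9 with standard normal quantiles at the plotting positions (i - 0.5)/n, which is one common convention and an assumption here. The correlation of the plotted points is used only as an informal indicator of how close they lie to a straight line.

```python
from statistics import NormalDist

# Residuals of the math S.A.T. regression as listed in Figure 3-9
residuals = [-29.0733, -32.3922, -25.7112, 10.9698, 37.6509,
             24.3319, 31.0129, 7.69397, 17.7155, -42.1983]

n = len(residuals)
ordered = sorted(residuals)

# Plotting positions (i - 0.5)/n: an assumed convention; other choices exist
positions = [(i - 0.5) / n for i in range(1, n + 1)]
z_scores = [NormalDist().inv_cdf(p) for p in positions]   # expected standard normal quantiles

def correlation(a, b):
    """Sample correlation coefficient between two equally long lists."""
    a_bar, b_bar = sum(a) / len(a), sum(b) / len(b)
    s_ab = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    s_aa = sum((x - a_bar) ** 2 for x in a)
    s_bb = sum((y - b_bar) ** 2 for y in b)
    return s_ab / (s_aa * s_bb) ** 0.5

straightness = correlation(ordered, z_scores)

for e, z in zip(ordered, z_scores):
    print(f"{e:10.4f}  {z:7.3f}")
print(f"correlation of the plotted points: {straightness:.3f}")
```

A correlation close to 1 means the points lie near a straight line; it is not a substitute for a formal test such as the Anderson-Darling statistic.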

Jarque-Bera Test

A test of normality that has now become very popular and is included in several statistical packages is the Jarque-Bera (JB) test.16 This is an asymptotic, or large sample, test and is based on OLS residuals. This test first computes the coefficients of skewness, S (a measure of asymmetry of a PDF), and kurtosis, K (a measure of how tall or flat a PDF is in relation to the normal distribution), of a random variable (e.g., OLS residuals) (see Appendix B). For a normally distributed variable, skewness is zero and kurtosis is 3 (see Figure B-4 in Appendix B).

Jarque and Bera have developed the following test statistic:

$$JB = \frac{n}{6}\left[S^2 + \frac{(K - 3)^2}{4}\right] \tag{3.47}$$

where n is the sample size, S represents skewness, and K represents kurtosis. They have shown that under the normality assumption the JB statistic given in Equation (3.47) follows the chi-square distribution with 2 d.f. asymptotically (i.e., in large samples). Symbolically,

$$JB \overset{asy}{\sim} \chi^2_{(2)} \tag{3.48}$$

where asy means asymptotically.

As you can see from Eq. (3.47), if a variable is normally distributed, S is zero and $(K - 3)$ is also zero, and therefore the value of the JB statistic is zero ipso facto. But if a variable is not normally distributed, the JB statistic will assume increasingly larger values. What constitutes a large or small value of the JB statistic can be learned easily from the chi-square table (Appendix E, Table E-4). If the computed chi-square value from Eq. (3.47) exceeds the critical chi-square value for 2 d.f. at the chosen level of significance, we reject the null hypothesis of normal distribution; but if it does not exceed the critical chi-square value, we do not reject the null hypothesis. Of course, if we have the p value of the computed chi-square value, we will know the exact probability of obtaining that value.

16 See C. M. Jarque and A. K. Bera, “A Test for Normality of Observations and Regression Residuals,” International Statistical Review, vol. 55, 1987, pp. 163–172.

We will illustrate these normality tests with the following example.

3.10 A CONCLUDING EXAMPLE: RELATIONSHIP BETWEEN WAGES AND PRODUCTIVITY IN THE U.S. BUSINESS SECTOR, 1959–2006

According to the marginal productivity theory of microeconomics, we would expect a positive relationship between wages and worker productivity. To see if this is so, in Table 3-3 (on the textbook’s Web site) we provide data on labor productivity, as measured by the index of output per hour of all persons, and wages, as measured by the index of real compensation per hour, for the business sector of the U.S. economy for the period 1959 to 2006. The base year of the index is 1992 and hourly real compensation is hourly compensation divided by the consumer price index (CPI).

Let Compensation (Y) = index of real compensation and Productivity (X) = index of output per hour of all persons. Plotting these data, we obtain the scatter diagram shown in Figure 3-10.

FIGURE 3-10 Relationship between compensation and productivity in the U.S. business sector, 1959–2006. [Scatter diagram; horizontal axis: index of productivity (40 to 160); vertical axis: index of compensation (50 to 130).]

This figure shows a very close linear relationship between labor productivity and real wages. Therefore, we can use a (bivariate) linear regression to model the data given in Table 3-3. Using EViews, we obtain the following results:

Dependent Variable: Compensation
Method: Least Squares
Sample: 1959 2006
Included observations: 48

                Coefficient   Std. Error   t-Statistic   Prob.
C               33.63603      1.400085     24.02428      0.0000
Productivity    0.661444      0.015640     42.29178      0.0000

R-squared             0.974926
Adjusted R-squared    0.974381
S.E. of regression    2.571761
Sum squared resid     304.2420
Durbin-Watson stat    0.146315

Let us interpret the results. The slope coefficient of about 0.66 suggests that if the index of productivity goes up by a unit, the index of real wages will go up, on average, by 0.66 units. This coefficient is highly significant: the t value of about 42.3 (obtained under the assumption that the true population coefficient is zero) has a p value that is almost zero. The intercept coefficient, C, is also highly significant, for the p value of obtaining a t value for this coefficient of as much as about 24 is practically zero.

The $R^2$ value of about 0.97 means that the index of productivity explains about 97 percent of the variation in the index of real compensation. This is a very high value, since an $R^2$ can at most be 1. For now neglect some of the information given in the preceding table (e.g., the Durbin-Watson statistic), for we will explain it at appropriate places.
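Several entries in the EViews table are tied together by formulas used earlier in the chapter, so the output can be checked for internal consistency. The following is a small sketch under that assumption: each t statistic is the coefficient divided by its standard error, and the S.E. of regression is taken to be the square root of RSS/(n - 2). The dictionary layout is illustrative only.

```python
import math

n = 48                       # included observations
rss = 304.2420               # "Sum squared resid" from the EViews output

# (coefficient, standard error) pairs copied from the EViews output
estimates = {
    "C": (33.63603, 1.400085),
    "Productivity": (0.661444, 0.015640),
}

# Standard error of the regression: square root of RSS / d.f., with d.f. = n - 2
ser = math.sqrt(rss / (n - 2))

# t statistic under the zero null hypothesis: coefficient / standard error
t_values = {name: b / se for name, (b, se) in estimates.items()}

print(f"S.E. of regression = {ser:.6f}")
for name, t in t_values.items():
    print(f"t({name}) = {t:.4f}")
```

Small differences in the last digits relative to the printed output reflect rounding of the reported coefficients and standard errors.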

Figure 3-11 gives the actual and estimated values of the index of real compensation, the dependent variable in our model, as well as the differences between the two, which are nothing but the residuals $e_i$. These residuals are also plotted in this figure.

Figure 3-12 plots the histogram of the residuals shown in Figure 3-11 and also shows the JB statistic. The JB statistic of about 2.91 has a p value of about 0.23, so the histogram and the JB statistic give no reason to reject the hypothesis that the true error terms in the wages-productivity regression are normally distributed.
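The JB value reported in Figure 3-12 can be reproduced directly from Eq. (3.47). The sketch below assumes the skewness, kurtosis, and sample size shown in that figure; the p value uses the fact that the upper-tail probability of a chi-square variable with 2 d.f. is exp(-x/2). The function name `jarque_bera` is hypothetical.

```python
import math

def jarque_bera(n, skewness, kurtosis):
    """JB statistic of Eq. (3.47) and its asymptotic chi-square(2) p value."""
    jb = (n / 6) * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    p_value = math.exp(-jb / 2)   # upper tail of chi-square with 2 d.f.
    return jb, p_value

# Values reported in Figure 3-12 for the compensation-productivity residuals
jb, p_value = jarque_bera(n=48, skewness=-0.343902, kurtosis=2.008290)

print(f"JB = {jb:.4f}, p value = {p_value:.4f}")
```

Since the p value is well above the conventional 5 or 10 percent levels, the normality hypothesis is not rejected on this test.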

Figure 3-13 shows the normal probability plot of the residuals obtained from the compensation-productivity regression; this figure was obtained from MINITAB. As is clear from this figure, the estimated residuals lie approximately on a straight line, suggesting that the error terms (i.e., $u_i$) in this regression may be normally distributed. The computed AD statistic of 0.813 has a p value of about 0.03 or 3 percent. If we fix the critical p value, say, at the 5 percent level, the observed AD statistic is statistically significant, suggesting that the error terms are not normally distributed. This is in contrast to the conclusion reached on the basis of the JB statistic. The problem here is that our sample of 48 observations is too small for using the JB and AD statistics, which are designed for large samples.

Actual $Y_i$    Fitted $\hat{Y}_i$    Residual $e_i$
 59.8710        65.4025       -5.53155
 61.3180        65.9575       -4.63950
 63.0540        67.0833       -4.02928
 65.1920        68.6145       -3.42252
 66.6330        69.9824       -3.34939
 68.2570        71.2113       -2.95435
 69.6760        72.5402       -2.86419
 72.3000        74.1191       -1.81906
 74.1210        75.0041       -0.88307
 76.8950        76.4163        0.47875
 78.0080        76.6253        1.38273
 79.4520        77.4799        1.97214
 80.8860        79.2856        1.60040
 83.3280        80.7593        2.56870
 85.0620        82.1926        2.86936
 83.9880        81.4300        2.55800
 84.8430        83.1068        1.73624
 87.1480        84.6631        2.48486
 88.3350        85.5296        2.80537
 89.7360        86.1018        3.63422
 89.8630        86.0919        3.77114
 89.5920        85.9900        3.60200
 89.6450        87.0662        2.57884
 90.6370        86.6495        3.98755
 90.5910        88.5366        2.05445
 90.7120        90.0003        0.71167
 91.9100        91.2683        0.64168
 94.8690        92.9497        1.91929
 95.2070        93.2540        1.95303
 96.5270        94.1621        2.36486
 95.0050        94.7588        0.24624
 96.2190        96.0664        0.15257
 97.4650        97.0705        0.39449
100.0000        99.7804        0.21956
 99.7120       100.0360       -0.32376
 99.0240       100.6730       -1.64873
 98.6900       100.7690       -2.07930
 99.4780       102.7520       -3.27365
100.5120       104.0650       -3.55328
105.1730       106.0470       -0.87396
108.0440       108.2650       -0.22145
111.9920       110.4410        1.55106
113.5360       112.4020        1.13388
115.6940       115.6210        0.07329
117.7090       118.7670       -1.05820
118.9490       121.2050       -2.25562
119.6920       122.9450       -3.25288
120.4470       123.8600       -3.41265

[Residual plot column, drawn around a zero (0) reference line, not reproduced.]

FIGURE 3-11 Actual Y, estimated Y, and residuals (regression of compensation on productivity)
Note: $Y$ = Actual index of compensation; $\hat{Y}$ = Estimated index of compensation

[Histogram of residuals, horizontal axis from -6 to 4, not reproduced.]
Series: Residuals
Sample: 1959–2006
Observations: 48
Mean          -1.50e-14
Median         0.320367
Maximum        3.987545
Minimum       -5.531548
Std. Dev.      2.544255
Skewness      -0.343902
Kurtosis       2.008290
Jarque-Bera    2.913122
Probability    0.233036

FIGURE 3-12 Histogram of residuals from the compensation-productivity regression

[Probability Plot of RESI1, Normal – 95% CI; horizontal axis: RESI1 (-8 to 8); vertical axis: Percent (1 to 99); not reproduced.]
Mean        3.330669E-14
Std. Dev.   2.544
N           48
AD          0.813
P-Value     0.033

FIGURE 3-13 Normal probability plot of residuals obtained from the compensation-productivity regression

3.11 A WORD ABOUT FORECASTING

We noted in Chapter 2 that one of the purposes of regression analysis is to predict the mean value of the dependent variable, given the values of the explanatory variable(s). To be more specific, let us return to our math S.A.T. score example. Regression (3.46) presented the results of the math section of the S.A.T. based on the score data of Table 2-2. Suppose we want to find out the average math S.A.T. score of a person with a given level of annual family income. What is the expected math S.A.T. score at this level of annual family income?

To fix these ideas, assume that X (income) takes the value $X_0$, where $X_0$ is some specified numerical value of X, say $X_0$ = $78,000. Now suppose we want to estimate $E(Y|X_0 = 78000)$, that is, the true mean math S.A.T. score corresponding to a family income of $78,000. Let

$$\hat{Y}_0 = \text{the estimator of } E(Y|X_0) \tag{3.49}$$

How do we obtain this estimate? Under the assumptions of the classical linear regression model (CLRM), it can be shown that Equation (3.49) can be obtained by simply putting the given $X_0$ value in Eq. (3.46), which gives:

$$\hat{Y}_{X=78000} = 432.4138 + 0.0013(78000) = 533.8138 \tag{3.50}$$

That is, the forecasted mean math S.A.T. score for a person with an annual family income of $78,000 is about 534 points.

Although econometric theory shows that under CLRM $\hat{Y}_{X=78000}$, or, more generally, $\hat{Y}_0$ is an unbiased estimator of the true mean value (i.e., a point on the population regression line), it is not likely to be equal to the latter in any given sample. (Why?) The difference between them is called the forecasting, or prediction, error. To assess this error, we need to find out the sampling distribution of $\hat{Y}_0$.17 Given the assumptions of the CLRM, it can be shown that $\hat{Y}_0$ is normally distributed with the following mean and variance:

$$\text{Mean} = E(Y|X_0) = B_1 + B_2X_0$$

$$\text{var} = \sigma^2\left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum x_i^2}\right] \tag{3.51}$$

where $\bar{X}$ = the sample mean of X values in the historical regression (3.46)
$\sum x_i^2$ = their sum of squared deviations from $\bar{X}$
$\sigma^2$ = the variance of $u_i$
n = sample size

The positive square root of Equation (3.51) gives the standard error of $\hat{Y}_0$, $se(\hat{Y}_0)$. Since in practice $\sigma^2$ is not known, if we replace it by its unbiased estimator $\hat{\sigma}^2$, the standardized variable $[\hat{Y}_0 - (B_1 + B_2X_0)]/se(\hat{Y}_0)$ follows the t distribution with $(n - 2)$ d.f. (Why?) Therefore, we can use the t distribution to establish a $100(1 - \alpha)$% confidence interval for the true (i.e., population) mean value of Y corresponding to $X_0$ in the usual manner as follows:

$$P[b_1 + b_2X_0 - t_{\alpha/2}\,se(\hat{Y}_0) \le B_1 + B_2X_0 \le b_1 + b_2X_0 + t_{\alpha/2}\,se(\hat{Y}_0)] = (1 - \alpha) \tag{3.52}$$

17 Note that $\hat{Y}_0$ is an estimator and therefore will have a sampling distribution.

Let us continue with our math S.A.T. score example. First, we compute the variance of $\hat{Y}_{X=78000}$ from Equation (3.51).

$$\text{var}(\hat{Y}_{X=78000}) = 975.1347\left[\frac{1}{10} + \frac{(78{,}000 - 56{,}000)^2}{16{,}240{,}000{,}000}\right] = 126.5754 \tag{3.53}$$

Therefore,

$$se(\hat{Y}_{X=78000}) = \sqrt{126.5754} = 11.2506 \tag{3.54}$$

Note: In this example, $\bar{X} = 56000$, $\sum x_i^2 = 16{,}240{,}000{,}000$, and $\hat{\sigma}^2 = 975.1347$ (see Table 2-4).

The preceding result suggests that given an annual family income of $78,000, the mean predicted math S.A.T. score, as shown in Equation (3.50), is 533.8138 points and the standard error of this predicted value is 11.2506 (points).

Now if we want to establish, say, a 95% confidence interval for the population mean math S.A.T. score corresponding to an annual family income of $78,000, we obtain it from expression (3.52) as

$$533.8138 - 2.306(11.2506) \le E(Y|X = 78000) \le 533.8138 + 2.306(11.2506)$$

That is,

$$507.8699 \le E(Y|X = 78000) \le 559.7577 \tag{3.55}$$

Note: For 8 d.f., the 5 percent two-tailed t value is 2.306.

Given the annual family income of $78,000, Equation (3.55) states that although the single best, or point, estimate of the mean math S.A.T. score is 533.8138, the true mean score is expected to lie in the interval 507.8699 to 559.7577 points, which is between about 508 and 560, with 95% confidence. Therefore, with 95% confidence, the forecast error will be between -25.9439 points (507.8699 - 533.8138) and 25.9439 points (559.7577 - 533.8138).

If we obtain a 95% confidence interval like Eq. (3.55) for each value of X shown in Table 2-2, we obtain what is known as a confidence interval or confidence band for the true mean math S.A.T. score for each level of annual family income, or for the entire population regression line (PRL). This can be seen clearly from Figure 3-14, obtained from EViews.

Notice some interesting aspects of Figure 3-14. The width of the confidence band is smallest when $X_0 = \bar{X}$, which should be apparent from the variance formula given in Eq. (3.51). However, the width widens sharply (i.e., the prediction error increases) as $X_0$ moves away from $\bar{X}$. This suggests that the predictive ability of the historical regression, such as regression (3.46), falls markedly as $X_0$ (the X value for which the forecast is made) departs progressively from $\bar{X}$. The message here is clear: We should exercise great caution in “extrapolating” the historical regression line to predict the mean value of Y associated with any X that is far removed from the sample mean of X. In more practical terms, we should not use the math S.A.T. score regression (3.46) to predict the average math score for income well beyond the sample range on which the historical regression line is based.
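The forecast, its standard error, and the confidence interval can be reproduced with a short calculation. This is a minimal sketch that assumes the quantities quoted in Eqs. (3.50) to (3.55): the rounded coefficients of Eq. (3.46), a sample mean of X of 56,000, a sum of squared deviations of 16,240,000,000, an estimated error variance of 975.1347, and the critical t value 2.306 for 8 d.f. The function name `mean_forecast` and the extra income levels in the loop are illustrative choices.

```python
import math

b1, b2 = 432.4138, 0.0013        # coefficients as used in Eq. (3.50)
n = 10                           # sample size
x_bar = 56_000                   # sample mean of X
sum_x2 = 16_240_000_000          # sum of squared deviations of X from its mean
sigma2_hat = 975.1347            # estimated error variance
t_crit = 2.306                   # 5 percent two-tailed t value for 8 d.f.

def mean_forecast(x0):
    """Point forecast, standard error (Eq. 3.51), and 95% interval (Eq. 3.52) at X = x0."""
    y0 = b1 + b2 * x0
    variance = sigma2_hat * (1 / n + (x0 - x_bar) ** 2 / sum_x2)
    se = math.sqrt(variance)
    return y0, se, (y0 - t_crit * se, y0 + t_crit * se)

# 78,000 is the text's example; the other two income levels are illustrative
for x0 in (56_000, 78_000, 100_000):
    y0, se, (low, high) = mean_forecast(x0)
    print(f"X0 = {x0:>7}: forecast = {y0:.4f}, se = {se:.4f}, "
          f"95% CI = ({low:.4f}, {high:.4f}), width = {high - low:.4f}")
```

The printed widths grow as the income level moves away from the sample mean, which is the widening of the confidence band described above.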

3.12 SUMMARY

In Chapter 2 we showed how to estimate the parameters of the two-variable linear regression model. In this chapter we showed how the estimated model can be used for the purpose of drawing inferences about the true population regression model. Although the two-variable model is the simplest possible linear regression model, the ideas introduced in these two chapters are the foundation of the more involved multiple regression models that we will discuss in ensuing chapters. As we will see, in many ways the multiple regression model is a straightforward extension of the two-variable model.

FIGURE 3-14 95% confidence band for the true math S.A.T. score function. [Plot of the fitted line $\hat{Y}_i = 432.4138 + 0.0013X_i$ with its 95% confidence band; horizontal axis: annual family income (X), 10000 to 90000, with $\bar{X}$ marked; vertical axis: math S.A.T. score (Y), 300 to 600; the 95% CI from 507.87 to 559.76 around the point estimate 533.81 is marked.]

KEY TERMS AND CONCEPTS

The key terms and concepts introduced in this chapter are

Classical linear regression model (CLRM)
Homoscedasticity or equal variance
Heteroscedasticity or unequal variance
Autocorrelation and no autocorrelation
Variances of OLS estimators
Standard errors of OLS estimators
Residual sum of squares (RSS)
Standard error of the regression (SER)
Sampling, or probability, distributions of OLS estimators
Gauss-Markov theorem
BLUE property
Central limit theorem (CLT)
“Zero” null hypothesis; straw man hypothesis
t test of significance
a) two-tailed t test
b) one-tailed t test
Coefficient of determination, $r^2$
Total sum of squares (TSS)
Explained sum of squares (ESS)
Coefficient of alienation
Coefficient of correlation, r
Normal probability plot (NPP)
Anderson-Darling normality test ($A^2$ statistic)
Jarque-Bera (JB) test of normality
Forecasting, or prediction, error
Confidence interval; confidence band

QUESTIONS

3.1. Explain the meaning of
a. Least squares.
b. OLS estimators.
c. The variance of an estimator.
d. Standard error of an estimator.
e. Homoscedasticity.
f. Heteroscedasticity.
g. Autocorrelation.
h. Total sum of squares (TSS).
i. Explained sum of squares (ESS).
j. Residual sum of squares (RSS).
k. $r^2$.
l. Standard error of estimate.
m. BLUE.
n. Test of significance.
o. t test.
p. One-tailed test.
q. Two-tailed test.
r. Statistically significant.

3.2. State with brief reasons whether the following statements are true, false, or uncertain.
a. OLS is an estimating procedure that minimizes the sum of the errors squared, $\sum u_i^2$.
b. The assumptions made by the classical linear regression model (CLRM) are not necessary to compute OLS estimators.
c. The theoretical justification for OLS is provided by the Gauss-Markov theorem.
d. In the two-variable PRF, $b_2$ is likely to be a more accurate estimate of $B_2$ if the disturbances $u_i$ follow the normal distribution.
e. The OLS estimators $b_1$ and $b_2$ each follow the normal distribution only if $u_i$ follows the normal distribution.
f. $r^2$ is the ratio of TSS/ESS.
g. For a given alpha and d.f., if the computed $|t|$ exceeds the critical t value, we should accept the null hypothesis.
h. The coefficient of correlation, r, has the same sign as the slope coefficient $b_2$.
i. The p value and the level of significance, $\alpha$, mean the same thing.

3.3. Fill in the appropriate gaps in the following statements:
a. If $B_2 = 0$, $b_2/se(b_2) = \ldots$
b. If $B_2 = 0$, $t = b_2/\ldots$
c. $r^2$ lies between . . . and . . .
d. r lies between . . . and . . .
e. TSS = RSS + . . .
f. d.f. (of TSS) = d.f. (of . . .) + d.f. (of RSS)
g. $\hat{\sigma}$ is called . . .
h. $\sum y_i^2 = \sum (Y_i - \ldots)^2$
i. $\sum y_i^2 = b_2(\ldots)$

3.4. Consider the following regression:

$$
\begin{aligned}
\hat{Y}_i &= -66.1058 + 0.0650X_i \qquad r^2 = 0.9460 \\
\text{se} &= (10.7509)\ (\quad) \qquad n = 20 \\
t &= (\quad)\ (18.73)
\end{aligned}
$$

Fill in the missing numbers. Would you reject the hypothesis that true $B_2$ is zero at $\alpha = 5\%$? Tell whether you are using a one-tailed or two-tailed test and why.

3.5. Show that all the following formulas to compute $r^2$ are equivalent:

$$r^2 = 1 - \frac{\sum e_i^2}{\sum y_i^2} = \frac{\sum \hat{y}_i^2}{\sum y_i^2} = \frac{b_2^2 \sum x_i^2}{\sum y_i^2} = \frac{\left(\sum y_i \hat{y}_i\right)^2}{\left(\sum y_i^2\right)\left(\sum \hat{y}_i^2\right)}$$

3.6. Show that $\sum e_i = n\bar{Y} - nb_1 - nb_2\bar{X} = 0$

PROBLEMS

3.7. Based on the data for the years 1962 to 1977 for the United States, Dale Bails and Larry Peppers18 obtained the following demand function for automobiles:

$\hat{Y}_t = 5807 + 3.24X_t$     $r^2 = 0.22$
se = (1.634)

where Y = retail sales of passenger cars (thousands) and X = the real disposable income (billions of 1972 dollars). Note: The se for b1 is not given.
a. Establish a 95% confidence interval for B2.
b. Test the hypothesis that this interval includes $B_2 = 0$. If not, would you accept this null hypothesis?
c. Compute the t value under $H_0\colon B_2 = 0$. Is it statistically significant at the 5 percent level? Which t test do you use, one-tailed or two-tailed, and why?

3.8. The characteristic line of modern investment analysis involves running the following regression:

$$r_t = B_1 + B_2 r_{mt} + u_t$$

where r = the rate of return on a stock or security
rm = the rate of return on the market portfolio represented by a broad market index such as S&P 500, and
t = time

In investment analysis, B2 is known as the beta coefficient of the security and is used as a measure of market risk, that is, how developments in the market affect the fortunes of a given company.

Based on 240 monthly rates of return for the period 1956 to 1976, Fogler and Ganapathy obtained the following results for IBM stock. The market index used by the authors is the market portfolio index developed at the University of Chicago:19

$\hat{r}_t = 0.7264 + 1.0598r_{mt}$     $r^2 = 0.4710$
se = (0.3001) (0.0728)

a. Interpret the estimated intercept and slope.
b. How would you interpret $r^2$?
c. A security whose beta coefficient is greater than 1 is called a volatile or aggressive security. Set up the appropriate null and alternative hypotheses and test them using the t test. Note: Use $\alpha = 5\%$.

18 See Dale G. Bails and Larry C. Peppers, Business Fluctuations: Forecasting Techniques and Applications, Prentice-Hall, Englewood Cliffs, N.J., 1982, p. 147.

19 H. Russell Fogler and Sundaram Ganapathy, Financial Econometrics, Prentice-Hall, Englewood Cliffs, N.J., 1982, p. 13.

3.9. You are given the following data based on 10 pairs of observations on Y and X.

$\sum Y_i = 1110$     $\sum X_i = 1680$     $\sum X_iY_i = 204{,}200$
$\sum X_i^2 = 315{,}400$     $\sum Y_i^2 = 133{,}300$

Assuming all the assumptions of CLRM are fulfilled, obtain
a. b1 and b2.
b. standard errors of these estimators.
c. $r^2$.
d. Establish 95% confidence intervals for B1 and B2.
e. On the basis of the confidence intervals established in (d), can you accept the hypothesis that $B_2 = 0$?

3.10. Based on data for the United States for the period 1965 to 2006 (found in Table 3-4 on the textbook’s Web site), the following regression results were obtained:

$\widehat{GNP}_t = -995.5183 + 8.7503\,M1_t$     $r^2 = 0.9488$
se = (      )    (0.3214)
t  = (-3.8258)   (      )

where GNP is the gross national product ($, in billions) and M1 is the money supply ($, in billions). Note: M1 includes currency, demand deposits, traveler’s checks, and other checkable deposits.
a. Fill in the blank parentheses.
b. The monetarists maintain that money supply has a significant positive impact on GNP. How would you test this hypothesis?
c. What is the meaning of the negative intercept?
d. Suppose M1 for 2007 is $750 billion. What is the mean forecast value of GNP for that year?

3.11. Political business cycle: Do economic events affect presidential elections? To test this so-called political business cycle theory, Gary Smith20 obtained the following regression results based on the U.S. presidential elections for the four yearly periods from 1928 to 1980 (i.e., the data are for years 1928, 1932, etc.):

$\hat{Y}_t = 53.10 - 1.70X_t$
t = (34.10) (-2.67)     $r^2 = 0.37$

where Y is the percentage of the vote received by the incumbent and X is the unemployment rate change, that is, the unemployment rate in an election year minus the unemployment rate in the preceding year.
a. A priori, what is the expected sign of X?
b. Do the results support the political business cycle theory? Support your contention with appropriate calculations.
c. Do the results of the 1984 and 1988 presidential elections support the preceding theory?
d. How would you compute the standard errors of b1 and b2?

20 Gary Smith, Statistical Reasoning, Allyn & Bacon, Boston, Mass., 1985, p. 488. Change in notation was made to conform with our format. The original data were obtained by Ray C. Fair, “The Effect of Economic Events on Votes for President,” The Review of Economics and Statistics, May 1978, pp. 159–173.

3.12. To study the relationship between capacity utilization in manufacturing and inflation in the United States, we obtained the data shown in Table 3-5 (found on the textbook’s Web site). In this table, Y = inflation rate as measured by the

percentage change in GDP implicit price deflator and X = capacity utilization rate in manufacturing as measured by output as a percent of capacity for the years 1960–2007.
a. A priori, what would you expect to be the relationship between inflation rate and capacity utilization rate? What is the economic rationale behind your expectation?
b. Regress Y on X and present your result in the format of Eq. (3.46).
c. Is the estimated slope coefficient statistically significant?
d. Is it statistically different from unity?
e. The natural rate of capacity utilization is defined as the rate at which Y is zero. What is this rate for the period under study?

3.13. Reverse regression21: Continue with Problem 3.12, but suppose we now regress X on Y.
a. Present the result of this regression and comment.
b. If you multiply the slope coefficients in the two regressions, what do you obtain? Is this result surprising to you?
c. The regression in Problem 3.12 may be called the direct regression. When would a reverse regression be appropriate?
d. Suppose the $r^2$ value between X and Y is 1. Does it then make any difference if we regress Y on X or X on Y?

21 On this see G. S. Maddala, Introduction to Econometrics, 3rd ed., Wiley, New York, 2001, pp. 71–75.

3.14. Table 3-6 gives data on X (net profits after tax in U.S. manufacturing industries [$, in millions]) and Y (cash dividend paid quarterly in manufacturing industries [$, in millions]) for years 1974 to 1986.
a. What relationship, if any, do you expect between cash dividend and after-tax profits?
b. Plot the scattergram between Y and X.
c. Does the scattergram support your expectations in part (a)?
d. If so, do an OLS regression of Y on X and obtain the usual statistics.
e. Establish a 99% confidence interval for the true slope and test the hypothesis that the true slope coefficient is zero; that is, there is no relationship between dividend and the after-tax profit.

TABLE 3-6  CASH DIVIDEND (Y) AND AFTER-TAX PROFITS (X) IN U.S. MANUFACTURING INDUSTRIES, 1974–1986 ($, in millions)

Year         Y          X
1974    19,467     58,747
1975    19,968     49,135
1976    22,763     64,519
1977    26,585     70,366
1978    28,932     81,148
1979    32,491     98,698
1980    36,495     92,579
1981    40,317    101,302
1982    41,259     71,028
1983    41,624     85,834
1984    45,102    107,648
1985    45,517     87,648
1986    46,044     83,121

Source: Business Statistics, 1986, U.S. Department of Commerce, Bureau of Economic Analysis, December 1987, p. 72.

3.15. Refer to the S.A.T. data given in Table 2-15 on the textbook’s Web site. Suppose you want to predict the male math scores on the basis of the female math scores by running the following regression:

$$Y_t = B_1 + B_2X_t + u_t$$

where Y and X denote the male and female math scores, respectively.
a. Estimate the preceding regression, obtaining the usual summary statistics.
b. Test the hypothesis that there is no relationship between Y and X whatsoever.
c. Suppose the female math score in 2008 is expected to be 490. What is the predicted (average) male math score?
d. Establish a 95% confidence interval for the predicted value in part (c).

3.16. Repeat the exercise in Problem 3.15 but let Y and X denote the male and the female critical reading scores, respectively. Assume a female critical reading score for 2008 of 505.

3.17. Consider the following regression results:22

$\hat{Y}_t = -0.17 + 5.26X_t$     $\bar{R}^2 = 0.10$, Durbin-Watson = 2.01
t = (-1.73) (2.71)

where Y = the real return on the stock price index from January of the current year to January of the following year
X = the total dividends in the preceding year divided by the stock price index for July of the preceding year
t = time

Note: On Durbin-Watson statistic, see Chapter 10. The time period covered by the study was 1926 to 1982. Note: $\bar{R}^2$ stands for the adjusted coefficient of determination. The Durbin-Watson value is a measure of autocorrelation. Both measures are explained in subsequent chapters.
a. How would you interpret the preceding regression?
b. If the previous results are acceptable to you, does that mean the best investment strategy is to invest in the stock market when the dividend/price ratio is high?
c. If you want to know the answer to part (b), read Shiller’s analysis.

22 See Robert J. Shiller, Market Volatility, MIT Press, Cambridge, Mass., 1989, pp. 32–36.

3.18. Refer to Example 2.1 on years of schooling and average hourly earnings. The data for this example are given in Table 2-5 and the regression results are presented in Eq. (2.21). For this regression
a. Obtain the standard errors of the intercept and slope coefficients and $r^2$.
b. Test the hypothesis that schooling has no effect on average hourly earnings. Which test did you use and why?
c. If you reject the null hypothesis in (b), would you also reject the hypothesis that the slope coefficient in Eq. (2.21) is not different from 1? Show the necessary calculations.

3.19. Example 2.2 discusses Okun’s law, as shown in Eq. (2.22). This equation can also be written as $X_t = B_1 + B_2Y_t$, where X = percent growth in real output, as measured by GDP and Y = change in the unemployment rate, measured in percentage points. Using the data given in Table 2-13 on the textbook’s Web site,
a. Estimate the preceding regression, obtaining the usual results as per Eq. (3.46).
b. Is the change in the unemployment rate a significant determinant of percent growth in real GDP? How do you know?
c. How would you interpret the intercept coefficient in this regression? Does it have any economic meaning?

3.20. For Example 2.3, relating stock prices to interest rates, are the regression results given in Eq. (2.24) statistically significant? Show the necessary calculations.

3.21. Refer to Example 2.5 about antique clocks and their prices. Based on Table 2-14, we obtained the regression results shown in Eqs. (2.27) and (2.28). For each regression obtain the standard errors, the t ratios, and the $r^2$ values. Test for the statistical significance of the estimated coefficients in the two regressions.

3.22. Refer to Problem 3.22. Using OLS regressions, answer questions (a), (b), and (c).

3.23. Table 3-7 (found on the textbook’s Web site) gives data on U.S. expenditure on imported goods (Y) and personal disposable income (X) for the period 1959 to 2006.

Based on the data given in this table, estimate an import expenditure function, obtaining the usual regression statistics, and test the hypothesis that expenditure on imports is unrelated to personal disposable income.

3.24. Show that the OLS estimators, b1 and b2, are linear estimators. Also show that these estimators are linear functions of the error term ui (Hint: Note that $b_2 = \sum x_iy_i / \sum x_i^2 = \sum w_iy_i$, where $w_i = x_i / \sum x_i^2$, and note that the X’s are nonstochastic).

3.25. Prove Eq. (3.35). (Hint: Square Eq. [3.33] and use some of the properties of OLS).

CHAPTER 4

MULTIPLE REGRESSION: ESTIMATION AND HYPOTHESIS TESTING

In the two-variable linear regression model that we have considered so far there was a single independent, or explanatory, variable. In this chapter we extend that model by considering the possibility that more than one explanatory variable may influence the dependent variable. A regression model with more than one explanatory variable is known as a multiple regression model, multiple because multiple influences (i.e., variables) may affect the dependent variable.

For example, consider the 1980s savings and loan (S&L) crisis resulting from the bankruptcies of some S&L institutions in several states. Similar events also occurred in the fall of 2008 as several banks were forced into bankruptcy. What factors should we focus on to understand these events? Is there a way to reduce the possibility that they will happen again? Suppose we want to develop a regression model to explain bankruptcy, the dependent variable. Now a phenomenon such as bankruptcy is too complex to be explained by a single explanatory variable; the explanation may entail several variables, such as the ratio of primary capital to total assets, the ratio of loans that are more than 90 days past due to total assets, the ratio of nonaccruing loans to total assets, the ratio of renegotiated loans to total assets, the ratio of net income to total assets, etc.1 To include all these variables in a regression model to allow for the multiplicity of influences affecting bankruptcies, we have to consider a multiple regression model.

Needless to say, we could cite hundreds of examples of multiple regression models. In fact, most regression models are multiple regression models because very few economic phenomena can be explained by only a single explanatory variable, as in the case of the two-variable model.

1As a matter of fact, these were some of the variables that were considered by the Board of Governors of the Federal Reserve System in their internal studies of bankrupt banks.


In this chapter we discuss the multiple regression model seeking answers to the following questions:

1. How do we estimate the multiple regression model? Is the estimating procedure any different from that for the two-variable model?

2. Is the hypothesis-testing procedure any different from the two-variable model?

3. Are there any unique features of multiple regressions that we did not encounter in the two-variable case?

4. Since a multiple regression can have any number of explanatory variables, how do we decide how many variables to include in any given situation?

To answer these and other related questions, we first consider the simplest of the multiple regression models, namely, the three-variable model in which the behavior of the dependent variable Y is examined in relation to two explanatory variables, X2 and X3. Once the three-variable model is clearly understood, the extension to the four-, five-, or more variable case is quite straightforward, although the arithmetic gets a bit tedious. (But in this age of high-speed computers, that should not be a problem.) It is interesting that the three-variable model itself is in many ways a clear-cut extension of the two-variable model, as the following discussion reveals.

4.1 THE THREE-VARIABLE LINEAR REGRESSION MODEL

Generalizing the two-variable population regression function (PRF), we can write the three-variable PRF in its nonstochastic form as2

$$E(Y_t) = B_1 + B_2X_{2t} + B_3X_{3t} \qquad (4.1)$$

and in the stochastic form as

$$Y_t = B_1 + B_2X_{2t} + B_3X_{3t} + u_t \qquad (4.2)$$

$$\phantom{Y_t} = E(Y_t) + u_t \qquad (4.3)$$

where Y = the dependent variable
X2 and X3 = the explanatory variables
u = the stochastic disturbance term
t = the tth observation

2 Equation (4.1) can be written as $E(Y_t) = B_1X_{1t} + B_2X_{2t} + B_3X_{3t}$ with the understanding that $X_{1t} = 1$ for each observation. The presentation in Eq. (4.1) is for notational convenience in that the subscripts on the parameters or their estimators match the subscripts on the variables to which they are attached.

In case the data are cross-sectional, the subscript i will denote the ith observation. Note that we introduce u in the three-variable, or, more generally, in the multivariable model for the same reason that it was introduced in the two-variable case.

B1 is the intercept term. It represents the average value of Y when X2 and X3 are set equal to zero. The coefficients B2 and B3 are called partial regression coefficients; their meaning will be explained shortly.

Following the discussion in Chapter 2, Equation (4.1) gives the conditional mean value of Y, conditional upon the given or fixed values of the variables X2 and X3. Therefore, as in the two-variable case, multiple regression analysis is conditional regression analysis, conditional upon the given or fixed values of the explanatory variables, and we obtain the average, or mean, value of Y for the fixed values of the X variables. Recall that the PRF gives the (conditional) means of the Y populations corresponding to the given levels of the explanatory variables, X2 and X3.3

The stochastic version, Equation (4.2), states that any individual Y value can be expressed as the sum of two components:

1. A systematic, or deterministic, component (B1 + B2X2t + B3X3t), which is simply its mean value E(Yt) (i.e., the point on the population regression line, PRL),4 and

2. ut, which is the nonsystematic, or random, component, determined by factors other than X2 and X3.

All this is familiar territory from the two-variable case; the only point to note is that we now have two explanatory variables instead of one explanatory variable.

Notice that Eq. (4.1), or its stochastic counterpart Eq. (4.2), is a linear regression model—a model that is linear in the parameters, the B’s. As noted in Chapter 2, our concern in this book is with regression models that are linear in the parameters; such models may or may not be linear in the variables (but more on this in Chapter 5).

3 Unlike the two-variable case, we cannot show this diagrammatically because to represent the three variables Y, X2, and X3, we have to use a three-dimensional diagram, which is difficult to visualize in two-dimensional form. But by stretching the imagination, we can visualize a diagram similar to Figure 2-6.

4 Geometrically, the PRL in this case represents what is known as a plane.

The Meaning of Partial Regression Coefficient

As mentioned earlier, the regression coefficients B2 and B3 are known as partial regression or partial slope coefficients. The meaning of the partial regression coefficient is as follows: B2 measures the change in the mean value of Y, E(Y), per unit change in X2, holding the value of X3 constant. Likewise, B3 measures the change in the mean value of Y per unit change in X3, holding the value of X2 constant. This is the unique feature of a multiple regression; in the two-variable case, since there was only a single explanatory variable, we did not have to worry about the presence of other explanatory variables in the model. In the multiple regression model we want to find out what part of the change in the average value of Y can be directly attributable to X2 and what part to X3. Since this point is so crucial to understanding the logic of multiple regression, let us explain it by a simple example. Suppose we have the following PRF:

$$E(Y_t) = 15 - 1.2X_{2t} + 0.8X_{3t} \qquad (4.4)$$

Let X3 be held constant at the value 10. Putting this value in Equation (4.4), we obtain

$$E(Y_t) = 15 - 1.2X_{2t} + 0.8(10) = (15 + 8) - 1.2X_{2t} = 23 - 1.2X_{2t} \qquad (4.5)$$

Here the slope coefficient B2 = −1.2 indicates that the mean value of Y decreases by 1.2 per unit increase in X2 when X3 is held constant (in this example it is held constant at 10, although any other value will do).5 This slope coefficient is called the partial regression coefficient.6 Likewise, if we hold X2 constant, say, at the value 5, we obtain

$$E(Y_t) = 15 - 1.2(5) + 0.8X_{3t} = 9 + 0.8X_{3t} \qquad (4.6)$$

Here the slope coefficient B3 = 0.8 means that the mean value of Y increases by 0.8 per unit increase in X3 when X2 is held constant (here it is held constant at 5, but any other value will do just as well). This slope coefficient too is a partial regression coefficient.

In short, then, a partial regression coefficient reflects the (partial) effect of one explanatory variable on the mean value of the dependent variable when the values of other explanatory variables included in the model are held constant. This unique feature of multiple regression enables us not only to include more than one explanatory variable in the model but also to “isolate” or “disentangle” the effect of each X variable on Y from the other X variables included in the model.

We will consider a concrete example in Section 4.5.

5 As the algebra of Eq. (4.5) shows, it does not matter at what value X3 is held constant, for that constant value multiplied by its coefficient will be a constant number, which will simply be added to the intercept.

6 The mathematically inclined reader will notice at once that B2 is the partial derivative of E(Y) with respect to X2 and that B3 is the partial derivative of E(Y) with respect to X3.
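The arithmetic of Eqs. (4.4) to (4.6) can be checked with a few lines of code. The following is a minimal sketch in Python, not part of the original text; the function names `expected_y`, `partial_effect_x2`, and `partial_effect_x3` are illustrative assumptions. It shows that the change in E(Y) per unit change in one X is the same whatever value the other X is held at, and that only the intercept shifts.

```python
def expected_y(x2, x3):
    """Mean of Y under the illustrative PRF of Eq. (4.4)."""
    return 15 - 1.2 * x2 + 0.8 * x3


def partial_effect_x2(x3, x2_start=0.0):
    """Change in E(Y) when X2 rises by one unit and X3 stays fixed."""
    return expected_y(x2_start + 1, x3) - expected_y(x2_start, x3)


def partial_effect_x3(x2, x3_start=0.0):
    """Change in E(Y) when X3 rises by one unit and X2 stays fixed."""
    return expected_y(x2, x3_start + 1) - expected_y(x2, x3_start)


# Hold the other variable at several arbitrary (hypothetical) levels.
effects_x2 = [partial_effect_x2(x3) for x3 in (0, 10, 25)]
effects_x3 = [partial_effect_x3(x2) for x2 in (0, 5, 40)]

# Intercepts of the conditional lines in Eqs. (4.5) and (4.6).
intercept_at_x3_10 = expected_y(0, 10)
intercept_at_x2_5 = expected_y(5, 0)

print("Effect of X2 at different X3 levels:", effects_x2)
print("Effect of X3 at different X2 levels:", effects_x3)
print("Intercepts:", intercept_at_x3_10, intercept_at_x2_5)
```

Up to floating-point rounding, every entry of `effects_x2` equals B2 = −1.2 and every entry of `effects_x3` equals B3 = 0.8, which is what "holding the other variable constant" means in a model that is linear in the variables.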

4.2 ASSUMPTIONS OF THE MULTIPLE LINEAR REGRESSION MODEL

As in the two-variable case, our first order of business is to estimate the regression coefficients of the multiple regression model. Toward that end, we continue to operate within the framework of the classical linear regression model (CLRM) first introduced in Chapter 3 and to use the method of ordinary least squares (OLS) to estimate the coefficients.

Specifically, for model (4.2), we assume (cf. Section 3.1):

A4.1.

The regression model is linear in the parameters as in Eq. (4.1) and it is correctly specified.

A4.2.

X2 and X3 are uncorrelated with the disturbance term u. If X2 and X3 are nonstochastic (i.e., fixed numbers in repeated sampling), this assumption is automatically fulfilled.

However, if the X variables are random, or stochastic, they must be distributed independently of the error term u; otherwise, we will not be able to obtain unbiased estimates of the regression coefficients. But more on this in Chapter 11.

A4.3.

The error term u has a zero mean value; that is,

$$E(u_i) = 0 \qquad (4.7)$$

A4.4.

Homoscedasticity, that is, the variance of u, is constant:

$$\text{var}(u_i) = \sigma^2 \qquad (4.8)$$

A4.5.

No autocorrelation exists between the error terms ui and uj:

$$\text{cov}(u_i, u_j) = 0, \quad i \neq j \qquad (4.9)$$

A4.6.

No exact collinearity exists between X2 and X3; that is, there is no exact linear relationship between the two explanatory variables. This is a new assumption and is explained later.

A4.7.

For hypothesis testing, the error term u follows the normal distribution with mean zero and (homoscedastic) variance $\sigma^2$. That is,

$$u_i \sim N(0, \sigma^2) \qquad (4.10)$$

Except for Assumption A4.6, the rationale for the other assumptions is the same as that discussed for the two-variable linear regression. As noted in Chapter 3, we make these assumptions to facilitate the development of the subject. In Part II we will revisit these assumptions and see what happens if one or more of them are not fulfilled in actual applications.

According to Assumption A4.6 there is no exact linear relationship between the explanatory variables X2 and X3, technically known as the assumption of no collinearity, or no multicollinearity, if more than one exact linear relationship is involved. This concept is new and needs some explanation.

Informally, no perfect collinearity means that a variable, say, X2, cannot be expressed as an exact linear function of another variable, say, X3. Thus, if we can express

$$X_{2t} = 3 + 2X_{3t}$$

or

$$X_{2t} = 4X_{3t}$$

then the two variables are collinear, for there is an exact linear relationship between X2 and X3. Assumption A4.6 states that this should not be the case. The logic here is quite simple. If, for example, X2 = 4X3, then substituting this in Eq. (4.1), we see that

$$E(Y_t) = B_1 + B_2(4X_{3t}) + B_3X_{3t} = B_1 + (4B_2 + B_3)X_{3t} = B_1 + AX_{3t} \qquad (4.11)$$

where

$$A = 4B_2 + B_3 \qquad (4.12)$$

Equation (4.11) is a two-variable model, not a three-variable model. Now even if we can estimate Eq. (4.11) and obtain an estimate of A, there is no way that we can get individual estimates of B2 or B3 from the estimated A. Note that since Equation (4.12) is one equation with two unknowns we need two (independent) equations to obtain unique estimates of B2 and B3.

The upshot of this discussion is that in cases of perfect collinearity we cannot estimate the individual partial regression coefficients B2 and B3; in other words, we cannot assess the individual effect of X2 and X3 on Y. But this is hardly surprising, for we really do not have two independent variables in the model.

Although, in practice, the case of perfect collinearity is rare, the cases of high or near perfect collinearity abound. In a later chapter (see Chapter 8) we will examine this case more fully. For now we merely require that two or more explanatory variables do not have exact linear relationships among them.
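The identification problem under perfect collinearity can be seen numerically. The following is a minimal sketch in Python with NumPy, not part of the original text; the simulated data, the seed, and the names `design`, `mean_y`, `x2_exact`, and `x2_near` are illustrative assumptions. It builds the case X2 = 4X3 discussed above and a near-collinear variant for contrast.

```python
import numpy as np

rng = np.random.default_rng(0)          # hypothetical simulated data
n = 20
x3 = rng.normal(size=n)
x2_exact = 4 * x3                       # exact collinearity: X2 = 4*X3
x2_near = 4 * x3 + rng.normal(scale=0.01, size=n)   # high, not perfect


def design(x2, x3):
    """Data matrix with a column of ones for the intercept."""
    return np.column_stack([np.ones_like(x3), x2, x3])


# With X2 = 4*X3 the three columns span only two dimensions, so the
# normal equations have no unique solution for b2 and b3.
rank_exact = np.linalg.matrix_rank(design(x2_exact, x3))
rank_near = np.linalg.matrix_rank(design(x2_near, x3))


def mean_y(b1, b2, b3):
    """E(Y) under exact collinearity for given parameter values."""
    return b1 + b2 * x2_exact + b3 * x3


# Two different (B2, B3) pairs with the same A = 4*B2 + B3 = -4
# produce identical mean values, so the data cannot tell them apart.
same_mean = np.allclose(mean_y(15, -1.2, 0.8), mean_y(15, 0.0, -4.0))

print("rank with exact collinearity:", rank_exact)
print("rank with near collinearity: ", rank_near)
print("same E(Y) for different (B2, B3):", same_mean)
```

In the near-collinear case the matrix still has full column rank, so OLS estimates exist; what suffers is their precision, which is the topic the text defers to Chapter 8.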

4.3 ESTIMATION OF THE PARAMETERS OF MULTIPLE REGRESSION

To estimate the parameters of Eq. (4.2), we use the ordinary least squares (OLS) method whose main features have already been discussed in Chapters 2 and 3.

Ordinary Least Squares Estimators

To find the OLS estimators, let us first write the sample regression function (SRF) corresponding to the PRF Eq. (4.2), as follows:

$$Y_t = b_1 + b_2X_{2t} + b_3X_{3t} + e_t \qquad (4.13)$$

where, following the convention introduced in Chapter 2, e is the residual term, or simply the residual (the sample counterpart of u), and where the b’s are the estimators of the population coefficients, the B’s. More specifically,

b1 = the estimator of B1
b2 = the estimator of B2
b3 = the estimator of B3

The sample counterpart of Eq. (4.1) is

$$\hat{Y}_t = b_1 + b_2X_{2t} + b_3X_{3t} \qquad (4.14)$$

which is the estimated population regression line (PRL) (actually a plane).

As explained in Chapter 2, the OLS principle chooses the values of the unknown parameters in such a way that the residual sum of squares (RSS) $\sum e_t^2$ is as small as possible. To do this, we first write Equation (4.13) as

$$e_t = Y_t - b_1 - b_2X_{2t} - b_3X_{3t} \qquad (4.15)$$

Squaring this equation on both sides and summing over the sample observations, we obtain

$$\text{RSS:} \quad \sum e_t^2 = \sum (Y_t - b_1 - b_2X_{2t} - b_3X_{3t})^2 \qquad (4.16)$$

And in OLS we minimize this RSS (which is simply the sum of the squared difference between actual Yt and estimated Yt).

The minimization of Equation (4.16) involves the calculus technique of differentiation. Without going into detail, this process of differentiation gives us the following equations, known as (least squares) normal equations, to help estimate the unknowns7 (compare them with the corresponding equations given for the two-variable case in Equations [2.14] and [2.15]):

$$\bar{Y} = b_1 + b_2\bar{X}_2 + b_3\bar{X}_3 \qquad (4.17)$$

$$\sum Y_tX_{2t} = b_1\sum X_{2t} + b_2\sum X_{2t}^2 + b_3\sum X_{2t}X_{3t} \qquad (4.18)$$

$$\sum Y_tX_{3t} = b_1\sum X_{3t} + b_2\sum X_{2t}X_{3t} + b_3\sum X_{3t}^2 \qquad (4.19)$$

where the summation is over the sample range 1 to n. Here we have three equations in three unknowns; the knowns are the variables Y and the X’s and the unknowns are the b’s. Ordinarily, we should be able to solve three equations with three unknowns. By simple algebraic manipulations of the preceding equations, we obtain the three OLS estimators as follows:

$$b_1 = \bar{Y} - b_2\bar{X}_2 - b_3\bar{X}_3 \qquad (4.20)$$

$$b_2 = \frac{\left(\sum y_tx_{2t}\right)\left(\sum x_{3t}^2\right) - \left(\sum y_tx_{3t}\right)\left(\sum x_{2t}x_{3t}\right)}{\left(\sum x_{2t}^2\right)\left(\sum x_{3t}^2\right) - \left(\sum x_{2t}x_{3t}\right)^2} \qquad (4.21)$$

$$b_3 = \frac{\left(\sum y_tx_{3t}\right)\left(\sum x_{2t}^2\right) - \left(\sum y_tx_{2t}\right)\left(\sum x_{2t}x_{3t}\right)}{\left(\sum x_{2t}^2\right)\left(\sum x_{3t}^2\right) - \left(\sum x_{2t}x_{3t}\right)^2} \qquad (4.22)$$

where, as usual, lowercase letters denote deviations from sample mean values (e.g., $y_t = Y_t - \bar{Y}$).

You will notice the similarity between these equations and the corresponding ones for the two-variable case given in Eqs. (2.16) and (2.17). Also, notice the following features of the preceding equations: (1) Equations (4.21) and (4.22) are symmetrical in that one can be obtained from the other by interchanging the roles of x2 and x3, and (2) the denominators of these two equations are identical.

7 The mathematical details can be found in Appendix 4A.1.

Variance and Standard Errors of OLS Estimators

Having obtained the OLS estimators of the intercept and partial regression coefficients, we can derive the variances and standard errors of these estimators in the manner of the two-variable model. These variances or standard errors give us some idea about the variability of the estimators from sample to sample. As in the two-variable case, we need the standard errors for two main purposes: (1) to establish confidence intervals for the true parameter values and (2) to test statistical hypotheses. The relevant formulas, stated without proof, are as follows:

$$\text{var}(b_1) = \left[\frac{1}{n} + \frac{\bar{X}_2^2\sum x_{3t}^2 + \bar{X}_3^2\sum x_{2t}^2 - 2\bar{X}_2\bar{X}_3\sum x_{2t}x_{3t}}{\sum x_{2t}^2\sum x_{3t}^2 - \left(\sum x_{2t}x_{3t}\right)^2}\right] \cdot \sigma^2 \qquad (4.23)$$

$$\text{se}(b_1) = \sqrt{\text{var}(b_1)} \qquad (4.24)$$

$$\text{var}(b_2) = \frac{\sum x_{3t}^2}{\left(\sum x_{2t}^2\right)\left(\sum x_{3t}^2\right) - \left(\sum x_{2t}x_{3t}\right)^2} \cdot \sigma^2 \qquad (4.25)$$

$$\text{se}(b_2) = \sqrt{\text{var}(b_2)} \qquad (4.26)$$

$$\text{var}(b_3) = \frac{\sum x_{2t}^2}{\left(\sum x_{2t}^2\right)\left(\sum x_{3t}^2\right) - \left(\sum x_{2t}x_{3t}\right)^2} \cdot \sigma^2 \qquad (4.27)$$

$$\text{se}(b_3) = \sqrt{\text{var}(b_3)} \qquad (4.28)$$

In all these formulas $\sigma^2$ is the (homoscedastic) variance of the population error term ut. The OLS estimator of this unknown variance is

$$\hat{\sigma}^2 = \frac{\sum e_t^2}{n - 3} \qquad (4.29)$$

This formula is a straightforward extension of its two-variable companion given in Equation (3.8) except that now the degrees of freedom (d.f.) are (n - 3). This is because in estimating RSS, $\sum e_t^2$, we must first obtain b1, b2, and b3, which consume 3 d.f. This argument is quite general. In the four-variable case the d.f. will be (n - 4); in the five-variable case, (n - 5); etc.

Also, note that the (positive) square root of $\hat{\sigma}^2$,

$$\hat{\sigma} = \sqrt{\hat{\sigma}^2} \qquad (4.30)$$

is the standard error of the estimate, or the standard error of the regression, which, as noted in Chapter 3, is the standard deviation of Y values around the estimated regression line.

A word about computing $\sum e_t^2$. Since $\sum e_t^2 = \sum (Y_t - \hat{Y}_t)^2$, to compute this expression, one has first to compute $\hat{Y}_t$, which the computer does very easily. But there is a shortcut to computing the RSS (see Appendix 4A.2), which is

$$\sum e_t^2 = \sum y_t^2 - b_2\sum y_tx_{2t} - b_3\sum y_tx_{3t} \qquad (4.31)$$

which can be readily computed once the partial slopes are estimated.
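The formulas in Eqs. (4.20) to (4.31) translate almost line by line into code. The following is a minimal sketch in Python with NumPy, not part of the original text; the function name `ols_three_variable`, the simulated data, and the seed are illustrative assumptions. It computes the estimators, the RSS shortcut, and the variances in deviation form, then cross-checks them against NumPy's least-squares routine.

```python
import numpy as np


def ols_three_variable(Y, X2, X3):
    """OLS for Y = b1 + b2*X2 + b3*X3 + e via the deviation-form formulas."""
    Y, X2, X3 = (np.asarray(v, dtype=float) for v in (Y, X2, X3))
    n = Y.size
    # Lowercase letters: deviations from sample means.
    y, x2, x3 = Y - Y.mean(), X2 - X2.mean(), X3 - X3.mean()
    s22, s33, s23 = (x2**2).sum(), (x3**2).sum(), (x2 * x3).sum()
    sy2, sy3 = (y * x2).sum(), (y * x3).sum()
    den = s22 * s33 - s23**2                 # common denominator

    b2 = (sy2 * s33 - sy3 * s23) / den       # Eq. (4.21)
    b3 = (sy3 * s22 - sy2 * s23) / den       # Eq. (4.22)
    b1 = Y.mean() - b2 * X2.mean() - b3 * X3.mean()   # Eq. (4.20)

    rss = (y**2).sum() - b2 * sy2 - b3 * sy3          # Eq. (4.31)
    sigma2_hat = rss / (n - 3)                        # Eq. (4.29)

    m2, m3 = X2.mean(), X3.mean()
    var_b1 = (1 / n + (m2**2 * s33 + m3**2 * s22 - 2 * m2 * m3 * s23) / den) * sigma2_hat
    var_b2 = s33 / den * sigma2_hat                   # Eq. (4.25)
    var_b3 = s22 / den * sigma2_hat                   # Eq. (4.27)

    return {
        "b": np.array([b1, b2, b3]),
        "var": np.array([var_b1, var_b2, var_b3]),
        "se": np.sqrt([var_b1, var_b2, var_b3]),
        "rss": rss,
        "sigma2_hat": sigma2_hat,
    }


# Hypothetical data generated from the PRF of Eq. (4.4) plus noise.
rng = np.random.default_rng(42)
n = 50
X2 = rng.uniform(0, 10, n)
X3 = rng.uniform(0, 20, n)
Y = 15 - 1.2 * X2 + 0.8 * X3 + rng.normal(0, 1, n)
fit = ols_three_variable(Y, X2, X3)

# Cross-check with a general least-squares solver.
X = np.column_stack([np.ones(n), X2, X3])
b_check, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ b_check
rss_direct = resid @ resid
var_check = np.diag(rss_direct / (n - 3) * np.linalg.inv(X.T @ X))

print("b (formulas):", fit["b"])
print("b (lstsq):   ", b_check)
print("se:", fit["se"])
```

In the variance formulas the unknown $\sigma^2$ is replaced by its estimator from Eq. (4.29), which is what one does in practice when reporting standard errors.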

Properties of OLS Estimators of Multiple Regression

In the two-variable case we saw that under assumed conditions the OLS estimators are best linear unbiased estimators (BLUE). This property continues to hold for the multiple regression. Thus, each regression coefficient estimated by OLS is linear and unbiased—on the average it coincides with the true value. Among all such linear unbiased estimators, the OLS estimators have the least possible variance so that the true parameter can be estimated more accurately than by competing linear unbiased estimators. In short, the OLS estimators are efficient.

As the preceding development shows, in many ways the three-variable model is an extension of its two-variable counterpart, although the estimating formulas are a bit involved. These formulas get much more involved and cumbersome once we go beyond the three-variable model. In that case, we have to use matrix algebra, which expresses various estimating formulas more compactly. Of course, in this text matrix algebra is not used. Besides, today you rarely compute the estimates by hand; instead, you let the computer do the work.

4.4 GOODNESS OF FIT OF ESTIMATED MULTIPLE REGRESSION: MULTIPLE COEFFICIENT OF DETERMINATION, $R^2$

In the two-variable case we saw that $r^2$ as defined in Equation (3.38) measures the goodness of fit of the fitted sample regression line (SRL); that is, it gives the proportion or percentage of the total variation in the dependent variable Y explained by the single explanatory variable X. This concept of $r^2$ can be extended to regression models containing any number of explanatory variables. Thus, in the three-variable case we would like to know the proportion of the total variation in $Y\ (= \sum y_t^2)$ explained by $X_2$ and $X_3$ jointly. The quantity that gives this information is known as the multiple coefficient of determination and is denoted by the symbol $R^2$; conceptually, it is akin to $r^2$.

As in the two-variable case, we have the identity (cf. Eq. 3.36):

$$\text{TSS} = \text{ESS} + \text{RSS} \tag{4.32}$$

where
TSS = the total sum of squares of the dependent variable $Y\ (= \sum y_t^2)$
ESS = the explained sum of squares (i.e., explained by all the X variables)
RSS = the residual sum of squares

Also, as in the two-variable case, $R^2$ is defined as

$$R^2 = \frac{\text{ESS}}{\text{TSS}} \tag{4.33}$$

That is, it is the ratio of the explained sum of squares to the total sum of squares; the only change is that the ESS is now due to more than one explanatory variable.

Now it can be shown that⁸

$$\text{ESS} = b_2 \sum y_t x_{2t} + b_3 \sum y_t x_{3t} \tag{4.34}$$

and, as shown before,

$$\text{RSS} = \sum y_t^2 - b_2 \sum y_t x_{2t} - b_3 \sum y_t x_{3t} \tag{4.35}$$

Therefore, $R^2$ can be computed as⁹

$$R^2 = \frac{b_2 \sum y_t x_{2t} + b_3 \sum y_t x_{3t}}{\sum y_t^2} \tag{4.36}$$

In passing, note that the positive square root of $R^2$, R, is known as the coefficient of multiple correlation, the multiple-regression analogue of the two-variable r. Just as r measures the degree of linear association between Y and X, R can be interpreted as the degree of linear association between Y and all the X variables jointly. Although r can be positive or negative, R is always taken to be positive. In practice, however, R is of little importance.

⁸See Appendix 4A.2.
⁹$R^2$ can also be computed as $1 - \dfrac{\text{RSS}}{\text{TSS}} = 1 - \dfrac{\sum e_t^2}{\sum y_t^2}$.

4.5 ANTIQUE CLOCK AUCTION PRICES REVISITED

Let us take time out to illustrate all the preceding theory with the antique clock auction prices example we considered in Chapter 2 (see Table 2-14). Let Y = auction price, $X_2$ = age of clock, and $X_3$ = number of bidders. A priori, one would expect a positive relationship between Y and the two explanatory variables. The results of regressing Y on the two explanatory variables are as follows (the EViews output of this regression is given in Appendix 4A.4).

$$
\begin{aligned}
\hat{Y}_i &= -1336.049 + 12.7413X_{2i} + 85.7640X_{3i} \\
\text{se} &= (175.2725) \quad (0.9123) \quad (8.8019) \\
t &= (-7.6226) \quad (13.9653) \quad (9.7437) \\
p &= (0.0000)^* \quad (0.0000)^* \quad (0.0000)^* \\
R^2 &= 0.8906; \quad F = 118.0585
\end{aligned}
\tag{4.37}
$$

*Denotes an extremely small value.

Interpretation of the Regression Results

As expected, the auction price is positively related to both the age of the clock and the number of bidders. The slope coefficient of about 12.74 means that, holding other variables constant, if the age of the clock goes up by a year, the average price of the clock will go up by about 12.74 marks. Likewise, holding other variables constant, if the number of bidders increases by one, the average price of the clock goes up by about 85.76 marks. The negative value of the intercept has no viable economic meaning. The $R^2$ value of about 0.89 means that the two explanatory variables account for about 89 percent of the variation in the auction bid price, a fairly high value. The F value given in Eq. (4.37) will be explained shortly.
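The "holding other variables constant" reading of the partial slopes can be made concrete with a few lines of code. This is a minimal sketch that only evaluates the fitted equation (4.37); the function name and the example clock (120 years old, 9 bidders) are hypothetical choices made for illustration.

```python
# Coefficients reported in Eq. (4.37) of the text.
B1_HAT = -1336.049   # intercept
B2_HAT = 12.7413     # partial slope on age of clock (years)
B3_HAT = 85.7640     # partial slope on number of bidders


def predicted_price(age_years, n_bidders):
    """Average auction price implied by the fitted regression (4.37)."""
    return B1_HAT + B2_HAT * age_years + B3_HAT * n_bidders


# Hypothetical clock chosen only to illustrate the calculation.
base = predicted_price(120, 9)

# Change one regressor at a time, keeping the other fixed.
effect_of_age = predicted_price(121, 9) - base
effect_of_bidder = predicted_price(120, 10) - base

print("predicted price:", round(base, 3))
print("one more year of age:", round(effect_of_age, 4))
print("one more bidder:", round(effect_of_bidder, 4))
```

Because the model is linear, the two differences reproduce the partial slopes whatever starting values are chosen for age and bidders.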

4.6 HYPOTHESIS TESTING IN A MULTIPLE REGRESSION: GENERAL COMMENTS

Although $R^2$ gives us an overall measure of goodness of fit of the estimated regression line, by itself $R^2$ does not tell us whether the estimated partial regression coefficients are statistically significant, that is, statistically different from zero. Some of them may be and some may not be. How do we find out?

To be specific, let us suppose we want to entertain the hypothesis that age of the antique clock has no effect on its price. In other words, we want to test the null hypothesis $H_0: B_2 = 0$. How do we go about it? From our discussion of hypothesis testing for the two-variable model given in Chapter 3, in order to answer this question we need to find out the sampling distribution of $b_2$, the estimator of $B_2$. What is the sampling distribution of $b_2$? And what is the sampling distribution of $b_1$ and $b_3$?

In the two-variable case we saw that the OLS estimators, $b_1$ and $b_2$, are normally distributed if we are willing to assume that the error term u follows the normal distribution. Now in Assumption (4.7) we have stated that even for multiple regression we will continue to assume that u is normally distributed with zero mean and constant variance $\sigma^2$. Given this and the other assumptions listed in Section 4.2, we can prove that $b_1$, $b_2$, and $b_3$ each follow the normal distribution with means equal to $B_1$, $B_2$, and $B_3$, respectively, and the variances given by Eqs. (4.23), (4.25), and (4.27), respectively.

However, as in the two-variable case, if we replace the true but unobservable $\sigma^2$ by its unbiased estimator $\hat{\sigma}^2$ given in Eq. (4.29), the OLS estimators follow the t distribution with $(n - 3)$ d.f., not the normal distribution. That is,

$$t = \frac{b_1 - B_1}{\operatorname{se}(b_1)} \sim t_{n-3} \tag{4.38}$$

$$t = \frac{b_2 - B_2}{\operatorname{se}(b_2)} \sim t_{n-3} \tag{4.39}$$

$$t = \frac{b_3 - B_3}{\operatorname{se}(b_3)} \sim t_{n-3} \tag{4.40}$$

Notice that the d.f. are now $(n - 3)$ because in computing the RSS, $\sum e_t^2$, and hence $\hat{\sigma}^2$, we first need to estimate the intercept and the two partial slope coefficients; so we lose 3 d.f.

We know that by replacing $\sigma^2$ with $\hat{\sigma}^2$ the OLS estimators follow the t distribution. Now we can use this information to establish confidence intervals as well as to test statistical hypotheses about the true partial regression coefficients. The actual mechanics in many ways resemble the two-variable case, which we now illustrate with an example.

4.7 TESTING HYPOTHESES ABOUT INDIVIDUAL PARTIAL REGRESSION COEFFICIENTS

Suppose in our illustrative example we hypothesize that

$$H_0: B_2 = 0 \quad \text{and} \quad H_1: B_2 \neq 0$$

That is, under the null hypothesis, the age of the antique clock has no effect whatsoever on its bid price, whereas under the alternative hypothesis, it is contended that age has some effect, positive or negative, on price. The alternative hypothesis is thus two-sided.

Given the preceding null hypothesis, we know that

$$t = \frac{b_2 - B_2}{\operatorname{se}(b_2)} = \frac{b_2}{\operatorname{se}(b_2)} \quad (\text{Note: } B_2 = 0) \tag{4.41}$$

follows the t distribution with $(n - 3) = 29$ d.f., since $n = 32$ in our example. From the regression results given in Eq. (4.37), we obtain

$$t = \frac{12.7413}{0.9123} \approx 13.9653 \tag{4.42}$$

which has the t distribution with 29 d.f.

On the basis of the computed t value, do we reject the null hypothesis that the age of the antique clock has no effect on its bid price? To answer this question, we can either use the test of significance approach or the confidence interval approach, as we did for the two-variable regression.

The Test of Significance Approach

Recall that in the test of significance approach to hypothesis testing we develop a test statistic, find out its sampling distribution, choose a level of significance $\alpha$, and determine the critical value(s) of the test statistic at the chosen level of significance. Then we compare the value of the test statistic obtained from the sample at hand with the critical value(s) and reject the null hypothesis if the computed value of the test statistic exceeds the critical value(s).¹⁰ Alternatively, we can find the p value of the test statistic and reject the null hypothesis if the p value is smaller than the chosen $\alpha$ value. The approach that we followed for the two-variable case also carries over to the multiple regression.

¹⁰If the test statistic has a negative value, we consider its absolute value and say that if the absolute value of the test statistic exceeds the critical value, we reject the null hypothesis.

Returning to our illustrative example, we know that the test statistic in question is the t statistic, which follows the t distribution with $(n - 3)$ d.f. Therefore, we use the t test of significance. The actual mechanics are now straightforward. Suppose we choose $\alpha = 0.05$ or 5%. Since the alternative hypothesis is two-sided, we have to find the critical t value at $\alpha/2 = 2.5\%$ (Why?) for $(n - 3)$ d.f., which in the present example is 29. Then from the t table we observe that for 29 d.f.,

$$P(-2.045 \le t \le 2.045) = 0.95 \tag{4.43}$$

That is, the probability that a t value lies between the limits −2.045 and +2.045 (i.e., the critical t values) is 95 percent.

From Eq. (4.42), we see that the computed t value under $H_0: B_2 = 0$ is approximately 14, which obviously exceeds the critical t value of 2.045. We therefore reject the null hypothesis and conclude that age of an antique clock definitely has an influence on its bid price. This conclusion is also reinforced by the p value given in Eq. (4.37), which is practically zero. That is, if the null hypothesis that $B_2 = 0$ were true, our chances of obtaining a t value of about 14 or greater would be practically nil. Therefore, we can reject the null hypothesis more resoundingly on the basis of the p value than the conventionally chosen $\alpha$ value of 1% or 5%.

One-Tail or Two-Tail t Test? Since, a priori, we expect the coefficient of the age variable to be positive, we should in fact use the one-tail t test here. The 5% critical t value for the one-tail test for 29 d.f. now becomes 1.699. Since the computed t value of about 14 is still so much greater than 1.699, we reject the null hypothesis and now conclude that the age of the antique clock positively impacts its bid price; the two-tail test, on the other hand, simply told us that age of the antique clock could have a positive or negative impact on its bid price. Therefore, be careful about how you formulate your null and alternative hypotheses. Let theory be the guide in choosing these hypotheses.
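The mechanics of the two-tail and one-tail tests can be sketched in a few lines. The coefficient, standard error, and critical values are those quoted in the text; the variable names are illustrative, and the critical values are simply typed in from the t table rather than computed. The last lines also form the interval $b_2 \pm 2.045\,\operatorname{se}(b_2)$, which is the quantity the confidence interval approach works with.

```python
# Figures reported in Eq. (4.37) of the text.
b2 = 12.7413       # estimated coefficient of age
se_b2 = 0.9123     # its standard error
n, k = 32, 3       # observations; parameters estimated (intercept + 2 slopes)
B2_NULL = 0.0      # value of B2 under the null hypothesis

df = n - k
t_stat = (b2 - B2_NULL) / se_b2

# Critical values for 29 d.f. as quoted in the text (read from a t table).
T_CRIT_TWO_TAIL_5PCT = 2.045   # alpha/2 = 2.5% in each tail
T_CRIT_ONE_TAIL_5PCT = 1.699   # alpha = 5% in the upper tail only

# Two-tail test: H1 is B2 != 0, so compare |t| with the critical value.
reject_two_tail = abs(t_stat) > T_CRIT_TWO_TAIL_5PCT

# One-tail test: H1 is B2 > 0, so only large positive t values count.
reject_one_tail = t_stat > T_CRIT_ONE_TAIL_5PCT

# 95% interval b2 +/- t_crit * se(b2); H0 is rejected if it excludes B2_NULL.
half_width = T_CRIT_TWO_TAIL_5PCT * se_b2
ci_lower, ci_upper = b2 - half_width, b2 + half_width
null_inside_ci = ci_lower <= B2_NULL <= ci_upper

print("d.f.:", df, " t:", round(t_stat, 4))
print("reject H0 (two-tail, 5%):", reject_two_tail)
print("reject H0 (one-tail, 5%):", reject_one_tail)
print("95% interval:", (round(ci_lower, 4), round(ci_upper, 4)))
```

The t ratio computed from the rounded coefficient and standard error can differ from the reported 13.9653 in the third decimal place; the decision is unaffected.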

The Confidence Interval Approach to Hypothesis Testing

The basics of the confidence interval approach to hypothesis testing have already been discussed in Chapter 3. Here we merely illustrate it with our numerical example. We showed previously that

$$P(-2.045 \le t \le 2.045) = 0.95$$

We also know from Eq. (4.39) that

$$t = \frac{b_2 - B_2}{\operatorname{se}(b_2)}$$

If we substitute this t value into Equation (4.43), we obtain

$$P\left(-2.045 \le \frac{b_2 - B_2}{\operatorname{se}(b_2)} \le 2.045\right) = 0.95$$

which, after rearranging, becomes

$$P[\,b_2 - 2.045\,\operatorname{se}(b_2) \le B_2 \le b_2 + 2.045\,\operatorname{se}(b_2)\,] = 0.95 \tag{4.44}$$

which is a 95% confidence interval for $B_2$ (cf. Eq. [3.26]). Recall that under the confidence interval approach, if the confidence interval, which we call the acceptance region, includes the null-hypothesized value, we do not reject the null hypothesis. On the other hand, if the null-hypothesized value lies outside the confidence interval, that is, in the region of rejection, we can reject the null hypothesis. But always bear in mind that in making either decision we are taking a chance of being wrong $\alpha$% (say, 5%) of the time.

For our illustrative example, Eq. (4.44) becomes

$$12.7413 - 2.045(0.9123) \le B_2 \le 12.7413 + 2.045(0.9123)$$

that is,

$$10.8757 \le B_2 \le 14.6069 \tag{4.45}$$

which is a 95% confidence interval for the true $B_2$. Since this interval does not include the null-hypothesized value, we can reject the null hypothesis. If we construct confidence intervals like expression (4.45), then 95 out of 100 such intervals will include the true $B_2$, but, as noted in Chapter 3, we cannot say that the probability is 95% that the particular interval in Eq. (4.45) does or does not include the true $B_2$.

Needless to say, we can use the two approaches to hypothesis testing to test hypotheses about any other coefficient given in the regression results for our illustrative example. As you can see from the regression results, the variable, number of bidders, is also statistically significant (i.e., significantly different from zero) because the estimated t value of about 9.74 has a p value of almost zero. Remember that the lower the p value, the greater the evidence against the null hypothesis.

4.8 TESTING THE JOINT HYPOTHESIS THAT $B_2 = B_3 = 0$ OR $R^2 = 0$

For our illustrative example we saw that individually the partial slope coefficients $b_2$ and $b_3$ are statistically significant; that is, individually each partial slope coefficient is significantly different from zero. But now consider the following null hypothesis:

$$H_0: B_2 = B_3 = 0 \tag{4.46}$$

This null hypothesis is a joint hypothesis that $B_2$ and $B_3$ are jointly or simultaneously (and not individually or singly) equal to zero. This hypothesis states that the two explanatory variables together have no influence on Y. This is the same as saying that

$$H_0: R^2 = 0 \tag{4.47}$$

That is, the two explanatory variables explain zero percent of the variation in the dependent variable (recall the definition of $R^2$). Therefore, the two sets of hypotheses (4.46) and (4.47) are equivalent; one implies the other. A test of either hypothesis is called a test of the overall significance of the estimated multiple regression; that is, whether Y is linearly related to both $X_2$ and $X_3$.

How do we test, say, the hypothesis given in Equation (4.46)? The temptation here is to state that since individually $b_2$ and $b_3$ are statistically different from zero in the present example, then jointly or collectively they also must be statistically different from zero; that is, we reject $H_0$ given in Eq. (4.46). In other words, since age of the antique clock and the number of bidders at the auction each has a significant effect on the auction price, together they also must have a significant effect on the auction price. But we should be careful here for, as we show more fully in Chapter 8 on multicollinearity, in practice it can happen in a multiple regression that one or more variables individually have no significant effect on the dependent variable but collectively have a significant impact on it. This means that the t-testing procedure discussed previously, although valid for testing the statistical significance of an individual regression coefficient, is not valid for testing the joint hypothesis.

How then do we test a hypothesis like Eq. (4.46)? This can be done by using a technique known as analysis of variance (ANOVA). To see how this technique is employed, recall the following identity:

$$\text{TSS} = \text{ESS} + \text{RSS} \tag{4.32}$$

That is,¹¹

$$\sum y_t^2 = b_2 \sum y_t x_{2t} + b_3 \sum y_t x_{3t} + \sum e_t^2 \tag{4.48}$$

Equation (4.48) decomposes the TSS into two components, one explained by the (chosen) regression model (ESS) and the other not explained by the model (RSS). A study of these components of TSS is known as the analysis of variance (ANOVA) from the regression viewpoint.

¹¹This is Equation (4.35) written differently.

As noted in Appendix C, every sum of squares has associated with it its degrees of freedom (d.f.); that is, the number of independent observations on the basis of which the sum of squares is computed. Now each of the preceding sums of squares has these d.f.:

| Sum of squares | d.f. |
|---|---|
| TSS | $n - 1$ (always; why?) |
| RSS | $n - 3$ (three-variable model) |
| ESS | 2 (three-variable model)* |

*An easy way to find the d.f. for ESS is to subtract the d.f. for RSS from the d.f. for TSS.

Let us arrange all these sums of squares and their associated d.f. in a tabular form, known as the ANOVA table, as shown in Table 4-1.

TABLE 4-1 ANOVA TABLE FOR THE THREE-VARIABLE REGRESSION

| Source of variation | Sum of squares (SS) | d.f. | MSS = SS/d.f. |
|---|---|---|---|
| Due to regression (ESS) | $b_2 \sum y_t x_{2t} + b_3 \sum y_t x_{3t}$ | 2 | $\dfrac{b_2 \sum y_t x_{2t} + b_3 \sum y_t x_{3t}}{2}$ |
| Due to residual (RSS) | $\sum e_t^2$ | $n - 3$ | $\dfrac{\sum e_t^2}{n - 3}$ |
| Total (TSS) | $\sum y_t^2$ | $n - 1$ | |

Note: MSS = mean, or average, sum of squares.

Now given the assumptions of the CLRM (and Assumption 4.7) and the null hypothesis $H_0: B_2 = B_3 = 0$, it can be shown that the variable

$$F = \frac{\text{ESS}/\text{d.f.}}{\text{RSS}/\text{d.f.}} = \frac{\text{variance explained by } X_2 \text{ and } X_3}{\text{unexplained variance}} = \frac{\left(b_2 \sum y_t x_{2t} + b_3 \sum y_t x_{3t}\right)/2}{\sum e_t^2/(n - 3)} \tag{4.49}$$

follows the F distribution with 2 and $(n - 3)$ d.f. in the numerator and denominator, respectively. (See Appendix C for a general discussion of the F distribution and Appendix D for some applications.) In general, if the regression model has k explanatory variables including the intercept term, the F ratio has $(k - 1)$ d.f. in the numerator and $(n - k)$ d.f. in the denominator.¹²

¹²A simple way to remember this is that the numerator d.f. of the F ratio is equal to the number of partial slope coefficients in the model, and the denominator d.f. is equal to n minus the total number of parameters estimated (i.e., partial slopes plus the intercept).

How can we use the F ratio of Equation (4.49) to test the joint hypothesis that both $X_2$ and $X_3$ have no impact on Y? The answer is evident in Eq. (4.49). If the numerator of Eq. (4.49) is larger than its denominator, that is, if the variance of Y explained by the regression (i.e., by $X_2$ and $X_3$) is larger than the variance not explained by the regression, the F value will be greater than 1. Therefore, as the variance explained by the X variables becomes increasingly larger relative to the unexplained variance, the F ratio will be increasingly larger, too. Thus, an increasingly large F value will be evidence against the null hypothesis that the two (or more) explanatory variables have no effect on Y.

Of course, this intuitive reasoning can be formalized in the usual framework of hypothesis testing. As shown in Appendix C, Section C.4, we compute F as given in Eq. (4.49) and compare it with the critical F value for 2 and $(n - 3)$ d.f. at the chosen level of $\alpha$, the probability of committing a type I error. As usual, if the computed F value exceeds the critical F value, we reject the null hypothesis that the impact of all explanatory variables is simultaneously equal to zero. If it does not exceed the critical F value, we do not reject the null hypothesis that the explanatory variables have no impact whatsoever on the dependent variable.

To illustrate the actual mechanics, let us return to our illustrative example. The numerical counterpart of Table 4-1 is given in Table 4-2.

TABLE 4-2 ANOVA TABLE FOR THE CLOCK AUCTION PRICE EXAMPLE

| Source of variation | Sum of squares (SS) | d.f. | MSS = SS/d.f. |
|---|---|---|---|
| Due to regression (ESS) | 4278295.3 | 2 | 4278295.3/2 |
| Due to residual (RSS) | 525462.2 | 29 | 525462.2/29 |
| Total (TSS) | 4803757.5 | 31 | |

F = 2139147.6/18119.386 = 118.0585*

*Figures have been rounded.

The entries in this table are obtained from the EViews computer output given in Appendix 4A.4.¹³ From this table and the computer output, we see that the estimated F value is 118.0585, or about 118. Under the null hypothesis that $B_2 = B_3 = 0$, and given the assumptions of the classical linear regression model (CLRM), we know that the computed F value follows the F distribution with 2 and 29 d.f. in the numerator and denominator, respectively. If the null hypothesis were true, what would be the probability of our obtaining an F value of as much as 118 or greater for 2 and 29 d.f.? The p value of obtaining an F value of 118 or greater is 0.000000, which is practically zero. Hence, we can reject the null hypothesis that age and number of bidders together have no effect on the bid price of antique clocks.¹⁴

¹³Unlike other software packages, EViews does not produce the ANOVA table, although it gives the F value. But it is very easy to construct this table, for EViews gives TSS and RSS from which ESS can be easily obtained.

¹⁴If you had chosen $\alpha = 1\%$, the critical F value for 2 and 30 (which is close to 29) d.f. would be 5.39. The F value of 118 is obviously much greater than this critical value.

In our illustrative example it so happens that not only do we reject the null hypothesis that $B_2$ and $B_3$ are individually statistically insignificant, but we also reject the hypothesis that collectively they are insignificant. However, this need not happen all the time. We will come across cases where not all explanatory variables individually have much impact on the dependent variable (i.e., some of the t values may be statistically insignificant) yet all of them collectively influence the dependent variable (i.e., the F test will reject the null hypothesis that all partial slope coefficients are simultaneously equal to zero). As we will see, this happens if we have the problem of multicollinearity, which we will discuss more in Chapter 8.
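The ANOVA arithmetic behind the overall F test is short enough to reproduce directly. The following sketch uses only the sums of squares reported in Table 4-2 and the 1% critical value quoted in footnote 14; the variable names are illustrative.

```python
# Sums of squares reported in Table 4-2 of the text.
ESS = 4278295.3    # explained sum of squares
RSS = 525462.2     # residual sum of squares
n, k = 32, 3       # observations; parameters including the intercept

TSS = ESS + RSS                 # identity (4.32)

df_regression = k - 1           # number of partial slopes
df_residual = n - k             # n minus parameters estimated
df_total = n - 1

mss_regression = ESS / df_regression   # mean sum of squares due to regression
mss_residual = RSS / df_residual       # mean sum of squares due to residual

F = mss_regression / mss_residual      # Eq. (4.49)

# 1% critical F for 2 and 30 d.f., as quoted in footnote 14 of the text.
F_CRIT_1PCT = 5.39
reject_joint_null = F > F_CRIT_1PCT

print("TSS:", TSS)
print("F:", round(F, 4))
print("reject H0: B2 = B3 = 0 at 1%:", reject_joint_null)
```

Note that the residual and regression d.f. add up to the total d.f., which is the "easy way" of finding the ESS d.f. mentioned under the d.f. table.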

An Important Relationship between F and $R^2$

There is an important relationship between the coefficient of determination $R^2$ and the F ratio used in ANOVA. This relationship is as follows:

$$F = \frac{R^2/(k - 1)}{(1 - R^2)/(n - k)} \tag{4.50}$$

where n = the number of observations and k = the number of explanatory variables including the intercept.

Equation (4.50) shows how F and $R^2$ are related. These two statistics vary directly. When $R^2 = 0$ (i.e., no relationship between Y and the X variables), F is zero ipso facto. The larger $R^2$ is, the greater the F value will be. In the limit when $R^2 = 1$, the F value is infinite.

Thus the F test discussed earlier, which is a measure of the overall significance of the estimated regression line, is also a test of significance of $R^2$; that is, whether $R^2$ is different from zero. In other words, testing the null hypothesis Eq. (4.46) is equivalent to testing the null hypothesis that (the population) $R^2$ is zero, as noted in Eq. (4.47).

One advantage of the F test expressed in terms of $R^2$ is the ease of computation. All we need to know is the $R^2$ value, which is routinely computed by most regression programs. Therefore, the overall F test of significance given in Eq. (4.49) can be recast in terms of $R^2$ as shown in Eq. (4.50), and the ANOVA Table 4-1 can be equivalently expressed as Table 4-3.

TABLE 4-3 ANOVA TABLE IN TERMS OF $R^2$

| Source of variation | Sum of squares (SS) | d.f. | MSS = SS/d.f. |
|---|---|---|---|
| Due to regression (ESS) | $R^2\left(\sum y_i^2\right)$ | 2 | $\dfrac{R^2\left(\sum y_i^2\right)}{2}$ |
| Due to residual (RSS) | $(1 - R^2)\left(\sum y_i^2\right)$ | $n - 3$ | $\dfrac{(1 - R^2)\left(\sum y_i^2\right)}{n - 3}$ |
| Total (TSS) | $\sum y_i^2$ | $n - 1$ | |

Note: In computing the F value, we do not need to multiply $R^2$ and $(1 - R^2)$ by $\sum y_i^2$ since it drops out, as can be seen from Eq. (4.49). In the k-variable model the d.f. will be $(k - 1)$ and $(n - k)$, respectively.
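Equation (4.50) needs only $R^2$, n, and k, so it is easy to evaluate directly. The function below is a small sketch (its name and the input check are illustrative choices); it is applied to the clock regression figures from Eq. (4.37), with n = 32 and k = 3.

```python
def f_from_r2(r2, n, k):
    """Overall F ratio of Eq. (4.50) from R-squared, sample size n,
    and k parameters (slopes plus intercept)."""
    if not 0.0 <= r2 < 1.0:
        # At r2 == 1 the ratio is unbounded, so it is excluded here.
        raise ValueError("r2 must satisfy 0 <= r2 < 1")
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))


# Clock auction regression, Eq. (4.37): R^2 = 0.8906, n = 32, k = 3.
F_clock = f_from_r2(0.8906, n=32, k=3)

# R^2 = 0 gives F = 0, and F rises with R^2 for fixed n and k.
F_zero = f_from_r2(0.0, n=32, k=3)
F_rises = f_from_r2(0.9, n=32, k=3) > f_from_r2(0.5, n=32, k=3)

print("F from R^2:", round(F_clock, 2))
print("F when R^2 = 0:", F_zero)
print("F increases with R^2:", F_rises)
```

Because $R^2$ enters rounded to four decimals, the result agrees with the ANOVA value of 118.0585 only approximately.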

For our illustrative example, $R^2 = 0.8906$. Therefore, the F ratio of Equation (4.50) becomes

$$F = \frac{0.8906/2}{(1 - 0.8906)/29} \approx 118.04 \tag{4.51}$$

which is about the same F as shown in Table 4-2, except for rounding errors. It is left for you to set up the ANOVA table for our illustrative example in the manner of Table 4-3.

4.9 TWO-VARIABLE REGRESSION IN THE CONTEXT OF MULTIPLE REGRESSION: INTRODUCTION TO SPECIFICATION BIAS

Let us return to our example. In Example 2.5, we regressed auction price on the age of the antique clock and the number of bidders separately, as shown in Equations (2.27) and (2.28). These equations are reproduced here with the usual regression output.

$$
\begin{aligned}
\hat{Y}_i &= -191.6662 + 10.4856\,\text{Age}_i \\
\text{se} &= (264.4393) \quad (1.7937) \\
t &= (-0.7248) \quad (5.8457) \qquad r^2 = 0.5325; \quad F = 34.1723
\end{aligned}
\tag{4.52}
$$

$$
\begin{aligned}
\hat{Y}_i &= 807.9501 + 54.5724\,\text{Bidders}_i \\
\text{se} &= (231.9501) \quad (23.5724) \\
t &= (3.4962) \quad (2.3455) \qquad r^2 = 0.1549; \quad F = 5.5017
\end{aligned}
\tag{4.53}
$$

If we compare these regressions with the results of the multiple regression given in Eq. (4.37), we see several differences:

1. The slope values in Equations (4.52) and (4.53) are different from those given in the multiple regression (4.37), especially that of the number of bidders variable.

2. The intercept values in the three regressions are also different.

3. The $R^2$ value in the multiple regression is quite different from the $r^2$ values given in the two bivariate regressions. In a bivariate regression, however, $R^2$ and $r^2$ are basically indistinguishable.

As we will show, some of these differences are statistically significant and some others may not be.

Why the differences in the results of the two regressions? Remember that in Eq. (4.37), while deriving the impact of age of the antique clock on the auction price, we held the number of bidders constant, whereas in Eq. (4.52) we simply neglected the number of bidders. Put differently, in Eq. (4.37) the effect of a clock's age on auction price is net of the effect, or influence, of the number of bidders, whereas in Eq. (4.52) the effect of the number of bidders has not been netted out. Thus, the coefficient of the age variable in Eq. (4.52) reflects the gross effect: the direct effect of age as well as the indirect effect of the number of bidders. This difference between the results of regressions (4.37) and (4.52) shows very nicely the meaning of the "partial" regression coefficient.

We saw in our discussion of regression (4.37) that both the age of the clock and the number of bidders variables were individually as well as collectively important influences on the auction price. Therefore, by omitting the number of bidders variable from regression (4.52) we have committed what is known as a (model) specification bias or specification error, more specifically, the specification error of omitting a relevant variable from the model. Similarly, by omitting the age of the clock from regression (4.53), we also have committed a specification error.

Although we will examine the topic of specification errors in Chapter 7, what is important to note here is that you should be very careful in developing a regression model for empirical purposes. Take whatever help you can from the underlying theory and/or prior empirical work in developing the model. And once you choose a model, do not drop variables from the model arbitrarily.

4.10 COMPARING TWO $R^2$ VALUES: THE ADJUSTED $R^2$

By examining the $R^2$ values of our two-variable (Eq. [4.52] or Eq. [4.53]) and three-variable (Eq. [4.37]) regressions for our illustrative example, you will notice that the $R^2$ value of the former (0.5325 for Eq. [4.52] or 0.1549 for Eq. [4.53]) is smaller than that of the latter (0.8906). Is this always the case? Essentially, yes. An important property of $R^2$ is that it never decreases when an explanatory variable is added to a model, and it almost always increases. It would then seem that if we want to explain a substantial amount of the variation in a dependent variable, we merely have to go on adding more explanatory variables!

However, do not take this "advice" too seriously because the definition of $R^2 = \text{ESS}/\text{TSS}$ does not take into account the d.f. Note that in a k-variable model including the intercept term the d.f. for ESS is $(k - 1)$. Thus, if you have a model with 5 explanatory variables including the intercept, the d.f. associated with ESS will be 4, whereas if you had a model with 10 explanatory variables including the intercept, the d.f. for the ESS would be 9. But the conventional $R^2$ formula does not take into account the differing d.f. in the various models. Note that the d.f. for TSS is always $(n - 1)$. (Why?) Therefore, comparing the $R^2$ values of two models with the same dependent variable but with differing numbers of explanatory variables is essentially like comparing apples and oranges.

Thus, what we need is a measure of goodness of fit that is adjusted for (i.e., takes into account explicitly) the number of explanatory variables in the model. Such a measure has been devised and is known as the adjusted $R^2$, denoted by the symbol $\bar{R}^2$. This can be derived from the conventional $R^2$ (see Appendix 4A.3) as follows:

$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k} \tag{4.54}$$

Note that the $R^2$ we have considered previously is also known as the unadjusted $R^2$ for obvious reasons.
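Equation (4.54) is easy to evaluate for the regressions already reported. The sketch below applies it to Eqs. (4.37), (4.52), and (4.53), all with n = 32, and to one low-$R^2$ case ($R^2$ = 0.06, n = 30, k = 3) to show that the adjusted measure can fall below zero. The function name is an illustrative choice.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared of Eq. (4.54); k counts the intercept as well."""
    if n <= k:
        raise ValueError("need more observations than parameters")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)


# Unadjusted R^2 values reported in Eqs. (4.37), (4.52), (4.53); n = 32.
adj_clock = adjusted_r2(0.8906, n=32, k=3)         # age and bidders
adj_age_only = adjusted_r2(0.5325, n=32, k=2)      # age only
adj_bidders_only = adjusted_r2(0.1549, n=32, k=2)  # bidders only

# A weak fit with few observations: the adjustment can push below zero.
adj_negative = adjusted_r2(0.06, n=30, k=3)

for label, value in [
    ("age + bidders", adj_clock),
    ("age only", adj_age_only),
    ("bidders only", adj_bidders_only),
    ("R^2 = 0.06, n = 30, k = 3", adj_negative),
]:
    print(f"{label}: {value:.4f}")
```

Since the inputs are rounded to four decimals, the results can differ from software output in the last digit.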

The features of the adjusted $R^2$ are:

1. If $k > 1$, $\bar{R}^2 \le R^2$; that is, as the number of explanatory variables increases in a model, the adjusted $R^2$ becomes increasingly smaller than the unadjusted $R^2$. There seems to be a "penalty" involved in adding more explanatory variables to a regression model.

2. Although the unadjusted $R^2$ is never negative, the adjusted $R^2$ can on occasion turn out to be negative. For example, in a regression model involving $k = 3$ and $n = 30$, if an $R^2$ is found to be 0.06, $\bar{R}^2$ can be negative (−0.0096).

At present, most computer regression packages compute both the adjusted and unadjusted $R^2$ values. This is a good practice, for the adjusted $R^2$ will enable us to compare two regressions that have the same dependent variable but a different number of explanatory variables.¹⁵ Even when we are not comparing two regression models, it is a good practice to find the adjusted $R^2$ value because it explicitly takes into account the number of variables included in a model.

For our illustrative example, you should verify that the adjusted $R^2$ value is 0.8830, which, as expected, is smaller than the unadjusted $R^2$ value of 0.8906. The adjusted $R^2$ values for regressions (4.52) and (4.53) are 0.5169 and 0.1268, respectively, which are slightly lower than the corresponding unadjusted $R^2$ values.

¹⁵As we will see in Chapter 5, if two regressions have different dependent variables, we cannot compare their $R^2$ values directly, adjusted or unadjusted.

4.11 WHEN TO ADD AN ADDITIONAL EXPLANATORY VARIABLE TO A MODEL

In practice, in order to explain a particular phenomenon, we are often faced with the problem of deciding among several competing explanatory variables. The common practice is to add variables as long as the adjusted $R^2$ increases (even though its numerical value may be smaller than the unadjusted $R^2$). But when does adjusted $R^2$ increase? It can be shown that $\bar{R}^2$ will increase if the $|t|$ (absolute t) value of the coefficient of the added variable is larger than 1, where the t value is computed under the null hypothesis that the population value of the said coefficient is zero.¹⁶

¹⁶Whether or not a particular t value is significant, the adjusted $R^2$ will increase so long as the $|t|$ of the coefficient of the added variable is greater than 1.

To see this all clearly, let us first regress auction price on a constant only, then on a constant and the age of the clock, and then on a constant, the age of the clock, and the number of bidders. The results are given in Table 4-4.
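The $|t| > 1$ rule can be checked against figures already reported in Eqs. (4.37), (4.52), and (4.53) together with the adjusted $R^2$ values quoted in Section 4.10. The sketch below is only a consistency check on those published numbers, not a proof of the rule; the tuple layout and names are illustrative.

```python
# Each entry: (description, adjusted R^2 before, adjusted R^2 after,
#              t value of the added variable in the larger model).
# Adjusted R^2 values are those quoted in Section 4.10; t values are
# taken from Eq. (4.37).
additions = [
    ("bidders added to the age-only model (4.52)", 0.5169, 0.8830, 9.7437),
    ("age added to the bidders-only model (4.53)", 0.1268, 0.8830, 13.9653),
]

checks = []
for description, adj_before, adj_after, t_added in additions:
    t_exceeds_one = abs(t_added) > 1.0
    adj_r2_rose = adj_after > adj_before
    # The rule says these two conditions go together.
    checks.append(t_exceeds_one == adj_r2_rose)
    print(f"{description}: |t| > 1 is {t_exceeds_one}, "
          f"adjusted R^2 rose is {adj_r2_rose}")

rule_consistent = all(checks)
print("rule consistent with reported figures:", rule_consistent)
```

In both cases the added variable has a t value far above 1 and the adjusted $R^2$ rises, as the rule predicts.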

TABLE 4-4  A COMPARISON OF FOUR MODELS OF ANTIQUE CLOCK AUCTION PRICES

Model  Dependent variable  Intercept    Age         # of Bidders  R2      Adjusted R2  F
(1)    Auction price       1328.094     --          --            0.00    0.00         0
                           (19.0850)
(2)    Auction price       −191.6662    10.4856     --            0.5325  0.5169       34.1723
                           (−0.7248)    (5.8457)
(3)    Auction price       807.9501     --          54.5724       0.1549  0.1268       5.5017
                           (3.4962)                 (2.3455)
(4)    Auction price       −1336.049    12.7413     85.7640       0.8906  0.8830       118.0585
                           (−7.6226)    (13.9653)   (9.7437)

Note: Figures in the parentheses are the estimated t values under the null hypothesis that the corresponding population values are zero. A double hyphen (--) means that the variable is not included in the model.

Some interesting facts stand out in this exercise:

1. When we regress auction price on the intercept only, the $R^2$, $\bar{R}^2$, and F values are all zero, as we would expect. But what does the intercept value represent here? It is nothing but the (sample) mean value of auction price. One way to check on this is to look at Eq. (2.16). If there is no X variable in this equation, the intercept is equal to the mean value of the dependent variable.

2. When we regress auction price on a constant and the age of the antique clock, we see that the t value of the age variable is not only greater than 1, but it is also statistically significant. Unsurprisingly, both $R^2$ and $\bar{R}^2$ values increase (although the latter is somewhat smaller than the former). But notice an interesting fact. If you square the t value of 5.8457, you get $(5.8457)^2 = 34.1722$, which is about the same as the F value of 34.1723 shown in Table 4-4. Is this surprising? No, because in Equation (C.15) in Appendix C we state that

$$t_k^2 = F_{1,k} \tag{4.55} = (C.15)$$

That is, the square of the t statistic with k d.f. is equal to the F statistic with 1 d.f. in the numerator and k d.f. in the denominator. In our example, k = 30 (32 observations − 2, the two coefficients estimated in model [2]). The numerator d.f. is 1, because we have only one explanatory variable in this model.

3. When we regress auction price on a constant and the number of bidders, we see that the t value of the latter is 2.3455. If you square this value, you will get $(2.3455)^2 = 5.5013$, which is about the same as the F value of 5.5017 shown in Table 4-4, which again verifies Eq. (4.55). Since the t value is greater than 1, both $R^2$ and $\bar{R}^2$ values have increased. The computed t value is also statistically significant, suggesting that the number of bidders variable should be added to model (1). A similar conclusion holds for model (2).

4. How do we decide if it is worth adding both age and number of bidders together to model (1)? We have already answered this question with the help of the ANOVA technique and the attendant F test. In Table 4-2 we showed that one could reject the hypothesis that B2 = B3 = 0; that is, the two explanatory variables together have no impact on the auction bid price.17
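The two facts used in this list, namely that the squared t value equals the F value when there is a single explanatory variable and that the adjusted R2 rises whenever the added variable has an absolute t value above 1, can be checked against the figures in Table 4-4. The Python sketch below is only an illustration: the helper names and the 0.1% relative tolerance (chosen to absorb rounding in the published figures) are our own assumptions.

```python
import math

# t value of the single regressor and the reported F value, from Table 4-4.
two_variable_models = {
    "model (2), age": (5.8457, 34.1723),
    "model (3), bidders": (2.3455, 5.5017),
}


def t_squared_matches_f(t_value: float, f_value: float, rel_tol: float = 1e-3) -> bool:
    """Check t^2 = F up to rounding (the tolerance is an assumption)."""
    return math.isclose(t_value ** 2, f_value, rel_tol=rel_tol)


def adjusted_r2_rises(t_value: float) -> bool:
    """Rule from the text: adjusted R^2 rises if |t| of the added variable > 1."""
    return abs(t_value) > 1.0


# (description, t of the added variable, adjusted R^2 before, adjusted R^2 after)
additions = [
    ("age added to model (1)", 5.8457, 0.0, 0.5169),
    ("bidders added to model (1)", 2.3455, 0.0, 0.1268),
    ("bidders added to model (2)", 9.7437, 0.5169, 0.8830),
    ("age added to model (3)", 13.9653, 0.1268, 0.8830),
]

for name, (t, f) in two_variable_models.items():
    print(f"{name}: t^2 = {t ** 2:.4f}, F = {f:.4f}, match = {t_squared_matches_f(t, f)}")

for desc, t, before, after in additions:
    print(f"{desc}: rule predicts rise = {adjusted_r2_rises(t)}, observed rise = {after > before}")
```

In every case listed the rule and the observed change in the adjusted R2 agree, as the text leads us to expect.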

4.12 RESTRICTED LEAST SQUARES

Let us take another look at the regressions given in Table 4-4. There we saw the consequences of omitting relevant variables from a regression model. Thus, in regression (1) shown in this table we regressed antique clock auction price on the intercept only, which gave an R2 value of 0, which is not surprising. Then in regression (4) we regressed auction price on the age of the antique clock as well as on the number of bidders present at the auction, which gave an R2 value of 0.8906. On the basis of the F test we concluded that there was a specification error and that both the explanatory variables should be added to the model.

Let us call regression (1) the restricted model because it implicitly assumes that the coefficients of the age of the clock and the number of bidders are zero; that is, these variables do not belong in the model (i.e., B2 = B3 = 0). Let us call regression (4) the unrestricted model because it includes all the relevant variables. Since (1) is a restricted model, when we estimate it by OLS, we call it restricted least squares (RLS). Since (4) is an unrestricted model, when we estimate it by OLS, we call it unrestricted least squares (URLS). All the models we have estimated thus far have been essentially URLS, for we have assumed that the model being estimated has been correctly specified and that we have included all the relevant variables in the model. In Chapter 7 we will see the consequences of violating this assumption.

The question now is: How do we decide between RLS and URLS? That is, how do we find out if the restrictions imposed by a model, such as (1) in the present instance, are valid? This question can be answered by the F test. For this purpose, let $R_r^2$ denote the $R^2$ value obtained from the restricted model and $R_{ur}^2$ denote the $R^2$ value obtained from the unrestricted model. Now assuming that the error term $u_i$ is normally distributed, it can be shown that

$$F = \frac{(R_{ur}^2 - R_r^2)/m}{(1 - R_{ur}^2)/(n - k)} \sim F_{m,\,n-k} \tag{4.56}$$

follows the F distribution with m and (n − k) d.f. in the numerator and denominator, respectively, where $R_r^2$ = $R^2$ obtained from the restricted regression, $R_{ur}^2$ = $R^2$ obtained from the unrestricted regression, m = number of restrictions imposed by the restricted regression (two in our example), n = number of observations in the sample, and k = number of parameters estimated in the unrestricted regression (including the intercept). The null hypothesis tested here is that the restrictions imposed by the restricted model are valid. If the F value estimated from Equation (4.56) exceeds the critical F value at the chosen level of significance, we reject the restricted regression. That is, in this situation, the restrictions imposed by the (restricted) model are not valid.

17Suppose you have a model with four explanatory variables. Initially you only include two of these variables but then you want to find out if it is worth adding two more explanatory variables. This can be handled by an extension of the F test. For details, see Gujarati and Porter, Basic Econometrics, 5th ed., McGraw-Hill, New York, 2009, pp. 243–246.

Returning to our antique clock auction price example, putting the appropriate values in Eq. (4.56) from Table 4-4, we obtain:

$$F = \frac{(0.890 - 0)/2}{(1 - 0.890)/(32 - 3)} = \frac{0.445}{0.00379} = 117.414 \tag{4.57}$$

The probability of such an F value is extremely small. Therefore, we reject the restricted regression. More positively, age of the antique clock as well as the number of bidders at auction both have a statistically significant impact on the auction price.

The formula (4.56) is of general application. The only precaution to be taken in its application is that in comparing the restricted and unrestricted regressions, the dependent variables must be in the same form. If they are not, we have to make them comparable using the method discussed in Chapter 5 (see Problem 5.16) or use an alternative that is discussed in Exercise 4.20.

4.13 ILLUSTRATIVE EXAMPLES

To conclude this chapter, we consider several examples involving multiple regressions. Our objective here is to show you how multiple regression models are used in a variety of applications.

Example 4.1. Does Tax Policy Affect Corporate Capital Structure?

To find out the extent to which tax policy has been responsible for the recent trend in U.S. manufacturing toward increasing use of debt capital in lieu of equity capital, that is, toward an increasing debt/equity ratio (called leverage in the financial literature), Pozdena estimated the following regression model:18

$$Y_t = B_1 + B_2 X_{2t} + B_3 X_{3t} + B_4 X_{4t} + B_5 X_{5t} + B_6 X_{6t} + u_t \tag{4.58}$$

where
Y = the leverage (= debt/equity) in percent
X2 = the corporate tax rate
X3 = the personal tax rate
X4 = the capital gains tax rate
X5 = nondebt-related tax shields
X6 = the inflation rate

18Randall Johnston Pozdena, “Tax Policy and Corporate Capital Structure,” Economic Review, Federal Reserve Bank of San Francisco, Fall 1987, pp. 37–51.

Economic theory suggests that coefficients B2, B4, and B6 will be positive and coefficients B3 and B5 will be negative.19 Based on the data for U.S. manufacturing corporations for the years 1935 to 1982, Pozdena obtained the OLS results that are presented in tabular form (Table 4-5) rather than in the usual format (e.g., Eq. [4.37]). (Results are sometimes presented in this form for ease of reading.)

TABLE 4-5  LEVERAGE IN MANUFACTURING CORPORATIONS, 1935–1982

Explanatory variable           Coefficient   (t value)
Corporate tax rate              2.4          (10.5)
Personal tax rate              −1.2          (−4.8)
Capital gains tax rate          0.3          (1.3)
Non-debt-related tax shield    −2.4          (−4.8)
Inflation rate                  1.4          (3.0)

n = 48 (number of observations); $R^2$ = 0.87; $\bar{R}^2$ = 0.85

Notes:
1. The author does not present the estimated intercept.
2. The adjusted $R^2$ is calculated using Eq. (4.54).
3. The standard errors of the various coefficients can be obtained by dividing the coefficient value by its t value (e.g., 2.4/10.5 = 0.2286 is the se of the corporate tax rate coefficient).

Source: Randall Johnston Pozdena, “Tax Policy and Corporate Capital Structure,” Economic Review, Federal Reserve Bank of San Francisco, Fall 1987, Table 1, p. 45 (adapted).

19See Pozdena’s article (footnote 18) for the theoretical discussion of expected signs of the various coefficients. In the United States the interest paid on debt capital is tax deductible, whereas the income paid as dividends is not. This is one reason that corporations may prefer debt to equity capital.

Discussion of Regression Results

The first fact to note about the preceding regression results is that all the coefficients have signs according to prior expectations. For instance, the corporate tax rate has a positive effect on leverage. Holding other things the same, as the corporate tax rate goes up by 1 percentage point, on the average, the leverage ratio (i.e., the debt/equity ratio) goes up by 2.4 percentage points. Likewise, if the inflation rate goes up by 1 percentage point, on the average, leverage goes up by 1.4 percentage points, other things remaining the same. (Question: Why would you expect a positive relation between leverage and inflation?) Other partial regression coefficients should be interpreted similarly.

Since the t values are presented underneath each partial regression coefficient under the null hypothesis that each population partial regression coefficient is individually equal to zero, we can easily test whether such a null hypothesis stands up against the (two-sided) alternative hypothesis that each true population coefficient is different from zero. Hence, we use the two-tail t test. The d.f. in this example are 42, which are obtained by subtracting from n (= 48) the number of parameters estimated, which are 6 in the present instance. (Note: The intercept value is not presented in Table 4-5, although it was estimated.) If we choose $\alpha = 0.05$, or 5%, the two-tail critical t value is about 2.021 for 40 d.f. (Note: This is good enough for present purposes since the t table does not give the precise t value for 42 d.f.) If $\alpha$ is fixed at 0.01, or a 1% level, the critical t value for 40 d.f. is 2.704 (two-tail). Looking at the t values presented in Table 4-5, we see that each partial regression coefficient, except that of the capital gains tax variable, is statistically significantly different from zero at the 1% level of significance. The coefficient of the capital gains tax variable is not significant at either the 1% or 5% level. Therefore, except for this variable, we can reject the individual null hypothesis that each partial regression coefficient is zero. In other words, all but one of the explanatory variables individually have an impact on the debt/equity ratio. In passing, note that if an estimated coefficient is statistically significant at the 1% level, it is also significant at the 5% level, but the converse is not true.
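The individual t tests just described, together with the standard errors implied by Note 3 of Table 4-5, can be tabulated in a few lines. The sketch below is a minimal illustration in Python: the coefficients and t values are those of Table 4-5, the critical values 2.021 and 2.704 are the two-tail values for 40 d.f. quoted in the text, and the helper names are our own.

```python
# Coefficient and t value for each explanatory variable, from Table 4-5.
results = {
    "Corporate tax rate": (2.4, 10.5),
    "Personal tax rate": (-1.2, -4.8),
    "Capital gains tax rate": (0.3, 1.3),
    "Non-debt-related tax shield": (-2.4, -4.8),
    "Inflation rate": (1.4, 3.0),
}

# Two-tail critical t values for 40 d.f., as quoted in the text.
T_CRIT_5_PERCENT = 2.021
T_CRIT_1_PERCENT = 2.704


def standard_error(coefficient: float, t_value: float) -> float:
    """se = coefficient / t, valid because t is computed under H0: B = 0."""
    return coefficient / t_value


def is_significant(t_value: float, critical_value: float) -> bool:
    """Two-tail test: reject H0 when |t| exceeds the critical value."""
    return abs(t_value) > critical_value


for name, (coef, t) in results.items():
    print(
        f"{name}: se = {standard_error(coef, t):.4f}, "
        f"significant at 5% = {is_significant(t, T_CRIT_5_PERCENT)}, "
        f"at 1% = {is_significant(t, T_CRIT_1_PERCENT)}"
    )
```

Only the capital gains tax rate fails both tests, in line with the discussion above.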

What about the overall significance of the estimated regression line? That is, do we reject the null hypothesis that all partial slopes are simultaneously equal to zero or, equivalently, is $R^2 = 0$? This hypothesis can be easily tested by using Eq. (4.50), which in the present case gives

$$F = \frac{R^2/(k - 1)}{(1 - R^2)/(n - k)} = \frac{0.87/5}{0.13/42} = 56.22 \tag{4.59}$$

This F value has an F distribution with 5 and 42 d.f. If $\alpha$ is set at 0.05, the F table (Appendix E, Table E-3) shows that for 5 and 40 d.f. (the table has no exact value of 42 d.f. in the denominator), the critical F value is 2.45. The corresponding value at $\alpha = 0.01$ is 3.51. The computed F of about 56 far exceeds either of these critical F values. Therefore, we reject the null hypothesis that all partial slopes are simultaneously equal to zero or, alternatively, $R^2 = 0$. Collectively, all five explanatory variables influence the dependent variable. Individually, however, as we have seen, only four variables have an impact on the dependent variable, the debt/equity ratio. Example 4.1 again underscores the point made earlier that the (individual) t test and the (joint) F test are quite different.20

20In the two-variable linear regression model, as noted before, $t_k^2 = F_{1,k}$; that is, the square of a t value with k d.f. is equal to an F value with 1 d.f. in the numerator and k d.f. in the denominator.
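The overall F test of Eq. (4.59) can be viewed as a special case of the restricted least squares F test of Eq. (4.56): when the restricted model contains only the intercept, the restricted $R^2$ is 0 and the number of restrictions is m = k − 1. The Python sketch below illustrates this with the figures from Example 4.1 and from the antique clock example. The function names are our own, and the critical values 2.45 and 3.51 are the tabulated values for 5 and 40 d.f. quoted in the text. Note that the clock example evaluates to roughly 117.3 when the denominator is not rounded; the 117.414 in Eq. (4.57) results from first rounding the denominator to 0.00379.

```python
def f_restricted(r2_ur: float, r2_r: float, m: int, n: int, k: int) -> float:
    """F statistic of Eq. (4.56).

    r2_ur, r2_r: R^2 of the unrestricted and restricted regressions
    m: number of restrictions
    n: sample size
    k: parameters in the unrestricted regression (including the intercept)
    """
    return ((r2_ur - r2_r) / m) / ((1.0 - r2_ur) / (n - k))


def f_overall(r2: float, n: int, k: int) -> float:
    """Overall-significance F: Eq. (4.56) with r2_r = 0 and m = k - 1."""
    return f_restricted(r2, 0.0, k - 1, n, k)


# Example 4.1: R^2 = 0.87, n = 48, k = 6 (five slopes plus the intercept).
f_leverage = f_overall(0.87, n=48, k=6)

# Antique clock example: R^2_ur = 0.890, R^2_r = 0, m = 2, n = 32, k = 3.
f_clock = f_restricted(0.890, 0.0, m=2, n=32, k=3)

# Critical F values for 5 and 40 d.f., as quoted in the text.
F_CRIT_5_PERCENT = 2.45
F_CRIT_1_PERCENT = 3.51

print(f"Example 4.1: F = {f_leverage:.2f}, reject H0 at 1%: {f_leverage > F_CRIT_1_PERCENT}")
print(f"Clock example: F = {f_clock:.3f}")
```

Writing the overall test as a call to the general function makes the link explicit: the joint hypothesis that all slopes are zero is simply one particular set of restrictions.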

Example 4.2. The Demand for Imports in Jamaica

To explain the demand for imports in Jamaica, J. Gafar21 obtained the following regression based on annual data for 19 years:

$$\begin{aligned}
\hat{Y}_t &= -58.9 + 0.20X_{2t} - 0.10X_{3t} \\
\text{se} &= (0.0092)\quad (0.084) \qquad R^2 = 0.96 \\
t &= (21.74)\quad (-1.1904) \qquad \bar{R}^2 = 0.955
\end{aligned} \tag{4.60}$$

where
Y = quantity of imports
X2 = personal consumption expenditure
X3 = import price/domestic price

Economic theory would suggest a positive relationship between Y and X2 and a negative relationship between Y and X3, which turns out to be the case. Individually, the coefficient of X2 is statistically significant but that of X3 is not at, say, the 5% level. But since the absolute t value of X3 is greater than 1, $\bar{R}^2$ for this example will drop if X3 is dropped from the model. (Why?) Together, X2 and X3 explain about 96 percent of the variation in the quantity of imports into Jamaica.

21J. Gafar, “Devaluation and the Balance of Payments Adjustment in a Developing Economy: An Analysis Relating to Jamaica,” Applied Economics, vol. 13, 1981, pp. 151–165. Notations were adapted. Adjusted R2 computed.

Example 4.3. The Demand for Alcoholic Beverages in the United Kingdom

To explain the demand for alcoholic beverages in the United Kingdom, T. McGuinness22 estimated the following regression based on annual data for 20 years:

$$\begin{aligned}
\hat{Y}_t &= -0.014 - 0.354X_{2t} + 0.0018X_{3t} + 0.657X_{4t} + 0.0059X_{5t} \\
\text{se} &= (0.012)\quad (0.2688)\quad (0.0005)\quad (0.266)\quad (0.0034) \\
t &= (-1.16)\quad (-1.32)\quad (3.39)\quad (2.47)\quad (1.73) \qquad R^2 = 0.689
\end{aligned} \tag{4.61}$$

where
Y = the annual change in pure alcohol consumption per adult
X2 = the annual change in the real price index of alcoholic drinks
X3 = the annual change in the real disposable income per person
X4 = (the annual change in the number of licensed premises) / (the adult population)
X5 = the annual change in real advertising expenditure on alcoholic drinks per adult

Theory would suggest that all but the variable X2 will be positively related to Y. This is borne out by the results, although not all coefficients are individually statistically significant. For 15 d.f. (Why?), the 5% critical t value is 1.753 (one-tail) and 2.131 (two-tail). Consider the coefficient of X5, the change in advertising expenditure. Since the relationship between advertising expenditure and the demand for alcoholic beverages is expected to be positive (otherwise, it is bad news for the advertising industry), we can entertain the hypothesis that

$$H_0: B_5 = 0 \quad \text{vs.} \quad H_1: B_5 > 0$$

and therefore use the one-tail t test. The computed t value of 1.73 is very close to being significant at the 5% level.

It is left as an exercise for you to compute the F value for this example to test the hypothesis that all partial slope coefficients are simultaneously equal to zero.

22T. McGuinness, “An Econometric Analysis of Total Demand for Alcoholic Beverages in the United Kingdom,” Journal of Industrial Economics, vol. 29, 1980, pp. 85–109. Notations were adapted.

Example 4.4. Civilian Labor Force Participation Rate, Unemployment Rate, and Average Hourly Earnings Revisited

In Chapter 1 we presented regression (1.5) without discussing the statistical significance of the results. Now we have the necessary tools to do that. The complete regression results are as follows:

$$\begin{aligned}
\text{CLFPR}_t &= 81.2267 - 0.6384\,\text{CUNR}_t - 1.4449\,\text{AHE82}_t \\
\text{se} &= (3.4040)\quad (0.0715)\quad (0.4148) \\
t &= (23.88)\quad (-8.94)\quad (-3.50) \\
p\ \text{value} &= (0.000)^*\quad (0.000)^*\quad (0.002) \\
R^2 &= 0.767;\quad \bar{R}^2 = 0.748;\quad F = 41.09
\end{aligned} \tag{4.62}$$

*Denotes extremely small value.

As these results show, each of the estimated regression coefficients is individually statistically highly significant, because the p values are so small. That is, each coefficient is significantly different from zero. Collectively, both CUNR and AHE82 are also highly statistically significant, because the p value of the computed F value (for 2 and 25 d.f.) of 41.09 is extremely low.

As expected, the civilian unemployment rate has a negative relationship to the civilian labor force participation rate, suggesting that perhaps the discouraged-worker effect dominates the added-worker hypothesis. The theoretical reasoning behind this has already been explained in Chapter 1. The negative value of AHE82 suggests that perhaps the income effect dominates the substitution effect.

Example 4.5. Expenditure on Education in 38 Countries23

Based on data taken from a sample of 38 countries (see Table 4-6, found on the textbook’s Web site), we obtained the following regression:

$$\begin{aligned}
\text{Educ}_i &= 414.4583 + 0.0523\,\text{GDP}_i - 50.0476\,\text{Pop}_i \\
\text{se} &= (266.4583)\quad (0.0018)\quad (9.9581) \\
t &= (1.5538)\quad (28.2742)\quad (-5.0257) \\
p\ \text{value} &= (0.1292)\quad (0.0000)\quad (0.0000) \\
R^2 &= 0.9616;\quad \bar{R}^2 = 0.9594;\quad F = 439.22;\quad p\ \text{value of}\ F = 0.000
\end{aligned}$$

where Educ = expenditure on education (millions of U.S. dollars), GDP = gross domestic product (millions of U.S. dollars), and Pop = population (millions of people). As you can see from the data, the sample includes a variety of countries in different stages of economic development.

It can be readily assessed that the GDP and Pop variables are individually highly significant, although the sign of the population variable may be puzzling. Since the estimated F is so highly significant, collectively the two variables have a significant impact on expenditure on education. As noted, the variables are also individually significant.

The $R^2$ and adjusted $R^2$ values are quite high, which is unusual in a cross-section sample of diverse countries.

We will explore these data further in later chapters.

23The data used in this exercise are from Gary Koop, Introduction to Econometrics, John Wiley & Sons, England, 2008 and can be found on the following Web site: www.wileyeurope.com/college/koop.

4.14 SUMMARY

In this chapter we considered the simplest of the multiple regression models, namely, the three-variable linear regression model, with one dependent variable and two explanatory variables. Although in many ways a straightforward extension of the two-variable linear regression model, the three-variable model introduced several new concepts, such as partial regression coefficients, adjusted and unadjusted multiple coefficient of determination, and multicollinearity.

Insofar as estimation of the multiple regression coefficients is concerned, we still worked within the framework of the classical linear regression model and used the method of ordinary least squares (OLS). The OLS estimators of multiple regression, like the two-variable model, possess several desirable statistical properties summed up in the Gauss-Markov property of best linear unbiased estimators (BLUE).

With the assumption that the disturbance term follows the normal distribution with zero mean and constant variance $\sigma^2$, we saw that, as in the two-variable case, each estimated coefficient in the multiple regression follows the normal distribution with a mean equal to the true population value and the variances given by the formulas developed in the text. Unfortunately, in practice, $\sigma^2$ is not known and has to be estimated. The OLS estimator of this unknown variance is $\hat{\sigma}^2$. But if we replace $\sigma^2$ by $\hat{\sigma}^2$, then, as in the two-variable case, each estimated coefficient of the multiple regression follows the t distribution, not the normal distribution.

The knowledge that each multiple regression coefficient follows the t distribution with d.f. equal to $(n - k)$, where k is the number of parameters estimated (including the intercept), means we can use the t distribution to test statistical hypotheses about each multiple regression coefficient individually. This can be done on the basis of either the t test of significance or the confidence interval based on the t distribution. In this respect, the multiple regression model does not differ much from the two-variable model, except that proper allowance must be made for the d.f., which now depend on the number of parameters estimated.

However, when testing the hypothesis that all partial slope coefficients are simultaneously equal to zero, the individual t testing referred to earlier is of no help. Here we should use the analysis of variance (ANOVA) technique and the attendant F test. Incidentally, testing that all partial slope coefficients are simultaneously equal to zero is the same as testing that the multiple coefficient of determination R2 is equal to zero. Therefore, the F test can also be used to test this latter but equivalent hypothesis.

We also discussed the question of when to add a variable or a group of variables to a model, using either the t test or the F test. In this context we also discussed the method of restricted least squares.

All the concepts introduced in this chapter have been illustrated by numerical examples and by concrete economic applications.

KEY TERMS AND CONCEPTS

The key terms and concepts introduced in this chapter are

Multiple regression model
Partial regression coefficients; partial slope coefficients
Multicollinearity
Collinearity; exact linear relationship
   a) high or near perfect collinearity
Multiple coefficient of determination, $R^2$
Coefficient of multiple correlation, R
Individual hypothesis testing
Joint hypothesis testing or test of overall significance of estimated multiple regression
   a) analysis of variance (ANOVA)
   b) F test
Model specification bias (specification error)
Adjusted $R^2$ ($\bar{R}^2$)
Restricted least squares (RLS)
Unrestricted least squares (URLS)
Relationship between t and F tests

QUESTIONS

4.1. Explain carefully the meaning of
a. Partial regression coefficient
b. Coefficient of multiple determination, $R^2$
c. Perfect collinearity
d. Perfect multicollinearity
e. Individual hypothesis testing
f. Joint hypothesis testing
g. Adjusted $R^2$

4.2. Explain step by step the procedure involved in
a. Testing the statistical significance of a single multiple regression coefficient.
b. Testing the statistical significance of all partial slope coefficients.

4.3. State with brief reasons whether the following statements are true (T), false (F), or uncertain (U).
a. The adjusted and unadjusted $R^2$s are identical only when the unadjusted $R^2$ is equal to 1.
b. The way to determine whether a group of explanatory variables exerts significant influence on the dependent variable is to see if any of the explanatory variables has a significant t statistic; if not, they are statistically insignificant as a group.
c. When $R^2 = 1$, F = 0, and when $R^2 = 0$, F = infinite.
d. When the d.f. exceed 120, the 5% critical t value (two-tail) and the 5% critical Z (standard normal) value are identical, namely, 1.96.
*e. In the model $Y_i = B_1 + B_2X_{2i} + B_3X_{3i} + u_i$, if $X_2$ and $X_3$ are negatively correlated in the sample and $B_3 > 0$, omitting $X_3$ from the model will bias $b_{12}$ downward [i.e., $E(b_{12}) < B_2$], where $b_{12}$ is the slope coefficient in the regression of Y on $X_2$ alone.
f. When we say that an estimated regression coefficient is statistically significant, we mean that it is statistically different from 1.
g. To compute a critical t value, we need to know only the d.f.
h. By the overall significance of a multiple regression we mean the statistical significance of any single variable included in the model.
i. Insofar as estimation and hypothesis testing are concerned, there is no difference between simple regression and multiple regression.
j. The d.f. of the total sum of squares (TSS) are always $(n - 1)$ regardless of the number of explanatory variables included in the model.

4.4. What is the value of $\hat{\sigma}^2$ in each of the following cases?
a. $\sum e_i^2 = 880$, n = 25, k = 4 (including intercept)
b. $\sum e_i^2 = 1220$, n = 14, k = 3 (excluding intercept)

4.5. Find the critical t value(s) in the following situations:

Degrees of freedom (d.f.)   Level of significance (%)   H0
12                          5                           Two-tail
20                          1                           Right-tail
30                          5                           Left-tail
200                         5                           Two-tail

4.6. Find the critical F values for the following combinations:

Numerator d.f.   Denominator d.f.   Level of significance (%)
5                5                  5
4                19                 1
20               200                5

* Optional.

PROBLEMS

4.7. You are given the following data:

Y    X2   X3
1    1    2
3    2    1
8    3    −3

Based on these data, estimate the following regressions (Note: Do not worry about estimating the standard errors):
a. $Y_i = A_1 + A_2X_{2i} + u_i$
b. $Y_i = C_1 + C_3X_{3i} + u_i$
c. $Y_i = B_1 + B_2X_{2i} + B_3X_{3i} + u_i$
d. Is $A_2 = B_2$? Why or why not?
e. Is $C_3 = B_3$? Why or why not?
What general conclusion can you draw from this exercise?

4.8. You are given the following data based on 15 observations:

$$\bar{Y} = 367.693; \quad \bar{X}_2 = 402.760; \quad \bar{X}_3 = 8.0; \quad \sum y_i^2 = 66{,}042.269$$
$$\sum x_{2i}^2 = 84{,}855.096; \quad \sum x_{3i}^2 = 280.0; \quad \sum y_i x_{2i} = 74{,}778.346$$
$$\sum y_i x_{3i} = 4{,}250.9; \quad \sum x_{2i} x_{3i} = 4{,}796.0$$

where lowercase letters, as usual, denote deviations from sample mean values.
a. Estimate the three multiple regression coefficients.
b. Estimate their standard errors.
c. Obtain $R^2$ and $\bar{R}^2$.
d. Estimate 95% confidence intervals for $B_2$ and $B_3$.
e. Test the statistical significance of each estimated regression coefficient using $\alpha = 5\%$ (two-tail).
f. Test at $\alpha = 5\%$ that all partial slope coefficients are equal to zero. Show the ANOVA table.

4.9. A three-variable regression gave the following results:

Source of variation       Sum of squares (SS)   d.f.   Mean sum of squares (MSS)
Due to regression (ESS)   65,965                ___    ___
Due to residual (RSS)     ___                   ___    ___
Total (TSS)               66,042                14

a. What is the sample size?
b. What is the value of the RSS?
c. What are the d.f. of the ESS and RSS?
d. What is $R^2$? And $\bar{R}^2$?
e. Test the hypothesis that $X_2$ and $X_3$ have zero influence on Y. Which test do you use and why?
f. From the preceding information, can you determine the individual contribution of $X_2$ and $X_3$ toward Y?

4.10. Recast the ANOVA table given in problem 4.9 in terms of $R^2$.

4.11. To explain what determines the price of air conditioners, B. T. Ratchford24 obtained the following regression results based on a sample of 19 air conditioners:

$$\begin{aligned}
\hat{Y}_i &= -68.236 + 0.023X_{2i} + 19.729X_{3i} + 7.653X_{4i} \qquad R^2 = 0.84 \\
\text{se} &= (0.005)\quad (8.992)\quad (3.082)
\end{aligned}$$

where
Y = the price, in dollars
X2 = the BTU rating of air conditioner
X3 = the energy efficiency ratio
X4 = the number of settings
se = standard errors

a. Interpret the regression results.
b. Do the results make economic sense?
c. At $\alpha = 5\%$, test the hypothesis that the BTU rating has no effect on the price of an air conditioner versus that it has a positive effect.
d. Would you accept the null hypothesis that the three explanatory variables explain a substantial variation in the prices of air conditioners? Show clearly all your calculations.

4.12. Based on the U.S. data for 1965-IQ to 1983-IVQ (n = 76), James Doti and Esmael Adibi25 obtained the following regression to explain personal consumption expenditure (PCE) in the United States.

$$\begin{aligned}
\hat{Y}_t &= -10.96 + 0.93X_{2t} - 2.09X_{3t} \\
t &= (-3.33)\quad (249.06)\quad (-3.09) \qquad R^2 = 0.9996 \\
F &= 83{,}753.7
\end{aligned}$$

where
Y = the PCE ($, in billions)
X2 = the disposable (i.e., after-tax) income ($, in billions)
X3 = the prime rate (%) charged by banks

a. What is the marginal propensity to consume (MPC), that is, the amount of additional consumption expenditure from an additional dollar’s personal disposable income?
b. Is the MPC statistically different from 1? Show the appropriate testing procedure.
c. What is the rationale for the inclusion of the prime rate variable in the model? A priori, would you expect a negative sign for this variable?
d. Is $b_3$ significantly different from zero?
e. Test the hypothesis that $R^2 = 0$.
f. Compute the standard error of each coefficient.

24B. T. Ratchford, “The Value of Information for Selected Appliances,” Journal of Marketing Research, vol. 17, 1980, pp. 14–25. Notations were adapted.

25James Doti and Esmael Adibi, Econometric Analysis: An Applications Approach, Prentice-Hall, Englewood Cliffs, N.J., 1988, p. 188. Notations were adapted.

4.13. In the illustrative Example 4.2 given in the text, test the hypothesis that X2 and X3 together have no influence on Y. Which test will you use? What are the assumptions underlying that test?

4.14. Table 4-7 (found on the textbook’s Web site) gives data on child mortality (CM), female literacy rate (FLR), per capita GNP (PGNP), and total fertility rate (TFR) for a group of 64 countries.
a. A priori, what is the expected relationship between CM and each of the other variables?
b. Regress CM on FLR and obtain the usual regression results.
c. Regress CM on FLR and PGNP and obtain the usual results.
d. Regress CM on FLR, PGNP, and TFR and obtain the usual results. Also show the ANOVA table.
e. Given the various regression results, which model would you choose and why?
f. If the regression model in (d) is the correct model, but you estimate (a) or (b) or (c), what are the consequences?
g. Suppose you have regressed CM on FLR as in (b). How would you decide if it is worth adding the variables PGNP and TFR to the model? Which test would you use? Show the necessary calculations.

4.15. Use formula (4.54) to answer the following question:

Value of $R^2$   n       k    $\bar{R}^2$
0.83             50      6    ___
0.55             18      9    ___
0.33             16      12   ___
0.12             1,200   32   ___

What conclusion do you draw about the relationship between $R^2$ and $\bar{R}^2$?

4.16. For Example 4.3, compute the F value. If that F value is significant, what does that mean?

4.17. For Example 4.2, set up the ANOVA table and test that $R^2 = 0$. Use $\alpha = 1\%$.

4.18. Refer to the data given in Table 2-12 (found on the textbook’s Web site) to answer the following questions:
a. Develop a multiple regression model to explain the average starting pay of MBA graduates, obtaining the usual regression output.
b. If you include both GPA and GMAT scores in the model, a priori, what problem(s) may you encounter and why?
c. If the coefficient of the tuition variable is positive and statistically significant, does that mean it pays to go to the most expensive business school? What might the tuition variable be a proxy for?
d. Suppose you regress GMAT score on GPA and find a statistically significant positive relationship between the two. What can you say about the problem of multicollinearity?
e. Set up the ANOVA table for the multiple regression in part (a) and test the hypothesis that all partial slope coefficients are zero.
f. Do the ANOVA exercise in part (e), using the $R^2$ value.

4.19. Figure 4-1 gives you the normal probability plot for Example 4.4.
a. From this figure, can you tell if the error term in Eq. (4.62) follows the normal distribution? Why or why not?
b. Is the observed Anderson-Darling $A^2$ value of 0.468 statistically significant? If it is, what does that mean? If it is not, what conclusion do you draw?
c. From the given data, can you identify the mean and variance of the error term?

FIGURE 4-1 Normal probability plot for Example 4.4 (response is CLFPR [%]); horizontal axis: Residual, vertical axis: Percent. Statistics shown with the plot: Mean = 0, Std. Dev. = 0.5074, n = 28, AD = 0.468, p-value of AD = 0.231. AD = Anderson-Darling statistic. (Plot omitted.)

4.20. Restricted least squares (RLS). If the dependent variables in the restricted and unrestricted regressions are not the same, you can use the following variant of the F test given in Eq. (4.56)

$$F = \frac{(RSS_r - RSS_{ur})/m}{RSS_{ur}/(n - k)} \sim F_{m,\,n-k}$$

where $RSS_r$ = residual sum of squares from the restricted regression, $RSS_{ur}$ = residual sum of squares from the unrestricted regression, m = number of restrictions, and (n − k) = d.f. in the unrestricted regression.

Just to familiarize yourself with this formula, rework the model given in Table 4-4.

4.21. Refer to Example 4.5.
a. Use the method of restricted least squares to find out if it is worth adding the Pop (population) variable to the model.
b. Divide both Educ and GDP by Pop to obtain per capita Educ and per capita GDP. Now regress per capita Educ on per capita GDP and compare your results with those given in Example 4.5. What conclusion can you draw from this exercise?
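As a minimal sketch of how the F variant in Exercise 4.20 is evaluated, the function below simply codes the ratio term by term. The function name and all of the numbers passed to it are hypothetical and chosen only for illustration; they are not taken from Table 4-4 or Example 4.5.

```python
def restricted_f(rss_r, rss_ur, m, n, k):
    """F statistic for m restrictions.

    rss_r  : residual sum of squares, restricted regression
    rss_ur : residual sum of squares, unrestricted regression
    m      : number of restrictions
    n, k   : observations and parameters in the unrestricted regression
    """
    numerator = (rss_r - rss_ur) / m
    denominator = rss_ur / (n - k)
    return numerator / denominator


# Hypothetical inputs (NOT from the textbook's data).
f_value = restricted_f(rss_r=120.0, rss_ur=100.0, m=2, n=30, k=4)
print(f"F = {f_value:.4f} with ({2}, {30 - 4}) d.f.")
```

The computed value would then be compared with the critical F value for m and (n − k) degrees of freedom at the chosen level of significance.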

4.22. Table 4-8 (found on the textbook’s Web site) contains variables from the Los Angeles 2008 Zagat Restaurant Guide. The variables are score values out of 30, with 30 being the best. For each restaurant listed, the table provides data for four categories: food, décor, service, and average price for a single meal at the establishment.
a. Create a least squares regression model to predict Price based on the other three variables (Food, Décor, and Service). Are all the independent variables statistically significant?
b. Does the normal probability plot indicate any problems?
c. Create a scattergram of the residual values from the model versus the fitted values of the Price estimates. Does the plot indicate the residual values have constant variance? Retain this plot for use in future chapters.

APPENDIX 4A.1: Derivations of OLS Estimators Given in Equations (4.20) to (4.22)

Start with Eq. (4.16). Differentiate this equation partially with respect to b1, b2, and b3, and set the resulting equations to zero to obtain:

$$\frac{\partial \sum e_i^2}{\partial b_1} = 2\sum (Y_i - b_1 - b_2X_{2i} - b_3X_{3i})(-1) = 0$$

$$\frac{\partial \sum e_i^2}{\partial b_2} = 2\sum (Y_i - b_1 - b_2X_{2i} - b_3X_{3i})(-X_{2i}) = 0$$

$$\frac{\partial \sum e_i^2}{\partial b_3} = 2\sum (Y_i - b_1 - b_2X_{2i} - b_3X_{3i})(-X_{3i}) = 0$$

Simplifying these equations gives Eqs. (4.17), (4.18), and (4.19). Using small letters to denote deviations from the mean values (e.g., $x_{2i} = X_{2i} - \bar{X}_2$), we can solve the preceding equations to obtain the formulas given in Eqs. (4.20), (4.21), and (4.22).

APPENDIX 4A.2: Derivation of Equation (4.31)

Note that the three-variable sample regression model

$$Y_i = b_1 + b_2X_{2i} + b_3X_{3i} + e_i \tag{4A.2.1}$$

can be expressed in the deviation form (i.e., each variable expressed as a deviation from the mean value and noting that $\bar{e} = 0$) as

$$y_i = b_2x_{2i} + b_3x_{3i} + e_i \tag{4A.2.2}$$

Therefore,

$$e_i = y_i - b_2x_{2i} - b_3x_{3i} \tag{4A.2.3}$$

Using (4A.2.3), we can write

$$
\begin{aligned}
\sum e_i^2 &= \sum (e_i e_i) \\
&= \sum e_i (y_i - b_2x_{2i} - b_3x_{3i}) \\
&= \sum e_i y_i - b_2 \sum e_i x_{2i} - b_3 \sum e_i x_{3i} \\
&= \sum e_i y_i \quad \text{since the last two terms are zero (why?)} \\
&= \sum (y_i - b_2x_{2i} - b_3x_{3i})(y_i) \\
&= \sum y_i^2 - b_2 \sum y_i x_{2i} - b_3 \sum y_i x_{3i} \\
&= \sum y_i^2 - \left(b_2 \sum y_i x_{2i} + b_3 \sum y_i x_{3i}\right) \\
&= TSS - ESS
\end{aligned}
$$

APPENDIX 4A.3: Derivation of Equation (4.50)

Recall that (see footnote 9)

$$R^2 = 1 - \frac{RSS}{TSS} \tag{4A.3.1}$$

Now $\bar{R}^2$ is defined as

$$\bar{R}^2 = 1 - \frac{RSS/(n - k)}{TSS/(n - 1)} = 1 - \frac{RSS\,(n - 1)}{TSS\,(n - k)} \tag{4A.3.2}$$

Note how the degrees of freedom are taken into account. Now substituting Equation (4A.3.1) into Equation (4A.3.2), and after algebraic manipulations, we obtain

$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k}$$

Notice that if we do not take into account the d.f. associated with RSS (= n − k) and TSS (= n − 1), then, obviously, $\bar{R}^2 = R^2$.
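The final formula of Appendix 4A.3 can be checked numerically. The sketch below plugs in the values reported for the clock auction regression in Appendix 4A.4 (R-squared = 0.890614, 32 observations, and three estimated parameters: C, AGE, and NOBID); the function name `adjusted_r2` is only an illustrative choice.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared: 1 - (1 - R^2)(n - 1)/(n - k)."""
    return 1 - (1 - r2) * (n - 1) / (n - k)


# Values reported in the EViews output of Appendix 4A.4.
r2, n, k = 0.890614, 32, 3
adj = adjusted_r2(r2, n, k)
print(f"adjusted R-squared = {adj:.6f}")
```

The result should agree, up to rounding, with the "Adjusted R-squared" entry of 0.883070 in that output.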

APPENDIX 4A.4: EViews Output of the Clock Auction Price Example

Method: Least Squares
Sample: 1 32
Included observations: 32

Variable    Coefficient    Std. Error    t-Statistic    Prob.
C           −1336.049      175.2725      −7.622698      0.0000
AGE         12.74138       0.912356      13.96537       0.0000
NOBID       85.76407       8.801995      9.743708       0.0000

R-squared             0.890614      Mean dependent var       1328.094
Adjusted R-squared    0.883070      S.D. dependent var       393.6495
S.E. of regression    134.6083      Akaike info criterion    12.73167
Sum squared resid     525462.2      Schwarz criterion        12.86909
Log likelihood        −200.7068     F-statistic              118.0585
Durbin-Watson stat    1.864656      Prob (F-statistic)       0.000000

Actual (Y)    Fitted (Ŷ)    Residual (ei)
1235.00       1397.04       −162.039
1080.00       1158.38       −78.3786
845.000       882.455       −37.4549
1552.00       1347.03       204.965
1047.00       1166.19       −119.191
1979.00       1926.29       52.7127
1822.00       1680.78       141.225
1253.00       1203.45       49.5460
1297.00       1181.40       115.603
946.000       875.604       70.3963
1713.00       1695.98       17.0187
1024.00       1098.10       −74.0973
2131.00       2030.68       100.317
1550.00       1669.00       −118.995
1884.00       1671.46       212.540
2041.00       1866.01       174.994
854.000       1000.55       −146.553
1483.00       1461.71       21.2927
1055.00       1240.72       −185.717
1545.00       1579.81       −34.8054
729.000       554.605       174.395
1792.00       1716.53       75.4650
1175.00       1364.71       −189.705
1593.00       1732.70       −139.702
1147.00       1095.63       51.3672
1092.00       1127.97       −35.9668
1152.00       1269.63       −117.625
1336.00       1127.01       208.994
785.000       678.593       106.407
744.000       729.558       14.4417
1356.00       1564.60       −208.599
1262.00       1404.85       −142.852

Residual Plot: residuals ei plotted around zero for each observation. (Plot omitted.)
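Several entries of the EViews output in Appendix 4A.4 are linked by simple identities, and it can be instructive to recompute them. The following sketch is only a consistency check on the printed numbers; it does not re-estimate the regression, and the dictionary and variable names are illustrative assumptions. The F formula used here is the standard expression of F in terms of R-squared.

```python
import math

# Numbers copied from the EViews output in Appendix 4A.4.
coef = {"C": -1336.049, "AGE": 12.74138, "NOBID": 85.76407}
std_err = {"C": 175.2725, "AGE": 0.912356, "NOBID": 8.801995}
n, k = 32, 3                 # observations, estimated parameters
rss = 525462.2               # "Sum squared resid"
r2 = 0.890614                # "R-squared"

# t statistic = coefficient / standard error
t_stats = {name: coef[name] / std_err[name] for name in coef}

# Standard error of the regression = sqrt(RSS / (n - k))
se_regression = math.sqrt(rss / (n - k))

# F statistic expressed through R-squared
f_stat = (r2 / (k - 1)) / ((1 - r2) / (n - k))

for name, t in t_stats.items():
    print(f"t({name}) = {t:.4f}")
print(f"S.E. of regression = {se_regression:.4f}")
print(f"F = {f_stat:.4f}")
```

Small differences in the last digits relative to the printed output are to be expected, because the inputs above are themselves rounded.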

CHAPTER 5 FUNCTIONAL FORMS OF REGRESSION MODELS

Until now we have considered models that were linear in parameters as well as in variables. But recall that in this textbook our concern is with models that are linear in parameters; the Y and X variables do not necessarily have to be linear. As a matter of fact, as we show in this chapter, there are many economic phenomena for which the linear-in-parameters/linear-in-variables (LIP/LIV, for short) regression models may not be adequate or appropriate.

For example, suppose for the LIP/LIV math S.A.T. score function given in Equation (2.20) we want to estimate the score elasticity of the math S.A.T., that is, the percentage change in the math S.A.T. score for a percentage change in annual family income. We cannot estimate this elasticity from Eq. (2.20) directly because the slope coefficient of that model simply gives the absolute change in the (average) math S.A.T. score for a unit (say, a dollar) change in the annual family income, but this is not elasticity. Such elasticity, however, can be readily computed from the so-called log-linear models that will be discussed in Section 5.1. As we will show, this model, although linear in the parameters, is not linear in the variables.

For another example, suppose we want to find out the rate of growth¹ over time of an economic variable, such as gross domestic product (GDP), money supply, or the unemployment rate. As we show in Section 5.4, this growth rate can be measured by the so-called semilog model which, while linear in parameters, is nonlinear in variables.

¹ If Yt and Yt−1 are values of a variable, say, GDP, at time t and (t − 1), say, 2009 and 2008, then the rate of growth of Y in the two time periods is measured as $\frac{Y_t - Y_{t-1}}{Y_{t-1}} \cdot 100$, which is simply the relative, or proportional, change in Y multiplied by 100. It is shown in Section 5.4 how the semilog model can be used to measure the growth rate over a longer period of time.

Note that even within the confines of the linear-in-parameter regression models, a regression model can assume a variety of functional forms. In particular, in this chapter we will discuss the following types of regression models:

1. Log-linear or constant elasticity models (Section 5.1).
2. Semilog models (Sections 5.4 and 5.5).
3. Reciprocal models (Section 5.6).
4. Polynomial regression models (Section 5.7).
5. Regression-through-the-origin, or zero intercept, model (Section 5.8).

An important feature of all these models is that they are linear in parameters (or can be made so by simple algebraic manipulations), but they are not necessarily linear in variables. In Chapter 2 we discussed the technical meaning of linearity in both variables and parameters. Briefly, for a regression model linear in explanatory variable(s) the rate of change (i.e., the slope) of the dependent variable remains constant for a unit change in the explanatory variable, whereas for regression models nonlinear in explanatory variable(s) the slope does not remain constant.

To introduce the basic concepts, and to illustrate them graphically, initially we will consider two-variable models and then extend the discussion to multiple regression models.

5.1 HOW TO MEASURE ELASTICITY: THE LOG-LINEAR MODEL

Let us revisit our math S.A.T. score function discussed in Chapters 2 and 3. But now consider the following model for the math S.A.T. score function. (To ease the algebra, we will introduce the error term $u_i$ later.)

$$Y_i = A X_i^{B_2} \tag{5.1}$$

where Y is math S.A.T. score and X is annual family income. This model is nonlinear in the variable X.² Let us, however, express Equation (5.1) in an alternative, but equivalent, form, as follows:

$$\ln Y_i = \ln A + B_2 \ln X_i \tag{5.2}$$

where ln = the natural log, that is, logarithm to the base e.³ Now if we let

$$B_1 = \ln A \tag{5.3}$$

we can write Equation (5.2) as

$$\ln Y_i = B_1 + B_2 \ln X_i \tag{5.4}$$

And for estimating purposes, we can write this model as

$$\ln Y_i = B_1 + B_2 \ln X_i + u_i \tag{5.5}$$

This is a linear regression model, for the parameters B1 and B2 enter the model linearly.⁴ It is of interest that this model is also linear in the logarithms of the variables Y and X. (Note: The original model [5.1] was nonlinear in X.) Because of this linearity, models like Equation (5.5) are called double-log (because both variables are in the log form) or log-linear (because of linearity in the logs of the variables) models.

Notice how an apparently nonlinear model (5.1) can be converted into a linear (in the parameters) model by a suitable transformation, here the logarithmic transformation. Now letting $Y_i^* = \ln Y_i$ and $X_i^* = \ln X_i$, we can write model (5.5) as

$$Y_i^* = B_1 + B_2 X_i^* + u_i \tag{5.6}$$

which resembles the models we have considered in previous chapters; it is linear in both the parameters and the transformed variables Y* and X*.

If the assumptions of the classical linear regression model (CLRM) are satisfied for the transformed model, regression (5.6) can be estimated easily with the usual ordinary least squares (OLS) routine and the estimators thus obtained will have the usual best linear unbiased estimator (BLUE) property.⁵

One attractive feature of the double-log, or log-linear, model that has made it popular in empirical work is that the slope coefficient B2 measures the elasticity of Y with respect to X, that is, the percentage change in Y for a given (small) percentage change in X.

² Using calculus, it can be shown that

$$\frac{dY}{dX} = A B_2 X^{(B_2 - 1)}$$

which shows that the rate of change of Y with respect to X is not independent of X; that is, it is not constant. By definition, then, model (5.1) is not linear in variable X.

³ Appendix 5A discusses logarithms and their properties for the benefit of those who need it.

⁴ Note that since B1 = ln A, A can be expressed as A = antilog (B1) which is, mathematically speaking, a nonlinear transformation. In practice, however, the intercept A often does not have much concrete meaning.

⁵ Any regression package now routinely computes the logs of (positive) numbers. So there is no additional computational burden involved.
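The transformation from model (5.1) to the linear model (5.6) can be illustrated with a small numerical sketch. The parameter values A = 3 and B2 = 0.5 and the X values below are hypothetical, and the data are generated without an error term, so OLS on the logged variables should recover the parameters essentially exactly. The helper `ols_two_variable` is an illustrative name, not a routine from any statistical package.

```python
import math


def ols_two_variable(x, y):
    """Return (intercept, slope) of the least-squares line y = b1 + b2*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    return b1, b2


# Hypothetical parameter values, chosen only for illustration.
A, B2 = 3.0, 0.5
X = [1.0, 2.0, 4.0, 8.0, 16.0]
Y = [A * xi ** B2 for xi in X]        # model (5.1) without the error term

# Transform to Y* = ln Y and X* = ln X, then fit the linear model (5.6).
x_star = [math.log(xi) for xi in X]
y_star = [math.log(yi) for yi in Y]
b1, b2 = ols_two_variable(x_star, y_star)

A_recovered = math.exp(b1)            # A = antilog(B1), as in footnote 4
print(f"slope b2 = {b2:.6f}, intercept b1 = {b1:.6f}, antilog(b1) = {A_recovered:.6f}")
```

With real data the error term is present, so the estimates would only approximate the underlying parameters.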

Symbolically, if we let ΔY stand for a small change in Y and ΔX for a small change in X, we define the elasticity coefficient, E, as⁶

$$E = \frac{\%\ \text{change in } Y}{\%\ \text{change in } X} = \frac{\Delta Y/Y \cdot 100}{\Delta X/X \cdot 100} = \frac{\Delta Y}{\Delta X} \cdot \frac{X}{Y} = \text{slope}\left(\frac{X}{Y}\right) \tag{5.7}$$

Thus, if Y represents the quantity of a commodity demanded and X its unit price, B2 measures the price elasticity of demand.

All this can be shown graphically. Figure 5-1(a) represents the function (5.1), and Figure 5-1(b) shows its logarithmic transformation. The slope of the straight line shown in Figure 5-1(b) gives the estimate of price elasticity, (−B2). An important feature of the log-linear model should be apparent from Figure 5-1(b). Since the regression line is a straight line (in the logs of Y and X), its slope (−B2) is constant throughout. And since this slope coefficient is equal to the elasticity coefficient, for this model, the elasticity is also constant throughout; it does not matter at what value of X this elasticity is computed.⁷

FIGURE 5-1 A constant elasticity model. Panel (a): quantity (Y) against price (X), showing the curve $Y = AX^{-B_2}$. Panel (b): log of quantity (ln Y) against log of price (ln X), a straight line with slope −B2. (Plot omitted.)

⁶ In calculus notation

$$E = \frac{dY}{dX} \cdot \frac{X}{Y}$$

where dY/dX means the derivative of Y with respect to X, that is, the rate of change of Y with respect to X. ΔY/ΔX is an approximation of dY/dX. Note: For the transformed model (5.6),

$$B_2 = \frac{\Delta Y^*}{\Delta X^*} = \frac{\Delta \ln Y}{\Delta \ln X} = \frac{\Delta Y/Y}{\Delta X/X} = \frac{\Delta Y}{\Delta X} \cdot \frac{X}{Y}$$

which is the elasticity of Y with respect to X as per Equation (5.7). As noted in Appendix 5A, a change in the log of a number is a relative or proportional change. For example, for small changes, $\Delta \ln Y \approx \Delta Y / Y$.
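That the elasticity of model (5.1) does not depend on X can also be verified directly. The short worked derivation below uses only the calculus definition of elasticity and the derivative of (5.1), both of which appear in the footnotes of this section:

```latex
E = \frac{dY}{dX} \cdot \frac{X}{Y}
  = A B_2 X^{B_2 - 1} \cdot \frac{X}{A X^{B_2}}
  = B_2
```

The X terms cancel, so E equals B2 at every value of X. For a demand curve written as $Y = AX^{-B_2}$, the same steps give E = −B2.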

Because of this special feature, the double-log or log-linear model is also known as the constant elasticity model. Therefore, we will use all of these terms interchangeably.

Example 5.1 Math S.A.T. Score Function Revisited

In Equation (3.46) we presented the linear (in variables) function for our math S.A.T. score example. Recall, however, that the scattergram showed that the relationship between math S.A.T. scores and annual family income was approximately linear because not all points were really on a straight line. Eq. (3.46) was, of course, developed for pedagogy. Let us see if the log-linear model fits the data given in Table 2-2, which for convenience is reproduced in Table 5-1.

The OLS regression based on the log-linear data gave the following results:

$$
\begin{aligned}
\widehat{\ln Y_i} &= 4.8877 + 0.1258 \ln X_i \\
se &= (0.1573)\quad (0.0148) \\
t &= (31.0740)\quad (8.5095) \\
p &= (1.25 \times 10^{-9})\quad (2.79 \times 10^{-5}) \qquad r^2 = 0.9005
\end{aligned}
\tag{5.8}
$$

As these results show, the (constant) score elasticity is about 0.13, suggesting that if the annual family income increases by 1 percent, the math S.A.T. score on average increases by about 0.13 percent. By convention, an elasticity coefficient less than 1 in absolute value is said to be inelastic, whereas if it is greater than 1, it is called elastic. An elasticity coefficient of 1 (in absolute value) has unitary elasticity. Therefore, in our example, the math S.A.T. score is inelastic; the math score increases proportionately less than the increase in annual family income.

TABLE 5-1 MATH S.A.T. SCORE (Y) IN RELATION TO ANNUAL FAMILY INCOME (X) ($)

Y      X
410    5000
420    15000
440    25000
490    35000
530    45000
530    55000
550    65000
540    75000
570    90000
590    150000

⁷ Note carefully, however, that in general, elasticity and slope coefficients are different concepts. As Eq. (5.7) makes clear, elasticity is equal to the slope times the ratio of X/Y. It is only for the double-log, or log-linear, model that the two are identical.

The intercept of about 4.89 means that the average value of ln Y is about 4.89 if the value of ln X is zero. Again, this mechanical interpretation of the intercept may not have concrete economic meaning.⁸

The r² = 0.9005 means that about 90 percent of the variation in the log of Y is explained by the variation in the log of X.

The regression line in Equation (5.8) is sketched in Figure 5-2. Notice that this figure is quite similar to Figure 2-1.
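The estimates in Eq. (5.8) can be recomputed from Table 5-1 with the two-variable OLS formulas of the earlier chapters applied to the logged data. The sketch below uses only the Python standard library; the variable names are illustrative, and the printed values should match Eq. (5.8) only up to rounding.

```python
import math

# Table 5-1: math S.A.T. score (Y) and annual family income (X, dollars).
Y = [410, 420, 440, 490, 530, 530, 550, 540, 570, 590]
X = [5000, 15000, 25000, 35000, 45000, 55000, 65000, 75000, 90000, 150000]

ln_y = [math.log(v) for v in Y]
ln_x = [math.log(v) for v in X]
n = len(Y)

x_bar = sum(ln_x) / n
y_bar = sum(ln_y) / n
sxx = sum((x - x_bar) ** 2 for x in ln_x)
syy = sum((y - y_bar) ** 2 for y in ln_y)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ln_x, ln_y))

b2 = sxy / sxx                     # slope = (constant) elasticity
b1 = y_bar - b2 * x_bar            # intercept
rss = syy - b2 * sxy               # residual sum of squares
sigma2_hat = rss / (n - 2)         # unbiased estimator of the error variance
se_b2 = math.sqrt(sigma2_hat / sxx)
se_b1 = math.sqrt(sigma2_hat * sum(x * x for x in ln_x) / (n * sxx))
r2 = 1 - rss / syy

print(f"b1 = {b1:.4f} (se {se_b1:.4f}), b2 = {b2:.4f} (se {se_b2:.4f})")
print(f"t(b1) = {b1 / se_b1:.4f}, t(b2) = {b2 / se_b2:.4f}, r2 = {r2:.4f}")
```

The degrees of freedom for the t statistics are n − 2 = 8, since two parameters are estimated.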

Hypothesis Testing in Log-Linear Models

There is absolutely no difference between the linear and log-linear models insofar as hypothesis testing is concerned. Under the assumption that the error term follows the normal distribution with mean zero and constant variance $\sigma^2$, it follows that each estimated regression coefficient is normally distributed. Or, if we replace $\sigma^2$ by its unbiased estimator $\hat{\sigma}^2$, each estimator follows the t distribution with degrees of freedom (d.f.) equal to (n − k), where k is the number of parameters estimated, including the intercept. In the two-variable case, k is 2, in the three-variable case, k = 3, etc.

FIGURE 5-2 Log-linear model of math S.A.T. score. Scatterplot of ln (Score) vs. ln (Income); vertical axis: ln (Score), from 5.9 to 6.5; horizontal axis: ln (Income), from 8.0 to 12.5. (Plot omitted.)

⁸ Since ln Y = 4.8877 when ln X is zero, if we take the antilog of this number, we obtain about 132.65. Thus, the average math S.A.T. score is about 133 points if the log of annual family income is zero. For the linear model given in Eq. (3.46), the intercept value was about 432.41 points when annual family income (not the log of income) was zero.

From the regression (5.8), you can readily check that the slope coefficient is statistically significantly different from zero since the t value of about 8.51 has a p value of $2.79 \times 10^{-5}$, which is very small. If the null hypothesis that annual family income has no relationship to math S.A.T. score were true, our chances of obtaining a t value of as much as 8.51 or greater would be about 3 in 100,000! The intercept value of 4.8877 is also statistically significant because the p value is about $1.25 \times 10^{-9}$.

5.2 COMPARING LINEAR AND LOG-LINEAR REGRESSION MODELS

We take this opportunity to consider an important practical question. We have fitted a linear (in variables) S.A.T. score function, Eq. (3.46), as well as a log-linear function, Eq. (5.8), for our S.A.T. score example. Which model should we choose? Although it may seem logical that students with higher family income would tend to have higher S.A.T. scores, indicating a positive relationship, we don’t really know which particular functional form defines the relationship between them.⁹ That is, we may not know if we should fit the linear, log-linear, or some other model. The functional form of the regression model then becomes essentially an empirical question. Are there any guidelines or rules of thumb that we can follow in choosing among competing models?

One guiding principle is to plot the data. If the scattergram shows that the relationship between two variables looks reasonably linear (i.e., a straight line), the linear specification might be appropriate. But if the scattergram shows a nonlinear relationship, plot the log of Y against the log of X. If this plot shows an approximately linear relationship, a log-linear model may be appropriate. Unfortunately, this guiding principle works only in the simple case of two-variable regression models and is not very helpful once we consider multiple regressions; it is not easy to draw scattergrams in multiple dimensions. We need other guidelines.

Why not choose the model on the basis of r²; that is, choose the model that gives the highest r²? Although intuitively appealing, this criterion has its own problems. First, as noted in Chapter 4, to compare the r² values of two models, the dependent variable must be in the same form.¹⁰ For model (3.46), the dependent variable is Y, whereas for the model (5.8), it is ln Y, and these two dependent variables are obviously not the same. Therefore, r² = 0.7869 of the linear model (3.46) and r² = 0.9005 of the log-linear model are not directly comparable, even though they are approximately the same in the present case.

⁹ A cautionary note here: Remember that regression models do not imply causation, so we are not implying that having a higher annual family income causes higher math S.A.T. scores, only that we would tend to see the two together. There may be several other reasons explaining this result. Perhaps students with higher family incomes are able to afford S.A.T. preparation classes or attend schools that focus more on material typically covered in the exam.

¹⁰ It does not matter what form the independent or explanatory variables take; they may or may not be linear.

The reason that we cannot compare these two r² values is not difficult to grasp. By definition, r² measures the proportion of the variation in the dependent variable explained by the explanatory variable(s). In the linear model (3.46) r² thus measures the proportion of the variation in Y explained by X, whereas in the log-linear model (5.8) it measures the proportion of the variation in the log of Y explained by the log of X. Now the variation in Y and the variation in the log of Y are conceptually different. The variation in the log of a number measures the relative or proportional change (or percentage change if multiplied by 100), and the variation in a number measures the absolute change.¹¹ Thus, for the linear model (3.46), about 79 percent of the variation in Y is explained by X, whereas for the log-linear model, about 90 percent of the variation in the log of Y is explained by the log of X. If we want to compare the two r²s, we can use the method discussed in Problem 5.16.

Even if the dependent variable in the two models is the same so that two r² values can be directly compared, you are well-advised against choosing a model on the basis of a high r² value criterion. This is because, as pointed out in Chapter 4, an r² (= R²) can always be increased by adding more explanatory variables to the model. Rather than emphasizing the r² value of a model, you should consider factors such as the relevance of the explanatory variables included in the model (i.e., the underlying theory), the expected signs of the coefficients of the explanatory variables, their statistical significance, and certain derived measures like the elasticity coefficient. These should be the guiding principles in choosing between two competing models. If based on these criteria one model is preferable to the other, and if the chosen model also happens to have a higher r² value, then well and good. But avoid the temptation of choosing a model solely on the basis of the r² value.

Comparing the results of the log-linear score function (5.8) versus the linear function (3.46), we observe that in both models the slope coefficient is positive, as per prior expectations. Also, both slope coefficients are statistically significant. However, we cannot compare the two slope coefficients directly, for in the LIV model it measures the absolute rate of change in the dependent variable, whereas in the log-linear model it measures elasticity of Y with respect to X.
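The figures quoted in this section for the linear model (3.46), namely an intercept of about 432.41, a slope of about 0.0013, and r² = 0.7869, can be recomputed from Table 5-1. The sketch below fits Y on X in levels with the usual two-variable OLS formulas; variable names are illustrative. Its r² describes variation in Y, whereas the 0.9005 of Eq. (5.8) describes variation in ln Y, which is exactly why the two numbers should not be ranked against each other.

```python
# Table 5-1: math S.A.T. score (Y) and annual family income (X, dollars).
Y = [410, 420, 440, 490, 530, 530, 550, 540, 570, 590]
X = [5000, 15000, 25000, 35000, 45000, 55000, 65000, 75000, 90000, 150000]
n = len(Y)

x_bar = sum(X) / n
y_bar = sum(Y) / n
sxx = sum((x - x_bar) ** 2 for x in X)
syy = sum((y - y_bar) ** 2 for y in Y)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))

slope = sxy / sxx                    # absolute change in Y per dollar of X
intercept = y_bar - slope * x_bar
r2_linear = sxy ** 2 / (sxx * syy)   # share of variation in Y (not ln Y) explained

print(f"intercept = {intercept:.2f}, slope = {slope:.6f}, r2 = {r2_linear:.4f}")
```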

If for the LIV model we can measure score elasticity, then it is possible to compare the two slope coefficients. To do this, we can use Equation (5.7), which shows that elasticity is equal to the slope times the ratio of X to Y. Although for the linear model the slope coefficient remains the same (Why?), which is about 0.0013 in our S.A.T. score example, the elasticity changes from point to point on the linear curve because the ratio X/Y changes from point to point. From Table 5-1 we see that there are 10 different math S.A.T. score and annual family income figures. Therefore, in principle we can compute 10 different elasticity coefficients. In practice, however, the elasticity coefficient for the linear model is often computed at the sample mean values of X and Y to obtain a measure of average elasticity. That is,

$$\text{Average elasticity} = \frac{\Delta Y}{\Delta X} \cdot \frac{\bar{X}}{\bar{Y}} \tag{5.9}$$

where $\bar{X}$ and $\bar{Y}$ are sample mean values. For the data given in Table 5-1, $\bar{X} = 56{,}000$ and $\bar{Y} = 507$. Thus, the average elasticity for our sample is

$$\text{Average score elasticity} = (0.0013)\,\frac{56{,}000}{507} = 0.1436$$

It is interesting to note that for the log-linear function the score elasticity coefficient was 0.1258, which remains the same no matter at what income the elasticity is measured (see Figure 5-1[b]). This is why such a model is called a constant elasticity model. For the LIV, on the other hand, the elasticity coefficient changes from point to point on the score-family income curve.¹²

The fact that for the linear model the elasticity coefficient changes from point to point and that for the log-linear model it remains the same at all points on the curve means that we have to exercise some judgment in choosing between the two specifications, for, in practice, both these assumptions may be extreme. It is possible that over a small segment of the curve the elasticity remains constant but that over some other segment(s) it is variable.

¹¹ If a number goes from 45 to 50, the absolute change is 5, but the relative change is (50 − 45)/45 ≈ 0.1111, or about 11.11 percent.

¹² Notice this interesting fact: For the LIV model, the slope coefficient is constant but the elasticity coefficient is variable. However, for the log-linear model, the elasticity coefficient is constant but the slope coefficient is variable, which can be seen at once from the formula given in footnote 2.

5.3 MULTIPLE LOG-LINEAR REGRESSION MODELS

The two-variable log-linear model can be generalized easily to models containing more than one explanatory variable. For example, a three-variable log-linear model can be expressed as

$$\ln Y_i = B_1 + B_2 \ln X_{2i} + B_3 \ln X_{3i} + u_i \tag{5.10}$$

In this model the partial slope coefficients B2 and B3 are also called the partial elasticity coefficients.¹³ Thus, B2 measures the elasticity of Y with respect to X2, holding the influence of X3 constant; that is, it measures the percentage change in Y for a percentage change in X2, holding the influence of X3 constant. Since the influence of X3 is held constant, it is called a partial elasticity. Similarly, B3 measures the (partial) elasticity of Y with respect to X3, holding the influence of X2 constant. In short, in a multiple log-linear model, each partial slope coefficient measures the partial elasticity of the dependent variable with respect to the explanatory variable in question, holding all other variables constant.

¹³ The calculus-minded reader will recognize that the partial derivative of ln Y with respect to ln X2 is

$$B_2 = \frac{\partial \ln Y}{\partial \ln X_2} = \frac{\partial Y/Y}{\partial X_2/X_2} = \frac{\partial Y}{\partial X_2} \cdot \frac{X_2}{Y}$$

which by definition is elasticity of Y with respect to X2. Likewise, B3 is the elasticity of Y with respect to X3.
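Before turning to the multiple-regression examples, the contrast drawn in Section 5.2 between a variable and a constant elasticity can be made concrete. The sketch below applies Eq. (5.7) at each observation of Table 5-1, using the rounded linear-model slope of 0.0013 quoted in the text, and then evaluates the average elasticity of Eq. (5.9) at the sample means. The variable names are illustrative, and using the observed (rather than fitted) Y values at each point is an assumption made for simplicity.

```python
# Table 5-1: math S.A.T. score (Y) and annual family income (X, dollars).
Y = [410, 420, 440, 490, 530, 530, 550, 540, 570, 590]
X = [5000, 15000, 25000, 35000, 45000, 55000, 65000, 75000, 90000, 150000]

slope_linear = 0.0013        # slope of the linear model, as quoted in the text
elasticity_loglinear = 0.1258  # slope of the log-linear model, Eq. (5.8)

# Elasticity = slope * (X / Y): it differs at every point of the linear model.
point_elasticities = [slope_linear * x / y for x, y in zip(X, Y)]

# Average elasticity, Eq. (5.9): evaluate at the sample means.
x_bar = sum(X) / len(X)
y_bar = sum(Y) / len(Y)
avg_elasticity = slope_linear * x_bar / y_bar

for x, e in zip(X, point_elasticities):
    print(f"income {x:>7}: linear-model elasticity {e:.4f}, "
          f"log-linear elasticity {elasticity_loglinear:.4f}")
print(f"average elasticity at the means = {avg_elasticity:.4f}")
```

In the linear model the elasticity rises with income because X/Y rises, while the log-linear figure is the same in every row.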

Example 5.2. The Cobb-Douglas Production Function

As an example of model (5.10), let Y = output, X2 = labor input, and X3 = capital input. In that case model (5.10) becomes a production function—a function that relates output to labor and capital inputs. As a matter of fact, regression (5.10) in this case represents the celebrated Cobb-Douglas (C-D) production function. As an illustration, consider the data given in Table 5-2, which relates to Mexico for the years 1955 to 1974. Y, the output, is measured by gross domestic product (GDP) (millions of 1960 pesos), X2, the labor input, is measured by total employment (thousands of people), and X3, the capital input, is measured by stock of fixed capital (millions of 1960 pesos).

TABLE 5-2 REAL GDP, EMPLOYMENT, AND REAL FIXED CAPITAL, MEXICO, 1955–1974

Year    GDP(a)    Employment(b)    Fixed capital(c)
1955    114043    8310             182113
1956    120410    8529             193745
1957    129187    8738             205192
1958    134705    8952             215130
1959    139960    9171             225021
1960    150511    9569             237026
1961    157897    9527             248897
1962    165286    9662             260661
1963    178491    10334            275466
1964    199457    10981            295378
1965    212323    11746            315715
1966    226977    11521            337642
1967    241194    11540            363599
1968    260881    12066            391847
1969    277498    12297            422382
1970    296530    12955            455049
1971    306712    13338            484677
1972    329030    13738            520553
1973    354057    15924            561531
1974    374977    14154            609825

Notes: (a) Millions of 1960 pesos. (b) Thousands of people. (c) Millions of 1960 pesos.

Source: Victor J. Elias, Sources of Growth: A Study of Seven Latin American Economies, International Center for Economic Growth, ICS Press, San Francisco, 1992. Data from Tables E5, E12, and E14.

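Based on the data given in Table 5-2, the following results were obtained using the MINITAB statistical package:

$$
\begin{aligned}
\widehat{\ln Y_t} &= -1.6524 + 0.3397 \ln X_{2t} + 0.8460 \ln X_{3t} \\
se &= (0.6062)\quad (0.1857)\quad (0.09343) \\
t &= (-2.73)\quad (1.83)\quad (9.06) \\
p\ \text{value} &= (0.014)\quad (0.085)\quad (0.000)^{*} \\
R^2 &= 0.995 \\
F &= 1719.23\ (0.000)^{**}
\end{aligned}
\tag{5.11}
$$

*Denotes extremely small value. ** p value of F, also extremely small.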

The interpretation of regression (5.11) is as follows. The partial slope coefficient of 0.3397 measures the elasticity of output with respect to the labor input. Specifically, this number states that, holding the capital input constant, if the labor input increases by 1 percent, on the average, output goes up by about 0.34 percent. Similarly, holding the labor input constant, if the capital input increases by 1 percent, on the average, output goes up by about 0.85 percent. If we add the elasticity coefficients, we obtain an economically important parameter, called the returns to scale parameter, which gives the response of output to a proportional change in inputs. If the sum of the two elasticity coefficients is 1, we have constant returns to scale (i.e., doubling the inputs simultaneously doubles the output); if it is greater than 1, we have increasing returns to scale (i.e., doubling the inputs simultaneously more than doubles the output); if it is less than 1, we have decreasing returns to scale (i.e., doubling the inputs less than doubles the output).

For Mexico, for the study period, the sum of the two elasticity coefficients is 1.1857, suggesting that perhaps the Mexican economy was characterized by increasing returns to scale.

Returning to the estimated coefficients, we see that both labor and capital are individually statistically significant on the basis of the one-tail test although the impact of capital seems to be more important than that of labor. (Note: We use a one-tail test because both labor and capital are expected to have a positive effect on output.)

The estimated F value is so highly significant (because the p value is almost zero) that we can strongly reject the null hypothesis that labor and capital together do not have any impact on output.

The R² value of 0.995 means that about 99.5 percent of the variation in the (log) of output is explained by the (logs) of labor and capital, a very high degree of explanation, suggesting that the model (5.11) fits the data very well.
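The returns to scale interpretation can be illustrated with the point estimates of Eq. (5.11). In the Cobb-Douglas form implied by model (5.10), multiplying both inputs by a common factor multiplies output by that factor raised to the sum of the two elasticities. The sketch below applies this to a doubling of inputs; it takes the estimated coefficients at face value and ignores their sampling uncertainty, and the variable names are illustrative.

```python
# Point estimates from Eq. (5.11).
b_labor = 0.3397      # elasticity of output with respect to labor
b_capital = 0.8460    # elasticity of output with respect to capital

returns_to_scale = b_labor + b_capital

# Scaling both inputs by `scale` scales output by scale ** (B2 + B3).
scale = 2.0
output_factor = scale ** returns_to_scale

print(f"returns to scale parameter = {returns_to_scale:.4f}")
print(f"doubling both inputs multiplies output by about {output_factor:.4f}")
```

A factor above 2 corresponds to the increasing returns to scale suggested in the text; whether the sum differs significantly from 1 would require a formal test, which this sketch does not perform.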

*Denotes extremely small value. ** p value of F, also extremely small.

Example 5.3. The Demand for Energy

Table 5-3 gives data on the indexes of aggregate final energy demand (Y), real GDP (X2), and real energy price (X3) for seven OECD countries (the United States, Canada, Germany, France, the United Kingdom, Italy, and Japan) for the period 1960 to 1982. All indexes are with base 1973 = 100.

TABLE 5-3 ENERGY DEMAND IN OECD COUNTRIES, 1960–1982

Year    Final demand    Real GDP    Real energy price
1960        54.1           54.1          111.9
1961        55.4           56.4          112.4
1962        58.5           59.4          111.1
1963        61.7           62.1          110.2
1964        63.6           65.9          109.0
1965        66.8           69.5          108.3
1966        70.3           73.2          105.3
1967        73.5           75.7          105.4
1968        78.3           79.9          104.3
1969        83.8           83.8          101.7
1970        88.9           86.2           97.7
1971        91.8           89.8          100.3
1972        97.2           94.3           98.6
1973       100.0          100.0          100.0
1974        97.4          101.4          120.1
1975        93.5          100.5          131.0
1976        99.1          105.3          129.6
1977       100.9          109.9          137.7
1978       103.9          114.4          133.7
1979       106.9          118.3          144.5
1980       101.2          119.6          179.0
1981        98.1          121.1          189.4
1982        95.6          120.6          190.9

Source: Richard D. Prosser, “Demand Elasticities in OECD: Dynamic Aspects,” Energy Economics, January 1985, p. 10.

Using the data given in Table 5-3 and MINITAB we obtained the following log-linear energy demand function:

$$\ln Y_t = 1.5495 + 0.9972 \ln X_{2t} - 0.3315 \ln X_{3t} \qquad (5.12)$$

se = (0.0903) (0.0191) (0.0243)
t = (17.17) (52.09) (−13.61)
p value = (0.000)* (0.000)* (0.000)*
R2 = 0.994    adjusted R2 = 0.994    F = 1688

*Denotes extremely small value.

As this regression shows, energy demand is positively related to income (as measured by real GDP) and negatively related to real price; these findings accord with economic theory. The estimated income elasticity is about 0.99, meaning that if real income goes up by 1 percent, the average amount of energy demanded goes up by about 0.99 percent, or just about 1 percent, ceteris paribus. Likewise, the estimated price elasticity is about −0.33, meaning that, holding other factors constant, if energy price goes up by 1 percent, the average amount of energy demanded goes down by about 0.33 percent. Since this coefficient is less than 1 in absolute value, we can say that the demand for energy is price inelastic, which is not very surprising because energy is a very essential item for consumption.

The R2 values, both adjusted and unadjusted, are very high. The F value of about 1688 is also very high; the probability of obtaining such an F value, if in fact the null hypothesis B2 = B3 = 0 is true, is almost zero. Therefore, we can say that income and energy price together strongly affect energy demand.
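For readers who want to reproduce the elasticities of Example 5.3 without a statistical package, the following is a minimal sketch that regresses the log of energy demand on the logs of real GDP and real energy price, using the data of Table 5-3. The helper names `solve` and `ols` are our own, not part of any library, and solving the normal equations directly is chosen here only to keep the example self-contained; the estimates should come out close to those reported in Eq. (5.12).

```python
import math

# Table 5-3 rows: (final energy demand Y, real GDP X2, real energy price X3),
# indexes with 1973 = 100, for the years 1960 through 1982.
TABLE_5_3 = [
    (54.1, 54.1, 111.9), (55.4, 56.4, 112.4), (58.5, 59.4, 111.1),
    (61.7, 62.1, 110.2), (63.6, 65.9, 109.0), (66.8, 69.5, 108.3),
    (70.3, 73.2, 105.3), (73.5, 75.7, 105.4), (78.3, 79.9, 104.3),
    (83.8, 83.8, 101.7), (88.9, 86.2, 97.7), (91.8, 89.8, 100.3),
    (97.2, 94.3, 98.6), (100.0, 100.0, 100.0), (97.4, 101.4, 120.1),
    (93.5, 100.5, 131.0), (99.1, 105.3, 129.6), (100.9, 109.9, 137.7),
    (103.9, 114.4, 133.7), (106.9, 118.3, 144.5), (101.2, 119.6, 179.0),
    (98.1, 121.1, 189.4), (95.6, 120.6, 190.9),
]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(design, y):
    """Return OLS coefficients by solving the normal equations (X'X) b = X'y."""
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Regressors: a constant, ln(real GDP), ln(real energy price).
design = [(1.0, math.log(x2), math.log(x3)) for _, x2, x3 in TABLE_5_3]
ln_y = [math.log(y) for y, _, _ in TABLE_5_3]

b1, b2, b3 = ols(design, ln_y)
print(f"ln Y = {b1:.4f} {b2:+.4f} ln X2 {b3:+.4f} ln X3")
print(f"income elasticity = {b2:.4f}, price elasticity = {b3:.4f}")
```

Because both the dependent variable and the regressors are in logs, the two slope estimates can be read directly as the income and price elasticities.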

5.4 HOW TO MEASURE THE GROWTH RATE: THE SEMILOG MODEL

As noted in the introduction to this chapter, economists, businesspeople, and the government are often interested in finding out the rate of growth of certain economic variables. For example, the projection of the government budget deficit (surplus) is based on the projected rate of growth of the GDP, the single most important indicator of economic activity. Likewise, the Fed keeps a close eye on the rate of growth of consumer credit outstanding (auto loans, installment loans, etc.) to monitor its monetary policy.

In this section we will show how regression analysis can be used to measure such growth rates.

Example 5.4. The Growth of the U.S. Population, 1975–2007

Table 5-4 gives data on the U.S. population (in millions) for the period 1975 to 2007.

We want to measure the rate of growth of the U.S. population (Y) over this period. Now consider the following well-known compound interest formula from your introductory courses in money, banking, and finance:

$$Y_t = Y_0(1 + r)^t \qquad (5.13)^{14}$$

where

Y0 = the beginning, or initial, value of Y
Yt = Y’s value at time t
r = the compound (i.e., over time) rate of growth of Y


14Suppose you deposit Y0 = $100 in a passbook account in a bank paying, say, 6 percent interest per year. Here r = 0.06, or 6 percent. At the end of the first year this amount will grow to Y1 = 100(1 + 0.06) = 106; at the end of the second year it will be Y2 = 106(1 + 0.06) = 100(1 + 0.06)² = 112.36, because in the second year you get interest not only on the initial $100 but also on the interest earned in the first year. In the third year this amount grows to 100(1 + 0.06)³ = 119.1016, etc.

Let us manipulate Equation (5.13) as follows. Take the (natural) log of Eq. (5.13) on both sides to obtain

$$\ln Y_t = \ln Y_0 + t \ln(1 + r) \qquad (5.14)$$

Now let

$$B_1 = \ln Y_0 \qquad (5.15)$$

$$B_2 = \ln(1 + r) \qquad (5.16)$$

Therefore, we can express model (5.14) as

$$\ln Y_t = B_1 + B_2 t \qquad (5.17)$$

Now if we add the error term ut to model (5.17), we will obtain15

$$\ln Y_t = B_1 + B_2 t + u_t \qquad (5.18)$$

This model is like any other linear regression model in that it is linear in the parameters B1 and B2. The only difference is that the dependent variable is the logarithm of Y and the independent, or explanatory, variable is “time,” which will take values of 1, 2, 3, etc.

TABLE 5-4 POPULATION OF UNITED STATES (MILLIONS OF PEOPLE), 1975–2007

U.S. population   Time      U.S. population   Time
    215.973         1           256.894        18
    218.035         2           260.255        19
    220.239         3           263.436        20
    222.585         4           266.557        21
    225.055         5           269.667        22
    227.726         6           272.912        23
    229.966         7           276.115        24
    232.188         8           279.295        25
    234.307         9           282.430        26
    236.348        10           285.454        27
    238.466        11           288.427        28
    240.651        12           291.289        29
    242.804        13           294.056        30
    245.021        14           296.940        31
    247.342        15           299.801        32
    250.132        16           302.045        33
    253.493        17

Note: 1975 = 1; 2007 = 33.
Source: Economic Report of the President, 2008, Table B34.

15The reason we add the error term is that the compound interest formula will not exactly fit the data of Table 5-4.


Models like regression (5.18) are called semilog models because only one variable (in this case the dependent variable) appears in the logarithmic form. How do we interpret semilog models like regression (5.18)? Before we discuss this, note that model (5.18) can be estimated by the usual OLS method, assuming of course that the usual assumptions of OLS are satisfied. For the data of Table 5-4, we obtain the following regression results:

$$\ln(\text{USpop}_t) = 5.3593 + 0.0107t \qquad (5.19)$$

t = (3321.13) (129.779)    r2 = 0.9982

Note that in Eq. (5.19) we have only reported the t values. The estimated regression line is sketched in Figure 5-3.

FIGURE 5-3 Semilog model: scatterplot of ln(Pop) vs. Time.

The interpretation of regression (5.19) is as follows. The slope coefficient of 0.0107 means on the average the log of Y (U.S. population) has been increasing at the rate of 0.0107 per year. In plain English, Y has been increasing at the rate of 1.07 percent per year, for in a semilog model like regression (5.19) the slope coefficient measures the proportional or relative change in Y for a given absolute change in the explanatory variable, time in the present case.16 If this relative change is multiplied by 100, we obtain the percentage change or the growth rate (see footnote 1). In our example the relative change is 0.0107, and hence the growth rate is 1.07 percent.

16Using calculus it can be shown that

$$B_2 = \frac{d \ln Y}{dt} = \left(\frac{1}{Y}\right)\left(\frac{dY}{dt}\right) = \frac{dY/Y}{dt} = \frac{\text{relative change in } Y}{\text{absolute change in } t}$$

Because of this, semilog models like Eq. (5.19) are known as growth models and such models are routinely used to measure the growth rate of many variables, whether economic or not.
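The growth regression can be reproduced with a few lines of code. The sketch below fits ln Y on time by ordinary least squares using the population figures of Table 5-4; the function name `simple_ols` is our own, and the closed-form two-variable formulas are used only to keep the example free of external libraries.

```python
import math

# U.S. population in millions, 1975-2007 (Table 5-4); t = 1 for 1975, ..., 33 for 2007.
POP = [
    215.973, 218.035, 220.239, 222.585, 225.055, 227.726, 229.966, 232.188,
    234.307, 236.348, 238.466, 240.651, 242.804, 245.021, 247.342, 250.132,
    253.493, 256.894, 260.255, 263.436, 266.557, 269.667, 272.912, 276.115,
    279.295, 282.430, 285.454, 288.427, 291.289, 294.056, 296.940, 299.801,
    302.045,
]


def simple_ols(x, y):
    """Two-variable OLS: return (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


t = list(range(1, len(POP) + 1))
ln_pop = [math.log(p) for p in POP]

b1, b2 = simple_ols(t, ln_pop)
print(f"ln(USpop) = {b1:.4f} + {b2:.4f} t")
print(f"growth rate: {100 * b2:.2f} percent per year")
```

Multiplying the estimated slope by 100 gives the growth rate in percent, as described in the text.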

The interpretation of the intercept term 5.3593 is as follows. From Eq. (5.15) it is evident that

b1 = the estimate of ln Y0 = 5.3593

Therefore, if we take the antilog of 5.3593 we obtain

antilog (5.3593) ≈ 212.5761

which is the value of Y when t = 0, that is, at the beginning of the period. Since our sample begins in 1975, we can interpret the value of about 213 (millions) as the population figure at the end of 1974. But remember the warning given previously that often the intercept term has no particular physical meaning.

Instantaneous versus Compound Rate of Growth

Notice from Eq. (5.16) that

b2 = the estimate of B2 = ln (1 + r)

Therefore,

antilog (b2) = (1 + r)

which means that

$$r = \operatorname{antilog}(b_2) - 1 \qquad (5.20)$$

And since r is the compound rate of growth, once we have obtained b2 we can easily estimate the compound rate of growth of Y from Equation (5.20). For Example 5.4, we obtain

$$r = \operatorname{antilog}(0.0107) - 1 = 1.010757 - 1 = 0.010757 \qquad (5.21)$$

That is, over the sample period, the compound rate of growth of the U.S. population had been at the rate of 1.0757 percent per year.

Earlier we said that the growth rate in Y was 1.07 percent but now we say it is 1.0757 percent. What is the difference? The growth rate of 1.07 percent (or, more generally, the slope coefficient in regressions like Eq. [5.19], multiplied by 100) gives the instantaneous (at a point in time) growth rate, whereas the growth rate of 1.0757 percent (or, more generally, that obtained from Equation [5.20]) is the compound (over a period of time) growth rate. In the present example the difference between the two growth rates may not sound important, but do not forget the power of compounding.

In practice, one generally quotes the instantaneous growth rate, although the compound growth rate can be easily computed, as just shown.
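The conversion from the instantaneous to the compound rate is a one-line calculation. The following sketch applies Eq. (5.20) to the slope of Eq. (5.19); the function name `compound_rate` and the larger slope values in the loop are hypothetical, included only to show how the gap between the two rates widens as the slope grows.

```python
import math


def compound_rate(b2):
    """Compound growth rate implied by a log-lin slope b2: r = antilog(b2) - 1."""
    return math.expm1(b2)  # exp(b2) - 1, computed accurately for small b2


b2_us = 0.0107  # slope reported in Eq. (5.19)
r_us = compound_rate(b2_us)
print(f"instantaneous: {100 * b2_us:.4f}%   compound: {100 * r_us:.4f}%")

# Hypothetical slopes, for comparison only.
for b2 in (0.05, 0.10, 0.25):
    print(f"b2 = {b2:.2f}: instantaneous {100 * b2:.2f}%, "
          f"compound {100 * compound_rate(b2):.2f}%")
```

For small slopes the two rates nearly coincide, which is why the instantaneous rate is the one usually quoted.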

The Linear Trend Model

Sometimes, as a quick and ready method of computation, researchers estimate the following model:

$$Y_t = B_1 + B_2 t + u_t \qquad (5.22)$$

That is, regress Y on time itself, where time is measured chronologically. Such a model is called, appropriately, the linear trend model, and the time variable t is known as the trend variable.17 If the slope coefficient in the preceding model is positive, there is an upward trend in Y, whereas if it is negative, there is a downward trend in Y.

17By trend we mean a sustained upward or downward movement in the behavior of a variable.

For the data in Table 5-4, the results of fitting Equation (5.22) are as follows:

$$\text{USpop}_t = 209.6731 + 2.7570t \qquad (5.23)$$

t = (287.4376) (73.6450)    r2 = 0.9943

As these results show, over the sample period the U.S. population had been increasing at the absolute (note, not the relative) rate of 2.757 million per year. Thus, over that period there was an upward trend in the U.S. population. The intercept value here probably represents the base population in the year 1974, which according to this model is about 210 million.

In practice, both the linear trend and growth models have been used extensively. For comparative purposes, however, the growth model is more useful. People are often interested in finding out the relative performance and not the absolute performance of economic measures, such as GDP, money supply, etc.

Incidentally, note that we cannot compare r2 values of the two models because the dependent variables in the two models are not the same (but see Problem 5.16). Statistically speaking, both models give fairly good results, judged by the usual t test of significance.

Recall that for the log-linear, or double-log, model the slope coefficient gives the elasticity of Y with respect to the relevant explanatory variable. For the growth model and the linear trend models, we can also measure such elasticities. As a matter of fact, once the functional form of the regression model is known, we can compute elasticities from the basic definition of elasticity given in Eq. (5.7). Table 5-11 at the end of this chapter summarizes the elasticity coefficients for the various models we have considered in the chapter.

A cautionary note: The traditional practice of introducing the trend variable t in models such as (5.18) and (5.22) has recently been questioned by the new generation of time series econometricians. They argue that such a practice may be justifiable only if the error term ut in the preceding models is stationary.

Although the precise meaning of stationarity will be explained in Chapter 12, for now we state that ut is stationary if its mean value and its variance do not vary systematically over time. In our classical linear regression model we have assumed that ut has zero mean and constant variance σ². Of course, in an application we will have to check to see if these assumptions are valid. We will discuss this topic later.
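Returning to the linear trend regression (5.23), the sketch below refits it from the Table 5-4 data and then expresses the constant absolute increase as a share of the population at the start and at the end of the sample. The helper `simple_ols` is our own name; the last two lines are an illustration of why a constant absolute change implies a falling relative change, not a result reported in the text.

```python
# U.S. population in millions, 1975-2007 (Table 5-4); t = 1 for 1975, ..., 33 for 2007.
POP = [
    215.973, 218.035, 220.239, 222.585, 225.055, 227.726, 229.966, 232.188,
    234.307, 236.348, 238.466, 240.651, 242.804, 245.021, 247.342, 250.132,
    253.493, 256.894, 260.255, 263.436, 266.557, 269.667, 272.912, 276.115,
    279.295, 282.430, 285.454, 288.427, 291.289, 294.056, 296.940, 299.801,
    302.045,
]


def simple_ols(x, y):
    """Two-variable OLS: return (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


t = list(range(1, len(POP) + 1))
b1, b2 = simple_ols(t, POP)
print(f"USpop = {b1:.4f} + {b2:.4f} t")

# The same absolute increase is a smaller share of a larger population.
share_start = 100 * b2 / POP[0]
share_end = 100 * b2 / POP[-1]
print(f"implied relative change: {share_start:.2f}% in 1975, {share_end:.2f}% in 2007")
```

This is one way to see why the growth model is usually preferred when relative performance is of interest.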

5.5 THE LIN-LOG MODEL: WHEN THE EXPLANATORY VARIABLE IS LOGARITHMIC

In the previous section we considered the growth model in which the dependent variable was in the log form but the explanatory variable was in the linear form. For descriptive purposes, we can call such a model a log-lin, or growth, model. In this section we consider a model where the dependent variable is in the linear form but the explanatory variable is in the log form. Appropriately, we call this model the lin-log model.

We introduce this model with a concrete example.

Example 5.5. The Relationship between Expenditure on Services and Total Personal Consumption Expenditure in 1992 Billions of Dollars, 1975–2006

Consider the annual data given in Table 5-5 (found on the textbook’s Web site) on consumer expenditure on various categories in relation to total personal consumption expenditure.

Suppose we want to find out how expenditure on services (Y) behaves if total personal consumption expenditure (X) increases by a certain percentage. Toward that end, suppose we consider the following model:

$$Y_t = B_1 + B_2 \ln X_t + u_t \qquad (5.24)$$

In contrast to the log-lin model in Eq. (5.18) where the dependent variable is in log form, the independent variable here is in log form. Before interpreting this model, we present the estimates, which were obtained with MINITAB.

$$\hat{Y}_t = -12564.8 + 1844.22 \ln X_t \qquad (5.25)$$

se = (916.351) (114.32)
t = (−13.71) (16.13)
p = (0.00) (0.00)    r2 = 0.881

Interpreted in the usual fashion, the slope coefficient of ≈ 1844 means that if the log of total personal consumption increases by a unit, the absolute change in the expenditure on personal services is ≈ $1844 billion. What does it mean in everyday language? Recall that a change in the log of a number is a relative change. Therefore, the slope coefficient in model (5.25) measures18

$$B_2 = \frac{\text{absolute change in } Y}{\text{relative change in } X} = \frac{\Delta Y}{\Delta X / X} \qquad (5.26)$$

where, as before, ΔY and ΔX represent (small) changes in Y and X. Equation (5.26) can be written, equivalently, as

$$\Delta Y = B_2 \left(\frac{\Delta X}{X}\right) \qquad (5.27)$$

This equation states that the absolute change in Y (= ΔY) is equal to B2 times the relative change in X. If the latter is multiplied by 100, then Equation (5.27) gives the absolute change in Y for a percentage change in X. Thus, if ΔX/X changes by 0.01 unit (or 1 percent), the absolute change in Y is 0.01(B2). Thus, if in an application we find that B2 = 674, the absolute change in Y is (0.01)(674), or 6.74. Therefore, when regressions like Eq. (5.24) are estimated by OLS, multiply the value of the estimated slope coefficient B2 by 0.01, or what amounts to the same thing, divide it by 100.

Returning to our illustrative regression given in Equation (5.25), we then see that if aggregate personal expenditure increases by 1 percent, on the average, expenditure on services increases by ≈ $18.44 billion. (Note: Divide the estimated slope coefficient by 100.)

Lin-log models like Eq. (5.24) are thus used in situations that study the absolute change in the dependent variable for a percentage change in the independent variable. Needless to say, models like regression (5.24) can have more than one X variable in the log form. Each partial slope coefficient will then measure the absolute change in the dependent variable for a percentage change in the given X variable, holding all other X variables constant.
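The divide-by-100 rule for the lin-log slope can be checked numerically. In the sketch below, `approx_change` applies the rule to the slope estimate of 1844.22 from regression (5.25), while `exact_change` computes the change in fitted Y implied by the log function itself; both function names are our own, and the 10 percent case is a hypothetical illustration of how the rule becomes less accurate for larger changes.

```python
import math

SLOPE = 1844.22  # estimated B2 in Eq. (5.25), billions of dollars


def approx_change(slope, pct):
    """Rule of thumb: slope times the relative change in X (pct given in percent)."""
    return slope * pct / 100.0


def exact_change(slope, pct):
    """Exact change in fitted Y: slope * [ln(X(1 + pct/100)) - ln X]."""
    return slope * math.log1p(pct / 100.0)


one_pct = approx_change(SLOPE, 1.0)
print(f"1% rise in X:  rule of thumb {one_pct:.2f}, exact {exact_change(SLOPE, 1.0):.2f}")
print(f"10% rise in X: rule of thumb {approx_change(SLOPE, 10.0):.2f}, "
      f"exact {exact_change(SLOPE, 10.0):.2f}")
```

The rule is exact only in the limit of small changes, which is why the text speaks of (small) changes in Y and X.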

18If Y = B1 + B2 ln X, using calculus it can be shown that dY/dX = B2(1/X). Therefore, B2 = X(dY/dX) = dY/(dX/X), which is Eq. (5.26).

5.6 RECIPROCAL MODELS

Models of the following type are known as reciprocal models:

$$Y_i = B_1 + B_2\left(\frac{1}{X_i}\right) + u_i \qquad (5.28)$$

This model is nonlinear in X because it enters the model inversely or reciprocally, but it is a linear regression model because the parameters are linear.19

19If we define X* = (1/X), then Equation (5.28) is linear in the parameters as well as the variables Y and X*.

The salient feature of this model is that as X increases indefinitely, the term (1/Xi) approaches zero (Why?) and Y approaches the limiting or asymptotic value of B1. Therefore, models like regression (5.28) have built into them an asymptote or limit value that the dependent variable will take when the value of the X variable increases indefinitely.

Some likely shapes of the curve corresponding to Eq. (5.28) are shown in Figure 5-4.

FIGURE 5-4 The reciprocal model: Yi = B1 + B2(1/Xi). Panel (a): B1 > 0, B2 > 0, with asymptote B1. Panel (b): B1 > 0, B2 < 0, with asymptote B1 and X-axis intercept −B2/B1. Panel (c): B1 < 0, B2 > 0, with asymptote B1 and X-axis intercept UN.

In Figure 5-4(a) if we let Y stand for the average fixed cost (AFC) of production, that is, the total fixed cost divided by the output, and X for the output, then as economic theory shows, AFC declines continuously as the output increases (because the fixed cost is spread over a larger number of units) and eventually becomes asymptotic at level B1.

An important application of Figure 5-4(b) is the Engel expenditure curve (named after the German statistician Ernst Engel, 1821–1896), which relates a consumer’s expenditure on a commodity to his or her total expenditure or income. If Y denotes expenditure on a commodity and X the total income, then certain commodities have these features: (1) There is some critical or threshold level of income below which the commodity is not purchased (e.g., an automobile). In Figure 5-4(b) this threshold level of income is at the level −(B2/B1). (2) There is a satiety level of consumption beyond which the consumer will not go no matter how high the income (even millionaires do not generally own more than two or three cars at a time). This level is nothing but the asymptote B1 shown in Figure 5-4(b). For such commodities, the reciprocal model of this figure is the most appropriate.

One important application of Figure 5-4(c) is the celebrated Phillips curve of macroeconomics. Based on the British data on the percent rate of change of money wages (Y) and the unemployment rate (X) in percent, Phillips obtained a curve similar to Figure 5-4(c).20 As this figure shows, there is asymmetry in the response of wage changes to the level of unemployment. Wages rise faster for a unit change in unemployment if the unemployment rate is below UN, which is called the natural rate of unemployment by economists, than they fall for an equivalent change when the unemployment rate is above the natural level; B1 indicates the asymptotic floor for wage change. (See Figure 5-5 later.) This particular feature of the Phillips curve may be due to institutional factors, such as union bargaining power, minimum wages, or unemployment insurance.

Example 5.6. The Phillips Curve for the United States, 1958 to 1969

Because of its historical importance, and to illustrate the reciprocal model, we have obtained data, shown in Table 5-6, on percent change in the index of hourly earnings (Y) and the civilian unemployment rate (X) for the United States for the years 1958 to 1969.21

TABLE 5-6 YEAR-TO-YEAR PERCENTAGE CHANGE IN THE INDEX OF HOURLY EARNINGS (Y) AND THE UNEMPLOYMENT RATE (%) (X), UNITED STATES, 1958–1969

Year     Y      X
1958    4.2    6.8
1959    3.5    5.5
1960    3.4    5.5
1961    3.0    6.7
1962    3.4    5.5
1963    2.8    5.7
1964    2.8    5.2
1965    3.6    4.5
1966    4.3    3.8
1967    5.0    3.8
1968    6.1    3.6
1969    6.7    3.5

Source: Economic Report of the President, 1989. Data on X from Table B-39, p. 352, and data on Y from Table B-44, p. 358.

Model (5.28) was fitted to the data in Table 5-6, and the results were as follows:

$$\hat{Y}_t = -0.2594 + 20.5880\left(\frac{1}{X_t}\right) \qquad (5.29)$$

t = (−0.2572) (4.3996)    r2 = 0.6594

This regression line is shown in Figure 5-5(a).

20A. W. Phillips, “The Relationship between Unemployment and the Rate of Change of Money Wages in the United Kingdom, 1861–1957,” Economica, November 1958, pp. 283–299.

21We chose this period because until 1969 the traditional Phillips curve seems to have worked. Since then it has broken down, although many attempts have been made to resuscitate it with varying degrees of success.

As Figure 5-5 shows, the wage floor is −0.26 percent, which is not statistically different from zero. (Why?) Therefore, no matter how high the unemployment rate is, the rate of growth of wages will not fall below about zero.

For comparison we present the results of the following linear regression based on the same data (see Figure 5-5[b]):

$$\hat{Y}_t = 8.0147 - 0.7883X_t \qquad (5.30)$$

t = (6.4625) (−3.2605)    r2 = 0.5153

FIGURE 5-5 The Phillips curve for the United States, 1958–1969: (a) reciprocal model, Eq. (5.29), with the asymptote at −0.26 and the natural rate UN marked on the X axis; (b) linear model, Eq. (5.30), with slope −0.7883. Both panels plot the rate of change of earnings against the unemployment rate (%).

Observe these features of the two models. In the linear model (5.30) the slope coefficient is negative, for the higher the unemployment rate is, the lower the rate of growth of earnings will be, ceteris paribus. In the reciprocal model, however, the slope coefficient is positive, which should be the case because the X variable enters inversely (two negatives make one positive). In other words, a positive slope in the reciprocal model is analogous to the negative slope in the linear model. The linear model suggests that as the unemployment rate increases by 1 percentage point, on the average, the percentage point change in the earnings is a constant amount of ≈ −0.79 no matter at what X we measure it. On the other hand, in the reciprocal model the percentage point rate of change in the earnings is not constant, but rather depends on at what level of X (i.e., the unemployment rate) the change is measured (see Table 5-11).22 The latter assumption seems economically more plausible. Since the dependent variable in the two models is the same, we can compare the two r2 values. The r2 for the reciprocal model is higher than that for the linear model, suggesting that the former model fits the data better than the latter model.
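The comparison between the two Phillips curve specifications can be reproduced from the data of Table 5-6. The sketch below fits both the reciprocal and the linear model by OLS and then evaluates the slope of the reciprocal curve, −B2(1/X²), at a few unemployment rates; the helper name `fit` is our own, and the three evaluation points are simply the lowest, a middle, and the highest unemployment rate in the sample.

```python
# Table 5-6: percent change in hourly earnings (Y) and unemployment rate (X), 1958-1969.
Y = [4.2, 3.5, 3.4, 3.0, 3.4, 2.8, 2.8, 3.6, 4.3, 5.0, 6.1, 6.7]
X = [6.8, 5.5, 5.5, 6.7, 5.5, 5.7, 5.2, 4.5, 3.8, 3.8, 3.6, 3.5]


def fit(x, y):
    """Two-variable OLS: return (intercept, slope, r squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((v - x_bar) ** 2 for v in x)
    syy = sum((v - y_bar) ** 2 for v in y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy ** 2 / (sxx * syy)


# Reciprocal model: regress Y on 1/X. Linear model: regress Y on X.
a_rec, b_rec, r2_rec = fit([1.0 / v for v in X], Y)
a_lin, b_lin, r2_lin = fit(X, Y)

print(f"reciprocal: Y = {a_rec:.4f} + {b_rec:.4f} (1/X), r2 = {r2_rec:.4f}")
print(f"linear:     Y = {a_lin:.4f} {b_lin:+.4f} X,     r2 = {r2_lin:.4f}")

# Slope dY/dX of the reciprocal curve depends on X; the linear slope does not.
for u in (3.5, 5.0, 6.8):
    print(f"X = {u}: reciprocal slope {-b_rec / u ** 2:.3f}, linear slope {b_lin:.3f}")
```

The reciprocal slope is steep at low unemployment and flattens as unemployment rises, which is the asymmetry the Phillips curve is meant to capture.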

As this example shows, once we go beyond the LIV/LIP models to those models that are still linear in the parameters but not necessarily so in the variables, we have to exercise considerable care in choosing a suitable model in a given situation. In this choice the theory underlying the phenomenon of interest is often a big help in choosing the appropriate model. There is no denying that model building involves a good dose of theory, some introspection, and considerable hands-on experience. But the latter comes with practice.

Before we leave reciprocal models, we discuss another application of such a model.

Example 5.7. Advisory Fees Charged for a Mutual Fund

The data in Table 5-7 relate to the management fees that a leading mutual fund in the United States pays its investment advisers to manage its assets. The fees depend on the net asset value of the fund. As you can see from Figure 5-6, the higher the net asset value of the fund, the lower the advisory fees are.

TABLE 5-7 MANAGEMENT FEE SCHEDULE OF A MUTUAL FUND

Fee (%)    Net asset value ($, in billions)
   Y                    X
0.5200                 0.5
0.5080                 5.0
0.4840                10.0
0.4600                15.0
0.4398                20.0
0.4238                25.0
0.4115                30.0
0.4020                35.0
0.3944                40.0
0.3880                45.0
0.3825                55.0
0.3738                60.0

22As shown in Table 5-11, for the reciprocal model the slope is −B2(1/X²).

FIGURE 5-6 Management fees and asset size: scatterplot of Fees vs. Assets.

The graph suggests that the relationship between the two variables is nonlinear. Therefore, a model of the following type might be appropriate:

$$\text{Fees} = B_1 + B_2\left(\frac{1}{\text{assets}}\right) + u_i \qquad (5.31)$$

Using the data in Table 5-7 and the EViews output in Figure 5-7, we obtained the following regression results:

Dependent Variable: Fees
Method: Least Squares
Sample: 1 12
Included observations: 12

Variable      Coefficient    Std. Error    t-Statistic    Prob.
C               0.420412      0.012858      32.69715      0.0000
1/assets        0.054930      0.022099       2.485610     0.0322

R-squared             0.381886    Mean dependent var    0.432317
Adjusted R-squared    0.320075    S.D. dependent var    0.050129
S.E. of regression    0.041335    F-statistic           6.178255
Sum squared resid     0.017086    Prob (F-statistic)    0.032232

FIGURE 5-7 EViews output of Equation (5.31)

It is left as an exercise for you to interpret these regression results (see Problem [5.20]).
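As a starting point for that exercise, the coefficient estimates themselves can be reproduced from Table 5-7 with a short script. This is only a sketch: the helper name `fit` is our own, and it computes just the intercept, the slope, and r2, not the remaining statistics shown in the EViews output.

```python
# Table 5-7: advisory fee in percent (Y) and net asset value in billions of dollars (X).
FEES = [0.5200, 0.5080, 0.4840, 0.4600, 0.4398, 0.4238,
        0.4115, 0.4020, 0.3944, 0.3880, 0.3825, 0.3738]
ASSETS = [0.5, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0, 45.0, 55.0, 60.0]


def fit(x, y):
    """Two-variable OLS: return (intercept, slope, r squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((v - x_bar) ** 2 for v in x)
    syy = sum((v - y_bar) ** 2 for v in y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy ** 2 / (sxx * syy)


# Reciprocal model (5.31): regress Fees on 1/assets.
b1, b2, r2 = fit([1.0 / a for a in ASSETS], FEES)
print(f"Fees = {b1:.6f} + {b2:.6f} (1/assets), r2 = {r2:.6f}")
```

The script leaves the interpretation of the intercept as an asymptote, and of the modest r2, to the reader.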


5.7 POLYNOMIAL REGRESSION MODELS

In this section we consider regression models that have found extensive use in applied econometrics relating to production and cost functions. In particular, consider Figure 5-8, which depicts the total cost of production (TC) as a function of output as well as the associated marginal cost (MC) and the average cost (AC) curves.

Letting Y stand for TC and X for the output, mathematically, the total cost function can be expressed as

$$Y_i = B_1 + B_2X_i + B_3X_i^2 + B_4X_i^3 \qquad (5.32)$$

which is called a cubic function, or, more generally, a third-degree polynomial in the variable X; the highest power of X represents the degree of the polynomial (three in the present instance).

Notice that in these types of polynomial functions there is only one explanatory variable on the right-hand side, but it appears with various powers, thus making them multiple regression models.23 (Note: We add the error term ui to make model (5.32) a regression model.)

23Of course, one can introduce other X variables and their powers, if needed.

Although model (5.32) is nonlinear in the variable X, it is linear in the parameters, the B’s, and is therefore a linear regression model. Thus, models like regression (5.32) can be estimated by the usual OLS routine. The only “worry” about the model is the likely presence of the problem of collinearity because the various powered terms of X are functionally related. But this concern is more apparent than real, for the terms X² and X³ are nonlinear functions of X and do not violate the assumption of no perfect collinearity, that is, no perfect linear relationship between variables. In short, polynomial regression models can be estimated in the usual manner and do not present any special estimation problems.

Example 5.8. Hypothetical Total Cost Function

To illustrate the polynomial model, consider the hypothetical cost-output data given in Table 5-8.

The OLS regression results based on these data are as follows (see Figure 5-8):

$$\hat{Y}_i = 141.7667 + 63.4776X_i - 12.9615X_i^2 + 0.9396X_i^3 \qquad (5.33)$$

se = (6.3753) (4.7786) (0.9857) (0.0591)    R2 = 0.9983
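Because X, X², and X³ are not linearly related, the normal equations of the cubic model have a unique solution, and the fit can be reproduced from the hypothetical cost-output data of Table 5-8. The sketch below does this with exact rational arithmetic from Python's standard `fractions` module, a choice made here only to avoid rounding problems when powers of X are involved; the helper name `solve_exact` is our own.

```python
from fractions import Fraction

# Table 5-8: output X = 1, ..., 10 and total cost Y in dollars (hypothetical data).
X = list(range(1, 11))
Y = [193, 226, 240, 244, 257, 260, 274, 297, 350, 420]


def solve_exact(a, b):
    """Solve a x = b exactly by Gauss-Jordan elimination on Fractions."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # A nonzero pivot always exists when the regressors are not perfectly collinear.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Design matrix columns: 1, X, X^2, X^3.
design = [[Fraction(x) ** p for p in range(4)] for x in X]
xtx = [[sum(r[i] * r[j] for r in design) for j in range(4)] for i in range(4)]
xty = [sum(r[i] * y for r, y in zip(design, Y)) for i in range(4)]

b = [float(v) for v in solve_exact(xtx, xty)]
print("Y = {:.4f} + {:.4f} X + {:.4f} X^2 + {:.4f} X^3".format(*b))
```

The four coefficients are obtained by the same OLS routine used for any multiple regression, with the powers of X treated as separate regressors.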

TABLE 5-8 HYPOTHETICAL COST-OUTPUT DATA

Output X    Total cost Y ($)
    1            193
    2            226
    3            240
    4            244
    5            257
    6            260
    7            274
    8            297
    9            350
   10            420

FIGURE 5-8 Cost-output relationship: the total cost (TC) curve of Eq. (5.33), Yi = 141.77 + 63.48Xi − 12.96Xi² + 0.94Xi³, plotted against output, together with the associated marginal cost (MC) and average cost (AC) curves.

If cost curves are to have the U-shaped average and marginal cost curves shown in price theory texts, then the theory suggests that the coefficients in model (5.32) should have these a priori values:24

1. B1, B2, and B4 are each greater than zero.
2. B3 < 0.
3. B3² < 3B2B4.

The regression results given in regression (5.33) clearly are in conformity with these expectations.

24For the economics of this, see Alpha C. Chiang, Fundamental Methods of Mathematical Economics, 3rd ed., McGraw-Hill, New York, 1984, pp. 205–252. The rationale for these restrictions is that to make economic sense the total cost curve must be upward-sloping (the larger the output is, the higher the total cost will be) and the marginal cost of production must be positive.

As a concrete example of polynomial regression models, consider the following example.

Example 5.9. Cigarette Smoking and Lung Cancer

Table 5-9, on the textbook’s Web site, gives data on cigarette smoking and various types of cancer for 43 states and Washington, D.C., for 1960.

For now consider the relationship between lung cancer and smoking. To see if smoking has an increasing or decreasing effect on lung cancer, consider the following model:

$$Y_i = B_1 + B_2X_i + B_3X_i^2 + u_i \qquad (5.34)$$

where Y = number of deaths from lung cancer and X = the number of cigarettes smoked. The regression results using MINITAB are as shown in Figure 5-9.

Predictor        Coef       SE Coef        T        P
Constant       −6.910        6.193       −1.12    0.271
CIG             1.5765       0.4560       3.46    0.001
CIGSQ          −0.019179     0.008168    −2.35    0.024

S = 2.75720    R-Sq = 56.4%    R-Sq (adj) = 54.3%

Analysis of Variance

Source            DF       SS        MS        F        P
Regression         2     403.89    201.94    26.56    0.000
Residual Error    41     311.69      7.60
Total             43     715.58

FIGURE 5-9 MINITAB output of regression (5.34)

These results show that the slope coefficient is positive but the coefficient of the cigarette-squared variable is negative. What this suggests is that cigarette smoking has an adverse impact on lung cancer, but that the adverse impact increases at a diminishing rate.25 All the slope coefficients are statistically significant on the basis of the one-tail t test. We use the one-tail t test because medical research has shown that smoking has an adverse impact on lung and other types of cancer. The F value of 26.56 is also highly significant, for the estimated p value is practically zero. This would suggest that both variables belong in the model.

25Neglecting the error term, if you take the derivative of Y in Equation (5.34) with respect to X, you will obtain dY/dX = B2 + 2B3X, which in the present example gives 1.57 − 2(0.0192)X = 1.57 − 0.0384X, which shows that the rate of change of lung cancer with respect to cigarette smoking is declining. If the coefficient of the cigsq variable were positive, then the effect of cigarette smoking on lung cancer would be increasing at an increasing rate. Here Y = incidence of lung cancer and X is the number of cigarettes smoked.
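The declining marginal effect implied by the quadratic model can be tabulated directly from the estimated coefficients. In the sketch below the coefficients are those of the MINITAB output for regression (5.34); the function name `marginal_effect` and the X values in the loop are hypothetical choices for illustration, and whether the computed turning point lies inside the sample range must be checked against the data in Table 5-9.

```python
B2 = 1.5765      # coefficient of CIG in regression (5.34)
B3 = -0.019179   # coefficient of CIGSQ in regression (5.34)


def marginal_effect(x):
    """Derivative of fitted Y with respect to X: B2 + 2*B3*X."""
    return B2 + 2 * B3 * x


# Hypothetical X values, used only to show the slope falling as X rises.
for x in (10, 20, 30):
    print(f"X = {x}: dY/dX = {marginal_effect(x):.4f}")

# The slope reaches zero where B2 + 2*B3*X = 0.
turning_point = -B2 / (2 * B3)
print(f"slope is zero at X = {turning_point:.2f}")
```

A positive but shrinking derivative is exactly what is meant by an adverse impact that increases at a diminishing rate.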

5.8 REGRESSION THROUGH THE ORIGIN

There are occasions when the regression model assumes the following form, which we illustrate with the two-variable model, although generalization to multiple regression models is straightforward.

(5.35)Yi = B2Xi + ui

guj75845_ch05.qxd 4/16/09 11:55 AM Page 158

CHAPTER FIVE: FUNCTIONAL FORMS OF REGRESSION MODELS 159

In this model the intercept is absent or zero, hence the name regression through the origin. We have already come across an example of this in Okun’s law in Eq. (2.22). For Equation (5.35) it can be shown that26

$$b_2 = \frac{\sum X_iY_i}{\sum X_i^2} \tag{5.36}$$

$$\operatorname{var}(b_2) = \frac{\sigma^2}{\sum X_i^2} \tag{5.37}$$

$$\hat{\sigma}^2 = \frac{\sum e_i^2}{n - 1} \tag{5.38}$$

If you compare these formulas with those given for the two-variable model with intercept, given in Equations (2.17), (3.6), and (3.8), you will note several differences. First, in the model without the intercept, we use raw sums of squares and cross products, whereas in the intercept-present model, we use mean-adjusted sums of squares and cross products. Second, the d.f. in computing $\hat{\sigma}^2$ is now $(n - 1)$ rather than $(n - 2)$, since in Eq. (5.35) we have only one unknown. Third, the conventionally computed r2 formula we have used thus far explicitly assumes that the model has an intercept term. Therefore, you should not use that formula. If you use it, sometimes you will get nonsensical results because the computed r2 may turn out to be negative. Finally, for the model that includes the intercept, the sum of the estimated residuals, $\sum \hat{u}_i = \sum e_i$, is always zero, but this need not be the case for a model without the intercept term.

For all these reasons, one may use the zero-intercept model only if there is strong theoretical reason for it, as in Okun’s law or some areas of economics and finance. An example is given in Problem 5.22. For now we will illustrate the zero-intercept model using the data given in Table 2-13, which relates to U.S. real GDP and the unemployment rate for the period 1960 to 2006. Similar to Equation (2.22), we add the variable representing the year and obtain the following results:

$$\hat{Y}_t = 0.00005\,\text{Year} - 3.070X_{t-1} \tag{5.39}$$
t = (2.55) (-2.92)

where Y = change in the unemployment rate in percentage points, Year = the year, and $X_{t-1}$ = percentage growth rate in real GDP from one year prior to the data in Y and Year.

26The proofs can be found in Gujarati and Porter, Basic Econometrics, 5th ed., McGraw-Hill, New York, 2009, pp. 182–183.
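The estimators in Eqs. (5.36) to (5.38) are simple enough to compute directly. The following is a minimal Python sketch under stated assumptions, not a definitive implementation: the five data points and the function name `fit_through_origin` are hypothetical and chosen only to illustrate the points made above (raw sums of squares, n - 1 degrees of freedom, residuals that need not sum to zero, and a conventionally computed r2 that can turn negative).

```python
# Regression through the origin, following Eqs. (5.36)-(5.38).
# The data below are hypothetical and serve only as an illustration.

def fit_through_origin(x, y):
    """Return b2, the residuals, sigma-hat squared, and the estimated var(b2)."""
    n = len(x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # raw cross products
    sum_xx = sum(xi * xi for xi in x)               # raw sum of squares
    b2 = sum_xy / sum_xx                            # Eq. (5.36)
    resid = [yi - b2 * xi for xi, yi in zip(x, y)]
    rss = sum(e * e for e in resid)
    sigma2_hat = rss / (n - 1)                      # Eq. (5.38): n - 1 d.f.
    var_b2 = sigma2_hat / sum_xx                    # Eq. (5.37), sigma^2 replaced by its estimate
    return b2, resid, sigma2_hat, var_b2


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 9.0, 11.0, 10.0, 10.0]   # roughly flat, so a zero intercept fits poorly

b2, resid, sigma2_hat, var_b2 = fit_through_origin(x, y)

# The conventional r2 = 1 - RSS/TSS presumes an intercept; here it goes negative.
y_bar = sum(y) / len(y)
tss = sum((yi - y_bar) ** 2 for yi in y)
rss = sum(e * e for e in resid)
conventional_r2 = 1.0 - rss / tss

print("b2 =", round(b2, 4))
print("sum of residuals =", round(sum(resid), 4))
print("conventional r2 =", round(conventional_r2, 2))
```

With these made-up numbers the residuals do not sum to zero and the conventional r2 is far below zero, which is the kind of nonsensical result the text warns about.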


For comparison, we re-estimate Equation (5.39) with the intercept added.

$$\hat{Y}_t = 3.128 - 0.0015\,\text{Year} - 3.294X_{t-1} \tag{5.40}$$
t = (3.354) (-0.90) (-3.05)    R2 = 0.182

As you will notice, the intercept term is significant in Equation (5.40), but now the Year variable is not. Also notice that we have given the R2 value for Eq. (5.40) but not for Eq. (5.39) for reasons stated before.27

27For Eq. (5.39) we can compute the so-called “raw” R2, which is discussed in Problem 5.23.

5.9 A NOTE ON SCALING AND UNITS OF MEASUREMENT

Variables, economic or not, are expressed in various units of measurement. For example, we can express temperature in Fahrenheit or Celsius. GDP can be measured in millions or billions of dollars. Are regression results sensitive to the unit of measurement? The answer is that some results are and some are not. To show this, consider the data given in Table 5-10.

TABLE 5-10
GROSS PRIVATE DOMESTIC INVESTMENT AND GROSS DOMESTIC PRODUCT, UNITED STATES, 1997–2006

Year    GDPB      GDPM       GDIB       GDIM
1997    1389.8    1389800    8304.3     8304300
1998    1509.1    1509100    8747.0     8747000
1999    1625.7    1625700    9268.4     9268400
2000    1735.5    1735500    9817.0     9817000
2001    1614.3    1614300    10128.0    10128000
2002    1582.1    1582100    10469.6    10469600
2003    1664.1    1664100    10960.8    10960800
2004    1888.6    1888600    11685.9    11685900
2005    2077.2    2077200    12433.9    12433900
2006    2209.2    2209200    13194.7    13194700

Variables:
GDPB = Gross domestic product (billions of dollars).
GDPM = Gross domestic product (millions of dollars).
GDIB = Gross private domestic investment (billions of dollars).
GDIM = Gross private domestic investment (millions of dollars).

This table gives data on gross private domestic investment measured in billions of dollars (GDIB), the same data expressed in millions of dollars (GDIM), gross domestic product measured in billions of dollars (GDPB), and the same data expressed in millions of dollars (GDPM). Suppose we want to find out how GDI behaves in relation to GDP. Toward that end, we estimate the following regression models:

$$\text{GDIB}_t = 461.511 + 5.8046\,\text{GDPB}_t \tag{5.41}$$
se = (1331.451) (0.762)
t = (0.3466) (7.6143)    r2 = 0.8787

$$\text{GDIM}_t = 461511.076 + 5.8046\,\text{GDPM}_t \tag{5.42}$$
se = (1331451) (0.762)
t = (0.3466) (7.6143)    r2 = 0.8787

$$\text{GDIB}_t = 461.511 + 0.0058\,\text{GDPM}_t \tag{5.43}$$
se = (1331.451) (0.00076)
t = (0.3466) (7.6143)    r2 = 0.8787

$$\text{GDIM}_t = 461511.076 + 5804.626\,\text{GDPB}_t \tag{5.44}$$
se = (1331451) (762.335)
t = (0.3466) (7.6143)    r2 = 0.8787

At first glance these results may look different. But they are not if we take into account the fact that 1 billion is equal to 1,000 million. All we have done in these various regressions is to express variables in different units of measurement. But keep in mind these facts. First, the r2 value in all these regressions is the same, which should not be surprising because r2 is a pure number, devoid of units in which the dependent variable (Y) and the independent variable (X) are measured. Second, the intercept term is always in the units in which the dependent variable is measured; recall that the intercept represents the value of the dependent variable when the independent variable takes the value of zero. Third, when Y and X are measured in the same units of measurement, the slope coefficients as well as their standard errors remain the same (compare Equations [5.41] and [5.42]), although the intercept values and their standard errors are different. But the t ratios remain the same. Fourth, when the Y and X variables are measured in different units of measurement, the slope coefficients are different, but the interpretation does not change. Thus, in Equation (5.43) if GDP changes by a million, GDI changes by 0.0058 billions of dollars, which is 5.8 millions of dollars. Likewise, in Equation (5.44) if GDP increases by a billion dollars, GDI increases by 5804.6 millions. All these results are perfectly commonsensical.

5.10 REGRESSION ON STANDARDIZED VARIABLES

We saw in the previous section that the units in which the dependent variable (Y) and the explanatory variables (the X’s) are measured affect the interpretation of the regression coefficients. This can be avoided if we express all the variables as standardized variables. A variable is said to be standardized if we subtract the mean value of the variable from its individual values and divide the difference by the standard deviation of that variable.

Thus, in the regression of Y on X, if we redefine these variables as

$$Y_i^* = \frac{Y_i - \bar{Y}}{S_Y} \tag{5.45}$$

$$X_i^* = \frac{X_i - \bar{X}}{S_X} \tag{5.46}$$

where $\bar{Y}$ = sample mean of Y, $S_Y$ = sample standard deviation of Y, $\bar{X}$ = sample mean of X, and $S_X$ = sample standard deviation of X, the variables $Y_i^*$ and $X_i^*$ are called standardized variables.

An interesting property of a standardized variable is that its mean value is always zero and its standard deviation is always 1.28

As a result, it does not matter in what unit the Y and X variable(s) are measured. Therefore, instead of running the standard (bivariate) regression:

$$Y_i = B_1 + B_2X_i + u_i \tag{5.47}$$

we could run the regression on the standardized variables as

$$Y_i^* = B_1^* + B_2^*X_i^* + u_i^* = B_2^*X_i^* + u_i^* \tag{5.48}$$

since it is easy to show that in the regression involving standardized variables the intercept value is always zero.29 The regression coefficients of the standardized explanatory variables, denoted by starred B coefficients $(B^*)$, are known in the literature as the beta coefficients. Incidentally, note that Eq. (5.48) is a regression through the origin.

How do we interpret the beta coefficients? The interpretation is that if the (standardized) regressor increases by one standard deviation, the average value of the (standardized) regressand increases by $B_2^*$ standard deviation units. Thus, unlike the traditional model in Eq. (5.47), we measure the effect not in terms of the original units in which Y and X are measured, but in standard deviation units.


28For proof, see Gujarati and Porter, op.cit., pp. 183–184.

29Recall from Eq. (2.16) that Intercept = Mean value of Y - Slope * Mean value of X. But for the standardized variables, the mean value is always zero. This can be easily generalized to more than one X variable.
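The properties just described can be checked numerically. The sketch below is illustrative only: the six observations and the helper names `standardize` and `ols` are hypothetical. It standardizes both variables as in Eqs. (5.45) and (5.46), fits the regression with an intercept, and confirms that the intercept is zero up to rounding and that the beta coefficient does not change when X is re-expressed in other units. It also uses the standard identity that, in the two-variable case, the beta coefficient equals the original slope multiplied by $S_X/S_Y$; that identity is not stated in the text above and is included here as an assumption the reader can verify.

```python
# Standardizing variables as in Eqs. (5.45)-(5.46) and fitting Eq. (5.48).
# The data and helper names are hypothetical, for illustration only.
from statistics import mean, stdev   # stdev uses the n - 1 (sample) divisor


def standardize(values):
    """Subtract the sample mean and divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def ols(x, y):
    """Two-variable OLS with intercept; returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


x = [2.0, 4.0, 5.0, 7.0, 9.0, 12.0]
y = [10.0, 14.0, 13.0, 19.0, 22.0, 30.0]

b1, b2 = ols(x, y)                                      # original units
b1_star, b2_star = ols(standardize(x), standardize(y))  # standardized variables

# Re-express X in different units (say, thousands instead of units).
x_rescaled = [xi / 1000.0 for xi in x]
_, b2_rescaled = ols(x_rescaled, y)
_, b2_star_rescaled = ols(standardize(x_rescaled), standardize(y))

print("slope, original units:", b2)
print("slope, rescaled X:    ", b2_rescaled)
print("beta coefficient:     ", b2_star, "and after rescaling", b2_star_rescaled)
print("intercept on standardized data:", b1_star)
```

The ordinary slope changes by the scaling factor when X is rescaled, whereas the beta coefficient is unaffected, which is the sense in which standardization removes the dependence on units.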


It should be added that if there is more than one X variable, we can convert each variable into the standardized form. To show this, let us return to the Cobb-Douglas production function data given for real GDP, employment, and real fixed capital for Mexico, 1955–1974, in Table 5-2. The results of fitting the logarithmic function are given in Eq. (5.11). The results of regressing the standardized logs of GDP on standardized employment and standardized fixed capital, using EViews, are as follows:

Dependent Variable: SLGDP
Method: Least Squares
Sample: 1955 1974
Included observations: 20

Variable    Coefficient    Std. Error    t-Statistic    Prob.
SLE         0.167964       0.089220      1.882590       0.0760
SLK         0.831995       0.089220      9.325223       0.0000

R-squared             0.995080    Mean dependent var    6.29E-06
Adjusted R-squared    0.994807    S.D. dependent var    0.999999
S.E. of regression    0.072063    Sum squared resid     0.093475

where SLGDP = standardized log of GDP, SLE = standardized log of employment, and SLK = standardized log of capital.

The interpretation of the regression coefficients is as follows: Holding capital constant, a one standard deviation increase in employment increases the GDP, on average, by ≈0.17 standard deviation units. Likewise, holding employment constant, a one standard deviation increase in capital, on average, increases GDP by ≈0.83 standard deviation units. (Note that all variables are in the logarithmic form.) Relatively speaking, capital has more impact on GDP than employment. Here you will see the advantage of using standardized variables, for standardization puts all variables on equal footing because all standardized variables have zero means and unit variances.

Incidentally, we have not introduced the intercept term in the regression results. (Why?) If you include the intercept in the model, its value will be almost zero.

5.11 SUMMARY OF FUNCTIONAL FORMS

In this chapter we discussed several regression models that, although linear in the parameters, were not necessarily linear in the variables. For each model, we noted its special features and also the circumstances in which it might be appropriate. In Table 5-11 we summarize the various functional forms that we discussed in terms of a few salient features, such as the slope coefficients and the elasticity coefficients. Although for double-log models the slope and elasticity coefficients are the same, this is not the case for other models. But even for these models, we can compute elasticities from the basic definition given in Eq. (5.7).

As Table 5-11 shows, for the linear-in-variable (LIV) models, the slope coefficient is constant but the elasticity coefficient is variable, whereas for the log-log, or log-linear, model, the elasticity coefficient is constant but the slope coefficient is variable. For other models shown in Table 5-11, both the slope and elasticity coefficients are variable.

TABLE 5-11
SUMMARY OF FUNCTIONAL FORMS

Model          Form                      Slope = dY/dX    Elasticity = (dY/dX)(X/Y)
Linear         Y = B1 + B2X              B2               B2(X/Y)*
Log-linear     ln Y = B1 + B2 ln X       B2(Y/X)          B2
Log-lin        ln Y = B1 + B2X           B2(Y)            B2(X)*
Lin-log        Y = B1 + B2 ln X          B2(1/X)          B2(1/Y)*
Reciprocal     Y = B1 + B2(1/X)          −B2(1/X²)        −B2(1/(XY))*
Log-inverse    ln(Y) = B1 − B2(1/X)      B2(Y/X²)         B2(1/X)*

Note: * Indicates that the elasticity coefficient is variable, depending on the value taken by X or Y or both. When no X and Y are specified, in practice, these elasticities are often measured at the mean values $\bar{X}$ and $\bar{Y}$.

5.12 SUMMARY

In this chapter we considered models that are linear in parameters, or that can be rendered as such with suitable transformation, but that are not necessarily linear in variables. There are a variety of such models, each having special applications. We considered five major types of nonlinear-in-variable but linear-in-parameter models, namely:

1. The log-linear model, in which both the dependent variable and the explanatory variable are in logarithmic form.

2. The log-lin or growth model, in which the dependent variable is logarithmic but the independent variable is linear.

3. The lin-log model, in which the dependent variable is linear but the independent variable is logarithmic.

4. The reciprocal model, in which the dependent variable is linear but the independent variable is not.30

5. The polynomial model, in which the independent variable enters with various powers.

30The dependent variable can also be reciprocal and the independent variable linear, as in Problem 5.15. See also Problem 5.20.

Of course, there is nothing that prevents us from combining the features of one or more of these models. Thus, we can have a multiple regression model in which the dependent variable is in log form and some of the X variables are also in log form, but some are in linear form.
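As a sketch of such a combined specification (the regressor names $X_2$ and $X_3$ are placeholders and do not refer to a particular example in the text), consider:

```latex
\ln Y_i = B_1 + B_2 \ln X_{2i} + B_3 X_{3i} + u_i
```

Here $B_2$ is read as in the log-linear model, that is, as the partial elasticity of Y with respect to $X_2$, whereas $B_3$ is read as in the log-lin model: holding $X_2$ constant, a one-unit change in $X_3$ changes Y by approximately $100 \cdot B_3$ percent.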

We studied the properties of these various models in terms of their relevance in applied research, their slope coefficients, and their elasticity coefficients. We also showed with several examples the situations in which the various models could be used. Needless to say, we will come across several more examples in the remainder of the text.
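The slope and elasticity expressions collected in Table 5-11 can be sketched as a small Python function. This is an illustration under stated assumptions rather than a definitive implementation: the function name `slope_and_elasticity`, the model labels used as keys, and the evaluation point (b2 = 0.5, X = 4, Y = 10) are all hypothetical. The elasticity is always obtained from its basic definition, slope times X/Y.

```python
# Slope (dY/dX) and elasticity ((dY/dX)(X/Y)) for the functional forms in Table 5-11.
# b2 is the estimated slope parameter; x and y are the point of evaluation
# (often the sample means). Function and key names are hypothetical.

def slope_and_elasticity(model, b2, x, y):
    if model == "linear":          # Y = B1 + B2 X
        slope = b2
    elif model == "log-linear":    # ln Y = B1 + B2 ln X
        slope = b2 * y / x
    elif model == "log-lin":       # ln Y = B1 + B2 X
        slope = b2 * y
    elif model == "lin-log":       # Y = B1 + B2 ln X
        slope = b2 / x
    elif model == "reciprocal":    # Y = B1 + B2 (1/X)
        slope = -b2 / x ** 2
    elif model == "log-inverse":   # ln Y = B1 - B2 (1/X)
        slope = b2 * y / x ** 2
    else:
        raise ValueError("unknown model: " + model)
    return slope, slope * x / y    # elasticity from its basic definition


for name in ("linear", "log-linear", "log-lin", "lin-log", "reciprocal", "log-inverse"):
    s, e = slope_and_elasticity(name, b2=0.5, x=4.0, y=10.0)
    print(f"{name:12s} slope = {s:8.4f}   elasticity = {e:8.4f}")
```

Only the log-linear case returns an elasticity equal to b2 itself; in every other case the result depends on the chosen X and Y, which is why elasticities are often reported at the sample means.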

In this chapter we also considered the regression-through-the-origin model and discussed some of its features.

It cannot be overemphasized that in choosing among the competing models, the overriding objective should be the economic relevance of the various models and not merely the summary statistics, such as R2. Model building requires a proper balance of theory, availability of the appropriate data, a good understanding of the statistical properties of the various models, and the elusive quality that is called practical judgment. Since the theory underlying a topic of interest is never perfect, there is no such thing as a perfect model. What we hope for is a reasonably good model that will balance all these criteria.

Whatever model is chosen in practice, we have to pay careful attention to the units in which the dependent and independent variables are expressed, for the interpretation of regression coefficients may hinge upon units of measurement.
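The dependence of the coefficients on units, discussed in Section 5.9, can be checked with a short Python sketch. It uses the two series labeled GDPB and GDIB in Table 5-10 and converts them to millions by multiplying by 1,000; the helper name `ols` is hypothetical, and the code is an illustration rather than the procedure used to produce the published results.

```python
# Effect of units of measurement on a two-variable regression,
# using the series labeled GDPB and GDIB in Table 5-10 (billions of dollars).
# The helper name `ols` is hypothetical.

def ols(x, y):
    """Return intercept, slope, and r2 for a regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy ** 2 / (sxx * syy)


gdpb = [1389.8, 1509.1, 1625.7, 1735.5, 1614.3,
        1582.1, 1664.1, 1888.6, 2077.2, 2209.2]
gdib = [8304.3, 8747.0, 9268.4, 9817.0, 10128.0,
        10469.6, 10960.8, 11685.9, 12433.9, 13194.7]
gdpm = [v * 1000 for v in gdpb]   # same data in millions of dollars
gdim = [v * 1000 for v in gdib]

for label, y, x in [("GDIB on GDPB", gdib, gdpb),
                    ("GDIM on GDPM", gdim, gdpm),
                    ("GDIB on GDPM", gdib, gdpm),
                    ("GDIM on GDPB", gdim, gdpb)]:
    b1, b2, r2 = ols(x, y)
    print(f"{label}: intercept = {b1:.3f}, slope = {b2:.4f}, r2 = {r2:.4f}")
```

The four fits should line up with Eqs. (5.41) to (5.44): the slope is the same when both variables share a unit, it moves by a factor of 1,000 when the units differ, the intercept follows the units of the dependent variable, and r2 is identical throughout.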

KEY TERMS AND CONCEPTS

The key terms and concepts introduced in this chapter are

Double-log, log-linear, or constant elasticity model
Linear vs. log-linear regression model
  a) Functional form
  b) High r2 value criterion
Cobb-Douglas (C-D) production function
  a) Returns to scale parameter
  b) Constant returns to scale
  c) Increasing and decreasing returns to scale
Semilog models
  a) Instantaneous growth rate
  b) Compound growth rate
Linear trend model
  a) trend variable
Log-lin, or growth, model
Lin-log model
Reciprocal models
  a) Asymptotic value
  b) Engel expenditure curve
  c) the Phillips curve
Polynomial regression models
  a) cubic function or third-degree polynomial
Regression through the origin
Scaling and units of measurement
Regression on standardized variables
  a) Standardized variables
  b) beta coefficients


QUESTIONS

5.1. Explain briefly what is meant by
a. Log-log model
b. Log-lin model
c. Lin-log model
d. Elasticity coefficient
e. Elasticity at mean value

5.2. What is meant by a slope coefficient and an elasticity coefficient? What is the relationship between the two?

5.3. Fill in the blanks in Table 5-12.

TABLE 5-12
FUNCTIONAL FORMS OF REGRESSION MODELS

Model                       When appropriate
ln Yi = B1 + B2 ln Xi       ____
ln Yi = B1 + B2 Xi          ____
Yi = B1 + B2 ln Xi          ____
Yi = B1 + B2(1/Xi)          ____

5.4. Complete the following sentences:
a. In the double-log model the slope coefficient measures . . .
b. In the lin-log model the slope coefficient measures . . .
c. In the log-lin model the slope coefficient measures . . .
d. Elasticity of Y with respect to X is defined as . . .
e. Price elasticity is defined as . . .
f. Demand is said to be elastic if the absolute value of the price elasticity is . . . , but demand is said to be inelastic if it is . . .

5.5. State with reason whether the following statements are true (T) or false (F):
a. For the double-log model, the slope and elasticity coefficients are the same.
b. For the linear-in-variable (LIV) model, the slope coefficient is constant but the elasticity coefficient is variable, whereas for the log-log model, the elasticity coefficient is constant but the slope is variable.
c. The R2 of a log-log model can be compared with that of a log-lin model but not with that of a lin-log model.
d. The R2 of a lin-log model can be compared with that of a linear (in variables) model but not with that of a double-log or log-lin model.
e. Model A: ln Y = -0.6 + 0.4X; r2 = 0.85
Model B: $\hat{Y}$ = 1.3 + 2.2X; r2 = 0.73
Model A is a better model because its r2 is higher.

5.6. The Engel expenditure curve relates a consumer’s expenditure on a commodity to his or her total income. Letting Y = the consumption expenditure on a commodity and X = the consumer income, consider the following models:
a. Yi = B1 + B2Xi + ui
b. Yi = B1 + B2(1/Xi) + ui
c. ln Yi = B1 + B2 ln Xi + ui

d. ln Yi = B1 + B2(1/Xi) + ui
e. Yi = B1 + B2 ln Xi + ui
f. ln(Y) = B1 − B2(1/X). This model is known as the log-inverse model.

Which of these models would you choose for the Engel curve and why? (Hint: Interpret the various slope coefficients, find out the expressions for elasticity of expenditure with respect to income, etc.)

5.7. The growth model Eq. (5.18) was fitted to several U.S. economic time series and the following results were obtained:

Time series and period                         B1             B2           r2
Real GNP (1954–1987) (1982 dollars)            7.2492         0.0302       0.9839
                                               t = (529.29)   (44.318)
Labor force participation rate (1973–1987)     4.1056         0.053        0.9464
                                               t = (1290.8)   (15.149)
S&P 500 index (1954–1987)                      3.6960         0.0456       0.8633
                                               t = (57.408)   (14.219)
S&P 500 index (1954–1987 quarterly data)       3.7115         0.0114       0.8524
                                               t = (114.615)  (27.819)

a. In each case find out the instantaneous rate of growth.
b. What is the compound rate of growth in each case?
c. For the S&P data, why is there a difference in the two slope coefficients? How would you reconcile the difference?

PROBLEMS

5.8. Refer to the cubic total cost (TC) function given in Eq. (5.32).
a. The marginal cost (MC) is the change in the TC for a unit change in output; that is, it is the rate of change of the TC with respect to output. (Technically, it is the derivative of the TC with respect to X, the output.) Derive this function from regression (5.32).
b. The average variable cost (AVC) is the total variable cost (TVC) divided by the total output. Derive the AVC function from regression (5.32).
c. The average cost (AC) of production is the TC of production divided by total output. For the function given in regression (5.32), derive the AC function.
d. Plot the various cost curves previously derived and confirm that they resemble the stylized textbook cost curves.

5.9. Are the following models linear in the parameters? If not, is there any way to make them linear-in-parameter (LIP) models?
a. $Y_i = \dfrac{1}{B_1 + B_2X_i}$
b. $Y_i = \dfrac{X_i}{B_1 + B_2X_i^2}$

5.10. Based on 11 annual observations, the following regressions were obtained:

Model A: $\hat{Y}_t = 2.6911 - 0.4795X_t$
se = (0.1216) (0.1140)    r2 = 0.6628


Model B: $\ln \hat{Y}_t = 0.7774 - 0.2530 \ln X_t$
se = (0.0152) (0.0494)    r2 = 0.7448

where Y = the cups of coffee consumed per person per day and X = the price of coffee in dollars per pound.
a. Interpret the slope coefficients in the two models.
b. You are told that $\bar{Y} = 2.43$ and $\bar{X} = 1.11$. At these mean values, estimate the price elasticity for Model A.
c. What is the price elasticity for Model B?
d. From the estimated elasticities, can you say that the demand for coffee is price inelastic?
e. How would you interpret the intercept in Model B? (Hint: Take the antilog.)
f. Since the r2 of Model B is larger than that of Model A, Model B is preferable to Model A. Comment on this statement.

5.11. Refer to the Cobb-Douglas production function given in regression (5.11).
a. Interpret the coefficient of the labor input X2. Is it statistically different from 1?
b. Interpret the coefficient of the capital input X3. Is it statistically different from zero? And from 1?
c. What is the interpretation of the intercept value of -1.6524?
d. Test the hypothesis that B2 = B3 = 0.

5.12. In their study of the demand for international reserves (i.e., foreign reserve currency such as the dollar or International Monetary Fund [IMF] drawing rights), Mohsen Bahami-Oskooee and Margaret Malixi31 obtained the following regression results for a sample of 28 less developed countries (LDC):

$$\ln(R/P) = 0.1223 + 0.4079\ln(Y/P) + 0.5040\ln\sigma_{BP} - 0.0918\ln\sigma_{EX}$$
t = (2.5128) (17.6377) (15.2437) (−2.7449)
R2 = 0.8268    F = 1151    n = 1120

where R = the level of nominal reserves in U.S. dollars
P = U.S. implicit price deflator for GNP
Y = the nominal GNP in U.S. dollars
$\sigma_{BP}$ = the variability measure of balance of payments
$\sigma_{EX}$ = the variability measure of exchange rates

(Notes: The figures in parentheses are t ratios. This regression was based on quarterly data from 1976 to 1985 (40 quarters) for each of the 28 countries, giving a total sample size of 1120.)
a. A priori, what are the expected signs of the various coefficients? Are the results in accord with these expectations?
b. What is the interpretation of the various partial slope coefficients?


31See Mohsen Bahami-Oskooee and Margaret Malixi, “Exchange Rate Flexibility and the LDCs Demand for International Reserves,” Journal of Quantitative Economics, vol. 4, no. 2, July 1988, pp. 317–328.


c. Test the statistical significance of each estimated partial regression coefficient (i.e., the null hypothesis is that individually each true or population regression coefficient is equal to zero).
d. How would you test the hypothesis that all partial slope coefficients are simultaneously zero?

5.13. Based on the U.K. data on annual percentage change in wages (Y) and the percent annual unemployment rate (X) for the years 1950 to 1966, the following regression results were obtained:

$$\hat{Y}_t = -1.4282 + 8.7243\left(\frac{1}{X_t}\right)$$
se = (2.0675) (2.8478)    r2 = 0.3849    F(1,15) = 9.39

a. What is the interpretation of 8.7243?
b. Test the hypothesis that the estimated slope coefficient is not different from zero. Which test will you use?
c. How would you use the F test to test the preceding hypothesis?
d. Given that $\bar{Y} = 4.8$ percent and $\bar{X} = 1.5$ percent, what is the rate of change of Y at these mean values?
e. What is the elasticity of Y with respect to X at the mean values?
f. How would you test the hypothesis that the true r2 = 0?

5.14. Table 5-13 gives data on the Consumer Price Index, Y (1980 = 100), and the money supply, X (billions of German marks), for Germany for the years 1971 to 1987.

TABLE 5-13
CONSUMER PRICE INDEX (Y) (1980 = 100) AND THE MONEY SUPPLY (X) (MARKS, IN BILLIONS), GERMANY, 1971–1987

Year    Y        X
1971    64.1     110.02
1972    67.7     125.02
1973    72.4     132.27
1974    77.5     137.17
1975    82.0     159.51
1976    85.6     176.16
1977    88.7     190.80
1978    91.1     216.20
1979    94.9     232.41
1980    100.0    237.97
1981    106.3    240.77
1982    111.9    249.25
1983    115.6    275.08
1984    118.4    283.89
1985    121.0    296.05
1986    120.7    325.73
1987    121.1    354.93

Source: International Economic Conditions, annual ed., June 1988, The Federal Reserve Bank of St. Louis, p. 24.


a. Regress the following:
1. Y on X
2. ln Y on ln X
3. ln Y on X
4. Y on ln X
b. Interpret each estimated regression.
c. For each model, find the rate of change of Y with respect to X.
d. For each model, find the elasticity of Y with respect to X. For some of these models, the elasticity is to be computed at the mean values of Y and X.
e. Based on all these regression results, which model would you choose and why?

5.15. Based on the following data, estimate the model:

$$\frac{1}{Y_i} = B_1 + B_2X_i + u_i$$

Y    86    79    76    69    65    62    52    51    51    48
X     3     7    12    17    25    35    45    55    70   120

a. What is the interpretation of B2?
b. What is the rate of change of Y with respect to X?
c. What is the elasticity of Y with respect to X?
d. For the same data, run the regression

$$Y_i = B_1 + B_2\left(\frac{1}{X_i}\right) + u_i$$

e. Can you compare the r2s of the two models? Why or why not?
f. How do you decide which is a better model?

5.16. Comparing two r2s when dependent variables are different.32 Suppose you want to compare the r2 values of the growth model (5.19) with the linear trend model (5.23) of the consumer credit outstanding regressions given in the text. Proceed as follows:
a. Obtain $\widehat{\ln Y_t}$, that is, the estimated log value of each observation from model (5.19).
b. Obtain the antilog values of the values obtained in (a).
c. Compute r2 between the values obtained in (b) and the actual Y values using the definition of r2 given in Question 3.5.
d. This r2 value is comparable with the r2 value obtained from linear model (5.23).
Use the preceding steps to compare the r2 values of models (5.19) and (5.23).

32For additional details and numerical computation, see Gujarati and Porter, Basic Econometrics, 5th ed., McGraw-Hill, New York, 2009, pp. 203–205.

5.17. Based on the GNP/money supply data given in Table 5-14 (found on the textbook’s Web site), the following regression results were obtained (Y = GNP, X = M2):

Model                      Intercept      Slope         r2
Log-linear                 0.7826         0.8539        0.997
                           t = 11.40      t = 108.93
Log-lin (growth model)     7.2392         0.0001        0.832
                           t = 80.85      t = 14.07
Lin-log                    -24299         3382.4        0.899
                           t = -15.45     t = 18.84
Linear (LIV model)         703.28         0.4718        0.991
                           t = 8.04       t = 65.58

a. For each model, interpret the slope coefficient.
b. For each model, estimate the elasticity of the GNP with respect to money supply and interpret it.
c. Are all r2 values directly comparable? If not, which ones are?
d. Which model will you choose? What criteria did you consider in your choice?
e. According to the monetarists, there is a one-to-one relationship between the rate of changes in the money supply and the GDP. Do the preceding regressions support this view? How would you test this formally?

5.18. Refer to the energy demand data given in Table 5-3. Instead of fitting the log-linear model to the data, fit the following linear model:

$$Y_t = B_1 + B_2X_{2t} + B_3X_{3t} + u_t$$

a. Estimate the regression coefficients, their standard errors, and obtain R2 and adjusted R2.
b. Interpret the various regression coefficients.
c. Are the estimated partial regression coefficients individually statistically significant? Use the p values to answer the question.
d. Set up the ANOVA table and test the hypothesis that B2 = B3 = 0.
e. Compute the income and price elasticities at the mean values of Y, X2, and X3. How do these elasticities compare with those given in regression (5.12)?
f. Using the procedure described in Problem 5.16, compare the R2 values of the linear and log-linear regressions. What conclusion do you draw from these computations?
g. Obtain the normal probability plot for the residuals obtained from the linear-in-variable regression above. What conclusions do you draw?
h. Obtain the normal probability plot for the residuals obtained from the log-linear regression (5.12) and decide whether the residuals are approximately normally distributed.
i. If the conclusions in (g) and (h) are different, which regression would you choose and why?

5.19. To explain the behavior of business loan activity at large commercial banks, Bruce J. Summers used the following model:33

$$Y_t = \frac{1}{A + Bt} \tag{A}$$

33See his article, “A Time Series Analysis of Business Loans at Large Commercial Banks,” Economic Review, Federal Reserve Bank of St. Louis, May/June, 1975, pp. 8–14.


where Y is commercial and industrial (C&I) loans in millions of dollars, and t is time, measured in months. The data used in the analysis were collected monthly for the years 1966 to 1967, a total of 24 observations.

For estimation purposes, however, the author used the following model:

$$\frac{1}{Y_t} = A + Bt \tag{B}$$

The regression results based on this model for banks including New York City banks and excluding New York City banks are given in Equations (1) and (2), respectively:

$$\widehat{\frac{1}{Y_t}} = 52.00 - 0.2t \tag{1}$$
t = (96.13) (-24.52)    R2 = 0.84    DW = 0.04*

$$\widehat{\frac{1}{Y_t}} = 26.79 - 0.14t \tag{2}$$
t = (196.70) (-66.52)    R2 = 0.97    DW = 0.03*

*Durbin-Watson (DW) statistic (see Chapter 10).
a. Why did the author use Model (B) rather than Model (A)?
b. What are the properties of the two models?
c. Interpret the slope coefficients in Models (1) and (2). Are the two slope coefficients statistically significant?
d. How would you find out the standard errors of the intercept and slope coefficients in the two regressions?
e. Is there a difference in the behavior of New York City and the non–New York City banks in their C&I activity? How would you go about testing the difference, if any, formally?

5.20. Refer to regression (5.31).
a. Interpret the slope coefficient.
b. Using Table 5-11, compute the elasticity for this model. Is this elasticity constant or variable?

5.21. Refer to the data given in Table 5-5 (found on the textbook’s Web site). Fit an appropriate Engel curve to the various expenditure categories in relation to total personal consumption expenditure and comment on the statistical results.

5.22. Table 5-15 gives data on the annual rate of return Y (%) on the Afuture mutual fund and a return on a market portfolio as represented by the Fisher Index, X (%). Now consider the following model, which is known in the finance literature as the characteristic line.

$$Y_t = B_1 + B_2X_t + u_t \tag{1}$$


In the literature there is no consensus about the prior value of B1. Some studies have shown it to be positive and statistically significant and some have shown it to be statistically insignificant. In the latter case, Model (1) becomes a regression-through-the-origin model, which can be written as

$$Y_t = B_2X_t + u_t \tag{2}$$

Using the data given in Table 5-15, estimate both these models and decide which model fits the data better.

5.23. Raw R2 for the regression-through-the-origin model. As noted earlier, for the regression-through-the-origin regression model the conventionally computed R2 may not be meaningful. One suggested alternative for such models is the so-called “raw” R2, which is defined (for the two-variable case) as follows:

$$\text{Raw } r^2 = \frac{\left(\sum X_iY_i\right)^2}{\sum X_i^2 \sum Y_i^2}$$

If you compare the raw r2 with the traditional r2 computed from Eq. (3.43), you will see that the sums of squares and cross-products in the raw r2 are not mean-corrected.

For Model (2) in Problem 5.22 compute the raw r2. Compare this with the r2 value that you obtained for Model (1) in Problem 5.22. What general conclusion do you draw?

5.24. For regression (5.39) compute the raw r2 value and compare it with that given in Eq. (5.40).

5.25. Consider data on the weekly stock prices of Qualcomm, Inc., a digital wireless telecommunications designer and manufacturer, over the time period of 1995 to 2000. The complete data can be found in Table 5-16 on the textbook’s Web site.

TABLE 5-15
ANNUAL RATES OF RETURN (%) ON AFUTURE FUND (Y) AND ON THE FISHER INDEX (X), 1971–1980

Year    Y        X
1971    67.5     19.5
1972    19.2     8.5
1973    −35.2    −29.3
1974    −42.0    −26.5
1975    63.7     61.9
1976    19.3     45.5
1977    3.6      9.5
1978    20.0     14.0
1979    40.3     35.3
1980    37.5     31.0

Source: Haim Levy and Marshall Sarnat, Portfolio and Investment Selection: Theory and Practice, Prentice-Hall International, Englewood Cliffs, N.J., 1984, pp. 730, 738.


a. Create a scattergram of the closing stock price over time. What kind of pattern is evident in the plot?

b. Estimate a linear model to predict the closing stock price based on time. Does this model seem to fit the data well?

c. Now estimate a squared model by using both time and time-squared. Is this a better fit than in part (b)?

d. Now attempt to fit a cubic or third-degree polynomial to the data as follows:

where Y = stock price and X = time. Which model seems to be the best estimator for the stock prices?

5.26. Table 5-17 on the textbook’s Web site contains data about several magazines. The variables are: magazine name, cost of a full-page ad, circulation (projected, in thousands), percent male among the predicted readership, and median household income of readership. The goal is to predict the advertise- ment cost. a. Create scattergrams of the cost variable versus each of the three other vari-

ables. What types of relationships do you see? b. Estimate a linear regression equation with all the variables and create a

residuals versus fitted values plot. Does the plot exhibit constant variance from left to right?

c. Now estimate the following mixed model:

and create another residual plot. Does this model fit better than the one in part (b)?

5.27. Refer to Example 4.5 (Table 4-6) about education, GDP, and population for 38 countries. a. Estimate a linear (LIV) model for the data. What are the resulting equation

and relevant output values (i.e., F statistic, t values, and R2)? b. Now attempt to estimate a log-linear model (where both of the indepen-

dent variables are also in the natural log format). c. With the log-linear model, what does the coefficient of the GDP variable

indicate about education? What about the population variable? d. Which model is more appropriate?

5.28. Table 5-18 on the textbook’s Web site contains data on average life expectancy for 40 countries. It comes from the World Almanac and Book of Facts, 1993, by Pharos Books. The independent variables are the ratio of the number of people per television set and the ratio of number of people per physician. a. Try fitting a linear (LIV) model to the data. Does this model seem to fit

well? b. Create two scattergrams, one of the natural log of life expectancy versus the

natural log of people per television, and one of the natural log of life expectancy versus the natural log of people per physician. Do the graphs appear linear?

c. Estimate the equation for a log-linear model. Does this model fit well?

ln Yi = B0 + B1 ln Circ + B2 PercMale + B3 MedIncome + ui

Yi = B0 + B1Xi + B2X2i + B3X3i + ui

d. What do the coefficients of the log-linear model indicate about the relationships of the variables to life expectancy? Does this seem reasonable?

5.29. Refer to Example 5.6 in the chapter. It was shown that the percentage change in the index of hourly earnings and the unemployment rate from 1958–1969 followed the traditional Phillips curve model. An updated version of the data, from 1965–2007, can be found in Table 5-19 on the textbook’s Web site.

a. Create a scattergram using the percentage change in hourly earnings as the Y variable and the unemployment rate as the X variable. Does the graph appear linear?

b. Now create a scattergram as above, but use 1/X as the independent variable. Does this seem better than the graph in part (a)?

c. Fit Eq. (5.29) to the new data. Does this model seem to fit well? Also create a regular linear (LIV) model as in Eq. (5.30). Which model is better? Why?

APPENDIX 5A: Logarithms

Consider the numbers 5 and 25. We know that

\[ 25 = 5^2 \tag{5A.1} \]

We say that the exponent 2 is the logarithm of 25 to the base 5. More formally, the logarithm of a number (e.g., 25) to a given base (e.g., 5) is the power (2) to which the base (5) must be raised to obtain the given number (25).

More generally, if

\[ Y = b^X \quad (b > 0) \tag{5A.2} \]

then

\[ \log_b Y = X \tag{5A.3} \]

In mathematics the function (5A.2) is called an exponential function and (5A.3) is called the logarithmic function. As is clear from Eqs. (5A.2) and (5A.3), one function is the inverse of the other function.

Although any (positive) base can be used, in practice, the two commonly used bases are 10 and the mathematical number

\[ e = 2.71828\ldots \]

Logarithms to base 10 are called common logarithms. Thus,

\[ \log_{10} 100 = 2 \qquad \log_{10} 30 \approx 1.48 \]

That is, in the first case \(100 = 10^2\) and in the latter case \(30 \approx 10^{1.48}\).

Logarithms to the base e are called natural logarithms. Thus,

\[ \log_e 100 \approx 4.6051 \quad \text{and} \quad \log_e 30 \approx 3.4012 \]

All these calculations can be done routinely on a hand calculator.

By convention, the logarithm to base 10 is denoted by the letters log and to the base e by ln. Thus, in the preceding example, we can write log 100 or log 30 or ln 100 or ln 30.

There is a fixed relationship between the common log and natural log, which is

\[ \ln X = 2.3026 \log X \tag{5A.4} \]

That is, the natural log of the number X is equal to 2.3026 times the log of X to the base 10. Thus,

\[ \ln 30 = 2.3026 \log 30 = 2.3026(1.4771) = 3.4012 \text{ (approx.)} \]

as before. Therefore, it does not matter whether one uses common or natural logs. But in mathematics the base that is usually preferred is e, that is, the natural logarithm. Hence, in this book all logs are natural logs, unless stated explicitly. Of course, we can convert the log of a number from one base to the other using Eq. (5A.4).

Keep in mind that logarithms of zero and of negative numbers are not defined. Thus, the log of (−5) or the ln (−5) is not defined.

Some properties of logarithms are as follows: If A and B are any positive numbers, then it can be shown that:

1. \[ \ln(A \times B) = \ln A + \ln B \tag{5A.5} \]

That is, the log of the product of two (positive) numbers A and B is equal to the sum of their logs.

2. \[ \ln(A/B) = \ln A - \ln B \tag{5A.6} \]

That is, the log of the ratio of A to B is the difference in the logs of A and B.

3. \[ \ln(A \pm B) \neq \ln A \pm \ln B \tag{5A.7} \]

That is, the log of the sum or difference of A and B is not equal to the sum or difference of their logs.

4. \[ \ln(A^k) = k \ln A \tag{5A.8} \]

That is, the log of A raised to power k is k times the log of A.

5. \[ \ln e = 1 \tag{5A.9} \]

That is, the log of e to itself as a base is 1 (as is the log of 10 to the base 10).

6. \[ \ln 1 = 0 \tag{5A.10} \]

That is, the natural log of the number 1 is zero (so is the common log of number 1).

7. If \(Y = \ln X\),

\[ \frac{dY}{dX} = \frac{1}{X} \tag{5A.11} \]

That is, the rate of change (i.e., the derivative) of Y with respect to X is 1 over X.
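The conversion factor in Eq. (5A.4) and properties (5A.5) through (5A.8) can be checked numerically. The following is a minimal sketch using Python's standard math module; the values A = 30, B = 100, and k = 3 are arbitrary choices made for illustration and are not taken from the text.

```python
import math

# Arbitrary positive numbers chosen for illustration (an assumption, not from the text).
A, B, k = 30.0, 100.0, 3.0

# Eq. (5A.4): ln X = 2.3026 log X, where 2.3026 is approximately ln 10.
conversion = math.log(10)
ln_A_from_common_log = conversion * math.log10(A)

# Properties (5A.5), (5A.6), and (5A.8) hold up to floating-point rounding.
product_rule = math.isclose(math.log(A * B), math.log(A) + math.log(B))
ratio_rule = math.isclose(math.log(A / B), math.log(A) - math.log(B))
power_rule = math.isclose(math.log(A ** k), k * math.log(A))

# Property (5A.7): the log of a sum is NOT the sum of the logs.
sum_rule_fails = not math.isclose(math.log(A + B), math.log(A) + math.log(B))

print(f"ln 10 = {conversion:.4f}")
print(f"ln 30 via the common log = {ln_A_from_common_log:.4f}")
print(product_rule, ratio_rule, power_rule, sum_rule_fails)
```

Any other positive values of A and B should give the same pattern of results.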

The exponential and (natural) logarithmic functions are depicted in Figure 5A-1.

FIGURE 5A-1 Exponential and logarithmic functions: (a) exponential function, Y = e^X; (b) logarithmic function, X = ln Y

Although the number whose log is taken is always positive, the logarithm of that number can be positive as well as negative. It can be easily verified that if

\[
\begin{aligned}
0 < Y < 1 \quad &\text{then} \quad \ln Y < 0 \\
Y = 1 \quad &\text{then} \quad \ln Y = 0 \\
Y > 1 \quad &\text{then} \quad \ln Y > 0
\end{aligned}
\]

Also note that although the logarithmic curve shown in Figure 5A-1(b) is positively sloping, implying that the larger the number is, the larger its logarithmic value will be, the curve is increasing at a decreasing rate (mathematically, the second derivative of the function is negative). Thus, ln (10) = 2.3026 (approx.) and ln (20) = 2.9957 (approx.). That is, if a number is doubled, its logarithm does not double.

This is why the logarithm transformation is called a nonlinear transformation. This can also be seen from Equation (5A.11), which notes that if Y = ln X, dY/dX = 1/X. This means that the slope of the logarithmic function depends on the value of X; that is, it is not constant (recall the definition of linearity in the variable).

Logarithms and percentages: Since

\[ \frac{d(\ln X)}{dX} = \frac{1}{X}, \quad \text{or} \quad d(\ln X) = \frac{dX}{X}, \]

for very small changes the change in ln X is equal to the relative or proportional change in X. In practice, if the change in X is reasonably small, the preceding relationship can be written as: the change in ln X ≈ the relative change in X, where ≈ means approximately.

Thus, for small changes,

\[ (\ln X_t - \ln X_{t-1}) \approx \frac{X_t - X_{t-1}}{X_{t-1}} = \text{relative change in } X \]
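To see how quickly the approximation between the change in ln X and the relative change in X deteriorates, the following sketch compares the two for changes of increasing size. The starting value of 100 and the ending values are hypothetical numbers chosen only for illustration.

```python
import math

def log_change(x_prev, x_curr):
    """Change in the natural log: ln X_t - ln X_(t-1)."""
    return math.log(x_curr) - math.log(x_prev)

def relative_change(x_prev, x_curr):
    """Relative (proportional) change: (X_t - X_(t-1)) / X_(t-1)."""
    return (x_curr - x_prev) / x_prev

# Hypothetical starting value and ending values (not from the text).
x_prev = 100.0
for x_curr in (101.0, 105.0, 120.0, 200.0):
    print(
        f"X: {x_prev:.0f} -> {x_curr:.0f}   "
        f"log change = {log_change(x_prev, x_curr):.4f}   "
        f"relative change = {relative_change(x_prev, x_curr):.4f}"
    )
```

For a 1 percent change the two measures are nearly identical, whereas for a doubling of X they differ substantially, which is why the text restricts the approximation to small changes.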


CHAPTER 6 DUMMY VARIABLE REGRESSION MODELS

In all the linear regression models considered so far the dependent variable Y and the explanatory variables, the X’s, have been numerical or quantitative. But this may not always be the case; there are occasions when the explanatory variable(s) can be qualitative in nature. These qualitative variables, often known as dummy variables, have some alternative names used in the literature, such as indicator variables, binary variables, categorical variables, and dichotomous variables. In this chapter we will present several illustrations to show how the dummy variables enrich the linear regression model. For the bulk of this chapter we will continue to assume that the dependent variable is numerical.

6.1 THE NATURE OF DUMMY VARIABLES

Frequently in regression analysis the dependent variable is influenced not only by variables that can be quantified on some well-defined scale (e.g., income, output, costs, prices, weight, temperature) but also by variables that are basically qualitative in nature (e.g., gender, race, color, religion, nationality, strikes, political party affiliation, marital status). For example, some researchers have reported that, ceteris paribus, female college teachers are found to earn less than their male counterparts, and, similarly, that the average score of female students on the math part of the S.A.T. examination is less than that of their male counterparts (see Table 2-15, found on the textbook’s Web site). Whatever the reason for this difference, qualitative variables such as gender should be included among the explanatory variables when problems of this type are encountered. Of course, there are other examples that also could be cited.

Such qualitative variables usually indicate the presence or absence of a “quality” or an attribute, such as male or female, black or white, Catholic or non-Catholic, citizens or non-citizens. One method of “quantifying” these attributes is by constructing artificial variables that take on values of 0 or 1, 0 indicating the absence of an attribute and 1 indicating the presence (or possession) of that attribute. For example, 1 may indicate that a person is a female and 0 may designate a male, or 1 may indicate that a person is a college graduate and 0 that he or she is not, or 1 may indicate membership in the Democratic party and 0 membership in the Republican party. Variables that assume values such as 0 and 1 are called dummy variables. We denote the dummy explanatory variables by the symbol D rather than by the usual symbol X to emphasize that we are dealing with a qualitative variable.

Dummy variables can be used in regression analysis just as readily as quantitative variables. As a matter of fact, a regression model may contain only dummy explanatory variables. Regression models that contain only dummy explanatory variables are called analysis-of-variance (ANOVA) models. Consider the following example of the ANOVA model:

\[ Y_i = B_1 + B_2 D_i + u_i \tag{6.1} \]

where Y = annual expenditure on food ($)
      Di = 1 if female
         = 0 if male

Note that model (6.1) is like the two-variable regression models encountered previously except that instead of a quantitative explanatory variable X, we have a qualitative or dummy variable D. As noted earlier, from now on we will use D to denote a dummy variable.

Assuming that the disturbances ui in model (6.1) satisfy the usual assumptions of the classical linear regression model (CLRM), we obtain from model (6.1) the following:¹

Mean food expenditure, males:

\[ E(Y_i \mid D_i = 0) = B_1 + B_2(0) = B_1 \tag{6.2} \]

Mean food expenditure, females:

\[ E(Y_i \mid D_i = 1) = B_1 + B_2(1) = B_1 + B_2 \tag{6.3} \]

From these regressions we see that the intercept term B1 gives the average or mean food expenditure of males (that is, the category for which the dummy variable gets the value of zero) and that the “slope” coefficient B2 tells us by how much the mean food expenditure of females differs from the mean food expenditure of males; (B1 + B2) gives the mean food expenditure for females. Since the dummy variable takes values of 0 and 1, it is not legitimate to call B2 the slope coefficient, since there is no (continuous) regression line involved here. It is better to call it the differential intercept coefficient because it tells by how much the value of the intercept term differs between the two categories. In the present context, the differential intercept term tells by how much the mean food expenditure of females differs from that of males.

A test of the null hypothesis that there is no difference in the mean food expenditure of the two sexes (i.e., B2 = 0) can be made easily by running regression (6.1) in the usual ordinary least squares (OLS) manner and finding out whether or not on the basis of the t test the computed b2 is statistically significant.

¹ Since dummy variables generally take on values of 1 or 0, they are nonstochastic; that is, their values are fixed. And since we have assumed all along that our X variables are fixed in repeated sampling, the fact that one or more of these X variables are dummies does not create any special problems insofar as estimation of model (6.1) is concerned. In short, dummy explanatory variables do not pose any new estimation problems and we can use the customary OLS method to estimate the parameters of models that contain dummy explanatory variables.

Example 6.1. Annual Food Expenditure of Single Male and Single Female Consumers

Table 6-1 gives data on annual food expenditure ($) and annual after-tax income ($) for males and females for the year 2000 to 2001.

From the data given in Table 6-1, we can construct Table 6-2. For the moment, just concentrate on the first three columns of this table, which relate to expenditure on food, the dummy variable taking the value of 1 for females and 0 for males, and after-tax income.

TABLE 6-1
FOOD EXPENDITURE IN RELATION TO AFTER-TAX INCOME, SEX, AND AGE

Age      Food expenditure,   After-tax income,   Food expenditure,   After-tax income,
         female ($)          female ($)          male ($)            male ($)
< 25     1983                11557               2230                11589
25–34    2987                29387               3757                33328
35–44    2993                31463               3821                36151
45–54    3156                29554               3291                35448
55–64    2706                25137               3429                32988
> 65     2217                14952               2533                20437

Note: The food expenditure and after-tax income data are averages based on the actual number of people in various age groups. The actual numbers run into the thousands.

Source: Consumer Expenditure Survey, Bureau of Labor Statistics, http://Stats.bls.gov/Cex/CSXcross.htm.

TABLE 6-2
FOOD EXPENDITURE IN RELATION TO AFTER-TAX INCOME AND SEX

Observation   Food expenditure   After-tax income   Sex
 1            1983.000           11557.00           1
 2            2987.000           29387.00           1
 3            2993.000           31463.00           1
 4            3156.000           29554.00           1
 5            2706.000           25137.00           1
 6            2217.000           14952.00           1
 7            2230.000           11589.00           0
 8            3757.000           33328.00           0
 9            3821.000           36151.00           0
10            3291.000           35448.00           0
11            3429.000           32988.00           0
12            2533.000           20437.00           0

Notes: Food expenditure = Expenditure on food in dollars. After-tax income = After-tax income in dollars. Sex = 1 if female, 0 if male.
Source: Extracted from Table 6-1.

Regressing food expenditure on the gender dummy variable, we obtain the following results.

\[
\begin{aligned}
\hat{Y}_i &= 3176.833 - 503.1667\,D_i \\
\text{se} &= (233.0446) \quad (329.5749) \\
t &= (13.6318) \quad (-1.5267) \qquad r^2 = 0.1890
\end{aligned}
\tag{6.4}
\]

where Y = food expenditure ($) and D = 1 if female, 0 if male.

As these results show, the mean food expenditure of males is about $3,177 and that of females is (3176.833 − 503.1667) = 2673.6663, or about $2,674. But what is interesting to note is that the estimated coefficient of Di is not statistically significant, for its t value is only about −1.53 and its p value is about 15 percent. This means that although the numerical values of the male and female food expenditures are different, statistically there is no significant difference between the two numbers. Does this finding make practical (as opposed to statistical) sense? We will soon find out.

We can look at this problem from a different perspective. If you simply take the averages of the male and female food expenditure figures separately, you will see that these averages are $3176.833 and $2673.667. These numbers are the same as those that we obtained on the basis of regression (6.4). What this means is that the dummy variable regression (6.4) is simply a device to find out if two mean values are different. In other words, a regression on an intercept and a dummy variable is a simple way of finding out if the mean values of two groups differ. If the dummy coefficient B2 is statistically significant (at the chosen level of significance), we say that the two means are statistically different. If it is not statistically significant, we say that the two means are not statistically different. In our example, it seems they are not.
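The equivalence between regression (6.4) and a comparison of two group means can be checked directly from the data in Table 6-2. The sketch below uses only the Python standard library and the textbook two-variable OLS formulas; the variable names are illustrative choices, not the book's notation.

```python
import math
from statistics import mean

# Food expenditure ($) and sex dummy (1 = female, 0 = male) from Table 6-2.
food = [1983, 2987, 2993, 3156, 2706, 2217, 2230, 3757, 3821, 3291, 3429, 2533]
female = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

n = len(food)
d_bar, y_bar = mean(female), mean(food)

# Two-variable OLS: slope = S_dy / S_dd, intercept = y_bar - slope * d_bar.
s_dy = sum((d - d_bar) * (y - y_bar) for d, y in zip(female, food))
s_dd = sum((d - d_bar) ** 2 for d in female)
b2 = s_dy / s_dd
b1 = y_bar - b2 * d_bar

# Standard error and t value of the differential intercept coefficient.
residuals = [y - (b1 + b2 * d) for d, y in zip(female, food)]
sigma2_hat = sum(e ** 2 for e in residuals) / (n - 2)
se_b2 = math.sqrt(sigma2_hat / s_dd)
t_b2 = b2 / se_b2

# Plain group means for comparison.
male_mean = mean(y for d, y in zip(female, food) if d == 0)
female_mean = mean(y for d, y in zip(female, food) if d == 1)

print(f"b1 = {b1:.3f}, b2 = {b2:.4f}, se(b2) = {se_b2:.4f}, t = {t_b2:.4f}")
print(f"male mean = {male_mean:.3f}, female mean = {female_mean:.3f}")
```

The intercept reproduces the male mean and the intercept plus the dummy coefficient reproduces the female mean, as the text describes.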

Notice that in the present example the dummy variable “sex” has two categories. We have assigned the value of 1 to female consumers and the value of 0 to male consumers. The intercept value in such an assignment represents the mean value of the category that gets the value of 0, or male, in the present case. We can therefore call the category that gets the value of 0 the base, or reference, or benchmark, or comparison, category. To compute the mean value of food expenditure for females, we have to add the value of the coefficient of the dummy variable to the intercept value, which represents food expenditure of females, as shown before.

A natural question that arises is: Why did we choose male as the reference category and not female? If we have only two categories, as in the present instance, it does not matter which category gets the value of 1 and which gets the value of 0. If you want to treat female as the reference category (i.e., it gets the value of 0), Eq. (6.4) now becomes:

\[
\begin{aligned}
\hat{Y}_i &= 2673.667 + 503.1667\,D_i \\
\text{se} &= (233.0446) \quad (329.5749) \\
t &= (11.4727) \quad (1.5267) \qquad r^2 = 0.1890
\end{aligned}
\tag{6.5}
\]

where Di = 1 for male and 0 for female.

In either assignment of the dummy variable, the mean food consumption expenditure of the two sexes remains the same, as it should. Comparing Equations (6.4) and (6.5), we see the r² values remain the same, and the absolute value of the dummy coefficients and their standard errors remain the same. The only change is in the numerical value of the intercept term and its t value.

Another question: Since we have two categories, why not assign two dummies to them? To see why this is inadvisable, consider the following model:

\[ Y_i = B_1 + B_2 D_{2i} + B_3 D_{3i} + u_i \tag{6.6} \]

where Y is expenditure on food, D2 = 1 for female and 0 for male, and D3 = 1 for male and 0 for female. This model cannot be estimated because of perfect collinearity (i.e., perfect linear relationship) between D2 and D3. To see this clearly, suppose we have a sample of two females and three males. The data matrix will look something like the following.

                 Intercept   D2   D3
Male     Y1      1           0    1
Male     Y2      1           0    1
Female   Y3      1           1    0
Male     Y4      1           0    1
Female   Y5      1           1    0

The first column in this data matrix represents the common intercept term, B1. It is easy to verify that D2 = (1 − D3) or D3 = (1 − D2); that is, the two dummy variables are perfectly collinear. Also, if you add up columns D2 and D3, you will get the first column of the data matrix. In any case, we have the situation of perfect collinearity. As we noted in Chapter 3, in cases of perfect collinearity among explanatory variables, it is not possible to obtain unique estimates of the parameters.

There are various ways to mitigate the problem of perfect collinearity. If a model contains the (common) intercept, the simplest way is to assign the dummies the way we did in model (6.4), namely, to use only one dummy if a qualitative variable has two categories, such as sex. In this case, drop the column D2 or D3 in the preceding data matrix. The general rule is: If a model has the common intercept, B1, and if a qualitative variable has m categories, introduce only (m − 1) dummy variables. In our example, sex has two categories, hence we introduced only a single dummy variable. If this rule is not followed, we will fall into what is known as the dummy variable trap, that is, the situation of perfect collinearity or multicollinearity, if there is more than one perfect relationship among the variables.²
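One way to see why model (6.6) cannot be estimated is to note that OLS requires the matrix X'X to be invertible, and with an intercept plus both dummies its determinant is zero. The sketch below builds the five-observation data matrix described above (two females, three males) and computes the determinant with and without the redundant dummy. The helper functions are illustrative and written for tiny matrices only.

```python
def gram_matrix(rows):
    """Return X'X for a data matrix given as a list of rows."""
    k = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]

def determinant(m):
    """Determinant by cofactor expansion along the first row (tiny matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, value in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * value * determinant(minor)
    return total

# Columns: intercept, D2 (1 = female), D3 (1 = male).
X_trap = [
    [1, 0, 1],  # male
    [1, 0, 1],  # male
    [1, 1, 0],  # female
    [1, 0, 1],  # male
    [1, 1, 0],  # female
]

# Following the (m - 1) rule: keep the intercept and D2, drop D3.
X_ok = [row[:2] for row in X_trap]

det_trap = determinant(gram_matrix(X_trap))
det_ok = determinant(gram_matrix(X_ok))

print("det(X'X) with both dummies:", det_trap)  # zero: no unique OLS solution
print("det(X'X) with one dummy:  ", det_ok)     # nonzero: OLS is estimable
```

Because D2 + D3 equals the intercept column in every row, the three columns are linearly dependent, which is exactly the dummy variable trap.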

Example 6.2. Union Membership and Right-to-Work Laws

Several states in the United States have passed right-to-work laws that prohibit union membership as a prerequisite for employment and collective bargaining. Therefore, we would expect union membership to be lower in those states that have such laws compared to those states that do not. To see if this is the case, we have collected the data shown in Table 6-3. For now concentrate only on the variable PVT (% of private sector employees in trade unions in 2006) and RWL, a dummy that takes a value of 1 if a state has a right-to-work law and 0 if a state does not have such a law. Note that we are assigning one dummy to distinguish the right- and non-right-to-work-law states to avoid the dummy variable trap.

The regression results based on the data for 50 states and the District of Columbia are as follows:

\[
\begin{aligned}
\widehat{\text{PVT}}_i &= 15.480 - 7.161\,\text{RWL}_i \\
\text{se} &= (0.758) \quad (1.181) \\
t &= (20.421)^{*} \quad (-6.062)^{*} \qquad r^2 = 0.429
\end{aligned}
\tag{6.7}
\]

*p values are extremely small.

Note: RWL = 1 for right-to-work-law states.

In the states that do not have right-to-work laws, the average union membership is about 15.5 percent. But in those states that have such laws, the average union membership is (15.48 − 7.161) = 8.319 percent. Since the dummy coefficient is statistically significant, it seems that there is indeed a difference in union membership between states that have the right-to-work laws and the states that do not have such laws.

It is instructive to see the scattergram of PVT and RWL, which is shown in Figure 6-1.

As you can see, the observations are concentrated at two extremes, 0 (no RWL states) and 1 (RWL states). For comparison, we have also shown the average level of unionization (%) in the two groups. The individual observations are scattered about their respective mean values.

² Another way to resolve the perfect collinearity problem is to keep as many dummies as the number of categories but to drop the common intercept term, B1, from the model; that is, run the regression through the origin. But we have already warned about the problems involved in this procedure in Chapter 5.
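The coefficients in Eq. (6.7) can be recovered from the 51 (PVT, RWL) pairs of Table 6-3 as two group means. The following is a minimal sketch using the Python standard library; the order of the observations follows the extracted table and has no bearing on the result.

```python
from statistics import mean

# (PVT, RWL) pairs from Table 6-3: 50 states plus the District of Columbia.
observations = [
    (10.6, 1), (24.7, 0), (9.7, 0), (6.5, 1), (17.8, 0), (9.2, 0),
    (16.6, 0), (12.8, 0), (13.6, 0), (7.3, 1), (5.4, 1), (24.2, 0),
    (6.4, 1), (15.2, 0), (12.9, 1), (13.1, 1), (8.7, 1), (11.1, 0),
    (6.5, 1), (13.8, 0), (14.5, 0), (14.0, 0), (20.6, 0), (17.0, 0),
    (8.9, 1), (11.9, 0), (15.6, 0), (9.7, 1), (17.7, 1), (11.2, 0),
    (20.6, 0), (11.4, 0), (26.3, 0), (3.9, 1), (7.6, 1), (15.4, 0),
    (8.5, 1), (15.4, 0), (16.6, 0), (15.8, 0), (5.9, 1), (7.7, 1),
    (6.4, 1), (5.7, 0), (6.8, 1), (12.2, 0), (4.8, 1), (21.4, 0),
    (14.7, 0), (15.4, 0), (9.4, 1),
]

no_rwl = [pvt for pvt, rwl in observations if rwl == 0]
with_rwl = [pvt for pvt, rwl in observations if rwl == 1]

# With a single 0/1 regressor, OLS reduces to a comparison of group means:
# intercept = mean of the base group, dummy coefficient = difference in means.
b1 = mean(no_rwl)
b2 = mean(with_rwl) - mean(no_rwl)

print(f"states without RWL: {len(no_rwl)}, mean PVT = {b1:.3f}")
print(f"states with RWL:    {len(with_rwl)}, mean PVT = {b1 + b2:.3f}")
print(f"differential intercept = {b2:.3f}")
```

The intercept and the differential intercept agree with the estimates reported in Eq. (6.7) to three decimal places.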

TABLE 6-3
UNION MEMBERSHIP IN THE PRIVATE SECTOR AND RIGHT-TO-WORK LAWS

PVT    RWL      PVT    RWL      PVT    RWL
10.6   1        11.1   0         7.6   1
24.7   0         6.5   1        15.4   0
 9.7   0        13.8   0         8.5   1
 6.5   1        14.5   0        15.4   0
17.8   0        14.0   0        16.6   0
 9.2   0        20.6   0        15.8   0
16.6   0        17.0   0         5.9   1
12.8   0         8.9   1         7.7   1
13.6   0        11.9   0         6.4   1
 7.3   1        15.6   0         5.7   0
 5.4   1         9.7   1         6.8   1
24.2   0        17.7   1        12.2   0
 6.4   1        11.2   0         4.8   1
15.2   0        20.6   0        21.4   0
12.9   1        11.4   0        14.7   0
13.1   1        26.3   0        15.4   0
 8.7   1         3.9   1         9.4   1

Notes: PVT = Percent unionized in the private sector. RWL = 1 for right-to-work-law states, 0 otherwise.

Sources: http://www.dol.gov/esa/whd/state/righttowork.htm. http://www.bls.gov/news.release/union2.t05.htm.

ANOVA models like regressions (6.4) and (6.7), although common in fields such as sociology, psychology, education, and market research, are not that common in economics. In most economic research a regression model contains some explanatory variables that are quantitative and some that are qualitative. Regression models containing a combination of quantitative and qualitative variables are called analysis-of-covariance (ANCOVA) models, and in the remainder of this chapter we will deal largely with such models. ANCOVA models are an extension of the ANOVA models in that they provide a method of statistically controlling the effects of quantitative explanatory variables, called covariates or control variables, in a model that includes both quantitative and qualitative, or dummy, explanatory variables. As we will show, if we exclude covariates from a model, the regression results are subject to model specification error.

6.2 ANCOVA MODELS: REGRESSION ON ONE QUANTITATIVE VARIABLE AND ONE QUALITATIVE VARIABLE WITH TWO CATEGORIES: EXAMPLE 6.1 REVISITED

As an example of the ANCOVA model, we reconsider Example 6.1 by bringing in disposable income (i.e., income after taxes), a covariate, as an explanatory variable.

\[ Y_i = B_1 + B_2 D_i + B_3 X_i + u_i \tag{6.8} \]

where Y = expenditure on food ($), X = after-tax income ($), and D = 1 for female and 0 for male.

Using the data given in Table 6-2, we obtained the following regression results:

\[
\begin{aligned}
\hat{Y}_i &= 1506.244 - 228.9868\,D_i + 0.0589\,X_i \\
\text{se} &= (188.0096) \quad (107.0582) \quad (0.0061) \\
t &= (8.0115) \quad (-2.1388) \quad (9.6417) \\
p &= (0.000)^{*} \quad (0.0611) \quad (0.000)^{*} \qquad R^2 = 0.9284
\end{aligned}
\tag{6.9}
\]

*Denotes extremely small values.

FIGURE 6-1 Unionization in private sector (PVT) versus right-to-work-law (RWL) states. [Scattergram: PVT (%) on the vertical axis, RWL (0 or 1) on the horizontal axis; group means of about 15.5% for RWL = 0 and about 8.3% for RWL = 1.]

These results are noteworthy for several reasons. First, in Eq. (6.4), the dummy coefficient was statistically insignificant, but now it is significant at about the 6 percent level (p = 0.0611). (Why?) It seems that in estimating Eq. (6.4) we committed a specification error because we excluded a covariate, the after-tax income variable, which a priori is expected to have an important influence on consumption expenditure. Of course, we did this for pedagogic reasons. This shows how specification errors can have dramatic effects on the regression results. Second, since Equation (6.9) is a multiple regression, we can now say that, holding after-tax income constant, the mean food expenditure for males is about $1,506, and for females it is (1506.244 − 228.9868) or about $1,277, and these means are statistically significantly different. Third, holding gender differences constant, the income coefficient of 0.0589 means the mean food expenditure goes up by about 6 cents for every additional dollar of after-tax income. In other words, the marginal propensity of food consumption (additional expenditure on food for an additional dollar of disposable income) is about 6 cents.

From the preceding discussion, we can now derive the following regressions from Eq. (6.9) for the two groups:

Mean food expenditure regression for females:

\[ \hat{Y}_i = 1277.2574 + 0.0589\,X_i \tag{6.10} \]

Mean food expenditure regression for males:

\[ \hat{Y}_i = 1506.2440 + 0.0589\,X_i \tag{6.11} \]

These two regression lines are depicted in Figure 6-2.

FIGURE 6-2 Food expenditure in relation to after-tax income. [Two parallel upward-sloping lines, Eq. (6.11) for males above Eq. (6.10) for females; food expenditure (Y) on the vertical axis, after-tax income (X) on the horizontal axis.]

As you can see from this figure, the two regression lines differ in their intercepts but their slopes are the same. In other words, these two regression lines are parallel.
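The point estimates in Eq. (6.9), and hence the two parallel lines in Eqs. (6.10) and (6.11), can be reproduced from Table 6-2 by solving the OLS normal equations (X'X)b = X'y. The sketch below does this with a small Gauss-Jordan routine written for illustration; in practice a statistics package would be used, and the function name here is a hypothetical helper rather than any library's API.

```python
def ols(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Table 6-2: food expenditure ($), after-tax income ($), sex (1 = female).
food = [1983, 2987, 2993, 3156, 2706, 2217, 2230, 3757, 3821, 3291, 3429, 2533]
income = [11557, 29387, 31463, 29554, 25137, 14952,
          11589, 33328, 36151, 35448, 32988, 20437]
female = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Regressors in the order of Eq. (6.8): intercept, D, X.
X = [[1.0, d, x] for d, x in zip(female, income)]
b1, b2, b3 = ols(X, food)

print(f"males:   Y-hat = {b1:.4f} + {b3:.4f} X")
print(f"females: Y-hat = {b1 + b2:.4f} + {b3:.4f} X")
```

Both fitted lines share the slope b3 and differ only in their intercepts, b1 for males and b1 + b2 for females.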

A question: By holding sex constant, we have said that the marginal propensity of food consumption is about 6 cents. Could there also be a difference in the marginal propensity of food consumption between the two sexes? In other words, could the slope coefficient B3 in Equation (6.8) be statistically different for the two sexes, just as there was a statistical difference in their intercept values? If that turned out to be the case, then Eq. (6.8) and the results based on this model given in Eq. (6.9) would be suspect; that is, we would be committing another specification error. We explore this question in Section 6.5.

6.3 REGRESSION ON ONE QUANTITATIVE VARIABLE AND ONE QUALITATIVE VARIABLE WITH MORE THAN TWO CLASSES OR CATEGORIES

In the examples we have considered so far we had a qualitative variable with only two categories or classes, such as male or female, or right-to-work laws or no right-to-work laws. But the dummy variable technique is quite capable of handling models in which a qualitative variable has more than two categories.

To illustrate this, consider the data given in Table 6-4 on the textbook’s Web site. This table gives data on the acceptance rates (in percents) of the top 65 graduate schools (as ranked by U.S. News), among other things. For the time being, we will concentrate only on the schools’ acceptance rates. Suppose we are interested in finding out if there are statistically significant differences in the acceptance rates among the 65 schools included in the analysis. For this purpose, the schools have been divided into three regions: (1) South (22 schools in all), (2) Northeast and North Central (32 schools in all), and (3) West (10 schools in all). The qualitative variable here is “region,” which has the three categories just listed.

Now consider the following model:

\[ \text{Accept}_i = B_1 + B_2 D_{2i} + B_3 D_{3i} + u_i \tag{6.12} \]

where D2 = 1 if the school is in the Northeastern or North Central region
         = 0 otherwise (i.e., in one of the other 2 regions)
      D3 = 1 if the school is in the Western region
         = 0 otherwise (i.e., in one of the other 2 regions)

Since the qualitative variable region has three classes, we have assigned only two dummies. Here we are treating the South as the base or reference category. Table 6-4 includes these dummy variables.

From Equation (6.12) we can easily obtain the mean acceptance rate in the three regions as follows:

Mean acceptance rate for schools in the Northeastern and North Central region:

\[ E(\text{Accept}_i \mid D_{2i} = 1, D_{3i} = 0) = B_1 + B_2 \tag{6.13} \]

Mean acceptance rate for schools in the Western region:

(6.14)

Mean acceptance rate for schools in the Southern region:

(6.15)

As this exercise shows, the common intercept, B1, represents the mean accep- tance rate for schools that are assigned the dummy values of (0, 0). Notice that B2 and B3, being the differential intercepts, tell us by how much the mean accep- tance rates differ among schools in the different regions. Thus, B2 tells us by how much the mean acceptance rates of the schools in the Northeastern and North Central region differ from those in the Southern region. Analogously, B3 tells us by how much the mean acceptance rates of the schools in the Western region dif- fer from those in the Southern region. To get the actual mean acceptance rate in the Northeastern and North Central region, we have to add B2 to B1, and the ac- tual mean acceptance rate in the Western region is found by adding B3 to B1.

Before we present the statistical results, note carefully that we are treating the South as the reference region. Hence all acceptance rate comparisons are in re- lation to the South. If we had chosen the West as our reference instead, then we would have to estimate Eq. (6.12) with the appropriate dummy assignment. Therefore, once we go beyond the simple dichotomous classification (female or male, union or nonunion, etc.), we must be very careful in specifying the base category, for all comparisons are in relation to it. Changing the base category will change the compar- isons, but it will not change the substance of the regression results. Of course, we can estimate Eq. (6.12) with any category as the base category.

The regression results of model (6.12) are as follows:

$$
\begin{aligned}
\widehat{Accept}_i &= 44.541 - 10.680\,D_{2i} - 12.501\,D_{3i} \\
t &= (14.38)\quad(-2.67)\quad(-2.26) \\
p &= (0.000)\quad(0.010)\quad(0.028) \\
R^2 &= 0.122
\end{aligned}
\tag{6.16}
$$

These results show that the mean acceptance rate in the South (reference category) was about 45 percent. The differential intercept coefficients of D2i and D3i are statistically significant (Why?). This suggests that there is a statistically significant difference in the mean acceptance rates between the Northeastern/North Central and the Southern schools, as well as between the Western and Southern schools.

In passing, note that the dummy variables will simply point out the differences, if they exist, but they will not suggest the reasons for the differences. Acceptance rates in the South may be higher for a variety of reasons.

As you can see, Eq. (6.12) and its empirical counterpart in Eq. (6.16) are ANOVA models. What happens if we consider an ANCOVA model by bringing in a quantitative explanatory variable, a covariate, such as the annual tuition per school? The data on this variable are already contained in Table 6-4. Incorporating this variable, we get the following regression (see Figure 6-3):

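$$
\begin{aligned}
\widehat{Accept}_i &= 79.033 - 5.670\,D_{2i} - 11.14\,D_{3i} - 0.0011\,Tuition_i \\
t &= (15.53)\quad(-1.91)\quad(-2.79)\quad(-7.55) \\
p &= (0.000)^{*}\quad(0.061)^{**}\quad(0.007)^{*}\quad(0.000)^{*} \\
R^2 &= 0.546
\end{aligned}
\tag{6.17}
$$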

A comparison of Equations (6.17) and (6.16) brings out a few surprises. Holding tuition costs constant, we now see that, at the 5 percent level of significance, there does not appear to be a significant difference in mean acceptance rates between schools in the Northeastern/North Central and the Southern regions (Why?). As we saw before, however, there still is a statistically significant difference in mean acceptance rates between the Western and Southern schools, even while holding the tuition costs constant. In fact, it appears that the Western schools’ average acceptance rate is about 11 percentage points lower than that of the Southern schools while accounting for tuition costs. Since we see a difference in results between Eqs. (6.17) and (6.16), there is a chance we have committed a specification error in the earlier model by not including the tuition costs. This is similar to the finding regarding the food expenditure function with and without after-tax income. As noted before, omitting a covariate may lead to model specification errors.

*Statistically significant at the 5% level. **Not statistically significant at the 5% level; however, at a 10% level, this variable would be significant.

FIGURE 6-3 Average acceptance rates and tuition costs. [Plot of average acceptance rate (vertical axis) against tuition cost (horizontal axis) showing two parallel lines: Accepti = 79.033 - 0.0011Tuitioni for the Northeast/North Central and South, and Accepti = 67.893 - 0.0011Tuitioni for the West.]

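The slope of -0.0011 suggests that if tuition costs increase by $1, we should expect to see a decrease of about 0.0011 percentage points in a school’s acceptance rate, on average. Equivalently, a $1,000 increase in tuition goes with a decrease of about 1.1 percentage points.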
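As a quick check on how the ANCOVA estimates are used, the sketch below plugs the coefficients reported in Eq. (6.17) into a small prediction function. The function name and the tuition figure of $20,000 are hypothetical choices made for illustration, and tuition is assumed to be measured in dollars.

```python
# Coefficients as reported in Eq. (6.17); tuition is assumed to be in dollars.
B1, B2, B3, B4 = 79.033, -5.670, -11.14, -0.0011

def predicted_accept(tuition, d2=0, d3=0):
    """Predicted acceptance rate (percent) given tuition and the two region dummies."""
    return B1 + B2 * d2 + B3 * d3 + B4 * tuition

south_20k = predicted_accept(20_000)        # reference region (D2 = 0, D3 = 0)
west_20k = predicted_accept(20_000, d3=1)   # Western region (D3 = 1)

# Effect of a $1,000 tuition increase, holding region fixed.
change_per_1000 = predicted_accept(21_000) - predicted_accept(20_000)

print(round(south_20k, 3), round(west_20k, 3), round(change_per_1000, 3))
```

At any common tuition level the West and South predictions differ by the coefficient of D3, which is why the two lines in a plot of these equations are parallel.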

We also ask the same question that we raised earlier about our food expenditure example. Could the slope coefficient of tuition vary from region to region? We will answer this question in Section 6.5.

6.4 REGRESSION ON ONE QUANTITATIVE EXPLANATORY VARIABLE AND MORE THAN ONE QUALITATIVE VARIABLE

The technique of dummy variables can be easily extended to handle more than one qualitative variable. To that end, consider the following model:

$$Y_i = B_1 + B_2 D_{2i} + B_3 D_{3i} + B_4 X_i + u_i \tag{6.18}$$

where Y = hourly wage in dollars
X = education (years of schooling)
D2 = 1 if female, 0 if male
D3 = 1 if nonwhite and non-Hispanic, 0 if otherwise

In this model sex and race are qualitative explanatory variables and education is a quantitative explanatory variable.3

To estimate the preceding model, we obtained data on 528 individuals, which gave the following results.4

$$
\begin{aligned}
\hat{Y}_i &= -0.2610 - 2.3606\,D_{2i} - 1.7327\,D_{3i} + 0.8028\,X_i \\
t &= (-0.2357)^{**}\quad(-5.4873)^{*}\quad(-2.1803)^{*}\quad(9.9094)^{*} \\
R^2 &= 0.2032;\quad n = 528
\end{aligned}
\tag{6.19}
$$

*indicates p value less than 5%; **indicates p value greater than 5%

Let us interpret these results. First, what is the base category here, since we now have two qualitative variables? It is white and/or Hispanic male. Second, holding the level of education and race constant, on average, women earn less than men by about $2.36 per hour. Similarly, holding the level of education and sex constant, on average, nonwhite/non-Hispanics earn less than the base category by about $1.73 per hour. Third, holding sex and race constant, mean hourly wages go up by about 80 cents per hour for every additional year of education.

3If we were to define education as less than high school, high school, and more than high school, education would also be a dummy variable with three categories, which means we would have to use two dummies to represent the three categories.

4These data were originally obtained by Ernst Berndt and are reproduced from Arthur S. Goldberger, Introductory Econometrics, Harvard University Press, Cambridge, Mass., 1998, Table 1.1. These data were derived from the Current Population Survey conducted in May 1985.


Interaction Effects

Although the results given in Equation (6.19) make sense, implicit in Equation (6.18) is the assumption that the differential effect of the sex dummy D2 is constant across the two categories of race and the differential effect of the race dummy D3 is also constant across the two sexes. That is to say, if the mean hourly wage is higher for males than for females, this is so whether they are nonwhite/non-Hispanic or not. Likewise, if, say, nonwhite/non-Hispanics have lower mean wages, this is so regardless of sex.

In many cases such an assumption may be untenable. As a matter of fact, U.S. courts are full of cases charging all kinds of discrimination from a variety of groups. A female nonwhite/non-Hispanic may earn lower wages than a male nonwhite/non-Hispanic. In other words, there may be interaction between the qualitative variables, D2 and D3. Therefore, their effect on mean Y may not be simply additive, as in Eq. (6.18), but may be multiplicative as well, as in the following model:

$$Y_i = B_1 + B_2 D_{2i} + B_3 D_{3i} + B_4 (D_{2i} D_{3i}) + B_5 X_i + u_i \tag{6.20}$$

The dummy D2iD3i, the product of two dummies, is called the interaction dummy, for it gives the joint, or simultaneous, effect of two qualitative variables.

From Equation (6.20) we can obtain:

$$E(Y_i \mid D_{2i} = 1, D_{3i} = 1, X_i) = (B_1 + B_2 + B_3 + B_4) + B_5 X_i \tag{6.21}$$

which is the mean hourly wage function for female nonwhite/non-Hispanic workers. Observe that:

B2 = differential effect of being female
B3 = differential effect of being a nonwhite/non-Hispanic
B4 = differential effect of being a female nonwhite/non-Hispanic

which shows that the mean hourly wage of female nonwhite/non-Hispanics is different (by B4) from the mean hourly wage of females or nonwhite/non-Hispanics. Depending on the statistical significance of the various dummy coefficients, we can arrive at specific cases.

Using the data underlying Eq. (6.19), we obtained the following regression results:

$$
\begin{aligned}
\hat{Y}_i &= -0.2610 - 2.3606\,D_{2i} - 1.7327\,D_{3i} + 2.1289\,D_{2i}D_{3i} + 0.8028\,X_i \\
t &= (-0.2357)^{**}\quad(-5.4873)^{*}\quad(-2.1803)^{*}\quad(1.7420)^{!}\quad(9.9095)^{*} \\
R^2 &= 0.2032;\quad n = 528
\end{aligned}
\tag{6.22}
$$

*p value below 5%, ! = p value about 8%, **p value greater than 5%

Holding the level of education constant, if we add all the dummy coefficients, we obtain (-2.3606 - 1.7327 + 2.1289) = -1.964. This would suggest that the mean hourly wage of nonwhite/non-Hispanic female workers is lower by about $1.96, which is between the value of 2.3606 (sex difference alone) and 1.7327 (race difference alone). So, you can see how the interaction dummy modifies the effect of the two coefficients taken individually.

Incidentally, if you select 5% as the level of significance, the interaction dummy is not statistically significant at this level, so we cannot reject the hypothesis that there is no interaction effect of the two dummies, and we are back to Eq. (6.18).
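The arithmetic of the interaction dummy can be laid out for all four sex-race groups. The sketch below uses the dummy coefficients as reported in Eq. (6.22); the function name `differential` is introduced here for illustration only.

```python
# Dummy coefficients as reported in Eq. (6.22).
b_female, b_nonwhite, b_interaction = -2.3606, -1.7327, 2.1289

def differential(d2, d3):
    """Intercept shift relative to the base category (D2 = 0, D3 = 0), education held fixed."""
    return b_female * d2 + b_nonwhite * d3 + b_interaction * d2 * d3

# Print the shift in mean hourly wage (dollars) for each combination of the two dummies.
for d2 in (0, 1):
    for d3 in (0, 1):
        print(f"D2={d2}, D3={d3}: {differential(d2, d3):+.4f}")
```

The interaction coefficient enters only when both dummies equal 1, which is why the combined differential is not simply the sum of the two separate ones.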

A Generalization

As you can imagine, we can extend our model to include more than one quantitative variable and more than two qualitative variables. However, we must be careful that the number of dummies for each qualitative variable is one less than the number of categories of that variable. An example follows.
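The reason for using one dummy fewer than the number of categories can be seen from the rank of the data matrix. In the sketch below, with hypothetical category labels, an intercept plus a dummy for every category produces perfectly collinear columns (the dummies sum to the intercept column), whereas dropping one dummy restores full column rank.

```python
import numpy as np

# Hypothetical labels for a qualitative variable with three categories.
region = np.array(["S", "N", "W", "N", "S", "W", "W", "S"])
levels = ["S", "N", "W"]

ones = np.ones(len(region))
all_dummies = np.column_stack([(region == r).astype(float) for r in levels])

X_trap = np.column_stack([ones, all_dummies])       # intercept + 3 dummies
X_ok = np.column_stack([ones, all_dummies[:, 1:]])  # intercept + 2 dummies ("S" is the base)

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)

print(X_trap.shape[1], rank_trap)  # number of columns vs. rank
print(X_ok.shape[1], rank_ok)
```

When the rank is below the number of columns, the OLS coefficients are not uniquely determined, so one category has to serve as the base.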

Example 6.3. Campaign Contributions by Political Parties

In a study of party contributions to congressional elections in 1982, Wilhite and Theilmann obtained the following regression results, which are given in tabular form (Table 6-5) using the authors’ symbols. The dependent variable in this regression is PARTY$ (campaign contributions made by political parties to local congressional candidates). In this regression $GAP, VGAP, and PU are three quantitative variables and OPEN, DEMOCRAT, and COMM are three qualitative variables, each with two categories.

What do these results suggest? The larger the $GAP is (i.e., the opponent has substantial funding), the less the support by the national party to the local candidate is. The larger the VGAP is (i.e., the larger the margin by which the opponent won the previous election), the less money the national party is going to spend on this candidate. (This expectation is not borne out by the results for 1982.) An open race is likely to attract more funding from the national party to secure that seat for the party; this expectation is supported by the regression results. The greater the party loyalty (PU) is, the greater the party support will be, which is also supported by the results. Since the Democratic party has a smaller campaign money chest than the Republican party, the Democratic dummy is expected to have a negative sign, which it does (the intercept term for the Democratic party’s campaign contribution regression will be smaller than that of its rival). The COMM dummy is expected to have a positive sign, for if you are up for election and happen to be a member of the national committees that distribute the campaign funds, you are more likely to steer proportionately larger amounts of money toward your own election.

TABLE 6-5 AGGREGATE CONTRIBUTIONS BY U.S. POLITICAL PARTIES, 1982

Explanatory variable    Coefficient (standard error)
$GAP                    -8.189*   (1.863)
VGAP                     0.0321   (0.0223)
OPEN                     3.582*   (0.7293)
PU                      18.189*   (0.849)
DEMOCRAT                -9.986*   (0.557)
COMM                     1.734*   (0.746)
R2                       0.70
F                      188.4

Notes: Standard errors are in parentheses. *Means significant at the 0.01 level.
$GAP = A measure of the candidate’s finances
VGAP = The size of the vote differential in the previous election
OPEN = 1 for open seat races, 0 if otherwise
PU = Party unity index as calculated by Congressional Quarterly
DEMOCRAT = 1 for members of the Democratic party, 0 if otherwise
COMM = 1 for representatives who are members of the Democratic Congressional Campaign Committee or the National Republican Congressional Committee; = 0 otherwise (i.e., those who are not members of such committees)

Source: Al Wilhite and John Theilmann, “Campaign Contributions by Political Parties: Ideology versus Winning,” Atlantic Economic Journal, vol. XVII, June 1989, pp. 11–20. Table 2, p. 15 (adapted).

6.5 COMPARING TWO REGRESSIONS5

Earlier in Sec. 6.2 we raised the possibility that not only the intercepts but also the slope coefficients could vary between categories. Thus, for our food expenditure example, are the slope coefficients of the after-tax income the same for both male and female? To explore this possibility, consider the following model:

5An alternative approach to comparing two or more regressions that gives similar results to the dummy variable approach discussed below is popularly known as the Chow test, which was popularized by the econometrician Gregory Chow. The Chow test is really an application of the restricted least-squares method that we discussed in Chapter 4. For a detailed discussion of the Chow test, see Gujarati and Porter, Basic Econometrics, 5th ed., McGraw-Hill, New York, 2009, pp. 256–259.

$$Y_i = B_1 + B_2 D_i + B_3 X_i + B_4 (D_i X_i) + u_i \tag{6.23}$$

This is a modification of model (6.8) in that we have added an extra variable DiXi.

From this regression we can derive the following regressions:

Mean food expenditure function, males (Di = 0). Taking the conditional expectation of Equation (6.23), given the values of D and X, we obtain

$$E(Y_i \mid D_i = 0, X_i) = B_1 + B_3 X_i \tag{6.24}$$

Mean food expenditure function, females (Di = 1). Again, taking the conditional expectation of Eq. (6.23), we obtain

$$
\begin{aligned}
E(Y_i \mid D_i = 1, X_i) &= (B_1 + B_2 D_i) + (B_3 + B_4 D_i) X_i \\
&= (B_1 + B_2) + (B_3 + B_4) X_i, \quad \text{since } D_i = 1
\end{aligned}
\tag{6.25}
$$

Just as we called B2 the differential intercept coefficient, we can now call B4 the differential slope coefficient (also called the slope drifter), for it tells by how much the slope coefficient of the income variable differs between the two sexes or two categories. Just as (B1 + B2) gives the mean value of Y for the category that receives the dummy value of 1 when X is zero, (B3 + B4) gives the slope coefficient of the income variable for the category that receives the dummy value of 1. Notice how the introduction of the dummy variable in the additive form enables us to distinguish between the intercept coefficients of the two groups and how the introduction of the dummy variable in the interactive, or multiplicative, form (D multiplied by X) enables us to differentiate between slope coefficients of the two groups.6

Now depending on the statistical significance of the differential intercept coefficient, B2, and the differential slope coefficient, B4, we can tell whether the female and male food expenditure functions differ in their intercept values or their slope values, or both. We can think of four possibilities, as shown in Figure 6-4.

Figure 6-4(a) shows that there is no difference in the intercept or the slope coefficients of the two food expenditure regressions. That is, the two regressions are identical. This is the case of coincident regressions.

Figure 6-4(b) shows that the two slope coefficients are the same, but the intercepts are different. This is the case of parallel regressions.

6In Eq. (6.20) we allowed for interactive dummies. But a dummy could also interact with a quantitative variable.

Figure 6-4(c) shows that the two regressions have the same intercepts, but different slopes. This is the case of concurrent regressions.

Figure 6-4(d) shows that both the intercept and slope coefficients are different; that is, the two regressions are different. This is the case of dissimilar regressions.
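A minimal sketch may help fix ideas about the differential intercept and differential slope. It uses hypothetical data (not the food expenditure data of Table 6-2) and a small helper `ols` introduced here for illustration. It estimates a model of the form of Eq. (6.23) and compares it with two separate group regressions.

```python
import numpy as np

# Hypothetical data: x is a quantitative regressor, d = 1 marks the second group.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 1.5, 2.5, 3.5, 4.5, 5.5])
d = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 1.0, 2.6, 3.9, 5.7, 7.0])

def ols(X, y):
    """Return the least-squares coefficient vector for y regressed on the columns of X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

# Pooled model in the form of Eq. (6.23): intercept, D, X, and D*X.
b1, b2, b3, b4 = ols(np.column_stack([np.ones_like(x), d, x, d * x]), y)

# Separate regressions for each group.
a0, s0 = ols(np.column_stack([np.ones(5), x[d == 0]]), y[d == 0])
a1, s1 = ols(np.column_stack([np.ones(5), x[d == 1]]), y[d == 1])

print(b1, b3, "vs", a0, s0)            # group with D = 0
print(b1 + b2, b3 + b4, "vs", a1, s1)  # group with D = 1
```

The pooled regression reproduces the point estimates of the two separate regressions; what it adds is a direct test of whether B2 and B4 differ from zero.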

Returning to our example, let us first estimate Eq. (6.23) and see which of the situations depicted in Figure 6-4 prevails. The data to run this regression are already given in Table 6-2. The regression results, using EViews, are as shown in Table 6-6.

It is clear from this regression that neither the differential intercept nor the differential slope coefficient is statistically significant, suggesting that perhaps we have the situation of coincident regressions shown in Figure 6-4(a). Are these results in conflict with those given in Eq. (6.8), where we saw that the two intercepts were statistically different? If we accept the results given in Eq. (6.8), then we have the situation shown in Figure 6-4(b), the case of parallel regressions (see also Fig. 6-3). What is an econometrician to do in situations like this?

FIGURE 6-4 Comparing two regressions. [Four panels, each plotting Y against X: (a) coincident regressions; (b) parallel regressions; (c) concurrent regressions; (d) dissimilar regressions.]

It seems that in going from Eq. (6.8) to Eq. (6.23), we also have committed a specification error in that we seem to have included an unnecessary variable, DiXi. As we will see in Chapter 7, the consequences of including or excluding variables from a regression model can be serious, depending on the particular situation. As a practical matter, we should consider the most comprehensive model (e.g., model [6.23]) and then reduce it to a smaller model (e.g., Eq. [6.8]) after suitable diagnostic testing. We will consider this topic in greater detail in Chapter 7.

Where do we stand now? Considering the results of models (6.1), (6.8), and (6.23), it seems that model (6.8) is probably the most appropriate model for the food expenditure example. We probably have the case of parallel regression: The female and male food expenditure regressions only differ in their intercept values. Holding sex constant, it seems there is no difference in the response of food consumption expenditure in relation to after-tax income for men and women. But keep in mind that our sample is quite small. A larger sample might give a different outcome.

Example 6.4. The Savings-Income Relationship in the United States

As a further illustration of how we can use the dummy variables to assess the influence of qualitative variables, consider the data given in Table 6-7. These data relate to personal disposable (i.e., after-tax) income and personal savings, both measured in billions of dollars, in the United States for the period 1970 to 1995. Our objective here is to estimate a savings function that relates savings (Y) to personal disposable income (PDI) (X) for the United States for the said period.

TABLE 6-6 RESULTS OF REGRESSION (6.23)

Variable    Coefficient    Std. Error    t-Statistic    Prob.
C           1432.577       248.4782       5.765404      0.0004
D            -67.89322     350.7645      -0.193558      0.8513
X              0.061583      0.008349     7.376091      0.0001
D.X           -0.006294      0.012988    -0.484595      0.6410

R-squared             0.930459    Mean dependent var    2925.250
Adjusted R-squared    0.904381    S.D. dependent var     604.3869
S.E. of regression    186.8903    F-statistic             35.68003
Sum squared resid     279423.9    Prob(F-statistic)        0.000056

Notes: Dependent Variable: FOODEXP. Sample: 1–12. Included observations: 12.

To estimate this savings function, we could regress Y on X for the entire period. If we do that, we will be maintaining that the relationship between savings and PDI remains the same throughout the sample period. But that might be a tall assumption. For example, it is well known that in 1982 the United States suffered its worst peacetime recession. The unemployment rate that year reached 9.7 percent, the highest since 1948. An event such as this might disturb the relationship between savings and PDI. To see if this in fact happened, we can divide our sample data into two periods, 1970 to 1981 and 1982 to 1995, the pre- and post-1982 recession periods.

In principle, we could estimate two regressions for the two periods in question. Instead, we could estimate just one regression by adding a dummy variable that takes a value of 0 for the period 1970 to 1981 and a value of 1 for the period 1982 to 1995 and estimate a model similar to Eq. (6.23). To allow for a different slope between the two periods, we have included the interaction term, as well. That exercise gives the results shown in Table 6-8.

As these results show, both the differential intercept and slope coefficients are individually statistically significant, suggesting that the savings-income relationship between the two time periods has changed. The outcome resembles Figure 6-4(d). From the results in Table 6-8, we can derive the following savings regressions for the two periods:

TABLE 6-7 PERSONAL SAVINGS AND PERSONAL DISPOSABLE INCOME, UNITED STATES, 1970–1995

Year    Personal savings    Personal disposable income (PDI)    Dummy variable    Product of the dummy variable and PDI
1970     61.0                727.1                               0                    0.0
1971     68.6                790.2                               0                    0.0
1972     63.6                855.3                               0                    0.0
1973     89.6                965.0                               0                    0.0
1974     97.6               1054.2                               0                    0.0
1975    104.4               1159.2                               0                    0.0
1976     96.4               1273.0                               0                    0.0
1977     92.5               1401.4                               0                    0.0
1978    112.6               1580.1                               0                    0.0
1979    130.1               1769.5                               0                    0.0
1980    161.8               1973.3                               0                    0.0
1981    199.1               2200.2                               0                    0.0
1982    205.5               2347.3                               1*                2347.3
1983    167.0               2522.4                               1                 2522.4
1984    235.7               2810.0                               1                 2810.0
1985    206.2               3002.0                               1                 3002.0
1986    196.5               3187.6                               1                 3187.6
1987    168.4               3363.1                               1                 3363.1
1988    189.1               3640.8                               1                 3640.8
1989    187.8               3894.5                               1                 3894.5
1990    208.7               4166.8                               1                 4166.8
1991    246.4               4343.7                               1                 4343.7
1992    272.6               4613.7                               1                 4613.7
1993    214.4               4790.2                               1                 4790.2
1994    189.4               5021.7                               1                 5021.7
1995    249.3               5320.8                               1                 5320.8

Note: *Dummy variable = 1 for observations beginning in 1982.
Source: Economic Report of the President, 1997, data are in billions of dollars and are from Table B-28, p. 332.

Savings-Income regression: 1970–1981:

$$\widehat{Savings}_t = 1.0161 + 0.0803\,Income_t \tag{6.26}$$

Savings-Income regression: 1982–1995:

$$
\begin{aligned}
\widehat{Savings}_t &= (1.0161 + 152.4786) + (0.0803 - 0.0655)\,Income_t \\
&= 153.4947 + 0.0148\,Income_t
\end{aligned}
\tag{6.27}
$$

If we had disregarded the impact of the 1982 recession on the savings-income relationship and estimated this relationship for the entire period of 1970 to 1995, we would have obtained the following regression:

$$
\begin{aligned}
\widehat{Savings}_t &= 62.4226 + 0.0376\,Income_t \\
t &= (4.8917)\quad(8.8937) \\
r^2 &= 0.7672
\end{aligned}
\tag{6.28}
$$

You can see significant differences in the marginal propensity to save (MPS), the additional savings from an additional dollar of income, in these regressions. The MPS was about 8 cents from 1970 to 1981 and only about 1.5 cents from 1982 to 1995. You often hear the complaint that Americans are poor savers. Perhaps these results substantiate this complaint.
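The two-period savings function can be re-estimated from the data in Table 6-7. The sketch below is one way to do so with NumPy's least-squares routine (the book's own results were produced with EViews). The variable names are illustrative, and if the data are transcribed correctly the estimates should come out close to those reported in Table 6-8.

```python
import numpy as np

# Personal savings and PDI (billions of dollars), 1970-1995, transcribed from Table 6-7.
savings = np.array([61.0, 68.6, 63.6, 89.6, 97.6, 104.4, 96.4, 92.5, 112.6,
                    130.1, 161.8, 199.1, 205.5, 167.0, 235.7, 206.2, 196.5, 168.4,
                    189.1, 187.8, 208.7, 246.4, 272.6, 214.4, 189.4, 249.3])
income = np.array([727.1, 790.2, 855.3, 965.0, 1054.2, 1159.2, 1273.0, 1401.4, 1580.1,
                   1769.5, 1973.3, 2200.2, 2347.3, 2522.4, 2810.0, 3002.0, 3187.6, 3363.1,
                   3640.8, 3894.5, 4166.8, 4343.7, 4613.7, 4790.2, 5021.7, 5320.8])
year = np.arange(1970, 1996)
dum = (year >= 1982).astype(float)  # 0 for 1970-1981, 1 for 1982-1995

# Regressors: intercept, dummy, income, and dummy * income.
X = np.column_stack([np.ones(len(year)), dum, income, dum * income])
b, *_ = np.linalg.lstsq(X, savings, rcond=None)

pre_intercept, pre_slope = b[0], b[2]                  # 1970-1981 regression
post_intercept, post_slope = b[0] + b[1], b[2] + b[3]  # 1982-1995 regression

print("1970-1981:", pre_intercept, pre_slope)
print("1982-1995:", post_intercept, post_slope)
```

The slope in each period is the estimated marginal propensity to save for that period.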

TABLE 6-8 REGRESSION RESULTS OF SAVINGS-INCOME RELATIONSHIP

Variable      Coefficient    Std. Error    t-Statistic    Prob.
C               1.016117     20.16483       0.050391      0.9603
DUM           152.4786       33.08237       4.609058      0.0001
INCOME          0.080332      0.014497      5.541347      0.0000
DUM*INCOME     -0.065469      0.015982     -4.096340      0.0005

R-squared             0.881944    Mean dependent var    162.0885
Adjusted R-squared    0.865846    S.D. dependent var     63.20446
S.E. of regression    23.14996

Notes: Dependent Variable: Savings. Sample: 1970–1995. Observations included: 26.

6.6 THE USE OF DUMMY VARIABLES IN SEASONAL ANALYSIS

Many economic time series based on monthly or quarterly data exhibit seasonal patterns (regular oscillatory movements). Examples are sales of department stores at Christmas, demand for money (cash balances) by households at holiday times, demand for ice cream and soft drinks during the summer, and demand for travel during holiday seasons. Often it is desirable to remove the seasonal factor, or component, from a time series so that we may concentrate on the other components of the time series, such as the trend,7 which is a fairly steady increase or decrease over an extended time period. The process of removing the seasonal component from a time series is known as deseasonalization, or seasonal adjustment, and the time series thus obtained is called a deseasonalized, or seasonally adjusted, time series. The U.S. government publishes important economic time series on a seasonally adjusted basis.

There are several methods of deseasonalizing a time series, but we will consider only one of these methods, namely, the method of dummy variables,8 which we will now illustrate.

Example 6.5. Refrigerator Sales and Seasonality

To show how dummy variables can be used for seasonal analysis, consider the data given in Table 6-9, found on the textbook’s Web site.

This table gives data on the number of refrigerators sold (in thousands) for the United States from the first quarter of 1978 to the fourth quarter of 1985, a total of 32 quarters. The data on refrigerator sales are plotted in Fig. 6-5.

Figure 6-5 probably suggests that there is a seasonal pattern to refrigerator sales. To see if this is the case, consider the following model:

$$Y_t = B_1 + B_2 D_{2t} + B_3 D_{3t} + B_4 D_{4t} + u_t \tag{6.29}$$

where Y = sales of refrigerators (in thousands), and D2, D3, and D4 are dummies for the second, third, and fourth quarter of each year, taking a value of 1 for the relevant quarter and 0 otherwise. We are treating the first quarter as the reference quarter, although any quarter can serve as the reference quarter. Note that since we have four quarters (or four seasons), we have assigned only three dummies to avoid the dummy variable trap. The layout of the dummies is given in Table 6-9. Note that the refrigerator is classified as a durable goods item because it has a sufficiently long life.

FIGURE 6-5 Sales of refrigerators, United States, 1978:1–1985:4. [Time-series plot of FRIG (vertical axis, about 800 to 1800) against observation number (horizontal axis, ticks from 5 to 30).]

7A time series may contain four components: a seasonal, a cyclical, a trend (or long-term component), and one that is strictly random.

8For other methods of seasonal adjustment, see Paul Newbold, Statistics for Business and Economics, latest edition, Prentice-Hall, Englewood Cliffs, N.J.

The regression results of this model are as follows:

$$
\begin{aligned}
\hat{Y}_t &= 1222.1250 + 245.3750\,D_{2t} + 347.6250\,D_{3t} - 62.1250\,D_{4t} \\
t &= (20.3720)^{*}\quad(2.8922)^{*}\quad(4.0974)^{*}\quad(-0.7322)^{**} \\
R^2 &= 0.5318
\end{aligned}
\tag{6.30}
$$

*denotes a p value of less than 5%

**denotes a p value of more than 5%

Since we are treating the first quarter as the benchmark, the differential intercept coefficients (i.e., coefficients of the seasonal dummies) give the seasonal increase or decrease in the mean value of Y relative to the benchmark season. Thus, the value of about 245 means the average value of Y in the second quarter is greater by 245 than that in the first quarter, which is about 1222. The average value of sales of refrigerators in the second quarter is then about (1222 + 245) or about 1,467 thousands of units. Other seasonal dummy coefficients are to be interpreted similarly.
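The same reading applies to every quarter. As a short illustration, the snippet below adds each seasonal coefficient reported in Eq. (6.30) to the benchmark intercept; the dictionary name is chosen here for convenience.

```python
# Coefficients as reported in Eq. (6.30); sales are in thousands of units.
b1, b2, b3, b4 = 1222.1250, 245.3750, 347.6250, -62.1250

# Mean sales by quarter: the first quarter is the benchmark.
quarterly_means = {
    "Q1": b1,
    "Q2": b1 + b2,
    "Q3": b1 + b3,
    "Q4": b1 + b4,
}

for quarter, mean_sales in quarterly_means.items():
    print(quarter, mean_sales)
```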

As you can see from Equation (6.30), the seasonal dummies for the second and third quarters are statistically significant but that for the fourth quarter is not. Thus, the average sale of refrigerators in the fourth quarter is not statistically different from that in the first quarter, whereas sales in the second and the third quarters are. Hence, it seems that there is some seasonal effect associated with the second and third quarters but not the fourth quarter. Perhaps in the spring and summer people buy more refrigerators than in the winter and fall. Of course, keep in mind that all comparisons are in relation to the benchmark, which is the first quarter.

How do we obtain the deseasonalized time series for refrigerator sales? This can be done easily. Subtract the estimated values of Y obtained from Eq. (6.30) from the actual values of Y; these differences are nothing but the residuals from regression (6.30). Then add to the residuals the mean value of Y. The resulting series is the deseasonalized time series. This series may represent the other components of the time series (cyclical, trend, and random).9 This is all shown in Table 6-9.
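The two steps just described can be sketched in a few lines. The quarterly series below is hypothetical (it is not the refrigerator data of Table 6-9) and the variable names are illustrative.

```python
import numpy as np

# Hypothetical quarterly series covering three years (12 observations).
y = np.array([100.0, 130.0, 150.0, 95.0, 104.0, 138.0,
              155.0, 99.0, 110.0, 141.0, 160.0, 102.0])
quarter = np.tile([1, 2, 3, 4], 3)

# Regress y on an intercept and dummies for quarters 2, 3, and 4 (quarter 1 is the base).
X = np.column_stack([np.ones(len(y))] + [(quarter == q).astype(float) for q in (2, 3, 4)])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Step 1: residuals = actual values minus fitted values.
residuals = y - X @ b
# Step 2: add the overall mean of y back to the residuals.
deseasonalized = residuals + y.mean()

quarter_means = [deseasonalized[quarter == q].mean() for q in (1, 2, 3, 4)]
print(quarter_means)
```

After the adjustment every quarter has the same average, equal to the overall mean of the series, so the remaining variation reflects the non-seasonal components.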

9Of course, this assumes that the dummy variable technique is an appropriate method of deseasonalizing a time series (TS). A time series can be represented as TS = s + c + t + u, where s represents the seasonal, c the cyclical, t the trend, and u the random component. For other methods of deseasonalization, see Francis X. Diebold, Elements of Forecasting, 4th ed., South-Western Publishing, Cincinnati, Ohio, 2007.

In Example 6.5 we had quarterly data. But many economic time series are available on a monthly basis, and it is quite possible that there may be some seasonal component in the monthly data. To identify it, we could create 11 dummies to represent 12 months. This principle is general. If we have daily data, we could use 364 dummies, one less than the number of days in a year. Of course, you have to use some judgment in using several dummies, for if you use dummies indiscriminately, you will quickly consume degrees of freedom; you lose one d.f. for every dummy coefficient estimated.

6.7 WHAT HAPPENS IF THE DEPENDENT VARIABLE IS ALSO A DUMMY VARIABLE? THE LINEAR PROBABILITY MODEL (LPM)

So far we have considered models in which the dependent variable Y was quantitative and the explanatory variables were either qualitative (i.e., dummy), quantitative, or a mixture thereof. In this section we consider models in which the dependent variable is also dummy, or dichotomous, or binary.

Suppose we want to study the labor force participation of adult males as a function of the unemployment rate, average wage rate, family income, level of education, etc. Now a person is either in or not in the labor force. So whether a person is in the labor force or not can take only two values: 1 if the person is in the labor force and 0 if he is not. Other examples include: a country is either a member of the European Union or it is not; a student is either admitted to West Point or he or she is not; a baseball player is either selected to play in the majors or he is not.

A unique feature of these examples is that the dependent variable elicits a yes or no response, that is, it is dichotomous in nature.10 How do we estimate such models? Can we apply OLS straightforwardly to such a model? The answer is yes, we can apply OLS, but there are several problems in its application. Before we consider these problems, let us first consider an example.

Table 6-10, found on the textbook’s Web site, gives hypothetical data on 40 people who applied for mortgage loans to buy houses and their annual incomes. Later we will consider a concrete application.

In this table Y = 1 if the mortgage loan application was accepted and 0 if it was not accepted, and X represents annual family income. Now consider the following model:

$$Y_i = B_1 + B_2 X_i + u_i \tag{6.31}$$

where Y and X are as defined before.

10What happens if the dependent variable has more than two categories? For example, a person may belong to the Democratic party, the Republican party, or the Independent party. Here, party affiliation is a trichotomous variable. There are methods of handling models in which the dependent variable can take several categorical values. But this topic is beyond the scope of this book.

Model (6.31) looks like a typical linear regression model but it is not because we cannot interpret the slope coefficient B2 as giving the rate of change of Y for a unit change in X, for Y takes only two values, 0 and 1. A model like Eq. (6.31) is called a linear probability model (LPM) because the conditional expectation of Yi given Xi, E(Yi|Xi), can be interpreted as the conditional probability that the event will occur given Xi, that is, P(Yi = 1|Xi). Further, this conditional probability changes linearly with X. Thus, in our example, E(Yi|Xi) gives the probability that a mortgage applicant with income of Xi, say $60,000 per year, will have his or her mortgage application approved.

As a result, we now interpret the slope coefficient B2 as a change in the probability that Y = 1, when X changes by a unit. The estimated Yi value from Eq. (6.31), namely, Ŷi, is the predicted probability that Y equals 1 and b2 is an estimate of B2.

With this change in the interpretation of Eq. (6.31) when Y is binary, can we then assume that it is appropriate to estimate Eq. (6.31) by OLS? The answer is yes, provided we take into account some problems associated with OLS estimation of Eq. (6.31). First, although Y takes a value of 0 or 1, there is no guarantee that the estimated Y values will necessarily lie between 0 and 1. In an application, some Ŷi can turn out to be negative and some can exceed 1. Second, since Y is binary, the error term is also binary.11 This means that we cannot assume that ui follows a normal distribution. Rather, it follows the binomial probability distribution. Third, it can be shown that the error term is heteroscedastic; so far we are working under the assumption that the error term is homoscedastic. Fourth, since Y takes only two values, 0 and 1, the conventionally computed R2 value is not particularly meaningful (for an alternative measure, see Problem 6.24).
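The heteroscedasticity claim can be checked directly from the two values that the error term can take. Writing P_i = P(Y_i = 1 | X_i) = B_1 + B_2 X_i (a shorthand introduced here for brevity), a short derivation runs as follows:

$$
\begin{aligned}
u_i &= 1 - P_i \ \text{with probability } P_i, \qquad u_i = -P_i \ \text{with probability } 1 - P_i, \\
E(u_i) &= (1 - P_i)P_i + (-P_i)(1 - P_i) = 0, \\
\operatorname{var}(u_i) &= (1 - P_i)^2 P_i + P_i^2 (1 - P_i) = P_i(1 - P_i).
\end{aligned}
$$

Because P_i depends on X_i, the error variance differs from one observation to the next.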

Of course, not all these problems are insurmountable. For example, we know that if the sample size is reasonably large, the binomial distribution converges to the normal distribution. As we will see in Chapter 9, we can find ways to get around the heteroscedasticity problem. So the problem that remains is that some of the estimated Y values can be negative and some can exceed 1. In prac- tice, if an estimated Y value is negative it is taken as zero, and if it exceeds 1, it is taken as 1. This may be convenient in practice if we do not have too many negative values or too many values that exceed 1.

But the major problem with LPM is that it assumes the probability changes linearly with the X value; that is, the incremental effect of X remains constant throughout. Thus if the Y variable is home ownership and the X variable is income, the LPM assumes that as X increases, the probability of Y increases lin- early, whether X = 1000 or X = 10,000. In reality, we would expect the probabil- ity that Y = 1 to increase nonlinearly with X. At a low level of income, a family will not own a house, but at a sufficiently high level of income, a family most

YNi

YNi

E (Yi|Xi) P(Yi = 1|Xi)

E (Yi|Xi)

202 PART ONE: THE LINEAR REGRESSION MODEL

11It is obvious from Eq. (6.31) that when Yi = 1, we have ui = 1 - B1 - B2Xi and when Yi = 0, ui = -B1 - B2Xi.

guj75845_ch06.qxd 4/16/09 11:56 AM Page 202

CHAPTER SIX: DUMMY VARIABLE REGRESSION MODELS 203

likely will own a house. Beyond that income level, further increases in family income will have no effect on the probability of owning a house. Thus, at both ends of the income distribution, the probability of owning a house will be virtually unaffected by a small increase in income.
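The difficulties with the LPM described here can be seen in a small simulation. The following Python sketch is illustrative only: the data are synthetic (they are not the data of Table 6-10), the S-shaped "true" probability curve and all variable names are assumptions made for the example, and NumPy's least-squares routine stands in for any regression package.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: income in thousands of dollars; approval is more
# likely at higher incomes, following an assumed S-shaped curve.
n = 200
income = rng.uniform(20, 100, size=n)
true_prob = 1.0 / (1.0 + np.exp(-(income - 60.0) / 8.0))
approved = (rng.uniform(size=n) < true_prob).astype(float)

# OLS estimation of the LPM: approved = B1 + B2 * income + u
X = np.column_stack([np.ones(n), income])
b, *_ = np.linalg.lstsq(X, approved, rcond=None)
fitted = X @ b

# Problem 1: some fitted "probabilities" fall outside the 0-1 range.
n_below_zero = int((fitted < 0).sum())
n_above_one = int((fitted > 1).sum())

# Practical remedy mentioned in the text: treat negative fits as 0
# and fits above 1 as 1.
clipped = np.clip(fitted, 0.0, 1.0)

# Problem 3: for a binary Y, Var(u_i | X_i) = P_i * (1 - P_i),
# which varies with X_i, so the error term is heteroscedastic.
implied_error_variance = clipped * (1.0 - clipped)

print("b1, b2:", b)
print("fitted values below 0:", n_below_zero, " above 1:", n_above_one)
```

Because the fitted line is straight while the assumed true curve flattens at both ends, the out-of-range fitted values occur at the lowest and highest incomes.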

There are alternatives in the literature to the LPM model, such as the logit or probit models. A discussion of these models will, however, take us far afield and is better left for the references.12 However, this topic is discussed in Chapter 12 for the benefit of those who want to pursue this subject further.

Despite the difficulties with the LPM, some of which can be corrected, especially if the sample size is large, the LPM is used in practical applications because of its simplicity. Very often it provides a benchmark against which we can compare the more complicated models, such as the logit and probit.

Let us now illustrate LPM with the data given in Table 6-10. The regression results are as follows:

$$
\begin{aligned}
\hat{Y}_i &= -0.9456 + 0.0255X_i \\
t &= (-7.6984)\quad(12.5153) \qquad r^2 = 0.8047
\end{aligned}
\tag{6.32}
$$

The interpretation of this model is this: As income increases by one unit (a thousand dollars, judging by the size of the slope and the $60,000 example given earlier), the probability of mortgage approval goes up by about 0.03. The intercept value here has no viable practical meaning. Given the warning about the r² values in LPM, we may not want to put much value in the observed high r² value in the present case. Sometimes we obtain a high r² value in such models if all the observations are closely bunched together either around zero or 1.

Table 6-10 gives the actual and estimated values of Y from LPM model (6.31). As you can observe, of the 40 values, 6 are negative and 6 are in excess of 1, which shows one of the problems with the LPM alluded to earlier. Also, the finding that the probability of mortgage approval increases linearly with income at a constant rate of about 0.03 may seem quite unrealistic.
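To see why fitted values outside the 0-1 range arise, it helps to evaluate Eq. (6.32) at a few incomes. The short Python sketch below uses only the two estimated coefficients; it assumes that X is measured in thousands of dollars (so that 60 stands for $60,000), and the function and variable names are ours.

```python
# Estimated coefficients from Eq. (6.32)
b1, b2 = -0.9456, 0.0255


def predicted_probability(income):
    """Fitted LPM value for income X (assumed to be in thousands of dollars)."""
    return b1 + b2 * income


# Predicted probability of approval at an income of 60 (assumed $60,000)
p_60 = predicted_probability(60)      # about 0.58

# Incomes at which the fitted line crosses 0 and 1
x_at_zero = -b1 / b2                  # about 37.1
x_at_one = (1 - b1) / b2              # about 76.3

print(round(p_60, 4), round(x_at_zero, 1), round(x_at_one, 1))
```

Under this assumption, any applicant with income below about 37 receives a negative fitted probability and any applicant with income above about 76 receives a fitted probability greater than 1.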

To conclude our discussion of LPM, here is a concrete application.

Example 6.6. Discrimination in Loan Markets

To see if there is discrimination in getting mortgage loans, Maddala and Trost examined a sample of 750 mortgage applications in the Columbia, South Carolina, metropolitan area.13 Of these, 500 applications were approved and 250 rejected. To see what factors determine mortgage approval, the authors developed an LPM and obtained the following results, which are given in tabular form. In this model the dependent variable is Y, which is binary, taking a value of 1 if the mortgage loan application was accepted and a value of 0 if it was rejected. Part of the objective of the study was to find out if there was discrimination in the loan market on account of sex, race, and other qualitative factors.

12 For an accessible discussion of these models, see Gujarati and Porter, 5th ed., McGraw-Hill, New York, 2009, Chapter 15.

13 See G. S. Maddala and R. P. Trost, “On Measuring Discrimination in Loan Markets,” Housing Finance Review, 1982, pp. 245–268.

Explanatory variable    Coefficient    t ratio
Intercept                  0.501       not given
AI                         1.489        4.69*
XMD                       -1.509       -5.74*
DF                         0.140        0.78**
DR                        -0.266       -1.84*
DS                        -0.238       -1.75*
DA                        -1.426       -3.52*
NNWP                      -1.762        0.74**
NMFI                       0.150        0.23**
NA                        -0.393       -0.134

Notes:
AI = Applicant’s and co-applicants’ incomes ($ in thousands)
XMD = Debt minus mortgage payment ($ in thousands)
DF = 1 if female and 0 if male
DR = 1 if nonwhite and 0 if white
DS = 1 if single, 0 if otherwise
DA = Age of house (10² years)
NNWP = Percent nonwhite in the neighborhood (×10³)
NMFI = Neighborhood mean family income (10⁵ dollars)
NA = Neighborhood average age of home (10² years)
*p value 5% or lower, one-tail test.
**p value greater than 5%.

An interesting feature of the Maddala-Trost model is that some of the explanatory variables are also dummy variables. The interpretation of the dummy coefficient of DR is this: Holding all other variables constant, the probability that a nonwhite applicant will have his or her mortgage loan application accepted is lower by 0.266, or about 26.6 percentage points, compared to the benchmark category, which in the present instance is a married white male. Similarly, the probability that a single person’s mortgage loan application will be accepted is lower by 0.238, or 23.8 percentage points, compared with the benchmark category, holding all other factors constant.

We should be cautious of jumping to the conclusion that there is race discrimination or discrimination against single people in the home mortgage market, for there are many factors involved in getting a home mortgage loan.

6.8 SUMMARY

In this chapter we showed how qualitative, or dummy, variables taking values of 1 and 0 can be introduced into regression models alongside quantitative variables. As the various examples in the chapter showed, the dummy variables are essentially a data-classifying device in that they divide a sample into various subgroups based on qualities or attributes (sex, marital status, race, religion, etc.) and implicitly run individual regressions for each subgroup. Now if there are differences in the responses of the dependent variable to the variation in the quantitative variables in the various subgroups, they will be reflected in the differences in the intercepts or slope coefficients of the various subgroups, or both.
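The data-classifying role of a dummy variable is easiest to see in the simplest case, a regression of Y on a constant and a single dummy. The Python sketch below uses a handful of made-up numbers (the data and names are hypothetical) and suggests that the intercept reproduces the mean of the benchmark group while the dummy coefficient reproduces the difference between the two group means.

```python
import numpy as np

# Hypothetical outcomes for two groups; D = 0 marks the benchmark group.
y = np.array([10.0, 12.0, 11.0, 15.0, 17.0, 16.0])
d = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Regression of Y on a constant and the dummy: Y = B1 + B2*D + u
X = np.column_stack([np.ones_like(d), d])
(b1, b2), *_ = np.linalg.lstsq(X, y, rcond=None)

mean_base = y[d == 0].mean()     # mean of the benchmark group
mean_other = y[d == 1].mean()    # mean of the other group

print("intercept:", b1, " benchmark mean:", mean_base)
print("dummy coefficient:", b2, " difference in means:", mean_other - mean_base)
```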


Although it is a versatile tool, the dummy variable technique has to be handled carefully. First, if the regression model contains a constant term (as most models usually do), the number of dummy variables must be one less than the number of classifications of each qualitative variable. Second, the coefficient attached to the dummy variables must always be interpreted in relation to the control, or benchmark, group, that is, the group that gets the value of zero. Finally, if a model has several qualitative variables with several classes, introduction of dummy variables can consume a large number of degrees of freedom (d.f.). Therefore, we should weigh the number of dummy variables to be introduced into the model against the total number of observations in the sample.
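The first rule, one dummy fewer than the number of categories when a constant is present, can be checked numerically. In the Python sketch below (hypothetical quarterly indicators; the names are ours) the data matrix with a constant plus all four quarterly dummies has five columns but only rank 4, which is the perfect collinearity behind the dummy variable trap. Dropping one dummy, or dropping the constant, restores full column rank.

```python
import numpy as np

# Hypothetical quarter labels for 8 observations (two years of data).
quarter = np.array([1, 2, 3, 4, 1, 2, 3, 4])

# One 0/1 column per quarter: D1, D2, D3, D4.
dummies = (quarter[:, None] == np.arange(1, 5)).astype(float)
const = np.ones((len(quarter), 1))

X_trap = np.hstack([const, dummies])         # constant + 4 dummies (the trap)
X_ok = np.hstack([const, dummies[:, 1:]])    # constant + 3 dummies (Q1 is the benchmark)
X_no_const = dummies                         # 4 dummies, no constant

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)
rank_no_const = np.linalg.matrix_rank(X_no_const)

print("columns:", X_trap.shape[1], "rank:", rank_trap)        # 5 columns, rank 4
print("columns:", X_ok.shape[1], "rank:", rank_ok)            # 4 columns, rank 4
print("columns:", X_no_const.shape[1], "rank:", rank_no_const)
```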

In this chapter we also discussed the possibility of committing a specification error, that is, of fitting the wrong model to the data. If intercepts as well as slopes are expected to differ among groups, we should build a model that incorporates both the differential intercept and slope dummies. In this case a model that introduces only the differential intercepts is likely to lead to a specification error. Of course, it is not always easy a priori to find out which is the true model. Thus, some amount of experimentation is required in a concrete study, especially in situations where theory does not provide much guidance. The topic of specification error is discussed further in Chapter 7.
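A model with both differential intercept and differential slope dummies can be sketched as follows. The Python example is a simulation under assumed values (two groups, an intercept difference of 3 and a slope difference of 0.5, all chosen only for illustration); it fits the full model Y = B1 + B2D + B3X + B4(DX) + u and, for comparison, the model with only the differential intercept.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.uniform(0, 10, size=n)
d = (np.arange(n) % 2).astype(float)     # hypothetical group indicator

# Assumed true relations: group 0: Y = 2 + 1.0X ; group 1: Y = 5 + 1.5X
y = 2 + 3 * d + 1.0 * x + 0.5 * d * x + rng.normal(0, 0.1, size=n)

# Full model: differential intercept (D) and differential slope (D*X)
X_full = np.column_stack([np.ones(n), d, x, d * x])
b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)

# Restricted model: differential intercept only
X_restricted = np.column_stack([np.ones(n), d, x])
b_restricted, *_ = np.linalg.lstsq(X_restricted, y, rcond=None)

rss_full = float(((y - X_full @ b_full) ** 2).sum())
rss_restricted = float(((y - X_restricted @ b_restricted) ** 2).sum())

print("differential intercept, differential slope:", b_full[1], b_full[3])
print("RSS full:", rss_full, " RSS intercept-only:", rss_restricted)
```

When the slopes truly differ, omitting the interaction term raises the residual sum of squares noticeably, which is the symptom of the specification error discussed above.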

In this chapter we also briefly discussed the linear probability model (LPM) in which the dependent variable is itself binary. Although LPM can be estimated by ordinary least squares (OLS), there are several problems with a routine application of OLS. Some of the problems can be resolved easily and some cannot. Therefore, alternative estimating procedures are needed. We mentioned two such alternatives, the logit and probit models, but we did not discuss them in view of the somewhat advanced nature of these models (but see Chapter 12).

KEY TERMS AND CONCEPTS

The key terms and concepts introduced in this chapter are

Qualitative versus quantitative variables
Dummy variables
Analysis-of-variance (ANOVA) models
Differential intercept coefficients
Base, reference, benchmark, or comparison category
Data matrix
Dummy variable trap; perfect collinearity, multicollinearity
Analysis-of-covariance (ANCOVA) models
Covariates; control variables
Comparing two regressions
Interactive, or multiplicative
Additive
Interaction dummy
Differential slope coefficient, or slope drifter
Coincident regressions
Parallel regressions
Concurrent regressions
Dissimilar regressions
Marginal propensity to save (MPS)
Seasonal patterns
Linear probability model (LPM)
Binomial probability distribution

QUESTIONS

6.1. Explain briefly the meaning of:
a. Categorical variables.
b. Qualitative variables.
c. Analysis-of-variance (ANOVA) models.
d. Analysis-of-covariance (ANCOVA) models.
e. The dummy variable trap.
f. Differential intercept dummies.
g. Differential slope dummies.

6.2. Are the following variables quantitative or qualitative?
a. U.S. balance of payments.
b. Political party affiliation.
c. U.S. exports to the Republic of China.
d. Membership in the United Nations.
e. Consumer Price Index (CPI).
f. Education.
g. People living in the European Community (EC).
h. Membership in General Agreement on Tariffs and Trade (GATT).
i. Members of the U.S. Congress.
j. Social security recipients.

6.3. If you have monthly data over a number of years, how many dummy variables will you introduce to test the following hypotheses?
a. All 12 months of the year exhibit seasonal patterns.
b. Only February, April, June, August, October, and December exhibit seasonal patterns.

6.4. What problems do you foresee in estimating the following models:

a. Yt = B0 + B1D1t + B2D2t + B3D3t + B4D4t + ut

where Dit = 1 for observation in quarter i, i = 1, 2, 3, 4
= 0 otherwise

b. GNPt = B1 + B2Mt + B3Mt-1 + B4(Mt - Mt-1) + ut

where GNPt = gross national product (GNP) at time t
Mt = the money supply at time t
Mt-1 = the money supply at time (t - 1)

6.5. State with reasons whether the following statements are true or false.
a. In the model Yi = B1 + B2Di + ui, letting Di take the values of (0, 2) instead of (0, 1) will halve the value of B2 and will also halve the t value.
b. When dummy variables are used, ordinary least squares (OLS) estimators are unbiased only in large samples.

6.6. Consider the following model:

Yi = B0 + B1Xi + B2D2i + B3D3i + ui

where Y = annual earnings of MBA graduates
X = years of service
D2 = 1 if Harvard MBA
= 0 if otherwise
D3 = 1 if Wharton MBA
= 0 if otherwise

a. What are the expected signs of the various coefficients?
b. How would you interpret B2 and B3?
c. If B2 > B3, what conclusion would you draw?

6.7. Continue with Question 6.6 but now consider the following model:

Yi = B0 + B1Xi + B2D2i + B3D3i + B4(D2iXi) + B5(D3iXi) + ui

a. What is the difference between this model and the one given in Question 6.6?
b. What is the interpretation of B4 and B5?
c. If B4 and B5 are individually statistically significant, would you choose this model over the previous one? If not, what kind of bias or error are you committing?
d. How would you test the hypothesis that B4 = B5 = 0?

PROBLEMS

6.8. Based on quarterly observations for the United States for the period 1961-I through 1977-II, H. C. Huang, J. J. Siegfried, and F. Zardoshty14 estimated the following demand function for coffee. (The figures in parentheses are t values.)

$$
\begin{aligned}
\ln Q_t = 1.2789 &- 0.1647 \ln P_t + 0.5115 \ln I_t + 0.1483 \ln P'_t \\
t = {} &(-2.14) \qquad\quad (1.23) \qquad\quad (0.55) \\
&- 0.0089\,T - 0.0961D_{1t} - 0.1570D_{2t} - 0.0097D_{3t} \qquad R^2 = 0.80 \\
t = {} &(-3.36) \qquad (-3.74) \qquad (-6.03) \qquad (-0.37)
\end{aligned}
$$

where Q = pounds of coffee consumed per capita
P = the relative price of coffee per pound at 1967 prices
I = per capita PDI, in thousands of 1967 dollars
P' = the relative price of tea per quarter pound at 1967 prices
T = the time trend with T = 1 for 1961-I, to T = 66 for 1977-II
D1 = 1 for the first quarter
D2 = 1 for the second quarter
D3 = 1 for the third quarter
ln = the natural log

14 See H. C. Huang, J. J. Siegfried, and F. Zardoshty, “The Demand for Coffee in the United States, 1963–1977,” Quarterly Review of Economics and Business, Summer 1980, pp. 36–50.

a. How would you interpret the coefficients of P, I, and P'?
b. Is the demand for coffee price elastic?
c. Are coffee and tea substitute or complementary products?
d. How would you interpret the coefficient of T?
e. What is the trend rate of growth or decline in coffee consumption in the United States? If there is a decline in coffee consumption, what accounts for it?
f. What is the income elasticity of demand for coffee?
g. How would you test the hypothesis that the income elasticity of demand for coffee is not significantly different from 1?
h. What do the dummy variables represent in this case?
i. How do you interpret the dummies in this model?
j. Which of the dummies are statistically significant?
k. Is there a pronounced seasonal pattern in coffee consumption in the United States? If so, what accounts for it?
l. Which is the benchmark quarter in this example? Would the results change if we chose another quarter as the base quarter?
m. The preceding model only introduces the differential intercept dummies. What implicit assumption is made here?
n. Suppose someone contends that this model is misspecified because it assumes that the slopes of the various variables remain constant between quarters. How would you rewrite the model to take into account differential slope dummies?
o. If you had the data, how would you go about reformulating the demand function for coffee?

6.9. In a study of the determinants of direct airfares to Cleveland, Paul W. Bauer and Thomas J. Zlatoper obtained the following regression results (in tabular form) to explain one-way airfare for first class, coach, and discount airfares. (The dependent variable is one-way airfare in dollars.) The explanatory variables are defined as follows:

Carriers = the number of carriers
Pass = the total number of passengers flown on route (all carriers)
Miles = the mileage from the origin city to Cleveland
Pop = the population of the origin city
Inc = per capita income of the origin city
Corp = the proxy for potential business traffic from the origin city
Slot = the dummy variable equaling 1 if the origin city has a slot-restricted airport
= 0 if otherwise
Stop = the number of on-flight stops
Meal = the dummy variable equaling 1 if a meal is served
= 0 if otherwise
Hub = the dummy variable equaling 1 if the origin city has a hub airline
= 0 if otherwise
EA = the dummy variable equaling 1 if the carrier is Eastern Airlines
= 0 if otherwise
CO = the dummy variable equaling 1 if the carrier is Continental Airlines
= 0 if otherwise

The results are given in Table 6-11.
a. What is the rationale for introducing both carriers and squared carriers as explanatory variables in the model? What does the negative sign for carriers and the positive sign for carriers squared suggest?
b. As in part (a), what is the rationale for the introduction of miles and squared miles as explanatory variables? Do the observed signs of these variables make economic sense?

TABLE 6-11  DETERMINANTS OF DIRECT AIR FARES TO CLEVELAND

Explanatory variable      First class     Coach          Discount
Carriers                   -19.50         -23.00         -17.50
                   *t =   (-0.878)        (-1.99)        (-3.67)
Carriers²                    2.79           4.00           2.19
                           (0.632)         (1.83)         (2.42)
Miles                        0.233          0.277          0.0791
                           (5.13)         (12.00)         (8.24)
Miles²                      -0.0000097     -0.000052      -0.000014
                          (-0.495)        (-4.98)        (-3.23)
Pop                         -0.00598       -0.00114       -0.000868
                          (-1.67)         (-4.98)        (-1.05)
Inc                         -0.00195       -0.00178       -0.00411
                          (-0.686)        (-1.06)        (-6.05)
Corp                         3.62           1.22          -1.06
                           (3.45)          (2.51)        (-5.22)
Pass                        -0.000818      -0.000275       0.853
                          (-0.771)        (-0.527)        (3.93)
Stop                        12.50           7.64          -3.85
                           (1.36)          (2.13)        (-2.60)
Slot                         7.13          -0.746         17.70
                           (0.299)        (-0.067)        (3.82)
Hub                         11.30           4.18          -3.50
                           (0.90)          (0.81)        (-1.62)
Meal                        11.20           0.945          1.80
                           (1.07)          (0.177)        (0.813)
EA                         -18.30           5.80         -10.60
                          (-1.60)          (0.775)       (-3.49)
CO                         -66.40         -56.50          -4.17
                          (-5.72)         (-7.61)        (-1.35)
Constant term              212.00         126.00         113.00
                           (5.21)          (5.75)        (12.40)
R²                           0.863          0.871          0.799
Number of observations     163            323            323

Note: *Figures in parentheses represent t values.
Source: Paul W. Bauer and Thomas J. Zlatoper, Economic Review, Federal Reserve Bank of Cleveland, vol. 25, no. 1, 1989, Tables 2, 3, and 4, pp. 6–7.

c. The population variable is observed to have a negative sign. What is the implication here?
d. Why is the coefficient of the per capita income variable negative in all the regressions?
e. Why does the stop variable have a positive sign for first-class and coach fares but a negative sign for discount fares? Which makes economic sense?
f. The dummy for Continental Airlines consistently has a negative sign. What does this suggest?
g. Assess the statistical significance of each estimated coefficient. Note: Since the number of observations is sufficiently large, use the normal approximation to the t distribution at the 5% level of significance. Justify your use of one-tailed or two-tailed tests.
h. Why is the slot dummy significant only for discount fares?
i. Since the number of observations for coach and discount fare regressions is the same, 323 each, would you pool all 646 observations and run a regression similar to the ones shown in the preceding table? If you do that, how would you distinguish between coach and discount fare observations? (Hint: dummy variables.)
j. Comment on the overall quality of the regression results given in the preceding table.

6.10. In a regression of weight on height involving 51 students, 36 males and 15 females, the following regression results were obtained:15

1. Weighti = -232.06551 + 5.5662heighti
   t = (-5.2066) (8.6246)

2. Weighti = -122.9621 + 23.8238dumsexi + 3.7402heighti
   t = (-2.5884) (4.0149) (5.1613)

3. Weighti = -107.9508 + 3.5105heighti + 2.0073dumsexi + 0.3263dumht.
   t = (-1.2266) (2.6087) (0.0187) (0.2035)

where weight is in pounds, height is in inches, and where

Dumsex = 1 if male
= 0 if otherwise
Dumht. = the interactive or differential slope dummy

a. Which regression would you choose, 1 or 2? Why?
b. If 2 is in fact preferable but you choose 1, what kind of error are you committing?
c. What does the dumsex coefficient in 2 suggest?
d. In Model 2 the differential intercept dummy is statistically significant whereas in Model 3 it is statistically insignificant. What accounts for this change?
e. Between Models 2 and 3, which would you choose? Why?
f. In Models 2 and 3 the coefficient of the height variable is about the same, but the coefficient of the dummy variable for sex changes dramatically. Do you have any idea what is going on?

To answer questions (d), (e), and (f) you are given the following correlation matrix.

            Height    Dumsex    Dumht.
Height      1         0.6276    0.6752
Dumsex      0.6276    1         0.9971
Dumht.      0.6752    0.9971    1

15 A former colleague, Albert Zucker, collected these data and estimated the various regressions.

The interpretation of this table is that the coefficient of correlation between height and dumsex is 0.6276 and that between dumsex and dumht. is 0.9971.

6.11. Table 6-12 on the textbook’s Web site gives nonseasonally adjusted quarterly data on the retail sales of hobby, toy, and game stores (in millions) for the period 1992: I to 2008: II. Consider the following model:

Salest = B1 + B2D2t + B3D3t + B4D4t + ut

where D2 = 1 in the second quarter, = 0 if otherwise
D3 = 1 in the third quarter, = 0 if otherwise
D4 = 1 in the fourth quarter, = 0 if otherwise

a. Estimate the preceding regression.
b. What is the interpretation of the various coefficients?
c. Give a logical reason for why the results are this way.
*d. How would you use the estimated regression to deseasonalize the data?

6.12. Use the data of Problem 6.11 but estimate the following model:

Salest = B1D1t + B2D2t + B3D3t + B4D4t + ut

In this model there is a dummy assigned to each quarter.
a. How does this model differ from the one given in Problem 6.11?
b. To estimate this model, will you have to use a regression program that suppresses the intercept term? In other words, will you have to run a regression through the origin?
c. Compare the results of this model with the previous one and determine which model you prefer and why.

6.13. Refer to Eq. (6.17) in the text. How would you modify this equation to allow for the possibility that the coefficient of Tuition also differs from region to region? Present your results.

6.14. How would you check that in Eq. (6.19) the slope coefficient of X varies by sex as well as race?

6.15. Reestimate Eq. (6.30) by assigning a dummy for each quarter and compare your results with those given in Eq. (6.30). In estimating such an equation, what precaution must you take?

*Optional.


6.16. Consider the following model:

Yi = B1 + B2D2i + B3D3i + B4(D2iD3i) + B5Xi + ui

where Y = the annual salary of a college teacher
X = years of teaching experience
D2 = 1 if male
= 0 if otherwise
D3 = 1 if white
= 0 if otherwise

a. The term (D2iD3i) represents the interaction effect. What does this expression mean?
b. What is the meaning of B4?
c. Find E(Yi | D2 = 1, D3 = 1, Xi) and interpret it.

6.17. Suppose in the regression (6.1) we let

Di = 1 for female
= -1 for male

Using the data given in Table 6-2, estimate regression (6.1) with this dummy setup and compare your results with those given in regression (6.4). What general conclusion can you draw?

6.18. Continue with the preceding problem but now assume that

Di = 2 for female
= 1 for male

With this dummy scheme re-estimate regression (6.1) using the data of Table 6-2 and compare your results. What general conclusions can you draw from the various dummy schemes?

6.19. Table 6-13, found on the textbook’s Web site, gives data on after-tax corporate profits and net corporate dividend payments ($, in billions) for the United States for the quarterly period of 1997:1 to 2008:2.
a. Regress dividend payments (Y) on after-tax corporate profits (X) to find out if there is a relationship between the two.
b. To see if the dividend payments exhibit any seasonal pattern, develop a suitable dummy variable regression model and estimate it. In developing the model, how would you take into account that the intercept as well as the slope coefficient may vary from quarter to quarter?
c. When would you regress Y on X, disregarding seasonal variation?
d. Based on your results, what can you say about the seasonal pattern, if any, in the dividend payment policies of U.S. private corporations? Is this what you expected a priori?

6.20. Refer to Example 6.6. What is the regression equation for an applicant who is an unmarried white male? Is it statistically different for an unmarried white female?

6.21. Continue with Problem 6.20. What would the regression equation be if you were to include interaction dummies for the three qualitative variables in the model?

6.22. The impact of product differentiation on rate of return on equity. To find out whether firms selling differentiated products (i.e., brand names) experience higher rates of return on their equity capital, J. A. Dalton and S. L. Levin16 obtained the following regression results based on a sample of 48 firms:

$$
\begin{aligned}
\hat{Y}_i &= 1.399 + 1.490D_i + 0.246X_{2i} - 9.507X_{3i} - 0.016X_{4i} \\
se &= (1.380)\quad(0.056)\quad(4.244)\quad(0.017) \qquad R^2 = 0.26 \\
t &= (1.079)\quad(4.285)\quad(-2.240)\quad(-0.941) \\
p\ \text{value} &= (0.1433)\quad(0.000)\quad(0.0151)\quad(0.1759)
\end{aligned}
$$

where Y = the rate of return on equity
D = 1 for firms with high or moderate product differentiation
X2 = the market share
X3 = the measure of firm size
X4 = the industry growth rate

a. Do firms that product-differentiate earn a higher rate of return? How do you know?
b. Is there a statistical difference in the rate of return on equity capital between firms that do and do not product-differentiate? Show the necessary calculations.
c. Would the answer to (b) change if the authors had used differential slope dummies?
d. Write the equation that allows for both the differential intercept and differential slope dummies.

6.23. What has happened to the United States Phillips curve? Refer to Example 5.6. Extending the sample to 1977, the following model was estimated:

$$
Y_t = B_1 + B_2D_t + B_3\left(\frac{1}{X_t}\right) + B_4D_t\left(\frac{1}{X_t}\right) + u_t
$$

where Y = the year-to-year percentage change in the index of hourly earnings
X = the percent unemployment rate
Dt = 1 for observations through 1969
= 0 if otherwise (i.e., for observations from 1970 through 1977)

The regression results were as follows:

$$
\begin{aligned}
\hat{Y}_t &= 10.078 - 10.337D_t - 17.549\left(\frac{1}{X_t}\right) + 38.137D_t\left(\frac{1}{X_t}\right) \\
se &= (1.4024)\quad(1.6859)\quad(8.3373)\quad(9.3999) \\
t &= (7.1860)\quad(-6.1314)\quad(-2.1049)\quad(4.0572) \qquad R^2 = 0.8787 \\
p\ \text{value} &= (0.000)\quad(0.000)\quad(0.026)\quad(0.000)
\end{aligned}
$$

Compare these results with those given in Example 5.6.
a. Are the differential intercept and differential slope coefficients statistically significant? If so, what does that suggest? Show the Phillips curve for the two periods separately.
b. Based on these results, would you say that the Phillips curve is dead?

16 See J. A. Dalton and S. L. Levin, “Market Power: Concentration and Market Share,” Industrial Organization Review, vol. 5, 1977, pp. 27–36. Notations were altered to conform with our notation.

6.24. Count R². Since the conventional R² value may not be appropriate for linear probability models, one suggested alternative is the count R², which is defined as:

$$
\text{Count } R^2 = \frac{\text{number of correct predictions}}{\text{total number of observations}}
$$

Since in LPM the dependent variable takes a value of 1 or 0, if the predicted probability is greater than 0.5, we classify that as 1, but if the predicted probability is less than 0.5, we classify that as 0. We then count the number of correct predictions and compute the count R² from the formula given above.

Find the count R² for the model (6.32). How does it compare with the conventional R² given in that equation?
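The count R² calculation can be sketched in a few lines of Python. The function name, the treatment of a predicted probability of exactly 0.5 (classified as 0 here, a case the definition leaves open), and the five illustrative observations are our assumptions; the numbers are not the Table 6-10 data.

```python
import numpy as np


def count_r2(y_actual, predicted_prob, cutoff=0.5):
    """Share of observations whose 0/1 outcome is predicted correctly."""
    y_actual = np.asarray(y_actual)
    # Classify as 1 when the predicted probability exceeds the cutoff.
    # Assumption: a value exactly equal to the cutoff is classified as 0.
    predicted_class = (np.asarray(predicted_prob) > cutoff).astype(int)
    return float((predicted_class == y_actual).mean())


# Hypothetical actual outcomes and LPM fitted values (note that fitted
# values below 0 or above 1 pose no difficulty for the classification).
y = [0, 0, 1, 1, 1]
p_hat = [-0.10, 0.62, 0.48, 0.91, 1.20]

print(count_r2(y, p_hat))   # 3 correct out of 5, i.e., 0.6
```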

6.25. Table 6-14, found on the textbook’s Web site, gives quarterly data on real personal expenditure (PCE), real expenditure on durable goods (EXPDUR), real expenditure on nondurable goods (EXPNONDUR), and real expenditure on services (EXPSER), for the United States for the period 2000-1 to 2008-3. All data are in billions of (2000) dollars, and the quarterly data are at seasonally adjusted annual rates.
a. Plot the data on EXPDUR, EXPNONDUR, and EXPSER against PCE.
b. Suppose you regress each category of expenditure on PCE and the three dummies shown in Table 6-14. Would you expect the dummy variable coefficients to be statistically significant? Why or why not? Present your calculations.
c. If you do not expect the dummy variables to be statistically significant but you still include them in your model, what are the consequences of your action?

6.26. The Phillips curve revisited again. Refer to Example 5.6 and Problem 5.29 from Chapter 5. It was shown that the percentage change in the index of hourly earnings and the unemployment rate from 1958–1969 followed the traditional Phillips curve model. The updated version of the data, from 1965–2007, can be found in Table 5-19 on the textbook’s Web site.
a. Create a dummy variable to indicate a possible break in the data in 1982. In other words, create a dummy variable that equals 0 from 1965 to 1982, then set it equal to 1 for 1983 to 2007.
b. Using the inverted “percent unemployment rate” (1/X) variable created in Chapter 5, create an interaction variable between (1/X) and the dummy variable from part (a).
c. Include both the dummy variable and the interaction term, along with (1/X) on its own, in a regression to predict Y, the change in the hourly earnings index. What is your new model?
d. Which, if any, variables appear to be statistically significant?
e. Give a potential economic reason for this result.

6.27. Table 6-15 on the textbook’s Web site contains data on 46 mid-level employees and their salaries. The available independent variables are:
Experience = years of experience at the current job
Management = 0 for nonmanagers and 1 for managers
Education = 1 for those whose highest education level is high school
= 2 for those whose highest education level is college
= 3 for those whose highest education level is graduate school

a. Does it make sense to utilize Education as it is listed in the data? What are the issues with leaving it this way?

b. After addressing the issues in part (a), run a linear regression using Experience, Management, and the changed Education variables. What is the new model? Are all the variables significant?

c. Now create a model to allow for the possibility that the increase in Salary may be different between managers and nonmanagers, with respect to their years of experience. What are the results?

*d. Finally, create a model that incorporates the idea that Salary might increase, with respect to years of experience, at a different rate between employees with different education levels.

6.28. Based on the Current Population Survey (CPS) of March 1995, Paul Rudd extracted a sample of 1289 workers, aged 18 to 65, and obtained the following information on each worker:

Wage = hourly wage in $

Age = age in years

Female = 1 if female worker

Nonwhite = 1 if a nonwhite worker

Union = 1 if a union member

Education = years of schooling

Experience = potential labor market experience in years.17

The full data set can be found as Table 6-16 on the textbook’s Web site.

a. Based on these data, estimate the following model, obtaining the usual regression statistics.

$$\ln \text{Wage}_i = B_1 + B_2\,\text{Age}_i + B_3\,\text{Female}_i + B_4\,\text{Nonwhite}_i + B_5\,\text{Union}_i + B_6\,\text{Education}_i + B_7\,\text{Experience}_i + u_i$$

where ln Wage = (natural logarithm of Wage)

b. How do you interpret each regression coefficient?

c. Which of these coefficients are statistically significant at the 5% level? Also obtain the p value of each estimated t value.

d. Do union workers, on average, earn a higher hourly wage?

e. Do female workers, on average, earn less than their male counterparts?

f. Is the average hourly wage of female nonwhite workers lower than the average hourly wage of female white workers? How do you know? (Hint: interaction dummy.)

g. Is the average hourly wage of female union workers higher than the average hourly wage of female non-union workers? How do you know?

h. Using the data, develop alternative specifications of the wage function, taking into account possible interactions between dummy variables and between dummy variables and quantitative variables.

*Optional.

17Paul R. Rudd, An Introduction to Classical Econometric Theory, Oxford University Press, New York, 2000, pp. 17–18. These data are derived from the Data Extraction System (DES) of the Census Bureau: http://www.census.gov/DES/www/welcome.html.

PART II REGRESSION ANALYSIS IN PRACTICE

In this part of the book, consisting of Chapters 7 through 10, we consider several practical aspects of the linear regression model. The classical linear regression model (CLRM) developed in Part I, although a versatile model, is based on several simplifying assumptions that may not hold in practice. In this part we find out what happens if one or more of these assumptions are relaxed or are not fulfilled in any given situation.

Chapter 7 on model selection discusses the assumption of the CLRM that the model chosen for investigation is the correct model. In this chapter we discuss the consequences of various types of misspecification of the regression model and suggest appropriate remedies.

Chapter 8 on multicollinearity tries to determine what happens if two or more explanatory variables are correlated. Recall that one of the assumptions of the CLRM is that explanatory variables do not have a perfect linear relationship(s) among themselves. This chapter shows that as long as explanatory variables are not perfectly linearly related, the ordinary least squares (OLS) estimators are still best linear unbiased estimators (BLUE).

Chapter 9 on heteroscedasticity discusses the consequences of violating the CLRM assumption that the error variance is constant. This chapter shows that if this assumption is violated, OLS estimators, although unbiased, are no longer efficient. In short, they are not BLUE. But this chapter shows how, with some simple transformations, we can eliminate the problem of heteroscedasticity.

Chapter 10 on autocorrelation considers yet another departure from the CLRM by examining the consequences of correlation in error terms. As in the case of heteroscedasticity, in the presence of autocorrelation the OLS estimators, although unbiased, are not efficient; that is, they are not BLUE. But we show in this chapter how, with suitable transformation of the data, we can minimize the problem of autocorrelation.

CHAPTER 7 MODEL SELECTION: CRITERIA AND TESTS

In the preceding chapters we considered several single-equation linear regression models, including the score function for math S.A.T. scores, the Phillips curve, and the Cobb-Douglas production function. In presenting these models we assumed implicitly, if not explicitly, that the chosen model represents “the truth, the whole truth, and nothing but the truth”; that is, that it correctly models the phenomenon under study. More technically, we assumed that there is no specification bias or specification error in the chosen model. A specification error occurs when instead of estimating the correct model we estimate another model, albeit unintentionally. In practice, however, searching for the true model can be like searching for the Holy Grail. We may never know what the true model is, but we hope to find a model that is a reasonably accurate representation of reality.

Because of its practical importance, we take a closer look at how to go about formulating an econometric model. Specifically, we consider the following questions:

1. What are the attributes of a “good” or “correct” model?

2. Suppose an omniscient econometrician has developed the “correct” model to analyze a particular problem. However, because of data availability, cost considerations, oversight, or sheer ignorance (which is not always bliss), the researcher uses another model, and thus, in relation to the “correct” model, commits a specification error. What type of specification errors are we likely to make in practice?

3. What are the consequences of the various specification errors?

4. How do we detect a specification error?

5. What remedies can we adopt to get back to the correct model if a specification error has been made?

7.1 THE ATTRIBUTES OF A GOOD MODEL

Whether a model chosen in empirical analysis is good, or appropriate, or the “right” model cannot be determined without some reference criteria, or guidelines. A. C. Harvey,1 a noted econometrician, lists the following criteria by which we can judge a model.

Parsimony A model can never completely capture reality; some amount of abstraction or simplification is inevitable in any model building. Occam’s razor, or the principle of parsimony, suggests that a model be kept as simple as possible.

Identifiability This means that, for a given set of data, the estimated parameters must have unique values or, what amounts to the same thing, there is only one estimate per parameter.

Goodness of Fit Since the basic thrust of regression analysis is to explain as much of the variation in the dependent variable as possible by explanatory variables included in the model, a model is judged to be good if this explanation, as measured, say, by the adjusted R2 (= $\bar{R}^2$), is as high as possible.2

Theoretical Consistency No matter how high the goodness of fit measures, a model may not be judged to be good if one or more coefficients have the wrong signs. Thus, in the demand function for a commodity, if the price coefficient has a positive sign (positively sloping demand curve!), or if the income coefficient has a negative sign (unless the good happens to be an inferior good), we must look at such results with great suspicion even if the R2 of the model is high, say, 0.92. In short, in constructing a model we should have some theoretical underpinning to it; “measurement without theory” often leads to very disappointing results.

Predictive Power As Milton Friedman, the Nobel laureate, notes: “The only relevant test of the validity of a hypothesis [model] is comparison of its prediction with experience.”3 Thus, in choosing between the monetarist and Keynesian models of the economy, by this criterion, we would choose the model whose theoretical predictions are borne out by actual experience.

Although there is no unique path to a good model, keep these criteria in mind in developing an econometric model.

1A. C. Harvey, The Economic Analysis of Time Series, Wiley, New York, 1981, pp. 5–7. The following discussion leans heavily on this material. See also D. F. Hendry and J. F. Richard, “On the Formulation of Empirical Models in Dynamic Econometrics,” Journal of Econometrics, vol. 20, October 1982, pp. 3–33.

2Besides R2, there are other criteria that have been used from time to time to judge the goodness of fit of a model. For an accessible discussion of these other criteria, see G. S. Maddala, Introduction to Econometrics, Macmillan, New York, 1988, pp. 425–429.

3Milton Friedman, “The Methodology of Positive Economics,” Essays in Positive Economics, University of Chicago Press, 1953, p. 7.

7.2 TYPES OF SPECIFICATION ERRORS

As noted previously, a model should be parsimonious in that it should include key variables (called core variables) suggested by theory and should relegate minor influences (called peripheral variables) to the error term u. In this section we consider several ways in which a model can be deficient, which we label specification errors.

The topic of specification errors is vast. In this chapter we will discuss as succinctly as possible some of the major specification errors that a researcher may encounter in practice. In particular, we will discuss the following specification errors:

1. Omission of a relevant variable(s).

2. Inclusion of an unnecessary variable(s).

3. Adopting the wrong functional form.

4. Errors of measurement.

To keep the discussion simple, and to avoid matrix algebra, we will consider two- or three-variable models to drive home the essential nature of model specification errors. We will discuss each of the preceding topics separately.

Before we do that, note that the classical linear regression model (CLRM) that we have considered so far makes several simplifying assumptions. A violation of one or more of its assumptions may itself constitute a specification error. For example, the assumption that the error term ui is uncorrelated (the assumption of no autocorrelation) or the assumption that the error variance is constant (the assumption of homoscedasticity) may not hold in practice. Because of their practical importance, we discuss these two topics in Chapters 9 and 10.

7.3 OMISSION OF RELEVANT VARIABLE BIAS: “UNDERFITTING” A MODEL

As noted in the introduction to this chapter, for a variety of reasons, a researcher may omit one or more explanatory variables that should have been included in the model. What are the consequences of such an omission for our ordinary least squares (OLS) estimating procedure?

To be specific, consider the data given in Problem 4.14 and consider the following model:

$$Y_i = B_1 + B_2 X_{2i} + B_3 X_{3i} + u_i \tag{7.1}$$

where Y = child mortality rate, X2 = per capita GNP, and X3 = female literacy rate. All these variables are defined in Problem 4.14.

But instead of estimating the regression in Equation (7.1), we estimate the following function:

$$Y_i = A_1 + A_2 X_{2i} + v_i \tag{7.2}$$

which is the same as Equation (7.1), except that it excludes the “relevant” variable X3. Note that v like u is a stochastic error term. Also, notice that we are using the B’s to represent the parameters in the “true” regression and the A’s to represent the parameters in the “incorrectly specified” regression: Equation (7.2) in relation to Eq. (7.1) is misspecified. What are the consequences of this misspecification, which can be called the omitted variable bias?

We first state the consequences of dropping the variable X3 from the model in general terms and then illustrate them with the child mortality data.

The consequences of omitting X3 are as follows:

1. If the omitted, or left-out, variable X3 is correlated with the included variable X2, a1 and a2 are biased; that is, their average, or expected, values do not coincide with the true values.4 Symbolically,

$$E(a_1) \neq B_1 \quad \text{and} \quad E(a_2) \neq B_2$$

where E is the expectations operator. As a matter of fact, it can be shown that5

$$E(a_2) = B_2 + B_3 b_{32} \tag{7.3}$$

$$E(a_1) = B_1 + B_3(\bar{X}_3 - b_{32}\bar{X}_2) \tag{7.4}$$

where b32 is the slope coefficient in the regression of the omitted variable X3 on the included variable X2. Obviously, unless the last term in Equation (7.3) is zero, a2 will be a biased estimator, the extent of the bias given by the last term. If both B3 and b32 are positive, a2 will have an upward bias; on the average it will overestimate the true B2. But this result should not be surprising, for X2 represents not only its direct effect on Y but also its indirect effect (via X3) on Y. In short, X2 gets credit for the influence that is rightly attributed to X3, as shown in Figure 7-1.

On the other hand, if B3 is positive and b32 is negative, or vice versa, a2 will be biased downward; on the average it will underestimate the true B2. Similarly, a1 will be upward biased if the last term in model (7.4) is positive and downward biased if it is negative.

2. In addition a1 and a2 are also inconsistent; that is, no matter how large the sample size is, the bias does not disappear.

3. If X2 and X3 are uncorrelated, b32 will be zero. Then, as Eq. (7.3) shows, a2 is unbiased. It is consistent as well. (As noted in Appendix D, if an estimator is unbiased [which is a small sample property], it is also consistent [which is a large sample property]. But the converse is not true; estimators can be consistent but may not be necessarily unbiased.) But a1 still remains biased, unless $\bar{X}_3$ is zero in Eq. (7.4). Even in this case the consequences mentioned in points (4) to (6) below hold true.

4. The error variance estimated from Eq. (7.2) is a biased estimator of the true error variance $\sigma^2$. In other words, the error variance estimated from the true model (7.1) and that estimated from the misspecified model (7.2) will not be the same; the former is an unbiased estimator of the true $\sigma^2$, but the latter is not.

5. In addition, the conventionally estimated variance of a2 ($= \hat{\sigma}^2 / \sum x_{2i}^2$) is a biased estimator of the variance of the true estimator b2. Even in the case where b32 is zero, that is, X2 and X3 are uncorrelated, this variance remains biased, for it can be shown that6

$$E[\operatorname{var}(a_2)] = \operatorname{var}(b_2) + \frac{B_3^2 \sum x_{3i}^2}{(n-2)\sum x_{2i}^2} \tag{7.5}$$

That is, the expected value of the variance of a2 is not equal to the variance of b2. Since the second term in Equation (7.5) will always be positive (Why?), var (a2) will, on the average, overestimate the true variance of b2. This means it will have a positive bias.

6. As a result, the usual confidence interval and hypothesis-testing procedures are unreliable. In the case of Eq. (7.5), the confidence interval will be wider, and therefore we may tend to accept the hypothesis that the true value of the coefficient is zero (or any other null hypothesis) more frequently than the true situation demands.

FIGURE 7-1 Net and gross effects of X2 on Y. [Path diagram linking X2, X3, and Y: effect of X3 on X2, b32 = 0.00256; net effect of X2 on Y, b2 = -0.0056; net effect of X3 on Y, b3 = -2.2316; gross effect of X2 on Y, b2 + b3b32 = -0.0114.]

Note: Net means controlling the influence of other variables. Gross means not controlling the influence of other variables.

4A technical point: Shouldn’t X2 and X3 be uncorrelated by the “no multicollinearity” assumption? Recall from Chapter 4 that the assumption that there is no perfect collinearity among the X variables refers to the population regression (PRF) only; there is no guarantee that in a given sample the X’s may not be correlated.

5The proof can be found in Gujarati and Porter, Basic Econometrics, 5th ed., McGraw-Hill, New York, 2009, pp. 519–520.

6For proof, see Jan Kmenta, Elements of Econometrics, 2nd ed., Macmillan, New York, 1986, pp. 444–445. Note: This is true only when b32 = 0, which is not the case in our example, as can be seen from Equation (7.8), which follows.

Although we have not presented the proofs of the preceding propositions, we will illustrate some of these consequences with the child mortality rate example.
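Before turning to the data, the bias result in Eq. (7.3) can also be checked with a small Monte Carlo experiment. The sketch below is only an illustration under assumed values: the parameter values, the sample size, the normal errors, and the function name `simulate_omitted_variable` are all hypothetical and do not come from the text. It repeatedly generates data from a three-variable model like Eq. (7.1), fits the short regression of Eq. (7.2), and averages the estimated slope a2.

```python
import numpy as np

def simulate_omitted_variable(n=200, reps=2000, B1=1.0, B2=2.0, B3=3.0,
                              b32=0.5, seed=0):
    """Average slope a2 from the short regression of Y on X2 when X3 is omitted.

    All parameter values are illustrative assumptions, not estimates from the text.
    """
    rng = np.random.default_rng(seed)
    slopes = np.empty(reps)
    for r in range(reps):
        x2 = rng.normal(size=n)
        # X3 is correlated with X2 whenever b32 != 0
        x3 = b32 * x2 + rng.normal(size=n)
        y = B1 + B2 * x2 + B3 * x3 + rng.normal(size=n)
        # Short (misspecified) regression: constant and X2 only
        X_short = np.column_stack([np.ones(n), x2])
        a = np.linalg.lstsq(X_short, y, rcond=None)[0]
        slopes[r] = a[1]
    return slopes.mean()

mean_a2_correlated = simulate_omitted_variable(b32=0.5)
mean_a2_uncorrelated = simulate_omitted_variable(b32=0.0)
print(mean_a2_correlated)    # expected to be near B2 + B3*b32 = 3.5, not B2 = 2.0
print(mean_a2_uncorrelated)  # expected to be near B2 = 2.0
```

With these assumed values the average of a2 should settle near B2 + B3b32 when X2 and X3 are correlated, and near B2 itself when b32 is set to zero, which is what points 1 and 3 above state.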

Example 7.1. Determinants of Child Mortality Rate

Using the data given in Table 4-7 (found on the textbook’s Web site), the empirical counterpart of Eq. (7.1) is as follows:

$$\begin{aligned} CM_i &= 263.6416 - 0.0056\,PGNP_i - 2.2316\,FLR_i \\ se &= (11.5932)\quad(0.0019)\quad(0.2099) \qquad R^2 = 0.7077 \end{aligned} \tag{7.6}$$

The results of the misspecified equation (7.2) are as follows:

$$\begin{aligned} CM_i &= 157.4244 - 0.0114\,PGNP_i \\ se &= (9.8455)\quad(0.0032) \qquad r^2 = 0.1662 \end{aligned} \tag{7.7}$$

Note the following differences between the two regressions:

1. The misspecified Equation (7.7) shows that as per capita GNP (PGNP) increases by a dollar, on the average, the child mortality rate goes down by about 0.01. On the other hand, in the true model, if PGNP goes up by a dollar, the average child mortality rate (CM) goes down by only about 0.006. In the present instance, in absolute terms (i.e., disregarding the sign), the misspecified equation overestimates the true impact of PGNP on CM, that is, it is upward biased. The nature of this bias can be seen easily if we regress the female literacy rate (FLR) (the omitted variable) on PGNP, the included variable in the model. The results are as follows:

$$\begin{aligned} FLR_i &= 47.5971 + 0.00256\,PGNP_i \\ se &= (3.5553)\quad(0.0011) \qquad r^2 = 0.0721 \end{aligned} \tag{7.8}$$

Thus the slope coefficient b32 = 0.00256. Now from Equation (7.6) we can see that the estimated B2 = -0.0056 and the estimated B3 = -2.2316. Therefore, from Eq. (7.3) we obtain

$$\hat{B}_2 + \hat{B}_3 b_{32} = -0.0056 + (-2.2316)(0.00256) \approx -0.0114$$

which is just about what we obtain from the misspecified Eq. (7.7). Note that it is the product of B3 (the true coefficient of the omitted variable) and b32 (the slope coefficient in the regression of the omitted variable on the included variable) that determines the nature of the bias, upward or downward. Thus, by incorrectly dropping the FLR variable from the model, as in Eq. (7.2), or its empirical counterpart Eq. (7.7), we are not only neglecting the impact of FLR on CM (B3) but also the impact of FLR on PGNP (b32). The “lonely” variable PGNP included in the misspecified Eq. (7.7) model thus has to carry the “burden” of this omission, which, so to speak, prevents it from showing its true impact on CM (-0.0056 versus -0.0114). All this can be seen vividly in Figure 7-1.

2. The intercept term is also biased, but here it underestimates the true intercept term (157.42 versus 263.64).

3. The standard errors as well as the r2’s are also substantially different between the two regressions.
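The slope and intercept comparisons in points 1 and 2 can be reproduced from the reported coefficients alone. The short sketch below plugs the rounded estimates of Eqs. (7.6) and (7.8) into Eqs. (7.3) and (7.4). It relies on one standard OLS fact that the text does not spell out: the intercept of the auxiliary regression of FLR on PGNP equals the mean of FLR minus b32 times the mean of PGNP, which is exactly the term in parentheses in Eq. (7.4). The variable names are ours.

```python
# Reported (rounded) estimates from Eqs. (7.6), (7.7), and (7.8)
b1_long, b2_long, b3_long = 263.6416, -0.0056, -2.2316  # long regression (7.6)
aux_intercept, b32 = 47.5971, 0.00256                    # FLR on PGNP (7.8)
a1_short, a2_short = 157.4244, -0.0114                   # short regression (7.7)

# Eq. (7.3): slope of the short regression implied by the long regression
implied_slope = b2_long + b3_long * b32

# Eq. (7.4): the term (mean of X3 - b32 * mean of X2) is the auxiliary intercept
implied_intercept = b1_long + b3_long * aux_intercept

print(round(implied_slope, 4), a2_short)
print(round(implied_intercept, 2), a1_short)
```

The implied values come out at roughly -0.0113 and 157.42, against the reported -0.0114 and 157.4244; the small gaps are presumably due to rounding in the published coefficients.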

All these results are in accord with the theoretical results of misspecification discussed earlier. You can see at once that if we were to engage in hypothesis testing based upon the misspecified Eq. (7.7), our conclusions would be of dubious value, to say the least. Therefore, in developing a model, exercise utmost care. There is little doubt that dropping relevant variables from a model can have very serious consequences. This is why it is very important that in developing a model for empirical analysis, we should pay close attention to the appropriate theory underlying the phenomenon under study so that all theoretically relevant variables are included in the model. If such relevant variables are excluded from the model, then we are “underfitting” or “underspecifying” the model; in other words, we are omitting some important variables.

7.4 INCLUSION OF IRRELEVANT VARIABLES: “OVERFITTING” A MODEL

Sometimes researchers adopt the “kitchen sink” approach by including all sorts of variables in the model, whether or not they are theoretically dictated. The idea behind overfitting or overspecifying the model (i.e., including unnecessary variables) is the philosophy that so long as you include the theoretically relevant variables, inclusion of one or more unnecessary or “nuisance” variables will not hurt (unnecessary in the sense that there is no solid theory that says they should be included). Such irrelevant variables are often included inadvertently because the researcher is not sure about their role in the model. And this will happen if the theory underlying a particular phenomenon is not well developed. In that case inclusion of such variables will certainly increase R2 (and adjusted R2 if the absolute t value of the coefficient of the additional variable is greater than 1), which might increase the predictive power of the model.
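The statement about R2 and adjusted R2 can be checked numerically. The sketch below is a hedged illustration on simulated data: the data-generating values, the seed, and the helper name `ols_fit` are assumptions made for the example. It fits a model with and without an irrelevant regressor and compares R2, adjusted R2 (computed as 1 - (1 - R2)(n - 1)/(n - k), with k the number of estimated coefficients), and the t ratio of the added variable.

```python
import numpy as np

def ols_fit(X, y):
    """OLS of y on X (X already contains a constant column).

    Returns coefficients, R2, adjusted R2, and conventional t ratios.
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = resid @ resid
    tss = ((y - y.mean()) ** 2).sum()
    r2 = 1.0 - rss / tss
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    sigma2_hat = rss / (n - k)                  # estimated error variance
    cov = sigma2_hat * np.linalg.inv(X.T @ X)   # estimated covariance matrix
    t_ratios = beta / np.sqrt(np.diag(cov))
    return beta, r2, adj_r2, t_ratios

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only
n = 40
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)          # irrelevant: does not enter the true model
y = 1.0 + 2.0 * x2 + rng.normal(size=n)

X_small = np.column_stack([np.ones(n), x2])
X_big = np.column_stack([np.ones(n), x2, x3])
_, r2_small, adj_small, _ = ols_fit(X_small, y)
_, r2_big, adj_big, t_big = ols_fit(X_big, y)

print(r2_big >= r2_small)                      # R2 cannot fall when a regressor is added
print(adj_big > adj_small, abs(t_big[2]) > 1)  # these two answers should agree
```

Whatever the draw, the first comparison holds, and the last two answers agree: adjusted R2 rises exactly when the absolute t ratio of the added variable exceeds 1.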

What are the consequences of including unnecessary variables in the model, which may be called the (inclusion of) irrelevant variable bias? Again, to emphasize the point, we consider the case of simple two- and three-variable models. Now suppose that

$$Y_i = B_1 + B_2 X_{2i} + u_i \tag{7.9}$$

is the correctly specified model, but a researcher adds the superfluous variable X3 and estimates the following model:

$$Y_i = A_1 + A_2 X_{2i} + A_3 X_{3i} + v_i \tag{7.10}$$

Here the specification error consists in overfitting the model, that is, including the unnecessary variable X3, unnecessary in the sense that a priori it has no effect on Y. The consequences of estimating the regression (7.10) instead of the true model (7.9) are as follows:

1. The OLS estimators of the “incorrect” model (7.10) are unbiased (as well as consistent). That is, E(a1) = B1, E(a2) = B2, and E(a3) = 0. This is not difficult to see. If X3 does not belong in the model, B3 is expected to be zero. Hence, in Eqs. (7.3) and (7.4) the B3 term will drop out.

2. The estimator of $\sigma^2$ obtained from regression (7.10) is correctly estimated.

3. The standard confidence interval and hypothesis-testing procedure on the basis of the t and F tests remains valid.

4. However, the a’s estimated from the regression (7.10) are inefficient; their variances will be generally larger than those of the b’s estimated from the true model (7.9). As a result, the confidence intervals based on the standard errors of a’s will be larger than those based on the standard errors of b’s of the true model, even though the former are acceptable for the usual hypothesis-testing procedure. What will happen is that the true coefficients will not be estimated as precisely as if we had used the correct model (7.9). In short, the OLS estimators are LUE (linear unbiased estimators) but not BLUE.

Notice the difference between the two types of specification errors we have considered thus far. If we exclude a relevant variable (the case of underfitting), the coefficients of variables retained in the model are generally biased as well as inconsistent, the error variance is incorrectly estimated, the standard errors of estimators are biased, and therefore the usual hypothesis-testing procedure becomes invalid. On the other hand, including an irrelevant variable in the model (the case of overfitting) still gives us unbiased and consistent estimates of the coefficients of the true model, the error variance is correctly estimated, and the standard hypothesis-testing procedure is still valid. The major penalty we pay for the inclusion of the superfluous variable(s) is that the estimated variances of the coefficients are larger, and as a result, our probability inferences about the true parameters are less precise because the confidence intervals tend to be wider. In some cases we will accept the hypothesis that a true coefficient value is zero because of the wider confidence interval; that is, we will fail to recognize significant relationships between the dependent variable and the explanatory variable(s).

An unwarranted conclusion from the preceding discussion is that it is better to include irrelevant variables than to exclude the relevant ones. But this philosophy should not be encouraged because, as just noted, the addition of unnecessary variables will lead to a loss in the efficiency of the estimators (i.e., larger standard errors) and may also lead to the problem of multicollinearity (Why?), not to mention the loss of degrees of freedom.

In general, the best approach is to include only explanatory variables that on theoretical grounds directly influence the dependent variable and are not accounted for by other included variables.
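The trade-off described in this section (no bias, but less precision) can be seen in a simulation. The sketch below is illustrative only: the true model, the correlation `rho` between X2 and the superfluous X3, the sample size, and the function name `simulate_overfitting` are all assumed for the example. It estimates the slope on X2 from the correct two-variable model and from the overfitted three-variable model across many samples and compares their means and variances.

```python
import numpy as np

def simulate_overfitting(n=50, reps=3000, B1=1.0, B2=2.0, rho=0.8, seed=0):
    """Compare the X2 slope from the correct model and from an overfitted one.

    X3 is irrelevant (it never enters Y) but is correlated with X2 through rho.
    All numerical settings are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    b2_correct = np.empty(reps)   # slope from the correct model
    a2_overfit = np.empty(reps)   # slope on X2 from the overfitted model
    a3_overfit = np.empty(reps)   # coefficient on the irrelevant X3
    for r in range(reps):
        x2 = rng.normal(size=n)
        x3 = rho * x2 + np.sqrt(1.0 - rho ** 2) * rng.normal(size=n)
        y = B1 + B2 * x2 + rng.normal(size=n)
        X_correct = np.column_stack([np.ones(n), x2])
        X_overfit = np.column_stack([np.ones(n), x2, x3])
        b2_correct[r] = np.linalg.lstsq(X_correct, y, rcond=None)[0][1]
        a = np.linalg.lstsq(X_overfit, y, rcond=None)[0]
        a2_overfit[r], a3_overfit[r] = a[1], a[2]
    return {
        "mean_b2": b2_correct.mean(), "var_b2": b2_correct.var(),
        "mean_a2": a2_overfit.mean(), "var_a2": a2_overfit.var(),
        "mean_a3": a3_overfit.mean(),
    }

results = simulate_overfitting()
for name, value in results.items():
    print(name, round(value, 4))
```

Under these assumptions both slope estimators should average close to the true B2 = 2 and the coefficient on X3 should average close to zero, while the sampling variance of the overfitted slope should be noticeably larger, the more so the higher the correlation between X2 and X3.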

Example 7.2.

In Chapter 6 we considered an example relating expenditure on food (Y) to income after tax (X) and the gender dummy D (1 if female and 0 if male). The regression results are given in Equation (6.9). Later we redid this model including differential intercept and differential slope dummies. The results are given in Table 6-6. As we saw, in the latter regression neither the differential intercept nor the differential slope coefficient was significant, whereas in Eq. (6.9) the differential intercept coefficient was significant. It is quite possible that the differential slope dummy variable was superfluous. That is to say, although the average level of food expenditure of the two sexes is different, it is quite possible that the rate of change of food expenditure in relation to after-tax income is the same for both sexes.

7.5 INCORRECT FUNCTIONAL FORM

We now consider a different type of specification error, that involving incorrect (wrong) functional form bias. Assume that variables Y, X2, and X3 included in the model are theoretically the correct variables. Now consider the following two specifications of the model:

$$Y_t = B_1 + B_2 X_{2t} + B_3 X_{3t} + u_t \tag{7.11}$$

$$\ln Y_t = A_1 + A_2 \ln X_{2t} + A_3 \ln X_{3t} + v_t \tag{7.12}$$

The variables that enter the model in Equation (7.11) also enter the regression (7.12), except the functional relationship between the variables is different; in the regression (7.12) the (natural) logarithm of Y is a linear function of the (natural) logarithms of X2 and X3; that is, it is a log-linear model. Note that in Eq. (7.12) A2 measures the partial elasticity of Y with respect to X2, whereas in Eq. (7.11) B2 simply measures the rate of change (i.e., slope) of Y with respect to X2. Similarly, in Eq. (7.12) A3 measures the partial elasticity of Y with respect to X3, whereas in Eq. (7.11) B3 measures the rate of change of Y with respect to X3. This is all familiar territory from Chapter 5. Note that not all the explanatory variables in Eq. (7.12) need to be in logarithmic form; some may be in logarithmic form and some may be in linear form, as in Equation (7.13) below.

Now the dilemma in choosing between the models (7.11) and (7.12) is that economic theory is usually not strong enough to tell us the functional form in which the dependent and explanatory variables are related. Therefore, if the regression (7.12) is in fact the true model and we fit Eq. (7.11) to the data, we are likely to commit as much of a specification error as if the situation were the converse, although in both cases the economically relevant variables are included. Without going into theoretical fine points, if we choose the wrong functional form, the estimated coefficients may be biased estimates of the true coefficients.

Example 7.3. U.S. Expenditure on Imported Goods

To provide some insight into this problem, consider the data given in Table 3-7, found on the textbook Web site. These data relate to U.S. expenditure on imported goods (Y) and personal disposable income (X), both measured in billions of dollars, for the period 1959 to 2006.

Using these data, we obtained the following results:

$$\begin{aligned} \hat{Y}_t &= 36295.3168 + 0.2975X_t - 18.5253\,\text{Year} \\ t &= (6.3790)^{*}\quad(20.5203)^{*}\quad(-6.4030)^{*} \\ R^2 &= 0.9839;\quad \bar{R}^2 = 0.9832;\quad F = 1376.7802 \end{aligned} \tag{7.13}$$

where * signifies a p value less than 1%. In this model year represents the trend variable.

$$\begin{aligned} \ln Y_t &= 10.9327 + 1.4857 \ln X_t - 0.0085\,\text{Year} \\ t &= (0.7014)\quad(13.6501)^{*}\quad(-1.0215) \\ R^2 &= 0.9959;\quad \bar{R}^2 = 0.9957;\quad F = 5421.7932 \end{aligned} \tag{7.14}$$

where * signifies a p value less than 1%.

Before deciding between the two models, let us look at the results briefly. In Equation (7.13) all the regression coefficients are individually as well as collectively significant (see the F value). The slope coefficient of 0.2975 means that, holding other variables constant, average expenditure on imported goods goes up by about 30 cents for every dollar increase in personal disposable income (PDI). Similarly, holding other variables constant (PDI here), the slope coefficient of -18.53 suggests that, on average over the sample period, expenditure on imported goods was decreasing by about 18.5 billion dollars per year. In other words, there was a downward trend. The R2 value is very high.

Turning to Equation (7.14), we see that the elasticity of import expenditure with respect to PDI was about 1.49, ceteris paribus. The coefficient of -0.0085 suggests that, holding other variables constant, on average, expenditure on imports was declining at the rate of about 0.85 percent (recall from Chapter 5 our discussion regarding logarithmic and semi-logarithmic models). The R2 value of this model is also quite high.

How do we choose between Eqs. (7.13) and (7.14)? Although the R2 values of the two models cannot be directly compared (Why?), they are both high. Also, both models are collectively significant (on the basis of the F test). For the linear model we can compute the elasticity of expenditure on imports with respect to PDI by using the mean values of these two variables. Calculations will show that this value is 1.7807.7 From the log model, we get this elasticity as 1.4857. Of course, the former elasticity is a kind of average, whereas the latter elasticity remains the same regardless of the value of X at which it is measured. So we cannot compare the two directly.
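The elasticity at the means for the linear model is simply the slope multiplied by the ratio of the sample means. The sketch below reproduces the figure using the slope from Eq. (7.13) and the means of X and Y that appear in the chapter's elasticity footnote; the function name `elasticity_at_means` is ours.

```python
def elasticity_at_means(slope, x_mean, y_mean):
    """Elasticity of Y with respect to X implied by a linear model, at the means."""
    return slope * x_mean / y_mean

slope_pdi = 0.2975   # coefficient of X in Eq. (7.13)
x_mean = 3306.688    # mean of personal disposable income, as given in the footnote
y_mean = 552.447     # mean of expenditure on imports, as given in the footnote

linear_elasticity = elasticity_at_means(slope_pdi, x_mean, y_mean)
log_elasticity = 1.4857  # read directly from Eq. (7.14)

print(round(linear_elasticity, 4))  # 1.7807, the value reported in the text
print(log_elasticity)
```

Because the linear-model elasticity changes with the point at which it is evaluated, the figure above is specific to the sample means, which is why it is not directly comparable with the constant elasticity of the log model.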

So where do we stand? Can we devise a test to choose between the two models? We will consider one such test in Section 7.7, and we will revisit this question then.

7.6 ERRORS OF MEASUREMENT

All along we have assumed implicitly that the dependent variable Y and the explanatory variables, the X’s, are measured without any errors. Thus, in the regression of consumption expenditure on income and wealth of households, we assume that the data on these variables are accurate; they are not guess estimates, extrapolated, interpolated, or rounded off in any systematic manner, such as to the nearest hundredth dollar. Unfortunately, this ideal is not met in practice for a variety of reasons, such as nonresponse errors, reporting errors, and computing errors.

The consequences of errors of measurement depend upon whether such errors are in the dependent variable or the explanatory variables.

Errors of Measurement in the Dependent Variable

If there are errors of measurement in the dependent variable only, the following consequences ensue, which we state without proof:8

1. The OLS estimators are unbiased.

2. The variances of OLS estimators are also unbiased.

3. But the estimated variances of the estimators are larger than in the case where there are no errors of measurement. The reason that the estimated variances of the estimators are larger than necessary is because the error in the dependent variable gets added to the common error term, ui.

So it seems that the consequences of measurement errors in the dependent variable may not matter much in practice.

Errors of Measurement in the Explanatory Variable(s)

In this case the consequences are as follows:

1. The OLS estimators are biased.

2. They are also inconsistent; that is, they remain biased even if the sample size increases indefinitely.

7Elasticity = $\dfrac{\partial Y}{\partial X}\cdot\dfrac{\bar{X}}{\bar{Y}} = 0.2975\,\dfrac{3306.688}{552.447} = 1.7807$.

8For details, see Gujarati and Porter, Basic Econometrics, 5th ed., McGraw-Hill, New York, 2009, pp. 482–486.

Obviously, an error of measurement in the explanatory variable(s) is a serious problem. Of course, if there are measurement errors in both the dependent and explanatory variables, the consequences can be quite serious.
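The contrast between the two cases can be illustrated by simulation. The sketch below is a minimal example under assumed settings: a two-variable model with slope 2, standard normal X and errors, and purely random measurement error with a standard deviation of 1. The function name `simulate_measurement_error` and every numerical value are hypothetical. It adds the measurement error first to Y only and then to X only, and averages the OLS slope in each case.

```python
import numpy as np

def simulate_measurement_error(n=500, reps=1000, B1=1.0, B2=2.0,
                               noise_sd=1.0, seed=0):
    """Average OLS slope when Y is mismeasured versus when X is mismeasured.

    All settings are illustrative assumptions; the measurement error is purely
    random and independent of X and of the regression error.
    """
    rng = np.random.default_rng(seed)
    slope_y_error = np.empty(reps)
    slope_x_error = np.empty(reps)
    for r in range(reps):
        x = rng.normal(size=n)
        y = B1 + B2 * x + rng.normal(size=n)
        y_observed = y + rng.normal(scale=noise_sd, size=n)  # error in Y only
        x_observed = x + rng.normal(scale=noise_sd, size=n)  # error in X only
        # np.polyfit with degree 1 returns [slope, intercept]
        slope_y_error[r] = np.polyfit(x, y_observed, 1)[0]
        slope_x_error[r] = np.polyfit(x_observed, y, 1)[0]
    return slope_y_error.mean(), slope_x_error.mean()

mean_slope_y_error, mean_slope_x_error = simulate_measurement_error()
print(mean_slope_y_error)  # expected to stay near the true slope of 2
print(mean_slope_x_error)  # expected to be pulled toward zero
```

With error in Y the average slope should remain close to 2, whereas with error in X it should shrink toward zero. In this particular setup, where the variance of X and the variance of the measurement error are both 1, the slope settles near half the true value.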

It is one thing to document the consequences of errors of measurement, but it is quite another thing to find the appropriate remedy because it may not be easy to detect such errors. For example, data on variables such as wealth are notoriously difficult to obtain. Similarly, data on income derived from activities such as the sale of illegal drugs or gambling are extremely difficult to obtain. In situations such as these not much can be done.

If there are errors of measurement in the explanatory variables, one suggested remedy is the use of instrumental or proxy variables. These variables, while highly correlated with the original X variables, are uncorrelated with the measurement errors and the usual regression error term, ui. In some situations such proxy variables can be found, but it is generally not that easy to find them.

The best practical advice is to make sure that the data on the X variables that you include in your model are measured as accurately as possible; avoid errors of recording, rounding, or omission. If there are changes in the definition of the variables over time, make sure that you use comparable data.

7.7 DETECTING SPECIFICATION ERRORS: TESTS OF SPECIFICATION ERRORS

To know the consequences of specification errors is one thing, but to find out that we have committed such errors is quite another thing, for we (hopefully) do not deliberately set out to commit such errors. Often specification errors arise inadvertently, perhaps because we have not formulated the model as precisely as possible because the underlying theory is weak, or we do not have the right kind of data to test the theoretically correct model, or the theory is silent about the functional form in which the dependent variable is related to explanatory variables. The practical issue is not that such errors are made, for they sometimes are, but how to detect them. Once it is found that specification errors have been made, the remedies often suggest themselves. If, for example, it can be shown that a variable is inappropriately omitted from a model, the obvious remedy is to include that variable in the analysis, assuming of course that data on that variable are available. We now consider several tests of specification errors.

Detecting the Presence of Unnecessary Variables

Suppose we have the following four-variable model:

$$Y_i = B_1 + B_2 X_{2i} + B_3 X_{3i} + B_4 X_{4i} + u_i \qquad (7.15)$$

Now if theory says that all three X variables determine Y, we should keep them in the model even though after empirical testing we find that the coefficient of one or more of the X variables is not statistically significant. Therefore, the question of irrelevant variables does not arise in this case. However, sometimes we have control variables in the model that are only there to prevent omitted variable bias. It may then be the case that if the control variables are not statistically significant and dropping them does not substantially alter our point estimates or hypothesis test results, then dropping them may clarify the model. We can then drop them but mention that they were tried and made no difference.9

Suppose in the model (7.15) X4 is the control variable in the sense that we are not absolutely sure whether it really belongs in the model. One simple way to find this out is to estimate the regression (7.15) and test the significance of b4, the estimator of B4. Under the null hypothesis that B4 = 0, we know that t = b4/se(b4) follows the t distribution with (n - 4) d.f. (Why?) Therefore, if the computed t value does not exceed the critical t value at the chosen level of significance, we do not reject the null hypothesis, in which case the variable X4 is probably a superfluous variable.10 Of course, if we reject the null hypothesis, the variable probably belongs in the model.

But suppose we are not sure that both X3 and X4 are relevant variables. In this case we would like to test the null hypothesis that B3 = B4 = 0. This can be done easily by the F test discussed in Chapter 4. (For details, see Section 4.12 on restricted least squares.)

Example 7.4. Life Expectancy in 85 Countries

To assess the impact of income and access to health care on life expectancy, we collected data on a sample of 85 countries and obtained the results shown in Table 7-1. The dependent variable in each case is life expectancy measured in years. (The raw data are given in Chapter 9, Table 9-6.)


9In this case the researcher should inform the reader that the results, including the dropped variables, could be made available on request.

10We say “probably” because if there is collinearity among X variables, then, as we show in Chapter 8, standard errors of the estimated parameters tend to be inflated relative to the values of the coefficients, thereby reducing the estimated t values.

TABLE 7-1 MODELS OF LIFE EXPECTANCY

Explanatory variable    Model 1               Model 2                 Model 3
Intercept               39.4380 (20.2392)     40.5082 (20.8204)       43.1662 (10.0172)
Income                  0.0054 (4.4417)       0.0016 (3.4848)         0.0014 (2.6836)
Access                  0.2833 (9.9599)       0.2499 (8.0803)         0.1491 (1.0010)
Income squared          --                    -6.28E-08 (-2.4060)     -5.54E-08 (-1.9612)
Access squared          --                    --                      0.0008 (0.6918)
R2                      0.7741                0.7892                  0.7904
F value                 140.5332              101.0906                75.4496

Notes: Income = per capita income in U.S. dollars. Access = an index of access to health care. The figures in parentheses are the estimated t values. -6.28E-08 is a short form for -0.0000000628. The difference among these models is that Model 3 includes all the variables, whereas the other two drop one or more variables.

A priori we would expect a positive relationship between income and life expectancy and between access and life expectancy. This expectation is borne out by Model 1. The addition of the income-squared variable in Model 2 is to find out if life expectancy increases at an increasing rate (in which case the squared income coefficient will be positive) or increases at a decreasing rate (in which case the squared income coefficient will be negative) with respect to income.11 The results show that it is the latter case. Model 3 adds the variable access-squared to find out if life expectancy is increasing at an increasing rate or at a decreasing rate with respect to access. The results indicate that it is increasing at an increasing rate. However, this coefficient is not statistically significant. Not only that, when we add this variable, the access coefficient itself becomes statistically insignificant. Does this mean that access and access-squared variables are superfluous?

To see if this is the case, we can use the F test given in Equation (4.56), which gives the following result:

$$F = \frac{(R^2_{ur} - R^2_r)/m}{(1 - R^2_{ur})/(n - k)} \sim F_{2,80}$$

$$F = \frac{(0.7904 - 0.7741)/2}{(1 - 0.7904)/(85 - 5)} = 3.1106$$

where $R^2_{ur} = 0.7904$ and $R^2_r = 0.7741$. Note that in the present case m = 2, and k = 5. For 2 d.f. in the numerator and 80 d.f. in the denominator, the probability of obtaining an F value of about 3.11 or greater is about 5 percent. It seems that access and access-squared are not superfluous variables. Is access-squared possibly a superfluous variable? Dropping this variable, we obtain Model 2, which shows that access has a statistically significant impact on life expectancy, which is not an unexpected result.

As this example shows, detecting the presence of an irrelevant variable(s) is not a difficult task. But it is very important to remember that in carrying out these tests of specifications, we have a specific model in mind, which we accept as the “true” model. Given that model, then, we can find out whether one or more X variables are really relevant by the usual t and F tests. However, bear in mind that we should not use t and F tests to build a model iteratively; that is, we cannot say that initially Y is related to X2 because b2 is statistically significant and then expand the model to include X3 and decide to keep that variable in the model if b3 turns out to be statistically significant. Such a procedure is known as stepwise regression.

11If you have a general quadratic equation like $Y = a + bX + cX^2$, then whether Y increases at an increasing or decreasing rate when X changes will generally depend on the signs of a, b, c and the value of X. On this, see Alpha C. Chiang, Fundamental Methods of Mathematical Economics, 3rd ed., McGraw-Hill, New York, 1984, Chapter 9.

This strategy, called data mining, is generally not recommended, for if a priori X3 belonged in the model to begin with, it should have been introduced. Excluding X3 in the initial regression would then lead to the omission-of-relevant-variable bias with the potentially serious consequences that we have already discussed. This point cannot be overemphasized: Theory must be the guide to model building; measurement without theory can lead up a blind alley.

In our life expectancy example income and access to health care are obviously important variables in determining life expectancy, although we are not entirely sure of the form in which these variables enter the model. So to some extent some kind of experimentation (data mining, if you will) will be necessary to determine the appropriate functional form of the relationship between the dependent and explanatory variables. This is especially so if there are several explanatory variables in a model and we cannot graph them together to get a visual impression about the likely form of the relationship between them and the dependent variable.
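Returning to the F test used in Example 7.4, the arithmetic is easy to reproduce. The short sketch below is only a convenience for checking such calculations; the function name `f_from_r2` and its argument names are our own choices, not notation from the text. The inputs are the values quoted in the example, where 0.7904 is the R2 of Model 3 and 0.7741 is the R2 reported for Model 1 in Table 7-1.

```python
def f_from_r2(r2_ur, r2_r, m, n, k):
    """F statistic for m linear restrictions, computed from R-squared values.

    r2_ur: R-squared of the unrestricted model
    r2_r:  R-squared of the restricted model
    m:     number of restrictions (regressors dropped)
    n:     number of observations
    k:     number of parameters in the unrestricted model, intercept included
    """
    return ((r2_ur - r2_r) / m) / ((1.0 - r2_ur) / (n - k))


# Values quoted in Example 7.4: 85 countries, 5 parameters in Model 3.
f_life = f_from_r2(r2_ur=0.7904, r2_r=0.7741, m=2, n=85, k=5)
print(f"F = {f_life:.4f}")
```

The result is about 3.11, in line with the value reported in the example; it is then compared with the critical F value for 2 and 80 d.f.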

Tests for Omitted Variables and Incorrect Functional Forms

The prescription that theory should be the underpinning of any model begs the question: What is theoretically the correct model? Thus, in our Phillips curve example discussed in an earlier chapter, although the rate of change of wages (Y) and the unemployment rate (X) are expected to be negatively related, they could be related in any of the following forms:

$$Y_t = B_1 + B_2 X_t + u_t, \quad B_2 < 0 \qquad (7.16)$$

$$\ln Y_t = B_1 + B_2 \ln X_t + u_t, \quad B_2 < 0 \qquad (7.17)$$

$$Y_t = B_1 + B_2 \frac{1}{X_t} + u_t, \quad B_2 > 0 \qquad (7.18)$$

Or are they related in some other functional relationship?

As noted in the introduction to this chapter, this is one of those questions that cannot be answered definitely. Pragmatically, we proceed as follows. Based upon theory or introspection and prior empirical work, we develop a model that we believe captures the essence of the subject under study. We then subject the model to empirical testing. After we obtain the results, we begin the postmortem, keeping in mind the criteria of a good model discussed earlier. It is at this stage that we learn if the chosen model is adequate. In determining model adequacy, we look at some broad features of the results, such as:

1. R2 and adjusted R2 ($\bar{R}^2$).
2. The estimated t ratios.
3. Signs of the estimated coefficients in relation to their prior expectations.

If these diagnostics are reasonably good, we accept the chosen model as a fair representation of reality.

By the same token, if the results do not look encouraging because the R2 is too low, or because very few coefficients are statistically significant or have the correct signs, then we begin to worry about model adequacy and to look for remedies. Perhaps we have omitted an important variable or have used the wrong functional form. To help determine whether model inadequacy is due to one or more of these problems, we can use some of the methods we are currently discussing.

Examination of Residuals

It is always a good practice to plot the residuals ei (or et, in time series) of the fitted model, for such a plot may reveal specification errors, such as omission of an important variable or incorrect functional form. As we will see in Chapters 9 and 10, a residual plot is an invaluable tool to diagnose heteroscedasticity and autocorrelation.

To see this, return to model (7.13) where we regressed expenditure on imports on PDI and year. Suppose we erroneously drop the year or trend variable and estimate the following regression:

$$Y_t = B_1 + B_2 X_t + v_t \qquad (7.19)$$

The results are as follows:

$$\hat{Y}_t = -136.1649 + 0.2082 X_t \qquad (7.20)$$
$$t = (-5.7782) \quad (38.0911); \qquad r^2 = 0.9693$$

Now if Eq. (7.13) is in fact the true model in that the trend variable X3 belongs in the model, but we use model (7.19), then we are implicitly saying that the error term in the model (7.19) is

$$v_t = B_3 X_{3t} + u_t \qquad (7.21)$$

because it will reflect not only the truly random term u, but also the variable X3. No wonder in this case residuals estimated from Eq. (7.19) will show some systematic pattern, which may be due to the excluded variable X3. This can be seen very vividly from Figure 7-2, which plots the residuals (S1) from the inappropriately estimated regression (7.19). Also shown in this figure are the residuals (S2) from the “correct” model (7.13).

The difference between the two residual series plotted in this figure is obvious. The residuals series S2 may suggest that even if we include the trend variable in our import expenditure function the residuals may not be entirely randomly distributed. If that is the case, model (7.13) itself may not be correctly specified. Perhaps an index of import prices in relation to domestic prices has been left out or perhaps a quadratic term in the trend variable is missing.

In any case, an examination of residuals from the estimated model is often an extremely useful adjunct to model building.
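The point that residuals from an underfitted model absorb the excluded variable can be checked on artificial data. The sketch below is not based on the import expenditure data: the data-generating process (an X that does not trend, a trend coefficient of 1.5, and the error variances) is assumed purely for illustration, and `fit_line` and `correlation` are helpers written for this example. It regresses Y on X alone and then measures how strongly the residuals move with the omitted trend.

```python
import math
import random

random.seed(1)  # fixed seed so the run is repeatable


def fit_line(x, y):
    """Return (intercept, slope) from the two-variable OLS regression of y on x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope


def correlation(a, b):
    """Sample correlation coefficient between two equal-length series."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    saa = sum((u - mean_a) ** 2 for u in a)
    sbb = sum((v - mean_b) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


n = 40
trend = list(range(1, n + 1))                  # the variable we will omit
x = [random.gauss(50, 10) for _ in range(n)]   # assumed: X unrelated to time
y = [10 + 0.5 * xi + 1.5 * t + random.gauss(0, 3) for xi, t in zip(x, trend)]

# Misspecified regression: Y on X only, trend left out.
intercept, slope = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

corr_with_trend = correlation(residuals, trend)
print("correlation of residuals with the included X:", round(correlation(residuals, x), 6))
print("correlation of residuals with the omitted trend:", round(corr_with_trend, 3))
```

By construction OLS residuals are uncorrelated with the included regressor, so the first figure is zero up to rounding; the second is far from zero, which is the numerical counterpart of the systematic pattern one would see in a residual plot.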

Besides examining the residuals, we can use several formal tests of model specification, such as: (1) the MacKinnon-White-Davidson (MWD) test, (2) Ramsey’s RESET (regression error specification) test, (3) the Wald test, (4) the Lagrange Multiplier test, (5) the Hausman test, and (6) Box-Cox transformations (to determine the functional form of the regression model). A full discussion of these tests is beyond the scope of this book.12 But we will discuss two of these tests, the MWD and RESET tests, in the sections that follow.

Choosing between Linear and Log-linear Regression Models: The MWD Test

Let us revisit the linear and log-linear specifications of the import expenditure function given in Equations (7.13) and (7.14), respectively. As we saw earlier, on the surface both models look reasonable, although the year variable is not statistically significant in Eq. (7.14). To see if one specification is better than the other, we can use the MWD test.13

We illustrate this test with our import expenditure example as follows:

H0: Linear Model: Y is a linear function of the X’s
H1: Log-linear Model: ln Y is a linear function of the X’s or a log of the X’s

where, as usual, H0 and H1 denote the null and alternative hypotheses.

FIGURE 7-2 Residuals from regressions (7.13) and (7.20). [Plot titled “Residuals from Regression of Y vs. X and of Y vs. X and Year”: residual series S1 and S2 plotted against Year, 1968 to 1998; vertical axis from -300 to 400.]

Notes: S1 are residuals from model (7.20) and S2 are residuals from model (7.13).

12For a somewhat elementary discussion of these tests, see Gujarati and Porter, Basic Econometrics, 5th ed., McGraw-Hill, New York, 2009, Chapter 13.

13J. MacKinnon, H. White, and R. Davidson, “Tests for Model Specification in the Presence of Alternative Hypotheses; Some Further Results,” Journal of Econometrics, vol. 21, 1983.

The MWD test involves the following steps:

1. Estimate the linear model and obtain the estimated Y values, that is, $\hat{Y}_i$.
2. Estimate the log-linear model and obtain the estimated ln Yi values, that is, $\widehat{\ln Y_i}$.
3. Obtain $Z_{1i} = \ln \hat{Y}_i - \widehat{\ln Y_i}$.
4. Regress Y on the X’s and Z1i. Reject H0 if the coefficient of Z1i is statistically significant by the usual t test.
5. Obtain $Z_{2i} = \text{antilog}(\widehat{\ln Y_i}) - \hat{Y}_i$.
6. Regress ln Y on the X’s or logs of X’s and Z2i. Reject H1 if the coefficient of Z2i in the preceding equation is statistically significant.

The idea behind the MWD test is simple. If the linear model is in fact the correct model, the constructed variable Z1i should not be significant, because in that case the estimated Y values from the linear model and those estimated from the log-linear model (after taking their antilog values for comparative purposes) should not be different. The same comment applies to the alternative hypothesis H1.

Reverting to our import expenditure example, assume that the true import expenditure function is linear. Under this hypothesis, following the steps just outlined, we obtain the results shown in Table 7-2.

TABLE 7-2 ILLUSTRATION OF THE MWD TEST: LINEAR SPECIFICATION

Variable      Coefficient    Standard error    t statistic    p value
Intercept     49707.4561     5867.9548         8.4710         0.0000
X             0.3314         0.0149            22.2137        0.0000
Year          -25.3498       2.9844            -8.4940        0.0000
Z1            -81.7933       19.8201           -4.1268        0.0002

R-squared     0.9884
F-statistic   1250.4978

Notes: Dependent variable is Y.

These results would lead to the rejection of the null hypothesis H0.

Let us see if H1 is acceptable. Following the procedure just outlined, we obtain the regression results shown in Table 7-3.

TABLE 7-3 ILLUSTRATION OF THE MWD TEST: LOG-LINEAR SPECIFICATION

Variable      Coefficient    Standard error    t statistic    p value
Intercept     3.9653         14.0229           0.2828         0.7787
ln X          1.4434         0.0977            14.7748        0.0000
Year          -0.0048        0.0074            -0.6417        0.5244
Z2            0.0013         0.0004            3.5630         0.0009

R-squared     0.9968
F-statistic   4558.1058

Notes: Dependent variable is ln(Y).

Since the coefficient of Z2 is statistically significant, we reject H1.

Looking at these results, it seems that either model is reasonable, although the trend variable, year, is not statistically significant in the log-linear model.
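The six steps of the MWD test can be sketched in code. What follows is a minimal illustration on simulated data, not a reproduction of Tables 7-2 and 7-3: we assume a single regressor and a data-generating process that is truly log-linear (elasticity 0.5, small multiplicative error), and the helpers `invert` and `ols` are written here for the example using only the standard library. Under these assumptions we would expect Z1 to be significant in the linear regression (so H0 is rejected); Z2 would usually, though not always, be insignificant in the log-linear regression.

```python
import math
import random

random.seed(2)  # fixed seed so the run is repeatable


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    k = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(k)] for i, row in enumerate(matrix)]
    for col in range(k):
        pivot_row = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def ols(y, regressors):
    """OLS of y on an intercept plus the given columns.

    Returns (coefficients, t_ratios, fitted_values)."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in regressors] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    xtx_inv = invert(xtx)
    coef = [sum(xtx_inv[a][c] * xty[c] for c in range(k)) for a in range(k)]
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    sigma2 = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - k)
    t_ratios = [coef[a] / math.sqrt(sigma2 * xtx_inv[a][a]) for a in range(k)]
    return coef, t_ratios, fitted


# Assumed data-generating process: ln Y = 1 + 0.5 ln X + u (truly log-linear).
x = [float(v) for v in range(1, 41)]
ln_x = [math.log(v) for v in x]
y = [math.exp(1.0 + 0.5 * lx + random.gauss(0, 0.02)) for lx in ln_x]
ln_y = [math.log(v) for v in y]

_, _, y_hat = ols(y, [x])                 # Step 1: linear model, fitted Y
_, _, ln_y_hat = ols(ln_y, [ln_x])        # Step 2: log-linear model, fitted ln Y
z1 = [math.log(a) - b for a, b in zip(y_hat, ln_y_hat)]   # Step 3
_, t_linear, _ = ols(y, [x, z1])          # Step 4: Y on X and Z1
z2 = [math.exp(b) - a for a, b in zip(y_hat, ln_y_hat)]   # Step 5
_, t_loglinear, _ = ols(ln_y, [ln_x, z2])  # Step 6: ln Y on ln X and Z2

t_z1, t_z2 = t_linear[-1], t_loglinear[-1]
print("t ratio of Z1 in the linear model:", round(t_z1, 2))
print("t ratio of Z2 in the log-linear model:", round(t_z2, 2))
```

Note that Step 3 takes the log of the fitted Y values from the linear model, so the sketch relies on those fitted values being positive, which holds for the assumed data but need not hold in general.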

Regression Error Specification Test: RESET

To detect the omission of variables and/or the choice of inappropriate functional form, Ramsey has developed a general test of model misspecification.14

To fix ideas, let us return to the import expenditure function, but now we regress expenditure on imports (Y) on personal disposable income only (X). This gives the following results:

$$\hat{Y}_t = -136.1649 + 0.2082 X_t \qquad (7.20) = (7.22)$$
$$t = (-5.7782) \quad (38.0911); \qquad r^2 = 0.9693$$

If you plot the residuals from this model against $\hat{Y}_t$, we obtain Figure 7-3. Although $\sum e_i$ and $\sum e_i \hat{Y}_i$ are necessarily zero because of the properties of OLS estimators discussed in Chapter 2, the residuals in this figure show a pattern (probably curvilinear) that might suggest that they vary in some fashion with the estimated Y values. This perhaps suggests that if we were to introduce $\hat{Y}_i$ in some form as an additional explanatory variable(s) in regression (7.22), it would increase R2. And if the increase in R2 were statistically significant (on the basis of the F test discussed in Chapter 4), it would suggest that the initial model was misspecified.

FIGURE 7-3 Residuals from regression of Y on X versus estimated Y. [Plot of residuals S4 (vertical axis, from -300 to 400) against Forecasted Y (horizontal axis, from -500 to 2000).]

Notes: S4 = Residuals; Forecasted Y = $\hat{Y}$.

14J. B. Ramsey, “Tests of Specification Errors in Classical Linear Least Squares Regression Analysis,” Journal of the Royal Statistical Society, Series B, vol. 31, 1969, pp. 350–371.

This is essentially the idea behind RESET. The steps involved in the application of RESET are as follows:

1. From the chosen model (e.g., Eq. [7.22]), obtain the estimated Yi, namely, $\hat{Y}_i$.
2. Rerun the chosen model by adding powers of $\hat{Y}_i$, such as $\hat{Y}_i^2$, $\hat{Y}_i^3$, etc., to capture the systematic relationship, if any, between the residuals and the estimated Yi. Since Figure 7-3 shows a curvilinear relationship between the residuals and the estimated Y values, let us consider the following model:

$$Y_t = B_1 + B_2 X_t + B_3 \hat{Y}_t^2 + B_4 \hat{Y}_t^3 + v_t \qquad (7.23)$$

where v is the error term of this model.

3. Let R2 obtained from Equation (7.23) be $R^2_{new}$ and that obtained from Eq. (7.22) be $R^2_{old}$. Then we can use the F test of Equation (4.56), namely,

$$F = \frac{(R^2_{new} - R^2_{old})/\text{number of new regressors}}{(1 - R^2_{new})/(n - \text{number of parameters in the new model})} \qquad (7.24)$$

to find out if the increase in R2 from using Eq. (7.23) is statistically significant.

4. If the computed F value is statistically significant at the chosen level of significance, we can conclude that the initial model (such as Eq. [7.22]) is misspecified.

For our example, the empirical counterpart of Eq. (7.23) is as shown in Table 7-4:

TABLE 7-4 ILLUSTRATION OF RAMSEY’S RESET

Variable          Coefficient    Standard error    t statistic    p value
Intercept         -39.7720       15.1193           -2.6306        0.0117
X                 0.1471         0.0133            11.0550        0.0000
$\hat{Y}^2$       0.0000         0.0001            -0.1458        0.8848
$\hat{Y}^3$       0.0000         0.0000            3.3763         0.0015

R-squared         0.9959

Notes: Dependent variable is Y.

Now applying the F test given in Equation (7.24), we obtain:

$$F = \frac{(0.9959 - 0.9693)/2}{(1 - 0.9959)/(48 - 4)} = 142.7317 \qquad (7.25)$$

For 2 d.f. in the numerator and 44 d.f. in the denominator, the 1% critical F value is 5.12263. Since the computed F value is much larger than this, the probability of obtaining an F value of as much as 142.7317 or greater must be very small. Using statistical software packages or electronic tables we find that the actual probability of this is basically 0.0000.

The conclusion that we draw from this exercise is that the model (7.22) is misspecified. This is not surprising because we saw earlier that the trend variable belongs in this model. It is quite possible that not only the trend variable but perhaps a squared trend variable should also be included in the model. To find this out, see Problem 7.18.

One advantage of the RESET test is that it is easy to apply, for it does not require that we specify what the alternative model is. But that is also its disadvantage because knowing that a model is misspecified does not help us necessarily in choosing an alternative model. Therefore, we can regard the RESET test primarily as a diagnostic tool.15
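The RESET steps can also be traced on artificial data. The sketch below is an illustration under stated assumptions rather than a replication of Table 7-4: the true relationship is assumed to be quadratic in X while the chosen model is linear, the sample size of 48 is chosen only to mirror the example, and `solve` and `fit` are helpers written for this sketch with the standard library. The fitted values from the linear model are squared and cubed, added as regressors as in Eq. (7.23), and the F statistic of Eq. (7.24) is computed from the two R2 values.

```python
import random

random.seed(3)  # fixed seed so the run is repeatable


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(k):
        p = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[p] = m[p], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    x = [0.0] * k
    for i in reversed(range(k)):
        x[i] = (m[i][k] - sum(m[i][j] * x[j] for j in range(i + 1, k))) / m[i][i]
    return x


def fit(y, regressors):
    """OLS of y on an intercept plus the given columns; returns (fitted, r_squared)."""
    rows = [[1.0] + [col[i] for col in regressors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return fitted, 1.0 - rss / tss


# Assumed truth: Y is quadratic in X, but the chosen model is linear in X.
n = 48
x = [i / 10 for i in range(1, n + 1)]
y = [1.0 + 0.5 * xi + 0.3 * xi ** 2 + random.gauss(0, 0.2) for xi in x]

# Step 1: estimate the chosen (linear) model and keep the fitted values.
y_hat, r2_old = fit(y, [x])

# Step 2: rerun the model with powers of the fitted values added.
y_hat_sq = [v ** 2 for v in y_hat]
y_hat_cu = [v ** 3 for v in y_hat]
_, r2_new = fit(y, [x, y_hat_sq, y_hat_cu])

# Step 3: F test on the increase in R-squared.
new_regressors = 2
params_new = 4
f_reset = ((r2_new - r2_old) / new_regressors) / ((1 - r2_new) / (n - params_new))

print("R2 old:", round(r2_old, 4), " R2 new:", round(r2_new, 4))
print("RESET F:", round(f_reset, 2))
```

A large F value here signals misspecification of the linear model but, as noted above, says nothing by itself about which alternative model to adopt.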

7.8 SUMMARY

The major points discussed in this chapter can be summarized as follows:

1. The classical linear regression model assumes that the model used in empirical analysis is “correctly specified.”
2. The term correct specification of a model can mean several things, including:
   a. No theoretically relevant variable has been excluded from the model.
   b. No unnecessary or irrelevant variables are included in the model.
   c. The functional form of the model is correct.
   d. There are no errors of measurement.
3. If a theoretically relevant variable(s) has been excluded from the model, the coefficients of the variables retained in the model are generally biased as well as inconsistent, and the error variance and the standard errors of the OLS estimators are biased. As a result, the conventional t and F tests remain of questionable value.
4. Similar consequences ensue if we use the wrong functional form.
5. The consequences of including irrelevant variable(s) in the model are less serious in that estimated coefficients still remain unbiased and consistent, the error variance and standard errors of the estimators are correctly estimated, and the conventional hypothesis-testing procedure is still valid. The major penalty we pay is that estimated standard errors tend to be relatively large, which means parameters of the model are estimated rather imprecisely. As a result, confidence intervals tend to be somewhat wider.
6. In view of the potential seriousness of specification errors, in this chapter we considered several diagnostic tools to help us find out if we have the specification error problem in any concrete situation. These tools include a graphical examination of the residuals and more formal tests, such as MWD and RESET.

15Note this technical point. Since $\hat{Y}_t$ is a random variable, its appearance as an explanatory variable in Eq. (7.23) means the use of t and F tests is justified only if the sample is reasonably large.

Since the search for a theoretically correct model can be exasperating, in this chapter we considered several practical criteria that we should keep in mind in this search, such as (1) parsimony, (2) identifiability, (3) goodness of fit, (4) theoretical consistency, and (5) predictive power.

As Granger notes, “In the ultimate analysis, model building is probably both an art and a science. A sound knowledge of theoretical econometrics and the availability of an efficient computer program are not enough to ensure success.”16

KEY TERMS AND CONCEPTS

The key terms and concepts introduced in this chapter are:

Attributes of a good model
   a) Parsimony (principle of parsimony)
   b) Identifiability
   c) Goodness of fit
   d) Theoretical consistency
   e) Predictive power

Specification errors and model misspecification errors
   a) Core variables
   b) Peripheral variables
   c) Underfitting a model (omitted variable bias)
   d) Overfitting a model (inclusion of irrelevant variable bias)
   e) Incorrect (wrong) functional form bias
   f) Instrumental or proxy variables

Specification error tests
   a) Unnecessary variables (stepwise regression; data mining)
   b) Tests for omitted variables and incorrect functional forms
   c) MacKinnon-White-Davidson (MWD) test
   d) Ramsey’s regression error specification (RESET) test

16See C. W. J. Granger (ed.), Modelling Economic Time Series: Readings in Econometric Methodology, Clarendon, Oxford, U.K., 1990, p. 2.

QUESTIONS

7.1. What is meant by specification errors?
7.2. What are the reasons for the occurrence of specification errors?
7.3. What are the attributes of a “good” econometric model?
7.4. What are different types of specification errors? Can one or more of these errors occur simultaneously?
7.5. What are the consequences of omitting a relevant variable(s) from a model?
7.6. When we say that a variable is “relevant” or “irrelevant,” what do we mean?
7.7. What are the consequences of including irrelevant variables in a model?
7.8. Omitting a relevant variable(s) from a model is more dangerous than including an irrelevant variable(s). Do you agree? Why or why not?
7.9. In looking for the simple Keynesian multiplier, you regress the GNP on investment and find that there is some relationship. Now, thinking that it cannot hurt much, you include the “irrelevant” variable “state and local taxes.” To your surprise, the investment variable loses its significance. How can an irrelevant variable do this?
7.10. What would you do if you had to choose between a model that satisfies all statistical criteria but does not satisfy economic theory and a model that fits established economic theory but does not fit many statistical criteria?

PROBLEMS

7.11. Table 7-5, found on the textbook’s Web site, gives data on the real gross product, labor input, and real capital input in the Taiwanese manufacturing sector for the years 1958 to 1972. Suppose the theoretically correct production function is of the Cobb-Douglas type, as follows:

ln Yt = B1 + B2 ln X2t + B3 ln X3t + ut

where ln = the natural log.
   a. Given the data shown in Table 7-5, estimate the Cobb-Douglas production function for Taiwan for the sample period and interpret the results.
   b. Suppose capital data were not initially available and therefore someone estimated the following production function:

      ln Yt = C1 + C2 ln X2t + vt

      where v = an error term. What kind of specification error is incurred in this case? What are the consequences? Illustrate with the data in Table 7-5.
   c. Now pretend that the data on labor input were not available initially and suppose you estimated the following model:

      ln Yt = C1 + C2 ln X3t + wt

      where w = an error term. What are the consequences of this type of specification error? Illustrate with the data given in Table 7-5.

7.12. Consider the following models:

Model I: Consumptioni = B1 + B2incomei + ui
Model II: Consumptioni = A1 + A2wealthi + vi

   a. How would you decide which of the models is the “true” model?
   b. Suppose you regress consumption on both income and wealth. How would this help you decide between the two models? Show the necessary details.

7.13. Refer to Equation 5.40 in Chapter 5, which discusses the regression-through-the-origin (i.e., zero-intercept) model. If there is in fact an intercept present in the model but you run it through the origin, what kind of specification error is committed? Document the consequences of this type of error with the data given in Table 2-13 (found on the textbook’s Web site) in Chapter 2.

7.14. Table 7-6 (found on the textbook’s Web site) gives data on the real rate of return (Y) on common stocks, the output growth (X2), and inflation (X3), all in percent for the United States for 1954 to 1981.
   a. Regress Y on X3.
   b. Regress Y on X2 and X3.
   c. Comment on the two regression results in view of Professor Eugene Fama’s observation that “the negative simple correlation between real stock returns and inflation is spurious (or false) because it is the result of two structural relationships: a positive relation between current real stock returns and expected output growth and a negative relationship between expected output growth and current inflation.”
   d. Do the regression in part (b) for the period 1956 to 1976, omitting the data for 1954 and 1955 due to unusual stock return behavior in those years, and compare this regression with the one obtained in part (b). Comment on the difference, if any, between the two.
   e. Suppose you want to run the regression for the period 1956 to 1981 but want to distinguish between the periods 1956 to 1976 and 1977 to 1981. How would you run this regression? (Hint: Think of the dummy variables.)

7.15. Table 7-7 (found on the textbook’s Web site) gives data on indexes of aggregate final energy demand (Y), the real gross domestic product, the GDP (X2), and the real energy price (X3) for the OECD countries (the United States, Canada, Germany, France, the United Kingdom, Italy, and Japan) for the period 1960 to 1982. (All indexes with base 1973 = 100.)
   a. Estimate the following models:

      Model A: ln Yt = B1 + B2 ln X2t + B3 ln X3t + u1t
      Model B: ln Yt = A1 + A2 ln X2t + A3 ln X2(t - 1) + A4 ln X3t + u2t
      Model C: ln Yt = C1 + C2 ln X2t + C3 ln X3t + C4 ln X3(t - 1) + u3t
      Model D: ln Yt = D1 + D2 ln X2t + D3 ln X3t + D4 ln Y(t - 1) + u4t

      where the u’s are the error terms. Note: Models B and C are called dynamic models, that is, models that explicitly take into account the changes of a variable over time. Models B and C are called distributed lag models because the impact of an explanatory variable on the dependent variable is spread over time, here over two time periods. Model D is called an autoregressive model because one of the explanatory variables is a lagged value of the dependent variable.
   b. If you estimate Model A only, whereas the true model is either B, C, or D, what kind of specification bias is involved?
   c. Since all the preceding models are log-linear, the slope coefficients represent elasticity coefficients. What are the income (i.e., with respect to GDP) and price elasticities for Model A? How would you go about estimating these elasticities for the other three models?
   d. What problems do you foresee with the OLS estimation of Model D since the lagged Y variable appears as one of the explanatory variables? (Hint: Recall the assumptions of the CLRM.)

7.16. Refer to Problem 7.11. Suppose you extend the Cobb-Douglas production function model by including the trend variable X4, a surrogate for technology. Suppose further that X4 turns out to be statistically significant. In that case, what type of specification error is committed? What if X4 turns out to be statistically insignificant? Present the necessary calculations.

7.17. Table 7-8 on the textbook’s Web site gives data on variables that might affect the demand for chickens in the United States. The dependent variable here is the per capita consumption of chickens, and the explanatory variables are per capita real disposable income and the prices of chicken and chicken substitutes (pork and beef).
   a. Estimate a log-linear model using these data.
   b. Estimate a linear model using these data.
   c. How would you choose between the two models? What test will you use? Show the necessary computations.

7.18. Suppose that we modify model (7.13) as follows:

   a. Estimate this model.
   b. If the $\text{Year}^2$ term in this model turns out to be statistically significant, what can you say about regression (7.13)?
   c. Is there a specification error involved here? If so, of what type? What are the consequences of this specification error?

7.19. Does more money help schools? To answer this question, Rubén Hernández-Murillo and Deborah Roisman present the data given in Table 7-9 on the textbook’s Web site.17 These data relate to several input and outcome variables for school districts in the St. Louis area and are for the academic year 1999 to 2000.
   a. Treating the Missouri Assessment Program (MAP) test score as the dependent variable, develop a suitable model to explain the behavior of MAP.
   b. Which variable(s) is crucial in determining MAP: economic or social?
   c. What is the rationale for the dummy variable?
   d. Would it be prudent to conclude from your analysis that spending per pupil and/or smaller student/teacher ratio are unimportant determinants of test scores?

7.20. In Bazemore v. Friday, 478 U.S. 385 (1986), a case involving pay discrimination in the North Carolina Extension Service, the plaintiff, a group of black agents, submitted a multiple regression model showing that, on average, the black agents’ salary was lower than that of their white counterparts. When the case reached the court of appeals, it rejected the plaintiff’s case on the grounds that their regression had not included all the variables thought to have an effect on salary. The Supreme Court, however, reversed the appeals court. It stated:18

The Court of Appeals erred in stating that petitioners’ regression analyses were “unacceptable as evidence of discrimination,” because they did not include all measurable variables thought to have an effect on salary level. The court’s view of the evidentiary value of the regression analysis was plainly incorrect. While the omission of variables from a regression analysis may render the analysis less probative than it otherwise might be, it can hardly be said, absent some other infirmity, that an analysis which accounts for the major factors

Yt = B1 + B2Xt + B3Time + B4Time2 + ut

CHAPTER SEVEN: MODEL SELECTION: CRITERIA AND TESTS 243

17See their article, “Tough Lesson: More Money Doesn’t Help Schools; Accountability Does,” The Regional Economist, Federal Reserve Bank of St. Louis, April 2004, pp. 12–13.

18The following is reproduced from Michael O. Finkelstein and Bruce Levin, Statistics for Lawyers, Springer-Verlag, New York, 1989, p. 374.

guj75845_ch07.qxd 4/16/09 11:57 AM Page 243

“must be considered unacceptable as evidence of discrimination.” Ibid. Normally, a failure to include variables will affect the analysis’ probativeness, not its admissibility.

Do you think the Supreme Court was correct in this decision? Articulate your views fully, bearing in mind the theoretical consequences of specification errors and practical realities.

7.21. Table 7-10 on the textbook’s Web site contains data about the manufacturing sector of all 50 states and the District of Columbia. The dependent variable is output, measured as “value added” in thousands of U.S. dollars, and the independent variables are worker hours and capital expenditures.

a. Predict output using a standard linear model. What is the function?

b. Create a log-linear model using the data as well. What is this function?

c. Use the MWD test to decide which model is more appropriate.


CHAPTER 8
MULTICOLLINEARITY: WHAT HAPPENS IF EXPLANATORY VARIABLES ARE CORRELATED?

In Chapter 4 we noted that one of the assumptions of the classical linear regression model (CLRM) is that there is no perfect multicollinearity, that is, no exact linear relationships among the explanatory variables, X’s, included in a multiple regression. In that chapter we explained intuitively the meaning of perfect multicollinearity and the reasons for assuming that it does not exist in the population regression function (PRF). In this chapter we take a closer look at the topic of multicollinearity. In practice, we rarely encounter perfect multicollinearity, but cases of near or very high multicollinearity, where explanatory variables are approximately linearly related, frequently arise in many applications. It is important to know what problems these correlated variables pose for the ordinary least squares (OLS) estimation of multiple regression models. Toward that end, in this chapter we will seek answers to the following questions:

1. What is the nature of multicollinearity?
2. Is multicollinearity really a problem?
3. What are the theoretical consequences of multicollinearity?
4. What are the practical consequences of multicollinearity?
5. In practice, how does one detect multicollinearity?
6. If it is desirable to eliminate the problem of multicollinearity, what remedial measures are available?


8.1 THE NATURE OF MULTICOLLINEARITY: THE CASE OF PERFECT MULTICOLLINEARITY

To answer these various questions, we consider first a simple numerical example, which is specially constructed to emphasize some crucial points about multicollinearity. Consider the data given in Table 8-1.

This table gives data on demand for widgets (Y) in relation to price (X2) and two measures of weekly consumer income, X3, as estimated, say, by a researcher, and X4, as estimated by another researcher. To distinguish between the two, we call X3 income and X4 earnings.

Since, besides the price, the income of the consumer is also an important determinant in the demand for most goods, we write the expanded demand function as

$$Y_i = A_1 + A_2 X_{2i} + A_3 X_{3i} + u_i \tag{8.1}$$

$$Y_i = B_1 + B_2 X_{2i} + B_3 X_{4i} + u_i \tag{8.2}$$

These demand functions differ in the measure of income used. A priori, or according to theory, A2 and B2 are expected to be negative (Why?), but A3 and B3 are expected to be positive (Why?).1

When an attempt was made to fit the regression (8.1) to the data in Table 8-1, the computer “refused” to estimate the regression.2 What went wrong? Nothing. By plotting the variables price (X2) and income (X3), we get the diagram shown in Figure 8-1.

And by trying to regress X3 on X2, we obtain the following results:

$$X_{3i} = 300 - 2X_{2i} \qquad R^2\,(= r^2) = 1.00 \tag{8.3}$$

TABLE 8-1 THE DEMAND FOR WIDGETS

Y            X2           X3                      X4
(quantity)   (price, $)   (income per week, $)    (earnings per week, $)

49           1            298                     297.5
45           2            296                     294.9
44           3            294                     293.5
39           4            292                     292.8
38           5            290                     290.2
37           6            288                     289.7
34           7            286                     285.8
33           8            284                     284.6
30           9            282                     281.1
29           10           280                     278.8

1According to economic theory, the income coefficient is expected to be positive for most normal economic goods. It is expected to be negative for what are called “inferior” goods.

2Usually, you will get a message saying that the X, or data, matrix is not positive definite; that is, it cannot be inverted. In matrix algebra such a matrix is called a singular matrix. Simply put, the computer cannot do the calculations.


In other words, the income variable X3 and the price variable X2 are perfectly linearly related; that is, we have perfect collinearity (or multicollinearity).3

Because of the relationship in Equation (8.3), we cannot estimate the regression (8.1), for if we substitute Eq. (8.3) into Eq. (8.1), we obtain

$$\begin{aligned} Y_i &= A_1 + A_2 X_{2i} + A_3(300 - 2X_{2i}) + u_i \\ &= (A_1 + 300A_3) + (A_2 - 2A_3)X_{2i} + u_i \\ &= C_1 + C_2 X_{2i} + u_i \end{aligned} \tag{8.4}$$

where

$$C_1 = A_1 + 300A_3 \tag{8.5}$$

$$C_2 = A_2 - 2A_3 \tag{8.6}$$

No wonder we could not estimate Eq. (8.1), for as Eq. (8.4) shows, we do not have a multiple regression but a simple two-variable regression between Y and X2. Now, although we can estimate Eq. (8.4) and obtain estimates of C1 and C2, from these two values we cannot obtain estimates of the original parameters A1, A2, and A3, for in Equations (8.5) and (8.6) we have only two equations but there are three unknowns to be estimated. (From school algebra we know that to estimate three unknowns we generally require three equations.)

The results of estimating the regression (8.4) are as follows:

$$\begin{aligned} \hat{Y}_i &= 49.667 - 2.1576X_{2i} \\ \text{se} &= (0.746)\ \ (0.1203) \\ t &= (66.538)\ \ (-17.935) \qquad r^2 = 0.9757 \end{aligned} \tag{8.7}$$

FIGURE 8-1 Scattergram between income (X3, vertical axis) and price (X2, horizontal axis). All points lie exactly on the line X3 = 300 - 2X2 (intercept 300, slope -2).

3Although the term collinearity refers to a single perfect linear relationship between variables and the term multicollinearity refers to more than one such relationship, from now on we will use the term multicollinearity in a generic sense to include both cases. The context will make it clear whether we have just one or more than one exact linear relationship.


As we can see, C1 = 49.667 and C2 = -2.1576. Try as we might, from these two values there is no way to retrieve the values of the three unknowns, A1, A2, and A3.4

The upshot of the preceding discussion is that in cases of perfect linear relationship or perfect multicollinearity among explanatory variables, we cannot obtain unique estimates of all parameters. And since we cannot obtain their unique estimates, we cannot draw any statistical inferences (i.e., hypothesis testing) about them from a given sample.

To put it bluntly, in cases of perfect multicollinearity, estimation and hypothesis testing about individual regression coefficients in a multiple regression are not possible. It is a dead-end issue. Of course, as Eqs. (8.5) and (8.6) show, we can obtain estimates of a linear combination (i.e., the sum or difference) of the original coefficients, but not of each of them individually.
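The dead end described above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; it uses the Table 8-1 data, and the helper `implied_a` and the two trial values of A3 are illustrative choices, not part of the text. It shows that the data matrix of regression (8.1) has fewer independent columns than parameters, and that different arbitrary values of A3 reproduce exactly the same fitted values.

```python
import numpy as np

# Data from Table 8-1: quantity (Y), price (X2), income (X3)
y = np.array([49, 45, 44, 39, 38, 37, 34, 33, 30, 29], dtype=float)
x2 = np.arange(1, 11, dtype=float)
x3 = np.array([298, 296, 294, 292, 290, 288, 286, 284, 282, 280], dtype=float)

# Data matrix of regression (8.1): intercept, X2, X3
X = np.column_stack([np.ones_like(x2), x2, x3])
rank = np.linalg.matrix_rank(X)
# A rank below the number of columns means X'X is singular
print(f"columns = {X.shape[1]}, rank = {rank}")

# Regression (8.4): Y on an intercept and X2 only, which is estimable
c, *_ = np.linalg.lstsq(X[:, :2], y, rcond=None)
print("C1, C2 =", c.round(4))

def implied_a(a3):
    """Solve Eqs. (8.5)-(8.6) for A1 and A2 after fixing A3 arbitrarily."""
    return np.array([c[0] - 300.0 * a3, c[1] + 2.0 * a3, a3])

# Two arbitrary (illustrative) choices of A3 give the same fitted values,
# so the data cannot tell the two parameter sets apart
fit_a = X @ implied_a(0.0)
fit_b = X @ implied_a(1.0)
print("same fitted values:", np.allclose(fit_a, fit_b))
```

This mirrors footnote 4: fixing one of the A’s pins down the other two, but the choice is arbitrary and every choice fits the sample equally well.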

8.2 THE CASE OF NEAR, OR IMPERFECT, MULTICOLLINEARITY

The case of perfect multicollinearity is a pathological extreme. In most applications involving economic data two or more explanatory variables are not exactly linearly related but can be approximately so. That is, collinearity can be high but not perfect. This is the case of near, or imperfect, or high multicollinearity. We will explain what we mean by “high” collinearity shortly. From now on when talking about multicollinearity, we are referring to imperfect multicollinearity. As we saw in Section 8.1, the case of perfect multicollinearity is a blind alley.

To see what we mean by near, or imperfect, multicollinearity, let us return to our data in Table 8-1, but this time, we estimate regression (8.2) with earnings as the income variable. The regression results are as follows:

$$\begin{aligned} \hat{Y}_i &= 145.37 - 2.7975X_{2i} - 0.3191X_{4i} \\ \text{se} &= (120.06)\ \ (0.8122)\ \ (0.4003) \\ t &= (1.2107)\ \ (-3.4444)\ \ (-0.7971) \qquad R^2 = 0.9778 \end{aligned} \tag{8.8}$$

These results are interesting for several reasons:

1. Although the regression (8.1) cannot be estimated, we can estimate the regression (8.2), even though the difference between the two income variables is very small, which can be seen visually from the last two columns of Table 8-1.5


4Of course, if the value of one of A1, A2, and A3 is fixed arbitrarily, then the values of the other two A’s can be obtained from the estimated C’s. But these values will not be unique, for they depend on the value arbitrarily chosen for one of the A’s. To reiterate, there is no way of obtaining unique values of three unknowns (the three A’s) from two knowns (the two C’s).

5It is time to let the “cat out of the bag.” The earnings figures reported in column 4 of Table 8-1 were constructed from the following relation: X4i = X3i + ui, where the u’s are random terms obtained from a random number table. The 10 values of u are as follows: -0.5, -1.1, -0.5, 0.8, 0.2, 1.7, -0.2, 0.6, -0.9, and -1.2.


2. As expected, price coefficients are negative in both Equations (8.7) and (8.8), and the numerical difference between the two is not vast. Each price coefficient is statistically significantly different from zero (Why?), but notice that, relatively speaking, the |t| value of the coefficient in Eq. (8.7) is much greater than the corresponding |t| value in Eq. (8.8). Or, what amounts to the same thing, the standard error (se) of the price coefficient in Eq. (8.7) is much smaller than that in Eq. (8.8).

3. The R2 value in Eq. (8.7) with one explanatory variable is 0.9757, whereas in Eq. (8.8) with two explanatory variables it is 0.9778, an increase of only 0.0021, which does not appear to be a great increase. It can be shown that this increase in the R2 value is not statistically significant.6

4. The coefficient of the income (earnings) variable is statistically insignificant, but, more importantly, it has the wrong sign. For most commodities, income has a positive effect on the quantity demanded, unless the commodity in question happens to be an inferior good.

5. Despite the insignificance of the income variable, if we were to test the hypothesis that B2 = B3 = 0 (i.e., the hypothesis that R2 = 0), the hypothesis could be rejected easily by applying the F test given in expression (4.49) or (4.50). In other words, collectively or together, price and earnings have a significant impact on the quantity demanded.
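The five observations above can be reproduced directly from Table 8-1. The sketch below assumes NumPy; the helper `ols` is a hypothetical convenience function written for this illustration, not a routine from the text. It re-estimates regressions (8.7) and (8.8) and computes the overall F statistic in the form of Eq. (4.50).

```python
import numpy as np

def ols(X, y):
    """OLS of y on X (X already contains the intercept column).
    Returns coefficients, standard errors, t ratios, and R^2."""
    n, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    sigma2 = resid @ resid / (n - k)          # estimate of the error variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return b, se, b / se, r2

# Data from Table 8-1: quantity (Y), price (X2), earnings (X4)
y = np.array([49, 45, 44, 39, 38, 37, 34, 33, 30, 29], dtype=float)
x2 = np.arange(1, 11, dtype=float)
x4 = np.array([297.5, 294.9, 293.5, 292.8, 290.2,
               289.7, 285.8, 284.6, 281.1, 278.8])
ones = np.ones_like(x2)

b7, se7, t7, r2_7 = ols(np.column_stack([ones, x2]), y)       # Eq. (8.7)
b8, se8, t8, r2_8 = ols(np.column_stack([ones, x2, x4]), y)   # Eq. (8.8)

# Overall F test of B2 = B3 = 0 in Eq. (8.8), using the form of Eq. (4.50)
n, k = len(y), 3
f_stat = (r2_8 / (k - 1)) / ((1.0 - r2_8) / (n - k))

print("Eq. (8.7): b =", b7.round(4), "se =", se7.round(4), "R2 =", round(r2_7, 4))
print("Eq. (8.8): b =", b8.round(4), "se =", se8.round(4), "R2 =", round(r2_8, 4))
print("t ratios in Eq. (8.8):", t8.round(4))
print("overall F for Eq. (8.8):", round(f_stat, 2))
```

Comparing the two printed lines shows the standard error of the price coefficient rising sharply once earnings are added, while the F statistic remains large even though the earnings t ratio is small.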

What explains these “strange” results? As a clue, let us plot X2 against X4, price against earnings. (See Figure 8-2.) Unlike Figure 8-1, we see that although price and earnings are not perfectly linearly related, there is a high degree of dependency between the two.

FIGURE 8-2 Earnings (X4, vertical axis) and price (X2, horizontal axis) relationship, with the fitted line X4 = 299.92 - 2.0055X2.

6This can be shown with the F test discussed in Chapter 4.

This can be seen more clearly from the following regression:

$$\begin{aligned} X_{4i} &= 299.92 - 2.0055X_{2i} + e_i \\ \text{se} &= (0.6748)\ \ (0.1088) \\ t &= (444.44)\ \ (-18.44) \qquad r^2 = 0.9770 \end{aligned} \tag{8.9}$$

As this regression shows, price and earnings are highly correlated; the coefficient of correlation is -0.9884 (which is the negative square root of r2). This is the case of near perfect linear relationship, or near perfect multicollinearity. If the coefficient of correlation were -1, as in Eq. (8.3), this would be the case of perfect multicollinearity. Notice carefully, in Eq. (8.3) we have not added ei because the linear relationship between X2i and X3i is perfect, whereas in Equation (8.9) we have added it to show that the linear relationship between X4i and X2i is not perfect.

In passing, note that if there are just two explanatory variables, the coefficient of correlation r can be used as a measure of the degree or strength of collinearity. But if more than two explanatory variables are involved, as we will show later, the coefficient of correlation may not be an adequate measure of collinearity.

8.3 THEORETICAL CONSEQUENCES OF MULTICOLLINEARITY

Now that we have discussed the nature of perfect and imperfect multicollinearity somewhat heuristically, let us state the consequences of multicollinearity a bit more formally. But keep in mind that from now on we consider only the case of imperfect multicollinearity, for perfect multicollinearity leads us nowhere.

As we know, given the assumptions of the CLRM, OLS estimators are best linear unbiased estimators (BLUE). In the class of all linear unbiased estimators, OLS estimators have the least possible variance. It is interesting that so long as collinearity is not perfect, OLS estimators still remain BLUE even though one or more of the partial regression coefficients in a multiple regression can be individually statistically insignificant. Thus, in Eq. (8.8), the income coefficient is statistically insignificant although the price coefficient is statistically significant. But OLS estimates presented in Eq. (8.8) still retain their BLUE property.7 Then why all the fuss about multicollinearity? There are several reasons:

1. It is true that even in the presence of near collinearity, the OLS estimators are unbiased. But remember that unbiasedness is a repeated sampling property. What this says is that, keeping the values of the X variables fixed, if we obtain several samples and compute the OLS estimates for each of these samples, the average value of the estimates will tend to converge to the true population value of the parameters. But this says nothing about the properties of estimates given in any given sample. In reality, we rarely have the luxury of replicating samples.


7Since imperfect multicollinearity per se does not violate any of the assumptions listed in Chapter 4, OLS estimators retain the BLUE property.


2. It is also true that near collinearity does not destroy the minimum variance property of OLS estimators. In the class of all linear unbiased estimators, OLS estimators have minimum variance. This does not mean, however, that the variance of an OLS estimator will be small (in relation to the value of the estimator) in any given sample, as the regression (8.8) shows very clearly. It is true that the estimator of the income coefficient is BLUE, but in the sample at hand its variance is so large compared to the estimate that the computed t value (under the null hypothesis that the true income coefficient is zero) is only -0.7971. This would lead us to not reject the hypothesis that income has no effect on the quantity of widgets demanded. In short, minimum variance does not mean the numerical value of the variance will be small.

3. Multicollinearity is essentially a sample (regression) phenomenon in the sense that even if the X variables are not linearly related in the population (i.e., PRF), they can be so related in a particular sample, such as that of Table 8-1. When we postulate the PRF, we believe that all X variables included in the model have a separate or independent effect on the dependent variable Y. But it can happen that in any given sample that is used to estimate the PRF some or all X variables are so highly collinear that we cannot isolate their individual influence on Y. Our sample lets us down, so to speak, although the theory says that all X’s are important. And this happens because most economic data are not obtained in controlled laboratory experiments. Data on variables such as the gross domestic product (GDP), prices, unemployment, profits, and dividends are usually observed as they occur and are not obtained experimentally. If these data could be obtained experimentally to begin with, we would not allow collinearity to exist. Since data are usually obtained nonexperimentally, and if there is near collinearity in two or more explanatory variables, often we are in “the statistical position of not being able to make bricks without straw.”8

For all these reasons, the fact that OLS estimators are BLUE despite (imperfect) multicollinearity is of little consolation in practice. Therefore, we must try to find out what happens or is likely to happen in any given sample. As noted, collinearity is usually a sample-specific phenomenon.
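The repeated-sampling argument can be illustrated with a small simulation. This is only a sketch under stated assumptions: the “true” coefficients, the error standard deviation, the number of replications, and the random seed below are hypothetical values chosen for illustration, not figures from the text. The X values are held fixed at the price and earnings columns of Table 8-1.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (arbitrary) for reproducibility

# Fixed regressors from Table 8-1: price (X2) and earnings (X4)
x2 = np.arange(1, 11, dtype=float)
x4 = np.array([297.5, 294.9, 293.5, 292.8, 290.2,
               289.7, 285.8, 284.6, 281.1, 278.8])
X = np.column_stack([np.ones(10), x2, x4])

# Hypothetical population values (assumptions, not taken from the text)
beta = np.array([10.0, -2.0, 0.1])   # intercept, price, earnings
sigma = 1.0                          # standard deviation of the error term
reps = 5000                          # number of repeated samples

# Draw many samples of Y with X held fixed and estimate each one by OLS
U = rng.normal(0.0, sigma, size=(10, reps))
Y = (X @ beta)[:, None] + U
B, *_ = np.linalg.lstsq(X, Y, rcond=None)   # shape (3, reps)

mean_b3 = B[2].mean()                 # average earnings coefficient
sd_b3 = B[2].std(ddof=1)              # spread across samples
share_negative = np.mean(B[2] < 0.0)  # share of samples with the "wrong" sign

print("true B3 =", beta[2], " average estimate =", round(mean_b3, 4))
print("s.d. of the estimates across samples =", round(sd_b3, 4))
print("share of samples with a negative estimate =", round(share_negative, 3))
```

Under these assumptions the average of the estimates sits close to the assumed true value, which is what unbiasedness promises, yet the spread across samples is several times the size of the coefficient itself, so any single sample can easily produce an estimate with the wrong sign.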

8.4 PRACTICAL CONSEQUENCES OF MULTICOLLINEARITY

In cases of near or high multicollinearity, as in our demand for widget regression (8.8), we are likely to encounter one or more of the following consequences:

1. Large variances and standard errors of OLS estimators. This is clearly seen from the widget regressions (8.7) and (8.8). As discussed earlier, because of high collinearity between price (X2) and earnings (X4), when both variables are included in the regression (8.8), the standard error of the coefficient of the price variable increases dramatically compared with the regression (8.7). As we know, if the standard error of an estimator increases, it becomes more difficult to estimate the true value of the parameter precisely. That is, there is a fall in the precision of OLS estimators.

8J. Johnston, Econometric Methods, 2nd ed., McGraw-Hill, New York, 1972, p. 164.

2. Wider confidence intervals. Because of large standard errors, confidence intervals for the relevant population parameters tend to be wide.

3. Insignificant t ratios. Recall that to test the hypothesis that in our regression (8.8) the true B3 = 0, we use the t ratio b3/se(b3) and compare the estimated t value with the critical t value from the t table. But as previously seen, in cases of high collinearity the estimated standard errors increase dramatically, thereby making t values smaller in absolute value. Therefore, in such cases we will increasingly fail to reject the null hypothesis that the relevant true population coefficient is zero. Thus, in the regression (8.8), since the t value is only -0.7971, we might jump to the conclusion that in the widget example income has no effect on the quantity demanded.

4. A high R2 value but few significant t ratios. The regression (8.8) shows this clearly. The R2 in this regression is quite high, about 0.98, but only the t ratio of the price variable is significant. And yet on the basis of the F ratio, as we have seen, we can reject the hypothesis that the price and earnings variables simultaneously have no effect on the quantity of widgets demanded.

5. OLS estimators and their standard errors become very sensitive to small changes in the data; that is, they tend to be unstable. To see this, return to Table 8-1. Suppose we change the data on the earnings variable X4 slightly. The first, fifth, and tenth observations are now 295, 287, and 274, respectively. All other values remain intact. The result of this change gives the following regression:

$$\begin{aligned} \hat{Y}_i &= 100.56 - 2.5164X_{2i} - 0.16995X_{4i} \\ \text{se} &= (48.030)\ \ (0.35906)\ \ (0.1604) \\ t &= (2.0936)\ \ (-7.0083)\ \ (-1.0597) \qquad R^2 = 0.9791 \end{aligned} \tag{8.10}$$

Comparing Eq. (8.8) with regression (8.10), we observe that as a result of a very small change in the data, the regression results change quite substantially. Relatively speaking, standard errors have gone down in Eq. (8.10), and, as a result, t ratios have increased in absolute values and the coefficient of the income variable has now become less negative than before.

Why such a change? In the regression (8.8) the coefficient of correlation between X2 and X4 was -0.9884, whereas in the regression (8.10) it was -0.9431. In other words, the degree of collinearity between X2 and X4 has decreased in going from Eq. (8.8) to Eq. (8.10). Although the decrease in the correlation coefficient does not seem astounding, the change in regression results is noticeable. And this is precisely what happens in cases of near perfect collinearity.
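The instability described in this item can be verified by re-running the regression after altering the three earnings observations. The following sketch assumes NumPy; the helper `ols` is a hypothetical convenience function for this illustration. It compares the coefficients, standard errors, and the X2–X4 correlation before and after the change.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and their standard errors for y = Xb + u."""
    n, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return b, se

# Data from Table 8-1: quantity (Y), price (X2), earnings (X4)
y = np.array([49, 45, 44, 39, 38, 37, 34, 33, 30, 29], dtype=float)
x2 = np.arange(1, 11, dtype=float)
x4 = np.array([297.5, 294.9, 293.5, 292.8, 290.2,
               289.7, 285.8, 284.6, 281.1, 278.8])

# Change the first, fifth, and tenth earnings observations as in the text
x4_new = x4.copy()
x4_new[[0, 4, 9]] = [295.0, 287.0, 274.0]

ones = np.ones_like(x2)
b_old, se_old = ols(np.column_stack([ones, x2, x4]), y)      # Eq. (8.8)
b_new, se_new = ols(np.column_stack([ones, x2, x4_new]), y)  # Eq. (8.10)

# Correlation between price and earnings before and after the change
r_old = np.corrcoef(x2, x4)[0, 1]
r_new = np.corrcoef(x2, x4_new)[0, 1]

print("original data: b =", b_old.round(4), "se =", se_old.round(4), "r =", round(r_old, 4))
print("modified data: b =", b_new.round(4), "se =", se_new.round(4), "r =", round(r_new, 4))
```

Only three of the thirty regressor values differ between the two runs, which makes the size of the shift in the estimates and standard errors easy to appreciate.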

6. Wrong signs for regression coefficients. As regressions (8.8) and (8.10) show, the earnings variable has the “wrong” sign, for economic theory would have us believe that for most commodities the income effect is positive. Of course, with an inferior good this is not a wrong sign. Therefore, we have to be careful in attributing the wrong sign to multicollinearity alone, but it should not be ruled out either.

7. Difficulty in assessing the individual contributions of explanatory variables to the explained sum of squares (ESS) or R2. We can illustrate this point again with our widget example. In Eq. (8.7) we regressed quantity (Y) on price (X2) alone, giving an R2 value of 0.9757. In regression (8.8) we regressed Y on both price and earnings, obtaining an R2 of 0.9778. Now if we regress Y on X4 alone, we obtain the following results:

$$\begin{aligned} \hat{Y}_i &= -263.74 + 1.0438X_{4i} \\ \text{se} &= (26.929)\ \ (0.0932) \\ t &= (-9.794)\ \ (11.200) \qquad R^2 = 0.9400 \end{aligned} \tag{8.11}$$

Lo and behold, earnings (X4) alone explains 94 percent of the variation in the quantity demanded. In addition, the earnings coefficient is not only statistically significant, but it is also positive, in accord with theoretical expectations!

As shown previously, in the multiple regression (8.8) the R2 value is 0.9778. What part of it is due to X2 and what part is due to X4? We cannot tell precisely because the two variables are so highly collinear that when one moves the other moves with it almost automatically, as the regression (8.9) so clearly demonstrates. Therefore, in cases of high collinearity it is futile to assess the contribution of each explanatory variable to the overall R2.

A question: Can the consequences of multicollinearity that we have illustrated earlier be established rigorously? Yes indeed! But we will skip the proofs here since they can be found elsewhere.9

8.5 DETECTION OF MULTICOLLINEARITY

As demonstrated in the previous section, practical consequences of multicollinearity can be far-ranging, the BLUE property notwithstanding. So, what can we do about resolving the multicollinearity problem? Before resolving it, we must first find out if we have a collinearity problem to begin with. In short, how do we detect the presence of and severity of multicollinearity?

9The proofs are shown in Gujarati and Porter, Basic Econometrics, 5th ed., McGraw-Hill, New York, 2009, Chapter 10.


Now we have a problem, for as noted earlier, multicollinearity is sample-specific; it is a sample phenomenon. Here it is useful to keep in mind the following warning:10

1. Multicollinearity is a question of degree and not of kind. The meaningful distinction is not between the presence and the absence of multicollinearity, but between its various degrees.

2. Since multicollinearity refers to the condition of the explanatory variables that are assumed to be nonstochastic, it is a feature of the sample and not of the population.

Therefore, we do not “test for multicollinearity” but can, if we wish, measure its degree in any particular sample.

Having stated that, we must add that we do not have a single measure of multicollinearity, for in nonexperimentally collected data we can never be sure about the nature and degree of collinearity. What we have are some rules of thumb, or indicators, that will provide us with some clue about the existence of multicollinearity in concrete applications. Some of these indicators follow.

1. High R2 but few significant t ratios. As noted earlier, this is the “classic” symptom of multicollinearity. If R2 is high, say, in excess of 0.8, the F test in most cases will reject the null hypothesis that the partial slope coefficients are jointly or simultaneously equal to zero. But individual t tests will show that none or very few partial slope coefficients are statistically different from zero. Our widget regression (8.8) bears this out fully.

2. High pairwise correlations among explanatory variables. If in a multiple regression involving, say, six explanatory variables, we compute the coefficient of correlation between any pair of these variables using the formula (B.46) in Appendix B, and if some of these correlations are high, say, in excess of 0.8, there is the possibility that some serious collinearity exists. Unfortunately, this criterion is not often reliable, for pairwise correlations can be low (suggesting no serious collinearity) yet collinearity is suspected because very few t ratios are statistically significant.11

3. Examination of partial correlations. Suppose we have three explanatory variables, X2, X3, and X4. Let r23, r24, and r34 represent the pairwise correlations between X2 and X3, between X2 and X4, and between X3 and X4, respectively. Suppose r23 = 0.90, indicating high collinearity between X2 and X3. Now consider the correlation coefficient, called the partial correlation coefficient, r23.4, which is the coefficient of correlation between X2 and X3, holding the influence of the variable X4 constant (the concept is similar to that of the partial regression coefficient discussed in Chapter 4). Suppose r23.4 = 0.43; that is, holding the influence of the variable X4 constant, the correlation coefficient between X2 and X3 is only 0.43, whereas not taking into account the influence of X4, it is 0.90. Then, judged by the partial correlation, we cannot say that the collinearity between X2 and X3 is necessarily high.

10Jan Kmenta, Elements of Econometrics, 2nd ed., Macmillan, New York, 1986, p. 431.

11For technical details, see Gujarati and Porter, Basic Econometrics, 5th ed., McGraw-Hill, New York, 2009, Chapter 10.

As we can see, in the context of several explanatory variables, reliance on simple pairwise correlations as indicators of multicollinearity can be misleading. Unfortunately, the substitution of simple pairwise correlations by partial correlation coefficients does not provide a definitive indicator of the presence of multicollinearity or otherwise. The latter provides only another device to check the nature of multicollinearity.12
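For readers who want to compute a partial correlation from pairwise correlations, the standard first-order formula is r23.4 = (r23 - r24 r34) / sqrt[(1 - r24^2)(1 - r34^2)]. The sketch below implements it in plain Python. The text gives only r23 = 0.90 and r23.4 = 0.43; the values of r24 and r34 used here are hypothetical, chosen solely so that the result lands near 0.43.

```python
import math

def partial_corr(r23, r24, r34):
    """First-order partial correlation between X2 and X3, holding X4 constant."""
    return (r23 - r24 * r34) / math.sqrt((1.0 - r24 ** 2) * (1.0 - r34 ** 2))

r23 = 0.90          # pairwise correlation given in the text
r24 = r34 = 0.908   # hypothetical values, not from the text

r23_4 = partial_corr(r23, r24, r34)
print("r23 =", r23, " r23.4 =", round(r23_4, 3))

# If X4 were uncorrelated with both X2 and X3, nothing would change
print("with r24 = r34 = 0:", partial_corr(r23, 0.0, 0.0))
```

The example shows how a high pairwise correlation can shrink substantially once a third variable that is strongly related to both is held constant.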

4. Subsidiary, or auxiliary, regressions. Since multicollinearity arises because one or more of the explanatory variables are exact or near exact linear combinations of other explanatory variables, one way of finding out which X variable is highly collinear with other X variables in the model is to regress each X variable on the remaining X variables and to compute the corresponding R2. Each of these regressions is called a subsidiary or an auxiliary regression, auxiliary to the main regression of Y on all X’s.

For example, consider the regression of Y on X2, X3, X4, X5, X6, and X7, that is, on six explanatory variables. If this regression shows that we have a problem of multicollinearity because, say, the R2 is high but very few X coefficients are individually statistically significant, we then look for the “culprit,” the variable(s) that may be a perfect or near perfect linear combination of the other X’s. We proceed as follows:

(a) Regress X2 on the remaining X’s and obtain the coefficient of determination, say, $R_2^2$.

(b) Regress X3 on the remaining X’s and obtain its coefficient of determination, $R_3^2$.

Continue this procedure for the remaining X variables in the model. In the present example we will have six such auxiliary regressions, one for each explanatory variable.

How do we decide which of the X variables are collinear? The estimated $R_i^2$ will range between 0 and 1. (Why?) If an X variable is not a linear combination of the other X’s, then the $R_i^2$ of that regression should not be statistically significantly different from zero. And from Chapter 4, Eq. (4.50), we know how to test the assumption that a particular coefficient of determination is statistically equal to zero.

Continuing with our hypothetical example involving six explanatory variables, suppose we want to test the hypothesis that $R_2^2 = 0$; that is, X2 is not collinear with the remaining five X’s. Now we use Eq. (4.50), which is

$$F = \frac{R^2/(k - 1)}{(1 - R^2)/(n - k)} \tag{4.50}$$

where n is the number of observations and k is the number of explanatory variables including the intercept. Let us illustrate.

12For technical details, see Gujarati and Porter, op. cit.

In our hypothetical example involving six explanatory variables, sup- pose that we regress each of the X variables on the remaining X’s in a sample involving 50 observations. The R2 values obtained from the vari- ous auxiliary regressions are as follows:

$R_2^2$ = 0.90 (in the regression of X2 on other X’s)

$R_3^2$ = 0.18 (in the regression of X3 on other X’s)

$R_4^2$ = 0.36 (in the regression of X4 on other X’s)

$R_5^2$ = 0.86 (in the regression of X5 on other X’s)

$R_6^2$ = 0.09 (in the regression of X6 on other X’s)

$R_7^2$ = 0.24 (in the regression of X7 on other X’s)

The results of applying the F test given in Eq. (4.50) are given in Table 8-2. As this table shows, the variables X2, X4, X5, and X7 seem to be collinear with the other X’s, although the degree of collinearity, as measured by R2, varies considerably. This example points out the important fact that a seemingly low R2, such as 0.36, can still be statistically significantly different from zero. A concrete economic example of auxiliary regressions is given in Section 8.7.
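The F values reported in Table 8-2 follow directly from Eq. (4.50). As a minimal sketch, the following Python lines recompute them from the six auxiliary R2 values with n = 50 and k = 6; the function and variable names are illustrative only and do not come from the text.

```python
def f_from_r2(r2, n, k):
    """F statistic of Eq. (4.50) for testing the hypothesis that R^2 = 0."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))

# Auxiliary R^2 values of the hypothetical example (n = 50, k = 6).
aux_r2 = {"X2": 0.90, "X3": 0.18, "X4": 0.36, "X5": 0.86, "X6": 0.09, "X7": 0.24}

f_values = {name: f_from_r2(r2, n=50, k=6) for name, r2 in aux_r2.items()}

for name, f in f_values.items():
    print(f"{name}: R2 = {aux_r2[name]:.2f}, F = {f:.2f}")
```

Whether a given F value is significant still has to be judged against the F distribution with (k - 1) and (n - k) degrees of freedom, here 5 and 44.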

One drawback of the auxiliary regression technique is the computational burden. If a regression contains several explanatory variables, we have to compute several subsidiary regressions, and therefore this method of detecting collinearity can be of limited practical value. But note that many computer packages now can compute the auxiliary regressions without much computational burden.
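To illustrate how little effort the auxiliary regressions require with a computer, here is a hedged sketch using NumPy. The data are synthetic and purely hypothetical (X4 is constructed as nearly the sum of X2 and X3), and the helper names `auxiliary_r2` and `f_stat` are our own, not part of any package.

```python
import numpy as np

def auxiliary_r2(X):
    """R^2 from regressing each column of X on the remaining columns plus an intercept."""
    n, m = X.shape
    out = []
    for j in range(m):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        Z = np.column_stack([np.ones(n), others])      # intercept + other regressors
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)   # OLS fit
        resid = y - Z @ coef
        tss = np.sum((y - y.mean()) ** 2)
        out.append(1.0 - (resid @ resid) / tss)
    return np.array(out)

def f_stat(r2, n, k):
    """F statistic of Eq. (4.50)."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))

# Synthetic (hypothetical) data: x4 is almost a linear combination of x2 and x3.
rng = np.random.default_rng(0)
n = 50
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x4 = x2 + x3 + 0.1 * rng.normal(size=n)
X = np.column_stack([x2, x3, x4])

r2 = auxiliary_r2(X)
k = X.shape[1]   # each auxiliary regression has (m - 1) slopes plus an intercept
F = f_stat(r2, n, k)
print(np.round(r2, 4), np.round(F, 1))
```

Because each of the three variables in this constructed example is nearly a linear combination of the other two, all three auxiliary R2 values come out close to 1.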

TABLE 8-2  TESTING THE SIGNIFICANCE OF R2 (EQUATION [4.50])

Value of R2    Value of F    Is F significant?
0.90           79.20         Yes*
0.18            1.93         No
0.36            4.95         Yes*
0.86           54.06         Yes*
0.09            0.87         No
0.24            2.78         Yes†

Notes: *Significant at the 1% level. †Significant at the 5% level. In this example n = 50 and k = 6.

5. The variance inflation factor (VIF). Even if a model does not contain several explanatory variables, the R2 values obtained from the various auxiliary regressions may not be a totally reliable indication of collinearity. This can be seen more clearly if we revert to the three-variable regression discussed more completely in Chapter 4. In Equations (4.25) and (4.27) we have been given the formulas to compute the variances of the two partial slopes b2 and b3. With simple algebraic manipulations, these variance formulas can be alternatively written as

$$\operatorname{var}(b_2) = \frac{\sigma^2}{\sum x_{2i}^2 \left(1 - R_2^2\right)} = \frac{\sigma^2}{\sum x_{2i}^2}\,\mathrm{VIF} \tag{8.12}$$

$$\operatorname{var}(b_3) = \frac{\sigma^2}{\sum x_{3i}^2 \left(1 - R_2^2\right)} = \frac{\sigma^2}{\sum x_{3i}^2}\,\mathrm{VIF} \tag{8.13}$$

(For proofs of these formulas, see Problem 8.21.) In these formulas $R_2^2$ is the coefficient of determination in the (auxiliary) regression of X2 on X3. (Note: The R2 between X2 and X3 is the same as that between X3 and X2.)

In the preceding formulas

$$\mathrm{VIF} = \frac{1}{1 - R_2^2} \tag{8.14}$$

The expression on the right-hand side of Equation (8.14) is called, very appropriately, the variance inflation factor (VIF) because as R2 increases, the variance, and hence the standard error, of both b2 and b3 increases or inflates. (Do you see this?) In the extreme, when this coefficient of determination is 1 (i.e., perfect multicollinearity), these variances and standard errors are undefined. (Why?) Of course, if R2 is zero, that is, there is no collinearity, the VIF will be 1 (Why?), and we do not have to worry about the large variances and standard errors that plague the collinearity situations.

Now an important question: Suppose an $R_i^2$ in an auxiliary regression is very high (but less than 1), suggesting a high degree of collinearity per the criterion discussed in point 4 above. But as Eqs. (8.12), (8.13), and (8.14) so clearly show, the variance of, say, b2, depends not only upon the VIF but also upon the variance of ui, $\sigma^2$, as well as on the variation in X2, $\sum x_{2i}^2$. Thus, it is quite possible that an $R_i^2$ is very high, say, 0.91, but that either $\sigma^2$ is low or $\sum x_{2i}^2$ is high, or both, so that the variance of b2 can still be lower and the t ratio higher. In other words, a high R2 can be counterbalanced by a low $\sigma^2$ or a high $\sum x_{2i}^2$, or both. Of course, the terms high and low are used in a relative sense.

All this suggests that a high R2 obtained from an auxiliary regression can be only a surface indicator of multicollinearity. It may not necessarily inflate the standard errors of the estimators, as the preceding discussion reveals. To put it more formally, “high $R_i^2$ is neither necessary nor sufficient to get high standard errors and thus multicollinearity by itself need not cause high standard errors.”13
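The counterbalancing argument can be made concrete with a small numerical sketch of Eqs. (8.12) and (8.14). The values chosen below for the error variance and for the variation in X2 are hypothetical and serve only to illustrate the point; the function names are our own.

```python
def vif(r2):
    """Variance inflation factor of Eq. (8.14); undefined when r2 equals 1."""
    if r2 < 0.0 or r2 >= 1.0:
        raise ValueError("r2 must lie in [0, 1)")
    return 1.0 / (1.0 - r2)

def var_b2(sigma2, sum_x2_sq, r2):
    """Variance of b2 from Eq. (8.12): (sigma^2 / sum of x2i^2) times VIF."""
    return sigma2 / sum_x2_sq * vif(r2)

# Case A (hypothetical): little collinearity, but noisy errors and little variation in X2.
case_a = var_b2(sigma2=4.0, sum_x2_sq=100.0, r2=0.10)

# Case B (hypothetical): auxiliary R^2 of 0.91, but low error variance and much variation in X2.
case_b = var_b2(sigma2=1.0, sum_x2_sq=400.0, r2=0.91)

print(f"VIF at R2 = 0.91: {vif(0.91):.2f}")
print(f"var(b2), case A: {case_a:.4f}; case B: {case_b:.4f}")
```

In this constructed comparison the variance of b2 is smaller in case B than in case A, even though the VIF in case B is about 11: the high auxiliary R2 is more than offset by the lower error variance and the larger variation in X2.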

What general conclusions can we draw from the various multicollinearity diagnostics just discussed? That there are various indicators of multicollinearity and no single diagnostic will give us a complete answer to the collinearity problem. Remember that multicollinearity is a matter of degree and that it is a sample-specific phenomenon. In some situations it might be easy to diagnose, but in others one or more of the preceding methods will have to be used to find out the severity of the problem. There is no easy solution to the problem.

Research on multicollinearity diagnostics continues. There are some new techniques, such as the condition index, that have been developed. But they are beyond the scope of this book and are better left for the references.14

8.6 IS MULTICOLLINEARITY NECESSARILY BAD?

Before proceeding to consider remedial measures for the multicollinearity problem, we need to ask an important question: Is multicollinearity necessarily an “evil”? The answer depends on the purpose of the study. If the goal of the study is to use the model to predict or forecast the future mean value of the dependent variable, collinearity per se may not be bad.

Returning to our widget demand function Eq. (8.8), although the earnings variable is not individually statistically significant, the overall R2 of 0.9778 is slightly higher than that of Eq. (8.7), which omits the earnings variable. Therefore, for prediction purposes Eq. (8.8) is marginally better than Eq. (8.7). Often forecasters choose a model on the basis of its explanatory power as measured by the R2. Is this a good strategy? It may be if we assume that the collinearity observed between the price and earnings data given in Table 8-1 will also continue in the future. In Eq. (8.9) we have already shown how X4 and X2, earnings and price, are related. If the same relationship is expected to continue into the future, then Eq. (8.8) can be used to forecast. But that is a big if. If, in another sample, the degree of collinearity between the two variables is not that strong, obviously, a forecast based on Eq. (8.8) may be of little value.

On the other hand, if the objective of the study is not only prediction but also reliable estimation of the individual parameters of the chosen model, then serious collinearity may be bad, because we have seen that it leads to large standard errors of the estimators. However, as noted earlier, if the objective of the study is to estimate a group of coefficients (e.g., the sum or difference of two coefficients) fairly accurately, this can be done even in the presence of multicollinearity. In this case multicollinearity may not be a problem. Thus, in Eq. (8.7) the slope coefficient of -2.1576 is an estimate of (A2 - 2A3) (see Eq. [8.6]), which can be measured accurately by the usual OLS procedure, although neither A2 nor A3 can be estimated individually.

13 G. S. Maddala, Introduction to Econometrics, Macmillan, New York, 1988, p. 226. However, Maddala also says that “if $R_i^2$ is low, we would be better off.”

14 For a simple discussion of the condition index, see Gujarati and Porter, Basic Econometrics, 5th ed., McGraw-Hill, New York, 2009, pp. 339–340.

There may be some “happy” situations where despite high collinearity the estimated R2 and most individual regression coefficients are statistically significant on the basis of the usual t test at the conventional level of significance, such as 5%. As Johnston notes:

This can arise if individual coefficients happen to be numerically well in excess of the true value, so that the effect still shows up in spite of the inflated standard error and/or because the true value itself is so large that even an estimate on the downside still shows up as significant.15

Before moving on, let us take time out to consider a concrete economic example illustrating several points discussed so far in this chapter.

8.7 AN EXTENDED EXAMPLE: THE DEMAND FOR CHICKENS IN THE UNITED STATES, 1960 TO 1982

Table 7-8 (found on the textbook’s Web site) of Problem 7.17 gave data on the per capita consumption of chickens (Y), per capita real (i.e., adjusted for inflation) disposable income (X2), the real retail price of chicken (X3), the real retail price of pork (X4), and the real retail price of beef (X5) for the United States for the period 1960 to 1982.

Since in theory the demand for a commodity is generally a function of the real income of the consumer, the real price of the product, and real prices of competing or complementary products, the following demand function was estimated: The dependent variable (Y) is the natural log of per capita consumption of chickens in pounds.

Explanatory variable    Coefficient    Standard error (se)    t ratio    p value
Constant                  2.1898         0.1557                14.063     0.0000
ln X2                     0.3426         0.0833                 4.1140    0.0003
ln X3                    -0.5046         0.1109                -4.550     0.0001
ln X4                     0.1486         0.0997                 1.4903    0.0767
ln X5                     0.0911         0.1007                 0.9046    0.1878
                                                                          (8.15)

R2 = 0.9823; $\bar{R}^2$ = 0.9784

Since we have fitted a log-linear demand function, all slope coefficients are partial elasticities of Y with respect to the appropriate X variable. Thus, the income elasticity of demand is about 0.34 (a 1 percent increase in real income is associated, on average, with an increase of about 0.34 percent in chicken consumption), the own-price elasticity of demand is about -0.50, the cross-(pork) price elasticity of demand is about 0.15, and the cross-(beef) price elasticity of demand is about 0.09.
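The elasticity interpretation can be checked numerically. The following sketch uses the slope estimates of regression (8.15) and converts a given percentage change in an X variable, holding the other X’s constant, into the implied percentage change in Y under the log-linear form. The function name and the 10 percent price change are illustrative assumptions.

```python
import math

# Slope estimates (partial elasticities) from regression (8.15).
elasticities = {
    "income": 0.3426,
    "own price": -0.5046,
    "pork price": 0.1486,
    "beef price": 0.0911,
}

def pct_change_in_y(elasticity, pct_change_in_x):
    """Percentage change in Y implied by a log-log model for a given percentage change in X."""
    return (math.exp(elasticity * math.log(1 + pct_change_in_x / 100)) - 1) * 100

one_pct_income = pct_change_in_y(elasticities["income"], 1)
ten_pct_price = pct_change_in_y(elasticities["own price"], 10)

print(f"1% rise in income: {one_pct_income:.2f}% change in consumption")
print(f"10% rise in chicken price: {ten_pct_price:.2f}% change in consumption")
```

For small changes the implied percentage change is approximately the elasticity times the percentage change in X; for larger changes, such as 10 percent, the approximation is slightly off, which is why the exact log-linear calculation is used here.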

15 J. Johnston, Econometric Methods, 3rd ed., McGraw-Hill, New York, 1984, p. 249.

As the previous results show, individually the income and own-price elasticity of demand are statistically significant, but the two cross-price elasticities are not. Incidentally, note that chicken is not a luxury consumption item since the income elasticity is less than 1. The demand for chicken with respect to its own price is price inelastic because, in absolute terms, the elasticity coefficient is less than 1.

Although the two cross-price elasticities are positive, suggesting that the other two meats are competing with chicken, they are not statistically significant. Thus, it would seem that the demand for chicken is not affected by the variation in the prices of pork and beef. But this might be a hasty conclusion, for we have to guard against the possibility of multicollinearity. Let us therefore consider some of the multicollinearity diagnostics discussed in Section 8.5.

Collinearity Diagnostics for the Demand Function for Chickens (Equation [8.15])

The Correlation Matrix Table 8-3 gives the pairwise correlations among the (logs of the) four explanatory variables. As this table shows, the pairwise correlations between the explanatory variables are uniformly high: about 0.98 between the log of real income and the log of the price of beef, about 0.95 between the logs of pork and beef prices, about 0.91 between the log of real income and the log price of chicken, etc. Although such high pairwise correlations are no guarantee that our demand function suffers from the collinearity problem, the possibility exists.

The Auxiliary Regressions This seems to be confirmed when we regress each explanatory variable on the remaining explanatory variables, which can be seen from the results presented in Table 8-4. As this table shows, all regressions in this table have R2 values in excess of 0.94; the F test shown in Eq. (4.50) shows that all these R2’s are statistically significant (see Problem 8.24), suggesting that each explanatory variable in the regression (8.15) is highly collinear with the other explanatory variables.

TABLE 8-3  PAIRWISE CORRELATIONS BETWEEN EXPLANATORY VARIABLES OF EQUATION (8.15)

          ln X2     ln X3     ln X4     ln X5
ln X2     1         0.9072    0.9725    0.9790
ln X3     0.9072    1         0.9468    0.9331
ln X4     0.9725    0.9468    1         0.9543
ln X5     0.9790    0.9331    0.9543    1

Note: The correlation matrix is symmetrical. Thus, the correlation between ln X4 and ln X3 is the same as that between ln X3 and ln X4.

Therefore, it is not surprising that in the regression (8.15) we did not find the coefficients of the pork and beef price variables individually statistically significant. This is all in accord with the theoretical consequences of high multicollinearity discussed earlier. It is interesting that despite high collinearity, the coefficients of the real income and own-price variables turned out to be statistically significant. This may very well be due to the fact mentioned by Johnston in footnote 15.

As this example shows, we must be careful about judging the individual significance of an explanatory variable in the presence of a high degree of collinearity. We will return to this example in the following section when we consider remedial measures for multicollinearity.

8.8 WHAT TO DO WITH MULTICOLLINEARITY: REMEDIAL MEASURES

Suppose on the basis of one or more of the diagnostic tests discussed in Section 8.5 that we find a particular problem is plagued by multicollinearity. What solution(s), if any, can be used to reduce the severity of the collinearity problem, if not eliminate it completely? Unfortunately, as in the case of collinearity diagnostics, there is no surefire remedy; there are only a few rules of thumb. This is so because multicollinearity is a feature of a particular sample and not necessarily a feature of the population. Besides, despite near collinearity, OLS estimators still retain their BLUE property. It is true that one or more regression coefficients can be individually statistically insignificant or that some of them can have the wrong signs. If the researcher is bent on reducing the severity of the collinearity problem, then he or she may try one or more of the following methods, keeping in mind that if the particular sample is “ill-conditioned,” there is not much that can be done. With this caveat, let us consider the various remedies that have been discussed in the econometric literature.

TABLE 8-4  AUXILIARY REGRESSIONS

ln X2 =  0.9460 - 0.8324 ln X3 + 0.9483 ln X4 + 1.0176 ln X5
    t = (2.5564)  (-3.4903)      (5.6590)       (6.7847)        R2 = 0.9846

ln X3 =  1.2332 - 0.4692 ln X2 + 0.6694 ln X4 + 0.5955 ln X5
    t = (8.0053)  (-3.4903)      (4.8652)       (3.7848)        R2 = 0.9428

ln X4 = -1.0127 + 0.6618 ln X2 + 0.8286 ln X3 - 0.4695 ln X5
    t = (-3.7107) (5.6590)       (4.8652)       (-2.2879)       R2 = 0.9759

ln X5 = -0.7057 + 0.6956 ln X2 + 0.7219 ln X3 - 0.4598 ln X4
    t = (-2.2362) (6.7847)       (3.7848)       (-2.2870)       R2 = 0.9764
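The F test of Eq. (4.50) and the VIF of Eq. (8.14) can both be applied to the R2 values of Table 8-4. The sketch below assumes n = 23 (annual observations for 1960 to 1982) and k = 4 (three regressors plus the intercept in each auxiliary regression); both values are inferred from the description of the data rather than stated in the table, and the function names are our own.

```python
def f_from_r2(r2, n, k):
    """F statistic of Eq. (4.50) for testing the hypothesis that R^2 = 0."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))

def vif(r2):
    """Variance inflation factor in the form of Eq. (8.14)."""
    return 1.0 / (1.0 - r2)

# R^2 values of the auxiliary regressions in Table 8-4.
table_8_4 = {"ln X2": 0.9846, "ln X3": 0.9428, "ln X4": 0.9759, "ln X5": 0.9764}

n, k = 23, 4   # assumed: 23 annual observations, 3 slopes + intercept
f_values = {name: f_from_r2(r2, n, k) for name, r2 in table_8_4.items()}
vif_values = {name: vif(r2) for name, r2 in table_8_4.items()}

for name in table_8_4:
    print(f"{name}: F = {f_values[name]:.2f}, VIF = {vif_values[name]:.1f}")
```

Under these assumptions every F value exceeds 100, which is consistent with the statement in Section 8.7 that all four R2 values are statistically significant.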

Dropping a Variable(s) from the Model

Faced with severe multicollinearity, the simplest solution might seem to be to drop one or more of the collinear variables. Thus, in our demand function for chickens, the regression (8.15), since the three price variables are highly correlated, why not simply drop, say, the pork and beef price variables from the model?

But this remedy can be worse than the disease (multicollinearity). When formulating an economic model, such as the regression (8.15), we base the model on some theoretical considerations. In our example, following economic theory, we expect all three prices to have some effect on the demand for chicken since the three meat products are to some extent competing products. Therefore, economically speaking, the regression (8.15) is an appropriate demand function. Unfortunately, in our regression results based on the particular sample data given in Table 7-8 we were unable to detect the separate influence of the prices of pork and beef on the quantity of chicken demanded. But dropping those variables from the model will lead to what is known as model specification error, a topic that we discussed in Chapter 7. As we saw, if we drop a variable from a model simply to eliminate the collinearity problem and to estimate a model without that variable, the estimated parameters of the reduced model may turn out to be biased. To give some idea about this bias, let us present the results of the demand function for chickens without the pork and beef price variables:

$$
\begin{aligned}
\ln Y &= 2.0328 + 0.4515 \ln X_2 - 0.3722 \ln X_3 \\
t &= (17.497)\quad (18.284)\quad (-5.8647) \\
R^2 &= 0.9801; \quad \bar{R}^2 = 0.9781
\end{aligned}
\tag{8.16}
$$

As these results show, compared to the regression (8.15), the income elasticity has gone up but the own-price elasticity, in absolute value, has declined. In other words, estimated coefficients of the reduced model seem to be biased.

As this discussion indicates, there may be a trade-off involved. In reducing the severity of the collinearity problem, we may be obtaining biased estimates of the coefficients retained in the model. The best practical advice is not to drop a variable from an economically viable model just because the collinearity problem is serious. Whether a chosen model is economically correct is, of course, an important issue, and we have listed in Chapter 7 the attributes of a good model. In passing, note that in regression (8.15) the t value of the pork price coefficient was in excess of 1. Therefore, following our discussion in Chapter 4, if we drop this variable from the model, the adjusted R2 will decrease, which is the case in the present instance.
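The drop in the adjusted R2 can be verified from the reported R2 values with the standard formula, adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k). The sketch below assumes n = 23 annual observations (1960 to 1982), with k = 5 parameters in regression (8.15) and k = 3 in regression (8.16); the function name is our own.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2, where k counts all estimated parameters including the intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

n = 23   # assumed: annual data, 1960 to 1982

full_model = adjusted_r2(0.9823, n, k=5)      # regression (8.15)
reduced_model = adjusted_r2(0.9801, n, k=3)   # regression (8.16)

print(f"Adjusted R2, regression (8.15): {full_model:.4f}")
print(f"Adjusted R2, regression (8.16): {reduced_model:.4f}")
```

The two results round to 0.9784 and 0.9781, matching the values reported with the two regressions.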

Acquiring Additional Data or a New Sample

Since multicollinearity is a sample feature, it is possible that in another sample involving the same variables, collinearity may not be as serious as in the first sample. The important practical question is whether we can obtain another sample, for collection of data can be costly.

Sometimes just acquiring additional data (increasing the sample size) can reduce the severity of the collinearity problem. This can be seen easily from formulas (8.12) and (8.13). For example, in the formula

$$\operatorname{var}(b_3) = \frac{\sigma^2}{\sum x_{3i}^2 \left(1 - R_2^2\right)} \tag{8.13}$$

for a given $\sigma^2$ and R2, if the sample size of X3 increases, $\sum x_{3i}^2$ will generally increase (Why?), as a result of which the variance of b3 will tend to decrease, and with it the standard error of b3.

As an illustration, consider the following regression of consumption expenditure (Y) on income (X2) and wealth (X3) based on 10 observations:16

$$
\begin{aligned}
\hat{Y}_i &= 24.337 + 0.87164 X_{2i} - 0.0349 X_{3i} \\
se &= (6.2801)\quad (0.31438)\quad (0.0301) \\
t &= (3.875)\quad (2.7726)\quad (-1.1595) \qquad R^2 = 0.9682
\end{aligned}
\tag{8.17}
$$

This regression shows that the wealth coefficient is not statistically significant, say, at the 5% level.

But when the sample size is increased to 40 observations, the following results are obtained:

$$
\begin{aligned}
\hat{Y}_i &= 2.0907 + 0.7299 X_{2i} + 0.0605 X_{3i} \\
t &= (0.8713)\quad (6.0014)\quad (2.0641) \qquad R^2 = 0.9672
\end{aligned}
\tag{8.18}
$$

Now the wealth coefficient is statistically significant at the 5% level.

Of course, as in the case of obtaining a new sample, getting additional data on variables already in the sample may not be feasible because of cost and other considerations. But if these constraints are not prohibitive, this remedy is certainly worth trying.

Rethinking the Model

Sometimes a model chosen for empirical analysis is not carefully thought out: maybe some important variables are omitted, or maybe the functional form of the model is incorrectly chosen. Thus, in our demand function for chicken, instead of the log-linear specification, the demand function is probably linear in variables (LIV). It is possible that in the LIV specification the extent of collinearity may not be as high as in the log-linear specification.

16 I am indebted to Albert Zucker for providing the results given in regressions (8.17) and (8.18).

Returning to the demand function for chicken, we fitted the LIV model to the data given in Table 7-8, with the following results:

$$
\begin{aligned}
\hat{Y} &= 37.232 - 0.00501 X_2 - 0.6112 X_3 + 0.1984 X_4 + 0.0695 X_5 \\
t &= (10.015)\quad (1.0241)\quad (-3.7530)\quad (3.1137)\quad (1.3631) \\
R^2 &= 0.9426; \quad \bar{R}^2 = 0.9298
\end{aligned}
\tag{8.19}
$$

Compared to the regression (8.15), we now observe that in the LIV specification, the income coefficient is statistically insignificant but the pork price coefficient is statistically significant. What accounts for this change? Perhaps there is a high degree of collinearity between the income and the price variables. As a matter of fact, we found out from Table 8-4 that this was the case. As noted earlier, in the presence of a high degree of collinearity it is not possible to estimate a single regression coefficient too precisely (i.e., with a smaller standard error).

Prior Information about Some Parameters

Sometimes a particular phenomenon, such as a demand function, is investigated time and again. From prior studies it is possible that we can have some knowledge of the values of one or more parameters. This knowledge can be profitably used in the current sample. To be specific, let us suppose a demand function for widgets was estimated in the past and it was found that the income coefficient had a value of 0.9, which was statistically significant. But in the data of Table 8-1, as previously seen, we could not assess the individual impact of earnings (a measure of income) on the quantity demanded. If there is reason to believe that the past value of the income coefficient of 0.9 has not changed much, we could reestimate Eq. (8.8) as follows:

$$
\begin{aligned}
\text{Quantity} &= B_1 + B_2\,\text{price} + B_3\,\text{earnings} + u_i \\
&= B_1 + B_2\,\text{price} + 0.9\,\text{earnings} + u_i \\
\text{Quantity} - 0.9\,\text{earnings} &= B_1 + B_2\,\text{price} + u_i
\end{aligned}
\tag{8.20}
$$

where use is made of the prior information that B3 = 0.9.

Assuming that the prior information is correct, we have resolved the collinearity problem, for on the right-hand side of Equation (8.20) we now have only one explanatory variable and no question of collinearity arises. To run Eq. (8.20), we only have to subtract from the quantity observation 0.9 times the corresponding earnings observation and treat the resulting difference as the dependent variable and regress it on price.17

17 Note that multicollinearity is often encountered in time series data because economic variables tend to move with the business cycle. Here information from cross-sectional studies might be used to estimate one or more parameters in the models based on time series data.

Although an intuitively appealing method, the crux of the method lies in obtaining extraneous, or prior, information, which is not always possible. But, more critically, even if we can obtain such information, to assume that the prior information continues to hold in the sample under study may be a “tall” assumption. Of course, if the income effect is not expected to vary considerably from sample to sample, and if we do have prior information on the income coefficient, this remedial measure can sometimes be employed.
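The mechanics of running Eq. (8.20) can be sketched as follows. Since the widget data of Table 8-1 are not reproduced here, the sketch uses synthetic, purely hypothetical data in which earnings move almost in lockstep with price; the numerical values of the “true” coefficients and all variable names are assumptions made for illustration only.

```python
import numpy as np

# Synthetic (hypothetical) data: earnings are almost an exact linear function of price.
rng = np.random.default_rng(1)
n = 40
price = rng.uniform(1.0, 10.0, size=n)
earnings = 5.0 + 2.0 * price + rng.normal(scale=0.05, size=n)
quantity = 50.0 - 2.0 * price + 0.9 * earnings + rng.normal(scale=0.5, size=n)

# Impose the prior information B3 = 0.9 by moving 0.9 * earnings to the left-hand side.
prior_b3 = 0.9
y_star = quantity - prior_b3 * earnings

# Regress the transformed dependent variable on an intercept and price only.
Z = np.column_stack([np.ones(n), price])
(b1, b2), *_ = np.linalg.lstsq(Z, y_star, rcond=None)

print(f"b1 = {b1:.3f}, b2 = {b2:.3f}")
```

In this constructed example the prior value equals the coefficient used to generate the data, so b1 and b2 land close to 50 and -2. If the prior value were wrong, the error would be carried into the estimates of B1 and B2, which is the caveat raised in the text.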

Transformation of Variables

Occasionally, transformation of variables included in the model can minimize, if not solve, the problem of collinearity. For example, in a study of the aggregate consumption expenditure in the United States as a function of aggregate income and aggregate wealth we might express aggregate consumption expenditure on a per capita basis, that is, per capita consumption expenditure as a function of per capita income and per capita wealth. It is possible that if there is serious collinearity in the aggregate consumption function, it may not be so severe in the per capita consumption function. Of course, there is no guarantee that such a transformation will always help, leaving aside for the moment the question of whether the aggregate or per capita consumption function to begin with is the appropriate model.

As an example of how a simple transformation of variables can reduce the severity of collinearity, consider the following regression based on the U.S. data for 1965 to 1980:18

$$
\begin{aligned}
\hat{Y}_t &= -108.20 + 0.045 X_{2t} + 0.931 X_{3t} \\
t &= \text{N.A.}\quad (1.232)\quad (1.844) \qquad R^2 = 0.9894
\end{aligned}
\tag{8.21}
$$

where N.A. = not available
Y = imports ($, in billions)
X2 = the GNP ($, in billions)
X3 = the Consumer Price Index (CPI)

In theory, imports are positively related to the GNP (a measure of income) and domestic prices.

The regression results show that neither the income nor the price coefficient is individually statistically significant at the 5% level (two-tailed).19 But on the basis of the F test, we can easily reject the null hypothesis that the two (partial) slope coefficients are jointly equal to zero (check this out), strongly suggesting that the regression (8.21) is plagued by the collinearity problem. To resolve collinearity, Salvatore obtained the following regression:

$$
\begin{aligned}
\frac{\hat{Y}_t}{X_{3t}} &= -1.39 + 0.202\,\frac{X_{2t}}{X_{3t}} \\
t &= \text{N.A.}\quad (12.22) \qquad R^2 = 0.9142
\end{aligned}
\tag{8.22}
$$

where N.A. = not available. This regression shows that real imports are statistically significantly positively related to real income, the estimated t value being highly significant. Thus, the “trick” of converting the nominal variables into “real” variables (i.e., transforming the original variables) has apparently eliminated the collinearity problem.20

Other Remedies

The preceding remedies are only suggestive. There are several other remedies suggested in the literature, such as combining time series and cross-sectional data, factor or principal component analysis, and ridge regression. But a full discussion of these topics would not only take us far afield, it would also require statistical knowledge that is well beyond that assumed in this text.

8.9 SUMMARY

An important assumption of the classical linear regression model is that there is no exact linear relationship(s), or multicollinearity, among explanatory variables. Although cases of exact multicollinearity are rare in practice, situations of near exact or high multicollinearity occur frequently. In practice, therefore, the term multicollinearity refers to situations where two or more variables can be highly linearly related.

The consequences of multicollinearity are as follows. In cases of perfect multicollinearity we cannot estimate the individual regression coefficients or their standard errors. In cases of high multicollinearity individual regression coefficients can be estimated and the OLS estimators retain their BLUE property. But the standard errors of one or more coefficients tend to be large in relation to their coefficient values, thereby reducing t values. As a result, based on estimated t values, we can say that the coefficient with the low t value is not statistically different from zero. In other words, we cannot assess the marginal or individual contribution of the variable whose t value is low. Recall that in a multiple regression the slope coefficient of an X variable is the partial regression coefficient, which measures the (marginal or individual) effect of that variable on the dependent variable, holding all other X variables constant. However, if the objective of the study is to estimate a group of coefficients fairly accurately, this can be done so long as collinearity is not perfect.

18 See Dominick Salvatore, Managerial Economics, McGraw-Hill, New York, 1989, pp. 156–157. Notation is adapted.

19 But note that the price coefficient is significant at the 5% level on the basis of the one-tailed t test.

20 Some authors warn against transforming variables routinely in this fashion. For details, see E. Kuh and J. R. Meyer, “Correlation and Regression Estimates When the Data Are Ratios,” Econometrica, pp. 400–416, October 1955. Also, see G. S. Maddala, Introduction to Econometrics, Macmillan, New York, 1988, pp. 172–174.

In this chapter we considered several methods of detecting multicollinearity, pointing out their pros and cons. We also discussed the various remedies that have been proposed to solve the problem of multicollinearity and noted their strengths and weaknesses.

Since multicollinearity is a feature of a given sample, we cannot foretell which method of detecting multicollinearity or which remedial measure will work in any given concrete situation.

KEY TERMS AND CONCEPTS

The key terms and concepts introduced in this chapter are

Perfect and imperfect collinearity
  a) near or very high multicollinearity
  b) perfectly linearly related
  c) perfect collinearity or multicollinearity
  d) near perfect linear relationship
Partial correlation coefficient
Subsidiary regression or auxiliary regression
Variance inflation factor (VIF)
Remedial measures for multicollinearity
  a) dropping variables; model specification error
  b) acquiring a new sample (or additional data)
  c) rethinking the model
  d) extraneous, or prior, information
  e) transformation of variables
  f) other: factor or principal component analysis; ridge regression

QUESTIONS

8.1. What is meant by collinearity? And by multicollinearity? 8.2. What is the difference between perfect and imperfect multicollinearity? 8.3. You include the subject’s height, measured in inches, and the same subject’s

height measured in feet in a regression of weight on height. Explain intuitively why ordinary least squares (OLS) cannot estimate the regression coefficients in such a regression.

8.4. Consider the model

where Y = the total cost of production and X = the output. Since X2 and X3 are functions of X, there is perfect collinearity. Do you agree? Why or why not?

Yi = B1 + B2Xi + B3Xi2 + B4Xi3 + ui

guj75845_ch08.qxd 4/16/09 12:00 PM Page 267

8.5. Refer to Equations (4.21), (4.22), (4.25), and (4.27). Let x3i = 2x2i. Show why it is impossible to estimate these equations.

8.6. What are the theoretical consequences of imperfect multicollinearity? 8.7. What are the practical consequences of imperfect multicollinearity? 8.8. What is meant by the variance inflation factor (VIF)? From the formula (8.14),

can you tell the least possible and the highest possible value of the VIF? 8.9. Fill in the gaps in the following sentences:

a. In cases of near multicollinearity, the standard errors of regression coeffi- cients tend to be _______ and the t ratios tend to be _______.

b. In cases of perfect multicollinearity, OLS estimators are _______ and their variances are _______.

c. Ceteris paribus, the higher the VIF is, the higher the _______ of OLS esti- mators will be.

8.10. State with reasons whether the following statements are true or false:

a. Despite perfect multicollinearity, OLS estimators are best linear unbiased estimators (BLUE).

b. In cases of high multicollinearity, it is not possible to assess the individual significance of one or more partial regression coefficients.

c. If an auxiliary regression shows that a particular $R_i^2$ is high, there is definite evidence of high collinearity.

d. High pairwise correlations do not necessarily suggest that there is high multicollinearity.

e. Multicollinearity is harmless if the objective of the analysis is prediction only.

8.11. In data involving economic time series such as unemployment, money supply, interest rate, or consumption expenditure, multicollinearity is usually suspected. Why?

8.12. Consider the following model:

$$Y_t = B_1 + B_2 X_t + B_3 X_{t-1} + B_4 X_{t-2} + B_5 X_{t-3} + u_t$$

where Y = the consumption
X = the income
t = the time

This model states that consumption expenditure at time t is a linear function of income not only at time t but also of income in three previous time periods. Such models are called distributed lag models and represent what are called dynamic models (i.e., models involving change over time).

a. Would you expect multicollinearity in such models and why?

b. If multicollinearity is suspected, how would you get rid of it?

PROBLEMS

8.13. Consider the following set of hypothetical data:

Y:   -10   -8   -6   -4   -2    0    2    4    6    8   10
X2:    1    2    3    4    5    6    7    8    9   10   11
X3:    1    3    5    7    9   11   13   15   17   19   21

Suppose you want to do a multiple regression of Y on X2 and X3.

a. Can you estimate the parameters of this model? Why or why not?

b. If not, which parameter or combination of parameters can you estimate?

8.14. You are given the annual data in Table 8-5 for the United States for the period 1971 to 1986. Consider the following aggregate demand function for passenger cars:

$$\ln Y_t = B_1 + B_2 \ln X_{2t} + B_3 \ln X_{3t} + B_4 \ln X_{4t} + B_5 \ln X_{5t} + B_6 \ln X_{6t} + u_t$$

where ln = the natural log

a. What is the rationale for the introduction of both price indexes X2 and X3?

b. What might be the rationale for the introduction of the “employed civilian labor force” (X6) in the demand function?

c. How would you interpret the various partial slope coefficients?

d. Obtain OLS estimates of the preceding model.

8.15. Continue with Problem 8.14. Is there multicollinearity in the previous problem? How do you know?

8.16. If there is collinearity in Problem 8.14, estimate the various auxiliary regressions and find out which of the X variables are highly collinear.

TABLE 8-5 DEMAND FOR NEW PASSENGER CARS IN THE UNITED STATES, 1971 TO 1986

Year       Y      X2      X3       X4      X5       X6
1971   10227   112.0   121.3    776.8    4.89    79367
1972   10872   111.0   125.3    839.6    4.55    82153
1973   11350   111.1   133.1    949.8    7.38    85064
1974    8775   117.5   147.7   1038.4    8.61    86794
1975    8539   127.6   161.2   1142.8    6.16    85846
1976    9994   135.7   170.5   1252.6    5.22    88752
1977   11046   142.9   181.5   1379.3    5.50    92017
1978   11164   153.8   195.4   1551.2    7.78    96048
1979   10559   166.0   217.4   1729.3   10.25    98824
1980    8979   179.3   246.8   1918.0   11.28    99303
1981    8535   190.2   272.4   2127.6   13.73   100397
1982    7980   197.6   289.1   2261.4   11.20    99526
1983    9179   202.6   298.4   2428.1    8.69   100834
1984   10394   208.5   311.1   2670.6    9.65   105005
1985   11039   215.2   322.2   2841.1    7.75   107150
1986   11450   224.4   328.4   3022.1    6.31   109597

Notes:
Y = New passenger cars sold (thousands), seasonally unadjusted.
X2 = New cars Consumer Price Index (1967 = 100), seasonally unadjusted.
X3 = Consumer Price Index, all items, all urban consumers (1967 = 100), seasonally unadjusted.
X4 = Personal disposable income (PDI) ($, in billions), unadjusted for seasonal variation.
X5 = Interest rate (percent), finance company paper placed directly.
X6 = Employed civilian labor force (thousands), unadjusted for seasonal variation.

Source: Business Statistics, 1986, a Supplement to the Current Survey of Business, U.S. Department of Commerce.

8.17. Continuing with the preceding problem, if there is severe collinearity, which variable would you drop and why? If you drop one or more X variables, what type of error are you likely to commit?

8.18. After eliminating one or more X variables, what is your final demand function for passenger cars? In what ways is this “final” model better than the initial model that includes all X variables?

8.19. What other variables do you think might better explain the demand for automobiles in the United States?

8.20. In a study of the production function of the United Kingdom bricks, pottery, glass, and cement industry for the period 1961 to 1981, R. Leighton Thomas obtained the following results:21

1. log Q = -5.04 + 0.887 log K + 0.893 log H
   se = (1.40) (0.087) (0.137)    $R^2$ = 0.878

2. log Q = -8.57 + 0.0272t + 0.460 log K + 1.285 log H
   se = (2.99) (0.0204) (0.333) (0.324)    $R^2$ = 0.889

where Q = the index of production at constant factor cost
K = the gross capital stock at 1975 replacement cost
H = hours worked
t = the time trend, a proxy for technology

The figures in parentheses are the estimated standard errors.

a. Interpret both regressions.

b. In regression (1) verify that each partial slope coefficient is statistically significant at the 5% level.

c. In regression (2) verify that the coefficients of t and log K are individually insignificant at the 5% level.

d. What might account for the insignificance of log K variable in Model 2?

e. If you were told that the correlation coefficient between t and log K is 0.980, what conclusion would you draw?

f. Even if t and log K are individually insignificant in Model 2, would you accept or reject the hypothesis that in Model 2 all partial slopes are simultaneously equal to zero? Which test would you use?

g. In Model 1, what are the returns to scale?

8.21. Establish Eqs. (8.12) and (8.13). (Hint: Find out the coefficient of correlation between X2 and X3, say, $r_{23}^2$.)

8.22. You are given the hypothetical data in Table 8-6 on weekly consumption expenditure (Y), weekly income (X2), and wealth (X3), all in dollars.

a. Do an OLS regression of Y on X2 and X3.

b. Is there collinearity in this regression? How do you know?

c. Do separate regressions of Y on X2 and Y on X3. What do these regressions reveal?

d. Regress X3 on X2. What does this regression reveal?

e. If there is severe collinearity, would you drop one of the X variables? Why or why not?

21See R. Leighton Thomas, Introductory Econometrics: Theory and Applications, Longman, London, 1985, pp. 244–246.

8.23. Utilizing the data given in Table 8-1, estimate Eq. (8.20) and compare your results.

8.24. Check that all $R^2$ values in Table 8-4 are statistically significant.

8.25. Refer to Problem 7.19 and the data given in Table 7-9. How would your answer to this problem change knowing what you now know about multicollinearity? Present the necessary regression results.

8.26. Refer to Problem 2.16. Suppose you regress ASP on GPA, GMAT, acceptance rate (%), tuition, and recruiter rating. A priori, would you face the multicollinearity problem? If so, how would you resolve it? Show all the necessary regression results.

8.27. Based on the quarterly data for the U.K. for the period 1990-1Q to 1998-2Q, the following results were obtained by Asteriou and Hall.22 The dependent variable in these regressions is Log(IM) = logarithm of imports (t ratios in parentheses).

Explanatory variable     Model 1      Model 2      Model 3

Intercept                 0.6318       0.2139       0.6857
                         (1.8348)     (0.5967)     (1.8500)

Log(GDP)                  1.9269       1.9697       2.0938
                        (11.4117)    (12.5619)    (12.1322)

Log(CPI)                  0.2742       1.0254         —
                         (1.9961)     (3.1706)

Log(PPI)                    —         -0.7706       0.1195
                                     (-2.5248)     (0.8787)

Adjusted $R^2$            0.9638       0.9692       0.9602

Notes: GDP = gross domestic product
CPI = Consumer Price Index
PPI = producer price index

TABLE 8-6 HYPOTHETICAL DATA ON CONSUMPTION EXPENDITURE (Y), WEEKLY INCOME (X2), AND WEALTH (X3)

  Y     X2     X3
 70     80    810
 65    100   1009
 90    120   1273
 95    140   1425
110    160   1633
115    180   1876
120    200   2252
140    220   2201
155    240   2435
150    260   2686

22See Dimitrios Asteriou and Stephen Hall, Applied Econometrics: A Modern Approach, Palgrave/Macmillan, New York, 2007, Chapter 6. Note that these results are summarized from various tables given in that chapter.

a. Interpret each equation.

b. In Model 1, which drops Log(PPI), the coefficient of Log(CPI) is positive and significant at about the 5% level. Does this make economic sense?

c. In Model 3, which drops Log(CPI), the coefficient of Log(PPI) is positive but statistically insignificant. Does this make economic sense?

d. Model 2 includes the logs of both price variables and their coefficients are individually statistically significant. However, the coefficient of Log(CPI) is positive and that of Log(PPI) is negative. How would you rationalize this result?

e. Do you think multicollinearity is the reason why some of these results are conflicting? Justify your answer.

f. If you were told that the correlation between PPI and CPI is 0.9819, would that suggest that there is a multicollinearity problem?

g. Of the three models given above, which would you choose and why?

8.28. Table 8-7 on the textbook’s Web site gives data on imports, GDP, and the Consumer Price Index (CPI) for the United States over the period 1975–2005. You are asked to consider the following model:

$$\ln \text{Imports}_t = \beta_1 + \beta_2 \ln \text{GDP}_t + \beta_3 \ln \text{CPI}_t + u_t$$

a. Estimate the parameters of this model using the data given in the table.

b. Do you suspect that there is multicollinearity in the data?

c. Regress:

(1) $\ln \text{Imports}_t = A_1 + A_2 \ln \text{GDP}_t$

(2) $\ln \text{Imports}_t = B_1 + B_2 \ln \text{CPI}_t$

(3) $\ln \text{GDP}_t = C_1 + C_2 \ln \text{CPI}_t$

On the basis of these regressions, what can you say about the nature of multicollinearity in the data?

d. Suppose there is multicollinearity in the data but $\hat{\beta}_2$ and $\hat{\beta}_3$ are individually significant at the 5% level and the overall F test is also significant. In this case, should we worry about the collinearity problem?

8.29. Table 8-8 on the textbook’s Web site gives data on new passenger cars sold in the United States as a function of several variables.

a. Develop a suitable linear or log-linear model to estimate a demand function for automobiles in the United States.

b. If you decide to include all the regressors given in the table as explanatory variables, do you expect to face the multicollinearity problem? Why?

c. If you do expect to face the multicollinearity problem, how will you go about resolving the problem? State your assumptions clearly and show all calculations.

8.30. As cheese ages, several chemical processes take place that determine the taste of the final product. Table 8-9 on the textbook’s Web site contains data on the concentrations of various chemicals in 30 samples of mature cheddar cheese and a subjective measure of taste for each sample. The variables Acetic and H2S are the natural logarithm of the concentration of acetic acid and hydrogen sulfide, respectively. The variable lactic has not been log-transformed.

a. Draw a scatterplot of the four variables.

b. Do a bivariate regression of taste on acetic and H2S and interpret your results.

c. Do a bivariate regression of taste on lactic and H2S. How do you interpret the results?

d. Do a multiple regression of taste on acetic, H2S, and lactic. Interpret your results.

e. Knowing what you know about multicollinearity, how would you decide among these regressions?

f. What overall conclusions can you draw from your analysis?

8.31. Table 8-10 on the textbook’s Web site gives data on the average salary of top managers (in thousands of Dutch guilders), profit (in millions of Dutch guilders), and turnover (in millions of Dutch guilders) for 84 of the largest firms in the Netherlands.23 Let Y = salary, X2 = profit, and X3 = turnover.

a. Estimate the following regression:

$$\ln Y_i = B_1 + B_2 \ln X_{2i} + B_3 \ln X_{3i} + u_i$$

where ln = natural logarithm.

b. Are all the slope coefficients individually statistically significant at the 5% level?

c. Are the slope coefficients together statistically significant at the 5% level? Which test would you use and why?

d. If the answer to (c) is yes, and the answer to (b) is no, what may be the reason(s)?

e. If you suspect multicollinearity, how would you find that out? Which test(s) would you use? Note: Show all your calculations.

23These data are from Christiaan Heij, Paul de Boer, Philip Hans Franses, Teun Kloek, and Herman K. van Dijk, Econometric Methods with Applications in Business and Economics, Oxford University Press, 2004. See their Web site at www.oup.com/uk/economics/cws. The original data are for 100 large firms, but we have included the data for 84 firms because for 16 firms, data on one or more variables were not available.

CHAPTER 9 HETEROSCEDASTICITY: WHAT HAPPENS IF THE ERROR VARIANCE IS NONCONSTANT?

An important assumption of the classical linear regression model (CLRM) is that the disturbances $u_i$ entering the population regression function (PRF) are homoscedastic; that is, they all have the same variance, $\sigma^2$. If this is not the case, that is, if the variance of $u_i$ is $\sigma_i^2$, indicating that it is varying from observation to observation (notice the subscript on $\sigma^2$), we have the situation of heteroscedasticity, or unequal, or nonconstant, variance.

However, the assumption of homoscedasticity is imposed by the CLRM. There is no guarantee in practice that this assumption will always be fulfilled. Therefore, the major goal of this chapter is to find out what happens if this assumption is not fulfilled. Specifically, we seek answers to the following questions:

1. What is the nature of heteroscedasticity?

2. What are its consequences?

3. How do we detect that it is present in a given situation?

4. What are the remedial measures if heteroscedasticity is a problem?

9.1 THE NATURE OF HETEROSCEDASTICITY

To explain best the difference between homoscedasticity and heteroscedasticity, let us consider a two-variable linear regression model in which the dependent variable Y is personal savings and the explanatory variable X is personal disposable, or after-tax, income (PDI). Now consider the diagrams in Figure 9-1 (cf. Figure 3-2[a] and 3-2[b]).

Figure 9-1(a) shows that as PDI increases, the mean, or average, level of savings also increases, but the variance of savings around its mean value remains the same at all levels of PDI. Recall that the PRF gives the mean, or average, value of the dependent variable for given levels of the explanatory variable(s). This is the case of homoscedasticity, or equal variance. On the other hand, as Figure 9-1(b) shows, although the average level of savings increases as the PDI increases, the variance of savings does not remain the same at all levels of PDI. Here it increases with PDI. This is the case of heteroscedasticity, or unequal variance. Put differently, Figure 9-1(b) shows that high-income people, on average, save more than low-income people, but there is also more variability in their savings. This is not only plausible; it is also borne out by a casual glance at U.S. savings and income statistics. After all, there is very little discretionary income left to save for people on the lower rung of the income distribution ladder. Therefore, in a regression of savings on income, error variances (i.e., variance of $u_i$) associated with high-income families are expected to be greater than those associated with low-income families.
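The contrast between the two panels of Figure 9-1 can be illustrated with a small simulation. This is only a sketch under assumed values: the PDI range, the error standard deviations (a constant 3.0 in one case, 0.1 times PDI in the other), and the helper name `spread_ratio` are all hypothetical choices made for illustration, not taken from the text.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

n = 1000
# Hypothetical PDI values, evenly spaced between 10 and 100 (arbitrary units).
pdi = [10 + 90 * i / (n - 1) for i in range(n)]

# Case (a): homoscedastic disturbances, same standard deviation at every PDI.
u_homo = [random.gauss(0, 3.0) for _ in pdi]
# Case (b): heteroscedastic disturbances, standard deviation proportional to PDI.
u_hetero = [random.gauss(0, 0.1 * x) for x in pdi]


def spread_ratio(u):
    """Variance of u in the upper half of PDI divided by that in the lower half."""
    half = len(u) // 2
    return statistics.variance(u[half:]) / statistics.variance(u[:half])


ratio_homo = spread_ratio(u_homo)
ratio_hetero = spread_ratio(u_hetero)
print(f"homoscedastic:   high/low variance ratio = {ratio_homo:.2f}")
print(f"heteroscedastic: high/low variance ratio = {ratio_hetero:.2f}")
```

In the first case the ratio should be close to 1; in the second it should be well above 1, mirroring the wider scatter at high PDI in panel (b).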

Symbolically, we express heteroscedasticity as

$$E(u_i^2) = \sigma_i^2 \tag{9.1}$$

Notice again the subscript on $\sigma^2$, which is a reminder that the variance of $u_i$ is no longer constant but varies from observation to observation.

Researchers have observed that heteroscedasticity is usually found in cross-sectional data and not in time series data.1 In cross-sectional data we generally deal with members of a population at a given point in time, such as individual consumers or their families; firms; industries; or geographical subdivisions, such as a state, county, or city. Moreover, these members may be of different sizes, such as small, medium, or large firms, or low, medium, or high income. In other words, there may be some scale effect. In time series data, on the other hand, the variables tend to be of similar orders of magnitude because researchers generally collect data for the same entity over a period of time. Examples are the gross domestic product (GDP), savings, or unemployment rate, say, over the period 1960 to 2008.

FIGURE 9-1 (a) Homoscedasticity; (b) heteroscedasticity. [Two panels plotting savings (Y) against PDI (X) at the values X1, X2, ..., Xn.]

1This is, strictly speaking, not always true. In the autoregressive conditional heteroscedasticity (ARCH) models, heteroscedasticity can be observed in time series data also. But this is an involved topic and we will not discuss it in this text. For a discussion of the ARCH model, see Gujarati and Porter, Basic Econometrics, McGraw-Hill, 5th ed., New York, 2009, pp. 791–796.

As a concrete illustration of heteroscedasticity, we present two examples.

Example 9.1. Brokerage Commission on the NYSE After Deregulation

Between April and May of 1975 the Securities and Exchange Commission (SEC) abolished the practice of fixed commission rates on stock transactions on the New York Stock Exchange (NYSE) and allowed stockbrokers to charge commission on a competitive basis. Table 9-1 presents data on the average per share commission (in cents) charged by the brokerage industry to institutional investors for selected quarterly periods between April 1975 and December 1978.

TABLE 9-1 COMMISSION RATE TRENDS, NEW YORK STOCK EXCHANGE, APRIL 1975–DECEMBER 1978

Period              X1      X2      X3      X4
April 1975       59.60   45.70   27.60   15.00
June             54.50   36.80   21.30   12.10
September        51.70   34.50   20.40   11.50
December         48.90   31.90   18.90   10.40
March 1976       50.30   33.80   19.00   10.80
June             50.00   33.40   19.50   10.90
September        46.70   31.10   18.40   10.20
December         47.00   31.20   17.60   10.00
March 1977       44.30   28.80   16.00    9.80
June             43.70   28.10   15.50    9.70
September        40.40   26.10   14.50    9.10
December         40.40   25.40   14.00    8.90
March 1978       40.20   25.00   13.90    8.10
June             43.10   27.00   14.40    8.50
September        42.50   26.90   14.40    8.70
December         40.70   24.50   13.70    7.80

Name    n     Mean    Standard deviation   Variance   Minimum   Maximum
X1     16   46.500    5.6767               32.225     40.200    59.600
X2     16   30.637    5.5016               30.268     24.500    45.700
X3     16   17.444    3.7234               13.864     13.700    27.600
X4     16   10.094    1.7834               3.1806     7.8000    15.000

Notes:
X1 = Commission rate, cents per share (for 0 to 199 shares)
X2 = Commission rate, cents per share (for 200 to 999 shares)
X3 = Commission rate, cents per share (for 1000 to 9999 shares)
X4 = Commission rate, cents per share (for 10,000+ shares)

Source: S. Tinic and R. West, “The Securities Industry Under Negotiated Brokerage Commissions: Changes in the Structure and Performance of NYSE Member Firms,” The Bell Journal of Economics, vol. 11, no. 1, Spring 1980.

Notice two interesting features of this table. There is a downward trend in the commission rate charged since the deregulation. But, more interestingly, there is a substantial difference in the average commission charged and the variance of commission among the four categories of institutional investors shown in the table. The smallest institutional investors, those with share transactions in the range of 0 to 199 shares, on average, paid a commission of 46.5 cents per share with a variance of 32.22, whereas the largest institutional investors paid, on average, a rate of only 10.1 cents per share with a variance of only 3.18. All this can be seen more vividly in Figure 9-2.
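The means and variances quoted above for the smallest and largest investor categories can be recomputed from the Table 9-1 columns. The following is a minimal sketch using Python's standard library; the variable names `x1` and `x4` are just labels for the table columns, and the variance is the sample variance (divisor n - 1), which is what reproduces the tabulated figures.

```python
import statistics

# Commission rates (cents per share) copied from Table 9-1.
x1 = [59.60, 54.50, 51.70, 48.90, 50.30, 50.00, 46.70, 47.00,
      44.30, 43.70, 40.40, 40.40, 40.20, 43.10, 42.50, 40.70]  # 0 to 199 shares
x4 = [15.00, 12.10, 11.50, 10.40, 10.80, 10.90, 10.20, 10.00,
      9.80, 9.70, 9.10, 8.90, 8.10, 8.50, 8.70, 7.80]          # 10,000+ shares

summary = {}
for name, series in (("X1", x1), ("X4", x4)):
    mean = statistics.mean(series)
    var = statistics.variance(series)  # sample variance, divisor n - 1
    summary[name] = (mean, var)
    print(f"{name}: mean = {mean:.3f}, variance = {var:.3f}")
```

The roughly tenfold gap between the two variances is the heteroscedasticity the example is pointing at.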

What explains this difference? Obviously, some scale effect seems to be evident here: the larger the volume of the transaction is, the lower the total cost of transacting is, and therefore the lower the average cost will be. Economists would say that there are economies of scale in the brokerage industry data given in Table 9-1. (But this may not necessarily be so. See Example 9.8 in Section 9.6.) Even if there are scale economies in the brokerage industry, why should the variance of the commission rate in the four categories be different? In other words, why is there heteroscedasticity? To attract the business of big institutional investors such as pension funds and mutual funds, brokerage firms compete so intensely among themselves that there is not much variability in the commission rates they charge. Small institutional investors may not have the same bargaining clout as large institutions, and hence have more variability in the commission rates that they pay. These and other reasons may explain the heteroscedasticity observed in the data of Table 9-1.

FIGURE 9-2 Commission per share, in cents, NYSE, April 1975 to December 1978 (based on Table 9-1 data). [Line plot of commission per share (in cents, 0 to 60) against time (0 to 16) for four series: 0–199 shares, 200–999 shares, 1000–9999 shares, and 10000+ shares.]

Now if we were to develop a regression model to explain the commission rate as a function of the number of share transactions (and other variables), the error variance associated with high-transaction clients would be lower than that associated with low-transaction clients.

Example 9.2. Wage and Related Data for 523 Workers

As an example of purely cross-sectional data with potential for heteroscedasticity, consider the data given in Table 9-2, which is posted on the book’s Web site.2

Data for 523 workers were collected on several variables, but to keep the illustration simple, in this example we will consider only the relationship between Wage (per hour, $), Education (years of schooling), and Experience (years of work experience). Let us suppose we want to find out how wages behave in relation to education, holding all other variables constant.

$$\text{Wage}_i = B_1 + B_2 \text{Edu}_i + B_3 \text{Exper}_i + u_i \tag{9.2}$$

A priori, we would expect a positive relationship between wages and the two regressors. The results of this regression for our data are as follows:

Dependent Variable: WAGE                                    (9.3)
Method: Least Squares
Sample: 1 523
Included observations: 523

            Coefficient   Std. Error   t-Statistic   Prob.
C            -4.524472     1.239348     -3.650687    0.0003
EDUC          0.913018     0.082190     11.10868     0.0000
EXPER         0.096810     0.017719      5.463513    0.0000

R-squared             0.194953    Mean dependent var    9.118623
Adjusted R-squared    0.191856    S.D. dependent var    5.143200
S.E. of regression    4.623573    F-statistic           62.96235
Sum squared resid     11116.26    Prob (F-statistic)    0.000000
Durbin-Watson stat    1.867684

Note: The Durbin-Watson statistic is discussed fully in Chapter 10. It is routinely produced as a part of standard regression output.

These results confirm our prior expectations: Wages are strongly positively related to education as well as work experience. The estimated coefficients of the two regressors are highly significant, assuming the classical assumptions hold.
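As a consistency check on the output in (9.3), the t ratios and the F statistic can be recomputed from the reported coefficients, standard errors, and R-squared. The sketch below uses only the standard relations t = coefficient / standard error and F = [R²/(k - 1)] / [(1 - R²)/(n - k)]; the dictionary layout and variable names are illustrative, and small discrepancies from the printed values are expected because the inputs are rounded.

```python
n, k = 523, 3   # number of observations and of estimated parameters
r2 = 0.194953   # R-squared reported in (9.3)

# (coefficient, standard error) pairs as reported in (9.3)
estimates = {
    "C": (-4.524472, 1.239348),
    "EDUC": (0.913018, 0.082190),
    "EXPER": (0.096810, 0.017719),
}

# t ratio under the null hypothesis that the true coefficient is zero
t_ratios = {name: b / se for name, (b, se) in estimates.items()}

# Overall F statistic computed from R-squared
f_stat = (r2 / (k - 1)) / ((1 - r2) / (n - k))

for name, t in t_ratios.items():
    print(f"{name}: t = {t:.4f}")
print(f"F = {f_stat:.4f}")
```

These recomputed values are only as trustworthy as the standard errors that enter them, which is exactly what is at stake if the homoscedasticity assumption fails.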

2These data are obtained from http://lib.stat.edu/datasets/CPS_85_wages and supplemented from http://www.economicswebinstitute.org. The original data included 534 observations, but 11 observations had no work experience and so were dropped.

Since we have data on 523 workers with diverse backgrounds, it is likely that the assumption of homoscedasticity may not hold. If that is the case, the estimated standard errors and t values may not be reliable. To see if this possibility exists, we plot the squared residuals obtained from regression (9.3), first by themselves (Figure 9-3) and then against each regressor (Figures 9-4[a] and [b]).

As Figures 9-4(a) and 9-4(b) show, there is considerable variability in the data, raising the possibility that our regression suffers from heteroscedasticity.

A cautionary note: It is true that the residuals $e_i$ are not the same thing as the disturbances $u_i$, although they are proxies. Therefore, from the observed variability of squared $e_i$ we cannot categorically conclude that the variance of $u_i$ is also variable.3 But as we will show later, in practice we do not observe $u_i$, and thus we will have to make do with $e_i$. Therefore, by examining the pattern of $e_i^2$, we will have to infer something about the pattern of $u_i^2$. Also keep in mind that we estimate the variance of $u_i$ $(= \sigma_u^2)$ as $\sum e_i^2/(n - k)$, where n is the sample size and k is the number of parameters estimated, and this is an unbiased estimate of $\sigma_u^2$.

Suppose in our wage regression we believe, say, on the basis of Figures 9-3 and 9-4, that we can have a heteroscedasticity situation. What then? Are the regression results given in the model (9.3), which are based explicitly on the assumption of homoscedasticity, useless?4 To answer this question, we must find out what happens to the OLS method if there is heteroscedasticity, which is done in the following section.
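As a quick arithmetic check of the variance estimator mentioned in the cautionary note, the figures reported in (9.3) can be plugged in directly. With the sum of squared residuals 11116.26, n = 523, and k = 3:

```latex
\hat{\sigma}_u^2 = \frac{\sum e_i^2}{n - k} = \frac{11116.26}{523 - 3} \approx 21.377,
\qquad
\hat{\sigma}_u = \sqrt{21.377} \approx 4.624
```

The square root agrees with the "S.E. of regression" entry (4.623573) in the output, up to rounding.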

FIGURE 9-3 Squared residuals from regression (9.3). [Plot of the squared residuals (SS1, vertical axis from 0 to 1400) against observation number (50 to 500).]

3For the relationship between $e_i$ and $u_i$, see E. Malinvaud, Statistical Methods of Econometrics, North-Holland, Amsterdam, 1970, pp. 88–89.

4As a practical matter, when running a regression, we generally assume that all assumptions of the CLRM are fulfilled. It is only when we examine the regression results that we begin to look for some clues which might tell us that one or more assumptions of the CLRM may not be tenable. This is not altogether a bad strategy. Why “look a gift horse in the mouth”?

9.2 CONSEQUENCES OF HETEROSCEDASTICITY

Recall that under the assumptions of the CLRM, ordinary least squares (OLS) estimators are best linear unbiased estimators (BLUE); that is, in the class of linear unbiased estimators least squares estimators have minimum variance—they are efficient. Now assume that all assumptions of CLRM hold except that we drop the assumption of homoscedasticity, allowing for the disturbance variance to be different from observation to observation. The following consequences are stated without proofs:5

1. OLS estimators are still linear.

2. They are still unbiased.

3. But they no longer have minimum variance; that is, they are no longer efficient. This is so even in large samples. In short, OLS estimators are no longer BLUE in small as well as in large samples (i.e., asymptotically).

4. The usual formulas to estimate the variances of OLS estimators are generally biased. A priori we cannot tell whether the bias will be positive (upward bias) or negative (downward bias). A positive bias occurs if OLS overestimates the true variances of estimators, and a negative bias occurs if OLS underestimates the true variances of estimators.

5. The bias arises from the fact that $\hat{\sigma}^2$, the conventional estimator of true $\sigma^2$, namely, $\sum e_i^2/\text{d.f.}$, is no longer an unbiased estimator of $\sigma^2$. (Note: The d.f. are [n - 2] in the two-variable case, [n - 3] in the three-variable case, etc.) Recall that $\hat{\sigma}^2$ enters into the calculations of the variances of OLS estimators.

6. As a result, the usual confidence intervals and hypothesis tests based on t and F distributions are unreliable. Therefore, every possibility exists of drawing wrong conclusions if conventional hypothesis-testing procedures are employed.

5Some of the proofs and references to other proofs can be found in Gujarati and Porter, Basic Econometrics, McGraw-Hill, 5th ed., New York, 2009, Chapter 11.

FIGURE 9-4 (a) Squared residuals versus education; (b) Squared residuals versus experience. [Two scatterplots of the squared residuals (SS1, 0 to 1400): panel (a) against EDUC (0 to 20), panel (b) against EXPER (0 to 60).]

In short, in the presence of heteroscedasticity, the usual hypothesis-testing routine is not reliable, raising the possibility of drawing misleading conclusions.
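Consequences 2 and 4 above can be illustrated with a small Monte Carlo experiment. This is a sketch under assumed values, not a proof: the design (X = 1, ..., 50, true intercept 1 and slope 2, error standard deviation 0.5 times X) is hypothetical, and the expression used for the variance of the slope when the $\sigma_i^2$ differ, $\sum (X_i - \bar{X})^2 \sigma_i^2 / [\sum (X_i - \bar{X})^2]^2$, is the standard textbook result rather than something derived in this section.

```python
import random
import statistics

random.seed(42)

n, reps = 50, 2000
b1_true, b2_true = 1.0, 2.0
x = [float(i) for i in range(1, n + 1)]
sigma = [0.5 * xi for xi in x]  # assumed form: error s.d. proportional to X

x_bar = statistics.mean(x)
dx = [xi - x_bar for xi in x]
sxx = sum(d * d for d in dx)

# Variance of the OLS slope implied by this design when the sigma_i differ
true_var_b2 = sum(d * d * s * s for d, s in zip(dx, sigma)) / sxx ** 2

slopes, usual_vars = [], []
for _ in range(reps):
    y = [b1_true + b2_true * xi + random.gauss(0, s) for xi, s in zip(x, sigma)]
    y_bar = statistics.mean(y)
    b2 = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    b1 = y_bar - b2 * x_bar
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    slopes.append(b2)
    usual_vars.append(rss / (n - 2) / sxx)  # conventional formula: sigma-hat^2 / Sxx

mean_b2 = statistics.mean(slopes)
mean_usual_var = statistics.mean(usual_vars)
print(f"average slope estimate:            {mean_b2:.4f} (true value {b2_true})")
print(f"variance of slope for this design: {true_var_b2:.5f}")
print(f"average conventional estimate:     {mean_usual_var:.5f}")
```

In this particular design the conventional formula understates the variance of the slope on average, while the slope estimates themselves still centre on the true value. With a different pattern of error variances the bias could go the other way, which is the point of consequence 4.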

Returning to our wage regression (9.3), if we have reason to believe that there is heteroscedasticity (the formal tests for the presence of heteroscedasticity are discussed in Section 9.3), we should be very careful about interpreting the results. In Eq. (9.3) the coefficient of education has a t value of about 11 and the coefficient of experience has a t value of about 5, both of which are “highly” significant. But these values were obtained under classical assumptions. What happens if the error variance is in fact heteroscedastic? As we noted previously, in that case the usual hypothesis-testing routine is not reliable, raising the possibility of drawing misleading conclusions.

As the preceding discussion indicates, heteroscedasticity is potentially a serious problem, for it might destroy the whole edifice of the standard, and so routinely used, OLS estimation and hypothesis-testing procedure. Therefore, it is important in any concrete study, especially one involving cross-sectional data, that we determine whether we have a heteroscedasticity problem.

Before turning to the task of detecting heteroscedasticity, however, we should know, at least intuitively, why OLS estimators are not efficient under heteroscedasticity.

Consider our simple two-variable regression model. Recall from Chapter 2 that in OLS we minimize the residual sum of squares (RSS):

$$\sum e_i^2 = \sum (Y_i - b_1 - b_2 X_i)^2 \tag{2.13}$$

Now consider Figure 9-5. This figure shows a hypothetical Y population against selected values of the X variable. As this diagram shows, the variance of each Y (sub) population corresponding to the given X is not the same throughout, suggesting heteroscedasticity. Suppose we choose at random a Y value against each X value. The Y’s thus selected are encircled. As Equation (2.13) shows, in OLS each $e_i^2$ receives the same weight whether it comes from a population with a large variance or a small variance (compare points Yn and Y1). This does not seem sensible; ideally, we would like to give more weight to observations coming from populations with smaller variances than those coming from populations with larger variances. This will enable us to estimate the PRF more accurately. And this is precisely what the method of weighted least squares (WLS) does, a method we will discuss later.
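The weighting idea described above can be sketched in a few lines. The example below is illustrative only: the data, the assumed known error standard deviations, and the helper `weighted_line_fit` are hypothetical, and the choice of weights $1/\sigma_i^2$ anticipates the WLS method that the text develops later rather than reproducing its exact procedure.

```python
def weighted_line_fit(x, y, w):
    """Return (b1, b2) minimizing sum of w_i * (y_i - b1 - b2 * x_i)^2."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2


# Hypothetical data: the first four points lie exactly on Y = 1 + 2X; the last
# point, assumed to come from a high-variance subpopulation, is far off the line.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 20.0]
sigma = [1.0, 1.0, 1.0, 1.0, 10.0]  # assumed known error standard deviations

ols = weighted_line_fit(x, y, [1.0] * len(x))               # equal weights = OLS
wls = weighted_line_fit(x, y, [1 / s ** 2 for s in sigma])  # weights 1 / sigma_i^2

print(f"OLS: intercept = {ols[0]:.3f}, slope = {ols[1]:.3f}")
print(f"WLS: intercept = {wls[0]:.3f}, slope = {wls[1]:.3f}")
```

With equal weights the noisy observation pulls the fitted slope well away from 2; once it is down-weighted, the fit stays close to the line traced by the low-variance observations.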

9.3 DETECTION OF HETEROSCEDASTICITY: HOW DO WE KNOW WHEN THERE IS A HETEROSCEDASTICITY PROBLEM?

Although theoretically it is easy to document the consequences of heteroscedasticity, it is often not so easy to detect it in a concrete situation. This is understandable because $\sigma_i^2$ can be known only if we have the entire Y population corresponding to the chosen X’s, as in the hypothetical population of our math S.A.T. score example given in Table 2-1. Unfortunately, however, we rarely have the entire population available for study. Most generally, we have a sample of some members of this population corresponding to the given values of the X variables. Typically, what we have is a single value of Y for given values of the X’s. And there is no way to determine the variance of the conditional distribution of Y for the given X from a single Y value.6

Now we are “between the devil and the deep blue sea.” If there is heteroscedasticity and we assume it away, we might be drawing misleading conclusions on the basis of the usual OLS procedure because OLS estimators are not BLUE. But since our data are mostly based on a sample, we have no way of finding out the true error variance associated with each observation. If we could find out the true $\sigma_i^2$, it would be possible to solve the problem of heteroscedasticity, as is shown later in Section 9.4. What should we do?

FIGURE 9-5 Hypothetical population and sample showing heteroscedasticity. [Two panels, (a) and (b), plotting Y against X1, X2, ..., Xn: one shows the Y populations around the PRF with disturbances $u_i$, the other the sampled Y values around the SRF with residuals $e_i$.]

6Note that given the X’s, the variance of u and the variance of Y are the same. In other words, the conditional variance of u (conditional on the given X’s) is the same as the conditional variance of Y, as noted in footnote 3 of Chapter 3.

As in the case of multicollinearity, we have no sure method of detecting heteroscedasticity; we only have several diagnostic tools that may aid us in detecting it. Some of the diagnostics follow.

Nature of the Problem

Often the nature of the problem under consideration suggests whether heteroscedasticity is likely to be present. For example, following the pioneering work of Prais and Houthakker7 on family budget studies, in which they found that the residual variance around the regression of consumption on income increased with income, it is now generally assumed that in similar studies we can expect heteroscedasticity in the error term. As a matter of fact, in cross-sectional data involving heterogeneous units, heteroscedasticity may be the rule rather than the exception. Thus, in cross-sectional studies involving investment expenditure in relation to sales, the rate of interest, etc., heteroscedasticity is generally expected if small-, medium-, and large-sized firms are sampled together. Similarly, in a cross-sectional study of the average cost of production in relation to the output, heteroscedasticity is likely to be found if small-, medium-, and large-sized firms are included in the sample. (See Example 9.8 in Section 9.6.)

Graphical Examination of Residuals

In applied regression analysis it is always a good practice to examine the residuals obtained from the fitted regression line (or plane), for they may provide useful clues about the adequacy of the fitted model. Sometimes it is helpful to create a residual plot of the squared residuals, especially in the context of heteroscedasticity. The squared residuals can be plotted on their own (as in Figure 9-3) or they can be plotted against one or more explanatory variables (as in Figures 9-4[a] and 9-4[b]).

In Figure 9-6, we consider several likely patterns of squared residuals that one may encounter in applied work. Figure 9-6(a) has no discernible systematic pattern between $e^2$ and X, suggesting that perhaps there is no heteroscedasticity in the data. On the other hand, Figures 9-6(b) to (e) exhibit systematic relationships between the squared residuals and the explanatory variable X. For example, Figure 9-6(c) suggests a linear relationship between the two, whereas Figures 9-6(d) and (e) suggest a quadratic relationship.

7S. J. Prais and H. S. Houthakker, The Analysis of Family Budgets, Cambridge University Press, New York, 1955.


Therefore, if in an application the squared residuals exhibit one of the patterns shown in Figure 9-6(b) to (e), there is a possibility that heteroscedasticity is present in the data.

Keep in mind that the preceding graphical plots are simply diagnostic tools. Once the suspicion of heteroscedasticity is raised, we should proceed more cautiously to make sure that this suspicion is not just a “red herring.” Shortly we will present some formal procedures to do exactly that.

Meanwhile we can pose a couple of practical questions. Suppose we have a multiple regression involving, say, four X variables. How do we proceed then? The most straightforward way to proceed is to plot $e_i^2$ against each X variable. It is possible that the patterns exhibited in Figure 9-6 can hold true of only one of the X variables. Sometimes we can resort to a shortcut. Instead of plotting $e_i^2$ against each X variable, plot them against $\hat{Y}_i$, the estimated mean value of Y. Since $\hat{Y}_i$ is a linear combination of the X’s (Why?), a plot of squared residuals against $\hat{Y}_i$ might exhibit one of the patterns shown in Figures 9-6(b) to (e), suggesting that perhaps heteroscedasticity is present in the data. This avoids the need for plotting the squared residuals against individual X variables, especially if the number of explanatory variables in the model is very large.

FIGURE 9-6 Hypothetical patterns of $e^2$. [Five panels, (a) to (e), each plotting $e^2$ against X.]

Suppose we plot $e_i^2$ against one or more X variables or against $\hat{Y}_i$, and further suppose the plot suggests heteroscedasticity. What then? In Section 9.4 we will show how the knowledge that $e_i^2$ is related to an X variable or to $\hat{Y}_i$ enables us to transform the original data so that in the transformed data there is no heteroscedasticity.
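As a rough numerical companion to the residual plots just described, the following sketch fits a two-variable regression by OLS and returns the fitted values and squared residuals that one would plot. The helper names (`ols_fit`, `squared_residuals`) and the small data set are illustrative assumptions, not part of the text or of any statistical package.

```python
def ols_fit(x, y):
    """Two-variable OLS: return (b1, b2) for y = b1 + b2*x + e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    return b1, b2


def squared_residuals(x, y):
    """Return (fitted values, squared residuals) from the OLS fit of y on x."""
    b1, b2 = ols_fit(x, y)
    fitted = [b1 + b2 * xi for xi in x]
    e2 = [(yi - fi) ** 2 for yi, fi in zip(y, fitted)]
    return fitted, e2


# Hypothetical data: the spread of y around the line grows with x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.8, 6.5, 7.2, 11.0, 10.5, 16.0, 13.5]
fitted, e2 = squared_residuals(x, y)
for f, s in zip(fitted, e2):
    print(f"fitted={f:6.2f}  e^2={s:6.2f}")
```

The two returned lists can be passed to any plotting tool; plotting `e2` against `x` or against `fitted` gives the kind of diagram shown in Figure 9-6.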

Now let us return to our wage regression (9.3). In Figure 9-7 we plot the squared residuals estimated from regression (9.3) against the estimated wage values (Wagef) from this regression.8

This figure probably most closely resembles Figure 9-6(b), clearly suggesting that the squared residuals are systematically related to estimated wage values (which are linear combinations of education and experience), again supporting our earlier doubt that regression (9.3) suffers from the heteroscedasticity problem.

Also note that there is one observation (an outlier?) that is quite visible. In a sample of 523 observations, one outlier may not exert undue influence, but in small samples it can. So keep in mind that outliers also may be a cause of heteroscedasticity, especially in small samples.

FIGURE 9-7 $e_i^2$ against estimated wages, wage regression (9.3). [Scatterplot of the squared residuals (SS1, vertical axis, 0 to 1400) against the estimated wages (WAGEF, horizontal axis, about -4 to 16).]

Park Test9

The intuitively and visually appealing graphical test just presented can be formalized. If there is heteroscedasticity, the heteroscedastic variance $\sigma_i^2$ may be systematically related to one or more explanatory variables. To see if this is the case, we can regress $\sigma_i^2$ on one or more of the X variables. For example, in the two-variable model we can run the following regression:

$$\ln \sigma_i^2 = B_1 + B_2 \ln X_i + v_i \qquad (9.4)$$

where $v_i$ is a residual term. This is precisely what Park suggests. The particular functional form (9.4) that he chose was for convenience.

Unfortunately, regression (9.4) is not operational since we do not know the heteroscedastic variance $\sigma_i^2$. If we knew it, we could have solved the heteroscedasticity problem easily, as we will show in Section 9.4. Park suggests using $e_i$ as proxies for $u_i$ and running the following regression:

$$\ln e_i^2 = B_1 + B_2 \ln X_i + v_i \qquad (9.5)$$

Instead of regressing the log of the squared residuals on the log of the X variable(s), you can also regress the squared residuals on the X variable, especially if some of the X values are negative. Where do we obtain $e_i^2$? They are obtained from the original regression, such as the model (9.3).

The Park test therefore involves the following steps:

1. Run the original regression despite the heteroscedasticity problem, if any.

2. From this regression, obtain the residuals $e_i$, square them, and take their logs (most computer programs can do this routinely).

3. Run the regression (9.5) using an explanatory variable in the original model; if there is more than one explanatory variable, run the regression (9.5) against each X variable. Alternatively, run the regression (9.5) against $\hat{Y}_i$, the estimated Y.10

4. Test the null hypothesis that $B_2 = 0$; that is, there is no heteroscedasticity. If a statistically significant relationship exists between $\ln e_i^2$ and $\ln X_i$, the null hypothesis of no heteroscedasticity can be rejected, in which case we will have to take some remedial measure(s), which are discussed in Section 9.4.

5. If the null hypothesis is not rejected, then $B_1$ in the regression (9.5) can be interpreted as giving the value of the log of the common, or homoscedastic, variance, $\ln \sigma^2$.

8Note that we are plotting $e_i^2$ and not $e_i$ against $X_i$ or $\hat{Y}_i$ because, as pointed out in Chapters 2 and 3, $e_i$ has zero correlation with both $X_i$ and $\hat{Y}_i$.

9R. E. Park, “Estimation with Heteroscedastic Error Terms,” Econometrica, vol. 34, no. 4, October 1966, p. 888.

10The choice of the appropriate functional form to run the regression (9.5) should also be considered. In some cases regressing $e_i^2$ on $X_i$ might be the appropriate functional form; in some other cases $\ln e_i^2$ may be the appropriate dependent variable.


Example 9.3. Wage Regression and the Park Test

Let us illustrate the Park test with our wage example. Since there are two regressors, education and work experience, we have three options: We can regress wages on education only, or on experience only, or on both variables, as in Eq. (9.3), and obtain the squared residuals from these regressions. We can then regress the respective squared residuals on education only or on experience only or on both. We will use the third option, leaving the other two options for exercises at the end of the chapter.

Regressing squared residuals from Eq. (9.3) on the estimated wage values (Wagef) from this regression, we obtain the following empirical counterpart of the Park test:11

Dependent Variable: SS1                                        (9.6)
Method: Least Squares
Included observations: 523

Variable      Coefficient    Std. Error    t-Statistic    Prob.

C             -10.35965      11.79490      -0.878316      0.3802
WAGEF           3.467020      1.255228      2.762063      0.0059

R-squared             0.014432    Mean dependent var     21.25480
Adjusted R-squared    0.012540    S.D. dependent var     65.53846
S.E. of regression    65.12624    F-statistic            7.628992
Sum squared resid     2209783.    Prob (F-statistic)     0.005947
Durbin-Watson stat    2.026039

Note: SS1 are squared residuals from regression (9.3) and Wagef are the forecast values of wage from regression (9.3).

Since the Wagef coefficient is statistically significant, it seems that the Park test shows evidence of heteroscedasticity.

Before we accept the results of the Park test, we should note some of the problems associated with the test: The error term in regression (9.6), vi, may itself be heteroscedastic.12 In that case, we are back to square one. More testing may be needed before we can conclude that the wage regression (9.3) is free from heteroscedasticity.
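The steps of the Park test can be sketched for the two-variable model as follows. This is only a minimal illustration in the log form of Eq. (9.5): the function names (`simple_ols`, `park_test`) and the data are hypothetical, every X must be positive and every residual nonzero for the logs to exist, and the wage data of Example 9.3 are not reproduced here.

```python
import math


def simple_ols(x, y):
    """OLS of y on x with intercept; return (b1, b2, se_b2, residuals)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b1 = y_bar - b2 * x_bar
    resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
    sigma2_hat = sum(e * e for e in resid) / (n - 2)  # residual variance
    se_b2 = math.sqrt(sigma2_hat / sxx)
    return b1, b2, se_b2, resid


def park_test(x, y):
    """Regress ln(e^2) on ln(x); return (B2 estimate, its t statistic)."""
    _, _, _, resid = simple_ols(x, y)            # steps 1-2: get residuals
    ln_e2 = [math.log(e * e) for e in resid]     # needs every e != 0
    ln_x = [math.log(xi) for xi in x]            # needs every x > 0
    _, b2, se_b2, _ = simple_ols(ln_x, ln_e2)    # step 3: regression (9.5)
    return b2, b2 / se_b2                        # step 4: test B2 = 0


# Hypothetical data: each x appears twice, with errors +c and -c,
# where the size c of the error grows with x.
x = [v for v in range(1, 9) for _ in range(2)]
y = [1 + 2 * v + sign * v * (1.0 if v % 2 else 1.5)
     for v in range(1, 9) for sign in (1, -1)]
b2_hat, t_stat = park_test(x, y)
print(f"B2 estimate = {b2_hat:.3f}, t = {t_stat:.2f}")
```

A large t value for the slope would lead us to reject the null hypothesis of no heteroscedasticity, subject to the caveats about the Park test noted above.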

Glejser Test13

The Glejser test is similar in spirit to the Park test. After obtaining residuals $e_i$ from the original model, Glejser suggests regressing the absolute values of $e_i$, $|e_i|$, on the X variable that is thought to be closely associated with the heteroscedastic variance $\sigma_i^2$. Some functional forms that he has suggested for this regression are

$$|e_i| = B_1 + B_2 X_i + v_i \qquad (9.7)$$

$$|e_i| = B_1 + B_2 \sqrt{X_i} + v_i \qquad (9.8)$$

$$|e_i| = B_1 + B_2 \left(\frac{1}{X_i}\right) + v_i \qquad (9.9)$$

The null hypothesis in each case is that there is no heteroscedasticity; that is, $B_2 = 0$. If this hypothesis is rejected, there is probably evidence of heteroscedasticity.

11Since one forecast value of wage from Eq. (9.3) was negative, we cannot use the log transform. We will therefore use squared residuals as the regressand.

12We tested the residuals from Eq. (9.6) for heteroscedasticity. On the basis of the Breusch-Pagan test (see Exercise 9.23) and the White test (discussed below) we saw no evidence of heteroscedasticity, but the Glejser test (discussed below) showed that there was heteroscedasticity.

13H. Glejser, “A New Test for Heteroscedasticity,” Journal of the American Statistical Association (JASA), vol. 64, pp. 316–323.

Example 9.4. Wage Regression and the Glejser Test

The results of estimating these models from the residuals obtained from regression (9.3) are as follows:

$$|e_i| = -0.3208 + 0.2829\,\mathrm{Educ}_i \qquad (9.10)$$
$$t = (-0.4739)\;(5.5483) \qquad r^2 = 0.0557$$

$$|e_i| = -3.1905 + 1.8263\sqrt{\mathrm{Educ}_i} \qquad (9.11)$$
$$t = (-2.5068)\;(5.1764) \qquad r^2 = 0.0489$$

$$|e_i| = 4.3879 - 12.6224\left(\frac{1}{\mathrm{Educ}_i}\right) \qquad (9.12)$$
$$t = (10.6923)\;(-2.6561) \qquad r^2 = 0.133$$

Note that we are using Educ as the regressor. In Exercise (9.22) you are asked to use Exper and Wagef as regressors and compare your results with Equations (9.10) to (9.12). It seems the Glejser test in various forms suggests that the wage regression (9.3) probably suffers from heteroscedasticity.

A cautionary note regarding the Glejser test: As in the case of the Park test, the error term $v_i$ in the regressions suggested by Glejser can itself be heteroscedastic as well as serially correlated (see Chapter 10 on serial correlation). Glejser, however, has maintained that in large samples the preceding models are fairly good in detecting heteroscedasticity. Therefore, Glejser’s test can be used as a diagnostic tool in large samples. Since the squared residuals, rather than the absolute residuals, capture the spirit of the variance, tests based on squared residuals (such as Park, White, and Breusch-Pagan) may be preferable to the Glejser test, as various examples discussed in this chapter will show.
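The three Glejser regressions (9.7) to (9.9) can be sketched in the same spirit for a two-variable model. The function names (`simple_ols`, `glejser_tests`) and the data are hypothetical, X is assumed to be strictly positive so that the square root and the reciprocal exist, and the sketch assumes the fit is not perfect (otherwise the standard error in the t ratio would be zero).

```python
import math


def simple_ols(x, y):
    """OLS of y on x with intercept; return (b1, b2, t statistic of b2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b1 = y_bar - b2 * x_bar
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    se_b2 = math.sqrt(rss / (n - 2) / sxx)
    return b1, b2, b2 / se_b2


def glejser_tests(x, y):
    """Regress |e_i| on X, sqrt(X), and 1/X; return {form: (B2 estimate, t)}."""
    b1, b2, _ = simple_ols(x, y)                       # original regression
    abs_e = [abs(yi - b1 - b2 * xi) for xi, yi in zip(x, y)]
    forms = {
        "X": list(x),                                  # Eq. (9.7)
        "sqrt(X)": [math.sqrt(xi) for xi in x],        # Eq. (9.8)
        "1/X": [1.0 / xi for xi in x],                 # Eq. (9.9)
    }
    return {name: simple_ols(z, abs_e)[1:] for name, z in forms.items()}


# Hypothetical data in which the absolute error grows with x.
x = [v for v in range(1, 9) for _ in range(2)]
y = [1 + 2 * v + sign * v * (1.0 if v % 2 else 1.5)
     for v in range(1, 9) for sign in (1, -1)]
for form, (slope, t) in glejser_tests(x, y).items():
    print(f"|e| on {form}: B2 estimate = {slope:.3f}, t = {t:.2f}")
```

In each case the null hypothesis is $B_2 = 0$; a statistically significant slope in any of the three forms points toward heteroscedasticity.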

White’s General Heteroscedasticity Test14

White’s general test of heteroscedasticity is quite easy to apply. To see how the test is applied, suppose we have the following model:

$$Y_i = B_1 + B_2 X_{2i} + B_3 X_{3i} + u_i \qquad (9.13)$$

White’s test proceeds as follows:

1. We first estimate regression (9.13) by OLS, obtaining the residuals, $e_i$.

2. We then run the following auxiliary regression:

$$e_i^2 = A_1 + A_2 X_{2i} + A_3 X_{3i} + A_4 X_{2i}^2 + A_5 X_{3i}^2 + A_6 X_{2i} X_{3i} + v_i \qquad (9.14)$$

That is, the residuals obtained from the original regression (9.13) are squared and regressed on all the original variables, their squared values, and their cross-products. Additional powers of the original X variables can also be added. The term $v_i$ is the residual term in the auxiliary regression.

3. Obtain the $R^2$ value from the auxiliary regression (9.14). Under the null hypothesis that there is no heteroscedasticity (i.e., all the slope coefficients in Eq. [9.14] are zero), White has shown that the $R^2$ value obtained from regression (9.14) times the sample size (= n) follows the $\chi^2$ distribution with d.f. equal to the number of explanatory variables in regression (9.14) (excluding the intercept term):

$$n \cdot R^2 \sim \chi^2_{k-1} \qquad (9.15)$$

where (k - 1) denotes d.f. In model (9.14) the d.f. are 5.

4. If the chi-square value obtained from Eq. (9.15) exceeds the critical chi-square value at the chosen level of significance, or if the p value of the computed chi-square value is reasonably low (say 1% or 5%), we can reject the null hypothesis of no heteroscedasticity. On the other hand, if the p value of the computed chi-square value is reasonably large (say above 5% or 10%), we do not reject the null hypothesis.

Example 9.5. Wage Regression and White’s General Test of Heteroscedasticity

To illustrate White’s test, we continue with the wage regression (9.3). The empirical counterpart of Eq. (9.14) is as follows:

Heteroscedasticity Test: White

F-statistic            2.269163    Prob. F(5,517)         0.0465
Obs*R-squared          11.23102    Prob. Chi-Square(5)    0.0470
Scaled explained SS    52.67924    Prob. Chi-Square(5)    0.0000

Test Equation:
Dependent Variable: RESID^2                                    (9.16)
Method: Least Squares
Included observations: 523

Variable      Coefficient    Std. Error    t-Statistic    Prob.

C             14.38296       71.34726       0.201591      0.8403
EDUC          -1.183296       9.137968     -0.129492      0.8970
EDUC^2         0.168639       0.300676      0.560865      0.5751
EDUC*EXPER     0.022239       0.104117      0.213591      0.8309
EXPER         -1.401130       1.912126     -0.732760      0.4640
EXPER^2        0.027113       0.020969      1.293039      0.1966

R-squared             0.021474    Mean dependent var     21.25480
Adjusted R-squared    0.012011    S.D. dependent var     65.53846
S.E. of regression    65.14369    F-statistic            2.269163
Sum squared resid     2193993.    Prob (F-statistic)     0.046542
Durbin-Watson stat    2.016101

For present purposes the important statistic is found through Eq. (9.15), which is 11.2310 in the present example. And this value is significant at the 5% level, again suggesting that the wage regression probably suffers from heteroscedasticity.

If we do not include the cross-product terms in the White test, we obtain $n \cdot R^2 \approx 9.69$, with 2 d.f. This chi-square value has a probability of about 0.0078, which strongly suggests that the wage regression does suffer from heteroscedasticity.

14H. White, “A Heteroscedasticity Consistent Covariance Matrix Estimator and a Direct Test of Heteroscedasticity,” Econometrica, vol. 48, no. 4, 1980, pp. 817–818.

As the various heteroscedasticity tests suggest, the overall conclusion seems to be that we do have the heteroscedasticity problem. This should not be a surprising finding, for in large cross-section data with heterogeneous units in the sample it is hard to maintain homogeneity.

Note: Although we have shown the various tests in detail, this labor can be reduced if we use statistical packages such as STATA and EViews. In EViews, for example, once you estimate a regression, you can click on the View button and choose the residuals test option. Once you invoke this option, EViews gives you a choice of several heteroscedasticity tests. Choosing one or more of these tests will provide the answer almost instantly.
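For readers without such a package, the mechanics of White's test can be sketched by hand. The sketch below is a simplification under stated assumptions: it treats a model with a single regressor, so the auxiliary regression contains only X and X squared and the statistic has 2 d.f., for which the chi-square tail probability is simply exp(-statistic/2). All function names (`solve`, `ols`, `chi2_sf_2df`, `white_test`) are hypothetical helpers written for this illustration.

```python
import math


def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]      # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def ols(columns, y):
    """OLS of y on an intercept plus the regressor columns.

    Returns (coefficients, residuals, R-squared).
    """
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    y_bar = sum(y) / len(y)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return beta, resid, 1.0 - sum(e * e for e in resid) / tss


def chi2_sf_2df(stat):
    """Upper-tail probability of a chi-square variable with 2 d.f."""
    return math.exp(-stat / 2.0)


def white_test(x, y):
    """White test for y = B1 + B2*x + u; return (n*R^2, p value) with 2 d.f."""
    _, resid, _ = ols([x], y)                              # step 1
    e2 = [e * e for e in resid]
    _, _, r2_aux = ols([x, [xi * xi for xi in x]], e2)     # step 2
    stat = len(y) * r2_aux                                 # step 3: n * R^2
    return stat, chi2_sf_2df(stat)                         # step 4


# The no-cross-product result quoted in Example 9.5: n*R^2 of about 9.69, 2 d.f.
print(round(chi2_sf_2df(9.69), 4))
```

With two or more regressors the auxiliary regression would also include the cross-products, as in Eq. (9.14), and the p value would have to come from a chi-square table or routine with the appropriate d.f.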

Other Tests of Heteroscedasticity

The heteroscedasticity tests that we have discussed in this section by no means exhaust the list. We will now mention several other tests but will not discuss them here because a full discussion would take us far afield.

1. Spearman’s rank correlation test (see Problem 9.13).

2. Goldfeld-Quandt test.

3. Bartlett’s homogeneity-of-variance test.

4. Peak test.

5. Breusch-Pagan test.

6. CUSUMSQ test.

You may consult the references for details of these tests.15

15The Spearman’s rank correlation, the Goldfeld-Quandt, and the Breusch-Pagan tests are discussed in Gujarati and Porter, Basic Econometrics, 5th ed., McGraw-Hill, New York, 2009, Chapter 11. This text also gives references to the other tests mentioned earlier. See also Problem 9.13.

9.4 WHAT TO DO IF HETEROSCEDASTICITY IS OBSERVED: REMEDIAL MEASURES

As we have seen, heteroscedasticity does not destroy the unbiasedness property of OLS estimators, but the estimators are no longer efficient, not even in large samples. This lack of efficiency makes the conventional OLS hypothesis-testing procedure of dubious value. Therefore, if heteroscedasticity is suspected or diagnosed, it is important to seek remedial measures.

For example, in our wage-education example, based on Figure 9-7, there was some indication that the wage regression given in Eq. (9.3) probably suffers from heteroscedasticity. This was confirmed by the Park, Glejser, and White tests. How can we solve this problem, if at all? Is there some way we can “transform” the model (9.3) so that there is homoscedasticity? But what kind of transformation? The answer depends on whether the true error variance, $\sigma_i^2$, is known or unknown.

When $\sigma_i^2$ Is Known: The Method of Weighted Least Squares (WLS)

To fix ideas, consider the two-variable PRF

$$Y_i = B_1 + B_2 X_i + u_i \qquad (9.17)$$

where Y is, say, hourly wage earnings and X is education, as measured by years of schooling. Assume for the moment that the true error variance $\sigma_i^2$ is known; that is, the error variance for each observation is known. Now consider the following “transformation” of the model (9.17):

$$\frac{Y_i}{\sigma_i} = B_1\left(\frac{1}{\sigma_i}\right) + B_2\left(\frac{X_i}{\sigma_i}\right) + \frac{u_i}{\sigma_i} \qquad (9.18)$$

All we have done here is to divide or “deflate” both the left- and right-hand sides of the regression (9.17) by the “known” $\sigma_i$, which is simply the square root of the variance $\sigma_i^2$.

Now let

$$v_i = \frac{u_i}{\sigma_i} \qquad (9.19)$$

We can call $v_i$ the “transformed” error term. Is $v_i$ homoscedastic? If it is, then the transformed regression (9.18) does not suffer from the problem of heteroscedasticity. Assuming all other assumptions of the CLRM are fulfilled, OLS estimators of the parameters in Equation (9.18) will be BLUE and we can then proceed to statistical inference in the usual manner.

Now it is not too difficult to show that the error term $v_i$ is homoscedastic. From Equation (9.19) we obtain

$$v_i^2 = \frac{u_i^2}{\sigma_i^2} \qquad (9.20)$$

Therefore,

$$\begin{aligned} E(v_i^2) &= E\left(\frac{u_i^2}{\sigma_i^2}\right) \\ &= \frac{1}{\sigma_i^2}\,E(u_i^2), \quad \text{since } \sigma_i^2 \text{ is known} \\ &= \left(\frac{1}{\sigma_i^2}\right)(\sigma_i^2) \quad \text{because of Eq. (9.1)} \\ &= 1 \end{aligned} \qquad (9.21)$$

which is obviously a constant. In short, the transformed error term $v_i$ is homoscedastic. As a result, the transformed model (9.18) does not suffer from the heteroscedasticity problem, and therefore it can be estimated by the usual OLS method.

To actually estimate the regression (9.18), you will have to instruct the computer to divide each Y and X observation by the known $\sigma_i$ and run OLS regression on the data thus transformed. (Most computer packages now can do this routinely.) The OLS estimators of $B_1$ and $B_2$ thus obtained are called weighted least squares (WLS) estimators; each Y and X observation is weighted (i.e., divided) by its own (heteroscedastic) standard deviation, $\sigma_i$. Because of this weighting procedure, the OLS method in this context is known as the method of weighted least squares (WLS).16 (See Problem 9.14.)

16Note this technical point about the regression (9.18). To estimate it, you will have to instruct the computer to run the regression through the origin because there is no “explicit” intercept in Eq. (9.18); the first term in this regression is $B_1(1/\sigma_i)$. But the “slope” coefficient of $(1/\sigma_i)$ is, in fact, the intercept coefficient $B_1$. (Do you see this?) On the regression through the origin, see Chapter 5.

When True $\sigma_i^2$ Is Unknown

Despite its intuitive simplicity, the WLS method of the model (9.18) raises an important question: How do we know or find out the true error variance, $\sigma_i^2$? As noted earlier, knowledge of the true error variance is a rarity. Therefore, if we want to use the method of WLS, we will have to resort to some ad hoc, although reasonably plausible, assumption(s) about $\sigma_i^2$ and transform the original regression model so that the transformed model satisfies the homoscedasticity assumption. OLS can then be applied to the transformed model, for, as shown earlier, WLS is simply OLS applied to the transformed data.17
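The statement that WLS is simply OLS applied to the transformed data can be sketched for the two-variable model (9.17). The function name `wls_known_sigma` and the numbers below are hypothetical; the sketch assumes the standard deviations $\sigma_i$ are supplied by the user, which, as just noted, is rarely possible in practice.

```python
def wls_known_sigma(x, y, sigma):
    """WLS for y = B1 + B2*x + u when the s.d. sigma_i of each u_i is known.

    Each observation is divided by sigma_i, and OLS through the origin is
    run on the transformed regressors 1/sigma_i and x_i/sigma_i (Eq. 9.18).
    """
    z1 = [1.0 / s for s in sigma]                  # transformed "intercept"
    z2 = [xi / s for xi, s in zip(x, sigma)]       # transformed X
    ys = [yi / s for yi, s in zip(y, sigma)]       # transformed Y
    # Normal equations of a two-regressor model without an intercept.
    a11 = sum(v * v for v in z1)
    a12 = sum(u * v for u, v in zip(z1, z2))
    a22 = sum(v * v for v in z2)
    c1 = sum(u * v for u, v in zip(z1, ys))
    c2 = sum(u * v for u, v in zip(z2, ys))
    det = a11 * a22 - a12 * a12
    b1 = (c1 * a22 - c2 * a12) / det               # coefficient of 1/sigma_i
    b2 = (a11 * c2 - a12 * c1) / det               # coefficient of x_i/sigma_i
    return b1, b2


# Hypothetical data: the last two observations are noisier (larger sigma_i),
# so they receive less weight in the fit.
x = [1, 2, 3, 4, 5, 6]
y = [3.1, 4.9, 7.2, 8.8, 13.0, 10.5]
sigma = [0.5, 0.5, 0.5, 0.5, 3.0, 3.0]
print(wls_known_sigma(x, y, sigma))
```

Note that the coefficient attached to $1/\sigma_i$ is the intercept $B_1$ of the original model. When every $\sigma_i$ is the same, the weights cancel and the result coincides with ordinary OLS.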

17Note that in OLS we minimize

$$\sum e_i^2 = \sum (Y_i - b_1 - b_2 X_i)^2$$

but in WLS we minimize

$$\sum \left(\frac{e_i}{\sigma_i}\right)^2 = \sum \left[\frac{Y_i - b_1 - b_2 X_i}{\sigma_i}\right]^2$$

provided $\sigma_i$ is known. See how in WLS we “deflate” the importance of an observation with larger variance, for the larger the error variance, the larger the divisor will be.

In the absence of knowledge about the true $\sigma_i^2$, the practical question then is, what assumption(s) can we make about the unknown error variance and how can we use the method of WLS? Here we consider several possibilities, which we discuss with the two-variable model (9.3); the extension to multiple regression models can be made straightforwardly.

Case 1: The Error Variance Is Proportional to $X_i$: The Square Root Transformation. If after estimating the usual OLS regression we plot the residuals from this regression against the explanatory variable X and observe a pattern similar to that shown in Figure 9-8, the indication is that the error variance is linearly related, or proportional, to X.

FIGURE 9-8 Error variance proportional to X. [Plot of $\sigma_i^2$ against X.]

That is,

$$E(u_i^2) = \sigma^2 X_i \qquad (9.22)$$

which states that the heteroscedastic variance is proportional, or linearly related, to $X_i$; the constant $\sigma^2$ (no subscript on $\sigma^2$) is the factor of proportionality.

Given the assumption in Equation (9.22), suppose we transform the model (9.17) as follows:

$$\frac{Y_i}{\sqrt{X_i}} = B_1\frac{1}{\sqrt{X_i}} + B_2\frac{X_i}{\sqrt{X_i}} + \frac{u_i}{\sqrt{X_i}} = B_1\frac{1}{\sqrt{X_i}} + B_2\sqrt{X_i} + v_i \qquad (9.23)$$

where $v_i = u_i/\sqrt{X_i}$. That is, we divide both sides of the model (9.17) by the square root of $X_i$. Equation (9.23) is an example of what is known as the square root transformation.

Following the development of Equation (9.21), it can be proved easily that the error term $v_i$ in the transformed regression is homoscedastic, and therefore we can estimate Eq. (9.23) by the usual OLS method. Actually we are using the WLS method here. (Why?)18 It is important to note that to estimate Eq. (9.23) we must use the regression-through-the-origin estimating procedure. Most standard regression software packages do this routinely.

18Since $v_i = u_i/\sqrt{X_i}$, $v_i^2 = u_i^2/X_i$. Therefore,

$$E(v_i^2) = \frac{E(u_i^2)}{X_i} = \sigma^2\left(\frac{X_i}{X_i}\right) = \sigma^2$$

that is, homoscedasticity. Note that the X variable is nonstochastic.

Example 9.6. Transformed Wage Regression

Let us illustrate with our wage regression (9.3). The empirical counterpart of Eq. (9.23) is as follows:

Dependent Variable: WAGE/(@SQRT(EDUC))                         (9.24)
Method: Least Squares
Included observations: 523

Variable               Coefficient    Std. Error    t-Statistic    Prob.

1/@SQRT(EDUC)          -2.645605      1.076890      -2.456708      0.0143
@SQRT(EDUC)             0.781380      0.071763      10.88840       0.0000
EXPER/(@SQRT(EDUC))     0.087698      0.016368       5.357896      0.0000

R-squared             0.084405    Mean dependent var     2.517214
Adjusted R-squared    0.080884    S.D. dependent var     1.316767
S.E. of regression    1.262392    Durbin-Watson stat     1.819673
Sum squared resid     828.6893

Note: Suppress the intercept when you run this regression.

To get back to the original (untransformed) wage equation, just multiply both sides of Eq. (9.24) by $\sqrt{\mathrm{Educ}_i}$, which gives

$$\mathrm{Wage}_i = -2.6456 + 0.7813\,\mathrm{Educ}_i + 0.0876\,\mathrm{Exper}_i \qquad (9.25)$$

If you compare this regression with the original regression (9.3), you will see that the estimated regression coefficients are not the same. The reason for the difference could be that we are using $\sqrt{\mathrm{Educ}}$ as the deflator.

Incidentally, we tested the squared residuals from Eq. (9.24) for heteroscedasticity and found that, on the basis of the Breusch-Pagan and White tests, there was no evidence of heteroscedasticity. The Glejser test, however, showed that there was heteroscedasticity.

A question: What happens if there is more than one explanatory variable in the model? In this case we can transform the model as shown in Eq. (9.23) using any one of the X variables that, say, on the basis of graphical plot, seems the appropriate candidate (see Problem 9.7). But what if more than one X variable is a candidate? In this case instead of using any of the X’s, we can use $\hat{Y}_i$, the estimated mean value of $Y_i$, as the transforming variable, for as we know, $\hat{Y}_i$ is a linear combination of the X’s.

Case 2: Error Variance Proportional to $X_i^2$. If the estimated residuals show a pattern similar to Figure 9-9, it suggests that the error variance is not linearly related to X but increases proportional to the square of X. Symbolically,

$$E(u_i^2) = \sigma^2 X_i^2 \qquad (9.26)$$

FIGURE 9-9 Error variance proportional to $X^2$. [Plot of $\sigma_i^2$ against X.]

In this case the appropriate transformation of the two-variable model considered previously is to divide both sides of the model by $X_i$, rather than by the square root of $X_i$, as follows:

$$\frac{Y_i}{X_i} = B_1\left(\frac{1}{X_i}\right) + B_2 + \left(\frac{u_i}{X_i}\right) = B_1\left(\frac{1}{X_i}\right) + B_2 + v_i \qquad (9.27)$$

where $v_i = u_i/X_i$.

Following the earlier development, we can verify easily that the error term $v_i$ in Equation (9.27) is homoscedastic. Hence, the OLS estimation of Eq. (9.27), which is actually a WLS estimation, will produce BLUE estimators. (Keep in mind that we are still keeping intact all the other assumptions of the CLRM.)

An interesting feature of Eq. (9.27) is that what was originally the slope coefficient now becomes the intercept, and what was originally the intercept now becomes the slope coefficient. But this change is only for estimation; once we estimate Eq. (9.27), multiplying by $X_i$ on both sides, we get back to the original model.

The results of applying Eq. (9.27) to our wage-education model are as follows:

Dependent Variable: WAGE/EDUC                                  (9.27a)
Method: Least Squares
Included observations: 523

Variable      Coefficient    Std. Error    t-Statistic    Prob.

C             0.585431       0.051284      11.41551       0.0000
1/EDUC        0.090268       0.762246       0.118424      0.9058
EXPER/EDUC    0.070930       0.013836       5.126660      0.0000

R-squared             0.095542    Mean dependent var     0.705677
Adjusted R-squared    0.092063    S.D. dependent var     0.371773
S.E. of regression    0.354247    F-statistic            27.46492
Sum squared resid     65.25527    Prob (F-statistic)     0.000000
Durbin-Watson stat    1.755325

Multiplying the preceding equation by Educ on both sides, we obtain:

$$\mathrm{Wage}_i = 0.0902 + 0.5854\,\mathrm{Educ}_i + 0.0709\,\mathrm{Exper}_i$$

When this regression was tested for heteroscedasticity, we found that there was no evidence of it on the basis of the Breusch-Pagan and White tests, but the Glejser test did show heteroscedasticity.

Comparing this equation with Eq. (9.3), we can see that the coefficients of the two equations are not the same. This might very well be due to the particular deflator we have used on the transformation. As this example shows, it may not always be easy to find the right deflator. Some amount of trial and error is inevitable.

In Problem 9.24 you are asked to use Wagef instead of Educ as the deflator to see if the preceding conclusion changes. Since Wagef takes into account both Educ and Exper variables, the results based on this deflator may be preferable.
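The two cases just discussed can be sketched for the two-variable model by writing WLS as the minimization of a weighted sum of squared residuals, with weights $1/X_i$ in Case 1 and $1/X_i^2$ in Case 2. The function names below are hypothetical, the data are made up, and X is assumed to be strictly positive. The last function carries out Case 2 literally as in Eq. (9.27), to show that the intercept and slope trade places.

```python
def weighted_fit(x, y, w):
    """Minimize sum of w_i*(y_i - b1 - b2*x_i)^2; return (b1, b2)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw      # weighted means
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2


def case1_fit(x, y):
    """Variance proportional to X_i: weights 1/X_i (divide by sqrt(X_i))."""
    return weighted_fit(x, y, [1.0 / xi for xi in x])


def case2_fit(x, y):
    """Variance proportional to X_i^2: weights 1/X_i^2 (divide by X_i)."""
    return weighted_fit(x, y, [1.0 / (xi * xi) for xi in x])


def case2_by_transformation(x, y):
    """Case 2 done as in Eq. (9.27): plain OLS of Y/X on 1/X with intercept."""
    inv_x = [1.0 / xi for xi in x]
    y_over_x = [yi / xi for xi, yi in zip(x, y)]
    # In the transformed regression the intercept estimates B2
    # and the slope on 1/X estimates B1.
    b2, b1 = weighted_fit(inv_x, y_over_x, [1.0] * len(x))
    return b1, b2


# Hypothetical data.
x = [1, 2, 4, 5, 8, 10]
y = [2.0, 3.0, 9.0, 8.0, 20.0, 17.0]
print(case1_fit(x, y))
print(case2_fit(x, y))
print(case2_by_transformation(x, y))   # agrees with case2_fit up to rounding
```

Writing the problem with weights makes it easy to try different deflators, which is useful given that some trial and error in choosing the deflator is inevitable.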

Respecification of the Model

Instead of speculating about $\sigma_i^2$, sometimes a respecification of the PRF, that is, choosing a different functional form (see Chapter 5), can reduce heteroscedasticity. For example, instead of running the linear-in-variable (LIV) regression, if we estimate the model in the log form, it often reduces heteroscedasticity. That is, if we estimate

$$\ln Y_i = B_1 + B_2 \ln X_i + u_i \qquad (9.28)$$

the heteroscedasticity problem may be less serious in this transformation because the log transformation compresses the scales in which the variables are measured, thereby reducing a tenfold difference between two values to a twofold difference. Thus, the number 90 is 10 times the number 9, but ln 90 (= 4.4998) is only about 2 times as large as ln 9 (= 2.1972).

An incidental advantage of the log-linear, or double-log, model, as we have seen in Chapter 5, is that the slope coefficient $B_2$ measures the elasticity of Y with respect to X, that is, the percentage change in Y for a percentage change in X.

Whether we should fit the LIV model or a log-linear model in a given instance has to be determined by theoretical and other considerations that we discussed in Chapter 7. But if there is no strong preference for either one, and if the heteroscedasticity problem is severe in the LIV model, we can try the double-log model.

Example 9.7. Log-linear Model for the Wage Data

For the wage-education data, the empirical counterpart of Eq. (9.28) is as follows:

Dependent Variable: LOG(WAGE)                                  (9.29)
Method: Least Squares
Included observations: 523

Variable      Coefficient    Std. Error    t-Statistic    Prob.

C             -0.794552      0.259204      -3.065354      0.0023
LOG(EDUC)      0.957322      0.091702      10.43948       0.0000
LOG(EXPER)     0.166189      0.024690       6.731001      0.0000

R-squared             0.193841    Mean dependent var     2.072301
Adjusted R-squared    0.190740    S.D. dependent var     0.522545
S.E. of regression    0.470076    F-statistic            62.51699
Sum squared resid     114.9050    Prob (F-statistic)     0.000000
Durbin-Watson stat    1.772461

Since this is a double-log model, the coefficients of log(Educ) and log(Exper) represent elasticities, elasticity of wage with respect to education and elasticity of wage with respect to experience, respectively. Both of these elasticities are highly significant, judged by their p values.

Before we accept these results, we need to check if regression (9.29) suffers from heteroscedasticity. Applying the Breusch-Pagan, Glejser, and White (no interaction terms) tests, we find no evidence of heteroscedasticity.

While the linear model (9.3) showed heteroscedasticity, the log-linear model shows no evidence of it. This shows that choosing the right model may be critical in resolving heteroscedasticity.

In Problem 9.9 you are asked to examine the preceding regression to find out if heteroscedasticity exists. If the regression (9.29) is not plagued by the heteroscedasticity problem, then this model is preferable to the LIV model, which had this problem present, necessitating the transformation of variables, as in the regression (9.24).

In passing, note that all the transformations we have discussed earlier to remove heteroscedasticity are known in the literature as variance stabilizing transformations, which is another name for obtaining homoscedastic variances.

To conclude our discussion on remedial measures, we should reiterate that all transformations discussed previously are to some extent ad hoc; in the absence of precise knowledge about the true $\sigma_i^2$, we are essentially speculating about what it might be. Which of the transformations we have considered will work depends upon the nature of the problem and the severity of the heteroscedasticity. Also note that sometimes the error variance may not be related to any of the explanatory variables included in the model. Rather, it may be related to a variable that was originally a candidate for inclusion in the model but was not initially included. In this case the model can be transformed using that variable. Of course, if a variable logically belonged in the model, it should have been included in the first place, as we noted in Chapter 7.

9.5 WHITE’S HETEROSCEDASTICITY-CORRECTED STANDARD ERRORS AND t STATISTICS

As we have noted, in the presence of heteroscedasticity, the OLS estimators, although unbiased, are inefficient. As a result, the conventionally computed standard errors and t statistics of the estimators are suspect. White has developed an estimating procedure that produces standard errors of estimated regression coefficients that take into account heteroscedasticity. As a result, we can continue to use the t and F tests, except that they are now valid asymptotically, that is, in large samples. It should be pointed out that White’s procedure does not change the values of the regression coefficients but only their standard errors.19

19The derivation of White’s heteroscedasticity-corrected standard errors is beyond the scope of this book. Interested readers may refer to Jack Johnston and John DiNardo, Econometric Methods, 4th ed., McGraw-Hill, New York, 1997, Chapter 6.

To see how the conventionally computed standard errors and t statistics can be misleading in the presence of heteroscedasticity, let us return to the wage regression (9.3). Using EViews, we obtained the following results:

Dependent Variable: WAGE                                           (9.30)
Method: Least Squares
Sample: 1 533
Included observations: 533
White’s Heteroscedasticity-Consistent Standard Errors and Covariance

Variable        Coefficient    Std. Error    t-Statistic    Prob.
C                -4.857541      1.259182      -3.857695     0.0001
EDUC              0.923849      0.088110      10.48517      0.0000
EXPER             0.104346      0.018083       5.770424     0.0000

R-squared             0.200778    Mean dependent var    9.034709
Adjusted R-squared    0.197762    S.D. dependent var    5.138028
S.E. of regression    4.602016    F-statistic           66.57232
Sum squared resid     11224.63    Prob (F-statistic)    0.000000
Durbin-Watson stat    1.839859

As noted, the regression coefficients of Eq. (9.3) and Eq. (9.30) are the same; the only difference is in their estimated standard errors and, therefore, the estimated t ratios. Since the standard errors of the slope coefficients under White’s procedure are higher (and the t ratios lower), it seems Eq. (9.3) underestimated the true standard errors. Even then, the estimated t ratios under White’s procedure are highly statistically significant, for their p values are practically zero.

This example shows that heteroscedasticity need not destroy the statistical significance of the estimated regression coefficients, provided we correct the standard errors once we find that we have the problem of heteroscedasticity.
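To make the mechanics concrete, the following is a minimal sketch of one common form of White's correction (often labeled HC0), assuming NumPy is available. The data are synthetic and purely illustrative, not the wage data of Table 9-2, and the function name `ols_with_white_se` is a hypothetical helper; EViews may apply a different finite-sample adjustment, so its output need not match this sketch exactly.

```python
import numpy as np

def ols_with_white_se(X, y):
    """Return OLS coefficients, conventional SEs, and White (HC0) SEs.

    X is an (n, k) design matrix that already includes a column of ones.
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y              # OLS estimates (White's procedure leaves these unchanged)
    resid = y - X @ beta

    # Conventional covariance s^2 (X'X)^{-1}: appropriate only under homoscedasticity.
    s2 = resid @ resid / (n - k)
    se_ols = np.sqrt(np.diag(s2 * XtX_inv))

    # White (HC0) covariance: (X'X)^{-1} X' diag(e_i^2) X (X'X)^{-1}.
    meat = X.T @ (X * resid[:, None] ** 2)
    se_white = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se_ols, se_white

# Synthetic, hypothetical data in which the error s.d. grows with x.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1.0, 20.0, size=n)
u = rng.normal(0.0, 0.5 * x)
y = 2.0 + 0.9 * x + u
X = np.column_stack([np.ones(n), x])

beta, se_ols, se_white = ols_with_white_se(X, y)
print("coefficients:   ", beta)
print("conventional se:", se_ols)
print("White se:       ", se_white)
```

The coefficient vector is computed once; only the covariance matrix changes between the two sets of standard errors, which mirrors the point made about Eqs. (9.3) and (9.30).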

9.6 SOME CONCRETE EXAMPLES OF HETEROSCEDASTICITY

We end this chapter by presenting three examples to show the importance of heteroscedasticity in applied work.

Example 9.8. Economies of Scale or Heteroscedasticity

The New York Stock Exchange (NYSE) was initially very much opposed to the deregulation of brokerage commission rates. As a matter of fact, in an econometric study presented to the Securities and Exchange Commission (SEC) before deregulation was introduced on May 1, 1975, the NYSE argued that there were economies of scale in the brokerage industry and therefore the (monopolistically determined) fixed rate commissions were justifiable.20

20Sometimes economists offer the economies of scale argument to justify monopolies in certain industries, especially in the so-called natural monopolies (e.g., the electric and gas-generating utilities).

The econometric study that the NYSE submitted basically revolved around the following regression:21

$$\hat{Y}_i = 476{,}000 + 31.348X_i - (1.083 \times 10^{-6})X_i^2 \tag{9.31}$$
$$t = (2.98)\quad (40.39)\quad (-6.54) \qquad R^2 = 0.934$$

where Y = the total cost and X = the number of share transactions. From the model (9.31) we see that the total cost is positively related to the volume of transactions. But since the quadratic term in the transaction variable is negative and “statistically significant,” it implies that the total cost is increasing at a decreasing rate. Therefore, argued the NYSE, there were economies of scale in the brokerage industry, justifying the monopoly status of the NYSE.

But the antitrust division of the U.S. Department of Justice argued that the so-called economies of scale claimed in model (9.31) are a mirage, for the regression (9.31) was plagued by the problem of heteroscedasticity. This was because in estimating the cost function in Eq. (9.31) the NYSE did not take into account that small and large firms were included in the sample. That is, it did not take into account the scale factor. Assuming that the error term was proportional to the volume of transaction (see Eq. [9.22]), the antitrust division reestimated Eq. (9.31), obtaining the following result:22

$$\hat{Y}_i = 342{,}000 + 25.57X_i + (4.34 \times 10^{-6})X_i^2 \tag{9.32}$$
$$t = (32.3)\quad (7.07)\quad (0.503)$$

Lo and behold, not only is the quadratic term statistically insignificant, but also it has the wrong sign.23 Thus, there are no economies of scale in the brokerage industry, demolishing the NYSE’s argument for retaining its monopoly commission structure.

The preceding example shows dramatically how the assumption of homoscedasticity underlying Eq. (9.31) could have been potentially damaging. Imagine what would have happened if the SEC had accepted Eq. (9.31) on its face value and allowed the NYSE to fix the commission rates monopolistically, as before May 1, 1975!
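The phrase "increasing at a decreasing rate" can be read off the derivatives of the estimated cost function. As a sketch, treating the estimated coefficients of Eq. (9.31) as given, the slope of total cost with respect to the number of transactions and its rate of change are:

```latex
\frac{d\hat{Y}_i}{dX_i} = 31.348 - 2(1.083 \times 10^{-6})X_i,
\qquad
\frac{d^2\hat{Y}_i}{dX_i^2} = -2.166 \times 10^{-6} < 0 .
```

The same calculation applied to Eq. (9.32) gives a second derivative of 2(4.34 × 10⁻⁶) = 8.68 × 10⁻⁶, which is positive, although the underlying coefficient is statistically insignificant (t = 0.503). That reversal of sign is what is meant by the quadratic term having the "wrong sign" from the NYSE's point of view.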

21The results given in regressions (9.31) and (9.32) are reproduced from H. Michael Mann, “The New York Stock Exchange: A Cartel at the End of Its Reign” in Almarin Phillips (ed.), Promoting Competition in Regulated Industries, Brookings Institution, Washington D.C., 1975, p. 324.

22The actual mechanics consisted of estimating Eq. (9.23) shown in the text. Once this equation was estimated, it was multiplied by $\sqrt{X_i}$ to get back to the original equation, which is presented in Eq. (9.32).

23The NYSE in response said that the particular heteroscedasticity assumption used by the antitrust division was not valid. Substitution of other assumptions still supports the antitrust division’s finding that there were no economies of scale in the brokerage industry. For details, see the Mann article cited in footnote 21.

Example 9.9. Highway Capacity and Economic Growth

In support of his argument that economies with superior surface transportation infrastructure will benefit through higher productivity and per capita income growth, David A. Aschauer24 obtained the results presented in Table 9-3. Since the study was conducted over a cross section of 48 states in the United States, “there is presumption that the error structure may not be homoskedastic” (p. 18).25

However, in the present instance the presumption of heteroscedasticity was just that, since correcting for heteroscedasticity in various ways did not change OLS results much. But this example shows that if there is a presumption of heteroscedasticity, we should look into it rather than assume away the problem. As noted earlier, and as the NYSE economies of scale example so well demonstrates, heteroscedasticity is potentially a very serious problem and must not be taken lightly. It is better to err on the side of safety!

TABLE 9-3  PER CAPITA INCOME GROWTH AND HIGHWAY CAPACITY

Explanatory variable      OLS        WLS1       WLS2       WLS3
Constant                 -7.69      -7.94      -8.19      -7.62
  se =                   (1.08)     (1.08)     (1.09)     (1.08)
ln X2 (in 1960)          -1.59      -1.64      -1.69      -1.58
  se =                   (0.18)     (0.19)     (0.19)     (0.18)
ln X3                     0.30       0.30       0.31       0.30
  se =                   (0.06)     (0.06)     (0.06)     (0.06)
X4                       -0.009     -0.100     -0.011     -0.008
  se =                   (0.003)    (0.003)    (0.003)    (0.003)
D                       -31.00     -32.00     -33.00     -31.00
  se =                   (0.08)     (0.08)     (0.08)     (0.08)
R2 =                      0.67       0.49       0.46       0.73

Notes: Dependent variable Y: Average annual growth of per capita income (1972 $) from 1960 to 1980.
X2 = The level of per capita income (1972 $) in the base year 1960
X3 = The total existing road mileage, average over 1960 to 1985
X4 = The percentage of highway mileage of deficient quality in 1982
D = Dummy = 1 if midwest region, 0 if otherwise
WLS1 = Weighted least squares using the square root of X2 (see Eq. [9.23])
WLS2 = Weighted least squares using the level of X2 (see Eq. [9.27])
WLS3 = Weighted least squares using the level of ln X2

Source: David A. Aschauer, “Highway Capacity and Economic Growth,” Economic Perspectives, Federal Reserve Bank of Chicago, September/October 1990, Table 1, p. 18. Notation is adapted.

24This example and the statistical results presented in Table 9-3 are obtained from David A. Aschauer, “Highway Capacity and Economic Growth,” Economic Perspectives, Federal Reserve Bank of Chicago, pp. 14–23, September/October 1990.

25A historical note: Is it heteroscedasticity or heteroskedasticity? It is the latter, but the former is so well established in the literature that we only occasionally find the word spelled with a k.


Example 9.10. An Extended Wage Model

For pedagogic reasons, we have presented a simple model of wage determination in this chapter. But using the data in Table 9-2, we now present a more refined model:

Dependent Variable: LOG(WAGE)                                      (9.33)
Method: Least Squares
Included observations: 523

Variable        Coefficient    Std. Error    t-Statistic    Prob.
C                 0.773947      0.123314       6.276238     0.0000
EDUC              0.091251      0.007923      11.51748      0.0000
EXPER             0.009712      0.001757       5.528884     0.0000
SEX              -0.244064      0.039288      -6.212101     0.0000
MARSTAT           0.069315      0.042214       1.641993     0.1012
REGION           -0.115626      0.042945      -2.692413     0.0073
UNION             0.183644      0.050956       3.603982     0.0003

R-squared             0.301086    Mean dependent var    2.072301
Adjusted R-squared    0.292959    S.D. dependent var    0.522545
S.E. of regression    0.439386    F-statistic           37.04803
Sum squared resid     99.61894    Prob (F-statistic)    0.000000
Durbin-Watson stat    1.861383

Note: Sex = 1 for female; Marstat = 1 if married; Region = 1 if in the South; and Union = 1 if a union member.

In Equation (9.33) we have presented a semi-log model, with the wage variable in the logarithmic form and the regressors in the linear form. In the literature on wage modeling, the wage variable is often expressed in the log form. The coefficients of the Educ and Exper variables represent semi-elasticities. For example, the coefficient of Educ of about 0.091 means that, holding the other variables constant, if years of schooling increase by one year, on average, wages go up by about 9.1 percent. For the interpretation of the dummy variables, see Problem 9.25.
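The 9.1 percent figure is the usual approximation, 100 times the coefficient. As a small illustrative sketch (the helper name `exact_percent_change` is hypothetical), the exact percentage change implied by a coefficient b in a model with a logged dependent variable is 100(e^b − 1), which is slightly larger here:

```python
import math

def exact_percent_change(coef):
    """Exact percentage change in Y for a one-unit change in X when ln Y is the regressand."""
    return 100.0 * (math.exp(coef) - 1.0)

educ_coef = 0.091251                      # coefficient of EDUC in Eq. (9.33)
approx = 100.0 * educ_coef                # the approximation quoted in the text
exact = exact_percent_change(educ_coef)   # exact compounding of the log change
print(f"approximate: {approx:.2f}%   exact: {exact:.2f}%")
```

For coefficients this small the two numbers are close (roughly 9.1 versus 9.6 percent); the gap widens as the coefficient grows in absolute value.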

The estimated equation was tested for heteroscedasticity. On the basis of the Breusch-Pagan and White tests (with cross-product terms), there is no evidence of heteroscedasticity. This was confirmed when Eq. (9.33) was reestimated with White’s heteroscedasticity-corrected standard errors. In fact, there was no difference between the OLS standard errors and those obtained from White’s procedure.
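For readers who want to see what a White test with cross-product terms involves computationally, here is a minimal sketch for a model with two regressors, assuming NumPy. The data are synthetic and hypothetical (they are not the wage data), and the function name is illustrative. The statistic is n times the R² of the auxiliary regression of the squared residuals on the regressors, their squares, and their cross product; under the null of homoscedasticity it is compared with a chi-square critical value with the returned degrees of freedom.

```python
import numpy as np

def white_test_statistic(x2, x3, y):
    """Return (n * R^2, d.f.) from White's auxiliary regression with a cross-product term."""
    n = len(y)
    X = np.column_stack([np.ones(n), x2, x3])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ b) ** 2                              # squared OLS residuals

    # Auxiliary regressors: levels, squares, and the cross product.
    Z = np.column_stack([np.ones(n), x2, x3, x2 ** 2, x3 ** 2, x2 * x3])
    g, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    fitted = Z @ g
    r2 = 1.0 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)
    df = Z.shape[1] - 1                                # auxiliary regressors, excluding the constant
    return n * r2, df

# Synthetic, hypothetical data with error s.d. proportional to x2.
rng = np.random.default_rng(3)
n = 300
x2 = rng.uniform(1.0, 20.0, size=n)
x3 = rng.uniform(0.0, 10.0, size=n)
y = 1.0 + 0.5 * x2 + 0.3 * x3 + rng.normal(0.0, 0.3 * x2)

stat, df = white_test_statistic(x2, x3, y)
print(f"n*R^2 = {stat:.2f} with {df} d.f.")
```

The sketch stops at the test statistic; the decision rule (comparison with a tabulated chi-square value) is left to the reader so that no extra library is required.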

9.7 SUMMARY

A critical assumption of the classical linear regression model is that the disturbances ui all have the same (i.e., homoscedastic) variance. If this assumption is not satisfied, we have heteroscedasticity. Heteroscedasticity does not destroy the unbiasedness property of OLS estimators, but these estimators are no longer efficient. In other words, OLS estimators are no longer BLUE. If heteroscedastic variances are known, then the method of weighted least squares (WLS) provides BLUE estimators.

Despite heteroscedasticity, if we continue to use the usual OLS method not only to estimate the parameters (which remain unbiased) but also to establish confidence intervals and test hypotheses, we are likely to draw misleading conclusions, as in the NYSE Example 9.8. This is because the estimated standard errors are likely to be biased and therefore the resulting t ratios are likely to be biased, too. Thus, it is important to find out whether we are faced with the heteroscedasticity problem in a specific application. There are several diagnostic tests of heteroscedasticity, such as plotting the estimated residuals against one or more of the explanatory variables, the Park test, the Glejser test, or the rank correlation test (see Problem 9.13).

If one or more diagnostic tests reveal that we have the heteroscedasticity problem, remedial measures are called for. If the true error variance is known, we can use the method of WLS to obtain BLUE estimators. Unfortunately, knowledge about the true error variance is rarely available in practice. As a result, we are forced to make some plausible assumptions about the nature of heteroscedasticity and to transform our data so that in the transformed model the error term is homoscedastic. We then apply OLS to the transformed data, which amounts to using WLS. Of course, some skill and experience are required to obtain the appropriate transformations. But without such a transformation, the problem of heteroscedasticity is insoluble in practice. However, if the sample size is reasonably large, we can use White’s procedure to obtain heteroscedasticity-corrected standard errors.
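The statement that applying OLS to the transformed data amounts to WLS can be sketched in a few lines, assuming NumPy. The example uses synthetic, hypothetical data in which the error standard deviation is known by construction (in practice it would have to be assumed or estimated), and `wls_known_sigma` is an illustrative helper name.

```python
import numpy as np

def wls_known_sigma(x, y, sigma):
    """WLS for Y = B1 + B2*X + u when the error s.d. sigma_i is known.

    Every term is divided by sigma_i, and OLS is run through the origin on
    the transformed regressors 1/sigma_i and X_i/sigma_i.
    """
    Z = np.column_stack([1.0 / sigma, x / sigma])   # transformed regressors (no separate intercept)
    y_star = y / sigma                              # transformed dependent variable
    coef, *_ = np.linalg.lstsq(Z, y_star, rcond=None)
    return coef                                     # [B1, B2]

# Synthetic, hypothetical data: true B1 = 1, B2 = 2, error s.d. = 0.5 * x.
rng = np.random.default_rng(1)
n = 400
x = rng.uniform(1.0, 10.0, size=n)
sigma = 0.5 * x
y = 1.0 + 2.0 * x + rng.normal(0.0, sigma)

b_wls = wls_known_sigma(x, y, sigma)
b_ols, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)
print("OLS estimates:", b_ols)
print("WLS estimates:", b_wls)
```

Both estimators are unbiased, so in any one sample the two sets of estimates will usually be close; the gain from WLS shows up as smaller sampling variability across repeated samples.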

KEY TERMS AND CONCEPTS

The key terms and concepts introduced in this chapter are

Homoscedasticity (or equal variance)
Heteroscedasticity (or unequal variance)
  a) cross-sectional data
  b) scale effect
Detection of heteroscedasticity
  a) residual plots
  b) Park test
  c) Glejser test
  d) White’s general heteroscedasticity test
Other tests of heteroscedasticity
  a) Spearman’s rank correlation test
  b) Goldfeld-Quandt test
  c) Bartlett’s homogeneity-of-variance test
  d) Peak test
  e) Breusch-Pagan test
  f) CUSUMSQ test
Weighted least squares (WLS) estimators
Square root transformation
Variance stabilizing transformations
White’s heteroscedasticity-corrected standard errors and t statistics

QUESTIONS

9.1. What is meant by heteroscedasticity? What are its effects on the following?
  a. Ordinary least squares (OLS) estimators and their variances.
  b. Confidence intervals.
  c. The use of t and F tests of significance.

9.2. State with brief reasons whether the following statements are true or false:
  a. In the presence of heteroscedasticity OLS estimators are biased as well as inefficient.
  b. If heteroscedasticity is present, the conventional t and F tests are invalid.
  c. In the presence of heteroscedasticity the usual OLS method always overestimates the standard errors of estimators.
  d. If residuals estimated from an OLS regression exhibit a systematic pattern, it means heteroscedasticity is present in the data.
  e. There is no general test of heteroscedasticity that is free of any assumption about which variable the error term is correlated with.

9.3. Would you expect heteroscedasticity to be present in the following regressions?

      Y                                X                     Sample
  (a) Corporate profits                Net worth             Fortune 500
  (b) Log of corporate profits         Log of net worth      Fortune 500
  (c) Dow Jones industrial average     Time                  1960–1990 (annual averages)
  (d) Infant mortality rate            Per capita income     100 developed and developing countries
  (e) Inflation rate                   Money growth rate     United States, Canada, and 15 Latin American countries

9.4. Explain intuitively why the method of weighted least squares (WLS) is superior to OLS if heteroscedasticity is present.

9.5. Explain briefly the logic behind the following methods of detecting heteroscedasticity:
  a. The graphical method
  b. The Park test
  c. The Glejser test

PROBLEMS

9.6. In the two-variable population regression function (PRF), suppose the error variance has the following structure:

$$E(u_i^2) = \sigma^2 X_i^4$$

How would you transform the model to achieve homoscedastic error variance? How would you estimate the transformed model? List the various steps.

9.7. Consider the following two regressions based on the U.S. data for 1946 to 1975.26 (Standard errors are in parentheses.)

$$C_t = 26.19 + 0.6248\,\mathrm{GNP}_t - 0.4398\,D_t$$
$$\text{se} = (2.73)\quad (0.0060)\quad (0.0736) \qquad R^2 = 0.999$$

$$\left(\frac{C}{\mathrm{GNP}}\right)_t = 25.92\,\frac{1}{\mathrm{GNP}_t} + 0.6246 - 0.4315\,\frac{D}{\mathrm{GNP}_t}$$
$$\text{se} = (2.22)\quad (0.0068)\quad (0.0597) \qquad R^2 = 0.875$$

where C = aggregate private consumption expenditure
GNP = gross national product
D = national defense expenditure
t = time

The objective of Hanushek and Jackson’s study was to find out the effect of defense expenditure on other expenditures in the economy.
  a. What might be the reason(s) for transforming the first equation into the second equation?
  b. If the objective of the transformation was to remove or reduce heteroscedasticity, what assumption has been made about the error variance?
  c. If there was heteroscedasticity, have the authors succeeded in removing it? How can you tell?
  d. Does the transformed regression have to be run through the origin? Why or why not?
  e. Can you compare the R2 values of the two regressions? Why or why not?

9.8. In a study of population density as a function of distance from the central business district, Maddala obtained the following regression results based on a sample of 39 census tracts in the Baltimore area in 1970:27

$$\ln Y_i = 10.093 - 0.239X_i$$
$$t = (54.7)\quad (-12.28) \qquad R^2 = 0.803$$

$$\frac{\ln Y_i}{\sqrt{X_i}} = 9.932\,\frac{1}{\sqrt{X_i}} - 0.2258\sqrt{X_i}$$
$$t = (47.87)\quad (-15.10)$$

where Y = the population density in the census tract and X = the distance in miles from the central business district.
  a. What assumption, if any, is the author making about heteroscedasticity in his data?
  b. How can you tell from the transformed WLS regression that heteroscedasticity, if present, has been removed or reduced?
  c. How would you interpret the regression results? Do they make economic sense?

26These results are from Eric A. Hanushek and John E. Jackson, Statistical Methods for Social Scientists, Academic, New York, 1977, p. 160.

27G. S. Maddala, Introduction to Econometrics, Macmillan, New York, 1988, pp. 175–177.

9.9. Refer to the wage data given in Table 9-2 (found on the textbook’s Web site). Regression (9.29) gives the results of the regression of the log of wage on the logs of education and experience.
  a. Based on the data of Table 9-2, verify this regression.
  b. For this regression, obtain the absolute values of the residuals as well as their squared values and plot each against education. Is there any evidence of heteroscedasticity?
  c. Do the Park and Glejser tests on the residuals of this regression. What conclusions can you draw?
  d. If heteroscedasticity is found in the double-log model, what kind of WLS transformation would you recommend to eliminate it?
  e. For the linear regression (9.3) there was some evidence of heteroscedasticity. If for the log-log model there is no evidence of heteroscedasticity, which model would you choose and why?
  f. Can you compare the R2s of the two regressions? Why not?

9.10. Continue with the wage data given in Table 9-2 (found on the textbook’s Web site) and now consider the following regressions:

$$\text{wage}_i = A_1 + A_2\,\text{experience}_i + u_i$$
$$\ln \text{wage}_i = B_1 + B_2 \ln \text{experience}_i + u_i$$

  a. Estimate both regressions.
  b. Obtain the absolute and squared values of the residuals for each regression and plot them against the explanatory variable. Do you detect any evidence of heteroscedasticity?
  c. Verify your qualitative conclusion in part (b) with the Glejser and Park tests.
  d. If there is evidence of heteroscedasticity, how would you transform the data to reduce its severity? Show the necessary calculations.

9.11. Consider Figure 9-10, which plots the gross domestic product (GDP) growth, in percent, against the ratio of investment/GDP, in percent, for several countries for 1974 to 1985.28 The various countries are divided into three groups: those that experienced positive real (i.e., inflation-adjusted) interest rates, those that experienced moderately negative real interest rates, and those that experienced strongly negative interest rates.
  a. Develop a suitable model to explain the percent GDP growth rate in relation to percent investment/GDP rate.
  b. From Figure 9-10, do you see any evidence of heteroscedasticity in the data? How would you test its presence formally?
  c. If heteroscedasticity is suspected, how would you transform your regression to eliminate it?
  d. Suppose you were to extend your model to take into account the qualitative differences in the three groups of countries by representing them with dummy variables. Write the equation for this model. If you had the data and could estimate this expanded model, would you expect heteroscedasticity in the extended model? Why or why not?

28See World Development Report, 1989, the World Bank, Oxford University Press, New York, p. 33.

9.12. In a survey of 9,966 economists in 1964 the following data were obtained:

  Age (years)    Median salary ($)
  20–24           7,800
  25–29           8,400
  30–34           9,700
  35–39          11,500
  40–44          13,000
  45–49          14,800
  50–54          15,000
  55–59          15,000
  60–64          15,000
  65–69          14,500
  70+            12,000

Source: “The Structure of Economists’ Employment and Salaries,” Committee on the National Science Foundation Report on the Economics Profession, American Economic Review, vol. 55, no. 4, December 1965, p. 36.

FIGURE 9-10  Real interest rates, investment, productivity, and growth in 33 developing countries from 1974 to 1985

[Figure not reproduced: scatterplot of GDP growth rate (percent, vertical axis) against investment/GDP (percent, horizontal axis) for 33 developing countries, grouped by positive real interest rates, moderately negative real interest rates (0 to -5 percent), and strongly negative real interest rates. A line labeled “Average productivity of investment” represents the sample average.]

Source: World Development Report, 1989. Copyright © by the International Bank for Reconstruction & Development/The World Bank. Reprinted by permission of the Oxford University Press, Inc., p. 33.

  a. Develop a suitable regression model to explain median salary in relation to age. For the purpose of regression, assume that median salaries refer to the midpoint of the age interval.
  b. Assuming error variance proportional to age, transform the data and obtain the WLS regression.
  c. Now assume that it is proportional to the square of age. Obtain the WLS regression on this assumption.
  d. Which assumption seems more plausible?

9.13. Spearman’s rank correlation test for heteroscedasticity. The following steps are involved in this test, which can be explained with the wage regression (9.3):
  a. From the regression (9.3), obtain the residuals ei.
  b. Obtain the absolute value of the residuals, $|e_i|$.
  c. Rank both education (Xi) and $|e_i|$ in either descending (highest to lowest) or ascending (lowest to highest) order.
  d. Take the difference between the two ranks for each observation, call it di.
  e. Compute the Spearman’s rank correlation coefficient rs, defined as

$$r_s = 1 - 6\left[\frac{\sum d_i^2}{n(n^2 - 1)}\right]$$

where n = the number of observations in the sample.

If there is a systematic relationship between ei and Xi, the rank correlation coefficient between the two should be statistically significant, in which case heteroscedasticity can be suspected.

Given the null hypothesis that the true population rank correlation coefficient is zero and that n > 8, it can be shown that

$$\frac{r_s\sqrt{n - 2}}{\sqrt{1 - r_s^2}} \sim t_{n-2}$$

follows Student’s t distribution with (n - 2) d.f.

Therefore, if in an application the rank correlation coefficient is significant on the basis of the t test, we do not reject the hypothesis that there is heteroscedasticity in the problem. Apply this method to the wage data given in the text to find out if there is evidence of heteroscedasticity in the data.

9.14. Weighted least squares. Consider the data in Table 9-4.
  a. Estimate the OLS regression

$$Y_i = B_1 + B_2 X_i + u_i$$

  b. Estimate the WLS

$$\frac{Y_i}{\sigma_i} = B_1\frac{1}{\sigma_i} + B_2\frac{X_i}{\sigma_i} + \frac{u_i}{\sigma_i}$$

(Make sure that you run the WLS through the origin.) Compare the results of the two regressions. Which regression do you prefer? Why?

9.15. Show that the error term vi in Eq. (9.27) is homoscedastic.
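The steps of the Spearman rank correlation test in Problem 9.13 can be sketched as follows, assuming NumPy. This is an illustration on synthetic, hypothetical data (not the wage data, which the problem asks the reader to use), the helper names are made up for the example, and the simple ranking function assumes there are no tied values.

```python
import numpy as np

def ranks(values):
    """Rank observations from 1 (smallest) to n (largest); assumes no ties."""
    order = np.argsort(values)
    r = np.empty(len(values), dtype=float)
    r[order] = np.arange(1, len(values) + 1)
    return r

def spearman_het_test(x, resid):
    """Spearman rank correlation between x and |resid|, with its t statistic."""
    n = len(x)
    d = ranks(x) - ranks(np.abs(resid))                  # step d: rank differences
    r_s = 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))  # step e
    t = r_s * np.sqrt(n - 2) / np.sqrt(1.0 - r_s ** 2)   # compare with t on (n - 2) d.f.
    return r_s, t

# Synthetic, hypothetical data with error s.d. proportional to x.
rng = np.random.default_rng(2)
n = 200
x = rng.uniform(1.0, 20.0, size=n)
y = 3.0 + 0.8 * x + rng.normal(0.0, 0.4 * x)

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b                                        # step a: OLS residuals

r_s, t = spearman_het_test(x, resid)
print(f"r_s = {r_s:.3f}, t = {t:.2f} with {n - 2} d.f.")
```

Ascending ranks are used for both variables; ranking both in descending order instead would leave the rank differences, and hence r_s, unchanged in absolute value and sign.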

TABLE 9-4  AVERAGE COMPENSATION IN RELATION TO PRODUCTIVITY BY EMPLOYMENT SIZE, U.S. MANUFACTURING INDUSTRIES

  Employment size          Average            Average            Standard deviation
  (average number of       compensation       productivity       of compensation
  employees)               Y ($)              X ($)              σi ($)
  (1)                      (2)                (3)                (4)
  1–4                      3,396               9,355                744
  5–9                      3,787               8,584                851
  10–19                    4,013               7,962                728
  20–49                    4,104               8,275                805
  50–99                    4,146               8,389                930
  100–249                  4,241               9,418              1,081
  250–499                  4,387               9,795              1,243
  500–999                  4,538              10,281              1,308
  1,000–2,499              4,843              11,754              1,112

Source: Data from The Census of Manufacturing, U.S. Department of Commerce, 1958. (Figures in table computed by the author.)

9.16. In a regression of average wages (W) on the number of employees (N) for a random sample of 30 firms, the following regression results were obtained:29

$$\hat{W} = 7.5 + 0.009N \tag{1}$$
$$t = \text{N.A.}\quad (16.10) \qquad R^2 = 0.90$$

$$\frac{\hat{W}}{N} = 0.008 + 7.8\,\frac{1}{N} \tag{2}$$
$$t = (14.43)\quad (76.58) \qquad R^2 = 0.99$$

  a. How would you interpret the two regressions?
  b. What is the author assuming in going from Eq. (1) to (2)? Was he worried about heteroscedasticity?
  c. Can you relate the slopes and the intercepts of the two models?
  d. Can you compare the R2 values of the two models? Why or why not?

9.17. From the total cost function given in the NYSE regression (9.31), how would you derive the average cost function? And the marginal cost function? But if Eq. (9.32) is the true (i.e., heteroscedasticity-adjusted) total cost function, how would you derive the associated average and marginal cost functions? Explain the difference between the two models.

9.18. Table 9-5, on the textbook’s Web site, gives data on five socioeconomic indicators for a sample of 20 countries, divided into four per-capita income categories: low-income (up to $500 per year), lower-middle income (annual income between $500 and $2200), upper-middle income (annual income between $2300 and $5500), and higher-income (over $5500 a year). The first five countries in the table belong to the first income category, the second five countries to the second income category, and so on.
  a. Create a regression using all five independent variables. A priori, what do you expect the impact of the population growth rate (X4) and daily calorie intake (X5) will be on infant mortality rate (Y)?
  b. Estimate the preceding regression and see if your expectations were correct.
  c. If you encounter multicollinearity in the preceding regression, what can you do about it? You may undertake any corrective measures that you deem necessary.

29See Dominick Salvatore, Managerial Economics, McGraw-Hill, New York, 1989, p. 157.

9.19. The model from Ex. 9.18, without inclusion of X4 and X5, when tested for het- eroscedasticity following the White test outlined in regression (9.14), yielded the following regression results. (Note: To save space, we have given only the t statistics and their p values. The results were obtained from the EViews statistical package.)

t = (-0.01) (0.60) (-0.13) (0.87) (0.56) (-0.85)

p value = (0.989)(0.556) (0.895) (0.394) (0.581) (0.400)

R2 = 0.23

a. How do you interpret the preceding regression? b. Do these results suggest that the model above suffers from the problem of

heteroscedasticity? How do you know? c. If the above regression suffers from heteroscedasticity, how would you get

rid of it? 9.20. a. Use the data given in Table 9-5 (on the textbook’s Web site) to develop a

multiple regression model to explain daily calorie intake for the 20 coun- tries shown in the table.

b. Does this model suffer from heteroscedasticity? Show the necessary test(s). c. If there is heteroscedasticity, obtain White’s heteroscedasticity-corrected

standard errors and t statistics (see if your software does this) and com- pare and comment on the results obtained in part (a) above.

9.21. Refer to the life expectancy example (Example 7.4) discussed in Chapter 7. For the models considered in Table 7-1, find out if these models suffer from the problem of heteroscedasticity. The raw data are given in Table 9-6, found on the textbook’s Web site. State the tests you use. How would you remedy the problem? Show the necessary calculations. Also, present the results based on White’s heteroscedasticity-corrected standard errors. What general conclusion do you draw from this exercise?

9.22. Estimate the counterparts of Equations (9.10) to (9.12) using Exper and Wagef as the deflators.

9.23. Describe the Breusch-Pagan (BP) test. Verify that, on the basis of this test, Eq. (9.33) shows no evidence of heteroscedasticity.

9.24. Reestimate Eq. (9.27a) using Wagef as the deflator. 9.25. Interpret the dummy coefficients in Eq. (9.33). 9.26. Refer to Table 9-7 on the textbook’s Web site. This data set considers R&D

expenditure data in relation to sales.

e2i = -15.76 + 0.3810X2i - 4.5641X3i + 0.000005X22i + 0.1328X23i - 0.0050X2iX3i

310 PART TWO: REGRESSION ANALYSIS IN PRACTICE

guj75845_ch09.qxd 4/16/09 12:20 PM Page 310

a. Create a standard LIV (linear-in-variables) regression model and note the results.

b. Using the software package of your choice, obtain White’s heteroscedasticity-corrected regression results. What are they?

c. Is there a substantial difference between the results obtained in parts (a) and (b)?

9.27. Table 9-8 (found on the textbook’s Web site) gives data on salary and related data on 447 executives of Fortune 500 companies. Salary = 1999 salary and bonuses; totcomp = 1999 CEO total compensation; tenure = number of years as CEO (0 if less than 6 months); age = age of CEO; sales = total 1998 sales revenue of the firm; profits = 1998 profits for the firm; and assets = total assets of the firm in 1998.

a. Estimate the following regression from these data and obtain the Breusch-Pagan statistic to check for heteroscedasticity:

$$\text{Salary}_i = B_1 + B_2\,\text{tenure}_i + B_3\,\text{age}_i + B_4\,\text{sales}_i + B_5\,\text{profits}_i + B_6\,\text{assets}_i + u_i$$

Does there seem to be a problem with heteroscedasticity?

b. Now create a second model using ln(Salary) as the dependent variable. Is there any improvement in the heteroscedasticity?

c. Create scattergrams of Salary versus each of the independent variables. Can you discern which variable(s) is (are) contributing to the issue? What suggestions would you make now to address this? What is your final model?

d. Now obtain (White’s) robust standard errors. Are there any noticeable differences?

9.28. Table 9-9 (on the textbook’s Web site) gives data on 81 cars regarding MPG (average miles per gallon), HP (engine horsepower), VOL (cubic feet of cab space), SP (top speed, miles per hour), and WT (vehicle weight in 100 lbs.).

a. Consider the following model:

$$\text{MPG}_i = B_1 + B_2\,\text{SP}_i + B_3\,\text{HP}_i + B_4\,\text{WT}_i + u_i$$

Estimate the parameters of this model and interpret the results. Do they make economic sense?

b. Would you expect the error variance in the preceding model to be heteroscedastic? Why?

c. Use the White test to find out if the error variance is heteroscedastic.

d. Obtain White’s heteroscedasticity-consistent standard errors and t values and compare your results with those obtained from OLS.

e. If heteroscedasticity is established, how would you transform the data so that in the transformed data the error variance is homoscedastic? Show the necessary calculations.


CHAPTER 10 AUTOCORRELATION: WHAT HAPPENS IF ERROR TERMS ARE CORRELATED?

In Chapter 9 we examined the consequences of relaxing one of the assumptions of the classical linear regression model (CLRM), namely the assumption of homoscedasticity. In this chapter we consider a departure from another CLRM assumption, namely, that there is no serial correlation or autocorrelation among the disturbances $u_i$ entering the population regression function (PRF). Although we discussed this assumption briefly in Chapter 3, we will take a long look at it in this chapter to seek answers to the following questions:

1. What is the nature of autocorrelation?

2. What are the theoretical and practical consequences of autocorrelation?

3. Since the assumption of no autocorrelation relates to $u_i$, which are not directly observable, how do we know that there is no autocorrelation in any concrete study? In short, how do we detect autocorrelation in practice?

4. How do we remedy the problem of autocorrelation if the consequences of not correcting for it are serious?

This chapter is in many ways similar to the preceding one on heteroscedasticity in that under both heteroscedasticity and autocorrelation, ordinary least squares (OLS) estimators, although linear and unbiased, are not efficient; that is, they are not best linear unbiased estimators (BLUE).

Since our emphasis in this chapter is on autocorrelation, we assume that all other assumptions of the CLRM remain intact.


10.1 THE NATURE OF AUTOCORRELATION

The term autocorrelation can be defined as “correlation between members of observations ordered in time (as in time series data) or space (as in cross-sectional data).”1

Just as heteroscedasticity is generally associated with cross-sectional data, autocorrelation is usually associated with time series data (i.e., data ordered in temporal sequence), although, as the preceding definition suggests, autocorrelation can occur in cross-sectional data also, in which case it is called spatial correlation (i.e., correlation in space rather than in time).

In the regression context the CLRM assumes that such correlation does not exist in disturbances $u_i$. Symbolically, no autocorrelation means

$$E(u_i u_j) = 0 \qquad i \neq j \tag{10.1}$$

That is, the expected value of the product of two different error terms $u_i$ and $u_j$ is zero.2 In plain English, this assumption means that the disturbance term relating to any observation is not related to or influenced by the disturbance term relating to any other observation. For example, in dealing with quarterly time series data involving the regression of output on labor and capital inputs (i.e., a production function), if, say, there is a labor strike affecting output in one quarter, there is no reason to believe that this disruption will be carried over to the next quarter. In other words, if output is lower this quarter, it will not necessarily be lower next quarter. Likewise, in dealing with cross-sectional data involving the regression of family consumption expenditure on family income, the effect of an increase of one family’s income on its consumption expenditure is not expected to affect the consumption expenditure of another family.

But if there is such dependence, we have autocorrelation. Symbolically,

$$E(u_i u_j) \neq 0 \qquad i \neq j \tag{10.2}$$

In this situation the disruption caused by a strike this quarter can affect output next quarter (it might in fact increase to catch up with the backlog) or the increase in the consumption expenditure of one family can pressure another family to increase its consumption expenditure if it wants to keep up with the Joneses (this is the case of spatial correlation).

It is interesting to visualize some likely patterns of autocorrelation and nonautocorrelation, which are given in Figure 10-1. In the figure the vertical axis shows both $u_i$ (the population disturbances) and their sample counterparts, $e_i$ (the residuals), for as in the case of heteroscedasticity, we do not observe the former and try to infer their behavior from the latter.

1Maurice G. Kendall and William R. Buckland, A Dictionary of Statistical Terms, Hafner, New York, 1971, p. 8.

2If i = j, Equation (10.1) becomes $E(u_i^2)$, the variance of $u_i$, which by the homoscedasticity assumption is equal to $\sigma^2$.

FIGURE 10-1 Patterns of autocorrelation. [Panels (a) to (e); in each panel the vertical axis shows u, e and the horizontal axis shows Time.]

Figures 10-1(a) to (d) show a distinct pattern among the u’s while Figure 10-1(e) shows no systematic pattern, which is the geometric counterpart of the assumption of no autocorrelation given in Equation (10.1).
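The contrast between a patterned series and a patternless one can be mimicked with simulated data. The sketch below is illustrative only: the generating rule (each error equals 0.8 times its predecessor plus fresh noise), the seed, and the helper names `simulate_errors` and `lag1_correlation` are assumptions made for this example and do not come from the text. It compares the sample correlation between successive values for a dependent and an independent series.

```python
import math
import random


def simulate_errors(n, rho, seed):
    """Generate u_t = rho * u_{t-1} + v_t with standard normal v_t.

    Setting rho = 0 gives errors that do not depend on their past.
    """
    rng = random.Random(seed)
    errors = []
    previous = 0.0
    for _ in range(n):
        previous = rho * previous + rng.gauss(0.0, 1.0)
        errors.append(previous)
    return errors


def lag1_correlation(u):
    """Sample correlation between successive values u_t and u_{t-1}."""
    current, lagged = u[1:], u[:-1]
    mean_c = sum(current) / len(current)
    mean_l = sum(lagged) / len(lagged)
    cov = sum((c - mean_c) * (l - mean_l) for c, l in zip(current, lagged))
    var_c = sum((c - mean_c) ** 2 for c in current)
    var_l = sum((l - mean_l) ** 2 for l in lagged)
    return cov / math.sqrt(var_c * var_l)


patterned = simulate_errors(2000, rho=0.8, seed=1)     # dependent errors
patternless = simulate_errors(2000, rho=0.0, seed=1)   # independent errors

print("dependent errors:  ", round(lag1_correlation(patterned), 3))
print("independent errors:", round(lag1_correlation(patternless), 3))
```

For the dependent series the correlation between neighbors should come out clearly positive, whereas for the independent series it should be close to zero, which is the numerical counterpart of Equation (10.1).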

Why does autocorrelation occur? There are several reasons for autocorrelation, some of which follow.

Inertia

A distinguishing feature of most economic time series is inertia or sluggishness. As is well known, time series such as the gross domestic product (GDP), production, employment, money supply, and price indexes exhibit business cycles (recurring and self-sustaining fluctuations in economic activity). Starting at the bottom of the recession, when economic recovery starts, most of these time series start moving upward. In this upswing the value of a series at one point in time is greater than its previous value. Thus, there is a momentum built into these time series and the upswing continues until something happens (e.g., an increase in taxes or interest rates, or both) to slow them down. Therefore, in regressions involving time series data successive observations are likely to be interdependent or correlated.


Model Specification Error(s)

Sometimes autocorrelation patterns such as those shown in Figures 10-1(a) to (d) occur not because successive observations are correlated but because the regression model is not “correctly” specified. As we saw in Chapter 7, by incorrect specification of a model we mean that either some important variables that should be included in the model are not included (this is the case of underspecification) or that the model has the wrong functional form; for example, a linear-in-variable (LIV) model is fitted whereas a log-linear model should have been fitted. If such model specification errors occur, then the residuals from the incorrect model will exhibit a systematic pattern. A simple test of this is to include the excluded variable and to determine if the residuals still show a distinct pattern. If they do not, then the so-called serial correlation observed in the incorrect model was due to specification error.

The Cobweb Phenomenon

The supply of many agricultural commodities reflects the so-called cobweb phenomenon, where supply reacts to price with a lag of one time period because supply decisions take time to implement (the gestation period). Thus, at the beginning of this year’s planting of crops farmers are influenced by the price prevailing last year so that their supply function is

$$\text{Supply}_t = B_1 + B_2 P_{t-1} + u_t \tag{10.3}$$

Suppose at the end of period t, price $P_t$ turns out to be lower than $P_{t-1}$. Therefore, in period (t + 1) farmers decide to produce less than they did in period t. Obviously, in this situation the disturbances $u_t$ are not expected to be random, for if the farmers overproduce in year t, they are likely to underproduce in year (t + 1), etc., leading to a cobweb pattern.

Data Manipulation

In empirical analysis the raw data are often “massaged” in a process referred to as data manipulation. For example, in time series regressions involving quarterly data, such data are often derived from the monthly data by simply adding three monthly observations and dividing the sum by 3. This averaging introduces “smoothness” into the data by dampening the fluctuations in the monthly data. Therefore, the graph plotting the quarterly data looks much smoother than the monthly data, and this smoothness can itself lead to a systematic pattern in the disturbances, thereby inducing autocorrelation.3

3It should be pointed out that sometimes the averaging or other data-editing procedures are used because the weekly or monthly data can be subject to substantial measurement errors. The averaging process, therefore, can produce more accurate estimates. But the unfortunate byproduct of this process is that it can induce autocorrelation.

Before moving on, note that autocorrelation can be positive as well as negative, although economic time series generally exhibit positive autocorrelation because most of them either move upward or downward over extended time periods (possibly due to business cycles) and do not exhibit a constant up-and-down movement, such as that shown in Figure 10-2(b).

FIGURE 10-2 (a) Positive autocorrelation; (b) negative autocorrelation. [Each panel plots $u_t$, $e_t$ against Time and against $u_{t-1}$, $e_{t-1}$.]
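The difference between extended runs (positive autocorrelation) and a constant up-and-down movement (negative autocorrelation) can be made concrete by counting how often a simulated series changes sign. This is only a sketch: the generating rule, the coefficients 0.8 and −0.8, the seed, and the names `ar1_series` and `count_sign_changes` are assumptions chosen for illustration.

```python
import random


def ar1_series(n, rho, seed):
    """Generate u_t = rho * u_{t-1} + v_t with standard normal v_t."""
    rng = random.Random(seed)
    series = []
    previous = 0.0
    for _ in range(n):
        previous = rho * previous + rng.gauss(0.0, 1.0)
        series.append(previous)
    return series


def count_sign_changes(u):
    """Count how many times successive values have opposite signs."""
    return sum(1 for a, b in zip(u, u[1:]) if a * b < 0)


positive = ar1_series(1000, rho=0.8, seed=42)    # long runs above or below zero
negative = ar1_series(1000, rho=-0.8, seed=42)   # frequent flips across zero

changes_pos = count_sign_changes(positive)
changes_neg = count_sign_changes(negative)
print("sign changes with rho = +0.8:", changes_pos)
print("sign changes with rho = -0.8:", changes_neg)
```

A positively autocorrelated series should cross zero relatively rarely, while a negatively autocorrelated one should cross it in most periods.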

10.2 CONSEQUENCES OF AUTOCORRELATION

Suppose the error terms exhibit one of the patterns shown in Figures 10-1(a) to (d) or Figure 10-2. What then? In other words, what are the consequences of relaxing assumption (10.1) for the OLS methodology? These consequences are as follows.4

1. The least squares estimators are still linear and unbiased.

2. But they are not efficient; that is, they do not have minimum variance compared to the procedures that take into account autocorrelation. In short, the usual ordinary least squares (OLS) estimators are not best linear unbiased estimators (BLUE).

3. The estimated variances of OLS estimators are biased. Sometimes the usual formulas to compute the variances and standard errors of OLS estimators seriously underestimate true variances and standard errors, thereby inflating t values. This gives the appearance that a particular coefficient is statistically significantly different from zero, whereas in fact that might not be the case.

4. Therefore, the usual t and F tests are not generally reliable.

5. The usual formula to compute the error variance, namely, $\hat{\sigma}^2$ = RSS/d.f. (residual sum of squares/degrees of freedom), is a biased estimator of the true $\sigma^2$ and in some cases it is likely to underestimate the latter.

6. As a consequence, the conventionally computed $R^2$ may be an unreliable measure of true $R^2$.

7. The conventionally computed variances and standard errors of forecast may also be inefficient.

4The proofs can be found in Gujarati and Porter, Basic Econometrics, 5th ed., McGraw-Hill, New York, 2009, Chapter 12.

As you can see, these consequences are similar to those of heteroscedasticity, and just as serious in practice. Therefore, as with heteroscedasticity, we must find out if we have the autocorrelation problem in any given application.

10.3 DETECTING AUTOCORRELATION

When it comes to detecting autocorrelation, we face the same dilemma as in the case of heteroscedasticity. There, we did not know the true error variance $\sigma_i^2$ because the true $u_i$ are unobservable. Here, too, not only do we not know what the true $u_t$ are, but if they are correlated, we do not know what the true mechanism is that has generated them in a concrete situation. We only have their proxies, the $e_t$’s. Therefore, as with heteroscedasticity, we have to rely on the $e_t$’s obtained from the standard OLS procedure to learn something about the presence, or lack thereof, of autocorrelation. With this caveat, we will now consider several diagnostic tests of autocorrelation, which we will illustrate with an example.

Example 10.1. Relationship between Real Wages and Productivity, U.S. Business Sector, 1959–2006

From basic macroeconomics, one would expect a positive relationship between real wages and (labor) productivity: ceteris paribus, the higher the level of labor productivity, the higher the real wages. To shed some light on this, we explore the data in Table 10-1 (on the textbook’s Web site), which contains data on real wages (real compensation per hour) and labor productivity (output per hour of all persons) for the business sector of the U.S. economy for the time period 1959 to 2006. (Recall that these data were also presented in Table 3-3, in our concluding example in Chapter 3.)

FIGURE 10-3 Residuals from the regression (10.4). [Vertical axis: Residuals; horizontal axis: Time.]

Regressing real wages on productivity, we obtain the following results; for discussion purposes we will call this the wages-productivity regression.

$$\begin{aligned} \text{Realwages}_i &= 33.6360 + 0.6614\,\text{Productivity}_i \\ \text{se} &= (1.4001)\quad (0.0156) \\ t &= (24.0243)\quad (42.2928) \\ r^2 &= 0.9749; \quad d = 0.1463 \end{aligned} \tag{10.4}$$

Note: d refers to the Durbin-Watson statistic that is discussed below.

Judged by the usual criteria, these results look good. As expected, there is a positive relationship between real wages and productivity. The estimated t ratios are high and so is the $r^2$ value. Before we accept these results at face value, we must guard against the possibility of autocorrelation, for in its presence, as we know, the results may not be reliable.
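Why high t ratios can be misleading under autocorrelation may be seen in a small Monte Carlo experiment. The following is a sketch under stated assumptions rather than a reproduction of anything in the text: the trending regressor, the true coefficients (1.0 and 0.5), the AR(1) error mechanism with coefficient 0.8, the number of replications, and the function names are all hypothetical choices. It compares the average conventionally computed standard error of the slope with the actual spread of the slope estimates across replications.

```python
import math
import random
import statistics


def ols_slope_and_se(x, y):
    """OLS slope and its conventional standard error for y = b1 + b2*x + u."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = mean_y - b2 * mean_x
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = rss / (n - 2)  # the usual RSS/d.f. formula
    return b2, math.sqrt(sigma2_hat / sxx)


def monte_carlo(rho, n=48, reps=1000, seed=12345):
    """Return (mean slope, std. dev. of slopes, mean conventional se)."""
    rng = random.Random(seed)
    x = [float(t) for t in range(1, n + 1)]  # assumed trending regressor
    slopes, ses = [], []
    for _ in range(reps):
        # start the error process from its stationary distribution
        u = rng.gauss(0.0, 1.0) / math.sqrt(1.0 - rho ** 2)
        y = []
        for xt in x:
            u = rho * u + rng.gauss(0.0, 1.0)
            y.append(1.0 + 0.5 * xt + u)
        b2, se = ols_slope_and_se(x, y)
        slopes.append(b2)
        ses.append(se)
    return statistics.mean(slopes), statistics.stdev(slopes), statistics.mean(ses)


results = {rho: monte_carlo(rho) for rho in (0.0, 0.8)}
for rho, (mean_b2, sd_b2, mean_se) in results.items():
    print(f"rho={rho}: mean slope={mean_b2:.4f}, "
          f"actual sd={sd_b2:.4f}, mean conventional se={mean_se:.4f}")
```

In both cases the average slope should be close to the assumed true value of 0.5 (unbiasedness). With independent errors the conventional standard error should roughly match the actual spread; with positively autocorrelated errors it should fall well short of it, which is what inflates the t ratios.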

To test for autocorrelation, we consider three methods: (1) the graphical method, which is comparatively simple, (2) the celebrated Durbin-Watson d statistic, and (3) the runs test, which is discussed in Appendix 10A.

The Graphical Method

As in the case of heteroscedasticity, a simple visual examination of OLS residuals, e’s, can give valuable insight about the likely presence of autocorrelation among the error terms, the u’s. Now there are various ways of examining the residuals. We can plot them against time, as shown in Figure 10-3, which depicts the residuals obtained from regression (10.4) and shown in Table 10-2. Incidentally, such a plot is called a time-sequence plot.

An examination of Figure 10-3 shows that the residuals, $e_t$’s, do not seem to be randomly distributed, unlike those in Figure 10-1(e). As a matter of fact, they exhibit a distinct behavior. Initially they are negative, then they become positive, then negative, then positive, and then negative. This can be seen more vividly if we plot $e_t$ given in column 1 of Table 10-2 against $e_{t-1}$ given in column 2, as in Figure 10-4.

TABLE 10-2 RESIDUALS AND RELATED DATA FROM THE WAGES-PRODUCTIVITY REGRESSION

e_t        e_{t-1}    D = e_t - e_{t-1}    D^2       e_t^2      Sign of e
-5.5315    n/a        n/a                  n/a       30.5980    -
-4.6395    -5.5315    0.8920               0.7958    21.5250    -
-4.0293    -4.6395    0.6102               0.3724    16.2351    -
-3.4225    -4.0293    0.6068               0.3682    11.7136    -
-3.3494    -3.4225    0.0731               0.0053    11.2184    -
-2.9543    -3.3494    0.3950               0.1561    8.7282     -
-2.8642    -2.9543    0.0902               0.0081    8.2036     -
-1.8191    -2.8642    1.0451               1.0923    3.3090     -
-0.8831    -1.8191    0.9360               0.8761    0.7798     -
0.4787     -0.8831    1.3618               1.8545    0.2292     +
1.3827     0.4787     0.9040               0.8172    1.9119     +
1.9721     1.3827     0.5894               0.3474    3.8894     +
1.6004     1.9721     -0.3717              0.1382    2.5613     +
2.5687     1.6004     0.9683               0.9376    6.5982     +
2.8694     2.5687     0.3007               0.0904    8.2332     +
2.5580     2.8694     -0.3114              0.0969    6.5434     +
1.7362     2.5580     -0.8218              0.6753    3.0145     +
2.4849     1.7362     0.7486               0.5604    6.1745     +
2.8054     2.4849     0.3205               0.1027    7.8701     +
3.6342     2.8054     0.8289               0.6870    13.2076    +
3.7711     3.6342     0.1369               0.0187    14.2215    +
3.6020     3.7711     -0.1691              0.0286    12.9744    +
2.5788     3.6020     -1.0232              1.0469    6.6504     +
3.9875     2.5788     1.4087               1.9845    15.9005    +
2.0544     3.9875     -1.9331              3.7369    4.2207     +
0.7117     2.0544     -1.3428              1.8030    0.5065     +
0.6417     0.7117     -0.0700              0.0049    0.4118     +
1.9193     0.6417     1.2776               1.6323    3.6837     +
1.9530     1.9193     0.0337               0.0011    3.8143     +
2.3649     1.9530     0.4118               0.1696    5.5926     +
0.2462     2.3649     -2.1186              4.4886    0.0606     +
0.1526     0.2462     -0.0937              0.0088    0.0233     +
0.3945     0.1526     0.2419               0.0585    0.1556     +
0.2196     0.3945     -0.1749              0.0306    0.0482     +
-0.3238    0.2196     -0.5433              0.2952    0.1048     -
-1.6487    -0.3238    -1.3250              1.7555    2.7183     -
-2.0793    -1.6487    -0.4306              0.1854    4.3235     -
-3.2736    -2.0793    -1.1943              1.4265    10.7168    -
-3.5533    -3.2736    -0.2796              0.0782    12.6258    -
-0.8740    -3.5533    2.6793               7.1787    0.7638     -
-0.2214    -0.8740    0.6525               0.4258    0.0490     -
1.5511     -0.2214    1.7725               3.1418    2.4058     +
1.1339     1.5511     -0.4172              0.1740    1.2857     +
0.0733     1.1339     -1.0606              1.1248    0.0054     +
-1.0582    0.0733     -1.1315              1.2803    1.1198     -
-2.2556    -1.0582    -1.1974              1.4338    5.0878     -
-3.2529    -2.2556    -0.9973              0.9945    10.5812    -
-3.4127    -3.2529    -0.1598              0.0255    11.6462    -

The general tenor of this figure is that successive residuals are positively correlated, suggesting positive autocorrelation; most residuals are bunched in the first (northeast) and the third (southwest) quadrants.

The Durbin-Watson d Test5

The most celebrated test for detecting autocorrelation is that developed by Durbin and Watson, popularly known as the Durbin-Watson d statistic, which is defined as

$$d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} \tag{10.5}$$

which is simply the ratio of the sum of squared differences in successive residuals to the RSS. Note that the sum in the numerator of the d statistic has only (n − 1) terms because one observation is lost in taking successive differences.

5J. Durbin and G. S. Watson, “Testing for Serial Correlation in Least-Squares Regression,” Biometrika, vol. 38, 1951, pp. 159–177.

FIGURE 10-4 Residuals $e_t$ against $e_{t-1}$ from the regression (10.4). [Vertical axis: $e_t$; horizontal axis: $e_{t-1}$.]


A great advantage of the d statistic is its simplicity; it is based on the OLS residuals, which are routinely computed by most regression packages. It is now common practice to report the Durbin-Watson d along with summary statistics such as $R^2$, adjusted $R^2$ ($\bar{R}^2$), t, F ratios, etc. (see Equation [10.4]).

For our illustrative regression, we can easily compute the d statistic from the data given in Table 10-2. First, subtract the lagged e’s given in column 2 of that table from the e’s given in column 1, square the difference, sum it, and divide the sum by the sum of squared e’s given in column 5. The necessary raw data to compute d are presented in Table 10-2. Of course, this is now routinely done by the computer. For our example, the computed d value is 0.1463 (verify this).
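The verification suggested above can be sketched in a few lines. The residuals below are transcribed from column 1 of Table 10-2; the function name `durbin_watson` and the variable names are choices made for this example. Because the tabulated residuals are rounded to four decimals, the computed value should agree with the reported 0.1463 only up to rounding.

```python
# OLS residuals e_t from column 1 of Table 10-2 (48 observations, 1959-2006)
residuals = [
    -5.5315, -4.6395, -4.0293, -3.4225, -3.3494, -2.9543, -2.8642, -1.8191,
    -0.8831, 0.4787, 1.3827, 1.9721, 1.6004, 2.5687, 2.8694, 2.5580,
    1.7362, 2.4849, 2.8054, 3.6342, 3.7711, 3.6020, 2.5788, 3.9875,
    2.0544, 0.7117, 0.6417, 1.9193, 1.9530, 2.3649, 0.2462, 0.1526,
    0.3945, 0.2196, -0.3238, -1.6487, -2.0793, -3.2736, -3.5533, -0.8740,
    -0.2214, 1.5511, 1.1339, 0.0733, -1.0582, -2.2556, -3.2529, -3.4127,
]


def durbin_watson(e):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    denominator = sum(et ** 2 for et in e)
    return numerator / denominator


d = durbin_watson(residuals)
print("Durbin-Watson d:", round(d, 4))
```

The numerator runs over 47 successive differences and the denominator over all 48 squared residuals, exactly as described in the paragraph above.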

Before proceeding to show how the computed d value can be used to determine the presence, or otherwise, of autocorrelation, it is very important to note the assumptions underlying the d statistic:

1. The regression model includes an intercept term. Therefore, the d test cannot be used to determine autocorrelation in models of regression through the origin.6

2. The X variables are nonstochastic; that is, their values are fixed in repeated sampling.

3. The disturbances $u_t$ are generated by the following mechanism:

$$u_t = \rho u_{t-1} + v_t \qquad -1 \le \rho \le 1 \tag{10.6}$$

which states that the value of the disturbance, or error, term at time t depends on its value in time period (t − 1) and a purely random term ($v_t$). The extent of the dependence on the past value is measured by $\rho$ (rho). This is called the coefficient of autocorrelation, which lies between −1 and 1. (Note: A correlation coefficient always lies between −1 and 1.) The mechanism, Equation (10.6), is known as the Markov first-order autoregressive scheme or simply the first-order autoregressive scheme, usually denoted as the AR(1) scheme. The name autoregression is appropriate because Eq. (10.6) can be interpreted as the regression of $u_t$ on itself lagged one period. And this is first order because $u_t$ and its immediate past value are involved; that is, the maximum lag is one time period.7

6However, R. W. Farebrother has calculated d values when the intercept is absent from the model. See his article “The Durbin-Watson Test for Serial Correlation When There Is No Intercept in the Regression,” Econometrica, vol. 48, 1980, pp. 1553–1563.

7If the model were

$$u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + v_t$$

it would be an AR(2) or second-order autoregressive scheme, etc. We note here that unless we are willing to assume some scheme by which the u’s are generated, it is difficult to solve the problem of autocorrelation. This situation is similar to heteroscedasticity in which we also made some assumption about how the unobservable error variance $\sigma_i^2$ is generated. For autocorrelation, in practice, the AR(1) assumption has proven to be quite useful.

4. The regression does not contain the lagged value(s) of the dependent variable as one of the explanatory variables. In other words, the test is not applicable to models such as

$$Y_t = B_1 + B_2 X_t + B_3 Y_{t-1} + u_t \tag{10.7}$$

where $Y_{t-1}$ is the one-period lagged value of the dependent variable Y. Models like regression (10.7) are known as autoregressive models, a regression of a variable on itself with a lag as one of the explanatory variables.

Assuming all these conditions are fulfilled, what can we say about autocorrelation in our wages-productivity regression with a d value of 0.1463? Before answering this question, we can show that for a large sample size Eq. (10.5) can be approximately expressed as (see Problem 10.19)

$$d \approx 2(1 - \hat{\rho}) \tag{10.8}$$

where $\approx$ means approximately and where

$$\hat{\rho} = \frac{\sum_{t=2}^{n} e_t e_{t-1}}{\sum_{t=1}^{n} e_t^2} \tag{10.9}$$

which is an estimator of the coefficient of autocorrelation $\rho$ of the AR(1) scheme given in Equation (10.6). But since $-1 \le \hat{\rho} \le 1$, Equation (10.8) implies the following:

Value of $\hat{\rho}$                                  Value of d (approx.)
1. $\hat{\rho}$ = −1 (perfect negative correlation)    d = 4
2. $\hat{\rho}$ = 0 (no autocorrelation)               d = 2
3. $\hat{\rho}$ = 1 (perfect positive correlation)     d = 0

In short,

$$0 \le d \le 4 \tag{10.10}$$

that is, the computed d value must lie between 0 and 4.

From the preceding discussion we can state that if a computed d value is closer to zero, there is evidence of positive autocorrelation, but if it is closer to 4, there is evidence of negative autocorrelation. And the closer the d value is to 2, the more the evidence is in favor of no autocorrelation. Of course, these are broad limits and some definite guidelines are needed as to when we can call a computed d value indicative of positive, negative, or no autocorrelation. In other words, is there a “critical” d value, as in the case of the t and F distributions, that will give us some definitive indication of autocorrelation?

FIGURE 10-5 The Durbin-Watson d statistic. [Number line for d from 0 to 4. From 0 to $d_L$: reject H0, evidence of positive autocorrelation. From $d_L$ to $d_U$: zone of indecision. From $d_U$ to 4 − $d_U$ (around 2): accept H0 or H0* or both. From 4 − $d_U$ to 4 − $d_L$: zone of indecision. From 4 − $d_L$ to 4: reject H0*, evidence of negative autocorrelation. Legend: H0: No positive autocorrelation; H0*: No negative autocorrelation.]

Unfortunately, unlike t and F distributions, there is not one but two critical d values.8 Durbin and Watson have provided a lower limit $d_L$ and an upper limit $d_U$, such that if the d value computed from Equation (10.5) lies outside these bounds, a decision can be made regarding the presence of positive or negative serial correlation. These upper and lower limits, or upper and lower critical values, depend upon the number of observations, n, and the number of explanatory variables, k. These limits for n, from 6 to 200 observations, and for k, up to 20 explanatory variables, have been tabulated by Durbin and Watson for 1% and 5% significance levels and are reproduced in Appendix E, Table E-5. The actual mechanics of the Durbin-Watson test are best explained with Figure 10-5.

The steps involved in this test are as follows:

1. Run the OLS regression and obtain the residuals $e_t$.

2. Compute d from Eq. (10.5). (Most computer programs now do this routinely.)

3. Find out the critical $d_L$ and $d_U$ from the Durbin-Watson tables for the given sample size and the given number of explanatory variables.

4. Now follow the decision rules given in Table 10-3, which for ease of reference are also depicted in Figure 10-5.

Returning to Example 10.1, we have d = 0.1463. From the Durbin-Watson tables we see that for n = 50 (which is closest to our sample size of 48) and one explanatory variable, dL = 1.503 and dU = 1.585 at the 5% level of significance.

8Without going into technicalities, it should be mentioned that the exact critical value of d depends upon the value(s) taken by the explanatory variable(s), which will obviously vary from sample to sample.


Since the computed d of 0.1463 is well below the lower bound value of 1.503, following the decision rules given in Table 10-3, we conclude that there is positive autocorrelation in our wages-productivity regression residuals. We reached the same conclusion on the basis of visual inspection of the residuals given in Figures 10-3 and 10-4.
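The decision rules applied in this example can be written out as a small function. This is a minimal sketch: the function name `durbin_watson_decision` and the wording of the returned messages are assumptions made for illustration, and the critical values must still be looked up in the Durbin-Watson tables for the given n and k.

```python
def durbin_watson_decision(d, d_lower, d_upper):
    """Apply the Durbin-Watson decision rules for a computed d and bounds (dL, dU)."""
    if not 0.0 <= d <= 4.0:
        raise ValueError("the d statistic must lie between 0 and 4")
    if d < d_lower:
        return "reject H0: evidence of positive autocorrelation"
    if d <= d_upper:
        return "no decision (zone of indecision, positive side)"
    if d < 4.0 - d_upper:
        return "do not reject: no evidence of positive or negative autocorrelation"
    if d <= 4.0 - d_lower:
        return "no decision (zone of indecision, negative side)"
    return "reject H0*: evidence of negative autocorrelation"


# Example 10.1: d = 0.1463 with dL = 1.503 and dU = 1.585 (n = 50, one regressor, 5%)
print(durbin_watson_decision(0.1463, 1.503, 1.585))
```

Passing other values of d with the same bounds shows how the conclusion moves through the zones as d increases from 0 toward 4.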

Although the d test is popularly used, one drawback is that if the computed d falls in the indecisive zone, or region of ignorance (see Figure 10-5), we cannot conclude whether or not autocorrelation exists. To solve this problem, several authors9 have proposed modifications of the d test but they are involved and beyond the scope of this book. The computer program SHAZAM performs an exact d test (i.e., true critical value), and if you have access to the program you may want to use that test if the d statistic lies in the indecisive zone. Since the consequences of autocorrelation can be quite serious, as we have seen, if a d statistic lies in the indecisive zone, it might be prudent to assume that autocorrelation exists and proceed to correct the condition. Of course, the nonparametric runs test (discussed in Appendix 10A) and the visual graphics should also be invoked in this case.

To conclude our discussion of the d test, it should be reemphasized that this test should not be applied if the assumptions underlying it, discussed earlier, do not hold. In particular, it should not be used to test for serial correlation in autoregressive models like the regression (10.7). If applied mistakenly in such cases, the computed d value is often found to be around 2, which is the value of d expected in the absence of AR(1). Hence, there is a built-in bias against discovering serial correlation in such models. To test for autocorrelation when such a model is used in empirical analysis, Durbin has developed the so-called h statistic, which is discussed in Problem 10.16.

Before we move on, note that there are several other methods of detecting autocorrelation. We will discuss two such methods, the runs test and the Breusch-Godfrey test, in Appendixes 10A and 10B, respectively.10

TABLE 10-3 DURBIN-WATSON d TEST: DECISION RULES

Null hypothesis                            Decision         If
No positive autocorrelation                Reject           0 < d < dL
No positive autocorrelation                No decision      dL ≤ d ≤ dU
No negative autocorrelation                Reject           4 − dL < d < 4
No negative autocorrelation                No decision      4 − dU ≤ d ≤ 4 − dL
No positive or negative autocorrelation    Do not reject    dU < d < 4 − dU

9Some authors maintain that dU, the upper limit of Durbin-Watson d, is approximately the true significance limit. Therefore, if the calculated d lies below dU, we can assume that there is (positive) autocorrelation. See, for example, E. J. Hannan and R. D. Terrell, “Testing for Serial Correlation after Least Squares Regression,” Econometrica, vol. 36, no. 2, 1968, pp. 133–150.

10For further details, see Gujarati and Porter, Basic Econometrics, 5th ed., McGraw-Hill, New York, 2009, Chapter 11.


10.4 REMEDIAL MEASURES

Since the consequences of serial correlation can be very serious and the cost of further testing can be high, if one or more of the diagnostic tests discussed earlier indicate that we have autocorrelation, we need to seek remedial measures. The remedy, however, depends upon what knowledge we have or can assume about the nature of interdependence in the error terms $u_t$. To keep the discussion as simple as possible, let us revert to our two-variable model:

$$Y_t = B_1 + B_2 X_t + u_t \tag{10.11}$$

and assume that the error terms follow the AR(1) scheme:

$$u_t = \rho u_{t-1} + v_t, \qquad -1 \le \rho \le 1 \tag{10.6}$$

where the v's satisfy the usual OLS assumptions and ρ is known.

Now if somehow we can transform the model (10.11) so that in the transformed model the error term is serially independent, then applying OLS to the transformed model will give us the usual BLUE estimators, assuming of course that the other assumptions of CLRM are fulfilled. Recall that we used the same philosophy in the case of heteroscedasticity, where our objective was to transform the model so that in the transformed model the error term was homoscedastic.

To see how we can transform the regression (10.11) so that in the transformed model the error term does not have autocorrelation, write the regression (10.11) with a one-period lag as

$$Y_{t-1} = B_1 + B_2 X_{t-1} + u_{t-1} \tag{10.12}$$

Multiply regression (10.12) by ρ on both sides to obtain

$$\rho Y_{t-1} = \rho B_1 + \rho B_2 X_{t-1} + \rho u_{t-1} \tag{10.13}$$

Now subtract Equation (10.13) from Equation (10.11), to yield

$$(Y_t - \rho Y_{t-1}) = B_1(1 - \rho) + B_2(X_t - \rho X_{t-1}) + v_t \tag{10.14}$$

where use is made of Eq. (10.6).

Since the error term $v_t$ in Equation (10.14) satisfies the standard OLS assumptions, Eq. (10.14) provides the kind of transformation we are looking for, one which gives us a model free from serial correlation. If we write Eq. (10.14) as

$$Y_t^* = B_1^* + B_2 X_t^* + v_t \tag{10.15}$$

where

$$Y_t^* = (Y_t - \rho Y_{t-1})$$

$$X_t^* = (X_t - \rho X_{t-1})$$

$$B_1^* = B_1(1 - \rho)$$


11A technical point may be noted here: since $B_1^* = B_1(1 - \rho)$, the original intercept has to be recovered as $B_1 = B_1^*/(1 - \rho)$, and so we may not get an unbiased estimate of the original intercept term. But as noted on several occasions, in most applications the intercept term may not have any concrete economic meaning.

and apply OLS to the transformed variables Y* and X*, the estimators thus obtained will have the desirable BLUE property.11 Incidentally, note that when we apply OLS to transformed models, the estimators thus obtained are called generalized least squares (GLS) estimators. In the previous chapter on heteroscedasticity we also used GLS, except that there we called it WLS (weighted least squares).
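The transformation in Eqs. (10.14) and (10.15) can be sketched in a few lines of Python. This is only an illustration under the assumption that ρ is known; the function names `ols_simple`, `generalized_difference`, and `gls_known_rho` are hypothetical helpers, not routines from any regression package.

```python
def ols_simple(x, y):
    """OLS for the two-variable model y = b1 + b2*x; returns (b1, b2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    return b1, b2


def generalized_difference(series, rho):
    """Return z_t - rho * z_{t-1} for t = 2..n (the first observation is lost)."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]


def gls_known_rho(x, y, rho):
    """Apply OLS to Eq. (10.15) and map the intercept back to B1.

    Assumes rho is known and differs from 1, since B1 = B1*/(1 - rho).
    """
    x_star = generalized_difference(x, rho)
    y_star = generalized_difference(y, rho)
    b1_star, b2 = ols_simple(x_star, y_star)
    return b1_star / (1.0 - rho), b2


# Noise-free illustration: Y = 2 + 3X is recovered for any admissible rho.
x = [1, 2, 4, 7, 11]
y = [2 + 3 * xi for xi in x]
print(gls_known_rho(x, y, 0.5))
```

With noise-free data the transformed regression returns the original intercept and slope, which is a quick way to confirm that the differencing and the intercept rescaling are consistent with each other.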

We call Equations (10.14) and (10.15) generalized difference equations; specific cases of the generalized difference equation in which ρ takes a particular value will be discussed shortly. The generalized difference equation involves regressing Y on X, not in the original form, but in the difference form, which is obtained by subtracting a portion (= ρ) of the value of a variable in the previous period from its value in the current time period. Thus, if ρ = 0.5, we subtract 0.5 times the value of the variable in the previous time period from its value in the current time period. In this differencing procedure we lose one observation because the first sample observation has no antecedent. To avoid this loss of one observation, the first observation of Y and X is transformed as follows:

$$Y_1^* = \sqrt{1 - \rho^2}\,(Y_1), \qquad X_1^* = \sqrt{1 - \rho^2}\,(X_1) \tag{10.16}$$

This transformation is known as the Prais-Winsten transformation. In practice, though, if the sample size is very large, this transformation is not generally made and we use Eq. (10.14) with (n − 1) observations. However, in small samples the results are sometimes sensitive to the exclusion of the first observation.

A couple of points about the generalized difference transformation Eq. (10.14) should be made here. First, although we have considered only a two-variable model, the transformation can be generalized to more than one explanatory variable (see Problem 10.18). Second, so far we have assumed only an AR(1) scheme, as in Eq. (10.6). But the transformation can be generalized easily to higher-order schemes, such as an AR(2), AR(3), etc.; no new principle is involved in the transformation except some tedious algebra.

It seems that we have a "solution" to the autocorrelation problem in the generalized difference equation (10.14). Alas, we have a problem. For the successful application of the scheme, we must know the true autocorrelation parameter, ρ. Of course, we do not know it, and to use Eq. (10.14), we must find ways to estimate the unknown ρ. The situation here is similar to that in the case of heteroscedasticity. There, we did not know the true $\sigma_i^2$ and therefore had to make some plausible assumptions as to what it might be. Of course, had we known it, we could have used weighted least squares (WLS) straightforwardly.
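The Prais-Winsten treatment of the first observation in Eq. (10.16) can be sketched as follows, again assuming that ρ is known. The helper names are hypothetical. One detail that the text does not spell out, and that is added here as a standard result, is that the column of ones belonging to the intercept is transformed in the same way, so the transformed regression is run through the origin on two regressors.

```python
import math


def prais_winsten_series(series, rho):
    """Transform one series while keeping all n observations.

    Observation 1 follows Eq. (10.16); observations 2..n follow Eq. (10.14).
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    first = math.sqrt(1.0 - rho ** 2) * series[0]
    rest = [series[t] - rho * series[t - 1] for t in range(1, len(series))]
    return [first] + rest


def prais_winsten(y, x, rho):
    """Return (y_star, x_star, const_star).

    const_star is the transformed column of ones: sqrt(1 - rho^2) for the
    first observation and (1 - rho) afterwards. In a regression through the
    origin of y_star on const_star and x_star, its coefficient estimates B1.
    """
    ones = [1.0] * len(y)
    return (prais_winsten_series(y, rho),
            prais_winsten_series(x, rho),
            prais_winsten_series(ones, rho))


y_star, x_star, const_star = prais_winsten([4.0, 5.0, 7.0], [1.0, 2.0, 3.0], 0.6)
print(y_star, x_star, const_star)
```

Because the first element of `const_star` differs from the rest, a package's ordinary intercept option should not be used on the Prais-Winsten data; the transformed constant has to enter as an explicit regressor.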

10.5 HOW TO ESTIMATE ρ

There is no unique method of estimating ρ; rather, there are several approaches, some of which we present now.

ρ = 1: The First Difference Method

Since ρ lies between 0 and ±1, we can assume any value for ρ in the range −1 to 1 and use the generalized difference equation (10.14). As a matter of fact, Hildreth and Lu12 proposed such a scheme. But which particular value of ρ? For even within the confines of the −1 to +1 range literally hundreds of values of ρ can be chosen. In applied econometrics one assumption that has been used extensively is that ρ = 1; that is, the error terms are perfectly positively autocorrelated, which may be true of some economic time series. If this assumption is acceptable, the generalized difference equation (10.14) reduces to the first difference equation as

$$Y_t - Y_{t-1} = B_2(X_t - X_{t-1}) + v_t$$

or

$$\Delta Y_t = B_2 \Delta X_t + v_t \tag{10.17}$$

where Δ, called delta, is the first difference operator and is a symbol or operator (like the operator E for expected value) for successive differences of two values. In estimating Equation (10.17) all we have to do is to form the first differences of both the dependent and explanatory variable(s) and run the regression on the variable(s) thus transformed.

Note an important feature of the first difference model (10.17): The model has no intercept. Hence, to estimate Eq. (10.17), we have to use the regression-through-the-origin routine in the computer package. Naturally, we will not be able to estimate the intercept term in this case directly. (But note that $b_1 = \bar{Y} - b_2\bar{X}$.)

ρ Estimated from Durbin-Watson d Statistic

Recall that we earlier established the following approximate relationship between the d statistic and ρ:

$$d \approx 2(1 - \hat{\rho}) \tag{10.8}$$

from which we can obtain

$$\hat{\rho} \approx 1 - \frac{d}{2} \tag{10.18}$$

Since the d statistic is now routinely computed by most regression packages, we can easily obtain an approximate estimate of ρ from Equation (10.18).

12G. Hildreth and J. Y. Lu, “Demand Relations with Autocorrelated Disturbances,” Michigan State University, Agricultural Experiment Station, Technical Bulletin 276, November 1960.
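The two approaches just described lend themselves to a short sketch. The first function estimates the slope of the first difference regression (10.17) through the origin; the other two compute the Durbin-Watson d from a list of residuals and convert it to an approximate estimate of ρ by Eq. (10.18). The function names are hypothetical, and the d formula used is the standard definition, which this section refers to but does not restate.

```python
def first_difference_slope(x, y):
    """Slope of the regression through the origin of dY on dX, Eq. (10.17)."""
    dx = [x[t] - x[t - 1] for t in range(1, len(x))]
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)


def durbin_watson(residuals):
    """d = sum of squared successive differences over sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def rho_from_d(d):
    """Eq. (10.18): rho-hat is approximately 1 - d/2."""
    return 1.0 - d / 2.0


# Differencing removes the intercept, so only the slope is estimated.
x = [1, 2, 4, 7]
y = [5 + 2 * xi for xi in x]
print(first_difference_slope(x, y))

# A d value close to 0 maps to a rho-hat close to 1.
print(rho_from_d(0.1463))
```

Residuals that alternate in sign give a d well above 2 and therefore a negative estimate of ρ, while a d near 0 gives an estimate near 1.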


Once ρ is estimated from d as shown in Eq. (10.18), we can then use it to run the generalized difference equation (10.14). For the wages-productivity example, d = 0.1463. Therefore,

$$\hat{\rho} \approx 1 - \frac{0.1463}{2} = 0.9268 \tag{10.19}$$

This value is obviously different from ρ = 1 assumed for the first difference transformation. We can use this value to transform the data as in Eq. (10.14).

This method of transformation is easy to use and generally gives good estimates of ρ if the sample size is reasonably large. For small samples, another estimate of ρ based on d is suggested by Theil and Nagar, which is discussed in Problem 10.20.

ρ Estimated from OLS Residuals, et

Recall the first-order autoregressive scheme

$$u_t = \rho u_{t-1} + v_t \tag{10.6}$$

Since the u's are not directly observable, we can use their sample counterparts, the e's, and run the following regression:

$$e_t = \hat{\rho} e_{t-1} + v_t \tag{10.20}$$

where $\hat{\rho}$ is an estimator of ρ. Statistical theory shows that although in small samples $\hat{\rho}$ is a biased estimator of true ρ, as the sample size increases the bias tends to disappear.13 Hence, if the sample size is reasonably large, we can use the $\hat{\rho}$ obtained from Equation (10.20) to transform the data as shown in Eq. (10.14). An advantage of Eq. (10.20) is its simplicity, for we use the usual OLS method to obtain the residuals. The necessary data to run the regression are given in Table 10-2, and the results of the regression (10.20) are as follows:

$$\hat{e}_t = 0.8915\,e_{t-1}, \qquad se = (0.0552), \qquad r^2 = 0.8499 \tag{10.21}$$

Thus, the estimated ρ is about 0.89. (See Table 10-4.)

Other Methods of Estimating ρ

Besides the methods discussed previously, there are other ways of estimating ρ, which are as follows:

1. The Cochrane-Orcutt iterative procedure.
2. The Cochrane-Orcutt two-step method.
3. The Durbin two-step method.


13Technically, we say that $\hat{\rho}$ is a consistent estimator of ρ.


TABLE 10-4 WAGES-PRODUCTIVITY REGRESSION: ORIGINAL AND TRANSFORMED DATA ($\hat{\rho}$ = 0.8915)

RWAGES       RWAGES(−1)   RLAGY        YDIF         PRODUCT      PRODUCT(−1)  RLAGX        XDIF
59.8710      —            —            —            48.0260      —            —            —
61.3180      59.8710      53.3750      7.9430       48.8650      48.0260      42.8152      6.0498
63.0540      61.3180      54.6650      8.3890       50.5670      48.8650      43.5631      7.0039
65.1920      63.0540      56.2126      8.9794       52.8820      50.5670      45.0805      7.8015
66.6330      65.1920      58.1187      8.5143       54.9500      52.8820      47.1443      7.8057
68.2570      66.6330      59.4033      8.8537       56.8080      54.9500      48.9879      7.8201
69.6760      68.2570      60.8511      8.8249       58.8170      56.8080      50.6443      8.1727
72.3000      69.6760      62.1162      10.1838      61.2040      58.8170      52.4354      8.7686
74.1210      72.3000      64.4555      9.6656       62.5420      61.2040      54.5634      7.9786
76.8950      74.1210      66.0789      10.8161      64.6770      62.5420      55.7562      8.9208
78.0080      76.8950      68.5519      9.4561       64.9930      64.6770      57.6595      7.3335
79.4520      78.0080      69.5441      9.9079       66.2850      64.9930      57.9413      8.3437
80.8860      79.4520      70.8315      10.0545      69.0150      66.2850      59.0931      9.9219
83.3280      80.8860      72.1099      11.2181      71.2430      69.0150      61.5269      9.7161
85.0620      83.3280      74.2869      10.7751      73.4100      71.2430      63.5131      9.8969
83.9880      85.0620      75.8328      8.1552       72.2570      73.4100      65.4450      6.8120
84.8430      83.9880      74.8753      9.9677       74.7920      72.2570      64.4171      10.3749
87.1480      84.8430      75.6375      11.5105      77.1450      74.7920      66.6771      10.4679
88.3350      87.1480      77.6924      10.6426      78.4550      77.1450      68.7748      9.6802
89.7360      88.3350      78.7507      10.9853      79.3200      78.4550      69.9426      9.3774
89.8630      89.7360      79.9996      9.8634       79.3050      79.3200      70.7138      8.5912
89.5920      89.8630      80.1129      9.4791       79.1510      79.3050      70.7004      8.4506
89.6450      89.5920      79.8713      9.7737       80.7780      79.1510      70.5631      10.2149
90.6370      89.6450      79.9185      10.7185      80.1480      80.7780      72.0136      8.1344
90.5910      90.6370      80.8029      9.7881       83.0010      80.1480      71.4519      11.5491
90.7120      90.5910      80.7619      9.9501       85.2140      83.0010      73.9954      11.2186
91.9100      90.7120      80.8697      11.0403      87.1310      85.2140      75.9683      11.1627
94.8690      91.9100      81.9378      12.9312      89.6730      87.1310      77.6773      11.9957
95.2070      94.8690      84.5757      10.6313      90.1330      89.6730      79.9435      10.1895
96.5270      95.2070      84.8770      11.6500      91.5060      90.1330      80.3536      11.1524
95.0050      96.5270      86.0538      8.9512       92.4080      91.5060      81.5776      10.8304
96.2190      95.0050      84.6970      11.5220      94.3850      92.4080      82.3817      12.0033
97.4650      96.2190      85.7792      11.6858      95.9030      94.3850      84.1442      11.7588
100.0000     97.4650      86.8900      13.1100      100.0000     95.9030      85.4975      14.5025
99.7120      100.0000     89.1500      10.5620      100.3860     100.0000     89.1500      11.2360
99.0240      99.7120      88.8932      10.1308      101.3490     100.3860     89.4941      11.8549
98.6900      99.0240      88.2799      10.4101      101.4950     101.3490     90.3526      11.1424
99.4780      98.6900      87.9821      11.4959      104.4920     101.4950     90.4828      14.0092
100.5120     99.4780      88.6846      11.8274      106.4780     104.4920     93.1546      13.3234
105.1730     100.5120     89.6064      15.5666      109.4740     106.4780     94.9251      14.5489
108.0440     105.1730     93.7617      14.2823      112.8280     109.4740     97.5961      15.2319
111.9920     108.0440     96.3212      15.6708      116.1170     112.8280     100.5862     15.5308
113.5360     111.9920     99.8409      13.6951      119.0820     116.1170     103.5183     15.5637
115.6940     113.5360     101.2173     14.4767      123.9480     119.0820     106.1616     17.7864
117.7090     115.6940     103.1412     14.5678      128.7050     123.9480     110.4996     18.2054
118.9490     117.7090     104.9376     14.0114      132.3900     128.7050     114.7405     17.6495
119.6920     118.9490     106.0430     13.6490      135.0210     132.3900     118.0257     16.9953
120.4470     119.6920     106.7054     13.7416      136.4040     135.0210     120.3712     16.0328

Notes: RWAGES = real wages
RWAGES(−1) = real wages lagged one period
RLAGY = 0.8915 times RWAGES(−1)
YDIF = RWAGES − RLAGY
PRODUCT = productivity
PRODUCT(−1) = productivity lagged one period
RLAGX = 0.8915 times PRODUCT(−1)
XDIF = PRODUCT − RLAGX
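As a check on how the transformed columns of Table 10-4 are built, the following sketch recomputes RLAGY, YDIF, RLAGX, and XDIF for the first three observations of the table. The function name `quasi_difference_columns` is a hypothetical helper, and rounding to four decimals is assumed only to mirror the table's display.

```python
RHO_HAT = 0.8915  # estimate from Eq. (10.21)

# First three observations of RWAGES and PRODUCT from Table 10-4.
rwages = [59.8710, 61.3180, 63.0540]
product = [48.0260, 48.8650, 50.5670]


def quasi_difference_columns(series, rho):
    """Return rows (lagged value, rho * lagged value, difference) for t = 2..n."""
    rows = []
    for t in range(1, len(series)):
        lagged = series[t - 1]
        scaled = rho * lagged
        rows.append((lagged, round(scaled, 4), round(series[t] - scaled, 4)))
    return rows


for row in quasi_difference_columns(rwages, RHO_HAT):
    print("RWAGES(-1), RLAGY, YDIF:", row)
for row in quasi_difference_columns(product, RHO_HAT):
    print("PRODUCT(-1), RLAGX, XDIF:", row)
```

The first observation produces no row, which is the loss of one observation mentioned in the discussion of Eq. (10.14).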


TABLE 10-5 REGRESSION RESULTS OF WAGES AND PRODUCTIVITY BASED ON VARIOUS TRANSFORMATIONS

Method of transformation   ρ estimated from    Intercept   Slope      r²         Autocorrelation?
Original regression        ρ = 0 (assumed)     33.6360     0.6614     0.9749     Yes
                                               (1.4001)    (0.0156)
First difference           ρ = 1               *           0.6469     0.6950**   No!
                                                           (0.0632)
Eq. (10.21)!!              ρ = 0.8915          4.8131†     0.5617     0.8040     No!
                                               (0.4783)    (0.0413)
Eq. (10.21)!!!             ρ = 0.8915          2.9755      0.7421     0.7326     No!
                                               (0.7849)    (0.0661)

Notes: Figures in the parentheses are the estimated standard errors.
*There is no intercept term in this regression. (Why?)
**The various r² values are not directly comparable.
!Based on the runs test on the estimated residuals.
!!Excludes the first observation.
!!!Includes the first observation (i.e., Prais-Winsten transformation).
†The intercept term in the transformed regression is $B_1^* = B_1(1 - \rho)$. The original intercept can be obtained as $B_1 = B_1^*/(1 - \rho)$.

4. The Hildreth-Lu search procedure.
5. The maximum likelihood method.

A discussion of all these methods will take us far afield and thus is left for the references.14 (But see some of the problems at the end of the chapter.) Whichever method is employed, we use the $\hat{\rho}$ obtained from that method to transform our data as shown in Eq. (10.14) and run the usual OLS regression.15
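The estimate-then-transform sequence can be sketched as below, using the residual regression of Eq. (10.20) to obtain the estimate of ρ and then applying Eq. (10.14) without the first observation. This is a minimal illustration, not the routine of any particular package; the function names and the artificial data in the usage lines are assumptions made for the example.

```python
def ols_simple(x, y):
    """OLS for y = b1 + b2*x; returns (b1, b2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2


def rho_from_residuals(e):
    """Slope of the regression through the origin of e_t on e_{t-1}, Eq. (10.20)."""
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    den = sum(e[t - 1] ** 2 for t in range(1, len(e)))
    return num / den


def two_step_gls(x, y):
    """Estimate rho from OLS residuals, then run OLS on Eq. (10.14)."""
    b1, b2 = ols_simple(x, y)
    e = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
    rho_hat = rho_from_residuals(e)
    x_star = [x[t] - rho_hat * x[t - 1] for t in range(1, len(x))]
    y_star = [y[t] - rho_hat * y[t - 1] for t in range(1, len(y))]
    b1_star, b2_gls = ols_simple(x_star, y_star)
    return rho_hat, b1_star / (1.0 - rho_hat), b2_gls


# Artificial data: a linear trend plus a smoothly decaying disturbance.
x = [float(t) for t in range(1, 11)]
y = [1.0 + 2.0 * xi + 0.8 ** i for i, xi in enumerate(x)]
print(two_step_gls(x, y))
```

Repeating the two steps with residuals recomputed from the new coefficient estimates, until the estimate of ρ stops changing, gives an iterative variant of the same idea.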

Although most computer software packages do the transformations with minimum instructions, we show in Table 10-4 how the transformed data will look.

Before concluding, let us consider the results of applying (1) the first difference transformation and (2) the transformation based on Eq. (10.21) to the wages-productivity regression. The results are summarized in Table 10-5 (see also Figures 10-6 and 10-7). Several observations can be made about these results.

1. The original regression was plagued by autocorrelation, but the various transformed regressions seem to be free from autocorrelation on the basis of the runs tests.16

14For a discussion of these methods, see Gujarati and Porter, Basic Econometrics, 5th ed., McGraw-Hill, New York, 2009, Chapter 12.

15In large samples the differences in the estimates of ρ produced by the various methods are generally small.

16We can obtain the Durbin-Watson d statistic for the transformed regressions too. But econometric theory suggests that the computed d statistic from the transformed regressions may not be appropriate to test for autocorrelation in such regressions because if we were to use it for that purpose, it would suggest that the original error term may not follow the AR(1) scheme. It could, for example, follow an AR(2) scheme. The runs test discussed in Appendix 10A does not suffer from this problem since it is a nonparametric test.


FIGURE 10-6 Residual from wages-productivity regression (vertical axis: Residuals; horizontal axis: Year)

FIGURE 10-7 Residuals against lagged residual from the wages-productivity regression (vertical axis: RESIDUAL; horizontal axis: RESIDUAL (−1))


2. Even though the ρ used in the first difference transformation (ρ = 1) and that estimated from Eq. (10.21) are not the same, the estimated slope coefficients do not differ substantially from one another if we do not include the first observation in the analysis. But the estimates of intercept and slope values are substantially different from the original OLS regression.


17Strictly speaking, this statement is correct if the sample size is reasonably large. This is because we do not know the true ρ and estimate it, and when we use the estimate $\hat{\rho}$ to transform the data, econometric theory shows that the usual statistical testing procedure is valid generally in large samples.

18W. K. Newey and K. West, “A Simple Positive Semi-Definite Heteroscedasticity and Autocorrelation Consistent Covariance Matrix,” Econometrica, Vol. 55, 1987, pp. 703–708.

19For details, see Gujarati and Porter, Basic Econometrics, 5th ed., pp. 447–448.

3. The situation changes significantly, however, if we include the first observation via the Prais-Winsten transformation. Now the slope coefficient in the transformed regression is very close to the original OLS slope and the intercept in the transformed model is much closer to the original intercept. As noted, in small samples it is important to include the first observation in the analysis. Otherwise the estimated coefficients in the transformed model will be less efficient (i.e., have higher standard errors) than in the model that includes the first observation.

4. The $r^2$ values reported in the various regressions are not directly comparable because the dependent variables are not the same across the models. Besides, as noted elsewhere, for the first difference model in which there is no intercept term, the conventionally computed $r^2$ is not meaningful.

If we accept the results based upon the Prais-Winsten transformation for our wages-productivity example and compare them with the original regression beset by the autocorrelation problem, we see that the original t ratio of the slope coefficient, in absolute value, has decreased in the transformed regression. This is another way of saying that the original model underestimated the standard error. But this result is not surprising in view of our knowledge about theoretical consequences of autocorrelation. Fortunately, in this example even after correcting for autocorrelation, the estimated t ratio is statistically significant.17 But that may not always be the case.
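The comparison of t ratios, and the recovery of the original intercept described in the note to Table 10-5, amount to simple arithmetic on the reported figures. The sketch below uses only numbers printed in Table 10-5; the function names are hypothetical.

```python
RHO_HAT = 0.8915  # from Eq. (10.21)


def t_ratio(coefficient, standard_error):
    """t ratio under the null hypothesis that the true coefficient is zero."""
    return coefficient / standard_error


def original_intercept(b1_star, rho):
    """B1 = B1*/(1 - rho), as stated in the note to Table 10-5."""
    return b1_star / (1.0 - rho)


# Slope and standard error of the original OLS regression.
t_original = t_ratio(0.6614, 0.0156)
# Slope and standard error after the Prais-Winsten transformation.
t_prais_winsten = t_ratio(0.7421, 0.0661)
# Intercept flagged with the dagger in Table 10-5.
b1_recovered = original_intercept(4.8131, RHO_HAT)

print(round(t_original, 1), round(t_prais_winsten, 1), round(b1_recovered, 2))
```

The slope's t ratio falls from roughly 42 to roughly 11, which is still far above conventional critical values, in line with the statement that the coefficient remains statistically significant.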

10.6 A LARGE SAMPLE METHOD OF CORRECTING OLS STANDARD ERRORS: THE NEWEY-WEST (NW) METHOD

Instead of transforming variables to correct for autocorrelation, Newey and West have developed a procedure to compute OLS standard errors that are corrected for autocorrelation.18

Although we will not go into the mathematics of this procedure,19 it should be noted that, strictly speaking, it is valid in large samples only. However, what constitutes a large sample is problem-specific. It should also be noted that most modern statistical software packages now include the NW procedure, whose output is popularly known as HAC (heteroscedasticity and autocorrelation-consistent) standard errors or simply Newey-West standard errors. It is interesting to note that HAC does not change the values of the OLS estimators; it only corrects their standard errors.


To illustrate this procedure, we give in Table 10-6 (posted on the book’s Web site) several macroeconomic data series for the U.S. from 1947-1Q to 2007-4Q, for a total of 244 quarterly observations. For our present purposes we will use data on corporate dividends paid and corporate profits (CP).

$$l\text{Dividend} = B_1 + B_2\, l\text{CP} + B_3\, \text{Time} + u_t \tag{10.22}$$

where l denotes natural logarithm.

The time or trend variable is included in the model to allow for the upward trend in the two time series. In Eq. (10.22) $B_2$ gives the elasticity of dividends with respect to profits and $B_3$ gives the relative, or if multiplied by 100, the percent growth in dividends over time.

Using EViews 6, we obtained the following results:

Dependent Variable: LDIVIDEND
Method: Least Squares
Sample: 1947Q1 2007Q4
Included observations: 244
Newey-West HAC Standard Errors & Covariance (lag truncation = 4)

           Coefficient   Std. error   t-Statistic   Prob.
C          0.435764      0.192185     2.267414      0.0243
LCP        0.424535      0.077733     5.461456      0.0000
Time       0.012691      0.001421     8.930795      0.0000

R-squared            0.991424    Mean dependent var      3.999717
Adjusted R-squared   0.991353    S.D. dependent var      1.430724
S.E. of regression   0.133041    Akaike info criterion   −1.184093
Sum squared resid    4.265706    Schwarz criterion       −1.141095
Log likelihood       147.4594    Hannan-Quinn criter.    −1.166776
F-statistic          13930.73    Durbin-Watson stat      0.090181
Prob (F-statistic)   0.000000

Judged by the usual criteria, these results look “good.” All the coefficients are individually highly significant (the p values are practically zero), and the $R^2$ is very high. The elasticity of dividends with respect to corporate profits is about 0.42 and the dividends have been increasing at the quarterly rate of about 1.27 percent. The only fly in the ointment is the low value of the Durbin-Watson statistic, which suggests a high degree of positive autocorrelation in the residuals. Therefore, we cannot trust these results without taking care of the autocorrelation problem.

Our sample of 244 observations covering a span of 61 years may be large enough to use the HAC procedure.


Using EViews 6, we obtained the following results:

Dependent Variable: LDIVIDEND
Method: Least Squares
Sample: 1947Q1 2007Q4
Included observations: 244
Newey-West HAC Standard Errors & Covariance (lag truncation = 4)

           Coefficient   Std. error   t-Statistic   Prob.
C          0.435764      0.192185     2.267414      0.0243
LCP        0.424535      0.077733     5.461456      0.0000
T          0.012691      0.001421     8.930795      0.0000

R-squared            0.991424    Mean dependent var      3.999717
Adjusted R-squared   0.991353    S.D. dependent var      1.430724
S.E. of regression   0.133041    Akaike info criterion   −1.184093
Sum squared resid    4.265706    Schwarz criterion       −1.141095
Log likelihood       147.4594    Hannan-Quinn criter.    −1.166776
F-statistic          13930.73    Durbin-Watson stat      0.090181
Prob (F-statistic)   0.000000

The first thing to notice about these results is that the estimates of the regression coefficients remain the same under OLS as well as under HAC. However, the standard errors have changed substantially. It seems the OLS standard errors underestimated the true standard errors, thus inflating the t values. But even then the estimated regression coefficients are highly significant. This example shows that autocorrelation need not necessarily negate the OLS results, but we should always check for the presence of autocorrelation in time series data.

Incidentally, the HAC output still shows the same Durbin-Watson value as under OLS estimation. But do not worry about this, for HAC has already taken this into account in recalculating the standard errors.
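For readers who want to see what a HAC computation involves, here is a rough sketch for the two-variable model. The text deliberately omits the mathematics, so the formula below is an assumption brought in from the standard presentation of the method: a sandwich covariance matrix with Bartlett weights 1 − l/(L + 1) for lags l = 1, ..., L. The function name `newey_west_se` is hypothetical, and because packages may apply additional small-sample adjustments, the output should not be expected to match any particular program digit for digit.

```python
import math


def newey_west_se(x, y, lags):
    """OLS estimates and Newey-West standard errors for y = b1 + b2*x.

    Returns ((b1, b2), (se_b1, se_b2)). Bartlett weights are used and no
    degrees-of-freedom correction is applied.
    """
    n = len(x)
    # OLS from the 2x2 normal equations with X = [1, x].
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    b2 = (n * sxy - sx * sy) / det
    b1 = (sy - b2 * sx) / n
    e = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]

    # Inverse of X'X.
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]

    # Middle matrix: squared-residual terms plus weighted cross-lag terms.
    rows = [(1.0, xi) for xi in x]
    s = [[0.0, 0.0], [0.0, 0.0]]
    for t in range(n):
        for i in range(2):
            for j in range(2):
                s[i][j] += e[t] ** 2 * rows[t][i] * rows[t][j]
    for lag in range(1, lags + 1):
        w = 1.0 - lag / (lags + 1.0)  # Bartlett weight
        for t in range(lag, n):
            c = w * e[t] * e[t - lag]
            for i in range(2):
                for j in range(2):
                    s[i][j] += c * (rows[t][i] * rows[t - lag][j]
                                    + rows[t - lag][i] * rows[t][j])

    # Sandwich: (X'X)^-1 S (X'X)^-1.
    tmp = [[sum(inv[i][k] * s[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
    cov = [[sum(tmp[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
    return (b1, b2), (math.sqrt(cov[0][0]), math.sqrt(cov[1][1]))


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0, 2.5, 2.7, 4.6, 4.9, 6.8]
print(newey_west_se(x, y, lags=0))
print(newey_west_se(x, y, lags=2))
```

Whatever lag length is chosen, the coefficient estimates returned are the OLS estimates; only the standard errors change, which is the point made in the text. With `lags=0` the calculation reduces to standard errors that are robust to heteroscedasticity alone.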

10.7 SUMMARY

The major points of this chapter are as follows:

1. In the presence of autocorrelation OLS estimators, although unbiased, are not efficient. In short, they are not BLUE.

2. Assuming the Markov first-order autoregressive, the AR(1), scheme, we pointed out that the conventionally computed variances and standard errors of OLS estimators can be seriously biased.

3. As a result, standard t and F tests of significance can be seriously misleading.


4. Therefore, it is important to know whether there is autocorrelation in any given case. We considered three methods of detecting autocorrelation:
a. graphical plotting of the residuals
b. the runs test
c. the Durbin-Watson d test

5. If autocorrelation is found, we suggest that it be corrected by appropriately transforming the model so that in the transformed model there is no autocorrelation. We illustrated the actual mechanics with several examples.

KEY TERMS AND CONCEPTS

The key terms and concepts introduced in this chapter are

Serial correlation or autocorrelation
a) spatial correlation

Reasons for autocorrelation
a) inertia or sluggishness
b) model specification error
c) cobweb phenomenon
d) data manipulation

Detecting autocorrelation
a) time-sequence plot
b) the Durbin-Watson d test; coefficient of autocorrelation; the Markov first-order autoregressive or AR(1) scheme; autoregressive models; h statistic

Remedial measures for serial or autocorrelation
a) generalized least squares (GLS) (generalized difference equation)
b) Prais-Winsten transformation

Estimation of ρ
a) first difference equation
b) the Durbin-Watson d statistic
c) OLS residuals

Large sample method of correcting OLS standard errors
a) the Newey-West (NW) method; HAC; Newey-West standard errors

QUESTIONS

10.1. Explain briefly the meaning of
a. Autocorrelation
b. First-order autocorrelation
c. Spatial correlation

10.2. What is the importance of assuming the Markov first-order, or AR(1), autocorrelation scheme?

10.3. Assuming the AR(1) scheme, what are the consequences of violating the CLRM assumption that the error terms in the PRF are uncorrelated?

10.4. In the presence of AR(1) autocorrelation, what is the method of estimation that will produce BLUE estimators? Outline the steps involved in implementing this method.

10.5. What are the various methods of estimating the autocorrelation parameter ρ in the AR(1) scheme?


10.6. What are the various methods of detecting autocorrelation? State clearly the assumptions underlying each method.

10.7. Although popularly used, what are some limitations of the Durbin-Watson d statistic?

10.8. State whether the following statements are true or false. Briefly justify your answers.
a. When autocorrelation is present, OLS estimators are biased as well as inefficient.
b. The Durbin-Watson d is useless in autoregressive models like the regression (10.7) where one of the explanatory variables is a lagged value(s) of the dependent variable.
c. The Durbin-Watson d test assumes that the variance of the error term $u_t$ is homoscedastic.
d. The first difference transformation to eliminate autocorrelation assumes that the coefficient of autocorrelation must be −1.
e. The $R^2$ values of two models, one involving regression in the first difference form and another in the level form, are not directly comparable.

10.9. What is the importance of the Prais-Winsten transformation?

PROBLEMS

10.10. Complete the following table:

Sample size   Number of explanatory variables   Durbin-Watson d   Evidence of autocorrelation
25            2                                 0.83              Yes
30            5                                 1.24              —
50            8                                 1.98              —
60            6                                 3.72              —
200           20                                1.61              —

10.11. Use the runs test to test for autocorrelation in the following cases. (Use the Swed-Eisenhart tables. See Appendix 10A.)

Sample size   Number of +   Number of −   Number of runs   Autocorrelation (?)
18            11            7             2                —
30            15            15            24               —
38            20            18            6                —
15            8             7             4                —
10            5             5             1                —

10.12. For the Phillips curve regression Equation (5.29) given in Chapter 5, the estimated d statistic would be 0.6394.
a. Is there evidence of first-order autocorrelation in the residuals? If so, is it positive or negative?


b. If there is autocorrelation, estimate the coefficient of autocorrelation from the d statistic.

c. Using this estimate, transform the data given in Table 5-6 and estimate the generalized difference equation (10.15) (i.e., apply OLS to the transformed data).

d. Is there autocorrelation in the regression estimated in part (c)? Which test do you use?

10.13. In studying the movement in the production workers’ share in value added (i.e., labor’s share) in manufacturing industries, the following regression results were obtained based on the U.S. data for the years 1949 to 1964 (t ratios in parentheses):20

Model A: $\hat{Y}_t = 0.4529 - 0.0041t$; $r^2 = 0.5284$; $d = 0.8252$
t = (−3.9608)

Model B: $\hat{Y}_t = 0.4786 - 0.00127t + 0.0005t^2$; $R^2 = 0.6629$; $d = 1.82$
t = (−3.2724) (2.7777)

where Y = labor’s share and t = the time.
a. Is there serial correlation in Model A? In Model B?
b. If there is serial correlation in Model A but not in Model B, what accounts for the serial correlation in the former?
c. What does this example tell us about the usefulness of the d statistic in detecting autocorrelation?

10.14. Durbin’s two-step method of estimating ρ.21 Write the generalized difference equation (10.14) in a slightly different but equivalent form as follows:

$$Y_t = B_1(1 - \rho) + B_2 X_t - \rho B_2 X_{t-1} + \rho Y_{t-1} + v_t$$

In step 1 Durbin suggests estimating this regression with Y as the dependent variable and $X_t$, $X_{t-1}$, and $Y_{t-1}$ as explanatory variables. The coefficient of $Y_{t-1}$ will provide an estimate of ρ. The $\hat{\rho}$ thus estimated is a consistent estimator; that is, in large samples it provides a good estimate of true ρ.

In step 2 use the $\hat{\rho}$ estimated from step 1 to transform the data to estimate the generalized difference equation (10.14).

Apply Durbin’s two-step method to the U.S. import expenditure data discussed in Chapter 7 and compare your results with those shown for the original regression.

10.15. Consider the following regression model:22

$\hat{Y}_t = -49.4664 + 0.88544X_{2t} + 0.09253X_{3t}$; $R^2 = 0.9979$; $d = 0.8755$
t = (−2.2392) (70.2936) (2.6933)

20See Damodar N. Gujarati, “Labor’s Share in Manufacturing Industries,” Industrial and Labor Relations Review, vol. 23, no. 1, October 1969, pp. 65–75.

21Royal Statistical Society, series B, vol. 22, 1960, pp. 139–153.

22See Dominick Salvatore, Managerial Economics, McGraw-Hill, New York, 1989, pp. 138, 148.


where Y = the personal consumption expenditure (1982 billions of dollars)
X2 = the personal disposable income (1982 billions of dollars) (PDI)
X3 = the Dow Jones Industrial Average Stock Index

The regression is based on U.S. data from 1961 to 1985.
a. Is there first-order autocorrelation in the residuals of this regression? How do you know?
b. Using the Durbin two-step procedure, the preceding regression was transformed per Eq. (10.15), yielding the following results:

$Y_t^* = -17.97 + 0.89X_{2t}^* + 0.09X_{3t}^*$; $R^2 = 0.9816$; $d = 2.28$
t = (30.72) (2.66)

Has the problem of autocorrelation been resolved? How do you know?
c. Comparing the original and transformed regressions, the t value of the PDI has dropped dramatically. What does this suggest?
d. Is the d value from the transformed regression of any value in determining the presence, or lack thereof, of autocorrelation in the transformed data?

10.16. Durbin h statistic. In autoregressive models like Eq. (10.7):

$$Y_t = B_1 + B_2 X_t + B_3 Y_{t-1} + v_t$$

the usual d statistic is not applicable to detect autocorrelation. For such models, Durbin has suggested replacing the d statistic by the h statistic defined as

$$h \approx \hat{\rho}\sqrt{\frac{n}{1 - n \cdot \text{var}(b_3)}}$$

where n = the sample size
$\hat{\rho}$ = the estimator of the autocorrelation coefficient
var ($b_3$) = the variance of the estimator of $B_3$, the coefficient of lagged Y variable

Durbin has shown that for large samples, and given the null hypothesis that true ρ = 0, the h statistic is distributed as

$$h \sim N(0, 1)$$

That is, it follows the standard normal distribution, the normal distribution with zero mean and unit variance. Therefore, we would reject the null hypothesis that ρ = 0 if the computed h statistic exceeds the critical h value. If, e.g., the level of significance is 5%, the critical h value is −1.96 or 1.96. Therefore, if a computed h exceeds |1.96|, we can reject the null hypothesis; if it does not exceed this critical value, we do not reject the null hypothesis of no (first-order) autocorrelation. Incidentally, the $\hat{\rho}$ entering the h formula can be obtained from any one of the methods discussed in the text.

Now consider the following demand for money function for India for the periods 1948 to 1949 and 1964 to 1965:

$$\ln M_t = 1.6027 - 0.1024 \ln R_t + 0.6869 \ln Y_t + 0.5284 \ln M_{t-1}$$
se = (1.2404) (0.3678) (0.3427) (0.2007)   $R^2 = 0.9227$   d = 1.8624

where M = real cash balances
R = the long-term interest rate
Y = the aggregate real national income

a. For this regression, find the h statistic and test the hypothesis that the preceding regression does not suffer from first-order autocorrelation.
b. As the regression results show, the Durbin-Watson d statistic is 1.8624. Tell why in this case it is inappropriate to use the d statistic. But note that you can use this d value to estimate ρ ($\hat{\rho} \approx 1 - d/2$).

10.17. Consider the data given in Table 10-7 (on the textbook’s Web site) relating to stock prices and GDP for the period 1980–2006.
a. Estimate the OLS regression

$$Y_t = B_1 + B_2 X_t + u_t$$

b. Find out if there is first-order autocorrelation in the data on the basis of the d statistic.
c. If there is, use the d value to estimate the autocorrelation parameter ρ.
d. Using this estimate of ρ, transform the data per the generalized difference equation (10.14), and estimate this equation by OLS (1) by dropping the first observation and (2) by including the first observation.
e. Repeat part (d), but estimate ρ from the residuals as shown in Eq. (10.20). Using this estimate of ρ, estimate the generalized difference equation (10.14).
f. Use the first difference method to transform the model into Eq. (10.17) and estimate the transformed model.
g. Compare the results of regressions obtained in parts (d), (e), and (f). What conclusions can you draw? Is there autocorrelation in the transformed regressions? How do you know?

10.18. Consider the following model:

$$Y_t = B_1 + B_2 X_{2t} + B_3 X_{3t} + B_4 X_{4t} + u_t$$

Suppose the error term follows the AR(1) scheme in Eq. (10.6). How would you transform this model so that there is no autocorrelation in the transformed model? (Hint: Extend Eq. [10.15].)

guj75845_ch10.qxd 4/16/09 12:26 PM Page 339

10.19. Establish Eq. (10.8). (Hint: Expand Eq. [10.5] and use Eq. [10.9]. Also, note that for a large sample size $\sum e_t^2$ and $\sum e_{t-1}^2$ are approximately the same.)

10.20. The Theil-Nagar $\rho$ based on d statistic. Theil and Nagar have suggested that in small samples instead of estimating $\rho$ as (1 − d/2), it should be estimated as

$$\hat{\rho} = \frac{n^2(1 - d/2) + k^2}{n^2 - k^2}$$

where n = the sample size
d = the Durbin-Watson d
k = the number of coefficients (including the intercept) to be estimated

Show that for large n, this estimate of $\rho$ is equal to the one obtained by the simpler formula (1 − d/2).

10.21. Refer to Example 7.3 relating expenditure on imports (Y) to personal disposable income (X). Now consider the following models:

             Model 1    Model 2    Model 3
Intercept    −136.16    22.69      12.18
X            0.2082     0.2975     0.0382
Time         —          −18.525    −3.045
Y(−1)        —          —          0.9659
R2           0.969      0.984      0.994
d            0.216      0.341      1.611

a. What do these results suggest about the nature of autocorrelation in this example?

b. How would you interpret the time and lagged Y terms in Model 3? Note: The estimated coefficients in all the models, except for the X and Time coefficients in Model 3, were statistically significant at the 5% or lower level of significance.

10.22. Monte Carlo experiment. Consider the following model:

$$Y_t = 1.0 + 0.9 X_t + u_t \tag{1}$$

where X takes values of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. Assume that

$$u_t = \rho u_{t-1} + v_t = 0.9 u_{t-1} + v_t \tag{2}$$

where $v_t \sim N(0, 1)$. Assume that $u_0 = 0$.

a. Generate 10 values of vt and then 10 values of ut per Equation (2).


b. Using the 10 X values and the 10 u values generated in the preceding step, generate 10 values of Y.

c. Regress the Y values generated in part (b) on the 10 X values, obtaining b1 and b2.

d. How do the computed b1 and b2 compare with the true values of 1 and 0.9, respectively?

e. What can you conclude from this experiment?

10.23. Continue with Problem 10.22. Now assume that $\rho = 0.1$ and repeat the exercise. What do you observe? What general conclusion can you draw from Problems 10.22 and 10.23?
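The experiment in Problems 10.22 and 10.23 can be sketched in a few lines of Python. This is only a minimal illustration using the standard `random` module; the function names `simulate_ols` and `average_slope`, the seed, and the number of replications are assumptions made here for the example and do not come from the text.

```python
import random

def simulate_ols(rho, n_obs=10, seed=None):
    """One replication: generate AR(1) errors, build Y, regress Y on X."""
    rng = random.Random(seed)
    x = list(range(1, n_obs + 1))        # X = 1, 2, ..., 10
    u_prev = 0.0                         # u_0 = 0
    y = []
    for xt in x:
        v = rng.gauss(0.0, 1.0)          # v_t ~ N(0, 1)
        u = rho * u_prev + v             # u_t = rho * u_{t-1} + v_t
        y.append(1.0 + 0.9 * xt + u)     # Y_t = 1.0 + 0.9 X_t + u_t
        u_prev = u
    x_bar = sum(x) / n_obs
    y_bar = sum(y) / n_obs
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx                       # OLS slope
    b1 = y_bar - b2 * x_bar              # OLS intercept
    return b1, b2

def average_slope(rho, replications=2000, seed=12345):
    """Average the OLS slope over many independent replications."""
    rng = random.Random(seed)
    slopes = [simulate_ols(rho, seed=rng.randrange(10**9))[1]
              for _ in range(replications)]
    return sum(slopes) / replications

if __name__ == "__main__":
    print("one sample, rho = 0.9:", simulate_ols(0.9, seed=1))
    print("one sample, rho = 0.1:", simulate_ols(0.1, seed=1))
    print("average slope, rho = 0.9:", average_slope(0.9))
    print("average slope, rho = 0.1:", average_slope(0.1))
```

A single sample of 10 observations can give estimates that are far from 1 and 0.9, especially when $\rho = 0.9$; averaging over many replications shows how the estimates behave in repeated sampling.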

APPENDIX 10A: The Runs Test

THE RUNS TEST23

To explain this test, simply note the sign (+ or −) of the residuals obtained from the estimated regression. Suppose in a sample of 20 observations, we obtained the following sequence of residuals

(++)(− − − − − − − − − − − − −)(+++++) (10A.1)

We now define a run as an uninterrupted sequence of one symbol or attribute, such as + or −. We further define the length of the run as the number of elements in the run. In the sequence shown in Equation (10A.1), there are 3 runs: a run of 2 pluses (i.e., of length 2), a run of 13 minuses (i.e., of length 13), and a run of 5 pluses (i.e., of length 5); for better visual effect we have put the various runs in parentheses.

By examining how runs behave in a strictly random sequence of observations, we can derive a test of randomness of runs. The question we ask is: Are the 3 runs observed in our example consisting of 20 observations too many or too few compared with the number of runs expected in a strictly random sequence of 20 observations? If there are too many runs, it means that the e’s change sign frequently, thus suggesting negative serial correlation (cf. Figure 10-2[b]). Similarly, if there are too few runs, it suggests positive autocorrelation, as in Figure 10-2(a).

Now let
N = total number of observations (= N1 + N2)
N1 = number of + symbols (i.e., + residuals)
N2 = number of − symbols (i.e., − residuals)
k = number of runs

Then under the null hypothesis that the successive outcomes (here, residuals) are independent, Swed and Eisenhart have developed special tables that give critical values of the runs expected in a random sequence of N observations. These tables are given in Appendix E, Table E-6.

23It is a nonparametric test because it makes no assumptions about the (probability) distribution from which the observations are taken.

Swed-Eisenhart Critical Runs Test

To illustrate the use of these tables, let us revert to the sequence shown in Eq. (10A.1). We have N = 20, N1 = 7 (7 pluses), N2 = 13 (13 minuses), and k = 3 runs. For N1 = 7 and N2 = 13, the 5% critical values of runs are 5 and 15. Now, as noted in Appendix E, Table E-6, if the actual number of runs is equal to or less than 5 or equal to or greater than 15, we can reject the hypothesis that the observed sequence of the e’s given in Eq. (10A.1) is random. In our example the actual number of runs is 3. Hence, we can conclude that the observed sequence in Eq. (10A.1) is not random.

Note that the Swed-Eisenhart table is for 40 observations at most: 20 pluses and 20 minuses. If the actual sample size is greater, we cannot use these tables. But in that case it can be shown that if $N_1 > 10$ and $N_2 > 10$ and the null hypothesis is that the successive observations (residuals in our case) are independent, the number of runs k is asymptotically (i.e., in large samples) normally distributed with

Mean:

$$E(k) = \frac{2N_1N_2}{N} + 1 \tag{10A.2}$$

Variance:

$$\sigma_k^2 = \frac{2N_1N_2(2N_1N_2 - N)}{N^2(N - 1)} \tag{10A.3}$$

If the null hypothesis of randomness is sustainable, following the properties of the normal distribution, we should expect that

$$\text{Prob}[E(k) - 1.96\sigma_k \le k \le E(k) + 1.96\sigma_k] = 0.95 \tag{10A.4}$$

That is, the probability is 95% that the preceding interval will include the observed k.

Decision Rule

Do not reject the null hypothesis of randomness with 95% confidence if k, the number of runs, lies in the interval of Eq. (10A.4); reject the null hypothesis if the estimated k lies outside these limits. (Note: You can choose any level of confidence you want.)
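The mean, variance, and interval of Eqs. (10A.2) to (10A.4) are easy to compute directly. The sketch below is an illustration only: the function name `runs_test` and the representation of the residual signs as a string of "+" and "-" characters are assumptions made for this example. It is applied to the sequence in Eq. (10A.1) purely to show the arithmetic; strictly, the normal approximation calls for $N_1 > 10$ and $N_2 > 10$, which that small sample does not satisfy.

```python
import math

def runs_test(signs, z_crit=1.96):
    """Large-sample runs test on a sequence of '+' and '-' residual signs."""
    n1 = signs.count("+")                      # number of + residuals
    n2 = signs.count("-")                      # number of - residuals
    n = n1 + n2
    # A new run starts every time the sign changes.
    k = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    mean = 2 * n1 * n2 / n + 1                 # Eq. (10A.2)
    var = (2 * n1 * n2 * (2 * n1 * n2 - n)) / (n ** 2 * (n - 1))  # Eq. (10A.3)
    sd = math.sqrt(var)
    lower = mean - z_crit * sd                 # interval of Eq. (10A.4)
    upper = mean + z_crit * sd
    return {"k": k, "mean": mean, "sd": sd,
            "lower": lower, "upper": upper,
            "reject": not (lower <= k <= upper)}

# The sequence of Eq. (10A.1): 2 pluses, 13 minuses, 5 pluses.
signs = "++" + "-" * 13 + "+" * 5
result = runs_test(signs)
print(result)
```

For this sequence the function finds k = 3 runs against an expected value of 10.1, so k falls below the lower limit of the interval, in line with the conclusion reached from the Swed-Eisenhart table.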


APPENDIX 10B: A General Test of Autocorrelation: The Breusch-Godfrey (BG) Test

A test of autocorrelation that is more general than some of the tests discussed so far is one developed by statisticians Breusch and Godfrey.24 This test is general in that it allows for (1) stochastic regressors, such as the lagged values of the dependent variables, (2) higher-order autoregressive schemes, such as AR(1), AR(2), etc., and (3) simple or higher-order moving averages of the purely random error terms, such as $v_{t-1}$, $v_{t-2}$, etc.

To illustrate this test, we revert to the dividend–corporate profits example discussed in Section 10.6. In that example we regressed the logarithm of dividend on the logarithm of corporate profits and a trend variable. On the basis of the Durbin-Watson test, we found in that example that we did have the autocorrelation problem. This is also confirmed by the BG test, which proceeds as follows:

1. Run the dividend regression as shown in Eq. (10.22) and obtain residuals from this regression, $e_t$.

2. Now run the following regression:

$$e_t = A_1 + A_2\,\mathit{lCP}_t + A_3\,\text{Time} + C_1 e_{t-1} + C_2 e_{t-2} + \cdots + C_k e_{t-k} + v_t$$

That is, regress the residual at time t on the original regressors, including the intercept and the lagged values of the residuals up to time $(t - k)$, the value of k being determined by trial and error or on the basis of Akaike or Schwarz information criteria. Obtain the $R^2$ value of this regression. This is called the auxiliary regression.

3. Calculate $nR^2$, that is, obtain the product of the sample size n and the $R^2$ value obtained in (2). Under the null hypothesis that all the coefficients of the lagged residual terms are simultaneously equal to zero, it can be shown that in large samples

$$nR^2 \sim \chi^2_k$$

That is, in large samples, the product of the sample size and $R^2$ follows the chi-square distribution with k degrees of freedom (i.e., the number of lagged residual terms). In econometrics literature, the BG test is known as the Lagrange multiplier test.

24T. S. Breusch, “Testing for Autocorrelation in Dynamic Linear Models,” Australian Economic Papers, vol. 17, 1978, pp. 334–355, and L. G. Godfrey, “Testing Against General Autoregressive and Moving Average Error Models When the Regressand Includes Lagged Dependent Variables,” Econometrica, vol. 46, 1978, pp. 1293–1302.


For our example, we obtained the following results (for illustrative purposes we have used three lagged values of the residuals, although only the first lagged value is statistically significant):

Breusch-Godfrey Serial Correlation LM Test:

F-statistic       823.0875    Prob. F(3,238)          0.0000
Obs*R-squared     222.5495    Prob. Chi-Square(3)     0.0000

Test Equation:
Dependent Variable: RESID
Method: Least Squares
Sample: 1947Q1 2007Q4
Included observations: 244
Presample missing value lagged residuals set to zero.

              Coefficient    Std. error    t-Statistic    Prob.
C             −0.020423      0.031482      −0.648726      0.5171
LCP           0.007548       0.012027      0.627611       0.5309
Time          −0.000121      0.000214      −0.565962      0.5720
RESID(−1)     0.907903       0.064654      14.04247       0.0000
RESID(−2)     −0.021374      0.087434      −0.244459      0.8071
RESID(−3)     0.074971       0.064785      1.157217       0.2483

R-squared             0.912088    Mean dependent var       −1.10E-15
Adjusted R-squared    0.910241    S.D. dependent var       0.132493
S.E. of regression    0.039694    Akaike info criterion    −3.590926
Sum squared resid     0.375005    Schwarz criterion        −3.504930
Log likelihood        444.0929    Hannan-Quinn criter.     −3.556291
F-statistic           493.8525    Durbin-Watson stat       2.021935
Prob (F-statistic)    0.000000

As you can see, $nR^2 \approx 222.54$, which under the null hypothesis follows the $\chi^2_3$ distribution. The probability of obtaining a chi-square value of as much as 222.54 or greater for 3 d.f. is practically zero. Therefore, we can reject the hypothesis that $C_1 = C_2 = C_3 = 0$. That is, there is evidence of autocorrelation in the error term. The BG test, therefore, confirms the finding on the basis of the Durbin-Watson test. But keep in mind that the BG test is of general applicability, whereas the Durbin-Watson test assumes only first-order serial correlation.
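The final step of the BG test is simple arithmetic, which the following short sketch reproduces from the printed output (n = 244 observations, auxiliary-regression R-squared of 0.912088, and 3 lagged residuals). The variable names and the small lookup table of 5% chi-square critical values are introduced here for illustration only; the critical values themselves are the usual tabulated ones.

```python
# Values taken from the auxiliary regression output above.
n = 244                 # included observations
r_squared = 0.912088    # R-squared of the auxiliary regression
lags = 3                # number of lagged residual terms (degrees of freedom)

# BG (Lagrange multiplier) statistic: n times R-squared.
lm_stat = n * r_squared

# Standard 5% critical values of the chi-square distribution.
CHI2_CRIT_5PCT = {1: 3.841, 2: 5.991, 3: 7.815}

reject = lm_stat > CHI2_CRIT_5PCT[lags]
print(f"nR^2 = {lm_stat:.4f}, 5% critical value = {CHI2_CRIT_5PCT[lags]}, "
      f"reject H0: {reject}")
```

The computed statistic agrees with the Obs*R-squared entry in the output and is far above the 5% critical value for 3 d.f., which is why the null hypothesis of no autocorrelation is rejected.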


PART III: ADVANCED TOPICS IN ECONOMETRICS

In this part, consisting of two chapters, we discuss two topics that may be advanced for the beginner. But with an instructor’s help, students can master them with some effort.

Chapter 11 discusses simultaneous equation models. Chapters in the previous two parts of the text were devoted to single equation regression models because such models are used extensively in empirical work in business and economics. In such models, as we have seen, one variable (the dependent variable, Y) is expressed as a linear function of one or more other variables (the explanatory variables, the X’s). In such models an implicit assumption is that the cause-and-effect relationship, if any, between Y and the X’s is unidirectional; the explanatory variables are the cause, and the dependent variable is the effect.

However, there are situations where there is a two-way flow, or influence, among economic variables; that is, one economic variable affects another economic variable(s) and is, in turn, affected by it (them). Thus in the regression of money (M) on the rate of interest (r), the single-equation methodology assumes implicitly that the rate of interest is fixed (say, by the Federal Reserve Bank) and tries to find out the change in the amount of money demanded in response to changes in the level of the interest rate. But what happens if the rate of interest depends on the demand for money? In this case, the conditional regression analysis made thus far in this book may not be appropriate because now M depends on r and r depends on M. This leads us to consider simultaneous equation models: models in which there is more than one regression equation, that is, one for each interdependent variable.


In this chapter we present a very elementary, and often heuristic, introduction to the vast and complex subject of simultaneous equation models, the details being left for the references.

Chapter 12 discusses a variety of topics in the field of time series econometrics, a field that is growing in importance. In regression analysis involving time series data we have to be careful in routinely using the standard classical linear regression assumptions. The critical concept in time series analysis is the concept of stationary time series. In this chapter we discuss this topic at an intuitive level and point out the importance of testing for stationarity.

In this chapter we also discuss the logit model. In Chapter 6 we considered several models in which one or more X variables were dummy variables, taking a value of 0 or 1. In logit models we try to model situations in which the dependent variable, Y, is a dummy variable. For example, admission to a graduate school is a dummy variable, for you are either accepted or rejected. Although such models can be estimated with the standard ordinary least squares (OLS) procedure, it is generally not recommended because of several estimation problems.

In these two chapters, as throughout the book, we illustrate the various concepts introduced with several concrete examples.


CHAPTER 11: SIMULTANEOUS EQUATION MODELS

All the regression models we have considered so far have been single equation regression models in that a single dependent variable (Y) was expressed as a function of one or more explanatory variables (the X’s). The underlying economic theory determined why Y was treated as the dependent variable and the X’s as the determining or causal variables. In other words, in such single equation regression models the causality, if any, ran from the X’s to Y. Thus, in our child mortality illustrative example considered earlier, it was socioeconomic theory that suggested that personal income (X2) and female literacy rate (X3) were the primary factors affecting child mortality (Y).

However, there are situations in which such a unidirectional relationship between Y and the X’s cannot be maintained. It is quite possible that the X’s not only affect Y, but that Y can also affect one or more X’s. If that is the case, we have a bilateral, or feedback, relationship between Y and the X’s. Obviously, if this is the case, the single equation modeling strategy that we have discussed in the previous chapters will not suffice, and in some cases it may be quite inappropriate because it may lead to biased (in the statistical sense) results. To take into account the bilateral relationship between Y and the X’s, we will therefore need more than one regression equation. Regression models in which there is more than one equation and in which there are feedback relationships among variables are known as simultaneous equation regression models. In the rest of this chapter we will discuss the nature of such simultaneous equation models. Our treatment of the topic is heuristic. For a detailed treatment of this topic, consult the references.1

1An extended treatment of this subject can be found in Gujarati and Porter, Basic Econometrics, 5th ed., McGraw-Hill, New York, 2009, Chapters 18–20.


11.1 THE NATURE OF SIMULTANEOUS EQUATION MODELS

The best way to proceed is to consider some examples from economics.

Example 11.1. The Keynesian Model of Income Determination

A beginning student of economics is exposed to the simple Keynesian model of income determination. Using the standard macroeconomics textbook convention, let C stand for consumption (expenditure), Y for income, I for investment (expenditure), and S for savings. The simple Keynesian model of income determination consists of the following two equations:

Consumption function: Ct = B1 + B2Yt + ut (11.1)

Income identity: Yt = Ct + It (11.2)

where t is the time subscript, u is the stochastic error term, and It = St.

This simple Keynesian model assumes a closed economy (i.e., there is no foreign trade) and no government expenditure (recall that the income identity is generally written as Yt = Ct + It + Gt + NXt, where G is government expenditure and NX is net export [export − import]). The model also assumes that I, investment expenditure, is determined exogenously, say, by the private sector.

The consumption function states that consumption expenditure is linearly related to income; the stochastic error term is added to the function to reflect the fact that in empirical analysis the relation between the two is only approximate. The (national income) identity says that total income is equal to the sum of consumption expenditure and investment expenditure; the latter is equal to total savings. As we know, the slope coefficient B2 in the consumption function is the marginal propensity to consume (MPC), the amount of extra consumption expenditure resulting from an extra dollar of income. Keynes assumed that MPC is positive but less than 1, which is reasonable because people may save part of their additional income.

Now we can see the feedback, or simultaneous, relationship between consumption expenditure and income. From Equation (11.1) we see that income affects consumption expenditure, but from Equation (11.2) we also see that consumption is a component of income. Thus, consumption expenditure and income are interdependent. The objective of analysis is to find out how consumption expenditure and income are determined simultaneously. Thus consumption and income are jointly dependent variables. In the language of simultaneous equation modeling, such jointly dependent variables are known as endogenous variables. In the simple Keynesian model, investment I is not an endogenous variable, for its value is determined independently; so it is called an exogenous, or predetermined, variable. In more refined Keynesian models, investment can also be made endogenous.


In general, an endogenous variable is a “variable that is an inherent part of the system being studied and that is determined within the system. In other words, a variable that is caused by other variables in a causal system,” and an exogenous variable “is a variable entering from and determined from outside the system being studied. A causal system says nothing about its exogenous variables.”2

Equations (11.1) and (11.2) represent a two-equation model involving two endogenous variables, C and Y. If there are more endogenous variables, there will be more equations, one for each of the endogenous variables. Some equations in the system are structural, or behavioral, equations and some are identities. Thus, in our simple Keynesian model, Eq. (11.1) is a structural, or behavioral, equation, for it depicts the structure or behavior of a particular sector of the economy, the consumption sector here. The coefficients (or parameters) of the structural equations, such as B1 and B2, are known as structural coefficients. Equation (11.2) is an identity, a relationship that is true by definition: Total income is equal to total consumption expenditure plus total investment.

2W. Paul Vogt, Dictionary of Statistics and Methodology: A Nontechnical Guide for the Social Sciences, Sage Publications, California, 1993, pp. 81, 85.

Example 11.2. Demand and Supply Model

As every student of economics knows, the price P of a commodity and the quantity Q sold are determined by the intersection of the demand and supply curves for that commodity. Thus, assuming for simplicity that the demand and supply curves are linearly related to price and adding the stochastic, or random error, terms u1 and u2, we may write the empirical demand and supply functions as:

Demand function: $$Q^d_t = A_1 + A_2 P_t + u_{1t} \tag{11.3}$$

Supply function: $$Q^s_t = B_1 + B_2 P_t + u_{2t} \tag{11.4}$$

Equilibrium condition: $$Q^d_t = Q^s_t \tag{11.5}$$

where $Q^d_t$ = quantity demanded, $Q^s_t$ = quantity supplied, and t = time.

According to economic theory, A2 is expected to be negative (downward-sloping demand curve) and B2 is expected to be positive (upward-sloping supply curve). Equations (11.3) and (11.4) are both structural equations, the former representing the consumers and the latter the suppliers. The A’s and B’s are structural coefficients.

Now it is not too difficult to see why there is a simultaneous, or two-way, relationship between P and Q. If, for example, u1t (in Eq. [11.3]) changes because of changes in other variables affecting demand (such as income, wealth, and tastes), the demand curve will shift upward if u1t is positive and downward if u1t is negative. As Figure 11-1 shows, a shift in the demand curve changes both P and Q. Similarly, a change in u2t (because of strikes, weather, hurricanes) will shift the supply curve, again affecting both P and Q. Therefore, there is a bilateral, or simultaneous, relationship between the two variables; the P and Q variables are thus jointly dependent, or endogenous, variables. This is known as the simultaneity problem.
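The joint dependence of P and Q on both error terms can be illustrated with a small simulation of Eqs. (11.3) to (11.5). The sketch below is hypothetical throughout: the parameter values (A1 = 10, A2 = −1, B1 = 2, B2 = 1), the unit-variance normal shocks, and the helper names `simulate_market`, `cov`, and `ols_slope` are assumptions chosen for illustration, not values from the text. Equilibrium price and quantity are obtained by solving the demand and supply equations for each pair of shocks.

```python
import random

def simulate_market(n=20000, a1=10.0, a2=-1.0, b1=2.0, b2=1.0, seed=0):
    """Draw demand and supply shocks and solve for equilibrium P and Q."""
    rng = random.Random(seed)
    prices, quantities, demand_shocks = [], [], []
    for _ in range(n):
        u1 = rng.gauss(0.0, 1.0)                   # demand shock u_1t
        u2 = rng.gauss(0.0, 1.0)                   # supply shock u_2t
        # Setting demand equal to supply and solving for price:
        p = (a1 - b1 + u1 - u2) / (b2 - a2)
        q = a1 + a2 * p + u1                       # quantity from the demand curve
        prices.append(p)
        quantities.append(q)
        demand_shocks.append(u1)
    return prices, quantities, demand_shocks

def cov(x, y):
    """Sample covariance (divisor n)."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / len(x)

def ols_slope(x, y):
    """Slope from regressing y on x."""
    return cov(x, y) / cov(x, x)

prices, quantities, demand_shocks = simulate_market()
slope = ols_slope(prices, quantities)
print("cov(P, u1):", cov(prices, demand_shocks))
print("slope of Q on P:", slope)
```

Under these assumptions the equilibrium price is positively correlated with the demand shock, and the slope from regressing Q on P is close to neither the demand slope (−1) nor the supply slope (+1). With equally variable shocks it lies roughly midway between them.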

[FIGURE 11-1 Interdependence of price and quantity. Panels plot price P (vertical axis) against quantity Q (horizontal axis), showing the supply curve S, the demand curves D0 and D1, and the equilibrium points (Q0, P0) and (Q1, P1).]

11.2 THE SIMULTANEOUS EQUATION BIAS: INCONSISTENCY OF OLS ESTIMATORS

Why is simultaneity a problem? To understand the nature of this problem, return to Example 11.1, which discusses the simple Keynesian model of income determination. Assume for the moment that we neglect the simultaneity between consumption expenditure and income and just estimate the consumption function (11.1) by the usual ordinary least squares (OLS) procedure. Using the usual OLS formula, we obtain

$$b_2 = \frac{\sum (C_t - \bar{C})(Y_t - \bar{Y})}{\sum (Y_t - \bar{Y})^2} = \frac{\sum c_t y_t}{\sum y_t^2} \tag{11.6}$$

Now recall from Chapter 3 that if we work within the framework of the classical linear regression model (CLRM), which is the framework we have used thus far, the OLS estimators are best linear unbiased estimators (BLUE). Is b2 given in Equation (11.6) a BLUE estimator of the true marginal propensity to consume B2? It can be shown that in the presence of the simultaneity problem the OLS estimators are generally not BLUE. In our case b2 is not a BLUE estimator of B2. In particular, b2 is a biased estimator of B2; on average, it underestimates or overestimates the true B2. A formal proof of this statement is given in Appendix 11A. But intuitively it is easy to see why b2 may not be BLUE.

As discussed in Section 3.1, one of the assumptions of the CLRM is that the stochastic error term u and the explanatory variable(s) are not correlated. Thus, in the Keynesian consumption function Y (income) and the error term ut must not be correlated, if we want to use OLS to estimate the parameters of the consumption function (11.1). But that is not the case here. To see this, we proceed as follows:

$$\begin{aligned} Y_t &= C_t + I_t \\ &= (B_1 + B_2 Y_t + u_t) + I_t \quad \text{substituting for } C_t \text{ from Eq. (11.1)} \\ &= B_1 + B_2 Y_t + u_t + I_t \end{aligned}$$

Therefore, transferring the B2Yt term to the left-hand side and simplifying, we obtain

$$Y_t = \frac{B_1}{1 - B_2} + \frac{1}{1 - B_2} I_t + \frac{1}{1 - B_2} u_t \tag{11.7}$$

Notice an interesting feature of this equation. National income Y not only depends on investment I but also on the stochastic error term u! Recall that the error term u represents all kinds of influences not explicitly included in the model. Let us suppose that one of these influences is consumer confidence as measured by, say, the consumer confidence index developed by the University of Michigan. Suppose consumers feel upbeat about the economy because of a boom in the stock market (as happened in the United States in 1996 and 1997). Therefore, consumers increase their consumption expenditure, which affects income Y in view of the income identity (11.2). This increase in income will lead to another round of increase in consumption because of the presence of Y in the consumption function (11.1), which will lead to further increases in income, and so on. What will the end result of this process be? Students familiar with elementary macroeconomics will recognize that the end result will depend on the value of the multiplier $1/(1 - B_2)$. If, for example, the MPC (B2) is 0.8 (i.e., 80 cents of every additional dollar’s worth of income is spent on consumption), the multiplier will be 5.
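The multiplier value quoted above can be checked by adding up the successive rounds of spending described in the preceding paragraph. As a worked illustration (assuming, as Keynes did, that the MPC lies strictly between 0 and 1), each extra dollar of spending generates B2 dollars of further consumption, then B2 squared, and so on, which is a geometric series:

```latex
\[
1 + B_2 + B_2^2 + B_2^3 + \cdots = \frac{1}{1 - B_2},
\qquad 0 < B_2 < 1,
\]
\[
\text{so that for } B_2 = 0.8: \quad \frac{1}{1 - 0.8} = \frac{1}{0.2} = 5 .
\]
```

This is the same factor $1/(1 - B_2)$ that multiplies both $I_t$ and $u_t$ in Eq. (11.7).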

The point to note is that Y and u in Eq. (11.1) are correlated, and hence we cannot use OLS to estimate the parameters of the consumption function (11.1). If we persist in using it, the estimators will be biased. Not only that, but as Appendix 11A shows, the estimators are not even consistent. As discussed in Appendix D.4, roughly speaking, an estimator is said to be an inconsistent estimator if it does not approach the true parameter value even if the sample size increases indefinitely. In sum, then, because of the correlation between Y and u, the estimator b2 is biased (in small samples) as well as inconsistent (in large samples). This just about destroys the usefulness of OLS as an estimating method in the context of simultaneous equation models. Obviously, we need to explore other estimating methods. We discuss an alternative method in the following section. In passing, note that if an explanatory variable in a regression equation is correlated with the error term in that equation, that variable essentially becomes a random, or stochastic, variable. In most of the regression models considered previously, we either assumed that the explanatory variables assume fixed values, or if they were random, that they were uncorrelated with the error term. This is not the case in the present instance.

Before proceeding further, notice an interesting feature of Equation (11.7): It expresses Y (income) as a function of I (investment), which is given exogenously, and error term u. Such an equation, which expresses an endogenous variable solely as a function of an exogenous variable(s) and the error term, is known as a reduced form equation (regression). We will see the utility of such reduced form equations shortly.

If we now substitute Y from Eq. (11.7) into the consumption function (11.1), we obtain the reduced form equation for C as

$$C_t = \frac{B_1}{1 - B_2} + \frac{B_2}{1 - B_2} I_t + \frac{1}{1 - B_2} u_t \tag{11.8}$$

As in Eq. (11.7), this equation expresses the endogenous variable C (consumption) solely as a function of the exogenous variable I and the error term.

11.3 THE METHOD OF INDIRECT LEAST SQUARES (ILS)

For reasons just stated, we should not use OLS to estimate the parameters B1 and B2 of the consumption function (11.1) because of correlation between Y and u. What is the alternative? The alternative can be found in Equation (11.8). Why not simply regress C on I, using the method of OLS? We could do that, because I, being exogenous by assumption, is uncorrelated with u; this was not the case with the original consumption function (11.1).

But how does the regression (11.8) enable us to estimate the parameters of the original consumption function (11.1), the object of our primary interest? This is easy enough. Let us write Eq. (11.8) as

$$C_t = A_1 + A_2 I_t + v_t \tag{11.9}$$


where $A_1 = B_1/(1 - B_2)$, $A_2 = B_2/(1 - B_2)$, and $v_t = u_t/(1 - B_2)$. Like u, v is also a stochastic error term; it is simply a rescaled u. The coefficients A1 and A2 are known as the reduced form coefficients because they are the coefficients attached to the reduced form (regression) equation. Observe that the reduced form coefficients are (nonlinear) combinations of the original structural coefficients of consumption function (11.1).

Now from the relationship between the A and B coefficients just given, and noting that $1 + A_2 = 1/(1 - B_2)$, it is easy to verify that

$$B_1 = \frac{A_1}{1 + A_2} \tag{11.10}$$

$$B_2 = \frac{A_2}{1 + A_2} \tag{11.11}$$

Therefore, once we estimate A1 and A2, we can easily “retrieve” B1 and B2 from them.

This method of obtaining the estimates of the parameters of the consumption function (11.1) is known as the method of indirect least squares (ILS), for we obtain the estimates of the original parameters indirectly by first applying OLS to the reduced form regression (11.9). What are the statistical properties of ILS estimators? We state (without proof) that the ILS estimators are consistent estimators; that is, as the sample size increases indefinitely, these estimators converge to their true population values. However, in small, or finite, samples, the ILS estimators may be biased. In contrast, the OLS estimators are biased as well as inconsistent.3

11.4 INDIRECT LEAST SQUARES: AN ILLUSTRATIVE EXAMPLE

As an application of the ILS, consider the data given in Table 11-1 on the textbook’s Web site. The data on consumption, income, and investment are for the United States for the years 1959 to 2006 and are given in billions of dollars. It should be noted that the data on income are simply the sum of consumption and investment expenditure, in keeping with our simple Keynesian model of income determination.

Following our discussion of ILS, we first estimate the reduced form regression (11.8). Using the data given in Table 11-1, we obtain the following results; the results are given in the standard format as per Eq. (3.46).

$$\hat{C}_t = -97.4641 + 4.2767 I_t \tag{11.12}$$

se = (69.4198) (0.0729)
t = (−1.4040) (58.6475)    $r^2$ = 0.9868


3For a proof of these statements, consult Gujarati and Porter, Basic Econometrics, 5th ed., McGraw-Hill, New York, 2009, Chapter 18.


Thus $a_1 = -97.4641$ and $a_2 = 4.2767$, which are respectively the estimates of $A_1$ and $A_2$, the parameters of the reduced form regression (11.8). Now we use Equations (11.10) and (11.11) to obtain the estimates of $B_1$ and $B_2$, the parameters of the consumption function (11.1):

$$b_1 = \frac{a_1}{1 + a_2} = \frac{-97.4641}{1 + 4.2767} = -18.4707 \tag{11.13}$$

$$b_2 = \frac{a_2}{1 + a_2} = \frac{4.2767}{1 + 4.2767} = 0.8105 \tag{11.14}$$

These are the ILS estimates of the parameters of the consumption function. And the estimated consumption function now is

$$\hat{C}_t = -18.4707 + 0.8105 Y_t \tag{11.15}$$

Thus, the estimated marginal propensity to consume (MPC) is about 0.81. For comparison, we give the results based on OLS, that is, the results obtained by directly regressing C on Y without the intermediary of the reduced form:

$$\hat{C}_t = -24.6841 + 0.8121 Y_t \tag{11.16}$$

se = (12.8715) (0.0026)
t = (−1.9177) (312.8214)    $r^2$ = 0.9995

Note the difference between the ILS and OLS estimates of the parameters of the consumption function. Although the estimated marginal propensities to consume do not differ substantially, there is a difference in the estimated intercept values. Which results should we trust? We should trust the results obtained from the method of ILS, for we know that in the presence of the simultaneity problem, the OLS results are not only biased but are inconsistent as well.4

It would seem that we can always use the method of indirect least squares to estimate the parameters of simultaneous equation models. The question is whether we can retrieve the original structural parameters from these reduced form estimates. Sometimes we can, and sometimes we cannot. The answer depends on the so-called identification problem. In the following section we discuss this problem and then in the ensuing sections we discuss other methods of estimating the parameters of the simultaneous equation models.

354 PART THREE: ADVANCED TOPICS IN ECONOMETRICS

4Notice that we have given standard errors and t values for the OLS regression (11.16) but not for the ILS regression (11.15). This is because the coefficients of the latter, obtained from Eqs. (11.13) and (11.14), are nonlinear functions of a1 and a2, and there is no simple method of obtaining stan- dard errors of nonlinear functions.

guj75845_ch11.qxd 4/16/09 12:23 PM Page 354
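The ILS arithmetic in Eqs. (11.13) and (11.14) can be checked with a few lines of Python. This is only a sketch: the function name `ils_consumption` is our own, and the inputs are the reduced form estimates reported in Eq. (11.12).

```python
# Reduced form estimates reported in Eq. (11.12)
a1 = -97.4641  # intercept of the regression of C on I
a2 = 4.2767    # slope on investment I


def ils_consumption(a1, a2):
    """Map reduced form estimates (a1, a2) to the consumption-function
    estimates (b1, b2), following Eqs. (11.13) and (11.14)."""
    b1 = a1 / (1 + a2)
    b2 = a2 / (1 + a2)
    return b1, b2


b1, b2 = ils_consumption(a1, a2)
print(f"b1 = {b1:.4f}, b2 = {b2:.4f}")
```

Both estimates are ratios of the reduced form coefficients, which is why, as footnote 4 points out, their standard errors do not follow directly from those of $a_1$ and $a_2$.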

11.5 THE IDENTIFICATION PROBLEM: A ROSE BY ANY OTHER NAME MAY NOT BE A ROSE

Let us return to the supply and demand model of Example 11.2. Suppose we have data on P and Q only, and we want to estimate the demand function. Suppose we regress Q on P. How do we know that this regression in fact estimates a demand function? You might say that if the slope of the estimated regression is negative, it is a demand function because of the inverse relationship between price and quantity demanded. But suppose the slope coefficient turns out to be positive. What then? Do you then say that it must be a supply function because there is a positive relationship between price and quantity supplied?

You can see the potential problem involved in simply regressing quantity on price: A given $P_t$ and $Q_t$ combination represents simply the point of intersection of the appropriate supply and demand curves because of the equilibrium condition that demand is equal to supply. To see this more clearly, consider Figure 11-2.

Figure 11-2(a) gives a few scatterpoints relating P to Q. Each scatterpoint represents the intersection of a demand and supply curve, as shown in Figure 11-2(b).

FIGURE 11-2 Hypothetical supply and demand functions and the identification problem. [Figure not reproduced. Each panel plots price (P, vertical axis) against quantity (Q, horizontal axis): (a) scatterpoints only; (b) each scatterpoint shown as the intersection of a demand curve D and a supply curve S; (c) a single point with a family of demand curves D1 to D3 and supply curves S1 to S3; (d) one supply curve S crossed by shifting demand curves D1 to D5; (e) one demand curve D crossed by shifting supply curves S1 to S5.]

Now consider a single point, such as that shown in Figure 11-2(c). There is no way we can be sure which demand and supply curve of the whole family of curves shown in that panel generated that particular point. Clearly, some additional information about the nature of the demand and supply curves is needed. For example, if the demand curve shifts over time because of a change in income or tastes but the supply curve remains relatively stable, as in Figure 11-2(d), the scatterpoints trace out a supply curve. In this situation, we say that the supply curve is identified; that is, we can uniquely estimate the parameters of the supply curve. By the same token, if the supply curve shifts over time because of weather factors (in the case of agricultural commodities) or other extraneous factors but the demand curve remains relatively stable, as in Figure 11-2(e), the scatterpoints trace out a demand curve. In this case, we say that the demand curve is identified; that is, we can uniquely estimate its parameters.

The identification problem therefore addresses whether we can estimate the parameters of the particular equation (be it a demand or a supply function) uniquely. If that is the case, we say that the particular equation is exactly identified. If we cannot estimate the parameters, we say that the equation is unidentified or underidentified. Sometimes it can happen that there is more than one numerical value for one or more parameters of the equation. In that case, we say that the equation is overidentified. We will now consider each of these cases briefly.
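The logic of Figure 11-2(d) and (e) can be illustrated with a small simulation. The sketch below is not from the text: the structural values (demand slope of -2, supply slope of 1) and the function names are assumptions chosen only for illustration. It generates market-clearing (P, Q) pairs and regresses Q on P under three scenarios.

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def slope_of_q_on_p(sd_demand, sd_supply, n=5000, seed=0):
    """Simulate equilibrium (P, Q) pairs and regress Q on P.

    Hypothetical structural values (assumptions for illustration only):
      demand: Q = A1 + A2*P + u1,   supply: Q = B1 + B2*P + u2
    """
    A1, A2 = 100.0, -2.0
    B1, B2 = 10.0, 1.0
    rng = random.Random(seed)
    prices, quantities = [], []
    for _ in range(n):
        u1 = rng.gauss(0.0, sd_demand)  # shifts the demand curve
        u2 = rng.gauss(0.0, sd_supply)  # shifts the supply curve
        p = (B1 - A1 + u2 - u1) / (A2 - B2)  # market-clearing price
        q = B1 + B2 * p + u2                 # quantity read off the supply curve
        prices.append(p)
        quantities.append(q)
    return ols_slope(prices, quantities)


print("demand shifts, supply stable:", round(slope_of_q_on_p(10.0, 0.1), 3))
print("supply shifts, demand stable:", round(slope_of_q_on_p(0.1, 10.0), 3))
print("both curves shift equally:   ", round(slope_of_q_on_p(10.0, 10.0), 3))
```

Under these assumptions the first slope should land close to the supply slope, the second close to the demand slope, and the third somewhere in between, matching neither curve. The same regression of Q on P therefore means different things depending on which curve is moving.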

Underidentification

Consider once again Example 11.2. By the equilibrium condition that supply equals demand, we obtain

$$
A_1 + A_2P_t + u_{1t} = B_1 + B_2P_t + u_{2t} \tag{11.17}
$$

Solving Equation (11.17), we obtain the equilibrium price

$$
P_t = \Pi_1 + v_{1t} \tag{11.18}
$$

where

$$
\Pi_1 = \frac{B_1 - A_1}{A_2 - B_2} \tag{11.19}
$$

$$
v_{1t} = \frac{u_{2t} - u_{1t}}{A_2 - B_2} \tag{11.20}
$$

Here $v_1$ is a stochastic error term, which is a linear combination of the u’s. The symbol $\Pi$ is read as pi and is used here to represent a reduced form regression coefficient.

Substituting $P_t$ from Equation (11.18) into either the supply or demand function of Example 11.2, we obtain the following equilibrium quantity:

$$
Q_t = \Pi_2 + v_{2t} \tag{11.21}
$$

where

$$
\Pi_2 = \frac{A_2B_1 - A_1B_2}{A_2 - B_2} \tag{11.22}
$$

$$
v_{2t} = \frac{A_2u_{2t} - B_2u_{1t}}{A_2 - B_2} \tag{11.23}
$$

and $v_2$ is also a stochastic, or random, error term.

Equations (11.18) and (11.21) are reduced form regressions. Now our demand and supply model has four structural coefficients, $A_1$, $A_2$, $B_1$, and $B_2$, but there is no unique way of estimating them from the two reduced form coefficients, $\Pi_1$ and $\Pi_2$. As elementary algebra teaches us, to estimate four unknowns we must have four (independent) equations. Incidentally, if we run the reduced form regressions (11.18) and (11.21) we see that there are no explanatory variables, only the constants, the $\Pi$’s, and these constants will simply give the mean values of P and Q. (Why?) There is no way of estimating the four structural coefficients from the two mean values. In short, both the demand and supply functions are unidentified.

Just or Exact Identification

We have already considered this case in the previous section where we discussed the estimation of the Keynesian consumption function using the method of indirect least squares. As shown there, from the reduced form regression (11.12), we were able to obtain unique values of the parameters of the consumption function, as can be seen from Eqs. (11.13) and (11.14).

To further illustrate exact identification, let us continue with our demand and supply example, but now we modify the model as follows:

$$
\text{Demand function:}\quad Q_t^d = A_1 + A_2P_t + A_3X_t + u_{1t} \tag{11.24}
$$

$$
\text{Supply function:}\quad Q_t^s = B_1 + B_2P_t + u_{2t} \tag{11.25}
$$

where in addition to the variables already defined, X = income of the consumer. Thus, the demand function states that the quantity demanded is a function of its price as well as the income of the consumer; economic theory of demand generally has price and income as its two main determinants. The inclusion of the income variable in the model will give us some additional information about consumer behavior. It is assumed that the income of the consumer is determined exogenously.

Using the market-clearing mechanism, quantity demanded = quantity supplied, we obtain

$$
A_1 + A_2P_t + A_3X_t + u_{1t} = B_1 + B_2P_t + u_{2t} \tag{11.26}
$$

Solving Equation (11.26) provides the following equilibrium value of $P_t$:

$$
P_t = \Pi_1 + \Pi_2X_t + v_{1t} \tag{11.27}
$$

where the reduced form coefficients are

$$
\Pi_1 = \frac{B_1 - A_1}{A_2 - B_2} \tag{11.28}
$$

$$
\Pi_2 = -\frac{A_3}{A_2 - B_2} \tag{11.29}
$$

$$
v_{1t} = \frac{u_{2t} - u_{1t}}{A_2 - B_2} \tag{11.30}
$$

Substituting the equilibrium value of $P_t$ into the preceding demand or supply function, we obtain the following equilibrium, or market-clearing, quantity:

$$
Q_t = \Pi_3 + \Pi_4X_t + v_{2t} \tag{11.31}
$$

where

$$
\Pi_3 = \frac{A_2B_1 - A_1B_2}{A_2 - B_2} \tag{11.32}
$$

$$
\Pi_4 = -\frac{A_3B_2}{A_2 - B_2} \tag{11.33}
$$

$$
v_{2t} = \frac{A_2u_{2t} - B_2u_{1t}}{A_2 - B_2} \tag{11.34}
$$

Since Equations (11.27) and (11.31) are both reduced form regressions, as noted before, OLS can always be applied to estimate their parameters. The question that remains is whether we can uniquely estimate the parameters of the structural equations from the reduced form coefficients.

Observe that the demand and supply models (11.24) and (11.25) contain five structural coefficients, $A_1$, $A_2$, $A_3$, $B_1$, and $B_2$. But we have only four equations to estimate them: the four reduced form coefficients, the four $\Pi$’s. So, we cannot obtain unique values of all five of the structural coefficients. But which of these coefficients can be uniquely estimated? The reader can verify that the parameters of the supply function can be uniquely estimated, for

$$
B_1 = \Pi_3 - B_2\Pi_1 \tag{11.35}
$$

$$
B_2 = \frac{\Pi_4}{\Pi_2} \tag{11.36}
$$

Therefore, the supply function is exactly identified. But the demand function is unidentified because there is no unique way of estimating its parameters, the A coefficients.
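The claim that the supply function is exactly identified while the demand function is not can be checked numerically. The following sketch uses hypothetical structural values (all numbers and function names are our own assumptions) and exact rational arithmetic. It computes the four reduced form coefficients of Eqs. (11.28), (11.29), (11.32), and (11.33), recovers the supply parameters through Eqs. (11.35) and (11.36), and then constructs a second, different demand function that yields exactly the same reduced form.

```python
from fractions import Fraction as F


def reduced_form(A1, A2, A3, B1, B2):
    """Reduced form coefficients (Pi1, Pi2, Pi3, Pi4) of Eqs. (11.27) and (11.31)."""
    d = A2 - B2
    return (B1 - A1) / d, -A3 / d, (A2 * B1 - A1 * B2) / d, -A3 * B2 / d


def supply_from_reduced_form(pi1, pi2, pi3, pi4):
    """Recover (B1, B2) via Eqs. (11.35) and (11.36)."""
    B2 = pi4 / pi2
    B1 = pi3 - B2 * pi1
    return B1, B2


def alternative_demand(pi1, pi2, B1, B2, A2_new):
    """Another demand function (A1, A2, A3) consistent with the same Pi's."""
    d = A2_new - B2
    return B1 - pi1 * d, A2_new, -pi2 * d


# Hypothetical structural values, chosen only for illustration
A1, A2, A3 = F(100), F(-2), F(1, 2)
B1, B2 = F(10), F(1)

pis = reduced_form(A1, A2, A3, B1, B2)
print("supply recovered:", supply_from_reduced_form(*pis))

# A different demand slope gives a different demand function, same reduced form
other = alternative_demand(pis[0], pis[1], B1, B2, F(-5))
print("same reduced form:", reduced_form(*other, B1, B2) == pis)
```

Because two distinct sets of A coefficients generate identical $\Pi$’s, no amount of data on P, Q, and X can tell them apart, whereas the supply parameters come back uniquely.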

Observe an interesting fact: It is the presence of an additional variable in the demand function that enables us to identify the supply function. Why? The inclusion of the income variable in the demand equation provides us with some additional information about the variability of the function, as indicated in Figure 11-2(d). The figure shows how the intersection of the stable supply curve with the shifting demand curve (due to changes in income) enables us to trace (identify) the supply curve.

How can the demand function be identified? Suppose we include $P_{t-1}$, the one-period lagged value of price, as an additional variable in the supply function (11.25). This amounts to saying that the supply depends not only on the current price but also on the price prevailing in the previous period, not an unreasonable assumption for many agricultural commodities. Since at time t the value of $P_{t-1}$ is already known, we can treat it as an exogenous, or predetermined, variable. Thus the new model is

$$
\text{Demand function:}\quad Q_t^d = A_1 + A_2P_t + A_3X_t + u_{1t} \tag{11.37}
$$

$$
\text{Supply function:}\quad Q_t^s = B_1 + B_2P_t + B_3P_{t-1} + u_{2t} \tag{11.38}
$$

Using Equations (11.37) and (11.38) and the market-clearing condition, obtain the reduced form regressions and verify that now both the demand and supply functions are identified; each reduced form regression will have $X_t$ and $P_{t-1}$ as explanatory variables, and since the values of these variables are determined outside the model, they are uncorrelated with the error terms. Once again notice how the inclusion or exclusion of a variable(s) from an equation helps us to identify that equation, that is, to obtain unique values of the parameters of that equation. Thus it is the exclusion of the $P_{t-1}$ variable from the demand function that helps us to identify it, just as the exclusion of the income variable ($X_t$) from the supply function helps us to identify it. One implication is that an equation in a simultaneous equation system cannot be identified if it includes all the variables (endogenous as well as exogenous) in the system. Later we provide a simple rule of identification that generalizes this idea (see Section 11.6).

Overidentification

Although the exclusion of certain variables from an equation may enable us to identify it as we just showed, sometimes we can overdo it. This leads to the problem of overidentification, a situation in which there is more than one value for one or more parameters of an equation in the model. Let us see how this can happen.

Once again return to the demand-supply model and write it as

$$
\text{Demand function:}\quad Q_t^d = A_1 + A_2P_t + A_3X_t + A_4W_t + u_{1t} \tag{11.39}
$$

$$
\text{Supply function:}\quad Q_t^s = B_1 + B_2P_t + B_3P_{t-1} + u_{2t} \tag{11.40}
$$

where in addition to the variables introduced previously, $W_t$ stands for the wealth of the consumer. For many commodities, income as well as wealth are important determinants of demand. Compare the demand and supply models (11.37) and (11.38) with the models (11.39) and (11.40). Whereas originally the supply function excluded only the income variable, in the new model it excludes both the income and wealth variables. Before, the exclusion of the income variable from the supply function enabled us to identify it; now the exclusion of both the income and wealth variables from the supply function overidentifies it in the sense that we have two estimates of the supply parameter $B_2$, as we show below.

Equating models (11.39) and (11.40), we now obtain the following reduced form regressions:

$$
P_t = \Pi_1 + \Pi_2X_t + \Pi_3W_t + \Pi_4P_{t-1} + v_{1t} \tag{11.41}
$$

$$
Q_t = \Pi_5 + \Pi_6X_t + \Pi_7W_t + \Pi_8P_{t-1} + v_{2t} \tag{11.42}
$$

where

$$
\begin{aligned}
\Pi_1 &= \frac{B_1 - A_1}{A_2 - B_2} & \Pi_2 &= -\frac{A_3}{A_2 - B_2} \\
\Pi_3 &= -\frac{A_4}{A_2 - B_2} & \Pi_4 &= \frac{B_3}{A_2 - B_2} \\
\Pi_5 &= \frac{A_2B_1 - A_1B_2}{A_2 - B_2} & \Pi_6 &= -\frac{A_3B_2}{A_2 - B_2} \\
\Pi_7 &= -\frac{A_4B_2}{A_2 - B_2} & \Pi_8 &= \frac{A_2B_3}{A_2 - B_2} \\
v_{1t} &= \frac{u_{2t} - u_{1t}}{A_2 - B_2} & v_{2t} &= \frac{A_2u_{2t} - B_2u_{1t}}{A_2 - B_2}
\end{aligned}
\tag{11.43}
$$

Remember that the supply and demand models we are considering have seven structural coefficients in all: the four A’s and three B’s. But there are eight reduced form coefficients in Equation (11.43). We have more equations than unknowns. Clearly, there is more than one solution to a parameter. You can readily verify that we have, in fact, two values for $B_2$:

$$
B_2 = \frac{\Pi_7}{\Pi_3} \quad\text{or}\quad B_2 = \frac{\Pi_6}{\Pi_2} \tag{11.44}
$$

And there is no reason to believe that these two estimates will be the same.

Since $B_2$ appears in the denominators of all the reduced form coefficients given in Eq. (11.43), the ambiguity in the estimation of $B_2$ will be transmitted to other structural coefficients also. Why do we obtain such a result? It seems that we have too much information: exclusion of either the income or wealth variable would have sufficed to identify the supply function. This is the opposite of the case of underidentification, where there was too little information. The point here is that more information may not always be better! Note, though, that the problem of overidentification occurs not because we are deliberately adding more variables. It is simply that sometimes theory tells us what variables to include or exclude from an equation, and the equation then ends up either unidentified or identified (either exactly or over).
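The two routes to $B_2$ in Eq. (11.44) can be compared numerically. In the sketch below, the structural values and the "estimated" coefficients are hypothetical numbers of our own, not results from any data set. With the exact reduced form coefficients of Eq. (11.43) both ratios return the same $B_2$; with coefficients that carry some sampling error they generally do not.

```python
from fractions import Fraction as F


def reduced_form(A1, A2, A3, A4, B1, B2, B3):
    """Reduced form coefficients Pi_1 ... Pi_8 of Eq. (11.43), keyed by index."""
    d = A2 - B2
    return {
        1: (B1 - A1) / d, 2: -A3 / d, 3: -A4 / d, 4: B3 / d,
        5: (A2 * B1 - A1 * B2) / d, 6: -A3 * B2 / d,
        7: -A4 * B2 / d, 8: A2 * B3 / d,
    }


def two_estimates_of_B2(pi):
    """The two expressions for B2 given in Eq. (11.44)."""
    return pi[7] / pi[3], pi[6] / pi[2]


# Exact coefficients from hypothetical structural values (true B2 = 1)
true_pi = reduced_form(F(100), F(-2), F(1, 2), F(1, 4), F(10), F(1), F(1, 2))
print("exact coefficients:    ", two_estimates_of_B2(true_pi))

# Hypothetical estimated coefficients, each slightly off its true value
est_pi = {2: 0.17, 3: 0.08, 6: 0.16, 7: 0.09}
print("estimated coefficients:", two_estimates_of_B2(est_pi))
```

The disagreement in the second line is the practical face of overidentification: ILS gives no rule for choosing between the two numbers.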

In summary, an equation in a simultaneous equation model may be unidentified, exactly identified, or overidentified. There is nothing we can do about underidentification, assuming the model is correct. Underidentification is not a statistical problem that can be solved with a larger sample size. You can look at those four dots in Figure 11-2(a) all year long, but they will never tell you the slope of the supply and demand curves that generated them. If an equation is exactly identified, we can use the method of indirect least squares (ILS) to estimate its parameters. If an equation is overidentified, ILS will not provide unique estimates of the parameters. Fortunately, we can use the method of two-stage least squares (2SLS) to estimate the parameters of an overidentified equation. But before we turn to 2SLS, we would like to find out if there is a systematic way to determine whether an equation is underidentified, exactly identified, or overidentified; the method of reduced form regression to determine identification is rather cumbersome, especially if the model contains several equations.
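To see why a larger sample cannot cure underidentification, consider the model without any exogenous variables, whose reduced form consists only of the two constants $\Pi_1 = (B_1 - A_1)/(A_2 - B_2)$ and $\Pi_2 = (A_2B_1 - A_1B_2)/(A_2 - B_2)$. The sketch below uses two hypothetical sets of structural values (our own assumptions) that describe quite different demand and supply curves yet imply the same pair of constants.

```python
from fractions import Fraction as F


def reduced_form_constants(A1, A2, B1, B2):
    """Reduced form constants of the model with no exogenous variables:
    the mean equilibrium price (Pi1) and mean equilibrium quantity (Pi2)."""
    d = A2 - B2
    pi1 = (B1 - A1) / d
    pi2 = (A2 * B1 - A1 * B2) / d
    return pi1, pi2


# Two hypothetical markets with different slopes and intercepts
model_a = dict(A1=F(100), A2=F(-2), B1=F(10), B2=F(1))
model_b = dict(A1=F(70), A2=F(-1), B1=F(-20), B2=F(2))

print(reduced_form_constants(**model_a))
print(reduced_form_constants(**model_b))
```

Both parameter sets pass through the same equilibrium point, so the mean values of P and Q, which are all the reduced form delivers here, cannot distinguish one from the other.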

11.6 RULES FOR IDENTIFICATION: THE ORDER CONDITION OF IDENTIFICATION

To understand the so-called order condition of identification, we introduce the following notation:

m = number of endogenous (or jointly dependent) variables in the model
k = total number of variables (endogenous and exogenous) excluded from the equation under consideration

Then,

1. If k = m - 1, the equation is exactly identified.
2. If k > m - 1, the equation is overidentified.
3. If k < m - 1, the equation is underidentified.

To apply the order condition, all we have to do is to count the number of endogenous variables (= number of equations in the model) and the total number of variables (endogenous as well as exogenous) excluded from the particular equation under consideration. Although the order condition of identification is only necessary and not sufficient, in most practical applications it has been found to be very helpful.
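The counting rule above is simple enough to write as a small function. This is a sketch only: the function name is our own, and because the order condition is necessary but not sufficient, the labels it returns should be read as what the order condition indicates rather than as a proof of identification.

```python
def order_condition(m, k):
    """Classify an equation by the order condition.

    m: number of endogenous variables in the model
    k: number of variables (endogenous and exogenous) excluded
       from the equation under consideration
    """
    if k == m - 1:
        return "exactly identified"
    if k > m - 1:
        return "overidentified"
    return "underidentified"


# Model (11.24)-(11.25): endogenous P and Q (m = 2), exogenous X.
print("supply:", order_condition(m=2, k=1))  # supply excludes X
print("demand:", order_condition(m=2, k=0))  # demand excludes nothing
```

For the model (11.24) and (11.25), this reproduces the conclusion reached earlier from the reduced form: the supply function is exactly identified and the demand function is not identified.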

Thus, applying the order condition to the supply and demand models (11.39) and (11.40), we see that m = 2 and that the supply function excludes the variables $X_t$ and $W_t$; that is, k = 2. Since k > m - 1, the supply equation is overidentified. As for the demand function, it excludes $P_{t-1}$; that is, k = 1. Since k = m - 1, the demand function is identified. But we now have a slight complication. If we try to estimate the parameters of the demand function from the reduced form coefficients given in Equation (11.43), the estimates will not be unique because $B_2$, which enters into the computations, takes two values, as shown in Equation (11.44). This complication can, however, be avoided if we use the method of 2SLS, which we will now discuss.

11.7 ESTIMATION OF AN OVERIDENTIFIED EQUATION: THE METHOD OF TWO-STAGE LEAST SQUARES

To illustrate the method of two-stage least squares (2SLS), consider the following model:

$$
\text{Income function:}\quad Y_t = A_1 + A_2M_t + A_3I_t + A_4G_t + u_{1t} \tag{11.45}
$$

$$
\text{Money supply function:}\quad M_t = B_1 + B_2Y_t + u_{2t} \tag{11.46}
$$

where
Y = income
M = stock of money
I = investment expenditure
G = government expenditure on goods and services
u1, u2 = stochastic error terms

In this model, the variables I and G are assumed to be exogenous.

The income function, a hybrid of the quantity-theory and the Keynesian approaches to income determination, states that income is determined by the money supply, investment expenditure, and government expenditure. The money supply function states that the stock of the money supply is determined by the Federal Reserve System (FED) on the basis of the level of income. Obviously, we have a simultaneity problem here because of the feedback between income and money supply.

Applying the order condition of identification, we can check that the income equation is unidentified because it excludes no variable in the model, whereas the money supply function is overidentified because it excludes two variables in the system. (Note that m = 2 in this model.)

Since the income equation is underidentified, there is nothing we can do to estimate its parameters. What about the money supply function? Since it is overidentified, if we use ILS to estimate its parameters, we will not obtain unique estimates for the parameters; actually, $B_2$ will have two values. What about OLS? Because of the likely correlation between income Y and the stochastic error term $u_2$, OLS estimates will be inconsistent in view of our earlier discussion. What, then, is the alternative?

Suppose in the money supply function (11.46) we find a surrogate or proxy or an instrumental variable for Y such that, although resembling Y, it is uncorrelated with $u_2$. If we can find such a proxy, OLS can be used straightforwardly to estimate the parameters of the money supply function. (Why?) But how do we obtain such a proxy or instrumental variable? One answer is provided by the method of two-stage least squares (2SLS). As the name indicates, the method involves two successive applications of OLS. The process follows.

Stage 1 To get rid of the likely correlation between income Y and the error term $u_2$, first regress Y on all predetermined variables in the whole model, not just those in that equation. In the present case, this means regressing Y on the predetermined variables I (gross private domestic investment) and G (government expenditure) as follows:

$$
Y_t = \Pi_1 + \Pi_2I_t + \Pi_3G_t + w_t \tag{11.47}
$$

where w is a stochastic error term. From Equation (11.47), we obtain

$$
\hat{Y}_t = \hat{\Pi}_1 + \hat{\Pi}_2I_t + \hat{\Pi}_3G_t \tag{11.48}
$$

where $\hat{Y}_t$ is the estimated mean value of Y, given the values of I and G. Note that the ˆ over the coefficients indicates that these are the estimated values of the true $\Pi$’s.

Therefore we can write Eq. (11.47) as

$$
Y_t = \hat{Y}_t + w_t \tag{11.49}
$$

which shows that the (stochastic) Y consists of two parts: $\hat{Y}_t$, which from Equation (11.48) is a linear combination of the predetermined variables I and G, and a random component $w_t$. Following OLS theory, $\hat{Y}$ and w are therefore uncorrelated. (Why? See Problem 2.25.)

Stage 2 The overidentified money supply function can now be written as

$$
\begin{aligned}
M_t &= B_1 + B_2(\hat{Y}_t + w_t) + u_{2t} \\
    &= B_1 + B_2\hat{Y}_t + (u_{2t} + B_2w_t) \\
    &= B_1 + B_2\hat{Y}_t + v_t
\end{aligned}
\tag{11.50}
$$

where $v_t = u_{2t} + B_2w_t$.

Comparing Equations (11.50) and (11.46), we see that they are very similar in appearance, the only difference being that Y is replaced by $\hat{Y}$, the latter being obtained from Eq. (11.48). What is the advantage of this? It can be shown that although Y in the original money supply function (11.46) is likely to be correlated with the stochastic error term $u_2$ (hence rendering OLS inappropriate), $\hat{Y}$ in Eq. (11.50) is uncorrelated with $v_t$ asymptotically, that is, in a large sample (or, more accurately, as the sample size increases indefinitely). As a result, OLS can now be applied to Eq. (11.50), which will give consistent estimates of the parameters of the money supply function (11.46). This is an improvement over the direct application of OLS to Eq. (11.46), for in that situation the estimates are likely to be biased as well as inconsistent.5
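The two stages can be sketched on simulated data. Everything numerical below is an assumption made for illustration (the structural values, the distributions of I and G, the sample size), and the small `ols` helper is our own, written with the standard library so that the example is self-contained. It is not the textbook's data or software. The data are generated from the model (11.45) and (11.46) with a true $B_2$ of 0.5, and the money supply function is then estimated by 2SLS and by direct OLS.

```python
import random


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ols(columns, y):
    """OLS with an intercept; returns [intercept, slope_1, slope_2, ...]."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty)


# Hypothetical structural values (assumptions for illustration only)
A1, A2, A3, A4 = 10.0, 0.5, 2.0, 3.0   # income function (11.45)
B1, B2 = 5.0, 0.5                      # money supply function (11.46)

rng = random.Random(1)
n = 5000
I = [rng.gauss(50.0, 2.0) for _ in range(n)]
G = [rng.gauss(40.0, 2.0) for _ in range(n)]
Y, M = [], []
for i in range(n):
    u1, u2 = rng.gauss(0.0, 5.0), rng.gauss(0.0, 5.0)
    # Solve the two structural equations jointly for Y, then compute M
    y = (A1 + A2 * B1 + A3 * I[i] + A4 * G[i] + u1 + A2 * u2) / (1 - A2 * B2)
    Y.append(y)
    M.append(B1 + B2 * y + u2)

# Stage 1: regress Y on all predetermined variables (I and G), as in (11.47)
p1, p2, p3 = ols([I, G], Y)
Y_hat = [p1 + p2 * i_t + p3 * g_t for i_t, g_t in zip(I, G)]

# Stage 2: regress M on the fitted values of Y, as in (11.50)
b1_2sls, b2_2sls = ols([Y_hat], M)

# For comparison: OLS applied directly to (11.46)
b1_ols, b2_ols = ols([Y], M)

print(f"true B2 = {B2}, 2SLS = {b2_2sls:.3f}, OLS = {b2_ols:.3f}")
```

In this setup the direct OLS slope should come out noticeably above 0.5, because Y is positively correlated with $u_2$, while the 2SLS slope should sit close to the true value. Note that standard errors computed mechanically from the second-stage regression are not the correct 2SLS standard errors, so the sketch reports point estimates only.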

11.8 2SLS: A NUMERICAL EXAMPLE

Let us continue with the money supply and income models of Equations (11.45) and (11.46). Table 11-2 in Problem 11.18 (found on the textbook’s Web site) gives data on Y (income, as measured by GDP), M (money supply, as measured by the M2 measure of money supply), I (investment as measured by gross private domestic investment, GPDI), and G (federal government expenditure). The data are in billions of dollars, except the interest rate (as measured by the 6-month Treasury bill rate), which is a percentage. The data on interest rates are given for some problems at the end of the chapter. These data are annual and are for the period 1965–2006.

Stage 1 Regression To estimate the parameters of the money supply function (11.46), we first regress the stochastic variable Y (income) on the proxy variables I and G, which are treated as exogenous or predetermined. The results of this regression are

$$
\begin{aligned}
\hat{Y}_t &= -162.0426 + 2.6019\,I_t + 3.2250\,G_t \\
\text{se} &= (54.0655)\quad(0.3278)\quad(0.2869) \\
t &= (-2.9972)\quad(7.9377)\quad(11.2397) \qquad R^2 = 0.9975
\end{aligned}
\tag{11.51}
$$

Interpret these results in the usual manner. Notice that all the coefficients are statistically significant at the 5% level of significance.

Stage 2 Regression We estimate the money supply function (11.46) by regressing M not on the original income Y but on the Y as estimated in Eq. (11.51). The results are as follows.6

$$
\begin{aligned}
\hat{M}_t &= 151.1360 + 0.5163\,\hat{Y}_t \\
\text{se} &= (35.9740)\quad(0.0057) \\
t &= (4.2013)\quad(89.9646) \qquad r^2 = 0.9951
\end{aligned}
\tag{11.52}
$$

Note: Observe that there is a ˆ on Y on the right-hand side.

5For further discussion of this somewhat technical point, see Gujarati and Porter, Basic Econometrics, 5th ed., McGraw-Hill, New York, 2009, Chapter 20.

6These standard errors are corrected to reflect the nature of the error term $v_t$. This is a technical point. Consult Gujarati and Porter, Basic Econometrics, 5th ed., McGraw-Hill, New York, 2009, p. 736.

OLS Regression For a comparison, we give the results of the regression (11.46) based on the inappropriately applied OLS:

$$
\begin{aligned}
\hat{M}_t &= 159.3544 + 0.5147\,Y_t \\
\text{se} &= (47.7531)\quad(0.0076) \\
t &= (3.3370)\quad(67.5898) \qquad r^2 = 0.9913
\end{aligned}
\tag{11.53}
$$

Comparing the 2SLS and the OLS results, you might say that the results are not vastly different. This may be so in the present case, but there is no guarantee that this always will be the case. Besides, we know that in theory 2SLS is better than OLS, especially in large samples.

We conclude our somewhat nontechnical discussion of the simultaneous equation models by noting that besides ILS and 2SLS there are other methods of estimating such models. But a discussion of these methods (e.g., the method of full information maximum likelihood) is beyond the scope of this introductory book.7 Our primary purpose in this chapter was to introduce readers to the bare bones of the topic of simultaneous equation models to make them aware that on occasion we may have to go beyond the single equation regression modeling considered in the previous chapters.

7Refer to William H. Greene, Econometric Analysis, 3rd ed., Prentice-Hall, New Jersey, 1997, Chapter 16.

11.9 SUMMARY

In contrast to the single equation models discussed in the preceding chapters, in simultaneous equation regression models what is a dependent (endogenous) variable in one equation appears as an explanatory variable in another equation. Thus, there is a feedback relationship between the variables. This feedback creates the simultaneity problem, rendering OLS inappropriate to estimate the parameters of each equation individually. This is because the endogenous variable that appears as an explanatory variable in another equation may be correlated with the stochastic error term of that equation. This violates one of the critical assumptions of OLS that the explanatory variable be either fixed, or nonrandom, or if random, that it be uncorrelated with the error term. Because of this, if we use OLS, the estimates we obtain will be biased as well as inconsistent.

Besides the simultaneity problem, a simultaneous equation model may have an identification problem. An identification problem means we cannot uniquely estimate the values of the parameters of an equation. Therefore, before we estimate a simultaneous equation model, we must find out if an equation in such a model is identified.

One cumbersome method of finding out whether an equation is identified is to obtain the reduced form equations of the model. A reduced form equation expresses a dependent (or endogenous) variable solely as a function of exogenous, or predetermined, variables, that is, variables whose values are determined outside the model. If there is a one-to-one correspondence between the reduced form coefficients and the coefficients of the original equation, then the original equation is identified.

A shortcut to determining identification is via the order condition of identification. The order condition counts the number of equations in the model and the number of variables in the model (both endogenous and exogenous). Then, based on whether some variables are excluded from an equation but included in other equations of the model, the order condition decides whether an equation in the model is underidentified, exactly identified, or overidentified. An equation in a model is underidentified if we cannot estimate the values of the parameters of that equation. If we can obtain unique values of parameters of an equation, that equation is said to be exactly identified. If, on the other hand, the estimates of one or more parameters of an equation are not unique in the sense that there is more than one value of some parameters, that equation is said to be overidentified.

If an equation is underidentified, it is a dead-end case. There is not much we can do, short of changing the specification of the model (i.e., developing another model). If an equation is exactly identified, we can estimate it by the method of indirect least squares (ILS). ILS is a two-step procedure. In step 1, we apply OLS to the reduced form equations of the model; in step 2, we retrieve the original structural coefficients from the reduced form coefficients. ILS estimators are consistent; that is, as the sample size increases indefinitely, the estimators converge to their true values.

The parameters of the overidentified equation can be estimated by the method of two-stage least squares (2SLS). The basic idea behind 2SLS is to replace the explanatory variable that is correlated with the error term of the equation in which that variable appears by a variable that is not so correlated. Such a variable is called a proxy, or instrumental, variable. 2SLS estimators, like the ILS estimators, are consistent estimators.

KEY TERMS AND CONCEPTS

The key terms and concepts introduced in this chapter are

Simultaneous equation regression model
Endogenous variable
Exogenous, or predetermined, variable
Structural or behavioral equation
Identity
Simultaneity problem
Reduced form equation
Indirect least squares (ILS)
Identification problem
   a) Exact identification
   b) Unidentification or underidentification
   c) Overidentification
Two-stage least squares (2SLS)
Identification rules
   a) Order condition of identification

QUESTIONS

11.1. What is meant by the simultaneity problem?
11.2. What is the meaning of endogenous and exogenous variables?
11.3. Why is OLS generally inappropriate to estimate an equation embedded in a simultaneous equation model?
11.4. What happens if OLS is applied to estimate an equation in a simultaneous equation model?
11.5. What is meant by a reduced form (regression) equation? What is its use?
11.6. What is the meaning of a structural, or behavioral, equation?
11.7. What is meant by indirect least squares? When is it used?
11.8. What is the nature of the identification problem? Why is it important?
11.9. What is the order condition of identification?
11.10. What may be meant by the statement that the order condition of identification is a necessary but not sufficient condition for identification?
11.11. Explain carefully the meaning of (1) underidentification, (2) exact identification, and (3) overidentification.
11.12. How do we estimate an underidentified equation?
11.13. What method(s) is used to estimate an exactly identified equation?
11.14. What is 2SLS used for?
11.15. Can 2SLS also be used to estimate an exactly identified equation?

PROBLEMS

11.16. Consider the following two-equation model:

$$
\begin{aligned}
Y_{1t} &= A_1 + A_2Y_{2t} + A_3X_{1t} + u_{1t} \\
Y_{2t} &= B_1 + B_2Y_{1t} + B_3X_{2t} + u_{2t}
\end{aligned}
$$

where the Y’s are the endogenous variables, the X’s the exogenous variables, and the u’s the stochastic error terms.
a. Obtain the reduced form regressions.
b. Determine which of the equations is identified.
c. For the identified equation, which method of estimation would you use and why?
d. Suppose, a priori, it is known that $A_3 = 0$. How would your answers to the preceding questions change? Why?

11.17. Consider the following model:

$$
\begin{aligned}
Y_{1t} &= A_1 + A_2Y_{2t} + A_3X_{1t} + u_{1t} \\
Y_{2t} &= B_1 + B_2Y_{1t} + u_{2t}
\end{aligned}
$$

where the Y’s are the endogenous variables, the X’s the exogenous, and the u’s the stochastic error terms. Based on this model, the following reduced form regressions are obtained:

$$
\begin{aligned}
Y_{1t} &= 6 + 8X_{1t} \\
Y_{2t} &= 4 + 12X_{1t}
\end{aligned}
$$

a. Which structural coefficients, if any, can be estimated from these reduced form equations?
b. How will your answer change if it is known a priori that (1) $A_2 = 0$ and (2) $A_1 = 0$?

11.18. Consider the following model:

$$R_t = A_1 + A_2 M_t + A_3 Y_t + u_{1t}$$

$$Y_t = B_1 + B_2 R_t + u_{2t}$$

where Y = income (measured by gross domestic product, GDP), R = interest rate (measured by 6-month Treasury bill rate, %), and M = money supply (measured by M2). Assume that M is determined exogenously.

a. What economic rationale lies behind this model? (Hint: See any macroeconomics textbook.)

b. Are the preceding equations identified?

c. Using the data given in Table 11-2 (on the textbook’s Web site), estimate the parameters of the identified equation(s). Justify the method(s) you use.

11.19. Consider the following reformulation of the model given in Problem 11.18.

$$R_t = A_1 + A_2 M_t + A_3 Y_t + u_{1t}$$

$$Y_t = B_1 + B_2 R_t + B_3 I_t + u_{2t}$$

where, in addition to the variables defined in the preceding problem, I stands for investment (measured by gross private domestic investment, GPDI). Assume that M and I are exogenous.

a. Which of the preceding equations is identified?

b. Using the data in Table 11-2 (on the textbook’s Web site), estimate the parameters of the identified equation(s).

c. Comment on the difference in the results of this and the preceding problem.

11.20. Consider the wages data set used in Chapter 9 (see Table 9-2, on the textbook’s Web site). As a reminder: Wage = $, per hour; Occup = Occupation; Sector = 1 for manufacturing, 2 for construction, 0 for other; Union = 1 if union member, 0 otherwise; Education = years of schooling; Experience = work experience in years; Age = in years; Sex = 1 for female; Marital status = 1 if married; Race = 1 for other, 2 for Hispanic, 3 for white; Region = 1 if lives in the South.

Consider the following simple wage determination model:

$$\ln W = B_1 + B_2\,\text{Educ} + B_3\,\text{Exper} + B_4\,\text{Exper}^2 + u_i \tag{1}$$

Suppose education, like wages, is endogenous. How would you find out that in Equation (1) education is in fact endogenous? Use the data given in the table in your analysis.

11.21. Consider the following demand and supply model for loans of commercial banks to businesses:

$$\text{Demand:}\quad Q_t = \alpha_1 + \alpha_2 R_t + \alpha_3 RD_t + \alpha_4 IPI_t + u_{1t}$$

$$\text{Supply:}\quad Q_t = \beta_1 + \beta_2 R_t + \beta_3 RS_t + \beta_4 TBD_t + u_{2t}$$


where Q = total commercial bank loans ($ billion); R = average prime rate; RS = 3-month Treasury bill rate; RD = AAA corporate bond rate; IPI = Index of Industrial Production; and TBD = total bank deposits.

a. Collect data on these variables for the period 1980–2008 from various sources, such as www.economagic.com, the Web site of the Federal Reserve Bank of St. Louis, or any other source.

b. Are the demand and supply functions identified? List which variables are endogenous and which are exogenous.

c. How would you go about estimating the demand and supply functions listed above? Show the necessary calculations.

d. Why are both R and RS included in the model? What is the role of IPI in the model?

APPENDIX 11A: Inconsistency of OLS Estimators

To show that the OLS estimator b2 is an inconsistent estimator of B2 because of the correlation between Yt and ut, we start with the OLS estimator in Eq. (11.6):

$$b_2 = \frac{\sum (C_t - \bar{C})(Y_t - \bar{Y})}{\sum (Y_t - \bar{Y})^2} = \frac{\sum C_t y_t}{\sum y_t^2} \tag{11A.1}$$

where $y_t = (Y_t - \bar{Y})$. Now substituting for Ct from Eq. (11.1), we obtain

$$b_2 = \frac{\sum (B_1 + B_2 Y_t + u_t)\, y_t}{\sum y_t^2} = B_2 + \frac{\sum y_t u_t}{\sum y_t^2} \tag{11A.2}$$

where in the last step use is made of the fact that $\sum y_t = 0$ and $(\sum Y_t y_t / \sum y_t^2) = 1$. (Why?)

Taking the expectation of Equation (11A.2), we get

$$E(b_2) = B_2 + E\left[\frac{\sum y_t u_t}{\sum y_t^2}\right] \tag{11A.3}$$

Unfortunately, we cannot readily evaluate the expectation of the second term in Equation (11A.3), since the expectations operator E is a linear operator and does not pass through a ratio. (Note: $E[A/B] \neq E[A]/E[B]$.) But intuitively it should be clear that unless the second term in Eq. (11A.3) is zero, b2 is a biased estimator of B2.
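The note that the expectation of a ratio is not the ratio of expectations can be checked with a small exact calculation. The following Python sketch uses a toy distribution of our own choosing (A and B independent, each equal to 1 or 2 with probability 1/2); it is only an illustration and is not taken from the text.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy setup: A and B are independent and each takes the
# values 1 and 2 with probability 1/2, so the four (A, B) pairs are
# equally likely.
values = [Fraction(1), Fraction(2)]
pairs = list(product(values, values))

# E[A/B]: average of the ratio over the four equally likely pairs.
mean_of_ratio = sum(a / b for a, b in pairs) / len(pairs)

# E[A]/E[B]: ratio of the two (identical) means.
mean_a = sum(values) / len(values)
mean_b = sum(values) / len(values)
ratio_of_means = mean_a / mean_b

print("E[A/B]    =", mean_of_ratio)
print("E[A]/E[B] =", ratio_of_means)
```

Here E[A/B] works out to 9/8 while E[A]/E[B] is 1, which is why the second term of Eq. (11A.3) cannot be evaluated by taking expectations of its numerator and denominator separately.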


Not only is b2 biased, but it is inconsistent as well. An estimator is said to be consistent if its probability limit (plim) is equal to its true (population) value.8

Using the properties of the plim, we can express9

$$
\begin{aligned}
\operatorname{plim}(b_2) &= \operatorname{plim}(B_2) + \operatorname{plim}\left[\frac{\sum y_t u_t}{\sum y_t^2}\right] \\
&= B_2 + \operatorname{plim}\left[\frac{\sum y_t u_t / n}{\sum y_t^2 / n}\right] \\
&= B_2 + \frac{\operatorname{plim}\left(\sum y_t u_t / n\right)}{\operatorname{plim}\left(\sum y_t^2 / n\right)}
\end{aligned}
\tag{11A.4}
$$

where use is made of the properties of the plim operator that the plim of a constant (such as B2) is that constant itself and that the plim of the ratio of two entities is the ratio of the plim of those entities.

Now as n increases indefinitely, it can be shown that

$$\operatorname{plim}(b_2) = B_2 + \frac{1}{1 - B_2}\left(\frac{\sigma^2}{\sigma_Y^2}\right) \tag{11A.5}$$

where $\sigma^2$ is the variance of u and $\sigma_Y^2$ is the variance of Y. Since B2 (MPC) lies between 0 and 1, and since the two variance terms in Equation (11A.5) are positive, it is obvious from Eq. (11A.5) that plim (b2) will always be greater than B2; that is, b2 will overestimate B2 and the bias will not disappear no matter how large the sample size.
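The overestimation described by Eq. (11A.5) can be seen in a small simulation. The sketch below is only illustrative: it assumes the consumption function of Eq. (11.1) together with an income identity Y = C + I with exogenous investment I, and the parameter values (B1 = 10, B2 = 0.8, unit variances) are hypothetical choices of ours.

```python
import random

def simulate_ols_mpc(n, b1=10.0, b2=0.8, sd_i=1.0, sd_u=1.0, seed=0):
    """OLS slope from regressing C on Y when Y is determined jointly with C.

    Assumed model (a sketch): C = b1 + b2*Y + u and Y = C + I,
    so the reduced form for income is Y = (b1 + I + u) / (1 - b2).
    """
    rng = random.Random(seed)
    cons, inc = [], []
    for _ in range(n):
        invest = 20.0 + rng.gauss(0.0, sd_i)   # exogenous investment
        u = rng.gauss(0.0, sd_u)               # consumption shock
        y = (b1 + invest + u) / (1.0 - b2)     # income depends on u
        cons.append(b1 + b2 * y + u)
        inc.append(y)
    y_bar = sum(inc) / n
    c_bar = sum(cons) / n
    num = sum((c - c_bar) * (y - y_bar) for c, y in zip(cons, inc))
    den = sum((y - y_bar) ** 2 for y in inc)
    return num / den

def plim_b2(b2, var_u, var_y):
    """Probability limit of the OLS slope according to Eq. (11A.5)."""
    return b2 + (1.0 / (1.0 - b2)) * (var_u / var_y)

true_b2 = 0.8
# In this setup var(Y) = (var(I) + var(u)) / (1 - b2)^2 = 2 / 0.04 = 50.
var_y = (1.0 ** 2 + 1.0 ** 2) / (1.0 - true_b2) ** 2
estimate = simulate_ols_mpc(20000, b2=true_b2)
print("true MPC:", true_b2)
print("OLS estimate with n = 20000:", round(estimate, 3))
print("plim from (11A.5):", round(plim_b2(true_b2, 1.0, var_y), 3))
```

Under these assumptions the probability limit is 0.9 rather than 0.8, and the OLS estimate settles near 0.9 even in a very large sample, in line with the statement that the bias does not disappear as n grows.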

8. If $\lim_{n \to \infty} \text{Probability}\{|b_2 - B_2| < \delta\} = 1$, where $\delta > 0$ and n is the sample size, we say that b2 is a consistent estimator of B2, which, for short, we write as $\underset{n \to \infty}{\operatorname{plim}}(b_2) = B_2$. For further details, see Gujarati and Porter, Basic Econometrics, 5th ed., McGraw-Hill, New York, 2009, pp. 829–831.

9. Although $E(A/B) \neq E(A)/E(B)$, we can write plim (A/B) = plim(A)/plim(B).

CHAPTER 12

SELECTED TOPICS IN SINGLE EQUATION REGRESSION MODELS

In this chapter we will consider several topics that are useful in applied research. These topics are:

1. Dynamic economic models.
2. Spurious regression: Nonstationary time series.
3. Tests of stationarity.
4. Cointegrated time series.
5. The random walk model.
6. The logit model.

We will discuss the nature of these topics and illustrate them with several examples.

12.1 DYNAMIC ECONOMIC MODELS: AUTOREGRESSIVE AND DISTRIBUTED LAG MODELS

In all the regression models that we have considered up to this point we have assumed that the relationship between the dependent variable Y and the explanatory variables, the X’s, is contemporaneous, that is, at the same point in time. This assumption may be tenable in cross-sectional data but not in time series data. Thus, in a regression of consumption expenditure on personal disposable income (PDI) involving time series data it is possible that consumption expenditure depends upon the PDI in the previous time period as well as upon the PDI in the current time period. That is, there may be a noncontemporaneous, or lagged, relationship between Y and the X’s.
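To make the idea of a lagged regressor concrete, the following Python sketch lines up each value of Y with the current and earlier values of X. The helper name make_lagged_rows and the two short series are hypothetical and serve only as an illustration.

```python
def make_lagged_rows(y, x, max_lag):
    """Return rows (y_t, x_t, x_{t-1}, ..., x_{t-max_lag}).

    Only periods t >= max_lag are usable, because earlier periods do not
    have all of the required lagged values of x.
    """
    rows = []
    for t in range(max_lag, len(y)):
        lagged_x = tuple(x[t - j] for j in range(max_lag + 1))
        rows.append((y[t],) + lagged_x)
    return rows

# Hypothetical annual data: PDI (X) and consumption expenditure (Y).
pdi = [100, 110, 120, 135, 150]
consumption = [90, 97, 106, 118, 131]

rows = make_lagged_rows(consumption, pdi, max_lag=2)
for row in rows:
    print(row)
```

With five observations and two lags only three complete rows remain, a reminder that every lag introduced also costs an observation at the start of the sample.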


To illustrate, let Yt = the consumption expenditure at time t, Xt = the PDI at time t, Xt−1 = the PDI at time (t − 1), and Xt−2 = the PDI at time (t − 2). Now consider the model

$$Y_t = A + B_0 X_t + B_1 X_{t-1} + B_2 X_{t-2} + u_t \tag{12.1}$$

As this model shows, because of the lagged terms Xt−1 and Xt−2, the relationship between consumption expenditure and PDI is not contemporaneous. Models like Equation (12.1) are called dynamic models (i.e., involving change over time) because the effect of a unit change in the value of the explanatory variable is felt over a number of time periods, three in the model of Eq. (12.1).

More technically, dynamic models like Eq. (12.1) are called distributed lag models, for the effect of a unit change in the value of the explanatory variable is spread over, or distributed over, a number of time periods. To illustrate this point further, consider the following hypothetical consumption function:

$$Y_t = \text{constant} + 0.4X_t + 0.3X_{t-1} + 0.2X_{t-2} \tag{12.2}$$

Suppose a person received a permanent salary increase of $1000 (permanent in the sense that the increase in the salary will be maintained). If his or her consumption function is as shown in Equation (12.2), then in the first year of the salary increase he or she increases his or her consumption expenditure by $400 (0.4 × 1000), by another $300 (0.3 × 1000) the next year, and by another $200 (0.2 × 1000) in the third year. Thus, by the end of the third year the level of his or her consumption expenditure will have increased by (200 + 300 + 400), or by $900; the remaining $100 goes into savings.

Contrast the consumption function Eq. (12.2) with the following consumption function:

$$Y_t = \text{constant} + 0.9X_{t-1} \tag{12.3}$$

Although the ultimate effect of a $1000 increase in income on consumption is the same in both cases, it takes place with a lag of one year in Equation (12.3), whereas in Eq. (12.2) it is distributed over a period of three years; hence the name distributed lag model for models like Eq. (12.2). This can be seen clearly from Figure 12-1.

Reasons for Lag

Before moving on, a natural question arises: Why do lags occur? That is, why does the dependent variable respond to a unit change in the explanatory variable(s) with a time lag? There are several reasons, which we discuss now.

Psychological Reasons. Due to the force of habit (inertia), people do not change their consumption habits immediately following a price decrease or an income increase, perhaps because the process of change involves some immediate disutility. Thus, those who become instant millionaires by winning lotteries may not change their lifestyles because they do not know how to react to such an immediate windfall, not to mention the hounding by financial planners, newly discovered relatives, tax lawyers, etc.

Technological Reasons. Every time a new-generation personal computer (PC) comes on the market, the prices of existing PCs drop dramatically. Some people who can still use existing PCs would therefore wait for the announcement of a new PC in the hope of purchasing an existing PC at a cheaper price. The same is true of automobiles. The moment, say, the 2010 models are on the market, the prices of 2009 models drop considerably. Consumers thinking of replacing their old cars may wait for the announcement of the new model in anticipation of buying a previous model at a lower price.

Institutional Reasons. Since most major collective bargaining agreements are multiyear contracts, union workers have to wait for the expiration of the existing contract to negotiate a new wage rate even though the inflation rate has increased substantially since the signing of the last contract. Likewise, a professional ball player has to wait until the expiration of his contract to negotiate a new one, even though his “productivity” has gone up since the contract was signed several years ago. Of course, some players try to renegotiate the existing contract and some do succeed.

For these and other reasons, lags occupy a central role in economics. This is clearly reflected in the short-run/long-run methodology of economics. In the short run the price or income elasticities are generally smaller in absolute value than their long-run counterparts because it takes time to make the necessary adjustment following a change in the values of explanatory variables.

FIGURE 12-1 An example of a distributed lag model. [Figure: consumption expenditure (Y) plotted against time (t1, t2, t3), rising by $400, $300, and $200 in successive periods for a cumulative increase of $900.]

Generalizing Eq. (12.1), we can write a k-period distributed lag model as

$$Y_t = A + B_0 X_t + B_1 X_{t-1} + B_2 X_{t-2} + \cdots + B_k X_{t-k} + u_t \tag{12.4}$$

in which the effect of a unit change in the value of the explanatory variable is felt over (k + 1) periods: the current period and k lagged periods.1 In the regression (12.4), Y responds to a unit change in the value of the X variable not only in the current time period but also in several previous time periods.

1. The term period is used generically; it can be a day, a week, a month, a quarter, a year, or any suitable time period.

In the regression (12.4), the coefficient B0 is known as the short-run, or impact, multiplier because it gives the change in the mean value of Y following a unit change in X in the same time period. If the change in X is maintained at the same level thereafter, then (B0 + B1) gives the change in the mean value of Y in the next period, (B0 + B1 + B2) in the following period, etc. These partial sums are called interim, or intermediate, multipliers. Finally, after k periods, we obtain

$$\sum_{i=0}^{k} B_i = B_0 + B_1 + B_2 + \cdots + B_k \tag{12.5}$$

which is known as the long-run, or total, multiplier. Thus, in the consumption function given in the model (12.2), the short-run multiplier is 0.4, the interim multiplier is (0.4 + 0.3) = 0.7, and the long-run multiplier is (0.4 + 0.3 + 0.2) = 0.9. In the long run, here three periods, a unit change in PDI will lead, on average, to a 0.9 unit change in the consumption expenditure. In short, the long-run marginal propensity to consume (MPC) is 0.9, whereas the short-run MPC is only 0.4, 0.7 being the intermediate term MPC. Since the impact of a change in the value of the explanatory variable(s) in the distant past is probably less important than the impact of a change in the more recent past, we would expect that generally B0 would be greater in value than B1, which would be greater than B2, etc. In other words, the values of the various B’s are expected to decline from the first B onward, a fact that will be useful later when we estimate the distributed lag models.

Estimation of Distributed Lag Models

How do we estimate distributed lag models like regression (12.4)? Can we still use the usual ordinary least squares (OLS) method? In principle, yes, for if we assume that Xt is nonstochastic, or fixed in repeated sampling, so are Xt−1 and all other lagged values of the X’s. Therefore, model (12.4) per se does not violate any of the standard assumptions of the classical linear regression model (CLRM). However, there are some practical problems that need to be addressed.

1. The obvious problem is to determine how many lagged values of the explanatory variables to introduce, for economic theory is rarely robust enough to suggest the maximum length of the lag.

2. If we introduce too many lagged values, the degrees of freedom can become a serious problem. If we have 20 observations and introduce 10 lagged values, we will have only 8 degrees of freedom left: 10 d.f. will be lost on account of the lagged values, one on account of the current value, and one for the intercept. Obviously, as the number of degrees of freedom dwindles, statistical inference becomes increasingly less reliable. The problem becomes all the more complex if we have more than one explanatory variable in the model, each with its own distributed lag structure. In this case we can consume degrees of freedom very fast. Note that for every coefficient estimated, we lose 1 d.f.

3. Even with a large sample where there is not much concern about the degrees of freedom problem, we may run into the problem of multicollinearity, for successive values of most economic variables tend to be correlated, sometimes very highly. As noted in Chapter 8, multicollinearity leads to imprecise estimation; that is, standard errors tend to be large in relation to estimated coefficients. As a result, based on the routinely computed t ratios, we tend to declare that a lagged coefficient(s) is statistically insignificant. Another problem that arises is that coefficients of successive lagged terms sometimes alternate in sign, which makes it difficult to interpret some coefficients, as the following example will show.
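The third problem, the correlation among successive values of an explanatory variable, can be illustrated with a simulated series. In the sketch below the persistent series is generated by a first-order autoregression with coefficient 0.9; this data-generating process and its parameter are assumptions made purely for illustration.

```python
import math
import random

def simulate_persistent_series(n, rho=0.9, seed=0):
    """Generate x_t = rho * x_{t-1} + e_t with e_t ~ N(0, 1) (hypothetical X)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(rho * x[-1] + rng.gauss(0.0, 1.0))
    return x

def correlation(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def lag_correlation(x, k):
    """Sample correlation between x_t and x_{t-k}, for k >= 1."""
    return correlation(x[k:], x[:-k])

x = simulate_persistent_series(5000)
for k in (1, 2, 3):
    print("corr(X_t, X_t-%d) = %.3f" % (k, lag_correlation(x, k)))
```

When the regressors Xt, Xt−1, Xt−2, and so on are this highly correlated with one another, their separate coefficients are hard to pin down, which is the multicollinearity problem described in point 3.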

Example 12.1. An Illustrative Example: The St. Louis Model

To determine whether changes in the nominal gross national product (GNP) can be explained by changes in either the money supply (monetarism) or government expenditure (Keynesianism), the Federal Reserve Bank of St. Louis has developed a model, popularly known as the St. Louis model. One version of this model is

$$\dot{Y}_t = \text{constant} + \sum_{i=0}^{4} A_i \dot{M}_{t-i} + \sum_{i=0}^{4} B_i \dot{E}_{t-i} + u_t \tag{12.6}$$

where

$\dot{Y}_t$ = the rate of growth of nominal GNP at time t

$\dot{M}_t$ = the rate of growth in the money supply (M1 version) at time t

$\dot{E}_t$ = the rate of growth in full or high employment government expenditure at time t

By convention, a dot over a variable denotes growth rate (e.g., $\dot{Y}_t = \frac{1}{Y}\frac{dY}{dt}$; recall the log-lin model from Chapter 5).

The results based on the quarterly data from 1953-I to 1976-IV using four lagged values of $\dot{M}$ and $\dot{E}$ each follow.2 For ease of reading, the results are presented in tabular form (Table 12-1).

2. These results, with a change in notation, are from Keith M. Carlson, “Does the St. Louis Equation Now Believe in Fiscal Policy,” Review, Federal Reserve Bank of St. Louis, vol. 60, no. 2, February 1978, Table IV, p. 17. Note: $\sum_{i=0}^{4} A_i \dot{M}_{t-i} = A_0\dot{M}_t + A_1\dot{M}_{t-1} + A_2\dot{M}_{t-2} + A_3\dot{M}_{t-3} + A_4\dot{M}_{t-4}$, and similarly for $\sum_{i=0}^{4} B_i \dot{E}_{t-i}$.

TABLE 12-1 THE ST. LOUIS MODEL

| Coefficient | Estimate | Coefficient | Estimate |
|---|---|---|---|
| A0 | 0.40 (2.96)* | B0 | 0.08 (2.26)* |
| A1 | 0.41 (5.26)* | B1 | 0.06 (2.52)* |
| A2 | 0.25 (2.14)* | B2 | 0.00 (0.02) |
| A3 | 0.06 (0.71) | B3 | −0.06 (−2.20)* |
| A4 | −0.05 (−0.37) | B4 | −0.07 (−1.83)* |
| Sum of A’s | 1.06 (5.59)* | Sum of B’s | 0.01 (0.40) |

R2 = 0.40; d = 1.78

Note: The figures in parentheses are t ratios. *Significant at 5% level (one-tailed). The value of the intercept is not presented in the original article.

Notice several features of the results presented in Table 12-1.

1. Not all lagged coefficients are individually significant on the basis of the conventional t test. But we cannot tell whether this lack of significance is genuine or merely due to multicollinearity.

2. The fourth lagged value of $\dot{M}$ has a negative sign, which is difficult to interpret economically because all other lagged money coefficients have a positive impact on $\dot{Y}$. This negative value, however, is statistically insignificant, although we do not know if this is due to multicollinearity. The third and fourth lagged values of $\dot{E}$ are not only negative but are also statistically significant. Again, economically, it is difficult to interpret these negative values, for why should the rate of growth in government expenditure have a negative impact three and four periods in the past while the first two lagged values have a positive impact?

3. The immediate, or short-run, impact of a unit change in $\dot{M}$ is 0.40, whereas the long-term impact is 1.06 (which is the sum of the various A coefficients), and this is statistically significant. The interpretation is that a sustained 1 percent increase in the rate of growth of the money supply will be accompanied by about a 1 percent increase in the rate of growth of the nominal GNP in about five quarters. Similarly, the short-run impact of a 1 percent increase in the rate of growth of government expenditure is 0.08, which is statistically significant, but the long-term impact is only about 0.01 (the sum of the B coefficients), which is statistically insignificant.

The implication then is that changes in the growth rates in the money supply have a lasting impact on changes in the growth rate of the GNP (almost one for one) but changes in the growth rates of government expenditure do not. In short, the St. Louis model tends to support monetarism. That is why the St. Louis model is often called the monetarist model.

From a statistical viewpoint the obvious question is why did the St. Louis model include only four lags of each explanatory variable? Can some insignificant coefficients be due to multicollinearity? These questions cannot be answered without examining the original data and determining what happens to the model if more lagged terms are introduced. But as you can well imagine, this will not be a particularly fruitful line of attack, for there is no way to avoid the problem of multicollinearity if more lagged terms are introduced. Clearly, we need an alternative that not only will rid us of the problem of multicollinearity but also will tell us how many lagged terms can be included legitimately in a model.
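The impact, interim, and long-run multipliers of Eq. (12.5) are simply running sums of the lag coefficients, so they are easy to compute. The Python sketch below applies this to the consumption function (12.2) and to the rounded coefficients reported for the St. Louis model; the function name lag_multipliers is our own.

```python
from itertools import accumulate

def lag_multipliers(coeffs):
    """Return impact, interim, and long-run multipliers from B0, B1, ..., Bk."""
    partial_sums = list(accumulate(coeffs))
    return {
        "impact": partial_sums[0],         # B0
        "interim": partial_sums[1:-1],     # B0 + B1, B0 + B1 + B2, ...
        "long_run": partial_sums[-1],      # sum of all the B's, Eq. (12.5)
    }

# Consumption function (12.2).
consumption = lag_multipliers([0.4, 0.3, 0.2])
print("Eq. (12.2) long-run multiplier:", round(consumption["long_run"], 2))

# Rounded St. Louis coefficients for money growth (A's) and for
# government expenditure growth (B's), as reported in the results table.
money = lag_multipliers([0.40, 0.41, 0.25, 0.06, -0.05])
expenditure = lag_multipliers([0.08, 0.06, 0.00, -0.06, -0.07])
print("money growth, long-run:", round(money["long_run"], 2))
print("expenditure growth, long-run:", round(expenditure["long_run"], 2))
```

Summing the rounded money coefficients gives 1.07 rather than the reported 1.06; the small gap presumably reflects rounding of the individual coefficients in the published table. The rounded expenditure coefficients sum to 0.01, as reported.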

The Koyck, Adaptive Expectations, and Stock Adjustment Models Approach to Estimating Distributed Lag Models3

An ingenious approach to reducing both the number of lagged terms in the distributed lag models and the problem of multicollinearity is to adopt the approach used by the so-called Koyck, the adaptive expectations, and the partial, or stock, adjustment models. Without going into the technical details of these models, a remarkable feature of all of them is that distributed lag models like Eq. (12.4) can be reduced to the following “simple” model:4

$$Y_t = C_1 + C_2 X_t + C_3 Y_{t-1} + v_t \tag{12.7}$$

where v is the error term. This model is called an autoregressive model (recall Chapter 10) because the lagged value of the dependent variable appears as an explanatory variable on the right-hand side of the equation. In the regression (12.4) we had to estimate the intercept, the coefficient of the current term, and the coefficients of the k lagged terms. So, if k = 15, we will have to estimate all 17 parameters, a considerable loss of degrees of freedom, especially if the sample size is not too large. But in the regression (12.7) we have to estimate only three unknowns, the intercept and the two slope coefficients, a tremendous saving in the degrees of freedom. So all lagged terms in the regression (12.4) are replaced by a single lagged value of Y.

Of course, there is no such thing as a “free lunch.” In reducing the number of parameters to be estimated in the model (12.4) to only three, we have created some problems in the model (12.7). First, since Yt is stochastic, or random, Yt−1 is random too. Therefore, to estimate the model (12.7) by OLS, we must make sure that the error term vt and the lagged variable Yt−1 are not correlated; otherwise, as can be shown, the OLS estimators are not only biased but are inconsistent as well. If, however, vt and Yt−1 are uncorrelated, it can be proved that the OLS estimators are biased (in small samples), but the bias tends to disappear as the sample size becomes increasingly large. That is, in a large sample (technically, asymptotically) the OLS estimators will be consistent. Second, if, however, vt is serially correlated (e.g., it follows the first-order Markov scheme: $v_t = \rho v_{t-1} + w_t$, where $-1 \le \rho \le 1$ and the error term wt satisfies the usual OLS assumptions), OLS estimators are biased as well as inconsistent and the traditional t and F testing procedure becomes invalid. Therefore, in autoregressive models like Eq. (12.7) it is very important that we find out whether the error term vt follows, say, the first-order Markov, or the AR(1) scheme, discussed in Chapter 10. Third, as we discussed in Chapter 10, in autoregressive models the conventional Durbin-Watson d test is not applicable. In such cases we can use the Durbin h statistic discussed in Problem 10.16 to detect first-order autocorrelation, or we can use the runs test.

Before we proceed to illustrate the model (12.7), it is interesting to note that the coefficient C2 attached to Xt gives the short-run impact of a unit change in Xt on mean Yt and C2/(1 − C3) gives the long-run impact of a (sustained) unit change in Xt on mean Yt; this is equivalent to summing the values of all B coefficients in the model (12.4), as shown in Eq. (12.5).5 In other words, the lagged Y term in the regression (12.7) acts as the workhorse for all lagged X terms in the model (12.4).

3. See L. M. Koyck, Distributed Lags and Investment Analysis, North-Holland, Amsterdam, 1954; P. Cagan, “The Monetary Dynamics of Hyperinflation,” in M. Friedman (ed.), Studies in the Quantity Theory of Money, University of Chicago Press, Chicago, 1956 (for the adaptive expectations model); Marc Nerlove, Distributed Lags and Demand for Agricultural and Other Commodities, Handbook No. 141, U.S. Department of Agriculture, June 1958 (for the partial, or stock, adjustment model).

4. For technical details, see Gujarati and Porter, Basic Econometrics, 5th ed., McGraw-Hill, New York, 2009, Chapter 17.

5. The details can be found in Gujarati and Porter, Basic Econometrics, 5th ed., McGraw-Hill, New York, 2009, Chapter 17.

Example 12.2. The Impact of Adjusted Monetary Base Growth Rate on Growth Rate of Nominal GNP, United States, 1960–1988

To see the relationship between the growth rate in the nominal GNP ($\dot{Y}$) and the growth rate in the adjusted monetary base ($\dot{AMB}$),6 Joseph H. Haslag and Scott E. Hein7 obtained the following regression results. (Note: The authors did not present R2. A dot over a variable represents its growth rate.)

$$
\begin{aligned}
\dot{Y}_t &= 0.004 + 0.238\,\dot{AMB}_{t-1} + 0.759\,\dot{Y}_{t-1} \\
se &= (0.004)\quad(0.067)\quad(0.054) \\
t &= (1.000)\quad(3.552)\quad(14.056) \\
\text{Durbin } h &= 3.35
\end{aligned}
\tag{12.8}
$$

Before interpreting these results, notice that Haslag and Hein use a one-period (a year here) lagged value of $\dot{AMB}$ as an explanatory variable and not the current period value, but this should cause no problem because AMB is largely determined by the Federal Reserve system. Besides, AMBt−1 is nonstochastic if AMBt is, which is what we usually assume about any explanatory variable in the standard CLRM. Now to the interpretation of model (12.8).

From Eq. (12.8) we observe that the short-run impact of $\dot{AMB}$ is 0.238; that is, a one percentage point change in $\dot{AMB}$ on the average leads to about a 0.238 percentage point change in the nominal $\dot{GNP}$. This impact seems statistically significant because the computed t value of 3.552 is high. However, the long-run impact is

$$\frac{0.238}{(1 - 0.759)} = 0.988$$

which is almost unity. Therefore, in the long run a (sustained) one percentage point change in $\dot{AMB}$ leads to about a one percentage point change in the nominal $\dot{GNP}$; so to speak, there is a one-to-one relationship between the growth rates of AMB and the nominal GNP.

The only problem with model (12.8) is that the estimated h value is statistically significant. As pointed out in Problem 10.16, in a large sample the h statistic follows the standard normal distribution. Therefore, the 5% two-tailed critical Z (standard normal) value is 1.96 and the 1% two-tailed critical Z value is 2.58. Since the observed h of 3.35 exceeds these critical values, it seems that the residuals in the regression (12.8) are autocorrelated, and therefore the results presented in model (12.8) should be taken with a grain of salt. But note that the h statistic is a large sample statistic and the sample size in the model (12.8) is 29, which may not be very large. In any case, Eq. (12.8) serves the pedagogical purpose of illustrating the mechanics of estimating distributed lag models via the Koyck, adaptive expectation, or stock adjustment models.

6. The monetary base (MB), sometimes called high-powered money, in the United States consists of currency and total commercial bank reserves. The AMB takes into account the changes in the reserve ratio requirements of the Federal Reserve bank; in the United States all commercial banks are required to keep certain cash or cash equivalents against the deposits that customers keep with the banks. The reserve ratio is the ratio of cash and cash equivalents to the total deposits (which are liabilities of the banks). The Federal Reserve system changes this ratio from time to time to achieve some policy goals, such as containment of inflation or the rate of interest, etc.

7. See Joseph H. Haslag and Scott E. Hein, “Reserve Requirements, the Monetary Base and Economic Activity,” Economic Review, Federal Reserve Bank of Dallas, March 1989, p. 13. The regression results are presented to suit the format of model (3.46) in Chapter 3.

Example 12.3. Margin Requirements and Stock Market Volatility

To assess the short-run and long-run impact of a margin requirement (which restricts the amount of credit that brokers and dealers can extend to their customers), Gikas A. Hardouvelis8 estimated the following regression (among several others) for the monthly data from December 1931 to December 1987, a total of 673 months, for the stocks included in the Standard & Poor’s (S&P) index. (Note: The standard error, indicated by *, was not presented by the author.)

$$
\begin{aligned}
\hat{\sigma}_t &= 0.112 - 0.112\,m_t + 0.186\,\sigma_{t-1} \\
se &= (0.015)\quad(0.024)\quad(\ \ )^{*} \qquad R^2 = 0.44
\end{aligned}
\tag{12.9}
$$

where $\sigma_t$ = the standard deviation of the monthly excess nominal rate of return of stocks (the nominal rate of return minus the one-month T-bill rate at the end of the previous month) calculated from (t − 11) to t (in decimals), which is taken as a measure of volatility; mt = the average official margin requirement from (t − 11) to t (in decimals); and the figures in parentheses are the estimated standard errors corrected for heteroscedasticity and autocorrelation.

8. See Gikas A. Hardouvelis, “Margin Requirements and Stock Market Volatility,” Quarterly Review, Federal Reserve Bank of New York, vol. 13, no. 2, Summer 1988, Table 4, p. 86, and footnote 21, p. 88.
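The arithmetic behind the autoregressive model (12.7) can be checked with a few lines of Python. The sketch below computes the long-run impact C2/(1 − C3) for the coefficients reported in (12.8) and (12.9) and the two-tailed standard normal p value for the reported h of 3.35. Two parts are assumptions of ours rather than statements from the text: the geometric lag weights C2·C3^k, used to show why the long-run formula equals a sum of lag coefficients, and the function durbin_h, which uses the usual textbook form of the statistic, h = (1 − d/2)·sqrt(n / (1 − n·var)), where var is the estimated variance of the coefficient of the lagged dependent variable.

```python
import math

def long_run_impact(c2, c3):
    """Long-run impact C2 / (1 - C3) in Y_t = C1 + C2*X_t + C3*Y_{t-1} + v_t."""
    if not -1.0 < c3 < 1.0:
        raise ValueError("C3 must lie strictly between -1 and 1")
    return c2 / (1.0 - c3)

def geometric_lag_sum(c2, c3, n_terms=200):
    """Sum of assumed geometric lag weights C2 * C3**k for k = 0, 1, 2, ..."""
    return sum(c2 * c3 ** k for k in range(n_terms))

def two_tailed_p_value(z):
    """Two-tailed p value for a standard normal statistic z."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def durbin_h(d, n, var_lagged_coef):
    """Durbin h from the Durbin-Watson d (usual textbook form, see lead-in)."""
    denom = 1.0 - n * var_lagged_coef
    if denom <= 0.0:
        raise ValueError("h is not defined when n * var >= 1")
    return (1.0 - d / 2.0) * math.sqrt(n / denom)

lr_gnp = long_run_impact(0.238, 0.759)          # Eq. (12.8)
lr_volatility = long_run_impact(-0.112, 0.186)  # Eq. (12.9)
p_h = two_tailed_p_value(3.35)                  # reported Durbin h

print("long-run impact, Eq. (12.8):", round(lr_gnp, 3))
print("long-run impact, Eq. (12.9):", round(lr_volatility, 3))
print("sum of geometric weights, Eq. (12.8):", round(geometric_lag_sum(0.238, 0.759), 3))
print("two-tailed p value for h = 3.35:", round(p_h, 4))
```

The long-run impacts come out at about 0.988 and −0.138, and the p value for h = 3.35 is below 0.001, consistent with the conclusion that h exceeds both the 5% and the 1% critical values.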

Unfortunately, Hardouvelis presents neither the standard error of the lagged volatility coefficient nor the h statistic. Note, though, that the author has corrected his results for autocorrelation.

As expected, the coefficient of the margin variable has a negative sign, suggesting that when margin requirements are increased, there is less speculative activity in the stock market, thereby reducing volatility. The value of −0.112 means that if the margin requirement is increased by, say, one percentage point, the volatility of S&P stocks decreases by about 0.11 percentage points. This is, of course, the short-run impact. The long-run impact is

$$\frac{-0.112}{(1 - 0.186)} \approx -0.138$$

which obviously is higher (in absolute value) than the short-run impact, but not a lot higher.

Although the topic of dynamic modeling is vast and all kinds of newer econometric techniques to handle such models are currently available, the preceding discussion will give you the flavor of what dynamic modeling is all about. For additional details, consult the references.9

9. A good reference is A. C. Harvey, The Econometric Analysis of Time Series, 2nd ed., MIT, Cambridge, Mass., 1990. Some parts of this book may be difficult for beginners.

12.2 THE PHENOMENON OF SPURIOUS REGRESSION: NONSTATIONARY TIME SERIES

Regression models involving time series data sometimes give results that are spurious, or of dubious value, in the sense that superficially the results look good but on further investigation they look suspect. To explain this phenomenon of spurious regression, let us consider a concrete example. Table 12-2 (found on the textbook Web site) gives quarterly data for the United States on gross domestic product (GDP), personal disposable income (PDI), personal consumption expenditure (PCE), profits, and dividends for the period of 1970-I to 2008-IV (a total of 156 observations); all the data are in billions of 2000 dollars.

For now we will concentrate on PCE and PDI; the other data given in the table will be used in problems at the end of this chapter.

Using the data given in Table 12-2 and regressing PCE on PDI we obtain the following regression results:

$$
\begin{aligned}
PCE_t &= -470.52 + 1.0006\,PDI_t \\
t &= (-22.03)\quad(264.76) \qquad R^2 = 0.998;\ d = 0.3975
\end{aligned}
\tag{12.10}
$$

These regression results look “fabulous”: the R2 is extremely high, the t value of PDI is extremely high, the marginal propensity to consume (MPC) out of PDI is positive and high. The only fly in the ointment is that the Durbin-Watson d is low. As Granger and Newbold have suggested, an R2 > d is a good rule of thumb to suspect that the estimated regression is a spurious (or nonsense) regression; that is, in actuality there may not be any meaningful relationship between PCE and PDI.10
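The Granger and Newbold rule of thumb can be tried on artificial data. The following Python sketch regresses one simulated random walk on another, independent one, many times over, and records R2 and the Durbin-Watson d for each regression. The choice of random walks as the nonstationary series, the sample size of 200, and the number of replications are all our own assumptions for illustration.

```python
import random

def random_walk(n, rng):
    """Pure random walk: each value is the previous value plus a N(0, 1) shock."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

def r2_and_durbin_watson(y, x):
    """Regress y on a constant and x by OLS; return (R2, Durbin-Watson d)."""
    n = len(y)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [b - intercept - slope * a for a, b in zip(x, y)]
    rss = sum(e * e for e in resid)
    tss = sum((b - y_bar) ** 2 for b in y)
    d = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, n)) / rss
    return 1.0 - rss / tss, d

rng = random.Random(42)
results = []
for _ in range(200):
    y = random_walk(200, rng)   # by construction unrelated to x
    x = random_walk(200, rng)
    results.append(r2_and_durbin_watson(y, x))

mean_d = sum(d for _, d in results) / len(results)
share_flagged = sum(r2 > d for r2, d in results) / len(results)
print("average Durbin-Watson d:", round(mean_d, 3))
print("share of regressions with R2 > d:", round(share_flagged, 3))
```

Although the two series in each replication are unrelated by construction, the Durbin-Watson d is typically very low, and the share of regressions with R2 > d shows how often the rule of thumb would raise a warning in this setting.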

Why may the regression results in Equation (12.10) be spurious? To under- stand this, we have to introduce the concept of a stationary time series. To explain this concept, let us first plot the data on PCE and PDI given in Table 12-2, as shown in Figure 12-2.

Looking at Figure 12-2, we can see that both the PCE and PDI time series are generally trending upward over the sample period. Such a picture generally in- dicates that such time series may be nonstationary. What does that mean?

Broadly speaking, a stochastic process is said to be stationary if its mean and variance are constant over time and the value of the covariance between two time periods depends only on the distance or lag between the two time periods and not on the actual time at which the covariance is computed.11

Symbolically, letting Yt represent a stochastic time series, we say that it is stationary if the following conditions are satisfied:12

Mean: $E(Y_t) = \mu$ (12.11)

Variance: $E(Y_t - \mu)^2 = \sigma^2$ (12.12)

Covariance: $\gamma_k = E[(Y_t - \mu)(Y_{t+k} - \mu)]$ (12.13)

FIGURE 12-2 Quarterly PDI and PCE, United States, 1970–2008. [Time series plot of PDI and PCE; vertical axis: billions of 2000 dollars; horizontal axis: year.]

10C. W. J. Granger and P. Newbold, “Spurious Regression in Econometrics,” Journal of Econometrics, vol. 2, no. 2, July 1974, pp. 111–120.

11Any time series data can be thought of as being generated by a stochastic, or random, process and a concrete set of data, such as that shown in Table 12-2, can be regarded as a (particular) realization (i.e., a sample) of the underlying stochastic process.

12In the time series literature such a stochastic process is called a weakly stationary stochastic process. In most applied work the assumption of weak stationarity has proved useful. In strong stationarity we consider higher moments of the PDF, that is, moments beyond the second.


where $\gamma_k$, the covariance (or autocovariance) at lag k, is the covariance between the values of $Y_t$ and $Y_{t+k}$, that is, between two values of Y, k periods apart. If k = 0, we obtain $\gamma_0$, which is simply the variance of Y ($= \sigma^2$); if k = 1, $\gamma_1$ is the covariance between two adjacent values of Y, the type of covariance we encountered in Chapter 10 when we discussed the topic of autocorrelation.

Suppose we shift the origin of Y from $Y_t$ to $Y_{t+m}$ (say from 1970-I to 1974-I in our illustrative example). Now if $Y_t$ is to be stationary, the mean, variance, and autocovariances of $Y_{t+m}$ must be the same as those of $Y_t$. In short, if a time series is stationary, its mean, variance, and autocovariance (at various lags) remain the same no matter what time we measure them.
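The idea of shifting the origin can be made concrete by computing the sample counterparts of the mean, the variance, and the lag-1 autocovariance over two different stretches of a series. The sketch below is only an illustration under stated assumptions: the two simulated series, the helper names `autocovariance` and `describe`, and the split into halves are our own choices, not part of the text. It uses the divisor n for the sample autocovariance, which is one common convention.

```python
import random
import statistics


def autocovariance(y, k):
    """Sample autocovariance at lag k: (1/n) * sum (y_t - ybar)(y_{t+k} - ybar)."""
    n = len(y)
    ybar = sum(y) / n
    return sum((y[t] - ybar) * (y[t + k] - ybar) for t in range(n - k)) / n


def describe(y, label):
    """Print the mean, gamma_0 and gamma_1 for each half of the series y."""
    half = len(y) // 2
    for name, part in (("first half", y[:half]), ("second half", y[half:])):
        print(f"{label:12s} {name:11s} mean={statistics.fmean(part):8.3f} "
              f"gamma0={autocovariance(part, 0):8.3f} "
              f"gamma1={autocovariance(part, 1):8.3f}")


rng = random.Random(1)                                # arbitrary seed
noise = [rng.gauss(0.0, 1.0) for _ in range(400)]     # stationary by construction
trend = [0.05 * t + e for t, e in enumerate(noise)]   # mean rises with t

describe(noise, "white noise")
describe(trend, "trending")
```

For the first series the moments of the two halves should be roughly alike, whereas for the trending series the mean differs between the halves, which is what the definition of (weak) stationarity rules out.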

If a time series is not stationary in the sense just defined, it is called a nonstationary time series. (Keep in mind, we are only talking about weak stationarity.)

Looking at the PCE and PDI time series given in Figure 12-2, we get the feeling that these two time series are not stationary. If this is the case, then in regression (12.10) we are regressing one nonstationary time series on another nonstationary time series, leading to the phenomenon of spurious regression.

The question now is how do we verify our feeling that the PCE and PDI time series are in fact nonstationary? We will attempt to answer this question in the next section.

12.3 TESTS OF STATIONARITY

In the literature there are several tests of stationarity. Here we will consider the so-called unit root test. Without delving into the technicalities, this test can be described as follows.13 Letting Yt represent the stochastic time series of interest (such as PCE), we proceed like this.

1. Estimate the following regression:

$$\Delta Y_t = A_1 + A_2 t + A_3 Y_{t-1} + u_t \qquad (12.14)$$

where $\Delta$ represents the first difference operator that we encountered in Chapter 10, where t is the trend variable, taking values of 1, 2, and so on (156 for our illustrative example), and where $Y_{t-1}$ is the one-period lagged value of the variable Y.14

2. The null hypothesis is that $A_3$, the coefficient of $Y_{t-1}$, is zero, which is another way of saying that the underlying time series is nonstationary. This is called the unit root hypothesis.15


13For details, see Gujarati and Porter, Basic Econometrics, 5th ed., McGraw-Hill, New York, 2009, Chapter 21.

14This regression can also be estimated without the intercept and the trend term, although they are generally included.

15To see intuitively why the term unit root is used, let us proceed as follows: $Y_t = A_1 + A_2 t + C Y_{t-1} + u_t$. Now subtract $Y_{t-1}$ from both sides of this equation to give $(Y_t - Y_{t-1}) = A_1 + A_2 t + C Y_{t-1} - Y_{t-1} + u_t$, which then gives $\Delta Y_t = A_1 + A_2 t + (C - 1)Y_{t-1} + u_t = A_1 + A_2 t + A_3 Y_{t-1} + u_t$, where $A_3 = (C - 1)$. Thus, if C is in fact equal to 1, $A_3$ in regression (12.14) will in fact be zero, thus the name unit root.


3. To test that $a_3$, the estimated value of $A_3$, is zero, ordinarily we would use the now familiar t test. Unfortunately, we cannot do that because the t test is, strictly speaking, valid only if the underlying time series is stationary. However, we can use an alternative test called the $\tau$ (tau) test, whose critical values were tabulated by its creators on the basis of Monte Carlo simulations. In the literature, the tau test is known as the Dickey-Fuller (DF) test, in honor of its discoverers.16 If in an application, the computed t (= tau) value of the estimated $A_3$ is greater (in absolute value) than the critical Dickey-Fuller tau values, we reject the unit root hypothesis; that is, we conclude that the said time series is stationary. On the other hand, if the computed tau value is smaller (in absolute value) than the critical tau values, we do not reject the unit root hypothesis. In that case, the time series in question is nonstationary.
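The three steps above can be sketched in code. What follows is a minimal illustration under stated assumptions, not the routine used by any econometrics package: the helpers `solve`, `ols`, and `dickey_fuller_tau` are hypothetical names of our own, the data are a simulated series rather than Table 12-2, and the 5% critical value of −3.45 is the one quoted later in this section. The tau statistic is computed exactly like an ordinary t ratio; only the critical values against which it is compared differ.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]      # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """Return OLS coefficients and their standard errors for y = X b + u."""
    n, k = len(X), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    b = solve(xtx, xty)
    resid = [yi - sum(bj * xj for bj, xj in zip(b, row)) for row, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)             # estimated error variance
    se = []
    for j in range(k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        se.append(math.sqrt(s2 * solve(xtx, unit)[j]))   # sqrt(s2 * [(X'X)^-1]_jj)
    return b, se


def dickey_fuller_tau(y):
    """Estimate dY_t = A1 + A2*t + A3*Y_{t-1} + u_t; return (a3, tau)."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    X = [[1.0, float(t), y[t - 1]] for t in range(1, len(y))]
    b, se = ols(X, dy)
    return b[2], b[2] / se[2]


rng = random.Random(3)               # arbitrary seed
level, series = 0.0, []
for _ in range(156):                 # simulated series with a unit root
    level += rng.gauss(0.0, 1.0)
    series.append(level)

a3, tau = dickey_fuller_tau(series)
critical_5pct = -3.45                # value quoted in the text
# For a negative tau, "tau < critical" is the same as the absolute-value rule of step 3.
verdict = "reject" if tau < critical_5pct else "do not reject"
print(f"a3 = {a3:.4f}, tau = {tau:.2f}: {verdict} the unit root hypothesis at 5%")
```

The normal equations are solved directly here only to keep the sketch self-contained; in applied work one would rely on a tested regression routine.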

Let us apply the unit root test to the PCE and PDI time series given in Table 12-2. Corresponding to Equation (12.14), we obtain:

$$\Delta PCE_t = 42.04 + 0.6596\,t - 0.0117\,PCE_{t-1}$$
$$t\,(=\tau) = (2.83)\;(2.18)\;(-1.52) \qquad R^2 = 0.099$$

$$\Delta PDI_t = 74.19 + 1.0482\,t - 0.02209\,PDI_{t-1} \qquad (12.15)$$
$$t\,(=\tau) = (1.88)\;(1.58)\;(-1.31) \qquad R^2 = 0.035$$

For the present purpose we are interested in the t value of the lagged PCE and PDI. The 1% and 5% critical DF, or tau, values from the table in Appendix E are about −4.04 and −3.45, respectively.17 Since in absolute terms (i.e., disregarding sign), the tau values of the lagged PCE and PDI variables are much smaller than any of the preceding tau values, the conclusion is that the PCE and PDI time series are nonstationary (i.e., there is a unit root). In consequence, the OLS regression given in Eq. (12.10) may be spurious (i.e., not meaningful). Incidentally, note that if we had applied the usual t test to, say, the second regression in Eq. (12.15), we would have said that the t value of the lagged PDI variable is statistically significant, although only marginally (at roughly the 10% level on a one-tailed test). But on the basis of the correct tau test (in the presence of nonstationarity) this conclusion would be wrong.
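The comparison just described is simple enough to write out explicitly. The short sketch below only restates the decision rule with the numbers reported in Eq. (12.15) and the critical values quoted above; the variable names are our own.

```python
tau_values = {"PCE": -1.52, "PDI": -1.31}   # tau of the lagged level, Eq. (12.15)
critical = {"1%": -4.04, "5%": -3.45}       # DF critical values quoted in the text

decisions = {}
for series, tau in tau_values.items():
    for level, crit in critical.items():
        # Reject the unit root only if |tau| exceeds the |critical value|.
        decisions[(series, level)] = abs(tau) > abs(crit)
        outcome = "reject" if decisions[(series, level)] else "do not reject"
        print(f"{series} at {level}: |{tau}| vs |{crit}| -> {outcome} unit root")
```

In every case the unit root hypothesis is not rejected, in line with the conclusion that both series are nonstationary.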

12.4 COINTEGRATED TIME SERIES

The conclusion that the regression (12.10) may be spurious might seem to suggest that all time series regressions, such as regression (12.10), are spurious. If this were in fact the case, we would need to be very wary of doing regressions based on time series data. But there is no cause for despair. Even if the time series of PCE and PDI are nonstationary, it is quite possible that there is still a (long-run) stable or equilibrium relationship between the two. If that is indeed the case, we say that such time series are cointegrated.18 But how do we find that out? This can be accomplished as follows.

16D. A. Dickey and W. A. Fuller, “Distribution of Estimators for Autoregressive Time Series with a Unit Root,” Journal of the American Statistical Association, vol. 74, June 1979, pp. 427–431.

17J. G. MacKinnon, “Critical Values of Cointegration Tests,” in R. F. Engle and C. W. J. Granger, eds., Long-run Economic Relationships: Readings in Cointegration, Oxford University Press, New York, 1991, Chapter 13. Computer packages, such as EViews, now compute the critical tau values routinely.

Let us return to the PCE–PDI regression (12.10). From this regression, obtain the residuals, et; that is, obtain:

$$e_t = PCE_t + 470.52 - 1.0006\,PDI_t \qquad (12.16)$$

Treating $e_t$ as a time series, we now apply the unit root test (see Eq. [12.14]), which gives the following results. (Note: There is no need to introduce the intercept and the trend variable in this regression. Why?)

$$\Delta e_t = -0.2096\,e_{t-1} \qquad (12.17)$$
$$t\,(=\tau) = (-4.35) \qquad r^2 = 0.1094$$

Now the critical tau values, as computed by Engle and Granger in Appendix E, are about −4.04 (1%), −3.37 (5%), and −3.03 (10%).19 Since, in absolute terms, the computed tau of 4.35 exceeds any of these critical tau values, the conclusion is that the series et is stationary. Therefore, we can say that although PCE and PDI are individually nonstationary, their linear combination as shown in Eq. (12.16) is stationary. That is, the two time series are cointegrated, or, in other words, there seems to be a long-run or equilibrium relationship between the two variables. This is a very comforting finding because it means that the regression (12.10) is real and not spurious.
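The residual-based procedure just applied can be sketched as follows. This is an illustration on simulated data, not a reproduction of Eqs. (12.16) and (12.17): the series x and y, the coefficients 5 and 0.9, the seed, and the helper names `ols_residuals` and `residual_tau` are all assumptions made for the example. The critical values are those quoted in the text.

```python
import math
import random


def ols_residuals(x, y):
    """Residuals from OLS of y on a constant and x (the first-step regression)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return [b - intercept - slope * a for a, b in zip(x, y)]


def residual_tau(e):
    """Regress de_t on e_{t-1} with no intercept or trend; return (coef, tau)."""
    lagged = e[:-1]
    de = [e[t] - e[t - 1] for t in range(1, len(e))]
    sxx = sum(v * v for v in lagged)
    coef = sum(v * d for v, d in zip(lagged, de)) / sxx
    rss = sum((d - coef * v) ** 2 for v, d in zip(lagged, de))
    se = math.sqrt(rss / (len(de) - 1) / sxx)
    return coef, coef / se


rng = random.Random(5)                    # arbitrary seed
x, level = [], 0.0
for _ in range(156):
    level += rng.gauss(0.0, 1.0)          # x is nonstationary by construction
    x.append(level)
# y shares the stochastic trend of x; its deviation from 5 + 0.9 x is stationary noise.
y = [5.0 + 0.9 * xi + rng.gauss(0.0, 0.5) for xi in x]

e = ols_residuals(x, y)                   # step 1: residuals from y on x
coef, tau = residual_tau(e)               # step 2: unit root regression on e
print(f"coefficient on e(t-1) = {coef:.4f}, tau = {tau:.2f}")
for label, crit in (("1%", -4.04), ("5%", -3.37), ("10%", -3.03)):
    print(f"{label}: {'stationary residuals' if tau < crit else 'cannot reject unit root'}")
```

Since y was built to move with x in the long run, the residuals should turn out to be stationary, which is the pattern the text reports for PCE and PDI.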

To sum up: If we are dealing with time series data, we must make sure that the individual time series are either stationary or that they are cointegrated. If this is not the case, we may be open to the charge of engaging in spurious (or nonsense) regression analysis.

We will conclude the discussion of nonstationary time series by considering another example of a nonstationary time series, the so-called random walk model, which has found quite useful applications in finance, investment, and international trade.

12.5 THE RANDOM WALK MODEL

Financial time series such as the S&P 500 stock index, the Dow-Jones index, and foreign exchange rates are often said to follow a “random walk” in the sense that knowing the values of these variables today will not enable us to predict



18The literature on cointegration is vast and quite technical. Our discussion here is heuristic. A commonly cited example of cointegration is the drunkard and his dog. Leaving the bar, the drunkard meanders in a haphazard way. His dog also meanders in his merry ways. But the dog never loses sight of his owner. So to speak, their meanderings are cointegrated.

19R. F. Engle and C. W. J. Granger, Long-run Economic Relationships: Readings in Cointegration, Oxford University Press, New York, 1991, Chapter 13.


what these values will be tomorrow. Thus, knowing the price of a stock (say, of Dell or IBM) today, it is hard to tell what it will be tomorrow. That is, the price behavior of stocks is essentially random—today’s price is equal to yesterday’s price plus a random shock.20

To see the anatomy of a random walk model, consider the following simple model:

$$Y_t = Y_{t-1} + u_t \qquad (12.18)$$

where $u_t$ is the random error term with zero mean and constant variance, $\sigma^2$. Let us suppose we start at time 0 with a value of $Y_0$. Now we can write:

$$Y_{t-1} = Y_{t-2} + u_{t-1} \qquad (12.19)$$

Using the recursive relation (12.18) repeatedly as in Equation (12.19), we can write:

$$Y_t = Y_0 + \sum u_t \qquad (12.20)$$

where the summation is from t = 1 to t = T, T being the total number of observations. Now it is easy to verify that

$$E(Y_t) = Y_0 \qquad (12.21)$$

since the expected value of each $u_t$ is zero. It is also easy to verify that

$$\operatorname{var}(Y_t) = \operatorname{var}(u_1 + u_2 + \cdots + u_T) = T\sigma^2 \qquad (12.22)$$

where use is made of the fact that the u’s are random, each with the same variance $\sigma^2$.

As Equation (12.22) shows, the variance of $Y_t$ is not only not constant but also continuously increases with T. Therefore, by the definition of stationarity given earlier, the (random walk) variable $Y_t$ given in Eq. (12.18) is nonstationary (here nonstationary in the variance). But notice an interesting feature of the random walk model given in Eq. (12.18). If you write it as:

$$\Delta Y_t = (Y_t - Y_{t-1}) = u_t \qquad (12.23)$$

where, as usual, $\Delta$ is the first difference operator, we see that the first differences of Y are stationary, for $E(\Delta Y_t) = E(u_t) = 0$ and $\operatorname{var}(\Delta Y_t) = \operatorname{var}(u_t) = \sigma^2$. Therefore, if Y in Eq. (12.18) represents, say, share prices, then these prices may be nonstationary, but their first differences are purely random.
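A small simulation can be used to check Eqs. (12.21) to (12.23) numerically. The sketch below is only illustrative: it assumes normally distributed shocks with $\sigma = 1$, $Y_0 = 0$, 2,000 replications, and an arbitrary seed, none of which come from the text, and the function name `random_walk` is our own.

```python
import random
import statistics


def random_walk(shocks, y0=0.0):
    """Apply Y_t = Y_{t-1} + u_t recursively and return [Y_1, ..., Y_T]."""
    path, level = [], y0
    for u in shocks:
        level += u
        path.append(level)
    return path


rng = random.Random(7)                       # arbitrary seed
sigma, reps = 1.0, 2000
paths = [random_walk([rng.gauss(0.0, sigma) for _ in range(100)])
         for _ in range(reps)]

for T in (25, 50, 100):
    values = [p[T - 1] for p in paths]       # Y_T across the replications
    print(f"T={T:3d}  mean={statistics.fmean(values):6.3f}  "
          f"var={statistics.pvariance(values):7.2f}  T*sigma^2={T * sigma**2:6.1f}")

# First differences recover the shocks, so their variance should stay near sigma^2.
diffs = [p[t] - p[t - 1] for p in paths for t in range(1, 100)]
print(f"variance of first differences = {statistics.pvariance(diffs):.3f}")
```

Across replications the mean of $Y_T$ should stay close to $Y_0$ while its variance grows roughly in proportion to T; the variance of the first differences should remain close to $\sigma^2$.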


20The random walk is often compared with a drunkard’s walk. Leaving the bar, the drunkard moves a random distance ut at time t, and if he or she continues to walk indefinitely, he or she will eventually drift farther and farther away from the bar.


We can modify the random walk model (12.18) as follows:

$$Y_t = d + Y_{t-1} + u_t \qquad (12.24)$$

where d is a constant. This is the random walk model with drift, d being the drift parameter.

We leave it as an exercise for you to show that for the model (12.24) we get

$$E(Y_t) = Y_0 + T \cdot d \qquad (12.25)$$

$$\operatorname{var}(Y_t) = T\sigma^2 \qquad (12.26)$$

That is, for the random walk model with drift, both the mean and the variance change continuously over time. Again, we have a random variable that is nonstationary both in the mean and the variance. If d is positive, we can see from model (12.24) that the mean value of Y will increase continuously over time; if d is negative, the mean value of Y will decrease continuously. In either case, the variance of Y increases continuously over time. A random variable whose mean value and variance are time-dependent is said to follow a stochastic trend. This is in contrast to the linear trend model that we discussed in Chapter 5 (see Equation [5.23]), where it was assumed that the variable Y followed a deterministic trend.
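For readers who wish to check their answer to the exercise, one possible route is sketched below. It assumes, as in the model without drift, that $Y_0$ is a fixed starting value and that the $u_t$ have zero mean, common variance $\sigma^2$, and are uncorrelated with one another; the result is written for the last period, T.

```latex
\begin{aligned}
Y_1 &= d + Y_0 + u_1,\\
Y_2 &= d + Y_1 + u_2 = 2d + Y_0 + u_1 + u_2,\\
    &\;\;\vdots\\
Y_T &= Y_0 + T d + \sum_{t=1}^{T} u_t,\\[4pt]
E(Y_T) &= Y_0 + T d + \sum_{t=1}^{T} E(u_t) = Y_0 + T d,\\
\operatorname{var}(Y_T) &= \operatorname{var}\Big(\sum_{t=1}^{T} u_t\Big) = T\sigma^2 .
\end{aligned}
```

The drift therefore shifts the mean by d each period but leaves the variance exactly as in the random walk without drift.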

If we were to use the random walk models for forecasting purposes, we would obtain a picture such as is shown in Figure 12-3.

Figure 12-3(a) shows the