Problem 3 (45 marks)


Dataset: credit.csv. The description of the variables is in an Excel file named


This data set consists of genuine credit records from a South German bank. The aim would generally be to predict which customers will repay the loan in full and which will not. There are 1000 records, and all amounts are in Deutschmarks. Answer the following questions using suitable approaches, whether descriptive, graphical, or inferential, and a suitable package, e.g. StatTools. Justify your answers in the main text and include all workings as an appendix.

a) Wherever possible and meaningful, provide a brief analysis of each variable, including its distribution, outliers, etc.

b) Do there seem to be differences in age, length of loan, or amount of loan between those who repaid their loans and those who defaulted?

c) Explore and describe the association of each variable with the credit status.

d) Does the Length of the loan vary with the use of the loan?

e) Determine relationships, if any, between Age, Length of loan and Amount of loan.

f) Construct a 3-way contingency table from the factors credit, record and use, and analyse it. You must state your final conclusions in detail.
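
As an illustration of what the 3-way table in part f) looks like structurally, the following is a minimal sketch in plain Python rather than StatTools. The factor names credit, record and use come from the question. The six records and their level names ("good", "bad", "clean", "late", "car", "furniture") are hypothetical toy values and are not taken from credit.csv. The helper names three_way_table and layer are likewise invented for this sketch.

```python
from collections import Counter

# Hypothetical toy records: NOT from credit.csv. The level names are
# invented purely to show the shape of a 3-way contingency table.
records = [
    {"credit": "good", "record": "clean", "use": "car"},
    {"credit": "good", "record": "clean", "use": "car"},
    {"credit": "bad", "record": "clean", "use": "car"},
    {"credit": "good", "record": "late", "use": "furniture"},
    {"credit": "bad", "record": "late", "use": "furniture"},
    {"credit": "bad", "record": "late", "use": "car"},
]


def three_way_table(rows, a, b, c):
    """Count the rows falling in each (a, b, c) combination of factor levels."""
    return Counter((row[a], row[b], row[c]) for row in rows)


def layer(table, c_level):
    """Return the 2-way slice {(a, b): count} for one level of the third factor."""
    return {(a, b): n for (a, b, c), n in table.items() if c == c_level}


table = three_way_table(records, "credit", "record", "use")

# Print one credit-by-record table per level of "use".
for use_level in sorted({key[2] for key in table}):
    print(use_level, layer(table, use_level))
```

Slicing the table by one factor at a time, as layer does, gives the partial 2-way tables on which conditional association can then be examined. The inferential step itself is left to the chosen package.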

