Week04 HW


Guidelines

· Share screenshots in your response

· Share the code and the plots

· Include your name and ID number

· Clearly mark each question number

· Upload a Word document

· Insert a cover page listing the questions attempted

HW04 Cover Sheet

Identify all questions that you attempted in this template

Q1 Chapter 04 Classification Examples

Part 1 Review logistic regression in Chapter 4 - Classification

https://github.com/JWarmenhoven/ISLR-python

Use the examples to review Section 4.3 (Logistic Regression) of the ISLR text

a. Plot Figure 4.1 

b. Plot Figure 4.2

c. Table 4.1, 4.2, 4.3

d. Plot Figure 4.3

Hint – use https://nbviewer.jupyter.org/github/JWarmenhoven/ISL-python/blob/master/Notebooks/Chapter%204.ipynb#4.3-Logistic-Regression

Part 2 Application to Caravan Insurance Data

Apply KNN and logistic regression to the Caravan data (Caravan.csv)

Hint – use https://nbviewer.jupyter.org/github/JWarmenhoven/ISL-python/blob/master/Notebooks/Chapter%204.ipynb#4.6.5-K-Nearest-Neighbors

Q2. Classification Textbook Examples

Using the Boston data set, fit classification models to predict whether a given suburb has a crime rate above or below the median. Explore logistic regression and KNN models using various subsets of the predictors. Describe your findings.

Hint – use: https://botlnec.github.io/islp/sols/chapter4/exercise13/
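Before any model can be fit for Q2, the continuous crime rate has to be turned into a binary target. The following is a minimal sketch of that step only, using the standard library. The values in `crim` are made up for illustration, and the column name `crim` and the target name `crim01` are assumptions, not something the assignment specifies.

```python
from statistics import median

# Hypothetical per-suburb crime rates (illustrative values only). In the
# homework these would be read from the crime-rate column of the Boston
# data, assumed here to be named "crim".
crim = [0.02, 0.09, 0.25, 3.7, 8.9, 0.6]

# The median splits the suburbs into two halves.
threshold = median(crim)

# Binary target: 1 = crime rate above the median, 0 = at or below it.
crim01 = [1 if rate > threshold else 0 for rate in crim]

print(threshold, crim01)
```

The resulting 0/1 list would then serve as the response for logistic regression and KNN, with the remaining columns (or subsets of them) as predictors.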

Q3 Iris Data Set and Classification (iris.csv)

The Iris dataset was used in R.A. Fisher's classic 1936 paper. It includes three iris species with 50 samples each as well as some properties about each flower. One flower species is linearly separable from the other two, but the other two are not linearly separable from each other. The columns in this dataset are:

· Id

· Sepal Length Cm

· Sepal Width Cm

· Petal Length Cm

· Petal Width Cm

· Species

a. Plot the iris dataset – i) “Sepal Length vs Sepal Width” ii) “Petal Length vs Petal Width”

Split the data into training and test sets, then:

b. Apply a Naïve Bayes classifier to classify species and show the decision boundaries

c. Apply logistic regression to classify species and show the decision boundaries

d. Apply the KNN algorithm to classify species and show the decision boundaries

e. Compare the “Truth matrix” (confusion matrix) and accuracy of the three algorithms

  


Algorithm            | TP | TN | FP | FN | Accuracy
Naïve Bayes          |    |    |    |    |
Logistic Regression  |    |    |    |    |
KNN                  |    |    |    |    |

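Because the iris data has three species, TP, TN, FP and FN are not defined for the classifier as a whole; one common reading, assumed here, is to count them per species in a one-vs-rest fashion and to report accuracy over all test samples. The sketch below shows that bookkeeping with plain Python. The label strings and the predictions are hypothetical, and the helper names `one_vs_rest_counts` and `accuracy` are illustrative, not part of any library.

```python
def one_vs_rest_counts(y_true, y_pred, positive):
    """Count TP, TN, FP, FN treating `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1          # predicted positive, really positive
        elif pred == positive:
            fp += 1          # predicted positive, really another class
        elif truth == positive:
            fn += 1          # missed a real positive
        else:
            tn += 1          # correctly left out of the positive class
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical test labels and predictions, for illustration only.
y_true = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
y_pred = ["setosa", "setosa", "versicolor", "virginica", "virginica", "versicolor"]

counts = one_vs_rest_counts(y_true, y_pred, positive="versicolor")
overall = accuracy(y_true, y_pred)
print(counts, overall)
```

Running the same two helpers on each classifier's test predictions gives one row of the table per algorithm (per chosen positive species).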

Hint

Naïve Bayes - https://xavierbourretsicotte.github.io/Naive_Bayes_Classifier.html

Logistic Regression – 

https://scikit-learn.org/stable/auto_examples/linear_model/plot_iris_logistic.html

https://www.datacamp.com/community/tutorials/understanding-logistic-regression-python

KNN Algorithm – 

https://www.ritchieng.com/machine-learning-k-nearest-neighbors-knn/

Q4 Titanic Data Set and Classification (titanic.zip – already split into train and test sets)

a. Perform Exploratory Data Analysis

b. Do Feature Engineering

c. Apply logistic regression

d. Apply KNN algorithm

Hint

https://www.kaggle.com/angps95/basic-classification-methods-for-titanic

Q5. How do k-fold cross-validation and grid search work on the Social Ads Network data?

Use the references to explain how the two work together to evaluate a model

https://scikit-learn.org/stable/auto_examples/model_selection/plot_grid_search_digits.html

https://sebastianraschka.com/faq/docs/evaluate-a-model.html
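To make the interplay concrete, here is a library-free sketch of the loop that scikit-learn's `GridSearchCV` automates: for every candidate hyperparameter value, score the model with k-fold cross-validation, then keep the value with the best mean score. The toy one-feature data, the tiny KNN classifier, and all function names are assumptions made for illustration; they are not the Social Ads Network data or any library's API.

```python
from statistics import mean


def k_fold_splits(n_samples, n_folds):
    """Yield (train_idx, val_idx); each sample is validated exactly once."""
    for fold in range(n_folds):
        val_idx = [i for i in range(n_samples) if i % n_folds == fold]
        train_idx = [i for i in range(n_samples) if i % n_folds != fold]
        yield train_idx, val_idx


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points closest to `query`."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    votes = [train_y[i] for i in nearest]
    return max(set(votes), key=votes.count)


def cv_accuracy(x, y, k, n_folds):
    """Mean validation accuracy of k-NN over all folds."""
    fold_scores = []
    for train_idx, val_idx in k_fold_splits(len(x), n_folds):
        train_x = [x[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        hits = sum(knn_predict(train_x, train_y, x[i], k) == y[i] for i in val_idx)
        fold_scores.append(hits / len(val_idx))
    return mean(fold_scores)


# Hypothetical one-feature data set with a little label noise.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
y = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1]

# Grid search: evaluate every candidate number of neighbours with 3-fold CV.
# Odd values avoid tied votes in a two-class problem.
param_grid = [1, 3, 5]
scores = {k: cv_accuracy(x, y, k, n_folds=3) for k in param_grid}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Cross-validation supplies the score for each candidate, and grid search only decides which candidates get scored. A separate held-out test set is still needed for a final, unbiased estimate of the selected model.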
