ISYE 6501 - Homework 2
Jacob Wilson
September 6, 2018
Question 3.1(a) – KKNN with Cross Validation
Answer:
In this problem, I used the “caret” library and its train function with the “kknn” method to
perform 10-fold cross-validation. The results are as follows:
kmax  Accuracy   Kappa
5     0.8304588  0.6538483
7     0.8344980  0.6633484
9     0.8344980  0.6634414
11    0.8354980  0.6653672
13    0.8354980  0.6653672
15    0.8354980  0.6653672
17    0.8360108  0.6663338
19    0.8360108  0.6663338
21    0.8360108  0.6663338
23    0.8360108  0.6663338
The highest accuracy (0.8360108) is shared by kmax = 17, 19, 21, and 23. I chose k=23 because it
ties for the highest accuracy and classifies each point against the most neighbors, which
reduces the risk of a few nearby points deciding the class. After retraining the
model with k=23, accuracy was 84.3%.
RStudio Script:
install.packages("kknn", repos = "http://cran.rstudio.com/")
## Installing package into 'C:/Users/jacob/Documents/R/win-library/3.5'
## (as 'lib' is unspecified)
## package 'kknn' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\jacob\AppData\Local\Temp\Rtmp29XlTm\downloaded_packages
install.packages("caret", repos = "http://cran.rstudio.com/")
## Installing package into 'C:/Users/jacob/Documents/R/win-library/3.5'
## (as 'lib' is unspecified)
## package 'caret' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\jacob\AppData\Local\Temp\Rtmp29XlTm\downloaded_packages
install.packages("e1071", repos= "http://cran.rstudio.com/")
## Installing package into 'C:/Users/jacob/Documents/R/win-library/3.5'
## (as 'lib' is unspecified)
## package 'e1071' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\jacob\AppData\Local\Temp\Rtmp29XlTm\downloaded_packages
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
library(e1071)
library(kknn)
##
## Attaching package: 'kknn'
## The following object is masked from 'package:caret':
##
## contr.dummy
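Setting the working directory, importing the data from “credit_card_data-headers”, and setting
a seed so the results are repeatable. The data is then split 60/40 into training and test
sets, with columns 1 to 10 as the predictors and column 11 as the response.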
setwd("C:/Users/jacob/Desktop/ISYE 6501/Homework 1 - 30 AUG 2018")
cc_data <- read.table("credit_card_data-headers.txt", header = TRUE)
set.seed(313)
dp <- createDataPartition(cc_data$R1, p = 0.6, list = FALSE)
train <- cc_data[dp,]
trainX <- cc_data[dp, 1:10]
trainY <- as.factor(cc_data[dp, 11])
test <- cc_data[-dp,]
testX <- cc_data[-dp, 1:10]
testY <- as.factor(cc_data[-dp, 11])
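To illustrate the idea behind a class-balanced 60/40 split like the one above, here is a minimal base R sketch on synthetic labels. The helper name stratified_split and the toy labels are assumptions for illustration only. This is not caret's implementation of createDataPartition, just an approximation of sampling within each class.

```r
# Hypothetical helper: sample a fraction p of the row indices within each class,
# so the class balance of the training subset roughly matches the full data.
stratified_split <- function(y, p = 0.6) {
  idx_by_class <- split(seq_along(y), y)
  picked <- lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), size = round(p * length(idx)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(313)
# Toy binary labels (assumed data, not the credit card file)
y <- rep(c(0, 1), times = c(60, 40))
train_idx <- stratified_split(y, p = 0.6)

# Compare class proportions in the full data and in the training subset
prop_all   <- prop.table(table(y))
prop_train <- prop.table(table(y[train_idx]))
print(prop_all)
print(prop_train)
```

If the two printed proportion tables agree, the split has preserved the class balance, which is the property I am relying on when comparing training and test accuracy.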
The control sets the parameters for the k-fold cross-validation. This is 10-fold
cross-validation repeated 5 times (number = 10, repeats = 5).
control <- trainControl(method = "repeatedcv", number = 10, repeats = 5)
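As a rough picture of what repeated 10-fold cross-validation means for the resampling, the following base R sketch builds the fold assignments by hand. The helper make_folds is a hypothetical name, and the assignment here is plain random (not stratified), so it is a simplification rather than what caret does internally. The value 393 is the training-set size reported in the model output.

```r
# Hypothetical helper: assign each of n rows to one of k folds at random.
make_folds <- function(n, k = 10) {
  sample(rep(seq_len(k), length.out = n))
}

set.seed(313)
n <- 393       # training-set size reported in the model output
k <- 10        # number of folds
repeats <- 5   # matches repeats = 5 in trainControl

# One fold-assignment vector per repeat
fold_ids <- replicate(repeats, make_folds(n, k), simplify = FALSE)

# Each repeat holds out k different subsets, giving k * repeats resamples in total
n_resamples <- k * repeats
fold_sizes <- table(fold_ids[[1]])
print(n_resamples)
print(fold_sizes)
```

Each candidate value of k is therefore scored on 50 held-out subsets, and the reported accuracy is an average over those resamples.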
Training the model with the caret package.
Note: Explaining the inputs - train(predictors, class labels, method = the “kknn” model,
preProcess = scales each predictor to the range 0 to 1, tuneLength = tries 10 candidate
values of k and uses the best to train the final model, trControl = applies the
control established above)
model_knn <- train(trainX, trainY, method = "kknn", preProcess = c("range"),
                   tuneLength = 10, trControl = control)
print(model_knn)
## k-Nearest Neighbors
##
## 393 samples
## 10 predictor
## 2 classes: '0', '1'
##
## Pre-processing: re-scaling to [0
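The “range” pre-processing in the train call rescales each predictor to [0, 1]. A minimal base R sketch of that min-max scaling is below. The helper names fit_range and apply_range and the toy matrices are assumptions for illustration, not caret functions. The sketch learns the minimum and maximum from the training data only and then reuses them on the test data.

```r
# Hypothetical helper: learn the per-column min and max from the training data.
fit_range <- function(x) {
  list(min = apply(x, 2, min), max = apply(x, 2, max))
}

# Hypothetical helper: apply (x - min) / (max - min) column by column.
# Note: a constant column (max == min) would divide by zero in this sketch.
apply_range <- function(x, params) {
  span <- params$max - params$min
  sweep(sweep(x, 2, params$min, "-"), 2, span, "/")
}

# Toy numeric data (assumed values, not the credit card data)
train_x <- cbind(a = c(1, 3, 5), b = c(10, 20, 40))
test_x  <- cbind(a = c(2, 6),    b = c(25, 10))

params <- fit_range(train_x)
train_scaled <- apply_range(train_x, params)
test_scaled  <- apply_range(test_x, params)

print(train_scaled)  # each training column spans exactly 0 to 1
print(test_scaled)   # test values can fall outside [0, 1]
```

Scaling matters for k-nearest neighbors because the distance calculation would otherwise be dominated by whichever predictor has the largest numeric range.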