Homework #1, edX GTx: ISYE6501x - Introduction to Analytics Modeling
Mónica Rojas, May 17, 2020, Georgia Tech


Table of Contents
Results
Question 2.1
Question 2.2
  Part 1
  Part 2
  Part 3
Question 3.1
  Part a
  Part b

Results

Question 2.1

Describe a situation or problem from your job, everyday life, current events, etc., for which a classification model would be appropriate. List some (up to 5) predictors that you might use.

In my case, a classification model would be appropriate to determine which clients will close their accounts. I work for a bank, where it is important to keep our clients happy with our service and avoid churn.
Attracting new customers is more expensive than keeping existing ones. Some of the predictors I identified are:
- Banking predictors:
  - Account balance
  - Time as a customer (tenure)
  - Number of products held
  - Recent complaints
- Demographic predictors:
  - Profession
  - Marital status
  - Customer type (company or not)
  - Gender

Question 2.2

The files credit_card_data.txt (without headers) and credit_card_data-headers.txt (with headers) contain a dataset with 654 data points, 6 continuous and 4 binary predictor variables. It has anonymized credit card applications with a binary response variable (last column) indicating if the application was positive or negative. The dataset is the “Credit Approval Data Set” from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Credit+Approval) without the categorical variables and without data points that have missing values.

Part 1

1. Using the support vector machine function ksvm contained in the R package kernlab, find a good classifier for this data. Show the equation of your classifier, and how well it classifies the data points in the full data set. (Don’t worry about test/validation data yet; we’ll cover that topic soon.)

Notes on ksvm
• You can use scaled=TRUE to get ksvm to scale the data as part of calculating a classifier.
• The term λ we used in the SVM lesson to trade off the two components of correctness and margin is called C in ksvm. One of the challenges of this homework is to find a value of C that works well; for many values of C, almost all predictions will be “yes” or almost all predictions will be “no”.
• ksvm does not directly return the coefficients a0 and a1…am. Instead, you need to do the last step of the calculation yourself. Here’s an example of the steps to take (assuming your data is stored in a matrix called data):

# call ksvm. Vanilladot is a simple linear kernel.
model <- ksvm(data[,1:10],data[,11],type="C-svc",kernel="vanilladot",C=100,scaled=TRUE)
# calculate a1…am
a <- colSums(model@xmatrix[[1]] * model@coef[[1]])
a
# calculate a0
a0 <- -model@b
a0
# see what the model predicts
pred <- predict(model,data[,1:10])
pred
# see what fraction of the model’s predictions match the actual classification
sum(pred == data[,11]) / nrow(data)

I know I said I wouldn’t give you exact R code to copy, because I want you to learn for yourself. In general, that’s definitely true, but in this case, because it’s your first R assignment and because the ksvm function leaves you in the middle of a mathematical calculation that we haven’t gotten into in this course, I’m giving you the code.

Hint: You might want to view the predictions your model makes; if C is too large or too small, they’ll almost all be the same (all zero or all one) and the predictive value of the model will be poor. Even finding the right order of magnitude for C might take a little trial and error.

Note: If you get the error “Error in vanilladot(length = 4, lambda = 0.5) : unused arguments (length = 4, lambda = 0.5)”, it means you need to convert data into matrix format:
model <- ksvm(as.matrix(data[,1:10]),as.factor(data[,11]),type="C-svc",kernel="vanilladot",C=100,scaled=TRUE)

# Load kernlab without printing startup messages or warnings
suppressMessages(suppressWarnings({ library(kernlab) }))
#setwd('E:/monik.ro.es/Cursos/EDX Micromaster GT/1.Introduction to Analytics Modeling/HW/1. Due_Date_21_mayo/week_1_data-summer')
# Read the credit card data (file with headers)
CC <- suppressWarnings(read.delim("E:/monik.ro.es/Cursos/EDX Micromaster GT/1.Introduction to Analytics Modeling/HW/1. Due_Date_21_mayo/week_1_data-summer/data2_2/credit_card_data-headers.txt", header = TRUE))
# Wrapping CC as text = CC prefixes every column name with "text." (text.A1, text.A2, ...)
data <- data.frame(text = CC, stringsAsFactors = FALSE)
class(data)
## [1] "data.frame"

Data description

str(data)
## 'data.frame': 654 obs. of 11 variables:
##  $ text.A1 : int 1 0 0 1 1 1 1 0 1 1 ...
##  $ text.A2 : num 30.8 58.7 24.5 27.8 20.2 ...
##  $ text.A3 : num 0 4.46 0.5 1.54 5.62 ...
##  $ text.A8 : num 1.25 3.04 1.5 3.75 1.71 ...
##  $ text.A9 : int 1 1 1 1 1 1 1 1 1 1 ...
##  $ text.A10: int 0 0 1 0 1 1 1 1 1 1 ...
##  $ text.A11: int 1 6 0 5 0 0 0 0 0 0 ...
##  $ text.A12: int 1 1 1 0 1 0 0 1 1 0 ...
##  $ text.A14: int 202 43 280 100 120 360 164 80 180 52 ...
##  $ text.A15: int 0 560 824 3 0 0 31285 1349 314 1442 ...
##  $ text.R1 : int 1 1 1 1 1 1 1 1 1 1 ...
summary(data)
##     text.A1          text.A2         text.A3          text.A8      
##  Min.   :0.0000   Min.   :13.75   Min.   : 0.000   Min.   : 0.000  
##  1st Qu.:0.0000   1st Qu.:22.58   1st Qu.: 1.040   1st Qu.: 0.165  
##  Median :1.0000   Median :28.46   Median : 2.855   Median : 1.000  
##  Mean   :0.6896   Mean   :31.58   Mean   : 4.831   Mean   : 2.242  
##  3rd Qu.:1.0000   3rd Qu.:38.25   3rd Qu.: 7.438   3rd Qu.: 2.615  
##  Max.   :1.0000   Max.   :80.25   Max.   :28.000   Max.   :28.500  
##     text.A9          text.A10         text.A11         text.A12     
##  Min.   :0.0000   Min.   :0.0000   Min.   : 0.000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.: 0.000   1st Qu.:0.0000  
##  Median :1.0000   Median :1.0000   Median : 0.000   Median :1.0000  
##  Mean   :0.5352   Mean   :0.5612   Mean   : 2.498   Mean   :0.5382  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.: 3.000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :67.000   Max.   :1.0000  
##     text.A14          text.A15          text.R1      
##  Min.   :   0.00   Min.   :     0   Min.   :0.0000  
##  1st Qu.:  70.75   1st Qu.:     0   1st Qu.:0.0000  
##  Median : 160.00   Median :     5   Median :0.0000  
##  Mean   : 180.08   Mean   :  1013   Mean   :0.4526  
##  3rd Qu.: 271.00   3rd Qu.:   399   3rd Qu.:1.0000  
##  Max.   :2000.00   Max.   :100000   Max.   :1.0000  

SVM model

# call ksvm. Vanilladot is a simple linear kernel.
#model <- ksvm(data[,1:10],data[,11],type="C-svc",kernel="vanilladot",C=100,scaled=TRUE)
#Note: If you get the error “Error in vanilladot(length = 4, lambda = 0.5) : unused arguments (length = 4, lambda = 0.5)”, it means you need to convert data into matrix format:
model <- kernlab::ksvm(as.matrix(data[,1:10]), as.factor(data[,11]), type = "C-svc", kernel = "vanilladot", C = 100, scaled = TRUE)
## Setting default kernel parameters
# Calculate a1…am
a <- colSums(model@xmatrix[[1]] * model@coef[[1]])
a
##       text.A1       text.A2       text.A3       text.A8       text.A9 
## -0.0010065348 -0.0011729048 -0.0016261967  0.0030064203  1.0049405641 
##      text.A10      text.A11      text.A12      text.A14      text.A15 
## -0.0028259432  0.0002600295 -0.0005349551 -0.0012283758  0.1063633995 
# Calculate a0
a0 <- -model@b
a0
## [1] 0.08158492
# See what the model predicts
pred <- predict(model,data[,1:10])
pred
##   [1] 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [38] 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
##  [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [112] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [149] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [186] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
## [223] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
## [260] 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
## [297] 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
## [334] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [371] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
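Part 1 also asks for the equation of the classifier. Substituting the a and a0 values printed above into a0 + a1x1 + … + amxm = 0 gives the separating hyperplane below. This rests on one assumption. Because the model was fit with scaled = TRUE, the coefficients are taken to apply to standardized predictors z_j (each column centred by its mean and divided by its standard deviation), not to the raw columns. Which side of the hyperplane corresponds to class 1 should be confirmed against predict().

```latex
\begin{aligned}
0.08158492
&- 0.0010065348\, z_{A1} - 0.0011729048\, z_{A2} - 0.0016261967\, z_{A3} \\
&+ 0.0030064203\, z_{A8} + 1.0049405641\, z_{A9} - 0.0028259432\, z_{A10} \\
&+ 0.0002600295\, z_{A11} - 0.0005349551\, z_{A12} - 0.0012283758\, z_{A14} \\
&+ 0.1063633995\, z_{A15} = 0,
\qquad z_j = \frac{x_j - \bar{x}_j}{s_j}
\end{aligned}
```

Written this way, A9 (coefficient about 1.005) and A15 (about 0.106) carry almost all of the weight. Every other coefficient is below 0.004 in absolute value.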

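The ksvm notes say the last step is left to us. For a linear (vanilladot) kernel, the weight vector is the sum of the support vectors, each multiplied by its coefficient. The kernlab documentation describes coef as the coefficients times the training labels (α_i·y_i). This is what colSums(model@xmatrix[[1]] * model@coef[[1]]) computes.

The base-R sketch below reproduces that step on a small, made-up support-vector matrix (sv, sv_coef and b are hypothetical stand-ins, not values from the credit card data). It then scores points with a0 + a·x and measures agreement the same way as sum(pred == data[,11]) / nrow(data). Mapping a positive score to class 1 is an assumption of this sketch.

```r
# Hypothetical support vectors (rows) and their coefficients alpha_i * y_i,
# standing in for model@xmatrix[[1]] and model@coef[[1]]
sv <- matrix(c( 1.0,  0.5,
               -0.5,  1.0,
                0.2, -1.0), ncol = 2, byrow = TRUE)
sv_coef <- c(0.8, -0.3, -0.5)
b <- 0.1                            # hypothetical stand-in for model@b

# Weight vector: R recycles sv_coef down each column, so row i is scaled
# by sv_coef[i]; summing the scaled rows gives a1..am
a  <- colSums(sv * sv_coef)
a0 <- -b                            # intercept, following a0 <- -model@b

# Same result as the matrix product t(sv) %*% sv_coef
stopifnot(isTRUE(all.equal(a, as.vector(t(sv) %*% sv_coef))))

# Score new points with a0 + a . x and classify by sign
# (assumption of this sketch: positive score -> class 1)
x <- matrix(c( 1,  0,
               0,  1,
              -1,  0,
               0, -1), ncol = 2, byrow = TRUE)
y <- c(1, 1, 0, 1)                  # hypothetical true labels
score <- as.vector(x %*% a) + a0
pred  <- ifelse(score > 0, 1, 0)

# Fraction of predictions that match the labels
accuracy <- sum(pred == y) / length(y)
print(a)
print(pred)
print(accuracy)
```

On the real model, the same arithmetic with the a and a0 printed above, applied to the scaled predictors, should reproduce the labels returned by predict(). Comparing the two is a quick way to check the sign convention.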