Fore School Of Management STATS SMAdvance_Stats PROJECT 2

PROBLEM STATEMENT 2: The dataset Education - Post 12th Standard.csv contains the names of various colleges. This case study is based on various parameters of various in ... stitutions. You are expected to perform Principal Component Analysis for this case study according to the instructions given in the following rubric. The data dictionary for 'Education - Post 12th Standard.csv' can be found in the following file: Data Dictionary.xlsx.

2.1) Perform Exploratory Data Analysis [both univariate and multivariate analysis to be performed]. The inferences drawn from this should be properly documented.

Sol. The main purpose of univariate analysis is to describe the data, summarize it, and find patterns in it. Unlike regression, it does not deal with causes or relationships.

1. We start by loading the dataset and checking its shape and the data types of its variables. The shape tells us how many rows and columns the data has, and the data type tells us whether a variable is an object, integer, or float.
2. Then we use the describe function to summarize the data. For each numeric column it reports the count, mean, standard deviation, minimum, quartiles (from which the IQR follows), and maximum.
3. Then we use a distplot or density plot to check normality, that is, whether the data is normally distributed or not. (distplot is deprecated in recent seaborn releases in favour of histplot and displot.)
4. To understand which variables in the dataset are roughly symmetric and which are not, we use skewness. If the skewness is 0, the distribution is symmetric, as a normal distribution is, although zero skewness alone does not prove normality. If it is > 0, the variable is right-skewed (long tail to the right), and if it is < 0, it is left-skewed (long tail to the left).
5. After that, we do the multivariate analysis: we compute the correlation matrix and plot it as a heatmap.

2.2) Scale the variables and write the inference for using the type of scaling function for this case study.

Sol. The main objective of scaling or standardization is to bring all variables onto a common scale. With the z-score used here, each variable is rescaled to mean 0 and standard deviation 1, so that variables measured in large units do not dominate the analysis. It is a data preprocessing step that is applied to the independent variables or features of the data. Scaling also helps to speed up the calculations in an algorithm. Before standardizing, we need to remove the outliers present in the dataset. Standardization cannot be applied to columns that contain strings, so we need to remove the name column and then apply the z-score.

2.3) Comment on the comparison between covariance and the correlation matrix after scaling.

Sol. Correlation is a scaled version of covariance; the two always have the same sign (positive, negative, or 0). When the sign is positive, the variables are said to be positively correlated; when the sign is negative, the variables are said to be negatively correlated; and when the value is 0, the variables are said to be uncorrelated.
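The skewness check, the z-score scaling, and the covariance versus correlation comparison described above can be sketched as follows. This is a minimal illustration and not the manual's own solution: the three columns and their values are synthetic stand-ins with hypothetical names (they are not taken from Education - Post 12th Standard.csv), and every helper uses the population convention (divide by n). Only the Python standard library is used so the arithmetic stays visible.

```python
import math

# Synthetic stand-in data (NOT from Education - Post 12th Standard.csv);
# the column names are hypothetical and chosen only for illustration.
data = {
    "applications": [1200.0, 1500.0, 900.0, 4000.0, 1100.0, 1300.0],
    "enrollment": [300.0, 420.0, 250.0, 900.0, 280.0, 350.0],
    "grad_rate": [80.0, 72.0, 85.0, 55.0, 78.0, 75.0],
}


def mean(xs):
    return sum(xs) / len(xs)


def std(xs):
    """Population standard deviation (divide by n)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def skewness(xs):
    """Population skewness: > 0 means a long right tail, < 0 a long left tail."""
    m, s = mean(xs), std(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


def zscore(xs):
    """Rescale a column to mean 0 and standard deviation 1."""
    m, s = mean(xs), std(xs)
    return [(x - m) / s for x in xs]


def covariance(xs, ys):
    """Population covariance (divide by n)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / (std(xs) * std(ys))


names = list(data)

# Univariate check: the sign of the skewness gives the direction of the tail.
skews = {name: skewness(col) for name, col in data.items()}

# Scaling step: z-score every numeric column.
scaled = {name: zscore(col) for name, col in data.items()}

# Comparison: covariance of the scaled columns vs correlation of the raw ones.
pairs = [(a, b) for a in names for b in names]
cov_scaled = {p: covariance(scaled[p[0]], scaled[p[1]]) for p in pairs}
corr_raw = {p: correlation(data[p[0]], data[p[1]]) for p in pairs}
max_gap = max(abs(cov_scaled[p] - corr_raw[p]) for p in pairs)

for name in names:
    print(f"{name}: skewness = {skews[name]:+.3f}")
print(f"largest |cov(scaled) - corr(raw)| = {max_gap:.2e}")
```

With these conventions the covariance matrix of the z-scored columns matches the correlation matrix of the original columns up to floating-point error, which is why the two matrices look the same after scaling. If the z-scores use the population standard deviation but the covariance divides by n - 1 (a common combination of library defaults), the two matrices differ by the constant factor n / (n - 1) instead.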

Last updated: 2 years ago
