Fore School Of Management STATS SMAdvance_Stats PROJECT 2
PROBLEM STATEMENT 2: The dataset Education - Post 12th Standard.csv contains the names of various colleges. This case study is based on various parameters of various in ... stitutions. You are expected to perform Principal Component Analysis for this case study according to the instructions given in the following rubric. The data dictionary for 'Education - Post 12th Standard.csv' can be found in the following file: Data Dictionary.xlsx.

2.1) Perform Exploratory Data Analysis [both univariate and multivariate analysis to be performed]. The inferences drawn from this should be properly documented.

Sol. The main purpose of univariate analysis is to describe the data, summarize it, and find patterns in one variable at a time. Unlike regression, it does not deal with causes or relationships.

1. We start by loading the dataset and checking its shape and the data types of its variables. The shape tells us how many rows and columns the data has, and the data type tells us whether a variable is an object, integer, or float.
2. Then we use the describe function to summarize the data. It reports the count, mean, standard deviation, minimum, quartiles (from which the IQR follows), and maximum of each numeric column.
3. Then we use a distplot or density plot to check normality, that is, whether the data is normally distributed.
4. To judge which variables are roughly symmetric and which are not, we use skewness. A skewness close to 0 indicates a symmetric distribution, which is consistent with normality but does not prove it. If the skewness is > 0, the variable is right skewed (long tail to the right); if it is < 0, the variable is left skewed (long tail to the left).
5. After that, we do the multivariate analysis, for example by computing the correlation matrix and plotting it as a heatmap.

2.2) Scale the variables and write the inference for using the type of scaling function for this case study.

Sol. The main objective of scaling or standardization is to bring the variables onto a comparable scale, so that no variable dominates only because of its units. It is a data preprocessing step that is applied to the independent variables or features of the data.
Another benefit of scaling is that it helps speed up the calculations in some algorithms. Before standardizing, we need to treat the outliers present in the dataset, because extreme values distort the mean and standard deviation that the z-score relies on. Standardization is not possible on columns that hold strings, so we need to drop the name column and then apply the z-score, z = (x - mean) / standard deviation, to each remaining column.

2.3) Comment on the comparison between covariance and the correlation matrix after scaling.

Sol. Correlation is a scaled version of covariance; note that the two always have the same sign (positive, negative, or 0). When the sign is positive, the variables are said to be positively correlated; when the sign is negative, the variables are said to be negatively correlated; and when the value is 0, the variables are said to be uncorrelated.
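The skewness rule from step 4 of 2.1 can be checked with a small sketch. It does not use the case-study file; the three arrays are synthetic stand-ins, and the helper name `skewness` is our own. It computes the population skewness as the mean of the cubed z-scores, which is one common definition and may differ slightly from the bias-corrected estimate some libraries report.

```python
import numpy as np


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    x = np.asarray(values, dtype=float)
    z = (x - x.mean()) / x.std()  # x.std() uses ddof=0 (population std)
    return float(np.mean(z ** 3))


# Synthetic stand-ins, not columns from the case-study dataset.
rng = np.random.default_rng(0)
right_tailed = rng.exponential(scale=1.0, size=5000)  # long tail to the right
left_tailed = -right_tailed                           # mirror image
symmetric = rng.normal(size=5000)                     # roughly bell shaped

for name, col in [("right_tailed", right_tailed),
                  ("left_tailed", left_tailed),
                  ("symmetric", symmetric)]:
    print(f"{name}: skewness = {skewness(col):+.2f}")
```

The right-tailed sample gives a positive value, its mirror image a negative value, and the symmetric sample a value near zero. A value near zero only says the distribution is symmetric, so a density plot is still worth looking at.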
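For 2.2 and 2.3, a minimal sketch of what z-score scaling does to the covariance matrix may help. It again uses synthetic data; the column names `apps`, `accept`, and `grad_rate` are hypothetical placeholders rather than names taken from the data dictionary. After each column is divided by its sample standard deviation, every variable has unit variance, so the covariance matrix of the scaled data should coincide with the correlation matrix of the original data.

```python
import numpy as np

# Hypothetical numeric columns on very different scales (synthetic data).
rng = np.random.default_rng(42)
apps = rng.normal(3000, 1500, size=200)
accept = 0.7 * apps + rng.normal(0, 300, size=200)
grad_rate = rng.normal(65, 15, size=200)
X = np.column_stack([apps, accept, grad_rate])

# z-score: subtract the column mean, divide by the sample std (ddof=1).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

cov_raw = np.cov(X, rowvar=False)        # depends on the units of each column
corr_raw = np.corrcoef(X, rowvar=False)  # unit free, diagonal of ones
cov_scaled = np.cov(Z, rowvar=False)     # covariance after scaling

print("raw covariance equals correlation:   ", np.allclose(cov_raw, corr_raw))
print("scaled covariance equals correlation:", np.allclose(cov_scaled, corr_raw))
```

If the scaling step divides by the population standard deviation (ddof=0) instead, the covariance matrix of the scaled data differs from the correlation matrix by a constant factor of n/(n-1), which is negligible for a dataset with several hundred rows.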
About the document
Uploaded on: Jul 19, 2021
Number of pages: 11