SYE Midterm 1 Notes:
Week 1 Classification:
- Two main types of classifiers:
o Hard Classifier: A classifier that perfectly separates data into 2 (or more) correct classes.
This type of classifier is rigid and is only applicable to perfectly separable datasets.
o Soft Classifier: A classifier that does not separate the data into perfectly correct
classes. This type is used when a dataset is not realistically separable by class, so we
use a more flexible classifier that gives us not a perfect, but an optimal solution given the
constraint of non-separability.
- If a given model uses a soft classifier, we can further tune it to fit our needs based on factors
such as the cost of misclassification (see the sketch below).
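o A minimal sketch (not from the notes; it assumes scikit-learn and made-up toy data) of how the cost of misclassification can be tuned through a soft classifier's penalty parameter C:
from sklearn.svm import SVC
X = [[0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [0.0, 1.2]]  # toy observations (not separable)
y = [-1, -1, 1, 1]                                    # class labels
# A large C makes misclassification expensive (closer to a hard classifier);
# a small C tolerates more margin violations.
lenient = SVC(kernel="linear", C=0.1).fit(X, y)
strict = SVC(kernel="linear", C=100.0).fit(X, y)
print(lenient.predict([[0.5, 0.5]]), strict.predict([[0.5, 0.5]]))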
- Semantics on the types of data:
o Columns: Attribute / Features / Response / Covariates / Predictor
o Rows: Observations / Data point
o Structured Data: data that is described and stored in a structured way
o Unstructured Data: data that cannot be easily described or stored in a structured way. The
most common example is written language.
o Quantitative Values: Numbers that have meaning in a numerical sense
o Categorical Data: values that represent a category; they may be numeric codes or
non-numeric labels
o Unrelated Data: No relationships between datapoints
o Timeseries Data: the same data recorded over time, typically at regular intervals
- Hyperplane: In a p-dimensional coordinate space, a hyperplane is a flat affine subspace of
dimension (p - 1). In this context, affine indicates that the subspace need not pass through the
origin. Its equation is written in general linear form.
o Equation of p dimensional general form:
β0 + β1X1 + β2X2 + … + βpXp = 0
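o For example, with p = 2 and coefficients β0 = -4, β1 = 1, β2 = 2, the hyperplane is the line
X1 + 2X2 - 4 = 0; the point (1, 1) gives 1 + 2 - 4 = -1 < 0, so it lies on the negative side of the plane.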
- Separating Hyperplane for Linearly Separable Data: Given a set of data points with signed class
labels (yi = ±1) whose classes are perfectly linearly separable, there will be an infinite number of
hyperplane equations that classify all points correctly. This is demonstrated in the constraints
below. If a point lies above the hyperplane, the equation evaluates to a positive value; if it lies
below the separating hyperplane, it evaluates to a value less than zero. Furthermore, if a class is
correctly predicted, the result of the equation shares the sign of its respective yi value. Given
these two properties of the hyperplane, we can infer the following constraints.
If yi = 1: β0 + β1X1 + β2X2 + … + βpXp > 0
If yi = -1: β0 + β1X1 + β2X2 + … + βpXp < 0
o We can equivalently rewrite these two inequalities as a single property of the
hyperplane by multiplying by the class label. This multiplication changes nothing because,
when every point is classified without error, the sign of the equation always matches the sign of yi.
yi (β0 + β1X1 + β2X2 + … + βpXp) > 0
o The above equation/constraint verifies that all points are properly classified.
o There will be many linear equations that satisfy this constraint, so we need to narrow
this solution space. To select the best one, we refine the optimization problem's
constraints to find the hyperplane that separates the classes by the maximum amount,
which we will call the margin (a checking sketch follows below).
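o A minimal sketch (not part of the notes; it assumes NumPy and uses made-up coefficients and points) that checks the combined constraint yi (β0 + β1X1 + … + βpXp) > 0 for every labeled point:
import numpy as np
beta0, beta = -1.0, np.array([1.0, 1.0])            # candidate hyperplane X1 + X2 - 1 = 0
X = np.array([[0.0, 0.0], [0.2, 0.3], [1.0, 1.0], [0.8, 0.9]])
y = np.array([-1, -1, 1, 1])
scores = beta0 + X @ beta                           # signed value of the equation at each point
separates = np.all(y * scores > 0)                  # True only if every sign matches its class
print(scores, separates)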
- Margin: In the case of SVM, we define the margin as the distance between the classes and their
decision boundary. Consider the previous hyperplane equation together with two parallel lines
equidistant from it, separated from one another by a distance M. This distance M can be calculated
using the formula for the distance between parallel lines:
o M = 2 / sqrt(∑ ai²)
o In this case we want to expand the margin as much as possible so that there is
maximum separation between the classes. This optimization is carried out by minimizing the
sum of the squared coefficients, which (up to the square root) is the denominator of M. By
minimizing the sum of squared coefficients we maximize the distance between the classes,
while the previous constraint continues to enforce that all points are correctly classified.
This yields a maximal margin classifier (a worked example follows below).
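o For example, if the coefficients are a1 = 3 and a2 = 4, then ∑ ai² = 25 and M = 2 / sqrt(25) = 0.4;
scaling the coefficients down to a1 = 0.3 and a2 = 0.4 gives ∑ ai² = 0.25 and M = 2 / sqrt(0.25) = 4,
so smaller squared coefficients mean a wider margin.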
- Maximal Margin Classifier: The hyperplane that optimally separates the points into the correct
classes with the greatest margin around the decision boundary:
o Maximize: Margin M = 2 / sqrt(∑ ai²)
o By minimizing: ∑ ai²
o Or by setting a constraint: ∑ ai² = 1
o Constrained by: yi (β0 + β1X1 + β2X2 + … + βpXp) ≥ M
o Collectively, this optimization problem requires that every point be classified correctly and
lie at least the margin distance M away from the decision boundary, with M made as large as
possible.
o The points that lie on the margin boundaries are the support vectors; their distance from the
decision boundary equals the maximized margin (see the sketch below).
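o A minimal sketch (not part of the notes; it assumes scikit-learn and made-up separable data) that approximates the maximal margin classifier by fitting a linear SVC with a very large C, then reading the margin width 2 / sqrt(∑ ai²) off the fitted coefficients:
import numpy as np
from sklearn.svm import SVC
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 2.0]])
y = np.array([-1, -1, 1, 1])
clf = SVC(kernel="linear", C=1e6).fit(X, y)         # huge C approximates a hard margin
w = clf.coef_[0]                                    # fitted coefficients
margin = 2.0 / np.sqrt(np.sum(w ** 2))              # width between the two parallel margin lines
print("support vectors:", clf.support_vectors_)
print("margin width:", margin)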
- Support Vector Machine: In this variant of a maximal margin classifier, we face a dataset with
non-linearly separable classes. Due to this