Study Note: Dimension Reduction - PCA, PCR

Dimension Reduction Methods: Subset selection and shrinkage methods all use the original predictors $X_1, X_2, \dots, X_p$. Dimension reduction methods instead transform the predictors and then fit a least squares model using the transformed variables. Approach: let $Z_1, Z_2, \dots, Z_M$ represent $M < p$ linear combinations of our original $p$ predictors. That is, $$ \begin{align} Z_m=\sum_{j=1}^p\phi_{jm}X_j \end{align} $$ ...
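
A minimal sketch of this transform-then-regress idea (principal components regression) with scikit-learn; the toy data, the choice of $M = 2$ components, and the coefficient vector are assumptions for illustration, not from the post:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Toy data: n = 100 observations, p = 5 predictors (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(scale=0.1, size=100)

# PCR: build M < p linear combinations Z_1, ..., Z_M of the predictors,
# then fit ordinary least squares on the Z's instead of the X's.
pcr = make_pipeline(PCA(n_components=2), LinearRegression())
pcr.fit(X, y)

# Rows of PCA.components_ hold the loadings phi_{jm} that define each Z_m.
print(pcr.named_steps["pca"].components_)
```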

June 14, 2019 · 11 min · 2235 words · Me

Study Note: SVM

Maximal Margin Classifier: What is a hyperplane? Hyperplane: in a $p$-dimensional space, a hyperplane is a flat affine subspace of dimension $p − 1$; e.g. in two dimensions, a hyperplane is a flat one-dimensional subspace, in other words, a line. Mathematical definition of a hyperplane: $$ \beta_0+\beta_1X_1+\beta_2X_2+\dots+\beta_pX_p=0, \quad (9.1) $$ Any $X = (X_1, X_2, \dots, X_p)^T$ for which (9.1) holds is a point on the hyperplane. ...
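
A small self-contained sketch of the rule this definition implies (the coefficients below are arbitrary, chosen only for illustration): the sign of $\beta_0 + \beta^T x$ tells which side of the hyperplane a point lies on.

```python
import numpy as np

# Hyperplane beta_0 + beta_1*X_1 + beta_2*X_2 = 0 in p = 2 dimensions
# (arbitrary illustrative coefficients).
beta0 = -1.0
beta = np.array([2.0, 3.0])

def side(x):
    """Return the sign of beta_0 + beta^T x: positive on one side of the
    hyperplane, negative on the other, zero exactly on the hyperplane."""
    return np.sign(beta0 + beta @ x)

print(side(np.array([1.0, 1.0])))   #  1.0 -> one side
print(side(np.array([0.0, 0.0])))   # -1.0 -> the other side
print(side(np.array([0.5, 0.0])))   #  0.0 -> on the hyperplane
```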

June 12, 2019 · 8 min · 1541 words · Me

Study Note: Resampling Methods - Cross Validation, Bootstrap

Resampling methods: involve repeatedly drawing samples from a training set and refitting a model of interest on each sample in order to obtain additional information about the fitted model. Model assessment: the process of evaluating a model’s performance. Model selection: the process of selecting the proper level of flexibility for a model. Cross-validation: can be used to estimate the test error associated with a given statistical learning method in order to evaluate its performance, or to select the appropriate level of flexibility. Bootstrap: provides a measure of accuracy of a parameter estimate or of a given statistical learning method. ...
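
A minimal sketch of both ideas with scikit-learn (the synthetic data, the linear model, the 5 folds, and B = 1000 resamples are all assumptions for illustration):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.utils import resample

X, y = make_regression(n_samples=100, n_features=3, noise=5.0, random_state=0)

# Cross-validation: estimate the test MSE by refitting on each of 5 folds.
mse = -cross_val_score(LinearRegression(), X, y,
                       cv=5, scoring="neg_mean_squared_error")
print("5-fold CV estimate of test MSE:", mse.mean())

# Bootstrap: refit on B training sets drawn with replacement to estimate
# the standard error of a coefficient.
coefs = [LinearRegression().fit(*resample(X, y, random_state=b)).coef_[0]
         for b in range(1000)]
print("bootstrap SE of first coefficient:", np.std(coefs))
```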

June 12, 2019 · 4 min · 819 words · Me

Study Note: Model Selection and Regularization (Ridge & Lasso)

Subset Selection/Adjusted $R^2$/Ridge/Lasso/SVD ...
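
Since the post covers ridge and the lasso, a compact companion sketch (scikit-learn; the data and the penalty strength alpha=1.0 are placeholders, not values from the post): ridge shrinks all coefficients toward zero, while the lasso’s L1 penalty can set some exactly to zero, performing variable selection.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso

# 10 predictors, only 3 of which actually matter (illustrative data).
X, y = make_regression(n_samples=100, n_features=10,
                       n_informative=3, noise=1.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty: shrinks, never zeroes
lasso = Lasso(alpha=1.0).fit(X, y)  # L1 penalty: zeroes some coefficients

print("nonzero ridge coefficients:", np.sum(ridge.coef_ != 0))  # typically 10
print("nonzero lasso coefficients:", np.sum(lasso.coef_ != 0))  # typically ~3
```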

June 11, 2019 · 18 min · 3817 words · Me

Study Note: Comparing Logistic Regression, LDA, QDA, and KNN

Logistic regression and LDA methods are closely connected. Setting: Consider the two-class setting with \(p = 1\) predictor, and let \(p_1(x)\) and \(p_2(x) = 1−p_1(x)\) be the probabilities that the observation \(X = x\) belongs to class 1 and class 2, respectively. In LDA, from $$ \begin{align} p_k(x)=\frac{\pi_k \frac{1}{\sqrt{2\pi}\sigma}\exp{\left( -\frac{1}{2\sigma^2}(x-\mu_k)^2 \right)}}{\sum_{l=1}^K\pi_l\frac{1}{\sqrt{2\pi}\sigma}\exp{\left( -\frac{1}{2\sigma^2}(x-\mu_l)^2 \right)}} \end{align} $$ we obtain the discriminant $$ \begin{align} \delta_k(x)=x\frac{\mu_k}{\sigma^2}-\frac{\mu_k^2}{2\sigma^2}+\log(\pi_k) \end{align} $$ The log odds is given by $$ \begin{align}\log{\frac{p_1(x)}{1-p_1(x)}}=\log{\frac{p_1(x)}{p_2(x)}}=c_0+c_1x \end{align} $$ where \(c_0\) and \(c_1\) are functions of \(\mu_1\), \(\mu_2\), and \(\sigma^2\). In Logistic Regression, $$ \begin{align} \log{\frac{p_1}{1-p_1}}=\beta_0+\beta_1x \end{align} $$ ...
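
The step connecting the two displays, spelled out in the same notation: the common denominator and the $x^2$ terms cancel when taking the difference of the discriminants, giving $$ \begin{align} \log\frac{p_1(x)}{p_2(x)}=\delta_1(x)-\delta_2(x)=\underbrace{\log\frac{\pi_1}{\pi_2}-\frac{\mu_1^2-\mu_2^2}{2\sigma^2}}_{c_0}+\underbrace{\frac{\mu_1-\mu_2}{\sigma^2}}_{c_1}x \end{align} $$ so LDA, like logistic regression, produces a log odds that is linear in $x$; the two methods differ only in how the coefficients are estimated.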

June 10, 2019 · 4 min · 851 words · Me