Apache Spark: Basic Concepts
Study note of Big Data Essentials: HDFS, MapReduce and Spark RDD ...
Simple Linear Regression Models

Linear Regression Model

Form of the linear regression model: $y=\beta_{0}+\beta_{1}X+\epsilon$.

Training data: $(x_1,y_1), \ldots, (x_N,y_N)$. Each $x_{i}=(x_{i1},x_{i2},\ldots,x_{ip})^{T}$ is a vector of feature measurements for the $i$-th case.

Goal: estimate the parameters $\beta$.

Estimation method: least squares. We pick the coefficients $\beta=(\beta_0,\beta_1,\ldots,\beta_p)^{T}$ to minimize the residual sum of squares.

Assumptions: the observations $y_i$ are uncorrelated and have constant variance $\sigma^2$; the $x_i$ are fixed (non-random); and the regression function $E(Y|X)$ is linear, or the linear model is a reasonable approximation. ...
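To make the least squares step concrete, here is a minimal sketch for the single-predictor model $y=\beta_{0}+\beta_{1}X+\epsilon$. It uses the standard closed-form solution $\hat\beta_1 = \sum_i (x_i-\bar{x})(y_i-\bar{y}) / \sum_i (x_i-\bar{x})^2$ and $\hat\beta_0 = \bar{y}-\hat\beta_1\bar{x}$, which minimizes the residual sum of squares. The function names and the toy data are assumptions made for illustration; they do not come from the original note.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (beta0, beta1) minimizing the residual sum of squares.

    Hypothetical helper for the one-predictor model y = beta0 + beta1 * x + eps.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    x_mean = sum(xs) / n
    y_mean = sum(ys) / n

    # Centered sums of squares and cross-products.
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))

    beta1 = s_xy / s_xx               # slope estimate
    beta0 = y_mean - beta1 * x_mean   # intercept estimate
    return beta0, beta1


def residual_sum_of_squares(xs, ys, beta0, beta1):
    """RSS = sum over i of (y_i - beta0 - beta1 * x_i)^2."""
    return sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))


if __name__ == "__main__":
    # Toy, noise-free data generated from y = 1 + 2x (illustrative only).
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [1.0, 3.0, 5.0, 7.0]
    beta0, beta1 = fit_simple_linear_regression(xs, ys)
    print("beta0 =", beta0, "beta1 =", beta1)
    print("RSS =", residual_sum_of_squares(xs, ys, beta0, beta1))
```

With $p$ features per case the same idea carries over, but the coefficients are usually obtained by solving the normal equations or calling a linear algebra routine rather than from this scalar formula.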