# Model representation

Supervised Learning: Given the "right answer" for each example in the data.

Regression Problem: Predict real-valued output.

Training set of housing prices (Portland, OR):

| Size in feet² (x) | Price ($) in 1000's (y) |
| ----------------- | ----------------------- |
| 2104              | 460                     |
| 1416              | 232                     |
| 1534              | 315                     |
| 852               | 178                     |

Notation:

- m = number of training examples
- x's = "input" variable / features
- y's = "output" variable / "target" variable

In this data set, m = 47.

h: the hypothesis (hypothesis function).

**How do we represent h?**

$h_Θ(x) = Θ_0 + Θ_1x$

This is linear regression with one variable, also called univariate linear regression.

# Cost function

We want to choose $Θ_0, Θ_1$ so that $h_Θ(x)$ is close to $y$ for our training examples:

$minimize\ \frac{1}{2m}\sum^{m}_{i=1}(h_Θ(x^{(i)})-y^{(i)})^2$

where:

$h_Θ(x^{(i)})=Θ_0+Θ_1x^{(i)}$

Defining:

$J(Θ_0,Θ_1)=\frac{1}{2m}\sum^{m}_{i=1}(h_Θ(x^{(i)})-y^{(i)})^2$

the objective is:

$minimize\ J(Θ_0,Θ_1)$

This function is the cost function, also called the squared error function.

# Cost function intuition

Using this simplified definition of the hypothesis and cost function, let's try to understand the cost function concept better. For each value of $Θ_1$, we wind up with a different value of the function J, and we can use this to trace out a plot of J against $Θ_1$. The optimization objective for our learning algorithm is to choose the value of $Θ_1$ that minimizes J.

Back to the original hypothesis with both parameters: the plot of $J(Θ_0,Θ_1)$ is a bowl-shaped surface.

# Gradient descent

Have some function $J(Θ_0,Θ_1)$.

Want $\min J(Θ_0,Θ_1)$.

Outline:

- Start with some $Θ_0,Θ_1$
- Keep changing $Θ_0,Θ_1$ to reduce $J(Θ_0,Θ_1)$ until we hopefully end up at a minimum

## Gradient descent algorithm

Repeat until convergence:

$Θ_j := Θ_j - α\frac{∂}{∂Θ_j}J(Θ_0,Θ_1)$ (for j = 0 and j = 1, updating both simultaneously)

:= denotes assignment.

α is the learning rate; it basically controls how big a step we take downhill with gradient descent.

# Gradient descent intuition

About the derivative: the derivative term is the slope of J at the current point, so subtracting α times the derivative moves the parameter downhill toward a minimum.

About α:

- If α is too small, gradient descent can be slow.
- If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.

If $Θ_1$ is at a local optimum (a local minimum), the derivative equals 0, so in the gradient descent update $Θ_1$ stays unchanged.
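The cost function $J(Θ_0,Θ_1)$ defined above can be sketched in Python for the four training examples from the table (the function and variable names are my own, not from the notes):

```python
def cost(theta0, theta1, xs, ys):
    """Squared error cost: J = (1/2m) * sum((h(x_i) - y_i)^2),
    where the hypothesis is h(x) = theta0 + theta1 * x."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# The four training examples from the table (size in ft^2, price in $1000s).
xs = [2104, 1416, 1534, 852]
ys = [460, 232, 315, 178]

# With theta0 = theta1 = 0, every prediction is 0, so the cost is large;
# better parameter choices drive J toward its minimum.
print(cost(0.0, 0.0, xs, ys))
```

J is zero exactly when the hypothesis passes through every training point, and grows with the squared vertical distances otherwise.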

Gradient descent can converge to a local minimum, even with the learning rate α fixed.

As we approach a local minimum, gradient descent will automatically take smaller steps. So, no need to decrease α over time.
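A minimal sketch of the gradient descent update for univariate linear regression, illustrating convergence with a fixed learning rate α (names and the toy data are my own assumptions):

```python
def gradient_step(theta0, theta1, xs, ys, alpha):
    """One gradient descent step for J(theta0, theta1)."""
    m = len(xs)
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    # Partial derivatives of J with respect to theta0 and theta1.
    d0 = sum(errors) / m
    d1 = sum(e * x for e, x in zip(errors, xs)) / m
    # Simultaneous update: both new values use the old theta0, theta1.
    return theta0 - alpha * d0, theta1 - alpha * d1

# Toy data lying exactly on y = 1 + 2x, so the minimum is at theta = (1, 2).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

t0, t1 = 0.0, 0.0
for _ in range(5000):
    t0, t1 = gradient_step(t0, t1, xs, ys, alpha=0.1)

print(round(t0, 3), round(t1, 3))  # converges close to (1.0, 2.0)
```

Because the derivatives shrink as the parameters approach the minimum, each update `alpha * d0`, `alpha * d1` gets smaller on its own, matching the note that α need not be decreased over time.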