Linear regression with one variable



Model representation

Supervised Learning: Given the “right answer” for each example in the data.

Regression Problem: Predict a real-valued output.

Training set of housing prices (Portland, OR):

| Size in feet² (x) | Price ($) in 1000’s (y) |
| --- | --- |
| 2104 | 460 |
| 1416 | 232 |
| 1534 | 315 |
| 852 | 178 |

Notation: m = Number of training examples

x’s = “input” variable / features

y’s = “output” variable / “target” variable

In this data set, m = 47 (only the first four rows are shown above).
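As a minimal sketch, the notation maps to code like this (using the four rows shown above as a toy subset of the full data set):

```python
# The four example rows from the table above, as parallel lists.
x = [2104, 1416, 1534, 852]  # size in feet^2 ("input" variable)
y = [460, 232, 315, 178]     # price in $1000's ("target" variable)

m = len(x)  # number of training examples in this subset
print(m)    # 4 (the full data set has m = 47)
```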



**How do we represent h?**

$h_\Theta(x) = \Theta_0 + \Theta_1 x$


Linear regression with one variable is also called univariate linear regression.
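The hypothesis $h_\Theta(x) = \Theta_0 + \Theta_1 x$ is just a straight line. A minimal sketch (the parameter values here are made up purely for illustration):

```python
def h(theta0, theta1, x):
    """Univariate linear regression hypothesis: a straight line."""
    return theta0 + theta1 * x

# Hypothetical parameters: intercept 50, slope 0.2 (in $1000's per ft^2).
print(h(50, 0.2, 2104))  # predicted price for a 2104 ft^2 house, about 470.8
```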

Cost function



$\underset{\Theta_0,\Theta_1}{\text{minimize}}\ \frac{1}{2m}\sum_{i=1}^{m}\left(h_\Theta(x^{(i)})-y^{(i)}\right)^2$

where:

$h_\Theta(x^{(i)}) = \Theta_0 + \Theta_1 x^{(i)}$

and we define the cost function:

$J(\Theta_0,\Theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\Theta(x^{(i)})-y^{(i)}\right)^2$

which means the objective is:

$\underset{\Theta_0,\Theta_1}{\text{minimize}}\ J(\Theta_0,\Theta_1)$

This function J is the cost function, also called the squared error function.
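The cost can be computed directly from its definition; a minimal sketch using the four training rows shown earlier:

```python
def compute_cost(theta0, theta1, xs, ys):
    """J(theta0, theta1) = (1/2m) * sum of squared prediction errors."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

xs = [2104, 1416, 1534, 852]
ys = [460, 232, 315, 178]
print(compute_cost(0, 0, xs, ys))  # cost of the all-zero hypothesis: 49541.625
```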

Cost function intuition

Using a simplified hypothesis (setting $\Theta_0 = 0$, so the cost depends on $\Theta_1$ alone), let’s try to understand the cost function concept better.


For each value of Θ, we wind up with a different value of the function J, and we can use this to trace out a plot of J against Θ.


The optimization objective for our learning algorithm is to choose the value of Θ that minimizes J.

Back to the original function


When we use both parameters, the plot of $J(\Theta_0,\Theta_1)$ becomes a 3D bowl-shaped surface.

Gradient descent


Have some function $J(Θ_0,Θ_1)$

Want $\min_{\Theta_0,\Theta_1} J(\Theta_0,\Theta_1)$


Gradient descent algorithm:

repeat until convergence {

$\Theta_j := \Theta_j - \alpha \frac{\partial}{\partial \Theta_j} J(\Theta_0,\Theta_1)$ (simultaneously update for $j = 0$ and $j = 1$)

}
:= denotes assignment.

α is the learning rate; it controls how big a step we take downhill with gradient descent.
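One iteration of the update rule can be sketched as follows. For the squared-error cost J, the partial derivatives work out to the averaged error sums below (this is a sketch; the derivation is in the linear-regression section):

```python
def gradient_descent_step(theta0, theta1, alpha, xs, ys):
    """One simultaneous gradient descent update for the squared-error cost."""
    m = len(xs)
    # Partial derivatives of J with respect to theta0 and theta1.
    d0 = sum(theta0 + theta1 * x - y for x, y in zip(xs, ys)) / m
    d1 = sum((theta0 + theta1 * x - y) * x for x, y in zip(xs, ys)) / m
    # Simultaneous update: both new values are computed before either is assigned.
    return theta0 - alpha * d0, theta1 - alpha * d1

# Toy data on the line y = 2x + 1: at the minimum the derivatives are 0,
# so the update leaves the parameters unchanged.
print(gradient_descent_step(1.0, 2.0, 0.1, [0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```

Note the simultaneous update: both derivatives are evaluated at the old parameter values before either parameter is overwritten.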

Gradient descent intuition

About the derivative term: when the derivative is positive, the update decreases $\Theta_1$; when it is negative, the update increases $\Theta_1$. In both cases $\Theta_1$ moves toward the minimum.


About α

If α is too small, gradient descent can be slow.

If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.

If $\Theta_1$ is already at a local optimum (a local minimum), the derivative equals 0, so the gradient descent update leaves $\Theta_1$ unchanged.

Gradient descent can converge to a local minimum, even with the learning rate α fixed.

As we approach a local minimum, the derivative gets smaller, so gradient descent automatically takes smaller steps. There is no need to decrease α over time.



Gradient descent for linear regression

Gradient descent algorithm (with the derivatives of J worked out for linear regression):

repeat until convergence {

$\Theta_0 := \Theta_0 - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\Theta(x^{(i)})-y^{(i)}\right)$

$\Theta_1 := \Theta_1 - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\Theta(x^{(i)})-y^{(i)}\right)x^{(i)}$

}
The cost function for linear regression is always a convex function, so it has a single global minimum and no other local optima.

“Batch” Gradient Descent

“Batch”: Each step of gradient descent uses all the training examples.
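The whole procedure can be sketched as a short batch gradient descent loop: every iteration sums the errors over all m training examples. The data and learning rate here are toy values chosen so the loop converges (for large raw feature values like the housing sizes above, features would normally be scaled first):

```python
def batch_gradient_descent(xs, ys, alpha=0.1, iters=1000):
    """Fit theta0 + theta1 * x by batch gradient descent on squared error."""
    m = len(xs)
    theta0 = theta1 = 0.0
    for _ in range(iters):
        # Each step uses ALL m training examples ("batch").
        d0 = sum(theta0 + theta1 * x - y for x, y in zip(xs, ys)) / m
        d1 = sum((theta0 + theta1 * x - y) * x for x, y in zip(xs, ys)) / m
        theta0, theta1 = theta0 - alpha * d0, theta1 - alpha * d1
    return theta0, theta1

# Toy data lying exactly on the line y = 2x + 1.
t0, t1 = batch_gradient_descent([0, 1, 2, 3], [1, 3, 5, 7])
print(round(t0, 3), round(t1, 3))  # converges to about 1.0 and 2.0
```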