January 30, 2014

ml-305: Univariate Linear Regression

In the last posts here and here, we looked in depth at the "gradient descent" minimization algorithm. Today we will fit that last piece into the jigsaw of the "univariate linear regression" ML algorithm that we have been studying in this 30x series.

To do that, let's recap what we have learnt so far:

  • x => input variable/feature (in "univariate", we have only 1 feature)
  • y => output variable/target (this is the value that we need to predict)
  • m => #training examples
  • h(x) => hypothesis function, defined as

hΘ(x) = Θ0 + Θ1 x

  • J(Θ0, Θ1) = (1/(2m)) * Σ(i=1 to m) (hΘ(x(i)) - y(i))^2 => the cost function (an Octave sketch of it appears right after this recap)
  • Goal is to minimize the cost function, to find the optimal values of Θ0 and Θ1 (to be substituted into the hypothesis function). This is achieved using the gradient descent algorithm, defined as

repeat until convergence {

Θj = Θj - α * ∂/∂Θj J(Θ0, Θ1) (for j = 0 and j = 1)

}

where,

  • α: is called the "learning rate"
  • ∂/∂Θj J(Θ0, Θ1): is the partial derivative of the cost function J(Θ0, Θ1) with respect to Θj
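
To make the recap concrete, here is a minimal Octave sketch of the hypothesis and the cost function. The function name compute_cost and the column-vector arguments are my own illustrative choices, not part of the algorithm itself:

function J = compute_cost(x, y, theta0, theta1)
  % x and y are column vectors holding the m training examples
  m = length(y);
  h = theta0 + theta1 * x;            % hypothesis h(x), for all examples at once
  J = sum((h - y) .^ 2) / (2 * m);    % squared-error cost
end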

The only thing left is to substitute the real value of the partial derivative term inside the gradient descent algorithm. To do that we need to know the partial derivatives of J(Θ0, Θ1) with respect to Θ0 and Θ1. Without going into the derivation of the partial derivatives (because

  • it is too complex and mathematically involved
  • I don't know it ;) and
  • knowing the derivation is not needed to correctly implement gradient descent

), the gradient descent steps (for the cost function of our univariate linear regression) are defined as:

repeat until convergence {

Θ0 = Θ0 - α * (1/m) * Σ(i=1 to m) (hΘ(x(i)) - y(i)) (for j = 0)

Θ1 = Θ1 - α * (1/m) * Σ(i=1 to m) (hΘ(x(i)) - y(i)) * x(i) (for j = 1)

}

That is, we have inserted the actual values of the partial derivatives of J(Θ0, Θ1), with respect to Θ0 and Θ1 respectively, into the gradient descent algorithm. As mentioned previously, while implementing this step in Octave (or any other software), it is extremely important to make both updates simultaneously: compute the new values of both Θ0 and Θ1 from the old values first, and only then overwrite them, so that the Θ1 update does not see an already-updated Θ0. The sketch below shows one way to do this.
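
Here is a minimal Octave sketch of the full loop; the function name gradient_descent, the zero initialization of the thetas, and the fixed iteration count (standing in for "repeat until convergence") are all my own illustrative choices:

function [theta0, theta1] = gradient_descent(x, y, alpha, num_iters)
  % x and y are column vectors holding the m training examples;
  % alpha is the learning rate
  m = length(y);
  theta0 = 0;
  theta1 = 0;
  for iter = 1:num_iters
    h = theta0 + theta1 * x;                             % hypothesis for all examples
    % compute both new values from the OLD thetas first ...
    temp0 = theta0 - alpha * (1/m) * sum(h - y);
    temp1 = theta1 - alpha * (1/m) * sum((h - y) .* x);
    % ... and only then assign them: the simultaneous update
    theta0 = temp0;
    theta1 = temp1;
  end
end

On a toy data set where y = 2x, this should recover values close to Θ0 = 0 and Θ1 = 2:

x = [1; 2; 3; 4];
y = [2; 4; 6; 8];
[theta0, theta1] = gradient_descent(x, y, 0.05, 1000);   % theta0 ≈ 0, theta1 ≈ 2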

With this we have completed learning our first ML algorithm, viz. the "univariate linear regression". I hope you have had as much fun learning it as I have had teaching it. For me, a lot of concepts have become clearer while explaining things here in so much detail! That alone has made the whole effort worthwhile, and it has given me the motivation to continue this series. So in the upcoming posts I will try to explain other, more advanced ML algorithms.

Do leave comments below to share your experience here, especially (constructive) criticism about how I can make these posts better. I am eagerly looking forward to hearing from you :)