February 1, 2014

ml-401: Multivariate Linear Regression

Hi! In the last post, we finished learning our very first machine learning algorithm, viz. "linear regression". As you might remember from here, this is a type of "supervised learning" ML algorithm. However, in that study we had only 1 feature (or input variable), so what we learnt was really "univariate linear regression". Today we will look at how to handle a more realistic situation, namely when we have more than one feature. Such an algorithm is called "multivariate linear regression".

To recap, for univariate linear regression, the steps were:

  • x => input variable/feature (in "univariate", we have only 1 feature)
  • y => output variable/target (this is the value that we need to predict)
  • m => #training examples
  • h(x) => hypothesis function, defined as

h_Θ(x) = Θ_0 + Θ_1 x

  • cost function => J(Θ_0, Θ_1) = (Σ_{i=1}^{m} (h_Θ(x^(i)) - y^(i))^2) / (2m)
  • minimization of the cost function was done using the "gradient descent" algorithm which was implemented as

repeat until convergence {

Θ_0 = Θ_0 - α * (1/m) * Σ_{i=1}^{m} (h_Θ(x^(i)) - y^(i)), for j = 0

Θ_1 = Θ_1 - α * (1/m) * Σ_{i=1}^{m} (h_Θ(x^(i)) - y^(i)) * x^(i), for j = 1

}
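
For concreteness, here is a minimal, loop-based Python sketch of this univariate version (the function name gradient_descent_uni, the default learning rate and the fixed iteration count are my own illustrative choices; the pseudocode above instead loops "until convergence"):

def gradient_descent_uni(x, y, alpha=0.01, num_iters=1500):
    # x, y: plain Python lists of length m (one feature, m training examples)
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(num_iters):
        # compute both gradient terms with the CURRENT values of theta0, theta1 ...
        grad0 = sum((theta0 + theta1 * x[i]) - y[i] for i in range(m)) / m
        grad1 = sum(((theta0 + theta1 * x[i]) - y[i]) * x[i] for i in range(m)) / m
        # ... and only then update both parameters (the simultaneous update)
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1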

For the "multivariate" version of "linear regression" ML algorithm, we will have multiple features x1, x2, x3, etc. We define one more notation

  • n => number of features

In this case our hypothesis function "h(x)" will change to:

h_Θ(x) = Θ_0 + Θ_1 x_1 + Θ_2 x_2 + Θ_3 x_3 + … + Θ_n x_n

or

h_Θ(x) = Θ_0 + Σ_{j=1}^{n} Θ_j x_j

It is customary (it makes the later maths easier) to add an extra feature x_0 whose value is always "1", so that h(x) becomes

h_Θ(x) = Σ_{j=0}^{n} Θ_j x_j
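
As a quick sketch (the helper name hypothesis is mine, purely for illustration), once x_0 = 1 is prepended to the feature list, h_Θ(x) is simply a dot product of the parameters and the features:

def hypothesis(theta, x):
    # theta = [Θ_0, Θ_1, ..., Θ_n]
    # x     = [1, x_1, ..., x_n]   (note the extra x_0 = 1 in front)
    return sum(t_j * x_j for t_j, x_j in zip(theta, x))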

Our cost function too will change as:

J(Θ_0, Θ_1, Θ_2, …, Θ_n) = (Σ_{i=1}^{m} (h_Θ(x^(i)) - y^(i))^2) / (2m)

Here x^(i) is the whole feature vector (x^(i)_0, x^(i)_1, …, x^(i)_n) of the i-th training example, so h_Θ(x^(i)) uses all the features of that example at once.
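
A small Python sketch of this cost, reusing the hypothesis helper from above (compute_cost is again an illustrative name of my own, not anything official):

def compute_cost(theta, X, y):
    # X: list of m training examples, each a list [1, x_1, ..., x_n]
    # y: list of the m target values
    m = len(y)
    total = sum((hypothesis(theta, X[i]) - y[i]) ** 2 for i in range(m))
    return total / (2 * m)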

And so will the gradient descent:

repeat until convergence {

Θ_0 = Θ_0 - α * (1/m) * Σ_{i=1}^{m} (h_Θ(x^(i)) - y^(i)) * x^(i)_0, for j = 0 (recall x^(i)_0 = 1)

Θ_1 = Θ_1 - α * (1/m) * Σ_{i=1}^{m} (h_Θ(x^(i)) - y^(i)) * x^(i)_1, for j = 1

Θ_2 = Θ_2 - α * (1/m) * Σ_{i=1}^{m} (h_Θ(x^(i)) - y^(i)) * x^(i)_2, for j = 2

…

Θ_n = Θ_n - α * (1/m) * Σ_{i=1}^{m} (h_Θ(x^(i)) - y^(i)) * x^(i)_n, for j = n

}

or, re-written simply:

repeat until convergence {

Θ_j = Θ_j - α * (1/m) * Σ_{i=1}^{m} (h_Θ(x^(i)) - y^(i)) * x^(i)_j, for j = 0 to n

}

As always, remember that the above updates to all Θ_j (for j = 0 to n) should happen simultaneously for a correct implementation (see here).
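
To make that "simultaneous update" point concrete, here is a plain, loop-based Python sketch of multivariate gradient descent (it reuses the hypothesis helper above; the name gradient_descent, the default α and the fixed iteration count are my own choices, and we will see a faster, vectorized version in the next post):

def gradient_descent(X, y, alpha=0.01, num_iters=1500):
    # X: list of m examples, each [1, x_1, ..., x_n]; y: list of m targets
    m, n_plus_1 = len(y), len(X[0])
    theta = [0.0] * n_plus_1
    for _ in range(num_iters):
        # first compute every partial derivative using the CURRENT theta ...
        grads = [
            sum((hypothesis(theta, X[i]) - y[i]) * X[i][j] for i in range(m)) / m
            for j in range(n_plus_1)
        ]
        # ... and only then update all Θ_j together (the simultaneous update)
        theta = [theta[j] - alpha * grads[j] for j in range(n_plus_1)]
    return theta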

Once we have learnt the parameters Θ_j using the above implementation, it is trivial to predict "y" for any given set of input features:

y = h_Θ(x) = Σ_{j=0}^{n} Θ_j x_j
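
Putting the sketches above together on a tiny made-up dataset (toy numbers only, purely for illustration):

# toy data: m = 4 examples, n = 2 features, with x_0 = 1 already prepended
X = [[1, 2, 3], [1, 4, 1], [1, 5, 6], [1, 7, 2]]
y = [10.0, 12.0, 25.0, 22.0]

theta = gradient_descent(X, y, alpha=0.01, num_iters=2000)
print(compute_cost(theta, X, y))     # the cost should have decreased during training
print(hypothesis(theta, [1, 3, 4]))  # predicted y for a new example x = (3, 4)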

That's it for today. In the next post we will learn how to implement these algorithms using "vectorization" for faster performance. So stay tuned!