ml-402: Multivariate Linear Regression in Octave
Hi! I hope you have been following the recent posts in this ML series, where we covered the various aspects of the linear regression ML algorithm in detail and built an intuition for how it works. We have also formulated all the mathematical formulae, but we have not yet seen how to implement them in Octave. That's what we are going to do today. Let's start with a small recap (of only the functions) of the multivariate linear regression ML algorithm:
The hypothesis function is defined as:
h_θ(x) = Σ_{j=0}^{n} θ_j · x_j
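For example, with n = 2 features this expands to h_θ(x) = θ_0·x_0 + θ_1·x_1 + θ_2·x_2 = θ_0 + θ_1·x_1 + θ_2·x_2, since x_0 is always 1.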
The cost function is defined as:
J(θ_0, θ_1, θ_2, …, θ_n) = (1 / (2m)) · Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i))²
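As a quick sanity check with made-up numbers: if m = 2 and the hypothesis predicts 3 and 5 while the actual targets are 2 and 5, then J = ((3 - 2)² + (5 - 5)²) / (2·2) = 0.25.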
While the gradient descent steps are:
repeat until convergence {
θ_j := θ_j - α · (1/m) · Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i)) · x_j^(i),   for j = 0 to n (updating all θ_j simultaneously)
}
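Note that if we stack all the examples into a matrix X (with the x_0 column of ones included), the same update can be written in vectorized form as θ := θ - (α/m) · Xᵀ(Xθ - y). We will stick to explicit loops below, since they map directly onto the formulae above.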
Before we move on to Octave, let's define some variables:
X => input feature "matrix" of size m x n (m training examples, n features)
x => "one" example "vector" of size (n+1) x 1 (including the pivot feature x0, which is always equal to 1)
θ => parameter "vector" of size (n+1) x 1
y => output target variable "vector" of size m x 1
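Just to make the shapes concrete, here is a tiny made-up dataset with m = 3 examples and n = 2 features (the numbers are purely illustrative):
X = [2104 3; 1600 3; 2400 4]; % 3x2 raw feature matrix (m x n)
y = [400; 330; 369];          % 3x1 target vector (m x 1)
x = [1; 2104; 3];             % 3x1 example vector ((n+1) x 1), with the pivot x0 = 1 prepended
theta = zeros(3, 1);          % 3x1 parameter vector ((n+1) x 1)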
Now for the actual Octave program:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%% helper functions
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y = hypothesis(x, theta)
  N = length(x); % this is actually inclusive of the pivot x0, so N = n + 1, where n = #features
  y = 0;
  for j = 1:N
    y = y + x(j) * theta(j);
  end;
end;
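% quick sanity check with made-up values: for x = [1; 2; 3] and theta = [0.5; 0.1; 0.2],
% hypothesis(x, theta) returns 0.5*1 + 0.1*2 + 0.2*3 = 1.3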
function J = cost(X, y, theta)
  % X is assumed to already contain the x0 column of ones (it is added once, in the main program)
  m = size(X, 1);
  n = size(X, 2); % note that this n includes the x0 feature too
  J = 0;
  for i = 1:m
    J = J + (hypothesis(X(i, :), theta) - y(i)) ^ 2; % the inner "j" for-loop is inside the "hypothesis" function
  end;
  J = J / (2 * m);
end;
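% quick sanity check with made-up values (X already carrying the x0 column):
% cost([1 1; 1 2], [2; 3], [0; 1]) returns ((1 - 2)^2 + (2 - 3)^2) / (2*2) = 0.5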
function [J, theta] = gradientDescent(X, y, theta, alpha, num_iters)
  % ideally we should let gradientDescent run until convergence,
  % but for simplicity we will run it for a fixed number of iterations (num_iters)
  % X is assumed to already contain the x0 column of ones
  m = size(X, 1);
  n = size(X, 2);
  for iter = 1:num_iters
    grad_sum = zeros(n, 1); % accumulated gradient terms; named so we don't shadow the built-in sum()
    for i = 1:m
      diff = hypothesis(X(i, :), theta) - y(i);
      for j = 1:n
        grad_sum(j) = grad_sum(j) + diff * X(i, j);
      end;
    end;
    % update all theta(j) simultaneously, only after the full sums have been computed
    for j = 1:n
      theta(j) = theta(j) - (alpha * (grad_sum(j) / m));
    end;
  end;
  J = cost(X, y, theta);
end;
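% note: the three nested loops above are equivalent to the vectorized update
%   theta = theta - (alpha / m) * (X' * (X * theta - y));
% which is both shorter and much faster in Octave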
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%% main program starts here
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[X, y] = load("data"); % load some data
%%%%% initialize some basic variables
m = size(X, 1); % #training examples
X = [ones(m, 1), X]; % add pivot feature vector x0
n = size(X, 2); % #features + 1
alpha = 0.01; % "learning rate"; actual value needs to be determined experimentally
theta = zeros(n, 1); % initialize to some arbitrary value
num_iters = 1000; % initialize to some arbitrary value
%%%%% run gradientDescent
[J, theta] = gradientDescent(X, y, theta, alpha, num_iters);
That's it! Wasn't that easy?
Once we have found the optimal values for θ (theta), we can predict "y" for any new example by simply calling the hypothesis function with the new (given) value of "x" (remembering to prepend the pivot feature x0 = 1).
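For example (with made-up feature values), a prediction would look something like this:
x_new = [1; 2104; 3];              % new example: pivot x0 = 1 followed by the (illustrative) feature values
y_pred = hypothesis(x_new, theta); % predicted value of y using the learned theta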