February 17, 2014

ml-504: Cost Function and Logistic Regression

Hello. In the previous few posts (here, here and here) we learned about linear regression and also formulated the hypothesis function for logistic regression using the sigmoid function, so that its output always lies between 0 and 1 (matching the binary output variable "y"). Today we will move on to the next stage and learn about the cost function used in the logistic regression ML problem.

In an earlier post we defined our hypothesis function as:

hθ(x) = g(θ'x) = 1 / (1 + e^(-θ'x))
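
As a quick illustration, here is a minimal Python/NumPy sketch of this hypothesis function (the names `sigmoid` and `hypothesis` are my own, not from the earlier posts); θ'x is simply the dot product of the parameter vector and the feature vector:

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)); its output always lies in the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta' x), where theta' x is the dot product theta . x."""
    return sigmoid(np.dot(theta, x))

# Example: a 2-feature input (the first feature is the intercept term x0 = 1)
print(hypothesis(np.array([0.5, -1.0]), np.array([1.0, 2.0])))  # ~0.18
```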

Also, in an earlier post, we defined the cost function for linear regression as:

J(θ) = (1/(2m)) * Σ_{i=1}^{m} (hθ(x^(i)) - y^(i))^2

Let's re-write it as:

J(θ) = (1/m) * Σ_{i=1}^{m} [ (hθ(x^(i)) - y^(i))^2 / 2 ]

or

J(θ) = (1/m) * Σ_{i=1}^{m} cost(hθ(x^(i)), y^(i))

where the per-example cost is

cost(hθ(x^(i)), y^(i)) = (hθ(x^(i)) - y^(i))^2 / 2
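
To make this rewrite concrete, here is a small Python/NumPy sketch of the squared-error cost with the sigmoid hypothesis (the function names, and the assumption that X is an m×n matrix of training examples with y an m-vector of 0/1 labels, are mine, not from the post):

```python
import numpy as np

def example_cost(h, y):
    """Per-example cost: (h_theta(x^(i)) - y^(i))^2 / 2."""
    return (h - y) ** 2 / 2.0

def J_squared(theta, X, y):
    """J(theta) = (1/m) * sum of per-example costs, with the sigmoid hypothesis."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-X.dot(theta)))  # h_theta(x^(i)) for every training example
    return np.sum(example_cost(h, y)) / m
```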

But this cost-function is non-convex :( (unlike that of linear regression, see post). This is because our hypothesis is the sigmoid of (θ'x), which is non-linear, so the squared-error cost has regions of negative curvature and can have multiple local minima. As a result, gradient descent is not guaranteed to find the optimal (global) minimum. Hence we need to define a new cost-function for our logistic regression problem.
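
The non-convexity is easy to see numerically. Below is a rough sketch (my own toy check, not a proof and not from the post) that samples the squared-error cost for a single training example with a sigmoid hypothesis and looks for negative second differences, which a convex function can never have:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def squared_cost(theta, x=1.0, y=1.0):
    """Squared-error cost (h_theta(x) - y)^2 / 2 for one toy example (x = 1, y = 1)."""
    return (sigmoid(theta * x) - y) ** 2 / 2.0

thetas = np.linspace(-10.0, 10.0, 2001)
costs = squared_cost(thetas)

# For a convex function sampled on an evenly spaced grid, every second
# difference is >= 0; a clearly negative one shows this cost is not convex.
print("min second difference:", np.diff(costs, n=2).min())
```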

Let us define the new cost-function as follows:

cost(hθ(x), y) = -log(hθ(x)), if "y = 1"

cost(hθ(x), y) = -log(1 - hθ(x)), if "y = 0"
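
In Python this per-example cost could be sketched as follows (again, the name `log_cost` is mine; the body is just a direct transcription of the two cases above):

```python
import numpy as np

def log_cost(h, y):
    """Per-example logistic cost: -log(h) when y = 1, -log(1 - h) when y = 0."""
    return -np.log(h) if y == 1 else -np.log(1.0 - h)
```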

Now let us convince ourselves that this cost function is well-behaved (it is, in fact, convex). To do this, let us first consider the case when "y = 1".

[Plot of the cost -log(hθ(x)) as hθ(x) ranges from 0 to 1 (the case y = 1)]

The above graph is a plot of the function "-log(hθ(x))". As can be seen from this graph, our newly defined cost function has interesting and desirable properties:

  • for a true positive (the hypothesis predicts y = 1, and really y = 1, i.e. hθ(x) → 1), the cost goes to 0
  • but for a false negative (the hypothesis predicts y = 0, but really y = 1, i.e. hθ(x) → 0), the cost grows to infinity

Now see what happens when "y = 0"

[Plot of the cost -log(1 - hθ(x)) as hθ(x) ranges from 0 to 1 (the case y = 0)]

The above graph is a plot of the function "-log(1 - hθ(x))". Similar to the previous graph, we have some interesting and desirable properties here too:

  • for a true negative (the hypothesis predicts y = 0, and really y = 0, i.e. hθ(x) → 0), the cost goes to 0
  • but for a false positive (the hypothesis predicts y = 1, but really y = 0, i.e. hθ(x) → 1), the cost grows to infinity

These properties are good because when our hypothesis is working as expected the cost (error) is close to 0, but when it is confidently wrong it gets penalized heavily. When, in later stages, we apply minimization algorithms to this cost function, we can be confident they will move in the right direction and produce good values of θ. From there, we will follow the standard steps of using the resulting hypothesis function to predict "y" given "x".
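
A quick numerical check (a toy illustration of the bullet points above, reusing the `log_cost` sketch from earlier) shows exactly this behaviour:

```python
import numpy as np

def log_cost(h, y):
    return -np.log(h) if y == 1 else -np.log(1.0 - h)

# y = 1: the cost vanishes as h_theta(x) -> 1 and blows up as h_theta(x) -> 0
for h in (0.99, 0.5, 0.01):
    print(f"y = 1, h = {h}: cost = {log_cost(h, 1):.3f}")

# y = 0: the cost vanishes as h_theta(x) -> 0 and blows up as h_theta(x) -> 1
for h in (0.01, 0.5, 0.99):
    print(f"y = 0, h = {h}: cost = {log_cost(h, 0):.3f}")
```

For y = 1 this prints roughly 0.010, 0.693 and 4.605: a near-zero penalty when the hypothesis is confident and right, and a rapidly growing one when it is confident and wrong (and symmetrically for y = 0).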

Coming back to the point, our real cost function "J(θ)" will now be defined as

J(θ) = (1/m) * Σ_{i=1}^{m} cost(hθ(x^(i)), y^(i))

where

cost(hθ(x), y) = -log(hθ(x)), if "y = 1"

cost(hθ(x), y) = -log(1 - hθ(x)), if "y = 0"
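
Putting it all together, here is a minimal vectorized sketch of this J(θ) in Python/NumPy (assuming X is the m×n design matrix, y an m-vector of 0/1 labels, and θ the parameter vector; these names and the vectorized form are my own, not from the course material). This is the function that the minimization techniques in the next post will be asked to drive down:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def J_logistic(theta, X, y):
    """J(theta) = (1/m) * sum_i cost(h_theta(x^(i)), y^(i)) with the log cost above."""
    m = len(y)
    h = sigmoid(X.dot(theta))  # hypothesis value for every training example
    per_example = np.where(y == 1, -np.log(h), -np.log(1.0 - h))
    return np.sum(per_example) / m
```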

Keep watching this space until next time, when we will move on to the next stage of the logistic regression ML algorithm and learn about minimization techniques for our new cost function!