February 9, 2014

ml-502: Hypothesis Representation and Decision Boundary

Hello! In the last post we took an initial swing at a new type of ML problem, viz. classification. We also looked at how the linear regression algorithm fails to work for this type of problem, and saw that there is another ML algorithm, logistic regression, that is used to solve it. Today we will do a deeper dive into various concepts surrounding the logistic regression algorithm.

We saw in the last post that we want:

0 <= hθ(x) <= 1

In linear regression, we know that:

hθ(x) = θ' * x

In logistic regression, we will define:

hθ(x) = g(θ' * x)

where,

g(z) = 1 / (1 + e^(-z))

g(z) is called the logistic or sigmoid function (hence the name "logistic" regression for this algorithm). The graph (z vs. g(z)) of the sigmoid function looks like:

![Plot of the sigmoid function, g(z) vs. z]()

As you might notice, the sigmoid function has this interesting property that it asymptotes at 0 as z nears -infinity, and asymptotes at 1 as z nears +infinity. This property makes the function extremely useful in our logistic regression implementation. Also, notice in the above graph that

g(z) >= 0.5 when z >= 0

and

g(z) < 0.5 when z < 0

that is

hθ(x) = g(θ' * x) >= 0.5, whenever (θ' * x) >= 0

and

hθ(x) = g(θ' * x) < 0.5, whenever (θ' * x) < 0
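
To make this a bit more concrete, here is a quick sketch of the sigmoid function in Python (the post itself doesn't prescribe any language; NumPy and the sample z values below are purely illustrative) that checks the properties noted above:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# g(z) asymptotes at 0 for very negative z, at 1 for very positive z,
# and crosses 0.5 exactly at z = 0.
print(sigmoid(-10.0))  # ~0.0000454, very close to 0
print(sigmoid(0.0))    # 0.5
print(sigmoid(10.0))   # ~0.9999546, very close to 1
```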

We will define our threshold such that we predict

y = 1 if hθ(x) >= 0.5

and

y = 0 if hθ(x) < 0.5

Combining the above two conditions, we get that we predict

"y = 1" if (θ' * x) >= 0

and

"y = 0" if (θ' * x) < 0

Remember this condition, because it will be used when we try to understand the next important concept of "decision boundary".
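
As a quick sanity check of this condition, the following sketch (illustrative Python, with arbitrarily chosen values of z = θ' * x) shows that thresholding g(z) at 0.5 is exactly the same as checking the sign of z:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# For any z = theta' * x, "g(z) >= 0.5" and "z >= 0" are the same condition,
# so the two boolean columns printed below always match.
for z in [-3.0, -0.5, 0.0, 0.5, 3.0]:
    print(z, sigmoid(z) >= 0.5, z >= 0)
```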

Anyway, coming back to our hypothesis function, it becomes:

hθ(x) = 1 / (1 + e^(-(θ' * x)))

This might seem confusing, so let's try to put it in simpler terms. The output of the hypothesis function can be interpreted as the "estimated probability that 'y = 1' on input 'x'". For example, if

x = [x0 x1]' = [1 tumorSize]'

and if, for this x

hθ(x) = 0.7

then, it means that the probability that the tumor is malignant is 70%. Mathematically, it is written as

hθ(x) = P(y = 1 | x; θ)

and is read as "probability that 'y = 1', given 'x', parameterized by 'θ'". Conversely, the probability for 'y = 0' is

P(y = 0 | x; θ) = 1 - P(y = 1 | x; θ)
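
Here is a small sketch of this interpretation in Python; the θ and tumorSize values below are made up purely to get a probability of roughly 0.7 out of the hypothesis:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical, hand-picked parameters -- in practice theta would be learned.
theta = np.array([-6.0, 0.1])     # [theta0, theta1]
x = np.array([1.0, 70.0])         # [x0, tumorSize], with x0 = 1

p_malignant = sigmoid(theta @ x)  # h_theta(x) = P(y = 1 | x; theta)
p_benign = 1.0 - p_malignant      # P(y = 0 | x; theta)

print(p_malignant)  # ~0.73, i.e. roughly a 73% chance the tumor is malignant
print(p_benign)     # ~0.27
```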

Next, let's look at another important concept called the "decision boundary". To explain this let us consider the following dataset:

![Training set plotted on two features, x1 and x2, with the two classes "y = 0" and "y = 1"]()

Let us also define our hypothesis function as

hθ(x) = g(θ0 + θ1 x1 + θ2 x2)

and assume that we have somehow found the values of θ already as:

θ = [-4 1 1]'

So, based on the condition we noted earlier, our hypothesis function predicts:

"y = 1" if -4 + x1 + x2 >= 0

or

"y = 1" if x1 + x2 >= 4

Similarly,

"y = 0" if x1 + x2 < 4

![Decision boundary x1 + x2 = 4 separating the two classes]()

The line in the above graph that separates the two classes "y = 1" and "y = 0" is called the decision boundary. It is extremely important to note that the decision boundary is a property of the hypothesis function and its parameters θ, and not of the dataset (the dataset is only used to fit θ).
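
To wrap up, here is a short sketch of this particular decision boundary in Python; the test points are made up, chosen only to land on either side of (and exactly on) the line x1 + x2 = 4:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([-4.0, 1.0, 1.0])  # the theta we assumed we had already found

def predict(x1, x2):
    """Predict y = 1 exactly when -4 + x1 + x2 >= 0, i.e. x1 + x2 >= 4."""
    x = np.array([1.0, x1, x2])     # prepend x0 = 1
    return 1 if sigmoid(theta @ x) >= 0.5 else 0

print(predict(1.0, 1.0))  # 0 -> below the boundary (x1 + x2 = 2 < 4)
print(predict(3.0, 3.0))  # 1 -> above the boundary (x1 + x2 = 6 >= 4)
print(predict(2.0, 2.0))  # 1 -> exactly on the boundary (theta' * x = 0)
```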

I think it has been a bit heavy today, so let's stop here. In the next post, we will take a look at some other, more interesting, decision boundaries and understand more about them. So stay tuned!