ml-301: Linear Regression with one variable
Welcome back! After having gone through some posts on introduction to ML (here, here, here, here and here), let's now start learning a real ML algorithm. The algorithm that we will study today is a supervised learning regression algorithm.
First, let's define the problem statement. Today we will use a subset of this data. The first few training examples in the data are:
| mpg | cylinders | displacement | horsepower | weight | acceleration | model year | origin | car name |
|---|---|---|---|---|---|---|---|---|
| 18.0 | 8 | 307.0 | 130.0 | 3504.0 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 15.0 | 8 | 350.0 | 165.0 | 3693.0 | 11.5 | 70 | 1 | buick skylark 320 |
| 18.0 | 8 | 318.0 | 150.0 | 3436.0 | 11.0 | 70 | 1 | plymouth satellite |
| 16.0 | 8 | 304.0 | 150.0 | 3433.0 | 12.0 | 70 | 1 | amc rebel sst |
| 17.0 | 8 | 302.0 | 140.0 | 3449.0 | 10.5 | 70 | 1 | ford torino |
| 15.0 | 8 | 429.0 | 198.0 | 4341.0 | 10.0 | 70 | 1 | ford galaxie 500 |
| 14.0 | 8 | 454.0 | 220.0 | 4354.0 | 9.0 | 70 | 1 | chevrolet impala |
| 14.0 | 8 | 440.0 | 215.0 | 4312.0 | 8.5 | 70 | 1 | plymouth fury iii |
| 14.0 | 8 | 455.0 | 225.0 | 4425.0 | 10.0 | 70 | 1 | pontiac catalina |
| 15.0 | 8 | 390.0 | 190.0 | 3850.0 | 8.5 | 70 | 1 | amc ambassador dpl |
| 15.0 | 8 | 383.0 | 170.0 | 3563.0 | 10.0 | 70 | 1 | dodge challenger se |
| 14.0 | 8 | 340.0 | 160.0 | 3609.0 | 8.0 | 70 | 1 | plymouth 'cuda 340 |
| 15.0 | 8 | 400.0 | 150.0 | 3761.0 | 9.5 | 70 | 1 | chevrolet monte carlo |
| 14.0 | 8 | 455.0 | 225.0 | 3086.0 | 10.0 | 70 | 1 | buick estate wagon (sw) |
| 24.0 | 4 | 113.0 | 95.0 | 2372.0 | 15.0 | 70 | 3 | toyota corona mark ii |
| 22.0 | 6 | 198.0 | 95.0 | 2833.0 | 15.5 | 70 | 1 | plymouth duster |
| 18.0 | 6 | 199.0 | 97.0 | 2774.0 | 15.5 | 70 | 1 | amc hornet |
In this data, the 1st column is the "mpg" target variable, i.e. the parameter that we have to predict, while the rest of the columns are the features or input variables. But today we will keep it simple and use only 1 input variable, the 3rd column, "displacement". To keep the algorithm simple, we will assume that our target variable "mpg" depends in some way on "displacement" alone, our input variable.
So now the first 10 training examples look like this (note that I have reordered the columns: the first column is now x, the input variable, and the second column is y, the target variable):
| x (displacement) | y (mpg) |
|---|---|
| 307.0 | 18.0 |
| 350.0 | 15.0 |
| 318.0 | 18.0 |
| 304.0 | 16.0 |
| 302.0 | 17.0 |
| 429.0 | 15.0 |
| 454.0 | 14.0 |
| 440.0 | 14.0 |
| 455.0 | 14.0 |
| 390.0 | 15.0 |
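To make the formulas that follow concrete, here is a minimal sketch of how we might hold this training data in code (assuming Python with numpy; the variable names are my own choices):

```python
import numpy as np

# x = displacement, y = mpg, for the first 10 training examples above
x = np.array([307.0, 350.0, 318.0, 304.0, 302.0, 429.0, 454.0, 440.0, 455.0, 390.0])
y = np.array([18.0, 15.0, 18.0, 16.0, 17.0, 15.0, 14.0, 14.0, 14.0, 15.0])
```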
Thus the problem definition is: given a new "displacement" (x), predict the value of "mpg" (y). To do this we need a function "h" that maps the input "x" to the output "y". In the ML world, this function "h" is called the hypothesis function, and is represented as
$$ h_\theta(x) = \theta_0 + \theta_1 x $$
This model is called "univariate linear regression" (i.e. linear regression with one variable), because we are using only 1 input variable (or feature).
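In code, the hypothesis is just the equation of a straight line. Here is a minimal sketch (the function name `hypothesis` is my own choice, not standard anywhere):

```python
def hypothesis(x, theta0, theta1):
    """h_theta(x) = theta0 + theta1 * x -- a straight line in x."""
    return theta0 + theta1 * x
```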
Before we move ahead, let's note some terminology:
- m => #examples
- x => input variable/feature
- y => output variable/target
- x(i), y(i) => the ith training example (the "(i)" is an index into the training set, not an exponent); for example, in the data above, x(1) = 307.0, x(5) = 302.0, y(7) = 14.0 (see the snippet below)
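Continuing the numpy sketch from above, this terminology maps to code like so (remembering that Python indexes from 0 while our math indexes from 1):

```python
m = len(x)      # m => number of training examples

print(m)        # 10
print(x[0])     # x(1) = 307.0
print(x[4])     # x(5) = 302.0
print(y[6])     # y(7) = 14.0
```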
What we need to do next is to choose Θ0 and Θ1 so that hΘ(x) is as close to y as possible for our training examples (x, y). Why do we do this? Because if the hypothesis function we find (the end goal of our ML problem) were perfect, then hΘ(x(i)) would be exactly equal to the corresponding y(i), for every example i.
So, obviously, our goal would be to do the following
$$ \min_{\theta_0, \theta_1} \; \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 $$
Dividing by m averages the squared error; the extra factor of 2 is there purely to make the math easier later (it cancels out when we differentiate). This function is called the cost function J(Θ0, Θ1), and is also known as the squared error function, or squared error cost function. It is the most commonly used cost function for regression problems in the ML world.
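Translating the formula directly into code, a sketch of J might look like this (assuming the numpy arrays x, y and the `hypothesis` function from the earlier sketches; the name `cost` is again my own):

```python
def cost(x, y, theta0, theta1):
    """Squared error cost J(theta0, theta1) = sum of squared errors / (2 * m)."""
    m = len(x)
    errors = hypothesis(x, theta0, theta1) - y  # h_theta(x(i)) - y(i), per example
    return np.sum(errors ** 2) / (2 * m)
```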
By solving this minimization we find the optimum values of Θ0 and Θ1, i.e. the ones that minimize our cost function J. Then all we need to do is plug these Θ0 and Θ1 values into our hypothesis function hΘ(x), and our ML problem is solved: given any new value of x, we simply put that x into the hypothesis function to get the predicted y.
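We have not yet learned how to find the optimal Θ0 and Θ1, but we can already evaluate J for hand-picked guesses and see that some lines fit the data better than others. The specific numbers below are arbitrary guesses of mine, purely for illustration:

```python
# Two hand-picked candidate lines; a lower J means a better fit
print(cost(x, y, 15.6, 0.0))     # a flat line at the mean mpg, ignoring displacement
print(cost(x, y, 23.5, -0.021))  # mpg falls as displacement grows; a much lower J

# Once we have good Theta values, prediction is a single call:
print(hypothesis(300.0, 23.5, -0.021))  # predicted mpg for a displacement of 300
```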
Ain't that cool? We have already started solving ML problems and making predictions!
Well, not so fast, stud ;) We still have to figure out how to perform the last step of the puzzle, viz. "minimize", don't we? But I think this much study is good enough for today, and we will jump to the minimization problem in the next post. So keep watching this space :)