# “Hello, World” of Machine Learning in 10 minutes (no installation required)

**What you need:**

- A Google account
- Less than 10 minutes

**Steps:**

1. Go to [Google Colab](https://colab.research.google.com) in your browser and sign in with your Google account.

2. Click “New Notebook” at the bottom.

3. You will see a cell where you can write code. Copy the code below into it:

```python
from sklearn.linear_model import LinearRegression
import numpy as np

# 10 evenly spaced numbers from 1 to 10: [1, 2, ..., 10]
x_values = np.linspace(1, 10, 10)

# sklearn expects a 2-D input array, so reshape into a column: [[1], [2], ..., [10]]
x_values = x_values.reshape(-1, 1)

# Generate the targets with the linear function y = -2x + 1
y_values = -2 * x_values + 1

model = LinearRegression()
model.fit(x_values, y_values)

print(model.coef_)
print(model.intercept_)
```

4. Hit the play button.

5. The output will appear below the cell. It should look something like:

```
[[-2.]]
[1.]
```

**Code explanation:**

- The first two lines import the required libraries. The first one (**sklearn**) provides the machine learning tools, and the second one (**numpy**) handles the number processing.
- *np.linspace(1, 10, 10)* simply generates an array of 10 numbers between 1 and 10: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10].
- sklearn requires the input to be a 2-D array, meaning we need to convert *x_values* to: [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]]. *x_values.reshape(-1, 1)* does this for us.
- Next, we generate *y_values*. As you may guess, the i-th value of *y_values* is *-2 * x_values[i] + 1*. So *y_values* is [[-1], [-3], [-5], [-7], [-9], [-11], [-13], [-15], [-17], [-19]].

- The line *model = LinearRegression()* creates a Linear Regression model, which is one of the simplest machine learning models.

- *model.fit(x_values, y_values)* trains the model to learn the relation between *x_values* and *y_values*.

- Finally, we print the model parameters (*model.coef_* and *model.intercept_*), which represent what the model actually learned from the training data.
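If *reshape(-1, 1)* feels mysterious, here is a minimal standalone snippet (using the same array as the exercise) that shows exactly what it does:

```python
import numpy as np

x = np.linspace(1, 10, 10)
print(x.shape)   # (10,) — a flat 1-D array

# -1 means "infer this dimension from the data": 10 rows, 1 column each
x_2d = x.reshape(-1, 1)
print(x_2d.shape)   # (10, 1)
print(x_2d[:3])     # the first three rows: [[1.], [2.], [3.]]
```

In other words, reshaping turns a flat list of numbers into a column, which is the shape sklearn expects.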

**What is happening here:**

- We generated *y_values* from *x_values* through a linear function. Remember from high school math that *y = f(x) = ax + b* is a linear function, where *a* is the **coefficient** (or slope) and *b* is the **intercept**. In our case, *a = -2* and *b = 1*.
- Notice that the model we created and trained on *x_values* and *y_values* had no information about *a* and *b*. Only by analyzing the data in *x_values* and *y_values* does it learn the relation/pattern between these two arrays. In other words, it is trying to learn the coefficient (*a*) and the intercept (*b*).
- In the output, we see that it correctly learned and printed *a* and *b*. (Don’t worry too much about the enclosing brackets.)
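A quick way to convince yourself that the model really recovered *a* and *b* is to pull the learned parameters out of the arrays and compare them with the true values. This sketch repeats the exercise code and checks them numerically:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x_values = np.linspace(1, 10, 10).reshape(-1, 1)
y_values = -2 * x_values + 1  # true function: a = -2, b = 1

model = LinearRegression()
model.fit(x_values, y_values)

# coef_ is a 2-D array and intercept_ a 1-D array here (because y is a
# column), hence the indexing to pull out plain numbers
learned_a = model.coef_[0][0]
learned_b = model.intercept_[0]

print(f"learned a = {learned_a:.4f}, learned b = {learned_b:.4f}")
```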

**This simple example hints at the endless possibilities of machine learning:**

- Here we used hard-coded x/y values for demonstration purposes, but for real-world problems we could have read them from CSV files. The **pandas** library is usually used for processing CSV files in Python.
- Of course, real-world problems usually do not involve such a simple linear relation. To capture more complex relations, we have to consider more advanced models than linear regression. Some easy choices are support vector machines, random forests, etc. You can easily try those models in our exercise code by replacing LinearRegression with **SVR**, **RandomForestRegressor**, etc.
- Here we used only one input feature. In most real-world problems, we need to consider multiple features. For example, if you want to predict house price from **size** (square feet) and **number of rooms**, you need to accommodate 2 features. In this case, x_values might look like:

```
[
  [size_of_house_1, number_of_room_house_1],
  [size_of_house_2, number_of_room_house_2],
  [...], [...]
]
```

Perhaps you now have an idea of why sklearn requires the input to be a 2-D array: each row is one sample, and each column is one feature.
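To make the two-feature case concrete, here is a minimal sketch. The sizes, room counts, and prices below are made up purely for illustration, not real housing data:

```python
from sklearn.linear_model import LinearRegression
import numpy as np

# Hypothetical training data: each row is [size_sq_ft, number_of_rooms]
x_values = np.array([
    [1000, 2],
    [1500, 3],
    [2000, 3],
    [2500, 4],
])
# Hypothetical prices in thousands — invented numbers for illustration only
y_values = np.array([200, 290, 340, 430])

model = LinearRegression()
model.fit(x_values, y_values)

# The model learns one coefficient per feature, plus one intercept
print(model.coef_)       # two numbers: a weight for size, a weight for rooms
print(model.intercept_)
```

Nothing else changes: *fit* works the same way whether each row has one feature or many.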

- We do not train models only to learn patterns. Rather, we need them to predict outputs for unseen inputs. That can easily be done by calling *model.predict(new_x_values)* in the code above, where *new_x_values* is the input data for which you want to predict the output. The structure of *new_x_values* should be the same as *x_values*.
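Continuing the exercise code, here is a short sketch of predicting on unseen inputs (the new x values, 11 and 25, are my own choice):

```python
from sklearn.linear_model import LinearRegression
import numpy as np

x_values = np.linspace(1, 10, 10).reshape(-1, 1)
y_values = -2 * x_values + 1

model = LinearRegression()
model.fit(x_values, y_values)

# Inputs the model has never seen, shaped exactly like x_values
new_x_values = np.array([[11], [25]])
predictions = model.predict(new_x_values)
print(predictions)  # should be close to -2*11+1 = -21 and -2*25+1 = -49
```

Because the model has learned *a* and *b*, it can apply *y = ax + b* to any new x you hand it.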

If you have found this inspiring so far and want to dig deeper, feel free to drop me a line on Messenger. I will be happy to share more easy-to-follow resources and examples.