The ultimate PyTorch Hello World

Jan 2, 2023

In this tutorial, you will learn

  1. How to train a model in PyTorch
  2. How to evaluate model performance
  3. How to save and load models


Prerequisite: you know some basic programming (arrays, loops, etc.).

What are you going to do?

#1: We will create a dataset with 1000 rows, each holding two inputs (x1, x2) and a target y, where:

y = x1 * x2

#2: We will build a neural network to predict y from x1 and x2. Note that the choice of the equation (y = x1 * x2) is arbitrary. You can try different equations and see that the same small network can learn to approximate many of them.


Go to Google Colab: Log in if you are not logged in already. You should see a screen like the one below:


Click New Notebook:


You will get a place to write code. Copy the following code and paste it there:

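import torch
from torch import nn
from sklearn.metrics import r2_score

# Number of training epochs. The original listing never defines this name,
# so the value here is an assumption.
NUM_EPOCHS = 1500

class MyMachine(nn.Module):
    def __init__(self):
        super().__init__()
        # The layer list was cut off in the original listing; the sizes below
        # are an assumption (2 inputs -> 5 hidden units -> 1 output).
        self.fc = nn.Sequential(
            nn.Linear(2, 5),
            nn.ReLU(),
            nn.Linear(5, 1),
        )

    def forward(self, x):
        x = self.fc(x)
        return x

def get_dataset():
    X = torch.rand((1000, 2))  # 1000 rows; the columns are x1 and x2
    x1 = X[:, 0]
    x2 = X[:, 1]
    y = x1 * x2
    return X, y

def train():
    model = MyMachine()
    model.train()  # switch to training mode
    X, y = get_dataset()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-2, weight_decay=1e-5)
    criterion = torch.nn.MSELoss(reduction='mean')

    for epoch in range(NUM_EPOCHS):
        optimizer.zero_grad()  # reset gradients from the previous iteration
        y_pred = model(X)
        y_pred = y_pred.reshape(1000)
        loss = criterion(y_pred, y)
        loss.backward()  # compute gradients for all weights and biases
        optimizer.step()  # update weights and biases
        print(f'Epoch:{epoch}, Loss:{loss.item()}')

    torch.save(model.state_dict(), 'model.h5')

def test():
    model = MyMachine()
    model.load_state_dict(torch.load('model.h5'))
    model.eval()  # switch to evaluation mode
    X, y = get_dataset()

    with torch.no_grad():
        y_pred = model(X)
        print(r2_score(y.numpy(), y_pred.numpy()))

train()
test()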


Hit the run button:


It will take a few seconds to run. Then you will see the loss printed for every epoch, followed by the R-squared value of the test.


As you can see in the last two lines of the code, we first train the model by calling train() and then test it by calling test().

Both train and test functions use the MyMachine class and get_dataset function. So, let us see the structure of those first.


As you can see from the definition of MyMachine, it is a simple feed-forward neural network whose layers are stacked inside nn.Sequential.

Notice that we only define the forward pass. PyTorch handles the backward pass (backpropagation) automatically. If you are curious how: it records a computational graph during the forward pass and walks it backwards to compute the gradients.
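To make the automatic backward pass concrete, here is a minimal sketch on a single made-up weight w (a hypothetical stand-in, not part of the tutorial's model). We only write the forward computation; calling backward() fills in the gradient.

```python
import torch

# Hypothetical one-parameter "model": prediction = w * x
w = torch.tensor(3.0, requires_grad=True)
x = torch.tensor(2.0)
target = torch.tensor(10.0)

# Forward pass only: PyTorch records these operations in a computational graph
loss = (w * x - target) ** 2

# Backward pass: PyTorch walks the graph and computes d(loss)/dw for us
loss.backward()

# By hand: d(loss)/dw = 2 * (w*x - target) * x = 2 * (6 - 10) * 2 = -16
print(w.grad)  # tensor(-16.)
```

The gradient matches the hand calculation even though we never wrote a backward function.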


On the first line of get_dataset, we define a 2-dimensional array X (1000 rows and 2 columns) filled with random numbers between 0 and 1. The columns represent x1 and x2, respectively. Then we derive a 1-dimensional array y (1000 values, one per row of X) by multiplying x1 by x2 element-wise.

Structure of X and y
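If you want to check these shapes yourself, the following short sketch repeats the body of get_dataset and prints the shape of each array.

```python
import torch

X = torch.rand((1000, 2))  # random values drawn uniformly from [0, 1)
x1 = X[:, 0]               # first column
x2 = X[:, 1]               # second column
y = x1 * x2                # element-wise product, one value per row

print(X.shape)  # torch.Size([1000, 2])
print(y.shape)  # torch.Size([1000])
```

Note that y is a flat array of 1000 values, not a table with one column. This detail matters later when we compare it with the model's output.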


On the first line of train, we create an instance of MyMachine (model). On the second line, model.train() tells PyTorch that we are using the model for training at this moment.
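The effect of switching modes can be observed through the model's training flag. The sketch below uses a tiny hypothetical one-layer network as a stand-in for MyMachine.

```python
from torch import nn

# Hypothetical stand-in for MyMachine
model = nn.Sequential(nn.Linear(2, 1))

model.train()                    # training mode
in_train_mode = model.training   # True

model.eval()                     # evaluation (test) mode
in_eval_mode = model.training    # False

print(in_train_mode, in_eval_mode)  # True False
```

For a network made only of linear layers and activations, the two modes behave the same. The switch matters for layers such as dropout and batch normalization, so setting it explicitly is a good habit.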

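An optimizer is used to update the weights and biases of the neural network. Here we are using the Adam optimizer with a learning rate of 1e-2.

After that, we define a mean squared error (MSE) loss function, which averages the squared differences between predictions and ground truth.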
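As a small worked example of what the MSE loss computes, the sketch below compares torch.nn.MSELoss with the same formula written by hand. The three values are made up for illustration.

```python
import torch

criterion = torch.nn.MSELoss(reduction='mean')

# Made-up predictions and ground truth
y_pred = torch.tensor([0.5, 0.0, 1.0])
y_true = torch.tensor([1.0, 0.0, 0.0])

loss = criterion(y_pred, y_true)

# Same thing by hand: squared errors are 0.25, 0.0 and 1.0, mean = 1.25 / 3
manual = ((y_pred - y_true) ** 2).mean()

print(loss.item(), manual.item())
```

Both numbers are the same, so the loss is simply the average squared error over all rows.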

An epoch is one full pass of all the data through the model during training. NUM_EPOCHS is the number of such passes.

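At every iteration of the for loop, we call optimizer.zero_grad() to reset the gradient values. PyTorch adds new gradients to the existing ones by default, so without this reset the gradients from earlier iterations would pile up.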
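The accumulation behaviour is easy to see on a single made-up weight. In this sketch (w and the SGD optimizer are hypothetical stand-ins, chosen only to keep the numbers simple), the gradient of 2 * w with respect to w is always 2.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

(2 * w).backward()
first = w.grad.item()         # 2.0

(2 * w).backward()            # no reset in between
accumulated = w.grad.item()   # 4.0: the new gradient was added to the old one

optimizer.zero_grad()         # reset
(2 * w).backward()
after_reset = w.grad.item()   # 2.0 again

print(first, accumulated, after_reset)  # 2.0 4.0 2.0
```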

After we pass the data through the model, we get the prediction (y_pred) in the form of an array of arrays of single values, that is, 1000 rows with one value each (shape 1000 x 1).

Before comparing the prediction (y_pred) with the ground truth (y), we have to make its structure the same as y by calling y_pred.reshape(1000).

Then y_pred is a flat array of 1000 values, just like y.
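The following sketch shows why the reshape matters. It uses random tensors with the same shapes as y and the raw model output. Without the reshape, subtracting the two does not compare them row by row: PyTorch broadcasts them into a 1000 x 1000 table instead.

```python
import torch

y = torch.rand(1000)          # same shape as the ground truth
y_pred = torch.rand(1000, 1)  # same shape as the raw model output

flat = y_pred.reshape(1000)
print(flat.shape)             # torch.Size([1000])

# Without the reshape, (1000, 1) minus (1000,) broadcasts to (1000, 1000)
diff = y_pred - y
print(diff.shape)             # torch.Size([1000, 1000])
```

A loss computed on such a table would compare every prediction with every target, which is not what we want.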


After calculating the loss, we derive the gradients for all the weights and biases by calling loss.backward().

Then, by running optimizer.step(), we update all weights and biases based on the derived gradients.
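One full update can be traced by hand on a single made-up weight. Note that this sketch swaps in plain SGD instead of Adam, purely because its update rule (w = w - lr * gradient) is easy to check; w and the loss are hypothetical.

```python
import torch

w = torch.tensor(0.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

optimizer.zero_grad()
loss = (w - 1.0) ** 2   # smallest when w = 1
loss.backward()         # d(loss)/dw = 2 * (w - 1) = -2
grad = w.grad.item()

optimizer.step()        # w = 0 - 0.1 * (-2) = 0.2

print(grad, w.item())
```

The weight moved from 0 towards 1, where the loss is smallest. Repeating these steps in a loop is all that training does.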

Finally, we save the parameters of the trained model to the file model.h5 with torch.save(model.state_dict(), 'model.h5').


Again, on the first line of test, we create an instance of MyMachine (model). Then, with model.load_state_dict(torch.load('model.h5')), we replace the parameters (weights and biases) of the machine with the trained parameters we just saved. On the third line, model.eval() tells PyTorch that we are using the model in test (evaluation) mode.
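The save and load steps can be tried as a round trip. The sketch below uses a hypothetical one-layer model as a stand-in for MyMachine and writes to a temporary folder so that it does not overwrite your trained model.h5.

```python
import os
import tempfile

import torch
from torch import nn

model = nn.Linear(2, 1)  # hypothetical stand-in for MyMachine

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'model.h5')
    torch.save(model.state_dict(), path)        # save weights and biases only

    restored = nn.Linear(2, 1)                  # fresh model, random parameters
    restored.load_state_dict(torch.load(path))  # overwrite with the saved ones

# Every parameter of the restored model equals the original
same = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), restored.state_dict().values())
)
print(same)  # True
```

The file name is arbitrary here: the file is written in PyTorch's own format regardless of the .h5 ending, and PyTorch projects commonly use .pt or .pth instead.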

After that, we get a new dataset (X and y are formed from newly generated random numbers, so the model has not seen these exact rows during training) and pass X through the model. Finally, we print the R-squared value computed from the ground truth (y) and the prediction (y_pred). Simply speaking, the R-squared value indicates the prediction performance. A value of 1 means perfect prediction without any error (which rarely happens in practice). A value near 0 means the model does no better than always predicting the average of y, and a lower value indicates even poorer prediction.
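To see where these numbers come from, here is a minimal sketch of the R-squared formula written with plain PyTorch. The helper name r_squared and the four sample values are made up for illustration; r2_score from scikit-learn computes the same quantity.

```python
import torch

def r_squared(y_true, y_pred):
    # 1 - (sum of squared errors) / (total variation of y around its mean)
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return (1 - ss_res / ss_tot).item()

y_true = torch.tensor([1.0, 2.0, 3.0, 4.0])

perfect = r_squared(y_true, y_true)                    # prediction equals truth
mean_only = r_squared(y_true, torch.full((4,), 2.5))   # always predict the average

print(perfect, mean_only)  # 1.0 0.0
```

A prediction that is worse than the plain average gives a negative value, so the score is not limited to the range from 0 to 1.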