Model Training & Evaluation

Ishan Deshpande
May 23

In the previous blog, we learned how Feature Engineering transforms raw data into something a Machine Learning model can actually understand.

But preparing data is only half the journey. Now comes the most interesting question:

Once the data is ready… how does a Machine Learning model actually learn?

Does it memorize data?
Does it understand patterns?
And how do we know whether the model is truly good?

This is where Model Training & Evaluation comes into the picture.

During training, the model looks at historical examples and gradually improves its predictions.

During evaluation, we test whether the model can perform well on data it has never seen before.

What Does “Training a Model” Mean?

At a high level, training a model means teaching a machine to identify patterns in historical data so that it can make predictions on new data.

But unlike humans, machines do not understand concepts, logic, or meaning. They only learn from examples.

Think of training a model like teaching a child. If you show a child enough examples of cats and dogs, eventually they start recognizing the difference on their own.

Machine Learning works in a similar way.

We provide:

Input data (Features)
Actual answers (Labels)

The model studies these examples and gradually learns relationships between them.

Train vs Test Data — Why Do We Split Data?

Imagine preparing for an exam.

You study using books, notes, and practice questions. But on exam day, the questions are new.

If the exam questions were exactly the same as your practice questions, getting a high score would not prove that you actually learned.

Machine Learning works in a very similar way.

A model should not just remember examples. It should learn patterns and apply them to new unseen data.

That’s why we divide our dataset into two parts.

Training Data - This is the data used to teach the model.

Testing Data - This data is kept separate and hidden during training. After training finishes, we use it to check whether the model performs well on completely new examples.

A typical split is 80:20, meaning 80% of the data is used for training and 20% for testing.

The 80:20 ratio is not mandatory. Other ratios, such as 70:30, work as well.
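To make the split concrete, here is a minimal sketch in plain Python. The dataset, the function name `split_train_test`, and the fixed seed are all assumptions made for illustration; they are not from a specific library.

```python
import random

def split_train_test(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of the rows, then hold out a test_ratio share for testing."""
    shuffled = list(rows)                    # copy, so the original data is untouched
    random.Random(seed).shuffle(shuffled)    # fixed seed keeps the split reproducible
    test_size = round(len(shuffled) * test_ratio)
    split_at = len(shuffled) - test_size
    return shuffled[:split_at], shuffled[split_at:]

# Hypothetical dataset: 100 rows of (hours_studied, passed)
dataset = [(hours, hours >= 5) for hours in range(1, 11)] * 10

train_rows, test_rows = split_train_test(dataset, test_ratio=0.2)
print(len(train_rows), len(test_rows))   # 80 20
```

Shuffling before splitting matters because real datasets are often sorted in some way, and an unshuffled split could put one kind of example only in the test set. In practice, libraries such as scikit-learn provide a ready-made `train_test_split` function for this.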

You might wonder: why not train on 100% of the data?

Because high training performance can sometimes be misleading.

Imagine a student memorizing answers instead of understanding concepts.

They may score 100% on practice but poorly on actual exams.

Models can do the same. This is called Overfitting (we’ll cover it later in this blog).

What Actually Happens During Training?

At this point, the model has:

Data to learn from
Expected answers available

Now training begins.

The model does not magically become intelligent.

Instead, it learns through a cycle of:

Try → Make Mistakes → Improve → Repeat

Think of learning to play cricket.

You:

Take a shot
Miss
Adjust
Try again

After many attempts, you improve.

Machine Learning training works in a very similar way.

Step 1 — The Model Makes a Prediction

The model receives input data and tries to predict the output.

Example:

Input:

Hours Studied

Model Prediction:

Pass

Step 2 — Compare with Actual Answer

Now the model checks:

Predicted → Pass
Actual → Fail

Oops… prediction was wrong.

Step 3 — Calculate Error

The model measures: how far was the prediction from the actual answer?

Smaller error → Better model

Larger error → More improvement needed

This error acts like feedback.

Step 4 — Learn and Adjust

Now the model updates itself. It slightly adjusts its internal weights (how much importance it gives to each feature) to improve future predictions.

You can think of this as making small corrections after every mistake.

We will talk about this in detail in upcoming blogs, when we discuss the algorithms.

Step 5 — Repeat Again and Again

The model repeats this cycle many times, often thousands.

Over these repetitions, the error gradually shrinks and performance improves.

Two Terms You’ll Hear Often

Epoch - One complete pass through the entire training dataset.

Example: If your training data has 1000 rows and the model sees every row once, that is 1 epoch.

If this is repeated 10 times, that is 10 epochs.

Learning - The process of adjusting internal parameters to reduce errors.
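The five steps and the idea of an epoch can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is made up, the model has a single weight, and the update rule (a simple gradient-descent style correction) is one common choice rather than the method of any particular algorithm.

```python
# Hypothetical training data: hours studied -> exam score (true rule: score = 10 * hours)
hours = [1, 2, 3, 4]
scores = [10, 20, 30, 40]

weight = 0.0            # the single internal parameter the model will learn
learning_rate = 0.05    # size of each correction (an assumed value)
epochs = 10             # number of full passes over the training data

epoch_errors = []
for epoch in range(epochs):
    total_abs_error = 0.0
    for x, actual in zip(hours, scores):
        predicted = weight * x                 # Step 1: make a prediction
        error = predicted - actual             # Steps 2-3: compare and measure the error
        weight -= learning_rate * error * x    # Step 4: small correction to the weight
        total_abs_error += abs(error)
    # Step 5: record the average error, then repeat for the next epoch
    epoch_errors.append(total_abs_error / len(hours))

print(f"learned weight: {weight:.4f}")
print(f"average error, first epoch: {epoch_errors[0]:.2f}")
print(f"average error, last epoch:  {epoch_errors[-1]:.6f}")
```

The weight starts at 0 and moves toward 10 as the epochs go by, while the average error falls from one epoch to the next.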

How Do We Know if the Model is Good?

After training finishes, the next question is: did the model actually learn properly?

Simply making predictions is not enough. We need a way to measure performance. This process is called Model Evaluation.

Different Machine Learning problems use different evaluation metrics.

For example:

Predicting Spam / Not Spam → Classification Metrics
Predicting House Price → Regression Metrics

Let’s first understand Classification Metrics.

Classification Metrics

Confusion Matrix — Understanding Model Mistakes

The Confusion Matrix is the foundation of classification evaluation.

Instead of giving only one number, it shows:

Correct predictions
Incorrect predictions
Types of mistakes

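Let's see a spam email detection example, where Spam is the positive class. Every prediction falls into one of four groups:

True Positive (TP): spam correctly marked as spam
False Positive (FP): a normal email wrongly marked as spam
False Negative (FN): spam that slipped through as not spam
True Negative (TN): a normal email correctly left alone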
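A minimal sketch of how these counts could be tallied in Python. The ten labels below are invented for illustration, with 1 meaning spam and 0 meaning not spam.

```python
# Hypothetical labels for 10 emails: 1 = spam, 0 = not spam
actual    = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]

def confusion_counts(actual, predicted):
    """Count the four possible outcomes, treating spam (1) as the positive class."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1    # spam correctly caught
        elif a == 0 and p == 1:
            counts["FP"] += 1    # normal email wrongly flagged (false alarm)
        elif a == 1 and p == 0:
            counts["FN"] += 1    # spam that was missed
        else:
            counts["TN"] += 1    # normal email correctly left alone
    return counts

counts = confusion_counts(actual, predicted)

# Print the counts as a 2x2 table: rows = actual, columns = predicted
print(f"{'':18}{'Predicted Spam':>16}{'Predicted Not Spam':>20}")
print(f"{'Actual Spam':18}{counts['TP']:>16}{counts['FN']:>20}")
print(f"{'Actual Not Spam':18}{counts['FP']:>16}{counts['TN']:>20}")
```

With these made-up labels the model catches 3 spam emails, misses 1, raises 1 false alarm, and correctly leaves 5 normal emails alone.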

Now let's look at the key evaluation metrics derived from a Confusion Matrix.

1. Accuracy

It tells us:

Out of all predictions, how many were correct?

Accuracy = Correct Predictions ÷ Total Predictions

Easy to understand, but not always reliable for imbalanced data.

Suppose you have 100 emails, of which 5 are spam. If your model marks every email as not spam, its accuracy is still 95%.

95% accuracy sounds good, but it completely failed to detect spam.
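The same scenario can be checked with a short Python sketch. The labels are constructed to match the example above (1 = spam, 0 = not spam).

```python
# 100 emails: 5 spam (1) and 95 not spam (0)
actual = [1] * 5 + [0] * 95

# A lazy "model" that marks every email as not spam
predicted = [0] * 100

correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

# How many spam emails did the model actually catch?
spam_caught = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))

print(f"accuracy: {accuracy:.0%}")      # accuracy: 95%
print(f"spam caught: {spam_caught}")    # spam caught: 0
```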

2. Precision

It tells us:

Out of everything predicted as positive, how many were actually positive?

Example:

The model marks 100 emails as Spam.

Only 90 of them are actually Spam.

Precision = 90 ÷ 100 = 90%

High Precision means: Fewer false alarms

3. Recall

It tells us:

Out of all actual positives, how many did we successfully find?

Example:

There are 100 actual Spam emails.

The model detects only 80 of them.

Recall = 80 ÷ 100 = 80%

High Recall means: Fewer missed cases
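Both metrics come from simple counts, as the sketch below shows using the numbers from the two examples. The function names are illustrative, and returning 0.0 when there is nothing to divide by is a convention assumed here.

```python
def precision(tp, fp):
    """Share of predicted positives that were truly positive."""
    predicted_positive = tp + fp
    return tp / predicted_positive if predicted_positive else 0.0

def recall(tp, fn):
    """Share of actual positives that the model found."""
    actual_positive = tp + fn
    return tp / actual_positive if actual_positive else 0.0

# Precision example: 100 emails marked as spam, 90 truly spam, 10 false alarms
spam_precision = precision(tp=90, fp=10)

# Recall example: 100 actual spam emails, 80 detected, 20 missed
spam_recall = recall(tp=80, fn=20)

print(f"precision: {spam_precision:.0%}")   # precision: 90%
print(f"recall:    {spam_recall:.0%}")      # recall:    80%
```

Notice that the two formulas differ only in the second term: precision is hurt by false alarms (FP), while recall is hurt by missed cases (FN).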

4. F1 Score

Sometimes Precision and Recall conflict. Improving one may reduce the other.

F1 Score balances both. It is the harmonic mean of Precision and Recall, so it is high only when both are high.

F1 = 2 × Precision × Recall ÷ (Precision + Recall)

Useful when:

The dataset is imbalanced
Accuracy becomes misleading
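F1 is computed as the harmonic mean of Precision and Recall. The sketch below uses that standard formula; the input values are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Balanced case: precision 90%, recall 80%
balanced = f1_score(0.9, 0.8)

# Lopsided case: perfect precision, but the model finds only 10% of positives
lopsided = f1_score(1.0, 0.1)

print(f"balanced F1: {balanced:.3f}")   # balanced F1: 0.847
print(f"lopsided F1: {lopsided:.3f}")   # lopsided F1: 0.182
```

A plain average of 1.0 and 0.1 would be 0.55, which looks acceptable. The harmonic mean drops to about 0.18, so a model cannot hide a very weak Recall behind a strong Precision.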

Regression Metrics

Regression models predict actual numbers, so we don’t check whether predictions are simply correct or incorrect.

Instead, we need to ask - How far was the prediction from the actual value?

That’s where Regression Metrics come in.

1. MAE — Mean Absolute Error

MAE measures:

On average, how much does the model miss by?

Example:

If MAE = ₹2 Lakhs

It means on average, predictions are off by ₹2 Lakhs.

Why MAE is useful

Easy to understand
Treats all mistakes equally (a large miss gets no extra penalty)
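A short sketch of the MAE calculation. The house prices below are hypothetical and expressed in ₹ Lakhs.

```python
# Hypothetical house prices in ₹ Lakhs
actual_prices    = [50, 60, 80, 100]
predicted_prices = [52, 57, 81, 98]

# Absolute error ignores direction: over- and under-predictions count the same
absolute_errors = [abs(a - p) for a, p in zip(actual_prices, predicted_prices)]
mae = sum(absolute_errors) / len(absolute_errors)

print(absolute_errors)              # [2, 3, 1, 2]
print(f"MAE = ₹{mae} Lakhs")        # MAE = ₹2.0 Lakhs
```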

2. MSE — Mean Squared Error

MSE also measures prediction error.

But there’s one difference. It punishes larger mistakes more heavily.

Example:

Prediction errors: 2, 5, 10

MSE squares them: 4, 25, 100

Then it averages the squares: (4 + 25 + 100) ÷ 3 = 43

Notice:

Large mistakes become much bigger.

Why use MSE?

Useful when:

Big mistakes are much more expensive.

Example:

Stock Prediction
Medical Forecasting
Financial Models
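The example errors of 2, 5, and 10 can be run through a small sketch that compares MSE with MAE on the same data.

```python
errors = [2, 5, 10]   # prediction errors from the example above

squared_errors = [e ** 2 for e in errors]
mse = sum(squared_errors) / len(squared_errors)
mae = sum(abs(e) for e in errors) / len(errors)

# Share of each total that comes from the single largest error
largest_share_mae = max(errors) / sum(errors)
largest_share_mse = max(squared_errors) / sum(squared_errors)

print(squared_errors)                    # [4, 25, 100]
print(f"MSE = {mse}")                    # MSE = 43.0
print(f"MAE = {mae:.2f}")                # MAE = 5.67
print(f"largest error share in MAE: {largest_share_mae:.0%}")
print(f"largest error share in MSE: {largest_share_mse:.0%}")
```

The error of 10 makes up a much larger share of the MSE total than of the MAE total, which is what "punishing larger mistakes more heavily" means in practice.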

3. RMSE — Root Mean Squared Error

RMSE is built on MSE.

Think of it as MSE converted back into normal units, by taking its square root.

Example:

If errors are measured in ₹ Lakhs, an MSE of 25 is in squared Lakhs, which is hard to picture. RMSE = √25 = ₹5 Lakhs,

which is easier to interpret.

Why is RMSE popular?

Penalizes large mistakes
Still remains understandable
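A minimal sketch of the RMSE calculation. The error values are hypothetical and chosen so the numbers come out round.

```python
import math

# Hypothetical prediction errors in ₹ Lakhs
errors_in_lakhs = [1, 5, 7]

mse = sum(e ** 2 for e in errors_in_lakhs) / len(errors_in_lakhs)   # in squared Lakhs
rmse = math.sqrt(mse)                                               # back in Lakhs

print(f"MSE  = {mse} (squared Lakhs)")   # MSE  = 25.0 (squared Lakhs)
print(f"RMSE = ₹{rmse} Lakhs")           # RMSE = ₹5.0 Lakhs
```

The RMSE of 5 is higher than the plain average error of about 4.33 for the same data, because the largest error (7) still carries extra weight.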

4. R² Score (R-Squared)

R² answers a different question.

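Instead of measuring error, it measures how well the model explains the data.

Example:

R² = 0.90

It means the model explains about 90% of the variation in the actual values.

Higher is generally better. A score of 1 means perfect predictions, and 0 means the model does no better than always predicting the average.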
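R² is commonly computed as 1 minus the ratio of the model's squared error to the squared error of a baseline that always predicts the average. The sketch below follows that definition; the data and the function name `r_squared` are illustrative.

```python
def r_squared(actual, predicted):
    """1 - (model's squared error / squared error of always predicting the mean)."""
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_residual / ss_total

# Hypothetical actual values and model predictions
actual = [10, 20, 30, 40, 50]
predicted = [12, 18, 33, 37, 50]

model_r2 = r_squared(actual, predicted)

# Baseline "model" that always predicts the average of the actual values (30)
baseline_r2 = r_squared(actual, [30] * len(actual))

print(f"model R²:    {model_r2:.3f}")      # model R²:    0.974
print(f"baseline R²: {baseline_r2:.3f}")   # baseline R²: 0.000
```

This also shows why R² can turn negative: a model whose errors are larger than the baseline's would score below 0.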

Overfitting vs Underfitting

This is one of the most important ML concepts.

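Imagine three students.

Underfitting

The student did not study enough.

Result: Bad performance everywhere. In ML terms, the model scores poorly on both training and test data.

Overfitting

The student memorized answers.

Result: Excellent practice score, but a poor actual exam score. In ML terms, the model scores high on training data and low on test data.

Good Fit

The student understood the concepts.

Result: Performs well on unseen questions. In ML terms, the model scores well on both training and test data.

Goal: Learn patterns, don’t memorize.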
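The three students can be imitated with three toy "models" in Python. This is a deliberately exaggerated sketch: the data is made up, and a lookup table stands in for overfitting, which in real models is usually subtler than pure memorization.

```python
# Hypothetical data: (hours_studied, passed). True rule: pass if hours >= 5.
train = [(1, False), (2, False), (3, False), (4, False),
         (6, True), (7, True), (8, True), (9, True)]
test = [(0.5, False), (2.5, False), (3.5, False),
        (5.5, True), (7.5, True), (9.5, True)]   # values never seen in training

def accuracy(model, rows):
    return sum(model(hours) == passed for hours, passed in rows) / len(rows)

# Underfitting: ignores the input and always predicts Pass
def underfit_model(hours):
    return True

# Overfitting: memorizes training rows, guesses Fail for anything unseen
memory = dict(train)
def overfit_model(hours):
    return memory.get(hours, False)

# Good fit: learns a threshold halfway between the highest fail and lowest pass
highest_fail = max(h for h, passed in train if not passed)
lowest_pass = min(h for h, passed in train if passed)
threshold = (highest_fail + lowest_pass) / 2
def good_fit_model(hours):
    return hours >= threshold

for name, model in [("underfit", underfit_model),
                    ("overfit", overfit_model),
                    ("good fit", good_fit_model)]:
    print(f"{name:<10} train={accuracy(model, train):.2f} test={accuracy(model, test):.2f}")
```

The underfit model is weak on both sets, the memorizer is perfect on training data but weak on the test set, and the threshold model does well on both. Comparing training and test scores side by side is the practical way to spot which situation you are in.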

Final Thoughts

We started with prepared data and explored what happens next: how Machine Learning models actually learn.

We learned that training is a continuous cycle of:

Predicting
Measuring mistakes
Improving

And more importantly: a good model is not the one that performs best on old data. It’s the one that performs well on new, unseen data.

With training, evaluation, and model validation covered, we’ve now built another important piece of the ML foundation.

In the next blog, we’ll finally explore our first Machine Learning algorithm and see how predictions are actually made.

See you in the next blog — stay curious, keep growing. 🚀