Combining Machine Learning Models in Python

Posted on Thu 21 September 2023 in Python • 3 min read

In previous posts we built and trained logistic regression and decision tree models to classify the iris dataset.

Both performed really well in their own regard (potentially overfitting), but what if we had two models, each with its own pros and cons, and wanted the best of both worlds? In machine learning this is known as ensembling: we combine multiple models into a single model that hopefully inherits the strengths of each. There are many ways to combine models, and they are grouped under two main categories: averaging and boosting.

Averaging ensemble methods build multiple models independently and then average their predictions. Doing so reduces the variance of the combined model, which typically improves its performance. Boosting ensemble methods build models sequentially, where each model depends on the previous one, and combine them using a specific strategy to produce the final model.
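
To make the two categories concrete, here is a minimal sketch that compares one method of each kind on the iris dataset. The choice of `BaggingClassifier` and `GradientBoostingClassifier`, the `n_estimators=25` setting and the `random_state` values are assumptions made for illustration; they are not the models used in the rest of this post.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Averaging: trees are trained independently on bootstrap samples
# and their predictions are combined.
averaging_model = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25, random_state=0)

# Boosting: trees are trained one after another, each one fitted
# to correct the errors left by the trees before it.
boosting_model = GradientBoostingClassifier(n_estimators=25, random_state=0)

# Score both with 5-fold cross-validation so the numbers come from held-out data.
averaging_scores = cross_val_score(averaging_model, X, y, cv=5)
boosting_scores = cross_val_score(boosting_model, X, y, cv=5)

print('averaging (bagging):', averaging_scores.mean())
print('boosting (gradient boosting):', boosting_scores.mean())
```

On a dataset as small and clean as iris the two will likely score about the same; the difference is in how the models are built, not necessarily in the final number.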

For this post, we'll use sklearn to train each of the models that we previously trained and combine them to see how they fare against each other.

As always we begin by loading the data.

Best practice would be to split the training and testing data here, but for brevity we will skip this step.

In [2]:
from sklearn.datasets import load_iris
import numpy as np

iris = load_iris()
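
For reference, the split that this post skips could look something like the sketch below. The 80/20 ratio, the `stratify` argument and the `random_state` value are assumptions chosen for illustration; the rest of this post keeps training and scoring on the full dataset.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# Hold out 20% of the samples for testing; stratify keeps the three
# classes equally represented in both the training and the test set.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, stratify=iris.target, random_state=0
)

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
```

A model would then be fitted on `X_train, y_train` and scored on `X_test, y_test`, which gives a fairer picture of how it handles data it has not seen.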

Next we will train both our decision tree model and our logistic regression model.

In [3]:
from sklearn.tree import DecisionTreeClassifier

# Fit on the full dataset (no train/test split in this post)
decisionTreeClassifier = DecisionTreeClassifier().fit(iris.data, iris.target)

In [4]:
from sklearn.linear_model import LogisticRegression

# Fit on the full dataset (no train/test split in this post)
logisiticRegression = LogisticRegression().fit(iris.data, iris.target)

Now let's create a function that takes a model and prints a report of its scores, so we can compare how well each model is performing.

In [9]:
from sklearn.metrics import classification_report

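def get_model_score(model, model_name):
    # Predict on the same data the model was trained on
    predictions = model.predict(iris.data)

    # Per-class precision, recall and f1-score against the true labels
    print(classification_report(iris.target, predictions))
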
In [11]:
get_model_score(logisiticRegression, 'logisiticRegression')
get_model_score(decisionTreeClassifier, 'decisionTreeClassifier')
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        50
           1       0.98      0.94      0.96        50
           2       0.94      0.98      0.96        50

    accuracy                           0.97       150
   macro avg       0.97      0.97      0.97       150
weighted avg       0.97      0.97      0.97       150

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        50
           1       1.00      1.00      1.00        50
           2       1.00      1.00      1.00        50

    accuracy                           1.00       150
   macro avg       1.00      1.00      1.00       150
weighted avg       1.00      1.00      1.00       150

As seen in the previous posts, the models are very strong in their own regard. Note, however, that these scores are computed on the same 150 samples the models were trained on, so given the sample size we are potentially overfitting drastically to the dataset.

Since we will be using the voting classifier method later on, which takes a majority vote of the models' predictions, an odd number of models helps avoid tied votes and gives a more worthwhile comparison. So let's train another model using k-Nearest Neighbours.

This model is a bit different in that we first search for the best parameters and then select the best model of this type to go into our ensemble. The cv argument stands for cross-validation: the dataset is split into a number of groups (five folds here), and each candidate model is trained on all but one fold and scored on the fold that was held out, taking each fold in turn. This is similar to how we train/test split our dataset.
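
To show roughly what happens for a single candidate during the search, here is a hedged sketch that runs five-fold cross-validation by hand. It assumes stratified folds (which is what sklearn uses for a classifier when cv is an integer) and picks `n_neighbors=5` purely as an example value.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Five folds, each keeping the class proportions of the full dataset
folds = StratifiedKFold(n_splits=5)

fold_scores = []
for train_idx, valid_idx in folds.split(X, y):
    # Train on four folds, score on the one that was held out
    model = KNeighborsClassifier(n_neighbors=5).fit(X[train_idx], y[train_idx])
    fold_scores.append(model.score(X[valid_idx], y[valid_idx]))

# The mean over the folds is the score for this value of n_neighbors
print(sum(fold_scores) / len(fold_scores))
```

The grid search repeats this for every value in the parameter grid and keeps the value with the best mean score.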

In [19]:
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier()

knn_params = {'n_neighbors': np.arange(1,50)}

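# Run the 5-fold cross-validated search over n_neighbors on the full dataset
knn_grid_search = GridSearchCV(knn, knn_params, cv=5).fit(iris.data, iris.target)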

knn_best = knn_grid_search.best_estimator_

get_model_score(knn_best, 'KNearestNeighboursClassifier')
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        50
           1       0.96      0.96      0.96        50
           2       0.96      0.96      0.96        50

    accuracy                           0.97       150
   macro avg       0.97      0.97      0.97       150
weighted avg       0.97      0.97      0.97       150

Now let's combine all of these models into a single ensemble model using the voting classifier method. With hard voting, each model predicts a class and the class chosen by the majority of the models becomes the output.
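
As a rough illustration of what a majority vote does, the sketch below tallies votes by hand with NumPy. The prediction arrays and the `majority_vote` helper are made up for this example and are not part of sklearn.

```python
import numpy as np

# Hypothetical class predictions from three models for five samples
predictions = np.array([
    [0, 1, 2, 1, 2],  # model A
    [0, 1, 1, 1, 2],  # model B
    [0, 2, 2, 1, 1],  # model C
])

def majority_vote(predictions, n_classes):
    # For each sample (column), count how many models voted for each class
    votes = np.apply_along_axis(np.bincount, 0, predictions, minlength=n_classes)
    # Pick the class with the most votes; ties go to the lowest class label
    return votes.argmax(axis=0)

ensemble_prediction = majority_vote(predictions, n_classes=3)
print(ensemble_prediction)  # [0 1 2 1 2]
```

In the second sample two models vote for class 1 and one for class 2, so the ensemble outputs 1 even though one model disagreed.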

In [18]:
from sklearn.ensemble import VotingClassifier

models = [
    ('logisiticRegression', logisiticRegression),
    ('decisionTreeClassifier', decisionTreeClassifier),
    ('kNearestNeighboursClassifier', knn_best)
]

# Hard voting: each model casts one vote and the majority class wins
ensembleClassifier = VotingClassifier(models, voting='hard').fit(iris.data, iris.target)

get_model_score(ensembleClassifier, 'ensembleClassifier')
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        50
           1       0.98      0.98      0.98        50
           2       0.98      0.98      0.98        50

    accuracy                           0.99       150
   macro avg       0.99      0.99      0.99       150
weighted avg       0.99      0.99      0.99       150

All of our models across the three posts boast similar scores in this scenario. However, because an ensemble tends to have lower variance than its individual models, it should adapt better to new scenarios and generally perform better on unseen data.
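
One way to check that expectation is to score the ensemble on data it was not trained on. The sketch below does so with 5-fold cross-validation; the fresh estimators and their settings (`max_iter=1000`, `n_neighbors=5`, `random_state=0`) are assumptions for illustration and differ from the fitted models used earlier in this post.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Same three model types as in the post, combined with hard voting
ensemble = VotingClassifier(
    [
        ('lr', LogisticRegression(max_iter=1000)),
        ('tree', DecisionTreeClassifier(random_state=0)),
        ('knn', KNeighborsClassifier(n_neighbors=5)),
    ],
    voting='hard',
)

# Each score comes from a fold the ensemble did not see during training
scores = cross_val_score(ensemble, X, y, cv=5)
print(scores.mean(), scores.std())
```

The mean gives a less optimistic estimate than scoring on the training data, and the standard deviation hints at how much the result varies from fold to fold.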