A decision boundary is a surface that separates two or more classes into different sets, where all the points belonging to one class lie on one side of the decision boundary.
Plotting a decision boundary is a great way to visually evaluate how good our machine learning model is, and in this article, I am going to give a demo of how to plot a decision boundary using NumPy and Matplotlib for a binary classification problem.
1. Import the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
2. Load the make_circles dataset from sklearn
from sklearn.datasets import make_circles
n_samples = 1000
X, y = make_circles(n_samples,
                    noise=0.03,
                    random_state=42)
3. Check the shape of X and y
X.shape, y.shape -> ((1000, 2), (1000,))
4. Sample a point and label
X[4], y[4] -> (array([ 0.44220765, -0.89672343]), 0)
5. Convert the data into a pandas DataFrame and check the first few rows
circles = pd.DataFrame({"X0":X[:,0], "X1":X[:,1], "label":y})
circles.head()
6. Plot the data
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.RdYlBu);
X[:, 0] denotes all rows and the first column of array X,
X[:, 1] denotes all rows and the second column of array X,
c=y means that the colour of each plotted point is set according to its label y. So in the plot below, all the points with label 0 are red and all the points with label 1 are blue.
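As a quick sanity check, we can also count the labels; make_circles generates a balanced dataset, so we expect 500 points in each class:
circles["label"].value_counts()  # -> 500 points each for labels 0 and 1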
7. Divide the data into train and test sets, in the ratio of 80:20.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
X_train.shape, X_test.shape, y_train.shape, y_test.shape -> ((800, 2), (200, 2), (800,), (200,))
8. Now we can use any classification algorithm to build and train a model that fits on the train set and is evaluated on the test set. Since this article is about plotting decision boundaries, I will not get into the details of which model to choose. I have chosen a neural network with 2 layers here.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(20, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(loss=tf.keras.losses.BinaryCrossentropy(),
              optimizer=tf.keras.optimizers.Adam(),
              metrics=['accuracy'])

model.fit(X_train, y_train, epochs=100)
Our model is now ready. We could evaluate it on the test set and check its performance, but since our ultimate aim here is to plot a decision boundary, I will get straight into it.
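(For reference, evaluation is a single line; the exact loss and accuracy will vary from run to run.)
model.evaluate(X_test, y_test)  # returns [loss, accuracy], matching the metrics compiled above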
Decision boundary steps
All the above steps were required to get a trained model, which we now need in order to plot a decision boundary for our data. The boundary will help us visualise how well the model separates the 2 classes.
9. Find the minimum and maximum x and y coordinates for our data
x_min, x_max = X[:, 0].min() - 0.1, X[:, 0].max() + 0.1
y_min, y_max = X[:, 1].min() - 0.1, X[:, 1].max() + 0.1
This gives a rectangular boundary for our data, within which we can plot both the data and the decision boundary. The 0.1 is subtracted from the minima and added to the maxima to provide uniform padding all around.
10. Using the extremes above, we will create a rectangular grid of points inside these boundaries, using NumPy's meshgrid method. This meshgrid will help in plotting the contour/shape of our decision boundary.
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                     np.linspace(y_min, y_max, 100))
We take the extreme values and create 100 equidistant points in both the x and y directions, then combine these to create the x and y coordinates of each point within the mesh grid. A quick inspection of the result is shown below.
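(A sanity check on the grid; the coordinate values themselves depend on the data's minima and maxima.)
xx.shape, yy.shape -> ((100, 100), (100, 100))
# Each row of xx is a copy of the 100 x-values,
# and each column of yy is a copy of the 100 y-values.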
11. Now we need to obtain the model's predictions for each of the points in the meshgrid. To do so, we concatenate the coordinates xx and yy obtained above along the second axis (axis=1). We use NumPy's np.c_ for this:
x_in = np.c_[xx.ravel(), yy.ravel()]
The np.ravel method flattens the xx and yy arrays into 1-dimensional arrays, and np.c_ stacks the two flattened arrays column-wise, giving one (x, y) coordinate pair per grid point.
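Checking the shapes makes this concrete:
xx.ravel().shape -> (10000,)   # the 100 x 100 grid flattened
x_in.shape -> (10000, 2)       # one (x, y) pair per grid point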
12. Obtain the predicted values using our trained model from above
y_pred = model.predict(x_in)  # one sigmoid probability per grid point; shape (10000, 1)
13. We need to reshape y_pred to the same shape as xx for use in the contour function
y_pred = np.round(y_pred).reshape(xx.shape)  # round probabilities to hard 0/1 labels; shape (100, 100)
14. Now we are ready to plot our decision boundary. We can use matplotlib's contourf function for this, along with a scatter plot of the original data.
plt.contourf(xx, yy, y_pred, cmap=plt.cm.RdYlBu, alpha=0.7)
plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.RdYlBu)
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
We obtain the below figure:
We can see that our model has performed quite well: the decision boundary, seen as the white curve between the light blue and pink regions above, is quite successful at separating the data points belonging to the two classes.
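Finally, if you find yourself doing this often, steps 9 to 14 can be wrapped into a reusable helper. This is just a sketch assembled from the code above; it assumes a binary classifier whose predict method returns one probability per point:

def plot_decision_boundary(model, X, y):
    # Rectangular grid over the data, with 0.1 padding (steps 9-10)
    x_min, x_max = X[:, 0].min() - 0.1, X[:, 0].max() + 0.1
    y_min, y_max = X[:, 1].min() - 0.1, X[:, 1].max() + 0.1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                         np.linspace(y_min, y_max, 100))
    # Predict on every grid point, then reshape back to the grid (steps 11-13)
    x_in = np.c_[xx.ravel(), yy.ravel()]
    y_pred = np.round(model.predict(x_in)).reshape(xx.shape)
    # Plot the predicted regions and overlay the original data (step 14)
    plt.contourf(xx, yy, y_pred, cmap=plt.cm.RdYlBu, alpha=0.7)
    plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.RdYlBu)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())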
Thanks for reading.