MNIST and EMNIST Handwriting Recognition Using Keras and TensorFlow

Handwriting recognition, i.e. classifying handwritten text, is a challenging problem because individual writing styles vary enormously. The traditional approach to solving this would be to extract language-dependent features such as the curvature of different letters or the spacing between letters.

Here we instead use a deep learning-based approach to identifying these features. We will pass small patches of handwritten images to a neural network (NN) and train it with a softmax classification loss.

This tutorial is meant to be a quick, straightforward Python implementation that recognizes handwritten images with the help of neural networks. It covers both the MNIST and EMNIST datasets, using a dense NN and a convolutional NN.

MNIST Image Classification

A popular demonstration of the capability of deep learning techniques is object recognition in image data. In this section, you will discover how to develop a deep learning model to achieve near state-of-the-art performance on the MNIST handwritten digit recognition task in Python using the Keras deep learning library.

We’ll apply all this knowledge to write a deep neural network. The problem we’ve chosen is often referred to as the “Hello World” of deep learning because, for most students, it is the first deep learning algorithm they see.

The dataset is called MNIST and it provides 70,000 images (28×28 pixels) of handwritten digits (1 digit per image).

The goal is to write an algorithm that detects which digit is written. Since there are only 10 digits (0,1,2,3,4,5,6,7,8,9), this is a classification problem with 10 classes. Our goal would be to build a neural network with 2 hidden layers.

Now let’s come to the coding part without wasting much time. The coding part includes the following sections:

1. Importing the relevant packages:

import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

2. Loading the dataset:

mnist_dataset, mnist_info=tfds.load(name='mnist',with_info=True,as_supervised=True)



3. Pre-processing the data:

The pixel values in images must be scaled prior to providing the images as input to a deep learning neural network model during the training or evaluation of the model. Traditionally, the images would have to be scaled prior to the development of the model and stored in memory or on disk in the scaled format. An alternative approach is to scale the images using a preferred scaling technique just-in-time during the training or model evaluation process.

def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255.  # rescale pixel values from [0, 255] to [0, 1]
    return image, label

The function above should be mapped over both the training data and the testing data so that both are scaled.
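As a minimal sketch of that mapping step, the snippet below applies a scaling function with `Dataset.map`. To stay runnable without a download, it uses a small synthetic dataset (`fake_images`, `fake_labels`, both hypothetical stand-ins) in place of the real MNIST splits.

```python
import numpy as np
import tensorflow as tf

def scale(image, label):
    # Cast the uint8 pixels to float and rescale from [0, 255] to [0, 1].
    image = tf.cast(image, tf.float32)
    image /= 255.
    return image, label

# Synthetic stand-in for MNIST: 100 random 28x28x1 uint8 images, labels 0-9.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(100, 28, 28, 1), dtype=np.uint8)
fake_labels = rng.integers(0, 10, size=(100,), dtype=np.int64)
fake_dataset = tf.data.Dataset.from_tensor_slices((fake_images, fake_labels))

# map() applies scale to every (image, label) pair lazily.
scaled_dataset = fake_dataset.map(scale)
first_image, first_label = next(iter(scaled_dataset))
print(first_image.dtype, float(tf.reduce_max(first_image)))
```

With the real data, the same call would presumably be applied to `mnist_dataset['train']` and `mnist_dataset['test']`.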

Now we want the data to be shuffled. When we are dealing with enormous datasets, we can’t shuffle all the data at once because it may not fit in memory, so we shuffle it a buffer at a time.
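One way the shuffling and batching could look with `tf.data` is sketched below. The values of `BUFFER_SIZE` and `BATCH_SIZE`, the 10% validation split, and the synthetic data are all assumptions chosen for illustration, not values taken from this tutorial.

```python
import numpy as np
import tensorflow as tf

BUFFER_SIZE = 10000  # assumed; how many examples are held in memory for shuffling
BATCH_SIZE = 100     # assumed

# Synthetic stand-in for the scaled MNIST training data.
rng = np.random.default_rng(0)
images = rng.random((1000, 28, 28, 1), dtype=np.float32)
labels = rng.integers(0, 10, size=(1000,), dtype=np.int64)
dataset = tf.data.Dataset.from_tensor_slices((images, labels))

num_validation = 100  # assumed 10% of the training examples

# Shuffle once with a fixed order so that take() and skip() never overlap.
shuffled = dataset.shuffle(BUFFER_SIZE, reshuffle_each_iteration=False)
validation_data = shuffled.take(num_validation).batch(num_validation)

# Reshuffle the remaining training examples every epoch, then batch them.
train_data = shuffled.skip(num_validation).shuffle(BUFFER_SIZE).batch(BATCH_SIZE)

num_train_batches = len(list(train_data))
print(num_train_batches)
```

When the buffer is at least as large as the dataset, the shuffle is uniform; a smaller buffer trades shuffle quality for memory.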


4. Model

So, let’s outline the model. Our data is such that each input is 28x28x1, which flattens to 784 values, so that’s our input layer. We have 10 output nodes, one for each digit. We will work with two hidden layers consisting of 50 nodes each. I don’t know the optimal width and depth (i.e. the hyper-parameters) for this problem, but I surely know these values are suboptimal. The underlying assumption is that all hidden layers are the same size.
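The outline above can be sketched in Keras roughly as follows. The ReLU activations in the hidden layers are an assumption (the text does not name them), and the variable names are illustrative.

```python
import tensorflow as tf

INPUT_SHAPE = (28, 28, 1)
HIDDEN_LAYER_SIZE = 50
OUTPUT_SIZE = 10

model = tf.keras.Sequential([
    tf.keras.Input(shape=INPUT_SHAPE),
    # Flatten each 28x28x1 image into a vector of 784 values.
    tf.keras.layers.Flatten(),
    # Two hidden layers of equal width (ReLU is an assumed choice).
    tf.keras.layers.Dense(HIDDEN_LAYER_SIZE, activation='relu'),
    tf.keras.layers.Dense(HIDDEN_LAYER_SIZE, activation='relu'),
    # One output node per digit; softmax turns the outputs into probabilities.
    tf.keras.layers.Dense(OUTPUT_SIZE, activation='softmax'),
])

model.summary()
```

The parameter count follows directly from the layer sizes: 784×50 + 50 for the first hidden layer, 50×50 + 50 for the second, and 50×10 + 10 for the output layer.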



Now, moving further, let’s choose the optimizer and the loss function for our model.
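A plausible choice, shown here as a sketch rather than the tutorial's confirmed settings, is the Adam optimizer with sparse categorical cross-entropy, which suits integer labels (0 to 9) paired with a softmax output. The second half of the snippet evaluates the loss on a single hand-made prediction to show what it measures.

```python
import math

import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(50, activation='relu'),
    tf.keras.layers.Dense(50, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# Assumed choices: Adam optimizer, sparse categorical cross-entropy loss.
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# The loss is -log(probability assigned to the true class).
# A uniform guess over 10 digits therefore costs log(10).
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
uniform_prediction = np.full((1, 10), 0.1, dtype=np.float32)
loss_value = float(loss_fn(np.array([3]), uniform_prediction))
print(loss_value, math.log(10))
```

If the labels were one-hot encoded instead of integers, `categorical_crossentropy` would be the matching loss.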


5. Training the model:

Now we have reached the most exciting part of the machine learning process, i.e. training. This is where we fit the model we have built and see if it actually works.


Note: we have parameterized the number of epochs in a neat way so we can clearly inspect and amend it. Whenever we have hyper-parameters such as batch size, buffer size, input size, output size and so on, we prefer to create dedicated variables that can be easily spotted when we fine-tune or debug our code.
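The sketch below illustrates that habit: every hyper-parameter lives in a named variable, and `NUM_EPOCHS` is passed to `model.fit`. The values and the synthetic data (`make_fake_split` is a hypothetical helper) are assumptions so that the example runs quickly without downloading MNIST.

```python
import numpy as np
import tensorflow as tf

# Dedicated hyper-parameter variables (illustrative values).
INPUT_SHAPE = (28, 28, 1)
HIDDEN_LAYER_SIZE = 50
OUTPUT_SIZE = 10
BATCH_SIZE = 100
NUM_EPOCHS = 2  # kept small so the sketch finishes quickly

rng = np.random.default_rng(0)

def make_fake_split(num_examples):
    # Hypothetical helper: random images and labels shaped like MNIST.
    images = rng.random((num_examples, *INPUT_SHAPE), dtype=np.float32)
    labels = rng.integers(0, OUTPUT_SIZE, size=(num_examples,), dtype=np.int64)
    return tf.data.Dataset.from_tensor_slices((images, labels))

train_data = make_fake_split(900).batch(BATCH_SIZE)
validation_data = make_fake_split(100).batch(100)

model = tf.keras.Sequential([
    tf.keras.Input(shape=INPUT_SHAPE),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(HIDDEN_LAYER_SIZE, activation='relu'),
    tf.keras.layers.Dense(HIDDEN_LAYER_SIZE, activation='relu'),
    tf.keras.layers.Dense(OUTPUT_SIZE, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# fit() reports training and validation loss/accuracy once per epoch.
history = model.fit(train_data,
                    epochs=NUM_EPOCHS,
                    validation_data=validation_data,
                    verbose=2)
```

On random data the accuracy stays near chance; the point is only the shape of the training call.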

6. Testing the model:

Now we must test the model on the test dataset, because the final accuracy of the model comes from forward propagating the test dataset, not the validation dataset. The reason is that we may have overfit to the validation dataset while tuning the hyper-parameters. Finally, the test dataset is used to evaluate the model and a classification error rate is printed.
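A minimal sketch of the evaluation call is shown below, assuming a compiled model with an accuracy metric. The model here is untrained and the test set is synthetic, so the numbers it prints are meaningless; only the call pattern is of interest.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for the scaled test split, served as a single batch.
rng = np.random.default_rng(0)
test_images = rng.random((200, 28, 28, 1), dtype=np.float32)
test_labels = rng.integers(0, 10, size=(200,), dtype=np.int64)
test_data = tf.data.Dataset.from_tensor_slices(
    (test_images, test_labels)).batch(200)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(50, activation='relu'),
    tf.keras.layers.Dense(50, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# evaluate() returns the loss followed by each compiled metric.
test_loss, test_accuracy = model.evaluate(test_data, verbose=0)
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(
    test_loss, test_accuracy * 100.))
```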


So here’s the final result. Our model has a final test accuracy of 97.3 percent with a loss of 8 percent. In other words, with this particular configuration the model classifies 97.3 percent of the test digits correctly.

EMNIST Image Classification

The EMNIST dataset is a set of handwritten character digits derived from the NIST Special Database 19 and converted to a 28×28 pixel image format and dataset structure that directly matches the MNIST dataset. Further information on the dataset contents and conversion process can be found in the paper available at

The dataset is provided in two file formats. Both versions of the dataset contain identical information, and are provided entirely for the sake of convenience. The first dataset is provided in a MATLAB format that is accessible through both MATLAB and Python (using the function). The second version of the dataset is provided in the same binary format as the original MNIST dataset as outlined in

There are six different splits provided in this dataset. A short summary of the dataset is provided below:

  • EMNIST ByClass: 814,255 characters. 62 unbalanced classes.
  • EMNIST ByMerge: 814,255 characters. 47 unbalanced classes.
  • EMNIST Balanced:  131,600 characters. 47 balanced classes.
  • EMNIST Letters: 145,600 characters. 26 balanced classes.
  • EMNIST Digits: 280,000 characters. 10 balanced classes.
  • EMNIST MNIST: 70,000 characters. 10 balanced classes.

The full complement of the NIST Special Database 19 is available in the ByClass and ByMerge splits. The EMNIST Balanced dataset contains a set of characters with an equal number of samples per class. The EMNIST Letters dataset merges a balanced set of the uppercase and lowercase letters into a single 26-class task. The EMNIST Digits and EMNIST MNIST datasets provide balanced handwritten digit datasets directly compatible with the original MNIST dataset.

Convolution Neural Network for EMNIST

The input image is fed into the CNN layers. These layers are trained to extract relevant features from the image. Each layer consists of three operations. First, the convolution operation applies a filter kernel of size 5×5 in the first two layers and 3×3 in the last three layers to the input. Then, the non-linear ReLU function is applied. Finally, a pooling layer summarizes image regions and outputs a downsized version of the input. While the image height is halved in each layer, feature maps (channels) are added, so that the output feature map (or sequence) has a size of 32×256.
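To make the three operations concrete, here is a sketch of a single convolution, ReLU and pooling block applied to one 28×28 image. The filter count (32) and `padding='same'` are assumptions picked for illustration; they are not taken from the description above.

```python
import numpy as np
import tensorflow as tf

conv_block = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    # 1. Convolution with a 5x5 kernel; 'same' padding keeps the 28x28 size.
    tf.keras.layers.Conv2D(filters=32, kernel_size=(5, 5), padding='same'),
    # 2. Non-linear ReLU activation.
    tf.keras.layers.ReLU(),
    # 3. 2x2 max pooling halves the height and width.
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
])

dummy_batch = np.zeros((1, 28, 28, 1), dtype=np.float32)
features = conv_block(dummy_batch)
print(features.shape)
```

The spatial size drops from 28×28 to 14×14 while the channel count grows from 1 to 32, which is the trade described above: less resolution, more feature maps.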

Now let’s see the coding part for this section. It also includes the following sections:

1. Importing the relevant packages:

import os
#os.environ['CUDA_VISIBLE_DEVICES'] = '' # hides the GPU from tensorflow (for science)
import gzip
import tensorflow.keras
import matplotlib.pyplot as plt
import numpy as np
import random
import struct
import time

#List computing devices available to tensorflow:
from tensorflow.python.client import device_lib
device_list = device_lib.list_local_devices()
[x.name for x in device_list]

2. Loading the dataset:

You should first download the dataset from the NIST site.

#Defining the dataset path, class labels and hyper-parameters:

%matplotlib inline
image_dir = 'Path of the EMNIST dataset you have downloaded'
labels = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
categories = len(labels)
batch_size = 1024
epochs = 50
model_path = 'Path to save the model.h5'

#Defining the helper function to read the dataset from disk:

def read_idx(filename):
    print('Processing data from %s.' % filename)
    with gzip.open(filename, 'rb') as f:
        #read the 4-byte header: two zero bytes, the data type code and the number of dimensions
        z, dtype, dim = struct.unpack('>HBB', f.read(4))
        print('Dimensions:', dim)
        #get the shape (size in each dimension) of the data:
        shape = tuple(struct.unpack('>I', f.read(4))[0] for d in range(dim))
        print('Shape:', shape)
        #return the data as a reshaped numpy array:
        return np.frombuffer(f.read(), dtype=np.uint8).reshape(shape)

def load_emnist():
    train_images = os.path.join(image_dir, 'emnist-byclass-train-images-idx3-ubyte.gz')
    train_labels = os.path.join(image_dir, 'emnist-byclass-train-labels-idx1-ubyte.gz')
    test_images = os.path.join(image_dir, 'emnist-byclass-test-images-idx3-ubyte.gz')
    test_labels = os.path.join(image_dir, 'emnist-byclass-test-labels-idx1-ubyte.gz')

    train_X = read_idx(train_images)
    train_y = read_idx(train_labels)
    test_X = read_idx(test_images)
    test_y = read_idx(test_labels)
    return (train_X, train_y, test_X, test_y)
raw_train_X, raw_train_y, raw_test_X, raw_test_y = load_emnist()
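To check a reader like this without the full download, one can write a tiny gzip-compressed IDX file and read it back. The sketch below does that with a 2×3×3 array; the file name and the quiet `read_idx` variant (no print statements) are illustrative assumptions.

```python
import gzip
import os
import struct
import tempfile

import numpy as np

def read_idx(filename):
    # Same parsing logic as the helper above, without the print statements.
    with gzip.open(filename, 'rb') as f:
        zero, dtype_code, ndim = struct.unpack('>HBB', f.read(4))
        shape = tuple(struct.unpack('>I', f.read(4))[0] for _ in range(ndim))
        return np.frombuffer(f.read(), dtype=np.uint8).reshape(shape)

# Build a small array and store it in IDX layout:
# 2 zero bytes, type code 0x08 (unsigned byte), number of dimensions,
# then one big-endian uint32 per dimension, then the raw bytes.
sample = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)
path = os.path.join(tempfile.mkdtemp(), 'sample-images-idx3-ubyte.gz')
with gzip.open(path, 'wb') as f:
    f.write(struct.pack('>HBB', 0, 0x08, sample.ndim))
    for size in sample.shape:
        f.write(struct.pack('>I', size))
    f.write(sample.tobytes())

restored = read_idx(path)
print(restored.shape)
```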

#Displaying a random image to verify that the data loaded correctly:

print(raw_train_X.shape, raw_train_y.shape, raw_test_X.shape, raw_test_y.shape)

i = random.randint(0, raw_train_X.shape[0] - 1)  # randint includes both endpoints
fig, ax = plt.subplots()
ax.imshow(raw_train_X[i].T, cmap='gray')
title = 'label = %d = %s' % (raw_train_y[i], labels[raw_train_y[i]])
ax.set_title(title, fontsize=20)

3. Pre-processing the data:

The pre-processing of EMNIST dataset involves normalisation of training and testing input data, reshaping the input data for input to the CNN and one-hot encoding the output (labels) data.

#Normalize the training and testing input data:

train_X = raw_train_X.astype('float32')
test_X = raw_test_X.astype('float32')
train_X /= 255
test_X /= 255

#Reshape the input data for input to the CNN

train_X = train_X.reshape(train_X.shape[0], 28, 28, 1)
test_X = test_X.reshape(test_X.shape[0], 28, 28, 1)

#One-hot encode the output (labels) data:

train_y = tensorflow.keras.utils.to_categorical(raw_train_y)
test_y = tensorflow.keras.utils.to_categorical(raw_test_y)
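As a small illustration of the one-hot step, the sketch below encodes three example labels. Passing `num_classes` explicitly is an optional safeguard assumed here: without it, `to_categorical` infers the width from the largest label it sees, which could differ between subsets of the data.

```python
import numpy as np
import tensorflow as tf

# Three example labels from the 62-class ByClass split: '0', 'A' and 'z'.
raw_labels = np.array([0, 10, 61], dtype=np.uint8)

# Each label becomes a vector of 62 zeros with a single 1 at the label index.
one_hot = tf.keras.utils.to_categorical(raw_labels, num_classes=62)
print(one_hot.shape)
print(int(np.argmax(one_hot[1])))
```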

4. Model

For this task we build a convolutional neural network (CNN) in Keras using the TensorFlow backend. We use a standard CNN with multiple convolution and max-pooling layers, a few dense layers and a final output layer with softmax activation. ReLU activation is used in the convolution and dense layers, and the model is optimized using the Adam optimizer.

The size of the model needs to be proportional to the size of the data. Three blocks of convolution-maxpool layers and a couple of dense layers were sufficient for this problem.

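#Defining the CNN model:
#Note: the filter counts, the convolution activations and the Flatten layer are missing from this listing; the values marked below are assumptions.

import tensorflow
model = tensorflow.keras.models.Sequential()
model.add(tensorflow.keras.layers.Conv2D(filters=32,  # assumed filter count
                kernel_size=(5, 5),
                strides=(2, 2),
                input_shape=(28, 28, 1),
                activation='relu'))
model.add(tensorflow.keras.layers.Conv2D(filters=64,  # assumed filter count
                kernel_size=(3, 3),
                activation='relu'))
model.add(tensorflow.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(tensorflow.keras.layers.Flatten())  # assumed: flattens the feature maps for the dense layers
model.add(tensorflow.keras.layers.Dense(128, activation='relu'))
model.add(tensorflow.keras.layers.Dense(categories, activation='softmax'))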

Now, moving further, let’s choose the optimizer and the loss function for our model.

model.compile(loss='categorical_crossentropy',optimizer='adam', metrics=['accuracy'])

5. Training the model:

Now we have again reached the most exciting part of the machine learning process, i.e. training. This is where we fit the model we have built and see if it actually works.

#Training the model, saving historical data to graph:

t1 = time.time()
fit = model.fit(train_X, train_y, batch_size=batch_size, epochs=epochs, verbose=1, validation_data=(test_X, test_y))
t2 = time.time()
print('Elapsed time: %ds' % (t2 - t1))

The output can be observed as:


6. Testing the model:

results = model.evaluate(test_X, test_y)

#Showing the loss and accuracy results:

print(results[0]*100, results[1]*100)

#Plotting the model’s loss and accuracy:

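plt.figure(figsize=(12, 6), dpi=96)
plt.subplot(1, 2, 1)
plt.plot(fit.history['loss'])
plt.plot(fit.history['val_loss'])
plt.title('Model Loss')
plt.legend(['train', 'test'], loc='upper left')
plt.subplot(1, 2, 2)
plt.plot(fit.history['accuracy'])  # the key is 'acc' in older Keras versions
plt.plot(fit.history['val_accuracy'])
plt.title('Model Accuracy')
plt.legend(['train', 'test'], loc='upper left')
plt.show()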

7. Saving the model and calculating results:

#Saving the model to disk:

model.save(model_path)

#Loading the model from disk:

model_new = tensorflow.keras.models.load_model(model_path)
results_new = model_new.evaluate(test_X, test_y)
print('Loss: %.2f%%, Accuracy: %.2f%%' % (results_new[0]*100, results_new[1]*100))

So here’s the final result. Our model has a final test accuracy of 87.2 percent with a loss of 0.342 (printed above as 34.2 because the code multiplies it by 100). In other words, with this particular configuration the model classifies 87.2 percent of the test characters correctly.


Handwriting recognition using deep learning is a very powerful technique for several reasons:

  • It automatically identifies deep powerful features
  • Our approach of feeding in random patches makes the model text independent
  • High prediction accuracy makes it possible to use this in practical applications
