Detailed Analysis of a Bi-Directional LSTM on the IMDb Dataset
(Header image source: https://www.mentionlytics.com/blog/how-to-use-sentiment-analysis-for-brand-building)
Introduction
In this article, we study the Bi-directional LSTM model applied to the IMDb movie review sentiment analysis dataset. Sentiment analysis of movie reviews involves labeling each review as either positive or negative, which enables movie businesses to improve their content and the movie-goer experience.
Stacked Bi-directional LSTMs are LSTMs that read input sequences in both forward and backward directions, stacked on top of each other. Our task is to classify the reviews as positive or negative. The code and the libraries used are explained in detail.
Background
Long Short Term Memory (LSTM) networks are a type of recurrent neural network (RNN) that can remember or forget information as required from input sequences. This is achieved by using a number of gates in the network, which helps in modelling long-term dependencies among the terms of a sequence. Such networks have feedback connections, which enable them to model sequential data such as video, speech, and text.
Long short term memory networks were developed in particular to address the problem of vanishing gradients. During backpropagation through time, gradients can shrink exponentially as they flow across many timesteps; when they become too small, the weights are barely updated and the model fails to capture long-term dependencies. LSTMs mitigate this by using gates and an additive cell-state update in their architecture.
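To make the gating concrete, here is a minimal NumPy sketch of a single LSTM step. The packed weight matrix W and the gate ordering here are illustrative assumptions chosen for readability, not the exact Keras parameterization:
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # W has shape (4*units, features + units); b has shape (4*units,)
    z = W @ np.concatenate([x_t, h_prev]) + b
    f, i, o, g = np.split(z, 4)                  # forget, input, output gates and candidate
    f, i, o, g = sigmoid(f), sigmoid(i), sigmoid(o), np.tanh(g)
    c_t = f * c_prev + i * g                     # additive cell-state update (the "memory")
    h_t = o * np.tanh(c_t)                       # hidden state exposed to the next layer
    return h_t, c_t
The forget gate f controls how much of the previous cell state survives, the input gate i controls how much of the candidate g is written in, and the output gate o controls what is exposed as the hidden state; the additive update of c_t is what keeps gradients from vanishing over long spans.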
LSTMs can be stacked on top of each other, can be combined with Convolutional Neural Network models, and can be made to read sequences in both forward and backward directions (Bidirectional LSTMs). These variations enable LSTMs to handle various categories of problems.
LSTMs and stacked LSTMs can be applied to various tasks such as encoder-decoder architectures for language translation, sequence-to-sequence models for text generation, optical character recognition and intelligent character recognition, named entity recognition, and speech-to-text conversion.
Import Libraries
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
Numpy: NumPy is an open-source Python library for handling n-dimensional arrays, with its core written in the C programming language. Loading NumPy enables the Python interpreter to perform array computations quickly and efficiently.
NumPy offers implementations of various mathematical functions, linear algebra routines, and Fourier transforms. It supports a wide range of hardware and computing platforms and plays well with GPU and distributed computing libraries. Its high-level interface makes the various NumPy functionalities easy to use.
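As a quick illustration of the kind of vectorized computation NumPy provides (a small self-contained example, not part of the article's model code):
import numpy as np

a = np.arange(6).reshape(2, 3)   # a 2x3 array: [[0 1 2], [3 4 5]]
print(a.mean(axis=0))            # column-wise mean: [1.5 2.5 3.5]
print(np.fft.fft(np.ones(4)))    # a simple Fourier transform: [4.+0.j 0.+0.j 0.+0.j 0.+0.j]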
Keras: Keras is a high-level API that can use the TensorFlow deep learning library as a backend. Although Keras does not offer the same low-level access as TensorFlow, it reduces human effort by providing easy-to-use, plug-and-play APIs.
It provides ease of use with GPUs and TPUs. As TensorFlow can be used as a backend, the developed models can be exported to various environments such as JavaScript, Android, and iOS. Keras is used at eminent institutions such as CERN, NASA, and NIH.
Layers: The “layers” API in Keras allows us to add various layers to our neural network. A layer takes a tensor as input and produces a tensor as output. Typical layer classes include core layers, regularization layers, normalization layers, convolution and pooling layers, recurrent layers, and embedding layers.
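For instance, a Dense layer can be called directly on a tensor to see this tensor-in, tensor-out behaviour (a small standalone example):
import tensorflow as tf
from tensorflow.keras import layers

x = tf.ones((2, 3))      # a batch of 2 input vectors with 3 features each
y = layers.Dense(4)(x)   # a fully connected layer with 4 units
print(y.shape)           # (2, 4)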
# consider top 20000 words only
max_features = 20000
# consider 200 words per review only
maxlen = 200
We restrict the vocabulary to the 20,000 most frequent words when building our model. Further, each review is limited to a maximum of 200 words.
Build the model
# variable-length input integer sequences
inputs = keras.Input(shape=(None,), dtype="int32")
# Embed each integer into a 128-dimensional vector space
x = layers.Embedding(max_features, 128)(inputs)
# Add 2 Bi-LSTMs
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
x = layers.Bidirectional(layers.LSTM(64))(x)
# Add a classifier
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)
model.summary()
We start by using the Keras Input layer to define the inputs to our model. As the input sequences can vary in length, we set the first shape parameter to None. Each word has already been mapped to an integer (read as a 32-bit value, matching the dtype of our Input layer); this pre-processing is done for us by the Keras IMDb dataset loader.
Each sequence is then embedded into a tensor of shape sequence_length x 128, where each word is represented as a 128-dimensional vector. Next, two Bidirectional LSTM layers are added. The first layer maps the sequence to a feature tensor of shape sequence_length x (64 x 2); the factor of two arises because the layer is bidirectional and concatenates the forward and backward outputs.
The input to an LSTM layer has the form batch_size x sequence_length x feature_dimension, but by default the layer returns only its final output, of the form batch_size x feature_dimension. The second Bi-LSTM layer, however, needs an input of the form batch_size x sequence_length x feature_dimension.
So we set return_sequences=True on the first layer to return its output at every timestep. The input to the second layer is then batch_size x sequence_length x feature_dimension, and its output is batch_size x feature_dimension.
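The effect of return_sequences is easy to verify on a dummy batch (a quick standalone check, using the shapes from this article):
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((32, 200, 128))  # batch_size x sequence_length x feature_dimension
seq = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
last = layers.Bidirectional(layers.LSTM(64))(x)
print(seq.shape)   # (32, 200, 128): one concatenated 2x64-dimensional output per timestep
print(last.shape)  # (32, 128): only the final output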
Summary of the Bi-LSTM model, as printed by model.summary().
Next, a dense layer is added which maps the input of shape (batch_size x feature_dimension) to shape (batch_size x 1). The sigmoid activation function, sigmoid(z) = 1 / (1 + e^(-z)), squashes its input to a probability value in the range (0, 1), as required for binary classification. Finally, a model is defined using the Model API, which connects the inputs to the outputs.
Load the dataset
(x_train, y_train), (x_val, y_val) = keras.datasets.imdb.load_data(num_words=max_features)
print(len(x_train), "Training sequences")
print(len(x_val), "Validation sequences")
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=maxlen)
x_val = keras.preprocessing.sequence.pad_sequences(x_val, maxlen=maxlen)
We load the IMDb training and validation sets provided by the Keras library, keeping only the top 20,000 words. The dataset contains 25,000 training and 25,000 validation reviews. Next, we pad (and, where necessary, truncate) the training and validation sequences to a length of maxlen using the pad_sequences API.
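To see what the loaded data actually looks like, we can decode an integer sequence back into words using the dataset's word index (a small sanity-check sketch; the offset of 3 assumes the Keras IMDb convention that indices 0-2 are reserved for the padding, start, and unknown tokens):
word_index = keras.datasets.imdb.get_word_index()
inv_index = {v + 3: k for k, v in word_index.items()}  # shift past the reserved indices
print(x_train.shape)                                   # (25000, 200) after padding
print(" ".join(inv_index.get(i, "?") for i in x_train[0][:20]))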
Train and evaluate the model
model.compile("adam","binary_crossentropy",metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=32,epochs=2, validation_data=(x_val,y_val))
Keras models must be compiled before training, so we compile the model with the "adam" optimizer and the "binary_crossentropy" loss function, and track accuracy as the metric to monitor during training. We then fit the model with a batch_size of 32 for two epochs. After training for 2 epochs, the model gives a training accuracy of 91.4% and a validation accuracy of 87.1%.
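Once trained, the model can be used directly for inference; thresholding the sigmoid output at 0.5 gives the predicted class (a brief illustrative snippet):
preds = model.predict(x_val[:5])        # probabilities in (0, 1)
labels = (preds > 0.5).astype("int32")  # 1 = positive review, 0 = negative review
print(preds.ravel(), labels.ravel())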
Conclusion
This brings us to the end of the article. In this article, we studied the application of the Bi-directional LSTM model to the IMDb movie-review sentiment analysis problem. We started with a brief explanation of Long Short Term Memory networks and the applications of LSTMs and stacked LSTMs.
We studied the background on various libraries used for building our model. Then we downloaded the pre-processed IMDb movie-review dataset using the Keras library and developed a Bidirectional LSTM model for sentiment analysis.
The model gave a training accuracy of 91.4% and a validation accuracy of 87.1% after training for 2 epochs. This example is taken from the official Keras website, while details on the libraries and code have been added to explain the work clearly.
Please write comments and reviews as applicable. Feedback is always welcome. This article borrows text from my earlier article in Analytics Vidhya on "Logistic regression on UCI dataset" and one of my articles on Medium on "Detailed analysis of finding and drawing contours in an image".
My name is Narayanan (Yetirajan*) Arvind and I work as an AI/ML R&D engineer at IN-D by Emulya Technologies PTE LTD. My interests lie in computer vision, NLP, deep learning and machine learning. My LinkedIn profile can be found at: https://www.linkedin.com/in/arvind-yetirajan-narayanan-iyengar-2b0632167/
* meaning king of the ascetics