Keras, Tell Me The Genre Of My Book

Harnessing the Power of Recurrent Neural Networks for Classification.

Ernest Ng
Towards Data Science


Photo by Billion Photos on Shutterstock

For anyone hoping to gain insight into how Recurrent Neural Networks (RNNs) work, I hope this simple tutorial will be a great read!

The dataset we're using consists of book descriptions and genre labels that I scraped from GoodReads, and it is a great example of applying RNNs to a typical classification problem. For this project, we will reduce the task to binary classification and use an RNN on full-text book descriptions, much like sentiment analysis!

Think about how amazing this is. We're going to teach an artificial neural network to "read" book descriptions and guess their genre.

Since understanding written language requires keeping track of all the words in a sentence, we need a recurrent neural network to keep a “memory” of the words that have come before as it “reads” sentences over time.

In particular, we’ll use LSTM (Long Short-Term Memory) cells because we don’t really want to “forget” words too quickly — words early on in a sentence can affect the meaning of that sentence significantly.

GitHub Repository here

Here are some important libraries we will need for this project: NumPy, Pandas, Matplotlib, Plotly and TensorFlow. We will be using Keras, a high-level API that runs on top of TensorFlow (or CNTK or Theano). Keras allows us to think less about low-level model topology and dive straight into easy and fast prototyping. The faster you can experiment, the better your results :)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow
import plotly.offline as pyoff
import plotly.graph_objs as go
pyoff.init_notebook_mode()

Let’s first import the data I scraped from Good Reads and see what we have to work with.

bookdata_path = 'book_data.csv'
testdata_path = 'book_data18.csv'
book = pd.read_csv(bookdata_path)
test = pd.read_csv(testdata_path)
book.columns

'book_authors': Author's name (str)
'book_desc': Description (str)
'book_edition': Edition of the book (str)
'book_format': Hardcover/Paperback (str)
'book_pages': Number of pages in the book (int)
'book_rating': Book rating (float)
'book_rating_count': Number of ratings (int)
'book_review_count': Number of reviews (int)
'book_title': Title of the book (str)
'genres': Genres of the book (str)
'image_url': Book image URL (str)

Data Cleaning and Data Exploration

This step takes up the bulk of every data scientist's time. We look at every column in the data frame and identify any potential problems we might face.

Some common problems include:

  1. Missing values
  2. Multiple languages involved
  3. Non-ASCII characters
  4. Invalid descriptions
  5. Missing spaces in the description, e.g. HelloILike toEat

I suggest writing down all the findings from your data cleaning step so that you can constantly refer back to your notes and ensure you don’t miss a step!

Without further ado, here are my findings.

  1. Many languages present in the corpus — Do I want to keep all of them or just English descriptions? How about the overall language distribution in my dataset?
  2. Each book has at least 1 user-defined genre — How many genres are there in my dataset? What is the genre distribution? How many unique genres are there?

1. Remove descriptions with invalid format

Since we are predicting the genre, the genres will be our labels and the features will come from the descriptions of each book. I have observed that there were formatting errors in some entries, and this is where langdetect comes in. We will implement a function to remove any rows with an invalid description format.

from langdetect import detect

def remove_invalid_lang(df):
    invalid_desc_idxs = []
    for i in df.index:
        try:
            a = detect(df.at[i, 'book_desc'])
        except:
            invalid_desc_idxs.append(i)

    df = df.drop(index=invalid_desc_idxs)
    return df

book = remove_invalid_lang(book)
test = remove_invalid_lang(test)

2. Get English descriptions only

I noticed that there were many languages involved in my dataset. For simplicity, I only want to get book descriptions that are in English.

book['lang'] = book['book_desc'].map(lambda desc: detect(desc))
test['lang'] = test['book_desc'].map(lambda desc: detect(desc))

detect from langdetect allows us to map each description to an ISO 639-1 code, which makes our lives easier when filtering for English descriptions. Use it! I will then retrieve a list of languages and their respective ISO codes from Wikipedia.

lang_lookup = pd.read_html('https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes')[1]
langpd = lang_lookup[['ISO language name','639-1']]
langpd.columns = ['language','iso']

def desc_lang(x):
    if x in list(langpd['iso']):
        return langpd[langpd['iso'] == x]['language'].values[0]
    else:
        return 'nil'

book['language'] = book['lang'].apply(desc_lang)
test['language'] = test['lang'].apply(desc_lang)
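The plotting code for the language-distribution chart captioned below isn't included in the snippet above; here is a minimal sketch using the same Plotly pattern as the rest of this article, built from the language column we just created.

# Hedged sketch of the language-distribution chart (not the exact code behind the original figure)
lang_counts = book['language'].value_counts().reset_index()
lang_counts.columns = ['language', 'count']

plot_data = [
    go.Bar(
        x=lang_counts['language'],
        y=lang_counts['count']
    )
]
plot_layout = go.Layout(
    title='Distribution of Language in book dataset',
    yaxis={'title': 'Count'},
    xaxis={'title': 'Language'}
)
fig = go.Figure(data=plot_data, layout=plot_layout)
pyoff.iplot(fig)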
Distribution of Language in book dataset

Clearly from the graph, the vast majority of descriptions are in English. Let’s take a closer look at the distribution of other languages.

Retrieve all English books from our test and training dataset with these one-liners!

book = book[book['language']=='English']
test = test[test['language']=='English']

3. Look at all genres available

This is what our genre column looks like. We have numerous user-defined genres separated by '|', so we definitely have to clean that up.

Fantasy|Young Adult|Fiction

In every data science project, it is very important to know the distribution of your data and the best way to do so would be to plot graphs! I really like using Plotly for data visualisation, but matplotlib and seaborn would do the job as well.

This is my function to get all genres for each book and plot them in a graph.

def genre_count(x):
    try:
        return len(x.split('|'))
    except:
        return 0

book['genre_count'] = book['genres'].map(lambda x: genre_count(x))

plot_data = [
    go.Histogram(
        x=book['genre_count']
    )
]
plot_layout = go.Layout(
    title='Genre distribution',
    yaxis={'title': "Frequency"},
    xaxis={'title': "Number of Genres"}
)
fig = go.Figure(data=plot_data, layout=plot_layout)
pyoff.iplot(fig)

Most books have approximately 5-6 genres each, and the distribution looks fairly normal, I must say.

from collections import defaultdict

def genre_listing(x):
    try:
        lst = [genre for genre in x.split("|")]
        return lst
    except:
        return []

book['genre_list'] = book['genres'].map(lambda x: genre_listing(x))

genre_dict = defaultdict(int)
for idx in book.index:
    g = book.at[idx, 'genre_list']
    if type(g) == list:
        for genre in g:
            genre_dict[genre] += 1

genre_pd = pd.DataFrame.from_records(sorted(genre_dict.items(), key=lambda x: x[1], reverse=True), columns=['genre', 'count'])

The code above gives me a dictionary of all genres and their total count in the entire corpus. Let’s get to the plot.

plot_data = [
    go.Bar(
        x=genre_pd['genre'],
        y=genre_pd['count']
    )
]
plot_layout = go.Layout(
    title='Distribution for all Genres',
    yaxis={'title': "Count"},
    xaxis={'title': "Genre"}
)
fig = go.Figure(data=plot_data, layout=plot_layout)
pyoff.iplot(fig)

It is not practical to look at the genres with very low counts, since they hold little to no value to us. We only want to look at the top unique genres that are representative of the dataset, so let us pick the top 50 genres to look at!

If we look at the genre_list column, a book is classified as fiction if 'fiction' is included as at least one of its genres. By observation, if a book has fiction in its genre_list, all other genres in the same list will be closely associated with fiction as well. From this I can compare the number of fiction and nonfiction books in my dataset and turn this into a binary classification problem!

def determine_fiction(x):
    lower_list = [genre.lower() for genre in x]
    if 'fiction' in lower_list:
        return 'fiction'
    elif 'nonfiction' in lower_list:
        return 'nonfiction'
    else:
        return 'others'

book['label'] = book['genre_list'].apply(determine_fiction)
test['label'] = test['genre_list'].apply(determine_fiction)

4. Clean The Text

Here are my functions to remove any non-ASCII characters and remove punctuation.

import re

def _removeNonAscii(s):
    return "".join(i for i in s if ord(i) < 128)

def clean_text(text):
    text = text.lower()
    text = re.sub(r"what's", "what is ", text)
    text = text.replace('(ap)', '')
    text = re.sub(r"\'s", " is ", text)
    text = re.sub(r"\'ve", " have ", text)
    text = re.sub(r"can't", "cannot ", text)
    text = re.sub(r"n't", " not ", text)
    text = re.sub(r"i'm", "i am ", text)
    text = re.sub(r"\'re", " are ", text)
    text = re.sub(r"\'d", " would ", text)
    text = re.sub(r"\'ll", " will ", text)
    text = re.sub(r'\W+', ' ', text)
    text = re.sub(r'\s+', ' ', text)
    text = re.sub(r"\\", "", text)
    text = re.sub(r"\'", "", text)
    text = re.sub(r"\"", "", text)
    text = re.sub('[^a-zA-Z ?!]+', '', text)
    text = _removeNonAscii(text)
    text = text.strip()
    return text

def cleaner(df):
    df = df[df['label'] != 'others'].copy()  # .copy() avoids pandas' SettingWithCopyWarning
    df = df[df['language'] != 'nil']
    df['clean_desc'] = df['book_desc'].apply(clean_text)
    return df

Simply call:

clean_book = cleaner(book)
clean_test = cleaner(test)

and we now have a ‘clean’ description for each book!

"Winning will make you famous. Losing means certain death.” becomes

“winning will make you famous losing means certain death”

Preparing our data for the model

Now comes the fun part. The book description is our predictor so we have to pay special attention! We need to ensure that every description is in the same format and length.

Working with a fixed input length improves performance during model training because it allows for the creation of fixed-shape tensors and hence more stable weights. Therefore, we will do clipping and padding: trimming descriptions to an optimal length and padding them with empty values if the original description is shorter than that length.

How do we determine the optimal length?

Plot the distribution of description length and observe the most ‘common’ description length.
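The plotting code for this step isn't shown above, so here is a minimal sketch. It also creates the desc_len column (word count per cleaned description) that the CDF snippet further down relies on.

# Hedged sketch: word-count column plus its histogram
clean_book['desc_len'] = clean_book['clean_desc'].str.split().apply(len)

plot_data = [go.Histogram(x=clean_book['desc_len'])]
plot_layout = go.Layout(
    title='Description length distribution',
    yaxis={'title': 'Frequency'},
    xaxis={'title': 'Number of words'}
)
fig = go.Figure(data=plot_data, layout=plot_layout)
pyoff.iplot(fig)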

Wow, that's a very skewed distribution! But now we know most books have description lengths of less than 500 words. I want to plot the Cumulative Distribution Function (CDF) to observe the 'population of books' at every level of description length.

import cufflinks as cf
cf.go_offline()  # enables the DataFrame.iplot() call below

len_df_bins = clean_book.desc_len.value_counts(bins=100, normalize=True).reset_index().sort_values(by=['index'])
len_df_bins['cumulative'] = len_df_bins.desc_len.cumsum()
len_df_bins['index'] = len_df_bins['index'].astype('str')
len_df_bins.iplot(kind='bar', x='index', y='cumulative')

About 92.7% of the records have a word count below 277 words, hence I decided to set my max threshold at 250 words. We will need a minimum threshold as well, and I will set that to 6, since a description with so few words is unlikely to be descriptive enough to determine the genre.

1. Clipping and Padding

For records where the description is less than 250 words, we will pad them with empty values, whereas for records where the description is more than 250 words, we will clip them to include just the first 250.

The RNN reads the token sequence from left to right and outputs a single prediction for whether the book is fiction or nonfiction. The memory of these tokens is passed on one by one towards the final token, and thus it is important to pre-pad the sequence instead of post-padding it. This means that the zeros are added BEFORE the token sequence and not after. There are situations where post-padding may be more effective, for example in bi-directional networks.
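Our pipeline pads manually inside the tokenizer function below, but as a quick illustration of the pre/post difference, Keras' built-in pad_sequences makes it easy to see (toy numbers, not from our dataset):

from keras.preprocessing.sequence import pad_sequences

toy = [[4, 7, 2]]  # a single toy token sequence
print(pad_sequences(toy, maxlen=6, padding='pre'))   # [[0 0 0 4 7 2]] - zeros added BEFORE the tokens
print(pad_sequences(toy, maxlen=6, padding='post'))  # [[4 7 2 0 0 0]] - zeros added after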

min_desc_length=6
max_desc_length=250
clean_book=clean_book[(clean_book.clean_desc.str.split().apply(len)>min_desc_length)].reset_index(drop=True)

The above code filters out all descriptions with 6 words or fewer.

vocabulary = set()  # unique set of all words from all descriptions

def add_to_vocab(df, vocabulary):
    for i in df.clean_desc:
        for word in i.split():
            vocabulary.add(word)
    return vocabulary

vocabulary = add_to_vocab(clean_book, vocabulary)

# This dictionary represents the mapping from word to token. Using token+1 to skip 0, since 0 is reserved for padding descriptions with fewer than 250 words
vocab_dict = {word: token+1 for token, word in enumerate(list(vocabulary))}
# This dictionary represents the mapping from token back to word
token_dict = {token+1: word for token, word in enumerate(list(vocabulary))}

assert token_dict[1] == token_dict[vocab_dict[token_dict[1]]]

def tokenizer(desc, vocab_dict, max_desc_length):
    '''
    Function to tokenize descriptions
    Inputs:
    - desc, description
    - vocab_dict, dictionary mapping words to their corresponding tokens
    - max_desc_length, used for pre-padding the descriptions where the no. of words is less than this number
    Returns:
    List of length max_desc_length, pre-padded with zeroes if the desc length was less than max_desc_length
    '''
    a = [vocab_dict[i] if i in vocab_dict else 0 for i in desc.split()]
    b = [0] * max_desc_length
    if len(a) < max_desc_length:
        return np.asarray(b[:max_desc_length-len(a)] + a).squeeze()
    else:
        return np.asarray(a[:max_desc_length]).squeeze()

len(vocabulary)
85616

We have 85,616 unique words. Finally, the last part of this clipping-and-padding step: tokenize each description.

clean_book['desc_tokens'] = clean_book['clean_desc'].apply(tokenizer, args=(vocab_dict, max_desc_length))
clean_test['desc_tokens'] = clean_test['clean_desc'].apply(tokenizer, args=(vocab_dict, max_desc_length))

2. Train-Test Split

When a dataset is imbalanced, i.e., the distribution of target variable (fiction/nonfiction) is not uniform, we should make sure that the training-validation split is stratified. This ensures that the distribution of the target variable is preserved in both the training and validation datasets.

We could also try random under-sampling to reduce the number of fiction samples. However, I will use stratified sampling in this case. Here's why.

Stratified random samples are used with populations that can be easily broken into different subgroups or subsets, in our case fiction or nonfiction. I will randomly choose records from each label in proportion to the group's size relative to the population. Each record must belong to only one stratum (label), and I am certain the strata are mutually exclusive since a book can only be either fiction or nonfiction. Overlapping strata would increase the likelihood that some records are over-represented, thus skewing the sample.

One of the many advantages stratified sampling has over random under-sampling is that, because it uses specific characteristics, it can provide a more accurate representation of the books based on what is used to divide them into subsets. We also don't have to remove any records which might be useful to our model.

def stratified_split(df, target, val_percent=0.2):
    '''
    Function to split a dataframe into train and validation sets, while preserving the ratio of the labels in the target variable
    Inputs:
    - df, the dataframe
    - target, the target variable
    - val_percent, the percentage of validation samples, default 0.2
    Outputs:
    - train_idxs, the indices of the training dataset
    - val_idxs, the indices of the validation dataset
    '''
    classes = list(df[target].unique())
    train_idxs, val_idxs = [], []
    for c in classes:
        idx = list(df[df[target] == c].index)
        np.random.shuffle(idx)
        val_size = int(len(idx)*val_percent)
        val_idxs += idx[:val_size]
        train_idxs += idx[val_size:]
    return train_idxs, val_idxs

_, sample_idxs = stratified_split(clean_book, 'label', 0.1)
train_idxs, val_idxs = stratified_split(clean_book, 'label', val_percent=0.2)
sample_train_idxs, sample_val_idxs = stratified_split(clean_book[clean_book.index.isin(sample_idxs)], 'label', val_percent=0.2)

classes = list(clean_book.label.unique())
sampling = False

x_train = np.stack(clean_book[clean_book.index.isin(sample_train_idxs if sampling else train_idxs)]['desc_tokens'])
y_train = clean_book[clean_book.index.isin(sample_train_idxs if sampling else train_idxs)]['label'].apply(lambda x: classes.index(x))
x_val = np.stack(clean_book[clean_book.index.isin(sample_val_idxs if sampling else val_idxs)]['desc_tokens'])
y_val = clean_book[clean_book.index.isin(sample_val_idxs if sampling else val_idxs)]['label'].apply(lambda x: classes.index(x))
x_test = np.stack(clean_test['desc_tokens'])
y_test = clean_test['label'].apply(lambda x: classes.index(x))

x_train and y_train will be used to train our model while x_val and y_val are used to check the validation accuracy of our model. We are aiming for a moderately high training accuracy and high validation accuracy to ensure we are not overfitting.

Overfitting is when our model performs well at making predictions on the data it was trained on but fails to generalise to unseen data, i.e. the validation data. On the other hand, under-fitting occurs when our model performs terribly even on the training data.

An overfitted model will have high variance and low bias while an under-fitted model will have high bias and low variance.

Error = Bias² + Variance + Irreducible Error

Our main aim is to reduce error, not bias or variance specifically, so the optimal complexity is in the middle.

Model Building

At this step, our training data is just a matrix of numbers, which is the necessary input for our model. As for our y labels, each is now 0 or 1 depending on whether the book is fiction or nonfiction.

RECAP

So to recap, we have a bunch of book descriptions that have been converted into vectors of words represented by integers, and a binary classification to learn from. RNNs can blow up quickly, so to keep things manageable on our little PC we have limited the descriptions to their first 250 words. Not forgetting that this also helps improve the performance of our model training!

Initialising our model

Now let's set up our neural network model! Considering how complicated an LSTM recurrent neural network is under the hood, it's really amazing how easy this is to do with Keras.

We will start with an Embedding layer. This is just a step that converts the input data into dense vectors of fixed size, which are better suited for a neural network. You generally see this used in conjunction with index-based text data like we have here. The embedding layer helps us reduce the dimensionality of the problem. If we one-hot encode the words in the vocabulary, each word will be represented by a vector the size of the vocabulary itself, which in this case is 85,616. Since each sample would then be a tensor of size (vocabulary x no. of tokens), i.e. (85,616 x 250), the layer would be too big for the LSTM to consume and the training process would be very resource-intensive and time-consuming. If I use an embedding, my tensor size will only be 250 x 250! WAYYY smaller!

One-hot encoding results in a huge sparse matrix, while embedding gives us a dense matrix representation. The higher the embedding length, the more complex the representations our model can learn, since the embedding layer learns a "representation" of each word that is of a fixed length.
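To make those shapes concrete, here is a hedged sketch of just the embedding step. It assumes the vocabulary set built earlier, and 250 matches our padded description length.

from keras.models import Sequential
from keras.layers import Embedding

# One-hot would represent each description as a (vocab_size x 250) matrix;
# the embedding instead maps each of the 250 tokens to a dense 250-dim vector
emb_model = Sequential()
emb_model.add(Embedding(input_dim=len(vocabulary) + 1, output_dim=250, input_length=250))
print(emb_model.output_shape)  # (None, 250, 250)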

Next we just have to set up an LSTM layer for the RNN itself. It's that easy. We specify 200 to match the output size of the Embedding layer, and dropout terms to avoid overfitting, which RNNs are particularly prone to. You can choose any number other than 200, by the way; it only specifies the number of hidden neurons in that layer.

Finally we just need to boil it down to a single neuron in the last layer with a sigmoid activation function to choose our binary sentiment classification of 0 or 1.

I will explain why I chose my hyper-parameters for the model, but if you would like to skip to the code, feel free to skip this part!

How many LSTM layers should we use? Speed-Complexity Tradeoff

Usually 1 layer is enough to find trends in simple problems, and 2 are sufficient to find reasonably complex features. We can compare the accuracy of our model after a fixed number of epochs for a range of choices (number of layers); if we find that the validation accuracy does not change significantly even after adding more layers, we can select the minimum number of layers.
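A hedged sketch of that comparison is below. It borrows the layer sizes used later in this article, and the epoch count is kept low purely to make the comparison cheap.

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

# Compare validation accuracy for 1 vs 2 LSTM layers after the same small number of epochs
for n_layers in (1, 2):
    model = Sequential()
    model.add(Embedding(len(vocabulary) + 1, output_dim=250, input_length=250))
    for i in range(n_layers):
        # every LSTM except the last must return the full sequence for the next LSTM to consume
        model.add(LSTM(200, return_sequences=(i < n_layers - 1)))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='Adam', metrics=['accuracy'])
    model.fit(x_train, y_train, validation_data=(x_val, y_val),
              batch_size=20, epochs=2, verbose=0)
    val_loss, val_acc = model.evaluate(x_val, y_val, verbose=0)
    print(f'{n_layers} LSTM layer(s): validation accuracy {val_acc:.4f}')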

How many hidden nodes should we add to our LSTM layers?
Ns : number of samples in the training data
Ni : number of input neurons
No : number of output neurons
alpha : scaling factor (an indicator of how general you want your model to be, or how much you want to prevent overfitting)

General formula: Ns / [alpha * (Ni + No)]
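As a rough sanity check, here is the rule of thumb plugged in with illustrative numbers. The training-set size is taken from the fit logs further down; alpha is an assumption, not a tuned value.

# Illustrative plug-in of the formula above - these numbers are assumptions, not the article's tuning
Ns = 23387   # training samples (from the training log below)
Ni = 250     # input neurons, i.e. the padded token-sequence length
No = 1       # a single fiction/nonfiction output
alpha = 2    # a common starting point; larger alpha means a more heavily regularised model

print(round(Ns / (alpha * (Ni + No))))  # ~47, far below the 200 hidden units used later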

Adding a Dropout layer. Accuracy-Overfit prevention Tradeoff

Dropout prevents overfitting by ignoring randomly selected neurons during training, which reduces sensitivity to the specific weights of individual neurons. This forces our model to spread out its learning.

Adding a Dense layer

Since we have 1 output label indicating fiction or nonfiction, we will have a 1-dimensional output.

Adding Activation layer

There are many activation functions to choose from, so the choice depends on our goal.

In this case, we want the output to be either fiction or nonfiction, so the Sigmoid or Softmax functions will be good candidates.

The sigmoid function basically outputs probabilities; we usually use sigmoid for binary classification.

The softmax function outputs values between 0 and 1 such that the sum of all output values equals 1. Basically, you get a probability for each class, and the probabilities are bound to sum to one. This makes softmax great for multi-class problems.

You can tune the activation layer hyper-parameter with both the Sigmoid and Softmax functions and compare the validation accuracy! We could try ReLU too; it is commonly used because it is fast to compute and works well.
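A quick illustration of the difference, in plain NumPy (these are toy logits, not model outputs):

logits = np.array([2.0, -1.0, 0.5])

sigmoid = 1 / (1 + np.exp(-logits))              # each value squashed independently into (0, 1)
softmax = np.exp(logits) / np.exp(logits).sum()  # values compete with each other and sum to 1

print(sigmoid)        # [0.88 0.27 0.62] - three independent probabilities
print(softmax.sum())  # 1.0 - a single probability distribution over the classes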

Choosing the loss function, optimiser and judgement metrics

Since we are faced with a binary classification problem, binary cross-entropy will work well with Sigmoid because the cross-entropy function cancels out the plateaus at each end of the sigmoid function and therefore speeds up the learning process.
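For reference, here is a hedged NumPy sketch of what binary cross-entropy actually computes for a batch of predictions:

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Average negative log-likelihood of the true labels under the predicted probabilities
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(binary_crossentropy(np.array([1, 0]), np.array([0.9, 0.2])))  # ~0.164: small loss for confident, correct predictions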

For optimisers, adaptive moment estimation (Adam) has been shown to work well in most practical applications and performs well with only small changes to its hyper-parameters.

Judging the model's performance by overall accuracy is the easiest way to interpret the results.

Now we will actually train our model. RNNs, like CNNs, are very resource-heavy. Keeping the batch size relatively small is the key to enabling this to run on your PC at all. In the real world, of course, you'd be taking advantage of GPUs installed across many computers in a cluster to make this scale a lot better.

Code

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dropout, Dense

parameters = {'vocab': vocabulary,
              'eval_batch_size': 30,
              'batch_size': 20,
              'epochs': 5,
              'dropout': 0.2,
              'optimizer': 'Adam',
              'loss': 'binary_crossentropy',
              'activation': 'sigmoid'}

def bookLSTM(x_train, y_train, x_val, y_val, params):
    model = Sequential()
    model.name = "Book Model"
    model.add(Embedding(len(params['vocab'])+1, output_dim=x_train.shape[1], input_length=x_train.shape[1]))
    model.add(LSTM(200, return_sequences=True))
    model.add(Dropout(params['dropout']))
    model.add(LSTM(200))
    model.add(Dense(1, activation=params['activation']))
    model.compile(loss=params['loss'],
                  optimizer=params['optimizer'],
                  metrics=['accuracy'])
    print(model.summary())
    model.fit(x_train,
              y_train,
              validation_data=(x_val, y_val),
              batch_size=params['batch_size'],
              epochs=params['epochs'])
    results = model.evaluate(x_test, y_test, batch_size=params['eval_batch_size'])
    return model

BookModel1 = bookLSTM(x_train, y_train, x_val, y_val, parameters)
------ Model Summary ------
Model: "Book Model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_2 (Embedding) (None, 200, 200) 17123400
_________________________________________________________________
lstm_1 (LSTM) (None, 200, 200) 320800
_________________________________________________________________
dropout_1 (Dropout) (None, 200, 200) 0
_________________________________________________________________
lstm_2 (LSTM) (None, 200) 320800
_________________________________________________________________
dense_1 (Dense) (None, 1) 201
=================================================================
Total params: 17,765,201
Trainable params: 17,765,201
Non-trainable params: 0
_________________________________________________________________
Train on 23387 samples, validate on 5845 samples
Epoch 1/5
23387/23387 [==============================] - 270s 12ms/step - loss: 0.3686 - accuracy: 0.8447 - val_loss: 0.2129 - val_accuracy: 0.9129
Epoch 2/5
23387/23387 [==============================] - 282s 12ms/step - loss: 0.1535 - accuracy: 0.9476 - val_loss: 0.2410 - val_accuracy: 0.9013
Epoch 3/5
23387/23387 [==============================] - 279s 12ms/step - loss: 0.0735 - accuracy: 0.9771 - val_loss: 0.2077 - val_accuracy: 0.9357
Epoch 4/5
23387/23387 [==============================] - 280s 12ms/step - loss: 0.0284 - accuracy: 0.9924 - val_loss: 0.2512 - val_accuracy: 0.9334
Epoch 5/5
23387/23387 [==============================] - 293s 13ms/step - loss: 0.0161 - accuracy: 0.9957 - val_loss: 0.2815 - val_accuracy: 0.9290
657/657 [==============================] - 3s 5ms/step

I noticed that as my epochs go from 3 to 5, training accuracy increases, but validation accuracy decreases. This means that the model is fitting the training set better but losing the ability to predict on new data, indicating that my model is starting to fit on noise and beginning to overfit. Let's change up the parameters!

parameters = {'vocab': vocabulary,
              'eval_batch_size': 30,
              'batch_size': 20,
              'epochs': 2,
              'dropout': 0.2,
              'optimizer': 'Adam',
              'loss': 'binary_crossentropy',
              'activation': 'sigmoid'}

def bookLSTM(x_train, y_train, x_val, y_val, params):
    model = Sequential()
    model.name = "Book Model2"
    model.add(Embedding(len(params['vocab'])+1, output_dim=x_train.shape[1], input_length=x_train.shape[1]))
    model.add(LSTM(200, return_sequences=True))
    model.add(Dropout(params['dropout']))
    model.add(LSTM(200))
    model.add(Dense(1, activation=params['activation']))
    model.compile(loss=params['loss'],
                  optimizer=params['optimizer'],
                  metrics=['accuracy'])
    print(model.summary())
    model.fit(x_train,
              y_train,
              validation_data=(x_val, y_val),
              batch_size=params['batch_size'],
              epochs=params['epochs'])
    results = model.evaluate(x_test, y_test, batch_size=params['eval_batch_size'])
    return model

BookModel2 = bookLSTM(x_train, y_train, x_val, y_val, parameters)
------ Model Summary ------
Model: "Book Model2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_3 (Embedding) (None, 200, 200) 17123400
_________________________________________________________________
lstm_3 (LSTM) (None, 200, 200) 320800
_________________________________________________________________
dropout_2 (Dropout) (None, 200, 200) 0
_________________________________________________________________
lstm_4 (LSTM) (None, 200) 320800
_________________________________________________________________
dense_2 (Dense) (None, 1) 201
=================================================================
Total params: 17,765,201
Trainable params: 17,765,201
Non-trainable params: 0
_________________________________________________________________
Train on 23387 samples, validate on 5845 samples
Epoch 1/2
23387/23387 [==============================] - 337s 14ms/step - loss: 0.3136 - accuracy: 0.8690 - val_loss: 0.1937 - val_accuracy: 0.9305
Epoch 2/2
23387/23387 [==============================] - 332s 14ms/step - loss: 0.1099 - accuracy: 0.9636 - val_loss: 0.1774 - val_accuracy: 0.9341
657/657 [==============================] - 3s 4ms/step

We achieved a training accuracy of 96.4% and a validation accuracy of 93.4% in only 2 epochs! That is pretty good for just 2 passes over the data. I have tried many different hyper-parameters, and running just 10 epochs took me an hour, so….

Try it for yourself and let me know what parameters you chose to obtain a higher validation accuracy than me!

A simple function to test our model

fantasy = """In this mesmerizing sequel to the New York Times bestselling Girls of Paper and Fire, Lei and Wren have escaped their oppressive lives in the Hidden Palace, but soon learn that freedom comes with a terrible cost. Lei, the naive country girl who became a royal courtesan, is now known as the Moonchosen, the commoner who managed to do what no one else could. But slaying the cruel Demon King wasn't the end of the plan — it's just the beginning. Now Lei and her warrior love Wren must travel the kingdom to gain support from the far-flung rebel clans. The journey is made even more treacherous thanks to a heavy bounty on Lei's head, as well as insidious doubts that threaten to tear Lei and Wren apart from within. Meanwhile, an evil plot to eliminate the rebel uprising is taking shape, fuelled by dark magic and vengeance. Will Lei succeed in her quest to overthrow the monarchy and protect her love for Wren, or will she fall victim to the sinister magic that seeks to dest"""

def reviewBook(model, text):
    labels = ['fiction', 'nonfiction']
    a = clean_text(text)
    a = tokenizer(a, vocab_dict, max_desc_length)
    a = np.reshape(a, (1, max_desc_length))
    output = model.predict(a, batch_size=1)
    score = (output > 0.5) * 1
    pred = score.item()
    return labels[pred]

Let’s pass in our final model and sample text to see if our model can accurately predict the genre of a book just based on its description. Which genre do you think the above book belongs to?

reviewBook(BookModel2,fantasy)

‘fiction’

Pretty obvious with ‘Demon King’ and ‘dark magic’ in the description!

Conclusion

Note that the validation accuracy while we were training never really improved after the second or third epoch; we’re likely just overfitting. This is a case where early stopping would have been beneficial.
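If you want to try that, a minimal sketch with Keras' built-in callback could look like this, assuming model is a compiled model like the one built inside bookLSTM; the patience value is an assumption, so tune it to taste.

from keras.callbacks import EarlyStopping

# Stop once validation loss stops improving and keep the best weights seen so far
early_stop = EarlyStopping(monitor='val_loss', patience=2, restore_best_weights=True)

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          batch_size=parameters['batch_size'],
          epochs=20,               # allow more epochs; the callback decides when to stop
          callbacks=[early_stop])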

But again, stop and think about what we just made here! A neural network that can "read" descriptions and deduce whether the book is fiction or not based on that text. And it takes the context of each word and its position in the description into account. Setting up the model itself was just a few lines of code! It's pretty incredible what you can do with Keras.

That’s it for now guys, hope you all had a great time reading my very first Medium article. CHEERRRRRSSSSSSSSSSSSSS
