Creating poems from one’s own poems — Neural Networks and Life Paradoxes

Parth Agrawal
8 min read · Jul 26, 2019
Image courtesy: Errantscience.com

In the age of texting, where communicating in short messages has replaced actual calls, there comes a time when you realize you have written something so genuine and heartfelt that you think: if only we had written with such vigor in our cover letters and high school essays, we might have been the next Stephen King.

In one of his interviews with tips on how to be a writer, Stephen King mentions that no matter what happens, one should keep writing at least 1,000 words a day. They might make sense, they might not, but it keeps the engine running and becomes a habit in the long run.

Now, it is already difficult in a layman’s monotonous life to add a new routine that gets repeated just for the sake of it, and although we are creatures of habit as a species, the new generation is shying away from mundane, stereotypical roles.

I took it upon myself to start with a simpler challenge: write 100 poems in 2018, motivating myself to write at least once every other day.

We all know how New Year resolutions and college projects show similar trends: they start off with motivation, hit a midyear crisis, and end in last-minute sprints.

Roughly 2000–3000 words…in the whole year :/

Well, we all have our excuses, and mine was learning machine learning and applying it at work. Little did I know that I was creating a toy dataset of my own in 2018, one that might encourage me to write an article in the future (predictive much, eh?).

Over these two years, the hype around AI and ML has grown, and so have all the resources on the internet for learning them. I mean, at one point I was personally bingeing more tutorials than the Mind Flayer was flaying young adults in Hawkins.

That tingling feeling everyone has when they start off learning about ML and Deep Learning nowadays

Just a few days back, I was taking a course on RNNs with TensorFlow by Andrew Ng’s deeplearning.ai and came across this sweet assignment on generating poetry (I had done the generating-Shakespearean-text one before, but that is too mainstream, bleh). He took some Irish sonnets and trained LSTMs on them to get new poems on the go. Now that’s all fine, but let’s get into the technicalities of this.

If you want to head over to the final code, here is the git repo.

Let’s break down the code, which is under 100 lines (clichéd ML article line ✓).

Import the Libraries

from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout, Bidirectional
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import regularizers
import tensorflow.keras.utils as ku
import numpy as np

This gets all the libraries up and running before you begin the actual code. Do note that this is written for TF 2.0, so the code might not work if you are running an older TF version (you can import the same methods from standalone Keras directly, but you may run into some problems).
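If you are not sure which version you have installed, a quick check (just a sketch) is:

import tensorflow as tf

# This walkthrough assumes a 2.x version
print(tf.__version__)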

Tokenizing

tokenizer = Tokenizer()
data = open('100dayschallenge.txt', encoding="utf8").read()
corpus = data.lower().split("\n")
tokenizer.fit_on_texts(corpus)
total_words = len(tokenizer.word_index) + 1

So the Tokenizer basically lets us vectorize all the words in our huge corpus (the text file) into sequences of integers, which is what we need to feed into the neural network. To understand more about tokenizing and the dictionary approach, do watch this video.

We load the data from the file, which I compiled from my poems (you can replace it with whatever text corpus you want your neural network to learn from), and lower-case it so that the same word with different capitalization isn’t treated as two different words. After that, we split on “\n” so that each line is treated as one sequence.

The fit_on_texts method goes through the corpus and builds the dictionary, assigning an integer index to every unique word it sees. Finally, we take the number of words in that dictionary and add 1, because the word indices start at 1 and index 0 is reserved for padding.
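To make that concrete, here is a tiny sketch on a made-up two-line corpus (the exact indices depend on word frequency and order of appearance in your text):

from tensorflow.keras.preprocessing.text import Tokenizer

toy_corpus = ["i love to procrastinate", "i love poems"]
toy_tokenizer = Tokenizer()
toy_tokenizer.fit_on_texts(toy_corpus)  # builds the word -> index dictionary
print(toy_tokenizer.word_index)
# {'i': 1, 'love': 2, 'to': 3, 'procrastinate': 4, 'poems': 5}
print(toy_tokenizer.texts_to_sequences(["i love poems"]))
# [[1, 2, 5]]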

Now comes the part where we generate the training data.

Training Data

input_sequences = []
for line in corpus:
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i+1]
        input_sequences.append(n_gram_sequence)

# pad sequences
max_sequence_len = max([len(x) for x in input_sequences])
input_sequences = np.array(pad_sequences(input_sequences, maxlen=max_sequence_len, padding='pre'))

# create predictors and label
predictors, label = input_sequences[:,:-1], input_sequences[:,-1]
label = ku.to_categorical(label, num_classes=total_words)

input_sequences is our input vector which we will use to feed into the network.

The for loop goes sequence by sequence (line by line), generates the n-gram sub-sequences, and appends them to our training data.

For example, if the sentence is “I love to procrastinate”, the generated sequences (in word form, with their token ids) would be

I love [1, 200]
I love to [1, 200, 3]
I love to procrastinate [1, 200, 3, 504]

Now this might create a problem with the sizing of each sequence: different sentences have different lengths, so the generated sequences vary in length too. That brings us to the final preprocessing step, padding. By finding the maximum sequence length (the number of words in the longest sentence), we pad shorter sequences with 0s at the front ('pre' padding) so that all sequences are the same length and we have well-cleaned training data.

Output after padding
[[0,0,1,200],
[0,1,200,3],
[1,200,3,504],
...
...]
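You can sanity-check that output with the same toy numbers by calling pad_sequences directly:

from tensorflow.keras.preprocessing.sequence import pad_sequences

toy_sequences = [[1, 200], [1, 200, 3], [1, 200, 3, 504]]
print(pad_sequences(toy_sequences, maxlen=4, padding='pre'))
# [[  0   0   1 200]
#  [  0   1 200   3]
#  [  1 200   3 504]]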

The training Xs are complete, so now we need the Ys, i.e. the labels. In simple terms, we tell the system: if the sequence is “I love to…”, it should take “procrastinate” as the output, and every time the sequence “I love to” comes up, it should predict “procrastinate” (something like how Gmail now tries to autofill what you would say next).

So the labels become input_sequences[:,-1], i.e. the last word of each sequence, while the predictors are everything before it.

We then convert the labels to categorical form, i.e. one-hot encode them, treating the total number of words in the dictionary as the total number of classes.

(Something like label = 5 would be converted to Y = [0,0,0,0,0,1,0,0,0,0] when the dictionary has 10 words in total, which makes it easier for the NN to map.)
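A quick sketch of that conversion:

import tensorflow.keras.utils as ku

print(ku.to_categorical(5, num_classes=10))
# [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]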

Model Building

model = Sequential()
model.add(Embedding(total_words, 100, input_length=max_sequence_len-1))
model.add(Bidirectional(LSTM(150, return_sequences=True)))
model.add(Dropout(0.2))
model.add(LSTM(100))
# integer division so the number of units is an int
model.add(Dense(total_words//2, activation='relu', kernel_regularizer=regularizers.l2(0.01)))
model.add(Dense(total_words, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())

The Sequential model in Keras is the simplest way to create a stack of layers, which is the usual architecture in NNs.

Embedding is a layer made specifically for handling text data: it maps each word index to a dense vector that gets learned during training.

From the Keras documentation -

model.add(Embedding(1000, 64, input_length=10)) 
# the model will take as input an integer matrix of size (batch, input_length).
# the largest integer (i.e. word index) in the input should be
# no larger than 999 (vocabulary size).

# now model.output_shape == (None, 10, 64), where None is the batch dimension.

After this, we add the bidirectional LSTM layer, which lets the network read each sequence both forwards and backwards.

We then add another LSTM layer, followed by the final dense layer, whose size is the total number of words so it can predict any of them. We add some dropout and regularization in between to avoid overfitting the data (the data isn’t that large, but well, dropout is always good, unless Google comes and takes your code away for patent infringement).
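For reference, here is a rough sketch of the output shapes you should see from model.summary(), assuming illustrative values of max_sequence_len = 12 and total_words = 1000 (your numbers will differ):

# Embedding                  -> (None, 11, 100)
# Bidirectional(LSTM(150))   -> (None, 11, 300)   150 units in each direction
# Dropout(0.2)               -> (None, 11, 300)
# LSTM(100)                  -> (None, 100)
# Dense(total_words//2)      -> (None, 500)
# Dense(total_words)         -> (None, 1000)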

We use categorical cross-entropy as the loss function and Adam as the optimizer; these are all hyperparameters you can play around with to get different results, and that is an article in itself.

Pro tip: If the generated output ends up as the same words repeated over and over, try reducing your embedding size and playing around with the LSTM cell sizes.

Model Fitting and Accuracy

history = model.fit(predictors, label, epochs=200, verbose=1)

Again, you can play around with the number of epochs, but for this exercise it usually takes a fairly large number of epochs to get good accuracy.
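If you would rather not guess the epoch count, one option (not part of the original code) is a Keras callback that stops training once accuracy stops improving; a minimal sketch:

from tensorflow.keras.callbacks import EarlyStopping

# Stop if training accuracy has not improved for 10 epochs and keep the best weights
# (on older Keras versions the metric key is 'acc' instead of 'accuracy')
stop_early = EarlyStopping(monitor='accuracy', patience=10, restore_best_weights=True)
history = model.fit(predictors, label, epochs=200, verbose=1, callbacks=[stop_early])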

%matplotlib inline
import matplotlib.pyplot as plt

acc = history.history['accuracy']  # use 'acc' here if you are on an older Keras version
loss = history.history['loss']
epochs = range(len(acc))

plt.plot(epochs, acc, 'b', label='Training accuracy')
plt.title('Training accuracy')

plt.figure()
plt.plot(epochs, loss, 'b', label='Training Loss')
plt.title('Training loss')
plt.legend()
plt.show()

This part will help you visualize your training accuracy and loss to see if you are on the right track or not.

Ideal Case

Generate New Poems

written by your alter ai-ego

seed_text = "What is this life"
next_words = 50

for _ in range(next_words):
    token_list = tokenizer.texts_to_sequences([seed_text])[0]
    token_list = pad_sequences([token_list], maxlen=max_sequence_len-1, padding='pre')
    predicted = model.predict_classes(token_list, verbose=0)
    output_word = ""
    for word, index in tokenizer.word_index.items():
        if index == predicted:
            output_word = word
            break
    seed_text += " " + output_word

print(seed_text)

You have to give some seed_text on which you want your alter ai-ego to generate the next sentences. You also have to specify how many words you want it to spit out (let’s make it 2,000, Stephen King who?).

We have to apply all the preprocessing we did for the input sequences to this test seed_text so that it fits the model’s requirements. The loop keeps adding a new word to the end of the sequence and runs until it hits the word count you set.

Finally, each predicted word gets added to the seed text, which is fed back into the loop to generate the next part of the sentence. Crazily simple, right?
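One caveat: predict_classes exists only on Sequential models in older TF releases and has since been removed. On newer versions, an equivalent (just a sketch) is to take the argmax of the softmax output inside the loop:

# Drop-in replacement for the predict_classes line above
probabilities = model.predict(token_list, verbose=0)
predicted = np.argmax(probabilities, axis=-1)[0]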

Voila, you have your own procrastinating writer’s aid giving you ideas based on your own ideas, because recycling is necessary, right? (Looking at you, Hollywood sequels.)

One of my outputs:

What is this life converge in a late to depart blue years day a moon in little years putting putting i was blue without my victories day a ragtag bell without day a moon to our cache years the letters snaps blue day a moon day day they were never best friends into the

Problems and Improvements

  • It can’t handle punctuation, but as we all know, punctuation is necessary in poems to tell whether the meaning of a sentence is split across two or more lines.
  • Repeating sequences — you can see “day a moon”, “moon day” repeating 3–4 times. Now, I don’t mean to say poems don’t do repetition (hello figure of speech, my old friend), but if you want richer words that actually make sense, that can only come from a larger corpus with thousands of lines of written text.

Do let me know if you are up for the #100PoemsIn2019 challenge; let’s change it to #100AiPoweredPoemsIn2019.
