My Journey to Reinforcement Learning — Part 1.5: Simple Binary Image Transformation with Q-Learning

Jae Duk Seo
Towards Data Science
5 min read · Apr 10, 2018



So for today, I wanted to set up a very simple problem to solve with Q-learning, in the hope of making my understanding more concrete.

** Warning ** I am still learning about RL, and the way I learn things is by explaining concepts to myself; this post is exactly that. Since I am still learning about this topic, if you came here to learn about RL I recommend skipping my post and reading this one from a UCLA PhD student instead: Deep Reinforcement Learning Demystified (Episode 0)

Experiment Setup

Left Image → Original Image
Right Image → Transformed Image
Red Numbers → Added black blocks, four in total

So the whole experiment is super simple: we want to go from the left image to the right image. We just want to add 4 more black blocks to the given image.
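To make the setup concrete, here is a minimal NumPy sketch of how such a toy image pair could be built. For illustration the original starts out all white, and the four block positions are placeholders I chose, not the exact layout from my notebook.

```python
import numpy as np

# A 10 x 100 binary image: 1 = white pixel, 0 = black pixel
original = np.ones((10, 100))

# The target image is the same image with four extra 10 x 10 black blocks.
# The block positions below are placeholders chosen for illustration.
ground_truth = original.copy()
for block_index in [1, 3, 6, 8]:
    ground_truth[:, block_index * 10:(block_index + 1) * 10] = 0
```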

Q-Learning Setup (State, Action)

Black Numbers / Lines → Dividing the image into 10 small states

First things first, let's define the states. Since our original image has dimensions of 10 * 100, let's divide it into small squares (10 * 10), so we end up with ten of them.
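Reusing the original array from the sketch above, the ten states can be obtained by simple column slicing:

```python
# Split the 10 x 100 image into ten 10 x 10 blocks, one block per state
states = [original[:, s * 10:(s + 1) * 10] for s in range(10)]
print(len(states), states[0].shape)  # -> 10 (10, 10)
```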

Modified Image from here

So our table for Q-learning would look something like the above. It is a one-dimensional array where the index of the array is the state and the value inside the array is the threshold value for that state. What this means is that we are going to have a threshold filter that only lets certain pixel intensities through while the others get filtered out.

Above we can see the hand-crafted ground truth table; the indices holding zero line up with where the image has black blocks. But we want to learn these threshold values rather than hand-crafting them, so let's first initialize our Q-table with all ones, as shown below.
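As a rough sketch (assuming a binary 0/1 image and the placeholder block positions from the earlier sketch), the two tables and one possible version of the threshold filter could look like this; apply_thresholds is a helper name I made up, not the exact code from the notebook.

```python
# Hand-crafted ground-truth table: 0 where a black block should appear, 1 elsewhere
# (the zero positions match the placeholder block positions used above)
ground_truth_table = np.ones(10)
ground_truth_table[[1, 3, 6, 8]] = 0

# The Q-table we actually learn starts out as all ones
q_table = np.ones(10)

# One possible threshold filter: pixels brighter than the state's threshold are zeroed
def apply_thresholds(image, table):
    out = image.copy()
    for s in range(10):
        block = out[:, s * 10:(s + 1) * 10]
        out[:, s * 10:(s + 1) * 10] = np.where(block <= table[s], block, 0)
    return out
```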

Finally, let's define the reward. For this, let's simply take the bit-wise XOR between the thresholded image and the ground truth image.

Q-Learning Setup (Reward — XOR Different Values)

Image from this website

Red Box → XOR operation in Image

Taking the XOR between two images tells us how different they are. If the two images are identical, the XOR difference is 0, since there is no difference at all. If the images are completely different, the XOR gives a positive value, since the difference is large.

create_data → image created after the threshold operation
reward → XOR operation between create_data and the ground truth

Since our objective is to populate the Q-table with correct values, we divide the XOR difference by its maximum possible value, which is 100, and subtract the result from one. This makes our reward 1 if the threshold value is correct and 0 if the threshold is completely wrong.
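In code, the per-state reward could look something like the sketch below. It assumes binary 10 x 10 blocks, so the largest possible XOR difference per state is 100; reward_for_state is a name I made up for illustration.

```python
def reward_for_state(created_block, truth_block):
    # XOR difference between the thresholded block and the ground-truth block
    xor_diff = np.logical_xor(created_block.astype(bool), truth_block.astype(bool))
    # A 10 x 10 block has 100 pixels, so the largest possible difference is 100;
    # dividing by 100 and subtracting from one gives reward 1 for a perfect match
    # and reward 0 for a completely wrong block.
    return 1.0 - xor_diff.sum() / 100.0
```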

Epsilon-Greedy Exploration

We are going to create a variable called epsilon, and depending on its value we either change the threshold (explore) or keep it the way it is (exploit). Also, as the number of episodes increases we are going to shrink epsilon's magnitude, and for numerical stability we are going to add 1e-8.
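Below is a rough sketch of how this explore/exploit loop could look, reusing the arrays and helpers from the earlier sketches. The simple "keep the better threshold" update is only meant to illustrate the idea and is not the exact update rule from my notebook.

```python
num_episodes = 100

def thresholded_block(block, threshold):
    # Same filter as apply_thresholds, but for a single 10 x 10 block
    return np.where(block <= threshold, block, 0)

for episode in range(num_episodes):
    # Exploration rate shrinks as the episode number grows; 1e-8 keeps the
    # division numerically safe
    epsilon = 1.0 / (episode + 1 + 1e-8)

    for s in range(10):
        block = original[:, s * 10:(s + 1) * 10]
        truth = ground_truth[:, s * 10:(s + 1) * 10]

        if np.random.rand() < epsilon:
            candidate = np.random.choice([0.0, 1.0])  # explore: try another threshold
        else:
            candidate = q_table[s]                    # exploit: keep the current one

        # Greedy update: keep the candidate only if it matches the ground truth
        # at least as well as the current threshold does
        current_reward = reward_for_state(thresholded_block(block, q_table[s]), truth)
        candidate_reward = reward_for_state(thresholded_block(block, candidate), truth)
        if candidate_reward >= current_reward:
            q_table[s] = candidate
```

After training, apply_thresholds(original, q_table) should produce something close to the ground truth image.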

Results

Top Left → Original Image
Top Right → Ground Truth Image
Bottom Left → Image generated by learned Q-table values
Bottom Right → Image generated by hand crafted Q-table values

Our Q-table has missed one block, the second black block. But let's look at the raw Q-table values.

Red Box → Incorrect threshold value of 1

Interactive Code

For Google Colab, you need a Google account to view the code. Also, you can't run read-only scripts in Google Colab, so make a copy in your own playground. Finally, I will never ask for permission to access your files on Google Drive, just FYI. Happy coding!

To access the code for this post, please click here.

Final Words

A professor from the University of Waterloo has already given an example of using RL to perform segmentation, "Application of reinforcement learning for segmentation of transrectal ultrasound images". I wish to implement that paper soon as well.

If any errors are found (since I am still learning this, there will be a lot), please email me at jae.duk.seo@gmail.com. If you wish to see the list of all of my writing, please view my website here.

Meanwhile, follow me on my Twitter here, and visit my website or my YouTube channel for more content. I also did a comparison of Decoupled Neural Networks here if you are interested.

Reference

  1. Building Java Programs Chapter 7 — ppt download. (2018). Slideplayer.com. Retrieved 10 April 2018, from http://slideplayer.com/slide/8688800/
  2. drawBitmap, clipPath, UNION, DIFFERENCE, INTERSECT, REPLACE, XOR Android example | Software and Source Code for Developers. (2018). Android.okhelp.cz. Retrieved 10 April 2018, from http://android.okhelp.cz/drawbitmap-clippath-union-difference-intersect-replace-xor-android-example/
  3. Sahba, F., Tizhoosh, H., & Salama, M. (2008). Application of reinforcement learning for segmentation of transrectal ultrasound images. BMC Medical Imaging, 8(1). doi:10.1186/1471-2342-8-8


