M2M Day 90: How I used Artificial Intelligence to automate Tinder
This post is a part of Jeff’s 12-month, accelerated learning project called “Month to Master.” For March, he is downloading the ability to build an AI.
If you’re interested in learning more about me, check out my website. We’re also building yourmove.ai, an AI conversation assistant, specifically for dating!
Introduction
The other day, while I sat on the toilet to take a *poop*, I whipped out my phone and opened up the king of all toilet apps: Tinder. I clicked open the application and started the mindless swiping. *Left* *Right* *Left* *Right* *Left*.
Now that we have dating apps, everyone suddenly has access to exponentially more people to date compared to the pre-app era. The Bay Area tends to have more men than women. The Bay Area also attracts uber-successful, smart men from all around the world. As a big-foreheaded, 5 foot 9 Asian man who doesn’t take many pictures, I face fierce competition within the San Francisco dating sphere.
From talking to female friends using dating apps, women in San Francisco can get a match almost every other swipe. Assuming women get 20 matches in an hour, they do not have the time to go out with every man that messages them. Obviously, they’ll pick the man they like most based on his profile + initial message.
I’m an above-average looking guy. However, in a sea of Asian men, based purely on looks, my face wouldn’t pop off the page. In a stock exchange, we have buyers and sellers. The top investors earn a profit through informational advantages. At the poker table, you become profitable if you have a skill advantage over the other people at your table. If we think of dating as a “competitive marketplace”, how do you give yourself the edge over the competition? A competitive advantage could be: amazing looks, career success, social charm, adventurousness, proximity, a great social circle, etc.
On dating apps, men & women who have a competitive advantage in photos & texting skills will reap the highest ROI from the app. As a result, I’ve broken the reward system from dating apps down to a formula, assuming we normalize message quality on a 0 to 1 scale:
The better your photos and the better looking you are, the less you need to write a quality message. If you have bad photos, it doesn’t matter how good your message is, nobody will respond. If you have great photos, a witty message will significantly boost your ROI. If you don’t do any swiping, you’ll have zero ROI.
While I don’t have the BEST pictures, my main bottleneck is that I just don’t have a high enough swipe volume. I think mindless swiping is a waste of my time and prefer to meet people in person. However, this strategy severely limits the range of people that I could date. To solve this swipe volume problem, I decided to build an AI that automates Tinder called: THE DATE-A MINER.
The DATE-A MINER is an artificial intelligence that learns the dating profiles I like. Once it has finished learning what I like, the DATE-A MINER will automatically swipe left or right on each profile on my Tinder application. This will significantly increase swipe volume, therefore increasing my projected Tinder ROI. Once I attain a match, the AI will automatically send a message to the matchee.
While this doesn’t give me a competitive advantage in photos, this does give me an advantage in swipe volume & initial message. Let’s dive into my methodology:
Data Collection
To build the DATE-A MINER, I needed to feed her A LOT of images. To get them, I accessed the Tinder API using pynder. This API lets me use Tinder through my terminal interface rather than the app:
I wrote a script where I could swipe through each profile, and save each image to a “likes” folder or a “dislikes” folder. I spent hours and hours swiping and collected about 10,000 images.
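The labeling step can be sketched roughly like this. It is a minimal illustration, not the script used for this project: the helper name `save_labeled_image`, the `data` root folder, and the hash-based file names are all assumptions, and fetching the image bytes from Tinder is left out entirely.

```python
import hashlib
from pathlib import Path

LABELS = ("likes", "dislikes")  # folder names taken from the post


def save_labeled_image(image_bytes: bytes, label: str, root: str = "data") -> Path:
    """Write one profile image into root/likes or root/dislikes.

    Hypothetical helper: the file name is a hash of the image content, so
    saving the same picture twice overwrites it instead of duplicating it.
    """
    if label not in LABELS:
        raise ValueError(f"label must be one of {LABELS}, got {label!r}")
    folder = Path(root) / label
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / (hashlib.sha1(image_bytes).hexdigest() + ".jpg")
    path.write_bytes(image_bytes)
    return path
```

Naming files by content hash is just one way to avoid counting the same photo twice when a profile shows up again.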
One problem I noticed was that I swiped left on about 80% of the profiles. As a result, I had about 8,000 images in the dislikes folder and 2,000 in the likes folder. This is a severely imbalanced dataset. Because I have so few images in the likes folder, the DATE-A MINER won’t be well trained to know what I like. It’ll only know what I dislike.
To fix this problem, I found images on Google of people I found attractive. Then I scraped these images and added them to my dataset.
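Adding more "like" images is the fix used here. A different, commonly used option is to weight the rare class more heavily during training. The sketch below only shows how such weights could be computed from the approximate 8,000 / 2,000 split; it is an alternative technique and was not part of this project. The function name is made up for illustration.

```python
def inverse_frequency_weights(counts: dict) -> dict:
    """Weight each class by total / (num_classes * class_count)."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


# Approximate folder sizes quoted in the post.
weights = inverse_frequency_weights({"dislikes": 8000, "likes": 2000})
print(weights)  # {'dislikes': 0.625, 'likes': 2.5}
```

Keras accepts weights like these through the `class_weight` argument of `fit`, keyed by class index rather than by name.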
Data Pre-Processing
Now that I have the images, there are a number of problems. There is a wide range of images on Tinder. Some profiles have images with multiple friends. Some images are zoomed out. Some images are low quality. It would be difficult to extract information from such a high variation of images.
To solve this problem, I used a Haar Cascade Classifier to extract the faces from the images and then saved them. The classifier essentially evaluates multiple positive/negative rectangle features and passes them through a pre-trained AdaBoost model to detect the likely facial dimensions:
The algorithm failed to detect a face in about 70% of the data. This shrank my dataset to 3,000 images.
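A rough sketch of this face-extraction step using OpenCV's bundled frontal-face Haar cascade might look like the following. The choice of cascade file, the `scaleFactor` and `minNeighbors` values, and the rule of keeping only the largest detected face are all assumptions for illustration; the post doesn't state which settings were used.

```python
def largest_box(boxes):
    """Return the (x, y, w, h) box with the largest area, or None if empty."""
    boxes = [tuple(int(v) for v in box) for box in boxes]
    if not boxes:
        return None
    return max(boxes, key=lambda b: b[2] * b[3])


def extract_face(image_path, output_path):
    """Detect faces in one image and save a crop of the largest one.

    Returns True if a face was saved, False if the file was unreadable
    or no face was detected.
    """
    import cv2  # imported here so largest_box works without OpenCV installed

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    image = cv2.imread(image_path)
    if image is None:
        return False
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # the cascade runs on grayscale
    boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    box = largest_box(boxes)
    if box is None:
        return False
    x, y, w, h = box
    cv2.imwrite(output_path, image[y:y + h, x:x + w])
    return True
```

Images for which `extract_face` returns False would simply be dropped from the dataset.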
Modeling
To model this data, I used a Convolutional Neural Network (CNN). Because my classification problem was extremely detailed & subjective, I needed an algorithm that could extract a large enough number of features to detect a difference between the profiles I liked and disliked. CNNs were also designed for image classification problems.
To model this data, I used two approaches:
3-Layer Model: I didn’t expect the three layer model to perform very well. Whenever I build any model, my goal is to get a dumb model working first. This was my dumb model. I used a very basic architecture:
from keras import optimizers
from keras.layers import Conv2D, Dense, Dropout, Flatten, MaxPooling2D
from keras.models import Sequential

# Three convolution + pooling stages, then a small dense classifier
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(img_size, img_size, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))  # two classes: like / dislike

# SGD with Nesterov momentum (lr and decay are Keras 2.x argument names)
sgd = optimizers.SGD(lr=1e-4, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])
The resulting accuracy was about 67%.
Transfer Learning using VGG19: The problem with the 3-Layer model is that I’m training the CNN on a SUPER small dataset: 3,000 images. The best performing CNNs train on millions of images.
As a result, I used a technique called “Transfer Learning.” Transfer learning is basically taking a model someone else built and using it on your own data. This is usually the way to go when you have an extremely small dataset. I froze the first 21 layers on VGG19, and just trained the last two. Then, I flattened and slapped a classifier on top of it. Here’s what the code looks like:
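from keras import applications, optimizers
from keras.layers import Dense, Dropout, Flatten
from keras.models import Sequential

# VGG19 convolutional base pre-trained on ImageNet, without its classifier head
model = applications.VGG19(weights='imagenet', include_top=False, input_shape=(img_size, img_size, 3))
# Small classifier that sits on top of the convolutional base
top_model = Sequential()
top_model.add(Flatten(input_shape=model.output_shape[1:]))
top_model.add(Dense(128, activation='relu'))
top_model.add(Dropout(0.5))
top_model.add(Dense(2, activation='softmax'))
# Chain the VGG19 layers and the classifier into one model
new_model = Sequential()
for layer in model.layers:
    new_model.add(layer)
new_model.add(top_model)
# Freeze the first 21 layers so their pre-trained weights are not updated
for layer in model.layers[:21]:
    layer.trainable = False
sgd = optimizers.SGD(lr=1e-4, decay=1e-6, momentum=0.9, nesterov=True)  # Keras 2.x argument names
new_model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
new_model.fit(X_train, Y_train, batch_size=64, epochs=10, verbose=2)
new_model.save('model_V3.h5')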
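The training call expects `X_train` and `Y_train`, which the post doesn't show being built. With a 2-unit softmax and categorical cross-entropy, the labels need to be one-hot encoded. Here is a minimal sketch of one way to prepare the arrays. The helper name, the 0 = dislike / 1 = like convention, and the scaling of pixels to [0, 1] are assumptions, not details taken from the project.

```python
import numpy as np


def prepare_training_arrays(images, labels):
    """Scale pixels to [0, 1] and one-hot encode binary labels.

    images: array of shape (n, img_size, img_size, 3) with values 0-255
    labels: iterable of 0 (dislike) or 1 (like)
    """
    X = np.asarray(images, dtype="float32") / 255.0
    y = np.asarray(labels, dtype="int64")
    Y = np.eye(2, dtype="float32")[y]  # row i is the one-hot vector for y[i]
    return X, Y
```

The returned `Y` has shape `(n, 2)`, matching the two output units of the classifier.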
The results were:
Accuracy: 73%
Precision: 59%
Recall: 44.61%
Accuracy is just the share of images where the algorithm correctly predicted whether I liked or disliked them.
Precision tells us: “out of all the profiles that my algorithm predicted I would like, how many did I actually like?” A low precision score would mean my algorithm isn’t useful, since most of the matches I get would be profiles I don’t like.
Recall tells us: “out of all the profiles that I actually like, how many did the algorithm predict correctly?” If this score is low, it means the algorithm is being overly picky.
You can see here the algorithm predicting on Scarlett Johansson:
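To make the three definitions concrete, here is a small sketch that computes them from lists of actual and predicted labels (1 = like, 0 = dislike). The labels below are made up for illustration and are not this project's data.

```python
def classification_metrics(actual, predicted):
    """Return (accuracy, precision, recall) for binary labels, 1 = like."""
    pairs = list(zip(actual, predicted))
    if not pairs:
        raise ValueError("need at least one prediction")
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # liked, predicted like
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # disliked, predicted like
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # liked, predicted dislike
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # disliked, predicted dislike
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Made-up labels for illustration only.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 0, 0]
accuracy, precision, recall = classification_metrics(actual, predicted)
```

In this toy example, accuracy is 5/8, precision is 1/2, and recall is 1/3: accuracy can look decent on an imbalanced dataset even when the model misses most of the likes.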
Running the Bot
Now that I had the algorithm built, I needed to connect it to the bot. Building the bot wasn’t too difficult. Here, you can see the bot in action:
I intentionally added a 3 to 15 second delay on each swipe so Tinder wouldn’t find out that it was a bot running on my profile. Unfortunately, I did not have time to add a GUI to this program.
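The randomized delay can be sketched as follows. The `decide` and `swipe` callables are hypothetical stand-ins for the trained model and the Tinder client, and the loop structure is an assumption. Only the 3 to 15 second range comes from the post.

```python
import random
import time


def swipe_with_delay(profiles, decide, swipe, min_delay=3.0, max_delay=15.0,
                     sleep=time.sleep):
    """Swipe on each profile, pausing a random 3-15 seconds after each swipe.

    decide(profile) returns True for a right swipe, False for a left swipe.
    swipe(profile, like) performs the action. Both are placeholders.
    """
    for profile in profiles:
        like = decide(profile)
        swipe(profile, like)
        sleep(random.uniform(min_delay, max_delay))
```

Passing `sleep` as a parameter makes it possible to dry-run the loop without actually waiting.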
Future Work
I gave myself only a month of part-time work to complete this project. In reality, there’s an infinite number of additional things I could do:
Natural Language Processing on Profile text/interest: I could extract the profile description and Facebook interests and incorporate this into a scoring metric to develop more accurate swipes.
Create a “total profile score”: Rather than make a swipe decision off the first valid picture, I could have the algorithm look at every picture and compile the cumulative swipe decisions into one scoring metric to decide if it should swipe right or left.
More Data: I only trained on 3,000 images. If I could train on 150,000 Tinder images, I’m confident I’d have an algorithm with 80–90% accuracy. In addition, I could also improve the facial extraction program, so I’m not losing 70% of my data.
Adapt to Hinge, Coffee Meets Bagel, Bumble: To widen my reach, I could adapt the algorithm to work across multiple apps.
A/B Testing: I could build a framework to A/B test different messages and profile pictures, with analytics supporting these decisions.
Google’s Inception, VGG16: These are different pre-trained CNNs. I wanted to try these but I ran out of time.
Add GUI/Turn into a user-friendly app: This would allow non-technical people to use it.
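As an illustration of the “total profile score” idea from the list above, one simple option would be to average the model's per-photo “like” probabilities and compare the result to a threshold. This is only a sketch of a possible design. The function names, the use of a plain average, and the 0.5 threshold are all assumptions.

```python
def total_profile_score(photo_scores):
    """Average the per-photo 'like' probabilities for one profile.

    Photos where no face was found can be passed as None and are skipped.
    Returns None if no photo could be scored.
    """
    valid = [s for s in photo_scores if s is not None]
    if not valid:
        return None
    return sum(valid) / len(valid)


def should_swipe_right(photo_scores, threshold=0.5):
    """Swipe right when the average score reaches the threshold."""
    score = total_profile_score(photo_scores)
    return score is not None and score >= threshold
```

Profiles with no scorable photo default to a left swipe here. A weighted average that favors the first photo would be another reasonable choice.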
Now, time to swipe!
If you’re interested in seeing the code or re-creating this project for yourself, click here. If you’re interested in learning more about me, check out my website.
Credits:
- Oscar Alsing for explaining the framework.
- Harm de Vries for his paper on his Tinder experiment.
- Philippe Remy for ideas from his Tinder experiment.