This article is a bit of a demo for how we can apply Artificial Intelligence and data science to a company’s marketing.
It has been one full year since I started writing articles on medium.com and Towards Data Science. I have become a top writer on medium for the tag "Artificial Intelligence", reaching a growing readership of almost 2,000 followers with 50 articles.

How does this apply to you? Well, in this article, I’m the guinea pig. You get to see some nice data on how a medium writer grows an audience, and look behind the curtain to understand what works and what doesn’t. You can imagine how the approach to marketing material analysis I take in this article could help you become a better writer on medium.
Why am I writing on medium? Well, it’s more than marketing. It’s messaging. I get to push content into the public domain, and refer clients back to these posts as we discuss solution architecture. It also helps us to accelerate the due diligence that companies regularly conduct on our firm. Because we are so tightly bound by NDAs, I take every opportunity I can find to share with you, my engaged readers, the non-secret parts of what we are doing.
Looking at the titles of my articles, you will notice that I tend to use the keyword "artificial intelligence" instead of deep learning or machine learning. This choice helps the most people understand the topic from afar, and engages the C-suite execs who are my target market for both our B2B consulting (lemay.ai) and our B2B products (genrush.com, auditmap.ai).
Let’s have a look at the dataset of my 50 medium posts (see below) and my traffic stats from medium.com.
The meanings of the columns are as follows:
- id: to keep track of rows in the database
- cluster_daniel: which of three topics do I think this article is about
- title: the title of the article
- url: link to the article
- word_count: number of words in the article. Sometimes this counts non-words like code or links; I’m OK with that
- views_medium: article view count from medium stats
- reads_medium: article read count from medium stats. Reads are engaged users, while views are less engaged
- fans_accts_medium: number of accounts that clapped for the article (thank you)
- claps_medium: number of claps the readers added to the article
- date_medium: date the article was published
- days_since_post: number of days between when I grabbed the stats and the article publication date (date_medium).
- twitter_medium: number of referrals to the article from twitter
- text: the plaintext of the articles – tags and embedded multimedia removed
Here is how I classified my posts, in my head (cluster_daniel column):
- Machine Learning Demos and How-To Stuff
- B2B Solution Architecture
- Startup Scene and Opinion Blag
Let’s see if my clustering based on experience matches up with some mathematical reality. I converted the text of each article into vector form using spaCy (the English medium-sized model), and then applied t-SNE dimensionality reduction. Let’s see how the article data clusters for the 3 labels I have been using:
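For the curious, here is a minimal sketch of that vectorization and projection step. It assumes a pandas DataFrame df with the text, title, and cluster_daniel columns described above, loaded from a stand-in filename ("medium_posts.csv"); the exact pipeline behind my figure may have differed slightly.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import spacy
from sklearn.manifold import TSNE

nlp = spacy.load("en_core_web_md")  # English "medium" model, ships with 300-d word vectors

df = pd.read_csv("medium_posts.csv")  # hypothetical filename for the dataset above

# One document vector per article: spaCy averages the token vectors.
vectors = np.vstack([nlp(text).vector for text in df["text"]])

# Squash the 300-d document vectors down to 2-d for plotting.
# With only ~50 articles, a small perplexity is safer than the default of 30.
coords = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(vectors)

# Assumes cluster_daniel holds the three label strings used in this article.
colors = {
    "Machine Learning Demos and How-To Stuff": "blue",
    "B2B Solution Architecture": "yellow",
    "Startup Scene and Opinion Blag": "red",
}
plt.scatter(coords[:, 0], coords[:, 1], c=df["cluster_daniel"].map(colors))
plt.title("Articles in t-SNE space, colored by hand-picked cluster")
plt.show()
```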

The results are quite encouraging. In the figure above, we can see that the red dots for blog-like articles (heavy on business terms) and the blue dots for technical articles (heavy on technical terms) are separated by the yellow dots which represent B2B articles that mix business and technical terms. It tells me that my labels are backed up by the content data itself.
It can be fun to look at what specific articles are near each other in the embedding space:
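If you want to poke around the neighborhoods yourself, a small helper like the one below (continuing from the vectors and df built in the sketch above) pulls out the closest articles by cosine similarity. The neighbors function is an illustrative helper of my own naming, not something from a library.

```python
from sklearn.metrics.pairwise import cosine_similarity

# Pairwise cosine similarity between every pair of article vectors.
sims = cosine_similarity(vectors)

def neighbors(idx, k=3):
    """Titles of the k articles most similar to article idx (excluding itself)."""
    order = sims[idx].argsort()[::-1]          # most similar first
    return [df["title"].iloc[j] for j in order if j != idx][:k]

print(neighbors(0))  # e.g., the closest neighbors of the first article
```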

A quick disclaimer before we look at the data some more: this was a very simple analysis, with no frequency/trending analysis, no image count, no link count, no link web, or anything else fancy.
Now then, let’s look at the relationship between columns in the data.
2 seconds of pandas dataframe and seaborn work gets us the following starting point:
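For reference, the kind of call that produces such a starting point is roughly the following (a sketch; the exact call behind the figure may have differed, and "medium_posts.csv" is a stand-in filename):

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.read_csv("medium_posts.csv")  # hypothetical filename

# Default pair plot: scatter every numeric column against every other one.
sns.pairplot(df)
plt.show()
```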

The problems with the figure above are: no clusters are shown, the text is too small, and the correlations are hard to see.

This new figure above is a nice step forward, but we have lost a sense of the underlying cluster data. We see correlations overall, but still can’t see which cluster each correlation comes from. What we can see more clearly now is that a lot of things are correlated. For example, word count is correlated with views, claps, reads, and so forth. That makes good sense. The anti-correlations for id are just telling us that the lower-numbered clusters have more fans/reads/likes/etc. than the higher-numbered clusters. However, we can also see that the older posts tend to be in the later cluster. This tracks with my increased focus over time on B2B and technical articles, rather than opinion pieces (the third cluster). The days_since_post row tells us that the more recent articles are doing better than the older ones.
And now, here is the correlation data without cluster coloring. The data along the diagonal is converted into frequency bins, which are easier to analyze than the linear approximation we had before. Each correlation box now has a line of best fit created using regression, plus a cone of uncertainty (bigger cone = less sure).
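In seaborn, that kind of figure is roughly one line on the same DataFrame (a sketch; the original figure’s exact styling may differ):

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Using the df loaded above: histograms on the diagonal, and a regression line
# with a confidence band ("cone of uncertainty") in each off-diagonal cell.
sns.pairplot(df, kind="reg", diag_kind="hist")
plt.show()
```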

We can already make some logical conclusions from the data. Row 4, column 3 (reads vs. views) tells us that reads are tightly correlated with views, which is clearly true. The same holds for claps and fan accounts.
Now let’s go one level deeper and look at the data by cluster. The whole idea is to validate my hypothesis that readers (you) want to see more technical and B2B articles, and fewer opinion articles. Finally, this is the same data with color added for each cluster:


The text is still pretty tiny, but you can zoom in to see the column names.
Now that we can see the data by cluster, it becomes clear that within clusters, the data exhibits a lot of regularity for many metrics. In the first row and first column, the clusters organize by id because I picked ids to be ascending by cluster; that’s not interesting as an insight. Moving deeper into the figure, the last row (twitter) correlates with word count (column 2), but not for the red dots (startup scene). This tells us that twitter "likes" the other clusters better. Looking at the bottom-right bar graph, most articles have no twitter referrals, and a small number of articles get most of the twitter traffic. The second-to-last row shows us that I have been writing fewer articles, and no recent articles on startup stuff (red dots). We also see more reads and views for more recent articles, but fewer articles overall in recent times.
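For completeness, the colored version above can be produced with something like the following (again a sketch on the same df; the hue column name comes from the dataset description, but the original figure’s styling is an assumption):

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Same pair plot, colored by my hand-labeled cluster; hue= handles the
# per-cluster coloring and adds a legend.
sns.pairplot(df, hue="cluster_daniel", diag_kind="hist")
plt.show()
```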
As future work, it would be fun to predict cluster id (topic model) and claps/views from the word embedding model. However, at this point, I just don’t think I have enough data in terms of row count and column count to make reliable predictions.
Thanks for reading! Below is a deeper dive into each of the 50 data points (articles), grouped by cluster:
Machine Learning Demos and How-To Stuff:
Artificial Intelligence for Music Videos
Drawing Anime Girls With Deep Learning
Elbow Clustering for Artificial Intelligence
Image Datasets for Artificial Intelligence
kegra: Deep Learning on Knowledge Graphs with Keras
AWS SageMaker: AI’s Next Game Changer
Deep Learning on the DigitalOcean Stack? Not Quite Yet
Deep Learning with DigitalOcean: Redux
A dozen helpful commands for data nerds
Artificial Intelligence Genesis: Literally.
Scandal! What happens to a leaked email address?
Machine Learning in Medicine: Demo Time!
Model me this; Model me that: Generate text one character at a time.
B2B Solution Architecture:
DREAM.ac: Build Teams Using Artificial Intelligence
Deep Learning Magic: Small Business Type
Artificial Intelligence and Bad Data
Understanding Events with Artificial Intelligence
Artificial Intelligence Without Labeled Data
Accelerating Deep Neural Networks
Artificial Intelligence: Get your users to label your data
Big Data and Machine Learning: Two peas in a pod
Startup Scene and Opinion Blag:
Artificial Intelligence is Probably Safe
Machine Learning: Use Small Words
Artificial Intelligence as Magic
Artificial Intelligence: Consequences
Artificial Intelligence: Hyperparameters
AI Tools Are Popping Like Popcorn
Our Artificial Intelligence Startup
Why Bother to Bootstrap Your AI Startup?
AI Consulting & The Reverse Marshmallow Experiment
How Can I Write Better Articles on AI?
Low Budgets and High Expectations: Machine Learning Startups
It’s alive! Building general AI is the future
Educational Videos: Artificial Intelligence, Deep Learning, Machine Learning, etc.
-Daniel [email protected] ← Say hi. Lemay.ai 1(855)LEMAY-AI