Getting creative with data science algorithms

In April 1972, the New York Times published an article titled "Workers Increasingly Rebel Against Boredom on Assembly Line". Though the car industry was considered very innovative, the work itself was mechanical and repetitive. The reason was that the industry was built on the assembly-line concept, where people perform the same sequence of repetitive tasks every day.
Similarly, although data science is an interesting and innovative field, it can sometimes get mechanical and repetitive. For example, when the objective is to cluster data, we tend to reach for the "usual suspects" such as K-Means or DBSCAN. When it comes to prediction, the process can become a rote sequence of steps: data cleaning, one-hot encoding, feature engineering, model training, and a confusion matrix.
Always applying the same algorithms and methodologies can lead to data science fatigue. Even though data science is used for innovative purposes, this fatigue can make your thinking less innovative.
So how do you avoid getting too mechanical and doing assembly-line-style work? One way is to get creative with algorithms. Here are some examples.
Think of the objective first, and then algorithms
When we think of clustering or segmentation, we generally think of applying clustering algorithms such as K-Means or DBSCAN to tabular data with many features. However, let us look at an approach to clustering that differs from the usual way of doing it.
Let us say that you have a list of shopping invoices as your data, with four fields: Invoice Number, Product Code, Description, and Quantity.

So how do we cluster this data? First, we need to decide on the objective of the clustering. One useful objective could be to find clusters, or segments, of products that are sold together. Our tactic could then proceed in two steps: 1. Find products that are sold together. 2. Find clusters of all products that are sold together.
Let us start by finding all products that are sold together. Market basket analysis helps us do exactly that. There are various algorithms for market basket analysis; one of the most commonly used is the Apriori algorithm. Its result is pairs of products that are frequently sold together. Here are some examples based on the shopping invoice dataset:
VINTAGE BILLBOARD LOVE/HATE MUG – & – EDWARDIAN PARASOL RED
WHITE HANGING HEART T-LIGHT HOLDER – & – WHITE METAL LANTERN
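The pair-finding step above can be sketched in a few lines of plain Python. This is a simplified stand-in for Apriori's pair-counting pass, not the full algorithm (which prunes candidate itemsets level by level); the baskets are hypothetical, not the article's real invoice data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical invoices: each is the set of product codes on one invoice.
invoices = [
    {"MUG", "PARASOL"},
    {"MUG", "PARASOL", "LANTERN"},
    {"HOLDER", "LANTERN"},
    {"HOLDER", "LANTERN", "MUG"},
]

def frequent_pairs(baskets, min_support=0.5):
    """Count product pairs across baskets and keep those whose support
    (fraction of baskets containing the pair) meets the threshold."""
    pair_counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    n = len(baskets)
    return {p: c / n for p, c in pair_counts.items() if c / n >= min_support}

print(frequent_pairs(invoices))
```

On this toy data, MUG and PARASOL appear together in 2 of 4 invoices, so the pair survives a 0.5 support threshold. For real datasets, a library implementation such as mlxtend's `apriori` is the practical choice.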
Now that we have determined which products are sold together, the next step is to find clusters of such products. For this we can draw inspiration from graph theory, also called network theory. We can treat each product as a node, and if two products have been sold together, we create an edge between them. The graph for a few of the top-selling products could be visualized as shown here.

We can clearly see some clusters of products in this visualization. Additionally, graph algorithms such as modularity-based community detection can be used to extract these clusters.
As we have seen, we can do clustering with Apriori and graph algorithms. When you think of clustering, you should not automatically reach for K-Means, DBSCAN, or similar algorithms. Think of the objective of the clustering first, and then decide which algorithms to use.
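To make the graph step concrete, here is a minimal sketch that builds the product graph from co-sold pairs and extracts groups. It uses connected components as a simple stand-in for cluster extraction; on a dense real-world graph you would use a modularity-based method (e.g. networkx's `greedy_modularity_communities`). The edge list is hypothetical.

```python
from collections import defaultdict

# Hypothetical product pairs frequently sold together (the graph's edges).
edges = [
    ("MUG", "PARASOL"),
    ("PARASOL", "POSTCARD"),
    ("T-LIGHT HOLDER", "LANTERN"),
]

def product_clusters(edge_list):
    """Group products into clusters of co-sold items via connected components."""
    adj = defaultdict(set)
    for a, b in edge_list:
        adj[a].add(b)
        adj[b].add(a)
    seen, clusters = set(), []
    for node in adj:
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:  # depth-first walk of one component
            cur = stack.pop()
            if cur in comp:
                continue
            comp.add(cur)
            stack.extend(adj[cur] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters

print(product_clusters(edges))
```

Here the three pairs collapse into two clusters: {MUG, PARASOL, POSTCARD} and {T-LIGHT HOLDER, LANTERN}.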
Try to find different ways to do the same thing
One way to avoid repetitive work and keep your innovative edge sharp is to look at the problem from a different angle. To illustrate this point, let us take an automobile dataset with various technical characteristics of cars.

Let us say that our objective is to find the covariance between all features. The first thing that comes to mind is to compute the covariance between every pair of features. But repeatedly applying the same algorithm makes you lose your innovative edge.
So what could be another way to find covariance? One such way is PCA (Principal Component Analysis). Though the objective of PCA is dimension reduction, it is based on the variation between features: the directions with the largest variation are kept for the reduced representation. As a by-product of the algorithm, you also see which features have positive covariance and which have negative covariance.
Shown here is a plot of each feature's loading (its component in the leading eigenvector) on the first principal component. One can conclude that width, length, and height have positive covariance and are positively correlated, while mpg and rpm have negative covariance and are negatively correlated with them.
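The link between PCA and covariance can be made explicit in a few lines of NumPy: PCA boils down to an eigendecomposition of the covariance matrix, and the signs in the leading eigenvector reveal which features co-vary. The car values below are made up for illustration and only cover three features.

```python
import numpy as np

# Toy car data: columns = [length, width, mpg] (hypothetical values).
X = np.array([
    [180.0, 70.0, 30.0],
    [190.0, 73.0, 25.0],
    [200.0, 76.0, 20.0],
    [170.0, 67.0, 35.0],
])

# Standardize so features are on comparable scales, then take covariance.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
cov = np.cov(Z, rowvar=False)

# PCA = eigendecomposition of the covariance matrix. Features whose
# loadings in the first principal component share a sign co-vary
# positively; opposite signs mean negative covariance.
eigvals, eigvecs = np.linalg.eigh(cov)
pc1 = eigvecs[:, -1]  # eigenvector with the largest eigenvalue

print(pc1)
```

In this toy data, length and width load with the same sign on the first component while mpg loads with the opposite sign, mirroring the conclusion drawn from the plot.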

Here we see that we can solve a problem with a different approach. This also helps us to understand how different algorithms are related. Once you develop this thinking on relations and similarities between algorithms, you can start approaching a problem from various different angles. This will take you to the next level of Data Science innovation
Use Deep Learning not as an end result , but as a source of data
How many times have we seen those green boxes drawn on images after running YOLO? When YOLO was introduced to data scientists a few years back, it was fun and exciting, and many data scientists used it for object detection on a variety of images and videos. By now, however, simply running YOLO for object detection has become mechanical. All those green boxes no longer excite data scientists the way they used to.
Deep learning is innovative and cutting edge, but the manner in which it is used has become mechanical. One way to keep its usage innovative is to think of deep learning as a source of data. Imagine you have video of people moving around a retail shop. We can use deep learning algorithms to detect the people, as shown in the examples below of identifying persons in an airport and in a retail shop.

Now what if we treat this result not as an end, but as a data source? YOLO detects objects, but it also gives each object's position, so its output can be fed into another algorithm, such as path analysis. We can analyze how people move: find the zones in a retail shop or airport that see a lot of movement and those that are quiet, and find the trajectories people usually take. In a retail shop, for example, we can find out which zones are visited before reaching the cash counter, as well as which shoppers never pass through the cash counters.
Illustrated below is a path analysis of zones visited in a retail shop.
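A minimal sketch of the detections-to-path-analysis idea, under heavy assumptions: the track below is a hypothetical sequence of bounding-box centres for one person (real YOLO output gives many boxes per frame, which would first need a tracking step), and the shop floor is simply divided into a square grid of zones.

```python
from collections import Counter

# Hypothetical per-frame (x, y) centres of one tracked person's box.
track = [(1, 1), (2, 1), (5, 1), (6, 4), (6, 5), (2, 5)]

def zone(x, y, cell=4):
    """Map a floor position to a grid zone of side `cell`."""
    return (x // cell, y // cell)

def path_analysis(points):
    """Count zone visits (busy vs. quiet areas) and zone-to-zone
    transitions (frequent routes) along one trajectory."""
    zones = [zone(x, y) for x, y in points]
    moves = Counter()
    for a, b in zip(zones, zones[1:]):
        if a != b:
            moves[(a, b)] += 1
    return Counter(zones), moves

visits, moves = path_analysis(track)
print(visits)
print(moves)
```

Aggregating these visit and transition counts over all tracked people is what reveals busy zones and common trajectories, such as the routes shoppers take toward the cash counter.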

Here we are going beyond detecting objects and drawing green boxes. Treating the output of deep learning algorithms as a data source will help you apply further algorithms and arrive at business-oriented, exciting outcomes.
In this article we saw some interesting ways to think about data science. Use these techniques to avoid getting trapped in a mechanical, assembly-line manner of doing data science. The data science profession is innovative in nature, so think differently, think beyond the usual, and keep your innovative edge sharp.
Additional resources
Website
You can visit my website to do analytics with zero coding: https://experiencedatascience.com
Please subscribe to stay informed whenever I release a new story.
You can also join Medium with my referral link.
YouTube channel
Here is the link to my YouTube channel: https://www.youtube.com/c/DataScienceDemonstrated