5 critical success factors for Big Data mining

Vladimir Fedak
Jan 26, 2018

Successful Big Data mining relies on choosing the correct analytical model and the relevant data sources, obtaining worthwhile results and using them to ensure a positive end-user experience.

Big Data mining is an ongoing activity of specifying the desired business goals, choosing the correct data sources, gathering the relevant information and applying the analytics results to gain substantial benefits, whether tangible (a bottom line increase) or intangible (customer satisfaction, brand awareness, etc.).

Even the most expensive and sophisticated Big Data analytics system is useless if its results cannot be applied to improve the current workflow, increase brand awareness or market impact, secure the bottom line or ensure a lasting positive customer experience with the product or service the business delivers. This is why embedding Big Data mining into the existing business routine is highly beneficial for startups, small-to-medium businesses and enterprises alike.

Below we describe 5 factors we consider critical for the success of Big Data mining projects:

  1. Clear business goals the company aims to achieve using Big Data mining
  2. Relevancy of the data sources to avoid duplicates and unimportant results
  3. Completeness of the data to ensure all the essential information is covered
  4. Applicability of the Big Data analysis results to meet the goals specified
  5. Customer engagement and bottom line growth as the indicators of data mining success

Let’s take a closer look at what these success factors are and how to achieve them.

Clear business goals

Big Data mining can be a success only if it has tangible, specific goals, for example finding out which product or service is the least popular and what can be done to improve the situation. Is it the sales funnel, the wrong design, the wrong USP or a message that does not resonate with the customer? Analyzing customers’ activity on social media and their responses to loyalty program surveys can yield a trove of information on how well your inventory matches their needs and requirements.

Practical implementations and the approaches to goal setting might differ, yet the result will be the same: setting a clear business goal is essential to ensure the analysis success.

Relevancy of the data sources

For data mining to provide credible results, the data should be collected from relevant sources. Gathering data on average car tire prices will not help increase the sales of burritos. However, determining the relevant information sources for a Big Data mining project is not enough. Keeping the dataset close to the minimum size that still serves the purpose is essential too.

For example, when the data is gathered by aggregating the news, there is a high risk of receiving the same article multiple times, as various media repost the materials. Sometimes a link to the source is provided, but let’s assume source A posts an article, source B reposts it and cites A, while source C reposts the material and cites B as its source. To add even more chaos to the mix, let’s assume source D rewrites the material a bit and posts it without citing any of the sources above.

All of this results in 4 pieces of news with essentially the same information, yet only 1 being of value, with 3 being merely duplicates. What can be done to deal with this situation?

  • Applying semantic analysis to search for shared keywords and detect plagiarism
  • Comparing the publication times of the duplicates to find the earliest publication
  • Analyzing the geographic spread of the news, as a target audience in the US is unlikely to be interested in a news article from Congo, even if the Congolese media reposted The New York Times
  • Using RSS feeds as the data sources instead of news portals, to be among the first to learn of an event rather than lag behind
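The first two measures can be combined in a few lines of code. The sketch below is only an illustration, not a production deduplication pipeline: it swaps true semantic analysis for a much simpler word-shingle Jaccard similarity, and the `Article` class, the `deduplicate` function and the 0.5 threshold are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Article:
    """Hypothetical record for one collected news item."""
    source: str
    published: datetime
    text: str


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word n-grams found in the text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Share of shingles two texts have in common, from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(articles: list, threshold: float = 0.5) -> list:
    """Keep only the earliest article from each group of near-duplicates."""
    kept = []  # pairs of (article, its shingle set)
    # Visiting articles in publication order means the first one seen
    # in any group of near-duplicates is the earliest publication.
    for article in sorted(articles, key=lambda a: a.published):
        fingerprint = shingles(article.text)
        if all(jaccard(fingerprint, seen) < threshold for _, seen in kept):
            kept.append((article, fingerprint))
    return [article for article, _ in kept]
```

Under these assumptions, the verbatim reposts by B and C and the light rewrite by D would all collapse into the original article from A, provided the rewrite still shares at least half of its three-word sequences with the original. A heavier rewrite would need a genuinely semantic comparison.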

Completeness of the data

The next step is making sure the data set is complete, meaning all the essential characteristics and metrics of the intended analysis are covered by at least 1 relevant data source. Having more data sources is better than having only a few, of course, yet the dataset should be kept as lean, mean and efficient as possible to minimize the resources spent.
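A completeness check of this kind is easy to automate. The following is a minimal sketch, assuming each data source can be described by the set of metrics it provides; the function names and the example metric and source names are hypothetical.

```python
def coverage(required: set, sources: dict) -> dict:
    """Map every required metric to the sorted list of sources providing it."""
    return {
        metric: sorted(name for name, fields in sources.items() if metric in fields)
        for metric in required
    }


def missing_metrics(required: set, sources: dict) -> set:
    """Return the required metrics that no data source covers."""
    return {metric for metric, names in coverage(required, sources).items() if not names}


# Hypothetical example: three metrics needed, two sources available.
required = {"order_value", "customer_region", "survey_score"}
sources = {
    "crm_export": {"customer_region", "order_value"},
    "web_analytics": {"customer_region", "page_views"},
}
print(missing_metrics(required, sources))  # the metrics still lacking a source
```

The same coverage map also helps keep the dataset lean: a source that provides no required metric, or only metrics already covered elsewhere, is a candidate for removal.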

Once the appropriate data set is gathered, it should be analyzed by a correctly chosen Machine Learning algorithm to provide the expected data mining outcomes. Choosing the right algorithm is quite a complicated task, so working with a trustworthy and experienced contractor is highly recommended to achieve the best results.

Applicability of the Big Data analysis results

Once you lay your hands on the Big Data analysis results, it’s important to act on them to reach the business goals set. If the analysis shows some item is overstocked, it’s time for a promo event or even a free giveaway of this item as a bonus to a more expensive purchase. If the system highlights low sales of fried ribs in one of the restaurants, you can either move the stock to better-performing branches or run a special promotion with a 50% discount on the fried ribs for the local loyalty club members, to further bolster their positive experience.

The possibilities are endless. The only condition is that the business actually takes some action based on the analysis results; otherwise the whole process is done in vain. If no such action can be taken, the goals were probably not set correctly from the start, or an error was made at one of the previous stages. To avoid this risk, a business should either have ample experience with Big Data mining or hire specialists who do.

The indicators of data mining success

You should set some KPIs (Key Performance Indicators) and check whether the decisions made based on the results of the Big Data mining analysis helped you reach the business goals set. Have sales grown after the campaign? Did the logistics expenses drop after contracting a more reliable transport company? Did your marketing campaign bear more fruit than the previous ones? Feedback from your customers and employees also helps evaluate the efficiency of your data mining process.
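Such a check can be as simple as comparing each indicator before and after the change against a target. The sketch below assumes every KPI is a single number with a desired relative change; the `Kpi` class and its field names are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Kpi:
    """Hypothetical KPI record with values measured before and after a change."""
    name: str
    before: float
    after: float
    target_change: float  # desired relative change: 0.10 is +10%, -0.15 is -15%


def relative_change(kpi: Kpi) -> float:
    """Relative change of the KPI compared to its starting value."""
    if kpi.before == 0:
        raise ValueError(f"{kpi.name}: starting value is zero, change is undefined")
    return (kpi.after - kpi.before) / kpi.before


def goal_met(kpi: Kpi) -> bool:
    """True if the KPI moved at least as far as the target, in the target's direction."""
    change = relative_change(kpi)
    if kpi.target_change >= 0:
        return change >= kpi.target_change
    return change <= kpi.target_change
```

For instance, sales growing from 100 to 115 against a +10% target would count as a goal met, while logistics expenses falling from 200 to 180 against a -15% target would not.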

It is also important to keep in mind that force-majeure events sometimes influence the situation, and there is literally nothing one can do to correct it. The 2017 hurricanes in the southern states of the US are a perfect example of losses nobody could avert, even knowing about the events in advance. While the population was evacuated, the damage to property and utilities was substantial, as were the losses of the businesses in the area.

That said, the Machine Learning algorithms used for Big Data mining should be able to raise smart alerts upon encountering unexpected trends or patterns in the data, allowing businesses to get insights faster and make better-grounded decisions that maximize the positive opportunities and minimize the negative effects.
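As a rough illustration of what such an alert might look like, the sketch below flags values that deviate sharply from the recent past. It deliberately uses a plain rolling z-score instead of a Machine Learning model, and the function name, the window size and the threshold are assumptions chosen for this example.

```python
from statistics import mean, stdev


def find_alerts(values: list, window: int = 7, threshold: float = 3.0) -> list:
    """Return indices of values that deviate from the preceding window
    by more than `threshold` standard deviations."""
    if window < 2:
        raise ValueError("window must contain at least 2 values")
    alerts = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history: any different value is unexpected.
            if values[i] != mu:
                alerts.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            alerts.append(i)
    return alerts
```

Fed with a series of daily sales figures, the function would return the positions of the days worth a closer look. A real system would also need to account for seasonality and for the way a single spike inflates the statistics of the following windows.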

The article was originally published here.
