A step-by-step process for choosing and assembling your company’s first data stack
Background
Choosing the initial data stack for your company is not a trivial task. The data-tools market gets more crowded each year, with hundreds of products and services among which one could easily get lost.
During my time as Lead Data Engineer at Wix and Explorium, I came across many professional resources describing the processes of how companies upgraded their old data stack or migrated to a new one. Articles and posts about an initial stack, on the other hand, are much harder to find. This might be related to the fact that many initial stacks weren’t chosen by data engineers or done as a structured process.
After reading this article, you will have a set of ground rules for choosing the right stack for your company.
Step 1: Collect Stakeholders’ requirements & budget
- Interviews (AKA, get to know your users): Start by interviewing your potential consumers within the organization to understand their needs and expectations from the data platform. Consumers might be business stakeholders who will only interact with your BI tool, analysts who will query the data directly, or software engineers who will ingest data into your platform.
- Translate business needs to a technical spec: Once you understand their needs, it is your job to translate them into actual technical specifications. Some of the most common back-end/pipeline specs are data freshness, volume, accuracy, availability and query performance. For the front-end, you would probably consider dashboard load time, self-service abilities and permissions management.
- Set the logistics (budget, timeline, resources): This is the time to agree on a budget for the platform. SaaS data products can become pricy as usage increases, and cost might play a big role in your final decision. It is also recommended to know who your dependencies are (DevOps, IT, Legal, etc.) and make sure their timeline is synced with yours.
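One way to keep the translated spec honest is to write it down as data that can be checked later. The following is only a minimal sketch: the `Requirement` class, the metric names and every target value are hypothetical placeholders, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """One measurable line of the technical spec (hypothetical structure)."""
    name: str
    target: float
    unit: str
    higher_is_better: bool = False  # e.g. availability; most targets are upper limits

    def is_met(self, observed: float) -> bool:
        if self.higher_is_better:
            return observed >= self.target
        return observed <= self.target


# Illustrative targets only; real numbers come out of the stakeholder interviews.
SPEC = [
    Requirement("data_freshness", 60, "minutes"),
    Requirement("query_latency_p95", 10, "seconds"),
    Requirement("dashboard_load_time", 5, "seconds"),
    Requirement("availability", 99.5, "percent", higher_is_better=True),
    Requirement("monthly_cost", 3000, "USD"),
]


def unmet(spec, observed):
    """Return the names of requirements that fail or were never measured."""
    return [
        r.name
        for r in spec
        if r.name not in observed or not r.is_met(observed[r.name])
    ]


# Made-up measurements, e.g. from a trial run of a candidate tool.
observed = {
    "data_freshness": 45,
    "query_latency_p95": 14,
    "dashboard_load_time": 3,
    "availability": 99.9,
    "monthly_cost": 2500,
}
failed = unmet(SPEC, observed)
print(failed)
```

Writing the spec this way also gives you a checklist to reuse when you evaluate candidates later in the process.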
Step 2: Draft your initial stack layout
- Select components: One of the hardest tasks is selecting the components your stack will be based on (a query engine, ETL tool, dashboarding tool, etc.). Make an initial layout based on: (a) your stakeholders’ requirements; (b) the team that will maintain each component and their knowledge (for example, do they know how to code? Are they able to deploy and maintain a self-hosted service?); (c) your own knowledge and vision.
- Create a specific requirements list: Generate a list of required features for each component and order it by priority, distinguishing the essential ones from the nice-to-haves. In most cases, this step won’t narrow down your search dramatically. Unless your stakeholders are already experienced in working with data, you might end up with a list of very generic features, offered by most of the vendors of each component.
- Decide on a prioritization mechanism: Make sure you prioritize each need and weight each feature accordingly. Most tools in each category share the same basic features, with some offering unique capabilities or integrations. Your prioritization will later help you decide whether it’s worth paying extra for a specific tool.
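A simple prioritization mechanism is a weighted score with a hard filter on the must-haves. The sketch below assumes a dashboarding tool; the feature names, weights and tools are all invented for illustration, and a sum of weights is just one reasonable scoring rule among several.

```python
from typing import Dict, Optional, Set

# Hypothetical weights: higher means more important to your stakeholders.
WEIGHTS: Dict[str, int] = {
    "sql_editor": 5,
    "row_level_permissions": 4,
    "scheduled_reports": 2,
    "embedded_dashboards": 1,
}

# Essential features: a tool missing any of these is rejected outright.
MUST_HAVES: Set[str] = {"sql_editor", "row_level_permissions"}


def score(features: Set[str]) -> Optional[int]:
    """Sum the weights of supported features, or None if a must-have is missing."""
    if not MUST_HAVES <= features:
        return None
    return sum(weight for name, weight in WEIGHTS.items() if name in features)


# Made-up candidates and the features each one supports.
candidates = {
    "tool_a": {"sql_editor", "row_level_permissions", "scheduled_reports"},
    "tool_b": {"sql_editor", "scheduled_reports", "embedded_dashboards"},
    "tool_c": set(WEIGHTS),
}

scores = {name: score(features) for name, features in candidates.items()}
for name, value in scores.items():
    print(name, "rejected" if value is None else value)
```

Here `tool_b` is rejected despite having two nice-to-haves, and the one-point gap between `tool_a` and `tool_c` is the kind of difference you would weigh against the price gap.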
Step 3: Identify the market leaders
One of the most effective ways to focus your search is learning from the experience of similar organizations. Find a few companies that are at a similar stage to yours (or even 1–2 steps ahead) and understand which products they chose and why. Some companies share this information publicly through blogs, meetups or keynotes, while others you’ll need to approach directly.
This process will help you understand who the leading vendors in your market are. I recommend considering market leaders for 2 main reasons:
- Market leaders usually have larger communities and more online information. Whatever issue you run into, someone else has probably already dealt with it.
- It’s easier to recruit when the technologies you use are rising (as recruits want to gain experience with them) or mainstream (recruits come with prior knowledge and experience).
Market leaders are usually the more expensive option. By looking at similar companies, you’re more likely to find ones that still fit your budget.
Step 4: Create your shortlist and make final decisions
Now it’s time to wrap things up. For each component, select 2–3 final candidates that fit your must-have requirements and were identified as market leaders. Create a comparison table between them and include a rough estimate for the cost of each one.
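As a rough illustration of such a comparison table, the sketch below estimates monthly cost under a simplified pricing model (a flat fee plus a per-TB-scanned charge). The engines, prices, usage figures and budget are all made up, and real vendor pricing is usually more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A shortlisted product with a simplified, hypothetical pricing model."""
    name: str
    monthly_base_usd: float
    usd_per_tb_scanned: float


def estimated_monthly_cost(candidate, tb_scanned):
    """Flat fee plus usage-based charge; an assumption, not a real price list."""
    return candidate.monthly_base_usd + candidate.usd_per_tb_scanned * tb_scanned


def comparison_rows(candidates, tb_scanned, budget_usd):
    """Return (name, estimated cost, within budget) tuples, cheapest first."""
    rows = []
    for c in candidates:
        cost = estimated_monthly_cost(c, tb_scanned)
        rows.append((c.name, cost, cost <= budget_usd))
    return sorted(rows, key=lambda row: row[1])


shortlist = [
    Candidate("Engine A", monthly_base_usd=0, usd_per_tb_scanned=5),
    Candidate("Engine B", monthly_base_usd=500, usd_per_tb_scanned=2),
    Candidate("Engine C", monthly_base_usd=1500, usd_per_tb_scanned=0.5),
]

# Compare today's expected usage with a hypothetical 3x growth scenario.
for tb in (200, 600):
    print(f"--- {tb} TB scanned per month ---")
    for name, cost, ok in comparison_rows(shortlist, tb, budget_usd=2000):
        print(f"{name:<10} {cost:>8.0f} USD  {'within budget' if ok else 'over budget'}")
```

Running the estimate at more than one usage level matters because the cheapest option today is not necessarily the cheapest once usage grows.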
If the products all look similar to you, consider asking for a demo account or a POC.
Help them help you by asking the right questions. In some cases, the feature you need is available in a Beta program or is planned for the near future. Don’t be lured by features that fall outside your requirements.
Before making your final decision consider the following aspects:
- Components integration: Make sure the different vendors you choose are compatible with each other. Some vendors offer multiple components or even the whole stack. This might help with integration and reduce cost, but it can also lead to vendor lock-in.
- Avoid vendor lock-in: Make sure your data is saved in an open format and in your own cloud storage. Don’t let the vendor own your data, and always leave an option to migrate (partially or fully) to another vendor. For dashboards, on the other hand, there are still no widely supported open formats. This makes dashboard migration manual and expensive.
- Consider your company’s existing vendors: If your company has existing contracts with data vendors (cloud providers are the classic example, as all of them offer a full data stack), consider checking them out. This might speed up your process (as the contract is already set up) and get you a better price.
Final advice
Remember that this stack is only your baseline. It won’t be perfect and, as your company evolves, it might no longer be a complete solution. Don’t over-engineer it and don’t get into endless decision cycles. Start moving, and collect the lessons and feedback you’ll use for the inevitable redesign. The next time will be easier.