
The Top 3 Ways to Get Started With DataOps Pipelines


The DataOps methodology offers a new way to improve both the quality and speed of data analytics.

DataOps graphic, with permission from Ascend’s dataops.dev

The proliferation of data and data systems – spurred by an increasing number of use cases for advanced data analytics – has catapulted DataOps into the mainstream for modern organizations. The DataOps methodology has been growing in popularity among data teams, offering a new way to improve both the quality and speed of data analytics.

Traditionally, data pipelines relied on very little automation and required intensive coding. As organizations modernized and began focusing on self-service analytics and machine learning, companies latched onto DataOps, which brings a software engineering perspective and approach to managing data pipelines – similar to the DevOps trend.

The DataOps methodology matches the mantra of agile software development: change is inevitable. One must architect processes and technology to embrace change. And change isn’t limited to schema changes either – it includes shifting business requirements, delivering data and reports to new stakeholders, integrating new data sources, and more. By focusing on automated tooling that supports quick change management and iterative processes, DataOps delivers on organizational goals like increasing the data team’s output to the business while decreasing overhead.

Benefits of a DataOps Strategy

With a focus on "making change cheap," DataOps practitioners transform data into business value. This value may take the form of easier-to-use data sets for analysts and data scientists, faster turnaround times on change requests, and, critically, fewer errors in the data sets. If they haven’t done so already, companies with data and analytics goals (big or small) need to use DataOps to their advantage. In fact, Gartner recently listed DataOps as one of its top 10 data and analytics trends for 2021, as part of the larger "XOps" movement. Data and analytics teams should consider incorporating DataOps into their programs to take advantage of its flexible design, automation, agile orchestration, and scalability.

An investment in DataOps and automated tooling enables the following improvements: automated tests ensure data quality even as code logic changes; version control provides quick auditability of changes and easy rollback; and CI/CD separates development and staging environments from production. Generally speaking, DataOps enables data teams to keep pace with an ever-accelerating data development lifecycle.
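To make the testing piece concrete, here is a minimal sketch of what automated data quality checks might look like, written as pytest-style tests against a pandas DataFrame. The `orders.csv` file and its column names are hypothetical placeholders for whatever data set your pipeline actually produces.

```python
# Minimal sketch of automated data quality tests (pytest-style).
# "orders.csv", "order_id", and "revenue" are hypothetical placeholders
# for a real pipeline output and its columns.
import pandas as pd


def load_orders() -> pd.DataFrame:
    """Load the pipeline output under test."""
    return pd.read_csv("orders.csv")


def test_order_ids_are_present_and_unique():
    orders = load_orders()
    assert orders["order_id"].notna().all(), "Found orders with a missing order_id"
    assert orders["order_id"].is_unique, "Found duplicate order_id values"


def test_revenue_is_never_negative():
    orders = load_orders()
    assert (orders["revenue"] >= 0).all(), "Found orders with negative revenue"
```

Run on every change as part of a CI job (for example, `pytest tests/`), checks like these catch regressions before they reach the reports the business relies on.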

Potential Challenges to Overcome

DataOps and data pipelines affect all three parts of the "Golden Triangle" – people, process, and technology. When implementing any new methodology and the technologies that come with it, there can be inconsistencies in how it is deployed and managed and in the processes put in place.

Companies using DataOps need to ensure the growth and development of all three aspects to keep data pipelines healthy. For example, a company investing heavily in technology to support change management will not unlock value if the people involved prefer a "waterfall" approach over a more agile, iterative one. And people excited by the prospect of better change management (i.e., version control, data quality tests, and CI/CD pipeline deployment) and more frequent deployments can quickly burn out if a lack of test coverage means production analytics break frequently. To address these challenges, the three aspects of the Golden Triangle need to work together and be managed and developed holistically for a successful DataOps strategy.


Top 3 Ways to Get Started with DataOps Pipelines

As with any emerging practice and new technology, it can be a daunting task to get started. With DataOps, pipelines are a big component. To help, here are several tactical approaches to implement DataOps pipelines within your organization.

  1. Prioritize Self-Service to Increase Velocity: Add self-service so that all stakeholders within the organization, regardless of role and responsibility, are empowered to drive their goals. With quickly shifting requirements, processes that hinge on specific teams will inevitably become bottlenecks and force priority tradeoffs. Instead, if the project’s driving team is empowered to create the necessary artifacts itself, it can hit its goals without the need for cross-team communication and its inherent complexity. For example, the data engineering team can create more flexible data sets and bring in tools that allow the data analyst team to dig deeper without filing new feature requests. Creating more self-service allows backlogged teams to catch up and invest that additional bandwidth in other DataOps processes and technologies.
  2. De-Risk the Change Process With Automated Testing & Deployment: Invest in automated testing and deployment (e.g., CI/CD) to build confidence in data quality and accelerate change management (a minimal sketch of such a deployment gate follows this list). This alone is not sufficient for DataOps, but it is a critical piece that starts paying dividends quickly, such as freeing up the team’s time to focus on new development rather than on unintentional breakages or brittle deploy processes. Delivering trustworthy data, with required changes shipped more frequently, encourages business-focused teams to rely more on that data.
  3. Start Small, Expand Over Time: Identify pain points and address them a few at a time. If a report is often inaccurate, add a couple of tests to provide coverage, then expand the test suite over time. If the deployment process is tedious and delays roll-outs and updates, identify the appropriate places to introduce a little automation. In other words, instead of committing to a large upfront investment in all of the moving pieces required for DataOps (which requires a long lead time and stakeholder buy-in), start smaller.
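As a rough illustration of the second approach, the sketch below wires a test suite into a simple deployment gate: production is only updated when the data quality tests pass. It assumes a pytest suite under `tests/` (such as the checks shown earlier), and `promote_to_production` is a hypothetical placeholder for whatever promotion mechanism your stack uses (swapping views or publishing tables, for example), not a specific tool’s API.

```python
# Minimal sketch of a CI-style deployment gate, assuming a pytest test suite
# lives under tests/. The promotion step is a hypothetical placeholder.
import subprocess
import sys


def run_data_quality_tests() -> bool:
    """Run the test suite; pytest exits non-zero if any test fails."""
    result = subprocess.run([sys.executable, "-m", "pytest", "tests/", "-q"])
    return result.returncode == 0


def promote_to_production() -> None:
    """Placeholder: swap staging tables/views into production here."""
    print("Promoting staging data sets to production...")


if __name__ == "__main__":
    if run_data_quality_tests():
        promote_to_production()
    else:
        # Leave production untouched and fail the CI job loudly.
        sys.exit("Data quality tests failed; aborting deployment.")
```

Even a small gate like this embodies the "make change cheap" idea: changes flow to production frequently, but only after the checks that protect downstream consumers have passed.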

Determining which approach makes the most sense for a particular organization – and in what order – is the subject of wide discussion in the DataOps community. It’s similar to the nuances the software community wrestled with when adopting agile development: is test-driven development a requirement, or is the important part simply having tests, even if they are written after the fact? It’s important not to get lost in the nuances of competing approaches and instead stick to the core tenets of DataOps practices. Only then can your team unlock the true potential of DataOps and create meaningful value for the business.

