Should you build on Singer taps?

The growing debate over the viability of building on open source Singer taps

hotglue
Towards Data Science


Singer is an open-source standard, built by the folks over at Stitch, for writing scripts that move data. Stitch introduced the project to make creating data integration “connectors” more standardized and easier, which is an attractive pull for developers.


In fact, both closed source projects like hotglue and open source projects like Meltano are building off Singer taps to offer platforms that make the process of creating data integration pipelines easier for developers.

However, there is a growing debate over the feasibility of building data pipelines on top of Singer taps. Why?

Airbyte, a Y Combinator company, is one of the most outspoken critics of Singer in the open source community, and is focused on building an OSS solution that makes it easy to move your data into a data warehouse. They have published a series of articles about the problems with Singer and what they aim to do better.

Let’s look at the pros and cons:

Pros

1. The core Singer spec is good

The Singer spec is well documented, and there are several well-maintained, well-documented taps. The core design of using JSON to move data between taps and targets avoids incompatible output formats, and being able to swap one tap for another is a huge plus.
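To make the JSON design concrete, here is a minimal sketch of a toy tap and a toy target exchanging Singer-style messages (SCHEMA, RECORD, and STATE, one JSON object per line). The "users" stream, the sample rows, and the function names are assumptions made up for illustration. A real tap writes to stdout and a real target reads from stdin, while this sketch uses an in-memory buffer so it can run on its own.

```python
import io
import json


def run_tap(rows, out):
    """Toy tap: write Singer-style messages, one JSON object per line."""
    schema_message = {
        "type": "SCHEMA",
        "stream": "users",  # hypothetical stream name
        "key_properties": ["id"],
        "schema": {
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                "name": {"type": "string"},
            },
        },
    }
    out.write(json.dumps(schema_message) + "\n")

    last_id = None
    for row in rows:
        record_message = {"type": "RECORD", "stream": "users", "record": row}
        out.write(json.dumps(record_message) + "\n")
        last_id = row["id"]

    # STATE carries a bookmark so that a later run could resume from it.
    out.write(json.dumps({"type": "STATE", "value": {"users": last_id}}) + "\n")


def run_target(lines):
    """Toy target: collect records per stream and keep the last state seen."""
    records, state = {}, None
    for line in lines:
        if not line.strip():
            continue
        message = json.loads(line)
        if message["type"] == "SCHEMA":
            records.setdefault(message["stream"], [])
        elif message["type"] == "RECORD":
            records.setdefault(message["stream"], []).append(message["record"])
        elif message["type"] == "STATE":
            state = message["value"]
    return records, state


def demo():
    """Pipe the toy tap into the toy target through an in-memory buffer."""
    buffer = io.StringIO()
    run_tap([{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}], buffer)
    return run_target(buffer.getvalue().splitlines())
```

Because the target only depends on the message format, any tap that emits the same kinds of messages could be swapped in without touching the target code.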

2. There are a ton of Singer taps already!

There are currently 150–200 connectors built on the Singer spec, and building new ones is relatively easy. Most of these connectors come from developers who wanted a specific data source, built the tap, and submitted it to Singer for review. This means you won’t have to worry about building every tap you want to use yourself.

Cons

1. Lack of standardization across taps

Every Singer tap is a unique project — they are organized as open source repos under the Singer organization. This was designed to encourage developers to create their own taps and contribute to the overall community. It definitely worked — there are a lot of open source Singer taps, and many of them have been accepted by Singer as “official” taps.

However, this also creates one of the biggest problems in Singer. Because each project is unique, the features and use cases supported often vary from tap to tap. Many contributors build only what they need from the tap, and then leave it to the open source community.

This makes it hard to be confident in the quality of Singer taps, and many organizations end up maintaining their own forks of taps or custom shims.

2. Lack of maintenance

After Stitch was acquired by Talend, the Singer project was largely left alone. This can create real issues as APIs have breaking changes and organizations remain on deprecated Singer taps.

Beyond that, the dependencies these taps have often conflict with each other, and there is no real consistency in the versioning. Running multiple taps together often requires packaging each standalone tap in its own container or doing virtual environment gymnastics to avoid conflicts.
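As a rough sketch of what those virtual environment gymnastics can look like, the helper below builds (but does not run) the commands for giving each tap or target its own virtual environment, then piping one into the other. The names tap-foo and target-bar, the /opt/venvs directory, and the config file names are hypothetical, and the sketch assumes each package installs an executable with the same name as the package and accepts a --config flag.

```python
import shlex
from pathlib import PurePosixPath


def venv_setup_commands(package, venv_root):
    """Return the commands that would create one venv and install one package in it."""
    venv_dir = PurePosixPath(venv_root) / package
    return [
        ["python3", "-m", "venv", str(venv_dir)],
        [str(venv_dir / "bin" / "pip"), "install", package],
    ]


def pipeline_command(tap, target, venv_root, tap_config, target_config):
    """Build the shell pipeline that streams the tap's stdout into the target's stdin."""
    # Assumption: each executable lives at <venv_root>/<package>/bin/<package>.
    tap_bin = PurePosixPath(venv_root) / tap / "bin" / tap
    target_bin = PurePosixPath(venv_root) / target / "bin" / target
    left = shlex.join([str(tap_bin), "--config", tap_config])
    right = shlex.join([str(target_bin), "--config", target_config])
    return left + " | " + right


if __name__ == "__main__":
    # Hypothetical package names, used only to show the shape of the commands.
    for package in ("tap-foo", "target-bar"):
        for command in venv_setup_commands(package, "/opt/venvs"):
            print(shlex.join(command))
    print(pipeline_command("tap-foo", "target-bar", "/opt/venvs", "tap.json", "target.json"))
```

Keeping one environment per package means two taps can pin conflicting versions of the same dependency without breaking each other, at the cost of more environments to build and keep up to date.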

Conclusion

Singer definitely creates enormous value for developers, and it did a lot to start standardizing how taps are built. Data is more spread out than ever, and the need to solve data integration across organizations keeps growing.

Although Singer offers a ton of taps, I personally think Airbyte, with its mission to address the flaws in Singer and create a better experience for developers, is a great alternative. I would argue that Airbyte is definitely worth trying, and you should be wary of being fully dependent on Singer.

Thanks for reading! Let me know what you think in the comments, or if you have any questions.
