A Guide On Choosing the Right Data Pipeline Tool

William D'Souza
6 min read · Sep 7, 2023


You are maintaining your own APIs? Really!?!?

Photo by Sigmund on Unsplash

For a long time, we wrote our own API integrations to ETL data from the multiple vendors our organization used. We wanted this data because bringing it in house gave us more control over it. If it lived on a vendor’s platform, we were at their mercy for the dashboards they provided and the data they allowed us to see.

Maintaining these APIs was tough. You had to keep up to date with each vendor’s documentation, modify your code for new versions, think about batch processing, and then store the data following proper architecture principles, all just to keep it functional.

The explosion of new companies focused entirely on data pipelines (and the stronger presence of some older ones) has made the job of a data engineer much easier, allowing them to spend their time on far more than just getting data into a warehouse.

It might be hard to believe, but data engineers have always been bogged down by things like this. If you want to improve the productivity of your data engineers, analysts, and data scientists, it’s important to offload this task to a platform dedicated to handling it.

3 Data Pipeline Tools You Want To Look Into

Fivetran

Fivetran is a big player in the game and boy are they making a name for themselves. Their focus on the industry’s shift from ETL to ELT, integrations with modern tools, flexible offerings, and data democratization is admirable. It helps that they have great documentation that is actually useful!

Hevo

Hevo is another well-established platform that many people use. It has a well-designed UI and a healthy offering of integrations. It has a well-defined, fault-tolerant architecture that can provide real-time data movement. Its approach to transformations differs, for better or worse, from that of other products.

Stitch

Stitch has a great offering of data connectors and gives users the flexibility to use whichever connector they need. It promotes more user control over managing pipelines with data replication. It does have features to enrich data, and its pricing model can make sense for a lot of companies. It has a good customer base of large businesses, which it clearly markets towards.

A Quick Scoring Guide

Although these platforms are extremely similar in a lot of ways, it’s important to:

  1. Know what you need to evaluate these platforms on
  2. Understand the strengths and weaknesses

Integrations

Fivetran: 4/5 — — Hevo: 3.5/5 — — Stitch: 3.5/5

Fivetran has a wealth of integrations. All of these platforms generally offer the most popular connectors everyone is using, but Fivetran shines here because it casts a wider net. Fivetran connectors are also available on all plans, so you aren’t forced to upgrade to get a connector.

None of these platforms has the best error messaging when something goes wrong with a connector, but I found Stitch to be the best at this. Stitch also allows for custom connectors, but you need good programming knowledge to build them, and the purpose of these tools is to be no-code. Hevo has better support for data lakes in general, but this only matters to a small number of organizations in the grand scheme of things.

Transformations

Fivetran: 4/5 — — Hevo: 4/5 — — Stitch: 3/5

Fivetran and Hevo both take the cake here. Fivetran is more ELT focused, whereas Hevo promotes both ETL and ELT based on your organizational needs. The flexibility with Hevo is great, but a lot of people are realizing the benefits of ELT over ETL, and Fivetran clearly stands by it.

The major difference between Hevo and Fivetran is how you perform transformations. With Fivetran you are looking at preloaded dbt models (this cuts down a lot of work and also gives you templates to work off of), and the educational approach with transparency in lineage will teach you a lot. Hevo offers transformations via Python, which most data people know by now. You can do far more when manipulating data in Python, which is why we ranked it so high, whereas with Fivetran you have to start thinking about data modelling through dbt. I am a big believer in dbt, so this is not a negative, just another thing to consider. Stitch has very limited data transformations.

Technical Support

Fivetran: 4/5 — — Hevo: 5/5 — — Stitch: 3/5

Hevo has the customer support you are looking for. They have you covered 24x7, including chat and email support. They also have great technical documentation that is well maintained and regularly updated. Fivetran loses some points here because its customer support offerings are extremely limited and just can’t match Hevo’s. Where Fivetran shines is in the quality of its technical documentation, content, and tooling strategies, which is phenomenal. While Stitch offers in-app support, companies that charge money for phone support just don’t cut it for me.

Security

Fivetran: 5/5 — — Hevo: 5/5 — — Stitch: 5/5

The truth is that all of these platforms are secure. Each one will probably tell you that it is more secure than the others in some way, that it follows industry standards, and so on. While data security is at the top of everyone’s list, these are well-established companies that would lose their business if they didn’t have high security standards.

Pricing

Fivetran: 5/5 — — Hevo: 3/5 — — Stitch: 4/5

Pricing is always a hard one to judge because different pricing models work for different sets of customers. While Stitch and Hevo have similar models, Stitch wins based on the fact that it is generally cheaper and offers great thresholds based on the number of rows you have. Hevo offers a good starting threshold at 1 million events with limited connectors, but then the costs add up like nuts. I also think charging by events isn’t a great model for data extraction, as your cost will run up with a lot of connectors.

Fivetran offers a “pay for what you use” model and charges like a utility (but better). They calculate usage based on monthly active rows (MAR), and the way it is implemented is super attractive. A row, identified by its primary key, can be updated 10 times in a billing cycle but it will always count as one MAR. This is flat-out superior to event-based pricing and much smarter in general, so you are truly paying for what you use at a reduced rate compared to events!
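To make the difference between the two metering styles concrete, here is a minimal Python sketch. It assumes a simplified world where every insert or update is one billable event, and where an active row is any distinct primary key touched during the billing cycle. The function names and the sample data are made up for illustration, and each vendor’s real billing rules have more detail than this.

```python
def count_events(changes):
    """Event-style metering: every insert or update is counted once."""
    return len(changes)


def count_active_rows(changes):
    """MAR-style metering: each distinct (table, primary key) pair is
    counted once per billing cycle, no matter how often it changes."""
    return len({(table, pk) for table, pk in changes})


# Hypothetical change log for one billing cycle: (table, primary_key).
# Order 1 is updated 10 times, order 2 once, customer 7 twice.
changes = [("orders", 1)] * 10 + [("orders", 2), ("customers", 7), ("customers", 7)]

print(count_events(changes))       # 13 billable events
print(count_active_rows(changes))  # 3 active rows
```

The more often the same rows change, the wider the gap between the two numbers becomes, which is why frequently updated tables are where event-based pricing tends to hurt.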

What’s The Verdict?

Photo by Tingey Injury Law Firm on Unsplash

The truth is that you have to choose a tool that properly fits your budget, your needs, and the size and shape of the data you are storing. No one tool is completely superior to the others.
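One way to apply this to your own situation is to weight the scores from the guide above by what matters to you. The sketch below is only an illustration: the scores are the ones given in this article, while the weights, the category keys, and the function names are hypothetical and should be replaced with your own priorities.

```python
# Scores out of 5, copied from the scoring guide in this article.
SCORES = {
    "Fivetran": {"integrations": 4.0, "transformations": 4.0,
                 "support": 4.0, "security": 5.0, "pricing": 5.0},
    "Hevo": {"integrations": 3.5, "transformations": 4.0,
             "support": 5.0, "security": 5.0, "pricing": 3.0},
    "Stitch": {"integrations": 3.5, "transformations": 3.0,
               "support": 3.0, "security": 5.0, "pricing": 4.0},
}


def weighted_score(scores, weights):
    """Weighted average of one tool's category scores."""
    total_weight = sum(weights.values())
    return sum(scores[cat] * w for cat, w in weights.items()) / total_weight


def rank(weights):
    """Return (tool, score) pairs sorted from best to worst."""
    results = [(tool, round(weighted_score(s, weights), 2))
               for tool, s in SCORES.items()]
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Hypothetical weights: every category equal vs. support mattering 3x.
equal = {cat: 1 for cat in SCORES["Fivetran"]}
support_heavy = dict(equal, support=3)

print(rank(equal))
print(rank(support_heavy))
```

With equal weights Fivetran comes out on top, but tripling the weight on support is enough to put Hevo first, which is the point: the ranking depends on your priorities, not just the raw scores.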

We find Fivetran to be superior in pricing, connectors, and transformations (if you believe in more modern frameworks). We find Hevo to be superior in architecture, UI, and customer support. Stitch to us is a great middle ground with extremely attractive pricing that caters to all company sizes and offers great user feedback through error messaging.

They all come with their own pros and cons, but I have found that Stitch, once a popular tool, is one that most people have started moving away from in pursuit of Fivetran or Hevo.

At Kizmet Solutions, we can help you choose and implement the tool that fits your needs. Starting with any of these tools is great, but you may come to realize that controlling your costs while scaling is difficult on its own. We have the expertise and strategies in place to get the most out of your tool at the lowest possible price!

Written by William D'Souza

providing solutions for common data problems @ Kizmet Solutions. www.kizmetsolutions.com
