Don’t Procrastinate on Getting Your Data Catalog

What will you get out of a catalog and an amazing tool — Secoda

William D'Souza
6 min read · Jul 19, 2023
Photo by micheile henderson on Unsplash

In every company I have worked for or consulted with, one theme has been consistent: widespread confusion about what the data in the organization’s systems actually means. Whether it is a number on a dashboard, a key performance indicator (KPI), or some metadata, clarity and consistency are lacking.

There is no utopia where everyone understands everything all the time; that is why we data people have jobs. Our roles are essential in translating information for others. However, many self-serve systems are poorly constructed, and data literacy is a growing problem that seems to get less and less priority.

How can you empower people in your organization to utilize data to their advantage when you aren’t even focused on making it accessible to them?

Does This Sound Familiar?

You are in a meeting where important discussions are taking place, with everyone using the data they have to prove their points. However, for some reason, the representative responsible for data is not present.

Micro arguments and clarifications about the meaning of certain things (or proposing what they “mean to them”) constantly interrupt valuable conversations. Some individuals do not understand the assumptions made to create the key performance indicator (KPI), or even its origin. It truly feels like everyone is speaking a different language.

You begin to wonder, “Where is the source of truth?” Everyone has different answers, and each person seems to have their own perception of what the data represents.

This situation occurs frequently, and after giving up on trying to figure it out, we resort to improvising with what we can and making decisions based on intuition. It’s not the worst thing to do, but you can make better decisions and validate your intuition more effectively if you have data to support it.

The reality is that many companies lack an organized perspective on their data and its definitions because they have not made the investment in what is essentially a readily available (yet highly valuable) opportunity.

Similar to how a product catalog serves as a reliable source of pricing information for customers, a data catalog should serve as the definitive source of truth for all matters related to data and data governance within your organization.

What is a Data Catalog?

The data catalog is often underestimated and overlooked within the data ecosystem, despite being one of the most crucial components in delivering value to your organization.

A data catalog serves as a centralized repository that is meticulously organized and offers a wide range of information about the data assets accessible within your organization. It acts as an inventory, encompassing data sources, datasets, tables, definitions, and additional resources.

Some common things you will see in a well-built catalog include:

  1. Data descriptions
  2. Data lineage
  3. Data Quality Indicators (DQIs)
  4. Data Ownership
  5. Permissions
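
To make the list above concrete, here is a minimal sketch of what a single catalog entry could hold. It is not how Secoda or any other tool models its data; the class name, field names, table names, and team names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One documented data asset. Every field name here is illustrative."""
    name: str
    description: str                                          # 1. data description
    upstream: list[str] = field(default_factory=list)         # 2. data lineage
    quality: dict[str, float] = field(default_factory=dict)   # 3. data quality indicators
    owner: str = ""                                           # 4. data ownership
    readers: set[str] = field(default_factory=set)            # 5. permissions

    def can_read(self, team: str) -> bool:
        """The owning team and any listed reader team may read the asset."""
        return team == self.owner or team in self.readers


# Hypothetical example entry.
orders = CatalogEntry(
    name="analytics.orders",
    description="One row per customer order, refreshed nightly.",
    upstream=["raw.shop_orders", "raw.customers"],
    quality={"completeness": 0.98},
    owner="data-platform",
    readers={"finance", "marketing"},
)

print(orders.can_read("finance"))  # True
print(orders.can_read("sales"))    # False
```

Even a structure this small answers the questions that derail meetings: what the data means, where it came from, how trustworthy it is, who owns it, and who may use it.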

What Does the Data Catalog Solve?

The primary objectives of a data catalog are to facilitate data discovery and enhance data literacy and understanding throughout the organization. It plays a crucial role in enabling business users and even data teams to effortlessly locate and access the relevant data required for their workflows.

Gone are the days when one had to rely on manual exploration of data to obtain information that should be readily accessible. With the data catalog, individuals can now self-serve their needs by efficiently identifying and comprehending various datasets along with their contextual information.
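
As a rough illustration of that self-serve discovery, the sketch below runs a keyword search over a tiny in-memory catalog. A real catalog tool does far more (ranking, lineage, access control), and the `search_catalog` function and the entries are assumptions made up for this example.

```python
def search_catalog(catalog, term):
    """Return names of entries whose name or description mentions the term."""
    needle = term.lower()
    return [
        entry["name"]
        for entry in catalog
        if needle in entry["name"].lower()
        or needle in entry["description"].lower()
    ]


# Hypothetical catalog entries, reduced to a name and a description.
catalog = [
    {"name": "analytics.orders", "description": "One row per customer order."},
    {"name": "analytics.churn_rate", "description": "Monthly KPI: share of customers lost."},
    {"name": "raw.web_events", "description": "Unprocessed clickstream events."},
]

print(search_catalog(catalog, "kpi"))       # ['analytics.churn_rate']
print(search_catalog(catalog, "customer"))  # ['analytics.orders', 'analytics.churn_rate']
```

The point is that a business user can type a word they already know and land on a documented dataset instead of asking around.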

Side Tangent: Google Docs is Not A CATALOG…

Please don’t think you can recreate this in a Google Doc because your people should be “tool agnostic”. Documentation is already a manual process, and a data catalog needs too many features to waste any time rebuilding them by hand.

NO!

You aren’t being “scrappy” or “smart” if you try to justify it. Maybe you think that it’s not a big deal because it can be added to the pile of tech debt later on. Let’s be serious: it won’t get done.

NO!

Photo by Andre Hunter on Unsplash

This is low-hanging fruit with significant value: it takes minimal time to set a catalog up the right way and develop processes around it. These processes can follow the KISS principle (keep it simple) and don’t need to be intense, but thinking you can inject organization into a pile of garbage years later is never going to work.

Advantages You Will Get From A Good Catalog

In the long term, you will start to see a more data-literate organization with better communication skills around data, and you will also see some benefits right away:

1. Data Visibility & Transparency

You will get a holistic view of available data assets across the organization, whatever cave they may be lurking in

2. Data Governance

You will see higher standards of data governance through data lineage, increased quality, and better compliance

3. Data Collaboration

You will see more collaboration across your teams because people can share and communicate about the data itself

4. Data Re-usability & Consistency

It promotes re-usability and consistency by making it easier to discover and leverage existing data assets rather than redoing the same work

5. Data Security

It helps enforce controls and security policies by giving visibility into data ownership and permissions

6. Data Insights

Users will be able to assess data quality, understand data relationships, and get insights by exploring metadata

My Favorite Tool — Secoda

I’ve worked with and evaluated tons of data catalogs, and many of the options are restrictive (and quite annoying). Hands down, the best tool that I have seen and worked with to date is Secoda (start a free trial and try it for yourself; the product speaks for itself).

Why do I suggest this tool?

1. You need the flexibility to adapt to changing ecosystems in your data stack with ease, and Secoda will get you there with their integrations

2. You aren’t limited to a single ecosystem (Tableau, Databricks, etc.); you can pick tools from different stacks and implement proper solutions, and Secoda keeps up to date

3. You need an outstanding UX/UI. Catalogs are hard to navigate, and Secoda puts a huge emphasis on its users

4. They consistently make documentation easier with newer and faster technology, making the catalog more readable for everyone

5. Their prices just make sense. Data catalog tools shouldn’t cost an arm and a leg, but at the same time they shouldn’t be undervalued. You get more than you think with Secoda

6. They have a great staff that you ACTUALLY want to work with… they treat you like a partner, not just a customer

Implementing Secoda and Getting Started

Implementing Secoda is easy, and they have made it seamless for their users to get started. There’s a helpful tutorial, a dedicated Slack community, and diligent customer service if you need it.

Just integrating is not enough, however. If you want to see quality from this tool, you will need to have a solid understanding of data architecture, data modelling, and data governance. Without this background knowledge, you may end up with spaghetti.

Try Secoda for yourself, as you might have all this figured out, or give us a shout at Kizmet Solutions and we will be happy to evaluate your stack, consult, and implement with high standards, ensuring you get maximum value out of your catalog!
