Doing Data Documentation the Right Way

Nick Freund
January 31, 2024

If you are a data professional, you have most likely spent a solid portion of your career building things. You have probably built ETL pipelines, or written the SQL your team uses as the foundation for your organization’s data platform. 

One thing that becomes clear when you deploy that data to the rest of your organization, however, is that the end users who consume it are coming from a completely different place. Rather than having the context of what went into building an analytics asset, they most likely come to it simply looking for an answer. 

They don’t have all the knowledge your data team does – and why would they? Their job is to drive impact in other areas of the business, like supporting your customers, or designing marketing campaigns to source inbound leads. They come to you and your data platform to ensure they are applying data to their execution of those tasks. 

This is an environment in which knowledge asymmetries are the rule, not the exception. Knowledge is not distributed equally: context about both the business and the datasets varies widely from team to team. Effective documentation that contextualizes your data is key to countering these effects, and to ensuring that data is an enabler rather than a stumbling block.

Knowledge Asymmetries Kill Efficiency – and Culture

At the core of the need for documentation is an imbalance in knowledge between different teams. This cuts in both directions:

  • The data team has a much better understanding of the datasets used to create your reporting, including any upstream context and/or limitations that are not readily apparent at first glance.
  • Simultaneously, operational teams have a much better sense of how their employees operate on a day-to-day basis, and what jobs to be done require the input of data in the first place.

Two things have the potential to grow in the gap between these differing types of knowledge: inefficiency and, in the worst cases, lack of trust.

The inefficiencies here are obvious to anyone who has worked on, or interacted with, a data team. Rather than gleaning insights that they can apply, end users stumble over functionality, or are confused by what a metric means. They will bombard your service desk or Slack channel with questions, requiring the analytics team to address repetitive questions rather than evolving the data platform. 

For end users, the frustration this causes may be even more acute, as an activity that should in theory be enhancing their ability to do their work suddenly drives confusion and wasted time instead.

A natural result of the challenges felt on both sides of this equation is that end users will begin to distrust the data they are provided with, and dislike the process of interacting with the data team. When there is this type of mistrust, the ability to collaborate and drive effective insights breaks down entirely.

Documentation Drives Data ROI and Cuts Time to Value

There are many solutions to the challenge of knowledge asymmetries. But the lowest-hanging fruit for most data teams is documenting their data assets, so that any end user who picks up a dashboard or report will know what they are looking at, and how they should be using it, without having to ask.

The benefits of effective documentation are far-reaching, but here is a quick taste:

  • New employees joining the company will require less bespoke training before ramping up onto your team’s data assets, and can start making important business decisions sooner.
  • Updates to existing assets are clearly marked and understood, and do not inspire a host of questions about why something changed.
  • Because there are fewer questions about the data, the data team has the time and bandwidth not only to build new assets, but also to engage earlier on with stakeholders to ensure that they are building the right things. This creates a feedback loop where better initial decisions about what to build cause the data team to invest less in answering clarifying questions and addressing confusion, and allow them to reinvest that time and energy into gathering stakeholder feedback.

All of these increase your return on investment from data. An under-documented, under-explained data environment is one that will be used poorly, or not at all. Don’t let the effort your team puts into building vital data products go to waste, or be slowed down by having to answer question after question.

Doing Documentation the Right Way

When we say documentation, you might have an image of a basic intranet page with a simple text description of the asset, and maybe a table with field definitions. 

While documentation can look like this, documentation done right will consider two main factors. 

  1. What types of information to include. We recommend considering: 
     • Explanatory descriptions and metric definitions
     • FAQs that anticipate the questions you will receive
     • Visual indicators like diagrams and screenshots
     • Videos showing live demonstrations of how the asset works and should be used
     • Pendo- or Appcues-style training guides that present information in context
  2. What mediums and channels to distribute documentation through, so that it reaches folks where they work and suits how they learn. We recommend considering the following:
     • What mediums will work best for end users with different learning styles and attention spans
     • At what times and cadences, and in what places, users will need to be enabled on their data

What works for one person might not work for another, and providing only one solution will increase the chances that documentation will be ignored.

As a final note, it is crucial to strike a balance between giving adequate context and not overloading end users with irrelevant information. One good solution is to summarize the key points first, and then include a more detailed section or video walkthrough for end users who are curious to learn more. In all of this, keeping in mind who the target consumer is, and what level of detail they will need, will help keep you on the right course.

Document in Context, As You Go

The point of providing documentation is, simply put, to provide adequate context so that end users can use data assets effectively. 

As such, beware of documentation solutions that flatten or erode context. The intranet is one of the worst and most commonly used offenders. Intranets have a tendency to create their own ecosystems, divorced from the actual thing they are documenting. They typically provide at best a link to an asset, requiring inefficient navigation back and forth. 

Data Knowledge platforms that present your documentation directly in-line with your assets are much better at retaining the correct context, and at preventing the sprawl of yet another system like an intranet. Any time a user switches to a different system for information, some nuance or critical information will likely be lost. This is one reason tools like data notebooks have become so popular, and why we’ve emphasized bringing documentation and FAQs directly beside your assets as one of our core features. 

Working in context also makes it easier to document as you build assets, rather than all at once after the fact. This keeps the process quick and easy for data builders. Creating a limited amount of documentation incrementally is simpler than trying to create a ton at once. It is also easier to remember nuances when they are still fresh. Think about it like unit test coverage: investing a little extra work every time saves you from a huge backlog of tech debt later on.
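To make the unit test coverage analogy concrete, here is a minimal sketch of what a "documentation coverage" check might look like. Everything in it is an assumption for illustration: the `DataAsset` class, its attributes, and the example metric names are hypothetical and do not correspond to any particular catalog or BI tool.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """Hypothetical catalog entry for a dashboard, report, or table."""
    name: str
    description: str = ""
    # Names of the fields or metrics the asset exposes to end users.
    fields: list[str] = field(default_factory=list)
    # Maps a field or metric name to its plain-language definition.
    definitions: dict[str, str] = field(default_factory=dict)


def undocumented_items(asset: DataAsset) -> list[str]:
    """List what is still missing: the asset description and/or field names."""
    missing = []
    if not asset.description.strip():
        missing.append("<asset description>")
    for name in asset.fields:
        if not asset.definitions.get(name, "").strip():
            missing.append(name)
    return missing


def documentation_coverage(asset: DataAsset) -> float:
    """Share of documentable items (description + each field) that have text."""
    total = 1 + len(asset.fields)
    return (total - len(undocumented_items(asset))) / total


# Example: a dashboard with a description but only one defined metric.
dashboard = DataAsset(
    name="Revenue Overview",
    description="Monthly recurring revenue and churn by customer segment.",
    fields=["mrr", "churn_rate", "active_users"],
    definitions={"mrr": "Sum of active subscription fees, normalized to a month."},
)

print(documentation_coverage(dashboard))  # 0.5
print(undocumented_items(dashboard))      # ['churn_rate', 'active_users']
```

Running a check like this each time an asset is built or changed, rather than once a quarter, is one way to keep the documentation backlog from piling up.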

Of course, building documentation as you go also ensures adequate coverage for consumers, who won’t experience a gap in documentation. When they know they can expect context through documentation every time they access a new asset, they will come to trust the process, and the data.

So as you build that next important dashboard or self-service asset, think critically about the end user experience, and about what kind of information will provide the necessary context for folks to do their jobs. It requires an extra step and a little extra time, but the time saved on answering repetitive questions, and the increased quality of data interactions, will pay huge dividends long term. 
