Searching and Discovering Data Across Platforms

Nick Freund
October 15, 2023

Ensuring everyone can access the right data at the right time is the holy grail for any organization adopting a data-driven approach.

In many ways, the charter of a data team is to understand what the “right data” is for their organization, and to ensure that it is delivered. But delivering that data to users at the “right time” to meet business needs is arguably a bigger challenge.

One fundamental problem is that the data needs of any given user may change dramatically over the course of a week, or even a day. What is relevant to them when they are working on a department-wide presentation might not be relevant when they are engaging with customers. They are also often time-constrained – there may only be a five-minute gap before a client meeting in which they need to understand product usage data. These delivery problems compound as the number of data users a team supports grows.

As a result, while curating data assets and knowledge is important, ensuring it is available easily and quickly – or at the “right time” – is arguably the most crucial factor in whether data is actually used. The key is a robust system for searching and discovering data, which ensures that insights are readily available in a busy and dynamic work environment.

Data Sprawl and the Limitations of Search

Why is it often so hard for users to find the data they need, when they need it?

Sprawl is one of the main reasons. Data is now everywhere, in dozens of different platforms at any given company, and most companies now expect their end users to actively use data to make decisions. But it is hard to find data when an end user may literally not know which platform it lives in.

They might search through Salesforce reports, browse Looker dashboards, and try to find a previous dataset they dumped in Google Sheets, before finally realizing the information they seek is actually part of siloed documentation found in Confluence. Searching individually in four or five different systems is a nightmare for any employee who has to prepare for a meeting in five minutes, or provide their boss with a quick answer to a question. Platforms and their terminologies are often incredibly specialized as well. Each system may have different search capabilities, or different ways of referring to the same type of information.

Because of these limitations, finding data can be both slow and aggravating.

What Solutions Are Out There?

The baseline solution to this issue has been one of the classic ways of storing and organizing information for any organization – the intranet. We have talked extensively on this blog about the ways in which intranets decontextualize the data they relate to, and the challenges they present in giving teams another system to manage. As such, it is hard to recommend them to solve issues with search.

A more intriguing option, in our humble opinion, is to explore one of the universal search tools that have emerged in recent years. Solutions like Command E integrate with your tools (e.g. G-suite, Slack, Notion, JIRA, Figma, etc.), and allow you to search all of that information from a single place. These tools are fantastic at collapsing the silos that make information so hard to find across the many different tools you use on a daily basis.

When discussing data, though, there are a couple of problems still left unsolved by universal search. Since these tools lack deep integration with data tools, they can provide at best a surface-level search through your data. They are also more focused on generic discovery across every touch point, rather than discovery that takes context into account – data and data knowledge are inextricably linked, but a universal search tool will treat them as two separate things. 

Discovering Knowledge, Not Just Data

As you have probably guessed, search in its ideal form should span every platform where your users find and consume data or data knowledge.

But arguably more important is that discovering your data goes beyond simple metadata like a dashboard’s name. More relevant to when and whether it is used is the collection of knowledge about that data – everything from documentation created by your data team, to conversations had about aspects of it, and even to metadata like usage. A more powerful version of search allows users to discover assets via this knowledge, and vice versa.

To paint an example – if a salesperson does not know the name of a dashboard or dataset, they likely cannot use a traditional search utility to find it. Instead, they might be able to find what they need by asking, “What report was the VP of Revenue looking at to understand churn?” With usage metadata, or being able to find a conversation the VP had about that report, they could actually answer that question.
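To make that example concrete, here is a minimal sketch of what searching by knowledge rather than by name could look like. Everything in it is an assumption made for illustration: the Asset fields, the sample dashboards, and the simple term-counting score are hypothetical, and a real system would draw this knowledge from each platform and rank it far more carefully.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """A data asset plus the knowledge attached to it (all fields are hypothetical)."""
    name: str
    platform: str
    docs: str = ""
    conversations: list[str] = field(default_factory=list)
    viewers: list[str] = field(default_factory=list)  # usage metadata: who opened it


def search(assets: list[Asset], terms: list[str]) -> list[Asset]:
    """Rank assets by how many query terms appear in their name or attached knowledge."""
    scored = []
    for asset in assets:
        # Search the name AND the surrounding knowledge, not the name alone.
        haystack = " ".join(
            [asset.name, asset.docs, *asset.conversations, *asset.viewers]
        ).lower()
        score = sum(1 for term in terms if term.lower() in haystack)
        if score > 0:
            scored.append((score, asset))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [asset for _, asset in scored]


# Illustrative sample data, not taken from any real system.
assets = [
    Asset(
        name="Q3 Retention Overview",
        platform="Looker",
        docs="Tracks logo retention by cohort.",
        conversations=["VP of Revenue: this is the view I use to understand churn"],
        viewers=["VP of Revenue"],
    ),
    Asset(
        name="Pipeline Summary",
        platform="Salesforce",
        docs="Open opportunities by stage.",
    ),
]

results = search(assets, ["VP of Revenue", "churn"])
for asset in results:
    print(asset.platform, "-", asset.name)
```

Note that neither "churn" nor "VP of Revenue" appears in the dashboard's name. The match comes entirely from the conversation and usage metadata attached to it, which is exactly the knowledge a name-only search would never see.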

A massive amount of metadata is currently unavailable with traditional search. You might be able to crawl through Slack or JIRA tickets, but that is just as problematic as trying to find something in three different BI tools. Data knowledge is incredibly diffuse, and having to search in a specific workflow tool to find it is slowing end users down.

Searching your data knowledge means that users can query the full context of what the data is, and how your organization relates to it.

What Does the Future Look Like for Data Knowledge Discovery?

We believe that the first step toward a better search experience is the simple consolidation of data assets in a single place, accessible through a single search. From there, we would argue that extending that search utility to all data knowledge and knowledge metadata is the most important step toward providing users the right data at the right time.

Further down the line, we believe that AI and natural language processing will be transformational in making search more user-friendly, by allowing less structured queries to return the same results. As AI progresses, it should be able to screen out irrelevant results based on the context or business requirements of the person searching. If a user needs a dashboard showing sales figures, the search should not even touch the multitude of procedure docs that might have a loosely similar name.
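As a rough illustration of the sales-figures example, the sketch below filters results by the kind of asset the searcher's context calls for. It is a hand-written rule standing in for what we expect AI to infer on its own; the Result type, the "dashboard" and "procedure_doc" labels, and the sample titles are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Result:
    title: str
    kind: str  # hypothetical labels, e.g. "dashboard" or "procedure_doc"


def filter_by_context(results: list[Result], wanted_kinds: set[str]) -> list[Result]:
    """Keep only results whose kind matches what the searcher's context calls for."""
    return [result for result in results if result.kind in wanted_kinds]


# Two candidates with loosely similar names, as in the example above.
candidates = [
    Result("Sales Figures", "dashboard"),
    Result("Sales Figures Reporting Procedure", "procedure_doc"),
]

# Assumption: the user's context says they need a dashboard.
relevant = filter_by_context(candidates, {"dashboard"})
for result in relevant:
    print(result.title)
```

The hard part, of course, is not the filter itself but working out what the user wants without being told. That is the piece we expect natural language processing to supply.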

In this sense, we see search needing to develop in two directions: it needs to get wider and more integrated in order to make every piece of knowledge your organization has generated available, and it needs to get smarter, so that this broader scope of knowledge can easily be parsed to return only what is relevant.
