
How to Control Analytics Entropy with Data Asset Management

by Nick Freund - January 31, 2024

Last time on the Data Knowledge Newsletter, we explored the many factors that have contributed to data asset entropy in the modern organization. From the explosion of tools for visualizing and consuming data to the ever-growing centrality of analytics teams, data is more available than ever. However, those same factors have produced analytics disorder and sprawl, decreasing efficiency and trust across the organization when it comes to actually using that data.

Today we will examine what analytics teams have done to overcome these challenges in the past, and suggest a new model for dealing with entropy using the right tools for data asset management.

How have organizations addressed this problem in the past?

Even if entropy has accelerated in recent years, it is not a new problem; organizations have always had to address it. Data teams often assume that their choice of organizational model is a silver bullet for data asset entropy.

Unfortunately, one of the most common strategies for enabling an organization with data can also be an accelerant of entropy and disorder: the self-service model. By enabling anyone in the organization to self-serve, operational teams are no longer reliant on tactical support from their data counterparts, and end users are empowered to generate their own assets and insights. Done thoughtfully, self-service encourages agility and a high degree of individual impact. It can also, however, be a multiplier for entropy.

Entropy grows with every tool and every report or dashboard, and if users are building their own assets—sometimes even duplicating the work already done by other users or teams—sprawl is inevitable. As a result, your organization could end up with thousands of assets, with no one in charge of maintaining the vast majority. 

The opposite approach, the centralized control model, is probably the most direct buffer against entropy. If the analytics team owns and manages everything, things naturally can’t get nearly as far out of control. The obvious problem here is that analytics teams then become gatekeepers for the organization’s data, and the level of individual empowerment to make data-directed decisions drops off for the rest of the organization. This also increases information asymmetries between teams.  

A nuanced approach between self-service and control is where most innovative teams end up, but no working model in and of itself addresses the root causes of entropy.

Because the models themselves cannot solve entropy, most teams end up employing strategies to identify which 20% of high-profile assets are most useful to the most people and should therefore be curated and maintained, and which 80% are just noise that needs to be managed out.

Chart: differentiating the high-value 20% of assets from the noisy 80%

A more directed focus on the 20% has historically led data teams to implement strategies to manage and curate their most important assets:

  • Create folders of certified assets: Certified assets are those the data team stands behind and will actively support. Teams may create folders of these assets, or put a certified logo or label in the UI of a given dashboard. This sort of clear quality marker goes a long way toward making sure users know which assets are appropriate to use, but the approach does not scale well across many different systems.
  • Build documentation in an intranet: Intranets have become critical to cataloging assets, as teams will often list out all the most important assets and provide guidance for how to use them. In contrast to maintaining folders of certified assets, intranets are more flexible and allow teams to build content pertaining to assets from many different systems at once. However, they may also exacerbate the problem by introducing a new set of one-off documentation to maintain, knowledge which ends up in a separate system from the data it relates to.
  • Maintain a data catalog: Data catalogs help teams keep track of data lineage, metadata, and the underlying tables that power their data assets. Data catalogs are helpful for larger data teams, but are targeted at a technical audience, and typically focus on addressing data sprawl within your data warehouse. This might help your data team, and folks that can write SQL, but does not solve the broader disorder within your organization.

The 80/20 approach also necessitates understanding which data assets make up the less-used, less-valuable 80%, and how to take them to end-of-life. Common strategies include:

  • Conduct user interviews: This type of “anecdata” is useful for learning who uses what, but is vulnerable to selection and confirmation bias depending on who you interview and how questions are phrased. It is also time-consuming.
  • Mine available usage data: Many analytics tools expose some level of usage data that can give you insight into which assets are used most frequently. This can be valuable, but it is prone to gaps between heterogeneous systems, and, even worse, some systems don’t make usage data available at all.
  • Quarterly clean-up/mass delete: The brute-force method is sometimes necessary to get things under control, and usually involves a combination of manual and automated processes. For a rundown of automated strategies you can employ to reduce the manual load, see this article from data engineering writer Sarah Krasnik. It should be noted, however, that these strategies require solid, high-quality usage data, which is not always a given across different systems, as discussed above. A minimal sketch of this kind of usage-based triage follows this list.
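
To make the usage-mining and clean-up ideas concrete, here is a minimal Python sketch of usage-based triage. Everything in it is illustrative: the asset names, view counts, and the 80% threshold are hypothetical stand-ins for whatever your own tools' usage exports contain.

```python
# Minimal, illustrative sketch of usage-based triage. The asset names,
# view counts, and the 80% threshold are hypothetical; real counts would
# come from your BI tools' usage exports or audit logs.

# Hypothetical 90-day view counts per dashboard, consolidated from exports.
views_per_asset = {
    "revenue_overview": 412,
    "sales_pipeline": 187,
    "churn_cohorts": 95,
    "onboarding_funnel": 23,
    "old_q3_experiment": 4,
    "adhoc_copy_of_revenue": 2,
}

total_views = sum(views_per_asset.values())

# Walk assets from most to least viewed until ~80% of all views are covered;
# everything past that point is a candidate for review or end-of-life.
covered = 0
core_assets, long_tail = [], []
for asset, views in sorted(views_per_asset.items(), key=lambda kv: -kv[1]):
    (core_assets if covered < 0.8 * total_views else long_tail).append(asset)
    covered += views

print("Curate and maintain:", core_assets)
print("Review for clean-up:", long_tail)
```

In practice the hard part is the input: getting trustworthy view counts out of every system in the first place, which is exactly the gap noted above.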

All of these solutions have a place in data management, but each has its own tradeoffs and shortcomings, and is unlikely on its own to solve the problem of entropy. It is clear that a more comprehensive solution is required.

Why implement a data asset management solution?

"Without proper maintenance of those assets, without looking at a software development process or a product process too, and recognizing there is a cost to those things, you end up inevitably in a state of data chaos."

- Jamie Davidson, Co-founder, Omni

We would like to offer a somewhat different approach to solving this inevitable and fundamental problem of entropy: a data asset management solution like Workstream.io.

A data asset management solution is distinct from many of the other solutions we have discussed in this article—from data catalogs, to intranet solutions—in a number of ways. First, and perhaps most crucially, it is purpose-built for the problem of entropy in your data environment. Rather than repurposing functionality from another tool, you are using a tool designed specifically for this work.

Second, a true data asset management solution is integrated with your modern data stack. It is not a new, separate destination, and therefore does not contribute to the asset sprawl that is at the root of entropy. It is a single, integrated experience.

Finally, a data asset management solution solves pain points for both data consumers and data builders, addressing entropy as well as the knowledge asymmetries that both contribute to and result from it. These benefits compound as more builders and consumers adopt the solution, since the breadth of available data and insights increases with every user.

The functionality required to reduce sprawl in your data environment varies depending on the exact needs of your organization, but we have identified some key pillars that we feel are table stakes for any such solution.

  • Analytics hub: One of the key culprits of entropy is the sheer number and scale of different analytics solutions. Bringing together all of your assets in a single hub or access layer is absolutely essential to making your data accessible to everyone in the organization. If users are not working out of an analytics hub, they are probably missing crucial information.
  • Automated lifecycle management: You may already have processes that dictate where in its lifecycle any given data asset is, but maintaining those processes manually is a huge resource drain for your team. A solution that automates even some of this work lets you keep your environment up to date; a minimal sketch of one such rule follows this list.
  • Automated collections: Bringing together disparate data assets is great, but without the ability to organize those assets further, an analytics hub would be just a list of assets. Collections solve that problem by bringing together consolidated libraries of assets from different systems—and should be automated to make maintaining and sharing them as low a lift as possible.
  • Data asset analytics: As mentioned before, it is often incredibly difficult to figure out exactly who is using which assets so that you can solve the 80/20 problem. A data asset management solution should consolidate this type of usage data and give full transparency about which assets your organization finds valuable.
  • Data social graph: Finally, it is clear that other users are often the best guides for how an individual user should be working with data. Building a social graph to chart interactions with data, and exposing these connections wherever possible to stakeholders, gives a sense of ownership and collaboration, and helps solve the knowledge asymmetries that often keep stakeholders in the dark.
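
To make the lifecycle pillar concrete, here is a minimal sketch of a single automated lifecycle rule applied to an asset inventory consolidated from several tools. The field names, tool names, and idle-time thresholds are assumptions for illustration, not a description of Workstream's implementation.

```python
# Illustrative sketch of one automated lifecycle rule over a consolidated
# asset inventory. Field names, tools, and thresholds are hypothetical.
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Asset:
    name: str
    source: str        # which analytics tool the asset lives in
    last_viewed: date  # most recent view, from consolidated usage data
    certified: bool    # does the data team stand behind it?

def lifecycle_stage(asset: Asset, today: date) -> str:
    """Classify an asset based on certification and how recently it was viewed."""
    idle = today - asset.last_viewed
    if asset.certified or idle <= timedelta(days=30):
        return "active"
    if idle <= timedelta(days=120):
        return "review"           # ask the owner whether it is still needed
    return "archive-candidate"    # part of the 80% to manage out

# Hypothetical assets pulled from different systems into one inventory.
inventory = [
    Asset("revenue_overview", "looker", date(2024, 1, 28), certified=True),
    Asset("old_q3_experiment", "tableau", date(2023, 7, 2), certified=False),
    Asset("onboarding_funnel", "metabase", date(2023, 12, 14), certified=False),
]

for a in inventory:
    print(a.source, a.name, "->", lifecycle_stage(a, date(2024, 1, 31)))
```

Paired with data asset analytics, the same consolidated usage data that powers the triage sketch earlier in this post can drive rules like this automatically, rather than relying on a quarterly manual sweep.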

Conclusion

Over the next few weeks we will dive deeper into each of these pillars of data asset management. Each represents one facet of solving the ever-present problem of entropy. We hope you will continue to dig into this subject with us as we look to unify your fractured data environment.

If you would like to learn more about Workstream, you can create a free account at app.workstream.io, or set up a demo.
