Diagnosing Your Data Pains

5 Min Read

What is a Data Pain?

The storyline may sound familiar: as an executive, you have a new set of aggressive business goals for the coming fiscal year, including how to incorporate Gen AI applications and measure the ROI of each of them. To achieve those goals, your data team is expected to develop KPI tracking, customized insights, and benchmarking to drive performance initiative by initiative. However, the necessary “flywheel” of data-driven decision-making simply doesn’t exist in your organization. Discussions are delayed, shallow, and often generate more questions than answers.

Despite significant investment in in-house data resources, both people and tools, these expensive resources are scattered across different teams. Their work is not aligned, nor is it efficient, and your staff lacks the motivation to tackle larger, often undefined, issues. You are confident that you possess the right data assets and resources to realize your business goals and outpace the competition, yet there is a persistent drag on your data efforts that you cannot quite identify. These are all very real data pains that require attention in your organization.

It’s time to re-evaluate your Data Strategy.

Your Data Strategy is the network effect of an optimal setup of Data Architecture, Data Tech Stack, Data People, and Data Culture, one that creates marginal gains for the business in real dollars. No two organizations will have the same Data Strategy, as each one reflects its own Market, Product, Org Alignment, and unique business goals.

With over 20 years of experience working in Data across start-ups, scale-ups, and large corporations, I’ve established individualized Data Strategies that cut expenses and raise the top line (revenue), all while curing the myriad Data Pains that materialize time and again. Despite new disciplines, new data tools, and new technological breakthroughs, it is your Data Strategy that will unlock or restrict the full strength of your data assets (read: AI/ML, Gen AI, and Monetization) to differentiate your business.

As a CEO, how do you know your Data Strategy is the right one?

A CEO does not need to understand the nuances of a semantic layer in your Data Strategy, but they will understand the pain of requesting a basic KPI and having to wait weeks for the response. Additionally, if your Head of Data (HOD) is not evolving the systems and processes with each point of friction, that is a clear sign you need an HOD who understands how to evolve the Data Organization as part of refreshing (or building) your Data Strategy.

This paper will dive into a few common data pains and indicate how your Data Strategy can and should allow for richer, faster, and more frequent discussions around experiments in your business that drive growth. 

Common Data Pains & Their Cure

Data Pain #1: Gen AI & Your Data

Challenge: You want to integrate Gen AI applications at an enterprise level, but you are unsure which Large Language Model (LLM) provider is best and how to layer one on top of your existing data.

Solution: I have lived in NYC for 20 years. I often find landlords will “spruce up” apartments in between tenants by painting over everything: light switches, radiators, etc. While that may work for some time, eventually you start to see “crusty corners” of decades of paint layers and you can tell maintenance is not a priority in that building.  

Similarly, establishing your organization’s data cleanliness, scalability, and quality should be a precursor to any Gen AI application integration. Maintenance should follow as an always-on part of the work. Before you can assess the quality of the LLMs, you have to assess the quality of the data you plan to feed them. This effort falls under the Data Architecture pillar of your Data Strategy. Notice that every Gen AI app demo starts and ends with clean, aggregated data, so that retrieval is both fast and available through natural language. Both are required for the audience to be impressed.

An assessment of your semantic layer should take place before you start thinking about Gen AI applications. Why? Costs and accuracy (inaccuracies are known as “hallucinations” in the Gen AI world). Having a Gen AI application read the cleanest, most organized version of your data will reduce costs, as you won’t burn through cash on GPU compute. You’ll also guard against hallucinations, which can prove embarrassing (and potentially harmful) if presented to clients, your board, or even your investors. Gen AI, while helpful, will be less helpful if your inputs are low quality. Over the last decade, I’ve watched several companies’ data stores grow with little attention paid to pruning them for efficiency or future needs. Adding tooling on top of a fragmented database will assuredly lead to crusty corners.

Refactor your semantic layer, carve out time for data debt, and allow for regular checks within quality assurance as a part of your Data Strategy refresh as well as day to day data operations. 
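To make the idea concrete, here is a deliberately tiny sketch of what a semantic-layer transform does; the table, field names, and business rule below are hypothetical, and in practice this logic would live in SQL or a modeling tool rather than application code:

```python
from collections import defaultdict

# Hypothetical raw event rows as they might land from an application.
raw_events = [
    {"user": "a", "ts": "2024-03-01", "amount_cents": 1999, "status": "paid"},
    {"user": "a", "ts": "2024-03-01", "amount_cents": 1999, "status": "paid"},  # accidental duplicate
    {"user": "b", "ts": "2024-03-01", "amount_cents": 500,  "status": "refunded"},
    {"user": "b", "ts": "2024-03-02", "amount_cents": 2500, "status": "paid"},
]

def build_semantic_layer(events):
    """Clean, deduplicate, and aggregate raw events into a business-ready table.

    Business definition applied here (illustrative): "net revenue" counts only
    paid events, deduplicated on (user, date, amount).
    """
    seen = set()
    daily = defaultdict(int)  # date -> net revenue in cents
    for e in events:
        key = (e["user"], e["ts"], e["amount_cents"])
        if key in seen or e["status"] != "paid":
            continue  # drop duplicates and non-revenue events
        seen.add(key)
        daily[e["ts"]] += e["amount_cents"]
    # BI tools (and Gen AI apps) read this tidy table instead of raw events.
    return [{"date": d, "net_revenue_cents": v} for d, v in sorted(daily.items())]

print(build_semantic_layer(raw_events))
```

The point of the sketch is the shape of the output: a small, clean, aggregated table with business definitions baked in, which is exactly what an LLM can retrieve cheaply and accurately.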

This philosophy also applies to qualitative data. More sophisticated workflows, such as Agentic AI, lean on crisp, clear, and thorough documentation of technical workflows (think technical specs, API, and SDK documentation) to come up with their next logical step. So if you want to create the flywheel of efficiency vis-à-vis Gen AI, you’ll first need a quantitative and qualitative base that gives you the highest jumping-off point possible.

To summarize, it likely won’t matter which Gen AI LLM provider you go with if you don’t have an optimal semantic layer of data in your Data Architecture for them to pull from. 

Data Pain #2: The Surging Costs of Data Tools

Challenge: You have invested six, or potentially seven, figures into your data tech stack, and costs are rising by double-digit percentages year over year. You can’t put your finger on what added value comes with the hefty price tags.

Solution: Your business can survive, even thrive, without every tool from the modern data tech stack. Why? Most cloud-based tools already provide features that were once available only in specialized products. But let’s rewind for a moment and ask: what is a “modern data tech stack?” A modern data stack allows for discoverability, security, accessibility, analytics layering, and data interoperability (a growing theme). Major cloud-based data providers (Hyperscalers: Google Cloud Platform, Amazon Web Services, and Microsoft Azure; Analytics Layer: Snowflake and Databricks) already deliver these capabilities through built-in features or connectors.

There are some exceptions that can create significant efficiencies for the right organization. Take Sigma, for example: this modern BI tool allowed one of my clients to slash dashboarding time, because our stakeholders wanted to pivot the data they already had. We then repurposed the data team’s time to produce company-wide insights for Product Development.

To understand your Data Tech Stack investment, start with a data flow audit: map source data to end applications, along with all the pipelines, tools, and costs for each data flow. Your HOD should be familiar with each tool, know where there is room to negotiate costs down, and know the alternatives for each tool, keeping switching costs in mind.
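A data flow audit can start as something as simple as a structured inventory. The sketch below is illustrative only; the tool names, flows, and dollar figures are made up:

```python
from collections import Counter

# Each entry maps one data flow to the tool that powers it and its annual cost.
flows = [
    {"source": "app_db",    "tool": "ETL SaaS", "destination": "warehouse",  "annual_cost": 40_000},
    {"source": "crm",       "tool": "ETL SaaS", "destination": "warehouse",  "annual_cost": 15_000},
    {"source": "warehouse", "tool": "BI tool",  "destination": "dashboards", "annual_cost": 25_000},
]

def cost_by_tool(flows):
    """Roll up annual spend per tool so the HOD can see where to negotiate."""
    totals = Counter()
    for f in flows:
        totals[f["tool"]] += f["annual_cost"]
    return dict(totals)

print(cost_by_tool(flows))  # e.g. {'ETL SaaS': 55000, 'BI tool': 25000}
```

Even at this fidelity, the roll-up makes the negotiation targets and redundant flows visible at a glance.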

Considering your unique Data Audit, your ideal tool inventory should contain one from each of the following categories:

  1. Hyperscaler / Cloud DB

  2. ETL & Reverse ETL Tool

  3. Data Modeling Tool - optional, but can be useful in the right organization

  4. Analytics/Compute Layer - optional, but can be powerful

  5. BI Tool 

  6. Gen AI Infrastructure App - optional, and should be after your semantic layer is cleaned, unified, and structured for your business outcomes.

So to summarize, you will likely require three tools to start and can layer on more when the expected ROI of each new tool is positive. The good news is that data tools are always trying to edge out the competition in categories they don’t already own, so you can really push the edges of each tool instead of stacking tools for tools’ sake. Using the above methodology and knowing how to negotiate, I was able to bring the next year’s costs down by 25% across the tool stack I oversaw at company X.
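The "layer on when expected ROI is positive" test can be reduced to simple arithmetic. The numbers below are illustrative assumptions, not benchmarks:

```python
def tool_roi(annual_cost, hours_saved_per_year, loaded_hourly_rate):
    """Expected first-year ROI of a new tool: (value created - cost) / cost."""
    value = hours_saved_per_year * loaded_hourly_rate
    return (value - annual_cost) / annual_cost

# Illustrative numbers only: a $30k/yr tool that saves 500 analyst-hours at $100/hr.
print(f"{tool_roi(30_000, 500, 100):.0%}")  # 67% -> positive, worth layering on
# The same tool saving only 200 hours would be a net loss:
print(f"{tool_roi(30_000, 200, 100):.0%}")  # -33% -> hold off
```

Real ROI estimates should also price in switching costs and training time, but even this back-of-the-envelope version forces the "what value comes with the price tag?" conversation.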

Data Pain #3: Generating Dollars with My Data

Challenge: You believe your data is unique and powerful enough to generate additional revenue by repackaging it for external parties. 

Solution: This particular challenge sits across your Data Architecture and Data Culture pillars.  Here’s the hard truth: In the short run, you are more likely to save money with your data than generate external revenue. As such, your foundational data years are better spent building a strong data culture that optimizes internal operations.

To successfully monetize your data externally through a SaaS-style Data Product, your company needs a few key pillars in place:

  1. Unique Data Position 

    • Be at the intersection of a marketplace to see both buy & sell side activity;

    • Hold more than 50% of activity at either end of the marketplace; or

    • Possess 1st party data on an Industry that is regulated or difficult to track

  2. Proprietary Data Product

    • Develop a unique Data Product from your proprietary data, which can take 6 to 12 months. 

  3. Separate Infrastructure

    • Implement a dedicated database instance and ETL layer that will support the data extraction. 

  4. Quality Control

    • Establish quality control on both the data itself and the ETL layer. 

  5. Customer-Facing Delivery

    • Embed a BI vendor, along with governance and visit-tracking layers, into your customer-facing application (website or app).

  6. Customer Support

    • Provide a support line for any issues or feature/product requests. 

Apart from the self-serve approach above, I’ve personally driven much success with customized data stacks for the highest-paying customers, but the effort is sizable, and revenue is slow to materialize.

I do want to emphasize that optimizing for maximum organizational efficiency is “profit generation” in the same way financial pundits view paying down debt as a form of saving. I will wrap this section up by noting that while it is possible for your data to make dollars, it’s more likely for your data to make sense (or save cents) as a first step.

Data Pain #4: Making Your Data Team Proactive

Challenge: As a C-suite executive, you notice your data team’s efforts are more reactive than proactive. You also see no top-down strategy for improving your Data Strategy’s pillars. Your top person is loyal but is unfortunately not able to think ahead and scale the business the way you know is needed.

Solution: A reactive Data Team is usually a symptom of the problem rather than the problem itself. Why? I once worked at Company X, which acquired Company Y. I asked two analysts, one from each company, to pull information on a shared client: same type of data, same-day ask, and same parameters. One query was 300 lines, and one was 3. That is the difference between a few minutes and a few days, and a many-fold difference in error potential.

That said, I would come back to the Data Strategy map to understand where the organization, not just the Data Team, can improve on the four pillars: 1) Data Architecture, 2) Data Tools, 3) Data People, and 4) Data Culture. All four pillars work in harmony and create network effects. In our practice, we break these down into sub-pillars that further drive efficiency. If you had to start somewhere, I recommend assessing the semantic layer in your Data Architecture pillar. The semantic layer is the layer of data that is cleaned, unified, aggregated, and has business definitions applied so it can be used in BI tools. Most start-up and scale-up organizations overlook this step, and it leads to more reactive than proactive work. I’ve seen massive gains from simple workflows: compressing your dashboard footprint, democratizing data rather than dashboards, cleaning up the semantic layer, and creating a master catalog for reference. These seemingly elementary tasks can scale the work of each team by 15–30% over the course of 12 months (a real-life example I put into play at company X), and that’s before we consider the efficiency gained from Gen AI applications.

You can then repurpose that time into more strategic projects. To further make your Data Team proactive, I ask leaders whether the Data Team is in the room from the inception of a project or pulled in only when data is needed. Being treated as a thought partner goes a long way in building reciprocal, proactive relationships.

Data Pain #5: Customer Churn 

Challenge:  You notice a “leaky bucket” of customers with alarmingly high churn. You want to understand what’s happening, but the process is slow and piecemeal. As CEO, you want a data-driven story with actionable items.

Solution: To study and pre-empt churn, you need a few criteria in place: 1) sufficient data that provides insight into what’s happening; 2) awareness that churn can be studied from multiple POVs, each valuable for action items; and 3) a cross-functional group that can enact change to collect additional data points for the iterative study of churn. Ideally, you pull in influencers who can drive action items within their functional teams.

A study such as churn should follow the Crawl, Walk, Run, Fly iterative approach:

  1. Crawl: Dashboard where you are today with available data. Distribute the report daily or weekly to key stakeholders. 

  2. Walk: Identify insights as to why churn is higher than expected. Analyze data both forward from onboarding and backward from the date of churn. Partner with cross-functional teams (Sales, Programs, Product) to develop and measure improvement programs using A/B testing.

  3. Run: Collect more data based on the gaps of Crawl and Walk. Build a more advanced decision science model to predict churn.

  4. Fly: Once stabilized, add your new churn predictor by account to a report or an internal user interface (personalization) to increase the efficiency of your commercial teams. 
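The Crawl stage really can start this small. Below is a minimal sketch of the first dashboard metric; the field names and customer records are hypothetical:

```python
# Hypothetical customer records for one month.
customers = [
    {"id": 1, "active_start_of_month": True,  "churned_this_month": False},
    {"id": 2, "active_start_of_month": True,  "churned_this_month": True},
    {"id": 3, "active_start_of_month": True,  "churned_this_month": False},
    {"id": 4, "active_start_of_month": False, "churned_this_month": False},  # new signup, excluded
]

def monthly_churn_rate(rows):
    """Churn rate = customers lost this month / customers active at month start."""
    base = [r for r in rows if r["active_start_of_month"]]
    lost = [r for r in base if r["churned_this_month"]]
    return len(lost) / len(base) if base else 0.0

print(f"{monthly_churn_rate(customers):.1%}")  # 33.3%
```

Distributing even this one number daily or weekly creates the shared baseline that the Walk, Run, and Fly stages then refine with segmentation, A/B-tested programs, and predictive models.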

Using the above approach, Decision Science Institute was able to identify actions that reduced customer churn by almost 75%. I should mention that a realistic expectation is 6 to 12 months to reach the Fly stage, so having a clear plan and tracking progress is essential for managing expectations.

Conclusion

While I am not a physician, I am an M.D.—a Managing Director—of my own company, and my life’s work is diagnosing and treating organizations’ unique data pains. My passion is to help you find and implement the right Data Strategy.

Remember, having the right Data Strategy is only part of the equation. Your Head of Data must execute and iterate on this strategy over time. All the solutions detailed in this paper are iterative; finding the sweet spot for your unique organization will require time and experimentation. If you are open to experimenting for the better, let’s chat.