Why you should be concerned about data observability


Imagine leading a customer success operations team responsible for preparing a weekly report for the CEO with customer churn data and analytics.

You deliver the report, only to be notified of issues with the data minutes later. It doesn’t matter how strong the ETL pipelines are or how often the team reviews the SQL queries – the data just isn’t reliable. This puts you in the awkward position of repeatedly going back to leadership and telling them that the information you just provided was wrong. These interactions erode the CEO’s confidence not only in the data, but also in the conclusions you draw from it. Something has to change.

In today’s business landscape, many companies manage petabytes of data. This is a larger volume than most people can comprehend – let alone manage – without a methodology for thinking about the health of datasets.

Observability is a well-known concept

So how do you even begin to manage the health of such large datasets? Think of a car. A car is a complex system, and the actions you would take to fix a flat tire are different from those for engine trouble. Fortunately, you don’t have to inspect the entire vehicle every time something goes wrong. Instead, you rely on tire pressure sensors and indicator lights to warn you, usually before there are serious consequences, not just that there is a problem, but which part of the car is affected. This kind of automatic surfacing of problems is called observability.

In software engineering, this concept exists up and down the stack. In DevOps, for example, an alert and an easy-to-read dashboard give the engineer a head start on solving a problem. Companies like New Relic, Datadog and Dynatrace help software engineers quickly get to the root of problems in complex software systems. This is infrastructure observability. Further up the stack, at the AI and machine learning model layer, other companies give machine learning engineers visibility into how their production models perform in ever-changing environments. This is machine learning observability.

What infrastructure observability does for software systems and machine learning observability does for machine learning models, data observability does for the health of datasets. These disciplines work together, and you often have to rely on more than one of them to solve a problem.

What is data observability?

Data observability is the discipline of automatically surfacing the health of your data and resolving any issues as quickly as possible.
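To make this concrete, the core idea can be sketched as a handful of automated health checks that run whenever a dataset is loaded. The table, thresholds, and function names below are hypothetical illustrations, not any vendor’s API; real data observability platforms offer far richer, managed versions of checks like these.

```python
# Minimal sketch of automated data health checks (hypothetical names and
# thresholds; real platforms manage, learn, and alert on these automatically).
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class HealthCheckResult:
    check: str
    passed: bool
    detail: str


def check_freshness(last_loaded_at: datetime,
                    max_age_hours: int = 24) -> HealthCheckResult:
    """Flag a dataset whose most recent load is older than expected."""
    age = datetime.now(timezone.utc) - last_loaded_at
    ok = age <= timedelta(hours=max_age_hours)
    return HealthCheckResult("freshness", ok, f"last load {age} ago")


def check_volume(row_count: int, expected_min: int) -> HealthCheckResult:
    """Flag a suspiciously small batch, e.g. a partial upstream export."""
    ok = row_count >= expected_min
    return HealthCheckResult(
        "volume", ok, f"{row_count} rows (expected >= {expected_min})")


def check_null_rate(null_count: int, total: int,
                    max_rate: float = 0.01) -> HealthCheckResult:
    """Flag a key column whose null rate exceeds a tolerance."""
    rate = null_count / total if total else 1.0
    ok = rate <= max_rate
    return HealthCheckResult("null_rate", ok, f"{rate:.2%} nulls")


if __name__ == "__main__":
    # Example run against a hypothetical churn table's load metadata.
    results = [
        check_freshness(datetime.now(timezone.utc) - timedelta(hours=2)),
        check_volume(row_count=9_500, expected_min=10_000),
        check_null_rate(null_count=40, total=10_000),
    ]
    for r in results:
        print(f"[{'OK' if r.passed else 'ALERT'}] {r.check}: {r.detail}")
```

The point of the sketch is the shape of the discipline: checks run automatically, and an alert tells you not just that something is wrong, but which property of which dataset failed, much like a car’s indicator lights.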

It’s a fast-growing area with established players like Monte Carlo and Bigeye, as well as a crop of newcomers like Acceldata, Databand and Soda. The software infrastructure observability market, which is more mature than the data observability market, was estimated at more than $5 billion in 2020 and has likely grown significantly since then. While the data observability market is not as well developed yet, it has plenty of room to grow, since it serves different personas (data engineers versus software engineers) and solves different problems (datasets versus web applications). Companies focused on data observability have collectively raised more than $250 million to date.

Why companies should care

Every company these days is a data company. This can take many forms, from a technology company that collects user data to better recommend content, to a manufacturing company that maintains large internal datasets about safety systems, to a finance company that makes important investment decisions based on data from third-party vendors. Today’s technology trends, from digital transformation to the shift to cloud computing and data storage, only amplify the influence of data.

Given the heavy reliance of organizations on data, problems with that data can penetrate deep into the enterprise and affect customer service, marketing, operations, sales, and ultimately revenue. When data drives automated systems or mission-critical decisions, the stakes can multiply.

If data is the new oil, it is critical to monitor and maintain the integrity of this precious resource. Just as most of us wouldn’t put tape over the check engine light, companies that rely heavily on data need to pay attention to data observability alongside infrastructure and AI observability.

As datasets grow and data systems become more complex, data observability will be a critical tool for realizing maximum business value and sustainability.

Aparna Dhinakaran is co-founder and CPO at Arize AI, a provider of machine learning observability. She was recently named to the 2022 Forbes 30 Under 30 list in Enterprise Technology and is a member of the Cognitive World think tank on enterprise AI.


This post, “Why you should be concerned about data observability,” was originally published at https://venturebeat.com/2022/03/26/why-you-should-care-about-data-observability/