Every business today is data-driven or at least claims to be. Business decisions are no longer made on hunches or anecdotal trends as they were in the past. Concrete data and analytics are now the foundation for companies’ most critical decisions.
As more companies use machine learning (ML) and artificial intelligence (AI) to make critical choices, there needs to be a conversation about the quality of the data these tools use, meaning its completeness, consistency, validity, timeliness and uniqueness. The insights companies expect from ML- or AI-based technologies are only as good as the data behind them. The old saying “garbage in, garbage out” applies squarely to data-based decisions.
Poor data quality makes data ecosystems more complex and undermines long-term decision-making. By Gartner’s estimate, it costs organizations an average of about $12.9 million every year. As data volumes continue to grow, so will the challenges businesses face in validating and remediating their data. To resolve data quality and accuracy issues, it is critical to first understand the context in which the data elements will be used, as well as the best practices that should guide such initiatives.
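To make the quality dimensions named in the introduction more concrete, here is a minimal Python sketch that scores a tiny, made-up contact list on four of them. The field names, the email pattern and the 90-day freshness window are assumptions chosen for illustration, not a standard. Consistency is left out because it usually needs a second source to compare against.

```python
import re
from datetime import date

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"id": 1, "email": "ana@example.com", "updated": date(2022, 4, 1)},
    {"id": 2, "email": "", "updated": date(2022, 3, 15)},
    {"id": 3, "email": "not-an-email", "updated": date(2021, 1, 10)},
    {"id": 4, "email": "ana@example.com", "updated": date(2022, 4, 20)},
]

# Deliberately loose pattern: something@something.something
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def quality_report(rows, as_of, max_age_days=90):
    """Return the share of rows (0.0 to 1.0) that pass each simple check."""
    total = len(rows)
    emails = [r["email"] for r in rows if r["email"]]
    fresh = [r for r in rows if (as_of - r["updated"]).days <= max_age_days]
    return {
        # Completeness: the email field is filled in at all.
        "completeness": len(emails) / total,
        # Validity: the value looks like an email address.
        "validity": sum(1 for e in emails if EMAIL_PATTERN.match(e)) / total,
        # Uniqueness: distinct values among the filled-in emails.
        "uniqueness": len(set(emails)) / len(emails) if emails else 1.0,
        # Timeliness: the row was updated within the freshness window.
        "timeliness": len(fresh) / total,
    }


report = quality_report(records, as_of=date(2022, 4, 24))
for dimension, score in report.items():
    print(f"{dimension}: {score:.2f}")
```

Which of these scores matters, and what counts as good enough, depends on who uses the data, which is the subject of the first challenge below.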
1. Data quality is not a one-size-fits-all undertaking
Data initiatives are not specific to a single business driver. In other words, determining data quality will always depend on what a company is trying to achieve with that data. The same data can affect more than one business unit, function, or project in very different ways. In addition, the list of data elements that require strict governance may vary among different data users. Marketing teams, for example, would need a highly accurate and validated email list, while R&D would be invested in quality user feedback data.
The team best placed to determine the quality of a data element is therefore the team closest to the data. Only they can see how the data supports business processes and, ultimately, judge its accuracy based on what the data is used for and how.
2. What you don’t know can hurt you
Data is a business asset. However, actions speak louder than words, and not everyone within a company does everything they can to ensure that the data is correct. If users don’t see the importance of data quality and management, or simply don’t prioritize them as they should, they will neither anticipate the data issues that careless data entry causes nor raise their hands when they find a data problem that needs to be resolved.
In practice, this can be addressed by tracking data quality metrics as a performance goal, which promotes greater accountability among those directly involved with data. In addition, business leaders must champion their data quality program. They should align with key team members on the practical implications of poor data quality. These include misleading insights shared in inaccurate reports for stakeholders, which can potentially lead to fines or sanctions. Investing in better data literacy can help organizations create a culture of data quality and avoid careless or misinformed mistakes that hurt profits.
3. Don’t try to boil the ocean
It is not practical to work through a long laundry list of data quality problems, nor is it an efficient use of resources. The number of data elements active within a given organization is huge and growing exponentially. It’s best to start by defining the organization’s Critical Data Elements (CDEs): the data elements that are integral to the core function of a specific company. CDEs differ from business to business, although some are widely shared. Net income, for example, is a CDE for most companies because it is important for reporting to investors and other stakeholders.
Since every company has different business goals, business models and organizational structures, each company’s CDEs will be different. For example, in retail, CDEs can relate to design or sales. On the other hand, healthcare companies will be more interested in ensuring the quality of regulatory compliance data. While this is not an exhaustive list, business leaders should consider asking the following questions to help define their unique CDEs: What are your critical business processes? What data is used within those processes? Are these data elements involved in regulatory reporting? Are these reports audited? Will these data elements guide initiatives in other departments within the organization?
By validating and remediating only the most important elements, organizations can scale their data quality efforts in a sustainable and resourceful way. Ultimately, an organization’s data quality program will reach a maturity level where frameworks (often with some degree of automation) categorize data assets based on predefined criteria to eliminate inconsistencies across the enterprise.
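The questions above can be turned into a rough screening checklist. The sketch below is one possible way to do that in Python, assuming each question is answered with a simple yes or no, every "yes" counts equally, and an element with at least two "yes" answers is treated as a CDE candidate. The element names, the equal weighting and the threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataElement:
    """One data element with yes/no answers to the screening questions."""
    name: str
    in_critical_process: bool
    in_regulatory_report: bool
    report_is_audited: bool
    used_by_other_departments: bool


def cde_score(element):
    """Count the 'yes' answers; equal weighting is an assumption."""
    return sum([
        element.in_critical_process,
        element.in_regulatory_report,
        element.report_is_audited,
        element.used_by_other_departments,
    ])


def select_cdes(elements, threshold=2):
    """Return candidate CDE names, highest score first."""
    ranked = sorted(elements, key=cde_score, reverse=True)
    return [e.name for e in ranked if cde_score(e) >= threshold]


# Hypothetical inventory for illustration.
elements = [
    DataElement("net_income", True, True, True, True),
    DataElement("customer_email", True, False, False, True),
    DataElement("office_plant_count", False, False, False, False),
]

cdes = select_cdes(elements)
print(cdes)
```

A real program would likely weight regulatory exposure more heavily than the other answers. The point is only that writing the criteria down makes the prioritization repeatable.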
4. More Visibility = More Accountability = Better Data Quality
Companies create value by knowing where their CDEs are, who has access to them and how they are used. Essentially, a company cannot identify its CDEs if it doesn’t have proper data governance in place to begin with. However, many companies struggle with unclear or non-existent ownership of their data stores. Defining ownership before adding more data stores or resources promotes commitment to quality and usability. It also makes sense for organizations to set up a data governance program in which data ownership is clearly defined and people can be held accountable. This could be as simple as a shared spreadsheet recording who owns each data element, or it could be managed by an advanced data management platform.
Just as organizations must model their business processes to improve accountability, they must also model their data in terms of data structure, data pipelines and how data is transformed. Data architecture attempts to model the structure of an organization’s logical and physical data assets and data management resources. Creating this type of visibility gets at the heart of the data quality problem: without understanding the lifecycle of data (when it is created, how it is used and transformed, and how it is finally output), it is impossible to ensure true data quality.
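As a sketch of the simple end of that spectrum, the snippet below treats an ownership spreadsheet as CSV text and flags data elements that have no named owner. The column names, addresses and teams are invented for the example. A real spreadsheet export would be read from a file rather than from an inline string.

```python
import csv
import io

# Hypothetical export of a shared ownership spreadsheet.
OWNERSHIP_CSV = """data_element,owner,steward_team
net_income,finance-lead@example.com,Finance
customer_email,,Marketing
user_feedback,rd-lead@example.com,R&D
"""


def load_ownership(csv_text):
    """Parse the sheet into {data_element: row} using the header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["data_element"]: row for row in reader}


def unowned_elements(registry):
    """List elements whose owner cell is blank, sorted by name."""
    return sorted(
        name for name, row in registry.items() if not row["owner"].strip()
    )


registry = load_ownership(OWNERSHIP_CSV)
missing = unowned_elements(registry)
print("Elements without an owner:", missing)
```

Running a check like this before a new data store is added is one lightweight way to keep ownership from going stale.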
5. Data overload
Even when data and analytics teams have established frameworks to categorize and prioritize CDEs, thousands of data elements remain that must be either validated or remediated. Each of these data elements may require one or more business rules specific to the context in which it will be used. However, those rules can only be assigned by the business users who work with those unique data sets. Therefore, data quality teams will need to work closely with subject matter experts to identify rules for each unique data element, which can be extremely time-consuming even when the elements are prioritized. This often leads to burnout and overload within data quality teams, as they are responsible for manually writing a large number of rules for a variety of data elements. Organizations need to set realistic expectations about their team members’ data quality workload. They may consider expanding their data quality team and/or investing in tools that leverage ML to reduce manual work on data quality tasks.
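To illustrate what "one or more business rules per data element" can look like in practice, here is a small Python sketch in which each rule is a plain function registered against the element it checks. The element names and the rules themselves are hypothetical. In a real program they would come from the subject matter experts described above.

```python
from collections import defaultdict

# Maps a data element name to the list of rule functions that apply to it.
RULES = defaultdict(list)


def rule(element):
    """Decorator that registers a check function for one data element."""
    def register(func):
        RULES[element].append(func)
        return func
    return register


@rule("order_total")
def non_negative(value):
    # Hypothetical rule: totals must be present and not below zero.
    return value is not None and value >= 0


@rule("country_code")
def two_letter_upper(value):
    # Hypothetical rule: two uppercase letters, such as "US" or "DE".
    return isinstance(value, str) and len(value) == 2 and value.isupper()


def validate(rows):
    """Return (row_index, element, rule_name) for every failed check."""
    failures = []
    for index, row in enumerate(rows):
        for element, checks in RULES.items():
            for check in checks:
                if not check(row.get(element)):
                    failures.append((index, element, check.__name__))
    return failures


rows = [
    {"order_total": 19.99, "country_code": "US"},
    {"order_total": -5, "country_code": "usa"},
    {"order_total": None, "country_code": "DE"},
]

failures = validate(rows)
for failure in failures:
    print(failure)
```

Even in this toy form, every rule has to be written and maintained by hand, which is why the number of rules grows quickly with the number of data elements.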
Data is not just the new oil of the world: it is the new water of the world. Organizations may have the most complex infrastructure, but if the water (or data) flowing through those pipelines isn’t drinkable, it’s useless. People who need this water need to have easy access to it, they need to know it’s usable and not contaminated, they need to know when the supply is low and, finally, the suppliers/gatekeepers need to know who has access to it. Just as access to clean drinking water helps communities in several ways, improved data access, mature data quality frameworks, and a deeper data quality culture can protect data-dependent programs and insights, driving innovation and efficiency within organizations around the world.
JP Romero is a technical manager at Kalypso.
This post, Top 5 Data Quality and Accuracy Challenges and How to Overcome Them, was originally published at “https://venturebeat.com/2022/04/24/top-5-data-quality-accuracy-challenges-and-how-to-overcome-them/”