Data Fabric vs Data Mesh: What’s the Difference?

As more and more processes have moved online during the pandemic, companies are applying analytics to better understand their operations. According to a 2021 survey commissioned by Starburst and Red Hat, 53% of companies believe data access has become more “critical” during the pandemic. The results are consistent with the findings of ManageEngine, Zoho’s IT division, which found in a 2021 poll that more than 20% of organizations had increased their use of business analytics relative to the global average.

Thirty-five percent of Starburst and Red Hat survey respondents said they wanted to analyze real-time business risk, while 36% said they wanted to drive growth and revenue through “more intelligent” customer engagement. But underlining the challenges in analytics, more than 37% of respondents said they were not confident in their ability to access “timely, relevant data for decision-making,” whether because of disparate storage resources or issues developing data pipelines.

Two emerging concepts have been pitched as the answer to hurdles in data analysis and management. One is a “data fabric,” a data integration approach that includes an architecture — and services running on that architecture — to help organizations orchestrate data. The other is a “data mesh,” which aims to reduce the challenges of data availability by providing a decentralized connectivity layer that allows businesses to access data from different sources in different locations.

Both data fabrics and data meshes can serve a wide variety of business, technical, and organizational purposes. For example, they can save data scientists time by automating repetitive data transformation tasks while simultaneously powering self-service data access tools. Data fabrics and data meshes can also integrate and extend data management software already in use for greater cost-effectiveness.

Data fabric

A combination of technologies including AI and machine learning, data fabric is akin to a fabric that stretches to connect data sources, types, and locations with methods of accessing the data. Gartner describes it as analyzing “existing, discoverable and derived metadata assets” to support the “design, implementation and use” of data in local, edge and data center environments.

A data fabric continuously identifies, connects, cleanses and enriches real-time data from various applications to discover relationships between data points. For example, a data fabric can monitor several data pipelines — the sequences of actions that take raw data from a source and move it to a destination — to suggest better alternatives before automating the most repeatable tasks. A data fabric can also “heal” failed data integration jobs, handle more complicated aspects of data governance, such as creating — and profiling — datasets, and provide ways to manage and secure data by restricting who can access which data and infrastructure.
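To make that concrete, here is a minimal sketch, in Python, of the kind of automation described above: retrying (“healing”) failed integration tasks and applying a simple access check before a pipeline runs. The pipeline, task and role names are hypothetical, and real data fabric products expose far richer APIs than this.

```python
# Minimal sketch of fabric-style automation: retry ("heal") failed integration
# tasks and enforce a simple access policy. All names are hypothetical.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Task:
    name: str
    run: Callable[[], bool]   # returns True on success
    max_retries: int = 3

@dataclass
class Pipeline:
    name: str
    tasks: list[Task] = field(default_factory=list)

# Toy governance rule: which roles may trigger which pipeline.
ACCESS_POLICY = {"sales_pipeline": {"analyst", "data_engineer"}}

def monitor(pipeline: Pipeline, user_role: str) -> None:
    """Run each task, retrying failures up to max_retries times."""
    if user_role not in ACCESS_POLICY.get(pipeline.name, set()):
        raise PermissionError(f"{user_role} may not run {pipeline.name}")
    for task in pipeline.tasks:
        for attempt in range(1, task.max_retries + 1):
            if task.run():
                print(f"{task.name}: succeeded on attempt {attempt}")
                break
        else:
            print(f"{task.name}: failed after {task.max_retries} attempts")

pipeline = Pipeline("sales_pipeline", [Task("load_orders", run=lambda: True)])
monitor(pipeline, user_role="data_engineer")  # load_orders: succeeded on attempt 1
```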

To uncover the relationships between data, a data fabric builds a graph that stores interconnected descriptions of data, such as objects, events, situations, and concepts. Algorithms can use this graph for a variety of business analysis purposes, such as making predictions and resurfacing previously hard-to-find datasets.
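As an illustration of that graph idea, the following sketch uses the networkx library to link toy dataset nodes to the business concepts they describe and then “resurfaces” the datasets related to one concept. The node and relationship names are invented for the example.

```python
# Toy knowledge graph of interconnected data descriptions, using networkx.
# Dataset, concept, and relationship names are invented for illustration.
import networkx as nx

g = nx.DiGraph()
g.add_node("customers_table", kind="dataset", location="warehouse")
g.add_node("orders_stream", kind="dataset", location="event_bus")
g.add_node("Customer", kind="concept")
g.add_node("Order", kind="concept")

g.add_edge("customers_table", "Customer", relation="describes")
g.add_edge("orders_stream", "Order", relation="describes")
g.add_edge("Order", "Customer", relation="placed_by")

# "Resurface" datasets related to the Customer concept by walking the graph.
related = [src for src, _, attrs in g.in_edges("Customer", data=True)
           if attrs["relation"] == "describes"]
print(related)  # ['customers_table']
```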

As K2 View, a provider of data fabric solutions, explains: “The data fabric continuously provides… data based on a 360 view of business entities, such as a particular segment of customers, a set of business products, or all stores in a specific geography… Using this data, data scientists create and refine machine learning models, while data analysts use business intelligence to analyze trends, segment customers, and perform root cause analysis. The refined machine learning model is implemented in the data fabric, to be executed in real time for an individual entity (customer, product, location, etc.) – operationalizing the machine learning algorithm. The data fabric runs the machine learning model on demand in real time and feeds it with the complete and current data of the individual entity. The machine learning output is immediately sent back to the requesting application and stored in the data fabric, as part of the entity, for future analysis.”
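A rough sketch of the on-demand scoring flow described in that passage might look like the following, where the entity store and the fetch_entity and score functions are all hypothetical stand-ins for whatever a real data fabric actually provides.

```python
# Hypothetical sketch of the flow described above: fetch an entity's current
# 360-degree data, score it on demand, and store the result with the entity.
ENTITY_STORE = {
    "customer:42": {"segment": "smb", "orders_90d": 7, "support_tickets": 1},
}

def fetch_entity(entity_id: str) -> dict:
    """Stand-in for the fabric's unified, current view of one business entity."""
    return dict(ENTITY_STORE[entity_id])

def score(entity: dict) -> float:
    """Stand-in for a trained model; here just a toy churn heuristic."""
    return min(1.0, 0.1 * entity["support_tickets"] + 0.02 * (10 - entity["orders_90d"]))

def score_on_demand(entity_id: str) -> float:
    result = score(fetch_entity(entity_id))
    ENTITY_STORE[entity_id]["churn_score"] = result  # persisted for future analysis
    return result                                    # returned to the caller

print(score_on_demand("customer:42"))  # ~0.16
```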

Data fabrics often work with a range of data types, including technical, business, and operational data. Ideally, they are also compatible with many different data delivery “styles” such as replication, streaming, and virtualization. In addition, the best data fabric solutions provide robust visualization tools that make their technical infrastructure easy to interpret, enabling companies to monitor storage costs, performance and efficiency — plus security — regardless of where their data and applications reside.

In addition to analytics, a data fabric provides organizations with a number of benefits, including minimizing interruptions when switching between cloud vendors and compute resources. With data fabric, enterprises — and the data analytics, sales, marketing, network architects and security teams that work with them — can adapt their infrastructure based on evolving technology needs, connecting infrastructure endpoints regardless of where the data is located.

In a 2020 report, Forrester found that IBM’s data fabric solution could accelerate data delivery by 60 times while increasing ROI by 459%. But data fabrics have their drawbacks, the most significant of which is implementation complexity. For example, a data fabric requires exposing and integrating disparate data and systems, which often format data differently. This lack of native interoperability can create friction, such as the need to harmonize and deduplicate data.

Data mesh

A data mesh, on the other hand, breaks down large enterprise data architectures into subsystems, each managed by a dedicated team. Unlike a data fabric, which relies on metadata to make recommendations for things like data delivery, a data mesh leverages the expertise of subject matter experts who oversee “domains” within the mesh.

“Domains” are independently deployable clusters of related microservices that communicate with users or other domains through various interfaces. A microservices architecture, in turn, is composed of many loosely coupled and independently deployable smaller services.

Domains usually contain code, workflows, a team and a technical environment, and teams working within domains treat data as a product. Clean, fresh and complete data is delivered to every data consumer based on permissions and roles, while “data products” are created to be used for a specific analytical and operational purpose.
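As a sketch of what “data as a product” can look like in code, the following hypothetical definition bundles a dataset with its schema, owning domain, freshness target and role-based access rules; actual data mesh implementations vary widely.

```python
# Hypothetical "data product" definition published by a domain team: the
# dataset travels with its schema, owner, freshness target, and allowed roles.
from dataclasses import dataclass

@dataclass
class DataProduct:
    name: str
    domain: str
    schema: dict[str, str]          # column name -> type
    allowed_roles: set[str]
    freshness_sla_minutes: int = 60

    def readable_by(self, role: str) -> bool:
        return role in self.allowed_roles

orders = DataProduct(
    name="orders_daily",
    domain="commerce",
    schema={"order_id": "string", "customer_id": "string", "total": "decimal"},
    allowed_roles={"analyst", "finance"},
)

print(orders.readable_by("analyst"))    # True
print(orders.readable_by("marketing"))  # False
```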

To add value to a data mesh, engineers must develop an in-depth understanding of data sets. They become responsible for serving data consumers and organizing around the domain – that is, testing, deploying, monitoring and maintaining the domain. In addition, they must ensure that different domains remain connected through a layer of interoperability and consistent data governance, standards and observability.

On the plus side, data meshes promote decentralization, allowing teams to focus on specific problems. They can also strengthen analytics by leading with business context rather than jargon-heavy technical knowledge.

But data meshes have their drawbacks. For example, domains can inadvertently duplicate data, wasting resources. The distributed structure of data meshes — if the data mesh is not sufficiently infrastructure-agnostic — may require more technical experts to scale than centralized approaches. And technical debt can grow as domains create their own data pipelines.

Using data meshes and fabrics

When weighing the pros and cons, it’s important to keep in mind that data mesh and data fabric are concepts – not technologies – and not mutually exclusive. An organization can adopt both a data mesh and a data fabric approach for some or all departments. For James Serra, formerly a big data and data warehousing solution architect at Microsoft, the difference between the two concepts lies in how users access data.

“A data fabric and a data mesh both provide an architecture to access data across multiple technologies and platforms, but a data fabric is technology-oriented, while a data mesh focuses on organizational change,” he writes in a blog post (via Datanami). “[A] data mesh is more about people and process than architecture, while a data fabric is an architectural approach that addresses the complexity of data and metadata in a smart way that works well together.”

Eckerson Group analyst David Wells cautions against becoming obsessed with the differences, which he says are far less important than the components that must be in place to achieve desired business objectives. “They are architectural frameworks, not architectures,” Wells writes in a recent blog post (also via Datanami). “You don’t have an architecture until the frameworks are adapted and customized to your needs, your data, your processes and your terminology.”

That’s not to say that data fabrics and data meshes won’t remain relevant for the foreseeable future. While each involves different elements, both share the same goal: to bring more analytics to an organization with a sprawling — and growing — data infrastructure.


This post was originally published at https://venturebeat.com/2022/04/08/data-fabric-versus-data-mesh-whats-the-difference/.