Modern Approaches to Data Management Using Data Governance
By 2025, the amount of data generated each day is projected to reach 463 exabytes globally, 80% of which will be unstructured. We are facing a multitude of data opportunities, such as improving data literacy, advancing in data maturity, or parsing data sprawl. As organizations scale their digital operations, issues inevitably arise with managing all the data an organization owns. Common problems include difficulty linking data from different departments, unclear data terms, ambiguous or incorrect reports caused by a lack of data source tracking, and difficulty adhering to ever-changing regulations. A lack of structure, oversight, and visibility in an organization's data architecture limits its ability to access the data and use it to its full potential. To address these roadblocks, approaches to data management have matured.
In this article, we explain three terms you may have heard when exploring modern approaches to data management: data estate, data mesh, and data fabric. Data mesh and data fabric are two data architectures that help organizations manage their ever-growing data estate. Both are supported and enabled by data governance during implementation, which helps deliver the most value from your data estate.
Every organization has a data estate. A data estate, sometimes called a data ecosystem, is all the data that an organization manages. Its goal is to serve as infrastructure that lets the organization manage and consume that data in near real time. If a data estate is all the data managed by an organization, you may be wondering: how does a data estate differ from a data lake or data warehouse?
A data estate encompasses data lakes, data warehouses, and any legacy data storage existing in the organization. An organization's data storage architecture may include data warehouses (1990-2000s), along with data lakes (2005), lake houses (2015), and GPU-accelerated databases (2016). The data estate view recognizes that these systems interoperate as a whole to provide business value, which makes it the next step in modernization. It combats the issues that arise when an organization's data storage exists as segmented units, such as a minimal ability to drill down and mine data, or having to wait for end-of-day (EOD) batches of data.
Data fabric and data mesh are two architectural methodologies that can be adopted in building out your organization's data estate. While they represent two different ways to manage an organization's data, both are most successful when implemented under the direction of data governance. A formalized governance approach helps create an effectively governed data estate by guiding people's behavior while organizational and technological change is implemented.
As an organization advances in its data maturity, it can draw on several frameworks to manage its data across storage, processes, and analysis in both cloud and on-premises architecture.
Traditional data platform architectures focus on storing static data in data warehouses and legacy data repositories, or in central enterprise data lakes. However, the data needed for processing is constantly evolving and increasing in variety, and now includes real-time, streaming, and unstructured data. Many organizations therefore want next-generation architectures that can handle both batch-oriented and streaming data, such as the Lambda and Kappa software architectures. Lambda architecture allocates separate layers for batch and streaming analytics, whereas Kappa architecture combines the two into a single technology stack, which lowers complexity. Both are examples of frameworks brought about by increasing variety in data size, type, and business need. However, these frameworks are not the end of the road. In an ever-evolving data landscape, approaches have been developed to address emerging needs in managing these frameworks.
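To make the Lambda and Kappa distinction concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions rather than a reference implementation: the page-view events, the function names, and the use of in-memory lists in place of a real batch store and stream are all hypothetical.

```python
from collections import Counter
from typing import Iterable

# Hypothetical events: (event_id, page) pairs standing in for real records.
Event = tuple[int, str]


def count_pages(events: Iterable[Event]) -> Counter:
    """Shared business logic: count views per page."""
    return Counter(page for _, page in events)


def lambda_view(historical: list[Event], recent: list[Event]) -> Counter:
    """Lambda style: a batch layer and a speed layer are computed
    separately, and a serving step merges the two results."""
    batch_view = count_pages(historical)  # recomputed periodically over history
    speed_view = count_pages(recent)      # covers events since the last batch run
    return batch_view + speed_view


def kappa_view(log: list[Event]) -> Counter:
    """Kappa style: a single streaming path; history is handled by
    replaying the full log through the same incremental update."""
    view: Counter = Counter()
    for _, page in log:
        view[page] += 1
    return view


historical = [(1, "home"), (2, "pricing"), (3, "home")]
recent = [(4, "home"), (5, "docs")]

lambda_result = lambda_view(historical, recent)
kappa_result = kappa_view(historical + recent)
```

Both paths produce the same counts here. The difference is operational: the Lambda sketch maintains two code paths that must be kept consistent, while the Kappa sketch maintains one.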
Two examples of modern management approaches to data architecture are data mesh and data fabric. Once you know your organization's data assets and business needs, these approaches are the next step in maturing your data estate: they help you decide who should have ownership over your data workflows and how best to access your data.
Fundamentally, the two differ in what they describe. Data fabric is a service fabric: it describes how data is connected and moves around your organization. Data mesh, on the other hand, describes how your data may be logically grouped together through a business domain approach.
|Data Mesh||Data Fabric|
|Domain-driven approach that distributes management of data assets to data stewards so that each domain manages its own data pipelines||Metadata-driven approach that unifies data through data governance policies instead of centralizing all data in a data lake|
|Business units manage their own data pipelines for agility||Centralized view of the data estate for easy management|
|Organizations with siloed business units||Organizations with a distributed workforce or regional segmentation|
|Documentation to facilitate inter-unit communication||Centralized service framework for data assets|
The distributed data mesh is a domain-driven management approach to accessing and transforming data from a central data lake or repository. It is a bottom-up, user-centric approach to data architecture: instead of relying on a centralized management system, teams within an organization are given ownership and management of the data sources, products, and infrastructure they use.
Because management is distributed across the teams that use and process specific segments of the organization's data, this approach allows for flexibility and agility across batch-oriented and streaming data. Giving individual teams ownership of the data they use also means that best practices are shaped by in-house knowledge and targeted to the specific teams that need to adopt them.
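The ownership model described above can be sketched as a small data structure. This is a minimal sketch, assuming hypothetical names (`Domain`, `DataProduct`, `MeshCatalog`, and the school district examples); it is not the API of any data mesh product. Each domain publishes and manages its own data products, and a shared catalog only records who owns what.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class DataProduct:
    """A data set that a domain publishes for others to use."""
    name: str
    owner_domain: str
    steward: str


@dataclass
class Domain:
    """A business domain that owns and manages its own data products."""
    name: str
    products: dict[str, DataProduct] = field(default_factory=dict)

    def publish(self, product_name: str, steward: str) -> DataProduct:
        product = DataProduct(product_name, self.name, steward)
        self.products[product_name] = product
        return product


class MeshCatalog:
    """Shared index for discovery only: it stores no data and runs no
    pipelines, so management stays with each domain (an assumption of
    this sketch)."""

    def __init__(self) -> None:
        self._domains: dict[str, Domain] = {}

    def register(self, domain: Domain) -> None:
        self._domains[domain.name] = domain

    def owner_of(self, product_name: str) -> Optional[str]:
        for domain in self._domains.values():
            if product_name in domain.products:
                return domain.name
        return None


district_a = Domain("district_a")
district_b = Domain("district_b")
district_a.publish("enrollment", steward="a.registrar")
district_b.publish("attendance", steward="b.registrar")

catalog = MeshCatalog()
catalog.register(district_a)
catalog.register(district_b)
```

Calling `catalog.owner_of("enrollment")` returns the owning domain's name, so a team in another domain knows whom to ask rather than going through a central data team.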
Organizations that may find a data mesh useful to implement include Agile DevOps teams working simultaneously on different products. Apart from uses in tech, we can see data mesh being useful for other industries like education, where functionally autonomous groups such as local school districts operating under a state system may value the freedom to manage their own data pipelines. Additionally, in the case of mergers and acquisitions with teams or departments that remain independent and will continue to use differing software vendors, a data mesh framework may help with consolidating the knowledge of data pipeline management.
A data fabric is a top-down management approach to data architecture. This architecture is driven by metadata (data that provides information about other data) and aims to create fluidity across data environments. A data fabric is a technology-centric, unified management layer that allows data storage to remain distributed. The architecture aims to deliver a single umbrella of technology that virtually overlays various data repositories while accounting for the requirements of the independent tools and systems underneath.
The centralized approach of a data fabric allows for increased accessibility, innovation, and insights, as well as enhanced regulatory compliance. It also reduces the data isolation that comes with data silos and increases the capacity for collaboration. The increased visibility offered by a data fabric allows for more comprehensive, data-driven decision making.
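As a rough illustration of a metadata-driven overlay, the sketch below keeps the data in separate stand-in stores and routes all access through one catalog of metadata. Everything named here (`AssetMetadata`, `DataFabric`, the `orders` and `clicks` assets, and the Python lists standing in for a warehouse and a data lake) is a hypothetical assumption for illustration, not a real product interface.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AssetMetadata:
    """Metadata describing one asset and how to reach it in its own system."""
    name: str
    system: str                   # where the data physically lives
    classification: str           # governance label, e.g. "public"
    read: Callable[[], list]      # system-specific access function


class DataFabric:
    """One logical access layer; the underlying data is never copied
    into a central store."""

    def __init__(self) -> None:
        self._catalog: dict[str, AssetMetadata] = {}

    def register(self, meta: AssetMetadata) -> None:
        self._catalog[meta.name] = meta

    def read(self, name: str) -> list:
        # Route the request to whichever system holds the asset.
        return self._catalog[name].read()

    def search(self, classification: str) -> list[AssetMetadata]:
        return [m for m in self._catalog.values()
                if m.classification == classification]

    def systems(self) -> set[str]:
        return {m.system for m in self._catalog.values()}


# Stand-ins for two independent storage systems.
warehouse_orders = [{"order_id": 1, "total": 40.0}]
lake_clicks = [{"page": "home"}, {"page": "docs"}]

fabric = DataFabric()
fabric.register(AssetMetadata("orders", "warehouse", "internal",
                              lambda: list(warehouse_orders)))
fabric.register(AssetMetadata("clicks", "data_lake", "public",
                              lambda: list(lake_clicks)))
```

A consumer calls `fabric.read("clicks")` without knowing which system holds the data, while the catalog still records that two different systems sit underneath.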
So, how can an organization deliver the highest value from these modern management approaches?
One of the best practices is to build and maintain data governance across an organization's data estate. Data governance provides the guidelines for data stewardship, data cataloging, and the creation of data lineage and glossaries, all of which improve the utility and accessibility of data in an organization. This increases the visibility of data operations and allows users to deliver high-value insights from data faster and more easily.
Implementation of a data mesh can take place in the form of data governance programs, including a data stewardship program to establish data owners and data managers. Additionally, by visualizing your organization's data lineage, users throughout the organization can see exactly which teams own which assets and where those assets sit in data pipelines. Data governance tools such as Collibra and Microsoft Purview help centralize all your organization's data governance policies for easy access.
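The idea of using lineage to see which teams own the assets in a pipeline can be sketched with a small graph. This is a hedged illustration only: the asset names, team names, and the `upstream` helper are hypothetical, and tools such as Collibra or Microsoft Purview expose lineage through their own interfaces, which are not shown here.

```python
# Hypothetical lineage: each asset maps to the assets it is built from.
lineage = {
    "sales_dashboard": ["sales_mart"],
    "sales_mart": ["crm_orders", "web_events"],
    "crm_orders": [],
    "web_events": [],
}

# Hypothetical ownership assigned by a data stewardship program.
owners = {
    "sales_dashboard": "analytics",
    "sales_mart": "analytics",
    "crm_orders": "sales_ops",
    "web_events": "marketing",
}


def upstream(asset: str, graph: dict[str, list[str]]) -> set[str]:
    """Return every asset that the given asset depends on,
    directly or indirectly."""
    seen: set[str] = set()
    stack = list(graph.get(asset, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen


def teams_behind(asset: str) -> set[str]:
    """Teams that own any asset feeding into the given asset."""
    return {owners[a] for a in upstream(asset, lineage)}


sources = upstream("sales_dashboard", lineage)
teams = teams_behind("sales_dashboard")
```

When a number on the dashboard looks wrong, `teams` lists the teams to contact, which addresses the source-tracking problem raised in the introduction.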
Meanwhile, a data fabric, which includes the service framework for all of the organization's data technologies, is optimized for access by standardizing data practices across data storage platforms. In that sense, effective data governance supports an effective data fabric. Running data stewardship programs, establishing a clear data catalog and data lineage, and standardizing data dictionaries and glossaries set the "guardrails" that provide the meaning and structure critical to the business operations and use cases of the data fabric. In turn, the architectural capabilities of a data fabric help support data governance principles. Existing policies and security controls are maintained at the local level, ensuring compliance regardless of where the data is accessed. A data fabric also creates the opportunity to implement enterprise-level policies, further reducing the risk of data exposure.
An important note to keep in context: data fabric and data mesh are the next level of maturity in data architecture and management, not the last. Building on a data governance framework establishes the principles that will support whatever stage of maturity comes after data fabric or data mesh. No matter how your data architecture grows and changes, governance helps your organization make sure that the right people have access to the right data at the right time.
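One way to picture local policies working alongside an enterprise-level policy is the sketch below. It is an assumption-laden illustration: the roles, asset names, classification labels, and the deny-by-default rule are all hypothetical choices made for the example, not rules taken from any governance tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    role: str
    asset: str
    classification: str


def enterprise_policy(req: Request) -> bool:
    """Hypothetical organization-wide rule: only the compliance role
    may read assets classified as restricted."""
    return req.classification != "restricted" or req.role == "compliance"


# Hypothetical local rules, each maintained by the team that owns the asset.
local_policies = {
    "hr_records": lambda req: req.role in {"hr", "compliance"},
    "web_events": lambda req: True,
}


def is_allowed(req: Request) -> bool:
    """Access requires both the local and the enterprise policy to agree."""
    local = local_policies.get(req.asset)
    if local is None:
        return False  # assumption: deny when no local policy is defined
    return local(req) and enterprise_policy(req)


hr_reads_restricted = is_allowed(Request("hr", "hr_records", "restricted"))
compliance_reads = is_allowed(Request("compliance", "hr_records", "restricted"))
analyst_reads_public = is_allowed(Request("analyst", "web_events", "public"))
```

In this sketch the HR role passes its local policy but is still blocked by the enterprise rule, which shows how an enterprise-level policy can reduce exposure without replacing the local controls.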
In conclusion, this is how data governance can meaningfully impact your organization's modern data management:
- Data Estate: Data governance makes it possible to answer questions about all your data assets by building principles, policies, and stewardship
- Data Mesh: Data governance supports logical business groupings and the functionality of their data for cross-functional use across business domains through a data catalog
- Data Fabric: Data governance supports data apps and services by mapping out how data is moving around your organization through creating data lineages
Now that the conceptual frameworks used in building a data architecture are more familiar, there are other considerations to think through when actually implementing them in your organization. To get started, look out for common pitfalls when assessing whether your company's current data management practices align with business outcomes.
Our Data Governance practice has multiple years of experience supporting organizations in building data platform solutions. We've found that data governance is often pivotal in supporting data management across an organization's data and cloud journey, be it through building and maintaining a durable data estate, organizing data in domain structures à la data mesh, or integrating services across an enterprise.
Does your organization have a more complex environment? For data architecture building and transformation, we create a tailored approach to identifying gaps and high-impact solutions for an organization. We help assess an organization's data maturity and develop a data governance program, stewardship bodies, and mechanisms for a meaningful data architecture that serves people, process, and technology. Feel free to request a workshop with our data governance experts today.