Imagine a searchable data management system that would enable you to review crowdsourced, categorized and classified data. Consider that this system would apply to all types of data — structured and unstructured — and become more robust as more users analyze it.
Data governance, the organization of your data and the ability to have confidence in its quality, is the searchable system I describe. Governance is also an important aspect of managing a data lake because it’s the first step in building a solid data management foundation, one that gives organizations full confidence in their data.
You can read more about the importance of a trusted foundation in this article. For now, let’s focus on why that trusted foundation is so crucial to maximizing the value of your data.
Governance can make or break your data lake
When Big Data became the standard, with its huge volumes and varied types, organizations commonly dumped their data into Hadoop-like solutions known as data lakes. These data lakes were supposed to resolve integration and agility issues for businesses.
But if nobody knows what data is in the data lake, you can’t do much with it. The data lake becomes a new problem to solve. That’s where governance comes in. A governed data lake helps you avoid the pitfalls of a data swamp and can provide business value with a dedicated purpose.
Architecture of a data lake
Governance tools validate and enhance the quality of data and are designed to protect the data from misuse. These capabilities ensure data is up to date, accessible and eventually removed at appropriate points in its lifecycle.
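To make the lifecycle idea concrete, here is a minimal sketch of how a governance rule might decide whether a data set should be kept, refreshed or removed. The function name and both thresholds (`stale_after_days`, `retain_days`) are hypothetical and chosen only for illustration; real governance tools express these rules in their own policy formats.

```python
from datetime import date


def lifecycle_action(
    last_updated: date,
    today: date,
    stale_after_days: int = 30,   # hypothetical freshness threshold
    retain_days: int = 365,       # hypothetical retention limit
) -> str:
    """Return 'keep', 'refresh' or 'remove' based on the age of a data set."""
    age_days = (today - last_updated).days
    if age_days > retain_days:
        # Past its retention period: remove it from the lake.
        return "remove"
    if age_days > stale_after_days:
        # Still retained, but no longer up to date.
        return "refresh"
    return "keep"
```

A scheduled job could call a rule like this for every cataloged data set and act on the result.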
While a data lake can offer flexible access to data, a system of governance is required to ensure the data is security-rich and useful. Effective data lakes are made up of several elements that promote self-service access throughout your organization.
There are four key building blocks of a governed data lake:
- Enterprise IT data exchange can extract, analyze and exchange data between data lakes and enterprise IT systems. It cleanses data and monitors data quality.
- Catalog services describe the data in the data lake: what it means, how it’s classified and the resulting governance requirements this places on the data.
- Governance helps manage the data in the data lake and applies quality, security and privacy policies to the data stored in the lake.
- Self-service access consists of three sets of services that provide on-demand access to the data lake. For analytics users, this enables access to raw data as it’s stored. For line-of-business teams, the service provides normalized data in simplified data structures. For risk and compliance teams, the service provides governed data for audits.
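As a rough illustration of how catalog services and self-service access fit together, the sketch below models a catalog entry and maps each of the three audiences to the form of data it receives. Every name here (`CatalogEntry`, `VIEWS`, `search`, `view_for`, and the role and policy labels) is an assumption made for the example, not part of any particular product.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical role names for the three self-service audiences listed above.
VIEWS = {
    "analytics": "raw",               # raw data as it is stored
    "line_of_business": "normalized", # simplified data structures
    "risk_compliance": "governed",    # governed data for audits
}


@dataclass
class CatalogEntry:
    """Describes one data set: what it means and how it is classified."""
    name: str
    description: str
    classification: str                                # e.g. "internal", "confidential"
    policies: List[str] = field(default_factory=list)  # governance requirements


def search(catalog: List[CatalogEntry], term: str) -> List[CatalogEntry]:
    """Return entries whose name or description contains the search term."""
    term = term.lower()
    return [
        entry for entry in catalog
        if term in entry.name.lower() or term in entry.description.lower()
    ]


def view_for(role: str) -> str:
    """Return which form of the data a given role is served."""
    if role not in VIEWS:
        raise PermissionError(f"no self-service view defined for role {role!r}")
    return VIEWS[role]
```

With a catalog like this, a line-of-business user could search for "orders", find the matching entry and its classification, and be served the normalized view rather than the raw files.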
Types of data consumers
Users consume data in different ways. Understanding the varied needs of each user group is an important aspect of data governance.
- Self-sufficient builders are developers and data scientists who manage data or incorporate analytics in the operating system of their applications.
- IT builders are data specialists and enterprise developers who optimize data quality, manage metadata or engage with analytics.
- Self-service consumers are knowledge workers who often work within a specific line of business.
- Solution consumers are line-of-business executives who create systems for specific outcomes.
What a governed data lake can do for you
Creating a governed data lake requires some effort at the outset, but the result is worth it. Once your data lake is governed, you have ongoing access to a scalable system that can handle your data as it grows. Users across the enterprise are empowered to access the data they need when they need it, and they can make decisions knowing that the data is accurate. You spend less time searching for data and get to value faster.
For more information about governed data lakes, read this ebook on governed data lakes for business insights.