In today’s digital world, one of the most difficult things to build is a sound data governance framework, specifically around data sovereignty, privacy, security and access. Businesses are starting to realize the value of their data and are incorporating it as a core competency within their daily operations and job functions. Data is no longer an afterthought but a reality; the questions enterprises and startups are starting to ask are:
- How do I manage data?
- How do I secure data?
- How do I build data infrastructure?
- How do I stay compliant?
- How do I best utilize the data?
These are some basic questions to get the journey started. We at Caserta are here to help you with that journey through guidance, advisory, architecture and implementation. From experience, the biggest hurdle companies face is data governance.
Recently I went through this journey with a well-known health food brand based in California that needed help building a compliance model and implementing a solution for the brand-new California Consumer Privacy Act (CCPA), which passed this year. With a lean team and a handful of resources, we were tasked with the herculean effort of making their data lake CCPA compliant in 4 months.
Since this dealt with locality (sovereignty) within the state of California, one of the first things I asked was, “I’d like to see your data governance model and your procedures on how you secure and serve data.” The model was very minimal, and data access was not well defined. They had been collecting data in their data lake for several years, and one of their biggest challenges was retrofitting a process that anonymized all the PII/PCI in the data lake so it would become compliant.
The first thing I was tasked with was building a data catalog and documenting all the data sources within the data lake, creating a centralized standard for understanding what was being collected. Ownership and accountability are key components when applying governance methodologies. Assigning data stewards to be the liaisons and subject matter experts for what comes in and out of the data lake is an integral part of the process. Communicating with the tech, product and business stakeholders to control what’s called “garbage in, garbage out” and translating it into legitimate business requirements is a skill not for the faint of heart. Such people may sound like unicorns, but they do exist. Most of the time stewards have a technical background with a thirst for strategy and are extroverted in nature, since the bulk of their job is interaction across teams.
Once those details were resolved, the team and I received buy-in from the higher-level executives to architect, build and execute a solution. The biggest takeaway was that once the business realized that CCPA was not just a data problem but a business problem, they were on board, along with the security and legal teams.
Process flow of what was implemented:
Definitions and Context
- Data Catalog: Metadata store that houses all information about stored objects in the data lake.
  - Header Names
  - Header Types
  - Structure Types
  - Classification (Non PII/PII)
  - ETL/ELT Audit Logs
- Staging: A temporary landing zone for transient objects. It usually acts as a placeholder for objects in transit until business rules, if any, are applied.
- Anonymizer: Distributed application that data in transit passes through. It identifies PII by reference to the data catalog, then de-identifies the object by calling the secured ID service to assign it a GUID and scrubbing the PII.
- PII Secured Bucket (Optional): A separate, KMS-secured PII store that houses all raw objects containing PII “as is”. Within the matrix below, this would have the highest security level (SHR).
- Data Lake: Fully anonymized cloud object store
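The catalog and anonymizer definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used on the project: the `orders` source, its headers, the `assign_guid` stand-in for the secured ID service, and the choice to reject headers missing from the catalog are all assumptions made for the example.

```python
import uuid

# Hypothetical catalog entry for one source: header name -> type and PII flag.
CATALOG = {
    "orders": {
        "email": {"type": "string", "pii": True},
        "zip_code": {"type": "string", "pii": True},
        "sku": {"type": "string", "pii": False},
        "quantity": {"type": "int", "pii": False},
    }
}


def assign_guid():
    """Stand-in for the secured ID service (assumed to return a GUID string)."""
    return str(uuid.uuid4())


def anonymize(source, record):
    """Split one staged record into (scrubbed, raw) copies sharing a GUID."""
    headers = CATALOG[source]
    # Fail closed: a header missing from the catalog is never passed through.
    unknown = sorted(set(record) - set(headers))
    if unknown:
        raise ValueError(f"headers not in catalog: {unknown}")
    guid = assign_guid()
    scrubbed = {k: v for k, v in record.items() if not headers[k]["pii"]}
    scrubbed["guid"] = guid           # goes to the anonymized data lake
    raw = {**record, "guid": guid}    # goes to the optional PII secured bucket
    return scrubbed, raw
```

Calling `anonymize("orders", staged_record)` returns a scrubbed copy that keeps only the non-PII headers plus the GUID, and a raw copy that carries the same GUID so the two can be matched later under the appropriate access rules.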
Security classification matrix (risk level, followed by the kind of information it covers):
- None: no material impact to objectives or reputation. Information intended for public access.
- Low: may be detrimental to interests if/when exposed to external parties without a specific business need. Company-wide proprietary information.
- Moderate: could be somewhat detrimental (legally, materially, reputation) if exposed to external parties without a specific business need. Proprietary information available only on a need-to-know basis.
- High, Special Handling Required (SHR): could cause substantial damage to brand, cause loss of material assets. Information protected by regulation or that grants access to other non-public information.
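As an illustration of how the matrix above can drive access decisions, the sketch below encodes the four risk levels as an ordered enum and compares a caller's clearance against a store's classification. The `clearance` idea, the `can_read` helper, and the level assigned to the data lake are assumptions for the example; only the SHR level of the PII secured bucket comes from the definitions above.

```python
from enum import IntEnum


class Risk(IntEnum):
    NONE = 0      # information intended for public access
    LOW = 1       # company-wide proprietary information
    MODERATE = 2  # proprietary information, need-to-know only
    HIGH = 3      # Special Handling Required (SHR)


# Assumed mapping of stores to levels; only the SHR entry is from the text.
STORE_RISK = {
    "data_lake": Risk.LOW,
    "pii_secured_bucket": Risk.HIGH,
}


def can_read(clearance, store):
    """Allow access only when the caller's clearance covers the store's level."""
    return clearance >= STORE_RISK[store]
```

Under these assumptions, a caller cleared only to `Risk.LOW` can read the anonymized data lake but is refused the PII secured bucket, which matches the intent that raw PII is available only by rule and as needed.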
With this model centralized and teams taking ownership of and stewarding the data ecosystem, the access patterns for permissions got a lot simpler. The process guaranteed that all private data was scrubbed out, and if there was a need for PII, access would be rule-based and on an “as needed” basis. This empowered the business: they could serve the data the proper way while sleeping at night knowing that the data is secured and anonymized.
Caserta has been through this journey, and we are here to help, whether that means vetting a governance model, architecting it or implementing an end-to-end solution. We have the experts to complement your current data ecosystem or build it out as needed.