The Roadmap to Building a Data Warehouse
Just as no two organizations have the same business needs, no two will follow the same path when implementing a data warehouse. The steps do have much in common, however, and successful implementations often start small and focus on business benefits to deliver immediate value.
Phase 1: Establishing the Business Objectives
One of the biggest mistakes that organizations make is deciding to implement a data warehouse without defining how the business will derive value from it. The “if you build it, they will come” approach simply doesn’t work, since a data warehouse built without a purpose almost never meets anyone’s needs.
It’s critical to keep the end in mind when you start out. Business stakeholders, department leads, and the data scientists and analysts who will consume the information should be engaged very early about their highest-priority data needs and the insights they hope to gain.
Most businesses will find that they already have a pent-up demand for a data warehouse but didn’t recognize it as such: a growing backlog of requests to IT for data extracts or refreshed data sets, hundreds of similar or seemingly identical reports that may or may not even be used, departmental “islands” of data that are actually workarounds for not having a more centralized source of data, and repeated data cleanup “fire drills” when new projects start up or when there are concerns about the real source of “truth” or the quality of data throughout the enterprise.
During this phase, organizations should inventory these needs and prioritize the ways in which data will improve business efficiency, increase customer impact, reduce institutional risk, and drive revenue opportunities. The objectives should be specific, covering both strategic long-term needs and tactical fixes for immediate pain points.
Phase 2: Taking Stock of Your Existing Environment
While focusing attention on the future, it is also important to get a good sense of what exists today to establish a baseline for the current data landscape. This is a combination of sources of data, flows between systems, technologies that are in use, processes, and policies that directly impact how data is captured, moved, and used.
This can be a daunting task, but knowing what you have, and recognizing what you don’t know about your data landscape, can help to inform strategy, architecture, and platform needs for your data warehouse. Even just compiling a list of what you have today helps. Existing sources of data (what is persisted or generated internally and what is received from external providers), where data flows today inside and outside the organization, the location and purpose of department- or workgroup-focused point solutions, and listings of the reports, datasets, and dashboards that exist today can all be important in understanding your current state.
Current processes and policies for data governance, security, privacy, and quality management should also be captured, since some of these may need to be revisited along the way.
Phase 3: Formulation and Platform Selection
After having a good understanding of the current landscape and outlining broad goals and objectives, organizations can then decide on their optimal deployment options and architectural design approach when structuring the data warehouse. They will select their data warehouse technology based on a number of factors, including sizing and scale, the volume of data flows, reporting and consumption needs, and privacy/security requirements.
There is no “one size fits all” solution, and many businesses find that the business needs change quickly as well, so it is important in this process to make decisions not just to solve today’s needs but also for what is known about future direction. The flexibility of the data warehouse platform and its ability to scale and grow to meet business needs is a primary factor in selecting the right solution since you may start small but will want to expand as needed.
Phase 4: Creating the Value Roadmap and Positioning Organizational Roles
Once clear goals and platform decisions are in place, this phase establishes the implementation roadmap, usually in the form of quarterly goals. These can be a series of deployment phases, with the understanding that establishing the foundation may comprise much of the early work, with expansion and delivery of tactical use cases accelerating over time.
When determining what goes in each phase, the business priorities should be a key driver in the decision process. However, each phase should combine high-priority business needs that can drive immediate value with ongoing infrastructure and platform enhancements to the data warehouse. This ensures that value is delivered continuously, and that improvements to the management and monitoring of the data warehouse are not forgotten along the way.
In parallel with this, it is also critical to put the right organizational roles in place, which includes aligning the implementation with business sponsors who already have use cases in mind where value can be clearly shown. There is rarely a single role that has ownership over data initiatives, even when organizations have established a Chief Data Officer (CDO). Strategy, funding, and prioritization are often a joint effort of a broader Data Steering Committee with representation from business lines, IT, infrastructure, security, and audit. This level of partnership across a variety of roles and skills is critical to ensure that a data warehouse is a success.
There are, however, some key roles that must be put in place to implement and maintain a data warehouse long-term.
- Product Owner. An overall owner within the organization is critical for keeping a continual focus on the value derived from data, whether it is for decision-making, revenue generation, or creating an improved client experience. They are responsible for speaking to the business need and are essential for maintaining enthusiasm and momentum over time.
- Executive Sponsorship. Where the Product Owner speaks to the ongoing need, leadership at the highest levels in the organization must be proponents for the value and change that will be gained through having a data warehouse. While ensuring that funding is brought to the table is one responsibility, they also act as the steering committee that continually helps to align priorities to broader business goals. Having ownership at this leadership level is critical to maintaining business support for enhancements and expansion.
- Data Architects. This role within the team helps define the data warehouse structure so that it fits within the organization’s infrastructure, processes, and overall technology standards and direction.
- Data Engineers and Modelers. These specialists within the team build the data structures, data flows and integrations, and analysis and reporting capabilities into the data warehouse.
- Operations. Operations monitors the data warehouse on an ongoing basis so that all of the proper procedures are in place for maintenance, sizing, and disaster recovery. This team is also involved in ongoing performance tuning and extension.
- Data Governance. Even the best technical implementations can be plagued with issues with data quality, standardization, controls, and usability of the information. Data Governance Owners, Stewards, and Data Owners are almost always required to wrap all of the technical implementation work with the process, roles, and tools to ensure that the trustworthiness of the data presented in the data warehouse is never called into question.
Phase 5: Creating the Data Warehouse Architecture and Design
This phase includes establishing the overall architecture of the data warehouse, designing individual data flows in and out, and even profiling existing data sources to confirm assumptions about the existence and quality of data that will be needed.
Some of the analysis that goes into this includes:
- The data volume that is generated daily and the frequency of data updates.
- The data type and structure and relation to different data sources.
- The degree of sensitivity and data quality.
- Any missing data, and whether the data quality is high enough to meet the business requirements.
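As a rough illustration of the profiling described above, the sketch below summarizes each column of a small extract by missing rate, distinct values, and observed types. It is a minimal example using only the Python standard library. The function name `profile_rows`, the sample columns, and the choice to treat empty strings as missing are all assumptions made for illustration, not part of any particular tool.

```python
from collections import Counter
from typing import Any


def profile_rows(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Summarize each column: missing rate, distinct count, observed types."""
    columns = sorted({key for row in rows for key in row})
    total = len(rows)
    profile: dict[str, dict[str, Any]] = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Assumption: None, absent keys, and empty strings all count as missing.
        present = [v for v in values if v is not None and v != ""]
        profile[col] = {
            "missing_rate": (total - len(present)) / total if total else 0.0,
            "distinct": len(set(present)),
            "types": dict(Counter(type(v).__name__ for v in present)),
        }
    return profile


# Hypothetical extract from a source system.
sample = [
    {"customer_id": 1, "email": "a@example.com", "signup_date": "2023-01-05"},
    {"customer_id": 2, "email": "", "signup_date": "2023-01-06"},
    {"customer_id": 3, "email": None, "signup_date": "06/01/2023"},
    {"customer_id": 3, "email": "c@example.com"},
]

report = profile_rows(sample)
for column, stats in report.items():
    print(column, stats)
```

Even on a tiny sample, a summary like this can surface questions for the design phase: a repeated `customer_id`, a half-empty `email` column, or dates arriving in more than one format.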
Decisions about hosting the data warehouse (cloud, on-premises, hybrid, etc.) are made at this time as well, along with decisions about how security policies, encryption approaches, data access provisioning, and compliance will be addressed.
Detailed design will include planning out how data will flow in and out of the data warehouse, ultimately resulting in highly detailed data mapping and lineage details. The tools that will be used for consuming the data to create reports, dashboards, analysis, and models are also planned for, and their implementation and rollout approaches are outlined.
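One way to picture the mapping and lineage detail produced during design is as a set of column-level mappings that can be walked back to their original sources. The sketch below is only an illustration under stated assumptions: the `ColumnMapping` structure, the `upstream` helper, and all table and column names are hypothetical, and real projects typically capture this in a catalog or modeling tool rather than in code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMapping:
    target: str               # fully qualified warehouse column
    sources: tuple[str, ...]  # columns the target is derived from
    transform: str            # plain-language description of the rule


# Hypothetical mappings for a two-step flow: source -> staging -> mart.
MAPPINGS = [
    ColumnMapping(
        "staging.orders.amount_usd",
        ("erp.order_lines.amount", "erp.order_lines.currency"),
        "convert amount to USD",
    ),
    ColumnMapping(
        "mart.daily_sales.revenue",
        ("staging.orders.amount_usd",),
        "sum per calendar day",
    ),
]


def upstream(column: str, mappings: list[ColumnMapping]) -> set[str]:
    """Return the original source columns that feed the given column."""
    by_target = {m.target: m for m in mappings}
    origins: set[str] = set()
    seen: set[str] = set()
    stack = [column]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against circular mappings
        seen.add(current)
        mapping = by_target.get(current)
        if mapping is None:
            origins.add(current)  # no mapping defined, so treat it as a source
        else:
            stack.extend(mapping.sources)
    return origins


lineage = upstream("mart.daily_sales.revenue", MAPPINGS)
print(sorted(lineage))
```

Keeping the transformation rule next to each mapping means the same record can answer both "where did this number come from?" and "what was done to it along the way?".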
Phase 6: Implementing the Data Warehouse
This phase is where all of the planning and design come together: building integration/ETL processes to move data in and out of the data warehouse, configuring the security controls, building and validating reports and dashboards, and working with analysts and data scientists to ensure their data needs are met.
Quality and performance testing is also conducted during this phase, and often involves reconciling data back to originating sources. In many organizations, an agile approach helps to keep ongoing implementation efforts focused on the highest-priority needs, which allows teams to incorporate new business needs and also account for improvements and enhancements to existing capabilities.
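Reconciling data back to its originating source can be as simple as comparing keys and totals between the two systems. The following sketch assumes both sides can be exported as lists of records. The function name `reconcile`, the field names, and the sample orders are hypothetical, and a real check would normally run as queries against both systems rather than in memory.

```python
from decimal import Decimal


def reconcile(source_rows, warehouse_rows, key, amount_field):
    """Compare keys and amounts between a source extract and the warehouse."""
    # Assumption: keys are unique on each side; duplicates would be collapsed.
    source = {r[key]: Decimal(str(r[amount_field])) for r in source_rows}
    target = {r[key]: Decimal(str(r[amount_field])) for r in warehouse_rows}
    shared = source.keys() & target.keys()
    return {
        "source_keys": len(source),
        "warehouse_keys": len(target),
        "source_total": sum(source.values(), Decimal("0")),
        "warehouse_total": sum(target.values(), Decimal("0")),
        "missing_in_warehouse": sorted(source.keys() - target.keys()),
        "unexpected_in_warehouse": sorted(target.keys() - source.keys()),
        "amount_mismatches": sorted(k for k in shared if source[k] != target[k]),
    }


# Hypothetical order data from the source system and from the warehouse.
source_orders = [
    {"order_id": 101, "amount": "19.99"},
    {"order_id": 102, "amount": "5.00"},
    {"order_id": 103, "amount": "42.50"},
]
warehouse_orders = [
    {"order_id": 101, "amount": "19.99"},
    {"order_id": 102, "amount": "5.50"},
    {"order_id": 104, "amount": "7.25"},
]

result = reconcile(source_orders, warehouse_orders, "order_id", "amount")
for name, value in result.items():
    print(f"{name}: {value}")
```

Amounts are compared as `Decimal` values rather than floats so that rounding noise does not show up as a false mismatch.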
Phase 7: Launch
Launching an organization’s data warehouse includes the rollout of the platform and data migration, but it is also when business users are introduced to the data warehouse, and it may be the first time that business intelligence or other analysis tools are made available to end users. Careful communication, training, and ongoing support are all needed to make sure that adoption of the data warehouse is not stalled by a lack of understanding or apprehension about using the tools.
Phase 8: Business As Usual Operations and Continuous Improvement
After the data warehouse has successfully launched, organizations need to put operational processes in place to continually support its users and create a mechanism for feedback and enhancement requests. Most data initiatives, including data warehouse implementations, don’t just “finish” after a first rollout, and organizations must establish mechanisms for ongoing expansion and improvement over time so that the data warehouse grows and adapts to business, market, and end-user needs.
Most importantly, stakeholders must be involved at every step, and successes need to be spotlighted. It is easy to begin capturing metrics that show the impact of the data warehouse, whether it is an increase in project velocity because data is readily accessible, decreased business-user dependence on IT to make information available, elimination of data quality problems that may have plagued implementations in the past, or even true revenue and efficiency gains based on trusted information that is available whenever it is needed.