Old Meets New with the All Purpose Data Lake
The rise of “big data” led to flat data lake architectures, which are generally preferred for machine learning applications. But, facing ever-growing data volumes, many organizations still struggle to answer the basic questions needed simply to run the business:
1. How many customers did I add/lose last week?
2. Did we meet our revenue forecasts last quarter?
These basic and mundane questions are usually better suited to traditional Business Intelligence tools. But can traditional BI tools and AI/machine learning tools use the same data structures efficiently? Yes: with a thoughtful data architecture, it is possible. In this talk, we will explore data architectures that make sense for both data scientists and business intelligence analysts. There is no need to replicate data to serve two functions; one model can serve many. Borrowing from more traditional data mart modeling practices, the approach establishes key reference data as “dimensions” (e.g. customer, product, geography, promotion, campaign) and transactional activity as “fact” tables (e.g. web visits, purchases, trades, contact history, complaints, returns).
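As a rough illustration of the dimension/fact split described above, here is a minimal sketch in plain Python. The table contents, the column names (`customer_id`, `region`, `amount`) and the helper `revenue_by_region` are all hypothetical and chosen only for the example; in a real data lake these would be tables or DataFrames rather than in-memory dictionaries.

```python
from collections import defaultdict

# Hypothetical dimension: one row per customer, keyed by customer_id.
customer_dim = {
    1: {"name": "Ada", "region": "East"},
    2: {"name": "Bo", "region": "West"},
}

# Hypothetical fact table: one row per purchase, pointing at the dimension by key.
purchase_fact = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 80.0},
    {"customer_id": 1, "amount": 30.0},
]


def revenue_by_region(facts, customers):
    """Sum a fact measure grouped by a dimension attribute (a typical BI question)."""
    totals = defaultdict(float)
    for row in facts:
        region = customers[row["customer_id"]]["region"]
        totals[region] += row["amount"]
    return dict(totals)


print(revenue_by_region(purchase_fact, customer_dim))
```

The dimension also gives a BI tool its discrete pick list (here, the set of regions) without scanning the fact table.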
These structures have been the bread and butter of Business Intelligence tools for a generation. BI tools offer a structured way for business users and analysts with a wide range of skills to consume important data insights, often through highly customizable interfaces. Thus, having easy access to discrete lists of reference data (dimensions) can make or break the user experience. In seeming contrast, the needs of data scientists are different. Data scientists require as much data about an observation as possible (or as needed), joined properly together on the same row. To satisfy both goals, a flexible Data Lake Model is needed. Spark, with its nearly unlimited flexibility and power, is ideally suited to build and maintain this architecture. This talk will discuss a flexible Data Lake architecture that serves all types of analytical “customers” and offer different.
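To make the data scientist's requirement concrete (everything about an observation on one row), the following sketch flattens a fact row by pulling in the attributes of each dimension it references. It is plain Python rather than Spark, and the tables, keys, and the `widen` helper are hypothetical names introduced only for illustration; in Spark the same idea would typically be expressed as joins between the fact table and its dimensions.

```python
# Hypothetical dimensions keyed by their identifiers.
customer_dim = {1: {"region": "East", "segment": "retail"}}
product_dim = {10: {"category": "books", "list_price": 25.0}}

# Hypothetical fact table: one purchase per row, holding only keys and measures.
purchase_fact = [{"customer_id": 1, "product_id": 10, "amount": 20.0}]


def widen(fact_row, customers, products):
    """Return one denormalized row: the fact plus prefixed dimension attributes."""
    wide = dict(fact_row)
    for key, value in customers[fact_row["customer_id"]].items():
        wide[f"customer_{key}"] = value
    for key, value in products[fact_row["product_id"]].items():
        wide[f"product_{key}"] = value
    return wide


training_rows = [widen(row, customer_dim, product_dim) for row in purchase_fact]
print(training_rows[0])
```

Because the wide rows are derived from the same dimensions and facts that BI tools read, the underlying data does not have to be modeled twice.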