Many companies are currently working to transform their traditional data warehouse systems into modern data architectures that address the challenges of today’s data landscape. These innovative systems are designed to give companies a competitive edge. With huge amounts of historical, operational, and real-time data, combined with the new and ever-improving tools to analyze, model, and mine data, businesses have a lot of power at their fingertips. Access to that data is helping forward-thinking companies find ways to outperform and out-innovate their competition. An Analytics Sandbox is one of the tools that’s helping them succeed.

The time it takes a company to turn its data into knowledge is critical. Traditional enterprise data warehouse (EDW) and business intelligence (BI) processes can be slow to implement and do not always meet the rapidly changing needs of today’s businesses. When efforts to speed up delivery cycles have limited success, business users may take matters into their own hands, which leads to a proliferation of spreadmarts (spreadsheets used as makeshift data marts) and poorly built data solutions. This is where the concept of the Analytics Sandbox comes in.

What is an Analytics Sandbox?

An Analytics Sandbox is a separate environment within the overall data lake architecture: a centralized environment meant to be used by multiple users and maintained with the support of IT. Here are some key characteristics of a modern Analytics Sandbox:

Key Characteristics

  • The environment is controlled by the analyst
    • Allows them to install and use the data tools of their choice
    • Allows them to manage the scheduling and processing of the data assets
  • Enables analysts to explore and experiment with internal and external data
  • Can hold and process large amounts of data efficiently from many different data sources: big data (unstructured), transactional data (structured), web data, social media data, documents, etc.
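
As a rough illustration of the characteristics above, the following Python sketch shows an analyst combining structured transactional data with semi-structured web data in an isolated, throwaway environment. Everything in it is an assumption made for illustration: the in-memory SQLite database merely stands in for a sandbox, and the table names, sample rows, and the functions build_sandbox and spend_vs_views are hypothetical. A real sandbox would typically sit on a data lake platform and handle far larger volumes.

```python
import json
import sqlite3

# Hypothetical sample data, invented for illustration only:
# structured transactional rows (internal) and raw JSON web events (external).
ORDERS = [
    (1, "alice", 120.0),
    (2, "bob", 75.5),
    (3, "alice", 30.0),
]
WEB_EVENTS = [
    '{"customer": "alice", "page_views": 14}',
    '{"customer": "bob", "page_views": 3}',
    '{"customer": "carol", "page_views": 8}',
]


def build_sandbox():
    """Create a throwaway in-memory database that stands in for the sandbox."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.execute("CREATE TABLE web_events (customer TEXT, page_views INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)
    # Parse the raw JSON lines on load, since they arrive without a schema.
    parsed = [json.loads(line) for line in WEB_EVENTS]
    conn.executemany(
        "INSERT INTO web_events VALUES (:customer, :page_views)", parsed
    )
    return conn


def spend_vs_views(conn):
    """Prototype query: total spend per customer alongside web activity."""
    query = """
        SELECT o.customer, SUM(o.amount) AS total_spend, w.page_views
        FROM orders AS o
        JOIN web_events AS w ON w.customer = o.customer
        GROUP BY o.customer, w.page_views
        ORDER BY o.customer
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    sandbox = build_sandbox()
    for row in spend_vs_views(sandbox):
        print(row)
```

Because the database lives only in memory, the experiment can be discarded or rerun freely without touching any production system, which is the kind of low-risk exploration a sandbox is meant to support.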

The concept of an Analytics Sandbox has been around for a long time. Data warehousing pioneer Bill Inmon and industry expert Claudia Imhoff have been evangelizing the idea since the late 1990s, although the co-authors referred to it as “Exploration Warehousing” in their 2000 book of the same name. They even include the concept in many of their well-known Corporate Information Factory diagrams (see the yellow database objects).

[Figure: Corporate Information Factory diagram (Analytics_2_Myers.gif)]

Unlike Inmon and Imhoff’s Exploration Warehouse, which only received data from the EDW, a modern Analytics Sandbox commonly pulls data from all layers of the data lake. It acts mainly as a playground where data scientists conduct data experiments. As shown in the Modern Data Architecture diagram, it resides in the lower levels of the data lake because it consumes a lot of raw, non-curated data. It may even end up feeding the EDW at some point.

[Figure: Modern Data Architecture diagram (Analytics_1_Meyers.png)]

Advantages of an Analytics Sandbox

There are many advantages to having an Analytics Sandbox as part of your data architecture. Perhaps the most significant is that it shortens the time it takes a business to gain knowledge and insight from its data. It does this by providing an on-demand, always-ready environment in which analysts can quickly dive into and process large amounts of data and prototype their solutions without kicking off a big BI project. In other words, it enables agile BI by empowering your advanced users.

Another major benefit to both the business and the IT team is that giving the business a place to prototype its data solutions lets it figure out what it wants on its own, without involving IT. Once the business decides that a solution adds value, that solution becomes a good candidate to be productionized and built into the EDW process at some point. This saves both teams a lot of time and effort.

Could your business benefit from having an Analytics Sandbox? Interested in learning more? Please contact us today.