HDInsight

Transitioning from Traditional to Azure Data Architectures

Confession: I put a lot of subtexts in this blog post in an attempt to catch how people may be describing their move from SSIS to ADF, from SQL DBs to SQL DWs, or from scheduled to event-based data ingestion. The purpose of this post is to give you a visual picture of how our well-loved “traditional” tools of on-prem SQL Databases, SSIS, SSAS and SSRS are being replaced by the Azure tool stack. If you are moving from “Traditional Microsoft” to “Azure Microsoft” and need a road map, this post is for you.

Summary of the Matter: If you only read one thing, please read this: transitioning to Azure is absolutely “doable”, but do not let anyone sell you “lift and shift”.  Azure data architecture is a new way of thinking.  Decide to think differently.

First Determine Added Value: Below are snippets from a slide deck I shared during Pragmatic Works’ 2018 Azure Data Week. (You can still sign up for the minimal cost of $29 and watch all 40 recorded sessions.) However, before we begin, let’s have a little chat. Why in the world would anyone take on an Azure migration if their on-prem SQL database(s) and SSIS packages are humming along with optimum efficiency? The first five reasons given below are my personal favorites.

  1. Cost (scale up, scale down)
  2. Event-Based File Ingestion
  3. File based history (SCD2 equivalent but in your Azure Data Lake)
  4. Support for Near Real Time Requirements
  5. Support for Unstructured Data
  6. Large Data Volumes
  7. Offset Limited Local IT Resources
  8. Data Science Capabilities
  9. Development Time to Production
  10. Support for Large Audiences
  11. Mobile
  12. Collaboration
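To make reason 3 a little more concrete, here is a minimal Python sketch of one way file-based history can work, assuming a hypothetical lake layout in which every load of a source file lands in its own date-partitioned folder instead of overwriting yesterday’s copy. The folder convention, the sample data and the `snapshot_path` and `as_of` helpers are illustrative assumptions, not an Azure Data Lake or ADF API.

```python
from datetime import date
from typing import Dict, Optional

def snapshot_path(source: str, load_date: date, file_name: str) -> str:
    """Build a hypothetical date-partitioned lake path so each load is kept."""
    return (f"raw/{source}/year={load_date.year:04d}/"
            f"month={load_date.month:02d}/day={load_date.day:02d}/{file_name}")

def as_of(snapshots: Dict[date, dict], when: date) -> Optional[dict]:
    """Return the most recent snapshot loaded on or before `when`, if any."""
    eligible = [d for d in snapshots if d <= when]
    if not eligible:
        return None
    return snapshots[max(eligible)]

# Three loads of the same (hypothetical) customer file; none is overwritten.
loads = {
    date(2018, 10, 1): {"cust-1": "Dallas"},
    date(2018, 10, 2): {"cust-1": "Austin"},
    date(2018, 10, 5): {"cust-1": "Tampa"},
}

print(snapshot_path("crm", date(2018, 10, 2), "customers.csv"))
# raw/crm/year=2018/month=10/day=02/customers.csv
print(as_of(loads, date(2018, 10, 4)))
# {'cust-1': 'Austin'}
```

Because nothing is overwritten, an “as of” question becomes a matter of picking the latest snapshot on or before the date you care about. That gives you point-in-time history at the file level, rather than the row-level effective dates of a classic SCD2 table.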

Each of the reasons given above is a minimum one-hour working session on its own, but I’m sharing my thoughts in brief to help you get started compiling your own list. Please also look at the following diagram (Figure 1) and note two things: a.) the coinciding “traditional” components and b.) the value add boxed in red.

Tom Ward

Overview of HDInsight R Server

Today I’ll wrap up my series on HDInsight with R Server. R Server is an option you can select when you create an HDInsight cluster; it gives data scientists, statisticians and R programmers on-demand access to scalable, distributed analytics methods on HDInsight.

Tom Ward

Overview of HDInsight Interactive Query

Last week I began a series on HDInsight. Today I’m continuing that series with a focus on Interactive Query. Interactive Query uses Hive with LLAP (Live Long and Process), also known as Low Latency Analytical Processing. This allows interactive, complex data warehouse-style queries on big data that is stored in commodity storage, such as Azure Blob Storage or Azure Data Lake Store.

Tom Ward

Overview of HDInsight Kafka

Continuing with my HDInsight series, today I’ll be talking about Kafka. HDInsight Kafka may sound much like Storm, but as I get into the nuts and bolts you’ll see the differences. Kafka is an open-source distributed streaming platform that can be used to build real-time data streaming pipelines and applications, with message broker functionality similar to a message queue.
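To illustrate the “message broker” idea, here is a toy, single-process Python model of one Kafka topic partition: an append-only log plus a per-consumer-group offset. The `ToyTopic` class and its methods are hypothetical and are not the Kafka client API; a real HDInsight Kafka cluster adds partitioning, replication and durable storage on top of this idea.

```python
from collections import defaultdict
from typing import Dict, List

class ToyTopic:
    """Hypothetical in-memory stand-in for one Kafka topic partition.

    Messages are appended to a log and are not removed by reading;
    each consumer group tracks its own offset into the log.
    """

    def __init__(self) -> None:
        self.log: List[str] = []
        self.offsets: Dict[str, int] = defaultdict(int)

    def produce(self, message: str) -> int:
        """Append a message and return the offset it was written at."""
        self.log.append(message)
        return len(self.log) - 1

    def poll(self, group: str, max_messages: int = 10) -> List[str]:
        """Read the next batch for a consumer group and advance its offset."""
        start = self.offsets[group]
        batch = self.log[start:start + max_messages]
        self.offsets[group] = start + len(batch)
        return batch

topic = ToyTopic()
for event in ["order-1", "order-2", "order-3"]:
    topic.produce(event)

print(topic.poll("billing", max_messages=2))  # ['order-1', 'order-2']
print(topic.poll("billing"))                  # ['order-3']
print(topic.poll("shipping"))                 # ['order-1', 'order-2', 'order-3']
```

One design point worth noticing: unlike a classic message queue, reading does not remove messages, so the billing and shipping groups each see every event independently.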

Tom Ward

Overview of HDInsight Storm

Next in my series on HDInsight, today I’ll be talking about Storm. HDInsight Storm is a distributed stream processing computational framework. It uses spouts, which define information sources, and bolts, which perform the processing steps, to allow distributed, real-time processing of streaming data.
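The spout/bolt vocabulary is easier to see in code. Storm topologies are usually written in Java, so treat this as a conceptual Python sketch only: a hypothetical spout emits sensor readings, one bolt filters them and a second bolt turns the survivors into alerts. The function names, the `sensor,value` input format and the 95.0 threshold are all assumptions for illustration.

```python
from typing import Iterable, Iterator, Tuple

Reading = Tuple[str, float]  # (sensor_id, temperature)

def reading_spout(raw_lines: Iterable[str]) -> Iterator[Reading]:
    """Hypothetical spout: the information source emitting tuples."""
    for line in raw_lines:
        sensor_id, value = line.split(",")
        yield sensor_id, float(value)

def threshold_bolt(stream: Iterable[Reading], limit: float) -> Iterator[Reading]:
    """Hypothetical bolt: passes on only readings above the limit."""
    for sensor_id, temp in stream:
        if temp > limit:
            yield sensor_id, temp

def alert_bolt(stream: Iterable[Reading]) -> Iterator[str]:
    """Hypothetical bolt: turns each hot reading into an alert message."""
    for sensor_id, temp in stream:
        yield f"ALERT {sensor_id}: {temp:.1f}"

raw = ["s1,70.5", "s2,98.2", "s1,101.0"]
for alert in alert_bolt(threshold_bolt(reading_spout(raw), limit=95.0)):
    print(alert)
# ALERT s2: 98.2
# ALERT s1: 101.0
```

Because each stage is a generator, a tuple flows through the whole chain as soon as the spout emits it, which mirrors Storm’s tuple-at-a-time processing rather than waiting for a complete batch.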

Tom Ward

Overview of HDInsight HBase

In continuation of my series on HDInsight and the different clusters within it, today I’ll cover HBase. HBase is a NoSQL database that provides random access and strong consistency for structured, unstructured and semi-structured data.
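As a rough illustration of “random access”, here is a toy Python model of an HBase-style table: rows are addressed by a row key, kept in sorted key order, and each row holds only the `family:qualifier` cells it actually has. `ToyTable`, its methods and the sample keys are hypothetical and are not the HBase client API, and the sketch does not attempt to model HBase’s consistency guarantees or its distribution across region servers.

```python
import bisect
from typing import Dict, List, Optional, Tuple

class ToyTable:
    """Hypothetical in-memory model of an HBase-style table.

    Rows are addressed by a row key and kept in sorted key order;
    each row holds cells named "family:qualifier".
    """

    def __init__(self) -> None:
        self._keys: List[str] = []                  # sorted row keys
        self._rows: Dict[str, Dict[str, str]] = {}  # row key -> cells

    def put(self, row_key: str, column: str, value: str) -> None:
        """Write one cell, creating the row if it does not exist yet."""
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key: str) -> Optional[Dict[str, str]]:
        """Random access: fetch one row directly by its key."""
        return self._rows.get(row_key)

    def scan(self, start: str, stop: str) -> List[Tuple[str, Dict[str, str]]]:
        """Return rows with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]

table = ToyTable()
table.put("device-042#2018-10-02", "reading:temp", "71")
table.put("device-042#2018-10-01", "reading:temp", "68")
table.put("device-007#2018-10-01", "reading:temp", "65")

print(table.get("device-042#2018-10-01"))  # {'reading:temp': '68'}
print([k for k, _ in table.scan("device-042", "device-043")])
# ['device-042#2018-10-01', 'device-042#2018-10-02']
```

Sorted row keys are why key design matters: putting the device ID first in the hypothetical key above keeps one device’s readings next to each other, so they can be read with a single range scan.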

Tom Ward

Overview of HDInsight Spark

Today I’m continuing my series on HDInsight with a focus on Spark clusters. HDInsight Spark clusters provide the required baseline for in-memory cluster computing. This technology has gained momentum over the last few years as available memory and hardware capabilities have increased.

Tom Ward

Overview of HDInsight Hadoop

In this week’s Azure Every Day posts, I’ll begin a series focusing on big data and the HDInsight offerings. If you don’t know, HDInsight is a fully managed, full-spectrum, open-source analytics service for enterprises that allows you to use open-source frameworks such as Hadoop, Spark and Hive, among others. It was introduced to Azure in 2013, and more recent options have since been added, such as domain-joined cluster capabilities.
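Since Hadoop is the foundation the rest of this series builds on, here is a single-process Python sketch of the MapReduce programming model that Hadoop runs across a cluster: a map step emits key/value pairs, a shuffle groups them by key and a reduce step combines each group. The function names and sample documents are hypothetical; real Hadoop jobs are typically written against Hadoop’s Java MapReduce API or expressed through higher-level tools such as Hive.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(documents: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every input record."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1

def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key before reducing."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the values for each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}

docs = ["Hadoop Spark Hive", "spark hive", "hive"]
print(reduce_phase(shuffle_phase(map_phase(docs))))
# {'hadoop': 1, 'spark': 2, 'hive': 3}
```

On a real cluster the map and reduce steps run in parallel on many nodes, and the shuffle moves data between them; the single-process version only shows the shape of the computation.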

Tom Ward

Do You Need a Relational Data Warehouse?

Are you looking to do a major update to your data warehouse or looking to modernize? Many technologies have come about that are changing the landscape of what data warehouses are made of. In this Azure Every Day session, I’d like to talk about 3 new technologies in Azure and HDInsight that break the rules.

Tom Ward