
Getting Started with Azure Databricks

As part of our ongoing series on Azure Databricks, I’ll walk you through getting started by creating your own Databricks Service and Databricks cluster. First off, it’s important to know that Databricks is not available with an Azure free subscription; you must have an Azure pay-as-you-go account. However, there is a free 14-day premium trial available.

My video included below is a demo that will take you step by step through creating a Databricks Service and cluster.

  • To get started, you’ll need to log into the Azure portal and select the plus (+) to create a new resource. You can find Databricks in the list under the analytics category or by doing a search.
  • Once selected, the Azure Databricks Service page will open. Here you’ll create a new resource: enter a name for your Databricks workspace, select the location and pricing tier, and click create.
  • Once the deployment is complete, we can launch a Databricks workspace by clicking on the go to resource button.
  • Databricks uses Azure Active Directory for authentication. Once you’ve signed in, the Databricks workspace will open and the next step is to create a cluster.
  • Select new cluster on the launch page or click the cluster icon on the left side menu.
    • This will open the cluster manager page. Select create cluster, enter a name, and select the cluster mode.
      • High concurrency is optimized for concurrent workloads with SQL, Python and R; it does not support Scala.
      • Standard mode is recommended for single-user clusters and supports all languages.
    • Next, select the Databricks runtime version. The pull-down lists all the supported versions and beta versions currently available. There are also two machine learning variants available: standard or GPU (graphics processing unit).
    • Autopilot options include enable autoscaling, which toggles between a variable and a static number of workers.
    • The auto-terminate setting lets you specify a period of inactivity; the cluster shuts down once that time has elapsed with no jobs running.
    • The next field is where we’ll set the number of workers for our cluster, either a fixed number or a minimum and maximum, depending on whether you enabled autoscaling.
      • If you choose a fixed sized cluster, Databricks will always use that number of workers.
      • If you provide a range, Databricks will pick the appropriate number of workers required for the job. The system validates the selection against your account’s capacity and will warn you if the account doesn’t have enough CPU cores for the number of workers selected. If you get a warning, you’ll have to lower the maximum number of workers.
    • Now, we need to choose the kind of workers from the dropdown. The bigger you go, the faster the cluster, but also the more expensive it will be. In my demo, I select a lightweight general-purpose worker rated at 0.75 DBU (Databricks Unit). A DBU is a unit of processing capability per hour, and prices range from 7 cents to 55 cents per DBU.
    • I’m also going to use the same driver type as the worker although I could use a beefier driver. The driver node runs the main functions and executes the parallel operations on the worker nodes.
    • Now our information is all selected and we click the create cluster button.
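
To get a feel for what the DBU figures in the worker-type step mean for your bill, here is a rough back-of-the-envelope sketch in Python. It assumes that every node (the driver included) consumes its DBU rating for each hour it runs, and it covers only the DBU charge, not the underlying virtual machine cost. The function name, the two-worker cluster, and the eight-hour day are my own illustrative assumptions, not values from the demo.

```python
def estimate_dbu_cost(dbu_per_node_hour, price_per_dbu, worker_count, hours,
                      include_driver=True):
    """Rough DBU-only cost estimate in dollars (hypothetical helper).

    Assumes each node consumes `dbu_per_node_hour` DBUs for every hour
    the cluster is up. VM charges are NOT included.
    """
    node_count = worker_count + (1 if include_driver else 0)
    return dbu_per_node_hour * price_per_dbu * node_count * hours


# Assumed scenario: 2 workers plus 1 driver, all rated at 0.75 DBU,
# running for 8 hours, priced at the low and high ends of the
# 7-cent to 55-cent range mentioned above.
low = estimate_dbu_cost(0.75, 0.07, worker_count=2, hours=8)
high = estimate_dbu_cost(0.75, 0.55, worker_count=2, hours=8)
print(f"Estimated DBU charge: ${low:.2f} to ${high:.2f}")
```

Treat the result as a ballpark only; the actual rate depends on your pricing tier and workload type, so check the Azure pricing page before budgeting.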

We now have a Databricks workspace with a running cluster that we’ll use in our Databricks notebook development. That is how easy it is to get started with Databricks.
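
If you would rather keep a written record of the choices made on the create cluster page, the same settings can be captured as a small Python dictionary. This is only a sketch: the field names follow the Databricks Clusters REST API as I understand it, so verify them against the current documentation before sending anything to a workspace. The helper name, the cluster name, the 120-minute timeout, and the placeholder values in angle brackets are assumptions for illustration; nothing here calls the service.

```python
import json


def build_cluster_spec(name, spark_version, node_type_id, *,
                       driver_node_type_id=None, num_workers=None,
                       min_workers=None, max_workers=None,
                       autotermination_minutes=120):
    """Build a cluster settings dictionary (hypothetical helper).

    Pass either `num_workers` for a fixed-size cluster, or both
    `min_workers` and `max_workers` for an autoscaling one.
    """
    spec = {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        # Same driver type as the worker unless a different one is given.
        "driver_node_type_id": driver_node_type_id or node_type_id,
        "autotermination_minutes": autotermination_minutes,
    }
    has_range = min_workers is not None or max_workers is not None
    if num_workers is not None and has_range:
        raise ValueError("Choose a fixed size or a range, not both.")
    if num_workers is not None:
        spec["num_workers"] = num_workers
    elif min_workers is not None and max_workers is not None:
        if min_workers > max_workers:
            raise ValueError("min_workers must not exceed max_workers.")
        spec["autoscale"] = {"min_workers": min_workers,
                             "max_workers": max_workers}
    else:
        raise ValueError("Provide num_workers, or both min and max.")
    return spec


# Placeholders: substitute a runtime version and node type that
# appear in your own workspace's dropdowns.
spec = build_cluster_spec("demo-cluster", "<runtime-version>",
                          "<node-type-id>", min_workers=2, max_workers=8)
print(json.dumps(spec, indent=2))
```

The helper rejects a request that mixes a fixed size with a range, which mirrors the either/or choice the autoscaling toggle gives you in the UI.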

Need further help? Our expert team and solution offerings can help your business with any Azure product or service, including Managed Services offerings. Contact us at 888-8AZURE or  [email protected].

Author

  • Senior Consultant. I am an Information Technology Professional who likes to solve puzzles - I am a solution finder! I am always looking to learn new things and the right way to create and implement the things I learn - again with a focus of being efficient and productive.

Leslie Andrews