Microsoft’s Azure SQL Database and Azure SQL Data Warehouse promise reliability, scalability and ease of management.  With a few clicks and a few dollars, a lowly workgroup-level server can grow to handle nationwide traffic.  Such a feat is a miracle of cloud computing, but to take advantage of it, the data needs to get there.  All the computing capacity in the world is useless if it doesn’t have access to the data.  This post outlines a few of the ways you can get that data into the cloud.

Migrating a solution to Azure is far more than the simple transfer of data to the cloud.  It requires selection of the target platform, a migration strategy, security planning, database compatibility testing, application testing and more.  In short, it’s potentially a huge topic. Here are a few notes on what will and will not be covered.

  • This article focuses on migrating to Azure SQL Database (v12) and Azure SQL Data Warehouse.
  • It assumes that you can transfer your data in bulk, preferably in a downtime window. Some of these methods support incremental transfers, but none of them supports live migration.
  • Any compatibility or security issues need to be ironed out before production transfer.
  • Plus, you might want to investigate ways to optimize your data structures for the new environment, particularly if you’re migrating to Data Warehouse.
  • There are four primary approaches to transferring the data, and most of them have multiple implementations.


Approach 1 – Data-Tier Export/Import

The data-tier export/import approach uses BACPAC files, which are essentially database backups that are compatible with Azure.  Both methods in this approach are of limited use, but their one-shot nature might be appealing for small datasets.

Using a Migration Wizard

PROS: Transfers database as a single unit.  If you can use it, it’s a real “easy button.”

CONS: Requires compatible database, interactive only, no error handling

APPROPRIATE FOR: Testing

With the Azure SQL Database Migration Wizard or the in-preview Data Warehouse Migration Utility, you can easily migrate a database to Azure.  It will do all the work, including creating the database and database objects as well as transferring data.   It’s much like an automated backup and restore.

However, this method is of limited use.  If your database does not pass the compatibility check, you will be forced to use another method.  You also cannot schedule a migration for later; the wizard transfers the backup while you wait.  Finally, it cannot be used incrementally.

Because of its limitations, it should only be used for testing purposes.

Export/Import BACPAC files

PROS: Transfers database as a single unit. Can be integrated into batch scripts.

CONS: Requires compatible database, limited error handling

APPROPRIATE FOR: Simple and smaller deployments

The Migration Wizard uses BACPAC files behind the scenes, so many of the same limitations and recommendations apply here.

When divorced from the Migration Wizard, they can be more useful for deployment.  PowerShell scripts can back up a SQL database to a BACPAC file, transfer it to Azure and deploy it to an Azure SQL database. Those scripts can be scheduled like any other PowerShell script.  Thus, it’s mostly useful for small databases whose data can easily migrate within the transfer window.
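
As an illustration, here’s a minimal PowerShell sketch of that round trip using SqlPackage.exe, the command-line tool behind BACPAC operations.  The install path, server names, and credentials are all placeholders:

    # Export the source database to a local BACPAC file.
    $sqlPackage = "C:\Program Files (x86)\Microsoft SQL Server\130\DAC\bin\SqlPackage.exe"
    & $sqlPackage /Action:Export `
        /SourceServerName:"OnPremServer" /SourceDatabaseName:"SalesDb" `
        /TargetFile:"D:\Backups\SalesDb.bacpac"

    # Import the BACPAC into the target Azure SQL Database server.
    & $sqlPackage /Action:Import `
        /SourceFile:"D:\Backups\SalesDb.bacpac" `
        /TargetServerName:"myserver.database.windows.net" /TargetDatabaseName:"SalesDb" `
        /TargetUser:"admin_user" /TargetPassword:"<password>"

Wrap that in error checks and a scheduled task, and you have a rudimentary recurring deployment.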

Approach 2 – Direct Transfer

The direct transfer approach copies data directly from the source SQL database to its analogue in Azure.  There are two different options for this approach, but neither significantly outperforms the other.  The limiting factor is ultimately the network connection, so the choice comes down to other factors.  When loading the data, be sure to scale the target database to maximize throughput.
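
For example, a single T-SQL statement (run here through Invoke-Sqlcmd from Microsoft’s SQL Server PowerShell tools) can bump the target database to a higher service objective for the duration of the load.  The server, database, and tier names below are placeholders:

    # Scale the target Azure SQL Database up before the bulk load.
    # (For Azure SQL Data Warehouse, the same statement takes DWU
    # objectives such as 'DW400' instead.)
    Invoke-Sqlcmd -ServerInstance "myserver.database.windows.net" -Database "master" `
        -Username "admin_user" -Password "<password>" `
        -Query "ALTER DATABASE SalesDb MODIFY (EDITION = 'Premium', SERVICE_OBJECTIVE = 'P2');"

    # ...run the transfer, then scale back down to control costs.
    Invoke-Sqlcmd -ServerInstance "myserver.database.windows.net" -Database "master" `
        -Username "admin_user" -Password "<password>" `
        -Query "ALTER DATABASE SalesDb MODIFY (EDITION = 'Standard', SERVICE_OBJECTIVE = 'S2');"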

SSIS Direct Transfer

PROS: Flexibility, error handling, scheduled, incremental transfers, BIML support

CONS: SSIS is not part of all SQL Server installations, cannot stage to Blob Storage without Azure Feature Pack

APPROPRIATE FOR: Most deployments (up to gigabytes)

SSIS is a tool that many DBAs and ETL experts know and love.  Because Azure SQL Database and Azure SQL Data Warehouse use native SQL Server connections, the existing SSIS tool set can be used to migrate the data to Azure as if it were migrating to any new SQL Server installation.  BIML can simplify development of these loading packages immensely.

There is no option, however, for bulk compression of the data, which caps the practical size of any single transfer.  If necessary, though, it can transfer the data incrementally over days or weeks.
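
Because the packages run through the ordinary SSIS tooling, those incremental loads are easy to schedule with SQL Server Agent or any task scheduler.  A minimal sketch, with a hypothetical package path:

    # Run a deployed SSIS package with dtexec; /F points at the package file.
    & dtexec /F "C:\ETL\LoadSalesToAzure.dtsx"

    # dtexec returns 0 on success; anything else warrants investigation.
    if ($LASTEXITCODE -ne 0) {
        Write-Error "SSIS package failed with exit code $LASTEXITCODE"
    }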

It’s a good general-purpose solution for all but the largest of deployments.

Azure Data Factory (Direct Transfer)

PROS: Scheduled, incremental transfers, BIML support

CONS: Error handling is limited, requires Microsoft Data Management Gateway

APPROPRIATE FOR: Small to medium deployments

Azure Data Factory is the closest analogue to SSIS in Azure’s platform.  It is used to coordinate data transfers to or from an Azure service.  ADF pipeline definitions can also be built with BIML.

Azure Data Factory can easily handle large volumes.  However, because it can so easily add a staging step as an intermediary (see Approach 3), the direct-transfer configuration is only recommended for small to medium deployments.
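
As a sketch of what the setup can look like, the classic AzureRM cmdlets deploy a factory and its JSON definitions from files.  The resource names and file paths here are placeholders, and the JSON files themselves would define the actual copy activity:

    # Create the factory, then deploy linked services, datasets, and the
    # pipeline from their JSON definition files.
    New-AzureRmDataFactory -ResourceGroupName "rg-migration" `
        -Name "MigrationFactory" -Location "East US"

    New-AzureRmDataFactoryLinkedService -ResourceGroupName "rg-migration" `
        -DataFactoryName "MigrationFactory" -File ".\OnPremSqlLinkedService.json"
    New-AzureRmDataFactoryLinkedService -ResourceGroupName "rg-migration" `
        -DataFactoryName "MigrationFactory" -File ".\AzureSqlLinkedService.json"

    New-AzureRmDataFactoryDataset -ResourceGroupName "rg-migration" `
        -DataFactoryName "MigrationFactory" -File ".\SourceTable.json"
    New-AzureRmDataFactoryDataset -ResourceGroupName "rg-migration" `
        -DataFactoryName "MigrationFactory" -File ".\SinkTable.json"

    New-AzureRmDataFactoryPipeline -ResourceGroupName "rg-migration" `
        -DataFactoryName "MigrationFactory" -File ".\CopyPipeline.json"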

Approach 3 – Staging in Azure Blob Storage

This approach breaks the deployment into two steps: staging and load.  The staging step migrates your data from your server to Blob Storage, and the load step gets it from staging in Blob Storage to its destination.  It’s useful for larger deployments.
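
Before getting into the tools, here’s what the staging half can look like in plain PowerShell with the classic Azure storage cmdlets, just to make the two steps concrete.  The account name, key, and paths are placeholders:

    # Upload an exported flat file to a Blob Storage staging container.
    $context = New-AzureStorageContext -StorageAccountName "migrationstore" `
        -StorageAccountKey "<storage account key>"

    Set-AzureStorageBlobContent -Context $context -Container "staging" `
        -File "D:\export\Orders.dat" -Blob "orders/Orders.dat"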

Staging

SSIS with Azure Feature Pack

PROS: Flexibility, error handling, scheduled, incremental transfers, BIML support

CONS: SSIS is not part of all SQL Server installations, requires additional Azure Feature Pack, not appropriate for landing in Azure SQL DB

APPROPRIATE FOR: Large deployments to Azure SQL Data Warehouse

In a staged approach, SSIS is only appropriate when paired with PolyBase to land data in Azure SQL Data Warehouse, but for that use case it is an effective tool.  Its data flows can stage the data in Blob Storage, and it can then issue queries against Azure SQL Data Warehouse that pull the data in through PolyBase external tables.  Again, BIML makes automating package development much easier.

Azure Data Factory

PROS: Scheduled, incremental transfers, BIML support

CONS: Error handling is limited, requires Microsoft Data Management Gateway

APPROPRIATE FOR: Large deployments

All of the previous comments about ADF apply here.  Adding a Blob Storage staging step to an ADF pipeline is easy.  Its biggest draw is that it’s the only tool covered that natively stages in Blob Storage and loads Azure SQL DB or DW in a single pipeline.

If you’re staging to Blob Storage over a network, this is the easiest one to set up and use.

Azure Import/Export Service

PROS: Data throughput, does not use network resources

CONS: Only useful for large batches of data, lag, complexity of export/import

APPROPRIATE FOR: Huge deployments

The Azure Import/Export Service allows a customer to ship hard drives to an Azure datacenter, where the contents are loaded to Blob Storage within the datacenter itself.   For sufficiently large volumes, this is faster than almost any network connection.

Here, a user would export the database contents to flat files using bcp or SSIS packages. They’d transfer the files to hard drives and ship them to Microsoft.
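
A minimal sketch of that export step with bcp, looping over a hypothetical set of tables:

    # Export each table to a pipe-delimited flat file in character mode.
    $tables = @("SalesDb.dbo.Orders", "SalesDb.dbo.OrderLines", "SalesDb.dbo.Customers")

    foreach ($table in $tables) {
        $outFile = "E:\export\$($table.Replace('.', '_')).dat"
        # -S server, -T trusted connection, -c character mode, -t field terminator
        & bcp $table out $outFile -S "OnPremServer" -T -c -t "|"
    }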

Below the recommended 20 TB, it’s less useful: there’s shipping and loading lag, the process is fragmented, and you have less control.  Above 20 TB, however, the reduced network load is why Microsoft recommends this method.

Load

PolyBase

PROS: SQL queries, transfer speed

CONS: Does not work with Azure SQL DB

APPROPRIATE FOR: Azure DW

When using Azure SQL Data Warehouse, PolyBase is the fastest way to import data from Blob Storage. PolyBase can read the flat files as tables, making import a matter of just 2-3 SQL queries per table.  It’s best paired with scripting to automate the process.
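
Here’s a sketch of those queries for one table, wrapped in a PowerShell multi-line string so the load can be scripted.  Every object name, the storage account, and the credential secret are placeholders, and it assumes a database master key has already been created in the target warehouse:

    # T-SQL that defines the external objects over the staged files and
    # loads the table with CTAS. All names, keys, and locations are placeholders.
    $polybaseLoad = "
        CREATE DATABASE SCOPED CREDENTIAL BlobCredential
        WITH IDENTITY = 'user', SECRET = '<storage account key>';

        CREATE EXTERNAL DATA SOURCE StagingBlob
        WITH (TYPE = HADOOP,
              LOCATION = 'wasbs://staging@migrationstore.blob.core.windows.net',
              CREDENTIAL = BlobCredential);

        CREATE EXTERNAL FILE FORMAT PipeDelimited
        WITH (FORMAT_TYPE = DELIMITEDTEXT,
              FORMAT_OPTIONS (FIELD_TERMINATOR = '|'));

        CREATE EXTERNAL TABLE dbo.Orders_ext (
            OrderId INT, CustomerId INT, OrderDate DATE, Amount MONEY)
        WITH (LOCATION = '/orders/', DATA_SOURCE = StagingBlob,
              FILE_FORMAT = PipeDelimited);

        CREATE TABLE dbo.Orders
        WITH (DISTRIBUTION = HASH(OrderId))
        AS SELECT * FROM dbo.Orders_ext;
    "

    Invoke-Sqlcmd -ServerInstance "mydw.database.windows.net" -Database "SalesDw" `
        -Username "admin_user" -Password "<password>" -Query $polybaseLoad

The credential, data source, and file format only need to be created once; after that, each additional table really is just an external table definition plus a CTAS.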

Approach 4 – Staging in a SQL Server Azure Virtual Machine

PROS: Transfer efficiency

CONS: Multi-step process

APPROPRIATE FOR: Large to huge deployments

Network bandwidth within data centers is much greater than bandwidth to the Internet at large.  For huge SQL databases, it may be desirable to transfer a compressed backup of the data to a temporary SQL Server VM in Azure.  There, you can transfer the data to an Azure Database using internal bandwidth, resulting in a net speedup.

It’s useful for larger deployments only.

You can transfer the database backup files using any file transfer mechanism, as the endpoint is another Windows (virtual) machine.  You can use FTP, SFTP, Windows file sharing, or even the Azure Import/Export Service.   Once the backups are restored in the VM, any of the direct transfer methods are applicable.
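
Here’s a sketch of the two database-side steps, assuming the SQL Server PowerShell tools on both machines; the names, paths, and the copy mechanism in between are placeholders:

    # On the on-premises server: take a compressed backup to shrink the transfer.
    # (Long-running backups may need -QueryTimeout raised.)
    Invoke-Sqlcmd -ServerInstance "OnPremServer" -Query "
        BACKUP DATABASE SalesDb
        TO DISK = 'D:\Backups\SalesDb.bak'
        WITH COMPRESSION, INIT;"

    # ...copy SalesDb.bak to the Azure VM with whichever mechanism suits...

    # On the Azure VM: restore the backup (WITH MOVE may be needed if the
    # drive layout differs from the source server).
    Invoke-Sqlcmd -ServerInstance "AzureSqlVm" -Query "
        RESTORE DATABASE SalesDb
        FROM DISK = 'E:\Backups\SalesDb.bak'
        WITH RECOVERY;"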

Wrapping It Up

We’ve gone through four approaches to transfer your data into an Azure SQL Database or SQL Data Warehouse.  Together they cover transfers of all sizes, from the smallest to the largest.  When in doubt, use the tools you’re most comfortable with.  What you lose in transfer efficiency is often gained back in development speed and QA effort.

Please contact us for more details or if you’re looking for design or delivery support for your Azure DB or Azure DW project.