Microsoft’s Azure SQL Database and Azure SQL Data Warehouse promise reliability, scalability and ease of management. With a few clicks and a few dollars, a lowly workgroup-level server can grow to handle nationwide traffic. Such a feat is a miracle of cloud computing, but to take advantage of it, the data needs to get there. All the computing capacity in the world is useless if it doesn’t have access to the data. This post outlines just a few ways you can get that data up in the cloud.
Migrating a solution to Azure is far more than the simple transfer of data to the cloud. It requires selection of the target platform, a migration strategy, security planning, database compatibility testing, application testing and more. In short, it’s potentially a huge topic. Here are a few notes on what will and will not be covered.
- This article focuses on migrating to Azure SQL Database (v12) and Azure SQL Data Warehouse.
- It assumes that you can transfer your data in bulk, preferably in a downtime window. Some of these methods support incremental transfers, but live migration is not feasible with any of them.
- Any compatibility or security issues need to be ironed out before production transfer.
- Plus, you might want to investigate ways to optimize your data structures for the new environment, particularly if you’re migrating to Data Warehouse.
There are four primary approaches to transferring the data, and each approach has multiple implementations.
Approach 1 – Data-Tier Export/Import
The data-tier export/import approach uses BACPAC files, which are essentially database backups that are compatible with Azure. Both of the methods below are of limited use, but their one-shot nature might be appealing for small datasets.
Using a Migration Wizard
PROS: Transfers database as a single unit. If you can use it, it’s a real “easy button.”
CONS: Requires compatible database, interactive only, no error handling
APPROPRIATE FOR: Testing
With the Azure SQL Database Migration Wizard or the in-preview Data Warehouse Migration Utility, you can easily migrate a database to Azure. The wizard does all the work, including creating the database and database objects as well as transferring data. It’s much like an automated backup and restore.
However, this method is of limited use. If your database does not pass the compatibility check, you will be forced to use another method. The wizard also cannot schedule a migration for later; it transfers the backup while you wait. Finally, it cannot be used incrementally.
Because of these limitations, it should only be used for testing purposes.
Export/Import BACPAC files
PROS: Transfers database as a single unit. Can be integrated into batch scripts.
CONS: Requires compatible database, limited error handling
APPROPRIATE FOR: Simple and smaller deployments
The Migration Wizard uses BACPAC files behind the scenes, so many of the same limitations and recommendations apply here.
Divorced from the Migration Wizard, though, BACPAC files become more useful for deployment. PowerShell scripts can back up a SQL database to a BACPAC file, transfer it to Azure and deploy it to an Azure SQL database, and those scripts can be scheduled like any other PowerShell script. This makes the method a good fit for small databases whose data can easily migrate within the transfer window.
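A minimal sketch of such a script follows, assuming SqlPackage.exe and the AzureRM PowerShell modules are installed; the server, database, and storage account names are all hypothetical.

```powershell
# Hypothetical names throughout; assumes SqlPackage.exe and the AzureRM modules.
$bacpac   = 'C:\Backups\SalesDb.bacpac'
$key      = '<storage account key>'
$adminPwd = Read-Host 'Azure SQL admin password' -AsSecureString

# 1. Export the on-premises database to a BACPAC file.
& 'C:\Program Files\Microsoft SQL Server\130\DAC\bin\SqlPackage.exe' `
    /Action:Export /SourceServerName:OnPremSql `
    /SourceDatabaseName:SalesDb /TargetFile:$bacpac

# 2. Upload the BACPAC to Azure Blob Storage.
$ctx = New-AzureStorageContext -StorageAccountName 'stagingacct' -StorageAccountKey $key
Set-AzureStorageBlobContent -File $bacpac -Container 'bacpacs' -Blob 'SalesDb.bacpac' -Context $ctx

# 3. Start a server-side import into a new Azure SQL database.
New-AzureRmSqlDatabaseImport -ResourceGroupName 'MigrationRG' -ServerName 'myazuresql' `
    -DatabaseName 'SalesDb' -StorageKeyType StorageAccessKey -StorageKey $key `
    -StorageUri 'https://stagingacct.blob.core.windows.net/bacpacs/SalesDb.bacpac' `
    -AdministratorLogin 'sqladmin' -AdministratorLoginPassword $adminPwd `
    -Edition Standard -ServiceObjectiveName 'S3' -DatabaseMaxSizeBytes 250GB
```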
Approach 2 – Direct Transfer
The direct transfer approach copies data directly from the source SQL database to its analogue in Azure. There are two options for this approach, and neither significantly outperforms the other; the limiting factor is ultimately the network connection, so the choice comes down to other factors. When loading the data, be sure to scale up the target database to maximize throughput, as in the sketch below.
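For example, a quick sketch with the AzureRM.Sql cmdlets; the resource group, server, database, and tier names are hypothetical.

```powershell
# Scale the target up for the bulk load, then back down when finished.
Set-AzureRmSqlDatabase -ResourceGroupName 'MigrationRG' -ServerName 'myazuresql' `
    -DatabaseName 'SalesDb' -RequestedServiceObjectiveName 'P2'

# ... run the data transfer ...

Set-AzureRmSqlDatabase -ResourceGroupName 'MigrationRG' -ServerName 'myazuresql' `
    -DatabaseName 'SalesDb' -RequestedServiceObjectiveName 'S1'
```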
SSIS Direct Transfer
PROS: Flexibility, error handling, scheduled, incremental transfers, BIML support
CONS: SSIS is not part of all SQL Server installations, cannot stage to Blob Storage without Azure Feature Pack
APPROPRIATE FOR: Most deployments (up to gigabytes)
SSIS is a tool that many DBAs and ETL experts know and love. Because Azure SQL Database and Azure SQL Data Warehouse use native SQL Server connections, the existing SSIS tool set can be used to migrate the data to Azure as if it were migrating to any new SQL Server installation. BIML can simplify development of these loading packages immensely.
SSIS offers no option for bulk compression of the data in transit, which caps the size of any single transfer. It can, however, transfer the data incrementally over days or weeks if necessary.
It’s a good general-purpose solution for all but the largest of deployments.
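As a rough sketch of the scheduling and error handling, a package run could be wrapped in a retrying PowerShell script and scheduled with Task Scheduler or SQL Agent; the package path here is hypothetical.

```powershell
# Run a migration package with dtexec and retry on failure; dtexec returns 0
# on success. The package path is hypothetical.
$package = 'C:\ETL\MigrateToAzure.dtsx'
foreach ($attempt in 1..3) {
    & dtexec /File $package
    if ($LASTEXITCODE -eq 0) { break }
    Start-Sleep -Seconds 60   # simple backoff before retrying the load
}
```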
Azure Data Factory (Direct Transfer)
PROS: Scheduled, incremental transfers, BIML support
CONS: Error handling is limited, requires Microsoft Data Management Gateway
APPROPRIATE FOR: Small to medium deployments
Azure Data Factory is the closest analogue to SSIS in Azure’s platform. It is used to coordinate data transfers to or from an Azure service. ADF pipeline definitions can also be built with BIML.
Azure Data Factory can handle large volumes with ease. But because it can just as easily add a Blob Storage staging step as an intermediary (see Approach 3), the direct transfer is only recommended for small to medium deployments.
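A minimal sketch of wiring up a copy pipeline with the AzureRM.DataFactories cmdlets; the JSON definition files would be authored separately (or generated with BIML), and all names here are hypothetical.

```powershell
# All names are hypothetical; the JSON definition files are authored
# separately (or generated with BIML).
$rg = 'MigrationRG'
$df = 'MigrationFactory'

New-AzureRmDataFactory -ResourceGroupName $rg -Name $df -Location 'West US'

# Linked services: the on-premises SQL Server (reached through the Data
# Management Gateway) and the Azure SQL Database target.
New-AzureRmDataFactoryLinkedService -ResourceGroupName $rg -DataFactoryName $df -File '.\OnPremSqlServer.json'
New-AzureRmDataFactoryLinkedService -ResourceGroupName $rg -DataFactoryName $df -File '.\AzureSqlTarget.json'

# Datasets describing the source and target tables, then the copy pipeline.
New-AzureRmDataFactoryDataset  -ResourceGroupName $rg -DataFactoryName $df -File '.\SourceTable.json'
New-AzureRmDataFactoryDataset  -ResourceGroupName $rg -DataFactoryName $df -File '.\TargetTable.json'
New-AzureRmDataFactoryPipeline -ResourceGroupName $rg -DataFactoryName $df -File '.\CopyPipeline.json'
```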
Approach 3 – Staging in Azure Blob Storage
This approach breaks the deployment into two steps: staging and load. The staging step migrates your data from your server to Blob Storage, and the load step moves it from Blob Storage to its destination. It’s useful for larger deployments.
Staging
SSIS with Azure Feature Pack
PROS: Flexibility, error handling, scheduled, incremental transfers, BIML support
CONS: SSIS is not part of all SQL Server installations, requires additional Azure Feature Pack, not appropriate for landing in Azure SQL DB
APPROPRIATE FOR: Large deployments to Azure SQL Data Warehouse
In a staged approach, SSIS is only appropriate when paired with PolyBase to land data in Azure SQL Data Warehouse, but it makes an effective tool for that use case. Data flows can stage the data in Blob Storage, and Execute SQL tasks can issue the queries against Azure SQL DW that load it through PolyBase external tables. Again, BIML makes automating package development much easier.
Azure Data Factory
PROS: Scheduled, incremental transfers, BIML support
CONS: Error handling is limited, requires Microsoft Data Management Gateway
APPROPRIATE FOR: Large deployments
All of the previous comments about ADF apply here, and adding a Blob Storage staging step to an ADF pipeline is easy. Its biggest draw is that it’s the only tool covered that natively stages in Blob Storage and loads Azure SQL DB or DW in a single pipeline.
If you’re staging to Blob Storage over a network, this is the easiest option to set up and use.
Azure Import/Export Service
PROS: Data throughput, does not use network resources
CONS: Only useful for large batches of data, lag, complexity of export/import
APPROPRIATE FOR: Huge deployments
The Azure Import/Export Service allows a customer to ship hard drives to an Azure datacenter, where the contents are loaded into Blob Storage within the datacenter itself. When the volume of data is high enough, this is faster than almost any network connection can manage.
Here, a user would export the database contents to flat files using bcp or SSIS packages. They’d transfer the files to hard drives and ship them to Microsoft.
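A rough sketch of the export step with bcp; the table list and paths are hypothetical, and character format with an explicit delimiter keeps the files readable by PolyBase later.

```powershell
# Export each table to a delimited flat file with bcp; character format (-c)
# with a pipe delimiter (-t) keeps the files readable by PolyBase later.
# Table names, server name, and paths are hypothetical.
$tables = @('dbo.Sales', 'dbo.Customers', 'dbo.Products')
foreach ($t in $tables) {
    $file = "E:\Export\$($t -replace '\.', '_').dat"
    & bcp "SalesDb.$t" out $file -S OnPremSql -T -c -t '|'
}
```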
For transfers below the recommended 20 TB, it’s less useful: there’s shipping and loading lag, the process is fragmented, and you have less control. Above that volume, however, the decreased load on the network is exactly why Microsoft recommends this method.
Load
PolyBase
PROS: SQL queries, transfer speed
CONS: Does not work with Azure SQL DB
APPROPRIATE FOR: Azure DW
When using Azure SQL Data Warehouse, PolyBase is the fastest way to import data from Blob Storage. PolyBase can read the flat files as external tables, reducing the import to just two or three SQL queries per table. It’s best paired with scripting to automate the process.
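A hedged sketch of those queries for one table, issued from PowerShell with Invoke-Sqlcmd; the external data source, file format, and ext schema are assumed to have been created once up front, and all object names are hypothetical.

```powershell
# Two queries per table: expose the staged files as an external table, then
# load with CTAS. StagingBlob and PipeDelimited stand in for a pre-created
# external data source and file format; all names are hypothetical.
$load = @"
CREATE EXTERNAL TABLE ext.Sales (
    SaleId INT NOT NULL,
    Amount DECIMAL(18,2) NOT NULL
)
WITH (LOCATION = '/export/dbo_Sales/',
      DATA_SOURCE = StagingBlob,
      FILE_FORMAT = PipeDelimited);

CREATE TABLE dbo.Sales
WITH (DISTRIBUTION = HASH(SaleId))
AS SELECT * FROM ext.Sales;
"@
Invoke-Sqlcmd -ServerInstance 'mydw.database.windows.net' -Database 'SalesDw' `
    -Username 'loader' -Password '<password>' -Query $load
```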
Approach 4 – Staging in a SQL Server Azure Virtual Machine
PROS: Transfer efficiency
CONS: Multi-step process
APPROPRIATE FOR: Large to huge deployments
Network bandwidth within a datacenter is much greater than bandwidth to the Internet at large. For huge SQL databases, it may be worthwhile to transfer a compressed backup of the data to a temporary SQL Server VM in Azure, then move the data into the Azure SQL database using internal bandwidth, resulting in a net speedup.
It’s useful for larger deployments only.
You can transfer the database backup files using any file transfer mechanism, since the endpoint is just another Windows (virtual) machine: FTP, SFTP, Windows file sharing, or even the Azure Import/Export Service. Once the backups are restored on the VM, any of the direct transfer methods apply.
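For instance, a minimal sketch of the first leg over a Windows file share; the server, share, and path names are hypothetical.

```powershell
# Take a compressed backup, then push it to the Azure VM over a file share;
# server, share, and path names are hypothetical. The long timeout keeps
# Invoke-Sqlcmd from giving up on a lengthy backup.
Invoke-Sqlcmd -ServerInstance 'OnPremSql' -QueryTimeout 65535 -Query @"
BACKUP DATABASE SalesDb
TO DISK = 'E:\Backups\SalesDb.bak'
WITH COMPRESSION, STATS = 10;
"@
Copy-Item 'E:\Backups\SalesDb.bak' '\\my-azure-vm\Backups\'
```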
Wrapping It Up
We’ve gone through four approaches to transferring your data into an Azure SQL Database or SQL Data Warehouse, with appropriate solutions for transfers of every size, from the smallest to the largest. When in doubt, use the tools you’re most comfortable with. What you might lose in transfer efficiency is often gained back in development speed and QA effort.
Please contact us for more details or if you’re looking for design or delivery support for your Azure DB or Azure DW project.