In today’s post I’d like to talk about Azure Data Factory and the difference between the lookup and stored procedure activities. The lookup activity in Data Factory is not the same as the lookup transformation in integration services, so if you’re coming from an integration services background like SSIS, this may be a bit confusing at first using Data Factory.
Let me try to clear up some confusion. The lookup activity in Azure Data Factory (ADF) is used for returning a data set to a data factory, so you can then use that data to control other activities in the pipeline. The data set from a lookup can be either a single row or multiple rows of data.
A typical scenario for using the lookup would be to return one row of data that may include parameters to be used as inputs for other stored procedures where you have singleton values. You could also use the lookup to return a data set of items to iterate through a ForEach Loop, for example.
To generate the data set for a lookup activity, let’s assume we’re pulling some data from SQL Server, I can either use a query or a stored procedure to return that data for the lookup. So, if I’m using the stored procedure in the lookup, why wouldn’t I use the stored procedure activity? Because the stored procedure activity doesn’t return any output.
The rule of thumb is if you have a stored procedure that returns some data to the factory that you need to use in your pipeline, you’ll want to use the lookup activity. If your stored procedure does not return any output, but just performs an operation on the backend of your database, like writing to a log table, then you’ll use the stored procedure.
Let me share a couple examples of how you would use the output of the lookup activity:
Example 1 –Shows how you can reference the output when the activity returns just a single row of data. Here I have an activity called Start New Extract. I’ve got one row of data and to reference that I can use that first row, and then in that row of data I have a field called LoadLogKey, so this syntax was used as input for a parameter for another step in my activity in my pipeline.
Example 2 – Shows how I can use the output of a stored procedure that returns multiple rows of data. Here I’m showing using this data set as the list of items to loop through in a ForEach Loop. The syntax is a bit different in this case. I have an activity that I called GetGUIDstoProcess, and that returns a list of IDs that I want to loop through in a ForEach Loop. I can do that by referencing the activity name and then the .output.value. That value is the entire data set and I can later reference specific rows or columns.
I hope this was helpful if you’re starting out with Azure Data Factory.
Need further help? Our expert team and solution offerings can help your business with any Azure product or service, including Managed Services offerings. Contact us at 888-8AZURE or [email protected].