- Data movement between public and private networks, either on-premises or through a virtual private network (VPN). In V1 and in Power BI, these runtimes were known as data management gateways.
- Public: They are used by Azure and other cloud connections. There's a default integration runtime that comes with ADF.
- Private: They are used to connect private computer resources such as SQL Server on-premises to ADF. We need to install a service on one Windows machine in the private network. That machine can connect to the enterprise resources and send the data to ADF via the service installed on it.
- SSIS package execution—managing SSIS packages in Azure. This is one of the main topics of this book. Chapter 3, SSIS Lift and Shift, is completely dedicated to this feature.
Linked services now have a connectVia property that lets them use the integration runtimes mentioned earlier in this chapter. They can also connect to many more data stores than was possible before.
Datasets are the same as they were in V1, but we no longer need to define availability schedules in them, which makes them more flexible to use. In conjunction with linked services, datasets now have access to many new data stores, both sources and destinations.
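As a rough illustration of where connectVia sits, the following sketch builds a linked service definition as a Python dictionary and prints it as JSON. The names OnPremSqlServer and SelfHostedIR and the connection string placeholder are invented for this example; only the overall shape (a connectVia reference of type IntegrationRuntimeReference) is the point.

```python
import json

# Hypothetical integration runtime name, used only for illustration.
INTEGRATION_RUNTIME_NAME = "SelfHostedIR"

linked_service = {
    "name": "OnPremSqlServer",  # hypothetical linked service name
    "properties": {
        "type": "SqlServer",
        "typeProperties": {
            # Placeholder; a real connection string would go here.
            "connectionString": "<connection string>",
        },
        # connectVia points the linked service at a specific integration runtime.
        "connectVia": {
            "referenceName": INTEGRATION_RUNTIME_NAME,
            "type": "IntegrationRuntimeReference",
        },
    },
}

linked_service_json = json.dumps(linked_service, indent=2)
print(linked_service_json)
```

If connectVia is left out, the linked service would be expected to fall back to the default integration runtime that comes with ADF.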
- On demand via .NET, PowerShell, REST API, or Python
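To make the on-demand option more concrete, here is a minimal Python sketch of starting a pipeline run. It assumes the azure-mgmt-datafactory package and an already authenticated DataFactoryManagementClient passed in as adf_client; the helper name run_pipeline_on_demand and its arguments are hypothetical and chosen for this example only.

```python
def run_pipeline_on_demand(adf_client, resource_group, factory_name,
                           pipeline_name, parameters=None):
    """Start one pipeline run and return its run ID.

    Assumption: adf_client is an authenticated
    azure.mgmt.datafactory.DataFactoryManagementClient.
    """
    run_response = adf_client.pipelines.create_run(
        resource_group,
        factory_name,
        pipeline_name,
        parameters=parameters or {},
    )
    # The response object exposes the identifier of the new run.
    return run_response.run_id
```

A call such as run_pipeline_on_demand(adf_client, "my-rg", "my-factory", "MyPipeline") would then return a run ID that can be used to monitor the run; all three names here are placeholders.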
- Execute pipeline: Calls another pipeline in the same factory.
- For each activity: Executes activities in a loop, similar to any for each loop in structured programming languages.
- Web activity: Used to call custom REST endpoints.
- Lookup activity: Gets a record from any external data. The output can later be used by subsequent activities.
- Get metadata activity: Gets the metadata of any data in ADF.
- Until activity: Repeats a set of activities until the condition evaluates to true.
- If condition activity: This is like any if statement in standard programming languages.
- Wait activity: Pauses the pipeline for a time before resuming other activities.
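The control flow activities listed above can be nested inside one another. As a hedged sketch, the following Python snippet assembles a pipeline definition in which a For each activity loops over an array parameter and runs a Wait activity on every iteration. The pipeline, activity, and parameter names (ControlFlowDemo, LoopOverTables, PauseBetweenItems, tableNames) are made up for this example.

```python
import json

# Inner activity: pause for a few seconds on each iteration.
wait_activity = {
    "name": "PauseBetweenItems",
    "type": "Wait",
    "typeProperties": {"waitTimeInSeconds": 5},
}

# ForEach iterates over the array passed in the hypothetical
# "tableNames" pipeline parameter.
for_each_activity = {
    "name": "LoopOverTables",
    "type": "ForEach",
    "typeProperties": {
        "isSequential": True,
        "items": {
            "value": "@pipeline().parameters.tableNames",
            "type": "Expression",
        },
        "activities": [wait_activity],
    },
}

pipeline = {
    "name": "ControlFlowDemo",
    "properties": {
        "parameters": {"tableNames": {"type": "Array"}},
        "activities": [for_each_activity],
    },
}

print(json.dumps(pipeline, indent=2))
```

An If condition or Until activity would follow the same pattern, with an expression plus one or more nested activity lists in its typeProperties.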
There is now a new SSIS runtime that completely manages clusters of Azure VMs dedicated to running SSIS in the cloud. When using the Azure SSIS integration runtime, packages are deployed in the same manner as they are on-premises: SQL Server Data Tools (SSDT) or SQL Server Management Studio (SSMS) can be used to deploy them.
Spark clusters are now available in V2. Because Spark performs well and keeps integrating more functionality, it has become an almost essential player in the big data world. In the previous version of ADF, Spark clusters were available only via MapReduce custom activities. In this version, Spark is a first-class citizen, so integrating it into our data flows should no longer cause headaches.
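To give an idea of what a first-class Spark activity might look like in a pipeline, the sketch below builds one as a Python dictionary. It is an illustration based on the HDInsightSpark activity type rather than an example from this book; the linked service name, root path, and script path are placeholders.

```python
import json

spark_activity = {
    "name": "RunSparkJob",  # hypothetical activity name
    "type": "HDInsightSpark",
    # Reference to a linked service describing the Spark cluster (placeholder name).
    "linkedServiceName": {
        "referenceName": "HDInsightLinkedService",
        "type": "LinkedServiceReference",
    },
    "typeProperties": {
        # Container and folder holding the Spark files (placeholder).
        "rootPath": "adfspark",
        # Path of the script to run, relative to rootPath (placeholder).
        "entryFilePath": "jobs/wordcount.py",
    },
}

print(json.dumps(spark_activity, indent=2))
```

Such an activity would be listed in a pipeline's activities array like any other activity, with no MapReduce wrapper around it.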