
Pipelines and activities in ADF and Azure Synapse Analytics


Overview:

A Data Factory or Synapse workspace can have one or more pipelines. A pipeline is a logical grouping of activities that together perform a task. Azure Data Factory is a cloud-based ETL (Extract, Transform and Load) and data integration service that lets you orchestrate data movement and transform data at scale.

  • You can create and schedule data-driven workflows, called pipelines, that ingest data from disparate data stores.
  • You can build complex ETL processes visually with data flows, or with compute services such as Azure Databricks or Azure SQL Database.
  • You can publish the transformed data to data stores such as Azure SQL Data Warehouse (now dedicated SQL pools in Azure Synapse Analytics) for consumption by business intelligence (BI) applications.
  • The goal of Azure Data Factory is to help you organize raw data into meaningful data stores and data lakes for better business decisions.

Code-free ETL as a service:

Invest your time in building business logic and transforming data.

  • Design ETL without writing code
  • Copy data from on-premises and other clouds to Azure
  • Transform data in stages

Key Components:

An Azure Data Factory or Synapse Analytics workflow consists of the following components:

  • Pipelines
  • Activities
  • Datasets
  • Linked services
  • Data Flows
  • Integration Runtimes


The components work together to create a platform for composing data-driven workflows that move and transform data.
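To make the relationships concrete, here is a small conceptual sketch in Python. It is not the real ADF JSON schema: each component is trimmed to the single field that points at the next one (activity to dataset, dataset to linked service, linked service to integration runtime). Every name except the default Azure integration runtime, AutoResolveIntegrationRuntime, is hypothetical.

```python
# Conceptual model only: each component keeps just the field that references
# the next component. Names other than AutoResolveIntegrationRuntime are hypothetical.
integration_runtimes = {"AutoResolveIntegrationRuntime"}
linked_services = {"BlobStore": {"connectVia": "AutoResolveIntegrationRuntime"}}
datasets = {"InputCsv": {"linkedServiceName": "BlobStore"}}
pipelines = {
    "CopyPipeline": {"activities": [{"name": "CopyCsv", "inputs": ["InputCsv"]}]}
}


def trace(pipeline_name):
    """Follow references from each activity down to the compute that runs it."""
    chains = []
    for activity in pipelines[pipeline_name]["activities"]:
        for dataset_name in activity["inputs"]:
            ls_name = datasets[dataset_name]["linkedServiceName"]
            ir_name = linked_services[ls_name]["connectVia"]
            if ir_name not in integration_runtimes:
                raise KeyError(f"unknown integration runtime: {ir_name}")
            chains.append((activity["name"], dataset_name, ls_name, ir_name))
    return chains


for chain in trace("CopyPipeline"):
    print(" -> ".join(chain))
# CopyCsv -> InputCsv -> BlobStore -> AutoResolveIntegrationRuntime
```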


Linked services:

A linked service is much like a connection string: it defines the connection information that Data Factory needs to connect to external resources. A dataset represents the structure of the data, while a linked service defines how to connect to the data source.
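As a rough illustration, the dict below mirrors the JSON that Data Factory stores for an Azure Blob Storage linked service, following the shape shown in the Azure Data Factory documentation. The linked service name and the connection-string placeholders are hypothetical; in a real factory you would normally keep secrets in Azure Key Vault or use a managed identity rather than embedding an account key.

```python
import json

# Sketch of an Azure Blob Storage linked service definition.
# "AzureStorageLinkedService" and the <...> placeholders are hypothetical.
linked_service = {
    "name": "AzureStorageLinkedService",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            # The connection information, much like a connection string.
            "connectionString": (
                "DefaultEndpointsProtocol=https;"
                "AccountName=<accountname>;AccountKey=<accountkey>"
            )
        },
    },
}

print(json.dumps(linked_service, indent=2))
```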


Datasets:

A dataset is simply a named reference to the data you want to use as inputs and outputs in your activities. For example, a copy activity reads from a source dataset and writes to a destination (sink) dataset.
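A minimal sketch of a source dataset, again as a Python dict mirroring ADF JSON: it describes a comma-delimited file in Blob Storage and points at its linked service through a LinkedServiceReference. The dataset, linked service, container, folder, and file names are all hypothetical.

```python
import json

# Sketch of a delimited-text (CSV) dataset; all names are hypothetical.
input_dataset = {
    "name": "InputDataset",
    "properties": {
        # How to connect: a reference to the linked service.
        "linkedServiceName": {
            "referenceName": "AzureStorageLinkedService",
            "type": "LinkedServiceReference",
        },
        # What the data looks like and where it lives.
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "adftutorial",
                "folderPath": "input",
                "fileName": "emp.txt",
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": True,
        },
    },
}

print(json.dumps(input_dataset, indent=2))
```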


Activity:

Activities represent the processing steps in a pipeline. For example, a copy activity copies data from one data store to another.
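A copy activity, sketched the same way: the activity names its input and output datasets through DatasetReference entries and sets a source and sink type under typeProperties. The activity and dataset names are hypothetical, and the source and sink types assume a delimited-text file being copied into an Azure SQL Database table.

```python
import json

# Sketch of a Copy activity; activity and dataset names are hypothetical.
copy_activity = {
    "name": "CopyFromBlobToSql",
    "type": "Copy",
    # Datasets the activity reads from and writes to.
    "inputs": [{"referenceName": "InputDataset", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "OutputDataset", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {"type": "DelimitedTextSource"},  # read the CSV file
        "sink": {"type": "AzureSqlSink"},           # write into Azure SQL Database
    },
}

print(json.dumps(copy_activity, indent=2))
```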


You can add different types of activities, for example Data Flow, Lookup, Stored Procedure, or Databricks activities.

Pipeline:

A data factory can have one or more pipelines. A pipeline is a logical grouping of activities that together perform a unit of work; each activity in the pipeline performs one task.
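The sketch below shows a pipeline as a named list of activities chained with dependsOn, so that each step starts only after the previous one has Succeeded. The Lookup, Copy, and Stored Procedure chain and all names are hypothetical, and each activity's typeProperties is omitted for brevity. The run_order helper is a conceptual illustration, not how the ADF service schedules work: it only derives one valid execution order from the dependencies (in ADF, activities with no dependency between them can run in parallel).

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Sketch of a pipeline; names are hypothetical and each activity's
# typeProperties is left out to keep the example short.
pipeline = {
    "name": "IncrementalLoadPipeline",
    "properties": {
        "activities": [
            {"name": "LookupWatermark", "type": "Lookup"},
            {
                "name": "CopyNewRows",
                "type": "Copy",
                "dependsOn": [
                    {"activity": "LookupWatermark", "dependencyConditions": ["Succeeded"]}
                ],
            },
            {
                "name": "UpdateWatermark",
                "type": "SqlServerStoredProcedure",
                "dependsOn": [
                    {"activity": "CopyNewRows", "dependencyConditions": ["Succeeded"]}
                ],
            },
        ]
    },
}


def run_order(pipeline):
    """Return one valid order in which the activities could run."""
    # Map each activity to the set of activities it depends on.
    graph = {
        activity["name"]: {dep["activity"] for dep in activity.get("dependsOn", [])}
        for activity in pipeline["properties"]["activities"]
    }
    return list(TopologicalSorter(graph).static_order())


print(run_order(pipeline))
# ['LookupWatermark', 'CopyNewRows', 'UpdateWatermark']
```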


Triggers:

A trigger determines when a pipeline run should be kicked off. There are different types of triggers for different types of events, such as schedule, tumbling window, and event-based triggers. From the Add trigger option in the pipeline editor, you can either run the pipeline immediately (Trigger now) or attach a trigger that schedules it (New/Edit).
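As a sketch, here is a schedule trigger that starts a pipeline once a day, mirroring the ScheduleTrigger JSON shape from the Azure Data Factory documentation. The trigger name, pipeline name, and start time are hypothetical. The first_runs helper is only a rough illustration of what a simple fixed interval produces; it handles just Minute, Hour, and Day and ignores the advanced recurrence options (specific hours, minutes, or weekdays) that the real scheduler supports.

```python
from datetime import datetime, timedelta

# Sketch of a schedule trigger; names and start time are hypothetical.
trigger = {
    "name": "DailyTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",  # Minute, Hour, Day, Week or Month
                "interval": 1,       # every 1 day
                "startTime": "2024-01-01T02:00:00Z",
                "timeZone": "UTC",
            }
        },
        # The pipeline(s) this trigger starts.
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "CopyPipeline",
                    "type": "PipelineReference",
                },
                "parameters": {},
            }
        ],
    },
}

# Illustration only: supports the three simplest frequencies.
STEP = {"Minute": timedelta(minutes=1), "Hour": timedelta(hours=1), "Day": timedelta(days=1)}


def first_runs(recurrence, count):
    """Rough illustration: the first `count` fire times for a fixed interval."""
    start = datetime.fromisoformat(recurrence["startTime"].replace("Z", "+00:00"))
    step = STEP[recurrence["frequency"]] * recurrence["interval"]
    return [start + i * step for i in range(count)]


for run in first_runs(trigger["properties"]["typeProperties"]["recurrence"], 3):
    print(run.isoformat())
```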

Integration Runtime:

Azure Data Factory and Azure Synapse pipelines use the Integration Runtime (IR) as their compute infrastructure. An activity defines the action to be performed, and a linked service defines a target data store or compute service. The integration runtime provides the bridge between the two: it is referenced by the linked service or activity, and it provides the compute environment where the activity runs or from which it is dispatched.
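The sketch below shows how a linked service points at an integration runtime through connectVia. It pairs a self-hosted IR, the kind typically used to reach on-premises data, with a SQL Server linked service that uses it. The IR name, linked service names, and connection-string placeholder are hypothetical. The runtime_for helper reflects the default behaviour that a linked service without connectVia uses the default Azure IR, AutoResolveIntegrationRuntime.

```python
# Sketch only; all names and the connection string are hypothetical.
self_hosted_ir = {
    "name": "OnPremSelfHostedIR",
    "properties": {"type": "SelfHosted", "description": "Runs inside the corporate network"},
}

sql_server_linked_service = {
    "name": "OnPremSqlServer",
    "properties": {
        "type": "SqlServer",
        "typeProperties": {
            "connectionString": "<connection string to the on-premises SQL Server>"
        },
        # The integration runtime that provides compute for this connection.
        "connectVia": {
            "referenceName": "OnPremSelfHostedIR",
            "type": "IntegrationRuntimeReference",
        },
    },
}

blob_linked_service = {
    "name": "AzureStorageLinkedService",
    # typeProperties trimmed; note there is no connectVia here.
    "properties": {"type": "AzureBlobStorage", "typeProperties": {}},
}

DEFAULT_AZURE_IR = "AutoResolveIntegrationRuntime"


def runtime_for(linked_service):
    """Name of the integration runtime a linked service uses."""
    connect_via = linked_service["properties"].get("connectVia")
    return connect_via["referenceName"] if connect_via else DEFAULT_AZURE_IR


print(runtime_for(sql_server_linked_service))  # OnPremSelfHostedIR
print(runtime_for(blob_linked_service))        # AutoResolveIntegrationRuntime
```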

Happy reading & learning.


Prathamesh Wanjari

Prathamesh works as an Associate Technical Consultant at Perficient in the Nagpur GDC, India. He works on the data solutions team and has a firm knowledge of Azure Data Factory, Azure Databricks, and Azure Synapse Analytics.
