One of Azure’s growing number of data platform services, Azure Data Factory (ADF) lets you execute and orchestrate Big Data workflows.
Azure Data Factory lets you manage and produce information by offering an easy way to create, orchestrate, and monitor data pipelines. Activities are combined into pipelines, and those pipelines move and transform structured, semi-structured, and unstructured data between data sources both inside and outside the Azure ecosystem. Together, these pipelines and their activities make up a “factory”.
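
To make the activity, pipeline, and factory hierarchy concrete, here is a minimal Python sketch. It is a conceptual model only, not the Azure Data Factory API or its pipeline definition format: the classes `Activity`, `Pipeline`, and `Factory`, the `execution_order` method, and all dataset and activity names are hypothetical. It assumes that activities are chained through datasets, so an activity can run once every dataset it reads is available.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    """Hypothetical processing step: reads input datasets, writes output datasets."""
    name: str
    inputs: list[str]
    outputs: list[str]


@dataclass
class Pipeline:
    """Hypothetical named group of activities."""
    name: str
    activities: list[Activity] = field(default_factory=list)

    def execution_order(self, available: set[str]) -> list[str]:
        """Return activity names so that each one runs only after all of its
        input datasets exist. `available` holds datasets that already exist
        (for example, source tables or blobs)."""
        ready = set(available)
        pending = list(self.activities)
        order = []
        while pending:
            # Activities whose inputs are all present can run now.
            runnable = [a for a in pending if all(i in ready for i in a.inputs)]
            if not runnable:
                raise ValueError("unsatisfiable or circular dataset dependencies")
            for activity in runnable:
                order.append(activity.name)
                ready.update(activity.outputs)  # outputs become inputs downstream
                pending.remove(activity)
        return order


@dataclass
class Factory:
    """Hypothetical top-level container for pipelines."""
    name: str
    pipelines: list[Pipeline] = field(default_factory=list)


# Illustrative usage with made-up dataset and activity names.
copy = Activity("CopySqlToBlob", inputs=["OnPremSqlOrders"], outputs=["RawOrdersBlob"])
hive = Activity("HiveAggregate", inputs=["RawOrdersBlob"], outputs=["DailySales"])
pipeline = Pipeline("OrdersPipeline", [hive, copy])
factory = Factory("SalesFactory", [pipeline])

print(pipeline.execution_order({"OnPremSqlOrders"}))  # ['CopySqlToBlob', 'HiveAggregate']
```

Even though the Hive step is listed first, the copy step runs first because it produces the dataset the Hive step reads.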

For instance, you can connect to your on-premises SQL Server, Azure SQL Database, or Azure Table or Blob storage and create data pipelines that process that data, using either Hadoop tools such as Hive and Pig scripts or custom C# code. Those pipelines then feed the results into other processes and tools. Ultimately, the output of a “factory” is raw data assets transformed into trusted information that can be shared broadly with BI and analytics tools.
In the context of Microsoft’s Cortana Analytics Suite, Azure Data Factory is essentially the circulatory system. By providing connectivity and data movement to and from internal and external data sources, as well as between the Cortana Analytics Suite tools (e.g., Azure Machine Learning and HDInsight), ADF is a critical piece of virtually any Azure-based analytics solution.
A particularly attractive feature of ADF, compared with the command-line tools typically used for these operations in the open-source Big Data world, is its holistic monitoring and management experience across pipelines. It shows each pipeline’s data production and traces data lineage back to the source systems, which makes debugging much easier.
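
Lineage tracing of this kind amounts to walking a dependency graph upstream from an output dataset. The following Python sketch is a hypothetical illustration of that idea, not how ADF’s monitoring tools are implemented: the `produced_by` mapping, the `trace_sources` function, and every dataset and activity name are invented for the example. It does a breadth-first walk from a dataset back to the source datasets that no activity produces.

```python
from collections import deque

# Hypothetical lineage graph: each derived dataset maps to the activity that
# produced it and that activity's input datasets. Names are illustrative only.
produced_by: dict[str, tuple[str, list[str]]] = {
    "RawOrdersBlob": ("CopySqlToBlob", ["OnPremSqlOrders"]),
    "DailySales": ("HiveAggregate", ["RawOrdersBlob", "ProductCatalog"]),
    "SalesReport": ("PublishToBI", ["DailySales"]),
}


def trace_sources(dataset: str, produced_by: dict[str, tuple[str, list[str]]]):
    """Walk upstream from `dataset` breadth-first.

    Returns (activities visited in order, set of source datasets), where a
    source dataset is one that no activity in the graph produces.
    """
    activities: list[str] = []
    sources: set[str] = set()
    seen = {dataset}
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        if current not in produced_by:
            sources.add(current)  # nothing upstream: this is a source system
            continue
        activity, inputs = produced_by[current]
        activities.append(activity)
        for upstream in inputs:
            if upstream not in seen:  # avoid revisiting shared inputs
                seen.add(upstream)
                queue.append(upstream)
    return activities, sources


activities, sources = trace_sources("SalesReport", produced_by)
print(activities)       # ['PublishToBI', 'HiveAggregate', 'CopySqlToBlob']
print(sorted(sources))  # ['OnPremSqlOrders', 'ProductCatalog']
```

Starting from `SalesReport`, the walk passes through each producing activity and stops at the two source datasets, which is the path you would follow when a downstream report looks wrong.
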
Although you may be getting some “SSIS in the cloud” vibes, that loose analogy is about where the similarities end. SSIS provides many more prebuilt functions and data transformations, and it relies on SQL Server infrastructure (physical or virtual, on-premises or in the cloud) to operate. Azure Data Factory does include some built-in capabilities, but it is mainly about cloud-based data movement, and its transformations are Activities built with C#, Hive, or Pig rather than built-in transformation tasks.
 
Stay tuned for more Azure Did You Know posts. Contact us at Perficient to have a certified Azure consultant help envision your solution today!

