Data Architecture: 2.5 Types of Modern Data Integration Tools

As we move into the era of modern cloud data architecture, enterprises are deploying two primary classes of data integration tools to handle the traditional ETL and ELT use cases.

The first type of data integration tool is the GUI-based data integration solution.

Talend, InfoSphere DataStage, Informatica, and Matillion are good examples. These tools use a UI either to configure a data integration engine or to compile code for data integration. GUI integration tools promise fast, friendly user interfaces for rapidly creating new data pipelines. GUI-based data integration tools also have a proven record of increasing developer productivity. They are good for organizations that have:

  1. Many data integration pipelines to manage.
  2. Complex MDM requirements and business rules that need to integrate into data pipelines.
  3. A ubiquitous relational database ecosystem.
  4. Requirements to move data to and from cloud platforms (e.g., AWS, Azure, GCP).
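To make the "configure a data integration engine" idea concrete, here is a minimal Python sketch. It assumes a hypothetical pipeline definition (`PIPELINE_SPEC`) of the kind a GUI tool might save behind the scenes, plus a small generic engine (`run_spec`) that interprets it. The step names, column names, and sample data are invented for illustration and do not reflect how any of the products named above actually store or execute pipelines.

```python
import csv
import io

# Hypothetical pipeline definition, standing in for what a GUI tool might save
# after a developer configures steps on a canvas.
PIPELINE_SPEC = [
    {"step": "rename", "from": "cust_nm", "to": "customer_name"},
    {"step": "filter", "column": "country", "equals": "US"},
]


def run_spec(rows, spec):
    """Generic engine: apply each configured step to a list of dict rows."""
    for step in spec:
        if step["step"] == "rename":
            rows = [
                {(step["to"] if key == step["from"] else key): value
                 for key, value in row.items()}
                for row in rows
            ]
        elif step["step"] == "filter":
            rows = [row for row in rows if row[step["column"]] == step["equals"]]
        else:
            raise ValueError(f"unknown step: {step['step']}")
    return rows


# Illustrative source data; a real pipeline would read from a database or file.
SOURCE = "cust_nm,country\nAda,US\nLin,CA\n"
rows = list(csv.DictReader(io.StringIO(SOURCE)))
result = run_spec(rows, PIPELINE_SPEC)
print(result)
```

The point of the sketch is the split of responsibilities: the developer edits only the declarative definition, while the engine code stays the same across pipelines.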

The second type of data integration tool is the script/code-based data integration solution.

Script/code-based data integration leverages a series of tools to develop a data pipeline. This capability usually requires:

  1. A programming language such as Python or Scala.
  2. A data processing framework such as Spark.
  3. An orchestration tool such as Apache Airflow.

Code/scripts are constructed as vertices (nodes) using a programming language and framework. The orchestration tool then arranges these vertices into Directed Acyclic Graphs (DAGs). DAGs can scale to handle very large data pipelines (think tens of terabytes per day). DAGs are also extremely useful for handling the customized or complex processing that one would see in Artificial Intelligence or Machine Learning use cases.
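The vertex-and-DAG structure described above can be sketched in plain Python. This is a minimal illustration, not an Airflow or Spark example: it uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) in place of a real orchestration tool. The task functions, the `DEPENDENCIES` mapping, and the `run_pipeline` helper are all hypothetical names chosen for the sketch.

```python
from graphlib import TopologicalSorter

# Hypothetical task functions; each one stands in for a vertex of the pipeline.
def extract(ctx):
    ctx["raw"] = [" 3", "1 ", "2"]


def transform(ctx):
    ctx["clean"] = sorted(int(value.strip()) for value in ctx["raw"])


def load(ctx):
    ctx["loaded"] = list(ctx["clean"])


TASKS = {"extract": extract, "transform": transform, "load": load}

# Each key maps to the set of tasks it depends on (its upstream vertices).
DEPENDENCIES = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}


def run_pipeline(tasks, dependencies):
    """Run tasks in dependency order; a cycle raises graphlib.CycleError."""
    ctx = {}
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name](ctx)
    return order, ctx


order, ctx = run_pipeline(TASKS, DEPENDENCIES)
print(order)
print(ctx["loaded"])
```

Because the graph must be acyclic, a dependency loop is rejected before any task runs. A production orchestrator adds scheduling, retries, and distributed execution on top of this same ordering idea.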

The 0.5: Cloud Native

When I was initially socializing the idea for a blog post on the two types of cloud ETL, a counterpart asked, “What about cloud-native?” Good question! The cloud-native options are just flavors of the two types of data integration. For instance, AWS Glue and Google Dataproc have UIs that generate code (e.g., Python and Scala). Unlike their legacy counterparts, which offer rich UI functionality, these cloud-native tools still require editing the generated code. The cloud-native tools are quickly catching up, but they still need to add significant functionality to their UIs before they can deliver the same productivity gains as traditional GUI-based solutions.

About the Author

Bill is a Director and Senior Data Strategist leading Perficient's Big Data Team. Over his 27 years of professional experience, he has helped organizations transform their data management, analytics, and governance tools and practices. As a veteran in analytics, Big Data, data architecture, and information governance, he advises executives and enterprise architects on the latest pragmatic information management strategies. He is keenly aware of how to advise and lead companies through developing data strategies, formulating actionable roadmaps, and delivering high-impact solutions. As one of Perficient’s prime thought leaders for Big Data, he provides the visionary direction for Perficient’s Big Data capability development and has led many of our clients' largest Data and Cloud transformation programs. Bill is an active blogger and can be followed on Twitter @bigdata73.
