Power Query Ideology
Power Query is Microsoft’s answer to the above demand of industry. The core of Power Query is Mashup Engine which has the ability of ETL (Extract –> Transform –> Load). Mashup engine has it’s own scripting termed M Language scripting. It is GUI based tool with the ability to auto-generate scripting for the transformation. Majority of transformation can be done via toolbar available. Just handful of scenario requires manually tweaking of M script via its editor window.
At a Glance
Power Query supports extraction of data from massive number of sources (around 75+). At present handful of ETL solutions exists, which supports querying data from so many data sources. So, some of popular sources logically categorized are as below
- Database (like SQL Server, MySQL, PostgreSQL, Oracle Server, IBM DB2, etc)
- Files (Excel, CSV, PDF, JSON, XML, etc)
- ERP / CRM (Salesforce, Dynamics 365, Quickbooks, Zoho, etc)
- Big Data (like Azure Bricks, Parquet, Apache Spark, Hadoop, etc)
- External Scripting (Python, R script)
- Social Media & Online Feeds (Linked-in Sales, Google Analytics, etc)
- Misc (Web, ODBC, Sharepoint, OData, etc)
Choosing a Global Software Development Partner to Accelerate Your Digital Strategy
To be successful and outpace the competition, you need a software development partner that excels in exactly the type of digital projects you are now faced with accelerating, and in the most cost effective and optimized way possible.
Transformation or Data Shaping is the crucial task performed by M engine. M engine library has lot many functions (around 250+), which have been categorized logically based on its type. Function follows syntax of category_name.function_name just similar to .NET Library style. Result of each step of transformation is stored into variable, which is then referenced by next step in a linear manner. It allows user both GUI based interface (for Basic User) & Raw Editor window (for Advance User) to write M code for transformation.
The final role of Power Query after transformation is returning output to primary service / software which invoked Power Query for ETL services. Currently following are some popular services/products of Microsoft, which utilizes Power Query engine:
Power BI: Power BI is a Data Analysis & Visualization tool supporting DAX queries. But the basic data in its model comes from Power Query, which is an integral part of it.
Excel: Power Query was shipped as an extension for Excel 2010/2013. But with increasing usage, Microsoft embedded it inside excel for ease. Limited number of data sources connectors are available in Excel for Power Query.
Power Apps: Power Apps too support Power Query based ETL via Dataflows, which can be automated to schedule. It is used for handy data insertion/updation in Dataverse or Common Data Service.
Azure Data Factory: Data Factory is an ETL tool for Azure products. It too supports Power Query as one of the transformation tools for its tasks.
Embedded / Plugged Module: Power Query is not circulated as an independent software, instead it is embedded as add-on inside the main solution. Like in Excel 2010/2013, Power Query was circulated as an optional Excel extension. From Office 2016 onward, Microsoft embedded it inside Excel. Azure Data Factory itself is an ETL solution. But it too contains Power Query plugged into it, for transformation.
Step-by-Step Transform / Result evaluation: In real-life, we break any long or complicated task into multiple smaller divisible tasks, for easy management of solution and fool-proof results. Power Query also follows the same rule. Every transformation operation is treated as a step, displaying result post-processing of that step for evaluation.
Source independent transformation: In ETL, data can arrive from various sources. So, there are high chances that structurally data from two different source might be totally different like Tabular vs Tree structure, posing difficulty in co-relating it. Traditionally transformation language for tabular & tree structure is different like for tabular SQL is preferred, where-as for tree structure XPath / XQuery is mostly used. Power Query removes burden of knowing all these languages.
Object Structure: Power Query supports extraction of data from Object (i.e. Key-Value) type of source. JSON is widely used format for data exchange with increasing development of web-based solutions. Power Query provides support for transformation of such types of object, there-by reducing headache of writing code for transformation of such data types.
Reusable Scripting: Power Query supports converting partial portion of scripting into a reusable function, thereby reducing amount of scripting efforts. It supports function capability similar to software development languages.
Recursion for scanning Hierarchy: Many of the ETL solutions generally does not support recursion, which is typically required for tree structure of data sources, where decoding hierarchy is crucial many times. Power Query supports calling M Language functions recursively to scan the tree.
Automatic SQL generation: Power Query supports generating SQL query based on the transformation steps, when Direct Query mode is selected (applicable to selected RDBMS only), for selective actions of transformation supported by underlying database server. This is a welcome feature for non-technical background person, who may find composing SQL query a bit difficult.
Lack of Optimization: Power Query performs well up to a certain threshold of data volume. But with large datasets, it lags. Database Server utilizes technique of indexation for optimized output, which Power Query lacks. Also, caching of results is also missing in Power Query & it keeps on re-calculating the same transformation numerous times (just like Excel). Direct Query mode, which uses Database side optimization for Power Query is also having limited support.
High Memory Usage: Power Query attempts to perform calculation in-memory with minimal disk usage. This gives performance improvement when data size is small to medium. But with increasing volume of data, it eats up considerable memory (RAM) of system, eventually depriving other applications from memory.
In-progress Documentation: Microsoft has official website MSDN (Microsoft Developer Network) for all its products / services, containing robust documentation along with examples. Power Query portion of documentation is currently in progress, so developer needs to rely on Power BI community platform for knowing any function in-depth.