Analytics

Power Query – Intro

Istock 1267191201

Background

In 21st century era, data is the most valuable and critical asset of any business. To convert this data into knowledge, analytical processes are required to be applied on data. Every level of management requires analysis of data, but its context is different for all. Previously data analytics was primarily done by business ERP itself using programming languages & SQL queries. But this approach requires technical knowledge. So, this created demand for less technical scripting language for data analysis, which is easy to learn for people with non-technical background & focus more on transforming of data for quick results.

Power Query Ideology

Power Query is Microsoft’s answer to the above demand of industry. The core of Power Query is Mashup Engine which has the ability of ETL (Extract –> Transform –> Load). Mashup engine has it’s own scripting termed M Language scripting. It is GUI based tool with the ability to auto-generate scripting for the transformation. Majority of transformation can be done via toolbar available. Just handful of scenario requires manually tweaking of M script via its editor window.

At a Glance

Power Query ETL Diagram

All icons used in the image are for educational purpose 1) Microsoft Inc & Salesforce.com reserve exclusive ownership of their respective product icons used in image 2) Open-Source icon credit icons8.com

Extract

Power Query supports extraction of data from massive number of sources (around 75+). At present handful of ETL solutions exists, which supports querying data from so many data sources. So, some of popular sources logically categorized are as below

  • Database (like SQL Server, MySQL, PostgreSQL, Oracle Server, IBM DB2, etc)
  • Files (Excel, CSV, PDF, JSON, XML, etc)
  • ERP / CRM (Salesforce, Dynamics 365, Quickbooks, Zoho, etc)
  • Big Data (like Azure Bricks, Parquet, Apache Spark, Hadoop, etc)
  • External Scripting (Python, R script)
  • Social Media & Online Feeds (Linked-in Sales, Google Analytics, etc)
  • Misc (Web, ODBC, Sharepoint, OData, etc)

Power Query Data Sources

Transform

Transformation or Data Shaping is the crucial task performed by M engine. M engine library has lot many functions (around 250+), which have been categorized logically based on its type. Function follows syntax of category_name.function_name just similar to .NET Library style. Result of each step of transformation is stored into variable, which is then referenced by next step in a linear manner. It allows user both GUI based interface (for Basic User) & Raw Editor window (for Advance User) to write M code for transformation.

Power Query Screenshot

Load

The final role of Power Query after transformation is returning output to primary service / software which invoked Power Query for ETL services. Currently following are some popular services/products of Microsoft, which utilizes Power Query engine:

  • Power BI: Power BI is a Data Analysis & Visualization tool supporting DAX queries. But the basic data in its model comes from Power Query, which is an integral part of it.

  • Excel: Power Query was shipped as an extension for Excel 2010/2013. But with increasing usage, Microsoft embedded it inside excel for ease. Limited number of data sources connectors are available in Excel for Power Query.

  • Power Apps: Power Apps too support Power Query based ETL via Dataflows, which can be automated to schedule. It is used for handy data insertion/updation in Dataverse or Common Data Service.

  • Azure Data Factory: Data Factory is an ETL tool for Azure products. It too supports Power Query as one of the transformation tools for its tasks.

 

Pros

  1. Embedded / Plugged Module: Power Query is not circulated as an independent software, instead it is embedded as add-on inside the main solution. Like in Excel 2010/2013, Power Query was circulated as an optional Excel extension. From Office 2016 onward, Microsoft embedded it inside Excel. Azure Data Factory itself is an ETL solution. But it too contains Power Query plugged into it, for transformation.

  2. Step-by-Step Transform / Result evaluation: In real-life, we break any long or complicated task into multiple smaller divisible tasks, for easy management of solution and fool-proof results. Power Query also follows the same rule. Every transformation operation is treated as a step, displaying result post-processing of that step for evaluation.

  3. Source independent transformation: In ETL, data can arrive from various sources. So, there are high chances that structurally data from two different source might be totally different like Tabular vs Tree structure, posing difficulty in co-relating it. Traditionally transformation language for tabular & tree structure is different like for tabular SQL is preferred, where-as for tree structure XPath / XQuery is mostly used. Power Query removes burden of knowing all these languages.

  4. Object Structure: Power Query supports extraction of data from Object (i.e. Key-Value) type of source. JSON is widely used format for data exchange with increasing development of web-based solutions. Power Query provides support for transformation of such types of object, there-by reducing headache of writing code for transformation of such data types.

  5. Reusable Scripting: Power Query supports converting partial portion of scripting into a reusable function, thereby reducing amount of scripting efforts. It supports function capability similar to software development languages.

  6. Recursion for scanning Hierarchy: Many of the ETL solutions generally does not support recursion, which is typically required for tree structure of data sources, where decoding hierarchy is crucial many times. Power Query supports calling M Language functions recursively to scan the tree.

  7. Automatic SQL generation: Power Query supports generating SQL query based on the transformation steps, when Direct Query mode is selected (applicable to selected RDBMS only), for selective actions of transformation supported by underlying database server. This is a welcome feature for non-technical background person, who may find composing SQL query a bit difficult.

Cons

  1. Lack of Optimization: Power Query performs well up to a certain threshold of data volume. But with large datasets, it lags. Database Server utilizes technique of indexation for optimized output, which Power Query lacks. Also, caching of results is also missing in Power Query & it keeps on re-calculating the same transformation numerous times (just like Excel). Direct Query mode, which uses Database side optimization for Power Query is also having limited support.

  2. High Memory Usage: Power Query attempts to perform calculation in-memory with minimal disk usage. This gives performance improvement when data size is small to medium. But with increasing volume of data, it eats up considerable memory (RAM) of system, eventually depriving other applications from memory.

  3. In-progress Documentation: Microsoft has official website MSDN (Microsoft Developer Network) for all its products / services, containing robust documentation along with examples. Power Query portion of documentation is currently in progress, so developer needs to rely on Power BI community platform for knowing any function in-depth.

 

Thoughts on “Power Query – Intro”

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Dhananjay Gokhale

Dhananjay is a BI Developer with over 8 years of experience in Database & MIS Reporting for corporates. He loves developing open-source projects & has trained over 1,000 people in Excel.

More from this Author

Categories
Follow Us
TwitterLinkedinFacebookYoutubeInstagram