Core of any BI tools is to acquire business data from multiple sources and then to co-relate it for quality reports & dashboards. Information Technology is being dominantly used for business processes from last 20 years. Volume of data has grown a lot. Businesses have shifted to enterprise database platforms like SQL Server, Oracle, etc, which have capability to store, fetch & process voluminous amount of data efficiently and in optimized manner.
Modes of connectivity in Power BI
Power BI supports 2 modes to connect with data, Import & Direct Query
Import Mode: In this mode, Power BI connects with underlying data source & downloads entire data from the datasource. This data is stored in Power BI model. Fresh copy of this data can be downloaded by pressing Refresh button. PBIX file internally stores model data in compressed format. This published datset model on Power BI Service, internally is stored on Common Data Model, which is sort of Azure Managed SQL Server instance in the backend.
Direct Query Mode: In this mode, Power BI connects to data source, but does not download entire data from the datasource. Instead, it generates SQL Query (or any equivalent) to generate data for specific visualization & fires it to underlying datasource utilizing optimization feature of underlying datasource engine.
|Storage||Stores data into model||Do not store data|
|Volume||At max can store 1 GB data in model||No restriction|
|Performance||Fast with less volume, but degrades when volume increases||Fast if proper indexes are created on database, otherwise might underperform|
|Compatibility||Compatible with every type of data source||Supported for Database Server type of data source|
|RLS (Row Level Security)||Supports RLS||Supports RLS. But proper care needs to be taken since one might use Service Accounts to connect to database which practically impersonates identity of Power BI user|
|DAX & Transformations||Supports all the Power Query transformations & DAX||Supports only those transformations & DAX, for which Power BI is able to generate equivalent SQL Query|
|Refresh Schedule||User have option to set automatic refresh of data on scheduled interval based on subscription||Since data is not stored in model, concept of scheduling refresh is not applicable. Data is fetched from server when user opens report|
|Availability||If data refresh fails due to unavailability of datasource, last data persisting in model is used to prepare visuals||If data refresh fails due to unavailability of datasource, then entire report goes blank, as data is not stored in model. No option to go back to previous state|
Why Direct Query outperforms Import, in majority scenario ?
To know reason behind this, one needs to understand how visuals are prepared. Any visual, be it a Column Chart, Line Chart, Card, Matrix, etc, needs an underlying tabular data. This tabular data is visually arranged/plotted by Power BI on the visual.
Power BI most of the time is doing below mentioned operations:
Let’s understand this in detail. When we drag any field in row/column of matrix/table or X/Y/Legend axis of Column/Line chart, internally Power BI is grouping the data to arrive at unique values in those fields. Upon dragging any field or measure into values box, internally Power BI applies aggregation operation like SUM,COUNT,AVERAGE,MIN,MAX etc. When multiple tables are connected using relationship, upon dragging any of the field/measure from those columns, internally merging of those tables is done by Power BI in the background. And upon selection of value(s) in slicer or filters, filtering operation is done.
Power BI has 2 mighty hands, DAX (powered by SQL Server Analysis Service) and M Script (powered by Power Query), to perform the ETL jobs and visualization related calculations. Both of these engines are efficient with their way of performing calculations. But when it comes to performing calculations in an optimized way, both under-perform compared to database engines, since they are in-memory calculation engines best suited to play with data. They lack indexing of data, which database handles while storing of data. Also, database engine re-uses query results of last few queries, by keeping track of changing data, which is something complex for both DAX & Power Query. As the data size grows, performance difference is quite noticeable.
In which scenario Import mode outperforms Direct Query ?
It’s a myth that Direct Query is always faster than Import. There can be scenarios wherein reverse is observed
Not using indexing feature of database. This might create situation wherein database server is taking too much time to read data from disk, whereas in-memory calculations by Power BI itself might work a little faster.
Running database server with very low resource (like CPU, RAM, etc). In this scenario, database server might be badly struggling with resources to fetch data. Although Power BI Service runs on shared Azure resources, it might outperform in this case.
Many concurrent users querying database. Power BI practically fires multiple SQL queries for multiple visuals. If there are locks on table, then it may lead to long wait thereby freezing Power BI visuals.
Long connection time to database. This happens when database connection request is made to on-premises server over VPN having high latency. If volume of data is not quite large, then it would be wise to import data instead of direct query. With import mode, one always enjoys good service of Azure resource in back-end on Power BI Service.
Is Direct Query real-time ?
No, one should make a distinction between Live and Realtime. Direct Query is a live connection. Whenever report is opened, fresh connection to database server is established to pull the latest data. But the visual will not automatically like a security/share market ticker. After opening screen, it will become static. Only after pressing refresh or re-opening of the report will it load fresh data.
Thank you , Basics and how to choose these methods very well explained. Also waiting for notes
Informative read. Thank you for insights !!