
Archive for the ‘SQL Server’ Category

Why Does Data Warehousing Take So Long? Part 2

In my last post, I wrote about BI's reputation for being a long, drawn-out process. And of course, where the market sees a vacuum… A plethora of products and processes exist to aid with "shortcutting" BI/DW development. So this time, I want to look at some common factors at play when you use the various types of frameworks, accelerators, and off-the-shelf products. I also want to point out how you can easily miss the boat on the long-term benefits of a BI solution by depending too much on these tools.

The market contains some pretty sophisticated enterprise metadata management tools, with full-featured user interfaces that provide drag-and-drop interaction with connected data sources, auto-generate all kinds of artifacts, and so on. On that end of the spectrum, you enter the realm of data virtualization and cloud-based solutions, big vendors with huge brand names (e.g. Teradata, Informatica, IBM Cognos), Big Data, etc. Although this class of tool does significantly more than just generically speed up the BI process, these products do offer a collection of "accelerator" features. Down the cost scale, other tools in this segment are a little more "black box," and will, say, create ETL from scratch based on comparing data models (like WhereScape RED), or generate dashboards by just pointing the tool at a set of tables (like Tableau or QlikView). And still others are merely frameworks, either ETL or front-end building blocks that essentially allow you to Lego yourself a BI solution (e.g. the growth of BIML for SSIS-based ETL).


Visualization options with Microsoft

I’ve been speaking to a lot of clients lately about the visualization capabilities of the Microsoft BI platform and want to clarify a point of confusion. When building an enterprise analytics platform you will be faced with several decisions around architecture as well as delivery. The architectural options will be vetted by your IT department, but in large part they will be driven by how you want to deliver and consume information in your organization. Typically there will be a balance between ‘traditional’ BI delivery and ‘self-service’ BI delivery.

What's the difference? Traditional BI delivery comes in the form of reports and dashboards that are built by your IT department with tools such as SSRS or PerformancePoint. Both are solid tools with a lot of functionality. At the same time, most organizations are looking for ways to reduce their dependency on IT-built reports and therefore need a technology that enables their business users to be self-sufficient. This comes in the form of Excel with Power Pivot and Power View.

A complete explanation of these new tools can be found here.

Feel free to contact us on how these tools can be used in your enterprise to deliver pervasive insights!

Advanced analytics in healthcare with Epic, SQL Server and Azure

Over the past few months, we have released a lot of information on building analytic platforms in healthcare. Several members of my team have played key architectural roles in not only implementing the Cogito platform and performing readmission analysis with it, but also expanding the platform to include customer satisfaction data from Press Ganey.

These functions were deemed critical to the initial phases of these projects, but they are largely 'back-end' architectural efforts. They do not address the ad-hoc analysis needs of the business, the delivery technologies available, much less the predictive capabilities that can be added to the platforms.

Fortunately there are a lot of new technologies in the Microsoft stack to address these needs.

As part of our advisory services to help our clients understand what new capabilities they have with their new platforms, we regularly build concept visualizations. The following videos are examples of out-of-the-box capabilities we built for one of our clients utilizing:

Self-service analytics with Power Pivot and Power View

3D visualizations with Power Map

And finally natural language query processing in the cloud with Q&A in Power BI

These technologies are well known and are being leveraged within several of our large clients, but a couple of recent announcements from Microsoft introduce even more exciting capabilities.

Power View now supports forecasting. This is a great new addition, currently available in the HTML5 version of Power View in Power BI. It gives the user the ability to quickly forecast a trend line, account for seasonality, and even adjust the confidence intervals of the calculation. Below is a screenshot of some readmission forecasting being performed on the dataset from the earlier videos.

Forecasting

It is important to note that you not only see the forecasted line (the light blue line that runs through the gray box in the top chart), but the second chart also shows the hindcasting feature, which lets a user start a forecast in the past in order to see how accurate it would have been against real data (the light blue line to the left of the gray box in the second chart).

While valuable and easy to use, this technology doesn’t give us the ability to predict who is at risk of readmitting. For that, we need a more powerful tool.

Azure Machine Learning Services is a recently announced cloud service for the budding data scientist. Through a drag-and-drop interface, you can now build predictive model experiments, train and score the models, and even evaluate the accuracy of different algorithms within your model.

The screenshot below shows an experiment that was built against the same readmission data used in the forecasting example (Epic Cogito dataset). The dataset was modified to flatten multiple patient admissions onto one record and included the following attributes as well as some others:

Attributes

The experiment was then created to compare two different classification algorithms, a boosted decision tree vs. a logistic regression. *Note that this blog is not intended to debate the accuracy or appropriate use of these particular algorithms. These were just the two I used.

Model

Once the experiment is complete and evaluated, a simple visual inspection shows the accuracy gains one algorithm has over the other.

Results

After some tweaking (and this model still needs it), there is a simple process to create a web service with an associated API key, which you can use to integrate the model into a readmission prediction application, one that accepts single-record or batch inputs.

API
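To make that integration step concrete, here is a minimal PowerShell sketch of calling such a published web service for a single record. The endpoint URL, API key, and input columns are placeholders for whatever your own experiment exposes, and the request/response shape follows the general Azure ML request-response pattern, so treat it as illustrative rather than copy-paste code.

```powershell
# Sketch only: score one patient record against a published Azure ML web service.
# The endpoint, API key, and column names below are placeholders, not values from the post.
$apiKey   = "<your-api-key>"
$endpoint = "https://<region>.services.azureml.net/workspaces/<ws-id>/services/<service-id>/execute?api-version=2.0&details=true"

$body = @{
    Inputs = @{
        input1 = @{
            ColumnNames = @("Age", "LengthOfStay", "PriorAdmissions", "DischargeDisposition")
            Values      = @(, @("67", "5", "2", "Home"))   # one illustrative record
        }
    }
    GlobalParameters = @{}
} | ConvertTo-Json -Depth 5

$result = Invoke-RestMethod -Uri $endpoint -Method Post -Body $body `
    -ContentType "application/json" -Headers @{ Authorization = "Bearer $apiKey" }

# The scored label/probability comes back as a table in the response.
$result.Results.output1.value.Values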

As you can see, there are a number of options for introducing advanced analytics into your healthcare environment. Feel free to contact me with questions on how these tools can be put to work in your new healthcare analytics platform.

Insights on SQL Server 2014 Data Warehousing Edition

For anyone who is thinking about selecting the Data Warehouse edition of SQL Server 2014, I just want to highlight a few things required to install this product and get it up and running.

First off though, what is SQL 2014 Data Warehousing Edition? In short, it is a version of SQL Server that is available as an image on an Azure VM, and the product seems to be flying a little bit under the radar. In terms of licensing and features, it is closest to Enterprise Edition and is similar to BI Edition. It houses the full stack of BI products, and it also allows for database snapshots like Enterprise Edition. The biggest single difference I can find is that it is optimized to use Azure Storage in the cloud. Interesting, no? I see its primary purpose as replacing an existing on-premises data warehouse, or serving as a starting point for a new data warehouse that will be fairly large.

I won't go into provisioning a cloud VM in this blog, but if you want more info, here's a link:

http://msdn.microsoft.com/library/dn387396.aspx

Ok, on to some tech points and what to expect:

First and foremost, this edition's minimum recommended VM size is an A7. Whoa!

Pretty steep for a minimum spec: an A7 is 8 cores with 56 GB of RAM. We all know minimum specs are just that, the bare minimum, and usually we end up going larger.

If you are unfamiliar with Azure VM sizing, take a look here:

http://msdn.microsoft.com/en-us/library/azure/dn197896.aspx
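For orientation, here is a minimal sketch (using the classic Service Management Azure PowerShell cmdlets) of creating a VM at that A7 size. The service name, VM name, credentials, and location are placeholders, and the image filter is only an assumption about how the gallery image is labeled; check the gallery for the actual SQL Server 2014 Data Warehousing image name.

```powershell
# Sketch only: find the gallery image and stand up an A7 VM from it.
# All names, credentials, and the image label filter below are placeholders.
$image = Get-AzureVMImage |
    Where-Object { $_.Label -like "*SQL Server 2014*Data Warehousing*" } |
    Select-Object -First 1

New-AzureQuickVM -Windows `
    -ServiceName   "mydwservice" `
    -Name          "mydwvm" `
    -ImageName     $image.ImageName `
    -InstanceSize  "A7" `
    -AdminUsername "dwadmin" `
    -Password      "<strong-password>" `
    -Location      "East US"
```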

Second, even to do a basic install, it is going to require that you have several 1-terabyte storage locations available for it to harness in Azure Storage. Double whoa!

When you first log in to this VM, you will not be able to connect SSMS to the SQL instance. Instead, you are prompted to configure storage containers for SQL 2014 DW Edition. This can be done in the Portal, or it can be done via Azure PowerShell, and it is documented quite well here:

http://msdn.microsoft.com/library/dn387397.aspx

In a nutshell, it is quite easy to attach the disks through the Portal application on Azure: you just browse to your VM and click "Attach" at the bottom of the screen. The VM will reboot, and you can then confirm the process in the logs listed in the link above. But as I mentioned earlier, you will know when it is up and running, because you will get a login error from SSMS if it is not properly set up. One thing to keep in mind is that LUNs are numbered 0 to X, not 1 to X; I made this mistake when I first read the log, thinking it was complete when I still needed to attach one more disk.
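If you go the Azure PowerShell route instead of the Portal, attaching the data disks looks roughly like the sketch below (classic Service Management cmdlets; the service name, VM name, disk labels, and disk count are placeholders, so size and count them per the linked documentation).

```powershell
# Sketch only: attach four ~1 TB empty data disks to the VM at LUNs 0-3.
# Remember that LUN numbering starts at 0, not 1.
$vm = Get-AzureVM -ServiceName "mydwservice" -Name "mydwvm"

0..3 | ForEach-Object {
    $vm = $vm | Add-AzureDataDisk -CreateNew -DiskSizeInGB 1023 `
                    -DiskLabel ("SqlDwData{0}" -f $_) -LUN $_
}

$vm | Update-AzureVM   # apply the change; confirm the disks in the logs noted above
```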

Once you have configured the appropriate number of storage LUNs, you must then use Disk Manager in Windows to format and label them: E:, F:, G:, etc.
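If you would rather script that format-and-label step as well, the Windows Server storage cmdlets can stand in for Disk Manager. This is a sketch under my own assumptions (GPT partitions, NTFS, and formatting every RAW disk found); adjust the labels and drive letters to match your layout.

```powershell
# Sketch only: initialize, partition, and format every raw (newly attached) data disk.
Get-Disk | Where-Object PartitionStyle -eq 'RAW' | ForEach-Object {
    Initialize-Disk -Number $_.Number -PartitionStyle GPT
    New-Partition -DiskNumber $_.Number -UseMaximumSize -AssignDriveLetter |
        Format-Volume -FileSystem NTFS -NewFileSystemLabel "SQLDW Data" -Confirm:$false
}
```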

Once the SQL instance finds its required number of storage containers, it will then, and only then, allow you to log in via SSMS.

So what is going on here? Well, some good stuff in my opinion.

  1. It is forcing the end user to appropriate several disk locations instead of just using the default C:\ drive to store everything. This is a great optimization because it will spread the disk activity out over multiple LUNs. It also enforces separating the data files from the operating system disk and the page files. Think about how many database systems you have worked on that have this design flaw. A lot of them.
  2. It is assuming you mean business, and it requires a massive amount of storage up front to even install it. Either you need this edition of SQL Server or you don't. This is not SQL Express or a departmental application server; this is a full-size enterprise application that is capable of migrating an on-premises DW to Azure.

Even though one might be put off a bit that it requires 4+ terabytes of storage to install, I actually like the fact that it enforces good design and automatically gives some overhead for growth.

No hardware budget excuses this time. A very important point is that even though it requires you to appropriate 4+ TB of storage, YOU ARE NOT BILLED FOR THE STORAGE YOU APPROPRIATE; you are only billed for the storage that you actually fill with data.

Once you understand that, this product starts making more sense. You can design a large storage layout, with plenty of room to grow, without having to buy that storage up front. In a traditional on-premises environment, this could mean forking over some major cash. If you have never noticed, SANs are not inexpensive, and they take a long time to arrive onsite!

In summary, I am glad that this product is designed the way it is. It enforces good design from the beginning. It is not the correct product for a lot of different applications due to its scale, but for the person or place that wants to migrate or build a true enterprise-size data warehouse in Azure, SQL 2014 DW Edition is perfect.

 

 

Columnstore Indexes: When Should You Use Them?

When I speak to clients about In-Memory features in SQL Server, I find that Columnstore indexes just haven’t gained much traction as a marquee feature. The functionality itself is quite useful in the BI/DW realm as far as potentially boosting query performance by 1.5 to 10 times. But I think it gets overlooked because the use-case just isn’t very obviously derived from the typical description of it. The explanations I have read/heard/seen of Columnstore and how it works get tedious very quickly.

So I don't want to cover details of how Columnstore works in this post. I just want to clarify when it might be useful. Then, if it sounds like it fits your situation, you can dive into some links and have all the tedium you want.

So here are, to me, the most pertinent Columnstore facts to be aware of:

  • It stores data in a columnar data format, heavily compressed, and in-memory — so it’s FAST.
  • It is very focused on large result sets that need to be aggregated or grouped. If you are doing full table scans in your queries, you might be interested.
  • It pairs very well with partitioning. If you have a large Fact table that is a candidate for partitioning, this again is potentially right up your alley.
  • Columnstore is not ideal for frequently updated tables. You will end up having to drop and re-create the index before/after data update operations, so a rapid incremental refresh environment is not an ideal fit (a minimal sketch of that drop-and-re-create pattern follows this list).  UPDATE: I am reminded by a very helpful colleague that SQL Server 2014 removes this limitation and allows table updates/deletes/etc.  (Thanks Andrew!)
  • Because it is an In-Memory feature, your capability and performance are dependent upon hardware and SQL Server memory configuration.
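On pre-2014 instances, that drop-and-re-create pattern around a data load looks roughly like the sketch below. The server, database, table, and column names are invented for illustration, and it assumes the SQL Server PowerShell module (Invoke-Sqlcmd) is available; adapt the T-SQL to your own fact table and ETL tooling.

```powershell
# Sketch only: drop the nonclustered columnstore index before the load, re-create it after.
Invoke-Sqlcmd -ServerInstance "MyDwServer" -Database "MyDw" -Query @"
IF EXISTS (SELECT 1 FROM sys.indexes
           WHERE name = 'NCCI_FactAdmissions'
             AND object_id = OBJECT_ID('dbo.FactAdmissions'))
    DROP INDEX NCCI_FactAdmissions ON dbo.FactAdmissions;
"@

# ... run the incremental ETL load for dbo.FactAdmissions here ...

Invoke-Sqlcmd -ServerInstance "MyDwServer" -Database "MyDw" -Query @"
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactAdmissions
    ON dbo.FactAdmissions (DateKey, PatientKey, FacilityKey, ReadmissionFlag, LengthOfStay);
"@
```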

If you have large fact tables and query performance issues, and if SSAS is either not an option or itself has performance issues, columnstore is an option to investigate. Columnstore indexes have been shown to be faster than an Analysis Services cube in some instances!   From my perspective, a couple of use case scenarios immediately come to mind:

  • Creation of specific fact structures for highly responsive reports/dashboards — especially in situations where Analysis Services is not an option, or is also not performing adequately
  • Improving cube processing performance (although the drop/rebuild time for the columnstore index will then likely take place during ETL — so net performance gain would have to be tested)

For further info, this article seems to be the granddaddy of all columnstore articles. It contains a massive and detailed FAQ, and includes the formula for determining memory capacity.   More focused instructions and examples of creating and using a columnstore can be found here, on Microsoft’s TechNet site. Cheers!

Upcoming Webinar: The Modern Data Warehouse – A Hybrid Story

Organizations of all sizes are looking more seriously at the cloud, and beginning to understand the scalability and cost savings benefits of a hybrid solution. Given the benefits of a properly implemented hybrid solution, it’s logical to consider other areas of the business that can gain from the cloud – one of those being your data analytics platforms.

Data warehousing in the cloud. Sounds almost idyllic, right? Well, it just might be, especially if you are looking for a fast, efficient way to stand up a data warehouse. So, how can you leverage SQL Server 2014 and Azure to build that data warehouse? Find out next week, on Thursday, June 19 at 1 pm, during a webinar on The Modern Data Warehouse – A Hybrid Story.

Duane Schafer, Microsoft Business Intelligence Practice Director at Perficient, will discuss the hows and whys around implementing your analytics platform in a hybrid environment. While on-premises vs. cloud is the major decision, you'll also want to consider things like 'traditional' disk-based storage and computing vs. in-memory.

If you’d like to learn a bit more and can’t wait another week, Duane shares his excitement for the new capabilities of SQL Server 2014 and Azure in a recent blog post.

To register for the webinar, click here.
The Modern Data Warehouse – A Hybrid Story
Thursday, June 19, 2014
1:00 p.m. CT

Leverage Power BI with SQL Server 2014 & Azure: Webinar Recap

Earlier in the year, we held a popular webinar on utilizing Power BI, a new Microsoft offering (then in preview). Of course, much has changed since that previous webinar, with Power BI being available to the general public and the release of SQL Server 2014.

In light of those changes, my colleague, Andy Tegethoff, a Microsoft BI architect and senior level consultant here at Perficient, again shared his knowledge around Power BI. Click here for the replay.

During Wednesday's session, Andy showed attendees how they can host their entire BI solution in the cloud, using Power BI along with SQL Server 2014 and Azure. He kicked things off with a quick review of Power BI, and then talked about how to access your data with Power BI. Next up was the topic of SQL Server 2014 – specifically, what's new with the latest version and what does that mean for you, the user? For starters, dramatically improved query performance with in-memory DB options, and improved integration with the cloud for backup, hot failover, DR, or even primary data storage. As part of this, Andy also discussed what this all means when integrating with Microsoft Azure.

This segued into Azure Virtual Machines and Big Data with HDInsight, as Andy reviewed what you need to host BI in the cloud using SQL Server 2014 and Azure vs. Power BI in the cloud with Office 365. He wrapped up with advice and lessons learned around cloud BI with SQL Server.

The attendees asked great questions, and Andy had time to answer them. You can view the webinar replay here, including the Q&A. If you are looking to learn more about BI in the cloud, Andy recently authored a white paper, "The Possibilities of Cloud BI with Microsoft Azure," available here.

Webinar: Make the Most of Power BI with SQL Server 2014 & Azure

Within the last few months, you've probably heard about Power BI for Office 365, the new cloud-centric BI front-end tool set from Microsoft. You may have even attended one of our webinars on the topic while Power BI was still in preview; it has since moved to general availability. (If you missed the session, you can view the on-demand webinar, Power BI for Office 365: Using SharePoint to Deliver Self-Service BI, at your leisure.)

Power BI is a collection of features and services that provide you with the ability to visualize data, share discoveries, and collaborate more intuitively. It works seamlessly with Excel – simply use Excel to create data models and visualizations, and then use Power BI to share and extend those insights. More and more, users want self-service access to data of all types, and Power BI delivers on that.

When you take Power BI’s self-service business intelligence and combine it with SQL Server 2014 and Microsoft Azure, you can host your entire BI solution in the cloud, while ensuring greater access to business analytics. And by letting Microsoft manage the infrastructure for you, you can focus on your data and discoveries.

On Wednesday, May 21, 2014, at 1 p.m. CDT, join Andrew Tegethoff, Microsoft BI Architect at Perficient, for a complimentary webinar, Making the Most of Power BI with SQL Server 2014 and Azure. During the session, Andy will show you how to host your analytics and reporting solution in the cloud using Power BI and SQL Server 2014.

In the meantime, if you’d like to learn more about BI in the cloud, Andy wrote a comprehensive guide, The Possibilities of Cloud BI with Microsoft Azure.

To register for the webinar, click here.
Making the Most of Power BI with SQL Server 2014 and Azure
Wednesday, May 21, 2014
1:00 p.m. CDT

 

Power View Differences between Excel 2013 and SharePoint 2013

I knew that there were differences between Power View in desktop Excel and Power View as run from SharePoint. But I recently felt the impact of this for the first time.

The first confusing point I found was that the Export to PowerPoint function only works for reports created in Power View in SharePoint Server. If you create a Power View report in Excel, you will not be able to export it live into a PowerPoint deck.

So, having encountered this hiccup, I wanted to try to catalog any other practical differences….

Here are a few that I have run into/heard about:

  • Power View in SharePoint Server can create multi-page reports.  In Excel, different Power View reports end up on different tabs. This is a subtle difference, but it’s important to consider if you are trying to tell a story or do drilldown across a number of views.
  • In Excel, there is no connectivity from Power View directly against an Analysis Services DB, either Tabular or Multidimensional. You must use the Data Model (most likely a Power Pivot model) as a kind of "go-between." In Power View in SharePoint Server, you can create reports directly against Analysis Services via a BISM Connector.
  • With Excel on the desktop, you can manually refresh data from a variety of sources, whether or not your workbook contains Power View sheets. In SharePoint Server, you can set workbooks to automatically refresh such data sources. BUT, you CAN'T set automatic refresh on workbooks with Power View sheets in them.
  • This is from a page I found in the Office Help files that somewhat clarifies where you can work with Power View as created/stored in which environment:
    - Stored on a client computer, opened in Excel 2013: Power View sheets are editable and interactive.
    - Stored on premises, in SharePoint view mode (SharePoint Server configured to render workbooks by using Excel Services), opened in Excel Services: interactive.
    - Stored on premises, in Office Web Apps Server view mode (SharePoint Server configured to render workbooks by using Office Web Apps Server), opened in Excel Online: not visible.
    - Stored in Office 365 (SharePoint Online), opened in Excel Online: interactive.
    - Stored on OneDrive, opened in Excel Online: not visible.

So, there’s a good list of differences. And I’m leaving out the whole Power BI for Office 365 angle for the time being.

What it comes down to is that there is not complete compatibility of features across Power View in all situations. So consider what features are important for your situation, and use Power View in the best venue to meet your needs.

Big Data and the Microsoft Data Platform and You

The age of Big Data is upon us, and the age of Pervasive (or “Ambient”) Data is rapidly approaching.

This means not only the typical loosely structured Big Data sources that we've become accustomed to considering, like Twitter feeds, Facebook posts, and web logs, but also the streams of data generated by the growing Internet of Things: a universe of sensors deployed in everything from refrigerators to factory robots, serving purposes from managing self-driving cars to automatically adjusting the air conditioning in our homes.

More and more, the data streams generated by the various elements of our wired world are being pulled together to improve performance, to save money, to enhance profit, and to improve our lives in general. And more and more, as the effects of leveraging all forms of Big Data across a wide spectrum of industries are being felt, there is a stampede of organizations that want to leverage it for all those purposes.

But amid that stampede are thousands of existing solutions, systems, and products already implemented and in place, being used for analytics and data warehousing. And most of those systems, regardless of platform or vendor, are not equipped to deal with the Volume, Velocity, and Variety of Big Data we are talking about.

Generally, the kind of processing demanded by Big Data takes a Hadoop installation, which may not be practical to implement or maintain, whether as an add-on to these systems or as a from-scratch build.

But an ideal solution to this problem is Azure HDInsight, a 100% Apache Hadoop solution based entirely in the Cloud.   With the scalability, security, and adaptability of the Azure environment, HDInsight can allow you to bring Big Data into an existing solution in a matter of days. The Azure environment is both easy to manage and cleanly integrated with the Microsoft server platform via PowerShell scripting. Further, HDInsight integrates natively with the rest of the Microsoft BI stack (including Office 365 Power BI plus SQL Server 2012 and 2014) using Hive and ODBC.
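As a small illustration of that PowerShell and Hive integration, here is a minimal sketch using the pre-ARM Azure HDInsight cmdlets. The cluster name and the Hive table are invented for the example; in practice, the results of a query like this would typically be surfaced to Excel or Power Pivot through the Hive ODBC driver.

```powershell
# Sketch only: point the HDInsight cmdlets at a cluster and run a Hive query.
Add-AzureAccount                                 # sign in to the subscription
Use-AzureHDInsightCluster -Name "myhdicluster"   # placeholder cluster name

Invoke-Hive -Query @"
SELECT   facility, COUNT(*) AS event_count
FROM     web_log_events          -- hypothetical Hive table over raw log files
GROUP BY facility;
"@
```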

On the other end of the spectrum, for massive quantities of both traditional relational data AND Big Data capability in a single solution, the Microsoft Analytics Platform System is a turnkey answer. The APS consists of a SQL Server Parallel Data Warehouse (PDW) appliance, which optionally includes a private instance of HDInsight, as well as the breakthrough PolyBase data-querying technology, which allows SQL queries to combine relational and Hadoop data. The APS can:

  • Scale out to 6PB (yes, that’s petabytes) of data
  • Provide up to 100x performance gains over traditional data warehouse architectures
  • Store and integrate traditional relational AND Big Data in the same appliance.

As a bonus, it’s also the cheapest option (per terabyte) of any data warehouse appliance available. :-)
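To show what that PolyBase capability looks like in a query, here is a hedged sketch of joining Hadoop-resident data with a relational dimension on the appliance. The table and column names are hypothetical, it assumes an external table was already defined over the HDFS data, and the exact DDL and connection details vary by APS appliance update, so treat it as the shape of the idea rather than appliance-ready syntax.

```powershell
# Illustrative PolyBase-style query against an APS/PDW appliance (all names are placeholders).
# Assumes dbo.SensorReadings was already declared as an external table over HDFS data.
Invoke-Sqlcmd -ServerInstance "MyApsAppliance,17001" -Database "EDW" -Query @"
SELECT   d.FacilityName,
         AVG(s.Temperature) AS AvgTemperature
FROM     dbo.SensorReadings AS s      -- external (Hadoop) data surfaced via PolyBase
JOIN     dbo.DimFacility    AS d      -- traditional relational dimension table
         ON s.FacilityKey = d.FacilityKey
GROUP BY d.FacilityName;
"@
```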

So as the drive to Big Data intensifies, Microsoft’s Cloud OS and other Data Platform offerings are positioned to help organizations of all sizes take advantage.