Archive for the ‘SQL Server’ Category

Hybrid Analytics in Healthcare with O365 & Power BI Webinar Recap

Last week, we held our last Microsoft business intelligence-focused webinar of the year, “Hybrid Analytics in Healthcare: Leveraging Power BI and Office 365 to Make Smarter Business Decisions.” Heidi Rozmiarek, Assistant Director of IT Development for UnityPoint Health, spoke alongside our Microsoft BI team on implementing an analytics platform in a hybrid environment.

First, the Perficient team covered architectural components and functions; architecture options including on-premises, hybrid, and cloud; and delivery considerations. Next, Steven Gregor, a technical consultant on our Microsoft BI team, reviewed Power BI and its features, including the security model and the client-side Data Management Gateway, and then walked through a live demo.

Last, Heidi shared how her organization is architecting a successful analytics infrastructure using Microsoft technologies. She explained how UnityPoint Health is leveraging Microsoft’s BI stack to provide simple solutions to complex questions: how they built the solution, collected and cleansed the data, modeled it, and visualized and reported the answers. She wrapped up by sharing her organization’s plans to move further toward a hybrid on-premises/cloud solution in the next few months.

Power BI Basics Inside Office 365 – A Video Series

Yesterday, we were fortunate to have a customer, Heidi Rozmiarek, Assistant Director of IT Development for UnityPoint Health, speak alongside our Microsoft BI team for the webinar, “Hybrid Analytics in Healthcare: Leveraging Power BI and Office 365 to Make Smarter Business Decisions.”

It was an informative session that began by covering architectural components and functions; architecture options including on-premises, hybrid, and cloud; and delivery considerations. Following this, we had a live Power BI demo, and last but not least, Heidi shared how her organization is using the Microsoft BI stack to provide simple solutions to complex questions. Keep an eye out for a post describing the webinar in more detail; in the meantime, you can view the replay here.

Whether or not you attended the webinar, if you are interested in learning more about building a hybrid analytics platform with Power BI and Office 365, I highly recommend you take a look at the following short video series.

  1. Introduction to Power BI: The first video introduces Power BI, particularly Power BI Sites, “My Power BI,” and the Power BI Admin page.
  2. Administration and Permissions in Power BI: This video focuses on Site Admin and security basics.
  3. Data Exploration and Visualization in Power BI: The third video in the series discusses data exploration and visualization using Excel and related power tools, including Power Pivot and Power View.
  4. Data Management Gateway for Power BI: Here, we cover the steps to enable data feeds in Power BI using the Data Management Gateway.

AngleBrackets Conference – Day 1 Keynote

I’m lucky to be able to attend this year’s AngleBrackets conference in Las Vegas, and I’ll try to cover the conference on this Perficient blog as much as I can. Today was the opening day, which actually consisted only of the opening keynote. The speaker was Scott Guthrie, Executive VP of the Cloud and Enterprise group at Microsoft, wearing his signature red shirt. His keynote was titled “The Cloud for Modern Business.”

The following are my notes from his keynote:

Mobile first, cloud first.

Why cloud? Cloud enables:

1. Quick and easy deployment.
– No need to wait for provisioning. Demo: a database deployed in Azure in a few clicks; a SharePoint server farm deployed in a few seconds.

2. Elastic capacity.
– no need to buy infrastructure
– unexpected load easily managed
– global coverage with unprecedented scale
Example: the Xbox One game Titanfall is completely cloud-powered; 200,000 VMs were spun up on launch day.

3. Pay only for what you use
– no upfront costs
– no long-term commitment
– no wasted capacity
Example: a slide with a typical website usage pattern (a sawtooth) illustrating unused capacity. Azure avoids that by allowing automatic scale-up and scale-down.

4. Enable new business value
– engage customers with web and mobile
– big data analytics
– machine learning

Microsoft Azure updates for October

Every month, Microsoft releases new Azure services and promotes other services from preview to general availability. This October, a few new services were released and a few graduated to general availability.

– Azure Automation is now generally available. Azure Automation is essentially PowerShell scripting in the cloud. Microsoft has long recommended scripting Azure deployment tasks, but previously scripting capabilities were limited to the developer’s computer. Now, using Azure Automation, it’s possible to run PowerShell scripts in the Azure cloud, create jobs and schedule them to run at given times, and create automation workflows. These PowerShell workflows are called “runbooks.” Microsoft provides a comprehensive catalog of ready-to-use runbooks for automating and managing different parts of Azure: web sites, cloud storage, Media Services, VMs, etc.
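To make the idea concrete, here is a minimal runbook sketch that shuts down a development VM (e.g., on a nightly schedule); the credential asset, subscription, service, and VM names are hypothetical placeholders:

    workflow Stop-DevVMs
    {
        # Credential stored as an Azure Automation asset (name is a placeholder)
        $cred = Get-AutomationPSCredential -Name "AzureAdmin"

        # Authenticate and select the target subscription
        Add-AzureAccount -Credential $cred
        Select-AzureSubscription -SubscriptionName "Dev-Subscription"

        # Stop a single dev VM; loop over Get-AzureVM output for more
        Stop-AzureVM -ServiceName "dev-service" -Name "dev-vm1" -Force
    }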

– Azure Service Bus received a new feature: Event Hubs. Event Hubs is a hyper-scalable pub/sub event ingestor that can take in millions of telemetry events per second so they can be processed by Azure cloud services. Event Hubs is designed for “Internet of Things” (IoT) scenarios: cloud-connected devices with sensors.
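As a hedged illustration, sending a single event to an Event Hub over its REST endpoint from PowerShell might look like the sketch below; the namespace, hub name, payload fields, and the pre-generated SAS token are all placeholder assumptions:

    # Namespace, hub name, and SAS token are placeholders
    $sasToken = "SharedAccessSignature sr=...&sig=...&se=...&skn=..."
    $uri  = "https://contoso-ns.servicebus.windows.net/telemetryhub/messages"
    $body = @{ deviceId = "sensor-42"; temperature = 21.7 } | ConvertTo-Json

    # POST one JSON-encoded telemetry event to the hub
    Invoke-RestMethod -Uri $uri -Method Post -Body $body `
        -Headers @{ Authorization = $sasToken } -ContentType "application/json"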

– Microsoft Antimalware for Cloud Services and VMs graduated to general availability. Microsoft Antimalware is a service and SDK that protects cloud services and VMs from malware.

– Instance-level public IPs are now generally available. It’s now possible to directly assign a public IP to a VM or to a web or worker role. The limit of two public IPs per subscription has been removed. A short scripted example follows.
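For illustration, assigning an instance-level public IP with the Azure PowerShell service-management cmdlets might look like this sketch (the service, VM, and IP names are illustrative):

    # Attach a public IP directly to an existing VM instance
    Get-AzureVM -ServiceName "contoso-svc" -Name "web-vm1" |
        Set-AzurePublicIP -PublicIPName "web-vm1-pip" |
        Update-AzureVM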

– Elastic Scale preview is now available for Azure SQL. Elastic Scale is a set of .NET libraries and management tools that make horizontal scale-out (sharding) of Azure SQL databases easier. Sharding has been a recommended scale-out (and scale-in) pattern for Azure SQL for a while; however, implementing it required custom coding and management scripts (or manual management of SQL instances). Now it’s much easier to implement.

– Azure Batch is now in public preview. Azure Batch is a new platform that enables users to run large-scale parallel applications on thousands of virtual machines, auto-scale depending on work in the queue, monitor job progress, stage data, and build computing pipelines.

– Stream Analytics is now available in public preview. Stream Analytics is a real-time event processing engine built to process millions of telemetry events per second when used together with Event Hubs.

– Data Factory is now available in public preview. Azure Data Factory is a framework for creating, managing, and orchestrating data pipelines that connect all kinds of data sources (SQL on-premises or in Azure, Azure tables or blobs, etc.) to a Hadoop cluster.

Why Does Data Warehousing Take So Long? Pt. 3

Last time, I posted about how BI/DW accelerator and framework tools need to be used with care, and not as a substitute for solid requirements analysis. This time, I want to debunk a misconception framed by the following question: Will Agile processes speed up my BI development timeline?

I see many situations where misconceptions about using Agile methods have gotten people and projects into a tight corner, and not just in database/BI development but in virtually every type of development. This is just one of them. But why is that? What is the disconnect here?

First, let’s just answer the question: No, Agile methods are probably not going to “speed up” any part of software development. Sorry. :-/

As to “why,” I think the confusion arises from the fact that, since doing Agile means favoring working product over documentation, you will usually see a “first draft” of part of a solution pretty early on. This draft may have some functionality initially but, in most cases, for any sizable solution, it won’t be much: some screens/UI components might be in place, but most likely they won’t actually do much or take you anywhere. The point is that this first draft is reviewed by the business, feedback is gathered, and then the draft is iterated upon to build more real functionality. Build working software instead of models and documents, and keep rebuilding and changing until it’s “done.”

And this gets to the core of what Agile methods are really about: how to take in stride a continuously changing set of software requirements. While so-called “BDUF” methodologies lean on mechanisms for change control, Agile simply embraces and accepts that such changes are part of the deal. The Agile software development movement began as a response to inevitable requirements changes, not as a way to accelerate development. The early prototypes you get out of the first sprints of an Agile project do not signal that the project will finish faster; they just mean that during development you favor working software over ancillary process and documentation.

So then, with respect to Data Warehousing/Business Intelligence projects, if Agile isn’t going to speed things up, what is it actually good for? Potentially a lot.

For instance, Kimball Method data warehouse development is already iterative, so it can be made to fit fairly well into an Agile context — albeit with some tweaking on matters like sprint length, sprint goals, etc. So this provides an interesting option for managing Kimball method development that may be a good fit for your environment.

Agile report development can also be quite worthwhile. Something I learned early in my consulting career was that it’s a lot easier to get helpful responses from users if you show them something they DON’T want first. Starting with a simple working prototype of a report can be much more fruitful than starting with abstract specs and mockups.

In all, I think that Agile methods are best paired with data work in the following types of scenarios:

  1. Product Development – When you are starting from scratch, and have few or no solid requirements to start with, Agile is your friend. It will give you the opportunity to explore those requirements, stir up some ideas, and still come away with reasonably solid and reusable development progress. However, whether to stick with Agile beyond initial prototyping is another question, mainly because some DW/BI development tasks can be hard to fit into the Agile context for an Enterprise-level solution.
  2. Maintenance Mode – If you have a mature DW/BI solution that is live and in production, Agile can lend itself well to managing maintenance workloads and ongoing changes. Agile maintenance can give IT departments greater flexibility in responding to sudden priority changes coming from the business.

Are those the only use cases? No, but that’s really a question for another post. Bottom line: using Agile methods to develop your DW/BI solution does not automatically mean the project will run faster or deliver value sooner. But it can offer some serious non-speed-related benefits, depending on the context of the project.

Next time, I’ll wrap up this series with some ideas for getting value out of a DW/BI project as early as possible. Cheers!

Why Does Data Warehousing Take So Long? Part 2

In my last post, I wrote about BI’s reputation for being a long, drawn-out process. And of course, where the market sees a vacuum… a plethora of products and processes exists to aid in “shortcutting” BI/DW development. So this time, I want to look at some common factors at play when you use the various types of frameworks, accelerators, and off-the-shelf products. I also want to point out how you can easily miss the boat on the long-term benefits of a BI solution by depending too much on these tools.

The market contains some pretty sophisticated enterprise metadata management tools, with full-featured user interfaces that provide drag-and-drop interaction with connected data sources, auto-generate all kinds of artifacts, and so on. On that end of the spectrum, you enter the realm of data virtualization and cloud-based solutions, big vendors with huge brand names (e.g., Teradata, Informatica, IBM Cognos), Big Data, etc. Although tools at this level do significantly more than just generically speed up the BI process, they do offer a collection of “accelerator” features. Down the cost scale, other tools in this segment are a bit more “black box” and will, say, create ETL from scratch by comparing data models (like WhereScape RED), or generate dashboards by just pointing the tool at a set of tables (like Tableau or QlikView). And still others are merely frameworks, either ETL or front-end building blocks that essentially let you Lego yourself a BI solution (e.g., the growth of BIML for SSIS-based ETL).


Visualization options with Microsoft

I’ve been speaking to a lot of clients lately about the visualization capabilities of the Microsoft BI platform and want to clarify a point of confusion. When building an enterprise analytics platform you will be faced with several decisions around architecture as well as delivery. The architectural options will be vetted by your IT department, but in large part they will be driven by how you want to deliver and consume information in your organization. Typically there will be a balance between ‘traditional’ BI delivery and ‘self-service’ BI delivery.

What’s the difference? Traditional BI delivery comes in the form of reports and dashboards that are built by your IT department with tools such as SSRS or PerformancePoint, both solid tools with a lot of functionality. In contrast, most organizations are looking for ways to reduce their dependency on IT-built reports and therefore need a technology that enables their business users to be self-sufficient. This comes in the form of Excel with Power Pivot and Power View.

A complete explanation of these new tools can be found here.

Feel free to contact us about how these tools can be used in your enterprise to deliver pervasive insights!

Advanced analytics in healthcare with Epic, SQL Server and Azure

Over the past few months, we have released a lot of information on building analytics platforms in healthcare. Several members of my team have played key architectural roles in not only implementing the Cogito platform and performing readmission analysis with it, but also expanding the platform to include customer satisfaction data from Press Ganey.

These functions were deemed critical to the initial phases of these projects, but they are largely ‘back-end’ architectural efforts. They do not address the ad-hoc analysis needs of the business, the delivery technologies available, or, much less, the predictive capabilities that can be added to the platforms.

Fortunately there are a lot of new technologies in the Microsoft stack to address these needs.

As part of our advisory services to help clients understand the new capabilities of their platforms, we regularly build concept visualizations. The following videos are examples of out-of-the-box capabilities we built for one of our clients utilizing:

Self-service analytics with Power Pivot and Power View

3D visualizations with Power Map

And finally natural language query processing in the cloud with Q&A in Power BI

These technologies are well known and are being leveraged within several of our large clients, but a couple of recent announcements from Microsoft introduce even more exciting capabilities.

Power View now supports forecasting. This is a great new addition, currently available in the HTML5 version of Power View in Power BI. It gives the user the ability to quickly forecast a trend line, account for seasonality, and even adjust the confidence intervals of the calculation. Below is a screenshot of readmission forecasting being performed on the dataset from the earlier videos.

[Screenshot: readmission forecasting in Power View]

Important to note is that you not only see the forecasted line (the light blue line that runs through the gray box in the top chart), but the second chart also shows the hindcasting feature, which lets a user start a forecast in the past in order to see how accurate it would have been against real data (the light blue line to the left of the gray box in the second chart).

While valuable and easy to use, this technology doesn’t give us the ability to predict who is at risk of readmitting. For that, we need a more powerful tool.

Azure Machine Learning is a recently announced cloud service for the budding data scientist. Through a drag-and-drop interface, you can now build experiments of predictive models, train and score the models, and even evaluate the accuracy of different algorithms within your model.

The screenshot below shows an experiment built against the same readmission data used in the forecasting example (the Epic Cogito dataset). The dataset was modified to flatten multiple patient admissions onto one record, and it included the following attributes, among others:

[Screenshot: dataset attributes]

The experiment was then created to compare two different classification algorithms: a boosted decision tree vs. logistic regression. (Note that this post is not intended to debate the accuracy or appropriate use of these particular algorithms; they were just the two I used.)

[Screenshot: the experiment model]

Once the experiment is complete and evaluated, a simple visual inspection shows the accuracy gains one algorithm has over the other.

[Screenshot: evaluation results]

After some tweaking (and this model still needs it), there is a simple process to create a web service with an associated API key, which you can use to integrate the model into a readmission prediction application, one that accepts single-record or batch inputs. A hedged sketch of calling such a service appears after the screenshot below.

[Screenshot: web service API details]
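For illustration, calling a published Azure ML scoring endpoint from PowerShell might look like the sketch below; the endpoint URL, API key, column names, and the exact JSON shape are placeholder assumptions, and the real ones should be copied from the service’s API help page:

    # Endpoint, key, and input schema are placeholders
    $apiKey   = "<api-key-from-the-portal>"
    $endpoint = "https://ussouthcentral.services.azureml.net/.../execute?api-version=2.0"

    # One scoring request; column names mirror the flattened readmission record
    $body = @{
        Inputs = @{ input1 = @{ ColumnNames = @("Age", "LengthOfStay")
                                Values      = @(, @(67, 4)) } }
        GlobalParameters = @{}
    } | ConvertTo-Json -Depth 5

    Invoke-RestMethod -Uri $endpoint -Method Post -Body $body `
        -Headers @{ Authorization = "Bearer $apiKey" } -ContentType "application/json"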

As you can see, there are a number of options for introducing advanced analytics into your healthcare environment. Feel free to contact me with questions on how these tools can be put to work in your new healthcare analytics platform.

Insights on SQL Server 2014 Data Warehousing Edition

For anyone thinking about selecting the Data Warehousing edition of SQL Server 2014, I just want to highlight a few things required to install this product and get it up and running.

First off, though, what is SQL 2014 Data Warehousing Edition? In short, it is a version of SQL Server available as an image on an Azure VM, and the product seems to be flying a little bit under the radar. In terms of licensing and features, it is closest to Enterprise Edition and is similar to BI Edition. It houses the full stack of BI products, and it also allows database snapshots like Enterprise Edition. The biggest single difference I can find is that it is optimized to use Azure Storage in the cloud. Interesting, no? I see its primary purpose as replacing an existing on-premises data warehouse, or as the starting point for a new data warehouse that will be fairly large.

I won’t go into provisioning a cloud VM in this blog, but if you want more info, here’s a link:

http://msdn.microsoft.com/library/dn387396.aspx

Ok, on to some tech points and what to expect:

First and foremost, this edition’s minimum recommended VM size is an A7. Whoa!

Pretty steep for a minimum spec: an A7 is 8 cores with 56 GB of RAM. We all know minimum specs are just that, the bare minimum, and usually we end up going larger.

If you are unfamiliar with Azure VM sizing take a look here:

http://msdn.microsoft.com/en-us/library/azure/dn197896.aspx

Second, even a basic install is going to require that you have several 1-terabyte storage locations available for it to harness in Azure Storage. Double whoa!

When you first log in to this VM, you will not be able to connect SSMS to the SQL instance. Instead, you are prompted to configure storage containers for SQL 2014 DW Edition. This can be done in the Portal, or via Azure PowerShell, and it is documented quite well here:

http://msdn.microsoft.com/library/dn387397.aspx

In a nutshell, it is quite easy to attach the disks through the Portal application on Azure: you just browse to your VM and click “Attach” at the bottom of the screen. The VM will reboot, and you can then confirm the process in the logs listed in the link above. But as I mentioned earlier, you will know when it is up and running, because you will get a login error from SSMS if it is not properly set up. One thing to keep in mind is that LUNs are numbered 0 to X, not 1 to X; I made this mistake when I first read the log, thinking the process was complete when I still needed to attach one more disk. The same attach step can also be scripted, as sketched below.
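For those who prefer scripting, here is a hedged sketch of attaching the disks with the Azure service-management cmdlets; the service and VM names are placeholders, and note the LUN numbering starting at 0:

    # Attach four empty 1 TB data disks at LUNs 0-3 (names are placeholders)
    $vm = Get-AzureVM -ServiceName "dw-service" -Name "sqldw-vm"
    0..3 | ForEach-Object {
        $vm = $vm | Add-AzureDataDisk -CreateNew -DiskSizeInGB 1023 `
                        -DiskLabel "DWData$_" -LUN $_
    }
    $vm | Update-AzureVM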

Once you have configured the appropriate number of storage LUNs, you must then use Disk Manager in Windows to format and label them: E:, F:, G:, etc. (This step can also be scripted, as in the sketch below.)
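As an alternative to the Disk Manager GUI, the same formatting step can be scripted with the Windows Storage cmdlets (Server 2012 and later); the volume label here is an arbitrary placeholder:

    # Initialize, partition, and format every raw (uninitialized) disk
    Get-Disk | Where-Object PartitionStyle -eq 'RAW' | ForEach-Object {
        Initialize-Disk -Number $_.Number -PartitionStyle GPT
        New-Partition -DiskNumber $_.Number -UseMaximumSize -AssignDriveLetter |
            Format-Volume -FileSystem NTFS -NewFileSystemLabel "SQLDW" -Confirm:$false
    }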

Once the SQL instance finds its required number of storage containers, it will then, and only then, allow you to log in via SSMS.

So what is going on here? Well, some good stuff in my opinion.

  1. It forces the end user to appropriate several disk locations instead of just using the default C:\ drive to store everything. This is a great optimization because it spreads disk activity over multiple LUNs. It also enforces separating the data files from the operating system disk and the page files. Think about how many database systems you have worked on that have this design flaw: a lot of them.
  2. It assumes you mean business, and it requires a massive amount of storage up front to even install. Either you need this edition of SQL Server or you don’t. This is not SQL Express or a departmental application server; this is a full-size enterprise application that is capable of migrating an on-premises DW to Azure.

Even though one might be put off a bit that it requires 4+ terabytes of storage to install, I actually like the fact that it enforces good design and automatically gives some overhead for growth.

No hardware budget excuses this time. A very important point: even though it requires you to appropriate 4+ TB of storage, YOU ARE NOT BILLED FOR THE STORAGE YOU APPROPRIATE; you are only billed for the storage that you actually fill with data.

Once you understand that, this product starts making more sense. You can design a large storage location, with plenty of room to grow, without having to buy one. In a traditional on-premises environment, this could mean forking over some major cash. If you have never noticed, SANs are not inexpensive, and they take a long time to arrive onsite!

In summary, I am glad that this product is designed the way it is. It enforces good design from the beginning. Due to its scale, it is not the correct product for a lot of different applications, but for the person or place that wants to migrate or build a true enterprise-size data warehouse in Azure, SQL 2014 DW Edition is perfect.


Columnstore Indexes: When Should You Use Them?

When I speak to clients about in-memory features in SQL Server, I find that columnstore indexes just haven’t gained much traction as a marquee feature. The functionality itself is quite useful in the BI/DW realm, potentially boosting query performance by 1.5 to 10 times. But I think it gets overlooked because the use case just isn’t obviously derived from the typical description of it. The explanations I have read/heard/seen of columnstore and how it works get tedious very quickly.

So I don’t want to cover the details of how columnstore works in this post. I just want to clarify when it might be useful. Then, if it sounds like it fits your situation, you can dive into some links and have all the tedium you want.

So here are, to me, the most pertinent Columnstore facts to be aware of:

  • It stores data in a columnar format, heavily compressed and in-memory, so it’s FAST.
  • It is very focused on large result sets that need to be aggregated or grouped. If you are doing full table scans in your queries, you might be interested.
  • It pairs naturally with partitioning. If you have a large Fact table that is a candidate for partitioning, this again is potentially right up your alley.
  • Columnstore is not ideal for frequently updated tables. You will end up having to drop and re-create the index before/after data update operations (a sketch of this pattern follows this list), so a rapid incremental-refresh environment is not an ideal fit. UPDATE: I am reminded by a very helpful colleague that SQL Server 2014 removes this limitation and allows table updates/deletes/etc. (Thanks, Andrew!)
  • Because it is an in-memory feature, your capability and performance depend on hardware and SQL Server memory configuration.
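As promised above, here is a minimal sketch of the pre-2014 workaround for the update limitation, wrapping the T-SQL in PowerShell’s Invoke-Sqlcmd (from the SQLPS module); the server, database, index, and table names are all illustrative:

    # Disable the nonclustered columnstore index, load data, then rebuild
    $server = "MySqlServer"; $db = "MyDW"   # placeholders

    Invoke-Sqlcmd -ServerInstance $server -Database $db `
        -Query "ALTER INDEX IX_FactSales_CS ON dbo.FactSales DISABLE;"

    # ... run the ETL load into dbo.FactSales here ...

    Invoke-Sqlcmd -ServerInstance $server -Database $db `
        -Query "ALTER INDEX IX_FactSales_CS ON dbo.FactSales REBUILD;"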

If you have large fact tables and query performance issues, and if SSAS is either not an option or itself has performance issues, columnstore is an option to investigate. Columnstore indexes have been shown to be faster than an Analysis Services cube in some instances! From my perspective, a couple of use-case scenarios immediately come to mind:

  • Creation of specific fact structures for highly responsive reports/dashboards — especially in situations where Analysis Services is not an option, or is also not performing adequately
  • Improving cube processing performance (although the drop/rebuild time for the columnstore index will then likely take place during ETL — so net performance gain would have to be tested)

For further info, this article seems to be the granddaddy of all columnstore articles. It contains a massive and detailed FAQ and includes the formula for determining memory capacity. More focused instructions and examples of creating and using a columnstore can be found here, on Microsoft’s TechNet site. Cheers!