Most new SharePoint 2013 implementations these days run on virtual machines, and the question of whether to virtualize SQL servers has long been put to rest. Indeed, with Windows Server 2012 R2 Hyper-V VMs supporting up to 64 vCPUs, 1 TB of RAM and 64 TB of data, it is hard to make a case for physical hardware.
Both Microsoft Hyper-V and VMware have published recommendations for working with virtualized SharePoint farms. The list of recommendations is long (and somewhat tedious), so this cheat-sheet aims to summarize the most important ones and provide real-world advice for SharePoint and virtualization architects.
- When virtualizing SharePoint 2013, Microsoft recommends a minimum of 4 and a maximum of 8 CPU cores per VM. Start low (4) and scale up as needed. With multiprocessor virtual machines, the physical host needs to ensure that enough physical CPU cores are available before scheduling the threads of that VM for execution. Therefore, in theory, the higher the number of vCPUs, the longer the potential wait times for that VM. In every version since 4.0, VMware has improved the CPU scheduling algorithm to reduce the wait time for multiprocessor VMs using relaxed co-scheduling. Still, it’s wise to consult the documentation for your particular version and check its specific limitations and recommendations.
- Ensure true high availability by using affinity rules. Your SharePoint admin should tell you which VM hosts which role, and you will need to keep VMs with the same role on separate physical hosts. For example, the VMs that host the web role should not all end up on the same physical host; in a typical mid-size two-tier farm, that means spreading the VMs of each tier across different physical hosts.
- When powering down the farm, start with the web layer and work your way down to the database layer. When powering up, go in the opposite direction.
- Do not oversubscribe or thin-provision PROD machines; do oversubscribe and thin-provision DEV and TEST workloads.
- NUMA (non-uniform memory access) partition boundaries: The high-level recommendation from both Microsoft and VMware is not to cross NUMA boundaries. Different chip manufacturers have different definitions of NUMA, but the majority opinion seems to be that a NUMA node equals a physical CPU socket, not a CPU core. For example, for a physical host with 8 quad-core CPUs and 256 GB of RAM, a NUMA partition is 256 GB / 8 = 32 GB. Ensure that individual SharePoint VMs fit into a single partition, i.e. are not assigned more than 32 GB of RAM each.
- Do not use dynamic memory: Certain SharePoint components, such as search and distributed cache, use memory-cached objects extensively and are unable to resize their caches dynamically when the available memory changes. Therefore, dynamic memory mechanisms such as minimum/maximum RAM, shares, balloon drivers etc. will not work well with SharePoint 2013. Again, your SharePoint admin should provide a detailed design and advise which VM hosts which particular service.
- Do not save VM state at shutdown or use snapshots in PROD: SharePoint is a transactional application, and saving VM state can lead to inconsistent topology after the VM comes back up or is reverted to a previous snapshot.
- Disable time synchronization between the host and the VM: The reasoning is the same as in the previous point. All transaction events are time-stamped, and latency during time synchronization can cause inconsistent topology. SharePoint VMs will use the domain synchronization mechanism to keep local clocks in sync.
- Do not configure “always start machine automatically”: There may be cases where a SharePoint VM has been shut down for a reason, and starting it automatically after a physical host reboot can cause problems.
- TCP Chimney offload: Please refer to this VMware post on reasons why this setting may need to be disabled. This is not a setting unique to SharePoint and unless it is the standard practice for all web VMs or is part of the image, it should not be configured.
- Disaster recovery: Virtualization has been a godsend for DR for quite some time. Using VM replication to a secondary site is by far the simplest SharePoint DR scenario to configure and maintain.
- Other settings that are not SharePoint-specific: Things like storage host multi-pathing, storage partition alignment, physical NIC teaming, configuring shared storage for vMotion etc. hold true for all VMware implementations.
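
The NUMA arithmetic from the list above can be turned into a quick sizing check. This is only a rough sketch: it assumes one NUMA node per physical socket with RAM split evenly across sockets, and the `Host` class and `fits_numa_node` helper are hypothetical names made up for illustration, not part of any Microsoft or VMware tooling.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Physical host, assuming one NUMA node per CPU socket (an assumption)."""
    sockets: int
    cores_per_socket: int
    ram_gb: int

    @property
    def numa_node_ram_gb(self) -> float:
        # RAM is assumed to be split evenly across the sockets.
        return self.ram_gb / self.sockets


def fits_numa_node(host: Host, vm_ram_gb: int, vm_vcpus: int) -> bool:
    """True if the VM's RAM and vCPUs both fit inside a single NUMA node."""
    return (vm_ram_gb <= host.numa_node_ram_gb
            and vm_vcpus <= host.cores_per_socket)


# The example from the list: 8 quad-core CPUs and 256 GB of RAM.
host = Host(sockets=8, cores_per_socket=4, ram_gb=256)
print(host.numa_node_ram_gb)            # 32.0
print(fits_numa_node(host, 32, 4))      # True
print(fits_numa_node(host, 48, 4))      # False: crosses the 32 GB boundary
```

Always confirm the real NUMA layout with the hypervisor's own tools before relying on numbers like these.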
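
The affinity recommendation in the list above is easy to sanity-check once your SharePoint admin has supplied the role of each VM. The sketch below works on a plain dictionary that you would fill in by hand or from an inventory export; the VM names, host names and the `affinity_violations` function are all hypothetical, and the hypervisor's own anti-affinity rules remain the real enforcement mechanism.

```python
from collections import defaultdict


def affinity_violations(placement):
    """Return {(role, host): [vm, ...]} for every physical host that
    carries more than one VM with the same SharePoint role."""
    by_role_host = defaultdict(list)
    for vm, (role, host) in placement.items():
        by_role_host[(role, host)].append(vm)
    return {key: sorted(vms)
            for key, vms in by_role_host.items() if len(vms) > 1}


# Hypothetical placement: VM name -> (role, physical host).
placement = {
    "WEB1": ("web", "host-a"),
    "WEB2": ("web", "host-a"),   # same role on the same host as WEB1
    "APP1": ("app", "host-a"),
    "APP2": ("app", "host-b"),
}

print(affinity_violations(placement))
# {('web', 'host-a'): ['WEB1', 'WEB2']}
```

An empty result means no two VMs with the same role share a physical host.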
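
The power-down and power-up ordering from the list above can be expressed as a simple sort. This sketch assumes three tier labels (`web`, `app`, `database`); the tier names, VM names and the `power_sequence` helper are illustrative only, and the function merely computes an order rather than talking to any hypervisor.

```python
# Tiers listed from the top of the farm down to the database layer.
TIERS_TOP_DOWN = ["web", "app", "database"]


def power_sequence(vms, action):
    """Order VM names for 'shutdown' (web first) or 'startup' (database first).

    `vms` maps a VM name to its tier label.
    """
    if action not in ("shutdown", "startup"):
        raise ValueError("action must be 'shutdown' or 'startup'")
    order = TIERS_TOP_DOWN if action == "shutdown" else TIERS_TOP_DOWN[::-1]
    rank = {tier: position for position, tier in enumerate(order)}
    # Sort by tier position first, then by name for a stable, readable order.
    return sorted(vms, key=lambda name: (rank[vms[name]], name))


farm = {"SQL1": "database", "WEB1": "web", "APP1": "app", "WEB2": "web"}

print(power_sequence(farm, "shutdown"))  # ['WEB1', 'WEB2', 'APP1', 'SQL1']
print(power_sequence(farm, "startup"))   # ['SQL1', 'APP1', 'WEB1', 'WEB2']
```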
One thing to keep in mind is that VM replication is not supported (for the same reason that Snapshot/Save State is not supported) when using >1 SharePoint VM. Also, for Hyper-V, given it does not use a gang scheduler, you can freely add as many vCPUs as you see fit without an impact to performance, given you’re not oversubscribing the host, of course.
That’s a good point. VMware has a whitepaper that implies SRM can be used in these scenarios. SRM integrates with third-party storage replication products, which in turn can be configured to minimize data loss.