I recently ran into a problem with a Skype for Business Server 2015 on-premises deployment where certain services randomly failed to start. After the services had been successfully deployed and tested, it was time to patch the OS and reboot. This is where things became interesting.
As you can see from the screen grab below, specific services, such as the Conferencing Announcement and Response Group services, would not start. The Skype for Business Front End service did start, but it took an extremely long time to do so. I tried to start the stopped services manually, without any luck.
My first stop in troubleshooting was the event logs. The System log showed a timeout as the reason the services would not start. This made sense, as the Skype for Business Front End service was taking an extremely long time to start. Unfortunately, the event logs were not very helpful in finding the root cause. At first glance, the services that failed to start had little in common. The one commonality I did see was the location of the servers and their use of Read-Only Domain Controllers (RODCs).
At this point, I switched to the native Skype for Business Centralized Logging Service (CLS) tooling to pull detailed logs. Before choosing a component to log, I thought about the dependencies of the failed services. The conferencing and Response Group services rely on the RTC Service global settings container, which is held on domain controllers in a distant central office.
To start, I added ADConnect as a component to log and then attempted to start the Skype for Business Response Group service with logging running. Once the service timed out, I pulled up the logs. They made it apparent that Skype was cycling through a number of RODCs within the local AD site, rejecting each one as unsuitable in its search for a domain controller that could give it what it needed.
Skype Server Trying to Locate a Global Catalog/Domain Controller:
TL_WARN(TF_COMPONENT) 423C.376C::11/18/2017-00:57:19.412.ffffffff (ADConnect,ADConnection.AnalyzeDirectoryError:adconnection.cs(692)) Caught LdapExceptionQ with 7143535(0x), message=
TL_ERROR(TF_COMPONENT) 423C.376C::11/18/2017-00:57:19.412.ffffffff (ADConnect,SuitabilityVerifier.CreateConnectionAndBind:suitabilityverifier.cs(143)) BindingFailedTo fqdn: server.domain.com error:6357090 port:3014764 message:omn
TL_ERROR(TF_COMPONENT) 423C.376C::11/18/2017-00:57:19.412.ffffffff (ADConnect,DirectoryServicesTopologyProvider.FindFirstSuitableDomainController:directoryservicestopologyprovider.cs(770)) Domain Controller server.domain.com Message-“The LDAP server is unavailable.” LdapError-“ServerDown” ” server.domain.com “:”389” was found not suitable. Will try to find another DC in the domain. Error: server.domain.com * in site Remote Site
Going Through the List of Read-Only Domain Controllers in the AD Site:
TL_INFO(TF_COMPONENT) 423C.376C::11/18/2017-00:57:19.412.ffffffff (ADConnect,SuitabilityVerifier.IsServerSuitableIgnoreExceptions:suitabilityverifier.cs(46)) Trying to find if DC server.domain.com is suitable
TL_INFO(TF_COMPONENT) 423C.376C::11/18/2017-00:57:20.149.ffffffff (ADConnect,SuitabilityVerifier.IsServerSuitable:suitabilityverifier.cs(91)) Created a connection to DC server.domain.com
TL_INFO(TF_COMPONENT) 423C.376C::11/18/2017-00:57:20.149.ffffffff (ADConnect,SuitabilityVerifier.IsOperatingSystemSuitable:suitabilityverifier.cs(170)) Checking if operating system is suitable for DC server.domain.com
TL_ERROR(TF_COMPONENT) 423C.376C::11/18/2017-00:57:20.394.ffffffff (ADConnect,SuitabilityVerifier.LogRodcFoundEvent:suitabilityverifier.cs(458)) OsSuitability server.domain.com
TL_ERROR(TF_COMPONENT) 423C.376C::11/18/2017-00:57:20.395.ffffffff (ADConnect,DirectoryServicesTopologyProvider.FindFirstSuitableDomainController:directoryservicestopologyprovider.cs(770)) Domain Controller server.domain.comlErrorIsServerSuitableRODC server.domain.com was found not suitable. Will try to find another DC in the domain. Error:
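On a longer trace, picking the rejected domain controllers out by eye gets tedious. The following is a minimal Python sketch, assuming the trace has been exported as plain text with one TL_ entry per line in the layout of the excerpts above. The regular expression, the function names and the keyword-based classification are all hypothetical and are not part of any Skype for Business tooling; the CLS message text is not a documented format, so treat the patterns as a starting point.

```python
import re
from collections import Counter
from datetime import datetime, timedelta
from typing import List, NamedTuple, Optional, Tuple

# Assumed layout of one CLS trace entry, based on the excerpts above:
# LEVEL(TF_COMPONENT) PID.TID::MM/DD/YYYY-HH:MM:SS.mmm.xxxxxxxx (Component,Class.Method:file.cs(line)) message
ENTRY_RE = re.compile(
    r"^(?P<level>TL_\w+)\(TF_COMPONENT\)\s+"
    r"[0-9A-Fa-f]+\.[0-9A-Fa-f]+::"
    r"(?P<ts>\d{2}/\d{2}/\d{4}-\d{2}:\d{2}:\d{2}\.\d{3})\.\S+\s+"
    r"\((?P<component>\w+),(?P<method>[\w.]+):[\w.]+\(\d+\)\)\s*"
    r"(?P<message>.*)$"
)


class TraceEntry(NamedTuple):
    level: str
    timestamp: datetime
    component: str
    method: str
    message: str


def parse_entry(line: str) -> Optional[TraceEntry]:
    """Parse one trace line; return None if it does not match the assumed layout."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None
    # Assumes month/day/year ordering, as in 11/18/2017.
    ts = datetime.strptime(m["ts"], "%m/%d/%Y-%H:%M:%S.%f")
    return TraceEntry(m["level"], ts, m["component"], m["method"], m["message"])


def rejection_reason(entry: TraceEntry) -> Optional[str]:
    """Rough keyword classification of 'DC not suitable' messages."""
    if "was found not suitable" not in entry.message:
        return None
    if "RODC" in entry.message:
        return "RODC"
    if "ServerDown" in entry.message:
        return "ServerDown"
    return "other"


def summarize(lines: List[str]) -> Tuple[Counter, timedelta]:
    """Count DC rejections by reason and measure the time span the entries cover."""
    entries = [e for e in map(parse_entry, lines) if e is not None]
    if not entries:
        return Counter(), timedelta(0)
    reasons = [r for r in map(rejection_reason, entries) if r is not None]
    stamps = [e.timestamp for e in entries]
    return Counter(reasons), max(stamps) - min(stamps)
```

A trace dominated by RODC rejections, spread over a long time span, matches the pattern described here: Skype working its way through every RODC in the site before the service start times out.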
Why was it not chasing a referral to a domain controller that had the answer? DNS lookups and telnet tests confirmed that name resolution and network connectivity were not the problem. The customer’s Active Directory team confirmed that replication was healthy across the environment, and they also added replication links in the AD topology between the local sites and the forest root. None of this made a difference; Skype was clearly not leaving the local AD site for answers.
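Checks like the DNS and telnet tests are easy to script so they can be repeated against every domain controller in a site. Below is a minimal Python sketch using only the standard library. The function names are hypothetical, and ports 389 (LDAP) and 3268 (Global Catalog) are the standard Active Directory ports rather than anything taken from this environment. As this case shows, a passing result only proves reachability; it says nothing about whether Skype will consider a DC suitable.

```python
import socket
from typing import Dict, List

# Standard Active Directory ports: 389 is LDAP, 3268 is Global Catalog LDAP.
AD_PORTS: Dict[str, int] = {"LDAP": 389, "GC": 3268}


def resolve(host: str) -> List[str]:
    """Return the sorted, de-duplicated addresses a name resolves to."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """TCP connect test, roughly what 'telnet host port' verifies."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_dc(host: str, ports: Dict[str, int] = AD_PORTS,
             timeout: float = 3.0) -> Dict[str, bool]:
    """Report reachability of each named port on one domain controller."""
    return {name: port_open(host, port, timeout) for name, port in ports.items()}
```

For example, `check_dc("server.domain.com")` returns a dictionary such as `{"LDAP": True, "GC": True}` when both ports accept a TCP connection.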
Unlike Exchange, Skype did not appear to look beyond the local AD site when the domain controllers there were unsuitable. I needed a way of making Skype bypass the RODCs in the local site and go directly to the forest root. From the start of my career, I recalled SetPrfDc, an old NT 4.0 tool used to statically assign the preferred domain controller. It has long since become obsolete, and for good reason, but I started looking for something similar. I discussed the problem with one of my colleagues, and he found an old article describing a similar problem with a Lync 2010 Survivable Branch Appliance (SBA). You can read about it here: https://trogjels.wordpress.com/2013/05/08/lync-2010-sba-with-rodc-how-to-get-it-work/
The proposed fix was to associate the local server’s IP address with the forest root site in AD Sites and Services. The customer’s AD team was understandably reluctant to make that change, which prompted a call to Microsoft. Microsoft recommended a simple change to the server itself that ultimately resolved the problem: the SiteName registry value under HKLM\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters, which statically assigns the computer’s AD site instead of letting Netlogon work it out from the server’s subnet. We made the recommended change and, following a reboot, the services started as expected.
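For repeatability across several servers, the change can be scripted. The Python sketch below shows two ways of setting SiteName, a REG_SZ in the Netlogon Parameters key: rendering a .reg file for import with regedit, or writing the value directly with the standard `winreg` module. This is only an illustration of applying the value, not the procedure Microsoft supplied. The function names are hypothetical, and `ForestRootSite` in the example is a placeholder; the value must be the exact name of the AD site, as shown in Sites and Services, that the server should belong to.

```python
import sys

# Netlogon key that holds the static SiteName value (REG_SZ), relative to HKLM.
NETLOGON_PARAMS = r"SYSTEM\CurrentControlSet\Services\Netlogon\Parameters"


def render_reg_file(site_name: str) -> str:
    """Build the text of a .reg file that sets Netlogon's SiteName."""
    # .reg string values escape backslashes and double quotes.
    escaped = site_name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[HKEY_LOCAL_MACHINE\\{NETLOGON_PARAMS}]\r\n"
        f'"SiteName"="{escaped}"\r\n'
    )


def apply_site_name(site_name: str) -> None:
    """Write SiteName directly (Windows only, requires administrative rights)."""
    if sys.platform != "win32":
        raise OSError("winreg is only available on Windows")
    import winreg  # imported lazily so the module still loads elsewhere

    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, NETLOGON_PARAMS, 0,
                            winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, "SiteName", 0, winreg.REG_SZ, site_name)
```

For example, `render_reg_file("ForestRootSite")` produces a file body ending in `"SiteName"="ForestRootSite"`. Whichever route you take, reboot afterwards, as we did, and confirm the site the server now reports with `nltest /dsgetsite`.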
You can read more about the fix on TechNet: https://technet.microsoft.com/en-us/library/cc937923.aspx