In keeping with this month’s apparent theme of troubleshooting Live Meeting and Audio Conferencing problems for external users, I ran into yet another weird one. This time we have a pretty basic Office Communications Server 2007 R2 deployment with Enterprise Voice using a NET VX1200 media gateway with Cisco Call Manager 4.1. All OCS features are deployed and working with best practices followed for nearly every piece of the puzzle; no corners were cut. The latest round of OCS patches has been applied and everything is looking good.
That was until I started testing Dial-In Conferencing. I found that OCS users were able to successfully join audio conferences using either a dynamic Conference ID from a scheduled meeting or their own personal Conference IDs, but only when they joined from the link within an email/meeting.
But if an OCS user attempted to manually dial the Conferencing Attendant number, or if a PBX or PSTN phone called directly into the same number, they were unable to join the meeting. The Conferencing Attendant would accept the given ID as valid and then attempt to transfer the call into the meeting, at which point the transfer would fail and the attendant would respond with:
"Sorry, but I can’t seem to connect you to your conference right now. Please try your call again later. Goodbye."
The caller would then be immediately disconnected. I did find out that after restarting the server the very first attempt to join any conference would actually work, but then all subsequent attempts would fail regardless of where the call was dialed from. Whether other OCS users were already connected to the meeting (using the link) or no one was connected, inbound callers would always fail. And it didn’t matter if callers attempted to join anonymously (via meeting passcode) or authenticated (using their personal PIN).
Something was definitely wrong; I triple-checked the OCS configuration and event logs but nothing was out-of-place. So I walked through each of these three scenarios with debug logging enabled:
- Test Scenarios
  - Joined OCS User A to scheduled Audio conference using link in meeting invite = Success
  - Joined OCS User B to conference by manually dialing ‘6789’ in the Search bar in OC = Failure
  - Joined PSTN Caller using external DID ‘+13125556789’ = Failure
- Recorded trace data on FE server for the following components:
  - AcpMcu
  - AvMcu
  - CAAServer
  - CASServer
  - S4
  - SIPStack
After going through the AvMcu log with Microsoft PSS the following error was located at the time when callers were unable to join the meeting and were disconnected:
TL_ERROR(TF_COMPONENT) [3]117C.1554::06/11/2009-16:00:38.162.0003a54f (AvMcu,UserMediaManager.CreateEndpointAndStreamsCallback:usermedia.cs(821))( 00000000027630E1 )[UserMediaManager]{sip:a4eecd6e49dd404488af679e8e8a1a29@anonymous.invalid} MP CreateEndpointAndStreams exception. System.NullReferenceException: Object reference not set to an instance of an object.
at Microsoft.Rtc.Internal.Sip.TLSListener.SignString(String signString, String& hashAlg, String& signAlg)
The SignString call in the stack trace above caught our attention, as I had already seen an issue earlier that week at the same client which ended up being related to their internal certificate. They had deployed an internal Windows 2008 Enterprise CA but had elevated the signing algorithm to SHA256, above the default SHA1 value. The affected server in that case was the only Windows Server 2003 system in the domain; all others (including the OCS servers) were running on Server 2008, which natively supports the stronger hash. But because this failure seemed to be internal, with the conferencing service unable to move calls between its own components (from the lobby to a meeting), I had a hunch it might still be related to the hash algorithm.
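When a trace capture runs to thousands of lines, it can help to pull out just the component, timestamp, and exception type from each entry. The following is a minimal sketch in Python, assuming every entry follows the same layout as the AvMcu line quoted earlier; the regular expressions, the field names, and the `summarize_trace_line` helper are my own illustration and not part of any OCS tooling.

```python
import re
from typing import Optional

# Header layout inferred from the single AvMcu trace line quoted in this post.
# The group names are hypothetical labels, not official OCS terminology.
TRACE_HEADER = re.compile(
    r"^(?P<level>TL_[A-Z]+)\(\w+\)\s+"
    r"\[\d+\][0-9A-Fa-f]+\.[0-9A-Fa-f]+::"
    r"(?P<timestamp>\d{2}/\d{2}/\d{4}-\d{2}:\d{2}:\d{2}\.\d{3})\S*\s+"
    r"\((?P<component>\w+),(?P<source>.+?)\)\(\s"
)

# A dotted .NET type name ending in "Exception", e.g. System.NullReferenceException.
EXCEPTION = re.compile(r"\b(?P<exc>(?:\w+\.)+\w*Exception)\b")


def summarize_trace_line(line: str) -> Optional[dict]:
    """Return level, timestamp, component, source and exception for one trace
    entry, or None if the line does not look like a trace header."""
    header = TRACE_HEADER.match(line)
    if header is None:
        return None
    summary = header.groupdict()
    exc = EXCEPTION.search(line)
    summary["exception"] = exc.group("exc") if exc else None
    return summary


SAMPLE = (
    "TL_ERROR(TF_COMPONENT) [3]117C.1554::06/11/2009-16:00:38.162.0003a54f "
    "(AvMcu,UserMediaManager.CreateEndpointAndStreamsCallback:usermedia.cs(821))"
    "( 00000000027630E1 )[UserMediaManager]"
    "{sip:a4eecd6e49dd404488af679e8e8a1a29@anonymous.invalid} "
    "MP CreateEndpointAndStreams exception. System.NullReferenceException: "
    "Object reference not set to an instance of an object."
)

if __name__ == "__main__":
    print(summarize_trace_line(SAMPLE))
```

Stack frame lines (the ones starting with "at ...") do not match the header pattern and come back as `None`, so a caller can attach them to the preceding entry or simply skip them.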
Here we can see in the details of the certificate that the Signature Algorithm is indeed sha256RSA:
A quick verification of the General Properties on the Windows 2008 Enterprise Certification Authority shows that the Hash algorithm used on the root CA is also sha256:
Since the internal CA here is only configured to sign certificates using SHA2, we went out to RapidSSL to request a thirty-day free trial certificate to temporarily use on the Front-End Server. Here we can see that the certificate is using the more common SHA1 for its Signature Algorithm:
After applying the new certificate to OCS and selecting it in IIS I rebooted the server for good measure. Initially the problem appeared to be resolved as I was able to dial directly into an audio conference from an internal PBX phone, followed by a PSTN phone. The second caller was actually added to the conference successfully (yeay!) but in doing so the first caller was immediately booted out of the room (boo!). And when I dialed in from a third phone, you guessed it, the second caller was promptly disconnected.
The OCS Event Log on the Front-End server showed a whole bunch of new error messages that had not been there before, describing lots of MCU errors, which would explain the failure to join additional parties. Details on the error messages can be found in this TechNet blog.
Turns out that the new certificate was (partially) to blame, as its Issuing CA’s certificate is not configured by default in Windows Server in a way that is supported by OCS. By checking the new certificate’s path we see that the Equifax Secure Certificate Authority is the issuer used by the RapidSSL free certificate:
By locating that certificate in the Third-Party Root Certification Authorities\Certificates folder in the Local Computer store and viewing the properties, we can see that by default the certificate is only enabled for a specific subset of purposes. To resolve the MCU errors in OCS it needed to be enabled for all purposes.
Once that change was made and the services were restarted, the MCU event log errors stopped appearing and all parties were able to join Dial-In Audio Conferences regardless of where and how they connected to the service. This proved that the previous certificate was the culprit and that the stronger hash algorithm used for its signature was causing the validation problems.
Microsoft is currently looking into this and has successfully reproduced the issue in a lab. Once their debugging is completed I’ll update this blog posting with details on whether a hotfix or KB article is released in response.
Update
I’ve received word back from Microsoft that the issue has been fully replicated and tested in both Standard and Enterprise Edition, and certificates issued with a Signature Algorithm of MD5 or SHA2 cannot be supported for OCS R2. Only certificates using SHA1 with up to a 4096 bit key length will operate correctly. Support for SHA2 and MD5 is being considered for the next release of OCS.
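To check a certificate against that support statement without opening the certificate dialog, the signature algorithm can be read straight out of the DER encoding. What follows is a rough sketch in Python using only the standard library. The function names are hypothetical, the hand-rolled parser assumes a well-formed certificate, only the common RSA-based algorithms are listed, and the key length limit mentioned above is not checked.

```python
# Signature algorithm OIDs (PKCS #1) mapped to the names Windows displays.
SIGNATURE_OIDS = {
    "1.2.840.113549.1.1.4": "md5RSA",
    "1.2.840.113549.1.1.5": "sha1RSA",
    "1.2.840.113549.1.1.11": "sha256RSA",
    "1.2.840.113549.1.1.12": "sha384RSA",
    "1.2.840.113549.1.1.13": "sha512RSA",
}


def _read_tlv(data: bytes, offset: int):
    """Return (tag, value_start, value_end) for the DER element at offset."""
    tag = data[offset]
    length = data[offset + 1]
    value_start = offset + 2
    if length & 0x80:  # long form: low 7 bits give the number of length bytes
        num_bytes = length & 0x7F
        length = int.from_bytes(data[value_start:value_start + num_bytes], "big")
        value_start += num_bytes
    return tag, value_start, value_start + length


def _decode_oid(body: bytes) -> str:
    """Decode the content bytes of a DER OBJECT IDENTIFIER into dotted form."""
    arcs = [body[0] // 40, body[0] % 40]
    value = 0
    for byte in body[1:]:
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:  # high bit clear marks the last byte of an arc
            arcs.append(value)
            value = 0
    return ".".join(str(arc) for arc in arcs)


def signature_algorithm_oid(cert_der: bytes) -> str:
    """Extract the outer signatureAlgorithm OID from a DER X.509 certificate."""
    tag, start, _ = _read_tlv(cert_der, 0)            # Certificate SEQUENCE
    if tag != 0x30:
        raise ValueError("not a DER SEQUENCE; is this a PEM file?")
    _, _, tbs_end = _read_tlv(cert_der, start)        # skip tbsCertificate
    _, alg_start, _ = _read_tlv(cert_der, tbs_end)    # AlgorithmIdentifier
    tag, oid_start, oid_end = _read_tlv(cert_der, alg_start)
    if tag != 0x06:
        raise ValueError("expected an OBJECT IDENTIFIER")
    return _decode_oid(cert_der[oid_start:oid_end])


def uses_sha1_rsa_signature(cert_der: bytes) -> bool:
    """True if the certificate is signed with sha1RSA, the only signature
    algorithm reported above as working with OCS R2."""
    return SIGNATURE_OIDS.get(signature_algorithm_oid(cert_der)) == "sha1RSA"
```

For a Base64 (PEM) export, the standard library's `ssl.PEM_cert_to_DER_cert` can convert the text to DER bytes before calling `uses_sha1_rsa_signature`. A certificate that reports sha256RSA or md5RSA here would be a candidate for the failures described in this post.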