The event logs on the Lync 2010 Front Ends, Lync 2013 Front Ends and Lync 2013 Edge servers didn't show any errors or warnings stating that replication was failing. I checked all the usual suspects:
- Certificate issues
  - Verified the internal certificate had the proper Subject Name
  - Verified the internal root CA certificate was in the proper location
  - Verified that the certificate problem described in the Microsoft KB article "Lync Server 2013 Front-End service cannot start in Windows Server 2012" (http://support.microsoft.com/kb/2795828) did not apply
- Verified I could telnet to the Edge servers from the Front Ends over port 4443
- Verified DNS was correct
- Verified the computer names on the Edge servers were correct, with the proper primary DNS suffix entered
- Performed a Wireshark trace on the Edge server and verified that there were connections to the server over port 4443
...and finally performed a comprehensive web search about this issue, with no luck.
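The port 4443 check from the list above can also be scripted instead of done by hand with telnet. The following is a minimal Python sketch, not something used in the original troubleshooting; the function name `can_connect` is an illustrative choice.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Calling `can_connect("edge01.example.com", 4443)` from a Front End (the host name is a placeholder) returns `True` when the port is reachable. Like telnet, this only shows that a TCP connection can be opened; it says nothing about whether the TLS handshake that follows will succeed.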
I had spent enough time on this issue, so I decided to enlist the help of a very knowledgeable coworker named James Denavit (who happens to be a Lync 2010 MCM) before throwing in the towel and calling Microsoft Support.
Luckily for me, James had the great idea of running the Lync Logging Tool from one of the Lync 2010 Front End servers. We ran it on the 2010 Front End because I had not yet moved the CMS to Lync 2013. We ran logging with the following options:
After running the Invoke-CsManagementStoreReplication PowerShell cmdlet from the Lync 2010 Front End server a couple of times, I stopped the Logging Tool and we analyzed the logs. We found the following warnings:
One of the red flags we saw was:
TL_WARN(TF_COMPONENT) [2]0560.2E58::05/24/2013-20:25:24.920.4d11ccab (XDS_File_Transfer_Agent,FileTransferTask.CopyFilesFromReplicaUsingWcf:filetransfertask.cs(644))
(0000000002D2AF4D)[FileTransferTask(7, 5/24/2013 1:21:30 PM): {TASK_NOT_STARTED, fromReplica, [L13EdgeServerFQDN, HttpsWebService, 4443], 0}] Failed to copy files from replica. Exception: [System.ServiceModel.EndpointNotFoundException: Could not connect to https://L13EdgeServerFQDN:4443/ReplicationWebService. TCP error code 10061: No connection could be made because the target machine actively refused it 10.61.1.204:4443. ---> System.Net.WebException: Unable to connect to the remote server ---> System.Net.Sockets.SocketException: No connection could be made because the target machine actively refused it 10.61.1.204:4443
at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
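In a long trace, the failing endpoint and socket error code can be pulled out with a short script. This is a minimal Python sketch; the regular expressions are assumptions inferred from the single excerpt above, not a documented log format, and `summarize_failure` is a hypothetical helper name.

```python
import re

# Patterns inferred from the one excerpt above; other log lines may differ.
URL_RE = re.compile(r"Could not connect to (https://\S+?)\. ")
TCP_ERR_RE = re.compile(r"TCP error code (\d+)")


def summarize_failure(line: str):
    """Return (url, tcp_error_code) for a failed file-transfer line, or None."""
    url = URL_RE.search(line)
    err = TCP_ERR_RE.search(line)
    if not (url and err):
        return None
    return url.group(1), int(err.group(1))
```

For the warning above, this yields the ReplicationWebService URL on port 4443 together with error code 10061, which is the Windows Sockets code for a refused connection (WSAECONNREFUSED).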
James did a search based on this and found the following post from The Lync Guy Blog:
I had found this blog post during my web search, but lucky for me, James read the responses (which I obviously hadn't) and pointed out the response from Jonatan about adding a registry entry: a DWORD value named SendTrustedIssuerList under the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL key, set to 0.
After modifying the registry, I rebooted the Edge server and ran the Invoke-CsManagementStoreReplication PowerShell cmdlet from the Lync 2010 Front End server a couple of times again, and still no luck. I then read more of the responses; the last one, from Chris Duva, stated that he had added another registry entry.
I added the DWORD value ClientAuthTrustMode to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL key and assigned it a value of 2. I rebooted the Edge server again, ran the Invoke cmdlet again and voila! It worked.
Since I had two Lync 2013 Edge servers, I wanted to verify that only the ClientAuthTrustMode registry entry was needed, so I added only that entry to the second Edge server, rebooted it, and it worked! I then removed the SendTrustedIssuerList registry entry from the first Edge server, rebooted it and verified that replication still worked.
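To repeat the change on additional Edge servers, the one value that turned out to matter can be captured in a .reg file. The following is a minimal Python sketch that only builds the file text; the helper name `build_reg_text` is hypothetical, and I edited the registry by hand rather than with a script.

```python
SCHANNEL_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control"
    r"\SecurityProviders\SCHANNEL"
)


def build_reg_text(key: str, dwords: dict) -> str:
    """Build .reg file text that sets the given DWORD values under key."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, value in dwords.items():
        # .reg files write DWORD values as eight hexadecimal digits.
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\r\n".join(lines) + "\r\n"


# Only ClientAuthTrustMode = 2 was needed in the end.
reg_text = build_reg_text(SCHANNEL_KEY, {"ClientAuthTrustMode": 2})
```

Saving `reg_text` to a file such as `schannel.reg` (an assumed name) and importing it on the Edge server with `reg import`, followed by the same reboot as above, should reproduce the manual change.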
Thanks to James for helping me with this issue as well as answering questions whenever I have them.
Hope this helps.