“The odd call drops” of the Mediation Server

Standard

I had a very annoying issue lately when an installation of a new gateway resolved in some calls (specifically to US numbers) dropped by the Skype for Business mediation server saying “A call to a PSTN number failed due to non availability of gateways.

The cause, according to the mediation server, was that “All gateways available for this call are marked as down“, and the resolution, surprisingly, was to “Verify that these gateways are up and can respond to calls.”

It seemed funny, because all other calls were successful, I have not exhausted the available PRI channels I had, the gateway didn’t seem to lose connectivity for a split second and SIP options are accepted and replied to on both ends.

Looking further at the Lync Monitoring Reports, I noticed the following:

Reported by Client
12000; reason=”Routes available for this request but no available gateway at this point”

Reported by Server
12000; reason=”Routes available for this request but no available gateway at this point”

I went to the gateway (AudioCodes M1000B) for answers and the logs were showing something very weird;

Every call starts with the same flow:

Flow1The mediation server sends a SIP Invite that will normally be answered immediately by the gateway with a “100 Trying”. This, for me, will close the deal on the “Gateway no responding” – this is a great response if you ask me.
The next phase would be the gateway immediately sending a “PSTN Place call” to the PSTN, hopefully receiving a “Proceeding” instantly from the PSTN, meaning so far we’re fully communicating. So for the time being I’m not really buying what the Lync mediation server is selling.

Looking further at the logs, I see the following strange behaviour:
As I’m still waiting for the PSTN to connect the call (this would normally be a “PSTN Call Alerting” message), I see that the mediation server decided to abandon ship!
This is what I see in the logs:

Wrong

Within 10 seconds after initiating the call, the mediation server will send a “Cancel” request to the gateway and will terminate the call:

19:25:49.640 [S=660986] [SID:1327438938] INVITE sip:+353857560598@10.10.10.1;user=phone SIP/2.0 
19:25:49.729 [S=661027] [SID:1327438938] SIP/2.0 100 Trying 
19:25:49.730 [S=661035] [SID:1327438938] (   lgr_psbrdif)(    656225)   pstn send --> PlaceCall
19:25:50.087 [S=661039] [SID:1327438479] (   lgr_psbrdex)(    656229)   pstn recv <-- CALL_PROCEEDING
19:25:59.020 [S=661055] [SID:1327438938] CANCEL sip:+353857560598@10.10.10.1;user=phone SIP/2.0 
19:25:59.052 [S=661061] [SID:1327438938] SIP/2.0 200 OK 
19:25:59.058 [S=661065] [SID:1327438938] SIP/2.0 487 Request Terminated 
19:25:59.079 [S=661074] [SID:1327438938] (   lgr_psbrdif)(    656265)   pstn send --> PSTNDisconnectCall

Clearly, the gateway is available and responding, but the mediation server is somewhat impatient.

Doing a bit of digging around, it appears there’s a setting for this.

The Lync mediation server is expecting a “SIP 180 Ringing” within 10 seconds of initiating the call. Failing to provide that (183 “Session in Progress” can be pushed here but that won’t benefit you if you’re still waiting for that ring…) will cause the mediation server to decide that the call didn’t respond and to mark the gateway as unavailable. A little too harsh? In my opinion yes. This will generate errors (2 per failed call) on your Lync servers and is actually a Lync self-inflicted error.

To work around this issue you can go to:
<Lync or Skype for Business installation folder\Server\Core
and look for a file called “OutboundRouting.exe.config“.
When you open that file, the first configuration line is:
<add key=”FailOverTimeout” value=”10000″/>
This actually determines how long should a call wait for the gateway until it declares it “Offline”. the value is in milliseconds, meaning the default setting is 10 seconds.

This will impact your environment if you have two or more gateways, and it means that it’ll take the mediation servers more than 10 seconds to declare that gateway as offline if it actually fails.

For me to workaround that issue I extended the wait time to 20 seconds, restarted the front-end and mediation services, and all of a sudden no call was dropped.

It appears it took the carrier approx. 12 seconds to generate the “PSTN Call Alerting” command to the gateway. For some reason, only calls to the US (and only with this specific carrier) were affected by this issue.

Now, I can see all my calls working perfectly and the errors have disappeared from the serves too.

Advertisements

Irish Public Holidays 2016 set for Skype for Business Server

Standard

New year, new holiday set.

This updated script will add all 2016 public holidays (and 2017 new years day) to your Skype for Business Server \ Lync 2013 Server environment as a holiday set.

Holidays in this script:

  • St. Patrick’s day.
  • Easter Monday.
  • Bank holidays for May, June, August, October.
  • Christmas day.
  • St. Stephen’s Day.
  • New years day 2017.

This will then become an option to tick on your response group workflows:

2016

The script can be downloaded here.

Skype for Business VCFG Viewer

Standard

When preparing documentation for a customer or trying to survey an existing deployment, I usually ask for a copy of the Voice Configuration files (.vcfg) of Lync \ Skype for Business to see how things are set up.

Currently there’s no option to view the vcfg file in a readable format.
Sure, you can open the vcfg file in a text editor or even a web browser and browse through the xml. Alternatively, you can import the configuration to your servers – but good luck trying to read it between the “Committed” and “Uncommitted” text.

The vcfg viewer is a “PowerShell executable” that runs through the vcfg file and renders it to a readable html file.
Simply run the ps1 file, choose the vcfg file you wish to open, and you’ll have a browse able page to examine the vcfg file:

Download the ps1 file here.

MS15-104 Security update breaks Lync Server 2013 web services

Standard

Microsoft released a KB article describing issues with the Web Components Server on Lync Server 2013 after installing the latest security update.
It affects the following:

•Users can’t sign in to your dial-in page.
•Lync Mobile clients can’t sign in.
•External clients can’t sign in.
•Address book web queries fail.
•Users are prompted for credentials for some web services after they sign in internally to Lync desktop clients.

To resolve this problem, uninstall security update 3080353, install the July 2015 cumulative update, and then reinstall security update 3080353.

Source and additional information: Microsoft.

Skype for Business Server Response Groups Migration Gotcha

Standard

Noticed something weird during a recent upgrade to from Lync Server 2013 to a new Skype for Business Server pool.

I was double-checking my self against Greig Sheridan‘s very detailed guide and as expected, everything worked just fine.
We tested and confirmed that calls are coming through and the response groups are all working as expected – that was easy.

The next week I’m getting a phone call saying the Response Groups’ managers can’t change some of the settings they used to be able to.
Checking AD permissions – OK.
Opening the Workflow – I can’t see the managers and the workflow is set to “Unmanaged”.

I thought it was just a misconfiguration but then I checked the “Get-CsRgsWorkflow” export we did earlier (Never underestimate documenting!) and all workflows that were set to “Managed: True” are now set to “Managed: False“.

I thought it was a bug, but no – it’s by design. I couldn’t see it in the migration documents (nor thought I should look for it!) but the Microsoft planning document for Skype for Business Server 2015 states very clearly vaguely that “When you migrate response groups from a prior version to Skype for Business Server, the type is set to Unmanaged.”
Here, check for yourself.

I’m not sure if this is the behaviour when performing an in-place upgrade to Skype for Business, but assuming it is.

Reset \ Fix Skype for Business and Lync accounts …And a bonus!

Standard

Every now and then I run into one account that has something weird about it. Usually it would display an old phone number even when you changed the phone number in AD, ran Update-CsAddressBook and deleted all old records you had.
However, when looking at the user’s contact card you’ll still see the old number AND the new number, resulting in users dialling the wrong number, resulting in call failures, resulting in service desk calls, do I need to go further…?

I read various workaround involving running various scripts against the RTC database, and to be honest – there are some thing I’d rather not touch.

Workaround for the “corrupted” users would normally be disabling them and then re-enabling the for Skype for Business \ Lync.
This will do the trick, BUT has a huge downside: if you didn’t export the users’ data they’ll lose all of their groups, favourites, contacts, etc. Also, you’ll have to re-apply the user’s policies, line URI, private numbers, etc.
This is both time consuming and would require to schedule a maintenance window as the user is going to be kicked out.

So, came up with the following script:

Window

  1. You’ll provide the user’s sip address.
  2. The system will confirm the user’s name so you’re happy to move on.
  3. The user’s data will be exported to a folder on your local C drive (Path is always C:\y0av\users\<username>)
  4. The user’s Get-CsUser data will be exported to a text file in the same folder so you can compare the settings once you’re done.
  5. The user will be disabled, and the script will run Update-CsUserDatabase and pause for 15 seconds.
  6. The user will then be enabled and the script will pause for another 15 seconds. The purpose of the pauses is to allow for the changes to set in. I tested various environments and 30 seconds seems like enough time. I’ll add the option to change the pause in future versions.
  7. The user’s polices will be re-applied.
  8. The system will run Update-CsUserDatabase and Update-CsAddressBook to reflect the changes.
  9. The user’s data dump is saved in the C:\y0av\users folder, and you can manually delete it if you feel you no longer need it.

Folder

What’s missing?

  • If the user is configured with a Private Line you’ll have to use the script with the -PrivateLine switch.
  • PIN must be reset for the user after the account is recreated.
  • Conference ID will change – any recurring Skype for Business \ Lync meetings must be re-sent.

What’s the bonus?

All the tests I made show that the script runs fast enough to not interfere with the user’s activity; I ran this against a user during a call, the script completed and the call never disconnected.
Try this on a test account before killing someone’s call in your organization… 🙂

______________________________________________________________________________

Policies this script will save and re-apply:

Hosted Voicemail (True or False)
Archiving Policy
CallViaWork Policy
Client Policy
ClientVersion Policy
Conferencing Policy
External Access Policy
Hosted Voicemail Policy
Location Policy
Mobility Policy
Persistent Chat Policy
Pin Policy
Presence Policy
Third Party Video System Policy
User Services Policy
Voice Policy
Voice Routing Policy

To run the script for a user with a private line, run:

S4BUserRepair.ps1 -PrivateLine

Download the script here.