Can you have too many CPU cores?

As I found out today, the answer is yes, at least if you are deploying a Windows role that requires the WID (Windows Internal Database). Below is the scenario I ran into and how to work around the issue.

I had a customer who was attempting to deploy RDS (Remote Desktop Services). I say attempting because he was having no luck getting the Connection Broker to install properly. The Connection Broker, Session Host, and RD Web roles would install, but the session collection was not being created. Additionally, my customer was not able to manage RDS in Server Manager. After several rounds of installing and removing the RDS components, I noticed that the WID service was taking a very long time to start at boot, and most of the time it would just hang. I figured that we might have an issue with the existing OS or possibly a GPO (Group Policy Object), so we isolated the server in an OU (Organizational Unit) with blocked inheritance and then added the newly loaded server to the domain. The deployment still failed. We then reloaded Windows. On our first attempt at installing RDS, it failed in exactly the same way.

At this point we knew the root of the issue was the WID. Searching the Internet turned up an article that alluded to a possible issue with configurations of more than 32 CPU cores. My customer's server is going to be used for a very CPU-intensive application, so it was configured with 48 CPU cores (96 logical processors). Since I was fresh out of ideas, I removed the WID and RDS components. I then limited the server to 24 CPU cores through msconfig. After a reboot, we were able to deploy RDS without any problems. To confirm the cause, we removed the CPU limit and rebooted, and the WID service went back to behaving exactly as before.

Now that we had the issue nailed down, it was time to find a more permanent fix. Before I get into that, let me detail the symptoms we observed. Hopefully this will help the next person who runs into this issue.

The primary symptom we observed was the WID service hanging in the Starting state.

Additionally, we saw the following event in the Application log when the WID finally started with more than 32 cores exposed:

Process 0:0:0 (0xee8) Worker 0x0000000000 appears to be non-yielding on Schedule 47....

Finally, the SQL error log contained a similar event:

*******************************************************************************
*
* BEGIN STACK DUMP:
* 07/21/17 09:35:26 spid 4268
*
* Non-yielding Scheduler
*
* *******************************************************************************
Stack Signature for the dump is 0x000000000000009C
External dump process return code 0x20000001.
External dump process returned no errors.

Process 0:0:0 (0x780) Worker 0x0000003077802160 appears to be non-yielding on Scheduler 47. Thread creation time: 13145128446017. Approx Thread CPU Used: kernel 62171 ms, user 7281 ms. Process Utilization 4%. System Idle 96%. Interval: 70052 ms.
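If you need to check several servers for this symptom, a short script can pull the details out of the log instead of reading stack dumps by eye. The Python sketch below assumes the message format of the error log line quoted above (other builds may word it differently) and extracts the worker address, the scheduler number, and the kernel/user CPU times. The function and field names are illustrative only, not part of any SQL Server tooling.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Pattern written against the "non-yielding" error log line quoted above.
# The kernel/user CPU section is optional so lines without it still match.
NON_YIELDING = re.compile(
    r"Process (?P<process>\S+) \((?P<thread>0x[0-9a-fA-F]+)\) "
    r"Worker (?P<worker>0x[0-9a-fA-F]+) appears to be non-yielding on "
    r"Scheduler (?P<scheduler>\d+)\."
    r"(?:.*?kernel (?P<kernel_ms>\d+) ms, user (?P<user_ms>\d+) ms)?"
)


@dataclass
class NonYieldingEvent:
    process: str
    thread: str
    worker: str
    scheduler: int
    kernel_ms: Optional[int]  # None when the line has no CPU usage section
    user_ms: Optional[int]


def parse_non_yielding(line: str) -> Optional[NonYieldingEvent]:
    """Return the parsed event, or None if the line is not a non-yielding report."""
    m = NON_YIELDING.search(line)
    if m is None:
        return None
    kernel = m.group("kernel_ms")
    user = m.group("user_ms")
    return NonYieldingEvent(
        process=m.group("process"),
        thread=m.group("thread"),
        worker=m.group("worker"),
        scheduler=int(m.group("scheduler")),
        kernel_ms=int(kernel) if kernel is not None else None,
        user_ms=int(user) if user is not None else None,
    )


def scan(lines: Iterable[str]) -> List[NonYieldingEvent]:
    """Collect every non-yielding report from an iterable of log lines."""
    events = []
    for line in lines:
        event = parse_non_yielding(line)
        if event is not None:
            events.append(event)
    return events
```

`scan()` accepts any iterable of lines, such as an open log file; locating the WID's error log on disk is not covered here.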


So how did we fix this?

First, we limited the number of CPUs exposed to Windows. We then installed SQL Server Management Studio, since my customer was going to install SQL Server on the server anyway, and connected to the WID (\\.\pipe\MICROSOFT##WID\tsql\query). We set the CPU affinity so that the WID uses only CPU 0 and CPU 1. Finally, we allowed Windows to see all the CPUs again and rebooted.
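We made the affinity change from Management Studio. If you would rather script it, a minimal Python sketch along these lines could send equivalent T-SQL over the WID named pipe. Several pieces are assumptions rather than anything we tested: that the WID accepts `ALTER SERVER CONFIGURATION SET PROCESS AFFINITY` (the T-SQL statement SQL Server uses for this setting), that the third-party `pyodbc` package and an ODBC driver named "ODBC Driver 17 for SQL Server" are installed, and that the pipe can be reached with the `np:` prefix and Windows authentication. The helper names are hypothetical.

```python
import sys

# Named pipe the WID listens on (from the article).
WID_PIPE = r"\\.\pipe\MICROSOFT##WID\tsql\query"


def build_connection_string(driver: str = "ODBC Driver 17 for SQL Server") -> str:
    """ODBC connection string for the WID; the driver name is an assumption."""
    return f"DRIVER={{{driver}}};SERVER=np:{WID_PIPE};Trusted_Connection=yes;"


def build_affinity_sql(first_cpu: int = 0, last_cpu: int = 1) -> str:
    """T-SQL that pins the instance to CPUs first_cpu..last_cpu (inclusive)."""
    if first_cpu < 0 or last_cpu < first_cpu:
        raise ValueError("CPU range must be non-negative and ascending")
    return (
        "ALTER SERVER CONFIGURATION SET PROCESS AFFINITY "
        f"CPU = {first_cpu} TO {last_cpu};"
    )


def apply_affinity(first_cpu: int = 0, last_cpu: int = 1) -> None:
    """Connect to the WID and run the affinity statement (Windows only)."""
    import pyodbc  # third-party; imported here so the helpers above work without it

    conn = pyodbc.connect(build_connection_string(), autocommit=True)
    try:
        conn.execute(build_affinity_sql(first_cpu, last_cpu))
    finally:
        conn.close()


if __name__ == "__main__":
    # Print the statement by default; only touch the server with --apply.
    if "--apply" in sys.argv[1:] and sys.platform == "win32":
        apply_affinity(0, 1)
    else:
        print(build_affinity_sql(0, 1))
```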

Here are the steps I would recommend taking to correct this issue.

  1. If the WID and associated roles are installed, remove them.  This may not be required depending on the role being installed, but it is better to be safe than sorry.
  2. Limit the CPUs exposed to Windows.  The easiest way to do this is through msconfig.
    1. Launch msconfig (Start, Run, msconfig).
    2. Click the Boot tab.
    3. Click Advanced options…
    4. Select the Number of processors check box.
    5. Set the value to 16 or fewer.
    6. Click OK twice and reboot.
  3. Install the Windows role that requires the WID as you normally would.
  4. Add the -P2 parameter to the WID service.
    1. Open the Services console (Start, Run, services.msc).
    2. Locate the Windows Internal Database service.
    3. Right-click the Windows Internal Database service and choose Properties.
    4. In the Start parameters box, add “-P2” without quotes and click OK.  (This limits the WID to 2 CPUs.  If you want more, change the number.)
  5. Remove the CPU limit imposed in step 2.
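Steps 2, 4, and 5 can also be driven from the command line, which is handy if you have several of these servers. The Python sketch below (saved under a hypothetical name such as wid_steps.py) only prints the commands unless you pass `--execute`. It rests on assumptions worth checking on your own system: that msconfig's Number of processors option corresponds to the `numproc` boot setting that `bcdedit` can set and delete (verify with `bcdedit /enum`), and that the WID service name is `MSSQL$MICROSOFT##WID`. Note that `sc start` passes `-P2` only to that one start of the service and fails if the service is already running, so stop it first.

```python
import subprocess
import sys
from typing import List

# Assumed service name of the Windows Internal Database (check with: sc query).
WID_SERVICE = "MSSQL$MICROSOFT##WID"


def limit_processors_cmd(count: int) -> List[str]:
    """Step 2: cap the logical processors Windows uses at the next boot."""
    if count < 1:
        raise ValueError("processor count must be at least 1")
    return ["bcdedit", "/set", "{current}", "numproc", str(count)]


def remove_limit_cmd() -> List[str]:
    """Step 5: delete the numproc setting so Windows sees every CPU again."""
    return ["bcdedit", "/deletevalue", "{current}", "numproc"]


def start_wid_cmd(max_cpus: int = 2) -> List[str]:
    """Start the WID once with the -P startup parameter from step 4."""
    return ["sc", "start", WID_SERVICE, f"-P{max_cpus}"]


def run(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command, or execute it when dry_run is False on Windows."""
    if dry_run or sys.platform != "win32":
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)  # needs an elevated prompt


# One entry per step; reboots between steps are still manual.
STEPS = {
    "limit": lambda: limit_processors_cmd(16),
    "start-wid": lambda: start_wid_cmd(2),
    "unlimit": remove_limit_cmd,
}

if __name__ == "__main__":
    args = sys.argv[1:]
    if not args or args[0] not in STEPS:
        print("usage: wid_steps.py {limit|start-wid|unlimit} [--execute]")
    else:
        run(STEPS[args[0]](), dry_run="--execute" not in args)
```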


I would like to thank my colleague Curt for the WID startup parameter; it is far easier than installing SQL Server Management Studio Express. I hope you found this article informative. If you have anything to add or any suggestions, please leave them in the comments below.
