We had a meeting today to discuss what we should do before escalating a ticket to Microsoft. There’s obviously a lot of troubleshooting and due diligence that Tier 2, Tier 3, and Engineering should do before an escalation, but I wanted to put together an escalation checklist that goes beyond it. Kudos to Microsoft Support and Microsoft PFEs.
- Review the Service Pack and Cumulative Update level – You know one of the first things Microsoft support will want to know is what version and patch level you are at. If you’re way back, they are going to ask you to upgrade. At a minimum you should be on the latest service pack, which addresses the majority of bugs they will point to. Understanding that there are different tolerances for patching, this is something you will need to decide. My recommendation is normally that you don’t install a CU unless you need it. But when you’re dealing with what you think is a bug, there’s a chance it’s fixed in a CU rollup or a more recent CU. I’m not saying you HAVE to install the latest CU before you call Microsoft. Just know your service pack, hotfix, patching, and CU level and be able to defend it. It’s also an opportunity to make sure that any outstanding OS "Windows Update" style patches are installed as well. (Why is JSON from the list web service not working? Well, it has something to do with the fact that we’re missing ADO.NET SP1.) Make sure your house is in order before you make the call.
- Reboot / Recycle – If you’re ready to escalate an issue, is it possible it’s client cache? Of course not, you’ve tried it out on multiple browsers and on multiple machines. Well, what about server cache? One of our most recent escalations was resolved by cycling a service on an app server. It was a bit embarrassing, but it taught me an important Microsoft lesson. Do a rolling reboot, or at a minimum cycle IIS on all your servers, prior to calling MS.
- Eliminate third-party add-ons as the issue – One of the more common things that Microsoft will do is tell you to turn off your antivirus. You’d be surprised how many expensive tickets are resolved by finding that the antivirus installed on the server is messing with SharePoint or SQL in some way. You’ll also want to make sure it isn’t that Bamboo web part or CodePlex solution. Microsoft won’t want to hear about issues you’re having that relate back to something you installed from someone else.
- Engineers Escalate / Partner / Awareness – It was a little embarrassing when we found out one of our users was on the call with Microsoft troubleshooting a workflow issue and engineering was getting looped in asking for access to the databases. Ops, Engineering, and the architects should all get a chance to troubleshoot and isolate the issue. They’ll also want to give the nod to make the call. Even if tickets from MS were free, you’re still going to want to make sure everyone has had the opportunity to figure out the issue. The most embarrassing escalation of all time was one that involved 3 days of troubleshooting and nearly 36 continuous hours on the phone to find out that the server was missing the Fab 40. I could have told him that was what was going on… that’s what preupgradecheck or Test-SPContentDatabase would have told us.
- Isolate the issue – You may not have the answer, but any good troubleshooting will narrow down the issue as far as possible. Is the issue with one front end? Well, then maybe you can take it out of load balancing and run some WinDiff comparisons on the GAC. Maybe the issue is in a specific site collection, or only on a certain list. Have you tried exporting the contents to another site collection? Believe me, corruption and orphaning can and DOES happen in SharePoint, but often an import/export will leave the corruption behind.
- Code Issue – When you make that call, they are going to check whether you’ve done your homework, work out what is in your software stack, and slowly narrow the issue down. They’ll start very broad. Sometimes how broad they go will drive you crazy, but it is part of the process to make sure everything is taken into consideration. Do you have Project Server and TFS installed in your SharePoint farm as well? It matters, believe me. Now say we’re looking at the issue and it looks like it might be in the code, and that’s where the strange error is coming from. Do you have a correlation ID? Hopefully you’ve already gone down that path to investigate those errors. The key is also to eliminate your own code. Not only will third-party issues get closed down as you talk to Microsoft, but if you want them to troubleshoot how your code interacts with their APIs, you will likely need to talk to a totally different group, so keep that in mind as you ask for help. There’s a big difference between break/fix and saying something isn’t working the way you expect when you build against their API.
- Reach out to the Community (Twitter and/or Newsgroups) – Searching for the error message or for a solution is already so common that I’m not even going to suggest you haven’t searched for an answer, but have you reached out to the community? I’m not saying this is the end state by any means, but where are you getting your list of known issues and known bugs, and how do you know whether a fix is in that latest CU when it isn’t written down in black and white? The importance of community is HUGE, so don’t overlook the power of it. I still get Facebook messages from people bouncing ideas off me to see if I know the answer to their issue. Many will post an explanation of the issue in the Microsoft Newsgroups and then reach out to the SharePoint community on Twitter to ask people to look at it. I’m sure many of us are not bothered by helping out others, especially since we’ve stubbed our toe in exactly the same way. In fact, in early SP 2010 it was amazing just how many people had the same issues trying to configure the SharePoint User Profile service. It’s amazing how many conferences I’ve been at where someone raises their hand and says they spent 3 days and still couldn’t figure it out. Obviously we’d point that person to the purple blog, and Spence would get obvious kudos, but we’d also say, hey, we’re in this together, so reach out. Don’t waste 3 days or 2 weeks when we’ve been there before.
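
On the correlation ID point from the Code Issue item: before you call, it helps to have every log line for that ID pulled together in one place. Here is a minimal sketch in Python of how that could look. It assumes your logs are plain text and that each relevant line contains the correlation ID as a GUID somewhere in it. The function name and the sample lines are made up for illustration; this is not a SharePoint API or the real log layout.

```python
import re
from typing import Iterable, List

# Standard 8-4-4-4-12 hexadecimal GUID shape.
GUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def lines_for_correlation_id(lines: Iterable[str], correlation_id: str) -> List[str]:
    """Return the lines that mention the given correlation ID.

    Hypothetical helper: it only does a case-insensitive substring match,
    so it makes no assumption about which column the ID sits in.
    """
    # Accept IDs pasted with braces or in upper case, e.g. "{1F2E...}".
    cid = correlation_id.strip().strip("{}").lower()
    if not GUID_RE.fullmatch(cid):
        raise ValueError(f"not a GUID-shaped correlation ID: {correlation_id!r}")
    return [line.rstrip("\n") for line in lines if cid in line.lower()]


if __name__ == "__main__":
    # Made-up sample lines, tab separated purely for readability.
    sample = [
        "10:01:02\tw3wp.exe\t1f2e3d4c-0000-1111-2222-333344445555\tUnexpected error\n",
        "10:01:03\tw3wp.exe\taaaaaaaa-0000-1111-2222-333344445555\tSomething else\n",
    ]
    for hit in lines_for_correlation_id(sample, "{1F2E3D4C-0000-1111-2222-333344445555}"):
        print(hit)
```

To run it against a real file you would open the log and pass the file object in as `lines`. A plain substring match is deliberately dumb, but it means you don’t have to know the exact layout of the log to collect what the support engineer is going to ask for.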