Got this question from a reader…
Q. Is Windows NLB always an option? Or when does it become wise to look into a hardware solution?
Deciding to use NLB or Not…
A. I find NLB a very scalable and simple solution on intranets. I would rarely recommend NLB for an extranet or Internet solution. Why? Not so much because of scalability, which ultimately ends up being more of a network or NIC bottleneck than a software bottleneck in its load balancing algorithm. The DIP and VIP solution that NLB provides is pretty simple. The intelligence inside of NLB is quite stupid, actually. I’m really not a fan of NLB as a pure high availability solution because of its lack of understanding of web technologies. There are some enhancements to NLB in Windows Server 2008, but not the awareness features I would have liked to have seen around web service detection. What I mean by this is that NLB will continue to send users to a server that is up even when its web service is hung, slow, or degraded in any number of ways. Its intelligence barely goes beyond asking whether the IP is up or not.
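To make that gap concrete, here is a minimal Python sketch of the kind of web-aware check NLB does not do for you. This is not anything NLB ships with: the function names, the 2-second "slow" threshold, and the verdict labels are all my own assumptions for illustration.

```python
import http.client
import time
import urllib.error
import urllib.request


def classify(status_code, elapsed_seconds, slow_after=2.0):
    """Turn one HTTP probe result into a health verdict.

    status_code is None when no HTTP response came back at all.
    The 2-second threshold is an assumed value, not a recommendation.
    """
    if status_code is None:
        return "down"
    if status_code >= 500:
        return "failing"
    if elapsed_seconds > slow_after:
        return "slow"
    return "healthy"


def probe(url, timeout=5.0):
    """Request url once and return (status_code or None, elapsed_seconds)."""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code  # the server answered, but with an error status
    except (OSError, http.client.HTTPException):
        status = None  # refused, timed out, unreachable, or a garbled reply
    return status, time.monotonic() - started
```

You would call it as classify(*probe("http://web1.example.local/")), where the host name is made up. A server that answers ping but returns 503, or takes ten seconds to answer, comes back as "failing" or "slow" here, while an IP-level check would still call it up.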
What about NLB vs. DNS round robin?
NLB is almost as bad as DNS round robin, which will simply send users back and forth between two systems. Where NLB is better is in the flexibility of who manages it. If your server is hung but you can still ping it, NLB and DNS round robin will both continue to send users to that system. But if you’re doing maintenance on a system, with NLB you can simply drainstop connections and monitor usage with perfmon for the W3SVC service. With DNS RR you’d have to go into DNS and make the change to pull it out of rotation, and remember to add the entry back in. Most of us wouldn’t have access to both DNS and SharePoint. So the flexibility of NLB actually living on our boxes makes it easier and more manageable. Think of NLB as more of a manageability and load balancing solution than a high availability solution, which is likely a tough stretch for most people.
How is hardware load balancing better or worse?
NLB is cheap. Hey, you’ve already paid for the server and the Windows software, and NLB is included. If you decide you want a hardware load balancing solution, there are a ton of options out there, and you’ll find the cost takes quite a jump depending on the solution. With a hardware-based solution you should look at all the knobs and the bells and whistles. Now you get various layers of security filtering, you get caching and compression, and you get a real high availability solution with very detailed detection based on HTTP error codes or a half dozen other mechanisms that can be used to detect failures or slow responses. Again, NLB doesn’t give you added security, caching, or compression, and we’ve already discussed that it is pretty stupid with HTTP. Originally it wasn’t designed for HTTP. It was designed for load balancing various applications and never went to the level of trying to understand the services that it load balances. It’s in the wrong network layer to help us with application intelligence, so I would suggest configuring that kind of detection in a monitoring solution like MOM or System Center Operations Manager. NLB has always expected applications to understand it, and hoped that applications would use its APIs to stop and start services.
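The practical difference is that a health-aware balancer takes a sick server out of rotation on its own. Here is a rough Python sketch of that idea, assuming the health verdicts come from some probe you already run. The server names and the pick_servers function are hypothetical, and real devices do far more than this.

```python
from itertools import cycle


def pick_servers(servers, is_healthy, requests):
    """Assign `requests` requests round-robin, skipping unhealthy servers.

    servers    -- ordered list of server names (hypothetical)
    is_healthy -- dict of name -> bool, as a health probe might report
    """
    # Servers with no recorded verdict are treated as unhealthy.
    eligible = [name for name in servers if is_healthy.get(name, False)]
    if not eligible:
        raise RuntimeError("no healthy servers left in the pool")
    rotation = cycle(eligible)
    return [next(rotation) for _ in range(requests)]


health = {"web1": True, "web2": False, "web3": True}
print(pick_servers(["web1", "web2", "web3"], health, 4))
# prints ['web1', 'web3', 'web1', 'web3']
```

Here web2 answers ping but failed its HTTP check, so it gets no traffic. A blind rotation would keep sending it every third user.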
With hardware load balancing you can control the level of intelligence. You don’t have to look far for credible solutions. Most of your network equipment vendors have solutions for you… F5 and Cisco are the most popular, but even Microsoft’s IAG and ISA should be considered when you’re looking for something in the more intelligent space that provides a real high availability solution. (While the MS solutions are not hardware or firmware based, you should look at offloading these to separate hardware.) Most people wouldn’t even dedicate the hardware to the SharePoint solution alone; often these types of devices are already serving the Internet or extranet space, and quite possibly the intranet portal. When you purchase a solution, make sure you’re buying redundancy. If you purchase one device, you’re back to a single point of failure.
Q. Is it simply a matter of sessions and activity, or would utilizing Excel Services, Workflow, BDC, etc. become a breaking point for NLB?
A. I have heard of stress tests of NLB vs. hardware LB solutions, but haven’t been impressed. Every day I see NLB serving very large internal applications and doing just fine. Like I said, it’s usually the NIC or the network that’s the real bottleneck. Which services are running on the server doesn’t really matter; it’s all just traffic, and NLB isn’t cracking open the packets anyway. The decision to go with one or the other should come from security requirements or true high availability requirements, but understand the cost delta. It is quite possible to get 99.9% availability with NLB, but your monitoring team needs to understand NLB, needs to be on 24/7, and needs to monitor all the DIPs (dedicated IPs) and VIPs (virtual IPs) in your farms. Not just the IPs, but the HTTP services requested from the DIPs and VIPs.
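To put numbers on that, here is a small Python sketch of what 99.9% leaves you as a downtime budget, and of the full list of HTTP endpoints the monitoring team would have to watch. The addresses are made up, and the two helper functions are mine for illustration, not part of any monitoring product.

```python
def downtime_budget_hours(availability_percent, period_hours=365 * 24):
    """Hours of downtime allowed per period at a given availability."""
    return period_hours * (1 - availability_percent / 100)


def monitoring_targets(vips, dips, path="/"):
    """List every HTTP URL to probe: each VIP first, then each DIP."""
    return [f"http://{address}{path}" for address in list(vips) + list(dips)]


# Hypothetical farm: one cluster VIP in front of two web front ends.
print(round(downtime_budget_hours(99.9), 2))
# prints 8.76
print(monitoring_targets(["10.0.0.10"], ["10.0.0.11", "10.0.0.12"]))
# prints ['http://10.0.0.10/', 'http://10.0.0.11/', 'http://10.0.0.12/']
```

That is under nine hours a year, so one hung web service that nobody notices over a weekend can use up the whole budget. Probing the DIPs as well as the VIP is what tells you which individual box is the sick one.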
Oh, and by the way, SharePoint the application doesn’t care which one you use. That’s up to you.
I’ve done some previous posts on the topic of configuring and troubleshooting NLB.