Some time ago I had a revelation while flying from NYC to Buffalo in a Soviet-style propeller plane. It came in the form of sheer fear. This is a quick flight, and I was en route to a big content provider’s brand new data center that gets its power feed from Niagara Falls. About 20 minutes into the flight we entered a dark storm. The other 10 or so passengers were all pinned to the windows, looking at a gnarly black cumulus-type cloud starting to surround us, even though it was well before 7am. Bammm! The first of many drops as the turbulence heated up. I was catching at least 2 to 3 inches of air off my seat as we wobbled and dropped into this weather pattern. Not looking good!
The only thing I could do was take my mind off something I had no control over. Obviously I could not ask the pilot to turn around. We had about 45 minutes to go and had to ride it out. And then the lights went out!!
I was sure at this point that we were going to fall right out of the sky and this was the end of the line for me. I thought about my family and about my friends waiting for me at the data center. How much worse could it get? And it got worse. The passenger to my left got sick, and I mean sick. I guess the MickeyD big breakfast had not settled well and the ups and downs of the ride got the better of the ol’ chap. Whatever pressure system we were riding on did not agree with a Mc-egg and sausage sandwich.
I was trying to convince myself that this was as bad as it was going to get and that the only thing left was for it to get better. This meant coming up with scenarios in my head such as “what if the pilot was blind?” What? The pilot is blind?
I bet you would never fly in a plane where the pilot was actually blind. In the same way, you should never operate large-scale applications blind. So what exactly do you need visibility into?
I have broken down the stack and recommended some basic things below:
In general, businesses based on web-centric services have three layers:
- Compute resources: this encompasses network, storage, disk space, I/O, memory, CPU and other pieces that map easily to hardware. Any of the out-of-the-box SNMP solutions should work. Depending on your appetite for initial capex and your ability to run these systems, there are numerous services and packages available to you. Take a look at http://www.slac.stanford.edu/xorg/nmtf/nmtf-tools.html . I am going to pick just three out of the long list of available packages:
o Nagios – been around for a while. Open source and free. Easy to set up but will require some development and upkeep. Having implemented this numerous times, I can tell you to take the “what goes in = what comes out” approach to this platform. It comes from the old-school stack of NetSaint, with a previous lead from SATAN (no longer around). There are multiple MIBs available from the community for this stack.
o NetFlow: more of a network monitor, but extendable to layer 4 on systems as well. Simple, but will also need development.
o NetIQ: $$$ but it comes as a service. Enough said.
- Service stack: these are the bits of software glued together to make things run. If on a .NET stack, then IIS services, db calls, and the .NET-specific service architecture pieces and how they interconnect. If on an open source or other non-.NET stack, then (depending on the technology used under the hood) the actual code files, the build processes, the Apache/lighttpd services and whatnot, all mapped and interrelated. Here are a few good tools:
o SolarWinds Orion: you can’t go wrong. It has a self-discovery module with lots of bells and whistles. In terms of dollars and cents, it starts at around $7K but is well worth it. It will map out the service layer pretty thoroughly and is able to suppress unwanted red flags as needed. It does require deployment time, either as a service or standalone.
o HP OpenView: this is probably the best stack available, with all kinds of modules. However, it is super expensive and more geared towards large production environments. It will also require significant deployment resources as well as maintenance.
o Nexvu: great service engine analyzer. Easy to deploy but $$$.
- Business: this will directly depend on the business flow. There is no out-of-the-box solution. In general you can take some simple BI tools and base your KPIs on them in a way that is trackable. A great, reasonably priced tool called QlikView is one of the best choices I have seen. Others include Pentaho. Pentaho is great, free and open source; however, it will require some serious dev time to get it to a usable state.
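To give a feel for the "development and upkeep" that a tool like Nagios asks for at the compute layer, here is a minimal sketch of a disk-usage check in Python. It follows the Nagios plugin convention of signalling state through the exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The function names and the 80%/90% thresholds are my own assumptions for illustration, not something shipped with Nagios, so tune them to your environment.

```python
import shutil

# Nagios plugin convention: the process exit code carries the state.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(used_pct, warn_pct=80.0, crit_pct=90.0):
    """Map a usage percentage to a state. Thresholds are assumed defaults."""
    if used_pct >= crit_pct:
        return CRITICAL
    if used_pct >= warn_pct:
        return WARNING
    return OK


def check_disk(path="/"):
    """Return (state, message) for the filesystem that holds `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # Path missing or unreadable: we cannot say anything about the disk.
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero capacity"
    used_pct = 100.0 * usage.used / usage.total
    state = classify(used_pct)
    return state, f"DISK {STATE_NAMES[state]} - {used_pct:.1f}% used on {path}"


def run(path="/"):
    """Print the one-line status and return the exit code for the caller."""
    state, message = check_disk(path)
    print(message)
    return state
```

To use it as an actual plugin you would wrap it in a small script that ends with sys.exit(run("/")), so the monitoring system can read the state from the exit code and the one-line message from standard output. The same threshold idea carries over to the other two layers: swap disk usage for a service response time or a business KPI.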
Now how do you glue the three layers to get a holistic view from the resource layer all the way to biz layer? Another blog for another time.