Enterprise Monitoring vs. Enterprise Automating -- Bryan Ignatow
Posted by Rick Pandolfi on Mon, Jan 31, 2011 @ 11:52 AM

For service providers to effectively manage customer churn, they must optimize service continuity; in the face of degradation, interruption, or change, they must respond quickly and efficiently. Gone are the days of hiring stadiums full of people to watch and react to the behavior of their network, applications, and services. We no longer see dozens of monitoring operators jump out of their bunker at the first sign of trouble and pummel it into submission. Those were the old days of monitoring.
Many Operators have been replaced by Engineers. The Engineers play a more strategic role than legacy Operators did. Engineers are expected to continuously improve the infrastructure by rapidly delivering innovative solutions to an increasingly discerning and competitive marketplace, while maintaining operational responsibility. With a domain that now spans building out the network, maintaining it, and stewarding changes, all while responding to real-time events, eating and sleeping are becoming optional. How do we press this evolution forward, ensuring its sustainability and increasing its economic viability?
Agile best practices and the automated EMS
Network engineers might need to crib some notes from the software engineer’s playbook. By this I mean they need to take their own intellectual property (the experience, tools, and scripts that make them effective problem solvers) and package it into reusable, reproducible objects: into automations.
If they do this, our intrepid Network Engineers will be their own best friends, while answering leadership’s and the market’s call to do more with less. Real-time issues will be resolved more rapidly and customer churn from service disruption and degradation will be greatly reduced.
I recently saw this scenario play out at a customer who has a substantial volume of network interfaces serving remote retail branches. Their circuits began exhibiting issues: high utilization, dropped traffic, congestion, etc. The branches began reporting poor response times for business-critical applications.
In the past, the scenario would have played out this way: bandwidth monitoring software and packet capture technologies would provide the raw data for analysis. In the best case, the issue would be detected and, if they were really fortunate, the time, experience, and opportunity to design and deploy a solution would follow. Some very favorable conditions are required for this to end happily and efficiently: operators need to be at their posts, available to capture the offending traffic in real time, and not occupied with other issues or responsibilities. If any of these conditions is absent, the issue goes undetected or unsolved, an opportunity is missed, and a population of unhappy customers is born.
IT Service Management: Doing More with Less through Automation
Our customer has greatly reduced the risk of spawning event-generated unhappy customers through automation. They detected the issue when the router interface exceeded 75% utilization. At that threshold the EMS instructed an in-line packet capture probe to automatically start grabbing the offending traffic and begin recording other diagnostic information. An alert was automatically issued identifying the incident and suggesting where additional diagnostic evidence could be found.
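For readers who think in code, here is a minimal Python sketch of the kind of threshold-triggered automation described above. It is not the customer's actual EMS configuration: the names InterfaceSample, ThresholdAutomation, start_capture, and send_alert are all hypothetical, and the capture probe and alerting system are assumed to be reachable through simple callables.

```python
from dataclasses import dataclass, field
from typing import Callable, Set

UTILIZATION_THRESHOLD = 0.75  # the 75% trigger from the scenario above


@dataclass
class InterfaceSample:
    interface: str       # hypothetical interface name
    utilization: float   # fraction of link capacity in use, 0.0 to 1.0


@dataclass
class ThresholdAutomation:
    # Both callables are assumptions standing in for a real probe and alerting API.
    start_capture: Callable[[str], str]   # starts a capture, returns where evidence is stored
    send_alert: Callable[[str], None]     # delivers the alert text
    threshold: float = UTILIZATION_THRESHOLD
    active: Set[str] = field(default_factory=set)  # interfaces with an open incident

    def handle(self, sample: InterfaceSample) -> bool:
        """Process one sample; return True if a new incident was opened."""
        if sample.utilization <= self.threshold:
            # Back under the threshold: allow a future breach to trigger again.
            self.active.discard(sample.interface)
            return False
        if sample.interface in self.active:
            # Already capturing for this interface; do not alert twice.
            return False
        self.active.add(sample.interface)
        location = self.start_capture(sample.interface)
        self.send_alert(
            f"{sample.interface} at {sample.utilization:.0%} utilization; "
            f"diagnostic capture available at {location}"
        )
        return True


if __name__ == "__main__":
    automation = ThresholdAutomation(
        start_capture=lambda name: f"probe://captures/{name}",  # stub probe
        send_alert=print,                                       # stub alert channel
    )
    for value in (0.40, 0.82, 0.91, 0.55):
        automation.handle(InterfaceSample("branch-wan0", value))
```

The point of the design is that the engineer's judgment (what threshold matters, what evidence to gather, where to point the responder) is written down once and then runs every time, whether or not anyone is watching.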
This wider and more rationally calibrated net eliminated the need for a happy accident of live, real-time detection. The automations also relieved the responding engineers of time-consuming, rudimentary troubleshooting. The problem was halfway to a solution before human intervention was even mobilized or required.
Which scenario would your NOC prefer to live through?
Enterprise Systems Monitoring vs. Enterprise Systems Automation
With infrastructures growing and staff not, the only way to do more with less is automation. Automation begins with capturing the processes, procedures, and domain-specific knowledge used every day, and it ends with a library of routines that enrich the intelligence and value of the infrastructure over time. This model and these practices can be extended into many other areas, such as databases, servers, and web applications, to name just a few.
What are you doing: Monitoring or Automating?
Bryan Ignatow