
Where we're coming from....


Using Logmatrix Nervecenter for Runbook Automation

  
  
  
  
  
  
  

It has been a while since I was around Nervecenter.  I "knew of" the product because I used to compete against it years ago.  Back then it could not connect to databases, it held all of its states in memory (Perl hashes), and it was a bit complicated to use.  The real reason it suffered was the question, "What do you use it for?"

Well, not only has Logmatrix given this product a tremendous upgrade (it uses databases to hold states, exposes the true power of all Perl modules, and is still the most scalable transaction based product in the industry for my money), but I find this product to be indispensable for automating your environment.

Here are the reasons I think every company that wants to automate should consider this product:

1)  Nothing compares to this product when it comes to executing multiple code steps simultaneously.  Parallel operations are a must in large environments with thousands of objects that need to be "touched".

2)  In automating a procedure, users often must consider the before and after states before taking a next step.  Nervecenter does this better than any product out there.

3)  Nervecenter can use Perl to report on the before and after of each step.  It can notify anyone at any time of the status of a step.

4)  Nervecenter can "change gears" in the middle of an automation.  If a process is being automated and in the middle of all of the steps it receives a trap, it can stop doing what it is doing and go do something else.  I cannot think of a product that does that.
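To make points 2 and 4 concrete, here is a minimal Perl sketch of the general pattern: check the state before and after each step, and stop if something more urgent arrives. This is not Nervecenter code and does not use its API; the function name run_runbook, the step fields and the callbacks are all hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# Run a list of steps. Each step is a hash with (hypothetical) fields:
#   name   - label used in status reports
#   before - code ref; must return true before the step may start
#   action - code ref; does the work
#   after  - code ref; must return true for the step to count as done
# $interrupted is polled between steps (think "did a trap arrive?").
# $report receives a status message for every outcome.
sub run_runbook {
    my ($steps, $interrupted, $report) = @_;
    for my $step (@$steps) {
        if ($interrupted->()) {
            $report->("interrupted before '$step->{name}'");
            return 'interrupted';
        }
        unless ($step->{before}->()) {
            $report->("precondition failed for '$step->{name}'");
            return 'failed';
        }
        $step->{action}->();
        unless ($step->{after}->()) {
            $report->("postcondition failed for '$step->{name}'");
            return 'failed';
        }
        $report->("completed '$step->{name}'");
    }
    return 'done';
}

# Usage: two steps acting on an in-memory "device" record.
my %device = (backed_up => 0, upgraded => 0);
my @log;
my $status = run_runbook(
    [
        { name   => 'backup',
          before => sub { 1 },
          action => sub { $device{backed_up} = 1 },
          after  => sub { $device{backed_up} } },
        { name   => 'upgrade',
          before => sub { $device{backed_up} },
          action => sub { $device{upgraded} = 1 },
          after  => sub { $device{upgraded} } },
    ],
    sub { 0 },                   # no trap received in this run
    sub { push @log, $_[0] },    # collect status messages
);
print "$status\n";
print "$_\n" for @log;
```

Returning a status string instead of dying lets the caller decide what "go do something else" means when a run is interrupted.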

Overall, Nervecenter is pretty much the perfect engine if you are looking to automate.


EMS: The Heat and Light of Service Management

  
  
  
  
  
  
  

IT Service Management and Monitoring

The Heat and Light of Service Management

During this, the week of IBM’s Pulse gathering, I’ve had the great fortune of speaking with some of the great thinkers in the space of Service Management.  Last night was no exception; I had the pleasure of sharing a meal with two of the leading thinkers in the space, who have asked to remain nameless for the purpose of this post.

Last night the point was made that for Service Management software vendors, including the big 4 (IBM, HP, CA, BMC) and the insurgents (Monolith, ScienceLogic, Nimsoft) there are three potential markets to sell into:

  • Operations:  NOC Ops: real-time fault, availability management;
  • Engineering:  Ensuring emerging/expanding products (SP’s:  DSL, Fiber to the home, IPTV, etc.; Enterprise:  trading applications, customer portals, etc.) are planned for, tested, hardened through capacity management, dynamic provisioning, etc;
  • Customer Experience Management:  Real time, end-to-end and vertical visibility into Layers 2-7 focused on customer application service experience.

The consensus is that the Operations Market is saturated beyond anyone’s tolerance level to revisit.   Perhaps that is an overstatement, but the insurgents are well aware that their superior technology and “point” value proposition will more often than not be thwarted or overwhelmed by the big 4’s ability to make seemingly interesting economic cases through ELA’s and other compensatory accommodations.

The Engineering Market holds some interest for the ISV’s as they see the opportunity to distinguish themselves across a complex and diverse set of business processes including Inventory, Provisioning, Billing, et al., where customers will truly put a premium on openness, flexibility and scalability. 

But the cherry of these markets for the ISV’s is the Customer Experience Management market and they are innovating to stake their claim to it.  The two characteristics of this market that they find intriguing: 

  • LOB centric:  the ability to perform and achieve in the Customer Experience domain has a straight-line translation into important and obvious KPI’s like customer conversion, customer churn, customer upsell/cross sell rates, etc.  There’s gold in these hills, and;
  • Innovation Sensitive:  unlike Operations and, to a lesser degree, Engineering, the intellectual property required to pull this off is neither trivial nor ubiquitous, and the ISV's believe that customers will respond when they recognize the payoff of innovations in this area.

To perform in the Customer Experience Management domain requires:

  • The ability to acquire and transform “legacy” types of data:  availability, bandwidth thresholding, disk capacity/availability/usage so that they present a holistic, coherent representation of real world experience such that they could immediately answer questions like, “How is our online claims processing application performing?”
  • The ability to get at “new” types of data, for example providing full information on end-to-end VoIP performance such that the links between Handset=>RJ45Connector=>Switch=>Router=>MPLS=>VoIPApplicationSvr=>Termination are monitored for link performance, packet loss, etc.

The sense is that there is real heat and light in the Customer Experience Management domain because of the potential for significant business impact and the advances that it will drive in the technology.  For what it’s worth, we’re keeping our eye on Monolith Software.  We believe their unified approach, which organically brings together Historical Event data, IPSLA data, Transaction data, et al into a single database will allow customers to readily transform, consume and present this vast and varied data landscape into meaningful and impactful Customer Experience data that will drive improvements and business performance for the SP’s, ASP’s, MSP’s and Enterprises willing to get out front.

Rick Pandolfi

Many Hats -- Adventures in Enterprise Management Systems (ETL)

  
  
  
  
  
  
  

IT Service Management and Monitoring

Many Hats -- Adventures in Enterprise Systems Management (Extraction, Transformation & Load)

In the wide, wide world of Enterprise Systems Monitoring and Management, ITSM pros find that we get to wear many hats. 

When given the opportunity to get your hands dirty you can muss them up while sporting the Scripting & Coding hat, often you get to wear the Engineer’s hat, and sometimes you get to wear the Systems Administrator’s hat.  For the last few days, I’ve been having fun times donning my DBA & ETL hat. 

The tussle was over the extraction of system data from an Oracle DB to be loaded into MySQL.  The queries I was running were fairly large (several hundred thousand rows) and my script would run fine for several minutes until Oracle threw various errors, saying that it was running out of buffer, that the query took too long to run, etc.  To feel my pain look here.  ORA-01406 fetched column value was truncated.

After several hours of researching the Oracle forums and websites for remedies to this error, I gently removed my sweat soaked DBA hat and replaced it with my faithful scripting hat -- the script in question pulls data from Oracle using DBI and DBD::Oracle (64 Bit….tune in for the next blog on how to get this installed on Linux). 

The script then pushes the Oracle data into MySQL using DBD::mysql.  It is simple enough to do except that it kept timing out.  The error above led me to think that this was an Oracle issue (not enough memory, time out, etc).  It wasn’t -- the problem was doing the push into MySQL while I was in the loop.  Here was my code before I fixed it:

 

#!<path to perl>

<declare variables>

use DBI;
use DBD::Oracle;
use DBD::mysql;

# Tell script where oracle lives
$ENV{'ORACLE_HOME'}="<path to oracle>/instantclient_10_2";
$ENV{'TNS_ADMIN'}="<path to oracle>/instantclient_10_2/network/admin";
$ENV{'LD_LIBRARY_PATH'}=$ENV{'LD_LIBRARY_PATH'}.":".$ENV{'ORACLE_HOME'};
$ENV{'PATH'}=$ENV{'PATH'}.":".$ENV{'ORACLE_HOME'};

# open connections to MySQL and Oracle
my $dbhMYSQL=DBI->connect('dbi:mysql:DB','mysqluser','mysqlpass');
my $dbhORACLE=DBI->connect('dbi:Oracle:Oradb','oracleuser','oraclepass');

my $MyLongQuerySQL = "select …..";

my $MyLongQuery=$dbhORACLE->prepare($MyLongQuerySQL);
$MyLongQuery->execute();

while (my @LongQueryData = $MyLongQuery->fetchrow_array())
{
    # as I go through each Oracle row, insert the row into MySQL
    my $NewInsertSESQL = "insert into MSSwitchEnrichment values ('$LongQueryData[0]','$LongQueryData[1]','$LongQueryData[2]')";
    my $NewInsertSE = $dbhMYSQL->prepare($NewInsertSESQL);
    $NewInsertSE->execute();
}

#<<<<<<<<<<<<END>>>>>>>>>>>>>>>

 

This worked as long as the Oracle query returned only a few thousand rows.  Once the query exceeded several tens of thousands of rows, it would time out.  This puzzled me, because when I ran the Oracle query from the command line it had no problem sending the output to the screen….all 400K+ rows.

So this led me to consider the question: what if I perform the Oracle query all at once and store the results in an array?  I could then loop through the array and insert each one of THESE lines into MySQL.  Result?  Works great!  Here is the code fix below:

 

 

<everything is the same until you get to the first while loop, then>

# Declare the array once, outside the loop, so the rows accumulate in it.
my @LongQueryInfo;

while (my @LongQueryData = $MyLongQuery->fetchrow_array())
{
    # No MySQL work in this loop any more; just keep a copy of each row.
    # Storing the row as an array reference avoids splitting on commas later,
    # which would break on any field that contains a comma.
    push @LongQueryInfo, [ @LongQueryData[0..2] ];
}

#  Done with the Oracle part here, all Oracle rows are loaded into the array.
# NOW we load the MySQL database from the array

# Prepare the insert once; the placeholders let DBI handle the quoting.
my $NewInsert = $dbhMYSQL->prepare("insert into MySQLTable (Field1, Field2, Field3) values (?, ?, ?)");

foreach my $LongQueryEntry (@LongQueryInfo)
{
    $NewInsert->execute(@$LongQueryEntry);
}

#<<<<<<<<<<<<<<<END OF FIX>>>>>>>>>>>>>>>>>>>>>>>>>
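A further option, which is not part of the original script, is to send several rows per INSERT statement so that MySQL sees fewer round trips. The following is a minimal sketch under stated assumptions: the helper names build_batch_insert and batches are hypothetical, the table and column names are the placeholders used above, and the sample rows are made up. It only builds and prints the statements; the commented-out line shows where a live MySQL handle would run them.

```perl
use strict;
use warnings;

# Build one INSERT statement with $row_count groups of placeholders.
sub build_batch_insert {
    my ($table, $columns, $row_count) = @_;
    my $one_row = '(' . join(', ', ('?') x @$columns) . ')';
    return sprintf('insert into %s (%s) values %s',
        $table,
        join(', ', @$columns),
        join(', ', ($one_row) x $row_count));
}

# Split a list of row array-refs into batches of at most $size rows.
sub batches {
    my ($rows, $size) = @_;
    my @copy = @$rows;          # work on a copy so the caller's list survives
    my @out;
    push @out, [ splice(@copy, 0, $size) ] while @copy;
    return @out;
}

# Made-up sample rows standing in for the rows fetched from Oracle.
my @rows = map { [ "sw$_", "site$_", "10.0.0.$_" ] } 1 .. 5;

for my $batch (batches(\@rows, 2)) {
    my $sql  = build_batch_insert('MySQLTable', [qw(Field1 Field2 Field3)], scalar @$batch);
    my @bind = map { @$_ } @$batch;    # flatten the rows into one bind list
    # With a live handle this would be: $dbhMYSQL->do($sql, undef, @bind);
    printf "%d rows, %d bind values: %s\n", scalar @$batch, scalar @bind, $sql;
}
```

The batch size is a tuning knob; very large statements can run into the server's packet size limit, so it is worth testing a few values.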

 

The script has many other parts to it and it is formatted better than what you see above, but the crux of the problem and solution are outlined above.  I hope this is helpful.  To get more details about this or share stories in Enterprise Systems Management, please reach me here at chris@mkadvantage.com

Christopher Schaft

Enterprise Management Systems: Compete or Compensate?

  
  
  
  
  
  
  

IT Service Management and Monitoring

Enterprise Management Systems:  Compete or Compensate?

In my last post I may have taken some liberty by linking Customer experience of interrupted or degraded service with existential angst.  Perhaps that was a reach.    

Bad experience is only one factor in the customer satisfaction equation.  Other factors of course are:  their perception of other options, their perception of the effort it takes to find and move to other options, their perception of value (cost/experience), their perception of the relative import of the service in question, and so on.

So, really there may be two other important things to realize/consider:

          ● Service Quality is contextual in that it is in relation to other market resident options – Competitive;

          ● Service Quality is contextual in that it is in relation to the other links you may have to your customer – Compensatory.

If these observations are true, then perhaps addressing these issues is a question of balance.  And to make balanced decisions, business leaders need to better understand their current state, their optional states, their ideal states, and the roadmaps leading beyond the status quo.

It’s not possible in this blog to speak to specific current state realities.  That would require the due diligence of an Assessment.  However, we can certainly look at some prevailing statistics and do a gut check on where we believe our enterprises sit relative to them.   On average:

                ● 6 out of 10 Network outages are caused by manual error;

                ● Each device in the enterprise has 30 configuration errors;

                ● Almost 50% of your IT engineering resources are spent setting up the erroneous configurations or fixing them after they break.             

As I always ask my partner, “Do you want the good news or the bad news?”  I’ll give you both:  we can treat these statistics as a Competitive reality, which can be exploited if we choose to rise above the mean.   Or we can stay under the bell curve and choose to compete in other ways (compensatorily).

 I’ll close today by offering one final statistic that may suggest you want to compete on this one, rather than compensate:  Investing (wisely, that is to say competently) in Enterprise Management Systems automation has shown a 625% Return on Investment rate.  So, it’s not only good Ops, it’s good business.
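For readers who want to sanity-check a figure like that, here is a small Perl sketch of the arithmetic, assuming ROI is defined in the usual way as net gain divided by cost. The function names are hypothetical and the 100,000 spend is a made-up number for illustration, not a figure from any study.

```perl
use strict;
use warnings;

# ROI as a percentage: net gain divided by cost.
sub roi_percent {
    my ($total_return, $cost) = @_;
    return ($total_return - $cost) / $cost * 100;
}

# Total return implied by a given cost and ROI percentage.
sub return_for_roi {
    my ($cost, $roi_percent) = @_;
    return $cost * (1 + $roi_percent / 100);
}

my $cost   = 100_000;                      # hypothetical spend
my $return = return_for_roi($cost, 625);   # 625% is the rate quoted above
printf "cost %d, total return %d, ROI %.0f%%\n",
    $cost, $return, roi_percent($return, $cost);
```

Under that definition, a 625% ROI means every dollar spent comes back along with 6.25 dollars of net gain.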

Rick Pandolfi

Enterprise Management Systems, not just for CIO’s anymore

  
  
  
  
  
  
  

IT Service Management and Monitoring

Enterprise Management Systems, not just for CIO’s anymore

Who is responsible for your company’s success?  Everybody?  Nobody?  How finely would you like to split those hairs?

 Is it easier to think about it framed this way:  who stands to gain the most if the company succeeds?  Who stands to lose the most when it fails to deliver quarterly results?  Still not clear?

How about this:  who is responsible for customer acquisition, retention, expanding mind and wallet share?

If you came up with a quick and clear answer, click here

If you were willing to ponder the question more deeply, click here or, in the interest of fair play here.

My point is that the Chief Information Officer is not solely responsible for product design, product delivery, distribution, marketing communications, customer service, sales, or any of the important domains or sub-domains of your enterprise.  I would even go so far as to say she is not solely responsible for the Information of your enterprise.  If you agree with me click here.

I know that everyone in your enterprise would like to see the CIO wring more value out of the IT Organization for less money.  Heck, I would love to see her do that too.  Because she could hire my team to help her do that and we would be doing business together for the remainder of her tenure, but that is a different pitch altogether.

The fact is, CIO’s have been doing a great job across the board in managing to no or low growth budgets, but they’re fighting an increasingly difficult battle.  Enterprise Infrastructures are growing like hot cakes (just seeing if you’re paying attention):

 

                ● Network device counts are growing 200% annually on average;

                ● Network heterogeneity and complexity are at an all-time high, with proliferating operating system versions deployed;

                ● More devices are being deployed in an “unknown” state.

 These pressures and dynamics have an impact in only one significant area – CUSTOMER EXPERIENCE

In my next post I will offer some statistics that will hopefully get CEO, CFO, VP of Sales attention.

 

Rick Pandolfi

Enterprise Management Systems and the Big Win

  
  
  
  
  
  
  

IT Service Management and Monitoring

Enterprise Management Systems and the Big Win

In my last post I relayed some statistics that are inescapably compelling even if they are only half way accurate: 

● More than 60% of annual IT cost is IT Staff.  The next highest cost contributor at a rate at or above 25% is System Downtime.  Combined these two items total over 85% of annual IT costs.  Let me emphasize that:  IT Staff costs and System Downtime account for 85% of IT costs.

● Combine this with the fact that neither Network Automation nor Fault Automation cracks the 25% threshold.

This means we have tremendous headroom to reduce IT Staff costs through automation – and we can start with the simple basics.  And in so doing we can substantially accelerate Root Cause Analysis (RCA) and compress Mean Time To Repair (MTTR).

Structuring a compelling Business Case

Even with these compelling facts and straightforward logic, many IT shops still can’t get themselves moving fast enough in the right direction.  This is typically a result of failure to present a compelling business case.  So, I offer the following structure for consideration:

Identify Opportunities for Improvement:  There are so many.  Let’s choose Customer Churn Rate.  I always recommend a dynamic easily linked to top line performance.  I consider cost reductions secondary to inspiring KPI’s like market share, attrition rate, revenue by (sector, geo, product).

Contextualize your choice:  In other words, align the Opportunity for Improvement to an accepted and well understood corporate goal, like Market share, Revenue from the Customer Base or Cost of Revenue.

Identify Potential Projects:  We hate projects with IT only impact.  In our example, link Customer Churn Rate to hard metrics, like MTTR, Downtime Rates, System Capacity and Availability statistics, etc.

From here it’s a matter of assembling some readily available statistics and correlated facts.  This need not be overly labored.  The fact is that the greenfields of Enterprise Management Systems (EMS) are largely behind us.   Almost everyone has something nominally serviceable.  The gap is in the quality of the deployment.  And the quality of an EMS deployment can be materially improved for less than six figures – with demonstrably enriching results. 

Let’s remember our base line statistics:  85% of our IT costs are related to people (who are typically doing a large proportion of work that could be automated, while being distracted from work that only they can do) and Downtime; which is a function of misdirected resources and under leveraged automation.

This is neither Rocket Science nor Brain Surgery.  This is a brief exercise in connecting the dots and lining them up for a big and easily attainable win.

Rick Pandolfi

EMS Implementation Lessons Learned: Database Performance

  
  
  
  
  
  
  

IT Service Management and Monitoring

EMS Implementation Lessons Learned:  Database Performance

My customers run the gamut in terms of their ITSM maturity, their business objectives and their technology platforms.  But there are some lessons learned that I believe are universal or near universal.  When I find those, I will post them here.

Recently, during a project to stand up a production Enterprise Management System solution, my team encountered some interesting challenges and machine behavior.  The EMS platform we were working with included the Monolith Software Event Manager Suite with a back end database of MySQL 5.1.36 from Oracle/Sun.   The operating system was AS5 Linux from Red Hat.

In conjunction with the customer’s Architecture team, we sized the servers to account for a large volume of SQL inserts and updates.  However, very quickly we encountered unexpected resource consumption issues when the applications spun up any significant insert/update activity -- off the charts load average, CPU and memory usage. 

Troubleshooting included a fair amount of trial and error.  Finally, we came upon a combination of settings in my.cnf that worked.  We learned some valuable lessons that may help on a MySQL standalone server.

5 Settings with Performance Impact

● Setting 1:  max_connections:   By default, max_connections is 150 for a huge MySQL server.  We boosted ours to 500 and found that applications such as the Trapd Aggregator and Syslog Aggregator performed much better.  Try this out in a test environment.

● Setting 2:  innodb_additional_mem_pool_size:  By default, this is 3M.  We put ours at 6M and it demonstrated significant improvement.

● Setting 3:  innodb_log_buffer_size:  We put ours at 64MB to account for large log files.

● Setting 4:  innodb_buffer_pool_size:  This setting seemed to be a key to our success.  We set ours at 16384M.  By default, it is SUPER low (1024K) for the machine we were using (our machines had 40GB RAM).  After we changed this setting MySQL had a far larger amount of memory allocated to it than before.  Before setting this to 16384M, we would do a “top” and MySQL was using less than 1GB of RAM.  After resetting it, MySQL used 15GB of RAM.  This may not sound like a good thing, but in our experience it doesn’t cause any problems.  MySQL allocates the memory, but doesn’t use it unless needed.

● Setting 5:  innodb_thread_concurrency:  By default, this is 8 for a huge configuration.  We put the count to 16 and saw much better CPU usage results on “top”.
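Pulled together, the five settings above would look roughly like this in the [mysqld] section of my.cnf. Treat it as a sketch rather than the exact file we deployed: the values are the ones quoted above, and the buffer pool is written in megabytes on the assumption that this is the unit that matches the 15GB of RAM MySQL ended up using.

```ini
[mysqld]
# Setting 1: allow more simultaneous client connections
max_connections                 = 500

# Setting 2: memory for InnoDB's internal data structures
innodb_additional_mem_pool_size = 6M

# Setting 3: buffer for InnoDB log writes
innodb_log_buffer_size          = 64M

# Setting 4: main InnoDB cache (unit assumed to be megabytes, see note above)
innodb_buffer_pool_size         = 16384M

# Setting 5: number of threads allowed inside InnoDB at once
innodb_thread_concurrency       = 16
```

MySQL needs a restart to pick up these values, so try them in a test environment first.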

After making these changes, we experienced substantial improvement in MySQL performance.  For anyone with a very large dedicated server for MySQL trying to get the most out of their hardware, check out the entire my.cnf file at http://www.mkadvantage.com/apps-and-tools.  Download the file called MySQL my.cnf SUPER HUGE Example.  And best of luck.

Christopher Schaft

Crunching the ROI numbers on Enterprise Systems Management automation

  
  
  
  
  
  
  

IT Service Management and Monitoring

Crunching the ROI numbers on Enterprise Management System automation

“Lies, damn lies, and statistics.”  Mark Twain credited Benjamin Disraeli with this fine coinage.  But the research doesn’t bear this out.  Wikipedia lists several possible alternative originators, including:  Sir Charles Wentworth Dilke, Walter Bagehot, Arthur James Balfour and Leonard H. Courtney, but credits Twain with its popularization in 1906.  So, let’s go with that because Twain was funny and American and I’ve never heard of the other fellows.  I do intend to cite some of these aforementioned hyper lies which I would invite you to test with your gut and personal experience:

                ● More than 60% of annual IT cost is IT Staff.  Of the remaining items that register, none accounts for more than 25% and, tellingly, that one is Downtime.  Rounding out the findings are single digit laggards:  Training, Software and Hardware.

                ● I can’t even help my 11 year old son with his math homework anymore, but I can calculate that IT costs are centered on Staff and Downtime to the tune of more than 85%.

                ● Meanwhile, the degree of Network Automation and Event Automation fails to crack the 25% threshold.

If I were a CFO

…and let’s all be very glad I’m not.  But if I were, I’d be challenging our CIO about his or her efforts to drive down these numbers, especially if I knew that automation is devilishly cheap and has an off the charts Return on Investment (ROI).

Tomorrow, less lies and some guidance on the construction of a simple and illustrative ROI model to justify more IT Services Management automation.  I won’t tell your CFO if you won’t, that you can start enjoying tremendous ROI for less than $50K and 30 days.  But in the immortal words of #23, "Just Do IT".

Rick Pandolfi

Spending on EMS? “Sometimes nothing can be a real cool hand.”

  
  
  
  
  
  
  

IT Service Management and Monitoring

Spending on Enterprise Monitoring?  “Sometimes nothing can be a real cool hand.” 

In Paul Newman’s off handed and iconic take, “sometimes nothing is a real cool hand.” Of course Cool Hand Luke doesn’t always come out on top.  So, I’m both respectful and skeptical of Nothing – having it or doing it. 

In my last post I pointed out that enterprise software sales professionals have to deal with the “Do Nothing” option whenever they’re plying their craft.  It’s the most common and powerful competitor, and it’s chosen far and away more often than Brand X or Brand Y.  You can hear the note of disdain when sales guys throw it around in weekly forecast calls, “Quarter end is coming up and they may just decide to ‘Do Nothing’.”  Of course what they really mean is that the prospect is going to do something else.  Meaning they’re not going to buy something for this particular project at this particular time.  They may buy something for a different project or they may choose to advance this project through other means.  But to a sales guy hawking products, it looks, tastes and feels like Do Nothing.

Today CIO’s and VPIT’s would be well served to consider Doing Nothing or Doing Something Else when they are presented with another big ticket ITSM platform spend.    But don’t just take Cool Hand Luke’s word for it.  Doing Nothing has a long and illustrious history.  The Buddha made a virtue of doing nothing.  Teenagers and couch potatoes, a life style.  Seemingly overmatched by history and gender, Queen Elizabeth I famously enjoyed one of the longest and most popular reigns of any English sovereign by living by the motto video et taceo.

The benefits and options today’s IT Leaders have:

                ● Buy Nothing – Rent:  look to the Cloud, where there are strong ITSM/ITIL solutions offered in the SaaS model, including:  Nimsoft and Service-now.

                ● Buy “small” (which is closer to nothing than huge):  look to the all-in-one ITSM appliance market.  There are some very slick, easy to deploy and hyper cost effective solutions coming out in this area.  Consider ScienceLogic as a leader in this area.

                ● Do Buy Nothing one better:  Don’t Rip and Replace.  Just rip.  Don’t replace.  Be judicious -- chances are high that your Tools inventory is already too large, too costly, and underperforming.

                ● Automate, automate, automate.  Many Service Providers and Enterprises already own enough ITSM software to alert and alarm an army of Jeff Spicoli’s (no mean task).  But they haven’t channeled them into extended automated machine commands and branch logic.  While your machines are working overtime your Engineers can enjoy some Do Nothing time, or they can be working through that development punch list sitting on their desk.

Is it better to Do Nothing and Look Busy or Look Busy and Do Nothing?  More on this imponderable later.  And don’t forget to Automate.

Rick Pandolfi

Enterprise Monitoring vs. Enterprise Automating -- Bryan Ignatow

  
  
  
  
  
  
  

IT Service Management and Monitoring

Enterprise Monitoring vs. Enterprise Automating

For service providers to effectively manage customer churn they must optimize service continuity; in the face of degradation, interruption, or change they must manage speed and efficiency.  Gone are the days of hiring stadiums full of people to watch and react to the behavior of their network, applications, and services.  We no longer see dozens of monitoring operators jump out of their bunker at the first sign of trouble pummeling it into submission.  Those were the old days of monitoring.

Many Operators have been replaced by Engineers.  The Engineers play a more strategic role than legacy Operators did.  Engineers are expected to continuously improve the infrastructure by rapidly delivering innovative solutions to an increasingly discerning and competitive marketplace, while maintaining operational responsibility.  Given the broader domain of building out the network, maintaining the network, stewarding changes, all while responding to real-time events -- eating and sleeping are becoming optional.  How do we press this evolution forward, ensuring its sustainability and increasing economic viability?

Agile best practices and the automated EMS

Network engineers might need to crib some notes from the Software engineer’s playbook.  By this I mean they need to take their own intellectual property:  the experience, tools, and scripts that make them effective problem solvers, and package them into re-usable, re-producible objects… into automations. 

If they do this, our intrepid Network Engineers will be their own best friends, while answering leadership’s and the market’s call to do more with less.  Real-time issues will be resolved more rapidly and customer churn from service disruption and degradation will be greatly reduced.

I recently saw this scenario play out at a customer that has a substantial volume of network interfaces serving remote retail branches.  Their circuits began exhibiting issues: high utilization, dropped traffic, congestion, etc.  The branches began reporting poor response time for business critical applications. 

In the past the scenario would have played out this way:  bandwidth monitoring software and packet capture technologies would provide the raw data for analysis.  In the best case, the issue would be detected and, if they were really fortunate, the time, experience and opportunity to design and deploy a solution would follow.  Some very favorable conditions are required for this to end happily and efficiently:  operators need to be on post and available to capture the offending traffic in real time and not otherwise occupied with other issues or responsibilities.  If any of these conditions were absent,  the issue goes undetected or otherwise unsolved, creating a missed opportunity, and a population of unhappy customers is born.

IT Service Management: Doing More with Less through Automation

Our customer has greatly reduced the risk of spawning event generated unhappy customers – through automation.  They detected the issue when the router interface exceeded 75% utilization.  At that threshold the EMS instructed an in-line packet capture probe to automatically start grabbing the offending traffic and begin recording other diagnostic information. An alert was automatically issued identifying the incident and suggesting where additional diagnostic evidence could be found. 

This wider and more rationally calibrated net eliminated the need for a happy accident of live, real-time detection.  The automations also relieved the responding engineers of time consuming rudimentary troubleshooting.  The problem was halfway to a solution before human intervention was even mobilized or required.
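The logic behind that automation is small enough to sketch in a few lines of Perl. This is an illustration of the pattern only, not the customer's implementation or any EMS product's API: the function name check_interface, the start_capture and send_alert hooks, the interface name and the sample values are all hypothetical.

```perl
use strict;
use warnings;

my $THRESHOLD = 75;    # percent utilization, as in the scenario above

# Evaluate one utilization sample for an interface. The automation fires
# once when the threshold is crossed and re-arms when utilization drops back.
# $actions holds code refs standing in for the probe and the alerting system.
sub check_interface {
    my ($state, $interface, $utilization, $actions) = @_;
    if ($utilization > $THRESHOLD) {
        return 0 if $state->{$interface};    # capture already running
        $state->{$interface} = 1;
        $actions->{start_capture}->($interface);
        $actions->{send_alert}->(
            "$interface at $utilization% utilization; capture started");
        return 1;
    }
    $state->{$interface} = 0;                # back under threshold: re-arm
    return 0;
}

# Usage: feed a series of made-up samples and record what would have fired.
my (%state, @events);
my %actions = (
    start_capture => sub { push @events, "capture:$_[0]" },
    send_alert    => sub { push @events, "alert:$_[0]" },
);
check_interface(\%state, 'Serial0/0', $_, \%actions) for (40, 80, 90, 50, 78);
print "$_\n" for @events;
```

Keeping a per-interface flag stops the probe and the alert from firing on every sample while the circuit stays congested.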

Which scenario would your NOC prefer to live through?

Enterprise Systems Monitoring vs. Enterprise Systems Automation

With infrastructures growing and staff not, the only way to do more with less is automation.  Automation begins with capturing the processes, procedures, and domain specific knowledge used every day, and it ends up with a library of routines that enrich the intelligence and value of the infrastructure over time.  This model and these practices can be extended into many other areas such as databases, servers, web applications to name just a few.

What are you doing: Monitoring or Automating?

Bryan Ignatow
