Beyond Building Networks: The Day Two Automation Challenge

Aug 1

Summarizing NetBrain's operational automation approach from AutoCon3

"We talk a lot about the build side, automating how you build your environment," began the NetBrain presenter, "but when you onboard it and hand it over to the run team—in my experience, it was always a nightmare."

His candid admission about the operational side of network automation struck a chord with the audience, many of whom raised their hands when asked who is responsible for "keeping the lights on" for their business.

The 3 AM Reality Check

The presenter painted a scenario that will resonate with anyone in network operations:

"Phone call at three o'clock in the morning. You've lost an application. You know there was a change at one o'clock. The on-call team is trying to get out of bed, find somebody to help troubleshoot. The team call starts spinning up, people tapping away in CLI sessions, trying to find a needle in a haystack. It's like putting together a thousand-piece jigsaw puzzle without having the box."

That was his world for twenty years in IT outsourcing for a major global telco—"essentially saying sorry and writing service credits. It was horrible."

But contrast that with the NetBrain-enabled response: "You still get that phone call at three o'clock, but the team is already fixing it. We know what the problem was. NetBrain has given me a map of the crime scene, taken all the noise away, we know the service path, and we've run automated triage. Within sixty seconds we had an idea where this problem was."

The Problem Statement Everyone Recognizes

His problem statement from his telco days should sound familiar to many operations teams:

Not enough people with the right skills
Faults took too long to fix
Every change required "crossing fingers" due to unknown downstream effects
People always blamed the network first

This led him to the market in search of solutions. He eventually discovered NetBrain and joined the company after seeing how it addressed these operational pain points.

The Data-Driven Revelation

NetBrain's analysis of customer ITSM tickets revealed two critical insights across multiple industries:

50% of incidents were avoidable: Fat-finger errors, change-related issues, and configuration drift. In these cases nothing had actually failed, and proactive monitoring could have prevented the incident.

95% of real incidents were repeats: Teams knew how to fix them from previous tickets and understood the triage steps and root causes. Yet every time an incident repeated, they went through the same manual, CLI-based troubleshooting process instead of reusing what they had already learned.

"We haven't changed the way we manage networks in well over 30 years," he observed. "This traditional approach had to change."

The Two-Pillar Approach

NetBrain's solution builds on its 20-year foundation of network discovery and digital twin creation, adding two operational automation pillars:

Pillar 1: Proactive Prevention

Look for the avoidable issues (the 50% identified above) before they become problems:

Telnet still enabled on devices
Unnecessary open ports
Backup router configurations that don't match primary (causing inelegant failovers)
Configuration drift detection

Customers using this approach see "50% improvement in change-related incidents in the first year."
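To make the checks listed above concrete, here is a minimal Python sketch of two of them: flagging Telnet and detecting configuration drift. It is not NetBrain's implementation. The function names, the IOS-like `transport input` syntax, and the sample configs are all assumptions made for illustration.

```python
import difflib
import re


def find_telnet_enabled(config: str) -> list[str]:
    """Return 'transport input' lines that permit Telnet (IOS-like syntax assumed)."""
    hits = []
    for line in config.splitlines():
        stripped = line.strip()
        if re.match(r"transport input\b", stripped):
            protocols = stripped.split()[2:]
            if "telnet" in protocols or "all" in protocols:
                hits.append(stripped)
    return hits


def config_drift(reference: str, actual: str) -> list[str]:
    """Return lines removed from ('-') or added to ('+') the reference config."""
    diff = list(difflib.unified_diff(
        reference.splitlines(), actual.splitlines(),
        fromfile="reference", tofile="actual", lineterm="", n=0,
    ))
    # Skip the two file-header lines and the '@@' hunk markers.
    return [line for line in diff[2:] if not line.startswith("@@")]


# Hypothetical sample data: a golden config and what is running on the device.
golden = "hostname edge-1\nline vty 0 4\n transport input ssh"
running = "hostname edge-1\nline vty 0 4\n transport input telnet ssh"

print(find_telnet_enabled(running))
print(config_drift(golden, running))
```

The same `config_drift` helper could compare a primary router's configuration against its backup, which is the mismatch case in the list above. Real router pairs legitimately differ in hostnames and addresses, so a practical check would need to normalize those first.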

Pillar 2: Automated Incident Response

For the repeat incidents teams already know how to fix, automate the standard triage steps:

ITSM ticket triggers NetBrain
Creates a "map of the crime scene" with noise filtered out
Runs through a standard checklist of industry best practices
Either eliminates usual suspects (confirming it's something new) or identifies the likely cause
Automatically feeds information back to resolver groups
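The workflow above can be sketched as a small checklist runner. This is only an illustration of the pattern, not NetBrain's API: the `TriageResult` type, the two checks, the shape of the `state` dictionary, and the 180-minute change window are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A check inspects collected network state and returns a finding if it
# spots a known cause, or None if that suspect is eliminated.
Check = Callable[[dict], Optional[str]]


@dataclass
class TriageResult:
    ticket_id: str
    findings: list[str]
    eliminated: list[str]

    def summary(self) -> str:
        """Text to feed back to the resolver group via the ITSM ticket."""
        if self.findings:
            return f"{self.ticket_id}: likely cause(s): " + "; ".join(self.findings)
        suspects = ", ".join(self.eliminated)
        return f"{self.ticket_id}: usual suspects eliminated ({suspects}); likely a new issue"


def check_interface_down(state: dict) -> Optional[str]:
    down = [name for name, up in state.get("interfaces", {}).items() if not up]
    return f"interface(s) down: {', '.join(sorted(down))}" if down else None


def check_recent_change(state: dict) -> Optional[str]:
    minutes = state.get("minutes_since_last_change")
    # Assumed threshold: treat changes within the last 3 hours as suspicious.
    if minutes is not None and minutes <= 180:
        return f"config change {minutes} min before incident"
    return None


CHECKLIST: dict[str, Check] = {
    "interface status": check_interface_down,
    "recent change": check_recent_change,
}


def run_triage(ticket_id: str, state: dict,
               checklist: Optional[dict[str, Check]] = None) -> TriageResult:
    """Run every check; record findings and the suspects that were ruled out."""
    checklist = CHECKLIST if checklist is None else checklist
    findings, eliminated = [], []
    for name, check in checklist.items():
        finding = check(state)
        if finding is None:
            eliminated.append(name)
        else:
            findings.append(finding)
    return TriageResult(ticket_id, findings, eliminated)


# Hypothetical 3 AM incident: a change was made two hours earlier.
state = {"interfaces": {"Gi0/1": True, "Gi0/2": False}, "minutes_since_last_change": 120}
print(run_triage("INC-1", state).summary())
```

Keeping each check as a plain function means the knowledge from a previous ticket can be captured by adding one entry to the checklist, which is the "repeat incident" idea in miniature.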

Real-World Results

The presenter shared compelling customer outcomes:

Large Service Provider: 75% MTTR reduction through automated incident response workflows that create service path maps and run automated checks within the first 60 seconds of an outage.

Global Media Company: 30% improvement in network operations efficiency, freeing people from menial tasks for project work. Teams start each day with dashboards showing the most critical services, allowing them to prioritize effectively.

Compliance Automation: Over 90% reduction in audit overhead through automated compliance checking that verifies changes meet architectural standards and identifies any unintended path impacts.

Service-Centric Operations

A key theme was shifting from device-centric to service-centric network management. Rather than asking "what devices are affected?" the approach focuses on "what business services are impacted?"

"Having a dashboard that starts your day that the business can recognize—these are the things that dictate my share price, my reputation in the market. Everything's green, or if it's not green, it's red and I know why."

This stops the cycle of treating everything as a P1 incident before anyone has determined the actual business impact.

Integration, Not Replacement

NetBrain positions itself as augmenting existing automation tools rather than replacing them. The presenter showed integration examples with Ansible and other automation platforms, emphasizing that NetBrain provides the operational intelligence layer that triggers and informs other automation systems.

The Day Two Challenge

What makes this presentation valuable is its focus on the often-overlooked operational side of network automation. While much of the industry discussion centers on building and deploying network configurations, the daily reality of keeping services running presents different challenges:

Context awareness: Understanding service paths and dependencies during incidents
Proactive monitoring: Catching configuration drift and potential issues before they cause outages
Institutional knowledge: Capturing and automating the tribal knowledge of how to fix recurring problems
Business alignment: Connecting network health to business service impact

The Stockholm Syndrome Effect

The presenter's personal journey—from accepting 3 AM troubleshooting calls as "normal" to recognizing there had to be a better way—reflects a broader industry challenge. Many network operations teams have normalized reactive, manual processes simply because "that's how it's always been done."

His transformation story suggests that operational automation tools like NetBrain can break this cycle by providing:

Immediate incident context through automated mapping
Proactive detection of preventable issues
Automated execution of known troubleshooting procedures
Continuous monitoring to prevent repeat incidents

Why This Matters

The presentation highlights a critical gap in many automation strategies: the operational handoff. Organizations may successfully automate network builds and deployments but struggle with day-to-day operations, change management, and incident response.

NetBrain's approach suggests that operational automation requires different tools and thinking than deployment automation—focusing on service visibility, proactive monitoring, and institutional knowledge capture rather than just configuration management.

For teams responsible for keeping networks running, the promise of a 75% MTTR reduction, and of preventing the roughly 50% of incidents that are avoidable, represents a significant quality-of-life improvement and not just a gain in operational efficiency.

As the presenter concluded, "Lift those senior engineers and get them doing things that are really important to the business" rather than spending their nights troubleshooting repeat incidents that could be automated away.

Watch the full presentation: Accelerating Sponsor: NetBrain

Chris Grundemann

Executive advisor. Specializing in network infrastructure strategy and how to leverage your network to the greatest possible business advantage through technological and cultural transformation.

https://www.khadgaconsulting.com/