The Black Box Problem: Why Trust is the Missing Piece in Network Automation

Aug 1

Summarizing Damien Garros's principles-to-practice approach from AutoCon3

"What if I give you a black box with a red button on it and ask you to press this button?" posed Damien Garros in his closing presentation. "I tell you it's going to fix everything. Just trust me. How many of you are going to press that button?"

The answer, predictably, was nobody. Yet as Garros pointed out, this is exactly what automation builders often create—black boxes that work perfectly for their creators but inspire zero confidence in their users. His presentation tackled the fundamental question Jason Edelman posed: What would it take to build trust in automation?

Trust as the Adoption Barrier

"Why haven't we seen full adoption of network automation yet?" Garros asked. "I truly believe that trust is actually one of the main issues." With gray hair earned from years of project successes and failures, he's witnessed too many automation initiatives fail not from technical inadequacy, but from user distrust.

The problem is perspective. For the builder, that "black box" isn't black at all—they know every component, every function, every possible outcome. But users see only opacity and risk. Building something that works isn't enough; building something others will trust requires fundamentally different thinking.

The Car Analogy Extended

Building on Jason Edelman's Tesla example, Garros offered his own automotive metaphor. Asked to build a car that goes from point A to point B, most engineers would create something functional but barely trustworthy—no doors, questionable brakes, missing lights. It works, but would you put your family in it?

A truly trustworthy car requires human-friendly design, safety features, and reliability that goes far beyond basic functionality. "The amount of difference in work between building something that I can use versus something that others will trust to use" represents the gap most automation projects fail to bridge.

The Six Principles of Trust

Garros defined six core principles for building trustworthy automation:

Predictable: Consistent, expected behavior every time
Manageable: Easy to control, modify, and maintain
Transparent: Users understand what will happen before it happens
Simple to Use: No 50-page documentation requirements
Reliable: Works consistently without surprises
Human-Friendly: Designed for human operators, not just technical correctness

These principles form the foundation for automation that inspires confidence rather than fear.

Three Essential Design Principles

1. Idempotency: The DHCP Parallel

"Idempotency is like DHCP," Garros explained. "Without idempotency, every time a laptop connects, it gets a different IP address—chaos. With idempotency, the laptop doesn't think about addressing; it always gets the same IP."

This simple concept is "super powerful but really, really hard to implement." It requires moving complexity from clients (users) to servers (automation systems), managing state effectively, and ensuring identical operations produce identical results.
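Here is a minimal Python sketch of the idea. It assumes a hypothetical switch whose VLANs are modeled as an in-memory dict, not a real device API. The function ensure_vlans compares the desired state with the current state and changes only what differs, so a second run with the same input is a no-op.

```python
from typing import Dict, List

# Hypothetical device state: VLAN ID -> VLAN name (stands in for a real device API).
DeviceState = Dict[int, str]


def ensure_vlans(device: DeviceState, desired: DeviceState) -> List[str]:
    """Converge `device` toward `desired` and return the changes actually made.

    Idempotent: once the device matches `desired`, further calls change nothing.
    """
    changes: List[str] = []
    for vlan_id, name in sorted(desired.items()):
        if device.get(vlan_id) != name:
            device[vlan_id] = name
            changes.append(f"set vlan {vlan_id} name {name}")
    # sorted() builds a list first, so deleting while looping is safe.
    for vlan_id in sorted(set(device) - set(desired)):
        del device[vlan_id]
        changes.append(f"delete vlan {vlan_id}")
    return changes


if __name__ == "__main__":
    switch = {1: "default", 99: "legacy"}
    wanted = {1: "default", 10: "users", 20: "voice"}
    print(ensure_vlans(switch, wanted))  # first run: three changes
    print(ensure_vlans(switch, wanted))  # second run: []
```

The caller only states which VLANs should exist. The function takes on the work of reading state and computing the difference, which is the client-to-server shift of complexity described above.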

2. Dry Runs: The Green Button Solution

Dry runs transform the black box problem by adding a "green button" alongside the red one. Press green to see exactly what red will do, then decide whether to proceed.

"Just with that, most of you will be okay to press the red button. It completely changes the game."

Tools like Terraform gained adoption partly because dry runs were part of the default workflow from day one. Ansible, by contrast, treated them as an optional flag (--check) that users often ignored.
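One way to build the green button is a plan/apply split. The sketch below is a hypothetical illustration, not how Terraform or Ansible implement it. plan computes the changes without side effects. apply_plan then executes exactly the reviewed plan and refuses to run if the device drifted after review. The Config and Plan types and both function names are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Config = Dict[str, str]  # hypothetical flat "setting -> value" device config
Change = Tuple[str, Optional[str], Optional[str]]  # (setting, old value, new value)


@dataclass(frozen=True)
class Plan:
    """Green-button output: the exact changes, computed without side effects."""
    baseline: Tuple[Tuple[str, str], ...]  # snapshot of the config the plan was built from
    changes: Tuple[Change, ...]


def plan(current: Config, desired: Config) -> Plan:
    changes = tuple(
        (key, current.get(key), desired.get(key))
        for key in sorted(set(current) | set(desired))
        if current.get(key) != desired.get(key)
    )
    return Plan(baseline=tuple(sorted(current.items())), changes=changes)


def apply_plan(current: Config, reviewed: Plan) -> None:
    """Red button: execute exactly the reviewed plan, or refuse if the device drifted."""
    if tuple(sorted(current.items())) != reviewed.baseline:
        raise RuntimeError("config changed since the plan was reviewed; run plan again")
    for key, _old, new in reviewed.changes:
        if new is None:
            current.pop(key, None)  # setting absent from desired state: remove it
        else:
            current[key] = new


if __name__ == "__main__":
    device = {"hostname": "leaf1", "ntp": "10.0.0.1"}
    p = plan(device, {"hostname": "leaf1", "ntp": "10.0.0.2", "syslog": "10.0.0.9"})
    for key, old, new in p.changes:
        print(f"{key}: {old} -> {new}")  # the operator reviews this first
    apply_plan(device, p)
```

Binding apply to a reviewed plan is the design choice that matters here. The red button can only do what the green button showed.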

3. Transactional Operations: All or Nothing

When automation involves multiple changes, partial failures create worse problems than complete failures. Transactional operations ensure that if anything fails, everything rolls back to the starting state.
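Many network operating systems offer native candidate-configuration commits with rollback, and those are preferable where available. For targets without that support, the sketch below approximates all-or-nothing behavior on hypothetical in-memory devices. Every step records how to undo itself, and any failure replays those undo actions in reverse order. The helper names and the post-change check are assumptions.

```python
from typing import Callable, Dict, List, Tuple

Device = Dict[str, str]  # hypothetical "setting -> value" config of one device


class ChangeFailed(Exception):
    """Raised when a step fails its post-change check."""


def _undo_for(device: Device, key: str) -> Callable[[], object]:
    """Capture how to restore `key` on `device` to its current state."""
    if key in device:
        old = device[key]
        return lambda: device.__setitem__(key, old)
    return lambda: device.pop(key, None)  # key did not exist before: remove it again


def apply_transaction(changes: List[Tuple[Device, str, str]],
                      check: Callable[[Device], bool]) -> None:
    """Apply (device, key, value) steps all-or-nothing across several devices."""
    undo_log: List[Callable[[], object]] = []
    try:
        for device, key, value in changes:
            undo_log.append(_undo_for(device, key))
            device[key] = value
            if not check(device):
                raise ChangeFailed(f"check failed after setting {key}={value}")
    except Exception:
        # Roll back in reverse order so every device returns to its starting state.
        for undo in reversed(undo_log):
            undo()
        raise


if __name__ == "__main__":
    spine, leaf = {"mtu": "1500"}, {"mtu": "1500"}
    try:
        apply_transaction([(spine, "mtu", "9000"), (leaf, "mtu", "9999")],
                          check=lambda d: d["mtu"] in {"1500", "9000"})
    except ChangeFailed as err:
        print("rolled back:", err)
    print(spine, leaf)  # both back to their original MTU
```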

These three principles work synergistically—idempotent systems make dry runs easier to implement, and both concepts simplify transactional behavior.

Declarative vs. Imperative: The Complexity Equation

The choice between declarative and imperative approaches fundamentally impacts trustworthiness. Imperative workflows—sequences of steps mimicking human actions—create exponentially growing complexity as endpoints and steps multiply.

"The more steps you have in an imperative workflow, the higher the complexity will be, and it will probably grow exponentially. The level of trust in that workflow will decrease significantly."

Declarative approaches focus on desired end states rather than step sequences, naturally supporting idempotency and reducing complexity through abstraction. Instead of one massive workflow touching multiple systems, declarative architectures use smaller, focused agents that each handle specific systems idempotently.
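The sketch below illustrates that architecture under simple assumptions. The intent is a dict of sections, and each hypothetical Agent owns one section (one system) and converges it to the declared end state. It does not reflect any specific product. The point is that there is no multi-step workflow to sequence, only independent, idempotent reconcilers.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Declarative intent: the desired end state, grouped by the system that owns it.
# Section names and contents are hypothetical.
Intent = Dict[str, Dict[str, str]]


@dataclass
class Agent:
    """A small, focused agent that owns exactly one system (one intent section)."""
    section: str
    state: Dict[str, str] = field(default_factory=dict)  # stands in for the real system

    def reconcile(self, intent: Intent) -> List[str]:
        desired = intent.get(self.section, {})
        actions = [f"{self.section}: set {k}={v}"
                   for k, v in sorted(desired.items()) if self.state.get(k) != v]
        actions += [f"{self.section}: remove {k}"
                    for k in sorted(set(self.state) - set(desired))]
        self.state = dict(desired)  # converge to the declared end state
        return actions


def reconcile_all(agents: List[Agent], intent: Intent) -> Dict[str, List[str]]:
    # No step ordering to get right: each agent independently converges its own system.
    return {agent.section: agent.reconcile(intent) for agent in agents}


if __name__ == "__main__":
    agents = [Agent("vlans"), Agent("dns", {"old.example.net": "10.0.0.9"})]
    intent = {"vlans": {"10": "users"}, "dns": {"leaf1.example.net": "10.0.0.11"}}
    print(reconcile_all(agents, intent))
    print(reconcile_all(agents, intent))  # second pass: no actions
```

Each agent can be tested and trusted on its own. Adding a system means adding an agent, not lengthening one workflow.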

Version Control: Beyond Developer Tools

"There are really two populations: those who understand version control and those who don't," Garros observed. "It's a big issue because decision makers often don't understand how important a collaboration tool it is."

Version control enables isolated change preparation, safe validation, peer review, and proper integration, all of which are transformative capabilities for network operations. What's exciting is version control expanding beyond text files to sources of truth like NetBox, Nautobot, and Infrahub.
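Branching a source of truth can be sketched in a few lines. This is a toy model and not the API of NetBox, Nautobot, or Infrahub. A hypothetical Branch class holds an isolated copy of the data and exposes a diff for peer review. It merges back only if main has not changed the same objects in the meantime. Validation checks would run against branch.data before the merge.

```python
from typing import Dict, List, Optional, Tuple

Data = Dict[str, str]  # hypothetical source-of-truth objects: "object.attribute" -> value
Edit = Tuple[str, Optional[str], Optional[str]]  # (key, value on base, value on branch)


class Branch:
    """An isolated working copy of the source of truth, in the spirit of a Git branch."""

    def __init__(self, main: Data) -> None:
        self.base = dict(main)  # main as it was when the branch was created
        self.data = dict(main)  # where the change is prepared, invisible to main

    def diff(self) -> List[Edit]:
        """What a reviewer sees: every object edited on this branch."""
        keys = sorted(set(self.base) | set(self.data))
        return [(k, self.base.get(k), self.data.get(k))
                for k in keys if self.base.get(k) != self.data.get(k)]

    def merge_into(self, main: Data) -> None:
        """Integrate the edits, refusing if main changed the same objects meanwhile."""
        edited = [key for key, _, _ in self.diff()]
        conflicts = [k for k in edited if main.get(k) != self.base.get(k)]
        if conflicts:
            raise ValueError(f"merge conflict on {conflicts}")
        for k in edited:
            if k in self.data:
                main[k] = self.data[k]
            else:
                main.pop(k, None)  # object deleted on the branch


if __name__ == "__main__":
    main = {"leaf1.asn": "65001", "leaf2.asn": "65002"}
    change = Branch(main)
    change.data["leaf1.asn"] = "65011"  # prepared in isolation; main is untouched
    print(change.diff())                # reviewed before integration
    change.merge_into(main)
```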

"Every tool we use should have version control compatibility. It's not just about text files anymore."

Testing: Superpower or Kryptonite

Testing can make or break automation projects, but many network engineers approaching automation lack testing experience. The key is balance:

Over-testing early: Kills projects by preventing design iteration
Under-testing later: Kills projects by undermining reliability

"If you cannot build tests easily on your automation, that means you have a design problem. You need to go back to the whiteboard and reduce workflow complexity."

Imperative workflows with many steps and endpoints are inherently harder to test than contained, declarative ones.
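Here is an example of a design that is easy to test. The logic lives in a pure function that derives a device's desired BGP peers from link data, with no device connections or global state. The link schema and function name are hypothetical. The test needs no lab and no mocks, and it runs under pytest or directly.

```python
from typing import Dict, List


def desired_bgp_peers(device: str, links: List[Dict[str, str]]) -> Dict[str, str]:
    """Pure function: derive a device's BGP peers (peer IP -> peer ASN) from link data.

    Input in, desired state out: no device access, so it is trivial to test.
    """
    peers: Dict[str, str] = {}
    for link in links:
        if link["a_device"] == device:
            peers[link["b_ip"]] = link["b_asn"]
        elif link["b_device"] == device:
            peers[link["a_ip"]] = link["a_asn"]
    return peers


def test_desired_bgp_peers() -> None:
    links = [
        {"a_device": "leaf1", "a_ip": "10.1.0.0", "a_asn": "65001",
         "b_device": "spine1", "b_ip": "10.1.0.1", "b_asn": "65100"},
    ]
    assert desired_bgp_peers("leaf1", links) == {"10.1.0.1": "65100"}
    assert desired_bgp_peers("spine1", links) == {"10.1.0.0": "65001"}
    assert desired_bgp_peers("leaf2", links) == {}  # unrelated device: no peers


if __name__ == "__main__":
    test_desired_bgp_peers()
    print("ok")
```

If the same logic were buried inside a workflow that logs into devices, the test would need a lab or extensive mocking. That difficulty is the design signal Garros describes.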

Practical Implementation Tips

Integrate, Don't Rebuild

"There's so much work. You should integrate as much as you can and reserve building to things you cannot find elsewhere." Stop rebuilding orchestration tools for every project—use existing solutions like Temporal, Prefect, or others.

Evaluate Integration Capabilities

When selecting tools, assess not just speed or features, but:

Proper interfaces and APIs
Support for declarative behaviors and idempotency
Integration with testing workflows
Overall user experience

Garros chose Prefect over Temporal not because it was faster, but because its Python-native architecture integrated better with their testing framework.

Classify Your Data

Implement "golden attributes" for everything in your source of truth:

Role: What function does this serve?
Function: What does this do?
Kind: What type of object is this?

Apply these to devices, IP addresses, interfaces, BGP sessions—everything. This classification enables workflow mapping to statuses and roles, building trust through automatic safety constraints.
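Here is a minimal sketch of classification-driven safety constraints, assuming a hypothetical object model and policy table. Every object carries role, function, and kind plus a status. A workflow may act on an object only if its kind, status, and role are all allowed for that workflow, and unknown workflows are denied by default.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class Classified:
    """A source-of-truth object tagged with the golden attributes plus a lifecycle status."""
    name: str
    role: str      # what it serves in the design, e.g. "spine", "leaf", "underlay"
    function: str  # what it does, e.g. "routing", "access", "management"
    kind: str      # object type, e.g. "device", "interface", "bgp_session"
    status: str    # lifecycle, e.g. "planned", "active", "maintenance"


# Hypothetical policy: the kinds, statuses, and roles each workflow may act on.
WORKFLOW_RULES: Dict[str, Dict[str, Set[str]]] = {
    "provision": {"kinds": {"device", "interface"}, "statuses": {"planned"},
                  "roles": {"leaf", "server_port"}},
    "reset_bgp": {"kinds": {"bgp_session"}, "statuses": {"maintenance"},
                  "roles": {"underlay", "overlay"}},
}


def allowed(workflow: str, obj: Classified) -> bool:
    """Safety constraint derived purely from classification; unknown workflows are denied."""
    rules = WORKFLOW_RULES.get(workflow)
    if rules is None:
        return False
    return (obj.kind in rules["kinds"]
            and obj.status in rules["statuses"]
            and obj.role in rules["roles"])


if __name__ == "__main__":
    new_leaf = Classified("leaf7", role="leaf", function="access", kind="device", status="planned")
    live_spine = Classified("spine1", role="spine", function="routing", kind="device", status="active")
    print(allowed("provision", new_leaf), allowed("provision", live_spine))
```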

Safe Defaults

"The day a network engineer forgets to read documentation, they're going to blame me nonetheless, even if I did everything right."

Make default options extremely safe and clear. Create two playbooks: one that shows what changes will happen (safe by default), another that actually implements changes (requiring explicit confirmation).
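The same principle applies outside Ansible. Below is a sketch of a hypothetical command-line change tool built with Python's argparse. By default it only previews. Actually applying requires both an explicit --apply flag and typing the target name back. The push step itself is deliberately left out, and the tool name, flag, and messages are assumptions.

```python
import argparse
from typing import Callable, List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Hypothetical change tool: previews unless told otherwise.")
    parser.add_argument("target", help="device or site to change")
    # Safe default: without --apply the tool only shows what it would do.
    parser.add_argument("--apply", action="store_true",
                        help="actually push changes (asks for confirmation)")
    return parser


def run(argv: List[str], changes: List[str],
        confirm: Callable[[str], str] = input) -> str:
    args = build_parser().parse_args(argv)
    preview = "\n".join(f"would change {args.target}: {c}" for c in changes)
    if not args.apply:
        return "DRY RUN (nothing changed)\n" + preview
    # Explicit confirmation: the operator must type the target name back.
    answer = confirm(f"{preview}\nType '{args.target}' to proceed: ")
    if answer.strip() != args.target:
        return "aborted: confirmation did not match, nothing changed"
    # A real tool would push the changes here; that step is out of scope for this sketch.
    return f"applied {len(changes)} change(s) to {args.target}"


if __name__ == "__main__":
    import sys
    print(run(sys.argv[1:], ["ntp server 10.0.0.2"]))
```

Forgetting a flag or skipping the documentation results in a preview, never a change, which is the failure mode Garros wants.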

The Ecosystem Responsibility

Garros closed with a call for community support: "We're only as strong as the projects we rely on." Supporting open source projects that form automation foundations isn't just good citizenship—it's practical investment in the tools everyone depends on.

Why This Matters

Garros's presentation addresses automation's adoption challenge from a refreshingly honest perspective. Technical capability isn't enough; automation must inspire confidence in its users. This requires deliberate design choices that prioritize user trust alongside functional correctness.

His six trust principles and three design concepts provide a framework for evaluating existing automation and planning future projects. More importantly, they offer shared vocabulary for discussing automation quality beyond "does it work?"

The shift from imperative to declarative thinking, emphasis on version control as collaboration enabler, and balanced approach to testing represent mature automation practices that separate toy projects from production-ready systems.

As the automation community matures, conversations about trust, transparency, and user experience become as important as discussions about APIs and data models. Garros's insights suggest that the next phase of automation adoption depends less on better tools and more on building trustworthy tools—systems that users confidently deploy rather than fearfully avoid.

The black box problem isn't just about technical design; it's about human psychology and organizational dynamics. Solving it requires acknowledging that building automation for others demands fundamentally different approaches than building for ourselves.

Watch the full presentation: Building Trustworthy Network Automation, From Principles to Practice


Chris Grundemann

Executive advisor. Specializing in network infrastructure strategy and how to leverage your network to the greatest possible business advantage through technological and cultural transformation.

https://www.khadgaconsulting.com/