Beyond Scripts: How Temporal Transforms Complex Network Workflows

Summarizing Naveen Achyuta's workflow orchestration insights from AutoCon3

"If you have a Python script that runs a show command or pushes config, don't use Temporal, please," Naveen Achyuta warned his audience. "It's not for that."

But if you're dealing with complex, long-running processes that span multiple systems, require human intervention, or need bulletproof reliability—then Temporal might be exactly what your network automation has been missing.

What Is Temporal?

At its core, Temporal is "just a workflow engine that runs workflows." But this undersells its real value: ensuring workflows are durable and execute reliably, no matter what failures occur.

System crashes? External API failures? Network interruptions? Temporal resumes workflows from exactly where they stopped, maintaining complete state throughout the process.

Unlike tools driven by static configuration, Temporal workflows are written as ordinary code in Python, Go, Java, .NET, or other supported languages, which gives you full flexibility for complex logic. Temporal began as a fork of Uber's Cadence project; it is open source, with a hosted cloud option available.

The Architecture That Enables Resilience

Temporal splits responsibilities between user-hosted processes (your code) and the Temporal service (the orchestration engine):

User Side:

  • Your application triggers workflows via CLI or API

  • Stateless Temporal workers execute your workflow code

Temporal Service:

  • Frontend service routes requests

  • History service maintains workflow state and stores everything in a database

  • Matching service manages task queues for different workflow types

Here's the magic: when you start a workflow, it runs until it hits an activity (external operation) or timer, then pauses. The worker tells the history service "schedule this activity for me" and the workflow state gets saved. Any available worker can execute the activity—workflows aren't tied to specific workers.

If anything fails, nothing is lost: the history service holds the complete event history in the database, and any available worker can replay that history to rebuild the workflow's state and resume exactly where it left off.
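To make the recovery idea concrete, here is a minimal plain-Python sketch of replaying a recorded history. It is not Temporal's API or implementation: History, run_workflow, and WorkerCrash are hypothetical names invented for this illustration, and the "database" is just an in-memory list. The point is only that steps whose results are already recorded get skipped on the second run.

```python
import json
from typing import Callable, List, Optional


class WorkerCrash(Exception):
    """Simulates a worker process dying in the middle of a workflow."""


class History:
    """Toy stand-in for a persisted event history (an append-only list)."""

    def __init__(self) -> None:
        self.events: List[str] = []  # JSON-encoded activity results, in order
        self.executions = 0          # how many activities actually ran


def run_workflow(
    history: History,
    steps: List[Callable[[], str]],
    crash_before: Optional[int] = None,
) -> List[str]:
    """Run steps in order, reusing any results already recorded in history."""
    results = []
    for index, step in enumerate(steps):
        if index < len(history.events):
            # Replay: this step finished in an earlier run, so reuse its result.
            results.append(json.loads(history.events[index]))
            continue
        if crash_before == index:
            raise WorkerCrash(f"worker died before step {index}")
        result = step()
        history.executions += 1
        history.events.append(json.dumps(result))  # record before moving on
        results.append(result)
    return results


def demo() -> History:
    steps = [
        lambda: "device object created",
        lambda: "initial config generated",
        lambda: "config pushed",
    ]
    history = History()
    try:
        run_workflow(history, steps, crash_before=2)  # first worker crashes
    except WorkerCrash:
        pass
    run_workflow(history, steps)  # a second worker resumes from the history
    return history
```

Calling demo() runs the first two steps once, "crashes", and then finishes the third step on the second pass without repeating the first two.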

Network Automation Use Cases

Naveen demonstrated the breadth of possibilities with practical examples:

Device Provisioning with Human-in-the-Loop

A workflow that spans days: create device object → generate initial config → push to TFTP → pause workflow → engineer physically connects device → resume workflow → verify reachability → run checks → update NetBox → push full production config.

The key insight: the workflow can pause for two days while an engineer handles physical tasks, then automatically resume when ready.
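The pause-and-resume shape can be sketched with nothing more than asyncio. This is an illustration of the control flow, not Temporal code: the step names come from the example above, while provision, demo, and the device_connected event are assumptions made for this sketch. An in-memory asyncio.Event would not survive a process restart; making that wait durable is what Temporal adds, through its signal mechanism.

```python
import asyncio
from typing import List, Tuple


async def provision(log: List[str], device_connected: asyncio.Event) -> None:
    """Run the provisioning steps, pausing until an engineer signals."""
    log.append("create device object")
    log.append("generate initial config")
    log.append("push config to TFTP")
    log.append("waiting for engineer")
    await device_connected.wait()  # could be minutes or days
    log.append("verify reachability")
    log.append("run checks")
    log.append("update NetBox")
    log.append("push production config")


async def demo() -> Tuple[List[str], List[str]]:
    log: List[str] = []
    device_connected = asyncio.Event()  # created inside the running loop
    task = asyncio.create_task(provision(log, device_connected))
    await asyncio.sleep(0)              # let the workflow run until it blocks
    paused_at = list(log)               # snapshot while the workflow is paused
    device_connected.set()              # the engineer says "device is cabled"
    await task
    return paused_at, log
```

Running asyncio.run(demo()) returns the log as it stood while paused and the full log after the signal arrived.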

Config Management with Automatic Rollback

Pre-checks → push config → post-checks → if post-checks fail → run rollback activity. Since it's pure code, conditional logic is straightforward. This implements the Saga pattern: execute operations and automatically compensate for failures.
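As a rough sketch of that flow in plain Python (no Temporal involved), the function below registers an undo step for each change and runs the undo steps in reverse order if the post-checks fail. The device is modeled as a dict, and change_config, pre_check, and post_check are hypothetical names chosen for this example.

```python
from typing import Callable, Dict, List

Device = Dict[str, str]


def change_config(
    device: Device,
    new_config: str,
    pre_check: Callable[[Device], bool],
    post_check: Callable[[Device], bool],
) -> List[str]:
    """Apply new_config, rolling back if the post-checks fail."""
    journal: List[str] = []
    compensations: List[Callable[[], None]] = []

    if not pre_check(device):
        journal.append("pre-checks failed, nothing changed")
        return journal
    journal.append("pre-checks passed")

    previous = device["config"]
    device["config"] = new_config
    # Register how to undo this step before moving on.
    compensations.append(lambda: device.update(config=previous))
    journal.append("config pushed")

    if post_check(device):
        journal.append("post-checks passed")
        return journal

    # Compensate in reverse order of the steps that succeeded.
    for undo in reversed(compensations):
        undo()
    journal.append("post-checks failed, rolled back")
    return journal
```

With a post_check that returns False, the device ends up with its original config and the journal ends in the rollback entry.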

Event-Driven Automation

Prometheus alert for link flap → get interface data → create ticket → check if safe to drain → drain link → update ticket. Many alerts firing at once is not a problem: each one starts its own workflow, and Temporal is built to run millions of them concurrently.
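A small asyncio sketch shows the one-workflow-per-alert shape, including the "safe to drain" branch. Everything here is invented for illustration: the HEALTHY_UPLINKS inventory, the ticket format, and the rule that a link is safe to drain only when more than one healthy uplink remains. None of it is Temporal or Prometheus API.

```python
import asyncio
from typing import Dict, List

# Hypothetical inventory: healthy uplinks remaining per device.
HEALTHY_UPLINKS: Dict[str, int] = {"leaf1": 3, "leaf2": 1}


async def handle_link_flap(alert: Dict[str, str]) -> Dict[str, str]:
    """Handle one alert end to end and return the resulting ticket."""
    device, interface = alert["device"], alert["interface"]
    ticket = {"id": f"TICKET-{device}-{interface}", "status": "open"}
    await asyncio.sleep(0)  # stands in for real I/O (API calls, device access)
    # Assumed rule: only drain if another healthy uplink would remain.
    if HEALTHY_UPLINKS.get(device, 0) > 1:
        ticket["status"] = "link drained"
    else:
        ticket["status"] = "needs human review"
    return ticket


async def handle_all(alerts: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Run one handler per alert concurrently; results keep the input order."""
    return await asyncio.gather(*(handle_link_flap(a) for a in alerts))
```

Passing two alerts, one for leaf1 and one for leaf2, drains the first link and flags the second for review, with both handlers running concurrently.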

Multi-Team Orchestration

Rack decommissioning involving SRE teams (server management), network teams (switch management), and facilities (power control). Each step talks to different services and teams, but Temporal ensures the entire process completes reliably despite external dependencies.

The AI Integration Opportunity

Naveen tackled AI agents—acknowledging some might "hate this" but explaining why it matters for networking. AI agents fail frequently when querying external systems, but networking requires accurate data. Incomplete data leads to wrong reasoning.

His AI troubleshooting workflow: BGP session down → select tool → gather network logs/metrics → ask LLM "need another tool?" → loop until complete → generate analysis. The reliability guarantees become crucial when AI agents must remediate actual network issues.
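The tool-selection loop is easy to sketch with the LLM replaced by a stub. The tool names, their canned outputs, and fake_llm_next_tool below are all hypothetical placeholders, not anything shown in the talk. In a real workflow each tool call would be an activity, so a failed query would be retried rather than silently leaving the model with incomplete data.

```python
from typing import Callable, Dict, Optional

# Hypothetical tools with canned outputs standing in for real queries.
TOOLS: Dict[str, Callable[[], str]] = {
    "bgp_summary": lambda: "neighbor 10.0.0.2 state Idle",
    "interface_logs": lambda: "Ethernet1 went down at 02:14",
}


def fake_llm_next_tool(evidence: Dict[str, str]) -> Optional[str]:
    """Stub for the 'need another tool?' question: pick any unused tool."""
    for name in TOOLS:
        if name not in evidence:
            return name
    return None  # nothing left to ask for


def troubleshoot(
    choose: Callable[[Dict[str, str]], Optional[str]] = fake_llm_next_tool,
    max_steps: int = 5,
) -> Dict[str, str]:
    """Gather evidence until the chooser is satisfied or max_steps is hit."""
    evidence: Dict[str, str] = {}
    for _ in range(max_steps):
        tool = choose(evidence)
        if tool is None:
            break
        evidence[tool] = TOOLS[tool]()
    return evidence
```

The max_steps cap is a design choice for the sketch: it keeps a confused model from looping forever while still letting it ask for as many tools as it reasonably needs.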

He showed a more advanced pattern: AI agents that not only diagnose problems but trigger remediation workflows. An agent detecting packet loss and high latency could automatically start the link flap remediation workflow, ensuring the fix completes even if intermediate steps fail.

When Temporal Makes Sense

Ideal for:

  • Long-running processes (device provisioning, OS upgrades spanning hours or days)

  • Complex workflows requiring multiple external systems

  • Processes needing human intervention at specific points

  • Scenarios where reliability is critical and failures are expensive

Not ideal for:

  • Simple scripts that run show commands

  • Quick configuration pushes without complex logic

  • Workflows that complete in seconds without external dependencies

Key Benefits for Network Engineers

Focus on Business Logic: Temporal handles infrastructure reliability, retries, state management, and failure recovery automatically.

Built-in Observability: Logging, metrics, tracing, and visualization come out of the box—no custom development needed.

Human Integration: Pause workflows at any point for manual intervention, then resume programmatically.

Cross-Language Support: Write workflows in one language and activities in another (for example, a Go workflow calling Python Netmiko activities).

Low Migration Risk: Since workflows are pure code, you can extract business logic if you decide to move away from Temporal.
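To see what "handles retries" saves you from, here is the kind of retry-with-backoff helper teams often end up hand-writing around every flaky call. It is a generic sketch, not Temporal's retry policy: call_with_retries and its parameters are assumptions for this example, and it retries only on ConnectionError.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 3,
    initial_delay: float = 1.0,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on ConnectionError with exponential backoff."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay *= backoff
    raise ValueError("max_attempts must be at least 1")
```

The sleep parameter is injectable so the helper can be exercised without real waiting. With a workflow engine, this logic moves out of your scripts and into the platform.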

Getting Started

The barrier to entry is surprisingly low. On Mac: brew install temporal → temporal server start-dev → run your worker → trigger workflows. The complete development environment runs locally with two commands.

Naveen emphasized Temporal's exceptional documentation: "They have the best documentation I have ever seen in my life. They document everything. Every feature, guides, free courses, architecture explanations, code snippets, and exact source code references."

The Bigger Picture

What makes Temporal compelling for network automation isn't just technical capabilities—it's the recognition that complex network operations are inherently distributed systems problems.

Traditional automation tools excel at device configuration but struggle with orchestrating multi-system processes, handling long-running operations, and maintaining state across failures. Temporal embraces this complexity rather than fighting it.

For organizations moving beyond simple configuration management toward comprehensive service orchestration, Temporal offers a mature platform that handles the distributed systems challenges so engineers can focus on network-specific logic.

As Naveen concluded, "If you have a lot of different services to talk to, or something that needs to run for a long time—like device provisioning that takes 20 days—then it makes sense to use Temporal."

The question isn't whether your network automation needs better workflow orchestration. It's whether you're ready to stop rebuilding distributed systems infrastructure and start building network services instead.


Chris Grundemann

Executive advisor. Specializing in network infrastructure strategy and how to leverage your network to the greatest possible business advantage through technological and cultural transformation.

https://www.khadgaconsulting.com/