From 50 Clicks to 50 Seconds: Automating Optical Networks Beyond Traditional Tools

Summarizing Matteo Colantonio's optical network automation journey from AutoCon3

"Man, this is crazy, it's fantastic," was the feedback Matteo Colantonio received from his operations team. "It took me 15 to 20 minutes to do this before. Now it's like 1.5 minutes."

The transformation wasn't just about speed. It was about rethinking how to automate optical networks when traditional tools fall short. Colantonio's journey at GARR, the Italian National Research and Education Network, is a study in persistence, tool evaluation, and finding the right framework for complex automation challenges.

The Familiar Starting Point

Like many organizations, GARR began with Ansible. It seemed logical: colleagues had used it successfully in other domains, and it worked for straightforward tasks like upgrading transponders, with AWX providing a convenient UI for scheduling jobs during off-hours.

But the limitations became apparent quickly when dealing with complex optical layer procedures. "Simply calling one function in a loop that only restarts one card," Colantonio explained, "you have to nest one inside the other three YAML files with the include statement and keep track of all the register variables."

The optical layer presented unique challenges that exposed Ansible's constraints. Some devices don't fully support NETCONF, requiring custom modules for interfaces like TL1—a structured but imperative command-line interface. "Good luck having idempotent models with this type of interface," he noted wryly.
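To see why idempotency is awkward over an imperative interface, consider a minimal sketch. The `FakeCliDevice` class and its command strings are invented for illustration (they are not real TL1 syntax, and this is not GARR's module). The point is only that the caller has to read the current state, compare it, and then decide whether to send anything.

```python
class FakeCliDevice:
    """Stand-in for a device that only accepts imperative commands (hypothetical)."""

    def __init__(self) -> None:
        self.port_state = {"1-1-1": "down"}
        self.commands_sent = []

    def send(self, command: str) -> str:
        # Record every command so we can see what actually reached the device.
        self.commands_sent.append(command)
        verb, port, *rest = command.split()
        if verb == "GET-STATE":
            return self.port_state[port]
        if verb == "SET-STATE":
            self.port_state[port] = rest[0]
            return "OK"
        raise ValueError(f"unknown command: {command}")


def ensure_port_state(device: FakeCliDevice, port: str, desired: str) -> bool:
    """Return True if a change was made, False if the device already matched."""
    current = device.send(f"GET-STATE {port}")
    if current == desired:
        return False  # nothing to send: this check is what makes the call idempotent
    device.send(f"SET-STATE {port} {desired}")
    return True


device = FakeCliDevice()
first = ensure_port_state(device, "1-1-1", "up")   # changes the port
second = ensure_port_state(device, "1-1-1", "up")  # no change on the second run
print(first, second, device.commands_sent)
```

Every operation needs its own read, compare, and write logic like this, which is the part a declarative interface would otherwise provide.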

Lesson learned: Ansible works well for simple procedures like pushing configurations, but complexity grows quickly once playbooks have to express real-world service logic rather than just configure boxes.

The Vendor Controller Mirage

Next, they explored vendor northbound APIs, a seemingly reasonable choice since vendors build the devices and understand the domain. The goal was to automate optical circuit provisioning, a process that required 40-50 manual clicks across different graphical interfaces.

This highlighted a key difference between packet and optical layer management: "Optical people are really used to graphical tools. This is a different starting point because we're not so focused on command-line interfaces."

The vendor API approach partially worked but fell short of replacing manual processes:

  • Could only do line-to-line provisioning between remote transponders

  • Still required manual cross-connections on transponder cards

  • Failed to handle meaningful service labels and descriptions needed by operations teams

  • Couldn't handle dual optical line systems present in parts of their network

Lesson learned: Vendor APIs can bottleneck automation ambitions. "You're not really in the driving seat because any bug or feature request will take weeks if not months to be implemented."

The Workflow Orchestrator Discovery

The turning point came with the discovery of a workflow orchestration framework initially developed by SURF (the Dutch NREN) and ESnet, and later open-sourced through the Commons Conservancy. Organizations like GÉANT and HEAnet had adopted it, providing validation for the approach.

"It's a framework, so it doesn't do anything—it's not a turnkey solution," Colantonio clarified. "But it enables you to define your network services and entities."

The framework provides three core capabilities:

Products: Define network services and entities as domain models for your organization

Subscriptions: Track individual customer instances of those products

Workflows: Define clear procedures for creating, modifying, validating, and terminating subscriptions

Everything is built with microservices in mind, featuring a FastAPI backend, customizable UI, ORM database storage, and comprehensive auditing.

Building Reality with Composable Models

The framework's power lies in its composable approach. Colantonio demonstrated with an optical fiber building block:

  • Has a fiber name and an operations support system (OSS) identifier

  • Contains two termination ports as separate blocks

  • Can be wrapped as a product to manage lifecycle
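The fiber building block described above might look roughly like the following. This is a conceptual sketch using plain dataclasses, not the framework's own model classes; the class names, field names, and lifecycle states are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PortBlock:
    """One termination point of a fiber (hypothetical fields)."""
    device_name: str
    port_name: str


@dataclass
class OpticalFiberBlock:
    """A fiber composed of a name, an OSS identifier, and two port blocks."""
    fiber_name: str
    oss_id: str
    port_a: PortBlock
    port_b: PortBlock


@dataclass
class OpticalFiberProduct:
    """Wraps the block so its lifecycle can be tracked (illustrative states)."""
    fiber: OpticalFiberBlock
    lifecycle: str = "initial"

    def provision(self) -> None:
        self.lifecycle = "active"

    def terminate(self) -> None:
        self.lifecycle = "terminated"


fiber = OpticalFiberBlock(
    fiber_name="fiber-example-01",
    oss_id="OSS-0001",
    port_a=PortBlock("node-a", "1-4-L1"),
    port_b=PortBlock("node-b", "1-4-L1"),
)
subscription = OpticalFiberProduct(fiber)
subscription.provision()
print(subscription.lifecycle, subscription.fiber.port_a.port_name)
```

The port block is defined once and reused for both ends of the fiber, and the same block could be nested inside other service models.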

"These are composable like Lego blocks," he explained. When you want to manage lifecycle for building blocks, you wrap them into products that can be provisioned and managed.

Workflows are simply lists of Python functions that can handle complex logic:

  • Reserve resources via API calls

  • Request human input for confirmation

  • Check node reachability after patching

  • Trigger configuration managers and wait for callbacks

  • Enable monitoring based on conditional logic
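A workflow of this kind can be approximated as an ordered list of functions that each take and return a state dictionary. The sketch below is a simplified stand-in, not the framework's API: the step names, the `run_workflow` helper, and the state keys are all hypothetical, and the human input step is simulated with a flag supplied up front.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Step = Callable[[State], State]


def reserve_resources(state: State) -> State:
    # In practice this would be an API call to an inventory system.
    return {**state, "reserved_port": "1-4-L1"}


def confirm_patching(state: State) -> State:
    # Stand-in for a human input step; here the answer is supplied up front.
    if not state["operator_confirmed"]:
        raise RuntimeError("operator has not confirmed the patching")
    return state


def check_reachability(state: State) -> State:
    # A real step would probe the node; this sketch just records a result.
    return {**state, "reachable": True}


def enable_monitoring(state: State) -> State:
    # Conditional logic: only monitor services flagged as production.
    return {**state, "monitored": state.get("production", False)}


def run_workflow(steps: List[Step], initial: State) -> State:
    """Run each step in order, feeding the returned state to the next one."""
    state = dict(initial)
    for step in steps:
        state = step(state)
    return state


create_circuit: List[Step] = [
    reserve_resources,
    confirm_patching,
    check_reachability,
    enable_monitoring,
]

result = run_workflow(create_circuit, {"operator_confirmed": True, "production": True})
print(result)
```

Because each step is an ordinary function, a step can raise to halt the run, branch on earlier results, or call out to any external system.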

"Since it's Python, you can basically do whatever you want. The most complex logic—it can handle it because it was created for this."

The Dramatic Results

The transformation was quantifiable and immediate. What previously required 40-50 manual clicks now takes 50 seconds through automated workflows. But the impact went beyond time savings:

  • Central service definition: All services defined consistently in one place

  • Consistent execution: Entire lifecycle management follows established procedures

  • Visibility: Complete state and history tracking for each service

  • Future-proof toolkit: When hardware changes, services and workflows remain stable

"It's not pushing people—people want to use it because it makes their life easier," Colantonio observed.

Handling Platform Diversity

One elegant solution addresses the challenge of managing different optical platforms (transponders, inline amplifiers, etc.) with the same workflows. They use Python's single dispatch approach:

  1. Define abstract functions for common tasks (like setting port admin state)

  2. Register platform-specific implementations

  3. The framework automatically routes to the correct implementation based on device type

This keeps code organized, makes adding new platforms easy, and simplifies workflow logic while maintaining the same abstract interface across different hardware.
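The three steps above can be sketched as follows. Note that the standard library's `functools.singledispatch` dispatches on the type of the first argument, so registering against a platform value implies some custom dispatcher. The talk summary does not show one, so the `PlatformDispatcher` class here is an assumption, and for simplicity it takes the platform as an explicit first argument rather than reading it from a device object.

```python
from enum import Enum
from typing import Any, Callable, Dict


class Platform(Enum):
    Groove_G30 = "groove_g30"
    GX_G42 = "gx_g42"


class PlatformDispatcher:
    """Routes a call to the implementation registered for a Platform value."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._registry: Dict[Platform, Callable[..., Any]] = {}

    def register(self, platform: Platform) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
            self._registry[platform] = func
            return func
        return decorator

    def __call__(self, platform: Platform, *args: Any, **kwargs: Any) -> Any:
        try:
            implementation = self._registry[platform]
        except KeyError:
            raise NotImplementedError(
                f"{self.name} has no implementation for {platform.name}"
            ) from None
        return implementation(*args, **kwargs)


# Step 1: the abstract entry point.
set_port_admin_state = PlatformDispatcher("set_port_admin_state")


# Step 2: platform-specific implementations (bodies are placeholders).
@set_port_admin_state.register(Platform.Groove_G30)
def _(port_name: str, admin_state: str) -> str:
    return f"G30 {port_name} -> {admin_state}"


@set_port_admin_state.register(Platform.GX_G42)
def _(port_name: str, admin_state: str) -> str:
    return f"G42 {port_name} -> {admin_state}"


# Step 3: the call is routed by platform.
print(set_port_admin_state(Platform.GX_G42, "1-4-L1", "up"))
```

Supporting a new platform then means adding one more registered function, with no change to the workflows that call `set_port_admin_state`.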

Direct Device Communication

Rather than fighting tool limitations, they embraced direct device APIs for full functionality access. Here's the actual implementation showing the single dispatch approach with two different optical platforms:

from typing import Any, Dict, Literal

# set_port_admin_state, Platform, OpticalDeviceBlock, g30_client and g42_client
# are assumed to be defined elsewhere; the excerpt does not show them.


@set_port_admin_state.register(Platform.Groove_G30)
def _(
    optical_device: OpticalDeviceBlock,
    port_name: str,
    admin_state: Literal["up", "down", "maintenance"],
) -> Dict[str, Any]:
    ids = port_name.split("-")[-1]  # port-1/2/3 -> 1/2/3
    shelf_id, slot_id, port_id = ids.split("/")  # 1/2/3 -> 1, 2, 3

    g30 = g30_client(optical_device.mngmt_ip)  # RESTCONF client
    port = g30.data.ne.shelf(shelf_id).slot(slot_id).card.port(port_id)
    # Dynamic Path → https://{{host}}:{{port}}/data/ne:ne/shelf={{shelf_id}}/slot={{slot_id}}/card/port={{port_id}}

    port.modify(admin_status=admin_state)  # PATCH method with data validation
    return port.retrieve(depth=2)  # GET method


@set_port_admin_state.register(Platform.GX_G42)
def _(
    optical_device: OpticalDeviceBlock,
    port_name: str,
    admin_state: Literal["up", "down", "maintenance"],
) -> Dict[str, Any]:
    shelf_id, slot_id, port_id = port_name.split("-")  # 1-4-L1 -> 1, 4, L1
    g42 = g42_client(optical_device.mngmt_ip)  # RESTCONF client
    port = g42.data.ne.equipment.card(f"{shelf_id}-{slot_id}").port(port_id)

    # The G42 model names the field admin_state (the G30 uses admin_status)
    port.modify(admin_state=admin_state)  # PATCH method
    return port.retrieve(depth=2)  # GET method

This shows how the same abstract function handles different optical platforms (Groove G30 vs GX G42) with platform-specific parsing and API calls, while maintaining clean, readable code that translates to proper RESTCONF operations with data validation.
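The platform-specific parsing can be tried in isolation. The sketch below reuses the two split rules from the listing and fills in the path template quoted in the G30 comment. The helper names are hypothetical, and the host address and TCP port are placeholder values rather than anything from the talk.

```python
from typing import Tuple


def parse_g30_port(port_name: str) -> Tuple[str, str, str]:
    """'port-1/2/3' -> ('1', '2', '3')"""
    ids = port_name.split("-")[-1]
    shelf_id, slot_id, port_id = ids.split("/")
    return shelf_id, slot_id, port_id


def parse_g42_port(port_name: str) -> Tuple[str, str, str]:
    """'1-4-L1' -> ('1', '4', 'L1')"""
    shelf_id, slot_id, port_id = port_name.split("-")
    return shelf_id, slot_id, port_id


def g30_port_url(host: str, tcp_port: int, port_name: str) -> str:
    """Fill in the path template quoted in the G30 listing's comment."""
    shelf_id, slot_id, port_id = parse_g30_port(port_name)
    return (
        f"https://{host}:{tcp_port}/data/ne:ne/"
        f"shelf={shelf_id}/slot={slot_id}/card/port={port_id}"
    )


print(parse_g30_port("port-1/2/3"))
print(parse_g42_port("1-4-L1"))
print(g30_port_url("192.0.2.10", 8181, "port-1/2/3"))  # placeholder host and port
```

Both parsers rely on tuple unpacking, so a malformed port name raises `ValueError` before any request is sent to the device.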

Key Principles for Success

Abstract and Composable Models: Build reusable components that can be combined to represent complex services

Stateful Instances: Track service instances through their complete lifecycle

Device APIs Over Configuration Files: Why settle for configuration push when devices offer full programmatic interfaces?

One Service at a Time: "This is crucial for both developers and users because it nudges people to adopt your tool by making their life easier"

The Broader Implications

Colantonio's journey illustrates an important evolution in network automation thinking. Rather than forcing complex requirements into tools designed for simpler use cases, they found a framework that embraces complexity while maintaining clarity.

The workflow orchestrator approach acknowledges that real-world service provisioning involves more than just device configuration—it includes external systems (IPAM, ITSM, OSS, BSS) and human decision points, all while maintaining audit trails and state consistency.

Implementation Reality

When asked about the learning curve, Colantonio was honest: "No, definitely not as easy as getting started with Ansible, but for sure it was worth it."

Working as a two-person team (with colleague Philip handling Kubernetes deployment), they developed their proof of concept starting in November and had working implementations within 2-3 months.

The framework includes validation workflows that run on a schedule, checking configuration consistency and locking any service that is out of sync until manual intervention resolves the discrepancy.
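The validate-and-lock behavior could be sketched along these lines. This is an illustration of the idea only: the `Service` class, the `insync` flag, and the settings being compared are assumptions, and the scheduling itself is left out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Service:
    name: str
    intended: Dict[str, str]  # configuration the orchestrator expects
    insync: bool = True       # False means the service is locked


def validate(service: Service, actual: Dict[str, str]) -> List[str]:
    """Compare intended and actual settings; lock the service on any mismatch."""
    mismatches = [
        f"{key}: expected {value!r}, found {actual.get(key)!r}"
        for key, value in service.intended.items()
        if actual.get(key) != value
    ]
    service.insync = not mismatches
    return mismatches


def modify(service: Service, key: str, value: str) -> None:
    """Refuse changes while the service is out of sync."""
    if not service.insync:
        raise RuntimeError(f"{service.name} is out of sync and locked")
    service.intended[key] = value


circuit = Service("circuit-example", {"admin_state": "up", "frequency": "193.1"})
drift = validate(circuit, {"admin_state": "down", "frequency": "193.1"})
print(drift, circuit.insync)
```

Once an operator has corrected the device or the model, a later validation run that finds no mismatches sets the flag back and unlocks the service.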

Why This Matters

This presentation demonstrates that automation doesn't have to conform to popular tool limitations. When requirements exceed tool capabilities, the solution isn't necessarily to reduce requirements—it might be to find better tools.

For optical networks, traditional automation approaches often fail because they don't account for the visual, GUI-centric culture of optical operations or the complex service lifecycle management requirements.

Colantonio's success suggests that purpose-built workflow orchestration frameworks might be the future for complex service automation, particularly in domains where vendor APIs are insufficient and traditional configuration management tools fall short.

The key insight: sometimes the best automation strategy is admitting that one size doesn't fit all and investing in frameworks that embrace rather than fight complexity.


Chris Grundemann

Executive advisor. Specializing in network infrastructure strategy and how to leverage your network to the greatest possible business advantage through technological and cultural transformation.

https://www.khadgaconsulting.com/