Taming Multi-Vendor SONiC with Abstraction APIs

Summarizing Josh Saul's practical SONiC automation approach from AutoCon3

"I got into network automation around 2001," began Josh Saul from BE Networks. "I was using Perl to automate a Cisco 5500, and if you hit it with the wrong password five times in a row, the box reboots." He paused. "That's what I like to call a resume generating event."

His lightning talk on SONiC abstraction APIs offered both war stories and practical solutions for one of networking's persistent challenges: achieving consistency across multi-vendor environments.

The Tool Problem

Answering the conference's recurring question about automation adoption, Saul was blunt: "We haven't had good tools. We've had rough knives, we've had hammers when we needed screwdrivers. The vendors haven't been very supportive either."

His message to the room: "Everyone should give themselves a pat on the back for working with these rough tools and learning how to sharpen them in communities like this."

SONiC: Promise and Reality

For those unfamiliar, SONiC (Software for Open Networking in the Cloud) is the open-source network operating system that Microsoft built for Azure and later contributed to the Linux Foundation. It is now used by many hyperscalers and supported by most major switch vendors.

"It is the future of networking," Saul declared, but acknowledged the complexity: "Multi-vendor SONiC is not one distribution. It's one operating system customized for different vendors."

The Government Agency Challenge

Saul's customer story illustrated both SONiC's potential and its rough edges. A large government agency, hit by supply chain challenges, needed to deploy multiple vendor platforms while maintaining operational consistency.

Their use case was typical data center operations: maintaining the physical underlay while constantly creating multi-tenancy services (new VRFs, VLANs, VXLANs, and routing policies) and mapping them to physical ports and 802.1Q trunks.

The challenge: different SONiC distributions support different features, require different programming methods (gNMI, CLI, SNMP), and implement different protocols for the same outcomes.

Protocol Fragmentation

Saul highlighted a key example: server link redundancy. Early SONiC vendors implemented MLAG, which has no common standard, while later vendors could wait for EVPN multihoming based on Ethernet Segment Identifiers (ESI).

"The outcome for the customer is the same—redundant NICs to servers—but the protocols are wildly different," he explained. "You may not know that in this area of the data center, you have devices supporting MLAG and over here supporting ESI."

Users want a "cloud-like experience"—VRFs, VPCs, tenants with connected networks and NIC-level redundancy—without caring about underlying protocol differences.
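To make the MLAG versus ESI point concrete, here is a minimal sketch of how an abstraction layer might pick the redundancy protocol for a switch pair from recorded device capabilities. The `Device` class, the feature tags, and the preference for ESI over MLAG are all assumptions made for illustration; the talk did not show this logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    features: frozenset  # hypothetical feature tags, e.g. "mlag" or "evpn-esi"


def pick_redundancy(pair):
    """Return a redundancy protocol that both switches in the pair support."""
    common = pair[0].features & pair[1].features
    # Assumed preference order: the standards-based option first.
    for proto in ("evpn-esi", "mlag"):
        if proto in common:
            return proto
    raise ValueError(
        f"no shared redundancy protocol for {pair[0].name}/{pair[1].name}"
    )


old_pair = (
    Device("leaf1", frozenset({"mlag"})),
    Device("leaf2", frozenset({"mlag"})),
)
new_pair = (
    Device("leaf3", frozenset({"mlag", "evpn-esi"})),
    Device("leaf4", frozenset({"evpn-esi"})),
)

print(pick_redundancy(old_pair))  # mlag
print(pick_redundancy(new_pair))  # evpn-esi
```

The user asks only for redundant server links; the protocol choice falls out of the device facts.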

The Abstraction Solution

Rather than dealing with individual device configurations, the solution treats fabric programming as a higher-level operation:

Batch Object Creation: Use REST APIs to create small objects (VLANs, VRFs) and combine them into "change sets" representing larger logical services.

Model Validation: Check consistency across the logical model—ensuring VLANs are properly connected to routers providing DHCP relay services, validating that partial configurations don't conflict with other changes.

Multi-Protocol Output: Generate both gNMI and CLI configurations from the same abstraction layer, accounting for vendor-specific feature differences and programming methods.
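The three ideas above can be sketched together in a few lines. This is an illustration under stated assumptions, not the product's API: `Vlan`, `ChangeSet`, and `render` are hypothetical names, the JSON payload is a placeholder rather than a real OpenConfig structure, and the CLI lines are invented and are not actual SONiC commands.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Vlan:
    vlan_id: int
    vrf: str  # name of the VRF this VLAN is routed in


@dataclass
class ChangeSet:
    """Small objects grouped into one logical service change."""
    vrfs: set = field(default_factory=set)
    vlans: list = field(default_factory=list)

    def validate(self):
        """Return a list of problems found in the logical model."""
        errors = []
        seen = set()
        for v in self.vlans:
            if not 1 <= v.vlan_id <= 4094:
                errors.append(f"VLAN {v.vlan_id} out of range")
            if v.vlan_id in seen:
                errors.append(f"VLAN {v.vlan_id} defined twice")
            seen.add(v.vlan_id)
            if v.vrf not in self.vrfs:
                errors.append(f"VLAN {v.vlan_id} references unknown VRF {v.vrf}")
        return errors


def render(change, method):
    """Render one change set in the format a given device expects."""
    if method == "gnmi":
        # Placeholder JSON body; a real payload would follow OpenConfig models.
        body = {
            "vrfs": sorted(change.vrfs),
            "vlans": [{"id": v.vlan_id, "vrf": v.vrf} for v in change.vlans],
        }
        return json.dumps(body, indent=2)
    if method == "cli":
        # Invented command syntax, for illustration only.
        lines = [f"vrf {name}" for name in sorted(change.vrfs)]
        lines += [f"vlan {v.vlan_id} vrf {v.vrf}" for v in change.vlans]
        return "\n".join(lines)
    raise ValueError(f"unknown programming method: {method}")


cs = ChangeSet(vrfs={"tenant-a"}, vlans=[Vlan(100, "tenant-a")])
assert cs.validate() == []
print(render(cs, "gnmi"))
print(render(cs, "cli"))
```

Validation runs against the logical model before any device-specific output is generated, so a bad reference is caught once rather than once per vendor.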

The CI/CD Integration

The abstraction API enables a workflow that many organizations need:

  1. Single REST API: One interface for defining overlay objects and requesting configurations for affected devices

  2. Configuration Preview: Get JSON-formatted configurations for analysis by other tools (compliance checking, security validation)

  3. Git Integration: Store configuration changes in version control for approval workflows

  4. Read-Only Validation: Test configurations without pushing changes to devices

  5. Controlled Deployment: Push approved configurations with telemetry enablement and device discovery
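The preview, review, and validation steps above might look roughly like this in a pipeline job. Everything here is a hypothetical sketch: `preview` stands in for the dry-run REST call (no real endpoint is contacted), and the file layout and compliance rule are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def preview(change_set):
    """Stand-in for a dry-run API call; returns per-device config as dicts."""
    return {
        device: {"vlans": change_set["vlans"]}
        for device in change_set["devices"]
    }


def write_for_review(configs, repo_dir):
    """Write one JSON file per device so a Git diff shows what would change."""
    paths = []
    for device, cfg in sorted(configs.items()):
        path = Path(repo_dir) / f"{device}.json"
        path.write_text(json.dumps(cfg, indent=2, sort_keys=True) + "\n")
        paths.append(path)
    return paths


def compliance_check(configs, forbidden_vlans):
    """Example external check: list devices that would carry a forbidden VLAN."""
    return [
        device
        for device, cfg in sorted(configs.items())
        if set(cfg["vlans"]) & forbidden_vlans
    ]


change = {"devices": ["leaf1", "leaf2"], "vlans": [100, 200]}
configs = preview(change)

# A temporary directory stands in for a Git working tree.
with tempfile.TemporaryDirectory() as repo:
    names = [p.name for p in write_for_review(configs, repo)]

violations = compliance_check(configs, forbidden_vlans={1})
print(names, violations)
```

Nothing in this flow touches a device; the push step would run only after the review and the checks pass.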

Technical Reality

Saul showed actual API responses: complex JSON structures using gNMI with OpenConfig. The output "looks pretty nasty" to CLI-focused engineers, but it provides the foundation for programmatic consistency.

The same API call returns device-specific configurations: gNMI format for one device, CLI commands for another, each tailored to that platform's capabilities and supported features.

Key Takeaways

Despite the lightning talk format, Saul packed in practical advice:

Model Thoroughly: "Spend a lot of time, stare at it, think through all the permutations of different objects"

Plan Extensibility: Use graph-based models that can accommodate additional information and device capabilities later

Encode Device Facts: Include device models, features, and capabilities in your data structures

Implement Broad Support: "If you're doing ESI, it makes sense to do MLAG at the same time. Get it all done early"

Enable External Analysis: "Get dry run configs out of the system so you can analyze elsewhere. Don't work in a vacuum"
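As one possible reading of the extensibility and device-facts advice, the sketch below stores services and devices as nodes in a small graph, with facts kept as free-form attributes. The `Graph` class, the node names, and the attribute keys are all assumptions; the talk did not describe a specific data structure.

```python
from collections import defaultdict


class Graph:
    """Tiny undirected graph whose nodes carry free-form attributes."""

    def __init__(self):
        self.nodes = {}                # node id -> attribute dict
        self.edges = defaultdict(set)  # node id -> neighbouring node ids

    def add_node(self, node_id, **attrs):
        # Calling this again for an existing node merges in new attributes,
        # which is what lets the model grow without a schema change.
        self.nodes.setdefault(node_id, {}).update(attrs)

    def link(self, a, b):
        self.edges[a].add(b)
        self.edges[b].add(a)

    def neighbours(self, node_id, kind):
        """Return sorted neighbours of node_id with the given kind."""
        return sorted(
            n for n in self.edges[node_id]
            if self.nodes[n].get("kind") == kind
        )


g = Graph()
# Device facts (model, features) live on the node itself.
g.add_node("leaf1", kind="device", model="vendor-a-x", features={"mlag"})
g.add_node("leaf2", kind="device", model="vendor-b-y", features={"evpn-esi"})
g.add_node("vlan100", kind="vlan", vlan_id=100)
g.link("vlan100", "leaf1")
g.link("vlan100", "leaf2")

# Later extension: record a newly learned fact without touching other nodes.
g.add_node("leaf1", programming="gnmi")

print(g.neighbours("vlan100", "device"))
```

Walking from a service node to its attached devices answers the question of which devices a change affects, and the attributes on those devices decide how each one is programmed.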

Why This Matters

Saul's presentation addressed a practical challenge many organizations face: leveraging open networking benefits while managing vendor implementation differences. SONiC offers a common operating system across vendors, but each vendor's distribution differs, so abstraction layers become essential for operational simplicity.

His approach—treating fabrics as programmable infrastructure rather than collections of individual devices—aligns with broader industry trends toward infrastructure-as-code and service-oriented networking.

The integration with CI/CD workflows and external validation tools demonstrates mature automation thinking: automation systems must integrate with existing operational processes rather than requiring wholesale workflow changes.

For organizations considering SONiC deployments or struggling with multi-vendor automation challenges, Saul's abstraction API approach offers a proven pattern for achieving operational consistency without sacrificing vendor choice or feature capabilities.


Chris Grundemann

Executive advisor. Specializing in network infrastructure strategy and how to leverage your network to the greatest possible business advantage through technological and cultural transformation.

https://www.khadgaconsulting.com/