Terraform for Networks: Where It Works, Where It Breaks, and How to Decide

Aug 25

The network automation community continues to debate whether Infrastructure as Code (IaC) tools like Terraform are well suited to managing network infrastructure. A recent discussion among network engineers brought out where Terraform works in practice, where it breaks down, and what teams should weigh when evaluating it for networking use cases.

The Core Challenge: Transactions vs. Resources

The fundamental tension lies in how Terraform's resource-based model maps to network device operations. Network operating systems offer sophisticated transaction capabilities—commits with validation and confirmation options that provide atomic operations across configuration changes. These features exist for good reason: they enable reliable, all-or-nothing configuration updates that can be safely rolled back if issues arise.

Terraform, however, treats each resource as an independent API call with an implicit commit. Creating an interface, BGP neighbor, VTEP, and VRF through Terraform results in four separate API operations, each committed individually. If any operation fails partway through, you're left with a partially configured service across potentially hundreds of devices.
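To make the difference concrete, here is a minimal Python sketch rather than real Terraform or NOS code. `ToyDevice`, its single validation rule, and the deliberately broken VTEP value are all invented for illustration, and stopping at the first failure is a simplification of what a real run would do. It pushes the same four objects once resource by resource and once as a single candidate commit.

```python
class ToyDevice:
    """Hypothetical in-memory device: a running config plus one validation rule."""

    def __init__(self):
        self.running = {}

    @staticmethod
    def _validate(config):
        # Illustrative rule only: every configured object needs a non-empty value.
        for name, value in config.items():
            if not value:
                raise ValueError(f"invalid value for {name!r}")

    def apply_one(self, name, value):
        """Per-resource style: validate and commit a single object immediately."""
        self._validate({name: value})
        self.running[name] = value

    def commit(self, candidate):
        """Transaction style: validate the whole candidate, then apply all of it."""
        self._validate(candidate)
        self.running = {**self.running, **candidate}


# The four objects from the example above; the VTEP value is deliberately broken.
SERVICE = [
    ("interface", "Ethernet1"),
    ("bgp_neighbor", "10.0.0.1"),
    ("vtep", ""),
    ("vrf", "tenant-a"),
]


def per_resource_rollout(device, changes):
    """Apply changes one at a time and stop at the first failure (a simplification)."""
    for name, value in changes:
        try:
            device.apply_one(name, value)
        except ValueError:
            return False
    return True


def transactional_rollout(device, changes):
    """Submit all changes as one candidate; nothing is applied if validation fails."""
    try:
        device.commit(dict(changes))
    except ValueError:
        return False
    return True


piecemeal = ToyDevice()
atomic = ToyDevice()
per_resource_ok = per_resource_rollout(piecemeal, SERVICE)
transaction_ok = transactional_rollout(atomic, SERVICE)

print(piecemeal.running)  # {'interface': 'Ethernet1', 'bgp_neighbor': '10.0.0.1'}
print(atomic.running)     # {}
```

Both rollouts fail, but only the per-resource one leaves something behind: an interface and a BGP neighbor without their VTEP or VRF. The candidate commit leaves the toy device untouched.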

"Commits with confirmation is a golden nugget your NOS offers to you," notes one engineer. "The system will reject a commit should any dependency issues occur during the commit process."

Where Terraform Shines in Networking

Despite these challenges, Terraform has found solid ground in specific networking contexts:

Cloud and Orchestrator-Based Infrastructure: Teams report excellent results using Terraform with cloud networking services (AWS, GCP, Cloudflare) and network orchestrators like Cisco ACI. These platforms present unified APIs that align well with Terraform's resource model.

Security Policy Management: Firewall management is one of Terraform's strongest networking use cases. Many firewalls lack candidate store support anyway, so there is little transactional capability to give up, and policy changes often involve discrete, independent resources that map naturally to Terraform's approach.

Organizational Alignment: For many network teams, "Terraform is the preferred tool," largely because it aligns with broader organizational practices. When cloud teams already use Terraform, network teams often adopt it for consistency and to reuse existing CI/CD pipelines.

Implementation Realities and Workarounds

Experienced practitioners have developed strategies to work within Terraform's constraints:

Provider Design Matters: The quality of Terraform providers significantly impacts the experience. Well-designed providers can cache configurations and bundle multiple changes into atomic operations, though this requires careful engineering and may work against Terraform's natural patterns.

Scoped Deployments: Teams often structure Terraform code into smaller stacks per tenant, site, or service rather than managing entire network configurations. This limits the blast radius of failed deployments and makes rollbacks more manageable.
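As a rough illustration of scoping, the Python sketch below groups a flat inventory into stacks keyed by tenant and site. The inventory, the field names, and `split_into_stacks` are hypothetical. In a real setup each group would presumably become its own Terraform configuration with its own state, which this sketch does not attempt to model.

```python
from collections import defaultdict


def split_into_stacks(resources, keys=("tenant", "site")):
    """Group resource names into stacks identified by the given attributes."""
    stacks = defaultdict(list)
    for resource in resources:
        stack_id = tuple(resource[key] for key in keys)
        stacks[stack_id].append(resource["name"])
    return dict(stacks)


# Hypothetical inventory; names and attributes are made up for the example.
RESOURCES = [
    {"name": "vrf-blue", "tenant": "blue", "site": "ams"},
    {"name": "bgp-blue-ams", "tenant": "blue", "site": "ams"},
    {"name": "vrf-blue-fra", "tenant": "blue", "site": "fra"},
    {"name": "vrf-red", "tenant": "red", "site": "ams"},
    {"name": "acl-red", "tenant": "red", "site": "ams"},
]

stacks = split_into_stacks(RESOURCES)

# The largest stack bounds how many resources a single failed run can touch.
largest = max(len(names) for names in stacks.values())

for stack_id, names in stacks.items():
    print(stack_id, names)
print("largest stack:", largest)
```

With one monolithic stack, a failed run could leave any of the five resources half-applied. Split this way, the worst case is two.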

State Management Considerations: Terraform's "eventual state" model, in which consistency is checked only during the next run, creates challenges in environments where CLI access coexists with automated management. A change made by hand goes unnoticed until that next run, so teams must carefully control how network changes are made to avoid state drift.
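Drift is easy to picture as a comparison between what the last automated run recorded and what the device reports now. The Python sketch below is an assumption-laden illustration of that comparison, not Terraform's implementation: `detect_drift`, the flat key names, and the sample values are all invented.

```python
ABSENT = "<absent>"  # marker for a key that exists on only one side


def detect_drift(recorded, actual):
    """Return {key: (recorded_value, actual_value)} for every key that differs."""
    drift = {}
    for key in sorted(recorded.keys() | actual.keys()):
        before = recorded.get(key, ABSENT)
        after = actual.get(key, ABSENT)
        if before != after:
            drift[key] = (before, after)
    return drift


# What the last automated run recorded (hypothetical values).
recorded_state = {
    "Ethernet1.description": "uplink",
    "Ethernet1.mtu": 9000,
}

# What the device reports after someone made changes over the CLI.
device_state = {
    "Ethernet1.description": "uplink - changed by hand",
    "Ethernet1.mtu": 9000,
    "Ethernet2.description": "temp",
}

drift = detect_drift(recorded_state, device_state)
for key, (before, after) in drift.items():
    print(f"{key}: {before!r} -> {after!r}")
```

Nothing flags the hand-edited description or the extra interface until a comparison like this actually runs, which is the gap the "eventual state" model leaves open.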

Alternative Approaches

Several engineers highlighted Ansible's advantages for commit-oriented network devices: "You define your resources in Ansible, set them as facts, populate the 'Transaction Basket,' and then send one API call to apply the whole basket."

This approach better matches how network engineers naturally think about configuration changes—as coherent transactions rather than collections of independent resources.
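The basket idea can be sketched in a few lines of Python. This is not Ansible code and not any vendor's API: `TransactionBasket`, the paths, and the payload shape are hypothetical, and a plain list stands in for the device endpoint. The point is only that four changes result in one call.

```python
import json


class TransactionBasket:
    """Collects changes locally and hands them over in a single call."""

    def __init__(self):
        self._changes = []

    def add(self, path, value):
        """Queue one change; nothing is sent yet."""
        self._changes.append({"path": path, "value": value})

    def commit(self, send):
        """Serialize all queued changes into one payload and call `send` once."""
        payload = json.dumps({"changes": self._changes})
        send(payload)
        self._changes = []
        return payload


# Stand-in for a device API: every "call" is appended to this list.
sent_payloads = []

basket = TransactionBasket()
basket.add("/interfaces/Ethernet1", {"enabled": True})
basket.add("/bgp/neighbors/10.0.0.1", {"remote_as": 65001})
basket.add("/vteps/vtep1", {"source": "Loopback0"})
basket.add("/vrfs/tenant-a", {"rd": "65001:10"})
basket.commit(sent_payloads.append)

print(len(sent_payloads))  # 1
```

Whether the device then treats that one payload as an atomic commit depends on the platform. The sketch only covers the client side of the pattern.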

The Scaling Challenge

Performance becomes a significant consideration in larger deployments. Management appliances like Panorama have concurrency limits that can cause Terraform plans to take "dozens of minutes" when managing thousands of devices. Declarative systems are read-heavy: every plan re-reads the current state of each managed resource, and those reads can exhaust device resources, particularly when each piece of configuration is managed as an independent resource.
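A back-of-the-envelope model shows how quickly this adds up. The Python sketch below assumes every resource needs one read, every read takes the same time, and reads run in fixed-size waves limited by the appliance. The resource count, latency, and concurrency figures are made-up inputs, not measurements from the discussion.

```python
import math


def estimate_refresh_seconds(resources, seconds_per_read, concurrency):
    """Estimate refresh time when reads run in waves of `concurrency` calls."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    waves = math.ceil(resources / concurrency)
    return waves * seconds_per_read


# Assumed inputs: 10,000 managed resources, 0.5 s per read, 4 concurrent calls.
slow = estimate_refresh_seconds(10_000, 0.5, 4)
# Same workload if the appliance tolerated 10 concurrent calls.
faster = estimate_refresh_seconds(10_000, 0.5, 10)

print(f"{slow / 60:.1f} minutes")    # 20.8 minutes
print(f"{faster / 60:.1f} minutes")  # 8.3 minutes
```

Under these assumptions the plan spends about twenty minutes on reads before any change is made, which is consistent with the "dozens of minutes" practitioners describe.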

Finding the Right Tool for the Job

The discussion reveals that context matters enormously:

For cloud networking and orchestrators: Terraform often provides an excellent fit
For traditional network devices without orchestration: The impedance mismatch between Terraform's model and device capabilities creates friction
For greenfield environments: Terraform's destroy-and-rebuild approach works better than it does on networks already in production
For mixed management environments: The tool choice becomes more complex

Practical Recommendations

Consider Terraform When:

Working with cloud networking services or network orchestrators
Managing security policies on firewalls
Seeking organizational tool consistency
Operating in greenfield environments
Requiring strong CI/CD integration

Exercise Caution When:

Managing traditional network devices directly
Operating in mixed CLI/automation environments
Requiring complex inter-device transactions
Dealing with legacy infrastructure
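The two lists above can be turned into a simple review aid. The Python sketch below only sorts an environment's traits into the "consider" and "caution" buckets as worded in this post. The short trait keys and the `review` function are hypothetical, and it deliberately produces no score or verdict.

```python
CONSIDER = {
    "cloud_or_orchestrator": "Working with cloud networking services or network orchestrators",
    "firewall_policy": "Managing security policies on firewalls",
    "tool_consistency": "Seeking organizational tool consistency",
    "greenfield": "Operating in greenfield environments",
    "ci_cd": "Requiring strong CI/CD integration",
}

CAUTION = {
    "direct_device_management": "Managing traditional network devices directly",
    "mixed_cli_and_automation": "Operating in mixed CLI/automation environments",
    "inter_device_transactions": "Requiring complex inter-device transactions",
    "legacy": "Dealing with legacy infrastructure",
}


def review(traits):
    """Sort the given trait keys into the post's two lists; reject unknown keys."""
    traits = set(traits)
    unknown = traits - set(CONSIDER) - set(CAUTION)
    if unknown:
        raise ValueError(f"unknown traits: {sorted(unknown)}")
    return {
        "consider": [text for key, text in CONSIDER.items() if key in traits],
        "caution": [text for key, text in CAUTION.items() if key in traits],
    }


result = review(["firewall_policy", "ci_cd", "mixed_cli_and_automation"])
for bucket, reasons in result.items():
    print(bucket)
    for reason in reasons:
        print("  -", reason)
```

An environment that lands in both buckets, as this example does, is the mixed case where the choice needs discussion rather than a rule.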

Design Considerations:

Structure code in appropriately scoped modules
Invest in high-quality provider implementations
Plan for rollback strategies
Consider performance implications at scale

The Bigger Picture

A key insight from this discussion is that many teams end up adapting tools originally designed for cloud and server management to networking contexts. While purpose-built network automation tools do exist, organizations often gravitate toward tools that align with their broader infrastructure practices and skillsets.

The goal isn't to find the perfect tool but to understand each tool's strengths and limitations. Terraform excels in certain networking contexts while creating friction in others. The key is matching the tool to the specific requirements, constraints, and organizational context of each use case.

As one engineer noted, "Use the right tool for the right job." In network automation, this often means maintaining a diverse toolset and applying each tool where it provides the most value rather than seeking a one-size-fits-all solution.

The ongoing evolution of network APIs, management paradigms, and automation tools suggests this landscape will continue to mature. Teams that focus on understanding the fundamental patterns and trade-offs—rather than betting everything on a single approach—will be best positioned to adapt as the ecosystem develops.

This post is based on community discussions and represents the collective experience and opinions of individual practitioners, including: Roman Dodin, Urs Baumann, Marco Martinez, Eduardo Pozo, Daniel Hertzberg, Wim Henderickx, Christian Drefke, Adrian Arumugam, Kurt Wauters, Calvin Remsburg. Approaches should be evaluated and adapted based on your specific network environment and requirements.

The conversation continues in the Network Automation Forum community – find us on Slack or LinkedIn.

Chris Grundemann

Executive advisor. Specializing in network infrastructure strategy and how to leverage your network to the greatest possible business advantage through technological and cultural transformation.

https://www.khadgaconsulting.com/