Git vs. Source of Truth: Real-World Network Automation Data Management
This article summarizes key insights from a recent NAF community discussion where network automation engineers debated the practical trade-offs between two infrastructure data management approaches: using Git repositories with JSON/YAML files, or using dedicated network source of truth (NSoT) platforms like NetBox, Nautobot, or Infrahub. This decision impacts everything from change management workflows to operational scalability.
The Core Dilemma
The conversation revolved around two approaches:
Git-based approach: Maintain configuration data in JSON/YAML files within Git repositories, leveraging merge request workflows, testing pipelines, and version control. The data is consumed directly by automation tools like SaltStack.
Source of Truth platforms: Store data in specialized systems like NetBox, Nautobot, or Infrahub with structured data models, APIs, and user interfaces. Automation tools query these systems for current state information.
When Git Works (And When It Doesn't)
Git's Sweet Spot
Version control provides undeniable benefits that hyperscale companies rely on. The merge request workflow with testing and validation creates a controlled change process. For many organizations, Git-based workflows have operated successfully for years with established toolchains and dependencies.
The Scale Problem
However, Git-based approaches face serious scalability challenges. Large JSON files become unmanageable: one participant described a 12,000-line, manually formatted JSON file that expanded to 71,000 lines once properly formatted. These monolithic files create several problems:
Merge conflicts: Frequent changes to large files inevitably lead to conflicts
Maintainability: Finding and editing specific configurations becomes increasingly difficult
Review complexity: Code reviews become exercises in finding needles in haystacks
Breaking the Monolith
The solution isn't abandoning Git, but restructuring data organization. Instead of massive flat files, consider:
Device-specific files: Separate configurations by scope (global, shared, local)
Modular structure: Break large configurations into smaller, focused files
Aggregation at runtime: Use Python scripts to combine smaller files when needed
One engineer's experience managing 22,000 lines of YAML reinforced this lesson: monolithic files eventually become unmanageable regardless of tooling.
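The "aggregation at runtime" idea can be sketched in a few lines of Python. This is a minimal illustration, not a production script; the layer names and file layout (global.json, shared/, devices/) are assumptions for the example, not a standard.

```python
import json
from pathlib import Path


def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base; the more specific layer wins."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_device_config(repo_root: str, device: str) -> dict:
    """Combine global -> shared -> device-local JSON fragments at runtime.

    Layers listed from least to most specific, so later files override earlier ones.
    """
    layers = [
        Path(repo_root) / "global.json",
        Path(repo_root) / "shared" / "common.json",
        Path(repo_root) / "devices" / f"{device}.json",
    ]
    config: dict = {}
    for layer in layers:
        if layer.exists():
            config = deep_merge(config, json.loads(layer.read_text()))
    return config
```

Each fragment stays small enough to review and rarely conflicts in merges, while the automation tooling still receives one combined document per device.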
Source of Truth Platforms: Structure vs. Flexibility
The NSoT Advantage
Popular network source of truth platforms offer several compelling benefits:
Structured data models: Enforced schemas prevent data quality issues
API accessibility: Multiple systems can consume the same data
User-friendly interfaces: Lower barrier to entry for non-Git users
Integration capabilities: Natural fit for monitoring and other operational systems
The Flexibility Trade-off
However, structured platforms come with their own challenges:
Data manipulation risk: Easier data changes can lead to unintended consequences
Impact awareness: Users may not understand how their changes affect downstream systems
Query overhead: Automation systems need frequent API calls rather than local file access
Emerging Solutions: The Best of Both Worlds
A Hybrid Approach
Infrahub is a newer NSoT platform leading recent attempts to bridge this gap by providing:
Flexible data models: Custom schemas that can evolve over time
Git-like version control: Full versioning for both schema and data
Temporal capabilities: Time-based views of configuration state
Artifact generation: Can produce traditional JSON/YAML outputs
Others are following suit.
Template-Based Generation
NetBox, Nautobot, and Infrahub all support configuration templating, allowing source of truth platforms to generate traditional configuration files. This approach provides structured data management while maintaining compatibility with existing automation workflows.
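As a sketch of what template-based generation looks like, the snippet below renders a device configuration fragment from structured data with Jinja2, the templating engine these platforms build on. The data fields here are illustrative, not an exact NetBox schema.

```python
from jinja2 import Template

# Structured data as it might come back from a source-of-truth API
# (field names are illustrative, not an exact NetBox schema).
device = {
    "name": "leaf-01",
    "interfaces": [
        {"name": "Ethernet1", "description": "uplink to spine-01", "enabled": True},
        {"name": "Ethernet2", "description": "server port", "enabled": False},
    ],
}

TEMPLATE = Template(
    "hostname {{ name }}\n"
    "{% for intf in interfaces %}"
    "interface {{ intf.name }}\n"
    " description {{ intf.description }}\n"
    " {{ 'no shutdown' if intf.enabled else 'shutdown' }}\n"
    "{% endfor %}"
)

print(TEMPLATE.render(**device))
```

The rendered output is a plain text file that existing pipelines can lint, diff, and deploy exactly as they would a hand-maintained one, which is what keeps this approach compatible with established automation workflows.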
Practical Implementation Strategies
Risk Mitigation for Source of Truth
When implementing NSoT platforms:
Selective data usage: Only consume stable DCIM data initially
Custom plugins: Build validation and restriction layers
Role-based access: Control who can modify automation-critical data
Change notifications: Alert automation teams to relevant data changes
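The "change notifications" item can start as a small webhook consumer that flags changes touching fields automation depends on. The sketch below is hypothetical: the event payload shape, model names, and field names are assumptions loosely modeled on NetBox-style webhooks, not an exact API.

```python
# Fields automation depends on, per data model.
# Model and field names are illustrative assumptions, not a NetBox contract.
AUTOMATION_CRITICAL = {
    "dcim.device": {"name", "primary_ip", "status"},
    "dcim.interface": {"name", "enabled", "untagged_vlan"},
}


def needs_automation_review(event: dict) -> bool:
    """Return True if a change event touches fields automation depends on."""
    watched = AUTOMATION_CRITICAL.get(event.get("model", ""), set())
    changed = set(event.get("changed_fields", []))
    return bool(watched & changed)
```

A consumer like this lets routine edits (descriptions, comments) flow through silently while alerting the automation team only when operationally significant fields change.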
Git Repository Optimization
For continuing with Git-based approaches:
Implement JSON schema validation: Add structure without changing tools
Comprehensive linting: Catch errors before they reach production
Short-lived branches: Minimize merge conflict windows
Clear file organization: Group related configurations logically
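The first item on this list can be wired into a CI pipeline with the jsonschema library without touching the automation toolchain itself. The schema below is a minimal illustration for a hypothetical per-device file; the required keys are assumptions for the example.

```python
import jsonschema

# Minimal schema for a hypothetical per-device JSON file; keys are illustrative.
DEVICE_SCHEMA = {
    "type": "object",
    "required": ["hostname", "interfaces"],
    "properties": {
        "hostname": {"type": "string"},
        "interfaces": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["name"],
                "properties": {
                    "name": {"type": "string"},
                    "enabled": {"type": "boolean"},
                },
            },
        },
    },
    "additionalProperties": False,
}


def validate_device_file(data: dict) -> None:
    """Raise jsonschema.ValidationError if the file violates the schema."""
    jsonschema.validate(instance=data, schema=DEVICE_SCHEMA)
```

Running this in the merge request pipeline rejects malformed data before review, recovering some of the data-quality enforcement that NSoT platforms provide natively.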
The Migration Reality
Organizations with established Git workflows face a significant decision point. The "never touch a running system" principle conflicts with the need for better scalability. The reality is that successful systems often need periodic re-architecture.
Consider a phased approach:
V1.5: Break up monolithic files while maintaining current output formats
V2: Migrate to structured source of truth while keeping proven processes
Key Takeaways
Monolithic files don't scale: Break large configurations into manageable pieces
Version control is non-negotiable: Whether in Git or built into your platform
User experience matters: Consider who updates data and how they interact with it
Integration complexity: Plan for multiple systems consuming the same data
Migration costs: Factor in existing toolchain investments and team expertise
The choice between Git and source of truth platforms isn't binary. The best approach often combines elements of both, leveraging structured data management while maintaining the change control and versioning that make network automation reliable.
Success comes from understanding your specific constraints: team size, change frequency, data complexity, and existing tool investments. The "right" answer is the one that scales with your organization while maintaining operational reliability.
This post is based on community discussions and represents the collective experience and opinions of individual practitioners, including: Bart Dorlandt, Ryan Shaw, Paul Schmidt, Tyler Bigler, Urs Baumann, Damien Garros, John Howard, Matthew Smith, Mathieu Millet, and Marco Martinez. Approaches should be evaluated and adapted based on your specific network environment and requirements.
The conversation continues in the Network Automation Forum community – find us on Slack or LinkedIn.