Fine-Tuning LLMs for Network Configuration Intelligence
Summarizing Urs Baumann's practical AI approach from AutoCon3
"I'm doing network automation since 1503," joked Urs Baumann, opening his lightning talk on teaching "old" LLMs new tricks. "I did a masters in artificial intelligence because you cannot really bear the technology you don't understand. I'm kind of against AI—that's the reason why I did the masters."
His self-deprecating humor masked serious intent: demonstrating how fine-tuning can make LLMs practically useful for network automation tasks beyond basic chat interactions.
The Problem with Generic AI
"Too much AI is never enough," Baumann quipped, acknowledging the hype while addressing real limitations. Generic LLMs face several challenges for network automation:
- Old training data: "Every LLM we have nowadays, the next day it is already old" 
- Context confusion: "LLMs love context, but they don't love too much context because they get confused" 
- Generic outputs: ChatGPT can attempt network tasks but lacks the precision needed for production schemas 
Fine-tuning offers a path to specialized, reliable AI capabilities.
Why LoRA for Network Automation
Among the many families of fine-tuning methods (additive, selective, reparameterization-based), Baumann chose LoRA (Low-Rank Adaptation of Large Language Models), though he noted the confusing name collision with networking's LoRaWAN protocol.
Key LoRA advantages:
- Memory efficiency: "Minimize memory consumption" for affordable training on standard hardware 
- Model preservation: The base model's weights stay frozen; only the small adapter matrices added alongside them are trained
- Multiple task support: One base model can support multiple fine-tuned applications 
- Sharing efficiency: Share small adaptation layers rather than full multi-gigabyte models 
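To make the memory and sharing points concrete, here is a rough back-of-the-envelope sketch of why a low-rank adapter is so much smaller than the matrix it adapts. It assumes the standard LoRA formulation, in which the update to a weight matrix of shape d_out x d_in is factored into two matrices of shapes d_out x r and r x d_in. The layer size (4096) and rank (16) are illustrative assumptions, not figures from the talk.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable parameter counts for one weight matrix."""
    full = d_out * d_in            # training the weight matrix directly
    lora = rank * (d_out + d_in)   # B is d_out x rank, A is rank x d_in
    return full, lora


# Illustrative sizes only: a 4096 x 4096 projection adapted at rank 16.
full, lora = lora_param_counts(4096, 4096, 16)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
# full: 16,777,216  lora: 131,072  ratio: 0.78%
```

Only the adapter parameters need gradients and optimizer state, which is where most of the training-memory saving comes from, and only the adapter needs to be shared afterwards.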
"You can train on hardware that is affordable. In Google Colab, you can fine-tune big models with a decent GPU for about 10 bucks."
The Network Intent Extraction Challenge
Baumann's use case addressed a common problem: extracting intent and resources from messy network configurations to enable modern declarative management.
The scenario: cowboys shooting network configurations "from the hips" leave behind a legacy mess of unknown ACLs with unclear purposes. The goal: extract structured intent data to populate modern network management systems.
The approach: Fine-tune an LLM to convert running configurations into specific schemas, handling dynamic configuration formats and dynamic output schemas that traditional parsing can't accommodate reliably.
Technical Implementation
Training Stack:
- Unsloth: Easy-to-use fine-tuning framework 
- vLLM: Model serving and inference 
- MLflow: Experiment tracking and benchmarking 
Data Strategy: Synthetic dataset generation to create training examples matching real-world configuration complexity while maintaining quality control.
Architecture: Running configurations → Fine-tuned LLM → Structured schema data → Infrahub population → Change request generation
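The synthetic data strategy can be sketched roughly as follows. This is a minimal illustration, not the generator used in the talk: the IOS-style configuration syntax, the field names (`name`, `description`, `access_vlan`), and the `input`/`output` record layout are all hypothetical. The idea is that each training pair is built from the same random values, so the expected JSON is correct by construction.

```python
import json
import random


def make_example(rng: random.Random) -> dict:
    """Build one (config text, expected JSON) training pair."""
    name = f"GigabitEthernet0/{rng.randint(0, 47)}"
    vlan = rng.randint(2, 4094)
    description = rng.choice(["uplink", "printer", "access port", "camera"])
    # Hypothetical IOS-style snippet rendered from the sampled values.
    config = "\n".join([
        f"interface {name}",
        f" description {description}",
        " switchport mode access",
        f" switchport access vlan {vlan}",
    ])
    # The label comes from the same values, so it always matches the config.
    target = {"name": name, "description": description, "access_vlan": vlan}
    return {"input": config, "output": json.dumps(target, sort_keys=True)}


def make_dataset(n: int, seed: int = 0) -> list[dict]:
    """Generate n pairs; a fixed seed keeps the dataset reproducible."""
    rng = random.Random(seed)
    return [make_example(rng) for _ in range(n)]


dataset = make_dataset(3)
print(dataset[0]["input"])
print(dataset[0]["output"])
```

A real generator would need far more variety (vendors, features, messy formatting) to approach real-world configuration complexity.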
Evaluation Results
Testing against a 32 billion parameter instruct model:
- JSON Validity: 100% (compared to 98% from ChatGPT-4)
- Data Extraction Match: 86% exact matches
The lower extraction score primarily resulted from the LLM adding contextually sensible but schema-inappropriate data: "The LLM sometimes added data not in the schema because it was thinking 'this interface also needs an IP address set to none' when I didn't want IP addresses in this schema."
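The two metrics above, plus the "extra field" failure mode, could be computed along these lines. This is a sketch under assumptions: the talk does not describe its benchmark code, and the function name, the sample records, and the definition of an exact match (parsed output equal to the reference object) are illustrative.

```python
import json


def evaluate(predictions: list[str], references: list[dict]) -> dict:
    """Score raw model outputs against reference objects."""
    valid = exact = extra_keys = 0
    for raw, ref in zip(predictions, references):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # counts against JSON validity
        valid += 1
        if parsed == ref:
            exact += 1
        elif isinstance(parsed, dict) and set(parsed) - set(ref):
            extra_keys += 1  # output contains fields the schema did not ask for
    n = len(references)
    return {
        "json_validity": valid / n,
        "exact_match": exact / n,
        "extra_keys": extra_keys / n,
    }


# Hypothetical outputs: one correct, one with an unrequested field, one truncated.
predictions = [
    '{"name": "Gi0/1", "vlan": 10}',
    '{"name": "Gi0/2", "vlan": 20, "ip_address": null}',
    '{"name": "Gi0/3", "vlan": ',
]
references = [
    {"name": "Gi0/1", "vlan": 10},
    {"name": "Gi0/2", "vlan": 20},
    {"name": "Gi0/3", "vlan": 30},
]
scores = evaluate(predictions, references)
print(scores)
```

Tracking extra keys separately from other mismatches helps show whether a low exact-match score reflects wrong data or merely over-eager output.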
Challenges and Learnings
Schema Confusion: Similar-looking schemas with different purposes confused the model, requiring more targeted training data.
IP Address Calculations: Converting classical IP/netmask to CIDR notation proved problematic because "it's an LLM, it guesses" rather than calculating precisely.
Data Quality: Self-generated synthetic datasets required iterative improvement to eliminate training bugs.
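The netmask-to-CIDR problem is a good example of work better done by deterministic code than by a model. As a small sketch using Python's standard `ipaddress` module (the helper name `to_cidr` is just for illustration):

```python
import ipaddress


def to_cidr(address: str, netmask: str) -> str:
    """Convert an address plus dotted netmask into CIDR notation."""
    # ip_interface accepts "address/netmask" and raises ValueError on bad input.
    return ipaddress.ip_interface(f"{address}/{netmask}").with_prefixlen


print(to_cidr("192.168.10.5", "255.255.255.0"))   # 192.168.10.5/24
print(to_cidr("10.0.0.1", "255.255.255.252"))     # 10.0.0.1/30
```

An invalid mask such as 255.0.255.0 raises an error instead of producing a plausible-looking guess.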
Practical Improvements
- More Fine-Tuning: Additional training rounds with higher-quality data
- Enhanced Training Data: Better synthetic generation or real-world configuration examples
- Tool Integration: Use MCP (Model Context Protocol) tools to handle calculations through function calls rather than guessing
- Template Generation: Use LLMs to generate parsing templates (regex, TTP) for traditional tools
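To illustrate the template-generation idea, the following sketch shows the kind of regex-based parser an LLM might be asked to produce once, after which a traditional tool runs it deterministically. The patterns here are hand-written for illustration and assume IOS-style syntax; the output field names are hypothetical.

```python
import re

# One interface block: the "interface" line plus its indented child lines.
INTERFACE_BLOCK = re.compile(
    r"^interface (?P<name>\S+)\n"
    r"(?P<body>(?:^ .*\n?)*)",
    re.MULTILINE,
)
ACCESS_VLAN = re.compile(r"^ switchport access vlan (?P<vlan>\d+)$", re.MULTILINE)


def parse_interfaces(config: str) -> list[dict]:
    """Extract interface names and access VLANs from a config snippet."""
    results = []
    for block in INTERFACE_BLOCK.finditer(config):
        vlan = ACCESS_VLAN.search(block.group("body"))
        results.append({
            "name": block.group("name"),
            "access_vlan": int(vlan.group("vlan")) if vlan else None,
        })
    return results


sample = """interface GigabitEthernet0/1
 description printer
 switchport access vlan 20
interface GigabitEthernet0/2
 description uplink
"""
parsed = parse_interfaces(sample)
print(parsed)
```

The appeal of this hybrid is that the model's output (the template) can be reviewed and tested once, rather than trusting a fresh generation for every configuration.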
"My preferred option would probably be just use some intern," Baumann deadpanned about data generation challenges.
Why This Matters
Baumann's work demonstrates practical AI application beyond hype—using fine-tuning to solve specific networking problems that generic models handle poorly. His approach acknowledges AI limitations while leveraging its strengths for realistic use cases.
Key insights:
- Specialized beats general: A fine-tuned model can outperform generic LLMs on narrow, domain-specific tasks such as schema-conformant extraction
- Affordable accessibility: Modern tools make fine-tuning accessible without massive infrastructure 
- Practical constraints: Understanding AI limitations enables better tool selection and hybrid approaches 
- Real-world validation: Benchmarking against production requirements reveals where AI helps and where traditional tools remain superior 
The Broader AI Message
Rather than wholesale AI adoption or rejection, Baumann advocated for thoughtful application: understand the technology, identify appropriate use cases, implement with realistic expectations, and validate results.
His approach, "kind of against AI" yet pursuing a master's-level understanding of it, represents mature AI adoption: skeptical evaluation combined with practical experimentation.
For network automation practitioners, his work shows how fine-tuning can create specialized tools for specific problems while acknowledging that "just use some intern" might sometimes be the pragmatic solution.
The lightning talk format prevented deeper technical detail, but Baumann's links and references provide starting points for practitioners interested in similar applications.
What's Missing
Baumann concluded honestly: "We need more training data in higher quality, we need benchmarks." The path forward requires community effort to build datasets and evaluation frameworks for network-specific AI applications.
His work represents early exploration rather than a finished solution: exactly the kind of practical experimentation that advances the field beyond generic AI enthusiasm toward useful, specialized applications.
Watch the full lightning talk: Teaching "old" LLMs new tricks
 
                        