Escaping the Screen Scraping Trap with BGP Monitoring Protocol

Summarizing Bart Dorlandt's BMP implementation journey from AutoCon3

"It has been awesome being here and it's really like coming home," began Bart Dorlandt, freelancer from Dream Networking and Automation in the Netherlands. "I think we should rebrand Nafen into network automation family because I'm sure a lot of people feel the same way."

His lightning talk tackled a familiar pain point: the endless cycle of screen scraping BGP data from network devices and the better path forward using BGP Monitoring Protocol (BMP).

The Screen Scraping Nightmare

Dorlandt painted a picture that resonated with many in the audience: "You've been going through devices, endless devices, screen scraping and getting every bit and byte out of that box, looping through interfaces and neighbors, processing all the data. You got some awesome and magnificent and unbeatable regex to get that all done."

Now imagine scaling that to a service provider network: 60+ POPs, two routers per POP, continuously monitoring BGP on every peer. For detailed prefix information, such as pre- and post-policy data, there's no magic command that returns full JSON. "You would literally need to do that on a peer-by-peer basis. It's going to kill the box."

The Use Case: Customer Feedback Loops

The challenge his client faced was common in service provider environments: they had automation for configuration changes (customers could add prefixes through a portal for DDoS protection), but no feedback mechanism.

"What didn't happen was providing a feedback loop and giving the customer the view of 'green box, green box, everything is working.' Not even a red box, nothing."

The goal: let customers see that their prefixes are properly announced with correct communities and policies applied, providing the transparency modern service providers need.

BMP Architecture: Push vs. Pull

Instead of continuously polling devices, BMP enables a push model where routers actively send BGP information to collectors:

  • Network Edge: 60+ POPs with routers pushing data rather than being polled

  • Collector: Processes incoming data, converts to protobuf format, feeds into Kafka

  • Consumer: Homegrown application that translates Kafka messages into database entries

  • API Layer: AWS Gateway with Lambda functions providing data access

  • Customer Portal: GUI team consumes API for customer-facing interface

"Easy peasy, lemon squeezy," Dorlandt quipped about the conceptual architecture.

Technical Implementation Details

  • Collector: Written in Go, based on the BMP-to-Kafka library (a wrapper around the BioRD library)

  • Messaging: Protobuf format over Kafka for reliable data transport

  • Database: PostgreSQL-compatible (Aurora on AWS)

  • API: AWS API Gateway with Python Lambda functions
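The consumer stage in this stack can be sketched roughly as follows. This is a minimal illustration, not the implementation from the talk: JSON stands in for protobuf, an in-memory SQLite database stands in for the PostgreSQL-compatible store, the Kafka read loop is omitted, and the table layout, field names, and `store_update` function are all assumptions made for the example.

```python
import json
import sqlite3

# Hypothetical schema: one row per (router, peer, prefix, policy stage).
SCHEMA = """
CREATE TABLE IF NOT EXISTS prefixes (
    router TEXT, peer TEXT, prefix TEXT, post_policy INTEGER,
    communities TEXT, active INTEGER,
    PRIMARY KEY (router, peer, prefix, post_policy)
)
"""

def store_update(db, raw):
    """Translate one collector message into a database row."""
    msg = json.loads(raw)  # stand-in for protobuf decoding
    db.execute(
        "INSERT OR REPLACE INTO prefixes VALUES (?, ?, ?, ?, ?, ?)",
        (msg["router"], msg["peer"], msg["prefix"],
         int(msg["post_policy"]), ",".join(msg["communities"]),
         int(msg["announced"])),
    )

db = sqlite3.connect(":memory:")
db.execute(SCHEMA)

announce = {
    "router": "pop01-r1", "peer": "192.0.2.1",
    "prefix": "203.0.113.0/24", "post_policy": True,
    "communities": ["64500:100"], "announced": True,
}
store_update(db, json.dumps(announce))
# A later withdrawal for the same key overwrites the existing row.
store_update(db, json.dumps({**announce, "announced": False}))

rows = db.execute(
    "SELECT prefix, communities, active FROM prefixes"
).fetchall()
print(rows)
```

Keying rows on router, peer, prefix, and policy stage means repeated updates overwrite state instead of growing the table, which suits an API that only needs to report the current status of a prefix.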

The technical details matter because, as Dorlandt emphasized, "not everything went smooth—we need to highlight the pain as well to learn from each other."

Performance Challenges and Solutions

The Throughput Problem

The initial implementation processed 30 messages per second, while each ISP peer carried more than 1 million IPv4 prefixes. "That takes a very, very long time to get into the database."
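A back-of-the-envelope calculation shows how long "very, very long" is. The sketch below assumes, purely for illustration, one message per prefix; the talk does not spell out that ratio, and `load_time_hours` is a helper invented for this example.

```python
def load_time_hours(prefixes, msgs_per_second):
    """Hours needed to push `prefixes` messages at a fixed rate."""
    return prefixes / msgs_per_second / 3600

# One full table from a single ISP peer at the initial rate.
hours = round(load_time_hours(1_000_000, 30), 1)
print(hours)  # 9.3
```

Under that assumption, a single peer's table takes more than nine hours to load, and each router had several such peers.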

Solution: A pool of asynchronous producers, relaxed acknowledgment settings, and a tuned Kafka configuration. Result: Throughput rose to 45,000 messages per second, a 1,500x improvement.
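The general shape of a producer pool can be sketched in a few lines. The actual collector is written in Go and talks to Kafka; the Python below is only an illustration of the pattern using the standard library. `fake_send` is a hypothetical stand-in for a Kafka produce call, and the pool size and batch size are arbitrary values chosen for the example. Confirming a whole batch at once is used here as a loose analogy for relaxing per-message acknowledgments.

```python
import asyncio

async def fake_send(batch, sink):
    """Hypothetical stand-in for a Kafka produce call."""
    await asyncio.sleep(0.001)  # simulated network round trip
    sink.extend(batch)

async def producer(queue, sink, batch_size=500):
    """Drain the queue, sending messages in batches instead of one by one."""
    while True:
        batch = [await queue.get()]
        while len(batch) < batch_size and not queue.empty():
            batch.append(queue.get_nowait())
        await fake_send(batch, sink)  # one round trip per batch
        for _ in batch:
            queue.task_done()

async def publish_all(messages, pool_size=8):
    """Fan messages out to a pool of concurrent producers."""
    queue = asyncio.Queue()
    sink = []
    workers = [asyncio.create_task(producer(queue, sink))
               for _ in range(pool_size)]
    for message in messages:
        queue.put_nowait(message)
    await queue.join()  # wait until every message is confirmed
    for worker in workers:
        worker.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return sink

delivered = asyncio.run(publish_all(range(10_000)))
print(len(delivered))
```

The gain comes from paying the round-trip cost once per batch and overlapping several batches in flight, rather than waiting on each message in turn. Delivery order across workers is not guaranteed in this sketch.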

The Data Volume Problem

"Wanting everything often leads to having nothing." Without filters, they received everything: 4 ISPs per router, internal mesh, POP internals, plus customer data.

Problem: Millions of entries per router and billions across the network, more than the database could handle.

Solution: Implement filters so that only customer data goes into the database, while the remaining data stays in Prometheus for operational visibility.
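A filter of this kind might look like the sketch below. It is an assumption-laden illustration rather than the project's code: the ASN set, the `route_update` function, and the update fields are hypothetical, a `Counter` stands in for Prometheus counters, and a list stands in for the database.

```python
from collections import Counter

# Hypothetical set of customer peer ASNs; a real deployment would load
# this from the provisioning system.
CUSTOMER_PEER_ASNS = {64512, 64513}

def route_update(update, db_rows, metrics):
    """Persist customer updates; only count everything else."""
    kind = "customer" if update["peer_asn"] in CUSTOMER_PEER_ASNS else "other"
    metrics[kind] += 1          # stand-in for a Prometheus counter
    if kind == "customer":
        db_rows.append(update)  # stand-in for a database insert

db_rows, metrics = [], Counter()
updates = [
    {"peer_asn": 64512, "prefix": "203.0.113.0/24"},   # customer
    {"peer_asn": 64496, "prefix": "198.51.100.0/24"},  # transit ISP
    {"peer_asn": 64497, "prefix": "192.0.2.0/24"},     # internal mesh
]
for update in updates:
    route_update(update, db_rows, metrics)

print(len(db_rows), dict(metrics))
```

Every update is still counted, so operational visibility is preserved, but only the customer subset reaches persistent storage.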

Implementation Lessons

Filter Early and Often

"Focus on what you need. Sounds simple, it's quite important." Don't push everything into your primary data store—be selective about what requires persistent storage versus operational metrics.

Open Source Customization

The BMP-to-Kafka library worked well but needed enhancements:

  • TLS Support: Added for secure communication

  • Peer-Down Notifications: Critical missing feature—database showed prefixes as active while entire peers were down

  • BGP Identifier Validation: Fixed upstream bug that caused warnings
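The peer-down gap is easy to see in a small model. The sketch below is hypothetical: the event shapes and the `apply_event` function are invented for illustration and do not reflect the library's message format. It shows why a consumer that handles only route events leaves stale "active" entries, and how a peer-down event corrects them.

```python
def apply_event(table, event):
    """Update prefix state keyed by (peer, prefix) from one event."""
    if event["type"] == "route":
        table[(event["peer"], event["prefix"])] = event["announced"]
    elif event["type"] == "peer_down":
        # Without this branch, prefixes stay marked active after the
        # session carrying them has gone away.
        for key in table:
            if key[0] == event["peer"]:
                table[key] = False

table = {}
events = [
    {"type": "route", "peer": "192.0.2.1",
     "prefix": "203.0.113.0/24", "announced": True},
    {"type": "route", "peer": "192.0.2.2",
     "prefix": "198.51.100.0/24", "announced": True},
    {"type": "peer_down", "peer": "192.0.2.1"},
]
for event in events:
    apply_event(table, event)

print(table)
```

Only the prefixes learned from the failed peer are marked inactive; entries from other peers are left untouched.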

Operational Metrics

Keep filtered-out data in Prometheus for network engineering teams: "That peer went down three days ago and we saw the flap—that's some side bonuses."

Results and Benefits

  • Closed-Loop Customer Feedback: No human intervention required for customer visibility into their prefix announcements

  • Push Model Reliability: "Whatever the router gets, gets pushed upstream; you don't miss anything"

  • Low Maintenance: "We did not have any, literally any issues with it"

  • Operational Insight: Network engineering teams gained visibility into peer states and flaps

  • Scalable Architecture: Handles service provider scale across multiple teams and systems

The Key Insight

"The biggest takeaway is focus on what you need. We enabled it and got everything—18 million entries from a single router—and it's kind of killing things along the way."

This principle applies beyond BMP to all automation projects: comprehensive data collection sounds appealing, but operational reality requires surgical precision in what you capture, process, and store.

Why This Matters

Dorlandt's presentation showcased the evolution from reactive (screen scraping) to proactive (push-based) network monitoring. BMP represents a fundamental shift in how we gather BGP intelligence—from resource-intensive polling to efficient, real-time data streams.

For service providers struggling with BGP visibility at scale, his architecture provides a proven pattern: leverage push protocols, implement smart filtering, and design for multi-team consumption. The customer feedback loop becomes a competitive advantage when automation handles the complexity behind the scenes.

The technical challenges—throughput optimization, data volume management, and open source customization—mirror issues many face when implementing monitoring at scale. His solutions provide practical guidance for similar deployments.

As network automation matures, moving beyond screen scraping toward protocol-native data collection becomes essential for both operational efficiency and customer experience. BMP offers a path forward for BGP-heavy environments.


Chris Grundemann

Executive advisor. Specializing in network infrastructure strategy and how to leverage your network to the greatest possible business advantage through technological and cultural transformation.

https://www.khadgaconsulting.com/