Step-by-step visual flows for the five core processes in this role: how SD-WAN routes traffic, how devices get onto the network, how changes get approved and executed, how incidents get resolved, and how the network recovers from a failure.
Praveendhra Rajkumar · p.rajkumar001@umb.edu · (857) 391-4257 · Framingham, MA
How SD-WAN Routes Traffic
Every packet that hits the Silver Peak appliance is classified by application type and then automatically sent down the best available WAN path. If that path degrades, voice and clinical traffic fail over automatically in under 2 seconds. This is what fixed the VoIP call drop problem.
🔗 From the work sample: Aruba Silver Peak SD-WAN · VoIP MOS improved from 2.8 to 4.2
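MOS (mean opinion score) rates call quality on a 1 to 5 scale. A common way monitoring tools estimate it from network measurements is the ITU-T G.107 E-model, which first computes a transmission rating R from delay, loss, and codec impairments and then maps R to MOS. The sketch below shows only that final mapping, assuming the standard G.107 formula; the function name `r_to_mos` is hypothetical, and whether SolarWinds VoIP Quality Manager uses exactly this calculation is an assumption, not something the work sample states.

```python
def r_to_mos(r: float) -> float:
    """Map an E-model transmission rating R to an estimated MOS (ITU-T G.107 mapping)."""
    if r < 0:
        return 1.0
    if r > 100:
        return 4.5  # the E-model caps estimated MOS at 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


if __name__ == "__main__":
    # R = 93.2 is the E-model rating with all default (unimpaired) parameters.
    for r in (50, 70, 80, 93.2):
        print(f"R = {r:5.1f} -> estimated MOS {r_to_mos(r):.2f}")
```

Under this mapping, MOS 2.8 corresponds to roughly R ≈ 54 and MOS 4.2 to roughly R ≈ 85, so the improvement moves calls from the lower end of the scale to near the E-model ceiling of 4.5.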
Legend: entry point · process / classification · decision / policy check · primary / success path · failover / degraded path · destination reached
📡 Incoming WAN Traffic
Branch office → Silver Peak appliance
🔍 Deep Packet Inspection (DPI)
App-ID classification · layer 7 fingerprinting
⚖️ Business Intent Overlay Assignment
Which overlay does this traffic belong to?
🎙 Voice / Realtime
SIP · RTP · H.323
Realtime-Voice Overlay
Latency <20ms · Jitter <5ms · Loss <0.1%
SLA Check on MPLS
Meets latency / loss threshold?
Route via MPLS
FEC + Packet Order Correction
SLA Fail → Failover to LTE
Auto switchover <2s
🏥 Clinical Data
Epic EHR · PACS · HL7
Clinical-Data Overlay
Latency <50ms · Loss <0.5%
SLA Check on MPLS
Meets threshold?
Route via MPLS
Priority queuing for EHR
SLA Fail → Failover to Broadband
Auto switchover <2s
💼 General Business
HTTP/S · Email · DNS
General-Business Overlay
Best-effort · Broadband preferred
Route via Broadband
Offloads MPLS capacity
Broadband Healthy
Traffic flows normally
Broadband Fail → MPLS Fallback
👥 Guest / Visitor
Internet Browsing
Guest Overlay
Best-effort · 10 Mbps cap
Route via Broadband
Rate limited · no MPLS access
Internet Delivered
Through captive portal VLAN
No failover, guest-only service
📊 SolarWinds VoIP Quality Manager
Continuous MOS monitoring post-delivery
✅ Application SLA Met · Traffic Delivered
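The overlay logic above (classify, check the primary transport against the overlay's SLA, fail over if it misses) can be sketched in a few lines. This is a simplified model, not Silver Peak's implementation: thresholds and transports are taken from the flow, but the overlay names, application labels, and the rule that a backup only needs to be up (not meet the SLA) are assumptions, and the guest 10 Mbps cap is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LinkHealth:
    """Measured health of one WAN transport (values are illustrative)."""
    up: bool
    latency_ms: float
    jitter_ms: float
    loss_pct: float


@dataclass(frozen=True)
class Overlay:
    name: str
    primary: str
    backup: Optional[str]                   # None = no failover (guest overlay)
    max_latency_ms: Optional[float] = None  # None = threshold not checked
    max_jitter_ms: Optional[float] = None
    max_loss_pct: Optional[float] = None

    def meets_sla(self, link: LinkHealth) -> bool:
        """A link passes if it is up and every configured threshold is met."""
        if not link.up:
            return False
        checks = (
            (self.max_latency_ms, link.latency_ms),
            (self.max_jitter_ms, link.jitter_ms),
            (self.max_loss_pct, link.loss_pct),
        )
        return all(limit is None or value < limit for limit, value in checks)


# Thresholds and transports follow the flow above; overlay names are hypothetical.
OVERLAYS: Dict[str, Overlay] = {
    "realtime-voice": Overlay("realtime-voice", "MPLS", "LTE", 20, 5, 0.1),
    "clinical-data": Overlay("clinical-data", "MPLS", "Broadband", 50, None, 0.5),
    "general-business": Overlay("general-business", "Broadband", "MPLS"),
    "guest": Overlay("guest", "Broadband", None),  # 10 Mbps cap not modeled here
}

# Hypothetical application labels standing in for DPI classification results.
APP_TO_OVERLAY = {
    "sip": "realtime-voice", "rtp": "realtime-voice", "h323": "realtime-voice",
    "epic-ehr": "clinical-data", "pacs": "clinical-data", "hl7": "clinical-data",
    "http": "general-business", "https": "general-business",
    "email": "general-business", "dns": "general-business",
}


def classify(app: str, segment: str) -> Overlay:
    """Guest-segment traffic always uses the guest overlay; otherwise map by app.
    Unknown apps default to general-business (an assumption)."""
    if segment == "guest":
        return OVERLAYS["guest"]
    return OVERLAYS[APP_TO_OVERLAY.get(app, "general-business")]


def select_path(overlay: Overlay, links: Dict[str, LinkHealth]) -> Optional[str]:
    """Use the primary if it meets SLA, else a backup that is up; None if no path."""
    if overlay.meets_sla(links[overlay.primary]):
        return overlay.primary
    if overlay.backup is not None and links[overlay.backup].up:
        return overlay.backup
    return None


if __name__ == "__main__":
    links = {
        "MPLS": LinkHealth(up=True, latency_ms=35, jitter_ms=8, loss_pct=0.3),
        "LTE": LinkHealth(up=True, latency_ms=60, jitter_ms=10, loss_pct=0.5),
        "Broadband": LinkHealth(up=True, latency_ms=25, jitter_ms=3, loss_pct=0.05),
    }
    for app, segment in [("rtp", "staff"), ("hl7", "staff"), ("dns", "staff"), ("https", "guest")]:
        overlay = classify(app, segment)
        print(f"{app:6} ({segment}) -> {overlay.name} via {select_path(overlay, links)}")
```

With MPLS at 35 ms, voice misses its 20 ms target and moves to LTE while clinical data (50 ms target) stays on MPLS, which is the per-application behavior the flow describes.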
How Devices Get onto the Network
Every device that plugs in or connects to Wi-Fi goes through this flow. ClearPass checks who you are and what device you're using, then puts you in exactly the right network segment: clinical staff, visitor, IoT device, or contractor. Nothing gets in by default.
🔗 From the work sample: Aruba ClearPass NAC · 802.1X · HIPAA Access Control
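The decision described above (identify the user or device, place it in exactly one segment, deny everything else) can be sketched as a small policy function. This is a simplified model that assumes 802.1X for staff and contractors, MAC authentication plus device profiling for IoT, and a captive portal for visitors; the field names, profile names, and segment labels are hypothetical, and real ClearPass enforcement policies are configured in the product rather than written as code.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical device profiles treated as known IoT / medical devices.
KNOWN_IOT_PROFILES = {"infusion-pump", "ip-camera", "badge-reader"}


@dataclass
class AccessRequest:
    method: str                           # "802.1x", "mac-auth", or "portal" (hypothetical labels)
    authenticated: bool = False           # identity check succeeded (802.1X only)
    user_group: Optional[str] = None      # directory group, e.g. "clinical", "contractor"
    device_profile: Optional[str] = None  # result of device profiling
    portal_accepted: bool = False         # visitor accepted captive-portal terms


def assign_segment(req: AccessRequest) -> str:
    """Return the network segment for a request; anything unmatched is denied."""
    if req.method == "802.1x" and req.authenticated:
        if req.user_group == "clinical":
            return "clinical-staff"
        if req.user_group == "contractor":
            return "contractor"
    elif req.method == "mac-auth" and req.device_profile in KNOWN_IOT_PROFILES:
        return "iot"
    elif req.method == "portal" and req.portal_accepted:
        return "visitor"
    return "deny"  # default deny: nothing gets in unless a rule matches


if __name__ == "__main__":
    requests = [
        AccessRequest("802.1x", authenticated=True, user_group="clinical"),
        AccessRequest("802.1x", authenticated=False, user_group="clinical"),
        AccessRequest("mac-auth", device_profile="infusion-pump"),
        AccessRequest("mac-auth", device_profile="unknown"),
        AccessRequest("portal", portal_accepted=True),
    ]
    for r in requests:
        print(r.method, "->", assign_segment(r))
```

Putting the deny at the end, rather than choosing a fallback segment, is what makes "nothing gets in by default" hold: an authenticated user in an unrecognized group is still denied.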
How a Change Gets Approved and Executed
No one just logs in and makes changes to production. Before anyone touches anything, every change goes through a risk review, a pre-change checklist, a documented rollback plan, and approval gates; afterward, it gets a post-change review. In a hospital, this process protects patients.
🔗 From the work sample: ServiceNow CHG · CAB approval · Reflects Bright Horizons production deployment discipline
Legend: start · process step · decision / gate · success path · rollback path
📥 Change Request Submitted
ServiceNow · engineer creates CHG record
🔎 Risk Assessment
Engineer documents: scope, impact, rollback plan, test plan
⚖️ Risk Classification?
Low / Normal / High
Low Risk
Standard Change
Pre-approved template · no CAB needed
Normal / High Risk
CAB Review
Change Advisory Board approval required
CAB Approved?
Rejected → Rework
Update plan, resubmit
✅ Pre-Change Checklist (7 items)
Includes: lab test · config backup (NCM) · rollback verified · stakeholders notified · maintenance mode set
🕐 Change Window Opens
Saturday 01:00–05:00 · clinical systems on standby
⚙️ Step-by-Step Implementation
Follow documented steps · update ServiceNow in real time
🧪 Validation Tests
Ping gateways · EHR connectivity · SolarWinds all-green?
✅ Validation Pass
Maintenance Mode Off
SolarWinds alerts re-enabled
Stakeholders Notified
Email + ServiceNow update
CHG Record Closed
❌ Validation Fail (within SLA)
Rollback Initiated
Reconnect old hardware · restore config
P1 Incident Created
ServiceNow P1 · notify on-call
CAB Post-Mortem
Within 48 hours · updated risk plan
📊 Post-Change Review (1 week)
SolarWinds trends · any unintended impact?
📁 Change Knowledge Base Updated · Lessons Learned Captured
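The gates in this flow (risk-based routing, CAB approval, the pre-change checklist, and the Saturday window) can be expressed as a pre-flight check that lists every reason a change cannot start yet. A minimal sketch: it models only the five checklist items named above (the flow counts seven, but the other two are not named), assumes the window is evaluated in local time, and all class and function names are hypothetical, not ServiceNow APIs.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime

# The five checklist items named in the flow; the remaining two are not listed.
CHECKLIST = ("lab_test", "config_backup", "rollback_verified",
             "stakeholders_notified", "maintenance_mode_set")


@dataclass
class ChangeRequest:
    number: str
    risk: str                      # "low", "normal", or "high"
    cab_approved: bool = False
    checklist: dict[str, bool] = field(default_factory=dict)


def approval_route(chg: ChangeRequest) -> str:
    """Low risk uses a pre-approved standard template; normal/high goes to CAB."""
    if chg.risk == "low":
        return "standard"
    if chg.risk in ("normal", "high"):
        return "cab"
    raise ValueError(f"unknown risk level: {chg.risk!r}")


def in_change_window(ts: datetime) -> bool:
    """Saturday 01:00 up to (not including) 05:00, local time."""
    return ts.weekday() == 5 and 1 <= ts.hour < 5


def blockers(chg: ChangeRequest, now: datetime) -> list[str]:
    """Reasons implementation cannot start yet; an empty list means go."""
    reasons = []
    if approval_route(chg) == "cab" and not chg.cab_approved:
        reasons.append("awaiting CAB approval")
    missing = [item for item in CHECKLIST if not chg.checklist.get(item, False)]
    if missing:
        reasons.append("checklist incomplete: " + ", ".join(missing))
    if not in_change_window(now):
        reasons.append("outside change window")
    return reasons


def next_steps(validation_passed: bool) -> list[str]:
    """Post-implementation branch: close out on success, roll back on failure."""
    if validation_passed:
        return ["maintenance mode off", "notify stakeholders", "close CHG"]
    return ["roll back", "open P1 incident", "CAB post-mortem within 48 hours"]


if __name__ == "__main__":
    chg = ChangeRequest("CHG-EXAMPLE", risk="high",  # hypothetical record number
                        checklist={item: True for item in CHECKLIST})
    saturday_2am = datetime(2024, 6, 1, 2, 0)  # 1 June 2024 was a Saturday
    print(blockers(chg, saturday_2am))  # ['awaiting CAB approval']
    chg.cab_approved = True
    print(blockers(chg, saturday_2am))  # []
```

Returning all blockers at once, instead of stopping at the first, mirrors how a checklist review works in practice: the engineer sees everything outstanding before the window opens.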
How an Incident Gets Resolved for Good
From the first alert to the last corrective action. This is the real BPDU storm incident from the work sample, showing how the network team detected it, found the root cause, fixed it immediately, and then made sure it cannot happen again.
🔗 From the work sample: RCA section · SolarWinds NTA alert · BPDU Guard remediation
Legend: trigger · response action · classification / decision · recovery / success · escalation · documentation / post-incident
🚨 Trigger: Incident Detected
SolarWinds threshold breach · user report · NOC alert
📋 Real Example: BPDU Storm
09:15 · SolarWinds NTA: VLAN 200 packet loss 3.2% ← VoIP call drops begin
🏷️ Severity Classification
P1 (all-hands) / P2 (on-call) / P3-P4 (queue)
P1 / P2: Immediate
On-Call Engineer Paged
ServiceNow alert · 5 min response SLA
Leadership Notified (P1)
Network Lead + IT Director
P3 / P4: Standard
Team Queue
Next business day · standard ticket
🔬 Problem Isolation
SolarWinds · NetFlow top-talker · CLI (show interface / show log) · Wireshark if needed
🔦 Root Cause Identified
09:35 · An unmanaged switch from facilities created an STP loop. BPDU Guard was not enforced on that port, so the loop flooded VLAN 200.
🔧 Immediate Fix Applied
09:50 · Port shut · BPDU Guard triggered · STP stabilized
✅ Validate: Service Restored?
Packet loss < 0.05% · VoIP calls working · SolarWinds green
Monitoring Updated · Runbook Updated · Problem Cannot Recur
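The triage branch and the timing of the BPDU storm example can be checked with a short sketch: severity decides who gets engaged, and the RCA timestamps give time to root cause and time to fix. The function names are hypothetical; the 5-minute paging SLA, the 0.05% packet-loss restoration threshold, and the 09:15 / 09:35 / 09:50 times come from the flow above. Only the packet-loss part of the validation step is modeled; "VoIP calls working" and "SolarWinds green" are not.

```python
from datetime import datetime


def triage(priority: str) -> list:
    """Who gets engaged for each severity, per the flow above."""
    if priority in ("P1", "P2"):
        actions = ["page on-call engineer (5-minute response SLA)"]
        if priority == "P1":
            actions.append("notify Network Lead and IT Director")
        return actions
    if priority in ("P3", "P4"):
        return ["add to team queue (next business day)"]
    raise ValueError(f"unknown priority: {priority!r}")


def minutes_between(start: str, end: str) -> int:
    """Minutes between two same-day HH:MM timestamps."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def service_restored(packet_loss_pct: float) -> bool:
    """Packet-loss criterion from the validation step: below 0.05%."""
    return packet_loss_pct < 0.05


if __name__ == "__main__":
    timeline = {"detected": "09:15", "root_cause": "09:35", "fixed": "09:50"}
    print(triage("P1"))
    print(minutes_between(timeline["detected"], timeline["root_cause"]))  # 20
    print(minutes_between(timeline["detected"], timeline["fixed"]))       # 35
    print(service_restored(3.2))                                          # False
```

In this incident, the root cause was found 20 minutes after detection and the fix was in place 35 minutes after detection; the 3.2% loss at detection fails the restoration check by a wide margin.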
What Happens When the Core Switch Goes Down
The exact steps from the moment SolarWinds fires a critical alert to the moment clinical systems are confirmed working and the P1 is closed. No guessing, no "we'll figure it out": a tested runbook ready to execute at 2 a.m.
🔗 From the work sample: DR Runbook section · Zoho quarterly DR switchovers (99.9% uptime) · Bright Horizons resilience testing
Legend: alert / trigger · runbook action · diagnosis decision · recovery path · communication / documentation
🚨 SolarWinds NPM: CRITICAL
Core-A-NEXUS-01 unreachable · multiple dependent nodes down