Network Process Flows

Step-by-step visual flows for the five core processes in this role: how SD-WAN routes traffic, how devices get onto the network, how changes get approved and executed, how incidents get resolved, and how the network recovers from a failure.

Praveendhra Rajkumar  ·  p.rajkumar001@umb.edu  ·  (857) 391-4257  ·  Framingham, MA
How SD-WAN Routes Traffic
Every packet that hits the Silver Peak appliance gets classified by application type, then automatically sent down the best available WAN path, with automatic failover (under 2 seconds for voice and clinical traffic) if that path degrades. This is what fixed the VoIP call drop problem.
🔗 From the work sample: Aruba Silver Peak SD-WAN · VoIP MOS improved from 2.8 to 4.2
Entry point
Process / Classification
Decision / Policy Check
Primary / Success path
Failover / Degraded path
Destination reached
📡
Incoming WAN Traffic
Branch office → Silver Peak appliance
🔍
Deep Packet Inspection (DPI)
App-ID classification · layer 7 fingerprinting
⚖️
Business Intent Overlay Assignment
Which overlay does this traffic belong to?
🎙 Voice / Realtime
SIP · RTP · H.323
Realtime-Voice Overlay
Latency <20ms · Jitter <5ms · Loss <0.1%
SLA Check on MPLS
Meets latency / loss threshold?
Route via MPLS
FEC + Packet Order Correction
SLA Fail → Failover to LTE
Auto switchover <2s
🏥 Clinical Data
Epic EHR · PACS · HL7
Clinical-Data Overlay
Latency <50ms · Loss <0.5%
SLA Check on MPLS
Meets threshold?
Route via MPLS
Priority queuing for EHR
SLA Fail → Failover Broadband
Auto switchover <2s
💼 General Business
HTTP/S · Email · DNS
General-Business Overlay
Best-effort · Broadband preferred
Route via Broadband
Offloads MPLS capacity
Broadband Healthy
Traffic flows normally
Broadband Fail → MPLS Fallback
👥 Guest / Visitor
Internet Browsing
Guest Overlay
Best-effort · 10 Mbps cap
Route via Broadband
Rate limited · no MPLS access
Internet Delivered
Through captive portal VLAN
No failover, guest-only service
📊
SolarWinds VoIP Quality Manager
Continuous MOS monitoring post-delivery
Application SLA Met · Traffic Delivered
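The overlay logic in the flow above can be sketched in a few lines of Python. This is an illustration only, not Silver Peak configuration: the class names, the `select_path` function, and the dictionary keys are hypothetical, and the thresholds are copied from the diagram. The sketch also assumes a fallback path only needs to be up, since the diagram does not show an SLA check on the fallback.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class PathStats:
    """Measured health of one WAN path (hypothetical structure)."""
    latency_ms: float
    jitter_ms: float
    loss_pct: float
    up: bool = True


@dataclass(frozen=True)
class Overlay:
    """One business intent overlay; a limit of None means 'no threshold'."""
    name: str
    preferred: str
    fallback: Optional[str]
    max_latency_ms: Optional[float] = None
    max_jitter_ms: Optional[float] = None
    max_loss_pct: Optional[float] = None


# Thresholds and path preferences taken from the diagram.
OVERLAYS = {
    "voice": Overlay("Realtime-Voice", "MPLS", "LTE", 20, 5, 0.1),
    "clinical": Overlay("Clinical-Data", "MPLS", "Broadband", 50, None, 0.5),
    "business": Overlay("General-Business", "Broadband", "MPLS"),
    "guest": Overlay("Guest", "Broadband", None),  # no failover, no MPLS access
}


def meets_sla(overlay: Overlay, stats: PathStats) -> bool:
    """True if the path is up and under every threshold the overlay defines."""
    if not stats.up:
        return False
    checks = [
        (overlay.max_latency_ms, stats.latency_ms),
        (overlay.max_jitter_ms, stats.jitter_ms),
        (overlay.max_loss_pct, stats.loss_pct),
    ]
    return all(limit is None or value < limit for limit, value in checks)


def select_path(traffic_class: str, paths: Dict[str, PathStats]) -> Optional[str]:
    """Return the WAN path for a traffic class, or None if nothing is usable."""
    overlay = OVERLAYS[traffic_class]
    preferred = paths.get(overlay.preferred)
    if preferred is not None and meets_sla(overlay, preferred):
        return overlay.preferred
    # Assumption: the fallback path only has to be up.
    if overlay.fallback is not None:
        fallback = paths.get(overlay.fallback)
        if fallback is not None and fallback.up:
            return overlay.fallback
    return None


if __name__ == "__main__":
    sample = {
        "MPLS": PathStats(latency_ms=35, jitter_ms=8, loss_pct=0.3),
        "Broadband": PathStats(latency_ms=40, jitter_ms=10, loss_pct=0.2),
        "LTE": PathStats(latency_ms=60, jitter_ms=12, loss_pct=0.4),
    }
    for traffic_class in OVERLAYS:
        print(traffic_class, "->", select_path(traffic_class, sample))
```

With the degraded MPLS numbers in the sample, voice fails its 20 ms latency threshold and moves to LTE, while clinical data still fits inside its looser 50 ms threshold and stays on MPLS.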
How Devices Get onto the Network
Every device that plugs in or connects to Wi-Fi goes through this flow. ClearPass checks who you are and what device you're using, then puts you in exactly the right network segment: clinical staff, visitor, IoT device, or contractor. Nothing gets in by default.
🔗 From the work sample: Aruba ClearPass NAC · 802.1X · HIPAA Access Control
Entry
Process
Policy decision
Access granted
Access denied / quarantine
AD lookup
🔌
Client Device Connects to Port
Wired or wireless, any campus segment
📡
Port Detects Connection
CDP / LLDP profiling · device fingerprinting begins
🔏
802.1X EAP Request Sent
Does device respond to EAP?
Path A: EAP-TLS
Device presents certificate
Machine cert from internal CA
Cert Valid?
Check against PKI trust store
AD Computer Object Lookup
Is device joined to domain?
AD Group?
Clinical Staff
→ VLAN 100 · dACL: permit-clinical-apps
Path B: PEAP Fallback
No machine cert
PEAP-MSCHAPv2 user credentials
AD User Auth
Username + password validated
AD Group?
Contractor
→ VLAN 500 Guest · redirect to IT portal
Path C: MAC Auth Bypass
No 802.1X response
IoT / headless device
MAC in known IoT list?
ClearPass device database
Known IoT Device
→ VLAN 300 · dACL: healthcare servers only
Unknown Device
→ DENY · Quarantine VLAN · alert fired
📝
ClearPass Logs Access Event
Username · device · VLAN · timestamp · policy matched
🛡️
Device on correct VLAN · Least-privilege enforced
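The three authentication paths above can be read as a single decision function. The Python below is a rough sketch of that logic, not ClearPass policy syntax: `Device`, `Decision`, `authorize`, and the sample MAC address are all hypothetical. The diagram only shows the outcome for the Clinical Staff and Contractor groups, so the sketch assumes every other case (invalid certificate, unlisted AD group, failed login) lands in quarantine, in line with "nothing gets in by default".

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Device:
    """What the policy server knows about a connecting device (hypothetical)."""
    mac: str
    responds_to_eap: bool
    cert_valid: bool = False          # machine cert chains to the internal CA
    domain_joined: bool = False       # AD computer object found
    user_authenticated: bool = False  # PEAP-MSCHAPv2 credentials accepted
    ad_group: Optional[str] = None


@dataclass(frozen=True)
class Decision:
    method: str
    vlan: str
    enforcement: str


# Default-deny outcome from the diagram: quarantine VLAN plus an alert.
QUARANTINE = Decision("none", "Quarantine VLAN", "deny + alert")

# Hypothetical stand-in for the ClearPass device database.
KNOWN_IOT_MACS: FrozenSet[str] = frozenset({"00:11:22:33:44:55"})


def authorize(device: Device, known_iot: FrozenSet[str] = KNOWN_IOT_MACS) -> Decision:
    """Walk paths A, B, and C in order and return the access decision."""
    if device.responds_to_eap:
        # Path A: machine certificate, then AD computer and group lookup.
        if (device.cert_valid and device.domain_joined
                and device.ad_group == "Clinical Staff"):
            return Decision("EAP-TLS", "VLAN 100", "dACL: permit-clinical-apps")
        # Path B: no usable machine cert, fall back to user credentials.
        if device.user_authenticated and device.ad_group == "Contractor":
            return Decision("PEAP-MSCHAPv2", "VLAN 500 Guest",
                            "redirect to IT portal")
        # Assumption: anything the diagram does not cover is quarantined.
        return QUARANTINE
    # Path C: no 802.1X response, so try MAC Auth Bypass.
    if device.mac.lower() in known_iot:
        return Decision("MAB", "VLAN 300", "dACL: healthcare servers only")
    return QUARANTINE


if __name__ == "__main__":
    pump = Device(mac="00:11:22:33:44:55", responds_to_eap=False)
    print(authorize(pump))
```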
How a Network Change Gets Made
No one just logs in and makes changes to production. Every change has a risk review, a checklist, a rollback plan, and approval gates before anyone touches anything, and a post-review afterward. In a hospital, this process protects patients.
🔗 From the work sample: ServiceNow CHG · CAB approval · Reflects Bright Horizons production deployment discipline
Start
Process step
Decision / gate
Success path
Rollback path
📥
Change Request Submitted
ServiceNow · engineer creates CHG record
🔎
Risk Assessment
Engineer documents: scope, impact, rollback plan, test plan
⚖️
Risk Classification?
Low / Normal / High
Low Risk
Standard Change
Pre-approved template · no CAB needed
Normal / High Risk
CAB Review
Change Advisory Board approval required
CAB Approved?
Rejected → Rework
Update plan, resubmit
Pre-Change Checklist (7 items)
Lab test · config backup (NCM) · rollback verified · stakeholders notified · maintenance mode set
🕐
Change Window Opens
Saturday 01:00–05:00 · clinical systems on standby
⚙️
Step-by-Step Implementation
Follow documented steps · update ServiceNow in real-time
🧪
Validation Tests
Ping gateways · EHR connectivity · SolarWinds all-green?
✅ Validation Pass
Maintenance Mode Off
SolarWinds alerts re-enabled
Stakeholders Notified
Email + ServiceNow update
CHG Record Closed
❌ Validation Fail (within SLA)
Rollback Initiated
Reconnect old hardware · restore config
P1 Incident Created
ServiceNow P1 · notify on-call
CAB Post-Mortem
Within 48 hours · updated risk plan
📊
Post-Change Review (1 week)
SolarWinds trends · any unintended impact?
📁
Change Knowledge Base Updated · Lessons Learned Captured
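The approval gates in this flow can be summarized as three small checks. The Python below is a sketch under stated assumptions, not a ServiceNow integration: the function names and the `Risk` enum are hypothetical, and the checklist holds only the five items the diagram names.

```python
from enum import Enum
from typing import Iterable, List


class Risk(Enum):
    LOW = "low"
    NORMAL = "normal"
    HIGH = "high"


# The checklist items named in the diagram.
CHECKLIST = (
    "lab test",
    "config backup (NCM)",
    "rollback verified",
    "stakeholders notified",
    "maintenance mode set",
)


def approval_route(risk: Risk) -> str:
    """Low risk uses a pre-approved template; everything else goes to CAB."""
    return "standard change" if risk is Risk.LOW else "CAB review"


def ready_for_window(risk: Risk, cab_approved: bool,
                     completed: Iterable[str]) -> bool:
    """A change may enter the window only after approval and a full checklist."""
    if risk is not Risk.LOW and not cab_approved:
        return False  # rejected or pending: rework and resubmit
    done = set(completed)
    return all(item in done for item in CHECKLIST)


def after_validation(passed: bool) -> List[str]:
    """Return the follow-up steps for the pass and fail branches."""
    if passed:
        return ["maintenance mode off", "stakeholders notified",
                "CHG record closed"]
    return ["rollback initiated", "P1 incident created", "CAB post-mortem"]


if __name__ == "__main__":
    print(approval_route(Risk.HIGH))
    print(ready_for_window(Risk.HIGH, cab_approved=True, completed=CHECKLIST))
    print(after_validation(passed=False))
```

Keeping the gates as pure functions makes the rule explicit: a normal or high risk change with an incomplete checklist never reaches the change window, whatever the CAB decided.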
How an Incident Gets Resolved for Good
From the first alert to the last corrective action. This is the real BPDU storm incident from the work sample, showing how the network team detected it, found the root cause, fixed it immediately, and then put controls in place so it cannot happen again.
🔗 From the work sample: RCA section · SolarWinds NTA alert · BPDU Guard remediation
Trigger
Response action
Classification / decision
Recovery / success
Escalation
Documentation / post-incident
🚨
Trigger: Incident Detected
SolarWinds threshold breach · user report · NOC alert
📋
Real Example: BPDU Storm
09:15 · SolarWinds NTA: VLAN 200 packet loss 3.2% ← VoIP call drops begin
🏷️
Severity Classification
P1 (all-hands) / P2 (on-call) / P3-P4 (queue)
P1 / P2: Immediate
On-Call Engineer Paged
ServiceNow alert · 5 min response SLA
Leadership Notified (P1)
Network Lead + IT Director
P3 / P4: Standard
Team Queue
Next business day · standard ticket
🔬
Problem Isolation
SolarWinds · NetFlow top-talker · CLI (show interface / show log) · Wireshark if needed
🔦
Root Cause Identified
09:35 · Unmanaged switch from facilities created an STP loop. BPDU Guard was not enforced on the port, so the loop flooded VLAN 200.
🔧
Immediate Fix Applied
09:50 · Port shut · BPDU Guard triggered · STP stabilized
Validate: Service Restored?
Packet loss < 0.05% · VoIP calls working · SolarWinds green
Restored ✓
Incident Closed in ServiceNow
10:00 · resolution documented
Not Restored
Escalate / Alternate Fix
Engage TAC · bridge call · DR if needed
📄
RCA Document Written (P1/P2)
5-Why analysis · timeline · contributing factors · impact
🛠️
Corrective Actions (4-tier)
Immediate (same day) · Short-term (1 week) · Long-term (quarter) · Process update
📢
Real Example: Corrective Actions
BPDU Guard enabled globally via NCM script · ClearPass port-auto-shut policy · IDF physical security audit · NOC runbook updated
🛡️
Monitoring Updated · Runbook Updated · Problem Cannot Recur
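The timestamps in the real example give the response times that would go into the RCA timeline. The Python below is a small worked calculation using only the four times shown in the flow, plus a restore check built from the validation criteria. The function names and event labels are hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Tuple

# Timestamps taken from the BPDU storm example in the flow.
TIMELINE: List[Tuple[str, str]] = [
    ("detected", "09:15"),
    ("root cause identified", "09:35"),
    ("fix applied", "09:50"),
    ("incident closed", "10:00"),
]


def minutes_since_detection(timeline: List[Tuple[str, str]]) -> Dict[str, int]:
    """Map each event to whole minutes elapsed since the first event."""
    def parse(clock: str) -> datetime:
        return datetime.strptime(clock, "%H:%M")

    start = parse(timeline[0][1])
    return {
        label: int((parse(clock) - start).total_seconds() // 60)
        for label, clock in timeline
    }


def service_restored(loss_pct: float, voip_ok: bool,
                     solarwinds_green: bool) -> bool:
    """Validation gate from the flow: loss under 0.05%, calls working, all green."""
    return loss_pct < 0.05 and voip_ok and solarwinds_green


if __name__ == "__main__":
    for label, minutes in minutes_since_detection(TIMELINE).items():
        print(f"{label}: +{minutes} min")
```

On these numbers the root cause was found 20 minutes after detection, the fix landed at 35 minutes, and the incident closed at 45 minutes.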
What Happens When the Core Switch Goes Down
The exact steps from the moment SolarWinds fires a critical alert to the moment clinical systems are confirmed working and the P1 is closed. No guessing, no "we'll figure it out": a tested runbook ready to execute at 2 a.m.
🔗 From the work sample: DR Runbook section · Zoho quarterly DR switchovers (99.9% uptime) · Bright Horizons resilience testing
Alert / trigger
Runbook action
Diagnosis decision
Recovery path
Communication / documentation
🚨
SolarWinds NPM: CRITICAL
Core-A-NEXUS-01 unreachable · multiple dependent nodes down
📟
On-Call Engineer Paged
ServiceNow P1 incident auto-created · 5-minute response target
👁️
DETECT: Physical + OOB Check
Console cable via OOB MGMT network · check LEDs · "show system resources" · "show version"
🔎
ISOLATE: Failure Type?
Hardware failure vs Software/process crash
Software / Process Crash
Identify Failed Process
"show processes cpu sort" · syslog check
Restart Service / Reload Module
In-service restart if supported
Restore from Backup Image
SolarWinds NCM last-known-good config
Hardware Failure
Redundant Supervisor?
Is NSF/SSO configured?
Yes: Auto-Failover
NSF/SSO Kicks In
<30 sec · zero downtime · log confirms sup switchover
No: Chassis Failure
Stage DR Spare
Pre-racked Nexus 9508 in DR rack
Load NCM Config
SCP from jump host · last nightly backup
Reconnect Fiber Uplinks
IDF patch panel · label map in Visio
🔁
Verify Routing Adjacencies
"show ip ospf neighbor" · "show bgp summary" · all neighbors should re-establish within 5 min
🧪
VALIDATE
/scripts/validate-core.sh: ping all 20 critical server IPs · SolarWinds all-green?
📞
Confirm Clinical Access
Call charge nurses on each floor · "Is Epic working?"
📣
COMMUNICATE: Every 15 min
Update ServiceNow P1 · notify Network Lead + IT Leadership · estimated restore time
Service Restored
RTO met (<30 min target) · P1 closed in ServiceNow
📄
RCA & Post-Incident Review
Within 48 hours · how to prevent recurrence · runbook updated
🏁
Disaster Recovery Complete · Uptime Preserved
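The isolate, validate, and RTO steps of the runbook can be expressed as three small functions. The Python below is a sketch only. It does not reproduce the contents of /scripts/validate-core.sh, which the source does not show. The function names, the injected `probe` callable, and the sample addresses (from the 192.0.2.0/24 documentation range) are hypothetical.

```python
from datetime import datetime
from typing import Callable, Iterable, List


def recovery_steps(failure_type: str,
                   redundant_supervisor: bool = False) -> List[str]:
    """Return the runbook branch for the diagnosed failure type."""
    if failure_type == "software":
        return ["identify failed process",
                "restart service / reload module",
                "restore from backup image"]
    if failure_type == "hardware":
        if redundant_supervisor:
            return ["NSF/SSO supervisor switchover"]
        return ["stage DR spare", "load NCM config",
                "reconnect fiber uplinks"]
    raise ValueError(f"unknown failure type: {failure_type!r}")


def validate_core(hosts: Iterable[str],
                  probe: Callable[[str], bool]) -> List[str]:
    """Return the hosts that failed the reachability probe (empty means pass).

    The probe is injected so the check can wrap ping, a TCP connect,
    or a stub in tests.
    """
    return [host for host in hosts if not probe(host)]


def rto_met(alert_clock: str, restore_clock: str,
            target_min: int = 30) -> bool:
    """True if service came back in under the RTO target (same-day HH:MM)."""
    fmt = "%H:%M"
    elapsed = datetime.strptime(restore_clock, fmt) - datetime.strptime(alert_clock, fmt)
    return elapsed.total_seconds() / 60 < target_min


if __name__ == "__main__":
    critical = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
    unreachable = {"192.0.2.11"}  # simulated outage for the demo
    print(recovery_steps("hardware", redundant_supervisor=False))
    print(validate_core(critical, probe=lambda host: host not in unreachable))
    print(rto_met("02:00", "02:25"))
```

Returning the list of failed hosts rather than a bare pass/fail gives the on-call engineer something concrete to put in the 15-minute ServiceNow updates.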