ePDG Monitoring [VAS Experts Documentation]


Integrated VoWiFi Gateway Monitoring System (ePDG)

Solution overview

The VAS Experts ePDG Monitoring system provides full operational monitoring of the fast-epdg component, the VoWiFi (Voice over Wi-Fi) gateway that operates according to 3GPP TS 29.273 and TS 24.302. The gateway secures voice and packet traffic carried over untrusted Wi-Fi access using IPsec/IKEv2 tunnels, and integrates with the EPC core over the SWu, SWm, SWx, S2b and S6b interfaces.

The solution gives the mobile operator's operations teams a single monitoring platform covering everything from IPsec SAs (L3 security) up to VoWiFi subscriber-experience KPIs.

Key advantages

  • Real-time monitoring: metrics are updated every 10-15 seconds, and the state of IKE SAs, Child SAs and GTP tunnels is shown directly on NOC (Network Operations Center) dashboards without delayed aggregation.
  • Proactive anomaly detection: 20+ alarms with automatic severity escalation. PGW/AAA unreachability, rising IKEv2 latency and growing EAP-AKA error counts are detected before subscribers notice problems with their calls.
  • Open integration interfaces: Prometheus, SNMP v2c, Alertmanager webhooks and Grafana. The system plugs into existing NMS/OSS infrastructure without vendor lock-in.
  • Minimal external dependencies: fast-epdg exposes a built-in /metrics endpoint, so no Java, JMX or external agents are required.
  • Coverage of the full SWu → S2b stack: IKEv2 (SWu), Diameter SWm/SWx/S6b, GTPv2-C (S2b) and the GTP-U data plane, all in one place. The 33 metrics span both the control plane and the data plane.
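Because the collection layer is plain Prometheus text format over HTTP, an NMS or script can read /metrics directly without an agent. The following is a minimal Python sketch of parsing that output, assuming the standard Prometheus text exposition format; the metric names used below (epdg_ike_sa_active, epdg_ikev2_messages_total) and the host/port in the URL are hypothetical, not confirmed fast-epdg names.

```python
import re
import urllib.request
from typing import Dict, List, Tuple

# metric_name{optional labels} value [optional timestamp]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?\s*$'
)
# label_name="label value" (escape sequences are kept as-is, not unescaped)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_exposition(text: str) -> List[Sample]:
    """Parse Prometheus text-format lines into (name, labels, value) tuples.

    Blank lines and comment lines (# HELP / # TYPE) are skipped.
    """
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_RE.match(line)
        if m is None:
            raise ValueError(f"unparseable line: {line!r}")
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        # float() accepts the special values "+Inf", "-Inf" and "NaN".
        samples.append((m.group("name"), labels, float(m.group("value"))))
    return samples


def scrape(url: str, timeout: float = 5.0) -> List[Sample]:
    """Fetch and parse a metrics page, e.g. a hypothetical http://epdg-host:9100/metrics."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_exposition(resp.read().decode("utf-8"))
```

For example, parse_exposition('epdg_ike_sa_active 42') returns [('epdg_ike_sa_active', {}, 42.0)]. In production, Prometheus itself does this scraping; a parser like this is only useful for ad-hoc checks or lightweight integrations.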

Four-layer monitoring architecture

Layer | Component | Technology
Collection | Built-in fast-epdg /metrics endpoint | Prometheus text format over HTTP
Storage | Prometheus TSDB | Local storage, 15-day retention by default
Visualization | Grafana with JSON dashboards | 4 dashboards loaded automatically
Alerting | Alertmanager + SNMP Trap Sender | PromQL rules → webhook → SNMP v2c trap
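The alerting chain ends with a webhook that is turned into an SNMP v2c trap. As a rough sketch of that mapping step, the Python below reads a standard Alertmanager webhook body (its "alerts" list with "status", "labels" and "annotations") and builds one varbind list per alert. The first two varbinds use the standard SNMPv2-MIB OIDs for sysUpTime.0 and snmpTrapOID.0; the enterprise subtree 1.3.6.1.4.1.99999 and the "severity" and "summary" keys are hypothetical placeholders, not the OIDs or labels used by the actual SNMP Trap Sender. Actually emitting the trap would be delegated to an SNMP library.

```python
import json
from typing import List, Tuple

# Standard SNMPv2 trap header varbinds (SNMPv2-MIB).
SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"
SNMP_TRAP_OID = "1.3.6.1.6.3.1.1.4.1.0"

# HYPOTHETICAL enterprise subtree, for illustration only.
BASE = "1.3.6.1.4.1.99999.1"
TRAP_FIRING = BASE + ".0.1"
TRAP_RESOLVED = BASE + ".0.2"
VB_ALERTNAME = BASE + ".1.1"
VB_SEVERITY = BASE + ".1.2"
VB_SUMMARY = BASE + ".1.3"

VarBind = Tuple[str, str]


def alerts_to_traps(payload: str, uptime_ticks: int) -> List[List[VarBind]]:
    """Convert an Alertmanager webhook JSON body into one varbind list per alert."""
    body = json.loads(payload)
    traps: List[List[VarBind]] = []
    for alert in body.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        # Firing and resolved alerts get different (hypothetical) trap OIDs.
        trap_oid = TRAP_FIRING if alert.get("status") == "firing" else TRAP_RESOLVED
        traps.append([
            (SYS_UPTIME_OID, str(uptime_ticks)),
            (SNMP_TRAP_OID, trap_oid),
            (VB_ALERTNAME, labels.get("alertname", "")),
            (VB_SEVERITY, labels.get("severity", "")),
            (VB_SUMMARY, annotations.get("summary", "")),
        ])
    return traps
```

Keeping the mapping as a pure function (JSON in, varbinds out) makes it easy to test separately from the network code that sends the traps.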

Metrics by category

Category | Metrics | Scrape interval | Key indicators
Config | 2 | 10 s | Configuration status, reload counter
Network | 1 | 10 s | Node connection status (PGW/AAA/HSS)
IKEv2 (SWu) | 3 | 10 s | Messages by type (IKE_SA_INIT, IKE_AUTH, CREATE_CHILD_SA), latency histogram, errors
GTPv2-C (S2b) | 4 | 10 s | Messages (Create/Modify/Delete Session), latency, errors, retransmissions
GTP-U data plane | 3 | 10 s | Packets/bytes, tunneling errors
Diameter (SWm/SWx/S6b) | 5 | 10 s | Messages by command code (DER/DEA, MAR/MAA, AAR/AAA), latency, errors, watchdog, connection status
Service KPI | 4 | 10 s | Attempt success rate (%), duration histogram, service availability, uptime
Session State | 4 | 10 s | IKE SAs, Child SAs, GTP sessions, total users
Application | 3 | 10 s | Thread count, memory, log messages by level
System | 4 | 10 s | CPU utilization, memory, memory utilization, open file descriptors
Total | 33 | |
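Several categories above expose latency or duration as histograms, which are stored as cumulative buckets rather than individual values. As an illustration of how a percentile is estimated from them, here is a minimal Python sketch of the linear interpolation performed by PromQL's histogram_quantile(). In Prometheus the equivalent query would look like histogram_quantile(0.9, sum by (le) (rate(epdg_ikev2_latency_seconds_bucket[5m]))), where the metric name is hypothetical.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Follows the interpolation used by PromQL's histogram_quantile():
    observations are assumed to be spread evenly inside each bucket,
    the first bucket's lower edge is 0, and a result that lands in the
    +Inf bucket is reported as the highest finite upper bound.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                return buckets[-2][0] if len(buckets) > 1 else math.nan
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            # Linear interpolation inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return buckets[-2][0]  # not reached: the +Inf bucket always holds `total`
```

For instance, with buckets [(0.05, 50), (0.1, 80), (0.25, 95), (0.5, 100), (inf, 100)] (seconds, cumulative counts), the 90th percentile falls in the 0.1-0.25 s bucket and is interpolated to about 0.2 s. The estimate is only as precise as the bucket boundaries allow.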

Alarm categories

Severity | Alarms | Description | Response
Critical | ePDG_Service_Down, ePDG_High_Attach_Failure_Rate, ePDG_PGW_Unreachable, ePDG_AAA_Unreachable, ePDG_Diameter_Watchdog_Timeout | Component unavailable, widespread connection failures, nodes unreachable | Immediate escalation via email, SNMP trap and webhook; repeated every hour
Warning | ePDG_High_IKEv2_Latency, ePDG_High_GTP_Latency, ePDG_High_IKEv2_Error_Rate, ePDG_High_GTP_Error_Rate, ePDG_High_Memory_Usage, ePDG_High_CPU_Usage, ePDG_Low_Disk_Space, ePDG_High_Error_Log_Rate | Performance degradation, resource anomalies | Email; repeated every 4 hours; suppressed while a Critical alarm is active on the same component
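The notification policy in this table (channels per severity, 1-hour and 4-hour repeat intervals, and suppression of Warning alarms while a Critical alarm is active on the same component) can be summarized as a small decision function. The Python below is only a sketch of that policy; the Alert type, the "component" field and the channel names are hypothetical. In Alertmanager itself, this behavior is normally configured with inhibit_rules and a per-route repeat_interval rather than written as code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Policy from the alarm table; channel names are illustrative.
CHANNELS = {"critical": ["email", "snmp_trap", "webhook"], "warning": ["email"]}
REPEAT = {"critical": timedelta(hours=1), "warning": timedelta(hours=4)}


@dataclass
class Alert:
    name: str
    severity: str                 # "critical" or "warning"
    component: str                # hypothetical label for the affected component
    last_sent: Optional[datetime] = None


def due_notifications(active: List[Alert], now: datetime) -> Dict[str, List[str]]:
    """Return {alert name: channels} for the alerts that should be sent at `now`."""
    critical_components = {a.component for a in active if a.severity == "critical"}
    due: Dict[str, List[str]] = {}
    for a in active:
        # Inhibition: mute a warning while a critical alarm is active
        # on the same component.
        if a.severity == "warning" and a.component in critical_components:
            continue
        # Repeat only after the severity-specific interval has elapsed.
        if a.last_sent is None or now - a.last_sent >= REPEAT[a.severity]:
            due[a.name] = CHANNELS[a.severity]
    return due
```

Inhibiting by component rather than globally keeps unrelated warnings (for example, disk space on another host) visible during a critical incident.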
