{{indexmenu_n>6}}
====== ePDG Monitoring ======

===== Integrated VoWiFi Gateway Monitoring System (ePDG) =====

===== Solution overview =====

The VAS Experts ePDG Monitoring system provides full operational control of the **fast-epdg** component, the VoWiFi (Voice over WiFi) gateway that operates according to 3GPP TS 29.273 and TS 24.302. The gateway carries voice and packet traffic securely over untrusted Wi-Fi access networks using IPsec / IKEv2 tunneling and integrates with the EPC core through the SWu, SWm, SWx, S2b and S6b interfaces.

The solution gives the mobile operator's operations teams a single monitoring platform, from the IPsec SA level (L3 security) up to the KPIs of the VoWiFi subscriber experience.

==== Key advantages ====

  * **Real-time monitoring**: metrics are updated every 10-15 seconds, and the status of IKE SAs / Child SAs and GTP tunnels is shown directly in NOC (Network Operations Center) dashboards without delayed aggregation.
  * **Proactive detection of anomalies**: 20+ alarms with automatic severity escalation. PGW/AAA unreachability, increased IKEv2 latency, and a rise in EAP-AKA errors are detected before subscribers notice problems with calls.
  * **Open integration interfaces**: Prometheus, SNMP v2c, Alertmanager webhooks, and Grafana support. The system integrates into the existing NMS/OSS infrastructure without vendor lock-in.
  * **Minimal external dependencies at the plugin level**: a built-in ''/metrics'' endpoint in fast-epdg, without Java, without JMX, and without external agents.
  * **Coverage of the entire SWu → S2b stack**: IKEv2 (SWu), Diameter SWm/SWx/S6b, GTPv2-C (S2b), and the GTP-U data plane, all in one place. The 33 metrics cover both the control plane and the data plane.
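To make the built-in ''/metrics'' endpoint more concrete, the following is a minimal sketch of reading it without Prometheus, for example from a quick health-check script. It assumes the endpoint serves the standard Prometheus text exposition format without timestamps. The metric names (''epdg_node_up'', ''epdg_ike_sa_active''), the ''node'' label, and any URL passed to ''scrape'' are hypothetical placeholders, not names confirmed for fast-epdg.

```python
import re
from urllib.request import urlopen

# One sample line: metric name, optional {labels}, then a single value token.
SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
# One label pair inside the braces: key="value" (escaped quotes allowed).
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text format into {(name, labels): value}.

    labels is a sorted tuple of (key, value) pairs. Comment lines and lines
    that do not match the simple pattern (for example, samples with a
    trailing timestamp) are skipped.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue
        name, label_text, value = match.groups()
        labels = tuple(sorted(LABEL.findall(label_text or "")))
        samples[(name, labels)] = float(value)
    return samples


def scrape(url, timeout=5.0):
    """Fetch a /metrics page over HTTP and parse it (the URL is an assumption)."""
    with urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))


# Hypothetical payload; real fast-epdg metric names may differ.
SAMPLE = """\
# HELP epdg_node_up Hypothetical metric: 1 if the peer node is reachable.
# TYPE epdg_node_up gauge
epdg_node_up{node="pgw"} 1
epdg_node_up{node="aaa"} 0
epdg_ike_sa_active 128
"""

if __name__ == "__main__":
    for (name, labels), value in sorted(parse_metrics(SAMPLE).items()):
        print(name, dict(labels), value)
```

In a production deployment Prometheus itself performs the scrape; a script like this is only useful for ad hoc checks that the endpoint responds and exposes the expected series.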
==== Four-level monitoring architecture ====

^ Level ^ Component ^ Technology ^
| **Collection** | Built-in ''/metrics'' endpoint in fast-epdg | Prometheus text format over HTTP |
| **Storage** | Prometheus TSDB | Local storage, 15-day retention by default |
| **Visualization** | Grafana + JSON support | 4 dashboards loaded automatically |
| **Alerting** | Alertmanager + SNMP Trap Sender | PromQL rules → webhook → SNMP v2c trap |

==== Metrics overview by category ====

^ Category ^ Number of metrics ^ Polling interval ^ Key indicators ^
| **Config** | 2 | 10 sec | Configuration status, reload counter |
| **Network** | 1 | 10 sec | Node connection status (PGW/AAA/HSS) |
| **IKEv2 (SWu)** | 3 | 10 sec | Messages by type (IKE_SA_INIT, IKE_AUTH, CREATE_CHILD_SA), latency histogram, errors |
| **GTPv2-C (S2b)** | 4 | 10 sec | Messages (Create/Modify/Delete Session), latency, errors, retransmissions |
| **GTP-U data plane** | 3 | 10 sec | Packets/bytes, tunneling errors |
| **Diameter (SWm/SWx/S6b)** | 5 | 10 sec | Messages by command code (DER/DEA, MAR/MAA, AAR/AAA), latency, errors, watchdog, connection status |
| **Service KPI** | 4 | 10 sec | Percentage of successful attempts, duration histogram, service availability, uptime |
| **Session State** | 4 | 10 sec | IKE SAs, Child SAs, GTP sessions, total users |
| **Application** | 3 | 10 sec | Number of threads, memory, log messages by level |
| **System** | 4 | 10 sec | CPU utilization, memory, memory utilization, open file descriptors |
| **Total** | **33 metrics** | | |

==== Alarm categories ====

^ Severity ^ Alarms ^ Description ^ Response ^
| **Critical** | ''ePDG_Service_Down'', ''ePDG_High_Attach_Failure_Rate'', ''ePDG_PGW_Unreachable'', ''ePDG_AAA_Unreachable'', ''ePDG_Diameter_Watchdog_Timeout'' | Component is unavailable, widespread connection failures, nodes are unreachable | Immediate escalation: Email + SNMP Trap + Webhook. Repeated every hour |
| **Warning** | ''ePDG_High_IKEv2_Latency'', ''ePDG_High_GTP_Latency'', ''ePDG_High_IKEv2_Error_Rate'', ''ePDG_High_GTP_Error_Rate'', ''ePDG_High_Memory_Usage'', ''ePDG_High_CPU_Usage'', ''ePDG_Low_Disk_Space'', ''ePDG_High_Error_Log_Rate'' | Performance degradation, resource anomalies | Email. Repeated every 4 hours. Suppressed if a “Critical” alarm is active on the same component |
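
The response rules in the alarm table can be illustrated with a small sketch. It models only the logic described above: notification channels and repeat intervals per severity, and suppression of a warning while a critical alarm is active on the same component. The ''Alert'' class, the ''component'' values, and the function name are hypothetical. In an actual deployment this behaviour would normally be expressed in the Alertmanager configuration (for example with ''repeat_interval'' and ''inhibit_rules'') rather than in application code.

```python
from dataclasses import dataclass
from datetime import timedelta

# Channels and repeat intervals as listed in the alarm table.
POLICY = {
    "critical": {
        "channels": ("email", "snmp_trap", "webhook"),
        "repeat": timedelta(hours=1),
    },
    "warning": {
        "channels": ("email",),
        "repeat": timedelta(hours=4),
    },
}


@dataclass(frozen=True)
class Alert:
    name: str
    severity: str   # "critical" or "warning"
    component: str  # hypothetical identifier, e.g. "epdg-1"


def plan_notifications(active):
    """Return {alert name: policy} for alerts that should notify.

    A warning is dropped when any critical alert is active on the
    same component; critical alerts always notify.
    """
    critical_components = {a.component for a in active if a.severity == "critical"}
    plan = {}
    for alert in active:
        if alert.severity == "warning" and alert.component in critical_components:
            continue  # suppressed by a critical alert on the same component
        plan[alert.name] = POLICY[alert.severity]
    return plan


if __name__ == "__main__":
    active = [
        Alert("ePDG_PGW_Unreachable", "critical", "epdg-1"),
        Alert("ePDG_High_GTP_Latency", "warning", "epdg-1"),
        Alert("ePDG_High_CPU_Usage", "warning", "epdg-2"),
    ]
    for name, policy in plan_notifications(active).items():
        print(name, policy["channels"], policy["repeat"])
```

With the sample input, the GTP latency warning on ''epdg-1'' is suppressed by the PGW alarm on the same component, while the CPU warning on ''epdg-2'' is still sent by email.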