BRAS Active-Standby (Master-Backup) redundancy [VAS Experts Documentation]

en:dpi:bras_bng:replication [2024/09/26 15:29] – created - external edit 127.0.0.1
en:dpi:bras_bng:replication [2025/09/30 15:53] (current) elena.krasnobryzh
Line 1:
-====== BNG/BRAS reservation ======
 {{indexmenu_n>12}}
 +====== BRAS Active-Standby (Master-Backup) redundancy ======
 +===== Description of the switching algorithm for BRAS L2 (DHCP, Static IP) =====
 +  * For BRAS L2 redundancy with IPoE (DHCP and Static IP), we recommend the Active-Standby scheme: two SSG BRAS devices are connected to a single L2 broadcast domain, one in Master mode and the other [[en:dpi:licensing#reserve_ssg_license|in Backup mode (hot standby)]].
 +  * The Master is the active server and processes traffic during normal network operation. The Backup is in standby state and does not pass traffic through itself; the DPDK interfaces towards subscribers (IN ports) are administratively shut down (down).
 +  * The Backup server monitors the operation of the Master server using a script (heartbeat) via dedicated management ports. Upon detecting a failure of the Master server, the Backup server automatically activates (up) the DPDK interfaces towards subscribers (IN ports) and begins processing traffic.
 +  * Switchover is a one-time event: once traffic moves to the Backup server, the Master server is stopped, which avoids repeated traffic transfers and their impact on the network. Switching traffic back to the Master is performed manually by the network administrator.
 +  * For correct operation, all service profiles must be identically configured on both the Master and Backup servers; we recommend using a profile synchronization script.
 +  * SSG BRAS supports both dynamic (OSPF, BGP) and static routing. With dynamic routing, the route announcement for Static IP subscribers with public IP addresses changes automatically on switchover to the Backup server. Subscribers with private addresses get a NAT profile with the same name, but backed by a different public address pool configured on the Backup server.
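The switchover rules above behave like a one-way latch: the Backup server takes over when the Master fails and never hands traffic back on its own. A minimal Bash sketch of that decision, assuming hypothetical role names (''standby'', ''active'') and Master states (''up'', ''down'') that are illustrative and do not come from the SSG scripts:

```bash
#!/usr/bin/env bash
# Illustrative only: role and state names are assumptions, not SSG identifiers.

# Print the Backup server's next role given its current role and the Master state.
#   $1: current Backup role   (standby | active)
#   $2: observed Master state (up | down)
next_backup_role() {
    case "$1:$2" in
        standby:down) echo active ;;   # Master failed: take over traffic
        active:*)     echo active ;;   # no automatic failback, even if the Master returns
        *)            echo standby ;;  # Master healthy: stay in hot standby
    esac
}
```

Returning to ''standby'' is deliberately not modelled, because the text leaves the switch back to the Master to the administrator.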
  
-The following replication scheme is used in the Stingray Service Gateway ver. 8.3+ to align subscriber data on all fastDPI servers: fastPCRF sends authorization responses and CoA requests to all the servers listed in the [[en:dpi:bras_bng:radius_integration:radius_auth_fastpcrf_setup|fdpi_server]] configuration parameters.
 +{{ :en:dpi:bras_bng:bras_replication.png?nolink&900 |}}
-<note important>Authorization parameters are sent by using a persistent queue so even if some of the fastDPI servers are inactive at the time of data transmission, they will receive all the data missed during their idle time once they are activated.</note>+
  
-===== Ways the data is applied on the DPI =====
-When receiving authorization data, the fastDPI server identifies whether it was response to its own request or it was a response to another request (there is a special label in the packet for this purpose). If this is response to its own request, the data will be applied completely: a DHCP or PPPoE session will be created in case of DHCP or PPPoE authorization request and the data will be stored in UDR. If this is an answer to another request, the fastDPI will simply store corresponding "extraneous" data in the UDR. Thus, when the main fastDPI server becomes unavailable all the load would be imposed on the backup fastDPI server and the latter will already have contain all the subscriber properties in its UDR: subscriber services, its policing, L2 properties (MAC address, VLAN), etc. That is, the UDR of the main and backup servers will essentially contain the consistent data.
 +===== Master server status monitoring script =====
 +The script must be installed on the Backup server, where it runs in a continuous loop, monitoring the state of the Master server via SSH.\\
 +**Four checks** are used to confirm the normal operation of the Master server: 
 +  - The server is reachable over the network (ping check)
 +  - The fastDPI process is present 
 +  - The PID of the fastDPI process has not changed (no uncontrolled process restart) 
 +  - The link state on the main fastDPI has not changed (optional check). This check is disabled by default, as it may not be needed in some topologies.
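The first three checks can be sketched in Bash as follows. This is an illustration under stated assumptions, not the contents of ''SRS.sh'': the address ''192.0.2.10'', the account name ''srs'', and the process name ''fastdpi'' are placeholders, and the optional link-state check is omitted because the text does not say how it is performed.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the Backup-side checks; this is NOT the contents of SRS.sh.
MASTER_IP="${MASTER_IP:-192.0.2.10}"   # assumed address (documentation range)
SSH_USER="${SSH_USER:-srs}"            # assumed account created on the Master

# Run a command on the Master over SSH; BatchMode fails fast instead of prompting.
run_on_master() {
    ssh -o BatchMode=yes -o ConnectTimeout=3 "${SSH_USER}@${MASTER_IP}" "$@"
}

# Check 1: the Master answers a single ping (Linux iputils options).
master_reachable() {
    ping -c 1 -W 2 "$MASTER_IP" >/dev/null 2>&1
}

# Check 2: print the PID of the oldest process named exactly "fastdpi" (assumed name);
# fails if there is none.
master_fastdpi_pid() {
    run_on_master pgrep -o -x fastdpi
}

# Checks 1-3 combined. $1 is the PID recorded while the Master was known to be healthy.
master_healthy() {
    local known_pid="$1" current_pid
    master_reachable || return 1
    current_pid="$(master_fastdpi_pid)" || return 1
    [ "$current_pid" = "$known_pid" ]   # check 3: PID unchanged
}
```

A monitoring loop would record the PID once at startup and then call ''master_healthy "$pid"'' at a fixed interval, treating a non-zero exit status as a Master failure.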
  
-===== BRAS L2 backup =====
-Backing up of BRAS in L2 mode involves the connecting up of two Stingray Service Gateways in one L2 broadcast domain.
-One of them in Master mode and the other in Slave one.
-Master SSG carries out traffic processing along with users authorization through the PCRF server.
-Slave does not pass traffic through itself, dpdk interfaces are in the traffic standby mode (down). Subscribers information is synchronized through the PCRF server.
-Slave monitors the availability and performance of the Master and when the last fails, Slave will activate (up) dpdk interfaces and start to process traffic automatically or manually.
-An example of DPI connection and routes the traffic passes through it are presented in the diagram below.
-{{ :en:dpi:bras_bng:replication:bras_l2_reservation.png?nolink&750 |}}
 +**Script Installation Process:**
 +  - Download all files from the {{:en:dpi:bras_bng:replication:reservation_script.zip|archive}} to the target Backup server
 +  - Configure the Master server's IP address in the ''SRS.sh'' script
 +  - Create an SSH key pair on the Backup server using the command <code bash>ssh-keygen -t ed25519</code>
 +  - Create a new user with sudo rights on the Master server
 +  - Copy the public SSH key from the Backup server to the ''authorized_keys'' file of the new account on the Master server (the private key stays on the Backup server)
 +  - Add execute permissions to the installation script using the command <code bash>chmod +x install.sh</code>
 +  - Run the installation script from the same directory: <code bash>./install.sh</code>
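An ''authorized_keys'' file holds public keys only, so it can help to verify the key file before installing it on the Master server. The helper below is a hypothetical sketch and is not part of the archive; the user ''srs'' and the address ''192.0.2.10'' in the usage comment are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the archive: guard against installing a private key.

# Succeeds only if the first line of the file looks like an OpenSSH public key.
is_public_key() {
    case "$(head -n 1 "$1")" in
        ssh-ed25519\ *|ssh-rsa\ *|ecdsa-sha2-*\ *) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (assumed user "srs" and documentation address 192.0.2.10):
#   is_public_key ~/.ssh/id_ed25519.pub && ssh-copy-id -i ~/.ssh/id_ed25519.pub srs@192.0.2.10
```

''ssh-copy-id'' appends the given public key to the remote account's ''authorized_keys'' file, which covers the copy step without manual editing.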
  
-==== Database synchronization ====
-FastPCRF is responsible for synchronization, its configuration is described in the section [[en:dpi:bras_bng:replication|Replication of authorization data]].
 +**Service Management:**
 +  - Start the service: <code bash>systemctl start fastsrs</code>
 +  - Check service status: <code bash>systemctl status fastsrs</code>
 +  - Stop the service: <code bash>systemctl stop fastsrs</code>
 +  - Check service logs: <code bash>journalctl -u fastsrs</code>
  
-=== Configuring the SSG Master mode ===
 +===== Service profile synchronization script =====
 +The script synchronizes service profiles [[en:dpi:dpi_options:opt_filtration:filtration_ctrl#activation_of_ipv6_traffic_blocking_service|4 (blacklist filtering)]], [[en:dpi:dpi_options:opt_capture:capt_mgmt#management_of_a_default_profile_service_5|5 (whitelist and Captive Portal)]], [[en:dpi:dpi_options:opt_shaping:shaping_session|18 (session policing and traffic class override)]] and [[en:dpi:dpi_components:platform:subscriber_management:policing_mng|policing]] between Master and Backup servers.\\ 
 +The script runs on the Master server; service profiles on the Backup server will be aligned with those on the Master server. Profile transfer is performed using ''fdpi_ctrl'' commands and remote SSH access.
  
-=== Configuring the SSG Slave mode ===
 +System Requirements:
 +  * SSH
 +  * Bash
 +  * jq
 +  * Installed SSG
 +  * rsync
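The requirements can be checked before the first run. A minimal sketch, assuming the tools are expected on ''PATH'' under their usual command names; the function name ''missing_tools'' is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of profile_sync.sh.

# Print the names of the given commands that are not found on PATH, space-separated.
missing_tools() {
    local tool missing=""
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    printf '%s\n' "${missing# }"   # drop the leading space
}

# Example: missing_tools ssh bash jq rsync fdpi_ctrl
```

An empty result means every listed command was found; ''fdpi_ctrl'' is included in the example because the script relies on it for profile transfer.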
  
-== Algorithm description ==
-Stingray Service Gateway backup concept - MASTER-SLAVE (L2-BRAS)
 +Script Logic:\\
 +The script retrieves the current service profiles from the Master server and sends them to the specified Backup server. It then connects to the Backup server, retrieves the profile data stored there, compares it with the profiles present on the Master server, and deletes from the Backup server any profiles that are missing on the Master server.
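The comparison step is a set difference between two lists of profile names. The sketch below assumes the names have already been written to two text files, one name per line. How the real script obtains them with ''fdpi_ctrl'' is not described here, so the function and file names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of the cleanup step; not taken from profile_sync.sh.

# Print profile names that exist on the Backup server but not on the Master.
#   $1: file with Master profile names, one per line
#   $2: file with Backup profile names, one per line
stale_profiles() {
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file.
    # grep exits 1 when nothing is printed, which is not an error here.
    grep -Fxv -f "$1" "$2" || true
}
```

Each name printed would be a candidate for deletion on the Backup server.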
-  * 1. MASTER is running 99% of the time, it can be disabled or may fail +
-  * 2. When being recovered MASTER always treacherously proceed to process the traffic +
-  * 3SLAVE just accepts replications from MASTER and saves them in UDR in 99% of the time   +
-  * 4. There is a third party that switches traffic to MASTER or to SLAVE, depending on the current situation: +
-  *  4.1. MASTER is availableSLAVE is available, then the traffic will be switched to MASTER +
-  *  4.2. MASTER is available, SLAVE isn't, then the traffic will be switched to MASTER +
-  *  4.3. MASTER is not availableSLAVE is availablethen the traffic will be switched to SLAVE +
-  *  4.4. MASTER and SLAVE are not available, then the traffic will be switched to MASTER+
  
-MASTER-> SLAVE toggling:
-  * 1. The third party detects that MASTER becomes unavailable and switches all the traffic to SLAVE
-  * 2. Delays when switching are barely perceptible (physically and logically) due to 99% SLAVE's UDR contains replicated data
 +==== Installation and management ====
 +  - Configure SSH key authentication: create a key pair on the Master server using ''ssh-keygen -t ed25519''; using the root account for authentication is easiest.
 +  - Download the {{:en:dpi:bras_bng:profile_sync.sh|script}} to the Master server and place it in the ''/usr/local/bin/'' directory
 +  - Add permissions for the script using the command <code bash>chmod +x /usr/local/bin/profile_sync.sh</code> 
 +  - Configure the user and IP of the Backup server within the script. The user must have write access to the ''/etc/dpi'' directory; the simplest option is to use the root user. Another user with appropriate rights can also be configured. 
 +  - Configure cron to run the script at desired intervals **(optional)**:<code bash>crontab -u root -e 
 +0 * * * * /bin/bash /usr/local/bin/profile_sync.sh</code>
 +  - Add a bash alias to run the script on demand: <code bash>echo "alias dpi_sync='/bin/bash /usr/local/bin/profile_sync.sh'" >> ~/.bashrc</code>
 +  - Create the directory ''/etc/dpi/service18'' and save all service 18 files in it.
  
-Bootstrap MASTER'а (SLAVE is active and process traffic):
-  * 1. MASTER has the fastdpi+fastpcrf services running and enabled (they were started on boot)
 +Script Operation:\\
 +The script is run by crontab at specified intervals or manually using the ''dpi_sync'' command.
-  * 2. MASTER detects that SLAVE is active and stores relevant data +
-  * 3. MASTER stops its fastdpi + fastpcrf services +
-  * 4. MASTER backups UDR on SLAVE and takes it back +
-  * 5. MASTER starts its fastdpi + fastpcrf +
-  * 6. A third party detects that the MASTER becomes available and switches the traffic to it +
- +
-Bootstrap MASTER'а (SLAVE is anavailable): +
-  * 1. MASTER has the fastdpi+fastpcrf services running and enabled (they were started on boot) +
-  * 2. MASTER determines that SLAVE is not available, considers that UDR it holds is more relevant than the one located on SLAVE, continues to work normally +
-  * 3. A third party detects that the MASTER becomes available and switches the traffic to it +
- +
-Bootstrap SLAVE'а (MASTER is active and process traffic): +
-  * 1. SLAVE has the fastdpi+fastpcrf services running and enabled (they were started on boot) +
-  * 2. SLAVE detects that MASTER is active and stores relevant data +
-  * 3. SLAVE stops its fastdpi + fastpcrf services +
-  * 4. SLAVE backups UDR on MASTER and takes it back +
-  * 5. SLAVE starts its fastdpi + fastpcrf +
-  * 6. SLAVE starts to replicate data +
- +
-Bootstrap SLAVE'а (MASTER is unavailable): +
-  * 1. SLAVE has the fastdpi+fastpcrf services running and enabled (they were started on boot) +
-  * 2. SLAVE detects that MASTER is unavailable, considers that UDR it holds is more relevant than the one located on the currently unavailable MASTER, continues to work normally +
-  * 3. A third party detects that SLAVE becomes available and switches the traffic to it +
- +
-=====Script for active DPI reservation===== +
-The script should be installed on the reserve DPI where it runs in a continious loop monitoring the state of main dpi via ssh.\\ +
-This script uss **4 checks** to confirm that main dpi is working: +
-  - The server is rachable via network (pingcheck) +
-  - The fastDPI process is present +
-  - FastDPI process PID did not change +
-  - The link state on the main DPI did not change (optional check). This check is disabled by default as it may not be necessary in some installations.+
  
-**Installation process:**
 +Note that if a service profile is applied to a subscriber, it will not be deleted. Also note that any files not saved in the ''service18'' folder will not be transferred to the Backup server, and thus the synchronized service profile 18 will not work. If the alias ''dpi_sync'' is absent, the script should be run via ''sudo bash /usr/local/bin/profile_sync.sh''.
-  - Download all files from the {{ :en:dpi:bras_bng:replication:reservation_script.zip |archive}} to your target reserve server. +
-  - Configure main server ip inside the ''SRS.sh'' script +
-  - Create an ssh key pair on the reserve server via <code bash>ssh-keygen -t ed25519</code> +
-  - Create a new sudo user account on the main server  +
-  - Copy private ssh keys from reserve server to the new account authorized keys file on the main server +
-  - Add permissions to installation scrip via <code bash>chomd +x install.sh</code> +
-  - Run ''install.sh''+
  
-**Controlling the service:** 
-  - Starting the service: <code bash>systemctl start fastsrs </code> 
-  - Checking service status: <code bash>systemctl status fastsrs</code> 
-  - Stoping the service: <code bash>systemctl stop fastsrs</code> 
-  - Checking service logs: <code bash>journalctl -u fastsrs</code>