Easy RoCE Configuration Guide

RDMA (Remote Direct Memory Access) is a network-based memory access technology widely adopted in supercomputing, AI training, storage, and similar scenarios. Initially implemented on InfiniBand networks, RDMA was later carried over Ethernet by two protocols: iWARP and RoCE (RDMA over Converged Ethernet). RoCEv2 runs over UDP, which is connectionless and, compared to connection-oriented TCP, has lower protocol overhead and resource consumption. However, unlike TCP, which ensures reliable transmission through mechanisms such as sliding windows and acknowledgments, RoCEv2 suffers significant performance degradation when packets are lost: the RDMA NIC discards the packets that arrive after the lost one, forcing the sender to retransmit all subsequent data. RDMA therefore requires a lossless Ethernet environment, which RoCEv2 deployments achieve with PFC (Priority Flow Control) and ECN (Explicit Congestion Notification).

To make lossless Ethernet easier to deploy and maintain, Asterfusion has launched “Easy RoCE” on AsterNOS. Focusing on the requirements of RoCEv2 scenarios, Easy RoCE provides a business-level command-line wrapper on high-rate products, such as the CX532P-N, to improve maintainability and usability in these scenarios.

Given the key parameters cable-length, incast-level, and traffic-model, the system automatically generates a lossless configuration template. The template applies the default DSCP mapping, enables PFC and ECN for queues 3 and 4, and sets strict priority scheduling for queues 6 and 7. The key parameters of Easy RoCE are detailed below:

  • cable-length: the cable length, used to calculate the PFC and ECN parameters
  • incast-level: the incast traffic level (low, medium, or high), used to calculate the PFC parameters
  • traffic-model: the service type (throughput-sensitive, latency-sensitive, or balanced), used to calculate the ECN parameters

The lossless RoCE configuration template generated by “Easy RoCE” is shown as follows:

sonic# show running-config
buffer-profile roce_lossless_25g_profile
mode lossless dynamic 1 size 1518 xoff 27104 xon-offset 13440
!
buffer-profile roce_lossless_50g_profile
mode lossless dynamic 1 size 1518 xoff 28448 xon-offset 13440
!
buffer-profile roce_lossless_100g_profile
mode lossless dynamic 1 size 1518 xoff 38816 xon-offset 13440
!
class-map roce_lossless_25g_class_map
match cos 3 4
!
class-map roce_lossless_50g_class_map
match cos 3 4
!
class-map roce_lossless_100g_class_map
match cos 3 4
!
diffserv-map type ip-dscp roce_lossless_25g_diffserv_map
default copy
!
diffserv-map type ip-dscp roce_lossless_50g_diffserv_map
default copy
!
diffserv-map type ip-dscp roce_lossless_100g_diffserv_map
default copy
!
wred roce_lossless_25g_ecn
mode ecn gmin 15360 gmax 1388576 gprobability 90
!
wred roce_lossless_50g_ecn
mode ecn gmin 15360 gmax 1388576 gprobability 90
!
wred roce_lossless_100g_ecn
mode ecn gmin 15360 gmax 1388576 gprobability 90
!
policy-map roce_lossless_25g
class roce_lossless_25g_class_map
priority-group-buffer roce_lossless_25g_profile
wred roce_lossless_25g_ecn
queue-scheduler priority queue 6
queue-scheduler priority queue 7
set cos dscp diffserv roce_lossless_25g_diffserv_map
!
policy-map roce_lossless_50g
class roce_lossless_50g_class_map
priority-group-buffer roce_lossless_50g_profile
wred roce_lossless_50g_ecn
queue-scheduler priority queue 6
queue-scheduler priority queue 7
set cos dscp diffserv roce_lossless_50g_diffserv_map
!
policy-map roce_lossless_100g
class roce_lossless_100g_class_map
priority-group-buffer roce_lossless_100g_profile
wred roce_lossless_100g_ecn
queue-scheduler priority queue 6
queue-scheduler priority queue 7
set cos dscp diffserv roce_lossless_100g_diffserv_map

When the template above does not fully fit your business scenario, you can modify its parameters from the command line; refer to Configuration and Parameter Tuning for details.

The default setting of RoCE is shown in the following table.

Table 1 Default setting of RoCE

| Parameter | Default value |
| --- | --- |
| cable-length | 40m |
| incast-level | low |
| traffic-model | latency |

Table 2 Create RoCE configuration templates

| Purpose | Command | Description |
| --- | --- | --- |
| Enter global configuration view. | `configure terminal` | - |
| Generate lossless RoCE configuration templates. | `qos roce lossless [cable-length length] [incast-level level] [traffic-model model]` | length: the cable length; one of 5m, 40m, 100m, 300m. level: the incast level; one of low, medium, high. model: the traffic model; one of throughput, latency, balance. |
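
As a rough illustration of how the three parameters combine into a single command, the sketch below validates the values listed in Table 2 and assembles the `qos roce lossless` command string. The helper name `build_easy_roce_command` is hypothetical and not part of AsterNOS; the sketch only produces text and does not communicate with a switch.

```python
# Allowed values as listed in Table 2 of this guide.
CABLE_LENGTHS = ("5m", "40m", "100m", "300m")
INCAST_LEVELS = ("low", "medium", "high")
TRAFFIC_MODELS = ("throughput", "latency", "balance")


def build_easy_roce_command(cable_length="40m", incast_level="low",
                            traffic_model="latency"):
    """Return the CLI line that generates a lossless RoCE template.

    Hypothetical helper for illustration; the defaults follow Table 1.
    """
    if cable_length not in CABLE_LENGTHS:
        raise ValueError(f"cable-length must be one of {CABLE_LENGTHS}")
    if incast_level not in INCAST_LEVELS:
        raise ValueError(f"incast-level must be one of {INCAST_LEVELS}")
    if traffic_model not in TRAFFIC_MODELS:
        raise ValueError(f"traffic-model must be one of {TRAFFIC_MODELS}")
    return (f"qos roce lossless cable-length {cable_length} "
            f"incast-level {incast_level} traffic-model {traffic_model}")


print(build_easy_roce_command())
# qos roce lossless cable-length 40m incast-level low traffic-model latency
```

Rejecting unknown values before the command is sent keeps typos such as `10m` from reaching the device.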

Apply Lossless RoCE Configuration to All Interfaces


Table 3 Apply lossless RoCE Configuration to All Interfaces

| Purpose | Command | Description |
| --- | --- | --- |
| Enter global configuration view. | `configure terminal` | - |
| Apply lossless RoCE configuration to all interfaces. | `qos service-policy {roce_lossless\|RoCE_profile_name}` | roce_lossless: the RoCE template generated with default parameters. RoCE_profile_name: the name of a specific RoCE template. |

Apply Lossless RoCE Configuration to Specified Interfaces


Table 4 Apply lossless RoCE Configuration to Specified Interfaces

| Purpose | Command | Description |
| --- | --- | --- |
| Enter global configuration view. | `configure terminal` | - |
| Enter the RoCE template configuration view. | `qos roce RoCE_profile_name` | RoCE_profile_name: the RoCE template name. |
| Bind the interfaces on which the RoCE configuration is to be enabled. | `bind interface {all\|ethernet interface_name\|range interface_name_list}` | - |

When the default lossless RoCE configuration above is not fully applicable to your business scenario, you can adjust configurations and parameters through the command line to optimize business performance.

Table 5 Configure DSCP mapping

| Operation | Command | Description |
| --- | --- | --- |
| Enter global configuration view. | `configure terminal` | - |
| Enter DSCP mapping configuration view. | `diffserv-map type ip-dscp roce_lossless_diffserv_map` | - |
| Modify the DSCP-to-COS mapping. | `ip-dscp value cos cos_value` | value: DSCP value, range 0-63. cos_value: COS value, range 0-7. |
| | `default {cos_value\|copy}` | cos_value: map all packets to the given COS value. copy: use the default DSCP mapping of the system. |

If you have already bound the lossless RoCE policy to interfaces, unbind it before making modifications.
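
The default DSCP mapping that appears in the `show qos roce` output later in this guide groups DSCP values in blocks of eight (DSCP 0-7 to priority 0, 8-15 to priority 1, and so on). A minimal sketch of that grouping follows, assuming the default mapping is exactly the integer division visible in that output; the function name is illustrative only.

```python
def default_dscp_to_priority(dscp: int) -> int:
    """Map a DSCP value (0-63) to a switch priority (0-7), eight DSCPs per priority."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return dscp // 8


# DSCP values that land on the lossless priorities 3 and 4 under this mapping.
lossless_dscps = [d for d in range(64) if default_dscp_to_priority(d) in (3, 4)]
print(lossless_dscps[0], lossless_dscps[-1])  # 24 39
```

Under this mapping, traffic must be marked with a DSCP value between 24 and 39 to reach the queues on which PFC and ECN are enabled.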

Table 6 Configure queue scheduling policy

| Purpose | Command | Description |
| --- | --- | --- |
| Enter global configuration view. | `configure terminal` | - |
| Enter lossless RoCE policy configuration view. | `policy-map roce_lossless` | - |
| Configure SP mode scheduling. | `queue-scheduler priority queue queue-id` | queue-id: range 0 to 7. |
| Configure DWRR mode scheduling. | `queue-scheduler queue-limit percent queue-weight queue queue-id` | percentage: the DWRR scheduling weight, range 0 to 100. queue-id: range 0 to 7. |
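
To get a feel for how DWRR weights translate into bandwidth, the sketch below normalizes a set of queue weights into approximate bandwidth shares. It assumes textbook DWRR behavior (each continuously backlogged queue receives bandwidth in proportion to its weight, out of whatever is left after the strict-priority queues have been served); the scheduler on the switch may behave differently, and the function name is illustrative.

```python
def dwrr_shares(weights):
    """Return each queue's approximate share of the bandwidth available to DWRR.

    weights: dict mapping queue-id (0-7) to its DWRR weight (0-100).
    Assumes every listed queue is continuously backlogged.
    """
    for queue, weight in weights.items():
        if not 0 <= queue <= 7:
            raise ValueError(f"queue-id {queue} is outside 0-7")
        if not 0 <= weight <= 100:
            raise ValueError(f"weight {weight} is outside 0-100")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one queue needs a positive weight")
    return {queue: weight / total for queue, weight in weights.items()}


# Example weights chosen for illustration only.
shares = dwrr_shares({0: 20, 3: 40, 4: 40})
for queue, share in shares.items():
    print(f"queue {queue}: {share:.0%}")
```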

Table 7 Set PFC threshold

| Purpose | Command | Description |
| --- | --- | --- |
| Enter global configuration view. | `configure terminal` | - |
| Enter PFC configuration view. | `buffer-profile roce_lossless_profile` | - |
| Modify the PFC lossless buffer. | `mode lossless dynamic dynamic_th size size xoff xoff xon-offset xon-offset [xon xon]` | - |
  • dynamic_th: the dynamic threshold coefficient, range [-4, 3]. The dynamic threshold equals 2^dynamic_th × the remaining available buffer. For example, if dynamic_th is set to 1, the dynamic threshold is 2 times the remaining available buffer, i.e., the actual threshold is 2/3 of the total available buffer.
  • size: the reserved buffer size in bytes. The recommended value is 1518.
  • xoff: the buffer threshold, in bytes, at which PFC backpressure frames are triggered. It is recommended to set it to an integer multiple of the cell size. xoff depends on the cable length, interface rate, and other parameters; refer to the recommended values when configuring it. xoff must be greater than the xon value.
  • xon-offset: the buffer threshold, in bytes, at which PFC backpressure frames stop. It is recommended to set it to an integer multiple of the cell size. The recommended value is 13440.
  • xon: an optional parameter, normally set to 0.
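
The 2/3 figure in the dynamic_th example follows from a short calculation. Writing α = 2^dynamic_th, a queue stops growing when its occupancy Q equals α × (B − Q), where B is the total available buffer, which gives Q = α / (1 + α) × B. The sketch below evaluates this for every allowed dynamic_th value, assuming a single congested queue; the function name is illustrative.

```python
def max_buffer_fraction(dynamic_th: int) -> float:
    """Fraction of the total available buffer one queue can occupy.

    Solves Q = alpha * (B - Q) for Q / B, with alpha = 2 ** dynamic_th.
    Assumes only one queue is competing for the shared buffer.
    """
    if not -4 <= dynamic_th <= 3:
        raise ValueError("dynamic_th must be in the range [-4, 3]")
    alpha = 2.0 ** dynamic_th
    return alpha / (1.0 + alpha)


for th in range(-4, 4):
    print(f"dynamic_th={th:>2}  alpha={2.0 ** th:<6}  fraction={max_buffer_fraction(th):.3f}")
```

With dynamic_th set to 1 this yields 0.667, matching the 2/3 stated above; larger coefficients let a single queue claim more of the shared buffer.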

Table 8 Set ECN threshold

| Purpose | Command | Description |
| --- | --- | --- |
| Enter global configuration view. | `configure terminal` | - |
| Enter ECN configuration view. | `wred roce_lossless_ecn` | - |
| Modify ECN parameters. | `mode ecn gmin min_th gmax max_th gprobability probability [ymin min_th ymax max_th yprobability probability\|rmin min_th rmax max_th rprobability probability]` | - |
  • min_th: the lower ECN threshold, in bytes. When the amount of data in the queue reaches this value, the interface starts setting the ECN field of packets to CE according to the configured probability. The minimum configurable value is 15 KB. The recommended value is 15360.
  • max_th: the upper ECN threshold, in bytes. When the amount of data in the queue reaches this value, the interface sets the ECN field of all packets to CE. The recommended values for different interface rates are: 100 Gbps: 768000 bytes; 200 Gbps: 768000 bytes; 400 Gbps: 1536000 bytes.
  • probability: the maximum marking probability, as an integer percentage in the range [1, 100]. A value of 90 is recommended for latency-sensitive services and 10 for throughput-sensitive services.
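
The interaction of min_th, max_th, and probability can be sketched as a marking-probability curve. The description above fixes the two ends (no marking below min_th, every packet marked at or above max_th); the linear ramp in between is an assumption based on the usual WRED curve, not something this guide states. The function name is hypothetical, and the defaults are taken from the generated template shown earlier.

```python
def ecn_mark_probability(queue_bytes, min_th=15360, max_th=1388576, probability=90):
    """Approximate chance (0.0-1.0) that a packet is marked CE at a given queue depth.

    Assumes a linear ramp from 0 at min_th up to `probability` percent just
    below max_th, and marking of every packet once max_th is reached.
    """
    if not 0 < min_th < max_th:
        raise ValueError("thresholds must satisfy 0 < min_th < max_th")
    if not 1 <= probability <= 100:
        raise ValueError("probability must be in the range [1, 100]")
    if queue_bytes < min_th:
        return 0.0
    if queue_bytes >= max_th:
        return 1.0
    ramp = (queue_bytes - min_th) / (max_th - min_th)
    return ramp * probability / 100.0


for depth in (0, 15360, 701968, 1388576):
    print(depth, round(ecn_mark_probability(depth), 3))
```

A higher probability makes senders back off earlier, which keeps queues short and favors latency; a lower value tolerates deeper queues, which favors throughput.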

Table 9 “Easy RoCE” Display and Maintenance

| Purpose | Command | Description |
| --- | --- | --- |
| Display RoCE-related configurations. | `show qos roce [all\|summary\|RoCE_profile_name]` | By default, only RoCE configuration templates in use (i.e., bound to interfaces) are displayed. all: view all created RoCE configurations. summary: view a summary of the RoCE configuration. RoCE_profile_name: the name of the RoCE configuration template to view. |
| Display the relationship between interfaces and policy-maps. | `show interface policy-map` | - |
| Display RoCE statistics of the interface. | `show counters qos roce interface interface_name queue queue-id` | - |
| Clear RoCE statistics of all interfaces. | `clear counters qos roce` | - |

  1. Enable Easy RoCE and apply it to all interfaces.
sonic# configure terminal
sonic(config)# qos roce lossless cable-length 40m incast-level low traffic-model latency
sonic(config)# qos service-policy roce_lossless_40m_low_latency
  2. Display RoCE-related configurations.
sonic# show qos roce
Notice: Displaying configurations of in-use RoCE profiles
==> RoCE Profile: roce_lossless_40m_low_latency | RoCE Policy Map: roce_lossless_40m_low_latency_25g <==
+--------------------+-------------------+-----------------------------------------------------+
| | Operational | Description |
+====================+===================+=====================================================+
| Mode | Lossless | QoS RoCE mode |
+--------------------+-------------------+-----------------------------------------------------+
| Status | Bind: 0/104-0/107 | QoS RoCE binding status |
+--------------------+-------------------+-----------------------------------------------------+
| Cable Length | 40m | Cable length in meters for QoS RoCE lossless config |
+--------------------+-------------------+-----------------------------------------------------+
| Congestion-Control | - | - |
| - Congestion Mode | ECN | Congestion control mode |
| - Enabled TC | 3,4 | Congestion control config enabled traffic class |
| - Max Threshold | 1388576 | Congestion control config max threshold |
| - Min Threshold | 15360 | Congestion control config max threshold |
+--------------------+-------------------+-----------------------------------------------------+
| PFC | - | - |
| - PFC Priority | 3,4 | PFC enabled switch priority |
| - TX Status | Enabled | PFC RX status |
| - RX Status | Enabled | PFC TX status |
+--------------------+-------------------+-----------------------------------------------------+
| Trust | - | - |
| - Trust Mode | DSCP | Trust setting for packet classification |
+--------------------+-------------------+-----------------------------------------------------+
====> RoCE DSCP->SP Mapping Configurations <====
+-------------------------+-------------------+
| DSCP | Switch Priority |
+=========================+===================+
| 0,1,2,3,4,5,6,7 | 0 |
| 8,9,10,11,12,13,14,15 | 1 |
| 16,17,18,19,20,21,22,23 | 2 |
| 24,25,26,27,28,29,30,31 | 3 |
| 32,33,34,35,36,37,38,39 | 4 |
| 40,41,42,43,44,45,46,47 | 5 |
| 48,49,50,51,52,53,54,55 | 6 |
| 56,57,58,59,60,61,62,63 | 7 |
+-------------------------+-------------------+
====> RoCE SP->TC Mapping & ETS Configurations <====
+-------------------+--------+----------+
| Switch Priority | Mode | Weight |
+===================+========+==========+
| 6 | SP | - |
| 7 | SP | - |
+-------------------+--------+----------+
====> PFC Profile Configurations <====
+--------------------------------------------+-------------------+
| Profile Name | Switch Priority |
+============================================+===================+
| egress_lossless_profile | 3,4 |
| egress_lossy_profile | 0,1,2,5,6,7 |
| ingress_lossy_profile | 0,1,2,5,6,7 |
| roce_lossless_40m_low_latency_25g_profile | 3,4 |
| roce_lossless_40m_low_latency_50g_profile | 3,4 |
| roce_lossless_40m_low_latency_100g_profile | 3,4 |
+--------------------------------------------+-------------------+
……
  3. Display RoCE statistics of the interface.
sonic# show counters qos roce interface 0/32 queue 3
operational
----------------------------- -----------------
roce states Ethernet32.3
pfc-stats
- pfc_rx_stats 0
- pfc_tx_stats 402
- pg-stats
- total_packet 11,380,786,999
- total_bytes 1,456,740,735,872
- drop_packet 0
- curr_occupancy 0
ecn-stats
- ecn_stats 0
- ecn_buffer
- shared_use_watermark_byte 0
- total_use_watermark_byte 0
- total_use_count_byte 0
queue-stats
- Counter_pkts 0
- Counter_bytes 0
- Drop_pkts 0
- Drop_bytes 0
- CounterRate_pkts 0.0
- CounterRate_bytes 0.0
- DropRate_pkts 0.0
- DropRate_bytes 0.0
- Occupancy_bytes 0
- SharedOccupancy_bytes 0
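
When these statistics are collected from scripts, the text output can be turned into a dictionary. The sketch below is a minimal parser that assumes the output format shown above (counter lines starting with `- ` followed by a name and a numeric value); it is not an official AsterNOS tool, and the function name is hypothetical.

```python
def parse_roce_counters(text):
    """Parse '- name value' lines from the counters output into a dict."""
    counters = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("- "):
            continue  # section titles such as "pfc-stats"
        parts = line[2:].rsplit(None, 1)
        if len(parts) != 2:
            continue  # sub-section headers such as "- pg-stats" carry no value
        name, raw = parts
        cleaned = raw.replace(",", "")  # drop thousands separators
        try:
            value = float(cleaned) if "." in cleaned else int(cleaned)
        except ValueError:
            continue  # not a numeric counter
        counters[name] = value
    return counters


# A shortened copy of the sample output above.
sample = """\
pfc-stats
- pfc_rx_stats 0
- pfc_tx_stats 402
- pg-stats
- total_packet 11,380,786,999
- drop_packet 0
queue-stats
- CounterRate_pkts 0.0
"""
stats = parse_roce_counters(sample)
print(stats["pfc_tx_stats"], stats["total_packet"], stats["drop_packet"])
# 402 11380786999 0
```

A non-zero pfc_tx_stats together with a drop_packet value of 0, as in the sample, is consistent with PFC pausing the sender instead of the switch dropping packets.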