2 Benchmarking Methodology Working Group K. Sun
3 Internet-Draft ETRI
4 Intended status: Informational H. Yang
5 Expires: 3 September 2022 KT
6 J. Lee
7 T. Ngoc
8 Y. Kim
9 Soongsil University
10 March 2022
12 Considerations for Benchmarking Network Performance in Containerized
13 Infrastructures
14 draft-dcn-bmwg-containerized-infra-08
16 Abstract
18 This draft describes considerations for benchmarking network
19 performance in containerized infrastructures. In a containerized
20 infrastructure, Virtualized Network Functions (VNFs) are deployed on
21 an operating-system-level virtualization platform by abstracting the
22 user namespace, as opposed to virtualization using a hypervisor.
23 Hence, the system configurations and networking scenarios for
24 benchmarking will partially change depending on how resource
25 allocation and network technologies are specified for containerized
26 VNFs. This draft compares the state of the art in container
27 networking architecture with the networking architecture of VM-based
28 virtualized systems, and provides several test scenarios for
29 benchmarking network performance in containerized infrastructures.
31 Status of This Memo
33 This Internet-Draft is submitted in full conformance with the
34 provisions of BCP 78 and BCP 79.
36 Internet-Drafts are working documents of the Internet Engineering
37 Task Force (IETF). Note that other groups may also distribute
38 working documents as Internet-Drafts. The list of current Internet-
39 Drafts is at https://datatracker.ietf.org/drafts/current/.
41 Internet-Drafts are draft documents valid for a maximum of six months
42 and may be updated, replaced, or obsoleted by other documents at any
43 time. It is inappropriate to use Internet-Drafts as reference
44 material or to cite them other than as "work in progress."
46 This Internet-Draft will expire on 2 September 2022.
48 Copyright Notice
50 Copyright (c) 2022 IETF Trust and the persons identified as the
51 document authors. All rights reserved.
53 This document is subject to BCP 78 and the IETF Trust's Legal
54 Provisions Relating to IETF Documents (https://trustee.ietf.org/
55 license-info) in effect on the date of publication of this document.
56 Please review these documents carefully, as they describe your rights
57 and restrictions with respect to this document. Code Components
58 extracted from this document must include Revised BSD License text as
59 described in Section 4.e of the Trust Legal Provisions and are
60 provided without warranty as described in the Revised BSD License.
62 Table of Contents
64 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
65 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4
66 3. Containerized Infrastructure Overview . . . . . . . . . . . . 4
67 4. Networking Models in Containerized Infrastructure . . . . . . 8
68 4.1. Kernel-space vSwitch Model . . . . . . . . . . . . . . . 9
69 4.2. User-space vSwitch Model . . . . . . . . . . . . . . . . 10
70 4.3. eBPF Acceleration Model . . . . . . . . . . . . . . . . . 10
71 4.4. Smart-NIC Acceleration Model . . . . . . . . . . . . . . 12
72 4.5. Model Combination . . . . . . . . . . . . . . . . . . . . 13
73 5. Performance Impacts . . . . . . . . . . . . . . . . . . . . . 14
74 5.1. CPU Isolation / NUMA Affinity . . . . . . . . . . . . . . 14
75 5.2. Hugepages . . . . . . . . . . . . . . . . . . . . . . . . 15
76 5.3. Service Function Chaining . . . . . . . . . . . . . . . . 15
77 5.4. Additional Considerations . . . . . . . . . . . . . . . . 16
78 6. Security Considerations . . . . . . . . . . . . . . . . . . . 16
79 7. References . . . . . . . . . . . . . . . . . . . . . . . . . 16
80 7.1. Informative References . . . . . . . . . . . . . . . . . 16
81 Appendix A. Benchmarking Experience(Contiv-VPP) . . . . . . . . 18
82 A.1. Benchmarking Environment . . . . . . . . . . . . . . . . 18
83 A.2. Trouble shooting and Result . . . . . . . . . . . . . . . 22
84 Appendix B. Benchmarking Experience(SR-IOV with DPDK) . . . . . 23
85 B.1. Benchmarking Environment . . . . . . . . . . . . . . . . 24
86 B.2. Trouble shooting and Results . . . . . . . . . . . . . . 27
87 Appendix C. Benchmarking Experience(Multi-pod Test) . . . . . . 27
88 C.1. Benchmarking Overview . . . . . . . . . . . . . . . . . . 27
89 C.2. Hardware Configurations . . . . . . . . . . . . . . . . . 28
90 C.3. NUMA Allocation Scenario . . . . . . . . . . . . . . . . 30
91 C.4. Traffic Generator Configurations . . . . . . . . . . . . 30
92 C.5. Benchmark Results and Trouble-shootings . . . . . . . . . 30
93 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 31
95 1. Introduction
97 The Benchmarking Methodology Working Group (BMWG) has recently
98 expanded its benchmarking scope from Physical Network Functions
99 (PNFs) running on dedicated hardware systems to Network Function
100 Virtualization (NFV) infrastructure and Virtualized Network
101 Functions (VNFs). [RFC8172] describes considerations for configuring
102 NFV infrastructure and benchmarking metrics, and [RFC8204] gives
103 guidelines for benchmarking a virtual switch that connects VNFs in
104 the Open Platform for NFV (OPNFV).
106 Recently, NFV infrastructure has evolved to include a lightweight
107 virtualized platform called the containerized infrastructure, where
108 VNFs share the same host Operating System (OS) and are logically
109 isolated by using different namespaces. While the previous NFV
110 infrastructure uses a hypervisor to allocate resources for Virtual
111 Machines (VMs) and instantiate VNFs, the containerized infrastructure
112 virtualizes resources without a hypervisor, making containers very
113 lightweight and more efficient in infrastructure resource utilization
114 than the VM-based NFV infrastructure. When we consider benchmarking
115 VNFs in the containerized infrastructure, the System Under Test (SUT)
116 and Device Under Test (DUT) configurations may differ from both
117 black-box benchmarking and the VM-based NFV infrastructure described
118 in [RFC8172]. Accordingly, additional configuration parameters and
119 testing strategies may be required.
122 In the containerized infrastructure, a VNF network is implemented by
123 running both switch and router functions in the host system. For
124 example, internal communication between VNFs in the same host uses
125 the L2 bridge function, while communication with external node(s)
126 uses the L3 router function. For container networking, the host
127 system may use a virtual switch (vSwitch), but other options exist.
128 [ETSI-TST-009] describes differences in networking structure between
129 the VM-based and the containerized infrastructure. Because of these
130 differences, the deployment scenarios for testing network performance
131 described in [RFC8204] may be only partially applicable to the
132 containerized infrastructure, and other scenarios may be required.
135 This draft aims to distinguish benchmarking of the containerized
136 infrastructure from the previous benchmarking methodology of common
137 NFV infrastructure. Considering the point in [RFC8204] that the
138 virtual switch (vSwitch) is the networking principle of the
139 containerized infrastructure, this draft investigates different
140 network models based on vSwitch location and acceleration
141 technologies. At the same time, it is essential to uncover the
142 impact of different deployment configurations on the containerized
143 infrastructure, such as resource isolation, hugepages, and service
144 function chaining. Benchmarking experiences with various
145 combinations of these configurations and networking models are also
146 presented in this draft as references for setting up and benchmarking
147 a containerized infrastructure. Note that, although the detailed
148 configurations of both infrastructures differ, the new benchmarks and
149 metrics defined in [RFC8172] can be equally applied to the
150 containerized infrastructure from a generic-NFV point of view;
151 therefore, defining additional metrics or methodologies is out of scope.
153 2. Terminology
155 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
156 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
157 document are to be interpreted as described in [RFC2119]. This
158 document uses the terminology described in [RFC8172], [RFC8204],
159 and [ETSI-TST-009].
161 3. Containerized Infrastructure Overview
163 For benchmarking of the containerized infrastructure, as mentioned
164 in [RFC8172], the basic approach is to reuse existing benchmarking
165 methods developed within the BMWG. Various network function
166 specifications defined in the BMWG should still be applied to
167 containerized VNFs (C-VNFs) for performance comparison with physical
168 network functions and VM-based VNFs. A major distinction of the
169 containerized infrastructure from the VM-based infrastructure is the
170 absence of a hypervisor. Without a hypervisor, all C-VNFs share the
171 same host resources, including but not limited to computing,
172 storage, and networking resources, as well as the host Operating
173 System (OS), kernel, and libraries. These architectural differences
174 bring additional resource-management considerations for
175 benchmarking.
177 In a common containerized infrastructure, thanks to the proliferation
178 of Kubernetes, the pod is defined as a basic unit for orchestration
179 and management that can host multiple containers. Based on that,
180 [ETSI-TST-009] defines two test scenarios for container
181 infrastructure as follows.
183 o Container2Container: Communication between containers running in
184 the same pod. It can be done via shared volumes or Inter-Process
185 Communication (IPC).
187 o Pod2Pod: Communication between containers running in different
188 pods.
190 As mentioned in [RFC8204], the vSwitch is also an important aspect
191 of the containerized infrastructure. For Pod2Pod communication,
192 every pod has only one virtual Ethernet (vETH) interface, which is
193 connected to the vSwitch via a vETH pair for each container. Not
194 only the Pod2Pod scenario but also a Pod2External scenario, in which
195 a pod communicates with an external node, is required. In this
196 case, the vSwitch SHOULD support gateway and Network Address
197 Translation (NAT) functionalities.
198 Figure 1 briefly shows the differences in network architectures
199 based on container deployment models. On bare metal, C-VNFs can be
200 deployed as a cluster called a POD by Kubernetes; otherwise, each
201 C-VNF can be deployed separately using Docker. In the former case,
202 there is only one external network interface, even if a POD contains
203 more than one C-VNF. An additional deployment model considers a
204 scenario where C-VNFs or PODs are running on a VM. In this draft,
205 we define new terminology: BMP, a Pod running on bare metal, and
206 VMP, a Pod running on a VM.
208 +---------------------------------------------------------------------+
209 | Baremetal Node |
210 | +--------------+ +--------------+ +-------------- + +-------------+ |
211 | | | | POD | | VM | | VM | |
212 | | | |+------------+| |+-------------+| | +-------+ | |
213 | | C-VNF(A) | || C-VNFs(B) || || C-VNFs(C) || | |PODs(D)| | |
214 | | | |+------------+| |+-----^-------+| | +---^---+ | |
215 | | | | | | | | | | | |
216 | | +------+ | | +------+ | | +--v---+ | | +---v--+ | |
217 | +---| veth |---+ +---| veth |---+ +---|virtio|----+ +--|virtio|---+ |
218 | +--^---+ +---^--+ +--^---+ +---^--+ |
219 | | | | | |
220 | | | +--v---+ +---v--+ |
221 | +------|-----------------|------------|vhost |---------|vhost |---+ |
222 | | | | +--^---+ +---^--+ | |
223 | | | | | | | |
224 | | +--v---+ +---v--+ +--v---+ +---v--+ | |
225 | | +-| veth |---------| veth |---------| Tap |---------| Tap |-+ | |
226 | | | +--^---+ +---^--+ +--^---+ +---^--+ | | |
227 | | | | | vSwitch | | | | |
228 | | | +--|-----------------|---------------|-----------------|--+ | | |
229 | | +-| | | Bridge | | |-+ | |
230 | | +--|-----------------|---------------|-----------------|--+ | |
231 | | | +---------+ | +--|-----------------|---+ | |
232 | | | |Container| | | | Hypervisor | | | |
233 | | | | Engine | | | | | | | |
234 | | | +---------+ | +--|-----------------|---+ | |
235 | | | | Host Kernel | | | |
236 | +------|-----------------|---------------|-----------------|------+ |
237 | +--v-----------------v---------------v-----------------v--+ |
238 +-----| physical network |-----+
239 +---------------------------------------------------------+
241 Figure 1: Examples of Networking Architecture based on Deployment
242 Models - (A)C-VNF on Baremetal (B)Pod on Baremetal(BMP) (C)C-VNF
243 on VM (D)Pod on VM(VMP)
245 [ETSI-TST-009] describes data plane test scenarios in a single
246 host. That document defines two scenarios for the containerized
247 infrastructure: Container2Container, which is internal
248 communication between two containers in the same Pod, and the
249 Pod2Pod model, which is communication between two containers
250 running in different Pods. Using our new terminology, we can call
251 the Pod2Pod model the BMP2BMP scenario. When we consider containers
252 running on VMs as an additional deployment option, there can be
253 more single-host test scenarios as follows:
255 o BMP2VMP scenario
257 +---------------------------------------------------------------------+
258 | HOST +-----------------------------+ |
259 | |VM +-------------------+ | |
260 | | | C-VNF | | |
261 | +--------------------+ | | +--------------+ | | |
262 | | C-VNF | | | | Logical Port | | | |
263 | | +--------------+ | | +-+--^-------^---+--+ | |
264 | | | Logical Port | | | +----|-------|---+ | |
265 | +-+--^-------^---+---+ | | Logical Port | | |
266 | | | +---+----^-------^---+--------+ |
267 | | | | | |
268 | +----v-------|----------------------------|-------v-------------+ |
269 | | l----------------------------l | |
270 | | Data Plane Networking | |
271 | | (Kernel or User space) | |
272 | +----^--------------------------------------------^-------------+ |
273 | | | |
274 | +----v------+ +----v------+ |
275 | | Phy Port | | Phy Port | |
276 | +-----------+ +-----------+
277 +-------^--------------------------------------------^----------------+
278 | |
279 +-------v--------------------------------------------v----------------+
280 | |
281 | Traffic Generator |
282 | |
283 +---------------------------------------------------------------------+
285 Figure 2: Single Host Test Scenario - BMP2VMP
287 o VMP2VMP scenario
289 +---------------------------------------------------------------------+
290 | HOST |
291 | +-----------------------------+ +-----------------------------+ |
292 | |VM +-------------------+ | |VM +-------------------+ | |
293 | | | C-VNF | | | | C-VNF | | |
294 | | | +--------------+ | | | | +--------------+ | | |
295 | | | | Logical Port | | | | | | Logical Port | | | |
296 | | +-+--^-------^---+--+ | | +-+--^-------^---+--+ | |
297 | | +----|-------|---+ | | +----|-------|---+ | |
298 | | | Logical Port | | | | Logical Port | | |
299 | +---+----^-------^---+--------+ +---+----^-------^---+--------+ |
300 | | | | | |
301 | +--------v-------v------------------------|-------v-------------+ |
302 | | l------------------------l | |
303 | | Data Plane Networking | |
304 | | (Kernel or User space) | |
305 | +----^--------------------------------------------^-------------+ |
306 | | | |
307 | +----v------+ +----v------+ |
308 | | Phy Port | | Phy Port | |
309 | +-----------+ +-----------+ |
310 +-------^--------------------------------------------^----------------+
311 | |
312 +-------v--------------------------------------------v----------------+
313 | |
314 | Traffic Generator |
315 | |
316 +---------------------------------------------------------------------+
318 Figure 3: Single Host Test Scenario - VMP2VMP
320 4. Networking Models in Containerized Infrastructure
322 Container networking services are provided as network plugins.
323 Basically, by using them, network services are deployed in an
324 environment isolated from the container runtime through the host
325 namespace; the plugin creates a virtual interface and allocates an
326 interface and IP address to the C-VNF. Since the containerized
327 infrastructure's network architecture differs depending on the
328 plugins it uses, it is necessary to specify the plugin used in the
329 infrastructure. For Kubernetes infrastructure in particular,
330 several Container Network Interface (CNI) plugins have been
331 developed; they describe network configuration files in JSON format
332 and are instantiated in new namespaces. When a CNI plugin is
333 initiated, it pushes forwarding rules and networking policies to an
334 existing vSwitch (e.g., Linux bridge, Open vSwitch) or creates its
335 own switch functions to provide networking services.
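For illustration, a minimal CNI network configuration file of the kind described above might look like the following JSON. The plugin type, bridge name, and subnet are hypothetical example values chosen for this sketch, not a recommended configuration.

```python
import json

# Minimal, illustrative CNI network configuration for a bridge-type
# plugin.  All field values are examples only.
cni_conf = {
    "cniVersion": "0.4.0",           # CNI specification version
    "name": "example-pod-network",   # arbitrary network name
    "type": "bridge",                # CNI plugin binary to invoke
    "bridge": "cni0",                # host bridge (vSwitch) device
    "ipam": {                        # IP address management section
        "type": "host-local",
        "subnet": "10.244.0.0/16",
    },
}

print(json.dumps(cni_conf, indent=2))
```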
337 The container network model can be classified according to the
338 location of the vSwitch component. Some CNI plugins provide
339 networking without vSwitch components; however, this draft focuses
340 on plugins that use vSwitch components.
342 4.1. Kernel-space vSwitch Model
344 +------------------------------------------------------------------+
345 | User Space |
346 | +-----------+ +-----------+ |
347 | | C-VNF | | C-VNF | |
348 | | +-------+ | | +-------+ | |
349 | +-| eth |-+ +-| eth |-+ |
350 | +---^---+ +---^---+ |
351 | | | |
352 | | +----------------------------------+ | |
353 | | | | | |
354 | | | Networking Controller / Agent | | |
355 | | | | | |
356 | | +-----------------^^---------------+ | |
357 ----------|-----------------------||---------------------|----------
358 | +---v---+ || +---v---+ |
359 | +--| veth |-------------------vv-----------------| veth |--+ |
360 | | +-------+ vSwitch Component +-------+ | |
361 | | (OVS Kernel Datapath, Linux Bridge, ..) | |
362 | | | |
363 | +-------------------------------^----------------------------+ |
364 | | |
365 | Kernel Space +-----------v----------+ |
366 +----------------------| NIC |--------------------+
367 +----------------------+
369 Figure 4: Examples of Kernel-Space vSwitch Model
371 Figure 4 shows the kernel-space vSwitch model. In this model,
372 because the vSwitch component runs in kernel space, data packets
373 must be processed in the network stack of the host kernel before
374 being transferred to the C-VNF running in user space. Not only
375 Pod2External but also Pod2Pod traffic is processed in kernel space.
376 For dynamic networking configuration, forwarding policies can be
377 pushed by the controller/agent located in user space. In the case
378 of Open vSwitch (OVS) [OVS], the first packet of a flow can be sent
379 to the user-space agent (ovs-vswitchd) for a forwarding decision.
380 Kernel-space vSwitch models are listed below:
382 o Docker Network [Docker-network], Flannel Network [Flannel],
383 OVS (Open vSwitch) [OVS], OVN (Open Virtual Network) [OVN]
385 4.2. User-space vSwitch Model
387 +------------------------------------------------------------------+
388 | User Space |
389 | +---------------+ +---------------+ |
390 | | C-VNF | | C-VNF | |
391 | | +-----------+ | +-----------------+ | +-----------+ | |
392 | | |virtio-user| | | Networking | | |virtio-user|-| |
393 | +-| / eth |-+ | Controller/Agent| +-| / eth |-+ |
394 | +-----^-----+ +-------^^--------+ +-----^-----+ |
395 | | || | |
396 | | || | |
397 | +-----v-----+ || +-----v-----+ |
398 | | vhost-user| || | vhost-user| |
399 | +--| / memif |--------------vv--------------| / memif |--+ |
400 | | +-----------+ +-----------+ | |
401 | | vSwitch | |
402 | | +--------------+ | |
403 | +----------------------| PMD Driver |----------------------+ |
404 | | | |
405 | +-------^------+ |
406 ----------------------------------|---------------------------------
407 | | |
408 | | |
409 | | |
410 | Kernel Space +----------V-----------+ |
411 +----------------------| NIC |--------------------+
412 +----------------------+
414 Figure 5: Examples of User-Space vSwitch Model
416 Figure 5 shows the user-space vSwitch model, in which data packets
417 from the physical network port bypass kernel processing and are
418 delivered directly to the vSwitch running in user space. This model
419 is commonly considered a Data Plane Acceleration (DPA) technology
420 since it can achieve higher-rate packet processing than a
421 kernel-space network, whose packet throughput is limited. To bypass
422 the kernel and transfer packets directly to the vSwitch, the Data
423 Plane Development Kit (DPDK) is essentially required. With DPDK, an
424 additional driver called the Poll Mode Driver (PMD) is created in
425 the vSwitch; a PMD must be created for each NIC separately.
426 User-space vSwitch models are listed below:
428 o OVS-DPDK [ovs-dpdk], VPP [vpp]
430 4.3. eBPF Acceleration Model
431 +------------------------------------------------------------------+
432 | User Space |
433 | +----------------+ +----------------+ |
434 | | C-VNF | | C-VNF | |
435 | | +------------+ | | +------------+ | |
436 | +-| veth |-+ +-| veth |-+ |
437 | +-----^------+ +------^-----+ |
438 | | | |
439 -------------|---------------------------------------|--------------
440 | +-----v------+ +------v-----+ |
441 | | veth | | veth | |
442 | +-----^------+ +------^-----+ |
443 | | | |
444 | +-----v------+ +------v-----+ |
445 | | tc eBPF | | tc eBPF | |
446 | | Egress | | Egress | |
447 | +-----^------+ +------^-----+ |
448 | | | |
449 | +--------------+ +-------------+ |
450 | | | |
451 | +-v----------v-+ |
452 | | tc eBPF | |
453 | | Ingress | |
454 | +------^-------+ |
455 | | |
456 | +------v-------+ |
457 | | XDP | |
458 | +------^-------+ |
459 | | |
460 | Kernel Space +--------v--------+ |
461 +-----------------------| NIC |------------------------+
462 +-----------------+
464 Figure 6: Examples of eBPF Acceleration Model
466 Figure 6 shows the eBPF acceleration model, which leverages extended
467 Berkeley Packet Filter (eBPF) technology [eBPF] to achieve high-
468 performance packet processing. eBPF enables the execution of
469 sandboxed programs inside abstract virtual machines within the
470 Linux kernel without changing the kernel source code or loading
471 kernel modules. To accelerate data plane performance, eBPF programs
472 are attached to different BPF hooks inside the Linux kernel stack.
474 One type of BPF hook is the eXpress Data Path (XDP) at the
475 networking driver; it is the first hook that triggers an eBPF
476 program upon packet reception from the external network. The other
477 type is the Traffic Control Ingress/Egress eBPF hook (tc eBPF),
478 attached to the vETH pair of the pod. The tc Egress eBPF hook at
479 the vETH pair enforces policy on all traffic exiting the pod, while
480 the tc Ingress eBPF hook at the end of the kernel networking stack
481 runs after initial packet processing from the XDP hook.
483 On the egress datapath side, whenever a packet exits the pod, it
484 goes through the vETH pair and is then picked up by the tc Egress
485 eBPF hook. This hook triggers eBPF programs that forward the packet
486 directly to the external-facing network interface, bypassing kernel
487 network-layer processing such as iptables. On the ingress datapath
488 side, eBPF programs at the XDP and tc Ingress eBPF hooks pick up
489 packets from the network device and deliver them directly to the
490 vETH interface pair or, in the case of the Cilium project [Cilium],
491 bypass the context switch into the pod network namespace.
493 Notable eBPF acceleration models include two CNI plugin projects:
494 Calico [Calico] and Cilium [Cilium]. In the case of Cilium, the
495 eBPF/XDP program can be offloaded directly onto a smart-NIC card,
496 which allows data plane acceleration without using the CPU.
497 Container network performance of these eBPF-based projects is
498 reported in [cilium-benchmark].
500 4.4. Smart-NIC Acceleration Model
502 +------------------------------------------------------------------+
503 | User Space |
504 | +-----------------+ +-----------------+ |
505 | | C-VNF | | C-VNF | |
506 | | +-------------+ | | +-------------+ | |
507 | +-| vf driver |-+ +-| vf driver |-+ |
508 | +-----^-------+ +------^------+ |
509 | | | |
510 -------------|---------------------------------------|--------------
511 | +---------+ +---------+ |
512 | +------|-------------------|------+ |
513 | | +----v-----+ +-----v----+ | |
514 | | | virtual | | virtual | | |
515 | | | function | | function | | |
516 | Kernel Space | +----^-----+ NIC +-----^----+ | |
517 +---------------| | | |----------------+
518 | +----v-------------------v----+ |
519 | | Classify and Queue | |
520 | +-----------------------------+ |
521 +---------------------------------+
523 Figure 7: Examples of Smart-NIC Acceleration Model
525 Figure 7 shows the Smart-NIC acceleration model, which does not use
526 a vSwitch component. This model can be divided into two
527 technologies.
529 One is Single-Root I/O Virtualization (SR-IOV) [SR-IOV], an
530 extension of the PCIe specification that enables multiple partitions
531 running simultaneously within a system to share PCIe devices. In
532 the NIC, there are virtual replicas of PCI functions known as
533 virtual functions (VFs), each of which is directly connected to a
534 container's network interface. Using SR-IOV, data packets from the
535 external network bypass both kernel and user space and are forwarded
536 directly to the container's virtual network interface.
538 The other technology is offloading eBPF/XDP programs to the
539 Smart-NIC card, as mentioned in the previous section. It enables
540 generic acceleration of eBPF: eBPF programs are attached to XDP and
541 run on the Smart-NIC card, which allows server CPUs to perform more
542 application-level work. However, not all Smart-NIC cards support
543 eBPF/XDP offloading.
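When setting up an SR-IOV benchmark, the number of VFs enabled on a physical function can be checked from sysfs. The helper below is a sketch assuming standard Linux sysfs paths; the function name is ours, not taken from any tool referenced in this draft.

```python
import os

def sriov_vf_count(pf_ifname: str, sysfs: str = "/sys") -> int:
    """Number of SR-IOV virtual functions enabled on a physical function.

    Reads the standard Linux sysfs attribute 'sriov_numvfs'; returns 0
    when the interface does not support (or has not enabled) SR-IOV.
    """
    path = os.path.join(sysfs, "class/net", pf_ifname, "device/sriov_numvfs")
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return 0
```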
545 4.5. Model Combination
547 +-------------------------------------------------------+
548 | User Space |
549 | +--------------------+ +--------------------+ |
550 | | C-VNF | | C-VNF | |
551 | | +------+ +------+ | | +------+ +------+ | |
552 | +-| veth |--| veth |-+ +-| veth |--| veth |-+ |
553 | +---^--+ +---^--+ +--^---+ +---^--+ |
554 | | | | | |
555 | | | | | |
556 | | +---v--------+ +-------v----+ | |
557 | | | vhost-user | | vhost-user | | |
558 | | +--| / memif |--| / memif |--+ | |
559 | | | +------------+ +------------+ | | |
560 | | | vSwitch | | |
561 | | +----------------------------------+ | |
562 | | | |
563 --------|----------------------------------------|-------
564 | +-----------+ +-------------+ |
565 | +----|--------------|---+ |
566 | |+---v--+ +---v--+| |
567 | || vf | | vf || |
568 | |+------+ +------+| |
569 | Kernel Space | | |
570 +--------------| NIC |----------------+
571 +-----------------------+
573 Figure 8: Examples of Model Combination deployment
575 Figure 8 shows the networking model that combines the user-space
576 vSwitch model and the Smart-NIC acceleration model. This model is
577 frequently considered in service function chaining scenarios where
578 two different types of traffic flows are present: North/South
579 traffic and East/West traffic.
581 North/South traffic is traffic in which packets are received from
582 other servers and routed through a VNF. For this traffic type, a
583 Smart-NIC model such as SR-IOV is preferred because packets always
584 have to pass through the NIC; user-space vSwitch involvement in
585 North/South traffic would create additional bottlenecks. East/West
586 traffic, on the other hand, is traffic sent and received between
587 containers deployed in the same server, and it can pass through
588 multiple containers. For this type, user-space vSwitch models such
589 as OVS-DPDK and VPP are preferred because packets are routed within
590 user space only and do not pass through the NIC.
592 The throughput advantages of these networking models under different
593 traffic-direction cases are reported in [Intel-SRIOV-NFV].
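The traffic-direction rule of thumb above can be summarized in a short sketch (our own illustrative helper, not a normative selection algorithm):

```python
def preferred_datapath(src_external: bool, dst_external: bool) -> str:
    """Rule of thumb from the combined model: North/South traffic (any
    external endpoint) favors SR-IOV because it must cross the NIC,
    while East/West traffic stays inside the user-space vSwitch."""
    if src_external or dst_external:
        return "SR-IOV"
    return "user-space vSwitch"

print(preferred_datapath(True, False))   # North/South -> SR-IOV
print(preferred_datapath(False, False))  # East/West   -> user-space vSwitch
```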
595 5. Performance Impacts
597 5.1. CPU Isolation / NUMA Affinity
599 CPU pinning enables benefits such as maximizing cache utilization,
600 eliminating operating-system thread-scheduling overhead, and
601 coordinating network I/O by guaranteeing resources. This technology
602 is very effective in avoiding the "noisy neighbor" problem, as
603 already proven in existing experience [Intel-EPA].
605 With NUMA, performance increases not only for CPU and memory but
606 also for networking, since the PCIe slot to which the network
607 interface is connected has locality to a specific NUMA node. Using
608 NUMA requires a strong understanding of the VNF's memory
609 requirements: if a VNF uses more memory than a single NUMA node
610 contains, overhead will occur because memory spills over to another
611 NUMA node. Network performance can change depending on whether the
612 CNF is attached to the same NUMA node as the physical network
613 interface. There is benchmarking experience of cross-NUMA
614 performance impacts [ViNePERF]. Those tests measured cross-NUMA
615 performance in 3 scenarios depending on the location of the traffic
616 generator and the traffic endpoint. The results verified the
617 following:
619 o Performance degradation when a single NUMA node serves multiple
620 interfaces is worse than cross-NUMA-node performance degradation
621 o Performance is worse when a VNF shares CPUs across NUMA nodes
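When validating NUMA affinity in a test setup, the NUMA node of the NIC's PCIe slot can be read from sysfs and compared with the node hosting the CNF's pinned CPUs. The sketch below assumes standard Linux sysfs paths; the helper names are ours, introduced only for illustration.

```python
import os

def nic_numa_node(ifname: str, sysfs: str = "/sys") -> int:
    """NUMA node of the PCIe slot a NIC is attached to (-1 when the
    platform does not expose NUMA topology for this device)."""
    path = os.path.join(sysfs, "class/net", ifname, "device/numa_node")
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return -1

def is_numa_local(nic_node: int, cnf_cpu_node: int) -> bool:
    """True when the CNF's pinned CPUs and the NIC share a NUMA node."""
    return nic_node >= 0 and nic_node == cnf_cpu_node
```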
623 5.2. Hugepages
625 Hugepages configure a large memory page size to reduce the
626 Translation Lookaside Buffer (TLB) miss rate and increase
627 application performance. This improves the performance of
628 logical/virtual-to-physical address lookups performed by a CPU's
629 memory management unit, and overall system performance. In the
630 containerized infrastructure, the container is isolated at the
631 application level, and administrators can set hugepages at a more
632 granular level (e.g., Kubernetes allows the use of 512 MB hugepages
633 for a container as a default value). Moreover, these pages are
634 dedicated to the application and not shared with other processes,
635 so the application can use them more efficiently. From a network
636 benchmark point of view, however, the impact on general packet
637 processing can be relatively negligible, and it may be necessary to
638 consider the application level as well when measuring the impact.
639 In the case of a DPDK application, as reported in [Intel-EPA],
640 hugepages were verified to improve network performance because the
641 packet handling processes run inside the application.
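The TLB argument above can be illustrated with simple arithmetic: mapping the same buffer with larger pages requires far fewer TLB entries. A minimal Python sketch (the 1 GiB buffer size is only an example):

```python
def pages_needed(buffer_bytes, page_bytes):
    """Number of pages (and thus TLB entries) needed to map a buffer."""
    return -(-buffer_bytes // page_bytes)  # ceiling division

GIB = 1 << 30
buf = 1 * GIB                               # e.g., a packet-buffer pool
small = pages_needed(buf, 4 * 1024)         # 4 KB base pages
huge = pages_needed(buf, 2 * 1024 * 1024)   # 2 MB hugepages

print(small)  # 262144 entries with 4 KB pages
print(huge)   # 512 entries with 2 MB hugepages
```

With roughly 512x fewer entries to map the same memory, the working set is far more likely to fit in the TLB, which is the mechanism behind the performance gain described above.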
643 5.3. Service Function Chaining
645 When we consider benchmarking for containerized and VM-based
646 infrastructure and network functions, benchmarking scenarios may
647 contain various operational use cases. Traditional black-box
648 benchmarking focuses on measuring the in-out performance of packets
649 from physical network ports since the hardware is tightly coupled
650 with its function and only a single function is running on its
651 dedicated hardware. However, in the NFV environment, the physical
652 network port will commonly be connected to multiple VNFs (i.e.,
653 Multiple PVP test setup architectures were described in
654 [ETSI-TST-009]) rather than dedicated to a single VNF. This scenario
655 is called Service Function Chaining. Therefore, benchmarking
656 scenarios should reflect operational considerations such as the
657 number of VNFs or network services defined by a set of VNFs in a
658 single host. [service-density] proposed a way for measuring the
659 performance of multiple NFV service instances at a varied service
660 density on a single host, which is one example of these operational
661 benchmarking aspects. Another aspect that should be considered in
662 benchmarking service function chaining scenarios is the different
663 network acceleration technologies. Performance differences may occur
664 because of different traffic patterns based on the provided
665 acceleration method.
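For instance, the operational parameters above (chain length, acceleration method, and service density) can be combined into a benchmark scenario matrix. The following Python sketch is illustrative only; the particular values and labels are assumptions, not prescribed by this document:

```python
from itertools import product

# Illustrative values only: actual chain lengths, acceleration
# technologies, and densities depend on the operational use cases.
chain_lengths = [1, 2, 4]                        # VNFs per service chain
accelerations = ["kernel-veth", "vhost-user", "memif", "SR-IOV"]
densities = [1, 2, 4]                            # chains per host

# Enumerate every combination as one benchmarking scenario.
scenarios = [
    {"vnfs_per_chain": n, "acceleration": acc, "chains_per_host": d}
    for n, acc, d in product(chain_lengths, accelerations, densities)
]

print(len(scenarios))  # 36 scenario combinations
```

Enumerating the matrix up front makes it explicit which operational conditions a benchmarking campaign does and does not cover.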
667 5.4. Additional Considerations
669 Apart from the single-host test scenario, the multi-hosts scenario
670 should also be considered in container network benchmarking, where
671 container services are deployed across different servers. To provide
672 network connectivity for container-based VNFs between different
673 server nodes, inter-node networking is required. According to
674 [ETSI-NFV-IFA-038], there are several technologies to enable inter-
675 node networking: overlay technologies using a tunnel endpoint (e.g.,
676 VXLAN, IP in IP), routing using Border Gateway Protocol (BGP), layer
677 2 underlay, direct network using dedicated NIC for each pod, or load
678 balancer using LoadBalancer service type in Kubernetes. Different
679 protocols from these technologies may cause performance differences
680 in container networking.
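As one concrete example of such differences, tunnel encapsulation reduces the payload available per frame. For VXLAN over IPv4, the added headers (outer Ethernet 14 bytes, outer IPv4 20 bytes, UDP 8 bytes, VXLAN 8 bytes) total 50 bytes. A minimal Python sketch of the resulting goodput effect (ignoring the inter-frame gap and preamble):

```python
def goodput_ratio(frame_payload, overhead):
    """Fraction of link capacity left for the inner frame after
    encapsulation overhead is added (inter-frame gap and preamble
    are ignored for simplicity)."""
    return frame_payload / (frame_payload + overhead)

# VXLAN over IPv4: outer Ethernet (14) + IPv4 (20) + UDP (8) + VXLAN (8)
VXLAN_OVERHEAD = 14 + 20 + 8 + 8  # 50 bytes

for size in (64, 512, 1518):
    r = goodput_ratio(size, VXLAN_OVERHEAD)
    print(f"{size:>5} B inner frame -> {r:.1%} of line rate")
```

The penalty is largest for small frames, which is one reason throughput results for overlay-based inter-node networking depend strongly on the tested frame size.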
682 6. Security Considerations
684 TBD
686 7. References
688 7.1. Informative References
690 [Calico] "Project Calico", July 2019,
691 .
693 [Cilium] "Cilium Documentation", March 2022,
694 .
696 [cilium-benchmark]
697 Cilium, "CNI Benchmark: Understanding Cilium Network
698 Performance", May 2021,
699 .
701 [Docker-network]
702 "Docker, Libnetwork design", July 2019,
703 .
705 [DPDK_eBPF]
706 "DPDK-Berkeley Packet Filter Library", August 2021,
707 .
709 [eBPF] "eBPF, extended Berkeley Packet Filter", July 2019,
710 .
712 [ETSI-NFV-IFA-038]
713 "Network Functions Virtualisation (NFV) Release 4;
714 Architectural Framework; Report on network connectivity
715 for container-based VNF", November 2021.
717 [ETSI-TST-009]
718 "Network Functions Virtualisation (NFV) Release 3;
719 Testing; Specification of Networking Benchmarks and
720 Measurement Methods for NFVI", October 2018.
722 [Flannel] "flannel 0.10.0 Documentation", July 2019,
723 .
725 [Intel-EPA]
726 Intel, "Enhanced Platform Awareness in Kubernetes", 2018,
727 .
730 [Intel-SRIOV-NFV]
731 Patrick, K. and J. Brian, "SR-IOV for NFV Solutions
732 Practical Considerations and Thoughts", February 2017.
734 [OVN] "How to use Open Virtual Networking with Kubernetes", July
735 2019, .
737 [OVS] "Open Virtual Switch", July 2019,
738 .
740 [ovs-dpdk] "Open vSwitch with DPDK", July 2019,
741 .
744 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
745 Requirement Levels", RFC 2119, March 1997,
746 .
748 [RFC8172] Morton, A., "Considerations for Benchmarking Virtual
749 Network Functions and Their Infrastructure", RFC 8172,
750 July 2017, .
752 [RFC8204] Tahhan, M., O'Mahony, B., and A. Morton, "Benchmarking
753 Virtual Switches in the Open Platform for NFV (OPNFV)",
754 RFC 8204, September 2017,
755 .
757 [service-density]
758 Konstantynowicz, M. and P. Mikus, "NFV Service Density
759 Benchmarking", March 2019, .
762 [SR-IOV] "SRIOV for Container-networking", July 2019,
763 .
765 [userspace-cni]
766 "Userspace CNI Plugin", August 2021,
767 .
769 [ViNePERF] Anuket Project, "Cross-NUMA performance measurements with
770 VSPERF", March 2019, .
773 [vpp] "VPP with Containers", July 2019, .
776 Appendix A. Benchmarking Experience (Contiv-VPP)
778 A.1. Benchmarking Environment
780 In this test, our purpose was to measure the performance of the
781 user-space-based model for container infrastructure and figure out
782 the relationship between resource allocation and network
783 performance. With respect to this, we set up Contiv-VPP, one of the
784 user-space-based network solutions for container infrastructure,
785 and tested as described below.
787 o Three physical servers for benchmarking
789 +-------------------+----------------------+--------------------------+
790 | Node Name | Specification | Description |
791 +-------------------+----------------------+--------------------------+
792 | Container Control |- Intel(R) Xeon(R) | Container Deployment |
793 | for Master | CPU E5-2690 | and Network Allocation |
794 | | (2Socket X 12Core) |- ubuntu 18.04 |
795 | |- MEM 128G |- Kubernetes Master |
796 | |- DISK 2T |- CNI Controller |
797 | |- Control plane : 1G |.. Contiv-VPP Controller |
798 | | |.. Contiv-VPP Agent |
799 +-------------------+----------------------+--------------------------+
800 | Container Service |- Intel(R) Xeon(R) | Container Service |
801 | for Worker | Gold 6148 |- ubuntu 18.04 |
802 | | (2socket X 20Core) |- Kubernetes Worker |
803 | |- MEM 128G |- CNI Agent |
804 | |- DISK 2T |.. Contiv-VPP Agent |
805 | |- Control plane : 1G | |
806 | |- Data plane : MLX 10G| |
807 | | (1NIC 2PORT) | |
808 +-------------------+----------------------+--------------------------+
809 | Packet Generator |- Intel(R) Xeon(R) | Packet Generator |
810 | | CPU E5-2690 |- CentOS 7 |
811 | | (2Socket X 12Core) |- installed Trex 2.4 |
812 | |- MEM 128G | |
813 | |- DISK 2T | |
814 | |- Control plane : 1G | |
815 | |- Data plane : MLX 10G| |
816 | | (1NIC 2PORT) | |
817 +-------------------+----------------------+--------------------------+
819 Figure 9: Test Environment-Server Specification
821 o The architecture of benchmarking
822 +----+ +--------------------------------------------------------+
823 | | | Containerized Infrastructure Master Node |
824 | | | +-----------+ |
825 | <-------> 1G PORT 0 | |
826 | | | +-----------+ |
827 | | +--------------------------------------------------------+
828 | |
829 | | +--------------------------------------------------------+
830 | | | Containerized Infrastructure Worker Node |
831 | | | +---------------------------------+ |
832 | s | | +-----------+ | +------------+ +------------+ | |
833 | w <-------> 1G PORT 0 | | | 10G PORT 0 | | 10G PORT 1 | | |
834 | i | | +-----------+ | +------^-----+ +------^-----+ | |
835 | t | | +--------|----------------|-------+ |
836 | c | +-----------------------------|----------------|---------+
837 | h | | |
838 | | +-----------------------------|----------------|---------+
839 | | | Packet Generator Node | | |
840 | | | +--------|----------------|-------+ |
841 | | | +-----------+ | +------v-----+ +------v-----+ | |
842 | <-------> 1G PORT 0 | | | 10G PORT 0 | | 10G PORT 1 | | |
843 | | | +-----------+ | +------------+ +------------+ | |
844 | | | +---------------------------------+ |
845 | | | |
846 +----+ +--------------------------------------------------------+
848 Figure 10: Test Environment-Architecture
850 o Network model of Containerized Infrastructure (User-space Model)
851 +---------------------------------------------+---------------------+
852 | NUMA 0 | NUMA 0 |
853 +---------------------------------------------|---------------------+
854 | Containerized Infrastructure Worker Node | |
855 | +---------------------------+ | +----------------+ |
856 | | POD1 | | | POD2 | |
857 | | +-------------+ | | | +-------+ | |
858 | | | | | | | | | | |
859 | | +--v---+ +---v--+ | | | +-v--+ +-v--+ | |
860 | | | eth1 | | eth2 | | | | |eth1| |eth2| | |
861 | | +--^---+ +---^--+ | | | +-^--+ +-^--+ | |
862 | +------|-------------|------+ | +---|-------|----+ |
863 | +--- | | | | |
864 | | +-------|---------------|------+ | |
865 | | | | +------|--------------+ |
866 | +----------|--------|-------|--------|----+ | |
867 | | v v v v | | |
868 | | +-tap10--tap11-+ +-tap20--tap21-+ | | |
869 | | | ^ ^ | | ^ ^ | | | |
870 | | | | VRF1 | | | | VRF2 | | | | |
871 | | +--|--------|--+ +--|--------|--+ | | |
872 | | | +-----+ | +---+ | | |
873 | | +-tap01--|--|-------------|----|---+ | | |
874 | | | +------v--v-+ VRF0 +----v----v-+ | | | |
875 | | +-| 10G ETH0/0|------| 10G ETH0/1|-+ | | |
876 | | +---^-------+ +-------^---+ | | |
877 | | +---v-------+ +-------v---+ | | |
878 | +---| DPDK PMD0 |------| DPDK PMD1 |------+ | |
879 | +---^-------+ +-------^---+ | User Space |
880 +---------|----------------------|------------|---------------------+
881 | +-----|----------------------|-----+ | Kernel Space |
882 +---| +---V----+ +----v---+ |------|---------------------+
883 | | PORT 0 | 10G NIC | PORT 1 | | |
884 | +---^----+ +----^---+ |
885 +-----|----------------------|-----+
886 +-----|----------------------|-----+
887 +---| +---V----+ +----v---+ |----------------------------+
888 | | | PORT 0 | 10G NIC | PORT 1 | | Packet Generator (Trex) |
889 | | +--------+ +--------+ | |
890 | +----------------------------------+ |
891 +-------------------------------------------------------------------+
893 Figure 11: Test Environment-Network Architecture
895 We set up a Contiv-VPP network to benchmark the user-space container
896 network model in the containerized infrastructure worker node. We
897 set up the network interface at NUMA0 and created different network
898 subnets, VRF1 and VRF2, to classify input and output data traffic,
899 respectively. We then assigned two interfaces connected to VRF1 and
900 VRF2, and set up a routing table to route Trex packets from the eth1
901 interface to the eth2 interface in the POD.
903 A.2. Troubleshooting and Results
905 In this environment, we confirmed that the routing table does not
906 work when we send packets using the Trex packet generator. The
907 reason is that, when a kernel-space-based network is configured, IP
908 forwarding rules are processed at the kernel stack level, while in
909 VPP the IP packet forwarding rules are processed only in VRF0, the
910 default virtual routing and forwarding table. The above testing
911 architecture causes a problem since the vrf1 and vrf2 interfaces
912 could not route packets. Based on this result, we assigned VRF0 and
913 VRF1 to the PODs, and the data flow is as shown below.
915 +---------------------------------------------+---------------------+
916 | NUMA 0 | NUMA 0 |
917 +---------------------------------------------|---------------------+
918 | Containerized Infrastructure Worker Node | |
919 | +---------------------------+ | +----------------+ |
920 | | POD1 | | | POD2 | |
921 | | +-------------+ | | | +-------+ | |
922 | | +--v----+ +---v--+ | | | +-v--+ +-v--+ | |
923 | | | eth1 | | eth2 | | | | |eth1| |eth2| | |
924 | | +--^---+ +---^--+ | | | +-^--+ +-^--+ | |
925 | +------|-------------|------+ | +---|-------|----+ |
926 | +-------+ | | | | |
927 | | +-------------|---------------|------+ | |
928 | | | | +------|--------------+ |
929 | +-----|-------|-------------|--------|----+ | |
930 | | | | v v | | |
931 | | | | +-tap10--tap11-+ | | |
932 | | | | | ^ ^ | | | |
933 | | | | | | VRF1 | | | | |
934 | | | | +--|--------|--+ | | |
935 | | | | | +---+ | | |
936 | | +-*tap00--*tap01----------|----|---+ | | |
937 | | | +-V-------v-+ VRF0 +----v----v-+ | | | |
938 | | +-| 10G ETH0/0|------| 10G ETH0/1|-+ | | |
939 | | +-----^-----+ +------^----+ | | |
940 | | +-----v-----+ +------v----+ | | |
941 | +---|*DPDK PMD0 |------|*DPDK PMD1 |------+ | |
942 | +-----^-----+ +------^----+ | User Space |
943 +-----------|-------------------|-------------|---------------------+
944 v v
945 *- CPU pinning interface
946 Figure 12: Test Environment-Network Architecture (CPU Pinning)
948 We conducted benchmarking under three conditions: a basic VPP
949 switch, general Kubernetes (no CPU pinning), and Shared
950 mode / Exclusive mode. In the basic Kubernetes environment, all
951 PODs share the host's CPUs. In Shared mode, PODs share a pool of
952 CPUs assigned to specific PODs. In Exclusive mode, a specific POD
953 has a specific CPU dedicated to its use. In Shared mode, we
954 assigned two CPUs to several PODs; in Exclusive mode, we dedicated
955 one CPU to each POD independently. The results are shown in
956 Figure 13. First, the test was conducted to figure out the line
957 rate of the VPP switch and the basic Kubernetes performance. After
958 that, we applied NUMA pinning to the network interface using Shared
959 mode and Exclusive mode on the same node and on different nodes.
960 In the Exclusive and Shared mode tests, we confirmed that Exclusive
961 mode showed better performance than Shared mode when CPUs from the
962 same NUMA node were assigned. However, we confirmed that
963 performance is reduced in the section between the VPP switch and
964 the POD, affecting the total result.
966 +--------------------+---------------------+-------------+
967 | Model | NUMA Mode (pinning)| Result(Gbps)|
968 +--------------------+---------------------+-------------+
969 | | N/A | 3.1 |
970 | Maximum Line Rate |---------------------+-------------+
971 | | same NUMA | 9.8 |
972 +--------------------+---------------------+-------------+
973 | Without CMK | N/A | 1.5 |
974 +--------------------+---------------------+-------------+
975 | | same NUMA | 4.7 |
976 | CMK-Exclusive Mode +---------------------+-------------+
977 | | Different NUMA | 3.1 |
978 +--------------------+---------------------+-------------+
979 | | same NUMA | 3.5 |
980 | CMK-shared Mode +---------------------+-------------+
981 | | Different NUMA | 2.3 |
982 +--------------------+---------------------+-------------+
984 Figure 13: Test Results
986 Appendix B. Benchmarking Experience (SR-IOV with DPDK)
987 B.1. Benchmarking Environment
989 In this test, our purpose was to measure the performance of the
990 Smart-NIC acceleration model for container infrastructure and
991 figure out the relationship between resource allocation and network
992 performance. With respect to this, we set up SR-IOV combined with
993 DPDK to bypass the kernel space in the container infrastructure and
994 tested based on that.
996 o Three physical servers for benchmarking
998 +-------------------+-------------------------+------------------------+
999 | Node Name | Specification | Description |
1000 +-------------------+-------------------------+------------------------+
1001 | Container Control |- Intel(R) Core(TM) | Container Deployment |
1002 | for Master | i5-6200U CPU | and Network Allocation |
1003 | | (1socket x 4Core) |- ubuntu 18.04 |
1004 | |- MEM 8G |- Kubernetes Master |
1005 | |- DISK 500GB |- CNI Controller |
1006 | |- Control plane : 1G | MULTUS CNI |
1007 | | | SRIOV plugin with DPDK|
1008 +-------------------+-------------------------+------------------------+
1009 | Container Service |- Intel(R) Xeon(R) | Container Service |
1010 | for Worker | E5-2620 v3 @ 2.4Ghz |- Centos 7.7 |
1011 | | (1socket X 6Core) |- Kubernetes Worker |
1012 | |- MEM 128G |- CNI Agent |
1013 | |- DISK 2T | MULTUS CNI |
1014 | |- Control plane : 1G | SRIOV plugin with DPDK|
1015 | |- Data plane : XL710-qda2| |
1016 | | (1NIC 2PORT- 40Gb) | |
1017 +-------------------+-------------------------+------------------------+
1018 | Packet Generator |- Intel(R) Xeon(R) | Packet Generator |
1019 | | Gold 6148 @ 2.4Ghz |- CentOS 7.7 |
1020 | | (2Socket X 20Core) |- installed Trex 2.4 |
1021 | |- MEM 128G | |
1022 | |- DISK 2T | |
1023 | |- Control plane : 1G | |
1024 | |- Data plane : XL710-qda2| |
1025 | | (1NIC 2PORT- 40Gb) | |
1026 +-------------------+-------------------------+------------------------+
1028 Figure 14: Test Environment-Server Specification
1030 o The architecture of benchmarking
1031 +----+ +--------------------------------------------------------+
1032 | | | Containerized Infrastructure Master Node |
1033 | | | +-----------+ |
1034 | <-------> 1G PORT 0 | |
1035 | | | +-----------+ |
1036 | | +--------------------------------------------------------+
1037 | |
1038 | | +--------------------------------------------------------+
1039 | | | Containerized Infrastructure Worker Node |
1040 | | | +---------------------------------+ |
1041 | s | | +-----------+ | +------------+ +------------+ | |
1042 | w <-------> 1G PORT 0 | | | 40G PORT 0 | | 40G PORT 1 | | |
1043 | i | | +-----------+ | +------^-----+ +------^-----+ | |
1044 | t | | +--------|----------------|-------+ |
1045 | c | +-----------------------------|----------------|---------+
1046 | h | | |
1047 | | +-----------------------------|----------------|---------+
1048 | | | Packet Generator Node | | |
1049 | | | +--------|----------------|-------+ |
1050 | | | +-----------+ | +------v-----+ +------v-----+ | |
1051 | <-------> 1G PORT 0 | | | 40G PORT 0 | | 40G PORT 1 | | |
1052 | | | +-----------+ | +------------+ +------------+ | |
1053 | | | +---------------------------------+ |
1054 | | | |
1055 +----+ +--------------------------------------------------------+
1057 Figure 15: Test Environment-Architecture
1059 o Network model of Containerized Infrastructure (User-space Model)
1060 +---------------------------------------------+---------------------+
1061 | CMK shared core | CMK exclusive core |
1062 +---------------------------------------------|---------------------+
1063 | Containerized Infrastructure Worker Node | |
1064 | +---------------------------+ | +----------------+ |
1065 | | POD1 | | | POD2 | |
1066 | | (testpmd) | | | (testpmd) | |
1067 | | +-------------+ | | | +-------+ | |
1068 | | | | | | | | | | |
1069 | | +--v---+ +---v--+ | | | +-v--+ +-v--+ | |
1070 | | | eth1 | | eth2 | | | | |eth1| |eth2| | |
1071 | | +--^---+ +---^--+ | | | +-^--+ +-^--+ | |
1072 | +------|-------------|------+ | +---|-------|----+ |
1073 | | | | | | |
1074 | +------ +-+ | | | |
1075 | | +----|-----------------|------+ | |
1076 | | | | +--------|--------------+ |
1077 | | | | | | User Space|
1078 +---------|------------|----|--------|--------|---------------------+
1079 | | | | | | |
1080 | +--+ +------| | | | |
1081 | | | | | | Kernel Space|
1082 +------|--------|-----------|--------|--------+---------------------+
1083 | +----|--------|-----------|--------|-----+ | |
1084 | | +--v--+ +--v--+ +--v--+ +--v--+ | | NIC|
1085 | | | VF0 | | VF1 | | VF2 | | VF3 | | | |
1086 | | +--|---+ +|----+ +----|+ +-|---+ | | |
1087 | +----|------|---------------|-----|------+ | |
1088 +---| +v------v+ +-v-----v+ |------|---------------------+
1089 | | PORT 0 | 40G NIC | PORT 1 | |
1090 | +---^----+ +----^---+ |
1091 +-----|----------------------|-----+
1092 +-----|----------------------|-----+
1093 +---| +---V----+ +----v---+ |----------------------------+
1094 | | | PORT 0 | 40G NIC | PORT 1 | | Packet Generator (Trex) |
1095 | | +--------+ +--------+ | |
1096 | +----------------------------------+ |
1097 +-------------------------------------------------------------------+
1099 Figure 16: Test Environment-Network Architecture
1101 We set up Multus CNI and SR-IOV CNI with DPDK to benchmark the user-
1102 space container network model in the containerized infrastructure
1103 worker node. Multus CNI supports creating multiple interfaces for
1104 a container. The traffic bypasses the kernel space via SR-IOV with
1105 DPDK. We established two modes of CMK: shared core and exclusive
1106 core. We created VFs for each network interface of a container.
1107 Then, we set up Trex to route packets from eth1 to eth2 in a POD.
1109 B.2. Troubleshooting and Results
1111 Figure 17 shows the test results when using 1518-byte packet traffic
1112 from the Trex traffic generator. First, we measured the maximum
1113 line rate of the system using SR-IOV as the packet acceleration
1114 technique. Then we measured throughput when applying the CMK
1115 feature. We observed results similar to the VPP CPU pinning test.
1116 The default Kubernetes system without the CMK feature enabled had
1117 the worst performance, as the CPU resources are shared without any
1118 isolation. When the CMK feature is enabled, Exclusive mode performed
1119 better than Shared mode because each pod had its own dedicated CPU.
1121 +--------------------+-------------+
1122 | Model | Result(Gbps)|
1123 +--------------------+-------------+
1124 | Maximum Line Rate | 39.3 |
1125 +--------------------+-------------+
1126 | Without CMK | 11.5 |
1127 +--------------------+-------------+
1128 | CMK-Exclusive Mode | 39.2 |
1129 +--------------------+-------------+
1130 | CMK-shared Mode | 29.6 |
1131 +--------------------+-------------+
1133 Figure 17: SR-IOV CPU Pinning Test Results
1135 Appendix C. Benchmarking Experience (Multi-pod Test)
1137 C.1. Benchmarking Overview
1139 The main goal of this experiment was to benchmark the multi-pod
1140 scenario, in which packets traverse two pods. To create additional
1141 interfaces for forwarding packets between the two pods, Multus CNI
1142 was used. We compared two userspace-vSwitch model network
1143 technologies: OVS-DPDK and VPP-memif. Since vpp-memif has a
1144 different packet forwarding mechanism that uses a shared memory
1145 interface, it is expected that vpp-memif may provide higher
1146 performance than OVS-DPDK. Also, we considered the NUMA impact for
1147 both cases and made 6 scenarios depending on the CPU locations of
1148 the vSwitch and the two pods. Figure 18 shows the packet forwarding
1149 scenario in this test, where two pods run on the same host and the
1150 vSwitch delivers packets between them.
1152 +----------------------------------------------------------------+
1153 |Worker Node |
1154 | +--------------------------------------------------------+ |
1155 | |Kubernetes | |
1156 | | +--------------+ +--------------+ | |
1157 | | | pod1 | | pod2 | | |
1158 | | | +--------+ | | +--------+ | | |
1159 | | | | L2FWD | | | | L2FWD | | | |
1160 | | | +---^--v-+ | | +--^--v--+ | | |
1161 | | | | DPDK | | | | DPDK | | | |
1162 | | | +---^--v-+ | | +--^--v--+ | | |
1163 | | +------^--v----+ +-----^--v-----+ | |
1164 | | ^ v ^ v | |
1165 | | +------^--v>>>>>>>>>>>>>>>>>>>>>>>>>>>^--v-----+ | |
1166 | | | ^ OVS-DPDK / VPP-memif vSwitch v | | |
1167 | | +------^---------------------------------v-----+ | |
1168 | | | ^ PMD Driver v | | |
1169 | | +------^---------------------------------v-----+ | |
1170 | | ^ v | |
1171 | +----------^---------------------------------v-----------+ |
1172 | ^ v |
1173 | +----------^---------------------------------v---------+ |
1174 | | ^ 40G NIC v | |
1175 | | +------^-------+ +--------v-----+ | |
1176 +---|---| Port 0 |----------------| Port 1 |---|-----+
1177 | +------^-------+ +--------v-----+ |
1178 +----------^---------------------------------v---------+
1179 +------^-------+ +--------v-----+
1180 +-------| Port 0 |----------------| Port 1 |---------+
1181 | +------^-------+ +--------v-----+ |
1182 | Traffic Generator (TRex) |
1183 | |
1184 +----------------------------------------------------------------+
1186 Figure 18: Multi-pod Benchmarking Scenario
1188 C.2. Hardware Configurations
1189 +-------------------+-------------------------+------------------------+
1190 | Node Name | Specification | Description |
1191 +-------------------+-------------------------+------------------------+
1192 | Container Control |- Intel(R) Core(TM) | Container Deployment |
1193 | for Master | E5-2620v3 @ 2.40GHz | and Network Allocation |
1194 | | (1socket x 12Cores) |- ubuntu 18.04 |
1195 | |- MEM 32GB |- Kubernetes Master |
1196 | |- DISK 1TB |- CNI Controller |
1197 | |- NIC: Control plane: 1G | - MULTUS CNI |
1198 | |- OS: CentOS Linux7.9 | - DPDK-OVS/VPP-memif |
1199 +-------------------+-------------------------+------------------------+
1200 | Container Service |- Intel(R) Xeon(R) |- Container dpdk-L2fwd |
1201 | for Worker | Gold 6148 @ 2.40GHz |- Kubernetes Worker |
1202 | | (2socket X 40Cores) |- CNI Agent |
1203 | |- MEM 256GB | - Multus CNI |
1204 | |- DISK 2TB | - DPDK-OVS/VPP-memif |
1205 | |- NIC | |
1206 | | - Control plane: 1G | |
1207 | | - Data plane: XL710-qda2| |
1208 | | (1NIC 2PORT- 40Gb) | |
1209 | |- OS: CentOS Linux 7.9 | |
1210 +-------------------+-------------------------+------------------------+
1211 | Packet Generator |- Intel(R) Xeon(R) | Packet Generator |
1212 | | Gold 6148 @ 2.4Ghz |- Installed Trex v2.92 |
1213 | | (2Socket X 40Core) | |
1214 | |- MEM 256GB | |
1215 | |- DISK 2TB | |
1216 | |- NIC | |
1217 | | - Data plane: XL710-qda2| |
1218 | | (1NIC 2PORT - 40Gb) | |
1219 | |- OS: CentOS Linux 7.9 | |
1220 +-------------------+-------------------------+------------------------+
1222 Figure 19: Hardware Configurations for Multi-pod Benchmarking
1224 For the installation and configuration of CNIs, we used the
1225 userspace-cni network plugin. In addition, Multus provides the
1226 ability to create multiple interfaces for each pod. Both OVS-DPDK
1227 and VPP-memif bypass the kernel with the DPDK PMD driver. For CPU
1228 isolation and NUMA allocation, we used Intel CMK in exclusive mode.
1229 Since the Trex generator had been upgraded, we used its latest version.
1231 C.3. NUMA Allocation Scenario
1233 To analyze the benchmarking impact of different NUMA allocations, we
1234 set up 6 scenarios depending on the CPU locations allocated to the
1235 two pods and the vSwitch. For these scenarios, we did not consider
1236 the cross-NUMA case, in which the two cores allocated to a pod or
1237 switch are located in different NUMA nodes. The 6 scenarios we
1238 considered are listed in Table 1. Note that the NIC is attached to NUMA1.
1240 +============+=========+=======+=======+
1241 | Scenario # | vSwitch | pod1 | pod2 |
1242 +============+=========+=======+=======+
1243 | S1 | NUMA1 | NUMA0 | NUMA0 |
1244 +------------+---------+-------+-------+
1245 | S2 | NUMA1 | NUMA1 | NUMA1 |
1246 +------------+---------+-------+-------+
1247 | S3 | NUMA0 | NUMA0 | NUMA0 |
1248 +------------+---------+-------+-------+
1249 | S4 | NUMA0 | NUMA1 | NUMA1 |
1250 +------------+---------+-------+-------+
1251 | S5 | NUMA1 | NUMA1 | NUMA0 |
1252 +------------+---------+-------+-------+
1253 | S6 | NUMA0 | NUMA0 | NUMA1 |
1254 +------------+---------+-------+-------+
1256 Table 1: NUMA Allocation Scenarios
1258 C.4. Traffic Generator Configurations
1260 For multi-pod benchmarking, we discovered the Non-Drop Rate (NDR)
1261 with a binary search algorithm. Trex supports a command to discover
1262 the NDR for each test. Also, we tested different Ethernet frame
1263 sizes from 64 bytes to 1518 bytes. To run Trex, we used the
1264 following command:
1266 ./ndr --stl --port 0 1 -v --profile stl/bench.py --prof-tun size=x --
1267 opt-bin-search
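The NDR discovery performed by the command above is essentially a binary search over offered rates. The following Python sketch illustrates the idea only; it is not Trex's implementation, and the `loss_at` probe function is a stand-in for an actual traffic trial:

```python
def find_ndr(loss_at, line_rate_gbps, max_loss=0.0, iterations=12):
    """Binary-search the highest offered rate whose measured packet
    loss does not exceed max_loss (the Non-Drop Rate, NDR).

    loss_at(rate) must return the loss ratio (0.0-1.0) observed when
    offering traffic at `rate` Gbps.
    """
    lo, hi = 0.0, line_rate_gbps
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if loss_at(mid) <= max_loss:
            lo = mid           # no drops: NDR is at or above mid
        else:
            hi = mid           # drops seen: NDR is below mid
    return lo

# Toy probe: pretend drops start above 9.8 Gbps on a 10G link.
ndr = find_ndr(lambda r: 0.0 if r <= 9.8 else 0.05, 10.0)
```

Each iteration halves the search interval, so a dozen trials narrow the NDR to well under 1% of line rate, which is why NDR search is much cheaper than sweeping all rates.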
1269 C.5. Benchmark Results and Troubleshooting
1271 As the benchmarking results, Table 2 shows the achieved rate, as a
1272 percentage of line rate, with 1518-byte packets for OVS-DPDK and
1273 vpp-memif. From these results, vpp-memif performs better than
1274 OVS-DPDK, which comes from the difference in how packets are
1275 forwarded between the vSwitch and the pod. Also, the impact of NUMA
1276 is bigger when the vSwitch and both pods are located on the same
1277 node than when allocating CPUs to the node where the NIC is attached.
1279 +==================+=======+=======+=======+=======+=======+=======+
1280 | Networking Model | S1 | S2 | S3 | S4 | S5 | S6 |
1281 +==================+=======+=======+=======+=======+=======+=======+
1282 | OVS-DPDK | 21.29 | 13.17 | 6.32 | 19.76 | 12.43 | 6.38 |
1283 +------------------+-------+-------+-------+-------+-------+-------+
1284 | vpp-memif | 59.96 | 34.17 | 45.13 | 57.1 | 33.47 | 44.92 |
1285 +------------------+-------+-------+-------+-------+-------+-------+
1287 Table 2: Multi-pod Benchmarking Results (% of Line Rate)
1289 Authors' Addresses
1291 Kyoungjae Sun
1292 ETRI
1293 218, Gajeong-ro, Yuseong-gu
1294 Daejeon
1295 34065
1296 Republic of Korea
1297 Phone: +82 10 3643 5627
1298 Email: kjsun@etri.re.kr
1300 Hyunsik Yang
1301 KT
1302 KT Research Center 151
1303 Taebong-ro, Seocho-gu
1304 Seoul
1305 06763
1306 Republic of Korea
1307 Phone: +82 10 9005 7439
1308 Email: yangun@dcn.ssu.ac.kr
1310 Jangwon Lee
1311 Soongsil University
1312 369, Sangdo-ro, Dongjak-gu
1313 Seoul
1314 06978
1315 Republic of Korea
1316 Phone: +82 10 7448 4664
1317 Email: jangwon.lee@dcn.ssu.ac.kr
1318 Tran Minh Ngoc
1319 Soongsil University
1320 369, Sangdo-ro, Dongjak-gu
1321 Seoul
1322 06978
1323 Republic of Korea
1324 Phone: +82 2 820 0841
1325 Email: mipearlska1307@dcn.ssu.ac.kr
1327 Younghan Kim
1328 Soongsil University
1329 369, Sangdo-ro, Dongjak-gu
1330 Seoul
1331 06978
1332 Republic of Korea
1333 Phone: +82 10 2691 0904
1334 Email: younghak@ssu.ac.kr