kakkotetsu

VXLAN+EVPN on Cumulus VX (original: 2017/03/22)

This article is a copy of one I wrote elsewhere on 2017/03/22.
As of 2017/05/11 it therefore contains some slightly outdated information. (GNS3 v2.0.0 stable and Cumulus Linux v3.3 were released in 2017/05.)

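Introduction

What this article covers

This article does the following.

  • The Early Access release of Cumulus Linux (as of 2017/03/21) allows a limited trial of the VXLAN+EVPN feature, so check whether it also works on the virtual edition, Cumulus VX
    • Configuration and behavior will probably change once the feature is formally released
    • The EA release currently available covers the Quagga daemon only, so EVPN is configured and inspected through Quagga
  • EVPN Multihoming is not implemented, but there appears to be a mechanism for making a VTEP redundant with MLAG instead, so look at how to configure it and how it behaves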

Environment

Cumulus VX

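Cumulus official / Download Cumulus VX: the KVM image of the latest version available for download as of 2017/03/13 (Cumulus VX 3.2.1). Once I created an account, I could download it as an individual without any particular problem.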

kotetsu@kvm01:~/vm_images/qemu$ ls -al cumulus-linux-3.2.1-vx-amd64-1486153138.ac46c24zd00d13e.qcow2
-rw-r--r-- 1 kotetsu kotetsu 1232601088 Mar  7 22:11 cumulus-linux-3.2.1-vx-amd64-1486153138.ac46c24zd00d13e.qcow2

kotetsu@kvm01:~/vm_images/qemu$ sha1sum cumulus-linux-3.2.1-vx-amd64-1486153138.ac46c24zd00d13e.qcow2
3d782f2c450683b4da5ea2324c88f3dccb89b6c2  cumulus-linux-3.2.1-vx-amd64-1486153138.ac46c24zd00d13e.qcow2
kotetsu@bb03:~$ cat /etc/lsb-release
DISTRIB_ID="Cumulus Linux"
DISTRIB_RELEASE=3.2.1
DISTRIB_DESCRIPTION="Cumulus Linux 3.2.1"

kotetsu@bb03:~$ uname -a
Linux bb03 4.1.0-cl-4-amd64 #1 SMP Debian 4.1.33-1+cl3u7 (2017-01-26) x86_64 GNU/Linux

Other

  • KVM host
    • Ubuntu16.04.1-server-amd64
    • plus the KVM and GNS3 packages that apt installs on it
$ uname -a
Linux kvm01 4.4.0-57-generic #78-Ubuntu SMP Fri Dec 9 23:50:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.1 LTS"

$ virsh -v
1.3.1

$ qemu-system-x86_64 --version
QEMU emulator version 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.6), Copyright (c) 2003-2008 Fabrice Bellard

$ gns3 --version
1.5.2

References

Build

We build the following environment.

f:id:kakkotetsu:20170511000723p:plain

Deploying with GNS3

Deploy the nodes as shown below. (Ignore the shaded parts; they are an existing environment used for interconnection experiments.)

f:id:kakkotetsu:20170511000850p:plain

For Cumulus VX, just click through the setup following the official docs.

In my environment, the following settings were enough.

f:id:kakkotetsu:20170511000939p:plain

f:id:kakkotetsu:20170511000952p:plain

f:id:kakkotetsu:20170511001001p:plain

kotetsu@kvm01:~$ ps aux | grep [C]umulus
root     28241  2.5  1.3 1417576 445408 pts/12 Sl+  20:32   0:20 /usr/bin/qemu-system-x86_64 -name CumulusVX_bb03 -m 512M -smp cpus=1 -enable-kvm -boot order=c -drive file=/home/kotetsu/GNS3/projects/vqfx/project-files/qemu/25f56fdc-48e7-4622-be73-bf98d5686e4e/hda_disk.qcow2,if=ide,index=0,media=disk -serial telnet:127.0.0.1:5018,server,nowait -monitor tcp:127.0.0.1:37529,server,nowait -net none -device virtio-net-pci,mac=00:37:c4:6e:4e:00,netdev=gns3-0 -netdev socket,id=gns3-0,udp=127.0.0.1:10102,localaddr=127.0.0.1:10103 -device virtio-net-pci,mac=00:37:c4:6e:4e:01,netdev=gns3-1 -netdev socket,id=gns3-1,udp=127.0.0.1:10125,localaddr=127.0.0.1:10124 -device virtio-net-pci,mac=00:37:c4:6e:4e:02,netdev=gns3-2 -netdev socket,id=gns3-2,udp=127.0.0.1:10129,localaddr=127.0.0.1:10128 -device virtio-net-pci,mac=00:37:c4:6e:4e:03,netdev=gns3-3 -netdev socket,id=gns3-3,udp=127.0.0.1:10133,localaddr=127.0.0.1:10132 -device virtio-net-pci,mac=00:37:c4:6e:4e:04,netdev=gns3-4 -netdev socket,id=gns3-4,udp=127.0.0.1:10137,localaddr=127.0.0.1:10136 -device virtio-net-pci,mac=00:37:c4:6e:4e:05

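Peripheral device configuration

torSW[34]01a (Open vSwitch) configuration

I'll leave installing Open vSwitch to the official documentation (sloppy, I know). The following configuration is enough. I use Open vSwitch here, but anything that can do LACP and VLANs works in this position, so pick whatever you find convenient. (Cumulus VX is fine too, of course.)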

  • torSW[34]01a (common)
# ovs-vsctl --no-wait init
# ovs-vsctl add-br br0
# ovs-vsctl set bridge br0 datapath_type=netdev

# ovs-vsctl add-bond br0 bond0 ens4 ens5 lacp=active bond_mode=balance-slb other_config:lacp-time=fast
# ovs-vsctl add-port br0 ens6 tag=100
# ovs-vsctl add-port br0 ens7 tag=200
# ip link set dev br0 up
# ip link set dev ens4 up
# ip link set dev ens5 up
# ip link set dev ens6 up
# ip link set dev ens7 up

node[34]1 configuration (for connectivity checks)

Anything that can communicate will do. (sloppy)

kotetsu@node31:~$ ip a show dev ens4
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:37:c4:55:09:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.3/24 brd 192.168.1.255 scope global ens4
       valid_lft forever preferred_lft forever
    inet6 fe80::237:c4ff:fe55:901/64 scope link
       valid_lft forever preferred_lft forever
kotetsu@node41:~$ ip a show dev ens4
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:37:c4:56:b4:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.4/24 brd 192.168.1.255 scope global ens4
       valid_lft forever preferred_lft forever
    inet6 fe80::237:c4ff:fe56:b401/64 scope link
       valid_lft forever preferred_lft forever

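Cumulus VX initial configuration

The login account and password are as documented in Cumulus official / Using Cumulus VX with KVM: user cumulus, password CumulusLinux!

After that, following the official docs, configure the hostname, an operations user with its SSH key, syslog, timezone, NTP, and so on to suit your environment.

To use the net commands as an added user, edit the permitted users and groups in /etc/netd.conf as needed and apply the change (Cumulus official / Network Command Line Utility / Adding More NCLU Users or Groups).

Cumulus VX physical interface / BGP configuration

We build something like the following.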

f:id:kakkotetsu:20170511001253p:plain

Physical interfaces

Referring to Cumulus official / Interface Configuration and Management, first configure the physical interfaces needed for the BGP topology.

  • bb03
net add interface swp1 alias DEV=spine31 IF=swp1
net add interface swp1 mtu 9216
net add interface swp1 ip address 192.0.2.8/31

net add interface swp2 alias DEV=spine32 IF=swp1
net add interface swp2 mtu 9216
net add interface swp2 ip address 192.0.2.10/31

net add interface swp3 alias DEV=spine41 IF=swp1
net add interface swp3 mtu 9216
net add interface swp3 ip address 192.0.2.12/31

net add interface swp4 alias DEV=spine42 IF=swp1
net add interface swp4 mtu 9216
net add interface swp4 ip address 192.0.2.14/31

net commit
kotetsu@bb03:~$ net show interface all

       Name                        Speed      MTU  Mode           Summary
-----  --------------------------  -------  -----  -------------  ------------------------
UP     lo                          N/A      65536  Loopback       IP: 127.0.0.1/8, ::1/128
UP     eth0                        1G        1500  Mgmt           IP: 10.0.0.193/24
UP     swp1 (DEV=spine31 IF=swp1)  1G        9216  Interface/L3   IP: 192.0.2.8/31
UP     swp2 (DEV=spine32 IF=swp1)  1G        9216  Interface/L3   IP: 192.0.2.10/31
UP     swp3 (DEV=spine41 IF=swp1)  1G        9216  Interface/L3   IP: 192.0.2.12/31
UP     swp4 (DEV=spine42 IF=swp1)  1G        9216  Interface/L3   IP: 192.0.2.14/31
ADMDN  swp5                        0M        1500  NotConfigured
kotetsu@bb03:~$ cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*.intf

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0
    address 10.0.0.193/24
    gateway 10.0.0.254

auto swp1
iface swp1
    address 192.0.2.8/31
    alias DEV=spine31 IF=swp1
    mtu 9216

auto swp2
iface swp2
    address 192.0.2.10/31
    alias DEV=spine32 IF=swp1
    mtu 9216

auto swp3
iface swp3
    address 192.0.2.12/31
    alias DEV=spine41 IF=swp1
    mtu 9216

auto swp4
iface swp4
    address 192.0.2.14/31
    alias DEV=spine42 IF=swp1
    mtu 9216
  • bb04
net add interface swp1 alias DEV=spine31 IF=swp2
net add interface swp1 mtu 9216
net add interface swp1 ip address 192.0.2.136/31

net add interface swp2 alias DEV=spine32 IF=swp2
net add interface swp2 mtu 9216
net add interface swp2 ip address 192.0.2.138/31

net add interface swp3 alias DEV=spine41 IF=swp2
net add interface swp3 mtu 9216
net add interface swp3 ip address 192.0.2.140/31

net add interface swp4 alias DEV=spine42 IF=swp2
net add interface swp4 mtu 9216
net add interface swp4 ip address 192.0.2.142/31

net commit
  • spine31
net add interface swp1 alias DEV=bb03 IF=swp1
net add interface swp1 mtu 9216
net add interface swp1 ip address 192.0.2.9/31

net add interface swp2 alias DEV=bb04 IF=swp1
net add interface swp2 mtu 9216
net add interface swp2 ip address 192.0.2.137/31

net commit
  • spine32
net add interface swp1 alias DEV=bb03 IF=swp2
net add interface swp1 mtu 9216
net add interface swp1 ip address 192.0.2.11/31

net add interface swp2 alias DEV=bb04 IF=swp2
net add interface swp2 mtu 9216
net add interface swp2 ip address 192.0.2.139/31

net commit
  • spine41
net add interface swp1 alias DEV=bb03 IF=swp3
net add interface swp1 mtu 9216
net add interface swp1 ip address 192.0.2.13/31

net add interface swp2 alias DEV=bb04 IF=swp3
net add interface swp2 mtu 9216
net add interface swp2 ip address 192.0.2.141/31

net commit
  • spine42
net add interface swp1 alias DEV=bb03 IF=swp4
net add interface swp1 mtu 9216
net add interface swp1 ip address 192.0.2.15/31

net add interface swp2 alias DEV=bb04 IF=swp4
net add interface swp2 mtu 9216
net add interface swp2 ip address 192.0.2.143/31

net commit
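
The addressing above puts each link in its own /31, with the bb side on the even address and the spine side on the odd one. As a small sanity check (a sketch outside the original setup; the helper name `peer_of` is hypothetical), Python's standard `ipaddress` module can derive the far-end address of each link:

```python
import ipaddress


def peer_of(cidr: str) -> str:
    """Return the other address of a /31 point-to-point link."""
    iface = ipaddress.ip_interface(cidr)
    if iface.network.prefixlen != 31:
        raise ValueError(f"{cidr} is not a /31")
    # A /31 holds exactly two addresses; pick the one that is not ours.
    first = iface.network.network_address
    other = first + 1 if iface.ip == first else first
    return str(other)


# bb03-side addresses taken from the configuration above
for local in ("192.0.2.8/31", "192.0.2.10/31", "192.0.2.12/31", "192.0.2.14/31"):
    print(local, "<->", peer_of(local))
```

Comparing the printed peers with the addresses configured on spine31, spine32, spine41 and spine42 is a quick way to catch a typo before running net commit.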

Installing the Early Access Quagga

The default state is as shown below, so install the Early Access Quagga following Cumulus official / Ethernet Virtual Private Network - EVPN / Installing the EVPN Package.

kotetsu@bb03:~$ dpkg -l quagga
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                    Version          Architecture     Description
+++-=======================-================-================-====================================================
ii  quagga                  1.0.0+cl3u7      amd64            BGP/OSPF/RIP routing daemon
kotetsu@bb03:~$ grep -E "CumulusLinux-3-early-access" /etc/apt/sources.list
#deb     http://repo3.cumulusnetworks.com/repo CumulusLinux-3-early-access cumulus
#deb-src http://repo3.cumulusnetworks.com/repo CumulusLinux-3-early-access cumulus
kotetsu@bb03:~$ sudo sed -i -e '/CumulusLinux-3-early-access/ s/^#//g' /etc/apt/sources.list
kotetsu@bb03:~$ sudo apt update
kotetsu@bb03:~$ sudo apt install -y cumulus-evpn
kotetsu@bb03:~$ sudo apt upgrade
kotetsu@bb03:~$ dpkg -l quagga
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                    Version          Architecture     Description
+++-=======================-================-================-====================================================
ii  quagga                  1.0.0+cl3eau8    amd64            BGP/OSPF/RIP routing daemon

Quagga startup configuration

The defaults are as shown below, so configure startup on all nodes, referring to Cumulus official / Configuring Cumulus Quagga.

kotetsu@bb03:~$ grep -Ev "^#" /etc/quagga/daemons
zebra=no
bgpd=no
ospfd=no
ospf6d=no
ripd=no
ripngd=no
isisd=no

In the daemon startup settings, change zebra and bgpd to yes,

kotetsu@bb03:~$ sudo sed -r -i -e 's/(zebra|bgpd)=no/\1=yes/g' /etc/quagga/daemons

then enable autostart and start the service.

kotetsu@bb03:~$ sudo systemctl enable quagga.service
kotetsu@bb03:~$ sudo systemctl start quagga.service
kotetsu@bb03:~$ sudo systemctl status quagga.service

...

   Active: active (running) since Mon 2017-03-20 10:52:25 JST; 4s ago

Mar 20 10:52:24 spine41 quagga[30608]: Starting Quagga daemons (prio:10):. zebra. bgpd.
Mar 20 10:52:24 spine41 bgpd[30631]: BGPd 1.0.0+cl3eau8 starting: vty@2605, bgp@<all>:179
Mar 20 10:52:24 spine41 zebra[30624]: client 12 says hello and bids fair to announce only bgp routes
Mar 20 10:52:24 spine41 watchquagga[30638]: watchquagga 1.0.0+cl3eau8 watching [zebra bgpd], mode [phased zebra restart]
Mar 20 10:52:24 spine41 watchquagga[30638]: bgpd state -> up : connect succeeded
Mar 20 10:52:25 spine41 watchquagga[30638]: zebra state -> up : connect succeeded
Mar 20 10:52:25 spine41 watchquagga[30638]: Watchquagga: Notifying Systemd we are up and running
Mar 20 10:52:25 spine41 quagga[30608]: Starting Quagga monitor daemon: watchquagga.
Mar 20 10:52:25 spine41 quagga[30608]: Exiting from the script
Mar 20 10:52:25 spine41 systemd[1]: Started Cumulus Linux Quagga.
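
If you want to preview what the sed expression used above does before touching /etc/quagga/daemons, the same substitution can be tried on a string. This is only a sketch: the sample text is an abridged copy of the default file and `enable_daemons` is a hypothetical helper name.

```python
import re

# Abridged sample of the default /etc/quagga/daemons content
DAEMONS = """\
zebra=no
bgpd=no
ospfd=no
ospf6d=no
"""


def enable_daemons(text: str) -> str:
    """Flip only zebra and bgpd to yes, like the sed expression does."""
    return re.sub(r"(zebra|bgpd)=no", r"\1=yes", text)


print(enable_daemons(DAEMONS))
```

Only the two daemons named in the alternation change; ospfd and ospf6d stay at no.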

eBGP configuration

  • bb03
net add loopback lo ip address 172.31.0.3/32

net add bgp autonomous-system 65000
net add bgp router-id 172.31.0.3

net add routing prefix-list ipv4 PL_LO_CLOS seq 10 permit 172.16.0.0/12 ge 32 le 32
net add routing prefix-list ipv4 PL_LO_CLOS seq 20 permit 192.0.2.0/24 ge 31 le 31
net add bgp redistribute connected

net add bgp neighbor PEER_SPINE peer-group
net add bgp neighbor PEER_SPINE prefix-list PL_LO_CLOS out
net add bgp neighbor PEER_SPINE next-hop-self

net add bgp neighbor 192.0.2.9 remote-as 65003
net add bgp neighbor 192.0.2.9 description spine31
net add bgp neighbor 192.0.2.9 peer-group PEER_SPINE

net add bgp neighbor 192.0.2.11 remote-as 65003
net add bgp neighbor 192.0.2.11 description spine32
net add bgp neighbor 192.0.2.11 peer-group PEER_SPINE

net add bgp neighbor 192.0.2.13 remote-as 65004
net add bgp neighbor 192.0.2.13 description spine41
net add bgp neighbor 192.0.2.13 peer-group PEER_SPINE

net add bgp neighbor 192.0.2.15 remote-as 65004
net add bgp neighbor 192.0.2.15 description spine42
net add bgp neighbor 192.0.2.15 peer-group PEER_SPINE
  • bb04
net add loopback lo ip address 172.31.0.4/32

net add bgp autonomous-system 65000
net add bgp router-id 172.31.0.4

net add routing prefix-list ipv4 PL_LO_CLOS seq 10 permit 172.16.0.0/12 ge 32 le 32
net add routing prefix-list ipv4 PL_LO_CLOS seq 20 permit 192.0.2.0/24 ge 31 le 31
net add bgp redistribute connected

net add bgp neighbor PEER_SPINE peer-group
net add bgp neighbor PEER_SPINE prefix-list PL_LO_CLOS out

net add bgp neighbor 192.0.2.137 remote-as 65003
net add bgp neighbor 192.0.2.137 description spine31
net add bgp neighbor 192.0.2.137 peer-group PEER_SPINE

net add bgp neighbor 192.0.2.139 remote-as 65003
net add bgp neighbor 192.0.2.139 description spine32
net add bgp neighbor 192.0.2.139 peer-group PEER_SPINE

net add bgp neighbor 192.0.2.141 remote-as 65004
net add bgp neighbor 192.0.2.141 description spine41
net add bgp neighbor 192.0.2.141 peer-group PEER_SPINE

net add bgp neighbor 192.0.2.143 remote-as 65004
net add bgp neighbor 192.0.2.143 description spine42
net add bgp neighbor 192.0.2.143 peer-group PEER_SPINE
  • spine31
net add loopback lo ip address 172.16.3.1/32

net add bgp autonomous-system 65003
net add bgp router-id 172.16.3.1

net add routing prefix-list ipv4 PL_LO_CLOS seq 10 permit 172.16.0.0/12 ge 32 le 32
net add routing prefix-list ipv4 PL_LO_CLOS seq 20 permit 192.0.2.0/24 ge 31 le 31
net add bgp redistribute connected

net add bgp neighbor PEER_BB peer-group
net add bgp neighbor PEER_BB prefix-list PL_LO_CLOS out

net add bgp neighbor 192.0.2.8 remote-as 65000
net add bgp neighbor 192.0.2.8 description bb03
net add bgp neighbor 192.0.2.8 peer-group PEER_BB

net add bgp neighbor 192.0.2.136 remote-as 65000
net add bgp neighbor 192.0.2.136 description bb04
net add bgp neighbor 192.0.2.136 peer-group PEER_BB
  • spine32
net add loopback lo ip address 172.16.3.2/32

net add bgp autonomous-system 65003
net add bgp router-id 172.16.3.2

net add routing prefix-list ipv4 PL_LO_CLOS seq 10 permit 172.16.0.0/12 ge 32 le 32
net add routing prefix-list ipv4 PL_LO_CLOS seq 20 permit 192.0.2.0/24 ge 31 le 31
net add bgp redistribute connected

net add bgp neighbor PEER_BB peer-group
net add bgp neighbor PEER_BB prefix-list PL_LO_CLOS out

net add bgp neighbor 192.0.2.10 remote-as 65000
net add bgp neighbor 192.0.2.10 description bb03
net add bgp neighbor 192.0.2.10 peer-group PEER_BB

net add bgp neighbor 192.0.2.138 remote-as 65000
net add bgp neighbor 192.0.2.138 description bb04
net add bgp neighbor 192.0.2.138 peer-group PEER_BB
  • spine41
net add loopback lo ip address 172.16.4.1/32

net add bgp autonomous-system 65004
net add bgp router-id 172.16.4.1

net add routing prefix-list ipv4 PL_LO_CLOS seq 10 permit 172.16.0.0/12 ge 32 le 32
net add routing prefix-list ipv4 PL_LO_CLOS seq 20 permit 192.0.2.0/24 ge 31 le 31
net add bgp redistribute connected

net add bgp neighbor PEER_BB peer-group
net add bgp neighbor PEER_BB prefix-list PL_LO_CLOS out

net add bgp neighbor 192.0.2.12 remote-as 65000
net add bgp neighbor 192.0.2.12 description bb03
net add bgp neighbor 192.0.2.12 peer-group PEER_BB

net add bgp neighbor 192.0.2.140 remote-as 65000
net add bgp neighbor 192.0.2.140 description bb04
net add bgp neighbor 192.0.2.140 peer-group PEER_BB
  • spine42
net add loopback lo ip address 172.16.4.2/32

net add bgp autonomous-system 65004
net add bgp router-id 172.16.4.2

net add routing prefix-list ipv4 PL_LO_CLOS seq 10 permit 172.16.0.0/12 ge 32 le 32
net add routing prefix-list ipv4 PL_LO_CLOS seq 20 permit 192.0.2.0/24 ge 31 le 31
net add bgp redistribute connected

net add bgp neighbor PEER_BB peer-group
net add bgp neighbor PEER_BB prefix-list PL_LO_CLOS out

net add bgp neighbor 192.0.2.14 remote-as 65000
net add bgp neighbor 192.0.2.14 description bb03
net add bgp neighbor 192.0.2.14 peer-group PEER_BB

net add bgp neighbor 192.0.2.142 remote-as 65000
net add bgp neighbor 192.0.2.142 description bb04
net add bgp neighbor 192.0.2.142 peer-group PEER_BB
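
Every node redistributes connected routes and filters them outbound with PL_LO_CLOS, so it helps to know which prefixes the list lets through. The following is a rough model of the ge/le matching, written as a sketch (the function name `permitted` is hypothetical, and it ignores sequence numbers because both entries are permits):

```python
import ipaddress

# (network, ge, le) entries mirroring PL_LO_CLOS above
PL_LO_CLOS = [
    ("172.16.0.0/12", 32, 32),
    ("192.0.2.0/24", 31, 31),
]


def permitted(prefix: str) -> bool:
    """True if the prefix matches a permit entry; otherwise implicit deny."""
    net = ipaddress.ip_network(prefix)
    for base, ge, le in PL_LO_CLOS:
        # The prefix must sit inside the base network and its length
        # must fall within the ge..le window.
        if net.subnet_of(ipaddress.ip_network(base)) and ge <= net.prefixlen <= le:
            return True
    return False


for p in ("172.31.0.3/32", "172.16.3.1/32", "192.0.2.8/31",
          "10.0.0.0/24", "198.51.100.0/30"):
    print(p, permitted(p))
```

In this model the loopbacks and the /31 links pass, while the management network (10.0.0.0/24) and a 198.51.100.0/30 connected route are dropped.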

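Incidentally, when I casually pressed Tab while configuring a neighbor, it displayed the mapping between the physical interfaces and the neighbor information learned via LLDP. Impressive.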

kotetsu@bb03:~$ net add bgp neighbor
    <bgppeer>          :  BGP neighbor or peer-group
    <interface>        :  An interface name "swp1" or glob "swp1-4,6,10-12"
    <ip>               :  An IPv4 or IPv6 Address
    <text-peer-group>  :  A BGP peer-group name
    eth0               :  LLDP peer spine41
    lo                 :  interface
    swp1               :  LLDP peer spine31
    swp2               :  LLDP peer spine32
    swp3               :  LLDP peer spine41
    swp4               :  LLDP peer spine42
    swp5               :  interface

Cumulus VX MLAG configuration

This follows Cumulus official / Multi-Chassis Link Aggregation - MLAG.

First, configure the LAG used for MLAG, referring to Cumulus official / Bonding - Link Aggregation. Only the minimum needed to bring it up.

  • spine31
net add bond bond0 bond slaves swp3-4
net add bond bond0 alias DEV=spine32 IF=bond0

net add interface bond0.4094 alias MLAG DEDICATED
net add interface bond0.4094 ip address 198.51.100.1/30
net add interface bond0.4094 clag peer-ip 198.51.100.2
net add interface bond0.4094 clag sys-mac 44:38:39:FF:40:94
  • spine32
net add bond bond0 bond slaves swp3-4
net add bond bond0 alias DEV=spine31 IF=bond0

net add interface bond0.4094 alias MLAG DEDICATED
net add interface bond0.4094 ip address 198.51.100.2/30
net add interface bond0.4094 clag peer-ip 198.51.100.1
net add interface bond0.4094 clag sys-mac 44:38:39:FF:40:94
  • spine41
net add bond bond0 bond slaves swp3-4
net add bond bond0 alias DEV=spine42 IF=bond0

net add interface bond0.4094 alias MLAG DEDICATED
net add interface bond0.4094 ip address 198.51.100.1/30
net add interface bond0.4094 clag peer-ip 198.51.100.2
net add interface bond0.4094 clag sys-mac 44:38:39:FF:40:94
  • spine42
net add bond bond0 bond slaves swp3-4
net add bond bond0 alias DEV=spine41 IF=bond0

net add interface bond0.4094 alias MLAG DEDICATED
net add interface bond0.4094 ip address 198.51.100.2/30
net add interface bond0.4094 clag peer-ip 198.51.100.1
net add interface bond0.4094 clag sys-mac 44:38:39:FF:40:94

MLAG should now be up, like this.

kotetsu@spine31:~$ net show clag status
The peer is alive
    Peer Priority, ID, and Role: 32768 00:37:c4:a9:0f:03 primary
     Our Priority, ID, and Role: 32768 00:37:c4:f8:17:03 secondary
          Peer Interface and IP: bond0.4094 198.51.100.2
                      Backup IP:  (inactive)
                     System MAC: 44:38:39:ff:40:94
kotetsu@spine32:~$ net show clag status
The peer is alive
     Our Priority, ID, and Role: 32768 00:37:c4:a9:0f:03 primary
    Peer Priority, ID, and Role: 32768 00:37:c4:f8:17:03 secondary
          Peer Interface and IP: bond0.4094 198.51.100.1
                      Backup IP:  (inactive)
                     System MAC: 44:38:39:ff:40:94
  • spine3[12]
net add bond bond1 bond slaves swp5
net add bond bond1 alias DEV=torSW301a IF=bond0
net add bond bond1 mtu 9000
net add bond bond1 clag id 1
  • spine4[12]
net add bond bond1 bond slaves swp5
net add bond bond1 alias DEV=torSW401a IF=bond0
net add bond bond1 mtu 9000
net add bond bond1 clag id 1
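
Both MLAG pairs above set clag sys-mac to 44:38:39:FF:40:94. Here is a small sketch for checking that a chosen value sits in the range reserved for MLAG system MACs. It assumes that range is 44:38:39:ff:00:00 through 44:38:39:ff:ff:ff (please confirm against the official MLAG documentation for your release); the helper names are hypothetical.

```python
def mac_to_int(mac: str) -> int:
    """Convert aa:bb:cc:dd:ee:ff (any case) to an integer."""
    return int(mac.replace(":", ""), 16)


# Assumed reserved range for MLAG system MACs (verify in the official docs)
RESERVED_LOW = mac_to_int("44:38:39:ff:00:00")
RESERVED_HIGH = mac_to_int("44:38:39:ff:ff:ff")


def in_reserved_range(mac: str) -> bool:
    return RESERVED_LOW <= mac_to_int(mac) <= RESERVED_HIGH


print(in_reserved_range("44:38:39:FF:40:94"))
print(in_reserved_range("00:37:c4:f8:17:03"))
```

The first call uses the sys-mac from this article; the second uses a physical MAC from the clag status output as a negative example.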

bridge configuration

On all of spine[34][12], as usual referring to Cumulus official / VLAN-aware Bridge Mode for Large-scale Layer 2 Environments:

net add bridge bridge ports bond0
net add bridge bridge ports bond1
net add bridge bridge vids 2-4093

Checking the LACP state on the torSWs

root@torSW301a:~# ovs-appctl lacp/show bond0
---- bond0 ----
        status: active negotiated
        sys_id: 00:37:c4:7e:e0:01
        sys_priority: 65534
        aggregation key: 1
        lacp_time: fast

slave: ens4: current attached
        port_id: 2
        port_priority: 65535
        may_enable: true

        actor sys_id: 00:37:c4:7e:e0:01
        actor sys_priority: 65534
        actor port_id: 2
        actor port_priority: 65535
        actor key: 1
        actor state: activity timeout aggregation synchronized collecting distributing

        partner sys_id: 44:38:39:ff:40:94
        partner sys_priority: 65535
        partner port_id: 1
        partner port_priority: 255
        partner key: 9
        partner state: activity timeout aggregation synchronized collecting distributing

slave: ens5: current attached
        port_id: 1
        port_priority: 65535
        may_enable: true

        actor sys_id: 00:37:c4:7e:e0:01
        actor sys_priority: 65534
        actor port_id: 1
        actor port_priority: 65535
        actor key: 1
        actor state: activity timeout aggregation synchronized collecting distributing

        partner sys_id: 44:38:39:ff:40:94
        partner sys_priority: 65535
        partner port_id: 1
        partner port_priority: 255
        partner key: 9
        partner state: activity timeout aggregation synchronized collecting distributing
root@torSW401a:~# ovs-appctl lacp/show bond0
---- bond0 ----
        status: active negotiated
        sys_id: 00:37:c4:2c:e5:01
        sys_priority: 65534
        aggregation key: 1
        lacp_time: fast

slave: ens4: current attached
        port_id: 1
        port_priority: 65535
        may_enable: true

        actor sys_id: 00:37:c4:2c:e5:01
        actor sys_priority: 65534
        actor port_id: 1
        actor port_priority: 65535
        actor key: 1
        actor state: activity timeout aggregation synchronized collecting distributing

        partner sys_id: 44:38:39:ff:40:94
        partner sys_priority: 65535
        partner port_id: 1
        partner port_priority: 255
        partner key: 9
        partner state: activity timeout aggregation synchronized collecting distributing

slave: ens5: current attached
        port_id: 2
        port_priority: 65535
        may_enable: true

        actor sys_id: 00:37:c4:2c:e5:01
        actor sys_priority: 65534
        actor port_id: 2
        actor port_priority: 65535
        actor key: 1
        actor state: activity timeout aggregation synchronized collecting distributing

        partner sys_id: 44:38:39:ff:40:94
        partner sys_priority: 65535
        partner port_id: 1
        partner port_priority: 255
        partner key: 9
        partner state: activity timeout aggregation synchronized collecting distributing

Cumulus VX VXLAN+EVPN configuration

Virtual IP address for each virtual VTEP

  • spine3[12]
net add loopback lo clag vxlan-anycast-ip 172.16.3.100
  • spine4[12]
net add loopback lo clag vxlan-anycast-ip 172.16.4.100

In this environment, connected routes are redistributed into BGP IPv4, and the outbound prefix-list is written so that it matches this address too, so the virtual IP address should be advertised as well.

kotetsu@bb03:~$ net show route

show ip route
=============
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, P - PIM, T - Table, v - VNC,
       V - VPN,
       > - selected route, * - FIB route

K>* 0.0.0.0/0 via 10.0.0.254, eth0
C>* 10.0.0.0/24 is directly connected, eth0
B>* 172.16.3.1/32 [20/0] via 192.0.2.9, swp1, 08:58:27
B>* 172.16.3.2/32 [20/0] via 192.0.2.11, swp2, 08:57:25
B>* 172.16.3.100/32 [20/0] via 192.0.2.9, swp1, 00:28:29
  *                        via 192.0.2.11, swp2, 00:28:29
B>* 172.16.4.1/32 [20/0] via 192.0.2.13, swp3, 08:56:25
B>* 172.16.4.2/32 [20/0] via 192.0.2.15, swp4, 08:56:10
B>* 172.16.4.100/32 [20/0] via 192.0.2.15, swp4, 00:27:45
  *                        via 192.0.2.13, swp3, 00:27:45
C>* 172.31.0.3/32 is directly connected, lo
C>* 192.0.2.8/31 is directly connected, swp1
C>* 192.0.2.10/31 is directly connected, swp2
C>* 192.0.2.12/31 is directly connected, swp3
C>* 192.0.2.14/31 is directly connected, swp4
B>* 192.0.2.136/31 [20/0] via 192.0.2.9, swp1, 08:58:27
B>* 192.0.2.138/31 [20/0] via 192.0.2.11, swp2, 08:57:25
B>* 192.0.2.140/31 [20/0] via 192.0.2.13, swp3, 08:56:25
B>* 192.0.2.142/31 [20/0] via 192.0.2.15, swp4, 08:56:10

VXLAN VNI configuration on all spines

  • spine[34][12]
net add vxlan vxlan010100 vxlan id 10100
net add vxlan vxlan010100 bridge access 100

net add vxlan vxlan010200 vxlan id 10200
net add vxlan vxlan010200 bridge access 200

When you run net commit after this, these vxlan interfaces are automatically attached to the bridge.

kotetsu@spine42:~$ net commit
--- /etc/network/interfaces     2017-03-20 22:07:22.297455993 +0900
+++ /var/run/nclu/iface/interfaces.tmp  2017-03-20 22:17:54.341064283 +0900

...

 iface bridge
-    bridge-ports bond0 bond1
+    bridge-ports bond0 bond1 vxlan010100 vxlan010200

...

Assigning the VXLAN tunnel IP address on every spine

  • spine31
net add vxlan vxlan010100 vxlan local-tunnelip 172.16.3.1
net add vxlan vxlan010200 vxlan local-tunnelip 172.16.3.1
  • spine32
net add vxlan vxlan010100 vxlan local-tunnelip 172.16.3.2
net add vxlan vxlan010200 vxlan local-tunnelip 172.16.3.2
  • spine41
net add vxlan vxlan010100 vxlan local-tunnelip 172.16.4.1
net add vxlan vxlan010200 vxlan local-tunnelip 172.16.4.1
  • spine42
net add vxlan vxlan010100 vxlan local-tunnelip 172.16.4.2
net add vxlan vxlan010200 vxlan local-tunnelip 172.16.4.2
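
The per-VLAN commands in the last two subsections follow a regular pattern, so they are easy to generate. The sketch below assumes the convention seen in this article (VNI = 10000 + VLAN ID, interface name = "vxlan" followed by the VNI zero-padded to six digits); that convention is only inferred from the two VLANs used here, and `vxlan_commands` is a hypothetical helper.

```python
def vxlan_commands(vlans, tunnel_ip, vni_base=10000):
    """Build the NCLU commands for each VLAN on one spine."""
    cmds = []
    for vlan in vlans:
        vni = vni_base + vlan
        name = f"vxlan{vni:06d}"  # e.g. VNI 10100 -> vxlan010100
        cmds.append(f"net add vxlan {name} vxlan id {vni}")
        cmds.append(f"net add vxlan {name} bridge access {vlan}")
        cmds.append(f"net add vxlan {name} vxlan local-tunnelip {tunnel_ip}")
    return cmds


# spine31: VLANs 100 and 200, loopback 172.16.3.1
for cmd in vxlan_commands([100, 200], "172.16.3.1"):
    print(cmd)
```

The output is plain text to paste into a shell on the switch; nothing here talks to the device.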

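Enabling and configuring EVPN

Configure this following Cumulus official / Ethernet Virtual Private Network - EVPN / Configuring EVPN. This is an Early Access feature (limited to Quagga and its CLI), and as far as I could find, net commands are not provided for it yet, so configure it the conventional Quagga way.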

kotetsu@spine41:~$ sudo vtysh

Hello, this is Quagga (version 1.0.0+cl3eau8).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

spine41#
spine41# configure terminal
spine41(config)# router bgp 65004
spine41(config-router)# address-family evpn
spine41(config-router-af)# neighbor PEER_BB activate
spine41(config-router-af)# advertise-all-vni
spine41(config-router-af)# end
spine41# write memory
Note: this version of vtysh never writes vtysh.conf
Building Configuration...
Integrated configuration saved to /etc/quagga/Quagga.conf
[OK]
spine41#
spine41# exit
kotetsu@spine41:~$

Apply the following configuration:

  • bb0[34]
router bgp 65000
 address-family evpn
   neighbor PEER_SPINE activate
  • spine3[12]
router bgp 65003
 address-family evpn
   neighbor PEER_BB activate
   advertise-all-vni
  • spine4[12]
router bgp 65004
 address-family evpn
   neighbor PEER_BB activate
   advertise-all-vni

Disabling Data Plane MAC Learning over VXLAN Tunnels

On spine[34][12], edit /etc/network/interfaces and add bridge-learning off to every vxlan interface.

kotetsu@spine31:~$ diff -u /var/tmp/etc_network_interfaces /etc/network/interfaces
--- /var/tmp/etc_network_interfaces     2017-03-20 23:19:45.046311072 +0900
+++ /etc/network/interfaces     2017-03-20 23:20:33.701311345 +0900
@@ -64,6 +64,7 @@
 auto vxlan010100
 iface vxlan010100
     bridge-access 100
+    bridge-learning off
     mstpctl-bpduguard yes
     mstpctl-portbpdufilter yes
     vxlan-id 10100
@@ -72,6 +73,7 @@
 auto vxlan010200
 iface vxlan010200
     bridge-access 200
+    bridge-learning off
     mstpctl-bpduguard yes
     mstpctl-portbpdufilter yes
     vxlan-id 10200
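
Editing four switches by hand gets tedious, so here is a sketch of the same change as a text transformation. It works on a string rather than on the real file, the sample stanzas are abridged, the swp9 stanza is a made-up non-vxlan example, and `add_learning_off` is a hypothetical helper name.

```python
SAMPLE = """\
auto vxlan010100
iface vxlan010100
    bridge-access 100
    vxlan-id 10100

auto swp9
iface swp9
    bridge-access 100
"""


def add_learning_off(text: str) -> str:
    """Add 'bridge-learning off' after bridge-access in vxlan stanzas only."""
    lines = text.splitlines()
    out = []
    in_vxlan = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("iface "):
            # Track whether the current stanza is a vxlan interface.
            in_vxlan = stripped.split()[1].startswith("vxlan")
        out.append(line)
        if in_vxlan and stripped.startswith("bridge-access"):
            nxt = lines[i + 1].strip() if i + 1 < len(lines) else ""
            # Skip if the line directly below already sets bridge-learning.
            if not nxt.startswith("bridge-learning"):
                indent = line[: len(line) - len(line.lstrip())]
                out.append(f"{indent}bridge-learning off")
    return "\n".join(out) + "\n"


print(add_learning_off(SAMPLE))
```

Running it twice gives the same result as running it once, as long as the bridge-learning line sits directly below bridge-access as in the diff above.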

Verification

Connectivity check

End-to-end connectivity check (L2 over L3)

kotetsu@node31:~$ ping 192.168.1.4
PING 192.168.1.4 (192.168.1.4) 56(84) bytes of data.
64 bytes from 192.168.1.4: icmp_seq=1 ttl=64 time=4.67 ms
64 bytes from 192.168.1.4: icmp_seq=2 ttl=64 time=1.86 ms
64 bytes from 192.168.1.4: icmp_seq=3 ttl=64 time=1.81 ms
64 bytes from 192.168.1.4: icmp_seq=4 ttl=64 time=1.94 ms
64 bytes from 192.168.1.4: icmp_seq=5 ttl=64 time=2.07 ms
64 bytes from 192.168.1.4: icmp_seq=6 ttl=64 time=1.24 ms
64 bytes from 192.168.1.4: icmp_seq=7 ttl=64 time=1.78 ms
^C
--- 192.168.1.4 ping statistics ---
7 packets transmitted, 7 received, 0% packet loss, time 6009ms
rtt min/avg/max/mdev = 1.241/2.199/4.677/1.041 ms
kotetsu@node31:~$ ip n show
192.168.1.4 dev ens4 lladdr 00:37:c4:56:b4:01 STALE
kotetsu@node41:~$ ip n show
192.168.1.3 dev ens4 lladdr 00:37:c4:55:09:01 STALE

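Checking the Cumulus VX tables

Cumulus official / Ethernet Virtual Private Network - EVPN / Output Commands lists various show commands, so I follow along with it.

spine MAC address table

First, the MAC address tables of the spines, which act as VTEPs and EVPN PEs.

  • The TunnelDest column shows that the remote VTEP's shared loopback IP address is being used
  • Entries shown as 00:00:00:00:00:00 in the MAC column are apparently for BUM traffic replication (according to the official docs)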
kotetsu@spine31:~$ net show bridge macs

VLAN      Master    Interface    MAC                TunnelDest    State      Flags    LastSeen
--------  --------  -----------  -----------------  ------------  ---------  -------  ----------
100       bridge    bond1        00:37:c4:55:09:01                                    00:01:43
100       bridge    vxlan010100  00:37:c4:56:b4:01                                    00:03:52
untagged            vxlan010100  00:00:00:00:00:00  172.16.4.100  permanent  self     01:08:33
untagged            vxlan010100  00:37:c4:56:b4:01  172.16.4.100             self     00:03:58
untagged            vxlan010200  00:00:00:00:00:00  172.16.4.100  permanent  self     01:08:33
untagged  bridge    bond0        00:37:c4:f8:17:03                permanent           05:38:27
untagged  bridge    bond1        00:37:c4:f8:17:05                permanent           03:42:10
untagged  bridge    vxlan010100  a6:21:d1:0c:20:a8                permanent           02:20:01
untagged  bridge    vxlan010200  de:8e:ed:62:05:12                permanent           02:20:01
kotetsu@spine32:~$ net show bridge macs

VLAN      Master    Interface    MAC                TunnelDest    State      Flags    LastSeen
--------  --------  -----------  -----------------  ------------  ---------  -------  ----------
100       bridge    bond1        00:37:c4:55:09:01                                    00:04:45
100       bridge    vxlan010100  00:37:c4:56:b4:01                                    00:04:51
untagged            vxlan010100  00:00:00:00:00:00  172.16.4.100  permanent  self     01:09:25
untagged            vxlan010100  00:37:c4:56:b4:01  172.16.4.100             self     00:04:51
untagged            vxlan010200  00:00:00:00:00:00  172.16.4.100  permanent  self     01:09:25
untagged  bridge    bond0        00:37:c4:a9:0f:03                permanent           05:38:42
untagged  bridge    bond1        00:37:c4:a9:0f:05                permanent           03:42:00
untagged  bridge    vxlan010100  ea:60:31:c9:77:63                permanent           02:18:00
untagged  bridge    vxlan010200  06:f9:9e:92:a4:c0                permanent           02:18:00
kotetsu@spine41:~$ net show bridge macs

VLAN      Master    Interface    MAC                TunnelDest    State      Flags    LastSeen
--------  --------  -----------  -----------------  ------------  ---------  -------  ----------
100       bridge    bond1        00:37:c4:56:b4:01                                    00:01:43
100       bridge    vxlan010100  00:37:c4:55:09:01                                    00:01:49
untagged            vxlan010100  00:00:00:00:00:00  172.16.3.100  permanent  self     01:06:24
untagged            vxlan010100  00:37:c4:55:09:01  172.16.3.100             self     00:01:49
untagged            vxlan010200  00:00:00:00:00:00  172.16.3.100  permanent  self     01:06:24
untagged  bridge    bond0        00:37:c4:fe:34:03                permanent           05:34:25
untagged  bridge    bond1        00:37:c4:fe:34:05                permanent           03:35:31
untagged  bridge    vxlan010100  46:bf:75:c3:83:e3                permanent           02:14:34
untagged  bridge    vxlan010200  ca:4e:29:fd:d9:8e                permanent           02:14:34
kotetsu@spine42:~$ net show bridge macs

VLAN      Master    Interface    MAC                TunnelDest    State      Flags    LastSeen
--------  --------  -----------  -----------------  ------------  ---------  -------  ----------
100       bridge    bond1        00:37:c4:56:b4:01                                    00:00:46
100       bridge    vxlan010100  00:37:c4:55:09:01                                    00:02:49
untagged            vxlan010100  00:00:00:00:00:00  172.16.3.100  permanent  self     01:07:29
untagged            vxlan010100  00:37:c4:55:09:01  172.16.3.100             self     00:02:55
untagged            vxlan010200  00:00:00:00:00:00  172.16.3.100  permanent  self     01:07:29
untagged  bridge    bond0        00:37:c4:32:db:03                permanent           05:35:24
untagged  bridge    bond1        00:37:c4:32:db:05                permanent           03:36:18
untagged  bridge    vxlan010100  9e:e4:df:d2:a9:3a                permanent           02:15:20
untagged  bridge    vxlan010200  6a:3a:0d:08:fb:9e                permanent           02:15:20

Advertised VNI and VTEP information

From sudo vtysh

spine31# show bgp evpn vni
Advertise All VNI flag: Enabled
Number of VNIs: 2
Flags: * - Kernel
  VNI        Orig IP         RD                    Import RT             Export RT
* 10200      172.16.3.100    172.16.3.1:10200      65003:10200           65003:10200
* 10100      172.16.3.100    172.16.3.1:10100      65003:10100           65003:10100


spine31# show evpn vni
Number of VNIs: 2
VNI        VxLAN IF              VTEP IP         # MACs   Remote VTEPs
10200      vxlan010200           172.16.3.100    0        172.16.4.100
10100      vxlan010100           172.16.3.100    2        172.16.4.100
spine32# show bgp evpn vni
Advertise All VNI flag: Enabled
Number of VNIs: 2
Flags: * - Kernel
  VNI        Orig IP         RD                    Import RT             Export RT
* 10200      172.16.3.100    172.16.3.2:10200      65003:10200           65003:10200
* 10100      172.16.3.100    172.16.3.2:10100      65003:10100           65003:10100


spine32# show evpn vni
Number of VNIs: 2
VNI        VxLAN IF              VTEP IP         # MACs   Remote VTEPs
10200      vxlan010200           172.16.3.100    0        172.16.4.100
10100      vxlan010100           172.16.3.100    2        172.16.4.100
spine41# show bgp evpn vni
Advertise All VNI flag: Enabled
Number of VNIs: 2
Flags: * - Kernel
  VNI        Orig IP         RD                    Import RT             Export RT
* 10200      172.16.4.100    172.16.4.1:10200      65004:10200           65004:10200
* 10100      172.16.4.100    172.16.4.1:10100      65004:10100           65004:10100


spine41# show evpn vni
Number of VNIs: 2
VNI        VxLAN IF              VTEP IP         # MACs   Remote VTEPs
10200      vxlan010200           172.16.4.100    0        172.16.3.100
10100      vxlan010100           172.16.4.100    2        172.16.3.100
spine42# show bgp evpn vni
Advertise All VNI flag: Enabled
Number of VNIs: 2
Flags: * - Kernel
  VNI        Orig IP         RD                    Import RT             Export RT
* 10200      172.16.4.100    172.16.4.2:10200      65004:10200           65004:10200
* 10100      172.16.4.100    172.16.4.2:10100      65004:10100           65004:10100


spine42# show evpn vni
Number of VNIs: 2
VNI        VxLAN IF              VTEP IP         # MACs   Remote VTEPs
10200      vxlan010200           172.16.4.100    0        172.16.3.100
10100      vxlan010100           172.16.4.100    2        172.16.3.100

EVPN learned routes

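I have not configured the spines to receive routes from the other spine in their own AS via bb, so that spine does not show up even as an RD. MAC learning inside the local AS should be kept in sync by MLAG anyway, so I think that is fine. Also, nothing at all shows up for the Type 1 and Type 4 routes that would be needed if EVPN Multihoming were in use.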

From sudo vtysh

spine31# show bgp evpn route
BGP table version is 0, local router ID is 172.16.3.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 172.16.3.1:10100
*> [2]:[0]:[0]:[48]:[00:37:c4:55:09:01]
                    172.16.3.100                       32768 i
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                       32768 i
Route Distinguisher: 172.16.3.1:10200
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                       32768 i
Route Distinguisher: 172.16.4.1:10100
*  [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                           0 65000 65004 i
*> [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                           0 65000 65004 i
*  [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i
Route Distinguisher: 172.16.4.1:10200
*  [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i
Route Distinguisher: 172.16.4.2:10100
*  [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                           0 65000 65004 i
*> [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                           0 65000 65004 i
*  [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i
Route Distinguisher: 172.16.4.2:10200
*  [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i

Displayed 9 prefixes (15 paths)
spine32# show bgp evpn route
BGP table version is 0, local router ID is 172.16.3.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 172.16.3.2:10100
*> [2]:[0]:[0]:[48]:[00:37:c4:55:09:01]
                    172.16.3.100                       32768 i
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                       32768 i
Route Distinguisher: 172.16.3.2:10200
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                       32768 i
Route Distinguisher: 172.16.4.1:10100
*  [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                           0 65000 65004 i
*> [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                           0 65000 65004 i
*  [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i
Route Distinguisher: 172.16.4.1:10200
*  [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i
Route Distinguisher: 172.16.4.2:10100
*> [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                           0 65000 65004 i
*  [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                           0 65000 65004 i
*  [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i
Route Distinguisher: 172.16.4.2:10200
*  [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i

Displayed 9 prefixes (15 paths)
spine41# show bgp evpn route
BGP table version is 0, local router ID is 172.16.4.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 172.16.3.1:10100
*  [2]:[0]:[0]:[48]:[00:37:c4:55:09:01]
                    172.16.3.100                           0 65000 65003 i
*> [2]:[0]:[0]:[48]:[00:37:c4:55:09:01]
                    172.16.3.100                           0 65000 65003 i
*  [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
Route Distinguisher: 172.16.3.1:10200
*  [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
Route Distinguisher: 172.16.3.2:10100
*  [2]:[0]:[0]:[48]:[00:37:c4:55:09:01]
                    172.16.3.100                           0 65000 65003 i
*> [2]:[0]:[0]:[48]:[00:37:c4:55:09:01]
                    172.16.3.100                           0 65000 65003 i
*  [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
Route Distinguisher: 172.16.3.2:10200
*  [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
Route Distinguisher: 172.16.4.1:10100
*> [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                       32768 i
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                       32768 i
Route Distinguisher: 172.16.4.1:10200
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                       32768 i

Displayed 9 prefixes (15 paths)
spine42# show bgp evpn route
BGP table version is 0, local router ID is 172.16.4.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 172.16.3.1:10100
*  [2]:[0]:[0]:[48]:[00:37:c4:55:09:01]
                    172.16.3.100                           0 65000 65003 i
*> [2]:[0]:[0]:[48]:[00:37:c4:55:09:01]
                    172.16.3.100                           0 65000 65003 i
*  [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
Route Distinguisher: 172.16.3.1:10200
*  [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
Route Distinguisher: 172.16.3.2:10100
*  [2]:[0]:[0]:[48]:[00:37:c4:55:09:01]
                    172.16.3.100                           0 65000 65003 i
*> [2]:[0]:[0]:[48]:[00:37:c4:55:09:01]
                    172.16.3.100                           0 65000 65003 i
*  [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
Route Distinguisher: 172.16.3.2:10200
*  [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
Route Distinguisher: 172.16.4.2:10100
*> [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                       32768 i
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                       32768 i
Route Distinguisher: 172.16.4.2:10200
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                       32768 i

Displayed 9 prefixes (15 paths)
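
To check by machine rather than by eye that only Type 2 and Type 3 routes show up, and which RDs a given switch holds, the output can be fed to a small parser. The following is just a rough sketch: `route_types_by_rd` and `SAMPLE` are names I made up, the sample is trimmed from the spine31 output above, and it assumes the `show bgp evpn route` layout stays exactly as printed by this EA build.

```python
import re
from collections import defaultdict

# Trimmed from the spine31 output above (assumption: the text layout of
# "show bgp evpn route" is the one printed by this EA build).
SAMPLE = """\
Route Distinguisher: 172.16.3.1:10100
*> [2]:[0]:[0]:[48]:[00:37:c4:55:09:01]
                    172.16.3.100                       32768 i
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                       32768 i
Route Distinguisher: 172.16.4.1:10100
*  [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                           0 65000 65004 i
*> [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                           0 65000 65004 i
"""

RD_RE = re.compile(r"^Route Distinguisher:\s+(\S+)")
# A prefix line starts with "*" plus ">" or a space, then "[<route type>]:".
PREFIX_RE = re.compile(r"^\*[> ]\s*\[(\d+)\]:")


def route_types_by_rd(text):
    """Return {RD: set of EVPN route types seen under that RD}."""
    result = defaultdict(set)
    current_rd = None
    for line in text.splitlines():
        m = RD_RE.match(line)
        if m:
            current_rd = m.group(1)
            continue
        m = PREFIX_RE.match(line)
        if m and current_rd is not None:
            result[current_rd].add(int(m.group(1)))
    return dict(result)


types = route_types_by_rd(SAMPLE)
for rd, seen in sorted(types.items()):
    print(rd, sorted(seen))
```

Running it over the full output of spine31 should list no RD starting with 172.16.3.2 and no route type other than 2 and 3.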

bb stays out of anything VXLAN-related and acts purely as a forwarding pipe, but it still takes part in the MP-BGP used for EVPN signaling.

bb03# show bgp evpn route
BGP table version is 0, local router ID is 172.31.0.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 172.16.3.1:10100
*> [2]:[0]:[0]:[48]:[00:37:c4:55:09:01]
                    172.16.3.100                           0 65003 i
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65003 i
Route Distinguisher: 172.16.3.1:10200
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65003 i
Route Distinguisher: 172.16.3.2:10100
*> [2]:[0]:[0]:[48]:[00:37:c4:55:09:01]
                    172.16.3.100                           0 65003 i
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65003 i
Route Distinguisher: 172.16.3.2:10200
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65003 i
Route Distinguisher: 172.16.4.1:10100
*> [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                           0 65004 i
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65004 i
Route Distinguisher: 172.16.4.1:10200
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65004 i
Route Distinguisher: 172.16.4.2:10100
*> [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                           0 65004 i
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65004 i
Route Distinguisher: 172.16.4.2:10200
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65004 i

Displayed 12 prefixes (12 paths)
bb04# show bgp evpn route
BGP table version is 0, local router ID is 172.31.0.4
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 172.16.3.1:10100
*> [2]:[0]:[0]:[48]:[00:37:c4:55:09:01]
                    172.16.3.100                           0 65003 i
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65003 i
Route Distinguisher: 172.16.3.1:10200
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65003 i
Route Distinguisher: 172.16.3.2:10100
*> [2]:[0]:[0]:[48]:[00:37:c4:55:09:01]
                    172.16.3.100                           0 65003 i
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65003 i
Route Distinguisher: 172.16.3.2:10200
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65003 i
Route Distinguisher: 172.16.4.1:10100
*> [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                           0 65004 i
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65004 i
Route Distinguisher: 172.16.4.1:10200
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65004 i
Route Distinguisher: 172.16.4.2:10100
*> [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                           0 65004 i
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65004 i
Route Distinguisher: 172.16.4.2:10200
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65004 i

Displayed 12 prefixes (12 paths)

EVPN learned routes (drilling down into a specific RD)

From sudo vtysh

bb03# show bgp evpn route rd 172.16.3.2:10100
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]

BGP routing table entry for 172.16.3.2:10100:[2]:[0]:[0]:[48]:[00:37:c4:55:09:01]
Paths: (1 available, best #1)
  Advertised to non peer-group peers:
  spine31(192.0.2.9) spine32(192.0.2.11) spine41(192.0.2.13) spine42(192.0.2.15)
  Route [2]:[0]:[0]:[48]:[00:37:c4:55:09:01] VNI 10100
  65003
    172.16.3.100 from spine32(192.0.2.11) (172.16.3.2)
      Origin IGP, localpref 100, valid, external, bestpath-from-AS 65003, best
      Extended Community: RT:65003:10100 ET:8
      AddPath ID: RX 0, TX 138
      Last update: Tue Mar 21 21:51:06 2017

BGP routing table entry for 172.16.3.2:10100:[3]:[0]:[32]:[172.16.3.100]
Paths: (1 available, best #1)
  Advertised to non peer-group peers:
  spine31(192.0.2.9) spine32(192.0.2.11) spine41(192.0.2.13) spine42(192.0.2.15)
  Route [3]:[0]:[32]:[172.16.3.100]
  65003
    172.16.3.100 from spine32(192.0.2.11) (172.16.3.2)
      Origin IGP, localpref 100, valid, external, bestpath-from-AS 65003, best
      Extended Community: RT:65003:10100 ET:8
      AddPath ID: RX 0, TX 110
      Last update: Tue Mar 21 20:57:31 2017

Looking at the packets

ControlPlane

EVPN NLRI Type3(Inclusive Multicast Ethernet Tag route)

This is the UPDATE that spine41 sends to bb03 right after the eBGP OPEN. The point to note (by way of comparison with EVPN Multihoming) is that Originating Router's IP Address carries the shared(?) loopback IP address (172.16.4.100) set up on spine4[12]. Also, as described in Cumulus official docs / Ethernet Virtual Private Network - EVPN / Enabling EVPN with Route Distinguishers (RDs) and Route Targets (RTs), the RD and RT carry automatically assigned values even though they are not configured explicitly.

f:id:kakkotetsu:20170511001352p:plain
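
Judging only from the `show bgp evpn vni` output earlier in this post, the automatically assigned values look like "router ID:VNI" for the RD and "AS number:VNI" for the RT. The sketch below just reproduces that observed pattern. `auto_rd` and `auto_rt` are names I made up, and this is a guess from the output, not a statement about how Quagga actually derives them.

```python
def auto_rd(router_id, vni):
    """RD as it appears in the output: '<router ID>:<VNI>' (observed pattern)."""
    return f"{router_id}:{vni}"


def auto_rt(asn, vni):
    """RT as it appears in the output: '<AS number>:<VNI>' (observed pattern)."""
    return f"{asn}:{vni}"


# spine41: router ID 172.16.4.1, AS 65004, VNIs 10100 and 10200
for vni in (10100, 10200):
    print(vni, auto_rd("172.16.4.1", vni), auto_rt(65004, vni))
```

The printed values line up with the RD, Import RT and Export RT columns shown for spine41.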

EVPN NLRI Type2(MAC/IP Advertisement route)

This shows spine41 advertising to bb03 the MAC address of node41 (at VLAN100:VNI10100).

f:id:kakkotetsu:20170511001417p:plain

DataPlane

BUM

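The ARP Request from node31 is sent from spine31 to bb03. In the outer IP header of the VXLAN encapsulation, Src is 172.16.3.100 (the shared loopback IP address of spine3[12]) and Dst is 172.16.4.100 (the shared loopback IP address of spine4[12]), which shows that each pair of two switches forms a logical(?) VTEP using a shared(?) loopback IP address. In VXLAN terms this is HER (Head End Replication) behavior. Incidentally, according to the note on HER on the official page as of 2017/03/21, HER is verified for up to 128 VTEPs.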

Cumulus Linux verified support for up to 128 VTEPs with head end replication.

f:id:kakkotetsu:20170511001433p:plain
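
The idea of HER is simply that the ingress VTEP sends one unicast copy of a BUM frame to every remote VTEP it knows for that VNI. Here is a toy model of that. `head_end_replicate` is a made-up name and this is only an illustration of the behavior, not how the Linux kernel or Cumulus implements it.

```python
from typing import List, Tuple


def head_end_replicate(frame: bytes, local_vtep: str,
                       remote_vteps: List[str]) -> List[Tuple[str, str, bytes]]:
    """Return one (outer_src, outer_dst, inner_frame) copy per remote VTEP."""
    return [(local_vtep, remote, frame) for remote in remote_vteps]


# In this lab spine3[12] know a single remote VTEP for VNI 10100,
# so a broadcast such as an ARP Request yields exactly one copy.
copies = head_end_replicate(b"arp-request", "172.16.3.100", ["172.16.4.100"])
for src, dst, _ in copies:
    print(src, "->", dst)
```

With more remote VTEPs the list grows linearly, which is presumably why the official note mentions a verified upper bound.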

Unicast

This is the ICMP Echo Reply from node41 to node31 as sent from spine41 toward bb04. It is just a VXLAN-encapsulated packet, but the outer IP header shows that the communication runs between the shared loopbacks.

f:id:kakkotetsu:20170511001446p:plain
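
For reference, the VXLAN header sitting between the outer UDP header and the inner Ethernet frame is only 8 bytes (RFC 7348): a flags byte with the I bit set, 24 reserved bits, the 24-bit VNI, and 8 more reserved bits. A minimal sketch of building and reading it follows. The helper names are my own, and the VNI 10100 is the one used for VLAN100 in this lab.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid (RFC 7348)


def build_vxlan_header(vni):
    """Pack the 8-byte VXLAN header for the given 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + reserved (24 bits)
    # Word 2: VNI (24 bits) + reserved (8 bits)
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header):
    """Extract the VNI from the first 8 bytes of a VXLAN header."""
    _, second_word = struct.unpack("!II", header[:8])
    return second_word >> 8


header = build_vxlan_header(10100)
print(header.hex(), parse_vni(header))
```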

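MLAG behavior

This is nothing more than an MLAG switchover, and it is a failure test in a virtual environment, so I will keep it very short...

With traffic flowing along bb03 -> spine41 -> torSW401a -> node41, taking the downlink of spine41 down with sudo ifdown swp5 switched the path immediately to bb03 -> spine41 -> spine42 -> torSW401a -> node41. The LAG that spine4[12] form toward torSW401a and the virtual loopback both stay up, so nothing like an EVPN withdrawal occurs.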

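Wrapping up

Some impressions follow.

  • Cumulus Linux
    • VX is nice and lightweight
    • The wrapper called Network Command Line Utility (NCLU) is easy to use
    • The documentation is properly in place, which is nice (even though what I covered here is an EA feature)
      • So it can't be helped that my own explanations are sloppy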