kakkotetsu

VxLAN+EVPN on Nexus9000v (anycast gateway edition) Appendix: Verifying IPv6 end-node communication

Introduction

What this covers / summary

As the title says, this is a bonus add-on to the previous article.

Topology

The topology is the same as in the previous article, with IPv6 segments added as shown below.

f:id:kakkotetsu:20170917230919p:plain

References

Nexus9000v configuration as of the previous article

As the starting configuration, here is the show run output, trimmed to the relevant parts.

  • torsw101a
version 7.0(3)I6(1)
hostname torsw101a

nv overlay evpn
feature ospf
feature bgp
feature interface-vlan
feature vn-segment-vlan-based
feature lldp
clock timezone JST 9 0
feature nv overlay

vlan 1,100,300,3901
fabric forwarding anycast-gateway-mac 2020.0000.00aa
vlan 100
  vn-segment 10100
vlan 300
  vn-segment 10300
vlan 3901
  vn-segment 50001

vrf context VRF001
  vni 50001
  rd auto
  address-family ipv4 unicast
    route-target both auto
    route-target both auto evpn
vrf context management

interface Vlan1

interface Vlan100
  no shutdown
  vrf member VRF001
  no ip redirects
  ip address 192.168.1.254/24
  fabric forwarding mode anycast-gateway

interface Vlan300
  no shutdown
  vrf member VRF001
  no ip redirects
  ip address 192.168.3.254/24
  fabric forwarding mode anycast-gateway

interface Vlan3901
  no shutdown
  vrf member VRF001
  ip forward

interface nve1
  no shutdown
  source-interface loopback1
  host-reachability protocol bgp
  member vni 10001-10300
    ingress-replication protocol bgp
  member vni 50001 associate-vrf

interface Ethernet1/1
  description DEV=node11 IF=ens4
  switchport access vlan 100

interface Ethernet1/2
  description DEV=node13 IF=ens4
  switchport access vlan 300

interface Ethernet1/8
  description DEV=spine001 IF=Eth1/1
  no switchport
  mtu 9216
  ip address 192.0.2.1/31
  ip ospf network point-to-point
  ip router ospf OSPF_UNDERLAY area 0.0.0.0
  no shutdown

interface loopback0
  ip address 172.16.1.1/32
  ip router ospf OSPF_UNDERLAY area 0.0.0.0

interface loopback1
  ip address 198.18.1.11/32
  ip router ospf OSPF_UNDERLAY area 0.0.0.0

router ospf OSPF_UNDERLAY
  router-id 172.16.1.1
router bgp 64512
  neighbor 172.31.0.1
    remote-as 64512
    update-source loopback0
    address-family l2vpn evpn
      send-community
      send-community extended
  vrf VRF001
    address-family ipv4 unicast
      advertise l2vpn evpn
evpn
  vni 10100 l2
    rd auto
    route-target import auto
    route-target export auto
  vni 10300 l2
    rd auto
    route-target import auto
    route-target export auto
  • torsw201a
version 7.0(3)I6(1)
hostname torsw201a

nv overlay evpn
feature ospf
feature bgp
feature interface-vlan
feature vn-segment-vlan-based
feature lldp
clock timezone JST 9 0
feature nv overlay

vlan 1,100,200,3901
fabric forwarding anycast-gateway-mac 2020.0000.00aa
vlan 100
  vn-segment 10100
vlan 200
  vn-segment 10200
vlan 3901
  vn-segment 50001

vrf context VRF001
  vni 50001
  rd auto
  address-family ipv4 unicast
    route-target both auto
    route-target both auto evpn
vrf context management

interface Vlan1

interface Vlan100
  no shutdown
  vrf member VRF001
  no ip redirects
  ip address 192.168.1.254/24
  fabric forwarding mode anycast-gateway

interface Vlan200
  no shutdown
  vrf member VRF001
  no ip redirects
  ip address 192.168.2.254/24
  fabric forwarding mode anycast-gateway

interface Vlan3901
  no shutdown
  vrf member VRF001
  ip forward

interface nve1
  no shutdown
  source-interface loopback1
  host-reachability protocol bgp
  member vni 10001-10300
    ingress-replication protocol bgp
  member vni 50001 associate-vrf

interface Ethernet1/1
  description DEV=node21 IF=ens4
  switchport access vlan 100

interface Ethernet1/2
  description DEV=node22 IF=ens4
  switchport access vlan 200

interface Ethernet1/8
  description DEV=spine001 IF=Eth1/2
  no switchport
  mtu 9216
  ip address 192.0.2.3/31
  ip ospf network point-to-point
  ip router ospf OSPF_UNDERLAY area 0.0.0.0
  no shutdown

interface loopback0
  ip address 172.16.2.1/32
  ip router ospf OSPF_UNDERLAY area 0.0.0.0

interface loopback1
  ip address 198.18.1.21/32
  ip router ospf OSPF_UNDERLAY area 0.0.0.0

router ospf OSPF_UNDERLAY
  router-id 172.16.2.1
router bgp 64512
  neighbor 172.31.0.1
    remote-as 64512
    update-source loopback0
    address-family l2vpn evpn
      send-community
      send-community extended
  vrf VRF001
    address-family ipv4 unicast
      advertise l2vpn evpn
evpn
  vni 10100 l2
    rd auto
    route-target import auto
    route-target export auto
  vni 10200 l2
    rd auto
    route-target import auto
    route-target export auto
  • spine001
version 7.0(3)I6(1)
hostname spine001

nv overlay evpn
feature ospf
feature bgp
feature lldp
clock timezone JST 9 0

interface Ethernet1/1
  description DEV=torsw101a IF=Eth1/8
  no switchport
  mtu 9216
  ip address 192.0.2.0/31
  ip ospf network point-to-point
  ip router ospf OSPF_UNDERLAY area 0.0.0.0
  no shutdown

interface Ethernet1/2
  description DEV=torsw201a IF=Eth1/8
  no switchport
  mtu 9216
  ip address 192.0.2.2/31
  ip ospf network point-to-point
  ip router ospf OSPF_UNDERLAY area 0.0.0.0
  no shutdown

interface loopback0
  ip address 172.31.0.1/32
  ip router ospf OSPF_UNDERLAY area 0.0.0.0

router ospf OSPF_UNDERLAY
  router-id 172.31.0.1
router bgp 64512
  neighbor 172.16.1.1
    remote-as 64512
    update-source loopback0
    address-family l2vpn evpn
      send-community
      send-community extended
      route-reflector-client
  neighbor 172.16.2.1
    remote-as 64512
    update-source loopback0
    address-family l2vpn evpn
      send-community
      send-community extended
      route-reflector-client
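As a quick cross-check of the show run excerpts above, here is a small Python sketch. It expands the nve1 "member vni 10001-10300" range and confirms that every VLAN-mapped L2VNI on each leaf is covered. The L3VNI 50001 is handled by its own "member vni 50001 associate-vrf" line. The helper names are hypothetical (this is not an NX-OS tool), and the mappings are copied by hand from the configs.

```python
from typing import Dict, List, Set

def parse_vni_range(spec: str) -> Set[int]:
    """Expand an NX-OS-style VNI list such as "10001-10300" or "10100,10300"."""
    vnis: Set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-"))
            vnis.update(range(lo, hi + 1))
        else:
            vnis.add(int(part))
    return vnis

# VLAN -> vn-segment mappings copied from the show run output above
VLAN_TO_VNI = {
    "torsw101a": {100: 10100, 300: 10300, 3901: 50001},
    "torsw201a": {100: 10100, 200: 10200, 3901: 50001},
}
L2_MEMBERS = parse_vni_range("10001-10300")  # member vni 10001-10300
L3_MEMBERS = {50001}                         # member vni 50001 associate-vrf

def uncovered_vnis(mapping: Dict[int, int]) -> List[int]:
    """Return VNIs that are mapped to a VLAN but are not members of nve1."""
    members = L2_MEMBERS | L3_MEMBERS
    return sorted(vni for vni in mapping.values() if vni not in members)

for switch, mapping in VLAN_TO_VNI.items():
    print(switch, uncovered_vnis(mapping))
```

Both leaves should print an empty list. Adding a VLAN whose vn-segment falls outside 10001-10300 would show up here as a VNI that nve1 does not carry.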

Build

Node configuration

Here are the relevant settings on the nodes used for connectivity testing.

  • node11
kotetsu@node11:~$ ip a show dev ens4
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:21:96:9a:03:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.1/24 brd 192.168.1.255 scope global ens4
       valid_lft forever preferred_lft forever
    inet6 fd00:0:0:1::1/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::221:96ff:fe9a:301/64 scope link
       valid_lft forever preferred_lft forever

kotetsu@node11:~$ ip -6 r show dev ens4
fd00:0:0:1::/64  proto kernel  metric 256  pref medium
fe80::/64  proto kernel  metric 256  pref medium
default via fd00:0:0:1::fe  metric 1024  pref medium
  • node13
kotetsu@node13:~$ ip a show dev ens4
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:21:96:3d:6e:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.3.1/24 brd 192.168.3.255 scope global ens4
       valid_lft forever preferred_lft forever
    inet6 fd00:0:0:3::1/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::221:96ff:fe3d:6e01/64 scope link
       valid_lft forever preferred_lft forever

kotetsu@node13:~$ ip -6 r show dev ens4
fd00:0:0:3::/64  proto kernel  metric 256  pref medium
fe80::/64  proto kernel  metric 256  pref medium
default via fd00:0:0:3::fe  metric 1024  pref medium
  • node21
kotetsu@node21:~$ ip a show dev ens4
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:21:96:9f:c7:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.2/24 brd 192.168.1.255 scope global ens4
       valid_lft forever preferred_lft forever
    inet6 fd00:0:0:1::2/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::221:96ff:fe9f:c701/64 scope link
       valid_lft forever preferred_lft forever

kotetsu@node21:~$ ip -6 r show dev ens4
fd00:0:0:1::/64  proto kernel  metric 256  pref medium
fe80::/64  proto kernel  metric 256  pref medium
default via fd00:0:0:1::fe  metric 1024  pref medium
  • node22
kotetsu@node22:~$ ip a show dev ens4
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:21:96:42:5f:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.2.1/24 brd 192.168.2.255 scope global ens4
       valid_lft forever preferred_lft forever
    inet6 fd00:0:0:2::1/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::221:96ff:fe42:5f01/64 scope link
       valid_lft forever preferred_lft forever

kotetsu@node22:~$ ip -6 r show dev ens4
fd00:0:0:2::/64  proto kernel  metric 256  pref medium
fe80::/64  proto kernel  metric 256  pref medium
default via fd00:0:0:2::fe  metric 1024  pref medium
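Each node's fe80:: address in the output above matches the modified EUI-64 derivation from its MAC address (RFC 4291, Appendix A). The derivation flips the universal/local bit of the first octet and inserts ff:fe in the middle. The same addresses reappear later in the switches' ND tables. Here is a minimal Python sketch of that derivation; eui64_link_local is a hypothetical helper name.

```python
import ipaddress

def eui64_link_local(mac: str) -> ipaddress.IPv6Address:
    """Return the fe80::/64 address built from a MAC via modified EUI-64."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"expected a 6-octet MAC, got {mac!r}")
    octets[0] ^= 0x02                                      # flip the universal/local bit
    interface_id = octets[:3] + b"\xff\xfe" + octets[3:]   # insert ff:fe in the middle
    prefix = b"\xfe\x80" + bytes(6)                        # fe80:0:0:0::/64
    return ipaddress.IPv6Address(prefix + bytes(interface_id))

# MAC addresses from each node's "ip a show dev ens4" output
NODE_MACS = {
    "node11": "00:21:96:9a:03:01",
    "node13": "00:21:96:3d:6e:01",
    "node21": "00:21:96:9f:c7:01",
    "node22": "00:21:96:42:5f:01",
}

for node, mac in NODE_MACS.items():
    print(node, eui64_link_local(mac))
```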

Additional Nexus9000v configuration

The IPv6-related configuration added is as follows.
Note that interface vlan 3901, the SVI used for the L3VNI, also needs ipv6 forward; without it, the switch does not forward received VxLAN packets on to the nodes.

  • torsw101a
interface vlan 100
 ipv6 address fd00:0:0:1::fe/64

interface vlan 300
 ipv6 address fd00:0:0:3::fe/64

interface vlan 3901
 ipv6 forward

vrf context VRF001
  address-family ipv6 unicast
    route-target both auto
    route-target both auto evpn
  • torsw201a
interface vlan 100
 ipv6 address fd00:0:0:1::fe/64

interface vlan 200
 ipv6 address fd00:0:0:2::fe/64

interface vlan 3901
 ipv6 forward

vrf context VRF001
  address-family ipv6 unicast
    route-target both auto
    route-target both auto evpn
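A quick way to sanity-check the addressing is to confirm that each node's default route points at the anycast gateway SVI of its own prefix. The following Python sketch does that check using the standard ipaddress module. The data is copied by hand from the node output and the SVI config above, and gateway_svi is a hypothetical helper. For simplicity it treats all three SVIs as one table, even though Vlan300 exists only on torsw101a and Vlan200 only on torsw201a.

```python
import ipaddress
from typing import Optional

# Anycast gateway SVI addresses added above (Vlan100 exists on both leaves)
SVIS = {
    "Vlan100": ipaddress.IPv6Interface("fd00:0:0:1::fe/64"),
    "Vlan200": ipaddress.IPv6Interface("fd00:0:0:2::fe/64"),
    "Vlan300": ipaddress.IPv6Interface("fd00:0:0:3::fe/64"),
}

# Global address and default gateway of each node, from "ip a" / "ip -6 r"
NODES = {
    "node11": ("fd00:0:0:1::1/64", "fd00:0:0:1::fe"),
    "node13": ("fd00:0:0:3::1/64", "fd00:0:0:3::fe"),
    "node21": ("fd00:0:0:1::2/64", "fd00:0:0:1::fe"),
    "node22": ("fd00:0:0:2::1/64", "fd00:0:0:2::fe"),
}

def gateway_svi(node_addr: str, gateway: str) -> Optional[str]:
    """Return the SVI that shares the node's prefix and owns its gateway address."""
    iface = ipaddress.IPv6Interface(node_addr)
    gw = ipaddress.IPv6Address(gateway)
    for name, svi in SVIS.items():
        if svi.network == iface.network and svi.ip == gw:
            return name
    return None

for node, (addr, gw) in NODES.items():
    print(node, gateway_svi(addr, gw))
```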

Checking the Nexus9000v tables

All nodes can now reach each other over IPv6 in a full mesh, so here are the Nexus9000v tables as they looked after the connectivity test.

EVPN learned routes

torsw101a# show bgp l2vpn evpn
BGP routing table information for VRF default, address family L2VPN EVPN
BGP table version is 15946, local router ID is 172.16.1.1
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup

   Network            Next Hop            Metric     LocPrf     Weight Path
Route Distinguisher: 172.16.1.1:32867    (L2VNI 10100)
*>l[2]:[0]:[0]:[48]:[0021.969a.0301]:[0]:[0.0.0.0]/216
                      198.18.1.11                       100      32768 i
*>i[2]:[0]:[0]:[48]:[0021.969f.c701]:[0]:[0.0.0.0]/216
                      198.18.1.21                       100          0 i
*>l[2]:[0]:[0]:[48]:[0021.969a.0301]:[32]:[192.168.1.1]/272
                      198.18.1.11                       100      32768 i
*>i[2]:[0]:[0]:[48]:[0021.969f.c701]:[32]:[192.168.1.2]/272
                      198.18.1.21                       100          0 i
*>l[2]:[0]:[0]:[48]:[0021.969a.0301]:[128]:[fd00:0:0:1::1]/368
                      198.18.1.11                       100      32768 i
*>i[2]:[0]:[0]:[48]:[0021.969f.c701]:[128]:[fd00:0:0:1::2]/368
                      198.18.1.21                       100          0 i
*>l[3]:[0]:[32]:[198.18.1.11]/88
                      198.18.1.11                       100      32768 i
*>i[3]:[0]:[32]:[198.18.1.21]/88
                      198.18.1.21                       100          0 i

Route Distinguisher: 172.16.1.1:33067    (L2VNI 10300)
*>l[2]:[0]:[0]:[48]:[0021.963d.6e01]:[0]:[0.0.0.0]/216
                      198.18.1.11                       100      32768 i
*>l[2]:[0]:[0]:[48]:[0021.963d.6e01]:[32]:[192.168.3.1]/272
                      198.18.1.11                       100      32768 i
*>l[2]:[0]:[0]:[48]:[0021.963d.6e01]:[128]:[fd00:0:0:3::1]/368
                      198.18.1.11                       100      32768 i
*>l[3]:[0]:[32]:[198.18.1.11]/88
                      198.18.1.11                       100      32768 i

Route Distinguisher: 172.16.2.1:32867
*>i[2]:[0]:[0]:[48]:[0021.969f.c701]:[0]:[0.0.0.0]/216
                      198.18.1.21                       100          0 i
*>i[2]:[0]:[0]:[48]:[0021.969f.c701]:[32]:[192.168.1.2]/272
                      198.18.1.21                       100          0 i
*>i[2]:[0]:[0]:[48]:[0021.969f.c701]:[128]:[fd00:0:0:1::2]/368
                      198.18.1.21                       100          0 i
*>i[3]:[0]:[32]:[198.18.1.21]/88
                      198.18.1.21                       100          0 i

Route Distinguisher: 172.16.2.1:32967
*>i[2]:[0]:[0]:[48]:[0021.9642.5f01]:[32]:[192.168.2.1]/272
                      198.18.1.21                       100          0 i
*>i[2]:[0]:[0]:[48]:[0021.9642.5f01]:[128]:[fd00:0:0:2::1]/368
                      198.18.1.21                       100          0 i

Route Distinguisher: 172.16.1.1:3    (L3VNI 50001)
*>i[2]:[0]:[0]:[48]:[0021.9642.5f01]:[32]:[192.168.2.1]/272
                      198.18.1.21                       100          0 i
*>i[2]:[0]:[0]:[48]:[0021.969f.c701]:[32]:[192.168.1.2]/272
                      198.18.1.21                       100          0 i
*>i[2]:[0]:[0]:[48]:[0021.9642.5f01]:[128]:[fd00:0:0:2::1]/368
                      198.18.1.21                       100          0 i
*>i[2]:[0]:[0]:[48]:[0021.969f.c701]:[128]:[fd00:0:0:1::2]/368
                      198.18.1.21                       100          0 i
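Each type-2 (MAC/IP) entry above packs several fields into one bracketed string. My reading of the NX-OS display is [route type]:[ESI]:[Ethernet tag]:[MAC length]:[MAC]:[IP length]:[IP]. Under that reading, a length of 0 with 0.0.0.0 is a MAC-only route, 32 carries an IPv4 host, and 128 carries an IPv6 host. The Python sketch below splits such a line under that assumption. The regex and names are my own, not a Cisco-provided parser.

```python
import re
from typing import NamedTuple, Optional

class Type2Route(NamedTuple):
    esi: str
    eth_tag: int
    mac: str
    ip_len: int
    ip: Optional[str]  # None for MAC-only routes (IP length 0)

# Assumed layout: [2]:[ESI]:[EthTag]:[48]:[MAC]:[IP length]:[IP]/<display length>
TYPE2 = re.compile(
    r"\[2\]:\[(?P<esi>[^\]]*)\]:\[(?P<tag>\d+)\]:\[48\]:\[(?P<mac>[0-9a-f.]+)\]"
    r":\[(?P<iplen>\d+)\]:\[(?P<ip>[^\]]+)\]/\d+"
)

def parse_type2(line: str) -> Type2Route:
    """Parse one type-2 NLRI line, ignoring leading status codes such as '*>l'."""
    m = TYPE2.search(line)
    if m is None:
        raise ValueError(f"not a type-2 route: {line!r}")
    ip_len = int(m["iplen"])
    return Type2Route(
        esi=m["esi"],
        eth_tag=int(m["tag"]),
        mac=m["mac"],
        ip_len=ip_len,
        ip=m["ip"] if ip_len else None,
    )

print(parse_type2("*>l[2]:[0]:[0]:[48]:[0021.969a.0301]:[128]:[fd00:0:0:1::1]/368"))
```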
torsw201a# show bgp l2vpn evpn
BGP routing table information for VRF default, address family L2VPN EVPN
BGP table version is 16191, local router ID is 172.16.2.1
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup

   Network            Next Hop            Metric     LocPrf     Weight Path
Route Distinguisher: 172.16.1.1:32867
*>i[2]:[0]:[0]:[48]:[0021.969a.0301]:[0]:[0.0.0.0]/216
                      198.18.1.11                       100          0 i
*>i[2]:[0]:[0]:[48]:[0021.969a.0301]:[32]:[192.168.1.1]/272
                      198.18.1.11                       100          0 i
*>i[2]:[0]:[0]:[48]:[0021.969a.0301]:[128]:[fd00:0:0:1::1]/368
                      198.18.1.11                       100          0 i
*>i[3]:[0]:[32]:[198.18.1.11]/88
                      198.18.1.11                       100          0 i

Route Distinguisher: 172.16.1.1:33067
*>i[2]:[0]:[0]:[48]:[0021.963d.6e01]:[32]:[192.168.3.1]/272
                      198.18.1.11                       100          0 i
*>i[2]:[0]:[0]:[48]:[0021.963d.6e01]:[128]:[fd00:0:0:3::1]/368
                      198.18.1.11                       100          0 i

Route Distinguisher: 172.16.2.1:32867    (L2VNI 10100)
*>i[2]:[0]:[0]:[48]:[0021.969a.0301]:[0]:[0.0.0.0]/216
                      198.18.1.11                       100          0 i
*>l[2]:[0]:[0]:[48]:[0021.969f.c701]:[0]:[0.0.0.0]/216
                      198.18.1.21                       100      32768 i
*>i[2]:[0]:[0]:[48]:[0021.969a.0301]:[32]:[192.168.1.1]/272
                      198.18.1.11                       100          0 i
*>l[2]:[0]:[0]:[48]:[0021.969f.c701]:[32]:[192.168.1.2]/272
                      198.18.1.21                       100      32768 i
*>i[2]:[0]:[0]:[48]:[0021.969a.0301]:[128]:[fd00:0:0:1::1]/368
                      198.18.1.11                       100          0 i
*>l[2]:[0]:[0]:[48]:[0021.969f.c701]:[128]:[fd00:0:0:1::2]/368
                      198.18.1.21                       100      32768 i
*>i[3]:[0]:[32]:[198.18.1.11]/88
                      198.18.1.11                       100          0 i
*>l[3]:[0]:[32]:[198.18.1.21]/88
                      198.18.1.21                       100      32768 i

Route Distinguisher: 172.16.2.1:32967    (L2VNI 10200)
*>l[2]:[0]:[0]:[48]:[0021.9642.5f01]:[0]:[0.0.0.0]/216
                      198.18.1.21                       100      32768 i
*>l[2]:[0]:[0]:[48]:[0021.9642.5f01]:[32]:[192.168.2.1]/272
                      198.18.1.21                       100      32768 i
*>l[2]:[0]:[0]:[48]:[0021.9642.5f01]:[128]:[fd00:0:0:2::1]/368
                      198.18.1.21                       100      32768 i
*>l[3]:[0]:[32]:[198.18.1.21]/88
                      198.18.1.21                       100      32768 i

Route Distinguisher: 172.16.2.1:3    (L3VNI 50001)
*>i[2]:[0]:[0]:[48]:[0021.963d.6e01]:[32]:[192.168.3.1]/272
                      198.18.1.11                       100          0 i
*>i[2]:[0]:[0]:[48]:[0021.969a.0301]:[32]:[192.168.1.1]/272
                      198.18.1.11                       100          0 i
*>i[2]:[0]:[0]:[48]:[0021.963d.6e01]:[128]:[fd00:0:0:3::1]/368
                      198.18.1.11                       100          0 i
*>i[2]:[0]:[0]:[48]:[0021.969a.0301]:[128]:[fd00:0:0:1::1]/368
                      198.18.1.11                       100          0 i
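The "rd auto" and "route-target ... auto" values in these tables follow a visible pattern:
- L2VNI RDs are router-id:(32767 + VLAN ID), so VLAN 100 gives 32867, VLAN 200 gives 32967, and VLAN 300 gives 33067.
- The L3VNI RD is router-id:VRF ID. VRF001 shows Vrf-ID 3 in the later show nve internal platform output.
- Route targets are ASN:VNI, for example RT:64512:10200.

The Python sketch below reproduces these values. It encodes only the pattern inferred from this lab's output (2-byte ASN, these VLANs), so treat it as an assumption rather than a full specification of NX-OS auto derivation; the function names are hypothetical.

```python
# Patterns inferred from the output above (not an official algorithm):
#   L2VNI RD : <router-id>:<32767 + VLAN ID>
#   L3VNI RD : <router-id>:<VRF ID>
#   RT       : <ASN>:<VNI>
L2_RD_BASE = 32767

def l2vni_auto_rd(router_id: str, vlan_id: int) -> str:
    return f"{router_id}:{L2_RD_BASE + vlan_id}"

def l3vni_auto_rd(router_id: str, vrf_id: int) -> str:
    return f"{router_id}:{vrf_id}"

def auto_rt(asn: int, vni: int) -> str:
    return f"{asn}:{vni}"

print(l2vni_auto_rd("172.16.1.1", 100))  # RD of L2VNI 10100 on torsw101a
print(l3vni_auto_rd("172.16.2.1", 3))    # RD of L3VNI 50001 on torsw201a
print(auto_rt(64512, 50001))             # RT attached to L3VNI routes
```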

Drilling down into a specific route shows details like the following.
For an explanation of the output fields, see the lower part of the official Cisco document Cisco Programmable Fabric with VXLAN BGP EVPN Configuration Guide / Chapter: Unicast Forwarding.

This is the entry for node22 (behind torsw201a) as seen from torsw101a.

torsw101a# show bgp l2vpn evpn fd00:0:0:2::1
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 172.16.2.1:32967
BGP routing table entry for [2]:[0]:[0]:[48]:[0021.9642.5f01]:[128]:[fd00:0:0:2::1]/368, version 15909
Paths: (1 available, best #1)
Flags: (0x000202) on xmit-list, is not in l2rib/evpn, is not in HW, is locked

  Advertised path-id 1
  Path type: internal, path is valid, is best path, no labeled nexthop
  AS-Path: NONE, path sourced internal to AS
    198.18.1.21 (metric 81) from 172.31.0.1 (172.31.0.1)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10200 50001
      Extcommunity:  RT:64512:10200 RT:64512:50001 ENCAP:8 Router MAC:0021.9643.7607
      Originator: 172.16.2.1 Cluster list: 172.31.0.1

  Path-id 1 not advertised to any peer

Route Distinguisher: 172.16.1.1:3    (L3VNI 50001)
BGP routing table entry for [2]:[0]:[0]:[48]:[0021.9642.5f01]:[128]:[fd00:0:0:2::1]/368, version 15912
Paths: (1 available, best #1)
Flags: (0x000202) on xmit-list, is not in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: internal, path is valid, is best path, no labeled nexthop
             Imported from 172.16.2.1:32967:[2]:[0]:[0]:[48]:[0021.9642.5f01]:[128]:[fd00:0:0:2::1]/240
  AS-Path: NONE, path sourced internal to AS
    198.18.1.21 (metric 81) from 172.31.0.1 (172.31.0.1)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10200 50001
      Extcommunity:  RT:64512:10200 RT:64512:50001 ENCAP:8 Router MAC:0021.9643.7607
      Originator: 172.16.2.1 Cluster list: 172.31.0.1

  Path-id 1 not advertised to any peer

VRF routing table (IPv6)

torsw101a# show ipv6 route vrf VRF001
IPv6 Routing Table for VRF "VRF001"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]

fd00:0:0:1::/64, ubest/mbest: 1/0, attached
    *via fd00:0:0:1::fe, Vlan100, [0/0], 01:27:22, direct,
fd00:0:0:1::1/128, ubest/mbest: 1/0, attached
    *via fd00:0:0:1::1, Vlan100, [190/0], 01:24:44, hmm
fd00:0:0:1::2/128, ubest/mbest: 1/0
    *via ::ffff:198.18.1.21%default:IPv4, [200/0], 01:14:49, bgp-64512, internal, tag 64512 (evpn) segid 50001 tunnel: 0xc6120115 encap: VXLAN

fd00:0:0:1::fe/128, ubest/mbest: 1/0, attached
    *via fd00:0:0:1::fe, Vlan100, [0/0], 01:27:22, local
fd00:0:0:2::1/128, ubest/mbest: 1/0
    *via ::ffff:198.18.1.21%default:IPv4, [200/0], 01:14:49, bgp-64512, internal, tag 64512 (evpn) segid 50001 tunnel: 0xc6120115 encap: VXLAN

fd00:0:0:3::/64, ubest/mbest: 1/0, attached
    *via fd00:0:0:3::fe, Vlan300, [0/0], 01:26:13, direct,
fd00:0:0:3::1/128, ubest/mbest: 1/0, attached
    *via fd00:0:0:3::1, Vlan300, [190/0], 01:24:45, hmm
fd00:0:0:3::fe/128, ubest/mbest: 1/0, attached
    *via fd00:0:0:3::fe, Vlan300, [0/0], 01:26:13, local
torsw201a# show ipv6 route vrf VRF001
IPv6 Routing Table for VRF "VRF001"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]

fd00:0:0:1::/64, ubest/mbest: 1/0, attached
    *via fd00:0:0:1::fe, Vlan100, [0/0], 01:27:51, direct,
fd00:0:0:1::1/128, ubest/mbest: 1/0
    *via ::ffff:198.18.1.11%default:IPv4, [200/0], 01:15:31, bgp-64512, internal, tag 64512 (evpn) segid 50001 tunnel: 0xc612010b encap: VXLAN

fd00:0:0:1::2/128, ubest/mbest: 1/0, attached
    *via fd00:0:0:1::2, Vlan100, [190/0], 01:17:26, hmm
fd00:0:0:1::fe/128, ubest/mbest: 1/0, attached
    *via fd00:0:0:1::fe, Vlan100, [0/0], 01:27:51, local
fd00:0:0:2::/64, ubest/mbest: 1/0, attached
    *via fd00:0:0:2::fe, Vlan200, [0/0], 01:26:04, direct,
fd00:0:0:2::1/128, ubest/mbest: 1/0, attached
    *via fd00:0:0:2::1, Vlan200, [190/0], 01:17:27, hmm
fd00:0:0:2::fe/128, ubest/mbest: 1/0, attached
    *via fd00:0:0:2::fe, Vlan200, [0/0], 01:26:04, local
fd00:0:0:3::1/128, ubest/mbest: 1/0
    *via ::ffff:198.18.1.11%default:IPv4, [200/0], 01:15:31, bgp-64512, internal, tag 64512 (evpn) segid 50001 tunnel: 0xc612010b encap: VXLAN
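The remote IPv6 host routes above are installed with a next hop such as ::ffff:198.18.1.21%default:IPv4. This is an IPv4-mapped IPv6 address that resolves in the default VRF. The embedded IPv4 address is the remote leaf's VTEP loopback1 (198.18.1.21 on torsw201a, 198.18.1.11 on torsw101a), which shows the IPv6 overlay riding on the IPv4 underlay. The Python sketch below pulls that VTEP address out. Splitting off the "%..." suffix is my own handling of the display format, and vtep_of is a hypothetical helper.

```python
import ipaddress

def vtep_of(next_hop: str) -> ipaddress.IPv4Address:
    """Extract the underlay VTEP from a '::ffff:a.b.c.d%vrf:IPv4' style next hop."""
    addr = next_hop.split("%", 1)[0]  # drop the '%default:IPv4' suffix
    mapped = ipaddress.IPv6Address(addr).ipv4_mapped
    if mapped is None:
        raise ValueError(f"not an IPv4-mapped address: {addr}")
    return mapped

print(vtep_of("::ffff:198.18.1.21%default:IPv4"))  # seen on torsw101a
print(vtep_of("::ffff:198.18.1.11%default:IPv4"))  # seen on torsw201a
```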

VRF ND table

torsw101a# show ipv6 neighbor vrf VRF001

Flags: # - Adjacencies Throttled for Glean
       G - Adjacencies of vPC peer with G/W bit
       R - Adjacencies learnt remotely
       CP - Added via L2RIB, Control plane Adjacencies
       PS - Added via L2RIB, Peer Sync
       RO - Dervied from L2RIB Peer Sync Entry

IPv6 Adjacency Table for VRF VRF001
Total number of entries: 4
Address         Age       MAC Address     Pref Source     Interface
fd00:0:0:3::1   07:50:57  0021.963d.6e01  50   icmpv6     Vlan300
fe80::221:96ff:fe3d:6e01
                07:50:52  0021.963d.6e01  50   icmpv6     Vlan300
fd00:0:0:1::1   07:50:56  0021.969a.0301  50   icmpv6     Vlan100
fe80::221:96ff:fe9a:301
                07:50:57  0021.969a.0301  50   icmpv6     Vlan100
torsw201a# show ipv6 neighbor vrf VRF001

Flags: # - Adjacencies Throttled for Glean
       G - Adjacencies of vPC peer with G/W bit
       R - Adjacencies learnt remotely
       CP - Added via L2RIB, Control plane Adjacencies
       PS - Added via L2RIB, Peer Sync
       RO - Dervied from L2RIB Peer Sync Entry

IPv6 Adjacency Table for VRF VRF001
Total number of entries: 4
Address         Age       MAC Address     Pref Source     Interface
fd00:0:0:1::2   07:43:00  0021.969f.c701  50   icmpv6     Vlan100
fe80::221:96ff:fe9f:c701
                07:43:01  0021.969f.c701  50   icmpv6     Vlan100
fd00:0:0:2::1   07:43:01  0021.9642.5f01  50   icmpv6     Vlan200
fe80::221:96ff:fe42:5f01
                07:42:56  0021.9642.5f01  50   icmpv6     Vlan200

Wrap-up

All this shows is that, on the overlay side, it makes no real difference whether the endpoints use IPv4 or IPv6.

VxLAN+EVPN on Nexus9000v (anycast gateway edition)

Introduction

What this article covers / overview diagrams

I previously tried something similar with Juniper vQFX; this is the Nexus9000v version.
Because the implementations differ, it is not exactly the same.

The topology looks like this:

f:id:kakkotetsu:20170917231256p:plain

The key EVPN behavior works like this:

f:id:kakkotetsu:20170911235700p:plain

And this is roughly how it looks from the tenant's point of view:

f:id:kakkotetsu:20170911235733p:plain

References

Building Data Centers with VXLAN BGP EVPN: A Cisco NX-OS Perspective (Networking Technology)

Environment

The KVM host and GNS3 versions are as follows (Ubuntu and GNS3 have been upgraded since the previous article, so I'm listing them just in case):

$ uname -a
Linux kvm01 4.4.0-93-generic #116-Ubuntu SMP Fri Aug 11 21:17:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.3 LTS"

$ virsh -v
1.3.1

$ qemu-system-x86_64 --version
QEMU emulator version 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.14), Copyright (c) 2003-2008 Fabrice Bellard

$ gns3 --version
2.0.3

  • Nexus9000v: nxosv-final.7.0.3.I6.1.qcow2, the latest image available for download as of 2017/08/16
  • OVMF: apparently a build from around 2016/08/13 (I simply used the one that came down from the MARKETPLACE)

Build

Deploying Nexus9000v

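Deploy the switches by clicking through, as in the previous article.
This time, every switch's memory is set to 4096MB, aiming at the minimum requirement. (A Memory Usage Warning-type message did show up in syslog, though...)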

Likewise, set up a few nodes for connectivity testing.

f:id:kakkotetsu:20170912000052p:plain

Nexus9000v physical interface configuration

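First, the links between the Nexus switches. They carry VxLAN, so the MTU is set large.
Also, a characteristic of NX-OS: the configuration commands for a feature don't even appear as completion candidates until it is enabled with feature <name>, so first enable every feature you plan to use with the feature command.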

  • spine001
feature lldp

interface Ethernet1/1
  description DEV=torsw101a IF=Eth1/8
  no switchport
  mtu 9216
  ip address 192.0.2.0/31
  no shutdown

interface Ethernet1/2
  description DEV=torsw201a IF=Eth1/8
  no switchport
  mtu 9216
  ip address 192.0.2.2/31
  no shutdown
  • torsw101a
feature lldp

interface Ethernet1/8
  description DEV=spine001 IF=Eth1/1
  no switchport
  mtu 9216
  ip address 192.0.2.1/31
  no shutdown
  • torsw201a
feature lldp

interface Ethernet1/8
  description DEV=spine001 IF=Eth1/2
  no switchport
  mtu 9216
  ip address 192.0.2.3/31
  no shutdown
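To make "set the MTU large" concrete, here is the usual VxLAN encapsulation arithmetic as a small Python sketch. It assumes an IPv4 underlay and an inner frame without an 802.1Q tag. It also treats the underlay "mtu 9216" as an IP MTU, which is my assumption about how NX-OS applies it on routed ports. With the nodes' 1500-byte MTU, each encapsulated packet is 50 bytes larger than the tenant packet.

```python
# Header sizes in bytes (IPv4 underlay, no 802.1Q tag on the inner frame)
INNER_ETHERNET = 14  # tenant frame's Ethernet header, carried inside VxLAN
VXLAN_HEADER = 8
OUTER_UDP = 8
OUTER_IPV4 = 20
OVERHEAD = INNER_ETHERNET + VXLAN_HEADER + OUTER_UDP + OUTER_IPV4  # 50 bytes

def outer_packet_size(tenant_ip_packet: int) -> int:
    """Size of the outer IPv4 packet that carries one tenant IP packet."""
    return tenant_ip_packet + OVERHEAD

def max_tenant_ip_mtu(underlay_ip_mtu: int) -> int:
    """Largest tenant IP packet that fits without fragmenting in the underlay."""
    return underlay_ip_mtu - OVERHEAD

print(outer_packet_size(1500))  # the nodes use MTU 1500 in this lab
print(max_tenant_ip_mtu(9216))  # the spine/leaf links use mtu 9216
```

The overhead depends only on the outer headers, so it is the same whether the tenant packet is IPv4 or IPv6.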

Nexus9000v underlay configuration

Based on the official Cisco document Cisco Programmable Fabric with VXLAN BGP EVPN Configuration Guide / Chapter: IP Fabric Underlay.

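IP unnumbered reportedly works as well, but the command ip unnumbered loopback0 was rejected, so as in the previous section I simply assigned ordinary IP addresses. (I haven't checked whether some feature has to be enabled first or whether this is a limitation of the virtual version.)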

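IS-IS and eBGP can also be used as the underlay protocol, but the guide says "OSPF and IS-IS are what we have properly tested", so I went with OSPF without much further thought.
In earlier articles (vQFX and Cumulus Linux) I mostly used eBGP. Another reason for the change is that mixing underlay and overlay into BGP within the NX-OS configuration structure didn't feel easy to follow (to me, at least).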

f:id:kakkotetsu:20170912000111p:plain

Configuration

  • spine001
feature ospf

interface loopback0
  ip address 172.31.0.1/32

router ospf OSPF_UNDERLAY
  router-id 172.31.0.1

interface Ethernet1/1
  ip ospf network point-to-point
  ip router ospf OSPF_UNDERLAY area 0

interface Ethernet1/2
  ip ospf network point-to-point
  ip router ospf OSPF_UNDERLAY area 0

interface loopback0
  ip router ospf OSPF_UNDERLAY area 0.0.0.0
  • torsw101a
feature ospf

interface loopback0
  ip address 172.16.1.1/32

router ospf OSPF_UNDERLAY
  router-id 172.16.1.1

interface Ethernet1/8
  ip ospf network point-to-point
  ip router ospf OSPF_UNDERLAY area 0

interface loopback0
  ip router ospf OSPF_UNDERLAY area 0
  • torsw201a
feature ospf

interface loopback0
  ip address 172.16.2.1/32

router ospf OSPF_UNDERLAY
  router-id 172.16.2.1

interface Ethernet1/8
  ip ospf network point-to-point
  ip router ospf OSPF_UNDERLAY area 0

interface loopback0
  ip router ospf OSPF_UNDERLAY area 0

Everything is in area 0 here; in a large-scale environment, splitting the network into multiple areas is probably worth considering.

Quick check

A quick look at one switch.

spine001# show ip route
IP Route Table for VRF "default"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

172.16.1.1/32, ubest/mbest: 1/0
    *via 192.0.2.1, Eth1/1, [110/41], 00:12:23, ospf-OSPF_UNDERLAY, intra
172.16.2.1/32, ubest/mbest: 1/0
    *via 192.0.2.3, Eth1/2, [110/41], 00:03:39, ospf-OSPF_UNDERLAY, intra
172.31.0.1/32, ubest/mbest: 2/0, attached
    *via 172.31.0.1, Lo0, [0/0], 00:35:13, local
    *via 172.31.0.1, Lo0, [0/0], 00:35:13, direct
192.0.2.0/31, ubest/mbest: 1/0, attached
    *via 192.0.2.0, Eth1/1, [0/0], 00:22:47, direct
192.0.2.0/32, ubest/mbest: 1/0, attached
    *via 192.0.2.0, Eth1/1, [0/0], 00:22:47, local
192.0.2.2/31, ubest/mbest: 1/0, attached
    *via 192.0.2.2, Eth1/2, [0/0], 00:22:37, direct
192.0.2.2/32, ubest/mbest: 1/0, attached
    *via 192.0.2.2, Eth1/2, [0/0], 00:22:37, local


spine001# show ip ospf neighbors
 OSPF Process ID OSPF_UNDERLAY VRF default
 Total number of neighbors: 2
 Neighbor ID     Pri State            Up Time  Address         Interface
 172.16.1.1        1 FULL/ -          00:21:16 192.0.2.1       Eth1/1
 172.16.2.1        1 FULL/ -          00:12:01 192.0.2.3       Eth1/2
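When checking several switches, scripting this check helps. The Python sketch below parses the show ip ospf neighbors table exactly as printed above and returns the neighbors in FULL state with their interfaces. The regex is tailored to this output's column layout (state shown as "FULL/ -" on point-to-point links); full_neighbors is a hypothetical helper, and the parser has not been checked against other NX-OS versions.

```python
import re
from typing import Dict

OUTPUT = """\
 OSPF Process ID OSPF_UNDERLAY VRF default
 Total number of neighbors: 2
 Neighbor ID     Pri State            Up Time  Address         Interface
 172.16.1.1        1 FULL/ -          00:21:16 192.0.2.1       Eth1/1
 172.16.2.1        1 FULL/ -          00:12:01 192.0.2.3       Eth1/2
"""

# Neighbor ID, priority, state/role, uptime, address, interface
ROW = re.compile(
    r"^\s*(?P<nbr>\d+(?:\.\d+){3})\s+(?P<pri>\d+)\s+(?P<state>[A-Z]+)/\s*(?P<role>\S+)\s+"
    r"(?P<uptime>\S+)\s+(?P<addr>\d+(?:\.\d+){3})\s+(?P<intf>\S+)\s*$"
)

def full_neighbors(text: str) -> Dict[str, str]:
    """Map neighbor router ID -> interface for every neighbor in FULL state."""
    return {
        m["nbr"]: m["intf"]
        for line in text.splitlines()
        if (m := ROW.match(line)) and m["state"] == "FULL"
    }

print(full_neighbors(OUTPUT))
```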

Nexus9000v overlay configuration

Based on the official Cisco document Cisco Programmable Fabric with VXLAN BGP EVPN Configuration Guide / Chapter: Forwarding Configurations / Cisco Nexus 9000 Series switch configuration.

f:id:kakkotetsu:20170912000126p:plain

  • spine001
feature bgp
nv overlay evpn

router bgp 64512
  neighbor 172.16.1.1
    remote-as 64512
    update-source loopback0
    address-family l2vpn evpn
      send-community
      send-community extended
      route-reflector-client
  neighbor 172.16.2.1
    remote-as 64512
    update-source loopback0
    address-family l2vpn evpn
      send-community
      send-community extended
      route-reflector-client
  • torsw101a
feature bgp
nv overlay evpn

router bgp 64512
  neighbor 172.31.0.1
    remote-as 64512
    update-source loopback0
    address-family l2vpn evpn
      send-community
      send-community extended
  • torsw201a
feature bgp
nv overlay evpn

router bgp 64512
  neighbor 172.31.0.1
    remote-as 64512
    update-source loopback0
    address-family l2vpn evpn
      send-community
      send-community extended

Nexus9000v VxLAN + EVPN configuration

Here is the configuration.
Key points are briefly noted in ! comments (torsw101a and torsw201a are nearly identical, so the comments are on the torsw101a side only).

spine001 acts as the MP-BGP route reflector in the control plane, and that part is already configured. In the data plane it is just a pipe for VxLAN traffic, so it needs no additional configuration in this section.

  • torsw101a
! Virtual MAC address for the anycast gateway
! Global setting, used by every SVI configured with fabric forwarding mode anycast-gateway
fabric forwarding anycast-gateway-mac 20:20:00:00:00:AA

feature vn-segment-vlan-based
! Mapping between L2 VNI and VLAN ID
vlan 100
 vn-segment 10100
vlan 300
 vn-segment 10300

! Loopback used as the source/destination IP of VxLAN traffic between VTEPs (loopback0, configured earlier, is for the iBGP session used for EVPN signaling)
interface loopback1
  ip address 198.18.1.11/32
  ip router ospf OSPF_UNDERLAY area 0.0.0.0

feature nv overlay
! Network Virtualization Endpoint (the VTEP's device? interface? I can't think of the right word...)
interface nve1
  source-interface loopback1
  host-reachability protocol bgp

! VRF for the tenant
! By the way, mgmt belongs to the management VRF by default
vrf context VRF001
  ! L3VNI dedicated to this VRF
  vni 50001
  rd auto
  address-family ipv4 unicast
    route-target both auto
    route-target both auto evpn

interface nve1
  ! Ranges can be given with - or ,
  ! There is an upper limit, though: trying to cover all 4094 VLANs as 10001-14094 gave an error
  member vni 10001-10300
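    ! BUM traffic between VTEPs is handled with ingress replication, i.e., sent as unicast
    ! Multicast is also an option, but one of the reasons I'm investigating and testing this in the first place is that I don't want to use multicast routing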
    ingress-replication protocol bgp
  member vni 50001 associate-vrf
  no shutdown

router bgp 64512
  vrf VRF001
    address-family ipv4 unicast
      advertise l2vpn evpn

! Tenant SVIs
! SVI 200 is only needed on torsw201a, and SVI 300 only on torsw101a
feature interface-vlan
interface Vlan100
  no shutdown
  vrf member VRF001
  no ip redirects
  ip address 192.168.1.254/24
  ! Use the anycast gateway on this segment
  fabric forwarding mode anycast-gateway
interface Vlan300
  no shutdown
  vrf member VRF001
  no ip redirects
  ip address 192.168.3.254/24
  fabric forwarding mode anycast-gateway

! Physical interfaces that host the tenant end nodes
! Trunk VLANs also work
interface Ethernet1/1
  switchport access vlan 100
  description DEV=node11 IF=ens4
interface Ethernet1/2
  switchport access vlan 300
  description DEV=node13 IF=ens4

evpn
  vni 10100 l2
    rd auto
    route-target import auto
    route-target export auto
  vni 10300 l2
    rd auto
    route-target import auto
    route-target export auto

! Each tenant VRF's L3 VNI needs its own VLAN and SVI...
! Also, not all 4094 VLANs are usable here; it only went up to the 39XX range
vlan 3901
  vn-segment 50001

interface Vlan3901
  no shutdown
  vrf member VRF001
  ip forward
  • torsw201a
fabric forwarding anycast-gateway-mac 20:20:00:00:00:AA

feature vn-segment-vlan-based
vlan 100
 vn-segment 10100
vlan 200
 vn-segment 10200

interface loopback1
  ip address 198.18.1.21/32
  ip router ospf OSPF_UNDERLAY area 0.0.0.0

feature nv overlay
interface nve1
  source-interface loopback1
  host-reachability protocol bgp

vrf context VRF001
  vni 50001
  rd auto
  address-family ipv4 unicast
    route-target both auto
    route-target both auto evpn

interface nve1
  member vni 10001-10300
    ingress-replication protocol bgp
  member vni 50001 associate-vrf
  no shutdown

router bgp 64512
  vrf VRF001
    address-family ipv4 unicast
      advertise l2vpn evpn

feature interface-vlan
interface Vlan100
  no shutdown
  vrf member VRF001
  no ip redirects
  ip address 192.168.1.254/24
  fabric forwarding mode anycast-gateway
interface Vlan200
  no shutdown
  vrf member VRF001
  no ip redirects
  ip address 192.168.2.254/24
  fabric forwarding mode anycast-gateway

interface Ethernet1/1
  switchport access vlan 100
  description DEV=node21 IF=ens4

interface Ethernet1/2
  switchport access vlan 200
  description DEV=node22 IF=ens4

evpn
  vni 10100 l2
    rd auto
    route-target import auto
    route-target export auto
  vni 10200 l2
    rd auto
    route-target import auto
    route-target export auto

vlan 3901
  vn-segment 50001

interface Vlan3901
  no shutdown
  vrf member VRF001
  ip forward
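Note that the anycast gateway MAC is entered above as 20:20:00:00:00:AA, while the show run output at the top of this post displays it as 2020.0000.00aa. These are the same address in two notations. It is shared by every leaf, unlike the per-switch Router MAC (e.g. 0021.9643.7607) seen in the EVPN routes. The Python sketch below normalizes either notation to the dotted form so the two can be compared; to_dotted is a hypothetical helper.

```python
HEX_DIGITS = set("0123456789abcdef")

def to_dotted(mac: str) -> str:
    """Normalize colon, hyphen, or dotted MAC notation to xxxx.xxxx.xxxx."""
    digits = "".join(c for c in mac.lower() if c in HEX_DIGITS)
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC: {mac!r}")
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))

configured = to_dotted("20:20:00:00:00:AA")  # as typed in the config
displayed = to_dotted("2020.0000.00aa")      # as shown by show run
print(configured, configured == displayed)
```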

Verification

Checking the Nexus9000v tables

All 4 nodes can now communicate with each other, so let's look at the Nexus9000v tables after the connectivity test.

Peer state between VTEPs / local NVE state

torsw101a# show nve peers detail
Details of nve Peers:
----------------------------------------
Peer-Ip: 198.18.1.21
    NVE Interface       : nve1
    Peer State          : Up
    Peer Uptime         : 05:40:49
    Router-Mac          : 0021.9643.7607
    Peer First VNI      : 50001
    Time since Create   : 05:40:50
    Configured VNIs     : 10001-10300,50001
    Provision State     : add-complete
    Route-Update        : Yes
    Peer Flags          : RmacL2Rib, TunnelPD, DisableLearn
    Learnt CP VNIs      : 10100,50001
    Peer-ifindex-resp   : Yes
----------------------------------------


torsw101a# show nve internal platform interface nve 1 detail
Printing Interface ifindex 0x49000001 detail
|======|=========================|===============|===============|=====|=====|
|Intf  |State                    |PriIP          |SecIP          |Vnis |Peers|
|======|=========================|===============|===============|=====|=====|
|nve1  |UP                       |198.18.1.11    |0.0.0.0        |3    |1    |
|======|=========================|===============|===============|=====|=====|

SW_BD/VNIs of interface nve1:
================================================
|======|======|=========================|======|====|======|========
|Sw BD |Vni   |State                    |Intf  |Type|Vrf-ID|Notified
|======|======|=========================|======|====|======|========
|100   |10100 |UP                       |nve1  |CP  |0     |Yes
|300   |10300 |UP                       |nve1  |CP  |0     |Yes
|3901  |50001 |UP                       |nve1  |CP  |3     |Yes
|======|======|=========================|======|====|======|========

Peers of interface nve1:
============================================

Peer_ip: 198.18.1.21
  Peer-ID   : 1
  State     : UP
  Learning  : Disabled
  TunnelID  : 0xc6120115
  MAC       : 0021.9643.7607
  Table-ID  : 0x1
  Encap     : 0x1
torsw201a# show nve peers detail
Details of nve Peers:
----------------------------------------
Peer-Ip: 198.18.1.11
    NVE Interface       : nve1
    Peer State          : Up
    Peer Uptime         : 05:40:33
    Router-Mac          : 0021.960f.f307
    Peer First VNI      : 10100
    Time since Create   : 05:40:33
    Configured VNIs     : 10001-10300,50001
    Provision State     : add-complete
    Route-Update        : Yes
    Peer Flags          : RmacL2Rib, TunnelPD, DisableLearn
    Learnt CP VNIs      : 10100,50001
    Peer-ifindex-resp   : Yes
----------------------------------------


torsw201a# show nve internal platform interface nve 1 detail
Printing Interface ifindex 0x49000001 detail
|======|=========================|===============|===============|=====|=====|
|Intf  |State                    |PriIP          |SecIP          |Vnis |Peers|
|======|=========================|===============|===============|=====|=====|
|nve1  |UP                       |198.18.1.21    |0.0.0.0        |3    |1    |
|======|=========================|===============|===============|=====|=====|

SW_BD/VNIs of interface nve1:
================================================
|======|======|=========================|======|====|======|========
|Sw BD |Vni   |State                    |Intf  |Type|Vrf-ID|Notified
|======|======|=========================|======|====|======|========
|100   |10100 |UP                       |nve1  |CP  |0     |Yes
|200   |10200 |UP                       |nve1  |CP  |0     |Yes
|3901  |50001 |UP                       |nve1  |CP  |3     |Yes
|======|======|=========================|======|====|======|========

Peers of interface nve1:
============================================

Peer_ip: 198.18.1.11
  Peer-ID   : 1
  State     : UP
  Learning  : Disabled
  TunnelID  : 0xc612010b
  MAC       : 0021.960f.f307
  Table-ID  : 0x1
  Encap     : 0x1
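
The TunnelID values here look like the peer VTEP's IPv4 address written as a 32-bit hex number:

- 0xc6120115 on torsw101a reads as 198.18.1.21
- 0xc612010b on torsw201a reads as 198.18.1.11

The same IDs also appear as "tunnelid" in the VRF routing table. This is only an observation from these outputs, not documented behavior. The small helper below just does the conversion so it can be checked.

```python
import ipaddress


def tunnel_id_to_ipv4(tunnel_id: str) -> str:
    """Interpret an NX-OS TunnelID such as '0xc6120115' as a big-endian IPv4 address.

    Assumption: the TunnelID equals the remote VTEP's IPv4 address. This held for
    the two peers in this lab but is not documented behavior.
    """
    value = int(tunnel_id, 16)  # int() accepts the "0x" prefix when base is 16
    return str(ipaddress.IPv4Address(value))


if __name__ == "__main__":
    for tid in ("0xc6120115", "0xc612010b"):
        print(tid, "->", tunnel_id_to_ipv4(tid))
```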

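Also, when the configuration looked finished but something still wasn't working as expected, I used the following command.
The example output below is from a case where the SVI and VLAN for the L3 VNI were invalid (VLAN 3901 from the configuration above had not been configured). There are also fairly self-explanatory results such as mcast-group-or-ingress-rep-not-cfg.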

torsw101a# show nve internal vni 50001

VNI 50001
  Ready-State         : Not Ready [invalid sw-bd]

MP-BGP neighbor information for EVPN signaling

torsw101a# show bgp l2vpn evpn neighbors
BGP neighbor is 172.31.0.1, remote AS 64512, ibgp link, Peer index 3
  BGP version 4, remote router ID 172.31.0.1
  BGP state = Established, up for 1d06h
  Using loopback0 as update source for this peer
  Last read 00:00:14, hold time = 180, keepalive interval is 60 seconds
  Last written 00:00:48, keepalive timer expiry due 00:00:11
  Received 2911 messages, 0 notifications, 0 bytes in queue
  Sent 3216 messages, 0 notifications, 0 bytes in queue
  Connections established 1, dropped 0
  Last reset by us never, due to No error
  Last reset by peer never, due to No error

  Neighbor capabilities:
  Dynamic capability: advertised (mp, refresh, gr) received (mp, refresh, gr)
  Dynamic capability (old): advertised received
  Route refresh capability (new): advertised received
  Route refresh capability (old): advertised received
  4-Byte AS capability: advertised received
  Address family L2VPN EVPN: advertised received
  Graceful Restart capability: advertised received

  Graceful Restart Parameters:
  Address families advertised to peer:
    L2VPN EVPN
  Address families received from peer:
    L2VPN EVPN
  Forwarding state preserved by peer for:
  Restart time advertised to peer: 120 seconds
  Stale time for routes advertised by peer: 300 seconds
  Restart time advertised by peer: 120 seconds
  Extended Next Hop Encoding Capability: advertised received
  Receive IPv6 next hop encoding Capability for AF:
    IPv4 Unicast

  Message statistics:
                              Sent               Rcvd
  Opens:                         1                  1
  Notifications:                 0                  0
  Updates:                    1443               1449
  Keepalives:                 1771               1460
  Route Refresh:                 0                  0
  Capability:                    1                  1
  Total:                      3216               2911
  Total bytes:              155694             160466
  Bytes in queue:                0                  0

  For address family: L2VPN EVPN
  BGP table version 2897, neighbor version 2897
  4 accepted paths consume 512 bytes of memory
  6 sent paths
  Community attribute sent to this neighbor
  Extended community attribute sent to this neighbor
  Third-party Nexthop will not be computed.
  Last End-of-RIB received 00:00:01 after session start

  Local host: 172.16.1.1, Local port: 19618
  Foreign host: 172.31.0.1, Foreign port: 179
  fd = 76

Learned EVPN routes

The output format is similar to Cumulus Linux's.

torsw101a# show bgp l2vpn evpn
BGP routing table information for VRF default, address family L2VPN EVPN
BGP table version is 2893, local router ID is 172.16.1.1
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup

   Network            Next Hop            Metric     LocPrf     Weight Path
Route Distinguisher: 172.16.1.1:32867    (L2VNI 10100)
*>l[2]:[0]:[0]:[48]:[0021.969a.0301]:[0]:[0.0.0.0]/216
                      198.18.1.11                       100      32768 i
*>i[2]:[0]:[0]:[48]:[0021.969f.c701]:[0]:[0.0.0.0]/216
                      198.18.1.21                       100          0 i
*>l[2]:[0]:[0]:[48]:[0021.969a.0301]:[32]:[192.168.1.1]/272
                      198.18.1.11                       100      32768 i
*>i[2]:[0]:[0]:[48]:[0021.969f.c701]:[32]:[192.168.1.2]/272
                      198.18.1.21                       100          0 i
*>l[3]:[0]:[32]:[198.18.1.11]/88
                      198.18.1.11                       100      32768 i
*>i[3]:[0]:[32]:[198.18.1.21]/88
                      198.18.1.21                       100          0 i

Route Distinguisher: 172.16.1.1:33067    (L2VNI 10300)
*>l[2]:[0]:[0]:[48]:[0021.963d.6e01]:[0]:[0.0.0.0]/216
                      198.18.1.11                       100      32768 i
*>l[2]:[0]:[0]:[48]:[0021.963d.6e01]:[32]:[192.168.3.1]/272
                      198.18.1.11                       100      32768 i
*>l[3]:[0]:[32]:[198.18.1.11]/88
                      198.18.1.11                       100      32768 i

Route Distinguisher: 172.16.2.1:32867
*>i[2]:[0]:[0]:[48]:[0021.969f.c701]:[0]:[0.0.0.0]/216
                      198.18.1.21                       100          0 i
*>i[2]:[0]:[0]:[48]:[0021.969f.c701]:[32]:[192.168.1.2]/272
                      198.18.1.21                       100          0 i
*>i[3]:[0]:[32]:[198.18.1.21]/88
                      198.18.1.21                       100          0 i

Route Distinguisher: 172.16.2.1:32967
*>i[2]:[0]:[0]:[48]:[0021.9642.5f01]:[32]:[192.168.2.1]/272
                      198.18.1.21                       100          0 i

Route Distinguisher: 172.16.1.1:3    (L3VNI 50001)
*>i[2]:[0]:[0]:[48]:[0021.9642.5f01]:[32]:[192.168.2.1]/272
                      198.18.1.21                       100          0 i
*>i[2]:[0]:[0]:[48]:[0021.969f.c701]:[32]:[192.168.1.2]/272
                      198.18.1.21                       100          0 i
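
The bracketed prefixes in this table pack several EVPN fields into one string. The parser below is a minimal sketch. It assumes NX-OS prints routes in these layouts:

- Type 2: [2]:[ESI]:[Ethernet Tag]:[MAC length]:[MAC]:[IP length]:[IP]
- Type 3: [3]:[Ethernet Tag]:[IP length]:[originator IP]

The trailing /216, /272 and /88 are kept only as an opaque display length. parse_evpn_prefix is a hypothetical helper, not an NX-OS API.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Whole prefix, e.g. "[2]:[0]:[0]:[48]:[0021.969a.0301]:[32]:[192.168.1.1]/272"
_PREFIX_RE = re.compile(r"^((?:\[[^\]]*\]:)*\[[^\]]*\])/(\d+)$")
_FIELD_RE = re.compile(r"\[([^\]]*)\]")


@dataclass
class EvpnRoute:
    route_type: int
    mac: Optional[str] = None
    ip: Optional[str] = None   # host IP (Type 2) or originating VTEP IP (Type 3)
    display_len: int = 0       # the "/NNN" shown by NX-OS, not interpreted here


def parse_evpn_prefix(prefix: str) -> EvpnRoute:
    m = _PREFIX_RE.match(prefix)
    if not m:
        raise ValueError(f"unrecognised EVPN prefix: {prefix!r}")
    fields = _FIELD_RE.findall(m.group(1))
    route = EvpnRoute(route_type=int(fields[0]), display_len=int(m.group(2)))
    if route.route_type == 2:
        # [2]:[ESI]:[EthTag]:[MAC len]:[MAC]:[IP len]:[IP]
        route.mac = fields[4]
        if int(fields[5]) > 0:  # IP length 0 means a MAC-only advertisement
            route.ip = fields[6]
    elif route.route_type == 3:
        # [3]:[EthTag]:[IP len]:[originating router IP]
        route.ip = fields[3]
    return route


if __name__ == "__main__":
    line = "*>l[2]:[0]:[0]:[48]:[0021.969a.0301]:[32]:[192.168.1.1]/272"
    # In this output the first three characters are status/path-type flags.
    print(parse_evpn_prefix(line[3:]))
```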

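MAC address table

Near the bottom of Cisco official / NX-OSv 9000 Guide / NX-OSv 9000 Software Functionality there is a section called NX-OSv 9000 Feature UI/CLI Difference From Hardware Platform. It says to use this command instead of show mac addr and the like, so that's what I use on the virtual version.

The anycast gateway (20:20:00:00:00:aa) entry on the last line is displayed oddly; let's just forgive that.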

torsw101a# show system internal l2fwder mac
Legend:
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link,
        (T) - True, (F) - False, C - ControlPlane MAC
   VLAN     MAC Address      Type      age     Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
G  3901    0021.960f.f307    static   -          F     F   sup-eth1(R)
G   100    0021.960f.f307    static   -          F     F   sup-eth1(R)
*   100    0021.969f.c701    static   -          F     F  (0x47000001) nve-peer1 198.18.
G   300    0021.960f.f307    static   -          F     F   sup-eth1(R)
*   100    0021.969a.0301   dynamic   00:02:55   F     F     Eth1/1
*   300    0021.963d.6e01   dynamic   00:03:06   F     F     Eth1/2
    1           1         -20:20:00:00:00:aa         -             1
torsw201a# show system internal l2fwder mac
Legend:
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link,
        (T) - True, (F) - False, C - ControlPlane MAC
   VLAN     MAC Address      Type      age     Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
G   100    0021.9643.7607    static   -          F     F   sup-eth1(R)
*   200    0021.9642.5f01   dynamic   00:01:03   F     F     Eth1/2
*   100    0021.969f.c701   dynamic   00:03:40   F     F     Eth1/1
G   200    0021.9643.7607    static   -          F     F   sup-eth1(R)
*   100    0021.969a.0301    static   -          F     F  (0x47000001) nve-peer1 198.18.
    1           1         -20:20:00:00:00:aa         -             1
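
The same anycast gateway MAC appears in several notations in this article:

- 2020.0000.00aa in the torsw101a config
- 20:20:00:00:00:AA in the torsw201a config
- 20:20:00:00:00:aa in the hosts' neighbor tables

The l2fwder output above also mixes dotted and colon forms. A small normalizer makes such entries easy to compare. normalize_mac, to_cisco_dotted and to_colon are illustrative helpers only.

```python
import re


def normalize_mac(mac: str) -> str:
    """Return a MAC as 12 lowercase hex digits.

    Accepts Cisco dotted (2020.0000.00aa), colon (20:20:00:00:00:AA) or dash forms.
    """
    digits = re.sub(r"[.:\-]", "", mac).lower()
    if not re.fullmatch(r"[0-9a-f]{12}", digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


def to_cisco_dotted(mac: str) -> str:
    """Format as three groups of four hex digits, as NX-OS prints MACs."""
    d = normalize_mac(mac)
    return ".".join(d[i:i + 4] for i in range(0, 12, 4))


def to_colon(mac: str) -> str:
    """Format as six colon-separated octets, as Linux `ip n` prints MACs."""
    d = normalize_mac(mac)
    return ":".join(d[i:i + 2] for i in range(0, 12, 2))
```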

MAC address table (from the EVPN learning perspective)

There's a Seq No column, so doesn't that mean the MAC Mobility Extended Community is usable!? (I'll skip the explanation.)

torsw101a# show l2route evpn mac all

Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link
(Dup):Duplicate (Spl):Split (Rcv):Recv (AD):Auto-Delete(D):Del Pending (S):Stale (C):Clear
(Ps):Peer Sync (O):Re-Originated

Topology    Mac Address    Prod   Flags         Seq No     Next-Hops
----------- -------------- ------ ------------- ---------- ----------------
100         0021.969a.0301 Local  L,            0          Eth1/1
100         0021.969f.c701 BGP    SplRcv        0          198.18.1.21
300         0021.963d.6e01 Local  L,            0          Eth1/2
3901        0021.9643.7607 VXLAN  Rmac          0          198.18.1.21
torsw201a# show l2route evpn mac all

Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link
(Dup):Duplicate (Spl):Split (Rcv):Recv (AD):Auto-Delete(D):Del Pending (S):Stale (C):Clear
(Ps):Peer Sync (O):Re-Originated

Topology    Mac Address    Prod   Flags         Seq No     Next-Hops
----------- -------------- ------ ------------- ---------- ----------------
100         0021.969a.0301 BGP    SplRcv        0          198.18.1.11
100         0021.969f.c701 Local  L,            0          Eth1/1
200         0021.9642.5f01 Local  L,            0          Eth1/2
3901        0021.960f.f307 VXLAN  Rmac          0          198.18.1.11
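
The Seq No column corresponds to the sequence number carried in the MAC Mobility extended community. RFC 7432 (Section 15) roughly picks a route for the same MAC like this:

- The higher sequence number wins.
- A route without the community counts as sequence number 0.
- On a tie, the route from the PE with the lowest IP address is chosen.

The sketch below is a simplification of that rule. It ignores static MACs, duplicate-MAC detection and sequence-number wraparound, and the MacRoute type is hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class MacRoute:
    mac: str
    next_hop: str               # address of the advertising VTEP/PE
    seq: Optional[int] = None   # None = no MAC Mobility extended community


def best_mac_route(routes: Sequence[MacRoute]) -> MacRoute:
    """Pick the preferred route for one MAC (simplified RFC 7432 Sec. 15 rule):
    highest sequence number first (missing community counts as 0),
    then the lowest advertising PE address."""
    if not routes:
        raise ValueError("no routes")
    return min(
        routes,
        key=lambda r: (-(r.seq or 0), int(ipaddress.ip_address(r.next_hop))),
    )
```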

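VRF ARP table

This part isn't related to VxLAN or EVPN, though.
When an entry was about to age out, Nexus9000v sent an ARP request to the node on its own, and if a reply came back it kept the entry from aging out (fairly common behavior).
Also, each switch only shows the nodes attached to itself.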

torsw101a# show ip arp vrf VRF001

Flags: * - Adjacencies learnt on non-active FHRP router
       + - Adjacencies synced via CFSoE
       # - Adjacencies Throttled for Glean
       CP - Added via L2RIB, Control plane Adjacencies
       PS - Added via L2RIB, Peer Sync
       RO - Dervied from L2RIB Peer Sync Entry
       D - Static Adjacencies attached to down interface

IP ARP Table for context VRF001
Total number of entries: 2
Address         Age       MAC Address     Interface       Flags
192.168.3.1     00:02:36  0021.963d.6e01  Vlan300
192.168.1.1     00:02:24  0021.969a.0301  Vlan100
torsw201a# show ip arp vrf VRF001

Flags: * - Adjacencies learnt on non-active FHRP router
       + - Adjacencies synced via CFSoE
       # - Adjacencies Throttled for Glean
       CP - Added via L2RIB, Control plane Adjacencies
       PS - Added via L2RIB, Peer Sync
       RO - Dervied from L2RIB Peer Sync Entry
       D - Static Adjacencies attached to down interface

IP ARP Table for context VRF001
Total number of entries: 2
Address         Age       MAC Address     Interface       Flags
192.168.1.2     00:00:07  0021.969f.c701  Vlan100
192.168.2.1     00:02:31  0021.9642.5f01  Vlan200

VRF routing table (IPv4)

Because EVPN exchanges MAC-IP bindings as NLRI Type 2 routes, the table contains per-host routes.

torsw101a# show ip route vrf VRF001
IP Route Table for VRF "VRF001"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

192.168.1.0/24, ubest/mbest: 1/0, attached
    *via 192.168.1.254, Vlan100, [0/0], 1d06h, direct
192.168.1.1/32, ubest/mbest: 1/0, attached
    *via 192.168.1.1, Vlan100, [190/0], 1d06h, hmm
192.168.1.2/32, ubest/mbest: 1/0
    *via 198.18.1.21%default, [200/0], 1d06h, bgp-64512, internal, tag 64512 (evpn) segid: 50001 tunnelid: 0xc6120115 encap: VXLAN
192.168.1.254/32, ubest/mbest: 1/0, attached
    *via 192.168.1.254, Vlan100, [0/0], 1d06h, local
192.168.2.1/32, ubest/mbest: 1/0
    *via 198.18.1.21%default, [200/0], 1d06h, bgp-64512, internal, tag 64512 (evpn) segid: 50001 tunnelid: 0xc6120115 encap: VXLAN
192.168.3.0/24, ubest/mbest: 1/0, attached
    *via 192.168.3.254, Vlan300, [0/0], 1d06h, direct
192.168.3.1/32, ubest/mbest: 1/0, attached
    *via 192.168.3.1, Vlan300, [190/0], 1d06h, hmm
192.168.3.254/32, ubest/mbest: 1/0, attached
    *via 192.168.3.254, Vlan300, [0/0], 1d06h, local
torsw201a# show ip route vrf VRF001
IP Route Table for VRF "VRF001"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

192.168.1.0/24, ubest/mbest: 1/0, attached
    *via 192.168.1.254, Vlan100, [0/0], 1d21h, direct
192.168.1.1/32, ubest/mbest: 1/0
    *via 198.18.1.11%default, [200/0], 1d06h, bgp-64512, internal, tag 64512 (evpn) segid: 50001 tunnelid: 0xc612010b encap: VXLAN
192.168.1.2/32, ubest/mbest: 1/0, attached
    *via 192.168.1.2, Vlan100, [190/0], 1d11h, hmm
192.168.1.254/32, ubest/mbest: 1/0, attached
    *via 192.168.1.254, Vlan100, [0/0], 1d21h, local
192.168.2.0/24, ubest/mbest: 1/0, attached
    *via 192.168.2.254, Vlan200, [0/0], 1d09h, direct
192.168.2.1/32, ubest/mbest: 1/0, attached
    *via 192.168.2.1, Vlan200, [190/0], 1d06h, hmm
192.168.2.254/32, ubest/mbest: 1/0, attached
    *via 192.168.2.254, Vlan200, [0/0], 1d09h, local
192.168.3.1/32, ubest/mbest: 1/0
    *via 198.18.1.11%default, [200/0], 1d06h, bgp-64512, internal, tag 64512 (evpn) segid: 50001 tunnelid: 0xc612010b encap: VXLAN
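
Each Type 2 route that carries an IP is effectively a host route. The /32 entries learned via bgp-64512 above match the two MAC-IP routes under the L3VNI 50001 RD in show bgp l2vpn evpn.

Below is a minimal sketch of that mapping. It assumes the routes have already been extracted as (L3 VNI, host IP, next-hop VTEP) tuples. host_routes_for_vrf is a hypothetical helper, and real route selection, router-MAC handling and preference values are not modeled. Because it uses ipaddress, an IPv6 host simply becomes a /128.

```python
import ipaddress
from typing import Dict, Iterable, Tuple

# (L3 VNI, host IP, next-hop VTEP) taken from Type 2 MAC-IP routes
Type2 = Tuple[int, str, str]


def host_routes_for_vrf(routes: Iterable[Type2], l3vni: int) -> Dict[str, str]:
    """Map host prefixes (/32 or /128) to the remote VTEP for one L3 VNI."""
    table: Dict[str, str] = {}
    for vni, ip, vtep in routes:
        if vni != l3vni:
            continue  # belongs to another VRF
        addr = ipaddress.ip_address(ip)
        prefix = ipaddress.ip_network(f"{addr}/{addr.max_prefixlen}")
        table[str(prefix)] = vtep
    return table
```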

Traffic and packet checks

Cisco official / Cisco Programmable Fabric with VXLAN BGP EVPN Configuration Guide / Chapter: Unicast Forwarding describes the forwarding behavior for various scenarios, so I compared against it as I went.

Communication between node11 (VLAN 100) and node21 (VLAN 100) (via L2VNI)

After flushing the ARP table on each node with $ sudo ip n flush dev ens4:

kotetsu@node11:~$ ping 192.168.1.2
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=24.5 ms
64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=7.11 ms
64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=7.05 ms
kotetsu@node11:~$ ip n show dev ens4
192.168.1.2 lladdr 00:21:96:9f:c7:01 STALE
192.168.1.254 lladdr 20:20:00:00:00:aa STALE
kotetsu@node21:~$ ip n show dev ens4
192.168.1.1 lladdr 00:21:96:9a:03:01 STALE
192.168.1.254 lladdr 20:20:00:00:00:aa STALE

torsw101a VxLAN-encapsulates the ARP request from node11 in L2 VNI 10100 and sends it via Ingress Replication (the outer IP src/dst are the loopback1 addresses of torsw[12]01a).

f:id:kakkotetsu:20170912000201p:plain

The ICMP Echo Request from node11 is simply VxLAN-encapsulated in L2 VNI 10100.

f:id:kakkotetsu:20170912000213p:plain
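
The encapsulation in these captures is the 8-byte VXLAN header from RFC 7348, carried over UDP (IANA port 4789). The header contains:

- a flags byte with the I bit (0x08) set
- 24 reserved bits
- the 24-bit VNI
- 8 more reserved bits

The sketch below builds just that header with the standard struct module. The outer Ethernet/IP/UDP headers are left out.

```python
import struct

VXLAN_UDP_PORT = 4789   # IANA-assigned VXLAN port (RFC 7348)
VXLAN_FLAG_I = 0x08     # "VNI is valid" flag


def vxlan_header(vni: int) -> bytes:
    """Return the 8-byte VXLAN header for a 24-bit VNI (RFC 7348, Section 5)."""
    if not 0 <= vni <= 0xFFFFFF:
        raise ValueError(f"VNI out of range: {vni}")
    # word 1: flags (8 bits) + reserved (24 bits)
    # word 2: VNI (24 bits) + reserved (8 bits)
    return struct.pack("!II", VXLAN_FLAG_I << 24, vni << 8)


if __name__ == "__main__":
    print(vxlan_header(10100).hex())  # L2 VNI used between node11 and node21
```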

Communication between node11 (VLAN 100) and node22 (VLAN 200) (via L3VNI)

After flushing the ARP table on each node with $ sudo ip n flush dev ens4:

kotetsu@node11:~$ ping 192.168.2.1
PING 192.168.2.1 (192.168.2.1) 56(84) bytes of data.
64 bytes from 192.168.2.1: icmp_seq=1 ttl=62 time=22.1 ms
64 bytes from 192.168.2.1: icmp_seq=2 ttl=62 time=9.69 ms
64 bytes from 192.168.2.1: icmp_seq=3 ttl=62 time=12.3 ms
64 bytes from 192.168.2.1: icmp_seq=4 ttl=62 time=8.81 ms
kotetsu@node11:~$ ip n show dev ens4
192.168.1.2  FAILED
192.168.1.254 lladdr 20:20:00:00:00:aa STALE
kotetsu@node22:~$ ip n show dev ens4
192.168.2.254 lladdr 20:20:00:00:00:aa STALE

The ARP request is forwarded from torsw101a to torsw201a over VNI 10100, but torsw201a helpfully does not answer it, so no duplicate ARP reply comes back to node11. (The outcome is good, but I wonder how torsw201a makes that call so neatly...)

f:id:kakkotetsu:20170912000228p:plain

torsw101a encapsulates the ICMP Echo Request from node11 in the VRF's L3 VNI 50001 and forwards it.

f:id:kakkotetsu:20170912000302p:plain
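
Two things in this test fit routed forwarding over the L3 VNI:

- The captured packet carries VNI 50001 instead of 10100.
- ping reports ttl=62 rather than the ttl=64 seen in the L2VNI test.

The TTL is consistent with the packet being routed once on the ingress VTEP and once on the egress VTEP, though the two-hop reading is an interpretation of the output and not something the switch reports. The sketch below decodes the VNI from a raw 8-byte VXLAN header (RFC 7348 layout) and spells out that TTL arithmetic.

```python
import struct


def parse_vxlan_vni(header: bytes) -> int:
    """Return the VNI from an 8-byte VXLAN header, checking the I flag."""
    if len(header) < 8:
        raise ValueError("a VXLAN header is 8 bytes")
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & 0x08:
        raise ValueError("I flag not set; VNI field is not valid")
    return word2 >> 8  # drop the trailing reserved byte


def expected_ttl(initial_ttl: int, routed_hops: int) -> int:
    """TTL seen by the receiver after the packet was routed routed_hops times."""
    return initial_ttl - routed_hops


if __name__ == "__main__":
    hdr = struct.pack("!II", 0x08 << 24, 50001 << 8)
    print(parse_vxlan_vni(hdr))   # L3 VNI of VRF001
    print(expected_ttl(64, 2))    # ingress + egress VTEP each route once
```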

Control plane packets

We've probably all seen enough of these by now, but let's look at one EVPN NLRI Type 2 (MAC/IP Advertisement route) Update.
This time SVIs are configured for inter-subnet communication, so the IP address is advertised along with the MAC address.

f:id:kakkotetsu:20170912000313p:plain
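
For reference, a Type 2 route like the one in this capture follows the MAC/IP Advertisement layout in RFC 7432 (Section 7.2), with the VNI placed in the 3-byte label fields as described in RFC 8365. The sketch below assembles that NLRI for the 192.168.1.1 entry. It is meant for reading the capture, not as a BGP implementation, and assumes:

- a type 1 RD (IPv4 address plus 2-byte number)
- an all-zero ESI and Ethernet Tag
- as is common for symmetric IRB, the L2 VNI in Label 1 and the L3 VNI in Label 2

```python
import ipaddress
import struct


def rd_type1(ip: str, number: int) -> bytes:
    """Type 1 Route Distinguisher: 2-byte type, IPv4 address, 2-byte number."""
    return struct.pack("!H4sH", 1, ipaddress.IPv4Address(ip).packed, number)


def vni_label(vni: int) -> bytes:
    """24-bit VNI carried in a 3-byte label field (RFC 8365)."""
    return vni.to_bytes(3, "big")


def evpn_type2_nlri(rd: bytes, mac: str, ip: str, l2vni: int, l3vni: int) -> bytes:
    """MAC/IP Advertisement route (RFC 7432 Sec. 7.2) with two VNI labels."""
    mac_bytes = bytes.fromhex(mac.replace(".", "").replace(":", ""))
    ip_bytes = ipaddress.ip_address(ip).packed
    body = (
        rd
        + bytes(10)                               # ESI: single-homed, all zero
        + struct.pack("!I", 0)                    # Ethernet Tag ID
        + bytes([48]) + mac_bytes                 # MAC length (bits) + MAC
        + bytes([len(ip_bytes) * 8]) + ip_bytes   # IP length (bits) + IP
        + vni_label(l2vni)                        # Label 1: L2 VNI (assumption)
        + vni_label(l3vni)                        # Label 2: L3 VNI (assumption)
    )
    return bytes([2, len(body)]) + body           # route type 2 + length


if __name__ == "__main__":
    rd = rd_type1("172.16.1.1", 32867)
    print(evpn_type2_nlri(rd, "0021.969a.0301", "192.168.1.1", 10100, 50001).hex())
```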

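Wrap-up

  • Nexus9000v is a good guy: even though it's virtual, VxLAN + EVPN works reasonably well
    • anycast gateway worked
      • It talks using the virtual MAC address and even answers ICMP Echo Requests, good guy
    • It's a shame ARP suppression can't be enabled
      • Please don't be so uncool as to say "it's just Proxy ARP anyway!"
  • These days Nexus seems to be all about ACI, but
    • If the use case fits, ACI presumably hides these fiddly L2/L3 details of the infrastructure (I've never used it, so this is speculation based on public marketing material)
    • Even where ACI doesn't fit, Nexus can run standalone as an ordinary Ethernet switch (as in this article)
    • What I mean is: "having options is nice", and "using Nexus doesn't mean everyone has to (worry about the physical network and write config like this | use ACI)"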

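Running Cisco Nexus9000v on KVM+GNS3

Introduction

What I'll do / Summary

This article covers the following.

  • Getting Nexus9000v up and running on a KVM+GNS3 environment

Key points:

  • Deploying the appliance with the GNS3 MARKETPLACE is easy
  • Nexus9000v, the virtual version of the Cisco Nexus 9000 series, can be downloaded by anyone as of 2017/08/16, so you can practice NX-OS with it
    • There are, of course, feature limitations specific to the software version
    • The memory requirement is the usual Minimum 4GB, Recommended 8GB

To be honest, though, deploying Nexus9000v requires UEFI, which was a hassle, so I took the easy way out with the MARKETPLACE.

References

I figure anyone who bothers to visit a page like this could manage it on their own just from these links.

Environment

The KVM host and GNS3 versions are as follows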

$ uname -a
Linux kvm01 4.4.0-79-generic #100-Ubuntu SMP Wed May 17 19:58:14 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.2 LTS"

$ virsh -v
1.3.1

$ qemu-system-x86_64 --version
QEMU emulator version 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.14), Copyright (c) 2003-2008 Fabrice Bellard

$ gns3 --version
2.0.2

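Nexus9000v is nxosv-final.7.0.3.I6.1.qcow2, the latest version available for download as of 2017/08/16. OVMF appears to be a build from 2016/08/13 (I just used what the MARKETPLACE pulled down).

Deploying Nexus9000v

Getting the OS file from the Cisco site

The deployment steps below don't actually use the file downloaded here... but don't you want to confirm that you are legitimately entitled to download and use it?

Download the qcow2 file from the above (a personal account with just one "hanpen" AP registered worked fine) and verify the checksum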

$ ll nxosv-final.7.0.3.I6.1.qcow2
-rw-r--r-- 1 kotetsu kotetsu 780402688 Aug 16 23:07 nxosv-final.7.0.3.I6.1.qcow2

$ sha512sum nxosv-final.7.0.3.I6.1.qcow2
93f2ffdcb230b3a0bcba4a120db4fcc752a1f91adfd911c9d11d3a725f0bbba7df40509ce896dcea45d33cbeacfa9fad9054678f91169ea1d684db8aadaa04cb  nxosv-final.7.0.3.I6.1.qcow2
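
The same check can be scripted to compare against the SHA512 value published on the download page. This is a minimal sketch with Python's hashlib. It reads the roughly 780 MB image in chunks so the whole file is never held in memory, and the expected hash is whatever you paste in from Cisco's page.

```python
import hashlib


def sha512_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-512 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(path: str, expected_hex: str) -> bool:
    """True if the file matches the published SHA-512 (case and surrounding whitespace ignored)."""
    return sha512_of_file(path) == expected_hex.strip().lower()
```

For example: verify_image("nxosv-final.7.0.3.I6.1.qcow2", "<SHA512 from the download page>").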

Below are screenshots of the download page as of 2017/08/16

f:id:kakkotetsu:20170817224740p:plain

f:id:kakkotetsu:20170817224758p:plain

Getting the Nexus9000v template file from the GNS3 MARKETPLACE

Click DOWNLOAD TEMPLATE at the link below to get cisco-nxosv9k.gns3a (that's it).

Importing the template file

As usual, open any GNS3 project and import the cisco-nxosv9k.gns3a you just downloaded.
In GNS3, click File -> Import appliance and just follow along. If needed, GNS3 official / Import GNS3 appliance has the official procedure with screenshots, so please refer to that as well.

f:id:kakkotetsu:20170817224850p:plain

f:id:kakkotetsu:20170817224900p:plain

f:id:kakkotetsu:20170817224910p:plain

f:id:kakkotetsu:20170817224924p:plain

f:id:kakkotetsu:20170817224934p:plain

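This creates Cisco NX-OSv 9000 7.0.3.I6.1 under All devices as a GNS3 template.

When doing this by hand, the usual routine is trial and error while poring over pages like the ones below; this procedure simply skips that part.

It could also serve as a set of known-working reference parameters.

Deploying from the template and booting

As usual, deploy by dragging and dropping from the template (All devices).
Note that the template from the GNS3 MARKETPLACE follows the official parameters mentioned above exactly, so tune it for your use as you like, e.g. by reducing memory to 4GB.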

$ ps aux | grep [N]X-OS
root     17924 12.7  1.1 9240844 367620 pts/12 Sl+  02:26   0:05 /usr/bin/qemu-system-x86_64 -name CiscoNX-OSv90007.0.3.I6.1-1 -m 8096M -smp cpus=2 -enable-kvm -machine smm=off -boot order=c -bios /home/kotetsu/GNS3/images/QEMU/OVMF-20160813.fd -device ahci,id=ahci0,bus=pci.0 -drive file=/home/kotetsu/GNS3/projects/700/project-files/qemu/f8111e6a-9fd1-4f1b-9fd6-a3c9a42b0ff3/hda_disk.qcow2,if=none,id=drive-sata-disk0,index=0,media=disk -device ide-drive,drive=drive-sata-disk0,bus=ahci0.0,id=drive-sata-disk0 -uuid f8111e6a-9fd1-4f1b-9fd6-a3c9a42b0ff3 -serial telnet:127.0.0.1:5004,server,nowait -monitor tcp:127.0.0.1:54313,server,nowait -net none -device e1000,mac=00:21:96:0f:f3:00,netdev=gns3-0 -netdev socket,id=gns3-0,udp=127.0.0.1:10019,localaddr=0.0.0.0:10018 -device e1000,mac=00:21:96:0f:f3:01,netdev=gns3-1 -netdev socket,id=gns3-1,udp=127.0.0.1:10021,localaddr=0.0.0.0:10020 -device e1000,mac=00:21:96:0f:f3:02,netdev=gns3-2 -netdev socket,id=gns3-2,udp=127.0.0.1:10023,localaddr=0.0.0.0:10022 -device e1000,mac=00:21:96:0f:f3:03,netdev=gns3-3 -netdev socket,id=gns3-3,udp=127.0.0.1:10025,localaddr=0.0.0.0:10024 -device e1000,mac=00:21:96:0f:f3:04,netdev=gns3-4 -netdev socket,id=gns3-4,udp=127.0.0.1:10027,localaddr=0.0.0.0:10026 -device e1000,mac=00:21:96:0f:f3:05,netdev=gns3-5 -netdev socket,id=gns3-5,udp=127.0.0.1:10029,localaddr=0.0.0.0:10028 -device e1000,mac=00:21:96:0f:f3:06,netdev=gns3-6 -netdev socket,id=gns3-6,udp=127.0.0.1:10031,localaddr=0.0.0.0:10030 -device e1000,mac=00:21:96:0f:f3:07,netdev=gns3-7 -netdev socket,id=gns3-7,udp=127.0.0.1:10033,localaddr=0.0.0.0:10032 -device e1000,mac=00:21:96:0f:f3:08,netdev=gns3-8 -netdev socket,id=gns3-8,udp=127.0.0.1:10035,localaddr=0.0.0.0:10034 -device e1000,mac=00:21:96:0f:f3:09,netdev=gns3-9 -netdev socket,id=gns3-9,udp=127.0.0.1:10037,localaddr=0.0.0.0:10036 -nographic
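
The qemu command line shows what the template actually configured:

- 8096M of RAM
- 2 vCPUs
- the OVMF firmware passed with -bios
- ten e1000 NICs

When tuning (e.g. lowering memory to 4GB), it can help to pull these out of the ps output programmatically. This sketch uses shlex and only looks at the -m, -smp, -bios and "-device e1000" options seen above. summarize_qemu_cmdline is a hypothetical helper.

```python
import shlex


def summarize_qemu_cmdline(cmdline: str) -> dict:
    """Pull memory, CPU, firmware and e1000 NIC MACs out of a qemu command line.

    Only the options visible in the GNS3-generated command above are handled.
    """
    args = shlex.split(cmdline)
    summary = {"memory": None, "smp": None, "bios": None, "nic_macs": []}
    for opt, val in zip(args, args[1:]):  # look at each option with the token after it
        if opt == "-m":
            summary["memory"] = val
        elif opt == "-smp":
            summary["smp"] = val
        elif opt == "-bios":
            summary["bios"] = val
        elif opt == "-device" and val.startswith("e1000,"):
            props = dict(p.split("=", 1) for p in val.split(",")[1:] if "=" in p)
            summary["nic_macs"].append(props.get("mac"))
    return summary
```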

f:id:kakkotetsu:20170817225311p:plain

f:id:kakkotetsu:20170817225323p:plain

f:id:kakkotetsu:20170817225335p:plain

f:id:kakkotetsu:20170817225346p:plain

Then, when you boot it and watch the console, something ZTP-like starts running:

2017 Aug 16 17:29:15 switch %$ VDC-1 %$ %POAP-2-POAP_INFO: [90SNLUQJ25I-00:21:96:0F:F3:07] - USB Initializing Success
2017 Aug 16 17:29:15 switch %$ VDC-1 %$ %POAP-2-POAP_INFO: [90SNLUQJ25I-00:21:96:0F:F3:07] - USB disk not detected
2017 Aug 16 17:29:15 switch %$ VDC-1 %$ last message repeated 1 time
2017 Aug 16 17:29:15 switch %$ VDC-1 %$ %POAP-2-POAP_DHCP_DISCOVER_START: [90SNLUQJ25I-00:21:96:0F:F3:07] - POAP DHCP Discover phase started
2017 Aug 16 17:29:16 switch %$ VDC-1 %$ %POAP-2-POAP_INFO: [90SNLUQJ25I-00:21:96:0F:F3:07] - Invalid DHCP OFFER from 0.0.0.0: Missing Script Server information
2017 Aug 16 17:29:16 switch %$ VDC-1 %$ %POAP-2-POAP_INFO: [90SNLUQJ25I-00:21:96:0F:F3:07] - Invalid DHCP OFFER from 0.0.0.0: Missing Script Name
2017 Aug 16 17:29:20 switch %$ VDC-1 %$ %ASCII-CFG-2-CONF_CONTROL: System ready
2017 Aug 16 17:29:20 switch %$ VDC-1 %$ %POAP-2-POAP_INFO: [90SNLUQJ25I-00:21:96:0F:F3:07] - Invalid DHCP OFFER from 0.0.0.0: Missing Script Server information
2017 Aug 16 17:29:20 switch %$ VDC-1 %$ %POAP-2-POAP_INFO: [90SNLUQJ25I-00:21:96:0F:F3:07] - Invalid DHCP OFFER from 0.0.0.0: Missing Script Name
2017 Aug 16 17:29:26 switch %$ VDC-1 %$ %POAP-2-POAP_FAILURE: [90SNLUQJ25I-00:21:96:0F:F3:07] - POAP DHCP discover phase failed
2017 Aug 16 17:29:38 switch %$ VDC-1 %$ %POAP-2-POAP_INFO: [90SNLUQJ25I-00:21:96:0F:F3:07] - USB Initializing Success
2017 Aug 16 17:29:38 switch %$ VDC-1 %$ %POAP-2-POAP_INFO: [90SNLUQJ25I-00:21:96:0F:F3:07] - USB disk not detected

It loops forever, so stop it by answering y at the following prompt

Abort Auto Provisioning and continue with normal setup ?(yes/no)[n]: y
Disabling POAP

After setting the admin password and so on interactively, you can log in for the first time.

Abort Auto Provisioning and continue with normal setup ?(yes/no)[n]: y
Disabling POAP



         ---- System Admin Account Setup ----


Do you want to enforce secure password standard (yes/no) [y]:

  Enter the password for "admin":
    Wrong Password, Reason:
       [Length should be at least 8 characters]
    Invalid admin password.

    Enter the password for "admin":
  Confirm the password for "admin":

         ---- Basic System Configuration Dialog VDC: 1 ----

This setup utility will guide you through the basic configuration of
the system. Setup configures only enough connectivity for management
of the system.

Please register Cisco Nexus9000 Family devices promptly with your
supplier. Failure to register may affect response times for initial
service calls. Nexus9000 devices must be registered to receive
entitled support services.

Press Enter at anytime to skip a dialog. Use ctrl-c at anytime
to skip the remaining dialogs.

 Would you like to enter the basic configuration dialog (yes/no):

2017 Aug 16 17:33:45 switch %$ VDC-1 %$ %ACLQOS-SLOT1-2-ACLQOS_FAILED: ACLQOS failure: TCAM region is not configured for feature QoS class IPv4 direction ingress. Please configure TCAM region Ingress COPP [copp] and retry the command.

Error: There was an error executing atleast one of the command
Please verify the following log for the command execution errors.
TCAM region is not configured. Please configure TCAM region and retry the command




User Access Verification
User Access Verification
 login: admin
Password:

Cisco NX-OS Software
Copyright (c) 2002-2017, Cisco Systems, Inc. All rights reserved.
NX-OSv9K software ("NX-OSv9K Software") and related documentation,
files or other reference materials ("Documentation") are
the proprietary property and confidential information of Cisco
Systems, Inc. ("Cisco") and are protected, without limitation,
pursuant to United States and International copyright and trademark
laws in the applicable jurisdiction which provide civil and criminal
penalties for copying or distribution without Cisco's authorization.

Any use or disclosure, in whole or in part, of the NX-OSv9K Software
or Documentation to any third party for any purposes is expressly
prohibited except as otherwise authorized by Cisco in writing.
The copyrights to certain works contained herein are owned by other
third parties and are used and distributed under license. Some parts
of this software may be covered under the GNU Public License or the
GNU Lesser General Public License. A copy of each such license is
available at
http://www.gnu.org/licenses/gpl.html and
http://www.gnu.org/licenses/lgpl.html
***************************************************************************
*  NX-OSv9K is strictly limited to use for evaluation, demonstration      *
*  and NX-OS education. Any use or disclosure, in whole or in part of     *
*  the NX-OSv9K Software or Documentation to any third party for any      *
*  purposes is expressly prohibited except as otherwise authorized by     *
*  Cisco in writing.                                                      *
***************************************************************************
switch#
switch# show ver
Cisco Nexus Operating System (NX-OS) Software
TAC support: http://www.cisco.com/tac
Documents: http://www.cisco.com/en/US/products/ps9372/tsd_products_support_series_home.html
Copyright (c) 2002-2017, Cisco Systems, Inc. All rights reserved.
The copyrights to certain works contained herein are owned by
other third parties and are used and distributed under license.
Some parts of this software are covered under the GNU Public
License. A copy of the license is available at
http://www.gnu.org/licenses/gpl.html.

NX-OSv9K is a demo version of the Nexus Operating System

Software
  BIOS: version
  NXOS: version 7.0(3)I6(1)
  BIOS compile time:
  NXOS image file is: bootflash:///nxos.7.0.3.I6.1.bin
  NXOS compile time:  5/16/2017 22:00:00 [05/17/2017 06:21:28]


Hardware
  cisco NX-OSv Chassis
   with 8062148 kB of memory.
  Processor Board ID 90SNLUQJ25I

  Device name: switch
  bootflash:    3509454 kB
Kernel uptime is 0 day(s), 19 hour(s), 48 minute(s), 5 second(s)

Last reset
  Reason: Unknown
  System version:
  Service:

plugin
  Core Plugin, Ethernet Plugin

Active Package(s):

And that's it, nice work.

Setting the boot image

As described in Cisco official / Troubleshooting the NX-OSv 9000 / How to prevent VM from dropping into “loader >” prompt, if you forget to set the boot image, the next boot will stop at the loader.

The initial state looks like this:

switch# show boot
Current Boot Variables:

sup-1
NXOS variable not set
No module boot variable set

Boot Variables on next reload:

sup-1
NXOS variable not set
No module boot variable set


switch# dir bootflash:
       4096    Aug 16 17:27:42 2017  .rpmstore/
       4096    Aug 16 17:27:53 2017  .swtam/
      38575    Aug 16 17:32:16 2017  20170816_172847_poap_26000_init.log
  759941120    May 17 06:46:25 2017  nxos.7.0.3.I6.1.bin
          0    Aug 16 17:35:04 2017  platform-sdk.cmd
       4096    Aug 16 17:28:44 2017  scripts/
       4096    Aug 16 17:28:45 2017  virt_strg_pool_bf_vdc_1/
       4096    Aug 16 17:28:06 2017  virtual-instance/
         59    Aug 16 17:27:57 2017  virtual-instance.conf

Usage for bootflash://sup-local
 1158098944 bytes used
 2379120640 bytes free
 3537219584 bytes total

Specify the boot image as follows.

switch# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
switch(config)#
switch(config)# boot nxos nxos.7.0.3.I6.1.bin
Performing image verification and compatibility check, please wait....
switch(config)#
switch(config)# end
switch#
switch# copy running-config startup-config
[########################################] 100%
Copy complete.

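By the way, I vaguely remember reading back in the IOS days that copy run start would eventually go away; whatever happened with that...

Anyway, the OS will now load properly after a restart. Run reload or similar to confirm it works.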

switch# show boot
Current Boot Variables:

sup-1
NXOS variable = bootflash:/nxos.7.0.3.I6.1.bin
No module boot variable set

Boot Variables on next reload:

sup-1
NXOS variable = bootflash:/nxos.7.0.3.I6.1.bin
No module boot variable set
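
If you deploy several of these VMs, it's easy to forget this step on one of them. As a sketch, you can check the show boot text for an NXOS variable in the next-reload section. This only parses the output format shown above; how the text is collected (console, SSH, etc.) is left out, and next_reload_nxos_image is a hypothetical helper.

```python
import re
from typing import Optional


def next_reload_nxos_image(show_boot: str) -> Optional[str]:
    """Return the NXOS image set for the next reload, or None if it is not set."""
    _, sep, next_part = show_boot.partition("Boot Variables on next reload:")
    if not sep:
        raise ValueError("unexpected 'show boot' output")
    m = re.search(r"NXOS variable = (\S+)", next_part)
    return m.group(1) if m else None
```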

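I only installed it via the easy route, so there isn't much more to say...
It's not just because they let me download the virtual version without any user restrictions, but Nexus9000 looks like a pretty nice data center switch!!

Blog service migration

I've moved to Hatena Blog.
"An infra person should at least publish a web service on their own server; blog services are for the lazy" is a fair point, but...

The previous service was a kind of programming knowledge-sharing site, and my posts had far too little programming in them...
Many of the articles I published there between 2014/11 and 2017/03 are now quite outdated, but I moved them here with a note saying so.
Both the old and new services use Markdown, so it was just copy and paste, which made it very easy. (Only the figures had to be re-uploaded here.)

For the record, here's the breakdown of posts from my account, with its extremely thin programming content ↓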

f:id:kakkotetsu:20170513221643p:plain

f:id:kakkotetsu:20170513223655p:plain

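VXLAN+EVPN on Cumulus VX (original : 2017/03/22)

This article is a copy of one I wrote elsewhere on 2017/03/22.
As a result, some of the information is slightly outdated as of 2017/05/11. (GNS3 v2.0.0 stable and Cumulus Linux v3.3 were released in 2017/05.)

Introduction

What this article covers

I'll do the following.

  • The Early Access version of Cumulus Linux (as of 2017/03/21) lets you try VXLAN+EVPN in a limited form, so check whether it also works on Cumulus VX, the virtual version
    • Once it is fully implemented, the configuration and behavior will presumably change
    • The EA version currently available only ships the Quagga daemon, so EVPN-related configuration and inspection are done in Quagga
  • EVPN Multihoming isn't implemented, but there seems to be a mechanism for making VTEPs redundant with MLAG instead, so look at how it's configured and how it behaves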

Environment

Cumulus VX

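From Cumulus official / Download Cumulus VX, I downloaded the KVM image of the latest version available as of 2017/03/13 (Cumulus VX 3.2.1). Once you create an account, individuals can download it without any problem.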

kotetsu@kvm01:~/vm_images/qemu$ ls -al cumulus-linux-3.2.1-vx-amd64-1486153138.ac46c24zd00d13e.qcow2
-rw-r--r-- 1 kotetsu kotetsu 1232601088 Mar  7 22:11 cumulus-linux-3.2.1-vx-amd64-1486153138.ac46c24zd00d13e.qcow2

kotetsu@kvm01:~/vm_images/qemu$ sha1sum cumulus-linux-3.2.1-vx-amd64-1486153138.ac46c24zd00d13e.qcow2
3d782f2c450683b4da5ea2324c88f3dccb89b6c2  cumulus-linux-3.2.1-vx-amd64-1486153138.ac46c24zd00d13e.qcow2
kotetsu@bb03:~$ cat /etc/lsb-release
DISTRIB_ID="Cumulus Linux"
DISTRIB_RELEASE=3.2.1
DISTRIB_DESCRIPTION="Cumulus Linux 3.2.1"

kotetsu@bb03:~$ uname -a
Linux bb03 4.1.0-cl-4-amd64 #1 SMP Debian 4.1.33-1+cl3u7 (2017-01-26) x86_64 GNU/Linux

Other

  • KVM host
    • Ubuntu16.04.1-server-amd64
    • with KVM and the GNS3 suite installed from apt
$ uname -a
Linux kvm01 4.4.0-57-generic #78-Ubuntu SMP Fri Dec 9 23:50:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.1 LTS"

$ virsh -v
1.3.1

$ qemu-system-x86_64 --version
QEMU emulator version 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.6), Copyright (c) 2003-2008 Fabrice Bellard

$ gns3 --version
1.5.2

References

Build

We will build the following environment.

f:id:kakkotetsu:20170511000723p:plain

Deploying with GNS3

Deploy roughly as follows. (Ignore the shaded part; it is an existing environment used for interconnection experiments.)

f:id:kakkotetsu:20170511000850p:plain

For Cumulus VX, just follow the official documentation and click through the GNS3 settings.

In my environment, the following settings were sufficient.

f:id:kakkotetsu:20170511000939p:plain

f:id:kakkotetsu:20170511000952p:plain

f:id:kakkotetsu:20170511001001p:plain

kotetsu@kvm01:~$ ps aux | grep [C]umulus
root     28241  2.5  1.3 1417576 445408 pts/12 Sl+  20:32   0:20 /usr/bin/qemu-system-x86_64 -name CumulusVX_bb03 -m 512M -smp cpus=1 -enable-kvm -boot order=c -drive file=/home/kotetsu/GNS3/projects/vqfx/project-files/qemu/25f56fdc-48e7-4622-be73-bf98d5686e4e/hda_disk.qcow2,if=ide,index=0,media=disk -serial telnet:127.0.0.1:5018,server,nowait -monitor tcp:127.0.0.1:37529,server,nowait -net none -device virtio-net-pci,mac=00:37:c4:6e:4e:00,netdev=gns3-0 -netdev socket,id=gns3-0,udp=127.0.0.1:10102,localaddr=127.0.0.1:10103 -device virtio-net-pci,mac=00:37:c4:6e:4e:01,netdev=gns3-1 -netdev socket,id=gns3-1,udp=127.0.0.1:10125,localaddr=127.0.0.1:10124 -device virtio-net-pci,mac=00:37:c4:6e:4e:02,netdev=gns3-2 -netdev socket,id=gns3-2,udp=127.0.0.1:10129,localaddr=127.0.0.1:10128 -device virtio-net-pci,mac=00:37:c4:6e:4e:03,netdev=gns3-3 -netdev socket,id=gns3-3,udp=127.0.0.1:10133,localaddr=127.0.0.1:10132 -device virtio-net-pci,mac=00:37:c4:6e:4e:04,netdev=gns3-4 -netdev socket,id=gns3-4,udp=127.0.0.1:10137,localaddr=127.0.0.1:10136 -device virtio-net-pci,mac=00:37:c4:6e:4e:05

Peripheral device configuration

torSW[34]01a (Open vSwitch) configuration

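For installing Open vSwitch, please just follow the official documentation (lazy, I know). A configuration like the one below is all you need. I am using Open vSwitch this time, but anything that handles LACP and VLANs will do here, so install whatever you find easiest. (Cumulus VX works too, of course.)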

  • Common to torSW[34]01a
# ovs-vsctl --no-wait init
# ovs-vsctl add-br br0
# ovs-vsctl set bridge br0 datapath_type=netdev

# ovs-vsctl add-bond br0 bond0 ens4 ens5 lacp=active bond_mode=balance-slb other_config:lacp-time=fast
# ovs-vsctl add-port br0 ens6 tag=100
# ovs-vsctl add-port br0 ens7 tag=200
# ip link set dev br0 up
# ip link set dev ens4 up
# ip link set dev ens5 up
# ip link set dev ens6 up
# ip link set dev ens7 up
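Since torSW301a and torSW401a get the same commands, they can also be generated from a small table of bond members and access ports. A minimal Python sketch; build_ovs_commands is a hypothetical helper that only returns the command strings shown above and does not talk to Open vSwitch.

```python
def build_ovs_commands(bridge, bond, members, access_ports):
    """Return the ovs-vsctl / ip commands used above, in the same order."""
    cmds = [
        "ovs-vsctl --no-wait init",
        f"ovs-vsctl add-br {bridge}",
        f"ovs-vsctl set bridge {bridge} datapath_type=netdev",
        # LACP bond towards the MLAG spine pair, fast LACP timer as in the original
        f"ovs-vsctl add-bond {bridge} {bond} {' '.join(members)} "
        "lacp=active bond_mode=balance-slb other_config:lacp-time=fast",
    ]
    # Access ports towards the end nodes: port name -> VLAN tag
    for port, vlan in access_ports.items():
        cmds.append(f"ovs-vsctl add-port {bridge} {port} tag={vlan}")
    # Bring up the bridge, the bond members and the access ports
    for dev in [bridge, *members, *access_ports]:
        cmds.append(f"ip link set dev {dev} up")
    return cmds

if __name__ == "__main__":
    for cmd in build_ovs_commands("br0", "bond0", ["ens4", "ens5"], {"ens6": 100, "ens7": 200}):
        print("# " + cmd)
```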

node[34]1 configuration for connectivity testing

Anything that can send and receive traffic will do. (lazy)

kotetsu@node31:~$ ip a show dev ens4
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:37:c4:55:09:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.3/24 brd 192.168.1.255 scope global ens4
       valid_lft forever preferred_lft forever
    inet6 fe80::237:c4ff:fe55:901/64 scope link
       valid_lft forever preferred_lft forever
kotetsu@node41:~$ ip a show dev ens4
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:37:c4:56:b4:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.4/24 brd 192.168.1.255 scope global ens4
       valid_lft forever preferred_lft forever
    inet6 fe80::237:c4ff:fe56:b401/64 scope link
       valid_lft forever preferred_lft forever

Cumulus VX initial setup

As described in Cumulus official / Using Cumulus VX with KVM, the login account is user cumulus with password CumulusLinux!

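After that, referring to the relevant official documentation, configure the hostname, an operator user with a registered SSH key, syslog, timezone, NTP and so on to suit your environment.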

If you want the user you added to be able to run the various net commands, edit the allowed users and groups in /etc/netd.conf and apply the change (Cumulus official / Network Command Line Utility / Adding More NCLU Users or Groups).

Cumulus VX physical interface / BGP configuration

We will build something like the following.

f:id:kakkotetsu:20170511001253p:plain

Physical interfaces

Referring to Cumulus official / Interface Configuration and Management, first configure the physical interfaces needed for the BGP topology.

  • bb03
net add interface swp1 alias DEV=spine31 IF=swp1
net add interface swp1 mtu 9216
net add interface swp1 ip address 192.0.2.8/31

net add interface swp2 alias DEV=spine32 IF=swp1
net add interface swp2 mtu 9216
net add interface swp2 ip address 192.0.2.10/31

net add interface swp3 alias DEV=spine41 IF=swp1
net add interface swp3 mtu 9216
net add interface swp3 ip address 192.0.2.12/31

net add interface swp4 alias DEV=spine42 IF=swp1
net add interface swp4 mtu 9216
net add interface swp4 ip address 192.0.2.14/31

net commit
kotetsu@bb03:~$ net show interface all

       Name                        Speed      MTU  Mode           Summary
-----  --------------------------  -------  -----  -------------  ------------------------
UP     lo                          N/A      65536  Loopback       IP: 127.0.0.1/8, ::1/128
UP     eth0                        1G        1500  Mgmt           IP: 10.0.0.193/24
UP     swp1 (DEV=spine31 IF=swp1)  1G        9216  Interface/L3   IP: 192.0.2.8/31
UP     swp2 (DEV=spine32 IF=swp1)  1G        9216  Interface/L3   IP: 192.0.2.10/31
UP     swp3 (DEV=spine41 IF=swp1)  1G        9216  Interface/L3   IP: 192.0.2.12/31
UP     swp4 (DEV=spine42 IF=swp1)  1G        9216  Interface/L3   IP: 192.0.2.14/31
ADMDN  swp5                        0M        1500  NotConfigured
kotetsu@bb03:~$ cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*.intf

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0
    address 10.0.0.193/24
    gateway 10.0.0.254

auto swp1
iface swp1
    address 192.0.2.8/31
    alias DEV=spine31 IF=swp1
    mtu 9216

auto swp2
iface swp2
    address 192.0.2.10/31
    alias DEV=spine32 IF=swp1
    mtu 9216

auto swp3
iface swp3
    address 192.0.2.12/31
    alias DEV=spine41 IF=swp1
    mtu 9216

auto swp4
iface swp4
    address 192.0.2.14/31
    alias DEV=spine42 IF=swp1
    mtu 9216
  • bb04
net add interface swp1 alias DEV=spine31 IF=swp2
net add interface swp1 mtu 9216
net add interface swp1 ip address 192.0.2.136/31

net add interface swp2 alias DEV=spine32 IF=swp2
net add interface swp2 mtu 9216
net add interface swp2 ip address 192.0.2.138/31

net add interface swp3 alias DEV=spine41 IF=swp2
net add interface swp3 mtu 9216
net add interface swp3 ip address 192.0.2.140/31

net add interface swp4 alias DEV=spine42 IF=swp2
net add interface swp4 mtu 9216
net add interface swp4 ip address 192.0.2.142/31

net commit
  • spine31
net add interface swp1 alias DEV=bb03 IF=swp1
net add interface swp1 mtu 9216
net add interface swp1 ip address 192.0.2.9/31

net add interface swp2 alias DEV=bb04 IF=swp1
net add interface swp2 mtu 9216
net add interface swp2 ip address 192.0.2.137/31

net commit
  • spine32
net add interface swp1 alias DEV=bb03 IF=swp2
net add interface swp1 mtu 9216
net add interface swp1 ip address 192.0.2.11/31

net add interface swp2 alias DEV=bb04 IF=swp2
net add interface swp2 mtu 9216
net add interface swp2 ip address 192.0.2.139/31

net commit
  • spine41
net add interface swp1 alias DEV=bb03 IF=swp3
net add interface swp1 mtu 9216
net add interface swp1 ip address 192.0.2.13/31

net add interface swp2 alias DEV=bb04 IF=swp3
net add interface swp2 mtu 9216
net add interface swp2 ip address 192.0.2.141/31

net commit
  • spine42
net add interface swp1 alias DEV=bb03 IF=swp4
net add interface swp1 mtu 9216
net add interface swp1 ip address 192.0.2.15/31

net add interface swp2 alias DEV=bb04 IF=swp4
net add interface swp2 mtu 9216
net add interface swp2 ip address 192.0.2.143/31

net commit
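Every bb-spine link above is a /31 (RFC 3021), so the spine side address is always the bb side address with the lowest bit flipped. A minimal Python sketch that derives the spine side NCLU commands from the bb side; BB_SIDE, p2p_peer and spine_side_commands are hypothetical names, and the link table is copied from the configuration above.

```python
import ipaddress

def p2p_peer(addr):
    """Return the other end of a /31 point-to-point link by flipping the lowest bit."""
    iface = ipaddress.ip_interface(addr)
    if iface.network.prefixlen != 31:
        raise ValueError(f"{addr} is not a /31")
    return str(ipaddress.IPv4Address(int(iface.ip) ^ 1))

# (bb, bb port, spine, spine port, bb side address) as configured above
BB_SIDE = [
    ("bb03", "swp1", "spine31", "swp1", "192.0.2.8/31"),
    ("bb03", "swp2", "spine32", "swp1", "192.0.2.10/31"),
    ("bb03", "swp3", "spine41", "swp1", "192.0.2.12/31"),
    ("bb03", "swp4", "spine42", "swp1", "192.0.2.14/31"),
    ("bb04", "swp1", "spine31", "swp2", "192.0.2.136/31"),
    ("bb04", "swp2", "spine32", "swp2", "192.0.2.138/31"),
    ("bb04", "swp3", "spine41", "swp2", "192.0.2.140/31"),
    ("bb04", "swp4", "spine42", "swp2", "192.0.2.142/31"),
]

def spine_side_commands(links):
    """Group the spine side 'net add interface' commands per spine."""
    out = {}
    for bb, bb_port, spine, spine_port, bb_addr in links:
        out.setdefault(spine, []).extend([
            f"net add interface {spine_port} alias DEV={bb} IF={bb_port}",
            f"net add interface {spine_port} mtu 9216",
            f"net add interface {spine_port} ip address {p2p_peer(bb_addr)}/31",
        ])
    return out

if __name__ == "__main__":
    for spine, cmds in spine_side_commands(BB_SIDE).items():
        print(spine, *cmds, "net commit", sep="\n")
```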

Installing the Early Access Quagga

The default looks like the following, so install the Early Access version of Quagga as described in Cumulus official / Ethernet Virtual Private Network - EVPN / Installing the EVPN Package.

kotetsu@bb03:~$ dpkg -l quagga
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                    Version          Architecture     Description
+++-=======================-================-================-====================================================
ii  quagga                  1.0.0+cl3u7      amd64            BGP/OSPF/RIP routing daemon
kotetsu@bb03:~$ grep -E "CumulusLinux-3-early-access" /etc/apt/sources.list
#deb     http://repo3.cumulusnetworks.com/repo CumulusLinux-3-early-access cumulus
#deb-src http://repo3.cumulusnetworks.com/repo CumulusLinux-3-early-access cumulus
kotetsu@bb03:~$ sudo sed -i -e '/CumulusLinux-3-early-access/ s/^#//g' /etc/apt/sources.list
kotetsu@bb03:~$ sudo apt update
kotetsu@bb03:~$ sudo apt install -y cumulus-evpn
kotetsu@bb03:~$ sudo apt upgrade
kotetsu@bb03:~$ dpkg -l quagga
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                    Version          Architecture     Description
+++-=======================-================-================-====================================================
ii  quagga                  1.0.0+cl3eau8    amd64            BGP/OSPF/RIP routing daemon

Quagga startup configuration

The default looks like the following, so configure Quagga startup on every switch, referring to Cumulus official / Configuring Cumulus Quagga.

kotetsu@bb03:~$ grep -Ev "^#" /etc/quagga/daemons
zebra=no
bgpd=no
ospfd=no
ospf6d=no
ripd=no
ripngd=no
isisd=n

In the daemon startup configuration, change zebra and bgpd to yes:

kotetsu@bb03:~$ sudo sed -r -i -e 's/(zebra|bgpd)=no/\1=yes/g' /etc/quagga/daemons
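For reference, the sed one-liner above is equivalent to the following Python sketch. enable_daemons is a hypothetical helper; unlike the sed expression, its pattern is anchored to the start and end of each line, so a commented-out line such as #zebra=no is left alone.

```python
import re

# Non-comment part of the default /etc/quagga/daemons shown above
DAEMONS_DEFAULT = """\
zebra=no
bgpd=no
ospfd=no
ospf6d=no
ripd=no
ripngd=no
"""

def enable_daemons(text, daemons=("zebra", "bgpd")):
    """Turn 'name=no' into 'name=yes' for the given daemons only."""
    pattern = re.compile(r"^(%s)=no$" % "|".join(map(re.escape, daemons)), re.MULTILINE)
    return pattern.sub(r"\1=yes", text)

print(enable_daemons(DAEMONS_DEFAULT))
```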

Enable it at boot and start it:

kotetsu@bb03:~$ sudo systemctl enable quagga.service
kotetsu@bb03:~$ sudo systemctl start quagga.service
kotetsu@bb03:~$ sudo systemctl status quagga.service

...

   Active: active (running) since Mon 2017-03-20 10:52:25 JST; 4s ago

Mar 20 10:52:24 spine41 quagga[30608]: Starting Quagga daemons (prio:10):. zebra. bgpd.
Mar 20 10:52:24 spine41 bgpd[30631]: BGPd 1.0.0+cl3eau8 starting: vty@2605, bgp@<all>:179
Mar 20 10:52:24 spine41 zebra[30624]: client 12 says hello and bids fair to announce only bgp routes
Mar 20 10:52:24 spine41 watchquagga[30638]: watchquagga 1.0.0+cl3eau8 watching [zebra bgpd], mode [phased zebra restart]
Mar 20 10:52:24 spine41 watchquagga[30638]: bgpd state -> up : connect succeeded
Mar 20 10:52:25 spine41 watchquagga[30638]: zebra state -> up : connect succeeded
Mar 20 10:52:25 spine41 watchquagga[30638]: Watchquagga: Notifying Systemd we are up and running
Mar 20 10:52:25 spine41 quagga[30608]: Starting Quagga monitor daemon: watchquagga.
Mar 20 10:52:25 spine41 quagga[30608]: Exiting from the script
Mar 20 10:52:25 spine41 systemd[1]: Started Cumulus Linux Quagga.

eBGP configuration

  • bb03
net add loopback lo ip address 172.31.0.3/32

net add bgp autonomous-system 65000
net add bgp router-id 172.31.0.3

net add routing prefix-list ipv4 PL_LO_CLOS seq 10 permit 172.16.0.0/12 ge 32 le 32
net add routing prefix-list ipv4 PL_LO_CLOS seq 20 permit 192.0.2.0/24 ge 31 le 31
net add bgp redistribute connected

net add bgp neighbor PEER_SPINE peer-group
net add bgp neighbor PEER_SPINE prefix-list PL_LO_CLOS out
net add bgp neighbor PEER_SPINE next-hop-self

net add bgp neighbor 192.0.2.9 remote-as 65003
net add bgp neighbor 192.0.2.9 description spine31
net add bgp neighbor 192.0.2.9 peer-group PEER_SPINE

net add bgp neighbor 192.0.2.11 remote-as 65003
net add bgp neighbor 192.0.2.11 description spine32
net add bgp neighbor 192.0.2.11 peer-group PEER_SPINE

net add bgp neighbor 192.0.2.13 remote-as 65004
net add bgp neighbor 192.0.2.13 description spine41
net add bgp neighbor 192.0.2.13 peer-group PEER_SPINE

net add bgp neighbor 192.0.2.15 remote-as 65004
net add bgp neighbor 192.0.2.15 description spine42
net add bgp neighbor 192.0.2.15 peer-group PEER_SPINE
  • bb04
net add loopback lo ip address 172.31.0.4/32

net add bgp autonomous-system 65000
net add bgp router-id 172.31.0.4

net add routing prefix-list ipv4 PL_LO_CLOS seq 10 permit 172.16.0.0/12 ge 32 le 32
net add routing prefix-list ipv4 PL_LO_CLOS seq 20 permit 192.0.2.0/24 ge 31 le 31
net add bgp redistribute connected

net add bgp neighbor PEER_SPINE peer-group
net add bgp neighbor PEER_SPINE prefix-list PL_LO_CLOS out

net add bgp neighbor 192.0.2.137 remote-as 65003
net add bgp neighbor 192.0.2.137 description spine31
net add bgp neighbor 192.0.2.137 peer-group PEER_SPINE

net add bgp neighbor 192.0.2.139 remote-as 65003
net add bgp neighbor 192.0.2.139 description spine32
net add bgp neighbor 192.0.2.139 peer-group PEER_SPINE

net add bgp neighbor 192.0.2.141 remote-as 65004
net add bgp neighbor 192.0.2.141 description spine41
net add bgp neighbor 192.0.2.141 peer-group PEER_SPINE

net add bgp neighbor 192.0.2.143 remote-as 65004
net add bgp neighbor 192.0.2.143 description spine42
net add bgp neighbor 192.0.2.143 peer-group PEER_SPINE
  • spine31
net add loopback lo ip address 172.16.3.1/32

net add bgp autonomous-system 65003
net add bgp router-id 172.16.3.1

net add routing prefix-list ipv4 PL_LO_CLOS seq 10 permit 172.16.0.0/12 ge 32 le 32
net add routing prefix-list ipv4 PL_LO_CLOS seq 20 permit 192.0.2.0/24 ge 31 le 31
net add bgp redistribute connected

net add bgp neighbor PEER_BB peer-group
net add bgp neighbor PEER_BB prefix-list PL_LO_CLOS out

net add bgp neighbor 192.0.2.8 remote-as 65000
net add bgp neighbor 192.0.2.8 description bb03
net add bgp neighbor 192.0.2.8 peer-group PEER_BB

net add bgp neighbor 192.0.2.136 remote-as 65000
net add bgp neighbor 192.0.2.136 description bb04
net add bgp neighbor 192.0.2.136 peer-group PEER_BB
  • spine32
net add loopback lo ip address 172.16.3.2/32

net add bgp autonomous-system 65003
net add bgp router-id 172.16.3.2

net add routing prefix-list ipv4 PL_LO_CLOS seq 10 permit 172.16.0.0/12 ge 32 le 32
net add routing prefix-list ipv4 PL_LO_CLOS seq 20 permit 192.0.2.0/24 ge 31 le 31
net add bgp redistribute connected

net add bgp neighbor PEER_BB peer-group
net add bgp neighbor PEER_BB prefix-list PL_LO_CLOS out

net add bgp neighbor 192.0.2.10 remote-as 65000
net add bgp neighbor 192.0.2.10 description bb03
net add bgp neighbor 192.0.2.10 peer-group PEER_BB

net add bgp neighbor 192.0.2.138 remote-as 65000
net add bgp neighbor 192.0.2.138 description bb04
net add bgp neighbor 192.0.2.138 peer-group PEER_BB
  • spine41
net add loopback lo ip address 172.16.4.1/32

net add bgp autonomous-system 65004
net add bgp router-id 172.16.4.1

net add routing prefix-list ipv4 PL_LO_CLOS seq 10 permit 172.16.0.0/12 ge 32 le 32
net add routing prefix-list ipv4 PL_LO_CLOS seq 20 permit 192.0.2.0/24 ge 31 le 31
net add bgp redistribute connected

net add bgp neighbor PEER_BB peer-group
net add bgp neighbor PEER_BB prefix-list PL_LO_CLOS out

net add bgp neighbor 192.0.2.12 remote-as 65000
net add bgp neighbor 192.0.2.12 description bb03
net add bgp neighbor 192.0.2.12 peer-group PEER_BB

net add bgp neighbor 192.0.2.140 remote-as 65000
net add bgp neighbor 192.0.2.140 description bb04
net add bgp neighbor 192.0.2.140 peer-group PEER_BB
  • spine42
net add loopback lo ip address 172.16.4.2/32

net add bgp autonomous-system 65004
net add bgp router-id 172.16.4.2

net add routing prefix-list ipv4 PL_LO_CLOS seq 10 permit 172.16.0.0/12 ge 32 le 32
net add routing prefix-list ipv4 PL_LO_CLOS seq 20 permit 192.0.2.0/24 ge 31 le 31
net add bgp redistribute connected

net add bgp neighbor PEER_BB peer-group
net add bgp neighbor PEER_BB prefix-list PL_LO_CLOS out

net add bgp neighbor 192.0.2.14 remote-as 65000
net add bgp neighbor 192.0.2.14 description bb03
net add bgp neighbor 192.0.2.14 peer-group PEER_BB

net add bgp neighbor 192.0.2.142 remote-as 65000
net add bgp neighbor 192.0.2.142 description bb04
net add bgp neighbor 192.0.2.142 peer-group PEER_BB
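Because every switch uses redistribute connected, the outbound prefix-list PL_LO_CLOS is what decides which connected routes actually leave the box: only /32 loopbacks inside 172.16.0.0/12 and /31 link subnets inside 192.0.2.0/24. A simplified Python model of that matching, assuming only permit entries and the implicit deny at the end; permitted is a hypothetical helper, not part of Quagga or NCLU.

```python
import ipaddress

# PL_LO_CLOS from the configuration above: (seq, prefix, ge, le); both entries are "permit"
PL_LO_CLOS = [
    (10, "172.16.0.0/12", 32, 32),
    (20, "192.0.2.0/24", 31, 31),
]

def permitted(prefix, entries=PL_LO_CLOS):
    """First entry that covers the prefix with a length in [ge, le] permits it."""
    net = ipaddress.ip_network(prefix)
    for _seq, base, ge, le in sorted(entries):
        base_net = ipaddress.ip_network(base)
        if (net.version == base_net.version
                and net.subnet_of(base_net)
                and ge <= net.prefixlen <= le):
            return True
    return False  # implicit deny at the end of every prefix-list

# e.g. the management subnet on eth0 is redistributed as connected but filtered here
for p in ["172.31.0.3/32", "172.16.3.1/32", "192.0.2.8/31", "10.0.0.0/24"]:
    print(p, "permit" if permitted(p) else "deny")
```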

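By the way... when I idly pressed Tab while typing the neighbor configuration, it listed each physical interface together with the neighboring device learned via LLDP... impressive...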

kotetsu@bb03:~$ net add bgp neighbor
    <bgppeer>          :  BGP neighbor or peer-group
    <interface>        :  An interface name "swp1" or glob "swp1-4,6,10-12"
    <ip>               :  An IPv4 or IPv6 Address
    <text-peer-group>  :  A BGP peer-group name
    eth0               :  LLDP peer spine41
    lo                 :  interface
    swp1               :  LLDP peer spine31
    swp2               :  LLDP peer spine32
    swp3               :  LLDP peer spine41
    swp4               :  LLDP peer spine42
    swp5               :  interface

Cumulus VX MLAG configuration

Referring to Cumulus official / Multi-Chassis Link Aggregation - MLAG.

First, configure the LAG used for MLAG, referring to Cumulus official / Bonding - Link Aggregation. Just the minimum needed to bring it up...

  • spine31
net add bond bond0 bond slaves swp3-4
net add bond bond0 alias DEV=spine32 IF=bond0

net add interface bond0.4094 alias MLAG DEDICATED
net add interface bond0.4094 ip address 198.51.100.1/30
net add interface bond0.4094 clag peer-ip 198.51.100.2
net add interface bond0.4094 clag sys-mac 44:38:39:FF:40:94
  • spine32
net add bond bond0 bond slaves swp3-4
net add bond bond0 alias DEV=spine31 IF=bond0

net add interface bond0.4094 alias MLAG DEDICATED
net add interface bond0.4094 ip address 198.51.100.2/30
net add interface bond0.4094 clag peer-ip 198.51.100.1
net add interface bond0.4094 clag sys-mac 44:38:39:FF:40:94
  • spine41
net add bond bond0 bond slaves swp3-4
net add bond bond0 alias DEV=spine42 IF=bond0

net add interface bond0.4094 alias MLAG DEDICATED
net add interface bond0.4094 ip address 198.51.100.1/30
net add interface bond0.4094 clag peer-ip 198.51.100.2
net add interface bond0.4094 clag sys-mac 44:38:39:FF:40:94
  • spine42
net add bond bond0 bond slaves swp3-4
net add bond bond0 alias DEV=spine41 IF=bond0

net add interface bond0.4094 alias MLAG DEDICATED
net add interface bond0.4094 ip address 198.51.100.2/30
net add interface bond0.4094 clag peer-ip 198.51.100.1
net add interface bond0.4094 clag sys-mac 44:38:39:FF:40:94
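Within each pair, the peer-link addresses must sit in the same subnet, each peer-ip must point at the other switch, and both switches must share the same clag sys-mac. A minimal Python sketch of that consistency check; check_clag_pair and the dictionary keys are hypothetical and simply mirror the values configured above.

```python
import ipaddress

def check_clag_pair(a, b):
    """Return a list of problems found between two MLAG peer configs (empty = OK)."""
    problems = []
    a_if = ipaddress.ip_interface(a["address"])
    b_if = ipaddress.ip_interface(b["address"])
    if a_if.network != b_if.network:
        problems.append("peer-link addresses are not in the same subnet")
    if a["peer_ip"] != str(b_if.ip) or b["peer_ip"] != str(a_if.ip):
        problems.append("clag peer-ip does not point at the other switch")
    if a["sys_mac"].lower() != b["sys_mac"].lower():
        problems.append("clag sys-mac differs between the peers")
    return problems

# bond0.4094 settings of spine31 / spine32 from above
SPINE31 = {"address": "198.51.100.1/30", "peer_ip": "198.51.100.2", "sys_mac": "44:38:39:FF:40:94"}
SPINE32 = {"address": "198.51.100.2/30", "peer_ip": "198.51.100.1", "sys_mac": "44:38:39:FF:40:94"}

print(check_clag_pair(SPINE31, SPINE32) or "OK")
```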

MLAG should now be up, like this.

kotetsu@spine31:~$ net show clag status
The peer is alive
    Peer Priority, ID, and Role: 32768 00:37:c4:a9:0f:03 primary
     Our Priority, ID, and Role: 32768 00:37:c4:f8:17:03 secondary
          Peer Interface and IP: bond0.4094 198.51.100.2
                      Backup IP:  (inactive)
                     System MAC: 44:38:39:ff:40:94
kotetsu@spine32:~$ net show clag status
The peer is alive
     Our Priority, ID, and Role: 32768 00:37:c4:a9:0f:03 primary
    Peer Priority, ID, and Role: 32768 00:37:c4:f8:17:03 secondary
          Peer Interface and IP: bond0.4094 198.51.100.1
                      Backup IP:  (inactive)
                     System MAC: 44:38:39:ff:40:94
  • spine3[12]
net add bond bond1 bond slaves swp5
net add bond bond1 alias DEV=torSW301a IF=bond0
net add bond bond1 mtu 9000
net add bond bond1 clag id 1
  • spine4[12]
net add bond bond1 bond slaves swp5
net add bond bond1 alias DEV=torSW401a IF=bond0
net add bond bond1 mtu 9000
net add bond bond1 clag id 1

Bridge configuration

On all of spine[34][12], again referring to Cumulus official / VLAN-aware Bridge Mode for Large-scale Layer 2 Environments:

net add bridge bridge ports bond0
net add bridge bridge ports bond1
net add bridge bridge vids 2-4093

Checking the LACP state on torSW

root@torSW301a:~# ovs-appctl lacp/show bond0
---- bond0 ----
        status: active negotiated
        sys_id: 00:37:c4:7e:e0:01
        sys_priority: 65534
        aggregation key: 1
        lacp_time: fast

slave: ens4: current attached
        port_id: 2
        port_priority: 65535
        may_enable: true

        actor sys_id: 00:37:c4:7e:e0:01
        actor sys_priority: 65534
        actor port_id: 2
        actor port_priority: 65535
        actor key: 1
        actor state: activity timeout aggregation synchronized collecting distributing

        partner sys_id: 44:38:39:ff:40:94
        partner sys_priority: 65535
        partner port_id: 1
        partner port_priority: 255
        partner key: 9
        partner state: activity timeout aggregation synchronized collecting distributing

slave: ens5: current attached
        port_id: 1
        port_priority: 65535
        may_enable: true

        actor sys_id: 00:37:c4:7e:e0:01
        actor sys_priority: 65534
        actor port_id: 1
        actor port_priority: 65535
        actor key: 1
        actor state: activity timeout aggregation synchronized collecting distributing

        partner sys_id: 44:38:39:ff:40:94
        partner sys_priority: 65535
        partner port_id: 1
        partner port_priority: 255
        partner key: 9
        partner state: activity timeout aggregation synchronized collecting distributing
root@torSW401a:~# ovs-appctl lacp/show bond0
---- bond0 ----
        status: active negotiated
        sys_id: 00:37:c4:2c:e5:01
        sys_priority: 65534
        aggregation key: 1
        lacp_time: fast

slave: ens4: current attached
        port_id: 1
        port_priority: 65535
        may_enable: true

        actor sys_id: 00:37:c4:2c:e5:01
        actor sys_priority: 65534
        actor port_id: 1
        actor port_priority: 65535
        actor key: 1
        actor state: activity timeout aggregation synchronized collecting distributing

        partner sys_id: 44:38:39:ff:40:94
        partner sys_priority: 65535
        partner port_id: 1
        partner port_priority: 255
        partner key: 9
        partner state: activity timeout aggregation synchronized collecting distributing

slave: ens5: current attached
        port_id: 2
        port_priority: 65535
        may_enable: true

        actor sys_id: 00:37:c4:2c:e5:01
        actor sys_priority: 65534
        actor port_id: 2
        actor port_priority: 65535
        actor key: 1
        actor state: activity timeout aggregation synchronized collecting distributing

        partner sys_id: 44:38:39:ff:40:94
        partner sys_priority: 65535
        partner port_id: 1
        partner port_priority: 255
        partner key: 9
        partner state: activity timeout aggregation synchronized collecting distributing
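The point to check in the outputs above is that both bond members report the same partner sys_id, which equals the clag sys-mac 44:38:39:ff:40:94, so the two spines look like a single LACP partner to the ToR. A minimal Python sketch that extracts this from ovs-appctl lacp/show output; lacp_partners and single_mlag_partner are hypothetical helpers, and the sample is a trimmed excerpt of the torSW301a output.

```python
import re

# Trimmed excerpt of "ovs-appctl lacp/show bond0" on torSW301a
SAMPLE = """\
---- bond0 ----
        status: active negotiated

slave: ens4: current attached
        actor sys_id: 00:37:c4:7e:e0:01
        partner sys_id: 44:38:39:ff:40:94

slave: ens5: current attached
        actor sys_id: 00:37:c4:7e:e0:01
        partner sys_id: 44:38:39:ff:40:94
"""

def lacp_partners(text):
    """Map each bond member to the partner sys_id it negotiated with."""
    partners, slave = {}, None
    for line in text.splitlines():
        m = re.match(r"slave: ([^:\s]+):", line)
        if m:
            slave = m.group(1)
            continue
        m = re.match(r"\s+partner sys_id: (\S+)", line)
        if m and slave:
            partners[slave] = m.group(1)
    return partners

def single_mlag_partner(text, expected_mac):
    """True if every member sees the same partner, and it is the clag sys-mac."""
    macs = {mac.lower() for mac in lacp_partners(text).values()}
    return macs == {expected_mac.lower()}

print(single_mlag_partner(SAMPLE, "44:38:39:FF:40:94"))
```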

Cumulus VX VXLAN+EVPN configuration

Virtual IP address for each virtual VTEP

  • spine3[12]
net add loopback lo clag vxlan-anycast-ip 172.16.3.100
  • spine4[12]
net add loopback lo clag vxlan-anycast-ip 172.16.4.100

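In this environment, connected routes are redistributed into BGP IPv4, and this address also matches the outbound prefix-list (172.16.0.0/12 ge 32 le 32), so the virtual IP address should now be advertised as well.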

kotetsu@bb03:~$ net show route

show ip route
=============
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, P - PIM, T - Table, v - VNC,
       V - VPN,
       > - selected route, * - FIB route

K>* 0.0.0.0/0 via 10.0.0.254, eth0
C>* 10.0.0.0/24 is directly connected, eth0
B>* 172.16.3.1/32 [20/0] via 192.0.2.9, swp1, 08:58:27
B>* 172.16.3.2/32 [20/0] via 192.0.2.11, swp2, 08:57:25
B>* 172.16.3.100/32 [20/0] via 192.0.2.9, swp1, 00:28:29
  *                        via 192.0.2.11, swp2, 00:28:29
B>* 172.16.4.1/32 [20/0] via 192.0.2.13, swp3, 08:56:25
B>* 172.16.4.2/32 [20/0] via 192.0.2.15, swp4, 08:56:10
B>* 172.16.4.100/32 [20/0] via 192.0.2.15, swp4, 00:27:45
  *                        via 192.0.2.13, swp3, 00:27:45
C>* 172.31.0.3/32 is directly connected, lo
C>* 192.0.2.8/31 is directly connected, swp1
C>* 192.0.2.10/31 is directly connected, swp2
C>* 192.0.2.12/31 is directly connected, swp3
C>* 192.0.2.14/31 is directly connected, swp4
B>* 192.0.2.136/31 [20/0] via 192.0.2.9, swp1, 08:58:27
B>* 192.0.2.138/31 [20/0] via 192.0.2.11, swp2, 08:57:25
B>* 192.0.2.140/31 [20/0] via 192.0.2.13, swp3, 08:56:25
B>* 192.0.2.142/31 [20/0] via 192.0.2.15, swp4, 08:56:10
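In the routing table above, the anycast VTEP addresses 172.16.3.100/32 and 172.16.4.100/32 are the only BGP routes with two next hops, because both switches of an MLAG pair advertise them, while each per-switch loopback has a single next hop. A minimal Python sketch that picks out such ECMP routes from the show ip route output; ecmp_routes is a hypothetical helper and the sample is an excerpt of the bb03 output.

```python
import re

# Excerpt of "net show route" on bb03
SAMPLE = """\
K>* 0.0.0.0/0 via 10.0.0.254, eth0
C>* 10.0.0.0/24 is directly connected, eth0
B>* 172.16.3.1/32 [20/0] via 192.0.2.9, swp1, 08:58:27
B>* 172.16.3.100/32 [20/0] via 192.0.2.9, swp1, 00:28:29
  *                        via 192.0.2.11, swp2, 00:28:29
B>* 172.16.4.100/32 [20/0] via 192.0.2.15, swp4, 00:27:45
  *                        via 192.0.2.13, swp3, 00:27:45
"""

ROUTE_RE = re.compile(r"^[A-Z]>\*\s+(\S+).*?\bvia (\S+),")  # first line of a route
NEXTHOP_RE = re.compile(r"^\s+\*\s+via (\S+),")              # additional ECMP next hop

def ecmp_routes(text):
    """Return {prefix: [next hops]} for routes installed with more than one next hop."""
    routes, current = {}, None
    for line in text.splitlines():
        m = ROUTE_RE.match(line)
        if m:
            current = m.group(1)
            routes[current] = [m.group(2)]
            continue
        m = NEXTHOP_RE.match(line)
        if m and current:
            routes[current].append(m.group(1))
        else:
            current = None  # connected route or anything else ends the current entry
    return {p: hops for p, hops in routes.items() if len(hops) > 1}

print(ecmp_routes(SAMPLE))
```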

VXLAN VNI configuration on all spines

  • spine[34][12]
net add vxlan vxlan010100 vxlan id 10100
net add vxlan vxlan010100 bridge access 100

net add vxlan vxlan010200 vxlan id 10200
net add vxlan vxlan010200 bridge access 200

When you run net commit, these vxlan interfaces are automatically added to the bridge.

kotetsu@spine42:~$ net commit
--- /etc/network/interfaces     2017-03-20 22:07:22.297455993 +0900
+++ /var/run/nclu/iface/interfaces.tmp  2017-03-20 22:17:54.341064283 +0900

...

 iface bridge
-    bridge-ports bond0 bond1
+    bridge-ports bond0 bond1 vxlan010100 vxlan010200

...

Assigning the VXLAN tunnel IP address on all spines

  • spine31
net add vxlan vxlan010100 vxlan local-tunnelip 172.16.3.1
net add vxlan vxlan010200 vxlan local-tunnelip 172.16.3.1
  • spine32
net add vxlan vxlan010100 vxlan local-tunnelip 172.16.3.2
net add vxlan vxlan010200 vxlan local-tunnelip 172.16.3.2
  • spine41
net add vxlan vxlan010100 vxlan local-tunnelip 172.16.4.1
net add vxlan vxlan010200 vxlan local-tunnelip 172.16.4.1
  • spine42
net add vxlan vxlan010100 vxlan local-tunnelip 172.16.4.2
net add vxlan vxlan010200 vxlan local-tunnelip 172.16.4.2
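The VNI and tunnel IP commands above follow a fixed pattern per spine: the interface name appears to be "vxlan" plus the VNI zero-padded to six digits (vxlan010100 for VNI 10100), and local-tunnelip is the spine's own loopback, not the shared anycast IP. A minimal Python sketch that generates them; vxlan_commands and SPINE_TUNNEL_IP are hypothetical names, and the naming rule is inferred from the two examples above.

```python
def vxlan_commands(vlan_to_vni, local_tunnel_ip):
    """Generate the NCLU commands for each VLAN -> VNI mapping on one spine."""
    cmds = []
    for vlan, vni in vlan_to_vni.items():
        name = f"vxlan{vni:06d}"  # assumed naming rule: VNI 10100 -> vxlan010100
        cmds += [
            f"net add vxlan {name} vxlan id {vni}",
            f"net add vxlan {name} bridge access {vlan}",
            f"net add vxlan {name} vxlan local-tunnelip {local_tunnel_ip}",
        ]
    return cmds

# Each spine's own loopback (not the clag vxlan-anycast-ip)
SPINE_TUNNEL_IP = {
    "spine31": "172.16.3.1",
    "spine32": "172.16.3.2",
    "spine41": "172.16.4.1",
    "spine42": "172.16.4.2",
}

if __name__ == "__main__":
    for spine, ip in SPINE_TUNNEL_IP.items():
        print(spine, *vxlan_commands({100: 10100, 200: 10200}, ip), sep="\n")
```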

Enabling and configuring EVPN

Configure it as described in Cumulus official / Ethernet Virtual Private Network - EVPN / Configuring EVPN. As far as I could find, the Early Access features (available only in the Quagga CLI) do not have net commands yet, so configure them through Quagga as before.

kotetsu@spine41:~$ sudo vtysh

Hello, this is Quagga (version 1.0.0+cl3eau8).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

spine41#
spine41# configure terminal
spine41(config)# router bgp 65004
spine41(config-router)# address-family evpn
spine41(config-router-af)# neighbor PEER_BB activate
spine41(config-router-af)# advertise-all-vni
spine41(config-router-af)# end
spine41# write memory
Note: this version of vtysh never writes vtysh.conf
Building Configuration...
Integrated configuration saved to /etc/quagga/Quagga.conf
[OK]
spine41#
spine41# exit
kotetsu@spine41:~$

Apply configuration like the following on each switch:

  • bb0[34]
router bgp 65000
 address-family evpn
   neighbor PEER_SPINE activate
  • spine3[12]
router bgp 65003
 address-family evpn
   neighbor PEER_BB activate
   advertise-all-vni
  • spine4[12]
router bgp 65004
 address-family evpn
   neighbor PEER_BB activate
   advertise-all-vni
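The only difference between the roles is that the spines, which are the VTEPs, also get advertise-all-vni so that their local VNIs are advertised, while bb0[34] just activate the EVPN address family towards the spines. A minimal Python sketch that renders the stanzas above per role; evpn_stanza and ROLES are hypothetical names, and the indentation copies the snippets above.

```python
def evpn_stanza(asn, peer_group, advertise_local_vnis):
    """Render the Quagga 'address-family evpn' stanza used above."""
    lines = [
        f"router bgp {asn}",
        " address-family evpn",
        f"   neighbor {peer_group} activate",
    ]
    if advertise_local_vnis:  # only on the VTEPs (spines)
        lines.append("   advertise-all-vni")
    return "\n".join(lines)

# role -> (ASN, peer-group, is VTEP), as configured in this article
ROLES = {
    "bb0[34]": (65000, "PEER_SPINE", False),
    "spine3[12]": (65003, "PEER_BB", True),
    "spine4[12]": (65004, "PEER_BB", True),
}

for role, args in ROLES.items():
    print(f"! {role}\n{evpn_stanza(*args)}")
```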

Disabling Data Plane MAC Learning over VXLAN Tunnels

On spine[34][12], edit /etc/network/interfaces and add bridge-learning off to every vxlan interface.

kotetsu@spine31:~$ diff -u /var/tmp/etc_network_interfaces /etc/network/interfaces
--- /var/tmp/etc_network_interfaces     2017-03-20 23:19:45.046311072 +0900
+++ /etc/network/interfaces     2017-03-20 23:20:33.701311345 +0900
@@ -64,6 +64,7 @@
 auto vxlan010100
 iface vxlan010100
     bridge-access 100
+    bridge-learning off
     mstpctl-bpduguard yes
     mstpctl-portbpdufilter yes
     vxlan-id 10100
@@ -72,6 +73,7 @@
 auto vxlan010200
 iface vxlan010200
     bridge-access 200
+    bridge-learning off
     mstpctl-bpduguard yes
     mstpctl-portbpdufilter yes
     vxlan-id 10200
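With many VNIs, hand-editing every stanza gets tedious. A minimal Python sketch that adds bridge-learning off to every iface vxlan* stanza that does not have it yet; disable_vxlan_learning is a hypothetical helper, it assumes stanzas are separated by blank lines as in the file above, and it places the attribute right after the iface line (as far as I know, attribute order inside an ifupdown2 stanza does not matter). Running it twice changes nothing.

```python
import re

# Small excerpt in the style of /etc/network/interfaces above
SAMPLE = """\
auto swp1
iface swp1
    address 192.0.2.8/31
    mtu 9216

auto vxlan010100
iface vxlan010100
    bridge-access 100
    vxlan-id 10100

auto vxlan010200
iface vxlan010200
    bridge-access 200
    vxlan-id 10200
"""

def disable_vxlan_learning(text):
    """Add 'bridge-learning off' to each vxlan stanza that lacks a bridge-learning line."""
    blocks = text.split("\n\n")
    for i, block in enumerate(blocks):
        if re.search(r"^iface vxlan\S*", block, re.MULTILINE) and "bridge-learning" not in block:
            blocks[i] = re.sub(r"^(iface vxlan\S*)$", r"\1\n    bridge-learning off",
                               block, count=1, flags=re.MULTILINE)
    return "\n\n".join(blocks)

print(disable_vxlan_learning(SAMPLE))
```

After editing the real file, the change still has to be applied, for example with sudo ifreload -a (ifupdown2).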

Verification

Connectivity check

End-to-end connectivity check (L2 over L3)

kotetsu@node31:~$ ping 192.168.1.4
PING 192.168.1.4 (192.168.1.4) 56(84) bytes of data.
64 bytes from 192.168.1.4: icmp_seq=1 ttl=64 time=4.67 ms
64 bytes from 192.168.1.4: icmp_seq=2 ttl=64 time=1.86 ms
64 bytes from 192.168.1.4: icmp_seq=3 ttl=64 time=1.81 ms
64 bytes from 192.168.1.4: icmp_seq=4 ttl=64 time=1.94 ms
64 bytes from 192.168.1.4: icmp_seq=5 ttl=64 time=2.07 ms
64 bytes from 192.168.1.4: icmp_seq=6 ttl=64 time=1.24 ms
64 bytes from 192.168.1.4: icmp_seq=7 ttl=64 time=1.78 ms
^C
--- 192.168.1.4 ping statistics ---
7 packets transmitted, 7 received, 0% packet loss, time 6009ms
rtt min/avg/max/mdev = 1.241/2.199/4.677/1.041 ms
kotetsu@node31:~$ ip n show
192.168.1.4 dev ens4 lladdr 00:37:c4:56:b4:01 STALE
kotetsu@node41:~$ ip n show
192.168.1.3 dev ens4 lladdr 00:37:c4:55:09:01 STALE

Checking the various Cumulus VX tables

Cumulus official / Ethernet Virtual Private Network - EVPN / Output Commands lists a variety of show commands, so let's go through them.

Spine MAC address tables

First, the MAC address tables of the spines, which act as VTEPs and EVPN PEs.

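  • The TunnelDest column shows that the shared loopback (anycast) IP address of the remote VTEP pair is used
  • Entries whose MAC is 00:00:00:00:00:00 are apparently for BUM traffic replication (according to the official documentation)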
kotetsu@spine31:~$ net show bridge macs

VLAN      Master    Interface    MAC                TunnelDest    State      Flags    LastSeen
--------  --------  -----------  -----------------  ------------  ---------  -------  ----------
100       bridge    bond1        00:37:c4:55:09:01                                    00:01:43
100       bridge    vxlan010100  00:37:c4:56:b4:01                                    00:03:52
untagged            vxlan010100  00:00:00:00:00:00  172.16.4.100  permanent  self     01:08:33
untagged            vxlan010100  00:37:c4:56:b4:01  172.16.4.100             self     00:03:58
untagged            vxlan010200  00:00:00:00:00:00  172.16.4.100  permanent  self     01:08:33
untagged  bridge    bond0        00:37:c4:f8:17:03                permanent           05:38:27
untagged  bridge    bond1        00:37:c4:f8:17:05                permanent           03:42:10
untagged  bridge    vxlan010100  a6:21:d1:0c:20:a8                permanent           02:20:01
untagged  bridge    vxlan010200  de:8e:ed:62:05:12                permanent           02:20:01
kotetsu@spine32:~$ net show bridge macs

VLAN      Master    Interface    MAC                TunnelDest    State      Flags    LastSeen
--------  --------  -----------  -----------------  ------------  ---------  -------  ----------
100       bridge    bond1        00:37:c4:55:09:01                                    00:04:45
100       bridge    vxlan010100  00:37:c4:56:b4:01                                    00:04:51
untagged            vxlan010100  00:00:00:00:00:00  172.16.4.100  permanent  self     01:09:25
untagged            vxlan010100  00:37:c4:56:b4:01  172.16.4.100             self     00:04:51
untagged            vxlan010200  00:00:00:00:00:00  172.16.4.100  permanent  self     01:09:25
untagged  bridge    bond0        00:37:c4:a9:0f:03                permanent           05:38:42
untagged  bridge    bond1        00:37:c4:a9:0f:05                permanent           03:42:00
untagged  bridge    vxlan010100  ea:60:31:c9:77:63                permanent           02:18:00
untagged  bridge    vxlan010200  06:f9:9e:92:a4:c0                permanent           02:18:00
kotetsu@spine41:~$ net show bridge macs

VLAN      Master    Interface    MAC                TunnelDest    State      Flags    LastSeen
--------  --------  -----------  -----------------  ------------  ---------  -------  ----------
100       bridge    bond1        00:37:c4:56:b4:01                                    00:01:43
100       bridge    vxlan010100  00:37:c4:55:09:01                                    00:01:49
untagged            vxlan010100  00:00:00:00:00:00  172.16.3.100  permanent  self     01:06:24
untagged            vxlan010100  00:37:c4:55:09:01  172.16.3.100             self     00:01:49
untagged            vxlan010200  00:00:00:00:00:00  172.16.3.100  permanent  self     01:06:24
untagged  bridge    bond0        00:37:c4:fe:34:03                permanent           05:34:25
untagged  bridge    bond1        00:37:c4:fe:34:05                permanent           03:35:31
untagged  bridge    vxlan010100  46:bf:75:c3:83:e3                permanent           02:14:34
untagged  bridge    vxlan010200  ca:4e:29:fd:d9:8e                permanent           02:14:34
kotetsu@spine42:~$ net show bridge macs

VLAN      Master    Interface    MAC                TunnelDest    State      Flags    LastSeen
--------  --------  -----------  -----------------  ------------  ---------  -------  ----------
100       bridge    bond1        00:37:c4:56:b4:01                                    00:00:46
100       bridge    vxlan010100  00:37:c4:55:09:01                                    00:02:49
untagged            vxlan010100  00:00:00:00:00:00  172.16.3.100  permanent  self     01:07:29
untagged            vxlan010100  00:37:c4:55:09:01  172.16.3.100             self     00:02:55
untagged            vxlan010200  00:00:00:00:00:00  172.16.3.100  permanent  self     01:07:29
untagged  bridge    bond0        00:37:c4:32:db:03                permanent           05:35:24
untagged  bridge    bond1        00:37:c4:32:db:05                permanent           03:36:18
untagged  bridge    vxlan010100  9e:e4:df:d2:a9:3a                permanent           02:15:20
untagged  bridge    vxlan010200  6a:3a:0d:08:fb:9e                permanent           02:15:20
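The `net show bridge macs` tables above contain three kinds of rows:

- locally learned MACs, which have no TunnelDest
- all-zero MAC entries, which name the remote VTEP for BUM flooding
- host MACs learned from the remote VTEP

The sketch below extracts the remote host MACs and the VTEP each one sits behind. It assumes only the row layout shown above: a MAC column and an IPv4 TunnelDest column. `remote_macs` is a hypothetical helper, not part of NCLU.

```python
import re
from typing import Dict, Tuple

MAC_RE = re.compile(r"\b(?:[0-9a-f]{2}:){5}[0-9a-f]{2}\b")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
ZERO_MAC = "00:00:00:00:00:00"


def remote_macs(table: str) -> Dict[str, Tuple[str, str]]:
    """Map each remote host MAC to (VXLAN interface, remote VTEP IP).

    Assumes NCLU `net show bridge macs` text as shown above. Columns are
    located by pattern rather than by fixed character offsets.
    """
    result: Dict[str, Tuple[str, str]] = {}
    for line in table.splitlines():
        mac = MAC_RE.search(line)
        vtep = IPV4_RE.search(line)
        if mac is None or vtep is None:
            continue  # header, separator, or a local row without TunnelDest
        if mac.group(0) == ZERO_MAC:
            continue  # BUM flood entry pointing at a remote VTEP, not a host
        iface = next((tok for tok in line.split() if tok.startswith("vxlan")), "?")
        result[mac.group(0)] = (iface, vtep.group(0))
    return result


# Rows copied from spine41's output above
SAMPLE = """\
VLAN      Master    Interface    MAC                TunnelDest    State      Flags    LastSeen
100       bridge    bond1        00:37:c4:56:b4:01                                    00:01:43
100       bridge    vxlan010100  00:37:c4:55:09:01                                    00:01:49
untagged            vxlan010100  00:00:00:00:00:00  172.16.3.100  permanent  self     01:06:24
untagged            vxlan010100  00:37:c4:55:09:01  172.16.3.100             self     00:01:49
"""

print(remote_macs(SAMPLE))
```

On spine41 this yields one remote host MAC, reachable via the spine3[12] shared VTEP 172.16.3.100.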

Advertised VNI and VTEP information

From sudo vtysh:

spine31# show bgp evpn vni
Advertise All VNI flag: Enabled
Number of VNIs: 2
Flags: * - Kernel
  VNI        Orig IP         RD                    Import RT             Export RT
* 10200      172.16.3.100    172.16.3.1:10200      65003:10200           65003:10200
* 10100      172.16.3.100    172.16.3.1:10100      65003:10100           65003:10100


spine31# show evpn vni
Number of VNIs: 2
VNI        VxLAN IF              VTEP IP         # MACs   Remote VTEPs
10200      vxlan010200           172.16.3.100    0        172.16.4.100
10100      vxlan010100           172.16.3.100    2        172.16.4.100
spine32# show bgp evpn vni
Advertise All VNI flag: Enabled
Number of VNIs: 2
Flags: * - Kernel
  VNI        Orig IP         RD                    Import RT             Export RT
* 10200      172.16.3.100    172.16.3.2:10200      65003:10200           65003:10200
* 10100      172.16.3.100    172.16.3.2:10100      65003:10100           65003:10100


spine32# show evpn vni
Number of VNIs: 2
VNI        VxLAN IF              VTEP IP         # MACs   Remote VTEPs
10200      vxlan010200           172.16.3.100    0        172.16.4.100
10100      vxlan010100           172.16.3.100    2        172.16.4.100
spine41# show bgp evpn vni
Advertise All VNI flag: Enabled
Number of VNIs: 2
Flags: * - Kernel
  VNI        Orig IP         RD                    Import RT             Export RT
* 10200      172.16.4.100    172.16.4.1:10200      65004:10200           65004:10200
* 10100      172.16.4.100    172.16.4.1:10100      65004:10100           65004:10100


spine41# show evpn vni
Number of VNIs: 2
VNI        VxLAN IF              VTEP IP         # MACs   Remote VTEPs
10200      vxlan010200           172.16.4.100    0        172.16.3.100
10100      vxlan010100           172.16.4.100    2        172.16.3.100
spine42# show bgp evpn vni
Advertise All VNI flag: Enabled
Number of VNIs: 2
Flags: * - Kernel
  VNI        Orig IP         RD                    Import RT             Export RT
* 10200      172.16.4.100    172.16.4.2:10200      65004:10200           65004:10200
* 10100      172.16.4.100    172.16.4.2:10100      65004:10100           65004:10100


spine42# show evpn vni
Number of VNIs: 2
VNI        VxLAN IF              VTEP IP         # MACs   Remote VTEPs
10200      vxlan010200           172.16.4.100    0        172.16.3.100
10100      vxlan010100           172.16.4.100    2        172.16.3.100

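EVPN learned routes

I haven't configured anything to accept routes from the other spine in the same AS via bb. spine3[12] share AS 65003, so such routes come back with 65003 in the AS path. eBGP loop prevention drops them unless something like allowas-in is configured, which is why those RDs don't appear either. MAC learning within the AS should be synchronized properly by MLAG, so I think that's fine. Also, no Type 1 or Type 4 routes appear at all; those are needed when using EVPN Multihoming.

From sudo vtysh: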

spine31# show bgp evpn route
BGP table version is 0, local router ID is 172.16.3.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 172.16.3.1:10100
*> [2]:[0]:[0]:[48]:[00:37:c4:55:09:01]
                    172.16.3.100                       32768 i
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                       32768 i
Route Distinguisher: 172.16.3.1:10200
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                       32768 i
Route Distinguisher: 172.16.4.1:10100
*  [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                           0 65000 65004 i
*> [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                           0 65000 65004 i
*  [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i
Route Distinguisher: 172.16.4.1:10200
*  [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i
Route Distinguisher: 172.16.4.2:10100
*  [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                           0 65000 65004 i
*> [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                           0 65000 65004 i
*  [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i
Route Distinguisher: 172.16.4.2:10200
*  [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i

Displayed 9 prefixes (15 paths)
spine32# show bgp evpn route
BGP table version is 0, local router ID is 172.16.3.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 172.16.3.2:10100
*> [2]:[0]:[0]:[48]:[00:37:c4:55:09:01]
                    172.16.3.100                       32768 i
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                       32768 i
Route Distinguisher: 172.16.3.2:10200
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                       32768 i
Route Distinguisher: 172.16.4.1:10100
*  [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                           0 65000 65004 i
*> [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                           0 65000 65004 i
*  [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i
Route Distinguisher: 172.16.4.1:10200
*  [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i
Route Distinguisher: 172.16.4.2:10100
*> [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                           0 65000 65004 i
*  [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                           0 65000 65004 i
*  [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i
Route Distinguisher: 172.16.4.2:10200
*  [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i

Displayed 9 prefixes (15 paths)
spine41# show bgp evpn route
BGP table version is 0, local router ID is 172.16.4.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 172.16.3.1:10100
*  [2]:[0]:[0]:[48]:[00:37:c4:55:09:01]
                    172.16.3.100                           0 65000 65003 i
*> [2]:[0]:[0]:[48]:[00:37:c4:55:09:01]
                    172.16.3.100                           0 65000 65003 i
*  [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
Route Distinguisher: 172.16.3.1:10200
*  [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
Route Distinguisher: 172.16.3.2:10100
*  [2]:[0]:[0]:[48]:[00:37:c4:55:09:01]
                    172.16.3.100                           0 65000 65003 i
*> [2]:[0]:[0]:[48]:[00:37:c4:55:09:01]
                    172.16.3.100                           0 65000 65003 i
*  [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
Route Distinguisher: 172.16.3.2:10200
*  [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
Route Distinguisher: 172.16.4.1:10100
*> [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                       32768 i
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                       32768 i
Route Distinguisher: 172.16.4.1:10200
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                       32768 i

Displayed 9 prefixes (15 paths)
spine42# show bgp evpn route
BGP table version is 0, local router ID is 172.16.4.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 172.16.3.1:10100
*  [2]:[0]:[0]:[48]:[00:37:c4:55:09:01]
                    172.16.3.100                           0 65000 65003 i
*> [2]:[0]:[0]:[48]:[00:37:c4:55:09:01]
                    172.16.3.100                           0 65000 65003 i
*  [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
Route Distinguisher: 172.16.3.1:10200
*  [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
Route Distinguisher: 172.16.3.2:10100
*  [2]:[0]:[0]:[48]:[00:37:c4:55:09:01]
                    172.16.3.100                           0 65000 65003 i
*> [2]:[0]:[0]:[48]:[00:37:c4:55:09:01]
                    172.16.3.100                           0 65000 65003 i
*  [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
Route Distinguisher: 172.16.3.2:10200
*  [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65000 65003 i
Route Distinguisher: 172.16.4.2:10100
*> [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                       32768 i
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                       32768 i
Route Distinguisher: 172.16.4.2:10200
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                       32768 i

Displayed 9 prefixes (15 paths)

bb doesn't touch VXLAN at all and acts purely as a forwarding pipe, but it still takes part in the MP-BGP used for EVPN signaling.

bb03# show bgp evpn route
BGP table version is 0, local router ID is 172.31.0.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 172.16.3.1:10100
*> [2]:[0]:[0]:[48]:[00:37:c4:55:09:01]
                    172.16.3.100                           0 65003 i
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65003 i
Route Distinguisher: 172.16.3.1:10200
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65003 i
Route Distinguisher: 172.16.3.2:10100
*> [2]:[0]:[0]:[48]:[00:37:c4:55:09:01]
                    172.16.3.100                           0 65003 i
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65003 i
Route Distinguisher: 172.16.3.2:10200
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65003 i
Route Distinguisher: 172.16.4.1:10100
*> [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                           0 65004 i
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65004 i
Route Distinguisher: 172.16.4.1:10200
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65004 i
Route Distinguisher: 172.16.4.2:10100
*> [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                           0 65004 i
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65004 i
Route Distinguisher: 172.16.4.2:10200
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65004 i

Displayed 12 prefixes (12 paths)
bb04# show bgp evpn route
BGP table version is 0, local router ID is 172.31.0.4
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 172.16.3.1:10100
*> [2]:[0]:[0]:[48]:[00:37:c4:55:09:01]
                    172.16.3.100                           0 65003 i
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65003 i
Route Distinguisher: 172.16.3.1:10200
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65003 i
Route Distinguisher: 172.16.3.2:10100
*> [2]:[0]:[0]:[48]:[00:37:c4:55:09:01]
                    172.16.3.100                           0 65003 i
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65003 i
Route Distinguisher: 172.16.3.2:10200
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                           0 65003 i
Route Distinguisher: 172.16.4.1:10100
*> [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                           0 65004 i
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65004 i
Route Distinguisher: 172.16.4.1:10200
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65004 i
Route Distinguisher: 172.16.4.2:10100
*> [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                           0 65004 i
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65004 i
Route Distinguisher: 172.16.4.2:10200
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65004 i

Displayed 12 prefixes (12 paths)
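The summary lines differ: each spine shows 9 prefixes (15 paths), while each bb shows 12 prefixes (12 paths). Reading the tables explains why. A spine never holds the RDs of its same-AS peer, and every remote prefix arrives twice (presumably once via bb03 and once via bb04). A bb holds all eight RDs with a single path each.

The following sketch recounts this from the text. It assumes only the layout shown above: a `Route Distinguisher:` line, then one line starting with `*` per path. `count_evpn_routes` is an illustrative name.

```python
from typing import Set, Tuple


def count_evpn_routes(output: str) -> Tuple[int, int]:
    """Return (prefixes, paths) from FRR `show bgp evpn route` text.

    A prefix is keyed by (RD, NLRI string). Every line starting with '*'
    (valid, whether best or not) counts as one path.
    """
    rd = None
    prefixes: Set[Tuple[str, str]] = set()
    paths = 0
    for line in output.splitlines():
        if line.startswith("Route Distinguisher:"):
            rd = line.split(":", 1)[1].strip()
        elif line.startswith("*") and rd is not None:
            nlri = next(tok for tok in line.split() if tok.startswith("["))
            prefixes.add((rd, nlri))
            paths += 1
    return len(prefixes), paths


# Excerpt of spine31's table: one local RD, and one remote RD whose prefixes arrive twice
SAMPLE = """\
Route Distinguisher: 172.16.3.1:10200
*> [3]:[0]:[32]:[172.16.3.100]
                    172.16.3.100                       32768 i
Route Distinguisher: 172.16.4.1:10100
*  [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                           0 65000 65004 i
*> [2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]
                    172.16.4.100                           0 65000 65004 i
*  [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i
*> [3]:[0]:[32]:[172.16.4.100]
                    172.16.4.100                           0 65000 65004 i
"""

print(count_evpn_routes(SAMPLE))
```

Fed the full spine31 table, the same logic gives 2+1+2+1+2+1 = 9 prefixes and 2+1+4+2+4+2 = 15 paths.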

EVPN learned routes (drilling down into a specific RD)

From sudo vtysh:

bb03# show bgp evpn route rd 172.16.3.2:10100
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]

BGP routing table entry for 172.16.3.2:10100:[2]:[0]:[0]:[48]:[00:37:c4:55:09:01]
Paths: (1 available, best #1)
  Advertised to non peer-group peers:
  spine31(192.0.2.9) spine32(192.0.2.11) spine41(192.0.2.13) spine42(192.0.2.15)
  Route [2]:[0]:[0]:[48]:[00:37:c4:55:09:01] VNI 10100
  65003
    172.16.3.100 from spine32(192.0.2.11) (172.16.3.2)
      Origin IGP, localpref 100, valid, external, bestpath-from-AS 65003, best
      Extended Community: RT:65003:10100 ET:8
      AddPath ID: RX 0, TX 138
      Last update: Tue Mar 21 21:51:06 2017

BGP routing table entry for 172.16.3.2:10100:[3]:[0]:[32]:[172.16.3.100]
Paths: (1 available, best #1)
  Advertised to non peer-group peers:
  spine31(192.0.2.9) spine32(192.0.2.11) spine41(192.0.2.13) spine42(192.0.2.15)
  Route [3]:[0]:[32]:[172.16.3.100]
  65003
    172.16.3.100 from spine32(192.0.2.11) (172.16.3.2)
      Origin IGP, localpref 100, valid, external, bestpath-from-AS 65003, best
      Extended Community: RT:65003:10100 ET:8
      AddPath ID: RX 0, TX 110
      Last update: Tue Mar 21 20:57:31 2017

Looking at the packets

ControlPlane

EVPN NLRI Type3(Inclusive Multicast Ethernet Tag route)

This is the UPDATE that spine41 sends to bb03 right after the eBGP OPEN. The point to note, in comparison with EVPN Multihoming, is the Originating Router's IP Address field. It carries the shared loopback IP address (172.16.4.100) configured on spine4[12]. The RD and RT also carry automatically assigned values even though neither is configured explicitly. This matches the official Cumulus documentation (Ethernet Virtual Private Network - EVPN / Enabling EVPN with Route Distinguishers (RDs) and Route Targets (RTs)).

f:id:kakkotetsu:20170511001352p:plain
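The auto-assigned values in the outputs above follow a visible pattern:

- RD = router ID:VNI (172.16.4.1:10100 on spine41)
- RT = AS:VNI (65004:10100)
- the Type-3 route carries the shared VTEP address as Originating Router IP

The sketch below encodes that pattern as observed in this lab only. It is an assumption, not the documented derivation rule, and other FRR/Cumulus releases may build the RD differently. `type3_route` is a hypothetical helper.

```python
import ipaddress
from typing import NamedTuple


class Type3Route(NamedTuple):
    key: str  # "<RD>:[3]:[EthTag]:[IPlen]:[OrigIP]", as printed by vtysh
    rt: str


def type3_route(router_id: str, asn: int, vni: int, vtep_ip: str, eth_tag: int = 0) -> Type3Route:
    # Observed (assumed) auto values: RD = <router-id>:<VNI>, RT = <AS>:<VNI>
    rd = f"{router_id}:{vni}"
    orig = ipaddress.ip_address(vtep_ip)  # shared VTEP loopback, not the router ID
    key = f"{rd}:[3]:[{eth_tag}]:[{orig.max_prefixlen}]:[{orig}]"
    return Type3Route(key=key, rt=f"{asn}:{vni}")


# spine41 and spine42 share VTEP 172.16.4.100 but have different router IDs,
# so they originate the same Type-3 content under two different RDs.
for rid in ("172.16.4.1", "172.16.4.2"):
    print(type3_route(rid, 65004, 10100, "172.16.4.100"))
```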

EVPN NLRI Type2(MAC/IP Advertisement route)

This figure shows spine41 advertising the MAC address of node41 (on VLAN100:VNI10100) to bb03.

f:id:kakkotetsu:20170511001417p:plain
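The Type-2 and Type-3 prefixes in the vtysh tables follow the legend printed at the top of each table. Type-2 is `[2]:[ESI]:[EthTag]:[MAClen]:[MAC]` and Type-3 is `[3]:[EthTag]:[IPlen]:[OrigIP]`. This small sketch decodes such strings into named fields based on that legend. It handles only types 2 and 3, since those are all that appear here, and `decode_evpn_prefix` is hypothetical.

```python
import re
from typing import Dict

# Field order copied from the legend lines in the vtysh output above
LEGEND = {
    "2": ["ESI", "EthTag", "MAClen", "MAC"],
    "3": ["EthTag", "IPlen", "OrigIP"],
}


def decode_evpn_prefix(prefix: str) -> Dict[str, str]:
    """Split '[a]:[b]:...' into named fields according to the route type."""
    parts = re.findall(r"\[([^\]]*)\]", prefix)
    if not parts or parts[0] not in LEGEND:
        raise ValueError(f"unsupported EVPN prefix: {prefix!r}")
    names = LEGEND[parts[0]]
    if len(parts) - 1 != len(names):
        raise ValueError(f"unexpected field count in {prefix!r}")
    return {"type": parts[0], **dict(zip(names, parts[1:]))}


print(decode_evpn_prefix("[2]:[0]:[0]:[48]:[00:37:c4:56:b4:01]"))
print(decode_evpn_prefix("[3]:[0]:[32]:[172.16.4.100]"))
```

The ESI of 0 on the Type-2 route indicates that node41's MAC is not tied to a multihomed Ethernet Segment in this MLAG setup.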

DataPlane

BUM

The ARP Request from node31 is sent from spine31 to bb03. Look at the outer IP header of the VXLAN encapsulation. Src is 172.16.3.100 (the shared loopback IP address of spine3[12]) and Dst is 172.16.4.100 (the shared loopback IP address of spine4[12]). This shows that each pair of switches forms a single logical VTEP using a shared loopback IP address. In VXLAN terms this is HER (Head End Replication). According to the official page's note on HER as of 2017/03/21, up to 128 VTEPs are supported with HER.

Cumulus Linux verified support for up to 128 VTEPs with head end replication.

f:id:kakkotetsu:20170511001433p:plain
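With head-end replication, the ingress VTEP sends one unicast VXLAN copy of each BUM frame to every remote VTEP in its flood list. That flood list comes from the Originating Router IPs of received Type-3 routes, which are the 00:00:00:00:00:00 rows in `net show bridge macs`.

Below is a minimal sketch of that fan-out under those assumptions. `her_copies` is hypothetical, and the warning threshold just reuses the 128-VTEP figure quoted above.

```python
import warnings
from typing import Iterable, List, Tuple

# Cumulus states verified support for up to 128 VTEPs with head end replication
VERIFIED_HER_VTEPS = 128


def her_copies(local_vtep: str, flood_list: Iterable[str]) -> List[Tuple[str, str]]:
    """Return one (outer src IP, outer dst IP) pair per unicast copy of a BUM frame."""
    remotes: List[str] = []
    for vtep in flood_list:
        # skip ourselves and duplicates (the same VTEP may be learned under several RDs)
        if vtep != local_vtep and vtep not in remotes:
            remotes.append(vtep)
    if len(remotes) > VERIFIED_HER_VTEPS:
        warnings.warn(f"{len(remotes)} remote VTEPs exceeds the verified HER figure")
    return [(local_vtep, remote) for remote in remotes]


# Type-3 routes for 172.16.4.100 arrive under two RDs (spine41 and spine42),
# but the shared VTEP address collapses them into a single copy.
learned = ["172.16.4.100", "172.16.4.100"]
print(her_copies("172.16.3.100", learned))
```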

Unicast

This is the ICMP Echo Reply from node41 to node31, sent from spine41 toward bb04. It is an ordinary VXLAN-encapsulated packet, but the outer IP header shows the traffic flowing between the two shared loopback addresses.

f:id:kakkotetsu:20170511001446p:plain
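For reference, the VXLAN header inside that packet is 8 bytes long (RFC 7348). It consists of:

- a flags byte with the I bit (0x08) set
- 24 reserved bits
- the 24-bit VNI
- 8 more reserved bits

It is normally carried over the IANA-assigned UDP port 4789; check the capture for the port this lab actually used. This sketch builds and parses only the VXLAN header with Python's `struct` module and leaves out the outer Ethernet/IP/UDP headers.

```python
import struct

VXLAN_FLAG_I = 0x08  # "VNI valid" flag (RFC 7348)


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header that sits between the outer UDP header and the inner frame."""
    if not 0 <= vni < 1 << 24:
        raise ValueError("VNI must fit in 24 bits")
    # flags(1) + reserved(3) + VNI(3) + reserved(1): the VNI fills the top 24 bits of the last word
    return struct.pack("!B3xI", VXLAN_FLAG_I, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN header."""
    flags, word = struct.unpack("!B3xI", header[:8])
    if not flags & VXLAN_FLAG_I:
        raise ValueError("I flag not set; VNI field is not valid")
    return word >> 8


hdr = vxlan_header(10100)
print(hdr.hex(), parse_vni(hdr))
```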

MLAG behavior

This is just an ordinary MLAG failover, and the failure test runs in a virtual environment, so I'll keep it very brief.

Traffic was flowing along bb03 -> spine41 -> torSW401a -> node41. Bringing down spine41's downlink with sudo ifdown swp5 immediately switched the path to bb03 -> spine41 -> spine42 -> torSW401a -> node41. The LAG that spine4[12] form toward torSW401a and the virtual loopback both stay up, so no EVPN route withdrawals occurred.

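The end

Some impressions:

  • Cumulus Linux
    • VX is lightweight, which is nice
    • The Network Command Line Utility (NCLU) wrapper is easy to use
    • The documentation is thorough, which is nice (even though the features covered here are EA (early access))
      • so it can't be helped that my explanations are sloppy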

VXLAN+EVPN on vQFX10000 (Multihoming edition) (original: 2017/02/28)

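This post is a copy of something I wrote elsewhere on 2017/02/28.
As a result, some of the information is slightly outdated as of 2017/05/13.

  • In 2017/05, GNS3 ver 2.0 stable was released, so
    • it is no longer necessary to insert a GNS3 hub between KVM instances to capture packets
    • Docker instances can now handle VLAN tags, so Open vSwitch can easily be lined up in Docker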


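Introduction

What this post covers

This post does the following:

  • Make the setup built earlier in "VXLAN+EVPN on vQFX10000 (L2 over L3 edition)" redundant
    • Run the Multihoming described in RFC7432 (BGP MPLS-Based Ethernet VPN) and see how it behaves
      • The part below the spine group (which plays the VTEP and EVI roles) in the diagram
      • This is the main topic
    • Build multipath BGP while I'm at it
      • The part above the spine group (which plays the VTEP and EVI roles) in the diagram
      • This is just a multipath BGP setup, so I'll go over it quickly
      • The design that various data center operators and vendors have been presenting as an IP CLOS Network for the past few years

Personally, I checked VXLAN implementations on physical switches in 2014/12 (VXLAN interconnection between VyOS and Arista). Back then I couldn't find a way to make the VTEP itself redundant. I assumed the nodes below it would have to handle that with bonding or VRRP. So I have high hopes for this as a way to achieve it with standardized technology.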

Overview diagram / brief explanation / environment

I'll build something like the diagram below.

f:id:kakkotetsu:20170513140423p:plain

The roles of the components are the same as last time:

  • bb0[12]
    • DataPlane: only forwards VXLAN packets between the VTEPs (spine[12]{2})
    • ControlPlane: acts as RouteReflector for MP-BGP (for EVPN signaling), only reflecting EVPN NLRI routes
  • spine[12]{2}
    • DataPlane: acts as VTEP
    • ControlPlane: acts as EVPN PE
  • torSW[12]01a
    • A simple L2 switch that handles VLAN tags

The redundancy scheme, as underlined in the diagram, is as follows.

  • bb0[12] - spine[12]{2}
    • The Underlay eBGP runs per-packet ECMP
    • The Overlay iBGP forms a single cluster with bb0[12] as RouteReflectors
  • spine[12]{2} - torSW[12]01a
    • spine[12]{2} form a LAG toward torSW[12]01a in pairs
      • spine1[12] with torSW101a
      • spine2[12] with torSW201a
    • EVPN behavior on spine[12]{2}:
      • The LAG is treated as an Ethernet Segment (ES)
      • The LAG is identified by an Ethernet Segment Identifier (ESI) that is unique within the EVPN domain
        • In this example, the ESIs of ae0 on spine1[12] and ae0 on spine2[12] must not overlap
      • The PEs advertise ESIs to each other with EVPN NLRI Type4 (Ethernet Segment route). This lets them tell which peers they share a LAG with and elect the Designated Forwarder (DF) for each ES
        • So no inter-link is needed, unlike MLAG, MC-LAG, or vPC (is there a standard generic term for these...?)
          • Some implementations synchronize learned MAC addresses over that inter-link, but EVPN exchanges them as NLRI Type2 (MAC/IP Advertisement route) routes, so that isn't needed either
      • On each LAG, only the PE that is DF forwards BUM traffic arriving from the EVPN/VXLAN side (the upper side of the diagram) to the VLAN side (the lower side)
    • torSW[12]01a
      • It doesn't need to care at all that it has two peers or is multihomed; it just forms a LAG with LACP
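The DF election mentioned above defaults to the modulus-based service carving of RFC 7432 section 8.5. The PEs attached to an ES are ordered by IP address and numbered from 0. For VLAN (Ethernet tag) V, the PE with ordinal V mod N becomes DF.

This sketch shows only that default procedure from the RFC, not necessarily what vQFX does internally. The PE addresses are hypothetical.

```python
import ipaddress
from typing import Iterable


def elect_df(pe_addresses: Iterable[str], vlan: int) -> str:
    """Default DF election (RFC 7432 section 8.5, modulus-based service carving).

    PEs on the same Ethernet Segment are sorted by IP address and numbered from 0.
    The PE whose ordinal equals (VLAN mod N) becomes DF for that VLAN.
    """
    ordered = sorted({ipaddress.ip_address(a) for a in pe_addresses})
    if not ordered:
        raise ValueError("no PEs advertised this Ethernet Segment")
    return str(ordered[vlan % len(ordered)])


# Hypothetical loopbacks for spine11/spine12, as learned from their Type-4 routes
pes = ["198.51.100.12", "198.51.100.11"]
for vlan in (100, 101, 200):
    print(vlan, elect_df(pes, vlan))
```

With two PEs, VLAN 100 and VLAN 200 are both even, so this default rule would make the same PE DF for both.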

Until now, torSW[12]01a only had to handle VLAN tags, but it now has a new requirement: it must support LACP. So this time I use Open vSwitch instead of GNS3's EthernetSwitch.

The environment information is unchanged from the start.
The newly introduced torSW[12]01a (details later) runs Ubuntu16.04.1-server-amd64 + ovs_version: "2.5.0".

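References

Build and verification

Deploying in GNS3

I'm continuing with the previous environment. I click through the GUI to deploy the vQFX instances and the Ubuntu VMs that run Open vSwitch.
It looks like this: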

f:id:kakkotetsu:20170513140925p:plain

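As usual, I'm using GNS3 version 1.5.2, so an EthernetSwitch (the ones whose names start with cap in the diagram) is inserted on each link for packet capture.

A digression.
At first I ran Open vSwitch in docker for convenience. (Reference links below.)

But then VLAN tags came out stripped on the VLAN trunk port. So I dropped docker OVS and set up a separate Ubuntu VM for each OVS instead.
See the link below for details. As it turned out, in GNS3 2.0 docker also works correctly without stripping VLAN tags (confirmed 2017/05).

torSW[12]01a (Open vSwitch) configuration

I'll leave installing Open vSwitch to the official documentation (sloppy, I know). The configuration is as follows. This switch only needs working LACP and VLANs, so use whatever is convenient for you.

  • Common to torSW[12]01a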
# ovs-vsctl --no-wait init
# ovs-vsctl add-br br0
# ovs-vsctl set bridge br0 datapath_type=netdev

# ovs-vsctl add-bond br0 bond0 ens4 ens5 lacp=active bond_mode=balance-slb other_config:lacp-time=fast
# ovs-vsctl add-port br0 ens6 tag=100
# ovs-vsctl add-port br0 ens7 tag=200
# ip link set dev br0 up
# ip link set dev ens4 up
# ip link set dev ens5 up
# ip link set dev ens6 up
# ip link set dev ens7 up

# ovs-vsctl show
79e38752-1ada-4e44-9da2-f457504b149a
    Bridge "br0"
        Port "ens7"
            tag: 200
            Interface "ens7"
        Port "bond0"
            Interface "ens5"
            Interface "ens4"
        Port "ens6"
            tag: 100
            Interface "ens6"
        Port "br0"
            Interface "br0"
                type: internal
    ovs_version: "2.5.0"

# ovs-vsctl list port bond0
_uuid               : 8e767f1a-6a08-411f-9b86-407922a7565c
bond_active_slave   : "00:37:c4:9e:d1:01"
bond_downdelay      : 0
bond_fake_iface     : false
bond_mode           : balance-slb
bond_updelay        : 0
external_ids        : {}
fake_bridge         : false
interfaces          : [470f05cf-c3db-423d-abd5-4a6c45c9a581, 97fb2fa2-5074-455a-8b58-04ec746ea0a6]
lacp                : active
mac                 : []
name                : "bond0"
other_config        : {lacp-time=fast}
qos                 : []
rstp_statistics     : {}
rstp_status         : {}
statistics          : {}
status              : {}
tag                 : []
trunks              : []
vlan_mode           : []

The outputs below are from after spine1[12] were also configured and the LAG had come up.

root@torSW101a:~# ovs-appctl bond/show bond0
---- bond0 ----
bond_mode: balance-slb
bond may use recirculation: no, Recirc-ID : -1
bond-hash-basis: 0
updelay: 0 ms
downdelay: 0 ms
next rebalance: 7123 ms
lacp_status: negotiated
active slave mac: 00:37:c4:9e:d1:01(ens4)

slave ens4: enabled
        active slave
        may_enable: true

slave ens5: enabled
        may_enable: true


root@torSW101a:~# ovs-appctl lacp/show bond0
---- bond0 ----
        status: active negotiated
        sys_id: 00:37:c4:9e:d1:01
        sys_priority: 65534
        aggregation key: 1
        lacp_time: fast

slave: ens4: current attached
        port_id: 2
        port_priority: 65535
        may_enable: true

        actor sys_id: 00:37:c4:9e:d1:01
        actor sys_priority: 65534
        actor port_id: 2
        actor port_priority: 65535
        actor key: 1
        actor state: activity timeout aggregation synchronized collecting distributing

        partner sys_id: 00:00:00:01:01:01
        partner sys_priority: 127
        partner port_id: 1
        partner port_priority: 127
        partner key: 1
        partner state: activity timeout aggregation synchronized collecting distributing

slave: ens5: current attached
        port_id: 1
        port_priority: 65535
        may_enable: true

        actor sys_id: 00:37:c4:9e:d1:01
        actor sys_priority: 65534
        actor port_id: 1
        actor port_priority: 65535
        actor key: 1
        actor state: activity timeout aggregation synchronized collecting distributing

        partner sys_id: 00:00:00:01:01:01
        partner sys_priority: 127
        partner port_id: 1
        partner port_priority: 127
        partner key: 1
        partner state: activity timeout aggregation synchronized collecting distributing
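The key point in the output above is that ens4 and ens5 both report the same partner sys_id (00:00:00:01:01:01). In other words, spine11 and spine12 present themselves to torSW101a as a single LACP system.

This sketch checks that from the `ovs-appctl lacp/show` text. It assumes only the indented `key: value` layout shown above. The helper names are hypothetical, and this is text scraping, not an OVS API.

```python
import re
from typing import Dict


def lacp_slaves(text: str) -> Dict[str, Dict[str, str]]:
    """Collect per-slave state and partner sys_id from `ovs-appctl lacp/show` text."""
    slaves: Dict[str, Dict[str, str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"slave: ([^:\s]+): (.*)$", line)
        if m:
            current = m.group(1)
            slaves[current] = {"state": m.group(2)}
        elif current is not None and line.startswith("partner sys_id:"):
            slaves[current]["partner_sys_id"] = line.split(":", 1)[1].strip()
    return slaves


def looks_like_one_partner(slaves: Dict[str, Dict[str, str]]) -> bool:
    """True if there are 2+ members, all attached, all seeing the same partner system ID."""
    partner_ids = {s.get("partner_sys_id") for s in slaves.values()}
    attached = all("attached" in s["state"].split() for s in slaves.values())
    return len(slaves) >= 2 and attached and len(partner_ids) == 1 and None not in partner_ids


SAMPLE = """\
slave: ens4: current attached
        actor sys_id: 00:37:c4:9e:d1:01
        partner sys_id: 00:00:00:01:01:01
slave: ens5: current attached
        actor sys_id: 00:37:c4:9e:d1:01
        partner sys_id: 00:00:00:01:01:01
"""

slaves = lacp_slaves(SAMPLE)
print(slaves, looks_like_one_partner(slaves))
```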

Node setup for reachability tests

As before, I prepare a set of nodes. They're only used for reachability tests, so anything will do.

  • node11
    • VLAN 100 under spine1[12]
kotetsu@node11:~$ ip a show dev ens4
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:37:c4:e2:60:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.1/24 brd 192.168.1.255 scope global ens4
       valid_lft forever preferred_lft forever
    inet6 fe80::237:c4ff:fee2:6001/64 scope link
       valid_lft forever preferred_lft forever
  • node21
    • VLAN 100 under spine2[12]
kotetsu@node21:~$ ip a show dev ens4
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:37:c4:46:d8:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.2/24 brd 192.168.1.255 scope global ens4
       valid_lft forever preferred_lft forever
    inet6 fe80::237:c4ff:fe46:d801/64 scope link
       valid_lft forever preferred_lft forever
  • node12
    • VLAN 200 under spine1[12]
kotetsu@node12:~$ ip a show dev ens4
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:37:c4:9c:dd:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.2.1/24 brd 192.168.2.255 scope global ens4
       valid_lft forever preferred_lft forever
    inet6 fe80::237:c4ff:fe9c:dd01/64 scope link
       valid_lft forever preferred_lft forever
  • node22
    • VLAN 200 under spine2[12]
kotetsu@node22:~$ ip a show dev ens4
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:37:c4:3d:e0:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.2.2/24 brd 192.168.2.255 scope global ens4
       valid_lft forever preferred_lft forever
    inet6 fe80::237:c4ff:fe3d:e001/64 scope link
       valid_lft forever preferred_lft forever

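vQFX configuration and verification

First, the Underlay configuration (physical interfaces through eBGP). It gets long, but I'll build it as in the diagram below.

f:id:kakkotetsu:20170513141043p:plain

Underlay physical interface configuration

vQFX also supports RFC3021 Using 31-Bit Prefixes on IPv4 Point-to-Point Links, so I'll use it.

I'll paste the configuration for all six devices, but they differ only in physical interface numbers and IP address parameters.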

  • bb01
delete interfaces xe-0/0/0 unit 0 family inet dhcp
set interfaces xe-0/0/0 description "DEV=spine11 IF=xe-0/0/0"
set interfaces xe-0/0/0 mtu 9192
set interfaces xe-0/0/0 unit 0 family inet address 192.0.2.0/31
set interfaces xe-0/0/0 unit 0 family inet mtu 9000

delete interfaces xe-0/0/1 unit 0 family inet dhcp
set interfaces xe-0/0/1 description "DEV=spine12 IF=xe-0/0/0"
set interfaces xe-0/0/1 mtu 9192
set interfaces xe-0/0/1 unit 0 family inet address 192.0.2.2/31
set interfaces xe-0/0/1 unit 0 family inet mtu 9000

delete interfaces xe-0/0/2 unit 0 family inet dhcp
set interfaces xe-0/0/2 description "DEV=spine21 IF=xe-0/0/0"
set interfaces xe-0/0/2 mtu 9192
set interfaces xe-0/0/2 unit 0 family inet address 192.0.2.4/31
set interfaces xe-0/0/2 unit 0 family inet mtu 9000

delete interfaces xe-0/0/3 unit 0 family inet dhcp
set interfaces xe-0/0/3 description "DEV=spine22 IF=xe-0/0/0"
set interfaces xe-0/0/3 mtu 9192
set interfaces xe-0/0/3 unit 0 family inet address 192.0.2.6/31
set interfaces xe-0/0/3 unit 0 family inet mtu 9000

set protocols lldp port-id-subtype interface-name
set protocols lldp interface xe-0/0/0
set protocols lldp interface xe-0/0/1
set protocols lldp interface xe-0/0/2
set protocols lldp interface xe-0/0/3
  • bb02
delete interfaces xe-0/0/0 unit 0 family inet dhcp
set interfaces xe-0/0/0 description "DEV=spine11 IF=xe-0/0/1"
set interfaces xe-0/0/0 mtu 9192
set interfaces xe-0/0/0 unit 0 family inet address 192.0.2.128/31
set interfaces xe-0/0/0 unit 0 family inet mtu 9000

delete interfaces xe-0/0/1 unit 0 family inet dhcp
set interfaces xe-0/0/1 description "DEV=spine12 IF=xe-0/0/1"
set interfaces xe-0/0/1 mtu 9192
set interfaces xe-0/0/1 unit 0 family inet address 192.0.2.130/31
set interfaces xe-0/0/1 unit 0 family inet mtu 9000

delete interfaces xe-0/0/2 unit 0 family inet dhcp
set interfaces xe-0/0/2 description "DEV=spine21 IF=xe-0/0/1"
set interfaces xe-0/0/2 mtu 9192
set interfaces xe-0/0/2 unit 0 family inet address 192.0.2.132/31
set interfaces xe-0/0/2 unit 0 family inet mtu 9000

delete interfaces xe-0/0/3 unit 0 family inet dhcp
set interfaces xe-0/0/3 description "DEV=spine22 IF=xe-0/0/1"
set interfaces xe-0/0/3 mtu 9192
set interfaces xe-0/0/3 unit 0 family inet address 192.0.2.134/31
set interfaces xe-0/0/3 unit 0 family inet mtu 9000

set protocols lldp port-id-subtype interface-name
set protocols lldp interface xe-0/0/0
set protocols lldp interface xe-0/0/1
set protocols lldp interface xe-0/0/2
set protocols lldp interface xe-0/0/3
  • spine11
delete interfaces xe-0/0/0 unit 0 family inet dhcp
set interfaces xe-0/0/0 description "DEV=bb01 IF=xe-0/0/0"
set interfaces xe-0/0/0 mtu 9192
set interfaces xe-0/0/0 unit 0 family inet address 192.0.2.1/31
set interfaces xe-0/0/0 unit 0 family inet mtu 9000

delete interfaces xe-0/0/1 unit 0 family inet dhcp
set interfaces xe-0/0/1 description "DEV=bb02 IF=xe-0/0/0"
set interfaces xe-0/0/1 mtu 9192
set interfaces xe-0/0/1 unit 0 family inet address 192.0.2.129/31
set interfaces xe-0/0/1 unit 0 family inet mtu 9000

delete interfaces xe-0/0/2 unit 0 family inet dhcp
set interfaces xe-0/0/2 description "DEV=torSW101a IF=eth0"

set protocols lldp port-id-subtype interface-name
set protocols lldp interface xe-0/0/0
set protocols lldp interface xe-0/0/1
  • spine12
delete interfaces xe-0/0/0 unit 0 family inet dhcp
set interfaces xe-0/0/0 description "DEV=bb01 IF=xe-0/0/1"
set interfaces xe-0/0/0 mtu 9192
set interfaces xe-0/0/0 unit 0 family inet address 192.0.2.3/31
set interfaces xe-0/0/0 unit 0 family inet mtu 9000

delete interfaces xe-0/0/1 unit 0 family inet dhcp
set interfaces xe-0/0/1 description "DEV=bb02 IF=xe-0/0/1"
set interfaces xe-0/0/1 mtu 9192
set interfaces xe-0/0/1 unit 0 family inet address 192.0.2.131/31
set interfaces xe-0/0/1 unit 0 family inet mtu 9000

delete interfaces xe-0/0/2 unit 0 family inet dhcp
set interfaces xe-0/0/2 description "DEV=torSW101a IF=eth1"

set protocols lldp port-id-subtype interface-name
set protocols lldp interface xe-0/0/0
set protocols lldp interface xe-0/0/1
  • spine21
delete interfaces xe-0/0/0 unit 0 family inet dhcp
set interfaces xe-0/0/0 description "DEV=bb01 IF=xe-0/0/2"
set interfaces xe-0/0/0 mtu 9192
set interfaces xe-0/0/0 unit 0 family inet address 192.0.2.5/31
set interfaces xe-0/0/0 unit 0 family inet mtu 9000

delete interfaces xe-0/0/1 unit 0 family inet dhcp
set interfaces xe-0/0/1 description "DEV=bb02 IF=xe-0/0/2"
set interfaces xe-0/0/1 mtu 9192
set interfaces xe-0/0/1 unit 0 family inet address 192.0.2.133/31
set interfaces xe-0/0/1 unit 0 family inet mtu 9000

delete interfaces xe-0/0/2 unit 0 family inet dhcp
set interfaces xe-0/0/2 description "DEV=torSW201a IF=eth0"

set protocols lldp port-id-subtype interface-name
set protocols lldp interface xe-0/0/0
set protocols lldp interface xe-0/0/1
  • spine22
delete interfaces xe-0/0/0 unit 0 family inet dhcp
set interfaces xe-0/0/0 description "DEV=bb01 IF=xe-0/0/3"
set interfaces xe-0/0/0 mtu 9192
set interfaces xe-0/0/0 unit 0 family inet address 192.0.2.7/31
set interfaces xe-0/0/0 unit 0 family inet mtu 9000

delete interfaces xe-0/0/1 unit 0 family inet dhcp
set interfaces xe-0/0/1 description "DEV=bb02 IF=xe-0/0/3"
set interfaces xe-0/0/1 mtu 9192
set interfaces xe-0/0/1 unit 0 family inet address 192.0.2.135/31
set interfaces xe-0/0/1 unit 0 family inet mtu 9000

delete interfaces xe-0/0/2 unit 0 family inet dhcp
set interfaces xe-0/0/2 description "DEV=torSW201a IF=eth1"

set protocols lldp port-id-subtype interface-name
set protocols lldp interface xe-0/0/0
set protocols lldp interface xe-0/0/1

Underlay eBGP configuration and quick verification

With this configuration, every device should learn every other device's lo0 IP address over eBGP.

Settings common to all devices

First, the settings shared by all six vQFX.

To begin with, get eBGP running.

set protocols bgp group BGP_UNDERLAY type external
set protocols bgp group BGP_UNDERLAY mtu-discovery

Each device advertises only its own lo0. All devices reuse the same policy, as in earlier articles.

set policy-options policy-statement POLICY_EXPORT_LO0 from family inet
set policy-options policy-statement POLICY_EXPORT_LO0 from protocol direct
set policy-options policy-statement POLICY_EXPORT_LO0 from route-filter 0.0.0.0/0 prefix-length-range /32-/32
set policy-options policy-statement POLICY_EXPORT_LO0 then accept

set protocols bgp group BGP_UNDERLAY export POLICY_EXPORT_LO0
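Here is a minimal Python sketch of the single term in POLICY_EXPORT_LO0. It assumes a route can be described by just its prefix and protocol, and the function name export_lo0_accepts is hypothetical. The route-filter 0.0.0.0/0 covers every IPv4 prefix, so only the /32-/32 length range matters. The lo0 /32 therefore matches, while the /31 point-to-point link subnets (also direct routes) do not.

```python
import ipaddress

# Hypothetical model of POLICY_EXPORT_LO0 (not Junos code): "from family inet",
# "from protocol direct" and
# "route-filter 0.0.0.0/0 prefix-length-range /32-/32" must all hold.
def export_lo0_accepts(prefix: str, protocol: str) -> bool:
    net = ipaddress.ip_network(prefix)
    if net.version != 4 or protocol != "direct":
        return False
    # 0.0.0.0/0 contains every IPv4 prefix, so only the length range is decisive.
    return net.prefixlen == 32

for prefix in ["172.16.1.1/32", "192.0.2.0/31", "192.0.2.128/31"]:
    print(prefix, export_lo0_accepts(prefix, "direct"))
```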

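This part depends on your underlay design.
This time I use no IGP at all and build the underlay with eBGP only (although some books recommend an IGP for this kind of environment).
Each pair of devices also shares one AS. Putting each of the six devices in its own AS is another option. Choose based on scalability, how many AS numbers you have available, 2-byte vs. 4-byte ASNs, and so on.
As a result, giving the lo0 addresses of all four spines full-mesh reachability requires the following: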

  • bb0[12] (AS 65000) must advertise the route received from spine11 (AS 65001), i.e. spine11's lo0 IP address, to spine12 (also AS 65001)
    • The same applies to routes coming from AS 65002
  • spine11 (AS 65001) must accept the route to spine12, whose AS path is 65000 65001

Hence the following settings:

set protocols bgp group BGP_UNDERLAY advertise-peer-as
set protocols bgp group BGP_UNDERLAY family inet unicast loops 2
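The spine-side requirement above amounts to a relaxed AS-path loop check. The sketch below is a hypothetical allowas-in style model, not Junos code. Its max_own_as parameter is an assumed knob for how many times the local AS may appear in the path. It is not claimed to map one-to-one onto the value given to loops.

```python
# Hypothetical allowas-in style loop check (illustration only).
# max_own_as: how many occurrences of the local AS are tolerated.
def accept_as_path(as_path: list[int], local_as: int, max_own_as: int = 0) -> bool:
    return as_path.count(local_as) <= max_own_as

# spine11 (AS 65001) receiving spine12's lo0 via bb0[12]: AS path 65000 65001
path = [65000, 65001]
print(accept_as_path(path, 65001))                # False: strict loop detection
print(accept_as_path(path, 65001, max_own_as=1))  # True: one occurrence allowed
```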

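Because this setup is redundant, I apply a policy that load-balances traffic between spines per packet (load-balance per-packet) and enable BGP multipath.
All VXLAN-encapsulated traffic has a spine lo0 as both source and destination IP, so strictly speaking this setting should only be needed on spine[12]{2}.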

set policy-options policy-statement POLICY_ECMP then load-balance per-packet

set routing-options forwarding-table export POLICY_ECMP
set routing-options forwarding-table ecmp-fast-reroute

set protocols bgp group BGP_UNDERLAY multipath
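A caveat and a sketch on the ECMP behaviour. As far as I know, on many Junos platforms the per-packet keyword actually results in per-flow hashing, so this is worth checking on vQFX.

Assuming per-flow hashing, the hypothetical selector below picks one of spine11's two next hops from a hash of the outer 5-tuple. SHA-256 is only a stand-in for the switch's real hardware hash. The sketch also shows an effect of the outer IPs always being spine lo0 addresses: what spreads different inner flows over different paths is the VXLAN outer UDP source port, which RFC 7348 recommends deriving from the inner flow.

```python
import hashlib

# spine11's two equal-cost next hops toward spine21's lo0 (from the output below)
NEXT_HOPS = ["192.0.2.0 via xe-0/0/0.0", "192.0.2.128 via xe-0/0/1.0"]

# Hypothetical per-flow selector: same 5-tuple -> same next hop every time.
def pick_next_hop(src: str, dst: str, proto: int, sport: int, dport: int) -> str:
    key = f"{src}|{dst}|{proto}|{sport}|{dport}".encode()
    digest = hashlib.sha256(key).digest()
    return NEXT_HOPS[int.from_bytes(digest[:4], "big") % len(NEXT_HOPS)]

# VXLAN outer header: UDP to port 4789 between lo0 addresses; only the
# outer source port varies between inner flows.
for sport in (49152, 49153, 49154, 49155):
    print(sport, pick_next_hop("172.16.1.1", "172.16.2.1", 17, sport, 4789))
```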

Since I was at it, I got greedy and enabled BFD too. It is not required, and in an underpowered environment it can get in the way, so it is optional.

set protocols bgp group BGP_UNDERLAY bfd-liveness-detection minimum-interval 350
set protocols bgp group BGP_UNDERLAY bfd-liveness-detection multiplier 3
set protocols bgp group BGP_UNDERLAY bfd-liveness-detection session-mode automatic
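The detection time for these BFD values works out as follows, assuming both ends negotiate the same 350 ms interval. A neighbor is declared down after roughly three consecutive missed packets.

```latex
T_{\text{detect}} = \text{multiplier} \times \text{minimum-interval}
                  = 3 \times 350\,\text{ms} = 1050\,\text{ms}
```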

The following are minor, optional settings, such as sending BGP logs to /var/log/bgp.log.

set protocols bgp traceoptions file bgp.log
set protocols bgp traceoptions file size 10k
set protocols bgp traceoptions file files 30
set protocols bgp traceoptions flag normal
set protocols bgp log-updown

How much this means in a virtual environment is debatable. I only entered it to confirm that the command is accepted, and it is probably unnecessary for this sandbox test.

set protocols bgp graceful-restart

Per-device settings

From here on are the per-device parameters.
Basically the lo0 IP address, the AS number and the neighbors.

  • bb01
set interfaces lo0 unit 0 family inet address 172.31.0.1/32

set routing-options router-id 172.31.0.1
set routing-options autonomous-system 65000

set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.1 description spine11
set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.1 peer-as 65001
set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.3 description spine12
set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.3 peer-as 65001
set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.5 description spine21
set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.5 peer-as 65002
set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.7 description spine22
set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.7 peer-as 65002
  • bb02
set interfaces lo0 unit 0 family inet address 172.31.0.2/32

set routing-options router-id 172.31.0.2
set routing-options autonomous-system 65000

set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.129 description spine11
set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.129 peer-as 65001
set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.131 description spine12
set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.131 peer-as 65001
set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.133 description spine21
set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.133 peer-as 65002
set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.135 description spine22
set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.135 peer-as 65002
  • spine11
set interfaces lo0 unit 0 family inet address 172.16.1.1/32

set routing-options router-id 172.16.1.1
set routing-options autonomous-system 65001

set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.0 description bb01
set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.0 peer-as 65000
set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.128 description bb02
set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.128 peer-as 65000
  • spine12
set interfaces lo0 unit 0 family inet address 172.16.1.2/32

set routing-options router-id 172.16.1.2
set routing-options autonomous-system 65001

set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.2 description bb01
set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.2 peer-as 65000
set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.130 description bb02
set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.130 peer-as 65000
  • spine21
set interfaces lo0 unit 0 family inet address 172.16.2.1/32

set routing-options router-id 172.16.2.1
set routing-options autonomous-system 65002

set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.4 description bb01
set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.4 peer-as 65000
set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.132 description bb02
set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.132 peer-as 65000
  • spine22
set interfaces lo0 unit 0 family inet address 172.16.2.2/32

set routing-options router-id 172.16.2.2
set routing-options autonomous-system 65002

set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.6 description bb01
set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.6 peer-as 65000
set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.134 description bb02
set protocols bgp group BGP_UNDERLAY neighbor 192.0.2.134 peer-as 65000
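Each neighbor address in the per-device settings is simply the other end of a /31 point-to-point link (RFC 3021). A small Python sketch with a hypothetical helper, p2p_peer, derives it by flipping the last bit of the local address. The examples match spine11's two uplinks.

```python
import ipaddress

# Hypothetical helper: on a /31 link the two addresses differ only in the
# lowest bit, so the neighbor is the local address XOR 1.
def p2p_peer(local: str) -> str:
    iface = ipaddress.ip_interface(local)
    if iface.network.prefixlen != 31:
        raise ValueError("expected a /31 link address")
    return str(ipaddress.ip_address(int(iface.ip) ^ 1))

print(p2p_peer("192.0.2.1/31"))    # 192.0.2.0   (spine11 -> bb01)
print(p2p_peer("192.0.2.129/31"))  # 192.0.2.128 (spine11 -> bb02)
```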

Underlay eBGP ECMP verification

At this point you can check the BGP state and tables, and every lo0 can reach every other lo0 (full mesh).

Next, a quick look at how ECMP behaves.
For example, spine11 and spine21 each have two paths to the other's lo0 (via bb01 and via bb02).

{master:0}
kotetsu@spine11> show route 172.16.2.1

inet.0: 14 destinations, 17 routes (14 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

172.16.2.1/32      *[BGP/170] 00:20:02, localpref 100
                      AS path: 65000 65002 I, validation-state: unverified
                      to 192.0.2.0 via xe-0/0/0.0
                    > to 192.0.2.128 via xe-0/0/1.0
                    [BGP/170] 00:02:45, localpref 100
                      AS path: 65000 65002 I, validation-state: unverified
                    > to 192.0.2.0 via xe-0/0/0.0
{master:0}
kotetsu@spine21> show route 172.16.1.1

inet.0: 14 destinations, 19 routes (14 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

172.16.1.1/32      *[BGP/170] 00:19:54, localpref 100
                      AS path: 65000 65001 I, validation-state: unverified
                      to 192.0.2.4 via xe-0/0/0.0
                    > to 192.0.2.132 via xe-0/0/1.0
                    [BGP/170] 00:03:23, localpref 100
                      AS path: 65000 65001 I, validation-state: unverified
                    > to 192.0.2.4 via xe-0/0/0.0

While running a ping from spine11 to spine21 (lo0 to lo0),

{master:0}
kotetsu@spine11> ping source 172.16.1.1 172.16.2.1
PING 172.16.2.1 (172.16.2.1): 56 data bytes
64 bytes from 172.16.2.1: icmp_seq=0 ttl=63 time=28.369 ms
64 bytes from 172.16.2.1: icmp_seq=1 ttl=63 time=37.308 ms

...

--- 172.16.2.1 ping statistics ---
21 packets transmitted, 21 packets received, 0% packet loss
round-trip min/avg/max/stddev = 18.653/29.917/45.982/7.670 ms

a packet capture on the links from spine11 and spine21 to bb01 and bb02 shows that the traffic is distributed across them.

For completeness, also confirm that spine11 (AS 65001) has learned the path to spine12 (also AS 65001) with AS path 65000 65001.
The loops setting on each spine and the advertise-peer-as setting on each bb do their job, and the route is learned as intended.

{master:0}
kotetsu@spine11> show route 172.16.1.2

inet.0: 14 destinations, 17 routes (14 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

172.16.1.2/32      *[BGP/170] 00:29:19, localpref 100
                      AS path: 65000 65001 I, validation-state: unverified
                    > to 192.0.2.0 via xe-0/0/0.0
                      to 192.0.2.128 via xe-0/0/1.0
                    [BGP/170] 00:29:15, localpref 100
                      AS path: 65000 65001 I, validation-state: unverified
                    > to 192.0.2.128 via xe-0/0/1.0

Overlay configuration (MP-BGP)

iBGP sessions are built between the lo0 addresses learned through the underlay eBGP.
This should give us the MP-BGP sessions that carry EVPN NLRI.
The configuration is long, but as a picture it is simple:

(Figure: overlay MP-BGP (iBGP) session layout)

Settings common to all devices

Same as in the previous article.

set protocols bgp group BGP_OVERLAY type internal
set protocols bgp group BGP_OVERLAY family evpn signaling
set protocols bgp group BGP_OVERLAY local-as 64512

Per-device settings

Both bb0[12] act as route reflectors, so they get the same cluster ID.
Depending on your situation you could skip the route reflectors and build a full mesh instead; do as you like. (I feel like I keep saying that this time...)
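Here is a quick session count for this topology, to put the route-reflector versus full-mesh choice in numbers. It assumes that a full mesh would only be needed among the four spines, the EVPN speakers. In this small lab the RR design actually needs more sessions (8) than a spine full mesh (6). The RR approach pays off as the number of VTEPs grows, since it scales as 2n instead of n(n-1)/2.

```python
from itertools import combinations

SPINES = ["spine11", "spine12", "spine21", "spine22"]
RRS = ["bb01", "bb02"]

# Design used here: every spine peers with both RRs; bb01 and bb02 do not
# peer with each other in BGP_OVERLAY.
rr_sessions = [(rr, s) for rr in RRS for s in SPINES]

# Alternative: a direct iBGP full mesh between the spines.
mesh_sessions = list(combinations(SPINES, 2))

print(len(rr_sessions), len(mesh_sessions))  # 8 6

# Growth with n VTEPs: two RRs need 2n sessions, a full mesh n(n-1)/2.
for n in (4, 10, 50):
    print(n, 2 * n, n * (n - 1) // 2)
```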

  • bb01
set protocols bgp group BGP_OVERLAY local-address 172.31.0.1

set protocols bgp group BGP_OVERLAY cluster 172.31.0.0

set protocols bgp group BGP_OVERLAY neighbor 172.16.1.1 description spine11
set protocols bgp group BGP_OVERLAY neighbor 172.16.1.2 description spine12
set protocols bgp group BGP_OVERLAY neighbor 172.16.2.1 description spine21
set protocols bgp group BGP_OVERLAY neighbor 172.16.2.2 description spine22
  • bb02
set protocols bgp group BGP_OVERLAY local-address 172.31.0.2

set protocols bgp group BGP_OVERLAY cluster 172.31.0.0

set protocols bgp group BGP_OVERLAY neighbor 172.16.1.1 description spine11
set protocols bgp group BGP_OVERLAY neighbor 172.16.1.2 description spine12
set protocols bgp group BGP_OVERLAY neighbor 172.16.2.1 description spine21
set protocols bgp group BGP_OVERLAY neighbor 172.16.2.2 description spine22
  • spine11
set protocols bgp group BGP_OVERLAY local-address 172.16.1.1

set protocols bgp group BGP_OVERLAY neighbor 172.31.0.1 description bb01
set protocols bgp group BGP_OVERLAY neighbor 172.31.0.2 description bb02
  • spine12
set protocols bgp group BGP_OVERLAY local-address 172.16.1.2

set protocols bgp group BGP_OVERLAY neighbor 172.31.0.1 description bb01
set protocols bgp group BGP_OVERLAY neighbor 172.31.0.2 description bb02
  • spine21
set protocols bgp group BGP_OVERLAY local-address 172.16.2.1

set protocols bgp group BGP_OVERLAY neighbor 172.31.0.1 description bb01
set protocols bgp group BGP_OVERLAY neighbor 172.31.0.2 description bb02
  • spine22
set protocols bgp group BGP_OVERLAY local-address 172.16.2.2

set protocols bgp group BGP_OVERLAY neighbor 172.31.0.1 description bb01
set protocols bgp group BGP_OVERLAY neighbor 172.31.0.2 description bb02

Overlay configuration and verification (EVPN+VXLAN)

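Next come the EVPN settings and the VXLAN settings used as its data plane. They are exactly the same as in the previous single-homed version.
bb0[12] act purely as a pipe for this part, so they do not appear here.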

Settings common to the four spine[12]{2}

set vlans VLAN0100 vlan-id 100
set vlans VLAN0100 vxlan vni 10100

set vlans VLAN0200 vlan-id 200
set vlans VLAN0200 vxlan vni 10200

set protocols evpn encapsulation vxlan
set protocols evpn extended-vni-list all
set protocols evpn multicast-mode ingress-replication
set protocols evpn vni-options vni 10100 vrf-target export target:1:10100
set protocols evpn vni-options vni 10200 vrf-target export target:1:10200

set policy-options community COM_10100 members target:1:10100
set policy-options community COM_10200 members target:1:10200
set policy-options community COM_LEAF_ESI members target:9999:9999

set policy-options policy-statement POLICY_VRF_IMPORT term T_10100 from community COM_10100
set policy-options policy-statement POLICY_VRF_IMPORT term T_10100 then accept
set policy-options policy-statement POLICY_VRF_IMPORT term T_10200 from community COM_10200
set policy-options policy-statement POLICY_VRF_IMPORT term T_10200 then accept
set policy-options policy-statement POLICY_VRF_IMPORT term T_99900 from community COM_LEAF_ESI
set policy-options policy-statement POLICY_VRF_IMPORT term T_99900 then accept
set policy-options policy-statement POLICY_VRF_IMPORT term T_99999 then reject

set switch-options vtep-source-interface lo0.0
set switch-options vrf-import POLICY_VRF_IMPORT
set switch-options vrf-target target:9999:9999
set switch-options vrf-target auto
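POLICY_VRF_IMPORT is an ordered list of terms in which the first match wins, ending with a catch-all reject. The Python sketch below is a hypothetical model of that evaluation, not Junos code. In the example, target:1:10999 is an invented route target used only to show the fall-through to T_99999.

```python
# Hypothetical model of POLICY_VRF_IMPORT: terms are checked in order and the
# first term whose "from community" matches decides. As far as I know, a Junos
# community with several members matches only if all are present, hence the
# subset test; here every community has a single member anyway.
TERMS = [
    ("T_10100", {"target:1:10100"}, "accept"),
    ("T_10200", {"target:1:10200"}, "accept"),
    ("T_99900", {"target:9999:9999"}, "accept"),
    ("T_99999", None, "reject"),  # no "from" clause: matches every route
]

def vrf_import(route_targets: set[str]) -> tuple[str, str]:
    for name, members, action in TERMS:
        if members is None or members <= route_targets:
            return name, action
    return ("(none)", "reject")  # unreachable here because T_99999 catches all

print(vrf_import({"target:1:10100"}))    # ('T_10100', 'accept')
print(vrf_import({"target:9999:9999"}))  # ('T_99900', 'accept')
print(vrf_import({"target:1:10999"}))    # ('T_99999', 'reject')
```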

Per-device settings

Just the RD, really.

  • spine11
set switch-options route-distinguisher 64512:11
  • spine12
set switch-options route-distinguisher 64512:12
  • spine21
set switch-options route-distinguisher 64512:21
  • spine22
set switch-options route-distinguisher 64512:22

EVPN Multihoming configuration

Finally, the main topic! (out of breath)

spine1[12] provide ae0 as a pair to torSW101a, and spine2[12] do the same for torSW201a. In EVPN terms each ae0 is treated as an Ethernet Segment (ES) and is identified by an Ethernet Segment Identifier (ESI).

Settings common to the four spine[12]{2}

First, an ordinary AE configuration.

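  • ethernet device-count is set to 448 simply because that was the maximum shown by the ? help. Better too many than too few, ha!
  • The physical interface facing torSW[12]01a (xe-0/0/2 on all four spines) is made a member of ae0
  • ae0 is a trunk that carries every VLAN in the device's own database
  • For LACP:
    • The default fast interval (1 s) is fine, so nothing is configured
    • torSW[12]01a must see the LAG spanning two separate spine chassis as a single LAG, so both spines in a pair get the same system-id
      • Each system-id only matters among three devices (two spines and one torSW), and here the same value is reused on all of spine[12]{2}. Making it unique is probably the safer choice
    • system-priority stays at the default of 127
  • The EVPN multihoming mode is active-active (esi all-active)
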
set chassis aggregated-devices ethernet device-count 448

set interfaces xe-0/0/2 ether-options 802.3ad ae0
delete interfaces xe-0/0/2 unit 0

set interfaces ae0 unit 0 family ethernet-switching interface-mode trunk
set interfaces ae0 unit 0 family ethernet-switching vlan members all

set interfaces ae0 aggregated-ether-options lacp active
set interfaces ae0 aggregated-ether-options lacp system-id 00:00:00:01:01:01

set interfaces ae0 esi all-active

Per-device settings

Now for the per-pair settings. Well, it's just the ESI.

As of 2017/02/26, Juniper's official Supported EVPN Standards page says:

RFC 7432, BGP MPLS-Based Ethernet VPN: The following features are not supported:
  • Automatic derivation of Ethernet segment (ES) values. Only static ES configurations are supported.

So assign the values manually, making sure each one is unique within the EVPN domain.

Following the Ethernet Segment section of RFC 7432, octet 1 is the ESI type, 0x00, and the remaining octets are arbitrary.
Here octet 9 identifies the spine pair and octet 10 the ae number.

  • spine1[12]
set interfaces ae0 esi 00:01:01:01:01:01:01:01:01:00
  • spine2[12]
set interfaces ae0 esi 00:01:01:01:01:01:01:01:02:00
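The numbering scheme above fits in a tiny function. make_esi is a hypothetical helper, not a Junos feature, and it simply reproduces the two values configured here:
  • type 0x00 in octet 1
  • filler 0x01 in octets 2 to 8
  • the spine pair in octet 9
  • the ae number in octet 10

```python
# Hypothetical helper reproducing this article's 10-octet ESI layout.
def make_esi(pair: int, ae: int) -> str:
    if not (0 <= pair <= 0xFF and 0 <= ae <= 0xFF):
        raise ValueError("pair and ae must each fit in one octet")
    octets = [0x00] + [0x01] * 7 + [pair, ae]  # type, filler, pair, ae
    return ":".join(f"{o:02x}" for o in octets)

print(make_esi(pair=1, ae=0))  # 00:01:01:01:01:01:01:01:01:00 (spine1[12] ae0)
print(make_esi(pair=2, ae=0))  # 00:01:01:01:01:01:01:01:02:00 (spine2[12] ae0)
```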

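Incidentally, when I changed an ESI value, the old ESI sometimes remained, perhaps because it was not withdrawn properly. I cannot tell whether that comes from my environment, a flaw in my procedure, or the intended behaviour.
After I changed the ESI for ae0 on spine1[12], both the old and the new routes remained on spine2[12].
I have not investigated why. If you can reproduce this, find the cause, and are able to share it, please let me know.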

Verification

Connectivity between nodes

Run $ sudo ip n flush dev ens4 on node11 and node21, then ping 192.168.1.2 from node11 to confirm L2-over-L3 communication.

kotetsu@node11:~$ ping 192.168.1.2
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=181 ms
64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=61.4 ms
64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=35.1 ms
64 bytes from 192.168.1.2: icmp_seq=4 ttl=64 time=42.7 ms
64 bytes from 192.168.1.2: icmp_seq=5 ttl=64 time=53.8 ms
64 bytes from 192.168.1.2: icmp_seq=6 ttl=64 time=78.2 ms
64 bytes from 192.168.1.2: icmp_seq=7 ttl=64 time=33.6 ms
64 bytes from 192.168.1.2: icmp_seq=8 ttl=64 time=35.1 ms
64 bytes from 192.168.1.2: icmp_seq=9 ttl=64 time=33.4 ms
64 bytes from 192.168.1.2: icmp_seq=10 ttl=64 time=61.7 ms
^C
--- 192.168.1.2 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9013ms
rtt min/avg/max/mdev = 33.471/61.687/181.628/42.518 ms

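One thing bothered me here. Under EVPN Multihoming aliasing, requests should be load-balanced toward both spine21 and spine22, and replies toward both spine11 and spine12. That did not happen: both directions stuck to whichever Type 2 route existed at the time.
I do not yet know whether this is caused by the environment (unsupported HW or SW) or by a missing setting. I will add an update if I find out (2017/02/27).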
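Here is a minimal sketch of what aliasing would be expected to do, based on RFC 7432. A MAC behind an all-active ESI is reachable via every VTEP that advertised an Ethernet A-D per EVI route for that ESI, not only via the next hop in the Type 2 route. The A-D table below is filled in by hand from this lab's addresses, and the function name is hypothetical. It describes the expected behaviour, not what vQFX was observed to do.

```python
from typing import Optional

# ESI -> VTEPs that advertised an Ethernet A-D per EVI route (hand-filled).
AD_PER_EVI = {
    "00:01:01:01:01:01:01:01:01:00": {"172.16.1.1", "172.16.1.2"},
    "00:01:01:01:01:01:01:01:02:00": {"172.16.2.1", "172.16.2.2"},
}

# Hypothetical resolver: the set of VTEPs that aliasing would allow
# traffic toward a MAC to be load-balanced across.
def candidate_vteps(type2_next_hop: str, esi: Optional[str]) -> list[str]:
    if esi is None:  # single-homed MAC: only the Type 2 next hop
        return [type2_next_hop]
    return sorted(AD_PER_EVI.get(esi, set()) | {type2_next_hop})

# node21's MAC as seen from spine11: Type 2 from 172.16.2.2 with ESI ...:02:00
print(candidate_vteps("172.16.2.2", "00:01:01:01:01:01:01:01:02:00"))
```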

Checking the vQFX tables

EVPN Multihoming Designated Forwarder state

For ae0 on spine1[12] (ESI 00:01:01:01:01:01:01:01:01:00), 172.16.1.1 = spine11 is elected DF.
ae0 on spine2[12] (ESI 00:01:01:01:01:01:01:01:02:00) does not involve these devices. For that segment they show only "No local attachment to ethernet segment" and no further information.

{master:0}
kotetsu@spine11> show evpn instance designated-forwarder
Instance: default-switch
  Number of ethernet segments: 2
    ESI: 00:01:01:01:01:01:01:01:01:00
      Designated forwarder: 172.16.1.1
    ESI: 00:01:01:01:01:01:01:01:02:00
      Designated forwarder: No local attachment to ethernet segment
{master:0}
kotetsu@spine12> show evpn instance designated-forwarder
Instance: default-switch
  Number of ethernet segments: 2
    ESI: 00:01:01:01:01:01:01:01:01:00
      Designated forwarder: 172.16.1.1
    ESI: 00:01:01:01:01:01:01:01:02:00
      Designated forwarder: No local attachment to ethernet segment

For ae0 on spine2[12] (ESI 00:01:01:01:01:01:01:01:02:00), 172.16.2.1 = spine21 is elected DF.
ae0 on spine1[12] (ESI 00:01:01:01:01:01:01:01:01:00) does not involve these devices. For that segment they show only "No local attachment to ethernet segment" and no further information.

{master:0}
kotetsu@spine21> show evpn instance designated-forwarder
Instance: default-switch
  Number of ethernet segments: 2
    ESI: 00:01:01:01:01:01:01:01:01:00
      Designated forwarder: No local attachment to ethernet segment
    ESI: 00:01:01:01:01:01:01:01:02:00
      Designated forwarder: 172.16.2.1
kotetsu@spine22> show evpn instance designated-forwarder
Instance: default-switch
  Number of ethernet segments: 2
    ESI: 00:01:01:01:01:01:01:01:01:00
      Designated forwarder: No local attachment to ethernet segment
    ESI: 00:01:01:01:01:01:01:01:02:00
      Designated forwarder: 172.16.2.1

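The election process is described in Juniper's official documentation under Designated Forwarder Election.
In this environment, the device with the lower IP address simply appears to have been elected DF in each pair.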
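RFC 7432's default DF election (service carving) orders the PEs attached to an ES by IP address and assigns VLAN V to ordinal V mod N. The Python sketch below implements that rule. Whether vQFX applies it per VLAN exactly this way is an assumption, since the show output lists only one DF per ESI. If it does, that would explain the observation: with N = 2 and VLANs 100 and 200 both even, the lower address wins both times.

```python
import ipaddress

# Sketch of RFC 7432 default DF election: sort PEs numerically by IP
# (not as strings), then VLAN V is served by ordinal V mod N.
def designated_forwarder(pe_ips: list[str], vlan: int) -> str:
    ordered = sorted(pe_ips, key=ipaddress.ip_address)
    return ordered[vlan % len(ordered)]

for vlan in (100, 200):
    print(vlan, designated_forwarder(["172.16.1.2", "172.16.1.1"], vlan))
# 100 172.16.1.1
# 200 172.16.1.1
```

Under the same rule, an odd VLAN such as 101 would be carved to the higher address. That is a prediction of the sketch, not something tested in this lab.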

The following command shows somewhat more detail under Number of ethernet segments, near the bottom of the output.

{master:0}
kotetsu@spine11> show evpn instance extensive
Instance: __default_evpn__
  Route Distinguisher: 172.16.1.1:0
  Number of bridge domains: 0
  Number of neighbors: 1
    172.16.1.2
      Received routes
        Ethernet Segment:                       1

Instance: default-switch
  Route Distinguisher: 64512:11
  Encapsulation type: VXLAN
  MAC database status                     Local  Remote
    MAC advertisements:                       1       1
    MAC+IP advertisements:                    0       0
    Default gateway MAC advertisements:       0       0
  Number of local interfaces: 1 (1 up)
    Interface name  ESI                            Mode             Status
    ae0.0           00:01:01:01:01:01:01:01:01:00  all-active       Up
  Number of IRB interfaces: 0 (0 up)
  Number of bridge domains: 3
    VLAN  VNI    Intfs / up    IRB intf   Mode             MAC sync  IM route label
    100   10100      1   1                Extended         Enabled   10100
    200   10200      1   1                Extended         Enabled   10200
    300   10300      1   1                Extended         Enabled   10300
  Number of neighbors: 3
    172.16.1.2
      Received routes
        MAC address advertisement:              0
        MAC+IP address advertisement:           0
        Inclusive multicast:                    2
        Ethernet auto-discovery:                2
    172.16.2.1
      Received routes
        MAC address advertisement:              0
        MAC+IP address advertisement:           0
        Inclusive multicast:                    2
        Ethernet auto-discovery:                2
    172.16.2.2
      Received routes
        MAC address advertisement:              1
        MAC+IP address advertisement:           0
        Inclusive multicast:                    2
        Ethernet auto-discovery:                2
  Number of ethernet segments: 2
    ESI: 00:01:01:01:01:01:01:01:01:00
      Status: Resolved by IFL ae0.0
      Local interface: ae0.0, Status: Up/Forwarding
      Number of remote PEs connected: 1
        Remote PE        MAC label  Aliasing label  Mode
        172.16.1.2       0          0               all-active
      Designated forwarder: 172.16.1.1
      Backup forwarder: 172.16.1.2
    ESI: 00:01:01:01:01:01:01:01:02:00
      Status: Resolved
      Number of remote PEs connected: 2
        Remote PE        MAC label  Aliasing label  Mode
        172.16.2.1       0          0               all-active
        172.16.2.2       10100      0               all-active
  Router-ID: 172.16.1.1
  Source VTEP interface IP: 172.16.1.1
{master:0}
kotetsu@spine12> show evpn instance extensive
Instance: __default_evpn__
  Route Distinguisher: 172.16.1.2:0
  Number of bridge domains: 0
  Number of neighbors: 1
    172.16.1.1
      Received routes
        Ethernet Segment:                       1

Instance: default-switch
  Route Distinguisher: 64512:12
  Encapsulation type: VXLAN
  MAC database status                     Local  Remote
    MAC advertisements:                       0       2
    MAC+IP advertisements:                    0       0
    Default gateway MAC advertisements:       0       0
  Number of local interfaces: 1 (1 up)
    Interface name  ESI                            Mode             Status
    ae0.0           00:01:01:01:01:01:01:01:01:00  all-active       Up
  Number of IRB interfaces: 0 (0 up)
  Number of bridge domains: 2
    VLAN  VNI    Intfs / up    IRB intf   Mode             MAC sync  IM route label
    100   10100      1   1                Extended         Enabled   10100
    200   10200      1   1                Extended         Enabled   10200
  Number of neighbors: 3
    172.16.1.1
      Received routes
        MAC address advertisement:              1
        MAC+IP address advertisement:           0
        Inclusive multicast:                    2
        Ethernet auto-discovery:                2
    172.16.2.1
      Received routes
        MAC address advertisement:              0
        MAC+IP address advertisement:           0
        Inclusive multicast:                    2
        Ethernet auto-discovery:                2
    172.16.2.2
      Received routes
        MAC address advertisement:              1
        MAC+IP address advertisement:           0
        Inclusive multicast:                    2
        Ethernet auto-discovery:                2
  Number of ethernet segments: 2
    ESI: 00:01:01:01:01:01:01:01:01:00
      Status: Resolved by IFL ae0.0
      Local interface: ae0.0, Status: Up/Forwarding
      Number of remote PEs connected: 1
        Remote PE        MAC label  Aliasing label  Mode
        172.16.1.1       10100      0               all-active
      Designated forwarder: 172.16.1.1
      Backup forwarder: 172.16.1.2
    ESI: 00:01:01:01:01:01:01:01:02:00
      Status: Resolved
      Number of remote PEs connected: 2
        Remote PE        MAC label  Aliasing label  Mode
        172.16.2.2       10100      0               all-active
        172.16.2.1       0          0               all-active
  Router-ID: 172.16.1.2
  Source VTEP interface IP: 172.16.1.2
{master:0}
kotetsu@spine21> show evpn instance extensive
Instance: __default_evpn__
  Route Distinguisher: 172.16.2.1:0
  Number of bridge domains: 0
  Number of neighbors: 1
    172.16.2.2
      Received routes
        Ethernet Segment:                       1

Instance: default-switch
  Route Distinguisher: 64512:21
  Encapsulation type: VXLAN
  MAC database status                     Local  Remote
    MAC advertisements:                       0       2
    MAC+IP advertisements:                    0       0
    Default gateway MAC advertisements:       0       0
  Number of local interfaces: 1 (1 up)
    Interface name  ESI                            Mode             Status
    ae0.0           00:01:01:01:01:01:01:01:02:00  all-active       Up
  Number of IRB interfaces: 0 (0 up)
  Number of bridge domains: 2
    VLAN  VNI    Intfs / up    IRB intf   Mode             MAC sync  IM route label
    100   10100      1   1                Extended         Enabled   10100
    200   10200      1   1                Extended         Enabled   10200
  Number of neighbors: 3
    172.16.1.1
      Received routes
        MAC address advertisement:              1
        MAC+IP address advertisement:           0
        Inclusive multicast:                    2
        Ethernet auto-discovery:                2
    172.16.1.2
      Received routes
        MAC address advertisement:              0
        MAC+IP address advertisement:           0
        Inclusive multicast:                    2
        Ethernet auto-discovery:                2
    172.16.2.2
      Received routes
        MAC address advertisement:              1
        MAC+IP address advertisement:           0
        Inclusive multicast:                    2
        Ethernet auto-discovery:                2
  Number of ethernet segments: 2
    ESI: 00:01:01:01:01:01:01:01:01:00
      Status: Resolved
      Number of remote PEs connected: 2
        Remote PE        MAC label  Aliasing label  Mode
        172.16.1.1       10100      0               all-active
        172.16.1.2       0          0               all-active
    ESI: 00:01:01:01:01:01:01:01:02:00
      Status: Resolved by IFL ae0.0
      Local interface: ae0.0, Status: Up/Forwarding
      Number of remote PEs connected: 1
        Remote PE        MAC label  Aliasing label  Mode
        172.16.2.2       10100      0               all-active
      Designated forwarder: 172.16.2.1
      Backup forwarder: 172.16.2.2
  Router-ID: 172.16.2.1
  Source VTEP interface IP: 172.16.2.1
{master:0}
kotetsu@spine22> show evpn instance extensive
Instance: __default_evpn__
  Route Distinguisher: 172.16.2.2:0
  Number of bridge domains: 0
  Number of neighbors: 1
    172.16.2.1
      Received routes
        Ethernet Segment:                       1

Instance: default-switch
  Route Distinguisher: 64512:22
  Encapsulation type: VXLAN
  MAC database status                     Local  Remote
    MAC advertisements:                       1       1
    MAC+IP advertisements:                    0       0
    Default gateway MAC advertisements:       0       0
  Number of local interfaces: 1 (1 up)
    Interface name  ESI                            Mode             Status
    ae0.0           00:01:01:01:01:01:01:01:02:00  all-active       Up
  Number of IRB interfaces: 0 (0 up)
  Number of bridge domains: 2
    VLAN  VNI    Intfs / up    IRB intf   Mode             MAC sync  IM route label
    100   10100      1   1                Extended         Enabled   10100
    200   10200      1   1                Extended         Enabled   10200
  Number of neighbors: 3
    172.16.1.1
      Received routes
        MAC address advertisement:              1
        MAC+IP address advertisement:           0
        Inclusive multicast:                    2
        Ethernet auto-discovery:                2
    172.16.1.2
      Received routes
        MAC address advertisement:              0
        MAC+IP address advertisement:           0
        Inclusive multicast:                    2
        Ethernet auto-discovery:                2
    172.16.2.1
      Received routes
        MAC address advertisement:              0
        MAC+IP address advertisement:           0
        Inclusive multicast:                    2
        Ethernet auto-discovery:                2
  Number of ethernet segments: 2
    ESI: 00:01:01:01:01:01:01:01:01:00
      Status: Resolved
      Number of remote PEs connected: 2
        Remote PE        MAC label  Aliasing label  Mode
        172.16.1.1       10100      0               all-active
        172.16.1.2       0          0               all-active
    ESI: 00:01:01:01:01:01:01:01:02:00
      Status: Resolved by IFL ae0.0
      Local interface: ae0.0, Status: Up/Forwarding
      Number of remote PEs connected: 1
        Remote PE        MAC label  Aliasing label  Mode
        172.16.2.1       0          0               all-active
      Designated forwarder: 172.16.2.1
      Backup forwarder: 172.16.2.2
  Router-ID: 172.16.2.2
  Source VTEP interface IP: 172.16.2.2

MAC address table

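Configuring EVPN Multihoming changes the table as follows:

  • L (locally learned) and R (remote PE MAC) now appear in the MAC flags column
  • In the Logical interface column:
    • MACs learned by the device itself or its multihoming peer show the local AE interface name
    • MACs learned by the remote multihoming pair show an ESI interface name
  • The Active source column shows the ESI instead of the remote VTEP IP address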

For example, seen from the spine1[12] multihoming pair:

  • node11's MAC address 00:37:c4:e2:60:01 was learned by spine11, so
    • spine11 shows the L flag
    • spine12 shows the R flag
  • node21's MAC address 00:37:c4:46:d8:01 was learned from ESI 00:01:01:01:01:01:01:01:02:00 on spine2[12], so
    • both spine1[12] show the R flag
    • both spine1[12] show the ESI configured on spine2[12], 00:01:01:01:01:01:01:01:02:00, as the Active source

{master:0}
kotetsu@spine11> show ethernet-switching table

MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static
           SE - statistics enabled, NM - non configured MAC, R - remote PE MAC, O - ovsdb MAC)


Ethernet switching table : 2 entries, 2 learned
Routing instance : default-switch
   Vlan                MAC                 MAC      Logical                Active
   name                address             flags    interface              source
   VLAN0100            00:37:c4:46:d8:01   DR       esi.1739               00:01:01:01:01:01:01:01:02:00
   VLAN0100            00:37:c4:e2:60:01   DL       ae0.0
{master:0}
kotetsu@spine12> show ethernet-switching table

MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static
           SE - statistics enabled, NM - non configured MAC, R - remote PE MAC, O - ovsdb MAC)


Ethernet switching table : 2 entries, 2 learned
Routing instance : default-switch
   Vlan                MAC                 MAC      Logical                Active
   name                address             flags    interface              source
   VLAN0100            00:37:c4:46:d8:01   DR       esi.1736               00:01:01:01:01:01:01:01:02:00
   VLAN0100            00:37:c4:e2:60:01   DR       ae0.0

The same holds when viewed from the spine2[12] multihoming pair.

{master:0}
kotetsu@spine21> show ethernet-switching table

MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static
           SE - statistics enabled, NM - non configured MAC, R - remote PE MAC, O - ovsdb MAC)


Ethernet switching table : 2 entries, 2 learned
Routing instance : default-switch
   Vlan                MAC                 MAC      Logical                Active
   name                address             flags    interface              source
   VLAN0100            00:37:c4:46:d8:01   DR       ae0.0
   VLAN0100            00:37:c4:e2:60:01   DR       esi.1742               00:01:01:01:01:01:01:01:01:00
{master:0}
kotetsu@spine22> show ethernet-switching table

MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static
           SE - statistics enabled, NM - non configured MAC, R - remote PE MAC, O - ovsdb MAC)


Ethernet switching table : 2 entries, 2 learned
Routing instance : default-switch
   Vlan                MAC                 MAC      Logical                Active
   name                address             flags    interface              source
   VLAN0100            00:37:c4:46:d8:01   DL       ae0.0
   VLAN0100            00:37:c4:e2:60:01   DR       esi.1742               00:01:01:01:01:01:01:01:01:00
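Reading the flag columns by eye gets tedious across four switches. As a minimal sketch (a helper of my own, not a Junos tool), the Python below splits one data row of show ethernet-switching table into its columns. It assumes whitespace-separated fields in the order printed above, and that the Active source column appears only on entries resolved through an ESI.

```python
def parse_mac_row(row: str) -> dict:
    """Parse one data row of `show ethernet-switching table` (Junos).

    Assumed column order, as in the output above: VLAN name, MAC address,
    MAC flags, logical interface and, only for ESI-resolved entries,
    the Active source (the ESI).
    """
    vlan, mac, flags, interface, *rest = row.split()
    return {
        "vlan": vlan,
        "mac": mac,
        "flags": flags,
        "interface": interface,
        "active_source": rest[0] if rest else None,
        # From the legend: L = locally learned, R = remote PE MAC.
        # None of the other flags (S, D, P, SE, NM, O) contain L or R,
        # so a plain membership test is enough.
        "local": "L" in flags,
        "remote": "R" in flags,
    }


spine12_rows = [
    "   VLAN0100            00:37:c4:46:d8:01   DR       esi.1736               00:01:01:01:01:01:01:01:02:00",
    "   VLAN0100            00:37:c4:e2:60:01   DR       ae0.0",
]
for entry in map(parse_mac_row, spine12_rows):
    kind = "remote" if entry["remote"] else "local"
    print(entry["mac"], entry["interface"], kind, entry["active_source"])
```

Run on the spine12 rows, both entries come out as remote. 00:37:c4:e2:60:01 carries the R flag while sitting on the local ae0.0, presumably because spine12 shares that ESI with spine11.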

EVPN NLRIs used for multihoming

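Lining up the tables from all four switches would be hard to read, and there isn't much meaningful difference between them anyway, so I'll use spine11 as the representative.
It helps to read the tables side by side with Juniper official docs / EVPN Multihoming Overview - New BGP NLRIs.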

First, Type 1 (Ethernet Auto-Discovery (A-D) route) shows up, which never appeared in the single-homed setup.
To begin with, spine11 has learned an Autodiscovery route per EVPN instance (EVI) from each PE. This one apparently appears only in active-active mode.

{master:0}
kotetsu@spine11> show route table default-switch.evpn.0

default-switch.evpn.0: 18 destinations, 32 routes (18 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

1:64512:11::010101010101010100::0/304
                   *[EVPN/170] 6d 13:23:27
                      Indirect
1:64512:12::010101010101010100::0/304
                   *[BGP/170] 12:29:57, localpref 100, from 172.31.0.1
                      AS path: I, validation-state: unverified
                      to 192.0.2.0 via xe-0/0/0.0
                    > to 192.0.2.128 via xe-0/0/1.0
                    [BGP/170] 03:36:07, localpref 100, from 172.31.0.2
                      AS path: I, validation-state: unverified
                      to 192.0.2.0 via xe-0/0/0.0
                    > to 192.0.2.128 via xe-0/0/1.0
1:64512:21::010101010101010200::0/304
                   *[BGP/170] 2d 17:29:38, localpref 100, from 172.31.0.1
                      AS path: I, validation-state: unverified
                    > to 192.0.2.0 via xe-0/0/0.0
                      to 192.0.2.128 via xe-0/0/1.0
                    [BGP/170] 22:46:20, localpref 100, from 172.31.0.2
                      AS path: I, validation-state: unverified
                    > to 192.0.2.0 via xe-0/0/0.0
                      to 192.0.2.128 via xe-0/0/1.0
1:64512:22::010101010101010200::0/304
                   *[BGP/170] 05:04:07, localpref 100, from 172.31.0.1
                      AS path: I, validation-state: unverified
                    > to 192.0.2.0 via xe-0/0/0.0
                      to 192.0.2.128 via xe-0/0/1.0
                    [BGP/170] 1d 00:56:03, localpref 100, from 172.31.0.2
                      AS path: I, validation-state: unverified
                    > to 192.0.2.0 via xe-0/0/0.0
                      to 192.0.2.128 via xe-0/0/1.0

Also Type 1 (Ethernet Auto-Discovery (A-D) route), this time the Autodiscovery route per Ethernet segment variety.

{master:0}
kotetsu@spine11> show route table default-switch.evpn.0

default-switch.evpn.0: 18 destinations, 32 routes (18 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

...

1:172.16.1.2:0::010101010101010100::FFFF:FFFF/304
                   *[BGP/170] 12:29:57, localpref 100, from 172.31.0.1
                      AS path: I, validation-state: unverified
                    > to 192.0.2.0 via xe-0/0/0.0
                      to 192.0.2.128 via xe-0/0/1.0
                    [BGP/170] 03:36:07, localpref 100, from 172.31.0.2
                      AS path: I, validation-state: unverified
                    > to 192.0.2.0 via xe-0/0/0.0
                      to 192.0.2.128 via xe-0/0/1.0
1:172.16.2.1:0::010101010101010200::FFFF:FFFF/304
                   *[BGP/170] 2d 17:29:38, localpref 100, from 172.31.0.1
                      AS path: I, validation-state: unverified
                    > to 192.0.2.0 via xe-0/0/0.0
                      to 192.0.2.128 via xe-0/0/1.0
                    [BGP/170] 22:46:20, localpref 100, from 172.31.0.2
                      AS path: I, validation-state: unverified
                    > to 192.0.2.0 via xe-0/0/0.0
                      to 192.0.2.128 via xe-0/0/1.0
1:172.16.2.2:0::010101010101010200::FFFF:FFFF/304
                   *[BGP/170] 05:04:05, localpref 100, from 172.31.0.1
                      AS path: I, validation-state: unverified
                    > to 192.0.2.0 via xe-0/0/0.0
                      to 192.0.2.128 via xe-0/0/1.0
                    [BGP/170] 1d 00:56:03, localpref 100, from 172.31.0.2
                      AS path: I, validation-state: unverified
                    > to 192.0.2.0 via xe-0/0/0.0
                      to 192.0.2.128 via xe-0/0/1.0
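The two flavors of Type 1 route differ in the Ethernet Tag field: per RFC 7432 the per-ES route carries MAX-ET (printed as FFFF:FFFF), while the per-EVI routes here carry 0. A minimal sketch that tells them apart, assuming the Junos prefix layout type:RD::ESI::tag/304 seen above. The ESI appears in these prefixes without its leading 00 octet (compare 010101010101010100 with 00:01:01:01:01:01:01:01:01:00 in the MAC table), so the helper pads it back to 10 octets.

```python
def format_esi(raw: str) -> str:
    """Render an ESI as 10 colon-separated octets, restoring dropped leading zeros."""
    digits = raw.zfill(20)
    return ":".join(digits[i:i + 2] for i in range(0, 20, 2))


def parse_type1(prefix: str) -> dict:
    """Parse e.g. '1:64512:11::010101010101010100::0/304' (assumed Junos layout)."""
    head, esi, tag = prefix.split("::")
    route_type, rd = head.split(":", 1)
    if route_type != "1":
        raise ValueError(f"not a Type 1 route: {prefix}")
    tag = tag.split("/")[0]
    # RFC 7432: MAX-ET (0xFFFFFFFF) marks the A-D per ES route;
    # any other tag (0 in this lab) is an A-D per EVI route.
    kind = "per-ES" if tag.upper() == "FFFF:FFFF" else "per-EVI"
    return {"rd": rd, "esi": format_esi(esi), "ethernet_tag": tag, "kind": kind}


for p in ("1:64512:12::010101010101010100::0/304",
          "1:172.16.2.1:0::010101010101010200::FFFF:FFFF/304"):
    r = parse_type1(p)
    print(r["kind"], r["rd"], r["esi"])
```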

Next, Type 4 (Ethernet Segment route), which is apparently what PEs use to identify the other PEs sharing the same ESI.

{master:0}
kotetsu@spine11> show route table __default_evpn__.evpn.0

__default_evpn__.evpn.0: 5 destinations, 8 routes (5 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

1:172.16.1.1:0::010101010101010100::FFFF:FFFF/304
                   *[EVPN/170] 6d 13:58:52
                      Indirect
4:172.16.1.1:0::010101010101010100:172.16.1.1/304
                   *[EVPN/170] 6d 13:58:52
                      Indirect
4:172.16.1.2:0::010101010101010100:172.16.1.2/304
                   *[BGP/170] 13:05:22, localpref 100, from 172.31.0.1
                      AS path: I, validation-state: unverified
                    > to 192.0.2.0 via xe-0/0/0.0
                      to 192.0.2.128 via xe-0/0/1.0
                    [BGP/170] 04:11:32, localpref 100, from 172.31.0.2
                      AS path: I, validation-state: unverified
                    > to 192.0.2.0 via xe-0/0/0.0
                      to 192.0.2.128 via xe-0/0/1.0
4:172.16.2.1:0::010101010101010200:172.16.2.1/304
                   *[BGP/170] 2d 18:05:03, localpref 100, from 172.31.0.1
                      AS path: I, validation-state: unverified
                    > to 192.0.2.0 via xe-0/0/0.0
                      to 192.0.2.128 via xe-0/0/1.0
                    [BGP/170] 23:21:45, localpref 100, from 172.31.0.2
                      AS path: I, validation-state: unverified
                    > to 192.0.2.0 via xe-0/0/0.0
                      to 192.0.2.128 via xe-0/0/1.0
4:172.16.2.2:0::010101010101010200:172.16.2.2/304
                   *[BGP/170] 05:39:30, localpref 100, from 172.31.0.1
                      AS path: I, validation-state: unverified
                    > to 192.0.2.0 via xe-0/0/0.0
                      to 192.0.2.128 via xe-0/0/1.0
                    [BGP/170] 1d 01:31:28, localpref 100, from 172.31.0.2
                      AS path: I, validation-state: unverified
                    > to 192.0.2.0 via xe-0/0/0.0
                      to 192.0.2.128 via xe-0/0/1.0
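Since Type 4 routes are how PEs discover each other on the same ESI, grouping the originator addresses by ESI reproduces the pairing in the table above. A sketch assuming the prefix layout 4:RD::ESI:originator-IP/304 as printed by this vQFX (pes_by_esi is a hypothetical helper name):

```python
from collections import defaultdict
from ipaddress import IPv4Address


def parse_type4(prefix: str) -> tuple:
    """Return (padded ESI hex, originator IP) from a Type 4 prefix."""
    head, tail = prefix.split("::")
    if not head.startswith("4:"):
        raise ValueError(f"not a Type 4 route: {prefix}")
    esi, originator = tail.split("/")[0].split(":", 1)
    return esi.zfill(20), originator


def pes_by_esi(prefixes) -> dict:
    """Map each ESI to the PEs advertising it, sorted numerically by IP."""
    groups = defaultdict(list)
    for p in prefixes:
        esi, originator = parse_type4(p)
        groups[esi].append(originator)
    return {esi: sorted(ips, key=IPv4Address) for esi, ips in groups.items()}


type4_routes = [
    "4:172.16.1.1:0::010101010101010100:172.16.1.1/304",
    "4:172.16.1.2:0::010101010101010100:172.16.1.2/304",
    "4:172.16.2.1:0::010101010101010200:172.16.2.1/304",
    "4:172.16.2.2:0::010101010101010200:172.16.2.2/304",
]
for esi, pes in pes_by_esi(type4_routes).items():
    print(esi, pes)
```

For the four Type 4 routes above, this pairs 172.16.1.1/172.16.1.2 (spine1[12]) on one ESI and 172.16.2.1/172.16.2.2 (spine2[12]) on the other.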

Checking the BUM traffic flow

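I captured packets to watch the behavior where only the Designated Forwarder (DF) elected for the Ethernet Segment (ES) forwards BUM traffic toward the CE.
That said, lining up the captures from every link here wouldn't be very readable, so here's just a simple diagram summarizing what the captures showed...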

f:id:kakkotetsu:20170513141356p:plain
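How the DF gets chosen isn't visible in the diagram. For reference, RFC 7432 (section 8.5) defines a default "service carving" election: the PEs on an ES, learned via Type 4 routes, are ordered by increasing IP address and numbered 0..N-1, and PE number (V mod N) becomes DF for Ethernet tag V. The sketch below implements only that default; whether vQFX uses it here, and whether it feeds in the VLAN ID or the VNI, are assumptions I haven't checked against the captures.

```python
from ipaddress import IPv4Address


def elect_df(pe_addresses, ethernet_tag: int) -> str:
    """RFC 7432 default DF election (modulo-based service carving).

    pe_addresses: originator IPs of the PEs attached to one ES.
    ethernet_tag: the VLAN (or other tag) the election is run for.
    """
    # Order numerically, not as strings ("172.16.1.10" > "172.16.1.2").
    ordered = sorted(pe_addresses, key=IPv4Address)
    return ordered[ethernet_tag % len(ordered)]


pes = ["172.16.1.2", "172.16.1.1"]  # the two PEs on ESI ...01:01:00
print(elect_df(pes, 100), elect_df(pes, 10100))
```

Both VLAN 100 and VNI 10100 are even, so with two PEs this sketch picks 172.16.1.1 either way; I haven't matched that against the diagram.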

Behavior on an Ethernet Segment link failure (quick check)

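I have zero intention of doing serious failure testing in a virtual environment, so this is a loose check along the lines of "ah, taking down one link of the multihomed bundle does switch the path" and "ah, the routes get advertised as expected".
In this environment there's an Ethernet Switch for packet capture between every spine and torSW, so the far end can't detect the link going down at all; torSW simply detects it via LACP and switches over...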

While pinging node21 from node11, disable xe-0/0/2 (to torSW101a) on spine11.

{master:0}[edit]
kotetsu@spine11# show | compare
[edit interfaces xe-0/0/2]
+   disable;

Don't take the switchover time as a reference, since it depends heavily on the environment. I only looked at whether it really does switch over.

kotetsu@node11:~$ ping 192.168.1.2
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=49.3 ms
64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=60.0 ms

...

64 bytes from 192.168.1.2: icmp_seq=59 ttl=64 time=58.4 ms
64 bytes from 192.168.1.2: icmp_seq=60 ttl=64 time=75.0 ms
64 bytes from 192.168.1.2: icmp_seq=64 ttl=64 time=64.3 ms
64 bytes from 192.168.1.2: icmp_seq=65 ttl=64 time=70.8 ms
64 bytes from 192.168.1.2: icmp_seq=66 ttl=64 time=44.2 ms

...

--- 192.168.1.2 ping statistics ---
68 packets transmitted, 65 received, 4% packet loss, time 67123ms
rtt min/avg/max/mdev = 27.724/44.214/92.524/13.438 ms
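The three lost replies in the failover test can be read off the icmp_seq gap between 60 and 64, and they account for the loss figure: (68 - 65) / 68 is about 4.4%, shown as 4%. A small sketch with hypothetical helper names, fed only the contiguous lines shown above:

```python
import re

SEQ_RE = re.compile(r"icmp_seq=(\d+)")


def missing_seqs(lines):
    """icmp_seq numbers absent between the lowest and highest seq seen."""
    seen = {int(m.group(1)) for line in lines if (m := SEQ_RE.search(line))}
    return [s for s in range(min(seen), max(seen) + 1) if s not in seen]


def loss_percent(transmitted: int, received: int) -> int:
    """Packet loss as a truncated integer percentage."""
    return (transmitted - received) * 100 // transmitted


failover_tail = [
    "64 bytes from 192.168.1.2: icmp_seq=59 ttl=64 time=58.4 ms",
    "64 bytes from 192.168.1.2: icmp_seq=60 ttl=64 time=75.0 ms",
    "64 bytes from 192.168.1.2: icmp_seq=64 ttl=64 time=64.3 ms",
    "64 bytes from 192.168.1.2: icmp_seq=65 ttl=64 time=70.8 ms",
    "64 bytes from 192.168.1.2: icmp_seq=66 ttl=64 time=44.2 ms",
]
print(missing_seqs(failover_tail), loss_percent(68, 65))
```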

After restoring the link:

--- 192.168.1.2 ping statistics ---
60 packets transmitted, 60 received, 0% packet loss, time 59093ms
rtt min/avg/max/mdev = 28.324/45.522/79.932/12.499 ms

I also captured the UPDATEs and withdrawals sent by spine11 during this whole sequence, so here they are.

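First, the withdrawals when the link is disabled.
Type 1, then Type 4, then Type 3 and 2 appear to be withdrawn in turn. (On a link failure, before all the Type 2 and 3 routes are withdrawn, the PE first broadly tells the other PEs via Type 1 that the ES is dead, so they stop sending traffic to it as soon as possible. In other words, Type 1 goes first to encourage a fast switchover.)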

f:id:kakkotetsu:20170513141454p:plain

f:id:kakkotetsu:20170513141503p:plain

f:id:kakkotetsu:20170513141511p:plain

f:id:kakkotetsu:20170513141518p:plain

Next, the UPDATEs when the link is restored.
Type 1, then Type 4, then Type 3 appear to be advertised in turn.

f:id:kakkotetsu:20170513141543p:plain

f:id:kakkotetsu:20170513141552p:plain

f:id:kakkotetsu:20170513141559p:plain

f:id:kakkotetsu:20170513141607p:plain

Verification (the traffic "Bai-Bai-N" multiplication problem~~~ (in Doraemon's voice))

After running clear ethernet-switching table on all four spines, with each of node[12]1 holding the other's MAC address in its ARP table:

kotetsu@node11:~$ ip n show
10.0.0.254 dev ens3 lladdr 00:a0:de:c0:55:80 REACHABLE
192.168.1.2 dev ens4 lladdr 00:37:c4:46:d8:01 STALE
10.0.0.65 dev ens3 lladdr 52:54:00:07:30:c0 DELAY
kotetsu@node21:~$ ip n show
192.168.1.1 dev ens4 lladdr 00:37:c4:e2:60:01 STALE
10.0.0.254 dev ens3 lladdr 00:a0:de:c0:55:80 STALE
10.0.0.65 dev ens3 lladdr 52:54:00:07:30:c0 DELAY

Duplicates show up.

kotetsu@node11:~$ ping 192.168.1.2
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=85.3 ms
64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=88.3 ms
64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=101 ms (DUP!)
64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=103 ms (DUP!)
64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=130 ms (DUP!)
64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=78.4 ms
64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=81.7 ms (DUP!)
64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=128 ms (DUP!)
64 bytes from 192.168.1.2: icmp_seq=4 ttl=64 time=43.2 ms
64 bytes from 192.168.1.2: icmp_seq=5 ttl=64 time=51.9 ms
64 bytes from 192.168.1.2: icmp_seq=6 ttl=64 time=46.1 ms
64 bytes from 192.168.1.2: icmp_seq=7 ttl=64 time=37.7 ms
^C
--- 192.168.1.2 ping statistics ---
7 packets transmitted, 7 received, +5 duplicates, 0% packet loss, time 6009ms
rtt min/avg/max/mdev = 37.764/81.372/130.159/30.336 ms
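Counting replies per icmp_seq makes the duplication pattern explicit: Sequence No.2 gets four replies (the four Echo Replies node11 is found to receive in the analysis that follows) and Sequence No.3 gets three, five duplicates in total, as ping reports. A sketch over the output lines above (replies_per_seq and duplicates are hypothetical names):

```python
import re
from collections import Counter


def replies_per_seq(lines) -> Counter:
    """Count Echo Replies per icmp_seq in Linux ping output."""
    counts = Counter()
    for line in lines:
        m = re.search(r"icmp_seq=(\d+)", line)
        if m:
            counts[int(m.group(1))] += 1
    return counts


def duplicates(lines) -> dict:
    """Extra replies beyond the first, keyed by icmp_seq."""
    return {seq: n - 1 for seq, n in sorted(replies_per_seq(lines).items()) if n > 1}


dup_run = """\
64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=85.3 ms
64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=88.3 ms
64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=101 ms (DUP!)
64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=103 ms (DUP!)
64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=130 ms (DUP!)
64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=78.4 ms
64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=81.7 ms (DUP!)
64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=128 ms (DUP!)
64 bytes from 192.168.1.2: icmp_seq=4 ttl=64 time=43.2 ms
64 bytes from 192.168.1.2: icmp_seq=5 ttl=64 time=51.9 ms
64 bytes from 192.168.1.2: icmp_seq=6 ttl=64 time=46.1 ms
64 bytes from 192.168.1.2: icmp_seq=7 ttl=64 time=37.7 ms
""".splitlines()
print(duplicates(dup_run))
```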

Capturing packets and tracing the worst case, ICMP Echo (Request|Reply) Sequence No.2, gives this:

  • node21
    • Received the Sequence No.2 ICMP Echo Request twice
  • node11
    • Sent only one Sequence No.2 ICMP Echo Request
    • Received four Sequence No.2 ICMP Echo Replies
  • torSW101a
    • Forwarded the Sequence No.2 ICMP Echo Request to spine12
  • spine12
    • Sent the Sequence No.2 ICMP Echo Request by Ingress Replication
    • At this point it doesn't yet know which VTEP node21's MAC address sits behind, so the frame is treated as Unknown Unicast
    • Right after sending the Sequence No.2 ICMP Echo Request, it received an EVPN NLRI Type 2 advertisement for node21's MAC address from spine22, and from then on sends only to spine22 without Ingress Replication
  • bb01
    • Forwarded the Sequence No.2 ICMP Echo Request as-is to spine11, spine21 and spine22
  • spine21, spine22
    • The encapsulated Sequence No.2 ICMP Echo Request has node21 as its DstMAC and is unicast traffic to a host behind their own ES, so each forwards it to torSW201a (it's also possible that spine21 hadn't learned node21's MAC address yet at this point and forwarded it as Unknown Unicast in its role as DF for the multihomed segment)
    • As EVPN Multihoming behavior, this is correct

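That's the situation.
With the current implementation I'd have to call this unavoidable... I guess? Then again, it's probably not a scenario that comes up much. I casually hit clear ethernet-switching table all the time (well, of course, it's a lab).
Even with the EVPN Multihoming mode set to active-standby, the spine2[12] behavior described above doesn't change; they seem to expect the Remote VTEP (spine12) to send only to the DF...
Is this problem, or a fix for it, discussed in some RFC or draft? Or do vendor implementations work around it somehow? If anyone is following this, please let me know...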

For reference, below is the packet capture between spine12 and bb01, showing spine12 receiving the EVPN NLRI Type 2 advertisement for node21's MAC address from spine21 right after ingress-replicating the Sequence No.2 ICMP Echo Request.

f:id:kakkotetsu:20170513141645p:plain

That's it.

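Summary

  • Tried the basics of EVPN Multihoming and confirmed that it works correctly at a minimal level
  • What RFC7432 (BGP MPLS-Based Ethernet VPN) describes under Multihoming (and what is implemented) goes well beyond what I tried here, so there's plenty more to play with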

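Impressions

  • As a way to make VTEPs redundant, I like that in EVPN Multihoming each PE runs fairly independently. Relying on a proprietary mechanism tends to make upgrades painful, since you're stuck with "the version combinations that can form that redundancy protocol"...
    • Maybe it's my bias, but the L2 and L4 worlds seem to suffer quite a bit from that...
    • For example, sticking with Juniper, what about MC-LAG? Do people run it while worrying about which version combinations can form a pair?
  • vQFX is great... I've said it a million times, but being able to freely investigate and test this stuff at home really lowers the barrier.
    • At this point I'd also like virtual versions of QFX5110 or QFX5200 for the torSW[12]01a role in this article
  • If you're into this kind of niche topic, I also recommend O'Reilly's "Juniper QFX10000 Series" mentioned at the beginning!
    • The story behind their custom processor and its features, fairly detailed explanations of the architecture (HW/SW), MPLS/VXLAN+EVPN design points and so on make it fun to read even if you don't own the box
  • Juniper QFX10000 rocks! Everybody go buy one!! (I can't)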

VXLAN+EVPN on vQFX10000 (evpn-inter-subnet-forwarding (Symmetric) edition) (original: 2017/01/09)

This article is a copy of one I wrote elsewhere on 2017/01/09.
So as of 2017/05/13 it contains some slightly outdated information (example below).

  • GNS3 ver 2.0 stable was released in May 2017


Introduction

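A while ago (Running vQFX10000 on KVM+GNS3) I got personal download access to Juniper vQFX10000 (hereafter vQFX) and did a quick sanity check on GNS3, and last time (VXLAN+EVPN on vQFX10000 (L2 over L3 edition)) I confirmed that the L2VPN features work even on the virtual version.
The QFX10000 series apparently supports Symmetric evpn-inter-subnet-forwarding with EVPN NLRI Type 5 routes (see the draft in the references), so this time I try that on vQFX. (What a long backstory...)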

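What this post covers

  • Run EVPN as the VXLAN control plane on vQFX
  • Run evpn-inter-subnet-forwarding (Symmetric) using EVPN NLRI Type 5 (EVPN Prefix Advertisement)
  • Check whether the L2 over L3 using NLRI Type 2+3 from last time can run alongside it
    • The red dotted part in the figure below
    • Documents describing VRF-to-VRF with Type 5 sometimes include wording like "when the segment is confined to a single site (/room)", but...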

f:id:kakkotetsu:20170513134936p:plain

Overview diagram

Same as last time, as in the figure below.

f:id:kakkotetsu:20170513135024p:plain

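An aside.
To capture packets with GNS3 1.5.2 in this environment, I don't connect the qemu instances directly but put a GNS3 Ethernet Switch between them. However, as described in "pelican@ainoniwa.net / GNS3 2.0からはKVM間のパケットキャプチャも取れるようになるぞい" (GNS3 2.0 can capture packets between KVM instances too), using GNS3 2.0 (beta as of 2017/01/04, stable released as of 2017/05/13) makes this unnecessary. Hooray!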

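References

Build and verification

Deploying on GNS3

I keep using the previous environment, as shown in the overview diagram above.

Node settings for reachability tests

Set up as follows: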

kotetsu@node11:~$ ip a show dev ens4
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:37:c4:e2:60:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.1/24 brd 192.168.1.255 scope global ens4
       valid_lft forever preferred_lft forever
    inet6 fe80::237:c4ff:fee2:6001/64 scope link
       valid_lft forever preferred_lft forever

kotetsu@node11:~$ ip route show dev ens4
192.168.0.0/16 via 192.168.1.254
192.168.1.0/24  proto kernel  scope link  src 192.168.1.1
kotetsu@node21:~$ ip a show dev ens4
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:37:c4:46:d8:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.2/24 brd 192.168.1.255 scope global ens4
       valid_lft forever preferred_lft forever
    inet6 fe80::237:c4ff:fe46:d801/64 scope link
       valid_lft forever preferred_lft forever

kotetsu@node21:~$ ip route show dev ens4
192.168.0.0/16 via 192.168.1.254
192.168.1.0/24  proto kernel  scope link  src 192.168.1.2
kotetsu@node22:~$ ip a show dev ens4
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:37:c4:3d:e0:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.2.2/24 brd 192.168.2.255 scope global ens4
       valid_lft forever preferred_lft forever
    inet6 fe80::237:c4ff:fe3d:e001/64 scope link
       valid_lft forever preferred_lft forever

kotetsu@node22:~$ ip route show dev ens4
192.168.0.0/16 via 192.168.2.254
192.168.2.0/24  proto kernel  scope link  src 192.168.2.2
kotetsu@node13:~$ ip a show dev ens4
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:37:c4:0d:a8:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.3.1/24 brd 192.168.3.255 scope global ens4
       valid_lft forever preferred_lft forever
    inet6 fe80::237:c4ff:fe0d:a801/64 scope link
       valid_lft forever preferred_lft forever

kotetsu@node13:~$ ip route show dev ens4
192.168.0.0/16 via 192.168.3.254
192.168.3.0/24  proto kernel  scope link  src 192.168.3.1

A small addition to the vQFX base config

I added the following EVPN config so I can follow things in /var/log/evpn.log. With flag all the volume got a bit too much to keep up with, so adjust to taste...

set protocols evpn traceoptions file evpn.log
set protocols evpn traceoptions file size 10k
set protocols evpn traceoptions file files 30
set protocols evpn traceoptions flag all

vQFX Inter Subnet Forwarding configuration and verification

Inter Subnet Forwarding configuration

spine11

Last time I had put in config like this,

{master:0}[edit]
kotetsu@spine11# show vlans | display set
set vlans VLAN0100 vlan-id 100
set vlans VLAN0100 vxlan vni 10100
set vlans VLAN0100 vxlan ingress-node-replication
set vlans VLAN0300 vlan-id 300
set vlans VLAN0300 vxlan vni 10300
set vlans VLAN0300 vxlan ingress-node-replication
set vlans default vlan-id 1

so create the IRBs and then the VRF with the following.

set interfaces irb unit 100 family inet address 192.168.1.254/24
set interfaces irb unit 100 proxy-macip-advertisement

set interfaces irb unit 300 family inet address 192.168.3.254/24
set interfaces irb unit 300 proxy-macip-advertisement

set vlans VLAN0100 l3-interface irb.100
set vlans VLAN0300 l3-interface irb.300


set interfaces lo0 unit 1 family inet address 198.18.1.11/32

set routing-instances VRF001 instance-type vrf
set routing-instances VRF001 interface irb.100
set routing-instances VRF001 interface irb.300
set routing-instances VRF001 interface lo0.1
set routing-instances VRF001 route-distinguisher 50001:11
set routing-instances VRF001 vrf-target target:64512:50001
set routing-instances VRF001 protocols evpn ip-prefix-routes advertise direct-nexthop
set routing-instances VRF001 protocols evpn ip-prefix-routes encapsulation vxlan
set routing-instances VRF001 protocols evpn ip-prefix-routes vni 50001

The key part is the protocols evpn ip-prefix-routes configuration.
With it, the VRF should be able to send IP prefixes and its own MAC address (Router's MAC) VRF-to-VRF in EVPN NLRI Type 5.

According to page 19 of Juniper official / Release Notes Junos 15.1X53-D60 for QFX10000 Switches:

Best practice for EVPN-VXLAN configuration (QFX10000 switches)—Starting with Junos OS Release 15.1X53-D60, in an EVPN-VXLAN configuration on QFX10000 switches, you no longer need to configure vxlan ingress-node-replication.

So they say. I wonder why. Anyway, if it's the best practice, following it seems wise.

delete vlans VLAN0100 vxlan ingress-node-replication
delete vlans VLAN0300 vxlan ingress-node-replication

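While I was testing without deleting it, BUM traffic was being ingress-replicated even to Remote VTEP and VNI combinations that had not sent an EVPN NLRI Type 3... so I think deleting it is the right call.

Also, possibly due to my environment, when I added or changed these settings and committed, LLDP and BGP sometimes started flapping between bb01 and spine[12]1. I didn't track it down properly; when that happened I just ran request system reboot to calm things down. (Sloppy)

spine21

Almost the same as spine11.

Last time I had put in config like this,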

{master:0}[edit]
kotetsu@spine21# show vlans | display set
set vlans VLAN0100 vlan-id 100
set vlans VLAN0100 vxlan vni 10100
set vlans VLAN0100 vxlan ingress-node-replication
set vlans VLAN0200 vlan-id 200
set vlans VLAN0200 vxlan vni 10200
set vlans VLAN0200 vxlan ingress-node-replication
set vlans default vlan-id 1

so create the IRBs and then the VRF with the following.

set interfaces irb unit 100 family inet address 192.168.1.254/24
set interfaces irb unit 100 proxy-macip-advertisement
set interfaces irb unit 200 family inet address 192.168.2.254/24
set interfaces irb unit 200 proxy-macip-advertisement

set vlans VLAN0100 l3-interface irb.100
set vlans VLAN0200 l3-interface irb.200

set interfaces lo0 unit 1 family inet address 198.18.1.21/32

set routing-instances VRF001 instance-type vrf
set routing-instances VRF001 interface irb.100
set routing-instances VRF001 interface irb.200
set routing-instances VRF001 interface lo0.1
set routing-instances VRF001 route-distinguisher 50001:21
set routing-instances VRF001 vrf-target target:64512:50001
set routing-instances VRF001 protocols evpn ip-prefix-routes advertise direct-nexthop
set routing-instances VRF001 protocols evpn ip-prefix-routes encapsulation vxlan
set routing-instances VRF001 protocols evpn ip-prefix-routes vni 50001

As mentioned above, follow the best practice.

delete vlans VLAN0100 vxlan ingress-node-replication
delete vlans VLAN0200 vxlan ingress-node-replication
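spine11 and spine21 differ only in their IRB units, the lo0.1 address and the route distinguisher. As a hypothetical helper (vrf_config is my own name, not a Junos feature), the Python below renders the same set/delete commands from those parameters. It assumes, as in this lab, that the IRB unit number equals the VLAN ID and that VLANs are named VLANnnnn.

```python
def vrf_config(rd, lo0_address, irbs, vrf="VRF001", l3_vni=50001,
               target="target:64512:50001"):
    """Render the Junos set/delete commands used above for one spine.

    irbs: list of (vlan_id, gateway_cidr) pairs.
    """
    cmds = []
    for vid, cidr in irbs:
        cmds.append(f"set interfaces irb unit {vid} family inet address {cidr}")
        cmds.append(f"set interfaces irb unit {vid} proxy-macip-advertisement")
    for vid, _ in irbs:
        cmds.append(f"set vlans VLAN{vid:04d} l3-interface irb.{vid}")
    cmds.append(f"set interfaces lo0 unit 1 family inet address {lo0_address}")
    ri = f"set routing-instances {vrf}"
    cmds.append(f"{ri} instance-type vrf")
    for vid, _ in irbs:
        cmds.append(f"{ri} interface irb.{vid}")
    cmds.append(f"{ri} interface lo0.1")
    cmds.append(f"{ri} route-distinguisher {rd}")
    cmds.append(f"{ri} vrf-target {target}")
    # Type 5 (ip-prefix-routes) settings for symmetric inter-subnet forwarding.
    for opt in ("advertise direct-nexthop", "encapsulation vxlan", f"vni {l3_vni}"):
        cmds.append(f"{ri} protocols evpn ip-prefix-routes {opt}")
    # Best practice from the 15.1X53-D60 release notes quoted above.
    for vid, _ in irbs:
        cmds.append(f"delete vlans VLAN{vid:04d} vxlan ingress-node-replication")
    return cmds


spine21 = vrf_config("50001:21", "198.18.1.21/32",
                     [(100, "192.168.1.254/24"), (200, "192.168.2.254/24")])
print("\n".join(spine21))
```

For spine21's values this prints the same 18 lines as above, in the same order, ready to paste into configuration mode.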

Verification

Summary of IP and MAC addresses

The node count has grown a bit, so here's a summary.

VLAN  IP address         MAC address        Node     Interface  Notes
100   192.168.1.1/24     00:37:c4:e2:60:01  node11   ens4       behind spine11
100   192.168.1.2/24     00:37:c4:46:d8:01  node21   ens4       behind spine21
100   192.168.1.254/24   02:05:86:71:3c:00  spine11  irb
100   192.168.1.254/24   02:05:86:71:d8:00  spine21  irb
200   192.168.2.2/24     00:37:c4:3d:e0:01  node22   ens4       behind spine21
200   192.168.2.254/24   02:05:86:71:d8:00  spine21  irb
300   192.168.3.1/24     00:37:c4:0d:a8:01  node13   ens4       behind spine11
300   192.168.3.254/24   02:05:86:71:3c:00  spine11  irb
-     192.0.2.1/30       02:05:86:71:21:03  bb01     xe-0/0/0   to spine11
-     192.0.2.2/30       02:05:86:71:3c:03  spine11  xe-0/0/0   to bb01
-     192.0.2.5/30       02:05:86:71:21:07  bb01     xe-0/0/1   to spine21
-     192.0.2.6/30       02:05:86:71:d8:03  spine21  xe-0/0/0   to bb01

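The VLAN100 IRB MAC addresses on spine11 and spine21 are each switch's own physical address; I haven't configured a shared virtual MAC.
The point is to see whether node11 behind spine11 ends up using spine11's IRB MAC address as its gateway, and node21 behind spine21 uses spine21's IRB MAC address.

Inter-node reachability check and ARP tables

A quick full-mesh ping between the four nodes: everything is reachable for now.
ARP tables right afterwards: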

  • node11(192.168.1.1/24)
kotetsu@node11:~$ ip n show dev ens4
192.168.1.254 lladdr 02:05:86:71:3c:00 REACHABLE  ## spine11 IRB MAC address
192.168.1.2 lladdr 00:37:c4:46:d8:01 STALE
  • node21(192.168.1.2/24)
kotetsu@node21:~$ ip n show dev ens4
192.168.1.254 lladdr 02:05:86:71:d8:00 REACHABLE  ## spine21 IRB MAC address
192.168.1.1 lladdr 00:37:c4:e2:60:01 REACHABLE
  • node13(192.168.3.1/24)
kotetsu@node13:~$ ip n show dev ens4
192.168.3.254 lladdr 02:05:86:71:3c:00 STALE  ## spine11 IRB MAC address
  • node22(192.168.2.2/24)
kotetsu@node22:~$ ip n show dev ens4
192.168.2.254 lladdr 02:05:86:71:d8:00 REACHABLE  ## spine21 IRB MAC address

Incidentally, when I tried pinging from spine11 or spine21 themselves, it seems locally originated packets can't be sent toward EVPN. Makes sense, I suppose.

{master:0}
kotetsu@spine21> ping routing-instance VRF001 192.168.1.2
PING 192.168.1.2 (192.168.1.2): 56 data bytes
64 bytes from 192.168.1.2: icmp_seq=0 ttl=64 time=6.448 ms
64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=5.700 ms

{master:0}
kotetsu@spine21> ping routing-instance VRF001 192.168.3.1
PING 192.168.3.1 (192.168.3.1): 56 data bytes
ping: sendto: Operation not supported
ping: sendto: Operation not supported


{master:0}
kotetsu@spine21> show route table VRF001.inet.0 192.168.3.0/24

VRF001.inet.0: 8 destinations, 9 routes (8 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

192.168.3.0/24     *[EVPN/170] 20:55:03
                    > to 192.0.2.5 via xe-0/0/0.0

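At a glance everything seems to be talking nicely... so let's dig into the details a bit.

Checking the vQFX tables

EVPN

Only the MAC information for VLAN100 (VNI 10100), which exists on both spines, appears among the entries learned from the remote side... and, sure enough, each spine has learned the real MACs of both switches as Router's MAC for 192.168.1.254.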

{master:0}
kotetsu@spine11> show evpn database
Instance: default-switch
VLAN  VNI  MAC address        Active source                  Timestamp        IP address
      10100 00:37:c4:46:d8:01  172.16.2.1                     Jan 08 21:35:56
      10100 00:37:c4:e2:60:01  xe-0/0/1.0                     Jan 08 01:30:42  192.168.1.1
      10100 02:05:86:71:3c:00  irb.100                        Jan 08 12:50:23  192.168.1.254
      10100 02:05:86:71:d8:00  172.16.2.1                     Jan 08 12:50:38  192.168.1.254
      10300 00:37:c4:0d:a8:01  xe-0/0/1.0                     Jan 08 01:30:41  192.168.3.1
      10300 02:05:86:71:3c:00  irb.300                        Jan 08 01:27:27  192.168.3.254
{master:0}
kotetsu@spine21> show evpn database
Instance: default-switch
VLAN  VNI  MAC address        Active source                  Timestamp        IP address
      10100 00:37:c4:46:d8:01  xe-0/0/1.0                     Jan 08 01:30:41  192.168.1.2
      10100 00:37:c4:e2:60:01  172.16.1.1                     Jan 08 21:42:10  192.168.1.1
      10100 02:05:86:71:3c:00  172.16.1.1                     Jan 08 12:53:48  192.168.1.254
      10100 02:05:86:71:d8:00  irb.100                        Jan 08 12:54:02  192.168.1.254
      10200 00:37:c4:3d:e0:01  xe-0/0/1.0                     Jan 08 01:30:41  192.168.2.2
      10200 02:05:86:71:d8:00  irb.200                        Jan 08 01:16:21  192.168.2.254
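To spot the doubled gateway entries at a glance, group the show evpn database rows by IP address. A sketch assuming rows with an empty VLAN column, i.e. VNI, MAC, active source, three timestamp fields and an optional IP, exactly as printed above (macs_by_ip is a hypothetical name):

```python
from collections import defaultdict


def macs_by_ip(db_lines) -> dict:
    """Group MAC addresses by IP from `show evpn database` rows.

    Rows without an IP (6 fields) and header lines are skipped.
    """
    result = defaultdict(set)
    for line in db_lines:
        fields = line.split()
        if len(fields) == 7 and fields[0].isdigit():
            result[fields[6]].add(fields[1])
    return dict(result)


spine11_db = """\
      10100 00:37:c4:46:d8:01  172.16.2.1                     Jan 08 21:35:56
      10100 00:37:c4:e2:60:01  xe-0/0/1.0                     Jan 08 01:30:42  192.168.1.1
      10100 02:05:86:71:3c:00  irb.100                        Jan 08 12:50:23  192.168.1.254
      10100 02:05:86:71:d8:00  172.16.2.1                     Jan 08 12:50:38  192.168.1.254
      10300 00:37:c4:0d:a8:01  xe-0/0/1.0                     Jan 08 01:30:41  192.168.3.1
      10300 02:05:86:71:3c:00  irb.300                        Jan 08 01:27:27  192.168.3.254
""".splitlines()
for ip, macs in sorted(macs_by_ip(spine11_db).items()):
    print(ip, sorted(macs))
```

On spine11's rows, 192.168.1.254 maps to both 02:05:86:71:3c:00 (local irb.100) and 02:05:86:71:d8:00 (learned from 172.16.2.1).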

Summary of the Type 5 information each switch advertises

{master:0}
kotetsu@spine11> show evpn l3-context
L3 context                      Type  Adv      Encap  VNI/Label  Router MAC/GW intf
VRF001                          Cfg   Direct   VXLAN  50001      02:05:86:71:3c:00
{master:0}
kotetsu@spine21> show evpn l3-context
L3 context                      Type  Adv      Encap  VNI/Label  Router MAC/GW intf
VRF001                          Cfg   Direct   VXLAN  50001      02:05:86:71:d8:00

Check the Router's MAC learned from the remote side via Type 5

{master:0}
kotetsu@spine11> show evpn ip-prefix-database
L3 context: VRF001

IPv4->EVPN Exported Prefixes
Prefix                                       EVPN route status
192.168.1.0/24                               Created
192.168.3.0/24                               Created

EVPN->IPv4 Imported Prefixes
Prefix                                       Etag      IP route status
192.168.1.0/24                               0         Created
  Route distinguisher    St  VNI/Label  Router MAC         Nexthop/Overlay GW/ESI
  50001:21               A   50001      02:05:86:71:d8:00  172.16.2.1
192.168.2.0/24                               0         Created
  Route distinguisher    St  VNI/Label  Router MAC         Nexthop/Overlay GW/ESI
  50001:21               A   50001      02:05:86:71:d8:00  172.16.2.1
{master:0}
kotetsu@spine21> show evpn ip-prefix-database
L3 context: VRF001

IPv4->EVPN Exported Prefixes
Prefix                                       EVPN route status
192.168.1.0/24                               Created
192.168.2.0/24                               Created

EVPN->IPv4 Imported Prefixes
Prefix                                       Etag      IP route status
192.168.1.0/24                               0         Created
  Route distinguisher    St  VNI/Label  Router MAC         Nexthop/Overlay GW/ESI
  50001:11               A   50001      02:05:86:71:3c:00  172.16.1.1
192.168.3.0/24                               0         Created
  Route distinguisher    St  VNI/Label  Router MAC         Nexthop/Overlay GW/ESI
  50001:11               A   50001      02:05:86:71:3c:00  172.16.1.1

Routing Table

Look at the routing table summary for this VRF

{master:0}
kotetsu@spine11> show route instance VRF001 extensive
VRF001:
  Router ID: 198.18.1.11
  Type: vrf               State: Active
  Interfaces:
    irb.100
    irb.300
    lo0.1
  Route-distinguisher: 50001:11
  Vrf-import: [ __vrf-import-VRF001-internal__ ]
  Vrf-export: [ __vrf-export-VRF001-internal__ ]
  Vrf-import-target: [ target:64512:50001 ]
  Vrf-export-target: [ target:64512:50001 ]
  Fast-reroute-priority: low
  Tables:
    VRF001.inet.0          : 9 routes (8 active, 0 holddown, 0 hidden)
    VRF001.inet.3          : 0 routes (0 active, 0 holddown, 0 hidden)
    VRF001.iso.0           : 0 routes (0 active, 0 holddown, 0 hidden)
    VRF001.inet6.0         : 0 routes (0 active, 0 holddown, 0 hidden)
    VRF001.inet6.3         : 0 routes (0 active, 0 holddown, 0 hidden)
    VRF001.mdt.0           : 0 routes (0 active, 0 holddown, 0 hidden)
    VRF001.evpn.0          : 4 routes (4 active, 0 holddown, 0 hidden)
{master:0}
kotetsu@spine21> show route instance VRF001 extensive
VRF001:
  Router ID: 198.18.1.21
  Type: vrf               State: Active
  Interfaces:
    irb.100
    irb.200
    lo0.1
  Route-distinguisher: 50001:21
  Vrf-import: [ __vrf-import-VRF001-internal__ ]
  Vrf-export: [ __vrf-export-VRF001-internal__ ]
  Vrf-import-target: [ target:64512:50001 ]
  Vrf-export-target: [ target:64512:50001 ]
  Fast-reroute-priority: low
  Tables:
    VRF001.inet.0          : 9 routes (8 active, 0 holddown, 0 hidden)
    VRF001.inet.3          : 0 routes (0 active, 0 holddown, 0 hidden)
    VRF001.iso.0           : 0 routes (0 active, 0 holddown, 0 hidden)
    VRF001.inet6.0         : 0 routes (0 active, 0 holddown, 0 hidden)
    VRF001.inet6.3         : 0 routes (0 active, 0 holddown, 0 hidden)
    VRF001.mdt.0           : 0 routes (0 active, 0 holddown, 0 hidden)
    VRF001.evpn.0          : 4 routes (4 active, 0 holddown, 0 hidden)

Now for the contents. First, bgp.evpn.0, "the global EVPN routing table within the Junos OS routing protocol process (RPD)":

{master:0}
kotetsu@spine11> show route table bgp.evpn.0

bgp.evpn.0: 9 destinations, 9 routes (9 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

1:172.16.2.1:0::050000fdea0000277400::FFFF:FFFF/304
                   *[BGP/170] 21:18:16, localpref 100, from 172.31.0.1
                      AS path: I, validation-state: unverified
                    > to 192.0.2.1 via xe-0/0/0.0
1:172.16.2.1:0::050000fdea000027d800::FFFF:FFFF/304
                   *[BGP/170] 21:18:16, localpref 100, from 172.31.0.1
                      AS path: I, validation-state: unverified
                    > to 192.0.2.1 via xe-0/0/0.0
2:64512:21::10100::00:37:c4:46:d8:01/304
                   *[BGP/170] 19:24:02, localpref 100, from 172.31.0.1
                      AS path: I, validation-state: unverified
                    > to 192.0.2.1 via xe-0/0/0.0
2:64512:21::10100::02:05:86:71:d8:00/304
                   *[BGP/170] 09:57:37, localpref 100, from 172.31.0.1
                      AS path: I, validation-state: unverified
                    > to 192.0.2.1 via xe-0/0/0.0
2:64512:21::10100::00:37:c4:46:d8:01::192.168.1.2/304
                   *[BGP/170] 09:36:35, localpref 100, from 172.31.0.1
                      AS path: I, validation-state: unverified
                    > to 192.0.2.1 via xe-0/0/0.0
2:64512:21::10100::02:05:86:71:d8:00::192.168.1.254/304
                   *[BGP/170] 09:57:37, localpref 100, from 172.31.0.1
                      AS path: I, validation-state: unverified
                    > to 192.0.2.1 via xe-0/0/0.0
3:64512:21::10100::172.16.2.1/304
                   *[BGP/170] 19:24:02, localpref 100, from 172.31.0.1
                      AS path: I, validation-state: unverified
                    > to 192.0.2.1 via xe-0/0/0.0
5:50001:21::0::192.168.1.0::24/304
                   *[BGP/170] 21:18:16, localpref 100, from 172.31.0.1
                      AS path: I, validation-state: unverified
                    > to 192.0.2.1 via xe-0/0/0.0
5:50001:21::0::192.168.2.0::24/304
                   *[BGP/170] 21:18:16, localpref 100, from 172.31.0.1
                      AS path: I, validation-state: unverified
                    > to 192.0.2.1 via xe-0/0/0.0
{master:0}
kotetsu@spine21> show route table bgp.evpn.0

bgp.evpn.0: 9 destinations, 9 routes (9 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

1:172.16.1.1:0::050000fde90000277400::FFFF:FFFF/304
                   *[BGP/170] 10:02:28, localpref 100, from 172.31.0.1
                      AS path: I, validation-state: unverified
                    > to 192.0.2.5 via xe-0/0/0.0
1:172.16.1.1:0::050000fde90000283c00::FFFF:FFFF/304
                   *[BGP/170] 21:26:18, localpref 100, from 172.31.0.1
                      AS path: I, validation-state: unverified
                    > to 192.0.2.5 via xe-0/0/0.0
2:64512:11::10100::00:37:c4:e2:60:01/304
                   *[BGP/170] 19:32:22, localpref 100, from 172.31.0.1
                      AS path: I, validation-state: unverified
                    > to 192.0.2.5 via xe-0/0/0.0
2:64512:11::10100::02:05:86:71:3c:00/304
                   *[BGP/170] 10:02:28, localpref 100, from 172.31.0.1
                      AS path: I, validation-state: unverified
                    > to 192.0.2.5 via xe-0/0/0.0
2:64512:11::10100::00:37:c4:e2:60:01::192.168.1.1/304
                   *[BGP/170] 10:01:04, localpref 100, from 172.31.0.1
                      AS path: I, validation-state: unverified
                    > to 192.0.2.5 via xe-0/0/0.0
2:64512:11::10100::02:05:86:71:3c:00::192.168.1.254/304
                   *[BGP/170] 10:02:28, localpref 100, from 172.31.0.1
                      AS path: I, validation-state: unverified
                    > to 192.0.2.5 via xe-0/0/0.0
3:64512:11::10100::172.16.1.1/304
                   *[BGP/170] 19:32:21, localpref 100, from 172.31.0.1
                      AS path: I, validation-state: unverified
                    > to 192.0.2.5 via xe-0/0/0.0
5:50001:11::0::192.168.1.0::24/304
                   *[BGP/170] 10:02:29, localpref 100, from 172.31.0.1
                      AS path: I, validation-state: unverified
                    > to 192.0.2.5 via xe-0/0/0.0
5:50001:11::0::192.168.3.0::24/304
                   *[BGP/170] 21:26:18, localpref 100, from 172.31.0.1
                      AS path: I, validation-state: unverified
                    > to 192.0.2.5 via xe-0/0/0.0
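The route keys above pack several fields into a single string. As a reading aid, here is a minimal Python sketch that splits them apart. The field layout is my reading of the output above, not something taken from Juniper documentation: type:RD first, then '::'-separated fields (Ethernet tag, MAC, IP or prefix), then the '/304' suffix. The sketch assumes IPv4-only keys, because an IPv6 address would itself contain '::'. The function name parse_evpn_key is hypothetical.

```python
from typing import Dict

def parse_evpn_key(key: str) -> Dict[str, str]:
    """Split a Junos EVPN route key (as printed by `show route table bgp.evpn.0`)
    into named fields. Assumes IPv4-only keys, so '::' is a safe separator."""
    body = key.rsplit("/", 1)[0]            # drop the trailing '/304'
    fields = body.split("::")
    route_type, rd = fields[0].split(":", 1)
    parsed = {"type": route_type, "rd": rd}
    if route_type == "1":                   # Ethernet Auto-Discovery: ESI, Ethernet tag
        parsed.update(esi=fields[1], eth_tag=fields[2])
    elif route_type == "2":                 # MAC/IP Advertisement
        # Ethernet tag field; it carries the VNI (10100) in this setup
        parsed.update(vni=fields[1], mac=fields[2])
        if len(fields) > 3:                 # MAC+IP variant
            parsed["ip"] = fields[3]
    elif route_type == "3":                 # Inclusive Multicast Ethernet Tag
        parsed.update(vni=fields[1], originator=fields[2])
    elif route_type == "5":                 # IP Prefix: Ethernet tag, prefix, length
        parsed.update(eth_tag=fields[1], prefix=f"{fields[2]}/{fields[3]}")
    return parsed


for key in (
    "2:64512:11::10100::00:37:c4:e2:60:01::192.168.1.1/304",
    "3:64512:11::10100::172.16.1.1/304",
    "5:50001:11::0::192.168.3.0::24/304",
):
    print(parse_evpn_key(key))
```

Note that the Type 5 keys carry the VRF-level RD (50001:11 / 50001:21), while the Type 2 and Type 3 keys carry 64512:11 / 64512:21.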

Next, a summary of the VRF in question (VRF001).

{master:0}
kotetsu@spine11> show route table VRF001.evpn.0

VRF001.evpn.0: 4 destinations, 4 routes (4 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

5:50001:11::0::192.168.1.0::24/304
                   *[EVPN/170] 10:15:34
                      Indirect
5:50001:11::0::192.168.3.0::24/304
                   *[EVPN/170] 21:36:19
                      Indirect
5:50001:21::0::192.168.1.0::24/304
                   *[BGP/170] 21:35:58, localpref 100, from 172.31.0.1
                      AS path: I, validation-state: unverified
                    > to 192.0.2.1 via xe-0/0/0.0
5:50001:21::0::192.168.2.0::24/304
                   *[BGP/170] 21:35:58, localpref 100, from 172.31.0.1
                      AS path: I, validation-state: unverified
                    > to 192.0.2.1 via xe-0/0/0.0


{master:0}
kotetsu@spine11> show route table VRF001.inet.0

VRF001.inet.0: 8 destinations, 9 routes (8 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

192.168.1.0/24     *[Direct/0] 10:11:26
                    > via irb.100
                    [EVPN/170] 21:31:50
                    > to 192.0.2.1 via xe-0/0/0.0
192.168.1.1/32     *[EVPN/7] 10:10:01
                    > via irb.100
192.168.1.254/32   *[Local/0] 10:11:27
                      Local via irb.100
192.168.2.0/24     *[EVPN/170] 21:31:50
                    > to 192.0.2.1 via xe-0/0/0.0
192.168.3.0/24     *[Direct/0] 21:32:11
                    > via irb.300
192.168.3.1/32     *[EVPN/7] 21:31:06
                    > via irb.300
192.168.3.254/32   *[Local/0] 21:34:23
                      Local via irb.300
198.18.1.11/32     *[Direct/0] 21:34:23
                    > via lo0.1
{master:0}
kotetsu@spine21> show route table VRF001.evpn.0

VRF001.evpn.0: 4 destinations, 4 routes (4 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

5:50001:11::0::192.168.1.0::24/304
                   *[BGP/170] 10:19:12, localpref 100, from 172.31.0.1
                      AS path: I, validation-state: unverified
                    > to 192.0.2.5 via xe-0/0/0.0
5:50001:11::0::192.168.3.0::24/304
                   *[BGP/170] 21:43:01, localpref 100, from 172.31.0.1
                      AS path: I, validation-state: unverified
                    > to 192.0.2.5 via xe-0/0/0.0
5:50001:21::0::192.168.1.0::24/304
                   *[EVPN/170] 10:18:58
                      Indirect
5:50001:21::0::192.168.2.0::24/304
                   *[EVPN/170] 21:54:15
                      Indirect


{master:0}
kotetsu@spine21> show route table VRF001.inet.0

VRF001.inet.0: 8 destinations, 9 routes (8 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

192.168.1.0/24     *[Direct/0] 10:16:27
                    > via irb.100
                    [EVPN/170] 10:16:41
                    > to 192.0.2.5 via xe-0/0/0.0
192.168.1.2/32     *[EVPN/7] 10:16:10
                    > via irb.100
192.168.1.254/32   *[Local/0] 10:16:27
                      Local via irb.100
192.168.2.0/24     *[Direct/0] 21:51:44
                    > via irb.200
192.168.2.2/32     *[EVPN/7] 21:39:37
                    > via irb.200
192.168.2.254/32   *[Local/0] 21:54:08
                      Local via irb.200
192.168.3.0/24     *[EVPN/170] 21:40:30
                    > to 192.0.2.5 via xe-0/0/0.0
198.18.1.21/32     *[Direct/0] 21:54:08
                    > via lo0.1

Finally, let's look at the Type 5 routes in detail. These are also loaded into VRF001.evpn.0.

{master:0}
kotetsu@spine11> show route table bgp.evpn.0 extensive

...

5:50001:21::0::192.168.1.0::24/304 (1 entry, 0 announced)
        *BGP    Preference: 170/-101
                Route Distinguisher: 50001:21
                Next hop type: Indirect, Next hop index: 0
                Address: 0xaa605f0
                Next-hop reference count: 18
                Source: 172.31.0.1
                Protocol next hop: 172.16.2.1
                Indirect next hop: 0x2 no-forward INH Session ID: 0x0
                State: <Active Int Ext>
                Local AS: 65001 Peer AS: 64512
                Age: 21:25:33   Metric2: 0
                Validation State: unverified
                Task: BGP_64512_64512.172.31.0.1
                AS path: I (Originator)
                Cluster list:  172.31.0.1
                Originator ID: 172.16.2.1
                Communities: target:64512:50001 encapsulation0:0:0:0:vxlan router-mac:02:05:86:71:d8:00
                Import Accepted
                Route Label: 50001
                Overlay gateway address: 0.0.0.0
                ESI 00:00:00:00:00:00:00:00:00:00
                Localpref: 100
                Router ID: 172.31.0.1
                Secondary Tables: VRF001.evpn.0
                Indirect next hops: 1
                        Protocol next hop: 172.16.2.1
                        Indirect next hop: 0x2 no-forward INH Session ID: 0x0
                        Indirect path forwarding next hops: 1
                                Next hop type: Router
                                Next hop: 192.0.2.1 via xe-0/0/0.0
                                Session Id: 0x0
                        172.16.2.1/32 Originating RIB: inet.0
                          Node path count: 1
                          Forwarding nexthops: 1
                                Nexthop: 192.0.2.1 via xe-0/0/0.0

5:50001:21::0::192.168.2.0::24/304 (1 entry, 0 announced)
        *BGP    Preference: 170/-101
                Route Distinguisher: 50001:21
                Next hop type: Indirect, Next hop index: 0
                Address: 0xaa605f0
                Next-hop reference count: 18
                Source: 172.31.0.1
                Protocol next hop: 172.16.2.1
                Indirect next hop: 0x2 no-forward INH Session ID: 0x0
                State: <Active Int Ext>
                Local AS: 65001 Peer AS: 64512
                Age: 21:25:33   Metric2: 0
                Validation State: unverified
                Task: BGP_64512_64512.172.31.0.1
                AS path: I (Originator)
                Cluster list:  172.31.0.1
                Originator ID: 172.16.2.1
                Communities: target:64512:50001 encapsulation0:0:0:0:vxlan router-mac:02:05:86:71:d8:00
                Import Accepted
                Route Label: 50001
                Overlay gateway address: 0.0.0.0
                ESI 00:00:00:00:00:00:00:00:00:00
                Localpref: 100
                Router ID: 172.31.0.1
                Secondary Tables: VRF001.evpn.0
                Indirect next hops: 1
                        Protocol next hop: 172.16.2.1
                        Indirect next hop: 0x2 no-forward INH Session ID: 0x0
                        Indirect path forwarding next hops: 1
                                Next hop type: Router
                                Next hop: 192.0.2.1 via xe-0/0/0.0
                                Session Id: 0x0
                        172.16.2.1/32 Originating RIB: inet.0
                          Node path count: 1
                          Forwarding nexthops: 1
                                Nexthop: 192.0.2.1 via xe-0/0/0.0
{master:0}
kotetsu@spine21> show route table bgp.evpn.0 extensive

...

5:50001:11::0::192.168.1.0::24/304 (1 entry, 0 announced)
        *BGP    Preference: 170/-101
                Route Distinguisher: 50001:11
                Next hop type: Indirect, Next hop index: 0
                Address: 0xaa60170
                Next-hop reference count: 18
                Source: 172.31.0.1
                Protocol next hop: 172.16.1.1
                Indirect next hop: 0x2 no-forward INH Session ID: 0x0
                State: <Active Int Ext>
                Local AS: 65002 Peer AS: 64512
                Age: 10:04:02   Metric2: 0
                Validation State: unverified
                Task: BGP_64512_64512.172.31.0.1
                AS path: I (Originator)
                Cluster list:  172.31.0.1
                Originator ID: 172.16.1.1
                Communities: target:64512:50001 encapsulation0:0:0:0:vxlan router-mac:02:05:86:71:3c:00
                Import Accepted
                Route Label: 50001
                Overlay gateway address: 0.0.0.0
                ESI 00:00:00:00:00:00:00:00:00:00
                Localpref: 100
                Router ID: 172.31.0.1
                Secondary Tables: VRF001.evpn.0
                Indirect next hops: 1
                        Protocol next hop: 172.16.1.1
                        Indirect next hop: 0x2 no-forward INH Session ID: 0x0
                        Indirect path forwarding next hops: 1
                                Next hop type: Router
                                Next hop: 192.0.2.5 via xe-0/0/0.0
                                Session Id: 0x0
                        172.16.1.1/32 Originating RIB: inet.0
                          Node path count: 1
                          Forwarding nexthops: 1
                                Nexthop: 192.0.2.5 via xe-0/0/0.0

5:50001:11::0::192.168.3.0::24/304 (1 entry, 0 announced)
        *BGP    Preference: 170/-101
                Route Distinguisher: 50001:11
                Next hop type: Indirect, Next hop index: 0
                Address: 0xaa60170
                Next-hop reference count: 18
                Source: 172.31.0.1
                Protocol next hop: 172.16.1.1
                Indirect next hop: 0x2 no-forward INH Session ID: 0x0
                State: <Active Int Ext>
                Local AS: 65002 Peer AS: 64512
                Age: 21:27:51   Metric2: 0
                Validation State: unverified
                Task: BGP_64512_64512.172.31.0.1
                AS path: I (Originator)
                Cluster list:  172.31.0.1
                Originator ID: 172.16.1.1
                Communities: target:64512:50001 encapsulation0:0:0:0:vxlan router-mac:02:05:86:71:3c:00
                Import Accepted
                Route Label: 50001
                Overlay gateway address: 0.0.0.0
                ESI 00:00:00:00:00:00:00:00:00:00
                Localpref: 100
                Router ID: 172.31.0.1
                Secondary Tables: VRF001.evpn.0
                Indirect next hops: 1
                        Protocol next hop: 172.16.1.1
                        Indirect next hop: 0x2 no-forward INH Session ID: 0x0
                        Indirect path forwarding next hops: 1
                                Next hop type: Router
                                Next hop: 192.0.2.5 via xe-0/0/0.0
                                Session Id: 0x0
                        172.16.1.1/32 Originating RIB: inet.0
                          Node path count: 1
                          Forwarding nexthops: 1
                                Nexthop: 192.0.2.5 via xe-0/0/0.0
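The output above contains all three fields that matter for symmetric forwarding:

- the Protocol next hop, which is the remote VTEP;
- the Route Label, 50001, which matches VRF001's vni 50001;
- the router-mac community.

Below is a small sketch that pulls these fields out of the text with regular expressions. The patterns were written only against the output above, and the names PATTERNS and type5_forwarding_info are hypothetical.

```python
import re
from typing import Dict

# Excerpt of spine11's entry for 5:50001:21::0::192.168.2.0::24/304 shown above.
DETAIL = """
                Protocol next hop: 172.16.2.1
                Communities: target:64512:50001 encapsulation0:0:0:0:vxlan router-mac:02:05:86:71:d8:00
                Route Label: 50001
"""

PATTERNS = {
    "remote_vtep": r"Protocol next hop: (\S+)",
    "route_target": r"target:(\S+)",
    "l3vni": r"Route Label: (\d+)",
    "router_mac": r"router-mac:((?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
}

def type5_forwarding_info(detail: str) -> Dict[str, str]:
    """Extract the fields a receiving VTEP needs for symmetric IRB (first match wins)."""
    info = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, detail)
        if match:
            info[name] = match.group(1)
    return info

print(type5_forwarding_info(DETAIL))
```

The router-mac value is what later shows up as the inner destination MAC in the L3 Symmetric capture.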

MAC address table

The only entries learned via Type 2 are in VLAN 100 = VNI 10100, the VLAN attached to both VTEPs.

{master:0}
kotetsu@spine11> show ethernet-switching table

MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static
           SE - statistics enabled, NM - non configured MAC, R - remote PE MAC, O - ovsdb MAC)


Ethernet switching table : 4 entries, 4 learned
Routing instance : default-switch
   Vlan                MAC                 MAC      Logical                Active
   name                address             flags    interface              source
   VLAN0100            00:37:c4:46:d8:01   D        vtep.32769             172.16.2.1
   VLAN0100            00:37:c4:e2:60:01   D        xe-0/0/1.0
   VLAN0100            02:05:86:71:d8:00   D        vtep.32769             172.16.2.1
   VLAN0300            00:37:c4:0d:a8:01   D        xe-0/0/1.0
{master:0}
kotetsu@spine21> show ethernet-switching table

MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static
           SE - statistics enabled, NM - non configured MAC, R - remote PE MAC, O - ovsdb MAC)


Ethernet switching table : 4 entries, 4 learned
Routing instance : default-switch
   Vlan                MAC                 MAC      Logical                Active
   name                address             flags    interface              source
   VLAN0100            00:37:c4:46:d8:01   D        xe-0/0/1.0
   VLAN0100            00:37:c4:e2:60:01   D        vtep.32769             172.16.1.1
   VLAN0100            02:05:86:71:3c:00   D        vtep.32769             172.16.1.1
   VLAN0200            00:37:c4:3d:e0:01   D        xe-0/0/1.0
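To back up the claim above, here is a sketch that splits spine11's table into two groups per VLAN: locally learned MACs, and MACs learned behind a remote VTEP (logical interface vtep.*). The column positions are assumed from this one output, and split_local_remote is a hypothetical helper. Note that the remote entry 02:05:86:71:d8:00 has the same value as the router-mac community that spine21 attaches to its Type 5 routes.

```python
from typing import Dict, List

# spine11's `show ethernet-switching table` rows, copied from above.
SPINE11_TABLE = """
   VLAN0100            00:37:c4:46:d8:01   D        vtep.32769             172.16.2.1
   VLAN0100            00:37:c4:e2:60:01   D        xe-0/0/1.0
   VLAN0100            02:05:86:71:d8:00   D        vtep.32769             172.16.2.1
   VLAN0300            00:37:c4:0d:a8:01   D        xe-0/0/1.0
"""

def split_local_remote(table: str) -> Dict[str, Dict[str, List[str]]]:
    """Group MACs per VLAN into locally learned vs learned behind a remote VTEP."""
    result: Dict[str, Dict[str, List[str]]] = {}
    for line in table.splitlines():
        cols = line.split()
        if len(cols) < 4 or not cols[0].startswith("VLAN"):
            continue  # skip headers and blank lines
        vlan, mac, logical_if = cols[0], cols[1], cols[3]
        entry = result.setdefault(vlan, {"local": [], "remote": []})
        entry["remote" if logical_if.startswith("vtep.") else "local"].append(mac)
    return result

summary = split_local_remote(SPINE11_TABLE)
for vlan, entry in sorted(summary.items()):
    print(vlan, "local:", entry["local"], "remote:", entry["remote"])
```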

Connectivity check details (DataPlane)

Let's look at each flow in turn. Before each test I flushed the neighbor cache with something like sudo ip n flush dev ens4.

L3 Symmetric

ping from node22 (192.168.2.2/24) to node13 (192.168.3.1/24)

Below is the ICMP Echo Request captured between spine21 and bb01.
In the Ethernet header inside the VXLAN encapsulation, the destination is spine11's irb MAC address, learned via Type 5.

f:id:kakkotetsu:20170513135437p:plain

The ICMP Echo Reply looks much the same.
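For reference, here is a sketch of how the first bytes of that packet are laid out. First comes the 8-byte VXLAN header from RFC 7348: flags with the I bit set, reserved bits, then the 24-bit VNI. The inner Ethernet header follows it. The inner destination is spine11's router MAC 02:05:86:71:3c:00, as seen in the capture. Using spine21's router MAC 02:05:86:71:d8:00 as the inner source is my assumption, and the outer IP/UDP headers are left out.

```python
import struct

VXLAN_I_FLAG = 0x08  # RFC 7348: "VNI is valid" flag

def mac_to_bytes(mac: str) -> bytes:
    return bytes.fromhex(mac.replace(":", ""))

def vxlan_header(vni: int) -> bytes:
    """8-byte VXLAN header: flags, 24 reserved bits, 24-bit VNI, 8 reserved bits."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!B3xI", VXLAN_I_FLAG, vni << 8)

def inner_ethernet(dst_mac: str, src_mac: str, ethertype: int = 0x0800) -> bytes:
    """Untagged inner Ethernet header carried as the VXLAN payload."""
    return mac_to_bytes(dst_mac) + mac_to_bytes(src_mac) + struct.pack("!H", ethertype)

# node22 -> node13, encapsulated by spine21 toward spine11 over the L3VNI (50001)
SPINE11_ROUTER_MAC = "02:05:86:71:3c:00"  # router-mac community on spine11's Type 5 routes
SPINE21_ROUTER_MAC = "02:05:86:71:d8:00"  # assumption: spine21 uses its own router MAC as source
frame_start = vxlan_header(50001) + inner_ethernet(SPINE11_ROUTER_MAC, SPINE21_ROUTER_MAC)
print(frame_start.hex())
```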

L2 over L3

ping from node21 (192.168.1.2/24) to node11 (192.168.1.1/24)

As in the previous article, this is simply encapsulated in VNI 10100 and forwarded.

f:id:kakkotetsu:20170513135451p:plain
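A quick way to tell the two cases apart in a capture is to look at the VNI together with the inner destination MAC:

- In the L3 Symmetric case, the frame is on VNI 50001 and the inner destination is a router MAC.
- Here, the frame is on VNI 10100 and the inner destination is node11's own MAC (00:37:c4:e2:60:01 in spine11's MAC table).

The sketch below classifies a frame that way. A few assumptions apply:

- The sample bytes are hand-built rather than taken from the pcap.
- node21's MAC is taken from spine21's MAC table.
- The router-MAC list covers only the two spines.
- describe is a hypothetical helper.

```python
import struct

# Router MACs seen in the Type 5 router-mac communities above.
ROUTER_MACS = {"02:05:86:71:3c:00": "spine11", "02:05:86:71:d8:00": "spine21"}

def fmt_mac(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)

def describe(frame: bytes) -> str:
    """Classify a VXLAN payload (header + inner Ethernet) by VNI and inner dst MAC."""
    flags, word = struct.unpack("!B3xI", frame[:8])
    if not flags & 0x08:
        raise ValueError("VXLAN I flag not set; VNI is not valid")
    vni = word >> 8
    inner_dst = fmt_mac(frame[8:14])
    if inner_dst in ROUTER_MACS:
        return f"VNI {vni}: routed, inner dst is {ROUTER_MACS[inner_dst]}'s router MAC"
    return f"VNI {vni}: bridged, inner dst is host {inner_dst}"

# Hand-built start of node21 -> node11: VXLAN header, inner dst (node11), inner src (node21), EtherType
l2_frame = bytes.fromhex("0800000000277400" "0037c4e26001" "0037c446d801" "0800")
print(describe(l2_frame))
```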

L3 forward path Symmetric, return path Asymmetric

ping from node11 (192.168.1.1/24) to node22 (192.168.2.2/24)

This is the case where I couldn't quite predict how things would behave.

  1. Forward: node11 tries to resolve the gateway IP address (192.168.1.254) via ARP
  2. Forward: spine11 answers node11 with its own irb MAC address (what I couldn't predict, and wanted to check, was whether it would then behave as "I answered this myself, so I won't forward the ARP request toward VNI 10100")
  3. Forward: spine11 receives node11's ICMP Echo Request destined for 192.168.2.2, encapsulates it in VNI 50001 based on the Type 5 information, and sends it to spine21 (Symmetric)
  4. Forward: spine21 simply forwards it to node22, which is directly attached to its irb
  5. Return: node22 resolves the gateway IP address (192.168.2.254) via ARP and sends the ICMP Echo Reply to spine21's irb MAC
  6. Return: spine21 receives node22's ICMP Echo Reply destined for 192.168.1.1, encapsulates it in VNI 10100 based on the Type 2 information, and sends it to spine11 (Asymmetric)
  7. Return: spine11 simply forwards it to node11, which is directly attached to its irb
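The prediction above boils down to one route lookup on each ingress VTEP. Here is a minimal sketch using the VRF001.inet.0 entries shown earlier, with the Local and lo0.1 routes omitted. It works in three steps:

1. Longest-prefix match.
2. Ties broken by the lower Junos preference.
3. A simple rule: a route via irb.N means bridging over that VLAN's L2VNI, while a Type 5 route means the L3VNI.

The VLAN-to-VNI map (only VLAN 100 -> 10100) and that rule are simplifying assumptions, not Junos internals. Route, lookup and egress_vni are hypothetical names.

```python
import ipaddress
from typing import List, NamedTuple, Optional

class Route(NamedTuple):
    prefix: str
    protocol: str
    preference: int   # Junos route preference: lower wins
    via: str

# From `show route table VRF001.inet.0` above (Local and lo0.1 routes omitted).
SPINE11 = [
    Route("192.168.1.0/24", "Direct", 0, "irb.100"),
    Route("192.168.1.0/24", "EVPN", 170, "xe-0/0/0.0"),
    Route("192.168.1.1/32", "EVPN", 7, "irb.100"),
    Route("192.168.2.0/24", "EVPN", 170, "xe-0/0/0.0"),
    Route("192.168.3.0/24", "Direct", 0, "irb.300"),
    Route("192.168.3.1/32", "EVPN", 7, "irb.300"),
]
SPINE21 = [
    Route("192.168.1.0/24", "Direct", 0, "irb.100"),
    Route("192.168.1.0/24", "EVPN", 170, "xe-0/0/0.0"),
    Route("192.168.1.2/32", "EVPN", 7, "irb.100"),
    Route("192.168.2.0/24", "Direct", 0, "irb.200"),
    Route("192.168.2.2/32", "EVPN", 7, "irb.200"),
    Route("192.168.3.0/24", "EVPN", 170, "xe-0/0/0.0"),
]

VLAN_TO_L2VNI = {100: 10100}   # assumption: only VLAN 100 is stretched over VXLAN
L3VNI = 50001                  # VRF001's VNI (Route Label of the Type 5 routes)

def lookup(table: List[Route], dst: str) -> Route:
    """Longest-prefix match, then lowest preference among equal-length prefixes."""
    addr = ipaddress.ip_address(dst)
    candidates = [r for r in table if addr in ipaddress.ip_network(r.prefix)]
    if not candidates:
        raise LookupError(f"no route to {dst}")
    return min(candidates,
               key=lambda r: (-ipaddress.ip_network(r.prefix).prefixlen, r.preference))

def egress_vni(route: Route) -> Optional[int]:
    """VNI used to reach a host behind the other VTEP.

    via irb.N (directly attached subnet): ARP and bridge in VLAN N -> its L2VNI (asymmetric)
    via the fabric (EVPN Type 5): route over the L3VNI (symmetric)
    """
    if route.via.startswith("irb."):
        return VLAN_TO_L2VNI.get(int(route.via.split(".", 1)[1]))
    return L3VNI

for name, table, dst in (("spine11 (forward)", SPINE11, "192.168.2.2"),
                         ("spine21 (return)", SPINE21, "192.168.1.1")):
    best = lookup(table, dst)
    print(f"{name}: {dst} -> {best.prefix} {best.protocol} via {best.via}, VNI {egress_vni(best)}")
```

The key point is on spine21. Its Direct 192.168.1.0/24 (preference 0) beats the EVPN Type 5 copy of the same prefix (170), and it has no /32 for 192.168.1.1. So the reply is bridged over VNI 10100 rather than routed over VNI 50001.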

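That was my prediction. I'll check whether this flow, logically asymmetric with a different VNI in each direction, really works, and what actually happens at the uncertain point in step 2. Well, reachability itself was already confirmed.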

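First, looking at the packets captured on node11... right, the point in step 2 does not behave the way I'd hoped.
ARP Replies come back from the irb on both spine11 and spine21 (packets No. 2 and 4). node11 simply happened to use spine11's MAC address because, given how the node behaves, that reply arrived first.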

f:id:kakkotetsu:20170513135515p:plain

Now let's follow the same behavior in the packets captured between spine11 and bb01.
Packet No. 16 is the one where spine11, right after sending its own ARP Reply to node11, ingress-replicates the ARP Request over VNI 10100 toward spine21.

f:id:kakkotetsu:20170513135531p:plain
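Why does spine21 see node11's ARP Request at all? ARP requests are broadcast (BUM) traffic. With ingress replication, the VTEP copies them to every remote VTEP that advertised a Type 3 route for the same VNI.

The sketch below builds that flood list from Type 3 route keys:

- 3:64512:21::10100::172.16.2.1 comes from spine11's bgp.evpn.0 above.
- Listing spine11's own route (3:64512:11::10100::172.16.1.1) alongside it is a hypothetical input, included to show the self-filter.
- flood_lists is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

def flood_lists(type3_keys: Iterable[str], local_vtep: str) -> Dict[int, Set[str]]:
    """Build per-VNI ingress-replication lists from Type 3 route keys
    of the form '3:<RD>::<VNI>::<originating VTEP>/304'."""
    lists: Dict[int, Set[str]] = defaultdict(set)
    for key in type3_keys:
        _type_rd, vni, originator = key.rsplit("/", 1)[0].split("::")
        if originator != local_vtep:      # never replicate back to ourselves
            lists[int(vni)].add(originator)
    return dict(lists)

# spine11 (VTEP 172.16.1.1): its own Type 3 route plus the one received from spine21
keys = ["3:64512:11::10100::172.16.1.1/304", "3:64512:21::10100::172.16.2.1/304"]
print(flood_lists(keys, "172.16.1.1"))
```

For VNI 10100 the list is just 172.16.2.1. So the broadcast ARP Request for 192.168.1.254 reaches spine21, whose irb owns the same gateway address and replies.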

Naturally, spine21 answers with an ARP Reply as usual. As already seen in the node11 capture, that reply is propagated to node11 as well.

f:id:kakkotetsu:20170513135549p:plain

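So ARP Request/Reply handling does not behave the way I expected.
I haven't dug into whether some topology or configuration tweak could make it work nicely.
(As an experiment, I gave both spine11 and spine21 the same MAC address with set interfaces irb unit 100 mac 00:00:5e:00:53:99, tried virtual-gateway-address, and so on... Communication still works. But it is just a duplicate IP/MAC state with no designated-responder behavior, so I wouldn't call it healthy.)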

Oh, and VNI 50001 on the forward path and VNI 10100 on the return path behaved exactly as expected.

f:id:kakkotetsu:20170513135633p:plain f:id:kakkotetsu:20170513135642p:plain

Packet capture (ControlPlane)

A quick look at the UPDATE and Withdrawn messages for EVPN NLRI Type 5.

First, when spine21 is made to send a Withdrawn (crudely, e.g. with deactivate routing-instance VRF001):

f:id:kakkotetsu:20170513135705p:plain f:id:kakkotetsu:20170513135712p:plain

Next, when spine21 is made to send an UPDATE (crudely, e.g. with rollback 1):

f:id:kakkotetsu:20170513135726p:plain

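Wrap-up

  • Confirmed that EVPN NLRI Type 5 also works on Juniper vQFX10000
  • Found that combining VRF-to-VRF operation with L2VPN seems to need some extra work
    • As I rambled on about in the "L3 forward path Symmetric, return path Asymmetric" section
    • Then again, using MX with MAC-VRF may be the standard approach in the first place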