kakkotetsu

VxLAN + EVPN on Nexus9000v (anycast gateway edition)

Introduction

What this post covers / overview diagrams

I previously tried something similar on Juniper vQFX; this is the Nexus9000v version.
Because the implementations differ, the two setups are not exactly the same.

The topology looks like this:

f:id:kakkotetsu:20170917231256p:plain

The EVPN behavior at the heart of the setup looks like this:

f:id:kakkotetsu:20170911235700p:plain

And this is roughly how it looks from the tenant's point of view:

f:id:kakkotetsu:20170911235733p:plain

References

Building Data Centers with VXLAN BGP EVPN: A Cisco NX-OS Perspective (Networking Technology)


Environment

The KVM host and GNS3 are as follows (noted here because Ubuntu and GNS3 have been upgraded since the previous post).

$ uname -a
Linux kvm01 4.4.0-93-generic #116-Ubuntu SMP Fri Aug 11 21:17:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.3 LTS"

$ virsh -v
1.3.1

$ qemu-system-x86_64 --version
QEMU emulator version 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.14), Copyright (c) 2003-2008 Fabrice Bellard

$ gns3 --version
2.0.3

  • Nexus9000v: nxosv-final.7.0.3.I6.1.qcow2, the latest image downloadable as of 2017/08/16
  • OVMF: apparently a build from 2016/08/13 (I just used what came down from the MARKETPLACE)

Build

Nexus9000v deployment

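I deployed the nodes by clicking through GNS3, the same way as in the previous post.
This time every node gets 4096 MB of memory, aiming for the minimum requirement. (Something like a Memory Usage Warning did show up in syslog, though...)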

I also prepared a few nodes for connectivity checks in the same way.

f:id:kakkotetsu:20170912000052p:plain

Nexus9000v physical interface configuration

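First, the links between the Nexus switches. Since they will carry VxLAN, the MTU is set on the large side.
One NX-OS characteristic: until a feature is enabled with feature <name>, its configuration commands do not even show up as candidates, so each feature you use must first be enabled with the feature command.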

  • spine001
feature lldp

interface Ethernet1/1
  description DEV=torsw101a IF=Eth1/8
  no switchport
  mtu 9216
  ip address 192.0.2.0/31
  no shutdown

interface Ethernet1/2
  description DEV=torsw201a IF=Eth1/8
  no switchport
  mtu 9216
  ip address 192.0.2.2/31
  no shutdown
  • torsw101a
feature lldp

interface Ethernet1/8
  description DEV=spine001 IF=Eth1/1
  no switchport
  mtu 9216
  ip address 192.0.2.1/31
  no shutdown
  • torsw201a
feature lldp

interface Ethernet1/8
  description DEV=spine001 IF=Eth1/2
  no switchport
  mtu 9216
  ip address 192.0.2.3/31
  no shutdown
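
As a rough check on why the MTU is raised on these links, the VxLAN overhead can be added up by hand. The sketch below is not from the original lab; it assumes an IPv4 underlay with no outer 802.1Q tag, and the function name is made up for illustration.

```python
# Rough VXLAN overhead arithmetic.
# Assumptions (not from the original post): IPv4 underlay, no IP options,
# no outer VLAN tag.
INNER_ETHERNET = 14  # tenant Ethernet header, carried inside the tunnel
VXLAN_HEADER = 8
OUTER_UDP = 8
OUTER_IPV4 = 20


def required_underlay_ip_mtu(tenant_ip_mtu: int) -> int:
    """IP MTU the underlay link needs so that a tenant IP packet of
    tenant_ip_mtu bytes can be carried without fragmentation."""
    return tenant_ip_mtu + INNER_ETHERNET + VXLAN_HEADER + OUTER_UDP + OUTER_IPV4


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(mtu, "->", required_underlay_ip_mtu(mtu))
```

Under these assumptions the encapsulation adds 50 bytes, so even tenant jumbo frames fit comfortably within the 9216 configured here.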

Nexus9000v underlay configuration

I followed the official Cisco documentation: Cisco Programmable Fabric with VXLAN BGP EVPN Configuration Guide / Chapter: IP Fabric Underlay.

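IP unnumbered is apparently supported too, but ip unnumbered loopback0 was rejected, so I assigned ordinary IP addresses as in the previous section. (I have not investigated whether some feature needs to be enabled or whether this is a limitation of the virtual edition.)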

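IS-IS and eBGP also appear to be usable as the underlay protocol, but the guide said that "OSPF and IS-IS are the ones that are properly tested", so I went with OSPF without much thought.
In earlier posts (vQFX and Cumulus Linux) I mostly used eBGP, but in the NX-OS configuration model, mixing underlay and overlay into BGP did not read clearly (to me), which is another reason.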

f:id:kakkotetsu:20170912000111p:plain

Configuration

  • spine001
feature ospf

interface loopback0
  ip address 172.31.0.1/32

router ospf OSPF_UNDERLAY
  router-id 172.31.0.1

interface Ethernet1/1
  ip ospf network point-to-point
  ip router ospf OSPF_UNDERLAY area 0

interface Ethernet1/2
  ip ospf network point-to-point
  ip router ospf OSPF_UNDERLAY area 0

interface loopback0
  ip router ospf OSPF_UNDERLAY area 0.0.0.0
  • torsw101a
feature ospf

interface loopback0
  ip address 172.16.1.1/32

router ospf OSPF_UNDERLAY
  router-id 172.16.1.1

interface Ethernet1/8
  ip ospf network point-to-point
  ip router ospf OSPF_UNDERLAY area 0

interface loopback0
  ip router ospf OSPF_UNDERLAY area 0
  • torsw201a
feature ospf

interface loopback0
  ip address 172.16.2.1/32

router ospf OSPF_UNDERLAY
  router-id 172.16.2.1

interface Ethernet1/8
  ip ospf network point-to-point
  ip router ospf OSPF_UNDERLAY area 0

interface loopback0
  ip router ospf OSPF_UNDERLAY area 0

Everything is in area 0 here; in a large environment it may be worth considering splitting into multiple areas.

Quick verification

A quick look at one of the nodes.

spine001# show ip route
IP Route Table for VRF "default"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

172.16.1.1/32, ubest/mbest: 1/0
    *via 192.0.2.1, Eth1/1, [110/41], 00:12:23, ospf-OSPF_UNDERLAY, intra
172.16.2.1/32, ubest/mbest: 1/0
    *via 192.0.2.3, Eth1/2, [110/41], 00:03:39, ospf-OSPF_UNDERLAY, intra
172.31.0.1/32, ubest/mbest: 2/0, attached
    *via 172.31.0.1, Lo0, [0/0], 00:35:13, local
    *via 172.31.0.1, Lo0, [0/0], 00:35:13, direct
192.0.2.0/31, ubest/mbest: 1/0, attached
    *via 192.0.2.0, Eth1/1, [0/0], 00:22:47, direct
192.0.2.0/32, ubest/mbest: 1/0, attached
    *via 192.0.2.0, Eth1/1, [0/0], 00:22:47, local
192.0.2.2/31, ubest/mbest: 1/0, attached
    *via 192.0.2.2, Eth1/2, [0/0], 00:22:37, direct
192.0.2.2/32, ubest/mbest: 1/0, attached
    *via 192.0.2.2, Eth1/2, [0/0], 00:22:37, local


spine001# show ip ospf neighbors
 OSPF Process ID OSPF_UNDERLAY VRF default
 Total number of neighbors: 2
 Neighbor ID     Pri State            Up Time  Address         Interface
 172.16.1.1        1 FULL/ -          00:21:16 192.0.2.1       Eth1/1
 172.16.2.1        1 FULL/ -          00:12:01 192.0.2.3       Eth1/2
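
The [110/41] metric on the ToR loopback routes above can plausibly be reproduced by hand. The sketch below assumes the NX-OS default OSPF reference bandwidth of 40 Gbps, that the Nexus9000v links report 1 Gbps, and that the remote loopback is advertised with cost 1; none of these were verified in this lab, and the function name is hypothetical.

```python
def ospf_interface_cost(bandwidth_mbps: int, reference_mbps: int = 40_000) -> int:
    """OSPF interface cost = reference bandwidth / interface bandwidth, minimum 1.

    reference_mbps defaults to 40 Gbps, which is an assumption about the
    NX-OS default rather than something shown in the output above.
    """
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth must be positive")
    return max(1, reference_mbps // bandwidth_mbps)


if __name__ == "__main__":
    link_cost = ospf_interface_cost(1_000)  # assumed 1 Gbps Eth1/1
    loopback_cost = 1                       # assumed cost of the remote loopback
    print("route metric:", link_cost + loopback_cost)
```

If show ip ospf interface reports a different cost on your build, the assumptions above are the first thing to revisit.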

Nexus9000v overlay configuration

I followed the official Cisco documentation: Cisco Programmable Fabric with VXLAN BGP EVPN Configuration Guide / Chapter: Forwarding Configurations / Cisco Nexus 9000 Series switch configuration.

f:id:kakkotetsu:20170912000126p:plain

  • spine001
feature bgp
nv overlay evpn

router bgp 64512
  neighbor 172.16.1.1
    remote-as 64512
    update-source loopback0
    address-family l2vpn evpn
      send-community
      send-community extended
      route-reflector-client
  neighbor 172.16.2.1
    remote-as 64512
    update-source loopback0
    address-family l2vpn evpn
      send-community
      send-community extended
      route-reflector-client
  • torsw101a
feature bgp
nv overlay evpn

router bgp 64512
  neighbor 172.31.0.1
    remote-as 64512
    update-source loopback0
    address-family l2vpn evpn
      send-community
      send-community extended
  • torsw201a
feature bgp
nv overlay evpn

router bgp 64512
  neighbor 172.31.0.1
    remote-as 64512
    update-source loopback0
    address-family l2vpn evpn
      send-community
      send-community extended

Nexus9000v VxLAN + EVPN configuration

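Here is the configuration, listed as-is.
Key points are noted briefly in ! comments (torsw101a and torsw201a have almost the same configuration, so the comments appear only on the torsw101a side).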

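Note that spine001 already acts as the MP-BGP RR on the control plane, and that configuration is done; on the data plane it is nothing more than a pipe for VxLAN traffic, so it needs no additional configuration in this section.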

  • torsw101a
! Virtual MAC address for the anycast gateway
! Global setting shared by every SVI configured with fabric forwarding mode anycast-gateway
fabric forwarding anycast-gateway-mac 20:20:00:00:00:AA

feature vn-segment-vlan-based
! Mapping between L2 VNIs and VLAN IDs
vlan 100
 vn-segment 10100
vlan 300
 vn-segment 10300

! Loopback used as the src/dst IP address of VxLAN traffic between VTEPs (the loopback0 configured earlier is for the iBGP sessions used for EVPN signaling)
interface loopback1
  ip address 198.18.1.11/32
  ip router ospf OSPF_UNDERLAY area 0.0.0.0

feature nv overlay
! Network Virtualization Endpoint (the VTEP device? interface? not sure of the right word...)
interface nve1
  source-interface loopback1
  host-reachability protocol bgp

! Tenant VRF
! Incidentally, mgmt belongs to the management VRF by default
vrf context VRF001
  ! L3 VNI dedicated to this VRF
  vni 50001
  rd auto
  address-family ipv4 unicast
    route-target both auto
    route-target both auto evpn

interface nve1
  ! Ranges (-) and lists (,) are allowed
  ! There is an upper limit, though: trying 10001-14094, i.e. one VNI per VLAN for 4094 VLANs, produced an error
  member vni 10001-10300
    ! BUM traffic between VTEPs is handled by ingress replication, i.e. sent as unicast
    ! Multicast is also an option, but one of the reasons for this whole investigation is that I do not want to run multicast routing
    ingress-replication protocol bgp
  member vni 50001 associate-vrf
  no shutdown

router bgp 64512
  vrf VRF001
    address-family ipv4 unicast
      advertise l2vpn evpn

! Tenant SVIs
! SVI 200 only needs to exist on torsw201a, and SVI 300 only on torsw101a
feature interface-vlan
interface Vlan100
  no shutdown
  vrf member VRF001
  no ip redirects
  ip address 192.168.1.254/24
  ! Use the anycast gateway on this segment
  fabric forwarding mode anycast-gateway
interface Vlan300
  no shutdown
  vrf member VRF001
  no ip redirects
  ip address 192.168.3.254/24
  fabric forwarding mode anycast-gateway

! Physical interfaces facing tenant end nodes
! Trunk VLANs work as well
interface Ethernet1/1
  switchport access vlan 100
  description DEV=node11 IF=ens4
interface Ethernet1/2
  switchport access vlan 300
  description DEV=node13 IF=ens4

evpn
  vni 10100 l2
    rd auto
    route-target import auto
    route-target export auto
  vni 10300 l2
    rd auto
    route-target import auto
    route-target export auto

! Each tenant VRF's L3 VNI needs its own VLAN and SVI...
! Also, not all 4094 VLANs are usable; only up to 39XX
vlan 3901
  vn-segment 50001

interface Vlan3901
  no shutdown
  vrf member VRF001
  ip forward
  • torsw201a
fabric forwarding anycast-gateway-mac 20:20:00:00:00:AA

feature vn-segment-vlan-based
vlan 100
 vn-segment 10100
vlan 200
 vn-segment 10200

interface loopback1
  ip address 198.18.1.21/32
  ip router ospf OSPF_UNDERLAY area 0.0.0.0

feature nv overlay
interface nve1
  source-interface loopback1
  host-reachability protocol bgp

vrf context VRF001
  vni 50001
  rd auto
  address-family ipv4 unicast
    route-target both auto
    route-target both auto evpn

interface nve1
  member vni 10001-10300
    ingress-replication protocol bgp
  member vni 50001 associate-vrf
  no shutdown

router bgp 64512
  vrf VRF001
    address-family ipv4 unicast
      advertise l2vpn evpn

feature interface-vlan
interface Vlan100
  no shutdown
  vrf member VRF001
  no ip redirects
  ip address 192.168.1.254/24
  fabric forwarding mode anycast-gateway
interface Vlan200
  no shutdown
  vrf member VRF001
  no ip redirects
  ip address 192.168.2.254/24
  fabric forwarding mode anycast-gateway

interface Ethernet1/1
  switchport access vlan 100
  description DEV=node21 IF=ens4

interface Ethernet1/2
  switchport access vlan 200
  description DEV=node22 IF=ens4

evpn
  vni 10100 l2
    rd auto
    route-target import auto
    route-target export auto
  vni 10200 l2
    rd auto
    route-target import auto
    route-target export auto

vlan 3901
  vn-segment 50001

interface Vlan3901
  no shutdown
  vrf member VRF001
  ip forward

Verification

Checking the Nexus9000v tables

With this, the four nodes can communicate with each other. The following are the Nexus9000v tables after confirming that connectivity.

VTEP peer state / local NVE state

torsw101a# show nve peers detail
Details of nve Peers:
----------------------------------------
Peer-Ip: 198.18.1.21
    NVE Interface       : nve1
    Peer State          : Up
    Peer Uptime         : 05:40:49
    Router-Mac          : 0021.9643.7607
    Peer First VNI      : 50001
    Time since Create   : 05:40:50
    Configured VNIs     : 10001-10300,50001
    Provision State     : add-complete
    Route-Update        : Yes
    Peer Flags          : RmacL2Rib, TunnelPD, DisableLearn
    Learnt CP VNIs      : 10100,50001
    Peer-ifindex-resp   : Yes
----------------------------------------


torsw101a# show nve internal platform interface nve 1 detail
Printing Interface ifindex 0x49000001 detail
|======|=========================|===============|===============|=====|=====|
|Intf  |State                    |PriIP          |SecIP          |Vnis |Peers|
|======|=========================|===============|===============|=====|=====|
|nve1  |UP                       |198.18.1.11    |0.0.0.0        |3    |1    |
|======|=========================|===============|===============|=====|=====|

SW_BD/VNIs of interface nve1:
================================================
|======|======|=========================|======|====|======|========
|Sw BD |Vni   |State                    |Intf  |Type|Vrf-ID|Notified
|======|======|=========================|======|====|======|========
|100   |10100 |UP                       |nve1  |CP  |0     |Yes
|300   |10300 |UP                       |nve1  |CP  |0     |Yes
|3901  |50001 |UP                       |nve1  |CP  |3     |Yes
|======|======|=========================|======|====|======|========

Peers of interface nve1:
============================================

Peer_ip: 198.18.1.21
  Peer-ID   : 1
  State     : UP
  Learning  : Disabled
  TunnelID  : 0xc6120115
  MAC       : 0021.9643.7607
  Table-ID  : 0x1
  Encap     : 0x1
torsw201a# show nve peers detail
Details of nve Peers:
----------------------------------------
Peer-Ip: 198.18.1.11
    NVE Interface       : nve1
    Peer State          : Up
    Peer Uptime         : 05:40:33
    Router-Mac          : 0021.960f.f307
    Peer First VNI      : 10100
    Time since Create   : 05:40:33
    Configured VNIs     : 10001-10300,50001
    Provision State     : add-complete
    Route-Update        : Yes
    Peer Flags          : RmacL2Rib, TunnelPD, DisableLearn
    Learnt CP VNIs      : 10100,50001
    Peer-ifindex-resp   : Yes
----------------------------------------


torsw201a# show nve internal platform interface nve 1 detail
Printing Interface ifindex 0x49000001 detail
|======|=========================|===============|===============|=====|=====|
|Intf  |State                    |PriIP          |SecIP          |Vnis |Peers|
|======|=========================|===============|===============|=====|=====|
|nve1  |UP                       |198.18.1.21    |0.0.0.0        |3    |1    |
|======|=========================|===============|===============|=====|=====|

SW_BD/VNIs of interface nve1:
================================================
|======|======|=========================|======|====|======|========
|Sw BD |Vni   |State                    |Intf  |Type|Vrf-ID|Notified
|======|======|=========================|======|====|======|========
|100   |10100 |UP                       |nve1  |CP  |0     |Yes
|200   |10200 |UP                       |nve1  |CP  |0     |Yes
|3901  |50001 |UP                       |nve1  |CP  |3     |Yes
|======|======|=========================|======|====|======|========

Peers of interface nve1:
============================================

Peer_ip: 198.18.1.11
  Peer-ID   : 1
  State     : UP
  Learning  : Disabled
  TunnelID  : 0xc612010b
  MAC       : 0021.960f.f307
  Table-ID  : 0x1
  Encap     : 0x1
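
One thing that stands out in this output: the TunnelID values look like the peer's VTEP address written in hex. That is an observation from this lab's output, not something I found documented, and the helper name below is made up. A small sketch to check it:

```python
import ipaddress


def tunnel_id_to_ip(tunnel_id: int) -> str:
    """Interpret a 32-bit TunnelID as a dotted-quad IPv4 address.

    Treating TunnelID as the peer IP is an assumption based on the
    output above, not documented NX-OS behavior.
    """
    return str(ipaddress.IPv4Address(tunnel_id))


if __name__ == "__main__":
    # TunnelID values copied from the show nve internal platform output
    for tid in (0xC6120115, 0xC612010B):
        print(hex(tid), "->", tunnel_id_to_ip(tid))
```

Both values map back to the loopback1 addresses of the respective peers (198.18.1.21 and 198.18.1.11), which at least matches the tunnelid shown later in the VRF routing table.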

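When I thought the configuration was done but things did not behave as expected, the following command was useful.
The example output below is from a case where the VLAN and SVI for the L3 VNI were invalid (the 3901 entries from the configuration above had not been set yet); there are other fairly self-explanatory outputs too, such as mcast-group-or-ingress-rep-not-cfg.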

torsw101a# show nve internal vni 50001

VNI 50001
  Ready-State         : Not Ready [invalid sw-bd]

MP-BGP neighbor information for EVPN signaling

torsw101a# show bgp l2vpn evpn neighbors
BGP neighbor is 172.31.0.1, remote AS 64512, ibgp link, Peer index 3
  BGP version 4, remote router ID 172.31.0.1
  BGP state = Established, up for 1d06h
  Using loopback0 as update source for this peer
  Last read 00:00:14, hold time = 180, keepalive interval is 60 seconds
  Last written 00:00:48, keepalive timer expiry due 00:00:11
  Received 2911 messages, 0 notifications, 0 bytes in queue
  Sent 3216 messages, 0 notifications, 0 bytes in queue
  Connections established 1, dropped 0
  Last reset by us never, due to No error
  Last reset by peer never, due to No error

  Neighbor capabilities:
  Dynamic capability: advertised (mp, refresh, gr) received (mp, refresh, gr)
  Dynamic capability (old): advertised received
  Route refresh capability (new): advertised received
  Route refresh capability (old): advertised received
  4-Byte AS capability: advertised received
  Address family L2VPN EVPN: advertised received
  Graceful Restart capability: advertised received

  Graceful Restart Parameters:
  Address families advertised to peer:
    L2VPN EVPN
  Address families received from peer:
    L2VPN EVPN
  Forwarding state preserved by peer for:
  Restart time advertised to peer: 120 seconds
  Stale time for routes advertised by peer: 300 seconds
  Restart time advertised by peer: 120 seconds
  Extended Next Hop Encoding Capability: advertised received
  Receive IPv6 next hop encoding Capability for AF:
    IPv4 Unicast

  Message statistics:
                              Sent               Rcvd
  Opens:                         1                  1
  Notifications:                 0                  0
  Updates:                    1443               1449
  Keepalives:                 1771               1460
  Route Refresh:                 0                  0
  Capability:                    1                  1
  Total:                      3216               2911
  Total bytes:              155694             160466
  Bytes in queue:                0                  0

  For address family: L2VPN EVPN
  BGP table version 2897, neighbor version 2897
  4 accepted paths consume 512 bytes of memory
  6 sent paths
  Community attribute sent to this neighbor
  Extended community attribute sent to this neighbor
  Third-party Nexthop will not be computed.
  Last End-of-RIB received 00:00:01 after session start

  Local host: 172.16.1.1, Local port: 19618
  Foreign host: 172.31.0.1, Foreign port: 179
  fd = 76

Learned EVPN routes

The output format resembles that of Cumulus Linux.

torsw101a# show bgp l2vpn evpn
BGP routing table information for VRF default, address family L2VPN EVPN
BGP table version is 2893, local router ID is 172.16.1.1
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup

   Network            Next Hop            Metric     LocPrf     Weight Path
Route Distinguisher: 172.16.1.1:32867    (L2VNI 10100)
*>l[2]:[0]:[0]:[48]:[0021.969a.0301]:[0]:[0.0.0.0]/216
                      198.18.1.11                       100      32768 i
*>i[2]:[0]:[0]:[48]:[0021.969f.c701]:[0]:[0.0.0.0]/216
                      198.18.1.21                       100          0 i
*>l[2]:[0]:[0]:[48]:[0021.969a.0301]:[32]:[192.168.1.1]/272
                      198.18.1.11                       100      32768 i
*>i[2]:[0]:[0]:[48]:[0021.969f.c701]:[32]:[192.168.1.2]/272
                      198.18.1.21                       100          0 i
*>l[3]:[0]:[32]:[198.18.1.11]/88
                      198.18.1.11                       100      32768 i
*>i[3]:[0]:[32]:[198.18.1.21]/88
                      198.18.1.21                       100          0 i

Route Distinguisher: 172.16.1.1:33067    (L2VNI 10300)
*>l[2]:[0]:[0]:[48]:[0021.963d.6e01]:[0]:[0.0.0.0]/216
                      198.18.1.11                       100      32768 i
*>l[2]:[0]:[0]:[48]:[0021.963d.6e01]:[32]:[192.168.3.1]/272
                      198.18.1.11                       100      32768 i
*>l[3]:[0]:[32]:[198.18.1.11]/88
                      198.18.1.11                       100      32768 i

Route Distinguisher: 172.16.2.1:32867
*>i[2]:[0]:[0]:[48]:[0021.969f.c701]:[0]:[0.0.0.0]/216
                      198.18.1.21                       100          0 i
*>i[2]:[0]:[0]:[48]:[0021.969f.c701]:[32]:[192.168.1.2]/272
                      198.18.1.21                       100          0 i
*>i[3]:[0]:[32]:[198.18.1.21]/88
                      198.18.1.21                       100          0 i

Route Distinguisher: 172.16.2.1:32967
*>i[2]:[0]:[0]:[48]:[0021.9642.5f01]:[32]:[192.168.2.1]/272
                      198.18.1.21                       100          0 i

Route Distinguisher: 172.16.1.1:3    (L3VNI 50001)
*>i[2]:[0]:[0]:[48]:[0021.9642.5f01]:[32]:[192.168.2.1]/272
                      198.18.1.21                       100          0 i
*>i[2]:[0]:[0]:[48]:[0021.969f.c701]:[32]:[192.168.1.2]/272
                      198.18.1.21                       100          0 i
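
The Route Distinguisher values produced by rd auto in the table above follow a recognizable pattern. The sketch below reflects my understanding of the NX-OS auto-derivation (router ID plus 32767 + VLAN ID for an L2 VNI, and ASN:VNI for route-target auto with a 2-byte ASN); treat both rules and the function names as assumptions, checked here only against the RDs visible in this output.

```python
def auto_rd_l2vni(router_id: str, vlan_id: int) -> str:
    """Auto-derived RD for an L2 VNI: <router-id>:<32767 + VLAN ID> (assumed rule)."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID out of range")
    return f"{router_id}:{32767 + vlan_id}"


def auto_rt(asn: int, vni: int) -> str:
    """Auto-derived route target: <ASN>:<VNI> (assumed rule, 2-byte ASN)."""
    return f"{asn}:{vni}"


if __name__ == "__main__":
    print(auto_rd_l2vni("172.16.1.1", 100))  # VLAN 100 / L2VNI 10100 on torsw101a
    print(auto_rd_l2vni("172.16.2.1", 200))  # VLAN 200 / L2VNI 10200 on torsw201a
    print(auto_rt(64512, 10100))
```

The derived RDs agree with 172.16.1.1:32867, 172.16.1.1:33067, and 172.16.2.1:32967 in the table. The route targets do not appear in this output, so that half is unverified here.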

MAC address table

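Under the official Cisco documentation / NX-OSv 9000 Guide / NX-OSv 9000 Software Functionality, near the bottom, there is a section called NX-OSv 9000 Feature UI/CLI Difference From Hardware Platform. It says to use this command in place of show mac addr and the like, so that is what I use on the virtual edition.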

The anycast gateway (20:20:00:00:00:aa) entry is displayed oddly, but let's overlook that.

torsw101a# show system internal l2fwder mac
Legend:
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link,
        (T) - True, (F) - False, C - ControlPlane MAC
   VLAN     MAC Address      Type      age     Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
G  3901    0021.960f.f307    static   -          F     F   sup-eth1(R)
G   100    0021.960f.f307    static   -          F     F   sup-eth1(R)
*   100    0021.969f.c701    static   -          F     F  (0x47000001) nve-peer1 198.18.
G   300    0021.960f.f307    static   -          F     F   sup-eth1(R)
*   100    0021.969a.0301   dynamic   00:02:55   F     F     Eth1/1
*   300    0021.963d.6e01   dynamic   00:03:06   F     F     Eth1/2
    1           1         -20:20:00:00:00:aa         -             1
torsw201a# show system internal l2fwder mac
Legend:
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link,
        (T) - True, (F) - False, C - ControlPlane MAC
   VLAN     MAC Address      Type      age     Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
G   100    0021.9643.7607    static   -          F     F   sup-eth1(R)
*   200    0021.9642.5f01   dynamic   00:01:03   F     F     Eth1/2
*   100    0021.969f.c701   dynamic   00:03:40   F     F     Eth1/1
G   200    0021.9643.7607    static   -          F     F   sup-eth1(R)
*   100    0021.969a.0301    static   -          F     F  (0x47000001) nve-peer1 198.18.
    1           1         -20:20:00:00:00:aa         -             1

MAC address table (as learned via EVPN)

If there is a Seq No column, doesn't that mean the MAC Mobility Extended Community is supported!? (no further explanation attempted)

torsw101a# show l2route evpn mac all

Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link
(Dup):Duplicate (Spl):Split (Rcv):Recv (AD):Auto-Delete(D):Del Pending (S):Stale (C):Clear
(Ps):Peer Sync (O):Re-Originated

Topology    Mac Address    Prod   Flags         Seq No     Next-Hops
----------- -------------- ------ ------------- ---------- ----------------
100         0021.969a.0301 Local  L,            0          Eth1/1
100         0021.969f.c701 BGP    SplRcv        0          198.18.1.21
300         0021.963d.6e01 Local  L,            0          Eth1/2
3901        0021.9643.7607 VXLAN  Rmac          0          198.18.1.21
torsw201a# show l2route evpn mac all

Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link
(Dup):Duplicate (Spl):Split (Rcv):Recv (AD):Auto-Delete(D):Del Pending (S):Stale (C):Clear
(Ps):Peer Sync (O):Re-Originated

Topology    Mac Address    Prod   Flags         Seq No     Next-Hops
----------- -------------- ------ ------------- ---------- ----------------
100         0021.969a.0301 BGP    SplRcv        0          198.18.1.11
100         0021.969f.c701 Local  L,            0          Eth1/1
200         0021.9642.5f01 Local  L,            0          Eth1/2
3901        0021.960f.f307 VXLAN  Rmac          0          198.18.1.11

VRF ARP table

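This has nothing to do with VxLAN or EVPN as such.
When an entry is about to age out, the Nexus9000v proactively sends an ARP request to the node, and if a reply comes back the entry is kept (a fairly common behavior).
Also, each switch only shows entries for the nodes attached to itself.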

torsw101a# show ip arp vrf VRF001

Flags: * - Adjacencies learnt on non-active FHRP router
       + - Adjacencies synced via CFSoE
       # - Adjacencies Throttled for Glean
       CP - Added via L2RIB, Control plane Adjacencies
       PS - Added via L2RIB, Peer Sync
       RO - Dervied from L2RIB Peer Sync Entry
       D - Static Adjacencies attached to down interface

IP ARP Table for context VRF001
Total number of entries: 2
Address         Age       MAC Address     Interface       Flags
192.168.3.1     00:02:36  0021.963d.6e01  Vlan300
192.168.1.1     00:02:24  0021.969a.0301  Vlan100
torsw201a# show ip arp vrf VRF001

Flags: * - Adjacencies learnt on non-active FHRP router
       + - Adjacencies synced via CFSoE
       # - Adjacencies Throttled for Glean
       CP - Added via L2RIB, Control plane Adjacencies
       PS - Added via L2RIB, Peer Sync
       RO - Dervied from L2RIB Peer Sync Entry
       D - Static Adjacencies attached to down interface

IP ARP Table for context VRF001
Total number of entries: 2
Address         Age       MAC Address     Interface       Flags
192.168.1.2     00:00:07  0021.969f.c701  Vlan100
192.168.2.1     00:02:31  0021.9642.5f01  Vlan200

VRF routing table (IPv4)

Because EVPN exchanges MAC-IP bindings in NLRI Type 2, the routes are per-host (/32).

torsw101a# show ip route vrf VRF001
IP Route Table for VRF "VRF001"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

192.168.1.0/24, ubest/mbest: 1/0, attached
    *via 192.168.1.254, Vlan100, [0/0], 1d06h, direct
192.168.1.1/32, ubest/mbest: 1/0, attached
    *via 192.168.1.1, Vlan100, [190/0], 1d06h, hmm
192.168.1.2/32, ubest/mbest: 1/0
    *via 198.18.1.21%default, [200/0], 1d06h, bgp-64512, internal, tag 64512 (evpn) segid: 50001 tunnelid: 0xc6120115 encap: VXLAN

192.168.1.254/32, ubest/mbest: 1/0, attached
    *via 192.168.1.254, Vlan100, [0/0], 1d06h, local
192.168.2.1/32, ubest/mbest: 1/0
    *via 198.18.1.21%default, [200/0], 1d06h, bgp-64512, internal, tag 64512 (evpn) segid: 50001 tunnelid: 0xc6120115 encap: VXLAN

192.168.3.0/24, ubest/mbest: 1/0, attached
    *via 192.168.3.254, Vlan300, [0/0], 1d06h, direct
192.168.3.1/32, ubest/mbest: 1/0, attached
    *via 192.168.3.1, Vlan300, [190/0], 1d06h, hmm
192.168.3.254/32, ubest/mbest: 1/0, attached
    *via 192.168.3.254, Vlan300, [0/0], 1d06h, local
torsw201a# show ip route vrf VRF001
IP Route Table for VRF "VRF001"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

192.168.1.0/24, ubest/mbest: 1/0, attached
    *via 192.168.1.254, Vlan100, [0/0], 1d21h, direct
192.168.1.1/32, ubest/mbest: 1/0
    *via 198.18.1.11%default, [200/0], 1d06h, bgp-64512, internal, tag 64512 (evpn) segid: 50001 tunnelid: 0xc612010b encap: VXLAN

192.168.1.2/32, ubest/mbest: 1/0, attached
    *via 192.168.1.2, Vlan100, [190/0], 1d11h, hmm
192.168.1.254/32, ubest/mbest: 1/0, attached
    *via 192.168.1.254, Vlan100, [0/0], 1d21h, local
192.168.2.0/24, ubest/mbest: 1/0, attached
    *via 192.168.2.254, Vlan200, [0/0], 1d09h, direct
192.168.2.1/32, ubest/mbest: 1/0, attached
    *via 192.168.2.1, Vlan200, [190/0], 1d06h, hmm
192.168.2.254/32, ubest/mbest: 1/0, attached
    *via 192.168.2.254, Vlan200, [0/0], 1d09h, local
192.168.3.1/32, ubest/mbest: 1/0
    *via 198.18.1.11%default, [200/0], 1d06h, bgp-64512, internal, tag 64512 (evpn) segid: 50001 tunnelid: 0xc612010b encap: VXLAN

Traffic and packet verification

The official Cisco documentation, Cisco Programmable Fabric with VXLAN BGP EVPN Configuration Guide / Chapter: Unicast Forwarding, describes various forwarding patterns, so I compared the captures against it.

node11 (VLAN 100) <-> node21 (VLAN 100) communication (via L2VNI)

After flushing the ARP table on each node with $ sudo ip n flush dev ens4:

kotetsu@node11:~$ ping 192.168.1.2
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=24.5 ms
64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=7.11 ms
64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=7.05 ms
kotetsu@node11:~$ ip n show dev ens4
192.168.1.2 lladdr 00:21:96:9f:c7:01 STALE
192.168.1.254 lladdr 20:20:00:00:00:aa STALE
kotetsu@node21:~$ ip n show dev ens4
192.168.1.1 lladdr 00:21:96:9a:03:01 STALE
192.168.1.254 lladdr 20:20:00:00:00:aa STALE
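
When cross-checking the Linux neighbor tables against the NX-OS tables, the same MAC address shows up in two notations (00:21:96:9f:c7:01 on the nodes, 0021.969f.c701 on the switches). A small helper, written for this post and not part of any tool used here, makes the comparison less error-prone:

```python
def _hex_digits(mac: str) -> str:
    """Strip separators and validate that 12 hex digits remain."""
    digits = mac.replace(":", "").replace("-", "").replace(".", "").lower()
    if len(digits) != 12 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


def to_colon(mac: str) -> str:
    """Return the MAC in Linux-style aa:bb:cc:dd:ee:ff notation."""
    d = _hex_digits(mac)
    return ":".join(d[i:i + 2] for i in range(0, 12, 2))


def to_cisco(mac: str) -> str:
    """Return the MAC in Cisco-style aabb.ccdd.eeff notation."""
    d = _hex_digits(mac)
    return ".".join(d[i:i + 4] for i in range(0, 12, 4))


if __name__ == "__main__":
    print(to_colon("0021.969f.c701"))     # node21 as seen on the switch
    print(to_cisco("20:20:00:00:00:AA"))  # anycast gateway MAC
```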

The ARP request from node11 is VxLAN-encapsulated by torsw101a with L2 VNI 10100 and sent by ingress replication (the outer src/dst IPs are the loopback1 addresses of torsw[12]01a).

f:id:kakkotetsu:20170912000201p:plain
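
Conceptually, ingress replication means the ingress VTEP sends one unicast copy of each BUM frame to every remote VTEP that has advertised a Type 3 route for that VNI. The sketch below is only an illustration of that idea using the Type 3 originators seen for L2VNI 10100 earlier; the function name and data layout are made up, and this is not how NX-OS implements it internally.

```python
def flood_list(imet_originators, local_vtep):
    """Remote VTEPs that each receive one unicast copy of a BUM frame.

    imet_originators: VTEP IPs taken from EVPN Type 3 routes for one VNI
    local_vtep: this switch's own VTEP IP, which is excluded
    """
    return sorted(set(imet_originators) - {local_vtep})


if __name__ == "__main__":
    # Type 3 originators for L2VNI 10100 as seen on torsw101a
    originators = ["198.18.1.11", "198.18.1.21"]
    targets = flood_list(originators, local_vtep="198.18.1.11")
    print("copies per BUM frame:", len(targets), targets)
```

With only two VTEPs the list has a single entry, which matches the single encapsulated ARP request in the capture; with N VTEPs in a VNI the ingress switch would send N-1 copies.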

The ICMP Echo Request from node11 is simply VxLAN-encapsulated with L2 VNI 10100.

f:id:kakkotetsu:20170912000213p:plain

node11 (VLAN 100) <-> node22 (VLAN 200) communication (via L3VNI)

After flushing the ARP table on each node with $ sudo ip n flush dev ens4:

kotetsu@node11:~$ ping 192.168.2.1
PING 192.168.2.1 (192.168.2.1) 56(84) bytes of data.
64 bytes from 192.168.2.1: icmp_seq=1 ttl=62 time=22.1 ms
64 bytes from 192.168.2.1: icmp_seq=2 ttl=62 time=9.69 ms
64 bytes from 192.168.2.1: icmp_seq=3 ttl=62 time=12.3 ms
64 bytes from 192.168.2.1: icmp_seq=4 ttl=62 time=8.81 ms
kotetsu@node11:~$ ip n show dev ens4
192.168.1.2  FAILED
192.168.1.254 lladdr 20:20:00:00:00:aa STALE
kotetsu@node22:~$ ip n show dev ens4
192.168.2.254 lladdr 20:20:00:00:00:aa STALE

The ARP request is forwarded from torsw101a to torsw201a over VNI 10100, but torsw201a does not send an ARP reply, so no duplicate ARP reply comes back to node11. (That is the desired result, but I wonder how torsw201a decides this so neatly...)

f:id:kakkotetsu:20170912000228p:plain

The ICMP Echo Request from node11 is encapsulated by torsw101a with the VRF's L3 VNI 50001 and forwarded.

f:id:kakkotetsu:20170912000302p:plain
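
The ping output is consistent with this: replies within VLAN 100 arrive with ttl=64, while replies from VLAN 200 arrive with ttl=62. A plausible reading is that traffic over the L3 VNI is routed once on each VTEP. The sketch below just does that arithmetic; the initial TTL of 64 is the usual Linux default and is assumed here, not captured.

```python
def ttl_at_receiver(initial_ttl: int, routed_hops: int) -> int:
    """TTL seen by the receiver after routed_hops routers each decrement it by 1."""
    if routed_hops >= initial_ttl:
        raise ValueError("packet would be dropped before arrival")
    return initial_ttl - routed_hops


if __name__ == "__main__":
    # Same subnet over the L2 VNI: bridged only, no routed hop
    print("L2VNI:", ttl_at_receiver(64, routed_hops=0))
    # Different subnet over the L3 VNI: assumed one routed hop per VTEP
    print("L3VNI:", ttl_at_receiver(64, routed_hops=2))
```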

Control plane packets

This is getting quite familiar by now, but let's look at one EVPN NLRI Type 2 (MAC/IP Advertisement route) Update.
Because SVIs were created for inter-subnet communication this time, IP addresses are advertised in addition to MAC addresses.

f:id:kakkotetsu:20170912000313p:plain
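
The same MAC-only versus MAC+IP distinction is visible in the show bgp l2vpn evpn output earlier. The sketch below splits a Type 2 route string from that output into fields. The field order (ESI, Ethernet tag, MAC length, MAC, IP length, IP) is my reading of the NX-OS display format, so treat it as an assumption; the function name is made up.

```python
import re


def parse_type2(route: str) -> dict:
    """Split an NX-OS-style Type 2 route string into named fields.

    Assumed layout: [2]:[ESI]:[EthTag]:[MAC len]:[MAC]:[IP len]:[IP]/<len>
    """
    prefix, _, _length = route.rpartition("/")
    fields = re.findall(r"\[([^\]]*)\]", prefix)
    if len(fields) != 7 or fields[0] != "2":
        raise ValueError(f"not a Type 2 route: {route!r}")
    _, esi, eth_tag, _mac_len, mac, ip_len, ip = fields
    return {
        "esi": esi,
        "eth_tag": int(eth_tag),
        "mac": mac,
        # An IP length of 0 means the route carries a MAC address only
        "ip": ip if int(ip_len) > 0 else None,
    }


if __name__ == "__main__":
    print(parse_type2("[2]:[0]:[0]:[48]:[0021.969f.c701]:[0]:[0.0.0.0]/216"))
    print(parse_type2("[2]:[0]:[0]:[48]:[0021.969f.c701]:[32]:[192.168.1.2]/272"))
```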

Wrap-up

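  • Nexus9000v is a good one: even as a virtual edition, VxLAN + EVPN works reasonably well on it
    • anycast gateway worked
      • It communicates using the virtual MAC address and also answers ICMP Echo Requests
    • It is a pity that ARP suppression could not be enabled
      • Please don't be so blunt as to say "it's basically just Proxy ARP!"
  • Nexus tends to be associated with the ACI push these days, but
    • Where the use case fits, ACI presumably hides this kind of low-level L2/L3 plumbing (I have never used it, so this is a guess based on public marketing material)
    • Even where ACI does not fit, a Nexus can run standalone as an ordinary Ethernet switch (as in this post)
    • The point is: "having options is a good thing", and "using Nexus does not mean everyone has to (worry about the physical network and write configuration like this | use ACI)"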