Zero Trust Isn't a Product.

It's an Operating Model.

And in Broadcom VCF 9.0, it finally has a real platform.

A Technical Blog | VMware Cloud Foundation 9.0 | Zero Trust Architecture

#ZeroTrust #VCF9 #CyberSecurity #CloudArchitect #Microsegmentation #NSX #DistributedFirewall

The Problem With 'Perimeter Security'

For decades, enterprise security was built on a single core assumption: if you're inside the network, you're trusted. We built firewalls at the edge, drew a hard line between 'external' and 'internal', and assumed everything behind that line was safe.

That assumption is now a liability.

Think about what 'inside the network' actually means today:

Remote workers on VPNs spanning continents
Hybrid cloud workloads split across on-prem and AWS/Azure
Containerised microservices with ephemeral identities
SaaS integrations that bypass your network entirely
Third-party contractors with broad internal access

The perimeter is everywhere. And a perimeter that's everywhere is effectively nowhere.

The castle-and-moat model was designed for a world where your data lived in one building. That world no longer exists.

Figure 1: The traditional perimeter model — once inside, traffic roams freely. A single breach grants lateral movement across the entire flat network.

In the diagram above, notice what happens the moment an attacker gets past the perimeter. There are no internal barriers. East-west traffic flows without inspection. One compromised VM becomes a launchpad for the entire environment.

Flat networks aren't just inefficient. They're a liability waiting to be exploited.

What Zero Trust Actually Means

Zero Trust is not a product you can buy. It is not a firewall SKU, a vendor badge, or a compliance checkbox. It is an architectural philosophy — and it rests on three principles:

Never trust, always verify — every request must be authenticated and authorised regardless of source
Assume breach — design systems as if attackers are already inside
Least privilege access — every identity, workload, and service gets the minimum access required, nothing more

The problem is that the industry has spent years selling 'Zero Trust-inspired' solutions — products that approximate these principles through overlapping tools, manual configuration, and bolt-on controls. The result is a security posture that looks good on paper and falls apart under pressure.

You cannot retrofit Zero Trust onto a flat network any more than you can retrofit fire exits onto a building that was never designed for them.

True Zero Trust requires the platform itself to be the enforcement layer. Security must be engineered into the architecture — not added on top of it. This is exactly what VCF 9.0 delivers.

VCF 9.0: Zero Trust as a Platform

VMware Cloud Foundation 9.0 is not a Zero Trust-inspired private cloud. It is a Zero Trust-native private cloud. The distinction matters enormously.

'Inspired by' means the principles influenced the design but compromises were made. 'Native' means the architecture is the security model — they are inseparable. In VCF 9.0, two core capabilities deliver this:

Figure 2: VCF 9.0 Zero Trust-Native Architecture — NSX VPCs providing macro isolation, with Distributed Firewall enforcement at every VM vNIC across all tenants.

NSX VPCs: Macro Isolation at the Tenant Boundary

NSX Virtual Private Clouds (VPCs) provide hard tenant-level segmentation within the VCF platform. Think of them as dedicated, isolated network constructs — not just logical groupings, but enforced boundaries that prevent any lateral movement between tenants.

Each VPC is an independent network domain with its own:

Routing domain — traffic cannot cross VPC boundaries without explicit policy
Address space — overlapping IP ranges are fully supported across tenants
Security policy context — each VPC operates under its own policy namespace
Network services — DNS, DHCP, NAT, load balancing are all VPC-scoped

This is macro isolation. Whether you're segregating business units, application environments, or customer tenants in a multi-tenant deployment, NSX VPCs provide the hard boundaries that flat VLANs never could.

NSX VPCs aren't just an organisational tool. They are enforcement points. Cross-VPC traffic is blocked by default — not permitted by default.

The Distributed Firewall: Micro-Enforcement at Every vNIC

If NSX VPCs are the macro layer, the Distributed Firewall (DFW) is the micro layer — and it is where VCF 9.0's Zero Trust architecture becomes truly powerful.

Traditional firewalls sit at network boundaries. Traffic must flow to the firewall to be inspected. In a flat network, that means east-west traffic — VM to VM, service to service — largely bypasses inspection entirely.

The VCF Distributed Firewall works differently. It is implemented as a kernel module in every ESXi hypervisor. This means enforcement happens at the vNIC of every single virtual machine — before the packet ever touches the virtual switch, before it traverses the network, before it reaches its destination.

Figure 3: The Distributed Firewall enforces policy at the hypervisor kernel level, at each VM vNIC — east-west traffic is inspected before it ever hits the wire.

What makes this architecturally significant:

The firewall cannot be bypassed from inside the guest, because it runs in the hypervisor below the VM's operating system
Policy is stateful and identity-aware — not just IP and port rules
Enforcement is consistent regardless of physical location — VM migration preserves policy
Performance overhead is minimal — enforcement happens in the fast path of the hypervisor
Visibility is built in, since every east-west flow is matched against policy and can be logged at the rule level

East-west traffic doesn't move freely in VCF 9.0. It is inspected, policy-driven, and controlled at every hop — at the vNIC, not the edge.

Macro + Micro: One Consistent Trust Model

The architectural genius of VCF 9.0's Zero Trust implementation is how these two layers work together to create a single, consistent trust model from the user to the application to the workload.

NSX VPCs handle the macro layer — defining hard boundaries between tenants, business units, and application domains. The Distributed Firewall handles the micro layer — enforcing least-privilege access between every workload within those boundaries.

Together, they deliver:

Macro Isolation. Micro Enforcement. One consistent trust model from user to app to workload.

Figure 4: Zero Trust capability comparison — traditional infrastructure vs VCF 9.0's native approach across every critical dimension.

The comparison above is stark. Traditional infrastructure relies on coarse VLAN segmentation, lacks east-west inspection, uses static IP-based policy rules that break on VM migration, and is built on implicit trust. VCF 9.0 replaces every one of these with a native, platform-level alternative.
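The layered model can be illustrated with a deliberately simplified Python sketch. It is not NSX code and uses no NSX API: the VPC names, tags, and rule fields are hypothetical. It shows the evaluation order described above: a macro check at the VPC boundary, then a first-match rule table keyed on workload tags instead of IP addresses, ending in default deny.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    vpc: str          # macro boundary the workload lives in (hypothetical names)
    tags: frozenset   # identity attributes used by rules, e.g. "app:web"


@dataclass(frozen=True)
class Rule:
    src_tag: str
    dst_tag: str
    port: int
    action: str       # "allow" or "drop"


# Macro layer: cross-VPC traffic is dropped unless the pair is listed here.
CROSS_VPC_ALLOW = {("vpc-finance", "vpc-shared")}

# Micro layer: ordered rule table, matched on tags rather than IP addresses.
RULES = [
    Rule("app:web", "app:db", 5432, "allow"),
    Rule("app:web", "app:web", 443, "allow"),
]


def evaluate(src: Workload, dst: Workload, port: int) -> str:
    # 1. Macro isolation at the VPC boundary.
    if src.vpc != dst.vpc and (src.vpc, dst.vpc) not in CROSS_VPC_ALLOW:
        return "drop"
    # 2. Micro enforcement: the first matching rule decides.
    for rule in RULES:
        if rule.src_tag in src.tags and rule.dst_tag in dst.tags and rule.port == port:
            return rule.action
    # 3. No rule matched: default deny, i.e. least privilege.
    return "drop"


web = Workload("web01", "vpc-finance", frozenset({"app:web"}))
db = Workload("db01", "vpc-finance", frozenset({"app:db"}))
hr_db = Workload("db02", "vpc-hr", frozenset({"app:db"}))

print(evaluate(web, db, 5432))     # allow: same VPC, explicit rule
print(evaluate(web, db, 22))       # drop: no rule for SSH
print(evaluate(web, hr_db, 5432))  # drop: blocked at the VPC boundary
```

Because the rules reference tags rather than addresses, nothing in the evaluation depends on which host a workload runs on, which is the property the "policy follows the workload" argument relies on.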

Why 'Engineered In' Matters

There is a meaningful difference between security that is engineered into a platform and security that is layered on top of one. It is not just a marketing distinction — it has real operational consequences.

The 'Layered On' Problem

When you bolt Zero Trust controls onto existing infrastructure, you end up with:

Overlapping toolsets from multiple vendors, each with their own policy models
Change freezes every time you need to update segmentation rules
Policy drift as VMs migrate and static rules become stale
Inconsistent enforcement as some workloads fall through coverage gaps
Complex troubleshooting across tools that don't share context

This isn't a theoretical concern. It is the lived reality of most enterprise security teams today — stitching together NSGs, network ACLs, hardware firewalls, and micro-segmentation overlays, hoping the gaps don't show.

The 'Engineered In' Advantage

When Zero Trust is native to the platform, the calculus flips entirely:

The DFW is always-on — there is no 'gap' because enforcement is in the hypervisor
Policy follows the workload — vMotion and DRS migrations preserve security posture
A single policy model — one consistent framework across all workloads and tenants
Operational simplicity — security teams manage policy, not infrastructure complexity
Auditability by default, since every flow is visible, policy-attributed, and can be logged per rule

Security that moves with the workload isn't just operationally convenient. It's a fundamentally different risk posture. Policy drift becomes impossible when the policy is part of the platform.

Private Cloud Just Grew Up

For most of the last decade, private cloud was playing catch-up with public cloud on agility. Developers wanted AWS-speed provisioning. Platform teams struggled to deliver it. The conversation was almost entirely about speed-to-deployment.

VCF 9.0 flips that narrative. On the dimension that matters most in 2025 — security architecture — private cloud now leads.

Consider what Zero Trust looks like in AWS:

IAM policies — complex JSON, easy to misconfigure, hard to audit
Security Groups, stateful but matched on CIDRs or security-group references rather than workload identity
Network ACLs — stateless, coarse, applied at subnet level
VPC peering — creates implicit trust between environments
GuardDuty — detection, not prevention; you still need to respond

You can absolutely implement Zero Trust controls in AWS. But you are stitching together multiple services, each with their own model, each requiring expertise, and each introducing potential for gaps. The platform does not enforce Zero Trust — you bolt it on.

In VCF 9.0, the platform is the security model. You do not stitch. You do not overlap. You do not hope the gaps don't show. The DFW is always on. NSX VPCs are always isolated. Trust is never assumed.

Zero Trust isn't coming to private cloud. In VCF 9.0, it's already the default.

Closing Thoughts

Zero Trust is not a destination — it is a continuous operating model. But you cannot operate a Zero Trust model if your platform was not designed for it.

VCF 9.0 is the first private cloud platform that takes this seriously at every layer. NSX VPCs provide the isolation boundaries that make macro segmentation real. The Distributed Firewall provides the microsegmentation enforcement that makes east-west control real. Together, they deliver something that no overlay solution or bolt-on tool can match: a consistent, platform-native trust model that does not drift, does not gap, and does not break when workloads move.

If you are still designing private cloud environments with implicit internal trust, you are not behind on a feature. You are behind on a decade of threat evolution.

The question is no longer whether Zero Trust belongs in your private cloud. It is whether your platform was built for it.

VCF 9.0 was built for it. This isn't Zero Trust-inspired private cloud. This is Zero Trust-native private cloud.

Cloud Architect | VMware VCF Practice

VCF 9 Lab Network Pre-Requisites: Arista Switch Configuration, VLAN Design & Full Validation

VCF 9 Lab Network Pre-Requisites — Farrukh Hanif

Introduction

This post is part of an ongoing series documenting the build-out of a physical VCF 9 home lab from scratch. Before a single VCF installer OVA is deployed, the physical network layer needs to be correct — VLANs present, MTUs consistent end-to-end, BGP uplinks reachable, and NFS accessible from the management domain. If any of these are wrong at day zero, VCF deployment will fail in ways that are difficult to diagnose after the fact.

This guide covers the complete network pre-requisite configuration applied to an Arista DCS-7050TX-64-R acting as the primary lab leaf switch, including the design decisions behind every choice, the full EOS configuration, and a thorough validation checklist. Everything here reflects a real deployment — including mistakes encountered along the way.

Note: This is not a theoretical design guide. Every command shown was run on real hardware. Where something failed during testing, it is documented here along with the fix.

Lab Hardware Overview

The physical lab consists of the following hardware. Understanding the role of each node informs every design decision that follows.

Component	Specification	Role in Lab
Arista DCS-7050TX-64-R	48x 10GbE RJ45, 4x QSFP+, EOS 4.19.10M	Primary lab leaf switch — all VLANs, BGP, SVIs
Cisco Catalyst 3750E-PoE-24	24x 1GbE PoE, IOS 15.2	Core access switch — trunked to Arista Et48
Supermicro SYS-6029TP-HTR (×2)	2U TwinPro², 4 nodes/chassis, dual Xeon Silver 4214R, 1TB RAM total	8× VCF compute/management nodes (Site 1 = CHx1 A-D, Site 2 = CHx2 E-H)
Dell PowerEdge R630	128GB DDR4, 4× 1GbE onboard	Management host: ESXi running Ubuntu 24.04 VM for NFS + Docker services
Intel NUC Skull Canyon	2× NIC, Ubuntu Desktop	Admin jumpbox — SSH gateway, Vaultwarden, HashiCorp Vault
HPE 3PAR StoreServ 8000	12× 1.2TB SAS + 8× 480GB SAS SSD	Future vSAN drives — require sg_format 520→512 byte sector conversion

Design Note: The R630 is not a VCF management domain host. It runs ESXi purely to host an Ubuntu Server 24.04 VM which provides NFS storage and Docker-based services (Outline wiki, Gitea, Oxidized, draw.io, HashiCorp Vault). This avoids a circular dependency — the R630 ESXi does not consume the NFS it serves.

VLAN Design & IP Addressing

VCF 9 requires a minimum of five dedicated VLANs per management domain: ESXi Management, vMotion, vSAN, NSX Host TEP, and VM Management. NFS storage and OOB IPMI are additional VLANs added for this lab. A second site VLAN range is pre-provisioned using a completely separate numbering scheme to avoid any ambiguity when both sites are active simultaneously.

Design Decisions

Site 1 VLANs use the 111x range (1110–1115). Site 2 uses 121x (1210–1215). The second digit (11xx vs 12xx) makes it immediately obvious from any port config or trunk which site a VLAN belongs to.
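OOB IPMI uses VLAN 100 (Site 1) and VLAN 200 (Site 2), intentionally low-numbered and configured as access VLANs on the dedicated IPMI/iDRAC ports. Both VLANs are also included in the blade, R630, and Cisco trunk allowed lists below; remove them from those trunks if strict out-of-band separation is required.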
BGP T0 uplink VLANs (60, 70, 160, 170) use dedicated /30 subnets on access-mode ports connecting to NSX Edge uplink vNICs. No other traffic shares these VLANs.
Native VLAN on blade trunks is set to ESXi Management (1111 for Site 1, 1211 for Site 2) so untagged ESXi management frames are processed correctly.
Both site VLAN ranges are trunked on all blade port-channels from day one. To isolate a site, remove that site's VLANs from the allowed list — no port mode changes required.
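Removing a site from the blade trunks is a set operation on the allowed list. The hypothetical helpers below expand the EOS-style list used in this lab, subtract Site 2's VLANs, and print the resulting allowed list. They only compute the string; they do not talk to the switch.

```python
def expand_vlans(spec: str) -> set[int]:
    """Expand an allowed-VLAN string such as '100,200,1110-1115' into a set of IDs."""
    vlans: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = (int(x) for x in part.split("-"))
            vlans.update(range(first, last + 1))
        elif part:
            vlans.add(int(part))
    return vlans


def compress_vlans(vlans: set[int]) -> str:
    """Collapse a set of VLAN IDs back into comma/range notation."""
    ranges: list[list[int]] = []
    for vid in sorted(vlans):
        if ranges and vid == ranges[-1][1] + 1:
            ranges[-1][1] = vid          # extend the current run
        else:
            ranges.append([vid, vid])    # start a new run
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


BLADE_TRUNK = "100,200,1110-1115,1210-1215"
SITE2_VLANS = {200} | set(range(1210, 1216))

remaining = expand_vlans(BLADE_TRUNK) - SITE2_VLANS
print(f"switchport trunk allowed vlan {compress_vlans(remaining)}")
```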

VLAN Reference — Site 1

VLAN	Name / Purpose	Subnet	Gateway	MTU	Notes
100	OOB IPMI / iDRAC	10.10.0.0/24	10.10.0.1	1500	Access only: Chassis 1 IPMI, R630 iDRAC
1110	VM Management	10.11.10.0/24	10.11.10.1	1500	VCF VM-Mgmt network
1111	ESXi Management	10.11.11.0/24	10.11.11.1	1500	SDDC Mgr, vCenter, NSX Mgr — Native VLAN on blade trunks
1112	vMotion	10.11.12.0/24	10.11.12.1	9000	Jumbo MTU — VDS vmkernel port must match
1113	vSAN	10.11.13.0/24	10.11.13.1	9000	Jumbo MTU — vSAN OSA architecture
1114	NSX Host TEP	10.11.14.0/24	10.11.14.1	9000	Geneve encapsulation: MTU ≥ 1600 required, 9000 used here
1115	NFS Storage	10.11.15.0/24	10.11.15.1	9000	NFS from R630 Ubuntu VM — static IP 10.11.15.10
60	NSX T0 Uplink 1	10.0.60.0/30	10.0.60.1	9216	BGP eBGP: Arista .1 ↔ T0 .2 — ASN 65000 ↔ 65001
70	NSX T0 Uplink 2	10.0.70.0/30	10.0.70.1	9216	BGP eBGP: Arista .1 ↔ T0 .2 — ASN 65000 ↔ 65001

VLAN Reference — Site 2

VLAN	Name / Purpose	Subnet	Gateway	MTU	Notes
200	OOB IPMI / iDRAC S2	10.20.0.0/24	10.20.0.1	1500	Access only — Chassis 2 IPMI
1210	VM Management S2	10.12.10.0/24	10.12.10.1	1500	Site 2 VM-Mgmt
1211	ESXi Management S2	10.12.11.0/24	10.12.11.1	1500	Native VLAN on CHx2 blade trunks
1212	vMotion S2	10.12.12.0/24	10.12.12.1	9000	Jumbo MTU
1213	vSAN S2	10.12.13.0/24	10.12.13.1	9000	Jumbo MTU — vSAN OSA
1214	NSX Host TEP S2	10.12.14.0/24	10.12.14.1	9000	Geneve: MTU ≥ 1600 required, 9000 used here
1215	NFS Storage S2	10.12.15.0/24	10.12.15.1	9000	Site 2 NFS
160	NSX T0 Uplink 1 S2	10.0.160.0/30	10.0.160.1	9216	BGP eBGP: Arista .1 ↔ T0 .2 — ASN 65000 ↔ 65002
170	NSX T0 Uplink 2 S2	10.0.170.0/30	10.0.170.1	9216	BGP eBGP: Arista .1 ↔ T0 .2 — ASN 65000 ↔ 65002
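Each uplink /30 leaves exactly two usable addresses, which is why both tables pair Arista .1 with T0 .2. A short check using Python's standard ipaddress module confirms the pairing for all four uplink VLANs; the function name is illustrative.

```python
import ipaddress

T0_UPLINKS = {
    60: "10.0.60.0/30",
    70: "10.0.70.0/30",
    160: "10.0.160.0/30",
    170: "10.0.170.0/30",
}


def uplink_pair(cidr: str) -> tuple[str, str]:
    """Return (Arista SVI, T0 uplink) for a point-to-point /30."""
    net = ipaddress.ip_network(cidr)          # strict: rejects host bits in the CIDR
    hosts = [str(h) for h in net.hosts()]     # a /30 has exactly two usable hosts
    if len(hosts) != 2:
        raise ValueError(f"{cidr} is not a /30 point-to-point subnet")
    return hosts[0], hosts[1]


for vlan, cidr in T0_UPLINKS.items():
    arista, t0 = uplink_pair(cidr)
    print(f"VLAN {vlan}: Arista {arista} <-> T0 {t0}")
```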

MTU Strategy

MTU misconfiguration is one of the most common causes of silent VCF failures. vSAN, vMotion, and NSX Geneve tunnels all require end-to-end jumbo frame support. A mismatch anywhere in the path causes fragmentation or silent drops that manifest as performance degradation or session instability rather than obvious errors.

Traffic Type	Required MTU	Applies To
ESXi / VM Mgmt / OOB	1500	VLANs 100, 200, 1110, 1111, 1210, 1211
vMotion	9000 (IP MTU)	VLAN 1112 / 1212, SVI MTU 9000
vSAN (OSA)	9000 (IP MTU)	VLAN 1113 / 1213, health check will warn on mismatch
NSX Host TEP (Geneve)	9000 TEP vmkernel / 9216 physical	VLAN 1114 / 1214; Geneve adds ~50 bytes inside the TEP MTU, so guest frames must be correspondingly smaller
NFS Storage	9000	VLAN 1115 / 1215 — jumbo recommended even on 1GbE
NSX T0 BGP Uplinks	9216	VLANs 60, 70, 160, 170 — SVIs and access ports
Blade Port-Channels (Po1–Po8)	9216	Physical MTU headroom for Geneve overhead
R630 Trunk Ports	9000	1GbE links — practical ceiling for NFS and mgmt

Key Rule: Physical port MTU ≥ SVI MTU ≥ VMkernel port MTU. Blade port-channels = 9216. Jumbo SVIs = 9000 or 9216. VDS VMkernel ports for vMotion/vSAN/TEP = 9000. Never set VMkernel MTU higher than its SVI MTU.
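The key rule is an ordering constraint, so it is easy to check mechanically. The sketch below encodes the planned values as hypothetical records and flags any path that breaks physical ≥ SVI ≥ VMkernel. Collecting the real values from the switch and hosts is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MtuPath:
    name: str
    physical: int   # switch port or port-channel MTU
    svi: int        # Arista SVI MTU for the VLAN
    vmkernel: int   # VDS VMkernel port MTU on the host


def violations(path: MtuPath) -> list[str]:
    """Apply the rule: physical >= SVI >= VMkernel."""
    problems = []
    if path.physical < path.svi:
        problems.append(f"{path.name}: physical {path.physical} < SVI {path.svi}")
    if path.svi < path.vmkernel:
        problems.append(f"{path.name}: SVI {path.svi} < VMkernel {path.vmkernel}")
    return problems


# Values from the MTU strategy table (VMkernel MTUs as planned, not read from hosts).
PLANNED = [
    MtuPath("vMotion S1", physical=9216, svi=9000, vmkernel=9000),
    MtuPath("vSAN S1", physical=9216, svi=9000, vmkernel=9000),
    MtuPath("NSX TEP S1", physical=9216, svi=9000, vmkernel=9000),
    MtuPath("ESXi Mgmt S1", physical=9216, svi=1500, vmkernel=1500),
]

for p in PLANNED:
    print(p.name, violations(p) or "OK")

# A deliberate mistake: jumbo VMkernel port behind a 1500-byte SVI.
print(violations(MtuPath("vSAN typo", physical=9216, svi=1500, vmkernel=9000)))
```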

Switch Port Allocation

Ports	Device	LAG / Mode	MTU	Notes
Et1–Et2	CHx1-NodeA (Site 1)	LACP → Po1	9216	Trunk: 100,200,1110-1115,1210-1215 | Native: 1111
Et3–Et4	CHx1-NodeB (Site 1)	LACP → Po2	9216	Trunk: 100,200,1110-1115,1210-1215 | Native: 1111
Et5–Et6	CHx1-NodeC (Site 1)	LACP → Po3	9216	Trunk: 100,200,1110-1115,1210-1215 | Native: 1111
Et7–Et8	CHx1-NodeD (Site 1)	LACP → Po4	9216	Trunk: 100,200,1110-1115,1210-1215 | Native: 1111
Et9–Et12	CHx1 IPMI (Nodes A–D)	Access	1500	Access VLAN 100 — OOB only
Et13–Et14	CHx2-NodeE (Site 2)	LACP → Po5	9216	Trunk: 100,200,1110-1115,1210-1215 | Native: 1211
Et15–Et16	CHx2-NodeF (Site 2)	LACP → Po6	9216	Trunk: 100,200,1110-1115,1210-1215 | Native: 1211
Et17–Et18	CHx2-NodeG (Site 2)	LACP → Po7	9216	Trunk: 100,200,1110-1115,1210-1215 | Native: 1211
Et19–Et20	CHx2-NodeH (Site 2)	LACP → Po8	9216	Trunk: 100,200,1110-1115,1210-1215 | Native: 1211
Et21–Et24	CHx2 IPMI (Nodes E–H)	Access	1500	Access VLAN 200 — OOB only
Et25–Et28	R630-1 NIC1–4	Trunk (no LAG)	9000	Trunk: 100,200,1110-1115,1210-1215 | Native: 1111 | 1GbE links
Et29	R630-1 iDRAC	Access	1500	Access VLAN 100
Et30–Et33	R630-2 NIC1–4	Trunk (no LAG)	9000	Trunk: all VLANs | Native: 1111 | 1GbE links
Et34	R630-2 iDRAC	Access	1500	Access VLAN 100
Et35	NUC NIC1	Trunk	1500	Admin trunk: all VLANs | Native: 1111
Et36	NUC NIC2	Access	1500	Access VLAN 1110 — VM-Mgmt
Et37	NSX T0 S1 Uplink1	Access	9216	Access VLAN 60 | BGP peer 10.0.60.2
Et38	NSX T0 S1 Uplink2	Access	9216	Access VLAN 70 | BGP peer 10.0.70.2
Et39	NSX T0 S2 Uplink1	Access	9216	Access VLAN 160 | BGP peer 10.0.160.2
Et40	NSX T0 S2 Uplink2	Access	9216	Access VLAN 170 | BGP peer 10.0.170.2
Et41–Et46	SPARE	Shutdown	—	Available for future expansion
Et47	Internet Uplink	Routed L3	1500	192.168.31.2/24 — default route via 192.168.31.1
Et48	Cisco 3750E Trunk	Trunk	9216	All VLANs both sites | Native: 1111
Et49–Et52	QSFP Reserved	Shutdown	—	40G uplinks — reserved
Management1	OOB Management	DHCP	1500	192.168.31.x/24 from home AP — out-of-band only

EOS Configuration

1 Baseline — Hostname, Routing & Credentials

Global — Hostname / Routing / Credentials

hostname VCF-LEAF-SW01
!
spanning-tree mode mstp
!
no aaa root
username admin privilege 15 role network-admin secret 0 <REPLACE_PASSWORD>
!
ip routing
!
! Static default route toward the home router; BGP re-advertises it to the NSX T0s for workload VM internet access
ip route 0.0.0.0/0 192.168.31.1

2 VLAN Database

VLAN Database — Site 1 & Site 2

! ── Site 1 VLANs ──────────────────────────────────────────────
vlan 60
   name NSX-T0-Uplink1-S1
vlan 70
   name NSX-T0-Uplink2-S1
vlan 100
   name OOB-IPMI-S1
vlan 1110
   name VM-Mgmt-S1
vlan 1111
   name ESX-Mgmt-S1
vlan 1112
   name vMotion-S1
vlan 1113
   name vSAN-S1
vlan 1114
   name NSX-TEP-S1
vlan 1115
   name NFS-S1
! ── Site 2 VLANs ──────────────────────────────────────────────
vlan 160
   name NSX-T0-Uplink1-S2
vlan 170
   name NSX-T0-Uplink2-S2
vlan 200
   name OOB-IPMI-S2
vlan 1210
   name VM-Mgmt-S2
vlan 1211
   name ESX-Mgmt-S2
vlan 1212
   name vMotion-S2
vlan 1213
   name vSAN-S2
vlan 1214
   name NSX-TEP-S2
vlan 1215
   name NFS-S2
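Every Site 2 VLAN in the database is the matching Site 1 VLAN plus 100 (60→160, 100→200, 1110→1210, and so on), with the -S1 suffix swapped for -S2. The small Python generator below is an illustration, not part of the original workflow; it makes that relationship explicit and reproduces the stanzas above in the same order.

```python
# Site 1 VLAN IDs and base names, as in the VLAN database above.
SITE1_BASE = {
    60: "NSX-T0-Uplink1",
    70: "NSX-T0-Uplink2",
    100: "OOB-IPMI",
    1110: "VM-Mgmt",
    1111: "ESX-Mgmt",
    1112: "vMotion",
    1113: "vSAN",
    1114: "NSX-TEP",
    1115: "NFS",
}
SITE_OFFSET = 100  # in this design, every Site 2 VLAN ID is its Site 1 ID + 100


def site_vlans(site: int) -> dict[int, str]:
    """Map VLAN ID to name for the given site (1 or 2)."""
    offset = SITE_OFFSET * (site - 1)
    return {vid + offset: f"{name}-S{site}" for vid, name in SITE1_BASE.items()}


def render_vlan_db(site: int) -> str:
    """Render EOS 'vlan' stanzas for one site."""
    lines = [f"! -- Site {site} VLANs --"]
    for vid, name in site_vlans(site).items():
        lines += [f"vlan {vid}", f"   name {name}"]
    return "\n".join(lines)


print(render_vlan_db(1))
print(render_vlan_db(2))
```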

3 LACP Port-Channels

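Each Supermicro blade node has two 10GbE NICs bonded as an LACP port-channel (active/active), providing link redundancy and 20 Gbps of aggregate bandwidth. Any single flow is still hashed onto one member link, so an individual flow tops out at 10 Gbps. All port-channels trunk both site VLAN ranges from day one.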

Port-Channel Configuration (LACP) — Chassis 1 & Chassis 2

! ── Chassis 1 — Nodes A/B/C/D (Site 1 native VLAN 1111) ──────
interface Port-Channel1
   description CHx1-NodeA-LACP
   switchport mode trunk
   switchport trunk allowed vlan 100,200,1110-1115,1210-1215
   switchport trunk native vlan 1111
   mtu 9216
   no shutdown
!
interface Port-Channel2
   description CHx1-NodeB-LACP
   switchport mode trunk
   switchport trunk allowed vlan 100,200,1110-1115,1210-1215
   switchport trunk native vlan 1111
   mtu 9216
   no shutdown
!
interface Port-Channel3
   description CHx1-NodeC-LACP
   switchport mode trunk
   switchport trunk allowed vlan 100,200,1110-1115,1210-1215
   switchport trunk native vlan 1111
   mtu 9216
   no shutdown
!
interface Port-Channel4
   description CHx1-NodeD-LACP
   switchport mode trunk
   switchport trunk allowed vlan 100,200,1110-1115,1210-1215
   switchport trunk native vlan 1111
   mtu 9216
   no shutdown
!
! ── Chassis 2 — Nodes E/F/G/H (Site 2 native VLAN 1211) ──────
interface Port-Channel5
   description CHx2-NodeE-LACP
   switchport mode trunk
   switchport trunk allowed vlan 100,200,1110-1115,1210-1215
   switchport trunk native vlan 1211
   mtu 9216
   no shutdown
!
! Po6/Po7/Po8 follow identical pattern with native vlan 1211

LACP Member Ports — Physical blade NICs (Node A example)

interface Ethernet1
   description CHx1-NodeA-NIC1-LAG1
   switchport mode trunk
   switchport trunk allowed vlan 100,200,1110-1115,1210-1215
   switchport trunk native vlan 1111
   mtu 9216
   channel-group 1 mode active
   spanning-tree portfast
   no shutdown
!
interface Ethernet2
   description CHx1-NodeA-NIC2-LAG1
   switchport mode trunk
   switchport trunk allowed vlan 100,200,1110-1115,1210-1215
   switchport trunk native vlan 1111
   mtu 9216
   channel-group 1 mode active
   spanning-tree portfast
   no shutdown
!
! Pattern repeats:
! Et3/Et4  -> channel-group 2  (Node B)
! Et5/Et6  -> channel-group 3  (Node C)
! Et7/Et8  -> channel-group 4  (Node D)
! Et13/14  -> channel-group 5  native 1211 (Node E)
! Et15/16  -> channel-group 6  native 1211 (Node F)
! Et17/18  -> channel-group 7  native 1211 (Node G)
! Et19/20  -> channel-group 8  native 1211 (Node H)
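The "pattern repeats" comments describe a regular structure: node N uses Port-Channel N, two consecutive Ethernet members, and its site's native VLAN. The Python sketch below (an illustration, not the author's tooling) renders all eight port-channels and their members from one table, so Po5–Po8 come out with native VLAN 1211 without hand-copying.

```python
ALLOWED = "100,200,1110-1115,1210-1215"

# (port-channel, member Ethernet ports, node label, native VLAN) from the port table
NODES = [
    (1, (1, 2), "CHx1-NodeA", 1111),
    (2, (3, 4), "CHx1-NodeB", 1111),
    (3, (5, 6), "CHx1-NodeC", 1111),
    (4, (7, 8), "CHx1-NodeD", 1111),
    (5, (13, 14), "CHx2-NodeE", 1211),
    (6, (15, 16), "CHx2-NodeF", 1211),
    (7, (17, 18), "CHx2-NodeG", 1211),
    (8, (19, 20), "CHx2-NodeH", 1211),
]


def trunk_lines(native: int) -> list[str]:
    """Trunk settings shared by each port-channel and its member ports."""
    return [
        "   switchport mode trunk",
        f"   switchport trunk allowed vlan {ALLOWED}",
        f"   switchport trunk native vlan {native}",
        "   mtu 9216",
    ]


def render_node(po: int, members: tuple[int, int], node: str, native: int) -> list[str]:
    lines = [f"interface Port-Channel{po}", f"   description {node}-LACP"]
    lines += trunk_lines(native) + ["   no shutdown", "!"]
    for nic, eth in enumerate(members, start=1):
        lines += [f"interface Ethernet{eth}", f"   description {node}-NIC{nic}-LAG{po}"]
        lines += trunk_lines(native)
        lines += [f"   channel-group {po} mode active", "   spanning-tree portfast",
                  "   no shutdown", "!"]
    return lines


config = "\n".join(line for node in NODES for line in render_node(*node))
print(config)
```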

4 OOB IPMI / iDRAC Ports

IPMI / iDRAC Access Ports

! Chassis 1 IPMI — access VLAN 100 (Et9–Et12)
interface Ethernet9
   description CHx1-NodeA-IPMI
   switchport mode access
   switchport access vlan 100
   mtu 1500
   spanning-tree portfast
   no shutdown
! Et10/11/12 — NodeB/C/D IPMI — identical config, VLAN 100
!
! Chassis 2 IPMI — access VLAN 200 (Et21–Et24)
interface Ethernet21
   description CHx2-NodeE-IPMI
   switchport mode access
   switchport access vlan 200
   mtu 1500
   spanning-tree portfast
   no shutdown
! Et22/23/24 — NodeF/G/H IPMI — identical config, VLAN 200
!
! R630-1 iDRAC — access VLAN 100 (Et29)
interface Ethernet29
   description R630-1-iDRAC-OOB
   switchport mode access
   switchport access vlan 100
   mtu 1500
   spanning-tree portfast
   no shutdown

5 Dell R630 — Management / NFS Host

The R630 has 4× onboard 1GbE NICs. All four are trunked with both site VLAN ranges (100, 200, 1110–1115, 1210–1215; the T0 uplink VLANs are not carried). The Ubuntu VM has a static IP of 10.11.15.10 on VLAN 1115 (NFS) and is also connected to VLAN 1111 (ESXi Management).

R630-1 Management / NFS Host — Et25–Et28

interface Ethernet25
   description R630-1-NIC1
   switchport mode trunk
   switchport trunk allowed vlan 100,200,1110-1115,1210-1215
   switchport trunk native vlan 1111
   mtu 9000
   spanning-tree portfast
   no shutdown
!
! Ethernet26/27/28 — R630-1-NIC2/3/4 — identical config

6 NSX T0 BGP Uplink Ports

Four dedicated access ports connect to NSX Edge node uplink vNICs. Each port is an access port on its own /30 VLAN. The Arista SVI acts as the BGP peer endpoint.

NSX T0 BGP Uplink Access Ports — Et37–Et40

! Site 1
interface Ethernet37
   description NSX-T0-S1-Uplink1-VLAN60
   switchport mode access
   switchport access vlan 60
   mtu 9216
   spanning-tree portfast
   no shutdown
!
interface Ethernet38
   description NSX-T0-S1-Uplink2-VLAN70
   switchport mode access
   switchport access vlan 70
   mtu 9216
   spanning-tree portfast
   no shutdown
!
! Site 2
interface Ethernet39
   description NSX-T0-S2-Uplink1-VLAN160
   switchport mode access
   switchport access vlan 160
   mtu 9216
   spanning-tree portfast
   no shutdown
!
interface Ethernet40
   description NSX-T0-S2-Uplink2-VLAN170
   switchport mode access
   switchport access vlan 170
   mtu 9216
   spanning-tree portfast
   no shutdown

7 Internet Uplink & Cisco Core Trunk

Internet Uplink / Cisco Core SW / OOB Management

! Et47 — Routed L3 port to home router/firewall
! Workload VMs reach internet via NSX T0 -> Arista Et47 -> 192.168.31.1
interface Ethernet47
   description Internet-Uplink-HomeRouter
   no switchport
   ip address 192.168.31.2/24
   mtu 1500
   no shutdown
!
! Et48 — Trunk uplink to Cisco Catalyst 3750E (VCF-CORE-SW01)
interface Ethernet48
   description Cisco-VCF-CORE-SW01-Trunk
   switchport mode trunk
   switchport trunk allowed vlan 60,70,100,160,170,200,1110-1115,1210-1215
   switchport trunk native vlan 1111
   mtu 9216
   no shutdown
!
! Management1 — OOB port to home AP (separate from data plane)
interface Management1
   description OOB-Management-HomeAP
   ip address dhcp
   no shutdown

8 SVIs — Layer 3 Routing Interfaces

Layer 3 SVIs — Site 1 (Site 2 follows same pattern)

! ── Site 1 SVIs ───────────────────────────────────────────────
interface Vlan100
   description OOB-IPMI-S1
   ip address 10.10.0.1/24
   mtu 1500
   no shutdown
!
interface Vlan1110
   description VM-Mgmt-S1
   ip address 10.11.10.1/24
   mtu 1500
   no shutdown
!
interface Vlan1111
   description ESX-Mgmt-S1
   ip address 10.11.11.1/24
   mtu 1500
   no shutdown
!
interface Vlan1112
   description vMotion-S1
   ip address 10.11.12.1/24
   mtu 9000
   no shutdown
!
interface Vlan1113
   description vSAN-S1
   ip address 10.11.13.1/24
   mtu 9000
   no shutdown
!
interface Vlan1114
   description NSX-TEP-S1
   ip address 10.11.14.1/24
   mtu 9000
   no shutdown
!
interface Vlan1115
   description NFS-S1
   ip address 10.11.15.1/24
   mtu 9000
   no shutdown
!
interface Vlan60
   description NSX-T0-Uplink1-S1
   ip address 10.0.60.1/30
   mtu 9216
   no shutdown
!
interface Vlan70
   description NSX-T0-Uplink2-S1
   ip address 10.0.70.1/30
   mtu 9216
   no shutdown
!
! ── Site 2 SVIs — same structure ──────────────────────────────
! Vlan200/1210/1211           -> mtu 1500,  10.20.x / 10.12.1x.x
! Vlan1212/1213/1214/1215     -> mtu 9000,  10.12.1x.x
! Vlan160  -> ip 10.0.160.1/30  mtu 9216
! Vlan170  -> ip 10.0.170.1/30  mtu 9216
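The Site 2 shorthand above expands into nine SVIs. This Python sketch renders them from the Site 2 VLAN reference table, taking the gateway as the first usable address of each subnet, which matches the .1 convention used throughout. The output is meant for review against the switch, not as a verified EOS import.

```python
import ipaddress

# (vlan, description, subnet, mtu) from the Site 2 VLAN reference table
SITE2 = [
    (200, "OOB-IPMI-S2", "10.20.0.0/24", 1500),
    (1210, "VM-Mgmt-S2", "10.12.10.0/24", 1500),
    (1211, "ESX-Mgmt-S2", "10.12.11.0/24", 1500),
    (1212, "vMotion-S2", "10.12.12.0/24", 9000),
    (1213, "vSAN-S2", "10.12.13.0/24", 9000),
    (1214, "NSX-TEP-S2", "10.12.14.0/24", 9000),
    (1215, "NFS-S2", "10.12.15.0/24", 9000),
    (160, "NSX-T0-Uplink1-S2", "10.0.160.0/30", 9216),
    (170, "NSX-T0-Uplink2-S2", "10.0.170.0/30", 9216),
]


def render_svi(vlan: int, desc: str, subnet: str, mtu: int) -> list[str]:
    net = ipaddress.ip_network(subnet)
    gateway = next(net.hosts())  # first usable address, i.e. the .1 gateway
    return [
        f"interface Vlan{vlan}",
        f"   description {desc}",
        f"   ip address {gateway}/{net.prefixlen}",
        f"   mtu {mtu}",
        "   no shutdown",
        "!",
    ]


site2_svis = "\n".join(line for row in SITE2 for line in render_svi(*row))
print(site2_svis)
```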

9 eBGP Configuration — NSX T0 Peering

The Arista runs eBGP ASN 65000. NSX T0 Site 1 uses ASN 65001, Site 2 uses ASN 65002. The Arista advertises all infrastructure subnets plus the default route so workload VMs can reach the internet through the NSX T0 → Arista → Et47 path.

eBGP Configuration — ASN 65000

router bgp 65000
   router-id 10.11.11.254
   no bgp default ipv4-unicast
   maximum-paths 4 ecmp 4
   !
   ! ── Site 1 T0 peers (ASN 65001) ──────────────────────────
   neighbor 10.0.60.2 remote-as 65001
   neighbor 10.0.60.2 description NSX-T0-S1-Uplink1
   neighbor 10.0.60.2 send-community
   neighbor 10.0.60.2 maximum-routes 500
   neighbor 10.0.60.2 bfd
   !
   neighbor 10.0.70.2 remote-as 65001
   neighbor 10.0.70.2 description NSX-T0-S1-Uplink2
   neighbor 10.0.70.2 send-community
   neighbor 10.0.70.2 maximum-routes 500
   neighbor 10.0.70.2 bfd
   !
   ! ── Site 2 T0 peers (ASN 65002) ──────────────────────────
   neighbor 10.0.160.2 remote-as 65002
   neighbor 10.0.160.2 description NSX-T0-S2-Uplink1
   neighbor 10.0.160.2 send-community
   neighbor 10.0.160.2 maximum-routes 500
   neighbor 10.0.160.2 bfd
   !
   neighbor 10.0.170.2 remote-as 65002
   neighbor 10.0.170.2 description NSX-T0-S2-Uplink2
   neighbor 10.0.170.2 send-community
   neighbor 10.0.170.2 maximum-routes 500
   neighbor 10.0.170.2 bfd
   !
   address-family ipv4
      neighbor 10.0.60.2  activate
      neighbor 10.0.70.2  activate
      neighbor 10.0.160.2 activate
      neighbor 10.0.170.2 activate
      ! OOB
      network 10.10.0.0/24
      network 10.20.0.0/24
      ! Site 1 infrastructure
      network 10.11.10.0/24
      network 10.11.11.0/24
      network 10.11.12.0/24
      network 10.11.13.0/24
      network 10.11.14.0/24
      network 10.11.15.0/24
      ! Site 2 infrastructure
      network 10.12.10.0/24
      network 10.12.11.0/24
      network 10.12.12.0/24
      network 10.12.13.0/24
      network 10.12.14.0/24
      network 10.12.15.0/24
      ! Default route — workload VM internet access
      network 0.0.0.0/0
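Counting the network statements above gives 14 infrastructure prefixes (2 OOB, 6 per site) plus the default route. The sketch below rebuilds that list in Python and checks that no two infrastructure prefixes overlap; it is an illustrative cross-check, not something EOS requires.

```python
import ipaddress

OOB = ["10.10.0.0/24", "10.20.0.0/24"]
SITE1 = [f"10.11.{octet}.0/24" for octet in range(10, 16)]   # VLANs 1110-1115
SITE2 = [f"10.12.{octet}.0/24" for octet in range(10, 16)]   # VLANs 1210-1215


def network_statements(prefixes: list[str]) -> list[str]:
    """Validate infrastructure prefixes, then emit BGP network lines plus the default."""
    nets = [ipaddress.ip_network(p) for p in prefixes]   # raises if host bits are set
    for i, a in enumerate(nets):
        for b in nets[i + 1:]:
            if a.overlaps(b):
                raise ValueError(f"{a} overlaps {b}")
    # The default route overlaps everything by definition, so it is appended last.
    return [f"      network {n}" for n in nets] + ["      network 0.0.0.0/0"]


infra = OOB + SITE1 + SITE2
statements = network_statements(infra)
print(f"{len(infra)} infrastructure prefixes + default route")
print("\n".join(statements))
```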

NSX T0 Side Required: Configure matching BGP settings on NSX T0 — Local AS 65001 (Site 1) or 65002 (Site 2), remote-as 65000, neighbour IPs 10.0.60.1 / 10.0.70.1 (Site 1) and 10.0.160.1 / 10.0.170.1 (Site 2). BFD must be enabled on both sides if used.
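The T0-side settings are the mirror image of the Arista neighbor statements. The sketch below derives them from the Arista configuration so the two sides cannot disagree. The dictionary keys are hypothetical labels for review, not NSX API field names.

```python
import ipaddress
from dataclasses import dataclass

ARISTA_ASN = 65000


@dataclass(frozen=True)
class AristaNeighbor:
    peer_ip: str     # T0 uplink address, as configured on the Arista
    remote_as: int   # T0 ASN (65001 Site 1, 65002 Site 2)
    bfd: bool


NEIGHBORS = [
    AristaNeighbor("10.0.60.2", 65001, True),
    AristaNeighbor("10.0.70.2", 65001, True),
    AristaNeighbor("10.0.160.2", 65002, True),
    AristaNeighbor("10.0.170.2", 65002, True),
]


def t0_side(n: AristaNeighbor) -> dict:
    """Mirror one Arista neighbor statement into the values the T0 must use."""
    net = ipaddress.ip_network(f"{n.peer_ip}/30", strict=False)
    arista_ip = next(str(h) for h in net.hosts() if str(h) != n.peer_ip)
    return {
        "t0_uplink_ip": n.peer_ip,
        "local_as": n.remote_as,     # the T0's own ASN
        "neighbor_ip": arista_ip,    # the other host in the /30, i.e. the Arista SVI
        "remote_as": ARISTA_ASN,
        "bfd": n.bfd,                # BFD must match on both sides
    }


for n in NEIGHBORS:
    print(t0_side(n))
```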

10 NTP, LLDP, SSH & eAPI

NTP / LLDP / SSH / eAPI

lldp run
!
ntp server 192.168.31.1 prefer
ntp server 0.pool.ntp.org
ntp server 1.pool.ntp.org
!
logging on
logging buffered 65535 informational
! logging host <SYSLOG_SERVER_IP>
!
management ssh
   idle-timeout 60
   authentication mode password
   no shutdown
!
management api http-commands
   protocol https
   no protocol http
   no shutdown
   vrf default
      no shutdown

Validation — End-to-End Checklist

Run this validation sequence in order. Each phase builds on the previous. Do not proceed to VCF deployment until all checks pass.
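Because step 10 enables eAPI over HTTPS, the show commands in these phases can also be collected by script. The sketch below uses only the Python standard library to call the eAPI JSON-RPC endpoint (/command-api, method runCmds). The host name, credentials, and skipping certificate verification for a lab switch are assumptions; commands without a JSON output model need fmt="text".

```python
import base64
import json
import ssl
import urllib.request


def build_payload(cmds: list[str], fmt: str = "json", req_id: str = "vcf-prereq") -> dict:
    """JSON-RPC 2.0 body for the eAPI runCmds method."""
    return {
        "jsonrpc": "2.0",
        "method": "runCmds",
        "params": {"version": 1, "cmds": cmds, "format": fmt},
        "id": req_id,
    }


def run_cmds(host: str, user: str, password: str, cmds: list[str],
             fmt: str = "json", verify_tls: bool = True) -> list:
    """POST the commands to https://<host>/command-api and return one result per command."""
    body = json.dumps(build_payload(cmds, fmt)).encode()
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        f"https://{host}/command-api",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
    ctx = ssl.create_default_context()
    if not verify_tls:  # lab switches often present a self-signed certificate
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, context=ctx, timeout=10) as resp:
        reply = json.load(resp)
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply["result"]


# Example (not executed here; host and password are placeholders):
# results = run_cmds("VCF-LEAF-SW01", "admin", "<REPLACE_PASSWORD>",
#                    ["show interfaces status", "show lldp neighbors"], verify_tls=False)
```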

Phase 1 — Physical Layer & Port State

Check	EOS Command	Expected Result
All active ports are up/up	show interfaces status	Connected ports show connected, correct speed
No err-disabled ports	show interfaces status err-disabled	No output (empty)
LACP port-channels formed	show port-channel summary	Po1–Po8 show U (in use), member ports show P (bundled)
LLDP neighbours visible	show lldp neighbors	R630, NUC, Cisco 3750E, Supermicro nodes visible
Correct LLDP port mapping	show lldp neighbors detail	Verify each device on expected interface

Phase 2 — VLAN & Trunk Verification

Check	EOS Command	Expected Result
All 18 VLANs in database	show vlan	VLANs 60,70,100,160,170,200,1110–1115,1210–1215 active
VLANs active on correct ports	show vlan id 1111	Po1–Po8, Et25–Et28, Et30–Et33, Et35, Et48 listed
Blade trunks carry both site VLANs	show interfaces trunk	Po1–Po8 allowed VLANs include both 111x and 121x ranges
Native VLANs correct	show interfaces trunk	Po1–Po4 native=1111, Po5–Po8 native=1211
IPMI ports in correct VLAN	show interfaces Ethernet9 switchport	Access VLAN 100
T0 uplink ports in correct VLAN	show interfaces Ethernet37 switchport	Access VLAN 60

Phase 3 — Layer 3 SVI & IP Routing

Check	EOS Command	Expected Result
All SVIs are up/up	show ip interface brief	All Vlan interfaces show protocol up
SVI IP addresses correct	show ip interface brief	Verify .1 address on each VLAN subnet
SVI MTU matches VLAN policy	show interfaces Vlan1112	MTU 9000 for jumbo VLANs, 1500 for mgmt VLANs
Routing table populated	show ip route	Connected routes for all 18 subnets present
Default route installed	show ip route 0.0.0.0/0	Via 192.168.31.1, Ethernet47
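Internet reachability	ping vrf default 8.8.8.8	Success, confirming the Et47 uplink and home router NAT for 192.168.31.2 (the ping is sourced from Et47; the 10.x subnets additionally need a return route or NAT on the home router)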

Phase 4 — MTU End-to-End Validation

Check	EOS Command / Test	Expected Result
SVI MTU — jumbo VLANs	show interfaces Vlan1113	MTU 9000
SVI MTU — T0 uplinks	show interfaces Vlan60	MTU 9216
Port-channel MTU	show interfaces Port-Channel1	MTU 9216
Physical member port MTU	show interfaces Ethernet1	MTU 9216
Jumbo ping — vSAN VLAN	ping vrf default 10.11.13.1 size 8972 df-bit	Success — 5/5 packets
Jumbo ping — TEP VLAN	ping vrf default 10.11.14.1 size 8972 df-bit	Success — 5/5 packets

Jumbo Frame Ping Tests — Run from Arista

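! 8972-byte payload + 8-byte ICMP header + 20-byte IPv4 header = 9000-byte IP packet
! These targets are the switch's own SVIs, so the packets never leave the Arista;
! prove the end-to-end path MTU with vmkping from each ESXi host (Phase 6)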
ping vrf default 10.11.12.1 size 8972 df-bit repeat 5   ! vMotion
ping vrf default 10.11.13.1 size 8972 df-bit repeat 5   ! vSAN
ping vrf default 10.11.14.1 size 8972 df-bit repeat 5   ! NSX TEP
ping vrf default 10.11.15.1 size 8972 df-bit repeat 5   ! NFS
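The 28-byte arithmetic behind `size 8972` generalises to any MTU. A Python sketch, assuming (as the comment in the block above implies) that EOS `size` is the ICMP payload length and that there are no IPv4 options; the same number is what you pass to `vmkping -s`:

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_df_payload(ip_mtu: int) -> int:
    """Largest ICMP payload that fits one unfragmented IPv4 packet of ip_mtu bytes."""
    payload = ip_mtu - IPV4_HEADER - ICMP_HEADER
    if payload <= 0:
        raise ValueError(f"MTU {ip_mtu} too small for an ICMP echo")
    return payload


def eos_jumbo_ping(target: str, ip_mtu: int = 9000, count: int = 5) -> str:
    """Build the EOS ping line used above for a given target and IP MTU."""
    return (f"ping vrf default {target} size {max_df_payload(ip_mtu)} "
            f"df-bit repeat {count}")


if __name__ == "__main__":
    print(max_df_payload(1500))  # 1472
    print(eos_jumbo_ping("10.11.13.1"))
```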

Phase 5 — BGP Uplink Verification

Check	EOS Command	Expected Result
BGP process running	show bgp summary	BGP process up — peers may show Active/Idle pre-NSX
T0 uplink 1 SVI up (Site 1)	show interfaces Vlan60	up/up, IP 10.0.60.1/30, MTU 9216
T0 uplink 2 SVI up (Site 1)	show interfaces Vlan70	up/up, IP 10.0.70.1/30, MTU 9216
Physical uplink ports up	show interfaces Ethernet37	connected, 10G full, MTU 9216
[Post-NSX] BGP Established	show bgp summary	Peers 10.0.60.2 and 10.0.70.2 state = Established, prefixes > 0
[Post-NSX] Routes received from T0	show bgp neighbors 10.0.60.2 received-routes	NSX overlay segment routes received
[Post-NSX] Routes advertised to T0	show bgp neighbors 10.0.60.2 advertised-routes	All 16 infra subnets + 0.0.0.0/0 shown
[Post-NSX] BFD sessions up	show bfd peers	State = Up for all T0 peers
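The Phase 5 rows mean different things before and after NSX: pre-NSX, Idle or Active is fine as long as the neighbour is configured; post-NSX, only Established passes. A Python sketch of that gate, assuming you have collected a peer → state dictionary from `show bgp summary` (the collection step is not shown):

```python
from typing import Dict, List

# Accept both spellings; summary tables may abbreviate Established.
ESTABLISHED = {"established", "estab"}


def bgp_problems(expected_peers: List[str], observed: Dict[str, str], post_nsx: bool) -> List[str]:
    """Check T0 peers against the deployment phase you are in.

    Before NSX Edges exist, any state (Idle/Active/Connect) is acceptable as long
    as the neighbour is configured; after NSX deployment every peer must be Established.
    """
    problems: List[str] = []
    for peer in expected_peers:
        state = observed.get(peer)
        if state is None:
            problems.append(f"{peer}: not configured")
        elif post_nsx and state.lower() not in ESTABLISHED:
            problems.append(f"{peer}: {state} (expected Established)")
    return problems


if __name__ == "__main__":
    site1 = ["10.0.60.2", "10.0.70.2"]  # T0 Site 1 uplink peers (Phase 8)
    print(bgp_problems(site1, {"10.0.60.2": "Active", "10.0.70.2": "Idle"}, post_nsx=False))
```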

Phase 6 — NFS Path Validation

NFS Path Validation Commands

! From Arista — confirm NFS SVI is up and R630 VM is reachable
show interfaces Vlan1115
ping vrf default 10.11.15.10

# From ESXi host (once installed): confirm the NFS VMkernel path
esxcli network ip interface list
# -d = don't fragment; -I must be the VMkernel port that carries NFS traffic.
# vmk0 only passes an 8972-byte test if it and its switch are set to MTU 9000.
vmkping -I vmk0 -d -s 8972 10.11.15.10

# From Ubuntu NFS server (R630 VM): verify the export is active
showmount -e localhost
cat /etc/exports
systemctl status nfs-kernel-server

NFS Firewall Note: The Ubuntu NFS VM uses UFW. Allow traffic from 10.11.11.0/24 (ESXi Management) and 10.11.15.0/24 (NFS VLAN) on ports 111 (rpcbind) and 2049 (NFS). That is four rules: ufw allow from 10.11.11.0/24 to any port 2049, ufw allow from 10.11.11.0/24 to any port 111, and the same pair for 10.11.15.0/24.
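Because every client subnet needs every port, the rule set is a cross product. A Python sketch that prints the UFW commands for the subnets and ports named in the note; run the output with sudo on the NFS VM. Leaving out `proto` makes UFW allow both TCP and UDP, which rpcbind uses:

```python
import ipaddress
from itertools import product
from typing import Iterable, List

NFS_CLIENT_SUBNETS = ["10.11.11.0/24", "10.11.15.0/24"]  # ESXi Management, NFS VLAN
NFS_PORTS = [111, 2049]                                  # rpcbind, NFS


def ufw_nfs_rules(subnets: Iterable[str], ports: Iterable[int]) -> List[str]:
    """One `ufw allow` line per (subnet, port) pair; rejects malformed subnets."""
    # ip_network is strict by default: host bits must be zero (10.11.11.5/24 raises).
    nets = [str(ipaddress.ip_network(s)) for s in subnets]
    return [f"ufw allow from {n} to any port {p}" for n, p in product(nets, list(ports))]


if __name__ == "__main__":
    for rule in ufw_nfs_rules(NFS_CLIENT_SUBNETS, NFS_PORTS):
        print(rule)
```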

Phase 7 — OOB IPMI Reachability

Check	EOS Command	Expected Result
IPMI VLAN 100 SVI up	show interfaces Vlan100	up/up, 10.10.0.1/24
IPMI VLAN 200 SVI up	show interfaces Vlan200	up/up, 10.20.0.1/24
Ping Chassis 1 Node A IPMI	ping vrf default 10.10.0.x	Success (DHCP addr from VLAN 100 pool)
Ping R630 iDRAC	ping vrf default 10.10.0.y	Success
IPMI ports in access VLAN	show interfaces Ethernet9 switchport	Access mode, VLAN 100

Phase 8 — Full Connectivity Matrix

Full Connectivity Matrix — Run from Arista before VCF deployment

! ── Layer 3 SVI self-test ──────────────────────────────────────
ping vrf default 10.11.11.1 source Vlan1111    ! ESXi Mgmt SVI
ping vrf default 10.11.12.1 source Vlan1112    ! vMotion SVI
ping vrf default 10.11.13.1 source Vlan1113    ! vSAN SVI
ping vrf default 10.11.14.1 source Vlan1114    ! NSX TEP SVI
ping vrf default 10.11.15.1 source Vlan1115    ! NFS SVI
ping vrf default 10.0.60.1  source Vlan60      ! T0 Uplink1 SVI
ping vrf default 10.0.70.1  source Vlan70      ! T0 Uplink2 SVI
!
! ── Device reachability ─────────────────────────────────────────
ping vrf default 10.11.15.10                   ! R630 NFS VM
ping vrf default 10.10.0.x                     ! Supermicro IPMI
ping vrf default 192.168.31.1                  ! Home router
ping vrf default 8.8.8.8                       ! Internet
!
! ── BGP T0 uplinks (after NSX deployment) ──────────────────────
ping vrf default 10.0.60.2                     ! T0 S1 Uplink1 peer
ping vrf default 10.0.70.2                     ! T0 S1 Uplink2 peer
ping vrf default 10.0.160.2                    ! T0 S2 Uplink1 peer
ping vrf default 10.0.170.2                    ! T0 S2 Uplink2 peer
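The matrix mixes targets that should answer today with T0 peers that only answer after NSX is deployed. A Python sketch that keeps the matrix as data and renders only the pings that make sense for the current phase. The IPMI entry is omitted because its address is left as 10.10.0.x above:

```python
from typing import List, NamedTuple, Optional


class Target(NamedTuple):
    ip: str
    label: str
    source: Optional[str] = None  # SVI to source the ping from, if any
    post_nsx: bool = False        # only answers once NSX Edges are deployed


MATRIX = [
    Target("10.11.11.1", "ESXi Mgmt SVI", "Vlan1111"),
    Target("10.11.12.1", "vMotion SVI", "Vlan1112"),
    Target("10.11.13.1", "vSAN SVI", "Vlan1113"),
    Target("10.11.14.1", "NSX TEP SVI", "Vlan1114"),
    Target("10.11.15.1", "NFS SVI", "Vlan1115"),
    Target("10.0.60.1", "T0 Uplink1 SVI", "Vlan60"),
    Target("10.0.70.1", "T0 Uplink2 SVI", "Vlan70"),
    Target("10.11.15.10", "R630 NFS VM"),
    Target("192.168.31.1", "Home router"),
    Target("8.8.8.8", "Internet"),
    Target("10.0.60.2", "T0 S1 Uplink1 peer", post_nsx=True),
    Target("10.0.70.2", "T0 S1 Uplink2 peer", post_nsx=True),
    Target("10.0.160.2", "T0 S2 Uplink1 peer", post_nsx=True),
    Target("10.0.170.2", "T0 S2 Uplink2 peer", post_nsx=True),
]


def eos_ping_commands(matrix: List[Target], nsx_deployed: bool) -> List[str]:
    """Render the matrix as EOS ping lines, skipping peers that cannot answer yet."""
    lines: List[str] = []
    for t in matrix:
        if t.post_nsx and not nsx_deployed:
            continue
        src = f" source {t.source}" if t.source else ""
        lines.append(f"ping vrf default {t.ip}{src}")
    return lines


if __name__ == "__main__":
    for line in eos_ping_commands(MATRIX, nsx_deployed=False):
        print(line)
```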

Issues Encountered & Fixes

Real issues hit during this lab build, documented so others don't waste time on the same problems.

Symptom	Root Cause	Fix
LACP port-channel stuck in I (individual) state	ESXi not yet installed — no LACP PDUs sent from host NIC team	Expected pre-ESXi. Port-channels form once ESXi LACP NIC teaming is configured on the VDS. Verify with `show lacp neighbor` once ESXi is up.
vSAN health: MTU check failed	vSAN VMkernel adapter MTU left at the 1500 default while the SVI is 9000	Set the VDS MTU to 9000 and the vSAN VMkernel adapter MTU to 9000; MTU is a switch and adapter setting, not a portgroup setting. Values must match on every ESXi host in the cluster.
NSX TEP tunnels not forming	MTU mismatch: Geneve adds about 50 bytes (outer IPv4 20 + UDP 8 + Geneve 8 + inner Ethernet 14, more with options), so the underlay must carry more than the 9000-byte inner payload	Confirm blade port-channel MTU is 9216. Verify with `show interfaces Po1`: MTU must show 9216, not 9000.
BGP peer stuck in Active state	NSX Edge uplink vNIC not connected to correct port or VLAN mismatch on access port	Verify NSX Edge uplink vNIC is on the correct portgroup, VLAN 60/70 is tagged, and the physical port (Et37/38) shows connected.
NFS datastore mount fails in VCF	UFW on R630 Ubuntu VM blocking NFS ports from ESXi management subnet	`ufw allow from 10.11.11.0/24 to any port 2049` and `ufw allow from 10.11.11.0/24 to any port 111`
Management1 and Et47 subnet overlap concern	Both ports are on 192.168.31.x, so the subnets overlap	Management1 sits in the management VRF and Et47 in the default VRF. Each VRF has its own routing table, so the overlap causes no routing conflict. Confirm VRF membership with `show vrf` and addressing with `show ip interface brief`.
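The ~50-byte Geneve figure in the TEP row breaks down as follows. A Python sketch, assuming an IPv4 underlay and counting only what lands inside the outer IP packet (the outer Ethernet header does not count against IP MTU). NSX adds Geneve options on top of the base header, which is why its guidance asks for at least 1600 bytes of TEP MTU even for 1500-byte guests:

```python
OUTER_IPV4 = 20      # outer IPv4 header, no options
OUTER_UDP = 8        # outer UDP header
GENEVE_BASE = 8      # fixed Geneve header (RFC 8926)
INNER_ETHERNET = 14  # the guest's Ethernet header travels inside the tunnel


def geneve_underlay_mtu(inner_ip_mtu: int, option_bytes: int = 0,
                        inner_vlan_tag: bool = False) -> int:
    """Minimum underlay IP MTU that carries an inner IP packet unfragmented."""
    if option_bytes % 4:
        raise ValueError("Geneve options are encoded in 4-byte multiples")
    return (inner_ip_mtu + INNER_ETHERNET + (4 if inner_vlan_tag else 0)
            + GENEVE_BASE + option_bytes + OUTER_UDP + OUTER_IPV4)


def fits(underlay_mtu: int, inner_ip_mtu: int, option_bytes: int = 0) -> bool:
    """True if the underlay can carry the encapsulated packet without fragmenting."""
    return geneve_underlay_mtu(inner_ip_mtu, option_bytes) <= underlay_mtu


if __name__ == "__main__":
    print(geneve_underlay_mtu(1500))  # 1550
    print(geneve_underlay_mtu(9000))  # 9050
    print(fits(9000, 9000), fits(9216, 9000))  # False True
```

The last line is the failure mode in the table: a 9000-byte inner payload needs 9050 bytes of underlay, which a 9000 MTU hop drops and a 9216 port carries.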

Next Steps

With the network layer validated, the remaining pre-requisites before launching the VCF 9 SDDC Manager installer OVA are:

ESXi 9.x installed on all four management domain nodes (CHx1 Nodes A–D) with management vmkernel on VLAN 1111
VDS configured with portgroups for vMotion (1112), vSAN (1113), NSX TEP (1114), and NFS (1115) with correct MTU settings
DNS forward and reverse (PTR) entries created for all VCF components (SDDC Manager, vCenter, NSX Manager ×3, ESXi hosts) before deployment begins
NTP synchronised across all hosts — VCF deployment fails if time drift exceeds threshold
VCF 9 Planning and Preparation Workbook completed with all IP and DNS entries populated
R630 NFS export mounted as a datastore on all management domain hosts for SDDC Manager VM storage
3PAR drives reformatted from 520-byte to 512-byte sectors with sg_format (sg3_utils), via an LSI 9211-8i HBA in IT mode, for vSAN OSA
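Missing or mismatched DNS records are easy to catch before the installer does. A Python sketch using the standard `socket` module that checks each planned FQDN resolves to its IP and that the IP's PTR record points back. The resolver functions are injectable so the logic can be tested offline; any hostnames you feed it come from your own workbook:

```python
import socket
from typing import Callable, Dict, List


def _ptr(ip: str) -> str:
    """Reverse lookup: primary hostname for an IP."""
    return socket.gethostbyaddr(ip)[0]


def dns_problems(hosts: Dict[str, str],
                 forward: Callable[[str], str] = socket.gethostbyname,
                 reverse: Callable[[str], str] = _ptr) -> List[str]:
    """Check each FQDN -> IP forward record and the matching PTR record."""
    problems: List[str] = []
    for fqdn, ip in sorted(hosts.items()):
        try:
            got_ip = forward(fqdn)
        except OSError as exc:  # socket.gaierror is an OSError subclass
            problems.append(f"{fqdn}: forward lookup failed ({exc})")
        else:
            if got_ip != ip:
                problems.append(f"{fqdn}: resolves to {got_ip}, expected {ip}")
        try:
            name = reverse(ip)
        except OSError as exc:  # socket.herror is an OSError subclass
            problems.append(f"{ip}: reverse lookup failed ({exc})")
        else:
            if name.rstrip(".").lower() != fqdn.rstrip(".").lower():
                problems.append(f"{ip}: PTR is {name}, expected {fqdn}")
    return problems
```

Call `dns_problems({...})` with your planned FQDN → IP pairs from a machine that uses the same DNS servers as the VCF appliances; an empty list means both record types line up.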

VCF 9 Architecture Change: The SDDC Manager OVA is the installer in VCF 9. There is no separate Cloud Builder appliance as in VCF 5.x. This catches many engineers familiar with older documentation — do not reference VCF 5.x deployment guides.

Upgrading NSX Manager in a Federated VCF Environment

Upgrading NSX Manager in a Federated VCF Environment | Farrukh's Tech Blog

VMware VCF · NSX Federation · Deep Dive


A step-by-step architect's guide for upgrading NSX 4.1.2.3 → 4.2.3.1 when SDDC Manager has no visibility of Global Managers, and why the sequence still matters.

VCF 5.x · NSX Federation 4.1.2.3 → 4.2.3.1 · April 2025

Upgrading NSX in a standard VCF workload domain is a well-understood workflow — SDDC Manager owns the lifecycle, orchestrates the upgrade bundle, and walks you through a pre-check → upgrade → validation loop. But introduce NSX Federation — with its Global Manager / Local Manager topology — and that comfortable automation suddenly has a blind spot: SDDC Manager has no visibility of Global Managers whatsoever.

Get the sequence wrong, and you can end up with a Local Manager running a newer NSX version than your Global Manager. On NSX 4.1.0 and earlier, Federation's N±1 interoperability rule made that a hard-stop condition; from 4.1.1 onwards the mixed state is supported, but it departs from the VCF-prescribed procedure. This post walks through the complete, architect-level upgrade sequence for moving from NSX 4.1.2.3 → 4.2.3.1 in a federated VCF environment.

Section 01

Understanding the Architectural Blind Spot

Before any upgrade activity, you must understand what SDDC Manager sees and what it doesn't.

Figure: NSX Federation topology, with an Active/Standby Global Manager pair and per-site Local Managers.

In a federated NSX deployment inside VCF:

Local Managers (LM) are registered as part of VCF workload domains. SDDC Manager sees them, manages their lifecycle, and upgrades them.
Global Managers (GM) are deployed and managed outside SDDC Manager, which cannot see or upgrade them.
This means the Global Manager upgrade is entirely manual and, per the VCF procedure, is completed before the Local Manager upgrade is triggered via SDDC Manager.

⚑ Fact-Checked — The "LM must never lead GM" rule has changed

A common misconception (including in earlier drafts of this post) is that allowing LM to exceed the GM version will categorically break federation. This was true prior to NSX 4.1.1, but no longer applies to 4.1.1+ or 4.2. Starting with NSX 4.1.1, and explicitly confirmed for NSX 4.2, upgrades can occur in any order — LM first or GM first — and federation sync is maintained across any version combination between 3.2 and 4.2.

That said, for VCF deployments on 4.1.x → 4.2, Broadcom's documented procedure still prescribes upgrading GM manually first for two specific reasons: (1) VCF BOM and SDDC Manager orchestration alignment, and (2) a resolved defect in 4.1.x that required all sites to be upgraded before moving to NSX 4.2. GM-first is still the right operational call — just understand why, so you're not cargo-culting an outdated rule.

📄 Broadcom TechDocs — Upgrading NSX Federation (NSX 4.2)
📄 Broadcom TechDocs — Upgrading NSX Federation (NSX 4.1)

Section 02

The Real Interoperability Model — What Changed in 4.1.1 and 4.2

The old N±1 rule — where GM had to be upgraded before LM at all times — applied only up to NSX 4.1.0. Broadcom fundamentally relaxed this constraint in subsequent releases. Understanding the version-specific rules is essential before planning your sequence.

Interim state: Global Manager 4.2.3.1 ⇌ Local Manager 4.1.2.3

✓ Actual Compatibility Rules — NSX 4.1.1+ and 4.2

Any upgrade order is supported (LM-first or GM-first) ✓
GM and LM sync is maintained across any version combination 3.2–4.2 ✓
Old N±1 rule → Applies only to NSX 4.1.0 and earlier
VCF procedure still prescribes GM-first for BOM + defect reasons

During the interim window — after you've upgraded GM to 4.2.3.1 but before SDDC Manager upgrades LM — your environment sits in a mixed-version state (GM 4.2.3.1, LM 4.1.2.3). Per Broadcom's documentation, federation sync continues uninterrupted in this state. The GM-first sequence is followed here because the official VCF 5.x upgrade procedure mandates it, not because the architecture requires it.
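To make the version rules above concrete, here is a Python sketch that encodes this post's summary of them (not Broadcom's interoperability matrix) for a GM/LM pair. It simplifies the pre-4.1.1 N±1 rule to "the GM must not trail the LM"; always confirm real combinations against the official matrix:

```python
from typing import Tuple

ANY_ORDER_FROM = (4, 1, 1)     # per this post: any GM/LM order from NSX 4.1.1
SYNC_RANGE = ((3, 2), (4, 2))  # per this post: sync holds for versions 3.2 through 4.2


def parse_version(v: str) -> Tuple[int, ...]:
    """'4.2.3.1' -> (4, 2, 3, 1); tuples compare element by element."""
    return tuple(int(part) for part in v.split("."))


def mixed_versions_ok(gm: str, lm: str) -> bool:
    """Encode the rules summarised above for a GM/LM version pair."""
    gm_v, lm_v = parse_version(gm), parse_version(lm)
    low, high = SYNC_RANGE
    if not all(low <= v[:2] <= high for v in (gm_v, lm_v)):
        return False
    if gm_v >= ANY_ORDER_FROM and lm_v >= ANY_ORDER_FROM:
        return True
    # Older releases: simplified N±1 reading, the GM must not trail the LM.
    return gm_v >= lm_v


if __name__ == "__main__":
    print(mixed_versions_ok("4.2.3.1", "4.1.2.3"))  # True: the interim state in this post
```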

📚 Official References

→ Broadcom TechDocs: Upgrading NSX Federation, NSX 4.2 (GM/LM any order, 3.2–4.2 range)
→ Broadcom TechDocs: Upgrading NSX Federation, NSX 4.1 (any order from 4.1.1+; N±1 prior to 4.1.1)
→ VMware Product Interoperability Matrix: NSX Upgrade Paths (productId=912)

Section 03

Phase 1 — Pre-Upgrade Validation

Before touching a single component, perform thorough environmental health checks. Upgrades that fail mid-way in federated environments are significantly harder to recover from than in standalone deployments.

1.1 — Federation Health

Validate GM ↔ LM Channel Status

In Global Manager UI → System → Location Manager, every location must show ACTIVE (on an Active/Standby GM pair, the standby GM should report a healthy standby role). Resolve any DEGRADED status or open alarm before proceeding.

Check Config Replication Sync State

Verify no pending replication lag from GM to LM. Push a test config change and confirm propagation before upgrade.

Review Broadcom Interoperability Matrix

Confirm vCenter, ESXi, and vSAN versions in the target workload domain are all compatible with NSX 4.2.3.1. Use the VMware Interoperability Matrix at interopmatrix.vmware.com.

Back Up GM and LM (All Nodes)

Trigger a manual NSX configuration backup for both Global Manager and all Local Managers via System → Backup & Restore. Confirm backup file is written and accessible.

Confirm VCF BOM Alignment

In SDDC Manager, validate that the VCF release bundle you are upgrading to includes NSX 4.2.3.1 in its Bill of Materials. SDDC Manager will not offer an NSX version that isn't in its BOM.

Confirm No Active Span Operations

Ensure no stretched segment migrations, HCX workload moves, or cross-site DR operations are in-flight. Pause or complete these before upgrade windows open.

Section 04

Phase 2 — Upgrade Global Manager (Manual)

ℹ SDDC Manager is not involved here

This entire phase is performed directly in the NSX Global Manager UI or via NSX API. SDDC Manager has zero visibility of this operation. You must complete this phase yourself before triggering anything via SDDC Manager.

4.1 — Active / Standby GM Pair

Figure: NSX Global Manager Upgrade Coordinator (System → Lifecycle Management → Upgrade), showing the bundle upload and pre-check phase.

Upgrade Sequence — Active/Standby GM Pair

# Step 1 — Upload upgrade bundle to STANDBY Global Manager
Action : System → Lifecycle Mgmt → Upgrade
Upload : VMware-NSX-4.2.3.1-upgrade-bundle.mub
Target : Standby GM only

# Step 2 — Run pre-check on Standby GM
Action : Run Prechecks → Resolve all WARNINGs/ERRORs

# Step 3 — Execute upgrade on Standby GM
Action : Start Upgrade → Monitor until 100% complete
Validate: Standby GM reports healthy, reachable, version = 4.2.3.1

# Step 4 — Promote Standby to Active (planned failover)
Action : System → Location Manager → Promote Standby GM to Active
Confirm : New Active GM = 4.2.3.1 | Old Active GM now = Standby (4.1.2.3)

# Step 5 — Upgrade the original Active (now Standby) GM
Action : Repeat upgrade on remaining node
Validate: Both GMs = 4.2.3.1 | Active/Standby replication healthy

# Step 6 — Confirm all federation channels
Check  : System → Location Manager → all sites ACTIVE
Interim: GM = 4.2.3.1 | LM = 4.1.2.3 → mixed versions supported from 4.1.1, proceed
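The ordering rule in this phase is "touch the standby first, the original active last". A Python sketch that generates the step list for either topology so it can be pasted into a change record; the node names are hypothetical and the step wording mirrors the block above rather than any NSX API:

```python
from typing import List, Optional


def gm_upgrade_plan(active: str, standby: Optional[str], target: str) -> List[str]:
    """Order of GM-phase operations, mirroring the steps above.

    With a standby GM, the standby is upgraded and promoted before the
    original active node is touched; without one, the single GM is upgraded
    in place and is unavailable for the duration.
    """
    if standby is None:
        steps = [f"upload {target} bundle to {active}",
                 f"run prechecks on {active}",
                 f"upgrade {active} (GM unavailable)"]
    else:
        steps = [f"upload {target} bundle to {standby}",
                 f"run prechecks on {standby}",
                 f"upgrade {standby}",
                 f"promote {standby} to active",
                 f"upgrade {active} (now standby)"]
    steps.append("confirm all locations ACTIVE in Location Manager")
    return steps


if __name__ == "__main__":
    # Hypothetical node names.
    for n, step in enumerate(gm_upgrade_plan("gm-a", "gm-b", "4.2.3.1"), start=1):
        print(n, step)
```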

ℹ Single Active GM

If your environment has only a single Active GM (no standby pair), simply upload the bundle, run pre-checks, and execute the upgrade directly. There is no failover step. The GM will be unavailable for the duration of its upgrade — plan your change window accordingly, as no cross-site config pushes can occur during this window.

Section 05

Phase 3 — Upgrade Local Managers via SDDC Manager

Now that Global Manager is on 4.2.3.1 and federation channels are confirmed healthy, SDDC Manager can safely orchestrate the Local Manager upgrade. This is where the standard VCF lifecycle management workflow takes over.

Figure: SDDC Manager Lifecycle Management upgrade workflow, with NSX (Local Manager) as a component target.

Download the VCF Release Bundle

SDDC Manager → Lifecycle Management → Bundle Management. Download the target VCF bundle containing NSX 4.2.3.1 in its BOM. Confirm bundle is in AVAILABLE state.

Initiate Workload Domain Upgrade

Navigate to Lifecycle Management → Upgrade → select the target Workload Domain. SDDC Manager presents the component upgrade order: NSX → vCenter → ESXi/vSAN.

Run Pre-Checks — Resolve All Issues

SDDC Manager will run environment pre-checks. Do not proceed with any WARNING or ERROR state. Common blockers: certificate expiry, vSAN health failures, ESXi host connectivity issues.

Execute NSX Local Manager Upgrade

SDDC Manager drives the NSX Upgrade Coordinator, which follows NSX's fixed order: Edge clusters, then host transport nodes, then the 3-node LM cluster, upgraded in a rolling fashion. Monitor via both the SDDC Manager UI and the NSX Manager UI for any anomalies.

NSX Edge Cluster Upgrade (Automatic)

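SDDC Manager orchestrates Edge node upgrades as part of the NSX lifecycle step; in NSX's upgrade order the Edges go first, ahead of host transport nodes and the LM cluster. Edge nodes are upgraded one by one, with traffic continuity maintained via BFD/ECMP failover on the T0 gateway.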

vCenter and ESXi/vSAN Upgrades

SDDC Manager continues with vCenter (if in BOM), then ESXi cluster by cluster. Each host enters maintenance mode while DRS vMotions its VMs to other hosts, so confirm DRS is enabled and its automation level (typically Fully Automated) lets hosts evacuate without manual intervention.

Figure: NSX Upgrade Coordinator pre-check results, showing component health before upgrade execution.

Section 06

Phase 4 — Post-Upgrade Federation Validation

Both GM and LM are now on 4.2.3.1. Do not close your change window until all of the following validation points have been confirmed.

Post-Upgrade Validation Checklist

# 1. Federation Channel Health
Location : GM UI → System → Location Manager
Expected : All sites = ACTIVE  |  No DEGRADED / PARTIAL sites

# 2. Config Sync Validation
Action   : Push a test config change (e.g., tag on a segment) from GM
Expected : Change propagates to LM within expected replication window

# 3. Stretched Segment / Gateway Policy
Location : GM UI → Networking → Segments / Gateway Policies
Expected : No objects in PARTIAL_SUCCESS or ERROR realisation state

# 4. BGP / Routing Table Validation
Action   : SSH to T0 SR Edge nodes at each site
Command  : get logical-router <UUID> bgp neighbor summary
Expected : All BGP sessions ESTABLISHED | route counts stable

# 5. NSX Edge Cluster Health
Location : LM UI → System → Fabric → Nodes → Edge Transport Nodes
Expected : All Edge nodes = UP | Deployment status = NODE_READY

# 6. Alarm Review
Location : GM UI and LM UI → Alarms
Expected : No new CRITICAL or HIGH alarms post-upgrade

# 7. Datapath Verification (Optional but Recommended)
Action   : Run a cross-site ping/traceroute between stretched segment VMs
Expected : Traffic flows correctly across federation sites
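Checks 1 and 3 to 6 reduce to "every value is in its good state", which is easy to gate in code once the values are collected. A Python sketch; the snapshot fields are hypothetical containers for what you read from the GM/LM UIs and Edge CLI, and checks 2 and 7 still need a live config push and a live ping:

```python
from dataclasses import dataclass, field
from typing import Dict, List

ESTABLISHED = {"ESTABLISHED", "ESTAB"}


@dataclass
class PostUpgradeSnapshot:
    """Hypothetical container mirroring checks 1 and 3-6 above."""
    site_status: Dict[str, str] = field(default_factory=dict)  # location -> ACTIVE, DEGRADED, ...
    unrealised_objects: int = 0                                 # PARTIAL_SUCCESS or ERROR count
    bgp_state: Dict[str, str] = field(default_factory=dict)    # peer -> session state
    edge_status: Dict[str, str] = field(default_factory=dict)  # edge node -> deployment status
    new_critical_or_high_alarms: int = 0


def failed_checks(s: PostUpgradeSnapshot) -> List[str]:
    """Return one line per failed check; an empty list means the window can close."""
    failures: List[str] = []
    failures += [f"federation: {site} is {st}"
                 for site, st in sorted(s.site_status.items()) if st != "ACTIVE"]
    if s.unrealised_objects:
        failures.append(f"realisation: {s.unrealised_objects} object(s) in PARTIAL_SUCCESS/ERROR")
    failures += [f"bgp: {peer} is {st}"
                 for peer, st in sorted(s.bgp_state.items()) if st.upper() not in ESTABLISHED]
    failures += [f"edge: {edge} is {st}"
                 for edge, st in sorted(s.edge_status.items()) if st != "NODE_READY"]
    if s.new_critical_or_high_alarms:
        failures.append(f"alarms: {s.new_critical_or_high_alarms} new CRITICAL/HIGH")
    return failures
```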

Summary

Complete Upgrade Sequence at a Glance

Step	Action	Executed By	Tool
01	Back up GM + LM (all nodes)	Manual	NSX UI / API
02	Validate federation health (all sites ACTIVE)	Manual	NSX Global Manager UI
03	Confirm VCF BOM includes NSX 4.2.3.1	SDDC Mgr	SDDC Manager UI
04	Upgrade Standby GM → failover → upgrade original Active GM	Manual	NSX Global Manager UI
05	Validate GM health + federation channels (ACTIVE)	Manual	NSX Global Manager UI
06	Trigger Workload Domain upgrade via SDDC Manager	SDDC Mgr	SDDC Manager UI
07	SDDC Manager upgrades NSX Edge cluster	SDDC Mgr	SDDC Manager UI
08	SDDC Manager upgrades NSX host transport nodes, then Local Manager cluster (rolling)	SDDC Mgr	SDDC Manager UI
09	SDDC Manager upgrades vCenter + ESXi/vSAN	SDDC Mgr	SDDC Manager UI
10	Full post-upgrade federation validation	Both	NSX GM + LM UI

Section 07

Key Gotchas and Architect Notes

🟡 The old "LM must never lead GM" rule is outdated for 4.1.1+ and 4.2. Broadcom's official docs confirm that from NSX 4.1.1 onwards, and explicitly in 4.2, GM and LM can be upgraded in any order — federation sync is preserved across any version mix from 3.2 to 4.2. The N±1 rule only applied to NSX 4.1.0 and earlier. For VCF 5.x → 5.2, the prescribed sequence is still GM-first, but the reason is VCF BOM alignment and a resolved 4.1.x defect — not a hard architectural constraint. Always follow the official upgrade table for your exact VCF version: NSX 4.2 Federation Upgrade Guide.
🟡 Edge nodes are managed under the LM domain in VCF. SDDC Manager handles NSX Edge node upgrades as part of the NSX component step. Do not manually upgrade Edge nodes via NSX UI — let SDDC Manager orchestrate it.
🟡 GM config backup is your only recovery path. If the GM upgrade fails mid-way on a single-GM deployment, restoring from a pre-upgrade backup is the only supported recovery method. Verify backup integrity before starting.
🔵 VCF BOM alignment is mandatory. SDDC Manager will only offer NSX versions that are part of its release BOM. If 4.2.3.1 isn't in the BOM of your target VCF release, SDDC Manager won't surface it — check the VCF release notes before planning your upgrade path.
🔵 Cross-site config push is unavailable during GM upgrade. Plan your change window to account for the GM downtime period. Any configuration changes that need to propagate cross-site must be completed before or after — never during — the GM upgrade window.
🟢 NSX 4.2.x improvements are worth the effort. The 4.2.x line brings significant improvements to federation replication reliability, VPC-mode support, and BGP graceful restart handling — all relevant for multi-site VCF deployments. The operational overhead of a careful upgrade sequence pays dividends in post-upgrade stability.

✓ Closing Note — Corrected

Federation upgrades reward preparation and accurate knowledge. The sequence — backup, validate federation health, upgrade GM manually, then let SDDC Manager handle LM — remains the right call for VCF 5.x deployments going to 4.2. But it's right because Broadcom's VCF upgrade table mandates it and there was a specific resolved defect in 4.1.x, not because "LM ahead of GM breaks federation." That old N±1 rule was retired in NSX 4.1.1.

Always verify the exact upgrade path for your version combination in the official Broadcom TechDocs Federation Upgrade Guide and cross-check with the VMware Interoperability Matrix before opening any change window.

VCF 9 Home Lab | Embedded vIDM (viDB) --- AD Integration, Users, Groups & NSX SSO