Upgrading NSX Manager in a Federated VCF Environment
A step-by-step architect's guide for upgrading NSX 4.1.2.3 → 4.2.3.1 when SDDC Manager has no visibility of Global Managers — and why sequence is everything.
Upgrading NSX in a standard VCF workload domain is a well-understood workflow — SDDC Manager owns the lifecycle, orchestrates the upgrade bundle, and walks you through a pre-check → upgrade → validation loop. But introduce NSX Federation — with its Global Manager / Local Manager topology — and that comfortable automation suddenly has a blind spot: SDDC Manager has no visibility of Global Managers whatsoever.
Get the sequence wrong, and you can end up with a Local Manager running a newer NSX version than your Global Manager. Under the old N±1 interoperability rule (NSX 4.1.0 and earlier) that was a hard-stop condition. From NSX 4.1.1 onwards federation tolerates the mismatch, but it still falls outside the sequence Broadcom's VCF procedure prescribes. This post walks through the complete, architect-level upgrade sequence for moving from NSX 4.1.2.3 → 4.2.3.1 in a federated VCF environment.
Understanding the Architectural Blind Spot
Before any upgrade activity, you must understand what SDDC Manager sees and what it doesn't.
In a federated NSX deployment inside VCF:
- Local Managers (LM) are registered as part of VCF workload domains. SDDC Manager sees them, manages their lifecycle, and upgrades them.
- Global Managers (GM) are deployed independently of SDDC Manager. They sit outside its lifecycle inventory, so SDDC Manager cannot see or upgrade them.
- This means the Global Manager upgrade is entirely manual and, per the VCF procedure, should be completed before the Local Manager upgrade is triggered via SDDC Manager.
A common misconception (including in earlier drafts of this post) is that allowing LM to exceed the GM version will categorically break federation. This was true prior to NSX 4.1.1, but no longer applies to 4.1.1+ or 4.2. Starting with NSX 4.1.1, and explicitly confirmed for NSX 4.2, upgrades can occur in any order — LM first or GM first — and federation sync is maintained across any version combination between 3.2 and 4.2.
That said, for VCF deployments on 4.1.x → 4.2, Broadcom's documented procedure still prescribes upgrading GM manually first for two specific reasons: (1) VCF BOM and SDDC Manager orchestration alignment, and (2) a resolved defect in 4.1.x that required all sites to be upgraded before moving to NSX 4.2. GM-first is still the right operational call — just understand why, so you're not cargo-culting an outdated rule.
Broadcom TechDocs: Upgrading NSX Federation (NSX 4.2)
Broadcom TechDocs: Upgrading NSX Federation (NSX 4.1)
The Real Interoperability Model — What Changed in 4.1.1 and 4.2
The old N±1 rule — where GM had to be upgraded before LM at all times — applied only up to NSX 4.1.0. Broadcom fundamentally relaxed this constraint in subsequent releases. Understanding the version-specific rules is essential before planning your sequence.
- Any upgrade order is supported (LM-first or GM-first) ✓
- GM and LM sync is maintained across any version combination 3.2–4.2 ✓
- Old N±1 rule → Applies only to NSX 4.1.0 and earlier
- VCF procedure still prescribes GM-first for BOM + defect reasons
During the interim window — after you've upgraded GM to 4.2.3.1 but before SDDC Manager upgrades LM — your environment sits in a mixed-version state (GM 4.2.3.1, LM 4.1.2.3). Per Broadcom's documentation, federation sync continues uninterrupted in this state. The GM-first sequence is followed here because the official VCF 5.x upgrade procedure mandates it, not because the architecture requires it.
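To make the mixed-version reasoning concrete, here is a minimal sketch that labels a GM/LM version pair using only the figures quoted in this post (the 3.2 to 4.2 sync range and the GM-first VCF order). The function names and the returned labels are illustrative assumptions, not part of any NSX or VCF tooling, and the sketch does not model the pre-4.1.1 ordering rule.

```python
from typing import Tuple

# Range quoted in this post: sync is maintained for any mix between 3.2 and 4.2.
SYNC_FLOOR = (3, 2)
SYNC_CEILING = (4, 2)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '4.2.3.1' into (4, 2, 3, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def in_sync_range(version: str) -> bool:
    """True when the major.minor release falls inside the quoted range."""
    return SYNC_FLOOR <= parse_version(version)[:2] <= SYNC_CEILING


def interim_state(gm: str, lm: str) -> str:
    """Label a GM/LM version pair (hypothetical labels, for planning notes)."""
    if not (in_sync_range(gm) and in_sync_range(lm)):
        return "outside the 3.2-4.2 range: check the interoperability matrix"
    gm_v, lm_v = parse_version(gm), parse_version(lm)
    if gm_v > lm_v:
        return "GM ahead: matches the VCF-prescribed order"
    if gm_v < lm_v:
        return "LM ahead: not the VCF-prescribed order"
    return "aligned"


print(interim_state("4.2.3.1", "4.1.2.3"))
```

Comparing parsed tuples rather than raw strings avoids the classic trap where "4.10" sorts before "4.2" lexically.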
Phase 1 — Pre-Upgrade Validation
Before touching a single component, perform thorough environmental health checks. Upgrades that fail mid-way in federated environments are significantly harder to recover from than in standalone deployments.
1.1 — Federation Health
Validate GM ↔ LM Channel Status
In Global Manager UI → System → Location Manager — all sites must show ACTIVE. Any DEGRADED or STANDBY alarm must be resolved before proceeding.
Check Config Replication Sync State
Verify no pending replication lag from GM to LM. Push a test config change and confirm propagation before upgrade.
Review Broadcom Interoperability Matrix
Confirm vCenter, ESXi, and vSAN versions in the target workload domain are all compatible with NSX 4.2.3.1. Use the VMware Interoperability Matrix at interopmatrix.vmware.com.
Backup GM and LM (All Nodes)
Trigger a manual NSX configuration backup for both Global Manager and all Local Managers via System → Backup & Restore. Confirm backup file is written and accessible.
Confirm VCF BOM Alignment
In SDDC Manager, validate that the VCF release bundle you are upgrading to includes NSX 4.2.3.1 in its Bill of Materials. SDDC Manager will not offer an NSX version that isn't in its BOM.
Confirm No Active Span Operations
Ensure no stretched segment migrations, HCX workload moves, or cross-site DR operations are in-flight. Pause or complete these before upgrade windows open.
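The Phase 1 checks can be collapsed into a single go/no-go gate. The sketch below assumes you have already gathered the facts by hand or by script into a plain dictionary; the key names (`site_status`, `backup_ok`, `bom_nsx_version`, `inflight_operations`) and the site and manager names are hypothetical and do not correspond to any NSX or SDDC Manager API response.

```python
from typing import Dict, List

TARGET_NSX = "4.2.3.1"


def preflight_blockers(snapshot: Dict) -> List[str]:
    """Return human-readable blockers; an empty list means 'go'."""
    blockers = []

    # Every federation site must report ACTIVE in Location Manager.
    for site, status in snapshot["site_status"].items():
        if status != "ACTIVE":
            blockers.append(f"site {site} is {status}, expected ACTIVE")

    # A verified backup must exist for the GM and every LM.
    for manager, backed_up in snapshot["backup_ok"].items():
        if not backed_up:
            blockers.append(f"no verified backup for {manager}")

    # The target VCF release BOM must carry the target NSX version.
    if snapshot["bom_nsx_version"] != TARGET_NSX:
        blockers.append(
            f"VCF BOM lists NSX {snapshot['bom_nsx_version']}, need {TARGET_NSX}"
        )

    # No stretched-segment, HCX, or DR operations may be in flight.
    for operation in snapshot["inflight_operations"]:
        blockers.append(f"in-flight operation: {operation}")

    return blockers


example = {
    "site_status": {"site-a": "ACTIVE", "site-b": "DEGRADED"},
    "backup_ok": {"gm": True, "lm-site-a": True, "lm-site-b": False},
    "bom_nsx_version": "4.2.3.1",
    "inflight_operations": [],
}
for line in preflight_blockers(example):
    print("BLOCKER:", line)
```

Returning a list of reasons rather than a single boolean keeps the change record useful: the same output can be pasted into the change ticket as evidence.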
Phase 2 — Upgrade Global Manager (Manual)
This entire phase is performed directly in the NSX Global Manager UI or via NSX API. SDDC Manager has zero visibility of this operation. You must complete this phase yourself before triggering anything via SDDC Manager.
2.1 — Active / Standby GM Pair
    # Step 1: Upload upgrade bundle to STANDBY Global Manager
    Action  : System → Lifecycle Mgmt → Upgrade
    Upload  : VMware-NSX-4.2.3.1-upgrade-bundle.mub
    Target  : Standby GM only

    # Step 2: Run pre-check on Standby GM
    Action  : Run Prechecks → Resolve all WARNINGs/ERRORs

    # Step 3: Execute upgrade on Standby GM
    Action  : Start Upgrade → Monitor until 100% complete
    Validate: Standby GM reports healthy, reachable, version = 4.2.3.1

    # Step 4: Promote Standby to Active (planned failover)
    Action  : System → Location Manager → Promote Standby GM to Active
    Confirm : New Active GM = 4.2.3.1 | Old Active GM now = Standby (4.1.2.3)

    # Step 5: Upgrade the original Active (now Standby) GM
    Action  : Repeat upgrade on remaining node
    Validate: Both GMs = 4.2.3.1 | Active/Standby replication healthy

    # Step 6: Confirm all federation channels
    Check   : System → Location Manager → all sites ACTIVE
    Interim : GM = 4.2.3.1 | LM = 4.1.2.3 → supported mixed-version state, proceed
If your environment has only a single Active GM (no standby pair), simply upload the bundle, run pre-checks, and execute the upgrade directly. There is no failover step. The GM will be unavailable for the duration of its upgrade — plan your change window accordingly, as no cross-site config pushes can occur during this window.
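The two GM topologies differ only in whether a failover step exists, which a small planning helper can make explicit. This is a sketch for generating a runbook checklist; the function name and the node names `gm-a` and `gm-b` are hypothetical, and nothing here calls NSX.

```python
from typing import List, Optional


def gm_upgrade_plan(active: str, standby: Optional[str], target: str) -> List[str]:
    """Return the ordered manual steps for a single GM or an active/standby pair."""
    if standby is None:
        # Single GM: no failover, and the GM is unavailable while it upgrades.
        return [
            f"upload bundle and run prechecks on {active}",
            f"upgrade {active} to {target} (GM unavailable during this step)",
            "confirm all sites ACTIVE in Location Manager",
        ]
    # Pair: upgrade the standby first, fail over, then upgrade the old active.
    return [
        f"upload bundle and run prechecks on standby {standby}",
        f"upgrade {standby} to {target}",
        f"promote {standby} to active",
        f"upgrade {active} (now standby) to {target}",
        "confirm replication healthy and all sites ACTIVE",
    ]


for number, step in enumerate(gm_upgrade_plan("gm-a", "gm-b", "4.2.3.1"), start=1):
    print(f"{number}. {step}")
```

Passing `None` as the standby yields the shorter single-node plan, which is a convenient reminder that the change window must then absorb the full GM outage.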
Phase 3 — Upgrade Local Managers via SDDC Manager
Now that Global Manager is on 4.2.3.1 and federation channels are confirmed healthy, SDDC Manager can safely orchestrate the Local Manager upgrade. This is where the standard VCF lifecycle management workflow takes over.
Download the VCF Release Bundle
SDDC Manager → Lifecycle Management → Bundle Management. Download the target VCF bundle containing NSX 4.2.3.1 in its BOM. Confirm bundle is in AVAILABLE state.
Initiate Workload Domain Upgrade
Navigate to Lifecycle Management → Upgrade → select the target Workload Domain. SDDC Manager presents the component upgrade order: NSX → vCenter → ESXi/vSAN.
Run Pre-Checks — Resolve All Issues
SDDC Manager will run environment pre-checks. Do not proceed with any WARNING or ERROR state. Common blockers: certificate expiry, vSAN health failures, ESXi host connectivity issues.
Execute NSX Local Manager Upgrade
SDDC Manager upgrades the 3-node LM cluster in a rolling fashion (node-by-node). Monitor via both SDDC Manager UI and the NSX Manager UI simultaneously for any anomalies.
NSX Edge Cluster Upgrade (Automatic)
SDDC Manager orchestrates Edge node upgrades as part of the NSX lifecycle step. Edge nodes go one-by-one with traffic continuity maintained via BFD/ECMP failover on the T0 gateway.
vCenter and ESXi/vSAN Upgrades
SDDC Manager continues with vCenter (if in BOM), then ESXi cluster-by-cluster. Each host is placed in maintenance mode and vSphere DRS evacuates its VMs before remediation, so confirm DRS is enabled and its automation level allows automatic migrations.
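While the rolling Local Manager upgrade runs, it helps to reduce what you see in the two UIs to a single progress label. The sketch below assumes you have noted each node's reported version in a dictionary; the node names and the label wording are hypothetical.

```python
from typing import Dict


def cluster_progress(node_versions: Dict[str, str], source: str, target: str) -> str:
    """Summarise a rolling upgrade from per-node version observations."""
    if not node_versions:
        raise ValueError("no nodes supplied")
    versions = set(node_versions.values())

    # Anything other than the source or target version deserves a closer look.
    unexpected = versions - {source, target}
    if unexpected:
        return "unexpected versions: " + ", ".join(sorted(unexpected))
    if versions == {target}:
        return "complete"
    if versions == {source}:
        return "not started"

    upgraded = sum(1 for version in node_versions.values() if version == target)
    return f"rolling: {upgraded}/{len(node_versions)} nodes on {target}"


observed = {"lm-1": "4.2.3.1", "lm-2": "4.1.2.3", "lm-3": "4.1.2.3"}
print(cluster_progress(observed, source="4.1.2.3", target="4.2.3.1"))
```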
Phase 4 — Post-Upgrade Federation Validation
Both GM and LM are now on 4.2.3.1. Do not close your change window until all of the following validation points have been confirmed.
    # 1. Federation Channel Health
    Location : GM UI → System → Location Manager
    Expected : All sites = ACTIVE | No DEGRADED / PARTIAL sites

    # 2. Config Sync Validation
    Action   : Push a test config change (e.g., tag on a segment) from GM
    Expected : Change propagates to LM within expected replication window

    # 3. Stretched Segment / Gateway Policy
    Location : GM UI → Networking → Segments / Gateway Policies
    Expected : No objects in PARTIAL_SUCCESS or ERROR realisation state

    # 4. BGP / Routing Table Validation
    Action   : SSH to T0 SR Edge nodes at each site
    Command  : get logical-router <UUID> bgp neighbor summary
    Expected : All BGP sessions ESTABLISHED | route counts stable

    # 5. NSX Edge Cluster Health
    Location : LM UI → System → Fabric → Nodes → Edge Transport Nodes
    Expected : All Edge nodes = UP | Deployment status = NODE_READY

    # 6. Alarm Review
    Location : GM UI and LM UI → Alarms
    Expected : No new CRITICAL or HIGH alarms post-upgrade

    # 7. Datapath Verification (Optional but Recommended)
    Action   : Run a cross-site ping/traceroute between stretched segment VMs
    Expected : Traffic flows correctly across federation sites
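The config sync check (item 2) is easier to evidence if the wait is bounded rather than eyeballed. Below is a minimal polling sketch: `is_visible_on_lm` is a hypothetical callable you would supply yourself (for example, a function that looks up the test tag on the Local Manager), and the 120-second default is an assumed placeholder rather than a documented replication window.

```python
import time
from typing import Callable


def wait_for_propagation(
    is_visible_on_lm: Callable[[], bool],
    timeout_s: float = 120.0,   # assumed window; set to your own expectation
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the test change is visible on the LM or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        if is_visible_on_lm():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Demonstration with a stand-in check that succeeds on the third poll.
polls = {"count": 0}


def visible_on_third_poll() -> bool:
    polls["count"] += 1
    return polls["count"] >= 3


print(wait_for_propagation(visible_on_third_poll, interval_s=0))
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting; in normal use the defaults are enough.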
Complete Upgrade Sequence at a Glance
| Step | Action | Executed By | Tool |
|---|---|---|---|
| 01 | Backup GM + LM (all nodes) | Manual | NSX UI / API |
| 02 | Validate federation health (all sites ACTIVE) | Manual | NSX Global Manager UI |
| 03 | Confirm VCF BOM includes NSX 4.2.3.1 | SDDC Mgr | SDDC Manager UI |
| 04 | Upgrade Standby GM → failover → upgrade original Active GM | Manual | NSX Global Manager UI |
| 05 | Validate GM health + federation channels (ACTIVE) | Manual | NSX Global Manager UI |
| 06 | Trigger Workload Domain upgrade via SDDC Manager | SDDC Mgr | SDDC Manager UI |
| 07 | SDDC Manager upgrades NSX Local Manager (rolling) | SDDC Mgr | SDDC Manager UI |
| 08 | SDDC Manager upgrades NSX Edge cluster | SDDC Mgr | SDDC Manager UI |
| 09 | SDDC Manager upgrades vCenter + ESXi/vSAN | SDDC Mgr | SDDC Manager UI |
| 10 | Full post-upgrade federation validation | Both | NSX GM + LM UI |
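The table above can also be treated as data, so a runbook script refuses to move on until every earlier step is ticked off. This is a sketch under the assumption that you track completed step numbers yourself; the `SEQUENCE` list mirrors the table, while `can_start` and `next_step` are illustrative names.

```python
from typing import List, Optional, Set, Tuple

# (step number, action, executed by), copied from the table above.
SEQUENCE: List[Tuple[int, str, str]] = [
    (1, "Backup GM + LM (all nodes)", "Manual"),
    (2, "Validate federation health (all sites ACTIVE)", "Manual"),
    (3, "Confirm VCF BOM includes NSX 4.2.3.1", "SDDC Mgr"),
    (4, "Upgrade Standby GM, fail over, upgrade original Active GM", "Manual"),
    (5, "Validate GM health + federation channels (ACTIVE)", "Manual"),
    (6, "Trigger Workload Domain upgrade via SDDC Manager", "SDDC Mgr"),
    (7, "SDDC Manager upgrades NSX Local Manager (rolling)", "SDDC Mgr"),
    (8, "SDDC Manager upgrades NSX Edge cluster", "SDDC Mgr"),
    (9, "SDDC Manager upgrades vCenter + ESXi/vSAN", "SDDC Mgr"),
    (10, "Full post-upgrade federation validation", "Both"),
]


def can_start(step_number: int, completed: Set[int]) -> bool:
    """A step may start only when every lower-numbered step is complete."""
    return all(number in completed for number, _, _ in SEQUENCE if number < step_number)


def next_step(completed: Set[int]) -> Optional[Tuple[int, str, str]]:
    """Return the first step not yet completed, or None when all are done."""
    for step in SEQUENCE:
        if step[0] not in completed:
            return step
    return None


done = {1, 2, 3}
print(can_start(6, done))   # False: the GM upgrade (steps 4 and 5) is still open
print(next_step(done))
```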
Key Gotchas and Architect Notes
- The old "LM must never lead GM" rule is outdated for 4.1.1+ and 4.2. Broadcom's official docs confirm that from NSX 4.1.1 onwards, and explicitly in 4.2, GM and LM can be upgraded in any order — federation sync is preserved across any version mix from 3.2 to 4.2. The N±1 rule only applied to NSX 4.1.0 and earlier. For VCF 5.x → 5.2, the prescribed sequence is still GM-first, but the reason is VCF BOM alignment and a resolved 4.1.x defect — not a hard architectural constraint. Always follow the official upgrade table for your exact VCF version: NSX 4.2 Federation Upgrade Guide.
- Edge nodes are managed under the LM domain in VCF. SDDC Manager handles NSX Edge node upgrades as part of the NSX component step. Do not manually upgrade Edge nodes via NSX UI — let SDDC Manager orchestrate it.
- GM config backup is your only recovery path. If the GM upgrade fails mid-way on a single-GM deployment, restoring from a pre-upgrade backup is the only supported recovery method. Verify backup integrity before starting.
- VCF BOM alignment is mandatory. SDDC Manager will only offer NSX versions that are part of its release BOM. If 4.2.3.1 isn't in the BOM of your target VCF release, SDDC Manager won't surface it — check the VCF release notes before planning your upgrade path.
- Cross-site config push is unavailable during GM upgrade. Plan your change window to account for the GM downtime period. Any configuration changes that need to propagate cross-site must be completed before or after — never during — the GM upgrade window.
- NSX 4.2.x improvements are worth the effort. The 4.2.x line brings significant improvements to federation replication reliability, VPC-mode support, and BGP graceful restart handling — all relevant for multi-site VCF deployments. The operational overhead of a careful upgrade sequence pays dividends in post-upgrade stability.
Federation upgrades reward preparation and accurate knowledge. The sequence — backup, validate federation health, upgrade GM manually, then let SDDC Manager handle LM — remains the right call for VCF 5.x deployments going to 4.2. But it's right because Broadcom's VCF upgrade table mandates it and there was a specific resolved defect in 4.1.x, not because "LM ahead of GM breaks federation." That old N±1 rule was retired in NSX 4.1.1.
Always verify the exact upgrade path for your version combination in the official Broadcom TechDocs Federation Upgrade Guide and cross-check with the VMware Interoperability Matrix before opening any change window.