Docs: Core Fabric Status Guide - Understanding and Troubleshooting Fabric Connectivity #384

@jasnoyaeger

Summary

A dedicated product guide page is needed that explains VergeOS core fabric statuses, how to interpret fabric configuration data, and how to troubleshoot fabric connectivity issues. This information is currently scattered across SOPs, node diagnostics references, and internal knowledge base articles, but no single authoritative page exists for administrators to reference.

Type

Conceptual + Reference (hybrid)

Suggested Content

  • Audience: System administrators, support engineers, and implementation engineers
  • Prerequisites: Basic understanding of VergeOS core network architecture (link to Core Concepts page)
  • Key sections:
    • What is the Core Fabric? - Brief overview of the fabric network's role in vSAN traffic, node-to-node communication, VM migrations, and management
    • Accessing Fabric Status - How to view fabric configuration (Node Diagnostics > Fabric Configuration, and cat /run/ybfabric.json via CLI)
    • Understanding Fabric Status Fields:
      • confirmed: true/false - What each means and implications
      • score value - What the numeric score represents (e.g., 200 = perfect connectivity) and what lower scores indicate
      • Path entries - Core1 and Core2 paths per node, IP addressing
      • Missing nodes or paths in the fabric output
    • NIC Fabric Status in the UI - Explanation of the globe icon and the "Confirmed/Not Confirmed" status shown in the Nodes > NICs section
    • Healthy vs. Unhealthy Fabric Examples - JSON examples showing:
      • Healthy: All nodes with 2 paths, score 200, confirmed true
      • Degraded: One path missing or score < 200
      • Critical: confirmed false, missing nodes
    • Pre-Maintenance Fabric Verification - Why you must verify fabric before updates, scale-ups, and scale-outs (reference existing SOPs)
    • Troubleshooting Fabric Issues:
      • Path not confirmed - Check physical cabling, switch VLAN config, MTU mismatches
      • Score degradation - Network latency, switch hop issues
      • Missing nodes - Node offline, core NIC down, VLAN isolation failure
      • Single path only - Lost redundancy, cable/switch failure on one core network
    • Best Practices - Regular fabric verification, post-installation testing, monitoring recommendations
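The healthy/degraded/critical criteria proposed above are concrete enough to express as a check, which could help the page explain how to read the output. The following is a minimal Python sketch, not VergeOS tooling. It assumes a hypothetical layout for the fabric JSON (node name mapped to an object with confirmed, score, and paths keys); the real ybfabric.json schema may differ, and the names classify_fabric and FabricReport are illustrative only. Because a missing node cannot be detected from the file alone, the caller supplies the list of nodes expected in the cluster.

```python
from dataclasses import dataclass, field

# HYPOTHETICAL layout: the real ybfabric.json schema may use different keys.
# {"node1": {"confirmed": true, "score": 200,
#            "paths": [{"network": "core1", "ip": "..."},
#                      {"network": "core2", "ip": "..."}]}, ...}
PERFECT_SCORE = 200   # issue text: 200 = perfect connectivity
EXPECTED_PATHS = 2    # one path over Core1 and one over Core2


@dataclass
class FabricReport:
    status: str  # "healthy", "degraded", or "critical"
    findings: list = field(default_factory=list)


def classify_fabric(fabric: dict, expected_nodes: list) -> FabricReport:
    """Apply the healthy/degraded/critical rules proposed in this issue."""
    critical, degraded = [], []
    for node in expected_nodes:
        entry = fabric.get(node)
        if entry is None:
            # Missing node: offline, core NIC down, or VLAN isolation failure
            critical.append(f"{node}: missing from fabric output")
            continue
        if entry.get("confirmed") is not True:
            # Not confirmed: check cabling, switch VLAN config, MTU
            critical.append(f"{node}: confirmed is not true")
        paths = entry.get("paths", [])
        if len(paths) < EXPECTED_PATHS:
            # Single path only: lost redundancy on one core network
            degraded.append(f"{node}: {len(paths)} of {EXPECTED_PATHS} paths")
        score = entry.get("score", 0)  # a missing score is treated as 0
        if score < PERFECT_SCORE:
            # Score degradation: latency or switch hop issues
            degraded.append(f"{node}: score {score} below {PERFECT_SCORE}")
    if critical:
        return FabricReport("critical", critical + degraded)
    if degraded:
        return FabricReport("degraded", degraded)
    return FabricReport("healthy")


if __name__ == "__main__":
    two_paths = [{"network": "core1"}, {"network": "core2"}]
    fabric = {
        "node1": {"confirmed": True, "score": 200, "paths": two_paths},
        "node2": {"confirmed": True, "score": 200, "paths": two_paths[:1]},
    }
    report = classify_fabric(fabric, ["node1", "node2"])
    print(report.status, report.findings)  # degraded ['node2: 1 of 2 paths']
```

Critical findings take precedence, so a node that is both unconfirmed and down to one path is reported as critical with both findings listed.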

Context

Requested via support interaction. Fabric status interpretation is frequently needed during:

  • Pre-update/pre-scale verification (the existing SOPs all call for checking that the fabric shows confirmed: true)
  • Troubleshooting vSAN or cluster connectivity issues
  • Post-installation validation
  • Diagnostic bundle analysis (ybfabric.txt)
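Since the pre-update and pre-scale SOPs hinge on every node reporting confirmed: true, the guide could show how that check might be scripted. This is a minimal Python sketch under stated assumptions: the /run/ybfabric.json path comes from this issue, but the layout inside it (node name mapped to an object with a confirmed key) is assumed, and the exit-code convention (0 proceed, 1 investigate, 2 file unreadable) and the names precheck, gate, and unconfirmed_nodes are hypothetical.

```python
import json
import sys
from pathlib import Path

# File path from the issue text; the layout inside it is ASSUMED
# (node name -> object with a "confirmed" key) and may differ in practice.
FABRIC_FILE = Path("/run/ybfabric.json")


def unconfirmed_nodes(fabric: dict) -> list:
    """Names of nodes whose entry does not report confirmed: true."""
    return sorted(
        name for name, entry in fabric.items()
        if not isinstance(entry, dict) or entry.get("confirmed") is not True
    )


def gate(fabric) -> int:
    """0 = safe to proceed, 1 = stop and investigate (hypothetical convention)."""
    if not isinstance(fabric, dict) or not fabric:
        print("fabric status is empty or in an unexpected format", file=sys.stderr)
        return 1
    bad = unconfirmed_nodes(fabric)
    if bad:
        print("not confirmed: " + ", ".join(bad), file=sys.stderr)
        return 1
    print(f"all {len(fabric)} nodes report confirmed: true")
    return 0


def precheck(path: Path = FABRIC_FILE) -> int:
    """Read the fabric file and gate on it; 2 = file missing or not valid JSON."""
    try:
        fabric = json.loads(path.read_text())
    except (OSError, json.JSONDecodeError) as exc:
        print(f"cannot read {path}: {exc}", file=sys.stderr)
        return 2
    return gate(fabric)
```

A wrapper run on a node before an update could call sys.exit(precheck()) and abort on any nonzero result. An empty or non-object file is treated as a failure rather than a vacuous pass, so a blank status never clears the gate.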

Current documentation references fabric status only in passing. Support engineers and customers need a single reference page to understand what they're looking at when reviewing fabric configuration output.

Related Existing Pages

Metadata

Assignees: No one assigned
Labels: documentation (Improvements or additions to documentation)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests