From e73a617c7f10ccb63db212f563cb7e355fe89341 Mon Sep 17 00:00:00 2001 From: Jason Yaeger <30736996+jasnoyaeger@users.noreply.github.com> Date: Thu, 19 Feb 2026 10:29:46 -0500 Subject: [PATCH 1/2] =?UTF-8?q?=F0=9F=93=9D=20docs:=20add=20vSAN=20I/O=20s?= =?UTF-8?q?cheduling=20and=20workload=20management=20page?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Document how vSAN fair queuing distributes I/O across workloads, capacity-based throttling thresholds (91%/96%), and explicitly clarify that per-VM/tenant I/O limiting is not available. Also update Azure AD references to Entra ID. Closes #390 Co-Authored-By: Claude Opus 4.6 --- docs/product-guide/auth/azure-auth.md | 4 +- .../storage/vsan-io-scheduling.md | 143 ++++++++++++++++++ mkdocs.yml | 1 + 3 files changed, 146 insertions(+), 2 deletions(-) create mode 100644 docs/product-guide/storage/vsan-io-scheduling.md diff --git a/docs/product-guide/auth/azure-auth.md b/docs/product-guide/auth/azure-auth.md index 3998bff5..bd979b8a 100644 --- a/docs/product-guide/auth/azure-auth.md +++ b/docs/product-guide/auth/azure-auth.md @@ -1,5 +1,5 @@ -# Using Azure AD for Authentication +# Using Entra ID (Azure AD) for Authentication VergeOS can be configured to allow users to authenticate using their corporate Azure credentials. This page will walk you through the configuration process. @@ -39,7 +39,7 @@ VergeOS can be configured to allow users to authenticate using their corporate A 15. ***Remote User Fields*** defines the list of fields used to initially find the Azure user; this field is auto-populated with (sub,preferred_username,email nickname), a default list that can typically work for most implementations. **Note: For security reasons, it is not recommended to locate remote users based on fields that are changeable by the end user on the remote system.** 16. To carry over group membership from Azure to VergeOS, check the ***Update Group Membership*** checkbox. 
Groups must be created in VergeOS using instructions below. 17. **User Auto-Creation Features (optional)**: Users can be auto-created upon initial login to VergeOS; this can be selected for all Azure AD users -OR- limited to users in specified Azure AD groups. - - ***Auto-Create Users*** - If all users should be auto-created, enter **\*** here. + - ***Auto-Create Users*** - If all users should be auto-created, enter **\.*** here. - ***Auto Create Users in Group*** - To only auto-create users that are members of specified Azure AD groups, enter the Azure group object ID(s) in regular expression (regex) form. !!! info "Auto Creating Users in Groups" - The **Group Scope** must be defined. (Group Scope field defined above) diff --git a/docs/product-guide/storage/vsan-io-scheduling.md b/docs/product-guide/storage/vsan-io-scheduling.md new file mode 100644 index 00000000..95617b55 --- /dev/null +++ b/docs/product-guide/storage/vsan-io-scheduling.md @@ -0,0 +1,143 @@ +--- +status: new +--- + +# vSAN I/O Scheduling and Workload Management + +VergeOS vSAN uses a fair queuing mechanism to distribute storage I/O across all competing workloads. This page explains how I/O scheduling works, what automatic protections are in place, and what workload-level controls are and are not available. + +## Overview + +When multiple VMs, tenants, and volumes share the same vSAN storage, the system must decide how to allocate I/O bandwidth among them. The vSAN handles this through **fair queuing** — an automatic scheduling approach that distributes I/O operations proportionally across all active workloads without requiring manual configuration. + +This design prioritizes simplicity and consistent behavior: every workload gets a fair share of available I/O, and no single VM or tenant can monopolize storage resources under normal conditions. + +!!! info "Key Takeaway" + The vSAN provides automatic, cluster-wide I/O fairness and capacity-based throttling. 
It does **not** provide per-VM or per-tenant I/O limits (IOPS caps, bandwidth caps, or storage QoS policies). + +## How Fair Queuing Works + +The vSAN's I/O scheduler operates at the block layer, distributing read and write operations across all available storage devices in a tier. Key characteristics include: + +- **Proportional distribution** — I/O requests from all workloads (VMs, volumes, tenants) are serviced in a round-robin fashion, preventing any single workload from starving others +- **No manual tuning required** — Fair queuing is always active and requires no configuration +- **Cluster-wide scope** — I/O scheduling operates across all nodes participating in the vSAN, not just within a single node +- **Automatic rebalancing** — As workloads start and stop, the scheduler adjusts automatically + +Because data blocks are distributed across all drives in a tier using a hash-based algorithm, I/O naturally spreads across physical devices. This avoids hot spots and ensures that adding more drives to a tier increases aggregate I/O capacity for all workloads. + +## Automatic Capacity-Based Throttling + +The vSAN includes built-in protections that automatically throttle write operations when storage tiers approach capacity limits. This prevents a full tier from causing data loss or cluster instability. + +### Throttling Thresholds + +| Capacity Used | System Behavior | +|---|---| +| Below 91% | Normal operation — no throttling applied | +| 91% | Write throttling begins — the vSAN progressively slows incoming writes to the affected tier | +| 96% | Aggressive throttling — write operations are significantly restricted to prevent the tier from filling completely | + +!!! warning "Capacity Planning" + Throttling is a safety mechanism, not a substitute for capacity planning. Sustained operation above 91% will degrade write performance for **all** workloads on the affected tier. Monitor tier utilization and add capacity before reaching these thresholds. 
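
The two-threshold behavior in the table above can be sketched as a simple model. This is illustrative only — the threshold percentages come from this page, but the function, its name, and the discrete labels are invented for the example; the actual vSAN applies a progressive slowdown, not discrete states:

```python
def throttle_level(tier_used_pct: float) -> str:
    """Illustrative model of vSAN capacity-based write throttling.

    The 91% and 96% thresholds come from the table above; the returned
    labels are descriptive only -- the real implementation slows writes
    progressively rather than switching between discrete states.
    """
    if tier_used_pct >= 96:
        return "aggressive"   # writes significantly restricted
    if tier_used_pct >= 91:
        return "progressive"  # incoming writes progressively slowed
    return "none"             # normal operation, no throttling

for pct in (85, 91, 95.5, 97):
    print(f"{pct:>5}% used -> throttling: {throttle_level(pct)}")
```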
+ +### How Throttling Manifests + +When capacity-based throttling activates: + +- The `write_throttle` value in the tier's settings becomes non-zero, representing a bytes-per-second limit +- Individual drives show a `vsan_throttle` value in their status +- The tier status may transition to `outofspace` if usage continues to climb +- All workloads on the affected tier experience slower writes proportionally — no single workload is singled out + +### Monitoring Tier Capacity + +We can monitor tier capacity and throttling status through the VergeOS dashboard or from the command line: + +- Navigate to **System > vSAN > Tier Status** in the dashboard to view per-tier utilization and health +- Check for active throttling by reviewing the tier's write throttle status and per-drive throttle values + +## Storage Tier Placement + +While the vSAN does not offer per-workload I/O limits, **storage tier placement** is the primary mechanism for differentiating performance between workloads. By assigning workloads to different tiers backed by different media classes, we can achieve meaningful performance separation. + +| Tier | Typical Media | Best For | +|---|---|---| +| Tier 1 | NVMe SSD | High-performance databases, latency-sensitive applications | +| Tier 2 | SATA/SAS SSD | General-purpose VM storage, mixed workloads | +| Tier 3 | SSD or fast HDD | Read-heavy workloads, application repositories | +| Tier 4 | HDD | File servers, backup targets | +| Tier 5 | Archive HDD | Cold storage, long-term retention | + +Each VM disk, volume, and media file supports a **preferred tier** setting that controls which tier its data is stored on. For details on configuring tier assignments, see [Preferred Tier](/product-guide/storage/preferred-tiers). + +!!! tip "Performance Isolation Strategy" + To isolate a performance-sensitive workload, place it on a dedicated tier that other workloads do not use. 
For example, assign a critical database VM's disks to Tier 1 (NVMe) while keeping general workloads on Tier 2 (SATA SSD). The fair queuing scheduler operates independently per tier, so workloads on separate tiers do not compete for the same I/O resources. + +## What the vSAN Does Not Provide + +The following storage QoS capabilities are **not currently available** in VergeOS vSAN: + +| Capability | Status | +|---|---| +| Per-VM IOPS limits | Not available | +| Per-VM bandwidth caps | Not available | +| Per-tenant storage I/O limits | Not available | +| Storage QoS policies or profiles | Not available | +| I/O priority classes per workload | Not available | + +!!! note "Network vs. Storage I/O Limiting" + VergeOS **does** support network bandwidth limiting through [network rate limiting](/product-guide/networks/network-rules), which can cap the network throughput for individual VMs or tenants. However, this applies only to network traffic — it does not affect storage I/O operations within the vSAN. + +## Workarounds for Workload Isolation + +For environments that require tighter control over storage I/O distribution, consider these strategies: + +### Tier-Based Separation + +Assign performance-sensitive workloads to faster, less-contended tiers while placing bulk or background workloads on slower tiers. This is the most effective approach because each tier has its own pool of physical devices and its own I/O scheduling scope. + +### Guest-Level I/O Controls + +Operating systems within VMs can enforce their own I/O limits: + +- **Linux** — Use `cgroups` (specifically the `blkio` controller) to cap IOPS or bandwidth for processes within the guest +- **Windows** — Use Windows System Resource Manager or Storage QoS policies within the guest OS to throttle disk I/O for specific processes or services + +!!! note + Guest-level controls limit I/O from the perspective of the guest operating system. 
They do not interact with the vSAN's fair queuing mechanism — they simply reduce how much I/O the guest generates in the first place. + +### Right-Sizing Tenant Resources + +For multi-tenant environments, right-sizing the compute and memory resources allocated to each tenant indirectly constrains their storage I/O generation. A tenant with fewer CPU cores and less RAM will naturally generate less storage I/O than one with more resources. + +### Workload Scheduling + +For batch or background workloads (backups, data processing, replication), schedule them during off-peak hours when fewer interactive workloads are competing for I/O. This maximizes the benefit of fair queuing by reducing contention during business hours. + +## Monitoring I/O Distribution + +VergeOS provides several tools for observing how I/O is distributed across the cluster: + +- **Dashboard** — The vSAN dashboard displays real-time I/O rates, tier utilization, and per-node activity +- **Tier Status** — Navigate to **System > vSAN > Tier Status** to view per-tier health, capacity, and throttling state +- **Drive Metrics** — Individual drive statistics (read/write errors, latency, throttling) are available under **System > Nodes > [Node] > Drives** + +These tools help identify whether a particular workload or tier is experiencing contention, allowing administrators to make informed decisions about tier placement or capacity expansion. 
+ +## Summary + +The vSAN's approach to I/O scheduling prioritizes fairness and simplicity: + +- **Fair queuing** automatically distributes I/O across all workloads without configuration +- **Capacity-based throttling** protects tiers from filling up by progressively slowing writes at 91% and 96% utilization +- **Tier placement** is the primary tool for separating workload performance classes +- **Per-VM and per-tenant I/O limits** are not available — use tier separation, guest-level controls, or resource right-sizing as alternatives + +### Related Documentation + +- [Storage Overview](/product-guide/storage/overview) +- [vSAN Architecture](/product-guide/storage/vsan-architecture) +- [Storage Tiers](/product-guide/storage/storage-tiers) +- [Preferred Tier](/product-guide/storage/preferred-tiers) +- [Network Rules](/product-guide/networks/network-rules) diff --git a/mkdocs.yml b/mkdocs.yml index d4afe7af..cc842e35 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -175,6 +175,7 @@ nav: - Storage: - Overview: product-guide/storage/overview.md - vSAN Architecture: product-guide/storage/vsan-architecture.md # Added from PR + - vSAN I/O Scheduling: product-guide/storage/vsan-io-scheduling.md - Storage Tiers: product-guide/storage/storage-tiers.md - Preferred Tier: product-guide/storage/preferred-tiers.md - External Storage Integration: product-guide/storage/external-storage-integration.md From c96f3dafcb248daae014cba15cca559f7846db12 Mon Sep 17 00:00:00 2001 From: Larry Ludlow <2191492+lludlow@users.noreply.github.com> Date: Thu, 19 Feb 2026 11:36:13 -0500 Subject: [PATCH 2/2] cleanup and Entra ID rename --- docs/product-guide/auth/azure-auth.md | 36 ++++---- .../storage/vsan-io-scheduling.md | 89 +++++++++---------- 2 files changed, 62 insertions(+), 63 deletions(-) diff --git a/docs/product-guide/auth/azure-auth.md b/docs/product-guide/auth/azure-auth.md index bd979b8a..0b5a462e 100644 --- a/docs/product-guide/auth/azure-auth.md +++ b/docs/product-guide/auth/azure-auth.md 
@@ -3,7 +3,7 @@ VergeOS can be configured to allow users to authenticate using their corporate Azure credentials. This page will walk you through the configuration process. -## Configure an Azure AD Authorization Source +## Configure an Entra ID Authorization Source 1. In Azure services: register a single-tenant web application, setting the ***Redirect URI*** to the URL of the VergeOS system and creating a new client secret. **Azure Active Directory > App Registrations > New Registration** ![azurereg1.png](/product-guide/screenshots/azurereg1.png) @@ -38,32 +38,32 @@ VergeOS can be configured to allow users to authenticate using their corporate A 14. Enter the ***Client Secret*** obtained in the previous step. 15. ***Remote User Fields*** defines the list of fields used to initially find the Azure user; this field is auto-populated with (sub,preferred_username,email nickname), a default list that can typically work for most implementations. **Note: For security reasons, it is not recommended to locate remote users based on fields that are changeable by the end user on the remote system.** 16. To carry over group membership from Azure to VergeOS, check the ***Update Group Membership*** checkbox. Groups must be created in VergeOS using instructions below. -17. **User Auto-Creation Features (optional)**: Users can be auto-created upon initial login to VergeOS; this can be selected for all Azure AD users -OR- limited to users in specified Azure AD groups. - - ***Auto-Create Users*** - If all users should be auto-created, enter **\.*** here. - - ***Auto Create Users in Group*** - To only auto-create users that are members of specified Azure AD groups, enter the Azure group object ID(s) in regular expression (regex) form. +17. **User Auto-Creation Features (optional)**: Users can be auto-created upon initial login to VergeOS; this can be selected for all Entra ID users -OR- limited to users in specified Entra ID groups. 
+ - ***Auto-Create Users*** - If all users should be auto-created, enter `.*` here. + - ***Auto Create Users in Group*** - To only auto-create users that are members of specified Entra ID groups, enter the group object ID(s) in regular expression (regex) form. !!! info "Auto Creating Users in Groups" - The **Group Scope** must be defined. (Group Scope field defined above) - - **Token Configuration** must be setup in Azure AD (instructions below). - - Azure AD **groups** specified for Auto Create **must be created on the VergeOS side** (instructions below). + - **Token Configuration** must be set up in Entra ID (instructions below). + - Entra ID **groups** specified for Auto Create **must be created on the VergeOS side** (instructions below). - To auto-create based on group, the ***Auto Create Users*** **field must be blank** - Multiple specific group IDs can be entered using the format: (ID)|(ID)|(ID) 18. **Options (recommended enabled):** -- ***Update Remote User:*** - once the user is located in Azure AD, update VergeOS user *Remote Username* field to the corresponding Azure unique ID. -!!! success "Enabling the Update Remote User will allow the VergeOS system to store the unique Azure ID in the VergeOS user record (after initially locating the Azure AD user with fields defined in Remote User Fields), so the unique identifier can subsequently be used for finding Azure AD user; this is typically recommended since fields such as email address are subject to change." +- ***Update Remote User:*** - once the user is located in Entra ID, update VergeOS user *Remote Username* field to the corresponding Azure unique ID. +!!! 
success "Enabling the Update Remote User will allow the VergeOS system to store the unique Azure ID in the VergeOS user record (after initially locating the Entra ID user with fields defined in Remote User Fields), so the unique identifier can subsequently be used for finding the Entra ID user; this is typically recommended since fields such as email address are subject to change." -- ***Update User Email Address:*** - Update VergeOS user email address to match email address within Azure. -- ***Update User Display Name:*** - Update VergeOS user display name to match display name within Azure. +- ***Update User Email Address:*** - Update VergeOS user email address to match email address within Entra ID. +- ***Update User Display Name:*** - Update VergeOS user display name to match display name within Entra ID. - ***Update Group Membership:*** - Update the groups that a VergeOS user is a member of. (A Group Scope is required for this to function.) 19. **Additional Optional Fields:** See [**Authorization Sources (General)**](/product-guide/auth/auth-sources-overview) for information regarding additional optional Fields. ## Add Azure Groups to VergeOS -Interfacing with Azure groups requires a token on the Azure AD app registration and creation of groups in VergeOS: +Interfacing with Azure groups requires a token on the Entra ID app registration and creation of groups in VergeOS: -### Set up a Token Configuration in Azure AD +### Set up a Token Configuration in Entra ID 1. Navigate to the **App registration page** for the App created above. 2. Click on **Token Configuration** on the left menu and click **+Add groups claim**. @@ -76,9 +76,9 @@ Interfacing with Azure groups requires a token on the Azure AD app registration 1. Navigate to **System > Groups.** 2. Click **New** on the left menu. -3. Enter the group ***Name*** to match the group name in Azure AD. +3. Enter the group ***Name*** to match the group name in Entra ID. 4. 
Optionally, an ***Email*** can be entered for the group. This email address is used for sending subscription alerts and/or reports assigned to the group. -5. Copy the coordinating **Object Id** from the Groups/All Groups page in Azure AD to the ***Identifier*** field. +5. Copy the coordinating **Object Id** from the Groups/All Groups page in Entra ID to the ***Identifier*** field. 6. Click **Submit** (bottom of the page) to save the new group. ![azure-groupspage.png](/product-guide/screenshots/azure-groupspage.png) @@ -89,14 +89,14 @@ Interfacing with Azure groups requires a token on the Azure AD app registration After the Azure auth source is created, users can be manually created in VergeOS to utilize the authorization source for login authentication. Manually creating users is only necessary when users are not configured to be auto-created. -### Add VergeOS Users that will use the Azure AD Auth +### Add VergeOS Users that will use Entra ID Auth When creating the new user, use the following configuration: -- ***Authorization Source:*** Select the Azure AD source from the dropdown list +- ***Authorization Source:*** Select the Entra ID source from the dropdown list - ***Username:*** unique name within the VergeOS system; typically it is recommended to use the Azure principal name. - ***Remote Username:*** use value of one of the fields defined as *Remote User fields* these are fields that are searched on in Azure (e.g. username, email) -- ***Display Name:*** (optional) If *Update User Display Name* is enabled on the Azure AD auth source, display name will automatically synchronize from the Azure AD user. -- ***Email Address:*** (optional) If *Update User Email Address* is enabled on the Azure AD auth source, email address will automatically synchronize from the Azure AD user. +- ***Display Name:*** (optional) If *Update User Display Name* is enabled on the Entra ID auth source, display name will automatically synchronize from Entra ID. 
+- ***Email Address:*** (optional) If *Update User Email Address* is enabled on the Entra ID auth source, email address will automatically synchronize from Entra ID. ![azure-newuser.png](/product-guide/screenshots/azure-newuser.png) \ No newline at end of file diff --git a/docs/product-guide/storage/vsan-io-scheduling.md b/docs/product-guide/storage/vsan-io-scheduling.md index 95617b55..8bd12368 100644 --- a/docs/product-guide/storage/vsan-io-scheduling.md +++ b/docs/product-guide/storage/vsan-io-scheduling.md @@ -4,27 +4,27 @@ status: new # vSAN I/O Scheduling and Workload Management -VergeOS vSAN uses a fair queuing mechanism to distribute storage I/O across all competing workloads. This page explains how I/O scheduling works, what automatic protections are in place, and what workload-level controls are and are not available. +This page explains how the vSAN distributes storage I/O, what automatic protections are in place when tiers approach capacity, and what options are available for managing workload isolation. ## Overview -When multiple VMs, tenants, and volumes share the same vSAN storage, the system must decide how to allocate I/O bandwidth among them. The vSAN handles this through **fair queuing** — an automatic scheduling approach that distributes I/O operations proportionally across all active workloads without requiring manual configuration. +When multiple VMs, tenants, and volumes share the same vSAN storage, I/O is distributed across physical devices through the vSAN's hash-based block placement algorithm. Each 64 KB block is assigned to a specific drive based on a hash, which naturally spreads I/O across all drives in a tier without creating hot spots. -This design prioritizes simplicity and consistent behavior: every workload gets a fair share of available I/O, and no single VM or tenant can monopolize storage resources under normal conditions. 
+At the device level, the Linux kernel's block I/O schedulers handle request ordering and fairness across all operations reaching each drive. !!! info "Key Takeaway" - The vSAN provides automatic, cluster-wide I/O fairness and capacity-based throttling. It does **not** provide per-VM or per-tenant I/O limits (IOPS caps, bandwidth caps, or storage QoS policies). + The vSAN distributes I/O across drives through hash-based block placement and protects against capacity issues with automatic write throttling. It does **not** provide per-VM or per-tenant I/O limits (IOPS caps, bandwidth caps, or storage QoS policies). -## How Fair Queuing Works +## How I/O Distribution Works -The vSAN's I/O scheduler operates at the block layer, distributing read and write operations across all available storage devices in a tier. Key characteristics include: +The vSAN distributes data blocks across all drives in a tier using a hash-based mapping algorithm. This has several important effects on I/O behavior: -- **Proportional distribution** — I/O requests from all workloads (VMs, volumes, tenants) are serviced in a round-robin fashion, preventing any single workload from starving others -- **No manual tuning required** — Fair queuing is always active and requires no configuration -- **Cluster-wide scope** — I/O scheduling operates across all nodes participating in the vSAN, not just within a single node -- **Automatic rebalancing** — As workloads start and stop, the scheduler adjusts automatically - -Because data blocks are distributed across all drives in a tier using a hash-based algorithm, I/O naturally spreads across physical devices. This avoids hot spots and ensures that adding more drives to a tier increases aggregate I/O capacity for all workloads. 
+- **Natural load spreading** — Because blocks are distributed by hash across all drives and nodes in a tier, read and write operations are spread across physical devices automatically +- **No hot spots** — No single drive or node bears a disproportionate share of I/O, regardless of which VM or tenant generates it +- **Scales horizontally and vertically** — Adding more drives to existing nodes or adding new nodes with drives both increase the aggregate I/O capacity available to all workloads +- **Data locality** — When a block is available on the local node, reads are served locally, reducing network overhead and latency +- **N+1 redundancy** — Every block is stored on two different nodes. If a node goes offline, all data remains accessible from the surviving copy with no interruption to I/O +- **No manual tuning required** — Block distribution is automatic and requires no configuration ## Automatic Capacity-Based Throttling @@ -35,44 +35,45 @@ The vSAN includes built-in protections that automatically throttle write operati | Capacity Used | System Behavior | |---|---| | Below 91% | Normal operation — no throttling applied | -| 91% | Write throttling begins — the vSAN progressively slows incoming writes to the affected tier | -| 96% | Aggressive throttling — write operations are significantly restricted to prevent the tier from filling completely | +| 91% | Write throttling begins — the vSAN adds **5 ms of write latency** to the affected tier | +| 96% | Aggressive throttling — the vSAN adds **50 ms of write latency** to prevent the tier from filling completely | !!! warning "Capacity Planning" - Throttling is a safety mechanism, not a substitute for capacity planning. Sustained operation above 91% will degrade write performance for **all** workloads on the affected tier. Monitor tier utilization and add capacity before reaching these thresholds. + Throttling is a safety mechanism, not a substitute for capacity planning. 
Even the initial 5 ms penalty at 91% can noticeably affect latency-sensitive workloads. Monitor tier utilization and add capacity before reaching these thresholds. ### How Throttling Manifests When capacity-based throttling activates: -- The `write_throttle` value in the tier's settings becomes non-zero, representing a bytes-per-second limit -- Individual drives show a `vsan_throttle` value in their status -- The tier status may transition to `outofspace` if usage continues to climb -- All workloads on the affected tier experience slower writes proportionally — no single workload is singled out +- All workloads on the affected tier experience increased write latency — no single workload is singled out. +- The tier status may transition to `outofspace` if usage continues to climb. +- Throttling applies to the entire tier across all nodes. -### Monitoring Tier Capacity +### Monitoring Throttle Status -We can monitor tier capacity and throttling status through the VergeOS dashboard or from the command line: +You can check for active throttling in the following locations: -- Navigate to **System > vSAN > Tier Status** in the dashboard to view per-tier utilization and health -- Check for active throttling by reviewing the tier's write throttle status and per-drive throttle values +- **Dashboard** — Navigate to **System > vSAN > Tier Status** to view per-tier utilization, health, and throttle state +- **Per-drive metrics** — Individual drive throttle values are visible under **System > Nodes > [Node] > Drives** ## Storage Tier Placement -While the vSAN does not offer per-workload I/O limits, **storage tier placement** is the primary mechanism for differentiating performance between workloads. By assigning workloads to different tiers backed by different media classes, we can achieve meaningful performance separation. +While the vSAN does not offer per-workload I/O limits, **storage tier placement** is the primary mechanism for separating workload performance classes. 
VergeOS supports up to five user-configurable tiers (Tier 1 through Tier 5), and each tier can be backed by any media type. The tier number does not dictate the type of drive — it is up to the administrator to assign drives to tiers based on the needs of the environment. + +For example, a common configuration might look like: -| Tier | Typical Media | Best For | +| Tier | Media | Use Case | |---|---|---| | Tier 1 | NVMe SSD | High-performance databases, latency-sensitive applications | | Tier 2 | SATA/SAS SSD | General-purpose VM storage, mixed workloads | -| Tier 3 | SSD or fast HDD | Read-heavy workloads, application repositories | -| Tier 4 | HDD | File servers, backup targets | -| Tier 5 | Archive HDD | Cold storage, long-term retention | +| Tier 3 | HDD | File servers, backup targets, cold storage | + +However, an environment could just as easily have three tiers of NVMe at different capacities or performance profiles, or two tiers of SSD and one of HDD. The tier structure is entirely flexible. Each VM disk, volume, and media file supports a **preferred tier** setting that controls which tier its data is stored on. For details on configuring tier assignments, see [Preferred Tier](/product-guide/storage/preferred-tiers). !!! tip "Performance Isolation Strategy" - To isolate a performance-sensitive workload, place it on a dedicated tier that other workloads do not use. For example, assign a critical database VM's disks to Tier 1 (NVMe) while keeping general workloads on Tier 2 (SATA SSD). The fair queuing scheduler operates independently per tier, so workloads on separate tiers do not compete for the same I/O resources. + For strategies on isolating latency-sensitive workloads, see [Strategies for Workload Isolation](#strategies-for-workload-isolation) below. ## What the vSAN Does Not Provide @@ -89,31 +90,31 @@ The following storage QoS capabilities are **not currently available** in VergeO !!! note "Network vs. 
Storage I/O Limiting" VergeOS **does** support network bandwidth limiting through [network rate limiting](/product-guide/networks/network-rules), which can cap the network throughput for individual VMs or tenants. However, this applies only to network traffic — it does not affect storage I/O operations within the vSAN. -## Workarounds for Workload Isolation +## Strategies for Workload Isolation -For environments that require tighter control over storage I/O distribution, consider these strategies: +For environments that require tighter control over storage I/O distribution, there are several approaches depending on scale: ### Tier-Based Separation -Assign performance-sensitive workloads to faster, less-contended tiers while placing bulk or background workloads on slower tiers. This is the most effective approach because each tier has its own pool of physical devices and its own I/O scheduling scope. +For latency-sensitive workloads, assign them to a dedicated tier or distribute them across multiple tiers to reduce contention. Each tier has its own pool of physical devices, so workloads on different tiers do not compete for the same I/O resources. This is the simplest and most effective approach within a single cluster. + +### Dedicated Storage Clusters + +In larger environments such as enterprises or MSPs with mixed workloads, consider dedicating entire storage clusters to high-performance workloads. This provides full physical isolation — the high-performance cluster's drives, nodes, and network are not shared with general-purpose workloads at all. 
### Guest-Level I/O Controls Operating systems within VMs can enforce their own I/O limits: -- **Linux** — Use `cgroups` (specifically the `blkio` controller) to cap IOPS or bandwidth for processes within the guest -- **Windows** — Use Windows System Resource Manager or Storage QoS policies within the guest OS to throttle disk I/O for specific processes or services +- **Linux** — Use `cgroups` (specifically the `blkio` or `io` controller) to cap IOPS or bandwidth for processes within the guest +- **Windows** — Use File Server Resource Manager (FSRM) for file server workloads, or third-party tools for general disk I/O throttling within the guest !!! note - Guest-level controls limit I/O from the perspective of the guest operating system. They do not interact with the vSAN's fair queuing mechanism — they simply reduce how much I/O the guest generates in the first place. - -### Right-Sizing Tenant Resources - -For multi-tenant environments, right-sizing the compute and memory resources allocated to each tenant indirectly constrains their storage I/O generation. A tenant with fewer CPU cores and less RAM will naturally generate less storage I/O than one with more resources. + Guest-level controls limit I/O from the perspective of the guest operating system. They reduce how much I/O the guest generates, but they do not interact with the vSAN — the vSAN sees the resulting (reduced) I/O just like any other workload. ### Workload Scheduling -For batch or background workloads (backups, data processing, replication), schedule them during off-peak hours when fewer interactive workloads are competing for I/O. This maximizes the benefit of fair queuing by reducing contention during business hours. +For batch or background workloads (backups, data processing, replication), schedule them during off-peak hours when fewer interactive workloads are competing for I/O. This reduces contention during business hours. 
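
The Linux guest-level approach above can be sketched with cgroup v2's `io` controller, which caps I/O per block device via an `io.max` entry. The device number (`8:0`, typically `/dev/sda`) and the limit values here are example assumptions; applying the entry requires root inside the guest with cgroup v2 mounted:

```python
# Sketch: build the cgroup v2 `io.max` line that caps disk I/O inside a
# Linux guest. Device 8:0 and the limits are illustrative values only.

def build_io_max(dev: str, wbps: int, wiops: int) -> str:
    """Format an io.max entry: write bytes/sec and write IOPS for one device."""
    return f"{dev} wbps={wbps} wiops={wiops}"

# Cap writes at 50 MB/s and 1000 write IOPS on device 8:0
line = build_io_max("8:0", 50 * 1024 * 1024, 1000)
print(line)  # 8:0 wbps=52428800 wiops=1000

# To apply (as root, inside the guest, cgroup v2 mounted at /sys/fs/cgroup):
#   mkdir -p /sys/fs/cgroup/throttled
#   echo "8:0 wbps=52428800 wiops=1000" > /sys/fs/cgroup/throttled/io.max
#   echo $$ > /sys/fs/cgroup/throttled/cgroup.procs
```

Processes moved into the group are then throttled by the guest kernel before their I/O ever reaches the virtual disk.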
 ## Monitoring I/O Distribution
 
@@ -123,14 +124,12 @@ VergeOS provides several tools for observing how I/O is distributed across the c
 - **Dashboard** — The vSAN dashboard displays real-time I/O rates, tier utilization, and per-node activity
 - **Tier Status** — Navigate to **System > vSAN > Tier Status** to view per-tier health, capacity, and throttling state
 - **Drive Metrics** — Individual drive statistics (read/write errors, latency, throttling) are available under **System > Nodes > [Node] > Drives**
 
-These tools help identify whether a particular workload or tier is experiencing contention, allowing administrators to make informed decisions about tier placement or capacity expansion.
+These tools help identify whether a particular tier is experiencing contention, allowing administrators to make informed decisions about tier placement or capacity expansion.
 
 ## Summary
 
-The vSAN's approach to I/O scheduling prioritizes fairness and simplicity:
-
-- **Fair queuing** automatically distributes I/O across all workloads without configuration
-- **Capacity-based throttling** protects tiers from filling up by progressively slowing writes at 91% and 96% utilization
+- **Hash-based block distribution** spreads I/O naturally across all drives in a tier without manual configuration
+- **Capacity-based throttling** adds 5 ms of write latency at 91% tier utilization and 50 ms at 96% to protect against full tiers
 - **Tier placement** is the primary tool for separating workload performance classes
-- **Per-VM and per-tenant I/O limits** are not available — use tier separation, guest-level controls, or resource right-sizing as alternatives
+- **Per-VM and per-tenant I/O limits** are not available — use tier separation, dedicated storage clusters, guest-level controls, or workload scheduling as alternatives