Skip to content

Comments

Add I/O statistics on Linux#5

Open
MitchLewis930 wants to merge 1 commit intopr_015_beforefrom
pr_015_after
Open

Add I/O statistics on Linux#5
MitchLewis930 wants to merge 1 commit intopr_015_beforefrom
pr_015_after

Conversation

@MitchLewis930
Copy link

@MitchLewis930 MitchLewis930 commented Jan 29, 2026

User description

PR_015


PR Type

Enhancement


Description

  • Add I/O statistics collection from /proc/diskstats on Linux

  • Track device read/write operations and sectors per block device

  • Calculate delta metrics between consecutive probe intervals

  • Expose aggregated and per-device I/O stats in filesystem monitoring


Diagram Walkthrough

flowchart LR
  A["ESFileStore"] -->|"Extract device numbers"| B["NodePath"]
  B -->|"Store major/minor device IDs"| C["FsProbe"]
  C -->|"Read /proc/diskstats"| D["DeviceStats"]
  D -->|"Calculate deltas"| E["IoStats"]
  E -->|"Aggregate metrics"| F["FsInfo"]
  F -->|"Expose via API"| G["Nodes Stats Response"]
Loading

File Walkthrough

Relevant files
Enhancement
5 files
ESFileStore.java
Extract device numbers from mountinfo                                       
+25/-4   
NodeEnvironment.java
Store major and minor device numbers                                         
+7/-1     
FsInfo.java
Add DeviceStats and IoStats classes                                           
+243/-8 
FsProbe.java
Implement I/O statistics collection from diskstats             
+79/-3   
FsService.java
Pass previous stats for delta calculation                               
+28/-17 
Formatting
1 files
MonitorService.java
Remove unnecessary comments and formatting                             
+1/-7     
Tests
4 files
DiskUsageTests.java
Update FsInfo constructor calls with IoStats                         
+6/-6     
DeviceStatsTests.java
Add unit tests for DeviceStats calculations                           
+62/-0   
FsProbeTests.java
Add tests for I/O statistics collection                                   
+120/-1 
MockInternalClusterInfoService.java
Update FsInfo constructor call with IoStats                           
+1/-1     
Configuration changes
1 files
security.policy
Grant read permission for /proc/diskstats                               
+3/-0     
Documentation
1 files
nodes-stats.asciidoc
Document new I/O statistics fields                                             
+49/-0   

This commit adds a variety of real disk metrics for the block devices
that back Elasticsearch data paths. A collection of statistics are read
from /proc/diskstats and are used to report the raw metrics for
operations and read/write bytes.

Relates elastic#15915
@qodo-code-review
Copy link

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Sensitive info exposure

Description: The new fs.io_stats output (including per-device device_name plus aggregated I/O totals)
may expose host-level device topology/identifiers and activity patterns to any principal
that can access the Nodes Stats API, which can be sensitive in shared/hosted environments.

FsInfo.java [314-496]

Referred Code
    public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
        builder.field("device_name", deviceName);
        builder.field(IoStats.OPERATIONS, operations());
        builder.field(IoStats.READ_OPERATIONS, readOperations());
        builder.field(IoStats.WRITE_OPERATIONS, writeOperations());
        builder.field(IoStats.READ_KILOBYTES, readKilobytes());
        builder.field(IoStats.WRITE_KILOBYTES, writeKilobytes());
        return builder;
    }

}

public static class IoStats implements Writeable, ToXContent {

    private static final String OPERATIONS = "operations";
    private static final String READ_OPERATIONS = "read_operations";
    private static final String WRITE_OPERATIONS = "write_operations";
    private static final String READ_KILOBYTES = "read_kilobytes";
    private static final String WRITE_KILOBYTES = "write_kilobytes";

    final DeviceStats[] devicesStats;


 ... (clipped 162 lines)
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

🔴
Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Missing edge validation: Parsing of /proc/diskstats assumes fixed field counts and directly indexes fields[...]
without bounds/format checks, risking runtime exceptions on unexpected kernel/format
variations.

Referred Code
List<String> lines = readProcDiskStats();
if (!lines.isEmpty()) {
    for (String line : lines) {
        String fields[] = line.trim().split("\\s+");
        final int majorDeviceNumber = Integer.parseInt(fields[0]);
        final int minorDeviceNumber = Integer.parseInt(fields[1]);
        if (!devicesNumbers.contains(Tuple.tuple(majorDeviceNumber, minorDeviceNumber))) {
            continue;
        }
        final String deviceName = fields[2];
        final long readsCompleted = Long.parseLong(fields[3]);
        final long sectorsRead = Long.parseLong(fields[5]);
        final long writesCompleted = Long.parseLong(fields[7]);
        final long sectorsWritten = Long.parseLong(fields[9]);
        final FsInfo.DeviceStats deviceStats =
                new FsInfo.DeviceStats(
                        majorDeviceNumber,
                        minorDeviceNumber,
                        deviceName,
                        readsCompleted,
                        sectorsRead,


 ... (clipped 4 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Unvalidated proc parsing: /proc/self/mountinfo lines are split and indexed (e.g., fields[2], fields[4]) without
validating field presence/format, which can lead to exceptions or incorrect device number
extraction.

Referred Code
try {
    final List<String> lines = Files.readAllLines(PathUtils.get("/proc/self/mountinfo"));
    for (final String line : lines) {
        final String[] fields = line.trim().split("\\s+");
        final String mountPoint = fields[4];
        if (mountPoint.equals(getMountPointLinux(in))) {
            final String[] deviceNumbers = fields[2].split(":");
            majorDeviceNumber = Integer.parseInt(deviceNumbers[0]);
            minorDeviceNumber = Integer.parseInt(deviceNumbers[1]);
        }
    }

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status:
Confusing parameter order: DeviceStats's public constructor forwards parameters in a different order than its
signature (e.g., currentSectorsRead vs currentSectorsWritten), which harms readability and
can lead to misuse.

Referred Code
public DeviceStats(
        final int majorDeviceNumber,
        final int minorDeviceNumber,
        final String deviceName,
        final long currentReadsCompleted,
        final long currentSectorsRead,
        final long currentWritesCompleted,
        final long currentSectorsWritten,
        final DeviceStats previousDeviceStats) {
    this(
            majorDeviceNumber,
            minorDeviceNumber,
            deviceName,
            currentReadsCompleted,
            previousDeviceStats != null ? previousDeviceStats.currentReadsCompleted : -1,
            currentSectorsWritten,
            previousDeviceStats != null ? previousDeviceStats.currentSectorsWritten : -1,
            currentSectorsRead,
            previousDeviceStats != null ? previousDeviceStats.currentSectorsRead : -1,
            currentWritesCompleted,
            previousDeviceStats != null ? previousDeviceStats.currentWritesCompleted : -1);


 ... (clipped 1 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix incorrect constructor argument order

Fix a bug in the DeviceStats public constructor by reordering the arguments
passed to the private constructor to match its signature, ensuring correct I/O
statistics calculation.

core/src/main/java/org/elasticsearch/monitor/fs/FsInfo.java [205-226]

 public DeviceStats(
         final int majorDeviceNumber,
         final int minorDeviceNumber,
         final String deviceName,
         final long currentReadsCompleted,
         final long currentSectorsRead,
         final long currentWritesCompleted,
         final long currentSectorsWritten,
         final DeviceStats previousDeviceStats) {
     this(
             majorDeviceNumber,
             minorDeviceNumber,
             deviceName,
             currentReadsCompleted,
             previousDeviceStats != null ? previousDeviceStats.currentReadsCompleted : -1,
-            currentSectorsWritten,
-            previousDeviceStats != null ? previousDeviceStats.currentSectorsWritten : -1,
+            currentWritesCompleted,
+            previousDeviceStats != null ? previousDeviceStats.currentWritesCompleted : -1,
             currentSectorsRead,
             previousDeviceStats != null ? previousDeviceStats.currentSectorsRead : -1,
-            currentWritesCompleted,
-            previousDeviceStats != null ? previousDeviceStats.currentWritesCompleted : -1);
+            currentSectorsWritten,
+            previousDeviceStats != null ? previousDeviceStats.currentSectorsWritten : -1);
 }
  • Apply / Chat
Suggestion importance[1-10]: 9

__

Why: This suggestion correctly identifies a critical bug where constructor arguments are passed in the wrong order, which would lead to incorrect I/O statistics. Fixing this is essential for the feature's correctness.

High
Use last value for stats calculation

In FsInfoCache.refresh(), use the last cached value instead of the initialValue
for calculating I/O stats. This ensures stats represent the delta between
refreshes, not a cumulative value since startup.

core/src/main/java/org/elasticsearch/monitor/fs/FsService.java [67-81]

 private class FsInfoCache extends SingleObjectCache<FsInfo> {
-
-    private final FsInfo initialValue;
 
     public FsInfoCache(TimeValue interval, FsInfo initialValue) {
         super(interval, initialValue);
-        this.initialValue = initialValue;
     }
 
     @Override
     protected FsInfo refresh() {
-        return stats(probe, initialValue, logger);
+        return stats(probe, get(), logger);
     }
 
 }
  • Apply / Chat
Suggestion importance[1-10]: 9

__

Why: This suggestion correctly identifies a critical logic flaw in the caching mechanism. The I/O stats would be cumulative since startup instead of per refresh interval, which defeats the purpose of periodic monitoring. This is a major bug in the feature's implementation.

High
Fix serialization and calculation logic

Fix a serialization bug in IoStats where total stats fields are read but not
written. Update the writeTo method to serialize these fields and adjust the
constructor logic for calculating totals.

core/src/main/java/org/elasticsearch/monitor/fs/FsInfo.java [341-360]

 public IoStats(final DeviceStats[] devicesStats) {
     this.devicesStats = devicesStats;
     long totalOperations = 0;
     long totalReadOperations = 0;
     long totalWriteOperations = 0;
     long totalReadKilobytes = 0;
     long totalWriteKilobytes = 0;
+    boolean hasPrevious = true;
     for (DeviceStats deviceStats : devicesStats) {
-        totalOperations += deviceStats.operations() != -1 ? deviceStats.operations() : 0;
-        totalReadOperations += deviceStats.readOperations() != -1 ? deviceStats.readOperations() : 0;
-        totalWriteOperations += deviceStats.writeOperations() != -1 ? deviceStats.writeOperations() : 0;
-        totalReadKilobytes += deviceStats.readKilobytes() != -1 ? deviceStats.readKilobytes() : 0;
-        totalWriteKilobytes += deviceStats.writeKilobytes() != -1 ? deviceStats.writeKilobytes() : 0;
+        if (deviceStats.operations() == -1) {
+            hasPrevious = false;
+        }
+        if (hasPrevious) {
+            totalOperations += deviceStats.operations();
+            totalReadOperations += deviceStats.readOperations();
+            totalWriteOperations += deviceStats.writeOperations();
+            totalReadKilobytes += deviceStats.readKilobytes();
+            totalWriteKilobytes += deviceStats.writeKilobytes();
+        }
     }
-    this.totalOperations = totalOperations;
-    this.totalReadOperations = totalReadOperations;
-    this.totalWriteOperations = totalWriteOperations;
-    this.totalReadKilobytes = totalReadKilobytes;
-    this.totalWriteKilobytes = totalWriteKilobytes;
+    this.totalOperations = hasPrevious ? totalOperations : -1;
+    this.totalReadOperations = hasPrevious ? totalReadOperations : -1;
+    this.totalWriteOperations = hasPrevious ? totalWriteOperations : -1;
+    this.totalReadKilobytes = hasPrevious ? totalReadKilobytes : -1;
+    this.totalWriteKilobytes = hasPrevious ? totalWriteKilobytes : -1;
 }
  • Apply / Chat
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a critical bug where total stats are read during deserialization but never written, which would cause EOFException. This is a significant correctness issue that needs to be fixed.

Medium
Initialize device numbers and break loop

Initialize majorDeviceNumber and minorDeviceNumber to -1 to ensure consistent
behavior for "not found" cases. Add a break statement to the loop for efficiency
after a match is found.

core/src/main/java/org/elasticsearch/env/ESFileStore.java [48-83]

-int majorDeviceNumber;
-int minorDeviceNumber;
+int majorDeviceNumber = -1;
+int minorDeviceNumber = -1;
 
 @SuppressForbidden(reason = "tries to determine if disk is spinning")
 // TODO: move PathUtils to be package-private here instead of 
 // public+forbidden api!
 ESFileStore(FileStore in) {
     this.in = in;
     Boolean spins;
     // Lucene's IOUtils.spins only works on Linux today:
     if (Constants.LINUX) {
         try {
             spins = IOUtils.spins(PathUtils.get(getMountPointLinux(in)));
         } catch (Exception e) {
             spins = null;
         }
         try {
             final List<String> lines = Files.readAllLines(PathUtils.get("/proc/self/mountinfo"));
             for (final String line : lines) {
                 final String[] fields = line.trim().split("\\s+");
                 final String mountPoint = fields[4];
                 if (mountPoint.equals(getMountPointLinux(in))) {
                     final String[] deviceNumbers = fields[2].split(":");
                     majorDeviceNumber = Integer.parseInt(deviceNumbers[0]);
                     minorDeviceNumber = Integer.parseInt(deviceNumbers[1]);
+                    break;
                 }
             }
         } catch (Exception e) {
-            majorDeviceNumber = -1;
-            minorDeviceNumber = -1;
+            // major/minor device numbers are already -1
         }
     } else {
         spins = null;
     }
     this.spins = spins;
 }
  • Apply / Chat
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies that majorDeviceNumber and minorDeviceNumber are not consistently initialized, which could lead to 0 being used as a default instead of the intended -1 for "not found". It also provides a useful optimization by adding a break to the loop.

Low
General
Close total JSON object

In IoStats.toXContent, add a builder.endObject() call after writing the fields
for the "total" object to produce valid JSON.

core/src/main/java/org/elasticsearch/monitor/fs/FsInfo.java [389-407]

 @Override
 public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
     if (devicesStats.length > 0) {
         builder.startArray("devices");
         ...
         builder.endArray();
         builder.startObject("total");
         builder.field(OPERATIONS, totalOperations);
         builder.field(READ_OPERATIONS, totalReadOperations);
         builder.field(WRITE_OPERATIONS, totalWriteOperations);
         builder.field(READ_KILOBYTES, totalReadKilobytes);
         builder.field(WRITE_KILOBYTES, totalWriteKilobytes);
+        builder.endObject();
     }
     return builder;
 }

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8

__

Why: The suggestion correctly points out that a builder.endObject() call is missing, which would result in malformed JSON output. This is a significant bug that would break consumers of this API endpoint.

Medium
  • More

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants