Commit 9d5b3ff

jasha26, Mahalaxmibejugam, ankitaluthra1, and suni72 authored
Write, List, Rename & Delete Microbenchmarks (#738)
* Adds GCSFS Microbenchmarks
* Fixing Block size and consistency options in Extended GCSFS Open (#34)
* Separate versioned and non-versioned tests to use different bucket
* Update cleanup logic in tests to empty the bucket instead of deleting the bucket
* Add consistency and blocksize for ext gcsfs
* Merge conflicts resolution
* bucket type paramet for fs test for block size and consistency
* removed unused mocks variables from some tests
* fixing lint errors
* fixed small issue with core tests so it can run with exp flag true

---------
Co-authored-by: Mahalaxmibejugam <60227368+Mahalaxmibejugam@users.noreply.github.com>

* pytest microbenchmarks for seq and random reads both single multi threaded
* multiprocess benchmarks
* script to run tests
* undo settings for bucket names and logs
* benchmark script updates
* file size and bucket type decorators
* file size configuration
* removed zonal config
* Added README
* Readme update
* Moving settings and fixture to tests root
* Readme update
* Readme update
* Ignore benchmark pytests in CI
* benchmark hook fix
* adding skip tests flag
* benchmark plugin conditional enablement
* Fixing PR Comments, simplifying the configuration by doing auto gen
* Fixing PR Comments
* default settings

---------
Co-authored-by: Mahalaxmibejugam <60227368+Mahalaxmibejugam@users.noreply.github.com>

* resolves comments and adds perf micro profiler
* Fixing Block size and consistency options in Extended GCSFS Open (#34)
* Separate versioned and non-versioned tests to use different bucket
* Update cleanup logic in tests to empty the bucket instead of deleting the bucket
* Add consistency and blocksize for ext gcsfs
* Merge conflicts resolution
* bucket type paramet for fs test for block size and consistency
* removed unused mocks variables from some tests
* fixing lint errors
* fixed small issue with core tests so it can run with exp flag true

---------
Co-authored-by: Mahalaxmibejugam <60227368+Mahalaxmibejugam@users.noreply.github.com>

* pytest microbenchmarks for seq and random reads both single multi threaded
* multiprocess benchmarks
* script to run tests
* undo settings for bucket names and logs
* benchmark script updates
* file size and bucket type decorators
* file size configuration
* removed zonal config
* Added README
* Readme update
* Moving settings and fixture to tests root
* Readme update
* Readme update
* Ignore benchmark pytests in CI
* benchmark hook fix
* adding skip tests flag
* benchmark plugin conditional enablement
* Fixing PR Comments, simplifying the configuration by doing auto gen
* Fixing PR Comments
* default settings
* added resource monitoring for benchmarks
* minor refactoring
* moved config to yaml
* lint fixes
* config yaml update for full read suite
* undo zonal file logging changes
* simplify single threaded read
* bringing back requirements
* update readme
* csv generation fix when some tests fail
* psutil install in cloudbuild
* psutil install in cloudbuild
* Removing zonal conditional code and updating config

---------
Co-authored-by: Mahalaxmibejugam <60227368+Mahalaxmibejugam@users.noreply.github.com>

* Parallel File Upload - Microbenchmark Setup (#54)
* Fixing Block size and consistency options in Extended GCSFS Open (#34)
* Separate versioned and non-versioned tests to use different bucket
* Update cleanup logic in tests to empty the bucket instead of deleting the bucket
* Add consistency and blocksize for ext gcsfs
* Merge conflicts resolution
* bucket type paramet for fs test for block size and consistency
* removed unused mocks variables from some tests
* fixing lint errors
* fixed small issue with core tests so it can run with exp flag true

---------
Co-authored-by: Mahalaxmibejugam <60227368+Mahalaxmibejugam@users.noreply.github.com>

* pytest microbenchmarks for seq and random reads both single multi threaded
* multiprocess benchmarks
* script to run tests
* undo settings for bucket names and logs
* benchmark script updates
* file size and bucket type decorators
* file size configuration
* removed zonal config
* Added README
* Readme update
* Moving settings and fixture to tests root
* Readme update
* Readme update
* Ignore benchmark pytests in CI
* benchmark hook fix
* adding skip tests flag
* benchmark plugin conditional enablement
* Fixing PR Comments, simplifying the configuration by doing auto gen
* Fixing PR Comments
* default settings
* added resource monitoring for benchmarks
* minor refactoring
* moved config to yaml
* lint fixes
* config yaml update for full read suite
* undo zonal file logging changes
* simplify single threaded read
* bringing back requirements
* update readme
* csv generation fix when some tests fail
* psutil install in cloudbuild
* psutil install in cloudbuild
* Removing zonal conditional code and updating config
* parallel file creation in setup
* merge conflicts
* merge conflicts - lint fixes
* lint issues

---------
Co-authored-by: Mahalaxmibejugam <60227368+Mahalaxmibejugam@users.noreply.github.com>

* write benchmarks
* Adds mv/rename method implementation for HNS buckets (#727)
* Add mv method for HNS aware implementation
* enable HNS tests in the CI pipeline
* Update super mv call to pass all arguments
* Add more unit tests and make info call optional
* add TODO
* refactor tests and address comments
* address comments
* add debug logs
* update cache logic for rename method
* add TODO
* feat(zb-write): Support write mode in Zonal File (#726)
* Support write mode in ZonalFile and override related methods
* Add unit test for init_aaow method in zb_hns_utils
* Implement zonal write methods
* Remove overwrite parameter since it is not added in AAOW. instead use generation=None to overwrite
* Remove redundant init_aaow method from zonal_file
* update statement for NotImplementedErrors
* set autocommit=false as default for ZonalFile
* add logic to route upload methods to zonal implementation
* Revert "set autocommit=false as default for ZonalFile" (this reverts commit d51c8c3)
* add logic to skip test in test_file itself
* Added comments for clarity
* Added tests for Zonal writes; Close GCSFile before aaow/mrd
* Do not finalize files on close by default; Use simple_flush instead of normal flush; add comments for clarity
* Update requirements to use python-storage 3.7.0; update ci pipeline to run test_zonal_file also
* Update ulimit to 2048 to avoid too many open files error
* Update tests to finalize the object before reading
* Support append mode in ZonalFile
* try removing finalized file if it already exists
* Fix lint error
* fix failing hns rename tests
* Update assertions in mv tests (touch method's side effect shouldn't be counted in the assertion of the rename test)

---------
Co-authored-by: Mahalaxmi <mahalaxmib@google.com>

* write test random chunk on every write
* removed zonal conditional code
* Adds mv/rename method implementation for HNS buckets (#727)
* Add mv method for HNS aware implementation
* enable HNS tests in the CI pipeline
* Update super mv call to pass all arguments
* Add more unit tests and make info call optional
* add TODO
* refactor tests and address comments
* address comments
* add debug logs
* update cache logic for rename method
* add TODO
* Listing benchmarking
* Listing benchmarking fixes
* Listing benchmarking scenarios
* refactoring benchmarking code
* Testing after refactoring
* Fixing listing yaml
* refactor file prep
* rename and delete benchmarks
* renaming threads, processes and files fields
* README update and yaml changes
* refactoring read, write setup
* Update read and write config
* fixing zonal write issue and adding data integrity check
* Added unit tests for benchmarks helpers
* Integration test failure fixes
* Integration test failure fixes

---------
Co-authored-by: Mahalaxmibejugam <60227368+Mahalaxmibejugam@users.noreply.github.com>
Co-authored-by: Ankita Luthra <lankita@google.com>
Co-authored-by: suni72 <59387263+suni72@users.noreply.github.com>
Co-authored-by: Mahalaxmi <mahalaxmib@google.com>
1 parent 3ed414a commit 9d5b3ff

32 files changed: +1716 additions, -436 deletions

.isort.cfg

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,3 +1,3 @@
 [settings]
 profile = black
-known_third_party = aiohttp,click,conftest,decorator,fsspec,fuse,google,google_auth_oauthlib,numpy,prettytable,psutil,pytest,pytest_asyncio,requests,resource_monitor,setuptools,yaml
+known_third_party = aiohttp,click,decorator,fsspec,fuse,google,google_auth_oauthlib,numpy,prettytable,psutil,pytest,pytest_asyncio,requests,resource_monitor,setuptools,yaml
```

cloudbuild/e2e-tests-cloudbuild.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -98,7 +98,7 @@ steps:
 
 pip install --upgrade pip > /dev/null
 # Install testing libraries explicitly, as they are not in setup.py
-pip install pytest pytest-timeout pytest-subtests pytest-asyncio fusepy google-cloud-storage psutil PyYAML > /dev/null
+pip install pytest pytest-timeout pytest-subtests pytest-asyncio fusepy google-cloud-storage psutil PyYAML numpy prettytable > /dev/null
 pip install -e . > /dev/null
 "
 
```

gcsfs/tests/perf/microbenchmarks/README.md

Lines changed: 73 additions & 172 deletions
````diff
@@ -2,222 +2,123 @@
 
 ## Introduction
 
-This document describes the microbenchmark suite for `gcsfs`. These benchmarks are designed to measure the performance
-of various I/O operations under different conditions. They are built using `pytest` and the `pytest-benchmark` plugin to
-provide detailed performance metrics for single-threaded, multi-threaded, and multi-process scenarios.
+GCSFS microbenchmarks are a suite of performance tests designed to evaluate the efficiency and latency of various Google Cloud Storage file system operations, including read, write, listing, delete, and rename.
 
-## Prerequisites
+These benchmarks are built using the `pytest` and `pytest-benchmark` frameworks. Each benchmark test is a parameterized pytest case, where the parameters are dynamically configured at runtime from YAML configuration files. This allows for flexible and extensive testing scenarios without modifying the code.
 
-Before running the benchmarks, ensure you have installed the project's dependencies for performance testing. This can be
-done by running the following command from the root of the repository:
+An orchestrator script (`run.py`) is provided to execute specific or all benchmarks, manage the test environment, and generate detailed reports in CSV format along with a summary table.
 
-```bash
-pip install -r gcsfs/tests/perf/microbenchmarks/requirements.txt
-```
-
-This will install `pytest`, `pytest-benchmark`, and other necessary dependencies.
-For more information on `pytest-benchmark`, you can refer to its official documentation. [1]
-
-## Read Benchmarks
-
-The read benchmarks are located in `gcsfs/tests/perf/microbenchmarks/read/` and are designed to test read performance
-with various configurations.
-
-### Parameters
-
-The read benchmarks are defined by the `ReadBenchmarkParameters` class in `read/parameters.py`. Key parameters include:
-
-* `name`: The name of the benchmark configuration.
-* `num_files`: The number of files to use, this is always num_processes x num_threads.
-* `pattern`: Read pattern, either sequential (`seq`) or random (`rand`).
-* `num_threads`: Number of threads for multi-threaded tests.
-* `num_processes`: Number of processes for multi-process tests.
-* `block_size_bytes`: The block size for gcsfs file buffering. Defaults to `16MB`.
-* `chunk_size_bytes`: The size of each read operation. Defaults to `16MB`.
-* `file_size_bytes`: The total size of each file.
-* `rounds`: The total number of pytest-benchmark rounds for each parameterized test. Defaults to `10`.
-
-To ensure that the results are stable and not skewed by outliers, each benchmark is run for a set number of rounds.
-By default, this is set to 10 rounds, but it can be configured via `rounds` parameter if needed. This helps in providing
-a more accurate and reliable performance profile.
+## How to install
 
-### Configurations
-
-The base configurations in `read/configs.yaml` are simplified to just `read_seq` and `read_rand`. Decorators are then
-used to generate a full suite of test cases by creating variations for parallelism, file sizes, and bucket types.
-
-The benchmarks are split into three main test functions based on the execution model:
-
-* `test_read_single_threaded`: Measures baseline performance of read operations.
-* `test_read_multi_threaded`: Measures performance with multiple threads.
-* `test_read_multi_process`: Measures performance using multiple processes, each with its own set of threads.
-
-### Running Benchmarks with `pytest`
-
-You can use `pytest` to run the benchmarks directly.
-The `GCSFS_BENCHMARK_FILTER` option is useful for filtering tests by name.
-
-**Examples:**
-
-Run all read benchmarks:
+To run the microbenchmarks, you need to install the required dependencies. You can do this using pip:
 
 ```bash
-pytest gcsfs/tests/perf/microbenchmarks/read/
+pip install -r requirements.txt
 ```
 
-Run a specific benchmark(s) configuration by setting `GCSFS_BENCHMARK_FILTER` environment variable which expect comma
-separated configuration names.
-This is useful for targeting specific configuration(s) defined in `read/configs.yaml`.
+Ensure you have the necessary Google Cloud credentials set up to access the GCS buckets used in the tests.
 
-For example, if you want to run multi process sequential and random reads only, you can set:
+## Parameters
 
-```bash
-export GCSFS_BENCHMARK_FILTER="read_seq_multi_process, read_rand_multi_process"
-pytest gcsfs/tests/perf/microbenchmarks/read/
-```
+The benchmarks use a set of parameter classes to define the configuration for each test case.
 
-## Function-level Fixture: `gcsfs_benchmark_read_write`
+* **Base Parameters**: Common to all benchmarks.
+  * `name`: Unique name for the benchmark case.
+  * `bucket_name`: The GCS bucket used.
+  * `bucket_type`: Type of bucket (regional, zonal, hns).
+  * `threads`: Number of threads.
+  * `processes`: Number of processes.
+  * `files`: Number of files involved.
+  * `rounds`: Number of iterations for the benchmark.
 
-A function-level `pytest` fixture named `gcsfs_benchmark_read_write` (defined in `conftest.py`) is used to set up and
-tear down the environment for the benchmarks.
+* **IO Parameters**: Common to Read and Write operations.
+  * `file_size_bytes`: Size of the file.
+  * `chunk_size_bytes`: Size of chunks for I/O operations.
 
-### Setup and Teardown
+* **Read Parameters**: Specific to Read operations (extends IO Parameters).
+  * `pattern`: Read pattern ("seq" for sequential, "rand" for random).
+  * `block_size_bytes`: Block size for GCSFS file buffering.
 
-* **Setup**: Before a benchmark function runs, this fixture creates the specified number of files with the configured
-  size in a temporary directory within the test bucket. It uses `os.urandom()` to write data in chunks to avoid high
-  memory usage.
-* **Teardown**: After the benchmark completes, the fixture recursively deletes the temporary directory and all the files
-  created during the setup phase.
+* **Listing Parameters**: Specific to Listing, Delete, and Rename operations.
+  * `depth`: Directory depth.
+  * `folders`: Number of folders.
+  * `pattern`: Listing pattern (e.g., "ls", "find").
 
-Here is how the fixture is used in a test:
+## Configuration
 
-```python
-@pytest.mark.parametrize(
-    "gcsfs_benchmark_read_write",
-    single_threaded_cases,
-    indirect=True,
-    ids=lambda p: p.name,
-)
-def test_read_single_threaded(benchmark, gcsfs_benchmark_read_write):
-    gcs, file_paths, params = gcsfs_benchmark_read_write
-    # ... benchmark logic ...
-```
+Configuration values are stored in YAML files (e.g., `configs.yaml`) located within each benchmark's directory. These files define:
 
-### Environment Variables
+* **Common**: Shared settings like bucket types, file sizes, or rounds.
+* **Scenarios**: Specific test scenarios defining variations in threads, processes, patterns, etc.
 
-To run the benchmarks, you need to configure your environment.
-The orchestrator script (`run.py`) sets these for you, but if you are running `pytest` directly, you will need to export
-them.
+## Configurators
 
-* `GCSFS_TEST_BUCKET`: The name of a regional GCS bucket.
-* `GCSFS_ZONAL_TEST_BUCKET`: The name of a zonal GCS bucket.
-* `GCSFS_HNS_TEST_BUCKET`: The name of an HNS-enabled GCS bucket.
+Configurators are Python classes (e.g., `ReadConfigurator`, `ListingConfigurator`) responsible for parsing the YAML configuration files and converting them into a list of parameter objects (`BenchmarkParameters`). These objects are then consumed by the test files to generate parameterized test cases.
 
-You must also set the following environment variables to ensure that the benchmarks run against the live GCS API and
-that experimental features are enabled.
+## Benchmark File
 
-```bash
-export STORAGE_EMULATOR_HOST="https://storage.googleapis.com"
-export GCSFS_EXPERIMENTAL_ZB_HNS_SUPPORT="true"
-```
+The benchmark files (e.g., `test_read.py`, `test_listing.py`) contain the actual test logic. They call the respective configurator to retrieve the list of benchmark cases (parameters).
 
-## Orchestrator Script (`run.py`)
+Each test function is decorated with `@pytest.mark.parametrize` to run multiple variations based on the generated parameters. The benchmarks support three execution modes:
 
-An orchestrator script, `run.py`, is provided to simplify running the benchmark suite. It wraps `pytest`, sets up the
-necessary environment variables, and generates a summary report.
+1. **Single-threaded**: Runs the operation in the main thread.
+2. **Multi-threaded**: Uses `ThreadPoolExecutor` to run operations concurrently within a single process.
+3. **Multi-process**: Uses `multiprocessing` to run operations across multiple processes, each potentially using multiple threads.
 
-### Parameters
+## Orchestrator Script
 
-The script accepts several command-line arguments:
+The `run.py` script is the central entry point for executing benchmarks. It handles environment setup, test execution via `pytest`, and report generation.
 
-* `--group`: The benchmark group to run (e.g., `read`).
-* `--config`: The name of a specific benchmark configuration to run (e.g., `read_seq`).
-* `--regional-bucket`: Name of the Regional GCS bucket.
-* `--zonal-bucket`: Name of the Zonal GCS bucket.
-* `--hns-bucket`: Name of the HNS GCS bucket.
-* `--log`: Set to `true` to enable `pytest` console logging.
-* `--log-level`: Sets the log level (e.g., `INFO`, `DEBUG`).
+### Command Line Options
 
-**Important Notes:**
+| Option | Description | Required |
+| :--- | :--- | :--- |
+| `--group` | The benchmark group to run (e.g., `read`, `write`, `listing`). Runs all groups if not specified. | No |
+| `--config` | Specific scenario names to run (e.g., `read_seq`, `list_flat`). Accepts multiple values. | No |
+| `--regional-bucket` | Name of the regional GCS bucket. | Yes* |
+| `--zonal-bucket` | Name of the zonal GCS bucket. | Yes* |
+| `--hns-bucket` | Name of the HNS GCS bucket. | Yes* |
+| `--log` | Enable console logging (`true` or `false`). Default: `false`. | No |
+| `--log-level` | Logging level (e.g., `INFO`, `DEBUG`). Default: `DEBUG`. | No |
 
-* You must provide at least one bucket name (`--regional-bucket`, `--zonal-bucket`, or `--hns-bucket`).
+*\* At least one bucket type must be provided.*
 
-Run the script with `--help` to see all available options:
+### Usage Examples
 
+**1. Run all benchmarks**
+Runs every available benchmark against a regional bucket.
 ```bash
-python gcsfs/tests/perf/microbenchmarks/run.py --help
+python gcsfs/tests/perf/microbenchmarks/run.py --regional-bucket=<BUCKET_NAME>
 ```
 
-### Examples
-
-Here are some examples of how to use the orchestrator script from the root of the `gcsfs` repository:
-
-Run all available benchmarks against a regional bucket with default settings. This is the simplest way to trigger all
-tests across all groups (e.g., read, write):
-
+**2. Run a specific group**
+Runs only the tests in the `read` directory.
 ```bash
-python gcsfs/tests/perf/microbenchmarks/run.py --regional-bucket your-regional-bucket
+python gcsfs/tests/perf/microbenchmarks/run.py --group=read --regional-bucket=<BUCKET_NAME>
 ```
 
-Run only the `read` group benchmarks against a regional bucket with the default 128MB file size:
-
+**3. Run specific scenarios**
+Runs only the scenarios named `read_seq` and `read_rand`. This is useful for targeting specific configurations defined in the YAML files.
 ```bash
-python gcsfs/tests/perf/microbenchmarks/run.py --group read --regional-bucket your-regional-bucket
+python gcsfs/tests/perf/microbenchmarks/run.py --config=read_seq,read_rand --regional-bucket=<BUCKET_NAME>
 ```
 
-Run only the single-threaded sequential read benchmark with 256MB and 512MB file sizes:
-
+**4. Run with multiple bucket types**
+Runs benchmarks against both regional and zonal buckets.
 ```bash
-python gcsfs/tests/perf/microbenchmarks/run.py \
-  --group read \
-  --config "read_seq" \
-  --regional-bucket your-regional-bucket
+python gcsfs/tests/perf/microbenchmarks/run.py --group=write --regional-bucket=<REGIONAL_BUCKET> --zonal-bucket=<ZONAL_BUCKET>
 ```
 
-Run all read benchmarks against both a regional and a zonal bucket:
-
+**5. Run with logging enabled**
+Enables detailed logging to the console during execution.
 ```bash
-python gcsfs/tests/perf/microbenchmarks/run.py \
-  --group read \
-  --regional-bucket your-regional-bucket \
-  --zonal-bucket your-zonal-bucket
+python gcsfs/tests/perf/microbenchmarks/run.py --group=delete --regional-bucket=<BUCKET_NAME> --log=true --log-level=INFO
 ```
 
-### Script Output
-
-The script will create a timestamped directory in `gcsfs/tests/perf/microbenchmarks/__run__/` containing the JSON and
-CSV results, and it will print a summary table to the console.
-
-#### JSON File (`results.json`)
-
-The `results.json` file will contain a structured representation of the benchmark results.
-The exact content can vary depending on the pytest-benchmark version and the tests run, but it typically includes:
-
-* machine_info: Details about the system where the benchmarks were run (e.g., Python version, OS, CPU).
-* benchmarks: A list of individual benchmark results, each containing:
-  * name: The name of the benchmark test.
-  * stats: Performance statistics like min, max, mean, stddev, rounds, iterations, ops (operations per second), q1,
-    q3 (quartiles).
-  * options: Configuration options used for the benchmark (e.g., min_rounds, max_time).
-  * extra_info: Any additional information associated with the benchmark.
-
-#### CSV File (`results.csv`)
-
-The CSV file provides a detailed performance profile of gcsfs operations, allowing for analysis of how different factors
-like threading, process parallelism, and access patterns affect I/O throughput.
-This file is a summarized view of the results generated in the JSON file and for each test run, the file records
-detailed performance statistics, including:
-
-* Minimum, maximum, mean, and median execution times in secs.
-* Standard deviation and percentile values (p90, p95, p99) for timing.
-* The maximum throughput achieved, measured in Megabytes per second (MB/s).
-* The maximum CPU and memory used during the test
-
-#### Summary Table
+## Output
 
-The script also puts out a nice summary table like below, for quick glance at results.
+The orchestrator script generates output in a structured format:
 
-| Bucket Type | Group | Pattern | Files | Threads | Processes | File Size (MB) | Chunk Size (MB) | Block Size (MB) | Min Latency (s) | Mean Latency (s) | Max Throughput (MB/s) | Max CPU (%) | Max Memory (MB) |
-| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
-| regional | read | seq | 1 | 1 | 1 | 128.00 | 16.00 | 16.00 | 0.6391 | 0.7953 | 200.2678 | 0.26 | 507
-| regional | read | rand | 1 | 1 | 1 | 128.00 | 16.00 | 16.00 | 0.6537 | 0.7843 | 195.8066 | 5.6 | 510
+* **Directory**: Results are saved in a timestamped folder under `__run__` (e.g., `__run__/DDMMYYYY-HHMMSS/`).
+* **JSON**: A raw JSON file generated by `pytest-benchmark` containing detailed statistics.
+* **CSV**: A processed CSV report containing key metrics such as min/max/mean latency, throughput, and resource usage (CPU, Memory).
````
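
The README above describes the `configs.yaml` layout only abstractly (a `common` section plus a `scenarios` list). For illustration, a config consumed by the configurator added below might look like the following sketch: the `common`/`scenarios` split, the case-insensitive matching on each scenario's `name`, and the `read_seq`/`read_rand` names come from this commit, while the remaining keys are assumptions, not the literal file contents.

```yaml
# Hypothetical configs.yaml sketch -- keys other than common/scenarios/name
# are illustrative stand-ins, not copied from the repository.
common:
  bucket_types: [regional, zonal]  # one case per bucket type (assumed key)
  rounds: 10                       # pytest-benchmark rounds per case
  file_size_bytes: 134217728       # 128 MB

scenarios:
  - name: read_seq                 # matched case-insensitively by the benchmark filter
    pattern: seq
    threads: 1
    processes: 1
  - name: read_rand
    pattern: rand
    threads: 4
    processes: 2
```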
Lines changed: 51 additions & 0 deletions
```diff
@@ -0,0 +1,51 @@
+import logging
+import os
+
+import yaml
+
+from gcsfs.tests.conftest import BUCKET_NAME_MAP
+from gcsfs.tests.settings import BENCHMARK_FILTER
+
+
+class BaseBenchmarkConfigurator:
+    def __init__(self, module_file):
+        self.config_path = os.path.join(os.path.dirname(module_file), "configs.yaml")
+
+    def _load_config(self):
+        with open(self.config_path, "r") as f:
+            config = yaml.safe_load(f)
+
+        common = config["common"]
+        scenarios = config["scenarios"]
+
+        if BENCHMARK_FILTER:
+            filter_names = [
+                name.strip().lower() for name in BENCHMARK_FILTER.split(",")
+            ]
+            scenarios = [s for s in scenarios if s["name"].lower() in filter_names]
+
+        return common, scenarios
+
+    def get_bucket_name(self, bucket_type):
+        return BUCKET_NAME_MAP.get(bucket_type)
+
+    def generate_cases(self):
+        common_config, scenarios = self._load_config()
+        all_cases = []
+
+        for scenario in scenarios:
+            cases = self.build_cases(scenario, common_config)
+            all_cases.extend(cases)
+
+        if all_cases:
+            logging.info(
+                f"Benchmark cases to be triggered: {', '.join([case.name for case in all_cases])}"
+            )
+        return all_cases
+
+    def build_cases(self, scenario, common_config):
+        """
+        Abstract method to be implemented by subclasses.
+        Should return a list of BenchmarkParameters objects.
+        """
+        raise NotImplementedError
```
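
The concrete configurators (e.g., `ReadConfigurator`, `ListingConfigurator`) are not shown in this view, so the following is only a minimal sketch of the `build_cases` contract; `ListingParameters` and every field and YAML key below are hypothetical stand-ins for the commit's real `BenchmarkParameters` classes.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the real parameter classes in this commit.
@dataclass
class ListingParameters:
    name: str
    bucket_name: str
    bucket_type: str
    threads: int
    processes: int
    depth: int
    folders: int


# Assumes BaseBenchmarkConfigurator (the new class above) is importable here.
class ListingConfigurator(BaseBenchmarkConfigurator):
    def build_cases(self, scenario, common_config):
        # Expand one YAML scenario into one concrete case per bucket type,
        # falling back to defaults when a per-scenario field is unset.
        cases = []
        for bucket_type in common_config.get("bucket_types", ["regional"]):
            cases.append(
                ListingParameters(
                    name=f"{scenario['name']}_{bucket_type}",
                    bucket_name=self.get_bucket_name(bucket_type),
                    bucket_type=bucket_type,
                    threads=scenario.get("threads", 1),
                    processes=scenario.get("processes", 1),
                    depth=scenario.get("depth", 1),
                    folders=scenario.get("folders", 10),
                )
            )
        return cases
```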

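On the test side, the README states that each benchmark file asks its configurator for cases and parametrizes over them. A rough sketch of that wiring follows, reusing the hypothetical `ListingConfigurator` above; aside from pytest-benchmark's standard `benchmark` fixture, every name here (the `gcs` fixture, the object path) is illustrative.

```python
import pytest

# Hypothetical wiring -- mirrors the @pytest.mark.parametrize pattern the
# README describes; generate_cases() already applies the scenario filter.
cases = ListingConfigurator(__file__).generate_cases()


@pytest.mark.parametrize("params", cases, ids=lambda p: p.name)
def test_listing(benchmark, params, gcs):
    # benchmark() re-runs the callable for the configured rounds and records
    # the latency statistics that feed the JSON/CSV reports.
    benchmark(gcs.ls, f"{params.bucket_name}/benchmark-data")
```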