Enhance the capability of Bridgescaler for supporting tensors#23

Open
kevinyang-cky wants to merge 34 commits into NCAR:main from kevinyang-cky:main

Conversation

@kevinyang-cky
Collaborator

This PR addresses the following (probably not worth diving into each commit, as I reverted a couple of things during development; see the latest version of the files):

  • Support saving out and reading in distributed scalers for tensors: print_scaler_tensor() and read_scaler_tensor() in backend_tensor.py handle this.
  • Add a PyTorch library check: the basic idea is that if a user does not have PyTorch installed in the environment, Bridgescaler can still function properly. Errors are raised only if a user tries to use the distributed tensor scalers without PyTorch installed, or if the required version is not met.
  • Tensor placement in distributed_tensor.py: ensure input tensors and the subsequent fitting or transforming calculations stay on the same device.
  • Code optimization for distributed_tensor.py: remove the for-loop over channels and use vectorized operations instead.
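
As a side note, the optional-import guard described above can be sketched roughly like this. This is a minimal illustration only: require_torch is a hypothetical helper name, and the actual check in backend_tensor.py may be structured differently.

```python
# Minimal sketch of an optional PyTorch dependency guard.
# NOTE: require_torch() is a hypothetical helper for illustration;
# the real guard in backend_tensor.py may differ.
try:
    import torch
    TORCH_AVAILABLE = True
except ImportError:
    torch = None
    TORCH_AVAILABLE = False


def require_torch():
    """Raise a clear error when a tensor scaler is requested without PyTorch."""
    if not TORCH_AVAILABLE:
        raise ImportError(
            "PyTorch is required for the distributed tensor scalers. "
            "Install it (e.g., `pip install torch`) to use this feature."
        )
```

With a pattern like this, the rest of Bridgescaler keeps working without PyTorch, and only the tensor-scaler entry points need to call the guard.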

Unit tests passed, and I suggest CREDIT use this version of Bridgescaler going forward.

Here is an example of using the distributed scalers for tensors; happy to put it into the docs if @djgagne can point me to a place to include it.

import numpy as np
import pandas as pd
import torch

from bridgescaler.distributed_tensor import DStandardScalerTensor
from bridgescaler import print_scaler_tensor, read_scaler_tensor

# create synthetic data
x_1 = np.random.normal(0, 2.2, (20, 5, 4, 8))
x_2 = np.random.normal(1, 3.5, (25, 4, 8, 5))

# fitting and transform
dss_1_tensor = DStandardScalerTensor(channels_last=False)
dss_2_tensor = DStandardScalerTensor(channels_last=True)
dss_1_tensor.fit(torch.from_numpy(x_1))
dss_2_tensor.fit(torch.from_numpy(x_2))
dss_combined_tensor = dss_1_tensor + dss_2_tensor

dss_combined_tensor.transform(torch.from_numpy(x_1), channels_last=False)

# save out scalers and read back in
scaler_list = [dss_1_tensor, dss_2_tensor]
df = pd.DataFrame({"scalers": [print_scaler_tensor(s) for s in scaler_list]})
df.to_parquet("scalers.parquet")
df_new = pd.read_parquet("scalers.parquet")
scaler_objs = df_new["scalers"].apply(read_scaler_tensor)
total_scaler = scaler_objs.sum()

@kevinyang-cky kevinyang-cky requested a review from djgagne February 4, 2026 17:13
@djgagne
Collaborator

djgagne commented Feb 9, 2026

@kevinyang-cky Your changes all look good code-wise. For the documentation, can you add a separate file on the tensor scalers to the directory https://github.com/NCAR/bridgescaler/tree/main/doc/source as an .rst file? See the other doc files for examples.

@charlie-becker
Collaborator

This is coming together nicely!

However, I do not believe the ability to transform data with a different channel order is working correctly. Please see the following tests, which fail on my end (this code was added directly to the end of your test example above).

x3 = np.random.normal(0, 2.2, (20, 5, 44, 11))
x3_tensor = torch.from_numpy(x3)
x3_tensor.variable_names = ['a', 'b', 'c', 'd', 'e']

x4_tensor = torch.from_numpy(x3)
x4_tensor.variable_names = ['b', 'a', 'c', 'd', 'e'] # reverse the first and second channel dim

x3_transformed = total_scaler.transform(x3_tensor)
x4_transformed = total_scaler.transform(x4_tensor)

assert (x3_transformed[:, 2:, :, :] == x4_transformed[:, 2:, :, :]).all() ## passes
assert (x3_transformed[:, 0, :, :] == x4_transformed[:, 1, :, :]).all()   ## fails

@kevinyang-cky
Collaborator Author

kevinyang-cky commented Feb 13, 2026

Thanks @charlie-becker for the testing feedback. I think the issue is that x4_tensor's variable_names are reordered, but the underlying data is still in the same order as x3_tensor. Try changing x4_tensor = torch.from_numpy(x3) to x4_tensor = torch.from_numpy(x3[:, [1, 0, 2, 3, 4], :, :]), and the test should pass. Let me know if you still run into issues.
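
The underlying point (that the data must be permuted along with the names) can be checked with plain NumPy, independent of Bridgescaler:

```python
import numpy as np

rng = np.random.default_rng(0)
x3 = rng.normal(0, 2.2, (20, 5, 44, 11))

# Swapping only the variable_names list does not move any data.
# To actually swap channels 'a' and 'b', the channel axis itself
# must be permuted:
perm = [1, 0, 2, 3, 4]
x4 = x3[:, perm, :, :]

# Channel 0 of x4 now holds what was channel 1 of x3 (and vice versa),
# while the remaining channels are untouched.
assert (x4[:, 0] == x3[:, 1]).all()
assert (x4[:, 1] == x3[:, 0]).all()
assert (x4[:, 2:] == x3[:, 2:]).all()
```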

I will continue working on incorporating more unit tests into the current test script today.

@charlie-becker
Collaborator

@kevinyang-cky

Yup, you're exactly right! Passes no problem. Thank you for catching my testing bug!

@djgagne
Collaborator

djgagne commented Feb 14, 2026

@kevinyang-cky I think the code looks good, but I will wait to approve and merge until you have added your remaining tests. I'm going to fix some of the docs and address some other library cleanup issues in my PR.

@kevinyang-cky
Collaborator Author

@djgagne sounds good to me! I will tag you again when I have all the tests and the example added to the docs. Have a great long weekend!
