Releases: bioscan-ml/dataset
Version 1.3.0
Release date: 2025-04-19. Full commit changelog.
This is a minor release which adds support for dictionary output format.
Added
- Add support for dictionary outputs to __getitem__, which is enabled by setting the new parameter output_format to "dict" (#53). The tuple output format, enabled by setting output_format="tuple", remains the default behaviour.
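The difference between the two formats can be sketched with a toy stand-in. ToyDataset and its sample fields are hypothetical and only illustrate the idea; they are not the library's implementation, and the keys the real dataset returns may differ.

```python
class ToyDataset:
    """Hypothetical stand-in illustrating tuple vs. dict outputs."""

    def __init__(self, samples, output_format="tuple"):
        if output_format not in ("tuple", "dict"):
            raise ValueError(f"Unknown output_format: {output_format!r}")
        self.samples = samples
        self.output_format = output_format

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        sample = self.samples[index]
        if self.output_format == "dict":
            # Return a copy so callers cannot mutate the stored sample.
            return dict(sample)
        # Default: values only, in the order the fields were defined.
        return tuple(sample.values())


samples = [{"image": "img_0", "dna": "ACGT", "target": 3}]
as_tuple = ToyDataset(samples)[0]
as_dict = ToyDataset(samples, output_format="dict")[0]
```

With tuple output the caller has to know the position of each field, whereas with dictionary output each field is looked up by name, for example as_dict["target"].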
Fixed
- Add support for NaN inputs to BIOSCAN1M.label2index and BIOSCAN5M.label2index (#55). This addresses the fact that missing labels in the taxonomic columns are returned as NaN, which was not supported in the previous version.
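A minimal sketch of NaN-tolerant label lookup follows. The helper label_to_index, the class list, and the sentinel value -1 for missing labels are all assumptions made for illustration; the value the library actually returns for NaN inputs is not stated here.

```python
import math


def label_to_index(label, classes, missing_index=-1):
    """Map a label to its class index, tolerating NaN for missing labels.

    The sentinel -1 is an assumption made for this sketch.
    """
    if isinstance(label, float) and math.isnan(label):
        return missing_index
    return classes.index(label)


classes = ["Coleoptera", "Diptera", "Hymenoptera"]
labels = ["Diptera", float("nan"), "Coleoptera"]
indices = [label_to_index(label, classes) for label in labels]
```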
Documentation
- Add a usage example for target_transform to the usage guide (#56).
Version 1.2.1
Release date: 2025-04-11. Full commit changelog.
This is a bugfix release which fixes minor issues.
Fixed
- Fix handling of BIOSCAN5M(... split=None), which was indicated as supported in the type hint but didn't work any more due to updates in 1.2.0 (#46). Now it actually does work, but isn't indicated as supported in the type hint anymore.
- Provide clearer error messages when some or all images are missing (#50).
Documentation
Version 1.2.0
Release date: 2025-04-03. Full commit changelog.
This is a minor release adding some new features. In particular, CLIBD partitioning of BIOSCAN-1M is now supported, automatic download of BIOSCAN-1M is now supported, and multiple splits can be loaded at once by joining their names with "+" such as "pretrain+train".
Fixed
- Sped up BIOSCAN5M load times by vectorizing the image path generation process (#28).
- Avoid re-download and re-extraction of splits which were already correctly present, which previously could be triggered by other splits needing to be downloaded, for example when using metasplit "seen" or "all" when some (but not all) splits were already downloaded (#40).
Added
- Added support for CLIBD partitioning of BIOSCAN-1M, using argument partitioning_version="clibd" to BIOSCAN1M (#25, #26, #30, #35).
- Added automatic download support to BIOSCAN1M. This includes both the metadata CSV and the image files (#31, #37), and the CLIBD partitioning data (#33). As with BIOSCAN5M, data is lazily downloaded, so only additional files needed for the current dataset request are downloaded.
- Added support for combinations of splits being specified joined with "+" such as split="pretrain+train" (#39, #40).
- Added aliasing between "val" (BIOSCAN-5M) and "validation" (BIOSCAN-1M) split names (#38).
- Added __all__ to better support from bioscan_dataset import * (#41).
- Added type hinting (#44).
- Added access to columns "processid" in BIOSCAN1M.metadata and both "area_fraction" and "scale_factor" in BIOSCAN5M.metadata (#43).
- Added more detailed __repr__ information, which is shown when printing the dataset object (#34).
- Improved error messages for bad split values or partitioning versions (#27, #32).
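The "+"-joined split syntax and the "val"/"validation" aliasing can be sketched as follows. The function parse_split, the SPLIT_ALIASES table, and the choice of "val" as the canonical name are hypothetical and not taken from the library's source.

```python
# Hypothetical alias table: both spellings resolve to one canonical name.
SPLIT_ALIASES = {"validation": "val"}


def parse_split(split):
    """Break a "+"-joined split string into canonical split names."""
    names = []
    for part in split.split("+"):
        name = part.strip()
        name = SPLIT_ALIASES.get(name, name)
        if name not in names:  # drop duplicates, keep first-seen order
            names.append(name)
    return names


combined = parse_split("pretrain+train")
aliased = parse_split("train+validation+val")
```

In this sketch, "validation" and "val" collapse to a single entry, so a request naming both spellings does not load the same split twice.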
Documentation
Version 1.1.0
Release date: 2025-03-27. Full commit changelog.
This is a minor release adding some new features.
Added
- Added target_format argument, which controls whether taxonomic labels are returned by __getitem__ as strings or as integers indicating the class index (#10). Thanks to @xl-huo for contributing this.
- Added index2label and label2index properties to the dataset class to map between class indices and taxonomic labels (#12, #23).
- Added support for arbitrary modality names, which are taken from the metadata, without the option to apply a transform to the data (#13).
- Added image_package argument to BIOSCAN1M, to select the image package to use, as was already implemented for BIOSCAN5M (#15).
- Added a warning to BIOSCAN1M that is automatically raised if one of the requested target ranks is incompatible with the selected partitioning_version (#18). Thanks @kevinkasa for highlighting this.
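The relationship between string labels, class indices, and the two mappings can be sketched as below. The helpers build_mappings and format_target are hypothetical, and the option names "index" and "text" are assumptions for this sketch; consult the package documentation for the values target_format actually accepts.

```python
def build_mappings(labels):
    """Build index->label and label->index maps from the sorted unique labels."""
    index2label = sorted(set(labels))
    label2index = {label: i for i, label in enumerate(index2label)}
    return index2label, label2index


def format_target(label, label2index, target_format="index"):
    """Return the label as a class index or as the original string."""
    if target_format == "index":
        return label2index[label]
    if target_format == "text":
        return label
    raise ValueError(f"Unknown target_format: {target_format!r}")


orders = ["Diptera", "Coleoptera", "Diptera", "Hymenoptera"]
index2label, label2index = build_mappings(orders)
as_index = format_target("Diptera", label2index)
as_text = format_target("Diptera", label2index, target_format="text")
```

Because the two maps are built from the same sorted list, index2label[label2index[x]] returns x for every known label.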
Documentation
Version 1.0.1
Release date: 2024-12-07. Full commit changelog.
This is a bugfix release to address incorrect RGB stdev values.
Fixed
- RGB_STDEV for bioscan1m and bioscan5m was corrected to address a miscalculation when estimating the pixel RGB standard deviation. (#2)
Documentation
- Corrected example import of RGB_MEAN and RGB_STDEV. (#1)
- General documentation fixes and improvements.
Version 1.0.0
Initial release.