Releases: bioscan-ml/dataset

Version 1.3.0

19 Apr 16:29

Release date: 2025-04-19. Full commit changelog.

This is a minor release which adds support for dictionary output format.

Added

  • Add support for dictionary outputs to __getitem__, which is enabled by setting new parameter output_format to "dict" (#53). The tuple output format, enabled by setting output_format="tuple", remains the default behaviour.
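
The difference between the two output formats can be illustrated with a minimal sketch. ToyDataset below is a hypothetical stand-in, not the bioscan_dataset implementation, and the dictionary keys "image", "dna" and "label" are assumptions chosen for illustration; only the parameter name output_format and its values "tuple" and "dict" come from the release notes.

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset with a switchable output format."""

    def __init__(self, samples, output_format="tuple"):
        if output_format not in ("tuple", "dict"):
            raise ValueError(f"Unknown output_format: {output_format!r}")
        self.samples = samples
        self.output_format = output_format

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        image, dna, label = self.samples[index]
        if self.output_format == "dict":
            # Key names are assumed for this sketch.
            return {"image": image, "dna": dna, "label": label}
        # Default behaviour: a flat tuple in a fixed order.
        return (image, dna, label)


samples = [("img0", "ACGT", 3)]
as_tuple = ToyDataset(samples)[0]
as_dict = ToyDataset(samples, output_format="dict")[0]
```

With the dictionary format, downstream code can look fields up by name instead of relying on their position in the tuple.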

Fixed

  • Add support for NaN inputs to BIOSCAN1M.label2index and BIOSCAN5M.label2index (#55). This addresses the fact that missing labels in the taxonomic columns are returned as NaN, which was not supported in the previous version.
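
One way a label-to-index lookup can tolerate NaN is sketched below. This is not the library's code: the helper names and the sentinel value -1 for missing labels are assumptions, and the release notes do not say what label2index returns for NaN.

```python
import math

MISSING_INDEX = -1  # assumed sentinel, not taken from the release notes


def is_missing(label):
    """Return True for None or a float NaN, as pandas uses for missing text."""
    return label is None or (isinstance(label, float) and math.isnan(label))


def label2index_sketch(label, label_to_index):
    """Look up a class index, mapping missing labels to MISSING_INDEX."""
    if is_missing(label):
        return MISSING_INDEX
    return label_to_index[label]


mapping = {"Coleoptera": 0, "Diptera": 1}
known = label2index_sketch("Diptera", mapping)
missing = label2index_sketch(float("nan"), mapping)
```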

Documentation

  • Add a usage example for target_transform to the usage guide (#56).

Version 1.2.1

11 Apr 17:45

Release date: 2025-04-11. Full commit changelog.

This is a bugfix release which fixes minor issues.

Fixed

  • Fix handling of BIOSCAN5M(..., split=None), which the type hint indicated was supported but which stopped working after the updates in 1.2.0 (#46). It now works again, but the type hint no longer lists it as supported.
  • Provide clearer error messages when some or all images are missing (#50).

Documentation

  • General documentation improvements (#49, #51).

Version 1.2.0

03 Apr 18:31

Release date: 2025-04-03. Full commit changelog.

This is a minor release adding some new features. In particular, CLIBD partitioning of BIOSCAN-1M is now supported, automatic download of BIOSCAN-1M is now supported, and multiple splits can be loaded at once by joining their names with "+" such as "pretrain+train".

Fixed

  • Sped up BIOSCAN5M load times by vectorizing the image path generation process (#28).
  • Avoid re-download and re-extraction of splits which were already correctly present, which previously could be triggered by other splits needing to be downloaded, for example when using metasplit "seen" or "all" when some (but not all) splits were already downloaded (#40).
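
The idea of skipping splits that are already present can be sketched as a check that runs before any download. The directory layout (one non-empty folder per split under a root directory) and the function name missing_splits are assumptions made for illustration; the library's actual completeness checks are not described in these notes.

```python
from pathlib import Path


def missing_splits(root, requested):
    """Return the requested splits that have no non-empty directory under root."""
    root = Path(root)
    missing = []
    for name in requested:
        split_dir = root / name
        # A split counts as present only if its folder exists and holds files.
        if not split_dir.is_dir() or not any(split_dir.iterdir()):
            missing.append(name)
    return missing
```

Only the names returned by such a check would need to be fetched, so a request that spans several splits leaves the ones already on disk untouched.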

Added

  • Added support for CLIBD partitioning of BIOSCAN-1M, using argument partitioning_version="clibd" to BIOSCAN1M (#25, #26, #30, #35).
  • Added automatic download support to BIOSCAN1M. This includes both the metadata CSV and the image files (#31, #37), and the CLIBD partitioning data (#33). As with BIOSCAN5M, data is lazily downloaded, so only additional files needed for the current dataset request are downloaded.
  • Added support for combinations of splits being specified joined with "+" such as split="pretrain+train" (#39, #40).
  • Added aliasing between "val" (BIOSCAN-5M) and "validation" (BIOSCAN-1M) split names (#38).
  • Added __all__ to better support from bioscan_dataset import * (#41).
  • Added type hinting (#44).
  • Added access to the column "processid" in BIOSCAN1M.metadata and the columns "area_fraction" and "scale_factor" in BIOSCAN5M.metadata (#43).
  • Added more detailed __repr__ information, which is shown when printing the dataset object (#34).
  • Improved error messages for bad split values or partitioning versions (#27, #32).
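
The "+" syntax and the split-name aliasing described above can be pictured with a small parser. This is a sketch under assumptions, not the library's parsing code: the function name parse_split is hypothetical, and treating "val" as the canonical name (with "validation" mapped onto it) is a choice made only for this example.

```python
ALIASES = {"validation": "val"}  # assumed canonical direction for this sketch


def parse_split(split):
    """Split a string such as "pretrain+train" into a list of split names."""
    parts = []
    for raw in split.split("+"):
        name = raw.strip()
        if not name:
            raise ValueError(f"Empty split name in {split!r}")
        name = ALIASES.get(name, name)
        if name not in parts:  # drop duplicates while keeping the order
            parts.append(name)
    return parts


combined = parse_split("pretrain+train")
aliased = parse_split("train+validation")
```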

Documentation

  • General documentation improvements (#42, #44).

Version 1.1.0

27 Mar 04:09

Release date: 2025-03-27. Full commit changelog.

This is a minor release adding some new features.

Added

  • Added target_format argument, which controls whether taxonomic labels are returned by __getitem__ as strings or as integers indicating the class index (#10). Thanks to @xl-huo for contributing this.
  • Added index2label and label2index properties to the dataset class to map between class indices and taxonomic labels (#12, #23).
  • Added support for arbitrary modality names, which are taken from the metadata, without the option to apply a transform to the data (#13).
  • Added image_package argument to BIOSCAN1M, to select the image package to use, as was already implemented for BIOSCAN5M (#15).
  • Added a warning to BIOSCAN1M that is automatically raised if one of the requested target ranks is incompatible with the selected partitioning_version (#18). Thanks @kevinkasa for highlighting this.
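
The relationship between class indices and taxonomic labels can be sketched as a pair of inverse mappings. The helper build_label_maps is hypothetical, and ordering the classes alphabetically is an assumption; the release notes do not state how the library assigns indices.

```python
def build_label_maps(labels):
    """Build index-to-label and label-to-index mappings from raw labels."""
    classes = sorted(set(labels))  # assumed ordering: alphabetical
    index2label = dict(enumerate(classes))
    label2index = {label: index for index, label in index2label.items()}
    return index2label, label2index


index2label, label2index = build_label_maps(["Diptera", "Coleoptera", "Diptera"])
```

Because the two dictionaries are inverses of each other, converting a label to an index and back returns the original label.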

Documentation

  • Changed color scheme to match bioscan-browser (#4). Thanks to @annavik for contributing to this.
  • Corrected example usage to use a single tuple, not nested (#5). Thanks to @xl-huo for reporting this.
  • General documentation improvements (#3, #11, #14, #16, #17, #22).

Version 1.0.1

07 Dec 17:58

Release date: 2024-12-07. Full commit changelog.

This is a bugfix release to address incorrect RGB stdev values.

Fixed

  • RGB_STDEV for bioscan1m and bioscan5m was corrected to address a miscalculation when estimating the pixel RGB standard deviation (#2).

Documentation

  • Corrected example import of RGB_MEAN and RGB_STDEV (#1).
  • General documentation fixes and improvements.

Version 1.0.0

04 Dec 04:49

Initial release.