h5read() crashes after h5set_extent() on a dataset of variable-length strings #172

@christiansteck

Hi,

I want to use an extendable dataset of variable-length strings.
So far I have not been able to make it work, because h5read() crashes after h5set_extent() has been called.

Below is a minimal reproducible example.

rhdf5::h5createFile("foo.h5")

rhdf5::h5createDataset(
  file = "foo.h5",
  dataset = "charset_test",
  dims = 0,
  maxdim = rhdf5::H5Sunlimited(),
  chunk = 10,
  size = NULL,
  storage.mode = "character"
)

rhdf5::h5set_extent(
  file = "foo.h5",
  dataset = "charset_test",
  dims = 2
)

rhdf5::h5read(
  file = "foo.h5",
  name = "charset_test"
)

This results in a segfault with the following traceback:

 *** caught segfault ***
address (nil), cause 'memory not mapped'

Traceback:
 1: H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem,     compoundAsDataFrame = compoundAsDataFrame, drop = drop, ...)
 2: doTryCatch(return(expr), name, parentenv, handler)
 3: tryCatchOne(expr, names, parentenv, handlers[[1L]])
 4: tryCatchList(expr, classes, parentenv, handlers)
 5: tryCatch({    obj <- H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile,         h5spaceMem = h5spaceMem, compoundAsDataFrame = compoundAsDataFrame,         drop = drop, ...)}, error = function(e) {    err <- h5checkFilters(h5dataset)    on.exit(H5Dclose(h5dataset))    if (nchar(err) > 0)         stop(err, call. = FALSE)    else stop(e)})
 6: h5readDataset(h5dataset, index = index, start = start, stride = stride,     block = block, count = count, compoundAsDataFrame = compoundAsDataFrame,     drop = drop, ...)
 7: rhdf5::h5read(file = "foo.h5", name = "charset_test")

Am I using something incorrectly or do I need to supply other/more parameters to h5createDataset()?
Any help would be greatly appreciated.


Some notes and additional information:

  • Writing data to the dataset after calling h5set_extent() but before calling h5read(), e.g. h5write(c("hello", "world"), "foo.h5", "charset_test") apparently fixes the issue and h5read() works fine, but I'd rather have the file itself use a suitable fill value for the new entries.
  • Setting fillValue = "" in h5createDataset() gives Error in H5Tset_size(tid, size) : HDF5. Invalid arguments to routine. Bad value.
  • Directly after h5set_extent(), the dataset contains NULL values and it seems to me that those are related to the crash:
$ h5dump -d charset_test foo.h5
HDF5 "foo.h5" {
DATASET "charset_test" {
   DATATYPE  H5T_STRING {
      STRSIZE H5T_VARIABLE;
      STRPAD H5T_STR_NULLPAD;
      CSET H5T_CSET_ASCII;
      CTYPE H5T_C_S1;
   }
   DATASPACE  SIMPLE { ( 2 ) / ( H5S_UNLIMITED ) }
   DATA {
   (0): NULL, NULL
   }
}
}
  • Setting e.g. size = 10 and fillValue = "1234567890" in h5createDataset() works fine, but I need (or at least prefer) variable-length strings as the experiment results come in one after another.
> R.version.string
[1] "R version 4.2.2 (2022-10-31)"
> packageVersion("rhdf5")
[1] ‘2.42.1’
> rhdf5::H5get_libversion()
majnum minnum relnum
     1     10      7
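
For reference, the workaround from the first note (write to the new entries right after extending) could be wrapped up roughly as follows. This is only a sketch: `extend_with_placeholder()` and its arguments are hypothetical names, not part of rhdf5, and it assumes a one-dimensional dataset whose current length the caller already knows. It does not address the underlying crash; it just avoids reading entries that were never written.

```r
library(rhdf5)

# Hypothetical helper (not part of rhdf5): extend a 1-D variable-length
# string dataset from old_len to new_len and immediately write a
# placeholder string into every new slot, so that none is left as NULL.
extend_with_placeholder <- function(file, dataset, old_len, new_len, placeholder) {
  stopifnot(new_len > old_len, is.character(placeholder), length(placeholder) == 1)

  # Grow the dataset; the new entries are NULL at this point.
  h5set_extent(file = file, dataset = dataset, dims = new_len)

  # Overwrite only the newly added range with the placeholder.
  h5write(
    obj = rep(placeholder, new_len - old_len),
    file = file,
    name = dataset,
    index = list((old_len + 1):new_len)
  )
  invisible(NULL)
}
```

With the file from the example above, `extend_with_placeholder("foo.h5", "charset_test", old_len = 0, new_len = 2, placeholder = "pending")` would replace the bare `h5set_extent()` call. Which placeholder values are accepted (in particular an empty string) is an assumption I have not verified.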
