Hi,
I want to use an extendable dataset of variable-length strings.
So far I have not been able to make it work, because h5read() crashes after h5set_extent() has been called.
Below is a minimal reproducible example.
rhdf5::h5createFile("foo.h5")
rhdf5::h5createDataset(
file = "foo.h5",
dataset = "charset_test",
dims = 0,
maxdim = rhdf5::H5Sunlimited(),
chunk = 10,
size = NULL,
storage.mode = "character"
)
rhdf5::h5set_extent(
file = "foo.h5",
dataset = "charset_test",
dims = 2
)
rhdf5::h5read(
file = "foo.h5",
name = "charset_test"
)
This results in a segfault with the following traceback:
*** caught segfault ***
address (nil), cause 'memory not mapped'
Traceback:
1: H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem, compoundAsDataFrame = compoundAsDataFrame, drop = drop, ...)
2: doTryCatch(return(expr), name, parentenv, handler)
3: tryCatchOne(expr, names, parentenv, handlers[[1L]])
4: tryCatchList(expr, classes, parentenv, handlers)
5: tryCatch({ obj <- H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem, compoundAsDataFrame = compoundAsDataFrame, drop = drop, ...)}, error = function(e) { err <- h5checkFilters(h5dataset) on.exit(H5Dclose(h5dataset)) if (nchar(err) > 0) stop(err, call. = FALSE) else stop(e)})
6: h5readDataset(h5dataset, index = index, start = start, stride = stride, block = block, count = count, compoundAsDataFrame = compoundAsDataFrame, drop = drop, ...)
7: rhdf5::h5read(file = "foo.h5", name = "charset_test")
Am I using something incorrectly, or do I need to supply other/more parameters to h5createDataset()?
Any help would be greatly appreciated.
Some notes and additional information:
- Writing data to the dataset after calling h5set_extent() but before calling h5read(), e.g. h5write(c("hello", "world"), "foo.h5", "charset_test"), apparently fixes the issue and h5read() works fine, but I'd rather have the file itself use a suitable fill value for the new entries.
- Setting fillValue = "" in h5createDataset() gives: Error in H5Tset_size(tid, size) : HDF5. Invalid arguments to routine. Bad value.
- Directly after h5set_extent(), the dataset contains NULL values, and it seems to me that those are related to the crash:
$ h5dump -d charset_test foo.h5
HDF5 "foo.h5" {
DATASET "charset_test" {
DATATYPE H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLPAD;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
}
DATASPACE SIMPLE { ( 2 ) / ( H5S_UNLIMITED ) }
DATA {
(0): NULL, NULL
}
}
}
- Setting e.g. size = 10 and fillValue = "1234567890" in h5createDataset() works fine, but I need (or at least prefer) variable-length strings as the experiment results come in one after another.
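
For reference, the workaround from the first note can be written out end to end as a minimal sketch. It assumes the behaviour observed above (writing to every new slot before the first read avoids the crash) and is not a proper fix. The temporary file path and the variable names h5file and result are only illustrative, and maxdims is spelled out in full where the example above uses the partially matched maxdim.

```r
library(rhdf5)

# Illustrative temporary file so an existing foo.h5 is not clobbered.
h5file <- tempfile(fileext = ".h5")
h5createFile(h5file)

# Same dataset definition as in the reproducible example:
# size = NULL requests variable-length strings.
h5createDataset(
  file = h5file,
  dataset = "charset_test",
  dims = 0,
  maxdims = H5Sunlimited(),
  chunk = 10,
  size = NULL,
  storage.mode = "character"
)

# Grow the dataset from 0 to 2 elements.
h5set_extent(file = h5file, dataset = "charset_test", dims = 2)

# Write to every new slot before reading; reading without this step
# is what produced the segfault reported above.
h5write(c("hello", "world"), file = h5file, name = "charset_test")

result <- h5read(file = h5file, name = "charset_test")
print(result)
```

This only helps when real values (or placeholders) are available at the time the dataset is extended; it does not make the file itself carry a fill value for variable-length strings.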
> R.version.string
[1] "R version 4.2.2 (2022-10-31)"
> packageVersion("rhdf5")
[1] ‘2.42.1’
> rhdf5::H5get_libversion()
majnum minnum relnum
     1     10      7