FEAT: Adding variable length StringDType to/from QuadDType array casting.
#244
Tests contribute the major part of the diff.
    }

    quad_value out_val;
    if (bytes_to_quad_convert(s.buf, s.size, backend, &out_val) < 0) {
Using bytes_to_quad_convert instead of unicode_to_quad_convert because the latter expects Py_UCS format; otherwise both do the same thing.
ngoldbaum left a comment
I spotted some minor issues and one bigger opportunity to avoid unnecessarily creating python strings.
https://github.com/numpy/numpy-user-dtypes/pull/244/changes#r2636204832 is a bigger comment; don't consider it a blocker if you disagree or don't want to spend time on it.
            npy_intp *view_offset)
    {
        Py_INCREF(given_descrs[0]);
        loop_descrs[0] = given_descrs[0];
If you do this after checking given_descrs[1], there's no need to decref in the error paths below, so it'll be a little clearer.
        loop_descrs[1] = given_descrs[1];
    }

    // no notion of fix length, so always unsafe
I'd just delete this comment; I don't think it's correct. It's unsafe because arbitrary strings aren't generally convertible losslessly to quads.
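To make that reason concrete, here is a minimal sketch that uses Python's built-in float as a stand-in for a quad value. This is an assumption for illustration only: the real cast goes through the QuadPrecDType parser, and the helper name `roundtrips_losslessly` is hypothetical. It checks whether a string survives a string to number to string round trip.

```python
def roundtrips_losslessly(text: str) -> bool:
    """Return True if text parses as a number and prints back unchanged."""
    try:
        value = float(text)  # stand-in for the string -> quad parse
    except ValueError:
        # Arbitrary text such as "hello" is not a number at all.
        return False
    # repr() gives the shortest string that parses back to the same float.
    return repr(value) == text


samples = ["1.5", "0.10", "0.1234567890123456789012345", "hello"]
for sample in samples:
    print(sample, roundtrips_losslessly(sample))
```

Only the first sample round-trips here: the others either carry formatting or digits the number type cannot preserve, or do not parse at all. The same kind of loss applies to quads, independent of whether the string dtype has a fixed length.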
            npy_intp *view_offset)
    {
        Py_INCREF(given_descrs[0]);
        loop_descrs[0] = given_descrs[0];
    // Get string representation with adaptive notation
    // Use a large buffer size to allow for full precision
    PyObject *py_str = quad_to_string_adaptive(&sleef_val, QUAD_STR_WIDTH);
There's no need to create a PyUnicode object here. Just pass the ASCII bytes of the C parsed string to NpyString_Pack.
Actually, quad_to_string_adaptive uses Dragon4 utilities such as Dragon4_Positional_QuadDType and Dragon4_Scientific_QuadDType for the conversion. Both return an object created with PyUnicode_FromString, and we then extract the C string from it.
I can modify the Dragon4 helper, or add one more helper with a _cstr suffix that returns a C string, and then this would be doable. Let me know if this sounds good.
Yeah, there should be a code path that bypasses creating a Python string.
    quad_array = str_array.astype(QuadPrecDType())

    assert quad_array.shape == (size,)
    np.testing.assert_array_equal(quad_array, np.array(str_values, dtype=QuadPrecDType()))
Consider consolidating the new tests you added in this PR with the existing tests for fixed-width Unicode strings and byte arrays. Are there really enough special cases for handling StringDType to justify adding 200 LoC of new tests? IMO it'd be cleaner if you just encoded those as special cases of more generic tests for string support that are parameterized by a dtype parameter.
Cool, these changes should address all the comments.
As per the title
closes #224