FEAT: Adding variable length `StringDType` to/from QuadDType array casting. #244

SwayamInSync · 2025-12-19T15:04:11Z

As per the title

closes #224

SwayamInSync · 2025-12-19T15:22:13Z

Tests make up most of the diff.

SwayamInSync · 2025-12-19T17:30:22Z

quaddtype/numpy_quaddtype/src/casts.cpp

+        }
+
+        quad_value out_val;
+        if (bytes_to_quad_convert(s.buf, s.size, backend, &out_val) < 0) {


This uses bytes_to_quad_convert instead of unicode_to_quad_convert because the latter expects Py_UCS input; otherwise the two do the same thing.
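As a rough illustration of why the byte-oriented helper fits here: StringDType stores its data as UTF-8, while fixed-width unicode arrays hold one 4-byte code point per character. The sketch below uses only the Python standard library, with `float` as a stand-in for the quad parser; the variable names are made up for the example.

```python
text = "1.5e-3"

# Fixed-width unicode storage: one 4-byte code point per character.
ucs4_buffer = text.encode("utf-32-le")

# StringDType-style storage: UTF-8, which is one byte per character
# for the ASCII text that a numeric literal consists of.
utf8_buffer = text.encode("utf-8")

# A byte-oriented parser can consume the UTF-8 buffer directly;
# float() stands in for the quad parser in this sketch.
parsed = float(utf8_buffer.decode("ascii"))
```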

ngoldbaum

I spotted some minor issues and one bigger opportunity to avoid unnecessarily creating python strings.

https://github.com/numpy/numpy-user-dtypes/pull/244/changes#r2636204832 is a bigger comment; don't consider it a blocker if you disagree or don't want to spend time on it.

ngoldbaum · 2025-12-19T20:21:23Z

quaddtype/numpy_quaddtype/src/casts.cpp

+                                        npy_intp *view_offset)
+{
+    Py_INCREF(given_descrs[0]);
+    loop_descrs[0] = given_descrs[0];


If you do this after checking given_descrs[1], there's no need to decref in the error paths below, so it'll be a little clearer.

ngoldbaum · 2025-12-19T20:22:13Z

quaddtype/numpy_quaddtype/src/casts.cpp

+        loop_descrs[1] = given_descrs[1];
+    }
+
+    // no notion of fix length, so always unsafe


I'd just delete this comment; I don't think it's correct. The cast is unsafe because arbitrary strings aren't generally losslessly convertible to quads.
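A minimal sketch of that point, using Python's `float` (binary64) as a stand-in for quad precision since the argument is the same at any finite precision. The helper name `parse_lossless` is hypothetical and not part of the PR.

```python
from decimal import Decimal


def parse_lossless(text):
    """Parse text as a float and report whether the conversion was exact.

    Raises ValueError when text is not a numeric literal at all.
    """
    value = float(text)
    # Decimal(value) is the exact binary value that was stored;
    # Decimal(text) is the exact decimal value that was written.
    exact = Decimal(value) == Decimal(text)
    return value, exact
```

Here `"0.5"` converts exactly, `"0.1"` parses but is rounded, and a string such as `"abc"` cannot be converted at all, which is why the cast cannot be marked safe regardless of string length.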

ngoldbaum · 2025-12-19T20:22:38Z

quaddtype/numpy_quaddtype/src/casts.cpp

+                                        npy_intp *view_offset)
+{
+    Py_INCREF(given_descrs[0]);
+    loop_descrs[0] = given_descrs[0];


same as https://github.com/numpy/numpy-user-dtypes/pull/244/changes#r2636204832

ngoldbaum · 2025-12-19T20:25:40Z

quaddtype/numpy_quaddtype/src/casts.cpp

+
+        // Get string representation with adaptive notation
+        // Use a large buffer size to allow for full precision
+        PyObject *py_str = quad_to_string_adaptive(&sleef_val, QUAD_STR_WIDTH);


There's no need to create a PyUnicode object here. Just pass the ASCII bytes of the C parsed string to NpyString_Pack.

Actually, quad_to_string_adaptive uses Dragon4 utilities such as Dragon4_Positional_QuadDType and Dragon4_Scientific_QuadDType for the conversion. Both return a PyUnicode object (created with PyUnicode_FromString), from which we then extract the C string.

I can modify the dragon4 helper, or add one more helper with a _cstr suffix that returns a C string; then this would be doable. Does that sound good?

Yeah, there should be a code path that bypasses creating a Python string.

ngoldbaum · 2025-12-19T20:27:34Z

quaddtype/tests/test_quaddtype.py

+        quad_array = str_array.astype(QuadPrecDType())
+
+        assert quad_array.shape == (size,)
+        np.testing.assert_array_equal(quad_array, np.array(str_values, dtype=QuadPrecDType()))


Consider consolidating the new tests you added in this PR with the existing tests for unicode fixed-width strings and byte arrays. Are there really enough special-cases for handling stringdtype to justify adding 200 LoC of new tests? IMO it'd be cleaner if you just encoded that as special cases of more generic tests for string support that are parameterized by a dtype parameter.

SwayamInSync · 2025-12-19T21:53:34Z

dragon4.c/.h only gains duplicate helpers that return a C string instead of a PyUnicode object.
casts.cpp gets another duplicate helper, quad_to_string_adaptive_cstr, that calls the corresponding dragon4 utilities to return a C string.
The tests are consolidated into the existing ones.

SwayamInSync · 2025-12-19T22:14:03Z

Cool, these changes should address all the comments.

SwayamInSync added 3 commits December 17, 2025 11:39

initial stringdtype cast support

0c6322c

Merge branch 'main' into strdtype

14dac3d

adding tests

970b62c

SwayamInSync added this to the v1.0 milestone Dec 19, 2025

SwayamInSync added the numpy_quaddtype label Dec 19, 2025

SwayamInSync marked this pull request as draft December 19, 2025 15:15

use the bytes_to_quad helper

0703921

SwayamInSync marked this pull request as ready for review December 19, 2025 15:20

SwayamInSync commented Dec 19, 2025

SwayamInSync requested a review from ngoldbaum December 19, 2025 17:30

ngoldbaum reviewed Dec 19, 2025

SwayamInSync added 4 commits December 20, 2025 02:27

removed comment, fix given_descrs[0] check order

a3d326a

consolidated tests, removed some not needed

4d14322

StringDtype bypass PyUnicode creation

f0e6b1f

cover edges

87fd7dc

SwayamInSync added 2 commits December 20, 2025 03:27

replace -0 with already present

407d57a

remove empty array test

fc4c7fa
