ENH: Fixed-length strings in read_csv #63373

### Feature Type

- [ ] Adding new functionality to pandas

- [x] Changing existing functionality in pandas

- [ ] Removing existing functionality in pandas


### Problem Description

I am not sure whether I am looking in the right direction for this seemingly simple problem, so please advise.

I think the shortest formulation is this: I would like `read_csv` to use fixed-length string dtypes for the columns where it currently uses the `object` dtype.
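To make the current behaviour concrete, here is a minimal sketch. The sample CSV and its column names are made up for illustration, and the exact dtype printed depends on the pandas version (`object` in 2.x, the default string dtype in 3.x). In neither case is it a fixed-width NumPy string.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data, only for illustration.
csv_text = "name,qty\nalice,1\nbob,22\n"
df = pd.read_csv(io.StringIO(csv_text))

# Text columns do not come back as a fixed-width NumPy string dtype.
print(df["name"].dtype)

# What I am after instead: a fixed-width unicode column, sized to the
# longest value. Here that conversion happens only after parsing.
fixed = df["name"].to_numpy().astype("U")
print(fixed.dtype)
```

The point of the request is to skip the intermediate step, so that the parser writes into the fixed-width buffer directly.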

### Feature Description

Some combination of `read_csv` arguments that prohibits the `object` dtype and uses appropriately sized fixed-length string dtypes instead. This would (hopefully) make it possible to process huge CSV files while creating only a small number of Python objects.

For my application, I am not even interested in the pandas DataFrame itself. What I want is a `read_csv` that returns a NumPy record array I can use with Cython, which does not support object-typed fields because reference counting inside C structs is not really possible.
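As a point of comparison, NumPy's own `np.genfromtxt` can already return a structured array with fixed-width string fields when called with `dtype=None`. The sketch below uses made-up sample data and is only meant to show the target layout. `genfromtxt` is implemented in Python and is generally much slower than `read_csv`, so I would not expect it to be a practical answer for huge files.

```python
import io

import numpy as np

# Hypothetical sample data, only for illustration.
csv_text = "name,qty,score\nalice,1,0.5\nbob,22,1.25\n"

records = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    names=True,        # take field names from the header row
    dtype=None,        # infer a dtype per column
    encoding="utf-8",  # decode text columns to fixed-width unicode
).view(np.recarray)

# Every field has a fixed-size NumPy dtype; none of them is object-typed.
print(records.dtype)
print(records["name"])
```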

### Alternative Solutions

A chatbot suggested this:

```python
import numpy as np
import pandas as pd

def csv_to_recarray(filepath, delimiter=','):
    # Read CSV with pandas - single pass, infers types well
    df = pd.read_csv(filepath, delimiter=delimiter)
    
    # Build dtype for record array
    dtype_list = []
    for col in df.columns:
        if pd.api.types.is_integer_dtype(df[col]):
            dtype_list.append((col, 'i8'))
        elif pd.api.types.is_float_dtype(df[col]):
            dtype_list.append((col, 'f8'))
        elif pd.api.types.is_bool_dtype(df[col]):
            dtype_list.append((col, '?'))
        else:
            # String column - get max length
            max_len = df[col].astype(str).str.len().max()
            # Add buffer for safety, minimum 1
            max_len = max(1, max_len + 10)
            dtype_list.append((col, f'U{max_len}'))
    
    # Convert to record array
    rec_array = np.rec.array(
        [tuple(row) for row in df.values],
        dtype=dtype_list
    )
    
    return rec_array

# Usage
rec_array = csv_to_recarray('data.csv')
```
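A variant that I think is somewhat tighter converts column by column and assembles the result with `np.rec.fromarrays`, so it avoids building one Python tuple per row and does not need a length buffer. This is only a sketch under my own assumptions: `frame_to_fixed_records` is a name I made up, missing text values are replaced by empty strings, and anything that is not numeric or boolean is treated as text. It still relies on `read_csv` producing the strings first, so it does not address the actual request.

```python
import io

import numpy as np
import pandas as pd


def frame_to_fixed_records(df):
    """Build a record array with no object-typed fields (hypothetical helper)."""
    arrays = []
    for col in df.columns:
        s = df[col]
        if pd.api.types.is_numeric_dtype(s):
            # int, float and bool columns already map to plain NumPy dtypes
            arrays.append(s.to_numpy())
        else:
            # Assumption: everything else is text; missing values become "".
            # astype("U") sizes the field to the longest value in the column.
            arrays.append(s.fillna("").astype(str).to_numpy().astype("U"))
    return np.rec.fromarrays(arrays, names=list(df.columns))


# Usage with made-up sample data
sample = "name,qty,score\nalice,1,0.5\nbob,22,1.25\n"
rec = frame_to_fixed_records(pd.read_csv(io.StringIO(sample)))
print(rec.dtype)
```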

### Additional Context

_No response_
