Handle SystemError from corrupted cache files gracefully #79

Closed
Copilot wants to merge 2 commits into master from copilot/fix-negative-size-error

Copilot AI commented Nov 30, 2025

When the shelve cache file becomes corrupted, shelved_cache raises SystemError: Negative size passed to PyBytes_FromStringAndSize during initialization, crashing wbdata on import. Users had to manually delete the cache directory to recover.
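
To illustrate where the failure surfaces, here is a minimal sketch that opens a shelve file read-only and reports whether its keys can be listed. It uses only the standard-library `shelve` and `dbm` modules rather than shelved_cache, and `cache_is_readable` is a hypothetical helper for illustration, not part of wbdata or this PR.

```python
import dbm
import shelve

# SystemError is the exception reported in the issue; dbm.error is a tuple of
# the exceptions the dbm backends raise for files they cannot open.
PROBE_ERRORS = (SystemError, *dbm.error)


def cache_is_readable(path):
    """Return True if every key in the shelve file at `path` can be listed.

    Hypothetical helper: wbdata itself does not expose this function.
    """
    try:
        # flag="r" opens read-only, so probing never creates or alters files.
        with shelve.open(path, flag="r") as db:
            # Listing keys is the step that fails in the issue's traceback.
            for _ in db.keys():
                pass
        return True
    except PROBE_ERRORS:
        return False
```

A missing cache file also yields `False` here, because a read-only open does not create one.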

Changes

  • Cache corruption recovery: Catch SystemError during cache initialization, remove corrupted files, and recreate a fresh cache
  • File cleanup helper: Added _remove_cache_files() to handle shelve's various file extensions (.db, .dir, .bak, .dat, etc.)
  • Tests: Added coverage for cache functionality and corruption recovery
# Before: crashes on import with corrupted cache
import wbdata  # SystemError: Negative size passed to PyBytes_FromStringAndSize

# After: automatically recovers
import wbdata  # Warning logged, fresh cache created, import succeeds
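
The recovery described above might look roughly like the following. This is a sketch under assumptions, not the code in this PR: it uses plain `shelve` in place of shelved_cache, and `remove_cache_files`, `open_cache`, and the `opener` parameter are hypothetical names (the PR's own helper is `_remove_cache_files()`, whose signature is not shown here).

```python
import contextlib
import glob
import logging
import os
import shelve

log = logging.getLogger(__name__)


def remove_cache_files(path):
    """Delete `path` and any `path.<suffix>` files; return what was removed."""
    removed = []
    # Depending on the dbm backend, shelve writes `path` itself or `path`
    # plus a suffix such as .db, .dir, .bak, or .dat.
    for candidate in [path, *glob.glob(glob.escape(path) + ".*")]:
        if os.path.isfile(candidate):
            os.remove(candidate)
            removed.append(candidate)
    return removed


def open_cache(path, opener=shelve.open):
    """Open a shelve cache, rebuilding it once if reading raises SystemError.

    `opener` is a hypothetical hook that makes the failure easy to simulate.
    """
    db = None
    try:
        db = opener(path)
        list(db.keys())  # force a read so corruption surfaces here
        return db
    except SystemError as exc:
        log.warning("Cache at %s looks corrupted (%s); recreating it", path, exc)
        if db is not None:
            # Closing a damaged shelf may itself fail; ignore that.
            with contextlib.suppress(Exception):
                db.close()
        remove_cache_files(path)
        return opener(path)
```

The `path.*` glob can also match unrelated files that share the prefix, so a real implementation may prefer an explicit list of suffixes.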
Original prompt

This section details the original issue you should resolve

<issue_title>SystemError: Negative size passed to PyBytes_FromStringAndSize</issue_title>
<issue_description>Running on macOS 15.3.2, Python 3.11, wbdata 1.1.0. After calling wbdata.get_data(indicator=indicators), the error occurs at the import statement.

This can be fixed by rm -rf ~/Library/Caches/wbdata.

However, I have to do that every time after calling wbdata.get_data(indicator=indicators).

Traceback (most recent call last):
  File "/Users/sunsealucky/Desktop/DataSphere/Program/DataSphere.py", line 3, in <module>
    from WorldBank import WorldBank
  File "/Users/sunsealucky/Desktop/DataSphere/Program/WorldBank.py", line 1, in <module>
    import wbdata
  File "/Users/sunsealucky/miniconda3/envs/datasphere/lib/python3.11/site-packages/wbdata/__init__.py", line 19, in <module>
    get_data = get_default_client().get_data
               ^^^^^^^^^^^^^^^^^^^^
  File "/Users/sunsealucky/miniconda3/envs/datasphere/lib/python3.11/site-packages/wbdata/__init__.py", line 16, in get_default_client
    return Client()
           ^^^^^^^^
  File "<string>", line 7, in __init__
  File "/Users/sunsealucky/miniconda3/envs/datasphere/lib/python3.11/site-packages/wbdata/client.py", line 174, in __post_init__
    cache=cache.get_cache(
          ^^^^^^^^^^^^^^^^
  File "/Users/sunsealucky/miniconda3/envs/datasphere/lib/python3.11/site-packages/wbdata/cache.py", line 79, in get_cache
    cache.expire()
    ^^^^^^^^^^^^
  File "/Users/sunsealucky/miniconda3/envs/datasphere/lib/python3.11/site-packages/shelved_cache/persistent_cache.py", line 106, in __getattr__
    self.initialize_if_not_initialized()
  File "/Users/sunsealucky/miniconda3/envs/datasphere/lib/python3.11/site-packages/shelved_cache/persistent_cache.py", line 150, in initialize_if_not_initialized
    raise e
  File "/Users/sunsealucky/miniconda3/envs/datasphere/lib/python3.11/site-packages/shelved_cache/persistent_cache.py", line 128, in initialize_if_not_initialized
    for hk, (k, v) in self.persistent_dict.items():
  File "<frozen _collections_abc>", line 860, in __iter__
  File "/Users/sunsealucky/miniconda3/envs/datasphere/lib/python3.11/shelve.py", line 95, in __iter__
    for k in self.dict.keys():
             ^^^^^^^^^^^^^^^^
SystemError: Negative size passed to PyBytes_FromStringAndSize

Here is a part of my project:

import wbdata
import pandas as pd
from LLM import LLM

class WorldBank:
    def __init__(self) -> None:
        # ======================= Init World Bank API ======================= #
        self.get_data = wbdata.get_data
        self.get_series = wbdata.get_series
        self.get_dataframe = wbdata.get_dataframe
        self.get_countries = wbdata.get_countries
        self.get_indicators = wbdata.get_indicators
        self.get_incomelevels = wbdata.get_incomelevels
        self.get_lendingtypes = wbdata.get_lendingtypes
        self.get_sources = wbdata.get_sources
        self.get_topics = wbdata.get_topics
        self.llm = LLM()
        with open('prompt_pattern/api/resource_prompt.txt', encoding='utf-8') as f:
            self.resource_prompt = f.read()
        with open('prompt_pattern/api/indicator_prompt.txt', encoding='utf-8') as f:
            self.indicator_prompt = f.read()
        print("WordBank API initialized.")
    
    def query_resources(self, sentence: str) -> str:        
        # ===================== Construct query content ==================== #
        content = (
            f'\nBelow is query sentence: \n{sentence}' +
            f'\nBelow is resources provided to choose from: \n{self.get_sources()}'
        )
        
        # ===================== Reason resources index ===================== #
        res = self.llm.query(f"{self.resource_prompt}\n{content}")
        if not res.isdigit():
            raise ValueError(f'Invalid model output index: "{res}". As the provided content is \n\n {content}')
        print(f'✅ Choose resource index: {res} ')
        return res
        
    def query_indicators(self, sentence: str) -> str:
        resources_index = self.query_resources(sentence=sentence)
        
        # ===================== Construct query content ==================== #
        df = pd.DataFrame(self.get_indicators(source=resources_index))        
        content = (
            f'\nBelow is query sentence: {sentence}' +
            '\n\nBelow is indicators provided to choose from: \n' + str(df[['name']])
        )
        
        # ===================== Reason indicator index ===================== #
        idx:str = self.llm.query(f"{self.indicator_prompt}\n{content}")
        if not idx.isdigit():
            raise ValueError(f'Invalid model output index: "{idx}". As the provided content is \n\n {content}')  
        print(f"✅ Choose indicator index: {idx} " + df.loc[int(idx), 'id'])      
        return df.loc[int(idx), 'id']
    
    def query(self, sentence: str):
        print("🧠 Analyzing b...

</details>

- Fixes OliverSherouse/wbdata#78


Co-authored-by: OliverSherouse <1217314+OliverSherouse@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix SystemError for wbdata on Mac OS" to "Handle SystemError from corrupted cache files gracefully" on Nov 30, 2025
Copilot AI requested a review from OliverSherouse November 30, 2025 13:25