Large file handling by read_file tool #22

@marutilai

Issue Description:

Currently, the read_file tool relies on the LLM to be proactive about reading large files by specifying start_line and end_line. This is brittle because:

  1. The agent doesn't know a file's size until it tries to read it.
  2. If the agent reads a large file without specifying a range, it can flood its own context window, leading to poor performance, high token costs, and potential loss of important conversational history.

This issue proposes upgrading read_file to be "smarter" by default. It should automatically detect large files and, instead of returning the full content, return a more useful, context-rich partial view. This is inspired by the robust handling seen in mature coding agents.

Action Items:

  1. When read_file is called without start_line or end_line parameters:

    • The tool must first determine the total line count of the target file.
    • If the line count is greater than a configurable threshold (MAX_READ_LINES, e.g., 1000), automatic truncation and summarization logic is triggered.
    • If the line count is less than or equal to the threshold, the tool should function as it does now, returning the full file content.
  2. When the truncation logic is triggered, the tool's JSON output must be enhanced to include:

    • content: The first MAX_READ_LINES lines of the file.
    • definitions: The output of calling the list_code_definition_names logic for that same file.
    • info: A new field containing a human-readable notice, e.g.,
      "File truncated. Showing first 1000 of 5280 lines. Code definitions are provided for a high-level overview."
  3. The read_file tool must still respect the user-provided start_line and end_line parameters, bypassing the large-file logic entirely if either is present.
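
A minimal sketch of the three action items above, in Python. It is not the tool's actual implementation. Everything beyond the names in this issue is an assumption: the function signature, the `list_definitions` hook (a stand-in for the existing list_code_definition_names logic), the default value of MAX_READ_LINES, and the choice to wrap non-truncated reads in a JSON object with only a `content` key.

```python
import json
from itertools import islice
from pathlib import Path
from typing import Callable, Optional

MAX_READ_LINES = 1000  # assumed default; the issue only requires it to be configurable


def count_lines(path: Path) -> int:
    """Count lines by streaming, so a large file is never fully loaded."""
    with path.open("r", encoding="utf-8", errors="replace") as f:
        return sum(1 for _ in f)


def read_file(
    path: str,
    start_line: Optional[int] = None,   # 1-based, inclusive
    end_line: Optional[int] = None,     # 1-based, inclusive
    max_read_lines: int = MAX_READ_LINES,
    # Hypothetical hook standing in for list_code_definition_names.
    list_definitions: Callable[[str], list] = lambda _path: [],
) -> str:
    p = Path(path)

    # Item 3: an explicit range bypasses the large-file logic entirely.
    if start_line is not None or end_line is not None:
        first = max((start_line or 1) - 1, 0)
        with p.open("r", encoding="utf-8", errors="replace") as f:
            selected = "".join(islice(f, first, end_line))
        return json.dumps({"content": selected})

    # Item 1: determine the total line count before deciding what to return.
    total = count_lines(p)
    with p.open("r", encoding="utf-8", errors="replace") as f:
        if total <= max_read_lines:
            return json.dumps({"content": f.read()})
        head = "".join(islice(f, max_read_lines))

    # Item 2: truncated view plus definitions and a human-readable notice.
    return json.dumps({
        "content": head,
        "definitions": list_definitions(path),
        "info": (
            f"File truncated. Showing first {max_read_lines} of {total} lines. "
            "Code definitions are provided for a high-level overview."
        ),
    })
```

Counting lines in a separate streaming pass costs one extra read of the file, but it keeps memory use flat and lets the `info` notice report the true total.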

Labels: enhancement (New feature or request)