Add iterative deep search loop #14

marcelsamyn · 2025-05-28T10:28:28Z

Summary

implement formatSearchResultsWithIds to give search results stable IDs
iterate deep search over multiple turns with LLM refinement

Testing

npx prettier -w src/lib/formatting.ts src/lib/jobs/deep-research.ts
pnpm exec vitest run

gemini-code-assist

Hello @iamarcel, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello! Gemini here, providing a summary of this pull request to help everyone get up to speed quickly. This PR introduces an iterative deep search loop into the system. The main goal is to allow the deep research process to perform multiple search queries and refine the results based on feedback from an LLM, making the search process more dynamic and potentially more relevant over several turns. It also adds functionality to assign temporary IDs to search results so the LLM can easily reference specific items for refinement.

Highlights

Iterative Deep Search: The core change is the implementation of an iterative loop within the deep research job. Instead of a single search execution, the system can now perform multiple search queries sequentially, refining the results in each step.
LLM Refinement Step: A new function refineSearchResults is introduced to interact with an LLM. This LLM is given the conversation context, search queries, and current results, and it provides feedback on which results to drop and potentially suggests a new query for the next iteration.
Temporary IDs for Results: To enable the LLM to reference specific search results for refinement, a new function formatSearchResultsWithIds is added. This function formats the search results into XML, including a temporary ID for each result item.
Search Loop Control: The iterative search process is controlled by a maximum number of loops (MAX_SEARCH_LOOPS) and can terminate early if the LLM indicates it is 'done' or if the query queue is empty.

Changelog

Click here to see the changelog

src/lib/formatting.ts
- Removed a blank line (L93).
- Added SearchResultWithId type (L152).
- Added formatSearchResultsWithIds function (L155-L178) to format search results with temporary IDs for LLM referencing.
src/lib/jobs/deep-research.ts
- Imported new types and function from ../formatting (L5-L8).
- Imported TemporaryIdMapper (L22).
- Added MAX_SEARCH_LOOPS constant (L38) to limit the iterative search depth.
- Updated comment regarding initial queries (L53).
- Replaced the direct call to executeDeepSearchQueries with a call to the new runIterativeSearch function (L65-L70).
- Updated caching logic to handle the result from runIterativeSearch (L73).
- Added the runIterativeSearch function (L128-L191) which implements the iterative search loop, including query management, search execution, deduplication, temporary ID mapping, and calling the refinement step.
- Added the refineSearchResults function (L202-L247) which interacts with the LLM using performStructuredAnalysis to get feedback on results and potential next queries, using a Zod schema for structured output.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in issue comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Search results flow,
LLM helps them to grow,
Iterate and refine.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces an iterative deep search loop with LLM refinement and a utility to format search results with stable IDs. The overall approach is a significant enhancement to the deep research capabilities. The code is generally well-structured, but there are a few critical and high-severity issues related to ID generation and XML escaping that need to be addressed before merging. Additionally, some medium-severity points regarding type safety and robustness could be improved.

Summary of Findings

Critical ID Generation Flaw: The TemporaryIdMapper in runIterativeSearch uses an ID generation strategy that will lead to duplicate ID errors and crash the process. This needs to be fixed by ensuring globally unique IDs within the mapper's instance.
XML Injection Vulnerability: In refineSearchResults, message.content and query strings are directly embedded into XML prompts for the LLM without escaping. This is a high-severity issue that can lead to malformed XML or prompt injection. escapeXml should be used.
Robustness of XML Formatting: The formatSearchResultsWithIds function in formatting.ts lacks a default case in its switch statement, potentially leading to 'undefined' in the output XML if new result groups are added. (Severity: Medium)
Type Safety in Deduplication: The use of (r.item as any).id in runIterativeSearch for deduplication bypasses type safety. While currently functional, it's a maintainability risk. (Severity: Medium)
IIFE for inner variable (Not Commented): In src/lib/formatting.ts, the IIFE used to assign the inner variable in formatSearchResultsWithIds is a bit unconventional and could be simplified. (Severity: Low - not commented due to review settings)
Minor Code Removal (Not Commented): An empty line was removed in src/lib/formatting.ts. This is a minor stylistic change. (Severity: Low - not commented due to review settings)
Function Length (Not Commented): The runIterativeSearch function in src/lib/jobs/deep-research.ts is quite long. Consider if parts could be extracted for better readability and maintainability. (Severity: Low - not commented due to review settings)

Merge Readiness

This pull request introduces valuable iterative search functionality. However, due to the identified critical and high severity issues, particularly concerning the TemporaryIdMapper ID generation and XML escaping for LLM prompts, I recommend that these changes not be merged until these issues are addressed. Addressing the medium severity issues would also improve the code's robustness and maintainability. As a reviewer, I am not authorized to approve pull requests, but I advise further review and resolution of these points before merging.

gemini-code-assist · 2025-05-28T10:30:32Z

src/lib/jobs/deep-research.ts

+  const mapper = new TemporaryIdMapper<SearchResults[number], string>(
+    (_item, idx) => `r${idx + 1}`,
+  );


The current ID generation strategy for TemporaryIdMapper, (_item, idx) => r${idx + 1}``, will cause issues. The idx is relative to the `dedup` array passed to `mapper.mapItems(dedup)` in each iteration of the `while` loop (line 169). If `mapItems` is called multiple times with new items (e.g., `dedup` has 2 items in loop 1, then 2 different items in loop 2), it will attempt to generate IDs like 'r1', 'r2' again for the new items. The `TemporaryIdMapper` will then correctly throw a "Duplicate temporary ID generated" error because these IDs would have already been mapped to items from previous iterations. This will crash the deep research process.

To fix this, the ID generation needs to ensure uniqueness across all items added to this specific mapper instance throughout its lifecycle. A simple counter scoped to the runIterativeSearch function and incremented for each new ID would work.

Suggested change

const mapper = new TemporaryIdMapper<SearchResults[number], string>(

(_item, idx) => `r${idx + 1}`,

);

let results: SearchResultWithId[] = [];

let tempIdCounter = 0; // Initialize a counter for unique IDs

const mapper = new TemporaryIdMapper<SearchResults[number], string>(

(_item, _idx) => `r${++tempIdCounter}`, // Use the incrementing counter

);

gemini-code-assist · 2025-05-28T10:30:33Z

src/lib/jobs/deep-research.ts

+  const messageContext = messages
+    .map((m) => `<message role="${m.role}">${m.content}</message>`)
+    .join("\n");


The messageContext is constructed by directly embedding m.content into the XML string: <message role="${m.role}">${m.content}</message>. If m.content contains XML special characters (e.g., <, >, &), this could lead to malformed XML for the LLM or potential prompt injection vulnerabilities.

The escapeXml utility function (available in src/lib/formatting.ts) should be used to sanitize m.content before embedding it. You'll need to import escapeXml into this file.

Suggested change

const messageContext = messages

.map((m) => `<message role="${m.role}">${m.content}</message>`)

.join("\n");

const messageContext = messages

.map((m) => `<message role="${m.role}">${escapeXml(m.content)}</message>`)

.join("\n");

gemini-code-assist · 2025-05-28T10:30:33Z

src/lib/jobs/deep-research.ts

+  const messageContext = messages
+    .map((m) => `<message role="${m.role}">${m.content}</message>`)
+    .join("\n");
+  const queriesXml = queries.map((q) => `<query>${q}</query>`).join("\n");


Similar to the messageContext, the queriesXml is constructed by directly embedding the query string q: <query>${q}</query>. If a query q contains XML special characters, it could also lead to malformed XML or affect how the LLM interprets the prompt.

It's recommended to use escapeXml for each query q as well. You'll need to import escapeXml if it's not already.

Suggested change

const queriesXml = queries.map((q) => `<query>${q}</query>`).join("\n");

const queriesXml = queries.map((q) => `<query>${escapeXml(q)}</query>`).join("\n");

gemini-code-assist · 2025-05-28T10:30:33Z

src/lib/formatting.ts

+          const inner = (() => {
+            switch (r.group) {
+              case "similarNodes":
+                return formatSearchNode(r.item);
+              case "similarEdges":
+                return formatSearchEdge(r.item);
+              case "connections":
+                return formatSearchConnection(r.item);
+            }
+          })();


The switch statement for determining the inner XML content based on r.group currently lacks a default case. If an unknown r.group value is encountered (e.g., if new search result types are added in the future but this function isn't updated), inner would become undefined, leading to malformed XML like <result id="...">undefined</result>.

Consider adding a default case to handle unknown groups gracefully. This could involve logging a warning and returning an empty string, or throwing an error, depending on the desired behavior. This would make the function more robust to future changes.

const inner = (() => { switch (r.group) { case "similarNodes": return formatSearchNode(r.item); case "similarEdges": return formatSearchEdge(r.item); case "connections": return formatSearchConnection(r.item); default: // Optional: Exhaustive check for switch if r.group is a strict union // const _exhaustiveCheck: never = r.group; console.warn(`[formatSearchResultsWithIds] Unhandled search result group: ${r.group}`); return ""; // Or throw an error } })();

gemini-code-assist · 2025-05-28T10:30:33Z

src/lib/jobs/deep-research.ts

+      );
+      if (res) {
+        const dedup = res.filter((r) => {
+          const key = `${r.group}:${(r.item as any).id}`;


Using (r.item as any).id for constructing the deduplication key works because all current item types in SearchGroups indeed have an id property. However, as any bypasses type checking and reduces type safety. If a new item type without an id property were introduced into SearchGroups later, this would lead to runtime errors or incorrect behavior.

Could this be made more type-safe? For example, by ensuring that all types within SearchResults[number]['item'] conform to an interface that guarantees an id property, or by using a type guard before accessing .id.

marcelsamyn · 2025-10-07T07:51:24Z

@codex there are a bunch of good comments in this review. I'd like you to tackle them and in the meantime, perform a complete robust review of this feature to verify whether it really is implemented in the best way possible, with good separation of concerns, utilities being reused and/or created with this, perfectly type-safe etc.

chatgpt-codex-connector · 2025-10-07T08:02:45Z

Summary

Secured the iterative deep-search loop by escaping conversation and refinement queries, switching to a counter-based temporary ID generator, and removing the unsafe as any deduplication key. src/lib/jobs/deep-research.tsL95-L192
Exported escapeXml, sanitized node/label formatting helpers, and made the XML result switches exhaustive to surface unknown result groups early. src/lib/formatting.tsL32-L195

Testing

✅ pnpm exec vitest run
⚠️ pnpm exec tsc --noEmit (fails because the repo lacks the optional mysql2 dependency required by drizzle’s typings)

View task →

feat(search): iterative deep research

19b1600

marcelsamyn added the codex label May 28, 2025 — with ChatGPT Codex Connector

gemini-code-assist bot reviewed May 28, 2025

View reviewed changes

gemini-code-assist bot suggested changes May 28, 2025

View reviewed changes

marcelsamyn added 2 commits October 7, 2025 13:05

Harden deep research search loop

6edecd6

fixup

6877f90

marcelsamyn merged commit 9fbd221 into main Oct 7, 2025
1 check failed

marcelsamyn deleted the qv8p45-codex/iterate-background-search-with-refinement branch October 7, 2025 11:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add iterative deep search loop #14

Add iterative deep search loop #14

Uh oh!

marcelsamyn commented May 28, 2025

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot May 28, 2025

Uh oh!

gemini-code-assist bot May 28, 2025

Uh oh!

gemini-code-assist bot May 28, 2025

Uh oh!

gemini-code-assist bot May 28, 2025

Uh oh!

gemini-code-assist bot May 28, 2025

Uh oh!

marcelsamyn commented Oct 7, 2025

Uh oh!

chatgpt-codex-connector bot commented Oct 7, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	const queriesXml = queries.map((q) => `<query>${q}</query>`).join("\n");
	const queriesXml = queries.map((q) => `<query>${escapeXml(q)}</query>`).join("\n");

Add iterative deep search loop #14

Add iterative deep search loop #14

Uh oh!

Conversation

marcelsamyn commented May 28, 2025

Summary

Testing

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Summary of Changes

Highlights

Changelog

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Summary of Findings

Merge Readiness

Uh oh!

gemini-code-assist bot May 28, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot May 28, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot May 28, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot May 28, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot May 28, 2025

Choose a reason for hiding this comment

Uh oh!

marcelsamyn commented Oct 7, 2025

Uh oh!

chatgpt-codex-connector bot commented Oct 7, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants