
Conversation

roclark (Member) commented Oct 27, 2025

Create a Starter Kit which deploys an LLM as an Endpoint for synthetic data generation and sends sample requests to the deployed model.

Created on behalf of Anna Ollerenshaw at NVIDIA.

Signed-Off-By: Robert Clark <roclark@nvidia.com>
roclark self-assigned this Oct 27, 2025
roclark added the starter-kits label (Submission of a new Starter Kit notebook) Oct 27, 2025
greptile-apps bot (Contributor) left a comment

Greptile Overview

Greptile Summary

This PR adds a new Jupyter notebook starter kit (synthetic-qa-data-gen-with-nemotron.ipynb) to the lepton/starterkits directory. The notebook demonstrates end-to-end synthetic data generation using DGX Cloud Lepton Endpoints with the Nemotron-nano-9b-v2 model. It guides users through deploying an LLM endpoint, generating subtopics and questions about CUDA programming, creating paired responses, generating math problems, and saving the results to JSONL format for downstream training tasks like reward modeling or DPO. The notebook leverages AsyncOpenAI for concurrent API calls, implements semaphore-based rate limiting, and includes detailed prompt templates for each generation stage. It fits into the existing starterkits collection by providing a practical example of synthetic data workflows on DGX Cloud Lepton infrastructure, complementing other starter kits that focus on fine-tuning, evaluation, and model deployment patterns.
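For reference, the concurrency pattern the summary describes (AsyncOpenAI calls gated by a shared asyncio.Semaphore) looks roughly like the sketch below. The endpoint URL, concurrency limit, and helper names here are placeholders, not the notebook's actual code:

    import asyncio
    from openai import AsyncOpenAI

    # Placeholder endpoint and token; the notebook derives these from the deployment.
    client = AsyncOpenAI(base_url="https://my-endpoint.example/v1", api_key="my-access-token")

    async def ask(prompt: str, sem: asyncio.Semaphore) -> str:
        # The shared semaphore caps how many requests are in flight at once.
        async with sem:
            response = await client.chat.completions.create(
                model="nvidia/nvidia-nemotron-nano-9b-v2",
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content

    async def run(prompts: list[str]) -> list[str]:
        sem = asyncio.Semaphore(8)  # at most 8 concurrent requests
        return await asyncio.gather(*(ask(p, sem) for p in prompts))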

Important Files Changed

Filename: lepton/starterkits/synthetic-qa-data-gen-with-nemotron.ipynb
Score: 3/5
Overview: New starter kit notebook demonstrating a synthetic data generation pipeline with Nemotron-nano-9b-v2 via Lepton Endpoints, including async request handling and JSONL output

Confidence score: 3/5

  • This PR introduces pedagogical content with several implementation patterns that may cause runtime issues or silent data loss if users run it without modification.
  • Score reflects five distinct concerns: (1) empty SAVE_DIRECTORY defaults to current directory, risking data overwrite; (2) wait_for_endpoint may return URL before endpoint is fully ready due to loop logic; (3) generate_response creates a new semaphore on every call when sem=None, breaking concurrency control; (4) zip(..., strict=False) silently drops data if list lengths mismatch; (5) bare exception handlers in generate_math can hide critical failures and leave incomplete output.
  • Pay close attention to lepton/starterkits/synthetic-qa-data-gen-with-nemotron.ipynb, particularly the configuration cell (lines 84-91), wait_for_endpoint function (lines 138-157), generate_response function (lines 521-532), the zip operation (lines 566-576), and exception handling in generate_math (lines 709-712).

Sequence Diagram

sequenceDiagram
    participant User
    participant Notebook
    participant LeptonCLI as "Lepton CLI"
    participant Endpoint as "Model Endpoint"
    participant OpenAIClient as "OpenAI Client"
    participant LLM as "Nemotron LLM"
    participant FileSystem as "File System"

    User->>Notebook: "Configure environment variables"
    User->>LeptonCLI: "lep login -c $LEPTON_KEY"
    LeptonCLI-->>User: "Authenticated"
    
    User->>LeptonCLI: "lep endpoint create"
    LeptonCLI->>Endpoint: "Deploy model with vLLM"
    Notebook->>Endpoint: "Wait for endpoint to be ready"
    Endpoint-->>Notebook: "Endpoint URL"
    
    User->>OpenAIClient: "Initialize client with endpoint URL"
    OpenAIClient-->>User: "Client ready"
    
    User->>OpenAIClient: "Generate subtopics from topic"
    OpenAIClient->>LLM: "POST /v1/chat/completions (TOPIC_GENERATION_PROMPT)"
    LLM-->>OpenAIClient: "Subtopics list"
    OpenAIClient-->>Notebook: "Parse and store subtopics"
    
    User->>Notebook: "Generate questions for each subtopic"
    loop "For each subtopic (async)"
        Notebook->>OpenAIClient: "generate_questions(subtopic)"
        OpenAIClient->>LLM: "POST /v1/chat/completions (QUESTION_PROMPT)"
        LLM-->>OpenAIClient: "Questions list"
        OpenAIClient-->>Notebook: "Store questions"
    end
    
    User->>Notebook: "Generate responses for questions"
    loop "For each question (async with semaphore)"
        Notebook->>OpenAIClient: "generate_response(question)"
        OpenAIClient->>LLM: "POST /v1/chat/completions (RESPONSE_PROMPT)"
        LLM-->>OpenAIClient: "Response A & B"
        OpenAIClient-->>Notebook: "Store response pair"
    end
    
    Notebook->>FileSystem: "Write synthetic_data.jsonl"
    FileSystem-->>User: "Dataset saved"
    
    opt "Generate math problems"
        User->>Notebook: "Generate math problems"
        Notebook->>OpenAIClient: "generate_math(topic)"
        OpenAIClient->>LLM: "POST /v1/chat/completions (MATH_PROMPT)"
        LLM-->>OpenAIClient: "Math problems and solutions"
        OpenAIClient-->>Notebook: "Parse problem-solution pairs"
        Notebook-->>User: "Display math problems"
    end

1 file reviewed, 7 comments


"\n",
"BASE_MODEL = \"nvidia/nvidia-nemotron-nano-9b-v2\" # Optionally use a different model for requests\n",
"ACCESS_TOKEN = \"my-access-token\" # Set the password for authenticating endpoint requests\n",
"SAVE_DIRECTORY = \"\" # Specify the absolute path to save the generated data. To save on shared storage, must be a mounted storage path\n",

logic: An empty SAVE_DIRECTORY will cause os.path.join to save to the current working directory. Consider a default like "./output" or add validation to ensure users specify a path. Should this fail early if empty, or is saving to the current working directory acceptable?
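A minimal early-failure guard could look like this (a sketch, not the notebook's code; the error message is illustrative):

    import os

    if not SAVE_DIRECTORY:
        raise ValueError("SAVE_DIRECTORY must be set to an absolute path before generating data")
    os.makedirs(SAVE_DIRECTORY, exist_ok=True)  # fail fast if the path cannot be created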

Comment on lines +138 to +154
"def wait_for_endpoint(endpoint_name: str, interval: int = 10) -> str:\n",
" command = [\"lep\", \"endpoint\", \"status\", \"-n\", endpoint_name, \"--detail\"]\n",
" while True:\n",
" result = subprocess.run(command, capture_output=True, text=True, check=True)\n",
" for line in result.stdout.split(\"\\n\"):\n",
" if line.startswith(\"State\"):\n",
" _, state = line.strip().rsplit(\" \", maxsplit=1)\n",
" if \"LeptonDeploymentState.Ready\" in state:\n",
" print(\"Endpoint deployed!\")\n",
" else:\n",
" break\n",
" url_match = re.search(r'https://[\\w\\d\\.\\-]+', line)\n",
" if url_match:\n",
" print(f\"URL: {url_match[0]}\")\n",
" return url_match[0]\n",
" print(f\"Waiting for endpoint {endpoint_name} to be ready...\")\n",
" time.sleep(interval)\n",

logic: wait_for_endpoint runs subprocess.run with check=True but doesn't handle CalledProcessError. If the CLI command fails, this will crash instead of retrying or logging.
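One way to keep polling through transient CLI failures, assuming a non-zero exit is worth retrying (run_status is a hypothetical helper, not the notebook's code):

    import subprocess
    import time

    def run_status(command: list[str], interval: int) -> str:
        # Retry the CLI call instead of letting CalledProcessError crash the notebook.
        while True:
            try:
                result = subprocess.run(command, capture_output=True, text=True, check=True)
                return result.stdout
            except subprocess.CalledProcessError as e:
                print(f"'lep endpoint status' failed (exit {e.returncode}): {e.stderr}")
                time.sleep(interval)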

" url_match = re.search(r'https://[\\w\\d\\.\\-]+', line)\n",
" if url_match:\n",
" print(f\"URL: {url_match[0]}\")\n",
" return url_match[0]\n",

logic: Returns inside the for line loop. If url_match is found before the state check, endpoint may not actually be ready yet.
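A sketch of a restructured loop that only returns the URL once the Ready state has actually been observed (assumes the same CLI output format as the original):

    import re
    import subprocess
    import time

    def wait_for_endpoint(endpoint_name: str, interval: int = 10) -> str:
        command = ["lep", "endpoint", "status", "-n", endpoint_name, "--detail"]
        while True:
            result = subprocess.run(command, capture_output=True, text=True, check=True)
            ready = "LeptonDeploymentState.Ready" in result.stdout
            url_match = re.search(r'https://[\w\d\.\-]+', result.stdout)
            # Only return once both conditions hold, regardless of line order.
            if ready and url_match:
                print(f"Endpoint deployed! URL: {url_match[0]}")
                return url_match[0]
            print(f"Waiting for endpoint {endpoint_name} to be ready...")
            time.sleep(interval)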

"outputs": [],
"source": [
"responses = await generate_subtopics(client, topic=topic, n_subtopics=n_subtopics)\n",
"nonreasoning_answer = re.sub(r'.*</think>', \"\", responses.choices[0].message.content, flags=re.DOTALL).strip()"

style: Regex assumes reasoning is enclosed in </think>. If model outputs differently (or no reasoning), this will strip nothing or strip incorrectly.
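A more defensive variant would only strip when the closing tag is actually present (a sketch; responses comes from the cell above):

    content = responses.choices[0].message.content
    # Split on the closing tag if present; otherwise keep the output unchanged.
    if "</think>" in content:
        content = content.split("</think>", 1)[1]
    nonreasoning_answer = content.strip()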

"source": [
"async def generate_response(client, question, sem=None):\n",
" prompt = RESPONSE_PROMPT_TEMPLATE.format(question=question)\n",
" async with sem or asyncio.Semaphore(1): \n",

logic: async with sem or asyncio.Semaphore(1) creates a new semaphore on every call when sem=None, so each call acquires its own fresh lock and concurrency is not actually limited. Pass a real semaphore, or create a single shared default at module level.
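A sketch of the shared-default approach (RESPONSE_PROMPT_TEMPLATE and BASE_MODEL are assumed from the notebook's earlier cells):

    import asyncio

    # One semaphore created up front, so every call that omits `sem`
    # contends on the same limiter instead of getting a fresh one.
    DEFAULT_SEM = asyncio.Semaphore(1)

    async def generate_response(client, question, sem=None):
        prompt = RESPONSE_PROMPT_TEMPLATE.format(question=question)
        async with sem or DEFAULT_SEM:
            return await client.chat.completions.create(
                model=BASE_MODEL,
                messages=[{"role": "user", "content": prompt}],
            )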

"source": [
"question_response_pair_list = []\n",
"\n",
"for question, response_set in zip(question_list_formatted, question_response_list, strict=False):\n",

style: strict=False in zip silently truncates if lists differ in length. This may hide data-loss bugs if generation failed for some questions.
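With strict=True (Python 3.10+), a length mismatch raises ValueError instead of silently dropping items. The appended pair structure below is illustrative, not the notebook's exact schema:

    question_response_pair_list = []

    # strict=True raises ValueError if generation dropped any questions.
    for question, response_set in zip(question_list_formatted, question_response_list, strict=True):
        question_response_pair_list.append({"question": question, "responses": response_set})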

Comment on lines +709 to +710
" except Exception as e:\n",
" print(f\"Attempt {attempt+1}/{n_retries} failed: {e}\")\n",

style: A broad except Exception silently swallows all errors, including programming bugs such as TypeError or NameError (though not KeyboardInterrupt, which derives from BaseException). Consider catching specific exceptions or re-raising after logging.
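A sketch of a narrower retry loop that only swallows API-level failures and fails loudly once retries are exhausted (the generate_math call and its arguments are assumed from the notebook; top-level await works in a notebook cell):

    import openai

    for attempt in range(n_retries):
        try:
            result = await generate_math(client, topic)
            break
        except (openai.APIError, openai.APITimeoutError) as e:
            # Retry only on API failures; programming errors propagate immediately.
            print(f"Attempt {attempt+1}/{n_retries} failed: {e}")
    else:
        raise RuntimeError(f"generate_math failed after {n_retries} attempts")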


Labels

starter-kits: Submission of a new Starter Kit notebook


2 participants