Add SDG starter kit v2 #32
base: main
Conversation
Create a Starter Kit which deploys an LLM as an Endpoint for synthetic data generation and sends sample requests to the deployed model.

Signed-off-by: Robert Clark <roclark@nvidia.com>
Greptile Overview
Greptile Summary
This PR adds a new Jupyter notebook starter kit that demonstrates synthetic data generation for chat QA datasets using NVIDIA's Nemotron model on DGX Cloud Lepton. The notebook provides an end-to-end pipeline from deploying model endpoints to generating structured question-answer pairs for training reward models or DPO optimization. It includes two generation modes: general QA generation and math problem generation, showcasing versatility for different synthetic data use cases.
The notebook integrates with the existing DGX Cloud Lepton starter kit collection by following the established pattern of providing complete, executable workflows. It demonstrates key platform capabilities including endpoint deployment, async request handling with concurrency controls, and structured data export. The implementation uses modern async OpenAI client patterns with proper semaphore-based rate limiting to handle large-scale data generation efficiently.
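As a quick illustration of that pattern, here is a minimal sketch of semaphore-gated async generation with the OpenAI client; the endpoint URL, API key variable, and prompt list are placeholders, not values taken from the notebook:

```python
import asyncio

from openai import AsyncOpenAI

# Placeholders: substitute the deployed endpoint URL and your LEPTON_KEY.
client = AsyncOpenAI(base_url="https://<endpoint-url>/v1", api_key="<LEPTON_KEY>")
semaphore = asyncio.Semaphore(5)  # cap in-flight requests at 5

async def generate(prompt: str) -> str:
    # The semaphore blocks here once 5 requests are already in flight,
    # which is the rate-limiting pattern the summary describes.
    async with semaphore:
        resp = await client.chat.completions.create(
            model="nvidia/nvidia-nemotron-nano-9b-v2",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

async def generate_all(prompts: list[str]) -> list[str]:
    return await asyncio.gather(*(generate(p) for p in prompts))
```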
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| lepton/starterkits/synthetic-qa-data-gen-with-nemotron.ipynb | 3/5 | New comprehensive starter kit for synthetic QA data generation with endpoint deployment and async request handling |
Confidence score: 3/5
- This PR requires careful review due to logical issues in critical functions that could cause infinite loops or runtime errors
- Score reflects well-structured notebook design but deducted points for an endpoint waiting logic bug, fragile response parsing, and outdated exception handling patterns (a defensive parsing sketch follows below)
- Pay close attention to the `wait_for_endpoint` function logic and exception handling patterns in the new notebook
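On the fragile-parsing point, a defensive parser for the paired-response format could look like the sketch below; the `Response A:` / `Response B:` markers are an assumption based on the sequence diagram, and the notebook's actual template may differ:

```python
import re
from typing import Optional

# Assumed markers; adjust to match the notebook's RESPONSE_PROMPT_TEMPLATE.
PAIR_RE = re.compile(r"Response A:\s*(?P<a>.*?)\s*Response B:\s*(?P<b>.*)", re.DOTALL)

def parse_response_pair(text: str) -> Optional[dict]:
    # Return None rather than raising, so one malformed generation
    # doesn't abort an entire async batch.
    match = PAIR_RE.search(text)
    if match is None:
        return None
    return {
        "response_a": match.group("a").strip(),
        "response_b": match.group("b").strip(),
    }
```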
Sequence Diagram
```mermaid
sequenceDiagram
    participant User
    participant "Jupyter Notebook"
    participant "DGX Cloud Lepton"
    participant "vLLM Endpoint"
    participant "Nemotron Model"
    participant "OpenAI Client"
    participant "File System"
    User->>+"Jupyter Notebook": "Configure environment variables and credentials"
    "Jupyter Notebook"->>+"DGX Cloud Lepton": "Authenticate with LEPTON_KEY"
    "DGX Cloud Lepton"-->>-"Jupyter Notebook": "Authentication successful"
    "Jupyter Notebook"->>+"DGX Cloud Lepton": "Create endpoint with vLLM container"
    "DGX Cloud Lepton"->>+"vLLM Endpoint": "Deploy Nemotron model container"
    "vLLM Endpoint"->>+"Nemotron Model": "Load nvidia/nvidia-nemotron-nano-9b-v2"
    "Nemotron Model"-->>-"vLLM Endpoint": "Model ready"
    "vLLM Endpoint"-->>-"DGX Cloud Lepton": "Endpoint deployed"
    "DGX Cloud Lepton"-->>-"Jupyter Notebook": "Endpoint URL and status"
    "Jupyter Notebook"->>+"OpenAI Client": "Initialize client with endpoint URL"
    "OpenAI Client"-->>-"Jupyter Notebook": "Client ready"
    User->>+"Jupyter Notebook": "Generate limerick test request"
    "Jupyter Notebook"->>+"OpenAI Client": "Send test completion request"
    "OpenAI Client"->>+"vLLM Endpoint": "POST /v1/chat/completions"
    "vLLM Endpoint"->>+"Nemotron Model": "Generate limerick about GPU computing"
    "Nemotron Model"-->>-"vLLM Endpoint": "Stream response with reasoning"
    "vLLM Endpoint"-->>-"OpenAI Client": "Streamed completion chunks"
    "OpenAI Client"-->>-"Jupyter Notebook": "Display streamed output"
    User->>+"Jupyter Notebook": "Generate subtopics for main topic"
    "Jupyter Notebook"->>+"OpenAI Client": "Send subtopic generation request"
    "OpenAI Client"->>+"vLLM Endpoint": "POST with TOPIC_GENERATION_PROMPT_TEMPLATE"
    "vLLM Endpoint"->>+"Nemotron Model": "Generate 5 subtopics for Wales"
    "Nemotron Model"-->>-"vLLM Endpoint": "Return comma-separated subtopics"
    "vLLM Endpoint"-->>-"OpenAI Client": "Subtopics response"
    "OpenAI Client"-->>-"Jupyter Notebook": "Parse and store subtopic list"
    "Jupyter Notebook"->>+"OpenAI Client": "Generate questions for each subtopic (async batch)"
    loop "For each subtopic"
        "OpenAI Client"->>+"vLLM Endpoint": "POST with QUESTION_PROMPT_TEMPLATE"
        "vLLM Endpoint"->>+"Nemotron Model": "Generate 5 questions for subtopic"
        "Nemotron Model"-->>-"vLLM Endpoint": "Return newline-separated questions"
        "vLLM Endpoint"-->>-"OpenAI Client": "Questions response"
    end
    "OpenAI Client"-->>-"Jupyter Notebook": "Collect all questions into single list"
    "Jupyter Notebook"->>+"OpenAI Client": "Generate responses for questions (concurrent with semaphore)"
    loop "For each question (max 5 concurrent)"
        "OpenAI Client"->>+"vLLM Endpoint": "POST with RESPONSE_PROMPT_TEMPLATE"
        "vLLM Endpoint"->>+"Nemotron Model": "Generate Response A and Response B"
        "Nemotron Model"-->>-"vLLM Endpoint": "Return formatted responses"
        "vLLM Endpoint"-->>-"OpenAI Client": "Response pair"
    end
    "OpenAI Client"-->>-"Jupyter Notebook": "Parse and structure question-response pairs"
    "Jupyter Notebook"->>+"File System": "Save synthetic_data.jsonl"
    "File System"-->>-"Jupyter Notebook": "File saved successfully"
    User->>+"Jupyter Notebook": "Generate math problems"
    "Jupyter Notebook"->>+"OpenAI Client": "Send math problem generation request"
    "OpenAI Client"->>+"vLLM Endpoint": "POST with MATH_PROMPT_TEMPLATE"
    "vLLM Endpoint"->>+"Nemotron Model": "Generate 5 algebra problems with solutions"
    "Nemotron Model"-->>-"vLLM Endpoint": "Return tagged problem-solution pairs"
    "vLLM Endpoint"-->>-"OpenAI Client": "Math problems response"
    "OpenAI Client"-->>-"Jupyter Notebook": "Parse with regex and display results"
```
1 file reviewed, 1 comment
The flagged lines in `wait_for_endpoint` (shown here unescaped from the notebook source):

```python
if ready and endpoint_url:
    print("Endpoint deployed!")
    print(f"URL: {endpoint_url}")

print(f"Waiting for endpoint {endpoint_name} to be ready...")
time.sleep(interval)
```
logic: Infinite loop issue: the function prints the success message when the endpoint is ready but never returns `endpoint_url`, so execution falls through to the wait message and keeps polling forever
Suggested change:

```python
if ready and endpoint_url:
    print("Endpoint deployed!")
    print(f"URL: {endpoint_url}")
    return endpoint_url
print(f"Waiting for endpoint {endpoint_name} to be ready...")
time.sleep(interval)
```
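For context, the corrected loop in full might look like the sketch below, with a timeout guard added so the poll can never run unbounded; `get_status` is a hypothetical stand-in for whatever Lepton status call the notebook uses:

```python
import time

def wait_for_endpoint(get_status, endpoint_name: str,
                      interval: int = 15, timeout: int = 1800) -> str:
    # get_status is assumed to return (ready: bool, endpoint_url: str | None).
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        ready, endpoint_url = get_status(endpoint_name)
        if ready and endpoint_url:
            print("Endpoint deployed!")
            print(f"URL: {endpoint_url}")
            return endpoint_url
        print(f"Waiting for endpoint {endpoint_name} to be ready...")
        time.sleep(interval)
    raise TimeoutError(f"Endpoint {endpoint_name} was not ready after {timeout}s")
```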