Open
Description
Really cool project! I am curious about the benchmarks, especially around reliability across runs.
Say we have a workflow like this:
# Slack Channel Summary DM
# Fetches messages from #general and #random, summarizes each, and DMs you

import "slack" from "mcp:remote-slack-server"

agent fetcher:
  model: haiku
  skills: ["slack"]
  prompt: "You fetch Slack messages"

agent summarizer:
  model: sonnet
  skills: ["slack"]
  prompt: "You create concise one-line summaries"

# Fetch both channels in parallel
parallel:
  general_msgs = session: fetcher
    prompt: "Get messages from #general channel for the last 7 days"
  random_msgs = session: fetcher
    prompt: "Get messages from #random channel for the last 7 days"

# Summarize each channel in parallel
parallel:
  general_summary = session: summarizer
    prompt: "Write a single one-line summary of the key activity/themes"
    context: general_msgs
  random_summary = session: summarizer
    prompt: "Write a single one-line summary of the key activity/themes"
    context: random_msgs

# DM the summaries
session: summarizer
  prompt: """
  Send me a Slack DM with this format:
  📊 *7-Day Channel Summary*
  *#general:* {general_summary}
  *#random:* {random_summary}
  """
  context: { general_summary, random_summary }

I am curious how a prose program performs compared to detailed English instructions, since it is still interpreted by LLMs:
1. Call slack__get_msgs with channel='general', num_days=7
2. Call slack__get_msgs with channel='random', num_days=7
3. Write a one-line summary of #general activity
4. Write a one-line summary of #random activity
5. Call slack__send_dm to me with both summaries formatted as:
"📊 7-Day Channel Summary"
"#general: <summary>"
"#random: <summary>"
The example above is obviously really simple, but if we were to do this over 100 Slack channels, it gets hard and there's a lot of variance across runs (a run completing without traversing all channels, for example). Have you looked into this at all, or is there anything you can share?
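
To make "variance across runs" concrete, here is a rough sketch of how I would measure it, assuming each run can report which channels it actually touched. Everything here is hypothetical: `coverage_report` and its inputs are illustrative names, not part of this project or of any Slack API, and the sample data is made up.

```python
from statistics import mean, pstdev


def coverage_report(expected_channels, runs):
    """Summarize how completely each run traversed the expected channels.

    expected_channels: iterable of channel names the workflow should visit.
    runs: list of iterables, one per run, of the channels actually visited.
    Both inputs are hypothetical; how you collect them depends on your setup.
    """
    expected = set(expected_channels)
    if not expected:
        raise ValueError("expected_channels must not be empty")
    if not runs:
        raise ValueError("runs must not be empty")

    # Fraction of the expected channels that each run actually covered.
    coverage = [len(expected & set(visited)) / len(expected) for visited in runs]

    # Channels that at least one run failed to visit.
    missed = set().union(*(expected - set(visited) for visited in runs))

    return {
        "mean_coverage": mean(coverage),
        "stdev_coverage": pstdev(coverage),
        # Share of runs that visited every expected channel.
        "complete_run_rate": sum(c == 1.0 for c in coverage) / len(coverage),
        "ever_missed": sorted(missed),
    }


if __name__ == "__main__":
    channels = [f"channel-{i}" for i in range(100)]
    # Made-up data: one complete run and one that stopped after 80 channels.
    sample_runs = [channels, channels[:80]]
    print(coverage_report(channels, sample_runs))
```

Running the same task through both the prose program and the plain English instructions, then comparing `complete_run_rate` and `stdev_coverage` between the two, is the kind of benchmark I have in mind.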