From ef0dcbe9b0bf656619cde38ebca3ed378ac751e8 Mon Sep 17 00:00:00 2001 From: James Olds <12104969+oldsj@users.noreply.github.com> Date: Thu, 15 Jan 2026 11:26:17 -0500 Subject: [PATCH 1/5] Align docs with actual project state - Remove Agent Workflow section from README (aspirational, not implemented) - Move agent workflow ideas to ROADMAP.md for future reference - Delete docs/DBOS.md (external library docs don't belong in repo) - Update architecture.md to simplify session types description - Add context7 guidance to CLAUDE.md for fetching external docs - Fix DBOS link in README to point to official docs Co-Authored-By: Claude Opus 4.5 --- CLAUDE.md | 1 + README.md | 41 +- ROADMAP.md | 41 ++ docs/DBOS.md | 1435 ------------------------------------------ docs/architecture.md | 4 +- 5 files changed, 45 insertions(+), 1477 deletions(-) create mode 100644 ROADMAP.md delete mode 100644 docs/DBOS.md diff --git a/CLAUDE.md b/CLAUDE.md index 37627a8..41860a2 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -115,6 +115,7 @@ Tests are organized into projects by execution mode: - **Svelte 5 runes**: `$state`, `$derived`, `$effect`, `$props` - **API calls**: Use `$lib/api.ts`, never hardcode URLs - **DBOS workflows**: Bump `WORKFLOW_VERSION` in `dbos_config.py` when changing workflow logic +- **External docs**: Use context7 MCP to fetch up-to-date documentation for any library (DBOS, Svelte, Playwright, etc.) - **HTML**: Be explicit, don't rely on browser defaults (`type="button"`, `rel="noopener"`, etc.) - **Responsive layouts**: Use `isMobile` store to conditionally render, not CSS hide (avoids duplicate DOM elements) - **K8s scripts**: Always use explicit `--context kind-${KIND_CLUSTER_NAME:-mainloop-test}` in kubectl commands to avoid targeting wrong cluster diff --git a/README.md b/README.md index 52ae061..7945396 100644 --- a/README.md +++ b/README.md @@ -36,7 +36,7 @@ You (phone/laptop) - **Main thread**: One continuous conversation — sessions spawn inline and surface results back - **Sessions**: Background AI work with their own conversations; appear as colored threads in your timeline - **Notifications**: Slack-style thread replies notify you when sessions need attention or complete -- **Persistence**: Conversations and sessions survive restarts via compaction + [DBOS](docs/DBOS.md) +- **Persistence**: Conversations and sessions survive restarts via compaction + [DBOS](https://docs.dbos.dev/) ## Quick Start @@ -98,49 +98,10 @@ mainloop/ 3. Completed and failed sessions 4. Each session has its own conversation you can zoom into -## Agent Workflow - -Agents follow a structured workflow: **plan in issue → implement in draft PR → iterate until CI green → ready for human review**. - -```text -┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ -│ Planning │────►│ Draft │────►│ Iteration │────►│ Review │ -│ (GH Issue) │ │ (PR) │ │ (CI Loop) │ │ (Human) │ -└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ -``` - -### Phases - -1. **Planning (GitHub Issue)** - Agent creates/updates an issue with problem analysis, proposed approach, and implementation plan. The issue is the "thinking out loud" space before code. - -2. **Draft PR** - Agent creates a draft PR linked to the issue. Implements in small, logical commits. Uses PR comments to narrate progress and decisions. - -3. **Iteration (CI Loop)** - Agent polls GitHub Actions after each push. On failure: analyzes logs, fixes, commits. Continues until green checkmark. - -4. 
**Ready for Review** - Agent marks PR ready and adds summary comment. Human reviewer steps in for final approval. - -### Verification - -Agents use these tools to verify work before marking ready: - -- **LSP server integration** - Real-time type/lint errors -- **`trunk` CLI** - Unified super-linter -- **Project test suites** - Via GitHub Actions - -### Project Template (Future) - -| Component | Purpose | -| -------------- | ------------------------------------ | -| GitHub Actions | CI pipeline (lint, type-check, test) | -| K8s/Helm | Preview environments per PR | -| CNPG operator | Dynamic test databases | -| trunk.yaml | Unified linter config | - ## Documentation - [Architecture](docs/architecture.md) - System design and data flow - [Development](docs/development.md) - Local setup and commands -- [DBOS Workflows](docs/DBOS.md) - Durable task orchestration - [Contributing](CONTRIBUTING.md) - How to contribute to mainloop ## License diff --git a/ROADMAP.md b/ROADMAP.md new file mode 100644 index 0000000..d7d59b7 --- /dev/null +++ b/ROADMAP.md @@ -0,0 +1,41 @@ +# Roadmap + +Future ideas and features under consideration. + +## Agent Workflow Automation + +Structured workflow for code sessions: **plan in issue → implement in draft PR → iterate until CI green → ready for human review**. + +```text +┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ +│ Planning │────►│ Draft │────►│ Iteration │────►│ Review │ +│ (GH Issue) │ │ (PR) │ │ (CI Loop) │ │ (Human) │ +└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ +``` + +### Phases + +1. **Planning (GitHub Issue)** - Agent creates/updates an issue with problem analysis, proposed approach, and implementation plan. The issue is the "thinking out loud" space before code. + +2. **Draft PR** - Agent creates a draft PR linked to the issue. Implements in small, logical commits. Uses PR comments to narrate progress and decisions. + +3. **Iteration (CI Loop)** - Agent polls GitHub Actions after each push. On failure: analyzes logs, fixes, commits. Continues until green checkmark. + +4. **Ready for Review** - Agent marks PR ready and adds summary comment. Human reviewer steps in for final approval. + +### Verification Tools + +- **LSP server integration** - Real-time type/lint errors +- **`trunk` CLI** - Unified super-linter +- **Project test suites** - Via GitHub Actions + +## Project Template + +Standardized setup for repositories that work well with mainloop agents. + +| Component | Purpose | +| -------------- | ------------------------------------ | +| GitHub Actions | CI pipeline (lint, type-check, test) | +| K8s/Helm | Preview environments per PR | +| CNPG operator | Dynamic test databases | +| trunk.yaml | Unified linter config | diff --git a/docs/DBOS.md b/docs/DBOS.md deleted file mode 100644 index df5e86b..0000000 --- a/docs/DBOS.md +++ /dev/null @@ -1,1435 +0,0 @@ -# Build Reliable Applications With DBOS - -## Guidelines - -- Respond in a friendly and concise manner -- Ask clarifying questions when requirements are ambiguous -- Generate code in Python using the DBOS library -- You MUST import all methods and classes used in the code you generate -- You SHALL keep all code in a single file unless otherwise specified. -- DBOS does NOT stand for anything. - -## Workflow Guidelines - -Workflows provide durable execution so you can write programs that are resilient to any failure. -Workflows are comprised of steps, which are ordinary Python functions annotated with @DBOS.step(). 
-When using DBOS workflows, you should annotate any function that performs complex operations or accesses external APIs or services as a step. -You can turn any Python function into a step by annotating it with the @DBOS.step decorator. The only requirement is that its inputs and outputs should be serializable. - -If a workflow is interrupted for any reason (e.g., an executor restarts or crashes), when your program restarts the workflow automatically resumes execution from the last completed step. - -- If asked to add DBOS to existing code, you MUST ask which function to make a workflow. Do NOT recommend any changes until they have told you what function to make a workflow. Do NOT make a function a workflow unless SPECIFICALLY requested. -- When making a function a workflow, you should make all functions it calls steps. Do NOT change the functions in any way except by adding the @Step annotation. -- Do NOT make functions steps unless they are DIRECTLY called by a workflow. -- If the workflow function performs a non-deterministic action, you MUST move that action to its own function and make that function a step. Examples of non-deterministic actions include accessing an external API or service, accessing files on disk, generating a random number, of getting the current time. -- Do NOT use threads to start workflows or to start steps in workflows. You should instead use DBOS.start_workflow and DBOS queues. -- DBOS workflows and steps should NOT have side effects in memory outside of their own scope. They can access global variables, but they should NOT create or update global variables or variables outside their scope. -- Do NOT call DBOS.start_workflow or DBOS.recv from a step -- Do NOT start workflows from inside a step. -- Do NOT call DBOS.set_event and DBOS.recv from outside a workflow. - -## DBOS Lifecycle Guidelines - -Any DBOS program MUST configure the DBOS constructor at the top and MUST call DBOS.launch() in its main function. 
-DBOS must always be configured like so, unless otherwise specified: - -```python -import os -from dbos import DBOS, DBOSConfig - -config: DBOSConfig = { - "name": "my-app", - "system_database_url": os.environ.get("DBOS_SYSTEM_DATABASE_URL"), -} -DBOS(config=config) -``` - -And DBOS.launch() should always be called in the main function like so: - -```python -if __name__ == "__main__": - DBOS.launch() -``` - -In a FastAPI application, the server should ALWAYS be started explicitly after a DBOS.launch in the main function: - -```python -if __name__ == "__main__": - DBOS.launch() - uvicorn.run(app, host="0.0.0.0", port=8000) -``` - -If an app contains scheduled workflows and NOTHING ELSE (no HTTP server), then the main thread should block forever while the scheduled workflows run like this: - -```python -if __name__ == "__main__": - DBOS.launch() - threading.Event().wait() -``` - -Or if using asyncio: - -```python -import asyncio -from dbos import DBOS, DBOSConfig - -config: DBOSConfig = { - "name": "dbos-app" -} -DBOS(config=config) - - -async def main(): - DBOS.launch() - await asyncio.Event().wait() - -if __name__ == "__main__": - asyncio.run(main()) -``` - -## Workflow and Steps Examples - -Simple example: - -```python -import os -from dbos import DBOS, DBOSConfig - -config: DBOSConfig = { - "name": "dbos-starter", - "system_database_url": os.environ.get("DBOS_SYSTEM_DATABASE_URL"), -} -DBOS(config=config) - -@DBOS.step() -def step_one(): - print("Step one completed!") - -@DBOS.step() -def step_two(): - print("Step two completed!") - -@DBOS.workflow() -def dbos_workflow(): - step_one() - step_two() - -if __name__ == "__main__": - DBOS.launch() - dbos_workflow() -``` - -Example with FastAPI: - -```python -import os - -from dbos import DBOS, DBOSConfig -from fastapi import FastAPI - -app = FastAPI() -config: DBOSConfig = { - "name": "dbos-starter", - "system_database_url": os.environ.get("DBOS_SYSTEM_DATABASE_URL"), -} -DBOS(config=config) - -@DBOS.step() -def step_one(): - print("Step one completed!") - -@DBOS.step() -def step_two(): - print("Step two completed!") - -@app.get("/") -@DBOS.workflow() -def dbos_workflow(): - step_one() - step_two() - -if __name__ == "__main__": - DBOS.launch() - uvicorn.run(app, host="0.0.0.0", port=8000) -``` - -Example with queues: - -```python -import os -import time - -from dbos import DBOS, DBOSConfig, Queue -from fastapi import FastAPI - -app = FastAPI() -config: DBOSConfig = { - "name": "dbos-starter", - "system_database_url": os.environ.get("DBOS_SYSTEM_DATABASE_URL"), -} -DBOS(config=config) - -queue = Queue("example-queue") - -@DBOS.step() -def dbos_step(n: int): - time.sleep(5) - print(f"Step {n} completed!") - -@app.get("/") -@DBOS.workflow() -def dbos_workflow(): - print("Enqueueing steps") - handles = [] - for i in range(10): - handle = queue.enqueue(dbos_step, i) - handles.append(handle) - results = [handle.get_result() for handle in handles] - print(f"Successfully completed {len(results)} steps") - -if __name__ == "__main__": - DBOS.launch() - uvicorn.run(app, host="0.0.0.0", port=8000) -``` - -### Scheduled Workflow - -You can schedule DBOS workflows to run exactly once per time interval. To do this, annotate the workflow with the @DBOS.scheduled decorator and specify the schedule in crontab syntax. For example: - -```python -@DBOS.scheduled("* * * * *") -@DBOS.workflow() -def run_every_minute(scheduled_time, actual_time): - print(f"I am a scheduled workflow. 
It is currently {scheduled_time}.") -``` - -- A scheduled workflow MUST specify a crontab schedule. -- It MUST take in two arguments, scheduled and actual time. Both are datetime.datetimes of when the workflow started. - -## Workflow Documentation - ---- - -sidebar_position: 10 -title: Workflows -toc_max_heading_level: 3 - ---- - -Workflows provide **durable execution** so you can write programs that are **resilient to any failure**. -Workflows help you write fault-tolerant background tasks, data processing pipelines, AI agents, and more. - -You can make a function a workflow by annotating it with `@DBOS.workflow()`. -Workflows call steps, which are Python functions annotated with `@DBOS.step()`. -If a workflow is interrupted for any reason, DBOS automatically recovers its execution from the last completed step. - -Here's an example of a workflow: - -```python -@DBOS.step() -def step_one(): - print("Step one completed!") - -@DBOS.step() -def step_two(): - print("Step two completed!") - -@DBOS.workflow() -def workflow(): - step_one() - step_two() -``` - -## Starting Workflows In The Background - -One common use-case for workflows is building reliable background tasks that keep running even when the program is interrupted, restarted, or crashes. -You can use `DBOS.start_workflow` to start a workflow in the background. -If you start a workflow this way, it returns a workflow handle, from which you can access information about the workflow or wait for it to complete and retrieve its result. - -Here's an example: - -```python -@DBOS.workflow() -def background_task(input): - # ... - return output - -# Start the background task -handle: WorkflowHandle = DBOS.start_workflow(background_task, input) -# Wait for the background task to complete and retrieve its result. -output = handle.get_result() -``` - -After starting a workflow in the background, you can use `DBOS.retrieve_workflow` to retrieve a workflow's handle from its ID. -You can also retrieve a workflow's handle from outside of your DBOS application with `DBOSClient.retrieve_workflow`. - -If you need to run many workflows in the background and manage their concurrency or flow control, you can also use DBOS queues. - -## Workflow IDs and Idempotency - -Every time you execute a workflow, that execution is assigned a unique ID, by default a UUID. -You can access this ID through the `DBOS.workflow_id` context variable. -Workflow IDs are useful for communicating with workflows and developing interactive workflows. - -You can set the workflow ID of a workflow with `SetWorkflowID`. -Workflow IDs must be **globally unique** for your application. -An assigned workflow ID acts as an idempotency key: if a workflow is called multiple times with the same ID, it executes only once. -This is useful if your operations have side effects like making a payment or sending an email. -For example: - -```python -@DBOS.workflow() -def example_workflow(): - DBOS.logger.info(f"I am a workflow with ID {DBOS.workflow_id}") - -with SetWorkflowID("very-unique-id"): - example_workflow() -``` - -## Determinism - -Workflows are in most respects normal Python functions. -They can have loops, branches, conditionals, and so on. -However, a workflow function must be **deterministic**: if called multiple times with the same inputs, it should invoke the same steps with the same inputs in the same order (given the same return values from those steps). 
-If you need to perform a non-deterministic operation like accessing the database, calling a third-party API, generating a random number, or getting the local time, you shouldn't do it directly in a workflow function. -Instead, you should do all database operations in transactions and all other non-deterministic operations in steps. - -For example, **don't do this**: - -```python -@DBOS.workflow() -def example_workflow(): - choice = random.randint(0, 1) - if choice == 0: - step_one() - else: - step_two() -``` - -Do this instead: - -```python -@DBOS.step() -def generate_choice(): - return random.randint(0, 1) - -@DBOS.workflow() -def example_workflow(friend: str): - choice = generate_choice() - if choice == 0: - step_one() - else: - step_two() -``` - -## Workflow Timeouts - -You can set a timeout for a workflow with `SetWorkflowTimeout`. -When the timeout expires, the workflow **and all its children** are cancelled. -Cancelling a workflow sets its status to `CANCELLED` and preempts its execution at the beginning of its next step. - -Timeouts are **start-to-completion**: if a workflow is enqueued, the timeout does not begin until the workflow is dequeued and starts execution. -Also, timeouts are **durable**: they are stored in the database and persist across restarts, so workflows can have very long timeouts. - -Example syntax: - -```python -@DBOS.workflow() -def example_workflow(): - ... - -# If the workflow does not complete within 10 seconds, it times out and is cancelled -with SetWorkflowTimeout(10): - example_workflow() -``` - -## Durable Sleep - -You can use `DBOS.sleep()` to put your workflow to sleep for any period of time. -This sleep is **durable**—DBOS saves the wakeup time in the database so that even if the workflow is interrupted and restarted multiple times while sleeping, it still wakes up on schedule. - -Sleeping is useful for scheduling a workflow to run in the future (even days, weeks, or months from now). -For example: - -```python -@DBOS.workflow() -def schedule_task(time_to_sleep, task): - # Durably sleep for some time before running the task - DBOS.sleep(time_to_sleep) - run_task(task) -``` - -## Debouncing Workflows - -You can create a `Debouncer` to debounce your workflows. -Debouncing delays workflow execution until some time has passed since the workflow has last been called. -This is useful for preventing wasted work when a workflow may be triggered multiple times in quick succession. -For example, if a user is editing an input field, you can debounce their changes to execute a processing workflow only after they haven't edited the field for some time: - -### Debouncer.create - -```python -Debouncer.create( - workflow: Callable[P, R], - *, - debounce_timeout_sec: Optional[float] = None, - queue: Optional[Queue] = None, -) -> Debouncer[P, R] -``` - -**Parameters:** - -- `workflow`: The workflow to debounce. -- `debounce_key`: The debounce key for this debouncer. Used to group workflow executions that will be debounced. For example, if the debounce key is set to customer ID, each customer's workflows would be debounced separately. -- `debounce_timeout_sec`: After this time elapses since the first time a workflow is submitted from this debouncer, the workflow is started regardless of the debounce period. -- `queue`: When starting a workflow after debouncing, enqueue it on this queue instead of executing it directly. 
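For example, here is a minimal sketch that combines these options (assuming `Debouncer` and `Queue` can be imported from `dbos` and that DBOS is configured and launched as shown earlier; `generate_report` and the queue name are illustrative):

```python
from dbos import DBOS, Debouncer, Queue

report_queue = Queue("report_queue")

@DBOS.workflow()
def generate_report(user_input):
    ...

# Cap the total delay at 5 minutes; once the debounce period elapses,
# enqueue the workflow on report_queue instead of starting it directly.
report_debouncer = Debouncer.create(
    generate_report,
    debounce_timeout_sec=300,
    queue=report_queue,
)

def on_user_edit(user_id: str, user_input: str):
    # Debounce per user: run 10 seconds after that user's last edit.
    report_debouncer.debounce(user_id, 10, user_input)
```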
- -### debounce - -```python -debouncer.debounce( - debounce_key: str, - debounce_period_sec: float, - *args: P.args, - **kwargs: P.kwargs, -) -> WorkflowHandle[R] -``` - -Submit a workflow for execution but delay it by `debounce_period_sec`. -Returns a handle to the workflow. -The workflow may be debounced again, which further delays its execution (up to `debounce_timeout_sec`). -When the workflow eventually executes, it uses the **last** set of inputs passed into `debounce`. - -After the workflow begins execution, the next call to `debounce` starts the debouncing process again for a new workflow execution. - -**Parameters:** - -- `debounce_key`: A key used to group workflow executions that will be debounced together. For example, if the debounce key is set to customer ID, each customer's workflows would be debounced separately. -- `debounce_period_sec`: Delay this workflow's execution by this period. -- `*args`: Variadic workflow arguments. -- `**kwargs`: Variadic workflow keyword arguments. - -**Example Syntax**: - -```python -@DBOS.workflow() -def process_input(user_input): - ... - -# Each time a user submits a new input, debounce the process_input workflow. -# The workflow will wait until 60 seconds after the user stops submitting new inputs, -debouncer = Debouncer.create(process_input) -# then process the last input submitted. -def on_user_input_submit(user_id, user_input): - debounce_key = user_id - debounce_period_sec = 60 - debouncer.debounce(debounce_key, debounce_period_sec, user_input) -``` - -### Debouncer.create_async - -```python -Debouncer.create_async( - workflow: Callable[P, Coroutine[Any, Any, R]], - *, - debounce_timeout_sec: Optional[float] = None, - queue: Optional[Queue] = None, -) -> Debouncer[P, R] -``` - -Async version of `Debouncer.create`. - -### debounce_async - -```python -debouncer.debounce_async( - debounce_key: str, - debounce_period_sec: float, - *args: P.args, - **kwargs: P.kwargs, -) -> WorkflowHandleAsync[R]: -``` - -Async version of `debouncer.debounce`. - -## Coroutine (Async) Workflows - -Coroutinues (functions defined with `async def`, also known as async functions) can also be DBOS workflows. -Coroutine workflows may invoke coroutine steps via await expressions. -You should start coroutine workflows using `DBOS.start_workflow_async` and enqueue them using `enqueue_async`. -Calling a coroutine workflow or starting it with `DBOS.start_workflow_async` always runs it in the same event loop as its caller, but enqueueing it with `enqueue_async` starts the workflow in a different event loop. -Additionally, coroutine workflows should use the asynchronous versions of the workflow communication context methods. - -:::tip - -At this time, DBOS does not support coroutine transactions. -To execute transaction functions without blocking the event loop, use `asyncio.to_thread`. - -::: - -```python -@DBOS.step() -async def example_step(): - async with aiohttp.ClientSession() as session: - async with session.get("https://example.com") as response: - return await response.text() - -@DBOS.workflow() -async def example_workflow(friend: str): - await DBOS.sleep_async(10) - body = await example_step() - result = await asyncio.to_thread(example_transaction, body) - return result -``` - -## Workflow Versioning and Recovery - -Because DBOS recovers workflows by re-executing them using information saved in the database, a workflow cannot safely be recovered if its code has changed since the workflow was started. 
-To guard against this, DBOS _versions_ applications and their workflows. -When DBOS is launched, it computes an application version from a hash of the source code of its workflows (this can be overridden through the `application_version`) configuration parameter. -All workflows are tagged with the application version on which they started. - -When DBOS tries to recover workflows, it only recovers workflows whose version matches the current application version. -This prevents unsafe recovery of workflows that depend on different code. -You cannot change the version of a workflow, but you can use `DBOS.fork_workflow` to restart a workflow from a specific step on a specific code version. - -## Communicating with Workflows - -DBOS provides a few different ways to communicate with your workflows. -You can: - -- Send messages to workflows -- Publish events from workflows for clients to read -- Stream values from workflows to clients - -## Workflow Messaging and Notifications - -You can send messages to a specific workflow. -This is useful for signaling a workflow or sending notifications to it while it's running. - -### Send - -```python -DBOS.send( - destination_id: str, - message: Any, - topic: Optional[str] = None -) -> None -``` - -You can call `DBOS.send()` to send a message to a workflow. -Messages can optionally be associated with a topic and are queued on the receiver per topic. - -You can also call `send` from outside of your DBOS application with the DBOS Client. - -### Recv - -```python -DBOS.recv( - topic: Optional[str] = None, - timeout_seconds: float = 60, -) -> Any -``` - -Workflows can call `DBOS.recv()` to receive messages sent to them, optionally for a particular topic. -Each call to `recv()` waits for and consumes the next message to arrive in the queue for the specified topic, returning `None` if the wait times out. -If the topic is not specified, this method only receives messages sent without a topic. - -### Messages Example - -Messages are especially useful for sending notifications to a workflow. -For example, in the widget store demo, the checkout workflow, after redirecting customers to a payments page, must wait for a notification that the user has paid. - -To wait for this notification, the payments workflow uses `recv()`, executing failure-handling code if the notification doesn't arrive in time: - -```python -@DBOS.workflow() -def checkout_workflow(): - ... # Validate the order, then redirect customers to a payments service. - payment_status = DBOS.recv(PAYMENT_STATUS) - if payment_status is not None and payment_status == "paid": - ... # Handle a successful payment. - else: - ... # Handle a failed payment or timeout. -``` - -An endpoint waits for the payment processor to send the notification, then uses `send()` to forward it to the workflow: - -```python -@app.post("/payment_webhook/{workflow_id}/{payment_status}") -def payment_endpoint(payment_id: str, payment_status: str) -> Response: - # Send the payment status to the checkout workflow. - DBOS.send(payment_id, payment_status, PAYMENT_STATUS) -``` - -### Reliability Guarantees - -All messages are persisted to the database, so if `send` completes successfully, the destination workflow is guaranteed to be able to `recv` it. -If you're sending a message from a workflow, DBOS guarantees exactly-once delivery. -If you're sending a message from normal Python code, you can use `SetWorkflowID` with an idempotency key to guarantee exactly-once delivery. 
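For example, here is a sketch of that idempotency-key pattern in normal Python code, following the payment example above (the webhook handler and its arguments are illustrative names, and `SetWorkflowID` is assumed to be importable from `dbos`):

```python
from dbos import DBOS, SetWorkflowID

def handle_payment_webhook(delivery_id: str, checkout_workflow_id: str, payment_status: str):
    # Reuse the payment provider's delivery ID as the idempotency key, so a
    # retried webhook delivery sends the notification to the workflow only once.
    with SetWorkflowID(f"payment-notification-{delivery_id}"):
        DBOS.send(checkout_workflow_id, payment_status, "payment_status")
```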
- -## Workflow Events - -Workflows can publish _events_, which are key-value pairs associated with the workflow. -They are useful for publishing information about the status of a workflow or to send a result to clients while the workflow is running. - -### set_event - -```python -DBOS.set_event( - key: str, - value: Any, -) -> None -``` - -Any workflow or step can call `DBOS.set_event` to publish a key-value pair, or update its value if has already been published. - -### get_event - -```python -DBOS.get_event( - workflow_id: str, - key: str, - timeout_seconds: float = 60, -) -> None -``` - -You can call `DBOS.get_event` to retrieve the value published by a particular workflow identity for a particular key. -If the event does not yet exist, this call waits for it to be published, returning `None` if the wait times out. - -You can also call `get_event` from outside of your DBOS application with DBOS Client. - -### get_all_events - -```python -DBOS.get_all_events( - workflow_id: str -) -> Dict[str, Any] -``` - -You can use `DBOS.get_all_events` to retrieve the latest values of all events published by a workflow. - -### Events Example - -Events are especially useful for writing interactive workflows that communicate information to their caller. -For example, in the widget store demo, the checkout workflow, after validating an order, needs to send the customer a unique payment ID. -To communicate the payment ID to the customer, it uses events. - -The payments workflow emits the payment ID using `set_event()`: - -```python -@DBOS.workflow() -def checkout_workflow(): - ... - payment_id = ... - dbos.set_event(PAYMENT_ID, payment_id) - ... -``` - -The FastAPI handler that originally started the workflow uses `get_event()` to await this payment ID, then returns it: - -```python -@app.post("/checkout/{idempotency_key}") -def checkout_endpoint(idempotency_key: str) -> Response: - # Idempotently start the checkout workflow in the background. - with SetWorkflowID(idempotency_key): - handle = DBOS.start_workflow(checkout_workflow) - # Wait for the checkout workflow to send a payment ID, then return it. - payment_id = DBOS.get_event(handle.workflow_id, PAYMENT_ID) - if payment_id is None: - raise HTTPException(status_code=404, detail="Checkout failed to start") - return Response(payment_id) -``` - -### Events Reliability Guarantees - -All events are persisted to the database, so the latest version of an event is always retrievable. -Additionally, if `get_event` is called in a workflow, the retrieved value is persisted in the database so workflow recovery can use that value, even if the event is later updated. - -## Workflow Streaming - -Workflows can stream data in real time to clients. -This is useful for streaming results from a long-running workflow or LLM call or for monitoring or progress reporting. - -DBOS Steps - -### Writing to Streams - -```python -DBOS.write_stream( - key: str, - value: Any -) -> None: -``` - -You can write values to a stream from a workflow or its steps using `DBOS.write_stream`. -A workflow may have any number of streams, each identified by a unique key. - -When you are done writing to a stream, you should close it with `DBOS.close_stream`. -Otherwise, streams are automatically closed when the workflow terminates. - -```python -DBOS.close_stream( - key: str -) -> None -``` - -DBOS streams are immutable and append-only. -Writes to a stream from a workflow happen exactly-once. 
-Writes to a stream from a step happen at-least-once; if a step fails and is retried, it may write to the stream multiple times. -Readers will see all values written to the stream from all tries of the step in the order in which they were written. - -**Example syntax:** - -```python -@DBOS.workflow() -def producer_workflow(): - DBOS.write_stream(example_key, {"step": 1, "data": "value1"}) - DBOS.write_stream(example_key, {"step": 2, "data": "value2"}) - DBOS.close_stream(example_key) # Signal completion -``` - -### Reading from Streams - -```python -DBOS.read_stream( - workflow_id: str, - key: str -) -> Generator[Any, Any, None] -``` - -You can read values from a stream from anywhere using `DBOS.read_stream`. -This function reads values from a stream identified by a workflow ID and key, yielding each value in order until the stream is closed or the workflow terminates. - -You can also read from a stream from outside a DBOS application with a DBOS Client. - -**Example syntax:** - -```python -for value in DBOS.read_stream(workflow_id, example_key): - print(f"Received: {value}") -``` - -### Configurable Retries - -You can optionally configure a step to automatically retry any exception a set number of times with exponential backoff. -This is useful for automatically handling transient failures, like making requests to unreliable APIs. -Retries are configurable through arguments to the step decorator: - -```python -DBOS.step( - retries_allowed: bool = False, - interval_seconds: float = 1.0, - max_attempts: int = 3, - backoff_rate: float = 2.0 -) -``` - -For example, we configure this step to retry exceptions (such as if `example.com` is temporarily down) up to 10 times: - -```python -@DBOS.step(retries_allowed=True, max_attempts=10) -def example_step(): - return requests.get("https://example.com").text -``` - -## DBOS Queues - -You can use queues to run many workflows at once with managed concurrency. -Queues provide _flow control_, letting you manage how many workflows run at once or how often workflows are started. - -To create a queue, specify its name: - -```python -from dbos import Queue - -queue = Queue("example_queue") -``` - -You can then enqueue any DBOS workflow or step. -Enqueuing a function submits it for execution and returns a handle to it. -Queued tasks are started in first-in, first-out (FIFO) order. - -```python -queue = Queue("example_queue") - -@DBOS.workflow() -def process_task(task): - ... - -task = ... -handle = queue.enqueue(process_task, task) -``` - -### Queue Example - -Here's an example of a workflow using a queue to process tasks concurrently: - -```python -from dbos import DBOS, Queue - -queue = Queue("example_queue") - -@DBOS.workflow() -def process_task(task): - ... - -@DBOS.workflow() -def process_tasks(tasks): - task_handles = [] - # Enqueue each task so all tasks are processed concurrently. - for task in tasks: - handle = queue.enqueue(process_task, task) - task_handles.append(handle) - # Wait for each task to complete and retrieve its result. - # Return the results of all tasks. - return [handle.get_result() for handle in task_handles] -``` - -### Enqueueing from Another Application - -Often, you want to enqueue a workflow from outside your DBOS application. -For example, let's say you have an API server and a data processing service. -You're using DBOS to build a durable data pipeline in the data processing service. -When the API server receives a request, it should enqueue the data pipeline for execution on the data processing service. 
- -You can use the DBOS Client to enqueue workflows from outside your DBOS application by connecting directly to your DBOS application's system database. -Since the DBOS Client is designed to be used from outside your DBOS application, workflow and queue metadata must be specified explicitly. - -For example, this code enqueues the `data_pipeline` workflow on the `pipeline_queue` queue with `task` as an argument. - -```python -from dbos import DBOSClient, EnqueueOptions - -client = DBOSClient(system_database_url=os.environ["DBOS_SYSTEM_DATABASE_URL"]) - -options: EnqueueOptions = { - "queue_name": "pipeline_queue", - "workflow_name": "data_pipeline", -} -handle = client.enqueue(options, task) -result = handle.get_result() -``` - -### Managing Concurrency - -You can control how many workflows from a queue run simultaneously by configuring concurrency limits. -This helps prevent resource exhaustion when workflows consume significant memory or processing power. - -#### Worker Concurrency - -Worker concurrency sets the maximum number of workflows from a queue that can run concurrently on a single DBOS process. -This is particularly useful for resource-intensive workflows to avoid exhausting the resources of any process. -For example, this queue has a worker concurrency of 5, so each process will run at most 5 workflows from this queue simultaneously: - -```python -from dbos import Queue - -queue = Queue("example_queue", worker_concurrency=5) -``` - -#### Global Concurrency - -Global concurrency limits the total number of workflows from a queue that can run concurrently across all DBOS processes in your application. -For example, this queue will have a maximum of 10 workflows running simultaneously across your entire application. - -:::warning -Worker concurrency limits are recommended for most use cases. -Take care when using a global concurrency limit as any `PENDING` workflow on the queue counts toward the limit, including workflows from previous application versions -::: - -```python -from dbos import Queue - -queue = Queue("example_queue", concurrency=10) -``` - -#### In-Order Processing - -You can use a queue with `concurrency=1` to guarantee sequential, in-order processing of events. -Only a single event will be processed at a time. -For example, this app processes events sequentially in the order of their arrival: - -```python -from fastapi import FastAPI -from dbos import DBOS, Queue - -queue = Queue("in_order_queue", concurrency=1) - -@DBOS.step() -def process_event(event: str): - ... - -def event_endpoint(event: str): - queue.enqueue(process_event, event) -``` - -### Rate Limiting - -You can set _rate limits_ for a queue, limiting the number of functions that it can start in a given period. -Rate limits are global across all DBOS processes using this queue. -For example, this queue has a limit of 50 with a period of 30 seconds, so it may not start more than 50 functions in 30 seconds: - -```python -queue = Queue("example_queue", limiter={"limit": 50, "period": 30}) -``` - -Rate limits are especially useful when working with a rate-limited API, such as many LLM APIs. - -## Setting Timeouts - -You can set a timeout for an enqueued workflow with `SetWorkflowTimeout`. -When the timeout expires, the workflow **and all its children** are cancelled. -Cancelling a workflow sets its status to `CANCELLED` and preempts its execution at the beginning of its next step. - -Timeouts are **start-to-completion**: a workflow's timeout does not begin until the workflow is dequeued and starts execution. 
-Also, timeouts are **durable**: they are stored in the database and persist across restarts, so workflows can have very long timeouts. - -Example syntax: - -```python -@DBOS.workflow() -def example_workflow(): - ... - -queue = Queue("example-queue") - -# If the workflow does not complete within 10 seconds after being dequeued, it times out and is cancelled -with SetWorkflowTimeout(10): - queue.enqueue(example_workflow) -``` - -## Partitioning Queues - -You can **partition** queues to distribute work across dynamically created queue partitions. -When you enqueue a workflow on a partitioned queue, you must supply a queue partition key. -Partitioned queues dequeue workflows and apply flow control limits for individual partitions, not for the entire queue. -Essentially, you can think of each partition as a "subqueue" you dynamically create by enqueueing a workflow with a partition key. - -For example, suppose you want your users to each be able to run at most one task at a time. -You can do this with a partitioned queue with a maximum concurrency limit of 1 where the partition key is user ID. - -### Example Syntax - -```python -queue = Queue("partitioned_queue", partition_queue=True, concurrency=1) - -@DBOS.workflow() -def process_task(task: Task): - ... - - -def on_user_task_submission(user_id: str, task: Task): - # Partition the task queue by user ID. As the queue has a - # maximum concurrency of 1, this means that at most one - # task can run at once per user (but tasks from different - # users can run concurrently). - with SetEnqueueOptions(queue_partition_key=user_id): - queue.enqueue(process_task, task) -``` - -## Deduplication - -You can set a deduplication ID for an enqueued workflow with `SetEnqueueOptions`. -At any given time, only one workflow with a specific deduplication ID can be enqueued in the specified queue. -If a workflow with a deduplication ID is currently enqueued or actively executing (status `ENQUEUED` or `PENDING`), subsequent workflow enqueue attempt with the same deduplication ID in the same queue will raise a `DBOSQueueDeduplicatedError` exception. - -For example, this is useful if you only want to have one workflow active at a time per user—set the deduplication ID to the user's ID. - -Example syntax: - -```python -from dbos import DBOS, Queue, SetEnqueueOptions -from dbos import error as dboserror - -queue = Queue("example_queue") - -with SetEnqueueOptions(deduplication_id="my_dedup_id"): - try: - handle = queue.enqueue(example_workflow, ...) - except dboserror.DBOSQueueDeduplicatedError as e: - # Handle deduplication error -``` - -## Priority - -You can set a priority for an enqueued workflow with `SetEnqueueOptions`. -Workflows with the same priority are dequeued in **FIFO (first in, first out)** order. Priority values can range from `1` to `2,147,483,647`, where **a low number indicates a higher priority**. -If using priority, you must set `priority_enabled=True` on your queue. - -:::tip -Workflows without assigned priorities have the highest priority and are dequeued before workflows with assigned priorities. 
-::: - -Example syntax: - -```python -queue = Queue("priority_queue", priority_enabled=True) - -with SetEnqueueOptions(priority=10): - # All workflows are enqueued with priority set to 10 - # They will be dequeued in FIFO order - for task in tasks: - queue.enqueue(task_workflow, task) - -# first_workflow (priority=1) will be dequeued before all task_workflows (priority=10) -with SetEnqueueOptions(priority=1): - queue.enqueue(first_workflow) -``` - -## Python Classes - -You can add DBOS decorators to your Python class instance methods. -You can add step decorators to any class methods, but to add a workflow decorator to a class method, its class must inherit from `DBOSConfiguredInstance` and must be decorated with `@DBOS.dbos_class`. -For example: - -```python -@DBOS.dbos_class() -class URLFetcher(DBOSConfiguredInstance): - def __init__(self, url: str): - self.url = url - super().__init__(config_name=url) - - @DBOS.workflow() - def fetch_workflow(self): - return self.fetch_url() - - @DBOS.step() - def fetch_url(self): - return requests.get(self.url).text - -example_fetcher = URLFetcher("https://example.com") -print(example_fetcher.fetch_workflow()) -``` - -When you create a new instance of a DBOS class, `DBOSConfiguredInstance` must be instantiated with a `config_name`. -This `config_name` should be a unique identifier of the instance. -Additionally, all DBOS-decorated classes must be instantiated before `DBOS.launch()` is called. - -The reason for these requirements is to enable workflow recovery. -When you create a new instance of a DBOS class, DBOS stores it in a global registry indexed by `config_name`. -When DBOS needs to recover a workflow belonging to that class, it looks up the class instance using `config_name` so it can run the workflow using the right instance of its class. -If `config_name` is not supplied, or if DBOS classes are dynamically instantiated after `DBOS.launch()`, then DBOS may not find the class instance it needs to recover a workflow. - -### Testing DBOS Functions - -Unless otherwise specified, recommend testing DBOS applications with pytest. - -You MUST use this fixture to reset DBOS between tests: - -```python -@pytest.fixture() -def reset_dbos(): - DBOS.destroy() - config: DBOSConfig = { - "name": "my-app", - "database_url": os.environ.get("TESTING_DATABASE_URL"), - } - DBOS(config=config) - DBOS.reset_system_database() - DBOS.launch() -``` - -## Workflow Handle - -DBOS.start_workflow, DBOS.retrieve_workflow, and enqueue return workflow handles. - -### get_workflow_id - -```python -handle.get_workflow_id() -> str -``` - -Retrieve the ID of the workflow. - -#### get_result - -```python -handle.get_result() -> R -``` - -Wait for the workflow to complete, then return its result. - -#### get_status - -```python -handle.get_status() -> WorkflowStatus -``` - -## Workflow Management Methods - -### list_workflows - -```python -def list_workflows( - *, - workflow_ids: Optional[List[str]] = None, - status: Optional[str | list[str]] = None, - start_time: Optional[str] = None, - end_time: Optional[str] = None, - name: Optional[str] = None, - app_version: Optional[str] = None, - user: Optional[str] = None, - limit: Optional[int] = None, - offset: Optional[int] = None, - sort_desc: bool = False, - workflow_id_prefix: Optional[str] = None, -) -> List[WorkflowStatus]: -``` - -Retrieve a list of `WorkflowStatus` of all workflows matching specified criteria. - -**Parameters:** - -- **workflow_ids**: Retrieve workflows with these IDs. 
-- **workflow_id_prefix**: Retrieve workflows whose IDs start with the specified string. -- **status**: Retrieve workflows with this status (or one of these statuses) (Must be `ENQUEUED`, `PENDING`, `SUCCESS`, `ERROR`, `CANCELLED`, or `MAX_RECOVERY_ATTEMPTS_EXCEEDED`) -- **start_time**: Retrieve workflows started after this (RFC 3339-compliant) timestamp. -- **end_time**: Retrieve workflows started before this (RFC 3339-compliant) timestamp. -- **name**: Retrieve workflows with this fully-qualified name. -- **app_version**: Retrieve workflows tagged with this application version. -- **user**: Retrieve workflows run by this authenticated user. -- **limit**: Retrieve up to this many workflows. -- **offset**: Skip this many workflows from the results returned (for pagination). -- **sort_desc**: Whether to sort the results in descending (`True`) or ascending (`False`) order by workflow start time. - -### list_queued_workflows - -```python -def list_queued_workflows( - *, - queue_name: Optional[str] = None, - status: Optional[str | list[str]] = None, - start_time: Optional[str] = None, - end_time: Optional[str] = None, - name: Optional[str] = None, - limit: Optional[int] = None, - offset: Optional[int] = None, - sort_desc: bool = False, -) -> List[WorkflowStatus]: -``` - -Retrieve a list of `WorkflowStatus` of all **currently enqueued** workflows matching specified criteria. - -**Parameters:** - -- **queue_name**: Retrieve workflows running on this queue. -- **status**: Retrieve workflows with this status (or one of these statuses) (Must be `ENQUEUED` or `PENDING`) -- **start_time**: Retrieve workflows enqueued after this (RFC 3339-compliant) timestamp. -- **end_time**: Retrieve workflows enqueued before this (RFC 3339-compliant) timestamp. -- **name**: Retrieve workflows with this fully-qualified name. -- **limit**: Retrieve up to this many workflows. -- **offset**: Skip this many workflows from the results returned (for pagination). - -### list_workflow_steps - -```python -def list_workflow_steps( - workflow_id: str, -) -> List[StepInfo] -``` - -Retrieve the steps of a workflow. -This is a list of `StepInfo` objects, with the following structure: - -```python -class StepInfo(TypedDict): - # The unique ID of the step in the workflow. One-indexed. - function_id: int - # The (fully qualified) name of the step - function_name: str - # The step's output, if any - output: Optional[Any] - # The error the step threw, if any - error: Optional[Exception] - # If the step starts or retrieves the result of a workflow, its ID - child_workflow_id: Optional[str] -``` - -### cancel_workflow - -```python -DBOS.cancel_workflow( - workflow_id: str, -) -> None -``` - -Cancel a workflow. -This sets is status to `CANCELLED`, removes it from its queue (if it is enqueued) and preempts its execution (interrupting it at the beginning of its next step) - -### resume_workflow - -```python -DBOS.resume_workflow( - workflow_id: str -) -> WorkflowHandle[R] -``` - -Resume a workflow. -This immediately starts it from its last completed step. -You can use this to resume workflows that are cancelled or have exceeded their maximum recovery attempts. -You can also use this to start an enqueued workflow immediately, bypassing its queue. - -### fork_workflow - -```python -DBOS.fork_workflow( - workflow_id: str, - start_step: int, - *, - application_version: Optional[str] = None, -) -> WorkflowHandle[R] -``` - -Start a new execution of a workflow from a specific step. 
-The input step ID must match the `function_id` of the step returned by `list_workflow_steps`. -The specified `start_step` is the step from which the new workflow will start, so any steps whose ID is less than `start_step` will not be re-executed. - -The forked workflow will have a new workflow ID, which can be set with `SetWorkflowID`. -It is possible to specify the application version on which the forked workflow will run by setting `application_version`, this is useful for "patching" workflows that failed due to a bug in a previous application version. - -### Workflow Status - -Some workflow introspection and management methods return a `WorkflowStatus`. -This object has the following definition: - -```python -class WorkflowStatus: - # The workflow ID - workflow_id: str - # The workflow status. Must be one of ENQUEUED, PENDING, SUCCESS, ERROR, CANCELLED, or MAX_RECOVERY_ATTEMPTS_EXCEEDED - status: str - # The name of the workflow function - name: str - # The number of times this workflow has been started - recovery_attempts: int - # The name of the workflow's class, if any - class_name: Optional[str] - # The name with which the workflow's class instance was configured, if any - config_name: Optional[str] - # The user who ran the workflow, if specified - authenticated_user: Optional[str] - # The role with which the workflow ran, if specified - assumed_role: Optional[str] - # All roles which the authenticated user could assume - authenticated_roles: Optional[list[str]] - # The deserialized workflow input object - input: Optional[WorkflowInputs] - # The workflow's output, if any - output: Optional[Any] - # The error the workflow threw, if any - error: Optional[Exception] - # Workflow start time, as a Unix epoch timestamp in ms - created_at: Optional[int] - # Last time the workflow status was updated, as a Unix epoch timestamp in ms - updated_at: Optional[int] - # If this workflow was enqueued, on which queue - queue_name: Optional[str] - # The ID of the executor (process) that most recently executed this workflow - executor_id: Optional[str] - # The application version on which this workflow was started - app_version: Optional[str] -``` - -Retrieve the workflow status. - -## Configuring DBOS - -To configure DBOS, pass a `DBOSConfig` object to its constructor. -For example: - -```python -config: DBOSConfig = { - "name": "dbos-example", - "system_database_url": os.environ["DBOS_SYSTEM_DATABASE_URL"], -} -DBOS(config=config) -``` - -The `DBOSConfig` object has the following fields. -All fields except `name` are optional. - -```python -class DBOSConfig(TypedDict): - name: str - - system_database_url: Optional[str] - application_database_url: Optional[str] - sys_db_pool_size: Optional[int] - db_engine_kwargs: Optional[Dict[str, Any]] - dbos_system_schema: Optional[str] - system_database_engine: Optional[sqlalchemy.Engine] - - conductor_key: Optional[str] - - enable_otlp: Optional[bool] - otlp_traces_endpoints: Optional[List[str]] - otlp_logs_endpoints: Optional[List[str]] - otlp_attributes: Optional[dict[str, str]] - log_level: Optional[str] - - run_admin_server: Optional[bool] - admin_port: Optional[int] - - application_version: Optional[str] - executor_id: Optional[str] - - serializer: Optional[Serializer] -``` - -- **name**: Your application's name. -- **system_database_url**: A connection string to your system database. - This is the database in which DBOS stores workflow and step state. - This may be either Postgres or SQLite, though Postgres is recommended for production. 
- DBOS uses this connection string, unmodified, to create a SQLAlchemy Engine - A valid connection string looks like: - -```text -postgresql://[username]:[password]@[hostname]:[port]/[database name] -``` - -Or with SQLite: - -```text -sqlite:///[path to database file] -``` - -:::info -Passwords in connection strings must be escaped (for example with urllib) if they contain special characters. -::: - -If no connection string is provided, DBOS uses a SQLite database: - -```shell -sqlite:///[application_name].sqlite -``` - -- **application_database_url**: A connection string to your application database. - This is the database in which DBOS executes `@DBOS.transaction` functions. - This parameter has the same format and default as `system_database_url`. - If you are not using `@DBOS.transaction`, you do not need to supply this parameter. -- **db_engine_kwargs**: Additional keyword arguments passed to SQLAlchemy’s `create_engine()`. - Defaults to: - -```python -{ - "pool_size": 20, - "max_overflow": 0, - "pool_timeout": 30, -} -``` - -- **sys_db_pool_size**: The size of the connection pool used for the DBOS system database. Defaults to 20. -- **dbos_system_schema**: Postgres schema name for DBOS system tables. Defaults to "dbos". -- **system_database_engine**: A custom SQLAlchemy engine to use to connect to your system database. If provided, DBOS will not create an engine but use this instead. -- **conductor_key**: An API key for DBOS Conductor. If provided, application is connected to Conductor. API keys can be created from the DBOS console. -- **enable_otlp**: Enable DBOS OpenTelemetry tracing and export. Defaults to False. -- **otlp_traces_endpoints**: DBOS operations automatically generate OpenTelemetry Traces. Use this field to declare a list of OTLP-compatible trace receivers. Requires `enable_otlp` to be True. -- **otlp_logs_endpoints**: the DBOS logger can export OTLP-formatted log signals. Use this field to declare a list of OTLP-compatible log receivers. Requires `enable_otlp` to be True. -- **otlp_attributes**: A set of attributes (key-value pairs) to apply to all OTLP-exported logs and traces. -- **log_level**: Configure the DBOS logger severity. Defaults to `INFO`. -- **run_admin_server**: Whether to run an HTTP admin server for workflow management operations. Defaults to True. -- **admin_port**: The port on which the admin server runs. Defaults to 3001. -- **application_version**: The code version for this application and its workflows. Workflow versioning is documented here. -- **executor_id**: Executor ID, used to identify the application instance in distributed environments. It is also useful for distributed workflow recovery -- **serializer**: A custom serializer for the system database. - -### Custom Serialization - -DBOS must serialize data such as workflow inputs and outputs and step outputs to store it in the system database. -By default, data is serialized with `pickle` then Base64-encoded, but you can optionally supply a custom serializer through DBOS configuration. 
-A custom serializer must match this interface: - -```python -class Serializer(ABC): - - @abstractmethod - def serialize(self, data: Any) -> str: - pass - - @abstractmethod - def deserialize(cls, serialized_data: str) -> Any: - pass -``` - -For example, here is how to configure DBOS to use a JSON serializer: - -```python -from dbos import DBOS, DBOSConfig, Serializer - -class JsonSerializer(Serializer): - def serialize(self, data: Any) -> str: - return json.dumps(data) - - def deserialize(cls, serialized_data: str) -> Any: - return json.loads(serialized_data) - -serializer = JsonSerializer() -config: DBOSConfig = { - "name": "dbos-starter", - "system_database_url": os.environ.get("DBOS_SYSTEM_DATABASE_URL"), - "serializer": serializer -} -DBOS(config=config) -DBOS.launch() -``` - -### Transactions - -Transactions are a special type of step that are optimized for database accesses. -They execute as a single database transaction. - -ONLY use transactions if you are SPECIFICALLY requested to perform database operations, DO NOT USE THEM OTHERWISE. - -If asked to add DBOS to code that already contains database operations, ALWAYS make it a step, do NOT attempt to make it a transaction unless requested. - -ONLY use transactions with a Postgres database. -To access any other database, ALWAYS use steps. - -To make a Python function a transaction, annotate it with the DBOS.transaction decorator. -Then, access the database using the DBOS.sql_session client, which is a SQLAlchemy client DBOS automatically connects to your database. -Here are some examples: - -#### SQLAlchemy - -```python -greetings = Table( - "greetings", - MetaData(), - Column("name", String), - Column("note", String) -) - -@DBOS.transaction() -def example_insert(name: str, note: str) -> None: - # Insert a new greeting into the database - DBOS.sql_session.execute(greetings.insert().values(name=name, note=note)) - -@DBOS.transaction() -def example_select(name: str) -> Optional[str]: - # Select the first greeting to a particular name - row = DBOS.sql_session.execute( - select(greetings.c.note).where(greetings.c.name == name) - ).first() - return row[0] if row else None -``` - -#### Raw SQL - -```python -@DBOS.transaction() -def example_insert(name: str, note: str) -> None: - # Insert a new greeting into the database - sql = text("INSERT INTO greetings (name, note) VALUES (:name, :note)") - DBOS.sql_session.execute(sql, {"name": name, "note": note}) - - -@DBOS.transaction() -def example_select(name: str) -> Optional[str]: - # Select the first greeting to a particular name - sql = text("SELECT note FROM greetings WHERE name = :name LIMIT 1") - row = DBOS.sql_session.execute(sql, {"name": name}).first() - return row[0] if row else None -``` - -NEVER async def a transaction. 
diff --git a/docs/architecture.md b/docs/architecture.md index f7758c9..f63d866 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -59,8 +59,8 @@ Mainloop uses a main thread + sessions pattern where your continuous conversatio Sessions are the unified model for all background work: -- **Simple sessions**: Claude conversations without code (research, analysis) -- **Code sessions**: GitHub integration with plan → implement → PR workflow +- **Simple sessions**: Research, analysis, or any conversation-based task +- **Dev sessions**: Code work with their own K8s namespace for isolation Each session has: From fc1eb7836855de9ba5332adc44a9936ae45e2c6e Mon Sep 17 00:00:00 2001 From: James Olds <12104969+oldsj@users.noreply.github.com> Date: Thu, 15 Jan 2026 11:30:21 -0500 Subject: [PATCH 2/5] Add simplified agent workflow to README MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Agent workflow: spawn → work in k8s ns → PR → close - Emphasizes staying in main thread while agents work - Trim ROADMAP.md to just future automation features Co-Authored-By: Claude Opus 4.5 --- README.md | 21 +++++++++++++++++++++ ROADMAP.md | 35 +++++++++++------------------------ 2 files changed, 32 insertions(+), 24 deletions(-) diff --git a/README.md b/README.md index 7945396..637eac0 100644 --- a/README.md +++ b/README.md @@ -98,6 +98,27 @@ mainloop/ 3. Completed and failed sessions 4. Each session has its own conversation you can zoom into +## Agent Workflow + +Agents are sessions spawned for development tasks. Each agent gets its own K8s namespace for isolated iteration. + +```text +┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ +│ Spawn │────►│ Work │────►│ PR │────►│ Close │ +│ (main) │ │ (k8s ns) │ │ (GitHub) │ │ (summary) │ +└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ + ▲ │ + └───────────────────┘ + check in / spawn more +``` + +1. **Spawn** - Main thread creates agent for a task +2. **Work** - Agent iterates in its own K8s namespace (build, test, debug) +3. **PR** - Agent creates and merges GitHub PR when ready +4. **Close** - Agent posts summary back to main thread + +You stay in main thread, checking in on agents and spawning new ones as needed. + ## Documentation - [Architecture](docs/architecture.md) - System design and data flow diff --git a/ROADMAP.md b/ROADMAP.md index d7d59b7..901950f 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -2,36 +2,23 @@ Future ideas and features under consideration. -## Agent Workflow Automation +## CI Loop Automation -Structured workflow for code sessions: **plan in issue → implement in draft PR → iterate until CI green → ready for human review**. +Agent automatically iterates on CI failures: +- Poll GitHub Actions after each push +- On failure: analyze logs, fix, commit +- Continue until green checkmark -```text -┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ -│ Planning │────►│ Draft │────►│ Iteration │────►│ Review │ -│ (GH Issue) │ │ (PR) │ │ (CI Loop) │ │ (Human) │ -└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ -``` +## GitHub Issue Planning -### Phases - -1. **Planning (GitHub Issue)** - Agent creates/updates an issue with problem analysis, proposed approach, and implementation plan. The issue is the "thinking out loud" space before code. - -2. **Draft PR** - Agent creates a draft PR linked to the issue. Implements in small, logical commits. Uses PR comments to narrate progress and decisions. - -3. 
**Iteration (CI Loop)** - Agent polls GitHub Actions after each push. On failure: analyzes logs, fixes, commits. Continues until green checkmark. - -4. **Ready for Review** - Agent marks PR ready and adds summary comment. Human reviewer steps in for final approval. - -### Verification Tools - -- **LSP server integration** - Real-time type/lint errors -- **`trunk` CLI** - Unified super-linter -- **Project test suites** - Via GitHub Actions +Agent creates/updates issues before coding: +- Problem analysis and proposed approach +- Implementation plan as "thinking out loud" space +- Links PR back to issue for context ## Project Template -Standardized setup for repositories that work well with mainloop agents. +Standardized repo setup for mainloop agents: | Component | Purpose | | -------------- | ------------------------------------ | From 28e97d805a65da94d5b05cd376b12f4e03b6a751 Mon Sep 17 00:00:00 2001 From: James Olds <12104969+oldsj@users.noreply.github.com> Date: Thu, 15 Jan 2026 11:38:00 -0500 Subject: [PATCH 3/5] Docs as specs: source of truth for app behavior MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Add docs/specs/ with chat.md, sessions.md, layout.md - Specs are human-readable but detailed enough for planner agent - Delete question-answering tests (feature removed) - Update CLAUDE.md: specs → planner → tests workflow - Update README: link to specs as primary docs Co-Authored-By: Claude Opus 4.5 --- CLAUDE.md | 9 +- README.md | 8 +- docs/specs/chat.md | 23 + docs/specs/layout.md | 17 + docs/specs/sessions.md | 46 + frontend/tests/question-answering.plan.md | 890 ------------------ .../01-display-needs-input-badge.spec.ts | 27 - .../05-select-option-advance.spec.ts | 23 - .../08-custom-answer-enter.spec.ts | 20 - .../18-submit-answers.spec.ts | 20 - 10 files changed, 98 insertions(+), 985 deletions(-) create mode 100644 docs/specs/chat.md create mode 100644 docs/specs/layout.md create mode 100644 docs/specs/sessions.md delete mode 100644 frontend/tests/question-answering.plan.md delete mode 100644 frontend/tests/question-answering/01-display-needs-input-badge.spec.ts delete mode 100644 frontend/tests/question-answering/05-select-option-advance.spec.ts delete mode 100644 frontend/tests/question-answering/08-custom-answer-enter.spec.ts delete mode 100644 frontend/tests/question-answering/18-submit-answers.spec.ts diff --git a/CLAUDE.md b/CLAUDE.md index 41860a2..077e8dc 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -46,9 +46,10 @@ make test-reset # Clear DB + namespaces between runs **Playwright agents for test maintenance:** -- Don't manually tweak tests - use `playwright-test-healer` to auto-fix failures -- For new features, use `playwright-test-planner` to explore and generate plans +- Specs in `docs/specs/` are the source of truth for app behavior +- Use `playwright-test-planner` to generate test plans from specs - Use `playwright-test-generator` to create tests from plans +- Use `playwright-test-healer` to auto-fix failing tests ### Test Architecture (Flakiness Prevention) @@ -130,10 +131,10 @@ make setup-claude-creds # Extract Claude credentials from Keychain ## Documentation Philosophy ```text -README.md → docs/ → specs/ → tests/ +docs/specs/*.md → playwright-test-planner → tests/*.spec.ts ``` -Specs define behavior. Tests are the source of truth. Keep docs in sync by running planner agent after feature changes. +Specs are the source of truth. 
They describe user-visible behavior in human-readable form, detailed enough for `playwright-test-planner` to generate tests. If tests fail: spec is wrong, code is wrong, or feature is in active development. ## Important diff --git a/README.md b/README.md index 637eac0..0c4783d 100644 --- a/README.md +++ b/README.md @@ -121,9 +121,15 @@ You stay in main thread, checking in on agents and spawning new ones as needed. ## Documentation +**Specs** (source of truth for app behavior): +- [Chat](docs/specs/chat.md) - Main thread conversation +- [Sessions](docs/specs/sessions.md) - Background work and status +- [Layout](docs/specs/layout.md) - Mobile and desktop views + +**Guides**: - [Architecture](docs/architecture.md) - System design and data flow - [Development](docs/development.md) - Local setup and commands -- [Contributing](CONTRIBUTING.md) - How to contribute to mainloop +- [Contributing](CONTRIBUTING.md) - How to contribute ## License diff --git a/docs/specs/chat.md b/docs/specs/chat.md new file mode 100644 index 0000000..cb388ee --- /dev/null +++ b/docs/specs/chat.md @@ -0,0 +1,23 @@ +# Chat + +The main thread is a continuous conversation with Claude that persists across devices. + +## Sending Messages + +- Input field with placeholder "Enter command..." +- EXEC button submits the message +- Message appears in conversation immediately +- Assistant response streams in below + +## Conversation History + +- Messages persist across page reloads +- Context maintained in follow-up messages +- User messages and assistant responses displayed in sequence + +## Spawning Sessions + +From the main thread, you can ask Claude to spawn sessions: +- Sessions appear as colored thread blocks in the timeline +- Session messages surface as thread notifications +- Click to expand inline or zoom to fullscreen view diff --git a/docs/specs/layout.md b/docs/specs/layout.md new file mode 100644 index 0000000..4052ead --- /dev/null +++ b/docs/specs/layout.md @@ -0,0 +1,17 @@ +# Layout + +Mainloop is responsive across mobile and desktop viewports. + +## Desktop + +- Chat takes main area +- Sessions sidebar always visible on the right +- No tab bar + +## Mobile + +- Bottom tab bar with Chat and Sessions tabs +- Chat tab active by default on load +- Tab bar hidden on desktop viewports +- Touch targets sized appropriately for mobile interaction +- Tabs switch between Chat and Sessions views diff --git a/docs/specs/sessions.md b/docs/specs/sessions.md new file mode 100644 index 0000000..a651428 --- /dev/null +++ b/docs/specs/sessions.md @@ -0,0 +1,46 @@ +# Sessions + +Sessions are background AI work spawned from the main thread. Each session has its own conversation and runs independently. + +## Session List + +Desktop shows sessions in a sidebar. Mobile shows sessions in a tab. + +When no sessions exist: +- Shows empty state with "No sessions yet" message +- Shows hint: "Sessions appear when Claude spawns background work" + +When sessions exist: +- Each session shows title and status badge +- Active count shown in header (e.g., "2 active") +- Clicking a session opens its detail view + +## Status Badges + +| Status | Badge | Meaning | +|--------|-------|---------| +| pending | PENDING | Queued, not started | +| active | ACTIVE | Currently running | +| waiting_on_user | NEEDS INPUT | Blocked on user response | +| completed | DONE | Finished successfully | +| failed | FAILED | Error occurred | + +Failed sessions show error message below the badge. 
+ +## Session Detail View + +Clicking a session navigates to `/sessions/{id}`: +- Shows title as h1 heading +- Shows description if present +- Has Chat and Logs tabs (Chat tab active by default) +- Active sessions show Cancel button +- Completed sessions show Summary section +- Failed sessions show Error section +- Back button returns to home +- Non-existent session ID shows "Session not found" with link to home + +## Notifications + +When a session needs attention: +- Toast notification appears with title and preview +- Clicking notification navigates to that session's detail view diff --git a/frontend/tests/question-answering.plan.md b/frontend/tests/question-answering.plan.md deleted file mode 100644 index a5aa5e9..0000000 --- a/frontend/tests/question-answering.plan.md +++ /dev/null @@ -1,890 +0,0 @@ -# Question Answering Flow Test Plan - -## Application Overview - -This test plan covers the question answering workflow in mainloop, where worker tasks request input from users before proceeding. The workflow includes: - -1. Tasks enter "waiting_questions" status when they need clarification -2. Tasks display with "NEEDS INPUT" badge in the inbox -3. Questions are auto-expanded for user attention -4. Users can select from predefined options or provide custom text answers -5. Questions collapse after answering and advance to the next unanswered question -6. Users can edit previously answered questions -7. Once all questions are answered, a "Continue" button appears to submit all answers -8. After submission, the task transitions to "planning" status and continues execution - -The UI implements a progressive disclosure pattern where: - -- Only one question is expanded at a time (the currently active one) -- Answered questions show in collapsed summary view with checkmark -- Unanswered questions ahead are dimmed -- Focus automatically advances through questions for smooth completion - -Key UI elements tested: - -- Question display with numbered badges (1 of N) -- Option buttons with accent highlighting on selection -- Custom text input with auto-focus and Enter key submission -- Progress indicators and question counters -- Answer editing workflow -- Submit button enabling/disabling based on completion -- Loading states during submission -- Error handling and retry capability - -## Test Scenarios - -### 1. Question Viewing and Display - -**Seed:** `frontend/tests/fixtures/seed-data.ts` - -#### 1.1. Display task with NEEDS INPUT badge - -**File:** `frontend/tests/question-answering/01-display-needs-input-badge.spec.ts` - -**Steps:** - -1. Seed a task in 'waiting_questions' status with 2 questions using seedTaskWaitingQuestions() -2. Navigate to the mainloop app at / -3. Wait for the app shell to load (heading '$ mainloop' visible) -4. Verify task appears in inbox with 'NEEDS INPUT' badge -5. Verify badge has warning styling (border-term-warning text-term-warning) -6. Verify task is auto-expanded (expandedTaskIds includes task ID) - -**Expected Results:** - -- Task card is visible in inbox -- Badge displays 'NEEDS INPUT' text -- Badge has yellow/warning color styling -- Task is automatically expanded to show questions (user doesn't need to click) -- First question is visible and expanded - -#### 1.2. Display first unanswered question expanded - -**File:** `frontend/tests/question-answering/02-first-question-expanded.spec.ts` - -**Steps:** - -1. Seed a task with 3 questions using seedTaskWaitingQuestions() with custom questions array -2. Navigate to the app -3. Locate the expanded task -4. 
Verify the first question is in expanded state -5. Verify question number badge shows '1' with accent border -6. Verify question header is displayed (e.g., 'Authentication Method') -7. Verify question text is fully visible -8. Verify all option buttons are displayed -9. Verify custom text input is visible with placeholder 'Or type a custom answer...' -10. Verify second and third questions are visible but dimmed (opacity-40) -11. Verify future questions show number badges but are not expanded - -**Expected Results:** - -- Only the first question is shown in expanded state -- Question displays numbered badge (1) with term-accent border -- Question header is shown in a badge with term-accent styling -- Question text is clearly readable -- All option buttons are rendered and clickable -- Custom input field is present and enabled -- Input field has auto-focus -- Questions 2 and 3 are visible but dimmed (40% opacity) -- Future questions show only number badge and header, not full content - -#### 1.3. Display question with multiple choice options - -**File:** `frontend/tests/question-answering/03-display-options.spec.ts` - -**Steps:** - -1. Seed a task with a question that has 3 options: 'Yes', 'No', 'Maybe' -2. Navigate to the app -3. Expand the task if not auto-expanded -4. Locate the active question section -5. Count the option buttons displayed -6. Verify each option shows its label text -7. Verify options have border-term-border styling (unselected state) -8. Verify options have hover effect (hover:border-term-accent) -9. Verify all options are enabled (not disabled) - -**Expected Results:** - -- Three option buttons are visible -- Each button displays correct label: 'Yes', 'No', 'Maybe' -- Buttons have default border styling (border-term-border) -- Buttons change border on hover to term-accent color -- All buttons are clickable (not disabled) -- Buttons are arranged horizontally with gap spacing - -#### 1.4. Display question counter and progress - -**File:** `frontend/tests/question-answering/04-question-counter.spec.ts` - -**Steps:** - -1. Seed a task with exactly 5 questions -2. Navigate to the app -3. Locate the first expanded question -4. Verify the question number badge shows '1' -5. Verify all 5 question placeholders are visible in the list -6. Count total number of question elements (should be 5) -7. Verify questions 2-5 are in dimmed state - -**Expected Results:** - -- First question shows numbered badge with '1' -- All 5 questions are rendered in the list -- Questions 2-5 have reduced opacity (dimmed) -- User can see total number of questions to answer -- Progress is visually clear (1 active out of 5 total) - -### 2. Answering Questions with Options - -**Seed:** `frontend/tests/fixtures/seed-data.ts` - -#### 2.1. Select option and auto-advance to next question - -**File:** `frontend/tests/question-answering/05-select-option-advance.spec.ts` - -**Steps:** - -1. Seed a task with 3 questions -2. Navigate to the app -3. Verify first question (q1) is expanded -4. Click the first option button (e.g., 'Yes') -5. Wait for UI update -6. Verify q1 collapses to summary view -7. Verify q1 shows checkmark (✓) icon -8. Verify q1 summary displays selected answer -9. Verify q2 automatically expands (becomes active) -10. Verify q3 remains dimmed -11. 
Verify Continue button is NOT visible yet (not all answered) - -**Expected Results:** - -- First question transitions from expanded to collapsed state -- Collapsed question shows green checkmark (text-term-accent-alt) -- Selected answer 'Yes' is displayed in the summary -- Question 2 automatically becomes the active question -- Question 2 expands with all its options visible -- Question 3 stays in future/dimmed state -- Custom input in q2 receives auto-focus -- No submit button appears until all questions answered - -#### 2.2. Select different options for multiple questions - -**File:** `frontend/tests/question-answering/06-select-multiple-options.spec.ts` - -**Steps:** - -1. Seed a task with 2 questions, each with different options -2. Navigate to the app -3. On question 1, click 'JWT' option -4. Wait for q2 to expand -5. On question 2, click 'Yes' option -6. Verify both questions show in collapsed summary state -7. Verify q1 summary shows 'JWT' -8. Verify q2 summary shows 'Yes' -9. Verify both have checkmark icons -10. Verify Continue button appears - -**Expected Results:** - -- Question 1 collapses with 'JWT' displayed -- Question 2 collapses with 'Yes' displayed -- Both questions show green checkmarks -- Both answers are preserved in the UI -- Continue button becomes visible -- Continue button has term-accent-alt styling -- Continue button is enabled - -#### 2.3. Option selection highlights correctly - -**File:** `frontend/tests/question-answering/07-option-highlight.spec.ts` - -**Steps:** - -1. Seed a task with one question having 3 options -2. Navigate to the app -3. Locate the active question with options -4. Click the second option button -5. Verify clicked option has accent border (border-term-accent) -6. Verify clicked option has accent text color (text-term-accent) -7. Verify clicked option has accent background (bg-term-accent/10) -8. Verify other options remain with default border (border-term-border) -9. Verify only one option is highlighted at a time - -**Expected Results:** - -- Selected option button changes to accent color scheme -- Selected option has border-term-accent class -- Selected option has text-term-accent class -- Selected option has subtle background tint (bg-term-accent/10) -- Unselected options keep default styling -- Selection state is visually distinct and clear -- Only the clicked option shows selection styling - -### 3. Custom Text Answers - -**Seed:** `frontend/tests/fixtures/seed-data.ts` - -#### 3.1. Type custom answer and submit with Enter - -**File:** `frontend/tests/question-answering/08-custom-answer-enter.spec.ts` - -**Steps:** - -1. Seed a task with 2 questions -2. Navigate to the app -3. Locate the custom text input field in the first question -4. Verify input has auto-focus -5. Type 'My custom authentication approach' into the input -6. Verify 'OK' button appears next to the input -7. Press Enter key -8. Verify question 1 collapses -9. Verify custom text 'My custom authentication approach' appears in summary -10. Verify question 2 auto-expands - -**Expected Results:** - -- Input field automatically receives focus on mount -- Typed text appears in the input field -- OK button becomes visible when text is present -- Pressing Enter advances to next question -- Question collapses to summary view -- Custom text is displayed in the collapsed summary -- Second question becomes active -- Custom answer is preserved (not lost) - -#### 3.2. 
Type custom answer and click OK button - -**File:** `frontend/tests/question-answering/09-custom-answer-ok-button.spec.ts` - -**Steps:** - -1. Seed a task with 1 question -2. Navigate to the app -3. Focus the custom text input -4. Type 'OAuth 2.0 with PKCE' -5. Verify OK button is visible and enabled -6. Click the OK button -7. Verify question collapses -8. Verify 'OAuth 2.0 with PKCE' is shown in summary -9. Verify checkmark appears -10. Verify Continue button appears (all questions answered) - -**Expected Results:** - -- OK button appears when input has text -- OK button has term-accent border and text -- Clicking OK collapses the question -- Custom answer is displayed in summary -- Checkmark icon is shown -- Continue button becomes visible -- Continue button is enabled for submission - -#### 3.3. Custom answer clears selected option - -**File:** `frontend/tests/question-answering/10-custom-clears-option.spec.ts` - -**Steps:** - -1. Seed a task with 1 question with options -2. Navigate to the app -3. Click an option button (e.g., 'Yes') -4. Verify option is highlighted -5. Immediately type in the custom input field: 'Actually, I prefer a different approach' -6. Verify the previously selected option is no longer highlighted -7. Verify only custom text is considered as the answer -8. Collapse the question by pressing Enter or clicking OK -9. Verify summary shows the custom text, not the option - -**Expected Results:** - -- Option button initially shows selected state -- When user types in custom field, option deselects -- Option button returns to default styling (border-term-border) -- Custom text takes precedence over option selection -- Only custom answer is shown in the summary -- No option remains in selected state -- Answer state correctly reflects custom input - -#### 3.4. Option selection clears custom text - -**File:** `frontend/tests/question-answering/11-option-clears-custom.spec.ts` - -**Steps:** - -1. Seed a task with 1 question -2. Navigate to the app -3. Type 'Custom answer here' in the input field -4. Verify text appears in input -5. Click an option button (e.g., 'No') -6. Verify question collapses and auto-advances -7. Verify summary shows 'No' (the option), not the custom text - -**Expected Results:** - -- Custom text is initially in the input field -- Clicking an option triggers selection -- Question collapses with the option as the answer -- Custom text is not used/displayed -- Summary shows option label 'No' -- Custom input state is cleared - -#### 3.5. Empty custom input does not show OK button - -**File:** `frontend/tests/question-answering/12-empty-input-no-ok.spec.ts` - -**Steps:** - -1. Seed a task with 1 question -2. Navigate to the app -3. Locate the custom text input -4. Verify input is empty -5. Verify OK button is NOT visible -6. Type a single character 'a' -7. Verify OK button appears -8. Clear the input (delete the character) -9. Verify OK button disappears again - -**Expected Results:** - -- OK button is not rendered when input is empty -- OK button appears as soon as text is entered -- OK button disappears when input is cleared -- Button visibility is reactive to input value -- User cannot submit empty custom answer - -### 4. Editing Previous Answers - -**Seed:** `frontend/tests/fixtures/seed-data.ts` - -#### 4.1. Click answered question to edit - -**File:** `frontend/tests/question-answering/13-edit-answered-question.spec.ts` - -**Steps:** - -1. Seed a task with 3 questions -2. Navigate to the app -3. Answer question 1 by selecting an option -4. 
Answer question 2 by typing custom text -5. Verify both are collapsed with summaries -6. Verify question 3 is now active -7. Click on the collapsed question 1 summary -8. Verify question 1 expands back to edit mode -9. Verify question 3 remains visible but question 1 is now active -10. Verify previous answer is still highlighted/shown - -**Expected Results:** - -- Clicking collapsed question summary re-expands it -- Question transitions from summary to edit mode -- Previously selected option is still highlighted -- Question becomes the active question (editingQuestionId is set) -- Other collapsed questions remain collapsed -- Currently active question (q3) loses focus -- User can change their answer - -#### 4.2. Change answer from option to custom text - -**File:** `frontend/tests/question-answering/14-change-option-to-custom.spec.ts` - -**Steps:** - -1. Seed a task with 2 questions -2. Navigate to the app -3. Select 'Yes' option for question 1 -4. Wait for q1 to collapse -5. Click on q1 summary to edit -6. Verify 'Yes' option is highlighted -7. Type 'Actually I want a custom approach' in the custom input -8. Press Enter to confirm -9. Verify q1 summary now shows the custom text -10. Verify 'Yes' is no longer the answer - -**Expected Results:** - -- Question re-opens with previous option selected -- User can type custom text to override option -- Custom text clears the option selection -- Pressing Enter saves the new custom answer -- Summary updates to show custom text -- Previous option selection is replaced - -#### 4.3. Change answer from custom text to option - -**File:** `frontend/tests/question-answering/15-change-custom-to-option.spec.ts` - -**Steps:** - -1. Seed a task with 2 questions -2. Navigate to the app -3. Type 'My custom answer' for question 1 and press Enter -4. Wait for q1 to collapse showing custom text -5. Click on q1 summary to re-edit -6. Verify custom text is in the input field -7. Click the 'No' option button -8. Verify question auto-advances -9. Verify q1 summary now shows 'No' -10. Verify custom text is no longer the answer - -**Expected Results:** - -- Question re-opens with custom text in input -- Clicking an option overrides custom text -- Question advances to next unanswered -- Summary updates to show option label -- Custom text is cleared from state -- Answer correctly changes to the option - -#### 4.4. Edit middle question while others remain answered - -**File:** `frontend/tests/question-answering/16-edit-middle-question.spec.ts` - -**Steps:** - -1. Seed a task with 3 questions -2. Navigate to the app -3. Answer all 3 questions in order -4. Verify all 3 show collapsed with checkmarks -5. Verify Continue button is visible -6. Click on question 2 summary to edit -7. Verify q2 expands to edit mode -8. Verify q1 and q3 remain collapsed with checkmarks -9. Change the answer for q2 -10. Press Enter or click to confirm -11. Verify q2 collapses again -12. Verify all 3 questions still answered -13. Verify Continue button remains visible - -**Expected Results:** - -- Only question 2 expands for editing -- Questions 1 and 3 stay collapsed -- User can edit middle question without affecting others -- After editing, q2 collapses back to summary -- All questions remain in answered state -- Continue button stays enabled -- No loss of data in other questions - -### 5. Submitting Answers - -**Seed:** `frontend/tests/fixtures/seed-data.ts` - -#### 5.1. 
Continue button appears when all answered - -**File:** `frontend/tests/question-answering/17-continue-button-appears.spec.ts` - -**Steps:** - -1. Seed a task with 2 questions -2. Navigate to the app -3. Verify Continue button is NOT visible initially -4. Answer question 1 (select an option) -5. Verify Continue button is still NOT visible (q2 not answered) -6. Answer question 2 (type custom text and press Enter) -7. Verify Continue button appears -8. Verify button text is 'Continue →' -9. Verify button has term-accent-alt styling -10. Verify button is enabled - -**Expected Results:** - -- Continue button only appears after ALL questions answered -- Button is not visible when any question is unanswered -- Button appears immediately after last question is answered -- Button has green accent styling (term-accent-alt) -- Button displays 'Continue →' with arrow -- Button is in enabled state -- Cancel button also appears alongside Continue - -#### 5.2. Click Continue button to submit answers - -**File:** `frontend/tests/question-answering/18-submit-answers.spec.ts` - -**Steps:** - -1. Seed a task with 2 questions using seedTaskWaitingQuestions() -2. Navigate to the app -3. Answer question 1 with 'JWT' option -4. Answer question 2 with 'Yes' option -5. Verify Continue button is visible and enabled -6. Click Continue button -7. Verify button shows loading state ('Submitting...') -8. Verify button is disabled during submission -9. Wait for API response (task status changes to 'planning') -10. Verify task no longer shows questions UI -11. Verify task status badge changes from 'NEEDS INPUT' to 'PLANNING' -12. Verify log viewer appears for the task - -**Expected Results:** - -- Continue button changes text to 'Submitting...' -- Button is disabled during API call -- API request is made to /tasks/{taskId}/answer with answers object -- Answers payload includes: {q1: 'JWT', q2: 'Yes'} -- After successful submission, task status updates -- Questions UI is replaced with log viewer -- Task badge updates to show 'PLANNING' -- Local state (selectedAnswers, customQuestionInputs) is cleared -- No errors are shown - -#### 5.3. Submit with mix of option and custom answers - -**File:** `frontend/tests/question-answering/19-submit-mixed-answers.spec.ts` - -**Steps:** - -1. Seed a task with 3 questions -2. Navigate to the app -3. Answer q1 with option 'JWT' -4. Answer q2 with custom text 'Rate limit to 100 req/min' -5. Answer q3 with option 'Maybe' -6. Click Continue button -7. Monitor network request payload -8. Verify answers object contains all 3 answers -9. Verify q1 answer is 'JWT' (option) -10. Verify q2 answer is 'Rate limit to 100 req/min' (custom) -11. Verify q3 answer is 'Maybe' (option) -12. Verify task transitions to planning status - -**Expected Results:** - -- All three answers are included in submission -- Both option selections and custom text are sent -- Answer format is correct: {q1: 'JWT', q2: 'Rate limit to 100 req/min', q3: 'Maybe'} -- API accepts the mixed answer types -- Task status successfully updates -- No data loss between different answer types - -#### 5.4. Continue button disabled during submission - -**File:** `frontend/tests/question-answering/20-button-disabled-during-submit.spec.ts` - -**Steps:** - -1. Seed a task with 1 question -2. Navigate to the app -3. Answer the question -4. Click Continue button -5. Immediately verify button is disabled -6. Verify button text shows 'Submitting...' -7. Verify user cannot click button again -8. Wait for submission to complete -9. 
Verify task transitions away from questions UI - -**Expected Results:** - -- Button becomes disabled immediately on click -- Button text changes to 'Submitting...' for user feedback -- Multiple clicks are prevented (no duplicate submissions) -- Disabled state persists until API response -- After response, UI transitions to next state -- No double submission occurs - -#### 5.5. Cancel button appears with Continue - -**File:** `frontend/tests/question-answering/21-cancel-button.spec.ts` - -**Steps:** - -1. Seed a task with 2 questions -2. Navigate to the app -3. Answer both questions -4. Verify Continue button appears -5. Verify Cancel button appears alongside Continue -6. Verify Cancel button has muted styling (text-term-fg-muted) -7. Verify Cancel button has error hover (hover:text-term-error) -8. Click Cancel button -9. Verify confirmation dialog may appear (browser confirm) -10. If confirmed, verify task is cancelled - -**Expected Results:** - -- Cancel button is visible when all questions answered -- Cancel button has subtle default styling -- Cancel button shows error color on hover -- Clicking Cancel may show confirmation -- If confirmed, task transitions to cancelled state -- User has option to abort question flow - -### 6. Error Handling - -**Seed:** `frontend/tests/fixtures/seed-data.ts` - -#### 6.1. Handle submission error gracefully - -**File:** `frontend/tests/question-answering/22-handle-submission-error.spec.ts` - -**Steps:** - -1. Seed a task with 1 question -2. Navigate to the app -3. Answer the question -4. Intercept the API call to force it to fail with 500 error -5. Click Continue button -6. Wait for error -7. Verify error is logged to console -8. Verify button returns to enabled state -9. Verify button text returns to 'Continue →' -10. Verify question answers are still preserved -11. Verify user can retry submission - -**Expected Results:** - -- Error is caught and logged -- Button state resets after error -- Loading state ends -- User answers are not lost -- Questions remain in answered state -- Continue button is clickable again -- User can attempt resubmission -- No data loss occurs - -#### 6.2. Handle network timeout during submission - -**File:** `frontend/tests/question-answering/23-handle-network-timeout.spec.ts` - -**Steps:** - -1. Seed a task with 2 questions -2. Navigate to the app -3. Answer both questions -4. Intercept API call to delay response beyond timeout -5. Click Continue button -6. Wait for timeout to occur -7. Verify loading state eventually ends -8. Verify answers are preserved -9. Verify user can retry - -**Expected Results:** - -- Timeout is handled gracefully -- Loading state does not persist indefinitely -- User gets feedback about the failure -- Answers remain in UI -- Retry is possible -- No UI freeze or hang - -#### 6.3. Validation - prevent submission with unanswered questions - -**File:** `frontend/tests/question-answering/24-prevent-incomplete-submission.spec.ts` - -**Steps:** - -1. Seed a task with 3 questions -2. Navigate to the app -3. Answer only questions 1 and 2 -4. Verify Continue button is NOT visible -5. Verify question 3 is active and unanswered -6. Attempt to trigger submission programmatically (if possible) -7. Verify submission does not occur -8. Answer question 3 -9. Verify Continue button now appears -10. 
Verify submission is now possible - -**Expected Results:** - -- Continue button only shows when all questions answered -- Cannot submit with incomplete answers -- UI prevents premature submission -- Last question must be answered -- Button appears only after completion -- Validation is enforced client-side - -### 7. Real-time Updates and State Management - -**Seed:** `frontend/tests/fixtures/seed-data.ts` - -#### 7.1. Task auto-expands on page load when needing input - -**File:** `frontend/tests/question-answering/25-auto-expand-on-load.spec.ts` - -**Steps:** - -1. Seed a task in waiting_questions status -2. Navigate to the app -3. Wait for app to load -4. Verify task is automatically expanded (without user click) -5. Verify first question is visible -6. Verify task is in the expandedTaskIds set -7. Verify task is also in autoExpandedTaskIds set (tracked separately) - -**Expected Results:** - -- Task needing input auto-expands on page load -- User doesn't need to click to see questions -- First question is immediately visible -- Auto-expansion is tracked separately (autoExpandedTaskIds) -- This allows user to manually collapse if desired -- Task won't re-expand if user manually collapsed it - -#### 7.2. User can manually collapse auto-expanded task - -**File:** `frontend/tests/question-answering/26-manual-collapse.spec.ts` - -**Steps:** - -1. Seed a task with questions -2. Navigate to the app -3. Verify task is auto-expanded -4. Click on the task header to collapse it -5. Verify task collapses (questions hidden) -6. Verify task is removed from expandedTaskIds -7. Reload the page -8. Verify task does NOT auto-expand again (respects user preference in session) - -**Expected Results:** - -- Auto-expanded task can be manually collapsed -- Clicking header toggles expansion -- Collapsed state is maintained -- Task stays collapsed after reload (in same session) -- User preference is respected -- Auto-expansion only happens once per task - -#### 7.3. State persists when navigating away and back - -**File:** `frontend/tests/question-answering/27-state-persistence.spec.ts` - -**Steps:** - -1. Seed a task with 3 questions -2. Navigate to the app -3. Answer question 1 and question 2 -4. Leave question 3 unanswered -5. Navigate to a different page or refresh -6. Navigate back to the app -7. Verify questions 1 and 2 answers are LOST (client-side state) -8. Verify all questions are back to unanswered state -9. This confirms state is NOT persisted to backend until submission - -**Expected Results:** - -- Partial answers are NOT saved to backend -- Client-side state is lost on refresh -- User must complete all questions in one session -- This is expected behavior (no auto-save) -- Submission is atomic (all or nothing) - -#### 7.4. Multiple tasks with questions can coexist - -**File:** `frontend/tests/question-answering/28-multiple-tasks.spec.ts` - -**Steps:** - -1. Seed two different tasks, both in waiting_questions status -2. Navigate to the app -3. Verify both tasks appear in inbox -4. Verify both show NEEDS INPUT badges -5. Expand first task, answer its questions -6. Click Continue on first task -7. Verify first task transitions away -8. Verify second task remains with questions -9. Answer second task's questions -10. 
Verify both tasks can be processed independently - -**Expected Results:** - -- Multiple tasks with questions can exist simultaneously -- Each task maintains its own state -- Answering one task doesn't affect another -- State is isolated per task ID -- Both tasks can be submitted independently -- No cross-contamination of answers - -### 8. Accessibility and Keyboard Navigation - -**Seed:** `frontend/tests/fixtures/seed-data.ts` - -#### 8.1. Custom input receives auto-focus - -**File:** `frontend/tests/question-answering/29-input-autofocus.spec.ts` - -**Steps:** - -1. Seed a task with 1 question -2. Navigate to the app -3. Wait for task to auto-expand -4. Verify the custom text input has focus -5. Type text without clicking the input first -6. Verify text appears (confirming focus worked) - -**Expected Results:** - -- Input field automatically receives focus -- User can start typing immediately -- No click required to focus input -- Autofocus uses the autofocus action (50ms delay) -- Focus is set after DOM is ready - -#### 8.2. Enter key submits custom answer and advances - -**File:** `frontend/tests/question-answering/30-enter-key-advances.spec.ts` - -**Steps:** - -1. Seed a task with 2 questions -2. Navigate to the app -3. Type custom text in question 1 input -4. Press Enter key -5. Verify question 1 collapses -6. Verify question 2 expands and its input receives focus -7. Type text in question 2 input -8. Press Enter key -9. Verify question 2 collapses -10. Verify Continue button appears - -**Expected Results:** - -- Enter key acts as submit for current question -- Question advances to next on Enter -- Focus moves to next question's input -- Keyboard-only workflow is smooth -- No mouse required to complete questions -- Enter on last question shows Continue button - -#### 8.3. Tab key navigation works correctly - -**File:** `frontend/tests/question-answering/31-tab-navigation.spec.ts` - -**Steps:** - -1. Seed a task with 1 question with 3 options -2. Navigate to the app -3. Press Tab key multiple times -4. Verify focus moves through: option buttons, custom input, Cancel button -5. Verify option buttons have tabindex (clickable via Enter when focused) -6. Test that option buttons are NOT focusable via Tab (tabindex={-1}) -7. Verify custom input IS focusable via Tab - -**Expected Results:** - -- Tab navigation follows logical order -- Option buttons have tabindex={-1} (not in tab order) -- Custom input is in tab order -- Focus indicators are visible -- Keyboard navigation is intuitive -- Option buttons excluded from tab order to streamline keyboard flow - -#### 8.4. Keyboard event propagation handled correctly - -**File:** `frontend/tests/question-answering/32-event-propagation.spec.ts` - -**Steps:** - -1. Seed a task with 1 question -2. Navigate to the app -3. Focus the custom input field -4. Press various keys (arrow keys, space, etc.) -5. Verify events don't bubble up to parent handlers -6. Verify input field captures keydown events -7. Type text and press Enter -8. 
Verify Enter is handled by input's onkeydown, not parent - -**Expected Results:** - -- Input field stops event propagation (e.stopPropagation()) -- Parent keydown handlers don't interfere -- Typing works normally without side effects -- Enter key works as expected (advances question) -- No unintended behavior from event bubbling diff --git a/frontend/tests/question-answering/01-display-needs-input-badge.spec.ts b/frontend/tests/question-answering/01-display-needs-input-badge.spec.ts deleted file mode 100644 index 10a3cd3..0000000 --- a/frontend/tests/question-answering/01-display-needs-input-badge.spec.ts +++ /dev/null @@ -1,27 +0,0 @@ -// spec: frontend/tests/question-answering.plan.md -// seed: frontend/tests/fixtures/seed-data.ts - -import { test, expect } from '../fixtures'; -import { seedTaskWaitingQuestions } from '../fixtures/seed-data'; - -/** - * QUESTION ANSWERING FLOW - Display task with NEEDS INPUT badge - * - * NOTE: The inbox UI was simplified. Questions now show as queue items - * with title "Answer Questions" and raw content. Interactive option - * buttons were removed. - */ - -test.describe('Question Viewing and Display', () => { - test('Display task with NEEDS INPUT badge', async ({ appPage: page, userId }) => { - // Seed and reload to pick up the new task - await seedTaskWaitingQuestions(page, userId); - await page.reload(); - - // Question item appears in inbox with title - await expect(page.getByText('Answer Questions')).toBeVisible({ timeout: 10000 }); - - // Question content is visible (shown as text) - await expect(page.getByText('Which authentication method')).toBeVisible(); - }); -}); diff --git a/frontend/tests/question-answering/05-select-option-advance.spec.ts b/frontend/tests/question-answering/05-select-option-advance.spec.ts deleted file mode 100644 index 6de261f..0000000 --- a/frontend/tests/question-answering/05-select-option-advance.spec.ts +++ /dev/null @@ -1,23 +0,0 @@ -// spec: frontend/tests/question-answering.plan.md -// seed: frontend/tests/fixtures/seed-data.ts - -import { test, expect } from '../fixtures'; -import { seedTaskWaitingQuestions } from '../fixtures/seed-data'; - -/** - * QUESTION ANSWERING FLOW - Select option and auto-advance - * - * Skip: Interactive option buttons were removed from inbox UI. - * Questions now display as text content only. - */ - -test.describe('Answering Questions with Options', () => { - test.skip('Select option and auto-advance to next question', async ({ - appPage: page, - userId - }) => { - await seedTaskWaitingQuestions(page, userId); - await page.reload(); - await expect(page.getByText('Answer Questions')).toBeVisible({ timeout: 10000 }); - }); -}); diff --git a/frontend/tests/question-answering/08-custom-answer-enter.spec.ts b/frontend/tests/question-answering/08-custom-answer-enter.spec.ts deleted file mode 100644 index b823d92..0000000 --- a/frontend/tests/question-answering/08-custom-answer-enter.spec.ts +++ /dev/null @@ -1,20 +0,0 @@ -// spec: frontend/tests/question-answering.plan.md -// seed: frontend/tests/fixtures/seed-data.ts - -import { test, expect } from '../fixtures'; -import { seedTaskWaitingQuestions } from '../fixtures/seed-data'; - -/** - * QUESTION ANSWERING FLOW - Custom text answer with Enter submission - * - * Skip: Interactive question input UI was removed from inbox. - * Questions now display as text content only. 
- */ - -test.describe('Custom Text Answers', () => { - test.skip('Type custom answer and submit with Enter', async ({ appPage: page, userId }) => { - await seedTaskWaitingQuestions(page, userId); - await page.reload(); - await expect(page.getByText('Answer Questions')).toBeVisible({ timeout: 10000 }); - }); -}); diff --git a/frontend/tests/question-answering/18-submit-answers.spec.ts b/frontend/tests/question-answering/18-submit-answers.spec.ts deleted file mode 100644 index 40315e1..0000000 --- a/frontend/tests/question-answering/18-submit-answers.spec.ts +++ /dev/null @@ -1,20 +0,0 @@ -// spec: frontend/tests/question-answering.plan.md -// seed: frontend/tests/fixtures/seed-data.ts - -import { test, expect } from '../fixtures'; -import { seedTaskWaitingQuestions } from '../fixtures/seed-data'; - -/** - * QUESTION ANSWERING FLOW - Submit answers - * - * Skip: Interactive question submission UI was removed from inbox. - * Questions now display as text content only. - */ - -test.describe('Submitting Answers', () => { - test.skip('Click Continue button to submit answers', async ({ appPage: page, userId }) => { - await seedTaskWaitingQuestions(page, userId); - await page.reload(); - await expect(page.getByText('Answer Questions')).toBeVisible({ timeout: 10000 }); - }); -}); From 0dda02b7c5652fdbd495d2ad1c806e8d0e707b67 Mon Sep 17 00:00:00 2001 From: James Olds <12104969+oldsj@users.noreply.github.com> Date: Thu, 15 Jan 2026 12:04:11 -0500 Subject: [PATCH 4/5] =?UTF-8?q?Add=20validate-docs=20skill=20for=20README?= =?UTF-8?q?=20=E2=86=92=20spec=20=E2=86=92=20test=20coverage?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Skill validates that all README features are documented in specs and covered by tests. Outputs a coverage report with gaps. Co-Authored-By: Claude Opus 4.5 --- .claude/skills/validate-docs/SKILL.md | 137 ++++++++++++++++++++++++++ 1 file changed, 137 insertions(+) create mode 100644 .claude/skills/validate-docs/SKILL.md diff --git a/.claude/skills/validate-docs/SKILL.md b/.claude/skills/validate-docs/SKILL.md new file mode 100644 index 0000000..3ef73b9 --- /dev/null +++ b/.claude/skills/validate-docs/SKILL.md @@ -0,0 +1,137 @@ +--- +name: validate-docs +description: Validate README features are documented in specs and covered by e2e tests. Use when checking documentation coverage or before merging docs changes. +--- + +# Validate Documentation + +Validate that ALL features claimed in README are documented in specs and tested. + +## Philosophy + +The README is the product's promise to users. Every feature advertised must be: + +1. Specified in `docs/specs/` (source of truth for behavior) +2. Tested in `frontend/tests/` (proof it works) + +## Workflow + +### Step 1: Extract ALL features from README + +Read README.md and identify every feature claim. 
Look in: + +- **How It Works** section - core functionality +- **UI** section - user-facing features +- **Agent Workflow** section - automation features +- Any diagrams, bullet points, or descriptions that promise functionality + +For each feature, note: + +- What it claims to do +- Whether it has a linked spec (some won't) + +### Step 2: Map features to specs + +Check if each README feature has a corresponding spec: + +| README Feature | Expected Spec | +| ------------------------------------------ | ------------------------------------------------ | +| Main thread conversation | `docs/specs/chat.md` | +| Sessions/background work | `docs/specs/sessions.md` | +| Mobile/desktop layout | `docs/specs/layout.md` | +| Agent workflow (spawn → work → PR → close) | `docs/specs/agent-workflow.md` | +| Notifications | `docs/specs/sessions.md` (notifications section) | + +**Flag any README feature without a spec as ERROR.** + +### Step 3: Verify specs have testable assertions + +For each spec file: + +- Check it contains specific, testable statements +- Assertions should mention exact UI text, behaviors, or states +- Vague specs like "works well" are not testable + +### Step 4: Map spec assertions to tests + +For each bullet point/assertion in a spec, search for a test that verifies it: + +```bash +# Example: search for test covering "No sessions yet" message +grep -r "No sessions yet" frontend/tests/ +``` + +Track coverage for each spec assertion. + +### Step 5: Output report + +```markdown +## Documentation Validation Report + +### README Features → Specs + +- [x] Main thread conversation → docs/specs/chat.md +- [x] Sessions → docs/specs/sessions.md +- [x] Layout → docs/specs/layout.md +- [ ] **Agent Workflow → NO SPEC** ← ERROR + +### Spec Assertions → Tests + +#### docs/specs/chat.md (7/9 = 78%) + +- [x] "Input field with placeholder" → e2e/user-journey.spec.ts +- [ ] "Messages persist across reloads" → NO TEST + +#### docs/specs/sessions.md (23/23 = 100%) + +- [x] "No sessions yet" → sessions/01-session-list-empty.spec.ts + ... + +### Summary + +| Category | Coverage | +| -------------------------- | ----------- | +| README features with specs | 3/4 (75%) | +| Spec assertions with tests | 38/40 (95%) | + +### Errors + +1. README "Agent Workflow" section has no spec +2. chat.md "Messages persist across reloads" has no test +``` + +### Step 6: Exit status + +- **ERROR** if any README feature lacks a spec +- **ERROR** if any spec assertion lacks test coverage +- **SUCCESS** only if fully covered + +## Common Gaps to Watch For + +1. **Advertised but unspecified** - README promises feature, no spec exists +2. **Specified but untested** - Spec describes behavior, no test verifies it +3. **Workflow features** - Multi-step flows (like agent lifecycle) often lack e2e coverage + +## Tests to Flag for Removal + +Not all tests are good tests. Flag these as problems: + +1. **Tests without specs** - If a test exists but no spec describes the behavior, either: + - The spec is missing (add it), or + - The test is testing implementation details (remove it) + +2. **Too technical / low-level** - Tests should verify user-facing behavior, not internals: + - ❌ "store updates when API returns data" + - ❌ "component re-renders on state change" + - ✅ "user sees updated session status after refresh" + +3. 
**Testing implementation, not behavior** - Tests coupled to code structure: + - ❌ "calls fetchSessions with correct params" + - ❌ "dispatches ACTION_TYPE to store" + - ✅ "sessions list shows new session after spawning" + +4. **Duplicate coverage** - Multiple tests verifying the same user behavior + +5. **Tests for removed features** - Spec was removed but test remains + +The test suite should read like a user manual, not a code audit. From 7903f661b1c40b8bef4c046ff122b3c24fe4527b Mon Sep 17 00:00:00 2001 From: James Olds <12104969+oldsj@users.noreply.github.com> Date: Thu, 15 Jan 2026 12:04:52 -0500 Subject: [PATCH 5/5] Format markdown files Co-Authored-By: Claude Opus 4.5 --- README.md | 2 ++ ROADMAP.md | 2 ++ docs/specs/chat.md | 1 + docs/specs/sessions.md | 16 ++++++++++------ 4 files changed, 15 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index 0c4783d..ce33282 100644 --- a/README.md +++ b/README.md @@ -122,11 +122,13 @@ You stay in main thread, checking in on agents and spawning new ones as needed. ## Documentation **Specs** (source of truth for app behavior): + - [Chat](docs/specs/chat.md) - Main thread conversation - [Sessions](docs/specs/sessions.md) - Background work and status - [Layout](docs/specs/layout.md) - Mobile and desktop views **Guides**: + - [Architecture](docs/architecture.md) - System design and data flow - [Development](docs/development.md) - Local setup and commands - [Contributing](CONTRIBUTING.md) - How to contribute diff --git a/ROADMAP.md b/ROADMAP.md index 901950f..a29c5c1 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -5,6 +5,7 @@ Future ideas and features under consideration. ## CI Loop Automation Agent automatically iterates on CI failures: + - Poll GitHub Actions after each push - On failure: analyze logs, fix, commit - Continue until green checkmark @@ -12,6 +13,7 @@ Agent automatically iterates on CI failures: ## GitHub Issue Planning Agent creates/updates issues before coding: + - Problem analysis and proposed approach - Implementation plan as "thinking out loud" space - Links PR back to issue for context diff --git a/docs/specs/chat.md b/docs/specs/chat.md index cb388ee..ff1a79a 100644 --- a/docs/specs/chat.md +++ b/docs/specs/chat.md @@ -18,6 +18,7 @@ The main thread is a continuous conversation with Claude that persists across de ## Spawning Sessions From the main thread, you can ask Claude to spawn sessions: + - Sessions appear as colored thread blocks in the timeline - Session messages surface as thread notifications - Click to expand inline or zoom to fullscreen view diff --git a/docs/specs/sessions.md b/docs/specs/sessions.md index a651428..56c8e7c 100644 --- a/docs/specs/sessions.md +++ b/docs/specs/sessions.md @@ -7,29 +7,32 @@ Sessions are background AI work spawned from the main thread. Each session has i Desktop shows sessions in a sidebar. Mobile shows sessions in a tab. 
When no sessions exist: + - Shows empty state with "No sessions yet" message - Shows hint: "Sessions appear when Claude spawns background work" When sessions exist: + - Each session shows title and status badge - Active count shown in header (e.g., "2 active") - Clicking a session opens its detail view ## Status Badges -| Status | Badge | Meaning | -|--------|-------|---------| -| pending | PENDING | Queued, not started | -| active | ACTIVE | Currently running | +| Status | Badge | Meaning | +| --------------- | ----------- | ------------------------ | +| pending | PENDING | Queued, not started | +| active | ACTIVE | Currently running | | waiting_on_user | NEEDS INPUT | Blocked on user response | -| completed | DONE | Finished successfully | -| failed | FAILED | Error occurred | +| completed | DONE | Finished successfully | +| failed | FAILED | Error occurred | Failed sessions show error message below the badge. ## Session Detail View Clicking a session navigates to `/sessions/{id}`: + - Shows title as h1 heading - Shows description if present - Has Chat and Logs tabs (Chat tab active by default) @@ -42,5 +45,6 @@ Clicking a session navigates to `/sessions/{id}`: ## Notifications When a session needs attention: + - Toast notification appears with title and preview - Clicking notification navigates to that session's detail view
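As a concrete illustration of the specs-to-tests flow these patches set up, here is a minimal sketch of the kind of test `playwright-test-planner` might generate from the sessions spec's empty-state assertions, along the lines of the `sessions/01-session-list-empty.spec.ts` entry in the validate-docs example report. The `../fixtures` import and `appPage` fixture mirror the removed question-answering specs and are assumptions about the current test harness; the asserted strings come directly from docs/specs/sessions.md.

```typescript
// spec: docs/specs/sessions.md (Session List, empty state)
// seed: none (a fresh user has no sessions)

import { test, expect } from '../fixtures';

test.describe('Session List', () => {
  test('shows empty state when no sessions exist', async ({ appPage: page }) => {
    // Spec: "Shows empty state with 'No sessions yet' message"
    await expect(page.getByText('No sessions yet')).toBeVisible();

    // Spec: "Shows hint: 'Sessions appear when Claude spawns background work'"
    await expect(
      page.getByText('Sessions appear when Claude spawns background work')
    ).toBeVisible();
  });
});
```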