🤖 feat: stream agent_report previews and delay task cleanup #1306

ThomasK33 · 2025-12-23T17:53:32Z

Summary

Stream agent_report tool args from tool-call-delta so sub-agent workspaces show a live, in-flight report preview.
Add a dedicated agent_report tool renderer with a compaction-like fade preview while executing.
Delay auto-deleting reported leaf task workspaces by 5s (and disable input + show a banner) so the final output is visible before cleanup.

Testing

make static-check
bun test src/node/services/taskService.test.ts
bun test src/common/utils/tools/agentReportArgsText.test.ts

📋 Implementation Plan

Live-stream sub-agent `agent_report` output in UI

Problem

When a sub-agent calls the agent_report tool, the UI buffers the entire tool call.
The sub-agent workspace often disappears immediately after completion, so the user sees a “stuck” workspace with no visible progress.

Goal

Make report generation feel alive by streaming partial report text into the sub-agent workspace UI while the agent_report tool call is in-flight.
- (Optional follow-up) Mirror the same stream into the parent workspace’s task UI so you don’t need to click into the child workspace.
Preserve the final, full report for history/traceability.
UX should resemble the existing “compaction” streaming experience (live preview + fade/scroll treatment).

Non-goals

Changing the semantics of agent_report (it is still one logical tool call that completes once).
Reworking the entire sub-agent lifecycle / scheduling.

Recommended approach: stream `agent_report` from `tool-call-delta` (frontend-first)

Why this works: the backend already emits tool-call-delta events containing the model’s incremental argsTextDelta, but the renderer currently ignores them for display. We can reconstruct a “live” preview from those deltas and render it with the same fade/preview affordances as compaction.

1) Add a tiny incremental extractor for `reportMarkdown`

Add a small pure helper in common code (so it can be unit-tested easily):
- Input: accumulated argsText (string)
- Output: { reportMarkdown: string | null, title: string | null } where values are best-effort and may be incomplete.
Implementation detail: scan for JSON key(s) ("reportMarkdown", optionally "title") and decode a partial JSON string (handles escapes like \\n, \\", and tolerates truncated escape sequences).

Net new LoC (product): ~80–140

2) Teach `StreamingMessageAggregator` to materialize/patch an in-flight tool part

Files/symbols:

src/browser/utils/messages/StreamingMessageAggregator.ts
- handleToolCallDelta() (currently token-tracking only)
- handleToolCallStart() (currently skips duplicates)

Changes:

Maintain runtime-only state (not persisted) such as:
- pendingToolArgsTextByToolCallId: Map<string, string>
- toolCallIdsWithDeltas: Set<string> (to avoid double-counting tokens at start)
In handleToolCallDelta():
- If data.toolName !== "agent_report", keep current behavior (no UI changes).
- Coerce data.delta → string; if empty, bail.
- Ensure a tool part exists even before tool-call-start:
  - If the message has no matching dynamic-tool part yet, create one with:
    - state: "input-available"
    - timestamp: data.timestamp (so it appears in-order)
    - input: { reportMarkdown: "" } (placeholder)
  - Also start timing (pendingToolStarts.set(toolCallId, timestamp)) so the tool duration includes “report writing”.
- Append delta into pendingToolArgsTextByToolCallId.
- Run the extractor and patch the tool part input to { reportMarkdown, title? } (best-effort).
- invalidateCache() so the tool UI updates live.
In handleToolCallStart():
- If a tool part already exists (created from deltas), update its input with data.args instead of skipping.
- Avoid double-counting tool-input tokens:
  - If toolCallIdsWithDeltas contains this toolCallId, skip trackDelta(..., "tool-args") for the start event.

Net new LoC (product): ~60–120

3) Render a dedicated `AgentReportToolCall` with “compaction-like” preview

Files:

Add src/browser/components/tools/AgentReportToolCall.tsx
Update src/browser/components/Messages/ToolMessage.tsx to route toolName === "agent_report" to the new component.

UI behavior:

Default expanded while status === "executing".
While executing:
- Show a live-updating preview of args.reportMarkdown.
- Wrap preview in CompactingMessageContent (max-height + fade) to mimic compaction.
- Render as plain text (recommended for perf) or MarkdownCore in streaming mode if we want full fidelity.
When completed:
- Render the final markdown normally (static render).
- Show title if present (fallback: first # Heading in markdown).

Net new LoC (product): ~120–220

4) Tests + validation

Unit tests (pure):
- New test file beside the extractor (e.g. *.test.ts).
- Cases:
  - chunk boundaries split inside \\n, \\uXXXX, and \\"
  - reportMarkdown appears after other fields
  - missing/invalid JSON → no crash, just null
(Optional) small unit test for StreamingMessageAggregator:
- Feed a sequence of stream-start → tool-call-delta (agent_report) events and assert getDisplayedMessages() contains a tool message whose args update.

Manual QA:

Spawn a sub-agent that writes a long report; confirm:
- report preview streams in child workspace
- tool duration looks non-zero
- once complete, parent still receives the final report and auto-cleanup still works

Required UX fix: delay auto-delete of reported task workspaces (5s)

Background: TaskService.cleanupReportedLeafTask() currently auto-deletes leaf task workspaces immediately once they reach taskStatus: "reported". That makes the workspace appear “stuck” and then suddenly vanish.

5) Backend: schedule deletion without blocking report propagation

Files/symbols:

src/node/services/taskService.ts
- finalizeAgentTaskReport() (must still deliver report immediately)
- cleanupReportedLeafTask() (currently deletes immediately)

Plan:

Do not change the report delivery order:
- Keep deliverReportToParent() + resolveWaiters() happening before any deletion work.
Change cleanup so the first call schedules deletion and returns immediately:
- Add an in-memory guard like scheduledTaskWorkspaceDeletes: Map<string, NodeJS.Timeout>.
- In cleanupReportedLeafTask(workspaceId), if the workspace is eligible for deletion:
  - setTimeout(() => void this.cleanupReportedLeafTask(workspaceId, { skipDelay: true }), 5000)
  - return without deleting anything.
In the delayed pass (skipDelay: true), run the existing leaf-first deletion loop:
- This lets us “perform all the deletes” (the leaf + any newly-leaf reported ancestors) in one go.

Defensive notes:

Timer firing should be a best-effort no-op if the workspace is already gone.
Re-check eligibility at delete time (status still reported, no children).

Net new LoC (product): ~40–80

6) Frontend: disable input + show “will delete in 5 seconds” message

Files:

src/browser/components/AIView.tsx
src/browser/components/ChatInput/index.tsx (likely no changes; reuse disabledReason)

Plan:

Detect the state:
- isReportedAgentTask = Boolean(meta?.parentWorkspaceId) && meta?.taskStatus === "reported"
When isReportedAgentTask:
- Disable the ChatInput (disabled={... || isReportedAgentTask})
- Provide disabledReason="Completed — this agent task workspace will be deleted in 5 seconds."
- Render a small banner near the input mirroring the same message (so it’s visible even if the placeholder is missed).
Confirm the parent still gets the agent_report result immediately:
- The 5s delay is only for workspaceService.remove(...); it must not block the tool call’s resolution.

Net new LoC (product): ~20–60

Optional follow-ups

A) Mirror the streaming preview into the parent workspace

Forward agent_report deltas from child → parent (new event plumbing).
Likely best implemented as a new UI-only event type (similar to bash-output) rather than overloading tool-call-delta.

Alternatives considered

Backend emits a new agent-report-output event

Pro: parser lives in one place; parent mirroring becomes easier.
Con: new ORPC schema + event type + backend state; more moving parts.

Prompt-only workaround: stream report as assistant text, then call agent_report

Pro: almost no UI work.
Con: duplicates the report in history and increases token usage; still doesn’t fix tool-call buffering in general.

Generated with mux • Model: openai:gpt-5.2 • Thinking: xhigh

Change-Id: I520925e1dc97aa8e35baaa84bc2da70e604c1edf Signed-off-by: Thomas Kosiewski <[email protected]>

Makes the Code Review "diff refresh" mechanism robust and predictable, driven by file-modifying tool-call events (`bash`, `file_edit_*`) rather than polling or filesystem watchers. ## Key Changes ### RefreshController improvements - **Coalescing rate-limit** instead of resetting debounce — prevents starvation when tool events are frequent - **Manual refresh always works** — `requestImmediate()` bypasses pause, hidden, and min-interval checks - **Loop prevention** — 500ms minimum interval between refreshes (scheduled events only, not manual) - **Trigger tracking** — `LastRefreshInfo` records what caused each refresh ### Review panel refresh - **Non-blocking manual refresh** — immediately bumps `refreshTrigger`, origin fetch runs in background - **Origin fetch deduplication** — file tree and diff loads share the same `git fetch` call - **Refresh tooltip** — shows "Last: 2m ago via manual" for debugging ### GitStatusStore - **Active workspace priority** — 1s debounce vs 3s for background workspaces - **File modification subscription** — refreshes on tool completion events ## Bug Fixes - Fixed storybook test failure: `requestImmediate()` was blocked by MIN_REFRESH_INTERVAL when components subscribed immediately after `syncWorkspaces()` — now bypasses interval check for manual/immediate requests Signed-off-by: Thomas Kosiewski <[email protected]> --- _Generated with `mux` • Model: `anthropic:claude-opus-4-5` • Thinking: `high`_

Change-Id: I8aef86611e4dda560cca7b2e727b081570c2373a Signed-off-by: Thomas Kosiewski <[email protected]>

Change-Id: I4db05b84cbb17d2046c853957f685ad9dc7a59d7 Signed-off-by: Thomas Kosiewski <[email protected]>

ThomasK33 and others added 4 commits December 23, 2025 18:46

🤖 feat: stream agent_report previews and delay task cleanup

5eff3c6

Change-Id: I520925e1dc97aa8e35baaa84bc2da70e604c1edf Signed-off-by: Thomas Kosiewski <[email protected]>

🤖 fix: delay removing reported task workspaces in UI

6c8971b

Change-Id: I8aef86611e4dda560cca7b2e727b081570c2373a Signed-off-by: Thomas Kosiewski <[email protected]>

🤖 tests: cover agent_report tool-call-delta preview

c9cc382

Change-Id: I4db05b84cbb17d2046c853957f685ad9dc7a59d7 Signed-off-by: Thomas Kosiewski <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

🤖 feat: stream agent_report previews and delay task cleanup #1306

🤖 feat: stream agent_report previews and delay task cleanup #1306

ThomasK33 commented Dec 23, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

🤖 feat: stream agent_report previews and delay task cleanup #1306

Are you sure you want to change the base?

🤖 feat: stream agent_report previews and delay task cleanup #1306

Conversation

ThomasK33 commented Dec 23, 2025

Summary

Testing

Live-stream sub-agent agent_report output in UI

Problem

Goal

Non-goals

Recommended approach: stream agent_report from tool-call-delta (frontend-first)

1) Add a tiny incremental extractor for reportMarkdown

2) Teach StreamingMessageAggregator to materialize/patch an in-flight tool part

3) Render a dedicated AgentReportToolCall with “compaction-like” preview

4) Tests + validation

Required UX fix: delay auto-delete of reported task workspaces (5s)

5) Backend: schedule deletion without blocking report propagation

6) Frontend: disable input + show “will delete in 5 seconds” message

Optional follow-ups

A) Mirror the streaming preview into the parent workspace

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Live-stream sub-agent `agent_report` output in UI

Recommended approach: stream `agent_report` from `tool-call-delta` (frontend-first)

1) Add a tiny incremental extractor for `reportMarkdown`

2) Teach `StreamingMessageAggregator` to materialize/patch an in-flight tool part

3) Render a dedicated `AgentReportToolCall` with “compaction-like” preview