CampaignForge AI — The Journey
Chapter 3: The First Local Run, the First Trace, and the First Agent Lie
Status: Raw draft for Content Publisher Agent (11) to format
Date: 2026-05-05
Author: Tim Simeonov (founder) + Codex (implementation reviewer)
Where Chapter 2 Left Off
Chapter 2 ended with the right architectural direction but not yet a trusted running system.
AWS was frozen. LangGraph was selected. ADR-002 described a local-first rebuild with SQLite checkpointing, interrupt() approval gates, typed Pydantic contracts, and the same 11-agent campaign lifecycle from PRD-001.
The promise was simple: a founder should be able to run the chain from a laptop with one command and one API key.
So we tried.
.venv/bin/python campaignforge.py --brief "Launch a Meta Ads campaign for CampaignForge AI targeting performance marketers at Series B SaaS companies. Budget: $500/month. Goal: generate qualified demo requests under $150 CAC."
The first real local pipeline ID was:
1535c1cd-b5f3-4482-a5a0-3de36b819e52
And the first important result was that it worked.
Agent 01 called Claude, generated PRD-001, wrote an audit record, persisted the LangGraph checkpoint to SQLite, and paused cleanly at GATE-1: PRD Sign-off.
That was the first proof that the local-first architecture was not just a document.
The First Surprise: LangSmith Was Trying to Phone Home
The run also printed this:
Failed to multipart ingest runs: langsmith.utils.LangSmithError
403 Client Error: Forbidden for https://api.smith.langchain.com/runs/multipart
The chain itself was not failing. Claude had responded. Agent 01 had completed. The graph was paused correctly at Gate 1.
The failure was LangSmith tracing.
The local .env had LangChain tracing enabled:
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=...
LANGCHAIN_PROJECT=campaignforge
LangGraph noticed those environment variables and automatically tried to upload traces to LangSmith Cloud. The API key was invalid or unauthorized, so the upload failed with 403 Forbidden.
That raised the right product question:
Why would a local-first first phase depend on a cloud tracing service?
It shouldn't.
LangSmith is useful later. It gives hosted graph traces, prompt visibility, latency and error inspection, and eval workflows. But in this phase it creates a cloud dependency and may send run metadata and prompt content outside the local machine. That violates the spirit of the rebuild.
So the decision was:
LANGCHAIN_TRACING_V2=false
LANGSMITH_TRACING=false
LangSmith becomes optional. Local tracing becomes default.
Why Not Run LangSmith Locally?
The next question was obvious:
Is there a local version of LangSmith tracing I can run?
There is a self-hosted LangSmith path, but it is not a simple free local clone. It is an enterprise-style deployment with additional infrastructure, and that is the wrong tradeoff for the first phase of a founder-replicable local system.
There is also local LangGraph development tooling. A local LangGraph dev server can help inspect and debug graph behavior, and LangGraph Studio can connect to local graphs. But that still does not replace the need for a plain local audit trail that lives with the project and can be inspected without another service.
The decision was pragmatic:
- keep LangSmith cloud tracing disabled by default
- do not introduce self-hosted LangSmith as a dependency
- keep local graph observability in the repository itself
- use simple append-only JSONL traces as the first observability layer
The principle is the same as the AWS pivot: if a founder needs another platform just to understand the system, the local-first promise is already compromised.
Local JSONL Tracing Replaces Cloud Tracing
The next implementation step was to add a local trace layer.
The requirement was intentionally small:
> Write each node transition, gate payload, route decision, status, and error into a local runs/<pipeline_id>/trace.jsonl file. Tracing must never be able to break the campaign chain.
The implementation added src/common/tracing.py.
It writes one JSON object per line to:
runs/<pipeline_id>/trace.jsonl
The trace captures:
- pipeline_start
- pipeline_resume
- pipeline_error
- pipeline_invoke_complete
- node_start
- node_complete
- node_error
- node_interrupted
- route_decision
- route_error
- gate_waiting
- gate_decided
It also records:
- pipeline_id
- event_id
- UTC timestamp
- node name
- route name
- route decision
- duration in milliseconds
- status
- current stage
- gates already decided
- agent outputs present
- state hash
- compact error details
Full payloads are off by default because prompts, briefs, and campaign outputs may become sensitive:
CAMPAIGNFORGE_LOCAL_TRACING=true
CAMPAIGNFORGE_TRACE_DIR=runs
CAMPAIGNFORGE_TRACE_FULL_PAYLOADS=false
The graph nodes were wrapped with trace_node(...). Conditional routers were wrapped with trace_router(...). Gate nodes received explicit gate_waiting and gate_decided events because a paused gate raises a LangGraph GraphInterrupt instead of returning like an ordinary node.
The important design choice: tracing is best effort. If the trace file cannot be written, the exception is swallowed. Observability cannot become a new failure mode.
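A minimal sketch of that best-effort pattern, with illustrative field names and wrapper signatures (the real implementation lives in src/common/tracing.py):

```python
# Sketch of best-effort JSONL tracing; event fields and the wrapper
# signature are illustrative, not the project's exact API.
import functools
import json
import os
import time
import uuid
from datetime import datetime, timezone
from pathlib import Path

from langgraph.errors import GraphInterrupt


def trace_event(pipeline_id: str, event: str, **fields) -> None:
    """Append one JSON object per line; never raise into the chain."""
    if os.getenv("CAMPAIGNFORGE_LOCAL_TRACING", "true").lower() != "true":
        return
    try:
        trace_dir = Path(os.getenv("CAMPAIGNFORGE_TRACE_DIR", "runs")) / pipeline_id
        trace_dir.mkdir(parents=True, exist_ok=True)
        record = {
            "event": event,
            "event_id": str(uuid.uuid4()),
            "pipeline_id": pipeline_id,
            "ts": datetime.now(timezone.utc).isoformat(),
            **fields,
        }
        with open(trace_dir / "trace.jsonl", "a") as f:
            f.write(json.dumps(record, default=str) + "\n")
    except Exception:
        pass  # tracing must never become a new failure mode


def trace_node(name, fn):
    """Wrap a graph node with node_start / node_complete / node_error events."""
    @functools.wraps(fn)
    def wrapped(state):
        pid = state["pipeline_id"]  # assumed state key
        trace_event(pid, "node_start", node=name)
        started = time.monotonic()
        try:
            result = fn(state)
        except GraphInterrupt:
            trace_event(pid, "node_interrupted", node=name)
            raise  # a paused gate is not an error
        except Exception as exc:
            trace_event(pid, "node_error", node=name, error=repr(exc)[:200])
            raise
        trace_event(pid, "node_complete", node=name,
                    duration_ms=round((time.monotonic() - started) * 1000))
        return result
    return wrapped
```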
The focused LangGraph suite passed after the change:
94 passed
A smoke test with an invalid short brief produced a local trace without making any Claude call. That trace showed the whole local flow:
pipeline_start
node_start: intake_brief
node_complete: intake_brief, status FAILED
route_decision: intake_brief.route_node_status -> error
node_start: error_halt
node_complete: error_halt
pipeline_invoke_complete
This was better than LangSmith for the current phase because it was inspectable with tail, jq, or a text editor.
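A few lines of Python are enough to skim a run, assuming the illustrative field names from the sketch above:

```python
# Skim a run's trace without any external service; field names follow
# the illustrative tracing sketch above.
import json
from pathlib import Path

trace = Path("runs") / "1535c1cd-b5f3-4482-a5a0-3de36b819e52" / "trace.jsonl"
for line in trace.open():
    ev = json.loads(line)
    print(ev.get("ts"), ev.get("event"), ev.get("node") or "", ev.get("status") or "")
```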
Gate 1: Reviewing What Agent 01 Actually Did
The terminal gate box is deliberately short. It shows a few lines, enough to remind the operator what they are approving:
GATE-1: PRD Sign-off
Pipeline: 1535c1cd-b5f3-4482-a5a0-3de36b819e52
PRD: Launch a Meta Ads campaign for CampaignForge AI targeting...
But before approving, the operator asked the right question:
Can I see more details of what the first step accomplished before I approve?
That is exactly what a human gate is for. Approval without inspection is theater.
The checkpoint was decoded directly from campaignforge.db using LangGraph's get_state(...). The paused state showed:
status: RUNNING
current_stage: product_person
next: gate_1
audit_log_count: 2
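A sketch of that inspection, assuming the graph is compiled with a SqliteSaver and the pipeline ID doubles as the LangGraph thread_id (the build_graph signature here is hypothetical):

```python
# Inspect a paused checkpoint before approving a gate. The build_graph
# signature and state keys are assumptions about the project's code.
from langgraph.checkpoint.sqlite import SqliteSaver
from src.graph import build_graph

with SqliteSaver.from_conn_string("campaignforge.db") as checkpointer:
    graph = build_graph(checkpointer=checkpointer)  # hypothetical signature
    config = {"configurable": {"thread_id": "1535c1cd-b5f3-4482-a5a0-3de36b819e52"}}
    snapshot = graph.get_state(config)
    print(snapshot.next)                          # ('gate_1',)
    print(snapshot.values.get("status"))          # RUNNING
    print(snapshot.values.get("current_stage"))   # product_person
```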
Agent 01 had produced:
prd_id: PRD-001
status: PENDING_APPROVAL
The campaign summary was:
Launch a Meta Ads campaign for CampaignForge AI targeting performance marketers at Series B SaaS companies with a $500/month budget. The campaign aims to generate qualified demo requests by reaching key decision-makers who manage paid advertising and growth initiatives. Success is defined by driving demo conversions at a sustainable customer acquisition cost.
The target personas were:
P01 Performance Marketing Manager
Series B, team size 8, ad spend $25k/month, willingness to pay $499/month
P02 Head of Growth
Series B, team size 15, ad spend $50k/month, willingness to pay $799/month
P03 VP of Marketing
Series B, team size 25, ad spend $80k/month, willingness to pay $1,199/month
The success metrics were:
brief_to_campaign_hours_target: 2.0
brief_to_campaign_hours_max: 4.0
handoff_error_rate_max_pct: 0.1
uptime_target_pct: 99.9
The audit log showed the first real Claude cost:
intake_brief: BRIEF_VALIDATED
product_person: PRD_GENERATED
model: claude-sonnet-4-6
tokens: 575 input, 743 output
estimated cost: $0.01287
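That estimate is easy to reproduce by hand. Assuming roughly $3 per million input tokens and $15 per million output tokens, the numbers line up exactly:

```python
# Back-of-envelope check of the audit-log estimate; the per-token rates
# are assumptions, but they reproduce the recorded figure.
input_tokens, output_tokens = 575, 743
cost = input_tokens * 3 / 1_000_000 + output_tokens * 15 / 1_000_000
print(f"${cost:.5f}")  # $0.01287
```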
The PRD was acceptable. Gate 1 was approved.
.venv/bin/python campaignforge.py --resume 1535c1cd-b5f3-4482-a5a0-3de36b819e52 --approve
The trace then showed the exact resume path:
pipeline_resume
node_start: gate_1
gate_waiting: GATE-1
gate_decided: approved true
node_complete: gate_1
route_decision: gate_1.route_gate -> approved
node_start: architect
node_complete: architect
route_decision: architect.route_node_status -> ok
node_start: gate_2
gate_waiting: GATE-2
node_interrupted: GraphInterrupt
pipeline_invoke_complete
The local trace did its job on the very first approval.
Gate 2: The Agent Lied
Agent 02 completed and paused at GATE-2: Architecture Sign-off.
The gate summary looked reasonable:
ADR: A local-first LangGraph StateGraph pipeline orchestrates 11 specialized agent nodes across the Meta Ads campaign lifecycle...
But the full checkpoint told a different story.
Agent 02 had produced an ADR that sounded confident and structured, but it contained implementation details that were simply false.
It said the graph topology was:
START -> brief_ingestor -> prd_generator -> audience_researcher -> targeting_strategist -> creative_director -> copywriter -> visual_brief_agent...
Those nodes do not exist.
The real graph is:
START -> intake_brief -> product_person -> gate_1 -> architect -> gate_2 -> developer -> deployer -> gate_3 -> cost_analyst -> gate_4 -> strategist -> creative -> executor -> performance_analyst
Then Agent 06 routes to either:
monitoring_pause -> END
or:
content_draft -> gate_5 -> content_publish -> END
Agent 02 also claimed:
- some nodes used GPT-4o
- the Meta Audience Insights API was part of the implementation
- Gate 2 was a "Targeting approval"
- there was a compliance retry loop from compliance_reviewer back to copywriter
- the state schema contained names from a different imagined system
None of that was true.
The actual local implementation uses Anthropic Claude through src.common.llm_client.LLMClient. Gate 2 is Architecture Sign-off. There is no compliance loop. Meta Ads API integration is deferred. Agent 06 is deliberately in mock/manual mode to prevent simulated metrics from being treated as real campaign outcomes.
The high-level architecture was directionally correct. The concrete ADR was not trustworthy.
This is the point of human gates.
Without Gate 2, Agent 03 would have consumed a false architecture and generated downstream artifacts from it. The system would have continued with a confident lie embedded in its state.
The correct response was not to approve.
.venv/bin/python campaignforge.py --resume 1535c1cd-b5f3-4482-a5a0-3de36b819e52 --reject --reason "ADR contains inaccurate graph topology, wrong provider references, and non-existent nodes."
That run should be rejected. The code should be fixed. Then a new run should start.
The Fix: Agent 02 Becomes Deterministic
The bug was not that Claude made a mistake. The bug was that Agent 02 was allowed to make one.
The Architect Agent was being asked to "design the LangGraph architecture" even though the architecture already existed in code. That gave the model room to invent plausible implementation details.
So Agent 02 was hardened.
src/nodes/architect.py no longer asks Claude to produce the architecture. It now generates ArchitectOutput deterministically from the real local implementation.
The output still conforms to the same Pydantic contract:
ArchitectOutput
adr_id
brief_id
prd_id
status
summary
runtime
graph_topology_description
service_decisions
state_schema_description
checkpointing_strategy
estimated_local_cost_usd_monthly
error
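As a Pydantic sketch, with assumed field types (the real contract lives in the project's models):

```python
# Sketch of the ArchitectOutput contract listed above; types are assumed.
from pydantic import BaseModel


class ArchitectOutput(BaseModel):
    adr_id: str
    brief_id: str
    prd_id: str
    status: str
    summary: str
    runtime: str
    graph_topology_description: str
    service_decisions: dict[str, str]
    state_schema_description: str
    checkpointing_strategy: str
    estimated_local_cost_usd_monthly: float
    error: str | None = None
```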
But the fields are now implementation-derived:
- the actual src.graph.build_graph() topology
- the actual CampaignState fields
- the actual gate names and approval semantics
- the actual Anthropic provider abstraction
- the actual SQLite checkpointing strategy
- the actual local JSONL tracing path
- the actual Agent 06 mock/manual modes
- the actual rollback and error-halt routing
The new Agent 02 summary is bounded to the truth:
CampaignForge runs as a local-first LangGraph StateGraph with SQLite checkpointing, explicit human approval gates, local audit records, and optional JSONL tracing. The graph uses Anthropic-backed agents only where generation is required and deterministic local nodes where safety matters.
Its audit record now shows:
model_id: deterministic-local
tokens_input: 0
tokens_output: 0
cost_usd: 0.0
This is an important pattern: not every agent should be generative. Some agents are safer as deterministic translators from known code and known requirements into structured output.
Agent 02 is one of them.
Reconciling Local Cost With the Claude Dashboard
One subtle point came up immediately after the clean run reached Gate 2:
If Agent 02 now costs zero, why does the Claude dashboard show more usage?
The answer is that the dashboard is cumulative across the session. It includes the bad pre-fix Agent 02 call, both Agent 01 Product Person calls, and the new deterministic Agent 02 call.
The local audit log reconciled exactly with the Claude dashboard:
Old run Agent 01: 575 input, 743 output
Old run Agent 02 before fix: 601 input, 1184 output
New run Agent 01: 578 input, 740 output
New run Agent 02 after fix: 0 input, 0 output
Total:
1,754 input tokens
2,667 output tokens
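The reconciliation itself is plain addition over those audit rows:

```python
# Reconciling local audit rows with the provider dashboard is addition.
input_total = 575 + 601 + 578 + 0    # 1,754
output_total = 743 + 1184 + 740 + 0  # 2,667
```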
That matched the Claude dashboard exactly:
Total tokens in: 1,754
Total tokens out: 2,667
So the fix did not erase past usage. It changed future behavior. From this point forward, Agent 02 generates its ADR locally and writes:
model_id: deterministic-local
tokens_input: 0
tokens_output: 0
cost_usd: 0.0
The next Claude call in the chain will be Agent 03, because the Developer Agent still uses the LLM and still needs hardening.
This is why local audit records matter. They let the operator reconcile provider dashboards with per-agent behavior instead of guessing where the spend came from.
Regression Tests Against Architecture Drift
The fix included tests, not just code.
A new tests/test_nodes/test_architect.py file asserts that Agent 02:
- emits ADR-002
- preserves the brief_id and prd_id
- uses LangGraph + SQLite (local-first)
- records deterministic-local in the audit log
- costs zero dollars
- describes the actual graph topology
- documents local JSONL tracing
- documents Agent 06 mock/manual modes
Most importantly, the tests explicitly block the drift terms from the bad ADR:
GPT-4o
Meta Audience Insights API
brief_ingestor
audience_researcher
targeting_strategist
copywriter
visual_brief_agent
compliance_reviewer
Targeting approval
Final launch approval
If any of those strings appear in the rendered ADR output again, the test fails.
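A sketch of that regression idea, assuming a rendered_adr fixture that exposes the ADR as text (names here are illustrative, not the project's exact test code):

```python
# Drift regression sketch: the bad ADR's vocabulary becomes a blocklist.
# rendered_adr is an assumed fixture returning the ADR rendered as text.
import pytest

DRIFT_TERMS = [
    "GPT-4o",
    "Meta Audience Insights API",
    "brief_ingestor",
    "audience_researcher",
    "targeting_strategist",
    "copywriter",
    "visual_brief_agent",
    "compliance_reviewer",
    "Targeting approval",
    "Final launch approval",
]


@pytest.mark.parametrize("term", DRIFT_TERMS)
def test_adr_contains_no_drift_terms(term, rendered_adr):
    assert term not in rendered_adr
```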
The Agent 02 prompt/spec file was also tightened:
Architecture must mirror the actual src.graph.build_graph() topology and CampaignState.
Provider must be Anthropic Claude through src.common.llm_client.LLMClient; never reference GPT-4o.
Never invent nodes, external APIs, compliance loops, or approval semantics that are not implemented.
After the fix, the focused local suite passed:
98 passed
The project now has a regression harness for a subtle but dangerous failure mode: an agent producing a plausible architecture that does not match the system.
Gate 3 Exposed the Same Pattern in Agent 03
After the deterministic Agent 02 fix, Gate 2 was approved.
That moved the graph into Agent 03.
The Developer Agent completed and reported:
developer_complete
modules=29
The audit log showed another real Claude call:
developer CODEBASE_GENERATED claude-sonnet-4-6
363 input tokens
1,760 output tokens
estimated cost: $0.027489
Then the next node crashed.
Agent 04, the local deployer, attempted to read:
langgraph.__version__
But the installed LangGraph package did not expose __version__, so the graph failed with:
AttributeError: module 'langgraph' has no attribute '__version__'
During task with name 'deployer'
This was a useful failure for two reasons.
First, local tracing captured it cleanly as a node_error and pipeline_error.
Second, it forced another look at Agent 03. The deployer bug was real, but the deeper problem was that Agent 03 had the same trust issue Agent 02 just had: it asked Claude to confirm that the codebase was complete, then overwrote the output to mark every module and test as passing.
That is not verification.
It is a claim.
So Agent 03 was hardened too.
src/nodes/developer.py no longer asks Claude to confirm build success. It now:
- verifies that every expected implementation and test file exists
- runs the focused local pytest suite
- parses the real pytest result counts
- fails the node if any expected file is missing
- fails the node if pytest fails
- writes a build manifest only after verification passes
- records model_id: deterministic-local
- records zero tokens and zero cost
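A minimal sketch of the verification step those bullets describe; the command, paths, and parsing are assumptions:

```python
# Ground build status in a real pytest run instead of a model's claim.
import re
import subprocess


def verify_build(test_path: str = "tests/") -> int:
    result = subprocess.run(
        [".venv/bin/python", "-m", "pytest", test_path, "-q"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(f"pytest failed:\n{result.stdout[-2000:]}")
    match = re.search(r"(\d+) passed", result.stdout)
    if not match:
        raise RuntimeError("could not parse pytest result counts")
    return int(match.group(1))  # e.g. 104
```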
The current live checkpoint was then recomputed from the saved state before Gate 3. The hardened Agent 03 output became:
build_id: c705ee5b-d5b4-459a-8704-321a1bda9011
status: COMPLETE
tests_total: 104
tests_passed: 104
test_results: focused_langgraph_pytest_suite passed
model_id: deterministic-local
tokens_input: 0
tokens_output: 0
cost_usd: 0.0
Agent 04 was also fixed to read package versions through importlib.metadata.version() instead of assuming package-level __version__ attributes. The live checkpoint then advanced back to Gate 3 with fresh local health checks:
python: 3.12.8 passed
langgraph: 1.1.10 passed
pydantic: 2.13.3 passed
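The safer version probe is small. A sketch, using only the stdlib:

```python
# Read installed package versions from distribution metadata instead of
# assuming a package-level __version__ attribute.
from importlib.metadata import PackageNotFoundError, version


def package_version(dist_name: str) -> str | None:
    try:
        return version(dist_name)   # version("langgraph") -> "1.1.10"
    except PackageNotFoundError:
        return None
```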
The focused suite passed after the hardening:
104 passed
Gate 3 is now a real deploy sign-off for a locally verified build, not an approval of a model's assertion.
Gate 4 Caught a Smaller but Important Semantics Bug
After Gate 3 was approved, the graph advanced to Agent 05, the Cost Analyst.
The node did not call an LLM. It derived budget rules from the original brief:
monthly cap: $500.00
daily cap: $16.67
platform: meta_ads
bid strategy: COST_CAP
pacing: STANDARD
projected ROAS minimum: 2.0
But the output status was wrong:
status: APPROVED_BY_HUMAN
That was impossible. Gate 4 had not been approved yet.
This was not a hallucination. It was a local stub semantics bug. But the risk was similar: the state claimed a human approval that had not happened.
Agent 05 was hardened before Gate 4 approval.
It now emits a spend-limit proposal with:
status: PENDING
hard_daily_cap_usd: 16.67
hard_monthly_cap_usd: 500.0
approval_required_before_launch: true
spend_control_mode: MANUAL_APPROVAL_REQUIRED
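The cap arithmetic is trivial, assuming a 30-day month:

```python
# $500.00 spread over an assumed 30-day month gives the $16.67 daily cap.
monthly_cap_usd = 500.00
daily_cap_usd = round(monthly_cap_usd / 30, 2)  # 16.67
```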
The audit action changed from:
BUDGET_RULES_SET_STUB
to:
BUDGET_RULES_PROPOSED
Tests now prevent Agent 05 from pre-approving Gate 4, verify the daily and monthly caps are derived from the brief, and reject spend-limit outputs that do not require approval before launch.
The live checkpoint was refreshed so Gate 4 now shows:
Budget: PENDING
This is the correct state. The agent proposes spend limits. The human approves them.
Gate 4 Was Approved, Then Agent 07 Exposed the Next Stub
Gate 4 was then approved on the current local run:
.venv/bin/python campaignforge.py --resume 79a1e20e-d2b0-412b-bf60-1ae73634678f --approve
The trace showed the expected transition:
gate_decided: GATE-4 approved true
route_decision: gate_4.route_gate -> approved
node_start: strategist
node_complete: strategist
node_start: creative
node_complete: creative
node_start: executor
node_complete: executor
node_start: performance_analyst
node_complete: performance_analyst
route_decision: performance_analyst.route_performance -> continue
node_start: monitoring_pause
node_complete: monitoring_pause, status PAUSED_MONITORING
That got the run past spend approval and proved the downstream path was wired.
But it also exposed the next weak point. Agent 07, Strategist, was still a Phase 2 stub:
strategist: STRATEGY_SET_STUB
creative: CREATIVE_GENERATED_STUB
executor: CAMPAIGNS_LAUNCHED_STUB
The old Strategist output was schema-valid, but it was not really a strategy. It returned a generic B2B SaaS audience, fixed Meta Ads channel selection, two canned AB variants, and static KPI targets.
That is acceptable as a wiring stub. It is not acceptable as a campaign strategy.
So Agent 07 was built for real as a deterministic local node.
src/nodes/strategist.py now derives StrategistOutput from the actual campaign state:
- brief for vertical, budget, objective, CAC target, and campaign duration
- agent_01_output for target personas and pain points
- agent_05_output for daily cap, monthly cap, and projected ROAS minimum
It now produces:
channels
audiences
campaign_timeline_days
ab_test_variants
kpi_targets
The KPI target payload includes:
roas_min
cac_max_usd
ctr_target_pct
conversion_rate_target_pct
daily_budget_usd
monthly_budget_usd
primary_objective
measurement_window_days
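A sketch of how such a payload can be derived deterministically from upstream state; the CTR and conversion-rate defaults here are assumptions:

```python
# Illustrative deterministic derivation of the KPI payload; the CTR and
# conversion-rate defaults are assumptions, the rest echoes upstream state.
def build_kpi_targets(cac_max_usd: float, daily_cap_usd: float,
                      monthly_cap_usd: float, roas_min: float,
                      duration_days: int) -> dict:
    return {
        "roas_min": roas_min,                      # from Agent 05
        "cac_max_usd": cac_max_usd,                # $150 from the brief
        "ctr_target_pct": 1.0,                     # assumed default
        "conversion_rate_target_pct": 2.0,         # assumed default
        "daily_budget_usd": daily_cap_usd,
        "monthly_budget_usd": monthly_cap_usd,
        "primary_objective": "qualified_demo_requests",
        "measurement_window_days": duration_days,
    }
```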
The audit action changed from:
STRATEGY_SET_STUB
to:
STRATEGY_SET
and the node records:
model_id: deterministic-local
tokens_input: 0
tokens_output: 0
cost_usd: 0.0
Agent 07 also now has its own tests. They verify that the strategist:
- uses Agent 01 persona data when present
- uses Agent 05 budget rules when present
- parses duration and CAC targets from the brief
- falls back safely when the PRD is missing
- returns structured errors on invalid state
- no longer writes a stub audit action
The focused local suite now passes:
112 passed
A fresh run then exercised the rebuilt Agent 07:
pipeline_id: d58d5b45-b3d6-4429-a660-7c157b2fc920
developer: CODEBASE_VERIFIED, 112/112 tests passed
cost_analyst: BUDGET_RULES_PROPOSED
gate_4: approved
strategist: STRATEGY_SET
model_id: deterministic-local
audiences: 3
channels: ["meta_ads"]
This is the first checkpoint where Agent 07 is no longer a stub.
The run then immediately exposed the next two placeholders:
creative: CREATIVE_GENERATED_STUB
executor: CAMPAIGNS_LAUNCHED_STUB
Agent 06 correctly treated the launched campaign as mock data and paused monitoring instead of triggering content publication:
status: PAUSED_MONITORING
reason: Mock performance data generated for local testing; content publication is not triggered.
What This Changed About the Philosophy
Before this run, it was tempting to think of the human gates as product-safety checkpoints:
- approve the PRD
- approve the architecture
- approve deployment
- approve spend
- approve content publication
They are that.
But Gate 2 revealed a second purpose: human gates are also epistemic checkpoints. They stop the system from continuing when an agent says something that merely sounds right.
That matters because the most dangerous agent failures are not syntax errors. They are coherent falsehoods.
A syntax error stops the chain.
A schema error stops the chain.
A false but well-formed ADR passes validation and contaminates the next agent.
The only defense is a combination of:
- deterministic code where generation is unnecessary
- schemas at every boundary
- local traces that expose state transitions
- approval gates that invite real inspection
- regression tests for every discovered failure mode
This is a stronger system than the one that existed at the start of the chapter.
Current State
The local run proved:
- Claude API calls work from the local chain
- SQLite checkpointing works
- interrupt() gates pause and resume correctly
- approval decisions are written to the audit log
- local JSONL tracing works
- checkpoint inspection before approval is possible
- Gate 1 PRD output was acceptable
- Gate 2 caught an untrustworthy ADR
- Agent 02 is now deterministic and implementation-derived
- Agent 03 is now deterministic and grounded in real pytest results
- Agent 04 local health checks now use installed package metadata safely
- Agent 05 now proposes spend limits as PENDING and requires Gate 4 approval
- the current run paused cleanly in monitoring mode
- Agent 07 has now been rebuilt and exercised in a fresh live run
- Agent 08 and Agent 09 are now the remaining campaign-side stubs
The first local run was rejected at Gate 2 because its Agent 02 output was contaminated.
The second local run reached monitoring after the old Agent 07/08/09 stubs:
pipeline_id: 79a1e20e-d2b0-412b-bf60-1ae73634678f
current_stage: monitoring_pause
status: PAUSED_MONITORING
Gate 4: approved at 2026-05-06T06:15:40Z
Agent 03: COMPLETE, deterministic-local
Agent 04: DEPLOYED, Python/LangGraph/Pydantic health checks passed
Agent 05: PENDING proposal, hard cap $16.67/day and $500/month
Agent 07 in this checkpoint: old STRATEGY_SET_STUB output
Agent 08 in this checkpoint: CREATIVE_GENERATED_STUB
Agent 09 in this checkpoint: CAMPAIGNS_LAUNCHED_STUB
Agent 06: mock performance report, CONTINUE
The fresh validation run is also paused in monitoring, but now with the rebuilt Agent 07:
pipeline_id: d58d5b45-b3d6-4429-a660-7c157b2fc920
current_stage: monitoring_pause
status: PAUSED_MONITORING
Gate 4: approved at 2026-05-06T18:54:19Z
Agent 07: STRATEGY_SET, deterministic-local, 3 audiences
Agent 08: CREATIVE_GENERATED_STUB
Agent 09: CAMPAIGNS_LAUNCHED_STUB
Agent 06: mock performance report, CONTINUE
Artifacts from Chapter 3
| Artifact | Location | Status |
|----------|----------|--------|
| Local JSONL tracing utility | src/common/tracing.py | Implemented |
| Graph tracing wrappers | src/graph.py | Implemented |
| Gate trace events | src/nodes/gates.py | Implemented |
| CLI trace events | campaignforge.py | Implemented |
| Runtime artifact ignores | .gitignore | Updated |
| Local tracing tests | tests/test_local_tracing.py | Added |
| Agent 02 deterministic ADR generator | src/nodes/architect.py | Hardened |
| Agent 02 drift regression tests | tests/test_nodes/test_architect.py | Added |
| Agent 02 spec | agents/02-architect.md | Tightened |
| Agent 03 deterministic verifier | src/nodes/developer.py | Hardened |
| Agent 03 verifier tests | tests/test_nodes/test_developer.py | Added |
| Agent 03 spec | agents/03-developer.md | Tightened |
| Agent 04 package-version health check | src/nodes/deployer.py | Fixed |
| Agent 04 deployer tests | tests/test_nodes/test_deployer.py | Added |
| Agent 05 spend-limit proposal | src/nodes/cost_analyst.py | Hardened |
| Agent 05 spend-limit tests | tests/test_nodes/test_cost_analyst.py | Added |
| Agent 07 deterministic strategy builder | src/nodes/strategist.py | Implemented |
| Agent 07 strategist tests | tests/test_nodes/test_strategist.py | Added |
| Agent 07 spec | agents/07-strategist.md | Written |
| Rejected local run | 1535c1cd-b5f3-4482-a5a0-3de36b819e52 | Rejected at Gate 2 |
| Old post-Gate-4 local run | 79a1e20e-d2b0-412b-bf60-1ae73634678f | Paused in monitoring after old Agent 07/08/09 stubs |
| Current Agent 07 validation run | d58d5b45-b3d6-4429-a660-7c157b2fc920 | Paused in monitoring after new Agent 07 and old Agent 08/09 stubs |
The Lesson
The first successful local run did not end with a launch.
It ended with a rejection.
That is a good outcome.
The chain ran. The trace worked. The checkpoint was inspectable. The human gate caught a false architecture before it could become downstream code. Then Gate 3 exposed that the Developer Agent needed the same treatment: less assertion, more verification. Gate 4 then exposed the next product boundary: once spend is approved, campaign-side agents cannot remain generic wiring stubs forever.
This is what the project is supposed to do: not pretend the agents are always right, but build a system that makes their failures visible, recoverable, and progressively less likely.
The next chapter is not about whether the graph can run.
It can.
The next chapter is about whether the campaign-side agents deserve the same level of trust Agents 02, 03, and 04 are starting to earn.
This document is a raw draft. Content Publisher Agent (11) will format and adapt this for LinkedIn, Medium, and other platforms as part of the first content run.