Building in Public

Chapter 2

The Rebuild — Same System, Better Foundation

CampaignForge AI · May 2026 · SimQuant LLC


Status: Raw draft for Content Publisher Agent (11) to format · Date: 2026-05-05 · Author: Tim Simeonov (founder) + Architect Agent (02)


Where We Left Off

Chapter 1 ended with a working system that we chose not to deploy.

The Developer Agent had written a complete codebase. Twenty-nine tests were passing. The AWS Deployer had reviewed the Terraform modules, caught three bugs before a single credential was entered, and fixed them. The deployment guide was ready. The only thing left was to type terraform apply and watch it go.

We didn't do it.

The question that stopped us came up in a code review conversation, almost as an aside: "Is AWS actually the right foundation here?" The answer required thinking clearly about what CampaignForge AI is actually supposed to be — not just a SaaS product, but a documented, replicable journey that any founder can follow.

And the honest answer was: AWS Step Functions is hostile to that goal.

You cannot run Step Functions on a laptop. You need an AWS account. You need IAM roles, an S3 state bucket, a DynamoDB lock table, credentials properly configured, and about $76/month of infrastructure spend before a single agent runs. The ASL state machine definition — the "brain" of the orchestrator — is a JSON file that most developers can read but few can debug without the AWS console open in another tab.

The barrier is real. It filters out many of the people who would benefit most from reading this.

So we pivoted. And then we documented the pivot.


The Decision That Changed the Direction

The conversation that triggered the rebuild started with a simple question about the architecture decision record from Chapter 1.

"How was Step Functions selected? Why not CrewAI, Google ADK, or LangChain?"

The honest answer from ADR-001 was: the 48-hour human approval gate was the deciding constraint. Any Python framework running in a Lambda function can't hold execution state for 48 hours; Lambda has a 15-minute limit. You either keep a process alive (impossible), build a polling loop (fragile), or use Step Functions' waitForTaskToken, which does this natively — execution pauses, zero compute runs, zero cost accrues, and it resumes the moment someone clicks approve.

That was the right answer for a cloud-first B2B SaaS product. It was the wrong answer for a system whose primary goal is to be understandable and replicable by a founder on a laptop.

"If AWS is flexible and local running is the priority, which framework is best?"

This question surfaced the real tension. And the answer — once we thought clearly about it — was LangGraph.

Here is why LangGraph won when evaluated against the same criteria: the entire graph runs as a plain Python process on a laptop, interrupt() plus a SQLite checkpoint handles the long-lived approval pause that waitForTaskToken solved in AWS, and the orchestration logic is ordinary code a reader can step through and debug.

The alternatives (CrewAI, Google ADK, LangChain chains) were evaluated against the same constraints and eliminated.


What the Architect Agent Produced

Gate 1 — PRD approval — passed with the original PRD-001 unchanged. The product requirements didn't change with the pivot. Same 11 agents. Same 5 human approval gates. Same JSON contracts. Same 10 non-negotiables from the PRD. The only thing that changed was where the system runs.

The Architect Agent was then re-run with a new mandate: produce ADR-002 for LangGraph exactly as ADR-001 was produced for AWS Step Functions. Same format, same rigor, same binding decisions.

ADR-002 covers every binding technical decision for the Chapter 2 implementation. Here is what it decided:

The Orchestrator Becomes the Graph

In Chapter 1, Agent 10 (Orchestrator) was a Lambda function that managed state in DynamoDB and coordinated other agents through Step Functions. It was a real piece of code with real logic.

In Chapter 2, the Orchestrator is the graph itself. The StateGraph definition in src/graph.py — the nodes, the edges, the conditional routing — is Agent 10. The error_halt node and rollback node are its failure handlers. There is no separate Orchestrator code because the orchestration is the graph topology.

This is a meaningful simplification. One less agent to test, one less system prompt to maintain, one less JSON schema to validate. The graph structure expresses the orchestration logic directly and visibly.
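To make "the orchestration is the graph topology" concrete, here is a minimal sketch of what a StateGraph of this shape looks like. The node names, routing function, and state fields are placeholders for illustration, not the actual src/graph.py defined by ADR-002.

```python
# Minimal sketch only: node names, routing, and state fields are placeholders,
# not the actual src/graph.py from ADR-002.
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class State(TypedDict, total=False):
    brief: dict
    strategy: dict
    errors: list


def brief_intake(state: State) -> dict:   # Agent 01 (placeholder body)
    return {"brief": {"raw": "..."}}


def strategy(state: State) -> dict:       # Agent 02 (placeholder body)
    return {"strategy": {"channels": ["linkedin"]}}


def error_halt(state: State) -> dict:     # failure handler node
    return {}


def route(state: State) -> str:
    # Conditional routing is what used to be the Orchestrator's branching logic
    return "error_halt" if state.get("errors") else "strategy"


builder = StateGraph(State)
builder.add_node("brief_intake", brief_intake)
builder.add_node("strategy", strategy)
builder.add_node("error_halt", error_halt)

builder.add_edge(START, "brief_intake")
builder.add_conditional_edges(
    "brief_intake", route, {"strategy": "strategy", "error_halt": "error_halt"}
)
builder.add_edge("strategy", END)
builder.add_edge("error_halt", END)

graph = builder.compile()  # Agent 10 is this compiled topology
```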

State Flows Through One Object

In Chapter 1, agents communicated by writing to SQS FIFO queues and reading from DynamoDB. The state was distributed across multiple AWS services. Debugging a failed pipeline meant correlating CloudWatch logs, DynamoDB records, SQS messages, and Step Functions execution history.

In Chapter 2, every agent node receives the full CampaignState TypedDict and returns a partial update. All communication between agents happens through this single object. SQLite checkpoints it after every node. Debugging a failed pipeline means opening one .db file.

The tradeoff is that the state object grows as the pipeline progresses. In practice, for a campaign pipeline, this is a few kilobytes of JSON — completely manageable in SQLite and straightforward to inspect.
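As a sketch of that pattern (the field names here are assumptions, not the real schema from src/schemas.py), the single state object and a node's partial update look roughly like this:

```python
# Sketch of the single-state-object pattern; field names are assumptions,
# not the actual CampaignState from ADR-002 / src/schemas.py.
from typing import Any, TypedDict


class CampaignState(TypedDict, total=False):
    pipeline_id: str
    brief: dict[str, Any]             # written by Agent 01
    strategy: dict[str, Any]          # written by Agent 02
    creatives: list[dict[str, Any]]   # written by Agent 04
    audit_log: list[dict[str, Any]]   # appended by every node
    gate_decisions: dict[str, bool]


def creative_node(state: CampaignState) -> dict:
    """Each node reads the full state and returns only the keys it changes."""
    creatives = [{"headline": "...", "body": "..."}]
    return {
        "creatives": creatives,
        "audit_log": state.get("audit_log", [])
        + [{"agent": "04", "action": "generated_creatives"}],
    }
```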

Pydantic Enforces the Contracts

ADR-001 defined JSON schemas in a schemas/v1/ directory. ADR-002 converts those schemas to Pydantic models in src/schemas.py. The practical effect is the same — strict typing at every agent boundary — but the enforcement is now in Python rather than jsonschema, which makes the error messages readable and the tests simpler to write.

Two validators were added that didn't exist in Chapter 1.

These aren't new requirements — they were already non-negotiables in PRD-001. But enforcing them in Pydantic validators means the system cannot violate them even if an agent tries to. The contract is in the code, not in documentation.
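ADR-002 specifies the actual validators. As an illustration of the pattern only (a hypothetical spend-cap check, not necessarily one of the two from ADR-002), enforcing a non-negotiable in Pydantic looks roughly like this:

```python
# Illustrative only: the model, field names, and cap value are assumptions.
# The point is the pattern — the contract is enforced in code, not documentation.
from pydantic import BaseModel, Field, field_validator

DAILY_SPEND_CAP_USD = 50.0  # hypothetical hard cap


class MediaPlan(BaseModel):
    campaign_id: str
    daily_budget_usd: float = Field(gt=0)

    @field_validator("daily_budget_usd")
    @classmethod
    def enforce_spend_cap(cls, v: float) -> float:
        if v > DAILY_SPEND_CAP_USD:
            raise ValueError(
                f"daily_budget_usd {v} exceeds hard cap {DAILY_SPEND_CAP_USD}"
            )
        return v
```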

The Human Gates Are Three Lines of Python

The most technically complex part of Chapter 1 was the human approval gate mechanism. It required: a waitForTaskToken Step Functions state, an Approval Notifier Lambda that wrote the token to DynamoDB and sent an SNS email, an Approval Handler Lambda that received the approve/reject callback, an Escalation Checker Lambda on an EventBridge schedule, plus API Gateway routes and HMAC-signed approval URLs.

In Chapter 2, each gate is this:

decision = interrupt({
    "gate_id": gate_id,
    "gate_name": gate_name,
    "summary": build_gate_summary(state, gate_id),
})

The interrupt() call pauses the graph. LangGraph serializes the state to SQLite. When the human runs python campaignforge.py --resume <pipeline_id> --approve, the graph picks up from here and decision is {"approved": True}. The entire distributed callback architecture is replaced by a checkpoint and a resume command.
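For readers who want to see the resume path end to end, here is a rough sketch of what the resume handling can look like. It assumes the builder object from the topology sketch above; the variable names and CLI plumbing are guesses, not the actual campaignforge.py.

```python
# Rough sketch of the resume path; variable names and CLI plumbing are
# assumptions, not the actual campaignforge.py. `builder` is the StateGraph
# builder from the topology sketch earlier in this chapter.
import sqlite3

from langgraph.checkpoint.sqlite import SqliteSaver
from langgraph.types import Command

conn = sqlite3.connect("campaignforge.db", check_same_thread=False)
graph = builder.compile(checkpointer=SqliteSaver(conn))

pipeline_id = "demo-pipeline-001"                       # hypothetical id
config = {"configurable": {"thread_id": pipeline_id}}   # one checkpoint thread per pipeline

# Whatever is passed to resume becomes the return value of interrupt() at the gate,
# so the paused gate node sees decision == {"approved": True} and continues.
result = graph.invoke(Command(resume={"approved": True}), config=config)
```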

The 48-hour escalation is a daemon thread that sleeps for 48 hours, checks whether the gate has been resolved in SQLite, and sends an email and Slack notification if not. It does not auto-approve. It does not auto-fail. It waits.
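A minimal sketch of that escalation thread, assuming hypothetical helper functions for the SQLite check and the notifications:

```python
# Sketch of the 48-hour escalation thread. gate_is_resolved, send_email_reminder,
# and send_slack_reminder are hypothetical helpers, not code from ADR-002.
import threading
import time


def start_escalation_timer(pipeline_id: str, gate_id: str, hours: float = 48.0) -> None:
    def check_and_notify() -> None:
        time.sleep(hours * 3600)
        if not gate_is_resolved(pipeline_id, gate_id):  # reads the SQLite checkpoint DB
            send_email_reminder(pipeline_id, gate_id)   # notify only
            send_slack_reminder(pipeline_id, gate_id)   # never auto-approve, never auto-fail

    threading.Thread(target=check_and_notify, daemon=True).start()
```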

Phase 2 Agents Are Stubs, Not Gaps

The MVP scope hasn't changed from PRD-001: Agents 01, 02, 03, 04, 06, 10, and 11 in full. Agents 05, 07, 08, and 09 are Phase 2.

In Chapter 1, Phase 2 agents simply didn't exist in the codebase. In Chapter 2, they are stub nodes in the graph. Each stub node receives valid input, returns a valid mock output matching its Pydantic schema, and writes an audit record. The full pipeline runs end-to-end in tests — including the gates and the routing — without any missing-node errors.

This matters because it means the Developer Agent can implement and test the complete graph topology in MVP, and Phase 2 implementation is a matter of replacing stubs with real logic, not rewiring the graph.
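A sketch of what one stub node can look like, with a hypothetical placeholder model standing in for the real Phase 2 schemas:

```python
# Sketch of the stub-node pattern; the model, field names, and audit format are
# assumptions, not the actual Phase 2 schemas from src/schemas.py.
from datetime import datetime, timezone

from pydantic import BaseModel, Field


class Phase2Output(BaseModel):  # hypothetical stand-in for a Phase 2 schema
    agent_id: str
    payload: dict = Field(default_factory=dict)
    is_stub: bool = True


def phase2_stub_node(state: dict) -> dict:
    """Returns a schema-valid mock so the full graph runs end-to-end in tests."""
    mock = Phase2Output(agent_id="05")
    audit_entry = {
        "agent": "05",
        "action": "stub_output",
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return {
        "phase2_output": mock.model_dump(),
        "audit_log": state.get("audit_log", []) + [audit_entry],
    }
```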


The Three Things That Didn't Change

It is worth being explicit about what the pivot did not touch.

The 10 non-negotiables from PRD-001 are unchanged. Structured JSON contracts, immutable audit log, hard spend caps, approval gates that never auto-approve, explicit error handling — all of these are present in ADR-002 in exactly the same form. The implementation changed; the requirements didn't.

The product vision is unchanged. CampaignForge AI still takes a raw advertising brief and returns a live, profitable campaign. It still documents every decision publicly. It still publishes its P&L. The agent team still runs its own campaigns. None of that changed.

The git history is unchanged. The AWS Step Functions codebase is tagged v0.1-aws-stepfunctions-architecture. It is not deleted. It is not rewritten. Chapter 1 happened, Chapter 2 is built on top of it, and the commit log is the full record. A reader following this journey can check out the tag and see exactly what was built before the pivot.


What the Rebuild Actually Cost

Time to produce ADR-002: one session. The Architect Agent received PRD-001, the Chapter 1 journey draft, and ADR-001 as context. It read all three, evaluated the LangGraph design space, and produced a complete architecture decision record covering graph topology, state schema, Pydantic schemas for all 11 agents, gate implementation, LLM abstraction layer, local cost breakdown, and project layout.

The output is directly actionable. The Developer Agent (03) receives ADR-002 as its sole input and has everything it needs to begin implementation without clarification.

The decision that mattered most wasn't technical — it was the willingness to stop before deploying something that worked but didn't fit the goal. The AWS version would have run. The campaigns would have launched. But the documentation would have been inaccessible to most readers, and the journey — the second goal of this project — would have been limited to an audience that already knows Terraform and AWS IAM.

The rebuild cost one chapter. The alternative would have cost the audience.


What Comes Next

Gate 2 is the current state. ADR-002 is written and pending approval. The review checklist asks whether LangGraph + SqliteSaver satisfies all 10 non-negotiables, whether interrupt() correctly replaces waitForTaskToken, and whether the project layout is actionable.

Once Gate 2 is approved, Developer Agent (03) starts implementation. The target: python -m pytest tests/ -v → all tests pass; python campaignforge.py --brief "..." → the graph runs to GATE-1 and pauses cleanly.

After that comes the question of the first live campaign. The agent team needs to advertise itself. The budget, the brief, and what "success" looks like for the first run — these are open questions that the agents will help answer. The first campaign is the first real test of everything built in Chapters 1 and 2.


Open Questions for Chapter 3 and Beyond

These are unresolved and will be addressed in subsequent chapters:

1. First campaign brief: What exactly does CampaignForge AI say about itself in its first ad? What targeting makes sense for a product that doesn't have paying customers yet?
2. First campaign budget: What is the right number to commit to for a proof-of-concept that is documented publicly? Enough to generate signal; not enough to hurt.
3. Definition of success for the first run: ROAS above 1.0? A single inbound lead? A specific number of LinkedIn post impressions? The agents need a target to optimize toward.
4. Content format: Does Agent 11 publish the journey as a raw transcript, a narrative, or both? The chapter drafts so far are narrative — is that the right format for LinkedIn and Medium?
5. Postgres or SQLite for the first live run? The local version uses SQLite. When does it make sense to move to Postgres, and what triggers that decision — campaign volume, team size, or production stability requirements?
6. Phase 2 trigger: At what point do Agents 05, 07, 08, and 09 get implemented? Is there a specific campaign performance threshold that makes the full pipeline necessary?

Artifacts from Chapter 2

| Artifact | Location | Status |
|----------|----------|--------|
| ADR-002 | architecture/ADR-002-langgraph.md | Written — awaiting Gate 2 approval |
| Agent 02 prompt (updated) | agents/02-architect.md | Updated — LangGraph mandate |
| Chapter 2 journey | content/journey-chapter-02.md | This document |
| Chapter 1 codebase (preserved) | git tag: v0.1-aws-stepfunctions-architecture | Frozen |


A Note on the Meta-Goal

Every chapter of this journey is built by the agents and documented in real time. The pivot decision, the architectural reasoning, the trade-offs — none of this was written after the fact. It was produced during the session in which the decisions were made.

This is intentional. The documentation being written by agents, about agents, in real time — that's the proof of concept. If the documentation is good enough that a reader can follow it and replicate the system, the agents did their job. If it isn't, that's the more interesting chapter.

The next test is whether the Developer Agent can take ADR-002 and produce a running system without human interpretation. That answer is Chapter 3.


This document is a raw draft. Content Publisher Agent (11) will format and adapt this for LinkedIn, Medium, and other platforms as part of the first content run.

This post was drafted by AI and reviewed by the operator. Content is published as part of the CampaignForge AI build-in-public journey.