CampaignForge AI - The Journey
Chapter 8: Giving the Performance Analyst a Brain
Date: 2026-05-08 | Vertical: B2B SaaS | Budget: $500/month
Where We Left Off
Chapter 7 added Agent 00 — the website auditor that blocks wasted spend before the campaign starts. That was a preventive measure. Agent 06 is where the system responds after the campaign is running.
Before this chapter, Agent 06 reported. It read performance metrics, checked budget utilization, compared ROAS against the threshold, and returned one of three recommendations: CONTINUE, HALT, or TRIGGER_CONTENT.
That was correct behavior. It was not sufficient behavior.
The Problem With a Three-Option Recommendation
CONTINUE, HALT, TRIGGER_CONTENT — those three states tell you what to do next. They do not tell you why the campaign is performing the way it is.
Imagine the agent returns CONTINUE. You check the metrics. ROAS is 0.6. CTR is 0.4%. CPC is climbing. The agent has correctly identified that you should not yet trigger content publication. But it has not told you whether the problem is creative fatigue, audience saturation, a slow landing page, a bidding issue, or simply that it is day three and the campaign is still in the learning phase.
Those problems have different fixes. Refreshing creative when the problem is audience saturation is wasted work. Expanding targeting when the problem is a weak offer that even warm audiences are not converting on is also wasted work. Without a diagnosis, you are guessing.
The agent had all the inputs needed to reason about this. It was not reasoning about it. It was pattern-matching the numbers to one of three buckets.
What Was Built
Agent 06 now produces a PerformanceDiagnosis on every run and attaches it to PerformanceAnalystOutput. The diagnosis is always present — not just when something is wrong.
The diagnosis schema:
```python
class PerformanceDiagnosis(BaseModel):
    campaign_health_score: int      # 1–10
    primary_diagnosis: str          # one of 9 categories
    secondary_diagnoses: list[str]
    key_evidence: list[str]
    confidence: float               # 0.0–1.0
    summary: str
    recommendations: list[str]
    next_action: str                # drives routing
```
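Because the LLM returns this object as JSON, the schema doubles as a validation layer for the model's reply. A minimal stdlib sketch of that round trip — the real schema is Pydantic; the dataclass and range checks here are illustrative stand-ins:

```python
import json
from dataclasses import dataclass, field

@dataclass
class PerformanceDiagnosis:
    campaign_health_score: int   # 1–10
    primary_diagnosis: str
    confidence: float            # 0.0–1.0
    summary: str
    next_action: str
    secondary_diagnoses: list = field(default_factory=list)
    key_evidence: list = field(default_factory=list)
    recommendations: list = field(default_factory=list)

    def __post_init__(self):
        # Mirror the schema's range constraints so a bad LLM reply fails loudly.
        if not 1 <= self.campaign_health_score <= 10:
            raise ValueError("health score out of range")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence out of range")

# Parse a (simulated) LLM JSON reply into the validated object.
raw = '{"campaign_health_score": 3, "primary_diagnosis": "AUDIENCE_SATURATION", "confidence": 0.82, "summary": "Retargeting pool exhausted.", "next_action": "ADJUST_STRATEGY"}'
diag = PerformanceDiagnosis(**json.loads(raw))
```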
The nine diagnostic categories:
| Category | What It Means |
|---|---|
| CREATIVE_FATIGUE | CTR was working, now declining. Frequency rising past 3–4. The audience has seen the ads. |
| AUDIENCE_SATURATION | Frequency above 6. Rising CPM. Declining reach. Retargeting ceiling hit. |
| WEAK_OFFER_LANDING_PAGE | CTR above 1.5% but conversion rate below 2%. People are clicking. They are not converting. The page or offer is the problem. |
| BIDDING_BUDGET_ISSUE | Erratic delivery. Stuck near spend caps. Budget too low for the algorithm to exit the learning phase efficiently. |
| LEARNING_PHASE_STUCK | Fewer than 50 conversions per week. Too many ad set changes interrupting the algorithm. Campaign has not exited learning after 7+ days. |
| AD_ACCOUNT_RESTRICTION | Account flagged, limited, or under review. Policy issues suppressing delivery. |
| TECHNICAL_DELIVERY_ERROR | Zero or near-zero impressions despite active status. Delivery failures or API errors. |
| INSUFFICIENT_DATA | Spend below $5 or impressions below 500. Not enough signal to diagnose anything. |
| PERFORMING_WELL | ROAS at or above 2.0, CTR above 1.5%, conversion rate above 2%, stable or declining CPM. Campaign is healthy. |
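The thresholds in the table can be read as an ordered set of predicates over a metrics snapshot. A sketch of that reading — the cutoffs come from the table, but the check ordering, field names, and the final catch-all are my assumptions, not the real codebase:

```python
def categorize(m: dict) -> str:
    """Map a metrics snapshot to one diagnostic category (illustrative order)."""
    if m.get("account_restricted"):
        return "AD_ACCOUNT_RESTRICTION"
    if m["impressions"] < 1 and m["status"] == "active":
        return "TECHNICAL_DELIVERY_ERROR"   # active but not delivering
    if m["spend"] < 5 or m["impressions"] < 500:
        return "INSUFFICIENT_DATA"          # not enough signal to diagnose
    if m["roas"] >= 2.0 and m["ctr"] > 1.5 and m["cvr"] > 2.0:
        return "PERFORMING_WELL"
    if m["frequency"] > 6:
        return "AUDIENCE_SATURATION"
    if m["ctr"] > 1.5 and m["cvr"] < 2.0:
        return "WEAK_OFFER_LANDING_PAGE"    # clicks, no conversions
    if m["frequency"] >= 3 and m["ctr_trend"] < 0:
        return "CREATIVE_FATIGUE"           # CTR was working, now declining
    if m["weekly_conversions"] < 50 and m["days_live"] >= 7:
        return "LEARNING_PHASE_STUCK"
    return "BIDDING_BUDGET_ISSUE"

# The chapter's running example: frequency 7.2 with weak ROAS and CTR.
snapshot = {"account_restricted": False, "status": "active", "impressions": 15000,
            "spend": 120.0, "roas": 0.6, "ctr": 0.4, "cvr": 1.0, "frequency": 7.2,
            "ctr_trend": -0.1, "weekly_conversions": 10, "days_live": 7}
```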
These categories are not arbitrary. They are grounded in the failure patterns I have seen repeatedly in paid campaigns on Meta and Google — the same patterns that professional analysts get paid to identify.
LLM vs Deterministic: When Each Runs
The diagnosis uses two different paths depending on the data.
LLM path: Used when is_real_performance_data is true and spend is at or above $5. The LLM receives a structured prompt with all campaign metrics, budget utilization, vertical, and monthly budget. It returns a PerformanceDiagnosis JSON. The prompt is calibrated with real case study benchmarks:
- After 4 repeated ad exposures, conversion likelihood drops 45% (Meta Research)
- A home improvement retailer moved from ROAS 1.18 to 6.47 — a 447% increase — by fixing the funnel and refreshing creative
- A premium pet brand scaled ad spend 6.7x with ROAS up 10.8% and net profit up 393% by maintaining 40+ fresh creatives per month
These are real numbers, not invented thresholds. The LLM is asked to compare the actual campaign against calibrated benchmarks, identify the primary cause of the performance pattern, and recommend a specific next action — not a generic one.
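A sketch of how that calibrated prompt might be assembled — the benchmark text comes from this chapter, but the prompt wording and function signature are hypothetical:

```python
BENCHMARKS = """\
- After 4 repeated ad exposures, conversion likelihood drops 45% (Meta Research).
- A retailer moved from ROAS 1.18 to 6.47 by fixing the funnel and refreshing creative.
- A brand scaled spend 6.7x with ROAS up 10.8% by maintaining 40+ fresh creatives/month."""

def build_diagnosis_prompt(metrics: dict, vertical: str, monthly_budget: int) -> str:
    """Fold campaign metrics and case-study benchmarks into one structured prompt."""
    metric_lines = "\n".join(f"- {k}: {v}" for k, v in metrics.items())
    return (
        "You are a paid-media performance analyst.\n"
        f"Vertical: {vertical} | Monthly budget: ${monthly_budget}\n"
        f"Campaign metrics:\n{metric_lines}\n"
        f"Calibration benchmarks:\n{BENCHMARKS}\n"
        "Compare the campaign against the benchmarks, pick ONE primary diagnosis "
        "from the nine categories, and return a PerformanceDiagnosis as JSON only."
    )

prompt = build_diagnosis_prompt({"roas": 0.6, "ctr": 0.4}, "B2B SaaS", 500)
```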
Deterministic path: Used for mock data, local-artifact data (no real metrics yet), spend below $5, or when the LLM call fails for any reason. The fallback applies conservative rule-based logic that defaults to the safest answer on edge cases:
- Spend below $5 or impressions below 500 → `INSUFFICIENT_DATA`
- Budget cap hit or `HALT` recommendation → `BIDDING_BUDGET_ISSUE` → `ESCALATE_TO_USER`
- Real data with ROAS at or above 2.0 → `PERFORMING_WELL` → `TRIGGER_CONTENT`
- Everything else → `INSUFFICIENT_DATA` with `CONTINUE_MONITORING`
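Those four rules are small enough to sketch directly. The function name and field names below are illustrative, not the real `_diagnose_deterministic` signature — note the `is_real_performance_data` check, which keeps simulated metrics from ever triggering content:

```python
def diagnose_deterministic(m: dict) -> tuple[str, str]:
    """Rule-based fallback: returns (primary_diagnosis, next_action)."""
    if m["spend"] < 5 or m["impressions"] < 500:
        return "INSUFFICIENT_DATA", "CONTINUE_MONITORING"
    if m.get("budget_cap_hit") or m.get("recommendation") == "HALT":
        return "BIDDING_BUDGET_ISSUE", "ESCALATE_TO_USER"
    # TRIGGER_CONTENT is only reachable with real (non-mock) metrics.
    if m.get("is_real_performance_data") and m["roas"] >= 2.0:
        return "PERFORMING_WELL", "TRIGGER_CONTENT"
    return "INSUFFICIENT_DATA", "CONTINUE_MONITORING"
```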
The LLM failure path is a deliberate design choice. If the API is unavailable or returns an unparseable response, the node catches the exception, logs a warning, and falls back to the deterministic path. The pipeline does not halt because of an LLM timeout. A less precise diagnosis is better than no diagnosis.
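The catch-and-fall-back pattern looks roughly like this — `call_llm` is a hypothetical client stand-in (here it simulates an outage), and the fallback payload is a simplified stand-in for the deterministic diagnosis:

```python
import json
import logging

def call_llm(metrics: dict) -> str:
    # Hypothetical LLM client call; simulated as unavailable for this sketch.
    raise TimeoutError("LLM unavailable")

def diagnose(metrics: dict) -> dict:
    """Try the LLM path; on any failure, log a warning and fall back to rules."""
    try:
        return json.loads(call_llm(metrics))   # unparseable JSON also raises here
    except Exception as exc:
        logging.warning("LLM diagnosis failed (%s); using deterministic fallback", exc)
        return {"primary_diagnosis": "INSUFFICIENT_DATA",
                "next_action": "CONTINUE_MONITORING",
                "confidence": 0.3}
```

The pipeline keeps moving either way: an LLM timeout downgrades the diagnosis rather than halting the run.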
One constraint worth noting: the deterministic path only returns TRIGGER_CONTENT when is_real_performance_data is true. You cannot get a TRIGGER_CONTENT next action from simulated metrics, no matter how good the numbers look in the fixture file.
The New Routing: GATE-7
The `next_action` field drives graph routing. Before this chapter, `route_performance` had three outcomes: `continue`, `trigger_content`, `critical`. Now it has four:
- `CREATE_NEW_CREATIVE` → `rework` → GATE-7
- `ADJUST_STRATEGY` → `rework` → GATE-7
- `TRIGGER_CONTENT` → `content_draft` (unchanged)
- `CONTINUE_MONITORING` / other non-critical → `monitoring_pause` (unchanged)
- `PAUSE_CAMPAIGN` / `ESCALATE_TO_USER` → `error_halt` (unchanged)
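As a sketch, the routing function reduces to a mapping over `next_action`, falling back to the legacy recommendation when no diagnosis is present (the edge names follow this chapter; the state shape is illustrative):

```python
def route_performance(state: dict) -> str:
    """Pick the next graph edge from the diagnosis, or the legacy recommendation."""
    diagnosis = state.get("diagnosis")
    action = diagnosis["next_action"] if diagnosis else state["recommendation"]
    if action in ("CREATE_NEW_CREATIVE", "ADJUST_STRATEGY"):
        return "rework"            # proceeds to GATE-7
    if action == "TRIGGER_CONTENT":
        return "content_draft"
    if action in ("PAUSE_CAMPAIGN", "ESCALATE_TO_USER", "HALT"):
        return "error_halt"
    return "monitoring_pause"      # CONTINUE_MONITORING and everything else
```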
GATE-7 is the Rework Review gate.
Before GATE-7 triggers, the node sets pending_rework in state: either "creative" (new ad variants needed) or "strategist" (audience and bidding adjustment needed). This value survives the gate interrupt and is read by route_gate_7_rework after the operator decides.
The gate presents the operator with the full diagnosis: health score, primary category, confidence level, supporting evidence, and the proposed rework direction. The operator can read this and make an informed decision — not approve a black-box recommendation.
If approved: route_gate_7_rework reads pending_rework and routes to either the Creative agent or the Strategist agent. The rework loop runs, new variants are produced, and the campaign relaunches with fresh creative or adjusted targeting.
If rejected: The pipeline returns to monitoring_pause. No rework happens. The campaign continues as-is. This is not an error state. Sometimes the operator knows the diagnosis is correct but the timing is wrong. Maybe more budget is about to be added. Maybe the campaign is too new to warrant creative refresh. Rejecting GATE-7 keeps the pipeline alive without forcing a rework the operator does not want.
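The post-gate routing described above is a two-way branch on the operator's decision plus the surviving `pending_rework` value. A sketch — the `gate_7_approved` field name and state shape are my assumptions; the node names follow the chapter:

```python
def route_gate_7_rework(state: dict) -> str:
    """After the operator decides at GATE-7, pick the next node."""
    if not state.get("gate_7_approved"):
        return "monitoring_pause"   # rejection keeps the pipeline alive, no rework
    # pending_rework was set before the gate interrupt and survives it.
    if state.get("pending_rework") == "creative":
        return "creative"           # new ad variants needed
    return "strategist"             # audience and bidding adjustment needed
```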
What This Means in Practice
The before-and-after is concrete.
Before: Agent 06 returns CONTINUE. Metrics: ROAS 0.6, CTR 0.4%, 15,000 impressions, frequency 7.2. The operator sees a recommendation to continue monitoring and a next check interval of 60 minutes. No explanation.
After: Agent 06 returns CONTINUE_MONITORING with a PerformanceDiagnosis of AUDIENCE_SATURATION, health score 3/10, confidence 82%, key evidence ["Frequency 7.2 exceeds 6.0 saturation threshold", "CPM rising from $4.20 to $6.80 over 7 days", "Reach declining despite stable daily budget"], and a recommendation to expand lookalike seed audiences and create a retargeting exclusion list. next_action: ADJUST_STRATEGY. Pipeline routes to GATE-7.
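Rendered as the payload the schema would carry, that "after" diagnosis looks something like this (field values taken from the example above; the summary string is illustrative):

```python
after_diagnosis = {
    "campaign_health_score": 3,
    "primary_diagnosis": "AUDIENCE_SATURATION",
    "secondary_diagnoses": [],
    "key_evidence": [
        "Frequency 7.2 exceeds 6.0 saturation threshold",
        "CPM rising from $4.20 to $6.80 over 7 days",
        "Reach declining despite stable daily budget",
    ],
    "confidence": 0.82,
    "summary": "Retargeting pool exhausted; the same users keep seeing the ads.",
    "recommendations": [
        "Expand lookalike seed audiences",
        "Create a retargeting exclusion list",
    ],
    "next_action": "ADJUST_STRATEGY",  # routes the pipeline to GATE-7
}
```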
The operator now understands why performance is weak. The system has proposed a specific fix. The operator decides whether to proceed. This is what the agent should have been doing from the start.
The Honest State After Chapter 8
What changed:
- `PerformanceDiagnosis` schema added to `src/schemas.py`
- `PerformanceAnalystOutput` gains `diagnosis: Optional[PerformanceDiagnosis]`
- `pending_rework: NotRequired[str]` added to `CampaignState`
- `_diagnose()`, `_diagnose_with_llm()`, `_diagnose_deterministic()` added to `performance_analyst.py`
- `route_gate_7_rework` added to `gates.py`
- `_build_gate_7_summary()` added — the gate box shows score, diagnosis, rework direction, top recommendations
- GATE-7 node wired into `graph.py` with approve → rework and reject → monitoring_pause paths
- `ORCHESTRATOR_CONTRACT` updated: 8 gates, new `performance_routes` entry
- GATE-7 card added to the Streamlit UI with diagnosis panel, evidence expander, recommendations expander
- `agents/06-performance-analyst.md` rewritten with diagnostic framework documentation
- 263 tests passing (25 new tests covering diagnosis paths, routing, LLM mock, deterministic fallback, `pending_rework` propagation)
What did not change:
- No existing gate behavior was modified
- The three-option recommendation (`CONTINUE`, `HALT`, `TRIGGER_CONTENT`) is still present for backward compatibility — `route_performance` checks `diagnosis.next_action` first and falls back to `recommendation.action` if no diagnosis is present
- No live Meta Ads calls
- No live social publishing
What Comes Next
The diagnosis is only as good as the data going into it. Right now the data is still manually ingested from a fixture file. The next unlock is real API metrics — actual impression, click, and conversion data flowing in from Meta Ads.
When that happens, the LLM diagnosis path will run against real numbers for the first time. The case study benchmarks in the system prompt will be compared against actual campaign behavior. The rework loop will produce fresh creative in response to real creative fatigue, not hypothetical creative fatigue.
That is the moment the diagnostic brain becomes a working feedback loop, not just a better reporting layer.
The system is ready for it. The data connection is what remains.