Building an Agent from Scratch
Key Concepts
Knowledge Base: The agent's memory. A collection of stored documents (markdown files, scraped pages, uploaded text) that the agent can read from, search, and write to. By storing a baseline and comparing it against live data each run, the knowledge base is what makes an agent longitudinal - it tracks drift over time rather than looking at a snapshot in isolation.
Structured Outputs: Defined fields (booleans, strings) that replace freeform LLM text with consistent, parsable data. Instead of getting a blob of text you have to interpret, you get specific fields like changes_detected: true or salesforce_changed: false. Structured outputs are what make agents programmable - booleans can drive conditionals, strings can pass cleanly downstream.
Advancing Baseline: At the end of every run, the agent uploads current scrapes back to the knowledge base, overwriting the previous baseline. The next run compares against this run's data, not the original. Every run advances what "normal" looks like, so the agent tracks change over time automatically.
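The three concepts above compose into a single loop: read the stored baseline, scrape the live page, compare, then write the scrape back as the new baseline. A minimal Python sketch, assuming hypothetical helpers in place of the platform's nodes (the scrape is stubbed, and the comparison is a naive string check rather than the LLM comparison the agent actually uses):

```python
# Minimal sketch of one monitoring run. knowledge_base, scrape_page, and
# run_once are hypothetical stand-ins for the platform's nodes.

knowledge_base = {"SF Baseline": "Starter: $25/user/mo"}  # stored markdown

def scrape_page(url: str) -> str:
    # Stubbed: a real node would fetch the page and convert it to markdown.
    return "Starter: $30/user/mo"

def run_once(url: str, baseline_key: str) -> bool:
    baseline = knowledge_base[baseline_key]  # read stored baseline
    current = scrape_page(url)               # scrape live page
    changed = current != baseline            # naive diff; the agent uses an LLM
    knowledge_base[baseline_key] = current   # advance the baseline
    return changed

print(run_once("https://example.com/pricing", "SF Baseline"))  # True
```

Because the baseline advances at the end of the run, calling `run_once` again immediately reports no change: "normal" has moved forward.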
Quick Reference
Knowledge Base Setup
- Navigate to the Knowledge Bases tab
- Click Add Knowledge Base and name it
- Inside the knowledge base, click Add Files
- Choose your input method: Scrape Pages (enter a URL), Upload Files, Crawl Website, or Paste Text
- For pricing monitoring or similar use cases, scrape each target URL - the content is stored as a markdown document
Three knowledge base operations available inside agents:
- Read a specific file - Pull a known document by name (use for small, defined sets)
- Search the knowledge base - Dynamically find documents across large collections (use when you have hundreds of files)
- Upload/update content - Write new data back to the knowledge base (use to advance baselines)
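The three operations can be pictured as plain Python. This class is illustrative only (the platform exposes these as nodes, not an API); the in-memory dict stands in for stored markdown files:

```python
# Sketch of the three knowledge-base operations. The class and its
# method names are assumptions for illustration, not the platform's API.

class KnowledgeBase:
    def __init__(self):
        self.files: dict[str, str] = {}  # filename -> markdown content

    def read(self, name: str) -> str:
        """Read a specific file: pull a known document by name."""
        return self.files[name]

    def search(self, query: str) -> list[str]:
        """Search: find documents whose content mentions the query."""
        return [n for n, body in self.files.items()
                if query.lower() in body.lower()]

    def upload(self, name: str, content: str) -> None:
        """Upload/update: write data back, overwriting any previous file."""
        self.files[name] = content

kb = KnowledgeBase()
kb.upload("SF Baseline", "Starter tier: $25/user/month")
print(kb.read("SF Baseline"))
print(kb.search("starter"))  # ['SF Baseline']
```

The split matters at scale: `read` suits a small, fixed set of files (like four pricing pages), while `search` is for collections too large to name each file explicitly.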
Building the Agent: Node by Node
The competitive pricing monitor uses 17 nodes across six logical stages:
Stage 1 - Pull baselines (4 Knowledge Base Read nodes)
- One node per competitor, each reading its stored pricing page from the knowledge base
- Label every output clearly (e.g., SF Baseline, Zoho Baseline) - you'll reference these as variables later, and generic labels like Get Knowledge Base File 1 become unmanageable
Stage 2 - Scrape live pages (4 Web Page Scrape nodes)
- One node per competitor pricing URL
- Output as markdown to keep format consistent with stored baselines
- Label outputs to mirror the baseline naming (e.g., SF Current, Zoho Current)
Stage 3 - Detect changes (1 Prompt LLM node)
- Feed in all 8 variables (4 baselines + 4 current scrapes)
- Prompt instructs the LLM to compare each pair and identify structural or substantial changes
- Structured outputs: one boolean per competitor (hubspot_changed, sf_changed, etc.), one global changes_detected boolean, and one change_summary string
- Use a predictable/low-variety setting for data tasks
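The shape of the detection output determines what the rest of the agent can do with it. A sketch of that shape, assuming the structured output arrives as a flat dict (the dict layout and the helper are assumptions; the field names follow this guide's convention):

```python
# Sketch of the detection node's structured output. finalize_detection is
# a hypothetical helper showing how the global flag relates to the
# per-competitor booleans.

def finalize_detection(per_competitor: dict[str, bool], summary: str) -> dict:
    # changes_detected is true if any single competitor changed, so one
    # conditional node can branch on it downstream.
    return {
        **per_competitor,
        "changes_detected": any(per_competitor.values()),
        "change_summary": summary,
    }

result = finalize_detection(
    {"hubspot_changed": False, "sf_changed": True,
     "zoho_changed": False, "pipedrive_changed": False},
    "Salesforce restructured its pricing tiers.",
)
print(result["changes_detected"])  # True
```

This is the "programmable" property from Key Concepts in miniature: the booleans drive conditionals, and the string passes cleanly to the writer nodes.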
Stage 4 - Branch (1 Conditional node)
- Condition: changes_detected == true
- True branch: Continue to diagnosis, Slack message, and knowledge base update
- False branch: Skip analysis, still update the knowledge base so the baseline stays fresh
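The branch logic reduces to a few lines. A sketch with stubbed stage names (the `log` list stands in for the downstream nodes firing):

```python
# Sketch of the conditional stage. Appends to `log` stand in for the
# downstream nodes actually running.

def run_branch(detection: dict, log: list) -> None:
    if detection["changes_detected"]:  # true branch only
        log.append("diagnose")
        log.append("slack")
    # Both branches: advance the baseline so the next run compares
    # against this run's scrapes, not stale data.
    log.append("update_baselines")

log = []
run_branch({"changes_detected": False, "change_summary": ""}, log)
print(log)  # ['update_baselines']
```

The asymmetry is the point: analysis and notification are conditional, but the baseline update is not.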
Stage 5 - Diagnose and deliver (True branch: 2 Prompt LLM nodes + 1 Slack node)
- Change Writer node: Receives all baselines, all current scrapes, plus the full structured output from detection. Prompt asks for specific numbers, tier names, and competitive gap analysis. Key instruction: diagnose only, do not make recommendations. Output label: competitive_diagnosis
- Slack Writer node: Takes the competitive diagnosis and formats it as a single readable Slack message. A more creative/varied setting is fine here since humans are reading it. Consider using a strong writing model (e.g., Opus).
- Slack Integration node: Select workspace, channel, and pass the Slack message variable as the message content
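To make the handoff concrete: the Slack Writer's job is to collapse two upstream variables into one message string, which the integration node then posts. In the agent this step is an LLM node; the sketch below just shows the single-message shape, with illustrative formatting:

```python
# Sketch of the Slack Writer handoff. The real agent uses an LLM node;
# this function only illustrates the inputs and the single-string output.

def format_slack_message(change_summary: str, competitive_diagnosis: str) -> str:
    return (
        "Competitor pricing change detected\n\n"
        f"Summary: {change_summary}\n\n"
        f"{competitive_diagnosis}"
    )

msg = format_slack_message(
    "Salesforce restructured its pricing tiers.",
    "Details and competitive gap analysis from the Change Writer go here.",
)
print(msg.splitlines()[0])
```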
Stage 6 - Update baselines (4 Upload to Knowledge Base nodes, on both branches)
- Each node takes a current scrape variable and uploads it to the knowledge base, overwriting the previous file
- Keep the format as markdown to match how the baselines were originally stored
- This runs on both the true and false branches - even when nothing changed, the baseline advances
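If the knowledge base were a folder of markdown files, each update node would be a simple overwrite. A sketch under that assumption (file naming is illustrative):

```python
# Sketch of one baseline-update node as a file overwrite. The directory
# layout and file naming are assumptions for illustration.
from pathlib import Path
import tempfile

def update_baseline(kb_dir: Path, competitor: str, current_markdown: str) -> Path:
    path = kb_dir / f"{competitor}-pricing.md"
    path.write_text(current_markdown)  # write_text overwrites the old baseline
    return path

with tempfile.TemporaryDirectory() as d:
    p = update_baseline(Path(d), "sf", "# Pricing\nStarter: $30/user/mo")
    print(p.read_text().splitlines()[0])  # # Pricing
```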
Output Label Naming Convention
Name outputs so they're instantly recognizable when wiring prompts with 8+ variables:
- Baselines: HubSpot Baseline, SF Baseline, Zoho Baseline, PipeDrive Baseline
- Current scrapes: HubSpot Current, SF Current, Zoho Current, PipeDrive Current
- Detection results: changes_detected, hubspot_changed, change_summary
- Analysis: competitive_diagnosis
- Final output: slack_message
LLM Node Settings by Purpose
- Data/detection tasks: Low variety, strong reasoning model (e.g., GPT-5). You want consistent, predictable analysis.
- Writing tasks for human readers: Allow some variety, strong writing model (e.g., Opus). Keeps recurring messages from feeling identical.
- Diagnosis tasks: Low variety, include all context (even unchanged competitors) because competitive shifts only make sense relative to the full landscape.
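The table above can be kept as data if you script against an LLM API. A sketch in which the temperature values are loud assumptions (the guide says only "low variety" and "some variety"); the model names follow the guide's examples:

```python
# Settings-by-purpose as data. Temperatures are illustrative assumptions
# mapping "low variety" to 0.0 and "some variety" to 0.7; model names
# follow the guide's examples.

LLM_SETTINGS = {
    "detection": {"model": "GPT-5", "temperature": 0.0},  # consistent analysis
    "diagnosis": {"model": "GPT-5", "temperature": 0.0},  # full-context, predictable
    "writing":   {"model": "Opus",  "temperature": 0.7},  # varied, human-facing
}

def settings_for(task: str) -> dict:
    return LLM_SETTINGS[task]

print(settings_for("writing")["model"])  # Opus
```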