Building an Agent from Scratch
Key Concepts
Knowledge Base: The agent's memory. A collection of stored documents (markdown files, scraped pages, uploaded text) that the agent can read from, search, and write to. By storing a baseline and comparing it against live data each run, the knowledge base is what makes an agent longitudinal - it tracks drift over time rather than looking at a snapshot in isolation.
Structured Outputs: Defined fields (booleans, strings) that replace freeform LLM text with consistent, parsable data. Instead of getting a blob of text you have to interpret, you get specific fields like changes_detected: true or salesforce_changed: false. Structured outputs are what make agents programmable - booleans can drive conditionals, strings can pass cleanly downstream.
Advancing Baseline: At the end of every run, the agent uploads current scrapes back to the knowledge base, overwriting the previous baseline. The next run compares against this run's data, not the original. Every run advances what "normal" looks like, so the agent tracks change over time automatically.
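The three concepts above compose into a single loop: read the stored baseline, scrape the live page, compare, then write the scrape back as the new baseline. A minimal Python sketch, assuming hypothetical helpers in place of the platform's nodes (the scrape is stubbed, and the comparison is a naive string check rather than the LLM comparison the agent actually uses):

```python
# Minimal sketch of one monitoring run. knowledge_base, scrape_page, and
# run_once are hypothetical stand-ins for the platform's nodes.

knowledge_base = {"SF Baseline": "Starter: $25/user/mo"}  # stored markdown

def scrape_page(url: str) -> str:
    # Stubbed: a real node would fetch the page and convert it to markdown.
    return "Starter: $30/user/mo"

def run_once(url: str, baseline_key: str) -> bool:
    baseline = knowledge_base[baseline_key]  # read stored baseline
    current = scrape_page(url)               # scrape live page
    changed = current != baseline            # naive diff; the agent uses an LLM
    knowledge_base[baseline_key] = current   # advance the baseline
    return changed

print(run_once("https://example.com/pricing", "SF Baseline"))  # True
```

Because the baseline advances at the end of the run, calling `run_once` again immediately reports no change: "normal" has moved forward.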
Quick Reference
Knowledge Base Setup
- Navigate to the Knowledge Bases tab
- Click Add Knowledge Base and name it
- Inside the knowledge base, click Add Files
- Choose your input method: Scrape Pages (enter a URL), Upload Files, Crawl Website, or Paste Text
- For pricing monitoring or similar use cases, scrape each target URL - the content is stored as a markdown document
Three knowledge base operations available inside agents:
- Read a specific file - Pull a known document by name (use for small, defined sets)
- Search the knowledge base - Dynamically find documents across large collections (use when you have hundreds of files)
- Upload/update content - Write new data back to the knowledge base (use to advance baselines)
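The three operations can be pictured as plain Python. This class is illustrative only (the platform exposes these as nodes, not an API); the in-memory dict stands in for stored markdown files:

```python
# Sketch of the three knowledge-base operations. The class and its
# method names are assumptions for illustration, not the platform's API.

class KnowledgeBase:
    def __init__(self):
        self.files: dict[str, str] = {}  # filename -> markdown content

    def read(self, name: str) -> str:
        """Read a specific file: pull a known document by name."""
        return self.files[name]

    def search(self, query: str) -> list[str]:
        """Search: find documents whose content mentions the query."""
        return [n for n, body in self.files.items()
                if query.lower() in body.lower()]

    def upload(self, name: str, content: str) -> None:
        """Upload/update: write data back, overwriting any previous file."""
        self.files[name] = content

kb = KnowledgeBase()
kb.upload("SF Baseline", "Starter tier: $25/user/month")
print(kb.read("SF Baseline"))
print(kb.search("starter"))  # ['SF Baseline']
```

The split matters at scale: `read` suits a small, fixed set of files (like four pricing pages), while `search` is for collections too large to name each file explicitly.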
Building the Agent: Node by Node
The competitive pricing monitor uses 17 nodes across six logical stages:
Stage 1 - Pull baselines (4 Knowledge Base Read nodes)
- One node per competitor, each reading its stored pricing page from the knowledge base
- Label every output clearly (e.g., SF Baseline, Zoho Baseline) - you'll reference these as variables later, and generic labels like Get Knowledge Base File 1 become unmanageable
Stage 2 - Scrape live pages (4 Web Page Scrape nodes)
- One node per competitor pricing URL
- Output as markdown to keep format consistent with stored baselines
- Label outputs to mirror the baseline naming (e.g., SF Current, Zoho Current)
Stage 3 - Detect changes (1 Prompt LLM node)
- Feed in all 8 variables (4 baselines + 4 current scrapes)
- Prompt instructs the LLM to compare each pair and identify structural or substantial changes
- Structured outputs: one boolean per competitor (hubspot_changed, sf_changed, etc.), one global changes_detected boolean, and one change_summary string
- Use a predictable/low-variety setting for data tasks
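The shape of the detection output determines what the rest of the agent can do with it. A sketch of that shape, assuming the structured output arrives as a flat dict (the dict layout and the helper are assumptions; the field names follow this guide's convention):

```python
# Sketch of the detection node's structured output. finalize_detection is
# a hypothetical helper showing how the global flag relates to the
# per-competitor booleans.

def finalize_detection(per_competitor: dict[str, bool], summary: str) -> dict:
    # changes_detected is true if any single competitor changed, so one
    # conditional node can branch on it downstream.
    return {
        **per_competitor,
        "changes_detected": any(per_competitor.values()),
        "change_summary": summary,
    }

result = finalize_detection(
    {"hubspot_changed": False, "sf_changed": True,
     "zoho_changed": False, "pipedrive_changed": False},
    "Salesforce restructured its pricing tiers.",
)
print(result["changes_detected"])  # True
```

This is the "programmable" property from Key Concepts in miniature: the booleans drive conditionals, and the string passes cleanly to the writer nodes.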
Stage 4 - Branch (1 Conditional node)
- Condition: changes_detected == true
- True branch: Continue to diagnosis, Slack message, and knowledge base update
- False branch: Skip analysis, still update the knowledge base so the baseline stays fresh
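The branch logic reduces to a few lines. A sketch with stubbed stage names (the `log` list stands in for the downstream nodes firing):

```python
# Sketch of the conditional stage. Appends to `log` stand in for the
# downstream nodes actually running.

def run_branch(detection: dict, log: list) -> None:
    if detection["changes_detected"]:  # true branch only
        log.append("diagnose")
        log.append("slack")
    # Both branches: advance the baseline so the next run compares
    # against this run's scrapes, not stale data.
    log.append("update_baselines")

log = []
run_branch({"changes_detected": False, "change_summary": ""}, log)
print(log)  # ['update_baselines']
```

The asymmetry is the point: analysis and notification are conditional, but the baseline update is not.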
Stage 5 - Diagnose and deliver (True branch: 2 Prompt LLM nodes + 1 Slack node)
- Change Writer node: Receives all baselines, all current scrapes, plus the full structured output from detection. Prompt asks for specific numbers, tier names, and competitive gap analysis. Key instruction: diagnose only, do not make recommendations. Output label: competitive_diagnosis
- Slack Writer node: Takes the competitive diagnosis and formats it as a single readable Slack message. A more creative/varied setting is fine here since humans are reading it. Consider using a strong writing model (e.g., Opus).
- Slack Integration node: Select workspace, channel, and pass the Slack message variable as the message content
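To make the handoff concrete: the Slack Writer's job is to collapse two upstream variables into one message string, which the integration node then posts. In the agent this step is an LLM node; the sketch below just shows the single-message shape, with illustrative formatting:

```python
# Sketch of the Slack Writer handoff. The real agent uses an LLM node;
# this function only illustrates the inputs and the single-string output.

def format_slack_message(change_summary: str, competitive_diagnosis: str) -> str:
    return (
        "Competitor pricing change detected\n\n"
        f"Summary: {change_summary}\n\n"
        f"{competitive_diagnosis}"
    )

msg = format_slack_message(
    "Salesforce restructured its pricing tiers.",
    "Details and competitive gap analysis from the Change Writer go here.",
)
print(msg.splitlines()[0])
```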
Stage 6 - Update baselines (4 Upload to Knowledge Base nodes, on both branches)
- Each node takes a current scrape variable and uploads it to the knowledge base, overwriting the previous file
- Keep the format as markdown to match how the baselines were originally stored
- This runs on both the true and false branches - even when nothing changed, the baseline advances
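If the knowledge base were a folder of markdown files, each update node would be a simple overwrite. A sketch under that assumption (file naming is illustrative):

```python
# Sketch of one baseline-update node as a file overwrite. The directory
# layout and file naming are assumptions for illustration.
from pathlib import Path
import tempfile

def update_baseline(kb_dir: Path, competitor: str, current_markdown: str) -> Path:
    path = kb_dir / f"{competitor}-pricing.md"
    path.write_text(current_markdown)  # write_text overwrites the old baseline
    return path

with tempfile.TemporaryDirectory() as d:
    p = update_baseline(Path(d), "sf", "# Pricing\nStarter: $30/user/mo")
    print(p.read_text().splitlines()[0])  # # Pricing
```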
Output Label Naming Convention
Name outputs so they're instantly recognizable when wiring prompts with 8+ variables:
- Baselines: HubSpot Baseline, SF Baseline, Zoho Baseline, PipeDrive Baseline
- Current scrapes: HubSpot Current, SF Current, Zoho Current, PipeDrive Current
- Detection results: changes_detected, hubspot_changed, change_summary
- Analysis: competitive_diagnosis
- Final output: slack_message
LLM Node Settings by Purpose
- Data/detection tasks: Low variety, strong reasoning model (e.g., GPT-5). You want consistent, predictable analysis.
- Writing tasks for human readers: Allow some variety, strong writing model (e.g., Opus). Keeps recurring messages from feeling identical.
- Diagnosis tasks: Low variety, include all context (even unchanged competitors) because competitive shifts only make sense relative to the full landscape.
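The table above can be kept as data if you script against an LLM API. A sketch in which the temperature values are loud assumptions (the guide says only "low variety" and "some variety"); the model names follow the guide's examples:

```python
# Settings-by-purpose as data. Temperatures are illustrative assumptions
# mapping "low variety" to 0.0 and "some variety" to 0.7; model names
# follow the guide's examples.

LLM_SETTINGS = {
    "detection": {"model": "GPT-5", "temperature": 0.0},  # consistent analysis
    "diagnosis": {"model": "GPT-5", "temperature": 0.0},  # full-context, predictable
    "writing":   {"model": "Opus",  "temperature": 0.7},  # varied, human-facing
}

def settings_for(task: str) -> dict:
    return LLM_SETTINGS[task]

print(settings_for("writing")["model"])  # Opus
```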