Back to Blog
PromptBuilder Team
November 22, 2025
9 min read
Featured

From Prompt Engineering to Context Engineering: The 2025 Agent Builder's Guide

In late November 2025, Anthropic published new research showing that modern agent quality now depends on curating the entire context stack, not just clever prompt text. If you're still treating "the prompt" as a single text block, you're missing most of what makes agents reliable.

This guide explains the shift from prompt engineering to context engineering and provides a blueprint for assembling context blocks that actually work.


Why Prompt Engineering Alone Is Not Enough

Traditional prompt engineering focuses on:

  • Phrasing instructions clearly
  • Providing examples (few-shot)
  • Structuring output formats

The problem: Modern agents need more than instructions. They need:

  1. System instructions (the "how to behave")
  2. Memory (conversation history, user preferences)
  3. Retrieved knowledge (documents, database rows, API responses)
  4. Tool definitions (available functions and their schemas)
  5. Execution context (current state, constraints, goals)

Optimizing only #1 (instructions) while ignoring #2-5 is like tuning a car engine while running on flat tires.


The Context Stack (2025 Model)

Think of agent input as a layered stack, not a single prompt:

┌─────────────────────────────────┐
│  System Instructions            │  ← Role, behavior, constraints
├─────────────────────────────────┤
│  Long-Term Memory               │  ← User preferences, past decisions
├─────────────────────────────────┤
│  Retrieved Documents            │  ← RAG results, search hits
├─────────────────────────────────┤
│  Tool Definitions               │  ← Available functions + schemas
├─────────────────────────────────┤
│  Conversation History           │  ← Recent turns (short-term memory)
├─────────────────────────────────┤
│  Current Task                   │  ← User's latest request
└─────────────────────────────────┘

Each layer has distinct content and lifecycle. Context engineering is the discipline of curating all layers coherently.


Layer-by-Layer Breakdown

Layer 1: System Instructions

What it is: The "character sheet" for your agent.

What to include:

  • Role and expertise
  • Behavioral rules (tone, verbosity, safety)
  • Output format defaults
  • Tool usage policy

Example:

You are a customer success agent for Acme SaaS.
Tone: Professional, helpful, concise.
Policy: Always check the knowledge base before suggesting workarounds.
Tools: search_kb, create_ticket, escalate_to_human.
If unsure, escalate rather than guess.

Anti-pattern: Dumping all constraints into one 2000-word block. Keep it modular.


Layer 2: Long-Term Memory

What it is: Persistent facts about the user or task.

Examples:

  • User's timezone, language, role
  • Past decisions ("User prefers JSON over CSV")
  • Recurring constraints ("Always exclude PII")

How to store:

  • Lightweight: Structured metadata (JSON)
  • Heavy: Vector DB with similarity search

PromptBuilder tip: Use the "Memory" tab to store reusable user context separately from the task prompt.
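As a rough sketch, a lightweight memory record can be as simple as a structured dict that you persist per user and inject at assembly time (the field names here are illustrative, not a fixed schema):

user_memory = {
    "timezone": "PST",
    "language": "en",
    "role": "billing_admin",
    "preferences": {"report_format": "JSON", "exclude_pii": True},
    "past_decisions": ["Prefers JSON over CSV"],
}
# Stored in a DB, cache, or JSON file; serialized into the context stack on each call.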


Layer 3: Retrieved Documents (RAG)

What it is: Dynamically fetched knowledge based on the current query.

Best practices:

  • Retrieve first, then prompt: Don't stuff irrelevant docs into context.
  • Chunk intelligently: 500-1000 token chunks with overlap.
  • Cite sources: Instruct the agent to reference doc IDs.

Example query flow:

  1. User: "What's our refund policy for enterprise customers?"
  2. RAG retrieves: [doc_5432: Enterprise SLA, doc_8821: Refund Terms]
  3. Context includes: docs + instruction to cite.

Anti-pattern: Retrieving 10 documents and hoping the model finds the answer. Pre-filter and rank.
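A minimal sketch of that pre-filter-and-rank flow, assuming hypothetical retrieve() and rerank() helpers from your vector store and reranker of choice (the 0.5 score cutoff is an arbitrary placeholder):

def build_rag_context(query: str, top_k: int = 3) -> str:
    candidates = retrieve(query, max_results=10)               # broad initial recall
    ranked = rerank(query, candidates)                         # precision pass
    top_docs = [d for d in ranked[:top_k] if d.score >= 0.5]   # drop low-relevance hits

    doc_block = "\n".join(f"[{d.id}]: {d.text}" for d in top_docs)
    return doc_block + "\nInstruction: cite the doc IDs above in your answer."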


Layer 4: Tool Definitions

What it is: JSON schemas for functions the agent can call.

Key design choices:

  • Tool names: Use verbs (search_kb, not kb)
  • Descriptions: Be specific about when to use each tool
  • Parameters: Provide examples and constraints in schema

Example:

{
  "name": "search_kb",
  "description": "Search the knowledge base for support articles. Use when user asks a how-to or policy question.",
  "parameters": {
    "query": "string (natural language query)",
    "max_results": "integer (default: 3)"
  }
}

Anti-pattern: Defining 15+ tools with overlapping purposes. The agent will thrash.


Layer 5: Conversation History

What it is: Recent turns in the conversation.

How much to include:

  • Short tasks: Last 5-10 turns
  • Long sessions: Summarize older turns; keep last 3-5 verbatim

When to prune: If token count > 50% of context window, summarize or truncate.
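One way to apply that rule, sketched in Python; count_tokens() and summarize() are stand-ins for whatever tokenizer and summarization call your stack provides:

def prune_history(turns: list[str], context_window: int, keep_verbatim: int = 5) -> list[str]:
    if sum(count_tokens(t) for t in turns) <= context_window // 2:
        return turns  # under 50% of the window: keep everything verbatim
    older, recent = turns[:-keep_verbatim], turns[-keep_verbatim:]
    return ["Summary of earlier turns: " + summarize(older)] + recent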


Layer 6: Current Task

What it is: The user's immediate request.

Tips:

  • Place at the end of the context (most models prioritize recency).
  • Restate any relevant constraints from earlier layers.

Example:

User: "Send me a report on Q3 churn."
[Context includes: user's timezone, report format preference, tool to generate reports]

The Context Engineering Workflow

Step 1: Audit Your Current Stack

Map your agent's input to the 6 layers. Ask:

  • Which layers are implicit (hardcoded, scattered)?
  • Which are missing (no memory, no retrieved docs)?
  • Which are bloated (2000-line system prompt)?

Step 2: Modularize

Separate layers into distinct components:

  • system_prompt.txt
  • user_memory.json
  • retrieved_docs/ (output of RAG)
  • tools.json
  • conversation_history (rolling buffer)

Step 3: Assemble Dynamically

Before each agent call:

  1. Load system prompt (static)
  2. Fetch user memory (DB or cache)
  3. Retrieve relevant docs (RAG query)
  4. Inject tool defs (static or filtered by task)
  5. Append conversation history
  6. Add current task
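A sketch of those six assembly steps, assuming the modular files from Step 2 and hypothetical fetch_memory() / retrieve_docs() helpers for your own memory store and RAG pipeline:

import json
from pathlib import Path

def assemble_context(user_id: str, task: str, history: list[str]) -> str:
    system = Path("system_prompt.txt").read_text()        # 1. static system prompt
    memory = fetch_memory(user_id)                         # 2. user memory (DB or cache)
    docs = retrieve_docs(task)                             # 3. RAG query on the current task
    tools = Path("tools.json").read_text()                 # 4. tool defs (filter by task if needed)
    return "\n\n".join([
        "SYSTEM: " + system,
        "MEMORY: " + json.dumps(memory),
        "RETRIEVED_DOCS: " + docs,
        "TOOLS: " + tools,
        "HISTORY: " + "\n".join(history),                  # 5. recent turns
        "TASK: " + task,                                   # 6. current task goes last
    ])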

Step 4: Measure and Iterate

Track:

  • Context utilization: % of context actually referenced in output
  • Retrieval precision: % of retrieved docs cited
  • Tool usage accuracy: % of tool calls that succeed

Prune layers that don't improve outcomes.
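For example, retrieval precision and tool usage accuracy can be computed from your agent logs with something like this (the log fields are illustrative, not a standard schema):

def retrieval_precision(retrieved_ids: list[str], cited_ids: list[str]) -> float:
    # Share of retrieved docs the agent actually cited in its answer.
    return len(set(retrieved_ids) & set(cited_ids)) / max(len(retrieved_ids), 1)

def tool_accuracy(tool_calls: list[dict]) -> float:
    # Share of tool calls that returned without error ("succeeded" is an assumed log field).
    return sum(1 for c in tool_calls if c["succeeded"]) / max(len(tool_calls), 1)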


Anti-Patterns to Avoid

1. Bloated System Prompts

Bad:

You are a helpful assistant. You should be polite and professional.
You have access to tools. Use them when appropriate.
If the user asks for data, check the database first.
Always cite sources. Format output as JSON when possible.
Remember the user's preferences. Don't repeat yourself.
[...2000 more words...]

Good:

ROLE: Data analyst assistant
TOOLS: query_db, fetch_chart
POLICY: Cite sources. Default format: JSON.
See user_memory.json for preferences.

2. Ignoring Retrieval Quality

Dumping 10 documents into context without ranking or filtering. The model wastes tokens on irrelevant content.

Fix: Use a reranker after initial retrieval. Include only top 3 docs.

3. No Memory Layer

Expecting the agent to remember user preferences from conversation history alone.

Fix: Extract preferences explicitly and store in a structured memory layer.

4. Static Tool Definitions

Exposing all 20 tools on every call, even when only 2 are relevant.

Fix: Filter tools by task category or user role.
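A minimal sketch of that filter; the category tags are an assumption about how you label your own tool definitions:

ALL_TOOLS = {
    "search_kb":     {"categories": {"support", "billing"}},
    "create_ticket": {"categories": {"support"}},
    "query_db":      {"categories": {"analytics"}},
}

def tools_for(task_category: str) -> list[str]:
    # Expose only the tools tagged for the current task category.
    return [name for name, tool in ALL_TOOLS.items() if task_category in tool["categories"]]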


PromptBuilder-Friendly Workflow

Here's how to implement context engineering in PromptBuilder:

  1. System Instructions: Use the "System Prompt" field (keep it under 500 words).
  2. User Memory: Store in the "Variables" section as JSON (e.g., {{user_prefs}}).
  3. Retrieved Docs: Paste RAG results into "Context" or use a URL fetch.
  4. Tools: Define in "Tools" tab (upcoming feature - Dec 2025).
  5. Conversation History: Auto-injected if using multi-turn mode.
  6. Current Task: User's input in the main prompt field.

Template:

System: {{system_prompt}}
Memory: {{user_memory}}
Documents: {{retrieved_docs}}
Tools: {{tool_definitions}}
History: {{conversation}}
Task: {{user_input}}

Case Study: Before vs. After Context Engineering

Before (Prompt-Only Approach)

You are a support agent. Answer the user's question using the knowledge base.
Be helpful and cite sources.

User: What's the refund policy for annual plans?

Problems:

  • No retrieved docs (agent guesses or hallucinates)
  • No memory of user's plan type
  • No tool to check actual policy

Result: 40% accuracy, 30% hallucination rate.


After (Context Engineering)

SYSTEM: Support agent. Cite doc IDs. Escalate if unsure.
MEMORY: {user_plan: "annual_enterprise", timezone: "PST"}
RETRIEVED_DOCS:
  [doc_5432]: Enterprise refund policy - 30-day full refund.
  [doc_8821]: Annual plan terms - pro-rated after 30 days.
TOOLS: search_kb, escalate_to_human
HISTORY: [User previously asked about billing cycle]
TASK: What's the refund policy for annual plans?

Result:

  • Agent cites [doc_5432] and [doc_8821].
  • Personalizes answer based on user's plan type.
  • 85% accuracy, 5% hallucination rate.

Advanced: Dynamic Context Pruning

When context exceeds token limits, prune intelligently:

  1. Summarize old conversation turns: Keep the last 3 verbatim; summarize turns 4-10.
  2. Drop low-relevance docs: If retrieval score < threshold, exclude.
  3. Compress tool definitions: Remove examples if token-starved.
  4. Priority order: Current task > Tools > Retrieved docs > Memory > History.

Code snippet (pseudo):

context_budget = 100_000  # tokens
layers = [system, memory, docs, tools, history, task]  # assembly order
accumulated = 0

for layer in layers:
    if accumulated + layer.tokens > context_budget:
        layer.compress()         # summarize or truncate until the layer fits the remaining budget
    accumulated += layer.tokens  # count tokens after any compression

FAQ

Is this overkill for simple chatbots? If your bot only answers FAQs with no personalization or tools, prompt engineering alone is fine. Context engineering pays off for agents that retrieve, remember, and act.

How do I measure if context engineering is working? Track: task success rate, retrieval precision, tool usage accuracy, and user satisfaction. Compare before/after.

What tools support context engineering? LangChain, LlamaIndex, and PromptBuilder (with upcoming modular context features). Roll your own with Python + Redis/Pinecone for memory/RAG.

Does this apply to all models? Yes, but long-context models (Claude 3, Gemini 3, GPT-5.1) benefit most. Smaller models (e.g., Llama 3 8B) need aggressive pruning.


Key Takeaways

  • Prompt engineering is necessary but not sufficient. Modern agents need a full context stack.
  • Six layers: System instructions, long-term memory, retrieved docs, tool definitions, conversation history, current task.
  • Curate, don't dump: Each layer should be intentional and pruned.
  • Measure and iterate: Track context utilization and retrieval precision.
  • Use PromptBuilder's modular features (System Prompt, Variables, Context) to organize layers cleanly.

Next Steps

  1. Audit one of your existing agents using the 6-layer model.
  2. Separate system instructions from memory and retrieved content.
  3. Implement a lightweight RAG pipeline (Pinecone + OpenAI embeddings).
  4. Track success rate before/after modularization.

Further reading: Prompt Caching & Token Economics to optimize cost across layers.


Summary

The 2025 shift from prompt engineering to context engineering reflects the reality that agents are more than their instructions. By structuring input as a curated stack - system prompt, memory, retrieval, tools, history, and task - you gain control over quality, cost, and reliability. Start small, measure rigorously, and prune ruthlessly.