Build an AI Agent from Scratch

Build a research agent that searches the web with Spider, evaluates results, and forms answers with OpenAI.

7 min read William Espegren

Build a research agent that combines OpenAI with Spider’s web search. The agent forms search queries, evaluates whether the results answer the question, refines its approach, and delivers a final answer.

Setup

First, let’s set up our environment and install the necessary dependencies.

Install Required Packages

Install the required packages using pip:

pip install python-dotenv openai spider-client colorama
  • python-dotenv: Manages environment variables
  • openai: Interfaces with OpenAI’s powerful language models
  • spider-client: Scraping, crawling, and web searching (all of Spider’s capabilities)
  • colorama: Adds color to our console output for better readability

Environment Variables

Create a .env file in your project root and add your API keys:

OPENAI_API_KEY=<your_openai_api_key_here>
SPIDER_API_KEY=<your_spider_api_key_here>

Building the AI Research Agent

Let’s break down the process of building our AI agent into steps.

Step 1: Import Dependencies and Set Up

import os
from dotenv import load_dotenv
import openai
from spider import Spider
from typing import List, Dict, Any
from colorama import init, Fore


init(autoreset=True)
load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
SPIDER_API_KEY = os.getenv("SPIDER_API_KEY")

Import libraries and load environment variables. colorama adds color to console output for readability.
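
If a key is missing from the .env file, os.getenv returns None and the failure only shows up later, when the first API call is made. A minimal sketch of a fail-fast check follows; require_env is a hypothetical helper name, not part of python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    `require_env` is a hypothetical helper for this guide, not a library function.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Missing environment variable: {name}. Check your .env file."
        )
    return value
```

After load_dotenv() you could then write OPENAI_API_KEY = require_env("OPENAI_API_KEY") so a missing key stops the script at startup.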

Step 2: Create the AIResearchAgent Class

The AIResearchAgent class encapsulates all the functionality:

class AIResearchAgent:
    def __init__(self, openai_api_key: str, spider_api_key: str):
        self.openai_client = openai.OpenAI(api_key=openai_api_key)
        self.spider_client = Spider(spider_api_key)

This sets up connections to the OpenAI and Spider APIs.

Step 3: Implement Web Search Functionality

The agent searches the web using Spider’s API to fetch relevant, up-to-date information.

def search(self, query: str, limit: int = 5) -> Dict[str, Any]:
    """Perform a web search using Spider."""
    params = {"limit": limit, "fetch_page_content": False}
    print(f"{Fore.GREEN}Searching for: {query}")
    results = self.spider_client.search(query, params)
    return results
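
The research loop later in this guide reads search_results['content'] and each item's 'description'. A minimal sketch of reading that shape defensively is below. It assumes the response is a dictionary with a content list whose items may or may not carry a description; extract_descriptions is a hypothetical helper and the sample data is hand-written, not real Spider output.

```python
from typing import Any, Dict, List


def extract_descriptions(results: Dict[str, Any]) -> List[str]:
    """Collect non-empty 'description' strings from a search response.

    Assumes the shape used later in this guide:
    {"content": [{"description": "..."}, ...]}.
    """
    items = results.get("content") or []
    descriptions = []
    for item in items:
        text = (item.get("description") or "").strip()
        if text:
            descriptions.append(text)
    return descriptions


# Hand-written sample data for illustration, not real Spider output.
sample = {
    "content": [
        {"url": "https://example.com/a", "description": "First result."},
        {"url": "https://example.com/b"},  # no description field
        {"url": "https://example.com/c", "description": "  "},  # blank description
    ]
}
print(extract_descriptions(sample))  # ['First result.']
```

Skipping missing or blank descriptions avoids a KeyError and keeps empty lines out of the text that is later passed to the model.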

Step 4: Implement OpenAI Request Helper

def openai_request(self, system_content: str, user_content: str) -> str:
    """Helper method to make OpenAI API requests."""
    response = self.openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_content},
            {"role": "user", "content": user_content}
        ]
    )
    return response.choices[0].message.content

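A helper that wraps OpenAI API calls: every other method passes it a system prompt and a user prompt and gets back the reply text. The complete code at the end of this guide names it _openai_request, while the step-by-step snippets call it openai_request.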
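
To try the agent's methods without spending API credits, one option is a stand-in client that mimics only the call path the helper uses. This is a sketch under that assumption: FakeOpenAIClient is a hypothetical class, and it reproduces just client.chat.completions.create(...) and response.choices[0].message.content, nothing else from the real library.

```python
from types import SimpleNamespace


class FakeOpenAIClient:
    """Hypothetical stand-in that mimics only the call path used by the helper."""

    def __init__(self, reply: str):
        self._reply = reply
        # Expose client.chat.completions.create(...) like the real client does.
        self.chat = SimpleNamespace(
            completions=SimpleNamespace(create=self._create)
        )

    def _create(self, model: str, messages: list):
        # Return an object shaped like response.choices[0].message.content.
        message = SimpleNamespace(content=self._reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


client = FakeOpenAIClient("Boston weather")
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the weather in Boston?"}],
)
print(response.choices[0].message.content)  # Boston weather
```

Assigning an instance to agent.openai_client would let the helper run offline with a canned reply.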

Step 5: Implement Text Summarization (optional)

This method isn’t used in the main loop below. To use it, call it in the research method at the point marked OPTIONAL and pass its output to evaluate in place of the raw search text.

def summarize(self, text: str) -> str:
    """Summarize the given text using OpenAI."""
    print(f"{Fore.BLUE}Summarizing...")
    return self.openai_request(
        "You are a helpful assistant that summarizes text.",
        f"Summarize this text in 2-3 sentences: {text}"
    )

Step 6: Implement Answer Evaluation

def evaluate(self, question: str, summary: str) -> str:
    """Evaluate if the summary answers the question."""
    print(f"{Fore.MAGENTA}Evaluating...")
    evaluation = self.openai_request(
        "You are an AI research assistant. Your task is to evaluate if the given summary answers the user's question.",
        f"Question: {question}\n\nSummary:\n{summary}\n\nDoes this summary answer the question? If it does, write exactly: 'does answer the question'. If not, explain why."
    )
    print(f"{Fore.MAGENTA}Evaluation: {evaluation}")
    return evaluation

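The agent evaluates whether the summary answers the original question. If not, it continues searching. The exact phrase 'does answer the question' matters because the research loop looks for it in the evaluation text. This self-evaluation loop is what makes it a level 3 agent.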
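
The check on the evaluation text is a plain substring match, so it helps to see which replies pass and which do not. The sketch below isolates that check; is_answered and ANSWER_MARKER are hypothetical names introduced for illustration.

```python
ANSWER_MARKER = "does answer the question"


def is_answered(evaluation: str) -> bool:
    """Return True when the evaluation contains the agreed marker phrase."""
    return ANSWER_MARKER in evaluation.lower()


print(is_answered("Does answer the question."))                # True
print(is_answered("'does answer the question'"))               # True
print(is_answered("It does not answer the question because"))  # False
print(is_answered("Yes, this answers the question."))          # False
```

The last case shows the limit of this approach: a free-form "yes" that skips the marker phrase is treated as not answered, which is why the prompt asks the model to write the phrase exactly.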

Step 7: Implement Search Query Formation

The user’s query might not be an effective search query:

  • User query: What is the weather in Boston?
  • Search query: Boston weather

def form_search_query(self, user_query: str) -> str:
    """Form a search query from the user's input."""
    search_query = self.openai_request(
        "You are an AI research assistant. Your task is to form an effective search query based on the user's question.",
        f"User's question: {user_query}\n\nPlease provide a concise and effective search query to find relevant information."
    )
    return search_query
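
The prompt above does not constrain the output format, so the reply may arrive wrapped in quotes or padded with blank lines. That is an assumption about model behavior, not something the guide states. A minimal sketch of a clean-up step, with clean_search_query as a hypothetical helper:

```python
def clean_search_query(raw: str) -> str:
    """Normalize a model-written search query before sending it to search.

    Keeps the first non-empty line and strips surrounding quotes or backticks.
    """
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    if not lines:
        return ""
    return lines[0].strip("\"'`").strip()


print(clean_search_query('"Boston weather"'))            # Boston weather
print(clean_search_query("\n  Boston weather today\n"))  # Boston weather today
```

You could apply it to the helper's reply before returning search_query.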

Step 8: Implement Final Answer Formation

Once the agent has gathered and evaluated enough information, it forms a comprehensive answer:

def form_final_answer(self, user_query: str, summary: str) -> str:
    """Form a final answer based on the user's query and the summary."""
    final_answer = self.openai_request(
        "You are an AI research assistant. Your task is to form a comprehensive answer to the user's question based on the provided summary.",
        f"User's question: {user_query}\n\nSummary of research:\n{summary}\n\nPlease provide a comprehensive answer to the user's question based on this information."
    )
    print(f"{Fore.GREEN}Formed final answer.")
    return final_answer

Step 9: Implement Question Refinement

def refine_question(self, original_question: str, evaluation: str) -> str:
    """Refine the search question based on the evaluation."""
    print(f"{Fore.CYAN}Refining...")
    return self.openai_request(
        "You are an AI research assistant. Your task is to refine a search query based on the original question and the evaluation of previous search results.",
        f"Original question: {original_question}\n\nEvaluation of previous results: {evaluation}\n\nPlease provide a refined search query to find more relevant information."
    )

Refining questions based on previous results makes the agent iteratively converge on better answers.
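
Refinement only helps if each round searches for something new. One way to guard against the model returning the same query twice is to keep a small history. This is a sketch, not part of the agent as written; QueryHistory and normalize are hypothetical names.

```python
def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so near-identical queries compare equal."""
    return " ".join(query.lower().split())


class QueryHistory:
    """Hypothetical helper that remembers which queries were already searched."""

    def __init__(self):
        self._seen = set()

    def is_new(self, query: str) -> bool:
        """Record the query and report whether it was seen for the first time."""
        key = normalize(query)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


history = QueryHistory()
print(history.is_new("Boston weather"))        # True
print(history.is_new("boston   weather"))      # False
print(history.is_new("Boston weather today"))  # True
```

When is_new returns False, the loop could ask for another refinement or stop early instead of repeating the search.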

Step 10: Implement the Main Research Loop

The main research loop ties everything together:

def research(self, user_query: str, max_iterations: int = 5) -> str:
    """Perform research on the given question."""
    print(f"{Fore.BLUE}Starting research for: {user_query}")

    # Keep the original question for evaluation and the final answer;
    # only the text used to build the search query gets refined.
    search_input = user_query
    combined_summary = ""

    for iteration in range(max_iterations):
        print(f"{Fore.YELLOW}Iteration {iteration + 1}/{max_iterations}")

        search_query = self.form_search_query(search_input)
        search_results = self.search(search_query)
        # OPTIONAL: call the summarize method here to summarize the search results
        combined_summary = "\n".join([result['description'] for result in search_results['content']])
        evaluation = self.evaluate(user_query, combined_summary)

        if "does answer the question" in evaluation.lower():
            final_answer = self.form_final_answer(user_query, combined_summary)
            return f"{Fore.GREEN}Final Answer:\n{final_answer}\n\nBased on:\n{combined_summary}"

        search_input = self.refine_question(user_query, evaluation)

    return f"{Fore.RED}Couldn't find a satisfactory answer after {max_iterations} iterations. Last summary:\n{combined_summary}"

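Each iteration:

  • Forms a search query from the current question
  • Searches the web with Spider and joins the result descriptions into one summary
  • Evaluates whether that summary answers the question
  • Synthesizes and returns a final answer if it does
  • Refines the query and tries again if it does not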
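
The control flow can be exercised without any API calls by passing the steps in as plain functions. The sketch below is a simplified skeleton, not the agent itself. run_research_loop and the stub functions are hypothetical, evaluate returns a boolean instead of text, and the original question stays fixed while only the search query is refined, which is one reasonable design choice.

```python
from typing import Callable, Optional


def run_research_loop(
    question: str,
    search: Callable[[str], str],
    evaluate: Callable[[str, str], bool],
    refine: Callable[[str, str], str],
    max_iterations: int = 5,
) -> Optional[str]:
    """Search, evaluate, and refine until the evidence is judged sufficient."""
    query = question
    for _ in range(max_iterations):
        evidence = search(query)
        if evaluate(question, evidence):
            return evidence
        query = refine(question, evidence)
    return None  # Gave up after max_iterations rounds.


# Stub data and functions standing in for the Spider and OpenAI calls.
fake_index = {"boston weather": "", "boston weather today": "Sunny, 72F"}
queries_seen = []


def stub_search(query: str) -> str:
    queries_seen.append(query)
    return fake_index.get(query, "")


result = run_research_loop(
    "boston weather",
    search=stub_search,
    evaluate=lambda question, evidence: bool(evidence),
    refine=lambda question, evidence: question + " today",
)
print(result)        # Sunny, 72F
print(queries_seen)  # ['boston weather', 'boston weather today']
```

The first search comes back empty, the stub refinement appends "today", and the second search succeeds, so the loop returns after two rounds.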

Step 11: Implement the Main Function

An interactive loop for the agent:

def main():
    agent = AIResearchAgent(OPENAI_API_KEY, SPIDER_API_KEY)
    while True:
        user_input = input("What would you like to research? (Type 'exit' to quit): ")
        if user_input.lower() == 'exit':
            break
        result = agent.research(user_input)
        print(result)

if __name__ == "__main__":
    main()

Conclusion

The finished agent can:

  • Search the web using Spider
  • Evaluate whether results are sufficient
  • Self-refine its search query when results fall short
  • Form a final answer from gathered data

Complete Code

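The full script is below. Here the OpenAI helper is named _openai_request, with a leading underscore to mark it as internal.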

import os
from dotenv import load_dotenv
import openai
from spider import Spider
from typing import List, Dict, Any
from colorama import init, Fore


init(autoreset=True)
load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
SPIDER_API_KEY = os.getenv("SPIDER_API_KEY")

class AIResearchAgent:
    def __init__(self, openai_api_key: str, spider_api_key: str):
        self.openai_client = openai.OpenAI(api_key=openai_api_key)
        self.spider_client = Spider(spider_api_key)

    def search(self, query: str, limit: int = 5) -> Dict[str, Any]:
        """Perform a web search using Spider."""
        params = {"limit": limit, "fetch_page_content": False}
        print(f"{Fore.GREEN}Searching for: {query}")
        results = self.spider_client.search(query, params)
        return results

    def _openai_request(self, system_content: str, user_content: str) -> str:
        """Helper method to make OpenAI API requests."""
        response = self.openai_client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": system_content},
                {"role": "user", "content": user_content}
            ]
        )
        return response.choices[0].message.content

    def summarize(self, text: str) -> str:
        """Summarize the given text using OpenAI."""
        print(f"{Fore.BLUE}Summarizing...")
        return self._openai_request(
            "You are a helpful assistant that summarizes text.",
            f"Summarize this text in 2-3 sentences: {text}"
        )

    def evaluate(self, question: str, summary: str) -> str:
        """Evaluate if the summary answers the question."""
        print(f"{Fore.MAGENTA}Evaluating...")
        evaluation = self._openai_request(
            "You are an AI research assistant. Your task is to evaluate if the given summary answers the user's question.",
            f"Question: {question}\n\nSummary:\n{summary}\n\nDoes this summary answer the question? If it does, write exactly: 'does answer the question'. If not, explain why."
        )
        return evaluation

    def form_search_query(self, user_query: str) -> str:
        """Form a search query from the user's input."""
        search_query = self._openai_request(
            "You are an AI research assistant. Your task is to form an effective search query based on the user's question.",
            f"User's question: {user_query}\n\nPlease provide a concise and effective search query to find relevant information."
        )
        return search_query

    def form_final_answer(self, user_query: str, summary: str) -> str:
        """Form a final answer based on the user's query and the summary."""
        final_answer = self._openai_request(
            "You are an AI research assistant. Your task is to form a comprehensive answer to the user's question based on the provided summary.",
            f"User's question: {user_query}\n\nSummary of research:\n{summary}\n\nPlease provide a comprehensive answer to the user's question based on this information."
        )
        print(f"{Fore.GREEN}Formed final answer.")
        return final_answer

    def refine_question(self, original_question: str, evaluation: str) -> str:
        """Refine the search question based on the evaluation."""
        print(f"{Fore.CYAN}Refining...")
        return self._openai_request(
            "You are an AI research assistant. Your task is to refine a search query based on the original question and the evaluation of previous search results.",
            f"Original question: {original_question}\n\nEvaluation of previous results: {evaluation}\n\nPlease provide a refined search query to find more relevant information."
        )

    def research(self, user_query: str, max_iterations: int = 5) -> str:
        """Perform research on the given question."""
        print(f"{Fore.BLUE}Starting research for: {user_query}")

        # Keep the original question for evaluation and the final answer;
        # only the text used to build the search query gets refined.
        search_input = user_query
        combined_summary = ""

        for iteration in range(max_iterations):
            print(f"{Fore.YELLOW}Iteration {iteration + 1}/{max_iterations}")

            search_query = self.form_search_query(search_input)
            search_results = self.search(search_query)
            # OPTIONAL: call the summarize method here to summarize the search results
            combined_summary = "\n".join([result['description'] for result in search_results['content']])
            evaluation = self.evaluate(user_query, combined_summary)

            if "does answer the question" in evaluation.lower():
                final_answer = self.form_final_answer(user_query, combined_summary)
                return f"{Fore.GREEN}Final Answer:\n{final_answer}\n\nBased on:\n{combined_summary}"

            search_input = self.refine_question(user_query, evaluation)

        return f"{Fore.RED}Couldn't find a satisfactory answer after {max_iterations} iterations. Last summary:\n{combined_summary}"

def main():
    agent = AIResearchAgent(OPENAI_API_KEY, SPIDER_API_KEY)

    while True:
        user_input = input("What would you like to research? (Type 'exit' to quit): ")
        if user_input.lower() == 'exit':
            break

        result = agent.research(user_input)
        print(result)

if __name__ == "__main__":
    main()
