Function Calling and Tool Use: From Talk to Action

Computational Analysis of Social Complexity
Fall 2025, Spencer Lyon

Prerequisites

L.A1.01 (LLMs and API calls)
L.A1.02 (RAG systems)
Graph theory/Network Science (Week 3-5)

Outcomes

Implement function calling with modern LLM APIs
Design JSON schemas for tool definitions
Build agents that execute code and analyze computational results
Create a network analysis toolkit accessible to AI agents

From Conversation to Computation

The Limitations of Text-Only Agents

In Week A1, we learned how to build AI agents that can:

Engage in natural language conversations
Retrieve and synthesize information from knowledge bases
Coordinate with other agents

But there’s a fundamental limitation: these agents can only talk.

Suppose you ask an LLM:

“I have a social network with 1000 nodes. Can you calculate the average clustering coefficient?”

The LLM might respond:

“I’d be happy to help calculate the clustering coefficient! Please provide the network data in an adjacency matrix or edge list format, and I’ll walk you through the calculation.”

But it can’t actually do the calculation. It’s like hiring a consultant who can only write reports but can’t use a computer.

What We Really Want

Imagine instead:

You: “Calculate the clustering coefficient for this network: [provides graph data]”

Agent:

Parses the network data
Calls calculate_clustering_coefficient(graph)
Gets result: 0.342
Responds: “The average clustering coefficient is 0.342, indicating moderate clustering. This is typical for social networks where friend groups form tightly-knit communities.”

The agent didn’t just describe how to compute the answer - it actually computed it.

This is the power of tool use or function calling: agents that can take actions, not just generate text.

Our course focuses on computational analysis of complex systems:

Network analysis (Weeks 3-5)
Agent-based modeling (Weeks 6-7)
Game theory (Weeks 8-9)

All of these require computation, not just conversation.

AI agents with tool use can:

Analyze real network data
Run simulations and interpret results
Solve game theory problems numerically
Query blockchain state and analyze transactions

They become computational assistants, not just chatbots.

Understanding Function Calling

The Basic Pattern

Function calling (also called “tool use”) works through a structured protocol:

Step 1: Define Available Tools

Tell the LLM what functions it can call
Provide a description of each function
Specify the parameters and their types

Step 2: Agent Decides to Use a Tool

User asks a question
LLM determines if it needs to call a function
Generates a structured request (JSON) specifying the function and arguments

Step 3: Your Code Executes the Function

Parse the LLM’s request
Call the actual Python function
Get the result

Step 4: Return Results to Agent

Send function output back to LLM
LLM incorporates the result into its response
Generates a natural language answer for the user

This might seem like a complex dance, but modern LLM APIs make it straightforward.
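
The four steps above can be sketched without any LLM at all. The block below is a minimal illustration, not how any particular API implements the protocol: the `llm_request` string is a hand-written stand-in for the structured request a model would emit, and the `TOOLS` registry and `execute_tool_call` helper are hypothetical names introduced only for this example.

```python
import json

# Step 1: the tools our code is willing to run, keyed by name.
def add_numbers(a: float, b: float) -> float:
    """Add two numbers together and return the sum."""
    return a + b

TOOLS = {"add_numbers": add_numbers}

# Step 2: a hand-written stand-in for the structured request an LLM would emit.
# (In a real system this JSON comes back from the model API.)
llm_request = '{"name": "add_numbers", "arguments": {"a": 25, "b": 17}}'

def execute_tool_call(raw_request: str) -> str:
    """Steps 3 and 4: parse the request, run the function, serialize the result."""
    request = json.loads(raw_request)
    name = request["name"]
    if name not in TOOLS:
        # Report unknown tools instead of crashing, so the model can recover.
        return json.dumps({"error": f"Unknown tool: {name}"})
    result = TOOLS[name](**request["arguments"])
    return json.dumps({"tool": name, "result": result})

# This string is what we would send back to the LLM as the tool result.
tool_message = execute_tool_call(llm_request)
print(tool_message)
```

Only functions registered in `TOOLS` can ever run, which is the sense in which your code, not the model, stays in control of execution.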

Why Not Just Put Code in the Prompt?

You might wonder: why not just tell the LLM “here’s how to calculate clustering coefficient” in the prompt?

Problems with code-in-prompt:

Unreliable execution: LLM might make mistakes in calculation
No actual computation: LLM simulates/approximates, doesn’t execute
Verbose: Including full code implementations in prompts wastes tokens
Can’t handle complexity: Real functions often require libraries, state, I/O

Function calling provides:

Precise execution: Real Python code runs, no approximation
Efficiency: Just describe what the function does, not how
Power: Access to the entire Python ecosystem (networkx, etc.)
Safety: You control what code actually executes

JSON Schemas: Defining Tool Interfaces

The Language of Tools

To use function calling, we need a way to describe functions to the LLM. The standard format is JSON Schema.

JSON Schema is a vocabulary for annotating and validating JSON documents. For function calling, it describes:

Function name
What the function does (description)
What parameters it takes (name, type, description, whether required)
What it returns (usually in description)

Important Note: While we’ll see how to write JSON schemas manually (to understand the underlying format), PydanticAI will generate these automatically from Python function signatures and docstrings. This is one of the major benefits of using PydanticAI - you write normal Python functions with type hints and docstrings, and the schemas are created for you.

Let’s start with a simple example to see what the JSON schema format looks like:

import json

# Define a simple calculator function
def add_numbers(a: float, b: float) -> float:
    """Add two numbers together and return the sum."""
    return a + b

# Manual JSON Schema definition (what OpenAI API expects)
add_numbers_tool = {
    "type": "function",

    "function": {
        "name": "add_numbers",
        "description": "Add two numbers together and return the sum",
        "parameters": {
            "type": "object",
            "properties": {
                "a": {
                    "type": "number",
                    "description": "The first number"
                },
                "b": {
                    "type": "number",
                    "description": "The second number"
                }
            },
            "required": ["a", "b"],
            "additionalProperties": False
        },
        "strict": True
    }
}

# Display the schema
print(json.dumps(add_numbers_tool, indent=2))

{
  "type": "function",
  "function": {
    "name": "add_numbers",
    "description": "Add two numbers together and return the sum",
    "parameters": {
      "type": "object",
      "properties": {
        "a": {
          "type": "number",
          "description": "The first number"
        },
        "b": {
          "type": "number",
          "description": "The second number"
        }
      },
      "required": [
        "a",
        "b"
      ],
      "additionalProperties": false
    },
    "strict": true
  }
}

Anatomy of a Tool Definition

Let’s break down the structure:

Top Level:

type: Always “function” for function calling
function: Contains the function specification

Function Object:

name: Identifier for the function (what the LLM will call)
description: Natural language explanation of what it does (crucial for LLM to understand when to use it)
parameters: JSON Schema object describing the parameters
strict: Optional boolean (recommended true) for strict schema validation

Parameters Object:

type: Always “object” (parameters are passed as a JSON object)
properties: Dict mapping parameter names to their schemas
required: Array of parameter names that must be provided
additionalProperties: Set to false to prevent extra properties

Each Parameter:

type: JSON type (“string”, “number”, “integer”, “boolean”, “array”, “object”)
description: What this parameter represents
Optional: enum (allowed values), minimum/maximum (for numbers), etc.

Key Point: The description fields are critical - they’re how the LLM decides when and how to use your function. Write clear, specific descriptions that explain:

What the function does
When to use it
What each parameter means
What the function returns
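
To make the optional `enum` field concrete, here is a hand-written tool definition for a hypothetical `calculate` tool, together with a small argument checker. The checker (`check_arguments`) is an illustrative helper that covers only the handful of keywords used in this one schema; it is a sketch, not a full JSON Schema validator.

```python
calculate_tool = {
    "type": "function",
    "function": {
        "name": "calculate",
        "description": "Perform an arithmetic operation on two numbers and return the result.",
        "parameters": {
            "type": "object",
            "properties": {
                "operation": {
                    "type": "string",
                    "description": "Which operation to perform",
                    "enum": ["add", "subtract", "multiply", "divide"],
                },
                "a": {"type": "number", "description": "The first operand"},
                "b": {"type": "number", "description": "The second operand"},
            },
            "required": ["operation", "a", "b"],
            "additionalProperties": False,
        },
        "strict": True,
    },
}

# Python types accepted for the two JSON types used in this schema.
JSON_TYPES = {"string": str, "number": (int, float)}

def check_arguments(tool: dict, args: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    params = tool["function"]["parameters"]
    problems = []
    for name in params["required"]:
        if name not in args:
            problems.append(f"missing required parameter: {name}")
    for name, value in args.items():
        spec = params["properties"].get(name)
        if spec is None:
            if params.get("additionalProperties") is False:
                problems.append(f"unexpected parameter: {name}")
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, JSON_TYPES[spec["type"]]):
            problems.append(f"{name} should be of type {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name} must be one of {spec['enum']}")
    return problems

print(check_arguments(calculate_tool, {"operation": "multiply", "a": 847, "b": 293}))
print(check_arguments(calculate_tool, {"operation": "power", "a": 2}))
```

The first call reports no problems; the second reports both the missing `b` and the disallowed `operation` value, which is the kind of feedback a schema lets you give before any function runs.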

The PydanticAI Way: Automatic Schema Generation

Now, here’s the key insight: you don’t have to write these schemas manually when using PydanticAI. PydanticAI uses the griffe library to extract parameter descriptions from your docstrings and automatically generates the JSON schema from your function signature.

Here’s how the same function looks with PydanticAI:

# one time setup code to load environment variables and set up async support in Jupyter
from dotenv import load_dotenv
import nest_asyncio

load_dotenv()
nest_asyncio.apply()

# PydanticAI Way: Automatic Schema Generation
from pydantic_ai import Agent

# Create an agent
agent = Agent('anthropic:claude-haiku-4-5')

# Register the tool with decorator - schema is generated automatically!
@agent.tool_plain
def add_numbers(a: float, b: float) -> float:
    """
    Add two numbers together and return the sum.

    Args:
        a: The first number
        b: The second number

    Returns:
        The sum of a and b
    """
    return a + b

# Test it!
result = agent.run_sync("What is 25 plus 17?")
print(result.output)

25 plus 17 equals **42**.

result.all_messages()

[ModelRequest(parts=[UserPromptPart(content='What is 25 plus 17?', timestamp=datetime.datetime(2025, 11, 10, 23, 33, 31, 690679, tzinfo=datetime.timezone.utc))]),
 ModelResponse(parts=[ToolCallPart(tool_name='add_numbers', args={'a': 25, 'b': 17}, tool_call_id='toolu_01JkbLv1Ldr3Fq2YTmCDh1e7')], usage=RequestUsage(input_tokens=630, output_tokens=71, details={'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 630, 'output_tokens': 71}), model_name='claude-haiku-4-5-20251001', timestamp=datetime.datetime(2025, 11, 10, 23, 33, 33, 49382, tzinfo=datetime.timezone.utc), provider_name='anthropic', provider_details={'finish_reason': 'tool_use'}, provider_response_id='msg_01RoorVBmTydPcTBZyQpHx6w', finish_reason='tool_call'),
 ModelRequest(parts=[ToolReturnPart(tool_name='add_numbers', content=42.0, tool_call_id='toolu_01JkbLv1Ldr3Fq2YTmCDh1e7', timestamp=datetime.datetime(2025, 11, 10, 23, 33, 33, 50020, tzinfo=datetime.timezone.utc))]),
 ModelResponse(parts=[TextPart(content='25 plus 17 equals **42**.')], usage=RequestUsage(input_tokens=716, output_tokens=13, details={'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 716, 'output_tokens': 13}), model_name='claude-haiku-4-5-20251001', timestamp=datetime.datetime(2025, 11, 10, 23, 33, 33, 786301, tzinfo=datetime.timezone.utc), provider_name='anthropic', provider_details={'finish_reason': 'end_turn'}, provider_response_id='msg_01H9BygMKHdNA6zuQxMFR7fQ', finish_reason='stop')]
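
To build intuition for what automatic schema generation involves, here is a rough standard-library sketch that derives a tool definition from a function signature. It is an assumption-laden simplification and not PydanticAI's actual implementation: the helper name `sketch_tool_schema` and the `PY_TO_JSON` mapping are invented for this example, only four scalar types are handled, and per-parameter descriptions (which PydanticAI extracts from the docstring) are omitted.

```python
import inspect
import json
from typing import get_type_hints

# Simplified mapping from Python annotations to JSON Schema types.
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def sketch_tool_schema(func) -> dict:
    """Build a tool definition from a function's signature and docstring."""
    hints = get_type_hints(func)
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": PY_TO_JSON[hints[name]]}
        # Parameters without a default value must be supplied by the LLM.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    # Use the first line of the cleaned docstring as the description.
    doc = inspect.getdoc(func) or ""
    description = doc.splitlines()[0] if doc else ""
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
                "additionalProperties": False,
            },
        },
    }

def add_numbers(a: float, b: float) -> float:
    """
    Add two numbers together and return the sum.
    """
    return a + b

schema = sketch_tool_schema(add_numbers)
print(json.dumps(schema, indent=2))
```

The result has the same shape as the manual definition shown earlier, minus the parameter descriptions and the `strict` flag.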

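Hands-On: Building Function-Calling Agents

Setup: API Access

We’ll use PydanticAI to demonstrate function calling. The examples in this section run against an Anthropic model (anthropic:claude-haiku-4-5); the setup code also checks for an OpenAI key.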

Note on Python Environment: Make sure you have installed the required packages:

pip install pydantic-ai pydantic

Make sure you have your API keys set as environment variables:

export OPENAI_API_KEY="your-key"
export ANTHROPIC_API_KEY="your-key"

import os
from pydantic_ai import Agent, RunContext

# Get API keys from environment
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY", "")

if not OPENAI_API_KEY or not ANTHROPIC_API_KEY:
    print("⚠️ Warning: API keys not set. Set OPENAI_API_KEY and ANTHROPIC_API_KEY environment variables.")

Function Calling with PydanticAI

Let’s implement a complete function-calling agent using PydanticAI. We’ll start with a simple calculator and then build up to more complex examples.

Example 1: Calculator Agent

Let’s build an agent that can perform arithmetic operations. This demonstrates the basic pattern clearly.

from pydantic_ai import Agent

# Create calculator agent
calculator_agent = Agent('anthropic:claude-haiku-4-5')

@calculator_agent.tool_plain
def calculate(operation: str, a: float, b: float) -> float:
    """
    Perform arithmetic operations on two numbers.

    Args:
        operation: The operation to perform ('add', 'subtract', 'multiply', 'divide')
        a: The first operand
        b: The second operand

    Returns:
        The result of the operation
    """
    if operation == "add":
        return a + b
    elif operation == "subtract":
        return a - b
    elif operation == "multiply":
        return a * b
    elif operation == "divide":
        if b == 0:
            raise ValueError("Division by zero")
        return a / b
    else:
        raise ValueError(f"Unknown operation: {operation}")

print("Calculator agent ready!")

Calculator agent ready!

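Before running the agent, let’s inspect the tool description that PydanticAI generated from the docstring. Note that _function_toolset is a private attribute, so this lookup may change between PydanticAI versions: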

t = calculator_agent._function_toolset.tools["calculate"]

print(t.tool_def.description)

<summary>Perform arithmetic operations on two numbers.</summary>
<returns>
<description>The result of the operation</description>
</returns>

# With PydanticAI, running the agent is simple!
def run_calculator_agent(user_query: str):
    """Run a calculator agent that can perform arithmetic.

    Returns the full run result; use .output for the final text.
    """
    print(f"User: {user_query}\n")

    # PydanticAI handles all the tool calling logic
    result = calculator_agent.run_sync(user_query)

    print(f"Agent: {result.output}")
    return result

Let’s test our calculator agent:

# Test with a calculation
mult_result = run_calculator_agent("What is 847 multiplied by 293?")

User: What is 847 multiplied by 293?

Agent: 847 multiplied by 293 is **248,171**.

# Test with a word problem
run_calculator_agent("I have 15 apples and buy 23 more. How many do I have?")

User: I have 15 apples and buy 23 more. How many do I have?

Agent: You have **38 apples**. (15 + 23 = 38)

AgentRunResult(output='You have **38 apples**. (15 + 23 = 38)')

What Just Happened?

Let’s trace through the execution with PydanticAI:

mult_result.all_messages()

[ModelRequest(parts=[UserPromptPart(content='What is 847 multiplied by 293?', timestamp=datetime.datetime(2025, 11, 10, 23, 40, 16, 249866, tzinfo=datetime.timezone.utc))]),
 ModelResponse(parts=[ToolCallPart(tool_name='calculate', args={'operation': 'multiply', 'a': 847, 'b': 293}, tool_call_id='toolu_01X3XFzS4TEtRxDbcuP6NPm8')], usage=RequestUsage(input_tokens=667, output_tokens=86, details={'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 667, 'output_tokens': 86}), model_name='claude-haiku-4-5-20251001', timestamp=datetime.datetime(2025, 11, 10, 23, 40, 17, 501982, tzinfo=datetime.timezone.utc), provider_name='anthropic', provider_details={'finish_reason': 'tool_use'}, provider_response_id='msg_01ScVcXERKobRv9FcpSnVR5U', finish_reason='tool_call'),
 ModelRequest(parts=[ToolReturnPart(tool_name='calculate', content=248171.0, tool_call_id='toolu_01X3XFzS4TEtRxDbcuP6NPm8', timestamp=datetime.datetime(2025, 11, 10, 23, 40, 17, 502744, tzinfo=datetime.timezone.utc))]),
 ModelResponse(parts=[TextPart(content='847 multiplied by 293 is **248,171**.')], usage=RequestUsage(input_tokens=769, output_tokens=17, details={'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 769, 'output_tokens': 17}), model_name='claude-haiku-4-5-20251001', timestamp=datetime.datetime(2025, 11, 10, 23, 40, 18, 488776, tzinfo=datetime.timezone.utc), provider_name='anthropic', provider_details={'finish_reason': 'end_turn'}, provider_response_id='msg_01YDsgeLjt3xKL84bUT42B5k', finish_reason='stop')]

User asks a question (“What is 847 multiplied by 293?”)
PydanticAI sends request to LLM with available tools
LLM decides: “I need to multiply - I’ll use the calculate tool”
LLM generates tool call: {"operation": "multiply", "a": 847, "b": 293}
PydanticAI executes: Calls our calculate("multiply", 847, 293) → 248171.0
PydanticAI sends result back to LLM as a ToolReturnPart with content 248171.0
LLM generates final response: “847 multiplied by 293 is **248,171**.”

Understanding the PydanticAI Simplification:

No manual message management - PydanticAI handles the conversation flow
No manual tool dispatch - PydanticAI calls the right function automatically based on the ToolCallPart it receives from the LLM
No JSON schema writing - Generated from function signatures and docstrings
Validated arguments - PydanticAI checks the LLM’s arguments against the Python type hints before calling the function

Key insights:

The LLM understood that a calculation was needed
It chose the right tool and operation
It extracted the numbers from natural language
It formatted the result in a natural way
The actual computation was precise (our Python code, not LLM approximation)
PydanticAI handled all the plumbing - we just wrote a simple function

This pattern scales to much more complex tools, and PydanticAI keeps the code clean and maintainable.

Building a Network Analysis Toolkit

Exposing NetworkX to AI Agents

Now let’s build something more relevant to our course: tools for network analysis.

We studied networks using Julia and Graphs.jl.

However, because we are using Python and PydanticAI, we need the corresponding network science library for Python.

The most widely used library is networkx.

We’ll create a set of functions that let an AI agent:

Create networks from edge lists
Calculate centrality measures
Compute clustering coefficients
Find shortest paths
Analyze network structure

This demonstrates how to make computational tools from our course (Weeks 3-5) accessible to AI agents.

!pip install networkx

Collecting networkx
  Downloading networkx-3.5-py3-none-any.whl.metadata (6.3 kB)
Downloading networkx-3.5-py3-none-any.whl (2.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 19.6 MB/s  0:00:00
Installing collected packages: networkx
Successfully installed networkx-3.5

import networkx as nx
from dataclasses import dataclass

# Define dependencies using dependency injection instead of global state
@dataclass
class NetworkDeps:
    graphs: dict[str, nx.Graph]

# We'll create an agent with these dependencies
print("Network dependencies defined")

Network dependencies defined

from pydantic_ai import RunContext

# Create network analysis agent with dependencies
network_agent = Agent('anthropic:claude-haiku-4-5', deps_type=NetworkDeps)

@network_agent.tool
def create_network(
    ctx: RunContext[NetworkDeps],
    graph_id: str,
    edges: list[list[int]]
) -> dict:
    """
    Create a network from an edge list and store it.

    Args:
        graph_id: Unique identifier for this graph (e.g., 'social_network', 'graph1')
        edges: List of edges where each edge is [source, target]. Example: [[1,2], [2,3], [1,3]]

    Returns:
        Dictionary with graph statistics (num_nodes, num_edges, density)
    """
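    print(f"Creating graph...\t edges: {edges}\t graph_id: {graph_id}")
    if not edges:
        return {"error": "edges must contain at least one [source, target] pair"}
    # Find max node ID to determine number of nodes.
    # Note: isolated nodes with IDs above this maximum are never created.
    max_node = max(max(e) for e in edges)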

    # Create graph
    g = nx.Graph()
    g.add_nodes_from(range(1, max_node + 1))
    g.add_edges_from(edges)

    # Store in context dependencies
    ctx.deps.graphs[graph_id] = g

    return {
        "graph_id": graph_id,
        "num_nodes": g.number_of_nodes(),
        "num_edges": g.number_of_edges(),
        "density": round(nx.density(g), 4)
    }

@network_agent.tool
def calculate_degree_centrality(
    ctx: RunContext[NetworkDeps],
    graph_id: str,
    node: int
) -> dict:
    """
    Calculate degree centrality for a node. Degree centrality measures how many connections a node has.

    Args:
        graph_id: ID of the graph to analyze
        node: The node ID to calculate centrality for

    Returns:
        Dictionary with degree and normalized centrality value
    """
    print(f"Calculating degree centrality...\t graph_id: {graph_id}\t node: {node}")
    g = ctx.deps.graphs[graph_id]
    # first check if node is in graph
    if node not in g:
        return {
            "error": f"Node {node} not found in graph {graph_id}"
        }
    deg = g.degree(node)
    max_possible = g.number_of_nodes() - 1
    normalized = deg / max_possible if max_possible > 0 else 0

    return {
        "node": node,
        "degree": deg,
        "normalized_centrality": round(normalized, 4)
    }

@network_agent.tool
def calculate_betweenness(
    ctx: RunContext[NetworkDeps],
    graph_id: str,
    node: int
) -> dict:
    """
    Calculate betweenness centrality for a node. High betweenness nodes are 'bridges' in the network.

    Args:
        graph_id: ID of the graph to analyze
        node: The node ID to calculate betweenness for

    Returns:
        Dictionary with betweenness centrality value
    """
    print(f"Calculating betweenness centrality...\t graph_id: {graph_id}\t node: {node}")
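    g = ctx.deps.graphs[graph_id]
    # Check that the node exists before indexing into the results
    if node not in g:
        return {
            "error": f"Node {node} not found in graph {graph_id}"
        }
    bc = nx.betweenness_centrality(g)

    return {
        "node": node,
        "betweenness_centrality": round(bc[node], 4)
    }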

@network_agent.tool
def calculate_clustering_coefficient(
    ctx: RunContext[NetworkDeps],
    graph_id: str
) -> dict:
    """
    Calculate the average clustering coefficient (the mean of each node's local clustering). Values close to 1 indicate high clustering.

    Args:
        graph_id: ID of the graph to analyze

    Returns:
        Dictionary with clustering coefficient
    """
    print(f"Calculating clustering coefficient...\t graph_id: {graph_id}")
    g = ctx.deps.graphs[graph_id]
    cc = nx.average_clustering(g)

    return {
        "clustering_coefficient": round(cc, 4)
    }

@network_agent.tool
def find_shortest_path(
    ctx: RunContext[NetworkDeps],
    graph_id: str,
    source: int,
    target: int
) -> dict:
    """
    Find shortest path between two nodes. Returns the path and its length.

    Args:
        graph_id: ID of the graph to search
        source: Starting node ID
        target: Destination node ID

    Returns:
        Dictionary with path information
    """
    print(f"Finding shortest path...\t graph_id: {graph_id}\t source: {source}\t target: {target}")
    g = ctx.deps.graphs[graph_id]

    try:
        path = nx.shortest_path(g, source, target)
        return {
            "found": True,
            "path": path,
            "length": len(path) - 1
        }
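    except nx.NetworkXNoPath:
        return {
            "found": False,
            "message": f"No path exists between nodes {source} and {target}"
        }
    except nx.NodeNotFound as exc:
        # Raised when source or target is not a node in the graph
        return {"found": False, "message": str(exc)}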

print("Network analysis tools defined!")

Network analysis tools defined!
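
Several of the tools above look up `ctx.deps.graphs[graph_id]` directly, which raises a `KeyError` if the agent references a graph that was never created. One possible pattern, sketched here with plain Python and a toy `GRAPHS` dictionary standing in for `ctx.deps.graphs`, is a decorator that turns such failures into an error dictionary the LLM can read. The names `errors_as_dict` and `count_nodes` are hypothetical, and whether this wrapper composes cleanly with the `@network_agent.tool` decorator has not been tested here.

```python
import functools

def errors_as_dict(func):
    """Wrap a tool so lookup and value errors come back as {"error": ...}."""
    @functools.wraps(func)  # keep the name, docstring, and signature metadata
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except (KeyError, ValueError) as exc:
            return {"error": f"{type(exc).__name__}: {exc}"}
    return wrapper

# Toy stand-in for ctx.deps.graphs: graph_id -> adjacency sets.
GRAPHS = {"social": {1: {2}, 2: {1}}}

@errors_as_dict
def count_nodes(graph_id: str) -> dict:
    """Return the number of nodes in a stored graph."""
    return {"graph_id": graph_id, "num_nodes": len(GRAPHS[graph_id])}

ok = count_nodes("social")
missing = count_nodes("does_not_exist")
print(ok)
print(missing)
```

Returning a structured error keeps the run alive and gives the model a chance to correct itself, for example by creating the graph first.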

Network Analysis Agent

Now let’s create an agent that can use these network analysis tools. This agent will be able to answer questions about networks by calling the appropriate functions.

def run_network_agent(user_query: str):
    """
    Run a network analysis agent that can use multiple tools to answer questions.

    PydanticAI handles:
    - Multi-turn conversations
    - Tool call dispatch
    - Message history management
    - Result formatting
    """
    print(f"User: {user_query}\n")
    print("="*80)

    # Create fresh dependencies for this conversation
    deps = NetworkDeps(graphs={})

    # PydanticAI handles all the complexity!
    result = network_agent.run_sync(user_query, deps=deps)

    print(f"\nFinal Answer:\n{result.output}")
    return result

Testing the Network Analysis Agent

Let’s test our agent with progressively more complex questions:

import logfire

# Configure Logfire
logfire.configure(
    send_to_logfire='if-token-present',
)
logfire.instrument_pydantic_ai()

Logfire project URL: https://logfire-us.pydantic.dev/sglyon/cap-6318-example

# Test 1: Basic network analysis
query1 = """
I have a social network with the following friendships (edges):
- Person 1 is friends with persons 2, 3, and 4
- Person 2 is friends with persons 1 and 3
- Person 3 is friends with persons 1, 2, and 4
- Person 4 is friends with persons 1 and 3
- Person 5 is friends with nobody

Create this network (call it 'social') and tell me:
1. What is the average clustering coefficient?
2. Which person has the highest degree centrality?

think carefully, proceed step by step.
"""

network1_result = run_network_agent(query1)

User: 
I have a social network with the following friendships (edges):
- Person 1 is friends with persons 2, 3, and 4
- Person 2 is friends with persons 1 and 3
- Person 3 is friends with persons 1, 2, and 4
- Person 4 is friends with persons 1 and 3
- Person 5 is friends with nobody

Create this network (call it 'social') and tell me:
1. What is the average clustering coefficient?
2. Which person has the highest degree centrality?

think carefully, proceed step by step.


================================================================================
19:14:07.011 network_agent run
19:14:07.012   chat claude-haiku-4-5
19:14:10.734   running 7 tools
19:14:10.735     running tool: create_network
19:14:10.736     running tool: calculate_clustering_coefficient
19:14:10.736     running tool: calculate_degree_centrality
19:14:10.736     running tool: calculate_degree_centrality
19:14:10.737     running tool: calculate_degree_centrality
19:14:10.737     running tool: calculate_degree_centrality
19:14:10.737     running tool: calculate_degree_centrality
Creating graph...	 edges: [[1, 2], [1, 3], [1, 4], [2, 3], [3, 4]]	 graph_id: social
Calculating clustering coefficient...	 graph_id: social
Calculating degree centrality...	 graph_id: social	 node: 2
Calculating degree centrality...	 graph_id: social	 node: 3
Calculating degree centrality...	 graph_id: social	 node: 4
Calculating degree centrality...	 graph_id: social	 node: 5
Calculating degree centrality...	 graph_id: social	 node: 1
19:14:10.742   chat claude-haiku-4-5

Final Answer:
Perfect! Here are the results:

## Network Created: 'social'
- **Nodes**: 4 (Persons 1-4; Person 5 has no connections so wasn't included)
- **Edges**: 5 friendships
- **Network Density**: 0.8333 (very densely connected!)

## Analysis Results:

**1. Average Clustering Coefficient: 0.8333**
   - This is very high (close to 1), indicating that the network is highly clustered
   - Friends of each person tend to also be friends with each other, forming tight-knit groups

**2. Degree Centrality Rankings:**
   - **Persons 1 and 3 are tied for highest degree centrality** (normalized centrality = 1.0)
     - Person 1 has 3 friends (2, 3, 4)
     - Person 3 has 3 friends (1, 2, 4)
   - Person 2 has 2 friends (1, 3) - normalized centrality: 0.6667
   - Person 4 has 2 friends (1, 3) - normalized centrality: 0.6667
   - Person 5 is isolated (0 friends)

**Summary**: Persons 1 and 3 are the most connected individuals in this social network, making them the most central figures. The network is highly connected with friends of friends also being friends with each other.
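
The agent’s answer notes that Person 5 was left out of the graph, because create_network only creates nodes up to the largest ID that appears in the edge list. The reported 0.8333 is therefore the average over four nodes. The following standard-library sketch recomputes the average clustering coefficient by hand, with and without the isolated node. It assumes an isolated node contributes a local clustering of 0 to the average, which matches the default behavior of networkx’s average_clustering (count_zeros=True); the helper names are invented for this example.

```python
from itertools import combinations

def build_adjacency(nodes, edges) -> dict[int, set[int]]:
    """Undirected adjacency sets for the given nodes and edges."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def local_clustering(adj: dict[int, set[int]], node: int) -> float:
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pairs to close
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)

def average_clustering(adj: dict[int, set[int]]) -> float:
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

edges = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4)]

without_person_5 = round(average_clustering(build_adjacency(range(1, 5), edges)), 4)
with_person_5 = round(average_clustering(build_adjacency(range(1, 6), edges)), 4)

print(without_person_5)  # 0.8333, the value the agent reported
print(with_person_5)     # 0.6667 once the isolated Person 5 is counted
```

The computation itself was exact, but it answered a slightly different question than the one asked, which is a good reason to check how a tool constructs its inputs.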

# Test 2: Path finding
# Note: We need to recreate the network since each call gets fresh dependencies.
#       We could easily fix this by not recreating the NetworkDeps each time.
query2 = """
Create a network called 'social' with edges:
[[1,2], [1,3], [1,4], [2,1], [2,3], [3,1], [3,2], [3,4], [4,1], [4,3]]

Then find the shortest path from person 2 to person 4.
"""

network2_result = run_network_agent(query2)

User: 
Create a network called 'social' with edges:
[[1,2], [1,3], [1,4], [2,1], [2,3], [3,1], [3,2], [3,4], [4,1], [4,3]]

Then find the shortest path from person 2 to person 4.


================================================================================
19:16:01.732 network_agent run
19:16:01.734   chat claude-haiku-4-5
19:16:03.368   running 2 tools
19:16:03.368     running tool: create_network
19:16:03.368     running tool: find_shortest_path
Creating graph...	 edges: [[1, 2], [1, 3], [1, 4], [2, 1], [2, 3], [3, 1], [3, 2], [3, 4], [4, 1], [4, 3]]	 graph_id: social
Finding shortest path...	 graph_id: social	 source: 2	 target: 4
19:16:03.370   chat claude-haiku-4-5

Final Answer:
Perfect! Here are the results:

**Network Created: 'social'**
- Number of nodes: 4
- Number of edges: 5
- Density: 0.8333 (highly connected network)

**Shortest Path from Person 2 to Person 4:**
- **Path:** 2 → 1 → 4
- **Length:** 2 steps

The shortest route from person 2 to person 4 is through person 1, requiring 2 connections.

network2_result.all_messages()

[ModelRequest(parts=[UserPromptPart(content="\nCreate a network called 'social' with edges:\n[[1,2], [1,3], [1,4], [2,1], [2,3], [3,1], [3,2], [3,4], [4,1], [4,3]]\n\nThen find the shortest path from person 2 to person 4.\n", timestamp=datetime.datetime(2025, 11, 11, 0, 16, 1, 733573, tzinfo=datetime.timezone.utc))]),
 ModelResponse(parts=[TextPart(content="I'll create the network and find the shortest path for you."), ToolCallPart(tool_name='create_network', args={'graph_id': 'social', 'edges': [[1, 2], [1, 3], [1, 4], [2, 1], [2, 3], [3, 1], [3, 2], [3, 4], [4, 1], [4, 3]]}, tool_call_id='toolu_01TBdA5PQBH1i7JGN6mYYanu'), ToolCallPart(tool_name='find_shortest_path', args={'graph_id': 'social', 'source': 2, 'target': 4}, tool_call_id='toolu_01JhnkbsDZjo9d3rHt2Do87o')], usage=RequestUsage(input_tokens=1343, output_tokens=209, details={'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 1343, 'output_tokens': 209}), model_name='claude-haiku-4-5-20251001', timestamp=datetime.datetime(2025, 11, 11, 0, 16, 3, 367984, tzinfo=datetime.timezone.utc), provider_name='anthropic', provider_details={'finish_reason': 'tool_use'}, provider_response_id='msg_01VyRhXvJVWagYaqdVgKeqRT', finish_reason='tool_call'),
 ModelRequest(parts=[ToolReturnPart(tool_name='create_network', content={'graph_id': 'social', 'num_nodes': 4, 'num_edges': 5, 'density': 0.8333}, tool_call_id='toolu_01TBdA5PQBH1i7JGN6mYYanu', timestamp=datetime.datetime(2025, 11, 11, 0, 16, 3, 369986, tzinfo=datetime.timezone.utc)), ToolReturnPart(tool_name='find_shortest_path', content={'found': True, 'path': [2, 1, 4], 'length': 2}, tool_call_id='toolu_01JhnkbsDZjo9d3rHt2Do87o', timestamp=datetime.datetime(2025, 11, 11, 0, 16, 3, 370074, tzinfo=datetime.timezone.utc))]),
 ModelResponse(parts=[TextPart(content="Perfect! Here are the results:\n\n**Network Created: 'social'**\n- Number of nodes: 4\n- Number of edges: 5\n- Density: 0.8333 (highly connected network)\n\n**Shortest Path from Person 2 to Person 4:**\n- **Path:** 2 → 1 → 4\n- **Length:** 2 steps\n\nThe shortest route from person 2 to person 4 is through person 1, requiring 2 connections.")], usage=RequestUsage(input_tokens=1670, output_tokens=117, details={'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 1670, 'output_tokens': 117}), model_name='claude-haiku-4-5-20251001', timestamp=datetime.datetime(2025, 11, 11, 0, 16, 4, 938462, tzinfo=datetime.timezone.utc), provider_name='anthropic', provider_details={'finish_reason': 'end_turn'}, provider_response_id='msg_01WfHCN5LphvvAmE6FMdpKLa', finish_reason='stop')]

# Test 3: Complex multi-step analysis
query3 = """
Create a new network called 'bridge' with these edges:
[[1,2], [2,3], [3,4], [4,5], [1,3], [3,5]]

Then find which node has the highest betweenness centrality. That node should be
a 'bridge' connecting different parts of the network.
"""

network3_result = run_network_agent(query3)

User: 
Create a new network called 'bridge' with these edges:
[[1,2], [2,3], [3,4], [4,5], [1,3], [3,5]]

Then find which node has the highest betweenness centrality. That node should be
a 'bridge' connecting different parts of the network.


================================================================================
19:18:43.882 network_agent run
19:18:43.884   chat claude-haiku-4-5
19:18:45.211   running 1 tool
19:18:45.212     running tool: create_network
Creating graph...	 edges: [[1, 2], [2, 3], [3, 4], [4, 5], [1, 3], [3, 5]]	 graph_id: bridge
19:18:45.213   chat claude-haiku-4-5
19:18:47.446   running 5 tools
19:18:47.446     running tool: calculate_betweenness
19:18:47.446     running tool: calculate_betweenness
19:18:47.446     running tool: calculate_betweenness
19:18:47.446     running tool: calculate_betweenness
19:18:47.446     running tool: calculate_betweenness
Calculating betweenness centrality...	 graph_id: bridge	 node: 1
Calculating betweenness centrality...	 graph_id: bridge	 node: 2
Calculating betweenness centrality...	 graph_id: bridge	 node: 3
Calculating betweenness centrality...	 graph_id: bridge	 node: 4
Calculating betweenness centrality...	 graph_id: bridge	 node: 5
19:18:47.449   chat claude-haiku-4-5

Final Answer:
Perfect! Here are the results:

## Network 'bridge' Summary
- **Nodes**: 5
- **Edges**: 6
- **Density**: 0.6

## Betweenness Centrality Results
| Node | Betweenness Centrality |
|------|------------------------|
| 1    | 0.0                    |
| 2    | 0.0                    |
| **3** | **0.6667**             |
| 4    | 0.0                    |
| 5    | 0.0                    |

## Key Finding
**Node 3** is the bridge in this network with a betweenness centrality of **0.6667**. This node is crucial for connecting different parts of the network because it lies on the shortest paths between many pairs of nodes. 

Looking at the edge structure:
- Node 3 connects to nodes: 2, 1, 4, and 5
- It acts as a central hub that connects the {1, 2} cluster with the {4, 5} cluster
- Many shortest paths between different parts of the network must pass through node 3, making it a critical bridge in the network topology

What We’ve Accomplished¶

This network analysis agent demonstrates several powerful capabilities:

1. Multi-Step Reasoning

Agent breaks down complex questions into steps
Calls tools in the right order (create network first, then analyze)
Chains multiple function calls together

2. Natural Language Understanding

Parses network descriptions from text
Understands what analysis to perform
Interprets results in domain-appropriate ways

3. Computational Precision

Uses real networkx algorithms
Numbers come from library code rather than the model's guesses
Results are reproducible and verifiable

4. State Management

Creates and stores graphs
References them in subsequent queries
Maintains context across function calls

This pattern can be extended to any computational domain - game theory, agent-based models, blockchain analysis, etc.
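
The "reproducible and verifiable" point can be tested directly. As a rough sketch, the following standard-library Python recomputes normalized betweenness centrality for the 'bridge' network from Test 3 without networkx or the agent. The function names are illustrative and not part of the lecture's toolkit, and the brute-force approach assumes a small, connected, undirected graph with at least three nodes.

```python
from collections import deque
from itertools import combinations


def shortest_path_counts(adj, source):
    """BFS from source: distance and number of shortest paths to each node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness(edges):
    """Normalized betweenness for a small, connected, undirected graph (n >= 3)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    info = {v: shortest_path_counts(adj, v) for v in adj}
    scores = {v: 0.0 for v in adj}
    for s, t in combinations(adj, 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        for v in adj:
            if v in (s, t):
                continue
            # v is on a shortest s-t path iff d(s, v) + d(v, t) == d(s, t)
            if dist_s[v] + dist_t[v] == dist_s[t]:
                scores[v] += sigma_s[v] * sigma_t[v] / sigma_s[t]
    n = len(adj)
    pairs = (n - 1) * (n - 2) / 2  # node pairs that exclude v
    return {v: round(score / pairs, 4) for v, score in scores.items()}


BRIDGE_EDGES = [[1, 2], [2, 3], [3, 4], [4, 5], [1, 3], [3, 5]]
```

Calling `betweenness(BRIDGE_EDGES)` gives 0.6667 for node 3 and 0.0 for every other node, which matches the table in the agent's final answer: node 3 sits on the shortest path for 4 of the 6 pairs that do not include it.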

Safety and Sandboxing¶

The Danger of Unrestricted Tool Use¶

Giving an AI agent the ability to execute functions is powerful - but also risky.

Consider if we gave an agent these tools:

function delete_file(path::String)
    rm(path)
end

function execute_shell_command(cmd::String)
    run(`bash -c $cmd`)
end

function send_email(to::String, subject::String, body::String)
    # Send email...
end

Now imagine:

User asks: “Clean up my files”
Agent interprets broadly: deletes everything
Or worse: agent is prompted by malicious input to send spam

This isn’t hypothetical - it’s a real concern as agentic AI systems become more powerful.

Safety Principles¶

1. Principle of Least Privilege

Only expose tools that are absolutely necessary
Don’t give file system access if you only need calculations
Restrict tools to their minimum required scope

2. Sandboxing

Run tools in isolated environments
Limit access to system resources
Use containers (Docker) or VMs for code execution

3. Read vs Write Separation

Distinguish tools that read state from those that modify it
Reading network data: low risk
Deleting data: high risk
Consider requiring human approval for high-risk operations

4. Input Validation

Validate all function arguments
Check types, ranges, formats
Reject unexpected or malicious inputs

5. Rate Limiting

Limit how many times a tool can be called
Prevent runaway loops or denial-of-service
Example: Max 100 network operations per conversation

6. Logging and Auditing

Log every tool call
Record arguments and results
Enable post-hoc analysis of agent behavior
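
Principles 5 and 6 can be combined in a single wrapper. The sketch below is one possible approach, not the lecture's implementation: `guarded_tool`, `AUDIT_LOG`, and `ToolCallLimitExceeded` are hypothetical names, the log lives in memory only, and the budget is counted per process rather than per conversation.

```python
import functools
import time

AUDIT_LOG = []  # hypothetical in-memory audit trail; a real system would persist it


class ToolCallLimitExceeded(RuntimeError):
    """Raised when a tool is called more often than its budget allows."""


def guarded_tool(max_calls=100):
    """Wrap a tool function with a call budget and one audit record per call."""
    def decorator(func):
        calls = {"count": 0}

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            calls["count"] += 1
            record = {
                "tool": func.__name__,
                "args": args,
                "kwargs": kwargs,
                "time": time.time(),
            }
            AUDIT_LOG.append(record)  # log rejected and failed calls as well
            if calls["count"] > max_calls:
                record["error"] = "call limit exceeded"
                raise ToolCallLimitExceeded(
                    f"{func.__name__} exceeded {max_calls} calls"
                )
            try:
                record["result"] = func(*args, **kwargs)
            except Exception as exc:
                record["error"] = repr(exc)
                raise
            return record["result"]

        return wrapper

    return decorator


@guarded_tool(max_calls=2)
def count_edges(edges):
    """Toy read-only tool: number of edges in an edge list."""
    return len(edges)
```

With `max_calls=2`, the third call to `count_edges` raises `ToolCallLimitExceeded`, and `AUDIT_LOG` holds three records, the last one carrying the rejection reason. Logging before the limit check is a deliberate choice here so that runaway loops remain visible in the audit trail.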

Safe Tool Design Patterns¶

Pattern 1: Read-Only by Default

# Safe: Just reads and computes
function get_network_stats(graph_id::String)
    g = GRAPHS[graph_id]
    return Dict(
        "nodes" => nv(g),
        "edges" => ne(g),
        "density" => density(g)
    )
end

# Risky: Modifies state
function delete_network(graph_id::String)
    delete!(GRAPHS, graph_id)
end

Pattern 2: Explicit Boundaries

# Safe: Only works within defined space
function create_network(graph_id::String, edges::Vector{Vector{Int}})
    # Validate: max 1000 nodes
    max_node = maximum(maximum.(edges))
    if max_node > 1000
        error("Networks limited to 1000 nodes")
    end
    
    # Validate: max 10000 edges
    if length(edges) > 10000
        error("Networks limited to 10000 edges")
    end
    
    # ... create network
end

Pattern 3: Confirmation for Destructive Operations

# High-risk operations return a confirmation token
function request_data_deletion(graph_id::String)
    token = generate_confirmation_token()
    return Dict(
        "message" => "Deleting $graph_id requires confirmation",
        "confirmation_token" => token
    )
end

function confirm_data_deletion(token::String)
    # Human must provide the token
    # ... perform deletion
end
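
For the Python agent, the same two-step pattern might look like the sketch below. It is a minimal illustration under stated assumptions: `GRAPHS` and `PENDING_DELETIONS` are hypothetical in-memory stores, and only `request_data_deletion` would be registered as an agent tool. `confirm_data_deletion` is meant to be called by the human-facing application; if it were also exposed as a tool, the agent could confirm its own request.

```python
import secrets

GRAPHS = {"bridge": [[1, 2], [2, 3]]}  # stand-in for the agent's graph store
PENDING_DELETIONS = {}  # confirmation token -> graph_id awaiting approval


def request_data_deletion(graph_id):
    """Agent-facing tool: stages a deletion but does not perform it."""
    if graph_id not in GRAPHS:
        return {"error": f"Unknown graph: {graph_id}"}
    token = secrets.token_hex(8)
    PENDING_DELETIONS[token] = graph_id
    return {
        "message": f"Deleting {graph_id} requires confirmation",
        "confirmation_token": token,
    }


def confirm_data_deletion(token):
    """Human-facing step: performs the deletion if the token is valid."""
    graph_id = PENDING_DELETIONS.pop(token, None)  # tokens are single-use
    if graph_id is None:
        return {"error": "Invalid or already used token"}
    GRAPHS.pop(graph_id, None)
    return {"message": f"Deleted {graph_id}"}
```

The graph stays in `GRAPHS` after the request and disappears only once a valid token is confirmed; replaying the same token returns an error.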

Code Execution: The Ultimate Risk¶

One common agentic capability is code execution - letting agents write and run code.

This is incredibly powerful:

Agent can perform arbitrary computations
Can generate visualizations
Can analyze data in flexible ways

But also incredibly dangerous:

Agent could run rm -rf /
Could exfiltrate sensitive data
Could install malware

Safe Code Execution Strategies:

1. Isolated Execution Environment
- Docker containers with no network access
- Limited CPU/memory/disk
- No access to host filesystem

2. Language Subset
- Restrict to safe operations only
- Parse and validate code before execution
- Block dangerous functions (system calls, file I/O)

3. Timeouts
- Kill code that runs too long
- Prevent infinite loops

4. Review Before Execution
- Show code to user first
- Let them approve or reject
- Only auto-execute for trusted, common operations

Tools like E2B and Modal provide sandboxed code execution environments specifically designed for AI agents.
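
The timeout strategy is the easiest one to illustrate with the standard library alone. The sketch below runs a code string in a separate Python process and kills it when it exceeds a time limit. `run_untrusted_python` is a hypothetical helper name. This is not a sandbox: the child process can still reach the filesystem and the network, so in practice it would sit inside a container or a hosted sandbox.

```python
import subprocess
import sys


def run_untrusted_python(code, timeout_seconds=5.0):
    """Run code in a separate interpreter and stop it if it runs too long."""
    try:
        completed = subprocess.run(
            # -I starts Python in isolated mode: it ignores PYTHON* environment
            # variables and the user's site-packages directory.
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception
        return {"ok": False, "error": f"Timed out after {timeout_seconds} seconds"}
    return {
        "ok": completed.returncode == 0,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }
```

A call such as `run_untrusted_python("print(1 + 1)")` returns the captured output, while `run_untrusted_python("while True: pass", timeout_seconds=0.5)` comes back with a timeout error instead of hanging the agent.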

Our Network Tools: Safety Analysis¶

Let’s evaluate our network analysis tools:

✓ Safe:

All tools are read-only or create temporary state
No file system access
No network access
No system commands
Low computational cost in practice (our examples use small graphs, though nothing enforces this yet)

⚠️ Could Improve:

Add max graph size limits
Add rate limiting (max N tool calls per session)
Add timeouts for expensive operations
Validate graph IDs (prevents path traversal if IDs are ever used as file names)

For Production:

Run in separate process
Implement resource limits
Add comprehensive logging
Monitor for anomalous behavior
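
Two of the improvements above, size limits and graph ID validation, can be sketched in a few lines. The limits reuse the 1000-node and 10000-edge caps from Pattern 2. The helper names and the ID pattern are assumptions for illustration. Unlike the Julia version, this sketch counts distinct nodes rather than checking the largest node label.

```python
import re

MAX_NODES = 1000
MAX_EDGES = 10_000
# Letters, digits, underscore, and hyphen only: no slashes or dots,
# so an ID can never act as a relative path.
GRAPH_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def validate_graph_id(graph_id):
    """Return graph_id unchanged if it is a short, path-safe identifier."""
    if not isinstance(graph_id, str) or not GRAPH_ID_PATTERN.fullmatch(graph_id):
        raise ValueError("graph_id must be 1-64 letters, digits, '_' or '-'")
    return graph_id


def validate_edges(edges):
    """Check edge shape and size limits; return edges as a list of tuples."""
    if len(edges) > MAX_EDGES:
        raise ValueError(f"Networks limited to {MAX_EDGES} edges")
    nodes = set()
    for edge in edges:
        is_pair = len(edge) == 2
        if not is_pair or not all(type(node) is int for node in edge):
            raise ValueError(f"Each edge must be a pair of integers, got {edge!r}")
        nodes.update(edge)
    if len(nodes) > MAX_NODES:
        raise ValueError(f"Networks limited to {MAX_NODES} nodes")
    return [tuple(edge) for edge in edges]
```

A tool such as `create_network` could call both helpers before touching any state, so malformed or oversized input is rejected with a clear error message that the model can read and react to.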

Tool Use with Claude (Anthropic)¶

Different Provider, Same Concept¶

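Earlier examples used OpenAI’s function calling API. Anthropic’s Claude also supports tool use, with a slightly different request and response format.

The network analysis agent above already runs on Claude (the logs show claude-haiku-4-5), so no separate implementation is needed. The abstraction layer described next is what makes that possible.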

PydanticAI: Model-Agnostic Abstraction¶

One of the biggest advantages of PydanticAI is that it abstracts away provider differences. You write your tools once, and they work with any LLM provider.

Switching Models is Trivial:

# OpenAI
agent = Agent('openai:gpt-4o-mini')

# Anthropic
agent = Agent('anthropic:claude-3-5-sonnet-20241022')

# Google
agent = Agent('google-gla:gemini-1.5-flash')

# OpenAI with different model
agent = Agent('openai:gpt-5')

The same tools work with all of them! PydanticAI handles:

Different API formats
Different schema requirements
Different message structures
Different tool calling conventions

Why This Matters:

No vendor lock-in: Switch providers based on performance, cost, or availability
A/B testing: Compare models easily
Fallbacks: If one provider is down, switch to another
Future-proof: New models supported as they’re added to PydanticAI

Under the Hood: Different providers do have different APIs:

OpenAI:

Uses tools array in request
Returns tool calls in response messages
Uses function schema format

Anthropic:

Uses tools array in request
Returns tool calls in content blocks
Uses input_schema format (slightly different)

Google, Mistral, Others:

Each has own format and conventions

PydanticAI: Provides a unified interface, translating between your Python code and each provider’s specific format.
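
To make the difference concrete, the sketch below builds the same tool definition in both request formats. The tool name `calculate_betweenness` and its two arguments come from the logs above; the description strings and the helper function names are assumptions. The dictionary shapes reflect the OpenAI Chat Completions and Anthropic Messages APIs as commonly documented, so check the current provider documentation before relying on them.

```python
# One provider-neutral description of the tool (JSON Schema for the arguments)
BETWEENNESS_TOOL = {
    "name": "calculate_betweenness",
    "description": "Calculate the betweenness centrality of one node.",
    "parameters": {
        "type": "object",
        "properties": {
            "graph_id": {"type": "string", "description": "ID of a stored graph"},
            "node": {"type": "integer", "description": "Node label"},
        },
        "required": ["graph_id", "node"],
    },
}


def to_openai_tool(name, description, parameters):
    """OpenAI wraps each tool in a 'function' object with a 'parameters' schema."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
        },
    }


def to_anthropic_tool(name, description, parameters):
    """Anthropic uses a flat object and calls the schema 'input_schema'."""
    return {
        "name": name,
        "description": description,
        "input_schema": parameters,
    }


openai_tool = to_openai_tool(**BETWEENNESS_TOOL)
anthropic_tool = to_anthropic_tool(**BETWEENNESS_TOOL)
```

The JSON Schema for the arguments is identical in both cases; only the envelope around it changes. This kind of mechanical translation is what a library like PydanticAI takes off your hands.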

Exercises¶

Exercise 1: Game Theory Tools¶

Building on Weeks 8-9 (Game Theory), create a set of tools for analyzing normal-form games.

Part A: Implement these functions:

create_game(game_id, payoff_matrices) - Create a normal-form game
find_pure_nash_equilibria(game_id) - Find pure strategy Nash equilibria
check_dominant_strategy(game_id, player, strategy) - Check if a strategy is dominant
calculate_expected_payoff(game_id, player, strategy_profile) - Calculate payoffs

Part B: Define JSON schemas for each function

Part C: Create a game theory agent and test it with:

Prisoner’s Dilemma
Matching Pennies
A 3x3 game of your choice

Part D: Compare agent analysis to your own analysis from Week 8. Does the agent identify the same equilibria?

# TODO: Your code here

# Hint: Create an agent and use @agent.tool decorator
# from pydantic_ai import Agent, RunContext

# game_theory_agent = Agent('anthropic:claude-haiku-4-5')

# @game_theory_agent.tool
# def create_game(ctx: RunContext[None], game_id: str, ...):
#     """Create a normal-form game."""
#     pass

Exercise 2: Data Analysis Agent¶

Create an agent that can analyze datasets using statistical tools.

Part A: Implement these tools:

load_dataset(dataset_id, data) - Load data from array/CSV format
describe_dataset(dataset_id) - Get summary statistics (mean, median, std, etc.)
filter_data(dataset_id, column, condition, value) - Filter rows
aggregate_data(dataset_id, groupby_col, agg_col, operation) - Group and aggregate

Part B: Test with network data from Weeks 3-5:

Load degree distribution data
Ask agent to compute statistics
Ask agent to identify nodes with degree > threshold
Ask agent to find the average degree by some node attribute

Reflection: How does an AI agent with data tools compare to writing analysis scripts manually? What are the trade-offs?

# TODO: Your code here

# You'll want to use pandas
# import pandas as pd
# from pydantic_ai import Agent, RunContext
# from dataclasses import dataclass

# @dataclass
# class DataDeps:
#     datasets: dict[str, pd.DataFrame]

# data_agent = Agent('anthropic:claude-haiku-4-5', deps_type=DataDeps)

Exercise 3: Multi-Tool Reasoning¶

Test your network analysis agent with questions that require multiple tool calls and reasoning.

Questions to test:

“Create two networks: A with edges [[1,2],[2,3],[3,1]] and B with edges [[1,2],[2,3],[3,4],[4,1]]. Which one has higher clustering?”
“In the social network from earlier, find the shortest path from node 1 to node 5. Then calculate the betweenness centrality of each node on that path. Which node on the path is most ‘bridge-like’?”
“Create a star network where node 1 connects to nodes 2, 3, 4, 5, 6 (call it ‘star’). Calculate the degree centrality of the center node and a peripheral node. What’s the ratio?”

Analysis:

How many tool calls did each question require?
Did the agent chain them correctly?
Were there any errors or surprising behaviors?
How did the agent interpret and synthesize results?

# TODO: Test your agent with the questions above

# Example:
# result = run_network_agent("Create two networks...")

Exercise 4: Safety Analysis¶

Consider the following tool definitions and analyze their safety:

# Tool 1
function run_julia_code(code::String)
    eval(Meta.parse(code))
end

# Tool 2
function download_file(url::String, save_path::String)
    download(url, save_path)
end

# Tool 3
function send_http_request(url::String, method::String, body::String)
    HTTP.request(method, url, body=body)
end

# Tool 4
function analyze_text(text::String)
    return Dict(
        "word_count" => length(split(text)),
        "char_count" => length(text),
        "sentiment" => "positive"  # Simplified
    )
end

For each tool, answer:

What are the security risks?
What attacks could a malicious user attempt?
How would you make it safer?
Should this tool be available to AI agents at all? Why or why not?

Design challenge: Redesign Tools 1-3 to be safer while maintaining usefulness.

Your Analysis:

Tool 1 - run_julia_code:

Risks: ...
Attacks: ...
Safer version: ...

(Continue for other tools)

Connecting to Course Themes¶

Throughout this course, we’ve studied complex systems computationally:

Networks (Weeks 3-5):

We analyzed network structure with Graphs.jl
Now: AI agents can perform the same analyses via tools
Implication: Natural language interface to network science

Agent-Based Models (Weeks 6-7):

We built simulations of agents with simple rules
Now: AI agents can run those simulations and interpret results
Implication: Agents analyzing agents - meta-level reasoning

Game Theory (Weeks 8-9):

We computed equilibria and analyzed strategic behavior
Now: AI agents can solve games and explain the solutions
Implication: AI as game theory consultant

Blockchains (Weeks 11-12):

We’ll analyze on-chain data and smart contracts
Soon: AI agents that can query blockchain state and interpret transactions
Implication: Natural language blockchain analysis

The Bigger Picture: Computational Assistants¶

What we’ve built in this lecture is a computational assistant:

Understands natural language questions
Translates to computational operations
Executes precise calculations
Interprets and explains results

This is qualitatively different from chatbots:

Chatbots: Generate plausible text
Computational assistants: Generate results computed by real code, which can be checked

The key is the tool layer - it grounds the AI in actual computation.

Emergence Revisited¶

Remember our recurring theme of emergence:

Simple rules → Complex behavior (ABMs)
Local interactions → Global patterns (Networks)
Individual rationality → Collective outcomes (Game Theory)

Tool use adds another dimension:

Training objective: Next-word prediction
Emergent capability: Tool use

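The pretraining objective says nothing about tools. Much of the underlying ability comes from:

Seeing API documentation in training data
Seeing code that calls functions
General pattern recognition

Providers now also fine-tune models on tool-calling examples to make the behavior reliable, but the ability to read an unfamiliar schema and produce a matching call is emergence at the model capability level.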

What This Enables for Research¶

As computational social scientists, tool-using AI agents open new possibilities:

1. Exploratory Data Analysis

“Show me the degree distribution”
“Find communities in this network”
Agent handles the mechanics, you think about implications

2. Hypothesis Testing

“Is there a correlation between centrality and outcome?”
Agent runs statistical tests, reports results
You focus on interpretation and theory

3. Simulation and Experimentation

“Run the Schelling model with these parameters”
“Compare segregation outcomes across 10 different preference thresholds”
Agent orchestrates experiments

4. Reproducible Research

Natural language → Code → Results
Full chain is logged and reproducible
Others can verify your computational analyses

5. Education and Dissemination

Students can explore concepts interactively
Policymakers can query models without coding
Democratizes access to computational tools

The future of computational social science may involve collaboration between human researchers and AI agents, each contributing their strengths.

Summary¶

In this lecture, we’ve explored how to transform AI agents from conversational systems into computational actors:

✓ Implemented function calling with modern LLM APIs (OpenAI and Anthropic)

✓ Designed JSON schemas to describe tool interfaces to AI agents

✓ Built a network analysis toolkit exposing networkx functions to agents

✓ Understood the Model Context Protocol as a standard for tool interoperability

✓ Analyzed safety considerations for tool use and code execution

✓ Created agents that compute, not just converse - executing real code

Key Takeaways:

Function calling bridges language and computation - agents can DO things, not just describe them
JSON schemas are the interface language - clear descriptions enable agents to use tools correctly
MCP provides standardization - write tools once, use with any AI application
Safety is paramount - unrestricted tool use is dangerous, design with security in mind
Multi-step reasoning emerges - agents chain tool calls to solve complex problems
Domain expertise encoded as tools - computational social science becomes accessible via natural language
Precision matters - tool use gives computed results that can be checked, not the model's estimates

Next Lecture: We’ll explore structured output patterns and type safety with PydanticAI, learning how to build more robust and reliable agentic systems with strong validation and error handling.