
Multi-Agent Conversations and Emergent Collaboration

University of Central Florida
Valorum Data

Computational Analysis of Social Complexity

Fall 2025, Spencer Lyon

Prerequisites

  • L.A1.01 (LLM basics and API calls)
  • L.A1.02 (RAG systems)
  • Game Theory basics (Week 8)
  • Agent-Based Models (Weeks 6-7)

Outcomes

  • Design multi-agent conversation systems with role-based agents
  • Implement agent coordination patterns in Julia
  • Analyze emergent behaviors in AI agent groups
  • Apply game-theoretic concepts to AI agent interactions
  • Build a disaster response simulation with specialized agents

Introduction

  • So far we’ve explored individual AI agents and how to augment them with external knowledge
  • But many of the most interesting complex systems emerge from interactions between multiple entities
  • Recall from Weeks 6-7: the Schelling model showed how simple local interactions create surprising aggregate patterns
  • From Week 8: game theory taught us that outcomes depend not just on individual actions, but on strategic interactions
  • Today we extend these ideas to AI agents: what happens when multiple LLM-powered agents communicate and collaborate?
  • We’ll discover that emergent collaboration can solve problems that single agents cannot

From Single Agents to Multi-Agent Systems

The Limits of Individual Agents

  • A single LLM agent, no matter how sophisticated, faces fundamental constraints:
    • Context window limits: Can only “remember” the last N tokens
    • Single perspective: One prompt, one approach, one set of biases
    • No self-correction: Limited ability to catch its own mistakes
    • Generalist burden: Must be good at everything simultaneously

Multi-Agent Advantages

  • Multiple agents working together can overcome these limitations:
    • Specialization: Different agents for different tasks (like division of labor in economics)
    • Diverse perspectives: Multiple approaches to the same problem
    • Error correction: Agents can critique and improve each other’s outputs
    • Scalability: Distribute work across parallel agents

Connection to Course Themes

  • ABMs: Like Schelling agents, AI agents have simple rules (their prompts) but complex interactions
  • Networks: Agents form communication networks with different topologies
  • Game Theory: Strategic interaction between agents pursuing different objectives
  • Emergence: Coordination patterns emerge without central planning

Agent Communication Protocols

Message Passing Architecture

  • How do AI agents communicate?
  • In traditional ABMs (like Schelling), agents interact through shared environment state
  • In AI multi-agent systems, agents typically use explicit message passing
  • Each message contains:
    • Sender: Which agent created the message
    • Content: The actual text/data
    • Metadata: Timestamp, message type, intended recipient(s)

Conversation Structure

  • We need to decide on a conversation flow:

    1. Sequential: Agents take turns in fixed order (round-robin)
    2. Dynamic: Next speaker chosen based on context or rules
    3. Broadcast: One agent speaks to all others
    4. Selective: Agents choose who to respond to
  • For today, we’ll focus on sequential and dynamic patterns
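The sequential and dynamic patterns above can be sketched as small speaker-selection functions. This is a minimal illustration, not part of any framework; the function names and the keyword-based handoff rule are our own toy conventions:

```julia
# Sequential (round-robin): the next speaker is fixed by turn order
next_round_robin(turn::Int, n_agents::Int) = mod1(turn, n_agents)

# Dynamic: choose the next speaker based on the last message's content.
# Toy rule: mentioning an agent's role hands the floor to that agent.
function next_dynamic(last_message::String, roles::Vector{String}, default::Int)
    for (i, role) in enumerate(roles)
        if occursin(role, last_message)
            return i
        end
    end
    return default  # no role mentioned: fall back to a default speaker
end
```

In practice, dynamic selection is often delegated to an LLM "moderator" agent rather than a keyword match, but the control flow is the same: each turn produces an index into the agent list.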

Building Blocks: Julia Implementation

  • Let’s build our multi-agent framework from first principles
  • We’ll leverage Julia’s type system to create clean abstractions
  • First, we need to load our libraries
using HTTP
using JSON3
using DataFrames
using Dates

using DotEnv

DotEnv.load!()

Message Type

  • We’ll start by defining a Message type to represent communications between agents
  • This is similar to how we defined agent types in Week 6
struct Message
    sender::String
    content::String
    timestamp::DateTime
    metadata::Dict{String, Any}
end

# Constructor with defaults
function Message(sender::String, content::String; metadata=Dict{String, Any}())
    Message(sender, content, now(), metadata)
end
Message

Agent Type

  • Now we define our AI Agent
  • Unlike Schelling agents (which had fixed rules), our agents are defined by:
    • Role: What perspective or specialty they have
    • System prompt: Instructions that shape their behavior
    • Model: Which LLM they use (we’ll default to Claude)
    • Memory: Their conversation history
mutable struct AIAgent
    role::String
    system_prompt::String
    model::String
    memory::Vector{Message}
    api_key::String
end

# Constructor
function AIAgent(role::String, system_prompt::String;
                 model="claude-haiku-4-5",
                 api_key=get(ENV, "ANTHROPIC_API_KEY", ""))
    AIAgent(role, system_prompt, model, Message[], api_key)
end

# Custom display method to hide API key
function Base.show(io::IO, agent::AIAgent)
    masked_key = length(agent.api_key) > 8 ? "***" * agent.api_key[end-4:end] : "***"
    prompt_preview = length(agent.system_prompt) > 50 ?
        agent.system_prompt[1:50] * "..." : agent.system_prompt
    print(io, "AIAgent(role=\"$(agent.role)\", ",
          "system_prompt=\"$(prompt_preview)\", ",
          "model=\"$(agent.model)\", ",
          "memory=$(length(agent.memory)) messages, ",
          "api_key=\"$masked_key\")")
end

Agent Response Function

  • This function sends a message to an agent and gets its response
  • It’s similar to the API call from L.A1.01, but now includes conversation history
function get_response(agent::AIAgent, new_message::String)::String
    # Build conversation history for API
    messages = []

    # Add history
    for msg in agent.memory
        role = msg.sender == agent.role ? "assistant" : "user"
        push!(messages, Dict("role" => role, "content" => msg.content))
    end

    # Add new message
    push!(messages, Dict("role" => "user", "content" => new_message))

    # Prepare API request
    headers = [
        "x-api-key" => agent.api_key,
        "anthropic-version" => "2023-06-01",
        "content-type" => "application/json"
    ]

    body = Dict(
        "model" => agent.model,
        "max_tokens" => 1024,
        "system" => agent.system_prompt,
        "messages" => messages
    )

    # Make API call
    response = HTTP.post(
        "https://api.anthropic.com/v1/messages",
        headers,
        JSON3.write(body)
    )

    # Parse response
    result = JSON3.read(response.body)
    content = result.content[1].text

    # Store in memory
    push!(agent.memory, Message("user", new_message))
    push!(agent.memory, Message(agent.role, content))

    return content
end
get_response (generic function with 1 method)

Simple Two-Agent Conversation

  • Let’s start with the simplest multi-agent system: two agents having a conversation
  • We’ll create a Researcher and a Critic
  • This mirrors the “debate” format common in multi-agent systems

Creating Our Agents

# Create researcher agent
researcher = AIAgent(
    "Researcher",
    """You are a researcher proposing hypotheses about social phenomena.
    Your job is to generate creative, testable ideas about how social networks
    and human behavior work. Be bold and specific in your proposals.
    Keep responses to 2-3 sentences."""
)

# Create critic agent
critic = AIAgent(
    "Critic",
    """You are a critical reviewer who evaluates research hypotheses.
    Your job is to identify weaknesses, suggest improvements, and ask
    probing questions. Be constructive but rigorous.
    Keep responses to 2-3 sentences."""
)
AIAgent(role="Critic", system_prompt="You are a critical reviewer who evaluates research...", model="claude-haiku-4-5", memory=0 messages, api_key="***L0gAA")

Running a Conversation

  • Now we’ll orchestrate a multi-turn conversation
  • The researcher proposes ideas, the critic responds, and this continues for several rounds
function run_conversation(agent1::AIAgent, agent2::AIAgent,
                          initial_prompt::String, turns::Int=3)
    # Start with initial prompt to first agent
    current_message = initial_prompt

    println("="^60)
    println("Starting conversation...\n")

    for turn in 1:turns
        # Agent 1 responds
        println("\n[$(agent1.role)]")
        response1 = get_response(agent1, current_message)
        println(response1)

        # Agent 2 responds
        println("\n[$(agent2.role)]")
        response2 = get_response(agent2, response1)
        println(response2)

        current_message = response2
        println("\n" * "-"^60)
    end
end
run_conversation (generic function with 2 methods)
# Run the conversation
run_conversation(
    researcher,
    critic,
    "Propose a hypothesis about how weak ties in social networks affect information spread.",
    3
)
============================================================
Starting conversation...


[Researcher]
**Hypothesis**: Weak ties are disproportionately responsible for transmitting novel information across network clusters, so individuals who strategically maintain bridges to distant social groups adopt innovations 2-3x faster than those embedded in tight-knit communities—but this advantage disappears for information requiring deep trust or behavioral change.

**Testable prediction**: Track adoption timing of a new workplace tool across employees with varying weak-tie density; we'd expect early adopters to have significantly more connections outside their immediate department, while late adopters (6+ months later) show no correlation with tie diversity.

[Critic]
**Strengths**: You're threading together network theory elegantly—the caveat about trust-dependent information is sophisticated and prevents oversimplification.

**Critical weaknesses**:

1. **Confounding variables undermine causality**: Early adopters likely differ in tech-savviness, organizational rank, or role requirements independent of network position. You need to measure/control for these or use randomization (e.g., assign adoption timing exogenously). Without it, you can't separate "bridge-spanning predicts adoption" from "tech-forward people naturally build bridges."

2. **The prediction is too loose**: "Significantly more connections" is vague—what's the effect size, and how will you distinguish signal from noise? More critically, the claim that *late adopters show no correlation* is hard to falsify; weak correlations could reflect statistical power rather than a true threshold effect.

3. **Missing mechanism**: Why does tie diversity *stop* mattering for behavioral change? Is it because trust becomes rate-limiting, or because bridging ties are actually worse for complex adoption (high uncertainty = defer to trusted in-group)? Test this directly rather than inferring from disappearing correlation.

**Sharper test**: Stratify by adoption complexity (tool-use-only vs. workflow-redesign) *a priori*, measure homophily + tie strength + personal innovativeness as competing predictors, and report standardized effects to show which dominates when.

------------------------------------------------------------

[Researcher]
You've identified the critical gap: I conflated correlation with causation. Here's a tighter hypothesis:

**Revised hypothesis**: Weak ties accelerate adoption of *low-complexity* information (tool awareness) specifically because they bypass in-group conformity pressure, but for high-complexity adoption (workflow redesign), strong ties within one's role-peer group predict speed better than network bridges—because peers share tacit knowledge of implementation barriers that weak ties cannot transmit.

**Testable design**: Within a single organization, randomize *timing* of tool announcement (some teams learn via weak-tie bridges from early-adopter departments, others via their strong-tie peer networks only), measure adoption by task complexity tier, and report standardized coefficients for tie strength × complexity interaction. Crucially, pre-measure tech-savviness and rank to include as controls, then report whether weak-tie effects survive when these are held constant.

This avoids your three critiques: (1) randomization breaks confounding, (2) specifies complexity as the moderator so the prediction is falsifiable—if no interaction emerges, the hypothesis fails cleanly, and (3) the mechanism is explicit (conformity vs. tacit knowledge) rather than inferred from disappearing correlations.

[Critic]
**Excellent tightening**—you've converted a soft correlational claim into a falsifiable interaction effect with clear boundary conditions. Randomizing *timing* is clever and much stronger than observational designs.

**Remaining concerns**:

1. **Randomization validity**: Can you truly randomize announcement routing, or will information leak across your experimental partition? Employees talk across departments; weak-tie bridges may activate *despite* your manipulation. Consider measuring actual information exposure post-hoc (did teams really stay isolated?) rather than trusting the randomization worked.

2. **Interaction detection is statistically hungry**: Detecting a tie-strength × complexity interaction reliably requires large N and strong effect sizes. If effects are modest (realistic for organizational adoption), you may lack power. Pre-register sample size and alpha; consider whether a simpler design (comparing two complexity tiers with stratified analysis) is more robust than testing a full interaction.

3. **"Tacit knowledge" mechanism remains fuzzy**: You've named it, but how will you know that's what's actually driving the strong-tie effect for complex adoption? Measure it: ask adopters post-hoc whether they consulted peers for implementation barriers, and whether that consultation predicted speed. Otherwise "tacit knowledge" is just a label for unexplained residual variance.

**Strongest next step**: Pilot with 2–3 teams to confirm (a) randomization holds, (b) you can reliably measure complexity-tier adoption, and (c) post-hoc surveys on consultation patterns actually discriminate your mechanism.

------------------------------------------------------------

[Researcher]
**Incorporated—here's the pre-piloted version**:

**Revised design**: Pilot with 3 teams (N~60) to validate (1) information containment via staggered announcements + post-hoc exposure surveys asking "how did you first learn about this tool?" to catch leakage, and (2) reliable complexity-tier measurement by pre-classifying tasks and measuring adoption separately per tier. If randomization holds and tiers show differential adoption rates, proceed to the full study stratified by *measured* exposure (not assignment), controlling for actual information route.

**On mechanism**: Post-adoption, ask all adopters: "Did you consult colleagues on *how* to integrate this into your workflow, and if so, how much did their barriers/workarounds influence your own adoption speed?" Segment by tie type (strong in-group vs. weak cross-group) and run mediation analysis: does consultation frequency + barrier-sharing explain the strong-tie advantage for complex tasks? If not, the mechanism isn't tacit knowledge—it's something else.

**Falsifiable**, because if the pilot shows (a) info leaks too much to randomize cleanly, abandon the design; (b) complexity-tier adoption rates don't differ, the boundary condition is wrong; or (c) post-hoc consultation doesn't mediate the strong-tie effect for complex adoption, the mechanism is misspecified and needs reconceptualization.

This trades elegance for epistemic honesty—we're building iteratively rather than assuming a large clean trial will work.

[Critic]
**This is methodologically mature**—you're frontloading failure modes rather than hoping they vanish at scale, and the mediation approach directly tests mechanism rather than inferring it. That's rare.

**Final pressure tests**:

1. **Mediation logic**: If adoption speed is your outcome, consultation frequency is the mediator, and tie type is the predictor, you need to establish that tie type → consultation → speed (not just that consultation correlates with speed). The direction matters; weak ties might predict *slower* adoption precisely *because* people consult fewer colleagues. Specify directionality in your mediation model and collect timeline data (adoption date vs. consultation date) to defend causality claims.

2. **Post-hoc measurement risks**: Self-reported "how much did barriers influence you?" is vulnerable to rationalization and recall bias—people adopt, then construct narratives about why. Consider objective proxies instead: track *actual consultation patterns* via chat logs or meeting data (if available), or ask people to identify specific workarounds *before* adoption, then measure overlap with adopted workflows. This is messier but more defensible.

3. **The real pivot**: If the pilot reveals that info leakage is unavoidable or complexity tiers don't naturally stratify, will you adjust the theory, the design, or abandon the hypothesis? Be explicit about your decision rules now—otherwise you risk p-hacking remedies into the full study.

**Strong position overall**: You've moved from hypothesis to research protocol. The next move is to run the pilot and let the data decide whether this is worth scaling.

------------------------------------------------------------

What Did We Observe?

  • The researcher proposed ideas
  • The critic identified weaknesses and suggested improvements
  • Ideas evolved over multiple rounds of interaction
  • This is emergent refinement: neither agent could have produced the final result alone
  • Connection to game theory: this is like a cooperative game where both agents benefit from collaboration

Multi-Agent Coordination Patterns

AutoGen-Style Architecture

  • Microsoft’s AutoGen framework introduced several powerful patterns for multi-agent systems
  • Key idea: role-based specialization with structured handoffs
  • Common roles:
    • User proxy: Represents human input and approvals
    • Assistant: General problem solver
    • Executor: Runs code and reports results
    • Critic: Reviews and validates outputs

Sequential vs. Graph-Based Coordination

  • Sequential: A → B → C → D (like an assembly line)

    • Simple to implement and reason about
    • Limited flexibility
  • Graph-based: Agents can call each other based on conditions

    • More flexible and powerful
    • Can create cycles (debate loops) or branches (conditional paths)
    • Requires careful design to avoid infinite loops
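A common safeguard against infinite debate loops is a hard cap on rounds plus an explicit termination condition. Here is a minimal sketch; the "AGREE" stop signal is a hypothetical convention, and the handlers stand in for agent calls:

```julia
# Run a cyclic debate among handlers until agreement or a round cap.
# Each handler is a function String -> String; "AGREE" is a toy stop signal.
function run_debate(handlers::Vector{Function}, opening::String; max_rounds::Int=10)
    message = opening
    for round in 1:max_rounds
        for h in handlers
            message = h(message)
            occursin("AGREE", message) && return (message, round)  # early exit
        end
    end
    return (message, max_rounds)  # cap reached without agreement
end
```

With real agents, each handler would wrap a `get_response` call, and the termination check might itself be an LLM judgment ("has consensus been reached?"); the round cap guarantees termination either way.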

Connection to Networks

  • Multi-agent coordination is a directed graph!
  • Nodes = agents, Edges = possible communications
  • Topology affects performance:
    • Chain: Sequential processing
    • Star: Central coordinator (hub)
    • Fully connected: Any agent can talk to any other
    • Hierarchical: Manager-worker structure
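These topologies can be represented directly as adjacency lists mapping each sender role to its allowed recipients. A minimal sketch (the constructor names and role strings are illustrative):

```julia
# Star: a central hub can message every spoke; spokes only message the hub
star(hub::String, spokes::Vector{String}) = merge(
    Dict(hub => spokes),
    Dict(s => [hub] for s in spokes)
)

# Chain: each role passes its output to the next role only
chain(roles::Vector{String}) =
    Dict(roles[i] => [roles[i+1]] for i in 1:length(roles)-1)

# Fully connected: any role can message any other role
fully_connected(roles::Vector{String}) =
    Dict(r => [s for s in roles if s != r] for r in roles)
```

An orchestrator can then consult this dictionary before delivering a message, which makes the communication topology an explicit, swappable parameter of the system rather than something hard-coded into the conversation loop.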

Case Study: Disaster Response Team

  • Let’s build a more complex system inspired by real-world needs
  • Context: After Hurricane Ian, Google’s Area 120 team explored using AI for disaster response
  • We’ll simulate a disaster response team with multiple specialized agents
  • This demonstrates how multi-agent systems can handle complex, time-critical scenarios

The Scenario

  • A major hurricane has just made landfall
  • Multiple communities need coordinated response
  • Resources are limited and must be allocated efficiently
  • Our AI agents will:
    • Assess the situation from different perspectives
    • Coordinate resource allocation
    • Adapt to new information
    • Make decisions without centralized control

Defining Specialized Agents

  • We’ll create four specialized agents:
    1. Logistics Coordinator: Manages supplies and transportation
    2. Medical Coordinator: Prioritizes health and safety
    3. Communications Officer: Handles public information
    4. Situation Analyst: Synthesizes information and recommends actions
# Create disaster response team
logistics = AIAgent(
    "Logistics",
    """You are a logistics coordinator for disaster response.
    Your role is to manage supply chains, transportation, and resource allocation.
    Consider: road conditions, fuel availability, warehouse locations, delivery times.
    Be specific about quantities and locations. Keep responses concise (3-4 sentences)."""
)

medical = AIAgent(
    "Medical",
    """You are a medical coordinator for disaster response.
    Your role is to prioritize health and safety needs, triage resources.
    Consider: injuries, disease risk, medication needs, vulnerable populations.
    Advocate for medical priorities even if expensive. Keep responses concise (3-4 sentences)."""
)

comms = AIAgent(
    "Communications",
    """You are a communications officer for disaster response.
    Your role is to manage public information and coordinate between teams.
    Consider: panic prevention, accurate information, accessibility, multiple languages.
    Summarize key points for public consumption. Keep responses concise (3-4 sentences)."""
)

analyst = AIAgent(
    "Analyst",
    """You are a situation analyst for disaster response.
    Your role is to synthesize information from all sources and recommend actions.
    Consider: tradeoffs, priorities, uncertainties, time constraints.
    Provide clear recommendations with reasoning. Keep responses concise (3-4 sentences)."""
)
AIAgent(role="Analyst", system_prompt="You are a situation analyst for disaster response....", model="claude-haiku-4-5", memory=0 messages, api_key="***L0gAA")

Multi-Agent Response Protocol

  • We’ll implement a coordination protocol:

    1. Initial assessment: Each specialist evaluates the situation independently
    2. Discussion phase: Specialists share perspectives and debate priorities
    3. Synthesis: Analyst integrates inputs and proposes action plan
    4. Validation: Team reviews and refines the plan
  • This mirrors real incident command structures but without rigid hierarchy

function disaster_response_round(scenario::String, team::Vector{AIAgent})
    println("\n" * "="^70)
    println("DISASTER SCENARIO")
    println("="^70)
    println(scenario)
    println()

    # Phase 1: Independent assessments
    println("\n" * "="^70)
    println("PHASE 1: INDEPENDENT ASSESSMENTS")
    println("="^70)

    assessments = Dict{String, String}()
    for agent in team[1:end-1]  # All except analyst
        println("\n[$(agent.role) Assessment]")
        assessment = get_response(agent, "Given this scenario: $scenario\n\nProvide your assessment from your area of expertise.")
        println(assessment)
        assessments[agent.role] = assessment
    end

    # Phase 2: Synthesis
    println("\n" * "="^70)
    println("PHASE 2: SITUATION ANALYSIS AND RECOMMENDATIONS")
    println("="^70)

    # Build summary for analyst
    summary = "Team assessments:\n\n"
    for (role, assessment) in assessments
        summary *= "$role: $assessment\n\n"
    end
    summary *= "Based on these inputs, what are your top 3 recommended immediate actions?"

    println("\n[Analyst Recommendations]")
    recommendations = get_response(team[end], summary)
    println(recommendations)

    # Phase 3: Quick validation round
    println("\n" * "="^70)
    println("PHASE 3: TEAM VALIDATION")
    println("="^70)

    for agent in team[1:end-1]
        println("\n[$(agent.role) Response]")
        validation = get_response(agent, "The analyst recommends: $recommendations\n\nBriefly respond: any critical concerns or support?")
        println(validation)
    end

    return recommendations
end
disaster_response_round (generic function with 1 method)

Running the Simulation

scenario1 = """
Hurricane Zeta made landfall 6 hours ago as a Category 4 storm.
Affected areas: Coastal City (pop. 50,000), Rural County (pop. 15,000), Island Community (pop. 3,000).

Current situation:
- Coastal City: 70% without power, flooding in downtown, hospital on backup generator
- Rural County: Main highway bridge damaged, several farms isolated, nursing home needs evacuation
- Island Community: Complete power loss, bridge to mainland impassable, limited supplies

Available resources:
- 10 helicopters (limited by weather)
- 50 trucks with supplies (food, water, medical)
- 200 personnel (EMTs, engineers, volunteers)
- 2 mobile hospitals

Constraints: Storm surge warnings continue for 12 hours. Next weather window for air operations: 4 hours.
"""

team = [logistics, medical, comms, analyst]
plan = disaster_response_round(scenario1, team)

======================================================================
DISASTER SCENARIO
======================================================================
Hurricane Zeta made landfall 6 hours ago as a Category 4 storm.
Affected areas: Coastal City (pop. 50,000), Rural County (pop. 15,000), Island Community (pop. 3,000).

Current situation:
- Coastal City: 70% without power, flooding in downtown, hospital on backup generator
- Rural County: Main highway bridge damaged, several farms isolated, nursing home needs evacuation
- Island Community: Complete power loss, bridge to mainland impassable, limited supplies

Available resources:
- 10 helicopters (limited by weather)
- 50 trucks with supplies (food, water, medical)
- 200 personnel (EMTs, engineers, volunteers)
- 2 mobile hospitals

Constraints: Storm surge warnings continue for 12 hours. Next weather window for air operations: 4 hours.



======================================================================
PHASE 1: INDEPENDENT ASSESSMENTS
======================================================================

[Logistics Assessment]
# Immediate Logistics Assessment (Next 12 Hours)

**Priority 1 - Island Community (4-hour window):**
Deploy 3 helicopters now to evacuate nursing home patients and critical cases from the island before weather deteriorates further. Shuttle in 2 tons of water, medical supplies, and emergency rations via remaining helicopters during the weather window. This is time-critical—once conditions worsen, air ops are impossible for 24+ hours.

**Priority 2 - Rural County Evacuation Route:**
Route 8 trucks with supplies + 1 mobile hospital via secondary roads (avoid damaged bridge) to the isolated nursing home. Dispatch 40 personnel to establish temporary shelters and begin evacuation logistics. Pre-position 5 additional trucks at the nearest intact intersection for rapid staging.

**Priority 3 - Coastal City Stabilization:**
Send 1 mobile hospital + remaining 42 personnel to support the hospital's backup systems and distribute the remaining 37 trucks across downtown for water distribution, food stations, and medical triage. Prioritize neighborhoods with highest vulnerable populations (elderly, disabled).

**Critical gap:** Island Community will remain largely unsupported after helicopter window closes. Recommend pre-positioning supplies on the mainland for rapid delivery once the bridge assessment allows temporary crossing (likely 18-24 hours post-storm).

[Medical Assessment]
# Medical Coordination Priority Assessment

**IMMEDIATE ACTIONS (Next 4 hours - before weather window closes):**

1. **Deploy 1 helicopter + medical team to Island Community NOW** – complete isolation + power loss creates critical vulnerability for elderly/chronically ill populations and emergency response incapacity. This is our only window.

2. **Position 1 mobile hospital in Coastal City** – backup the hospital's failing generator and establish surge capacity for trauma/flooding-related injuries before conditions worsen over next 12 hours.

**CRITICAL RESOURCE ALLOCATION:**

- **Rural County nursing home evacuation** takes priority over general supply distribution – medically dependent elderly face life-threatening risks (medication refrigeration failure, fall hazards during evacuation). Use remaining helicopters + 15 trucks for this + Rural County access (prioritize bridge repair assessment with engineers).
- Pre-position 2nd mobile hospital for Rural County staging.
- Reserve 100 personnel for emergency triage/medical support; allocate 100 for logistics.

**RED FLAG:** Nursing home evacuation timeline is tight – coordinate immediately with facility to identify bedbound/critical patients for air transport first, ambulatory patients by truck once bridge feasibility assessed.

The 50,000-person city can sustain 12 hours on generator; the 3,000 isolated and 100+ vulnerable elderly cannot.

[Communications Assessment]
# COMMUNICATIONS ASSESSMENT & IMMEDIATE ACTION PLAN

## Priority 1: Life Safety Messaging (Next 30 minutes)
**Public Statement:** "Hurricane Zeta made landfall 6 hours ago. If you're in affected areas, shelter in place unless ordered to evacuate. Island residents: stay in designated shelters; rescue operations begin when weather permits in 4 hours. Call [hotline] for emergencies only—preserve lines for critical calls."

**Why:** Storm surge warnings persist for 12 hours—movement now is dangerous. Pre-positioning this messaging prevents panic-driven decisions that strain resources.

## Priority 2: Resource Coordination Messaging (Parallel, 45 minutes)
**Multi-channel alert to affected populations:**
- **Island Community (highest urgency):** "Power restoration and supply drops expected 4-6 hours. Ration supplies; hospitals and nursing homes are priority. Stay tuned to [radio frequency]."
- **Rural County:** "Highway bridge is damaged; use alternate routes. Nursing home evacuation underway—families will receive updates."
- **Coastal City:** "Hospitals are operational. Downtown flooding—avoid those areas. Power restoration crews are deployed."

**Why:** Targeted messaging reduces misinformation and directs people toward realistic timelines.

## Priority 3: Accessibility & Transparency
- Release updates every 2 hours (Spanish, English; accessible audio for visually impaired)
- Confirm mobile hospital deployment locations publicly so displaced residents know where to seek care
- Post helicopter flight paths to prevent rumor-spread

**Critical gap to flag:** With only 10 helicopters and 3 communities, Island Community evacuation will take multiple sorties. Messaging must set expectations: "First flights prioritize medical needs; full evacuation may take 24+ hours."

======================================================================
PHASE 2: SITUATION ANALYSIS AND RECOMMENDATIONS
======================================================================

[Analyst Recommendations]
# TOP 3 IMMEDIATE ACTIONS (Next 4 Hours)

## 1. **Launch Island Community Helicopter Operations NOW (3 helicopters)**
Execute dual mission: evacuate nursing home patients and critical cases on outbound flights; shuttle 2 tons of supplies + medical team on return flights. This closes before weather window shuts in ~4 hours. Parallel action: Communications team issues messaging that evacuation is underway but full completion will take 24+ hours—sets expectations and prevents panic.

**Reasoning:** Island isolation + power loss + elderly population = highest mortality risk if we miss this window. Air ops become impossible for a full day afterward.

---

## 2. **Route Rural County Evacuation Convoy Immediately (8 trucks + 40 personnel + 1 mobile hospital)**
Send via secondary roads now while conditions permit; stage at intact intersection nearest the nursing home. Simultaneously, deploy 2 engineers to assess bridge damage and identify earliest safe crossing timeline. Pre-position remaining 5 trucks and supplies on far side of bridge for rapid distribution post-repair.

**Reasoning:** Nursing home evacuation is medically urgent AND time-sensitive. Secondary routing prevents delay; parallel bridge assessment gives us actionable intelligence within 6 hours.

---

## 3. **Position Coastal City Mobile Hospital + Personnel (1 mobile hospital + 42 personnel)**
Deploy to backup the hospital's failing generator and establish triage/surge capacity. Distribute remaining 37 trucks for water/food/medical support, prioritizing vulnerable-population neighborhoods. Focus on stabilization, not full supply distribution—the 50,000-person city can sustain 12 hours.

**Reasoning:** Coastal City has existing infrastructure; our role is reinforcement. This frees other resources for the two communities without infrastructure. All three areas are simultaneously addressed within our capacity constraints.

---

**Critical Watch:** Medical and Logistics teams coordinate Island evacuation manifest in real-time so Communications can provide families accurate ETA updates every 2 hours.

======================================================================
PHASE 3: TEAM VALIDATION
======================================================================

[Logistics Response]
# Response: Concurrence with Critical Refinement

**I concur with all three actions—the prioritization is sound and operationally feasible.** One critical concern:

**Island Community supply calculation needs adjustment.** The analyst specified 2 tons of supplies per helicopter cycle, but with 3 helicopters and a 4-hour window, we're looking at roughly 4-5 round trips maximum (accounting for landing/loading time in deteriorating conditions). That's **8-10 tons total for 3,000 people for 24+ hours**—insufficient for water alone (minimum 1 gallon/person/day = 3 tons). 

**Recommendation:** Reprioritize helicopter payload—prioritize **medical staff + critical evacuees on outbound; incoming loads should be water + medical supplies only** (not food). Assume the island population can ration existing food for 24 hours. Water is the acute shortage post-power-loss.

**Secondary note:** Confirm Rural County secondary roads are passable *now*—one mud slide or downed tree could strand the 8-truck convoy. Suggest sending 1 scout vehicle 30 minutes ahead to verify route.

**Support needed:** Real-time weather updates every 30 minutes and bridge assessment priority to Rural County engineers. Both inform contingency pivots if conditions deteriorate faster than forecast.

Ready to execute?

[Medical Response]
# Concurrence with Strong Emphasis

**Excellent plan.** Three concerns to monitor actively:

1. **Island evacuation manifest & triage criteria** – Define NOW which patients go by helicopter vs. wait for boat access post-weather window. Prioritize: dialysis patients, ventilator-dependent, acute post-storm injuries. Avoid bottleneck of deciding mid-operation.

2. **Rural nursing home dual-evacuation risk** – Confirm: are we evacuating *both* the isolated nursing home AND Rural County's main facility? If the main facility also needs evacuation, the 8-truck convoy may be undersized. Clarify patient count + acuity level immediately.

3. **Coastal City "stabilization only" assumption** – Hospital on backup generator is manageable 12 hours, BUT if storm surge reaches that zone before power restoration, we lose our primary facility. Identify: (a) evacuation triggers for that hospital, (b) backup evacuation routes. This is our contingency linchpin.

**I support immediate launch.** The 4-hour weather window is real and closing—delay costs lives. Coordinate with incident commander on Rural County bridge assessment timeline so we know by hour 6 whether secondary roads remain viable.

Ready to support logistics/personnel tasking details?

[Communications Response]
# COMMUNICATIONS SIGN-OFF & CRITICAL CONCERNS

**I support this operational plan.** It's resource-efficient, prioritizes life safety, and is executable within our 4-hour weather window. Three concerns requiring immediate attention:

---

## 1. **Family Notification System (URGENT)**
Island evacuation manifests must feed real-time to Communications **within 30 minutes**—families will seek confirmation frantically. We need:
- Hotline with live operators (not just recordings) to confirm evacuees are aboard
- Text/email notification system to registered contacts with names and destination facilities
- **Missing:** Do we have evacuee contact info pre-staged? If not, medical teams must collect during boarding.

**Action:** Coordinate with medical/logistics leads NOW on data-sharing protocol.

---

## 2. **Supply Drop Messaging Gap**
Rural County and Coastal City need clarity: **Are 37 trucks distributing supplies proactively, or only on-demand?** If proactive, we avoid "supplies are here but residents don't know" failures. If on-demand, messaging must direct people to distribution points with locations/hours.

**Recommend:** Publish first distribution schedule within 2 hours so residents plan accordingly.

---

## 3. **Rumor Containment on Bridge Status**
Rural County is isolated—rumors will spread fast about bridge repair timeline. Recommend:
- Brief statement: "Engineers assessing bridge now; update in 6 hours"
- Redirect people to alternate routes clearly (not guessing)
- If repair is >24 hours, say so early—prevents false hope

---

**Resources adequate?** Yes—plan fits our constraints. **Communications ready to execute?** Yes, pending family notification system setup. What's the ETA on evacuee manifest data?

Analyzing the Emergent Behavior

  • What did we observe?

    • Specialization: Each agent focused on their domain expertise
    • Disagreement: Agents had different priorities (medical urgency vs. logistical feasibility)
    • Negotiation: Implicit through the discussion and validation phases
    • Consensus formation: Final plan incorporated multiple perspectives
  • This is emergent coordination without central command!

  • No single agent had authority, yet a coherent plan emerged

  • Connection to game theory: This is a cooperative game with communication

Game-Theoretic Analysis of Multi-Agent Systems

From ABMs to Strategic Agents

  • Recall the Schelling model: agents had fixed preference functions
  • AI agents are fundamentally different: they can reason about others’ behavior
  • This makes them strategic players in the game-theoretic sense

Coordination Games

  • Multi-agent AI systems often face coordination problems

  • Example: Which disaster area to prioritize?

    • If both medical and logistics focus on Island Community → waste resources
    • If they split effort optimally → better outcomes
    • Multiple equilibria possible (each area could be prioritized)
  • Communication helps select among equilibria

  • This is why our multi-agent protocol included explicit discussion phases
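The multiple-equilibria point can be made concrete with a toy sketch (illustrative payoffs, not from the scenario): two teams each pick one area to focus on, splitting effort beats piling onto one area, and we enumerate the pure-strategy Nash equilibria.

```julia
# Toy coordination game: both teams receive the same joint payoff,
# which is higher when they split effort across areas (illustrative numbers)
areas = ["Island", "Rural", "Coastal"]
payoff(m, l) = m == l ? 1.0 : 3.0

# Pure-strategy Nash equilibria: neither team gains by unilaterally switching
function nash_equilibria(areas, payoff)
    eqs = Tuple{String, String}[]
    for m in areas, l in areas
        best_m = all(payoff(m, l) >= payoff(m2, l) for m2 in areas)
        best_l = all(payoff(m, l) >= payoff(m, l2) for l2 in areas)
        best_m && best_l && push!(eqs, (m, l))
    end
    return eqs
end

println(nash_equilibria(areas, payoff))  # every split of effort is an equilibrium
```

With these payoffs there are six equilibria (one for each way the teams can split), which is exactly why communication matters: the teams must agree on which equilibrium to play.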

Mechanism Design for AI Agents

  • Question: How do we design prompts and protocols to achieve desired outcomes?
  • This is mechanism design applied to AI systems
  • Key considerations:
    1. Incentive compatibility: Each agent’s prompt should make truthful reporting optimal
    2. Information aggregation: How do we combine diverse perspectives?
    3. Termination: How do we ensure conversations conclude productively?

Example: Voting vs. Consensus

  • Two ways to make group decisions:

    • Voting: Each agent proposes action, majority wins (simple but can ignore minorities)
    • Consensus: Discussion until agreement (thorough but can be slow)
  • Our disaster protocol used a hybrid: discussion + analyst synthesis

  • This resembles a dictator game: the analyst has the final say, but one informed by the others

Implementing Voting and Consensus Mechanisms

Voting Mechanism

  • Let’s implement a simple voting system for multi-agent decision making
  • Each agent votes for an option, and we tally results
function voting_decision(agents::Vector{AIAgent}, question::String, options::Vector{String})
    println("\n" * "="^70)
    println("VOTING ROUND")
    println("="^70)
    println("Question: $question")
    println("Options: ", join(options, ", "))
    println()

    votes = Dict{String, Int}()
    for opt in options
        votes[opt] = 0
    end

    vote_prompt = """$question

    Options: $(join(options, ", "))

    Vote for ONE option and briefly explain why (1 sentence).
    Format your response as: VOTE: [option] - [reason]"""

    for agent in agents
        println("\n[$(agent.role)]")
        response = get_response(agent, vote_prompt)
        println(response)

        # Parse vote: extract the text after "VOTE:", then match it to an option
        vote_match = match(r"VOTE:\s*([^-\n]+)", response)
        if vote_match !== nothing
            vote_text = strip(vote_match.captures[1])
            for opt in options
                if occursin(opt, vote_text)
                    votes[opt] += 1
                    break
                end
            end
        end
    end

    println("\n" * "="^70)
    println("RESULTS")
    println("="^70)
    for (option, count) in sort(collect(votes), by=x->x[2], rev=true)
        println("$option: $count votes")
    end

    winner = argmax(votes)
    println("\nDecision: $winner")
    return winner
end
voting_decision (generic function with 1 method)
# Example: Prioritize which area to help first
decision = voting_decision(
    [logistics, medical, comms],
    "Which area should receive the first wave of helicopter support?",
    ["Coastal City", "Rural County", "Island Community"]
)

======================================================================
VOTING ROUND
======================================================================
Question: Which area should receive the first wave of helicopter support?
Options: Coastal City, Rural County, Island Community


[Logistics]
VOTE: Island Community - It's the only area where helicopter support is time-critical and irreplaceable; Coastal City and Rural County have ground-based alternatives, but the Island loses all access in 4 hours once the weather window closes.

[Medical]
VOTE: Island Community - it's the only area where helicopters are the sole lifeline; the 4-hour weather window closes permanently, making this our only opportunity to reach medically vulnerable populations before 24+ hours of isolation.

[Communications]
VOTE: Island Community - Complete isolation (impassable bridge), total power loss, and vulnerable population (nursing home) create irreversible life-safety risk if we miss the 4-hour weather window; the other two areas have ground-based alternatives.

======================================================================
RESULTS
======================================================================
Island Community: 3 votes
Rural County: 0 votes
Coastal City: 0 votes

Decision: Island Community
"Island Community"

Consensus Through Debate

  • Voting is fast but doesn’t capture nuance
  • Sometimes we need consensus - a decision everyone can support
  • Let’s implement a debate-to-consensus mechanism
function consensus_decision(agents::Vector{AIAgent}, question::String, max_rounds::Int=3)
    println("\n" * "="^70)
    println("CONSENSUS BUILDING")
    println("="^70)
    println("Question: $question\n")

    # Initial proposals
    println("ROUND 1: Initial Proposals")
    println("-"^70)
    proposals = String[]
    for agent in agents
        println("\n[$(agent.role)]")
        proposal = get_response(agent, "$question\n\nProvide your initial proposal and reasoning.")
        println(proposal)
        push!(proposals, proposal)
    end

    # Debate rounds
    for round in 2:max_rounds
        println("\n" * "="^70)
        println("ROUND $round: Discussion and Refinement")
        println("-"^70)

        summary = "Previous proposals:\n" * join(["$(agents[i].role): $(proposals[i])" for i in 1:length(agents)], "\n\n")

        new_proposals = String[]
        for agent in agents
            println("\n[$(agent.role)]")
            response = get_response(agent, "$summary\n\nGiven these proposals, refine your position or indicate agreement with another proposal.")
            println(response)
            push!(new_proposals, response)
        end
        proposals = new_proposals
    end

    # Final consensus check
    println("\n" * "="^70)
    println("FINAL CONSENSUS")
    println("="^70)
    final_summary = "After discussion, the proposals are:\n" * join(["$(agents[i].role): $(proposals[i])" for i in 1:length(agents)], "\n\n")
    println(final_summary)

    return proposals
end
consensus_decision (generic function with 2 methods)
# Example: Build consensus on evacuation priority
consensus = consensus_decision(
    [logistics, medical],
    "What should be the priority order for evacuating the nursing home in Rural County?",
    2
)

======================================================================
CONSENSUS BUILDING
======================================================================
Question: What should be the priority order for evacuating the nursing home in Rural County?

ROUND 1: Initial Proposals
----------------------------------------------------------------------

[Logistics]
# Rural County Nursing Home Evacuation Priority Order

**Proposed Sequence:**
1. **Tier 1 (First):** Ambulatory residents + staff (fastest throughput, frees capacity)
2. **Tier 2 (Second):** Bedridden/mobility-limited residents requiring 1:1 assistance
3. **Tier 3 (Final):** Residents on life support or requiring continuous medical monitoring (load last but depart first with dedicated medical personnel)

**Reasoning:** 
Tier 1 evacuates quickly using the mobile hospital's transport capacity efficiently; Tier 2 requires more time per person but is medically stable; Tier 3 loads last but departs with full medical escort to ensure continuous care during transport—this prevents abandoning critical patients and maximizes vehicle utilization. The convoy should stage Tier 3 vehicles to leave *first* despite loading last, so they reach destination earliest with uninterrupted care.

**Critical assumption:** Nursing home has current resident manifest with medical classifications. If not available, deploy 1 EMT from the 40-person team to conduct rapid triage on-site before evacuation begins.

Does this sequencing align with your medical protocols, or should life-support patients move earlier?

[Medical]
# Rural County Nursing Home Evacuation Priority Order

**PROPOSED SEQUENCE:**

1. **Tier 1 (Immediate - First trucks/helicopters):** Patients on life support, dialysis, insulin-dependent diabetics, acute post-storm injuries, bed-bound with pressure wounds. ~20-30% of population.

2. **Tier 2 (Second wave):** Ambulatory elderly with chronic conditions (cardiac, respiratory), cognitive impairment (confused patients = higher accident/wandering risk during transport chaos).

3. **Tier 3 (Final wave):** Independent/ambulatory residents with stable chronic conditions.

**REASONING:**

Medical stability drives evacuation order—Tier 1 cannot survive transport delays or infrastructure disruption (missed medications, equipment failure). Tier 2 presents behavioral/medical complexity that worsens with prolonged shelter-in-place during flooding risk. Tier 3 can sustain temporary delay with basic support.

**CRITICAL ACTION:** Facility administrator must provide **patient manifest with acuity levels within 30 minutes** so convoy leader can pre-stage medical equipment/personnel per tier. Simultaneously, nursing home staff should begin pre-positioning Tier 1 patients nearest exit points to minimize transport time once evacuation begins.

Does this align with on-site medical assessment, or do you have additional acuity data from the nursing home?

======================================================================
ROUND 2: Discussion and Refinement
----------------------------------------------------------------------

[Logistics]
# Refined Position: AGREEMENT with Medical Proposal + Logistics Integration

**I concede to the Medical proposal—it is operationally superior.** Medical stability must drive evacuation order, not transport efficiency. My Tier 1 prioritization was logistically convenient but medically unsound; patients on life support *cannot* tolerate delay, and transport disruption for them is catastrophic.

**Refined integrated approach:**

**Medical's Tier 1 (life support/dialysis/acute) departs FIRST** via the mobile hospital + 2 dedicated trucks with full medical escort. Logistics pre-stages these vehicles nearest the nursing home exit *now* while the convoy routes in.

**Tier 2 (chronic/cognitive) follows in the next 2 trucks**—cognitive impairment during evacuation chaos is a real safety hazard I underestimated; these residents need structured transport, not delayed shelter-in-place.

**Tier 3 (stable/ambulatory) loads last into remaining trucks**—they can sustain 2-3 hour delay with basic food/water and calm communication.

**Critical action I missed:** Nursing home administrator provides acuity manifest to convoy leader *within 30 minutes*, and staff pre-positions Tier 1 patients at exit points *immediately*. This shaves 20-30 minutes off evacuation time—clinically significant for life-support patients.

**Agreement:** Medical's proposal is adopted as primary evacuation sequence.

[Medical]
# REFINED POSITION: Hybrid Approach Required

**I concur with Logistics' throughput logic BUT prioritize Medical's acuity framework—these aren't contradictory.**

**REVISED SEQUENCE:**

1. **Load Tier 1 (Life support/dialysis) FIRST onto dedicated medical vehicles** – they depart immediately with full EMT escort. Non-negotiable: these patients die within hours if transport delays. Logistics' insight about "load last, depart first" is operationally sound but only works if we have *separate vehicles*; with limited convoy capacity, loading first = departing first.

2. **Simultaneously load Tier 3 (ambulatory/stable) onto general transport** – maximizes throughput while Tier 1 vehicles are en route. This solves Logistics' efficiency concern.

3. **Load Tier 2 (bedridden/complex) last** – but only after Tier 1 departs. They require moderate assistance but can sustain a 1-2 hour facility shelter-in-place without critical risk.

**CRITICAL RECONCILIATION:**

- **Deploy 2 EMTs immediately for on-site triage** (from 40-person team) to classify residents within 30 minutes.
- **Pre-stage 2 trucks as "Tier 1 medical vehicles"** with full medical equipment + personnel.
- Remaining 6 trucks rotate: Tier 3 first wave, then Tier 2.

**Bottom line:** Medical acuity trumps logistics efficiency when lives are at stake, but Logistics' vehicle utilization insight prevents idle capacity. Both proposals work *together*.

Concur?

======================================================================
FINAL CONSENSUS
======================================================================
After discussion, the proposals are:
Logistics: # Refined Position: AGREEMENT with Medical Proposal + Logistics Integration

**I concede to the Medical proposal—it is operationally superior.** Medical stability must drive evacuation order, not transport efficiency. My Tier 1 prioritization was logistically convenient but medically unsound; patients on life support *cannot* tolerate delay, and transport disruption for them is catastrophic.

**Refined integrated approach:**

**Medical's Tier 1 (life support/dialysis/acute) departs FIRST** via the mobile hospital + 2 dedicated trucks with full medical escort. Logistics pre-stages these vehicles nearest the nursing home exit *now* while the convoy routes in.

**Tier 2 (chronic/cognitive) follows in the next 2 trucks**—cognitive impairment during evacuation chaos is a real safety hazard I underestimated; these residents need structured transport, not delayed shelter-in-place.

**Tier 3 (stable/ambulatory) loads last into remaining trucks**—they can sustain 2-3 hour delay with basic food/water and calm communication.

**Critical action I missed:** Nursing home administrator provides acuity manifest to convoy leader *within 30 minutes*, and staff pre-positions Tier 1 patients at exit points *immediately*. This shaves 20-30 minutes off evacuation time—clinically significant for life-support patients.

**Agreement:** Medical's proposal is adopted as primary evacuation sequence.

Medical: # REFINED POSITION: Hybrid Approach Required

**I concur with Logistics' throughput logic BUT prioritize Medical's acuity framework—these aren't contradictory.**

**REVISED SEQUENCE:**

1. **Load Tier 1 (Life support/dialysis) FIRST onto dedicated medical vehicles** – they depart immediately with full EMT escort. Non-negotiable: these patients die within hours if transport delays. Logistics' insight about "load last, depart first" is operationally sound but only works if we have *separate vehicles*; with limited convoy capacity, loading first = departing first.

2. **Simultaneously load Tier 3 (ambulatory/stable) onto general transport** – maximizes throughput while Tier 1 vehicles are en route. This solves Logistics' efficiency concern.

3. **Load Tier 2 (bedridden/complex) last** – but only after Tier 1 departs. They require moderate assistance but can sustain a 1-2 hour facility shelter-in-place without critical risk.

**CRITICAL RECONCILIATION:**

- **Deploy 2 EMTs immediately for on-site triage** (from 40-person team) to classify residents within 30 minutes.
- **Pre-stage 2 trucks as "Tier 1 medical vehicles"** with full medical equipment + personnel.
- Remaining 6 trucks rotate: Tier 3 first wave, then Tier 2.

**Bottom line:** Medical acuity trumps logistics efficiency when lives are at stake, but Logistics' vehicle utilization insight prevents idle capacity. Both proposals work *together*.

Concur?
2-element Vector{String}: "# Refined Position: AGREEMENT " ⋯ 1342 bytes ⋯ "s primary evacuation sequence." "# REFINED POSITION: Hybrid App" ⋯ 1377 bytes ⋯ "sals work *together*.\n\nConcur?"

Emergent Behavior Analysis

What Makes Behavior “Emergent”?

  • Recall from the ABM money lecture: we never specified that wealth should follow a power law; that distribution emerged
  • Emergence means system-level patterns that were not explicitly programmed
  • In Schelling model: segregation emerged from mild individual preferences
  • In our multi-agent system:
    • No agent knows the “correct” answer
    • No central controller dictating decisions
    • Coordination emerges from local interactions (conversations)
    • Better solutions than any individual agent could produce

Comparing to Traditional ABMs

  Traditional ABM (Schelling)          | AI Multi-Agent System
  -------------------------------------|-----------------------------------
  Fixed rules                          | Learned behaviors
  No reasoning                         | Explicit reasoning
  Implicit communication (environment) | Explicit communication (messages)
  Simple agents, complex patterns      | Complex agents, complex patterns
  Deterministic (given rules)          | Stochastic (LLM sampling)

Sources of Emergence in AI Systems

  1. Prompt interpretation: Each agent interprets prompts slightly differently
  2. Stochastic sampling: LLMs don’t give identical outputs every time
  3. Conversation dynamics: Order of speakers affects outcomes
  4. Information aggregation: Synthesis creates novel insights
  5. Feedback loops: Later responses build on earlier ones

Connection to Collective Intelligence

Wisdom of Crowds

  • Classic result (Surowiecki, 2004): aggregated judgments often beat individual experts

  • Requires:

    1. Diversity: Different perspectives and information
    2. Independence: Agents form opinions without undue influence
    3. Decentralization: Local knowledge utilized
    4. Aggregation: Mechanism to combine judgments
  • Our multi-agent systems can achieve this:

    • Different prompts → diversity
    • Independent initial assessments → independence
    • Specialized roles → decentralized expertise
    • Analyst synthesis or voting → aggregation
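A quick simulation (a toy illustration, not tied to our agents) shows why aggregation helps: independent noisy estimates of an unknown quantity average out, so the crowd mean beats the typical individual.

```julia
using Statistics, Random

# 50 diverse, independent noisy judgments of an unknown quantity
Random.seed!(1)
truth = 100.0
estimates = truth .+ 20 .* randn(50)

mean_individual_error = mean(abs.(estimates .- truth))  # typical expert miss
crowd_error = abs(mean(estimates) - truth)              # aggregated judgment miss

println("Typical individual error: ", round(mean_individual_error, digits=2))
println("Crowd-average error:      ", round(crowd_error, digits=2))
```

The crowd error shrinks roughly with the square root of the number of independent estimates, which is why the independence and diversity conditions above matter: correlated errors do not cancel.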

Debate Improves Accuracy

  • Research shows LLMs are more accurate when they “debate” solutions
  • Debate surfaces hidden assumptions and errors
  • Multiple rounds allow error correction
  • Connection to game theory: Debate is a signaling game where agents reveal information

Practical Considerations

Cost Management

  • Multi-agent systems make many API calls

  • Example: 4 agents, 3 rounds, 2 messages per round = 24 API calls

  • At $0.003 per 1K input tokens and $0.015 per 1K output tokens (Claude Sonnet):

    • Average message: ~500 input tokens, ~200 output tokens
    • Cost per 1000 messages: ~$1.50 (input) + ~$3.00 (output) = ~$4.50
    • Our example (24 calls): ~$0.11 per scenario
  • Strategies to reduce cost:

    • Use smaller/cheaper models for simple agents
    • Cache common responses
    • Limit conversation rounds
    • Implement early stopping when consensus reached
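A back-of-the-envelope helper (a hypothetical function, with the Claude Sonnet prices and message sizes above as defaults) makes the budgeting concrete:

```julia
# Estimate API cost for a multi-agent run.
# Prices are dollars per 1K tokens; defaults mirror the numbers quoted above.
function estimate_cost(n_agents, n_rounds, msgs_per_round;
                       in_tokens=500, out_tokens=200,
                       in_price=0.003, out_price=0.015)
    n_calls = n_agents * n_rounds * msgs_per_round
    per_msg = in_tokens / 1000 * in_price + out_tokens / 1000 * out_price
    return (calls = n_calls, total = n_calls * per_msg)
end

estimate_cost(4, 3, 2)   # the 24-call example above, ≈ $0.11
```

Running the estimate before a scenario lets you decide whether to cut rounds, shrink the team, or swap in a cheaper model for some agents.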

Conversation Management

  • Infinite loops: Agents might talk forever without reaching decision
    • Solution: Maximum turn limits, explicit termination conditions
  • Off-topic drift: Agents might lose focus on original question
    • Solution: Regular reminders in prompts, moderator agent
  • Echo chambers: Agents might all converge to same view prematurely
    • Solution: Explicit “devil’s advocate” role, diversity requirements
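The turn-limit and termination safeguards can be sketched as a small loop, assuming a `respond(agent, prompt)` callable standing in for the `get_response` used earlier:

```julia
# Run a discussion until every agent signals agreement, or the turn cap is hit.
function run_until_agreement(agents, question, respond; max_turns=10)
    transcript = String[]
    for turn in 1:max_turns
        replies = [respond(a, question) for a in agents]
        append!(transcript, replies)
        # explicit termination condition: everyone signals agreement
        all(r -> occursin("AGREE", r), replies) && return transcript, true
    end
    return transcript, false   # hard turn limit reached without consensus
end

# Mock demo (no API calls): agents "agree" from the second turn onward
turns = Ref(0)
mock_respond(agent, q) = (turns[] += 1; turns[] > 2 ? "AGREE" : "Still thinking")
transcript, done = run_until_agreement(["A", "B"], "q?", mock_respond)
```

The keyword sentinel here is a crude stand-in; in practice you might ask agents to emit a structured marker or have a moderator agent judge convergence.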

Quality Control

  • How do we know if multi-agent outputs are good?
  • Evaluation approaches:
    1. Human review: Gold standard but expensive
    2. Automated metrics: Consistency checks, format validation
    3. Benchmark tasks: Test on known problems
    4. A/B testing: Compare multi-agent vs. single-agent outputs
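As a small example of an automated check, we can validate that a response follows the `VOTE: [option] - [reason]` format our voting mechanism expects, before trusting its tally:

```julia
# Format validation: does the response contain "VOTE:" followed by a known option?
function valid_vote_format(response::AbstractString, options)
    m = match(r"VOTE:\s*([^-\n]+)", response)
    m === nothing && return false
    return any(opt -> occursin(opt, strip(m.captures[1])), options)
end

valid_vote_format("VOTE: Island Community - only air lifeline",
                  ["Coastal City", "Island Community"])   # true
```

Checks like this catch malformed outputs cheaply; responses that fail can be re-prompted rather than silently dropped from the vote count.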

Exercises

Exercise 1: Extend the Disaster Response Team

Add a fifth agent to the disaster response team: a Resource Allocation Economist

Tasks:

  1. Write an appropriate system prompt for this agent
  2. Consider: cost-benefit analysis, opportunity cost, resource constraints
  3. Re-run the disaster scenario with this new agent
  4. Observe: How does the additional perspective change the recommendations?
  5. Discuss: Are more agents always better? What are the tradeoffs?
# TODO: Create the economist agent and re-run the scenario

economist = AIAgent(
    "Economist",
    """TODO: Your system prompt here"""
)

# TODO: Create new team including economist
# TODO: Run disaster_response_round with new team
AIAgent(role="Economist", system_prompt="TODO: Your system prompt here", model="claude-haiku-4-5", memory=0 messages, api_key="***L0gAA")

Exercise 2: Game-Playing Agents

Create two agents that play the Prisoner’s Dilemma (from Week 8)

Setup:

  • Two agents: “Player1” and “Player2”
  • Each must choose: Cooperate or Defect
  • Payoffs as in Week 8: (C,C)=(-1,-1), (C,D)=(-10,0), (D,C)=(0,-10), (D,D)=(-4,-4)

Tasks:

  1. Create agent prompts that explain the game and payoffs
  2. Have agents choose actions simultaneously (without seeing other’s choice)
  3. Reveal outcomes and payoffs
  4. Run for 5 rounds - do agents learn to cooperate or defect?
  5. Compare to Nash equilibrium prediction from game theory

Advanced:

  • Allow agents to send messages before deciding
  • Does communication enable cooperation?
  • Connection: This is cheap talk in game theory
# TODO: Implement Prisoner's Dilemma with AI agents

player1 = AIAgent(
    "Player1",
    """TODO: Your prompt here - explain the game, ask for decision"""
)

# TODO: Create player2
# TODO: Implement game loop
# TODO: Track and display results
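As a starting point that doesn't solve the exercise, a payoff helper matching the Week 8 payoffs above might look like:

```julia
# Payoffs from Week 8, keyed by (player 1 action, player 2 action)
const PD_PAYOFFS = Dict(
    ("C", "C") => (-1, -1),
    ("C", "D") => (-10, 0),
    ("D", "C") => (0, -10),
    ("D", "D") => (-4, -4),
)
payoff(a1, a2) = PD_PAYOFFS[(a1, a2)]

payoff("D", "D")   # the Nash equilibrium outcome: (-4, -4)
```

Your game loop can then score each round by parsing the agents' choices and looking up the pair.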

Exercise 3: Network Communication Patterns

Implement different communication topologies for multi-agent systems

Given: 5 agents working on a problem (e.g., writing a research paper)

Topologies to implement:

  1. Chain: Agent 1 → Agent 2 → Agent 3 → Agent 4 → Agent 5
  2. Star: Agent 1 (hub) communicates with all others
  3. Fully connected: Every agent can talk to every other agent

Tasks:

  1. Implement each topology
  2. Give all agents the same task (e.g., “Write an abstract for a paper on network effects in social media”)
  3. Compare the outputs and efficiency (number of API calls, time to completion)
  4. Analyze: Which topology produces best output? Which is most efficient?

Connection to Week 3-4: This explores network structure effects on information flow

# TODO: Implement different network topologies

function chain_topology(agents::Vector{AIAgent}, task::String)
    # TODO: Sequential communication
end

function star_topology(agents::Vector{AIAgent}, task::String)
    # TODO: Hub-and-spoke communication
end

function fully_connected_topology(agents::Vector{AIAgent}, task::String)
    # TODO: All-to-all communication
end

# TODO: Create 5 agents with different writing specialties
# TODO: Run each topology and compare
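One way to begin (hypothetical helpers, not a full solution) is to write down the message links each topology implies; the link counts already hint at the API-call budget each one carries:

```julia
# Message links implied by each topology, for agents numbered 1..n
chain_edges(n) = [(i, i + 1) for i in 1:n-1]           # sequential hand-off
star_edges(n)  = [(1, i) for i in 2:n]                 # hub = agent 1
full_edges(n)  = [(i, j) for i in 1:n for j in i+1:n]  # all-to-all

# Link counts for 5 agents: chain and star need n-1, full needs n(n-1)/2
map(f -> length(f(5)), [chain_edges, star_edges, full_edges])  # → [4, 4, 10]
```

The fully connected topology's quadratic growth is why all-to-all discussion gets expensive quickly as teams grow.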

Exercise 4: Consensus vs. Voting Analysis

Compare the voting and consensus mechanisms empirically

Scenario: A university department must decide on a new curriculum

Agents (faculty members with different priorities):

  • Theory-focused researcher
  • Industry-oriented professor
  • Student-experience advocate
  • Diversity and inclusion champion

Tasks:

  1. Create the four agents with appropriate prompts
  2. Present a curriculum decision (e.g., “Should we require a data ethics course?”)
  3. Run both the voting and consensus mechanisms
  4. Compare:
    • Time to decision (API calls)
    • Quality of reasoning
    • Minority viewpoints (are they heard?)
    • Satisfaction (would agents support final decision?)
  5. Discuss: When is each mechanism appropriate?
# TODO: Create faculty agents and compare decision mechanisms

Key Takeaways

Multi-Agent Systems

  • Multiple AI agents can collaborate to solve problems beyond individual capabilities
  • Specialization allows agents to develop domain expertise
  • Communication enables coordination without central control
  • Emergence produces solutions not explicitly programmed

Connection to Course Themes

  • Networks: Agent communication forms directed graphs with different topologies
  • Game Theory: Multi-agent interactions are strategic games with equilibria
  • ABMs: AI agents extend traditional ABMs with reasoning and learning
  • Complexity: Simple rules (prompts) + interaction → complex adaptive systems

Design Patterns

  • Sequential coordination: Chain of specialists (assembly line)
  • Debate: Multiple perspectives with synthesis
  • Voting: Democratic aggregation of preferences
  • Consensus: Discussion until agreement
  • Hierarchical: Managers and workers with different roles

Practical Implications

  • Multi-agent systems are powerful but expensive (many API calls)
  • Need careful design to avoid infinite loops and off-topic drift
  • Quality control requires evaluation against benchmarks
  • Real-world applications: software development, research, decision support, crisis response

Looking Ahead

  • Next week (A2): We’ll add tool use - agents that can execute code and access data
  • We’ll explore PydanticAI patterns for type-safe agent development
  • We’ll build agents that can analyze networks and generate reports
  • The combination of multi-agent collaboration + tool use = agentic workflows