
Retrieval-Augmented Generation: Augmenting AI with External Knowledge

University of Central Florida
Valorum Data

Computational Analysis of Social Complexity

Fall 2025, Spencer Lyon

Prerequisites

  • L.A1.01 (LLMs and API calls)
  • Basic understanding of vector spaces
  • Networks concepts (Weeks 3-5)

Required Julia Packages

Install the following packages if you haven’t already:

using Pkg
Pkg.add(["HTTP", "JSON3", "DotEnv", "LinearAlgebra", "DataFrames"])

Outcomes

  • Understand the limitations of parametric knowledge in LLMs
  • Implement a complete Retrieval-Augmented Generation (RAG) system
  • Work with embeddings and perform vector similarity search
  • Design knowledge bases for domain-specific AI applications
  • Analyze trade-offs between parametric and retrieval-based knowledge


The Knowledge Problem

  • In the previous lecture we learned how to interact with LLMs via API calls
  • We saw that these models can be remarkably helpful for various tasks
  • However, LLMs have a fundamental limitation: they only “know” what was in their training data
  • This creates several problems that we’ll explore now

Problem 1: Knowledge Cutoffs

  • LLMs are trained on data collected up to a specific date
  • For example, GPT-4 (original) had a knowledge cutoff of September 2021
  • Claude 3.5 Sonnet has a knowledge cutoff of April 2024
  • Any events, research, or information after the cutoff date is unknown to the model
  • This means:
    • Can’t answer questions about recent events
    • Doesn’t know about new research papers or discoveries
    • Unaware of updated statistics or data

Example: If you ask GPT-4 (original) about a paper published in 2023, it will either say it doesn’t know or, worse, hallucinate details about a paper it has never seen.

Problem 2: Hallucination

  • LLMs are trained to generate plausible-sounding text
  • When asked about something they don’t know, they often hallucinate: generate confident-sounding but factually incorrect information
  • This is particularly dangerous because:
    • The model sounds authoritative
    • Users may not realize the information is false
    • The hallucinated content can include fake citations, statistics, or facts

Example: Ask an LLM about a research paper that doesn’t exist. It might generate a plausible abstract, author names, and even “key findings” - all completely fabricated.

Problem 3: Domain-Specific Knowledge

  • Even within their training data, LLMs have uneven knowledge
  • Some domains are well-represented in training data (e.g., popular programming languages)
  • Others are sparse (e.g., specialized academic subfields, proprietary company data)
  • For your organization’s specific documents, data, or knowledge base, the LLM knows nothing

Example: You want an AI assistant to help with your company’s internal policies, product documentation, or research papers. The LLM has never seen these documents.

Connection to Network Information Flow

  • Recall from Weeks 3-4 our study of information diffusion in networks
  • We saw how information flows through network structures
  • Key insight: who you’re connected to determines what information you can access
  • LLMs face a similar problem:
    • Their “connections” are frozen at training time
    • They can’t access new information sources
    • They can’t form new “edges” to external knowledge
  • RAG solves this by creating a dynamic connection between the LLM and external knowledge sources
  • Think of RAG as giving the LLM the ability to form new edges in the information network on-demand

The Solution: Retrieval-Augmented Generation

  • Instead of relying solely on parametric knowledge (what’s encoded in model weights), we can augment the LLM with retrieval
  • Basic idea:
    1. User asks a question
    2. Search a knowledge base for relevant information
    3. Provide that information to the LLM as context
    4. LLM generates answer based on retrieved information
  • This approach:
    • Eliminates knowledge cutoff problem (use up-to-date sources)
    • Reduces hallucination (model works from provided facts)
    • Enables domain-specific applications (use your own documents)
    • Provides citations (you know which documents were used)

Embeddings: Representing Meaning as Vectors

  • To implement retrieval, we need to search for relevant documents
  • But how do we determine if a document is relevant to a query?
  • The key innovation: embeddings - representations of text as high-dimensional vectors
  • Embeddings capture semantic meaning: similar meanings → similar vectors

What are Embeddings?

  • An embedding is a function $f: \text{Text} \rightarrow \mathbb{R}^d$ that maps text to a $d$-dimensional vector
  • Typically $d$ ranges from 384 to 3072 dimensions
  • Key property: texts with similar meanings have similar embeddings (vectors close together in vector space)

Intuition: Think of each dimension as capturing some aspect of meaning

  • One dimension might capture “technical vs casual”
  • Another might capture “positive vs negative sentiment”
  • Another might capture “about networks vs about game theory”
  • With hundreds or thousands of dimensions, we can capture nuanced semantic relationships

Example: Word Analogies

One famous property of embeddings is that they capture relational meaning:

embedding("king")embedding("man")+embedding("woman")embedding("queen")\text{embedding}(\text{"king"}) - \text{embedding}(\text{"man"}) + \text{embedding}(\text{"woman"}) \approx \text{embedding}(\text{"queen"})

This works because the embedding space organizes concepts by their relationships:

  • The vector from “man” to “king” represents “royalty”
  • Adding that same vector to “woman” gives you something close to “queen”

Similar patterns work for other relationships:

  • Paris - France + Germany ≈ Berlin
  • Walked - Walking + Swimming ≈ Swam
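
To make this concrete, here is a small sketch that checks the king/queen analogy numerically. It assumes the get_embedding and cosine_similarity helpers we define later in this lecture, and modern sentence-embedding models (like text-embedding-3-small) only capture this pattern approximately; the classic demonstrations used word embeddings such as word2vec or GloVe:

# Sketch: test the king/queen analogy numerically
# (assumes get_embedding and cosine_similarity, defined later in this lecture)
king, man, woman, queen = get_embedding.(["king", "man", "woman", "queen"])
analogy = king .- man .+ woman

println("similarity(analogy, queen) = ", round(cosine_similarity(analogy, queen), digits=3))
println("similarity(analogy, king)  = ", round(cosine_similarity(analogy, king), digits=3))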

Measuring Similarity: Cosine Similarity

  • Once we have embeddings, we need a way to measure similarity between vectors
  • The most common metric is cosine similarity
  • For two vectors $\mathbf{a}$ and $\mathbf{b}$:

$$\text{cosine\_similarity}(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\| \, \|\mathbf{b}\|} = \frac{\sum_{i=1}^d a_i b_i}{\sqrt{\sum_{i=1}^d a_i^2} \, \sqrt{\sum_{i=1}^d b_i^2}}$$
  • Returns a value between -1 and 1
    • 1: vectors point in same direction (very similar)
    • 0: vectors are orthogonal (unrelated)
    • -1: vectors point in opposite directions (opposite meaning)
  • Why cosine and not Euclidean distance?
    • Cosine measures angle, not magnitude
    • Makes sense for text: a longer document isn’t necessarily “farther” in meaning
    • Embedding models are typically optimized for cosine similarity

Creating Embeddings with OpenAI

Let’s see how to create embeddings using OpenAI’s API.

Setup: Before running the code below, make sure you have a .env file in your working directory with your OpenAI API key:

OPENAI_API_KEY=sk-your-api-key-here

The DotEnv package will automatically load these environment variables from the .env file.

using HTTP, JSON3, DotEnv

# Load environment variables from .env file
DotEnv.load!()

# Load API key from environment
OPENAI_API_KEY = ENV["OPENAI_API_KEY"]

function get_embedding(text::String; model="text-embedding-3-small")
    """
    Get embedding vector for input text using OpenAI's API

    Args:
        text: Input text to embed
        model: Embedding model to use (default: text-embedding-3-small)

    Returns:
        Vector of floats representing the embedding
    """
    url = "https://api.openai.com/v1/embeddings"

    headers = [
        "Authorization" => "Bearer $(OPENAI_API_KEY)",
        "Content-Type" => "application/json"
    ]

    body = JSON3.write(Dict(
        "input" => text,
        "model" => model
    ))

    response = HTTP.post(url, headers, body)
    result = JSON3.read(String(response.body))

    return Vector(result.data[1].embedding)
end
get_embedding (generic function with 1 method)

Let’s test this with some network-related concepts:

# Embed some network-related texts
texts = [
    "Graph theory studies the mathematical structure of networks",
    "The clustering coefficient measures the tendency of nodes to form triangles",
    "Preferential attachment leads to power law degree distributions",
    "I enjoy eating pizza with extra cheese"
]

embeddings = [get_embedding(text) for text in texts]

println("Embedding dimension: ", length(embeddings[1]))
println("First few values of first embedding: ", embeddings[1][1:5])
Embedding dimension: 1536
First few values of first embedding: [-0.059553657, -0.029068848, 0.023821464, -0.023550766, -0.0076992884]

Now let’s compute cosine similarity between these texts:

using LinearAlgebra

function cosine_similarity(a::Vector{Float64}, b::Vector{Float64})
    """
    Compute cosine similarity between two vectors
    """
    return dot(a, b) / (norm(a) * norm(b))
end

# Compare first text (graph theory) to all others
println("Similarities to: '", texts[1], "'\n")
for (i, text) in enumerate(texts)
    sim = cosine_similarity(embeddings[1], embeddings[i])
    println("Text $i ($(text[1:min(40, end)])...): $(round(sim, digits=4))")
end
Similarities to: 'Graph theory studies the mathematical structure of networks'

Text 1 (Graph theory studies the mathematical st...): 1.0
Text 2 (The clustering coefficient measures the ...): 0.3744
Text 3 (Preferential attachment leads to power l...): 0.4689
Text 4 (I enjoy eating pizza with extra cheese...): 0.0422

What do we see?

  • The two other network texts are clearly related to the graph theory text (similarities of roughly 0.37 and 0.47 in this run)
  • The pizza text is nearly unrelated (similarity of about 0.04)
  • This demonstrates that embeddings capture semantic meaning
  • Texts about similar topics cluster together in the embedding space
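
To see this clustering more directly, we could compute the full pairwise similarity matrix for the four texts (a quick sketch using the embeddings computed above):

# Pairwise cosine similarities among the four texts
sims = [cosine_similarity(embeddings[i], embeddings[j]) for i in 1:length(texts), j in 1:length(texts)]
round.(sims, digits=3)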

Exercise 1: Exploring Embeddings

Create embeddings for the following texts and compute their pairwise similarities:

  1. “Nash equilibrium is a solution concept in game theory”
  2. “Players choose strategies to maximize their payoffs”
  3. “Agent-based models simulate individual decision makers”
  4. “The weather today is sunny and warm”

Questions:

  • Which pairs of texts have the highest similarity?
  • Does the game theory text have higher similarity to the ABM text or the weather text? Why?
  • What does this tell you about how embeddings capture domain knowledge?
# TODO: your code here

Building a RAG Pipeline

  • Now that we understand embeddings, we can build a complete RAG system
  • A RAG pipeline has four main stages:
    1. Chunking: Break documents into manageable pieces
    2. Embedding: Convert chunks to vectors and store them
    3. Retrieval: Find most relevant chunks for a query
    4. Generation: Use LLM to answer query with retrieved context

Let’s build each component step by step.

Stage 1: Chunking

  • Most documents are too long to embed as a single unit
  • We need to break them into chunks - smaller, semantically meaningful pieces
  • Typical chunk sizes: 100-500 words or 400-2000 characters

Why chunk?

  • Embeddings work best on focused, coherent pieces of text
  • Enables fine-grained retrieval (find specific relevant paragraphs, not whole documents)
  • Fits within LLM context limits (we can only inject so much context)

Chunking strategies:

  1. Fixed size: Every N characters or words
  2. Natural boundaries: By paragraph, section, or sentence
  3. Semantic: Using embeddings to identify topic shifts

For this lecture, we’ll use a simple paragraph-based chunking:

function chunk_text(text::String; max_chunk_size::Int=1000, overlap::Int=100)
    """
    Split text into overlapping chunks

    Args:
        text: Input text to chunk
        max_chunk_size: Maximum characters per chunk
        overlap: Number of characters to overlap between chunks

    Returns:
        Vector of text chunks
    """
    # Split by paragraphs first (double newline)
    paragraphs = split(text, "\n\n")

    chunks = String[]
    current_chunk = ""

    for para in paragraphs
        # If adding this paragraph would exceed max size, save current chunk
        if length(current_chunk) + length(para) > max_chunk_size && !isempty(current_chunk)
            push!(chunks, strip(current_chunk))
            # Start new chunk with overlap from previous
            current_chunk = current_chunk[max(1, end-overlap):end]
        end

        current_chunk *= para * "\n\n"
    end

    # Don't forget last chunk
    if !isempty(strip(current_chunk))
        push!(chunks, strip(current_chunk))
    end

    return chunks
end
chunk_text (generic function with 1 method)
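
As a quick sanity check, here is a sketch of chunk_text applied to a hypothetical multi-paragraph string (any long document would do):

# Sketch: chunk a sample multi-paragraph document (hypothetical text, for illustration)
sample_doc = join(["Paragraph $i. " * "Some sentence about networks. "^20 for i in 1:6], "\n\n")
chunks = chunk_text(sample_doc; max_chunk_size=500, overlap=50)
println("Created $(length(chunks)) chunks")
println("First chunk length: $(length(chunks[1])) characters")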

Stage 2: Vector Database

  • Once we have chunks and embeddings, we need to store them efficiently
  • A vector database stores embeddings and enables fast similarity search
  • For this lecture, we’ll build a simple in-memory vector store
  • Production systems typically use specialized vector databases like:
    • Pinecone
    • Weaviate
    • Qdrant
    • Chroma
    • pgvector (PostgreSQL extension)
using DataFrames

mutable struct VectorStore
    documents::Vector{String}          # Original text chunks
    embeddings::Vector{Vector{Float64}}  # Corresponding embeddings
    metadata::Vector{Dict{String, Any}}  # Optional metadata (source, page, etc.)
end

# Constructor for empty store
VectorStore() = VectorStore(String[], Vector{Float64}[], Dict{String, Any}[])

function add_documents!(store::VectorStore, docs::Vector{String}; metadata=nothing)
    """
    Add documents to the vector store by computing and storing their embeddings
    """
    println("Embedding $(length(docs)) documents...")

    for (i, doc) in enumerate(docs)
        # Get embedding
        emb = get_embedding(doc)

        # Store document, embedding, and metadata
        push!(store.documents, doc)
        push!(store.embeddings, emb)

        if metadata !== nothing && i <= length(metadata)
            push!(store.metadata, metadata[i])
        else
            push!(store.metadata, Dict("index" => i))
        end

        if i % 10 == 0
            println("  Processed $i/$(length(docs))...")
        end
    end

    println("Done! Store now contains $(length(store.documents)) documents.")
end

function search(store::VectorStore, query::String; top_k::Int=5)
    """
    Search for most similar documents to query

    Args:
        store: Vector store to search
        query: Search query text
        top_k: Number of results to return

    Returns:
        DataFrame with columns: rank, document, similarity, metadata
    """
    # Embed the query
    query_emb = get_embedding(query)

    # Compute similarities to all documents
    similarities = [cosine_similarity(query_emb, doc_emb) for doc_emb in store.embeddings]

    # Get top k indices
    top_indices = partialsortperm(similarities, 1:min(top_k, length(similarities)), rev=true)

    # Build result dataframe
    results = DataFrame(
        rank = 1:length(top_indices),
        document = store.documents[top_indices],
        similarity = similarities[top_indices],
        metadata = store.metadata[top_indices]
    )

    return results
end
search (generic function with 1 method)

Stage 3 & 4: Retrieval and Generation

  • Now we can combine retrieval with LLM generation
  • The key is to inject retrieved documents into the LLM’s context
  • We’ll create a function that:
    1. Takes a user query
    2. Retrieves relevant chunks from vector store
    3. Constructs instructions with retrieved context
    4. Sends to LLM via OpenAI’s Responses API for final answer

Note: We’re using OpenAI’s Responses API (/v1/responses) which is the modern successor to the Chat Completions API. The Responses API uses instructions for system-level guidance and input for user queries, providing a cleaner interface for stateful interactions.

function rag_query(store::VectorStore, query::String; top_k::Int=3, model::String="gpt-5")
    """
    Answer a query using RAG: retrieve relevant docs, then generate answer

    Args:
        store: Vector store containing knowledge base
        query: User's question
        top_k: Number of documents to retrieve
        model: OpenAI model to use for generation (default: gpt-5)

    Returns:
        Generated answer and retrieved documents
    """
    # Step 1: Retrieve relevant documents
    results = search(store, query; top_k=top_k)

    # Step 2: Construct context from retrieved documents
    context = ""
    for i in 1:nrow(results)
        context *= "Document $i:\n$(results.document[i])\n\n"
    end

    # Step 3: Build instructions with context
    instructions = """
    You are a helpful assistant that answers questions based on provided context.
    Use the context below to answer the user's question. If the context doesn't
    contain enough information to answer the question, say so - do not make up information.

    Context:
    $(context)
    """

    # Step 4: Call LLM using Responses API
    response = call_gpt(
        query;
        instructions=instructions,
        model=model
    )

    return (answer=response, sources=results)
end

function call_gpt(input::String; instructions::String="", model::String="gpt-5-nano", reasoning_effort::String="low")
    url = "https://api.openai.com/v1/responses"

    # OPENAI_API_KEY should be loaded from ENV via DotEnv
    headers = [
        "Authorization" => "Bearer $(ENV["OPENAI_API_KEY"])",
        "Content-Type" => "application/json"
    ]

    # Build request body with Responses API format
    request_body = Dict(
        "model" => model,
        "input" => input,
        "reasoning" => Dict("effort" => reasoning_effort)
    )

    # Add instructions if provided
    if !isempty(instructions)
        request_body["instructions"] = instructions
    end

    body = JSON3.write(request_body)

    response = HTTP.post(url, headers, body)
    result = JSON3.read(String(response.body))

    # NOTE: result.output is an array that can have multiple items. We are assuming a simple case here...
    return result.output[end].content[1].text
end
call_gpt (generic function with 1 method)

Application: Network Analysis Assistant

  • Let’s build a practical RAG application: an AI assistant for graph theory and network analysis
  • We’ll create a knowledge base from summaries of key network concepts
  • This assistant will be able to answer questions about network theory using retrieved information
  • In a real application, you would load actual research papers, textbooks, or documentation

Building the Knowledge Base

Let’s create a knowledge base with information about graph theory and network analysis:

# Sample documents about network analysis
# In practice, you would load these from files, scrape papers, etc.
network_docs = [
    """
    Clustering Coefficient

    The clustering coefficient measures the degree to which nodes in a graph tend to
    cluster together. For a given node, it is defined as the proportion of connections
    between the node's neighbors which are also connected to each other.

    Formally, for node i with degree k_i, if there are E_i edges between its neighbors,
    the local clustering coefficient is C_i = 2*E_i / (k_i * (k_i - 1)).

    The global clustering coefficient averages this measure across all nodes. High
    clustering coefficients are characteristic of social networks and indicate the
    presence of tightly-knit communities.
    """,

    """
    Preferential Attachment and Scale-Free Networks

    Preferential attachment is a mechanism by which networks grow over time. When a new
    node joins the network, it forms connections preferentially to nodes that already
    have many connections - the "rich get richer" phenomenon.

    The Barabási-Albert model (1999) demonstrated that preferential attachment leads to
    scale-free networks with power law degree distributions. In such networks, P(k) ~ k^(-γ)
    where γ is typically between 2 and 3.

    Scale-free networks are characterized by the presence of hubs - nodes with
    disproportionately high degree. Examples include the Internet, citation networks,
    and social media platforms.
    """,

    """
    Small-World Networks

    Small-world networks, introduced by Watts and Strogatz (1998), exhibit two key properties:
    high clustering (like regular lattices) and short path lengths (like random graphs).

    The small-world phenomenon, popularly known as "six degrees of separation," refers to
    the observation that most pairs of nodes in many real networks are connected by short
    paths, typically of length 6 or less.

    The Watts-Strogatz model generates small-world networks by starting with a ring lattice
    and randomly rewiring edges with some probability p. For intermediate values of p,
    the network maintains high clustering while gaining shortcuts that reduce path length.
    """,

    """
    Centrality Measures

    Centrality measures identify the most important nodes in a network. Different
    centrality measures capture different notions of importance:

    1. Degree centrality: Simply counts the number of connections. High degree nodes
       are "hubs" that directly connect to many others.

    2. Betweenness centrality: Counts how often a node lies on shortest paths between
       other nodes. High betweenness nodes are "bridges" that control information flow.

    3. Closeness centrality: Measures the average distance from a node to all others.
       High closeness nodes can reach the network quickly.

    4. Eigenvector centrality: A node is important if it connects to other important
       nodes. This is the principle behind Google's PageRank algorithm.
    """,

    """
    Community Detection

    Community detection aims to identify groups of nodes that are more densely connected
    internally than to the rest of the network. Communities represent functional modules
    or social groups.

    The modularity measure quantifies the quality of a partition. For partition C,
    modularity Q = (1/2m) * Σ[A_ij - (k_i*k_j/2m)] * δ(c_i, c_j), where m is the total
    number of edges, A_ij is the adjacency matrix, and δ(c_i, c_j) = 1 if nodes i and j
    are in the same community.

    Popular algorithms include the Louvain method, which greedily optimizes modularity,
    and spectral methods based on the graph Laplacian. Community structure appears in
    social networks (friend groups), biological networks (protein complexes), and
    information networks (topic clusters).
    """,

    """
    Network Resilience and Robustness

    Network resilience refers to a network's ability to maintain functionality when nodes
    or edges are removed. This is crucial for infrastructure networks, communication systems,
    and biological systems.

    Scale-free networks exhibit a peculiar property: they are robust to random failures
    but vulnerable to targeted attacks. Removing random nodes has little effect because
    most nodes have low degree. However, removing hubs quickly fragments the network.

    The resilience of random graphs is more uniform - they handle both random and targeted
    removal similarly. Measuring resilience involves computing the size of the largest
    connected component as nodes are progressively removed.
    """,

    """
    Network Formation and Strategic Link Formation

    In many contexts, network connections form through strategic decisions by self-interested
    agents. Game-theoretic models of network formation consider the costs and benefits of
    forming links.

    Jackson and Wolinsky (1996) introduced the concept of pairwise stability: a network is
    pairwise stable if no agent wants to sever a link and no pair of agents wants to form
    a new link.

    The tension between individual incentives and social welfare is central to network
    formation theory. The star network, where one central node connects to all others,
    often minimizes total distance (socially optimal) but may not be pairwise stable
    without appropriate transfer payments.
    """,

    """
    Information Diffusion and Cascades

    Information, behaviors, and innovations spread through networks via social influence.
    The threshold model (Granovetter, 1978) posits that individuals adopt a behavior when
    a threshold fraction of their neighbors have adopted it.

    A cascade occurs when an initial adoption by a few nodes triggers a chain reaction
    of adoptions throughout the network. Whether cascades occur depends on the distribution
    of thresholds and the network structure.

    Weak ties play a crucial role in diffusion (Granovetter, 1973). While strong ties
    create dense, clustered groups, weak ties serve as bridges between communities and
    enable information to reach new populations. This explains why job seekers often find
    opportunities through acquaintances rather than close friends.
    """
]

println("Created knowledge base with $(length(network_docs)) documents")
Created knowledge base with 8 documents

Now let’s create our vector store and add these documents:

# Initialize vector store
network_store = VectorStore()

# Add documents to store (this will take a moment as we embed each document)
add_documents!(network_store, network_docs)
Embedding 8 documents...
Done! Store now contains 8 documents.

Testing Our Network Analysis Assistant

Let’s test our RAG system with some questions about networks:

# Question 1: About clustering
q1 = "What is the clustering coefficient and how is it calculated?"
result1 = rag_query(network_store, q1)

println("Question: ", q1)
println("\nAnswer:\n", result1.answer)
println("\n" * "="^80)
println("Top sources used:")
for i in 1:min(2, nrow(result1.sources))
    println("\n[$i] (similarity: $(round(result1.sources.similarity[i], digits=3)))")
    println(result1.sources.document[i][1:min(200, end)] * "...")
end
Question: What is the clustering coefficient and how is it calculated?

Answer:
The clustering coefficient quantifies how much a node’s neighbors are connected to each other (how “clustered” they are).

- Local clustering coefficient for node i: C_i = 2*E_i / (k_i * (k_i - 1))
  - k_i: degree of node i (number of neighbors)
  - E_i: number of edges that actually exist between those neighbors
  - It measures the fraction of possible neighbor-to-neighbor links that are present.

- Global clustering coefficient: the average of the local clustering coefficients over all nodes.

High values indicate tightly knit groups, common in social networks.

================================================================================
Top sources used:

[1] (similarity: 0.765)
Clustering Coefficient

The clustering coefficient measures the degree to which nodes in a graph tend to
cluster together. For a given node, it is defined as the proportion of connections
between the ...

[2] (similarity: 0.416)
Community Detection

Community detection aims to identify groups of nodes that are more densely connected
internally than to the rest of the network. Communities represent functional modules
or social...
# Question 2: About scale-free networks
q2 = "Why are scale-free networks vulnerable to targeted attacks?"
result2 = rag_query(network_store, q2)

println("Question: ", q2)
println("\nAnswer:\n", result2.answer)
Question: Why are scale-free networks vulnerable to targeted attacks?

Answer:
Because scale-free networks rely on a few high-degree hubs to hold the network together. Targeted attacks that remove these hubs quickly fragment the network and shrink the largest connected component, causing a rapid loss of connectivity, even though random failures (which mostly hit low-degree nodes) have little effect.
# Question 3: About weak ties
q3 = "What role do weak ties play in information diffusion?"
result3 = rag_query(network_store, q3)

println("Question: ", q3)
println("\nAnswer:\n", result3.answer)
Question: What role do weak ties play in information diffusion?

Answer:
Weak ties act as bridges between otherwise separate, densely clustered groups. By connecting different communities, they let information travel beyond local circles, reaching new populations and enabling broader cascades of adoption. This is why people often hear about job opportunities through acquaintances rather than close friends.

Comparing RAG vs Vanilla LLM

Let’s demonstrate the value of RAG by comparing answers with and without retrieved context:

function vanilla_llm_query(query::String; model::String="gpt-5")
    """
    Query LLM without RAG (no retrieved context)
    """
    return call_gpt(
        query;
        instructions="You are a helpful assistant.",
        model=model
    )
end

# Ask a specific question that requires precise information
specific_q = "What is the formula for local clustering coefficient in a graph?"

println("Question: ", specific_q)
println("\n" * "="^80)
println("Vanilla LLM Answer:")
println(vanilla_llm_query(specific_q))
println("\n" * "="^80)
println("RAG Answer:")
println(rag_query(network_store, specific_q).answer)
Question: What is the formula for local clustering coefficient in a graph?

================================================================================
Vanilla LLM Answer:
For an undirected, unweighted graph, the local clustering coefficient of a node v is:

C(v) = 2 e_v / [k_v (k_v − 1)]

- k_v = degree of v (number of neighbors)
- e_v = number of edges between neighbors of v (equivalently, number of triangles involving v)

By convention, C(v) = 0 if k_v < 2.

Note (directed variant): For directed graphs, the maximal number of possible edges among neighbors is k_v (k_v − 1), so a common definition is C(v) = e_v / [k_v (k_v − 1)], where e_v counts directed edges among neighbors.

================================================================================
RAG Answer:
C_i = 2 E_i / [k_i (k_i − 1)], where k_i is the degree of node i and E_i is the number of edges between its neighbors.

Observations:

Both answers are likely correct (this is a well-known formula in the LLM’s training data). However:

  1. With RAG, we know exactly where the answer came from (we can cite sources)
  2. With RAG, if we had proprietary or recent information, the LLM couldn’t answer without it
  3. With RAG, we can update knowledge without retraining the model

Let’s test with a question the LLM definitely doesn’t know:

# Add a "research paper" about a fictional concept
fictional_doc = ["""
Recent work by Spencer Lyon (2025) introduced the concept of "resonance centrality"
for dynamic networks. Unlike traditional centrality measures which focus on static
network structure, resonance centrality captures how a node's importance oscillates
over time due to feedback loops.

The resonance centrality R_i(t) for node i at time t is computed as:
R_i(t) = β * Σ_j A_ij(t) * R_j(t-1) + (1-β) * d_i(t)

where A_ij(t) is the time-varying adjacency matrix, d_i(t) is the degree at time t,
and β ∈ [0,1] controls the memory of past centrality.

Lyon showed that in social networks, nodes with high resonance centrality are often
"trend catalysts" - they don't just spread information, but trigger cascades that
feed back to amplify their influence in subsequent time periods.
"""]

add_documents!(network_store, fictional_doc)

# Now ask about it
fictional_q = "What is resonance centrality and how is it calculated?"

println("Question: ", fictional_q)
println("\n" * "="^80)
println("Vanilla LLM Answer:")
println(vanilla_llm_query(fictional_q))
println("\n" * "="^80)
println("RAG Answer:")
println(rag_query(network_store, fictional_q).answer)
Embedding 1 documents...
Done! Store now contains 9 documents.
Question: What is resonance centrality and how is it calculated?

================================================================================
Vanilla LLM Answer:
Resonance centrality is a frequency-dependent notion of node importance for networks with dynamical processes. It ranks nodes by how strongly they respond to an oscillatory (periodic) input that propagates through the network dynamics. In other words, a node is “resonantly central” if the dynamics make it amplify signals at some frequency more than other nodes.

How it is defined (general idea)
- Choose a linear dynamical model on the network (first- or second-order), and a vector b describing which nodes are externally driven.
- Drive the system with a sinusoidal input u(t) = Re{u0 e^{iωt}}.
- Compute the steady-state frequency response (transfer function) from the input to each node. The magnitude of that response at node i is its frequency-dependent centrality gi(ω).
- Resonance centrality of node i is then taken as, e.g., the peak response maxω gi(ω) (and the corresponding ω is the node’s resonant frequency). Some variants use the squared magnitude or the area under the curve across ω.

Common formulations

1) First-order (diffusive/consensus-like) dynamics
- Dynamics: x′(t) = A x(t) + b u(t) with A stable (often A = −L for a Laplacian L of an undirected graph).
- Frequency response: H(ω) = (iωI − A)^{-1} b.
- Node-i response magnitude: gi(ω) = |ei^T H(ω)| = |[(iωI − A)^{-1} b]_i|.
- Resonance centrality (peak gain): Ri = maxω gi(ω).

Notes:
- If A = −L with L ≥ 0 (diffusion), the response typically peaks at low frequency (ω → 0). Then Ri ≈ |[L^† b]_i| (L^† is the Moore–Penrose pseudoinverse), so “resonance” reduces to a low-frequency (quasi-static) sensitivity, closely related to resistance-based measures.

2) Second-order (oscillator/inertial) dynamics
- Dynamics: M x″(t) + C x′(t) + K x(t) = b u(t), with M, C, K ≽ 0 (e.g., coupled oscillators, power grids).
- Dynamic stiffness: Z(ω) = K − ω^2 M + iω C.
- Frequency response: H(ω) = Z(ω)^{-1} b.
- Node-i response magnitude: gi(ω) = |ei^T H(ω)|.
- Resonance centrality: Ri = maxω gi(ω). True resonance peaks occur near the system’s natural frequencies (eigenvalues of (M, K) with damping from C).

Spectral form (useful for computation and insight)
- For first-order A diagonalizable: A = VΛV^{-1}, H(ω) = V (iωI − Λ)^{-1} V^{-1} b, so
  gi(ω) = |∑k (vik wk)/(iω − λk)|, where wk are components of V^{-1}b.
- Peaks occur where |iω − λk| is small and the mode k is excited (wk large) and observed at node i (vik large).

Practical computation
- Choose dynamics (A, or M, C, K) and the input profile b (uniform, single node, or application-specific).
- Pick a frequency grid ω ∈ [ωmin, ωmax] covering relevant dynamics (near natural frequencies if known).
- For each ω, solve a linear system:
  - First-order: y(ω) = (iωI − A)^{-1} b.
  - Second-order: y(ω) = (K − ω^2 M + iω C)^{-1} b.
- Take gi(ω) = |y_i(ω)|. Define Ri = maxω gi(ω) (and optionally record the argmax for the resonant frequency).
- Variants: use squared magnitude, L2 norm over frequencies ∫ gi(ω)^2 dω, or multiple input profiles and average.

Connections to other centralities
- Setting ω = 0 in first-order systems recovers static sensitivity measures linked to resistance distance and current-flow centralities.
- Evaluating (sI − A)^{-1} at complex s relates resonance centrality to resolvent/Katz-type centralities; resonance centrality is the frequency-response version (s on the imaginary axis).

In short: resonance centrality is the node-wise peak (or frequency-shaped) gain of the network’s transfer function under oscillatory driving, computed via inverses of (iωI − A) for first-order or of dynamic stiffness matrices for second-order dynamics.

================================================================================
RAG Answer:
Resonance centrality is a dynamic-network centrality measure that captures how a node’s importance oscillates over time due to feedback loops. It emphasizes not just current connections but how past influence reverberates and amplifies across time.

Calculation:
For node i at time t:
R_i(t) = β * Σ_j A_ij(t) * R_j(t−1) + (1 − β) * d_i(t)

- A_ij(t): time-varying adjacency (connection from j to i at time t)
- R_j(t−1): resonance centrality of node j at the previous time step
- d_i(t): degree of node i at time t
- β ∈ [0,1]: memory parameter controlling how much past centrality influences the present

Interpretation:
- Higher β gives more weight to past influence propagating through the network (feedback).
- Lower β leans more on the node’s current degree.
In social networks, nodes with high resonance centrality tend to be “trend catalysts” that trigger cascades which amplify their future influence.

What happened?

  • The vanilla LLM either says it doesn’t know, or hallucinates a plausible-sounding definition
  • The RAG system correctly retrieves and uses our fictional paper
  • This demonstrates how RAG enables working with proprietary, recent, or domain-specific knowledge

Trade-offs and Design Considerations

Now that we’ve built a working RAG system, let’s discuss key design decisions and trade-offs.

Parametric vs Retrieval-Based Knowledge

Parametric Knowledge (in model weights):

  • Pros:
    • Fast: no need to search external sources
    • Synthesizes information from vast training corpus
    • Good for general knowledge and reasoning
  • Cons:
    • Fixed at training time
    • Can’t update without retraining
    • May hallucinate when uncertain
    • No citations or provenance

Retrieval-Based Knowledge (via RAG):

  • Pros:
    • Can use up-to-date information
    • Access to proprietary/specialized knowledge
    • Provides citations and sources
    • Reduces hallucination
    • Updates without retraining
  • Cons:
    • Slower (requires retrieval step)
    • Quality depends on retrieval accuracy
    • Requires maintaining vector database
    • May miss information requiring synthesis across many sources

Key Design Decisions

1. Chunk Size

  • Smaller chunks (100-200 words):
    • More precise retrieval
    • Better for question answering
    • May lose context across chunk boundaries
  • Larger chunks (500-1000 words):
    • Preserve more context
    • Better for complex reasoning
    • May include irrelevant information

2. Number of Retrieved Documents (top_k)

  • Few documents (k=1-3):
    • Focuses on most relevant information
    • Faster, uses less context window
    • May miss important information
  • Many documents (k=5-10):
    • More comprehensive context
    • Better for complex queries
    • May include noise, uses more tokens
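
One simple way to explore this trade-off is to run the same query with several values of top_k and inspect the similarity scores of what comes back (a sketch using the search function and the network_store built above):

# Sketch: inspect how retrieval changes as we vary top_k
q = "How do hubs affect the robustness of a network?"
for k in (1, 3, 5)
    res = search(network_store, q; top_k=k)
    println("top_k = $k -> similarities: ", round.(res.similarity, digits=3))
end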

3. Embedding Model

  • Smaller models (text-embedding-3-small: 1536 dims, optionally truncated via the dimensions parameter):
    • Faster embedding and search
    • Lower cost
    • Slightly lower quality
  • Larger models (text-embedding-3-large: 3072 dims):
    • Better semantic understanding
    • More accurate retrieval
    • Higher cost and latency
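
Since our get_embedding function already accepts a model keyword argument, switching embedding models is a one-line change (sketch below). Note that embeddings from different models live in different vector spaces, so a single vector store should use one model throughout:

# Sketch: compare embedding dimensions across the two OpenAI embedding models
emb_small = get_embedding("clustering coefficient")                                  # text-embedding-3-small (default)
emb_large = get_embedding("clustering coefficient"; model="text-embedding-3-large")
println("small: $(length(emb_small)) dims, large: $(length(emb_large)) dims")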

Advanced RAG Techniques

Our implementation is a basic RAG pipeline. Production systems often use:

1. Hybrid Search

  • Combine vector similarity with keyword search (BM25)
  • Leverages both semantic and lexical matching
  • More robust to different query types

2. Re-ranking

  • Use a separate model to re-rank retrieved results
  • Can consider query-document interaction more deeply
  • Improves precision of top results

3. Query Expansion

  • Generate multiple variations of user query
  • Search with all variations
  • Helps with vocabulary mismatch between query and documents

4. Hierarchical Retrieval

  • First retrieve at document level
  • Then retrieve specific passages within those documents
  • Balances context and precision

5. Iterative RAG

  • Let LLM decide what to retrieve next
  • Multiple retrieval rounds
  • Better for complex, multi-hop reasoning
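
As a flavor of the first idea, here is a minimal hybrid-search sketch built on our VectorStore. The keyword score is a naive word-overlap fraction (an illustration only, not BM25), and the blending weight α is an assumption you would tune:

# Naive keyword overlap: fraction of query words that appear in the document
function keyword_score(doc::String, query::String)
    qwords = split(lowercase(query))
    dwords = Set(split(lowercase(doc)))
    return count(w -> w in dwords, qwords) / max(length(qwords), 1)
end

# Blend embedding similarity with keyword overlap (α = weight on the embedding score)
function hybrid_search(store::VectorStore, query::String; top_k::Int=5, α::Float64=0.7)
    query_emb = get_embedding(query)
    scores = [α * cosine_similarity(query_emb, store.embeddings[i]) +
              (1 - α) * keyword_score(store.documents[i], query)
              for i in eachindex(store.documents)]
    idx = partialsortperm(scores, 1:min(top_k, length(scores)), rev=true)
    return DataFrame(rank=1:length(idx), document=store.documents[idx], score=scores[idx])
end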

Connection to Course Themes

RAG systems connect to several course concepts:

Networks and Information Flow

  • RAG creates edges from LLM to knowledge sources
  • Quality of answer depends on network structure (which documents are connected via similarity)
  • Similar to how social network structure affects information diffusion

Game Theory and Mechanism Design

  • In multi-agent RAG systems, agents must decide what information to share
  • Strategic information retrieval: costly to retrieve everything, must be selective
  • Related to costly network formation from our game theory module

Agent-Based Modeling

  • Can model RAG systems as agents with retrieval actions
  • Environment is the knowledge base
  • Rules govern when to retrieve, what to retrieve, how to synthesize
  • Next lecture will explore multi-agent AI systems in depth

Exercise 2: Build Your Own RAG System

Choose one of the following domains and build a RAG system:

Option A: Game Theory Assistant

  • Create documents summarizing key game theory concepts from our Week 8-9 lectures
  • Include: Nash equilibrium, dominant strategies, mixed strategies, auctions
  • Test with questions like “What is a Nash equilibrium?” or “Explain the winner’s curse”

Option B: Julia Programming Helper

  • Create documents with Julia programming tips and common patterns
  • Include: array operations, data frames, plotting, packages we’ve used
  • Test with questions like “How do I create a histogram?” or “What’s the syntax for filtering a DataFrame?”

Option C: Your Domain

  • Choose a topic you’re interested in
  • Create 5-10 documents with information
  • Build and test a RAG system

For your chosen option:

  1. Create at least 5 documents (can be shorter than our examples)
  2. Build a vector store
  3. Test with at least 3 different questions
  4. Compare RAG answers vs vanilla LLM answers
  5. Experiment with different values of top_k (number of retrieved documents)
# TODO: your code here

# Step 1: Create your documents
my_docs = [
    # Your documents here
]

# Step 2: Build vector store
my_store = VectorStore()
# add_documents!(my_store, my_docs)

# Step 3: Test with questions
# test_q1 = "..."
# result = rag_query(my_store, test_q1)
# println(result.answer)

Exercise 3: Evaluating Retrieval Quality

An important aspect of RAG systems is retrieval quality - are we finding the right documents?

Using our network_store:

  1. Create 3 test queries where you know which document(s) should be retrieved
  2. For each query, perform retrieval with different values of k (k=1, 3, 5)
  3. Examine the similarity scores - what do they tell you?
  4. Try a query that requires information from multiple documents
  5. Try a query that’s completely off-topic (e.g., about cooking) - what gets retrieved?

Questions to explore:

  • Does the highest similarity document always have the best answer?
  • How much does similarity score drop from rank 1 to rank 3?
  • What happens when the query uses different terminology than the documents?
# TODO: your analysis here

# Example structure:
# test_queries = [
#     "query 1...",
#     "query 2...",
#     "query 3..."
# ]

# for query in test_queries
#     println("Query: ", query)
#     results = search(network_store, query; top_k=5)
#     # Analyze results
# end

Exercise 4: Design Decisions

Consider the following scenarios and discuss what RAG design choices you would make:

Scenario 1: Legal Document Q&A

  • Database: 10,000 legal contracts and case documents
  • Queries: Lawyers asking about specific clauses and precedents
  • Requirements: High accuracy, must cite sources, complex reasoning

Scenario 2: Customer Support Chatbot

  • Database: 500 FAQ articles and product documentation
  • Queries: Customers asking how to use products, troubleshoot issues
  • Requirements: Fast responses, conversational, handles unclear questions

Scenario 3: Academic Research Assistant

  • Database: 100,000 research papers
  • Queries: Researchers asking about literature, methodologies, findings
  • Requirements: Comprehensive answers, identifies connections between papers

For each scenario, decide:

  1. Chunk size (small, medium, large)?
  2. Number of documents to retrieve (k=?)?
  3. Would you use any advanced techniques (hybrid search, re-ranking, etc.)?
  4. How would you handle queries that require information from multiple documents?
  5. How would you evaluate the system’s performance?

Write your analysis below:

Your Analysis:

Scenario 1: Legal Document Q&A

  • Chunk size: ...
  • k: ...
  • Advanced techniques: ...
  • Multi-document handling: ...
  • Evaluation: ...

(Continue for other scenarios)

Looking Ahead: Multi-Agent Systems

  • In this lecture we built a single-agent RAG system
  • The LLM agent retrieves information and generates answers
  • But what if we had multiple AI agents working together?
  • Next lecture (A1.03) we’ll explore multi-agent AI systems:
    • Multiple LLM agents with specialized roles
    • Agents that coordinate and communicate
    • Connections to agent-based models and game theory
    • Building complex workflows with agentic systems

Think about:

  • How might multiple agents improve retrieval? (e.g., one agent generates queries, another evaluates relevance)
  • What if different agents had access to different knowledge bases?
  • How does this relate to our study of networks and strategic interaction?

Key Takeaways

  1. LLMs have fundamental knowledge limitations: cutoffs, hallucination, missing domain-specific info

  2. RAG augments LLMs with external knowledge through retrieval from vector databases

  3. Embeddings represent text as vectors capturing semantic meaning, enabling similarity search

  4. A RAG pipeline has four stages: chunking, embedding, retrieval, generation

  5. Design trade-offs matter: chunk size, number of retrieved docs, embedding model all affect quality and performance

  6. RAG enables practical AI applications with up-to-date, domain-specific, and proprietary knowledge

  7. RAG connects to network concepts: creating dynamic information flow from LLM to knowledge sources

  8. Citations and provenance are key benefits of RAG over pure parametric knowledge

Further Reading

Foundational Papers:

  • Lewis et al. (2020) “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks”
  • Gao et al. (2023) “Retrieval-Augmented Generation for Large Language Models: A Survey”

Advanced RAG:

  • Khattab et al. (2023) “DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines” arXiv:2310.03714
  • Shi et al. (2023) “REPLUG: Retrieval-Augmented Black-Box Language Models” arXiv:2301.12652
