ASK KNOX
beta
LESSON 86

The Assistants API — Threads, Files, and Persistent Conversations

The Assistants API gives you persistent threads, built-in file search over uploaded documents, and a code interpreter — without managing conversation history yourself. It is the right layer for document Q&A and multi-turn task completion.


The chat completions API is stateless. You manage history. You manage context. You handle truncation when the context fills up. For simple use cases, this is fine. For document Q&A, multi-turn task execution, and file-aware assistants, the bookkeeping becomes the application.

The Assistants API abstracts all of that. OpenAI stores the conversation, manages the context window, handles file indexing, and provides built-in tools for RAG and code execution.

[Figure: Assistants API lifecycle]

The Four Core Concepts

Assistant — a configured AI entity with a name, instructions (system prompt), model selection, and tool definitions. Create an Assistant once and reuse it across many conversations. Equivalent to a custom GPT configuration but fully API-controlled.

Thread — a persistent conversation container. Messages are stored in the Thread on OpenAI's servers. You do not need to manage history arrays — you just add messages and run the Thread. OpenAI handles context window management automatically, truncating old messages intelligently when the thread grows long.

Message — a single turn added to a Thread. Messages have roles (user, assistant) and can include file attachments.

Run — the execution of an Assistant against a Thread. Creating a Run triggers inference. You poll the Run's status until it stops: either a terminal state (completed, failed, expired, cancelled) or requires_action, which pauses the Run until you submit tool outputs.

Setup: Creating an Assistant

from openai import OpenAI

client = OpenAI()

assistant = client.beta.assistants.create(
    name="Document Analyst",
    instructions=(
        "You analyze uploaded documents and answer questions about their content. "
        "Always cite the specific section of the document that supports your answer. "
        "If the document does not contain the answer, say so explicitly."
    ),
    model="gpt-4o",
    tools=[{"type": "file_search"}]
)

assistant_id = assistant.id  # Store this — reuse the assistant, don't recreate

Create the Assistant once. Store the ID. Every user conversation creates a new Thread but reuses the same Assistant.
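One way to follow that advice is to cache the ID on disk so restarts reuse the same Assistant. A minimal sketch — the helper name and file path are illustrative, not part of the SDK:

```python
import os

def get_or_create_assistant(client, id_path="assistant_id.txt"):
    """Return a stored Assistant ID if one exists; otherwise create
    the Assistant once and persist its ID for future runs."""
    if os.path.exists(id_path):
        with open(id_path) as f:
            return f.read().strip()
    assistant = client.beta.assistants.create(
        name="Document Analyst",
        instructions="You analyze uploaded documents and answer questions.",
        model="gpt-4o",
        tools=[{"type": "file_search"}],
    )
    with open(id_path, "w") as f:
        f.write(assistant.id)
    return assistant.id
```

In a multi-instance deployment you would store the ID in a database or config service instead of a local file, but the shape is the same: create once, look up thereafter.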

File Search: Document Q&A

Upload files to a vector store, attach it to the Assistant, and the model retrieves relevant passages automatically:

# Create a vector store
vector_store = client.beta.vector_stores.create(name="Company Docs")

# Upload files to the vector store
with open("q4_report.pdf", "rb") as f:
    client.beta.vector_stores.file_batches.upload_and_poll(
        vector_store_id=vector_store.id,
        files=[f]
    )

# Attach vector store to the assistant
client.beta.assistants.update(
    assistant.id,
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}}
)

Once attached, every Run against this Assistant has automatic access to the vector store. When a user asks a question, the model retrieves the relevant passages and grounds its answer in the document content.
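The retrieved passages surface as citation annotations on the assistant's reply. A sketch for pulling them out, assuming the Assistants v2 message schema (text content parts carrying file_citation annotations) — adjust attribute names if your SDK version differs:

```python
def extract_citations(message):
    """Collect file_search citation annotations from an assistant message."""
    citations = []
    for part in message.content:
        if part.type != "text":
            continue
        for ann in part.text.annotations:
            if ann.type == "file_citation":
                citations.append({
                    "marker": ann.text,  # the inline placeholder in the reply text
                    "file_id": ann.file_citation.file_id,
                })
    return citations
```

Mapping file_id back to a filename (via client.files.retrieve) lets you render human-readable source references in your UI.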

Running a Conversation

# Create a thread for this user session
thread = client.beta.threads.create()

# Add the user's message
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What were the key risk factors mentioned in the Q4 report?"
)

# Run the assistant against the thread
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id
)

# Get the response when complete
if run.status == "completed":
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    latest = messages.data[0]  # most recent message
    print(latest.content[0].text.value)

create_and_poll is a blocking SDK helper that waits for the Run to complete. For production use with a web server, run the poll loop asynchronously rather than blocking the request thread.
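A manual poll loop makes the helper's behavior explicit and is the shape you would adapt for background workers. A sketch, assuming any stop state (terminal or requires_action) should end the loop; the interval and timeout values are arbitrary:

```python
import time

# States at which polling should stop: terminal states plus the
# requires_action pause, which needs tool outputs before resuming.
STOP_STATES = {"completed", "failed", "expired", "cancelled", "requires_action"}

def wait_for_run(client, thread_id, run_id, interval=1.0, timeout=120.0):
    """Poll a Run until it stops or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        run = client.beta.threads.runs.retrieve(
            thread_id=thread_id, run_id=run_id
        )
        if run.status in STOP_STATES:
            return run
        time.sleep(interval)
    raise TimeoutError(f"Run {run_id} did not finish within {timeout}s")
```

In an async web server you would use an asyncio variant of the same loop (or the SDK's async client) so a slow Run never blocks a request worker.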

Handling requires_action (Function Calling)

If you include function tools on your Assistant and the model decides to call one, the Run pauses at requires_action:

import json  # needed to parse tool arguments and serialize results

if run.status == "requires_action":
    tool_outputs = []
    for tool_call in run.required_action.submit_tool_outputs.tool_calls:
        result = dispatch_function(
            tool_call.function.name,
            json.loads(tool_call.function.arguments)
        )
        tool_outputs.append({
            "tool_call_id": tool_call.id,
            "output": json.dumps(result)
        })

    # Resume the run with tool results
    run = client.beta.threads.runs.submit_tool_outputs_and_poll(
        thread_id=thread.id,
        run_id=run.id,
        tool_outputs=tool_outputs
    )

Code Interpreter

The code_interpreter tool gives the model a Python sandbox. It can write and execute code, produce charts, manipulate files, and perform calculations — all without you provisioning any compute.

assistant = client.beta.assistants.create(
    name="Data Analyst",
    instructions="Analyze the uploaded CSV and answer questions about the data. Generate charts when helpful.",
    model="gpt-4o",
    tools=[{"type": "code_interpreter"}]
)

Attach a CSV file to the Thread message and ask the model to analyze it. It will write pandas code, execute it, and return both the code and the results. Charts are returned as file references in the response.
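A sketch of that attachment flow, wrapped in a hypothetical helper (the helper name is made up; the files.create purpose and message attachments shape follow the Assistants v2 API):

```python
def ask_about_csv(client, thread_id, csv_path, question):
    """Upload a CSV and attach it to a user message so code_interpreter
    can read it during the next Run."""
    with open(csv_path, "rb") as f:
        uploaded = client.files.create(file=f, purpose="assistants")
    return client.beta.threads.messages.create(
        thread_id=thread_id,
        role="user",
        content=question,
        attachments=[{
            "file_id": uploaded.id,
            "tools": [{"type": "code_interpreter"}],
        }],
    )
```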

Assistants vs Chat Completions

Dimension        | Chat Completions    | Assistants API
State management | You manage history  | OpenAI manages threads
File RAG         | You build it        | Built-in file_search
Code execution   | You provision       | Built-in code_interpreter
Cost             | Per token only      | Per token + tool costs
Control          | Full                | Limited

Use Assistants when the built-in tools match your needs and you want to avoid the engineering overhead. Use chat completions when you need full control over every aspect of the interaction.

Cost Model

Assistants API costs include:

  • Model tokens (same rates as chat completions)
  • file_search vector storage: $0.10/GB/day (first 1 GB free)
  • code_interpreter: $0.03/session when used
  • Retrieval API calls: included in file_search cost

For most document Q&A use cases, the storage cost is negligible. The dominant cost remains model tokens.
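A back-of-the-envelope helper for the tool costs above — rates copied from the list; model token costs are billed separately and not modeled here:

```python
STORAGE_RATE_PER_GB_DAY = 0.10  # file_search vector storage
SESSION_RATE = 0.03             # code_interpreter, per session

def estimate_tool_cost(storage_gb, days, sessions):
    """Rough monthly tool-cost estimate for an Assistants deployment."""
    return storage_gb * days * STORAGE_RATE_PER_GB_DAY + sessions * SESSION_RATE

# Example: 0.5 GB of documents stored for 30 days plus 100
# code_interpreter sessions costs about $4.50 in tool fees.
```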

Bottom Line

The Assistants API trades control for built-in state management, file RAG, and code execution. Use it for document Q&A, multi-turn task completion, and file-aware conversations where managing the stateful machinery yourself would be the application. Use chat completions when control, cost optimization, or custom integrations are the priority.

The final lesson covers production patterns — the retry logic, rate limiting, cost tracking, and error handling that separates a working prototype from a system that runs reliably at scale.