The OpenAI API — Setup and Your First Call
API key, authentication, message array structure, and your first working chat completion call — everything you need to go from zero to a running OpenAI integration in under ten minutes.
The OpenAI API is the most widely integrated AI service in the world. Understanding it properly — not just copying a snippet from the docs — is the difference between a brittle prototype and a production-ready integration.
This lesson covers the full setup: API key, authentication, the message array structure, token counting, and your first working call.
Step 1: API Key Setup
Go to platform.openai.com. Create an account if you do not have one. Navigate to API keys in the left sidebar. Click Create new secret key.
Give the key a descriptive name ("my-app-dev" or "production-backend"). Copy the key immediately — OpenAI will only show it once.
Set it as an environment variable in your shell or .env file:
export OPENAI_API_KEY="sk-proj-..."
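It helps to fail fast when the key is missing rather than hitting an authentication error mid-request. A minimal sketch — `require_api_key` is our own helper, not part of the SDK:

```python
import os

def require_api_key() -> str:
    """Raise immediately if the key is absent or malformed."""
    key = os.environ.get("OPENAI_API_KEY", "")
    if not key.startswith("sk-"):
        raise RuntimeError("OPENAI_API_KEY is missing or malformed")
    return key
```

Call it once at startup; a clear error at boot beats a cryptic 401 in production.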
Add spending limits in platform settings. Set a hard limit that matches your actual budget. You will thank yourself later.
Step 2: Install the SDK
OpenAI publishes official SDKs for Python and Node/TypeScript. These wrap the raw HTTP calls with retry logic, type definitions, and ergonomic helpers.
Python:
pip install openai
Node/TypeScript:
npm install openai
You can also call the API directly via HTTP if you prefer — the SDK is just a convenience wrapper.
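To see what the wrapper is doing, here is a sketch of the same request built with only the standard library. The endpoint URL and Bearer-token header are the API's documented HTTP interface; the `build_chat_request` helper name is ours:

```python
import json
import os
import urllib.request

def build_chat_request(messages, model="gpt-4o"):
    """Construct the raw HTTP request that the SDK builds for you."""
    payload = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
        method="POST",
    )

# Sending it is one line: urllib.request.urlopen(build_chat_request(...))
```

What the SDK adds on top of this is retries, timeouts, typed responses, and error classes — worth having, which is why raw HTTP is usually reserved for environments where you cannot install dependencies.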
Step 3: Your First Chat Completion Call
Python:
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "What is the difference between TCP and UDP?"},
    ],
)

print(response.choices[0].message.content)
Node/TypeScript:
import OpenAI from "openai"

// top-level await requires an ES module ("type": "module" in package.json)
const client = new OpenAI() // reads OPENAI_API_KEY from environment

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: "You are a concise technical assistant." },
    { role: "user", content: "What is the difference between TCP and UDP?" },
  ],
})

console.log(response.choices[0].message.content)
Run it. You will get a response. That is the API working.
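In production, calls fail: rate limits, timeouts, transient 5xx errors. The SDK retries some of these itself; for anything it does not, a generic backoff wrapper is the standard pattern. A stdlib-only sketch (`with_retries` is our own helper, not an SDK function):

```python
import random
import time

def with_retries(call, max_attempts=5, base_delay=1.0):
    """Retry a flaky zero-argument callable with exponential backoff + jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the real error
            # delay doubles each attempt, with random jitter to avoid thundering herds
            time.sleep(base_delay * 2 ** attempt * (1 + random.random()))
```

Usage: `with_retries(lambda: client.chat.completions.create(...))`. In real code you would catch only retryable exception types rather than bare `Exception`.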
The Message Array Structure
The messages parameter is the core of the API. It is an ordered list of conversation turns, each with a role and content.
system sets the model's persona and operating instructions. It is processed before any user message. Use it to define who the model is, what it knows, and how it should respond. Think of it as the briefing before the conversation.
{ "role": "system", "content": "You are a senior financial analyst specializing in options trading. Respond with precise, jargon-appropriate language. Never provide investment advice." }
user is the human turn. This is where the actual request or question lives.
{ "role": "user", "content": "Explain what an iron condor is and when you would use one." }
assistant is the model's previous responses. Include prior turns to build conversation history and enable follow-up questions.
{ "role": "assistant", "content": "An iron condor is a neutral options strategy..." }
To build a multi-turn conversation, append each user message and each model response to the array before sending the next call. The API is stateless — it does not remember previous calls. You are responsible for maintaining history.
history = [
    {"role": "system", "content": "You are a concise technical assistant."}
]

# Round 1
history.append({"role": "user", "content": "What is a context window?"})
response = client.chat.completions.create(model="gpt-4o", messages=history)
assistant_reply = response.choices[0].message.content
history.append({"role": "assistant", "content": assistant_reply})

# Round 2
history.append({"role": "user", "content": "How does it affect cost?"})
response = client.chat.completions.create(model="gpt-4o", messages=history)
print(response.choices[0].message.content)
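The append-then-send loop above is worth wrapping once and reusing. A sketch — the `Conversation` class is ours, not part of the SDK, and it takes the network call as a callable so the history logic stands alone:

```python
class Conversation:
    """Maintains the stateless API's conversation history for you.

    `send` is any callable mapping a messages list to a reply string,
    e.g. lambda msgs: client.chat.completions.create(
        model="gpt-4o", messages=msgs).choices[0].message.content
    """

    def __init__(self, system_prompt, send):
        self.messages = [{"role": "system", "content": system_prompt}]
        self.send = send

    def ask(self, text):
        self.messages.append({"role": "user", "content": text})
        reply = self.send(self.messages)
        # append the model's turn so the next call sees full history
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

Injecting `send` also makes the class trivially testable with a fake function, no API key required.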
Token Counting Basics
Tokens are the unit of measurement for AI API usage. Roughly: 1 token = 4 characters = ¾ of a word. "The quick brown fox" is 4 tokens. A typical 500-word article is ~650 tokens.
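That rule of thumb can be coded directly for quick pre-flight estimates. For exact counts, use OpenAI's tiktoken library, which tokenizes with the model's real vocabulary; this sketch is only the heuristic above:

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate from the ~4 characters-per-token rule of thumb.

    Good enough for budgeting; use tiktoken for exact counts.
    """
    return max(1, round(len(text) / 4))
```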
Every API call has:
- Prompt tokens: everything in your messages array
- Completion tokens: the model's response
- Total tokens: the sum, which determines cost
Find actual usage in the response:
print(response.usage.prompt_tokens)
print(response.usage.completion_tokens)
print(response.usage.total_tokens)
GPT-4o costs $2.50 per million prompt tokens and $10 per million completion tokens. A 1,000-token request + 500-token response costs approximately $0.0075. Small, but it multiplies fast at scale.
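That arithmetic as a sketch — the prices are hard-coded from the figures above, and since pricing changes, treat them as inputs rather than constants:

```python
# $ per token, from per-million rates (GPT-4o rates quoted above)
GPT4O_PRICES = {"prompt": 2.50 / 1_000_000, "completion": 10.00 / 1_000_000}

def call_cost(prompt_tokens: int, completion_tokens: int, prices=GPT4O_PRICES) -> float:
    """Dollar cost of one call: tokens in each direction times its rate."""
    return (prompt_tokens * prices["prompt"]
            + completion_tokens * prices["completion"])
```

Feed it `response.usage.prompt_tokens` and `response.usage.completion_tokens` to log per-call spend.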
Key Parameters
Beyond model and messages, these parameters matter for production use:
temperature (0–2): Controls randomness. 0 = focused and near-deterministic. 1 = balanced. 2 = highly random, often incoherent. Default is 1. For extraction, classification, and factual tasks, use 0. For creative tasks, use 0.7–1.2.
max_tokens: Maximum number of completion tokens the model may generate. The response is truncated, possibly mid-sentence, if it hits the limit, so set it based on what your use case actually needs.
top_p: Alternative to temperature. Use one or the other, not both. Most production code leaves this at default.
n: Number of response variants to generate. Almost always leave at 1 unless you are doing ranking/selection.
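Putting that guidance together, here is a sketch of the parameter set for a deterministic extraction call. The `extraction_params` helper name is ours; the keys are the API parameters described above:

```python
def extraction_params(messages, max_tokens=200):
    """Parameter choices for an extraction/classification task,
    following the guidance above: temperature 0, a tight output cap,
    and defaults for top_p and n."""
    return {
        "model": "gpt-4o",
        "messages": messages,
        "temperature": 0,       # deterministic for factual tasks
        "max_tokens": max_tokens,  # extraction outputs are short; cap them
    }
```

Usage: `client.chat.completions.create(**extraction_params(msgs))`. Centralizing parameters in one place also makes model or temperature changes a one-line diff.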
Bottom Line
The OpenAI API is a stateless HTTP service. You send a messages array. You get a response. The message array — system, user, assistant roles — is the entire interface. Master it and every other API pattern (function calling, structured outputs, streaming) is just a variation.
The next lesson maps the full OpenAI model lineup so you can route intelligently based on task complexity and cost.