API Basics: Your First Programmatic Request
Session 3.5 · ~5 min read
An API call is a structured message. You are filling out a form, not having a conversation. The form says: who you are (authentication), what you want (prompt and parameters), and where to send it (endpoint). The response is another structured message: here is what the model generated, here is how many tokens it used, and here is why it stopped.
This session walks through the anatomy of a single API request and response, structure first, with a few small sketches along the way.
The Request: What You Send
Every API request has three parts: the URL you are sending to, the headers (metadata about the request), and the body (the actual content of your request).
At a glance, the three parts with example values:

- URL (endpoint): POST https://api.anthropic.com/v1/messages
- Headers (authentication + content type): x-api-key: sk-ant-... and content-type: application/json
- Body (model + system prompt + messages + parameters): model: claude-sonnet-4-..., system: your instructions, messages: the user prompt, max_tokens: 1024
The URL (Endpoint)
The URL tells your request where to go. For text generation, each provider has one main endpoint, and you send a POST request to it. POST means "I am sending data to you and I want something back." This is different from GET, which means "just give me information." Every LLM text generation request uses POST because you are sending a prompt and receiving generated text.
The Headers
Headers are key-value pairs that travel with your request but are not part of the prompt. They contain metadata: who you are (your API key), what format you are sending (JSON), and what format you expect back (JSON). Headers are like the envelope around a letter. The recipient reads the envelope to know how to handle the letter before opening it.
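As a sketch, the headers can be built as a plain dictionary. The header names below follow Anthropic's Messages API (x-api-key, anthropic-version); other providers differ, for example OpenAI uses an Authorization: Bearer header. The key value is a placeholder, not a real credential.

```python
# Sketch: headers for an Anthropic-style request (placeholder key).
API_KEY = "sk-ant-placeholder"

headers = {
    "x-api-key": API_KEY,                # who you are (authentication)
    "anthropic-version": "2023-06-01",   # which API revision you are targeting
    "content-type": "application/json",  # what format the body is in
}
```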
The Body
The body is the actual content of your request. For LLM APIs, the body is a JSON object containing your model selection, system prompt, conversation messages, and generation parameters.
| Body Field | Purpose | Example Value |
|---|---|---|
| model | Which model to use | "claude-sonnet-4-20250514" |
| system | Persistent instructions | "You are a technical writer. No hedging. No filler." |
| messages | The conversation (user prompts + any prior responses) | [{"role": "user", "content": "Write a product description for..."}] |
| max_tokens | Maximum response length | 1024 |
| temperature | Randomness level | 0.3 |
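The table above maps directly onto a JSON object. A minimal sketch (the prompt text is illustrative, and the exact field names vary by provider):

```python
import json

# Sketch: the request body as a Python dict, serialized to JSON for sending.
body = {
    "model": "claude-sonnet-4-20250514",
    "system": "You are a technical writer. No hedging. No filler.",
    "messages": [
        {"role": "user", "content": "Write a product description for a desk lamp."}
    ],
    "max_tokens": 1024,
    "temperature": 0.3,
}

payload = json.dumps(body)  # this string becomes the HTTP request body
```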
The Response: What You Get Back
The server processes your request and sends back a response. The response also has structure: a status code (did it work?), headers (metadata), and a body (the actual generated content plus usage data).
Status Codes
The status code is a three-digit number that tells you what happened. You need to know four:
| Code | Meaning | What To Do |
|---|---|---|
| 200 | Success | Parse the response body for your content |
| 400 | Bad request (your input was wrong) | Check your request body for errors |
| 401 | Unauthorized (bad API key) | Check your API key |
| 429 | Rate limited (too many requests) | Wait and retry, or reduce concurrency |
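The table translates directly into a dispatch function. A sketch (the action names are illustrative, not provider guidance):

```python
def next_action(status_code: int) -> str:
    """Map an HTTP status code to the action from the table above."""
    if status_code == 200:
        return "parse"           # success: read the response body
    if status_code == 400:
        return "fix_request"     # bad request: inspect your body for errors
    if status_code == 401:
        return "fix_api_key"     # unauthorized: check your key
    if status_code == 429:
        return "wait_and_retry"  # rate limited: back off, then try again
    return "inspect"             # anything else: read the error message
```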
The Response Body
On success, the response body contains the generated text and metadata about the generation. The key fields are the content (what the model generated), the usage statistics (how many tokens were consumed), and the stop reason (why the model stopped generating).
| Response Field | What It Contains |
|---|---|
| content | The generated text |
| usage | Token counts, e.g. input_tokens: 350, output_tokens: 512 |
| stop_reason | Why generation stopped: "end_turn" or "max_tokens" |
| model | The exact model version used |
The stop reason matters. "end_turn" means the model finished naturally. "max_tokens" means it was cut off because it hit your token limit. If your outputs are being truncated, increase max_tokens.
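Pulling those fields out of a parsed response is a few lines of dictionary access. A sketch, where the sample dict mirrors the Anthropic Messages API shape (other providers nest the text differently) and the text itself is illustrative:

```python
# Sketch: a parsed response body (Anthropic Messages API shape, sample values).
response = {
    "content": [{"type": "text", "text": "Here is your product description..."}],
    "usage": {"input_tokens": 350, "output_tokens": 512},
    "stop_reason": "end_turn",
    "model": "claude-sonnet-4-20250514",
}

text = response["content"][0]["text"]  # the generated text
tokens_used = response["usage"]["input_tokens"] + response["usage"]["output_tokens"]
truncated = response["stop_reason"] == "max_tokens"  # True means raise max_tokens
```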
The Complete Flow
Putting it all together: you construct a request with your API key in the headers and your prompt in the body. You send it as a POST to the provider's endpoint. The server processes it and returns a JSON response with the generated text, usage data, and metadata. Your script parses the response, extracts the content, logs the usage, and passes the content to the next stage of your pipeline.
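The flow above can be sketched with only the standard library. Building the request and parsing the response are separated from sending, so the mechanics can be followed without an API key. The endpoint, header names, and field names follow the Anthropic shapes discussed earlier; treat them as assumptions to check against your provider's documentation.

```python
import json
import urllib.request

ENDPOINT = "https://api.anthropic.com/v1/messages"

def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Assemble the POST request: endpoint, headers, and JSON body."""
    body = json.dumps({
        "model": "claude-sonnet-4-20250514",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    headers = {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }
    return urllib.request.Request(ENDPOINT, data=body, headers=headers, method="POST")

def parse_response(raw: bytes) -> dict:
    """Extract the fields your pipeline cares about from the raw JSON."""
    data = json.loads(raw)
    return {
        "text": data["content"][0]["text"],
        "usage": data["usage"],
        "stop_reason": data["stop_reason"],
    }

# Sending it for real requires a valid key:
#   with urllib.request.urlopen(build_request(key, "Say hello.")) as resp:
#       result = parse_response(resp.read())
```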
That is the entire mechanism. Every AI content generation, from a single product description to a batch of 500 articles, follows this same request-response pattern. The complexity is not in the API call itself. It is in what you put into the request (your prompts, your system instructions, your parameters) and what you do with the response (your quality checks, your formatting, your routing).
Further Reading
- Understanding API Requests and Responses (Being Technical Writer)
- What Are REST API Headers? A Complete Guide (BrowserStack)
- Messages API Reference (Anthropic Documentation)
- Create Chat Completion (OpenAI API Reference)
Assignment
- Using your browser, a tool like Postman, or a simple curl command, make one API call to any LLM API. It can be the simplest possible request: "Say hello."
- The goal is to successfully send a request and receive a response outside of a web chat interface. If you do not have an API key yet, sign up for one (most providers offer free trial credits).
- Document every step: what URL did you send to? What headers did you include? What was in the body? What status code did you get back? What was in the response? This documentation is your proof that you understand the mechanism.