Tool Use in Prompts

Also known as: function calling, tool calling, tool invocation

Tool Use in Prompts
Tool use in prompts is a technique where an LLM receives a set of function schemas alongside the user message, then responds by selecting a function name and providing typed arguments — which the calling application executes and feeds back as a result.

Tool use in prompts is a mechanism for giving a language model named functions it can invoke, so instead of generating free text, it outputs structured function calls with typed arguments.

What It Is

Most AI applications need to do more than answer questions — they need to look up live prices, query databases, send emails, or call external APIs. Without tool use, you would have to parse free-form text and guess what the model intended. Tool use in prompts solves this by giving the model an explicit list of callable functions, and by making it respond with a structured call rather than prose.

The mechanism works like this: before the model sees the user’s message, the application inserts a block of tool schemas alongside the system prompt. Each schema has three parts — a name (the function identifier), a description (a plain-language explanation of what the function does and when to call it), and a parameters block (a JSON Schema object defining what arguments the function accepts, their types, and which are required). The model reads these schemas as part of the prompt. When the user’s request matches a tool’s purpose, the model responds not with text but with a structured output: a function name paired with argument values that conform to the parameter schema.

Think of it like a set of pre-printed request forms at a help desk. The desk clerk (the application) keeps a stack of approved forms: “Check Account Balance,” “Schedule Appointment,” “Update Contact Record.” A visitor (the user) describes what they need in plain language. The desk agent (the model) decides which form applies, fills in the correct values, and hands it back. The clerk stamps it and takes the action. The agent never opened a drawer — they only wrote on the form.

This separation matters. The model does not execute anything. It produces a structured representation of intent: which function to call and with which arguments. The calling code is what runs the function, handles errors, and decides whether to return the result to the model for further reasoning. Understanding how a model parses these schemas — which parts of the description it reads most closely, how parameter names influence argument generation, and where schema complexity hurts accuracy — is exactly the question the parent article addresses.

How It’s Used in Practice

The most common encounter is AI chat assistants connected to real-time data. Search assistants, customer support bots that check order status, personal assistants that query your calendar — these work by adding tool schemas to the system prompt. When you ask “Has my package shipped?”, the model outputs a call to get_shipment_status with your order number as the argument. The application runs the query, gets a result, and passes it back to the model, which writes the human-readable reply.

A second use case is workflow automation: applications that chain several tool calls in sequence. The model might call search_knowledge_base, then summarize_text, then send_email. The application acts as the loop — executing each tool, returning the result, letting the model decide the next step.

Pro Tip: Tool descriptions are read by the model the same way any other text in the prompt is. Write them as you would write an API docstring for a new developer: say what the function does, what each parameter means, what the function returns, and what constraints apply. Vague descriptions like “processes the request” produce unreliable calls. The parameter name matters too — a field named user_id signals different expectations to the model than one named customer_identifier, even when both carry identical descriptions.

When to Use / When Not

ScenarioUseAvoid
Retrieving live data (prices, inventory, shipping status)
Pure text generation with no external dependencies
Routing user requests to different backend services by intent
Open-ended creative writing or brainstorming
Structured multi-step workflows with defined function signatures
Simple Q&A from content already present in the context window

Common Misconception

Myth: The language model executes the function when it outputs a tool call.

Reality: The model only outputs a structured description of the intended call — the function name and arguments as text or JSON. The application code is what actually runs the function, handles errors, and decides whether to pass the result back to the model. The model has no direct access to your APIs, databases, or file system.

One Sentence to Remember

Tool use in prompts moves the model from generating prose to generating decisions — and your application code turns those decisions into actions. Treat every tool schema like a typed function signature with real documentation, not a label the model reads once and guesses from.

FAQ

Q: What is the difference between tool use and function calling? A: They’re the same mechanism with different names. “Function calling” is OpenAI’s original term; “tool use” is Anthropic’s and others’. Both describe embedding function schemas in the prompt so the model outputs structured calls instead of text.

Q: Can a model call more than one tool in a single response? A: Yes. Most provider implementations support parallel tool calls — the model outputs multiple call specifications in one response. The application runs them, collects the results, and returns them in a follow-up message.

Q: Does tool use require fine-tuning the model? A: No. Tool use is built into most current large language models from major providers — no fine-tuning required. Add tool schemas to your API request; the model handles function selection and argument generation.

Expert Takes

Function calling works because the model was trained on examples of structured outputs paired with schema definitions — it learned the mapping from schema to call. What many practitioners miss is that parameter names are not inert labels: the model’s generation is influenced by them. A field named user_id behaves differently than one named customer_identifier, even with identical descriptions. The schema is part of the effective prompt, and treating it as mere metadata produces unreliable calling behavior.

Tool schemas are interface contracts, and vague contracts produce unpredictable behavior. The highest-reliability implementations share one property: every parameter has a description stating what it is, what format it expects, and what happens when it is absent. Optional parameters should declare their default behavior explicitly. Treat the tool definition like a typed function signature with full documentation — because that is exactly what the model reads when it decides whether and how to call it.

The shift from “chat interface” to “AI agent” is almost entirely a story about giving models structured ways to act on the world. Every AI product that moved beyond demo to production in recent years had tool use at the core. Developers who write tight, well-typed tool schemas build products that hold up under real usage. Developers who treat it as a configuration checkbox ship chatbots that hallucinate their way through API calls.

Tool use is where the abstraction breaks down most visibly. The model outputs a call; the application runs it; something happens in the world — a booking confirmed, a message sent, a record deleted. The chain of accountability is not always clear. Who is responsible when a model calls the wrong function with plausible-looking arguments? The schema author? The prompt engineer? These questions are engineering decisions with downstream consequences that most teams defer until the first incident.