OpenAI Function Calling: Don't Let Your Schema Drift

Function calling is the most useful thing OpenAI added to the API since the API itself. It's also the fastest way I've found to build something that works great in staging, ships to production, and then silently starts returning garbage two weeks later when someone changes a function signature without updating the schema. I've hit this twice now — once on a healthcare scheduling tool, once on an e-commerce order triage system — and both times it was the same root cause: schema drift.

What Function Calling Actually Does

The marketing copy calls it "tool use" and makes it sound like the model is executing your code. It isn't. What's actually happening is simpler and more useful: you hand the model a JSON Schema description of one or more functions, and when the model decides it needs to call one, it returns a tool call (the function name plus a JSON-encoded string of arguments) instead of prose. You execute the function yourself, then feed the result back into the conversation as a tool message. The model never touches your code.

That distinction matters a lot in practice. The model is making a routing decision and a parameter extraction decision. You're responsible for everything else — validation, execution, error handling, and feeding results back coherently. Once I internalized that, function calling stopped feeling magical and started feeling like a well-behaved parsing layer sitting in front of my own code.
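To make that division of labor concrete, here's a minimal sketch in plain PHP with no API client involved. The assistant message is hard-coded in the array shape the Chat Completions API uses for a tool call (an `id`, a `function.name`, and `function.arguments` as a JSON *string*). `lookup_order_stub()` is a hypothetical stand-in for real code. The model's entire contribution is that array; everything after it runs on your side.

```php
<?php
declare(strict_types=1);

// Hypothetical local implementation; the model never calls this directly.
function lookup_order_stub(array $args): array
{
    return ['found' => true, 'order_number' => $args['order_number'], 'status' => 'shipped'];
}

// An assistant tool-call message, decoded into an array.
// Note that "arguments" arrives as a JSON string, not a nested object.
$assistantMessage = [
    'role' => 'assistant',
    'content' => null,
    'tool_calls' => [[
        'id' => 'call_abc123',
        'type' => 'function',
        'function' => [
            'name' => 'lookup_order',
            'arguments' => '{"order_number":"ORD-10042"}',
        ],
    ]],
];

$call = $assistantMessage['tool_calls'][0];

// The model's routing decision: which function name it picked.
$name = $call['function']['name'];

// The model's parameter extraction: you still decode (and validate) it yourself.
$args = json_decode($call['function']['arguments'], true, 512, JSON_THROW_ON_ERROR);

// Execution happens in your code, behind your own allowlist of function names.
$result = match ($name) {
    'lookup_order' => lookup_order_stub($args),
    default => throw new InvalidArgumentException("Unknown function: {$name}"),
};

// The tool message you append to the conversation before the next API request.
$toolMessage = [
    'role' => 'tool',
    'tool_call_id' => $call['id'],
    'content' => json_encode($result, JSON_THROW_ON_ERROR),
];

echo $toolMessage['content'], PHP_EOL;
```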

Where this shines: anywhere you need the model to extract structured data from messy natural language and hand it to a real system. Patient intake parsing. Order lookup by natural-language criteria. Filtering a product catalog. Triggering a webhook with parameters derived from a user message. These are all cases where you'd otherwise be writing fragile regex or doing multiple prompt-then-parse roundtrips.

A Working Example

Here's a simplified version of what I built for an order management integration: a Laravel service class, using the openai-php/laravel facade, that wraps the OpenAI call, defines the tool schemas, and handles the response.

<?php

namespace App\Services;

use OpenAI\Laravel\Facades\OpenAI;
use App\Models\Order;
use Illuminate\Support\Facades\Log;

class OrderAssistantService
{
    private array $tools = [
        [
            'type' => 'function',
            'function' => [
                'name' => 'lookup_order',
                'description' => 'Look up an order by order number or customer email. Use this when the user wants to check status, tracking, or details of a specific order.',
                'parameters' => [
                    'type' => 'object',
                    'properties' => [
                        'order_number' => [
                            'type' => 'string',
                            'description' => 'The order number, e.g. ORD-10042',
                        ],
                        'customer_email' => [
                            'type' => 'string',
                            'format' => 'email',
                            'description' => 'Customer email address if no order number is available',
                        ],
                    ],
                    // Neither field is required on its own; lookupOrder() handles the case where both are missing.
                    'required' => [],
                    'additionalProperties' => false,
                ],
            ],
        ],
        [
            'type' => 'function',
            'function' => [
                'name' => 'cancel_order',
                'description' => 'Cancel an order that has not yet shipped. Requires explicit confirmation from the user before calling.',
                'parameters' => [
                    'type' => 'object',
                    'properties' => [
                        'order_number' => [
                            'type' => 'string',
                            'description' => 'The order number to cancel',
                        ],
                        'reason' => [
                            'type' => 'string',
                            'enum' => ['customer_request', 'duplicate', 'fraud_suspected', 'item_unavailable'],
                            'description' => 'Cancellation reason',
                        ],
                    ],
                    'required' => ['order_number', 'reason'],
                    'additionalProperties' => false,
                ],
            ],
        ],
    ];

    public function chat(array $messages): array
    {
        $response = OpenAI::chat()->create([
            'model' => 'gpt-4o',
            'messages' => $messages,
            'tools' => $this->tools,
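            'tool_choice' => 'auto',
            // Only toolCalls[0] is handled below, so ask for at most one call per turn.
            'parallel_tool_calls' => false,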
        ]);

        $choice = $response->choices[0];

        // No tool call — just return the text
        if ($choice->finishReason === 'stop') {
            return [
                'type' => 'message',
                'content' => $choice->message->content,
            ];
        }

        // Tool call requested
        if ($choice->finishReason === 'tool_calls') {
            $toolCall = $choice->message->toolCalls[0];
            $functionName = $toolCall->function->name;
            $arguments = json_decode($toolCall->function->arguments, true);

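            // Valid JSON such as "null" decodes without an error but isn't an argument array.
            if (json_last_error() !== JSON_ERROR_NONE || !is_array($arguments)) {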
                Log::error('OpenAI returned malformed tool arguments', [
                    'raw' => $toolCall->function->arguments,
                ]);
                throw new \RuntimeException('Malformed tool arguments from model');
            }

            $result = $this->dispatch($functionName, $arguments);

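            // Feed the result back. Simplified: this assumes the model answers in text here;
            // if it asks for another tool call instead, content will be null.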
            $followUp = OpenAI::chat()->create([
                'model' => 'gpt-4o',
                'messages' => array_merge($messages, [
                    $choice->message->toArray(),
                    [
                        'role' => 'tool',
                        'tool_call_id' => $toolCall->id,
                        'content' => json_encode($result),
                    ],
                ]),
                'tools' => $this->tools,
            ]);

            return [
                'type' => 'message',
                'content' => $followUp->choices[0]->message->content,
            ];
        }

        throw new \UnexpectedValueException('Unhandled finish reason: ' . $choice->finishReason);
    }

    private function dispatch(string $functionName, array $args): array
    {
        return match ($functionName) {
            'lookup_order' => $this->lookupOrder($args),
            'cancel_order' => $this->cancelOrder($args),
            default => throw new \InvalidArgumentException("Unknown function: {$functionName}"),
        };
    }

    private function lookupOrder(array $args): array
    {
        $order = null;

        if (!empty($args['order_number'])) {
            $order = Order::where('order_number', $args['order_number'])->first();
        } elseif (!empty($args['customer_email'])) {
            $order = Order::where('email', $args['customer_email'])
                ->latest()
                ->first();
        }

        if (!$order) {
            return ['found' => false, 'message' => 'No matching order found'];
        }

        return [
            'found' => true,
            'order_number' => $order->order_number,
            'status' => $order->status,
            'total' => $order->total,
            'created_at' => $order->created_at->toDateString(),
            'tracking_number' => $order->tracking_number,
        ];
    }

    private function cancelOrder(array $args): array
    {
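        $order = Order::where('order_number', $args['order_number'])->first();

        // Return a structured result instead of throwing, so a hallucinated order number
        // becomes something the model can relay to the user rather than an unhandled exception.
        if (!$order) {
            return ['success' => false, 'message' => 'No matching order found'];
        }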

        if ($order->shipped_at) {
            return ['success' => false, 'message' => 'Order has already shipped and cannot be cancelled'];
        }

        $order->update(['status' => 'cancelled', 'cancellation_reason' => $args['reason']]);

        return ['success' => true, 'order_number' => $order->order_number];
    }
}

A few things worth noting in that code. I'm checking json_last_error() on the decoded arguments — the model almost never returns malformed JSON for tool arguments, but almost never isn't never, and I'd rather throw an exception than silently pass null to a database query. I'm also defining my tools as a class property rather than inline in the method, which sets me up for the schema drift fix I'll describe next.
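If you'd rather not depend on the global `json_last_error()` state, PHP 7.3+ can throw a `JsonException` directly via `JSON_THROW_ON_ERROR`. Below is a sketch of a standalone helper (the name `decodeToolArguments()` is hypothetical; PHP 8.1+ for `array_is_list()`) that also rejects input which is valid JSON but not an argument object, such as `null` or `[1,2]`, since `json_last_error()` reports no error for those:

```php
<?php
declare(strict_types=1);

/**
 * Decode a tool call's "arguments" string into an associative array.
 * Hypothetical helper: throws instead of returning null, so bad input
 * never reaches a database query.
 */
function decodeToolArguments(string $raw): array
{
    try {
        $decoded = json_decode($raw, true, 512, JSON_THROW_ON_ERROR);
    } catch (JsonException $e) {
        throw new RuntimeException('Malformed tool arguments from model: ' . $e->getMessage(), 0, $e);
    }

    // Valid JSON is not enough: "null", "42" or "[1,2]" decode without error
    // but are not an argument object. "{}" decodes to [] and is allowed.
    if (!is_array($decoded) || ($decoded !== [] && array_is_list($decoded))) {
        throw new RuntimeException('Tool arguments must be a JSON object');
    }

    return $decoded;
}

$args = decodeToolArguments('{"order_number":"ORD-10042"}');
echo $args['order_number'], PHP_EOL;

foreach (['{"order_number":', 'null', '[1,2]'] as $bad) {
    try {
        decodeToolArguments($bad);
    } catch (RuntimeException $e) {
        echo 'Rejected: ', $e->getMessage(), PHP_EOL;
    }
}
```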

The Schema Drift Trap

Here's what bit me. I had a lookup_order function that originally only accepted order_number. A few weeks in, I added customer_email as a fallback lookup — updated the PHP method, updated the database query, done. Except I didn't update the JSON schema sitting in that service class, so the model didn't know customer_email was an option. It would either hallucinate a response or just tell the user it couldn't help. No exception, no log line, nothing. Just a quietly worse product.

The fix I landed on: treat the schema as a contract you own, not an afterthought to the implementation, and derive one from the other wherever possible so the two can't silently diverge.
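One way to derive the schema from the implementation is reflection. Here's a minimal sketch, assuming each tool handler is a function with typed, named parameters rather than a single `array $args` (a different shape from `OrderAssistantService` above). `ToolParam`, `lookup_order_handler()` and `schemaFromFunction()` are hypothetical names. A PHP 8 attribute keeps each description next to the parameter it documents, and the type map only covers scalar types.

```php
<?php
declare(strict_types=1);

// Hypothetical attribute that keeps each parameter's description next to the code.
#[Attribute(Attribute::TARGET_PARAMETER)]
final class ToolParam
{
    public function __construct(public string $description) {}
}

// Hypothetical handler: its signature is the single source of truth.
function lookup_order_handler(
    #[ToolParam('The order number, e.g. ORD-10042')] ?string $order_number = null,
    #[ToolParam('Customer email address if no order number is available')] ?string $customer_email = null,
): array {
    return ['order_number' => $order_number, 'customer_email' => $customer_email];
}

/** Build a function-calling tool definition from a function's parameter list. */
function schemaFromFunction(string $functionName, string $toolName, string $description): array
{
    $typeMap = ['string' => 'string', 'int' => 'integer', 'float' => 'number', 'bool' => 'boolean'];
    $properties = [];
    $required = [];

    foreach ((new ReflectionFunction($functionName))->getParameters() as $param) {
        $type = $param->getType();
        if (!$type instanceof ReflectionNamedType || !isset($typeMap[$type->getName()])) {
            throw new LogicException('Unsupported parameter type for $' . $param->getName());
        }

        $property = ['type' => $typeMap[$type->getName()]];
        foreach ($param->getAttributes(ToolParam::class) as $attribute) {
            $property['description'] = $attribute->newInstance()->description;
        }
        $properties[$param->getName()] = $property;

        // A parameter without a default value is required by PHP, so require it in the schema too.
        if (!$param->isOptional()) {
            $required[] = $param->getName();
        }
    }

    return [
        'type' => 'function',
        'function' => [
            'name' => $toolName,
            'description' => $description,
            'parameters' => [
                'type' => 'object',
                'properties' => $properties,
                'required' => $required,
                'additionalProperties' => false,
            ],
        ],
    ];
}

$tool = schemaFromFunction(
    'lookup_order_handler',
    'lookup_order',
    'Look up an order by order number or customer email.',
);
echo json_encode($tool, JSON_PRETTY_PRINT), PHP_EOL;
```

Adding a parameter to the handler now changes the schema automatically, which is exactly the step I skipped when I added customer_email by hand.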

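For enum-backed parameters like reason, I now pull values directly from a PHP enum or a constants array, so the schema and the validation layer share a single source of truth. Because PHP property defaults must be constant expressions, that method call can't live in the $tools property itself, so the schema moves into a builder method: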

private function buildCancelSchema(): array
{
    return [
        'type' => 'object',
        'properties' => [
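            'order_number' => [
                'type' => 'string',
                'description' => 'The order number to cancel',
            ],
            'reason' => [
                'type' => 'string',
                'enum' => CancellationReason::values(), // custom static helper on a string-backed enum, not built into PHP
                'description' => 'Cancellation reason',
            ],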
        ],
        'required' => ['order_number', 'reason'],
        'additionalProperties' => false,
    ];
}
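PHP enums don't ship a values() method, so `CancellationReason::values()` assumes a small helper on a string-backed enum. Here's a minimal sketch (PHP 8.1+; the enum is hypothetical and mirrors the four reasons in the original cancel_order schema) that builds the list with `cases()` and `array_column()`, then reuses the same enum's `tryFrom()` to validate what the model sends back:

```php
<?php
declare(strict_types=1);

// Hypothetical backed enum mirroring the reasons in the cancel_order schema.
enum CancellationReason: string
{
    case CustomerRequest = 'customer_request';
    case Duplicate = 'duplicate';
    case FraudSuspected = 'fraud_suspected';
    case ItemUnavailable = 'item_unavailable';

    /** @return list<string> Backing values in declaration order, for the JSON Schema "enum". */
    public static function values(): array
    {
        return array_column(self::cases(), 'value');
    }
}

// Schema side: the enum list is generated, never typed by hand.
$reasonSchema = ['type' => 'string', 'enum' => CancellationReason::values()];

// Validation side: tryFrom() returns null for anything outside the enum,
// so a reason the model invented is caught before it reaches the database.
$fromModel = 'customer_changed_mind';
$reason = CancellationReason::tryFrom($fromModel);

echo implode(', ', $reasonSchema['enum']), PHP_EOL;
echo $reason === null ? "Rejected reason: {$fromModel}" : "Accepted: {$reason->value}", PHP_EOL;
```

Adding a case to the enum updates both the schema and the validation in one place.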

For anything more complex, I write a feature test that sends a handful of representative natural-language inputs to the actual OpenAI API (not mocked), asserts that the expected function is called, and asserts that the returned arguments contain the expected keys. These tests run in CI against gpt-4o-mini to keep costs low. They're slow compared to unit tests (about 3-4 seconds each), but they've caught schema drift twice, and I now consider them non-optional for any production tool-use integration.
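The live API call depends on your client and credentials, so it's left out here. What follows is a sketch of the assertion half only, assuming you've already turned the model's reply into an array in the Chat Completions shape. `assertToolCall()`, the test cases, and the hard-coded replies (stand-ins for real API responses) are all hypothetical; in a Laravel app this logic would sit inside a feature test rather than a loose script.

```php
<?php
declare(strict_types=1);

/**
 * Check that an assistant message called the expected tool with at least the expected argument keys.
 * Hypothetical helper; throws so the test runner reports the failure with context.
 */
function assertToolCall(array $message, string $expectedName, array $expectedKeys): void
{
    $calls = $message['tool_calls'] ?? [];
    if ($calls === []) {
        throw new RuntimeException("Expected a call to {$expectedName}, but the model replied in prose");
    }

    $function = $calls[0]['function'];
    if ($function['name'] !== $expectedName) {
        throw new RuntimeException("Expected a call to {$expectedName}, got {$function['name']}");
    }

    $args = json_decode($function['arguments'], true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($args)) {
        throw new RuntimeException('Tool arguments are not a JSON object');
    }

    $missing = array_diff($expectedKeys, array_keys($args));
    if ($missing !== []) {
        throw new RuntimeException('Missing argument keys: ' . implode(', ', $missing));
    }
}

// Each case pairs a prompt (sent to the real API in CI) with the call the schema should produce.
$cases = [
    ['prompt' => 'Where is my order ORD-10042?', 'function' => 'lookup_order', 'keys' => ['order_number']],
    ['prompt' => "I ordered with jane@example.com, what's the status?", 'function' => 'lookup_order', 'keys' => ['customer_email']],
];

// Stand-ins for the decoded API replies to the prompts above.
$simulatedReplies = [
    ['role' => 'assistant', 'content' => null, 'tool_calls' => [[
        'id' => 'call_1', 'type' => 'function',
        'function' => ['name' => 'lookup_order', 'arguments' => '{"order_number":"ORD-10042"}'],
    ]]],
    ['role' => 'assistant', 'content' => null, 'tool_calls' => [[
        'id' => 'call_2', 'type' => 'function',
        'function' => ['name' => 'lookup_order', 'arguments' => '{"customer_email":"jane@example.com"}'],
    ]]],
];

foreach ($cases as $i => $case) {
    assertToolCall($simulatedReplies[$i], $case['function'], $case['keys']);
    echo "ok: {$case['prompt']}", PHP_EOL;
}
```

The second case is the kind that would have caught the customer_email drift: with the stale schema, the model has no field to put the email into, so the assertion fails.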

Other Gotchas

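additionalProperties: false is your friend. Setting it on every schema object tells the model to stay within the fields you've defined and makes debugging a lot cleaner when something unexpected comes back. Unless you also set 'strict' => true on the function definition, though, it's guidance rather than a hard guarantee, so your own code should still reject keys it doesn't expect.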

The required array is not optional. I've seen setups where developers leave required as an empty array, expecting the model to figure out what's actually required from the description text. Sometimes it does. Sometimes it calls the function with missing arguments and your code dies without a useful error. Be explicit: leave a field out of required only when it genuinely is optional (like the either/or fields in lookup_order), and handle its absence in code.
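Because the schema is only as good as your enforcement of it, it can pay to check the decoded arguments against the same schema array you sent. Here's a minimal sketch of a hypothetical `validateArguments()` that covers only the keywords discussed here (`required`, `additionalProperties: false`, `enum`) plus a string type check. It is not a general JSON Schema validator; for anything richer, a dedicated library is the safer choice.

```php
<?php
declare(strict_types=1);

/**
 * Check decoded tool arguments against a flat function-calling schema.
 * Returns a list of problems; an empty list means the arguments passed.
 * Supports only: required, additionalProperties: false, enum, and type "string".
 */
function validateArguments(array $schema, array $args): array
{
    $errors = [];
    $properties = $schema['properties'] ?? [];

    foreach ($schema['required'] ?? [] as $name) {
        if (!array_key_exists($name, $args)) {
            $errors[] = "missing required argument '{$name}'";
        }
    }

    foreach ($args as $name => $value) {
        if (!isset($properties[$name])) {
            if (($schema['additionalProperties'] ?? true) === false) {
                $errors[] = "unexpected argument '{$name}'";
            }
            continue;
        }

        $rule = $properties[$name];
        if (($rule['type'] ?? null) === 'string' && !is_string($value)) {
            $errors[] = "argument '{$name}' must be a string";
            continue;
        }
        if (isset($rule['enum']) && !in_array($value, $rule['enum'], true)) {
            $errors[] = "argument '{$name}' must be one of: " . implode(', ', $rule['enum']);
        }
    }

    return $errors;
}

// Same shape as the cancel_order parameters in the service class.
$cancelSchema = [
    'type' => 'object',
    'properties' => [
        'order_number' => ['type' => 'string'],
        'reason' => ['type' => 'string', 'enum' => ['customer_request', 'duplicate', 'fraud_suspected', 'item_unavailable']],
    ],
    'required' => ['order_number', 'reason'],
    'additionalProperties' => false,
];

print_r(validateArguments($cancelSchema, ['order_number' => 'ORD-10042', 'notify' => true]));
```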

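Parallel tool calls. By default, newer models can return multiple tool calls in a single response. If your dispatch logic only handles toolCalls[0], you'll miss the rest, and if you echo the full assistant message back (as the example does), the follow-up request will be rejected because the API expects a tool message for every tool_call_id in that message. For most of my use cases, I disable this with 'parallel_tool_calls' => false unless I specifically need it.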
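When you do leave parallel calls on, each entry in tool_calls needs its own tool message carrying the matching tool_call_id. Here's a minimal sketch of answering every call, using plain PHP arrays rather than the client's response objects. `answerToolCalls()` and the stand-in dispatcher are hypothetical, and turning exceptions into an `error` result instead of throwing is a design choice so that no call goes unanswered. You may prefer a generic error string over raw exception messages, which can leak internals to the model.

```php
<?php
declare(strict_types=1);

/**
 * Build one "tool" message per tool call in an assistant message.
 * $dispatch receives (function name, decoded arguments) and returns a result array.
 */
function answerToolCalls(array $assistantMessage, callable $dispatch): array
{
    $toolMessages = [];

    foreach ($assistantMessage['tool_calls'] ?? [] as $call) {
        try {
            $args = json_decode($call['function']['arguments'], true, 512, JSON_THROW_ON_ERROR);
            if (!is_array($args)) {
                throw new RuntimeException('Tool arguments must be a JSON object');
            }
            $result = $dispatch($call['function']['name'], $args);
        } catch (Throwable $e) {
            // Report the failure to the model instead of leaving this call unanswered.
            $result = ['error' => $e->getMessage()];
        }

        $toolMessages[] = [
            'role' => 'tool',
            'tool_call_id' => $call['id'],
            'content' => json_encode($result, JSON_THROW_ON_ERROR),
        ];
    }

    return $toolMessages;
}

// Hypothetical dispatcher standing in for OrderAssistantService::dispatch().
$dispatch = fn (string $name, array $args): array => match ($name) {
    'lookup_order' => ['found' => true, 'order_number' => $args['order_number'] ?? null],
    default => throw new InvalidArgumentException("Unknown function: {$name}"),
};

$assistantMessage = [
    'role' => 'assistant',
    'content' => null,
    'tool_calls' => [
        ['id' => 'call_1', 'type' => 'function', 'function' => ['name' => 'lookup_order', 'arguments' => '{"order_number":"ORD-10042"}']],
        ['id' => 'call_2', 'type' => 'function', 'function' => ['name' => 'refund_order', 'arguments' => '{}']],
    ],
];

// The assistant message goes first, then every tool message, before the follow-up request.
$messages = array_merge([$assistantMessage], answerToolCalls($assistantMessage, $dispatch));
echo count($messages), PHP_EOL;
```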

Tool descriptions are prompts. The description field isn't documentation — it's the primary signal the model uses to decide whether and how to call a function. "Look up an order" is worse than "Look up an order by order number or customer email. Use this when the user wants to check status, tracking, or details of a specific order." Vague descriptions produce unreliable routing.

When I'd Reach for This

Function calling is the right tool when you need the model to extract structured parameters from natural language and trigger deterministic code. Anything where you'd otherwise be parsing prose output with regex. Chatbots that need to query real data. Voice-to-action pipelines. Intake forms where the user might express information in a dozen different ways.

I wouldn't use it when you just need the model to generate text and you don't care about structure — plain completion is simpler and cheaper. I also wouldn't use it as a replacement for proper input validation downstream. The model extracts arguments; you still validate them against your business rules before touching a database.
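Schema checks only tell you the arguments are well-formed. Here's a minimal sketch of the business-rule layer for lookup_order, which also enforces the "at least one of these" rule that its empty required array doesn't express. The ORD- plus digits pattern is an assumption drawn from the ORD-10042 example in the schema description, and `checkLookupArgs()` is a hypothetical name. Returning a message rather than throwing lets you hand it back to the model as the tool result.

```php
<?php
declare(strict_types=1);

/**
 * Business-rule checks for lookup_order arguments, run before any database query.
 * Returns null when the arguments are acceptable, otherwise a message for the model.
 */
function checkLookupArgs(array $args): ?string
{
    // Treat missing and empty-string values the same way, as lookupOrder() does.
    $orderNumber = ($args['order_number'] ?? '') !== '' ? $args['order_number'] : null;
    $email = ($args['customer_email'] ?? '') !== '' ? $args['customer_email'] : null;

    if ($orderNumber === null && $email === null) {
        return 'Provide either an order number or a customer email.';
    }

    // Assumed order-number format, based on the ORD-10042 example in the schema description.
    if ($orderNumber !== null && (!is_string($orderNumber) || preg_match('/^ORD-\d+$/', $orderNumber) !== 1)) {
        return 'order_number must look like ORD-10042.';
    }

    if ($email !== null && (!is_string($email) || filter_var($email, FILTER_VALIDATE_EMAIL) === false)) {
        return 'customer_email is not a valid email address.';
    }

    return null;
}

foreach ([['order_number' => 'ORD-10042'], ['customer_email' => 'not-an-email'], []] as $args) {
    echo json_encode($args), ' => ', checkLookupArgs($args) ?? 'ok', PHP_EOL;
}
```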

If I'm building something where the tool set is large or changes frequently, I've started looking at MCP (Model Context Protocol) instead, where each tool's schema is published by the server that implements it rather than copied into every client. But for a bounded set of five to ten well-defined tools, function calling directly in the API is still my default.

The Bottom Line

Function calling is genuinely mature and reliable — the model is good at it. The failure modes I've seen are almost always on the engineering side: schemas that lag behind implementations, missing required fields, and no automated tests to catch either. Treat the schema as a contract you own and test it like one, and function calling becomes one of the more trustworthy layers in an LLM-backed system.