Streaming Claude Responses in PHP: What Actually Works

Streaming from the Claude API in PHP is doable, but the path there has some real gotchas. Here's what I learned shipping it in production.

Streaming LLM responses is one of those features that looks trivial in Python demos and turns into a two-day detour in PHP. I integrated Claude's streaming API into a Laravel app for a document-review tool last year and came out the other side with a working pattern — and a list of things that will quietly destroy you if you don't know about them.

What Streaming Actually Buys You

The non-streaming version of the Claude API is simple: send a request, wait, get a response. For short replies that's fine. For anything substantive — summaries, analysis, longer text — you're staring at a blank screen for 10+ seconds. Users assume it's broken.

Streaming sends tokens back as they're generated. The model starts writing, the browser starts showing text, and the experience feels fast even if the total time is the same. It's table stakes for any user-facing LLM feature in 2025.

The Claude API streams over Server-Sent Events (SSE): an HTTP response made up of event: and data: lines, where each data: line carries a JSON payload. That's the right tool for this job. The problem is that Guzzle, the HTTP client most PHP apps reach for, downloads the whole response body before returning unless you tell it otherwise, and Laravel's response cycle wants to buffer everything and then send it. You have to work against both of those defaults.
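To make the wire format concrete, here's a minimal sketch of what a few frames look like and how to pull the text out of them. The sample is hand-written and abridged rather than captured output, and extractTextDeltas is a name I made up for illustration.

```php
<?php

// Abridged, hand-written sample of the SSE wire format (illustrative only).
$raw = <<<'SSE'
event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: ping
data: {"type":"ping"}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":", world"}}

event: message_stop
data: {"type":"message_stop"}

SSE;

/**
 * Hypothetical helper: returns the text deltas found in a block of raw SSE text.
 *
 * @return string[]
 */
function extractTextDeltas(string $raw): array
{
    $chunks = [];

    foreach (preg_split('/\r?\n/', $raw) as $line) {
        // "event:" lines and the blank separators between frames carry no payload.
        if (! str_starts_with($line, 'data: ')) {
            continue;
        }

        $event = json_decode(substr($line, 6), true);

        if (! is_array($event)) {
            continue;
        }

        // Pings and lifecycle events have a type but no text delta.
        if (($event['type'] ?? '') === 'content_block_delta'
            && ($event['delta']['type'] ?? '') === 'text_delta'
        ) {
            $chunks[] = $event['delta']['text'];
        }
    }

    return $chunks;
}

echo implode('', extractTextDeltas($raw)), PHP_EOL; // Hello, world
```

Only two of the five frames carry text. Everything else is bookkeeping that the parser has to tolerate and skip.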

The Pattern That Works

I'll show you the actual shape of what I ship. There are three pieces: the service class that talks to Claude, a Laravel controller that streams to the browser, and a bit of JavaScript to consume it.

The Service: Stream with Guzzle, Process Line by Line

Guzzle supports streaming if you ask for it explicitly. The key options are 'stream' => true and a read_timeout that's long enough to cover the longest pause between chunks.

<?php

namespace App\Services;

use GuzzleHttp\Client;
use GuzzleHttp\RequestOptions;
use Generator;

class ClaudeStreamService
{
    private Client $client;
    private string $apiKey;
    private string $model;

    public function __construct()
    {
        $this->apiKey = config('services.anthropic.api_key');
        $this->model  = config('services.anthropic.model', 'claude-opus-4-5');

        $this->client = new Client([
            'base_uri' => 'https://api.anthropic.com',
            'timeout'  => 0,        // No total timeout — the stream can run long
            'read_timeout' => 120,  // But bail if we stop getting bytes
        ]);
    }

    /**
     * Yields content delta strings as they arrive from the API.
     */
    public function streamMessage(string $prompt, int $maxTokens = 1024): Generator
    {
        $response = $this->client->post('/v1/messages', [
            RequestOptions::HEADERS => [
                'x-api-key'         => $this->apiKey,
                'anthropic-version' => '2023-06-01',
                'content-type'      => 'application/json',
                'accept'            => 'text/event-stream',
            ],
            RequestOptions::JSON => [
                'model'      => $this->model,
                'max_tokens' => $maxTokens,
                'stream'     => true,
                'messages'   => [
                    ['role' => 'user', 'content' => $prompt],
                ],
            ],
            RequestOptions::STREAM => true,
        ]);

        $body = $response->getBody();

        while (! $body->eof()) {
            $line = $this->readLine($body);

            if (! str_starts_with($line, 'data: ')) {
                continue;
            }

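            $json  = substr($line, 6);
            $event = json_decode($json, true);

            // Skip payloads that are not valid JSON objects.
            if (json_last_error() !== JSON_ERROR_NONE || ! is_array($event)) {
                continue;
            }

            $type = $event['type'] ?? '';

            // The Messages API ends a stream with a message_stop event, not "[DONE]".
            if ($type === 'message_stop') {
                break;
            }

            // Failures after the stream has started arrive as an "error" event.
            if ($type === 'error') {
                throw new \RuntimeException($event['error']['message'] ?? 'Claude stream error');
            }

            // Only text deltas carry output; pings and other events are ignored.
            if ($type === 'content_block_delta'
                && ($event['delta']['type'] ?? '') === 'text_delta'
            ) {
                yield $event['delta']['text'];
            }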
        }
    }

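    /**
     * Reads up to the next newline and returns the line without its line ending.
     */
    private function readLine(\Psr\Http\Message\StreamInterface $stream): string
    {
        $line = '';

        while (! $stream->eof()) {
            $char = $stream->read(1);

            if ($char === '') {
                // An empty read without EOF usually means read_timeout elapsed.
                if ($stream->getMetadata('timed_out')) {
                    throw new \RuntimeException('Timed out waiting for data from the API');
                }

                continue;
            }

            if ($char === "\n") {
                break;
            }

            $line .= $char;
        }

        return rtrim($line, "\r");
    }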
}

A few things worth calling out. 'timeout' => 0 is intentional: Guzzle's total timeout will kill a long-running stream before it finishes. You want read_timeout instead, which only comes into play if the server goes silent. Reading one byte at a time in readLine looks naive, but the PHP stream underneath is buffered, so most read(1) calls are served from memory rather than hitting the socket. It works fine in practice and keeps the line-parsing logic dead simple.

The Controller: StreamedResponse Is the Key

This is where most PHP examples I found online were wrong. They'd echo inside a controller action and hope for the best. That doesn't work reliably — output buffering, nginx gzip, and Laravel's middleware stack will all conspire to batch your output.

The right answer is Laravel's response()->stream(), which returns a Symfony StreamedResponse:

<?php

namespace App\Http\Controllers;

use App\Services\ClaudeStreamService;
use Illuminate\Http\Request;
use Symfony\Component\HttpFoundation\StreamedResponse;

class ClaudeController extends Controller
{
    public function stream(Request $request, ClaudeStreamService $claude): StreamedResponse
    {
        $prompt = $request->validate(['prompt' => 'required|string|max:4000'])['prompt'];

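        return response()->stream(function () use ($prompt, $claude) {

            // ob_flush() raises a notice when no output buffer is active, so guard it.
            $flush = static function (): void {
                if (ob_get_level() > 0) {
                    ob_flush();
                }
                flush();
            };

            foreach ($claude->streamMessage($prompt) as $chunk) {
                // SSE format: each message is "data: ...\n\n"
                echo 'data: ' . json_encode(['text' => $chunk]) . "\n\n";
                $flush();
            }

            // "[DONE]" is our own end-of-stream marker for the browser, not something Claude sends.
            echo "data: [DONE]\n\n";
            $flush();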

        }, 200, [
            'Content-Type'      => 'text/event-stream',
            'Cache-Control'     => 'no-cache',
            'X-Accel-Buffering' => 'no',   // Critical for nginx
            'Connection'        => 'keep-alive',
        ]);
    }
}

X-Accel-Buffering: no is the header that cost me an afternoon. Without it, nginx buffers the output coming from PHP and forwards it in large chunks (or all at once for a short reply), which completely defeats the point. Add it and the stream starts flowing immediately.

The Browser Side

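const response = document.getElementById('response');
// A bare `prompt` in the browser is window.prompt, so read the user's text explicitly.
// The 'prompt' element ID is a placeholder; use whatever your form calls it.
const promptText = document.getElementById('prompt').value;
const source   = new EventSource('/claude/stream?prompt=' + encodeURIComponent(promptText));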

source.onmessage = (event) => {
    if (event.data === '[DONE]') {
        source.close();
        return;
    }

    const parsed = JSON.parse(event.data);
    response.textContent += parsed.text;
};

source.onerror = () => {
    source.close();
};

For anything beyond a quick prototype I'd use the Fetch API with a ReadableStream instead of EventSource — it gives you better control over the connection lifecycle, lets you pass POST bodies, and handles auth headers cleanly. But EventSource is fine for getting something working.

Gotchas That Will Bite You

Output buffering at every layer. PHP has it, nginx has it, and a CDN such as Cloudflare (if it's in the path) can add its own. The three mitigations: ob_flush() + flush() in PHP, X-Accel-Buffering: no for nginx, and making sure nothing in front of your server buffers or compresses text/event-stream responses. Miss any one of them and tokens batch up before reaching the browser.
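For the PHP layer, a minimal sketch of the two helpers I'd want around: one that unwinds any output buffers already open when the stream starts, and one that writes a frame and flushes without tripping over a missing buffer. Both function names are hypothetical, not part of Laravel or PHP.

```php
<?php

/** Closes every active output buffer, pushing its contents toward the client. */
function drainOutputBuffers(): void
{
    while (ob_get_level() > 0) {
        if (! @ob_end_flush()) {
            break; // Buffer cannot be removed; stop rather than loop forever.
        }
    }
}

/** Writes one SSE frame and flushes whatever buffering is still in the way. */
function sendSseFrame(array $payload): void
{
    echo 'data: ' . json_encode($payload) . "\n\n";

    // ob_flush() raises a notice if no buffer is active, so check first.
    if (ob_get_level() > 0) {
        ob_flush();
    }

    flush();
}
```

In the controller you'd call drainOutputBuffers() once at the top of the stream callback and sendSseFrame() for each chunk. This only handles PHP's own buffers; nginx and anything in front of it still need their own settings.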

The timeout => 0 vs read_timeout distinction. Guzzle's timeout is the total request time. Set it to 30 seconds and any response that takes longer gets killed with no error that makes sense. Set it to 0 (unlimited) and control it with read_timeout — a 120-second read_timeout will fire if the stream stalls, which is what you actually want to catch.

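JSON parsing errors in the stream. Not every line is a text delta. Each frame has an event: line ahead of its data: line, and the Claude API periodically sends ping events whose payload is just {"type": "ping"}, with no delta at all. Skip anything that doesn't start with data:, guard your json_decode call with a JSON_ERROR_NONE check, and look at the event type before touching delta. The checks in my loop handle this, but I've seen code that just blindly reads $event['delta']['text'] and dies on a ping.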

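PHP's max execution time. If max_execution_time is set to 30 or 60 seconds in your php.ini (common on shared hosting or conservative VPS setups), long responses can get hard-killed mid-stream. Either increase it or call set_time_limit(0) inside the streamed response callback. On Linux that limit doesn't count time spent waiting on stream reads, so also check the wall-clock limits around PHP, such as PHP-FPM's request_terminate_timeout and nginx's fastcgi_read_timeout.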

Connection drops from the client. EventSource auto-reconnects, which sounds helpful but will re-fire your endpoint and start a new Claude request. If that matters to your use case, track a session/request ID and handle reconnects explicitly.
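One way to handle it, sketched under assumptions: tag each frame with an id: field so the browser sends a Last-Event-ID header when it reconnects, and answer any request carrying that header with 204 No Content, which tells EventSource to stop retrying. The function names are hypothetical, and in Laravel the header value would come from $request->header('Last-Event-ID').

```php
<?php

/** Formats one SSE frame; the optional id lets the browser report where it left off. */
function sseFrame(array $payload, ?string $id = null): string
{
    $frame = $id !== null ? "id: {$id}\n" : '';

    return $frame . 'data: ' . json_encode($payload) . "\n\n";
}

/**
 * Picks the HTTP status for an incoming stream request.
 * A Last-Event-ID header means the browser is reconnecting, not starting fresh.
 */
function statusForStreamRequest(?string $lastEventId): int
{
    return $lastEventId === null ? 200 : 204;
}

echo sseFrame(['text' => 'Hello'], '1');
```

This policy refuses to resume rather than replaying the missed text, so a dropped stream means the user has to ask again. That's a deliberate trade: one visible failure instead of a silent second API call.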

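Rate limits surface strangely in streams. A 429 comes back as the HTTP status before any stream exists, and with Guzzle's default http_errors setting the post() call throws a ClientException instead of handing you a response. Because streamMessage is a generator, that exception doesn't fire until the first iteration, which happens inside the streamed response callback, after Laravel has already sent a 200 and the SSE headers. Failures after the stream has started (an overloaded API, for instance) arrive as an error event in the stream, and then the connection closes. Wrap the foreach over streamMessage in a try/catch, send the browser an explicit error message when it fires, and treat an error event as a failure rather than skipping past it.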
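Here's a minimal sketch of that error handling, written against plain arrays so it runs without Guzzle. It assumes mid-stream failures arrive as a decoded payload with type "error" and a nested error object, which is how the API documents them. textOrFail and relay are hypothetical names.

```php
<?php

/**
 * Passes text deltas through and throws as soon as an error event shows up.
 *
 * @param iterable<array> $events decoded "data:" payloads
 */
function textOrFail(iterable $events): Generator
{
    foreach ($events as $event) {
        $type = $event['type'] ?? '';

        if ($type === 'error') {
            $kind = $event['error']['type'] ?? 'unknown_error';
            $msg  = $event['error']['message'] ?? 'no message';

            throw new RuntimeException("{$kind}: {$msg}");
        }

        if ($type === 'content_block_delta'
            && ($event['delta']['type'] ?? '') === 'text_delta'
        ) {
            yield $event['delta']['text'];
        }
    }
}

/** Runs the stream and reports a failure as a final frame instead of going silent. */
function relay(iterable $events, callable $emit): void
{
    try {
        foreach (textOrFail($events) as $text) {
            $emit(['text' => $text]);
        }
    } catch (Throwable $e) {
        // Log $e server-side; the browser only needs to know the stream failed.
        $emit(['error' => 'The response was interrupted. Please try again.']);
    }
}
```

The same try/catch also covers exceptions thrown by the HTTP client on the first iteration, since they surface inside the foreach. The browser code then needs one extra branch for payloads that carry an error key.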

When I'd Reach for This

Any user-facing text generation where the output is longer than a sentence or two. Document summaries, draft generation, analysis of uploaded content — anything where the user is watching text appear. The perceived performance difference is dramatic, and users who've been trained by ChatGPT expect streaming now.

I'd also reach for this when I'm generating structured output incrementally — JSON chunks as they arrive, for example — though that requires more careful stream-parsing logic.
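As a rough sketch of what that parsing can look like, assume the prompt asks the model for one JSON object per line (JSON Lines). Text deltas can split an object anywhere, so the idea is to hold back the unfinished tail and only decode complete lines. JsonLinesAssembler is a hypothetical class, and the one-object-per-line format is my assumption, not something the API enforces.

```php
<?php

final class JsonLinesAssembler
{
    /** Text received so far that does not yet end in a newline. */
    private string $pending = '';

    /**
     * Feeds one text delta in and returns the objects it completed.
     *
     * @return array[]
     */
    public function push(string $delta): array
    {
        $this->pending .= $delta;
        $done = [];

        while (($pos = strpos($this->pending, "\n")) !== false) {
            $line = trim(substr($this->pending, 0, $pos));
            $this->pending = substr($this->pending, $pos + 1);

            if ($line === '') {
                continue;
            }

            $decoded = json_decode($line, true);

            // Lines that are not valid JSON objects are dropped, not fatal.
            if (is_array($decoded)) {
                $done[] = $decoded;
            }
        }

        return $done;
    }
}

$assembler = new JsonLinesAssembler();

foreach (['{"id":1,"ok"', ":true}\n{\"id\":2", "}\n"] as $delta) {
    foreach ($assembler->push($delta) as $object) {
        echo json_encode($object), PHP_EOL;
    }
}
```

A final object with no trailing newline stays in the buffer, so a real version would also want a way to flush the tail once the stream ends.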

When I wouldn't: Internal batch jobs. If you're processing 500 documents overnight and writing results to a database, streaming adds complexity with zero user-facing benefit. Use the standard messages endpoint, batch with a queue, and call it done.

The Bottom Line

Streaming Claude responses in PHP is not hard, but the surface area of things that silently break it is larger than I expected. Get the Guzzle options right, use StreamedResponse, and add that nginx header — those three things will save you most of the pain. The rest is just careful reading of a byte stream, which PHP can absolutely do.
