
Streaming LLM Responses in React with Server-Sent Events

A practical guide to building real-time AI chat interfaces in React using Server-Sent Events — from server-side stream creation to smooth client-side token rendering.

React · AI · Streaming · Server-Sent Events

When I first shipped an AI chat feature, I thought the hard part was the prompt engineering. I was wrong. The hard part was the UX of waiting.

If you have ever built a chat interface that waits for the entire LLM response to finish before showing anything, you know the pain. The user stares at a loading spinner for five, ten, or sometimes twenty seconds. It feels like an eternity. In the world of real-time AI, that delay is a conversion killer.

When I added streaming to the chat feature on this site, the experience changed instantly. Instead of waiting, the user sees the response appear token by token. It feels alive. It feels fast.

Here is how I implemented streaming LLM responses in React using Server-Sent Events, and why I chose this approach over WebSockets.

How Server-Sent Events Work for LLM Streaming

When developers think of real-time communication, they often jump straight to WebSockets. WebSockets are powerful, but they are also overkill for this specific use case. LLM streaming is unidirectional. The client sends a request, and the server pushes a stream of text back.

Server-Sent Events (SSE) are perfect for this. SSE is a web standard that lets a server push data to the page over a plain HTTP response. It is simpler to implement, the browser's EventSource API handles reconnections automatically, and because it is just HTTP, it is much easier to manage in serverless environments like Cloudflare Workers.

SSE is the unsung hero of AI UX. It gives you the real-time feel of WebSockets with the simplicity of a standard HTTP request.

Building the Server-Side Stream

To stream tokens, I needed to transform the response from the LLM API into a stream that the client could consume. In a TanStack Start environment, this means handling the response as a ReadableStream.

Here is how I set up the server-side handler to stream tokens:

// api.ai.chat.ts
export const POST = async ({ request }) => {
  const { prompt } = await request.json();
  const response = await llmProvider.chatStream(prompt);

  const stream = new ReadableStream({
    async start(controller) {
      const encoder = new TextEncoder();
      try {
        for await (const chunk of response) {
          const text = chunk.choices[0]?.delta?.content || '';
          controller.enqueue(encoder.encode(`data: ${JSON.stringify({ text })}\n\n`));
        }
        controller.close();
      } catch (err) {
        // If the provider throws mid-stream, error the stream instead of
        // leaving the client hanging on an open connection.
        controller.error(err);
      }
    },
  });

  return new Response(stream, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive',
    },
  });
};

This pattern converts the LLM provider's async iterator into a standard ReadableStream that adheres to the SSE format. Each chunk is prefixed with data: and terminated with two newlines.
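Concretely, if the model emitted "Hello" and " world" as separate chunks, the raw response body the client receives looks like this:

```text
data: {"text":"Hello"}

data: {"text":" world"}
```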

Consuming Streams in React

On the client side, I needed a way to read this stream and update the UI incrementally. While you can use the native EventSource API, it only issues GET requests, so I prefer fetch with a ReadableStream, which gives me full control over the method, headers, and request body.

Here is a simplified version of the hook I use to consume the stream:

// useChat.ts
export function useChat() {
  const [messages, setMessages] = useState<string>('');

  const sendMessage = async (prompt: string) => {
    const response = await fetch('/api/ai/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ prompt }),
    });

    const reader = response.body?.getReader();
    if (!reader) return;
    const decoder = new TextDecoder();
    let buffer = '';

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      // Events can be split across network chunks, so buffer until we
      // see the blank-line terminator.
      buffer += decoder.decode(value, { stream: true });
      const events = buffer.split('\n\n');
      buffer = events.pop() ?? '';

      for (const event of events) {
        if (event.startsWith('data: ')) {
          const { text } = JSON.parse(event.slice(6));
          setMessages((prev) => prev + text);
        }
      }
    }
  };

  return { messages, sendMessage };
}

This hook reads the stream, decodes the bytes into text, and parses the SSE data chunks to update the state.
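One subtlety worth calling out: network chunks are not aligned to SSE event boundaries, so naively splitting each decoded chunk on the blank-line terminator can break an event in half. The buffering logic can be factored into a small, independently testable parser. This is a sketch, and createSseParser is my name for it, not something from the hook above:

```typescript
// Incremental SSE parser. Keeps any trailing partial event in a buffer
// until the rest of it arrives in a later chunk, then emits the payload
// of each complete `data:` event.
export function createSseParser(onData: (data: string) => void) {
  let buffer = '';
  return (chunk: string) => {
    buffer += chunk;
    const events = buffer.split('\n\n');
    // The last element is either '' (chunk ended on a boundary) or an
    // incomplete event; either way, carry it over to the next call.
    buffer = events.pop() ?? '';
    for (const event of events) {
      if (event.startsWith('data: ')) onData(event.slice(6));
    }
  };
}
```

Feeding it a chunk that ends mid-event simply leaves the fragment buffered until the next chunk completes it.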

Handling Edge Cases in Production

Streaming is not just about the happy path. In production, things break. Connections drop. Users navigate away.

When I built this, I had to account for:

  1. Reconnection: If the connection drops, the client needs to know where it left off. The browser's EventSource API handles this natively by resending the Last-Event-ID header, but a fetch-based client has to implement that by hand, so for simple chat I often just let the user retry the request.
  2. Cancellation: If the user closes the chat or navigates away, the server should stop processing the LLM request. I use an AbortController to cancel the fetch request, which propagates the cancellation to the server.
  3. Backpressure: If the LLM generates tokens faster than the client can render them, the stream can buffer. The ReadableStream API handles this naturally, but it is something to keep in mind if you are doing heavy processing on the client.
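To make the cancellation point concrete, here is one way to structure a stream reader around an AbortSignal. This is a sketch under my own naming (readStream is illustrative), not the exact code from this site; in the real UI you would pass the same controller's signal to fetch so the HTTP request itself is torn down:

```typescript
// Read a streamed response body, stopping as soon as the caller aborts.
// Cancelling the reader resolves any pending read with done: true, so the
// loop exits promptly instead of waiting for the next token.
export async function readStream(
  body: ReadableStream<Uint8Array>,
  signal: AbortSignal,
  onText: (text: string) => void,
): Promise<void> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  const onAbort = () => void reader.cancel();
  if (signal.aborted) return void reader.cancel();
  signal.addEventListener('abort', onAbort, { once: true });
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      onText(decoder.decode(value, { stream: true }));
    }
  } finally {
    signal.removeEventListener('abort', onAbort);
  }
}
```

A typical caller creates one AbortController per request and calls controller.abort() when the chat closes or the component unmounts.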

Performance Wins

The impact on performance was immediate. Before streaming, the time-to-first-token (TTFT) was effectively the total response time. After implementing streaming, the TTFT dropped to a few hundred milliseconds.

The perceived latency improvement is massive. Users no longer feel like the application is frozen. They see the AI "thinking" and responding in real-time, which builds trust and keeps them engaged.
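If you want to measure this yourself, TTFT is just the gap between starting the request and the first read resolving. A rough sketch (measureTtft is my name; performance.now() is available in both browsers and recent Node):

```typescript
// Measure time-to-first-token for a streamed response body, in ms.
// Pass the timestamp captured just before fetch() for an end-to-end number.
export async function measureTtft(
  body: ReadableStream<Uint8Array>,
  startedAt: number = performance.now(),
): Promise<number> {
  const reader = body.getReader();
  await reader.read(); // resolves when the first chunk arrives
  const ttft = performance.now() - startedAt;
  reader.releaseLock();
  return ttft;
}
```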

Practical Takeaways

If you are adding AI features to your React application, do not skip streaming. It is the single most important UX improvement you can make.

  • Start with SSE: It is simpler and more reliable than WebSockets for LLM streaming.
  • Use ReadableStream: It is the standard way to handle streaming data in modern browsers and server environments.
  • Prioritize TTFT: Your goal is to get the first token to the user as fast as possible.
  • Handle Cancellation: Always provide a way to stop the stream to save server resources and improve user control.

Streaming LLM responses in React is not just a technical challenge. It is a fundamental part of building modern, responsive AI applications.
