Realtime Chat Architecture with Socket.IO
Reliable chat systems depend on authorization, room membership, durable message records, presence, Redis fanout, file handling, and thoughtful recovery states.
Scaling WebSocket connections
A single Socket.IO server can handle thousands of concurrent connections, but most production chat applications eventually run multiple server instances for reliability and capacity. Multiple servers introduce a fundamental problem: a client connected to server A cannot receive messages from a client connected to server B unless there is a shared communication layer between the servers.
Redis Pub/Sub is the standard solution for Socket.IO horizontal scaling. Each server subscribes to a Redis channel for each room it has active clients in. When a message is published to a room, the server receiving the message publishes it to Redis, and all other servers subscribed to that room receive it and deliver it to their local clients. The result is that a message sent to room-123 is delivered to all clients in that room, regardless of which server they are connected to.
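The fanout described above can be sketched without a real Redis instance. The following is a minimal in-memory model, not the Socket.IO Redis adapter itself: FakeBroker, ChatServer, and their methods are hypothetical names, and the broker stands in for Redis Pub/Sub. As a simplification, the originating server also receives its own publication through its subscription instead of delivering locally first.

```typescript
// In-memory stand-in for Redis Pub/Sub. All names here are hypothetical.
type Handler = (payload: string) => void;

class FakeBroker {
  private channels = new Map<string, Set<Handler>>();

  subscribe(channel: string, handler: Handler): void {
    let handlers = this.channels.get(channel);
    if (!handlers) {
      handlers = new Set<Handler>();
      this.channels.set(channel, handlers);
    }
    handlers.add(handler);
  }

  publish(channel: string, payload: string): void {
    const handlers = this.channels.get(channel);
    if (handlers) handlers.forEach((handler) => handler(payload));
  }
}

interface Delivery { clientId: string; room: string; text: string; }

class ChatServer {
  // room -> ids of clients connected to THIS server only
  private localRooms = new Map<string, Set<string>>();
  private broker: FakeBroker;
  // Every message handed to a local client, recorded for inspection.
  readonly delivered: Delivery[] = [];

  constructor(broker: FakeBroker) {
    this.broker = broker;
  }

  join(clientId: string, room: string): void {
    let members = this.localRooms.get(room);
    if (!members) {
      members = new Set<string>();
      this.localRooms.set(room, members);
      // First local member: subscribe this server to the room's channel.
      this.broker.subscribe(room, (text) => this.deliverLocally(room, text));
    }
    members.add(clientId);
  }

  // A client on this server sends to a room: publish to the shared broker.
  send(room: string, text: string): void {
    this.broker.publish(room, text);
  }

  private deliverLocally(room: string, text: string): void {
    const members = this.localRooms.get(room);
    if (!members) return;
    members.forEach((clientId) => this.delivered.push({ clientId, room, text }));
  }
}

const broker = new FakeBroker();
const serverA = new ChatServer(broker);
const serverB = new ChatServer(broker);
serverA.join("alice", "room-123");
serverB.join("bob", "room-123");
serverB.join("carol", "room-999");
serverA.send("room-123", "hello");
```

After the send, alice (on server A) and bob (on server B) each have one delivery, and carol has none because her server never subscribed her room to that channel.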
The Socket.IO Redis adapter implements this pattern as a drop-in adapter, but understanding the underlying mechanism is important for debugging scaling issues. A message that is delivered to some clients but not others is often a symptom of adapter misconfiguration, Redis connection failures on one or more servers, or room subscription state that has gotten out of sync.
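Sticky sessions, which route a client to the same server for the duration of a session, are sometimes treated as an alternative to the Redis adapter, but they solve a different problem. Sticky routing keeps all requests for one session on one server, which Socket.IO needs when the HTTP long-polling transport is enabled behind a load balancer. It does nothing to carry a message from a client on server A to a client on server B, so cross-server delivery still requires the adapter. Sticky sessions also make load distribution less even and complicate deployment, which is why some teams restrict clients to the WebSocket transport to avoid needing them. The Redis adapter is operationally more complex to set up, but it is what produces a balanced and maintainable horizontal scaling architecture.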
Presence information (who is currently online, who is typing, who is in which room) also needs a shared layer when running multiple servers. A presence model built on in-memory state per server will give different clients different views of who is online. A Redis-backed presence model gives a consistent view across all servers: each heartbeat refreshes a key with a short expiry, so a user whose connection or server disappears drops offline automatically when the key expires.
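The heartbeat-with-expiry idea can be illustrated with a small in-memory model. This is a sketch under assumptions: PresenceStore is a hypothetical class, a Map stands in for Redis keys with a time-to-live, and the clock is injected so the expiry logic can be followed step by step.

```typescript
// Hypothetical presence store. A Map stands in for Redis keys that expire.
class PresenceStore {
  private expiresAt = new Map<string, number>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Called on connect and then periodically while the client stays connected.
  heartbeat(userId: string): void {
    this.expiresAt.set(userId, this.now() + this.ttlMs);
  }

  // A user is online only while their latest heartbeat has not expired.
  isOnline(userId: string): boolean {
    const expiry = this.expiresAt.get(userId);
    return expiry !== undefined && expiry > this.now();
  }

  onlineUsers(): string[] {
    const result: string[] = [];
    this.expiresAt.forEach((expiry, userId) => {
      if (expiry > this.now()) result.push(userId);
    });
    return result.sort();
  }
}

let clock = 0; // fake clock in milliseconds
const presence = new PresenceStore(30000, () => clock);
presence.heartbeat("alice"); // expires at 30000
clock = 20000;
presence.heartbeat("bob"); // expires at 50000
clock = 35000; // alice has missed her heartbeat window, bob has not
```

Because every server would read the same shared store, all of them agree that only bob is online at this point, without any server having to announce that alice left.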
A message is a product record
A chat message carries more than text. It is a record with an author, a timestamp, a delivery status, a read receipt, an edit history, a deletion state, a moderation state, a set of reactions, optional attachments, a search index entry, and a retention policy. Treating it as a simple string is a decision that becomes painful when the first customer asks for message history export, the compliance team asks for an audit log, or the support team needs to find a deleted message.
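One way to make the "record, not string" idea concrete is a typed message shape. The field names below are illustrative assumptions, not a prescribed schema, and the search index entry is left out because it is typically derived from the record. The helpers show edits and deletions preserving history instead of overwriting it.

```typescript
// Illustrative message record. All field names are assumptions.
type DeliveryStatus = "pending" | "sent" | "delivered" | "failed";
type ModerationState = "clean" | "flagged" | "removed";

interface ChatMessage {
  id: string;
  roomId: string;
  authorId: string;
  text: string;
  createdAt: Date;
  deliveryStatus: DeliveryStatus;
  readBy: string[]; // user ids that have read the message
  editHistory: { text: string; editedAt: Date }[];
  deletedAt: Date | null; // soft delete: the record survives for audit
  moderation: ModerationState;
  reactions: Record<string, string[]>; // emoji -> user ids
  attachmentIds: string[];
  retainUntil: Date | null; // null means no explicit retention deadline
}

function createMessage(
  id: string, roomId: string, authorId: string, text: string, now: Date
): ChatMessage {
  return {
    id, roomId, authorId, text,
    createdAt: now,
    deliveryStatus: "pending",
    readBy: [],
    editHistory: [],
    deletedAt: null,
    moderation: "clean",
    reactions: {},
    attachmentIds: [],
    retainUntil: null,
  };
}

// Editing keeps the previous text in the history instead of discarding it.
function editMessage(msg: ChatMessage, newText: string, now: Date): ChatMessage {
  return {
    ...msg,
    text: newText,
    editHistory: [...msg.editHistory, { text: msg.text, editedAt: now }],
  };
}

// Deleting marks the record; support and compliance can still find it.
function softDelete(msg: ChatMessage, now: Date): ChatMessage {
  return { ...msg, deletedAt: now };
}

const original = createMessage("m1", "room-123", "alice", "helo", new Date(0));
const edited = editMessage(original, "hello", new Date(1000));
const deleted = softDelete(edited, new Date(2000));
```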
MongoDB works well for chat messages because message shape tends to evolve — adding reactions before you expected to, supporting thread replies after launch, adding message translation on demand. Flexible schema makes those additions less costly than a rigid relational model. But flexible schema does not mean unindexed. Messages need indexes on room and timestamp for efficient pagination, on author for profile pages and moderation, and on content for full-text search if search is in scope.
Pagination strategy matters for user experience. Cursor-based pagination, which uses the ID or timestamp of the last retrieved message as the starting point for the next page, is more reliable than offset-based pagination for a dataset that is continuously appended to. An offset calculated when the user opens the chat is stale by the time they scroll to load more history: every message that arrived in the meantime shifts the newest-first ordering, so the next page repeats messages the user has already seen. If the cursor is a timestamp, pair it with the message ID as a tie-breaker, because two messages can share a timestamp.
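A minimal sketch of cursor-based pagination over an in-memory array follows. It assumes message IDs increase with creation order; pageBefore and the StoredMessage shape are hypothetical. In a database the same idea becomes a range query on an indexed field rather than a filter over an array.

```typescript
interface StoredMessage { id: number; text: string; }

interface Page {
  items: StoredMessage[];    // newest first
  nextCursor: number | null; // pass back as `before` to load older messages
}

// Returns up to `limit` messages with id lower than `before`, newest first.
// A null cursor means "start from the newest message".
function pageBefore(
  messages: StoredMessage[], before: number | null, limit: number
): Page {
  const older = messages
    .filter((m) => before === null || m.id < before)
    .sort((a, b) => b.id - a.id);
  const items = older.slice(0, limit);
  const last = items[items.length - 1];
  const hasMore = older.length > items.length;
  return { items, nextCursor: last !== undefined && hasMore ? last.id : null };
}

const room: StoredMessage[] = [1, 2, 3, 4, 5].map((id) => ({ id, text: `msg ${id}` }));

const firstPage = pageBefore(room, null, 2); // ids 5, 4
room.push({ id: 6, text: "msg 6" });         // arrives while the user is scrolling
const secondPage = pageBefore(room, firstPage.nextCursor, 2); // ids 3, 2
const thirdPage = pageBefore(room, secondPage.nextCursor, 2); // id 1
```

The message that arrives between the first and second request does not disturb the second page, because the cursor names a position in the data rather than a count from the top.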
Retention and archival policies should be designed from the start, not added when the collection size becomes a problem. Messages older than a configurable threshold can be moved to cold storage, archived to an object storage bucket, or summarized and deleted. The decision affects search scope, compliance requirements, and storage costs — all of which should be explicit product and engineering decisions, not defaults accepted by inaction.
File and media handling
Chat applications that support file sharing take on a separate set of concerns: upload processing, storage, access control, virus scanning, content moderation, CDN delivery, and retention policies. Each of these is a system concern that sits adjacent to the message delivery path but needs its own design.
The upload flow should be decoupled from the message delivery flow. A common approach is a two-step process: the client requests a pre-signed upload URL from the server, uploads directly to object storage using that URL, and then sends a message that references the stored file. This approach keeps large file payloads off the WebSocket connection and off the application server, while the application retains control over who can generate upload URLs.
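The control flow of the two-step upload can be modeled as follows. This is a sketch of the sequence only, under loud assumptions: UploadService and its methods are hypothetical, object storage is simulated in memory, and a plain counter-based token stands in for a real pre-signed URL, which would carry a cryptographic signature issued by the storage provider.

```typescript
// Hypothetical model of the flow. The token is NOT a real signature.
interface UploadTicket { fileId: string; token: string; expiresAt: number; }
interface FileMessage { authorId: string; text: string; attachmentIds: string[]; }

class UploadService {
  private tickets = new Map<string, UploadTicket>();
  private storedSizes = new Map<string, number>(); // simulated object storage
  private counter = 0;
  private members: Set<string>;
  private now: () => number;

  constructor(members: Set<string>, now: () => number) {
    this.members = members;
    this.now = now;
  }

  // Step 1: the application decides who may upload and issues a ticket.
  requestUpload(userId: string, ttlMs: number): UploadTicket {
    if (!this.members.has(userId)) throw new Error("not a room member");
    this.counter += 1;
    const ticket: UploadTicket = {
      fileId: `file-${this.counter}`,
      token: `tok-${this.counter}`,
      expiresAt: this.now() + ttlMs,
    };
    this.tickets.set(ticket.fileId, ticket);
    return ticket;
  }

  // Step 2: storage accepts the bytes directly, checking only the ticket.
  upload(fileId: string, token: string, bytes: Uint8Array): void {
    const ticket = this.tickets.get(fileId);
    if (!ticket || ticket.token !== token) throw new Error("invalid ticket");
    if (this.now() > ticket.expiresAt) throw new Error("ticket expired");
    this.storedSizes.set(fileId, bytes.length);
    this.tickets.delete(fileId); // tickets are single use in this sketch
  }

  // Step 3: the chat message carries a reference, never the file itself.
  buildMessage(userId: string, fileId: string, text: string): FileMessage {
    if (!this.storedSizes.has(fileId)) throw new Error("file not uploaded");
    return { authorId: userId, text, attachmentIds: [fileId] };
  }
}

let uploadClock = 0;
const uploads = new UploadService(new Set(["alice"]), () => uploadClock);
const ticket = uploads.requestUpload("alice", 60000);
uploads.upload(ticket.fileId, ticket.token, new Uint8Array([1, 2, 3]));
const fileMessage = uploads.buildMessage("alice", ticket.fileId, "see attached");
```

Only the small reference message would travel over the WebSocket connection; the file bytes go straight to storage in step 2.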
Access control for stored files requires more thought than most teams expect. If file URLs are publicly accessible, anyone who obtains the URL can access the file even after being removed from the chat room. If file URLs are time-limited signed URLs, the application needs to generate fresh URLs when messages are displayed, which adds latency. If files are served through an application proxy that checks room membership on each request, access control is correct but file serving becomes an application-level bottleneck.
Virus scanning and content moderation for uploaded files are often deferred until they become urgent. The practical approach is to process files asynchronously after upload — quarantine new uploads until scanned, deliver a placeholder in the message until processing is complete, and replace the placeholder with the final status when done. This model makes the upload flow fast while ensuring harmful content is caught before it is delivered.
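The quarantine-then-release model amounts to a small state machine. The sketch below uses hypothetical names (ScanStatus, applyScanResult, renderAttachment) and placeholder strings; the scanner itself is out of scope and is represented only by its boolean verdict.

```typescript
// Hypothetical attachment lifecycle: every upload starts quarantined.
type ScanStatus = "quarantined" | "clean" | "blocked";

interface Attachment { fileId: string; status: ScanStatus; }

// What the client is allowed to render for an attachment in each state.
function renderAttachment(a: Attachment): string {
  switch (a.status) {
    case "quarantined":
      return "Scanning file...";
    case "clean":
      return `download:${a.fileId}`;
    case "blocked":
      return "File removed by a safety check";
  }
}

// Applies the scanner's verdict. Only quarantined files can change state,
// so late or duplicate results cannot flip a decision that was already made.
function applyScanResult(a: Attachment, infected: boolean): Attachment {
  if (a.status !== "quarantined") return a;
  return { ...a, status: infected ? "blocked" : "clean" };
}

const uploaded: Attachment = { fileId: "f1", status: "quarantined" };
const released = applyScanResult(uploaded, false);
const rejectedFile = applyScanResult({ fileId: "f2", status: "quarantined" }, true);
```

The download reference is produced only in the clean state, which is what keeps an unscanned file from reaching other room members.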
Failure states are UX
The difference between a polished chat product and a frustrating one is often how it handles weak networks, server restarts, and partial failures. Pending, sent, delivered, failed, retrying, and offline states are not technical implementation details — they are the product's communication with the user about what is actually happening.
Server acknowledgement, not optimistic updates, should drive the client-side status. A message shown as 'sent' or 'delivered' before the server has acknowledged it will stay stuck in that state if the network drops after the client sends but before the server receives it. Showing the message as 'pending' until the server acknowledges is slightly less optimistic, but it produces correct behavior under all network conditions.
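Acknowledgement-driven status can be sketched as a client-side outbox. Outbox and its method names are hypothetical, and the transport is left out entirely. With Socket.IO, the acknowledgement callback passed to emit would be one natural place to call onAck, and a timer or timeout handler the place to call onTimeout.

```typescript
type Status = "pending" | "sent" | "failed";

interface OutgoingMessage {
  clientId: string;  // generated on the client, also usable as an idempotency key
  text: string;
  status: Status;
  serverId?: string; // assigned by the server when it acknowledges
}

class Outbox {
  private messages = new Map<string, OutgoingMessage>();

  // The UI can render the message immediately, but only as "pending".
  enqueue(clientId: string, text: string): void {
    this.messages.set(clientId, { clientId, text, status: "pending" });
  }

  // The server's acknowledgement is authoritative, even if it arrives late.
  onAck(clientId: string, serverId: string): void {
    const msg = this.messages.get(clientId);
    if (msg && msg.status !== "sent") {
      msg.status = "sent";
      msg.serverId = serverId;
    }
  }

  // No acknowledgement in time: tell the user instead of pretending.
  onTimeout(clientId: string): void {
    const msg = this.messages.get(clientId);
    if (msg && msg.status === "pending") msg.status = "failed";
  }

  // A retry resends under the same clientId so the server can deduplicate.
  retry(clientId: string): void {
    const msg = this.messages.get(clientId);
    if (msg && msg.status === "failed") msg.status = "pending";
  }

  statusOf(clientId: string): Status | undefined {
    const msg = this.messages.get(clientId);
    return msg ? msg.status : undefined;
  }
}

const outbox = new Outbox();
outbox.enqueue("c1", "hi");
outbox.onAck("c1", "srv-9");  // c1: pending -> sent

outbox.enqueue("c2", "are you there?");
outbox.onTimeout("c2");       // c2: pending -> failed
const afterTimeout = outbox.statusOf("c2");
outbox.retry("c2");           // c2: failed -> pending
outbox.onAck("c2", "srv-10"); // c2: pending -> sent
```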
Reconnection handling requires careful thought about what the client missed. When a socket reconnects, it needs to fetch messages it did not receive during the disconnection period. The naive approach, refetching the room history from the last message the client remembers receiving, works for short disconnections but is expensive for longer ones, and it can deliver duplicates if a message was received but not yet saved to local state when the connection dropped.
A practical approach is to maintain a client-side cursor — the ID or timestamp of the last confirmed message — and use that cursor to request missed messages on reconnect. The server returns only what was missed, and the client inserts it into the local state without duplicating messages that were already confirmed.
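The cursor-and-deduplicate approach can be sketched like this. It assumes message IDs increase over time and that messages arrive in order while connected; ClientTimeline and missedSince are hypothetical names, and the server is represented by a plain array.

```typescript
interface Msg { id: number; text: string; }

class ClientTimeline {
  private byId = new Map<number, Msg>();
  private cursor = 0; // id of the last confirmed message

  // Inserts a message unless it is already known. Returns true if inserted.
  receive(msg: Msg): boolean {
    if (this.byId.has(msg.id)) return false;
    this.byId.set(msg.id, msg);
    if (msg.id > this.cursor) this.cursor = msg.id;
    return true;
  }

  lastSeen(): number {
    return this.cursor;
  }

  ordered(): Msg[] {
    const all: Msg[] = [];
    this.byId.forEach((m) => all.push(m));
    return all.sort((a, b) => a.id - b.id);
  }
}

// Server side: return only what the client has not confirmed.
function missedSince(serverLog: Msg[], cursor: number): Msg[] {
  return serverLog.filter((m) => m.id > cursor);
}

const serverLog: Msg[] = [{ id: 1, text: "a" }, { id: 2, text: "b" }];
const timeline = new ClientTimeline();
serverLog.forEach((m) => timeline.receive(m));

// The client disconnects, and three messages arrive in the meantime.
serverLog.push({ id: 3, text: "c" }, { id: 4, text: "d" }, { id: 5, text: "e" });

// On reconnect, capture the cursor BEFORE processing any live events,
// otherwise a live message would advance it past the gap.
const cursorAtReconnect = timeline.lastSeen();
timeline.receive({ id: 5, text: "e" }); // live event races the catch-up request
const missed = missedSince(serverLog, cursorAtReconnect);
const inserted = missed.map((m) => timeline.receive(m));
```

Message 5 shows up twice, once live and once in the catch-up response, but the timeline stores it once because insertion is keyed by ID.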
Realtime feels magical only when the recovery paths are boring and reliable. The architecture should make reconnects, retries, missed events, and partial failures explicit parts of the product design — not edge cases discovered by users in production.