I owe my livelihood to the vast network of servers chatting with each other online. I’m heartened by web applications that send data so quickly and reliably that their physical distance is imperceptible to me, the user. When you factor in the value and variability of information passed around online, it all seems so risky and precarious. So how do servers conduct themselves when sending data to each other across the wild, wild web? What is the proper protocol for sending and receiving virus-free content?
Most of the web is based on the HyperText Transfer Protocol (HTTP) request/response model of client-initiated transactions. Your browser, the client, politely requests data from a server and the server responds to those requests. Even better, when following the Transmission Control Protocol (TCP) on which HTTP is based, the client will send an acknowledgement that the data has been received so the server can resend if the data gets lost. Because of this behavior, TCP is described as a connection based, stream-oriented protocol.
Consider the User Datagram Protocol (UDP), an alternative to TCP. While the communication is still initiated by the client, this protocol is “connectionless” or message-oriented, because the server will send data without checking on the state or readiness of the client. This is the late 90s coffeeshop slam poetry of data transfer: the server will refuse to do normal audience banter, instead shouting a jumbled bunch of datagrams and then dropping the mic.
By boldly forgoing the customary handshake, the hassle of worrying if the data has been received by the client, and any guarantee that the data will be sent in order1; UDP offers low latency data transfer for bandwidth-intensive applications that can lackadaisically tolerate packet loss and disarray, like multiplayer games or, everyone’s favorite new hobby, video conferencing.
And Now, Double Time
The test of good manners is to be able to put up pleasantly with bad ones. – Wendell Willkie
Like etiquette, data transfer protocols are complicated and evolving. As a connection-based protocol, HTTP has been called too “chatty”, because it requires so much reassuring back and forth between client and server. In contrast, UDP could be called too “venty” because it does not give a hoot about what the client has to say while communicating a torrent of information.
The chattiness of HTTP started to grate on engineers in the mid-2000s as web applications with high-frequency request-response cycles (real-time web applications if you will4) became more popular. Setting up a new connection every request/response cycle introduced too much latency. Ideally there would be a persistent connection between client and server that was bidirectional, allowing the server to send data to the client without the client having to request it.5
The most popular solution was long polling, where an XMLHttpRequest (XHR) connection to the server was held open until ongoing communication was no longer required. However, long polling was not ideal because the cookie and header data within the HTTP request made data transfer bandwidth intensive, and multiple simultaneous HTTP requests from the same client could scramble message order.
Was it possible to create a bidirectional connection based protocol with the reliability of TCP, that was low latency and message-oriented like UDP, and also built to address all the concerns that arise when operating in a web environment?
“The world was my oyster but I used the wrong fork.” – Oscar Wilde
Frustration eventually led developers Michael Carter and Ian Hickson to introduce a new standard for modern real-time, bi-directional communication on the web that they coined ‘WebSocket’6. In 2010 Google Chrome 4 was the first browser to fully support WebSockets and today all major browsers support WebSockets.
WebSockets provide a persistent, low latency connection that can support transactions initiated by either the client or server7. To establish a WebSocket connection, the client uses standard HTTP protocol to request an “upgrade” from HTTP to a WebSocket connection. If the server supports WebSockets it will respond with an HTTP/1.1 101 Switching Protocols response, which replaces the initial HTTP connection with a WebSocket connection that uses the same underlying TCP/IP connection.
In other words, WebSockets servers authenticate clients with that classic TCP handshake over HTTP and then, after upgrading to a WebSocket connection, repurpose that underlying TCP connection. The intent is to provide the thinnest possible transport layer on top of TCP while handling some web-specific concerns.
WebSockets are great for the chat applications like the one I work with at my current job because:
- They don’t require the client to poll the server for new messages, instead the server can send messages to the client whenever data becomes available.
- They use a frame-based messaging system that reduces the amount of non-payload data that is transferred, reducing latency.
- They provide a mechanism to detect dropped (disconnected) clients and can handle up to 1024 connections per browser.
A downside to using WebSockets is that they don’t automatically recover when connections are terminated. Luckily, the client-side library we use does this for us.
I enjoyed learning about WebSockets for work, and with the confidence of a beginner I wonder, would it be possible to build WebSockets on a UDP rather than TCP connection? Would this make them feel closer to the transport layer protocol the word “socket” implies, rather than a tag along to HTTP? Surprise me!
WebSockets had a contentious beginning8 and a very unstable spec for some time9. It reminds one of Princess Maria Argyropoulina, who scandalized Venice by having guests at her wedding use the novel fork in place of their God-given hands10. St. Peter Damian, a hermit and ascetic, was so offended by the princess’s daintiness that he declared her untimely death from the plague a just punishment by God for her vanity. Maybe at first WebSockets seemed like a vestigial limb growing out of HTTP, but like forks, we can’t imagine living without them now.
I wonder if there was talk about adding UDP support for the web during the browser wars raged between Netscape and Microsoft from 1995 to 2001, but I wasn’t there, I was spilling my guts to SmarterChild at my best friend’s house. ↩
While reading gossipy comparisons of TCP to UDP, it struck me that sacrificing reliability for speed comes at an exceptionally high cost in computing. Content from TCP “heads” reminds me of the steadfast popularity of ACID-compliant data stores, like PostgreSQL, despite their age and decided unfashionableness. By the way, top TCP head content is “I could tell you a joke about UDP, but you might not get it.” ↩
The term ‘real-time’ always makes me laugh. A ‘real-time’ application sounds like you are applying to become a fourth-dimensional being. ↩
You can still find the IRC chat and the W3C mailing list where they started planning out WebSockets. Personally I think the coolest part is when they talk about whether not wanting your IRC chat logged is a valid privacy concern. Someone says it’s antisocial, another (jokingly?) that it’s so not web 2.0. Prescient. ↩