WebSocket-Over-HTTP Protocol

The WebSocket-Over-HTTP protocol is a simple, text-based protocol for gatewaying between a WebSocket client and a conventional HTTP server.

Why?

Pushpin’s Generic Realtime Intermediary Protocol (GRIP) enables out-of-band message injection into WebSocket connections. Normally, using GRIP with WebSockets requires a WebSocket connection on both sides of the proxy:

Client <--WS--> GRIP Proxy <--WS--> Server

The GRIP Proxy is a publish/subscribe service. When the server has data to send spontaneously, it does not use its WebSocket connection to send the data. Rather, it uses an out-of-band publish command to the proxy (usually via HTTP POST). This means that the WebSocket connection between the proxy and the server is used almost exclusively for servicing incoming requests from the client.

If the communication path between the proxy and the server only needs to handle request/response interactions, then HTTP becomes a viable alternative to a WebSocket:

Client <--WS--> GRIP Proxy <--HTTP--> Server

Using HTTP for communication between the proxy and server may be easier to maintain and scale since HTTP server tools are well established. Plus, if the server is merely doing stateless RPC processing, then HTTP is arguably a respectable choice for this tier in the service.

Of course, the usefulness of this gatewaying is entirely dependent on the server having a way to send data to clients out-of-band. As such, it is recommended that the WebSocket-Over-HTTP protocol be used in combination with GRIP. Note, however, that the WebSocket-Over-HTTP protocol does not explicitly depend on GRIP.

Protocol

The gateway and server exchange WebSocket “events” via HTTP requests and responses. The following events are defined: OPEN, TEXT, BINARY, PING, PONG, CLOSE, and DISCONNECT.

Events are encoded in a format similar to HTTP chunked transfer encoding:

TEXT B\r\n
hello world\r\n

The format is the name of the event, a space, the hexadecimal encoding of the content size, a carriage return and newline, the content bytes, and finally another carriage return and newline.

For events with no content, the size and content section can be omitted:

OPEN\r\n

Events with content are TEXT, BINARY, and CLOSE. Events without content are OPEN, PING, PONG, and DISCONNECT.
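
The encoding rules above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `encode_event` and `CONTENT_EVENTS` are made up for this example, and writing the size in uppercase hex simply mirrors the examples in this document (the text does not state a required case).

```python
# Events that carry content, per the list above.
CONTENT_EVENTS = {"TEXT", "BINARY", "CLOSE"}


def encode_event(name: str, content: bytes = b"") -> bytes:
    """Encode one event as: NAME [SP hex-size] CRLF [content CRLF]."""
    head = name.encode("ascii")
    if name in CONTENT_EVENTS or content:
        # Size is the content length in bytes, written in hexadecimal.
        # Uppercase is an assumption taken from the "TEXT B" example.
        size = format(len(content), "X").encode("ascii")
        return head + b" " + size + b"\r\n" + content + b"\r\n"
    # Events with no content may omit the size and content section.
    return head + b"\r\n"
```

For instance, `encode_event("TEXT", b"hello world")` yields the bytes of the TEXT example shown earlier, and `encode_event("OPEN")` yields `OPEN\r\n`.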

An event that should not contain content MAY be encoded with content. Receivers should ignore such content. For example, this is legal:

OPEN 0\r\n
\r\n

One or more encoded events are then concatenated and placed in the body of an HTTP request or response, with content type application/websocket-events.
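
A receiver has to split such a body back into individual events. The sketch below shows one way this might be done; `decode_events` is a hypothetical helper name, and the error handling (raising `ValueError` on malformed input) is an assumption rather than something the protocol prescribes.

```python
from typing import List, Tuple

# Events that carry content; content on any other event is ignored.
CONTENT_EVENTS = {"TEXT", "BINARY", "CLOSE"}


def decode_events(body: bytes) -> List[Tuple[str, bytes]]:
    """Split a request or response body into (name, content) pairs."""
    events = []
    pos = 0
    while pos < len(body):
        eol = body.find(b"\r\n", pos)
        if eol == -1:
            raise ValueError("event line is missing its CRLF terminator")
        line = body[pos:eol].decode("ascii")
        pos = eol + 2
        name, _, size_hex = line.partition(" ")
        content = b""
        if size_hex:
            size = int(size_hex, 16)
            if size < 0:
                raise ValueError("negative content size")
            content = body[pos:pos + size]
            terminator = body[pos + size:pos + size + 2]
            if len(content) != size or terminator != b"\r\n":
                raise ValueError("truncated or unterminated event content")
            pos += size + 2
        if name not in CONTENT_EVENTS:
            # Content on a content-less event is legal but ignored.
            content = b""
        events.append((name, content))
    return events
```

An empty body decodes to an empty list, which matches the case of a request or response that carries no events.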

Example

Gateway opens connection:

POST /target HTTP/1.1
Connection-Id: b5ea0e11
Content-Type: application/websocket-events
[... any headers included by the client WebSocket handshake ...]

OPEN\r\n

Server accepts connection:

HTTP/1.1 200 OK
Content-Type: application/websocket-events
[... any headers to include in the WebSocket negotiation response ...]

OPEN\r\n

Gateway relays message from client:

POST /target HTTP/1.1
Connection-Id: b5ea0e11
Content-Type: application/websocket-events

TEXT 5\r\n
hello\r\n

Server responds with two messages:

HTTP/1.1 200 OK
Content-Type: application/websocket-events

TEXT 5\r\n
world\r\n
TEXT 1C\r\n
here is another nice message\r\n

Gateway relays a close message:

POST /target HTTP/1.1
Connection-Id: b5ea0e11
Content-Type: application/websocket-events

CLOSE 2\r\n
[... binary status code ...]\r\n

Server sends a close message back:

HTTP/1.1 200 OK
Content-Type: application/websocket-events

CLOSE 2\r\n
[... binary status code ...]\r\n
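
The two content bytes of the CLOSE event can be produced and read as shown below. This sketch assumes the status code is a 16-bit unsigned integer in network byte order, as in a WebSocket close frame (RFC 6455); the text above only says the content is a two-byte binary status code. The function names are illustrative.

```python
import struct


def encode_close(code: int) -> bytes:
    """Encode a CLOSE event carrying a two-byte status code."""
    # Assumption: big-endian unsigned 16-bit, as in RFC 6455 close frames.
    payload = struct.pack("!H", code)
    size = format(len(payload), "X").encode("ascii")
    return b"CLOSE " + size + b"\r\n" + payload + b"\r\n"


def decode_close_code(content: bytes) -> int:
    """Read the status code from the content of a CLOSE event."""
    if len(content) < 2:
        raise ValueError("CLOSE content is shorter than two bytes")
    (code,) = struct.unpack("!H", content[:2])
    return code
```

With these helpers, a normal closure (status code 1000) would be sent as `encode_close(1000)`.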

State Management

Headers of the initial WebSocket negotiation request MUST be replayed with every request made by the gateway. This means that if the client uses cookies or other headers for authentication purposes, the server will receive this data with every message.

The gateway includes a Connection-Id header which uniquely identifies a particular client connection. Servers that need to track connections can use this. In most cases, though, servers should not have to care about connections.

It is possible to bind metadata to the connection via a Set-Meta-* header. This works similarly to a cookie. The server can set a field that the gateway should echo back on all subsequent requests.

For example, a client supplies a cookie which the gateway relays across during connect:

POST /target HTTP/1.1
Connection-Id: b5ea0e11
Content-Type: application/websocket-events
Cookie: [... auth info ...]

OPEN\r\n

The server accepts the connection and binds a User field based on the cookie:

HTTP/1.1 200 OK
Content-Type: application/websocket-events
Set-Meta-User: alice

OPEN\r\n

Now, any further requests from the gateway will include a Meta-User header:

POST /target HTTP/1.1
Connection-Id: b5ea0e11
Meta-User: alice
Content-Type: application/websocket-events

TEXT 5\r\n
hello\r\n

Security note: gateways MUST NOT relay any headers from the client that are prefixed with Meta-. This prevents the client from spoofing metadata bindings. Additionally, the server needs to ensure that an incoming request came from a gateway before trusting its Meta-* headers.
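
On the gateway side, the metadata rules above might be handled roughly as follows. This is a sketch under assumptions: headers are modeled as plain dictionaries, header names are matched case-insensitively, and the function names `apply_set_meta` and `build_gateway_headers` are invented for illustration.

```python
from typing import Dict


def apply_set_meta(response_headers: Dict[str, str], bound_meta: Dict[str, str]) -> None:
    """Record Set-Meta-* fields from a server response on the connection."""
    prefix = "set-meta-"
    for name, value in response_headers.items():
        if name.lower().startswith(prefix):
            bound_meta[name[len(prefix):]] = value


def build_gateway_headers(
    client_headers: Dict[str, str],
    connection_id: str,
    bound_meta: Dict[str, str],
) -> Dict[str, str]:
    """Build the headers for a gateway request to the server."""
    headers = {}
    for name, value in client_headers.items():
        # Never relay client-supplied Meta-* headers (prevents spoofing).
        if name.lower().startswith("meta-"):
            continue
        headers[name] = value
    headers["Connection-Id"] = connection_id
    headers["Content-Type"] = "application/websocket-events"
    # Echo back every field the server has bound to this connection.
    for field, value in bound_meta.items():
        headers["Meta-" + field] = value
    return headers
```

Here `bound_meta` is per-connection state kept by the gateway. A client that sends its own `Meta-User` header has it dropped, and only the value bound by the server is forwarded.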

Keep Alives

If the server is tracking connections, it will need a way to reliably detect when connections have gone away. The gateway will try to send CLOSE or DISCONNECT events on a best-effort basis, but if such events are not received by the server then connection state may linger on the server side. To work around this, the server should enable keep alives and time out connections when they become inactive.

To enable keep alives, the server responds with a Keep-Alive-Interval header specifying a value in seconds:

HTTP/1.1 200 OK
Content-Type: application/websocket-events
Keep-Alive-Interval: 120

This header tells the gateway to make a request to the server whenever the specified amount of time has passed since the last request, even if there are no events to send to the server yet (in which case the request body will be empty).

Please note this setting is not acknowledged by the gateway, and the gateway may enforce a minimum value. For compatibility with environments where a minimum may not be known, it is recommended that a conservative value be chosen, no less than 30.
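
A server that tracks connections could combine the Connection-Id header with the keep-alive interval along these lines. This is only a sketch: the class name `ConnectionTracker`, the in-memory dictionary, and the grace factor of two intervals before a connection is considered gone are all assumptions, not part of the protocol.

```python
import time
from typing import Callable, Dict, List


class ConnectionTracker:
    """Tracks the last time each Connection-Id was seen."""

    def __init__(
        self,
        keep_alive_interval: float = 120,
        grace: float = 2.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # Assumption: a connection is considered gone once `grace`
        # keep-alive intervals pass without any request from the gateway.
        self.timeout = keep_alive_interval * grace
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def touch(self, connection_id: str) -> None:
        """Call on every gateway request, including empty keep-alives."""
        self.last_seen[connection_id] = self.clock()

    def remove(self, connection_id: str) -> None:
        """Call when a CLOSE or DISCONNECT event is received."""
        self.last_seen.pop(connection_id, None)

    def expire(self) -> List[str]:
        """Drop and return connections that have been silent too long."""
        now = self.clock()
        expired = [
            cid for cid, seen in self.last_seen.items()
            if now - seen > self.timeout
        ]
        for cid in expired:
            del self.last_seen[cid]
        return expired
```

The server would call `touch` for each incoming request, `remove` on CLOSE or DISCONNECT, and `expire` periodically to clean up state for connections whose closing events never arrived.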

Notes