Supporting WebRTC in the enterprise

Nexus

Discovery

A key piece of information that needs to be established during the signaling process is the IP address and port number on which the endpoint can receive a media stream from the remote endpoint, or the media return address. This process as a whole is ICE, and discovery is the first part, in which each endpoint determines a list of IP addresses and port numbers (ICE candidates) that the remote endpoint might be able to use to send media back. In an ideal world, an ICE candidate would simply be the host's own private IP address and a free port of its own choosing, and in the simple scenario of two endpoints communicating across the same LAN, that is indeed what would be used. The chance that method will work just fine is small but definite, so WebRTC will include all of the device's IP addresses as ICE candidates. However, it is more likely that each client is behind a different NAT, and the remote endpoint will not be able to route media packets back to the local endpoint's private IP address.

The local endpoint needs a way to figure out what public IP address/port number it ought to give the remote endpoint. The correct answer to this problem depends, among other things, on whether the local Internet connection uses domestic-grade cone NAT (in which a single public IP address/port pair allows packets from any source to reach a specific inside host) or commercial-grade symmetric NAT (in which access to an internal host via a given IP address/port combination is restricted to the single public host to which the internal host initiated communication beforehand). This gets complicated, because an endpoint – especially a browser on an end-user's device – does not know the topology that exists between it and the public Internet. This problem is by no means unique to WebRTC and was solved a long time ago in the SIP/VoIP world by means of STUN and TURN servers. RTCPeerConnection takes full advantage of this same technology. See the "STUN and TURN" box for information on how it works.

STUN and TURN

A STUN server is simply a public service accessible on the standard UDP port 3478 that a private host can query to work out its public-facing IP address and NAT port. The private host then supplies that information to remote peers by means of a signaling channel (Figure 10). This second, "reflexive" ICE candidate will work just fine if the NAT gateway behind which the endpoint sits is a full-cone NAT, which typically will be most domestic-grade routers.

Figure 10: Peer-to-peer across the Internet; STUN servers are used for discovery.

STUN's approach does not work for symmetric NAT. Although the local endpoint's STUN query revealed the public IP address and port number that was used to connect to the STUN server itself, that information is of no use to the remote endpoint: If it attempts to send media back to that ICE candidate, the media will not make it back to the local endpoint because the symmetric NAT would (correctly) block it.

A TURN server is like a STUN server (the standard port and request are the same) that also takes on the task of relaying the live remote media back into the NAT pinhole that the endpoint created with its initial query. This third ICE "relay candidate" type comprises the TURN server's own public IP address and a port of its choosing – maybe a set of credentials, as well, to give the endpoint increased control over who can send media to it. When the TURN server receives the remote endpoint's media on this port, it will relay that to the existing (symmetric) NAT pinhole, which it already has with the endpoint.

Figure 11 shows how media can be routed using a TURN server, as well as how the application can take advantage of media relay to provide additional functionality that, in fact, is a necessary part of the way most enterprise-grade WebRTC apps operate (recording being a typical use case that would require all the media to pass through a central point). The app can force the media routing straight to a chosen TURN server simply by specifying it in the parameters it supplies to its RTCPeerConnection call. It will often take the form of three alternatives, which, in order of decreasing desirability and increasing likelihood of being allowed through a corporate firewall, are TURN/UDP, TURN/TCP, and TURN/TLS.

Figure 11: All media streams are relayed via a centralized TURN server, which can also provide multipoint and recording functionality.

The discovery of reflexive and relay candidates yields potential points of failure because it relies on connections to external services on not-that-well-known ports. Allowing access to the standard STUN/TURN port 3478 (and its TLS equivalent 5349) will usually solve this problem. Some services (e.g., Google Hangouts) use other ports. The application server might tell RTCPeerConnection which STUN/TURN service(s) to connect to during the signaling process (Figure 12). Although the default transport for STUN requests is UDP, as with SIP, the request URL allows for a different transport to be specified. In Wireshark, the display filter stun reveals the STUN traffic (Figure 13).

Figure 12: The webrtc-internals page showing specified TURN servers.

Figure 13: Using Wireshark to troubleshoot STUN connections.

Sometimes you will find that STUN/TURN traffic is being sent to standard ports (e.g., 80 and 443) but failing nonetheless. In this case, a web proxy is a common culprit; for example, the web proxy might be blocking the request for authentication reasons.

The webrtc-internals page also shows useful ICE information. In Figure 13, the app is requiring WebRTC to use its chosen TURN servers to forcibly route the media via the application's server (because this WebRTC app needs to make a recording of the conference and transcode the video codec for maximum inter-browser interoperability).

Network Bandwidth

The outcome of a successful signaling and discovery phase is an agreement between the endpoints as to what type of media to send to what IP address:port destination. The endpoints now start to transmit media tracks to each other, and if the ICE process has gone according to plan, that media should arrive at its destination within a few milliseconds of being transmitted. However, scope for user frustration still exists in the form of media quality problems. For a smooth conversation, media packets need to travel between endpoints with as little loss and delay as possible – any latency or jitter of more than about 150ms starts to become noticeable, and 300ms (in one direction) is regarded by IETF as the maximum acceptable amount.

When starting to send a media track, WebRTC implementations use a BWE algorithm to probe rapidly upward to the highest possible bit rate that the network seems able to carry without incurring unacceptable levels of loss and delay, while also allowing a high initial bit rate so that the media starts off at a high quality. On an unconstrained network, and in the absence of any additional video resolution limits requested by getUserMedia, that rate will be approximately 2.5Mbps for a 1080p video camera or screen-share channel.

When the media flow is established, the endpoints will then implement rate control by gathering statistics for packet delay and packet loss. Up to a point, steady delay will not cause the transmitted bit rate to drop significantly, but when the amount of delay varies significantly from one packet to the next (i.e., high jitter), the endpoints will start to reduce their transmitted bit rates rapidly to just a few hundred kilobits per second, resulting in blurry video (Figure 14a and b).

Figure 14: (a) No packet loss. (b) Steady packet loss.

Packet loss of more than about 10% will cause the same reduction in resolution. Figure 15 clearly shows Chrome's response to the sudden onset of 10% packet loss. The green spike is an increase in bit rate caused by retransmissions, followed by a very large reduction in transmitted bit rate, as determined by its congestion control algorithm. The same effect is observable when high jitter is reported by the far end. WebRTC reacts in a very conservative way, drastically reducing the bit rate of its transmitted video streams when network congestion exceeds certain thresholds, for the sake of keeping the streams going consistently at a lower quality rather than allowing frozen images or video artifacts.

Figure 15: A spike and sharp drop in transmitted video bit rate after introducing a 10% packet loss.

One consequence of the adoption of WebRTC on desktop and mobile devices is that every network those devices connect to, be it the main data network or the guest WiFi network, has to be able to supply the high bandwidth and low latency quality of service that real-time video communication needs. The traditional practice of connecting dedicated video conferencing equipment and VoIP phones to a dedicated VLAN, separate from the data VLAN used by people's desktops, is becoming irrelevant, and network administrators will need to bear this in mind when designing their networks and allocating bandwidth.

Identifying the underlying cause(s) of network packet loss is much more difficult than identifying the fact that it is occurring. Unlike a firewall problem, where the exact device whose configuration needs to be fixed is usually obvious, the network connection between two endpoints relies on many individual hops for each to behave perfectly. The culprit is frequently one of the "ends" of the connection – the browser's own network connection, which might be a weak WiFi signal; a slightly suspect cable that works just fine for downloading a web page, but falls apart under the bit rate demands of live high-definition video; or a heavily loaded Internet gateway full of cross-traffic. Before undertaking a more systematic troubleshooting process, you should focus initial troubleshooting efforts on these connections. However, eventually you will come to a point at which you need forensic identification of packet loss causes, and here is a process that can help:

Use the webrtc-internals page or a tool like Wireshark (Figure 16) to identify the IP address to which the browser is trying to send real-time media. The culprit will usually, but not always, be a UDP stream; the short packets in this example are audio and the long ones are camera video or screen share. Note that different-sized packets can be treated differently by routers, so your user, or their remote peer, might be experiencing a clean audio connection but terrible video.

Figure 16: The public IP address/port combination is the successful ICE candidate.

Run a traceroute (MTR) tool with the host set to the public source/destination of the WebRTC media stream. These results need to be interpreted; for example, some individual hops will be configured to ignore all ICMP traffic anyway, resulting in 100% apparent packet loss. You are looking for levels of packet loss that start at a certain hop and then propagate all the way to the destination host. The hop at which the loss starts is the one that needs attention. In Figure 17, the loss starts at my Internet gateway, because I deliberately introduced it with a (network emulation) netem rule.

Figure 17: WinMTR can help you establish which router hop is causing packet loss (e.g., hop 2, in this case).

Of course, the problem could be at either end of the connection – in the sending browser's upload or the receiving browser's download – and the user experiencing the media quality problems is often not the user in whose network the packet loss originates. This need for a coordinated effort is one of the things that makes packet loss troubleshooting difficult. The bottom line is that, when many users on a network start using WebRTC apps that each try to transmit and receive multiple 2.5Mbps video streams, Internet connection upgrades are often the outcome.

WebRTC Test Tool

All of the main topics discussed here – device availability, browser capabilities, connection establishment and signaling, and network performance – can be neatly tested at a high level by a single tool on the WebRTC.org website [6]. WebRTC.org is the home of the WebRTC open source project, and this tool is a reliably up-to-date method of determining compatibility with the latest WebRTC capabilities. The tool runs through each of the main categories and reports on each of the problems it finds. Figure 18 shows an example of its output. If it highlights any problems on your users' browsers, the explanations here should help you isolate the source.