VoIP Protocols

There are a number of protocols that may be employed to provide VoIP communication services. In this section, we focus on those most commonly implemented in the devices already deployed or being deployed today.

Virtually every VoIP device uses a standard called the Real-time Transport Protocol (RTP) to carry audio and video packets between communicating computers. RTP is defined by the IETF in RFC 3550. RFC 3551 defines the payload formats for a number of codecs, and payload formats for other codecs are specified in other IETF RFCs and in documents published by the ITU. RTP also addresses packet ordering: each packet carries a sequence number and a timestamp, so the receiver can detect loss, restore the original order, and reconstruct timing. Its companion, the RTP Control Protocol (RTCP, also defined in RFC 3550), reports reception statistics such as packet loss and interarrival jitter, giving endpoints a mechanism to help address delay and jitter.
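To make the sequence-number and timestamp fields concrete, the following Python sketch parses the fixed 12-byte RTP header laid out in RFC 3550 (Section 5.1) and keeps a running interarrival-jitter estimate using the RFC 3550 formula J = J + (|D| - J)/16. The names `RtpHeader`, `parse_rtp_header`, and `JitterEstimator` are hypothetical, chosen for illustration. The sketch assumes arrival times are already converted to RTP timestamp units, and it does not handle CSRC lists, header extensions, or 32-bit timestamp wraparound.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class RtpHeader:
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the fixed 12-byte RTP header (RFC 3550, Section 5.1)."""
    if len(packet) < 12:
        raise ValueError("packet is shorter than the fixed 12-byte RTP header")
    # Network byte order: two flag/type bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


class JitterEstimator:
    """Running interarrival jitter estimate using the RFC 3550 formula.

    Assumption: arrival times are already in RTP timestamp units;
    32-bit timestamp wraparound is ignored (a simplification).
    """

    def __init__(self) -> None:
        self.jitter = 0.0
        self._prev_transit: Optional[int] = None

    def update(self, arrival: int, rtp_timestamp: int) -> float:
        transit = arrival - rtp_timestamp
        if self._prev_transit is not None:
            d = abs(transit - self._prev_transit)
            self.jitter += (d - self.jitter) / 16.0
        self._prev_transit = transit
        return self.jitter


# Usage: a version-2 packet with payload type 0 (PCMU in RFC 3551).
pkt = struct.pack("!BBHII", 0x80, 0x00, 1000, 160000, 0x12345678) + b"\xff" * 160
hdr = parse_rtp_header(pkt)
print(hdr.version, hdr.payload_type, hdr.sequence_number)  # 2 0 1000
```

A receiver would typically feed each parsed header's sequence number into a jitter buffer for reordering and pass its timestamp, together with the local arrival time, to the jitter estimate.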

One concern for people communicating over the Internet is the potential for someone to eavesdrop on a conversation. To address this, the IETF defined Secure RTP (SRTP, RFC 3711), a profile of RTP that provides encryption, message authentication, and integrity protection (including protection against replayed packets) for the audio and video packets exchanged between communicating devices.
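As one illustration of SRTP's integrity protection, the sketch below computes and verifies the default SRTP authentication tag described in RFC 3711: HMAC-SHA1 over the authenticated portion of the packet (RTP header plus encrypted payload) concatenated with the 32-bit rollover counter (ROC), truncated to 80 bits. It assumes the session authentication key has already been produced by the RFC 3711 key-derivation function, and it omits encryption (AES counter mode) and replay-list handling entirely. The function names are hypothetical.

```python
import hashlib
import hmac
import struct

AUTH_TAG_LEN = 10  # 80-bit tag, the default for HMAC-SHA1 in RFC 3711


def srtp_packet_index(roc: int, seq: int) -> int:
    """48-bit SRTP packet index: i = 2^16 * ROC + SEQ."""
    return (roc << 16) | (seq & 0xFFFF)


def srtp_auth_tag(auth_key: bytes, authenticated_portion: bytes, roc: int) -> bytes:
    """HMAC-SHA1 over (authenticated portion || ROC), truncated to 80 bits."""
    msg = authenticated_portion + struct.pack("!I", roc)
    return hmac.new(auth_key, msg, hashlib.sha1).digest()[:AUTH_TAG_LEN]


def srtp_verify(auth_key: bytes, authenticated_portion: bytes, roc: int, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = srtp_auth_tag(auth_key, authenticated_portion, roc)
    return hmac.compare_digest(expected, tag)


# Usage with a placeholder key; a real key comes from SRTP key derivation.
key = bytes(20)  # 160-bit session authentication key (all zeros, illustration only)
portion = b"\x80\x00\x03\xe8" + b"\x00" * 8 + b"encrypted-payload"
tag = srtp_auth_tag(key, portion, roc=0)
print(len(tag), srtp_verify(key, portion, 0, tag))  # 10 True
```

Including the ROC in the HMAC input means a packet replayed after the 16-bit sequence number wraps does not carry a valid tag for the new index.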

Before audio or video can flow between two computers, protocols are needed to locate the remote device and to negotiate how media will be exchanged between the two devices. The protocols central to this process are called call-signaling protocols, the most popular of which are H.323 and the Session Initiation Protocol (SIP). To find other users, both rely on static provisioning, DNS, TRIP (RFC 3219), ENUM (RFC 6116, with the ENUM service for H.323 registered in RFC 3762), and other protocols; H.323 systems additionally use RAS (Registration, Admission, and Status, defined in ITU-T Rec. H.225.0).
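ENUM works by mapping an E.164 telephone number into a DNS name: the digits are reversed, separated by dots, and placed under the `e164.arpa` domain, where NAPTR records can point to SIP or H.323 URIs (RFC 6116). The Python sketch below performs only that name conversion. The function name is hypothetical, and the subsequent NAPTR lookup is not shown because it requires a DNS resolver library.

```python
import string


def e164_to_enum_domain(number: str, suffix: str = "e164.arpa") -> str:
    """Convert an E.164 number such as '+44 20 7946 0148' to its ENUM domain.

    Visual separators (spaces, dashes, dots, parentheses) are discarded;
    only ASCII digits are kept, in reverse order.
    """
    if not number.strip().startswith("+"):
        raise ValueError("expected a full E.164 number beginning with '+'")
    digits = [c for c in number if c in string.digits]
    if not digits:
        raise ValueError("no digits found in number")
    return ".".join(reversed(digits)) + "." + suffix


print(e164_to_enum_domain("+44 20 7946 0148"))
# 8.4.1.0.6.4.9.7.0.2.4.4.e164.arpa
```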

H.323 and SIP both trace their origins to 1995, when researchers began looking at how two computers could initiate communication in order to exchange audio and video media streams. H.323 achieved commercial success first, because those working on the protocol in the ITU moved quickly and published the first version of the standard in 1996. SIP progressed much more slowly in the IETF: the first draft was published in 1996, but the first recognized standard (RFC 2543) did not appear until 1999. SIP was revised over the following years and republished in 2002 as RFC 3261, which is the currently recognized SIP standard. These delays in the standards process slowed market adoption of SIP, though it is nonetheless widely deployed today. H.323, too, has been revised continually, with the most recent version at the time of this writing published in 2009.

Fundamentally, H.323 and SIP let users do the same thing: establish multimedia communication (audio, video, or other data). They differ significantly in design, however. H.323 borrows heavily from legacy communication systems and is a binary protocol, while SIP does not adopt many of the information elements found in legacy systems and is a text-based (ASCII) protocol. Supporters of each protocol have debated at length which approach is better, and the results are decidedly mixed.
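Because SIP is text-based, a request can be assembled and read as ordinary lines of text. The sketch below builds a minimal INVITE containing the header fields RFC 3261 requires in every request (Via, Max-Forwards, To, From, Call-ID, CSeq) plus the Contact header an INVITE needs. The function and its parameters are hypothetical, UDP transport is assumed, and the SDP body a real INVITE would normally carry to negotiate media is left out. An H.323 call setup, by contrast, is carried as binary Q.931 and ASN.1-encoded H.225.0 data and cannot be written out this way.

```python
import uuid


def build_invite(request_uri: str, from_uri: str, contact_uri: str,
                 via_host: str, cseq: int = 1,
                 body: bytes = b"", content_type: str = "application/sdp") -> bytes:
    """Build a minimal SIP INVITE request as bytes (lines end in CRLF)."""
    branch = "z9hG4bK" + uuid.uuid4().hex   # RFC 3261 "magic cookie" prefix
    from_tag = uuid.uuid4().hex[:10]        # random From tag
    call_id = f"{uuid.uuid4().hex}@{via_host}"
    lines = [
        f"INVITE {request_uri} SIP/2.0",
        f"Via: SIP/2.0/UDP {via_host};branch={branch}",
        "Max-Forwards: 70",
        f"To: <{request_uri}>",
        f"From: <{from_uri}>;tag={from_tag}",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} INVITE",
        f"Contact: <{contact_uri}>",
    ]
    if body:
        lines.append(f"Content-Type: {content_type}")
    lines.append(f"Content-Length: {len(body)}")
    # A blank line separates the header fields from the (optional) body.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("utf-8") + body


msg = build_invite("sip:bob@example.com", "sip:alice@example.org",
                   "sip:alice@192.0.2.10", via_host="192.0.2.10")
print(msg.decode("utf-8"))
```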

Over the years, many papers have debated H.323 versus SIP, but most of the arguments have been "religious" in nature (e.g., "ITU vs. IETF" and "binary vs. ASCII"). Very few papers and reports have compared the protocols on the basis of functionality and what really matters: does the protocol do the job? In fact, both can. Each has certain strengths and weaknesses, but users cannot see those differences, so from a practical point of view they are irrelevant.

H.323 and SIP can be described as "intelligent endpoint protocols": all of the intelligence required to locate the remote endpoint and to establish media streams between the local and remote devices is an integral part of the protocol. A complementary class of protocols, referred to as "device control protocols", includes H.248 and MGCP.

To understand the purpose of H.248 and MGCP, it is important to first understand the function of a gateway. A gateway is a device that offers an IP interface on one side and a legacy telephone interface on the other. The legacy interface may be complex, such as a connection to a legacy PSTN switch, or simple, such as ports for connecting one or a few traditional telephones. Depending on the size and purpose of the gateway, it may allow IP-originated calls to terminate on the PSTN (and vice versa) or may simply provide a means for a person to connect a telephone to the Internet.

Originally, gateways were viewed as monolithic devices that contained both the call control (H.323/SIP) and the hardware required to control the PSTN interface. In 1998, it was proposed to split the gateway into two logical parts: the part containing the call control logic, called the media gateway controller (MGC) or call agent (CA), and the part interfacing with the PSTN, called the media gateway (MG). This functional split created a new interface between the MGC and the MG, and MGCP and H.248 were defined to serve it.
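To give a feel for the MGC-to-MG interface, the sketch below formats an MGCP command and parses the first line of a response, following the text format of RFC 3435: a verb, a transaction ID, an endpoint name, and the protocol version, followed by parameter lines such as `C:` (call ID), `L:` (local connection options), and `M:` (connection mode). The helper names and the example endpoint name are hypothetical; session descriptions (SDP), retransmission, and most validation are omitted.

```python
from typing import Dict, Tuple

# The nine commands defined in RFC 3435.
MGCP_VERBS = {"EPCF", "CRCX", "MDCX", "DLCX", "RQNT", "NTFY", "AUEP", "AUCX", "RSIP"}


def build_mgcp_command(verb: str, transaction_id: int, endpoint: str,
                       params: Dict[str, str]) -> str:
    """Format an MGCP command line plus parameter lines, each ending in CRLF."""
    if verb not in MGCP_VERBS:
        raise ValueError(f"unknown MGCP verb: {verb}")
    if not 1 <= transaction_id <= 999_999_999:
        raise ValueError("transaction ID must be between 1 and 999999999")
    lines = [f"{verb} {transaction_id} {endpoint} MGCP 1.0"]
    lines += [f"{name}: {value}" for name, value in params.items()]
    return "\r\n".join(lines) + "\r\n"


def parse_mgcp_response_line(line: str) -> Tuple[int, int, str]:
    """Split a response line such as '200 1204 OK' into (code, transaction ID, rest)."""
    parts = line.strip().split(" ", 2)
    code, txid = int(parts[0]), int(parts[1])
    rest = parts[2] if len(parts) > 2 else ""
    return code, txid, rest


# The call agent asks a residential gateway endpoint to create a connection.
cmd = build_mgcp_command("CRCX", 1204, "aaln/1@rgw.example.net",
                         {"C": "A3C47F21456789F0", "L": "p:10, a:PCMU", "M": "recvonly"})
print(cmd)
print(parse_mgcp_response_line("200 1204 OK"))  # (200, 1204, 'OK')
```

Here the MGC (call agent) holds all of the call logic and merely instructs the MG which connection to create and how to handle media, which is the division of labor the split described above introduces.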

Some service providers give users devices that implement H.248 or MGCP (or comparable protocols). In the core of the network, a device acting as the MGC provides the H.323 or SIP logic necessary to properly terminate VoIP calls around the world.

Beyond H.323/SIP and H.248/MGCP, various companies have introduced non-standard protocols that have been very successful in the market. Skype, for example, has been extremely successful with a proprietary protocol. Which protocol is best for you? It depends on your requirements, but most people simply want to make a phone call, and for them the protocol does not matter, so long as it works.

It is also important to remember that, as with every other new technology in the world of high-tech, there is always something newer and bigger coming down the road. At the time of this writing, the IETF and W3C are collaborating on the development of WebRTC, which equips web browsers with VoIP and videoconferencing capabilities, as well as "data channels" for transmitting any kind of data between users (think text, whiteboarding, file transfer, etc.). WebRTC does not define a signaling protocol, including the signaling between two web servers, but instead leaves that choice to implementers.