Packetizer
Understanding VoIP

VoIP Protocols

There are a number of protocols that may be employed in order to provide for VoIP communication services. In this section, we will focus on those which are most common to the majority of the devices deployed and being deployed today.

Virtually every VoIP device uses a standard called the Real-time Transport Protocol (RTP) for transmitting audio and video packets between communicating computers. RTP is defined by the IETF in RFC 3550. The payload formats for a number of CODECs are defined in RFC 3551, though payload format specifications are also defined in documents published by the ITU and in other IETF RFCs. RTP also addresses issues like packet order and provides mechanisms (via the RTP Control Protocol, or RTCP, also defined in RFC 3550) to help address delay and jitter.
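To make the packet-order and jitter mechanisms concrete, the following is a minimal Python sketch, not a complete RTP stack. It decodes the 12-byte fixed RTP header described in RFC 3550 and applies the interarrival jitter estimate from that RFC. The names RtpHeader, parse_rtp_header, and update_jitter are illustrative assumptions, and the sketch ignores CSRC lists and header extensions.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpHeader:
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int  # lets the receiver detect loss and reordering
    timestamp: int        # sampling instant, in payload clock units
    ssrc: int             # identifies the sending source


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed RTP header (network byte order)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


def update_jitter(jitter: float, transit_prev: int, transit_now: int) -> float:
    """One step of the RFC 3550 jitter estimate: J += (|D| - J) / 16.

    A transit value is (arrival time - RTP timestamp) for one packet,
    with both expressed in the same clock units.
    """
    d = abs(transit_now - transit_prev)
    return jitter + (d - jitter) / 16.0
```

A receiver would typically call parse_rtp_header on each incoming datagram, use sequence_number to reorder or detect missing packets, and feed successive transit values to update_jitter to size its playout buffer.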

One of the areas of concern for people communicating over the Internet is the potential for a person to eavesdrop on communication. To address these security concerns, RTP was extended with a profile called Secure RTP, or SRTP (defined in RFC 3711). Secure RTP provides for encryption, authentication, and integrity of the audio and video packets transmitted between communicating devices.

Before audio or video media can flow between two computers, various protocols must be employed to find the remote device and to negotiate the means by which media will flow between the two devices. The protocols that are central to this process are referred to as call-signaling protocols, the most popular of which are H.323 and the Session Initiation Protocol (SIP). To find other users, they rely on mechanisms such as static provisioning, RAS (ITU-T Rec. H.225.0), DNS, TRIP (RFC 3219), ENUM (RFC 3761), and other protocols.
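As one illustration of how a user can be located, ENUM maps a telephone number to a DNS name, which is then queried for NAPTR records that point to a SIP or H.323 address. The Python sketch below shows only the name-construction step. The function name e164_to_enum_domain is an assumption made for this example, and the DNS query itself is omitted because it requires a resolver library.

```python
def e164_to_enum_domain(number: str, suffix: str = "e164.arpa") -> str:
    """Build the ENUM domain name for an E.164 telephone number.

    Non-digit characters ("+", "-", spaces) are dropped, the digits are
    reversed, and each digit becomes its own DNS label under the suffix.
    """
    digits = [c for c in number if c in "0123456789"]
    if not digits:
        raise ValueError("no digits found in number")
    return ".".join(reversed(digits)) + "." + suffix


# The fictional number +1-555-123-4567 maps to
# 7.6.5.4.3.2.1.5.5.5.1.e164.arpa
print(e164_to_enum_domain("+1-555-123-4567"))
```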

H.323 and SIP both have their origins in 1995 as researchers looked to solve the problem of how two computers can initiate communication in order to exchange audio and video media streams. H.323 enjoyed the first commercial success because those working on the protocol in the ITU worked quickly to publish the first standard in early 1996. SIP, on the other hand, progressed much more slowly in the IETF, with the first draft published in 1996, but the first recognized "standard" not published until 1999. SIP was revised over the years and re-published in 2002 as RFC 3261, which is the currently recognized standard for SIP. These delays in the standards process resulted in delays in market adoption of SIP.

Fundamentally, H.323 and SIP allow users to do the same thing: to establish multimedia communication (audio, video, or other data communication). However, H.323 and SIP differ significantly in design, with H.323 borrowing heavily from legacy communication systems and being a binary protocol, and with SIP not adopting many of the information elements found in legacy systems and being an ASCII-based protocol. Supporters of each protocol have debated at length as to which approach is better and the results are certainly mixed.
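The text-based design of SIP can be illustrated with a short Python sketch. The INVITE below is a made-up example in the general shape of an RFC 3261 request (the addresses, tags, and branch value are placeholders), and parse_sip_request is a hypothetical helper, not part of any SIP library. It deliberately ignores details a real parser must handle, such as folded header lines, repeated headers, and compact header names.

```python
EXAMPLE_INVITE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP pc33.example.org;branch=z9hG4bK776asdhds\r\n"
    "Max-Forwards: 70\r\n"
    "To: Bob <sip:bob@example.com>\r\n"
    "From: Alice <sip:alice@example.org>;tag=1928301774\r\n"
    "Call-ID: a84b4c76e66710@pc33.example.org\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Contact: <sip:alice@pc33.example.org>\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)


def parse_sip_request(raw: str):
    """Split a SIP request into its request line, headers, and body."""
    # A blank line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Request line: method, request URI, protocol version.
    method, request_uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; header values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, request_uri, version, headers, body


method, uri, version, headers, body = parse_sip_request(EXAMPLE_INVITE)
print(method, uri, headers["cseq"])
```

Because the message is plain text, it can be read directly in a packet capture. An H.323 message, by contrast, is a binary encoding that needs a decoder before it is human-readable.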

Over the years, there have been a lot of papers debating H.323 vs. SIP, but most of the arguments have been "religious" in nature (e.g., "ITU vs. IETF" and "binary versus ASCII"). Very few of the papers and reports have compared the protocols on the basis of functionality and what really matters: does the protocol do the job? The fact is, both can do the job, though H.323 is superior in a number of ways: better interoperability with the PSTN, better support for video, excellent interoperability with legacy video systems (e.g., H.320), and reliable out-of-band transport of DTMF. SIP, being a "session initiation protocol", was not designed to address many of the problems that were raised and solved in legacy communication systems. SIP was also popularized in the market through misstatements that it was "easy to implement and debug". The truth is that there is a certain amount of complexity in any communication system and, no matter how one looks at it, it requires about the same amount of work to do the same thing two different ways.

In the simplest deployment, the SIP implementation is certainly easier to develop and troubleshoot. However, there are very few real-world deployments that are "simple". As a result, SIP proponents have defined a number of non-standard variations of SIP (e.g., SIP-T and SIP-I), as well as a number of non-standard extensions in order to carry the necessary information or provide the required functionality. Some have said that there are as many variations of SIP as there are SIP deployments.

Today, H.323 still commands the bulk of the VoIP deployments in the service provider market for voice transit, especially for transporting voice calls internationally. H.323 is also widely used in room-based video conferencing systems and is the #1 protocol for IP-based video systems. SIP has, most recently, become more popular for use in instant messaging systems, though there have been no successful commercial deployments of SIP-based instant messaging at the time of this writing.

Both H.323 and SIP can be referred to as "intelligent endpoint protocols". What this means is that all of the intelligence required to locate the remote endpoint and to establish media streams between the local and remote device is an integral part of the protocol. There is another class of protocols which is complementary to H.323 and SIP referred to as "device control protocols". Those protocols are H.248 and MGCP.

To understand the purpose of H.248 and MGCP, it is important to first understand the function of a gateway. A gateway is a device that offers an IP interface on one side and some sort of legacy telephone interface on the other side. The legacy telephone interface may be complex, such as an interface to a legacy PSTN switch, or may be a simple interface that allows one to connect one or a few traditional telephones. Depending on the size and purpose of the gateway, it may allow IP-originated calls to terminate to the PSTN (and vice-versa) or may simply provide a means for a person to connect a telephone to the Internet.

Originally, gateways were viewed as monolithic devices that had call control (H.323/SIP) and hardware required to control the PSTN interface. In 1998, the idea of splitting the gateway into two logical parts was proposed: one part, which contains the call control logic, is called the media gateway controller (MGC) or call agent (CA), and the other part, which interfaces with the PSTN, is called the media gateway (MG). With this functional split, a new interface existed (going between the MGC and MG), driving the necessity to define MGCP and H.248.
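To give a feel for the interface between the MGC and the MG, the Python sketch below parses a text-encoded MGCP command of the kind a call agent might send to a gateway to create a connection. The sample command follows the general layout used in the MGCP specification (verb, transaction identifier, endpoint name, protocol version, then one parameter per line). The endpoint name and identifiers are placeholders, and parse_mgcp_command is a hypothetical helper written for this example.

```python
EXAMPLE_CRCX = (
    "CRCX 1204 aaln/1@gw.example.net MGCP 1.0\n"
    "C: A3C47F21456789F0\n"
    "L: p:10, a:PCMU\n"
    "M: recvonly\n"
)


def parse_mgcp_command(raw: str) -> dict:
    """Parse the command line and parameter lines of an MGCP command.

    Any session description that follows a blank line is ignored here.
    """
    lines = raw.splitlines()
    # Command line: verb, transaction id, endpoint, "MGCP", version.
    verb, transaction_id, endpoint, protocol, version = lines[0].split()
    params = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the parameter section
        name, _, value = line.partition(":")
        params[name.strip()] = value.strip()
    return {
        "verb": verb,                      # e.g. CRCX = create connection
        "transaction_id": transaction_id,
        "endpoint": endpoint,              # the port on the media gateway
        "version": protocol + " " + version,
        "params": params,                  # C = call id, L = local options, M = mode
    }


command = parse_mgcp_command(EXAMPLE_CRCX)
print(command["verb"], command["endpoint"], command["params"]["M"])
```

The point of the sketch is the division of labor: the MG only executes low-level commands such as this one, while the decision to send it is made by the call control logic in the MGC.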

Some service providers provide users with devices that implement H.248 or MGCP (or comparable protocols). In the core of the network, some device serving as the MGC provides the H.323 or SIP logic necessary to properly terminate VoIP calls around the world.

Outside of H.323/SIP and H.248/MGCP, there are also non-standard protocols introduced by various companies that have been very successful in the market. Skype is one such company that has been extremely successful using a proprietary protocol. Which protocol is best for you? It really depends on your requirements, but most people simply want to make a phone call and, as such, it really does not matter.

It is also important to remember that, just as with every other new capability introduced in the world of high-tech, there is always something new and bigger coming down the road. Presently, the ITU is working on a new protocol that will have much more capability than either SIP or H.323. The new protocol is referred to as H.325 and is expected to enable voice, video, and data communications capabilities across a number of separate devices that work together, such as a mobile phone, a PC, and even an HD TV!
