H.323 versus SIP: A Comparison
This is the best comparison of H.323 and SIP available anywhere, having been revised and updated numerous times by leading VoIP and videoconferencing experts. Virtually all of the others are misleading, out of date, and just plain wrong. To compound the problem (to further propagate the error, as it were), we have also seen several papers written by naive students and rank-and-file engineers that blindly parrot what they have read in these comparisons. Furthermore, many people have formed their opinions of H.323 and SIP based not on each protocol's merits but solely on the misinformation in these comparisons and in other material provided by largely the same sources.
To counter this misinformation, we decided to put together this thorough, up-to-date comparison. As with ours, please consider the financial interests of the source of any information on this subject, be it an author, speaker, institution, forum, company, web site, or conference. Are the people providing information on this issue involved in both of these (and other) protocols, with nothing more than perhaps an honest academic interest in one or the other, or have they "hitched their wagon" to one?
Like everything else on the web, this is a living document which we will be updating as the standards evolve. In fact, there is much work in progress for both H.323 and SIP, but, in order to compare apples to apples and make this comparison meaningful, we have chosen to focus on what is currently defined rather than on what might be defined in the future. Also, note that commentary that is not vital to the main comparison text appears in a smaller font immediately below it.
H.323 | SIP | |
---|---|---|
Philosophy |
H.323 was designed with a good understanding of the requirements for multimedia communication over IP networks, including audio, video, and data conferencing. It defines an entire, unified system for performing these functions, leveraging the strengths of the IETF and ITU-T protocols. As a result, it might be reasonable for users to expect about the same level of robustness and interoperability as is found on the PSTN today, although this admittedly varies across the globe. H.323 was designed to scale to add new functionality. The most widely deployed use of H.323 is "Voice over IP" followed by "Videoconferencing", both of which are described in the H.323 specifications. |
SIP was designed to set up a "session" between two points and to be a modular, flexible component of the Internet architecture. It has a loose concept of a call (that being a "session" with media streams), and has no intrinsic support for multipoint multimedia conferencing (though implementers have built conferencing services to provide conferencing support). SIP is now more than two decades old. H.323 is roughly the same age, but this age is highlighted to draw attention to the fact that both of these systems are really old. Though there have been some efforts to create more modern communication standards, companies thus far have either elected to keep using these old systems, create proprietary systems that do not interwork to a significant degree, or focus effort on web-based conferencing (i.e., WebRTC) where, unfortunately, solutions are still proprietary. |
Complexity |
H.323 is limited to multimedia conferencing, so the complexity of the system is constrained accordingly. No communication system is simple, but H.323 attempts to clearly define the basic set of functionality that all devices must support. |
SIP was initially focused on voice communication and then expanded to include video, application sharing, instant messaging, presence, etc. With each capability, complexity increases and, unfortunately, there are no strict guidelines as to what functionality any given device must support. This leads to more complex systems with more interoperability problems. SIP was "marketed" as a simple protocol, in spite of the fact it only looks simple on the surface. Telephony is a hard problem and, regardless of how one wants to deliver it, the total system is going to have a certain level of complexity out of necessity. |
Reliability |
H.323 has defined a number of features to handle failure of intermediate network entities, including "alternate gatekeepers", "alternate endpoints", and a means of recovering from connection failures. |
SIP has not defined procedures for handling device failure. If a proxy fails, the user agent detects this through timer expiration. It is the responsibility of the user agent to resend the INVITE to another proxy, leading to long delays in call establishment. |
Message Definition |
ASN.1, a standardized, extremely precise, easy-to-understand structural notation that is used by many other systems. |
ABNF, or Augmented Backus-Naur Form, a syntactical notation. SIP uses the ABNF as defined in RFC 2234. |
Message Encoding |
H.323 encodes messages in a compact binary format that is suitable for narrowband and broadband connections. Messages are efficiently encoded and decoded by machines, with decoders widely available (e.g., Wireshark, formerly Ethereal). |
SIP messages are encoded in ASCII text format, suitable for humans to read. As a consequence, the messages are large and less suitable for networks where bandwidth, delay, and/or processing are a concern. SIP messages get so large that they sometimes exceed the MTU size when going over WAN links, resulting in delays, packet loss, etc. As a result, effort has been made to compress SIP messages into a binary form (e.g., RFC 3485 and RFC 3486). |
Media Transport | ||
Extensibility - Vendor Specific |
H.323 is extended with non-standard features in such a way as to avoid conflicts between vendors. Globally unique identifiers prevent feature and data element collision. |
SIP is extended by adding new header lines or message bodies that may be used by different vendors to serve different purposes, thus risking interoperability problems. The risk is admittedly small, but this problem has already been seen in the real world with similar extension schemes. |
Extensibility - Standard |
H.323 is extended by the standards community to add new features to H.323 in such a way as to not impact existing features. However, new revisions of H.323 are published periodically, which introduce new functionality that is mandatory, yet done in such a way as to preserve backward compatibility. |
SIP is extended by the standards community to add new features to SIP in such a way as to not impact existing features. However, new revisions of SIP are potentially not backward compatible (e.g., RFC 3261 was not entirely compatible with RFC 2543). In addition, several extensions are "mandatory" in some implementations, which causes interoperability problems. |
Scalability - Load Balancing |
H.323 has the ability to load balance endpoints across a number of alternate gatekeepers in order to scale a local point of presence. In addition, endpoints report their available and total capacity so that calls going to a set of gateways, for example, may be best distributed across those gateways. |
SIP has no notion of load balancing, except "trial and error" across pre-provisioned devices or devices learned from DNS SRV records. There is no means of detecting the load on a particular gateway or of knowing whether a device has failed, meaning that proxies simply have to try a PSTN gateway, wait for the call to time out, and then try another. |
Scalability - Call Signaling |
When an H.323 gatekeeper is used, it may simply provide address resolution through one RAS message exchange, or it may route all call signaling traffic. In large networks, the direct call model may be used so that endpoints connect directly to one another. |
When using a SIP proxy to perform address resolution for the SIP device, the proxy is required to handle at least 3 full message exchanges for every call. In large networks, such as IMS networks, the number of messages on the wire may be excessive. A basic call between two users may require as many as 30 messages on the wire! |
Scalability - Statelessness |
An H.323 gatekeeper can be stateless using the direct call model. |
A SIP proxy can be stateless if it does not fork, use TCP, or use multicast. |
Scalability - Address Resolution |
H.323 defines an interface between the endpoint and gatekeeper for address resolution using ARQ or LRQ. The H.323 gatekeeper may use any number of protocols to discover the destination address of the callee, including LRQs to other gatekeepers, Annex G/H.225.0, TRIP, ENUM, and/or DNS. The endpoint does not have to be concerned with the mechanics of this process, and the processing requirements for address resolution placed on the gatekeeper by H.323 are for just a single message exchange. Although out of scope of H.323, an H.323 endpoint may perform its own address resolution using ENUM and/or DNS and then place a direct call to the resolved address or provide the resolved address to the gatekeeper as an "alias". |
While SIP has no address-resolution protocol, per se, a SIP user agent may route its INVITE message through a proxy or redirect server in order to resolve addresses. The SIP proxy may use various protocols to discover the destination address of the callee, including TRIP, ENUM, and/or DNS (RFC 1035). The endpoint does not have to be concerned with the mechanics of this process. Unfortunately, the processing requirements placed on the SIP proxy are higher than with H.323 because at least 3 message exchanges must take place between the SIP device, SIP proxy, and the next hop. Although out of scope of SIP, a SIP user agent may perform its own address resolution using ENUM and/or DNS and then place a direct call to the resolved address or through a proxy. |
Addressing |
Flexible addressing mechanisms, including URIs, e-mail addresses, and E.164 numbers. H.323 supports these aliases:
H.323 also supports overlap sending with no additional overhead, except conveyance of the newly received digits in a single message. |
SIP only understands URI-style addresses. This works fine for SIP-SIP devices, but causes some confusion when trying to translate various dialed digits. The unofficial convention is that a "+" sign is inserted in the SIP URI (e.g., "sip:+18005551212@example.com") in order to indicate that the number is in E.164 format, versus a user ID that might be numeric. SIP has support for overlapped signaling defined in RFC 3578, though each additional digit received requires transmission of three messages on the wire (a new INVITE, a 484 response to indicate that the address is incomplete, and an ACK). |
Billing |
Even with H.323's direct call model, the ability to successfully bill for the call is not lost because the endpoint reports to the gatekeeper the beginning and end time of the call via the RAS protocol. Various pieces of billing information may be present in the ARQ and DRQ messages at the start and end of the call. |
If the SIP proxy wants to collect billing information, it has no choice but to stay in the call signaling path for the entire duration of the call so that it can detect when the call completes. Even then, the statistics are skewed because the call signaling may have been delayed. Otherwise, there is no mechanism in SIP to perform any accounting/billing function. |
Call Setup |
A call can be established in as few as 1.5 round trips using UDP:
Setup -> Of course, more elaborate call establishment procedures may be required to negotiate complex capabilities, negotiate complex video modes, etc. |
A call can be established in as few as 1.5 round trips using UDP:
INVITE -> Most real-world flows are more complex, as they often pass through one or more proxy devices, have intermediary response messages, and "negotiate" capabilities through a "trial and error" process that is far from scientific. Here is a more real-life SIP call flow. |
Capability Negotiation |
H.323 entities may exchange capabilities and negotiate which channels to open, including audio, video, and data channels. Individual channels may be opened and closed during the call without disrupting the other channels. |
SIP entities have limited means of exchanging capabilities. RFC 3407 is the state of the art, which is more or less a "declaration" mechanism, not a negotiation procedure. The end result is still a "trial and error" approach in case the called party does not support the proposed media. |
Call Forking |
H.323 gatekeepers can control the call signaling and may fork the call to any number of devices simultaneously. |
SIP proxies can control the call signaling and may fork the call to any number of devices simultaneously. |
PSTN Interworking |
H.323 borrows from traditional PSTN protocols, e.g., Q.931, and is therefore well suited for PSTN integration. However, H.323 does not employ the PSTN's circuit-switched technology--like SIP, H.323 is completely packet-switched. How Media Gateway Controllers fit into the overall H.323 architecture is well-defined within the standard. |
SIP has no commonality with the PSTN and such signaling must be "shoe-horned" into SIP. SIP has no architecture that describes the decomposition of the gateway into the Media Gateway Controller and the Media Gateways. This has been a recent study of 3GPP and others in the form of IMS. Presently, there are about 4 "IMS" variants: 3GPP, ITU NGN, 3GPP2, and PacketCable. Pick the architecture you like best, I suppose. |
Services |
Services may be provided to the endpoint through a web-browser interface using HTTP or a feature server using Megaco/H.248. In addition, services may be provided to an endpoint as it places a call, as a call arrives, or during the middle of a call by a gatekeeper or other entity that routes the call signaling. As a result, H.323 is well-suited to providing new services. |
SIP devices can receive service from a SIP proxy as the endpoint places a call, as a call arrives, or during the middle of a call. There is no defined way within SIP of providing services via a web browser or a feature server, as everything is done within the context of a "session". One may provide ad-hoc services through other means, such as XML, SOAP, or CPL. However, there are no standards for this. |
Video and Data Conferencing |
H.323 fully supports video and data conferencing. Procedures are in place to provide control for the conference as well as lip synchronization of audio and video streams. |
SIP has limited support for video and no support for data conferencing protocols like T.120. SIP has no protocol to control the conference and there is no mechanism within SIP for lip synchronization. There is no standard means of recovering from packet loss in a video stream (to parallel H.323's "video fast update" command). |
Administrative Requirements |
H.323 does not require a gatekeeper. A call can be made directly between two endpoints. However, most devices do utilize a gatekeeper for the purpose of registration and address resolution. |
SIP does not require a proxy. A call can be made directly between two user agents. However, most devices do utilize a SIP proxy for the purpose of registration, address resolution, and call routing. |
Codecs |
H.323 supports any codec, standardized or proprietary. No registration authority is required to use any codec in H.323. |
SIP supports any IANA-registered codec (as a legacy feature) or other codec whose name is mutually agreed upon. |
Firewall/NAT support |
Provided by H.323 "proxy" or by the endpoint, both in conjunction with a gatekeeper residing in the public network. H.323 also supports direct point-to-point media flows between devices that are located behind a NAT/FW. Refer to H.460.17, H.460.18, H.460.19, H.460.23, and H.460.24. |
SIP does not define a NAT/FW traversal mechanism, as this is left to other standards. Some standards that have been defined or are being defined are STUN, TURN, ANAT, and ICE. ANAT is popular as a means of addressing IPv4/IPv6 interworking and appears to be widely implemented. As of January 2011, ICE is still not so widely adopted. |
Transport protocol |
Reliable or unreliable, e.g., TCP or UDP. Most H.323 entities use a reliable transport for signaling. |
Reliable or unreliable, e.g., TCP or UDP. Most SIP entities use an unreliable transport for signaling. |
Loop Detection |
Routing gatekeepers can detect loops by looking at the CallIdentifier and destinationAddress fields in call-processing messages. If the combination of these matches an existing call, it is a loop. Infinite loops may be prevented by utilizing the hopCount field in the SETUP message. |
The Via header facilitates this. However, there has been talk about deprecating Via as a means of loop detection due to its complexity. Instead, the Max-Forwards header seems to be the preferred method of limiting hops and therefore loops. In November 2005, a presentation was given on issues with max-forwards. So, what is the right solution? |
Multicast Signaling |
Yes, location requests (LRQ) and auto gatekeeper discovery (GRQ). |
Yes, e.g., through group INVITEs. |
Third-party Call Control |
Yes, through third-party pause and re-routing which is defined within H.323. More sophisticated control is defined by the related H.450.x series of standards. |
Yes, through SIP as described in RFC 3725. |
Minimum Ports for VoIP Call |
3 (Call signaling, RTP, and RTCP.) |
3 (SIP, RTP, and RTCP.) |
Conferencing Entity |
Yes, an MC is required for this, but it could be co-located in a participating endpoint, or all endpoints could contain an MC. A stand-alone conference bridge may provide this functionality and H.323 has well-defined procedures for such entities. What distinguishes H.323 is not that it requires yet another onerous physical entity for conferencing (it does not) but that it just has a name for this functionality, an "MC," and that it provides a flexible means of implementing that functionality. |
No; however, SIP user agents may perform conferencing themselves. A stand-alone conference bridge may also provide this functionality. |
Original Title |
"Visual telephone systems and equipment for local area networks which provide a non-guaranteed quality of service" It is now, "Packet-based multimedia communications systems." Despite the word, "VISUAL," in the original title, H.323 has never described just a videoconferencing solution--support for video and data has always been optional. And the reference to LANs may be misleading because H.323 was intended from the start to support simple and "complex topologies" and not just single-segment networks, which "LOCAL AREA NETWORKS" may imply. |
"Application-level protocol for inviting users to multimedia conferences [emphasis ours]" It is now, "SIP: Session Initiation Protocol." Note that the "multimedia conferences" referred to in the original title are loosely coupled multicast conferences, à la MBone. This is because SIP was intended to be just a point-to-point version of SAP and not the "carrier-class solution addressing a wide area" that many would have you believe. |
Lineage |
H.323 is based on H.324, not H.320. However, H.324 was designed to be a better H.320.
|
SIP is frequently allied with the Internet and the World Wide Web by way of HTTP.
While backward compatibility was not maintained between the 1999 and 2002 documents, the version number remained the same "version 2.0". |
Open-source projects |
Yes, e.g., H.323 Plus. |
Yes, e.g., Opal. |
Media Topology |
Unicast, multicast, star, and centralized. |
Unicast, multicast, star, and centralized. |
Authentication |
Yes, via H.235. |
Yes, via HTTP (Digest and Basic), SSL, PGP, S/MIME, or various other means. |
Encryption |
Yes, via H.235 (including use of SRTP, TLS, IPSec, etc.). |
Yes, via SSL, PGP, S/MIME, or various other means. |
DTMF Carriage |
H.245 User Input Indication, RFC 4733, or via the audio stream. The alphanumeric choice of the H.245 UserInputIndication message is the baseline carriage common to all H.323 endpoints, so interoperability is assured. |
There is no baseline carriage, which presents issues of interoperability. Transport of DTMF via the INFO method, RFC 4733, KPML, or the audio stream are all options. |
Standards Documents |
Refer to the H.323 Information Site. |
Refer to the SIP Information Site. |
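
The H.323 rule in the Loop Detection row (a SETUP whose CallIdentifier and destinationAddress match an existing call is a loop, with hopCount as a backstop against infinite loops) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Setup` and `RoutingGatekeeper` classes, their field names, and the decision strings are hypothetical simplifications, not the H.225.0 message structure or any real stack's API, and the exact point at which a real gatekeeper checks or decrements hopCount may differ.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class Setup:
    """Hypothetical, heavily simplified stand-in for a SETUP message."""
    call_identifier: str
    destination_address: str
    hop_count: int


@dataclass
class RoutingGatekeeper:
    """Toy routing gatekeeper that remembers the calls it is handling."""
    active_calls: Set[Tuple[str, str]] = field(default_factory=set)

    def route(self, setup: Setup) -> Tuple[str, Optional[Setup]]:
        """Return (decision, message to forward or None)."""
        key = (setup.call_identifier, setup.destination_address)
        # Same CallIdentifier and destinationAddress as an existing call:
        # the message has come back to us, so treat it as a loop.
        if key in self.active_calls:
            return "loop", None
        # Assumed policy: refuse to forward once the hop budget is spent,
        # which bounds the path even if no gatekeeper spots the loop.
        if setup.hop_count <= 0:
            return "hops-exhausted", None
        self.active_calls.add(key)
        forwarded = Setup(setup.call_identifier,
                          setup.destination_address,
                          setup.hop_count - 1)
        return "forward", forwarded


if __name__ == "__main__":
    gk = RoutingGatekeeper()
    decision, out = gk.route(Setup("call-1", "alice@example.com", 2))
    print(decision, out.hop_count)
    # The forwarded SETUP arriving back at the same gatekeeper is a loop.
    print(gk.route(out)[0])
    # A different call that arrives with no hops left is not forwarded.
    print(gk.route(Setup("call-2", "bob@example.com", 0))[0])
```

The two checks are complementary: the identifier match catches a loop as soon as it returns to a gatekeeper that already holds state for the call, while the hop budget needs no per-call state and is comparable in spirit to SIP's Max-Forwards header.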