SIP Issues

SIP is plagued with all kinds of issues. On this page, we will highlight some of them. And, yes, we will also admit that in many cases there are ways to address these issues. The problem, though, is that there is usually no universal way to address a given issue. We are also aware that in some cases the problem is created not by the SIP specification itself, but by a related RFC. This would be a correct assertion. However, when building a SIP system, one cannot just implement the SIP specification, but must also implement all of the related specifications in order to ship a product. That means that things like the offer/answer model, SIP session timers, reliable provisional responses, and other functionality are necessary and, as such, we speak of problems with SIP as a system, not a single RFC.

1. Interoperability

For years and years, there have been regular SIP interoperability events. Even so, SIP still suffers significantly more from interoperability problems than most IP-based communication systems. Sure, basic voice calls work (sometimes), but as soon as one tries to do something more complex, it becomes a real challenge. The really frustrating thing about SIP is that even basic call flows do not work a lot of the time. Different vendors expect different call signaling flows, and the mismatch leads to call failure.

The problem is not merely SIP signaling, though. SIP messages and headers are a source of interoperability issues, but so are all of the capability exchanges via SDP and the offer/answer model defined in RFC 3264. SIP is dependent on SDP and the offer/answer model to successfully establish audio/video calls. The use of SDP has presented an array of problems that include proper treatment of media types, attributes, multiple payload types on an m= (media) line, and multiple m= lines. There are even more basic issues, with some endpoints failing to handle MIME properly.
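To make the SDP point concrete, here is a minimal Python sketch that splits an SDP body into its m= sections and collects the payload types and attributes of each. The function name parse_media_sections and the sample session description are our own illustrative assumptions, not taken from any product, and the sketch ignores session-level attributes and most of the SDP grammar.

```python
def parse_media_sections(sdp: str):
    """Return one dict per m= line with its payload types and attributes."""
    sections = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # m=<media> <port>[/<count>] <proto> <fmt> <fmt> ...
            media, port, proto, *formats = line[2:].split()
            sections.append({
                "media": media,
                "port": int(port.split("/")[0]),
                "proto": proto,
                "formats": formats,      # several payload types may be listed
                "attributes": [],
            })
        elif line.startswith("a=") and sections:
            # Attributes after an m= line belong to that media section.
            sections[-1]["attributes"].append(line[2:])
    return sections


# Hypothetical offer with one audio stream (three payload types) and one video stream.
SAMPLE_SDP = """v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.1
s=-
c=IN IP4 192.0.2.1
t=0 0
m=audio 49170 RTP/AVP 0 8 18
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:18 G729/8000
m=video 51372 RTP/AVP 96
a=rtpmap:96 H264/90000
"""

if __name__ == "__main__":
    for section in parse_media_sections(SAMPLE_SDP):
        print(section["media"], section["formats"])
```

An endpoint that only ever looks at the first m= line, or only at the first payload type on it, will mishandle an offer like this one.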

2. Codec negotiation

SIP has almost no means of negotiating capabilities or codecs. It does, of course, but it's very fragile. For example, suppose user A calls user B and offers G.729. If user B does not support that codec, then the call fails. That's it. It's over. Is that a carrier-class solution? We do not think so.

What many people do is simply offer multiple codecs in the original "offer", such as G.729 and G.711. In most cases, the called device will accept just one of the offered codecs. According to the offer/answer model, the called device should send the most preferred codec. In reality, many devices just send what they want. That's actually no worse than most other systems, anyway. However, it is also possible for a called device to accept all of the proposed codecs and to switch between codecs. While that is perfectly legal, it would present most systems with a lot of problems, and implementers have to write software to handle this.
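The answering side of this exchange can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name build_answer, the single_codec flag, and the codec names are hypothetical, and a real implementation works on SDP payload types rather than plain strings.

```python
def build_answer(offered, supported, single_codec=True):
    """Pick the codecs to place in the answer.

    offered      -- codec names in the caller's order of preference
    supported    -- set of codec names the called device can handle
    single_codec -- if True, answer with one codec only; if False, accept
                    every common codec (legal, but the caller must then be
                    ready to receive any of them at any time)
    Returns None when there is no codec in common, i.e. the call fails.
    """
    # Keep the relative order of the offer.
    common = [codec for codec in offered if codec in supported]
    if not common:
        return None
    return common[:1] if single_codec else common


if __name__ == "__main__":
    # Caller offers only G.729 to a device that lacks it: no call.
    print(build_answer(["G729"], {"PCMU", "PCMA"}))
    # Caller offers G.729 and G.711: the call can proceed on G.711.
    print(build_answer(["G729", "PCMU"], {"PCMU", "PCMA"}))
```

The single_codec=False branch is the case described above in which the called device accepts everything and may switch codecs mid-call.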

There has been some hard work put into trying to improve the capability negotiation of SIP. However, based on history it is quite unlikely that we will see significant improvements in practice. More likely than not, SIP systems will be somewhat constrained in terms of what capabilities they can offer and use in order to ensure backward compatibility with what has already been installed.

3. Parsing messages

SIP defines its own message syntax, which means that everyone has to write a parser by hand. This leads to a lot of interoperability problems at the very core of the protocol. Developers should not have to worry about how to parse SIP messages and SDP payloads (which, by the way, have an entirely different syntax) in order to do something. It is amazing how many systems today cannot properly parse a complete SIP message. There are some valid syntax constructs that will simply cause calls to fail or devices to crash!
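As an illustration of the kind of detail a hand-written parser has to get right, the Python sketch below handles three constructs that are valid in RFC 3261 and that naive parsers often miss: case-insensitive header names, compact header forms (such as v for Via), and header values folded across lines. The function name parse_headers and the sample message are our own assumptions, and the sketch does not attempt to parse header values or validate the start line.

```python
# Compact header forms defined in RFC 3261.
COMPACT_FORMS = {
    "i": "call-id", "m": "contact", "e": "content-encoding",
    "l": "content-length", "c": "content-type", "f": "from",
    "s": "subject", "k": "supported", "t": "to", "v": "via",
}


def parse_headers(message: str):
    """Return (start_line, headers); headers maps lowercase names to value lists."""
    head = message.split("\r\n\r\n", 1)[0]   # drop the body, if any
    lines = head.split("\r\n")
    start_line, header_lines = lines[0], lines[1:]

    # Unfold: a line starting with space or tab continues the previous header.
    unfolded = []
    for line in header_lines:
        if not line:
            continue
        if line[0] in " \t" and unfolded:
            unfolded[-1] += " " + line.strip()
        else:
            unfolded.append(line)

    headers = {}
    for line in unfolded:
        name, _, value = line.partition(":")
        key = name.strip().lower()            # header names are case-insensitive
        key = COMPACT_FORMS.get(key, key)     # expand compact forms
        headers.setdefault(key, []).append(value.strip())
    return start_line, headers


# Hypothetical request mixing compact forms, odd casing, and a folded header.
MESSAGE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "v: SIP/2.0/UDP 192.0.2.1:5060;branch=z9hG4bK776asdhds\r\n"
    "FROM: Alice <sip:alice@example.com>;tag=1928301774\r\n"
    "t: Bob <sip:bob@example.com>\r\n"
    "Subject: folded\r\n"
    " header value\r\n"
    "l: 0\r\n"
    "\r\n"
)

if __name__ == "__main__":
    start, parsed = parse_headers(MESSAGE)
    print(start)
    for name, values in parsed.items():
        print(name, values)
```

A parser that looks only for the literal strings "Via:" or "From:" would reject or misread this message even though every line of it is legal.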

4. Slow error recovery

If SIP is implemented using UDP, as many systems are, then when a message is lost, it can take a very long time to recover. With the more complex IMS system, it takes even more time to recover. SIP tries to play nice on the network, which is an admirable objective. The problem is that users want to get calls through in a matter of seconds and waiting 45 seconds or longer to discover that the far end device is disconnected is unacceptable.
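To show where the delay comes from, the following Python sketch computes the retransmission schedule of a single INVITE client transaction over UDP using the default timers in RFC 3261: the request is resent at intervals that start at T1 (500 ms) and double each time, and the transaction gives up after 64 × T1. The function name invite_retransmit_schedule is our own, and deployments may configure different timer values; the figure is for one hop only, before any failover to another server or additional proxies are taken into account.

```python
T1 = 0.5  # seconds; RFC 3261 default round-trip time estimate


def invite_retransmit_schedule(t1=T1):
    """Return (send_times, timeout) in seconds for an unanswered INVITE over UDP."""
    timeout = 64 * t1        # Timer B: transaction timeout
    send_times = [0.0]       # initial transmission
    interval = t1            # Timer A starts at T1 and doubles on each firing
    elapsed = interval
    while elapsed < timeout:
        send_times.append(elapsed)
        interval *= 2
        elapsed += interval
    return send_times, timeout


if __name__ == "__main__":
    times, timeout = invite_retransmit_schedule()
    print("transmissions at:", times)
    print("give up after:", timeout, "seconds")
```

With the default T1, the sketch yields seven transmissions and a 32-second wait before the caller's side learns that a single unreachable hop is not going to answer.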

5. No conference control

To build a workable solution, one needs more than just establishing a session. One needs to allow for control over that session. For example, if one end observes traffic congestion, it needs a way to indicate that video transmission bandwidth should be reduced. Or, if a video frame is lost, it needs to report that quickly in order to ensure proper video display. SIP lacks any kind of conference control mechanism.

6. User input

Years ago, the PSTN used rotary pulses in order to signal phone numbers. Later, it moved to DTMF as a way of improving speed and accuracy. Both were sent as part of the media flow, since there was only a single "bearer" in the PSTN over which both voice and signaling could be sent to a person's home or business.

However, within the more advanced networks, user input (e.g., presses of the digits on the telephone keypad) was separated and sent over the signaling links. In the SS7 network, for example, the phone number that the user dialed is sent in an IAM message, but subsequent key presses would not be extracted; the tones would just go through the bearer path. In H.323, though, the ITU recognized that we had an opportunity to "do it right", so the DTMF key presses were separated and transmitted over the signaling path.

With SIP, there is actually no one standard for how to send DTMF or other user input. DTMF might be sent using the INFO method, RFC 4733, KPML, or in-band in the audio stream. Of these, the most preferred appears to be RFC 4733, which relies on sending tone descriptions through the bearer path. It is really a step backward in terms of evolution.
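For readers unfamiliar with what a "tone description" looks like, the Python sketch below packs and unpacks the four-byte telephone-event payload that RFC 4733 carries inside RTP packets: an 8-bit event code, an end bit, a reserved bit, a 6-bit volume, and a 16-bit duration in RTP timestamp units. The function names and the default volume are our own assumptions, and the sketch covers only the payload, not the RTP header or the packet repetition rules.

```python
import struct

# RFC 4733 event codes for DTMF: 0-9, then * = 10, # = 11, A-D = 12-15.
DTMF_EVENTS = {ch: code for code, ch in enumerate("0123456789*#ABCD")}
DTMF_DIGITS = {code: ch for ch, code in DTMF_EVENTS.items()}


def encode_telephone_event(digit, duration, volume=10, end=False):
    """Build the 4-byte payload: event | E,R,volume | duration (network order)."""
    event = DTMF_EVENTS[digit]
    # Bit 7 is the end flag, bit 6 is reserved (zero), bits 5-0 hold the volume.
    flags_volume = (0x80 if end else 0x00) | (volume & 0x3F)
    return struct.pack("!BBH", event, flags_volume, duration & 0xFFFF)


def decode_telephone_event(payload):
    """Return (digit, end, volume, duration) from a 4-byte payload."""
    event, flags_volume, duration = struct.unpack("!BBH", payload)
    return DTMF_DIGITS[event], bool(flags_volume & 0x80), flags_volume & 0x3F, duration


if __name__ == "__main__":
    # Final packet for key "5", 800 timestamp units long (100 ms at 8000 Hz).
    packet = encode_telephone_event("5", duration=800, end=True)
    print(packet.hex())
    print(decode_telephone_event(packet))
```

Note that this payload travels in the media stream alongside the audio, which is exactly the bearer-path approach criticized above.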

And what about input from applications other than "voice"? Well, SIP does not really have any mechanism for that. But, you can be sure that it will be something different than what is done for voice. SIP was not really designed to be a multimedia communication system and many such issues were not fully thought through.

7. Too many flavors of SIP

One of the biggest hurdles faced when deploying SIP systems is the fact that every implementer has a different interpretation of the SIP specifications. The second is that there are so many options one might implement. There are lots of optional extensions available for SIP and not every implementer implements every option, nor is there a generally agreed subset.

Well, that is not entirely correct. Various organizations, like the SIP Forum, ETSI, ITU, and ATIS, have tried to address this problem by essentially defining a "profile" of SIP that specifies the various features that must be implemented. However, the problem still exists, because different organizations define different profiles with different feature sets!

So, there are various interpretations of SIP, and various standards bodies and "want to be" standards bodies producing conflicting profiles of SIP, all of which leads to a never-ending problem for those who want to deploy SIP in practice.