Packetizer

SIP Call Flows

Many have seen the call flow that popularized the notion that SIP is a simple protocol. It is shown in Figure 1.

A                     B
INVITE -->
                    <-- 200 OK
ACK -->
Figure 1

But this is hardly the typical call flow. To establish a basic call between two entities, provisional responses are necessary. Further, those responses must be delivered reliably, which means each one has to be acknowledged with a PRACK. A typical call between two endpoints looks like Figure 2.

A                     B
INVITE -->
                    <-- 100 Trying
                    (Which is never acknowledged)

                    <-- 183 Session Progress
PRACK -->
                    <-- 200 OK (for PRACK)
                    <-- 180 Ringing
PRACK -->
                    <-- 200 OK (for PRACK)
                    <-- 200 OK (for INVITE)
ACK -->
Figure 2

That suddenly looks more complex. All of these extra messages exist only so that a device can place a call from A to B, just like in Figure 1. No new functionality was added: the extra messages merely guarantee that the provisional responses get delivered.
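The pairing of each reliable provisional response with its PRACK can be sketched as follows. This is a minimal illustration, assuming the RSeq and RAck headers that RFC 3262 defines for reliable provisional responses; the function name `rack_headers` and the sample sequence numbers are hypothetical.

```python
from typing import List, Optional, Tuple


def rack_headers(provisionals: List[Tuple[int, Optional[int]]],
                 invite_cseq: int) -> List[str]:
    """Return one RAck header per reliable provisional response.

    Each entry in `provisionals` is (status_code, rseq). A 100 Trying is
    never sent reliably, so it carries no RSeq and needs no PRACK.
    """
    headers = []
    for status, rseq in provisionals:
        if status == 100 or rseq is None:
            continue  # unreliable response: nothing to acknowledge
        # RAck echoes the RSeq of the response plus the CSeq of the INVITE.
        headers.append(f"RAck: {rseq} {invite_cseq} INVITE")
    return headers


# Hypothetical flow: one 100 Trying and two reliable provisional responses.
flow = [(100, None), (183, 1), (180, 2)]
for header in rack_headers(flow, invite_cseq=1):
    print(header)
```

Every reliable provisional response therefore costs two further messages, a PRACK and its own 200 OK, which is where the growth from Figure 1 to Figure 2 comes from.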

But how many useful phones do you know that live in isolation? None, of course. Phones rely on various devices in the network, such as SBCs and SIP proxies, to get a call from point A to point B. Consider the addition of a single SIP proxy: an important device that is necessary to help endpoints (or "user agents") establish a call between themselves. Refer to Figure 3.

A                   Proxy                   B
INVITE -->
                    <-- 100 Trying
                    (Which is never acknowledged)
                    INVITE -->
                                            <-- 100 Trying
                                            (Which is never acknowledged)

                                            <-- 183 Session Progress
                    <-- 183 Session Progress
PRACK -->
                    PRACK -->
                                            <-- 200 OK (for PRACK)
                    <-- 200 OK (for PRACK)
                                            <-- 180 Ringing
                    <-- 180 Ringing
PRACK -->
                    PRACK -->
                                            <-- 200 OK (for PRACK)
                    <-- 200 OK (for PRACK)
                                            <-- 200 OK (for INVITE)
                    <-- 200 OK (for INVITE)
ACK -->
                    ACK -->
Figure 3

This is generally the minimum level of complexity required to get a basic voice call working in an operating network. What is not shown here, though, are the message elements (details), SDP signaling and offer/answer model interactions that often lead to even more complex flows and interoperability issues.

An organization called 3GPP has defined a technology called IMS, which specifies the use of multiple proxies. In fact, to place a call between two endpoints, there may be as many as five proxies in the signaling path. Just imagine how many messages have to be transmitted on the wire in order to establish a basic call in such an environment! (We don't even want to draw it on a web page, but we do have a sample presentation showing an IMS flow.)
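A rough count of the messages on the wire can be sketched like this. It is only a simplified model, assuming that every proxy stays in the signaling path for the whole exchange, that each hop produces its own 100 Trying, and that the call uses two reliable provisional responses; the function name `messages_on_wire` is hypothetical.

```python
def messages_on_wire(proxies: int, reliable_provisionals: int = 2) -> int:
    """Count messages needed to set up one basic call through `proxies` proxies."""
    hops = proxies + 1  # A -> proxy 1 -> ... -> B
    # End-to-end messages cross every hop: INVITE, 200 OK (for INVITE), ACK,
    # plus response + PRACK + 200 OK (for PRACK) per reliable provisional.
    end_to_end = 3 + 3 * reliable_provisionals
    # 100 Trying is hop-by-hop: one per hop, never forwarded.
    trying = hops
    return end_to_end * hops + trying


for n in (0, 1, 5):
    print(f"{n} proxies: {messages_on_wire(n)} messages")
```

Under these assumptions the count grows linearly with the number of hops: 10 messages with no proxy, 20 with one, and 60 with five.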

Perhaps what is most frustrating is the failure of the most basic call. If device A happens to only offer a codec the other device does not support, the call fails! See Figure 4 for an example.

A                               B
INVITE (G.729, G.723) -->
                                <-- 488 Not Acceptable Here
Figure 4

That's it. The call is over. In theory, the call is supposed to be tried again, but there is no guarantee the next call will succeed. After all, each call could be directed toward a different gateway device or a different session border controller. Admittedly, if a device tries again with different codecs, it would likely succeed. In practice, though, many devices do not make a second attempt. A protocol ought to handle so basic a case more efficiently than by failing the call and starting over.
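The decision made by the answering device can be sketched as a simple intersection of codec lists. This is an illustration only: the function name `answer_offer` and the callee's codec list are hypothetical, and the rejection uses 488 Not Acceptable Here, the status RFC 3261 recommends for an offer that cannot be accepted.

```python
from typing import List, Tuple


def answer_offer(offered: List[str],
                 supported: List[str]) -> Tuple[int, str, List[str]]:
    """Return (status, reason, codecs) for an SDP offer carried in an INVITE."""
    # Keep the caller's preference order; drop anything the callee lacks.
    common = [codec for codec in offered if codec in supported]
    if not common:
        # No codec in common: the whole call is rejected.
        return 488, "Not Acceptable Here", []
    return 200, "OK", common


# The offer from Figure 4 against a callee assumed to support only G.711.
print(answer_offer(["G.729", "G.723"], ["G.711"]))
# A hypothetical second attempt that also offers G.711.
print(answer_offer(["G.729", "G.711"], ["G.711"]))
```

Note that the rejection tells the caller nothing it can act on within the same transaction; the only recovery is a brand-new INVITE.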

And what could be worse than that one? How about this one in Figure 5?

A                               B
INVITE  -->
                                <-- 200 (G.728)
ACK --> (Now what?)
Figure 5

In this flow, the caller did not offer a codec, which is legal and is referred to as "delayed offer". The answering device returns a 200 with a proposed codec that the caller does not understand. So, the call is up, but nobody can communicate.
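What the caller can do at this point is sketched below. It assumes the behavior RFC 3261 describes for an unacceptable offer in a 2xx response: the caller still has to send an ACK carrying a valid answer, and then ends the call with a BYE. The function name `caller_actions` and the caller's codec list are hypothetical.

```python
from typing import List


def caller_actions(offer_in_200: List[str], supported: List[str]) -> List[str]:
    """List the messages the caller sends after a 200 OK carrying an offer."""
    common = [codec for codec in offer_in_200 if codec in supported]
    if common:
        # Normal delayed-offer case: the answer travels in the ACK.
        return [f"ACK (answer: {common[0]})"]
    # The 200 OK cannot be refused, so the dialog is established anyway.
    # The caller acknowledges it and then tears the call down.
    return ["ACK (answer rejecting the media stream)", "BYE"]


# Figure 5: the callee proposes G.728; the caller is assumed to support G.711 only.
print(caller_actions(["G.728"], ["G.711"]))
```

Either way the called party has already answered, so the user picks up a phone only to have the call dropped.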

In short, SIP call flows are hardly simple. Real-world call flows are far more complex than these, and interoperability problems abound. Here, we did not even get into the contents of these various messages. Loss of a message delays call establishment, and there is a risk of media clipping, a risk of call failure for basic reasons, and no end of troubleshooting in this complex network.
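To give a sense of the delay caused by a lost message, the sketch below computes when an INVITE is retransmitted over an unreliable transport such as UDP. It assumes the default timers from RFC 3261: T1 is 500 ms, the retransmission interval doubles each time, and the transaction gives up after 64 * T1. The function name `invite_retransmit_times` is hypothetical.

```python
from typing import List, Tuple

T1 = 0.5  # seconds; default round-trip estimate assumed from RFC 3261


def invite_retransmit_times(t1: float = T1) -> Tuple[List[float], float]:
    """Return (retransmission times, timeout), in seconds after the first INVITE."""
    timeout = 64 * t1      # the transaction fails at this point
    interval = t1          # first retransmission fires after T1
    elapsed = 0.0
    times = []
    while elapsed + interval < timeout:
        elapsed += interval
        times.append(elapsed)
        interval *= 2      # the interval doubles after every retransmission
    return times, timeout


times, timeout = invite_retransmit_times()
print("retransmissions at:", times)
print("gives up at:", timeout)
```

With the default value, a single lost INVITE already adds half a second to call setup, and an unreachable next hop is only declared dead after 32 seconds.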