Understanding SIP-Based VoIP
Version 1.0

1. What Does VoIP Mean

Ever since digital voice coding came into use, for example in ISDN, people have sought to converge telephony and the IT environment so that data, voice and video applications can share one and the same medium. Unfortunately, each of these applications has different needs. Data transmission uses a variable amount of bandwidth and tolerates delay and brief interruptions, because lost data can be retransmitted. Voice and video transmission, by contrast, needs constant bandwidth and a guaranteed delivery time.

Existing networks differ in structure, and each meets only the needs of the application it was built for. In data networks, every user can use the available bandwidth to the maximum extent, so line capacity is exploited efficiently. A telephone network, by contrast, reserves a channel per call whether or not anything is being transmitted (e.g. in a standard call usually only one party speaks at a time, so the silent party transmits nothing, yet its channel stays occupied).

Many technologies for real-time voice and video transmission over IP networks (such as the Internet), collectively called VoIP (Voice over IP), have been developed as an alternative to the standard circuit-switched telephone network. As a result of natural selection, only two of them are now widely implemented in telecommunications, which improves interoperability between products from different companies. These two technologies are H.323 and SIP.

2. Signal and Voice Paths

Signalling and voice channels are strictly separated in a VoIP network. Signalling sessions are mostly handled by a server, which replaces a standard PBX in the IP environment. The voice stream is set up point-to-point between the endpoints. The following diagram illustrates this.

SIP Signaling and Media Paths

2.1 Voice and Video Transmission in VoIP

As mentioned above, successful voice transmission requires constant bandwidth, a bounded packet delivery time with little variation (the variation in delay is called jitter) and correct packet sequence. Occasional packet loss is less of a concern, because the mathematical methods used for voice coding and decoding can approximate the missing signal when a packet is not delivered. We can therefore carry the voice stream over UDP, which does not acknowledge delivered packets, but we still need a protocol that identifies the codec in use and carries the sequence numbers and timestamps the receiver needs to restore packet order and compensate for jitter. This protocol is RTP (Real-time Transport Protocol), which is widely used for voice transmission in modern VoIP networks. Its task is to transmit data (voice) from the source to the proper destination in real time.

Codecs are used to save bandwidth by reducing the transmission rate with a compression algorithm. The level of compression used by the codec affects the quality of the transmitted voice: as a rule, the higher the transmission rate, the higher the voice quality. Voice transmission quality is measured by the MOS (Mean Opinion Score), where 1 means the worst and 5 the best quality. For a list of VoiceBlue-supported codecs see the table below:

Standard   Algorithm   Transmission Rate (kbit/s)   MOS
G.711      PCM         64                           4.1
G.726      ADPCM       32                           3.85
G.729      CS-ACELP    8                            3.92
G.723.1    ACELP       5.3                          3.56
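
The codec rates in the table cover the voice payload only. Every RTP packet also carries a 12-byte RTP header, an 8-byte UDP header and a 20-byte IPv4 header, so the rate on the wire is higher. The following Python sketch packs and parses the fixed RTP header and estimates the per-direction IP bandwidth for each codec. It is an illustration, not VoiceBlue code: the function names are hypothetical, the 20 ms packet interval is an assumption (G.723.1, for instance, works with 30 ms frames), and link-layer overhead is ignored.

```python
import struct

# Per-packet header overhead in bytes.
IPV4_HEADER = 20
UDP_HEADER = 8
RTP_HEADER = 12

# Codec bit rates in kbit/s, taken from the table above.
CODEC_RATES_KBPS = {"G.711": 64, "G.726": 32, "G.729": 8, "G.723.1": 5.3}


def build_rtp_header(payload_type, sequence, timestamp, ssrc):
    """Pack the 12-byte fixed RTP header (version 2, no padding/extension/CSRC)."""
    first_byte = 2 << 6                # version 2 in the top two bits
    second_byte = payload_type & 0x7F  # marker bit left at 0
    return struct.pack("!BBHII", first_byte, second_byte,
                       sequence & 0xFFFF,
                       timestamp & 0xFFFFFFFF,
                       ssrc & 0xFFFFFFFF)


def parse_rtp_header(packet):
    """Return the fields a receiver uses to reorder packets and absorb jitter."""
    first_byte, second_byte, sequence, timestamp, ssrc = struct.unpack(
        "!BBHII", packet[:RTP_HEADER])
    return {
        "version": first_byte >> 6,
        "payload_type": second_byte & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


def ip_bandwidth_kbps(codec_rate_kbps, packet_interval_ms=20):
    """Estimate one-way IP bandwidth, assuming one packet per interval."""
    payload_bytes = codec_rate_kbps * 1000 / 8 * packet_interval_ms / 1000
    packet_bytes = payload_bytes + IPV4_HEADER + UDP_HEADER + RTP_HEADER
    packets_per_second = 1000 / packet_interval_ms
    return packet_bytes * 8 * packets_per_second / 1000


# Payload type 0 is the static RTP payload type for G.711 mu-law (PCMU).
header = build_rtp_header(payload_type=0, sequence=1, timestamp=160,
                          ssrc=0x1234ABCD)
print(parse_rtp_header(header))
for codec, rate in CODEC_RATES_KBPS.items():
    print(codec, round(ip_bandwidth_kbps(rate), 1), "kbit/s incl. headers")
```

Under these assumptions G.711 needs 80 kbit/s and G.729 needs 24 kbit/s in each direction, which shows that header overhead dominates for the low-rate codecs.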

2.2 SIP as a Signalling Protocol

SIP (Session Initiation Protocol) is a text-based protocol, similar to HTTP and SMTP, designed for initiating, maintaining and terminating interactive communication sessions between users. Such sessions include voice, video, chat, interactive games and virtual reality.

SIP defines and uses the following components:

  • UAC (User Agent Client) – client part of the terminal that initiates SIP signalling
  • UAS (User Agent Server) – server part of the terminal that responds to SIP signalling from the UAC
  • UA (User Agent) – SIP network terminal (a SIP telephone or a gateway to other networks); contains both a UAC and a UAS
  • Proxy server – receives connection requests from the UA and forwards them to another proxy server if the called station is not under its administration
  • Redirect server – receives connection requests and, instead of forwarding them to the called party, returns them to the requester together with the destination data
  • Location server – receives registration requests from the UA and updates the terminal database with them

All server functions (Proxy, Redirect, Location) typically run on a single physical machine, commonly referred to simply as the proxy server, which is responsible for maintaining the client database, establishing, maintaining and terminating connections, and routing calls.
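
How the three server roles share one terminal database can be sketched in a few lines of Python. This is a simplified illustration only: the class and function names are hypothetical, the addresses are made-up examples, and an unknown user is simply rejected with 404 (Not Found) instead of being passed on to another proxy.

```python
class LocationService:
    """In-memory terminal database: public SIP address -> current contact."""

    def __init__(self):
        self._bindings = {}

    def register(self, address, contact):
        """Store or refresh the contact announced by a REGISTER request."""
        self._bindings[address] = contact

    def lookup(self, address):
        """Return the registered contact, or None if the user is unknown."""
        return self._bindings.get(address)


def route_as_proxy(location, target):
    """Proxy role: forward the request to the registered contact."""
    contact = location.lookup(target)
    if contact is None:
        return ("reject", 404)
    return ("forward", contact)


def route_as_redirect(location, target):
    """Redirect role: hand the contact back to the requester with a 302."""
    contact = location.lookup(target)
    if contact is None:
        return ("reject", 404)
    return ("redirect", 302, contact)


# Hypothetical example addresses.
location = LocationService()
location.register("sip:bob@example.com", "sip:bob@192.0.2.10:5060")
print(route_as_proxy(location, "sip:bob@example.com"))
print(route_as_redirect(location, "sip:bob@example.com"))
print(route_as_proxy(location, "sip:carol@example.com"))
```

The only difference between the two routing functions is who contacts the destination next: the server itself (proxy) or the original requester (redirect).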

Basic messages sent in the SIP environment

  • INVITE – request to establish a connection
  • ACK – confirmation by the caller that it has received the final response to INVITE
  • BYE – termination of an established connection
  • CANCEL – cancellation of a connection that has not yet been established
  • REGISTER – registration of the UA with the SIP proxy
  • OPTIONS – query for the server's capabilities
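
Because SIP is text-based, a request is just a start line followed by header lines. The Python sketch below assembles a minimal INVITE to show the layout. The helper name, addresses, Call-ID and tag values are hypothetical; a real INVITE would additionally carry a Contact header and normally an SDP body describing the media.

```python
def build_sip_request(method, request_uri, from_uri, to_uri,
                      call_id, cseq, via_host, branch, from_tag):
    """Assemble a body-less SIP request as text (illustrative only)."""
    lines = [
        f"{method} {request_uri} SIP/2.0",                # request line
        f"Via: SIP/2.0/UDP {via_host};branch={branch}",   # return path
        "Max-Forwards: 70",                               # hop limit
        f"From: <{from_uri}>;tag={from_tag}",             # caller
        f"To: <{to_uri}>",                                # called party
        f"Call-ID: {call_id}",                            # identifies the call
        f"CSeq: {cseq} {method}",                         # request ordering
        "Content-Length: 0",                              # no body in this sketch
    ]
    # Lines end with CRLF; an empty line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical example values; the branch starts with the "z9hG4bK" prefix
# that SIP uses for branch parameters.
invite = build_sip_request(
    method="INVITE",
    request_uri="sip:bob@example.com",
    from_uri="sip:alice@example.com",
    to_uri="sip:bob@example.com",
    call_id="a84b4c76e66710@pc33.example.com",
    cseq=1,
    via_host="pc33.example.com:5060",
    branch="z9hG4bK-example1",
    from_tag="1928301774",
)
print(invite)
```

The same helper produces a BYE or REGISTER by changing the method, which reflects how little the message framing differs between request types.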

Responses to SIP requests use numeric status codes, as in the HTTP protocol. The most important ones are:

  • 1XX – informational messages (100 – Trying, 180 – Ringing, 183 – Session Progress)
  • 2XX – successful completion of the request (200 – OK)
  • 3XX – redirection; the request should be directed elsewhere (302 – Moved Temporarily, 305 – Use Proxy)
  • 4XX – client error (403 – Forbidden)
  • 5XX – server error (500 – Server Internal Error, 501 – Not Implemented)
  • 6XX – global failure (606 – Not Acceptable)
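
The first digit of the status code is enough to tell a client how to react. As a minimal sketch, the following Python function splits a response status line and maps the code to its class; the function name and the class labels are our own choice for illustration.

```python
RESPONSE_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
    6: "global failure",
}


def parse_status_line(line):
    """Split 'SIP/2.0 <code> <reason>' into (code, reason, response class)."""
    parts = line.strip().split(" ", 2)
    if len(parts) != 3 or parts[0] != "SIP/2.0":
        raise ValueError(f"not a SIP/2.0 status line: {line!r}")
    code = int(parts[1])
    response_class = RESPONSE_CLASSES.get(code // 100)
    if response_class is None:
        raise ValueError(f"status code out of range: {code}")
    return code, parts[2], response_class


for status_line in ("SIP/2.0 180 Ringing",
                    "SIP/2.0 200 OK",
                    "SIP/2.0 302 Moved Temporarily",
                    "SIP/2.0 500 Server Internal Error"):
    print(parse_status_line(status_line))
```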

Connection establishment and termination in the SIP proxy server environment:

SIP Call Flow with Proxy
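
A typical proxied call, assuming the usual sequence of INVITE, 100 Trying, 180 Ringing, 200 OK and ACK for setup, followed by BYE and 200 OK for teardown, can be followed with a small state machine. The Python sketch below is a deliberate simplification seen from the caller's side: the state names are hypothetical, and it models neither retransmissions and timers nor the full response sequence that follows a CANCEL.

```python
# (current state, message seen by the caller) -> next state
TRANSITIONS = {
    ("idle", "INVITE"): "calling",          # caller sends INVITE via the proxy
    ("calling", "100"): "calling",          # proxy answers 100 Trying
    ("calling", "180"): "ringing",          # called party is being alerted
    ("calling", "200"): "answered",         # answered without a 180
    ("ringing", "200"): "answered",         # called party picked up
    ("answered", "ACK"): "established",     # caller confirms; voice (RTP) flows
    ("calling", "CANCEL"): "cancelled",     # caller gives up before an answer
    ("ringing", "CANCEL"): "cancelled",
    ("established", "BYE"): "terminating",  # either side hangs up
    ("terminating", "200"): "terminated",   # BYE is confirmed
}


def run_call(messages, state="idle"):
    """Feed a message sequence through the table and return the final state."""
    for message in messages:
        key = (state, message)
        if key not in TRANSITIONS:
            raise ValueError(f"unexpected {message!r} in state {state!r}")
        state = TRANSITIONS[key]
    return state


print(run_call(["INVITE", "100", "180", "200", "ACK"]))
print(run_call(["INVITE", "100", "180", "200", "ACK", "BYE", "200"]))
print(run_call(["INVITE", "100", "180", "CANCEL"]))
```

Only the signalling messages pass through the proxy; once the call reaches the established state, the voice stream runs directly between the two endpoints as described in section 2.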

Jan Mastalir, DiS.
Technical support