
SIP Issues

SIP is plagued with all kinds of issues. On this page, we will highlight some of them. And, yes, we will also admit that in many cases there are ways to address these issues. The problem, though, is that there is usually no universal way to address a given issue. We are also aware that in some cases the problem is created not by the SIP specification itself but by a related RFC. That is a fair point. However, when building a SIP system, one cannot implement only the SIP specification; one must also implement all of the related specifications in order to ship a product. That means that things like the offer/answer model, SIP session timers, reliable provisional responses, and other functionality are necessary and, as such, we speak of problems with SIP as a system, not a single RFC.

1. Interoperability

For years and years, there have been regular SIP interoperability events. Even so, SIP still suffers significantly more from interoperability problems than most IP-based communication systems. Sure, basic voice calls work, but as soon as one tries to do something more complex, it becomes a real challenge.

2. Codec negotiation

SIP has almost no means of negotiating capabilities or codecs. Strictly speaking it does, but the mechanism is very fragile. For example, suppose user A calls user B and offers only G.729. If user B does not support that codec, then the call fails. That's it. It's over. Is that a carrier-class solution? We do not think so.

What many people do is simply offer multiple codecs in the original "offer", such as G.729 and G.711. In most cases, the called device will accept just one of the offered codecs. According to the offer/answer model, the called device should send the most preferred codec. In reality, many devices just send what they want. That's actually no worse than most other systems, anyway. However, it is also possible for a called device to accept all of the proposed codecs and to switch between them during the call. While that is perfectly legal, it would present most systems with a lot of problems, and implementers have to write software to handle it.
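To make the fragility concrete, here is a minimal sketch of answer-side codec selection. It assumes codecs are identified by simple names such as "G729" and "PCMU" rather than full SDP `m=`/`a=rtpmap` lines, and the function name `answer_codecs` is hypothetical. It keeps the offered codecs the answerer also supports, in the offerer's order, and returns `None` when there is no overlap, which is the point at which a real user agent would typically reject the INVITE (for example with 488 Not Acceptable Here).

```python
from typing import List, Optional


def answer_codecs(offered: List[str], supported: List[str]) -> Optional[List[str]]:
    """Hypothetical answerer logic: keep offered codecs we support, in offer order.

    Returns None when nothing overlaps; a real SIP UA would then typically
    reject the INVITE (e.g. with 488 Not Acceptable Here).
    """
    common = [codec for codec in offered if codec in supported]
    return common or None


# Offer G.729 only to a device that supports only G.711 (PCMU): the call fails.
print(answer_codecs(["G729"], ["PCMU"]))                  # None
# Offer G.729 and G.711: the answer may legitimately contain both.
print(answer_codecs(["G729", "PCMU"], ["PCMU", "G729"]))  # ['G729', 'PCMU']
# A device that supports only one of them answers with just that one.
print(answer_codecs(["G729", "PCMU"], ["PCMU"]))          # ['PCMU']
```

The second case is the one that causes trouble in practice: an implementation that assumes exactly one codec comes back in the answer is not prepared for the far end to switch among several.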

There has been some hard work put into trying to improve the capability negotiation of SIP. However, based on history, it is quite unlikely that we will see significant improvements in practice. More likely than not, SIP systems will be somewhat constrained in terms of what capabilities they can offer and use in order to ensure backward compatibility with what has already been installed.

3. Parsing messages

SIP defines its own syntax language, which means that everyone has to write a parser by hand. This leads to a lot of interoperability problems at the very core of the protocol. Developers should not have to worry about how to parse a SIP message and its SDP payload (which, by the way, uses an entirely different syntax) in order to do something useful. It is amazing how many systems today cannot properly parse a complete SIP message. Some valid syntax constructs will simply cause calls to fail or devices to crash!
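As an illustration of what a hand-written parser has to cope with, here is a minimal sketch of SIP header parsing that handles just two legal constructs from RFC 3261: header lines folded onto continuation lines beginning with a space or tab, and the single-letter compact header names (section 7.3.3). The helper name `parse_headers` is hypothetical, and the sketch assumes CRLF line endings; it deliberately does not split comma-separated values, handle quoted strings or escapes, or parse the SDP body, each of which needs yet more code.

```python
from typing import Dict, List

# Compact header names from RFC 3261 section 7.3.3.
COMPACT_FORMS = {
    "c": "content-type", "e": "content-encoding", "f": "from",
    "i": "call-id", "k": "supported", "l": "content-length",
    "m": "contact", "s": "subject", "t": "to", "v": "via",
}


def parse_headers(raw: str) -> Dict[str, List[str]]:
    """Hypothetical helper: return header values keyed by lowercase full name."""
    head = raw.split("\r\n\r\n", 1)[0]    # header section ends at the first blank line
    lines = head.split("\r\n")[1:]        # skip the request or status line
    unfolded: List[str] = []
    for line in lines:
        if line[:1] in (" ", "\t") and unfolded:
            # Folded continuation line: join it to the previous header with one space.
            unfolded[-1] += " " + line.strip()
        else:
            unfolded.append(line)
    headers: Dict[str, List[str]] = {}
    for line in unfolded:
        name, _, value = line.partition(":")
        key = name.strip().lower()          # header names are case-insensitive
        key = COMPACT_FORMS.get(key, key)   # "v" and "Via" name the same header
        headers.setdefault(key, []).append(value.strip())
    return headers


msg = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "v: SIP/2.0/UDP pc33.example.com;branch=z9hG4bK776asdhds\r\n"
    "Subject: lunch\r\n"
    "  tomorrow\r\n"
    "f: <sip:alice@example.com>;tag=1928301774\r\n"
    "\r\n"
)
print(parse_headers(msg)["subject"])  # ['lunch tomorrow']
```

A parser that skips either of these rules will still work against most traffic, which is exactly why such bugs survive until an unusual but valid message arrives.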

4. Slow error recovery

If SIP is implemented over UDP, as many systems are, then when a message is lost it can take a very long time to recover. With the more complex IMS system, it takes even longer. SIP tries to play nice on the network, which is an admirable objective. The problem is that users want calls to get through in a matter of seconds, and waiting 45 seconds or longer to discover that the far-end device is disconnected is unacceptable.
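Much of the delay comes from the client transaction timers in RFC 3261. The sketch below assumes the default T1 of 500 ms and T2 of 4 s, UDP transport, and a far end that never responds at all; it models a single client transaction only and ignores the Proceeding state, DNS-based failover, and anything IMS adds on top. The function name `send_times` is hypothetical. INVITE retransmissions (Timer A) start at T1 and keep doubling, non-INVITE retransmissions (Timer E) double but are capped at T2, and both give up after 64 * T1 (Timer B or Timer F).

```python
from typing import List

T1 = 0.5  # seconds; RFC 3261 default round-trip-time estimate
T2 = 4.0  # seconds; cap on the non-INVITE retransmission interval


def send_times(invite: bool, t1: float = T1, t2: float = T2) -> List[float]:
    """Times (s) at which a UDP client transaction sends its request if no response ever arrives."""
    timeout = 64 * t1            # Timer B (INVITE) or Timer F (non-INVITE)
    times: List[float] = []
    t, interval = 0.0, t1        # Timer A / Timer E start at T1
    while t < timeout:
        times.append(t)
        t += interval
        # INVITE retransmissions keep doubling; non-INVITE ones are capped at T2.
        interval = interval * 2 if invite else min(interval * 2, t2)
    return times


print(send_times(invite=True))   # [0.0, 0.5, 1.5, 3.5, 7.5, 15.5, 31.5]
print(64 * T1)                   # 32.0 -> Timer B fires and the transaction times out
```

With these defaults, an INVITE sent to a dead device is transmitted seven times, and the caller learns of the failure only when Timer B expires at 32 seconds, before any additional recovery steps begin.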

5. No conference control

To build a workable solution, one needs more than just the ability to establish a session; one also needs control over that session. For example, if one end observes traffic congestion, it needs a way to indicate that video transmission bandwidth should be reduced. Or, if a video frame is lost, it needs to report that quickly in order to ensure proper video display. SIP lacks any kind of conference control mechanism.

6. User input

Years ago, the PSTN used rotary pulses to signal phone numbers. Later, it moved to DTMF as a way of improving speed and accuracy. Both were sent as part of the media flow, since there was only a single "bearer" in the PSTN over which both voice and signaling could be sent to a person's home or business.

However, within the more advanced networks, user input (e.g., presses of the digits on the telephone keypad) was separated and sent over the signaling links. In the SS7 network, for example, the phone number that the user dialed is sent in an IAM message, but subsequent key presses are not extracted; the tones just go through the bearer path. In H.323, though, the ITU recognized that we had an opportunity to "do it right", so DTMF key presses were separated and transmitted over the signaling path.

With SIP, there is actually no single standard for how to send DTMF or other user input. DTMF might be sent using the INFO method, RFC 4733, KPML, or in-band in the audio stream. Of these, the most preferred appears to be RFC 4733, which relies on sending tone descriptions through the bearer path as special RTP packets. That is really a step backward in terms of evolution.
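For readers unfamiliar with RFC 4733, the following is a minimal sketch of the 4-byte "telephone-event" payload it defines: an 8-bit event code, an end (E) bit, a reserved (R) bit, a 6-bit volume, and a 16-bit duration in RTP timestamp units. The helper names are hypothetical, the RTP header and the rules for packet timing and redundancy are omitted, and the example assumes an 8 kHz RTP clock when interpreting the duration. It shows concretely that the key press travels in the media path rather than in SIP signaling.

```python
import struct
from typing import Optional, Tuple

# RFC 4733 DTMF event codes: digits 0-9, then *, #, A-D.
EVENT_CODES = {**{str(d): d for d in range(10)},
               "*": 10, "#": 11, "A": 12, "B": 13, "C": 14, "D": 15}
CODE_TO_KEY = {code: key for key, code in EVENT_CODES.items()}


def encode_dtmf_event(key: str, end: bool, volume: int, duration: int) -> bytes:
    """Build the 4-byte telephone-event payload (RTP header not included)."""
    if not 0 <= volume <= 63:
        raise ValueError("volume is 0..63, meaning 0 to -63 dBm0")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits (RTP timestamp units)")
    flags = (0x80 if end else 0x00) | volume   # E bit set at end of event; R bit left 0
    return struct.pack("!BBH", EVENT_CODES[key], flags, duration)


def decode_dtmf_event(payload: bytes) -> Tuple[Optional[str], bool, int, int]:
    """Return (key, end, volume, duration); key is None for non-DTMF event codes."""
    code, flags, duration = struct.unpack("!BBH", payload[:4])
    return CODE_TO_KEY.get(code), bool(flags & 0x80), flags & 0x3F, duration


# Digit 5, end of event, -10 dBm0, 800 timestamp units (100 ms at 8 kHz).
pkt = encode_dtmf_event("5", end=True, volume=10, duration=800)
print(pkt.hex())               # 058a0320
print(decode_dtmf_event(pkt))  # ('5', True, 10, 800)
```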

And what about user input for applications other than voice? Well, SIP does not really have any mechanism for that. But you can be sure that, when one appears, it will be something different from what is done for voice. SIP was not really designed to be a multimedia communication system, and many such issues were not fully thought through.

Copyright © 2008 • Packetizer, Inc.