Version 6.4  153  March 2012 
SIP User's Manual   12. Media 
12.3.1  Configuring Dynamic Jitter Buffer Operation 
Voice frames are transmitted at a fixed rate. If the frames arrive at the other end at the 
same rate, voice quality is perceived as good. In many cases, however, some frames can 
arrive slightly faster or slower than the other frames. This is called jitter (delay variation), 
and degrades the perceived voice quality. To minimize this problem, the device uses a jitter 
buffer. The jitter buffer collects voice packets, stores them and sends them to the voice 
processor in evenly spaced intervals. 
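As a rough illustration of delay variation, the sketch below compares the gaps between packet arrival times with the fixed sending interval. The function name, the 20 msec frame interval, and the sample arrival times are assumptions made for this example; they are not taken from the device.

```python
def delay_variation(arrivals_ms, frame_ms=20):
    """Return how far each inter-arrival gap deviates from the send interval.

    arrivals_ms: arrival times of consecutive packets, in msec.
    frame_ms:    assumed fixed interval at which the sender emits frames.
    """
    return [(later - earlier) - frame_ms
            for earlier, later in zip(arrivals_ms, arrivals_ms[1:])]


# Hypothetical arrival times: the third packet is 5 msec late,
# so the following gap is 5 msec shorter than the send interval.
print(delay_variation([0, 20, 45, 60, 80]))  # [0, 5, -5, 0]
```

A result of all zeros would mean that frames arrive at the same rate at which they were sent; non-zero values are the jitter that the buffer has to absorb.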
The device uses a dynamic jitter buffer that can be configured using the following 
parameters: 
  Minimum delay: DJBufMinDelay (0 msec to 150 msec) 
Defines the starting jitter capacity of the buffer. For example, at 0 msec, there is no 
buffering at the start. At the default level of 10 msec, the device always buffers 
incoming packets by at least 10 msec worth of voice frames. 
  Optimization Factor: DJBufOptFactor (0 to 12, or 13) 
Defines how the jitter buffer adapts to changing network conditions. When set to its 
maximum value of 12, the dynamic buffer aggressively tracks changes in delay (based 
on packet loss statistics) to increase the size of the buffer, and does not decay back 
down. This results in the best packet error performance, but at the cost of extra delay. 
At the minimum value of 0, the buffer tracks delays only to compensate for clock drift 
and quickly decays back to the minimum level. This optimizes the delay performance 
but at the expense of a higher error rate. 
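The value ranges listed above can be checked before a configuration is applied. The following is a minimal sketch: the helper name is hypothetical and it is not part of any device tool; only the parameter names and ranges come from the list above.

```python
def validate_jitter_params(min_delay_ms, opt_factor):
    """Return a list of problems; an empty list means both values are in range.

    Ranges are those given in this section:
    DJBufMinDelay 0 to 150 msec, DJBufOptFactor 0 to 12, or 13.
    """
    problems = []
    if not 0 <= min_delay_ms <= 150:
        problems.append(f"DJBufMinDelay={min_delay_ms} is outside 0..150 msec")
    if not 0 <= opt_factor <= 13:
        problems.append(f"DJBufOptFactor={opt_factor} is outside 0..13")
    return problems


print(validate_jitter_params(10, 10))   # [] (the documented defaults)
print(validate_jitter_params(200, 14))  # two out-of-range messages
```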
The default settings (a Minimum delay of 10 msec and an Optimization Factor of 10) should 
provide a good compromise between delay and error rate. With these settings, the jitter 
buffer ‘holds’ incoming packets 
for 10 msec before making them available for decoding into voice. The coder polls frames 
from the buffer at regular intervals in order to produce continuous speech. As long as 
the network delay does not vary (jitter) by more than 10 msec from one packet to the 
next, there is always a frame in the buffer for the coder to use. If the delay increases by 
more than 10 msec at any time during the call, the packet arrives too late: the coder tries 
to access a frame and cannot find one. Because the coder must produce a voice sample 
even if a frame is not available, it compensates for the missing packet by adding a 
Bad-Frame-Interpolation (BFI) packet. This loss is then flagged as the buffer being too small. 
The dynamic algorithm then causes the size of the buffer to increase for the next voice 
session. The size of the buffer may decrease again if the device notices that the buffer is 
not filling up as much as expected. At no time does the buffer decrease to less than the 
minimum size configured by the Minimum delay parameter. 
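The late-packet behavior described above can be sketched with a simplified model. It assumes a fixed hold time (no dynamic resizing), a 20 msec frame interval, and a playout timeline set by the first packet. All names and sample delays are hypothetical; this is not the device's algorithm.

```python
def count_bfi(delays_ms, hold_ms=10, frame_ms=20):
    """Count frames that miss their playout slot in a fixed-size jitter buffer.

    delays_ms: network delay of each consecutive packet, in msec.
    hold_ms:   how long the buffer holds packets before playout starts.
    frame_ms:  assumed interval at which frames are sent and polled.
    """
    base = delays_ms[0]  # the first packet sets the playout timeline
    late = 0
    for i, delay in enumerate(delays_ms):
        arrival = i * frame_ms + delay
        playout = base + hold_ms + i * frame_ms
        if arrival > playout:
            late += 1  # the coder finds no frame and inserts a BFI frame
    return late


delays = [50, 55, 48, 62, 50, 75, 51]   # hypothetical per-packet delays
print(count_bfi(delays, hold_ms=10))    # 2 (the 62 and 75 msec packets)
print(count_bfi(delays, hold_ms=30))    # 0
```

In this model, a larger hold time removes the late frames at the cost of 20 msec of extra delay, which is the trade-off the dynamic algorithm manages automatically.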
In certain scenarios, the Optimization Factor should be set to 13. One of the purposes of the 
Jitter Buffer mechanism is to compensate for clock drift. If the two sides of the VoIP call are 
not synchronized to the same clock source, one RTP source generates packets at a lower 
rate, causing under-runs at the remote Jitter Buffer. In normal operation (optimization factor 
0 to 12), the Jitter Buffer mechanism detects and compensates for the clock drift by 
occasionally dropping a voice packet or by adding a BFI packet. 
Fax and modem devices are sensitive to small packet losses or to added BFI packets. 
Therefore, to achieve better performance during modem and fax calls, the Optimization 
Factor should be set to 13. In this special mode, clock drift correction is performed less 
frequently: only when the Jitter Buffer is completely empty or completely full. When such a 
condition occurs, the correction is performed by dropping several voice packets 
at once or by adding several BFI packets at once, so that the Jitter Buffer 
returns to its normal condition.
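The difference between the two correction strategies can be illustrated with a toy model. The sketch below covers only the under-run case (a slow sender) and counts the buffer level in whole frames. The function name, the `fax_mode` flag, and the numbers are assumptions for illustration, not the device's implementation.

```python
def simulate_drift(ticks, slip_every, target_frames, fax_mode):
    """Return the sizes of the BFI bursts used to correct a slow sender clock.

    ticks:         number of receiver playout ticks to simulate.
    slip_every:    the slow sender falls one frame behind every N ticks.
    target_frames: normal buffer depth, in frames.
    fax_mode:      True models Optimization Factor 13 (correct only when empty).
    """
    level = target_frames  # frames currently buffered
    bursts = []
    for tick in range(1, ticks + 1):
        if tick % slip_every == 0:
            level -= 1  # the receiver consumed a frame the sender never sent
        if fax_mode:
            if level == 0:  # correct only when the buffer is completely empty
                bursts.append(target_frames)
                level = target_frames
        elif level < target_frames:  # correct as soon as the drift is visible
            bursts.append(target_frames - level)
            level = target_frames
    return bursts


print(simulate_drift(12000, 1000, 3, fax_mode=False))  # twelve bursts of 1
print(simulate_drift(12000, 1000, 3, fax_mode=True))   # [3, 3, 3, 3]
```

Both runs insert the same total number of BFI frames. The second run concentrates them into fewer, larger bursts, so the fax or modem stream is disturbed less often.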