Use Presence of Audio, not Packets, to Determine Who Has Floor (Q33)

The information in this article applies to:

At least one H.323 MCU determines who has the floor based on the presence of G.723.1 packets rather than the presence of audio itself. This does not work with most endpoints. In fact, it works only with a terminal from one particular vendor that did not implement G.723.1 correctly. To compound the problem, this MCU does not advertise support for silence suppression in its receive capabilities, so correct implementations of G.723.1 never relinquish the floor to that terminal: they (correctly) keep sending audio packets, even during periods of silence.

The correct way to do this is to monitor audio streams for the presence of an actual audio signal (sound), not just the presence of audio packets. This is admittedly more difficult and CPU-intensive, but I see no way around it. If an MCU could be sure that all endpoints in a conference support silenceSuppression (that is, they use silence frames, or SIDs, to mark the onset of silence), it could in principle treat the lack of packets following a SID as a reliable indication of silence. However, not all endpoints support silence suppression, and those that do are likely to use different thresholds for deciding what counts as silence.
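
As a rough illustration of signal-based floor detection, here is a minimal Python sketch. It assumes the MCU has already decoded each endpoint's G.723.1 stream to 16-bit linear PCM (one 30 ms frame is 240 samples at 8 kHz). The class name FloorSelector, the RMS threshold of 500, and the hangover of 10 frames are all hypothetical choices for illustration, not values taken from any standard or product; a real MCU would likely need a more robust voice activity detector.

```python
import math
from typing import Dict, Optional, Sequence

FRAME_SAMPLES = 240  # one 30 ms G.723.1 frame at 8 kHz, after decoding to PCM


def rms(samples: Sequence[int]) -> float:
    """Root-mean-square level of one frame of 16-bit linear PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class FloorSelector:
    """Hypothetical helper: grants the floor based on sound, not packet arrival."""

    def __init__(self, threshold: float = 500.0, hangover_frames: int = 10):
        self.threshold = threshold              # assumed RMS level that counts as sound
        self.hangover_frames = hangover_frames  # silent frames tolerated before release
        self.holder: Optional[str] = None       # endpoint that currently has the floor
        self.silent_run = 0                     # consecutive silent frames from holder

    def update(self, frames: Dict[str, Sequence[int]]) -> Optional[str]:
        """Process one frame interval; `frames` maps endpoint id to decoded PCM."""
        levels = {ep: rms(pcm) for ep, pcm in frames.items()}
        active = {ep: lvl for ep, lvl in levels.items() if lvl >= self.threshold}

        # The current holder is still producing sound: nothing changes.
        if self.holder in active:
            self.silent_run = 0
            return self.holder

        # The holder is quiet: tolerate short pauses before releasing the floor.
        if self.holder is not None:
            self.silent_run += 1
            if self.silent_run <= self.hangover_frames:
                return self.holder

        # No holder, or the holder has been quiet too long: pick the loudest
        # endpoint that is actually producing sound, if there is one.
        self.holder = max(active, key=active.get) if active else None
        self.silent_run = 0
        return self.holder
```

Note that an endpoint that keeps sending packets full of silence never appears in `active`, so it cannot hold the floor indefinitely the way it would under packet-presence detection. The hangover is there only so that brief pauses between words do not cause the floor to bounce between speakers.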