VoIP application development: a view from the inside
(originally written in Russian and published on http://www.nag.ru/)
When Google announced Libjingle, an open source library for building voice applications compatible with their Google Talk IM, I was a bit skeptical about the news. Plenty of proprietary communication protocols exist nowadays, but who actually cares about them, especially in the open source community? Nonetheless, as soon as I got some spare time (late February 2006) I decided to give it a try, and I was quite surprised. It turned out that the guys from Google, instead of reinventing the wheel as corporate developers usually do, had focused on supporting already known, publicly available (and loved) standards. One of them is the Extensible Messaging and Presence Protocol (XMPP), which is widely used for implementing IMs and was extended by Google Labs to support audio streams. The resulting protocol was approved by the Jabber community and made publicly available as JEP-0167. The other well-known protocol in Libjingle is the good old Real-time Transport Protocol (RTP), which is used to send and receive media across the network and is also the basis of modern IP telephony. The Libjingle developers improved considerably on plain RTP: the STUN and ICE protocols were built in to provide better interoperability (“passability”, I would say) across various NAT/firewall configurations, which is very important nowadays. It is still rare to find a pair of VoIP devices or applications that can interoperate when each sits behind its own NAT, which in turn sits behind its ISP’s firewall. Needless to say, such a network configuration is the de facto standard today.
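To make the RTP part a little more concrete, here is a minimal sketch of decoding the 12-byte fixed RTP header defined in RFC 3550. It is not taken from Libjingle; the function name and the returned dictionary layout are my own assumptions for illustration. The payload type table lists only the static assignments from RFC 3551 for codecs named in this article; iLBC and Speex use dynamically negotiated payload types and therefore show up as "dynamic/unknown" here.

```python
import struct

# Static RTP payload types (RFC 3551) for codecs mentioned in this article.
STATIC_PAYLOAD_TYPES = {0: "PCMU", 3: "GSM", 4: "G723", 8: "PCMA", 18: "G729"}


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header (hypothetical helper, not a Libjingle API)."""
    if len(packet) < 12:
        raise ValueError("RTP packet is shorter than the 12-byte fixed header")
    first, second, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    version = first >> 6
    if version != 2:
        raise ValueError(f"unsupported RTP version {version}")
    csrc_count = first & 0x0F
    payload_type = second & 0x7F
    return {
        "version": version,
        "padding": bool(first & 0x20),
        "extension": bool(first & 0x10),
        "csrc_count": csrc_count,
        "marker": bool(second & 0x80),
        "payload_type": payload_type,
        "codec": STATIC_PAYLOAD_TYPES.get(payload_type, "dynamic/unknown"),
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
        # Offset of the payload when no header extension is present.
        "header_length": 12 + 4 * csrc_count,
    }


if __name__ == "__main__":
    # A hand-built packet: version 2, payload type 0, 160 bytes of dummy payload.
    sample = struct.pack("!BBHII", 0x80, 0x00, 1, 160, 0x12345678) + bytes(160)
    print(parse_rtp_header(sample))
```

The sketch ignores header extensions and does not validate padding, so it is only suitable for a quick look at captured packets.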
After jingling with some other libraries Libjingle depends on, I was finally able to build and run Libjingle on a Linux box. An attempt to build Libjingle on FreeBSD (which I prefer) did not succeed because of dependencies on some libraries that FreeBSD lacks. Besides, running unadapted PThreads code on FreeBSD causes a lot of problems. All I got from running the ‘local’ and ‘call’ tools from the Libjingle package on FreeBSD were core dumps. Under Linux everything went smoothly, so I could connect to Google’s XMPP server and talk to my colleague’s Google Talk running on his Windows box. The voice quality during our conversation was excellent, which made me wonder about the codec in use, so I fired up tcpdump, which showed that Google Talk and Libjingle use the PCMU (G.711 µ-law) codec by default. This is certainly the best codec for FastEthernet-based local networks, but it will undoubtedly not work (or not deliver its best voice quality) on low-bandwidth or poor links, because of its high bitrate and greater sensitivity to network fluctuations. Playing with Libjingle and Google Talk a bit more, I discovered that many more codecs are implemented in them, such as GSM and iLBC, but I still could not find out what rule is used to choose one codec over another. Neither G.729 nor G.723.1 was found in Libjingle, due to their proprietary nature, I believe. So it seems Google does not want to bother with licensing problems, which is good. Besides, according to Google’s site they are going to support the Speex family of codecs, which give better voice quality than G.729 at the same bitrate and are license- and patent-free.
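The bitrate argument can be put into numbers. The sketch below estimates the IP-layer bandwidth of one direction of a call, assuming 20 ms of audio per packet and 40 bytes of IPv4, UDP and RTP headers with no header compression. The payload sizes are the standard frame sizes of each codec for 20 ms of audio; the function and table names are my own, and link-layer overhead (Ethernet, PPP) is deliberately left out.

```python
# Bytes of encoded audio carried in one 20 ms RTP packet.
PAYLOAD_BYTES_PER_20MS = {
    "G.711": 160,        # 64 kbit/s
    "GSM 06.10": 33,     # one 33-byte frame
    "iLBC (20 ms)": 38,  # 15.2 kbit/s mode
    "G.729": 20,         # two 10-byte frames of 10 ms each
}

HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP fixed header
PACKETS_PER_SECOND = 50     # one packet every 20 ms (assumed packetization)


def ip_bandwidth_bps(payload_bytes: int) -> int:
    """One-way IP-layer bandwidth in bit/s for the assumed 20 ms packetization."""
    return (payload_bytes + HEADER_BYTES) * 8 * PACKETS_PER_SECOND


if __name__ == "__main__":
    for codec, payload in PAYLOAD_BYTES_PER_20MS.items():
        print(f"{codec}: {ip_bandwidth_bps(payload) / 1000:.1f} kbit/s")
```

Under these assumptions G.711 needs 80 kbit/s in each direction, while G.729 needs 24 kbit/s, which is why G.711 is a poor fit for a dial-up link even though the headers dominate the cost of the low-bitrate codecs.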
with Libjingle a bit more, I stumbled upon a simple (as it seemed to me at the time) idea: to build a simple voice gateway between Google Talk and mainstream VoIP. In other words, to accept voice calls over the Jingle Audio protocol and pass them on to H.323 or SIP. It must be said here that, working as a system administrator with a number of ISPs and ITSPs
My bad experience with Libjingle could not stop me from developing the idea of a Google Talk to VoIP gateway (which was gaining new features daily). My line of thought was the following: since Google Talk uses open standards like RTP and open audio codecs (G.711, GSM, iLBC), and all of these are already available in the OpenH323 library, why not take OpenH323 as a base and write a set of classes implementing Jingle Audio signaling in terms of OpenH323/PWLIB primitives? That way we would get a unified architecture suitable for developing VoIP applications that use H.323 and Jingle Audio, like the gateway I had in mind. Within a week I had coded all the Jingle Audio signaling classes, which was pretty simple. Implementing and testing the STUN and ICE parts was more troublesome (the smell of Cisco’s Indian cuisine can still be felt). An audio gateway also required audio mixer and transcoder classes. These were coded quite fast too. So, in a month I had a rough but working library I could rely on.
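To give an idea of what the core of an audio mixer does, here is a minimal sketch that mixes several frames of 16-bit linear PCM by summing the samples and clipping the result to the valid range. It is an illustration only, not the code of the library described here: the function name is hypothetical, and it assumes all frames have already been decoded to linear PCM at the same sample rate and have equal length.

```python
import array

INT16_MIN, INT16_MAX = -32768, 32767


def mix_frames(frames):
    """Mix equal-length frames of signed 16-bit samples with saturation.

    Hypothetical helper: `frames` is a non-empty list of array('h') objects.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    length = len(frames[0])
    if any(len(frame) != length for frame in frames):
        raise ValueError("all frames must have the same number of samples")
    # Sum the samples position by position and clip to the int16 range,
    # so that loud passages saturate instead of wrapping around.
    return array.array(
        "h",
        (max(INT16_MIN, min(INT16_MAX, sum(samples))) for samples in zip(*frames)),
    )


if __name__ == "__main__":
    a = array.array("h", [1000, 30000, -30000])
    b = array.array("h", [2000, 10000, -10000])
    print(list(mix_frames([a, b])))
```

In a conference each participant would normally receive the mix of everyone else, so a real mixer calls something like this once per listener with that listener's own frame left out.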
testing of the newly developed library extensions took place right after development. We tested it in many network environments, from FastEthernet networks to dial-up modems. We discovered that triple voice transcoding such as Linear->iLBC->G.729->G.711->Linear can degrade voice quality, but the result can still be usable. We tried to terminate voice calls to some ITSPs in
After stress-testing my library (which I code-named libJungle), an upper level was added: a Finite State Machine (FSM) that handles bulk calls and implements command-event based call routing. The FSM also made development easy and rapid. Thus the new features that my colleague, Eugeny Korolenko, and I had in mind, such as voicemail, voice conferencing, gatewaying to other protocols (SIP), and a billing system, started growing really fast. In March 2006 we announced a publicly available service on the Net called GTalk2VoIP. You can read about it at http://www.gtalk2voip.com/ and test it right from your Google Talk by inviting the user email@example.com and sending the HELP command in its chat window.
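The command-event idea can be sketched as a transition table: each (state, event) pair maps to the next state and to a command for the lower layers to execute. The states, events and command names below are invented for illustration and are not those of the GTalk2VoIP FSM; the sketch only shows why adding a feature such as voicemail can come down to adding a row to the table.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    ROUTING = auto()
    RINGING = auto()
    CONNECTED = auto()
    RELEASED = auto()


# (current state, event) -> (next state, command); all names are hypothetical.
TRANSITIONS = {
    (State.IDLE, "incoming_call"): (State.ROUTING, "lookup_route"),
    (State.ROUTING, "route_found"): (State.RINGING, "dial_outbound_leg"),
    (State.ROUTING, "no_route"): (State.RELEASED, "reject_call"),
    (State.RINGING, "answered"): (State.CONNECTED, "bridge_audio"),
    (State.RINGING, "timeout"): (State.RELEASED, "send_to_voicemail"),
    (State.CONNECTED, "hangup"): (State.RELEASED, "write_billing_record"),
}


class Call:
    """One call leg driven by events; commands are collected, not executed."""

    def __init__(self, call_id):
        self.call_id = call_id
        self.state = State.IDLE
        self.commands = []

    def handle(self, event):
        """Apply an event; return the issued command, or None if it is ignored."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            return None  # events that make no sense in this state are dropped
        self.state, command = TRANSITIONS[key]
        self.commands.append(command)
        return command


if __name__ == "__main__":
    call = Call("demo-1")
    for event in ("incoming_call", "route_found", "answered", "hangup"):
        print(event, "->", call.handle(event), call.state.name)
```

Because every call is just a small object with a current state, handling many calls at once means dispatching each incoming event to the right `Call` instance.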
We would like to invite people who are interested in further developing our services, or in using them in related open source and commercial projects, to join us. If you are interested, please mail us. Development of voice services using the FSM can be really fast.
Ruslan Zalata, firstname.lastname@example.org
Originally, that is, at the time I wrote this article, our gateway could only do Jingle Audio to H.323 gatewaying, which was used mostly for making calls to the PSTN. Since then a lot of extensions have been made. We have coded SIP classes for the OpenH323 project, so now we gateway among three different VoIP standards: Jingle, H.323 and SIP. We have also added an XMPP Transport as a way of connecting to the XMPP network, so Google Talk users can add phone numbers to their roster. We have written a Google Desktop 3 plugin that Google Talk users can use to access our public services from the desktop. And a lot, a lot more.
Now I also know that Google’s implementation of Jingle Audio differs a bit from the one (JEP-0167) approved by the Jabber community.
We are currently working hard on a MIDP 2.0 application with Jingle Audio support. Maybe we could port our libJungle to J2ME.
Dated: 5th of June 2006