VoIP application development: a view from the inside

                (originally written in Russian and published on http://www.nag.ru/)

 

            I was a bit skeptical when Google announced Libjingle [1], an open source library for implementing voice applications compatible with their IM client, Google Talk. Plenty of proprietary communication protocols exist nowadays, but who actually cares about them, especially in the open source community? Nonetheless, as soon as I got some spare time (late February 2006) I decided to give it a try, and I was quite surprised. It turned out that the guys from Google, instead of reinventing the wheel as corporate developers usually do, had focused on supporting already known, publicly available (and loved) standards. One of them is the Extensible Messaging and Presence Protocol (XMPP), which is widely used for implementing IMs and was extended by Google Labs to support audio streams. The resulting protocol was approved by the Jabber [2] community and made publicly available as JEP-0167 [3]. The other well-known protocol in Libjingle is the good old Real-time Transport Protocol (RTP) [4], used for transmitting and receiving media across the network, which is also the basis of modern IP telephony. The Libjingle developers improved considerably on plain RTP: the STUN and ICE [5] protocols were built in to improve interoperability (“passability”, I would say) through various NAT/firewall configurations, which is very important nowadays. It is currently rare to find a pair of VoIP devices or applications that can interoperate when each is placed behind its own NAT and its ISP’s firewall. Needless to say, such a network configuration is the de facto standard today.

 

            After jingling with the other libraries Libjingle depends on, I was finally able to build and run Libjingle on a Linux box. An attempt to build Libjingle on FreeBSD (which I prefer) did not succeed because of dependencies on some libraries that FreeBSD lacks [6]. Besides, there are a lot of problems running unadapted PThreads code on FreeBSD. All I could get from running the ‘local’ and ‘call’ tools [7] from the Libjingle package on FreeBSD were core dumps. Under Linux everything went smoothly, so I could connect to Google’s XMPP server and talk to my colleague’s Google Talk running on his Windows box. The voice quality during the call was excellent, which made me wonder about the codec in use, so I fired up tcpdump, which showed that Google Talk and Libjingle use the PCMU (G.711 µ-law) codec by default. Sure, this is the best codec to use on local Fast Ethernet networks, but no doubt it would not work (or would not give the best voice quality) on low-bandwidth or poor links because of its high bitrate and greater sensitivity to network fluctuations. Playing with Libjingle and Google Talk a bit more, I discovered that they implement several more codecs, such as GSM and iLBC, but I still could not find out what rule is used to choose one codec over another [8]. Neither G.729 nor G.723.1 was found in Libjingle, due to their proprietary nature, I believe. So it seems Google prefers not to bother with licensing problems, which is good. Besides, according to Google’s site they are going to support the Speex family of codecs [9], which give better voice quality than G.729 at the same bitrate and are license- and patent-free.
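To put rough numbers on the bitrate concern, here is a small back-of-the-envelope sketch in Python. It is not taken from Libjingle. The payload sizes are the standard frame sizes of each codec, while the packetization intervals and the plain IPv4 + UDP + RTP header overhead are assumptions chosen for illustration.

```python
# Rough one-way bandwidth per call at the IP layer for the codecs named above.
# Assumptions: IPv4 without options, no header compression, and the
# packetization intervals listed in CODECS (common defaults, not measured).

IP_UDP_RTP_HEADERS = 20 + 8 + 12  # bytes: IPv4 + UDP + RTP

# name: (payload bytes per packet, packet duration in milliseconds)
CODECS = {
    "PCMU (G.711u)": (160, 20),  # 64 kbit/s codec rate
    "GSM 06.10": (33, 20),       # 13.2 kbit/s codec rate
    "iLBC (30 ms)": (50, 30),    # 13.33 kbit/s codec rate
    "G.729": (20, 20),           # 8 kbit/s, two 10 ms frames per packet
}


def ip_bandwidth_kbps(payload_bytes, packet_ms):
    """Return the bandwidth at the IP layer in kbit/s, headers included."""
    packets_per_second = 1000 / packet_ms
    bytes_per_packet = payload_bytes + IP_UDP_RTP_HEADERS
    return bytes_per_packet * 8 * packets_per_second / 1000


if __name__ == "__main__":
    for name, (payload, ms) in CODECS.items():
        print(f"{name:15s} {ip_bandwidth_kbps(payload, ms):6.1f} kbit/s")
```

Under these assumptions a G.711 call needs about 80 kbit/s in each direction, while the low-bitrate codecs stay under 30 kbit/s, which is the difference that matters on a dial-up or congested link.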

 

            Playing with Libjingle a bit more, I stumbled upon a simple (as it seemed to me at the time) idea: to build a voice gateway between Google Talk and mainstream VoIP. In other words, to accept voice calls over the Jingle Audio protocol and pass them on to H.323 or SIP. I should say here that, working as a system administrator for a number of ISPs and ITSPs in Tyumen, I had some experience developing VoIP applications with the well-known OpenH323 library [10]. Basically, all that had to be done to realize my idea, as I thought, was to tie together two libraries, Libjingle and OpenH323. As it turned out later, this was all but impossible for a number of reasons, including the lack of Libjingle documentation and problems with PThread synchronization and use. While exploring the Libjingle code I also found another limitation, one that could ruin my idea at its core: the interface to the oRTP library inside Libjingle used static UDP port assignment. I presume this part of the code has been changed in newer versions of Libjingle, but at the time I was so unpleasantly surprised that I decided Libjingle was not suitable for any serious use beyond end-user applications (Psi, Tapioca and some others had already released Jingle Audio support in their IM products, based on Libjingle code).
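Static port assignment hurts a gateway because, presumably, every concurrent call needs its own RTP port. The sketch below shows the alternative in plain Python sockets: picking a free even/odd UDP port pair per call (RTP on the even port, RTCP on the next one, as is conventional). It is neither Libjingle nor oRTP code; the function name and the port range are hypothetical.

```python
import socket


def allocate_rtp_pair(host="127.0.0.1", first=16384, last=32766):
    """Bind and return (rtp_sock, rtcp_sock) on a free even/odd port pair.

    The range 16384..32766 is an arbitrary choice for this illustration.
    """
    for port in range(first, last, 2):  # even ports only
        rtp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        rtcp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            rtp.bind((host, port))       # RTP on the even port
            rtcp.bind((host, port + 1))  # RTCP on the following odd port
            return rtp, rtcp
        except OSError:
            # One of the two ports is taken: release both and try the next pair.
            rtp.close()
            rtcp.close()
    raise RuntimeError("no free RTP/RTCP port pair in the configured range")


if __name__ == "__main__":
    # Two simultaneous "calls" end up on different port pairs.
    call_a = allocate_rtp_pair()
    call_b = allocate_rtp_pair()
    print(call_a[0].getsockname(), call_b[0].getsockname())
    for sock in call_a + call_b:
        sock.close()
```

With per-call allocation like this, the number of simultaneous calls is limited by the size of the port range rather than fixed at one.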

 

            My bad experience with Libjingle did not stop me from developing the idea of a Google Talk to VoIP gateway (which was growing new features daily). My line of thought was the following: since Google Talk uses open standards like RTP and open audio codecs (G.711, GSM, iLBC), and all of these are already available in the OpenH323 library, why not take OpenH323 as a base and write a set of classes that implement Jingle Audio signaling in terms of OpenH323/PWLIB primitives? That way we would get a unified architecture suitable for developing VoIP applications that use both H.323 and Jingle Audio, like the gateway I had in mind. Within a week I had coded all the Jingle Audio signaling classes, which was pretty simple. Implementing and testing the STUN and ICE stuff was more troublesome (the smell of Cisco’s Indian cuisine can still be felt). An audio gateway also required audio mixer and transcoder classes; these were coded quite quickly too. So, within a month I had a rough but working library I could rely on.
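For readers who have not met STUN on the wire, here is a minimal sketch of the message header as defined in RFC 3489, the classic STUN specification: a 2-byte message type, a 2-byte attribute length and a 16-byte transaction ID. It only illustrates the framing. It is not the libJungle code, it ignores attributes entirely, and Google Talk's own STUN usage may differ in details. The function names are hypothetical.

```python
import os
import struct

BINDING_REQUEST = 0x0001   # message types from RFC 3489
BINDING_RESPONSE = 0x0101

# Header layout: message type, length of attributes, 128-bit transaction ID.
HEADER = struct.Struct("!HH16s")


def build_binding_request():
    """Return (packet, transaction_id) for a Binding Request with no attributes."""
    transaction_id = os.urandom(16)
    packet = HEADER.pack(BINDING_REQUEST, 0, transaction_id)
    return packet, transaction_id


def parse_header(packet):
    """Return (message_type, attribute_length, transaction_id) from a packet."""
    if len(packet) < HEADER.size:
        raise ValueError("packet is shorter than a STUN header")
    return HEADER.unpack_from(packet)


if __name__ == "__main__":
    packet, tid = build_binding_request()
    msg_type, length, echoed = parse_header(packet)
    print(hex(msg_type), length, echoed == tid)
```

A client would send such a request over UDP to a STUN server and match the reply to the request by its transaction ID; the address the server reports back is what the other side of the NAT sees.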

 

            Field testing of the new library extensions began right after development. We tested them in many network environments, from Fast Ethernet networks to dial-up modems. We found that triple transcoding such as Linear->iLBC->G.729->G.711->Linear degrades voice quality, but the result can still be usable. We tried terminating voice calls to some ITSPs in Tyumen, in Moscow and worldwide, and found that the voice quality was not the best but was acceptable for offering a commercial service. It is well known that the best possible voice quality is achieved on a gateway that does no audio transcoding at all. With Google Talk that is hardly possible, since the mainstream codecs G.729 and G.723.1 are not supported in Google Talk (and never will be). So many more tests have to be made to find out how transcoding between different codecs affects voice quality, and to find the best possible way of encoding voice.

 

            After stress-testing my library (which I code-named libJungle), I added an upper layer: a Finite State Machine (FSM) that handles bulk calls and implements command/event-based call routing. The FSM also made development easy and rapid. As a result, the new features that my colleague, Eugeny Korolenko, and I had in mind, such as voicemail, voice conferencing, gatewaying to other protocols (SIP) and a billing system, started growing really fast. In March 2006 we announced a publicly available service on the Net called GTalk2VoIP. You can read about it at https://gtalk2voip.com/ and test it right from your Google Talk, just by inviting its user and sending the HELP command in its chat window.
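To give a feel for what an event-driven call FSM looks like, here is a minimal table-driven sketch in Python. It is only an illustration of the general technique, not the libJungle FSM: the states, events and command names are all hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RINGING = auto()
    CONNECTED = auto()
    RELEASED = auto()


class Event(Enum):
    INVITE = auto()
    ANSWER = auto()
    HANGUP = auto()


# (current state, event) -> (next state, command for the signaling/media layer)
TRANSITIONS = {
    (State.IDLE, Event.INVITE): (State.RINGING, "route_call"),
    (State.RINGING, Event.ANSWER): (State.CONNECTED, "start_media"),
    (State.RINGING, Event.HANGUP): (State.RELEASED, "cancel_call"),
    (State.CONNECTED, Event.HANGUP): (State.RELEASED, "stop_media"),
}


class Call:
    """One call leg driven purely by events; commands are recorded, not executed."""

    def __init__(self, call_id):
        self.call_id = call_id
        self.state = State.IDLE
        self.commands = []

    def handle(self, event):
        """Apply an event; return the issued command, or None if it is ignored."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            return None  # the event makes no sense in the current state
        self.state, command = TRANSITIONS[key]
        self.commands.append(command)
        return command


if __name__ == "__main__":
    call = Call("demo")
    for event in (Event.INVITE, Event.ANSWER, Event.HANGUP):
        print(event.name, "->", call.handle(event), call.state.name)
```

The appeal of this style is that a new feature such as voicemail mostly means adding rows to the transition table rather than threading new conditions through the call-handling code.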

 

            We would like to invite people who are interested in the further development of our services, or in using them in related open source and commercial projects, to join us. If you are interested, please e-mail us. Developing voice services with the FSM can be really fast.

 

            Ruslan Zalata,

            Tyumen, Russia

            March 2006.

 

PS:

            Originally, that is, at the time I wrote this article, our gateway could only do JingleAudio->H.323 gatewaying, which was used mostly for making calls to the PSTN. Since then, a lot of extensions have been made. We have coded SIP classes for the OpenH323 project, so we now gateway among three different VoIP standards: Jingle, H.323 and SIP. We have also added an XMPP Transport as a way of connecting to the XMPP network, so Google Talk users can add phone numbers to their roster. We have written a Google Desktop 3 plugin that Google Talk users can use to access our public services from the desktop. And a lot, a lot more.

            I now also know that Google’s implementation of Jingle Audio is a bit different from the one (JEP-0167) approved by the Jabber community.

            We are currently working hard on a MIDP 2.0 application with Jingle Audio support. Maybe we could port our libJungle to J2ME.

 

 

Dated: 5th of June 2006

         

 

 

  1. Libjingle - Google Talk Voice and P2P Interoperability Library.

  2. Jabber Software Foundation. http://www.jabber.org/

  3. JEP-0167: Jingle Audio Media Description Format.

  4. Real-time Transport Protocol. http://www.faqs.org/rfcs/rfc1889.html

  5. Interactive Connectivity Establishment (ICE): A Methodology for Network Address Translator (NAT) Traversal for Offer/Answer Protocols. http://www.jdrosen.net/papers/draft-ietf-mmusic-ice-07.txt

  6. I know there is a FreeBSD port of Libjingle, but at that time I was playing with Libjingle 0.1.0, which could not run on *BSD without significant modifications.

  7. I patched the source so that it did not use libasound and the like, after which building Libjingle finally worked.

  8. I believe that both Libjingle and Google Talk compute some link quality metrics and select a suitable codec based on them.

  9. Speex is an Open Source/Free Software patent-free audio compression format designed for speech. http://www.speex.org/

  10. OpenH323 and PWLIB Projects. http://www.openh323.org/