VoIP application development: a view from the inside
(originally written in Russian and published on http://www.nag.ru/)
The news that Google had announced Libjingle [1], an open source library for implementing voice applications compatible with its Google Talk IM client, left me a bit skeptical. Many proprietary communication protocols are known nowadays, but who actually cares about them, especially in the open source community?
Nonetheless, as soon as I had some spare time (late February 2006) I decided to give it a try, and I was very surprised. It turned out that the guys from Google, instead of reinventing the wheel as corporate developers usually do, had focused their development on supporting already known, publicly available (and loved) standards. One of them is the Extensible Messaging and Presence Protocol (XMPP), which is widely used for implementing IMs and was extended by Google Labs to support audio streams. The resulting protocol was approved by the Jabber [2] community and made publicly available as JEP-0167 [3]. The other well-known protocol in Libjingle is the good old Real-time Transport Protocol (RTP) [4], used for transmitting and receiving media across the network, which is also the basis of modern IP telephony. RTP support was considerably improved by the Libjingle developers: the STUN and ICE [5] protocols were built in to improve interoperability (“passability”, I would say) through various NAT/firewall configurations, which is very important nowadays. Currently it is rare to find a pair of VoIP devices or applications that can interoperate with each other when each sits behind its own NAT, which in turn sits behind its ISP’s firewall. Needless to say, such a network configuration is the de facto standard today.
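To give a feel for what STUN puts on the wire, here is a minimal Python sketch that builds and parses the 20-byte header of a classic STUN (RFC 3489) Binding Request: a 16-bit message type, a 16-bit body length and a 128-bit transaction ID. The function names are my own illustration and are not taken from Libjingle; a real client would also send the packet to a STUN server and read the mapped address from the response.

```python
import os
import struct

BINDING_REQUEST = 0x0001   # message type defined by RFC 3489
HEADER_FORMAT = "!HH16s"   # type, body length, 128-bit transaction ID


def build_binding_request(transaction_id=None):
    """Return a 20-byte STUN Binding Request that carries no attributes."""
    if transaction_id is None:
        transaction_id = os.urandom(16)  # random ID used to match the response
    return struct.pack(HEADER_FORMAT, BINDING_REQUEST, 0, transaction_id)


def parse_header(packet):
    """Split the first 20 bytes into (message type, body length, transaction ID)."""
    return struct.unpack(HEADER_FORMAT, packet[:20])
```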
After jingling with some of the other libraries Libjingle depends on, I was finally able to build and run Libjingle on a Linux box. An attempt to build Libjingle on FreeBSD (which I prefer) did not succeed because of dependencies on some libraries that FreeBSD lacks [6]. Besides, there are a lot of problems with running unadapted PThreads code on FreeBSD. All I could get from running the ‘local’ and ‘call’ tools [7] from the Libjingle package on FreeBSD were core dumps. Under Linux everything went smoothly, so I could connect to Google’s XMPP server and talk to my colleague’s Google Talk running on his Windows box.
While I was talking to my colleague, the voice quality was excellent, which made me wonder about the codec in use, so I fired up tcpdump, which showed that Google Talk and Libjingle use the PCMU/G.711u codec by default. Sure, this is the best codec to use on local Fast Ethernet networks, but no doubt it would not work (or would not give the best voice quality) on low-bandwidth or poor links because of its high encoding bitrate and greater sensitivity to network fluctuations. Playing with Libjingle and Google Talk a bit more, I discovered that many more codecs are implemented in them, such as GSM and iLBC, but I still could not find the rule used to choose one codec or another [8]. Neither G.729 nor G.723.1 was found in Libjingle, because of their proprietary nature, I believe. So it seems Google prefers not to bother with licensing problems, which is good. Besides, according to Google’s site they are going to support a series of Speex codecs [9], which give better voice quality than G.729 at the same bitrate and are license- and patent-free.
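The bandwidth argument can be made concrete with a little arithmetic. The Python sketch below estimates the IP-level bandwidth of one direction of a call, assuming IPv4 + UDP + RTP headers of 40 bytes per packet and the usual payload sizes for each codec. The packetization intervals are my assumptions, not values measured from Google Talk, and link-layer overhead is ignored.

```python
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP, no header extensions

# codec name -> (payload bytes per packet, packet duration in ms); assumed values
CODECS = {
    "G.711u (PCMU)": (160, 20),
    "GSM 06.10": (33, 20),
    "iLBC (30 ms mode)": (50, 30),
    "G.729": (20, 20),
}


def ip_bandwidth_kbps(payload_bytes, packet_ms):
    """One-way bandwidth in kbit/s, including IP/UDP/RTP headers."""
    packets_per_second = 1000 / packet_ms
    bits_per_packet = (payload_bytes + IP_UDP_RTP_HEADER_BYTES) * 8
    return bits_per_packet * packets_per_second / 1000


if __name__ == "__main__":
    for name, (payload, ms) in CODECS.items():
        print(f"{name:18s} {ip_bandwidth_kbps(payload, ms):5.1f} kbit/s")
```

Under these assumptions G.711 needs about 80 kbit/s in each direction, while GSM, iLBC and G.729 stay in the 24 to 29 kbit/s range, which is why the default codec matters on dial-up and other slow links.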
Playing with Libjingle a bit more, I stumbled upon a simple (as it seemed to me at the time) idea: to make a simple voice gateway between Google Talk and mainstream VoIP. In other words, to accept voice calls using the Jingle Audio protocol and pass them on to H.323 or SIP. It must be said here that working as a system administrator with a number of ISPs and ITSPs in
My bad experience with Libjingle could not prevent me from developing the idea of a Google Talk to VoIP gateway (which gained new features daily). My line of thought was the following: since Google Talk uses open standards like RTP and open audio codecs (G.711, GSM, iLBC), and all of these are already available in the OpenH323 library, why not take OpenH323 as a base and write a bunch of classes that implement Jingle Audio signaling in terms of OpenH323/PWLIB primitives? That way we would get a unified architecture suitable for developing VoIP applications that use both H.323 and Jingle Audio, like the gateway I had in mind. In a week I had coded all the Jingle Audio signaling classes, which was pretty simple. More troublesome was implementing and testing the STUN and ICE stuff (the smell of Cisco’s Indian cuisine can still be felt). An audio gateway also required audio mixer and transcoder classes. These were coded quite quickly too. So, in a month I had a rough but working library I could rely on.
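As an illustration of what an audio mixer has to do, here is a minimal Python sketch that mixes several frames of 16-bit linear PCM by summing the samples and clipping the result. It is not the libJungle code; the function name and the choice of plain summation with saturation are assumptions made for the example.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def mix_frames(frames):
    """Mix equal-length frames of 16-bit linear PCM samples.

    Samples at the same position are summed and the result is clipped
    to the 16-bit range so that loud inputs saturate instead of wrapping.
    """
    if not frames:
        return []
    length = len(frames[0])
    if any(len(frame) != length for frame in frames):
        raise ValueError("all frames must have the same number of samples")
    mixed = []
    for samples in zip(*frames):
        total = sum(samples)
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed
```

Mixing is done on linear samples, so in a gateway every compressed stream would first have to be decoded to linear PCM and the mixed result encoded again for each receiver.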
Field testing of the newly developed library extensions took place right after development. We tested it in many network environments, from Fast Ethernet networks to dial-up modems. We discovered that triple voice transcoding such as Linear->iLBC->G.729->G.711->Linear can degrade voice quality, but the result can still be usable. We tried to terminate voice calls to some ITSPs in
After stress-testing my library (which I code-named libJungle), I added an upper level: a Finite State Machine (FSM) to handle bulk calls and to implement command/event-based call routing. The FSM also made the development process easy and rapid. Thus, the new features that my colleague, Eugeny Korolenko, and I had in mind, like voicemail, voice conferencing, gatewaying to other protocols (SIP), and a billing system, started growing really fast. In March 2006 we announced a publicly available service on the Net called GTalk2VoIP. You can read about it at http://www.gtalk2voip.com/ and test it right from your Google Talk just by inviting the user service@gtalk2voip.com and sending the HELP command in its chat window.
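The command/event idea behind such an FSM can be sketched in a few lines of Python. The states, events and command names below are hypothetical and only illustrate the pattern of a transition table that maps an incoming event to the next state and to a command to execute. The real GTalk2VoIP state machine is not described in this article.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RINGING = auto()
    CONNECTED = auto()
    RELEASED = auto()


class Event(Enum):
    INCOMING_CALL = auto()
    ANSWER = auto()
    HANGUP = auto()


# (current state, event) -> (next state, command to issue); hypothetical names
TRANSITIONS = {
    (State.IDLE, Event.INCOMING_CALL): (State.RINGING, "route_call"),
    (State.RINGING, Event.ANSWER): (State.CONNECTED, "bridge_audio"),
    (State.RINGING, Event.HANGUP): (State.RELEASED, "release_call"),
    (State.CONNECTED, Event.HANGUP): (State.RELEASED, "release_call"),
}


class CallFsm:
    """Tracks one call; each handled event yields the command to run."""

    def __init__(self):
        self.state = State.IDLE
        self.commands = []

    def handle(self, event):
        key = (self.state, event)
        if key not in TRANSITIONS:
            return None  # ignore events that make no sense in this state
        self.state, command = TRANSITIONS[key]
        self.commands.append(command)
        return command
```

With a table like this, adding a feature such as voicemail would mostly mean adding new states and rows rather than rewriting call-handling code, which may be why development on top of an FSM goes quickly.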
We would like to invite people who are interested in further development of our services, or in using them in related open source and commercial projects, to join us. If you are interested, please email us. Development of voice services using the FSM can be really fast.
Ruslan Zalata, team@gtalk2voip.com
March 2006.
PS:
Originally, I mean at the time of writing this article, our gateway could do only JingleAudio->H.323 gatewaying, which was used mostly for making calls to the PSTN. But as time passed, a lot of extensions were made. We have coded SIP classes for the OpenH323 project, so now we do gatewaying among three different VoIP standards: Jingle, H.323 and SIP. We have also added an XMPP Transport as a way of connecting to the XMPP network, so Google Talk users can add phone numbers to their roster. We have written a Google Desktop 3 plugin which Google Talk users can use to access our public services from the desktop. And a lot, lot more.
Now I also know that Google’s implementation of Jingle Audio is a bit different from the one (JEP-0167) approved by the Jabber community.
We are currently working hard on a MIDP 2.0 application with Jingle Audio support. Maybe we could port our libJungle to J2ME.
Dated: 5th of June 2006
http://www.jabber.org/jeps/jep-0167.html
Address Translator (NAT) Traversal for Offer/Answer Protocols.
http://www.jdrosen.net/papers/draft-ietf-mmusic-ice-07.txt