Coping with online conferencing
Imagine it's around 8pm and you and a group of other people have to give a presentation tomorrow. There is no time for an in-person meeting, and you need to discuss and check around 30 slides together. Everyone joins a Skype conference call, and you get through two slides before half the participants start getting dropped from the call, again and again. Odd, since everyone is at home on their own broadband connection.
OK, no problem: your university has an online meeting service that lets you share your desktop and supports voice. Now one member of your group suffers about 50% packet loss and can't participate. OK, you're inventive: you call that person on your cell phone and put them on speakerphone so they can join that way. At times this produces horrible echo for everyone involved, and the voice drops out every so often for the other participants, too, but at least it doesn't take as long as typing each individual issue into a chat and having everyone wait for replies.¹ So why is this still a problem in 2009?
Echo cancellation
The advent of mostly sufficient echo cancellation in many VoIP products has led many users to rely on built-in microphones and speakers rather than a dedicated headset. This works well in the common scenario: a single VoIP application that is the only thing using the audio devices and that handles all multi-participant issues itself.
Once you combine several audio-capable services at the same time, though, they get in each other's way. Unless your audio solution includes gateways to other services, you are pretty much out of luck if you want to bridge users who do not share a lowest common denominator. Skype has an advantage here because, at least for paying customers, it can also interface with the POTS (plain old telephone service) network. SIP providers would be in a similarly favorable position, but their abysmal support for conferencing rules that option out.
Clients
The average end user usually encounters one of two options for online meetings: Skype or a Flash-based browser solution. A wide array of messenger and other client-based solutions exists, but none comes anywhere close to those two in adoption, at least on the campuses I've seen.
Because Flash is a limited platform, most providers use its ability to record audio and video (with the user's authorization) to send the streams to a central conference server, which then multiplexes the streams of all participants and redistributes them to everyone. This also helps users with limited upstream bandwidth (at least in the absence of IPv6 multicast): they only have to send one stream rather than one for each other participant. This issue is becoming less and less important (at least for voice), since many broadband providers in the US and Europe now offer 1 Mbit/s or more of upstream bandwidth, enough for around 60 participants at 16 kbit/s per voice stream.
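The arithmetic behind "around 60 participants" can be sketched as follows. This is a rough estimate under stated assumptions: without a central server or multicast, each participant sends a separate copy of their voice stream to every other participant (a full mesh); 1 Mbit/s is counted as 1000 kbit/s; and packet and codec overhead are ignored. The function names are hypothetical, chosen only for illustration.

```python
# Rough upstream budget for voice conferencing, using the figures above.
# Assumptions: 1 Mbit/s = 1000 kbit/s upstream, 16 kbit/s per voice stream,
# no packet/codec header overhead, no multicast.
UPSTREAM_KBPS = 1000
VOICE_KBPS = 16


def upstream_needed_kbps(participants: int, central_server: bool) -> int:
    """Upstream bandwidth one participant needs to send their own voice."""
    if central_server:
        # A single stream to the conference server, which redistributes it.
        return VOICE_KBPS
    # Full mesh: one copy of the stream for every *other* participant.
    return VOICE_KBPS * (participants - 1)


def max_mesh_participants(upstream_kbps: int, stream_kbps: int) -> int:
    """Largest full-mesh conference (including yourself) the uplink sustains."""
    return upstream_kbps // stream_kbps + 1


print(upstream_needed_kbps(30, central_server=False))   # 464
print(upstream_needed_kbps(30, central_server=True))    # 16
print(max_mesh_participants(UPSTREAM_KBPS, VOICE_KBPS)) # 63
```

With 1 Mbit/s up, the uplink can carry 62 outgoing streams, i.e. a 63-person call. In practice IP/UDP/RTP headers add substantial overhead at such low bitrates, so the real limit would be lower than this idealized figure.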
This scheme is, however, vulnerable to degradation in several scenarios. The inefficiency becomes clear with a group of users who are close to each other on the network (say 20ms apart) but use a service whose server is on average 150ms away from each of them: every utterance travels 150ms to the server and another 150ms back out, so every participant hears everyone else with a 300ms delay instead of 20ms, a lag that is definitely noticeable. Even with a provider whose infrastructure is closer to you, any number of issues, such as network congestion or server load, can make the dependence on a central server worse.
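The delay penalty in this example can be made explicit with a small sketch. It assumes the figures above are one-way network delays, that the server adds no processing time unless told otherwise, and it ignores codec, jitter-buffer and playout delays; the helper name is hypothetical.

```python
# One-way network delay between two participants, in milliseconds,
# using the example figures from the text.
DIRECT_MS = 20      # participants are close to each other on the network
TO_SERVER_MS = 150  # average one-way delay from each participant to the server


def relayed_delay_ms(speaker_to_server: int, server_to_listener: int,
                     server_processing: int = 0) -> int:
    """Delay when audio is relayed (and possibly mixed) by a central server."""
    return speaker_to_server + server_processing + server_to_listener


direct = DIRECT_MS
relayed = relayed_delay_ms(TO_SERVER_MS, TO_SERVER_MS)
print(f"direct: {direct} ms, via server: {relayed} ms, "
      f"penalty: {relayed - direct} ms")
# direct: 20 ms, via server: 300 ms, penalty: 280 ms
```

For comparison, ITU-T Recommendation G.114 treats one-way delays of up to about 150 ms as acceptable for most interactive voice applications, so the relayed path in this example is well beyond that guideline even before any server load is added.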
To P or not 2 P
Skype, installed on millions of machines, offers easy conferencing, but even a moderate amount of load can make group collaboration impossible. The service level varies immensely: some days it is hard to distinguish from expensive dedicated solutions, and on others you feel like Dr Arroway listening at Arecibo. The minor issues of two-party calls seem to grow exponentially with the number of participants. From what little I know about the Skype infrastructure, I would even expect this.
Although the Skype network does not rely on a central architecture and is thus a P2P network, it still needs lots of 'super nodes' to make NAT traversal work reliably at all times. The Baset and Schulzrinne paper is a bit dated but gives a good overview of how Skype 'just works' and how this hierarchical structure can introduce inefficiencies compared to direct connections. (I have to admit that I don't know whether any of the conference participants or 'super nodes' actually multiplexes the streams, or whether each user receives every individual stream when using Skype. My guess is the latter.)
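To illustrate what the two possibilities would mean for a single listener, here is a small sketch comparing a mixed stream with individually forwarded streams. It does not describe how Skype actually behaves; the 16 kbit/s per-stream figure (headers ignored) and the helper names are assumptions for illustration only.

```python
VOICE_KBPS = 16  # assumed per-stream voice bitrate, ignoring header overhead


def downstream_streams(participants: int, mixed: bool) -> int:
    """Number of voice streams a single participant receives."""
    if mixed:
        # Some node mixes everyone else's audio into one stream per listener.
        return 1
    # No mixing: every other participant's stream arrives separately.
    return participants - 1


def downstream_kbps(participants: int, mixed: bool) -> int:
    """Downstream voice bandwidth for one listener, in kbit/s."""
    return downstream_streams(participants, mixed) * VOICE_KBPS


for n in (3, 5, 10):
    print(n, downstream_kbps(n, mixed=True), downstream_kbps(n, mixed=False))
# 3 16 32
# 5 16 64
# 10 16 144
```

Without mixing, every additional participant adds another stream that can be delayed or lost independently on its way through the network, which would be consistent with problems growing with group size, although this sketch only counts streams and says nothing about actual loss rates.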
In my ideal world we would all have IPv6 and symmetric connections by now, and we would connect directly to each other's machines over the shortest possible route, without intermediaries (especially on our smartphones!). Everyone would use a SIP client, and all of them would come with ZRTP enabled by default. Even with all that, we might still run into problems with providers who don't invest in infrastructure or blatantly disregard net neutrality.
So, in conclusion, there really isn't a promising solution that overcomes even most of the common problems of isolated services running over insufficient and unreliable networks. Back to in-person meetings for now.
This post is also a response to "Do online-seminars & -conferences work?" (in German).
1 Pretty much exactly what happened to four of us last week.

