
Extending Janus #webrtc bandwidth Management


 

Main coders and architects: Sergio Murillo, Lorenzo Miniero.
Facilitator, motivator, and secretary: Dr Alex Gouaillard.

Janus bandwidth management has been incrementally updated to support the latest technologies available, in a joint effort between CoSMo and Meetecho, a.k.a. the WebRTC A-Team. This article describes the original design of Janus and its VideoRoom plugin with respect to bandwidth management, and the incremental changes that were needed to bring it to automatic bandwidth estimation and adaptation on the sender side, and to make simulcast available for bandwidth management on the receiver side. A concrete example of how to leverage simulcast with the Janus VideoRoom plugin is provided for illustration and testing purposes.

— old time Janus —

A Janus media server with the VideoRoom plugin (hereafter "Janus server") acts as an SFU and, by design, provides feedback to the client software at the application level (the "slow link" events) about bandwidth-related problems. This works for both the sender and the receiver side. When you receive such an event from a Janus server, it means either that Janus is having trouble receiving your packets (publisher) or that Janus noticed many incoming NACKs / loss notifications (viewers). You can differentiate between the two scenarios thanks to a flag in the event, but since all Peer Connections are unidirectional when using the VideoRoom plugin, it should be obvious anyway.

You can then send a request back to slow publishers down, i.e., force a lower REMB. How you compute the new requested bandwidth value is up to you (e.g., halve the current bitrate, or cut it by a third). Usually you are more aggressive when ramping down than when ramping up, because the consequences of getting it wrong are not symmetric: if you set the requested bandwidth too high, you will starve the connection and get errors (a huge impact on UX), while if you set it too low, you just get a lower quality / resolution stream, but without glitches or freezes.

Ideally you'll want to ramp it back up again slowly once everything has been fine (no slow link events) for a while (stable). Again, finding the right balance between when and how fast you ramp down, and when and how fast you ramp up, is more art than science. If it is unbalanced or unstable, temporary issues will cripple the bitrate forever. That is, among a myriad of other reasons, why the original design of the Janus server delegated the responsibility of bandwidth management to the application, which more often than not knows its environment well enough to make the right choice.
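As an illustration, here is a minimal sketch of such application-level logic on top of the janus.js API; the handle name publisherHandle, the 30-second quiet period and the ramping factors are assumptions chosen for the example, not recommendations. A VideoRoom "configure" request carrying a "bitrate" field is what translates into a lower REMB towards the publisher.

    // Illustrative sketch only: publisherHandle, the timers and the ramping
    // factors are assumptions for the example, not part of the Janus API.
    var currentBitrate = 1024 * 1024;   // start by allowing ~1 Mbps
    var rampUpTimer = null;

    function onSlowLink() {             // call this from the plugin handle's slowLink callback
      currentBitrate = Math.floor(currentBitrate / 2);   // ramp down aggressively
      publisherHandle.send({ message: { request: "configure", bitrate: currentBitrate } });
      clearTimeout(rampUpTimer);
      rampUpTimer = setTimeout(rampUp, 30000);           // only ramp up after a quiet period
    }

    function rampUp() {                 // ramp up slowly while things stay stable
      currentBitrate = Math.min(Math.floor(currentBitrate * 1.25), 1024 * 1024);
      publisherHandle.send({ message: { request: "configure", bitrate: currentBitrate } });
      rampUpTimer = setTimeout(rampUp, 30000);
    }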

Unfortunately, in many other cases, app developers prefer to delegate this to the browser (on sender side), and/or would like to have more options on the receiver side.

— Receiving-side semi-automatic bandwidth management through simulcast —

Simulcast is another bandwidth management option possible today with webrtc, mainly targeting the receiver side. It consists in sending several video tracks from the same source, at different resolutions. It increases bandwidth and CPU usage on the sender side, but gives the Janus server the flexibility to choose which one to relay to each receiver, and thus how much bandwidth will be used on the receiver side.

The media server needs to know how the tracks are related, and what the bandwidth logic should be. For example, regarding the relation between tracks: if track A is HD, track B is HD/4 (each dimension halved), and track C is HD/16, the bandwidth logic could be the following (sketched in code right after the list):

  • supposing bandwidth at start can accommodate HD, start by sending A to the remote peer, and drop B and C;
  • if at any given time the available bandwidth goes down, switch and send B,
  • and so on and so forth.
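A sketch of that selection logic, with made-up bitrate thresholds purely for illustration (a real SFU would derive them from the announced encodings):

    // Pick which simulcast track to relay given the estimated receiver bandwidth.
    // The thresholds below are illustrative numbers, not Janus defaults.
    function pickTrack(availableKbps) {
      if (availableKbps > 2500) return "A";   // full HD track
      if (availableKbps > 700)  return "B";   // HD/4 track
      return "C";                             // HD/16 track
    }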

The relation between the streams first needs to be announced by the sender to the media server through signalling. The SFU can then check specific identifiers on the wire associated with each video track to know the relationship between them (i.e. which one is the high resolution, and which ones are the lower resolution versions of the same original track). The track selection and relaying logic needs to be implemented in the media server. Switching between tracks to adapt to the available receiver-side bandwidth can then be automated in the SFU itself, or triggered by the receiving side through an API.

Around mid 2017, the Meetecho team added early simulcast support to the Janus server. Even though Janus typically doesn't support multiple streams of the same kind in the same PeerConnection (that is, neither Plan B nor Unified Plan), with simulcast the multiple video resolutions are sent as separate video tracks in the same peer connection, thus partially overcoming the original single-stream design of Janus. The relationship between the tracks is signalled differently by Chrome and Firefox, although in both cases the tracks are identified through their SSRC. When simulcast was first added, the SFU bandwidth management logic was semi-automatic and triggered from the receiver side: if the stream the receiver wants is available, Janus forwards it. Otherwise (e.g. when the publisher has low bandwidth and is only sending some of the streams, not all), Janus switches to the next available stream of lower quality.

As far as sender-side bandwidth management of the simulcast tracks is concerned, the Janus server only supported RTCP for the base stream (the "first" SSRC), and would ignore the other streams that might be there. This was due to the single-stream design of Janus, and prevented the implementation of automatic sender-side bandwidth estimation and management in conjunction with simulcast. This was then fixed, and support for RTCP feedback on the additional simulcast tracks was implemented shortly after that. That alone was not enough, though: sender-side automated bandwidth estimation and management was not implemented as part of that effort, which meant more work was needed to accommodate it too.

— Sender-side automatic bandwidth estimation and management (a.k.a. Congestion Control) —

There are multiple ways to manage the bandwidth on the sender side. Most of the time the adaptation itself is automated; only the estimation of the bandwidth differs. You will sometimes find this referred to as congestion control. Three options co-exist today among browsers: REMB, TMMBR/TMMBN and transport-wide-cc. Long story short, transport-wide-cc is the latest and best available today across most browsers. It uses RTP extensions and RTCP feedback packets to pass the information that allows the browsers to compute a bandwidth estimate. The beauty of it is that it is fully automatic and you have nothing to do to enjoy it. That is, apart from implementing support for the corresponding RTP extensions, RTCP packets and their content.
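Concretely, "negotiated" means the right attributes show up in the SDP. As a hedged illustration (an application-side check, not something Janus requires you to do), one can verify that both the transport-wide sequence number extension and the transport-cc RTCP feedback were offered:

    // Rough application-side check that transport-wide-cc was negotiated.
    // The attribute values are the ones defined by the transport-wide-cc draft.
    function usesTransportWideCC(sdp) {
      var hasExtension = sdp.indexOf("transport-wide-cc-extensions") !== -1; // a=extmap ... draft-holmer-rmcat-transport-wide-cc-extensions-01
      var hasFeedback  = /a=rtcp-fb:\d+ transport-cc/.test(sdp);             // RTCP feedback message
      return hasExtension && hasFeedback;
    }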

Following the enabling contribution from Lorenzo, Sergio Murillo then contributed the missing pieces to support transport-wide-cc in January.

Janus has a modular design, and can be used in a multitude of configurations. Not all of them would benefit from automatic sender-side bandwidth estimation and management. Moreover, it is always bad for backward compatibility to enable by default an option that modifies the behavior of a server. As a result, it was decided that transport-cc would be negotiated (as all other features are), and that each plugin would have a say in deciding whether to enable it or not, while keeping it off by default. To keep things simple and still have a realistic testbed for the new feature, it was only added to the VideoRoom plugin, so that its effectiveness could be evaluated in scenarios involving multiple publishers and subscribers.

— Play Time is NEVER OVER —

To be able to switch between streams on the receiver side, you first need to enable simulcast on the sender side (publishers). You just need to send a simulcast SDP for that. Then, on the receiver side, you send a configure request to the Janus server indicating your layer of interest (note that the term layer usually applies to SVC codecs, not to simulcast, and we knowingly abuse it here).

You can use the default VideoRoom demo as a reference implementation. Add "?simulcast=true" as a query string to the demo URI to enable simulcast. When simulcast streams are detected, the JS code shows buttons to change the layer. In the videoroomtest.js file, at line #649, one can check the "configure" request. There are two different payload fields, temporal and substream, which map to temporal and spatial scalability respectively. They are not mutually exclusive and can both appear in a single request:

{ request: "configure", substream: 1, temporal: 1 }
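In the demo, this payload is sent through the subscriber's plugin handle obtained from janus.js; a minimal sketch, assuming that handle is named remoteFeed as in the demo code:

    // Ask Janus to relay the middle spatial stream and temporal layer 1
    // for this subscriber; remoteFeed is the VideoRoom subscriber handle.
    remoteFeed.send({ message: { request: "configure", substream: 1, temporal: 1 } });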

By default, the demo and the Google implementation in Chrome use 3 spatial layers and 2 temporal layers.

The Janus VideoRoom plugin will automatically drop to a lower resolution if the requested layer is not available.

In the demo, the layer change is purely manual, but one could (and probably should) connect it to a network event. Basically, the callback for the "slow network" event might be configured to trigger a layer change, modulo a minimum delay between changes to prevent switching too often. The logic for deciding when the receiver should send an event to ramp bandwidth usage back up is left to the application.
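A hedged sketch of what that could look like, where the 10-second cooldown and the substream order are arbitrary choices for the example:

    // Downgrade the requested substream on a "slow network" event,
    // at most once per cooldown window to avoid flapping.
    var substream = 2;            // start with the highest spatial stream
    var lastChange = 0;
    var COOLDOWN_MS = 10000;

    function onReceiverSlowLink() {
      var now = Date.now();
      if (now - lastChange < COOLDOWN_MS || substream === 0) return;
      substream = substream - 1;
      lastChange = now;
      remoteFeed.send({ message: { request: "configure", substream: substream } });
    }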

About CoSMo

CoSMo was founded in 2015 by WebRTC experts with a goal to develop tools that make WebRTC easier to use, and to help businesses adopt it. Contributing to WebRTC code since the early days of 2012, leader of the webrtc-in-webkit project, invited expert to the WebRTC working groups, co-chair of the IMTC WebRTC interoperability group, speakers and chairs at several RTC conferences, the CoSMo team has Real-Time Communications in its DNA, from standard to implementation. They provide a wide range of technical expertise on WebRTC, with a focus on system level design, technical due diligence, and customisation/integration of the WebRTC stack. Today, CoSMo is a team of almost 30 individuals, working across four countries and two continents. More information at www.cosmosoftware.io

About Meetecho

Meetecho was born in 2009 as an official academic spin-off of the University of Napoli Federico II. Since day one they've been working hard on real-time multimedia applications over the Internet, ranging from VoIP to more advanced applications built on top of the emerging WebRTC technology. Today's Meetecho team is composed of world-class experts in Real-Time Communication, proud authors of the Janus® WebRTC server! They provide design and implementation consulting services for WebRTC products on top of Janus®, ad-hoc solutions for streaming live events to the world with remote participation, as well as ready-to-use web-based conferencing and collaboration services. Their website lists the major companies already trusting them.

About the WebRTC A-Team

10 years ago, a crack student unit was sent to PhD by a European court for WebRTC, a technology they didn’t write. These men promptly graduated from maximum security universities and vanished into the Napoli and Singapore Undergrounds. Today, still wanted by their government-paid past teachers, they survive as WebRTC consultants. If you have a WebRTC problem, if no one else can help, and if you can find them, maybe you can hire… the WebRTC A-Team.

 


IETF’101 “Identity Assessment in #Webrtc” Hackathon


Security in Real-Time Comms.

Security is important for communication, and in the wake of XXXX (pick your favorite) revelations, the IETF RTCWEB working group and other standards committees alike decided to up their game. With respect to webrtc, that is for example when the decision was made to mandate the more secure DTLS-SRTP over SDES-SRTP. The entire security architecture is documented in a dedicated document and its dependencies:
https://tools.ietf.org/html/draft-ietf-rtcweb-security-arch-14

Encryption is only really useful if it is end-to-end and if you are sure who you are talking to.

End-To-End Encryption

In the default p2p case, WebRTC is end-to-end encrypted, but when there is a media server in the loop, it is not. Upon reception of a stream by the media server, the encryption is dropped (the DTLS is "terminated"), and later on another encrypted connection is made between the media server and the remote peer. The IETF PERC working group has been working on this problem for some time now, but it is not the subject of this blog post. People interested can look at the ongoing work there, or read some articles about a slightly different approach, with implementations in the Janus, Jitsi and Medooze webrtc media servers available online.

This blog post is about the latter problem of identity.

Identity Verification

Multiple Identity Verification Services

Arriving at the conclusion that there would be no serious security model without an identity verification mechanism, the W3C WebRTC working group decided to add a mechanism to support identity providers. Yes, providers with an 's', as no single mechanism was going to emerge as the global standard. OAuth is a very well supported mechanism, usable with Facebook, Google, and other identity providers, but other mechanisms will be used as well, and the working group couldn't find a compelling reason to restrict WebRTC to only one.

More details can be found in the WebRTC identity JS APIs in the browsers, and in sections 5.6 and 6.4 of the RTCWEB security architecture document, and are illustrated in the slides from Cisco's Cullen Jennings.
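For reference, the application-facing side of those APIs looks roughly like the sketch below; the provider domain, protocol name and username are placeholders, and the exact option names may differ between spec revisions and Firefox versions.

    // Sketch of the WebRTC identity JS APIs (Firefox only at the time of writing).
    // The browser fetches the IdP proxy script from the provider's well-known location.
    var pc = new RTCPeerConnection();
    pc.setIdentityProvider("idp.example.com", { protocol: "default", usernameHint: "alice@example.com" });

    // On the remote side, the validated identity is exposed as a promise.
    pc.peerIdentity
      .then(function (identity) { console.log("verified peer:", identity.name, "via", identity.idp); })
      .catch(function (err) { console.log("identity validation failed:", err); });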

No WebRTC Identity Service yet!

Practically, and surprisingly, only a few browsers have implemented these APIs (actually, only Firefox), and there is no in-production implementation of an identity provider, as defined in the standard document, that exists today (as far as the author knows). A few mockups of such a service were implemented by Firefox for testing purposes, but they were far from complete: their goal was just to exercise the APIs, not to be used in a production environment against a real identity service.

First WebRTC Identity Verification Service Challenge

A small group of enthusiasts (including yours truly) decided to use the IETF'101 meeting hackathon time to try to produce the first implementation of a viable identity provider and to test the corresponding implementation and APIs in Firefox, the only browser that implements them today. By doing so, they surrendered otherwise tremendously enjoyable sightseeing time under the blistering cold weather of London in March, only to suffer being provided with free, unexpectedly great food and beverages by the Hilton hotel services. One has to recognize their selfless contribution to the IETF community here.

First WebRTC Identity Verification Implementation

All the code generated during the hackathon is public.

The CoSMo example and Google's example are both functional. The Google example is self-contained in that repository, and should be the starting point. Happy hacking!

What’s next.

As expected, the hackathon revealed a couple of bugs in the Firefox implementation. These are already being worked on, see bug 1446880. Lots of discussions are happening right now on the IETF RTCWEB mailing list to increase the security of the mechanism.

Participants in alphabetical order

H. Alvestrand (1), A. Gouaillard (2),T. Hollebeek (3), C. Jennings (4), S. Murillo (2), N. Ohlmeier (5), M. Thompson (5),
(1) Google (hackathon champion), (2) CoSMo Software, (3) DigiCert, (4) Cisco, (5) Mozilla

 

Janus #webrtc Video Room is getting a collection of client software options from CoSMo


CoSMo and Meetecho have been working together for some time as the #webrtc A-Team. So far the contributions described in different blog posts have mainly been on the server side, with double encryption, VP9 SVC, or more recently better bandwidth management support. This time, we are going to talk about several client software options to connect to Janus instances that have just been made available.

Janus WebRTC Server

The Janus WebRTC server, especially when equipped with its "video room" plugin, is very popular in the ecosystem. Janus does not reach the capacity of Jitsi Video Bridge for traditional video conferencing use cases. However, the versatility of the server is impressive:
– through its multiple plugins, it can add different behaviours and logic, from SFU to MCU, gateway, recording, …
– it does not impose any signalling on your application, and it actually implements support for quite a few signalling transport protocols (REST, WebSockets, MQTT, …) out of the box,
– thanks to a C code base, it scales down! Like, really! You can have a full server running on a Raspberry Pi, and many other IoT options. NTT Communications has integrated Janus into their IoT SDK!

Client / Server – the WebRTC A-Team Genesis

Very early on, it appeared that Janus was lacking some love on the client side. Clients that would be drawn to Janus because of its versatility would expect the same kind of versatility on the client side. Hello, we would like to have a client
– that supports double encryption.
– that supports Internet Explorer on Windows 7.
– that supports VP9 SVC, or AV1.
– that supports gaming capture cards with resolutions of 4k @ 60fps.
… and so on and so forth.

All those requests have in common that they require modifications to the C++ internals of libwebrtc, of which CoSMo is not only a recognised expert, but also one of the very, very few around. They also have in common that they are only possible in native clients or modified browsers. Finally, the more client types you add, the more difficult testing and validating them against each other becomes. That is, until CoSMo released KITE.

Collaboration was obviously the right move. The A-Team was born.

New clients options for Janus Video Room

Qt-based

Lots of readers of this blog might not know Qt. Qt is a fantastic UI library written fully in C++, and distributed under an acceptably permissive license (LGPL). For those who want a single language family (Assembly / C / Obj-C / C++) for all their code, this is a rational choice.

Qt was one of those libraries at the cutting edge of C++, and many features it implemented ended up in the C++ standard, just like with Boost. Actually, libwebrtc uses the exact same concept (signals/slots) in an underlying library called sigslot. Unfortunately, it's so close a copy that it makes integrating Qt with libwebrtc complicated, unless you use the pimpl design pattern, which was introduced by … Qt, but that's a discussion for another time.

Qt has the advantage of compiling across desktop and mobile applications, with the provided Qt Creator software. It also has a great visual GUI editor.

But the main advantage of Qt, originally developed by Nokia for their phones, is that it runs on most embedded devices and IoT devices. There again, a perfect fit for Janus.

Qt nowadays comes with an extension, called QML, that supports JavaScript syntax. As a result, porting the Janus Video Room demo HTML5 app to Qt was made easier, as all the JS signalling library code could be directly reused. You very easily end up with a native desktop app, and with a little bit more work on the libwebrtc side (*), a native mobile app as well.

More details can be found on last year's IIT-RTC tutorial page.

If you need support or development on Qt with WebRTC in general, or Janus in particular, contact us.

OBS-Studio

OBS-Studio is what 99% of the people streaming to YouTube, Twitch, Microsoft's Mixer, DailyMotion, and many other flash-based streaming services use.

We released a version that also supports WebRTC (media, encryption, RTP), and demonstrated it by adding support for streaming to a Janus Video Room plugin. In addition to WebRTC support, this required adding WebSocket support for the signalling transport, and support for the Janus Video Room signalling API.

More details can be found on the project page.

If you need support or development on OBS-Studio with WebRTC in general, or Janus in particular, contact us.

Electron-based

No need to present Electron. It's one of the most used alternatives for wrapping an HTML5 app into a native app. It also allows one to shield oneself against the very fast update cycles of Chrome, or to maintain modified versions of Chrome or node.js to differentiate.

Highfive, for example, has been enjoying a version of libwebrtc with H264 simulcast for more than a year and a half. While they have provided a patch to Google, it is still in review after more than a year, and had they not been using Electron, they would still be waiting.

Slack mentioned that they moved to Electron for their native app after finding it too hard to manage libwebrtc standalone. Indeed, Electron pre-compiles libwebrtc and Chrome into something they call libchromiumcontent, which is downloaded on the fly for you. If you do not have any modifications to libwebrtc, this is indeed faster. For CoSMo, that last point is less desirable. First, compiling and maintaining libwebrtc is not a problem for us, we do that daily. Then, we differentiate by adding features to libwebrtc (double encryption, new codecs, simulcast for H264, …). Or simply to have the critical-webrtc-bug-free Chrome 61 in Electron 1.8.2, which would otherwise use Chrome 59.

If you need support or development on Electron with WebRTC in general, or Janus in particular, contact us.

Conclusion

We hope that our contributions on the client side will help the Janus community grow, and beyond that, help more people use WebRTC in their audio and video applications.

Appendix – CoSMo and #WebRTC Servers

CoSMo has its own WebRTC media server toolkit, called Medooze, with corresponding SFU and MCU implementations. We are also experts in Jitsi Video Bridge based solutions, used in production today by companies like Symphony. We routinely redefine the limits of what is possible with WebRTC, and publish the corresponding results both in scientific peer-reviewed publications and at standards committees.

The above Qt library and client, as well as the Electron builds, exist in at least three flavours: one that connects to the Janus video room, one that connects to JVB, and one that connects to Medooze-based media servers. OBS-Studio has a drop-down menu that lets you choose which service or server you want to connect to.

You will be able to hear more about it at CommConUK in June 2018.

Living the #WebRTC start-up CEO dream.


I used to be very entertained by the stories told by different webrtc consultants. The "Cat roulette" example by L. Barr last year during a conference in Miami was exhilarating. I was thinking: those guys have all the fun!

What people don't tell you when you go into that business is the wide variety of clients you end up facing. Now that I am on the other side of the fence, the grass is not exactly greener. In an effort to give people a glimpse of how hard it can sometimes be, and to keep my mental sanity by letting it out, I'm writing a post about the horror stories we've faced so far, or, to put it in a more diplomatic way: about the inherent bias clients can have about our perceived value, or WebRTC's perceived value.

Client is king

"Client is king". Except that being the client in a contractual relationship does not make one more knowledgeable, or more capable, on a given subject.

When you are in the consulting business, most of the time people come to you for expertise, but they keep that "client is king" attitude and expect you, because you're the expert, to do the impossible. It's very well known, and there are even some educational / caricatural videos about it. Here is my favourite.

"Hey, it's very important that our product support <put-your-impossible-feature-here> because our clients keep asking for it. Add it to the roadmap without additional cost for us. After all, we're paying you for that and it should have been there in the first place, since we told you it was important before."

Yes, I heard you, and I recognise your expertise but I will still go with X

I think that the blog post Tsahi wrote about his frustration at seeing a client choose peerJS is also spot on about a pain shared among a lot of consultants: seeing a client that does not know better not only not following your expert advice, but also going completely against common sense … when anyone with the minimum technical capacity can see, e.g., that a GitHub repository with webrtc code that has not been updated for even 6 months should be considered dead.

"Expert: I attended the standard committee with the representatives from MS, Goog, Moz and Apple, and they all said it will not work. I attended the bi-weekly meeting with the Jitsi team and asked them why they would not do it, and they said: 'we tried, the quality was underwhelming and not at the level we expect for an industrial product, we had too much support work to do to manage client expectations, so we just decided not to support it.' So I guess there is clear consensus from the field's world experts that it is not something you want to do."
"Client: We're not convinced. Can we have access to the code of Jitsi when they were trying that? Can you help us test it, because we don't have the capacity to test it either."

Open-source does not make you smarter or more capable.

Truth is: open source software gives you the opportunity to modify the source code yourself, but it does not give you the capacity to do so.

  • If you are not familiar with the code base,
  • if you do not know the specifications they are trying to implement,
  • if you do not know how to test your modifications,

    don’t even try, you’re wasting your time.

I keep buying Ikea furniture for my mom. I keep assembling it for her. The fact that the furniture comes with the plans, and all the tools, does not make my mom more capable of building it by herself.

“How complicated can it be: webrtc is open source, I will do it myself.”
one week later:
“discuss-webrtc: where is the make file for the webrtc library?”
two weeks later:
“discuss-webrtc: why is it taking so long to fetch, and then so long to compile, every time I make a modification I need to wait 4 hours.”
three weeks later:
"how do I modify the build.GN file to add my own code? What are the args I need to pass to 'gn gen'? Where is my dll when I build a shared lib?"
<cheat and just fork and hack libwebrtc directly>
5 weeks later:
“I have a project with a forked version of WebRTC 59, it does not work against chrome anymore. How do I rebase?”
6 weeks later:
"F*** it, let's go back to our consultant, they were fast and apparently knowledgeable, since we never bumped into all these problems with them."

Reality-check

Unfortunately, when faced with clients that unreasonably perceive what you do as easy and cheap, you can only fall back on a reality check: go ahead, put your money where your mouth is, try, and when you have failed, I'll be here to help you.

We feel terrible to have to say that, and very often we are perceived as arrogant a**holes, but first, we're French, so we're used to that reaction, and second, when someone asks you for a Ferrari at the price of a Fiat Punto because both are cars, there is only so much you can do.

I recently had a client that asked us to do something quite complicated, namely to add WebRTC support to a flash streaming software package that was otherwise open source. At one point, their position became: it's open source, you are charging us too much, we can do it ourselves.

Anybody that has tried to deal with libWebRTC themselves is starting to giggle at this point and securing some popcorn for the long story to follow. The tech lead was a great JS software engineer who had never done any media coding, any network coding, and had barely done any C++ in his life. Hey, it's software, how difficult can it be, right? Here I mean to stress that those decisions are not the exclusive domain of non-technical persons.

Long story short (see above), one month and a half later, they accepted our original quote for the work. That's one month and a half that could have been saved on their side, plus an additional month because my resources had been allocated elsewhere in the meantime, and they had to wait for them to become available again.

Conclusion

The knowledgeable WebRTC consultants out there, those with track records of contracts with the core WebRTC companies, who participate in the evolution of the technology and are trusted enough by browser vendors and other webrtc open-source projects alike to contribute code back, are few and far between. I'm not sure I would need two hands to count them.

If you have the chance to be able to secure consulting or development time from one of them, well, as Steve Jobs once said: "It doesn't make sense to hire smart people and then tell them what to do; we hire smart people so they can tell us what to do."

Moreover, they are part of a long-standing small club that meets often on the side of tech conferences, mainly to discuss technical points and push back further the definition of what is possible, but also to exchange stories and feelings about who they are happy to work with, and who they are not and might best be avoided.

#Webrtc Codec vs Media Engines: Implementation Status and why you should care.


Which codecs, and which flavours of codecs, are supported by which browsers? This is a tricky question for every product manager who wants to define product expectations and a roadmap. Should I support only VP8 / H264? Should I wait for VP9? What are the multi-stream, simulcast, and SVC versions of those codecs? Behind all those questions really lies: when can I tell my customers I support desktop browsers, Android, iOS? If I can't, should I ship a native app instead? This blog post aims at putting everything back into perspective.

I. Codecs and bandwidth

When speaking about codecs, traditionally people speak mainly about quality vs bandwidth footprint, and elide any reference to networks and/or CPU footprint. When speaking about encoding of pre-recorded media, that makes a lot of sense.

First, you have all the time in the world to encode your content, so all optimisations, however costly in CPU and in time, are fair game. Most of the best encoders use a multi-pass algorithm:

  1. pass #1: cut the video in homogeneous chunks (same background, fast or slow scene, …)
  2. pass #2: compute some statistics to help the encoder later, as well as some inter-frame values (those require several frames to be computed).
  3. pass #3: encode each chunk separately, with the stats from pass #2

For real-time streaming, obviously, you will not wait until the end of the conference, or the show, to start streaming it. It's the "live" aspect, and the interaction, that brings value to a real-time stream, more than the quality itself.

Then, if distributed on traditional physical media like DVD or BR discs, your size budget is fixed and cannot be extended, so you want the best quality for a fixed size. If distributed over the network (streamed in the Netflix/YouTube sense), your main cost, and the main bottleneck for viewers, is the bandwidth usage, so you want the best quality possible for a given bandwidth (when bandwidth is limited), and the least bandwidth possible for a given "quality" (where quality loosely refers to resolution, e.g. full HD, 4K, …) to be able to reduce the cost. All the glitches caused by the network side of things can be taken care of with intelligent buffering.

For real-time streaming, there cannot be a physical storage format. Moreover, when streaming real-time content (as with YouTube Live), since interaction is what matters most, any kind of buffering, with its added latency, would be unacceptable.

Finally, for a very long time, codec experts assumed that only one video stream at a time would ever be rendered on a computer (whether streamed or read from a DVD), and that it could always be offloaded to the GPU. So the complexity of the codec does not really matter. Then mobile happened. Then WebRTC and SFUs happened.

The little distinctions above are the reasons why discussions about codecs, or comparisons of codecs, in the context of real-time media do not make sense if you do not take into account both the extreme sensitivity to real-time constraints AND to network quality. Those who only compare maximum achievable compression ratios are off-topic, and unfortunately that is what I most often see being cited. Even during the original discussion about VP8 vs H264, one of the most contentious points was whether the encoder settings were realistic or not. "Pure codec" people would state that the network should not be taken into account when benchmarking; everybody else would argue that without being adaptive, without accounting for packet loss and other network issues (jitter, propagation, etc.), the results would not be practical.

II. Codecs: So where do we stand?

The goal here is not to provide a definitive answer, but rather a rule of thumb for product managers to make a decision.

  • H.264 and VP8 are basically on par when it comes to quality vs bandwidth.
  • VP9 and H265 seem to be practically on par when it comes to quality vs bandwidth, and exhibit an average 30% gain over VP8/H.264 at the cost of around 20% extra CPU footprint.
  • AV1, a mix of VP10, Daala and Thor, exhibits more or less the same gain / cost over VP9/H.265 as those had over VP8/H.264.

However, the best guarded secret is that nobody gives a s***: none of this really matters when it comes to real-time media user experience. What matters is how good a network citizen your codec is. Nowadays, in real-time media, nobody cares about the codec alone; people care about the media engine: capturer + en/decoder + packetizer (RTP) and its capacity to handle the network (bandwidth fluctuation and network quality). If your encoder stops working as soon as your bandwidth goes below a threshold, if your decoder stops working when there is a single packet loss, if your encoder cannot encode at at least 20fps, your real-time media solution is worthless, however good the compression ratio of your codec is.

It is no surprise to see that Google sponsored research from a Stanford Ph.D. on better codec / network coupling. It's the future.

III. Codecs to media engine: a logical disruption

Real-time media has almost the same goals as normal media, but with a very different priority order (main goals):

  1. I need to maintain 30fps (speed)
  2. I need to maintain 30fps with interactivity (latency)

and since you can’t assume anything about the public internet, if you stream over it, you have additional constraints (public internet goals):

  1. I need to accommodate small bandwidth fluctuations
  2. I need to accommodate for huge bandwidth fluctuations
  3. I need to accommodate for jitter (out-of-order arrival of packets)
  4. I need to accommodate for packets loss

III.1 Media Engine: Main goals

Maintaining a throughput of 30fps means that you have 33ms to capture, encode (codec), packetise (RTP), encrypt (SRTP) and send (UDP/TCP) on the sender side, and the reverse on the receiver side. It's usually more difficult on the sender side, since encoding is usually more complicated and slower than decoding.

You could maintain a throughput of 30fps while inducing delay. For example, the temptation is high to use a frame buffer with multiple frames to compute some of the encoding optimisations (inter-frame prediction, motion vectors, …) I spoke about earlier. That would in turn reduce your bandwidth usage. Unfortunately, waiting for 5 frames means you are accumulating the delay of capturing 5 frames before you even start encoding. The encoding itself is slower, resulting in a delay in your system. Your throughput is 30fps but your end-to-end latency (from camera capture on the sending side to screen rendering on the receiving side) is more than 33ms. A lot of websites, intentionally or not, are deceiving their readers by reporting end-to-end latency as the time taken from the output of the transport or UDP/TCP socket on the sender side to the receiving side, conveniently omitting to measure encoding and decoding time, and any additional delay introduced on the client side. Needless to say, their measure does not correlate with user experience.

The solution? Drop every codec subcomponent and sub-algorithm that induces too much delay. Basically, revert to almost a frame-by-frame approach. While this was originally an outrage, sorry, a disgrace, or even a blasphemy in the codec community, nowadays most of the new codecs have a "real-time mode", i.e. a set of parameters where latency is prioritised over anything else, while traditionally there were only "best-quality" or "minimum-size" modes and timing did not really matter.

To be thorough, in the new codecs you also have a Screensharing mode, since the content of screen sharing is particular (high spatial resolution, low temporal resolution, lossless …).

III.2 Media Engine: public internet goals

small bandwidth fluctuations

Old codecs could not change their bitrate, i.e. once started they would assume that a certain bandwidth would always be available to them, and if not, would fail (miserably). That was the time when codecs and audio/video streaming were thought of as an add-on to network equipment, and thus security and bandwidth were handled by the network equipment. Dedicated ports, dedicated bandwidth. No public internet.

The first change was to make codecs "bitrate adaptive". In any codec you can change certain parameters. Some changes are obvious to the human eye, like changing the (spatial) resolution, some a little bit less, like changing the temporal resolution (30fps to 25fps), and some are almost invisible, like changing the quantization. The quantization parameter is the number of shades a given colour can have. If you use 256 (often the default), you will have smooth transitions from white to black; if you reduce it, it will be less smooth, but in most cases your eyes will not see the difference. Traditionally, encoders use the QP parameter as a knob to achieve bitrate adaptation without too much impact on the visual quality of the video.

Of course, you need to be able to compute the available bandwidth, and provide feedback. Those mechanisms are part of a media engine, but not of the codec.

huge bandwidth fluctuations

Bitrate adaptation is nice. Bitrate adaptation is automatic. However, it cannot accommodate a large bandwidth change. Let's say your bandwidth is divided by two; even with a bitrate adaptive codec, you won't survive.

In those cases, you need to reduce the spatial resolution, or the temporal resolution. The temporal resolution is usually the first target, for two reasons: one, the human eye is slightly less sensitive to frame rate changes than it is to resolution changes (within reason); two, it is easy to do, as one usually just drops one frame out of 2, or 3 (30fps => 15fps => 10fps). In most cases though, you need to do this on the sender side, and if your sender is connected to a media server which relays the stream to multiple remote peers, all the remote peers will be impacted. Basically, they will all receive a stream that has been adapted for the worst configuration / network among all the remote peers.

If you control the sender side, and are using an SFU, but have bandwidth limitations on (one of) the receiver sides, a better approach is to use simulcast. The sender side encodes the same stream at three different resolutions, and depending on the capacity of a remote peer at a given time, the SFU or the receiving client decides which resolution of the original stream to consume. Now you can accommodate each remote peer individually, at the cost of encoding the same stream three times (CPU overhead) and sending them all (bandwidth usage overhead on the sender side). Note that any codec can be used in simulcast mode, if a corresponding implementation exists. It's not an intrinsic codec feature; it's external.
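For illustration, with the WebRTC 1.0 RTCPeerConnection API this is expressed through sendEncodings; the rid names and scaling factors below are just an example, and browser support for this API varied at the time of writing.

    // Publish one camera track as three simulcast encodings:
    // full, half and quarter resolution.
    async function publishSimulcast() {
      var stream = await navigator.mediaDevices.getUserMedia({ video: true });
      var pc = new RTCPeerConnection();
      pc.addTransceiver(stream.getVideoTracks()[0], {
        direction: "sendonly",
        sendEncodings: [
          { rid: "h" },
          { rid: "m", scaleResolutionDownBy: 2.0 },
          { rid: "l", scaleResolutionDownBy: 4.0 }
        ]
      });
      return pc;
    }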

SVC, a.k.a. layered codecs, achieve the same thing: the capacity to choose within the SFU which resolution to relay to each remote peer, but in a smarter way. There is only one encoder instead of one per resolution. It allows for around 20% saved bandwidth over simulcast for the same resolutions. There is only one encoded bitstream, within which the resolutions are "layered" or interlaced, which simplifies lip-sync, port management, and other practical details. On the SFU it also simplifies things, since each packet is marked, and changing resolutions practically boils down to dropping packets, which does not take time, as opposed to switching between simulcast streams, which requires some rewriting of packets and waiting for a full frame.

jitter (out-of-order arrival of packets) and packet loss.

Those are the most difficult things to deal with. Jitter is easy: you create a buffer, and since all the packets are numbered, you put them back in order. In real time you do not want the jitter buffer to be too deep, otherwise you potentially wait too long and break your real-time constraint (33ms end-to-end). Using a bigger jitter buffer could help, but would basically mean buffering.

Packet loss is usually taken care of by retransmitting the packet (RTX). If the time it takes to go from the sender to the SFU and back (RTT) is small enough with respect to our 33ms constraint, we have time to retransmit the packet before running out of time. If this is not enough and you have a bad network (more than 0.01% packet loss), you need to implement more advanced error correction algorithms like FEC.
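In other words, the choice boils down to whether a retransmission can still arrive within the frame budget; a toy illustration of that reasoning, with the numbers being assumptions:

    // Decide whether a lost packet can still be repaired by retransmission
    // (NACK + resend must fit in the remaining per-frame budget), or whether
    // proactive FEC is needed instead.
    function canRepairWithRtx(rttMs, elapsedMs, frameBudgetMs) {
      frameBudgetMs = frameBudgetMs || 33;      // ~33 ms per frame at 30 fps
      return elapsedMs + rttMs <= frameBudgetMs;
    }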

A better approach, here again, is to use SVC codecs. Because of the way the layers are interlaced, and because only the base layer is really needed for the call to go on, in practice the time window you get to retransmit a packet belonging to the base layer is several times the RTT. It means that just retransmitting packets is usually enough to compensate for very bad network conditions (1%+ packet loss) without loss of continuity in the call. While simulcast was just a solution for bandwidth management with an SFU, SVC codecs are a solution to both bandwidth fluctuation and network quality problems.

IV. Current status of Browsers

Firefox and Safari follow Google when it comes to the media engine. They only update their internal copy of libwebrtc once in a while though, and do not follow the Google Chrome release schedule (every 6 weeks). They might at some point be out of sync, but they eventually catch up, with the exception of VP8 in Safari (don't ask).

Then, you can take a look at the table below for completeness, but the analysis is simple, since most discard Edge right away. Today you have to choose between supporting iOS Safari and having good quality: iOS Safari only supports H.264 on one hand, and libwebrtc only implements simulcast with VP8 and SVC with VP9 on the other hand.

How important is simulcast support when we already have plain H.264 support in iOS? Well, in most cases, clients will not forgive you for compromising quality over interoperability. If you want to support iOS with the same level of quality, go native for now. A few cherry-picked examples:

Highfive has an Electron client (desktop native), with support for H264 simulcast for video for more than two years (and enhanced audio codecs from Dolby).
Atlassian refused to deliver a client without support for simulcast, as it would not have been of good enough quality. They support iOS through a react-native client in which they have simulcast support through the embedded libwebrtc.
Symphony has an Electron desktop native client, and react-native clients for iOS and Android, to be able to support simulcast, implement double encryption in libwebrtc, and comply with bank regulations.
Tokbox has had VP8 with temporal scalability for at least 4 years now in their mobile SDK (using a modified libvpx in libwebrtc) to achieve better quality in their mobile-to-mobile video calls.

So first, you can trust that they know what they are doing. Then, you can trust your consultant. And if you don't, you have to wonder how you are going to compete with the rest of the ecosystem if you have an inferior technology.

V. The future

It's pretty clear that VP8 will not be available in Safari. The same can be said, without much risk, about VP9.

While early on Apple seemed to support H265 for inclusion in webrtc, since it already supports it for HLS anyway, its recent joining of the Alliance for Open Media and a few other small things, like scrapping any mention of H.265 on the outside of the iPhone, lead me to think that AV1 might be the next thing. Unlike the rest of this post, this is just an opinion.

In any case, the reference AV1 bitstream has just been frozen (the specs are complete, if you want), but the reference encoder is still far, far away from real time, at 0.3 fps on reference hardware. While it might not be such a problem for pre-recorded content (you have all the time in the world for encoding), it definitely is a no-go for real-time media. It will take at least a year to see it reach a stage where it is fast enough to be usable in RTC.

In the meantime, in non-RTC use cases, you can already enjoy playing pre-encoded AV1 files in Firefox thanks to Bitmovin's (high-latency, non-webrtc) streaming technology. The same Bitmovin whose founder invented MPEG-DASH and which just announced raising 30 million to prepare for the next generation of video infrastructure …

Streaming protocols and ultra-low latency including #webrtc


In their latest blog post, Wowza does a great job of explaining latency in simple words, and the use cases that could benefit from having under 500ms, a.k.a. "real-time", latency. However, the section about streaming protocols somehow confuses me. This blog post is an attempt to put those protocols back into perspective for a fair comparison.

Signalling path vs Media path

In most modern systems, whether video conferencing or streaming, you have a separation between the signalling path and the media path. The signalling path is used for discovery and handshake.

The discovery is the act of connecting parties that should start sending media to each other, whether both parties are individuals in the p2p case, or an individual and a server/infrastructure in the publisher / viewer case.

The handshake is the act of exchanging information between parties prior to, and for the purpose of, establishing a media path and streaming media.

While the two paths are explicitly separated in VoIP (SIP) and WebRTC, they were not in RTMP (flash), which often leads actors from the flash world to mistake one for the other and compare things that should not be compared. Both signalling and media have their own protocol, and an underlying transport protocol.

Signalling protocol vs Signalling transport protocol

The signalling protocol defines the format and the content of the signals that are exchanged during the discovery and handshake. It can be SIP for discovery with VoIP or WebRTC, with SDP O/A for the handshake; it can also be RTMP/AMF for flash, JSEP for webrtc, etc.

The signalling transport protocol is the underlying protocol used to transport the signalling from one party to the other. Traditionally it was either TCP or UDP (flash, SIP), but more recent protocols have also been used, like WebSocket (SIP, webrtc).

Note that WebSocket is a TRANSPORT protocol, which compares with TCP and UDP, and should not be compared as such to media streaming protocols like webrtc, SIP, RTMP, or others, which are more complex and belong to a higher level of abstraction. WebSocket has been especially popular for web apps, as you can open a WebSocket connection from within the browser, while you cannot access a "raw" TCP or UDP socket in the same context.

For examples of full signalling solutions, you can find "SIP over WebSocket", "JSEP over WebSocket", as well as (a subset of) RTMP, or the older "SIP over TCP/UDP", …
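As a hedged example of "JSEP over WebSocket": the WebSocket is only the transport, and the JSEP offer/answer is the signalling payload. The endpoint URL and the message format below are made up; every application defines its own.

    // Minimal "JSEP over WebSocket" signalling sketch.
    var ws = new WebSocket("wss://signalling.example.com");   // hypothetical endpoint
    var pc = new RTCPeerConnection();
    pc.addTransceiver("video", { direction: "recvonly" });    // ask for one incoming video stream

    ws.onopen = async function () {
      var offer = await pc.createOffer();
      await pc.setLocalDescription(offer);
      ws.send(JSON.stringify({ type: "offer", sdp: offer.sdp }));  // made-up message format
    };

    ws.onmessage = async function (event) {
      var msg = JSON.parse(event.data);
      if (msg.type === "answer") {
        await pc.setRemoteDescription({ type: "answer", sdp: msg.sdp });
      }
    };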

Media Codec

All media streaming protocols suppose that an encoded version of the media to be streamed is available. It's called a media "bitstream". Most media streaming protocols are codec-specific or support only a limited list of codecs.

Real-time streaming protocols like webrtc take the frames directly from a video capturer (webcam, screen, ..) and encode them on-the-fly to avoid extra latency.

Moreover, the bitrate adaptation needed to compensate for bandwidth fluctuation or poor network quality (jitter and packet loss) is done on-the-fly by adapting the encoder settings from frame to frame, depending on feedback from the media streaming protocol when available (RTP/RTCP) and on bandwidth estimation (REMB, Transport-CC, TMMBR).

If such feedback is not available (RTMP), the available bandwidth needs to remain above a certain threshold for it to work.

Media streaming protocol

The media streaming protocol defines how the media is cut into smaller chunks that are then handed over to the media transport protocol.

Moreover, reliable protocols add a mechanism to  compensate for poor network quality (jitter or packet loss). Jitter is usually dealt with on the receiver side by adding a small buffer to reorder the packets. Packet loss is dealt with in real time by retransmission (RTX), redundancy (RED) or Forward Error Correction (FEC).

Media Transport protocol

Once the media bitstream has been cut into smaller chunks, they need to be transported using a transport protocol, not unlike the signalling before it, just over a different path. The transport protocols are more or less the same as for the signalling, but since the load is quite different, some are more appropriate than others.

As a transport protocol, TCP is reliable while UDP is not. However, TCP comes at a cost, both in terms of bandwidth and latency. As most media streaming protocols already include a reliability mechanism (RTX/RED/FEC), UDP is a much better choice in theory. In practice, the UDP ports might be blocked.

A port also needs to be chosen to attach the socket to. Finding an open port and protocol pair can be tedious, and in older media streaming protocols it is hardcoded. Newer streaming protocols like WebRTC use Interactive Connectivity Establishment (ICE) to automatically and dynamically choose which port and which transport protocol to use.
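From the application's point of view, ICE only surfaces as candidate events to relay over your own signalling channel; a minimal sketch (the STUN server URL and the signallingSend() helper are placeholders):

    // ICE: the browser gathers candidate (address, port, protocol) tuples,
    // the application just forwards them to the remote party.
    var pc = new RTCPeerConnection({ iceServers: [{ urls: "stun:stun.example.com" }] });
    pc.onicecandidate = function (event) {
      if (event.candidate) {
        signallingSend({ candidate: event.candidate });   // signallingSend() is whatever your app provides
      }
    };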

QUIC is a new transport protocol being discussed within the IETF standard committee. It is backward compatible with UDP, and has several other advantages in terms of both speed and reliability over both TCP and UDP.

Media streaming protocols like MPEG-DASH or HLS use HTTP as their media transport protocol, and should see possible improvements coming from the new HTTP 2.0 standard in the making.

Media Security

Some media streaming engines encrypt their data for added protection. Either the media itself is encrypted (at the codec or payload level), or the chunks are encrypted (SRTP). The exchange of the encryption keys has its own protocols, the two most often encountered being SDES (VoIP) and DTLS (WebRTC). The latter has the advantage over the former that the exchange of the key itself is always secure.

Points that confused me.

WebSocket and QUIC being "pure" transport protocols (agnostic to whether they transport media or not), it's surprising to see them put at the same level as WebRTC, Flash, or HLS, which focus solely on media streaming. One would need, and I can only assume that this is what Wowza does, to handle the encoding and the chunking separately before using WebSocket or QUIC directly. Note that WebRTC (libwebrtc/chrome) and ORTC have an implementation of their stack using QUIC as a transport.

Equally surprising is the lack of mention of HTTP 2.0 as an optimisation for HTTP-based protocols like HLS or MPEG-DASH. CMAF seems to be a file format that could be used by HLS and MPEG-DASH, but not a replacement for them.

Finally, SRT is also only a transport protocol. While it seems to bring to the table things that were missing from file-based protocols like HLS and MPEG-DASH, it seems that those added features are already present in RTMP or WebRTC. SRT seems to assume a separately encoded bitstream, which removes the opportunity to couple the network feedback (packet loss, jitter) to the encoder. While network reliability seems to be addressed at the packet level, it is likely that bandwidth fluctuations that would require encoder bitrate adaptation are not.

Note: bandwidth fluctuations are addressed in file-based protocols like HLS through multiple parallel encodings, which adds latency, while they are addressed by on-demand encoder setting modifications in real-time protocols like WebRTC for minimal latency. In HLS, you need to be done reading a chunk (which can be 10s long) before you can change resolution and adapt. In WebRTC, you only need to be done with one frame (33ms at 30fps) before you can change the encoder settings, reducing the adaptation time.

Conclusion

I certainly made a lot of mistakes and approximations in this post. I’ll put that on the jet-lag 🙂 Please correct me wherever I have been inaccurate.

Testing #WebRTC products, from clients to servers and infrastructures


Since our "Real-time communication testing evolution with WebRTC 1.0" article was accepted for publication by IEEE, not many new scientific papers have been published on WebRTC testing apart from [1] and [2] ([3] is also available, but without indication whether it has been peer-reviewed and accepted for publication or not), but a lot has happened with KITE.

Beyond being the official testing tool for webrtc.org, and adding tests for multistream and simulcast to better cover the last pieces of the WebRTC 1.0 spec, support for mobile browser and mobile app testing, as well as for Electron app testing, has been added. KITE can now support up to 20 client configurations, making it the most complete and most versatile #webrtc testing tool known to date.

Beyond the interoperability mode, a Load Testing mode has been added to KITE, specifically to stress test webrtc servers and infrastructures, with a capacity of 1 million users. It has been tested in production by several customers in a one-to-many streaming environment.

Of course, you can have CoSMo run KITE for you (hosted and managed), or you can run it yourself, on premises. More details in this post.

I. Help maturing the webrtc 1.0 Spec

As Google is moving to support unified plan, multistream and simulcast, there was a specific need for tests supporting that effort. When those tests were written, only Firefox supported those features, and not in a fully compliant manner. Moreover, while multistream can be tested in p2p, browser-to-browser mode, simulcast can only be tested against an SFU, so a new kind of test and test infrastructure needed to be developed.

You can check the simplified dashboard at webrtc.org, with 8 configurations made visible, or the CoSMo-hosted dashboard, with all configurations and all tests: CoSMo's Dashboard.

II. Add support for Mobile and Electron

With Electron being so hot, and Electron WebRTC implementations almost on fire, it was obvious that any serious WebRTC testing service had to have Electron support. It was equally clear from everybody we met that, while desktop browser support was nice to have, the difficulty, and the added value, was more often than not in testing the mobile apps. Both cases had the same challenge: it does not work out of the box.

Unlike Chrome, Firefox is not recognised as a browser on Android and required quite a lot of modifications to work. Thanks to collaboration with Mozilla, the Appium community, Selenium HQ, and the webdriver Java bindings community, we patched all of the above projects to be able to support Firefox on Android. The results will appear soon on the webrtc.org dashboard for everybody to follow.

Thanks to collaboration with ElectronHQ, and feedback from Slack, we managed to modify Electron and its webdriver implementation to bring it to parity with its Chromium parent. You can see a video of the test here.

In both cases, we are pushing the patches to the different upstream repositories, but in the meantime, it's likely that we are part of the happy (very) few capable of running webrtc tests against Android Firefox clients and Electron.

III. KITE Load Testing Mode

What we missed in our original article is that half of the testing projects were not (client-centric) interoperability tests, but server- or infrastructure-centric testing projects. Load multiplier, Jattack, jitsi-hammer: all were load testing projects.

Load testing of video conferences is very different in practice than load testing of streaming infrastructure.

In video conferences, the number of streams grows as the square of the number of users. It means that if you get 30 users, you get approximately 900 streams, and you get close to the limit a single server can handle. As illustrated by Jitsi in one of their posts (in which they push up to 28 users), the main bottleneck of media servers is bandwidth. With a usual 1 Gbps NIC, and supposing that a VGA@30fps stream is around 1 Mbps, you can handle around 1,000 streams. With 30 clients connecting to a single conference you saturate a server; with around 300 clients you can start really stressing a video conference infrastructure, if your clients can handle receiving all those streams back.
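For readers who want to see the arithmetic, here is a quick back-of-the-envelope sketch; the per-stream bitrate and NIC capacity are the same rough assumptions as above, not measurements:

```typescript
// Back-of-the-envelope SFU load for an N-way conference.
// Assumes every participant sends 1 stream and receives N-1 streams,
// all relayed through a single SFU (no simulcast, no SVC, ~1 Mbps per stream).
function sfuLoad(participants: number, streamBitrateMbps = 1, nicCapacityMbps = 1000) {
  const streamsIn = participants;                        // one upload per participant
  const streamsOut = participants * (participants - 1);  // everyone receives everyone else
  const egressMbps = streamsOut * streamBitrateMbps;
  return {
    totalStreams: streamsIn + streamsOut,
    egressMbps,
    nicSaturation: egressMbps / nicCapacityMbps,
  };
}

console.log(sfuLoad(30)); // ~900 relayed streams, ~870 Mbps egress: a 1 Gbps NIC is nearly full
```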

In the case of one-to-many streaming, you do not have the same problem on the client side: each client only receives one stream. However, you need many more clients. For social sites, having thousands of viewers is not unheard of; for live events, you had better scale to millions. We modified the KITE architecture to handle multiple hubs, in multiple locations (e.g. AWS availability zones), which, beyond improving the scalability to millions of viewers, allowed for geolocation testing as well.
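At the Selenium level, driving viewers from several regional hubs looks roughly like the sketch below; the hub URLs, regions and viewer page are made up for the example, and KITE’s actual grid manager does considerably more (capability matching, instance spawning, result collection):

```typescript
import { Builder, WebDriver } from 'selenium-webdriver';

// Hypothetical region -> Selenium hub mapping; a real deployment would generate this.
const hubs: Record<string, string> = {
  'us-east-1':      'http://hub-us-east.example.com:4444/wd/hub',
  'eu-west-1':      'http://hub-eu-west.example.com:4444/wd/hub',
  'ap-southeast-1': 'http://hub-ap-southeast.example.com:4444/wd/hub',
};

// Launch one Chrome viewer per region against a (hypothetical) page under test.
async function launchViewer(region: string, streamUrl: string): Promise<WebDriver> {
  const driver = await new Builder()
    .usingServer(hubs[region])   // route the session to the hub in that region
    .forBrowser('chrome')
    .build();
  await driver.get(streamUrl);
  return driver;
}

Promise.all(
  Object.keys(hubs).map((region) => launchViewer(region, 'https://viewer.example.com/stream/42')),
);
```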

Of course, it does not work that easily with Apple devices, mobile devices and Edge. Mobile testing needs physical devices. Apple only allows its operating systems to run on Apple hardware. Both Edge and Safari limit the number of browsers under automation to 1 per operating system. Finally, Edge does not enumerate (recognise) fake webcams like ManyCam and others, preventing anyone from testing WebRTC on VMs.

Thankfully, the KITE architecture was made with hybrid SE grids in mind, and we now have, in addition to our AWS and GCE based hubs, a physical grid made of roughly 20 Mac minis, 20 Android devices, a few iPhones, and 20 Dell OptiPlex Micros (both 3050 and older 9020) to help. It does not reach millions, but it allows some flexibility, and we can always buy more if need be. In most cases, when testing with hundreds of thousands or a couple of million clients, customers were happy with using only Chrome or Firefox.

Conclusion

KITE has gained quite a few features in the past months. Some of them are specific to webrtc, some of them are more generic but still exclusive to KITE, like mobile testing and Electron app testing.

A grid manager has been added to facilitate the automatic setup of Selenium grids, and in Load Testing mode, KITE can now spawn testing instances on-demand.

Right now, the team is hard at work adding network instrumentation as a capability of KITE, and making test writing easier. In most of the cases we have met so far, the existing Selenium tests could be imported very easily and run directly on the many configurations KITE supports, where the clients had been limiting themselves to Chrome, or Chrome and Firefox at best.

If you’re interested in using KITE, or want to collaborate on the project, do not hesitate to contact us.

Using #webrtc as a replacement for rtmp


Such is the title of one of the latest blog posts by Wowza. While it is a very interesting question, I believe the blog post is conveying the clichés the streaming ecosystem has been carrying about WebRTC, which are no longer true. I do not believe Wowza to be knowingly deceiving people, I see their point; I just believe that recent advances in the webrtc protocols make most of their statements inaccurate. This post is an attempt to document the statements that can be proven wrong today. Fact-checking, in some way.

I. It’s not one use case, it’s two.

As a global comment, while the title states the question as pure RTMP vs WebRTC, lots of the statements deal with complete systems which involve recording, storage, and HLS, which is quite different.

Let’s separate two use cases:

a – you cannot afford any delay, you want as-real-time-as-possible, for all the use cases Wowza presented in a previous post: betting and gambling, gaming, VR, interactive chat, the adult industry… In this use case, the end-to-end delay is critical, and involves the encoding, the chunking/packetization, and everything sender side.

b – you can afford some delay, either at the level of cable television (5s) or of pre-recorded content serving (up to 90+s). In this case you might even be able to pre-process your media, doing encoding, chunking, and uploading to the CDN way before it is streamed to an actual viewer. Anybody watched that 1999 Batman recently? In that case, all the time-sensitive magic happens between the storage (CDN) and the player on the receiving side.

RTMP and WebRTC could address both cases a and b. One can stream directly from capturer to viewer without transcoding for the lowest possible latency (case a), but one can also add a recording unit to the list of “clients” to record and store the media, and later serve it directly from storage. Conceptually there is no difference there between the behaviour of RTMP and WebRTC.

HLS, in contrast, is by design too slow to deliver streams with a latency under 2 to 5 seconds. I had the chance to exchange with the Apple HLS team at WWDC this week and they confirmed that it was by design. Most of the difficult problems are dealt with in HLS by delegating them to the transport (HTTP), or to the buffer, in all cases adding delay. The “specification” for HLS makes it clear: the scope was to achieve scalability, and to leverage the internet cache infrastructure.
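As a rough illustration of why, the back-of-the-envelope arithmetic below uses typical (not mandated) segment settings; the numbers are assumptions for the example, not HLS requirements:

```typescript
// Rough floor on HLS glass-to-glass latency with typical (not mandated) settings:
// players generally buffer about three segments before starting playback.
const segmentDurationSec = 6;   // common default target duration
const bufferedSegments = 3;     // what most players keep ahead
const encodeAndUploadSec = 2;   // encoding + packaging + origin upload, rough guess

const latencyFloorSec = segmentDurationSec * bufferedSegments + encodeAndUploadSec;
console.log(`typical HLS latency floor ~ ${latencyFloorSec}s`); // ~ 20 s

// Even with aggressive 2 s segments the same arithmetic still lands around 8 s,
// far from WebRTC's sub-second target.
```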

II. If you want to slow down the streams…

If you would like to delay the playback time, or try to synchronize playback across multiple devices, you may want to capture with WebRTC, but use HTTP Live Streaming (HLS) for playback, using metadata and timecode to control the time you want referenced from playback.

This really is only a problem with case b, pre-recorded content delivered over webrtc. Moreover, this is not a protocol limitation, as webrtc includes all timestamps and a client web app can use them to sync if need be. There is no real difference with HLS there; a lot depends on the features of the client app, more than on the protocol itself. (Homework: look at chapter 10 of the HLS specification, check that security is not mandatory, and realise that HLS streaming security depends on the client app, and a centralised infrastructure…)

III. WebRTC Does not scale beyond 1,000?

Currently, WebRTC is very limited in its ability to scale past a few thousand instances without a vast (and expensive) network of live-repeating servers that can handle the load. Because WebRTC utilizes peering networks, there still has to be a nearby node to help distribute the stream to other local hosts—and peering across a global network can be incredibly difficult.

Lack of scalability is really the biggest cliché about WebRTC, but in everybody’s defence, it was practically true until very recently. WebRTC has roots in the VoIP world, and the first use case was video conference / video chat / unified communication. From the start, it was about real-time. While the video conferencing industry and the telcos had experience in real-time media, they did not have experience in dealing with large audiences. If we take Jitsi, one of the best open source webrtc media servers out there, as an example: they presented benchmark results showing that once you reach 30 users in the same conference, a normal server becomes saturated. The thing is, we have known for a long time that the average number of people in a video chat / conference is closer to 4. Scalability had never been a bottleneck for those who created webrtc in the first place, as a single server would always be plenty enough to support their worst case scenario. The scalability limit of one server (roughly 1,000 streams) then became the scalability limit of the protocol. In practice nothing more scalable existed, because nobody had ever tried.

Now this is not true anymore. There are several companies that have, or claim to have, arranged clusters of webrtc servers in such a way that you can accommodate either bigger conferences for the video conference use case (more than 30 people in a conference), or bigger audiences in the one-to-many use case, as has been done in the VoIP field for decades. Vidyo is most famous for their cascading technology (video presentation in 2017), Jitsi is rumoured to be about to release a big-conference cluster solution, and some others like Red5 Pro cluster and LiveSwitch claim to have equivalent solutions. The difficulty here is not to cascade the media, but to make sure that the mechanisms that handle bad networks (bandwidth fluctuations, jitter, packet loss, a.k.a. RTCP / NACK / PLI / RTX / RED / FEC) still work well with multiple hops between media producer and viewers, and with so many viewers.

Most recently, CoSMo, as WebRTC experts, has teamed up with media streaming experts from Xirsys to develop such webrtc cascading technology for streaming, which we call “Millicast”, and it does just that: webrtc sub-second latency, at 1M viewers scale. We have tested it using the industry-validated, Google-sponsored KITE testing engine, and it is already used in production by several clients in time-sensitive verticals like the adult “camming” industry. Don’t trust us; don’t trust SpankChain, or Xirsys; don’t trust Google, don’t trust anyone! “Talk is cheap,” as Linus Torvalds said, “show me the code.” Contact Xirsys to set up a demo and check both latency and scalability by yourself.

Note that RTMP has exactly the same limitation, i.e. a single RTMP server will be limited in the number of viewers that can feed from it. The only difference is that people cracked, a long time ago, the problem of having ingress servers and egress servers (with possibly a lot of other stuff in between) to deal with that single-server limit.

IV. WebRTC not broadcast-quality?

“Today, you can’t reliably stream broadcast-quality video through a WebRTC infrastructure. The WebRTC protocol is currently limited to supporting VP8 and .H264 video”

This one is just plain false. In many ways.

First, at its peak, RTMP was only capable of using H.264. In that regard, WebRTC is in no way worse than RTMP.

Second, VP8 and H.264 are the only MANDATORY TO IMPLEMENT codecs in webrtc, but nothing prevents browser vendors from adding more codecs. Ericsson had a webrtc stack with H.265 as early as 2013, Google has been supporting VP9 for years, and Firefox has followed. All of them are founding members of the Alliance for Open Media creating the new AV1 codec…

which means larger file sizes will bog down the network, not to mention burn up processors when ingesting and attempting to send a large file in its entirety.

It is likely that the author here is comparing webrtc with HLS, and not RTMP; HLS supports H.265/HEVC, leveraging its higher compression rate for larger files.

Moreover, with webrtc or RTMP, the encoding is done on the sender side. One never sends the raw / pre-encoded file over the network, and there is no transcoding.

To support 4K or broadcast-quality 1080p60 resolutions, you’ll need to be able to transcode for playback on a variety of devices, while sending the highest-quality source to your transcoder.

Here again, this is a problem specific to HLS (or MPEG-DASH, or any file-based system) and does not apply to WebRTC and RTMP.

The problem being addressed here is the following: if I have different viewers with different playback capacities, in terms of bandwidth, hardware, and display size, how can I manage to serve them all with the best quality stream they can handle, from a single source?

In file-based systems, you will encode different chunks for the different resolutions to be served. Depending on the available bandwidth, display size, and hardware, the player will choose which chunk to download. It’s usually called transcoding because, in those cases, the media source encodes once and pushes the encoded media to a server, which will transcode the input stream into different streams at different resolutions.

WebRTC has the equivalent mechanism, just real-time. Here again, several resolutions of the same source media are encoded and sent to the server, and the player / receiving side then indicates to the server which resolution it wants to receive, in real-time. The simple version of this is achieved with simulcast (multiple separate encoders with different settings), the best version is achieved with SVC codecs. VP8 has supported simulcast and half-SVC for a very long time, and CoSMo has just provided Google with a patch for the same feature with H.264, likely to be adopted by Apple and Mozilla right away.
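On the sender side, the standard API for this is a single addTransceiver() call; a minimal sketch follows (the rid names, bitrates and scaling factors are arbitrary examples):

```typescript
// Minimal simulcast sender sketch: three encodings of the same camera track.
// The rid names, bitrates and scaling factors are arbitrary examples.
async function publishWithSimulcast(pc: RTCPeerConnection): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({ video: { width: 1280, height: 720 } });
  const [track] = stream.getVideoTracks();

  pc.addTransceiver(track, {
    direction: 'sendonly',
    sendEncodings: [
      { rid: 'h', maxBitrate: 1_500_000 },                          // full resolution
      { rid: 'm', maxBitrate: 500_000, scaleResolutionDownBy: 2 },  // 1/4 of the pixels
      { rid: 'l', maxBitrate: 150_000, scaleResolutionDownBy: 4 },  // 1/16 of the pixels
    ],
  });
  // The SFU then forwards whichever encoding fits each viewer's downlink;
  // nothing changes on the receiving side.
}
```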

Conclusion

WebRTC has evolved very quickly in the past years. It has the capacity today to replace Flash with an even lower latency. You can always slow things down, and add recording, storage, and HLS transcoding as extra legs, but you will never be able to speed up HLS or current streaming solutions to WebRTC latency.

WebRTC needed some work at the system level to be able to replace a full streaming infrastructure, especially in terms of scalability, but if Xirsys and CoSMo’s Millicast platform and ecosystem (OBS-studio-webrtc) are any indication, the solution is already out there.

WebRTC already has the latest codecs implemented (VP9), a codec much better than anything RTMP ever had, and on par in compression ratio with what HLS has. Given the extremely fast update rate of the browsers (6 weeks), it will get new codecs faster than other protocols (AV1 support in Firefox since version 59).

Don’t wait, don’t base judgement on faith, try it yourself.


H.264 finally a first class citizen in #WebRTC stacks.


VP8 and H.264 codecs are mandatory to implement to be webrtc compliant. Simulcast is a way to use multiple encoders at a time to provide different resolutions of the same media to choose from, as a way to adapt to bandwidth fluctuations (and other good things). Unfortunately, while some patches were proposed some two years ago, including by HighFive, libwebrtc did not implement support for simulcast with the H.264 codec. H.264 was then a de facto secondary codec, and Safari, which only supported H.264, could not achieve the same level of adaptation (or quality) that VP8 and some other browsers could. This blog post gives more details about the epic journey to get that done, the design of the implementation, and the impact on WebRTC products.

I. Sending media over the internet

Funnily enough, if you look at the atomic steps you need to go through to send media over the internet, whether real-time or not (i.e. whether webrtc or HLS), you will end up with the above diagram. The “codec” only really touches on the light blue section, while each half of the line (capturer to internet, internet to display respectively) is referred to as the “media engine”. You can see that there is much more to streaming media than just the codec.

In a previous post we spoke a little bit about the difference between codecs and media engine, you can take another look if you really want to go into the details:

#Webrtc Codec vs Media Engines: Implementation Status and why you should care.

In the case of pre-recorded media, or slow, less latency-sensitive streaming, you can optimise this, and it starts looking like a usual HLS / CDN solution (see below). In those cases, the first half of the line is not time sensitive at all, and this leads to some asymmetry between the preparation of the media (up to the upload to a CDN), and the serving of the media, which is more time-sensitive.

A lot of people trying to optimise pre-recorded content streaming will address one of those two halves.

For example, in the webrtc community, Peer5, Streamroot.io and others address the second half by offloading some of the CDN-to-viewer content to a webrtc datachannel-based p2p network created on-the-fly between viewers of the same media.
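Conceptually, the trick is to ask a nearby peer for a segment over a data channel and to fall back to the CDN on a miss. The sketch below is a simplification under stated assumptions: signalling and peer discovery are already done, segmentChannel is an open RTCDataChannel to another viewer, and the GET/MISS mini-protocol is invented for the example:

```typescript
// Simplified peer-assisted delivery: try a nearby peer first, fall back to the CDN.
// Assumes `segmentChannel` is an already-open RTCDataChannel to another viewer that
// answers a "GET <url>" text message with the segment bytes, or with the string "MISS".
async function fetchSegment(url: string, segmentChannel: RTCDataChannel): Promise<ArrayBuffer> {
  segmentChannel.binaryType = 'arraybuffer';

  const fromPeer = new Promise<ArrayBuffer>((resolve, reject) => {
    const timeout = setTimeout(() => reject(new Error('peer timeout')), 500);
    segmentChannel.onmessage = (ev) => {
      clearTimeout(timeout);
      if (typeof ev.data === 'string') reject(new Error('peer miss'));
      else resolve(ev.data as ArrayBuffer);
    };
    segmentChannel.send(`GET ${url}`);
  });

  try {
    return await fromPeer;          // served by a nearby viewer: CDN egress saved
  } catch {
    const res = await fetch(url);   // classic CDN path as fallback
    return res.arrayBuffer();
  }
}
```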

Meanwhile, in the codec community, AOMedia’s AV1, and any new codec for that matter, addresses the entire pipeline by trying to find new codecs with a better compression ratio, reducing the overall bandwidth need. While AV1 has SVC capacity, it is likely that it will not be used for bandwidth adaptation in those cases. At AOMedia, the Real-Time group is a sub-group of the Codec group, illustrating the order of the priorities.

A lot of companies in the blockchain / dApps / web3 ecosystem are taking this streaming model and trying to optimise different segments separately: generation, transcoding, storage, distribution… While most aim at pre-recorded content, a.k.a. VOD (VideoCoin, Theta, Livepeer, Viewly, …), only a few can do live (5s delay) streaming (Livepeer, IPBC, …), and only one today is aiming at real-time (less than 1s: SpankChain).

II. Bandwidth fluctuations and heterogeneous viewers

In any case, you end up having the same overall problems. We will focus on the following two:

  • handling large bandwidth fluctuations,
  • handling large variations in the capacity of the receiving clients.

Before the advent of SVC codecs, the main solution was multiple encodings of the same media at different resolutions on the sender side, and a mechanism to choose which resolution would be used by the receiving side.

Streaming / broadcast protocols like HLS or MPEG-DASH implemented this with transcoders, file-based chunking, and buffering, all of which induce additional delay. Real-time protocols like WebRTC used “simulcast”.

The usual streaming design (HLS) encodes once at the source, uploads this high-resolution media to a transcoder, and then transcodes it into different segments of the same duration, one for each resolution. Most of those streaming “live” through YouTube, Twitch, Dailymotion or any other streaming service use OBS Studio, which captures, encodes and streams over RTMP to an ingress node, which in turn transcodes.

Simulcast is doing exactly what transcoders do for HLS, but directly at the source. That keeps things real-time (removing a decoding + encoding cycle in the process) at the cost of an overhead (CPU, upload bandwidth) on the sender side, as the source machine now needs to have the same capacity an HLS transcoder otherwise would.

Simulcast is for many the minimum acceptable to have industry-quality media. Jitsi, for example, decided not to support Safari on iOS in their official SDKs because of the lack of simulcast support with H.264, and is using a native client instead. For those interested in all the details, or those who want to hear it from Emil himself, the recording is public, and the interesting part is around 18 minutes in.

III. Status of webrtc simulcast

So far, libwebrtc, the webrtc media engine implementation used in Chrome, Firefox and Safari, did not support simulcast in conjunction with the H.264 codec. With Apple supporting only H.264, this led to a poorer experience with H.264 in general, and with Safari specifically.

While a patch extending the libwebrtc simulcast implementation to H.264 had been proposed by HighFive almost two years ago (and used in their Electron native client for just as long), and while it is likely that many had done the same thing in their native apps, it had never been merged. Enough time had passed that the underlying C++ classes had changed too much to even reuse this patch.

Starting anew from the VP8 simulcast implementation, CoSMo’s Media Server Lead Sergio Murillo, whom most of you must know from his article on SVC, wrote a patch and submitted it for review around the end of March. Three months, 45 review rounds with as many rebases, and quite a few face-to-face meetings all around the globe later, we are happy to announce that the patch was merged today! [Well, it has since been reverted, but it is usual in the Google merging process to see a patch need a couple of attempts to find its way in.]

IV. How does that impact my product, and what about the other browser vendors?

The main impact is that you will not have to choose between using the H.264 codec and having a good quality media engine anymore. Using Chrome Canary this week, and Chrome stable in around 12 weeks, you will be able to send H.264 simulcast.

Firefox and Edge have not commented on their intent to implement it. It is likely that Firefox will adopt it soon, as their webrtc code uses the libwebrtc media engine. Apple is planning to implement it during the Web Engine Hackfest, the first week of October, with CoSMo’s help.

Simulcast is a sender-side feature used to address bandwidth fluctuations on the receiving side. That means you do not need any modification on the receiving side to support simulcast, only on the sender side, and possibly in your media server. That also means you do not need to wait until this is implemented in all the browsers to benefit from it.
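A hedged sketch of what such a sender could look like follows. The API calls themselves (sendEncodings, getCapabilities, setCodecPreferences) are standard; whether the browser actually produces H.264 simulcast from them is exactly what the patch discussed here enables:

```typescript
// Sender pinned to H.264 with three simulcast layers (values are illustrative).
async function publishH264Simulcast(pc: RTCPeerConnection): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  const transceiver = pc.addTransceiver(stream.getVideoTracks()[0], {
    direction: 'sendonly',
    sendEncodings: [
      { rid: 'h', maxBitrate: 1_200_000 },
      { rid: 'm', maxBitrate: 400_000, scaleResolutionDownBy: 2 },
      { rid: 'l', maxBitrate: 125_000, scaleResolutionDownBy: 4 },
    ],
  });

  // Reorder the codec list so H.264 comes first; falls back gracefully if unsupported.
  const { codecs } = RTCRtpSender.getCapabilities('video') ?? { codecs: [] };
  const h264First = [
    ...codecs.filter((c) => c.mimeType.toLowerCase() === 'video/h264'),
    ...codecs.filter((c) => c.mimeType.toLowerCase() !== 'video/h264'),
  ];
  transceiver.setCodecPreferences(h264First);
}
```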

In the one-to-many streaming/broadcast use case, all you need is to make sure the sender uses Chrome. There is no additional work to do to support any receiving browser, including Safari on iOS.

For the many-to-many video conferencing use case, where all the clients need to be able to send *and* receive, you will have to wait for all browsers of interest to fully support it. It is likely that most media servers will quickly upgrade their support for this (they already support simulcast for VP8), allowing one not to bother at all.

At CoSMo, we have already made our media server Medooze compatible with the changes, and all our products, services, and our customers’ are going to be upgraded before the end of the month, and tested right away thanks to KITE, the same tool that tests WebRTC implementations in all browsers today, and which is used by callstats.io to validate stats implementations in all browsers, for example.

  • medooze media server
  • millicast.com real-time CDN (a XirSys collaboration)
  • spank.live, the first adult camming website with crypto payment and a state channel implementation,
  • all our libwebrtc packages, Qt / obj-c / Java wrappers, and corresponding native and electron apps.

All those, and a lot more (likely including Janus and Jitsi) will be the first in their category to adopt the latest available technology, giving them an undeniable edge.

If you too want to get an edge over your competitors, use CoSMo’s expert services and/or products.

In this day and age where everybody and their mom claim to be WebRTC experts (and fearless visionary leaders), remember that talk is cheap, and ask to see or test the code!

Tokbox: Another WebRTC PaaS bites the dust


While many rejoice after the Tokbox acquisition today, they do so in the same way France rejoiced after the World Cup: we won, but we’re not really proud of the way we won. Yes, it’s good to have a high-level acquisition in the WebRTC field, it had been a while. But 35 million for Tokbox? Something is terribly wrong there, and there is nothing to rejoice about. Unfortunately, this was somehow predictable. Let’s look at the details in this post.

La Haine is a French black and white movie whose scenario depicts the life of 3 individuals born in the wrong Paris neighbourhood. To describe the inevitability of their destiny, they use the image of a man jumping from the 50th floor of a building who, as each floor passes by, tells himself “so far so good”. The man might not perceive his upcoming death, but it is already inevitable.

The same could be said about WebRTC PaaS. While the growth of CPaaS was already impressive in 2015 (see here for several numbers), the growth of WebRTC PaaS (which only propose video and audio over the internet, and do not connect to telecommunication networks like CPaaS do) has been lukewarm. In Singapore, the growth of the CPaaS Wavecell was in sharp contrast with the lack of noticeable results from the PaaS Temasys (see here).

NTT admitted publicly that most of the WebRTC PaaS were not making money. Rumor on the street was that Twilio never made money from “media-over-the-internet” either, but they don’t need to yet, and can make it a medium- to long-term play. Again, according to rumours in the ecosystem, agora.io would be the only profitable one.

When a business is not profitable, and not strategic as it is for NTT, at one point the end is clear. Almost one year ago, we saw a surge in job candidates coming from Tokbox. After investigation, it appeared Tokbox had been given an ultimatum by Telefonica to be profitable within a year, or else. That year saw a lot of unusual efforts from the Tokbox team, with the introduction of professional services and other hopefully revenue-generating initiatives.

Earlier this year, Telenor Digital sold appear.in, whose original founder and main engineer had already left to start a company called Confrere, to a small company nobody (in the WebRTC ecosystem) had heard of before. Now, Telefonica Digital is selling Tokbox to Vonage. It is not only selling it, it is selling it for approximately the same price it bought it for (*)! To add insult to injury, Vonage had bought the CPaaS Nexmo for 230M a couple of years back. Almost 7 times more. While the remaining Tokbox engineers (**) seem happy about the transition, it is likely to make some CEOs, founders and investors in PaaS less than happy.

The question remains: what is Vonage going to do with Tokbox? They surely got a great team and interesting IP at a bargain here, but how does that complement and improve their offer, and most importantly, how do they plan to make it a revenue-generating part of their portfolio? Are they going to merge it into what was Nexmo? Are they going to make it a separate entity? At this stage it is not clear, and existing Tokbox customers are likely writing e-mails to Tokbox right now about that, and/or reaching out to the usual platform developers and integrators capable of providing them with a replacement if need be: Blacc Spot Media, webrtc.ventures, or ourselves at CoSMo. Nexmo’s highly successful original founders (both CTO and CEO) left Vonage a few months back, so one can wonder if Vonage has enough people with vision left to leverage such a WebRTC gem to its maximum. Brexit is likely to make HR matters complicated as well.

That also brings up the question of the sustainability of the WebRTC PaaS business model. With the most successful PaaS gone for nothing, while CPaaS are increasing revenue, getting funding, and going through successful IPOs, the choice for investors is clear. For PaaS, there are still some options left:

  • go big or go home, like agora.io.
  • pivot to become a CPaaS.
  • sell (find a big brother with a need for Video only).
  • close.

Unfortunately, if you choose to go big, you will face an uphill battle. You might manage to get some of Tokbox’s customers, but this is a crowded space, and the competition is fierce. The Tokbox announcement tells us that the PaaS business alone is not sustainable.

If you choose to pivot to become a CPaaS, even if you could do it quickly, the rest of the world has not waited for you, and you will need to go against Twilio and other players with either a better tech portfolio or a more established local base.

Selling, if you could find a buyer, would be at a loss. Given the Appear.in and Tokbox price references, nobody in their right mind would pay more than what you invested in the business, if even.

As a conclusion, the Tokbox sale is the last nail in the coffin of the WebRTC PaaS business model. It will define a big part of the WebRTC ecosystem for the time to come. I’m a big fan of Tokbox, they have a lot of very gifted individuals in their team, and I’m wishing them the best of luck within Vonage. I bet many will look very closely at what happens next, as it can move the WebRTC market, both buyers’ and investors’ mindsets, in quite a few different directions in the near future. Interesting times!

(*) The amount Telefonica paid to acquire Tokbox is not known exactly, but many agree on an estimate of around 30~35M, since at the time Tokbox had received 26~33M in funding.

(**) Some Tokbox engineers, on loan from Telefonica, took advantage of Telefonica’s restructuring plan a year back to get a package and leave. Most notably, Gustavo Garcia, one of the lead architects, left for Houseparty.

CoSMo Software is now 2 years old! #webrtc


We are really laser-focused on delivering our projects and preparing for the upcoming standards committee meetings. With all of that going on, it took several messages from LinkedIn congratulating me on my work anniversary for me to realise the truth: CoSMo was 2 years old. I’d like to take a moment to look back at what we have achieved.

The Web Engine Hackfest was last week in Spain, and CoSMo contributed to bringing H.264 simulcast to WebRTC in general, and to WebKit and Safari specifically. That collaboration with Apple engineers also allowed us to improve the patch by adding support for hardware acceleration on macOS and iOS.

We did not have a lot of time to re-adjust from the shock of learning that Apple had actually added VP8 (modulo one file) support as well. 🙂

IPTComm / IIT-RTC is next week, and Dr Ludo, our Chief Scientific Officer, will present the first comparative load testing of open source webrtc SFUs! How do Mediasoup, Jitsi, Medooze, Janus and Kurento Media Server compare when used in a single-server configuration for video conferencing?

The W3C technical plenary meeting, the most important W3C meeting of the year, is in two weeks in Lyon. WebRTC NV will be discussed, with a lot of interesting features: end-to-end encryption, media over QUIC, codec-agnostic frame marking to bring SVC to all codecs… CoSMo will as usual present an update on WebRTC testing, while participating in the design of all the other features as well.

The last IETF meeting of the year, in Bangkok, is the first week of November. There will be interesting discussions there as well, even though QUIC will likely miss its self-imposed deadline for standardisation.

The week after, CoSMo will present Millicast.com, a new video streaming service / WebRTC video CDN, at “Live Streaming West” in Huntington Beach, CA, to a completely new audience: the streaming world, which is quite new to WebRTC. For a world in which real-time is dead or dying (Flash), and in which the only hope seems to reside in file-based, HTTP-as-a-transport solutions with 30s average latency, webrtc might appear as a heresy 🙂

With all of that going on, it took several messages from LinkedIn congratulating me on my work anniversary for me to realise the truth:

CoSMo is 2 years old!

When I look back at those 2 years, I’m very proud of what we have achieved as a business and as a team without any kind of investment or funding.

As a business, we went from 1 to 12 employees in 2017, to around 30 now, and from 1.2M revenue in 2017 (*) to closing on 3M (P&L) this year.

Opening a subsidiary in the Philippines for support late 2017, acquiring Medooze in Spain early 2018, and creating millicast.com, an American joint venture dedicated to video streaming with fellow WebRTC experts Xirsys, in Q3, were but the most recent external signs of growth for CoSMo.

As a team, we have managed to build up an extraordinary technology stack: KITE for testing, Medooze for the media server (acquisition), Millicast as a streaming service (JV), and tons of yet-to-be-named tools to facilitate the integration of the WebRTC stack in any client software.

In the meantime, we push the boundaries of WebRTC and redefine the possible by publishing scientific results (here), participating in standards committees, and contributing code to most browser vendors. We have a great team, dedicated to doing the right thing: advancing science and technology, contributing results through peer-reviewed international conferences and publications, and making our customers benefit from our technical advances.

I’d like to thank my incredible team, whose passion and dedication humble me every day, our customers for their support, without which we wouldn’t be here today, and all our close collaborators for all the incredible exchanges and projects: Meetecho, Medooze, Xirsys, the Jitsi team… Let’s make next year even better.

(*) In Singapore, the financial results of a company are public. One can go to the Accounting and Corporate Regulatory Authority’s website, in the BizFile section, and get the info on any Singapore-registered company, public or not, for a few dollars, so I consider our results public. In the webrtc space, for the curious, you can also check the results of Temasys and Wavecell. The comparison is striking.

#WebRTC Video Quality Assessment


How can you make sure the quality of a [webrtc] video call or video stream is good? One can take all possible metrics from the statistics API and still be nowhere closer to the answer. The reasons are simple. First, most of the statistics reported are about the network, not about video quality. Then, it is known, and people who have tried also know, that while those influence the perceived quality of the call, they do not correlate directly with it, which means you cannot guess or compute the video quality from those metrics. Finally, the quality of a call is a very subjective matter, and subjective matters are difficult for computers to compute directly.
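To make that concrete, here is roughly what the statistics API hands you on the receiving side; every field below is a transport or decode counter, none of them is a perceptual score, and exact field availability varies with browser version:

```typescript
// Dump the receiver-side stats WebRTC actually exposes: network and decode counters,
// not a perceptual quality score. Field availability varies across browser versions.
async function dumpInboundVideoStats(pc: RTCPeerConnection): Promise<void> {
  const report = await pc.getStats();
  report.forEach((stat) => {
    if (stat.type === 'inbound-rtp' && stat.kind === 'video') {
      console.log({
        packetsLost: stat.packetsLost,
        jitter: stat.jitter,
        framesPerSecond: stat.framesPerSecond,
        framesDropped: stat.framesDropped,
        // None of these say how the decoded picture actually looks to a human.
      });
    }
  });
}
```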

In a controlled environment, e.g. in the lab, or while doing unit testing, people can use reference metrics for video quality assessment, i.e. you tag a frame with an ID on the sender side, you capture the frames on the receiving side, match the IDs (to compensate for jitter, delay or other network-induced problems) and you measure some kind of difference between the two images. Google has full-stack tests that do just that for many variations of codecs and network impairments, to be run as part of their unit test suite.
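A toy illustration of that full-reference loop is sketched below; the Frame structure and the way the ID travels are invented for the example (real harnesses, like the libwebrtc full-stack tests, embed the ID in the image or carry it out of band):

```typescript
// Toy full-reference check: pair sent and received frames by an embedded ID
// (compensating for delay and jitter), then score each pair with PSNR.
type Frame = { id: number; pixels: Uint8ClampedArray }; // same width/height assumed

function psnr(ref: Uint8ClampedArray, recv: Uint8ClampedArray): number {
  let sumSq = 0;
  for (let i = 0; i < ref.length; i++) {
    const d = ref[i] - recv[i];
    sumSq += d * d;
  }
  const mse = sumSq / ref.length;
  return mse === 0 ? Infinity : 10 * Math.log10((255 * 255) / mse); // dB, 8-bit samples
}

function scoreCall(sent: Frame[], received: Frame[]): number[] {
  const referenceById = new Map<number, Frame>();
  for (const f of sent) referenceById.set(f.id, f);
  return received
    .filter((f) => referenceById.has(f.id))  // drop frames we cannot match
    .map((f) => psnr(referenceById.get(f.id)!.pixels, f.pixels));
}
```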

But how to do it in production and in real-time?

For most of the WebRTC PaaS use cases (Use Case A), the reference frame is not available (it would be illegal for the service provider to access the customer content in any way). Granted, the user of the service could record the stream on the sender side and on the receiving side, and compute a quality score offline. However, this would not allow one to act on or react to sudden drops in quality. It would only help for post-mortem analysis. How can it be done in such a way that quality drops can be detected and acted on in real-time, without extra recording, upload, download, …?

Which webrtc PaaS provides the best video quality in my case, or in some specific case? For most, this is a question that cannot be answered. How can one achieve this 4×4 comparison, or this Zoom-versus-webrtc comparison, in real time, automatically, while instrumenting the network?

CoSMo R&D came up with a new AI-based Video Assessment Tool to achieve such a feat in conjunction with its KITE testing engine, and corresponding Network Instrumentation Module. The rest of this blog post is unusually sciency so reader beware. 

INTRODUCTION

The first experiments with Real-Time Communication (RTC) over the Internet started in 1992 with CU-SeeMe, developed at Cornell University. With the launch of Skype in August 2003, RTC over the Internet quickly reached a large public. Then from 2011, WebRTC technology made RTC directly available in web browsers and mobile applications.

According to the Cisco Visual Networking Index released in June 2017 [1], live video traffic (streaming, video conferencing) should grow dramatically from 3% of Internet video traffic in 2016 to 13% by 2021, which translates to 1.5 exabytes (1 exabyte = 1 million terabytes) per month in 2016, growing to 24 exabytes per month in 2021.

As for any application that deals with video, Quality of Experience (QoE) for the end user is important. Many tools and metrics have been developed to assess automatically the QoE for video applications. For example, Netflix has developed the Video Multimethod Assessment Fusion (VMAF) metric [2] to measure the quality delivered by using different video encoders and encoding settings. This metric helps to assess routinely and objectively the quality of thousands of videos encoded with dozens of encoding settings.

But it requires the availability of the original, non-distorted reference video to compute the quality score of the same video distorted by video compression. This method, well adapted to video streaming of pre-recorded content where the original non-distorted video is available, cannot be applied to RTC where, usually, the original video is not available.

One may propose to record the original video on the source side before encoding and transfer to the remote peer(s), but then video quality assessment cannot be done in real-time. In addition, recording live videos during a real-time communication poses legal and security issues. For these reasons, the entity performing video quality assessment, for instance a third party platform-as-a-service (PaaS), might not be authorized to store the media. In that case, the analysis of the video to assess its quality is still doable as long as it happens while the video is in memory. It must not be recorded and stored on disk, which prevents usage of any kind of reference when assessing quality.

Therefore, the special case of RTC cannot be solved by metrics requiring the reference video. So it is necessary to use other metrics that are able to assess the quality of a video without requiring access to a reference. Such metrics are known as No-Reference Video Quality Assessment (NR-VQA) metrics.

I. Video Quality Metrics

Video quality assessment techniques can be classified into three categories.
First, there are full-reference (FR) techniques which require full access to the reference video. In FR methods, we find the traditional approaches to video quality: Signal-to-Noise Ratio (SNR), Peak Signal-to-Noise Ratio (PSNR) [3], Mean Squared Error (MSE), Structural SIMilarity (SSIM) [4], Visual Information Fidelity (VIF) [5], VSNR [6] or the Video Quality Metric tools (VQM) [7].

These metrics are well-known and easy to compute, but they are a poor indicator of quality of experience as shown by many experiments [8,9].

Then there are the reduced-reference (RR) techniques which need a set of coarse features extracted from the reference video.

At last the no-reference (NR) techniques do not require any information about the reference video. Indeed, they do not need any reference video at all.

A comprehensive and detailed review of NR video quality metrics was published in 2014 [10]. A more recent survey of both audio and video quality assessment methods was published in 2017 [11]. The metrics are classified into two groups: pixel-based methods (NR-P), which are computed from statistics derived from pixel-based features, and bitstream methods (NR-B), which are computed from the coded bitstream.

II. Previous Efforts for WebRTC Video Quality Assessment.

A first initiative to evaluate the video quality of a broadcast to many viewers through WebRTC was proposed in [12]. For this experiment, the authors use the structural similarity (SSIM) index [4] as the measurement of video quality. The aim of the test is to measure how many viewers can join the broadcast while maintaining acceptable image quality. The results are not conclusive at assessing precisely the user experience. As the number of viewers joining the broadcast increases, the SSIM measure remains surprisingly stable, with values in the interval [0.96, 0.97]. Then suddenly, when the number of clients reaches approximately 175, SSIM drops down to values near 0. It is unlikely that the user experience remains acceptable without loss in quality when there is an increase from 1 to 175 viewers. Besides, the test was performed using fake clients that implement only the parts responsible for negotiation and transport in WebRTC, not the WebRTC media processing pipeline, which is not realistic for assessing the video quality of a broadcast experiment.

In [13], the authors evaluate various NR metrics on videos impaired by compression and transmission over lossy networks (0 to 10% packet loss). The eight NR metrics studied are complexity (number of objects or elements present in the frame), motion, blockiness (discontinuity between adjacent blocks), jerkiness (non-fluent and non-smooth presentation of frames), average blur, blur ratio, average noise and noise ratio. As none of these NR metrics is able to provide an accurate evaluation of the quality of such impaired videos, they propose the use of machine learning techniques to combine several NR metrics and two network measurements (bit rate and level of packet loss) into an improved NR metric able to give video ratings comparable to those given by Video Quality Metric (VQM), a reliable FR metric providing good correlation with human perception. For this experiment, they used ten videos obtained from the LIVE Video Quality Database. These videos have been compressed at eight different levels using H.264, and impaired by transmission over a network with twelve packet loss rates.
They assessed the quality of their results against the scores given by the FR metric Video Quality Metric (VQM) [14], but not against NR metrics.

In [15], the authors rely on many bitstream-based features to evaluate the impairments of the received video and how these impairments affect perceptual video quality.

The paper [16] presents a combination of audio and video metrics to assess audio-visual quality. The assessment has been performed on two different datasets.
First they present the results of the combination of FR metrics. The FR audio metrics chosen by the authors are the Perceptual Evaluation of Audio Quality (PEAQ) [17] and the Virtual Speech Quality Objective Listener (ViSQOL) [18]. As for the FR video metrics, they used the Video Quality Metric (VQM) [7], the Peak Signal-to-Noise Ratio (PSNR) and the Structural SIMilarity index (SSIM) [4].
Then they present the results of the combination of NR metrics. The NR audio metrics are the Single Ended Speech Quality Assessment metric (SESQA) and the reduced SESQA (RSESQA) [19]. For the NR video metrics, they used a blockiness-blurriness metric [20], the Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) [21], the Blind Image Quality Index (BIQI) [22] and the Naturalness Image Quality Evaluator (NIQE) [23]. The best combination for both datasets is the blockiness-blurriness with RSESQA.

A recent experiment to estimate the quality of experience of WebRTC video streaming on mobile broadband networks has been published in [24]. Different videos of various resolutions (from 720×480 to 1920×1080) have been used as input for a video call through WebRTC between a Chrome browser and Kurento Media Server. The quality of the WebRTC videos has been assessed subjectively by 28 people giving a score from 1 (bad quality) to 5 (excellent quality). Then the authors made use of several metrics, all based on errors computed between the original video and the WebRTC video, to assess objectively the quality of the WebRTC videos. Unfortunately, the authors do not report clearly whether there is a correlation between the subjective assessment and the objective measures computed.

III. NARVAL: Neural network-based Aggregation of no-Reference metrics for Video quAlity evaLuation

III.1 Methodology

There are two main parts in this work: first, the extraction of features from videos representative of the video conferencing use case (as opposed to the pre-recorded content used by e.g. Netflix), then the training of a model to predict a score for a given video. We used six publicly available video quality datasets containing various distortions that may occur during a video communication to train and evaluate the performance of our model.

NARVAL TRAINING: Dense Deep Neural Network Graph

For the feature extraction part, we selected metrics and features published and evaluated on different image quality datasets. After calculating them on the videos of our databases, we stored the data to be able to reuse them in the training part. The data can then be processed to serve as input to our training model, for example by taking the mean of a feature over the video. For the second part, we used different regression models, mainly neural networks with variations of the input and the layers, and also a support vector regressor.

We tested multiple combinations of parameters for each model and only kept the best for each category of model. Convolutional, recurrent and time delay neural networks were used in addition to the most basic ones.

NARVAL TRAINING: 3D Convolutional Network Graph.

We trained our model on the databases using a 5-fold fit, then repeated the training multiple times. As each database contains multiple distortions, we cannot just split the folds randomly; thus we tried to choose the 5 folds so that every distortion exists in each fold, and we kept the same distribution for all tests. Only the mean over the folds is then taken into account.

Another approach to building a fold is to make a video and its distortions a fold. Using this method the folds are much smaller and the validation fold is completely new to the model.

III.2 Results

The results were first validated against a training set, i.e. a set with known scores, to see if our computed video quality was matching the known values, as illustrated below. As a sanity check, we then compared the score provided by NARVAL against the SSIM and VMAF scores on the same reference video. We can see that, while not exactly equivalent, the scores exhibit the same behaviour. Funnily enough, it also illustrates a result known in the image processing community but apparently counter-intuitive in the WebRTC community: the perceived video quality does not decrease linearly with the bitrate / bandwidth. You can see in the figure below that to reduce the quality by 10%, you need to reduce the bandwidth by a factor of 6 to 10!

Conclusion

Practically, it means that you can now use NARVAL to compute video quality in the absence of a reference frame or video! It opens the door to much simpler implementations in existing use cases, and to a lot of new use cases where the quality can be assessed at any given point of a streaming pipeline.

The full research report is available from CoSMo. CoSMo also provides licenses to two implementations: a Python implementation, more for research and prototyping, and a C/C++ implementation for speed and SDK embedding. Eventually, video quality assessment will be proposed as a service, not unlike the way Citrix’s AQA service was built on top of POLQA.

NARVAL has already been added to the KITE testing engine [25], to enable evaluation of the video quality of video services under all kinds of network and load conditions.

KITE is the only webrtc testing solution that allows you to test desktop and mobile clients (whether browsers or native apps). Its network instrumentation module allows one to programmatically control network features, separately client by client and server by server, to bring together all kinds of heterogeneous test beds. It allowed CoSMo to conduct the first comparative load testing of open source WebRTC servers [26]. If you are interested in having this capability in house, or in having us run tests for you, contact us.

Bibliography

[1] – Visual Networking Index, Cisco, 2017.
[2] – Toward A Practical Perceptual Video Quality Metric, Netflix, 2016.
[3] – Objective video quality measurement using a peak-signal-to-noise-ratio (PSNR) full reference technique, American National Standards Institute, Ad Hoc Group on Video Quality Metrics, 2001.
[4] – Image Quality Assessment: From Error Visibility to Structural Similarity, Wang et al., 2004.
[5] – Image information and visual quality, Sheik et al., 2006.
[6] – VSNR: A Wavelet-Based Visual Signal-to-Noise Ratio for Natural Images, Chandler et al., 2007.
[7] – A new standardized method for objectively measuring video quality, Margaret H. Pinson and Stephen Wolf, 2004.
[8] – Mean Squared Error: Love It or Leave It? A new look at Signal Fidelity Measures, Zhou Wang and Alan Conrad Bovik, 2009.
[9] – Objective Video Quality Assessment Methods: A Classification, Review, and Performance Comparison, Shyamprasad Chikkerur et al., 2011.
[10] – No-reference image and video quality assessment: a classification and review of recent approaches, Muhammad Shahid et al., 2014.
[11] – Audio-Visual Multimedia Quality Assessment: A Comprehensive Survey, Zahid Akhtar and Tiago H. Falk, 2017.
[12] – WebRTC Testing: Challenges and Practical Solutions, B. Garcia et al., 2017.
[13] – Predictive no-reference assessment of video quality, Maria Torres Vega et al., 2017.
[14] – A new standardized method for objectively measuring video quality, Margaret H. Pinson and Stephen Wolf, 2004.
[15] – A No-Reference bitstream-based perceptual model for video quality estimation of videos affected by coding artifacts and packet losses, Katerina Pandremmenou et al., 2015.
[16] – Combining audio and video metrics to assess audio-visual quality, Helard A. Becerra Martinez and Mylene C. Q. Farias, 2018.
[17] – PEAQ — The ITU Standard for Objective Measurement of Perceived Audio Quality, Thilo Thiede et al., 2000.
[18] – ViSQOL: The Virtual Speech Quality Objective Listener, Andrew Hines et al., 2012.
[19] – The ITU-T Standard for Single-Ended Speech Quality Assessment, Ludovic Malfait et al., 2006.
[20] – No-reference perceptual quality assessment of JPEG compressed images, Zhou Wang et al., 2002.
[21] – Blind/Referenceless Image Spatial Quality Evaluator, Anish Mittal et al., 2011.
[22] – A Two-Step Framework for Constructing Blind Image Quality Indices, Anush Krishna Moorthy and Alan Conrad Bovik, 2010.
[23] – Making a “Completely Blind” Image Quality Analyzer, Anish Mittal et al., 2013.
[24] – Quality of Experience Estimation for WebRTC-based Video Streaming, Yevgeniya Sulema et al., 2018.
[25] – Real-time communication testing evolution with WebRTC 1.0, Alexandre Gouaillard and Ludovic Roux, 2017.
[26] – Comparative study of WebRTC Open Source SFUs for Video Conferencing, Emmanuel Andre et al., 2018

Libwebrtc is open source, how hard can it be.


Recent discussions with several parties made me realise that the first steps to mastering webrtc are not yet documented enough. Once, as a challenge, I asked a master’s student doing his graduation project with us to try to recompile an example provided with libwebrtc separately from the compilation of libwebrtc itself: basically, to make a project with the same example source code, but that would link against the pre-compiled library. Five months later, it was still not successful. Thanks to our internal tools, which I then shared, we eventually did it in two weeks. This post is about the usual ordeal people have to go through, to explain the state of affairs, and of course how CoSMo can shield you from it.

libwebrtc from webrtc.org

libwebrtc, the implementation behind Chrome, Firefox, Safari, Edge and many others’ WebRTC API, is the de facto standard implementation for WebRTC-compliant clients.

Unfortunately, libwebrtc is one of a kind. Designed to be fully integrated into the Chromium build and test process, it includes some features that help a 1,000-developer-strong team remain agile, but that get in the way of the usual practices of smaller teams:

  • no design document,
  • no backward compatibility,
  • no stable releases, only sync’ed releases,
  • specific (undocumented) build tool chain,
  • build tool design that requires everything to be built in one pass,
  • 200 commits a week
  • no assurance of commit stability
  • lack of updated examples except for those which are shipped along

As a result, the usual experience with libwebrtc goes like this:

    1. The golden path [3 ~ 5 days]
      1. go to webrtc.org,
      2. figure out the exact development environment needed (esp. on Windows),
      3. get the code sync’ed for a specific revision,
      4. compile it,
      5. test with appRTC or peerconnection_client.
    2. Emancipation: try to compile your own project and link against libwebrtc [weeks]
      1. Simplest way: take out the source code of peerconnection_client and try to compile it separately,
      2. Deal with include paths,
      3. Deal with C++ standard (98,11,14, 17),
      4. Deal with C++ standard implementation lib mismatch,
      5. Deal with missing preprocessor definitions (WEBRTC_WIN, WEBRTC_POSIX, …),
      6. Deal with missing JSON symbols,
      7. Eventually succeed in barely reproducing something that was already working.
    3. Starting to deviate from the golden path: the quest for GN GEN args
      1. Try to make your own capturer, needs RTTI,
      2. add support for H264,
      3. add support for ccache,
      4. add support for cross-compilation (iOS, Android),

All of the above, and some more, deviate from the official build, but have existing, undocumented flags for them.

    4. Alone in deep water
      1. Any modification to the C++ code,
      2. Changing external libs (OpenSSL, libsrtp, …),
      3. Adding a new codec,
      4. Compiling for an unsupported platform (win32 for IE plugin, …),
      5. Compiling the C++ API for arm64 (for Qt, …),

All of the above are quite reasonable things to do, but still excruciatingly hard to do in libwebrtc.

    5. The final blow: the ever-recurring 6-week deadline

Once all of this has been figured out, you have a version of your app linking against libwebrtc (or you gave up already). And then Chrome releases a new version, which breaks backward compatibility, which makes your app basically unusable… Since libwebrtc is not a simple single git repository, but an aggregated collection of them, rebasing is not an option.

In a less critical scenario, nothing breaks, but you do not benefit from the bug fixes, improvements and new features that the 20+ companies, including 120 Google engineers, have added to the source code since the last release.

CoSMo Tools for easier webrtc

libwebrtc packages

We provide pre-compiled, packaged versions of libwebrtc, so you can focus on your app.

  • All the operating systems supported by webrtc.org are supported,
  • All libraries needed in addition of libwebrtc are packaged along,
  • CMake configuration files are provided to deal with compatibility:
    • check your compiler and build tool for compatibility,
    • set up all include and lib paths,
    • set compiler flags,
    • set preprocessor Definitions,
  • The builds are tested first using Google’s own unit test suite,
  • Then an additional battery of tests, which compile full applications, is run,
  • Finally, we run those applications against web apps in browsers for interoperability tests, thanks to KITE.

Advanced packages and tools

We provide customized builds of libwebrtc with different options, e.g.

  • Real end-to-end encryption (a.k.a. double encryption),
  • Watermarking,
  • G.729, H.265, AV1,
  • Custom Crypto (custom OpenSSL and libsrtp),
  • …,

We provide bindings and wrappers for webrtc into popular client libraries to make it even easier for you to use libwebrtc: no need to understand threading models or COM models. Moreover, an update of libwebrtc itself is transparent to the wrappers.

  • Qt wrappers, desktop and mobile,
  • Electron builds,
  • React Native,
  • IE 11 plugin,
  • …,

Full white-label apps and single-server systems

We maintain a suite of white-label apps that work either p2p or with an SFU, with a given signalling server. It feeds our test suite, and is the perfect starting point for proof-of-concepts.

  • appRTC, p2p connection, web app + native app desktop and mobile,
  • Janus Video Room, SFU connections, web app + Electron + Qt app + Mod. OBS-Studio,
  • Medooze, SFU connection, web app + Electron + native all platforms + Qt app + Mod. OBS-Studio,
  • Jitsi, SFU connection, web app + Electron + react-native ios and android,

Do not hesitate to contact us for help and pricing!

QUIC is the future of #WebRTC ….. or is it?


QUIC has been the source of all the heat during internet meetings for at least the past two years, after being kind of restricted to a smaller group since its existence became known in 2013. There is a reason for that: QUIC as a transport protocol takes the best of TCP and the best of UDP, adds encryption for security and multiplexing for speed, and brings additional improvements to make sure deployment will be fast on top of existing equipment, and updates (in user land) will be faster than before (in the kernel). For those interested in a first overview (possibly slightly biased, as indicated in the respective sections), Wikipedia is always a good start. For the hardcore fans, the corresponding IETF working group and mailing lists will satisfy all your desire for detailed information. The latest drafts (#16) are hot off the press as IETF 103 is ongoing in Bangkok this week.

Introduction

Even though you may think a transport protocol should be developed independently of the applications running on it (the good old OSI model commitment), the history of QUIC is entangled with that of HTTP/2, and the mapping of HTTP/2 on top of QUIC semantics is evolving almost in parallel. Arguing the need to converge in time for IETF 103, the QUIC working group actually requested that the ongoing work be restricted to that single use case. The subject is hot, strategic, and a lot of money is involved, which surely explains the 16+ different implementations available today from several prominent actors of the internet.

The main apparent players behind QUIC are of course the web companies, but also, not surprisingly, the CDNs. Akamai is a big adopter of the technology and has employees among the authors of many of the specs.

Media over the internet is de facto cut into two ecosystems: the broadcasting world and the real-time world. In the former, most of the distribution is file-based and HTTP-based, and the focus on HTTP/2 over QUIC is logical. In the latter, most of the communications are RTP-based (RTSP/RTCP/SRTP/WebRTC/…). The exact same dichotomy is present within AOMedia with respect to AV1, for example.

There is a real question about RTP and QUIC that would require additional investigation: should we keep RTP for real-time media, or shall we drop it, knowing that some of the mechanisms in RTP will be redundant with those in QUIC? If we keep it, how do we map RTP on top of QUIC semantics and frames, and how do we multiplex all those protocols? If we drop it, how do we manage the mechanisms specific to media that are NOT in QUIC?

As usual in consensus-driven groups, one can request but not impose, and several groups and individuals with a specific interest in (Real-Time) Media over QUIC have elected to spend their time on what was important to them, we can assume without any intent to delay the work of the QUIC group.

Here are only the initiatives I am aware of; there are likely more.

A. Coming from ORTC, some have made an early implementation of a QUICTransport and a QUICStream whose code is available in the Chromium code base. The goal was to experiment with the transfer of data only and not media.

B. To prepare for more flexible pipelining in the media stack, as presented at the interim meeting in Stockholm, the Google team is slowly pushing more modular classes into WebRTC to allow one to bring their own encoder, encryption, media transport, and network transport. You can also see work being done on a generic RTP packetizer….

WebRTC Issues related to NV:

– Support attaching RTP extensions differently to the first packet of a video frame layer
  https://bugs.chromium.org/p/webrtc/issues/detail?id=9680
– Refactor classes representing encoded video frames
  https://bugs.chromium.org/p/webrtc/issues/detail?id=9378
– Reduce number of classes representing video codec configuration to a reasonable number
  https://bugs.chromium.org/p/webrtc/issues/detail?id=6883
– Integrate Per Frame Encryption Interface Into WebRTC
  https://bugs.chromium.org/p/webrtc/issues/detail?id=9681
– Implement pluggable media transport
  https://bugs.chromium.org/p/webrtc/issues/detail?id=9719
– Add picture_id to generic RTP packetizer format
  https://bugs.chromium.org/p/webrtc/issues/detail?id=9582

WebRTC Patches in Review:

– Adds integration of the FrameEncryptor/FrameDecryptor into the MediaChannel
– Add video support to media transport interface
– Interface for media transport
– Adds the Java interface points for FrameEncryptor/FrameDecryptor
  https://webrtc-review.googlesource.com/96865

C. On a separate effort, Colin Perkins, chairman of the RMCAT working group (which deals, among other things, with bandwidth estimation and congestion control), together with another individual from callstats.io, wrote a very interesting paper describing in a very structured way the problems of both direct-media-over-QUIC and RTP-over-QUIC.

D. The AVTCORE group, which is in charge of everything RTP, is looking into the details of multiplexing QUIC with all the other protocols RTP needs to support. The usual suspects are authoring this draft: Peter T. from Google (ORTC), Bernard A. from Microsoft (ORTC), and Colin Perkins, mentioned before.

E. The TAPS working group is focusing on supporting QUIC as one of the transports of their protocol, and has reached out through the voice of Tommy Pauly (Apple) for feedback.

Bear in mind: every single one of those groups has a different end goal, and within some of the groups you can also have divergences. The number of use cases for QUIC is equivalent to the number of use cases for UDP and TCP put together. Of course, for everyone, THEIR use case is the most important and should prevail.

Let’s take an example and roughly list the positions we can see emerging from the “webrtc ecosystem”.

Stop throwing new things in, finish 1.0.

This is the very explicitly stated position of many, including Apple. Different people have different reasons for it. The W3C working group is looking at the end of the current charter approaching fast (18 months), and the late implementation of unified plan and of all the APIs needed for simulcast makes testing simulcast difficult. As stated last week in Lyon: “Simulcast is a mountain whose summit elevation is unknown. We’re not only fighting an uphill battle, but we have actually almost no way to evaluate how long it’s gonna take.” For W3C staff, as well as the chairman and editor, this is a major concern. Apple and some other vendors would like to stabilise webrtc 1.0 as well, and some have pointed out that working on other things, including QUIC, takes from the limited time of the chairs and editors.

QUIC is simply not mature enough

This has been the Mozilla position in the WebRTC group for all of 2018, expressed many times, in particular during the interim face-to-face meeting in Stockholm in May, and at TPAC in Lyon two weeks ago. Those who disagree argue that the chair of the QUIC group, a Mozilla employee, is aiming at Q4 2018 for standard documents, and that other groups should not wait too long to investigate what it would take to adopt it, so that the decision about whether the WebRTC working group should adopt it or not is an informed one. For others, QUIC is already happening, and if the WebRTC group does not make its own decision, it will be made for them. (The same argument has been brought forward for SVC codecs.)

There seems to be a consensus from all parties involved anyway that an unreliable mode (more UDP-like than TCP-like behaviour) would be needed for media over QUIC, which is not on the table today. The latest unidirectional QUIC streams have also broken a few things.

My techno that I love is working, I wrote the library, do not replace it by anything, ever …

It is human nature to push back against change, and it should be expected here again. The investment required to reach the level of expertise to participate in the WebRTC standard and its implementation is important, and the time spent by some on writing a software project, their project, THE project, makes some very emotionally involved.

Different people express their emotions in different ways, ranging from blind rejection to cautious rejection. It usually translates into anything from “There is no reason to stop using my techno” to “I’m waiting to see a real use case that would justify moving away from something that works today.” Except that there might never be any use case other than consolidation of underlying protocols, or “X decided so”, X being a sufficiently powerful entity to make it happen in any case.

My personal point of view is slightly more nuanced, but I stay on the side of caution:

  • QUIC is the future, we could delay it, but we can’t avoid it. So was WebRTC at one point.
  • moving directly away from RTP would leave a LOT of existing webrtc infrastructures without any interoperability capacity. This is, IMHO, too brutal an approach. The team behind QUIC originally spent a lot of time making practical design choices so that QUIC could be an incremental enhancement of current technology (UDP-based), to make sure adoption could be fast. I trust them to have the same practical approach for Real-Time Media over QUIC. Here, I hope the feedback from webrtc media server implementors will be taken into account.
  • The compromise could be to provide both choices, i.e. an upgrade path for existing RTP infrastructures with an implementation of RTP over QUIC, while proposing real-time media over QUIC directly for those who want it (native apps, …)
  • Note that WebRTC 1.0 is not going away, so one will always be able to generate and consume (DTLS-S)RTP streams, as well as SCTP streams. 

Conclusion:

I think we illustrated that even within WebRTC, there are several opinions when it comes to QUIC. If you add all the other opinions from all the other groups involved in QUIC, some of them cited above, that makes a lot of people trying to be the captain and move the boat in a different direction.

Both IETF and W3C are consensus-based organizations, to different degrees. No opinion is ever suppressed; everybody is free to open tickets, ask questions on the mailing list, and request time on the agenda during the meetings. You will be heard and read. However, you need to manage your expectations about your capacity to do anything beyond that.

You would need to rally enough people to your opinion to reach a consensus, and that’s a time-consuming task which requires specific, non-technical skills. You need to convince people that your point is not only valid, but worth them spending their time on. Basically you have to go around and convince everyone that what you propose is better FOR THEM. That means you need to invest the time first to understand what they want, and what they don’t want, and take this into account in the final proposal so their interests are aligned. You need to create goodwill. This is more diplomacy than technology.

If your scope is too narrow, you will not have enough people to reach the critical mass. If you do not compromise, ditto. If you ask other people to spend time helping you (find use cases, prove you wrong/right, …), they won’t: most of them already do not have enough time to deal with everything they need to do. Most of this work is done in parallel with the meetings and constitutes the core difference between those actually attending the meetings in the flesh and those attending remotely.

One clear sign of failing “diplomacy” is when nobody answers your e-mails or your questions. Only the editors and the chairs have a duty to answer; however, the activity generated in the different official venues (the GitHub of the spec, the mailing list, and the meetings in the case of WebRTC) is a good proxy of the interest of the group and/or the potential for consensus, on which the chairs and editors base their decisions.

Mozilla’s Adam Roach has always been my model there, and you can see some of his skills in action here:

“the hallmark of a good compromise is that nobody leaves happy, but everyone can force themselves to accept it”

HOT ‘n’ EASY WebRTC FUZZ


by invited guest blogger: Sergio Garcia Murillo


There has been quite some buzz (and fuzz) in the webrtc world recently due to a series of great posts from Natalie Silvanovich of Project Zero, in which she demonstrated the power of end-to-end fuzzing to find vulnerabilities and bugs in webrtc stacks and applications, including browsers, FaceTime, WhatsApp….

Inspired by the above wonderful but admittedly complicated work, we decided to investigate how we could provide a much easier to use fuzzer to stress test all webrtc apps with. Combined with the automation and interoperability testing capacities of KITE, it would become the perfect stress tool for the entire WebRTC community.

Let’s address the elephant in the room. Our goal was not to replace unit test fuzzing which is guaranteed to give better results than our solution.

Unit test fuzzing, and the LLVM project’s implementation libFuzzer, is the best way to go at it. While very well documented (*), using libFuzzer, or reproducing the process followed by Project Zero, requires great computer science skills, as well as time and resources that most projects simply don’t have.

Practically, not all teams have resources on par with Google’s, and few can afford to spend weeks (or months) creating either unit tests for every single line of code or mockups for all functionalities to enable end-to-end testing.

We wanted something usable right away, out of the box, without any modification of your existing client app. We wanted non-coders to use it. We wanted CEOs to use it. We wanted shareholders preparing for an Extraordinary General Meeting with a CEO claiming to have battle-tested technology to be able to verify the claim in 15 minutes max, by simply loading a webpage. We want to redefine “easy testing” of webrtc applications and systems.

FUZZING OF WEBRTC

According to Wikipedia, a fuzzer can be categorized as follows:

  • A fuzzer can be generation-based or mutation-based depending on whether inputs are generated from scratch or by modifying existing inputs,
  • A fuzzer can be dumb or smart depending on whether it is aware of input structure, and
  • A fuzzer can be white-, grey-, or black-box, depending on whether it is aware of program structure.

We have decided to implement a dumb mutation-based black-box fuzzer inside a webrtc-capable browser.

Typical webrtc attack vectors for an SFU are:

  • Signalling, SDP O/A,
  • DTLS, SRTP
  • ICE
  • RTP, RTCP
  • SCTP
  • Codec payloads

We decided to focus our attacks against the RTP and RTCP protocols, as well as the corresponding encoded media payload part. The reasons are manifold:

  • Most SFUs use standard libraries for SRTP/DTLS and ICE, which are extensively tested. They would make poor targets.
  • Equally, signaling and SDP O/A are very often parsed and/or serialized, two operations very well suited for unit test fuzzers.
  • Most of the workload on an SFU is RTP/RTCP packets management and corresponding encoded media payload inspection.

Our fuzzer works by mutating a certain amount of bits within the RTP and RTCP payloads before the SRTP encryption is applied. We don’t randomize whole bytes in the input as proposed in the Project Zero article, but instead flip individual bits, as this is more likely to pass RTP parser checks and generate errors deeper in the WebRTC stack. A grenade has a better impact when it explodes after being ingested 🙂
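To make the mutation step concrete, here is a minimal TypeScript sketch of the bit-flipping idea (the real fuzzer lives inside libsrtp and is written in C; the function name, the number of flips, and the choice to leave the 12-byte fixed RTP header untouched are assumptions made for illustration):

```typescript
// Hypothetical illustration of the mutation step: flip a few random bits
// in an RTP packet, leaving the 12-byte fixed header intact so the packet
// still parses and the error surfaces deeper in the receiving stack.
function flipRandomBits(packet: Uint8Array, flips: number = 2): Uint8Array {
  const mutated = packet.slice(); // work on a copy
  const headerSize = 12;          // assumed: fixed RTP header left intact
  if (mutated.length <= headerSize) return mutated;
  for (let i = 0; i < flips; i++) {
    const byteIndex =
      headerSize + Math.floor(Math.random() * (mutated.length - headerSize));
    const bitIndex = Math.floor(Math.random() * 8);
    mutated[byteIndex] ^= 1 << bitIndex; // flip one bit
  }
  return mutated;
}

// Usage: mutate the packet just before it would be SRTP-protected.
// const fuzzedPacket = flipRandomBits(rtpPacket);
```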

CoSMo’s HOT n EASY FUZZER

As libsrtp performs some RTP header verifications before protecting each packet, we have implemented the fuzzing logic inside libsrtp itself. This way we are sure that the modifications are made at the last step of packet creation, and that no error can be generated on the sender side.

Having extensive experience in all kinds of libwebrtc-based builds, we can then trivially leverage this “weaponized” libsrtp in libwebrtc, in open-source browsers using it (Chrome, Firefox, …), in Electron builds and/or in native clients.

Pros:

  • Works against any webrtc-based solution that has a web client available.
  • Easily embeddable in any native client code.
  • No set-up.
  • No coding required.
  • No modifications on the SFU (if any).
  • End-to-end testing on real-life scenarios.

Cons:

  • Somewhat slower at finding bugs than a unit test fuzzer that tests tens of thousands of RTP/RTCP packets per second.
  • Not deterministic, as it works on real-life inputs from the browser.
  • Requires being able to debug or troubleshoot the SFU (if any) or remote client (if p2p) to actually find the root cause of any crash.
  • Not very efficient against java-based implementations like Jitsi.

TESTS AGAINST REAL TARGETS

We compiled the current stable version of both Chrome and Firefox against our modified version of libwebrtc.

We first used those modified versions with appRTC, a p2p 1:1 webrtc application, against unmodified browsers. It looks like the Project Zero team has done a good job: after 15 minutes we did not have a single crash against either Chrome or Firefox.

We then ran it against the hosted demos of several webrtc SFUs. Medooze, Mediasoup and Janus were down in a matter of seconds. A Jitsi Videobridge instance kindly provided by the Jitsi team sustained 15 minutes of attack without crashing (thanks in part, we assume, to Java, and in other part to the great job of the Jitsi team). We did not test any other SFU. Several bug fixes have been provided to the open-source repositories of the above-mentioned SFUs.

We eventually ran it against the web clients of several well-known PaaS and CPaaS, for no more than 15 minutes at a time, and for most of them we communicated the results of our run to them.

As a shareholder of Temasys with a special interest in the matter, I also ran it against its platform (with MCU enabled, otherwise I would be testing the browser itself in p2p), and the results are equally insightful.

CONCLUSION

We’ve shown that we can easily democratise fuzzing by modifying webrtc internals and building native apps (or browsers) with the modifications in. Now, anybody with absolutely no coding skill at all can test any webrtc system or app, whether the media is exchanged in P2P or through a media server.

Of course this naive testing will be much slower at discovering vulnerabilities than fuzzing unit tests. Of course a non-coder will only be able to tell whether a vulnerability exists by observing the service or app crashing, without being able to actually point to the source of the vulnerability, but isn’t that already great information? Moreover, in the hands of people who can actually code and attach a debugger to the right app or server, you now have a no-setup, no-homework fuzzing tool at your disposal in your ever-growing testing toolbox.

Even easier: this external fuzzing capacity has been added to the KITE testing engine for automation, and is currently available with open-source browsers like Chrome and Firefox. Packages of modified libwebrtc are available for quick integration in SDKs to address the testing needs of mobile-first / mobile-only enterprises.

Happy testing.

(*) – libFuzzer further reading material


Happy new year 2019 – so many good things #WebRTC to come from CoSMo.


We’re a little bit late to the party, many of the other WebRTC vendors and consultants have already wished everyone the best, but better late than never. We also list a few of the good things you should expect from CoSMo this year.

While 2018 was a high-growth year for CoSMo (+100%), with a lot of innovation and R&D, 2019 will see more of the products and services so far reserved for our internal developments and professional services become available.

Research and Development

Of course, we are planning to keep innovating, and we have already signed research agreements with several universities, e.g. with EPFL for a master’s thesis on watermarking, as well as industrial R&D collaborations.

Our scientific work on a no-reference video quality assessment tool, NARVAL, has already been integrated into KITE. It was instrumental in enabling the first large-scale comparative study of WebRTC SFUs. What we loved most about this work is that it was a community effort. All the webrtc SFU teams were involved in the process, and all tested SFUs ended up fixing major performance bugs along the way! In 2019, we plan to maintain a live page of the results, to allow the teams to update the results every time there is a new release, and to allow new teams to participate. If you have a webrtc media server and want to participate and/or appear in the results, let us know.

KITE, the best WebRTC testing solution out there (the only one that supports desktop and mobile browsers, as well as native apps, Electron, …, on-premises or hosted), has also been instrumental in maturing webrtc implementations in the browsers. Coupled with the Medooze server and network instrumentation, it offers a full simulcast testing environment, used today by Google, Apple, and certainly a few others. The next WebRTC standard interim meeting, on January 22nd 2019, will see the presentation of many results generated thanks to KITE. CoSMo has also contributed several patches to almost all the browsers out there (Google and Apple H.264 simulcast implementations, Mozilla RTX implementation, ……..), and to several WebRTC SFUs.

Engineering at the edge

Preparing for the future and for WebRTC NV, CoSMo also has a few projects ongoing with respect to VR (specifically media / data sync), the AV1 codec, and QUIC.

Already very active during the last IETF meeting in Bangkok around QUIC, and specifically media over QUIC, CoSMo will attend the special Hackathon / WG meeting in Tokyo, January 28th to 31st, 2019. We hope for convergence of the core specification, convergence that had been promised for early 2018….

CoSMo had been working on AV1 for the best part of last year already, and the first real news of 2019 is that we have been accepted as a member of AOMedia, to participate in the evolution, specification, and testing of real-time AV1. While the codec itself is specified, the RTP payload is not, and it is one of the missing pieces to enable AV1 usage in WebRTC. CoSMo maintains an AV1 payload implementation in a modified WebRTC stack, and our media server Medooze supports it as well. The AOMedia version of the AV1 payload is much more advanced, resulting in lower bandwidth usage, lower SFU CPU overhead, as well as better distributed bandwidth usage (fewer bursts). The immediate goal is to bring the AOMedia version of the AV1 payload into CoSMo’s modified WebRTC stack to then enable testing with KITE. A secondary goal is to write the payload specification to be brought to the IETF payload working group.

Making WebRTC easy to use

It is early 2019, and webrtc 1.0 is getting there in most of the browsers. However, using it in native apps requires handling libwebrtc, and that is still more painful than it should be.

As an app developer, I do not want to go through the pain of understanding Google’s build system, figuring out the right options, ….. I want to use a system install of the library and let a package manager handle dependencies and/or configurations. That is actually one of the first things we are going to propose off-the-shelf in Q1: pre-compiled, tested, validated, documented libwebrtc packages! That should save you a month or two when starting a new native app / SDK project, and even more in the long term if you register for updates and migration guides. It should be available mid-February, after Chinese New Year. If you want more information right away, contact us.

If you’re a bigger team with devs who modify libwebrtc and need to be able to build fast, test fast, and possibly debug the result, we have you covered as well. We provide licenses to our build scripts, distributed build system, distributed test system, dashboard, dependency management with Conan, and so on. You’re not going to spend 50% of your time waiting for WebRTC compilation (or fetch!) anymore.

Beyond libwebrtc itself, we also provide different bindings and libraries, e.g. Qt (C++/QML) wrappers. That pushes webrtc specifics even further down, and allows you to code easily in your own environment. Most of our wrappers expose an API that is as close as possible to the JS one. In the case of Qt with QML, that allows one to reuse signalling SDKs almost directly. Those libraries are provided with examples, p2p implementations interoperable with appRTC, and clients for the usual webrtc SFUs like the Janus videoroom plugin.

Interestingly, we also propose builds of libwebrtc not supported by Google, e.g. with OpenSSL 1.1, additional codecs, watermarking, …… and our entire solution portfolio then leverages them.

If it is not tested it’s broken

KITE is really the product that will get the most love from us this year. Having a great test engine really makes a difference when you want to develop advanced features, and when you want to compare yourself against the competition.

Nowadays, people are mostly interested in three things: interoperability testing, load testing and monitoring.

Callstats.io solved the monitoring problem very well, and for everybody who does not have anything today, they are a perfect choice. More advanced players will already have monitoring and analysis in place.

Interoperability testing, sometimes called end-to-end testing, is the capacity to have a successful “call” between two clients of your solution. Testing only 1:1 calls in desktop browsers is terribly easy and does not have a lot of added value. What is difficult is:

  • Testing the latest (nightly, canary, …) versions of browsers, to report bugs in time for them to be fixed ASAP,
  • Testing mobile browsers (on real devices),
  • Testing native apps (desktop or mobile),
  • Controlling network quality and bandwidth programmatically,
  • Assessing media quality.

For all those cases, KITE in interoperability mode is perfectly suited, if still a little bit difficult to use. We’re addressing that shortcoming.

Load testing is easier in a way, as you do not need to handle multiple clients. It’s challenging though, as you need to generate a lot of traffic, possibly with specific timing, to be able to really test at scale. For example, the original tests of the Kurento media server by testRTC, limited to a few hundred streams, did not find the problems that a more thorough investigation at a higher scale by KITE in load testing mode found right away.

You can do it, we can help.

CoSMo will continue actively participating in the key committees and consortia to stay aware of the near future. We’ll continue our collaboration with the other webrtc experts out there (the ones who code, more than the ones who write opinions).

We are preparing a lot of self-service packages to make your first adoption of webrtc easier and faster, and the maintenance easier down the road.

If you need more, if you need full system architecture and development, or simply want to be informed by people who are part of the process, who are actually implementing and testing those technologies, if you want examples and reproducible tests to check by yourself, contact us.

WebRTC 1.0 Simulcast Hackathon @ IETF 104


Finally finding the time to go back to the root of this blog: the webrtc standard. In this post we will revisit what simulcast is and why you want it (product view), before we go into the implementation status score cards for both browser vendors and the open-source SFUs out there, which could be established after a fast and furious hackathon in Prague two weeks ago. WebRTC hackathons before the IETF meetings are becoming a tradition, and as more and more people come together to participate, the value for any developer of joining keeps getting higher. We hope to see more participants next time (free food, no registration fee, all technical questions answered by the world experts, what else could we ask for?)

Simulcast in WebRTC 1.0 refers to sending different resolutions of the same media source to a relaying media server (SFU). The SFU can then relay the most adequate resolution on a per-viewer basis depending on the viewer’s bandwidth, screen size, hardware capacity, …

The SFU can also dynamically adjust which resolution is being sent, should the conditions on the receiving side change. This is most often the case with bandwidth over the public internet, which can vary a lot, or when one goes from ethernet to wifi, or from one mobile antenna to the next.

This comes at some extra burden on the sender’s CPU and bandwidth usage. Practically, spatial resolution goes down by half in each dimension (x, y) at a time, which reduces the number of pixels in each resolution by a factor of four. The CPU footprint and bandwidth being proportional to the number of pixels, you end up using 25% more CPU or bandwidth for an extra resolution, then an extra 6.25% for the next resolution, 1.6% for the next, and so on and so forth. The additional cost is considered marginal compared to the added user experience value for the viewers. Moreover, in streaming use cases, it is usual that the computer streaming out is a high-end studio computer where some composition is happening in real-time, or at least a desktop-level computer.
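For readers who want the arithmetic behind those percentages, the total overhead of the extra layers is a geometric series (assuming, as above, that encoder cost is proportional to pixel count and that each layer has a quarter of the pixels of the previous one):

```latex
\text{overhead} = \sum_{k=1}^{\infty} \left(\frac{1}{4}\right)^{k}
                = \frac{1/4}{1 - 1/4}
                = \frac{1}{3} \approx 33\%
\qquad \text{(one extra layer: } 1/4 = 25\%\text{, the next: } 1/16 = 6.25\%\text{)}
```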

Traditionally, at least in the WebRTC ecosystem, the term “simulcast” is used when changing the spatial resolution through separate encoders generating separate bitstreams. This is independent of the media transport protocol used in WebRTC: (S)RTP.

For modifying temporal resolution, the approach is slightly different, and is based on layered codec technology called SVC. S stands for scalable, and the process is then called “temporal scalability”. It consists of using RTP headers instead of the bitstream to tag video frames. Changing temporal scalability (e.g. from 30 fps to 15 fps) is then just a matter of dropping packets, which can be done extremely fast (milliseconds) in the media server. Most browser vendors implement both simulcast and temporal scalability for VP8, and simulcast for H.264. Temporal scalability for H.264 is in progress (patch in review).
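To illustrate the idea, here is a conceptual TypeScript sketch of the per-viewer relay decision an SFU can make once each packet is tagged with its simulcast stream and temporal layer (the packet fields and function are illustrative assumptions; real SFUs parse this information from RTP header extensions or the codec payload descriptor):

```typescript
// Conceptual relay filter: keep only the selected simulcast stream (rid)
// and drop packets whose temporal layer exceeds the selected frame rate.
interface VideoPacket {
  rid: string;        // which simulcast stream this packet belongs to
  temporalId: number; // 0 = base layer; 1, 2, ... = higher frame rates
}

function shouldRelay(
  packet: VideoPacket,
  selectedRid: string,
  maxTemporalId: number
): boolean {
  return packet.rid === selectedRid && packet.temporalId <= maxTemporalId;
}
```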

Moreover, most browsers supporting VP9 (and AV1) also implement full SVC support, which means that there is only one bitstream, and switching between spatial and temporal resolutions in the SFU is quasi-instantaneous (note: AV1 is implemented as decoder only for now).

Thanks to those, the SFU can be smart about the way it manages outgoing bandwidth by choosing which stream to relay. It should also be smart about how it manages the incoming bandwidth, which is a little trickier. Let’s say, for the sake of argument, that you are using three resolutions. If the bandwidth between the sender and the media server were to be reduced, what should happen?

The natural answer is: drop the highest resolution, as we have been conditioned to that behavior by years of Skype usage. However, smarter bandwidth allocation could be used depending on your use case. Some might want to protect the highest resolution stream by default, shutting down the lowest resolution(s) if that allows staying within the available bandwidth. Some might decide based on which resolutions are used by the viewers at that time: kill the streams with no viewers, or with the fewest viewers, …..
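As an illustration of the “protect the highest resolution” strategy on the sender side, individual simulcast encodings can be switched on and off through RTCRtpSender.setParameters(); a minimal sketch, assuming the encodings were created with rids “h”, “m” and “l”:

```typescript
// Disable the lowest simulcast layer to free uplink bandwidth while
// keeping the high-resolution layer alive (the rid names are assumptions).
async function dropLowestLayer(sender: RTCRtpSender): Promise<void> {
  const params = sender.getParameters();
  for (const encoding of params.encodings) {
    if (encoding.rid === 'l') {
      encoding.active = false; // this layer stops being encoded and sent
    }
  }
  await sender.setParameters(params);
}
```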

That is why the notion of simulcast in WebRTC 1.0 is entangled with the notion of “multiparty signaling”. While simulcast can be achieved with separate peer connections, one per resolution, that does not allow smart bandwidth management on the sender side, as bandwidth management, a.k.a. congestion control, is done on a per-peer-connection basis. Simulcast has been possible for more than a year now, as long as you were using separate peer connections, hacking your way around the signaling (mangling SDP), and some other workarounds. Full-fledged simulcast, bringing better connectivity (fewer ports used), full bandwidth management on the sender side, and so on and so forth, requires a single peer connection, using some new APIs, correspondingly capable codecs, …… This all only came along very recently in browsers.
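For reference, here is a minimal sketch of what spec-compliant simulcast on a single peer connection looks like with those new APIs, using addTransceiver() with sendEncodings (the rid values and scaling factors are arbitrary choices for illustration):

```typescript
// Three simulcast layers on a single peer connection, no SDP mangling.
async function publishSimulcast(): Promise<RTCPeerConnection> {
  const pc = new RTCPeerConnection();
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  const [track] = stream.getVideoTracks();

  pc.addTransceiver(track, {
    direction: 'sendonly',
    sendEncodings: [
      { rid: 'h' },                           // full resolution
      { rid: 'm', scaleResolutionDownBy: 2 }, // 1/4 of the pixels
      { rid: 'l', scaleResolutionDownBy: 4 }, // 1/16 of the pixels
    ],
  });

  // The generated offer carries the rid / a=simulcast attributes that
  // the SFU uses to identify each layer.
  await pc.setLocalDescription(await pc.createOffer());
  return pc;
}
```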

The Hackathon

So here is the challenge: parts of simulcast in WebRTC 1.0, like the sender side and the corresponding signalling, are standardised; other parts, namely everything about the SFU and the receiving side, are not. The specifications span at least two standardization groups: W3C, which has a test suite, and IETF, which does not, but for which the WebRTC ecosystem has created a special testing infrastructure called KITE.

As leaders of WebRTC testing and members of all the standards committees, and in collaboration with Google, which co-chairs most of the working groups, CoSMo went ahead and organized a hackathon to help give all the stakeholders visibility on the current state of affairs, bring simulcast faster to maturity, and eventually (finally! those slides were written in 2015!!) call WebRTC 1.0 “DONE”.

The hackathon at IETF 104 was the biggest ever, as people realized the value of working together on some subjects. I suspect that the free food helped too. While we had done WebRTC hackathons before, around identity and security in London, and around bandwidth management in Bangkok last year for example, this was the best organized and biggest WebRTC hackathon to date. 19 individuals registered for WebRTC, with 13 listing ONLY WebRTC. All main browser vendors were represented: MS, Google, Mozilla, Apple. Many open-source SFU tech leads were on-site (Meetecho, Medooze, …) while many others had prepared tests to be run by us, like Mediasoup. Two W3C staff members came to help as well, as getting visibility on what’s left to be done is their biggest concern today.

THE RESULTS

The details of each bug found, filed and fixed, the new tests, and all the detailed metrics can be found on the wiki and the corresponding presentation, but suffice to say, having browsers, SFU devs, and KITE experts around the table was really efficient. Meetecho has written a great blog post about it. I’m going to illustrate it with a few handpicked items.

The W3C and the browser vendors were really interested in seeing a global browser status card, and in setting up automated testing to be able to check their progress (and avoid regressions) from that day forward. A specific version of an SFU, written on top of Medooze with a “lean-n-mean, only-accept-spec” mode, had already been provided to all browser vendors last year, and e.g. Apple had been using it a lot to come up with their first simulcast implementation. The above table required the interaction between all of us, plus some of the specification writers in the case of bandwidth estimation. Moreover, the results have been vetted by each browser vendor as being true to the best of their knowledge, to avoid bias. The result is a reference status table the W3C and others can use to plan and roadmap their transition to a spec-compliant simulcast world.

For the SFU vendors, there is a need to be more pragmatic. You cannot be more catholic than the pope, and if the browsers do not implement simulcast, why should they? Most of them then tend to implement what the browsers have implemented, and lag behind a little bit. This is especially true of commercial SFUs or services. While we originally included some commercial SFU results, we have decided to remove them from this table. For the reason cited above, they do not support simulcast, and that should be expected. Colouring them in red, because they do not respect the spec today, makes them feel like they’re being told their baby is ugly. Since this is not a judgement piece, but a factual compliance piece, and since we would not be able to double-check their claims ourselves anyway, we felt it was more reasonable to just take them out.

Open-source SFUs are more prone to implement the latest tech, and more open to constructive criticism, while still needing to be pragmatic. The result is that they support several flavours of simulcast, often in parallel, to support all browser vendors, hoping that they’d get their s%^&* together fast and converge quickly to a spec-compliant implementation.

In the above table, everything that is important to support any given flavour of simulcast today is indicated, and the items important for spec compliance are coloured. It is ordered more or less from the most compliant (left) to the least compliant (right).

Now, there were quite a few nasty bugs found during that hackathon, and I would like to focus on two that I think would have been almost impossible to find without the holy trinity (browser, SFU, KITE) around the table, showing the value for all to participate in those hackathons.

Choose a simulcast stream (High, Medium or Low) and a temporal layer (original FPS, FPS/2, FPS/4), and you can visually compare the sent (top line) and received (bottom line) width, height, FPS and kbps. People familiar with my blog and webrtcHacks will recognize the UI that has been used many times in the past, to test VP9 SVC in Chrome, to test VP9 SVC and simulcast for Medooze and Janus, and to test the first implementation of H.264 simulcast in Safari by Apple.

In this example, which we open-sourced, we use a single browser in loopback mode, and we implicitly check that simulcast is working by selecting the layers. The example is fully automated with KITE, and can then be run across any desktop and mobile browsers (+ some more) to check for regressions.
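A simplified sketch of the kind of check such a loopback page can perform, reading the per-layer “sent” numbers from getStats() (the field names follow the WebRTC statistics spec; the formatting is ours):

```typescript
// Log outgoing resolution, frame rate and bytes sent for each simulcast
// layer, roughly what the loopback UI displays on its "sent" line.
async function logSimulcastSenderStats(pc: RTCPeerConnection): Promise<void> {
  const report = await pc.getStats();
  report.forEach((stats: any) => {
    if (stats.type === 'outbound-rtp' && stats.kind === 'video') {
      console.log(
        `rid=${stats.rid}: ${stats.frameWidth}x${stats.frameHeight}` +
          ` @ ${stats.framesPerSecond} fps, bytesSent=${stats.bytesSent}`
      );
    }
  });
}
```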

The second bug is funnier, once you have found it. It so happens that Chrome allocates bandwidth to simulcast streams implicitly based on the order you feed them to the API, assuming the first one is the highest resolution and the last one is the lowest. If you feed the streams in e.g. the reverse order, it ignores the indication provided and allocates most of the bandwidth to the first, lowest-resolution stream 🙂

Kudos to Lorenzo for finding this one and for making a manual test, which CoSMo then automated with KITE for Apple. As you can see above, Meetecho has the equivalent of the loopback test we spoke about earlier with Medooze, in the form of the EchoTest plugin demo. The UI gives you indications about which layers are being received, and what bandwidth is being used.

You can see on this screen capture with the test running that the bandwidth provided is NOT what you would expect (on the left).

(Below) KITE allows for automation of this test, result gathering, and reporting.

WebRTC 1.0 Simulcast vs ABR


In a public comment to Millicast’s recent post about simulcast, Chris Allen, CEO of Infrared5, mentioned that they have been supporting ABR with WebRTC in their Red5 Pro product for a long time. While his claim is valid, and many in the streaming industry use a variation of what they do, there are two very important distinctions that need to be made between ABR and simulcast. We touched on the distinction about latency in our presentation at Streaming Media West last year, though possibly too quickly, and we never really explained the distinction about end-to-end encryption, so we thought we should dedicate a full post to it this time around. WebRTC with simulcast is the only way to achieve the lowest latency possible, and real end-to-end security, with a higher flexibility than DRM can provide.

Optimum latency

The main goal of using WebRTC in streaming is the optimally low latency. In Millicast’s recent post about simulcast, we explained how the latency is mainly related to chunk size, and why WebRTC is optimal in that regard and has the lowest possible latency of all the UDP-based media transports. That’s for the packetization and the transport, but encoding also adds to the overall latency.

While WebRTC end-to-end provides optimally low latency, if you add extra encoding, transcoding, even only transmuxing, or any other media operation on the path, you degrade that latency. Most of the latency and CPU footprint come from the encoder/decoder part, as illustrated by this paper by Intel. Let’s count the number of encoding/decoding pairs happening on the media path when using WebRTC end-to-end on one hand, and when using server-side ABR on the other hand.

In simulcast or ABR, one encodes the same media source at different spatial resolutions. In simulcast, the multiple encoding is done client-side; in ABR, it is done server-side.

Multiple encodings in parallel do not increase the latency, but they increase the CPU footprint and the bandwidth usage: by 25% for a single additional stream, up to a theoretical 33% for an infinite number of additional streams. In simulcast, both of those overheads are shouldered by the sender; in server-side ABR, by the server.

It’s all about compromise. ABR could be, and certainly is, sold as a way to reduce the CPU footprint and bandwidth usage on the sender side, which it does, at the cost of almost doubling the latency and preventing the use of end-to-end encryption.

Moreover, with today’s desktop computers, the additional CPU footprint is negligible. Every day, gamers play games and stream in real-time using software like OBS-Studio, without the streaming part noticeably impacting the performance of the game.

By counting the minimum number of blue boxes on a given (horizontal) media path in the figure, one can see that server-side ABR involves twice the amount of media processing that WebRTC end-to-end does.
The numbers in the blue boxes represent the CPU load relative to the load of the high-resolution stream, assuming each resolution is a quarter of the previous one and an encoder whose complexity is linear with respect to the input size in pixels.

In server-side ABR, one needs to encode the stream once (high resolution) to send it to the server, where it needs to be decoded before it can be re-encoded at different resolutions. With WebRTC end-to-end, from glass to glass, you only ever encode and decode once. With server-side ABR, you do that twice. However good your server-side ABR implementation is, you can never be as fast as WebRTC end-to-end with simulcast.

At CoSMo and Millicast, as experts and visionaries in the corresponding technologies, working with and for most of the browser vendors and major actors, we have been aware of those limitations for several years. We were part of the decision, taken at the W3C technical plenary meeting in Sapporo in 2015, to include simulcast in webrtc 1.0 rather than wait!

Instead of trying to have an early implementation that would necessitate a lot of workarounds, and would need to be rewritten once the browser implementations matured, we decided to invest in helping the browsers get there faster. There is no surprise that Apple mentioned us, and only us, in their most recent Safari WebRTC blog post, or that our KITE technology is used to test the webrtc implementation of all browsers, on a daily basis, with results reported to the official webrtc website.

Security and Privacy

Another reason, beyond uncompromising latency, NOT to allow re-encoding is security. Since the Snowden revelations, the world knows about all kinds of national agencies spying on everything that transits on the wire, and the Internet Engineering Task Force has taken a very strong stance on security. If everything can be captured once it goes out of my computer, if my Internet Service Provider or my CDN can be forced to provide access to my data without my consent and without informing me, the only way to protect myself is to encrypt everything that goes on the wire. Welcome Telegram, WhatsApp, Signal, and all kinds of new-generation communication tools which implement end-to-end encryption with a two-key system, where the service provider only ever has one key and cannot provide access to unencrypted content even if legally asked to. They protect their customers’ privacy.

That implies a new trust model in which no server or connection is trusted or should have access to the raw frames. You cannot do transcoding in the cloud, since the cloud shouldn’t have access to your raw content in the first place.

DRM, and the corresponding W3C Encrypted Media Extensions (EME), is different in nature. With end-to-end encryption, the end-user or their organization controls the encryption with their personal keys. With DRM/EME, the keys are provided by the media distribution service. For free content or ad-based monetisation, one might not care, but for paid or sensitive/regulated content, the consequences of leaving access to your raw content to external third parties can be dire.

Original WebRTC use case: p2p, 1:1 communication. The encryption provided in that case is the most robust you can get. TURN servers are also secure, as they do not remove the encryption and do not have access to the keys.
Real usage of WebRTC: multiple hops between senders and receivers. In that case, the original encryption design does not protect the media inside the servers and gateways.
The only secure way to stream at scale: Alice uses her own key to encrypt the video frame (in green) before passing it to webrtc (in blue). It prevents any intermediary server from accessing the unencrypted content.

While webrtc 1.0 does not include end-to-end encryption, the subject was brought into the discussion once it became clear that the original webrtc p2p use case would not scale. It was just left to be addressed in the next version of WebRTC (WebRTC NV), so as not to delay WebRTC 1.0.

The IETF has been working on a specification called Privacy Enhanced RTP Conferencing (PERC). Its biggest known implementation to date provides SEC-level compliant double encryption to the top 25 banks in the world, and is sold and operated by Symphony Communications. Its design and implementation were done by your humble servitor: CoSMo.

Several W3C members are working on new APIs in the browser that would allow manipulating encoded frames, which in turn enables end-to-end encryption, while an evolution of PERC, protocol-independent so it can run over e.g. QUIC, and more bandwidth-efficient, already exists for those using only native SDKs.
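To give a flavour of what those APIs look like, here is a toy sketch using Chrome’s experimental Insertable Streams for WebRTC (the `encodedInsertableStreams` option and `createEncodedStreams()` are Chrome-specific and were still experimental at the time of writing; the XOR “cipher” is a placeholder, not real encryption):

```typescript
// Toy transform of encoded video frames before they hit SRTP.
// A real deployment would replace the XOR with proper authenticated crypto.
async function publishWithFrameTransform(): Promise<void> {
  const pc = new RTCPeerConnection(
    { encodedInsertableStreams: true } as any // Chrome-only, experimental option
  );
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  const sender = pc.addTrack(stream.getVideoTracks()[0], stream);

  // Chrome-specific API, not yet in the standard TypeScript typings.
  const { readable, writable } = (sender as any).createEncodedStreams();
  const key = 0x55; // placeholder key

  const transform = new TransformStream({
    transform(frame: any, controller) {
      const data = new Uint8Array(frame.data);
      for (let i = 0; i < data.length; i++) data[i] ^= key; // toy "cipher"
      frame.data = data.buffer;
      controller.enqueue(frame);
    },
  });

  readable.pipeThrough(transform).pipeTo(writable);
}
```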

Conclusion

We hope that this post made clear the differences between webrtc end-to-end (including simulcast) and server-side ABR. There will be a follow-up post on small but important distinctions about encoders and decoders, but we did not want to make this post too long. We also hope that the subtle but decisive difference when it comes to securing content was adequately illustrated.

We would like to thank Chris Allen again for his original comment. Just like their comment on a previous post, however wrong, it provided us with insight into what people in the streaming industry might not yet be aware of when it comes to WebRTC, and the opportunity to write a nice blog post about it. For that, we thank them.

KITE 2.0 – The best WebRTC testing engine now with more free content


WebRTC testing through KITE is getting more and more popular. As we updated the CoSMo website, KITE is also receiving a lot of new features in this 2.0 release. Still the only end-to-end testing and load testing solution on the market that can run on-premises and that supports web apps (desktop and mobile browsers) as well as native apps, KITE has been leading the market from the client application support and price points of view, but was still slightly behind in terms of usability. It was an expert tool made for experts. This new release adds many free features, as well as many new commercial options for automated load testing on top of your own AWS account, for minimum cost.

We are moving on to the new KITE 2.0, which is organised quite differently from KITE 1.0. Instead of having all the modules in the same GitHub repository, they are now split based on whether the source code is closed or open source, and depending on who owns the copyright. It allows us to open-source more of CoSMo’s IP without mixing it with Google’s IP or waiting for them, while having a module mechanism that automatically fetches the needed modules. All open-source modules are released under the very permissive Apache 2.0 license for all to enjoy.

KITE Engine (public) https://github.com/webrtc/KITE/tree/kite-2.0
This is the open-source KITE engine. It will be updated soon, once we have checked all the scripts and updated the documentation.
License: Apache 2.0
Copyright: Google Inc
Todo: update doc, test, push results to github.com/webrtc/KITE, automate the setup (including the new allure-based reports)

KITE Extras (public) https://github.com/CoSMoSoftware/KITE-Extras
This module contains code that was developed independently by CoSMo itself. Other modules can depend on it. The module is made available via GitHub releases: https://github.com/CoSMoSoftware/KITE-Extras/releases/tag/0.1.1. It will be automatically downloaded and installed when compiling the KITE Engine, the samples or some private tests.
License: Apache 2.0
Copyright: CoSMo Software 
Todo: update doc, including generated javadoc

KITE Sample Tests (public) https://github.com/CoSMoSoftware/KITE-Sample-Tests
This repository contains the open-source KITE sample tests, for example KITE-Janus-Test, KITE-Mediasoup-Test and KITE-Simulcast-Test (Medooze), used during the latest IETF 104 Hackathon.
License: Apache 2.0

Copyright: CoSMo Software
Todo: update doc, test

We continue to believe in better testing for WebRTC and continue evolving the KITE solution. Do not hesitate to read more about the interoperability solutions and tools

https://www.cosmosoftware.io/products/webrtc-interoperability-testing

as well as about the cheapest unlimited load testing around

https://www.cosmosoftware.io/products/webrtc-load-testing

Video CDN and Satyagraha: admitting defeat in the course to perfect live streaming.


Satyagraha (Sanskrit and Hindi: “holding onto truth”) was the philosophy advocated by Gandhi when addressing British imperialism: non-violence, self-scrutiny and the search for truth. It led to quotes like “First they ignore you, then they laugh at you, then they fight you, then you win”, and simpler sayings like “believe what they do, not what they say”, which should be applied to any marketing material, really.

For the past two years, all CDNs and most streaming server vendors have been trying to downplay the disruptive impact that WebRTC would have on their business. It wouldn’t scale. There would be no market for it. It would not be needed. That has now changed.

A first wave of streaming WebRTC CDNs has emerged: Limelight, PhenixRTS and Millicast (*), showing that WebRTC can scale while maintaining an order of magnitude lower latency than the traditional offers. The traditional CDNs were left with no excuse and still had no answer to WebRTC-based solutions. They did what anybody with money would, and moved on to the next … “marketing spin”. Other, smaller and tech-oriented companies realise the trend, and recognise it publicly, but think that there is space for both technology stacks: a more mature, slower stack well adapted to pre-recorded content streaming, or VOD, and WebRTC, more suited to real-time content. For example, the founder and CEO of Bitmovin, co-inventor of MPEG-DASH, opened his talk at NAB 2019 about CMAF saying just that: if you’re looking for real-time, WebRTC would be better, but for all the other cases, and there are a lot of them, MPEG-DASH and HLS are still the best out there, and CMAF is a much welcome improvement to them.

Some come up with new protocols that will save the world, and their business, from WebRTC.
– For Wowza, WebSocket is the solution, even though it does not include any media-related reliability and is still an HTTP-based protocol.
– For Haivision (and Wowza), SRT is the solution, even though it is not a standard (unlike WebRTC, which went through the standardisation process at both the W3C and the IETF and reached consensus) and will never be accepted in browsers; at least it provides some kind of answer.
– For many, CMAF is the current hope: “wait, don’t leave our service, we are going to make it faster!”, which is true, but hides the fact that, by design, “faster” for HTTP-based (transport) and file-based (container) technologies can only go as far as a second or a couple of seconds of latency.

Very recently, we can witness a new phase of the organised retreat of the traditional video CDNs: “granted, we cannot achieve the same level of latency; granted, our scalability is but marginally better and the quality admittedly on par, so let’s not speak about that, it’s not important anymore. Instead of embracing the new disruptive tech, or in order to buy ourselves time to do so, let’s go into smaller and smaller niches to differentiate, by coupling the ‘live’ keyword with some other keyword we believe WebRTC CDNs can’t achieve yet:


LLNW: “Live events require ad-insertion and forensic Watermarking” (May 2nd)

VDMS: “Live events require ad-insertion.” (May 6th)

And of course, there is the old-school Fear, Uncertainty and Doubt (FUD) tactic used by bigger corporations against smaller ones, as infamously illustrated by Microsoft’s Halloween memo in 1998:
– “They are small, they can go belly up anytime; we’re a big corporation, nobody has ever been fired for buying IBM.”
– “They are young, they do not know what they are doing; we have been doing this for decades.”
In the latter case, in the words of Verizon Digital Media Services, it translates into this:

“As a live event streaming platform that has supported thousands of live sporting events, including some of the largest leagues, biggest venues, and most watched games, we’ve had to address these concerns. Working closely with our customers, we’ve continually evolved and optimized every aspect of our live streaming video workflow and architecture so we can keep up with rising viewer standards for streaming quality.” (link)

Alas, it is all past tense.

“First they ignore you”. As a conclusion, there are lots of indications that the streaming and broadcasting ecosystem is now aware not only that WebRTC exists, but that it is viable.

“Then they laugh at you”. We see multiple attempts at trying to half-heartedly adopt it, at undermining its disruptive effect, or at limiting how much market is lost to it.

“Then they fight you”. One thing that people seem to be oblivious to is the speed at which WebRTC and the corresponding offers mature. Two to three years ago, there were no WebRTC CDNs (PhenixRTS was still Phenix P2P, and they had not yet understood the best way to use WebRTC for streaming). Today, a couple of years later, there are several WebRTC streaming services doing a good enough job to capture a part of the streaming market.

The web economy is not on vacation either, and Facebook, AWS and Google are all pushing the boundaries of the possible with e.g. Project Stadia, defining the next generation of codecs (AV1), network transport (QUIC), and of course WebRTC NV, all of which allow WebRTC-based services to evolve faster than the rest, on the shoulders of giants.

Given the speed at which those new players have come to market, it wouldn’t be surprising to see them quickly invade the rest of the market, adding features the traditional CDNs think are differentiating, like server-side ad-insertion, forensic watermarking, and the like, before the end of the year.

“Then you win”.

More technical details and ideas can be found in our recent presentation at Live Streaming East here.

(*) – NetInsight, Id3as, nanocosmos and the Wowza service sometimes get mentioned, but it is not clear whether they are WebRTC end-to-end, i.e. it is not clear whether they can achieve as low a latency as WebRTC does.
Video conferencing services like TokBox, Twilio, Agora.io, …. also get mentioned, but the design of a video conferencing platform is vastly different from that of a streaming platform, and they are likely not to be competitive for streaming usage.
