Monday, August 12, 2013

Web Apps as the new Lingua Franca

I've recently been involved in some conversations around the "lock-in" created by mobile development platforms: once you start working towards (for example) an Android app, it's going to be an Android app. Deploying it on another platform generally means that you have a lot of porting work to do, and multiple codebases to maintain.

I think that's yesterday's model. I think the model for the future is "web apps," by which I mean applications developed using HTML5, JavaScript, and the various APIs coming out of the W3C.

An objection I've heard raised here -- actually, two different objections, but with the same root -- is that (1) people don't want to open their web browser and surf to a web page to run their programs, and (2) the mobile web browsing experience is not terribly convenient.

What's critical to realize is that "web apps" aren't necessarily bound to the experience of "opening up a web browser as an explicit action, navigating to an application's web site, and then interacting with it as if it were a web page." Web-apps-as-first-class-apps is a viable model for development and deployment. And there is a team of people inside Mozilla who are working really hard to make this work as seamlessly as possible.

Both FirefoxOS and BlackBerry 10 inherently treat HTML5/JavaScript applications as "first class" apps. In FirefoxOS, it's the sole means of developing applications; and BlackBerry 10 offers it as one of several native development environments. In both cases, the user's experience is indistinguishable from that of any other app: there isn't even an intermediating browser anywhere in the mix. The HTML5 and JavaScript engines are an inbuilt part of both operating systems.

I recognize that neither platform has the penetration to be a real game changer; at least, not at the moment. I'm holding them up as existence proofs that the approach of "web app as native app" is viable. But let's talk about how these technologies can be leveraged for native webapp deployment on Android, iOS, and desktop systems.

All of these systems have their own idea of "installed applications." To be successful, we need some way to make these HTML5/JavaScript applications look, act, and feel like every other kind of application (rather than "something running in a web browser").

This is where the work that Mozilla has been doing for web app deployment comes into play, and it's easier to demonstrate than it is to explain. If you have Firefox installed on your desktop (or Firefox Aurora on your Android device), you can go to https://marketplace.firefox.com/, find apps that amuse you, and click on "install." At that point, Firefox takes the steps necessary to put an icon in whatever location you're used to finding your applications. On your Android device, you get an application icon. Under OS X, you get a new application in Launchpad. For Windows, you get... well, whatever your version of Windows uses to access installed apps.
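For the developer side of this, the moving parts are a small JSON manifest describing the app and a one-line API call that asks Firefox to install it. Here's a minimal sketch, assuming a hypothetical manifest URL (the navigator.mozApps API is Mozilla's proposed Open Web Apps interface, so treat the details as subject to change):

// Minimal sketch of triggering an Open Web App install from a page.
// The manifest URL below is hypothetical.
var manifestUrl = "https://app.example.com/manifest.webapp";

if (navigator.mozApps && navigator.mozApps.install) {
  var request = navigator.mozApps.install(manifestUrl);
  request.onsuccess = function () {
    // At this point, Firefox has created the platform-appropriate
    // icon or launcher entry described above.
    console.log("Installed: " + request.result.manifest.name);
  };
  request.onerror = function () {
    console.log("Install failed: " + request.error.name);
  };
}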

From that point forward, the distinction between these web apps and native applications is impossible for users to discern.

Interestingly, iOS was both a bit of a pioneer in this space, and also remains the most frustrating platform in this regard. If you recall Apple's stance when the original iPhone first launched, they didn't allow third-party applications at all. Instead, their story was that HTML and JavaScript were powerful enough to develop whatever applications people wanted to write. Presumably as part of this legacy, iOS still allows users to manually save "bookmarks" to their desktop. If the content behind those bookmarks is iOS-aware, then the bookmarks open in a way that hides the browser controls -- effectively acting like a native app. If you have an iOS device, you can play around with how compelling this experience can be by going to http://slashdot.org/ -- select the box-and-arrow icon at the bottom of your browser, and tap "Add to Home Screen." Now, whenever you open the newly-created icon, it acts and feels exactly as if you had downloaded and installed a native "Slashdot" application onto your device. It even appears in the task management bar as a separate application.
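For the curious: making content "iOS-aware" in this sense mostly comes down to a handful of tags in the page's head. A rough sketch using Apple's documented home-screen meta tags (the icon path and title here are illustrative):

<!-- Hide Safari's chrome when the page is launched from a home-screen icon -->
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black">
<!-- Icon and title used for the home-screen entry -->
<link rel="apple-touch-icon" href="/images/touch-icon.png">
<meta name="apple-mobile-web-app-title" content="My App">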

The reason the iOS experience remains frustrating is twofold. First, and primarily an annoyance to developers: the only HTML5/JavaScript rendering engine allowed on the platform is Apple's own WebKit implementation. This blocks the addition of new APIs that new applications may require; such APIs arrive only at whatever pace Apple's product plans dictate. Second -- and of actual concern to users -- there is no programmatic way to add web app icons to the desktop. Creating these icons requires a relatively arcane series of user steps (described above). Basically, Apple is playing the role of the napping hare in this tortoise-and-hare race: they were very fast out of the gate, but have been asleep for quite a while now.

But that covers all the bases: it's possible to develop HTML5/JavaScript apps that truly run cross-platform on iOS, Android, BlackBerry, FirefoxOS, and a variety of desktop systems -- even if the Apple experience is somewhat less than satisfying (for both developers and users).

I suspect that what's holding developers back is predominantly a lack of knowledge that this is a viable technical model. I've also heard developer feedback that one major reason they aren't targeting JavaScript environments is concern about execution speed. It's not yet clear whether these concerns are perception or reality. I've seen some impressive things -- usable cryptography, high-def video codec decoding, and high-performance first-person 3D environments (including viable multiplayer first-person shooters) -- implemented in pure JavaScript, so I suspect it's more a perception problem based on very outdated experiences. I'd love feedback from anyone who has run into performance problems trying to use this model with a modern JavaScript interpreter.

Friday, August 2, 2013

It's official: No SDES in the browsers.

I wanted to follow up on my earlier post regarding WebRTC security. In that post, I mentioned that RTCWEB was contemplating what role, if any, SDES would play in the overall specification.

At yesterday's RTCWEB meeting, we concluded what had been a rather protracted discussion on this topic, with a wide variety of technical heavy hitters coming in to express opinions on the topic. The result, as recorded in the working group minutes, is:

Question: Do we think the spec should say MUST implement SDES,
MAY implement SDES, or MUST NOT implement SDES. (Current doc
says MUST do dtls-srtp)

Substantially more support for MUST NOT than MAY or MUST

**** Consensus: MUST NOT implement SDES in the browser.

So, what does this mean? The short version is that we just avoided a very real threat to WebRTC's security model.

I'd like to put a finer point on that. There was a lot of analysis during the conversation around the difference in difficulty of mounting an active attack versus a passive attack, the ability to detect and audit possible interception, and a variety of other topics that would each require their own article (although I think Eric Rescorla's slides from that session do a good job of summarizing the issues rather concisely). Rather than go into that level of detail, I want to share a brilliant layman's description of SDES by way of an anecdote.

One of the RTCWEB working group participants brought along a companion to Berlin for this week's meeting. She sat in on the RTCWEB encryption discussion. When she asked what the lively conversation was all about, the working group participant directed her to the relevant portions of the SDES RFC.

Yesterday evening, when we were talking about the day's events, she summarized her impression of SDES based on this lay reading: "It's like having the most state-of-the-art safe in the world, putting all of your gold inside, and then writing the access code on the outside."

Friday, June 7, 2013

WebRTC: Security and Confidentiality


One of the interesting aspects of WebRTC is that it has encryption baked right into it; there's actually no way to send unencrypted media using a WebRTC implementation. The developing specifications currently use DTLS-SRTP keying[1], and that's what both Chrome and Firefox implement. The general idea here is that there's a Diffie-Hellman key exchange in the media channel, without the web site -- or even the javascript implementation -- being involved at all.
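You can see this property for yourself: the session descriptions a WebRTC implementation generates advertise a certificate fingerprint, never the media keys, which are derived later by the DTLS handshake in the media path. Here's a quick sketch using the prefixed constructors of the moment (the constraint style and callback signatures reflect mid-2013 implementations and may well change):

// Create a peer connection and look at the offer it generates.
var PeerConnection = window.mozRTCPeerConnection ||
                     window.webkitRTCPeerConnection;
var pc = new PeerConnection(null);

pc.createOffer(function (offer) {
  // The SDP carries "a=fingerprint" lines (a hash of the browser's
  // DTLS certificate); the SRTP keys never appear in signaling.
  var fingerprints = offer.sdp.split(/\r?\n/).filter(function (line) {
    return line.indexOf("a=fingerprint:") === 0;
  });
  console.log(fingerprints);
}, function (err) {
  console.log("createOffer failed: " + err);
}, { mandatory: { OfferToReceiveAudio: true } });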

But who are you talking to?


This is only part of the story, though. While encryption is one of the tools necessary to prevent eavesdropping by other parties, it is in no way sufficient. Unless you have some way to demonstrate that the other end of your encrypted connection is under the control of the person you're talking to, you could easily be sending your media to a server that is capable of sharing your conversation with an arbitrary third party. One important tool to help with this problem is the WebRTC identity work currently underway in the W3C. This isn't ready in any implementations that I'm aware of yet, but it's definitely something that needs to happen before we consider WebRTC done.

The general idea behind the identity work is that, as part of key exchange, you also get enough information to prove, in a cryptographically verifiable way, that the person at the other end of the connection is who you think they are. Of course, there are still some tricky aspects to this (you have to, for example, trust Google not to sign off on someone other than me being "adam.roach@gmail.com"[2]), but you can at least reduce the problem from trusting one party (the website hosting the WebRTC application) to trusting that two parties (the website and the identity provider) won't collude.
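To give a feel for what this looks like to an application, the draft W3C API lets the calling page name an identity provider and exposes the verified identity of the far end. This is a sketch of the proposed shape only; as noted above, it's unimplemented, and both the names and the details are still in flux:

// Draft identity API -- a sketch of the proposed shape, not working code.
// "example-idp.com" is an illustrative identity provider; "pc" is the
// peer connection from the earlier sketch.
pc.setIdentityProvider("example-idp.com");

// Once the remote description has been verified, the draft API exposes
// the asserted identity (and which IdP vouched for it) to the page.
if (pc.peerIdentity) {
  console.log("Talking to " + pc.peerIdentity.name +
              ", as asserted by " + pc.peerIdentity.idp);
}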

The other tool necessary to ensure confidentiality is making sure that the media isn’t being copied by the javascript itself and sent to an alternate destination. This isn’t part of any current specification, but we’re working on adding a standardized mechanism that will allow specific user media streams to be limited so that they can only be sent, over an encrypted channel, to a specified identity (and nowhere else). One shape this could plausibly take is sketched below.
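Here is that sketch -- purely hypothetical, since nothing is specified yet -- of a constraint on getUserMedia that locks the resulting stream to a single verified peer identity:

// Hypothetical sketch only: request media that the javascript can
// route but never read or redirect, bound to one peer identity.
navigator.getUserMedia = navigator.getUserMedia ||
                         navigator.mozGetUserMedia ||
                         navigator.webkitGetUserMedia;

navigator.getUserMedia(
  { audio: true, video: true, peerIdentity: "adam.roach@gmail.com" },
  function (stream) {
    // The stream may only flow to a connection whose remote end has
    // been verified as the named identity.
    pc.addStream(stream);
  },
  function (err) { console.log(err); }
);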

On top of this, web browser developers have a very difficult task in presenting all of this to users in a way they can actually understand. The nuances between (and implications of) "this is encrypted but we can't prove who you're talking to" versus "this is being encrypted and sent directly to adam.roach@gmail.com (at least if you trust Google)" are very subtle. Rendering this to users is a thorny challenge, and one that's going to take time to get right.

And who knows who you are talking to?


Of course, none of this is perfect. The recent Verizon brouhaha is about a database of who is communicating with whom (known in the communications interception community as a "pen register"), not about actually listening in on phone calls. It uses telephone numbers as identifiers, which are pretty easy to correlate to an owner. WebRTC can't prevent this kind of information from being collected; it would simply use IP addresses where Verizon uses phone numbers. IP addresses aren't much harder to correlate to people than phone numbers are, as has been demonstrated by numerous MPAA and RIAA lawsuits.

Even with a good encryption story, WebRTC has no inbuilt defenses to collecting this kind of information. Anyone with access to the session description is going to be able to see the IP addresses of both parties to the conversation; and, of course, the website is going to know where the HTTP requests came from. Beyond that, your ISP (and every backbone provider between you and the other end of the call) can easily see which IP addresses you're sending information to, and picking media streams out (even encrypted media streams) is a trivial exercise for the kinds of equipment ISPs and backbone providers deploy.
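To make that concrete: the addresses show up as ICE candidate lines in the very session descriptions the application already handles, so no special access is required to harvest them. A small sketch (again using "pc" from the earlier example):

// Pull the candidate addresses out of a session description. Anyone
// who can see the SDP -- the website, or anyone it shares it with --
// can read these directly.
function candidateAddresses(sdp) {
  return sdp.split(/\r?\n/)
    .filter(function (line) { return line.indexOf("a=candidate:") === 0; })
    .map(function (line) { return line.split(" ")[4]; }); // connection address
}

console.log(candidateAddresses(pc.localDescription.sdp));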

The problem is that it's fundamentally difficult to mask who is talking to whom on a network. There are approaches, such as anonymizers and onion routers, that can be used to make this information more difficult to ascertain; but such approaches have their own weaknesses, and most simply shift trust around from one third party to another.

In summary, WebRTC is taking steps to allow for the contents of communication to remain confidential, but it takes a concerted effort by application developers to bring the right tools together. The less tractable problem of masking who talks to whom is left as out of scope.

____
[1] There's been recent talk in the IETF RTCWEB working group of adding Security Descriptions (SDES) as an alternate means of key exchange. SDES uses the signaling channel to send the media encryption keys from one end of the connection to the other. This would necessarily allow the web site to access the encryption keys. This means that they (or anyone they handed the keys off to) could decrypt the media, if they have access to it. In terms of stopping some random hacker in the same hotel as you from listening in while you talk to your bank, it's still reasonably effective; in the context of programs like PRISM, or even the pervasive collection of personal data by major internet website operators, it's about as much protection as using tissue paper to stop a bullet.
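To make the contrast concrete: an SDES offer carries the actual master key for the media in the signaling itself, in a line of this form (the key material here follows the format of the SDES RFC's own example), while a DTLS-SRTP offer carries only a hash of a certificate, with the keys derived later in the media path:

a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:PS1uQCVeeCFCanVmcjkpPywjNWhcYD0mXXtxaVBR|2^20|1:32

a=fingerprint:sha-256 <hex digest of the browser's DTLS certificate>

The base64 blob after "inline:" is the key itself; anyone who can read that SDP can decrypt the media.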

[2] Whether you choose to do so mostly comes down to whether you trust this blog entry more than this slide.

Thursday, May 16, 2013

Google Hangouts testing WebRTC-based, pluginless implementation?

A sharp-eyed Toby Allen recently brought the following code to my attention:

Qg.prototype.init=function(a,b,c,d){this.ca("pi1");var e=window.location.href.match(/.*[?&]mods=([^&]*).*/);if(e=(e==m||2>e.length?0:/\bpluginless\b/.test(e[1]))||Z(S.Xd)){t:{var e=new Ad(Uc(this.e.l).location),f;f=e.K.get("lantern");if(f!=m&&(f=Number(f),Ka(Og,f))){e=f;break t}!Fc||!(0<=ta(dd,26))||webkitRTCPeerConnection==m?e=-1:(Pg.da()?(f=Pg.get("mloo"),f=f!=m&&"true"==f):f=q,e=f?-3:0==e.hb.lastIndexOf("/hangouts/_/present",0)?-4:1)}e=1==e}e?Rg(this,q):Sg(this,a,b,c,d)};

That's an excerpt from the Google Hangouts javascript code. It's a bit obfuscated (either by design or, more likely, because it's the output of another tool), and I haven't taken the time to fully dissect it. But the gist of the code appears to be to test for the presence of a "mods=pluginless" string in the URL; and, if one is present, to check whether the browser supports WebRTC's RTCPeerConnection API (or, at least, Google's prefixed version of it). It then looks like it calls one of two different initialization functions based on whether such support is present.
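Stripped of the minification (and ignoring several additional checks in the original), my reading of the logic boils down to something like this -- the names here are my own reconstruction, not Google's:

// Rough, de-obfuscated reconstruction of the check described above.
function shouldUsePluginlessPath() {
  var mods = window.location.href.match(/.*[?&]mods=([^&]*).*/);
  var requested = !!(mods && /\bpluginless\b/.test(mods[1]));
  var supported = typeof webkitRTCPeerConnection !== "undefined";
  return requested && supported;
}

if (shouldUsePluginlessPath()) {
  // initialize the WebRTC-based (plugin-free) call path
} else {
  // initialize the existing plugin-based call path
}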

Alas, with this preliminary analysis, I couldn't get Hangouts to do anything that looked plugin-free, even on a recent copy of Chrome Canary. But it's pretty clear that the Hangouts team has started playing around with a WebRTC-based implementation.

The downside is that they're checking for Chrome's specific prefixed version of RTCPeerConnection rather than attempting to use a polyfill like most WebRTC demos on the web nowadays. So it appears that this functionality, when it's deployed, is most likely going to be Chrome-only -- at least, initially.
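For comparison, the usual cross-browser approach right now is a small prefix-handling shim along these lines, which is roughly what those demo polyfills amount to:

// The kind of prefix shim most WebRTC demos use so the same code runs
// on both Chrome and Firefox.
var RTCPeerConnection = window.RTCPeerConnection ||
                        window.webkitRTCPeerConnection ||
                        window.mozRTCPeerConnection;

navigator.getUserMedia = navigator.getUserMedia ||
                         navigator.webkitGetUserMedia ||
                         navigator.mozGetUserMedia;

var pc = new RTCPeerConnection(null);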

Tuesday, May 7, 2013

WebRTC: Welcome to the party! Please watch your head.

This is a republication of my section of a blog post from the Mozilla hacks blog.
 
About three years ago, my dear friend and VoIP visionary Henry Sinnreich spent some time over lunch trying to convince me that the real future of communications lay in the ability to make voice and video calls directly from the ubiquitous web browser. I can still envision him enthusiastically waving his smartphone around, emphasizing how pervasive web browsers had become. My response was that his proposal would require unprecedented cooperation between the IETF and W3C to make happen, and that it would demand a huge effort and commitment from the major browser vendors. In short: it’s a beautiful vision, but Herculean in scope.

Then, something amazing happened.

Over the course of 2011, the groundwork for exactly such IETF/W3C collaboration was put in place, and a broad technical framework was designed. During 2012, Google and Mozilla began work in earnest on implementing the developing standard.

Last November, San Francisco hosted the first WebRTC expo. The opening keynote was packed to capacity, standing room only, with people spilling out into the hallway. During the following two days, we saw countless demos of nascent WebRTC services and dozens of companies committed to working with the WebRTC ecosystem. David Jodoin shared with us the staggering fact that half of the ten largest US banks are already planning their WebRTC strategy.

And in February, Mozilla and Google drove the golden spike into the WebRTC railroad by demonstrating a real time video call between Firefox and Chrome.

With that milestone, it’s tempting to view WebRTC as “almost done,” and easy to imagine that we’re just sanding down the rough edges right now. As much as I’d love that to be the case, there’s still a lot of work to be done.

Last February in Boston, we had a joint interim meeting of the various standards working groups who are contributing to the WebRTC effort. Topics ranged from the calling conventions of the WebRTC javascript APIs to how multiple video streams should be signaled – things that will be important for wide adoption of the standard. I’m not saying that the WebRTC standards effort is struggling. Having spent the past 16 years working on standards, I can assure you that this pace of development is perfectly normal and expected for a technology this ambitious. What I am saying is that the specification of something this big, something this important, and something with this many stakeholders takes a long time.

Even if the standards work were complete today, the magnitude of what WebRTC is doing will take a long time to get implemented, to get debugged, to get right. Our golden spike interop moment took substantial work on both sides, and revealed a lot of shortcomings in both implementations. Last February also saw SIPit 30, which included the first actual WebRTC interop testing event. This testing predictably turned up several new bugs (both in our implementation and in others’), on top of the limitations that we already knew about.

When you add in all the features that I know neither Mozilla nor Google has begun work on, and all the features that aren’t even specified yet, there’s easily a year of work left before we can start putting the polish on WebRTC.

We’re furiously building the future of communications on the Internet, and it’s difficult not to be excited by the opportunities afforded by this technology. I couldn’t be more pleased by the warm reception that WebRTC has received. But we all need to keep in mind that this is still very much a work in progress.

So, please, come in, look around, and play around with what we’re doing. But don’t expect everything to be sleek and finished yet. While we are doing our best to limit how the changing standards impact application developers and users, there will be inevitable changes as the specifications evolve and as we learn more about what works best. We’ll keep you up to date with those changes on the Hacks blog and try to minimize their impact, but I fully expect application developers to have to make tweaks and adjustments as the platform evolves. Expect it to take us a few versions to get voice and video quality to a point that we’re all actually happy about. Most importantly, understand that no one’s implementation is going to completely match the rapidly evolving W3C specifications for quite a while.

I’m sure we all want 2013 to be “The Year of WebRTC,” as some have already crowned it. And for early adopters, this is absolutely the time to be playing around with what’s possible, figuring out what doesn’t quite work the way you expect, and — above all — providing feedback to us so we can improve our implementation and improve the developing standards.

If you’re in a position to deal with minor disruptions and changes, if you can handle things not quite working as described, and if you’re ready to roll up your sleeves and influence the direction WebRTC is going, then we’re ready for you. Bring your hard hat, and keep the lines of communication open.

For those of you looking to deploy paid services, reliable channels for managing your customer relationships, or mission-critical applications: we want your feedback too, but temper your launch plans. I expect that we’ll have a stable platform that’s well and truly open for business some time next year.
___
Credits: Original hardhat image from openclipart.org; Anthony Wing Kosner first applied the “golden spike” analogy to WebRTC interop.