Why the Web Is a Mess and How We Can Fix It

There is a lot of talk about web standards and how to extend them. "Bringing the web forward" is a meme that advocates further bloating the web browser, which is already a beast comparable in complexity to any modern operating system. On the other hand, there is hardly any talk about the fundamental challenge the web should overcome.

Here's what I want. When I type a URL into my browser, the browser should download the app behind that URL and run it. No clicking "download," no installation. Any app can link to any other app by telling the browser to navigate to a specific URL. All apps can have multiple entry points defined by different URLs. In other words, I want to minimize the friction to find and execute programs on the Internet. The question is how to do this securely.
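
To make the entry-point idea concrete, here's a sketch of how an app might declare its URLs, written as a TypeScript object. The manifest format is entirely made up for illustration; no such standard exists.

```typescript
// A made-up manifest format: how an app might declare the URLs that
// serve as its entry points. Nothing here is a real standard.
interface AppManifest {
  name: string;
  // Map URL paths to entry-point functions inside the downloaded app.
  entryPoints: Record<string, string>;
}

const manifest: AppManifest = {
  name: "photo-editor",
  entryPoints: {
    "/": "main",            // https://example.com/ launches main()
    "/edit": "openEditor",  // https://example.com/edit jumps straight into editing
    "/share": "shareDialog" // a deep link any other app can navigate to
  },
};
```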

You may be wondering if this isn't exactly what the browser already does. Well, yes, that's pretty much what it does. The problem is that it does a whole bunch of other stuff as well. The web standard shouldn't include programming languages or GUI elements. It should define a secure way of downloading and executing arbitrary code from arbitrary corners of the Internet. It should determine which parts of my computer web apps can access by default (e.g., keyboard, mouse, some type of local storage), which parts they can access given permission from the user (e.g., microphone, camera), and the interface through which they access those things.
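
As an illustration, here's a minimal sketch of what such a capability interface could look like. Every name in it (Sandbox, CapabilityHandle, and so on) is hypothetical; nothing here is part of any existing standard.

```typescript
// Hypothetical capability model -- none of these names exist in any
// real web standard; this is only a sketch of the idea.

// Capabilities granted to every app without asking.
type DefaultCapability = "keyboard" | "mouse" | "localStorage";

// Capabilities that require an explicit user grant.
type GatedCapability = "microphone" | "camera" | "filesystem";

interface Sandbox {
  // Always available; the browser wires these up when the app launches.
  has(cap: DefaultCapability): true;

  // Resolves only if the user approves; otherwise it rejects,
  // and the app never touches the underlying device.
  request(cap: GatedCapability): Promise<CapabilityHandle>;
}

interface CapabilityHandle {
  // An opaque handle: the app talks to the device only through
  // browser-mediated calls, never directly.
  read(): Promise<Uint8Array>;
  close(): void;
}

// Example app code under this model:
async function recordClip(sandbox: Sandbox): Promise<Uint8Array> {
  const mic = await sandbox.request("microphone"); // user sees a prompt here
  const audio = await mic.read();
  mic.close();
  return audio;
}
```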

Having a standard like this would end the talk about web-apps vs. native apps, because web-apps would be native apps; the primary difference would be the security model. With native apps, the user decides whether they trust a given application. With web-apps, the browser makes the decision by restricting the application's access to only the resources defined in the web standard.

The browser should fundamentally be a security layer between the Internet and my operating system. When I click a link, I need to be sure that whatever app is behind that link won't destroy my family photos or record a video of me picking my nose without asking for permission first. Everything else, such as what programming languages or GUI libraries should be used for these applications, shouldn't be a decision imposed on application developers by the browser.

But it wouldn't work with search engines

One of the most common arguments against this line of thinking is that search engines depend on a language from which they can extract information without executing it. HTML can either be "executed" as a web page or simply parsed to extract the text and links, which is the information an index is built from. If websites were "black-box applications," we wouldn't be able to figure out programmatically what the data they provide is about.

There's no reason to assume that what the user sees should be the only interface to the app. As long as there's a communication channel that allows message-passing between the browser and the apps, the app can return data in a more structured format such as HTML or JSON for search engines to use. And this is the big thing: standardization of these formats, as well as the format of the messages, would be completely separate from the browser itself.
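
As a sketch of the idea, suppose the browser hands each app a message port that crawlers can query. The message shapes below are invented for illustration; only MessagePort itself is a real browser type.

```typescript
// Hypothetical indexing channel: a crawler (or the browser on its
// behalf) sends a "describe" message, and the app answers with
// structured data instead of rendered pixels.

interface DescribeRequest {
  type: "describe";
  url: string; // which entry point the crawler is asking about
}

interface DescribeResponse {
  type: "description";
  title: string;
  text: string;    // plain-text content for full-text indexing
  links: string[]; // URLs this entry point can navigate to
}

// Inside the app: answer indexing queries over a generic message port.
function serveIndexers(
  port: MessagePort,
  describe: (url: string) => DescribeResponse
) {
  port.onmessage = (event: MessageEvent<DescribeRequest>) => {
    if (event.data.type === "describe") {
      port.postMessage(describe(event.data.url));
    }
  };
}
```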

Google, and probably other search engines as well, are already forced to execute JavaScript because modern websites rely heavily on it. This means that to run a search engine, you need to run a browser, which takes a lot of computing resources; it's not enough to just send an HTTP request to a URL and parse the returned HTML.

Furthermore, Google's image search is now based on neural networks that analyze the content of images, which is the opposite of the semantic-web idea, where website creators are expected to correctly label every piece of information to make it machine-readable. The use of neural networks and other machine-learning techniques for structuring and classifying data will probably keep growing in popularity. Maybe the best way to implement a search engine in the future is not to ask for an alternative text representation of the app but instead to run it through an AI that parses the content, and perhaps interacts with it, to figure out what the app is about.

But the apps would be too big

Another common argument goes like this: the reason the browser has to be so complicated is that apps would be too big if each one had to ship its own rendering engine, GUI library, and so on.

The solution is something operating systems have had since forever: shared libraries. A frequently repeated piece of advice to web developers is to load commonly used libraries such as jQuery or React from a CDN to prevent users from downloading the same code many times from various sources. (Per-site cache partitioning in modern browsers has since weakened that particular benefit, but the incentive is the same.) This goes to show that developers are incentivized to share code with other applications to keep the size of their apps small.
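
Today's web already supports a version of this: a browser can import an ES module directly from a CDN URL. A minimal example, to be run as a module (the library and the CDN URL are just illustrative choices):

```typescript
// Load a shared library from a CDN at runtime. If the same URL is
// already cached, the browser serves it from cache instead of
// downloading it again. (Library and URL are illustrative.)
const { default: confetti } = await import(
  "https://cdn.jsdelivr.net/npm/canvas-confetti@1/+esm"
);
confetti(); // use the library without bundling it into your own app
```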

But we already tried this and it didn't work

ActiveX was Microsoft's attempt to bring apps to web pages. As far as I understand, it didn't have much of a security model beyond letting developers digitally sign their packages; the end user was still asked to trust the provider with access to any part of their computer.

Java Applets were a thing in the '90s, and from what I can tell, developers hated them. They had some fundamental problems: load times were too long, there were no good GUI libraries available, they had to be programmed in Java, and so on. On top of that, Java Applets had constant security issues, which finally led to the demise of the whole platform.

Flash accomplished, in many ways, what Java Applets tried and failed at. It was fast to install and easier than Java to develop for; applications started quickly and supported streaming video. But it, too, had security problems and was eventually killed off by Apple's decision to leave it out of their mobile browser.

Native Client was Google's effort, and I think the most interesting of all so far. Instead of relying on a specific language or runtime, the idea was to push part of the security enforcement into the compiler and a verifier that statically checks the generated machine code. Here's a video explaining the Native Client approach in quite a bit of detail. Google, unfortunately, canned Native Client in favor of WebAssembly.

The problem with all of these is that they are additions that run inside a web page instead of a wrapper that goes around it. They complicate the issue rather than simplify it. What I'd like to see is a more generic solution that wraps the mess we're currently in.

I'm in no way claiming that building such a solution is a trivial task. What all these projects show is that security is hard. However, what I'd like to see is a shift in focus, from adding more and more stuff to the browser to the fundamental challenge of the web: securely executing arbitrary code from arbitrary sources. If more people focused on security instead of bloating the existing web standards, maybe we'd get somewhere.

But it's over! We are too far down the current path

There are billions of websites online today, and the overhead of rewriting all of them would be far greater than the benefits we'd get from a more powerful system. That sounds like a pretty convincing argument until you realize there's no need to rewrite all the websites.

We would be replacing a less powerful system with a more powerful system. So, logically, once the more powerful system is ready to deploy, we can run the less powerful system in it. You can think of the DOM and the JavaScript runtime as shared libraries on this new, more generic platform.
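
As a thought experiment, launching a legacy website on such a platform might look something like the sketch below. Every interface and function in it is invented; the point is only that the old web's engine becomes an ordinary dependency.

```typescript
// Pure speculation -- none of these interfaces exist. The sketch shows
// the old web's HTML/CSS/JS stack loaded as a shared library on the
// hypothetical platform, rather than baked into the browser.
interface Sandbox {
  // Capability-scoped access to the machine, as sketched earlier.
}

interface WebEngine {
  render(url: string, sandbox: Sandbox): Promise<void>;
}

// Hypothetical loader provided by the new platform.
declare function loadLibrary(url: string): Promise<WebEngine>;

async function openLegacySite(sandbox: Sandbox, url: string) {
  // The entire HTML/CSS/JS stack is just another shared dependency.
  const engine = await loadLibrary("https://libs.example/html-css-js@1");
  // The engine renders the legacy page under the same capability rules
  // as any other app -- it gets no special access to the machine.
  await engine.render(url, sandbox);
}
```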

[Image: the next web]

Final Thoughts

Why isn't there more discussion on how to save the web from collapsing under its own complexity? All the improvement proposals I see are just additions to the existing standards, which makes it more and more difficult to maintain consistency across different implementations.

I've scoured the Internet for discussion on this, but I've found very little. Among the few things I found was this Alan Kay email. True to his style, Mr. Kay remains vague; it's difficult to know what exactly he's advocating. Nevertheless, as far as I can tell, I agree with him.

Another person who seems to get it is this Hacker News commenter, who'd like to see a DOM implementation inside a Native Client app. The same comment, however, also brings up a very likely reason why browser vendors have never proposed anything like this: they have an incentive to complicate the standards. The more complicated the standards, the more it costs to maintain a browser implementation, and the less competition there will be. This strategy is even more tempting to companies like Google and Apple, which have almost infinite funds to throw at their browsers.

What do you think? Is there hope? Or is the web doomed to forever grow in complexity?