Tim O’Reilly: State of the Internet Operating System

Apologies for the slightly weird form of this post; I’ve been reading Tim O’Reilly’s blog post State of the Internet Operating System and replied in a comment. But I think the comment turned into something that’s long and self-contained enough that I can post it here; I’m just too lazy to rewrite it into a form more suitable for a blog post. So without further ado, here it is replicated:

I came here by watching the video of your speech at O’Reilly MySQL CE 2010. There’s a lot of overlap between that speech and this article.

No offense, but I’ll skip the “you’re right” bit and focus on the bits that stood out to me.

You’re taking a very high level view on the technology that needs to exist in order to power this data operating system. I can’t blame you for that, but with me being a fairly low-level tech guy, there are a few oddities that I can’t help but point out.

I suppose they can be summed up like this: most current efforts that I’m aware of to open up data in one way or another are a) tacked relatively haphazardly on to existing device operating systems, and b) fundamentally request oriented.

The first point is harder to explain, so let me start with the second: request oriented data APIs. I’ve had the privilege to work on the P2P subsystems at Joost, a company that tried to merge the worlds of the internet and TV.

One thing that became obvious — was obvious before, really, but sometimes you need to deal with a problem before it really clicks in your brain — is that live TV and on-demand TV require fundamentally different methods for P2P distribution of the video data.

One is a model where data is broadcast more or less indiscriminately, the other is a model where data is explicitly requested. Our current web infrastructure is of the latter persuasion, but for real-time distribution of data we really need the former type of infrastructure.
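
To make the distinction concrete, here is a minimal Python sketch of the two models. The names `RequestStore` and `Broadcaster` are made up for illustration and have nothing to do with Joost's actual P2P code; a real system would also have to deal with networks, peers and buffering, all of which this leaves out.

```python
from typing import Callable, Dict, List


class RequestStore:
    """Request-oriented (on-demand): data sits still until somebody asks for it."""

    def __init__(self) -> None:
        self._chunks: Dict[str, bytes] = {}

    def put(self, key: str, chunk: bytes) -> None:
        self._chunks[key] = chunk

    def get(self, key: str) -> bytes:
        # The consumer decides what to fetch and when.
        return self._chunks[key]


class Broadcaster:
    """Broadcast-oriented (live): data is pushed to every subscriber as it arrives."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str, bytes], None]] = []

    def subscribe(self, callback: Callable[[str, bytes], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, key: str, chunk: bytes) -> None:
        # The producer decides when data moves; nobody had to request it.
        for callback in self._subscribers:
            callback(key, chunk)
```

In the first class the consumer drives the exchange, in the second the producer does. That inversion is roughly why bolting one model on top of infrastructure built for the other tends to be awkward.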

Incidentally, on-demand P2P is fairly well researched these days, whereas live/streaming P2P doesn’t seem to be in the same way.

On the web, technology such as Comet tries to solve the problem, but Comet connections are almost the antithesis of the type of connections that e.g. the Apache web server is designed for. Former colleagues and friends of mine at Oni Labs are working on that problem, though it may not be immediately apparent from what you read on their website.

The point is, for the real-time data sharing and processing that you’re saying we’re moving towards, current web technology is a really bad fit.

At that point talking about opening up data is a bit like applying a band-aid to a broken leg: yes it’s a completely necessary effort, but I think we’ll need more fundamental technology changes to happen for that band-aid to become effective.

Now for the other point, open data APIs tacked onto operating systems: back around 2003 I read a fascinating article by Hans Reiser (these days I’m never sure whether it’s a good idea to mention the name) about file systems that had one gem hidden in it that opened my eyes quite a bit: ordering data (files) in hierarchical directory structures is really just one of many options for indexing data.

Fast forward seven years to a world where searching is more prevalent than browsing for data, where tagging data loosely rather than strictly ordering data is the preferred model for identifying how data should be organized, and that might not seem like much of an insight.
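
As a rough illustration of that insight, the Python sketch below indexes the same set of files twice: once by their hierarchical path and once by loose tags. `FileIndex` and its methods are hypothetical names of my own, not an API from the article or from any operating system.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class FileIndex:
    """The same files, reachable through two different indexes."""

    def __init__(self) -> None:
        self._by_path: Dict[str, Set[str]] = {}
        self._by_tag: Dict[str, Set[str]] = defaultdict(set)

    def add(self, path: str, tags: Iterable[str]) -> None:
        tag_set = set(tags)
        self._by_path[path] = tag_set
        for tag in tag_set:
            self._by_tag[tag].add(path)

    def under(self, directory: str) -> List[str]:
        """Hierarchical lookup: everything below a directory."""
        prefix = directory.rstrip("/") + "/"
        return sorted(p for p in self._by_path if p.startswith(prefix))

    def with_tags(self, *tags: str) -> List[str]:
        """Tag lookup: files carrying all given tags, wherever they live."""
        if not tags:
            return []
        matches = [self._by_tag.get(tag, set()) for tag in tags]
        return sorted(set.intersection(*matches))
```

The directory tree is then just the `under` view of the data, and a tag query is another view over exactly the same files.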

To illustrate how different people thought back then: I happened to read that paper, and something about desktop search for GNOME at around the same time. A brief email exchange with the guy starting that project revealed that he didn’t see much similarity between “making search happen” and “indexing files by other means than their hierarchical names”.

Incidentally, Spotlight was released a few years later, and really began to open people’s eyes to the possibilities of searching rather than browsing for data.

Another thing that I saw round about the same time (memory is a bit blurry here) was smart folders in the Sylpheed (I think that’s the one!) email client, which let you easily define folders containing emails that e.g. contain a certain keyword or match a regular expression. That really drove home the usefulness of indexing data more liberally. iTunes now contains similar tech for media.
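
The smart folder idea boils down to a saved query rather than a storage location. Here is a small Python sketch of that; it is my own illustration with invented names (`Email`, `SmartFolder`, `keyword`, `matches`) and makes no claim about how any real email client implements the feature.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Email:
    subject: str
    body: str


Rule = Callable[[Email], bool]


def keyword(word: str) -> Rule:
    """Rule that matches mails containing a keyword, ignoring case."""
    needle = word.lower()
    return lambda mail: needle in mail.subject.lower() or needle in mail.body.lower()


def matches(pattern: str) -> Rule:
    """Rule that matches mails whose subject or body matches a regular expression."""
    regex = re.compile(pattern)
    return lambda mail: bool(regex.search(mail.subject) or regex.search(mail.body))


class SmartFolder:
    """A folder defined by a rule; it stores no mail itself."""

    def __init__(self, name: str, rule: Rule) -> None:
        self.name = name
        self.rule = rule

    def contents(self, mailbox: List[Email]) -> List[Email]:
        # Evaluated on demand, so one mail can show up in many folders.
        return [mail for mail in mailbox if self.rule(mail)]
```

Because membership is computed rather than stored, one email can appear in any number of such folders without being copied or moved.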

The point here is that we’re still in a world where operating systems handle data in a strictly hierarchical manner, with search technology tacked on rather than built in, with no commonly available API to perform searches or tag data, etc.

I think it’s fair to argue that for an internet operating system we don’t need that to change at all; these common APIs can be implemented at other layers than the file system.

But at some point as a developer you’ll be working on a device operating system, whether it’s server-side or client-side, and you’ll be needing to hook up your device’s idea of hierarchical data storage to the internet OS’s idea of searchable and/or broadcastable data.

I think at that point it would make a lot of sense to make device operating systems handle data in the same way, if only to ease development of end-user apps (efficiency being the other point to bring up, but that’s not usually interesting to end-users in the same way).

So, love the idea of a data operating system, and it’s in fact something I’ve been thinking about in one way or another for a long time. Me being the low-level tech sort of guy that I am, I’m thinking there is fundamental tech to be developed at a much lower level than the level you tend to talk about.
