Category: metaBLOG

blogging about blogging about

  • the most important droplet

    With all the talk of the cloud, it’s worth noting that for every user, the single most important droplet therein will always be their own PC. Cloudware is still far from as feature-rich as software running on your own machine, and of course all the important user data still resides on the home node (and is unlikely to shift online significantly in a world where external USB hard drives are approaching terabyte capacity and the $100 price point equally fast). The home node will also keep the raw speed and performance edge over the cloud in any mainstream computing scenario. Thus the killer apps of the future will be cloudware that leverages the power of the home node, not purely cloud-run apps.

    To get there, however, we need to tap the power of the home node via the browser, which remains the nexus where cloud computing and flex computing intersect. Here’s how we get there:

    Javascript creator and Mozilla CTO Brendan Eich has revealed a new project called IronMonkey that will eventually make it possible for web developers to use IronPython and IronRuby alongside Javascript for interactive web scripting.

    The IronMonkey project aims to add multilanguage functionality to Tamarin, a high-performance ECMAScript 4 virtual machine which is being developed in collaboration with Adobe and is intended for inclusion in future versions of Firefox. The IronMonkey project will leverage the source code of Microsoft’s open source .NET implementations of Python and Ruby, but will not require a .NET runtime. The goal is to map IronPython and IronRuby directly to Tamarin using bytecode translation.

    A plugin for IE will also be developed. The upshot is that Python and Ruby programming will become available to web applications run through the browser, on the client side. Look at how much amazing functionality we already enjoy in our web browsers thanks to Web 2.0 technology, which is AJAX-driven (i.e., JavaScript). Could anyone back in 1996 have imagined Google Maps? Hard-core programming geeks who understand this stuff better than I do should check out Jim Hugunin’s blog at Microsoft about what they have in mind; it’s heady stuff. But fundamentally what we are looking at is a future where apps are served to you just like data is, and your web browser becomes the operating system in which they run. I can’t even speculate about what this liberation from the deskbound OS model will mean, but it’s not a minor change.

    Still, this all is going to run on the home node, and not in the cloud. That’s the key. There’s only so much you can do, and will be able to do, on vaporware 🙂

  • WP 2.5 has built-in gravatar support

    Seems that WordPress v2.5 (which will be out this month) will include support for Gravatars by default:

    [default avatar icon]

    Theme Authors: Adding Gravatars to Your Theme

    The function to add Gravatars to your theme is called get_avatar. It returns a complete <img> tag for the Avatar.

    The function get_avatar is setup as follows:

    function get_avatar( $id_or_email, $size = '64', $default = '' )

    * id_or_email: The author’s User ID (an integer or string) or an E-mail Address (a string)
    * size: The size of the Avatar to display (max is 80).
    * default: The absolute location of the default Avatar.
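    Under the hood, the Gravatar request that get_avatar builds is simple: the e-mail address is trimmed, lowercased, and MD5-hashed into an image URL, with the size and default-avatar fallback passed as query parameters. A rough Python sketch of that scheme (the parameter handling here is illustrative, not WordPress’s actual code):

```python
import hashlib

def gravatar_url(email, size=64, default=""):
    """Build a Gravatar image URL: trim and lowercase the address,
    MD5-hash it into the path, then append size (s) and an optional
    default-image fallback (d) as query parameters."""
    digest = hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()
    url = "https://www.gravatar.com/avatar/%s?s=%d" % (digest, size)
    if default:
        url += "&d=" + default
    return url

print(gravatar_url("User@Example.com", size=80))
```

    Because the hash is taken over the normalized address, "User@Example.com" and " user@example.com " resolve to the same avatar, which is what makes the scheme work across sites.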

    That’s the default avatar icon up there. Ugh. I am really not interested in gravatars; I am a fan of MonsterID and Wavatars. I hope Scott and Shamus can update their plugins to hook into the native 2.5 functionality, as that would be a lot simpler. Adding a dropdown to the Admin panel to let you select between different icon sets is probably the best approach.

    UPDATE: Ryan Boren says that any avatar service can be invoked, not just Gravatar:

    Gravatar is the service used by default. get_avatar() is completely pluggable, however, so any service can be used. get_avatar() is built-in so that themes will have some fixed API on which they can rely, regardless of whatever avatar service is being used behind-the-scenes.

  • Microsoft Office Online?

    A few days ago, Nicholas Carr reported a rumor that Microsoft was poised to take its Office Suite into the cloud, and was building out huge new server farms to prepare:

    I’ve heard that Microsoft has begun briefing its large enterprise clients on an expansive and detailed strategy for moving its software business into the cloud. If the report proves correct – and I make no guarantees – the company will unveil the strategy to the public either next week or the week after.
    […]
    it’s been building out the backend infrastructure – the data center network – required to run web apps reliably and on a large scale. These obstacles are now coming down. The upgrades have been out for more than a year, and, despite some glitches, have generated a lot of cash for the company. As for its infrastructure, a massive new data center near Chicago is expected to come online this year, adding to the capacity of the new centers the company has built or bought in Washington, Texas, and California.

    However, Michael Arrington threw cold water on the idea:

    I fear that the rumor may have been wrong, and that Microsoft has no such plans in the near future. Tonight Microsoft announced an expansion of their “software plus services” strategy that gives businesses many of the collaboration and storage benefits of Sharepoint without actually having to install software on their own internal machines. The program was initially launched in September 2007.

    This is not a web based version of office. It’s not competitive with what Google is offering businesses with Apps and Docs. It’s a half way approach that still requires the installation of Office and other software on local machines.

    However, it isn’t clear that this necessarily means CloudOffice is dead. The above strategy could certainly be a halfway step toward porting Office to the cloud. And a lot of enterprise customers are probably going to stick with their local installs for quite some time; they aren’t going to switch overnight. IT has inertia, after all – as does the significant investment most companies have made in legal licenses for Office. I think Microsoft has to move cautiously, and it’s going to take time.

    Can Google or anyone else deliver a fully web-based office suite with a feature set complete enough to match Office in the interim? I think no pure web application can hope to match the functionality of a desktop one, because the desktop app has so much more computing power at hand. The browser is a constraint – which is why Adobe AIR, which breaks free of the browser, is such an innovative and exciting product. Microsoft’s own version, called Silverlight, is probably going to be the backend for Office online. The advantage here is that the immense computing power of the client – RAM, CPU – can be used to make the cloud app much richer than if all of the functionality had to be delivered via the narrow Internet pipe.

  • TechCrunch (hearts) Valleywag

    Does Mike Arrington have a stake in Valleywag? At TechCrunch, Arrington issues a dire warning that Valleywag (a Silicon Valley gossip rag) will drive someone to suicide soon enough:

    Today I read all the sordid details about the alleged sexual encounter between a notable technology visionary and a woman who appears to be looking for as much publicity as possible. Where did I read it? On the Silicon Valley gossip blog Valleywag.
    […]
    A lot of people I know read Valleywag, and say it’s fun to hear all the gossip. But all of those people change their tune the first time the blog turns on them and includes them in a rumor. An example: TED founder Chris Anderson, distressed over the publication of the TED attendee list, recently wrote to Valleywag owner Nick Denton that he “didn’t think [he’d] be on the receiving end” of Valleywag gossip. His email was promptly posted to the site.

    Most of the gossip is harmless. Much of it, though, isn’t (like the sex incident above). Celebrities have had to live with this kind of nonsense for decades, which explains why some of them pull out of society entirely and become completely anti-social. Perhaps, some argue, they bring it on themselves by seeking fame.

    But for people in Silicon Valley, who are not celebrities and who have no desire other than to build a great startup, a post on Valleywag comes as a huge shock. Seeing your marriage woes, DUI or employment termination up on a popular public website (permanently indexed by search engines) is simply more than they can handle. They have not had the ramp up time to build resistance to the attacks.

    The suggestion that web entrepreneurs are more emotionally fragile than Hollywood celebs is pretty weak. The reason for Valleywag’s success is not because Nick Denton is out to getcha. It’s because prominent Silicon Valley entrepreneurs – like Michael Arrington – keep reading Valleywag, sending them tips and gossip, and blogging about it.

    Arrington goes on to observe the obvious, that tragedy is good business:

    So how long will it be before Valleywag drives someone in our community to suicide? My fear is that it isn’t a matter of if it will happen, but when. Valleywag and Nick Denton, though, will likely look forward to the event, and the great traffic growth that will surely follow.

    Emphasis mine. I think that it borders on libel to suggest that Denton would “look forward” to the event, though obviously he won’t mind the traffic. But all of that traffic exists because, as Arrington observes, there’s a market for it. Is Denton to blame, or the people who Valleywag writes about themselves, who seem all too eager to eat their own? As Anderson found out in the anecdote above, no one thinks they will be on the “receiving end” of Valleywag’s gossip. As they say, pride goeth before the fall.

    Arrington must be making good on his promise to suck up to Denton, because his post at TechCrunch just gives Valleywag all that much more power. He complains that “the valley was a much nicer place to live and work before the days of Valleywag” – but whose fault is that?

  • the social horizon

    Does the inherent limit on human social group size apply to online social networks? That limit is called “Dunbar’s Number” and is estimated to be ~150, based on observations of social networks among primates, extrapolated to humans by taking our increased brainpower into account. An intriguing piece in the WSJ asks whether online social networks are still bound by Dunbar’s number, or whether technological innovation might permit us to exceed it:

    But there is reason to believe that the social-networking sites will enable their users to burst past Dunbar’s number for friends, just as humans have developed and harnessed technology to surpass their physical limits on speed, strength and the ability to process information.
    Robin Dunbar, an Oxford anthropologist whose 1993 research gave rise to the magical count of 150, doesn’t use social-networking sites himself. But he says they could “in principle” allow users to push past the limit. “It’s perfectly possible that the technology will increase your memory capacity,” he says.

    The question is whether those who keep ties to hundreds of people do so to the detriment of their closest relationships — defined by Prof. Dunbar as those formed with people you turn to when in severe distress.

    The problem here is the definition of the word “relationship”. Dunbar’s definition of “closest” is just one of many possible ones, and the various definitions might well overlap. But does that mean business relationships are excluded from Dunbar’s limit? If so, you might expect to see many more contacts on LinkedIn, which caters to a business-networking model, than on Facebook, which is primarily stalker heaven. LinkedIn is approaching critical mass in terms of network effect; RWW found over 80% of their business contacts already using it, for example.

    There are surely other models one could employ to map relationships: blogrolls, chat client lists, twitter fans/friends, etc. I think any one of these – or a weighted combination of all of them – would be good data sets to see whether Dunbar’s number truly holds online or not.
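    To make the “weighted combination” idea concrete, here is a toy Python sketch that scores each contact by which data sets they appear in. The names, weights, and cutoff are entirely invented for illustration; the point is only that combining several relationship models gives you a tie-strength measure you could test against Dunbar’s limit:

```python
def tie_strength(contact, datasets, weights):
    """Sum the weight of every data set the contact appears in."""
    return sum(weights[name] for name, members in datasets.items()
               if contact in members and name in weights)

# Hypothetical relationship data sets (blogroll, chat list, twitter).
datasets = {
    "blogroll": {"alice", "bob"},
    "chat": {"alice", "carol"},
    "twitter": {"alice", "bob", "carol", "dave"},
}
# Arbitrary weights: a blogroll link counts for more than a twitter follow.
weights = {"blogroll": 3, "chat": 2, "twitter": 1}

scores = {c: tie_strength(c, datasets, weights)
          for c in set().union(*datasets.values())}
# A crude "Dunbar cutoff": keep only ties above some threshold.
inner_circle = [c for c, s in scores.items() if s >= 3]
```

    With real data you would tune the weights and threshold empirically, and then ask whether the size of the resulting “inner circle” ever pushes past ~150.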

    This matters because if the limit does hold online (or if there is any limit at all), it substantially undermines the argument that the social graph is a construct of unlimited utility for search personalization or the semantic web. If anything, the social graph could become an obstacle to finding information rather than an asset. Everyone keeps talking about search “personalization”, but that’s a synonym for search filtering; filtering is a lossy process – you are discarding data. Optimal search wouldn’t define the best result as the most “personal” but rather the most “relevant” – and often that might well be data lying far beyond the cozy confines of your social graph. In fact, assuming that you are searching for something you don’t know, the answer is more likely to lie outside than inside.

    Human nature being what it is, people might not even realize that their newly personalized search results are less relevant!

  • email the google-killer?

    Fascinating numbers via Bernard Lunn at RWW about the true market share threat to Google of a Microsoft-Yahoo merger:

    Email is 49% of Impressions. Portals and Search Engines is 10% by contrast. This is some free data from Nielsen-Netratings. click on Top Site Genres.

    56% is Microsoft and Yahoo combined market share of webmail. Gmail is down at 7%. This data is via Fred Wilson’s back of envelope calculations.

    And as far as email goes, Lunn notes that Hotmail is a dying joke and that Yahoo’s email product is superior:

    Hotmail has lagged terribly. Most people who used it would not return, I cannot imagine who would switch (an AOL user maybe) and most people already have email. So it is a lost cause. One major reason it lagged IMHO was Microsoft fear of cannibalizing Outlook. So they won’t offer the features that users want that both Google and Yahoo have been rushing to fill. Yahoo is reputed to have the most “Outlook-like” interface and that matters massively to people making the switch.

    Microsoft will probably do the smart thing and let the Yahoo team run with email. Hotmail will die as a separate brand, eventually.

    It should also be noted that Yahoo acquired Oddpost in 2004, which is now the foundation of their webmail platform. (And Yahoo mail didn’t spend long in beta, unlike Gmail, which embarrassingly still wears the beta tag years after its 2004 debut.)

    Yahoo’s email is superior to Gmail in almost every respect except for chat integration and email conversation grouping. Yahoo’s feature set includes disposable email addresses, drag and drop, and tabbed viewing. As Lunn notes, the potential for monetization is there, both in displaying standard contextual ads as well as the option to pay Yahoo $20/year for increased storage and ad-free viewing. But what about email search?

    Yahoo’s email search is truly innovative. When you type a search term, a separate pane opens up and gives you additional search-refinement options. Click on the thumbnail below to see how it works:

    [thumbnail: Yahoo mail search]

    Here’s a closeup of that search pane:

    [screenshot: Yahoo mail search pane]

    It’s amazing how functional and useful this is after a while. It’s also easy to see how this could be a vector for additional monetization: Yahoo could place ads below the preview pane and search-specific ad results in the search refinement pane, even for paying customers like me. (Free Yahoo mail puts ads at the top of the page and inserts text in the footer of outgoing mail, but obviously this hasn’t impacted their market share.)

    And as for integrated chat, since MS messenger and Yahoo Messenger already talk to each other, we can expect that the mail client won’t be static on that front either.

    So, 49% and 56% indeed. It’s not hard to see why Microsoft is going after Yahoo, or why Google is afraid.

  • Semantic authoring

    RWW argues that for the Semantic Web to really take off, content-management systems need to incorporate semantic markup. They argue,

    Allowing authors or readers to add tags to articles or posts allows a measure of classification, but it does not capture the true semantic essence of the document. Automated Semantic Parsing (especially within a given domain) is on the way – a la Spock, twine and Powerset – but it is currently limited in scope and needs a lot of computing power; in addition, if we could put the proper tools in the authors’ hands in the first place, extracting the semantic meaning would be so much easier.

    For example, imagine that you are building an online repository of content, using paid expert authors or community collaboration, to create a large number of similar records – say, a cookbook of recipes, a stack of electrical circuit designs, or something similar. Naturally, you would want to create domain-specific semantic knowledge of your stack at the same time, so that you can classify and search for content in a variety of ways, including by using intelligent queries.

    Ideally, the authors would create the content as meaningful XML text, so that parsing the semantics would be much easier. A side benefit is that this content can then be easily published in a variety of ways and there would be SEO benefits as well, if search engines could understand it more easily. But tools that create such XML, and yet are natural and easy for authors to use, don’t appear to be on their way; and the creation of a custom tool for each individual domain seems a difficult and expensive proposition.

    The problem with XML authoring, as the author notes, is that it’s too time-consuming from a user perspective. You’re basically requiring that the user fill out a detailed, unique form on every post or content node.

    What’s really needed is a way for the CMS to prefill semantic data for the user, and then let the user tweak it. The prefill would have to come from contextual information (post title keywords, word-frequency analysis, link text) and metadata (category, tags). In effect you’d have a mini search-engine index running against your own post, giving you “search results” that let you “rank” the sub-content into a structured form. And even then, take pains to hide the XML-ness; instead of showing the user a pile of confusing <blah>blahblah</blah> XML markup, it should provide a cleaner view like

    drink: triple latte
    cost: $4.50
    opinion: sucks, overpriced

    where of course the labels (drink, cost, opinion) are mapped to the actual XML containers <drink></drink> etc. The user can edit the list easily, inserting or deleting labels as they choose, and then hit publish.
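    The label-to-container mapping is mechanical enough to sketch in a few lines of Python. This is just an illustration of the idea, not any real CMS’s implementation, and it assumes each label is a valid XML tag name:

```python
import xml.etree.ElementTree as ET

def labels_to_xml(text, root_tag="entry"):
    """Turn 'label: value' lines (the friendly editing view) into the
    XML containers the CMS would actually store, e.g. <drink>...</drink>."""
    root = ET.Element(root_tag)
    for line in text.strip().splitlines():
        label, _, value = line.partition(":")
        child = ET.SubElement(root, label.strip())
        child.text = value.strip()
    return ET.tostring(root, encoding="unicode")

friendly = """\
drink: triple latte
cost: $4.50
opinion: sucks, overpriced
"""
print(labels_to_xml(friendly))
# <entry><drink>triple latte</drink><cost>$4.50</cost><opinion>sucks, overpriced</opinion></entry>
```

    The reverse direction (XML back to the friendly list) is equally trivial, which is exactly why the user never needs to see the markup itself.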

    To achieve this, you need good metadata. By good, I mean “rich” – it should be noted that tagging alone is actually pretty poor as far as metadata goes because it’s usually only a taxonomy imposed by the author, not a true folksonomy. The advantage of the latter is that the metadata is more variable, giving any semantic algorithm more room to play with. Note that tagging as implemented in WordPress is not a true folksonomy, though a plugin now exists to rectify that. Semantic algorithms will starve on taxonomies alone.

  • social search skepticism

    Last summer, there was a dust-up between several high-profile “web 2.0” personalities that made for interesting reading. It started with Robert Scoble, who created a three-part video essay provocatively titled “Why Mahalo, TechMeme, and Facebook are going to kick Google’s butt in four years“. Scoble is obsessed with the idea that search engine optimization (SEO) is poisoning the well of search and that adding the “social” element will magically improve relevance. Dave Winer had a fairly succinct rebuttal, Danny Sullivan took issue with Scoble’s explicit equating of SEO with spam, and Rand Fishkin stepped through Scoble’s arguments and fact-checked them to oblivion.

    Overall, I came away from the fracas convinced that social networking is no magic bullet, and that “how do I find information” and “who do I want to interact with” are wholly separate problems. I like a walled garden for my identity-driven personal and professional interactions, but I also want to wander in the wild when need be. It’s the same reason I am skeptical of “personalized search” services like Google’s own “Web History” initiative. It’s not the privacy issues that worry me, but rather the imposed limitation of basing what my search results are on what my searches were in the past. Why should I assume that, for a given search, the most relevant results will necessarily be related to the searches I previously made? Presumably I search for something because I need new information I do not currently possess.

    The same argument applies to “social search” initiatives like Delver, recently praised by RWW as “more personal and meaningful to users than a generic search using ‘normal’ search engine.” Why is a filter derived from my social graph any guarantee of more relevance to my query?

    If anything, the onus is on the user to craft a better query; to that end Google offers an advanced set of search operators that provide tremendous power and flexibility. Every search is unique, and no amount of personalization or social networking is going to change that fact. The right approach is to let a search stimulate new searches, i.e., to ask new questions rather than spoonfeed me old answers.
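    A few of those operators, for the curious (these are standard Google query operators; the example terms are made up):

```text
site:example.com "exact phrase"    restrict results to one domain
filetype:pdf dunbar number         return only PDF documents
intitle:folksonomy -wordpress      require a title word, exclude a term
cloudware OR "flex computing"      match either phrase
```

    A well-crafted query like these routinely beats any amount of behind-the-scenes personalization, because the user is stating intent explicitly rather than having it guessed.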

  • wordpress folksonomy progress

    The experiment of adding Scott’s WP_Folksonomy plugin to my blog has been a success so far. My blog, haibane.info, is by no means a giant traffic draw, but it gets enough visitors that the userbase has been adding some tags of their own. I have at least one user (Scott himself?) who reliably adds tags to most posts, and there has been some drive-by tagging as well. It’s also encouraging that there was a thread at the WordPress support forum asking about folksonomy; I directed them to the plugin asap. Now a search for the term “folksonomy” will lead people to the same tool, and thus the seeds are sown for more people to use it. Let’s hope that many more blogs, preferably far larger than mine, embrace and adopt folksonomy this year.

  • Foldershare

    I’m starting a new category, called “cloudware”, which is how I intend to refer to software that runs in the cloud. This will be my way of documenting what cloudware I actually use and find useful.

    Fitting, then, that the first entry here is Foldershare, a beta service from Microsoft that is stunningly simple in its execution. It’s basically a P2P client that runs on your own machines and synchronizes files across them in any folder(s) you specify. It does require a small client download on each PC, but the footprint is quite small (on my Asus EEE, it’s taking up about 10 MB of RAM). Once the client is installed on each PC you want to sync, all configuration is done via the Foldershare website. You can also sync files between yourself and other people, permitting collaborative work.
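    The core “keep the latest version everywhere” idea is easy to sketch, even though Foldershare’s actual two-way P2P protocol is proprietary and far more sophisticated. A toy one-way version in Python, copying any file that is missing or newer (purely illustrative, ignoring subfolders and conflicts):

```python
import os
import shutil

def sync_newer(src, dst):
    """Toy one-way sync: copy any file from src that is missing in dst
    or has a newer modification time than dst's copy."""
    copied = []
    for name in os.listdir(src):
        s, d = os.path.join(src, name), os.path.join(dst, name)
        if not os.path.isfile(s):
            continue  # this sketch skips subfolders
        if not os.path.exists(d) or os.path.getmtime(s) > os.path.getmtime(d):
            shutil.copy2(s, d)  # copy2 preserves the timestamp
            copied.append(name)
    return copied
```

    Run it in both directions between two machines and you get a crude approximation of what Foldershare does continuously in the background.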

    I think the idea is effective because it treats the PC as part of the cloud rather than just a thin client. P2P implicitly assumes that the important content is at those end-nodes, ie the users’ PCs, and not intrinsic to the cloud. Using P2P in this very specific, very focused way is simply brilliant.

    It should be noted that they’ve had some hiccups, but hopefully that’s behind them 🙂 I am using it right now for a folder containing a manuscript in draft and it’s incredibly empowering for me to be able to sit at either computer and just start working. The files are even available online if you’re away from your client PCs.

    In one sense a purely cloud-based application like Google Docs obviates the need to keep files in sync. However, cloud-based productivity apps are still orders of magnitude behind their desktop equivalents. Even OpenOffice still doesn’t suffice for my needs compared to Microsoft Word. There’s simply no way (yet) to replicate the productivity of working on your home PC by working exclusively in the cloud. This is why Foldershare is so interesting – it lets you work as you normally would, but augments that by letting you tap into the distributed nature of the cloud. It’s the best of both worlds, and until pure cloudware catches up to regular software in terms of functionality, it’s going to be a better solution than working exclusively online.