Tag: blogging

  • beyond the tag cloud: the tagdex

    I think tag clouds are somewhat useless, to be honest. They are a nice way to fill up a bit of space in a sidebar, if you restrict the cloud to the top 25 or so, but unless the writer is imposing a strict taxonomy on themselves, ultimately the size of the cloud will balloon to an unmanageable size. And a tag cloud in a folksonomy makes no sense, because the wide variation in tags is a feature, not a bug. You want the tags to be vast and redundant. It is ok to have a post about Jhumpa Lahiri’s latest novel tagged “book”, “books”, “review”, “Lahiri”, etc. because this increases the points of entry to the content from tag indexing services like technorati, and also increases the intra-blog, inter-post linkages (assuming you are using some variant of a Related Posts plugin that uses tags for determining what is related).

    A far better way to think of tags is to consider them as terms in an index. The same kind of index you find at the end of a piece of non-fiction, to be specific. Consider an excerpt from the Index to the book, The Physics of Star Trek, as an example:

    [Image: excerpt from the second page of the index to The Physics of Star Trek]

    It’s easy to see how tags could be recruited to “build” an index of this type. The tags would first need to be sorted in alphabetical order, and then listed as a DL-type HTML list with the “page number” (post number). A range of posts could be indicated by the usual dash (e.g. Bosons, 192-194) and a list of separate posts by commas (Black Star, 15, 51).
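
    As a rough illustration of that crude version, here is a minimal Python sketch. It assumes the posts are available as a plain mapping of post number to tags; the function names (collapse_runs, build_tagdex, render_dl) are made up for this example and are not part of WordPress or any plugin.

```python
import html
from collections import defaultdict


def _format_run(start, end):
    """Format one run of post numbers: a single number or a dashed range."""
    return str(start) if start == end else f"{start}-{end}"


def collapse_runs(numbers):
    """Collapse consecutive post numbers into ranges, book-index style."""
    parts = []
    run_start = prev = None
    for n in sorted(set(numbers)):
        if prev is not None and n == prev + 1:
            prev = n  # still inside the current run
            continue
        if run_start is not None:
            parts.append(_format_run(run_start, prev))
        run_start = prev = n
    if run_start is not None:
        parts.append(_format_run(run_start, prev))
    return parts


def build_tagdex(posts):
    """posts: dict mapping post number -> iterable of tags (assumed shape)."""
    index = defaultdict(list)
    for post_id, tags in posts.items():
        for tag in tags:
            index[tag].append(post_id)
    # Alphabetical order, ignoring case, as in a printed index.
    return [
        (tag, ", ".join(collapse_runs(index[tag])))
        for tag in sorted(index, key=str.casefold)
    ]


def render_dl(entries):
    """Render the entries as a DL-type HTML list."""
    lines = ["<dl>"]
    for tag, refs in entries:
        lines.append(f"  <dt>{html.escape(tag)}</dt><dd>{refs}</dd>")
    lines.append("</dl>")
    return "\n".join(lines)
```

    In a real blog the post numbers would presumably be links to the posts, and consecutive post IDs are not guaranteed to be consecutive posts, so the dash notation is only as meaningful as the numbering scheme behind it.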

    That would be the crudest implementation, but quite effective. However you could go further than this. For example, what about the “see also” link? You could simulate this by looking for tags whose usage is highly correlated, like “Lahiri” and “books”. You could literally calculate Pearson’s correlation coefficient between all pairs of tags in the database and store that in a lookup table, which would be updated whenever a post is published. Then any pair of tags whose correlation coefficient is above some threshold (say, > 0.50) would get the “See also” treatment on both tags’ entries.
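
    To make the correlation idea concrete, here is a small Python sketch. It assumes each tag is represented as a 0/1 vector over all posts (1 if the post carries the tag), which is my own choice of encoding; the function names and the in-memory dictionary standing in for the lookup table are likewise hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A tag used on every post (or none) has no variance, so the
        # coefficient is undefined; treating it as 0 is an assumption.
        return 0.0
    return cov / sqrt(var_x * var_y)


def see_also_table(posts, threshold=0.5):
    """posts: dict mapping post number -> set of tags (assumed shape).

    Returns a dict mapping each tag to the tags it should cross-reference.
    """
    post_ids = sorted(posts)
    tags = sorted({tag for post_tags in posts.values() for tag in post_tags})
    # One 0/1 usage vector per tag, aligned on the same post order.
    vectors = {
        tag: [1 if tag in posts[pid] else 0 for pid in post_ids] for tag in tags
    }
    table = {tag: [] for tag in tags}
    for a, b in combinations(tags, 2):
        if pearson(vectors[a], vectors[b]) > threshold:
            table[a].append(b)  # "see also" goes on both entries
            table[b].append(a)
    return table
```

    Recomputing every pair on each publish is quadratic in the number of tags, which is probably fine for a single blog but would need something smarter for a large folksonomy.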

    You could even draft categories in WordPress to contribute, by using them as “tags” in their own right and lumping them into the regular index build (after all, as implemented in WordPress, tags and categories are just redundant taxonomic systems). However, you also might look for correlations between tags and categories, and use the categories as Index parent terms. An example from my own geekblog would be something like

    Anime
    Ranma
    Makoto Shinkai
    Someday’s Dreamers
    (…)
    Geek Service
    Asus EEE PC
    HDTV
    Space
    (…)

    I had to manually generate the above but it would be far simpler to do it via correlation analysis instead. At any rate, the basic idea is to assign categories as index headings and tags as their dependents, since presumably categories are more formally taxonomic, and more importantly, fewer. In fact you could do both, treating categories as tags and also giving them higher status as above. You would just need to put a logical test in to exclude a category from appearing as its own parent/child!
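
    One way to sketch the heading assignment in Python is below. Note that it swaps in a simpler technique than full correlation analysis: each tag is filed under the category it co-occurs with most often. The input shape (one category per post) and the function name are assumptions made for the example.

```python
from collections import Counter, defaultdict


def group_tags_under_categories(posts):
    """posts: iterable of (category, tags) pairs, one pair per post (assumed).

    Files each tag under the category it appears with most often; this is
    a co-occurrence count, not a true correlation coefficient.
    """
    cooccurrence = defaultdict(Counter)  # tag -> Counter of categories
    for category, tags in posts:
        for tag in tags:
            cooccurrence[tag][category] += 1

    headings = defaultdict(list)
    for tag, counts in cooccurrence.items():
        # sorted() makes ties resolve alphabetically, so output is stable.
        parent = max(sorted(counts), key=counts.get)
        if tag.casefold() == parent.casefold():
            continue  # the logical test: a category is never its own child
        headings[parent].append(tag)

    return {
        category: sorted(tags, key=str.casefold)
        for category, tags in sorted(headings.items())
    }
```

    A category that is also used as a tag still gets its own flat entry if the regular index build is run separately; the test above only keeps it from being listed beneath itself.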

    Obviously a tag-driven index as above wouldn’t fit in a sidebar. A useful place for it would be its own page, but you might also imagine it embedded on the 404 page. As a standalone, though, it would be a very useful node for search engine optimization, enough so that perhaps it should be called a “tagdex” instead of an index to better distinguish it.

    Though useful to any blogger using tags on WordPress, a tagdex would be far more effective on a site whose tags were a genuine folksonomy rather than a taxonomy, since the tag diversity would be greater. However, folksonomy is not a feature of WordPress, unless you use Scott’s awesome WP-Folksonomy plugin (which he wrote in response to my earlier rant about taxonomies and folksonomies). If a thriving ecosystem of WordPress-based folksonomies can be encouraged (using Scott’s plugin, or equivalent), that will be a significant step towards the Semantic Web. A tagdex represents a coherent snapshot of all the tag metadata in that site’s folksonomy (or taxonomy). As such, it is something that could be parsed and aggregated by the hypothetical Semantic Search Engine of the future.

  • Eliza, 140 characters at a time

    Nick Carr has a fascinating essay in Edge Magazine on the history of ELIZA, the software program that simulated intelligence, and its creator Joseph Weizenbaum, who passed away recently. ELIZA represents the frontier that computer science must cross if it is someday to arrive at true intelligence – ELIZA itself is merely artificial, but still certainly intelligent enough to have fooled a lot of (presumably) genuine intelligences.

    However, in one of those weird quirks of computer life, Nick appears to have also posted the entire essay about ELIZA onto his Twitter account. Given Twitter’s 140 character limit, this means that almost every sentence in the essay was posted as its own tweet. The effect is strangely hypnotic. Whereas reading the original at Edge gives a sense of cohesion and narrative, reading it on Twitter makes it discrete and disjointed, even though the sentences are still adjacent.

    In a way, the context of the content affects its meaning. Is that a limitation of our brains? Or of the medium?

  • upgraded to WordPress 2.5

    Thus far everything seems to be working smoothly. All my plugins are working fine as far as I can see (knock on wood). The new layout and aesthetics are quite clean and nice. I also like the widgetized Dashboard. I’ll just jump in, start using it, and see what my thoughts are after a week.

  • instant blog carnivals

    Aziz’s recent post mentioned how blog carnivals allow blogs in the long lonely tail to bootstrap their readership and links. He mentions his real-time Carnival of Brass as a possible improvement. This is a pretty funny coincidence since I’ve been meaning to write about a similar science post aggregator since he invited me to do some guest posts (quite a while ago now [I don’t have many meta thoughts apparently]).

    Anyway, the site I was going to mention is ResearchBlogging.org. It’s a real-time carnival (although I don’t think they call it that) that aggregates posts from people blogging about peer-reviewed science (e.g. articles in Nature, Science or other journals). I hadn’t run into any real-time carnivals before so I thought it was a pretty nice way to let small blogs on specific topics get some attention.

    I thought it was kind of interesting how the two instant carnivals implemented things differently (if you don’t, you could skip this part). CoB uses del.icio.us to let anyone (even people without a blog) add links to the carnival. Pretty cleverly it does this using only the features of del.icio.us and without any programming. RB.org on the other hand programmed up their own web site and aggregator. Blog authors have to come to the website, submit the paper they are reviewing and copy and paste a custom tag into their blog entry. Their aggregator then watches that blog’s RSS feed for the custom tag to appear. Although this sounds like a big hassle, RB.org does manage to turn it into a benefit by looking up and formatting all the bibliographic information for the article. It must have taken a good bit of work to set up but they do have full control of the system and can add in things like validating articles.
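
    I don’t know how RB.org’s aggregator is actually written, but the “watch the feed for a custom tag” step can be sketched in a few lines of Python. Everything specific here is an assumption: the marker string is invented (it is not ResearchBlogging’s real tag), the feed is taken to be plain RSS 2.0 with the post body in each item’s description, and fetching the feed over the network is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical marker; the real service's custom tag is not reproduced here.
MARKER = "rb-carnival-entry"


def find_marked_items(rss_text, marker=MARKER):
    """Return (title, link) for every RSS item whose body contains the marker.

    rss_text is the raw XML of an RSS 2.0 feed, already downloaded.
    """
    root = ET.fromstring(rss_text)
    hits = []
    for item in root.iter("item"):
        body = item.findtext("description", default="")
        if marker in body:
            title = item.findtext("title", default="")
            link = item.findtext("link", default="")
            hits.append((title, link))
    return hits
```

    An aggregator along these lines would poll each registered blog’s feed on a schedule and remember which links it has already seen; the del.icio.us approach gets the same effect with no code at all, at the cost of the bibliographic lookup.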

    Instant blog carnivals, whether custom-coded or taking advantage of del.icio.us, look like a good way to connect long tail bloggers to audiences specifically interested in their topic. I wonder what other real-time carnivals are out there?

  • blogging for dollars

    Michael Arrington advises bloggers to turn down venture capital buyouts of their blogs. I don’t think his advice – sound as it may be for the bloggers at his level – really has any bearing on blogs in the long tail, which is of course where most blogs (and TechCrunch readers) are. While I don’t have much comment on the dynamics of money and politics that he describes, the following did leap out at me as somewhat relevant to bloggers of more humble station:

    When you stop seeing other blogs as people you admire and want to discuss things with, and start to see them as your competitor, your brain shifts and you stop linking the way you had previously.

    Luckily, the newbie bloggers are there to fill in the links when they’re needed. That’s why, if you are a mid-level blogger, you are likely courted by the bigger blogs looking to get your support. If you know what’s going on and are willing to play the game, you can see your blog rise very, very quickly. Choose the wrong blog, though, and you may find yourself alone and lonely in your forgotten blog.

    As an aside, when I see a young but promising blogger, I’ll start linking to him or her constantly to build them up (others, like Winer, Scoble, Jarvis and Rubel did that for me). The goal is to help move them up to a position of influence as quickly as possible.

    The problem here is that even if every A and B list blogger were to pick a handful of blogs to promote, the result is simply vaulting those blogs into the B and C list. An ecosystem develops in which the top tier relies on the second tier as a filter for news, info, and blog topics, and the second tier relies on the third, etc. so that you have a constant filtration system going on. By the time the process completes, you have only homogenized news at the top tier (which is where the vast majority of blog readers spend their time).

    There really is no way for a truly diverse churn of ideas to filter to the top because of this structure. What’s needed instead is for the long tail to become more self-organizing. One of the strongest tools in the toolbox is the blog carnival, which operates as a link exchange. I took the idea of a blog carnival further, actually, and launched a “real-time” carnival for the Muslim blogosphere called the Carnival of Brass. The point here is to use social bookmarking technology from del.icio.us to create a “badge” that adds new links constantly. I describe the idea in more detail in the Carnival of Brass FAQ and there is no reason that a similar system would not be effective in the techsphere, otakusphere, or any other niche blog community.

    Ultimately, a newbie blogger (like yours truly) isn’t going to make it to the big leagues without an A list sponsor. And that solution doesn’t scale. Rather than chase after the A list traffic, and the big money at the top, the best route to blog success is to grow your audience from within your niche, mining the long tail for eyeballs. Slow and steady over a period of years will definitely bring results, and perhaps not a windfall valuation but certainly increasing and steady income from ads and affiliate programs. That’s the reality for most of us, though watching the blog gods up on Olympus certainly makes for fine entertainment.

  • WP 2.3.3 does not close injection spam loophole

    Over a month ago, I’d upgraded to WordPress v2.3.3 which addressed a security hole that was permitting spammers to “inject” spammy links directly into posts via xmlrpc.php, and thereby avoid the “nofollow” attribute that is automatically applied to links in comments (to deprive comment spammers of the PageRank mojo they seek). The spam was surrounded by “noscript” HTML tags, which meant that they were invisible in the browser, thus hiding the links from detection and removal. However, subscribers to the blog feed can see the spam since RSS readers ignore javascript markup.
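
    Since the injected links never show up in the browser, the practical way to catch them is to scan the stored post content directly. The Python sketch below is one hedged way to do that: it flags any noscript block that contains a link. The function name is made up, the post HTML is assumed to be already pulled out of the database as a string, and regular expressions are only a rough stand-in for a real HTML parser.

```python
import re

# Matches a whole <noscript>...</noscript> block, case-insensitively,
# across line breaks.
NOSCRIPT_BLOCK = re.compile(
    r"<noscript\b[^>]*>(.*?)</noscript>", re.IGNORECASE | re.DOTALL
)
# Matches the start of an anchor tag that carries an href attribute.
LINK = re.compile(r"<a\b[^>]*\bhref\s*=", re.IGNORECASE)


def find_hidden_links(post_html):
    """Return every noscript block in post_html that contains a link."""
    return [
        match.group(0)
        for match in NOSCRIPT_BLOCK.finditer(post_html)
        if LINK.search(match.group(1))
    ]
```

    Running something like this over every post after an upgrade would show whether old injections are still sitting in the database, though it says nothing about how they got there.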

    However, on my latest post at my geekblog, I was hit by the injection spam again. I have sent the following email to WordPress security (security @ wordpress.org):

    Hello,

    I have a WordPress blog at domain http://haibane.info which was upgraded to 2.3.3 as soon as the security release came out last month. I had experienced the injection spam attack detailed here:

    http://wordpress.org/support/topic/151368

    and upgraded to 2.3.3, but on my most recent post I have seen the same spam attack occur. The post is here:

    Google 42

    and I have already removed the injection spam, but am reprinting it below :

    <noscript><a href="http://www.casinomejor. es/casino-online- basico.html">casino online</a> mirar sus oponentes hábitos.</noscript>

    <noscript>Il <a href="http://www.qualitapoker .com/neteller-game-poker.html">http://www.qualitapoker .com/neteller-game- poker.html</a> è un gioco di carte.</noscript>

    (there were two separate injections into the same post)

    I am disabling user registration as a precautionary measure but it is clear that the 2.3.3 release did not solve the problem.

    I recommend closing user registration on all WP blogs for the time being. Peter’s captcha plugins make user registration obsolete for commenting, anyway.

  • why did MT lose and WP win?

    ma.tt responds to Anil Dash by pointing out that WordPress is fully open source:

    WordPress is 100% open source, GPL.

    All plugins in the official directory are GPL or compatible, 100% open source.

    bbPress is 100% GPL.

    WordPress MU is 100% open source, GPL, and if you wanted you could take it and build your own hosted platform like WordPress.com, like edublogs.org has with over 100,000 blogs.

    There is more GPL stuff on the way, as well. 🙂

    Could you build Typepad or Vox with Movable Type? Probably not, especially since people with more than a few blogs or posts say it grinds to a halt, as Metblogs found before they switched to WordPress.

    Automattic (and other people) can provide full support for GPL software, which is the single license everything we support is under. Movable Type has 8 different licenses and the “open source” one doesn’t allow any support. The community around WordPress is amazing and most people find it more than adequate for their support needs.

    Movable Type, which is Six Apart’s only Open Source product line now that they’ve dumped Livejournal, doesn’t even have a public bug tracker, even though they announced it going OS over 9 months ago!

    I think that this gets to the heart of why WP is so successful. WP vs MT is almost a case study of the Cathedral vs the Bazaar. Were Six Apart to fully embrace the open source model, as WP has done, they would of course lose the revenue stream from licensing, but the absence of that stream hasn’t exactly inhibited Automattic ($29.5 million in the latest round…). Matt alludes to the MT3 debacle, which really was a betrayal of MT’s until-then loyal userbase. It came down to simply money; in an era where the best things in (computing) life are free, Six Apart seems determined to charge. And that’s been the thing holding them back. Technology alone isn’t enough, you have to address the user model. That is what MT has failed and seems to continue to fail to do.

  • blog CMS infrastructure

    Movable Type is making a play for WordPress users to “upgrade”, with Anil Dash firing a broadshot across Automattic’s port side. Dash makes some good points but fails to articulate a compelling reason to switch, primarily because his basic premise is flawed: that WordPress is hard to upgrade and that its architecture is an impediment to ordinary users who seek to extend its functionality or implement their own style and design.

    Probably the single biggest reason for WP’s success is the one-click install and one-click upgrade offered by Dreamhost and other web host companies. I can literally set up a WP blog for anyone in less than 3 minutes. Most of that time is post-install customization, as well. The plugin ecosystem is far more vibrant on the WP side than MT, and the proliferation of styles and themes means that the end user need only choose from a bounty of available options if they don’t want to tinker on their own – but tinkering is also very, very easy since the various files can be edited directly from within the online administration pages.

    Where MT should focus its poaching efforts is as a competitor to WordPress MU. Thus far, WP-MU remains a complex and daunting installation and maintenance is not simple. However, MU is still attractive, especially because of the new BuddyPress functionality that will turn all MU users on a given install into an instant social network. What MT needs to do to grow is not to try and convince the end users with their own WP blogs, but try to create a full-fledged blog ecosystem like WordPress.com, and attract users to their platform there. Typepad, built on the previous iteration of MT3, is simply inadequate as a competitor to WordPress.com-hosted free blogs. By providing a new umbrella site for free blogs, MT can build the user base to the critical mass required for increased power user adoption. As things stand, I simply have no incentive to try MT4, and Anil’s PR attempt falls flat since frankly he’s attacking a straw man of WordPress rather than the reality which I deal with every day.

    In a few days, I will log into my Dreamhost panel and upgrade my blogs to WordPress 2.5. WP is a moving target. MT4 needs to catch up and then stay abreast. Until it’s as easy for me to install and upgrade MT as it is WP, they aren’t even close.

  • Semantic authoring

    RWW argues that for the Semantic Web to really take off, content-management systems need to incorporate semantic markup. They argue,

    Allowing authors or readers to add tags to articles or posts allows a measure of classification, but it does not capture the true semantic essence of the document. Automated Semantic Parsing (especially within a given domain) is on the way – a la Spock, twine and Powerset – but it is currently limited in scope and needs a lot of computing power; in addition, if we could put the proper tools in the authors’ hands in the first place, extracting the semantic meaning would be so much easier.

    For example, imagine that you are building an online repository of content, using paid expert authors or community collaboration, to create a large number of similar records – say, a cookbook of recipes, a stack of electrical circuit designs, or something similar. Naturally, you would want to create domain-specific semantic knowledge of your stack at the same time, so that you can classify and search for content in a variety of ways, including by using intelligent queries.

    Ideally, the authors would create the content as meaningful XML text, so that parsing the semantics would be much easier. A side benefit is that this content can then be easily published in a variety of ways and there would be SEO benefits as well, if search engines could understand it more easily. But tools that create such XML, and yet are natural and easy for authors to use, don’t appear to be on their way; and the creation of a custom tool for each individual domain seems a difficult and expensive proposition.

    The problem with XML authoring, as the author notes, is that it’s too time-consuming from a user perspective. You’re basically requiring that the user fill out a detailed, unique form on every post or content node.

    What’s really needed is a way for the CMS to prefill semantic data for the user, and then let the user tweak it. The prefill would have to come from contextual information (post title keywords, word frequency analysis, link text) and metadata (category, tags). In a way you have a mini-search engine index running against your own post, and giving you “search results” to let you “rank” the sub-content into a structured form. And even then, take pains to hide the XML-ness; instead of showing the user a pile of confusing <blah>blahblah</blah> xml markup, it should provide a cleaner view like

    drink: triple latte
    cost: $4.50
    opinion: sucks, overpriced

    where of course the labels (drink, cost, opinion) are mapped to the actual XML containers <drink></drink> etc. The user can edit the list easily, insert or delete labels as they choose, and then hit publish.
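
    The mapping from the friendly label list to XML containers is simple enough to sketch in Python. This is only an illustration under assumptions of my own: the wrapper element name (“entry”) and the function name are invented, and labels are taken to be usable as XML element names once spaces are replaced with underscores.

```python
import xml.etree.ElementTree as ET


def fields_to_xml(text, root_tag="entry"):
    """Turn 'label: value' lines into XML containers under one root element.

    root_tag is a hypothetical wrapper name; the CMS would pick its own.
    """
    root = ET.Element(root_tag)
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        label, separator, value = line.partition(":")
        if not separator:
            continue  # skip lines that are not in 'label: value' form
        # Spaces are not allowed in element names, so swap them out.
        child = ET.SubElement(root, label.strip().replace(" ", "_"))
        child.text = value.strip()
    return ET.tostring(root, encoding="unicode")
```

    Fed the three lines above, this produces an entry element wrapping drink, cost and opinion containers. ElementTree does not check that a label is a legal XML name (one starting with a digit, say), so a real editor would need to validate or normalize labels before publishing.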

    To achieve this, you need good metadata. By good, I mean “rich” – it should be noted that tagging alone is actually pretty poor as far as metadata goes because it’s usually only a taxonomy imposed by the author, not a true folksonomy. The advantage of the latter is that the metadata is more variable, giving any semantic algorithm more room to play with. Note that tagging as implemented in WordPress is not a true folksonomy, though a plugin now exists to rectify that. Semantic algorithms will starve on taxonomies alone.

  • BlogFuse

    TechCrunch had a cool contest for free, lifetime Pro memberships to BlogFuse, a new service that lets you share your blog entries on Facebook. I tried several times to leave this comment in hopes of winning, but to no avail; it seems I am running afoul of their Akismet filter. So, for posterity, I reproduce it here.

    The accursed TechCrunch blog software refuses to acknowledge the superiority of my links, hence I am reposting my comment without hyperlink goodness. I am confident that the sheer wit of my blog names and URLs will be sufficient to entice readers to copy and paste rather than click. Who can resist the siren call of righteousness?

    The reason I should permit TechCrunch to burden me with a Premium BlogFuse account for life is because I am, through the sheer acumen of my blogging-fu, attempting to save the world. As a matter of principle, I am obligated to encourage any and all who wish to also save the world to join me in my crusade to lift humanity above itself, and thus in alliance achieve even greater feats. Yes, I am the heavy lifter with the sheer weight and ponderousness of my prose, but as even the mouse carries one straw, so is the camel’s burden lighter and its back saved.

    Behold, TechCrunch, what a mighty engine of progress and salvation to which you shall ally yourselves! For at Haibane.info (www.haibane.info), I blog about all things Geek, Anime, and Art, and thus train the world to accept the timorous intellectual as the true saviours and warriors of civilization against the mindnumbing horde of paris-hilton worshipping, reality TV enslaved dullards. At Nation-Building blog (dean2004.blogspot.com), formerly Dean Nation, I preach the righteousness of liberalism and purple politics, seeking to uplift our political discourse from the divisiveness of the partisan hacks who do the pundit rounds for their own gain rather than any allegiance to our Republic. And at City of Brass (cityofbrass.blogspot.com), I am at the vanguard of the battle against Islamic Fundamentalism, turning the twisted ijtihad of the Reavers against them and showing them the true power of Islam which shall obliterate their puny violence with purity and light of Truth. I even blog about Blogging itself at metaBlog (www.metablog.us) so that others may be inspired and take up verbal arms themselves on the great plain of Debate.

    In summary, I am a busy man, and I have a world to save. However, I’ll be happy to take the BlogFuse account off your hands. Should come in handy. Plus, why leave the legions of Facebook bereft of my wisdom and leadership?