Another Day, Another Great Post from Alex Iskold on the Structured Web

In his post 30 Thoughts At 30,000 feet, Fred Wilson referred to Alex Iskold as “a freak of nature.”  Fred supports this by saying “He writes code, runs a company, and does amazing blog posts for Read Write Web that are better than most Gartner research reports. I’d ask how he finds the time to do all of that, but I know the answer. Full disclosure – Alex’ company, Adaptive Blue, is a Union Square Ventures portfolio company.”

I’ve gotten to know Alex over the past year (I’m also a small investor in Adaptive Blue) and I’ve begun referring to Alex as “the big brain.”  Almost all of our communication is via email (I see him every now and then) and I’ve started envisioning him as a gigantic brain suspended in some funky liquid with a cable coming out of it that is connected to a computer (if I was a little better with Photoshop I might even draw you a picture of it.)

As Fred says, Alex “does amazing blog posts for Read Write Web that are better than most Gartner research reports.”  I agree and read each of them carefully as part of trying to increase my cognitive reasoning functions around the theme I call The Implicit Web.  Alex’s post today titled The Structured Web – A Primer is another excellent one.  In this post Alex does a great job of linking together four things that I spend plenty of time thinking about – APIs, semantic information, microformats, and RSS. 

Keep them coming, Alex.

  • http://www.adaptiveblue.com Alex Iskold

    Thanks Brad! Well, I decided to reply with a Russian joke on the topic of the “brain suspended in some funky liquid”. Here goes:

    There was a big swimming competition and one of the contestants who showed up was a giant HEAD without a body. People said: “HEAD, what the heck are you thinking? How can you possibly swim?” The head replied: “Do not worry about it, I will be fine”.

    So everyone lined up, and a helper ran down the row and put a swimming cap on everyone. On the mark, all jumped into the water and the HEAD promptly went down. Someone jumped in to save it.

    People gathered around the HEAD and said: “What were you thinking? How can you possibly swim?” The HEAD was very angry. It said: “For 10 years I practiced and learned how to swim with my ears, and then some idiot puts a swimming cap on me?!”

  • http://www.defragcon.com eric norlin

    more thoughts on Alex:

    http://defragcon.com/Blog/?p=146

  • sigma

    Naw, in

    Alex Iskold, ‘The Structured Web – A Primer’, October 10, 2007

    he’s on the wrong side of ‘the semantic gap’.

    We just need to review a little of what we already know really well:

    Yes, in programming a computer, input data in the form of speech is usually really difficult to work with. Handwriting is not much easier. Some text in flat ASCII or HTML is usually MUCH easier. A ‘comma delimited’ file can be still easier. Some XML is usually a LOT easier.

    Still, put in all the objects, object registration hierarchies, APIs, character encodings, data structures, BNF syntax, ASN.1, forms, controls, EAR diagrams, tags, keywords, links, formatting, rules, frames, semantic nets, etc. you want, and it’s still not the ‘meaning’ that is needed by humans on the other side of the ‘semantic gap’.

    All the effort is still just stirring up different pots of goo in the kitchen where little or nothing makes it out to the dining room as something the paying customers want to eat. And, if the cook sticks a copy of the recipe to the side of the pot, then we can know what’s in it and otherwise likely we do not. I.e., in most software, the ‘meaning’ is only in the comments; the code itself doesn’t mean anything.

    Here is a simple example: What does

    a = bc

    mean? Simple: It doesn’t mean anything. So, what does

    F = ma

    mean? Since all we did was change some symbol names, so far we don’t have more meaning than before. To borrow from G. B. Shaw, “Honey, we have already established that it doesn’t mean anything. Now all we are doing is haggling over nonsense.”

    We can be tempted to say that this last equation is Newton’s second law, but here is a big, HUGE point: The only way we can get any meaning from that equation is to assume the usual context as in texts on physics that surround the equation with lots of clear text in a natural language, e.g., English. It’s the English text that has the meaning.

    So far, computers are really BAD at getting meaning from English language text.

    Broadly, what we have to do is start in the dining room. Once we have a VERY clear idea what cooking is needed, only then head for the kitchen. Then when we return to the dining room we should have something good to eat.

    Are there better approaches? Well, at times, yes. E.g., Amazon has made famous

    People who bought X also bought Y and Z.

    The software that reports this, and even the people who wrote the software, don’t have a weak little hollow hint of a tiny clue about the content or meaning of X, Y, or Z, yet this statement can be quite meaningful and useful for the users and profitable for Amazon. Hmm ….
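
    A minimal sketch of how such a statement can be produced from purchase records alone, assuming made-up orders and opaque item labels (this is an illustration of plain co-occurrence counting, not Amazon's actual method):

```python
from collections import Counter, defaultdict
from itertools import permutations


def also_bought(orders):
    """Map each item to a Counter of items that shared an order with it."""
    co_counts = defaultdict(Counter)
    for order in orders:
        # Every ordered pair (x, y) of distinct items in one order
        # counts as one co-purchase of y alongside x.
        for x, y in permutations(set(order), 2):
            co_counts[x][y] += 1
    return co_counts


# Hypothetical orders; the code never looks inside the labels.
orders = [
    ["X", "Y", "Z"],
    ["X", "Y"],
    ["X", "Z"],
    ["Y", "W"],
]

co_counts = also_bought(orders)
top_for_x = sorted(item for item, _ in co_counts["X"].most_common(2))
print("People who bought X also bought", " and ".join(top_for_x))
```

    Nothing in the code knows what X, Y, or Z are; the output comes entirely from counting which labels appear together.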

  • http://voicesage.blogspot.com PaulSweeney

    Agreed, Alex does some of the best overall reviews out there.

  • http://graemethickins.typepad.com Graeme Thickins

    Brad, I totally agree with you. Alex is the man.
    I hope to see him at the Widget Summit next week.
    Alex, will you be there?

    cheers,
    Graeme

  • http://www.adaptiveblue.com Alex Iskold

    @sigma. I am not disagreeing with you at all; you are talking about a different topic.

    First, have you read my posts on difficulties with a classic semantic web? I am saying the same thing.

    Secondly, what I am talking about has to do with structure, not meaning. Structure facilitates data remixing, instead of endless and wasteful parsing.

    Alex

  • http://www.wynnewilliams.com Brian Wynne Williams

    Agreed, Brad. Alex has been on fire lately.

    I’m really looking forward to your panel with Alex at the web conference in DC on November 1st (http://www.tnni07.com). It will be great to have you both sharing ideas at once. Thanks for being there!

  • sigma

    Yes, we’re mostly in agreement:

    I agree that better “structure” helps, including for “remixing”.

    And I agree that some structure can help reduce “parsing”. Indeed, since computers still have nearly zero ‘understanding’ of a natural language, any such parsing has to be at best only a crude extraction of anything meaningful from human readable text. Then, some ‘structure’ can easily give much more accurate data.

    If we differ, then perhaps it is, e.g., for effective ‘remixing’, the importance of, to put it in just one word, ‘meaning’. I claim that effective remixing, so far, usually needs more ‘meaning’ than is provided by ‘structure’ alone.

    For the “semantic web”, when I worked in artificial intelligence I did too much on semantics, frames, etc. I concluded that proponents of those concepts want them to capture meaning but they do not and, instead, are still 100% on the wrong side of the ‘semantic gap’, that is, on the side of syntax and not on the side of meaning.

    E.g., yesterday I typed into a flat ASCII file some Transact SQL for Microsoft’s SQL Server. With all the comments and blank lines removed, I got a grand total of, sit down for this evidence of massive computing, 84 lines. But, with all the comments present so that anyone else, or me six months from now, could read that file and get MEANING from it, it was 668 lines. So, where’s the ‘meaning’, in the nicely ‘structured’ Transact SQL of 84 lines or in the human readable comments, mostly in complete English sentences, in the rest of the 668 lines?

    Why do we need ‘meaning’? We need it especially for ‘remixing’ by someone a year later 1000 miles away. To see why let’s start with your:

    The Road
    Cormac McCarthy
    14.95

    For the work of one programmer and their associates in the same office in the same week, your XML can be fine. But, I claim that, if we are strict, severe, and careful, careful enough, say, for your software to ‘remix’ my medical data, then your XML is essentially equivalent to

    The Road
    Cormac McCarthy
    14.95

    So, I’ve removed the mnemonic identifiers. Then the person doing remixing is missing the meaning needed to make serious use of the data.
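
    A small sketch of that equivalence, assuming hypothetical tag names (`book`, `title`, `author`, `price`, and the anonymous `a` through `d` are stand-ins, not necessarily the markup in Alex's post). To a parser, the two documents carry the same structure and the same values; only the labels differ:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: all tag names here are illustrative assumptions.
MNEMONIC = (
    "<book><title>The Road</title>"
    "<author>Cormac McCarthy</author>"
    "<price>14.95</price></book>"
)
ANONYMOUS = "<a><b>The Road</b><c>Cormac McCarthy</c><d>14.95</d></a>"


def child_values(xml_text):
    """Return the text of each child element, in document order."""
    return [child.text for child in ET.fromstring(xml_text)]


def child_tags(xml_text):
    """Return the tag name of each child element, in document order."""
    return [child.tag for child in ET.fromstring(xml_text)]


mnemonic_values = child_values(MNEMONIC)
anonymous_values = child_values(ANONYMOUS)

# The extracted data is identical; the tags are just different strings.
print(mnemonic_values == anonymous_values)
print(child_tags(MNEMONIC), child_tags(ANONYMOUS))
```

    The software treats `price` and `d` the same way. Whether 14.95 is a price, in which currency, and with or without tax is something only a human reader brings to the tag name.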

    Sure, in computing, in an effort for meaning, it has become common to use mnemonic identifiers in place of complete sentences in a natural language, e.g., English, and, then, require readers to struggle and guess, necessarily with poor accuracy, at the real meaning. For effective remixing, this guessing can’t work. So, really, your original XML with its mnemonic identifiers, alone, without more documentation to provide more meaning, also doesn’t have enough meaning for effective remixing.

    So, the structure alone is short on meaning and is on the wrong side of the semantic gap separating syntax from meaning; to make any reasonably important use of the data, e.g., for remixing, have got to get the meaning elsewhere. And where? From natural language comments in, e.g., English, comments that, if well written, are meaningful to human readers but still nearly useless as input data to software.

    Is there more hope than that? Yes.

    First, we can hope, as the OSI CMIS/P efforts did, that there will be a ‘registration hierarchy’ for such ‘schema’. Then parts of this hierarchy can become well known and, hopefully, a person 1000 miles away one year later will have access to enough meaning to be able to use data, delivered in the syntax of part of the hierarchy, for remixing. The well known part can mean that, by analogy with my SQL example, the 668 lines of comments with the meaning might be written only once in some documentation of the big registration hierarchy and then used many times by data creators and remixers alike. Thusly defined in a standard way, in the physics analogy, F = ma really can be more meaningful than a = bc. I.e., don’t have to write a full physics book explaining F = ma each time it is to be used and, instead, just say “What we mean by F = ma is as in freshman physics.” — because freshman physics is standard in the ‘academic hierarchy’!
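
    A rough sketch of the idea, assuming a toy in-memory registry (the dotted identifiers, the descriptions, and the `describe` helper are all made up for illustration and are not CMIS/P or any real registry). The documentation is written once, and data only has to carry the identifier:

```python
# Hypothetical registry: identifiers and descriptions are invented examples.
REGISTRY = {
    "physics": "Concepts as defined in freshman physics.",
    "physics.mechanics": "Classical mechanics of point masses.",
    "physics.mechanics.newton2": (
        "F = ma: net force F in newtons equals mass m in kilograms "
        "times acceleration a in meters per second squared."
    ),
}


def describe(identifier):
    """Return the documentation from the root of the hierarchy down to identifier."""
    parts = identifier.split(".")
    path = [".".join(parts[: i + 1]) for i in range(len(parts))]
    for node in path:
        if node not in REGISTRY:
            raise KeyError(f"unregistered identifier: {node}")
    return [REGISTRY[node] for node in path]


docs = describe("physics.mechanics.newton2")
for line in docs:
    print(line)
```

    A remixer 1000 miles away who receives data tagged `physics.mechanics.newton2` can look up the English text, while an unregistered tag fails loudly instead of inviting a guess.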

    Second, the Amazon example offers more hope, not really for ‘remixing’ but for crossing the semantic gap.

    Net, to encourage a lot of remixing, have a big object registration hierarchy with some really good English language documentation, all available in one place on the Internet.

    In our work, we could use it! Would this be a place for Feld and Wilson to open their checkbooks? Not for us; we have a much better direction! Naw, just let the W3C do it! Or Google or Microsoft. Of course, for some years the OSI tried to do it with CMIS/P, but, apparently all those four hour lunches with cases of Beaujolais in Paris, Rome, London, Munich, etc. really slowed the work!