
July 03, 2009

Is Your City HyperLocal?

Seattle residents are spoiled for choice when it comes to hyperlocal blogging. In addition to a couple of networks – Next Door Media {My Ballard, PhinneyWood, Magnolia Voice, Queen Anne View, Fremont Universe} and Neighborlogs {Capitol Hill Seattle, Central District News, The Southlake, etc.} – there are a number of independents (e.g. West Seattle Blog). All told, there may be up to 100 blogs which focus on residential issues in and around Seattle.

Is Seattle unusual in its coverage, or are there other places that have a thriving hyperlocal blogging culture? Please comment if your neighborhood is covered.

July 01, 2009

Naughty Feeds

Do you have a naughty feed? Come on, admit it. You deliberately left out the title, or did you put in an empty summary? Maybe you’re the one who doesn’t put in any dates, or perhaps you set the permalink to the home page of your blog. Well – you are a naughty blogger, shame on you!
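For the curious, here is a minimal sketch of how those four sins could be checked mechanically. It assumes each feed entry has already been parsed into a plain dict; the field names (title, summary, published, updated, link) and the lint_entry helper are my own illustrative choices, not the API of any particular feed library.

```python
def lint_entry(entry: dict, home_url: str) -> list:
    """Return the list of 'naughty' problems found in one parsed feed entry.

    The dict keys used here are hypothetical; adapt them to your parser.
    """
    problems = []
    if not (entry.get("title") or "").strip():
        problems.append("missing title")
    if not (entry.get("summary") or "").strip():
        problems.append("empty summary")
    if not (entry.get("published") or entry.get("updated")):
        problems.append("no dates")
    # Compare links without trailing slashes so "/" differences don't matter.
    link = (entry.get("link") or "").rstrip("/")
    if not link or link == home_url.rstrip("/"):
        problems.append("permalink missing or set to the home page")
    return problems


naughty = {"title": "", "summary": "  ", "link": "http://example.com/"}
nice = {
    "title": "Is Your City HyperLocal?",
    "summary": "Seattle has plenty of neighborhood blogs.",
    "published": "2009-07-03",
    "link": "http://example.com/2009/07/hyperlocal.html",
}

naughty_problems = lint_entry(naughty, "http://example.com")
nice_problems = lint_entry(nice, "http://example.com")
print(naughty_problems)
print(nice_problems)
```

A real checker would also want to look at feed-level metadata, but even this much catches the usual offenders.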

Bing haz Twitter

This is very cool. When Bing recognizes a search for a celeb (of the real space or other varieties), it will provide an answer composed of their Twitter identity and recent tweets. Not yet rolled out for me, but the Bing blog has a screen shot (repeated here).

image

June 30, 2009

Free, Blogs

The Blog Herald comments on the Free skirmish between Gladwell and Anderson by pointing out

The blog is alive and kicking, if nothing else but because it is hard to pick critics and arguments to pieces in 140 characters or less.

I wrote quite a bit about the book when it was being formulated. Now, I’m looking for the (free) online version but haven’t found it yet. Chris/Wired is known for erring on the side of sensationalism to move units (as is appropriate in the media business), so I wonder what his investment in this new thesis is.

June 27, 2009

Measure, Don’t Guess – Growth in the Blogosphere

Charles Arthur writes a piece about the slow demise of the blogosphere. Arthur asserts that bloggers are a fading breed, and that

they've all gone to Facebook, and especially Twitter.

Arthur claims to have come to this conclusion via a mixture of anecdotal evidence and data provided by Technorati. Let’s do our own experiment to see if the blogosphere is fading. Let’s take a very mundane search term – one that we expect to be a constant background in the sea of celebrity death buzz, hi-tech launches and liver transplants: ‘car repair’. As we can see from the Blogpulse graph, it is pretty stable with a few blips here and there:

[Image: Blogpulse trend graph for ‘car repair’]

Blogpulse plots the percentage of all blog posts on this topic. If the blogosphere were dying, the absolute counts would also be slowly reducing (even if the percentages were staying the same as the graph shows).

On Jan 4th, 0.026% of posts were on the term ‘car repair’. This translates to 142 posts (Blogpulse allows you to click through to see the number of hits). On June 21st, when 0.027% of posts were on the term, Blogpulse registers 144 hits. OK, I don’t really see any slacking off there. What happens when we look at more data points? If we do this for ‘car repair’ and ‘birthday’ we get the results below. Here I’ve normalized the values by the percentage of posts (count/percent; the trend shows values for 1% of all blog posts, an artifact of dividing by a percentage rather than a fraction). To my eyes, this looks pretty flat – there is a very slight downward trend, but it could easily be in the noise.
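The arithmetic behind that normalization is easy to reproduce. The sketch below uses only the two ‘car repair’ data points quoted above; the function name implied_total_posts is my own, and it multiplies count/percent by 100 so the result reads as an estimate of all posts indexed that day rather than of a 1% slice.

```python
def implied_total_posts(hits: int, percent_of_posts: float) -> float:
    """Estimate total posts indexed on a day from one term's hits and its share.

    percent_of_posts is a percentage (0.026 means 0.026%), not a fraction.
    """
    if percent_of_posts <= 0:
        raise ValueError("percent_of_posts must be positive")
    return hits / (percent_of_posts / 100.0)


# (hits, percent of all posts) for 'car repair', as quoted in the post.
samples = {
    "Jan 4": (142, 0.026),
    "Jun 21": (144, 0.027),
}

estimates = {day: implied_total_posts(h, p) for day, (h, p) in samples.items()}
for day, total in estimates.items():
    print(f"{day}: ~{total:,.0f} posts implied")

# Relative change in the implied size of the blogosphere between the two days.
change = estimates["Jun 21"] / estimates["Jan 4"] - 1
print(f"change: {change:+.1%}")
```

This works out to roughly 546,000 implied posts on Jan 4 and 533,000 on June 21, a drop of a little over 2%. Since the percentages are only reported to two significant figures, a difference of that size is within rounding error for a single pair of points, which is why more data points are needed before calling it a trend.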


image

Was Charles Arthur going for a Wired-esque sensational piece?

The Long Tail of Text Mining

In systems that execute inferences via a pipeline of steps, every step is an opportunity for failure. Therefore, it is imperative that implementers focus attention on the details of every step. For example, in text mining, systems have to

  1. Import and parse documents – did you get the title? did you recognize the footers? did you strip out the page numbers?
  2. Identify sentences and words – is the document in a Latin-alphabet language? are there word separations? are you dealing with acronyms? how is your Unicode-fu?
  3. Provide part of speech tags for the words – is the text an example of the type of data that the POS tagger trained on?
  4. Identify entities – are you prepared to identify unusual names like Barack Obama?
  5. etc.
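To make the point concrete, here is a toy sketch of the first two steps above, where every stage records a warning when its input looks suspicious instead of silently passing it along. Everything here is illustrative: the Doc class, the GENERIC_TITLES set and the regular expressions are my own assumptions and far cruder than anything a production system would use.

```python
import re
from dataclasses import dataclass, field

# Hypothetical set of titles that suggest the parser grabbed boilerplate.
GENERIC_TITLES = {"", "untitled", "home"}


@dataclass
class Doc:
    raw: str
    title: str = ""
    body: str = ""
    sentences: list = field(default_factory=list)
    tokens: list = field(default_factory=list)
    warnings: list = field(default_factory=list)


def parse(doc: Doc) -> Doc:
    lines = [ln.strip() for ln in doc.raw.splitlines() if ln.strip()]
    doc.title = lines[0] if lines else ""
    if doc.title.lower() in GENERIC_TITLES:
        doc.warnings.append("parse: missing or generic title")
    # Treat lines that are only digits as page numbers and drop them.
    doc.body = " ".join(ln for ln in lines[1:] if not ln.isdigit())
    return doc


def split_sentences(doc: Doc) -> Doc:
    # Naive split on whitespace that follows ., ! or ? (acronyms will break it).
    doc.sentences = [s for s in re.split(r"(?<=[.!?])\s+", doc.body) if s]
    if not doc.sentences:
        doc.warnings.append("sentences: none found")
    return doc


def tokenize(doc: Doc) -> Doc:
    doc.tokens = [re.findall(r"\w+", s) for s in doc.sentences]
    if any(not tok.isascii() for sent in doc.tokens for tok in sent):
        doc.warnings.append("tokens: non-ASCII text, check language support")
    return doc


PIPELINE = [parse, split_sentences, tokenize]


def run(raw: str) -> Doc:
    doc = Doc(raw=raw)
    for step in PIPELINE:
        doc = step(doc)
    return doc


doc = run("Untitled\nBarack Obama spoke today. Crowds cheered!\n12\n")
print(doc.warnings)
print(doc.tokens)
```

The design choice worth noting is that warnings travel with the document, so a downstream step (or a human) can see which earlier stage was unsure of itself.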

I’ve seen a couple of attention-to-detail bugs surface in the past few hours. The first was reported by Danny Sullivan, in which Google (and this is still the case at the time of writing) thinks that Michael Jackson the writer is the most salient person with that name.

image

The second is visible on WeSmirch, in which the system fails to identify the title of Lisa Marie Presley’s blog, naming it ‘Create Free Blogs & Online Journals on MySpace Blogs’:

image

Attention to detail will always be a killer feature!

June 26, 2009

Steve Irwin, Michael Jackson

When Steve Irwin, the famous crocodile hunter, was killed by a stingray, 5.5% of the posts on that day in the blogosphere (September 05, 2006) mentioned his name. Yesterday, mentions of ‘Michael Jackson’ topped out around 3%. Things are heading north from there today – currently around 8% – but note that the more immediate statistics come with a higher margin of error due to the lower sample size.

[Image: share of blog posts mentioning ‘Michael Jackson’]

Twitter Trending Terms – Could Do Better

Twitter has oodles of data – millions of tweets a day. They have smart people working on this data, and they make all the right noises about social search. However, I’m looking at the trending topics on the site just now and I see these: MJ’s, Rip MJ, RIP Michael Jackson, Farrah Fawcett, #iranelection, Pop, Thriller, MTV, Iran, #michaeljackson.

I don’t get it. There are a number of problems here:

  1. These aren’t topics, they are words or phrases. There are only 4 topics present here (Michael Jackson’s death, Farrah Fawcett’s death, the Iranian Election and MTV).
  2. The ‘phrases’ present in the terms are pretty lame: ‘MJ’s’? Earlier today the phrase ‘Did Michael Jackson’ was a trending topic.
  3. There is no attempt at normalization (RIP MJ == RIP Michael Jackson)
  4. They actually are not at all interesting – anyone out there not know about Michael Jackson, Farrah Fawcett or Iran? Perhaps the MTV thing is a little more obscure.
  5. They aren’t trending – ok, the RIP stories are, but the Iranian election? that’s been top of mind for many days now.
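Points 1 and 3 are not hard to make a start on. As a rough sketch, the code below folds the ten trending terms listed above into topics using simple string cleanup plus a small alias table. The alias table is hand-built by me purely for illustration (a real system would have to learn these mappings from the data), and nothing here reflects how Twitter actually computes its list.

```python
import re
from collections import defaultdict

# Hypothetical, hand-built alias table: surface variant -> canonical topic.
ALIASES = {
    "mj": "michael jackson",
    "michaeljackson": "michael jackson",
    "thriller": "michael jackson",
    "pop": "michael jackson",
    "iran": "iran election",
    "iranelection": "iran election",
}


def normalize(term: str) -> str:
    """Map a trending term to a canonical topic label."""
    t = term.lower().lstrip("#").strip()
    t = re.sub(r"[’']s\b", "", t)   # drop a trailing possessive
    t = re.sub(r"^rip\s+", "", t)   # drop a leading "RIP"
    return ALIASES.get(t, t)


trending = [
    "MJ’s", "Rip MJ", "RIP Michael Jackson", "Farrah Fawcett",
    "#iranelection", "Pop", "Thriller", "MTV", "Iran", "#michaeljackson",
]

# Group the raw terms under their canonical topic.
topics = defaultdict(list)
for term in trending:
    topics[normalize(term)].append(term)

for topic, terms in topics.items():
    print(f"{topic}: {terms}")
```

With this alias table the ten terms collapse into the four topics counted in point 1, with six of the ten slots going to Michael Jackson alone.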

Twitter’s featuring of these topics, and the immaturity of the technology despite the promise of the data, do not paint a good picture of their prospects.

June 25, 2009

Search User Interfaces: Marti Hearst

Marti Hearst - pre-eminent in the fields of text mining and user interfaces - yesterday published her book on Search User Interfaces. I intend to write up a review of the content later, but for now here’s a summary.

The book, which is available freely online, describes itself thus:

This book presents the state of the art of search interface design, based on both academic research and deployment in commercial systems.


and covers topics including the design and evaluation of interfaces, and visualization of search results.

Coincidentally, the book’s launch is well timed, coming soon after the launch of Bing. A big part of the strategy behind Bing has been to provide a better user experience around search, offering affordances to help the user succeed in their task right there in the interface. Marti's book, which was completed prior to Bing's launch, doesn't cover this new search engine, but I'm sure the second edition will!

Marti is blogging about the book at SearchUpTicious (!)

June 22, 2009

Social Business Design

Kate Niederhoffer and her colleagues have posted about her group’s move into a space they term Social Business Design. In a sense, what they describe is a reaction to the siloing of thinking around social media brought on by the term ‘media’ itself. This is, perhaps, an extension of my dismissal of the term ‘consumer generated media’ due to its origination in the world of marketing and advertising. Kate and friends are making the point that the social part of the future of online ecologies is not just about ‘media’ per se, but about the entire end-to-end, front-to-back dissemination, assimilation and synthesis of information and its integration with business processes and models.

While this is an interesting direction, it reminds me of terms like ‘democratization’. As anyone in a large organization can attest, capitalism doesn’t scale through democracy. Good companies succeed via Philosopher Kings. The idea that social effects will permeate a large organization without accommodating hierarchy is a little far-fetched.

Niederhoffer et al’s thesis centres on four core notions. As Kate puts it:

  • Ecosystem - a community of connections
  • Hivemind - the socially calibrated mindset of individuals
  • Dynamic Signal - the constant multi-faceted means of collaboration
  • Metafilter - a method of finding signals in vast amounts of noise

The latter two are clearly about information flow. Hivemind (which I’ve always taken to mean the aggregate psychological process of a group, not that of an individual moderated by social context) is also about information diffusion and the assimilation of social cues in moderating that information and republishing.

Perhaps the key to understanding what this topic is about is to understand the scale at which it is intended to be implemented. I’m excited to follow where their thinking leads and what impact it will have on reshaping existing businesses and forming new ones.
