
May 18, 2009

In Search of the Alphram Niche

The launch of Alphram has resulted in many posts regarding its relationship to Google, and its ability to deliver whatever it is meant to deliver. Certainly, I see it as a different beast from traditional search, and I regard the initial period of its release as a time in which users will be looking for the system's natural niche. That search is essentially a search for the intersection of expectations and execution.

In our discussions last night following the opening tutorials for ICWSM, the topic of the carbon footprint of our conference came up. It occurred to me that this would be a perfect Alphram-style query – what is the carbon cost of a flight from Seattle to San Jose? Better yet, compare this with the same journey by train. Given my understanding of the system and having viewed the screencast, my expectation was that this would be where the system could deliver.

Unfortunately, no such luck. I can get it to say something about the distance (seattle san jose), about the time via plane (seattle san jose airplane), and about carbon emissions in general (carbon emissions). The query seattle carbon reveals that there is a town in Mexico called carbon (why would that be preferred to the chemical interpretation?).

Is this an extensibility issue (how will Alphram extend its capabilities to estimate some derived cost of a quantity – e.g. the carbon emissions of a flight of a certain duration)? Or a query interpretation problem (it can do it, but I can’t get the query right)?
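To make the extensibility question concrete, here is a minimal sketch of the kind of derived quantity I had in mind: emissions computed as a distance multiplied by a per-passenger emission factor, then compared across modes of travel. Everything in it is my own assumption rather than anything Alphram exposes: the function names are hypothetical, and the distance and factors are placeholder values chosen only to show the arithmetic, not real measurements.

```python
def trip_emissions_kg(distance_km: float, factor_kg_per_passenger_km: float) -> float:
    """Derived quantity: emissions = distance x per-passenger-km emission factor."""
    if distance_km < 0 or factor_kg_per_passenger_km < 0:
        raise ValueError("distance and factor must be non-negative")
    return distance_km * factor_kg_per_passenger_km


def compare_modes(distance_km: float, factors: dict) -> dict:
    """Apply the same derivation to each mode of travel for one journey."""
    return {
        mode: trip_emissions_kg(distance_km, factor)
        for mode, factor in factors.items()
    }


if __name__ == "__main__":
    # Placeholder inputs (assumptions, NOT real data): substitute a sourced
    # distance and sourced emission factors before drawing any conclusion.
    placeholder_distance_km = 1000.0
    placeholder_factors = {"plane": 0.25, "train": 0.05}

    for mode, kg in compare_modes(placeholder_distance_km, placeholder_factors).items():
        print(f"{mode}: {kg:.1f} kg CO2 per passenger")
```

The point is only that the answer is a simple composition of two facts (a distance and a factor), both of which a system like this would need to hold and know how to combine.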

Interestingly, the obvious search on Google didn’t get a good result, but doing the same on Live Search produced a perfect first result.

November 26, 2008

Zoetrope: That is the Web that Was

Update: Here’s the video

Eytan et al. have been working on Zoetrope for a while, but here is a new article/video that shows off how far they’ve come. Zoetrope is a combination of a browser and a web archive that allows the user to manipulate the temporal point of view of a web page (or parts of a web page).

The Internet contains vast amounts of information, much of it unorganized. But what you see online at any given moment is just a snapshot of the Web as a whole -- many pages change rapidly or disappear completely, and the old data gets lost forever.

"Your browser is really just a window into the Web as it exists today," said Eytan Adar, University of Washington computer science and engineering doctoral student. "When you search for something online, you're only getting today's results."

Now, Adar and his colleagues at UW and Adobe Systems Inc. are grabbing hold of the fleeting Web and storing historical sites that users can easily search using an intuitive application called Zoetrope.

"There are so many ways of finding and manipulating and visualizing data on what we call 'the today Web' that it's kind of amazing that there's no way to do anything similar to the ephemeral Web," said Dan Weld, a UW computer science and engineering professor who also worked on the application. One service, the Internet Archive, has been capturing old versions of Web sites for years, but the records for the stored sites are inconsistent, Weld said. More importantly, there's no easy way to search the archive.

(Note to UW – make your videos trivially embeddable.)

October 09, 2008

Political Streams Released

Take a look.

image

October 01, 2008

DataDepot from Microsoft Research

Briefly, DataDepot is a new site from colleagues at MSR which provides a community platform for the sharing and analysis of temporal data sets. They say:

DataDepot is a set of tools for collaboratively uploading, sharing, and analyzing data. You can use DataDepot to track personal data, to explore public data, and to engage with scientific data.

Take a look!

image

Cf. Swivel and Many Eyes.

September 24, 2008

DoodleBuzz

DoodleBuzz is worth taking a look at. A very different way of interacting with textual data (annotated with concepts).

[thanks to IanG for the tip.]

image

September 08, 2008

The Numerati - Stephen Baker

I've just started in on The Numerati by Stephen Baker. It promises to dive into how our lives expose more and more of our characteristics online, and how those bits are being mined for the benefit of the consumer economy. Having read the first chapter, it looks like Stephen writes in a very narrative and evocative style.

July 09, 2008

Timberpost: Farecast for Stocks

Farecast (recently acquired by Microsoft) does just what you want when making an airplane ticket purchase: it predicts whether the price is going to go up, go down or stay level, and advises when you should buy (now, or wait). Timberpost is a small company founded by Peter Ross and Tim Taylor (Peter was a professor at Edinburgh University when I was there studying AI, and Tim completed his PhD in the same department). Timberpost’s product – TRAITS – takes a crack at a real chestnut of an AI problem: predicting the stock market. The difference with this solution appears to be that it actually does a good job of it.

The graph below shows the performance of a hedge fund run (in simulation) by TRAITS compared against the FTSE EuroFund 300 Index.

image

A recent overview published by Timberpost says:

This portfolio is currently showing an annualised return of +23%, which would rank it 6th out of 200 peer funds according to the latest performance data on real European Long/Short Equity hedge funds published by EuroHedge magazine.

Timberpost describes their technology as follows:

Many machine learning techniques have been applied in finance, including neural nets, genetic algorithms, reinforcement methods and rule induction. We are developing a new approach that is inspired by ideas about how the human immune system functions. Like the immune system, our software can not only discover effective responses to new conditions (in our case, potential trading opportunities), it also adapts to remember past successes in order to be able to re-activate them quickly when conditions change.

In biological systems, recognition happens by molecular binding. In our software, recognition is based on elaborate mathematical expressions that describe features of the behaviour of stocks. The system is designed to be efficient; it can look at many thousands of elaborate expressions per second.

May 20, 2008

Data Mining Blogroll

Sandro Saitta has a nice list (which he promises to maintain) of blogs writing about data mining. It would be convenient to have the OPML available from that page.

August 15, 2007

Rexer Analytics Data Mining Survey

Since I am at KDD (Knowledge Discovery and Data Mining) right now, and have just sat down for a chat with Karl Rexer, I thought it fitting to post a summary that Karl shared of his recent data mining survey:

2007 HIGHLIGHTS:

·   27-item survey of data miners, conducted on-line in early 2007

·   314 responses from individuals in 35 countries

·   Regression, decision trees and cluster analysis were the most commonly used algorithms (mean number of algorithms used: 6.8)

·   Top challenges data miners report are dirty data, data access, and explaining data mining to others

·   SPSS, SPSS Clementine, and SAS are the three most frequently utilized tools (mean number of tools used: 4.5)

·   There is increasing interest in the Oracle Data Mining tool, and decreasing interest in C4.5/C5.0/See5   

·   The primary factors data miners consider when selecting an analytic tool are: 1) the dependability and stability of software, 2) the ability to handle large data sets, and 3) data manipulation capabilities

·   The findings vary somewhat depending on the domain in which the data miner works, the tools used, geography, and several other dimensions

August 01, 2007

KDD 2007 Programme

Briefly, the KDD 2007 Programme is now available - looks very good.
