August 10, 2006

Comments

Glenn Fannick

Disambiguation is indeed both an art and a science. Getting it wrong often means drawing misinformation from the data. It's nearly impossible to disambiguate two similar subjects (a movie and a book with the same title, say) unless you use a set of supporting evidence terms. Even that can fall short of 100% recall. Often we find that settling for 90-95% recall is "good enough" for what you're trying to accomplish. Bayesian techniques might get you closer in some instances but further away in others. All in all, I find it quite challenging to separate two things that are so intertwined.
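
To make the "supporting evidence terms" idea concrete, here is a minimal sketch in Python. It is an illustration and not the commenter's actual method: the term lists, the `disambiguate` function, and the `min_hits` threshold are all hypothetical, and a real system would need far richer evidence and tuning.

```python
import re

# Hypothetical evidence terms for two senses of the same title.
# These lists are illustrative assumptions, not taken from any real system.
EVIDENCE = {
    "movie": {"director", "screenplay", "cast", "theater", "trailer"},
    "book": {"author", "novel", "publisher", "chapter", "paperback"},
}

def disambiguate(text, evidence=EVIDENCE, min_hits=1):
    """Return the sense with the most evidence-term hits, or None if unclear."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Count how many tokens in the text match each sense's evidence terms.
    scores = {
        sense: sum(1 for token in tokens if token in terms)
        for sense, terms in evidence.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (best_sense, best_score), (_, second_score) = ranked[0], ranked[1]
    # Refuse to guess when there is too little evidence or the senses tie.
    if best_score < min_hits or best_score == second_score:
        return None
    return best_sense
```

Posts that mention the title with no nearby evidence terms, or with terms from both senses in equal measure, come back as `None`. Those unresolved mentions are one reason recall tends to stall below 100% with this kind of approach.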

