January 28, 2007

Comments

Mike

I did a little poking around with the map app in Book Search based on your earlier post. Being an O'Brian fan, I did a search on the Aubrey/Maturin books. As it turns out, there's a location in Venezuela named "Maturin." The Google search also finds Berkeley, California, in a passage about "a young attaché called Berkeley..."

Don't get me wrong, I see a real use for this thing. We just have to remember its limitations.
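To make the failure mode concrete, here is a minimal sketch of a naive gazetteer lookup. It is not how Book Search works; the `GAZETTEER` table and the `naive_place_matches` function are hypothetical names made up for illustration. It simply shows how matching capitalised words against a list of place names flags surnames as locations.

```python
import re

# Tiny hypothetical gazetteer for illustration only; a real one holds millions of entries.
GAZETTEER = {"Maturin": "Venezuela", "Berkeley": "California, USA"}

def naive_place_matches(text):
    """Return (word, region) for every capitalised word that appears in the gazetteer."""
    words = re.findall(r"[A-Z][a-z]+", text)
    return [(w, GAZETTEER[w]) for w in words if w in GAZETTEER]

# The attaché's surname is reported as a city, because nothing checks context.
print(naive_place_matches("a young attaché called Berkeley"))
```

Without some notion of context (a preceding "called", "Mr", "Dr", and so on), a lookup like this has no way to tell a person from a place.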

Tom Carden

This reminds me of another project, GutenKarte, which maps place names found in free books using open source tools and MetaCarta's API: http://gutenkarte.org/

They have a tool here which will attempt to do the same for any web page: http://labs.metacarta.com/PageMapper/

Before Flickr had its own geotags, my colleagues built Mappr to plot photos with placenames in tags onto a map of the USA: http://www.mappr.com/

All these projects suffer from the same problems identified in the post and first comment. Mappr was interesting though because it didn't require a "place" to be blessed by the big geocoding databases - tourist trails like Route 66, or events like Burning Man, could emerge as "places" in their own right.

I wonder if the book search tools will develop in this direction, and also how they will deal with historical locations that no longer exist, or that change name. An exciting challenge!

Nick Johnson

Speaking as someone currently working on a geocoder, it's difficult enough to reliably parse addresses when they're already identified as such, and even when you can assume some vague sort of consistent format will be present. How does one determine that "4th Ave Bypass" has two suffixes ("ave" and "bypass"), while "Lyttelton Close Road" has one? Worse, "Lyttelton Close" could refer to the street called "Lyttelton" with the suffix "close", or it could refer to the street "Lyttelton Close" with the suffix omitted. Address parsing is littered with problems and ambiguities like this, and it only gets worse if you want to recognise addresses in free text.
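The suffix ambiguity can be sketched in a few lines. This is only an illustration under stated assumptions, not the commenter's geocoder: the `SUFFIXES` table and the `candidate_parses` function are hypothetical, and the suffix list is deliberately tiny. The function peels known suffix words off the end of a street name and returns every split that is consistent with the table.

```python
# Hypothetical suffix table for illustration; a real geocoder would need a far larger one.
SUFFIXES = {"ave", "avenue", "bypass", "close", "road", "rd", "st", "street"}

def candidate_parses(street):
    """Return every (name, suffixes) split whose trailing tokens are all known suffixes."""
    tokens = street.split()
    parses = []
    # Peel 0, 1, 2, ... trailing tokens off as suffixes; the name keeps at least one token.
    for n_suffix in range(len(tokens)):
        split_at = len(tokens) - n_suffix
        tail = tokens[split_at:]
        if not all(t.lower() in SUFFIXES for t in tail):
            break
        parses.append((" ".join(tokens[:split_at]), tail))
    return parses

for street in ("4th Ave Bypass", "Lyttelton Close Road", "Lyttelton Close"):
    print(street, "->", candidate_parses(street))
```

Each of these names yields more than one candidate parse, and the suffix table alone cannot rank them. Picking the right one needs outside knowledge, such as a street database for the area.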

The comments to this entry are closed.
