The web search industry is making great progress in transitioning from building tools for finding pages and sites to building tools that leverage and surface facts and knowledge. The local search space, where I work in Bing, is founded on structured knowledge: the entity data that represents businesses and other things that necessarily have a location. This data is a core piece of the knowledge space required for this future.
Over the past few years, my team has been working on mining the web for information about local entities. This data now helps to power a significant percentage of local search interactions in a number of countries around the world.
As we have been working on this system, we have come to think deeply not only about how to build systems for web mining, but also about how to construct efficient developer workflows and how to add data management components that take advantage of human input when appropriate.
These processes constitute what I term Agile Web Mining, the core principles of which are: optimize for developer productivity, optimize for data management, and invest in low-latency systems. Much of what we hear about in the industry currently revolves around very large data sets (big data), which often entail long processing times and high-latency interactions. In contrast, we tend to think of our data differently: each data set is relatively small (on the order of the size of a web site), but there are many such small data sets.
We are currently growing our team, so if you are interested in learning more about Agile Web Mining, please get in touch with me.
Does Bing allow for remote developers?
Posted by: Pies | October 31, 2015 at 06:58 PM
Great hat!! Agile Web Mining is something that interests me as a high-level concept, so dial me in; I am ready to start filtering the details. Do you have any resources to suggest that I could use for a deep dive to get acclimated with your perspective before kicking off the dialogue? Thanks for reaching out. Richard
Posted by: RichardASmith | November 04, 2015 at 09:36 AM