I work in local search at Microsoft, which means that, like everyone working in this space, I have to deal with an identity crisis on a daily basis. Currently, most local search products - like Bing's and Google's - leverage multiple data sets to derive a digital model of the world that users can then interact with. In creating this digital model, multiple statements have to be conflated to form a unified representation. This can be extremely challenging for two reasons. Firstly, the system has to decide when two records are intended to denote the same real-world entity. Secondly, the designers of the system have to determine what real-world entities are and how to describe them.
For example, if a business moves, is that the same business, or is it the closure of one and the opening of another? What does it mean to categorize a business? The cafe in Barnes and Noble is branded Starbucks but isn't actually part of the Starbucks chain - should it surface as a separate entity, or is it 'hidden' within the bookshop as an attribute ('has cafe')?
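To make the first problem concrete, here is a minimal Python sketch of a pairwise match rule for two business records. Everything in it is an assumption made for illustration: the `Listing` record, the helper names, and the 0.8 name-similarity and 50-metre thresholds are hypothetical, and it is not how Bing or any other product actually conflates data. It only shows where the judgment calls end up living.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Listing:
    """A hypothetical business record from a single data source."""
    name: str
    lat: float
    lon: float


def normalize(name: str) -> str:
    # Lowercase, replace punctuation with spaces, collapse repeated spaces.
    kept = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(kept.split())


def name_similarity(a: str, b: str) -> float:
    # Ratio in [0, 1]; 1.0 means the normalized names are identical.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def distance_m(a: Listing, b: Listing) -> float:
    # Great-circle (haversine) distance in metres, assuming a spherical Earth.
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * asin(sqrt(h))


def same_entity(a: Listing, b: Listing,
                min_name: float = 0.8, max_m: float = 50.0) -> bool:
    # Assumed rule: names must be similar AND locations must be close.
    return (name_similarity(a.name, b.name) >= min_name
            and distance_m(a, b) <= max_m)


if __name__ == "__main__":
    feed_a = Listing("Joe's Coffee", 47.6100, -122.3300)
    feed_b = Listing("Joes Coffee", 47.6101, -122.3301)
    print(same_entity(feed_a, feed_b))
```

Note that the representational questions do not go away; they are simply encoded in the thresholds. With a distance cutoff like this one, a business that moves across town is treated as a new entity, which is a design decision rather than a fact about the world.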
Thinking through these hard representational problems is as much a part of the transformative trends going on in the tech industry as are those characterized by terms like 'big data' and 'data scientist'.
Another example can be found in my recent interaction with Spotify, where I was offered the following albums from Rush:
The only distinction I can see here is that, upon drilling down, the tracks appear to have been published by different sources. Otherwise the albums are, as far as I can tell, identical. There is a slight variation in the cover art that, if you squint a little, you may be able to perceive.
As the online world continues to move towards knowledge, the Ph in PhD will become more and more useful.