Some stats setting the landscape
Stats from Ramakrishnan and Tomkins (2007):
- professional content per day: ~2GB
- UGC per day: ~8-10GB
- private text per day: ~3TB (e.g. email)
Metadata (per day):
- anchortext 100MB,
- tags 40MB,
- pageviews 180GB,
- reviews 10MB.
Pageview data is the next thing on the stack to really leverage in search. It is gathered via toolbars, which raises privacy concerns.
The blogosphere produces on the order of 1MM posts per day; Twitter, with only 2MM users, generates 3MM messages per day.
Facebook/MySpace: 100-150MM users each; FB users spend 19 mins per day on the site.
Delicious: 5MM users, 150MM URLs (an audience member suggested it should be 500MM URLs, 25% of which are spam).
Photos: 2B on Flickr, 4B on Facebook (an audience member said 10B), 2B on Photobucket; Flickr and Photobucket have 30MM uniques/month.
Videos: 150MM on YouTube, 200k/day, 50MM uniques/month.
Yahoo! Answers: 90MM users, 200MM answers?
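A few ratios make the scale gaps in these figures concrete. This back-of-the-envelope sketch only restates the numbers quoted above (the talk's estimates, not new measurements); it assumes decimal units (1GB = 1,000MB) and takes the low end of the UGC range.

```python
# Daily volumes quoted in the talk (Ramakrishnan and Tomkins 2007), in MB.
# Assumption: decimal units, 1 GB = 1,000 MB and 1 TB = 1,000,000 MB.
GB = 1_000
TB = 1_000_000

daily_content_mb = {
    "professional": 2 * GB,
    "ugc": 8 * GB,          # low end of the quoted ~8-10GB range
    "private_text": 3 * TB,
}
daily_metadata_mb = {
    "anchortext": 100,
    "tags": 40,
    "pageviews": 180 * GB,
    "reviews": 10,
}

# Pageview data dwarfs every other metadata source listed.
pageviews_vs_anchortext = daily_metadata_mb["pageviews"] / daily_metadata_mb["anchortext"]
# UGC already outpaces professionally produced content.
ugc_vs_professional = daily_content_mb["ugc"] / daily_content_mb["professional"]
# Twitter: 3MM messages/day from 2MM users.
twitter_msgs_per_user = 3_000_000 / 2_000_000
# Delicious (audience estimate): 25% of 500MM URLs are spam.
delicious_spam_urls = 0.25 * 500_000_000

print(f"pageviews / anchortext: {pageviews_vs_anchortext:,.0f}x")
print(f"UGC / professional content: {ugc_vs_professional:.1f}x")
print(f"Twitter messages per user per day: {twitter_msgs_per_user:.1f}")
print(f"Delicious spam URLs (audience estimate): {delicious_spam_urls:,.0f}")
```

The pageview ratio (about 1,800x the anchortext volume) is one way to see why pageviews were singled out as the next signal to leverage.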
Should academics work on core web search?
- academics cannot address core web search on a level playing field today (can we help?)
However, social media search is a place where academics can work:
- relevance models largely unformulated
- interface in fluctuation
- data model not completely worked out
- data publicly available
Social networks and social media
- dynamics of social media increasingly visible at the event level
- model evaluation can move from aggregate statistics over snapshots to the likelihood of individual events
- forthcoming discussion explores this idea for social network dynamics: important to understand user-level features in social search
Andrew gave an overview of a recent KDD paper: Microscopic Evolution of Social Networks, Leskovec, Backstrom, Kumar and Tomkins, KDD 2008
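To make "from snapshot statistics to likelihood" concrete, here is a minimal sketch in the spirit of that paper, which compares models of network evolution by maximum likelihood. It replays a stream of edge-creation events and scores each edge's destination under two candidate models: uniform random choice versus preferential attachment (probability proportional to current degree). The function name, the toy edge list and the +1 degree smoothing are illustrative assumptions, not the paper's data or exact model.

```python
import math
from collections import defaultdict

def edge_log_likelihoods(events):
    """Score each edge destination under two hypothetical attachment models.

    events: iterable of (src, dst) pairs in arrival order.
    Returns (uniform_ll, pref_attach_ll, n_scored).
    """
    nodes = set()
    degree = defaultdict(int)
    uniform_ll = 0.0
    pa_ll = 0.0
    n_scored = 0
    for src, dst in events:
        candidates = nodes - {src}
        if dst in candidates:
            # Uniform model: every existing node other than src is equally likely.
            uniform_ll += math.log(1.0 / len(candidates))
            # Preferential attachment with +1 smoothing (our assumption) so
            # that low-degree nodes keep a nonzero probability.
            total = sum(degree[c] + 1 for c in candidates)
            pa_ll += math.log((degree[dst] + 1) / total)
            n_scored += 1
        # Edges to brand-new nodes are not scored; just update the state.
        nodes.update((src, dst))
        degree[src] += 1
        degree[dst] += 1
    return uniform_ll, pa_ll, n_scored

# Toy stream in which node "a" quickly becomes a hub.
events = [("a", "b"), ("c", "a"), ("d", "a"), ("e", "a"), ("b", "a"), ("c", "d")]
uniform_ll, pa_ll, n = edge_log_likelihoods(events)
print(n, round(uniform_ll, 3), round(pa_ll, 3))
```

The model with the higher total log-likelihood explains the observed sequence of individual events better; on this hub-heavy toy stream, preferential attachment wins, a comparison that a single degree-distribution snapshot would not expose edge by edge.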
Three key challenges in search:
- Optimizing task-aware relevance: move search from stateless query-response to modelling and satisfying long-running user tasks.
  - For pure social media search, this problem is acute
  - For web search, social media is the key corpus
- Grid-based content analysis: move from mere bags of words to richer algorithms for content analysis expressed as scans and sorts
  - the same problem exists for analysis of social media
- Measure, predict, and generate engagement: build the science of how users singly and jointly develop passion for new classes of activities
  - Social media environments are the right starting point to tackle this problem
Trends in search and social media
- Search in the east is heavily influenced by social media:
  - knowledge search
  - groups
  - combo experiences, typically over owned-and-operated (O&O) and licensed content
  - typically not deeply integrated
- Search in the west:
  - significant content licensing industry, but typically around traditional media
  - social media mostly crawlable and integrated in search repositories
  - group publishing -> personal publishing?
  - BBS -> comments?
- General trends:
  - two opposite trends in search of social media:
    - moving towards point relevance (answers, knowledge search)
    - moving towards a browse experience (entertainment and serendipity)
  - subscription to trusted sources
Challenges: relevance in social media search
- pure relevance is a key challenge for these domains:
  - blogs
  - forums
  - vitality
  - tagged content
  - wikipedia
  - social networks
  - mailing lists
  - groups
Challenges: per-media intent fulfillment
- There is no significant work on user models for intent fulfillment
Challenges: uniform retrieval models
- Assume we know how to search blogs, forums, FriendFeed, etc.
- Can we develop retrieval models that work across social media formats, with:
  - robust incorporation of author authority
  - uniform handling of time
  - uniform handling of hierarchical data models
  - etc.
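One way to read "uniform" here: if every item from any format (blog post, forum reply, tweet) is first mapped to a common record carrying a text score, an author and a timestamp, a single scoring function can treat authority and time the same way everywhere. The sketch below assumes a hypothetical `Item` record, a precomputed per-author authority in [0, 1], exponential time decay with a made-up 48-hour half-life, and placeholder linear weights; none of this is a formulation from the talk, and hierarchical data models (threads, replies) are left out.

```python
from dataclasses import dataclass

@dataclass
class Item:
    # Hypothetical common record that any social media format is mapped into.
    fmt: str            # e.g. "blog", "forum", "tweet"
    text_score: float   # query relevance from any text retrieval model
    author: str
    age_hours: float

def uniform_score(item, authority, half_life_hours=48.0, w_text=1.0, w_auth=0.5):
    """Score an item the same way regardless of its source format.

    authority: dict mapping author -> score in [0, 1]; unknown authors get 0.
    half_life_hours, w_text, w_auth: illustrative placeholder values.
    """
    # Exponential decay: an item loses half its freshness every half-life.
    freshness = 0.5 ** (item.age_hours / half_life_hours)
    base = w_text * item.text_score + w_auth * authority.get(item.author, 0.0)
    return base * freshness

authority = {"alice": 0.9, "bob": 0.1}
items = [
    Item("blog", 0.8, "alice", age_hours=48.0),
    Item("forum", 0.8, "bob", age_hours=0.0),
    Item("tweet", 0.5, "alice", age_hours=2.0),
]
ranked = sorted(items, key=lambda it: uniform_score(it, authority), reverse=True)
for it in ranked:
    print(it.fmt, round(uniform_score(it, authority), 3))
```

Because authority and time enter through one function, adding a new format only requires mapping it into `Item`; the open research question from the talk is what the right shared model actually is.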
Challenges: network modeling
- social network modeling is arguably a success story in understanding social media
- yet the innovations happening in social media are not built on this literature
- are there “killer apps” for network models (search or other)?
Matt - thanks for the summary. One question: what did he mean by "relevance modes largely unformulated"?
Posted by: Jon Elsas | October 31, 2008 at 09:15 AM