
October 30, 2008

CIKM Search for Social Media Position Papers

Abdur Chowdhury, Chief Scientist at Twitter (formerly at Summize).

Components in products (that lead to success?)

  • Search – not ‘normal’ search behaviour; e.g. search for wwdc repeatedly looking to see what people are tweeting about the query term. #pdc2008 example ‘pretty cool’.
  • UGC - (user generated content) ; jamesbuck example - ‘arrested’
  • Self organization – organize content in ways that creators of system didn’t expect or intend; hash tags gone from 1% in Jan to 2% in Sep
  • Social Connections – using search to find other people
  • Real-time – can you guess; yes, it’s the earthquake example.

Markus Bylund: Privacy. Core idea is to provide individuals with a mirror of their online social presence so that they can learn and adapt their persisted digital presence.

Ed Chi: Introducing Mr. Taggy – a tag driven search engine. Allows for refining search via suggested tags associated with the results being served – looking forward to this coming out. http://mrtaggy.com (not live yet). I believe the data is from delicious.
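To make the tag-refinement idea concrete, here is a minimal sketch of how suggested tags could be derived from a result set and then used to narrow it. This is not Mr. Taggy's actual implementation; the document structure, the `suggest_tags` and `refine` helpers, and the example URLs are all assumptions made for illustration.

```python
from collections import Counter

def suggest_tags(results, selected=frozenset(), top_n=3):
    """Suggest the most frequent tags in the results, skipping tags already selected."""
    counts = Counter(
        tag
        for doc in results
        for tag in doc["tags"]
        if tag not in selected
    )
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:top_n]]

def refine(results, tag):
    """Keep only the results that carry the chosen tag."""
    return [doc for doc in results if tag in doc["tags"]]

# Hypothetical bookmarked results, each with user-assigned tags.
results = [
    {"url": "a.example", "tags": {"python", "search", "tutorial"}},
    {"url": "b.example", "tags": {"python", "search"}},
    {"url": "c.example", "tags": {"search", "ranking"}},
]

first_suggestions = suggest_tags(results, top_n=2)
narrowed = refine(results, "python")
next_suggestions = suggest_tags(narrowed, selected={"python"}, top_n=2)
```

Each refinement step re-computes the suggestions from the narrowed result set, which is what lets the user keep drilling down by tag.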

Ed then talked about ‘what is social search’. They ran a survey on Mechanical Turk to find out about people’s recent search experiences. He talked about the extended context of search:

  1. conversation provides reason/topic of search
  2. search engine only sees the specific keywords with no understanding of the social context
  3. user gets results and then shares with the people involved in the discussion

The MT survey produced 150 critical-incident reports about search experience. Search is either externally motivated (70%) or self-motivated (30%); 43% had a social interaction while ‘refining the requirements’.

Marti Hearst

“Social” Terms

  • Sharing – often really means ‘not hiding’ or ‘making available’; not really sharing
  • Collaboration
  • Community – most social networks are not
  • This workshop is called “search on social media”

What are blog searchers’ goals?

Mishne et al: 52% on named entities (mainly looking for current discussion).

  • like news, in that it is current
  • but different, in that it is personal
  • so … finding opinions and thoughts on current events is a major goal
  • but … may also want to look this up after the fact.

Blogs have personality and style (serious, sympathetic, humorous); nice to be able to select a blog based on such authorial facets.

Marc Smith (now Chief Scientist at Telligent): Leveraging Social Context for Searching Social Media. Marc spoke about analysing graphs and using that analysis to improve search.

Search and Social Media, CIKM 2008 – rough notes from keynote by Andrew Tomkins

Some stats setting the landscape

Ramakrishnan and Tomkins 2007 stats:

  • professional content per day ~2GB,
  • UGC per day ~8-10GB,
  • private text per day ~3TB (e.g. email, etc.).

Meta data (per day):

  • anchortext 100MB,
  • tags 40MB,
  • pageviews 180GB,
  • reviews 10MB.

Pageviews is the next thing on the stack to really leverage in search. This is gathered via toolbars which raise privacy concerns.

Blogosphere is on the order of 1MM posts per day; Twitter, with only 2MM users, generates 3MM messages per day.

Facebook/MySpace: 100-150MM users each, FB users 19 mins per day.

Delicious: 5MM users, 150MM URLs (from audience, should be 500MM URLs, 25% of which is spam).

Photos: 2B on Flickr, 4B on Facebook (from audience: 10B), 2B on Photobucket; Flickr and Photobucket have 30MM uniques/month

Videos: 150MM on youtube, 200k/day, 50MM uniques/month

Yahoo answers: 90MM users, 200MM answers?

Should academics work on core web search?

  • academics cannot address core web search on a level playing field today (can we help?)

However – social media search is a place where academics can work

  • relevance models largely unformulated
  • interface in fluctuation
  • data model not completely worked out
  • data publicly available

Social networks and social media

  • dynamics of social media increasingly visible at the event level
  • model evaluation can move from aggregate statistics over snapshots to likelihood
  • forthcoming discussion explores this idea for social network dynamics: important to understand user-level features in social search

Andrew gave an overview of a recent KDD paper:  Microscopic Evolution of Social Networks, Leskovec, Backstrom, Kumar and Tomkins, KDD 2008

Three key challenges in search:

  1. Optimizing task-aware relevance: move search from stateless query-response to modelling and satisfying long-running user tasks.
    1. For pure social media search, this problem is acute
    2. For web search, social media is the key corpus
  2. Grid-based content analysis: move from mere bags of words to richer algorithms for content analysis expressed as scans of sorts
    1. same problem exists for analysis of social media
  3. Measure, predict, and generate engagement: build the science of how users singly and jointly develop passion for new classes of activities
    1. Social media environments are the right starting point to tackle this problem

Trends in search and social media

  • Search in the east: heavily influenced by social media:
    • knowledge search,
    • groups,
    • combo experiences typically over O&O and licensed content,
    • typically not deeply integrated
  • Search in the west:
    • significant content licensing industry, but typically around traditional media
    • social media mostly crawlable, integrated in search repositories
    • group publishing –> personal publishing?
    • BBS –> comments?
  • General trends
    • Two opposite trends in search of social media:
      • moving towards point relevance (answers, knowledge search)
      • Moving towards browse experience (entertainment and serendipity)
        • subscription to trusted sources

Challenges: relevance in social media search

  • pure relevance is a key challenge for these domains
    • blogs
    • forums
    • twitter
    • vitality
    • tagged content
    • wikipedia
    • social networks
    • mailing lists
    • groups

Challenges: per-media intent fulfillment

  • No significant work on user models for satisfying intent fulfillment

Challenges: uniform retrieval models

  • Assume we know how to search blogs and forums, friendfeed, etc.
  • can we develop retrieval models across social media formats:
    • robust incorporation of author authority
    • uniform handling of time
    • uniform handling of hierarchical data models
    • etc.

Challenges: network modeling

  • social network modeling is arguably a success story in understanding social media
  • the innovations happening in social media are not built on this literature
  • are there “killer apps” for network models (search or other)?

Interactive Visualization of Blog Search

Briefly, an interesting blog (via Twingly) recording progress in a project to create a new user experience for blog search.

Angels and Demons (and Photosynth)

Sony Pictures have partnered up with MSN and are promoting the movie Angels and Demons with a competition which uses Photosynth (ex Live Labs). The initial site is up, which includes the trailer as well as a large synth of St. Peter’s Basilica. More details will be revealed on the site as the launch of the film approaches.

The competition is described on the site as follows:

In Angels & Demons, Robert Langdon discovers evidence of the resurgence of a secret brotherhood know as the Illuminati. The four elements Earth, Air, Fire and Water are key clues that will lead Langdon on a path to Rome with the hopes of stopping an unthinkable crime.

Solve the puzzles in the Path Of Illumination Contest and you could win a trip to Rome and more!

The contest begins January 30, 2009.

October 27, 2008

Visualizing Tax Policy

William’s been doing some nice work on visualizing the tax policies of the two US presidential candidates: here, here, and here. For readers not in the US, note that the term ‘socialism’ is considered a scary word over here, and – at least in the mainstream media – the frequency with which a pundit uses it is in inverse proportion to the speaker’s understanding of its meaning. Note also that there is an idea over here that taxes are a form of punishment (at least, this is how Joe “Wurzelbacher” the Plumber expresses his relationship with this basic instrument of government).


October 25, 2008

ASCII Art 2.0

Jeff at Neoformix has been having fun creating what he calls word portraits. Here is what he has for Obama.


October 24, 2008

Memetracker

Jure just pinged me about a new project – Memetracker.org - he’s been working on with Lars Backstrom and Jon Kleinberg. The project analyses content for quotes and then displays them temporally using an interactive stacked plot. In their own words:

MemeTracker builds maps of the daily news cycle by analyzing around 900,000 news stories per day from 1 million online sources, ranging from mass media to personal blogs.

We track the quotes and phrases that appear most frequently over time across this entire online news spectrum. This makes it possible to see how different stories compete for news coverage each day, and how certain stories persist while others fade quickly.

I’m assuming that they mean they analyse 900k articles from 1 million sources including MSM and weblogs (not 900k news articles).
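As a rough illustration of the idea of tracking quotes over time, the sketch below pulls quoted phrases out of dated article text and counts them per day. It is not MemeTracker's method (which, as far as I understand, also has to group variants of the same phrase); the regular expression, the `count_quotes_by_day` helper, and the sample articles are assumptions for illustration only.

```python
import re
from collections import Counter, defaultdict

# Matches text between straight double quotes; a deliberately simple assumption.
QUOTE_RE = re.compile(r'"([^"]+)"')

def count_quotes_by_day(articles):
    """Count normalized quoted phrases for each day.

    `articles` is an iterable of (day, text) pairs.
    """
    per_day = defaultdict(Counter)
    for day, text in articles:
        for quote in QUOTE_RE.findall(text):
            per_day[day][quote.strip().lower()] += 1
    return per_day

# Hypothetical sample data.
articles = [
    ("2008-10-23", 'He said "yes we can" twice.'),
    ("2008-10-23", 'The crowd chanted "Yes we can".'),
    ("2008-10-24", 'Critics replied "not so fast".'),
]

counts = count_quotes_by_day(articles)
```

The per-day counters are exactly the series one would then feed into a temporal plot, one series per phrase.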

This is an interesting project, but I’m not a big fan of stacked plots. Peaks in the data for one variable may appear as artifacts of the aggregates of other variables, so while stacked plots are good at showing overall trends, they are poor at showing trends for individual items.
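A small worked example of the stacked-plot problem, using made-up numbers: a perfectly flat series drawn on top of a spiky one inherits the spike in its upper edge. The `stack_tops` helper and the two series are hypothetical, chosen only to show the effect.

```python
def stack_tops(series):
    """Return the upper boundary of each layer in a stacked plot.

    `series` is a list of equal-length lists, ordered bottom layer first.
    """
    tops = []
    running = [0] * len(series[0])
    for layer in series:
        # Each layer sits on top of the running total of the layers below it.
        running = [r + v for r, v in zip(running, layer)]
        tops.append(running)
    return tops

spiky = [1, 1, 6, 1, 1]   # bottom layer: one real peak
flat = [2, 2, 2, 2, 2]    # top layer: no trend at all

tops = stack_tops([spiky, flat])
flat_upper_edge = tops[1]
```

The upper edge of the flat layer comes out as `[3, 3, 8, 3, 3]`, so the eye reads a peak in a series whose own values never change; only the thickness of the band, which is hard to judge on a moving baseline, shows the truth.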


See Google’s InQuotes for related stuff.

October 23, 2008

ICWSM 2009 Speakers

Our website has been updated with the keynote/invited speakers. We are really honoured to have Jon Kleinberg, Lillian Lee and Duncan Watts appearing at the conference.

October 22, 2008

Photosynth Receives Award

Photosynth is a winner of PC Magazine’s annual Technical Excellence Award for software.

The hints about the Photosynth technology from Microsoft Live Labs tantalized for months, and the release of Photosynth this summer did not disappoint. It takes your collection of photographs, identifies the features in each, compares all of them for overlap, and links those spots to create a three-dimensional position. What you end up with is a single 3D-esque shot compiled from all the smaller shots. You can even use it to generate a "synth" of an object. Photosynth uses Seadragon technology, which is neat in its own right: Seadragon innovates on "zoom" relationships. It's how you keep drilling in smoothly to ever more detailed visual information in WorldWide Telescope.—EG

Congrats to the team.

[PC Magazine might like to check up on the provenance of our technology though – WWT doesn’t use Seadragon, as Jonathan Fay explains in this interview.]

Political Streams Update; ACORN Boating

We experienced a hiccup early this morning, but the site is back up – hopefully this wasn’t too disruptive.

While we don’t provide any supported APIs right now, you can coax a trend chart out of the site. Here’s one showing the attention around ACORN.
