Google Launches Message Board Search
Thanks to a tweet from Natalie, I’ve just heard that Google Groups search now includes message board data from outside Google Groups. A search for xbox, for example, provides results from blogs (e.g. Major Nelson) and forums (e.g. xbox-scene). Clearly they have some issues distinguishing between weblogs and message boards, but directionally this is very interesting.
On a related note, I’m noticing more and more results in Google’s blog search that are not from blogs, or not from blog posts. For example, a search for this blog (datamining.typepad.com) brings up this post from biofusion design. I’m on the blogroll there, but not mentioned in the post.
In our Social Streams platform work, we have taken great pains to ensure that we type data correctly. With social media it is very important to distinguish between data types so that analytics can be tuned to account for the different semantics of each type. Classifying some HTML as a weblog, a message board, etc., is relatively straightforward, so I’m assuming that Google’s data leakage is the result of some pragmatic tactical decisions.
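To give a feel for why this kind of typing is tractable, here is a minimal sketch of a marker-counting classifier. It is not the Social Streams implementation, and certainly not Google's. The marker lists, the function names and the simple "most markers wins" rule are all assumptions made up for illustration, and a real system would use far richer signals.

```python
# Hypothetical marker lists (assumptions for illustration, not exhaustive).
FORUM_MARKERS = ["viewtopic.php", "showthread.php", "phpbb", "vbulletin",
                 "reply to thread", "new topic"]
BLOG_MARKERS = ["blogroll", "trackback", "permalink", "wordpress",
                "typepad", "application/rss+xml"]


def count_markers(html, markers):
    """Count case-insensitive occurrences of each marker string in the page."""
    text = html.lower()
    return sum(text.count(marker) for marker in markers)


def classify_page(html):
    """Return 'weblog', 'message_board' or 'unknown' from simple marker counts."""
    forum_score = count_markers(html, FORUM_MARKERS)
    blog_score = count_markers(html, BLOG_MARKERS)
    if forum_score == blog_score:
        # No evidence either way, or a tie: do not guess.
        return "unknown"
    return "message_board" if forum_score > blog_score else "weblog"
```

Returning "unknown" on a tie is a deliberate choice in this sketch: for analytics, an untyped document is usually less harmful than a mistyped one.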
Search Engine Roundtable writes about Google’s message board search here.
Of course, check out Board Tracker for message board search.



There was a brief thread about the blogroll problem on the Google Group for blogsearch: http://groups.google.com/group/google-blog-search/browse_thread/thread/8244fc8731f47970
What I wrote there was:
"We have changed the way we index blog posts to include the full
content of the page. We've had occasional complaints about the use of
the feed content, particularly the problem with partial feeds that you
mentioned. The indexing change has improved the results for a lot of
queries, both because we have the full content of the page and because
we extract links that are missing from the feeds. The downside of
this change is that we see more results that match only the blogroll
and other parts of the page that are common to all of a blog's posts.
We expected some problems from blogroll matches, but may have
underestimated the impact on searches using the link: operator or
where the query matches a blog or blogger's name. We do expect to fix
the problem you're seeing. We'll use the full page content, but
exclude the content that isn't really part of the post. I'm not sure
if we'll be able to make the change before the end of the year, but we
are working on it and are pretty confident that it can be solved.
We'll post an update here when we've got a solution."
Posted by: Jeremy Hylton | November 20, 2008 at 02:42 PM
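The fix described in the comment above, keeping the full page content but excluding what is common to all of a blog's posts, could be sketched roughly as follows. This is only a guess at one possible approach, not Google's method. The function names, the line-based notion of a "block" and the 0.5 threshold are hypothetical choices for illustration.

```python
from collections import Counter


def split_blocks(page_text):
    """Split page text into non-empty, stripped lines.

    A stand-in for real HTML block extraction (an assumption of this sketch).
    """
    return [line.strip() for line in page_text.splitlines() if line.strip()]


def post_specific_blocks(pages, max_share=0.5):
    """For each page of one blog, drop blocks seen on more than max_share of its pages."""
    block_lists = [split_blocks(page) for page in pages]
    if len(pages) < 2:
        # With a single page there is nothing to compare against.
        return block_lists
    # Count, for every block, how many pages it appears on.
    page_counts = Counter()
    for blocks in block_lists:
        page_counts.update(set(blocks))
    limit = max_share * len(pages)
    return [[b for b in blocks if page_counts[b] <= limit]
            for blocks in block_lists]
```

Given several posts from the same blog, a blogroll line that appears on every page is filtered out, while text unique to each post survives and remains searchable.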