There have been a number of posts recently that take a look at evaluating blog search engines. Not surprisingly, this is a lot harder than it looks. To understand the problem in more detail, it is useful to try to set out some definitions.
Firstly, what are blogs? A year or two ago, it would have been relatively easy to answer that question. However, the definition has been getting more and more muddied over time. Problems include:
- The location of the blog. If a blog is defined by a URL and I place the blog on the home page of my corporate web site, then are links to that home page links to the blog, or links to the site?
- RSS gave blogs their big break, and the use of ping servers and the aggregation of updates that they provide were a win for everyone - sort of. There is nothing in the structure of these systems which indicates whether the pinger is a blog or not. Consequently, if you listen to a ping service, you will get non-blog content, for example discussion threads on Flickr.
Any definition of blogs will also include models of link structure (trackbacks and citations), comments, blogrolls and so on, all of which affect the expectations of the user (searching for posts with comments? searching for comments on a topic?).
Secondly, what is search? Or rather, what is the intention of the searcher? For blogs, a large proportion of search is focused on figuring out who is talking about the searcher - so-called vanity searches. Due to the long tail, this type of search is very sensitive to coverage issues, which is not the case if you are looking for information. In addition, as there are types of search which are variations on those of major search engines, or entirely novel in form, the semantics of the search terms presented to blog search engines are still evolving. What does it mean to enter a URL into a search engine? Are you looking for citations of that URL? Are you looking for posts from that blog?
Thirdly, what are the results? We have all been led to expect the type of results that major search engines deliver - a ranked list of web pages. However, there are other factors that are important for ranking, as well as other ways in which blog data can very naturally be delivered. One obvious issue is the distinction between a search for a blog (find me a good blog on this topic) and the search for a post (find me posts that match my criteria). This underlines the fact that blog results are often at the granularity of sub-page documents. In addition, blogs are far more timely than web data in general. Consequently, time is an important factor (witness the several blog search engines that now provide trend graphs as search results).
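To make the point about time concrete, here is a minimal sketch of how the data behind a trend graph could be derived from the posts matching a query. The function name `daily_trend` and the sample timestamps are hypothetical, and a real engine would almost certainly do this inside its index rather than over a list in memory.

```python
from collections import Counter
from datetime import datetime

def daily_trend(post_times):
    """Count matching posts per calendar day, in chronological order."""
    counts = Counter(t.date() for t in post_times)
    return sorted(counts.items())

# Hypothetical timestamps of posts that matched one query.
matches = [
    datetime(2005, 10, 1, 9, 30),
    datetime(2005, 10, 1, 17, 5),
    datetime(2005, 10, 3, 8, 0),
]

trend = daily_trend(matches)
for day, count in trend:
    print(day.isoformat(), count)
```

Note that the result is only as good as the post times it is given, which is why the accurate representation of the time of a post matters so much for this kind of result.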
Fourthly, what are the quality issues for search results? These must include (at least):
- Segmentation - the separation of blog posts from the blog template and peripheral data.
- Deduplication - the filtering out of multiple copies of the same post (something you will never see on a web search engine).
- Spam filtering - the removal of spam blog data.
- Time - the accurate representation of the time of a post.
- Relevance - the boosting of results that are more relevant than others.
- Speed of query execution - how fast the results come back.
- Comprehensiveness - how complete the coverage is.
- Time to index - how long it takes for a post to become part of the search engine's index.
- Repeatability - if I issue the same query (immediately) do I get the same result?
- Result count estimation - how accurate the reported number of hits is.
In addition, the search engine must be judged on a number of service issues such as:
- Ability to request the inclusion of a blog.
- Ability to remove a blog and all posts from an index.
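Some of the quality issues listed above can be measured mechanically once you have the results in hand. The following is a rough sketch, not a definitive method, for two of them: deduplication and repeatability. It assumes you have already extracted the text and URL of each result; the function names are made up for illustration, and the fingerprint only catches copies that are identical after whitespace and case are normalized, so near-duplicates would need something more forgiving.

```python
import hashlib
import re

def fingerprint(text):
    """Hash of the post text after lowercasing and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

def duplicate_rate(result_texts):
    """Fraction of results whose text repeats an earlier result."""
    if not result_texts:
        return 0.0
    unique = {fingerprint(t) for t in result_texts}
    return 1 - len(unique) / len(result_texts)

def repeatability(first_run, second_run):
    """Jaccard overlap of the result URLs from two runs of the same query."""
    a, b = set(first_run), set(second_run)
    if not a and not b:
        return 1.0  # two empty result sets agree trivially
    return len(a & b) / len(a | b)
```

A duplicate rate of 0 means no exact copies were returned, and a repeatability of 1.0 means both runs returned the same set of URLs. Neither number says anything about ranking order, which would need a separate measure.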
So how do we go about testing blog search engines? Ultimately, the quality of a search engine can only really be measured by the success or failure of a user to achieve some task, a task in which the search engine is a tool and the search results are not the final goal. This means that anecdotal tests (in which one decides that one engine is better than another because it returns more hits) are completely out. It also means that a representative set of tasks needs to be captured and translated into realistic queries. Note that, interestingly, the market's willingness to provide more interesting tools than the monolithic, one-dimensional list of ranked results means that those who innovate with tools that are accessible to users (trend mining and so on) have an advantage.
Who should test search quality? No one involved with blog search, myself included, can be asked to provide a complete set of criteria for testing. If asked, I would certainly, intentionally or otherwise, bias the criteria towards the things which I think are important and which are, of course, in the system I am involved with. What is needed is an external market observer, and no doubt some money to fund the whole thing (or at least the ability to derive revenue from the process).
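As a sketch of what a task-based comparison might look like, the snippet below tallies how often users completed a task with each engine. Everything here is an assumption for illustration: the engine names, the tasks, and the idea that success is recorded as a simple yes or no by an observer.

```python
from collections import defaultdict

def success_rates(trials):
    """Map each engine to the fraction of its trials that succeeded.

    Each trial is a tuple of (engine, task, succeeded).
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for engine, _task, succeeded in trials:
        totals[engine] += 1
        if succeeded:
            successes[engine] += 1
    return {engine: successes[engine] / totals[engine] for engine in totals}

# Hypothetical observations: did the user complete the task with this engine?
trials = [
    ("engine_a", "find who is citing my post", True),
    ("engine_a", "find a good blog on a topic", False),
    ("engine_b", "find who is citing my post", True),
    ("engine_b", "find a good blog on a topic", True),
]

rates = success_rates(trials)
```

The arithmetic is the easy part. The hard parts are choosing a representative set of tasks and deciding what counts as success.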