Current systems for ranking blogs rely largely on inlinks. Technorati and BlogPulse both use this basic measure of citation to create their lists; TechMeme - whose new list created plenty of discussion on the topic - takes the algorithm it uses for placing stories on its home page (essentially another citation-based approach) and aggregates visibility information. Additional features to consider include the number of feed subscribers and the number of visitors to the blog site. However, there are plenty of alternative approaches to creating a list of important blogs.
The above approaches are motivated by some (vague) notion of influence - a term that is central to the analysis of social media, and of blogs in particular, but one which has not really been given a full, well-grounded definition in this space. However, there is also the issue of reader efficiency - ensuring that the consumer of blog data maximises the value they get from reading blogs.
A group of researchers at CMU have been considering a notion of blog importance based on how likely a set of blogs is to keep you informed of topics bursting in the blogosphere. By analogy, they consider a graph of water pipelines. Their paper - Cost-Effective Outbreak Detection in Networks, by Leskovec, Krause, Guestrin, Faloutsos, VanBriesen and Glance - poses the problem:
Given a water distribution network, where should we place sensors to quickly detect contaminants? Or, which blogs should we read to avoid missing important stories? These seemingly different problems share common structure: Outbreak detection can be modeled as selecting nodes (sensor locations, blogs) in a network, in order to detect the spreading of a virus or information as quickly as possible.
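To make the "selecting nodes" framing concrete, here is a minimal sketch of one way to pick blogs greedily: repeatedly add the blog that reports the most stories not already covered by the blogs chosen so far. This is an illustration only, not the authors' algorithm. The function name, the toy data and the choice of "number of stories covered" as the score are all assumptions made for this example, and the sketch ignores how early each blog picks up a story, which the problem statement above clearly cares about.

```python
def greedy_blog_selection(stories_by_blog, budget):
    """Pick up to `budget` blogs, each time adding the blog that covers
    the most stories not yet covered by the blogs already chosen.

    stories_by_blog: dict mapping blog name -> set of story ids (hypothetical input).
    """
    chosen = []
    covered = set()
    candidates = dict(stories_by_blog)
    while candidates and len(chosen) < budget:
        # Marginal gain of a blog = number of new stories it would add.
        # Iterating over sorted names makes tie-breaking deterministic.
        best = max(sorted(candidates), key=lambda b: len(candidates[b] - covered))
        if not candidates[best] - covered:
            break  # no remaining blog adds anything new
        chosen.append(best)
        covered |= candidates.pop(best)
    return chosen, covered


# Hypothetical toy data: blog name -> set of story ids it reported on.
toy = {
    "blog_a": {1, 2, 3, 4},
    "blog_b": {3, 4, 5},
    "blog_c": {5, 6},
    "blog_d": {1, 2},
}
picked, seen = greedy_blog_selection(toy, budget=2)
print(picked, sorted(seen))
```

In this toy example blog_b reports more stories than blog_c, but once blog_a has been chosen most of blog_b's stories are redundant, so blog_c is the better second pick. That is the sense in which a "most efficient reading list" can differ from a list ranked purely by individual popularity.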
As a result of this work, the authors have published some blog lists which answer a fundamentally important question about weblog reading habits: Which weblogs should I read to be most up to date? The lists answering this question - generated by the approach described in their paper - come in several varieties, which can be found on the project's page.
Highlights from the work include the top 10 and bottom 10 of the list of blogs to read if you only have time for 100 blogs and want to be as up to date on stories as possible. It must be noted that this work is a theoretical exploration - the dataset mined to create the list is not a live corpus of blogs, so some of the blogs may be stale or even abandoned.
Note that another view of the data - which blogs to read if you can only read 500 posts - generates quite a different list of blogs.
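One way to see why the list changes is to charge each blog for the number of posts it publishes, rather than treating every blog as costing the same. The sketch below is a simple benefit-per-cost greedy heuristic under that assumption. It is not claimed to be the procedure the authors used, and the function name, the post counts and the story sets are invented for illustration.

```python
def greedy_by_post_budget(blogs, max_posts):
    """Choose blogs under a total post budget.

    blogs: dict mapping blog name -> (post_count, set of story ids).
           Post counts are assumed to be positive (hypothetical input).
    At each step, pick the affordable blog with the most new stories per post.
    """
    chosen, covered, spent = [], set(), 0
    remaining = dict(blogs)

    def gain_per_post(name):
        posts, stories = remaining[name]
        return len(stories - covered) / posts

    while remaining:
        # Only consider blogs whose posts still fit in the budget.
        affordable = [n for n in sorted(remaining)
                      if spent + remaining[n][0] <= max_posts]
        if not affordable:
            break
        best = max(affordable, key=gain_per_post)
        if gain_per_post(best) == 0:
            break  # nothing affordable adds a new story
        posts, stories = remaining.pop(best)
        chosen.append(best)
        covered |= stories
        spent += posts
    return chosen, covered, spent


# Hypothetical toy data: name -> (number of posts, stories reported).
toy_blogs = {
    "big_blog": (400, {1, 2, 3, 4, 5, 6}),
    "small_a": (50, {1, 2, 3}),
    "small_b": (60, {4, 5}),
    "small_c": (100, {6, 7}),
}
picked_blogs, seen_stories, posts_read = greedy_by_post_budget(toy_blogs, max_posts=250)
print(picked_blogs, sorted(seen_stories), posts_read)
```

Here the prolific big_blog reports the most stories of any single blog, yet it never fits within a 250-post budget, and three smaller blogs together report every story for 210 posts. A budget counted in posts therefore tends to favour low-volume blogs, which is one plausible reason the two lists look so different.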