Rexa, which shares a pedigree with Cora from Just Systems and offers a service similar to CiteSeer and Google Scholar, is now live, according to John Langford's Machine Learning blog. Andrew McCallum, with whom I worked back at WhizBang, is the PI for this project and comments:
Rexa is a digital library covering the computer science research literature and the people who create it. Rexa is a sibling to CiteSeer, Google Scholar, Academic.live.com and the ACM Portal. Its chief enhancement is that Rexa knows about more first-class, de-duplicated, cross-referenced object types: not only papers and their citation links, but also people, grants, and topics, and in the future universities, conferences, journals, research communities, and more.
Rexa currently provides:
* Keyword search on over 7 million papers (mostly in computer science)
* Cross-linked pages for papers, authors, topics and NSF grants
* Browsing by citations, authors, co-authors, cited authors, citing authors;
(find who cites you most by clicking “Citing authors” on your home page)
* Web-2.0-style “tagging” to bookmark papers
* Automatically-gathered contact info and photos of authors’ faces
* Analysis of research topics, their impact, and how they relate.
Coming soon:
* Much improved coverage of recent CS papers (it’s a little weak now)
* Ability to make corrections to extracted data
Coming later:
* Improved extraction and co-reference accuracy
* Much more data mining
* Broader coverage of more research fields
Rather than seeing our siblings as competitors, we believe that such services are like “newspapers for the research community”, and, just as it is tremendously important that there is not just one national newspaper, we think there should be many such services. This is especially true since increasingly they will do more than simply supply raw information, but also provide subjective analysis, pattern discovery, and predictions.
One of the key challenges for this type of vertical search is recognizing variations of named entities. A search for 'Andrew McCallum' is a good example of the problem: the first four results refer to the same person but are listed separately. Citation analysis offers some great clues that can help here. The papers associated with one person tend to share a strong topical affinity, which suggests that variations like 'R. Andrew McCallum', 'Andrew McCallum' and 'A. McCallum' are in fact references to the same person.
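To make the idea concrete, here is a minimal sketch of combining a name-compatibility check with topical similarity. This is not how Rexa does it; the function names, the toy paper texts, and the 0.3 threshold are all assumptions made up for illustration.

```python
from collections import Counter
from math import sqrt


def split_name(name):
    """Return (given-name tokens, surname), lowercased, with dots removed."""
    parts = name.replace(".", " ").lower().split()
    return parts[:-1], parts[-1]


def tokens_match(x, y):
    """Two given-name tokens match if one is a prefix of the other ('a' vs 'andrew')."""
    return x.startswith(y) or y.startswith(x)


def names_compatible(a, b):
    """True if surnames are equal and the shorter list of given names
    matches, in order, a subsequence of the longer list."""
    given_a, last_a = split_name(a)
    given_b, last_b = split_name(b)
    if last_a != last_b:
        return False
    short, long_ = sorted((given_a, given_b), key=len)
    pos = 0
    for tok in short:
        while pos < len(long_) and not tokens_match(tok, long_[pos]):
            pos += 1
        if pos == len(long_):
            return False
        pos += 1
    return True


def cosine(text_a, text_b):
    """Cosine similarity between bag-of-words counts of two texts."""
    c1, c2 = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(c1[w] * c2[w] for w in c1.keys() & c2.keys())
    norm = sqrt(sum(v * v for v in c1.values())) * sqrt(sum(v * v for v in c2.values()))
    return dot / norm if norm else 0.0


def same_author(mention_a, mention_b, threshold=0.3):
    """Guess that two (name, paper_text) mentions are one person when the names
    are compatible AND the paper texts are topically similar.
    The threshold is an arbitrary assumption, not a tuned value."""
    (name_a, text_a), (name_b, text_b) = mention_a, mention_b
    return names_compatible(name_a, name_b) and cosine(text_a, text_b) >= threshold


# Toy data, invented for this example (not real paper titles).
m1 = ("R. Andrew McCallum", "information extraction with conditional random fields")
m2 = ("A. McCallum", "conditional random fields for information extraction from text")
m3 = ("A. McCallum", "protein folding simulation on gpu clusters")

print(same_author(m1, m2))  # names compatible, topics overlap
print(same_author(m1, m3))  # names compatible, topics do not overlap
```

Note that 'A. McCallum' is name-compatible with both 'Andrew McCallum' and a hypothetical 'Alice McCallum', which is exactly why the topical signal is needed on top of string matching.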
Something I've been thinking about, and would love to see built on top of a system like this, is a social network and commentary space that continues the discussion a paper starts. Imagine looking up one of Andrew's papers and being able to attach follow-up questions to it. Perhaps one of the authors will answer, or perhaps someone cited by that paper will comment.
At any rate, congratulations to Andrew on getting this out the door. The feature set looks impressive, and I'm looking forward to exploring it further.