I've just heard that yesterday, China altered its visa application process. Originally, a non-US citizen could apply for a visa (tourist or business) in the US. As of yesterday, one has to apply at the consulate in one's country of citizenship.
Why am I writing about this? There are two reasons. Firstly, I am hoping to attend the WWW conference in Beijing (21st-25th April). As the application process cannot be started more than 30 days prior to travel, I've only just submitted my application. WWW this year has a great track on social media. Secondly, after a moderate amount of searching, both in the blogosphere and elsewhere, I've not actually found any documentation of this change. I'll keep searching and update if I find anything.
Update: I understand that the LA consulate will process visas, though I can't vouch for the accuracy of this yet.
Wired's latest edition has an interesting article summarizing the life and vision of Ray Kurzweil. Not present in the online version of the article is a sidebar entitled 'Never mind the singularity, here's the science'. This piece, written by Mark Anderson, states:
[P]roponents of the so-called strong-AI school believe that a sufficient number of digitally simulated neurons, running at a high enough speed, can awaken into awareness.
This is an unfortunate summarization of strong-AI as it suggests that brain simulation is identical to strong-AI. While we can recognize an intuitive path to the goals of strong-AI (a self-aware intelligent machine) through the simulation of the human brain in a very literal sense, it is more appropriate to think of the brain itself as an implementational detail. What is more interesting is to capture the fundamental truths of intelligence and self-awareness abstractly and then implement them in an appropriate manner with the tools at hand. The big difference here is that this approach leads to a deeper understanding of intelligence.
Anderson does go on to make some excellent points about the disconnect between the continuous increases in the power of machines (e.g. Moore's law) and the very discontinuous nature of the study of the brain. In other words, it doesn't matter if we have the hardware at hand if we don't understand the system that we are trying to simulate.
The Wikipedia article on this topic is pretty interesting, though the origins of the term strong AI are buried at the bottom.
Amazon's version of the Mechanical Turk is a service which distributes human judgment problems to the crowd. Dolores Labs is a new enterprise which plans to make it easier to manage, collect, interpret and leverage distributable judgment problems and their answers. One can get a good idea of what they do, and for whom, from their examples page, which includes sentiment analysis, search relevance and classification tasks.
One of the nicest illustrations of the problem space is the labeling of colours. A full description appears on their blog, which includes a pointer to their released data set.
The ever-increasing population of Web 2.0 technologies and applications is ripe for deeper analysis. The Web 2.0 Pattern Mining Workshop plans to do just that: bring together Web 2.0 practitioners to discuss and explore common patterns that define the space.
Web 2.0 features are now commonplace—blogs, wikis, RSS feeds, social bookmarking and the like are almost everywhere you look online. Now that these technologies are maturing, what are their common problems and challenges? How are these problems being solved? What similar challenges do Web 2.0 developers face, and how can they leverage the most common solutions? Here’s your chance to gather with other professionals facing the same issues and work together to identify solutions.
Have a look at the graph below, which shows attention around Obama or Clinton, just Clinton and just Obama.
It seems that Clinton's solo trend is relatively even compared with Obama's.
We can see some clear peaks in Obama's trend that don't correlate strongly with Clinton's. These are illustrated below.
In other words, Obama's results on Super Tuesday, his wife's comments on her husband, and the current story regarding Obama's pastor's statements are all strongly identified with him independent of Clinton. Of these events, the one most strongly associated with Clinton is, not surprisingly, Super Tuesday:
Clinton has had, relative to Obama, very few independent bursts of attention.
What do these charts suggest? I'm not too sure. All news is good news (if it keeps attention on you)? Obama's increased attention precipitates surprises? Surprises keep attention on a candidate?
TechCrunch writes about SemanticHacker - a challenge put out by TextWise to see what the crowd can do with its NLP technology. On the front page they have a demo of their system, which creates 'semantic signatures' (essentially nodes from a broad hierarchical classification scheme) summarizing the content entered.
When dealing with the analysis of social media content - weblogs, usenet, etc. - one has to be very careful when transferring state-of-the-art NLP and text mining solutions. There are a number of key reasons, two of which are: i) noisy text and ii) the relationship between document structure and the dialogue/conversation that is taking place between the author and the entire content space. This has a big impact on getting at what the document is 'about'. How do you treat quoted material? for example. [Not to mention my use of intersentential question marks...]
I took this opening paragraph, which is essentially about Microsoft and Microsoft Research:
It is almost exactly a year ago that I joined Microsoft. I was lucky enough with my timing that my first week here coincided with TechFest. TechFest is an expo put on by Microsoft Research to showcase new and ongoing innovation internally. What I remember most about that first week was how impressed I was at the diversity of work being carried out by MSR. While this event is an internal one, there is also a press day which takes some of these research projects and demonstrates them to the media. This year's press day was yesterday.
What you see here are categories and scores. Here is the explanation:
Semantic Signatures® are built from weighted concepts. This simplified display shows the concept on the left, with its respective weight on the right. The weights represent the significance of ALL topics in the block of text. For the purpose of this demo, we are only displaying the top 5 concepts. Also, the weights have been placed on a 1 through 100 scale, 100 being the highest significance possible.
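To make that explanation concrete, here is a minimal sketch of how such a display could be produced. This is not TextWise's code: the function name `top_concepts`, the example concepts, and the assumption that raw weights lie between 0 and 1 and are mapped linearly onto 1 through 100 are all mine, purely for illustration.

```python
from typing import Dict, List, Tuple


def top_concepts(weights: Dict[str, float], k: int = 5) -> List[Tuple[str, int]]:
    """Return the k heaviest concepts with weights on a 1-100 scale.

    Assumption: raw weights lie in [0, 1]. The actual scaling used by
    the demo is not documented, so a linear mapping is used here.
    """
    # Sort concepts from most to least significant and keep the top k.
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)[:k]
    # Map 0.0 -> 1 and 1.0 -> 100.
    return [(concept, 1 + round(w * 99)) for concept, w in ranked]


# Hypothetical signature for a block of text (made-up concepts and weights).
signature = {
    "Software": 1.0,
    "Research": 0.8,
    "Trade shows": 0.6,
    "Journalism": 0.4,
    "Employment": 0.2,
    "Weather": 0.0,
}

for concept, score in top_concepts(signature):
    print(f"{concept:<12} {score}")
```

The point is only that the display is a truncated, rescaled view of a longer weighted list, so anything outside the top 5 is invisible in the demo.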
They also have problems with more obvious ambiguities:
Not a promising start. Note also that the $1MM prize is paid out as $100k initially with 'up to an additional $900k during the first year after the application is released.' So the winner may only see 10% of the prize.
I'm all for more visibility for NLP in the consumer space, and definitely into semantics and the transformation of object data (text) into a logical form, so I wish TextWise all the best. That being said, I personally believe that deploying large-scale NLP applications in the consumer space requires a more incremental and controlled plan.
I suspect that a big piece that they are missing out on with the structure of their competition is getting the community to improve the lexical and ontological resources (e.g. to fix the ambiguity in the example above).
As a result of TechFest, the BLEWS project from Microsoft Research got a moderate amount of exposure. Some of this brought another system - Skewz.com - to my attention (both via a personal email from the Skewz team and also from reading other comparisons). Skewz is definitely worth checking out. The motivation behind the site is captured here:
Skewz was started by a group of 4 guys with diverse political views who engaged in frequent political sparring. We tired of the coarseness of the public political dialog and the tendency for both sides to talk past each other. The goal was not to make peace between liberals and conservatives. Instead, we wanted to encourage liberal-conservative dialogue by improving on the intelligence and thoughtfulness of the discussions. We hoped that doing so would take focus from the cosmetic appeal of parties and personalities that generate allegiances and place it instead on wit and wisdom of intelligent debate.
Part of the user experience is to report your own particular point of view. This is done via an interface which matches a set of issues (e.g. Iraq, health care, ...) with your own particular 'skew' (left/right leaning, and by how much). Once you are logged in, you can then rate news by how you perceive the skew (do you think it is left/right leaning and by how much). The result is a Digg-like experience with a richer set of features (Digg has, essentially, a single variable whereas Skewz has at least two depending on how you count them).
While the motivation behind the site is praiseworthy, I think the team has missed the model of opinion by a considerable margin. Rather than model user opinion as being 'left on abortion', 'right on gun control', ..., they should have captured the absolute (not relative) belief of the user ('against abortion', 'for gun control', ...). By focusing on the relative nature of partisan politics and building an interface around the two party system, they are running the risk of further institutionalizing the polarity that they set out to remove.
That being said, one has to balance the realities of political media consumption, user experience design and so on - a number of the criticisms I describe here also surface in the BLEWS interface.
Skewz certainly appears to have its marketing resources out in force. This paragraph from their press release indicates an interesting interpretation of what we are working on, intended to differentiate Skewz:
While Blews and Skewz both categorize news stories according to their reception in the conservative and liberal blogospheres, one of the key differentiators between the sites is that Blews is one-sided, static and passive, whereas Skewz is user-driven and dynamic. Skewz lets users participate in the rating mechanism, giving the user a more interactive and comprehensive experience. Skewz stresses participation which, after all, is what politics is all about.
Let's consider these claims:
'one-sided'? We look at links from self-reported left and right leaning bloggers.
'static'? We crawl the blogosphere with a continuous real-time crawl.
'passive'? Users can interact using search and other controls.
Skewz claims participation - the only real difference here is that we observe participation by finding posts that link to and discuss news stories, whereas Skewz is about direct interaction with their site.
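To illustrate what 'observing participation' means here, the following is a rough sketch, not the BLEWS implementation. It assumes we already have (leaning, story URL) pairs extracted from crawled blog posts; the function name `count_links_by_leaning` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def count_links_by_leaning(posts: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count, per news story, how many posts link to it from each leaning.

    Each input item is a (blogger_leaning, story_url) pair, where the
    leaning is the blogger's self-reported label ('left' or 'right').
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for leaning, story_url in posts:
        counts[story_url][leaning] += 1
    return dict(counts)


# Made-up example: three posts link to one story, one post to another.
posts = [
    ("left", "http://example.com/story-a"),
    ("right", "http://example.com/story-a"),
    ("left", "http://example.com/story-a"),
    ("right", "http://example.com/story-b"),
]

for story, by_leaning in count_links_by_leaning(posts).items():
    print(story, "left:", by_leaning["left"], "right:", by_leaning["right"])
```

Both sides are counted from what bloggers actually wrote and linked to, with no rating step required from visitors to the site.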
It has been interesting and somewhat frustrating to watch the reaction to BLEWS. The recent coverage in Slashdot is a key example. It seems that there is no amount of fact that can substitute for good old prejudice and ignorance when it comes to forming and declaring opinion. Much of what is written in the comment threads and elsewhere gets distracted by the word 'Microsoft' and is ignorant of anything that we have said about the goals, mechanisms and even the state of the project.