Problems of social news websites

September 2, 2009
Sylvain

This post is part of the introduction of an academic paper coauthored by Thomas Largillier, Guillaume Peyronnet and myself. I made some modifications, so I endorse all the mistakes of this version.

In recent years, the way people interact with each other on the Web has drastically changed. Websites now provide information aggregated from user-generated content, generally filtered using social recommendation methods to suggest relevant documents to users. The best-known example of such a website is Digg. It is a social news website: people share content they find on the web through the Digg interface, and users can then vote for the news items they like the most. Voting for a news item counts as a recommendation, and (according to an undisclosed algorithm) items with a sufficient number of recommendations are displayed on Digg's front page.
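To fix ideas, here is a minimal sketch of such a promotion mechanism. Since Digg's actual algorithm is not disclosed, this is only a toy model in which a story reaches the front page once it collects a fixed number of distinct votes; the threshold and all names are hypothetical.

```python
from collections import defaultdict

# Toy Digg-like promotion rule (Digg's real algorithm is undisclosed):
# a story is promoted once enough distinct users have voted for it.
PROMOTION_THRESHOLD = 50  # arbitrary value, for illustration only

votes = defaultdict(set)  # story_id -> set of user_ids who voted for it

def vote(story_id, user_id):
    votes[story_id].add(user_id)  # at most one vote per user per story

def front_page():
    return [story for story, voters in votes.items()
            if len(voters) >= PROMOTION_THRESHOLD]
```

A rule this naive is precisely what makes manipulation attractive: whoever controls enough accounts to reach the threshold controls the front page.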

Digg was launched in November 2004, and since then numerous Digg clones (generally called Digg-likes) have been created by webmasters. This huge success can be explained by the amount of traffic such websites aggregate and redistribute. Indeed, reaching the front page of a website such as Digg is very attractive: webmasters repeatedly report that a website on the front page of Digg (or of a similar site) receives thousands of unique visitors within one day. Since most websites follow an economic model based on advertising, attracting unique visitors is the most direct way to increase income. It is therefore tempting for a user to resort to malicious techniques in order to gain visibility for his websites.

The weakest form of malicious technique is explained in detail by Kristina Lerman in this paper, where she recalls the 2006 Digg controversy. The controversy arose when a user posted on Digg an analysis showing that the top 30 users of Digg were responsible for a disproportionate fraction of the front page (later studies established that 56% of the front page came from the top 100 users alone).
This means that the top users were acting together to get their stories (i.e. the websites they support) onto the front page. The controversy led to a modification of the Digg algorithm to reduce the power of this so-called bloc voting (collusion between a subset of users).

Since 2006, malicious users have become more and more efficient (see for instance the paper by Heymann, Koutrika and Garcia-Molina). Cabals (large groups of colluding users who vote for each other) have been automated using daily mailing lists, some users post hundreds of links in order to flood the system, others hold several accounts and can thus vote for themselves (using several IP addresses), etc. No social news website implements a certifiably robust voting scheme that avoids the problem of malicious users while still providing a high quality of service (i.e. presenting relevant news to users). Most of them do something, but since the algorithms are not public, there is no way to certify the quality of service.

In France the big picture is the same (I mention France because it is where I live), with the best-known social news websites being Scoopeo, Fuzz, Wikio and probably many others (sorry if I forgot your website!).

The main question is thus: is it possible to design techniques whose aim is not to detect and suppress malicious voting behaviors in social news websites, but rather to demote the effect of these behaviors, thereby making such manipulations less profitable for spammers? In a forthcoming post I will explain the SpotRank method we designed together with Thomas and Guillaume, and show some experimental results that make me think the answer to the previous question is « yes, we can! ».
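To give a feel for what "demoting" (rather than deleting) can look like, here is a minimal sketch. It is not SpotRank, which I will describe next time; every name and the penalty constant are hypothetical. The idea is simply that a vote counts for less when its caster frequently co-votes with the story's other voters, which is the statistical signature of a cabal.

```python
from collections import defaultdict
from itertools import combinations

def covote_counts(votes):
    """votes: story_id -> set of user_ids. Returns (u, v) -> number of
    stories on which users u and v both voted."""
    counts = defaultdict(int)
    for voters in votes.values():
        for u, v in combinations(sorted(voters), 2):
            counts[(u, v)] += 1
    return counts

def demoted_score(story_voters, counts, penalty=0.1):
    """Score of a story where each vote is weighted down by how often
    its caster co-votes with the story's other voters."""
    score = 0.0
    for u in story_voters:
        overlap = sum(counts[tuple(sorted((u, v)))]
                      for v in story_voters if v != u)
        score += 1.0 / (1.0 + penalty * overlap)  # honest votes stay near 1
    return score
```

The appeal of this family of techniques is that nobody is banned on a mere suspicion: honest votes keep almost their full weight, while a cabal sees its collective influence shrink.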

Picture: courtesy of Abby Blank