Human evaluation of SpotRank
This is the last of the three posts about SpotRank. In this one I show some evidence of the effectiveness of the method.
Even with a very thorough analysis of the log files, it is impossible to judge the quality of the filtering performed by our method. Indeed, the algorithm filters news according to the way people vote; it does not look at the content. To cope with this issue, we decided to gather feedback from the users themselves. Since an absolute judgement is impossible to obtain without a long debate on what the quality of a website is, we chose to compare the top « stories » of three social news websites. The first is of course spotrank.fr, which implements the method presented in a previous post; the other two are major competitors in the field in France. These two websites are interesting because they use automatic methods to filter news, and because their human moderation mainly aims at removing illegal content. From now on, they will be denoted comp1 and comp2. I removed the names of the competitors in order to avoid potential legal issues.
Our survey protocol is the following. To obtain relevant results, we periodically collected the top five spots on spotrank.fr together with the top five of the two major French social news websites comp1 and comp2.
We then automatically generated disposable web pages containing a shuffled list of these 15 news items. Each web page was then sent to a volunteer, who had to give one of the following answers for each news item:
- Yes, it is relevant for the news to appear on the front page of a social news web site.
- No, it is not relevant for the news to appear on the front page of a social news web site.
- DnK, the volunteer is not able to determine whether the news deserves to be on the front page or not.
- Err, the news was not accessible when the volunteer tried to open it.
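The protocol above can be sketched in a few lines of Python. This is only an illustration, not the code we actually ran: the function names `build_survey_page` and `tally`, the `seed` parameter and the input format (a dict mapping each site to its ranked list of stories) are assumptions made for the example.

```python
import random
from collections import Counter

# The four possible answers of the survey.
ANSWERS = ("Yes", "No", "DnK", "Err")


def build_survey_page(top_by_site, per_site=5, seed=None):
    """Return a shuffled list of (site, story) pairs for one disposable page.

    top_by_site is a hypothetical input: a dict mapping a site name to its
    ranked list of stories. Only the first `per_site` stories are kept.
    """
    rng = random.Random(seed)
    items = [
        (site, story)
        for site, stories in top_by_site.items()
        for story in stories[:per_site]
    ]
    # Shuffle so that the volunteer cannot tell which site a story comes from.
    rng.shuffle(items)
    return items


def tally(responses):
    """Count the answers per site from an iterable of (site, answer) pairs."""
    counts = {}
    for site, answer in responses:
        if answer not in ANSWERS:
            raise ValueError(f"unknown answer: {answer!r}")
        counts.setdefault(site, Counter())[answer] += 1
    return counts
```

With three sites and five stories each, `build_survey_page` yields the 15 shuffled items of one page, and `tally` turns the collected answers into per-site counts of Yes, No, DnK and Err.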
At the time of writing, we had collected the top five news items of each website over a period of 7 days, and 57 people had taken part in the poll.
The figure above shows what could be considered a summary of the results of the poll. For each competitor it presents the number of Yes, No, DnK and Err answers. The number of Err answers is not of much interest in itself, since inaccessibility is an external factor that applies to all three social news websites. However, a higher error rate could indicate links to unreliable sites, i.e. sites with irrelevant content. Note that each surveyed person gives 15 answers, 5 per website, so the total number of answers is 855 (285 per website).
The important point is that SpotRank outperforms both competitors on every criterion. Our method received 171 Yes, while comp1 and comp2 received 131 and 78 Yes respectively. comp1 (resp. comp2) thus reaches only 76.6% (resp. 45.6%) of SpotRank's score, so surveyed people consider the ranking given by SpotRank to be of higher quality than the other two. For the No answers the situation is similar. Here lower is better, since a No means that a top spot is considered not legitimate: SpotRank received 78 No, while comp1 and comp2 received 112 and 123. Last, SpotRank received 28 DnK. This is again better than comp1 and comp2, and it means that the filtering of SpotRank gives clearer results (only a few borderline spots).
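For readers who want to check the percentages, here is a small Python sketch that recomputes them from the counts quoted above. The helper names `relative_to` and `share_of_answers` are mine, and the figure of 285 answers per website is derived from 57 participants times 5 stories per site; it assumes every participant answered for every story.

```python
# Counts quoted in the post.
yes_counts = {"spotrank": 171, "comp1": 131, "comp2": 78}
no_counts = {"spotrank": 78, "comp1": 112, "comp2": 123}

# Derived, not quoted: 57 participants x 5 stories per website.
ANSWERS_PER_SITE = 57 * 5


def relative_to(baseline, counts):
    """Express each count as a percentage of the baseline site's count."""
    base = counts[baseline]
    return {site: round(100 * n / base, 1) for site, n in counts.items()}


def share_of_answers(counts, total=ANSWERS_PER_SITE):
    """Express each count as a percentage of all answers given for that site."""
    return {site: round(100 * n / total, 1) for site, n in counts.items()}


print(relative_to("spotrank", yes_counts))
# {'spotrank': 100.0, 'comp1': 76.6, 'comp2': 45.6}
print(share_of_answers(yes_counts))
```

The first call reproduces the 76.6% and 45.6% figures. The second one gives the fraction of Yes among all answers for each site, which is another way of reading the same counts.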