A Data Mining Approach to Spam Detection in Social Bookmarking Sites – Part 1

Social Bookmarking Sites

With the growing popularity of social bookmarking sites, spammers increasingly use these services as a playground for their activities. As we all know, spam is one of the main disadvantages of social bookmarking systems. Spammers use these systems to pursue two goals:

  • Place links in the sites to attract people to advertising sites
  • Increase the PageRank of their sites by placing links in as many popular websites as possible, in order to increase their visibility in search engines

The usual counter-measures, such as CAPTCHAs (challenge-response tests intended to ensure that a response is not generated by a computer), are not effective enough to prevent misuse of the system. In this three-part series, we will take a look at a novel method that uses Neural Networks and Text Mining to learn a model that predicts whether a user is a spammer.

(Ref: ECML PKDD Discovery Challenge 2008)

What is social bookmarking?

Social bookmarking is a method for users to store, organize, search and manage bookmarks of web pages with the help of tags. On a social bookmarking site, users save links to web pages that they want to remember or share. These bookmarks can be public, private, or shared only with specified people. Some of the popular social bookmarking sites include:

  • del.icio.us (Delicious)
  • Bibsonomy
  • Twitter
  • Digg
  • Reddit

Dataset

The training data (provided on the ECML site) is heavily skewed: it lists 25,000 spammers against 2,000 non-spammers.
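To make the skew concrete, here is a minimal sketch that computes the class proportions and a set of inverse-frequency class weights from the counts quoted above. The weighting scheme is an assumption for illustration (it is the same formula as scikit-learn's "balanced" heuristic) and is not necessarily what this series uses; the names `class_counts`, `class_proportions` and `inverse_frequency_weights` are hypothetical.

```python
# Class counts as quoted in the text for the training data.
class_counts = {"spammer": 25000, "non_spammer": 2000}


def class_proportions(counts):
    """Return the share of each class in the data set."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def inverse_frequency_weights(counts):
    """Weight each class by total / (n_classes * class_count).

    Assumption: this is one common way to counter class imbalance;
    rare classes receive proportionally larger weights.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


proportions = class_proportions(class_counts)
weights = inverse_frequency_weights(class_counts)

for label in class_counts:
    print(f"{label}: share={proportions[label]:.3f}, weight={weights[label]:.2f}")
```

With these counts, a model that labels every user a spammer would already be right for roughly 93% of the training users, which is why accuracy alone is a poor measure on data this skewed.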

In Part 2 of this series, we will look at the approach, specifically the text mining and the modeling (neural networks) used to make the predictions.


Deepak Ramanathan
