A Data Mining Approach to Spam Detection in Social Bookmarking Sites

Social Bookmarking Sites

With the growing popularity of social bookmarking sites, spammers increasingly treat these kinds of services as a playground for their activities. As we all know, spam is one of the main disadvantages of social bookmarking systems. Spammers use these systems to pursue two goals:

Place links on these sites to attract people to advertising sites
Increase the PageRank of their own sites by placing links on as many popular websites as possible, which raises their visibility in search engines
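To illustrate why the second goal works, here is a minimal sketch of the textbook PageRank power iteration (damping factor 0.85) on a hypothetical four-page web. The page names and link structure are invented for illustration, and this simplified formulation is not a description of how any particular search engine ranks pages today.

```python
"""Minimal power-iteration PageRank on a toy link graph (illustrative only)."""


def pagerank(links, damping=0.85, iterations=100):
    """Return a PageRank score per node.

    `links` maps each node to the list of nodes it links to. Nodes with no
    outgoing links (dangling nodes) spread their rank evenly over all nodes.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node first receives its share of the "random jump" probability.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: distribute its rank uniformly.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical toy web: no popular page links to "spam-site" yet.
honest_web = {
    "news": ["blog", "shop"],
    "blog": ["news"],
    "shop": ["news"],
    "spam-site": ["news"],
}
# The same web after a spammer plants links to "spam-site" on the popular pages.
spammed_web = {
    "news": ["blog", "shop", "spam-site"],
    "blog": ["news", "spam-site"],
    "shop": ["news", "spam-site"],
    "spam-site": ["news"],
}

before = pagerank(honest_web)["spam-site"]
after = pagerank(spammed_web)["spam-site"]
print(f"spam-site PageRank before: {before:.3f}, after: {after:.3f}")
```

Without inbound links, "spam-site" only ever receives the random-jump share; every planted link adds a slice of a popular page's rank to it.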

The usual countermeasures, such as CAPTCHAs (challenge-response tests that check the response was not generated by a computer), are not effective enough to prevent misuse of these systems. In this three-part series, we will look at a novel method that combines text mining and neural networks to learn a model that predicts whether a user is a spammer.

(Ref: ECML PKDD Discovery Challenge 2008)
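The actual features and network are covered in Part 2. Purely to illustrate the general idea of learning "spammer or not" from text, here is a minimal sketch that turns each user's tags into a binary bag-of-tags vector and trains a single sigmoid neuron (equivalent to logistic regression) with stochastic gradient descent. The toy users, tag names, learning rate and epoch count are hypothetical, and this is not necessarily the model used in this series.

```python
"""Sketch: bag-of-tags features fed to a single sigmoid neuron (logistic unit)."""
import math


def build_vocabulary(users):
    """Collect every distinct tag used by any user, in a fixed (sorted) order."""
    return sorted({tag for tags, _ in users for tag in tags})


def featurize(tags, vocabulary):
    """Binary bag-of-tags vector: 1.0 if the user used the tag, else 0.0."""
    used = set(tags)
    return [1.0 if tag in used else 0.0 for tag in vocabulary]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(users, vocabulary, epochs=200, learning_rate=0.5):
    """Stochastic gradient descent on log-loss for one sigmoid neuron."""
    weights = [0.0] * len(vocabulary)
    bias = 0.0
    for _ in range(epochs):
        for tags, is_spammer in users:
            x = featurize(tags, vocabulary)
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - is_spammer  # gradient of log-loss w.r.t. the pre-activation
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


def predict(tags, vocabulary, weights, bias):
    """Probability that a user with these tags is a spammer (unknown tags ignored)."""
    x = featurize(tags, vocabulary)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical toy users: (tags the user applied, 1 = spammer, 0 = legitimate).
training_users = [
    (["cheap", "pills", "casino"], 1),
    (["casino", "poker", "bonus"], 1),
    (["cheap", "loans", "bonus"], 1),
    (["python", "datamining", "paper"], 0),
    (["recipes", "python", "tutorial"], 0),
    (["paper", "neuralnetworks", "tutorial"], 0),
]

vocab = build_vocabulary(training_users)
w, b = train(training_users, vocab)
print(round(predict(["casino", "bonus"], vocab, w, b), 3))
print(round(predict(["python", "paper"], vocab, w, b), 3))
```

A real system would use far more users and richer text features; a single neuron is simply the smallest neural network that shows the train-then-predict loop.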

What is social bookmarking?

Social bookmarking is a method for users to store, organize, search and manage bookmarks of web pages with the help of tags. On a social bookmarking site, users save links to web pages they want to remember or share, and each bookmark can be public, private, or shared only with specified people. Some of the popular social bookmarking sites include:

del.icio.us (Delicious)
Bibsonomy
Twitter
Digg
Reddit
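The model described above (users saving links, organizing them with tags, and choosing who may see them) can be sketched as a small in-memory data structure. The class, field and function names are hypothetical and do not reflect the schema of Delicious, BibSonomy or any other site.

```python
"""Sketch of a social bookmarking store: users save URLs with tags and a visibility."""
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Visibility(Enum):
    PUBLIC = "public"
    PRIVATE = "private"
    SHARED = "shared"  # visible only to the users listed in shared_with


@dataclass
class Bookmark:
    owner: str
    url: str
    tags: set[str]
    visibility: Visibility = Visibility.PUBLIC
    shared_with: set[str] = field(default_factory=set)

    def visible_to(self, viewer: str) -> bool:
        """Owners always see their own bookmarks; others depend on visibility."""
        if viewer == self.owner or self.visibility is Visibility.PUBLIC:
            return True
        return self.visibility is Visibility.SHARED and viewer in self.shared_with


def search_by_tag(bookmarks, tag, viewer):
    """Return the URLs carrying `tag` that `viewer` is allowed to see."""
    return [b.url for b in bookmarks if tag in b.tags and b.visible_to(viewer)]


# Hypothetical usage.
bookmarks = [
    Bookmark("alice", "https://example.org/spam-paper", {"spam", "datamining"}),
    Bookmark("alice", "https://example.org/draft", {"datamining"}, Visibility.PRIVATE),
    Bookmark("bob", "https://example.org/notes", {"datamining"}, Visibility.SHARED, {"carol"}),
]
print(search_by_tag(bookmarks, "datamining", "carol"))
```

Public bookmarks are exactly what a spammer exploits: anything marked public shows up in other users' tag searches.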

Dataset

The training data (provided on the ECML PKDD site) is heavily skewed: it lists 25,000 spammers against only 2,000 non-spammers.
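The skew matters because plain accuracy becomes misleading. The arithmetic below, using the counts above, shows the accuracy of a trivial "everyone is a spammer" baseline and one common way to compensate: inverse-frequency ("balanced") class weights, computed as total / (number of classes × class count), the same formula scikit-learn uses for `class_weight="balanced"`. Whether this series uses such weights is not stated; this is only one illustrative option.

```python
"""Why a 25,000 : 2,000 class split matters: majority baseline and class weights."""

spammers = 25_000
non_spammers = 2_000
total = spammers + non_spammers

# A "model" that labels every user a spammer is right on every spammer.
baseline_accuracy = spammers / total
print(f"Majority-class baseline accuracy: {baseline_accuracy:.1%}")  # 92.6%

# Inverse-frequency ("balanced") weights: total / (n_classes * class_count).
n_classes = 2
weights = {
    "spammer": total / (n_classes * spammers),
    "non-spammer": total / (n_classes * non_spammers),
}
print(weights)  # {'spammer': 0.54, 'non-spammer': 6.75}
```

Any classifier for this data has to beat roughly 92.6% accuracy just to outperform doing nothing, which is why per-class measures are more informative here.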

In Part 2 of this series, we will look at the approach, specifically the text mining and the modeling (neural networks) used to make the predictions.

A Data Mining Approach to Spam Detection in Social Bookmarking Sites – Part 1

by Deepak Ramanathan on July 22nd, 2011