Last week, Google announced some astounding statistics about the success rate of their spam filtering technology in Google Mail. Google says that less than 0.1% of email in the average user’s Gmail inbox is spam, and the rate of legitimate email ending up in the spam folder is even lower, at less than 0.05%.
Despite these superb results, Google is continuing to innovate and find new ways of detecting and blocking spam. This year, Google has announced several innovations across their product line that utilize their machine learning technology, affectionately referred to as Google Brain. Using their artificial neural network software, Google Photos can identify objects in your photographs, like dogs or cats or bridges or birthday cakes. Google Maps can automatically detect new businesses or speed limit signs in the imagery collected by their entire fleet of Street View cars. Google Earth Engine can detect new refugee camps or deforestation boundaries by scanning for visual patterns in petabytes of satellite images.
Google is now using its artificial neural networks to fight spam. A Google spokesman explains:
Our neural net system learns based on a huge collection of example “wanted” messages and a similar body of example spam mails. The system tracks thousands of attributes of each message (for example, the words in the message or the sender’s IP address). The spam filter then uses a technique called clustering analysis to find attribute groupings which differentiate spam from wanted mail. Essentially, the spam filter finds the sneaky spam by ignoring the similarities, and focusing only on the differences. As both spam and wanted mail evolve, the system is constantly relearning this differentiation. When users report spam (or not spam) that content is fed into the system, and it learns more. Ultimately, our spam filter learns from these user reports, which is how it has improved so much in the last few years.
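To make that description a little more concrete, here is a minimal sketch of the general idea: learn from labeled examples, track attributes such as words and the sender's IP address, and keep learning from user reports. It is not Google's implementation. It swaps in a single logistic neuron for a real neural network and skips the clustering step entirely, and every name in it (`extract_attributes`, `ToySpamFilter`, the sample messages and IP addresses) is made up for illustration.

```python
import math
import re


def extract_attributes(message, sender_ip):
    """Return a set of attributes: each word in the message plus the sender's IP."""
    words = re.findall(r"[a-z0-9']+", message.lower())
    attributes = {"word:" + word for word in words}
    attributes.add("ip:" + sender_ip)
    return attributes


class ToySpamFilter:
    """A single logistic neuron trained one example at a time (hypothetical, for illustration)."""

    def __init__(self, learning_rate=0.5):
        self.weights = {}  # one weight per attribute seen so far
        self.bias = 0.0
        self.learning_rate = learning_rate

    def spam_probability(self, attributes):
        score = self.bias + sum(self.weights.get(a, 0.0) for a in attributes)
        score = max(-30.0, min(30.0, score))  # keep math.exp well within range
        return 1.0 / (1.0 + math.exp(-score))

    def learn(self, attributes, is_spam):
        """Nudge the weights toward the label; also used for user reports."""
        error = (1.0 if is_spam else 0.0) - self.spam_probability(attributes)
        step = self.learning_rate * error
        self.bias += step
        for a in attributes:
            self.weights[a] = self.weights.get(a, 0.0) + step


# Tiny made-up training set: (message, sender IP, is_spam).
examples = [
    ("win a free prize now", "203.0.113.7", True),
    ("free money click now", "203.0.113.7", True),
    ("lunch meeting moved to noon", "198.51.100.2", False),
    ("notes from the project meeting", "198.51.100.2", False),
]

spam_filter = ToySpamFilter()
for _ in range(20):  # a few passes over the examples
    for message, ip, is_spam in examples:
        spam_filter.learn(extract_attributes(message, ip), is_spam)

p_spam = spam_filter.spam_probability(
    extract_attributes("claim your free prize now", "203.0.113.7"))
p_ham = spam_filter.spam_probability(
    extract_attributes("project meeting at noon", "198.51.100.2"))

# A user marks a newsletter as "not spam" a few times; the filter relearns.
newsletter = extract_attributes("weekly newsletter free shipping deals", "192.0.2.55")
p_before = spam_filter.spam_probability(newsletter)
for _ in range(10):
    spam_filter.learn(newsletter, is_spam=False)
p_after = spam_filter.spam_probability(newsletter)

print(p_spam, p_ham, p_before, p_after)
```

Attributes that show up in both kinds of mail get pushed up and down in turn and end up carrying little weight, while attributes that appear mostly on one side accumulate weight. That is a rough analogue of "focusing only on the differences" in the quote above.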
Specifically, Google lists three ways they are making spam detection smarter:
- Using an artificial neural network to detect and block the especially sneaky spam—the kind that could actually pass for wanted mail.
- Recognizing that not all inboxes are alike. So while your neighbor may love weekly email newsletters, you may loathe them. With advances in machine learning, the spam filter can now reflect these individual preferences.
- Rooting out email impersonation—that nasty source of most phishing scams. Thanks to new machine learning signals, Gmail can now figure out whether a message actually came from its sender, and keep bogus email at bay.
Personally, I have been using Gmail, and now Google Apps at work, for over 10 years. I can’t say that I was disappointed with its spam filtering to begin with, but the improvements this year have taken it to the next level. The one or two messages that used to slip through each week are gone. I haven’t even bothered to check the spam folder for false positives in years. Spam doesn’t exist for me anymore, and I am just fine with that. Thank you, Google!